Table of contents
4. Partial simulation map preview
1. How to get the source code
Use version vivado2019.2
How to get 1:
Click the download link (decompression password C+123456):
How to get 2:
If the download link fails, please contact the blogger via WeChat or private message.
2. Algorithm description
The main feature of the DA algorithm is that it cleverly uses the look-up table to convert the fixed-coefficient MAC operation into a look-up table operation, and its operation speed does not decrease with the increase of coefficients and the number of bits of input data. The scale has been greatly improved.
For an FIR (Finite Length Unit Impulse Response) filter, its basic structure is a segmented delay line, and the output of each segment is weighted and accumulated to obtain the output of the filter. The output y is the inner product of the input x and the coefficient h:
Expand the second part of Equation (3) and sum up separately, which is also the origin of the name "distributed algorithm", we can get:
In this way, equation (3) can be simplified to
Calculating h[n]xb[n] is to use a lookup table to implement a mapping, and then weight this mapping by the corresponding second power, and finally the output of the filter can be obtained.
Implementation structure of distributed FIR
Figure 1 shows the most direct implementation structure of the distributed FIR filter, and the dotted line is the pipeline register. For data with small bit width, the Da algorithm is not only fast, but also occupies very little chip resources.
Figure 1 Shift-add DA structure
For formula (4), each product term in parentheses represents the binary AND operation of a bit of the input variable and a constant, the plus sign represents the arithmetic sum operation, and the exponential factor weights the values in the parentheses. If a lookup table is constructed in advance, which stores all possible combinations of values in parentheses, the combination vector corresponding to all input variables (xb[N-1], xb[N-2], … ,xb[0 ]) to address the table. The table structure is shown in Table 1.
The distributed algorithm implemented in LUT is that since the size of LUT increases exponentially with the increase of N, if there are too many filter coefficients N, the scale of the lookup table is very large. To reduce the size, partial table calculations can be used. Since FIR filters are linear filters, the lower-order filter outputs can be summed, thereby defining the output of a higher-order filter. For example, a 16-input lookup table can be split into 4 parallel lookup tables, as shown in Figure 2. And so on, a larger LUT can be split into multiple smaller LUTs. If pipelining is added, this structural change will not reduce the speed, but can greatly reduce the size of the design.
Figure 2 Simplified scale DA structure
Let's start designing and implementing in FPGA.
Considering that the complexity of the look-up table of the program based on DA calculation will greatly increase with the increase of the input bit width and the filter order, here, we will filter the input of the filter under the premise of satisfying the design index. The bit width is changed to 12 bits, and the order is 16.
3. Part of the program
//input signal register reg [7:0] DIN_8b_0; reg [7:0] DIN_8b_1; reg [7:0] DIN_8b_2; reg [7:0] DIN_8b_3; reg [7:0] DIN_8b_4; reg [7:0] DIN_8b_5; reg [7:0] DIN_8b_6; reg [7:0] DIN_8b_7; reg [7:0] DIN_8b_8; reg [7:0] DIN_8b_9; reg [7:0] DIN_8b_10; reg [7:0] DIN_8b_11; reg [7:0] DIN_8b_12; reg [7:0] DIN_8b_13; reg [7:0] DIN_8b_14; //addition result register reg [25:0] temp_1_1,temp_1_2,temp_1_3,temp_1_4; reg [25:0] temp_2_1,temp_2_2; reg [25:0] temp_3; assign Dout = temp_3[25:10]; //Lookup table function, look up the bits corresponding to the multiplication of A3,A2,A1,A0 function[15:0] look_A3_A0; input [3:0] DIN; begin case(DIN) 4'b0000: look_A3_A0=16'h0; 4'b0001: look_A3_A0=16'h0; 4'b0010: look_A3_A0=16'h65; 4'b0011: look_A3_A0=16'h65; 4'b0100: look_A3_A0=16'h18f; 4'b0101: look_A3_A0=16'h18f; 4'b0110: look_A3_A0=16'h1f4; 4'b0111: look_A3_A0=16'h1f4; 4'b1000: look_A3_A0=16'h35a; 4'b1001: look_A3_A0=16'h35a; 4'b1010: look_A3_A0=16'h3bf; 4'b1011: look_A3_A0=16'h3bf; 4'b1100: look_A3_A0=16'h4e9; 4'b1101: look_A3_A0=16'h4e9; 4'b1110: look_A3_A0=16'h54e; 4'b1111: look_A3_A0=16'h54e; endcase end endfunction //Use the look-up table to find the results of the input signal, and wait until seven results always @(posedge CLK or posedge Reset) begin if(Reset) begin //0 lookup0_1 <= 0; lookup0_2 <= 0; lookup0_3 <= 0; lookup0_4 <= 0; sum0_1 <= 0; sum0_2 <= 0; sum0 <= 0; //1 lookup1_1 <= 0; lookup1_2 <= 0; lookup1_3 <= 0; lookup1_4 <= 0; sum1_1 <= 0; sum1_2 <= 0; sum1 <= 0; //2 lookup2_1 <= 0; lookup2_2 <= 0; lookup2_3 <= 0; lookup2_4 <= 0; sum2_1 <= 0; sum2_2 <= 0; sum2 <= 0; //3 lookup3_1 <= 0; lookup3_2 <= 0; lookup3_3 <= 0; lookup3_4 <= 0; sum3_1 <= 0; sum3_2 <= 0; sum3 <= 0; //4 lookup4_1 <= 0; lookup4_2 <= 0; lookup4_3 <= 0; lookup4_4 <= 0; sum4_1 <= 0; sum4_2 <= 0; sum4 <= 0; //5 lookup5_1 <= 0; lookup5_2 <= 0; lookup5_3 <= 0; lookup5_4 <= 0; sum5_1 <= 0; sum5_2 <= 0; sum5 <= 0; //6 lookup6_1 <= 0; lookup6_2 <= 0; lookup6_3 <= 0; lookup6_4 <= 0; sum6_1 <= 0; sum6_2 <= 0; sum6 <= 0; //7 lookup7_1 <= 0; lookup7_2 <= 0; lookup7_3 <= 0; lookup7_4 <= 0; sum7_1 <= 0; sum7_2 <= 0; sum7 <= 0; end else if(count_4b==15) begin .................................... //4 lookup4_1 <= look_A3_A0({DIN_8b_12[4],DIN_8b_13[4],DIN_8b_14[4],DIN[4]}); lookup4_2 <= look_A7_A4({DIN_8b_8[4],DIN_8b_9[4],DIN_8b_10[4],DIN_8b_11[4]}); lookup4_3 <= look_A7_A4({DIN_8b_7[4],DIN_8b_6[4],DIN_8b_5[4],DIN_8b_4[4]}); lookup4_4 <= look_A3_A0({DIN_8b_3[4],DIN_8b_2[4],DIN_8b_1[4],DIN_8b_0[4]}); sum4_1 <= lookup4_1 + lookup4_2; sum4_2 <= lookup4_3 + lookup4_4; sum4 <= sum4_1 + sum4_2; //5 lookup5_1 <= look_A3_A0({DIN_8b_12[5],DIN_8b_13[5],DIN_8b_14[5],DIN[5]}); lookup5_2 <= look_A7_A4({DIN_8b_8[5],DIN_8b_9[5],DIN_8b_10[5],DIN_8b_11[5]}); lookup5_3 <= look_A7_A4({DIN_8b_7[5],DIN_8b_6[5],DIN_8b_5[5],DIN_8b_4[5]}); lookup5_4 <= look_A3_A0({DIN_8b_3[5],DIN_8b_2[5],DIN_8b_1[5],DIN_8b_0[5]}); sum5_1 <= lookup5_1 + lookup5_2; sum5_2 <= lookup5_3 + lookup5_4; sum5 <= sum5_1 + sum5_2; //6 lookup6_1 <= look_A3_A0({DIN_8b_12[6],DIN_8b_13[6],DIN_8b_14[6],DIN[6]}); lookup6_2 <= look_A7_A4({DIN_8b_8[6],DIN_8b_9[6],DIN_8b_10[6],DIN_8b_11[6]}); lookup6_3 <= look_A7_A4({DIN_8b_7[6],DIN_8b_6[6],DIN_8b_5[6],DIN_8b_4[6]}); lookup6_4 <= look_A3_A0({DIN_8b_3[6],DIN_8b_2[6],DIN_8b_1[6],DIN_8b_0[6]}); sum6_1 <= lookup6_1 + lookup6_2; sum6_2 <= lookup6_3 + lookup6_4; sum6 <= sum6_1 + sum6_2; //7 lookup7_1 <= look_A3_A0({DIN_8b_12[7],DIN_8b_13[7],DIN_8b_14[7],DIN[7]}); lookup7_2 <= look_A7_A4({DIN_8b_8[7],DIN_8b_9[7],DIN_8b_10[7],DIN_8b_11[7]}); lookup7_3 <= look_A7_A4({DIN_8b_7[7],DIN_8b_6[7],DIN_8b_5[7],DIN_8b_4[7]}); lookup7_4 <= look_A3_A0({DIN_8b_3[7],DIN_8b_2[7],DIN_8b_1[7],DIN_8b_0[7]}); sum7_1 <= lookup7_1 + lookup7_2; sum7_2 <= lookup7_3 + lookup7_4; sum7 <= sum7_1 + sum7_2; end else; end
4. Partial simulation map preview
The filter coefficient results are as follows:
The combined results are as follows:
The simulation results are as follows:
01_115m