US20050131976A1 - FFT operating apparatus of programmable processors and operation method thereof - Google Patents
FFT operating apparatus of programmable processors and operation method thereof Download PDFInfo
- Publication number
- US20050131976A1 US20050131976A1 US10/731,404 US73140403A US2005131976A1 US 20050131976 A1 US20050131976 A1 US 20050131976A1 US 73140403 A US73140403 A US 73140403A US 2005131976 A1 US2005131976 A1 US 2005131976A1
- Authority
- US
- United States
- Prior art keywords
- value
- group
- fft
- data
- count
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
Definitions
- the present invention relates to a fast Fourier transform (FFT) operating apparatus and an operation method thereof. More particularly, in a programmable processor which can be used in a variety of standards and enable processing of high speed telecommunication algorithms in real-time basis and also guarantee flexibility in system design, the present invention relates to a FFT operating apparatus and a method thereof for carrying out FFT operation which is the kernel function of DMT (Discrete MultiTone) and OFDM (Orthogonal Frequency Division Multiplexing) modems.
- DMT Discrete MultiTone
- OFDM Orthogonal Frequency Division Multiplexing
- fast Fourier transform is used in a variety of fields of communication systems such as the asymmetric digital subscriber line (ADSL), the wireless asynchronous transfer mode (ATM), the short distance wireless communication network, and the applications such as a matched filter, a spectrum analysis, and a radar.
- the FFT is especially required for the establishment of the OFDM, i.e., the next-generation high speed telecommunication algorithm.
- the FFT is the algorithm that transforms signal in time domain into frequency domain. Because the FFT can reduce the operations required for the Discrete Fourier Transform (DFT) significantly by using the periodicity of trigonometric function, operations can be carried out with efficiency.
- DFT Discrete Fourier Transform
- N-point DFT being divided into two N/2 DTFs
- even N - 1 ⁇ x ⁇ ( n ) ⁇ ⁇ w N nk + ⁇ n 0
- the N-point DFT is divided into several 2-point DFTs, and this process is referred to as the radix-2 DIT (Decimation-in-Time) FFT.
- radix-2 and radix4 DIT FFTs are the most frequently used for the implementation.
- the radix-2 DIT FFT is split into odd-numbered and even-numbered samples as in the formula 2, while the radix-4 DIT FFT is split into four sets. Between these two FFTs, the radix-2 DIT FFT has a simpler butterfly structure, and thus requires less number of multipliers and space. However, the number of stages increases in the radix-2 DIT FFT, and thus it consumes much more operation cycles compared to the radix-4 DIT FFT.
- the radix-4 DIT FFT can enable high speed processing, too, but it has a complicated butterfly structure and increases the number of multipliers. Also, operations for butterfly input data and addresses are complicated, which are quite hard to implement. Additionally, as the FFT having 4 n length is performed, the radix-4 DIT FFT has to be used in combination with the radix-2 DIT FFT for the FFT having a 2 n length.
- the FFT is divided into DIT (Decimation-In-Time) FFT and DIF (Decimation-In-Frequency) FFT according to whether the dividing is based on time domain or frequency domain.
- the formula 2 which is divided with respect to time domain, is categorized into the DIT FFT. If the dividing is performed with respect to X(k) in the frequency domain, it can be categorized into the DIF FFT.
- the DIT FFT In the digital signal processor, it is the DIT FFT usually used for FFT. While the DFT FFT adopts the configuration of performing addition/subtraction and then multiplication, the DIT FFT, as shown in FIG. 1 , adopts the configuration of performing multiplication and then addition/subtraction. Accordingly, for the digital signal processor based on a multiplier-accumulator, the DIT FFT is more suitable for operations.
- the DSP 56600 core is a fixed-point digital signal processor which consists of one 16 ⁇ 16 multiplier-accumulator (MAC) and one 40-bit ALU (arithmetic and logic unit), and carries out radix-2 complex FFT butterfly operation using two parallel shift instructions. Since the DSP 56600 core has the configuration of a single multiplier-accumulator, it has a wide area, however, with less operation efficiency compared with the configuration of a dual multiplier-accumulator. It takes 8N+9 cycles in order for the DSP 56600 core to perform N radix-2 complex FFT butterfly operations.
- MAC multiplier-accumulator
- ALU arithmetic and logic unit
- FIG. 2 shows another example of an operator using the DIT FFT, especially showing a CarmelTM DSP core by Infineon Technologies AG.
- the CarmelTM DSP core is a 16 bit fixed-point decimation core, which includes two multiplexers 11 , 11 ′ to select values for a data memory, two latch registers 12 , 12 ′ to store selected outputs from the multiplexers 11 , 11 ′, data bus switches 13 , 13 ′ to switch data such as result of data operation at the data memory to input to a corresponding operator in accordance with a desired operation, two registers 14 , 14 ′ storing data for input to the next-stage multiplier-accumulator, a first arithmetic unit 15 having a 16 ⁇ 16 MAC, a 40-bit ALU, and an exponenter and a shifter for a block fixed point operation, a second arithmetic unit 16 having a 16 ⁇ 16 MAC and a 40-bit ALU, and an accumulator bank 17 to accumulate and store results operated in the
- the CarmelTM DSP core which adopts a CLIW (Configurable Long Instruction Word) architecture, can carry out up to 6 operations including 2 parallel data shifts in a single cycle. Also, as the CarmelTM DSP core supports an automatic scaling mode, an overflow generated in the FFT operations can be handled without having to use an additional cycle. However, the CarmelTM DSP core has a complex hardware configuration since the CarmelTM DSP core is designed in the CLIW architecture to allow the parallel processing of the operations. To carry out N radix- 2 complex FFT butterfly operations by using the CarmelTM DSP core, 2N+2 cycles are required.
- FIG. 3 shows yet another example of an operator using the DIT FFT, especially showing a StarcoreTM SC140 operator.
- the SC140 applying a VLIW (Very Long Instruction Word) architecture includes two data memory buses 21 , 21 ′ to send/receive data to and from the data memory, 8 shifter/limiters 22 to shift or limit the operated data stored in the data register and load the data to the data memory buses 21 , 21 ′, and four 40-bit ALUs 24 , 25 , 26 , 27 .
- each of the ALUs 24 , 25 , 26 , 27 has a MAC, it is possible to carry out up to four MAC operations or ALU operations in a single cycle.
- the FFT operations are carried out in a less operation cycle than the digital signal processor having a single or dual MAC.
- the StarcoreTM SC140 has a large size and consumes lots of power due to the integration of lots of the operation components. Further, it is difficult to efficiently allot the operation components due to the data dependency and to read or write the required data in the memory during a single cycle due to the lack of the data bus. As a result, the bottleneck may occur so that the performance of the dual MAC structure does not reach to twice as much.
- Table 1 shows the comparison in the number of the butterfly operation cycle and the N-point FFT operation cycle of the Carmel DSP core and the TMS320C62x. As shown in Table 1, except for the butterfly operation cycle, additional cycles are required. In case of the Carmel DSP core, (2N/2)log 2 N cycles are needed for the butterfly operation, and in case of the TMS320C62x, (4N/2)log 2 N) cycles are needed. TABLE 1 Number of butterfly operation cycle N-point FFT Carmel DSP core 2 (2N/2)log 2 N + 5 N/4 + 10 log 2 N + 4 TMS320C62x 4 (4N/2)log 2 N + 7 log 2 N + N/4 + 9
- FIG. 4 shows an operation of a general 8-point radix-2 DIT FFT.
- N point FFT operation there are log 2 N stages and N ⁇ 1 groups. Accordingly, there are 3 stages and 7 groups in FIG. 4 , and as the number of the stages increases, the number of the butterflies in the group increases or decreases.
- the FFT operation is carried out in one stage and then in the next stage. In a stage, the operation is carried out by the group.
- C or assembly codes to implement the FFT as shown in FIG. 5 , 3 looping instructions are used for the operations of the stages, the groups, and the butterflies in each group, which may vary according to the architectures of a programmable processor and the program. Generally, 3 or 4 cycles are required to carry out the looping instruction in the digital signal processor. Assuming that L cycles are required for a single butterfly operation and M cycles are required to carry out the looping instruction, the number of the cycles to carry out the N point FFT operation can be obtained through the following formula 3. (L ⁇ N/2)log 2 N+M ⁇ (N ⁇ 1)+M log 2 N+ ⁇ [Formula 3]
- (L ⁇ N/2)log 2 N which is determined by L, may be changed according to the number of the MACs and the ALUs of the digital signal processor, and M ⁇ (N ⁇ 1)+M log 2 N, which is determined by M, may be changed according to the configuration of a program controller of the digital signal processor.
- the address of input data increases by 1. Meanwhile, when the group is altered, the address of input data of butterfly varies according the size of the group. For this, ⁇ is used to denote the number of the required cycles and the cycles required to the data shift. If the parallel processing is feasible as in the VLIW processor, the number of the additional operation cycles, except for the butterfly, may be reduced to some degree by parallel-processing diverse instructions through the assembly decoding. However, the effect of the parallel processing is not sufficient. Referring to FIG. 4 , the address modification according to the alteration of the group is described by way of example. “a” in the first butterfly ( ⁇ circle over ( 1 ) ⁇ in FIG.
- group 1 of the stage 2 is a memory address 0 and “b” is a memory address 2 .
- “a” in the second butterfly of the stage 2 ( ⁇ circle over ( 2 ) ⁇ in FIG. 4 , group 1 ) is a memory address 1
- “b” is a memory address 3 .
- “a” in the third butterfly of the stage 2 ( ⁇ circle over ( 3 ) ⁇ in FIG. 4 , group 2 ) is a memory address 4
- “b” is a memory address 6 .
- the address of the input data “a” in the group 1 increases from 0 by 1. Meanwhile, as the operation is altered from the group 1 to the group 2 , the address of “a” changes from 1 to 4. That is, as the group is altered, the address increment of the input data also changes.
- An aspect of the present invention is to provide a fast Fourier transform (FFT) operating apparatus and an operation method thereof to reduce operations cycles additionally generated in a programmable processor except for a butterfly operation.
- FFT fast Fourier transform
- a radix-2 complex FFT operation method to carry out a FFT operation in the programmable processor includes generating a start signal and applying a FFT operation signal if the FFT starts, generating an offset address of a butterfly input/output data to read a data and write an operated result in a data memory, storing the generated offset address of the butterfly input/output data in an offset register of a programmable processor, switching a data to provide the butterfly input data from the data memory and write the output data in the data memory, carrying out a butterfly operation using two multiplier-accumulators, an arithmetic and logic unit, and an exponenter, and generating a stop signal and resetting the FFT operation signal when the operation is ended.
- operation instructions SBUTTERFLY subtract butterfly
- ABUTTERFLY add butterfly
- performance can be enhanced by minimizing operation cycles generated during a looping instruction, data shift, and address calculation of butterfly input data except for the butterfly operations.
- FIG. 1 is a view showing a structure of a DIT FFT butterfly
- FIG. 2 is a view showing a configuration of the conventional Carmel DSP core operator by Infineon Technologies AG;
- FIG. 3 is a view showing a configuration of the conventional SC140 operator by StarcoreTM
- FIG. 4 is a flow graph showing an operation of a conventional 8-point radix-2 DIT FFT
- FIG. 5 is a view showing a programming architecture of a FFT using a looping instruction
- FIG. 6 is a view showing a configuration of a programmable processor for FFT according to the present invention.
- FIG. 7 is a flow graph showing an operation of a butterfly according to the present invention.
- FIG. 8 is a flow chart showing the generation of an offset address of DIT butterfly data
- FIG. 9 is a view showing a configuration of an operator carrying out the operation of FIG. 8 ;
- FIG. 10 is a view showing a configuration of a data processor carrying out the DIT butterfly operation according to the present invention.
- FIG. 11A is a view showing a configuration of a dual multiplier-accumulator having separate 2 multiplier-accumulators
- FIG. 11B is a view showing a configuration of a dual multiplier-accumulator using a 3-input adder
- FIG. 11C is a view showing a dual multiplier-accumulator carrying out functions of FIGS. 11A and 11B using a multiplexer;
- FIG. 12 is a view showing a configuration of a data bus switch of the data processor.
- FIG. 6 shows a fast Fourier transform (FFT) operating apparatus to fast operate a N point radix-2 DIT FFT operation without generating additional cycles except for butterfly operations.
- the FFT operating apparatus includes a program controller 110 , a program memory 120 , a FFT address generator 130 , an address generator 140 , a data processor 150 , a data memory 160 , and a flag register 170 .
- the program controller 110 generates a FFT start signal and controls a programmable processor.
- the program memory 120 stores an application of the programmable processor.
- the FFT address generator 130 generates an offset address of a FFT butterfly input data and an operation stop signal.
- the address generator 140 uses the offset address generated in the FFT address generator 130 to calculate an address of the data memory 160 .
- the data memory 160 stores data, and the data processor 150 uses the data stored in the data memory 160 to carry out an arithmetic and logic operation.
- the flag register 170 generates a FFT operation signal.
- the data processor 150 includes a data bus switch circuit to receive the butterfly input data from the data memory 160 and to write an output data in the data memory 160 , a butterfly operation circuit having two multiplier-accumulators to multiply and accumulate the data and one arithmetic and logic unit, an exponential operation circuit to carry out an exponential operation of the data during the butterfly operation, an input register to store data memory values, and an accumulator to store operation results and reuse the stored data for the operation.
- FIG. 7 is a flow graph of the butterfly operation according to the present invention, which shows the butterfly of FIG. 1 as a complex operation.
- the complex operation is represented as the following formula 4.
- “a” and “b” denote the butterfly input data
- “c” and “d” denote the butterfly output data
- “w” denotes a twiddle factor.
- Subscripts “r” and “i” respectively denote a real part and an imaginary part of each data.
- the program controller 110 controls a program of a conventional programmable processor. Also, the program controller 110 decodes a FFT instruction, transmits an N value from the N point FFT to the FFT address generator 130 , and generates the FFT operation start signal. The FFT address generator 130 receives the N value and the operation start signal from the program controller 110 to generate the offset address of the data.
- FIG. 8 shows a method to generate the offset address of the data in the FFT address generator 130 , which includes starting the FFT if the FFT start signal is ‘1’; initializing a group count, a loop count, and a group count max value to ‘1’, respectively, a group offset value to ‘ ⁇ 1’, a loop count max value to ‘N/2’, and an offset address value of the twiddle factor to ‘0’ when the FFT starts; calculating an address of an input data A by adding the group offset and the loop count value, and an address of an input data B by adding the group offset, the loop count, and the loop count max value; if the loop count value is not equal to the loop count max value, increasing the loop count value by 1 and resuming from calculating the addresses of the input data A, B; if the loop count value is equal to the loop count max value, initializing the loop count value to ‘1’, setting the group offset value with a value obtained by multiplying the loop count max value by 2 and adding the group offset value, and
- the loop count max value and the group count max value respectively represent the number of the butterflies and the number of the groups that are included in each of the groups and the stages. If the loop count value and the group count value respectively reach its max value, the operation carried out to a next group and stage.
- the group offset represents the address modification value when the group is altered.
- FIG. 9 shows the configuration of the FTT address generator 130 to carry out the operations in FIG. 8 .
- the FTT address generator 130 includes a logical sum logic 131 , an adder 132 , GR, WR, LCR, and GCR registers 133 , a group counter 134 , a loop counter 135 , a glue logic 136 , a first adder 137 , a second adder 137 ′, a first comparator 138 , a second comparator 138 ′, and a third comparator 138 ′′.
- the logical sum logic 131 generates an initialization signal of a register to store the loop count value and a register to store the group count value according to the start signal and a group count match signal.
- the adder 132 updates the group offset by a value obtained by multiplying the group offset and the loop count max value by 2 and adding the multiplied value.
- the GR, WR, LCR, GCR registers 133 store the group offset, the twiddle factor, the loop count max value, and the group count max value.
- the group counter 134 calculates the group count value, and the loop counter 135 calculates the loop count value.
- the glue logic 136 consists of a logic which generates a signal to initialize the group counter and the loop counter.
- the first adder 137 outputs the address of the input data A by adding the group offset and the loop counter value.
- the second adder 137 ′ outputs the address of the input data B by adding the output from the first adder 137 and the loop count max value.
- the first comparator 138 compares the loop count value and the loop count max value
- the second comparator 138 ′ compares the group counter value and the group count max value
- the third comparator 138 ′′ is input with the N value and the group count max value and compares the group count max value and the N/2 value.
- the loop counter 135 and the group counter 134 are initialized to ‘1’, and GR, WR, LCR, GCR registers 133 are initialized to ‘ ⁇ 1’, ‘0’, ‘N/2′, and ‘1’, respectively. If values of the loop counter 135 and the LCR register 133 are identical, ‘1’ is applied to the loop count match signal. If values of the group counter 134 and the GCR register 133 are identical, ‘1’ is applied to the group count match signal. The group counter 134 carries out the counting only if the loop count match signal is ‘1’. The loop counter 135 and the group counter 134 are re-initialized when the loop count match signal and the group count match signal become ‘1’, respectively.
- the GR register 133 has a load input terminal to update a GR register value and another load input terminal to initialize.
- the WR register 133 increases a WR register value by 1 if the loop count match signal is ‘1’, and is initialized to ‘0’ if the group count match signal is ‘1’.
- the WR register 133 outputs a bit-reversed value.
- the LCR register 133 carries out a 1-bit right shift if the group count match signal becomes ‘1’.
- An initial value of the LCR register 133 is N/2.
- the GCR register 133 carries out a 1-bit left shift every time the group count match signal is applied. If the GCR register value becomes N, the FFT operation stop signal is generated.
- the offset address generated in the FFT address generator 130 is input to an offset register of the programmable processor and used as an offset for a base address.
- a programmable processor which is being currently developed uses plural arithmetic and logic units to calculate the address. Hence, a final data address can be calculated by using the offset address generated in the FTT address generator 130 .
- FIG. 10 shows the configuration of the data processor 150 to efficiently carry out the FFT.
- the data processor 150 includes two multiplier-accumulators and an arithmetic and logic unit to carry out the butterfly operation, a data bus switch circuit to control data according to the operation flow, 8 input registers, and three accumulators.
- the multiplier-accumulator according to the present invention may function as two separate multiplier-accumulators or carry out a function of adding and accumulating two multiplied results.
- FIG. 11A shows a configuration of a conventional dual multiplier-accumulator having two separate multiplier-accumulators to output two accumulated results.
- FIG. 11B shows a configuration capable of accumulating sum of two multiplied results by using a 3-input adder.
- FIG. 11C shows a dual multiplier-accumulator capable of carrying out the above conventional functions by using the multiplexer according to the present invention. If a selection input of the multiplexer is ‘0’, the dual multiplier-accumulator operates as in FIG. 11A , and if a selection input is ‘1’, the dual multiplier-accumulator operates as in FIG. 11B .
- Five input registers store a r , a i , b r , b i , w r , and w i , respectively.
- Three accumulators are required to store 2 multiplier-accumulator values and one arithmetic and logic unit value.
- FIG. 12 shows the data bus switch of the data processor 150 .
- the data bus switch can be implemented using six 2 ⁇ 1 multiplexers adapted to a data bus switch of a conventional digital signal processor without having to re-design the circuit.
- the FFT operation method and a circuit to implement the FFT operation method are provided to enhance performance by minimizing the operation cycles which occur in the looping instruction, the data shift, and the address calculation of the butterfly input data in addition to the butterfly operation, in the conventional programmable processor of which performance is not enhanced through the acceleration of the butterfly operation.
- the operating apparatus of the conventional digital signal processor can be re-used by including the FFT address generator 130 and the switch circuit of the data to thereby enhance the performance and facilitate the design and the modification.
- Table 2 shows the comparison between the conventional programmable processor and the number of the FFT operation cycles together with the number of the multiplier-accumulators.
- the configuration according to the present invention does not generate additional operation cycles except for the butterfly operation.
- the 256-point FFT has performance enhanced 16% ⁇ 57%.
- the FFT operating apparatus applies less hardware to the conventional programmable processor to thereby reduce the number of the FFT operation cycles, provide design flexibility to a FFT processor which have been implemented with a conventional on-demand semiconductor chip, and allow a real-time processing of an advanced telecommunication system.
- Formula MAC DSP1620 16065 — — 1 DSP56602 8 9600 49680 — 1 DSP56303 — 9096 — — 1 TMS320C54x 8 8542 42098 — 1 TMS320C55x 5 4786 — — 2 TMS320C62x 4 4225 20815 (4N/2)log 2 N + 7log 2 N + N/4 + 9 2 TMS320C67x — 4286 20716 (2N/2)log 2 N + 23log 2 N + 6 2 Carmel DSP 2 2452 11624 (2/N)log 2 N + 5N/4 + 10log 2 N + 4 2 core Palm DSP 2 — — — 2 core Frio core 3 3176 — — 2 StarCore 1.5 — — — 4 (SC140) Configuration 2 2051 10243 (2N/2) log 2 N + 6 6 of the present invention
Landscapes
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Discrete Mathematics (AREA)
- Complex Calculations (AREA)
- Digital Transmission Methods That Use Modulated Carrier Waves (AREA)
Abstract
Description
- This application claims the benefit of Korean Patent Application No. 2002-78393 filed Dec. 10, 2003, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.
- 1. Field of the Invention
- The present invention relates to a fast Fourier transform (FFT) operating apparatus and an operation method thereof. More particularly, in a programmable processor which can be used in a variety of standards and enable processing of high speed telecommunication algorithms in real-time basis and also guarantee flexibility in system design, the present invention relates to a FFT operating apparatus and a method thereof for carrying out FFT operation which is the kernel function of DMT (Discrete MultiTone) and OFDM (Orthogonal Frequency Division Multiplexing) modems.
- 2. Description of the Related Art
- Generally, fast Fourier transform (FFT) is used in a variety of fields of communication systems such as the asymmetric digital subscriber line (ADSL), the wireless asynchronous transfer mode (ATM), the short distance wireless communication network, and the applications such as a matched filter, a spectrum analysis, and a radar. The FFT is especially required for the establishment of the OFDM, i.e., the next-generation high speed telecommunication algorithm. The FFT is the algorithm that transforms signal in time domain into frequency domain. Because the FFT can reduce the operations required for the Discrete Fourier Transform (DFT) significantly by using the periodicity of trigonometric function, operations can be carried out with efficiency. The DFT can be expressed by the following formula 1:
- By re-arranging x(n) of the
formula 1 into odd-numbered and even-numbered samples, respectively, N-point DFT being divided into two N/2 DTFs can be expressed as the following formula 2: - As the
formula 2 is repeated, the N-point DFT is divided into several 2-point DFTs, and this process is referred to as the radix-2 DIT (Decimation-in-Time) FFT. - Among the methods to split the DFT of
formula 1, radix-2 and radix4 DIT FFTs are the most frequently used for the implementation. - The radix-2 DIT FFT is split into odd-numbered and even-numbered samples as in the
formula 2, while the radix-4 DIT FFT is split into four sets. Between these two FFTs, the radix-2 DIT FFT has a simpler butterfly structure, and thus requires less number of multipliers and space. However, the number of stages increases in the radix-2 DIT FFT, and thus it consumes much more operation cycles compared to the radix-4 DIT FFT. The radix-4 DIT FFT can enable high speed processing, too, but it has a complicated butterfly structure and increases the number of multipliers. Also, operations for butterfly input data and addresses are complicated, which are quite hard to implement. Additionally, as the FFT having 4n length is performed, the radix-4 DIT FFT has to be used in combination with the radix-2 DIT FFT for the FFT having a 2n length. - Further, the FFT is divided into DIT (Decimation-In-Time) FFT and DIF (Decimation-In-Frequency) FFT according to whether the dividing is based on time domain or frequency domain. The
formula 2, which is divided with respect to time domain, is categorized into the DIT FFT. If the dividing is performed with respect to X(k) in the frequency domain, it can be categorized into the DIF FFT. - In the digital signal processor, it is the DIT FFT usually used for FFT. While the DFT FFT adopts the configuration of performing addition/subtraction and then multiplication, the DIT FFT, as shown in
FIG. 1 , adopts the configuration of performing multiplication and then addition/subtraction. Accordingly, for the digital signal processor based on a multiplier-accumulator, the DIT FFT is more suitable for operations. - For example, the DSP 56600 core is a fixed-point digital signal processor which consists of one 16×16 multiplier-accumulator (MAC) and one 40-bit ALU (arithmetic and logic unit), and carries out radix-2 complex FFT butterfly operation using two parallel shift instructions. Since the DSP 56600 core has the configuration of a single multiplier-accumulator, it has a wide area, however, with less operation efficiency compared with the configuration of a dual multiplier-accumulator. It takes 8N+9 cycles in order for the DSP 56600 core to perform N radix-2 complex FFT butterfly operations.
-
FIG. 2 shows another example of an operator using the DIT FFT, especially showing a Carmel™ DSP core by Infineon Technologies AG. The Carmel™ DSP core is a 16 bit fixed-point decimation core, which includes twomultiplexers latch registers multiplexers data bus switches registers arithmetic unit 15 having a 16×16 MAC, a 40-bit ALU, and an exponenter and a shifter for a block fixed point operation, a secondarithmetic unit 16 having a 16×16 MAC and a 40-bit ALU, and anaccumulator bank 17 to accumulate and store results operated in the first and secondarithmetic unit data bus switches -
FIG. 3 shows yet another example of an operator using the DIT FFT, especially showing a Starcore™ SC140 operator. The SC140 applying a VLIW (Very Long Instruction Word) architecture includes twodata memory buses limiters 22 to shift or limit the operated data stored in the data register and load the data to thedata memory buses bit ALUs ALUs - However, the Starcore™ SC140 has a large size and consumes lots of power due to the integration of lots of the operation components. Further, it is difficult to efficiently allot the operation components due to the data dependency and to read or write the required data in the memory during a single cycle due to the lack of the data bus. As a result, the bottleneck may occur so that the performance of the dual MAC structure does not reach to twice as much.
- In case of carrying out the N complex FFT butterfly operation using the SC140, 1.5N cycles are required. The above digital signal processors focus on increasing the number of the operators to accelerate the FFT butterfly operation or adjusting the data path fit for the butterfly operation flow. However, there is a limitation to reduce the operation cycle of the butterfly with respect to the limited number of the operators.
- Assuming that two cycles are required for the butterfly operation, (N/2)log2 N butterflies are needed for the N-point FFT. Thus, if other influences are not considered, (2N/2)log2 N cycles are needed for the N-point FFT. In fact, during the FFT operation, operation cycles may be additionally generated for data shift or data address calculation.
- Table 1 shows the comparison in the number of the butterfly operation cycle and the N-point FFT operation cycle of the Carmel DSP core and the TMS320C62x. As shown in Table 1, except for the butterfly operation cycle, additional cycles are required. In case of the Carmel DSP core, (2N/2)log2 N cycles are needed for the butterfly operation, and in case of the TMS320C62x, (4N/2)log2 N) cycles are needed.
TABLE 1 Number of butterfly operation cycle N-point FFT Carmel DSP core 2 (2N/2)log2 N + 5 N/4 + 10 log2 N + 4 TMS320C62x 4 (4N/2)log2 N + 7 log2 N + N/4 + 9 -
FIG. 4 shows an operation of a general 8-point radix-2 DIT FFT. In case of the N point FFT operation, there are log2 N stages and N−1 groups. Accordingly, there are 3 stages and 7 groups inFIG. 4 , and as the number of the stages increases, the number of the butterflies in the group increases or decreases. - The FFT operation is carried out in one stage and then in the next stage. In a stage, the operation is carried out by the group. As for C or assembly codes to implement the FFT, as shown in
FIG. 5 , 3 looping instructions are used for the operations of the stages, the groups, and the butterflies in each group, which may vary according to the architectures of a programmable processor and the program. Generally, 3 or 4 cycles are required to carry out the looping instruction in the digital signal processor. Assuming that L cycles are required for a single butterfly operation and M cycles are required to carry out the looping instruction, the number of the cycles to carry out the N point FFT operation can be obtained through the followingformula 3.
(L×N/2)log2 N+M×(N−1)+M log2 N+α [Formula 3] - In
formula 3, (L×N/2)log2 N, which is determined by L, may be changed according to the number of the MACs and the ALUs of the digital signal processor, and M×(N−1)+M log2 N, which is determined by M, may be changed according to the configuration of a program controller of the digital signal processor. - In the butterfly operation in a group of a stage, the address of input data increases by 1. Meanwhile, when the group is altered, the address of input data of butterfly varies according the size of the group. For this, α is used to denote the number of the required cycles and the cycles required to the data shift. If the parallel processing is feasible as in the VLIW processor, the number of the additional operation cycles, except for the butterfly, may be reduced to some degree by parallel-processing diverse instructions through the assembly decoding. However, the effect of the parallel processing is not sufficient. Referring to
FIG. 4 , the address modification according to the alteration of the group is described by way of example. “a” in the first butterfly ({circle over (1)} inFIG. 4 , group 1) of thestage 2 is amemory address 0 and “b” is amemory address 2. “a” in the second butterfly of the stage 2 ({circle over (2)} inFIG. 4 , group 1) is amemory address 1, and “b” is amemory address 3. “a” in the third butterfly of the stage 2 ({circle over (3)} inFIG. 4 , group 2) is amemory address 4, and “b” is amemory address 6. The address of the input data “a” in thegroup 1 increases from 0 by 1. Meanwhile, as the operation is altered from thegroup 1 to thegroup 2, the address of “a” changes from 1 to 4. That is, as the group is altered, the address increment of the input data also changes. - As aforementioned, to reduce the number of the operation cycles of the N point FFT in the programmable processor such as the digital signal processor, it is required to minimize the additional operation cycles except for the butterfly operation cycles. However, since the conventional digital processors do not support the hardware structure to reduce the additional operation cycles except for the cycles required for the butterfly operations, it is difficult to reduce the number of the operation cycles.
- An aspect of the present invention is to provide a fast Fourier transform (FFT) operating apparatus and an operation method thereof to reduce operations cycles additionally generated in a programmable processor except for a butterfly operation.
- To achieve the above aspect of the present invention, a radix-2 complex FFT operation method to carry out a FFT operation in the programmable processor includes generating a start signal and applying a FFT operation signal if the FFT starts, generating an offset address of a butterfly input/output data to read a data and write an operated result in a data memory, storing the generated offset address of the butterfly input/output data in an offset register of a programmable processor, switching a data to provide the butterfly input data from the data memory and write the output data in the data memory, carrying out a butterfly operation using two multiplier-accumulators, an arithmetic and logic unit, and an exponenter, and generating a stop signal and resetting the FFT operation signal when the operation is ended. At this time, using operation instructions SBUTTERFLY (subtract butterfly) and ABUTTERFLY (add butterfly), the FFT operation apparatus carries out the FFT operation.
- According to the present invention, even in the conventional programmable processor in which performance is not enhanced through the acceleration of the butterfly operation, performance can be enhanced by minimizing operation cycles generated during a looping instruction, data shift, and address calculation of butterfly input data except for the butterfly operations.
- The above aspects and other features of the present invention will become more apparent by describing in detail a preferred embodiment thereof with reference to the attached drawings, in which:
-
FIG. 1 is a view showing a structure of a DIT FFT butterfly; -
FIG. 2 is a view showing a configuration of the conventional Carmel DSP core operator by Infineon Technologies AG; -
FIG. 3 is a view showing a configuration of the conventional SC140 operator by Starcore™; -
FIG. 4 is a flow graph showing an operation of a conventional 8-point radix-2 DIT FFT; -
FIG. 5 is a view showing a programming architecture of a FFT using a looping instruction; -
FIG. 6 is a view showing a configuration of a programmable processor for FFT according to the present invention; -
FIG. 7 is a flow graph showing an operation of a butterfly according to the present invention; -
FIG. 8 is a flow chart showing the generation of an offset address of DIT butterfly data; -
FIG. 9 is a view showing a configuration of an operator carrying out the operation ofFIG. 8 ; -
FIG. 10 is a view showing a configuration of a data processor carrying out the DIT butterfly operation according to the present invention; -
FIG. 11A is a view showing a configuration of a dual multiplier-accumulator having separate 2 multiplier-accumulators; -
FIG. 11B is a view showing a configuration of a dual multiplier-accumulator using a 3-input adder; -
FIG. 11C is a view showing a dual multiplier-accumulator carrying out functions ofFIGS. 11A and 11B using a multiplexer; and -
FIG. 12 is a view showing a configuration of a data bus switch of the data processor. - Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.
-
FIG. 6 shows a fast Fourier transform (FFT) operating apparatus to fast operate a N point radix-2 DIT FFT operation without generating additional cycles except for butterfly operations. Referring toFIG. 6 , the FFT operating apparatus includes aprogram controller 110, aprogram memory 120, aFFT address generator 130, anaddress generator 140, adata processor 150, adata memory 160, and aflag register 170. - The
program controller 110 generates a FFT start signal and controls a programmable processor. Theprogram memory 120 stores an application of the programmable processor. TheFFT address generator 130 generates an offset address of a FFT butterfly input data and an operation stop signal. Theaddress generator 140 uses the offset address generated in theFFT address generator 130 to calculate an address of thedata memory 160. Thedata memory 160 stores data, and thedata processor 150 uses the data stored in thedata memory 160 to carry out an arithmetic and logic operation. Theflag register 170 generates a FFT operation signal. - The
data processor 150 includes a data bus switch circuit to receive the butterfly input data from thedata memory 160 and to write an output data in thedata memory 160, a butterfly operation circuit having two multiplier-accumulators to multiply and accumulate the data and one arithmetic and logic unit, an exponential operation circuit to carry out an exponential operation of the data during the butterfly operation, an input register to store data memory values, and an accumulator to store operation results and reuse the stored data for the operation. -
FIG. 7 is a flow graph of the butterfly operation according to the present invention, which shows the butterfly ofFIG. 1 as a complex operation. The complex operation is represented as the followingformula 4. “a” and “b” denote the butterfly input data, “c” and “d” denote the butterfly output data, and “w” denotes a twiddle factor. Subscripts “r” and “i” respectively denote a real part and an imaginary part of each data.
c r=αr +w r b r −w i b i [Formula 4]
c i=αi +w r b i +w i b r
d r=αr −w r b r +w i b i
d i=αi −w r b i −w i b r - To operate a single complex butterfly, 6 input data are required and 4 output data are generated. As the operation is carried out with divided into 2 cycles, it is implemented using a data memory configuration capable of reading 3 input data and writing 2 output data in a single cycle. In a first cycle, two of the 4 input data are multiplied and subtracted. At this time, the operation is carried out according to an operational instruction SBUTTERFLY. In a second cycle, two of the 4 input data are multiplied and added. Also, the operation is carried out according to an operational instruction ABUTTERFLY.
- The
program controller 110 controls a program of a conventional programmable processor. Also, theprogram controller 110 decodes a FFT instruction, transmits an N value from the N point FFT to theFFT address generator 130, and generates the FFT operation start signal. TheFFT address generator 130 receives the N value and the operation start signal from theprogram controller 110 to generate the offset address of the data. -
FIG. 8 shows a method to generate the offset address of the data in the FFT address generator 130, which includes starting the FFT if the FFT start signal is ‘1’; initializing a group count, a loop count, and a group count max value to ‘1’, respectively, a group offset value to ‘−1’, a loop count max value to ‘N/2’, and an offset address value of the twiddle factor to ‘0’ when the FFT starts; calculating an address of an input data A by adding the group offset and the loop count value, and an address of an input data B by adding the group offset, the loop count, and the loop count max value; if the loop count value is not equal to the loop count max value, increasing the loop count value by 1 and resuming from calculating the addresses of the input data A, B; if the loop count value is equal to the loop count max value, initializing the loop count value to ‘1’, setting the group offset value with a value obtained by multiplying the loop count max value by 2 and adding the group offset value, and increasing the twiddle factor by 1; if the group count is not equal to the group count max value, increasing the group count by 1 and resuming from calculating the addresses of the input data A, B; if the group count value is equal to the group count max value, initializing the group count value to ‘1’, the group offset value to ‘−1’, and the twiddle factor to ‘0’, dividing the loop count max value by 2, and multiplying the group count max value by 2; if the group count max value is greater than N/2, generating the operation stop signal and ending the FFT operation; and, if the group count max value is not greater than N/2, resuming from calculating the addresses of the input data A, B. - In order to calculate the loops of the 3 stages having a butterfly operation loop, a group operation loop, and a stage operation loop, a comparison is carried out three times. The loop count max value and the group count max value respectively represent the number of the butterflies and the number of the groups that are included in each of the groups and the stages. If the loop count value and the group count value respectively reach its max value, the operation carried out to a next group and stage. The group offset represents the address modification value when the group is altered.
-
FIG. 9 shows the configuration of theFTT address generator 130 to carry out the operations inFIG. 8 . Referring toFIG. 9 , theFTT address generator 130 includes alogical sum logic 131, anadder 132, GR, WR, LCR, andGCR registers 133, agroup counter 134, aloop counter 135, aglue logic 136, afirst adder 137, asecond adder 137′, afirst comparator 138, asecond comparator 138′, and athird comparator 138″. Thelogical sum logic 131 generates an initialization signal of a register to store the loop count value and a register to store the group count value according to the start signal and a group count match signal. Theadder 132 updates the group offset by a value obtained by multiplying the group offset and the loop count max value by 2 and adding the multiplied value. The GR, WR, LCR, GCR registers 133 store the group offset, the twiddle factor, the loop count max value, and the group count max value. Thegroup counter 134 calculates the group count value, and theloop counter 135 calculates the loop count value. Theglue logic 136 consists of a logic which generates a signal to initialize the group counter and the loop counter. Thefirst adder 137 outputs the address of the input data A by adding the group offset and the loop counter value. Thesecond adder 137′ outputs the address of the input data B by adding the output from thefirst adder 137 and the loop count max value. Thefirst comparator 138 compares the loop count value and the loop count max value, thesecond comparator 138′ compares the group counter value and the group count max value, and thethird comparator 138″ is input with the N value and the group count max value and compares the group count max value and the N/2 value. - If the FTT operation start signal is applied, the
loop counter 135 and thegroup counter 134 are initialized to ‘1’, and GR, WR, LCR, GCR registers 133 are initialized to ‘−1’, ‘0’, ‘N/2′, and ‘1’, respectively. If values of theloop counter 135 and the LCR register 133 are identical, ‘1’ is applied to the loop count match signal. If values of thegroup counter 134 and the GCR register 133 are identical, ‘1’ is applied to the group count match signal. The group counter 134 carries out the counting only if the loop count match signal is ‘1’. Theloop counter 135 and thegroup counter 134 are re-initialized when the loop count match signal and the group count match signal become ‘1’, respectively. The GR register 133 has a load input terminal to update a GR register value and another load input terminal to initialize. TheWR register 133 increases a WR register value by 1 if the loop count match signal is ‘1’, and is initialized to ‘0’ if the group count match signal is ‘1’. TheWR register 133 outputs a bit-reversed value. The LCR register 133 carries out a 1-bit right shift if the group count match signal becomes ‘1’. An initial value of theLCR register 133 is N/2. The GCR register 133 carries out a 1-bit left shift every time the group count match signal is applied. If the GCR register value becomes N, the FFT operation stop signal is generated. - The offset address generated in the
FFT address generator 130 is input to an offset register of the programmable processor and used as an offset for a base address. A programmable processor which is being currently developed uses plural arithmetic and logic units to calculate the address. Hence, a final data address can be calculated by using the offset address generated in theFTT address generator 130. -
FIG. 10 shows the configuration of thedata processor 150 to efficiently carry out the FFT. Referring toFIG. 10 , thedata processor 150 includes two multiplier-accumulators and an arithmetic and logic unit to carry out the butterfly operation, a data bus switch circuit to control data according to the operation flow, 8 input registers, and three accumulators. By using four multiplexers, the multiplier-accumulator according to the present invention may function as two separate multiplier-accumulators or carry out a function of adding and accumulating two multiplied results. -
FIG. 11A shows a configuration of a conventional dual multiplier-accumulator having two separate multiplier-accumulators to output two accumulated results.FIG. 11B shows a configuration capable of accumulating sum of two multiplied results by using a 3-input adder.FIG. 11C shows a dual multiplier-accumulator capable of carrying out the above conventional functions by using the multiplexer according to the present invention. If a selection input of the multiplexer is ‘0’, the dual multiplier-accumulator operates as inFIG. 11A , and if a selection input is ‘1’, the dual multiplier-accumulator operates as inFIG. 11B . Five input registers store ar, ai, br, bi, wr, and wi, respectively. Three accumulators are required to store 2 multiplier-accumulator values and one arithmetic and logic unit value. -
FIG. 12 shows the data bus switch of thedata processor 150. The data bus switch can be implemented using six 2×1 multiplexers adapted to a data bus switch of a conventional digital signal processor without having to re-design the circuit. - As aforementioned, the FFT operation method and a circuit to implement the FFT operation method are provided to enhance performance by minimizing the operation cycles which occur in the looping instruction, the data shift, and the address calculation of the butterfly input data in addition to the butterfly operation, in the conventional programmable processor of which performance is not enhanced through the acceleration of the butterfly operation. Further, according to the present invention, the operating apparatus of the conventional digital signal processor can be re-used by including the
FFT address generator 130 and the switch circuit of the data to thereby enhance the performance and facilitate the design and the modification. - Table 2 shows the comparison between the conventional programmable processor and the number of the FFT operation cycles together with the number of the multiplier-accumulators. The configuration according to the present invention does not generate additional operation cycles except for the butterfly operation. Compared with a conventional digital signal processor having the same number of the multiplier-accumulators, the 256-point FFT has performance enhanced 16%˜57%.
- Therefore, the FFT operating apparatus according to the present invention applies less hardware to the conventional programmable processor to thereby reduce the number of the FFT operation cycles, provide design flexibility to a FFT processor which have been implemented with a conventional on-demand semiconductor chip, and allow a real-time processing of an advanced telecommunication system.
Number of butterfly number Digital signal operation of processor cycles N = 256 N = 1024 Formula MAC DSP1620 — 16065 — — 1 DSP56602 8 9600 49680 — 1 DSP56303 — 9096 — — 1 TMS320C54x 8 8542 42098 — 1 TMS320C55x 5 4786 — — 2 TMS320C62x 4 4225 20815 (4N/2)log2 N + 7log2 N + N/4 + 9 2 TMS320C67x — 4286 20716 (2N/2)log2 N + 23log2 N + 6 2 Carmel DSP 2 2452 11624 (2/N)log2 N + 5N/4 + 10log2 N + 4 2 core Palm DSP 2 — — — 2 core Frio core 3 3176 — — 2 StarCore 1.5 — — — 4 (SC140) Configuration 2 2051 10243 (2N/2) log2 N + 6 6 of the present invention - Although a few preferred embodiments of the present invention has been described, it will be understood by those skilled in the art that the present invention should not be limited to the described preferred embodiments, but various changes and modifications can be made within the spirit and scope of the present invention as defined by the appended claims.
Claims (6)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2002-0078393A KR100492091B1 (en) | 2002-12-10 | 2002-12-10 | The FFT Computation Circuits and Methods on Programmable Processors |
KR2002-78393 | 2003-12-10 |
Publications (2)
Publication Number | Publication Date |
---|---|
US20050131976A1 true US20050131976A1 (en) | 2005-06-16 |
US7437395B2 US7437395B2 (en) | 2008-10-14 |
Family
ID=32322369
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/731,404 Active 2026-03-26 US7437395B2 (en) | 2002-12-10 | 2003-12-10 | FFT operating apparatus of programmable processors and operation method thereof |
Country Status (3)
Country | Link |
---|---|
US (1) | US7437395B2 (en) |
EP (1) | EP1429256A3 (en) |
KR (1) | KR100492091B1 (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060095490A1 (en) * | 2004-11-01 | 2006-05-04 | Metanoia Technologies | Method for representing complex numbers in a communication system |
US20070297524A1 (en) * | 2006-06-13 | 2007-12-27 | Ben Jones | Approach for spectrum analysis in a receiver |
US20100106759A1 (en) * | 2008-10-24 | 2010-04-29 | Freescale Semiconductor, Inc. | Methods and apparatus for reordering data |
KR101036873B1 (en) * | 2010-09-13 | 2011-05-25 | 심흥섭 | Flag based low-complexity, expandable split radix fft system |
CN102737009A (en) * | 2011-04-01 | 2012-10-17 | 中兴通讯股份有限公司 | FFT twiddle factor generation device and application method thereof |
US20150236881A1 (en) * | 2012-08-22 | 2015-08-20 | Zte Corporation | Device and method for implementing fast fourier transform/discrete fourier transform |
US20160210146A1 (en) * | 2013-09-22 | 2016-07-21 | Zte Microelectronics Technology Co., Ltd | Vector Operation Core and Vector Processor |
CN108319804A (en) * | 2018-04-17 | 2018-07-24 | 福州大学 | A kind of 8192 bases, 2 DIT ASIC circuit design methods that low-resource calls |
CN112231626A (en) * | 2020-10-19 | 2021-01-15 | 南京宁麒智能计算芯片研究院有限公司 | FFT processor |
US20240078281A1 (en) * | 2021-01-28 | 2024-03-07 | Stmicroelectronics, Inc. | Methods and devices for fast fourier transforms |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100628303B1 (en) | 2004-09-03 | 2006-09-27 | 한국전자통신연구원 | Method and apparatus of the variable points IFFT/FFT |
RU188978U1 (en) * | 2018-12-14 | 2019-04-30 | Акционерное общество Научно-производственный центр "Электронные вычислительно-информационные системы" (АО НПЦ "ЭЛВИС") | UNIFIED RECONFIGURED SCHEME OF COMMUTATION OF FAST FURIET TRANSFORMATION |
RU2700194C1 (en) * | 2018-12-14 | 2019-09-13 | Акционерное общество Научно-производственный центр "Электронные вычислительно-информационные системы" (АО НПЦ "ЭЛВИС") | Unified reconfigurable fast fourier transform switching circuit and method of its formation |
RU197098U1 (en) * | 2020-01-17 | 2020-03-30 | Акционерное общество Научно-производственный центр «Электронные вычислительно-информационные системы» (АО НПЦ «ЭЛВИС») | RECONFIGURABLE QUICK FOURIER TRANSFORMER OF EXTRA LARGE TRANSFORMATION LENGTH |
RU2730174C1 (en) * | 2020-01-17 | 2020-08-19 | Акционерное общество Научно-производственный центр "Электронные вычислительно-информационные системы" (ОАО НПЦ "ЭЛВИС") | Reconfigurable fast fourier transform computer of super-long transform length |
RU196625U1 (en) * | 2020-01-17 | 2020-03-06 | Акционерное общество Научно-производственный центр "Электронные вычислительно-информационные системы" (АО НПЦ "ЭЛВИС") | HIGH-SPEED FOURIER FAST TRANSFORMING DEVICE WITH CONFLICT-FREE, LINEAR MEMORY ACCESS |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4996661A (en) * | 1988-10-05 | 1991-02-26 | United Technologies Corporation | Single chip complex floating point numeric processor |
US5430667A (en) * | 1992-05-22 | 1995-07-04 | Nec Corporation | Hardware arrangement for fast fourier transform having improved addressing techniques |
US6304887B1 (en) * | 1997-09-12 | 2001-10-16 | Sharp Electronics Corporation | FFT-based parallel system for array processing with low latency |
US20030225807A1 (en) * | 2002-01-24 | 2003-12-04 | Efland Gregory H. | Method and system for implementing a conditional one's complement of partial address |
US20040071104A1 (en) * | 2002-07-03 | 2004-04-15 | Commasic, Inc. | Multi-mode method and apparatus for performing digital modulation and demodulation |
US7047268B2 (en) * | 2002-03-15 | 2006-05-16 | Texas Instruments Incorporated | Address generators for mapping arrays in bit reversed order |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5941940A (en) | 1997-06-30 | 1999-08-24 | Lucent Technologies Inc. | Digital signal processor architecture optimized for performing fast Fourier Transforms |
US6993547B2 (en) | 2001-05-07 | 2006-01-31 | Jaber Associates, Llc | Address generator for fast fourier transform processor |
-
2002
- 2002-12-10 KR KR10-2002-0078393A patent/KR100492091B1/en active IP Right Grant
-
2003
- 2003-12-10 EP EP03257773A patent/EP1429256A3/en not_active Ceased
- 2003-12-10 US US10/731,404 patent/US7437395B2/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4996661A (en) * | 1988-10-05 | 1991-02-26 | United Technologies Corporation | Single chip complex floating point numeric processor |
US5430667A (en) * | 1992-05-22 | 1995-07-04 | Nec Corporation | Hardware arrangement for fast fourier transform having improved addressing techniques |
US6304887B1 (en) * | 1997-09-12 | 2001-10-16 | Sharp Electronics Corporation | FFT-based parallel system for array processing with low latency |
US20030225807A1 (en) * | 2002-01-24 | 2003-12-04 | Efland Gregory H. | Method and system for implementing a conditional one's complement of partial address |
US7047268B2 (en) * | 2002-03-15 | 2006-05-16 | Texas Instruments Incorporated | Address generators for mapping arrays in bit reversed order |
US20040071104A1 (en) * | 2002-07-03 | 2004-04-15 | Commasic, Inc. | Multi-mode method and apparatus for performing digital modulation and demodulation |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7529789B2 (en) * | 2004-11-01 | 2009-05-05 | Metanoia Technologies, Inc. | Method for representing complex numbers in a communication system |
US20090187616A1 (en) * | 2004-11-01 | 2009-07-23 | Metanoia Technologies, Inc. | Method for Representing Complex Numbers in a Communication System |
US8145696B2 (en) | 2004-11-01 | 2012-03-27 | Metanoia Technologies, Inc. | Method for representing complex numbers in a communication system |
US20060095490A1 (en) * | 2004-11-01 | 2006-05-04 | Metanoia Technologies | Method for representing complex numbers in a communication system |
US20070297524A1 (en) * | 2006-06-13 | 2007-12-27 | Ben Jones | Approach for spectrum analysis in a receiver |
US8023575B2 (en) | 2006-06-13 | 2011-09-20 | Bandspeed, Inc. | Approach for spectrum analysis in a receiver |
US8898212B2 (en) * | 2008-10-24 | 2014-11-25 | Freescale Semiconductor, Inc | Methods and apparatus for reordering data |
US20100106759A1 (en) * | 2008-10-24 | 2010-04-29 | Freescale Semiconductor, Inc. | Methods and apparatus for reordering data |
KR101036873B1 (en) * | 2010-09-13 | 2011-05-25 | 심흥섭 | Flag based low-complexity, expandable split radix fft system |
CN102737009A (en) * | 2011-04-01 | 2012-10-17 | 中兴通讯股份有限公司 | FFT twiddle factor generation device and application method thereof |
US20150236881A1 (en) * | 2012-08-22 | 2015-08-20 | Zte Corporation | Device and method for implementing fast fourier transform/discrete fourier transform |
US9559886B2 (en) * | 2012-08-22 | 2017-01-31 | Zte Corporation | Device and method for implementing fast fourier transform/discrete fourier transform |
US20160210146A1 (en) * | 2013-09-22 | 2016-07-21 | Zte Microelectronics Technology Co., Ltd | Vector Operation Core and Vector Processor |
US9910671B2 (en) * | 2013-09-22 | 2018-03-06 | Sanechips Technology Co. Ltd. | Vector operation core and vector processor |
CN108319804A (en) * | 2018-04-17 | 2018-07-24 | 福州大学 | A kind of 8192 bases, 2 DIT ASIC circuit design methods that low-resource calls |
CN112231626A (en) * | 2020-10-19 | 2021-01-15 | 南京宁麒智能计算芯片研究院有限公司 | FFT processor |
US20240078281A1 (en) * | 2021-01-28 | 2024-03-07 | Stmicroelectronics, Inc. | Methods and devices for fast fourier transforms |
Also Published As
Publication number | Publication date |
---|---|
KR20040050540A (en) | 2004-06-16 |
KR100492091B1 (en) | 2005-06-01 |
EP1429256A2 (en) | 2004-06-16 |
EP1429256A3 (en) | 2008-05-07 |
US7437395B2 (en) | 2008-10-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20050131976A1 (en) | FFT operating apparatus of programmable processors and operation method thereof | |
Uzun et al. | FPGA implementations of fast Fourier transforms for real-time signal and image processing | |
US9275014B2 (en) | Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods | |
US7870176B2 (en) | Method of and apparatus for implementing fast orthogonal transforms of variable size | |
JP2002501253A (en) | High-speed pipelined Fleier transform processor | |
US7792892B2 (en) | Memory control method for storing operational result data with the data order changed for further operation | |
Garrido et al. | Hardware architectures for the fast Fourier transform | |
Hussain et al. | Designing fast fourier transform accelerators for orthogonal frequency-division multiplexing systems | |
WO2008132510A2 (en) | Fft processor | |
KR101162649B1 (en) | A method of and apparatus for implementing fast orthogonal transforms of variable size | |
US7653676B2 (en) | Efficient mapping of FFT to a reconfigurable parallel and pipeline data flow machine | |
JP2010016830A (en) | Computation module to compute multi-radix butterfly to be used in dtf computation | |
US20060075010A1 (en) | Fast fourier transform method and apparatus | |
US7761495B2 (en) | Fourier transform processor | |
US6728742B1 (en) | Data storage patterns for fast fourier transforms | |
US20080228845A1 (en) | Apparatus for calculating an n-point discrete fourier transform by utilizing cooley-tukey algorithm | |
US7577698B2 (en) | Fast fourier transform processor | |
JP2010016831A (en) | Device for computing various sizes of dft | |
Ma et al. | Simplified addressing scheme for mixed radix FFT algorithms | |
US9582473B1 (en) | Instruction set to enable efficient implementation of fixed point fast fourier transform (FFT) algorithms | |
Kallapu et al. | DRRA-based Reconfigurable Architecture for Mixed-Radix FFT | |
Chang et al. | Accelerating multiple precision multiplication in GPU with Kepler architecture | |
Du Pont et al. | Hardware Acceleration of the Prime-Factor and Rader NTT for BGV Fully Homomorphic Encryption | |
Dawwd et al. | Reduced Area and Low Power Implementation of FFT/IFFT Processor. | |
Jaber et al. | A novel approach for FFT data reordering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SUNWOO, MYUNG-HOON;REEL/FRAME:015300/0145 Effective date: 20040503 |
|
AS | Assignment |
Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SAMSUNG ELECTRONICS CO., LTD.;REEL/FRAME:021493/0614 Effective date: 20080820 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
FPAY | Fee payment |
Year of fee payment: 8 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 12 |