CN108255777B

CN108255777B - Embedded floating point type DSP hard core structure for FPGA

Info

Publication number: CN108255777B
Application number: CN201810056827.2A
Authority: CN
Inventors: 赵赫; 杨海钢; 黄志洪; 魏星; 李小龙
Original assignee: Institute of Electronics of CAS
Current assignee: Institute of Electronics of CAS
Priority date: 2018-01-19
Filing date: 2018-01-19
Publication date: 2021-08-06
Anticipated expiration: 2038-01-19
Also published as: CN108255777A

Abstract

The present disclosure provides an embedded floating-point DSP hard core structure for an FPGA, comprising: a first input unit, composed of an input register group and an adder dedicated to floating-point multiplication, which selects between registering and bypassing the input data through a corresponding configuration bit; a multiplier unit, connected to the first input unit, which receives the input data of the previous stage after it has passed through the registers; a second input unit, including a second input register group connected to the output of the multiplier unit; a multiplexer group unit, whose inputs are connected to the output of the second input unit and the output of the first input unit; an ALU unit, comprising an adder and a logic operation unit, which performs addition, subtraction and multiplication on floating-point and fixed-point numbers and provides logic operations for fixed-point numbers; and an output unit. Because data processing and operation are completed inside the structure, its operation efficiency is significantly better than that of floating-point operation implemented as a soft core.

Description

Embedded floating point type DSP hard core structure for FPGA

Technical Field

The present disclosure relates to the field of FPGAs, and in particular to an embedded floating-point DSP hard core structure for an FPGA.

Background

FPGAs are widely used in communications, aerospace, military and other fields owing to their programmability, high parallelism and flexibility. Digital signal processing is an important application field, and mainstream FPGA products in the industry now generally integrate programmable digital signal processing modules. For example, the Virtex-7 from Xilinx contains 3600 DSP48E1 units and supports operations such as multiply-add, multiply-subtract and multiply-accumulate. The Stratix-V from Altera contains 532 DSP units; a single DSP IP core can be split according to application requirements so that most functions are implemented with minimal resources, and it supports multiply-add, multiply-subtract and multiply-accumulate but does not support addition and accumulation. When processing digital signals, an FPGA often needs to call multiple DSP modules to perform various mathematical operations on the signals. However, as data volumes grow, the signals to be processed are increasingly represented not as fixed-point numbers, as they originally were, but as floating-point numbers, which have a larger numerical range; radar and navigation signals are examples.

Floating-point numbers are widely used in practice. For example, radar signals are represented as floating-point numbers, and collected radar signals are sent to a computer for signal processing in floating-point form. To meet this demand, the FPGA products of Altera and Xilinx provide IP soft cores for floating-point operations. Taking Xilinx as an example, the company has developed a floating-point operation IP soft core that is called through the IP Catalog function in Vivado. It supports various floating-point operations and, in addition to the basic floating-point operations, provides operations such as exponential, logarithm and square root; the specific functions provided are shown in Table 1 below. Generating a hard core circuit structure by modeling the algorithm in a hardware description language is feasible, but it is too complicated and has a long development cycle. Most FPGA manufacturers therefore adopt the traditional way of implementing floating-point operations, namely the IP soft core, and implement the related operations with logic resources or DSPs.

TABLE 1 Xilinx Floating-Point IP supported operations

The DSP modules in current FPGA products adopt a fixed-point DSP structure. In Altera's EDA software Quartus II and Xilinx's EDA software Vivado, the logic control part of a floating-point operation is mapped to FPGA logic resources such as LUTs, while the multiplications and additions of the floating-point operation are mapped to the fixed-point multipliers and adders of the DSP. Although this method is convenient, it occupies too many resources, and the operation efficiency of the IP soft core is not high. To meet the demand for high-speed floating-point operation, Intel has embedded floating-point hard core DSP modules in its latest products to improve FPGA support for floating-point calculation, but no chip has been provided so far.

In order to solve the above problems, the present disclosure provides a floating-point DSP structure of a hard core, so as to improve the operation efficiency of floating-point numbers and reduce the use of logic resources in an FPGA.

BRIEF SUMMARY OF THE PRESENT DISCLOSURE

(I) Technical problem to be solved

The present disclosure provides an embedded floating-point DSP hard core structure for an FPGA to at least partially solve the technical problems set forth above.

(II) Technical solution

According to one aspect of the present disclosure, there is provided an embedded floating-point DSP hard core structure for an FPGA, comprising: a first input unit, composed of an input register group and an adder dedicated to floating-point multiplication, which selects between registering and bypassing the input data through a corresponding configuration bit; a multiplier unit, connected to the first input unit, which receives the input data of the previous stage after it has passed through the registers; a second input unit, including a second input register group connected to the output of the multiplier unit; a multiplexer group unit, consisting of a plurality of selectors, whose inputs are connected to the output of the second input unit and the output of the first input unit; an ALU unit, comprising an adder and a logic operation unit, wherein the adder is used for addition, subtraction and multiplication of floating-point and fixed-point numbers and the ALU unit also provides logic operations for fixed-point numbers; and an output unit for outputting the operation result.

In some embodiments of the present disclosure, the ALU unit further comprises an adjusting circuit, a rounding unit, an encoding module, a detection tree module, a preliminary shifting module, and a shift correction module, wherein the adjusting circuit comprises a leading zero detection circuit and a one-bit error adjusting circuit.

In some embodiments of the present disclosure, in the floating-point number multiplication operation, the pre-adder unit of the first input unit is configured to sum an exponent part of an input floating-point number, the multiplier unit is configured to multiply a mantissa part, and the ALU unit is configured to perform adjustment, normalization, and rounding operations on the floating-point number.

In some embodiments of the present disclosure, when floating-point addition or subtraction is performed, the two input floating-point numbers are sent along two paths in the ALU unit. On one path, the adder adds or subtracts the two floating-point numbers, the leading zero detection unit detects the number of leading zeros in the mantissa part of the result, and a preliminary shift and exponent adjustment are performed. On the other path, the signals are encoded and sent to the detection tree structure, which generates a signal indicating whether the preliminary shift needs further adjustment. The result of the floating-point addition or subtraction is thus obtained.

In some embodiments of the present disclosure, the output unit includes: an output register group, which provides a register unit for the adder unit in the preceding-stage ALU unit and registers the result calculated by the adder so that the result can be used in an accumulation operation; and a mode detector, which is a configurable module: by configuring a mode in the mode detector, a user can detect whether the output result conforms to that mode, so that the DSP outputs the specific data required by the user.

In some embodiments of the present disclosure, the multiplier unit multiplies the operands using Booth encoding to reduce the number of partial products, a tree adder combined with the multiplier unit further compresses the partial products, and the result is further corrected in combination with the leading zero detection circuit in the ALU unit.

In some embodiments of the present disclosure, the multiplier unit adopts a pipeline structure in its design.

In some embodiments of the present disclosure, the second input unit further receives: the multiplexer group selection signal OPMODE, the carry signal CARRYIN, the data input of port C, and the configuration signal ALUMODE for the ALU operation mode.

In some embodiments of the present disclosure, the inputs of the multiplexer group unit are further connected to a cascade signal PCIN carrying a DSP result, a CARRYINSEL signal for selecting the carry input source, and an output feedback signal PCOUT; the selectors in the multiplexer group unit are controlled by the corresponding gating signal OPMODE to switch between different functions and/or to change the data source input to the adder of the next stage.

In some embodiments of the present disclosure, the first input unit and/or the output unit reserves a port used when the DSPs are cascaded.

(III) Advantageous effects

According to the technical scheme, the embedded floating-point DSP hard core structure for the FPGA has at least one of the following beneficial effects:

(1) because data processing and operation are completed inside the structure, operation is more efficient than with a circuit structure that maps floating-point adjustment and similar functions onto the logic resources of the FPGA, and the operation efficiency is significantly better than that of the Xilinx 7 series, which implements floating-point operation as a soft core;

(2) by using a special floating-point type hard core DSP structure to carry out floating-point number operation, the consumption of logic resources in the FPGA can be reduced.

(3) Compared with a soft core implementation mode of floating point number operation, the floating point DSP hard core structure has lower power consumption on the premise of the same floating point number operation.

Drawings

Fig. 1 is a schematic diagram of an embedded floating-point DSP hard core structure for an FPGA according to an embodiment of the present disclosure.

Fig. 2 is a schematic diagram of floating-point multiplication implemented by the embedded floating-point DSP hard core for an FPGA according to an embodiment of the present disclosure.

Fig. 3 is a diagram of the DSP implementation structure for floating-point multiplication according to an embodiment of the present disclosure.

Fig. 4 is a diagram illustrating the bit-padding operation for mantissa multiplication according to an embodiment of the present disclosure.

Fig. 5 is a schematic diagram of floating-point addition and subtraction implemented by the embedded floating-point DSP hard core for an FPGA according to an embodiment of the present disclosure.

Fig. 6 is a diagram of the DSP implementation structure for floating-point addition according to an embodiment of the present disclosure.

Detailed Description

The present disclosure provides an embedded floating-point DSP hard core structure for an FPGA. For a better understanding of the objects, technical solutions and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.

Certain embodiments of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the disclosure are shown. Indeed, various embodiments of the disclosure may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements.

In a first exemplary embodiment of the present disclosure, an embedded floating-point DSP hard core structure for an FPGA is provided. Fig. 1 is a schematic structural diagram of this structure according to the first embodiment. As shown in Fig. 1, the embedded floating-point DSP hard core structure for an FPGA of the present disclosure includes: a first input unit, a multiplier unit, a second input unit, a multiplexer group unit, an ALU unit and an output unit.

The following describes each component of the embedded floating-point DSP hard core structure for an FPGA in detail.

The first input unit mainly comprises an input register group and an adder dedicated to floating-point multiplication (the pre-adder), and selects between registering and bypassing the input data A and B through a corresponding configuration bit. In floating-point multiplication, the pre-adder adds the exponents of the floating-point numbers, and the result is sent to the next stage for further operation. A multistage pipeline is introduced into the unit to split the critical path, which shortens it and ensures the working frequency of the DSP. A port for DSP cascading is also reserved, so that more operations can be completed by cascading, which improves the flexibility of the DSP.

The multiplier unit is connected to the first input unit and receives the input data A and B of the previous stage after they have passed through the registers. In this embodiment, the multiplier unit is a two-input multiplier (A × B). It multiplies the operands using Booth encoding to reduce the number of partial products, and further compresses the partial products with a tree adder, thereby reducing area overhead and increasing computation speed. Preferably, the multiplier unit adopts a pipeline structure in its design to improve the working frequency.

The second input unit comprises a second input register group connected to the output of the multiplier unit, and also receives the multiplexer group selection signal OPMODE, the carry signal CARRYIN, the data input of port C, and the configuration signal ALUMODE for the ALU operation mode. This unit registers its inputs and at the same time splits the critical path from the multiplier to the ALU unit to improve the working frequency.

The inputs of the multiplexer group unit are connected to the output of the second input unit and the output of the first input unit, and also include a cascade signal PCIN carrying a DSP result, a CARRYINSEL signal for selecting the carry input source, and the output feedback signal PCOUT of the DSP module.

The main component of the ALU unit is an adder, which performs addition, subtraction and multiplication on floating-point and fixed-point numbers; the ALU unit also performs logic operations on fixed-point numbers. To support floating-point operation, the ALU structure is extended: in addition to the arithmetic structure, a leading zero detection circuit (LZD), a one-bit error adjustment circuit and a rounding unit are added, so that the ALU unit normalizes the output while completing the floating-point operation and produces a floating-point output that conforms to the standard.
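The way Booth encoding reduces the number of partial products can be sketched in software. The disclosure does not state which Booth radix is used, so the radix-4 recoding below is an assumption, and the function names and the 26-bit width (the 25-bit operand sign-extended to an even width) are hypothetical choices made for illustration only.

```python
from typing import List, Tuple


def booth_radix4_digits(multiplier: int, width: int) -> List[int]:
    """Recode a `width`-bit two's-complement multiplier into radix-4 Booth
    digits, each in {-2, -1, 0, 1, 2}. `width` must be even."""
    if width % 2:
        raise ValueError("width must be even")
    # Keep `width` bits and append the implicit 0 below the LSB.
    y = (multiplier & ((1 << width) - 1)) << 1
    digits = []
    for i in range(0, width, 2):
        low = (y >> i) & 1         # multiplier bit i-1 (0 for i == 0)
        mid = (y >> (i + 1)) & 1   # multiplier bit i
        high = (y >> (i + 2)) & 1  # multiplier bit i+1
        digits.append(low + mid - 2 * high)
    return digits


def booth_multiply(a: int, b: int, width: int = 26) -> Tuple[int, int]:
    """Return (a * b, number of partial-product rows) using radix-4 Booth
    recoding of b. Width 26 is an assumed, illustrative value."""
    digits = booth_radix4_digits(b, width)
    # Each digit selects 0, +/-a or +/-2a, weighted by 4**k.
    partial_products = [(d * a) << (2 * k) for k, d in enumerate(digits)]
    return sum(partial_products), len(partial_products)
```

With a 26-bit multiplier this yields 13 partial-product rows instead of 26, which is the reduction that a tree adder would then compress further. The plain `sum` here stands in for that tree and says nothing about its actual hardware structure.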

The output unit comprises an output register group and a mode detector. The output register group introduces a pipeline stage into the output unit, which improves the working frequency of the DSP. The registers also serve as the register unit for the preceding-stage adder: the result calculated by the adder can be registered so that it can be used in accumulation. The output unit also reserves a cascade port, so that more operations can be completed by cascading DSPs. The mode detector is a user-configurable module: by configuring a mode in the mode detector, the user can detect whether the output result matches that mode, so that the DSP outputs the specific data the user requires.
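The behaviour of the mode detector can be illustrated with a small behavioural model. The disclosure does not specify how a mode is encoded, so the sketch below assumes the configured mode is a bit pattern plus an optional mask of ignored bits, as is common for pattern detectors in fixed-point DSP blocks. The function name, the mask parameter and the 48-bit width are all hypothetical.

```python
def mode_detect(result: int, pattern: int, mask: int = 0, width: int = 48) -> bool:
    """Return True when `result` equals `pattern` on every bit that is not
    masked out. A 1 in `mask` means "ignore this bit" (assumed convention)."""
    care = ~mask & ((1 << width) - 1)
    return (result & care) == (pattern & care)
```

For example, `mode_detect(0x12FF, 0x1200, mask=0xFF)` ignores the low byte and reports a match, while `mode_detect(0x1334, 0x1234)` reports a mismatch.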

Floating-point multiplication is carried out as shown in Fig. 2. To realize this calculation, the structure adds a pre-adder unit in the first input unit, which sums the exponent parts of the input floating-point numbers A and B; the multiplier unit multiplies the mantissa parts; and the ALU unit completes the adjustment, normalization and rounding of the floating-point number. The DSP implementation block diagram of the algorithm is shown in Fig. 3, where the pre-adder is in the first input unit, and the adder after the multiplier and the adjustment circuit are in the ALU unit. The pre-adder, serving as a dedicated exponent adder, adds the exponent parts of the 32-bit input data, namely bits 24 to 31, and the result is used as the exponent part of the floating-point product.
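The exponent addition performed by the pre-adder can be sketched as follows. Two points are assumptions rather than statements from the disclosure: bits 24 to 31 are read as counted from 1 at the least significant bit, which matches the IEEE-754 single-precision exponent field, and the bias of 127 is subtracted once after the addition, which IEEE-754 biased exponents require but the text does not mention. The function names are hypothetical.

```python
import struct

BIAS = 127  # IEEE-754 single-precision exponent bias


def float_to_bits(x: float) -> int:
    """Return the 32-bit IEEE-754 single-precision encoding of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def exponent_field(bits: int) -> int:
    """Extract the 8-bit biased exponent (bits 24..31 when counting from 1)."""
    return (bits >> 23) & 0xFF


def product_exponent(a: float, b: float) -> int:
    """Biased exponent of a * b before normalization (normal inputs only)."""
    ea = exponent_field(float_to_bits(a))
    eb = exponent_field(float_to_bits(b))
    # Both fields carry the bias, so it is removed once (assumed step).
    return ea + eb - BIAS
```

For 2.0 and 4.0 the fields are 128 and 129, giving 130, which is the exponent field of 8.0. When the mantissa product reaches 2 or more, the exponent still needs a correction during normalization, which this sketch leaves out.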

According to the IEEE-754 standard, the mantissa part of any normal single-precision floating-point number is 23 bits, preceded by a hidden bit with a value of 1 that is not stored in the floating-point number. This hidden bit must be restored when the number takes part in a floating-point operation. In addition, to remain compatible with the 25 × 25-bit fixed-point multiplication, a 0 must be prepended to the hidden bit as the sign bit of the mantissa so that the mantissa meets the bit-width requirement, as shown in Fig. 4. Prepending 0 to the mantissa treats every floating-point number as positive, so this bit is not the true sign of the result. To obtain the true sign bit, the signs of the two input floating-point operands are extracted and XORed separately. The mantissas are then multiplied to obtain the product of the mantissa parts.
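The mantissa preparation and the separate sign computation described above can be sketched as follows. This is a minimal model for normal single-precision inputs only; the function names are hypothetical, and the 25-bit operand is represented simply as a non-negative integer below 2^24, so that its 25th bit (the sign position) is 0.

```python
import struct
from typing import Tuple


def float_to_bits(x: float) -> int:
    """Return the 32-bit IEEE-754 single-precision encoding of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def sign_and_mantissa(x: float) -> Tuple[int, int]:
    """Return (sign bit, 25-bit multiplier operand) for a normal float32.

    The operand is 0 (sign position), then the hidden 1, then the 23 stored
    fraction bits, so it is always a positive 25-bit two's-complement value.
    """
    bits = float_to_bits(x)
    sign = bits >> 31
    fraction = bits & 0x7FFFFF
    mantissa = (1 << 23) | fraction  # restore the hidden bit
    return sign, mantissa


def multiply_mantissas(a: float, b: float) -> Tuple[int, int]:
    """Return (true sign of the product, integer product of the mantissas)."""
    sign_a, mant_a = sign_and_mantissa(a)
    sign_b, mant_b = sign_and_mantissa(b)
    # The true sign comes from an XOR of the input signs, computed apart
    # from the mantissa multiplication.
    return sign_a ^ sign_b, mant_a * mant_b
```

For 1.5 and -2.0 the operands are 0xC00000 and 0x800000, the sign is 1, and the product is an integer of up to 48 bits that still has to be normalized.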

After the exponent and mantissa values are obtained, they must be adjusted to normalize the floating-point result. During normalization, a leading zero detection circuit (LZD) detects the number of leading zeros in the mantissa result so that the mantissa can be shifted. Once the leading-zero count is obtained, the mantissa result is shifted accordingly and the leading-zero count is subtracted from the exponent value. The final sign bit, the 8-bit exponent part and the 23-bit mantissa part are then combined to form the final floating-point product.
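The normalization and packing step can be sketched under a few stated assumptions. The mantissa product is held in a 48-bit field, the exponent is first incremented by one so that a product with no leading zero is already aligned, and the extra low bits are truncated rather than rounded, because the disclosure does not state the rounding mode. Overflow, underflow, zero and subnormal handling are omitted, and all names are hypothetical.

```python
import struct

WIDTH = 48  # assumed width of the product of two 24-bit mantissas


def bits_to_float(bits: int) -> float:
    """Decode a 32-bit IEEE-754 single-precision pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def leading_zeros(value: int, width: int) -> int:
    """Behavioural stand-in for the LZD: zeros above the first 1 bit."""
    return width - value.bit_length()


def normalize_and_pack(sign: int, exp_sum: int, mant_product: int) -> int:
    """Build the float32 bit pattern of a product of two normal numbers.

    exp_sum is the biased exponent sum (ea + eb - 127) and mant_product is
    the non-zero integer product of the two 24-bit mantissas.
    """
    lz = leading_zeros(mant_product, WIDTH)
    shifted = mant_product << lz           # the top bit of the field is now 1
    exponent = exp_sum + 1 - lz            # subtract the leading-zero count
    fraction = (shifted >> (WIDTH - 24)) & 0x7FFFFF  # drop hidden bit, truncate
    return (sign << 31) | ((exponent & 0xFF) << 23) | fraction
```

For 1.5 × 2.0 the product 0xC00000 × 0x800000 has one leading zero in the 48-bit field, so the exponent 128 + 1 - 1 = 128 and the fraction 0x400000 encode 3.0.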

The principle by which the embedded floating-point DSP hard core for an FPGA implements floating-point addition and subtraction is shown in Fig. 5, and the structure shown in Fig. 6 is adopted to operate on the floating-point numbers. Two floating-point numbers are input into the ALU unit and sent along two paths. On one path, the adder adds or subtracts the two floating-point numbers, and the result undergoes a preliminary shift and exponent adjustment through the LZD unit. On the other path, the signals are encoded and sent to the detection tree structure, which generates a signal indicating whether the preliminary shift needs further adjustment. The result of the floating-point addition or subtraction is thus obtained. All of these operations are completed in the ALU unit.

In actual operation, the addition of numbers with the same sign does not need to pass through the encoding and detection tree units, because the result contains no error in that case. The addition of numbers with opposite signs, or the subtraction of numbers with the same sign, does require adjustment of the result, because a 1-bit error exists in the process of either operation. The two input data are encoded and then passed through tree detection, which finally determines whether one-bit error adjustment is needed.
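The adder path of this flow (add or subtract, count leading zeros, preliminary shift, exponent adjustment) can be modelled behaviourally. The sketch below is not the circuit of Fig. 6: it counts leading zeros directly on the result, so it does not model the parallel encoding and detection tree path or the one-bit correction. It also assumes normal single-precision inputs, aligns operands by a truncating right shift, and truncates instead of rounding. All names are hypothetical.

```python
import struct


def to_bits(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]


def from_bits(bits: int) -> float:
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp32_add(a: float, b: float) -> float:
    """Add two normal float32 values; call fp32_add(a, -b) to subtract."""
    fields = []
    for x in (a, b):
        bits = to_bits(x)
        sign = bits >> 31
        exponent = (bits >> 23) & 0xFF
        mantissa = (1 << 23) | (bits & 0x7FFFFF)  # restore the hidden bit
        fields.append((sign, exponent, mantissa))
    # Put the operand with the larger magnitude first.
    (s1, e1, m1), (s2, e2, m2) = sorted(
        fields, key=lambda f: (f[1], f[2]), reverse=True
    )
    m2 >>= e1 - e2  # align the smaller operand (truncating)
    total = m1 + m2 if s1 == s2 else m1 - m2
    if total == 0:
        return 0.0
    width = 25  # 24-bit mantissa plus one carry-out bit
    lz = width - total.bit_length()   # leading-zero count of the result
    total <<= lz                      # preliminary shift
    exponent = e1 + 1 - lz            # exponent adjustment
    fraction = (total >> 1) & 0x7FFFFF
    return from_bits((s1 << 31) | (exponent << 23) | fraction)
```

For 5.0 + (-4.5), the mantissa difference 0x100000 has four leading zeros in the 25-bit field, so the exponent becomes 129 + 1 - 4 = 126 and the result is 0.5. This is the opposite-sign case in which a hardware design needs the correction path, since the shift amount there is predicted rather than counted after the fact.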

The floating-point hard core DSP structure provided by the present disclosure supports floating-point addition, multiplication and accumulation. The circuit was implemented with the DC tool using the standard SMIC (Semiconductor Manufacturing International Corporation) 28 nm CMOS process library at a voltage of 0.945 V and a temperature of 125 °C, and the total area obtained after placement and routing is 11091 μm².

For comparison, a Xilinx 7 series FPGA at the same process node, model xc7v585tffg1157-3, was used. Under the same latency (the present disclosure adopts a 6-stage pipeline structure), the floating-point addition/subtraction unit was called with speed-priority structure optimization selected, which maps the unit to one DSP48E1 plus logic resources; performance and resources were each set as the optimization target and compared with the structure of the present disclosure. The floating-point multiplication unit was called in the same way. The comparison of floating-point addition and multiplication is shown in the following table:

TABLE 2 comparison of the results

The comparison shows that the floating-point addition performance of the floating-point hard core DSP structure proposed by the present disclosure is 1.8 times that of the IP soft core under the performance-priority condition, and its floating-point multiplication performance is 1.43 times that of the IP soft core under the same condition. In the proposed structure, data processing and operation are completed inside the structure, whereas mapping circuits such as floating-point adjustment onto the logic resources of the FPGA is less efficient than a dedicated circuit design. The floating-point operation efficiency of the proposed floating-point hard core DSP structure is therefore significantly better than that of the Xilinx 7 series, which implements floating-point operation as a soft core.

Certainly, the hardware structure should further include functional modules such as a power module (not shown), which can be understood by those skilled in the art, and those skilled in the art may also add corresponding functional modules according to the functional requirements, which are not described herein.

This completes the description of the floating-point hard core DSP structure according to the first embodiment of the present disclosure.

So far, the embodiments of the present disclosure have been described in detail with reference to the accompanying drawings. It is to be noted that, in the attached drawings or in the description, the implementation modes not shown or described are all the modes known by the ordinary skilled person in the field of technology, and are not described in detail. Further, the above definitions of the various elements and methods are not limited to the various specific structures, shapes or arrangements of parts mentioned in the examples, which may be easily modified or substituted by those of ordinary skill in the art.

Unless otherwise indicated, the numerical parameters set forth in the specification and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by the present disclosure. In particular, all numbers expressing quantities of ingredients, reaction conditions, and so forth used in the specification and claims are to be understood as being modified in all instances by the term "about". Generally, the expression is meant to encompass variations of ± 10% in some embodiments, 5% in some embodiments, 1% in some embodiments, 0.5% in some embodiments by the specified amount.

Furthermore, the word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.

The use of ordinal numbers such as "first," "second," "third," etc., in the specification and claims to modify a corresponding element does not by itself connote any ordinal number of the element or any ordering of one element from another or the order of manufacture, and the use of the ordinal numbers is only used to distinguish one element having a certain name from another element having a same name.

In addition, unless steps are specifically described or must occur in sequence, the order of the steps is not limited to that listed above and may be changed or rearranged as desired by the desired design. The embodiments described above may be mixed and matched with each other or with other embodiments based on design and reliability considerations, i.e., technical features in different embodiments may be freely combined to form further embodiments.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Also in the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the disclosure, various features of the disclosure are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various disclosed aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that is, the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, disclosed aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this disclosure.

The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims

1. An embedded floating-point DSP hard core structure for an FPGA, comprising:

a first input unit, composed of an input register group and an adder dedicated to floating-point multiplication, which selects between registering and bypassing the input data through a corresponding configuration bit;

a multiplier unit connected to the first input unit, receiving input data of a previous stage passing through a register;

a second input unit including a second input register set connected to the output terminal of the multiplier unit;

a multiplexer group unit, consisting of a plurality of selectors, whose inputs are connected to the output of the second input unit and the output of the first input unit;

an ALU unit, comprising an adder and a logic operation unit, wherein the adder is used for addition, subtraction and multiplication of floating-point and fixed-point numbers, and the ALU unit also provides logic operations for fixed-point numbers;

wherein the ALU unit further comprises an adjusting circuit, a rounding unit, an encoding module, a detection tree module, a preliminary shift module and a shift correction module, the adjusting circuit comprising a leading zero detection circuit and a one-bit error adjusting circuit; and

an output unit for outputting the operation result.

2. The embedded floating-point DSP hard core structure of claim 1, wherein, when floating-point multiplication is performed, the pre-adder unit of the first input unit is configured to sum the exponent parts of the input floating-point numbers, the multiplier unit is configured to multiply the mantissa parts, and the ALU unit is configured to perform the adjustment, normalization and rounding of the floating-point number.

3. The embedded floating-point DSP hard core structure of claim 1, wherein, when floating-point addition or subtraction is performed, the two input floating-point numbers are sent along two paths in the ALU unit; on one path, the adder adds or subtracts the two floating-point numbers, the leading zero detection unit detects the number of leading zeros in the mantissa part of the result, and a preliminary shift and exponent adjustment are performed; on the other path, the signals are encoded and sent to the detection tree structure, which generates a signal indicating whether the preliminary shift needs further adjustment; and the result of the floating-point addition or subtraction is thereby obtained.

4. The embedded floating-point DSP hard core structure of claim 1, wherein the output unit comprises:

an output register group, which provides a register unit for the adder unit in the preceding-stage ALU unit and registers the result calculated by the adder so that the result can be used in an accumulation operation; and

a mode detector, which is a configurable module, wherein by configuring a mode in the mode detector a user can detect whether the output result conforms to that mode, so that the DSP outputs the specific data required by the user.

5. The embedded floating-point DSP hard core structure of claim 1, wherein

the multiplier unit multiplies the operands using Booth encoding to reduce the number of partial products, further compresses the partial products in combination with a tree adder of the multiplier unit, and further corrects the obtained result in combination with the leading zero detection circuit in the ALU unit.

6. The embedded floating-point DSP hard core structure of claim 1, wherein

the multiplier unit adopts a pipeline structure in its design.

7. The embedded floating-point DSP hard core structure of claim 1, wherein

the second input unit further receives: the multiplexer group selection signal OPMODE, the carry signal CARRYIN, the data input of port C, and the configuration signal ALUMODE for the ALU operation mode.

8. The embedded floating-point DSP hard core structure of claim 1, wherein

the inputs of the multiplexer group unit are further connected to a cascade signal PCIN carrying a DSP result, a CARRYINSEL signal for selecting the source of the carry input, and an output feedback signal PCOUT, and the selectors in the multiplexer group unit are controlled by the corresponding gating signal OPMODE to switch between different functions and/or to change the data source input to the adder of the next stage.

9. The embedded floating-point DSP hard core structure of claim 1, wherein

the first input unit and/or the output unit reserves a port used when DSPs are cascaded.