
CN100363885C - Multiply and accumulate device - Google Patents


Info

Publication number
CN100363885C
Authority
CN
China
Prior art keywords
module
unit module
logic
logic module
multidigit
Legal status
Expired - Fee Related
Application number
CNB2004100844834A
Other languages
Chinese (zh)
Other versions
CN1632740A (en)
Inventor
陈继承
刘鹏
姚庆栋
史册
王维东
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Application filed by Zhejiang University ZJU
Priority to CNB2004100844834A
Publication of CN1632740A
Application granted
Publication of CN100363885C
Anticipated expiration
Expired - Fee Related


Landscapes

  • Complex Calculations (AREA)

Abstract

The present invention relates to microprocessors and computer systems. Its aim is to provide a multiply-accumulate device that meets a processor's need to support multiple multiply-accumulate modes.

The device comprises five modules connected in sequence:
- a pre-decoding unit module;
- a partial product generation unit module;
- a Wallace-tree adder unit module;
- an accumulation unit module;
- a final result unit module.

Compared with the prior art, the invention proposes a combined partial-product generation method that does not need to generate Booth coding coefficients. This removes one stage from the partial-product generation logic. It also reduces the delay and gate count of the partial-product generation circuit, which lowers the cost of the circuit implementation while preserving functionality. The invention also balances the delay of each pipeline stage to meet the high operating-frequency requirements of a DSP.

Description

Multiply-accumulate device
Technical field
The present invention relates to microprocessors and computer systems. More particularly, it relates to a multiply-accumulate device that meets a processor's need to support multiple multiply-accumulate modes.
Background art
Many signal processing operations involve multiply-accumulate computation. The multiplier and multiplicand may be treated as signed numbers in some cases and as unsigned numbers in others. One operand may even be signed while the other is unsigned. Some applications also require the multiply-accumulate result to be rounded to preserve computational precision. This raises the question of how to design a multiply-accumulate device (MAC) that satisfies all of these cases.
At the same time, modern digital signal processors (DSPs) demand ever higher operating frequencies. Because of physical limits, the circuit delay of a MAC struggles to keep up with DSP operating frequencies, even as semiconductor technology improves. Spreading the MAC circuit over several clock cycles has therefore become a common compromise. This in turn raises the problem of how to partition the MAC's functional structure to match the DSP pipeline.
Summary of the invention
The main object of the present invention is to overcome these shortcomings of the prior art. It provides a new multiply-accumulate device for processors that must support multiple multiply-accumulate modes.
To solve the technical problems described above, the present invention adopts the following technical solution:
The invention provides a multiply-accumulate device comprising the following modules, connected in sequence in this order: a pre-decode unit module (10), a partial product generation unit module (20), a Wallace-tree adder unit module (30), an accumulation unit module (40) and a final result unit module (50);
The pre-decode unit module (10) comprises a multi-bit multiplier input module (101), a multi-bit multiplicand input module (102), a multi-bit multiply-accumulate algorithm selection module (104), a square flag module (105), a multiplication data type flag module (106) and a pre-decode logic module (103). Modules 101, 102, 104, 105 and 106 are each connected to the pre-decode logic module (103);
The multi-bit multiply-accumulate algorithm selection module (104) makes three selections: whether the current operation is a multiplication or a multiply-accumulate, the signedness of the multiplier and multiplicand, and whether the current result needs to be rounded;
The square flag module (105) selects whether the current multiplication or multiply-accumulate is a squaring operation;
The multiplication data type flag module (106) selects the data type of the current multiplier and multiplicand;
The pre-decode logic module (103) outputs the multi-bit operands that take part in the multiplication, as directed by the MF, SQUARE and MODE signals, together with their signedness bits. The signals are defined as follows:
- the multi-bit multiply-accumulate function signal MF selects whether the current operation is a multiplication or a multiply-accumulate, the signedness of the multiplier and multiplicand, and whether the current operation is rounded;
- the square flag signal SQUARE indicates whether the current operation is a squaring operation;
- the multiplication data type flag signal MODE selects the data type of the current multiplier and multiplicand.
As an improvement, the partial product generation unit module (20) comprises a sign extension logic module (201) and a partial product generation logic module (202). These two modules are connected to each other, and each is also connected to the pre-decode logic module (103) in the pre-decode unit module (10);
The sign extension logic module (201) takes two inputs: one of the multi-bit operands output by the pre-decode unit module (10), and that operand's signedness bit. It extends the operand according to the signedness bit and outputs the extended multi-bit value.
As an improvement, the Wallace-tree adder unit module (30) comprises a Wallace-tree adder logic module (301). This module is connected to the partial product generation logic module (202) by multiple channels and to the pre-decode logic module (103) by two channels;
The Wallace-tree adder logic module (301) receives the multi-bit partial product results and the multi-bit carry result output by the partial product generation unit module (20). It also receives the two-bit rounding flag and the mode select signal output by the pre-decode unit module (10). The Wallace-tree adder unit module (30) further comprises data type processing logic, which determines the final sum and carry vectors according to whether the mode select signal is true or false.
As an improvement, the accumulation unit module (40) comprises an accumulation logic module (401) and a 2-to-1 selection logic module (402). The accumulation logic module (401) is connected to the Wallace-tree adder logic module (301) by two channels. The 2-to-1 selection logic module (402) is connected to the pre-decode logic module (103) and to the accumulation logic module (401);
The accumulation logic module (401) receives the multi-bit sum and carry vectors output by the Wallace-tree adder unit module (30), together with the multi-bit output of the 2-to-1 selection logic module (402), and produces the multi-bit output result;
The 2-to-1 selection logic module (402) receives two inputs: the MAC's previous operation result, output by the final result unit module (50), and the constant zero. Under control of the accumulation enable signal, it passes one of the two to its multi-bit output.
As an improvement, the final result unit module (50) comprises a final result selection logic module (501). This module is connected to the accumulation logic module (401), the 2-to-1 selection logic module (402) and the pre-decode logic module (103);
The final result selection logic module (501) receives the multi-bit accumulation result output by the accumulation unit module (40) and produces the final MAC operation result.
Compared with the prior art, the invention has the following beneficial effects:
- The embodiment provides a MAC architecture for multiplication and multiply-accumulate operations in various modes. The MAC is divided into five sequential structural units, each handled and optimized separately.
- It proposes a combined partial-product generation method that does not need to generate Booth coding coefficients. This removes one stage from the partial-product generation logic and reduces the delay and gate count of the partial-product generation circuit.
- It proposes using the Wallace tree to handle rounding, with the rounding operation moved earlier. This does not affect the implementation of the Wallace-tree addition. It also saves the extra adder that carry propagation would otherwise require in the final result generation unit, lowering implementation cost while preserving functionality.
- It proposes a method of matching the MAC functional block to the DSP pipeline stages. This balances the delay of each pipeline stage and meets the DSP's high operating-frequency requirement.

These methods can be used together or individually. They can be applied in DSP processors or in any other circuit that needs a MAC functional block.
Brief description of the drawings
Fig. 1 is a system architecture diagram of the multiply-accumulate device according to the embodiment of the invention.
Fig. 2 is a schematic diagram of conventional Booth coding and partial product generation.
Fig. 3 is a schematic diagram of the Booth coding and partial product generation proposed by the embodiment of the invention.
Fig. 4 is a detailed implementation diagram of the Wallace-tree addition, taking a 16-bit multiplication as an example.
Fig. 5 illustrates the generation of the final result of the multiply-accumulate device according to the embodiment.
Fig. 6 shows one way of matching the multiply-accumulate device of the embodiment to a DSP pipeline.
Embodiment
Specific embodiment 1 of the invention is described in detail below with reference to Figs. 1 to 6.
The multiply-accumulate device in this embodiment comprises the following modules, connected in sequence in this order: a pre-decode unit module 10, a partial product generation unit module 20, a Wallace-tree adder unit module 30, an accumulation unit module 40 and a final result unit module 50. The sub-modules and their connections are as follows:
- The pre-decode unit module 10 comprises a multi-bit multiplier input module 101, a multi-bit multiplicand input module 102, a multi-bit multiply-accumulate algorithm selection module 104, a square flag module 105, a multiplication data type flag module 106 and a pre-decode logic module 103. Modules 101, 102, 104, 105 and 106 are each connected to the pre-decode logic module 103.
- The partial product generation unit module 20 comprises a sign extension logic module 201 and a partial product generation logic module 202. These are connected to each other, and each is also connected to the pre-decode logic module 103 in the pre-decode unit module 10.
- The Wallace-tree adder logic module 301 is connected to the partial product generation logic module 202 by multiple channels and to the pre-decode logic module 103 by two channels.
- The accumulation unit module 40 comprises an accumulation logic module 401 and a 2-to-1 selection logic module 402. The accumulation logic module 401 is connected to the Wallace-tree adder logic module 301 by two channels. The 2-to-1 selection logic module 402 is connected to the pre-decode logic module 103 and to the accumulation logic module 401.
- The final result selection logic module 501 is connected to the accumulation logic module 401, the 2-to-1 selection logic module 402 and the pre-decode logic module 103.
In Fig. 1:
The multi-bit data A and B, the two-bit signal R, and the one-bit signals sign_A, sign_B, accumulation_en, round_en and mode are outputs of the pre-decode unit module 10.
The multi-bit data A*0, A*1, A*m-2, A*m-1, A*m and sub_carry are outputs of the partial product generation unit module 20.
The multi-bit data sum and carry are outputs of the Wallace-tree adder unit module 30.
The multi-bit data mux_product and accu_product are outputs of the accumulation unit module 40.
The multi-bit data product is the output of the final result unit module 50.
In Fig. 2:
B*: multi-bit data, input to the MUX case,
2n+1: an odd bit position of B*,
2n-1: an odd bit position of B*,
Booth_encoder: Booth coding logic,
2x: one of the Booth coding coefficients,
1x: one of the Booth coding coefficients,
0x: one of the Booth coding coefficients,
sign: sign bit of the Booth coding,
partial_product_gen: partial product generation logic,
A: multi-bit data, input to the partial product generation logic partial_product_gen,
PnA*: multi-bit data, output of the partial product generation logic partial_product_gen.
In Fig. 3:
B*: multi-bit data, input to the MUX case,
2n+1: an odd bit position of B*,
2n-1: an odd bit position of B*,
case: multiplexer (case selection) logic,
000: one case of the multiplexer logic,
001: one case of the multiplexer logic,
010: one case of the multiplexer logic,
011: one case of the multiplexer logic,
100: one case of the multiplexer logic,
101: one case of the multiplexer logic,
110: one case of the multiplexer logic,
111: one case of the multiplexer logic,
Booth_partial_product_gen: combined Booth coding and partial product generation logic,
A: multi-bit data, input to the combined generation logic Booth_partial_product_gen,
PnA*: multi-bit data, output of the combined generation logic Booth_partial_product_gen.
In Fig. 4:
Pxy: a specific bit of a partial product, where x is the partial product index, 0≤x≤8, and y is the bit position within that partial product, 0≤y≤17,
Inverted Pxy: the complement of partial-product bit Pxy, where x is the partial product index, 0≤x≤8, and y is the bit position within that partial product, 0≤y≤17,
Si: a specific bit of the partial-product carry result, 0≤i≤7.
In Fig. 5:
accumulator: multi-bit data, input to the decision logic,
accumulator[16:0]==17'b10000: the decision logic,
{accumulator[39:17], 17'b0}: multi-bit data, one output of the decision logic,
{accumulator[39:16], 16'b0}: multi-bit data, the other output of the decision logic,
YES: the true branch of the decision logic,
NO: the false branch of the decision logic.
In Fig. 6:
Interface: processor pipeline-stage interface,
clock: processor clock,
EX1: the first stage of the processor's extensible execution stage,
EX2: the second stage of the processor's extensible execution stage,
MAC_in_EX1: the part of the MAC functional block in the first stage of the extensible execution stage,
MAC_in_EX2: the part of the MAC functional block in the second stage of the extensible execution stage,
PART I (10): the pre-decode unit module 10 of the MAC functional block,
PART II (20): the partial product generation unit module 20 of the MAC functional block,
PART III (30): the Wallace-tree adder unit module 30 of the MAC functional block,
PART IV (40): the accumulation unit module 40 of the MAC functional block,
PART V (50): the final result unit module 50 of the MAC functional block.
In essence, a MAC receives two multi-bit operands (the multiplier and the multiplicand), performs the specified multiplication and, if requested, adds the result to the previous result.

Signed and unsigned operands are multiplied differently, so the MAC must handle the data types of both operands.

The current product may also need to be added to or subtracted from the previous result. In digital signal processing algorithms such as FIR and IIR filters, input data from a preceding period of the time or frequency domain is multiplied by corresponding coefficients. These products are then accumulated, by addition or subtraction, to obtain the final result for the current time or frequency domain. A multiply-accumulate or multiply-subtract function is therefore essential for a DSP processor running DSP programs.

Finally, processors, and fixed-point processors in particular, have a limited data width. Multiply-accumulate and multiply-subtract operations must therefore trade off precision. A rounding function is needed to discard the bits beyond the guaranteed precision while preserving the system's precision as far as possible.
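As a rough illustration of the FIR use case described above, the following Python sketch models the MAC purely arithmetically. Each step either starts a new result (accumulation disabled) or adds the new product to, or subtracts it from, the previous result. The function names `mac_step` and `fir_output` are hypothetical. The model ignores bit widths, signedness modes and rounding.

```python
def mac_step(previous, a, b, accumulate, subtract=False):
    """One MAC step: returns a*b, previous + a*b, or previous - a*b."""
    product = a * b
    if subtract:
        product = -product
    # with accumulation disabled the previous result is ignored (plain multiply)
    return (previous if accumulate else 0) + product


def fir_output(coeffs, samples):
    """One FIR output y = sum(coeffs[i] * samples[i]) built from MAC steps."""
    acc = 0
    for i, (c, x) in enumerate(zip(coeffs, samples)):
        # the first tap starts a new sum; every later tap accumulates onto it
        acc = mac_step(acc, c, x, accumulate=(i > 0))
    return acc


print(fir_output([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```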
To obtain the best performance with the smallest circuit, the multiplication itself can use a non-conventional multiplier structure, a special algorithm or a specially optimized structure.

The Booth encoding algorithm is one such approach. It encodes each group of three consecutive bits of the multiplier, with adjacent groups overlapping by one bit. From the resulting coefficient and sign bit, it decides which partial product to generate. Each partial product thus covers two bits of the multiplier, so the number of partial products required for the multiplication is halved.

The Booth coding algorithm is shown in Table 1.

The partial products can be summed with a Wallace-tree adder structure. This structure adds each column of the partial products vertically using 3:2 full adders or 4:2/5:2 compressors. A Wallace tree finally produces two results: a sum vector (sum) and a carry vector (carry). Wallace-tree addition greatly reduces the number of sequential partial-product additions. For example, a four-level Wallace tree built from 3:2 full adders accepts nine partial-product vectors at once and produces two output vectors (sum and carry). This substantially reduces the complexity and delay of partial-product summation.
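The 3:2 reduction can be checked with a small behavioral model. In this sketch, each `csa` call stands for one row of full adders working on every bit column at once. Reducing nine vectors takes four levels (9 → 6 → 4 → 3 → 2), matching the four-level example above.

The function names are hypothetical. Python integers stand in for bit vectors, and only non-negative inputs are modeled.

```python
def csa(x, y, z):
    """3:2 carry-save adder: one full adder per bit column, so x + y + z == s + c."""
    s = x ^ y ^ z                              # column sum bits
    c = ((x & y) | (x & z) | (y & z)) << 1     # column carries, moved one column left
    return s, c


def wallace_reduce(vectors):
    """Reduce non-negative integer vectors to (sum, carry, levels) with 3:2 adders."""
    rows = list(vectors)
    levels = 0
    while len(rows) > 2:
        groups = len(rows) // 3
        next_rows = []
        for g in range(groups):
            next_rows.extend(csa(*rows[3 * g:3 * g + 3]))
        next_rows.extend(rows[3 * groups:])    # 0-2 leftover rows pass straight through
        rows = next_rows
        levels += 1
    rows += [0] * (2 - len(rows))              # pad when fewer than two inputs were given
    return rows[0], rows[1], levels


s, c, levels = wallace_reduce([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(s + c, levels)  # 45 4
```

Only one carry-propagate addition (sum + carry) remains after the tree. This is why the tree's output is a pair of vectors rather than a single number.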
Table 1
Booth coding logic table
B*[2n+1,2n-1]   2x   1x   0x   sign
000             0    0    1    0
001             0    1    0    0
010             0    1    0    0
011             1    0    0    0
100             1    0    0    1
101             0    1    0    1
110             0    1    0    1
111             0    0    1    1
(2x, 1x and 0x are the Booth coding coefficients; sign is the Booth coding sign.)
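Table 1 can be read as a radix-4 digit recoding. The sketch below converts each overlapping three-bit group into a digit in {-2, -1, 0, 1, 2}. It then checks that the digits, weighted by 4^n, give back the two's-complement value of the operand.

The names are hypothetical. The sketch takes the 111 row as 0x = 1, as in the PnA* table given for Fig. 2; the 0x flag does not change the digit value either way.

```python
# Table 1 as a lookup: (b[2n+1], b[2n], b[2n-1]) packed into 3 bits -> (2x, 1x, 0x, sign)
BOOTH_TABLE = {
    0b000: (0, 0, 1, 0), 0b001: (0, 1, 0, 0),
    0b010: (0, 1, 0, 0), 0b011: (1, 0, 0, 0),
    0b100: (1, 0, 0, 1), 0b101: (0, 1, 0, 1),
    0b110: (0, 1, 0, 1), 0b111: (0, 0, 1, 1),
}


def booth_digit(triplet):
    """Radix-4 Booth digit in {-2, -1, 0, 1, 2} encoded by the Table 1 flags."""
    two_x, one_x, _zero_x, sign = BOOTH_TABLE[triplet]
    magnitude = 2 * two_x + one_x      # 0x needs no term: it means magnitude 0
    return -magnitude if sign else magnitude


def booth_digits(b_star, s):
    """Booth digits of the s-bit two's-complement word b_star (s even), lowest group first."""
    padded = b_star << 1               # append the implicit bit b[-1] = 0
    return [booth_digit((padded >> (2 * n)) & 0b111) for n in range(s // 2)]


# -3 as a 4-bit word is 0b1101; its digits d0 + 4*d1 must give -3 back
digits = booth_digits(0b1101, 4)
print(digits, sum(d * 4 ** n for n, d in enumerate(digits)))  # [1, -1] -3
```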
To match the MAC to the DSP processor, the MAC is divided across several pipeline stages and executes over several clock cycles. This meets the DSP's high operating-frequency requirement.
The multiply-accumulate device in this embodiment comprises:
A pre-decode unit module 10. It takes as input the signals of the multi-bit multiplier input module 101, the multi-bit multiplicand input module 102, the multi-bit multiply-accumulate algorithm selection module 104, the square flag module 105 and the multiplication data type flag module 106. The selection and flag modules work as follows:
- Module 104 selects whether the current operation is a multiplication or a multiply-accumulate. It also selects the signedness of the multiplicand mltiplicand and the multiplier mltiplicator, and whether the current result needs to be rounded.
- The square flag module 105 selects whether the current operation is a squaring operation.
- The multiplication data type flag module 106 selects the data type of the current mltiplicand and mltiplicator.

This MAC supports integer multiplication (for example, 16.0 format) and fractional multiplication (for example, 1.15 format). In the embodiment, the binary point of integer data is placed after the least significant bit, so all bits of the data lie before the binary point. The binary point of fractional data is placed after the most significant bit, so every bit except the most significant one lies after the binary point.
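To make the two data formats concrete, here is a minimal sketch that reads the same bit pattern as a 16.0 integer and as a 1.15 fraction. It assumes signed two's-complement 16-bit words; the unsigned modes are not modeled. The helper names are hypothetical.

```python
def to_signed(word, bits=16):
    """Read a bits-wide word as a two's-complement integer."""
    word &= (1 << bits) - 1
    return word - (1 << bits) if word >> (bits - 1) else word


def as_integer(word):
    """16.0 format: the binary point sits after the least significant bit."""
    return to_signed(word)


def as_fraction(word):
    """1.15 format: the binary point sits right after the most significant (sign) bit."""
    return to_signed(word) / (1 << 15)


print(as_integer(0x4000), as_fraction(0x4000))  # 16384 0.5
print(as_integer(0x8000), as_fraction(0x8000))  # -32768 -1.0
```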
The logic in the pre-decode unit module 10 is the multiplication pre-processing logic module 103. As directed by the MF, SQUARE and MODE signals, it outputs the multi-bit operands A and B that take part in the multiplication. It also outputs their signedness bits sign_A and sign_B. The logic is as follows:
A = mltiplicand;
MF selects an unsigned multiplicand: sign_A = 1;
MF selects a signed multiplicand: sign_A = 0;
SQUARE high: B = mltiplicand;
SQUARE low: B = mltiplicator;
MF selects an unsigned multiplier: sign_B = 1;
MF selects a signed multiplier: sign_B = 0;
Module 103 also outputs four further signals: the multiply-accumulate enable signal accumulation_en, the rounding enable signal round_en, the mode select signal mode and the two-bit rounding flag R. R is determined as follows:
Rounding disabled: R = 00;
Rounding enabled: R = 10 in integer multiplication mode, R = 01 in fractional multiplication mode;
Reserved: R = 11.
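The pre-decode rules can be summarized as a small function. This is a hypothetical sketch. The boolean parameters `a_unsigned`, `b_unsigned`, `square`, `round_en` and `fractional` stand in for the MF, SQUARE and MODE encodings, whose bit layouts the text does not give. Only the names A, B, sign_A, sign_B, R, mltiplicand and mltiplicator come from the description.

```python
from dataclasses import dataclass


@dataclass
class PreDecoded:
    a: int
    b: int
    sign_a: int   # 1 = unsigned, 0 = signed (as in the rules above)
    sign_b: int
    r: int        # two-bit rounding flag R


def pre_decode(mltiplicand, mltiplicator, a_unsigned, b_unsigned,
               square, round_en, fractional):
    """Sketch of logic module 103; the boolean inputs stand in for MF/SQUARE/MODE."""
    a = mltiplicand
    b = mltiplicand if square else mltiplicator   # squaring reuses the same operand
    sign_a = 1 if a_unsigned else 0
    sign_b = 1 if b_unsigned else 0
    if not round_en:
        r = 0b00                   # rounding disabled
    elif fractional:
        r = 0b01                   # rounding enabled, fractional mode
    else:
        r = 0b10                   # rounding enabled, integer mode
    return PreDecoded(a, b, sign_a, sign_b, r)   # R = 0b11 (reserved) is never produced


print(pre_decode(7, 9, False, False, square=True, round_en=True, fractional=True))
# PreDecoded(a=7, b=7, sign_a=0, sign_b=0, r=1)
```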
A partial product generation unit module 20. It receives the multi-bit operands A and B and the corresponding signedness bits sign_A and sign_B output by the pre-decode unit module 10. Its sign extension logic module 201 takes B and sign_B as input and extends B according to sign_B to produce the multi-bit value B*, assumed to have s bits. The extension logic is:
B has an even number of bits (j = 2n, n = 0, 1, 2, ..., where j is the bit width of B; likewise below):
sign_B = 1: B* = {0, 0, B}, where { } denotes concatenation, i.e. B is extended by two 0 bits above its most significant bit (s = j + 2);
sign_B = 0: B* = {B[j-1], B[j-1], B}, where { } denotes concatenation and B[j-1] is the value of the most significant bit of B, i.e. B is extended by two copies of that bit above its most significant bit (s = j + 2);
B has an odd number of bits (j = 2n + 1, n = 0, 1, 2, ...):
sign_B = 1: B* = {0, B}, i.e. B is extended by one 0 bit above its most significant bit (s = j + 1);
sign_B = 0: B* = {B[j-1], B}, i.e. B is extended by one copy of its most significant bit (s = j + 1);
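A behavioral sketch of the sign-extension rules follows. It assumes B is passed as a non-negative j-bit pattern, and the function name is hypothetical. For a 16-bit B it yields an 18-bit B*, which is why a 16-bit multiplication produces nine Booth partial products.

```python
def sign_extend_b(b, j, sign_b):
    """Extend the j-bit operand B to B* following the rules of module 201.

    sign_b = 1 (unsigned): pad with 0 bits; sign_b = 0 (signed): replicate B[j-1].
    Two bits are added when j is even and one when j is odd, so s is always even.
    Returns (b_star, s).
    """
    extra = 2 if j % 2 == 0 else 1
    msb = (b >> (j - 1)) & 1                       # B[j-1]
    fill_bit = 0 if sign_b == 1 else msb
    fill = (1 << extra) - 1 if fill_bit else 0     # `extra` copies of the fill bit
    return (fill << j) | b, j + extra


print(hex(sign_extend_b(0x8000, 16, 1)[0]), hex(sign_extend_b(0x8000, 16, 0)[0]))
# prints: 0x8000 0x38000
```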
The partial product generation logic module 202 in unit 20 produces the partial product results. For this logic, the embodiment proposes a combined partial-product generation method that does not need to generate Booth coding coefficients. Using Figs. 2 and 3 as examples, the following paragraphs explain the principle of the proposed method. They also show how it differs from conventional separate generation of Booth coefficients and partial products.
Fig. 2 is a schematic diagram of conventional logic that generates Booth coding coefficients and partial products separately. The logic divides into two sub-modules, Booth_encoder and partial_product_gen:
- Booth_encoder is the Booth coding logic. It takes the three bits 2n-1 through 2n+1 of the multi-bit data B as input and produces three coefficient flags (2x, 1x and 0x) plus a sign output. In this way it obtains the Booth coding for each group of three consecutive bits of B that begins and ends at odd bit positions. This coding logic is shown in Table 1.
- partial_product_gen is the partial product generation logic. It takes the four outputs of Booth_encoder and the multi-bit data A (assumed to have k bits) as inputs. It uses the signals from Booth_encoder as select signals to process A and outputs a partial product PnA*.

The detailed logic is:
2x   1x   0x   sign   PnA*
0    0    1    0      0
0    1    0    0      {0,0,A}
1    0    0    0      {0,A[k-1],A}
0    0    1    1      0
0    1    0    1      {1,1,~A}
1    0    0    1      {1,~A[k-1],~A}
Here ~A denotes the one's complement (bitwise inversion) of the multi-bit data A.
The conventional flow, which generates the Booth coefficients and the partial products separately, can be summarized as:
case(B*[2n+1,2n-1]) → Booth coding → PnA*
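For comparison with the combined method, the conventional two-stage flow can be sketched as follows. Stage one produces the Booth_encoder flags. Stage two selects 0, A or 2A and applies the sign.

The model works on signed integer values rather than on the concatenated bit patterns listed above. The function names are hypothetical.

```python
# Table 1: packed bits (b[2n+1], b[2n], b[2n-1]) -> (2x, 1x, 0x, sign)
BOOTH_TABLE = {
    0b000: (0, 0, 1, 0), 0b001: (0, 1, 0, 0),
    0b010: (0, 1, 0, 0), 0b011: (1, 0, 0, 0),
    0b100: (1, 0, 0, 1), 0b101: (0, 1, 0, 1),
    0b110: (0, 1, 0, 1), 0b111: (0, 0, 1, 1),
}


def booth_encoder(triplet):
    """Stage 1 (Booth_encoder): produce the coefficient flags and the sign."""
    return BOOTH_TABLE[triplet]


def partial_product_gen(two_x, one_x, zero_x, sign, a):
    """Stage 2 (partial_product_gen): select 0, A or 2A from the flags, then apply the sign."""
    if zero_x:
        magnitude = 0
    elif two_x:
        magnitude = 2 * a
    else:
        magnitude = a              # 1x
    return -magnitude if sign else magnitude


def traditional_pp(triplet, a):
    """Two-stage flow: case(B*[2n+1,2n-1]) -> Booth coding -> PnA* (as a signed value)."""
    return partial_product_gen(*booth_encoder(triplet), a)


print([traditional_pp(t, 5) for t in range(8)])  # [0, 5, 5, 10, -10, -5, -5, 0]
```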
Fig. 3 is a schematic diagram of the combined partial-product generation logic proposed by the embodiment, which does not need to generate Booth coding coefficients. It divides into two sub-modules, case and Booth_partial_product_gen:
- case is the multiplexer logic. It takes the three bits 2n-1 through 2n+1 of the multi-bit data B as input and produces eight cases: 000, 001, 010, 011, 100, 101, 110 and 111.
- Booth_partial_product_gen is the combined Booth coding and partial product generation logic. It takes the eight cases output by case as input, processes the multi-bit data A (assumed to have k bits) accordingly, and outputs a partial product PnA*.

The detailed logic is shown in Table 2:
Table 2 is the correspondence table of the combined partial-product generation logic, which does not need to generate Booth coding coefficients.
Table 2 Combined Booth coding and partial product generation logic
case(B*[2n+1,2n-1])   PnA*              sub_carry[n]
000, 111              0                 0
001, 010              {0,0,A}           0
011                   {0,A[k-1],A}      0
100                   {1,~A[k-1],~A}    1
101, 110              {1,1,~A}          1
Because several cases produce the same PnA*, the multiplexer logic case actually needs only five outputs.
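The combined mapping of Table 2 can likewise be sketched as a single lookup from the three-bit case to a partial product and its sub_carry bit. This hypothetical model emits each negative multiple as a one's complement inside a `width`-bit field plus sub_carry = 1. In our reading, that is why the separate carry is needed.

The model reproduces the sub_carry column of Table 2 and has exactly five distinct partial-product outputs (0, A, 2A, ~2A, ~A). It does not reproduce the exact leading bits of the PnA* entries.

```python
def combined_pp(triplet, a, width):
    """One-stage flow case(B*[2n+1,2n-1]) -> (PnA* bits, sub_carry[n]), no Booth flags.

    Negative multiples are emitted as a one's complement plus sub_carry = 1, so
    (bits + sub_carry) mod 2**width equals the signed multiple of A.
    """
    mask = (1 << width) - 1
    if triplet in (0b000, 0b111):
        return 0, 0                    # 0
    if triplet in (0b001, 0b010):
        return a & mask, 0             # +A
    if triplet == 0b011:
        return (2 * a) & mask, 0       # +2A
    if triplet == 0b100:
        return ~(2 * a) & mask, 1      # -2A = ~(2A) + 1
    return ~a & mask, 1                # 0b101, 0b110: -A = ~A + 1


expected = [0, 5, 5, 10, -10, -5, -5, 0]   # Booth digit times A, for A = 5
for t in range(8):
    bits, carry = combined_pp(t, 5, 8)
    assert (bits + carry) & 0xFF == expected[t] & 0xFF
print([combined_pp(t, 5, 8)[1] for t in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 0]
```

The design point this illustrates is that no intermediate 2x/1x/0x/sign flags exist. The case value selects the output directly, which is the stage the combined method removes.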
The flow of the combined partial-product generation logic proposed by the embodiment, which does not need Booth coding coefficients, can be summarized as:
case(B*[2n+1,2n-1])→PnA*
The method proposed by the embodiment therefore omits the Booth coding step. It maps the multi-bit input B (the multiplier) directly to the partial product PnA*, minimizing the circuit implementation while preserving functionality.
The partial product generation logic partial_generator in unit 20 must also produce the partial-product carry result sub_carry. The combined generation logic proposed by the embodiment meets this requirement at the same time; the detailed logic is also shown in Table 2.
A Wallace-tree adder unit module 30, which comprises a Wallace-tree adder module 301. Its inputs are:
- the multi-bit partial product results A*0, A*1, ..., A*m-2, A*m-1, A*m output by the partial product generation unit module 20, where m depends on the width of B* (m = s/2);
- the multi-bit carry result sub_carry;
- the two-bit rounding flag R and the mode select signal mode, output by the pre-decode unit module 10.

The unit contains a Wallace_tree logic block. For this logic, the embodiment proposes using the Wallace tree to handle rounding, with the rounding operation moved earlier. Taking Fig. 4 as an example, this description explains how the proposed method handles rounding inside the Wallace tree.
Fig. 4 shows the Wallace-tree adder logic for a 16-bit multiplication. A 16-bit multiplication produces nine partial products, which are stacked with successive offsets to form the Wallace tree. Viewed as a matrix, each vertical column of this tree consists of particular bits from several partial products. A 3:2 full adder sums every three bits in a column, producing one sum bit and one carry bit. By combining adders in this way, four levels of full adders suffice to add the nine partial products, finally producing one sum vector and one carry vector.

The embodiment extends the ninth partial product by two bits to the right of its least significant bit, without increasing the number of full-adder levels the addition requires. These two bits hold exactly the two-bit rounding flag R. The rounding operation is thus moved from the final result generation unit into the Wallace-tree adder logic, without affecting the Wallace-tree addition. This saves the extra adder that carry propagation would otherwise require in the final result generation unit, and lowers implementation cost while preserving functionality.
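The arithmetic behind placing R in the ninth partial product can be checked with a short model. The sketch assumes the following:
- B is a 16-bit operand (signed, or unsigned passed as 0 to 65535) extended to an 18-bit B*;
- partial product n carries weight 4^n;
- the two extra bits sit immediately to the right of the ninth row's least significant bit, i.e. at bit positions 15 and 14.

Under these assumptions, those positions are empty in the ninth row, so R is added without a tenth row: R = 10 adds 2^15 and R = 01 adds 2^14. Rows are Python integers rather than fixed-width vectors, and the function names are hypothetical.

```python
def booth_digits_16(b):
    """Nine radix-4 Booth digits of a 16-bit b, read from the 18-bit B*."""
    padded = (b & 0x3FFFF) << 1        # B* as 18 bits, plus the implicit b[-1] = 0
    digits = []
    for n in range(9):
        t = (padded >> (2 * n)) & 0b111
        # Booth digit = -2*b[2n+1] + b[2n] + b[2n-1] (the same values as Table 1)
        digits.append(-2 * (t >> 2) + ((t >> 1) & 1) + (t & 1))
    return digits


def rows_with_rounding(a, b, r):
    """Partial-product rows (as integers) for a*b, with R placed in the ninth row."""
    rows = [d * a * 4 ** n for n, d in enumerate(booth_digits_16(b))]
    # row 8 is weighted by 4**8 = 2**16, so its bit positions 15 and 14 are empty
    assert rows[8] & 0xFFFF == 0
    rows[8] |= r << 14                 # R = 0b10 adds 2**15, R = 0b01 adds 2**14
    return rows


a, b, r = -12345, 30000, 0b10
rows = rows_with_rounding(a, b, r)
print(len(rows), sum(rows) == a * b + (r << 14))   # 9 True
```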
The Wallace tree adder logic module 301 also contains a data-type processing logic. It determines the final sum and carry words according to whether the mode select signal mode is true or false:
mode = 1: the sum and carry words are the two Wallace tree outputs, unchanged;
mode = 0: the sum and carry words are the two Wallace tree outputs, each shifted left by one bit.
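Below is a minimal sketch of the data-type processing logic. It assumes a 40-bit datapath, which is the width the Fig. 5 example uses for the accumulation result. Shifting both the sum and carry words left by one bit doubles their total, so the pair still represents a single (doubled) value.

The patent does not say here which data type mode = 0 denotes. Many DSPs use a one-bit left shift for fractional (for example Q15) products to drop the redundant sign bit, but reading mode = 0 this way is an assumption.

```python
WIDTH = 40  # assumed datapath width


def apply_mode(sum_word, carry_word, mode, width=WIDTH):
    """Data-type processing after the Wallace tree (sketch).

    mode == 1: sum and carry words pass through unchanged.
    mode == 0: both words are shifted left by one bit.
    """
    mask = (1 << width) - 1
    if mode:
        return sum_word & mask, carry_word & mask
    return (sum_word << 1) & mask, (carry_word << 1) & mask


s, c = apply_mode(0x1234, 0x0F0F, mode=0)
print(hex(s + c), hex((0x1234 + 0x0F0F) << 1))  # both print 0x4286
```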
The accumulation unit module 40 receives four inputs:
- the multi-bit sum word sum and carry word carry output by the Wallace tree adder logic module 301;
- the accumulation enable signal accumulation_en output by the pre-decode unit module 10;
- the previous result product of the MAC device, output by the final result selection logic module 501.

From these it produces the multi-bit accumulation result accu_product.
It comprises two sub-logic modules. The MUX logic module 402 is a 2-to-1 selector. It receives the previous MAC result product from the final result selection logic module 501, plus a zero value. Under control of the accumulation enable signal accumulation_en, it passes one of the two to the multi-bit output mux_product:
accumulation_en=1: mux_product=product;
accumulation_en=0: mux_product=0;
The Accumulator is the accumulation logic module 401. It receives the multi-bit sum word sum and carry word carry from the Wallace tree adder logic module 301, together with the multi-bit output mux_product of the 2-to-1 selector logic module 402, and produces the multi-bit result accu_product.
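The behaviour of the accumulation unit can be sketched as follows, under stated assumptions:
- The function names (`mux_402`, `accumulator_401`, `accumulate_unit_40`) are hypothetical labels mapped to the module numbers in the text.
- The 40-bit width is taken from the Fig. 5 example.
- Module 401 is modelled as a plain integer addition. In hardware, a carry-save stage followed by a carry-propagate adder would be typical, but the patent does not detail this.

```python
WIDTH = 40  # assumed accumulator width, taken from the Fig. 5 example


def mux_402(product, accumulation_en):
    """2-to-1 selector: previous result when accumulating, else zero."""
    return product if accumulation_en else 0


def accumulator_401(sum_word, carry_word, mux_product, width=WIDTH):
    """Adds the Wallace-tree sum/carry pair and the selected feedback value."""
    return (sum_word + carry_word + mux_product) & ((1 << width) - 1)


def accumulate_unit_40(sum_word, carry_word, product, accumulation_en):
    mux_product = mux_402(product, accumulation_en)
    return accumulator_401(sum_word, carry_word, mux_product)


# Multiply only (accumulation_en = 0) versus multiply-accumulate (accumulation_en = 1)
print(accumulate_unit_40(10, 5, 100, 0))  # 15
print(accumulate_unit_40(10, 5, 100, 1))  # 115
```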
The final result unit module 50 comprises a final result selection logic module 501. This module receives the multi-bit accumulation result accu_product from the accumulation unit module 40 and produces the final MAC result final_product_generator. Its logic depends on the method proposed earlier in this embodiment, in which rounding is handled inside the Wallace tree (moved forward). Fig. 5 takes a 16-bit multiplication or multiply-accumulate as its example and assumes a 40-bit final accumulation result to show how this is implemented.
In Fig. 5, the decision logic receives the multi-bit accumulation result accumulator. It tests whether accumulator[16:0] == 17'h10000 and uses the answer to select the output from two candidates, {accumulator[39:17], 17'b0} and {accumulator[39:16], 16'b0}, as follows:
accumulator[16:0] == 17'h10000 is true: this embodiment uses unbiased (round-half-to-even) rounding. Before the R flags were added in the Wallace tree, the original result would have been accumulator[16:0] = 17'h08000. This is the halfway case in which the kept least significant bit (accumulator[16]) is even, so the value must be rounded down. The lower sixteen bits are therefore discarded, and the rounding carry that set accumulator[16] is cleared as well, giving {accumulator[39:17], 17'b0}. The final result is as shown by the YES branch in Fig. 4.
accumulator[16:0] == 17'h10000 is false: bit 15 of the original result before the rounding flags were added, accumulator[15], may be 1 (round up, carrying into bit 16) or 0 (round down by truncation). In either case {accumulator[39:16], 16'b0} is already the correctly rounded result, as shown by the NO branch in Fig. 4.
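The selection rule can be checked numerically against a round-half-to-even reference. The sketch below makes two assumptions. First, the R flags inject 0x8000 (half of one unit in bit 16) into the accumulation result. Second, the result is 40 bits wide. The names `final_select` and `round_half_even_ref` are hypothetical.

```python
MASK40 = (1 << 40) - 1
ROUND_BIT = 1 << 15  # assumed value of the R flags folded into the Wallace tree


def final_select(accumulator):
    """Final result selection (sketch of module 501)."""
    if (accumulator & 0x1FFFF) == 0x10000:
        # Halfway case with an even kept LSB: drop the rounding carry too.
        return accumulator & ~0x1FFFF & MASK40  # {accumulator[39:17], 17'b0}
    return accumulator & ~0xFFFF & MASK40  # {accumulator[39:16], 16'b0}


def round_half_even_ref(x):
    """Reference: round x to a multiple of 2**16, ties to even."""
    q, r = divmod(x, 1 << 16)
    if r > 0x8000 or (r == 0x8000 and q & 1):
        q += 1
    return (q << 16) & MASK40


# Compare against the reference, including exact halfway cases.
ok = all(
    final_select((q * 0x10000 + r + ROUND_BIT) & MASK40)
    == round_half_even_ref(q * 0x10000 + r)
    for q in range(16)
    for r in (0x0000, 0x0001, 0x7FFF, 0x8000, 0x8001, 0xFFFF)
)
print(ok)  # True
```

Round half up (adding 0x8000 early) leaves only one case that differs from round half to even: an exact tie with an even kept bit. That is the single pattern the comparison in Fig. 5 detects.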
This embodiment proposes a method for matching the MAC functional unit to the pipeline stages of a DSP processor. Using Fig. 6 as an example, this specification describes three things: how the MAC function is divided, how its functional units are combined, and how these combinations match the DSP pipeline.
Assume the DSP processor performs multiplication or multiply-accumulate in the EX (execute) stage. Because of the physical constraints of the MAC unit, it is difficult to complete the operation within one DSP clock cycle. The DSP processor in this embodiment therefore uses an extended EX stage structure. The EX pipeline stage is elastic: it automatically adapts to the number of clock cycles the functional module needs.
This embodiment divides the MAC unit into five sequential functional units. Several consecutive functional units can therefore be combined. For each combination, the critical-path delay of its circuit is compared with the delay each DSP pipeline stage allows. On that basis, the MAC functional units and their combinations are distributed evenly across the pipeline stages. This trial-based method of matching combinations of MAC functional units to the DSP pipeline helps balance the delay of each stage, and so yields a balanced DSP processor design.

The method can be extended further. Each functional unit can be subdivided into several consecutive sub-logic modules. Combinations of sub-logic modules, within and across functional units, can then be tried against the DSP pipeline. This lets the processor meet higher operating-frequency requirements and balances the pipeline stages even better.

Taking Fig. 5 as an example, this embodiment considers the stage delay requirement of the target DSP processor and tries different combinations of functional units against the DSP pipeline. On that basis, the MAC unit is divided into two parts, MAC_in_EX1 and MAC_in_EX2, with roughly equal circuit delay:
- MAC_in_EX1 comprises three functional units: the pre-decode unit module 10, the partial product generation unit module 20 and the Wallace tree adder unit module 30. It executes in the EX1 pipeline stage.
- MAC_in_EX2 comprises two functional units: the accumulation unit module 40 and the final result unit module 50. It executes in the EX2 pipeline stage.

The results of both parts are latched by the clock at each pipeline-stage interface, so the delay of one stage does not spill over into the next. For consecutive MAC operations, the EX2-to-EX1 feedback mechanism allows two MAC operations to complete in two consecutive clock cycles. By exploiting the pipeline, this is equivalent to a single-cycle MAC operation, which gives an optimal match between the MAC module and the DSP system architecture.
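The two ideas in the preceding paragraphs can be sketched in Python. The first is the trial-based split of the five ordered units into contiguous stage groups. The second is the throughput of the resulting two-stage pipeline.

Several parts of the sketch are assumptions:
- The unit delays are hypothetical numbers, not figures from the patent.
- `best_partition` uses exhaustive search over cut points as one simple way to "try" combinations.
- `simulate_two_stage_mac` collapses EX1's sum/carry output into a single latched product.
- The feedback path is simplified to an accumulator register read back in EX2.

```python
from itertools import combinations


def best_partition(delays, stages):
    """Try every way of cutting the ordered unit list into `stages` contiguous
    groups and keep the cut that minimises the slowest stage."""
    n = len(delays)
    best = None
    for cuts in combinations(range(1, n), stages - 1):
        bounds = (0, *cuts, n)
        worst = max(sum(delays[lo:hi]) for lo, hi in zip(bounds, bounds[1:]))
        if best is None or worst < best[0]:
            best = (worst, bounds)
    return best


def simulate_two_stage_mac(pairs, width=40):
    """Cycle-level model: EX1 latches a product, EX2 adds it to the accumulator."""
    mask = (1 << width) - 1
    ex1_reg = None  # pipeline register at the EX1/EX2 interface
    acc = 0  # accumulator register fed back for the next MAC
    cycles = 0
    i = 0
    while i < len(pairs) or ex1_reg is not None:
        cycles += 1
        if ex1_reg is not None:  # EX2 consumes last cycle's EX1 result
            acc = (acc + ex1_reg) & mask
        if i < len(pairs):  # EX1 starts the next operation
            a, b = pairs[i]
            ex1_reg = a * b
            i += 1
        else:
            ex1_reg = None
    return acc, cycles


# Hypothetical unit delays (ps) for units 10, 20, 30, 40, 50.
print(best_partition([100, 200, 300, 250, 350], 2))  # (600, (0, 3, 5))
print(simulate_two_stage_mac([(1, 2), (3, 4), (5, 6)]))  # (44, 4)
```

With these made-up delays, the best cut falls after the third unit, the same grouping as MAC_in_EX1 and MAC_in_EX2. N back-to-back MACs finish in N + 1 cycles, which amounts to one MAC per cycle in steady state.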
Finally, it should be noted that the above are only specific embodiments of the invention. The invention is not limited to these embodiments, and many variations are possible. Any variation that a person of ordinary skill in the art can directly derive or conceive from this disclosure shall be considered within the scope of protection of the invention.

Claims (5)

1. A multiply-accumulate device, characterized by comprising: a pre-decode unit module (10), a partial product generation unit module (20), a Wallace tree adder unit module (30), an accumulation unit module (40) and a final result unit module (50), wherein the pre-decode unit module (10), the partial product generation unit module (20), the Wallace tree adder unit module (30), the accumulation unit module (40) and the final result unit module (50) are connected sequentially in that order;
the pre-decode unit module (10) comprises a multi-bit multiplier input module (101), a multi-bit multiplicand input module (102), a multi-bit multiply-accumulate operation selection module (104), a square flag bit module (105), a multiplication data type flag bit module (106) and a pre-decode logic module (103), and the multi-bit multiplier input module (101), the multi-bit multiplicand input module (102), the multi-bit multiply-accumulate operation selection module (104), the square flag bit module (105) and the multiplication data type flag bit module (106) are each connected to the pre-decode logic module (103);
the multi-bit multiply-accumulate operation selection module (104) is used to select whether the current operation is a multiplication or a multiply-accumulate, to select the signedness of the multiplier and multiplicand, and to select whether the result of the current operation needs rounding;
the square flag bit module (105) is used to select whether the current operation is a squaring multiplication or a squaring multiply-accumulate;
the multiplication data type flag bit module (106) is used to select the data type of the current multiplier and multiplicand;
the function of the pre-decode logic module (103) is to output the multi-bit data participating in the multiplication as directed by the MF, SQUARE and MODE signals, and at the same time to output their sign-type bits; wherein the multi-bit multiply-accumulate function signal MF is used to select whether the current operation is a multiplication or a multiply-accumulate, to select the signedness of the multiplier and multiplicand, and to indicate whether the current operation needs rounding; the square flag signal SQUARE is used to indicate whether the current operation is a squaring operation or a multiply-accumulate; and the multiplication data type flag signal MODE is used to select the data type of the current multiplier and multiplicand.
2. The multiply-accumulate device according to claim 1, characterized in that the partial product generation unit module (20) comprises a sign extension logic module (201) and a partial product generation logic module (202); the sign extension logic module (201) and the partial product generation logic module (202) are connected to each other, and each is also connected to the pre-decode logic module (103) in the pre-decode unit module (10);
the function of the sign extension logic module (201) is to take as input one of the multi-bit data output by the pre-decode unit module (10) together with the sign-type bit corresponding to that data, to extend the data according to the sign-type bit, and to output the extended multi-bit data.
3. The multiply-accumulate device according to claim 2, characterized in that the Wallace tree adder unit module (30) comprises a Wallace tree adder logic module (301), which is connected to the partial product generation logic module (202) through multiple paths and to the pre-decode logic module (103) through two paths;
the Wallace tree adder logic module (301) is used to receive the multi-bit partial product results and the multi-bit carry result output by the partial product generation unit module (20), and at the same time to receive the two rounding flag signals and the mode select signal output by the pre-decode unit module (10); the Wallace tree adder unit module (30) further comprises a data-type processing logic, which determines the final sum and carry words according to whether the mode select signal is true or false.
4. The multiply-accumulate device according to claim 3, characterized in that the accumulation unit module (40) comprises an accumulation logic module (401) and a 2-to-1 selector logic module (402), wherein the accumulation logic module (401) is connected to the Wallace tree adder logic module (301) through two paths, and the 2-to-1 selector logic module (402) is connected to the pre-decode logic module (103) and to the accumulation logic module (401);
the accumulation logic module (401) is used to receive the multi-bit sum and carry words output by the Wallace tree adder unit module (30), and at the same time to receive the multi-bit output data of the 2-to-1 selector logic module (402), producing multi-bit output result data;
the 2-to-1 selector logic module (402) is used to receive the previous result of the multiply-accumulate device output by the final result unit module (50) and a zero value, and to select one of the two, according to the accumulation enable signal, as its multi-bit output data.
5. The multiply-accumulate device according to claim 4, characterized in that the final result unit module (50) comprises a final result selection logic module (501), which is connected to the accumulation logic module (401), the 2-to-1 selector logic module (402) and the pre-decode logic module (103);
the final result selection logic module (501) is used to receive the multi-bit accumulation result output by the accumulation unit module (40) and to produce the final result of the multiply-accumulate device.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100844834A CN100363885C (en) 2004-11-19 2004-11-19 Multiply and accumulate device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004100844834A CN100363885C (en) 2004-11-19 2004-11-19 Multiply and accumulate device

Publications (2)

Publication Number Publication Date
CN1632740A CN1632740A (en) 2005-06-29
CN100363885C true CN100363885C (en) 2008-01-23

Family

ID=34847351

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100844834A Expired - Fee Related CN100363885C (en) 2004-11-19 2004-11-19 Multiply and accumulate device

Country Status (1)

Country Link
CN (1) CN100363885C (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100465877C (en) * 2006-12-01 2009-03-04 Zhejiang University High-speed split multiply-accumulator apparatus
CN103677739B (en) * 2013-11-28 2016-08-17 No. 771 Research Institute of the Ninth Academy of China Aerospace Science and Technology Corporation (中国航天科技集团公司第九研究院第七七一研究所) Configurable multiply-accumulate arithmetic unit and multiply-accumulate array composed thereof
CN103984520A (en) * 2014-04-22 2014-08-13 Zhejiang University Self-adjusting multiply-accumulate device for lossless audio decoding algorithms
CN106897046B (en) * 2017-01-24 2019-04-23 Qingdao ASIC Design Engineering Technology Research Center (青岛专用集成电路设计工程技术研究中心) Fixed-point multiply-accumulator
CN108108150B (en) * 2017-12-19 2021-11-16 Unisound Intelligent Technology Co., Ltd. (云知声智能科技股份有限公司) Multiply-accumulate operation method and device
CN109634556B (en) * 2018-11-06 2021-04-23 极芯通讯技术(南京)有限公司 Multiply-accumulator and accumulation output method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5777915A (en) * 1993-05-21 1998-07-07 Deutsche Itt Industries Gmbh Multiplier apparatus and method for real or complex numbers
WO2001048595A1 (en) * 1999-12-23 2001-07-05 Intel Corporation Processing multiply-accumulate operations in a single cycle
WO2001063398A2 (en) * 2000-02-26 2001-08-30 Qualcomm Incorporated Digital signal processor with coupled multiply-accumulate units
US20020194240A1 (en) * 2001-06-04 2002-12-19 Intel Corporation Floating point multiply accumulator

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
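A synthesizable high-speed 16×16-bit signed/unsigned multiplier implemented at RTL. Shi Bi, Cheng Weizong, He Xiaoxiong. Electronic Engineer (电子工程师), Vol. 29, No. 6. 2003 *
Implementation of a 16×16-bit high-speed, low-power parallel multiplier. Xu Feng, Shao Bingxian. Microelectronics (微电子学), Vol. 33, No. 1. 2003 *
Design of a double-precision floating-point multiplier. He Jing, Han Yueqiu. Microelectronics (微电子学), Vol. 33, No. 4. 2003 *
A novel DSP instruction architecture and data path. Jiang Xiaobo, Chen Jie, Qiu Yulin. Electron Devices (电子器件), Vol. 27, No. 1. 2004 *
A parallel multiply-accumulator architecture based on reconfiguration technology. Li Ying, Chen Jie. Microelectronics & Computer (微电子学与计算机), Vol. 21, No. 3. 2004 *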

Also Published As

Publication number Publication date
CN1632740A (en) 2005-06-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20080123

Termination date: 20101119