CN109634558A - Programmable mixed-precision arithmetic element - Google Patents

Info

Publication number
CN109634558A
Authority
CN
China
Prior art keywords
precision
extended
adder
numerical value
product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811514918.2A
Other languages
Chinese (zh)
Other versions
CN109634558B (en)
Inventor
刘彦
赵立东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Suiyuan Technology Co ltd
Original Assignee
Shanghai Suiyuan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Suiyuan Technology Co Ltd filed Critical Shanghai Suiyuan Technology Co Ltd
Priority to CN201811514918.2A priority Critical patent/CN109634558B/en
Publication of CN109634558A publication Critical patent/CN109634558A/en
Application granted granted Critical
Publication of CN109634558B publication Critical patent/CN109634558B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G06F7/53 Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting
    • G06F7/505 Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00 Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/38 Indexing scheme relating to groups G06F7/38 - G06F7/575
    • G06F2207/3804 Details
    • G06F2207/3808 Details concerning the type of numbers or the way they are handled
    • G06F2207/3812 Devices capable of handling different types of numbers
    • G06F2207/3824 Accepting both fixed-point and floating-point numbers

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

This application provides a programmable mixed-precision arithmetic unit that supports floating-point or fixed-point multiply and/or add operations at multiple precisions. It can perform several low-precision operations concurrently, or operate as a whole to perform one high-precision operation, and therefore achieves a higher energy efficiency ratio.

Description

Programmable mixed precision arithmetic unit
Technical Field
The application relates to the field of electronic information, and in particular to a programmable mixed-precision arithmetic unit.
Background
Deep neural networks are widely applied in the field of artificial intelligence, and their application scenarios fall roughly into two types: training and inference. Inference algorithms have relatively low requirements on arithmetic precision and mostly use 8-bit or 16-bit fixed-point precision; most training algorithms require 16-bit or 32-bit floating-point precision.
Existing arithmetic units either support only 8-bit or 16-bit fixed-point arithmetic and are therefore suitable only for inference, or support floating-point arithmetic and are suitable for both training and inference, but at high hardware cost and high energy consumption, so their energy efficiency ratio is low in inference scenarios.
Disclosure of Invention
The application provides a programmable mixed-precision arithmetic unit that aims to support both fixed-point and floating-point operations while achieving a high energy efficiency ratio.
In order to achieve the above object, the present application provides the following technical solutions:
a programmable mixed-precision arithmetic unit, comprising:
four extended half-precision multipliers and four extended single-precision adders;
any one of the extended half-precision multipliers is used for extending an input numerical value to X bits and calculating the product of a first numerical value and a second numerical value, wherein the first numerical value is the high-order part or the low-order part of the extended value of one input numerical value, the second numerical value is the high-order part or the low-order part of the extended value of the other input numerical value, and X is a preset numerical value;
any one of the extended single-precision adders is used for extending an input numerical value to Y bits and calculating the sum of the extended numerical values, wherein Y is a preset numerical value;
wherein the four extended half-precision multipliers and the four extended single-precision adders are connected in a first manner or a second manner;
the first mode is as follows: the four extended half-precision multipliers and the four extended single-precision adders are correspondingly cascaded one by one to form four parallel half-precision multiply-add devices;
the second mode is as follows: the first extended single-precision adder is respectively cascaded with the first extended half-precision multiplier and the second extended half-precision multiplier;
the second extended single-precision adder is respectively cascaded with the third extended half-precision multiplier and the fourth extended half-precision multiplier;
and the third extended single-precision adder is respectively cascaded with the first extended single-precision adder and the second extended single-precision adder.
Optionally, the extended half-precision multiplier includes:
a single-precision exponent multiplier and an extended half-precision mantissa multiplier which are connected in parallel;
the extended half-precision mantissa multiplier is configured to extend an input value to X bits and calculate a product of the first value and the second value.
Optionally, the extended single-precision adder includes:
a single-precision exponent adder and an extended single-precision mantissa adder which are connected in parallel;
the extended single precision mantissa adder is used to extend an input numerical value to Y bits and calculate the sum of the extended numerical values.
A programmable mixed-precision arithmetic unit, comprising:
four extended single precision multipliers and four extended double precision adders;
the extended single-precision multiplier is the programmable mixed-precision operation unit of any one of the preceding items;
the extended double-precision adder is used for extending an input numerical value to M bits and calculating the sum of the extended numerical values, wherein M is a preset numerical value;
wherein the four extended single-precision multipliers and the four extended double-precision adders are connected in a first manner or a second manner;
the first mode is as follows: the four extended single-precision multipliers and the four extended double-precision adders are correspondingly cascaded one by one to form four single-precision multiply-add devices connected in parallel;
the second mode is as follows: the first extended double-precision adder is respectively cascaded with the first extended single-precision multiplier and the second extended single-precision multiplier;
the second extended double-precision adder is respectively cascaded with the third extended single-precision multiplier and the fourth extended single-precision multiplier;
the third extended double-precision adder is respectively cascaded with the first extended double-precision adder and the second extended double-precision adder.
Optionally, the extended double-precision adder includes:
a double-precision exponent adder and an extended double-precision mantissa adder which are connected in parallel;
the extended double-precision mantissa adder is used for extending an input numerical value to M bits and calculating the sum of the extended numerical values.
A programmable mixed-precision arithmetic unit, comprising:
programmable mixed-precision arithmetic units according to any of the preceding items, connected in parallel.
A programmable mixed-precision arithmetic unit, comprising:
four extended half-precision multipliers and three extended single-precision adders;
the first extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating MSBa*MSBb to obtain a first product;
the second extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating MSBa*LSBb to obtain a second product;
the third extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating LSBa*MSBb to obtain a third product;
the fourth extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating LSBa*LSBb to obtain a fourth product; the input numerical values are a first numerical value and a second numerical value, MSBa is the high-order part of the extended first numerical value, MSBb is the high-order part of the extended second numerical value, LSBa is the low-order part of the extended first numerical value, and LSBb is the low-order part of the extended second numerical value;
the first extended single-precision adder is used for extending the first product and the second product to Y bits and then calculating the sum of the extended first product and the extended second product to obtain a first addition result;
the second extended single-precision adder is used for extending the third product and the fourth product to Y bits, and then calculating the sum of the extended third product and the extended fourth product to obtain a second addition result;
and the third extended single-precision adder is used for calculating the sum of the extended first addition result and the extended second addition result after the first addition result and the second addition result are extended to Y bits.
Optionally, the unit further includes:
and a fourth extended single-precision adder, configured to extend the two single-precision values to Y bits, and then calculate a sum of the two extended single-precision values, where any one of the single-precision values is the sum of the first addition result and the second addition result.
A programmable mixed-precision arithmetic unit, comprising:
four extended single precision multipliers and three extended double precision adders;
the extended single-precision multiplier is used for realizing the function of the programmable mixed-precision arithmetic unit;
the first extended double-precision adder is used for extending the first product and the second product to M bits, and then calculating the sum of the extended first product and the extended second product to obtain a first addition result; the first product is an output result of a first extended single-precision multiplier, and the second product is an output result of a second extended single-precision multiplier;
the second extended double-precision adder is used for extending the third product and the fourth product to M bits, and then calculating the sum of the extended third product and the extended fourth product to obtain a second addition result; the third product is an output result of a third extended single-precision multiplier, and the fourth product is an output result of a fourth extended single-precision multiplier;
and the third extended double-precision adder is used for calculating the sum of the extended first addition result and the second addition result after the first addition result and the second addition result are extended to M bits.
Optionally, the unit further includes:
and a fourth extended double-precision adder, configured to extend the two double-precision values to M bits, and calculate a sum of the two extended double-precision values, where any one of the double-precision values is the sum of the first addition result and the second addition result.
The programmable mixed precision operation unit can support floating point or fixed point multiplication and/or addition operation of various precisions, can realize multipath concurrent low-precision operation, and can realize high-precision operation integrally, so that the programmable mixed precision operation unit has higher energy efficiency ratio.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic structural diagram of a programmable mixed-precision computing unit according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of another programmable mixed-precision computing unit disclosed in the embodiments of the present application;
FIG. 3 is a schematic diagram illustrating the different operations implemented by switching the connection mode of the programmable mixed-precision arithmetic unit disclosed in the embodiments of the present application;
FIG. 4 is a schematic structural diagram of another programmable mixed-precision computing unit disclosed in the embodiments of the present application;
FIG. 5 is a schematic structural diagram of another programmable mixed-precision computing unit disclosed in the embodiments of the present application;
FIG. 6 is a schematic structural diagram of another programmable mixed-precision arithmetic unit disclosed in the embodiments of the present application.
Detailed Description
The programmable mixed-precision arithmetic unit disclosed in the embodiments of the present application can be applied to, but is not limited to, deep neural networks, and in terms of operation type is suitable for both training and inference. In terms of hardware, it can be provided in general-purpose central processing units (CPUs, such as Intel/AMD x86 CPUs), graphics processors (GPUs, such as the NVIDIA V100), neural network processors (such as the Google TPU), field programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs).
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
FIG. 1 shows a programmable mixed-precision arithmetic unit disclosed in an embodiment of the present application, including: four extended half-precision multipliers 1 and four extended single-precision adders 2.
Any one of the extended half-precision multipliers is used for extending an input value to X bits and calculating a product of a first value and a second value, wherein the first value is a higher value or a lower value of the extended value of one input value, the second value is a higher value or a lower value of the extended value of the other input value, and X is a preset value.
Any one of the extended single-precision adders is used for extending an input numerical value to Y bits and calculating the sum of the extended numerical values, wherein Y is a preset numerical value.
In the present embodiment, "extended" means that the component is able to extend the number of bits of a numerical value to a preset width; if the input numerical value already has the preset width, no extension is performed.
Specifically, any one of the extended half-precision multipliers 1 includes a single-precision exponent multiplier 11 and an extended half-precision mantissa multiplier 12 connected in parallel.
Wherein the extended half-precision mantissa multiplier is configured to extend an input value to X bits and calculate a product of the first value and the second value.
Specifically, the value range of X includes: 1. 16 to 22 bits, where the bits beyond the 16-bit input width are used to maintain the precision of intermediate results, without affecting the implementation of the invention; 2. 11 to 15 bits, which, compared with range 1, only lacks support for 32-bit fixed-point computation and is also covered by the present invention.
Specifically, the single-precision exponent multiplier is used for calculating the product of the exponents of single-precision floating-point numbers. It supports multiplication of single-precision values and is downward compatible with multiplication of half-precision values.
Based on this structure, the extended half-precision multiplier can implement multiplication of half-precision floating-point numbers and multiplication of 8-bit or 16-bit fixed-point numbers. For example:
For half-precision floating-point multiplication, assume two half-precision floating-point numbers C and D:
C = 1.Mc*2^Ec, D = 1.Md*2^Ed, where 1.Mc is the mantissa of C, 2^Ec is the exponent of C, 1.Md is the mantissa of D, and 2^Ed is the exponent of D.
The product of C and D is then: C*D = (1.Mc*1.Md)*2^(Ec+Ed).
1.Mc and 1.Md are each extended from 11 bits to 16 bits by low-bit extension (zero padding). After extension, 1.Mc is:
1.Mc' = Mc*2^-15.
1.Md after extension is:
1.Md' = Md*2^-15.
Here Mc and Md denote the 16-bit extended mantissas read as fixed-point integers.
Let X = C*D = 1.Mx*2^Ex. Then:
1) If Mc*Md*2^-30 >= 2:
1.Mx = Mc*Md*2^-31 (Mx rounded to 10 bits)
Ex = Ec+Ed+1
2) If Mc*Md*2^-30 < 2:
1.Mx = Mc*Md*2^-30 (Mx rounded to 10 bits)
Ex = Ec+Ed.
The multiplication of two half-precision floating-point numbers therefore consists of a multiplication, a shift, and an addition: the exponent addition can be performed by the single-precision exponent multiplier, the mantissa multiplication by the extended half-precision mantissa multiplier, and the shift by a conventional shift module.
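The case analysis above can be sketched in software as follows. This is a minimal illustration rather than the circuit of the application: the function names, the use of biased exponents, round-to-nearest-even, and the restriction to normal inputs and results (no zeros, subnormals, infinities, or NaNs) are all assumptions made for the example.

```python
import struct

def to_bits(x):
    """Return the 16-bit half-precision pattern of a Python float."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def from_bits(bits):
    """Return the Python float encoded by a 16-bit half-precision pattern."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def fp16_mul(c_bits, d_bits):
    """Multiply two normal half-precision numbers given as bit patterns.

    Assumption: inputs and result are normal numbers; rounding is
    round-to-nearest-even. Names are hypothetical.
    """
    sc, ec, fc = c_bits >> 15, (c_bits >> 10) & 0x1F, c_bits & 0x3FF
    sd, ed, fd = d_bits >> 15, (d_bits >> 10) & 0x1F, d_bits & 0x3FF

    # 11-bit mantissa 1.M, zero-padded on the low side to 16 bits (Mc, Md).
    mc = ((1 << 10) | fc) << 5
    md = ((1 << 10) | fd) << 5

    prod = mc * md             # mantissa path: value is prod * 2^-30, in [1, 4)
    ex = ec + ed - 15          # exponent path: biased exponent of Ec + Ed

    if prod >> 31:             # case 1: Mc*Md*2^-30 >= 2
        shift = 21             # 1.Mx = prod * 2^-31, kept as an 11-bit integer
        ex += 1
    else:                      # case 2: Mc*Md*2^-30 < 2
        shift = 20             # 1.Mx = prod * 2^-30, kept as an 11-bit integer

    q = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (q & 1)):
        q += 1                 # round the fraction Mx to 10 bits
        if q >> 11:            # rounding carried out to 2.0: renormalize
            q >>= 1
            ex += 1

    return ((sc ^ sd) << 15) | (ex << 10) | (q & 0x3FF)
```

For example, `from_bits(fp16_mul(to_bits(1.5), to_bits(1.5)))` takes the first branch (the mantissa product is 2.25) and `from_bits(fp16_mul(to_bits(1.5), to_bits(2.5)))` takes the second (the mantissa product is 1.875).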
For 8-bit fixed-point multiplication, assume two 8-bit fixed-point numbers A and B. Let 1.Ma = A (sign-extended or zero-extended to 16 bits) and 1.Mb = B (extended likewise); then A*B = 1.Ma*1.Mb, of which the lower 16 bits are kept. This operation can be completed by a single extended half-precision mantissa multiplier.
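A minimal sketch of this 8-bit case, assuming two's-complement encoding for the signed variant; the function name and the `signed` flag are hypothetical and serve only to illustrate the extend, multiply, and truncate steps.

```python
def fixed8_mul(a, b, signed=True):
    """Multiply two 8-bit patterns on a 16 x 16 multiplier, keeping 16 bits.

    a, b: integers in 0..255 holding the raw 8-bit patterns.
    Assumption: signed values use two's complement.
    """
    if signed:
        # Sign-extend each 8-bit pattern to 16 bits.
        a16 = a | 0xFF00 if a & 0x80 else a
        b16 = b | 0xFF00 if b & 0x80 else b
    else:
        # Zero-extend: the upper 8 bits are already zero.
        a16, b16 = a, b
    # 16 x 16 multiplication, then keep the lower 16 bits of the product.
    return (a16 * b16) & 0xFFFF
```

With signed operands, `fixed8_mul(0xFD, 0x05)` (that is, -3 times 5) yields `0xFFF1`, the 16-bit two's-complement pattern of -15.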
Any extended single precision adder 2 includes: a single-precision exponent adder 21 and an extended single-precision mantissa adder 22 connected in parallel.
The extended single-precision mantissa adder is used for extending an input numerical value to Y bits and calculating the sum of the extended numerical values.
Specifically, the value range of Y includes: 1. 32 to 44 bits, where the bits beyond the 32-bit output width are used to maintain the precision of intermediate results, without affecting the implementation of the present invention; 2. 22 to 30 bits, which, compared with range 1, only lacks support for 32-bit fixed-point computation and is also covered by the present invention.
Specifically, the single-precision exponent adder is used for calculating the sum of exponents of single-precision or half-precision floating point numbers.
Based on the above structure, any one extended single-precision adder can support:
1. single-precision addition;
2. half-precision addition, in which the half-precision operands are extended to single precision, added, and the result is converted back to half precision;
3. 32-bit fixed-point addition, implemented directly with the extended mantissa adder, the exponent parts of the operands being set by default to the same constant;
4. 16-bit and 8-bit fixed-point addition, in which the operands are extended to 32-bit fixed point, added, and the result is converted back to 16 bits or 8 bits (the lower bits are kept).
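The fixed-point modes in items 3 and 4 can be sketched as follows. This is an illustrative model only: the function names are hypothetical, two's-complement operands are assumed, and overflow simply wraps because only the lower bits are kept.

```python
MASK32 = 0xFFFFFFFF

def to_signed(pattern, width):
    """Interpret a raw bit pattern of the given width as two's complement."""
    pattern &= (1 << width) - 1
    return pattern - (1 << width) if pattern >> (width - 1) else pattern

def fixed_add(a, b, width):
    """Add two fixed-point patterns of 8, 16, or 32 bits on a 32-bit adder."""
    assert width in (8, 16, 32)
    # Extend both operands to 32-bit fixed point (sign extension).
    a32 = to_signed(a, width) & MASK32
    b32 = to_signed(b, width) & MASK32
    # 32-bit mantissa addition; the exponent path is not used.
    total32 = (a32 + b32) & MASK32
    # Convert back by keeping the lower `width` bits.
    return total32 & ((1 << width) - 1)
```

For instance, `fixed_add(0xFFFE, 0x0003, 16)` adds -2 and 3 and returns `1`.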
The four extended half-precision multipliers 1 and the four extended single-precision adders 2 have two connection modes:
FIG. 1 shows the first connection mode: the four extended half-precision multipliers and the four extended single-precision adders are cascaded in one-to-one correspondence to form four parallel half-precision multiply-adders.
Based on this structure, any one of the half-precision multiply-adders can perform the following operations:
1. a half-precision multiply or add operation;
2. a half-precision multiply-add operation;
3. an 8-bit or 16-bit fixed-point multiply or add operation;
4. an 8-bit or 16-bit fixed-point multiply-add operation;
5. a single-precision addition operation;
6. a 32-bit fixed-point addition operation.
It can be seen that the first connection mode forms four half-precision multiply-adders, which can perform the above six operations in parallel (each multiply-adder performing one operation at a time).
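As a rough software model of the first connection mode, the sketch below runs four independent 16-bit fixed-point multiply-add lanes. The names are hypothetical, and it is an assumption of this example that each lane keeps a 32-bit product and wraps on adder overflow; the source does not specify the overflow behavior of a lane.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def lane_mac(a, b, c):
    """One lane: a 16 x 16 multiplier cascaded with a 32-bit adder."""
    product = (a & MASK16) * (b & MASK16)       # extended half-precision multiplier
    return (product + (c & MASK32)) & MASK32    # extended single-precision adder

def four_lane_mac(a_vec, b_vec, c_vec):
    """Four independent lanes; no lane depends on another lane's result."""
    assert len(a_vec) == len(b_vec) == len(c_vec) == 4
    return [lane_mac(a, b, c) for a, b, c in zip(a_vec, b_vec, c_vec)]
```

Because the lanes share no data, the list comprehension stands in for four operations that the hardware would perform concurrently.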
It should be noted that, in the embodiments of the present application, according to an existing structure and a known operation type, a person skilled in the art can know how to implement the known operation type using the existing structure, and details are not repeated here. Other configurations can be similar and are not exhaustive in the embodiments of the present application.
Fig. 2 shows a second connection:
the four extended half-precision multipliers are respectively called: a first extended half-precision multiplier, a second extended half-precision multiplier, a third extended half-precision multiplier, and a fourth extended half-precision multiplier. The four extended single-precision adders are referred to as a first extended single-precision adder, a second extended single-precision adder, a third extended single-precision adder, and a fourth extended single-precision adder, respectively. The first extended single-precision adder is cascaded with the first extended half-precision multiplier and the second extended half-precision multiplier, respectively. The second extended single-precision adder is respectively cascaded with the third extended half-precision multiplier and the fourth extended half-precision multiplier. And the third extended single-precision adder is respectively cascaded with the first extended single-precision adder and the second extended single-precision adder. The fourth extended single-precision adder is not connected to other extended single-precision adders or extended half-precision multipliers.
Based on the above specific structure, an example of the second connection relationship shown in fig. 2 for implementing the multiply-add operation is:
Specifically, assume two single-precision floating-point numbers A and B:
A = 1.Ma*2^Ea, B = 1.Mb*2^Eb, where 1.Ma is the mantissa of A, 2^Ea is the exponent of A, 1.Mb is the mantissa of B, and 2^Eb is the exponent of B.
The product of A and B is: A*B = (1.Ma*1.Mb)*2^(Ea+Eb).
1.Ma and 1.Mb are each extended from 23 bits to 32 bits by low-bit extension (zero padding). After extension, 1.Ma is:
1.Ma' = MSBa*2^-15 + LSBa*2^-31.
1.Mb after extension is:
1.Mb' = MSBb*2^-15 + LSBb*2^-31.
Here MSB denotes the high-order part and LSB the low-order part: MSBa and LSBa are the high-order and low-order parts of the mantissa obtained by extending 1.Ma, and MSBb and LSBb are the high-order and low-order parts of the mantissa obtained by extending 1.Mb. Each MSB and LSB is a 16-bit fixed-point number.
The product of the mantissas is:
X=1.Ma*1.Mb=1.Ma’*1.Mb’
=(MSBa*2^-15+LSBa*2^-31)*(MSBb*2^-15+LSBb*2^-31)
=2^-30*((MSBa*MSBb)+2^-16*(MSBa*LSBb+LSBa*MSBb)+2^-32*(LSBa*LSBb)). (1)
it can be seen that equation (1) includes four products, and the product of mantissas is the sum of the four products, so that, based on equation (1), 4 extended half-precision multipliers in fig. 2 (specifically, extended half-precision mantissa multipliers in the extended half-precision multipliers) are sequentially used to calculate 4 products in equation (1), and 3 adders are used to calculate the sum of the 4 products.
That is, FIG. 2 comprises three layers from top to bottom. In the first layer, from left to right, the first extended half-precision multiplier calculates MSBa*MSBb, the second calculates MSBa*LSBb, the third calculates LSBa*MSBb, and the fourth calculates LSBa*LSBb. The second layer contains two extended single-precision adders: from left to right, the first extended single-precision adder calculates the sum of MSBa*MSBb and MSBa*LSBb (hereinafter the first addition result), and the second extended single-precision adder calculates the sum of LSBa*MSBb and LSBa*LSBb (hereinafter the second addition result). The extended single-precision adder of the third layer calculates the sum of the first addition result and the second addition result.
It should be noted that the power-of-two factors in formula (1), such as 2^-16 and 2^-32, are implemented by shift operations, which can use an existing shift module; see the prior art for details, which are not repeated here. The shift module is not shown in FIG. 2.
Where shift operations are involved, the extension of the mantissas in the extended half-precision multipliers serves to protect precision.
As is apparent from the above description, in the configuration shown in fig. 2, the mantissa operation in one single-precision floating-point multiplication is synthesized by combining 4 times of half-precision mantissa (fixed point) multiplication operations and 3 times of extended single-precision mantissa (fixed point) addition operations.
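The three-layer structure can be checked with a small integer model. The sketch below is illustrative only: the function names are hypothetical, it assumes a 24-bit significand (the 23 stored fraction bits plus the implicit leading 1) zero-padded to 32 bits, and it keeps the full 64-bit result in Python integers, whereas a hardware adder would keep only Y bits.

```python
MASK16 = 0xFFFF

def mul16(x, y):
    """One extended half-precision mantissa multiplier: 16 x 16 -> 32 bits."""
    assert 0 <= x <= MASK16 and 0 <= y <= MASK16
    return x * y

def mantissa_product(ma, mb):
    """Multiply two 24-bit significands using four 16 x 16 products.

    Returns an integer P such that 1.Ma * 1.Mb = P * 2^-62.
    """
    a32, b32 = ma << 8, mb << 8              # zero-pad the low side to 32 bits
    msba, lsba = a32 >> 16, a32 & MASK16
    msbb, lsbb = b32 >> 16, b32 & MASK16

    # Layer 1: four multipliers.
    p1 = mul16(msba, msbb)
    p2 = mul16(msba, lsbb)
    p3 = mul16(lsba, msbb)
    p4 = mul16(lsba, lsbb)

    # Layer 2: two adders, each aligning its inputs with a 16-bit shift.
    first_sum = (p1 << 16) + p2
    second_sum = (p3 << 16) + p4

    # Layer 3: one adder combining the two partial sums.
    return (first_sum << 16) + second_sum
```

Expanding the last line gives p1*2^32 + (p2 + p3)*2^16 + p4, which is the product of the two 32-bit extended mantissas, so the result can be compared directly against `(ma << 8) * (mb << 8)`.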
The exponent operation 2^Ea*2^Eb in the product of A and B can be performed by any one of the single-precision exponent multipliers, and the mantissa product and the exponent product can be combined by a shift operation (the shift module is prior art and is not shown in FIG. 2). The connection shown in FIG. 2 can therefore implement floating-point multiplication.
By adding the fourth extended single-precision adder, the results of two floating-point multiplications can be used as the inputs of the fourth extended single-precision adder, which implements a multiply-add operation. That is, four extended half-precision mantissa (fixed-point) multiplications, three extended single-precision mantissa (fixed-point) additions, and one extended single-precision floating-point addition can be combined into one single-precision floating-point multiply-add operation. In other words, four extended half-precision multipliers and four extended single-precision adders can constitute one single-precision floating-point multiply-add unit that is downward compatible with half-precision floating-point multiply-add operations.
Because fixed-point operations can be implemented with the mantissa part of the floating-point operations while the exponent part is skipped, the connection shown in FIG. 2 can also implement fixed-point multiplication; given the precision of the extended half-precision multipliers, it can implement 8-bit, 16-bit, or 32-bit fixed-point multiplication. The 32-bit fixed-point multiplication proceeds in the same way as the mantissa multiplication above (since the input value is already 32 bits wide, the extended half-precision multiplier does not need to extend it). For 8-bit or 16-bit fixed-point multiplication, the value is first extended by the extended half-precision multiplier and then multiplied in the same way as the mantissa, so the details are not repeated here.
Here, an example is given of a 32-bit fixed-point multiply-add operation for the connection relationship shown in fig. 2:
Assume two 32-bit fixed-point numbers A and B:
A = MSBa*2^16 + LSBa, B = MSBb*2^16 + LSBb, where MSBa is the upper 16 bits of A, LSBa is the lower 16 bits of A, MSBb is the upper 16 bits of B, and LSBb is the lower 16 bits of B.
The product of A and B is: A*B = (MSBa*2^16 + LSBa)*(MSBb*2^16 + LSBb)
=2^32*(MSBa*MSBb)+2^16*(MSBa*LSBb+LSBa*MSBb)+(LSBa*LSBb)。
As with the single-precision floating-point operation, four extended half-precision mantissa multipliers, three shifters, and three extended single-precision mantissa adders are used to compute A*B. The lower 32 bits of the result are then kept; if the result exceeds the maximum value that 32 bits can represent, saturation or overflow processing is performed according to the requirements of the algorithm.
Finally, an extended single-precision mantissa adder is cascaded to complete a 32-bit fixed point multiply-add operation.
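The 32-bit fixed-point multiply-add can be sketched as follows. The example assumes unsigned operands, and the function name and the `saturate` flag are hypothetical; they only illustrate the choice between saturation and wrap-around mentioned above.

```python
MASK16 = 0xFFFF
MAX32 = 0xFFFFFFFF

def fixed32_mac(a, b, c, saturate=True):
    """Compute a*b + c on unsigned 32-bit values from four 16 x 16 products."""
    msba, lsba = (a >> 16) & MASK16, a & MASK16
    msbb, lsbb = (b >> 16) & MASK16, b & MASK16

    # Four 16 x 16 partial products (one per extended mantissa multiplier).
    p1 = msba * msbb
    p2 = msba * lsbb
    p3 = lsba * msbb
    p4 = lsba * lsbb

    # Shift and add: 2^32*p1 + 2^16*(p2 + p3) + p4.
    full = (p1 << 32) + ((p2 + p3) << 16) + p4

    # Reduce to 32 bits, by saturation or by keeping the lower 32 bits.
    product = min(full, MAX32) if saturate else full & MAX32

    # Final cascaded adder completes the multiply-add.
    total = product + (c & MAX32)
    return min(total, MAX32) if saturate else total & MAX32
```

For example, `fixed32_mac(0x10000, 0x10000, 0)` overflows 32 bits, so it returns `0xFFFFFFFF` with saturation and `0` when `saturate=False`.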
In summary, the four extended half-precision multipliers and the four extended single-precision adders constitute a reconfigurable arithmetic unit (fig. 1 and 2) capable of supporting multiply and/or add operations of multiple precisions, such as half-precision, single-precision floating-point numbers, and also capable of supporting multiply and/or add operations of 8-bit, 16-bit, and 32-bit fixed-point numbers, i.e., capable of supporting mixed-precision floating-point and fixed-point multiply and/or add operations.
More importantly, a subset of the operators in the arithmetic unit can support lower-precision operations (for example, a single-precision exponent multiplier together with any one extended half-precision multiplier implements half-precision floating-point multiplication), while the arithmetic unit as a whole supports higher-precision operations. Moreover, the operation of one subset of operators does not affect the other operators connected in parallel, so multiple low-precision operations can run concurrently, or the whole can be used as one high-precision arithmetic unit.
In the prior art, if separate single-precision and half-precision operators are used, only one of them is active when a task of a given precision is executed, so transistor utilization and energy efficiency are low. If instead only the highest-precision operator is used and low-precision operations are converted into high-precision operations, the energy efficiency of low-precision computation is low (it costs as much as a high-precision computation).
Therefore, the arithmetic unit shown in Figs. 1 and 2 can achieve both high transistor utilization and high energy efficiency.
The support for low-precision multiplication and addition is not limited to floating-point half-precision, 8-bit and 16-bit fixed-point. Any non-standard floating point data format (e.g., BFloat16) with an exponent of no more than 8 bits and a fractional portion of no more than 10 bits, and non-standard fixed point data formats of no more than 16 bits, e.g., 2-bit, 4-bit, 12-bit fixed point, may be supported with this structure.
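The format limits stated above can be expressed as a simple check. The sketch below is only illustrative: the helper names are hypothetical, and the table lists the exponent and fraction widths of a few well-known floating-point formats to show which ones fall within the stated limits for a single low-precision path.

```python
# (exponent bits, fraction bits) of a few floating-point formats.
FLOAT_FORMATS = {
    "half": (5, 10),
    "bfloat16": (8, 7),
    "single": (8, 23),
}


def float_fits_low_precision_path(exp_bits, frac_bits, max_exp=8, max_frac=10):
    """True if the format is within the exponent and fraction limits in the text."""
    return exp_bits <= max_exp and frac_bits <= max_frac


def fixed_fits_low_precision_path(width, max_width=16):
    """True if a fixed-point width is within the 16-bit limit in the text."""
    return width <= max_width
```

Under this check, half precision and BFloat16 fit a single low-precision path, while single precision does not and therefore needs the combined structure of Fig. 2.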
It should be noted that switching between the two connection modes (i.e., programming) can be controlled by instructions. The switching is implemented with multiplexers (MUXes, i.e., selectors).
Fig. 3 is a schematic diagram of a programmable mixed-precision computing unit that uses a selector to switch between the above first connection mode and the second connection mode to implement different operations:
wherein,
1. The solid path represents a floating-point single-precision scalar multiply-add:
D = A × B + C = (AM + AL) × (BM + BL) + C. Here AM is the extended half-precision floating-point number (8-bit exponent and 16-bit mantissa) that carries the upper 12 bits of the mantissa of the single-precision multiplier A, and AL is the extended floating-point number that carries the lower 12 bits of the mantissa of A (note that the exponent of AL must be adjusted by a factor of 2^12 relative to the exponent of A). Likewise, BM is the extended half-precision floating-point number (8-bit exponent and 16-bit mantissa) that carries the upper 12 bits of the mantissa of the single-precision multiplier B, and BL is the extended floating-point number that carries the lower 12 bits of the mantissa of B (note that the exponent of BL must be adjusted by a factor of 2^12 relative to the exponent of B). C is the single-precision floating-point addend.
2. The dashed path represents a (4x16) floating-point half-precision vector multiply or multiply-add: di = ci + ai × bi, i = 0, 1, 2, 3, where di is the corresponding element of vector {d0, d1, d2, d3}, ci is the corresponding element of vector {c0, c1, c2, c3}, ai is the corresponding element of vector {a0, a1, a2, a3}, and bi is the corresponding element of vector {b0, b1, b2, b3}.
3. Selectors a and b select the dashed path while the remaining selectors select the solid path; this represents a (4x16) floating-point half-precision vector dot product: A × B^T = Σ ak × bk (k = 0, 1, 2, 3), where A denotes vector A, B^T denotes the transpose of vector B, ak denotes the corresponding element of vector A {a0, a1, a2, a3}, and bk denotes the corresponding element of vector B {b0, b1, b2, b3}.
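The three selector settings listed above can be modeled functionally in software. The following Python sketch is only an illustration under stated assumptions: Python floats (double precision) stand in for the hardware formats, hardware rounding is not modeled, inputs to scalar_fma are assumed to be exactly representable in single precision, and all function names are hypothetical. The pairing of products in the adder tree follows the second connection mode described in the text.

```python
import math


def split_significand(x):
    """Split a single-precision-representable x into (high, low) parts.

    high carries the upper 12 bits of the 24-bit significand and low the
    lower 12 bits, each with its own exponent, so that high + low == x.
    """
    if x == 0:
        return 0.0, 0.0
    sign = -1.0 if x < 0 else 1.0
    frac, exp = math.frexp(abs(x))        # abs(x) == frac * 2**exp, 0.5 <= frac < 1
    sig24 = int(frac * (1 << 24))         # 24-bit integer significand
    high = math.ldexp((sig24 >> 12) << 12, exp - 24)
    low = math.ldexp(sig24 & 0xFFF, exp - 24)   # scaled 2**12 lower than high
    return sign * high, sign * low


def scalar_fma(a, b, c):
    """Solid path: D = (AM + AL) * (BM + BL) + C."""
    am, al = split_significand(a)
    bm, bl = split_significand(b)
    s1 = am * bm + am * bl    # first adder
    s2 = al * bm + al * bl    # second adder
    return (s1 + s2) + c      # third adder, then the fourth adder adds C


def vector_fma(a, b, c):
    """Dashed path: d_i = c_i + a_i * b_i on four independent lanes."""
    return [ci + ai * bi for ai, bi, ci in zip(a, b, c)]


def dot4(a, b):
    """Mixed selection: sum of a_k * b_k reduced by the adder tree."""
    p = [ak * bk for ak, bk in zip(a, b)]
    return (p[0] + p[1]) + (p[2] + p[3])


MODES = {"scalar_fma": scalar_fma, "vector_fma": vector_fma, "dot4": dot4}


def kernel(mode, *operands):
    """Stand-in for the selector setting: choose one of the three data paths."""
    return MODES[mode](*operands)
```

For instance, kernel("dot4", [1, 2, 3, 4], [5, 6, 7, 8]) reduces the four lane products through the same two-level adder tree that scalar_fma uses for its four partial products, which is the reuse that the selector switching makes possible.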
Fig. 4 is a schematic diagram of another programmable mixed-precision arithmetic unit according to an embodiment of the present application, including four extended single-precision multipliers and four extended double-precision adders.
The extended double precision adder is used for extending the input numerical value to M bits and calculating the sum of the extended numerical values.
Specifically, M may fall in either of the following ranges: 1. 63 to 100 bits, where the bits beyond 64 (for addition) serve as extra precision that preserves the accuracy of intermediate results and do not affect the implementation of the present invention. 2. 46 to 62 bits, which, compared with range 1, lacks only support for 64-bit fixed-point computation and is also covered by the present invention.
The arithmetic unit in Fig. 4 differs from the arithmetic unit shown in Fig. 1 in that: 1. the extended single-precision exponent multiplier/adder is replaced with an extended double-precision exponent multiplier/adder; 2. the extended half-precision mantissa multiplier is replaced with an extended single-precision mantissa multiplier; 3. the extended single-precision mantissa adder is replaced with an extended double-precision mantissa adder. The connection relationship in the arithmetic unit shown in Fig. 4 is the same as that shown in Fig. 1.
The extended single-precision multiplier in the arithmetic unit shown in Fig. 4 may be the portion of the programmable mixed-precision arithmetic unit shown in Fig. 2 that excludes the standalone extended single-precision adder (i.e., the fourth extended single-precision adder). Accordingly, the number of bits to which the extended single-precision multiplier extends may fall in either of the following ranges: 1. 32 to 50 bits, where the bits beyond 32 (for multiplication) serve as extra precision that preserves the accuracy of intermediate results and do not affect the implementation of the present invention. 2. 23 to 31 bits, which, compared with range 1, lacks only support for 64-bit fixed-point computation and is also covered by the present invention.
Fig. 5 shows another connection relationship of the four extended single-precision multipliers and the four extended double-precision adders. Fig. 5 differs from Fig. 2 in that: 1. the extended single-precision exponent multiplier/adder is replaced with an extended double-precision exponent multiplier/adder; 2. the extended half-precision mantissa multiplier is replaced with an extended single-precision mantissa multiplier; 3. the extended single-precision mantissa adder is replaced with an extended double-precision mantissa adder.
The extended single-precision multiplier in the arithmetic unit shown in Fig. 5 may be the portion of the programmable mixed-precision arithmetic unit shown in Fig. 2 that excludes the standalone extended single-precision adder (i.e., the fourth extended single-precision adder). Correspondingly, the reconfigurable structure consisting of the four extended single-precision multipliers and the four extended double-precision adders supports the following operation types:
1. 1 double-precision floating-point multiply-add operation;
2. 1 64-bit fixed point multiply-add operation;
3. 4 concurrent fixed-point 8-bit, 16-bit or 32-bit multiply-add operations;
4. 4 concurrent 64-bit fixed-point addition operations;
5. 4 concurrent double precision addition operations.
For the processes by which the arithmetic unit shown in Fig. 4 or Fig. 5 implements multiplication, addition, and multiply-add, refer to the processes described for Fig. 1 or Fig. 2. The only difference is the precision of the values involved; for example, the high-order value in the above formulas becomes the upper 32 bits of the extended mantissa. The details are not repeated here.
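The hierarchical reuse described for Figs. 4 and 5, in which each wider multiplier is itself built from the narrower structure, can be illustrated with a recursive software sketch. This is only a functional model under stated assumptions: operands are unsigned, native integer multiplication stands in for a 16-bit base multiplier, and the function name mul_split is hypothetical.

```python
def mul_split(a, b, bits):
    """Unsigned bits x bits product built from four half-width products.

    At 16 bits or fewer, the native multiply stands in for one base multiplier.
    """
    if bits <= 16:
        return a * b
    half = bits // 2
    mask = (1 << half) - 1
    msb_a, lsb_a = a >> half, a & mask
    msb_b, lsb_b = b >> half, b & mask
    # Each half-width product is itself computed by the narrower structure.
    p_hh = mul_split(msb_a, msb_b, half)
    p_hl = mul_split(msb_a, lsb_b, half)
    p_lh = mul_split(lsb_a, msb_b, half)
    p_ll = mul_split(lsb_a, lsb_b, half)
    s1 = (p_hh << bits) + (p_hl << half)   # first adder
    s2 = (p_lh << half) + p_ll             # second adder
    return s1 + s2                         # third adder
```

Calling mul_split(a, b, 64) splits each operand into upper and lower 32 bits, and each 32-bit product is in turn formed from 16-bit products, mirroring how the Fig. 2 structure serves as the multiplier inside the wider unit.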
Fig. 6 is a schematic diagram of yet another programmable mixed-precision arithmetic unit (which may also be called a multiply-add operator) disclosed in an embodiment of the present application. It comprises programmable mixed-precision arithmetic units connected in parallel (each denoted Kernel in Fig. 6), where each such unit may be the arithmetic unit shown in Figs. 1 and 2 or the one shown in Figs. 4 and 5. It should be noted that Fig. 6 is only one example of a parallel connection; this embodiment does not limit the parallel connection mode or its dimensions.
The programmable mixed-precision arithmetic unit shown in Fig. 6 can be applied to vector multiply-add operations; the vectors may be one-dimensional, two-dimensional, or of other dimensions, which is not limited here.
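One way to picture the parallel arrangement of kernels is to tile a longer vector operation across 4-wide kernels. The sketch below is an assumption-laden illustration only: the text does not specify how partial results from different kernels are combined, so the final summation, the zero padding, and the names dot4 and dot_tiled are all hypothetical.

```python
LANES = 4  # each modeled kernel handles four element pairs


def dot4(a, b):
    """One kernel in dot-product mode: four products and a two-level adder tree."""
    p = [ak * bk for ak, bk in zip(a, b)]
    return (p[0] + p[1]) + (p[2] + p[3])


def dot_tiled(a, b):
    """Dot product of equal-length vectors, tiled over 4-wide kernels."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    pad = (-len(a)) % LANES            # zero-pad to a multiple of four
    a += [0] * pad
    b += [0] * pad
    # Each slice could be handled by a separate kernel working in parallel.
    partials = [dot4(a[i:i + LANES], b[i:i + LANES])
                for i in range(0, len(a), LANES)]
    # Assumption: partial sums are accumulated outside the kernels.
    return sum(partials)
```

A two-dimensional operation such as a matrix-vector product could be modeled the same way by applying dot_tiled to each row.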
If the functions described in the methods of the embodiments of the present application are implemented as software functional units and sold or used as independent products, they may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present application that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media capable of storing program code.
The embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A programmable mixed-precision arithmetic unit, comprising:
four extended half-precision multipliers and four extended single-precision adders;
any one of the extended half-precision multipliers is used for extending an input numerical value to X bits and calculating the product of a first numerical value and a second numerical value, wherein the first numerical value is a high-order numerical value or a low-order numerical value in the extended numerical value of one input numerical value, the second numerical value is a high-order numerical value or a low-order numerical value in the extended numerical value of the other input numerical value, and X is a preset numerical value;
any one of the extended single-precision adders is used for extending an input numerical value to Y bits and calculating the sum of the extended numerical values, wherein Y is a preset numerical value;
wherein the four extended half-precision multipliers and the four extended single-precision adders are connected in a first manner or a second manner;
the first mode is as follows: the four extended half-precision multipliers and the four extended single-precision adders are correspondingly cascaded one by one to form four parallel half-precision multiply-add devices;
the second mode is as follows: the first extended single-precision adder is respectively cascaded with the first extended half-precision multiplier and the second extended half-precision multiplier;
the second extended single-precision adder is respectively cascaded with the third extended half-precision multiplier and the fourth extended half-precision multiplier;
and the third extended single-precision adder is respectively cascaded with the first extended single-precision adder and the second extended single-precision adder.
2. A programmable mixed-precision arithmetic unit as claimed in claim 1 wherein said extended half-precision multiplier comprises:
a single-precision exponent multiplier and an extended half-precision mantissa multiplier which are connected in parallel;
the extended half-precision mantissa multiplier is configured to extend an input value to X bits and calculate a product of the first value and the second value.
3. A programmable mixed-precision arithmetic unit as claimed in claim 1 wherein said extended single-precision adder comprises:
a single-precision exponent adder and an extended single-precision mantissa adder which are connected in parallel;
the extended single precision mantissa adder is used to extend an input numerical value to Y bits and calculate the sum of the extended numerical values.
4. A programmable mixed-precision arithmetic unit, comprising:
four extended single precision multipliers and four extended double precision adders;
the extended single-precision multiplier is a programmable mixed-precision arithmetic unit as claimed in any one of claims 1 to 3;
the extended double-precision adder is used for extending an input numerical value to M bits and calculating the sum of the extended numerical values, wherein M is a preset numerical value;
wherein the four extended single-precision multipliers and the four extended double-precision adders are connected in a first manner or a second manner;
the first mode is as follows: the four extended single-precision multipliers and the four extended double-precision adders are correspondingly cascaded one by one to form four single-precision multiply-add devices connected in parallel;
the second mode is as follows: the first extended double-precision adder is respectively cascaded with the first extended single-precision multiplier and the second extended single-precision multiplier;
the second extended double-precision adder is respectively cascaded with the third extended single-precision multiplier and the fourth extended single-precision multiplier;
the third extended double-precision adder is respectively cascaded with the first extended double-precision adder and the second extended double-precision adder.
5. A programmable mixed-precision arithmetic unit as claimed in claim 4, wherein said extended double-precision adder comprises:
a double-precision exponent adder and an extended double-precision mantissa adder which are connected in parallel;
the extended double-precision mantissa adder is used for extending an input numerical value to M bits and calculating the sum of the extended numerical values.
6. A programmable mixed-precision arithmetic unit, comprising:
a programmable mixed-precision arithmetic unit as claimed in any one of claims 1 to 3 or 4 to 5 connected in parallel.
7. A programmable mixed-precision arithmetic unit, comprising:
four extended half-precision multipliers and three extended single-precision adders;
the first extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating MSBa × MSBb to obtain a first product;
the second extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating MSBa × LSBb to obtain a second product;
the third extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating LSBa × MSBb to obtain a third product;
the fourth extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating LSBa × LSBb to obtain a fourth product; the input numerical values are a first numerical value and a second numerical value, MSBa is the high-order part of the extended first numerical value, MSBb is the high-order part of the extended second numerical value, LSBa is the low-order part of the extended first numerical value, and LSBb is the low-order part of the extended second numerical value;
the first extended single-precision adder is used for extending the first product and the second product to Y bits and then calculating the sum of the extended first product and the extended second product to obtain a first addition result;
the second extended single-precision adder is used for extending the third product and the fourth product to Y bits, and then calculating the sum of the extended third product and the extended fourth product to obtain a second addition result;
and the third extended single-precision adder is used for calculating the sum of the extended first addition result and the extended second addition result after the first addition result and the second addition result are extended to Y bits.
8. A programmable mixed-precision arithmetic unit as claimed in claim 7, further comprising:
and a fourth extended single-precision adder, configured to extend the two single-precision values to Y bits, and then calculate a sum of the two extended single-precision values, where any one of the single-precision values is the sum of the first addition result and the second addition result.
9. A programmable mixed-precision arithmetic unit, comprising:
four extended single precision multipliers and three extended double precision adders;
the extended single-precision multiplier is used for realizing the function of the programmable mixed-precision arithmetic unit in claim 7;
the first extended double-precision adder is used for extending the first product and the second product to M bits, and then calculating the sum of the extended first product and the extended second product to obtain a first addition result; the first product is an output result of a first extended single-precision multiplier, and the second product is an output result of a second extended single-precision multiplier;
the second extended double-precision adder is used for extending the third product and the fourth product to M bits, and then calculating the sum of the extended third product and the extended fourth product to obtain a second addition result; the third product is an output result of a third extended single-precision multiplier, and the fourth product is an output result of a fourth extended single-precision multiplier;
and the third extended double-precision adder is used for calculating the sum of the extended first addition result and the second addition result after the first addition result and the second addition result are extended to M bits.
10. A programmable mixed-precision arithmetic unit as claimed in claim 9, further comprising:
and a fourth extended double-precision adder, configured to extend the two double-precision values to M bits, and calculate a sum of the two extended double-precision values, where any one of the double-precision values is the sum of the first addition result and the second addition result.
CN201811514918.2A 2018-12-12 2018-12-12 Programmable mixed precision arithmetic unit Active CN109634558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811514918.2A CN109634558B (en) 2018-12-12 2018-12-12 Programmable mixed precision arithmetic unit

Publications (2)

Publication Number Publication Date
CN109634558A true CN109634558A (en) 2019-04-16
CN109634558B CN109634558B (en) 2020-01-14

Family

ID=66073086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811514918.2A Active CN109634558B (en) 2018-12-12 2018-12-12 Programmable mixed precision arithmetic unit

Country Status (1)

Country Link
CN (1) CN109634558B (en)

Also Published As

Publication number Publication date
CN109634558B (en) 2020-01-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 200120 room a-522, 188 Yesheng Road, Lingang xinpian District, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee after: Shanghai Suiyuan Technology Co.,Ltd.

Address before: 201203 Room 302, building 2, zhangrun building, Lane 61, shengxia Road, Pudong New Area, Shanghai

Patentee before: SHANGHAI ENFLAME TECHNOLOGY Co.,Ltd.