WO2021184143A1

WO2021184143A1 - Data processing apparatus and data processing method

Info

Publication number: WO2021184143A1
Application number: PCT/CN2020/079431
Authority: WO
Inventors: 董镇江; 李震桁; 袁宏辉; 谢环; 蒋东龙
Original assignee: 华为技术有限公司
Priority date: 2020-03-16
Filing date: 2020-03-16
Publication date: 2021-09-23
Also published as: CN115280277A

Abstract

A data processing apparatus, comprising: a product calculation circuit for calculating a first group of products and a second group of products, the first group of products comprising a product of the high N bits of a first multiplier and a first multiplicand and a product of the high N bits of a second multiplier and a second multiplicand, the second group of products comprising a product of the low N bits of the first multiplier and the first multiplicand and a product of the low N bits of the second multiplier and the second multiplicand, the first multiplier and the second multiplier each being 2N bits, and N being a positive integer; and an accumulation circuit for separately accumulating the first group of products and the second group of products. Partial products that have the same left-shift amount across multiple multiplication operations are combined, and the combined results are accumulated separately, thereby reducing the logic overhead of the data processing apparatus.

Description

Data processing device and data processing method

Technical field

This application relates to the technical field of digital signal processing, and in particular to a data processing device and a data processing method.

Background

Convolutional neural networks (CNNs) have a wide range of application scenarios in fields such as image and speech recognition. In a model that implements a convolutional neural network algorithm, the convolution calculation accounts for 90% of the computation of the entire model. Efficient calculation of the convolutional layers is therefore the key to greatly improving the computational efficiency of a CNN model, and implementing the convolution calculation through hardware acceleration is an effective way to do so.

At present, Nbit*2Nbit and 2Nbit*2Nbit processors have high logic resource overhead and low performance, where * represents convolution and N is a positive integer. Specifically, FIG. 1 is a schematic diagram of an Nbit*Nbit data processing device: the processor includes an Nbit×Nbit multiplier and an adder with a bit width of 2Nbit. Each Nbit×Nbit multiplier outputs 2Nbit data. When two or more sets of data undergo the convolution operation, an adder with a bit width of 2Nbit is required to accumulate the multiplier outputs. The area of a 2Nbit×Nbit multiplier is twice that of an Nbit×Nbit multiplier, and the area of a 2Nbit×2Nbit multiplier is 4 times that of an Nbit×Nbit multiplier, so using these wider multipliers greatly increases the multiplier area compared with the Nbit×Nbit multiplier. In addition, as shown in FIG. 2, to enable the processor to process 4 groups of Nbit×Nbit operations, 2 groups of 2Nbit×Nbit operations, and 1 group of 2Nbit×2Nbit operations within one clock cycle, this implementation requires an adder with a bit width of 4Nbit, which doubles the adder bit width compared with the Nbit*Nbit solution. The increase in multiplier area and the expansion of the adder bit width both increase the logic resource overhead of the processor and reduce its performance. How to design a processor with low logic resource overhead is therefore a problem that urgently needs to be solved.
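As background for the bit widths quoted above, the following identity is a standard way to express an unsigned 2Nbit×2Nbit product in terms of Nbit×Nbit products. The symbols a_H, a_L, b_H and b_L are notation introduced here for the high and low N bits of the two operands; they are not names used in the application, and unsigned operands are an assumption.

```latex
\begin{aligned}
a &= a_H \cdot 2^{N} + a_L, \qquad b = b_H \cdot 2^{N} + b_L, \qquad 0 \le a_H, a_L, b_H, b_L < 2^{N} \\
a \cdot b &= a_H b_H \cdot 2^{2N} + \left(a_H b_L + a_L b_H\right) \cdot 2^{N} + a_L b_L
\end{aligned}
```

Each Nbit×Nbit partial product is below 2^{2N} and therefore fits in 2N bits, whereas the full product is below 2^{4N} and can need up to 4N bits. This is consistent with the 4Nbit adder width mentioned above for the case where the shifted partial products of each multiplication are added directly.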

Summary of the invention

This application provides a data processing device and a data processing method. In contrast to an approach in which the product of the high part of each multiplier and the multiplicand is shifted and then directly added to the product of the low part of the corresponding multiplier and the multiplicand, the solution provided in this application combines partial products that have the same left-shift amount across multiple sets of multiplication operations and accumulates the combined results, which greatly saves logic resources.

The first aspect of the present application provides a data processing device, which may include: a product calculation circuit for calculating a first group of products and a second group of products, where the first group of products may include the product of the high N bits of a first multiplier and a first multiplicand and the product of the high N bits of a second multiplier and a second multiplicand, the second group of products may include the product of the low N bits of the first multiplier and the first multiplicand and the product of the low N bits of the second multiplier and the second multiplicand, the first multiplier and the second multiplier are both 2N bits, and N is a positive integer; and an accumulation circuit for separately accumulating the first group of products and the second group of products. It can be seen from the first aspect that, when two or more sets of data undergo the convolution operation, the technical solution provided by this application first accumulates separately the products of the high parts of the multiple multipliers and the multiplicands and then performs the shift processing. This avoids the adder bit-width expansion that results when the product of the high part of each multiplier and the multiplicand is shifted and then directly added to the product of the low part of the corresponding multiplier and the multiplicand.
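The grouping described in the first aspect can be illustrated with a minimal Python sketch. It assumes unsigned operands and N = 8; the function names `split`, `mac_grouped` and `mac_reference` are hypothetical and chosen only for this illustration, and the sketch models the arithmetic rather than the circuit.

```python
N = 8                 # assumed operand width; the text only requires a positive integer
MASK = (1 << N) - 1   # selects the low N bits


def split(value):
    """Return (high N bits, low N bits) of an unsigned 2N-bit value."""
    return (value >> N) & MASK, value & MASK


def mac_grouped(multipliers, multiplicands):
    """Sum of products for 2N-bit multipliers and N-bit multiplicands.

    The high-part products and the low-part products are accumulated
    separately, and only the accumulated high-part sum is shifted, once.
    """
    high_acc = 0  # first group: high N bits of each multiplier x multiplicand
    low_acc = 0   # second group: low N bits of each multiplier x multiplicand
    for a, b in zip(multipliers, multiplicands):
        a_hi, a_lo = split(a)
        high_acc += a_hi * b
        low_acc += a_lo * b
    return (high_acc << N) + low_acc


def mac_reference(multipliers, multiplicands):
    """Direct sum of full-width products, used only for comparison."""
    return sum(a * b for a, b in zip(multipliers, multiplicands))
```

Because the left shift distributes over addition, `mac_grouped` and `mac_reference` return the same value for any unsigned inputs of the stated widths, while the grouped form needs a single shift regardless of how many products are summed.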

Optionally, in combination with the first aspect, in a first possible implementation manner, the data processing device may further include a first shifter and a first adder, and the first multiplicand and the second multiplicand are both N bits. The first shifter is used to shift the accumulated result of the first group of products to obtain a first shift result. The first adder is used to accumulate the first shift result and the second group of products. The first possible implementation manner of the first aspect thus gives a specific solution for using Nbit×Nbit multipliers to form a 2Nbit*Nbit processor.

Optionally, in combination with the first aspect, in a second possible implementation manner, the first multiplicand and the second multiplicand are both 2N bits, the first group of products may include high-order products and high-low-order products, and the second group of products may include low-high-order products and low-order products. The product calculation circuit is specifically used to: calculate the high-order products, which may include the product of the high N bits of the first multiplier and the high N bits of the first multiplicand, and the product of the high N bits of the second multiplier and the high N bits of the second multiplicand; calculate the high-low-order products, which may include the product of the high N bits of the first multiplier and the low N bits of the first multiplicand, and the product of the high N bits of the second multiplier and the low N bits of the second multiplicand; calculate the low-high-order products, which may include the product of the low N bits of the first multiplier and the high N bits of the first multiplicand, and the product of the low N bits of the second multiplier and the high N bits of the second multiplicand; and calculate the low-order products, which may include the product of the low N bits of the first multiplier and the low N bits of the first multiplicand, and the product of the low N bits of the second multiplier and the low N bits of the second multiplicand. The accumulation circuit is specifically used to separately accumulate the high-order products, the high-low-order products, the low-high-order products, and the low-order products through a second adder with a bit width of 2Nbit. The second possible implementation manner of the first aspect thus shows that a 2Nbit*2Nbit processor can be formed from Nbit×Nbit multipliers.

Optionally, in combination with the second possible implementation manner of the first aspect, in a third possible implementation manner, the data processing device may further include a second shifter, a third adder, a third shifter, a fourth shifter, and a fourth adder. The second shifter is used to shift the accumulated result of the high-order products to the left by N bits to obtain a second shift result. The third adder is used to accumulate the second shift result and the accumulated result of the high-low-order products. The third shifter is used to shift the result output by the third adder to the left by N bits to obtain a third shift result. The fourth shifter is used to shift the accumulated result of the low-high-order products to the left by N bits to obtain a fourth shift result. The fourth adder is used to accumulate the third shift result, the fourth shift result, and the low-order products.

Optionally, in combination with the third possible implementation manner of the first aspect, in a fourth possible implementation manner, the data processing device may further include a fifth adder, used to accumulate the fourth shift result and the low-order products. The fourth adder is specifically used to accumulate the third shift result and the result output by the fifth adder.
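The shifter and adder chain of the second to fourth possible implementation manners can be traced with the following Python sketch. It assumes unsigned 2N-bit operands and N = 8. The function `mac_2n_by_2n` and its variable names are hypothetical; the comments map each step to the component named in the text, and the sketch ignores hardware bit-width limits because Python integers are unbounded.

```python
N = 8                 # assumed half-width of the operands
MASK = (1 << N) - 1   # selects the low N bits


def split(value):
    """Return (high N bits, low N bits) of an unsigned 2N-bit value."""
    return (value >> N) & MASK, value & MASK


def mac_2n_by_2n(multipliers, multiplicands):
    """Sum of 2N-bit x 2N-bit products built from N-bit x N-bit products."""
    hh = hl = lh = ll = 0  # four separately accumulated groups
    for a, b in zip(multipliers, multiplicands):
        a_hi, a_lo = split(a)
        b_hi, b_lo = split(b)
        hh += a_hi * b_hi  # high-order products
        hl += a_hi * b_lo  # high-low-order products
        lh += a_lo * b_hi  # low-high-order products
        ll += a_lo * b_lo  # low-order products

    second_shift = hh << N                # second shifter
    third_sum = second_shift + hl         # third adder
    third_shift = third_sum << N          # third shifter
    fourth_shift = lh << N                # fourth shifter
    fifth_sum = fourth_shift + ll         # fifth adder (fourth implementation manner)
    return third_shift + fifth_sum        # fourth adder
```

Expanding the chain gives `(hh << 2N) + (hl << N) + (lh << N) + ll`, which equals the direct sum of the full-width products. Shifting the high-order sum twice by N bits, rather than once by 2N bits, is what lets every shifter in the chain be an N-bit shifter.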

Optionally, in combination with the first aspect or the first possible implementation manner of the first aspect, in a fifth possible implementation manner, the data processing device may further include a split logic circuit, used to output, through a selector (MUX), the high N bits and low N bits of the first multiplier and the high N bits and low N bits of the second multiplier.

Optionally, in combination with the fifth possible implementation manner of the first aspect, in a sixth possible implementation manner, the split logic circuit is further used to construct a first association relationship. The first association relationship may include the association between the high N bits of the first multiplier and the first multiplicand, the association between the low N bits of the first multiplier and the first multiplicand, the association between the high N bits of the second multiplier and the second multiplicand, and the association between the low N bits of the second multiplier and the second multiplicand.
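One way to picture the split logic and the first association relationship is the Python sketch below. It is only a software model under stated assumptions: unsigned operands, N = 8, and a tuple layout `(index, part, half, multiplicand)` invented for this illustration. The names `mux_half` and `first_association` are hypothetical and do not come from the application.

```python
N = 8                 # assumed half-width of each 2N-bit multiplier
MASK = (1 << N) - 1   # selects the low N bits


def mux_half(value, select_high):
    """Model of a two-input selector: output the high or the low N bits."""
    return (value >> N) & MASK if select_high else value & MASK


def first_association(multipliers, multiplicands):
    """Pair each half of every multiplier with its own multiplicand.

    Returns tuples (index, part, half, multiplicand), where part is
    "high" or "low" and half is the selected N-bit slice.
    """
    pairs = []
    for index, (a, b) in enumerate(zip(multipliers, multiplicands)):
        pairs.append((index, "high", mux_half(a, True), b))
        pairs.append((index, "low", mux_half(a, False), b))
    return pairs
```

Keeping the `part` tag next to each pair is a design choice made here so that a later stage can route every product to the accumulator for its group, high-part products to one and low-part products to the other.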

Optionally, in combination with any one of the second to fourth possible implementation manners of the first aspect, in a seventh possible implementation manner, the data processing device may further include a split logic circuit, used to output, through a selector (MUX), the high N bits and low N bits of the first multiplier, the high N bits and low N bits of the second multiplier, the high N bits and low N bits of the first multiplicand, and the high N bits and low N bits of the second multiplicand.

Optionally, in combination with the seventh possible implementation manner of the first aspect, in an eighth possible implementation manner, the split logic circuit is further used to construct a second association relationship. The second association relationship may include the association between the high N bits of the first multiplier and the high N bits of the first multiplicand, the association between the high N bits of the first multiplier and the low N bits of the first multiplicand, the association between the low N bits of the first multiplier and the high N bits of the first multiplicand, the association between the low N bits of the first multiplier and the low N bits of the first multiplicand, the association between the high N bits of the second multiplier and the high N bits of the second multiplicand, the association between the high N bits of the second multiplier and the low N bits of the second multiplicand, the association between the low N bits of the second multiplier and the high N bits of the second multiplicand, and the association between the low N bits of the second multiplier and the low N bits of the second multiplicand.

Optionally, in combination with the eighth possible implementation manner of the first aspect, in a ninth possible implementation manner, the data processing device may further include a data random access memory (RAM) and a weight RAM. The data RAM is used to store the high N bits and low N bits of the first multiplier and the high N bits and low N bits of the second multiplier. The weight RAM is used to store, according to the second association relationship, the high N bits and low N bits of the first multiplicand and the high N bits and low N bits of the second multiplicand.

Optionally, in combination with the first aspect or any one of the first to ninth possible implementation manners of the first aspect, in a tenth possible implementation manner, the first multiplier and the second multiplier are feature layer data and the first multiplicand and the second multiplicand are convolution kernel data; or the first multiplier and the second multiplier are convolution kernel data and the first multiplicand and the second multiplicand are feature layer data.

A second aspect of the present application provides a data processing method, which may include: calculating a first group of products and a second group of products, where the first group of products may include the product of the high N bits of a first multiplier and a first multiplicand and the product of the high N bits of a second multiplier and a second multiplicand, the second group of products may include the product of the low N bits of the first multiplier and the first multiplicand and the product of the low N bits of the second multiplier and the second multiplicand, the first multiplier and the second multiplier are both 2N bits, and N is a positive integer; and separately accumulating the first group of products and the second group of products.

Optionally, in combination with the second aspect, in a first possible implementation manner, the method may further include: shifting the accumulated result of the first group of products to obtain a first shift result; and accumulating the first shift result and the second group of products.

Optionally, in combination with the second aspect, in a second possible implementation manner, calculating the first group of products and the second group of products may specifically include: calculating high-order products, which may include the product of the high N bits of the first multiplier and the high N bits of the first multiplicand, and the product of the high N bits of the second multiplier and the high N bits of the second multiplicand; calculating high-low-order products, which may include the product of the high N bits of the first multiplier and the low N bits of the first multiplicand, and the product of the high N bits of the second multiplier and the low N bits of the second multiplicand; calculating low-high-order products, which may include the product of the low N bits of the first multiplier and the high N bits of the first multiplicand, and the product of the low N bits of the second multiplier and the high N bits of the second multiplicand; and calculating low-order products, which may include the product of the low N bits of the first multiplier and the low N bits of the first multiplicand, and the product of the low N bits of the second multiplier and the low N bits of the second multiplicand. Separately accumulating the first group of products and the second group of products may include: separately accumulating the high-order products, the high-low-order products, the low-high-order products, and the low-order products.

Optionally, in combination with the second possible implementation manner of the second aspect, in a third possible implementation manner, the method may further include: shifting the accumulated result of the high-order products to the left by N bits to obtain a second shift result; accumulating the second shift result and the accumulated result of the high-low-order products; shifting the result of that accumulation to the left by N bits to obtain a third shift result; shifting the accumulated result of the low-high-order products to the left by N bits to obtain a fourth shift result; and accumulating the third shift result, the fourth shift result, and the low-order products.

Optionally, in combination with the third possible implementation manner of the second aspect, in a fourth possible implementation manner, the method may further include: accumulating the fourth shift result and the low-order products. Accumulating the third shift result, the fourth shift result, and the low-order products may then include: accumulating the third shift result and the result obtained by accumulating the fourth shift result and the low-order products.

Optionally, in combination with the second aspect or the first possible implementation manner of the second aspect, in a fifth possible implementation manner, the method may further include: outputting the high N bits and low N bits of the first multiplier and the high N bits and low N bits of the second multiplier.

Optionally, in combination with the fifth possible implementation manner of the second aspect, in a sixth possible implementation manner, the method may further include: constructing a first association relationship. The first association relationship may include the association between the high N bits of the first multiplier and the first multiplicand, the association between the low N bits of the first multiplier and the first multiplicand, the association between the high N bits of the second multiplier and the second multiplicand, and the association between the low N bits of the second multiplier and the second multiplicand.

Optionally, in combination with any one of the second to fourth possible implementation manners of the second aspect, in a seventh possible implementation manner, the method may further include: outputting the high N bits and low N bits of the first multiplier, the high N bits and low N bits of the second multiplier, the high N bits and low N bits of the first multiplicand, and the high N bits and low N bits of the second multiplicand.

Optionally, in combination with the eighth possible implementation manner of the second aspect, in a ninth possible implementation manner, the method may further include: constructing a second association relationship. The second association relationship may include the association between the high N bits of the first multiplier and the high N bits of the first multiplicand, the association between the high N bits of the first multiplier and the low N bits of the first multiplicand, the association between the low N bits of the first multiplier and the high N bits of the first multiplicand, the association between the low N bits of the first multiplier and the low N bits of the first multiplicand, the association between the high N bits of the second multiplier and the high N bits of the second multiplicand, the association between the high N bits of the second multiplier and the low N bits of the second multiplicand, the association between the low N bits of the second multiplier and the high N bits of the second multiplicand, and the association between the low N bits of the second multiplier and the low N bits of the second multiplicand.

Optionally, in combination with the second aspect or any one of the first to ninth possible implementation manners of the second aspect, in a tenth possible implementation manner, the first multiplier and the second multiplier are feature layer data and the first multiplicand and the second multiplicand are convolution kernel data; or the first multiplier and the second multiplier are convolution kernel data and the first multiplicand and the second multiplicand are feature layer data.

A third aspect of the present application provides a data processing device, which may include: a product calculation module for calculating a first group of products and a second group of products, where the first group of products may include the product of the high N bits of a first multiplier and a first multiplicand and the product of the high N bits of a second multiplier and a second multiplicand, the second group of products may include the product of the low N bits of the first multiplier and the first multiplicand and the product of the low N bits of the second multiplier and the second multiplicand, the first multiplier and the second multiplier are both 2N bits, and N is a positive integer; and an accumulation module for separately accumulating the first group of products and the second group of products. It can be seen from the third aspect that, when two or more sets of data undergo the convolution operation, the technical solution provided by this application first accumulates separately the products of the high parts of the multiple multipliers and the multiplicands and then performs the shift processing. This avoids the bit-width expansion of the addition module that results when the product of the high part of each multiplier and the multiplicand is shifted and then directly added to the product of the low part of the corresponding multiplier and the multiplicand.

Optionally, in combination with the third aspect, in a first possible implementation manner, the data processing device may further include a first shift module and a first addition module, and the first multiplicand and the second multiplicand are both N bits. The first shift module is used to shift the accumulated result of the first group of products to obtain a first shift result. The first addition module is used to accumulate the first shift result and the second group of products. The first possible implementation manner of the third aspect thus gives a specific solution for using Nbit×Nbit multiplication modules to form a 2Nbit*Nbit processing module.

Optionally, in combination with the third aspect, in a second possible implementation manner, the first multiplicand and the second multiplicand are both 2N bits, the first group of products may include high-order products and high-low-order products, and the second group of products may include low-high-order products and low-order products. The product calculation module is specifically used to: calculate the high-order products, which may include the product of the high N bits of the first multiplier and the high N bits of the first multiplicand, and the product of the high N bits of the second multiplier and the high N bits of the second multiplicand; calculate the high-low-order products, which may include the product of the high N bits of the first multiplier and the low N bits of the first multiplicand, and the product of the high N bits of the second multiplier and the low N bits of the second multiplicand; calculate the low-high-order products, which may include the product of the low N bits of the first multiplier and the high N bits of the first multiplicand, and the product of the low N bits of the second multiplier and the high N bits of the second multiplicand; and calculate the low-order products, which may include the product of the low N bits of the first multiplier and the low N bits of the first multiplicand, and the product of the low N bits of the second multiplier and the low N bits of the second multiplicand. The accumulation module is specifically used to separately accumulate the high-order products, the high-low-order products, the low-high-order products, and the low-order products through a second addition module with a bit width of 2Nbit. The second possible implementation manner of the third aspect thus shows that a 2Nbit*2Nbit processing module can be formed from Nbit×Nbit multiplication modules.

Optionally, in combination with the second possible implementation manner of the third aspect, in a third possible implementation manner, the data processing device may further include a second shift module, a third addition module, a third shift module, a fourth shift module, and a fourth addition module. The second shift module is used to shift the accumulated result of the high-order products to the left by N bits to obtain a second shift result. The third addition module is used to accumulate the second shift result and the accumulated result of the high-low-order products. The third shift module is used to shift the result output by the third addition module to the left by N bits to obtain a third shift result. The fourth shift module is used to shift the accumulated result of the low-high-order products to the left by N bits to obtain a fourth shift result. The fourth addition module is used to accumulate the third shift result, the fourth shift result, and the low-order products.

Optionally, in combination with the third possible implementation manner of the third aspect, in a fourth possible implementation manner, the data processing device may further include a fifth addition module, used to accumulate the fourth shift result and the low-order products. The fourth addition module is specifically used to accumulate the third shift result and the result output by the fifth addition module.

Optionally, in combination with the third aspect or the first possible implementation manner of the third aspect, in a fifth possible implementation manner, the data processing device may further include a split logic module, used to output, through a selection module (MUX), the high N bits and low N bits of the first multiplier and the high N bits and low N bits of the second multiplier.

Optionally, in combination with the fifth possible implementation manner of the third aspect, in a sixth possible implementation manner, the split logic module is further used to construct a first association relationship. The first association relationship may include the association between the high N bits of the first multiplier and the first multiplicand, the association between the low N bits of the first multiplier and the first multiplicand, the association between the high N bits of the second multiplier and the second multiplicand, and the association between the low N bits of the second multiplier and the second multiplicand.

Optionally, in combination with any one of the second to fourth possible implementation manners of the third aspect, in a seventh possible implementation manner, the data processing device may further include a split logic module, used to output, through a selection module (MUX), the high N bits and low N bits of the first multiplier, the high N bits and low N bits of the second multiplier, the high N bits and low N bits of the first multiplicand, and the high N bits and low N bits of the second multiplicand.

Optionally, in combination with the seventh possible implementation manner of the third aspect, in an eighth possible implementation manner, the split logic module is further used to construct a second association relationship. The second association relationship may include the association between the high N bits of the first multiplier and the high N bits of the first multiplicand, the association between the high N bits of the first multiplier and the low N bits of the first multiplicand, the association between the low N bits of the first multiplier and the high N bits of the first multiplicand, the association between the low N bits of the first multiplier and the low N bits of the first multiplicand, the association between the high N bits of the second multiplier and the high N bits of the second multiplicand, the association between the high N bits of the second multiplier and the low N bits of the second multiplicand, the association between the low N bits of the second multiplier and the high N bits of the second multiplicand, and the association between the low N bits of the second multiplier and the low N bits of the second multiplicand.

Optionally, in combination with the eighth possible implementation manner of the third aspect, in a ninth possible implementation manner, the data processing device may further include a data random access memory (RAM) and a weight RAM. The data RAM is used to store the high N bits and low N bits of the first multiplier and the high N bits and low N bits of the second multiplier. The weight RAM is used to store, according to the second association relationship, the high N bits and low N bits of the first multiplicand and the high N bits and low N bits of the second multiplicand.

Optionally, in combination with the third aspect or any one of the first to ninth possible implementation manners of the third aspect, in a tenth possible implementation manner, the first multiplier and the second multiplier are feature layer data and the first multiplicand and the second multiplicand are convolution kernel data; or the first multiplier and the second multiplier are convolution kernel data and the first multiplicand and the second multiplicand are feature layer data.

The fourth aspect of the present application provides a field programmable gate array FPGA. The FPGA may include the data processing device described in the first aspect or any one of the possible implementation manners of the first aspect.

According to the technical solution provided by the embodiments of the present application, 2Nbit data is split into Nbit parts, so that multiplications involving 2Nbit data can be processed by an Nbit*Nbit data processing device, which avoids increasing the multiplier area. In addition, the products of the high parts of multiple multipliers and the multiplicands are accumulated separately, which avoids the adder bit-width expansion that results when the product of the high part of each multiplier and the multiplicand is shifted and then directly added to the product of the low part of the corresponding multiplier and the multiplicand.

Description of the drawings

FIG. 1 shows an Nbit*Nbit convolution processor;

FIG. 2 shows Nbit*2Nbit and 2Nbit*2Nbit processors composed of Nbit×Nbit multipliers;

FIG. 3 is a schematic diagram of the convolution processing principle of a CNN;

FIG. 4 is an Nbit*2Nbit convolution processing solution provided by an embodiment of the application;

FIG. 5 is an Nbit*2Nbit convolution processing solution provided by an embodiment of the application;

FIG. 6 is a 2Nbit*2Nbit convolution processing solution provided by an embodiment of the application;

FIG. 7 is a schematic diagram of a calculation process in which the solution provided in an embodiment of the application is applied to a product;

FIG. 8 is a schematic diagram of a 2Nbit×Nbit splitting method provided by an embodiment of the application;

FIG. 9 is a schematic diagram of a 2Nbit×2Nbit splitting method provided by an embodiment of the application;

FIG. 10 is a schematic diagram of another calculation process in which the solution provided in an embodiment of the application is applied to a product;

FIG. 11 is a schematic diagram of another calculation process in which the solution provided by an embodiment of the application is applied to a product;

FIG. 12 is a schematic diagram of another calculation process in which the solution provided by an embodiment of the application is applied to a product;

FIG. 13 is a schematic flowchart of a data processing method provided by an embodiment of this application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application.

To facilitate understanding, the process of convolution calculation is first briefly introduced with reference to FIG. 3. The convolution operation is a weighted summation process. For example, each element in the covered image area is multiplied by the corresponding element in the convolution kernel, and the sum of all products is used as the new value of the center pixel of the area. The convolution kernel is a fixed-size matrix composed of numerical parameters. In FIG. 3, for a feature map, the convolutional neural network performs convolution processing on the convolution kernel data A1, A2, ..., An and the feature layer data w1, w2, ..., w3. Specifically, each convolution kernel starts from the first pixel of the feature map and moves pixel by pixel along the row direction. When it reaches the end of the row, it moves down one pixel in the column direction, returns to the starting point in the row direction, and repeats the movement along the row direction until all pixels in the feature map are traversed.
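The weighted summation and the row-by-row traversal described above can be sketched as follows. This is a minimal illustration only: the function name `convolve2d_valid`, the absence of padding, and the toy data are assumptions made for the example and are not taken from FIG. 3.

```python
def convolve2d_valid(feature_map, kernel):
    """Slide `kernel` over `feature_map` and return the weighted sum at each position.

    Assumption for this sketch: no padding, stride 1, kernel applied without flipping.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_rows = len(feature_map) - kh + 1
    out_cols = len(feature_map[0]) - kw + 1
    output = []
    for r in range(out_rows):          # move down one pixel after a row is finished
        row = []
        for c in range(out_cols):      # move pixel by pixel along the row direction
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # multiply each covered element by the corresponding kernel element
                    acc += feature_map[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


feature_map = [[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = convolve2d_valid(feature_map, kernel)
print(result)
```

Every output value is a sum of products, which is why the multiply-accumulate structures discussed in the rest of this description dominate the cost of the convolution layer.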

The technical solution provided in this application can be applied to the field of original image processing, for example, to a scenario in which the original image is denoised. Each pixel of the original image is generally represented by an integer of 10 bits to 12 bits. If the traditional 8bit data format is used for quantization, too much pixel information of the original image is lost and the denoising effect is unsatisfactory. Therefore, when processing the original image, it is necessary to use a high-bit floating point number (FP) data format or a high-bit integer (INT) data format for quantization. Processing in the floating-point data format incurs additional area and power consumption overhead for exponent processing, whereas processing in the INT data format saves this overhead. This application uses Nbit*2Nbit and 2Nbit*2Nbit processors composed of Nbit×Nbit multipliers, where N is a positive integer. When the solution of this application is applied in the field of original image processing, the convolution kernel data can be Nbit and the feature layer data 2Nbit, or the convolution kernel data can be 2Nbit and the feature layer data Nbit, or both the convolution kernel data and the feature layer data can be 2Nbit. It should be noted that the solution provided in this application is not limited to the field of original image processing. How the Nbit×Nbit multipliers constitute the Nbit*2Nbit and 2Nbit*2Nbit processors is described separately below.

FIG. 4 shows an Nbit*2Nbit convolution processing solution provided by an embodiment of this application. In this solution, the 2Nbit data is split into a high-order part and a low-order part. The high-order part is the high N bits, that is, the first N bits of the 2Nbit data, and the low-order part is the low N bits, that is, the last N bits of the 2Nbit data. For example, assuming N is 8, 2Nbit data is 16 bits, such as FF1A, whose high-order part is FF and low-order part is 1A. For 32-bit data such as 3F68415B, the high-order part is 3F68 and the low-order part is 415B. Any prior-art method of splitting 2Nbit data into a high-order part and a low-order part can be used in the embodiments of this application, for example, selecting the high N bits of data A for output to the multiplier 401, or selecting the low N bits of data A for output to the multiplier 403. When there are two or more sets of data to be multiplied, the partial products that require the same number of left shifts are combined. The following example illustrates this.
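The high/low split can be illustrated with a short sketch. The helper name `split_high_low` is hypothetical, and the sketch assumes unsigned data; the description does not state how signed data is handled.

```python
def split_high_low(value, n):
    """Split an unsigned 2N-bit integer into (high N bits, low N bits)."""
    if not 0 <= value < (1 << (2 * n)):
        raise ValueError("value does not fit in 2N bits")
    mask = (1 << n) - 1                 # N ones, selects the low N bits
    return (value >> n) & mask, value & mask


# The two examples from the text: N = 8 for 16-bit data, N = 16 for 32-bit data.
high16, low16 = split_high_low(0xFF1A, 8)
high32, low32 = split_high_low(0x3F68415B, 16)
print(hex(high16), hex(low16), hex(high32), hex(low32))
```

Recombining the parts as `(high << n) + low` gives back the original value, which is the identity that the multiplier arrangements below rely on.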

Suppose there is a convolution operation on two sets of data: A×B+C×D, where A and C are 2Nbit (2N bits) and B and D are Nbit (N bits). In the following, "A-high" denotes the high-order part of data A, that is, the high N bits of A, and "A-low" denotes its low-order part, that is, the low N bits of A; "C-high" and "C-low" are defined in the same way for data C. Then A×B+C×D=[(A-high×B+C-high×D)<<N]+A-low×B+C-low×D. As shown in FIG. 4, the multiplier 401, the multiplier 402, the multiplier 403, and the multiplier 404 are all Nbit×Nbit multipliers. The multiplier 401 can be used to calculate A-high×B, the multiplier 402 to calculate C-high×D, the multiplier 403 to calculate A-low×B, and the multiplier 404 to calculate C-low×D. The adder 405, whose input bit width is 2Nbit, can be used to accumulate the results output by the multiplier 401 and the multiplier 402 to obtain the accumulation result of the first group of products, and the adder 406, whose input bit width is 2Nbit, can be used to accumulate the results output by the multiplier 403 and the multiplier 404 to obtain the accumulation result of the second group of products. The solution provided in this application combines the partial products that require the same number of left shifts. In the A×B+C×D example above, the products A-high×B and C-high×D both need to be shifted left by N bits, so these two partial products are combined, that is, accumulated by the adder 405. The products A-low×B and C-low×D do not need to be shifted left, that is, they are shifted left by 0 bits, so these two partial products are combined, that is, accumulated by the adder 406. The shifter 407 shifts the result output by the adder 405 to the left by N bits.
The result output by the shifter 407 and the result output by the adder 406 are accumulated by the adder 408 to output the final result. The bit width of the adder 408 is 2Nbit.
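The FIG. 4 arrangement can be modeled in a few lines. This is a behavioral sketch under stated assumptions: operands are unsigned, Python integers are unbounded so adder widths are not modeled, and the function name `mac_2n_by_n_pair` and the sample operands are illustrative only.

```python
def mac_2n_by_n_pair(a, b, c, d, n):
    """Compute A*B + C*D (A, C: 2N bits; B, D: N bits) following the FIG. 4 structure."""
    mask = (1 << n) - 1
    a_high, a_low = a >> n, a & mask
    c_high, c_low = c >> n, c & mask

    p401 = a_high * b             # multiplier 401
    p402 = c_high * d             # multiplier 402
    p403 = a_low * b              # multiplier 403
    p404 = c_low * d              # multiplier 404

    sum_high = p401 + p402        # adder 405: first group of products
    sum_low = p403 + p404         # adder 406: second group of products
    shifted = sum_high << n       # shifter 407: one shift for the whole group
    return shifted + sum_low      # adder 408


n = 8
a, b, c, d = 0xFF1A, 0x37, 0x1234, 0xC5
result = mac_2n_by_n_pair(a, b, c, d, n)
print(result == a * b + c * d)
```

Only one shift is applied, and it acts on the already accumulated high-order group rather than on each product.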

It should be noted that the two-set example above does not mean that the technical solution provided by this application applies only to convolution operations on two sets of data. This application does not limit the number of sets of data participating in the convolution operation, and this will not be repeated below. To better understand this solution, the following uses four sets of data as an example to illustrate how Nbit×Nbit multipliers constitute an Nbit*2Nbit processor. FIG. 5 shows an Nbit*2Nbit convolution processing solution provided by an embodiment of this application. Suppose there is a convolution operation on four sets of data: A×B+C×D+E×F+G×H, where A, C, E, and G are 2Nbit (2N bits), and B, D, F, and H are Nbit (N bits). In the following, "A-high" denotes the high-order part of data A, that is, the high N bits of A, and "A-low" denotes the low-order part of data A, that is, the low N bits of A. "C-high" and "C-low", "E-high" and "E-low", and "G-high" and "G-low" are defined in the same way for data C, E, and G. Then A×B+C×D+E×F+G×H=[(A-high×B+C-high×D+E-high×F+G-high×H)<<N]+A-low×B+C-low×D+E-low×F+G-low×H. In this example, A-high×B, C-high×D, E-high×F, and G-high×H can be calculated by the multiplier 501 to the multiplier 504, respectively, and A-low×B, C-low×D, E-low×F, and G-low×H can be calculated by the multiplier 505 to the multiplier 508, respectively. The products A-high×B, C-high×D, E-high×F, and G-high×H all need to be shifted left by N bits, so they are combined.
Specifically, as shown in FIG. 5, the output results of the multiplier 501 and the multiplier 502 can be accumulated by the adder 509, the output results of the multiplier 503 and the multiplier 504 can be accumulated by the adder 510, and the output results of the adder 509 and the adder 510 can be accumulated by the adder 513. The bit widths of the adder 509, the adder 510, and the adder 513 are all 2Nbit. The products A-low×B, C-low×D, E-low×F, and G-low×H do not need to be shifted left, that is, they are shifted left by 0 bits, so they are combined. Specifically, as shown in FIG. 5, the output results of the multiplier 505 and the multiplier 506 can be accumulated by the adder 511, the output results of the multiplier 507 and the multiplier 508 can be accumulated by the adder 512, and the output results of the adder 511 and the adder 512 can be accumulated by the adder 514. The bit widths of the adder 511, the adder 512, and the adder 514 are all 2Nbit. The shifter 515 shifts the output result of the adder 513 to the left by N bits. The adder 516 accumulates the output results of the shifter 515 and the adder 514.
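The four-set adder tree of FIG. 5 can be sketched in the same behavioral style. The function name `mac_four_pairs`, the sample operands, and the unsigned, unbounded-integer arithmetic are assumptions of this sketch.

```python
def mac_four_pairs(wide, narrow, n):
    """Sum of four 2N-bit x N-bit products, arranged like the FIG. 5 adder tree."""
    mask = (1 << n) - 1
    high_products = [(x >> n) * y for x, y in zip(wide, narrow)]    # multipliers 501 to 504
    low_products = [(x & mask) * y for x, y in zip(wide, narrow)]   # multipliers 505 to 508

    s509 = high_products[0] + high_products[1]   # adder 509
    s510 = high_products[2] + high_products[3]   # adder 510
    s513 = s509 + s510                           # adder 513: all high-order products

    s511 = low_products[0] + low_products[1]     # adder 511
    s512 = low_products[2] + low_products[3]     # adder 512
    s514 = s511 + s512                           # adder 514: all low-order products

    shifted = s513 << n                          # shifter 515
    return shifted + s514                        # adder 516


n = 8
wide = [0xFF1A, 0x1234, 0xBEEF, 0x0A0B]      # A, C, E, G (2N bits each)
narrow = [0x37, 0xC5, 0x02, 0xFF]            # B, D, F, H (N bits each)
result = mac_four_pairs(wide, narrow, n)
print(result == sum(x * y for x, y in zip(wide, narrow)))
```

Doubling the number of sets adds multipliers and adders but still needs only the single shifter in front of the final adder.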

If A, C, E, and G are regarded as multipliers and B, D, F, and H as multiplicands, this solution separately accumulates the products of the high-order parts of the multipliers and the multiplicands and the products of the low-order parts of the multipliers and the multiplicands. The accumulated result of the high-order products is then shifted as a whole and added to the accumulated result of the low-order products to form the final result.

Compared with the solutions mentioned in the background art, the technical solution provided in the present application does not need to directly widen the Nbit×Nbit multiplier, which avoids an increase in the area of the multiplier. In addition, the products of the high-order parts of multiple sets of multipliers and the multiplicands are accumulated first and shifted afterwards. This avoids the bit expansion of the adder that occurs when the product of the high-order part of each multiplier and its multiplicand is shifted and then directly added to the product of the low-order part of the corresponding multiplier and the multiplicand. For example, in some solutions, A×B+C×D=[(A-high×B)<<N+A-low×B]+[(C-high×D)<<N+C-low×D], and adders with a bit width of 3Nbit are required to calculate the sum of (A-high×B)<<N and A-low×B and the sum of (C-high×D)<<N and C-low×D. In this solution, as shown in FIG. 5, only the adder that outputs the final result needs to be 3Nbit, and the other adders can be 2Nbit. Moreover, compared with solutions that shift the product of the high-order part of each multiplier and its multiplicand before adding it to the product of the low-order part of the corresponding multiplier and the multiplicand, this solution does not need to perform shift processing multiple times, which saves logic resources.
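The difference in the number of shift operations can be made concrete with a small comparison. Both functions below are illustrative models written for this example, assuming unsigned operands; the names `per_product_scheme` and `grouped_scheme` are hypothetical, and the shift counter simply counts how many shift operations each arrangement performs.

```python
def per_product_scheme(wide, narrow, n):
    """Shift each high-order product before adding its low-order product (compared scheme)."""
    mask = (1 << n) - 1
    total, shifts = 0, 0
    for x, y in zip(wide, narrow):
        shifted = ((x >> n) * y) << n      # one shift per product pair
        shifts += 1
        total += shifted + (x & mask) * y  # this addition takes a shifted (wider) operand
    return total, shifts


def grouped_scheme(wide, narrow, n):
    """Accumulate all high-order products first, then shift the group once."""
    mask = (1 << n) - 1
    high_sum = sum((x >> n) * y for x, y in zip(wide, narrow))
    low_sum = sum((x & mask) * y for x, y in zip(wide, narrow))
    return (high_sum << n) + low_sum, 1    # a single shift regardless of the number of pairs


n = 8
wide = [0xFF1A, 0x1234, 0xBEEF, 0x0A0B]
narrow = [0x37, 0xC5, 0x02, 0xFF]
per_total, per_shifts = per_product_scheme(wide, narrow, n)
grouped_total, grouped_shifts = grouped_scheme(wide, narrow, n)
print(per_total == grouped_total, per_shifts, grouped_shifts)
```

Both arrangements give the same sum; in the grouped arrangement the shift count stays at one as more sets are added, and only the final addition receives a shifted operand.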

FIG. 6 shows a 2Nbit*2Nbit convolution processing solution provided by an embodiment of this application. In this solution, the 2Nbit data is split into a high-order part and a low-order part, which can be understood with reference to the description of FIG. 4 and is not repeated here. When there are two or more sets of data to be multiplied, the partial products that require the same number of left shifts are combined. Specifically, the multiple products of high-order part and high-order part, of high-order part and low-order part, of low-order part and high-order part, and of low-order part and low-order part are each combined as one term of the polynomial. After the combination, each group of products is accumulated separately, each accumulated result is shifted as a whole, and the shifted results are summed to obtain the final result. The following example illustrates this. Suppose there is a convolution operation on two sets of data: A×C+E×G, where A, C, E, and G are all 2Nbit. In the following, "A-high" denotes the high-order part of data A, that is, the high N bits of A, and "A-low" denotes the low-order part of data A, that is, the low N bits of A. "C-high" and "C-low", "E-high" and "E-low", and "G-high" and "G-low" are defined in the same way for data C, E, and G. Then A×C+E×G=[(A-high×C-high+E-high×G-high)<<2N]+[(A-high×C-low+E-high×G-low)<<N]+[(A-low×C-high+E-low×G-high)<<N]+(A-low×C-low+E-low×G-low). As shown in FIG. 6, the multiplier 601 to the multiplier 608 are all Nbit×Nbit multipliers.
The products of high-order part and high-order part can be calculated by the multiplier 601 and the multiplier 602. For example, A-high×C-high can be calculated by the multiplier 601 and E-high×G-high by the multiplier 602. The multiplier 603 and the multiplier 604 can calculate either the products of high-order part and low-order part or the products of low-order part and high-order part. For example, the multiplier 603 can calculate A-high×C-low and the multiplier 604 can calculate E-high×G-low, or the multiplier 603 can calculate A-low×C-high and the multiplier 604 can calculate E-low×G-high. If the multiplier 603 and the multiplier 604 calculate the products of high-order part and low-order part, the multiplier 605 and the multiplier 606 calculate the products of low-order part and high-order part; if the multiplier 603 and the multiplier 604 calculate the products of low-order part and high-order part, the multiplier 605 and the multiplier 606 calculate the products of high-order part and low-order part. Here, the products of high-order part and low-order part are A-high×C-low and E-high×G-low, and the products of low-order part and high-order part are A-low×C-high and E-low×G-high. The products of low-order part and low-order part can be calculated by the multiplier 607 and the multiplier 608. For example, the multiplier 607 can calculate A-low×C-low and the multiplier 608 can calculate E-low×G-low. The adder 609 accumulates the output results of the multiplier 601 and the multiplier 602, the adder 610 accumulates the output results of the multiplier 603 and the multiplier 604, the adder 611 accumulates the output results of the multiplier 605 and the multiplier 606, and the adder 612 accumulates the output results of the multiplier 607 and the multiplier 608.
The bit widths of the adder 609, the adder 610, the adder 611, and the adder 612 are all 2Nbit. The product of the high-order part and the high-order part (hereinafter referred to as the high-order product) needs to be shifted left by 2N bits, the product of the high-order part and the low-order part (hereinafter referred to as the high-low product) and the product of the low-order part and the high-order part (hereinafter referred to as the low-high product) both need to be shifted left by N bits, and the product of the low-order part and the low-order part (hereinafter referred to as the low-order product) does not need to be shifted left, that is, it is shifted left by 0 bits. In a specific implementation, the shifter 613 shifts the output result of the adder 609 to the left by N bits, and the data output by the shifter 613 is 3Nbit; this is the first shift of the high-order product. The adder 615 accumulates the output results of the shifter 613 and the adder 610, and the bit width of the adder 615 is 3Nbit. The shifter 617 shifts the output result of the adder 615 to the left by N bits, and the data output by the shifter 617 is 4Nbit; at this point, the high-order product has been shifted by 2N bits in total. The shifter 614 shifts the output result of the adder 611 to the left by N bits, the adder 616 accumulates the output results of the shifter 614 and the adder 612, and the bit width of the adder 616 is 3Nbit. The adder 618 accumulates the output results of the shifter 617 and the adder 616 to obtain the final output result. The bit width of the adder 618 is 4Nbit.
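The FIG. 6 structure, including the two-stage shift of the high-order product, can be sketched as follows. The sketch assumes unsigned operands and the first of the two possible assignments for the multipliers 603 to 606 (high-low products on 603 and 604, low-high products on 605 and 606); the function name `mac_2n_by_2n_pair` is hypothetical.

```python
def mac_2n_by_2n_pair(a, c, e, g, n):
    """Compute A*C + E*G for 2N-bit operands following the FIG. 6 structure."""
    mask = (1 << n) - 1
    ah, al = a >> n, a & mask
    ch, cl = c >> n, c & mask
    eh, el = e >> n, e & mask
    gh, gl = g >> n, g & mask

    s609 = ah * ch + eh * gh     # multipliers 601, 602 -> adder 609 (high-order products)
    s610 = ah * cl + eh * gl     # multipliers 603, 604 -> adder 610 (high-low products)
    s611 = al * ch + el * gh     # multipliers 605, 606 -> adder 611 (low-high products)
    s612 = al * cl + el * gl     # multipliers 607, 608 -> adder 612 (low-order products)

    t613 = s609 << n             # shifter 613: first N-bit shift of the high-order sum
    s615 = t613 + s610           # adder 615
    t617 = s615 << n             # shifter 617: high-order sum is now shifted by 2N in total
    t614 = s611 << n             # shifter 614
    s616 = t614 + s612           # adder 616
    return t617 + s616           # adder 618


n = 8
a, c, e, g = 0xFF1A, 0x1234, 0xBEEF, 0x0A0B
result = mac_2n_by_2n_pair(a, c, e, g, n)
print(result == a * c + e * g)
```

Because the high-low sum is added before the second shift, it also ends up shifted by N bits, so three N-bit shifters are enough to realize the 2N, N, N, and 0 shift amounts.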

The technical solution provided in this application separately accumulates the products of the high-order part of the multiplier and the high-order part of the multiplicand, the products of the high-order part of the multiplier and the low-order part of the multiplicand, the products of the low-order part of the multiplier and the high-order part of the multiplicand, and the products of the low-order part of the multiplier and the low-order part of the multiplicand, and then shifts and adds the four accumulated results accordingly to obtain the final result. This avoids the bit expansion of the adder that results when the high-order product, the high-low product, and the low-high product of each set of data are shifted individually. For example, in some solutions, A×C+E×G=[(A-high×C-high)<<2N]+[(A-high×C-low)<<N]+[(A-low×C-high)<<N]+(A-low×C-low)+[(E-high×G-high)<<2N]+[(E-high×G-low)<<N]+[(E-low×G-high)<<N]+(E-low×G-low). Such a solution requires multiple shifts, and the more data involved in the convolution operation, the more shifts are required. It also requires a large number of 3Nbit adders as well as 4Nbit adders. The solution of this application combines the partial products that have the same number of left shifts and accumulates each group separately, which greatly saves logic resources.

In a specific implementation of the present application, by controlling whether the shifters are turned on or off, 4 groups of Nbit×Nbit operations, 2 groups of 2Nbit×Nbit operations, or 1 group of 2Nbit×2Nbit operations can be processed within one clock. This is described in detail below with reference to FIG. 6. The shifter 613, the shifter 614, and the shifter 617 can be turned on and off by a state machine. Specifically, a user can input instructions through the state machine to control the turning on and off of the shifter 613, the shifter 614, and the shifter 617. In a specific implementation, the shifter 613, the shifter 614, and the shifter 617 are all controlled to be on. In this case, two groups of 2Nbit×2Nbit operations can be processed; for details, refer to the description of FIG. 6 above. In a specific implementation, the shifter 613 and the shifter 614 are controlled to be on and the shifter 617 is controlled to be off. In this case, 4 groups of 2Nbit×Nbit operations can be processed. In a specific implementation, the shifter 613, the shifter 614, and the shifter 617 are all controlled to be off. In this case, 8 groups of Nbit×Nbit operations can be processed.
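The three shifter configurations can be explored with a small behavioral model of the FIG. 6 datapath. This is a sketch under assumptions: operands are unsigned, a shifter that is turned off passes its input through unchanged, and the way operands are routed to the multipliers 601 to 608 in each mode is chosen for this example because the text does not specify it. The names `fig6_datapath` and `split` are hypothetical.

```python
def split(value, n):
    """Return (high N bits, low N bits) of an unsigned 2N-bit value."""
    return value >> n, value & ((1 << n) - 1)


def fig6_datapath(operand_pairs, sh613, sh614, sh617, n):
    """operand_pairs: eight (x, y) N-bit operand pairs for the multipliers 601 to 608."""
    p = [x * y for x, y in operand_pairs]
    s609, s610 = p[0] + p[1], p[2] + p[3]
    s611, s612 = p[4] + p[5], p[6] + p[7]
    t613 = (s609 << n) if sh613 else s609      # shifter 613 on/off
    s615 = t613 + s610
    t617 = (s615 << n) if sh617 else s615      # shifter 617 on/off
    t614 = (s611 << n) if sh614 else s611      # shifter 614 on/off
    s616 = t614 + s612
    return t617 + s616


n = 8

# All shifters off: eight independent N x N products are summed.
narrow_pairs = [(3, 5), (7, 11), (13, 17), (19, 23), (29, 31), (37, 41), (43, 47), (53, 59)]
mode_n_by_n = fig6_datapath(narrow_pairs, False, False, False, n)

# Shifters 613 and 614 on, 617 off: four 2N x N products (assumed operand routing).
wide = [0xFF1A, 0x1234, 0xBEEF, 0x0A0B]
narrow = [0x37, 0xC5, 0x02, 0xFF]
(x1h, x1l), (x2h, x2l), (x3h, x3l), (x4h, x4l) = (split(x, n) for x in wide)
y1, y2, y3, y4 = narrow
mixed_pairs = [(x1h, y1), (x2h, y2), (x1l, y1), (x2l, y2),
               (x3h, y3), (x4h, y4), (x3l, y3), (x4l, y4)]
mode_2n_by_n = fig6_datapath(mixed_pairs, True, True, False, n)

# All shifters on: two 2N x 2N products, A*C + E*G.
a, c, e, g = 0xFF1A, 0x1234, 0xBEEF, 0x0A0B
(ah, al), (ch, cl), (eh, el), (gh, gl) = (split(v, n) for v in (a, c, e, g))
wide_pairs = [(ah, ch), (eh, gh), (ah, cl), (eh, gl),
              (al, ch), (el, gh), (al, cl), (el, gl)]
mode_2n_by_2n = fig6_datapath(wide_pairs, True, True, True, n)
print(mode_n_by_n, mode_2n_by_n, mode_2n_by_2n)
```

The same eight multipliers and adders serve all three precisions; only the shifter enables and the operand routing change.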

The above describes how to perform calculations based on the feature layer data and the convolution kernel data. In specific application scenarios, the above solutions can be implemented by any convolution operation device, such as a multiplier, a central processing unit (CPU), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or another artificial intelligence (AI) chip.

The following describes a scenario in which the solution provided in this application is applied to a specific product, and describes the calculation process involved. The specific product may be any of the convolution operation devices mentioned above. As shown in FIG. 7, a double data rate synchronous dynamic random access memory (DDR) controller 702 reads data from the DDR 701, and the data includes feature layer data and convolution kernel data. The DDR controller 702 sends the read data to the splitting logic circuit 703. The splitting logic circuit 703 splits the 2Nbit feature layer data into a high-order part and a low-order part and stores the split data in the data random access memory (RAM) 705, and splits the 2Nbit convolution kernel data into a high-order part and a low-order part and stores the split data in the weight RAM 704. The calculation circuit 706 obtains the feature layer data from the data RAM 705 and performs calculation with the convolution kernel data preloaded in the calculation circuit 706. The specific calculation process can be understood with reference to the description of FIGS. 4 to 6. After the calculation circuit 706 completes the calculation, it writes the calculated result into the DDR 701 through the DDR controller 702 to complete the entire process. In a specific implementation, a state machine (not shown in the figure) may also be included to control the turning on and off of the shifters in the calculation circuit. The specific principle has been described in detail above and is not repeated here.

In the following, taking as an example the case where the multiplier and the multiplicand are both 2Nbit, that is, the convolution kernel data and the feature layer data are both 2Nbit, the data calculation process in the structure of the product shown in FIG. 7 is described. The DDR controller reads data from the DDR and sends the read data to the splitting logic circuit. The splitting logic circuit splits the acquired data and establishes corresponding relationships. For example, FIG. 8 shows a 2Nbit×Nbit splitting method. Assume that there are four sets of data for multiplication, A×B+C×D+E×F+G×H, where A, C, E, and G are 2Nbit (2N bits) and B, D, F, and H are Nbit, so that A×B+C×D+E×F+G×H=[(A-high×B+C-high×D+E-high×F+G-high×H)<<N]+A-low×B+C-low×D+E-low×F+G-low×H. The splitting logic circuit splits data A into A-high and A-low and establishes a corresponding relationship between each of A-high and A-low and data B; splits data C into C-high and C-low and establishes a corresponding relationship between each of C-high and C-low and data D; splits data E into E-high and E-low and establishes a corresponding relationship between each of E-high and E-low and data F; and splits data G into G-high and G-low and establishes a corresponding relationship between each of G-high and G-low and data H. It should be noted that the embodiments of this application do not limit the number of sets of data participating in the calculation. In actual application scenarios, two or more sets of data can participate in the calculation.
FIG. 9 shows a 2Nbit×2Nbit splitting method for A×C+E×G, where A, C, E, and G are all 2Nbit and A×C+E×G=[(A-high×C-high+E-high×G-high)<<2N]+[(A-high×C-low+E-high×G-low)<<N]+[(A-low×C-high+E-low×G-high)<<N]+(A-low×C-low+E-low×G-low). The splitting logic circuit splits data A into A-high and A-low, data C into C-high and C-low, data E into E-high and E-low, and data G into G-high and G-low. It then establishes corresponding relationships between A-high and C-high, between E-high and G-high, between A-high and C-low, between E-low and G-high, between A-low and C-high, between E-high and G-low, between A-low and C-low, and between E-low and G-low. According to the corresponding relationships established above, the splitting logic circuit stores the split feature layer data in the data RAM and stores the split convolution kernel data in the weight RAM. Taking 2Nbit×2Nbit as an example, FIG. 10 is a schematic diagram in which the splitting logic circuit splits the 2Nbit data into a high-order part and a low-order part and stores them in the data RAM and the weight RAM. As shown in FIG. 11, the data in the weight RAM is preloaded into the calculation circuits. Specifically, the first segment of data is preloaded into the calculation circuit 1, the second segment of data is preloaded into the calculation circuit 2, ..., and the nth segment of data is preloaded into the calculation circuit n. As shown in FIG. 12, the first segment of data is fetched from the data RAM and calculated with the first segment of data preloaded in the calculation circuit 1 to obtain a result.
The specific calculation process can be understood with reference to the description of FIG. 6 and is not repeated here. After the calculation circuit 1 completes the calculation of the first segment of data from the data RAM with its preloaded first segment of data, it forwards the first segment of data from the data RAM to the calculation circuit 2, obtains the second segment of data from the data RAM, and calculates the second segment of data with the first segment of data preloaded in the calculation circuit 1 to obtain a result. After each clock, the calculation circuit 1 obtains new data from the data RAM, and the calculation circuit 2 to the calculation circuit n forward the feature layer data processed in the previous clock to the next calculation circuit. After all the data stored in the data RAM has been calculated, the calculation circuit 1 to the calculation circuit n output data, and the data output by the calculation circuits is stored in the DDR through the DDR controller.
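The clock-by-clock forwarding of feature layer data through the calculation circuits can be sketched as a simple simulation. Several points here are assumptions made for illustration and are not stated in the text: each circuit is modeled as producing one dot product per feature segment it sees, the high/low splitting inside each circuit is omitted, and the function name `run_pipeline` and the toy data are hypothetical.

```python
def run_pipeline(weight_segments, feature_segments):
    """Simulate calculation circuits 1..n with preloaded weights and forwarded feature data."""
    num_circuits = len(weight_segments)
    held = [None] * num_circuits               # feature segment currently in each circuit
    results = [[] for _ in range(num_circuits)]
    pending = list(feature_segments)           # contents of the data RAM, in order
    total_clocks = len(feature_segments) + num_circuits - 1

    for _ in range(total_clocks):
        # Each circuit receives the segment its predecessor processed in the previous clock.
        for k in range(num_circuits - 1, 0, -1):
            held[k] = held[k - 1]
        # The first circuit fetches new data from the data RAM while any remains.
        held[0] = pending.pop(0) if pending else None
        # Every circuit that holds data computes with its preloaded weight segment.
        for k in range(num_circuits):
            if held[k] is not None:
                dot = sum(w * f for w, f in zip(weight_segments[k], held[k]))
                results[k].append(dot)
    return results


weights = [[1, 2], [3, 4], [5, 6]]       # preloaded into circuits 1, 2, 3
features = [[1, 1], [2, 0], [0, 3]]      # segments stored in the data RAM
results = run_pipeline(weights, features)
print(results)
```

In this model each feature segment is read from the data RAM only once and is then reused by every circuit as it moves down the chain.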

The data processing device provided by the embodiment of the present application and the device including the data processing device have been described above, and the data processing method provided by the embodiment of the present application will be described below.

FIG. 13 is a schematic flowchart of a data processing method provided in an embodiment of this application. As shown in FIG. 13, the data processing method may include the following steps:

1301. Calculate the first group of products and the second group of products. The first group of products can include the product of the high N bits of the first multiplier and the first multiplicand and the product of the high N bits of the second multiplier and the second multiplicand. The second group of products can include the product of the low N bits of the first multiplier and the first multiplicand and the product of the low N bits of the second multiplier and the second multiplicand. The first multiplier and the second multiplier are both 2N bits, and N is a positive integer.

1302. Accumulate the first group of products and the second group of products separately.

According to the technical solution provided in this application, when there are two or more sets of data for the convolution operation, the products of the high-order parts of multiple sets of multipliers and the multiplicands are accumulated first and then shifted. This avoids the bit expansion of the adder that results when the product of the high-order part of each multiplier and its multiplicand is shifted and then directly added to the product of the low-order part of the corresponding multiplier and the multiplicand.

In a specific implementation, the accumulated result of the first group of products is shifted to obtain a first shift result, and the first shift result and the accumulated result of the second group of products are accumulated. In this way, this application can process 2Nbit*Nbit calculations using Nbit×Nbit multipliers.

In a specific embodiment, calculating the first group of products and the second group of products may specifically include the following. Calculate the high-order products, which can include the product of the high N bits of the first multiplier and the high N bits of the first multiplicand and the product of the high N bits of the second multiplier and the high N bits of the second multiplicand. Calculate the high-low products, which can include the product of the high N bits of the first multiplier and the low N bits of the first multiplicand and the product of the high N bits of the second multiplier and the low N bits of the second multiplicand. Calculate the low-high products, which can include the product of the low N bits of the first multiplier and the high N bits of the first multiplicand and the product of the low N bits of the second multiplier and the high N bits of the second multiplicand. Calculate the low-order products, which can include the product of the low N bits of the first multiplier and the low N bits of the first multiplicand and the product of the low N bits of the second multiplier and the low N bits of the second multiplicand. Accumulating the first group of products and the second group of products separately may include: accumulating the high-order products, the high-low products, the low-high products, and the low-order products separately. In this way, the solution provided in this application can process 2Nbit*2Nbit calculations using Nbit×Nbit multipliers.

In a specific implementation, the method may further include: shifting the accumulated result of the high-order products to the left by N bits to obtain a second shift result; accumulating the second shift result and the accumulated result of the high-low products, and shifting the sum to the left by N bits to obtain a third shift result; shifting the accumulated result of the low-high products to the left by N bits to obtain a fourth shift result; and accumulating the third shift result, the fourth shift result, and the accumulated result of the low-order products.

In a specific implementation, the method may further include: accumulating the fourth shift result and the accumulated result of the low-order products. In this case, accumulating the third shift result, the fourth shift result, and the low-order products may include: accumulating the third shift result with the sum of the fourth shift result and the accumulated result of the low-order products.
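The chain of shift results can be checked algebraically. The notation below is introduced only for this derivation and does not appear in the original text: X_1, X_2 are the multipliers and Y_1, Y_2 the multiplicands, with X_i = X_{i,H} 2^N + X_{i,L} and Y_i = Y_{i,H} 2^N + Y_{i,L}; S denotes an accumulated group of products and R_2, R_3, R_4 denote the second, third, and fourth shift results.

```latex
\begin{align*}
S_{HH} &= X_{1,H} Y_{1,H} + X_{2,H} Y_{2,H}, & S_{HL} &= X_{1,H} Y_{1,L} + X_{2,H} Y_{2,L},\\
S_{LH} &= X_{1,L} Y_{1,H} + X_{2,L} Y_{2,H}, & S_{LL} &= X_{1,L} Y_{1,L} + X_{2,L} Y_{2,L},\\[4pt]
R_2 &= S_{HH} \cdot 2^{N},\\
R_3 &= (R_2 + S_{HL}) \cdot 2^{N} = S_{HH} \cdot 2^{2N} + S_{HL} \cdot 2^{N},\\
R_4 &= S_{LH} \cdot 2^{N},\\
R_3 + (R_4 + S_{LL}) &= S_{HH} \cdot 2^{2N} + (S_{HL} + S_{LH}) \cdot 2^{N} + S_{LL} = X_1 Y_1 + X_2 Y_2 .
\end{align*}
```

The last equality follows from expanding each X_i Y_i with the high/low decomposition, so three N-bit shifts are sufficient to apply the 2N, N, N, and 0 shift amounts to the four accumulated groups.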

In a specific implementation, it may further include: outputting the high N bits and low N bits of the first multiplier, and the high N bits and low N bits of the second multiplier.

In a specific embodiment, the method may further include: constructing a first association relationship, which may include an association relationship between the high N bits of the first multiplier and the first multiplicand, an association relationship between the low N bits of the first multiplier and the first multiplicand, an association relationship between the high N bits of the second multiplier and the second multiplicand, and an association relationship between the low N bits of the second multiplier and the second multiplicand.

In a specific implementation, the method may further include: outputting the high N bits and low N bits of the first multiplier, the high N bits and low N bits of the second multiplier, the high N bits and low N bits of the first multiplicand, and the high N bits and low N bits of the second multiplicand.

In a specific embodiment, the method may further include: constructing a second association relationship, which may include an association relationship between the high N bits of the first multiplier and the high N bits of the first multiplicand, an association relationship between the high N bits of the first multiplier and the low N bits of the first multiplicand, an association relationship between the low N bits of the first multiplier and the high N bits of the first multiplicand, an association relationship between the low N bits of the first multiplier and the low N bits of the first multiplicand, an association relationship between the high N bits of the second multiplier and the high N bits of the second multiplicand, an association relationship between the high N bits of the second multiplier and the low N bits of the second multiplicand, an association relationship between the low N bits of the second multiplier and the high N bits of the second multiplicand, and an association relationship between the low N bits of the second multiplier and the low N bits of the second multiplicand.
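One way to picture the second association relationship is as a table with four groups, each holding the half-word pairs that share the same left shift. The sketch below assumes unsigned operands and N = 4; the keys `hh`, `hl`, `lh`, `ll` and the function names are hypothetical labels chosen for illustration.

```python
N = 4                  # assumed half-width; operands are treated as unsigned
MASK = (1 << N) - 1

# Left shift shared by every pair in a group (illustrative labels).
SHIFT = {"hh": 2 * N, "hl": N, "lh": N, "ll": 0}


def second_association(multipliers, multiplicands):
    """Group the four half-word pairings of each (multiplier, multiplicand) pair."""
    table = {"hh": [], "hl": [], "lh": [], "ll": []}
    for a, b in zip(multipliers, multiplicands):
        a_hi, a_lo = (a >> N) & MASK, a & MASK
        b_hi, b_lo = (b >> N) & MASK, b & MASK
        table["hh"].append((a_hi, b_hi))
        table["hl"].append((a_hi, b_lo))
        table["lh"].append((a_lo, b_hi))
        table["ll"].append((a_lo, b_lo))
    return table


def mac_from_table(table):
    """Accumulate each group, apply its shift once, and add the group totals."""
    total = 0
    for key, pairs in table.items():
        group_sum = sum(x * y for x, y in pairs)
        total += group_sum << SHIFT[key]
    return total


if __name__ == "__main__":
    t = second_association([0xAB, 0x3C], [0x12, 0xF0])
    print(mac_from_table(t) == 0xAB * 0x12 + 0x3C * 0xF0)
```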

In a specific embodiment, the first multiplier and the second multiplier are feature layer data and the first multiplicand and the second multiplicand are convolution kernel data, or the first multiplier and the second multiplier are convolution kernel data and the first multiplicand and the second multiplicand are feature layer data. When the solution of this application is applied in the field of original image processing, the convolution kernel data may be N bits and the feature layer data 2N bits, or the convolution kernel data may be 2N bits and the feature layer data N bits, or both the convolution kernel data and the feature layer data may be 2N bits. It should be noted that the solution provided by this application is not limited to the field of original image processing.

The data processing device and data processing method provided by the embodiments of this application are described in detail above. Specific examples are used herein to illustrate the principles and implementations of this application, and the description of the above embodiments is intended only to help in understanding it. Meanwhile, those skilled in the art may, based on the ideas of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as a limitation on this application.

Claims

A data processing device, characterized in that it comprises:

A product calculation circuit, configured to calculate a first set of products and a second set of products, wherein the first set of products includes the product of the high N bits of a first multiplier and a first multiplicand, and the product of the high N bits of a second multiplier and a second multiplicand; the second set of products includes the product of the low N bits of the first multiplier and the first multiplicand, and the product of the low N bits of the second multiplier and the second multiplicand; the first multiplier and the second multiplier are both 2N bits; and N is a positive integer;

An accumulation circuit, configured to accumulate the first set of products and the second set of products separately.
The data processing device according to claim 1, wherein the data processing device further comprises a first shifter and a first adder, and the first multiplicand and the second multiplicand are both N bits,

The first shifter is configured to perform shift processing on the accumulated sum of the first set of products to obtain a first shift result;

The first adder is configured to accumulate the first shift result and the second set of products.
The data processing device according to claim 1, wherein the first multiplicand and the second multiplicand are both 2N bits, the first set of products includes a high-order product and a high-low-order product, the second set of products includes a low-high-order product and a low-order product, and the product calculation circuit is specifically configured to:

Calculate the high-order product, the high-order product including the product of the high N bits of the first multiplier and the high N bits of the first multiplicand, and the product of the high N bits of the second multiplier and the high N bits of the second multiplicand;

Calculate the high-low-order product, the high-low-order product including the product of the high N bits of the first multiplier and the low N bits of the first multiplicand, and the product of the high N bits of the second multiplier and the low N bits of the second multiplicand;

Calculate the low-high-order product, the low-high-order product including the product of the low N bits of the first multiplier and the high N bits of the first multiplicand, and the product of the low N bits of the second multiplier and the high N bits of the second multiplicand;

Calculate the low-order product, the low-order product including the product of the low N bits of the first multiplier and the low N bits of the first multiplicand, and the product of the low N bits of the second multiplier and the low N bits of the second multiplicand;

The accumulation circuit is specifically configured to accumulate the high-order product, the high-low-order product, the low-high-order product, and the low-order product through a second adder with a bit width of 2Nbit.
The data processing device according to claim 3, wherein the data processing device further comprises a second shifter, a third adder, a third shifter, a fourth shifter, and a fourth adder,

The second shifter is configured to shift the accumulated sum of the high-order product to the left by N bits to obtain a second shift result;

The third adder is configured to accumulate the second shift result and the accumulated sum of the high-low-order product;

The third shifter is configured to shift the result output by the third adder to the left by N bits to obtain a third shift result;

The fourth shifter is configured to shift the accumulated sum of the low-high-order product to the left by N bits to obtain a fourth shift result;

The fourth adder is configured to accumulate the third shift result, the fourth shift result, and the low-order product.
The data processing device according to claim 4, wherein the data processing device further comprises a fifth adder for:

Accumulate the fourth shift result and the low-order product;

The fourth adder is specifically configured to accumulate the third shift result and the result output by the fifth adder.
The data processing device according to claim 1 or 2, further comprising a split logic circuit for:

The high N bits and low N bits of the first multiplier, and the high N bits and low N bits of the second multiplier are output through the selector MUX.
The data processing device according to claim 6, wherein the split logic circuit is further used for:

Construct a first association relationship, the first association relationship including an association relationship between the high N bits of the first multiplier and the first multiplicand, an association relationship between the low N bits of the first multiplier and the first multiplicand, an association relationship between the high N bits of the second multiplier and the second multiplicand, and an association relationship between the low N bits of the second multiplier and the second multiplicand.
The data processing device according to any one of claims 3 to 5, further comprising a split logic circuit for:

Output, through the selector MUX, the high N bits and low N bits of the first multiplier, the high N bits and low N bits of the second multiplier, the high N bits and low N bits of the first multiplicand, and the high N bits and low N bits of the second multiplicand.
The data processing device according to claim 8, wherein the split logic circuit is further used for:

Construct a second association relationship, the second association relationship including an association relationship between the high N bits of the first multiplier and the high N bits of the first multiplicand, an association relationship between the high N bits of the first multiplier and the low N bits of the first multiplicand, an association relationship between the low N bits of the first multiplier and the high N bits of the first multiplicand, an association relationship between the low N bits of the first multiplier and the low N bits of the first multiplicand, an association relationship between the high N bits of the second multiplier and the high N bits of the second multiplicand, an association relationship between the high N bits of the second multiplier and the low N bits of the second multiplicand, an association relationship between the low N bits of the second multiplier and the high N bits of the second multiplicand, and an association relationship between the low N bits of the second multiplier and the low N bits of the second multiplicand.
The data processing device according to claim 9, wherein the data processing device further comprises a data random access memory RAM, and a weight RAM,

The data RAM is used to store the high N bits and low N bits of the first multiplier, and the high N bits and low N bits of the second multiplier;

The weight RAM is configured to store the high N bits and low N bits of the first multiplicand, and the high N bits and low N bits of the second multiplicand according to the second association relationship.
The data processing device according to any one of claims 1 to 10, wherein the first multiplier and the second multiplier are feature layer data and the first multiplicand and the second multiplicand are convolution kernel data, or the first multiplier and the second multiplier are convolution kernel data and the first multiplicand and the second multiplicand are feature layer data.
A data processing method, characterized in that it comprises:

Calculate a first set of products and a second set of products, wherein the first set of products includes the product of the high N bits of a first multiplier and a first multiplicand, and the product of the high N bits of a second multiplier and a second multiplicand; the second set of products includes the product of the low N bits of the first multiplier and the first multiplicand, and the product of the low N bits of the second multiplier and the second multiplicand; the first multiplier and the second multiplier are both 2N bits; and N is a positive integer;

Accumulate the first set of products and the second set of products separately.
The data processing method according to claim 12, further comprising:

Perform shift processing on the accumulated sum of the first set of products to obtain a first shift result;

Accumulate the first shift result and the second set of products.
The data processing method according to claim 12, wherein said calculating the first set of products and the second set of products specifically comprises:

Calculate the high-order product, the high-order product including the product of the high N bits of the first multiplier and the high N bits of the first multiplicand, and the product of the high N bits of the second multiplier and the high N bits of the second multiplicand;

Calculate the high-low-order product, the high-low-order product including the product of the high N bits of the first multiplier and the low N bits of the first multiplicand, and the product of the high N bits of the second multiplier and the low N bits of the second multiplicand;

Calculate the low-high-order product, the low-high-order product including the product of the low N bits of the first multiplier and the high N bits of the first multiplicand, and the product of the low N bits of the second multiplier and the high N bits of the second multiplicand;

Calculate the low-order product, the low-order product including the product of the low N bits of the first multiplier and the low N bits of the first multiplicand, and the product of the low N bits of the second multiplier and the low N bits of the second multiplicand;

The step of separately accumulating the first group of products and the second group of products includes:

The high-order product, the high-low-order product, the low-high-order product, and the low-order product are accumulated respectively.
The data processing method according to claim 14, further comprising:

Shift the accumulated sum of the high-order product to the left by N bits to obtain a second shift result;

Accumulate the second shift result and the accumulated sum of the high-low-order product;

Shift the result of accumulating the second shift result and the accumulated sum of the high-low-order product to the left by N bits to obtain a third shift result;

Shift the accumulated sum of the low-high-order product to the left by N bits to obtain a fourth shift result;

Accumulate the third shift result, the fourth shift result, and the low-order product.
The data processing method according to claim 15, further comprising:

Accumulate the fourth shift result and the low-order product;

Accumulating the third shift result, the fourth shift result, and the low-order product includes:

Accumulate the third shift result and the sum obtained by accumulating the fourth shift result and the low-order product.
The data processing method according to claim 12 or 13, further comprising:

Output high N bits and low N bits of the first multiplier, and high N bits and low N bits of the second multiplier.
The data processing method according to claim 17, further comprising:

Construct a first association relationship, the first association relationship including an association relationship between the high N bits of the first multiplier and the first multiplicand, an association relationship between the low N bits of the first multiplier and the first multiplicand, an association relationship between the high N bits of the second multiplier and the second multiplicand, and an association relationship between the low N bits of the second multiplier and the second multiplicand.
The data processing method according to any one of claims 14 to 16, further comprising:

Output the high N bits and low N bits of the first multiplier, the high N bits and low N bits of the second multiplier, the high N bits and low N bits of the first multiplicand, and the high N bits and low N bits of the second multiplicand.
The data processing method according to claim 19, further comprising:

Construct a second association relationship, the second association relationship including an association relationship between the high N bits of the first multiplier and the high N bits of the first multiplicand, an association relationship between the high N bits of the first multiplier and the low N bits of the first multiplicand, an association relationship between the low N bits of the first multiplier and the high N bits of the first multiplicand, an association relationship between the low N bits of the first multiplier and the low N bits of the first multiplicand, an association relationship between the high N bits of the second multiplier and the high N bits of the second multiplicand, an association relationship between the high N bits of the second multiplier and the low N bits of the second multiplicand, an association relationship between the low N bits of the second multiplier and the high N bits of the second multiplicand, and an association relationship between the low N bits of the second multiplier and the low N bits of the second multiplicand.
The data processing method according to any one of claims 12 to 20, wherein the first multiplier and the second multiplier are feature layer data and the first multiplicand and the second multiplicand are convolution kernel data, or the first multiplier and the second multiplier are convolution kernel data and the first multiplicand and the second multiplicand are feature layer data.
A field programmable gate array FPGA, characterized in that the FPGA comprises the data processing device described in any one of claims 1 to 11.