CN112612447A - Matrix calculator and full-connection-layer calculation method based on matrix calculator - Google Patents
- Publication number
- CN112612447A (application CN202011638796.5A)
- Authority
- CN
- China
- Prior art keywords
- row
- matrix
- data
- multiply
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/491—Computations with decimal numbers radix 12 or 20.
- G06F7/498—Computations with decimal numbers radix 12 or 20. using counter-type accumulators
- G06F7/4983—Multiplying; Dividing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
- G06F7/505—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
- G06F7/509—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination for multiple operands, e.g. digital integrators
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a matrix calculator and a fully-connected-layer calculation method based on the matrix calculator. The matrix calculator comprises H rows and W columns of multiply-accumulate units, each comprising a multiplier and an accumulator. Each row of multiply-accumulate units is provided with an addition tree and a row accumulation register; the addition tree calculates the sum of the current calculation results of the multiply-accumulate units in that row and accumulates this sum into the row accumulation register. The matrix calculator is provided with a first control circuit, which enables the addition tree and the row accumulation register of each row when the result matrix has only one row or one column, and a second control circuit, which disables the accumulators in the multiply-accumulate units. The matrix calculator can perform matrix multiplication efficiently, particularly when one dimension of the result matrix is small: for example, when the result matrix has only one row or one column, it makes efficient use of the hardware multiplier array and thereby improves calculation efficiency.
Description
Technical Field
The invention relates to the technical field of integrated circuits, in particular to a matrix calculator and a full-connection layer calculation method based on the matrix calculator.
Background
Matrix multiplication is the most basic operation in linear algebra and is commonly applied in the fields of image processing, artificial neural networks, deep learning and the like. Especially in the deep learning field, most of the operations are converted into matrix multiplication operations. For convolution layers, a convolution kernel is expanded, input layers participating in convolution operation are also expanded, and the convolution operation can be converted into ordinary matrix operation.
Matrix operations require a large number of repeated multiply-accumulate calculations, a large amount of input data to be read, and a large number of results to be written out. A traditional CPU can perform only one multiplication per cycle and is therefore poorly suited to a computation-intensive algorithm such as matrix multiplication. Various hardware acceleration circuits have been designed for matrix operations; a common approach is to compute with an array of multiply-accumulate units. Fig. 1 shows a common arrangement of such an array, in which each MAC denotes one multiply-accumulate unit.
The matrix calculator shown in fig. 1 can perform calculations of the type D = A × B + C. Each row of the matrix calculator receives the data of the corresponding row of the left matrix A, and each column receives the data of the corresponding column of the right matrix B. In each calculation cycle, one new datum is input to each row and broadcast to all multiply-accumulate units of that row; at the same time, one new datum is input to each column and broadcast to all multiply-accumulate units of that column. Each multiply-accumulate unit multiplies the data received from the row direction and the column direction and accumulates the product in its local multiply-accumulate result register. When all the data of the rows of the left matrix A have been input to the matrix calculator, all the data of the columns of the right matrix B must also have been input, i.e. the number of data in each row of A must equal the number of data in each column of B. At this point, each multiply-accumulate unit holds one entry of the result matrix.
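The broadcast-and-accumulate behaviour described above can be sketched in a few lines of Python. This is only an illustrative behavioural model, not the patented circuit: the function name `conventional_array_matmul` is hypothetical, the matrices are assumed to fit the array without blocking, and preloading C into the local registers is an assumption about how the "+ C" term is realized.

```python
def conventional_array_matmul(A, B, C):
    """Simulate a conventional H x W MAC array computing D = A*B + C.

    A is H x N, B is N x W, C is H x W (lists of lists). Assumption: the
    matrices already fit the array, so no block partitioning is needed.
    """
    H, N, W = len(A), len(B), len(B[0])
    # Assumption: each local multiply-accumulate register starts from C.
    acc = [[C[i][j] for j in range(W)] for i in range(H)]
    for k in range(N):  # one calculation cycle per step of the shared dimension
        row_inputs = [A[i][k] for i in range(H)]  # broadcast along each row
        col_inputs = [B[k][j] for j in range(W)]  # broadcast along each column
        for i in range(H):
            for j in range(W):
                acc[i][j] += row_inputs[i] * col_inputs[j]
    return acc, N  # result matrix and number of cycles used


A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [0, 1], [2, 2]]
C = [[10, 10], [10, 10]]
D, cycles = conventional_array_matmul(A, B, C)
print(D, cycles)
```

Every unit holds exactly one result entry, which is why the number of cycles equals the shared dimension N regardless of how many units actually carry useful results.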
Of course, if the number of rows of matrix A is greater than the number of rows of the matrix calculator, and/or the number of columns of matrix B is greater than the number of columns of the matrix calculator, matrix A and/or matrix B must be partitioned into blocks and the result calculated over multiple passes. For example, if matrix A has M rows and N columns, matrix B has N rows and K columns, and the matrix calculator has H rows and W columns, then one A × B operation requires at least $\lceil M/H \rceil \times \lceil K/W \rceil \times N$ cycles. The acceleration performance of the matrix calculator is therefore poor when M and K are small and N is large.
The fully-connected layer is one of the most common layers in deep learning models. For example, in image recognition models based on deep learning, the last layer is almost always a fully-connected layer whose number of outputs equals the number of object classes to be recognized (if there are 10 classes, the fully-connected layer has 10 outputs); after a softmax function is applied, each output represents the probability that the image belongs to the corresponding class.
The fully-connected layer can also be converted into a matrix calculation. If the last layer of the deep learning model contains N elements and the second-to-last layer contains M elements, the calculation of the fully-connected layer is equivalent to computing the product of an N-row, M-column matrix and an M-row, 1-column matrix, or the product of a 1-row, M-column matrix and an M-row, N-column matrix. Because the result matrix then has only one row or one column, the matrix calculator shown in fig. 1 can use only one column or one row of its array, which makes it inefficient for fully-connected-layer calculation: if the matrix calculator comprises H rows and W columns of multiply-accumulate units, the utilization is at most 1/H or 1/W.
For the fully-connected layer, the result matrix is a single vector; whether it is a row vector or a column vector, one of M and K must equal 1, so the matrix calculator effectively works as a multiplier array with only 1 row or 1 column. The conventional matrix calculator shown in fig. 1 is therefore not well suited to fully-connected-layer calculation in deep learning models.
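To make the utilization argument concrete, the following small Python sketch computes the fraction of a conventional H × W array that holds a useful result entry. The helper name `conventional_utilization` is hypothetical, and the model simply assumes that each unit of the array holds at most one entry of the result block.

```python
def conventional_utilization(M, K, H, W):
    """Fraction of an H x W MAC array holding a useful result entry
    when an M x K result block is mapped onto it (one entry per unit)."""
    return (min(M, H) * min(K, W)) / (H * W)


# Result is a column vector: only one column of the 4 x 8 array is used.
column_case = conventional_utilization(M=4, K=1, H=4, W=8)
# Result is a row vector: only one row of the 4 x 8 array is used.
row_case = conventional_utilization(M=1, K=8, H=4, W=8)
print(column_case, row_case)
```

With H = 4 and W = 8, the column-vector case uses 1/W of the array and the row-vector case uses 1/H, matching the bounds stated above.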
Disclosure of Invention
Aiming at the defects of the existing matrix calculator, the invention provides the matrix calculator and the full-connection layer calculation method based on the matrix calculator, and the calculation efficiency is improved.
A matrix calculator comprises H rows and W columns of multiply-accumulate units. Each multiply-accumulate unit comprises a multiplier and an accumulator, receives data input from the row direction and the column direction, multiplies them, and accumulates the product in an internal accumulation register. Each row of multiply-accumulate units is provided with an addition tree and a row accumulation register; the addition tree calculates the sum of the current calculation results of the multiply-accumulate units in that row and accumulates this sum into the row accumulation register. The matrix calculator is provided with a first control circuit for enabling the addition tree and the row accumulation register of each row when the result matrix has only one row or one column, and with a second control circuit for disabling the accumulator in each multiply-accumulate unit.
Further, the product of a matrix with M rows and N columns and a matrix with N rows and 1 column is calculated by the H rows and W columns of multiply-accumulate units, where M > H and N > W, comprising the following steps:
step 2.1, in each round, H rows of left-matrix data are sent to the matrix calculator; each row of data is sent from the row direction to the multiply-accumulate units of the corresponding row, the N data of each row being distributed over the W multiply-accumulate units of that row, with each multiply-accumulate unit allocated at most $\lceil N/W \rceil$ data;
step 2.2, corresponding to step 2.1, the N data of the single column of the right matrix are fed to each row of the matrix calculator from the column direction and distributed over the W multiply-accumulate units of that row, with each multiply-accumulate unit allocated at most $\lceil N/W \rceil$ data;
step 2.3, in each cycle, only 1 datum is fed to each multiply-accumulate unit from each of the row direction and the column direction; each multiply-accumulate unit multiplies the two data, the addition tree then sums the products of the multipliers in the same row, adds this sum to the previous accumulation result in the row accumulation register, and writes the result back to the row accumulation register. After at most $\lceil N/W \rceil$ cycles, the product of the block submatrix and the right matrix is obtained, which is a matrix with H rows and 1 column.
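Steps 2.1 to 2.3 can be sketched as a single-round routine in Python. This is a behavioural sketch under stated assumptions rather than the claimed hardware: the function name `one_round` is hypothetical, and it assumes that each multiply-accumulate unit is given a contiguous chunk of at most ceil(N / W) data from its row, with units that run out of data simply staying idle.

```python
import math


def one_round(block, vec, W):
    """One round: multiply an H x N block of the left matrix by an N x 1 vector.

    Assumption: unit j of every row owns the contiguous chunk of indices
    [j * chunk, (j + 1) * chunk), where chunk = ceil(N / W); one datum per
    unit is consumed in each cycle.
    """
    N = len(vec)
    chunk = math.ceil(N / W)       # data per unit = cycles per round
    row_acc = [0] * len(block)     # row accumulation registers
    for t in range(chunk):         # one cycle
        for i, row in enumerate(block):
            products = []
            for j in range(W):     # unit j of row i
                idx = j * chunk + t
                if idx < N:        # units with no data left stay idle
                    products.append(row[idx] * vec[idx])
            # Addition tree sums the row's products; the sum is then
            # accumulated into the row accumulation register.
            row_acc[i] += sum(products)
    return row_acc, chunk


block = [[1, 2, 3, 4, 5], [1, 1, 1, 1, 1]]
vec = [1, 2, 3, 4, 5]
result, cycles = one_round(block, vec, W=2)
print(result, cycles)
```

Because every unit in a row contributes a product in (almost) every cycle, a round finishes in about N / W cycles instead of N.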
Further, if M is an integer multiple of H and N is an integer multiple of W, each round of calculation proceeds as follows:
H rows of left-matrix data are sent to the corresponding H rows of multipliers of the matrix calculator, and the W multipliers in each row are fed, in order, the 1st to $(N/W)$-th data, the $(N/W+1)$-th to $(2N/W)$-th data, the $(2N/W+1)$-th to $(3N/W)$-th data, ..., and the $((W-1)N/W+1)$-th to $N$-th data of the corresponding row of the left matrix;
correspondingly, the single column of right-matrix data is sent to the H rows of multipliers of the matrix calculator, and the W multipliers in each row are fed, in order, the 1st to $(N/W)$-th data, the $(N/W+1)$-th to $(2N/W)$-th data, the $(2N/W+1)$-th to $(3N/W)$-th data, ..., and the $((W-1)N/W+1)$-th to $N$-th data of the 1st column of the right matrix;
each multiplier is fed 1 datum per cycle, so receiving its $N/W$ data takes $N/W$ cycles; in each cycle, each multiplier multiplies the data sent from the row direction and the column direction, the addition tree then sums the products of the multipliers in the same row, adds this sum to the previous accumulation result in the row accumulation register, and writes the result back to the row accumulation register. After $N/W$ cycles, the accumulated value stored in each row accumulation register is the result of multiplying one row of the left matrix by the 1st column of the right matrix.
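Written out as an equation, and assuming that each multiplier holds a contiguous chunk of N/W data of its row (an assumption consistent with the 4-data-per-unit example in the embodiment; the symbols $y_i$, $a_{i,n}$ and $b_n$ are introduced here only for illustration), the value left in the row accumulation register of row $i$ is the ordinary dot product summed in a different order:

$$
y_i \;=\; \sum_{t=1}^{N/W} \; \sum_{j=1}^{W} a_{i,\,(j-1)N/W + t}\; b_{(j-1)N/W + t} \;=\; \sum_{n=1}^{N} a_{i,n}\, b_{n}
$$

The inner sum over $j$ corresponds to what the addition tree computes in cycle $t$, and the outer sum over $t$ corresponds to the accumulation in the row accumulation register.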
The matrix calculator provided by the invention can efficiently realize matrix multiplication, particularly under the condition that the result matrix has a smaller dimension, for example, under the condition that the result matrix has only one row or one column, the matrix calculator can efficiently utilize a hardware multiplier array to achieve the effect of improving the calculation efficiency.
Drawings
FIG. 1 is a prior art matrix calculator architecture;
FIG. 2 is a matrix calculator structure of the present invention;
FIG. 3 is a schematic diagram of an internal circuit of the multiply-accumulate unit;
FIG. 4 is a schematic diagram of a left matrix block;
FIG. 5 is a diagram of input data to a row multiplier.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. The embodiments of the present invention have been presented for purposes of illustration and description, and are not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Example 1
Taking a matrix calculator with 4 rows and 8 columns as an example, the matrix calculator comprises 4 rows and 8 columns of multiply-accumulate units (MACs). Each MAC comprises a multiplier MUL and an accumulator ADD, and each row of MACs is provided with an addition tree and a row accumulation register, as shown in fig. 2. The addition tree calculates the sum of the current calculation results of the MACs in its row and accumulates this sum into the row accumulation register.
The matrix calculator is provided with a first control circuit for enabling the addition tree and the row accumulation register of each row when the result matrix has only one row or one column, and with a second control circuit for disabling the accumulator in each multiply-accumulate unit. The internal structure of the multiply-accumulate unit MAC is shown in fig. 3, where module ① is the input from the row direction, module ② is the input from the column direction, module ③ is the multiplier, module ④ is the output to the row addition tree, module ⑤ is the adder, and module ⑥ is the multiply-accumulate register, which stores the multiply-accumulate result of the unit. When the first and second control circuits are both active, modules ⑤ and ⑥ do not work and module ④ works; when both are inactive, module ④ does not work and modules ⑤ and ⑥ work.
The following describes the steps by which the matrix calculator of fig. 2 calculates the product of an M-row, N-column matrix and an N-row, 1-column matrix. The invention mainly addresses the case M > H, N > W. Since H = 4 and W = 8 in this embodiment, M = 12 and N = 32 are chosen without loss of generality, i.e. the product of a left matrix with 12 rows and 32 columns and a right matrix with 32 rows and 1 column is calculated by 4 rows and 8 columns of multiply-accumulate units.
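The two operating modes of a multiply-accumulate unit can be illustrated with a small behavioural model in Python. The class name `MacUnit` and the flag `row_tree_mode` are hypothetical; the flag stands for "both control circuits active" and is an abstraction of the control signals, not a description of the actual circuit in fig. 3.

```python
class MacUnit:
    """Behavioural model of one multiply-accumulate unit with two modes.

    row_tree_mode=True models both control circuits being active: the local
    accumulator is bypassed and the product is exposed to the row addition
    tree. row_tree_mode=False models the conventional local accumulation.
    """

    def __init__(self):
        self.acc = 0  # local multiply-accumulate register

    def step(self, row_in, col_in, row_tree_mode):
        product = row_in * col_in
        if row_tree_mode:
            return product    # forwarded to the row addition tree
        self.acc += product   # conventional local accumulation
        return None


mac = MacUnit()
mac.step(2, 3, row_tree_mode=False)   # local register becomes 6
mac.step(4, 5, row_tree_mode=False)   # local register becomes 26
forwarded = mac.step(2, 3, row_tree_mode=True)
print(mac.acc, forwarded)
```

In row-tree mode the local register is left untouched, which mirrors the statement that the accumulator is disabled while the output to the addition tree is active.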
First, the adder tree and the row accumulation registers of all rows are controlled to work by the first control circuit, and the accumulators in all multiply-accumulate units are disabled by the second control circuit.
The left matrix has 12 rows, whereas the matrix calculator has only 4 rows of multiply-accumulate units; the left matrix is therefore divided into 3 blocks of 4 rows each, as shown in fig. 4. In each round, 4 rows of data are sent to the matrix calculator; the 32 data of each row are distributed over the 8 multiply-accumulate units of the same row, 4 data per unit, and with 1 datum sent per cycle the transfer completes in 4 cycles. All 12 rows are processed after 3 rounds.
Meanwhile, in each round the 32 data of the single column of the right matrix are sent to every row of the matrix calculator and distributed over the 8 multiply-accumulate units of that row; likewise, each multiply-accumulate unit is allocated 4 data, 1 datum is sent per cycle, and the transfer completes in 4 cycles.
In each period, each multiply-accumulate unit in the matrix calculator performs multiplication operation on data sent in the row direction (left matrix) and the column direction (right matrix), then the addition tree performs full addition on the calculation result of the multiply-accumulate unit in the same row, sums the calculation result with the last accumulation result in the row accumulation register, and updates the summation result to the row accumulation register, which is shown in fig. 5.
After 4 cycles of the 1 st round, the stored result in the row accumulating register of the 1 st row is the multiplication result of the 1 st row of the left matrix and the 1 st column of the right matrix, namely the result of the 1 st row of the result matrix; the result stored in the row 2 accumulation register is the multiplication result of the row 2 of the left matrix and the column 1 of the right matrix, namely the result of the row 2 of the result matrix; the stored result in the row 3 accumulation register is the multiplication result of the row 3 of the left matrix and the column 1 of the right matrix, namely the result of the row 3 of the result matrix; the row 4 accumulator register stores the result, i.e. the multiplication result of the row 4 of the left matrix and the column 1 of the right matrix, i.e. the result of the row 4 of the result matrix.
After the 4 cycles of the 2nd round, the result stored in the row accumulation register of row 1 is the product of row 5 of the left matrix and column 1 of the right matrix, i.e. row 5 of the result matrix; the result stored in the row accumulation register of row 2 is the product of row 6 of the left matrix and column 1 of the right matrix, i.e. row 6 of the result matrix; the result stored in the row accumulation register of row 3 is the product of row 7 of the left matrix and column 1 of the right matrix, i.e. row 7 of the result matrix; and the result stored in the row accumulation register of row 4 is the product of row 8 of the left matrix and column 1 of the right matrix, i.e. row 8 of the result matrix. The 3rd round proceeds in the same way and is omitted for brevity.
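The schedule of this embodiment can be tabulated with a short Python sketch. The helper names `rows_in_round` and `column_index` are hypothetical, and the mapping of columns to units assumes that each unit owns a contiguous chunk of 4 data of its row; fig. 5 of the patent is the authoritative reference for the actual feeding order.

```python
H, W, M, N = 4, 8, 12, 32
per_unit = N // W   # 4 data per unit, hence 4 cycles per round
rounds = M // H     # 3 rounds


def rows_in_round(r):
    """1-based left-matrix rows handled in round r (1-based)."""
    return [(r - 1) * H + i for i in range(1, H + 1)]


def column_index(unit, cycle):
    """1-based column index fed to `unit` in `cycle` (both 1-based),
    assuming each unit owns a contiguous chunk of its row."""
    return (unit - 1) * per_unit + cycle


# schedule[c - 1][u - 1] is the column index fed to unit u in cycle c.
schedule = [[column_index(u, c) for u in range(1, W + 1)]
            for c in range(1, per_unit + 1)]

for r in range(1, rounds + 1):
    print("round", r, "rows", rows_in_round(r))
for c, row in enumerate(schedule, start=1):
    print("cycle", c, "columns", row)
```

Under this assumed layout, round 2 handles rows 5 to 8 of the left matrix, in agreement with the description above, and the whole product takes 3 rounds of 4 cycles each.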
In this embodiment, M is an integer multiple of H and N is an integer multiple of W. It is easy to see that the invention applies equally to fully-connected-layer calculation when they are not integer multiples. For example, if the left matrix has 10 rows, only two rows of data need to be input in the 3rd round; if the left matrix has 30 columns, the last multiply-accumulate unit in each row receives data only in the first 2 of the 4 cycles.
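The two non-multiple cases mentioned above can be checked with a small Python sketch. The helper names `active_cycles` and `rows_per_round` are hypothetical, and the sketch assumes contiguous chunks of ceil(N / W) data per unit, which is the layout implied by the statement that only the last unit runs short of data.

```python
import math


def active_cycles(N, W):
    """Number of cycles in which each unit of a row receives data,
    assuming contiguous chunks of ceil(N / W) data per unit."""
    chunk = math.ceil(N / W)
    return [max(0, min(chunk, N - j * chunk)) for j in range(W)]


def rows_per_round(M, H):
    """Number of left-matrix rows fed to the calculator in each round."""
    return [min(H, M - r * H) for r in range(math.ceil(M / H))]


print(active_cycles(30, 8))   # last unit is active for fewer cycles
print(rows_per_round(10, 4))  # last round feeds fewer rows
```

For a 30-column left matrix on 8 units, the last unit is active in 2 of the 4 cycles; for a 10-row left matrix on 4 rows, the 3rd round feeds only 2 rows.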
If the left matrix has M rows, it must be partitioned into blocks, and in each round H rows are sent to the matrix calculator. In each cycle, W data of each of these H rows are sent to the corresponding row of multiply-accumulate units, so each round needs $\lceil N/W \rceil$ cycles to complete the multiplication of H rows of the left matrix by the single column of the right matrix. The whole matrix operation is completed after $\lceil M/H \rceil$ rounds, i.e. $\lceil M/H \rceil \times \lceil N/W \rceil$ cycles are needed. If a conventional matrix calculator is used instead, $\lceil M/H \rceil \times \lceil 1/W \rceil \times N$ cycles are needed, i.e. $\lceil M/H \rceil \times N$ cycles. Compared with the prior art, the matrix calculator provided by the invention therefore achieves an efficiency improvement of about W times.
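The cycle counts compared in this paragraph can be evaluated with a short Python sketch. The function names are hypothetical, and the formulas are the ceiling expressions implied by the block partitioning described in the text (rounds of H rows, at most ceil(N / W) cycles per round); they count only calculation cycles and ignore any data-loading or read-out overhead.

```python
import math


def proposed_cycles(M, N, H, W):
    """Cycles for an (M x N) by (N x 1) product in row-addition-tree mode."""
    return math.ceil(M / H) * math.ceil(N / W)


def conventional_cycles(M, N, H, W, K=1):
    """Cycles for an (M x N) by (N x K) product on a conventional array."""
    return math.ceil(M / H) * math.ceil(K / W) * N


M, N, H, W = 12, 32, 4, 8
new = proposed_cycles(M, N, H, W)
old = conventional_cycles(M, N, H, W)
print(new, old, old / new)
```

For the embodiment (M = 12, N = 32, H = 4, W = 8) the ratio equals W; when N is not a multiple of W the ratio is somewhat smaller, which is why the improvement is stated as "about" W times.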
It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by one of ordinary skill in the art and related arts based on the embodiments of the present invention without any creative effort, shall fall within the protection scope of the present invention.
Claims (3)
1. A matrix calculator comprising H rows and W columns of multiply-accumulate units, each multiply-accumulate unit comprising a multiplier and an accumulator, characterized in that each row of multiply-accumulate units is provided with an addition tree and a row accumulation register, the addition tree being used for calculating the sum of the current calculation results of the multiply-accumulate units in that row and accumulating the current sum into the row accumulation register;
the matrix calculator is provided with a first control circuit for controlling the addition tree and the row accumulation register of each row to work when the result matrix has only one row or one column; the matrix calculator is provided with a second control circuit for disabling the accumulator in the multiply-accumulate unit.
2. A fully-connected-layer calculation method based on the matrix calculator of claim 1, wherein the product of a matrix with M rows and N columns and a matrix with N rows and 1 column is calculated by the H rows and W columns of multiply-accumulate units, where M > H and N > W, comprising the steps of:
step 1, controlling the operation of the addition trees and the row accumulation registers of all rows through a first control circuit, and forbidding the accumulators in all multiply-accumulate units through a second control circuit;
step 2, partitioning the left matrix into blocks of H rows, calculating in each round the product of one block submatrix and the right matrix, and completing the whole matrix multiplication in at most $\lceil M/H \rceil$ rounds, wherein each round of calculation proceeds as follows:
step 2.1, in each round, H rows of left-matrix data are sent to the matrix calculator; each row of data is sent from the row direction to the multiply-accumulate units of the corresponding row, the N data of each row being distributed over the W multiply-accumulate units of that row, with each multiply-accumulate unit allocated at most $\lceil N/W \rceil$ data;
step 2.2, corresponding to step 2.1, the N data of the single column of the right matrix are fed to each row of the matrix calculator from the column direction and distributed over the W multiply-accumulate units of that row, with each multiply-accumulate unit allocated at most $\lceil N/W \rceil$ data;
step 2.3, in each cycle, only 1 datum is fed to each multiply-accumulate unit from each of the row direction and the column direction; each multiply-accumulate unit multiplies the two data, the addition tree then sums the products of the multipliers in the same row, adds this sum to the previous accumulation result in the row accumulation register, and writes the result back to the row accumulation register; after at most $\lceil N/W \rceil$ cycles, the product of the block submatrix and the right matrix is obtained, which is a matrix with H rows and 1 column.
3. The fully-connected-layer calculation method according to claim 2, wherein if M is an integer multiple of H and N is an integer multiple of W, each round of calculation proceeds as follows:
H rows of left-matrix data are sent to the corresponding H rows of multipliers of the matrix calculator, and the W multipliers in each row are fed, in order, the 1st to $(N/W)$-th data, the $(N/W+1)$-th to $(2N/W)$-th data, the $(2N/W+1)$-th to $(3N/W)$-th data, ..., and the $((W-1)N/W+1)$-th to $N$-th data of the corresponding row of the left matrix;
correspondingly, the single column of right-matrix data is sent to the H rows of multipliers of the matrix calculator, and the W multipliers in each row are fed, in order, the 1st to $(N/W)$-th data, the $(N/W+1)$-th to $(2N/W)$-th data, the $(2N/W+1)$-th to $(3N/W)$-th data, ..., and the $((W-1)N/W+1)$-th to $N$-th data of the 1st column of the right matrix;
each multiplier is fed 1 datum per cycle, so receiving its $N/W$ data takes $N/W$ cycles; in each cycle, each multiplier multiplies the data sent from the row direction and the column direction, the addition tree then sums the products of the multipliers in the same row, adds this sum to the previous accumulation result in the row accumulation register, and writes the result back to the row accumulation register; after $N/W$ cycles, the accumulated value stored in each row accumulation register is the result of multiplying one row of the left matrix by the 1st column of the right matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011638796.5A CN112612447B (en) | 2020-12-31 | 2020-12-31 | Matrix calculator and full-connection layer calculating method based on same |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011638796.5A CN112612447B (en) | 2020-12-31 | 2020-12-31 | Matrix calculator and full-connection layer calculating method based on same |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112612447A (en) | 2021-04-06 |
CN112612447B (en) | 2023-12-08 |
Family
ID=75253190
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011638796.5A Active CN112612447B (en) | 2020-12-31 | 2020-12-31 | Matrix calculator and full-connection layer calculating method based on same |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112612447B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001092810A (en) * | 1999-09-24 | 2001-04-06 | Nippon Telegr & Teleph Corp <Ntt> | Complex multiplier and complex correlator |
CN2674771Y (en) * | 2001-12-28 | 2005-01-26 | 交互数字技术公司 | Sub-station for calculating CDMA system transmission matrix coefficient |
JP2009181293A (en) * | 2008-01-30 | 2009-08-13 | Yamaha Corp | Matrix operation co-processor |
CN108733348A (en) * | 2017-04-21 | 2018-11-02 | 上海寒武纪信息科技有限公司 | The method for merging vector multiplier and carrying out operation using it |
CN109271138A (en) * | 2018-08-10 | 2019-01-25 | 合肥工业大学 | A kind of chain type multiplication structure multiplied suitable for big dimensional matrix |
CN109284475A (en) * | 2018-09-20 | 2019-01-29 | 郑州云海信息技术有限公司 | A kind of matrix convolution computing module and matrix convolution calculation method |
CN109992743A (en) * | 2017-12-29 | 2019-07-09 | 华为技术有限公司 | Matrix multiplier |
US20190266217A1 (en) * | 2018-02-27 | 2019-08-29 | Fujitsu Limited | Apparatus and method for matrix computation |
CN111767079A (en) * | 2019-03-30 | 2020-10-13 | 英特尔公司 | Apparatus, method, and system for transpose instruction for matrix manipulation accelerator |
CN111767994A (en) * | 2019-04-01 | 2020-10-13 | 中国科学院半导体研究所 | Neuron calculation module |
Non-Patent Citations (2)
Title |
---|
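Zhou Xiping, Gao Deyuan, Fan Xiaoya: "High-performance solution for multiply-accumulate units", Microelectronics & Computer (微电子学与计算机), no. 11, pages 58 - 62 * |
Zhang Qi; Chen Jing; He Minghua: "Design and implementation of a matrix multiplier based on overall performance optimization", Journal of Fuzhou University (Natural Science Edition) (福州大学学报(自然科学版)), no. 01, pages 21 - 24 * |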
Also Published As
Publication number | Publication date |
---|---|
CN112612447B (en) | 2023-12-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||