CN112612447B - Matrix calculator and full-connection layer calculating method based on same - Google Patents
Matrix calculator and full-connection layer calculating method based on same
- Publication number
- CN112612447B
- Authority
- CN
- China
- Prior art keywords
- row
- matrix
- data
- multiplication
- column
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/491—Computations with decimal numbers radix 12 or 20.
- G06F7/498—Computations with decimal numbers radix 12 or 20. using counter-type accumulators
- G06F7/4983—Multiplying; Dividing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
- G06F7/505—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
- G06F7/509—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination for multiple operands, e.g. digital integrators
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Optimization (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Complex Calculations (AREA)
Abstract
The invention provides a matrix calculator and a fully connected layer calculation method based on the matrix calculator. The matrix calculator comprises H rows and W columns of multiply-accumulate units, each comprising a multiplier and an accumulator. An addition tree and a row accumulation register are provided for each row of multiply-accumulate units; the addition tree calculates the sum of the current calculation results of that row of multiply-accumulate units and accumulates the sum into the row accumulation register. The matrix calculator is provided with a first control circuit for enabling the addition tree and the row accumulation register of each row when the result matrix has only one row or one column, and with a second control circuit for disabling the accumulators in the multiply-accumulate units. The matrix calculator can perform matrix multiplication efficiently, in particular when one dimension of the result matrix is small, for example when the result matrix has only one row or one column; in that case the hardware multiplier array is used efficiently, which improves calculation efficiency.
Description
Technical Field
The invention relates to the technical field of integrated circuits, in particular to a matrix calculator and a full-connection layer calculating method based on the matrix calculator.
Background
Matrix multiplication is the most basic operation in linear algebra and is widely used in image processing, artificial neural networks, deep learning, and other fields. In deep learning in particular, most operations are converted into matrix multiplications. For a convolution layer, the convolution kernels are unrolled, the input layers taking part in the convolution are unrolled as well, and the convolution is thereby converted into an ordinary matrix operation.
Matrix operations require a large number of repeated multiply-accumulate calculations, read in a large amount of data, and write out a large number of results. A conventional CPU typically performs only one multiplication per cycle and is therefore not well suited to computation-intensive algorithms such as matrix multiplication. Various hardware acceleration circuits have been designed for matrix operations; a common approach is to compute with an array of multiply-accumulate units. FIG. 1 shows a common arrangement of such an array, in which each MAC denotes one multiply-accumulate unit.
The matrix calculator shown in FIG. 1 can perform calculations of the form D = A×B + C, where each row of the matrix calculator receives the data of the corresponding row of the left matrix A and each column receives the data of the corresponding column of the right matrix B. In each computing cycle, one new datum is input to each row and broadcast to all multiply-accumulate units of that row; at the same time, one new datum is input to each column and broadcast to all multiply-accumulate units of that column. Each multiply-accumulate unit multiplies the data received from its row and column directions and accumulates the product into its local multiply-accumulate result register. When all the row data of the left matrix A have been input to the matrix calculator, all the column data of the right matrix B must also have been input, i.e. the number of data in each row of A must equal the number of data in each column of B. At that point, each multiply-accumulate unit holds one element of the result matrix.
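The broadcast scheme described above can be sketched as a cycle-by-cycle simulation in Python. This is only an illustrative model, not the circuit of FIG. 1: it assumes the array is large enough to hold the whole result (one MAC per result element), and the function name `conventional_mac_array` is hypothetical.

```python
def conventional_mac_array(A, B, C=None):
    """Simulate D = A x B + C on a MAC array with one MAC per result element.

    A is an M x N list of lists and B is an N x K list of lists.
    Illustrative model only: the array is assumed to have at least
    M rows and K columns, so no blocking is needed.
    """
    M, N, K = len(A), len(B), len(B[0])
    assert all(len(row) == N for row in A), "row length of A must equal rows of B"
    # Local multiply-accumulate register of each MAC, optionally preloaded with C.
    acc = [[(C[i][j] if C is not None else 0) for j in range(K)] for i in range(M)]
    for t in range(N):                          # one computing cycle per iteration
        row_in = [A[i][t] for i in range(M)]    # datum broadcast along each row
        col_in = [B[t][j] for j in range(K)]    # datum broadcast along each column
        for i in range(M):
            for j in range(K):
                acc[i][j] += row_in[i] * col_in[j]
    return acc


print(conventional_mac_array([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

After N cycles every register holds one element of the result, which matches the description that each multiply-accumulate unit stores one result of the result matrix.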
Of course, if the number of rows of matrix A is greater than the number of rows of the matrix calculator and/or the number of columns of matrix B is greater than the number of columns of the matrix calculator, matrix A and/or matrix B must be partitioned into blocks and the result computed in several passes. For example, if matrix A has M rows and N columns, matrix B has N rows and K columns, and the matrix calculator has H rows and W columns, then one A×B operation requires at least $\lceil M/H \rceil \times \lceil K/W \rceil \times N$ cycles. The acceleration performance of the matrix calculator is not ideal when M and K are relatively small and N is relatively large.
The fully connected layer is one of the most common layers in the deep learning model, for example, in the image recognition model based on deep learning, the last layer is basically the fully connected layer, the output number of the fully connected layer is equal to the number of the types of the objects to be recognized (if the objects to be classified contain 10 types, the fully connected layer has 10 outputs), and the probability that the representative image is a certain type of image is calculated through a softmax function.
The fully connected layer can also be converted into a matrix calculation. If the last layer of the deep learning model contains N elements and the preceding layer contains M elements, the calculation of the fully connected layer is equivalent to calculating the product of an N-row, M-column matrix and an M-row, 1-column matrix, or the product of a 1-row, M-column matrix and an M-row, N-column matrix. After the fully connected layer is converted into a matrix calculation, one of the matrices necessarily has only one row or one column, so for such a calculation the matrix calculator shown in FIG. 1 can use only one column or one row of its array. The matrix calculator shown in FIG. 1 is therefore inefficient for fully connected layer calculation: if it comprises H rows and W columns of multiply-accumulate units, its utilization is at most 1/H or 1/W.
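The utilization bound can be checked with a few lines of Python. The sketch below assumes that utilization means the fraction of multiply-accumulate units that hold a result element during one block of the computation; the helper name `array_utilization` and the example sizes are hypothetical.

```python
def array_utilization(result_rows, result_cols, H, W):
    """Fraction of the H x W MAC array that does useful work for one result block.

    Assumption of this sketch: a MAC is busy only if it holds one element
    of the result matrix, as in the conventional array described above.
    """
    active = min(result_rows, H) * min(result_cols, W)
    return active / (H * W)


# Column-vector result (N-row, 1-column): only one column of the array works.
print(array_utilization(result_rows=10, result_cols=1, H=4, W=8))  # 0.125
# Row-vector result (1-row, N-column): only one row of the array works.
print(array_utilization(result_rows=1, result_cols=10, H=4, W=8))  # 0.25
```

For the 4-row, 8-column array used here, the two cases give 1/W = 0.125 and 1/H = 0.25 respectively.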
For the fully connected layer, the result matrix is just a vector. Whether it is a row vector or a column vector, one of M and K must be 1, i.e. the multipliers that actually do work in the matrix calculator occupy only 1 row or 1 column. The conventional matrix calculator shown in FIG. 1 is therefore not well suited to the calculation of the fully connected layer in a deep learning model.
Disclosure of Invention
Aiming at the defects of the existing matrix calculator, the invention provides a matrix calculator and a full-connection layer calculating method based on the matrix calculator, and the calculating efficiency of the matrix calculator is improved.
A matrix calculator comprises H rows and W columns of multiply-accumulate units, wherein each multiply-accumulate unit comprises a multiplier and an accumulator and is used for receiving data input in the row direction and the column direction, multiplying them, and accumulating the product through an internal accumulation register; an addition tree and a row accumulation register are provided for each row of multiply-accumulate units, the addition tree being used for calculating the sum of the current calculation results of that row of multiply-accumulate units and accumulating the sum into the row accumulation register; the matrix calculator is provided with a first control circuit for enabling the addition tree and the row accumulation register of each row when the result matrix has only one row or one column; the matrix calculator is provided with a second control circuit for disabling the accumulator in the multiply-accumulate unit.
Further, the product of an M-row, N-column matrix and an N-row, 1-column matrix is calculated by the H rows and W columns of multiply-accumulate units, wherein M > H and N > W, comprising the following steps:
step 1, controlling the addition tree and the row accumulation register of all rows to work through a first control circuit, and disabling the accumulator in all multiplication accumulation units through a second control circuit;
step 2, the left matrix is partitioned into blocks of H rows, and each round calculates the product of one block submatrix and the right matrix, so that the whole matrix multiplication is completed in at most $\lceil M/H \rceil$ rounds, wherein each round of calculation comprises the following steps:
step 2.1, in each round, H rows of left matrix data are sent to the matrix calculator, each row of data is sent from the row direction to the multiply-accumulate units of the corresponding row, the N data of each row are distributed over the W multiply-accumulate units of that row, and each multiply-accumulate unit is allocated at most $\lceil N/W \rceil$ data;
step 2.2, corresponding to step 2.1, in each round the 1 column of N data of the right matrix is sent from the column direction to every row of the matrix calculator, the N data are distributed over the W multiply-accumulate units of the same row, and each multiply-accumulate unit is allocated at most $\lceil N/W \rceil$ data;
step 2.3, in each cycle of each round, only 1 datum is sent to each multiply-accumulate unit in each of the row direction and the column direction; in each cycle, each multiply-accumulate unit in the matrix calculator multiplies the data sent in the row direction and the column direction, and the addition tree then adds up all the calculation results of the multipliers in the same row, sums this with the previous accumulation result in the row accumulation register, and updates the row accumulation register with the summed result; after at most $\lceil N/W \rceil$ cycles, the product of the block submatrix and the right matrix is obtained, which is an H-row, 1-column matrix.
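Steps 1 to 2.3 can be sketched in Python as a behavioral simulation. This is a minimal software model rather than the hardware itself: the function name `fc_matvec` is hypothetical, and the sketch assumes that each multiplier receives a contiguous slice of its row (multiplier j gets the data at positions j·⌈N/W⌉ onward), which is one possible way to distribute the N data over W units.

```python
import math


def fc_matvec(A, x, H, W):
    """Compute A @ x with an H x W multiplier array and per-row addition trees.

    Returns (result_vector, cycles_used). Behavioral sketch only; the
    contiguous assignment of data to multipliers is an assumption.
    """
    M, N = len(A), len(x)
    per_mac = math.ceil(N / W)         # data allocated to each multiplier
    rounds = math.ceil(M / H)          # left matrix split into blocks of H rows
    result, cycles = [], 0
    for r in range(rounds):
        block = A[r * H:(r + 1) * H]   # the last block may hold fewer than H rows
        row_acc = [0] * len(block)     # row accumulation registers, cleared per round
        for t in range(per_mac):       # each multiplier gets 1 datum per cycle
            for i, row in enumerate(block):
                products = [
                    row[j * per_mac + t] * x[j * per_mac + t]
                    for j in range(W)
                    if j * per_mac + t < N   # idle multipliers contribute nothing
                ]
                row_acc[i] += sum(products)  # addition tree, then row accumulation
            cycles += 1
        result.extend(row_acc)
    return result, cycles


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = [1, 0, -1]
print(fc_matvec(A, x, H=2, W=2))  # ([-2, -2, -2], 4)
```

The accumulators inside the multiply-accumulate units never appear in this model, which corresponds to disabling them in step 1; only the row accumulation registers carry state between cycles.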
Further, if M is an integer multiple of H and N is an integer multiple of W, each round of calculation process is,
the H rows of left matrix data are correspondingly sent to the H rows of multipliers of the matrix calculator, and the W multipliers of each row are fed, in order, the 1st to $\frac{N}{W}$-th data, the $(\frac{N}{W}+1)$-th to $\frac{2N}{W}$-th data, the $(\frac{2N}{W}+1)$-th to $\frac{3N}{W}$-th data, ……, and the $(\frac{(W-1)N}{W}+1)$-th to $N$-th data of the corresponding row of the left matrix;
the 1 column of right matrix data is correspondingly sent to the H rows of multipliers of the matrix calculator, and the W multipliers of each row are fed, in order, the 1st to $\frac{N}{W}$-th data, the $(\frac{N}{W}+1)$-th to $\frac{2N}{W}$-th data, the $(\frac{2N}{W}+1)$-th to $\frac{3N}{W}$-th data, ……, and the $(\frac{(W-1)N}{W}+1)$-th to $N$-th data of the 1 column of the right matrix;
each multiplier receives 1 datum per cycle, so $\frac{N}{W}$ data need $\frac{N}{W}$ cycles; in each cycle, each multiplier multiplies the data sent in the row direction and the column direction, and the addition tree then adds up all the calculation results of the multipliers in the same row, sums this with the previous accumulation result in the row accumulation register, and updates the row accumulation register with the summed result; after $\frac{N}{W}$ cycles, the accumulated value stored in each row accumulation register is the product of one row of the left matrix and the 1st column of the right matrix.
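As an illustration of the feeding order in the integer-multiple case, the following Python sketch lists which element of a row each multiplier would receive in each cycle. It assumes that each multiplier is given a contiguous group of N/W data and consumes it one datum per cycle; the helper name `feed_schedule` is hypothetical.

```python
def feed_schedule(N, W):
    """Return schedule[t][j]: 1-based index of the datum fed to multiplier j in cycle t.

    Sketch under the assumption that multiplier j owns the contiguous group
    j*N/W + 1 .. (j+1)*N/W and receives it in order.
    """
    assert N % W == 0, "this sketch covers only the integer-multiple case"
    per_mac = N // W
    return [[j * per_mac + t + 1 for j in range(W)] for t in range(per_mac)]


for cycle, indices in enumerate(feed_schedule(N=32, W=8), start=1):
    print(f"cycle {cycle}: {indices}")
# cycle 1: [1, 5, 9, 13, 17, 21, 25, 29]
# cycle 2: [2, 6, 10, 14, 18, 22, 26, 30]
# cycle 3: [3, 7, 11, 15, 19, 23, 27, 31]
# cycle 4: [4, 8, 12, 16, 20, 24, 28, 32]
```

The same schedule applies to the row of the left matrix and to the column of the right matrix, so each multiplier always multiplies two data with the same index.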
The matrix calculator provided by the invention can efficiently realize matrix multiplication, especially when the result matrix has a smaller dimension, for example, when the result matrix has only one row or one column, the invention can efficiently utilize the hardware multiplier array to achieve the effect of improving the calculation efficiency, and in the full-connection layer calculation of the deep learning model, the situation that the result matrix has only one row or one column is obtained after the conversion into matrix calculation.
Drawings
FIG. 1 is a diagram of a prior art matrix calculator architecture;
FIG. 2 is a diagram of a matrix calculator architecture of the present invention;
FIG. 3 is a schematic diagram of the internal circuit of the multiply-accumulate unit;
FIG. 4 is a block diagram of a left matrix;
FIG. 5 is a schematic diagram of input data for a row of multipliers.
Detailed Description
The invention will be described in further detail with reference to the drawings and the detailed description. The embodiments of the invention have been presented for purposes of illustration and description, and are not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Example 1
Take a matrix calculator with 4 rows and 8 columns as an example. As shown in FIG. 2, the matrix calculator comprises 4 rows and 8 columns of multiply-accumulate units MAC, each comprising a multiplier MUL and an accumulator ADD. An addition tree and a row accumulation register are provided for each row of multiply-accumulate units; the addition tree calculates the sum of the current calculation results of that row of multiply-accumulate units and accumulates the sum into the row accumulation register.
The matrix calculator is provided with a first control circuit for enabling the addition tree and the row accumulation register of each row when the result matrix has only one row or one column, and with a second control circuit for disabling the accumulator in the multiply-accumulate unit. The internal structure of the multiply-accumulate unit MAC is shown in FIG. 3, where module (1) is the input from the row direction, module (2) is the input from the column direction, module (3) is the multiplier, module (4) is the output to the row addition tree, module (5) is the adder, and module (6) is the multiply-accumulate register, which stores the multiply-accumulate result of the multiply-accumulate unit. When both the first control circuit and the second control circuit are active, modules (5) and (6) are inactive and modules (1), (2), (3) and (4) are active; when both control circuits are inactive, module (4) is inactive and modules (1), (2), (3), (5) and (6) are active.
The following describes how the matrix calculator of FIG. 2 calculates the product of an M-row, N-column matrix and an N-row, 1-column matrix. The invention mainly concerns the case where M > H and N > W. Since H = 4 and W = 8 in this embodiment, this embodiment sets, without loss of generality, M = 12 and N = 32; that is, the product of a 12-row, 32-column left matrix and a 32-row, 1-column right matrix is calculated by the 4 rows and 8 columns of multiply-accumulate units.
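The two operating modes of a multiply-accumulate unit can be sketched as a small Python class. This is a software analogy of FIG. 3 rather than a circuit description; the class name `MacUnit`, the method `step`, and the single flag `tree_mode` (standing for both control circuits being active together) are hypothetical.

```python
class MacUnit:
    """Behavioral sketch of one multiply-accumulate unit with two modes."""

    def __init__(self):
        self.acc = 0  # module (6): local multiply-accumulate register

    def step(self, row_in, col_in, tree_mode):
        """Process one cycle with inputs from modules (1) and (2)."""
        product = row_in * col_in      # module (3): multiplier
        if tree_mode:
            # Both control circuits active: the product leaves through
            # module (4) to the row addition tree; (5) and (6) stay idle.
            return product
        # Both control circuits inactive: modules (5) and (6) accumulate
        # locally and nothing is sent to the row addition tree.
        self.acc += product
        return None


unit = MacUnit()
print(unit.step(2, 3, tree_mode=True), unit.acc)   # 6 0
print(unit.step(2, 3, tree_mode=False), unit.acc)  # None 6
```

Keeping the local register untouched in tree mode mirrors the statement that the accumulator in the multiply-accumulate unit is disabled while the row accumulation register takes over.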
First, the first control circuit controls the operation of the addition tree and the row accumulation register of all rows, and the second control circuit disables the accumulators in all multiply-accumulate units.
The left matrix has 12 rows while the matrix calculator has only 4 rows of multiply-accumulate units, so the left matrix is divided into 3 blocks of 4 rows each, as shown in FIG. 4. In each round, 4 rows of data are sent to the matrix calculator; the 32 data of each row are distributed over the 8 multiply-accumulate units of the corresponding row, 4 data per unit, and 1 datum is sent per cycle, so that sending takes 4 cycles. All 12 rows are processed after 3 rounds.
At the same time, in each round the 32 data of the 1 column of the right matrix are sent to every row of the matrix calculator and distributed over the 8 multiply-accumulate units of that row; likewise, each multiply-accumulate unit is allocated 4 data, 1 datum is sent per cycle, and sending takes 4 cycles.
In each cycle, each multiply-accumulate unit in the matrix calculator multiplies the data sent in from the row direction (left matrix) and the column direction (right matrix); the addition tree then adds up all the calculation results of the multiply-accumulate units in the same row, sums this with the previous accumulation result in the row accumulation register, and updates the row accumulation register with the summed result, as shown in FIG. 5.
After the 4 cycles of the 1st round, the value stored in the 1st row accumulation register is the product of the 1st row of the left matrix and the 1st column of the right matrix, i.e. the 1st row of the result matrix; the value stored in the 2nd row accumulation register is the product of the 2nd row of the left matrix and the 1st column of the right matrix, i.e. the 2nd row of the result matrix; the value stored in the 3rd row accumulation register is the product of the 3rd row of the left matrix and the 1st column of the right matrix, i.e. the 3rd row of the result matrix; the value stored in the 4th row accumulation register is the product of the 4th row of the left matrix and the 1st column of the right matrix, i.e. the 4th row of the result matrix.
After the 4 cycles of the 2nd round, the value stored in the 1st row accumulation register is the product of the 5th row of the left matrix and the 1st column of the right matrix, i.e. the 5th row of the result matrix; the value stored in the 2nd row accumulation register is the product of the 6th row of the left matrix and the 1st column of the right matrix, i.e. the 6th row of the result matrix; the value stored in the 3rd row accumulation register is the product of the 7th row of the left matrix and the 1st column of the right matrix, i.e. the 7th row of the result matrix; the value stored in the 4th row accumulation register is the product of the 8th row of the left matrix and the 1st column of the right matrix, i.e. the 8th row of the result matrix. The 3rd round proceeds in the same way and is not described again.
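The correspondence between row accumulation registers and rows of the result matrix in each round follows a simple pattern, sketched below in Python. The helper name `result_rows_for_round` is hypothetical, and rows and rounds are numbered from 1 as in the text.

```python
def result_rows_for_round(round_no, H):
    """Return the result-matrix rows held by row registers 1..H after a round."""
    return [(round_no - 1) * H + reg for reg in range(1, H + 1)]


for round_no in (1, 2, 3):
    print(round_no, result_rows_for_round(round_no, H=4))
# 1 [1, 2, 3, 4]
# 2 [5, 6, 7, 8]
# 3 [9, 10, 11, 12]
```

In the 3rd round the four registers therefore hold rows 9 to 12 of the result matrix.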
In this embodiment, M is an integer multiple of H and N is an integer multiple of W. It is readily seen that the application of the invention to fully connected layer calculation is not affected even when these are not integer multiples. For example, if the left matrix has 10 rows, only two rows of data need to be input in the 3rd round; if the left matrix has 30 columns, the last multiply-accumulate unit of each row only needs to input data in the first 2 of the 4 cycles.
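The two non-multiple examples can be reproduced with a short Python sketch. It assumes a contiguous allocation in which each multiply-accumulate unit is given up to ⌈N/W⌉ consecutive data of its row; the helper names `rows_per_round` and `active_cycles_per_mac` are hypothetical.

```python
import math


def rows_per_round(M, H):
    """Number of left-matrix rows sent to the array in each round."""
    rounds = math.ceil(M / H)
    return [min(H, M - r * H) for r in range(rounds)]


def active_cycles_per_mac(N, W):
    """Number of cycles in which each of the W units of a row receives data.

    Assumes unit j owns the contiguous positions j*ceil(N/W) onward.
    """
    per_mac = math.ceil(N / W)
    return [sum(1 for t in range(per_mac) if j * per_mac + t < N) for j in range(W)]


print(rows_per_round(M=10, H=4))         # [4, 4, 2]
print(active_cycles_per_mac(N=30, W=8))  # [4, 4, 4, 4, 4, 4, 4, 2]
```

With 10 rows the 3rd round carries only 2 rows, and with 30 columns the last unit of each row is fed in only 2 of the 4 cycles, as stated above.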
In general, if the left matrix has M rows, it is partitioned into blocks and H rows are sent to the matrix calculator in each round; in each cycle, W data of each of the H rows are sent to the corresponding rows of the H rows of multiply-accumulate units, so each round needs $\lceil N/W \rceil$ cycles to complete the multiplication of H rows of the left matrix by the 1 column of the right matrix. The whole matrix operation is completed after $\lceil M/H \rceil$ rounds, i.e. a total of $\lceil M/H \rceil \times \lceil N/W \rceil$ cycles is needed. If a conventional matrix calculator is used instead, $\lceil M/H \rceil \times \lceil 1/W \rceil \times N$, i.e. $\lceil M/H \rceil \times N$, cycles are needed. By comparison, the matrix calculator of the invention achieves an efficiency improvement of about W times.
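The comparison can be worked through for the embodiment (M = 12, N = 32, H = 4, W = 8) with a short Python sketch. The cycle models are assumptions of this sketch: ⌈M/H⌉ rounds of ⌈N/W⌉ cycles for the matrix calculator with row addition trees, and ⌈M/H⌉ blocks of N cycles for a conventional array that computes a single result column. The function names are hypothetical.

```python
import math


def proposed_cycles(M, N, H, W):
    """Cycles with row addition trees: ceil(M/H) rounds of ceil(N/W) cycles."""
    return math.ceil(M / H) * math.ceil(N / W)


def conventional_cycles(M, N, H, W):
    """Cycles on a conventional array for a result with a single column (K = 1)."""
    return math.ceil(M / H) * math.ceil(1 / W) * N


M, N, H, W = 12, 32, 4, 8
print(proposed_cycles(M, N, H, W))      # 12
print(conventional_cycles(M, N, H, W))  # 96
print(conventional_cycles(M, N, H, W) / proposed_cycles(M, N, H, W))  # 8.0
```

For this example the ratio equals W exactly because N is an integer multiple of W; otherwise the rounding in ⌈N/W⌉ makes the gain slightly smaller than W.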
Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
Claims (2)
1. A fully connected layer calculation method based on a matrix calculator, characterized in that
the matrix calculator comprises H rows and W columns of multiply-accumulate units, each multiply-accumulate unit comprising a multiplier and an accumulator; an addition tree and a row accumulation register are provided for each row of multiply-accumulate units, the addition tree being used for calculating the sum of the current calculation results of that row of multiply-accumulate units and accumulating the sum into the row accumulation register;
the matrix calculator is provided with a first control circuit for enabling the addition tree and the row accumulation register of each row when the result matrix has only one row or one column; the matrix calculator is provided with a second control circuit for disabling the accumulator in the multiply-accumulate unit;
the product of an M-row, N-column matrix and an N-row, 1-column matrix is calculated by the H rows and W columns of multiply-accumulate units, wherein M > H and N > W, the method comprising the following steps:
step 1, controlling the addition tree and the row accumulation register of all rows to work through a first control circuit, and disabling the accumulator in all multiplication accumulation units through a second control circuit;
step 2, the left matrix is partitioned into blocks of H rows, and each round calculates the product of one block submatrix and the right matrix, so that the whole matrix multiplication is completed in at most $\lceil M/H \rceil$ rounds, wherein each round of calculation comprises the following steps:
step 2.1, in each round, H rows of left matrix data are sent to the matrix calculator, each row of data is sent from the row direction to the multiply-accumulate units of the corresponding row, the N data of each row are distributed over the W multiply-accumulate units of that row, and each multiply-accumulate unit is allocated at most $\lceil N/W \rceil$ data;
step 2.2, corresponding to step 2.1, in each round the 1 column of N data of the right matrix is sent from the column direction to every row of the matrix calculator, the N data are distributed over the W multiply-accumulate units of the same row, and each multiply-accumulate unit is allocated at most $\lceil N/W \rceil$ data;
step 2.3, in each cycle of each round, only 1 datum is sent to each multiply-accumulate unit in each of the row direction and the column direction; in each cycle, each multiply-accumulate unit in the matrix calculator multiplies the data sent in the row direction and the column direction, and the addition tree then adds up all the calculation results of the multipliers in the same row, sums this with the previous accumulation result in the row accumulation register, and updates the row accumulation register with the summed result; after at most $\lceil N/W \rceil$ cycles, the product of the block submatrix and the right matrix is obtained, which is an H-row, 1-column matrix.
2. The method of claim 1, wherein if M is an integer multiple of H and N is an integer multiple of W, each round of calculation is,
the H rows of left matrix data are correspondingly sent to the H rows of multipliers of the matrix calculator, and the W multipliers of each row are fed, in order, the 1st to $\frac{N}{W}$-th data, the $(\frac{N}{W}+1)$-th to $\frac{2N}{W}$-th data, the $(\frac{2N}{W}+1)$-th to $\frac{3N}{W}$-th data, ……, and the $(\frac{(W-1)N}{W}+1)$-th to $N$-th data of the corresponding row of the left matrix;
the 1 column of right matrix data is correspondingly sent to the H rows of multipliers of the matrix calculator, and the W multipliers of each row are fed, in order, the 1st to $\frac{N}{W}$-th data, the $(\frac{N}{W}+1)$-th to $\frac{2N}{W}$-th data, the $(\frac{2N}{W}+1)$-th to $\frac{3N}{W}$-th data, ……, and the $(\frac{(W-1)N}{W}+1)$-th to $N$-th data of the 1 column of the right matrix;
each multiplier receives 1 datum per cycle, so $\frac{N}{W}$ data need $\frac{N}{W}$ cycles; in each cycle, each multiplier multiplies the data sent in the row direction and the column direction, and the addition tree then adds up all the calculation results of the multipliers in the same row, sums this with the previous accumulation result in the row accumulation register, and updates the row accumulation register with the summed result; after $\frac{N}{W}$ cycles, the accumulated value stored in each row accumulation register is the product of one row of the left matrix and the 1st column of the right matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011638796.5A CN112612447B (en) | 2020-12-31 | 2020-12-31 | Matrix calculator and full-connection layer calculating method based on same |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011638796.5A CN112612447B (en) | 2020-12-31 | 2020-12-31 | Matrix calculator and full-connection layer calculating method based on same |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112612447A | 2021-04-06 |
CN112612447B | 2023-12-08 |
Family
ID=75253190
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011638796.5A Active CN112612447B (en) | 2020-12-31 | 2020-12-31 | Matrix calculator and full-connection layer calculating method based on same |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112612447B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001092810A (en) * | 1999-09-24 | 2001-04-06 | Nippon Telegr & Teleph Corp <Ntt> | Complex multiplier and complex correlator |
CN2674771Y (en) * | 2001-12-28 | 2005-01-26 | 交互数字技术公司 | Sub-station for calculating CDMA system transmission matrix coefficient |
JP2009181293A (en) * | 2008-01-30 | 2009-08-13 | Yamaha Corp | Matrix operation co-processor |
CN108733348A (en) * | 2017-04-21 | 2018-11-02 | 上海寒武纪信息科技有限公司 | The method for merging vector multiplier and carrying out operation using it |
CN109271138A (en) * | 2018-08-10 | 2019-01-25 | 合肥工业大学 | A kind of chain type multiplication structure multiplied suitable for big dimensional matrix |
CN109284475A (en) * | 2018-09-20 | 2019-01-29 | 郑州云海信息技术有限公司 | A kind of matrix convolution computing module and matrix convolution calculation method |
CN109992743A (en) * | 2017-12-29 | 2019-07-09 | 华为技术有限公司 | Matrix multiplier |
CN111767994A (en) * | 2019-04-01 | 2020-10-13 | 中国科学院半导体研究所 | Neuron calculation module |
CN111767079A (en) * | 2019-03-30 | 2020-10-13 | 英特尔公司 | Apparatus, method, and system for transpose instruction for matrix manipulation accelerator |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2019148969A (en) * | 2018-02-27 | 2019-09-05 | 富士通株式会社 | Matrix arithmetic device, matrix arithmetic method, and matrix arithmetic program |
Non-Patent Citations (2)
Title |
---|
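Zhou Xiping, Gao Deyuan, Fan Xiaoya. High-performance solution for multiply-accumulate units. Microelectronics & Computer. 2002, (No. 11), 58-62. * |
Zhang Qi; Chen Jing; He Minghua. Design and implementation of a matrix multiplier based on overall performance optimization. Journal of Fuzhou University (Natural Science Edition). 2009, (No. 01), 21-24, 64. * |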
Also Published As
Publication number | Publication date |
---|---|
CN112612447A (en) | 2021-04-06 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
KR102523263B1 (en) | Systems and methods for hardware-based pooling | |
CN107633297B (en) | Convolutional neural network hardware accelerator based on parallel fast FIR filter algorithm | |
US8051124B2 (en) | High speed and efficient matrix multiplication hardware module | |
US20180174036A1 (en) | Hardware Accelerator for Compressed LSTM | |
US20210241071A1 (en) | Architecture of a computer for calculating a convolution layer in a convolutional neural network | |
EP3674982A1 (en) | Hardware accelerator architecture for convolutional neural network | |
CN110580519B (en) | Convolution operation device and method thereof | |
CN110766128A (en) | Convolution calculation unit, calculation method and neural network calculation platform | |
CN113033794B (en) | Light weight neural network hardware accelerator based on deep separable convolution | |
Nag et al. | ViTA: A vision transformer inference accelerator for edge applications | |
GB2601701A (en) | Performing dot product operations using a memristive crossbar array | |
CN116167424B (en) | CIM-based neural network accelerator, CIM-based neural network accelerator method, CIM-based neural network storage processing system and CIM-based neural network storage processing equipment | |
CN113918120A (en) | Computing device, neural network processing apparatus, chip, and method of processing data | |
CN116702851A (en) | Pulsation array unit and pulsation array structure suitable for weight multiplexing neural network | |
CN116362314A (en) | Integrated storage and calculation device and calculation method | |
CN107368459B (en) | Scheduling method of reconfigurable computing structure based on arbitrary dimension matrix multiplication | |
CN112612447B (en) | Matrix calculator and full-connection layer calculating method based on same | |
WO2022016261A1 (en) | System and method for accelerating training of deep learning networks | |
CN113743046B (en) | Integrated layout structure for memory and calculation and integrated layout structure for data splitting and memory and calculation | |
CN113627587A (en) | Multichannel convolutional neural network acceleration method and device | |
CN111008697B (en) | Convolutional neural network accelerator implementation architecture | |
CN110765413A (en) | Matrix summation structure and neural network computing platform | |
CN112115665B (en) | Integrated memory array and convolution operation method thereof | |
CN111126580B (en) | Multi-precision weight coefficient neural network acceleration chip arithmetic device adopting Booth coding | |
CN113110822A (en) | Configurable matrix multiplication device and algorithm | |
Legal Events
Code | Title | Description | Date |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||