
CN112612447B - Matrix calculator and full-connection layer calculating method based on same - Google Patents


Info

Publication number
CN112612447B
Authority
CN
China
Prior art keywords
row
matrix
data
multiplication
column
Prior art date
Legal status
Active
Application number
CN202011638796.5A
Other languages
Chinese (zh)
Other versions
CN112612447A (en)
Inventor
林广栋
黄光红
张笑
顾大晔
Current Assignee
Anhui Core Century Technology Co ltd
Original Assignee
Anhui Core Century Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Anhui Core Century Technology Co ltd
Priority to CN202011638796.5A
Publication of CN112612447A
Application granted
Publication of CN112612447B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/491 Computations with decimal numbers radix 12 or 20
    • G06F7/498 Computations with decimal numbers radix 12 or 20, using counter-type accumulators
    • G06F7/4983 Multiplying; Dividing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting
    • G06F7/505 Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
    • G06F7/509 Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination for multiple operands, e.g. digital integrators
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a matrix calculator and a fully connected layer calculation method based on it. The matrix calculator comprises H rows and W columns of multiply-accumulate units, each comprising a multiplier and an accumulator. Each row of multiply-accumulate units is provided with an addition tree and a row accumulation register; the addition tree computes the sum of the current results of the multiply-accumulate units in that row and accumulates this sum into the row accumulation register. The matrix calculator has a first control circuit that enables the addition tree and the row accumulation register of each row when the result matrix has only one row or one column, and a second control circuit that disables the accumulators in the multiply-accumulate units. The matrix calculator performs matrix multiplication efficiently, especially when one dimension of the result matrix is small, for example when the result matrix has only one row or one column; in that case the invention makes full use of the hardware multiplier array and thereby improves computational efficiency.

Description

Matrix calculator and full-connection layer calculating method based on same
Technical Field
The invention relates to the technical field of integrated circuits, and in particular to a matrix calculator and a fully connected layer calculation method based on the matrix calculator.
Background
Matrix multiplication is one of the most basic operations in linear algebra and is widely used in image processing, artificial neural networks, deep learning and other fields. In deep learning in particular, most operations are converted into matrix multiplications. For a convolution layer, the convolution kernels are unrolled, the input data participating in the convolution are unrolled as well, and the convolution is thereby converted into an ordinary matrix multiplication.
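The unrolling described above can be illustrated with the widely used "im2col" lowering. The following is only a minimal sketch under stated assumptions (single channel, stride 1, no padding, cross-correlation as most deep learning frameworks define convolution); the function names are hypothetical and not taken from the patent. Each kernel-sized patch of the input becomes one row of a matrix, the kernel becomes one column, and their product reproduces the direct convolution.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2-D image into one row (stride 1, no padding)."""
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [image[r + i][c + j] for i in range(kh) for j in range(kw)]
        for r in range(out_h)
        for c in range(out_w)
    ]


def conv_as_matmul(image, kernel):
    """Valid 2-D convolution computed as (patch matrix) x (unrolled kernel column)."""
    kh, kw = len(kernel), len(kernel[0])
    kernel_col = [v for row in kernel for v in row]   # kernel unrolled into one column
    patches = im2col(image, kh, kw)                    # input unrolled into a matrix
    flat = [sum(p * k for p, k in zip(patch, kernel_col)) for patch in patches]
    out_w = len(image[0]) - kw + 1
    return [flat[i:i + out_w] for i in range(0, len(flat), out_w)]


def conv_direct(image, kernel):
    """Reference: slide the kernel over the image directly."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[1, 0], [0, 1]]
print(conv_as_matmul(img, k))  # [[6, 8], [12, 14]]
```

Multi-channel inputs and strides change only how the patch rows are gathered; the result is still a single matrix product.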
Matrix operations require a large number of repeated multiply-accumulate calculations, read in a large amount of data, and write out a large number of results. A conventional CPU typically performs only one multiplication per cycle and is therefore poorly suited to compute-intensive algorithms such as matrix multiplication. Various hardware acceleration circuits have been designed for matrix operations; a common approach is to compute with an array of multiply-accumulate units. FIG. 1 shows a common arrangement of such an array, in which each MAC denotes a multiply-accumulate unit.
The matrix calculator shown in FIG. 1 can perform calculations of the form D = A × B + C, where each row of the matrix calculator receives the row data of the left matrix A and each column receives the column data of the right matrix B. In each cycle, one new datum is input to each row and broadcast to all multiply-accumulate units of that row; at the same time, one new datum is input to each column and broadcast to all multiply-accumulate units of that column. Each multiply-accumulate unit multiplies the data received from its row and column directions and accumulates the product into its local multiply-accumulate result register. By the time all row data of A have been input, all column data of B must also have been input; that is, the number of elements in each row of A must equal the number of elements in each column of B. At that point, each multiply-accumulate unit holds one element of the result matrix.
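The broadcast scheme of FIG. 1 can be modelled behaviourally as follows. This is a sketch, not a circuit description: it assumes the array is exactly as large as the result (H = rows of A, W = columns of B, no blocking) and that C is preloaded into the local accumulators; the function name is hypothetical. In cycle t, row i receives A[i][t], column j receives B[t][j], and unit (i, j) accumulates their product.

```python
def broadcast_mac_array(A, B, C):
    """Behavioural model of the FIG. 1 array computing D = A x B + C.

    Assumes an H x W array with H = rows of A and W = columns of B (no blocking)
    and C preloaded into the local accumulators. Returns (D, cycles).
    """
    H, N, W = len(A), len(A[0]), len(B[0])
    assert len(B) == N, "row length of A must equal column length of B"
    acc = [row[:] for row in C]                 # one local result register per MAC
    for t in range(N):                          # one new datum per row and per column each cycle
        row_in = [A[i][t] for i in range(H)]    # broadcast along row i
        col_in = [B[t][j] for j in range(W)]    # broadcast along column j
        for i in range(H):
            for j in range(W):
                acc[i][j] += row_in[i] * col_in[j]   # MAC (i, j) multiplies and accumulates
    return acc, N
```

For example, `broadcast_mac_array([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 0], [0, 0]])` yields `([[19, 22], [43, 50]], 2)`: the number of cycles equals the shared inner dimension N.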
Of course, if the number of rows of A is greater than the number of rows of the matrix calculator and/or the number of columns of B is greater than the number of columns of the matrix calculator, A and/or B must be partitioned into blocks and the result computed in several passes. For example, if A has M rows and N columns, B has N rows and K columns, and the matrix calculator has H rows and W columns, then at least $\lceil M/H \rceil \times \lceil K/W \rceil$ block A × B operations are required, each taking N cycles. The acceleration performance of the matrix calculator is therefore poor when M and K are relatively small and N is relatively large.
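The cost of blocking can be estimated with a few lines of arithmetic. The sketch below is a simplified model under an explicit assumption: load, drain and write-back overheads are ignored, and each H × W output tile costs exactly N cycles. The function names are hypothetical. Utilization is the fraction of MAC slots that do useful work.

```python
import math


def conventional_cycles(M, N, K, H, W):
    """Cycles for an M x N by N x K product on an H x W broadcast array.

    Each H x W output tile streams N (row, column) pairs, one per cycle;
    load, drain and write-back overheads are ignored (assumption).
    """
    tiles = math.ceil(M / H) * math.ceil(K / W)
    return tiles * N


def mac_utilization(M, N, K, H, W):
    """Fraction of MAC slots doing useful work over the whole computation."""
    useful = M * N * K
    return useful / (conventional_cycles(M, N, K, H, W) * H * W)


print(conventional_cycles(12, 32, 1, 4, 8))  # 96
print(mac_utilization(12, 32, 1, 4, 8))      # 0.125
print(mac_utilization(4, 32, 8, 4, 8))       # 1.0
```

With K = 1 only one of the W columns ever does useful work, so utilization drops to 1/W (here 1/8), which is exactly the situation discussed next for fully connected layers.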
The fully connected layer is one of the most common layers in deep learning models. For example, in deep-learning-based image recognition models the final layer is almost always a fully connected layer whose number of outputs equals the number of object classes to be recognized (if there are 10 classes, the fully connected layer has 10 outputs); a softmax function then computes, for each class, the probability that the image belongs to that class.
The fully connected layer can also be converted into a matrix calculation. If the last layer of the deep learning model contains N elements and the preceding layer contains M elements, the fully connected layer computation is equivalent to the product of an N-row, M-column matrix and an M-row, 1-column matrix, or to the product of a 1-row, M-column matrix and an M-row, N-column matrix. After this conversion, one of the two matrices necessarily has only one row or one column, so the matrix calculator of FIG. 1 can use only one column or one row of its array. The matrix calculator of FIG. 1 is therefore inefficient for fully connected layers: if it contains H rows and W columns of multiply-accumulate units, its utilization is at most 1/H or 1/W.
For a fully connected layer the result matrix is a single vector. Whether it is a row vector or a column vector, one of M and K (in the notation of the blocking example above) must equal 1, i.e. only one row or one column of the multipliers in the matrix calculator actually does useful work. The conventional matrix calculator of FIG. 1 is therefore unsuitable for computing fully connected layers in deep learning models.
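The two equivalent matrix forms of a fully connected layer can be checked with a small sketch (bias and activation omitted; the function names and the tiny weight matrix are illustrative assumptions). In the column form the weight matrix has N rows (outputs) and M columns (inputs); in the row form the input is a 1 × M row multiplied by the M × N transpose. Either way, one operand is a single column or row.

```python
def fc_column_form(weights, x):
    """y = weights x: weights is N x M (N outputs, M inputs), x is an M x 1 column."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def fc_row_form(x, weights_t):
    """y^T = x^T weights^T: x is a 1 x M row, weights_t is the M x N transpose."""
    return [sum(xm * wm for xm, wm in zip(x, col)) for col in zip(*weights_t)]


def transpose(mat):
    return [list(col) for col in zip(*mat)]


weights = [[1, 2, 3], [4, 5, 6]]   # N = 2 outputs, M = 3 inputs
x = [1, 0, -1]
print(fc_column_form(weights, x))           # [-2, -2]
print(fc_row_form(x, transpose(weights)))   # [-2, -2]
```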
Disclosure of Invention
To address these shortcomings of existing matrix calculators, the invention provides a matrix calculator and a fully connected layer calculation method based on it, which improve the computational efficiency of the matrix calculator.
A matrix calculator comprises H rows and W columns of multiply-accumulate units. Each multiply-accumulate unit comprises a multiplier and an accumulator; it receives data input from the row direction and the column direction, multiplies them, and accumulates the product in an internal accumulation register. Each row of multiply-accumulate units is provided with an addition tree and a row accumulation register; the addition tree computes the sum of the current results of that row's multiply-accumulate units and accumulates this sum into the row accumulation register. The matrix calculator has a first control circuit that enables the addition tree and the row accumulation register of each row when the result matrix has only one row or one column, and a second control circuit that disables the accumulators in the multiply-accumulate units.
Further, the product of an M-row, N-column matrix and an N-row, 1-column matrix is calculated with the H-row, W-column array of multiply-accumulate units, where M > H and N > W, by the following steps:
step 1, the first control circuit enables the addition trees and row accumulation registers of all rows, and the second control circuit disables the accumulators in all multiply-accumulate units;
step 2, the left matrix is partitioned into blocks of H rows, and each round computes the product of one block sub-matrix and the right matrix, so that the whole matrix multiplication is completed in at most $\lceil M/H \rceil$ rounds, each round comprising:
step 2.1, in each round, H rows of left-matrix data are sent to the matrix calculator; each row is fed from the row direction into the multiply-accumulate units of the corresponding row, the N data of each row being distributed over the W multiply-accumulate units of that row, so that each unit receives at most $\lceil N/W \rceil$ data;
step 2.2, corresponding to step 2.1, in each round the N data of the single column of the right matrix are fed from the column direction into every row of the matrix calculator and distributed over the W multiply-accumulate units of the row, so that each unit receives at most $\lceil N/W \rceil$ data;
step 2.3, in each cycle of a round, each multiply-accumulate unit receives exactly 1 datum from the row direction and 1 datum from the column direction and multiplies them; the addition tree then sums the products of all multipliers in the same row, adds this sum to the previous value in the row accumulation register, and writes the result back to the row accumulation register; after at most $\lceil N/W \rceil$ cycles, the product of the block sub-matrix and the right matrix, an H-row, 1-column matrix, is obtained.
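Steps 1 to 2.3 can be sketched behaviourally as below. This is an illustrative model, not the patent's circuit: it assumes that MAC column j of every row owns the contiguous slice of ceil(N/W) elements starting at j × ceil(N/W) (a distribution consistent with the non-multiple example given in the embodiment), and the function name is hypothetical. The per-unit accumulators are never used; only the per-row adder tree and row accumulation register hold state.

```python
import math


def fc_matvec_row_tree(A, x, H, W):
    """Behavioural sketch of steps 1 to 2.3 for an M x N left matrix A and an
    N x 1 right matrix x on an H x W array. Returns (result, cycles).

    Assumption: MAC column j of every row handles the contiguous slice
    A[.][j*chunk : (j+1)*chunk] with chunk = ceil(N / W).
    """
    M, N = len(A), len(x)
    chunk = math.ceil(N / W)                  # at most ceil(N/W) data per MAC
    result, cycles = [], 0
    for base in range(0, M, H):               # step 2: one round per block of H rows
        block = A[base:base + H]              # the last block may have fewer rows
        row_acc = [0] * len(block)            # step 1: row registers on, MAC accumulators off
        for t in range(chunk):                # step 2.3: one datum per direction per cycle
            for i, a_row in enumerate(block):
                products = [
                    a_row[j * chunk + t] * x[j * chunk + t]   # MAC (i, j) only multiplies
                    for j in range(W)
                    if j * chunk + t < N                      # units with no data left stay idle
                ]
                row_acc[i] += sum(products)   # adder tree, then row accumulation register
            cycles += 1
        result.extend(row_acc)
    return result, cycles
```

For the 12 × 32 example with H = 4 and W = 8, this model returns the same vector as a direct matrix-vector product after 3 rounds of 4 cycles, i.e. 12 cycles.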
Further, if M is an integer multiple of H and N is an integer multiple of W, each round proceeds as follows:
H rows of left-matrix data are sent to the H rows of multipliers of the matrix calculator, and the W multipliers of each row receive, in order, the 1st to $(N/W)$-th data, the $(N/W+1)$-th to $(2N/W)$-th data, the $(2N/W+1)$-th to $(3N/W)$-th data, ..., and the $((W-1)N/W+1)$-th to $N$-th data of the corresponding left-matrix row;
correspondingly, the single column of right-matrix data is fed into the H rows of multipliers, and the W multipliers of each row receive, in order, the 1st to $(N/W)$-th data, the $(N/W+1)$-th to $(2N/W)$-th data, the $(2N/W+1)$-th to $(3N/W)$-th data, ..., and the $((W-1)N/W+1)$-th to $N$-th data of that column;
each multiplier receives 1 datum per cycle, so delivering its $N/W$ data takes $N/W$ cycles. In each cycle, each multiplier multiplies the data fed from the row and column directions; the addition tree then sums the products of all multipliers in the same row, adds this sum to the previous value in the row accumulation register, and writes the result back to the row accumulation register. After $N/W$ cycles, the value in each row accumulation register is the product of one row of the left matrix and the 1st column of the right matrix.
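The feed order for the divisible case can be tabulated with a short sketch (function name hypothetical; indices are 0-based, whereas the text counts from 1). Entry `schedule[t][j]` is the element index delivered to multiplier j in cycle t; the same index selects the left-matrix element from the row direction and the right-matrix element from the column direction, so each multiplier always sees a matching pair.

```python
def feed_schedule(N, W):
    """Return schedule[t][j]: 0-based index of the element fed to multiplier j
    in cycle t, with multiplier j owning the contiguous slice of N/W elements
    starting at j*N/W. The same index selects both the left-matrix element
    (row direction) and the right-matrix element (column direction)."""
    if N % W != 0:
        raise ValueError("this sketch covers only the case where W divides N")
    per_mac = N // W
    return [[j * per_mac + t for j in range(W)] for t in range(per_mac)]


sched = feed_schedule(32, 8)   # N = 32, W = 8 as in Example 1
print(len(sched))              # 4
print(sched[0])                # [0, 4, 8, 12, 16, 20, 24, 28]
```

Every index from 0 to N - 1 appears exactly once across the N/W cycles, so the row accumulation register ends up holding the full dot product.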
The matrix calculator provided by the invention performs matrix multiplication efficiently, especially when one dimension of the result matrix is small, for example when the result matrix has only one row or one column; in that case it makes full use of the hardware multiplier array and improves computational efficiency. This is exactly the case that arises in the fully connected layers of deep learning models: after conversion to a matrix calculation, the result matrix has only one row or one column.
Drawings
FIG. 1 is a diagram of a prior art matrix calculator architecture;
FIG. 2 is a diagram of a matrix calculator architecture of the present invention;
FIG. 3 is a schematic diagram of the internal circuit of the multiply-accumulate unit;
FIG. 4 is a schematic diagram of the block partitioning of the left matrix;
FIG. 5 is a schematic diagram of the input data for one row of multipliers.
Detailed Description
The invention will be described in further detail with reference to the drawings and the detailed description. The embodiments of the invention have been presented for purposes of illustration and description, and are not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Example 1
Taking a 4-row, 8-column matrix calculator as an example, the matrix calculator comprises 4 rows and 8 columns of multiply-accumulate units MAC, each comprising a multiplier MUL and an accumulator ADD. As shown in FIG. 2, each row of multiply-accumulate units is provided with an addition tree and a row accumulation register; the addition tree computes the sum of the current results of that row's multiply-accumulate units and adds this sum to the row accumulation register.
The matrix calculator has a first control circuit that enables the addition tree and the row accumulation register of each row when the result matrix has only one row or one column, and a second control circuit that disables the accumulators in the multiply-accumulate units. The internal structure of the multiply-accumulate unit MAC is shown in FIG. 3: module (1) is the input from the row direction, module (2) the input from the column direction, module (3) the multiplier, module (4) the output to the row addition tree, module (5) the adder, and module (6) the multiply-accumulate register, which stores the multiply-accumulate result of the unit. When both the first and second control circuits are active, modules (5) and (6) are inactive and modules (1), (2), (3) and (4) are active; when both control circuits are inactive, module (4) is inactive and modules (1), (2), (3), (5) and (6) are active.
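The two operating modes of the FIG. 3 unit can be sketched as a small behavioural model. The class and function names are hypothetical, and collapsing the two control circuits into one `fc_mode` flag is an assumption justified only because the text describes just the "both active" and "both inactive" cases. In fully connected mode the product bypasses the local adder and register and goes to the row adder tree; in conventional mode it is accumulated locally.

```python
from dataclasses import dataclass


@dataclass
class MacUnit:
    """Behavioural sketch of the FIG. 3 unit; comments use the figure's module numbers."""
    acc: int = 0   # module (6): local multiply-accumulate register

    def step(self, row_in, col_in, fc_mode):
        """fc_mode=True models both control circuits active; False models both inactive."""
        product = row_in * col_in      # module (3) multiplies inputs (1) and (2)
        if fc_mode:
            return product             # module (4): to the row adder tree; (5) and (6) idle
        self.acc += product            # adder (5) updates register (6)
        return None                    # output (4) unused in conventional mode


def row_cycle(units, row_vals, col_vals, row_register):
    """One cycle of one row in fully connected mode: adder tree plus row accumulation register."""
    products = [u.step(a, b, fc_mode=True) for u, a, b in zip(units, row_vals, col_vals)]
    return row_register + sum(products)


units = [MacUnit() for _ in range(4)]
reg = row_cycle(units, [1, 2, 3, 4], [1, 1, 1, 1], 0)   # 10
reg = row_cycle(units, [1, 1, 1, 1], [2, 2, 2, 2], reg)  # 18
```

After the two cycles the row register holds 18 while every unit's local register is still 0, mirroring the disabled modules (5) and (6).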
The following describes how the matrix calculator of FIG. 2 computes the product of an M-row, N-column matrix and an N-row, 1-column matrix. The invention mainly addresses the case M > H and N > W. Since H = 4 and W = 8 in this embodiment, and without loss of generality, this embodiment takes M = 12 and N = 32; that is, the product of a 12-row, 32-column left matrix and a 32-row, 1-column right matrix is computed with the 4-row, 8-column array of multiply-accumulate units.
First, the first control circuit enables the addition trees and row accumulation registers of all rows, and the second control circuit disables the accumulators in all multiply-accumulate units.
The left matrix has 12 rows, while the matrix calculator has only 4 rows of multiply-accumulate units, so the left matrix is divided into 3 blocks of 4 rows each, as shown in FIG. 4. In each round, 4 rows of data are sent to the matrix calculator; the 32 data of each row are distributed over the 8 multiply-accumulate units of the corresponding row, so each unit receives 4 data, one per cycle, over 4 cycles. All 12 rows are processed after 3 rounds.
At the same time, in each round the 32 data of the single column of the right matrix are fed into every row of the matrix calculator and distributed over the 8 multiply-accumulate units of that row; likewise, each unit receives 4 data, one per cycle, over 4 cycles.
In each cycle, each multiply-accumulate unit multiplies the data fed from the row direction (left matrix) and the column direction (right matrix); the addition tree then sums the products of all multiply-accumulate units in the same row, adds this sum to the previous value in the row accumulation register, and writes the result back to the row accumulation register, as shown in FIG. 5.
After the 4 cycles of round 1, the 1st row accumulation register holds the product of the 1st row of the left matrix and the 1st column of the right matrix, i.e. row 1 of the result matrix; the 2nd row accumulation register holds the product of the 2nd row of the left matrix and the 1st column of the right matrix, i.e. row 2 of the result matrix; the 3rd row accumulation register holds the product of the 3rd row of the left matrix and the 1st column of the right matrix, i.e. row 3 of the result matrix; and the 4th row accumulation register holds the product of the 4th row of the left matrix and the 1st column of the right matrix, i.e. row 4 of the result matrix.
After the 4 cycles of round 2, the 1st row accumulation register holds the product of the 5th row of the left matrix and the 1st column of the right matrix, i.e. row 5 of the result matrix; the 2nd row accumulation register holds the product of the 6th row of the left matrix and the 1st column of the right matrix, i.e. row 6 of the result matrix; the 3rd row accumulation register holds the product of the 7th row of the left matrix and the 1st column of the right matrix, i.e. row 7 of the result matrix; and the 4th row accumulation register holds the product of the 8th row of the left matrix and the 1st column of the right matrix, i.e. row 8 of the result matrix. Round 3 proceeds in the same way, producing rows 9 to 12 of the result matrix, and is not described further.
In this embodiment, M is an integer multiple of H and N is an integer multiple of W. Even when they are not multiples, the application of the invention to fully connected layer computation is unaffected: for example, if the left matrix has 10 rows, only 2 rows of data need to be input in round 3; if the left matrix has 30 columns, the last multiply-accumulate unit of each row only needs input data in the first 2 of the 4 cycles.
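The two non-multiple cases just mentioned can be reproduced with simple arithmetic. This sketch assumes, as above, that each MAC in a row owns a contiguous slice of ceil(N/W) elements; the function names are hypothetical. It lists how many left-matrix rows are fed in each round and in how many cycles each MAC of a row actually receives data.

```python
import math


def rows_per_round(M, H):
    """Number of left-matrix rows actually fed in each round."""
    return [min(H, M - base) for base in range(0, M, H)]


def active_cycles_per_mac(N, W):
    """Cycles in which each of the W MACs of a row receives data, assuming
    contiguous slices of ceil(N/W) elements per MAC."""
    chunk = math.ceil(N / W)
    return [max(0, min(chunk, N - j * chunk)) for j in range(W)]


print(rows_per_round(10, 4))         # [4, 4, 2]
print(active_cycles_per_mac(30, 8))  # [4, 4, 4, 4, 4, 4, 4, 2]
```

The output matches the text: round 3 carries only 2 rows when M = 10, and with N = 30 only the last MAC of each row idles in the final 2 cycles.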
In general, if the left matrix has M rows, it is partitioned into blocks and H rows are sent to the H rows of the matrix calculator in each round. In each cycle, W data of each of these H rows are sent to the W multiply-accumulate units of the corresponding row, so each round needs $\lceil N/W \rceil$ cycles to multiply H rows of the left matrix by the single column of the right matrix. After $\lceil M/H \rceil$ rounds the whole matrix operation is complete, taking $\lceil M/H \rceil \times \lceil N/W \rceil$ cycles in total. With a conventional matrix calculator, by contrast, $\lceil M/H \rceil \times \lceil 1/W \rceil = \lceil M/H \rceil$ block operations of N cycles each are required, i.e. $\lceil M/H \rceil \times N$ cycles. The matrix calculator of the invention therefore improves efficiency by a factor of about W.
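The two cycle counts above can be compared numerically. As before, this sketch ignores data-loading and result write-back overheads (an assumption), and the function names are hypothetical. For Example 1 the speed-up equals W exactly; when N is not a multiple of W it is slightly below W.

```python
import math


def proposed_cycles(M, N, H, W):
    """ceil(M/H) rounds of ceil(N/W) cycles each."""
    return math.ceil(M / H) * math.ceil(N / W)


def conventional_fc_cycles(M, N, H, W):
    """K = 1 on the FIG. 1 array: ceil(M/H) * ceil(1/W) = ceil(M/H) tiles of N cycles."""
    return math.ceil(M / H) * N


for M, N in [(12, 32), (10, 30)]:
    new, old = proposed_cycles(M, N, 4, 8), conventional_fc_cycles(M, N, 4, 8)
    print(M, N, new, old, old / new)
# 12 32 12 96 8.0
# 10 30 12 90 7.5
```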
It will be apparent that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention without inventive effort fall within the scope of the present invention.

Claims (2)

1. A fully connected layer computing method based on a matrix calculator, characterized in that:
the matrix calculator comprises H rows and W columns of multiply-accumulate units, each comprising a multiplier and an accumulator, wherein each row of multiply-accumulate units is provided with an addition tree and a row accumulation register, and the addition tree is used to compute the sum of the current results of that row's multiply-accumulate units and to accumulate this sum into the row accumulation register;
the matrix calculator is provided with a first control circuit for enabling the addition tree and the row accumulation register of each row when the result matrix has only one row or one column, and with a second control circuit for disabling the accumulators in the multiply-accumulate units;
the product of an M-row, N-column matrix and an N-row, 1-column matrix is calculated with the H-row, W-column multiply-accumulate units, where M > H and N > W, by the following steps:
step 1, the first control circuit enables the addition trees and row accumulation registers of all rows, and the second control circuit disables the accumulators in all multiply-accumulate units;
step 2, the left matrix is partitioned into blocks of H rows, and each round computes the product of one block sub-matrix and the right matrix, so that the whole matrix multiplication is completed in at most $\lceil M/H \rceil$ rounds, each round comprising:
step 2.1, in each round, H rows of left-matrix data are sent to the matrix calculator; each row is fed from the row direction into the multiply-accumulate units of the corresponding row, the N data of each row being distributed over the W multiply-accumulate units of that row, so that each unit receives at most $\lceil N/W \rceil$ data;
step 2.2, corresponding to step 2.1, in each round the N data of the single column of the right matrix are fed from the column direction into every row of the matrix calculator and distributed over the W multiply-accumulate units of the row, so that each unit receives at most $\lceil N/W \rceil$ data;
step 2.3, in each cycle of a round, each multiply-accumulate unit receives exactly 1 datum from the row direction and 1 datum from the column direction and multiplies them; the addition tree then sums the products of all multipliers in the same row, adds this sum to the previous value in the row accumulation register, and writes the result back to the row accumulation register; after at most $\lceil N/W \rceil$ cycles, the product of the block sub-matrix and the right matrix, an H-row, 1-column matrix, is obtained.
2. The method according to claim 1, wherein, if M is an integer multiple of H and N is an integer multiple of W, each round of calculation proceeds as follows:
H rows of left-matrix data are sent to the H rows of multipliers of the matrix calculator, and the W multipliers of each row receive, in order, the 1st to $(N/W)$-th data, the $(N/W+1)$-th to $(2N/W)$-th data, the $(2N/W+1)$-th to $(3N/W)$-th data, ..., and the $((W-1)N/W+1)$-th to $N$-th data of the corresponding left-matrix row;
correspondingly, the single column of right-matrix data is fed into the H rows of multipliers, and the W multipliers of each row receive, in order, the 1st to $(N/W)$-th data, the $(N/W+1)$-th to $(2N/W)$-th data, the $(2N/W+1)$-th to $(3N/W)$-th data, ..., and the $((W-1)N/W+1)$-th to $N$-th data of that column;
each multiplier receives 1 datum per cycle, so delivering its $N/W$ data takes $N/W$ cycles; in each cycle, each multiplier multiplies the data fed from the row and column directions, the addition tree sums the products of all multipliers in the same row, adds this sum to the previous value in the row accumulation register, and writes the result back to the row accumulation register; after $N/W$ cycles, the value in each row accumulation register is the product of one row of the left matrix and the 1st column of the right matrix.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011638796.5A CN112612447B (en) 2020-12-31 2020-12-31 Matrix calculator and full-connection layer calculating method based on same

Publications (2)

Publication Number Publication Date
CN112612447A (en) 2021-04-06
CN112612447B (en) 2023-12-08

Family

ID=75253190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011638796.5A Active CN112612447B (en) 2020-12-31 2020-12-31 Matrix calculator and full-connection layer calculating method based on same

Country Status (1)

Country Link
CN (1) CN112612447B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001092810A (en) * 1999-09-24 2001-04-06 Nippon Telegr & Teleph Corp <Ntt> Complex multiplier and complex correlator
CN2674771Y (en) * 2001-12-28 2005-01-26 InterDigital Technology Corporation (交互数字技术公司) Sub-station for calculating CDMA system transmission matrix coefficients
JP2009181293A (en) * 2008-01-30 2009-08-13 Yamaha Corp Matrix operation co-processor
CN108733348A (en) * 2017-04-21 2018-11-02 Shanghai Cambricon Information Technology Co., Ltd. (上海寒武纪信息科技有限公司) Fused vector multiplier and method for performing operations using it
CN109271138A (en) * 2018-08-10 2019-01-25 Hefei University of Technology (合肥工业大学) Chain multiplication structure suitable for large-dimension matrix multiplication
CN109284475A (en) * 2018-09-20 2019-01-29 Zhengzhou Yunhai Information Technology Co., Ltd. (郑州云海信息技术有限公司) Matrix convolution computing module and matrix convolution calculation method
CN109992743A (en) * 2017-12-29 2019-07-09 Huawei Technologies Co., Ltd. (华为技术有限公司) Matrix multiplier
CN111767994A (en) * 2019-04-01 2020-10-13 Institute of Semiconductors, Chinese Academy of Sciences (中国科学院半导体研究所) Neuron calculation module
CN111767079A (en) * 2019-03-30 2020-10-13 Intel Corporation (英特尔公司) Apparatus, method, and system for transpose instruction for matrix manipulation accelerator

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019148969A (en) * 2018-02-27 2019-09-05 Fujitsu Limited Matrix arithmetic device, matrix arithmetic method, and matrix arithmetic program

Non-Patent Citations (2)

Title
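Zhou Xiping, Gao Deyuan, Fan Xiaoya. A high-performance solution for multiply-accumulate units. Microelectronics & Computer. 2002, (No. 11), 58-62. *
Zhang Qi; Chen Jing; He Minghua. Design and implementation of a matrix multiplier based on overall performance optimization. Journal of Fuzhou University (Natural Science Edition). 2009, (No. 01), 21-24, 64. *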

Also Published As

Publication number Publication date
CN112612447A (en) 2021-04-06

Similar Documents

Publication Publication Date Title
KR102523263B1 (en) Systems and methods for hardware-based pooling
CN107633297B (en) Convolutional neural network hardware accelerator based on parallel fast FIR filter algorithm
US8051124B2 (en) High speed and efficient matrix multiplication hardware module
US20180174036A1 (en) Hardware Accelerator for Compressed LSTM
US20210241071A1 (en) Architecture of a computer for calculating a convolution layer in a convolutional neural network
EP3674982A1 (en) Hardware accelerator architecture for convolutional neural network
CN110580519B (en) Convolution operation device and method thereof
CN110766128A (en) Convolution calculation unit, calculation method and neural network calculation platform
CN113033794B (en) Light weight neural network hardware accelerator based on deep separable convolution
Nag et al. ViTA: A vision transformer inference accelerator for edge applications
GB2601701A (en) Performing dot product operations using a memristive crossbar array
CN116167424B (en) CIM-based neural network accelerator, CIM-based neural network accelerator method, CIM-based neural network storage processing system and CIM-based neural network storage processing equipment
CN113918120A (en) Computing device, neural network processing apparatus, chip, and method of processing data
CN116702851A (en) Pulsation array unit and pulsation array structure suitable for weight multiplexing neural network
CN116362314A (en) Integrated storage and calculation device and calculation method
CN107368459B (en) Scheduling method of reconfigurable computing structure based on arbitrary dimension matrix multiplication
CN112612447B (en) Matrix calculator and full-connection layer calculating method based on same
WO2022016261A1 (en) System and method for accelerating training of deep learning networks
CN113743046B (en) Integrated layout structure for memory and calculation and integrated layout structure for data splitting and memory and calculation
CN113627587A (en) Multichannel convolutional neural network acceleration method and device
CN111008697B (en) Convolutional neural network accelerator implementation architecture
CN110765413A (en) Matrix summation structure and neural network computing platform
CN112115665B (en) Integrated memory array and convolution operation method thereof
CN111126580B (en) Multi-precision weight coefficient neural network acceleration chip arithmetic device adopting Booth coding
CN113110822A (en) Configurable matrix multiplication device and algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant