CN107153873A - A kind of two-value convolutional neural networks processor and its application method - Google Patents
A kind of two-value convolutional neural networks processor and its application method
- Publication number
- CN107153873A CN107153873A CN201710316252.9A CN201710316252A CN107153873A CN 107153873 A CN107153873 A CN 107153873A CN 201710316252 A CN201710316252 A CN 201710316252A CN 107153873 A CN107153873 A CN 107153873A
- Authority
- CN
- China
- Prior art keywords
- data
- binary
- convolution
- elements
- convolution kernel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000013527 convolutional neural network Methods 0.000 title claims abstract description 43
- 238000000034 method Methods 0.000 title claims description 37
- 238000011176 pooling Methods 0.000 claims abstract description 20
- 238000010606 normalization Methods 0.000 claims abstract description 12
- 238000012545 processing Methods 0.000 claims abstract description 9
- 238000004364 calculation method Methods 0.000 claims description 60
- 239000011159 matrix material Substances 0.000 claims description 27
- 238000013500 data storage Methods 0.000 claims description 17
- 238000010586 diagram Methods 0.000 claims description 13
- 238000009825 accumulation Methods 0.000 claims description 9
- 238000004590 computer program Methods 0.000 claims description 5
- 238000006243 chemical reaction Methods 0.000 claims 1
- 238000013528 artificial neural network Methods 0.000 description 23
- 230000008569 process Effects 0.000 description 14
- 230000015654 memory Effects 0.000 description 8
- 238000013506 data mapping Methods 0.000 description 6
- 230000000694 effects Effects 0.000 description 5
- 230000006870 function Effects 0.000 description 4
- 238000013507 mapping Methods 0.000 description 4
- 238000005265 energy consumption Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000013473 artificial intelligence Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012549 training Methods 0.000 description 2
- 230000004913 activation Effects 0.000 description 1
- 238000001994 activation Methods 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 230000003190 augmentative effect Effects 0.000 description 1
- 230000003139 buffering effect Effects 0.000 description 1
- 230000002950 deficient Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000013178 mathematical model Methods 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 230000000946 synaptic effect Effects 0.000 description 1
- 239000002699 waste material Substances 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Neurology (AREA)
- Complex Calculations (AREA)
Abstract
The present invention provides a binary convolutional neural network processor, comprising: a storage device for data to be calculated, used to store the elements of the data to be convolved and the convolution kernel elements, both in binary form; a binary convolution device, used to perform binary convolution operations on the binary convolution kernel elements and the corresponding elements of the binary data to be convolved; a data scheduling device, used to load the convolution kernel elements and the corresponding elements of the data to be convolved into the binary convolution device; a pooling device, used to pool the results obtained by the convolution; and a normalization device, used to normalize the pooled results.
Description
Technical Field

The present invention relates to the storage and scheduling of data in neural network model computation.
Background Art

With the development of artificial intelligence, technologies involving deep neural networks, and convolutional neural networks in particular, have advanced rapidly in recent years and have found wide application in fields such as image recognition, speech recognition, natural language understanding, weather prediction, gene expression analysis, content recommendation, and intelligent robotics.

A deep neural network can be understood as a computational model containing a large number of data nodes, in which each data node is connected to other data nodes and the connection between nodes is represented by a weight. As deep neural networks continue to develop, their complexity keeps increasing.

To balance complexity against computational effectiveness, the reference Courbariaux M., Hubara I., Soudry D., et al., "Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1," arXiv preprint arXiv:1602.02830, 2016, proposed the "binary convolutional neural network model" to reduce the complexity of conventional neural networks. In a binary convolutional neural network, the weights, input data, and output data are all in "binary form"; that is, their values are approximated by "1" and "-1", with "1" representing values greater than or equal to 0 and "-1" representing values less than 0. This reduces the bit width of the data operated on in the neural network and thereby greatly reduces the required parameter capacity, making binary convolutional neural networks especially suitable for implementing image recognition, augmented reality, and virtual reality on end devices.

In the prior art, deep neural networks are usually run on general-purpose processors such as central processing units (CPUs) and graphics processing units (GPUs); no dedicated processor for binary convolutional neural networks exists. The computing units of general-purpose processors usually have multi-bit widths, so using them to compute binary neural networks wastes resources.
Summary of the Invention
Therefore, the object of the present invention is to overcome the above defects of the prior art and provide a binary convolutional neural network processor, comprising:

a storage device for data to be calculated, used to store the elements of the data to be convolved and the convolution kernel elements, both in binary form;

a binary convolution device, used to perform binary convolution operations on the binary convolution kernel elements and the corresponding elements of the binary data to be convolved;

a data scheduling device, used to load the convolution kernel elements and the corresponding elements of the data to be convolved into the binary convolution device;

a pooling device, used to pool the results obtained by the convolution; and

a normalization device, used to normalize the pooled results.
Preferably, in the binary convolutional neural network processor, the binary convolution device comprises:

an XNOR gate, which takes as its inputs a binary convolution kernel element and the corresponding element of the binary data to be convolved;

an accumulation device, which takes the output of the XNOR gate as its input and accumulates the XNOR gate outputs to output the result of the binary convolution operation;

wherein the accumulation device comprises an OR gate and/or a Hamming weight calculation unit, wherein

at least one input of the OR gate is the output of the XNOR gate;

at least one input of the Hamming weight calculation unit is the output of the XNOR gate.
Preferably, in the binary convolutional neural network processor, the storage device for data to be calculated is further used to store online the binary-converted convolution kernel and/or data to be convolved.

Preferably, the binary convolutional neural network processor further comprises:

a binarization device, used to convert the obtained convolution kernel and/or data to be convolved into binary form.

Preferably, in the binary convolutional neural network processor, the data scheduling device is provided with a register, used to load, during use, the convolution kernel elements that need to be reused.

Preferably, in the binary convolutional neural network processor according to any of the above, the elements of the data to be convolved and the convolution kernel elements are stored in the storage device for data to be calculated in a layer-interleaved manner.

Preferably, in the binary convolutional neural network processor, the elements of the data to be convolved are stored in the storage device for data to be calculated according to the size of the convolution kernel and the order in which the elements participate in the convolution operation.

Preferably, in the binary convolutional neural network processor, the storage of the elements of the data to be convolved and/or the convolution kernel elements in the storage device for data to be calculated satisfies one or more of the following:

they are stored according to the matrix arrangement order of the convolution kernel and the data to be convolved;

elements at the same position but in different channels of the matrices of the convolution kernel and/or the data to be convolved are stored contiguously in consecutive storage units;

all elements under the same weight of the same convolution kernel, and/or all elements of the sub-matrix of the same data to be convolved used for a convolution operation, are stored in consecutive storage units of the storage device.
The present invention also provides a method of using the binary convolutional neural network processor according to any of the above, comprising:

1) loading the data to be convolved from the storage device for data to be calculated into a register;

2) loading the data to be convolved in the register, together with the elements in the storage device for data to be calculated that need to be multiplied with it, into the binary convolution device to perform the binary convolution operation;

3) pooling the output of the binary convolution device with the pooling device;

4) normalizing the output of the pooling device with the normalization device.

The present invention further provides a computer-readable storage medium storing a computer program which, when executed, implements the above method.
Compared with the prior art, the advantages of the present invention are:

It provides a simplified hardware structure for performing convolution operations, a binary convolutional neural network processor based on this structure, and corresponding calculation methods; by reducing the bit width of the data being computed during operation, it improves computational efficiency and reduces storage capacity and energy consumption.
Brief Description of the Drawings

Embodiments of the present invention are further described below with reference to the accompanying drawings, in which:

Fig. 1 is a schematic diagram of the multi-layer structure of a neural network;

Fig. 2 is a schematic diagram of a convolution calculation in two-dimensional space;

Fig. 3 is a schematic diagram of the hardware structure of a binary convolution device according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the hardware structure of a binary convolution device according to another embodiment of the present invention;

Fig. 5 is a schematic diagram of the hardware structure of a binary convolution device according to yet another embodiment of the present invention;

Figs. 6a-6c are schematic diagrams of the hardware structure of binary convolution devices of the present invention that use a Hamming weight calculation element;

Fig. 7 is a schematic diagram of the storage of multi-channel convolution kernels (weight 0 and weight 1) and data to be convolved according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of the structure of a binary convolutional neural network processor according to an embodiment of the present invention;

Fig. 9 is a schematic diagram of a calculation using the binary convolutional neural network processor according to an embodiment of the present invention;

Fig. 10 is a schematic diagram of a calculation using the binary convolutional neural network processor according to another embodiment of the present invention.
Detailed Description

The present invention is described in detail below in conjunction with the accompanying drawings and specific embodiments.

In computer science, a neural network is a mathematical model patterned after the structure of biological synaptic connections; application systems built from neural networks can realize functions such as machine learning and pattern recognition.

A neural network is structurally divided into multiple layers; Fig. 1 shows a schematic diagram of such a multi-layer structure. Referring to Fig. 1, the first layer of the multi-layer structure is the input layer, the last layer is the output layer, and the remaining layers are hidden layers. When the neural network is used, the original image is fed to the input layer as the input-layer data (in the present invention, "image" and "layer" refer to the raw data to be processed, not merely images obtained by taking photographs in the narrow sense); each layer of the network processes the layer it receives and passes the result to the next layer, and the output of the output layer is taken as the final result.

As mentioned above, to cope with the increasingly complex structure of neural networks, the prior art proposed the concept of a binary convolutional neural network. As the name implies, the operation of a binary convolutional neural network includes performing a "convolution" operation on the input data, as well as operations such as "pooling", "normalization", and "binarization".
Convolution is an important operation in a binary convolutional neural network; its calculation process is described in detail below with reference to Fig. 2.

Fig. 2 shows the calculation process of convolving a "binary" image of size 5 by 5 with a "binary" convolution kernel of size 3 by 3 in two-dimensional space. Referring to Fig. 2, first, each element in rows 1-3 (from top to bottom) and columns 1-3 (from left to right) of the image is multiplied by the corresponding element of the convolution kernel: for example, the element in row 1, column 1 of the kernel (denoted "kernel(1,1)") is multiplied by the element in row 1, column 1 of the image (denoted "image(1,1)"), giving 1×1=1; kernel(1,2) is multiplied by image(1,2), giving 1×0=0; similarly, kernel(1,3) multiplied by image(1,3) gives 1×1=1; and so on until all 9 products have been computed. The 9 results are then added, 1+0+1+0+1+0+0+0+1=4, to give the element in row 1, column 1 of the convolution result, result(1,1). Similarly, computing kernel(1,1) times image(1,2), kernel(1,2) times image(1,3), kernel(1,3) times image(1,4), kernel(2,1) times image(2,2), and so on yields 1+0+0+1+0+0+0+1=3 as result(1,2). In this way the 3-by-3 convolution result matrix shown in Fig. 2 can be calculated.
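To make the sliding-window computation above concrete, the following short Python sketch reproduces the multiply-and-accumulate process of Fig. 2; the image and kernel values are illustrative placeholders, since the exact contents of Fig. 2 are not reproduced here.

```python
# A minimal sketch of the 2D binary convolution of Fig. 2:
# a 3x3 kernel slides over a 5x5 image; at each position the
# 9 element-wise products are summed into one output element.
# The image/kernel values below are illustrative placeholders.

def conv2d_binary(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    result = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = 0
            for r in range(kh):
                for c in range(kw):
                    acc += kernel[r][c] * image[i + r][j + c]  # "multiply"
            result[i][j] = acc                                 # "accumulate"
    return result

image = [[1, 0, 1, 0, 0],
         [0, 1, 0, 1, 0],
         [1, 0, 1, 0, 1],
         [0, 1, 0, 1, 0],
         [1, 0, 0, 0, 1]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [0, 0, 1]]
print(conv2d_binary(image, kernel))  # 3x3 convolution result matrix
```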
The convolution result obtained as shown in Fig. 2 is buffered and binarized, and then fed into the next layer of the binary convolutional neural network.

The above example shows the "multiply" and the "add" (or "accumulate") operations that make up the convolution calculation.

The inventors realized that, owing to the special properties of binary multiplication, the "multiply" in a binary convolution operation can be replaced by an "exclusive-NOR" operation; that is, a single logic element, an XNOR gate, can perform the "multiply" that in the prior art required a multiplier. It can be seen that binary convolution is simpler than conventional convolution: it requires no complex multiplication such as "2×4". In the "multiply" operation, if any of the operands is "0" the result is "0", and if all operands are "1" the result is "1".

The principle by which an XNOR gate element can replace a multiplier in the present invention is explained in detail below through a concrete example.
When binarized convolution is actually used, the non-binary values z in the image and the convolution kernel are first binarized, namely:

z^b = +1 if z ≥ 0, and z^b = -1 if z < 0,

where a value z greater than or equal to 0 is binarized to "1", representing the symbol "1" used in the convolution operation of Fig. 2, and a value z less than 0 is binarized to "-1", representing the symbol "0" used in Fig. 2.
An "exclusive-NOR" (XNOR) operation is then performed on the binarized values of the image and the convolution kernel. Encoding +1 as logic "1" and -1 as logic "0", the following cases exist:

1 XNOR 1 = 1, corresponding to (+1) × (+1) = +1;
1 XNOR 0 = 0, corresponding to (+1) × (-1) = -1;
0 XNOR 1 = 0, corresponding to (-1) × (+1) = -1;
0 XNOR 0 = 1, corresponding to (-1) × (-1) = +1.

It can be seen from this truth table that, when "multiplying" binarized values, the logic element XNOR gate, which performs the "exclusive-NOR" operation, can be used in place of a multiplier. As is well known in the art, a multiplier is far more complex than a single XNOR gate.

Therefore, the inventors concluded that replacing the multipliers of a conventional processor with XNOR logic gates can greatly reduce the complexity of the devices used in a binary convolutional neural network processor.
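A minimal sketch of the equivalence relied on here, assuming the usual encoding of +1 as logic "1" and -1 as logic "0": XNOR on the bit encoding agrees with multiplication on the ±1 values in all four cases.

```python
# Sketch: XNOR on the bit encoding (+1 -> 1, -1 -> 0) reproduces
# multiplication on {+1, -1} values in all four input combinations.

def xnor(a_bit, b_bit):
    return 1 - (a_bit ^ b_bit)   # exclusive-NOR of two single bits

def to_bit(v):                   # binarization: z >= 0 -> 1, z < 0 -> 0
    return 1 if v >= 0 else 0

def to_value(bit):               # inverse encoding: 1 -> +1, 0 -> -1
    return 1 if bit else -1

for a in (+1, -1):
    for b in (+1, -1):
        assert to_value(xnor(to_bit(a), to_bit(b))) == a * b
print("XNOR matches binary multiplication in all 4 cases")
```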
Furthermore, the inventors realized that, owing to the special properties of binary addition, the "add" in the above binary convolution operation can be replaced by an "OR" operation; that is, a logic element OR gate can replace the adder used in the prior art. This is because the result of the "OR" operation on the outputs of the above XNOR gates can be expressed as G = F1 + F2 + ... + Fn (where "+" denotes the logical OR), finally outputting a single-bit result G, where Fk denotes the output of the k-th XNOR gate and n denotes the total number of XNOR gates whose outputs are used as inputs of the OR gate.

Based on the inventors' above analysis, the present invention provides a binary convolution device that can be used in a binary convolutional neural network processor. It exploits the properties of binary multiplication and addition to simplify the structure of the hardware used in the processor to perform convolution operations, thereby increasing the speed of the convolution operation and reducing the overall energy consumption of the processor.

Fig. 3 shows the hardware structure of a binary convolution device according to an embodiment of the present invention. As shown in Fig. 3, the binary convolution device comprises 9 XNOR gates and 1 OR gate, and the outputs of all 9 XNOR gates are used as the inputs of the OR gate. During a convolution operation, each XNOR gate computes one of n1×w1, n2×w2, ..., n9×w9, producing the outputs F1-F9; the OR gate takes F1-F9 as its inputs and outputs the first element G1 of the convolution result. Similarly, computing with the same convolution kernel over the other regions of the image yields the other elements of the convolution result; this is not repeated here.
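The following is a minimal behavioral sketch of the Fig. 3 datapath, assuming the nine data bits n1-n9 and nine weight bits w1-w9 are already single-bit encoded; it is an illustration of the parallel structure, not the actual circuit.

```python
# Behavioral sketch of Fig. 3: nine XNOR gates in parallel, one OR gate.
# n_bits and w_bits are single-bit encodings (1 for +1, 0 for -1).

def xnor(a, b):
    return 1 - (a ^ b)

def binary_conv_parallel(n_bits, w_bits):
    assert len(n_bits) == len(w_bits) == 9
    F = [xnor(n, w) for n, w in zip(n_bits, w_bits)]  # F1..F9, in parallel
    G1 = 0
    for f in F:                                       # 9-input OR gate
        G1 |= f
    return G1                                         # single-bit result

# Example: one window of data against one kernel (placeholder bits).
print(binary_conv_parallel([1, 0, 1, 0, 1, 0, 0, 0, 1],
                           [1, 1, 1, 0, 1, 0, 0, 0, 1]))
```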
In the embodiment shown in Fig. 3, multiple XNOR gates perform the multiplications in parallel, which increases the rate of the convolution calculation. It should be understood, however, that the hardware structure of the binary convolution device can also be varied within the present invention, as illustrated below through several further embodiments.

Fig. 4 shows the hardware structure of a binary convolution device according to another embodiment of the present invention. As shown in Fig. 4, the binary convolution device comprises 1 XNOR gate, 1 OR gate, and a register; the register stores the output of the OR gate, its stored value is used as one input of the OR gate, and the other input of the OR gate is the output of the XNOR gate. During a convolution operation, as time advances, n1 and w1, n2 and w2, ..., n9 and w9 are applied to the XNOR gate at the first through ninth time steps respectively; at each time step the XNOR gate outputs one of F1, F2, ..., F9 as one input of the OR gate, while the result output by the OR gate at the previous time step, stored in the register, serves as the other input of the OR gate. For example, when the XNOR gate outputs F1 (equal to n1×w1), the pre-stored symbol "0" is read from the register and applied together with F1 as the inputs of the OR gate, which outputs F1; when the XNOR gate outputs F2 (equal to n2×w2), F1 is read from the register and applied together with F2 as the inputs of the OR gate, which outputs F1+F2; and so on until the accumulated result G1 over F1-F9 is output.
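For contrast with the parallel structure, here is a behavioral sketch of the serial datapath of Fig. 4, under the assumption that the register is initialized to the pre-stored "0": one XNOR gate is reused over nine time steps, and a one-bit register feeds the previous OR result back as the second OR input.

```python
# Behavioral sketch of Fig. 4: one XNOR gate and one 2-input OR gate
# reused over nine time steps, with a 1-bit register holding the
# running OR result (assumed initialized to 0 before the first step).

def xnor(a, b):
    return 1 - (a ^ b)

def binary_conv_serial(n_bits, w_bits):
    register = 0                       # pre-stored "0"
    for n, w in zip(n_bits, w_bits):   # time steps 1..9
        f = xnor(n, w)                 # F_k from the single XNOR gate
        register = register | f        # 2-input OR with fed-back value
    return register                    # G1 after the ninth step

print(binary_conv_serial([1, 0, 1, 0, 1, 0, 0, 0, 1],
                         [1, 1, 1, 0, 1, 0, 0, 0, 1]))
```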
In the embodiment shown in Fig. 4, increasing the reuse of the XNOR gate and the OR gate reduces the number of components used; moreover, this scheme uses an OR gate with only two inputs, whose hardware complexity is even lower.

Fig. 5 shows the hardware structure of a binary convolution device according to yet another embodiment of the present invention. Like the embodiment shown in Fig. 4, it uses only one XNOR gate, one OR gate, and one register; the difference is that in Fig. 5 the output of the XNOR gate is stored in a register that can hold multiple result bits simultaneously, and the individual results in the register are used as the inputs of the OR gate. This embodiment is used similarly to the embodiment of Fig. 4, in that the XNOR gate is reused; the difference is that in Fig. 5 the result output by the XNOR gate at each time step is stored in the register capable of holding multiple result bits simultaneously, and after all of F1-F9 have been obtained, the OR gate performs the "OR" operation to output G1.

In the embodiments provided in Figs. 3, 4, and 5 of the present invention, an OR gate is used to implement the "add" or "accumulate" function, and the inputs of the OR gate all come from the outputs of XNOR gates, so the final results output from the OR gate are all single-bit values; this simplifies the calculation process and increases the operation rate. The hardware structure provided by this scheme is especially suitable for dedicated processors for binary neural networks, because a binary neural network uses the values "1" and "-1" to represent its weights and data, its computation involves a large number of multiply and add operations, and reducing the bit width of the operands effectively reduces the computational complexity.

However, because the above schemes that use an OR gate to implement the "add" or "accumulate" function are all single-bit computations, they introduce a certain degree of error. The present invention therefore also provides an alternative scheme in which a Hamming weight calculation element replaces the OR gate shown in Figs. 3, 4, and 5 to implement the "add" or "accumulate" function. Figs. 6a-6c show hardware structures with a Hamming weight calculation element. In this alternative scheme, the Hamming weight calculation element takes the outputs of the XNOR gates as its input and outputs the number of logic "1"s in that data, i.e., the Hamming weight. This scheme is similar to the OR-gate scheme above and likewise simplifies the calculation process; in addition, it achieves an exact summation.
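The following sketch contrasts the two accumulation options for illustration: the OR reduction yields a single-bit (and therefore lossy) result, while the Hamming weight, i.e., the population count of the XNOR outputs, gives the exact number of matching positions.

```python
# Sketch contrasting the two accumulation elements: a single-bit OR
# (lossy) versus a Hamming-weight/popcount unit (exact summation).

def xnor_bits(n_bits, w_bits):
    return [1 - (n ^ w) for n, w in zip(n_bits, w_bits)]

def accumulate_or(xnor_out):          # OR gate: 1 iff any product is 1
    result = 0
    for f in xnor_out:
        result |= f
    return result

def accumulate_hamming(xnor_out):     # Hamming weight: count of logic 1s
    return sum(xnor_out)

F = xnor_bits([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])  # F = [1, 0, 1, 0, 1]
print(accumulate_or(F))       # 1  (some product was 1, detail lost)
print(accumulate_hamming(F))  # 3  (exactly three matching positions)
```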
The inventors found that, with the binary convolution device provided by the present invention, every "multiply" and every "accumulate" operates on single-bit data, and the data output by the device is likewise single-bit. This property is particularly well suited to a "layer-interleaved data mapping" for storing and scheduling the data involved in the convolution operation and the results it produces, thereby reducing the number of data loads and exploiting data locality to improve data reuse.

The "layer-interleaved data mapping" of the present invention means that the elements of the convolution kernel and of the data to be convolved are stored, following the channel direction, one after another in each row of the storage device; that is, data is stored in the storage device with the layers interleaved, so that two adjacent data elements come from different channels rather than the same channel. As shown in Fig. 7, in the present invention the kernel elements and data elements on the same z-axis correspond to the same "channel"; that is, elements with the same z value belong to the same channel.

To describe the data layout more concretely, Fig. 7 takes convolution kernel weight 0 and convolution kernel weight 1 of size (x,y,z) = 2*2*2, together with data to be convolved of size (x,y,z) = 2*3*2, as an example to explain in detail the layer-interleaved data mapping for binary convolutional neural networks provided by the present invention. Referring to Fig. 7, the elements of weight 0 and weight 1 are each divided into four groups according to their spatial positions: the four groups of weight 0 are Az, Bz, Cz, and Dz, with z being 0 or 1 as shown; the four groups of weight 1 are az, bz, cz, and dz, with z being 0 or 1 as shown.

Referring to Fig. 7, according to an embodiment of the present invention, convolution kernel weight 0, convolution kernel weight 1, and the elements of the data to be convolved can be stored in the following manner.

In Fig. 7, for convenience of illustration and according to the size and stride of each convolution kernel, the elements of the three-dimensional matrices of weight 0 and weight 1 are divided by channel into two two-dimensional matrices; for example, weight 0 is divided into a two-dimensional matrix composed of A0, B0, C0, D0 and a two-dimensional matrix composed of A1, B1, C1, D1. Similarly, the elements of the three-dimensional matrix of the data to be convolved are divided by channel into two two-dimensional matrices, namely one composed of X0, Y0, Z0, P0, Q0, R0 and one composed of X1, Y1, Z1, P1, Q1, R1.

When convolution kernel weight 0 is stored, the elements A0, A1, B0, B1, C0, C1, D0, and D1 of weight 0 are stored in sequence, 8 bits in total, in one row of consecutive storage units of the weight storage device. It can be seen that in the storage units any two adjacent elements come from different channels: for example, A0 and A1 come from different channels, as do A1 and B0. This is the layer-interleaved storage described above.

When convolution kernel weight 1 is stored, the elements a0, a1, b0, b1, c0, c1, d0, and d1 of weight 1 are stored in sequence, 8 bits in total, in another row of consecutive storage units of the weight storage device. As with weight 0, adjacent elements come from different channels.

In the weight storage device, weight elements at the same x and y positions (e.g., A0 and A1) are stored in sequence as adjacent elements; after the elements at one (x, y) position have been stored, the next group of weight elements with the same x and y (e.g., B0 and B1) is stored, and so on until the other weight elements of the convolution kernel have all been stored.

The data to be convolved can be stored according to the size of the convolution kernel and the order in which the data elements participate in the convolution operation. Referring to the convolution calculation rules shown in Fig. 2, the calculation first involves AzXz, BzYz, CzPz, and DzQz, and then AzYz, BzZz, CzQz, and DzRz. Therefore, when the elements of the data to be convolved are stored, in addition to the layer-interleaved layout, the rules of the convolution calculation should also be considered, so that the data elements participating in the calculation are stored in order; for example, Xz, Yz, Pz, Qz are stored in one row or column of consecutive storage units, and Yz, Zz, Qz, Rz are stored in another row or column of consecutive storage units.

Referring to Fig. 7, X0, X1, Y0, Y1, P0, P1, Q0, Q1 are stored in sequence in one column of consecutive storage units of the data storage device, and Y0, Y1, Z0, Z1, Q0, Q1, R0, R1 are stored in sequence in another column of consecutive storage units.

Similarly to the storage of the kernel elements, in the data storage device the data elements at the same x and y positions (e.g., X0 and X1) are grouped together and stored in sequence as adjacent elements; after the elements at one (x, y) position have been stored, the next group with the same x and y (e.g., Y0 and Y1) is stored, and so on until the other data elements within the sub-matrix of the data to be convolved that matches the kernel size (e.g., the one marked with dashed lines in Fig. 7) have all been stored.

Although in the example shown in Fig. 7 the convolution kernel and the data to be convolved both have 2 channels, it should be understood that in the present invention convolution kernels and data to be convolved with more than 2 channels can also be stored in the layer-interleaved manner.
Preferably, during storage, consecutive storage units of the storage device are filled in sequence; that is, storage follows the matrix arrangement order of the convolution kernel and the data to be convolved.

Preferably, elements at the same position but in different channels of the matrices of the convolution kernel and/or the data to be convolved are stored contiguously in consecutive storage units of the storage device.

Preferably, all elements under the same weight of the same convolution kernel, and/or all elements of the sub-matrix of the same data to be convolved used for a convolution operation, are stored in consecutive storage units of the storage device.
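To make the layout concrete, here is a small illustrative sketch that flattens a kernel of shape (x, y, z) = 2*2*2 into the channel-interleaved word A0, A1, B0, B1, C0, C1, D0, D1 described above; the position-major, channel-minor ordering is the essence of the layer-interleaved mapping, and the symbolic element names stand in for actual bit values.

```python
# Sketch of the layer-interleaved mapping of Fig. 7: elements are laid
# out position-major, channel-minor, so adjacent entries in a storage
# word come from different channels (A0, A1, B0, B1, C0, C1, D0, D1).

def interleave_layers(tensor):
    """tensor[ch][y][x] -> flat list ordered by (y, x), channel fastest."""
    channels = len(tensor)
    height, width = len(tensor[0]), len(tensor[0][0])
    word = []
    for y in range(height):
        for x in range(width):
            for ch in range(channels):   # channel varies fastest
                word.append(tensor[ch][y][x])
    return word

# weight 0 of Fig. 7: channel 0 holds A0, B0, C0, D0; channel 1 holds
# A1, B1, C1, D1 (symbolic names in place of actual bit values).
weight0 = [[["A0", "B0"], ["C0", "D0"]],
           [["A1", "B1"], ["C1", "D1"]]]
print(interleave_layers(weight0))
# ['A0', 'A1', 'B0', 'B1', 'C0', 'C1', 'D0', 'D1']  -- one 8-bit word
```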
In Fig. 7, for convenience of explanation, the weight storage device and the data storage device are shown as storage devices separate from each other. It should be understood, however, that in the present invention the weight storage device and the data storage device may be placed on different memories or in different regions of the same memory, for example stored together on the storage device for data to be calculated.

Moreover, those skilled in the art should understand that the storage described in the above embodiments can be completed offline, outside the processor, prior to the computation of the binary neural network, or completed online on the processor, for example in the processor's on-chip memory; it can also be stored in the form of a computer program and executed by the processor.

Storing each convolution kernel and each element of the data to be convolved with the above layer-interleaved data mapping of the present invention reduces the number of data loads and improves the data reuse rate.

It should also be understood that the purpose of using the above "layer-interleaved data mapping" to store the kernel elements and the corresponding elements of the data to be convolved is to facilitate reading, so that the inputs of the binary convolution device can be determined quickly and conveniently. Therefore, any scheme that establishes a mapping relationship between the storage locations of the kernel elements and the storage locations of the corresponding elements of the data to be convolved can be used to store them.

For example, when the length of a row of consecutive storage units is less than 8 bits, e.g., only 4 bits, A0, A1, B0, B1, C0, C1, D0, and D1 of weight 0 are stored in a folded manner: A0, A1, B0, B1 are stored in one row of consecutive storage units, and C0, C1, D0, D1 are stored in another row of consecutive storage units.

When convolution is performed with the kernel elements and the corresponding data elements stored in the above manner, it is advantageous to execute in a single-instruction multiple-data (SIMD) fashion, i.e., a single instruction loads multiple stored data items into the computing unit. The method of loading and computing the stored data is described in detail in the following embodiments. In this way the bit width of the computing unit, and hence its hardware overhead, can be reduced.
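A sketch of the SIMD idea under the bit-packed layout: once a storage word holds several interleaved single-bit elements, one word-wide XNOR followed by one Hamming weight (population count) processes all of them at once. The 8-bit word width and packing order used here are illustrative assumptions, not the exact layout of Fig. 9.

```python
# Sketch of word-level SIMD on bit-packed data: one XNOR over a whole
# 8-bit storage word plus one popcount replaces 8 separate
# multiply-accumulate steps. Word width and packing order are assumed.

WORD_BITS = 8

def pack(bits):                       # pack 8 single-bit elements
    word = 0
    for b in bits:
        word = (word << 1) | b
    return word

def simd_binary_dot(data_word, weight_word):
    xnor_word = ~(data_word ^ weight_word) & ((1 << WORD_BITS) - 1)
    return bin(xnor_word).count("1")  # Hamming weight of the whole word

data   = pack([1, 0, 1, 1, 0, 0, 1, 1])
weight = pack([1, 1, 1, 0, 0, 0, 1, 0])
print(simd_binary_dot(data, weight))  # matching positions, in one step
```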
Combining the binary convolution device described above with the above ways of storing and accessing the kernel elements and the elements of the data to be convolved makes it possible to provide a dedicated processor for binary convolutional neural networks with a small computing-unit bit width and a relatively simple hardware structure.

Referring to Fig. 8, according to an embodiment of the present invention, a binary convolutional neural network processor 10 is provided, comprising:

a data scheduling device 101, a storage device for data to be calculated 102, a binary convolution device 103, a pooling device 104, a normalization device 105, and a binarization device 106.
The storage device for data to be calculated 102 stores the convolution kernel elements in binary form and the data to be convolved in binary form. As described above, the storage scheme should be able to reflect the mapping relationship between the kernel elements used in the convolution calculation and the corresponding elements of the data to be convolved; for example, the kernel elements and the data to be convolved are stored in the layer-interleaved manner, and the data to be convolved is stored according to the kernel size and the order in which its elements participate in the convolution operation. For specific storage schemes, refer to the foregoing embodiments.

The data scheduling device 101 loads the kernel elements and the corresponding elements of the data to be convolved into the binary convolution device according to the mapping relationship. For example, a register is provided in the data scheduling device 101, and the kernel elements that need to be reused are loaded into the register during use.

The binary convolution device 103 performs binary convolution operations on the binary kernel elements and the corresponding elements of the binary data to be convolved. The binary convolution device 103 may adopt any of the structures of the foregoing embodiments, using XNOR gates to multiply the kernel elements by the corresponding elements of the data to be convolved, and using an OR gate or a Hamming weight calculation element to accumulate the products.

The pooling device 104 pools the results obtained by the convolution.

The normalization device 105 normalizes the pooled results to speed up the parameter training of the neural network.

In some embodiments of the present invention, the convolution kernel and/or the data to be convolved for the binary convolution operation can be obtained online from a data source. Since the obtained data is not necessarily binarized, in these embodiments a binarization device 106 may also be provided in the binary convolutional neural network processor 10 to convert the obtained data into binary form. Furthermore, the binary-converted data may be stored online by the storage device for data to be calculated 102.

It should be understood that, for embodiments in which the convolution kernel and/or the data to be convolved have already been stored offline in the storage device 102 before the convolutional neural network computation is performed, the binarization device 106 need not be provided in the binary convolutional neural network processor 10.
The computation process using the binary convolutional neural network processor 10 shown in Fig. 8 is described in detail below through specific embodiments with reference to Figs. 9 and 10.

Fig. 9 shows the computation process using the above binary convolutional neural network processor according to an embodiment of the present invention. Fig. 9 uses the same symbols as Fig. 7 for the kernel elements and the elements of the data to be convolved, e.g., X0, X1, A0, A1. In the weight storage matrix, one storage word of one row stores all the elements of one convolution kernel; as shown in the figure, the storage word is 8 bits wide and each element occupies 1 bit. Similarly, a storage word of the matrix of data to be convolved is also 8 bits wide. In addition, in Fig. 9 the bit widths of the XNOR gates and of the register bank are both 2 bits. During the computation, the principle that data belonging to the same convolution kernel is accumulated in the same accumulator is followed. The computation proceeds as follows:
Step 1: load the two most significant bits of the data to be convolved (i.e., X0 and X1) into the register bank;

Referring to the convolution diagram shown in Fig. 2, in Fig. 9 the elements X0 and X1 of the data to be convolved will be used repeatedly to compute A0X0, B0X0, A1X1, and B1X1 in the subsequent steps, so these 2 bits of the data to be convolved need to be stored in the register.

Step 2: load the data to be convolved from the register bank and the first two weight bits of the first row of the weight matrix (A0 and A1) into the XNOR gates;

Step 3: with the addition unit, perform an OR operation on, or compute the Hamming weight of, the results of the XNOR gates;

As described above, the OR operation or the Hamming weight computation achieves the effect of "addition"; in this step, A0X0 and A1X1 are computed.

Step 4: feed the result of the addition unit into accumulator 0;

Accumulator 0 accumulates the data belonging to the same convolution kernel.
Step 5: load the data to be convolved from the register bank and the first two weight bits of the second row of the weight matrix (a0 and a1) into the XNOR gates;

Step 6: the addition unit performs an OR operation on, or computes the Hamming weight of, the results of the XNOR gates, yielding a0X0 and a1X1.

Step 7: feed the result of the addition unit into accumulator 1; in the same way, X0 and X1 are computed in turn against the first two weight bits of the designated eight rows of the weight storage array;
Step 8: similarly to the preceding steps, load the third and fourth bits of the data to be convolved (Y0 and Y1) into the register bank;

Step 9: load the data to be convolved from the register bank and the third and fourth weight bits of the first row of the weight matrix (B0 and B1) into the XNOR gates;

Step 10: with the addition unit, perform an OR operation on, or compute the Hamming weight of, the results of the XNOR gates;

Step 11: feed the result of the addition unit into accumulator 0, the accumulator of the same kernel; thereafter, similarly to steps 5 to 7, the data located in the same columns, such as b0 and b1, are computed in turn against Y0 and Y1;
Step 12: when an accumulator has obtained the data of one output layer, load the accumulator result into the buffer unit;

Step 13: after the buffer unit has obtained the complete data of the output layer, load the output data into the pooling unit for the pooling operation;

Step 14: load the result of the pooling operation into the batch normalization unit for the batch normalization operation;

Step 15: load the batch-normalized result into the binarization unit for the binarization operation.
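The following compact sketch walks through the spirit of steps 1-11 under assumed widths and bit values: each 2-bit slice of the data to be convolved is loaded once into a register, reused against the matching 2-bit slice of every kernel row, and the Hamming weight of each XNOR result is added into the accumulator belonging to that kernel.

```python
# Compact sketch of steps 1-11: each 2-bit slice of the data to be
# convolved is loaded once into a register, reused against the matching
# 2-bit slice of every kernel row, and each partial sum is accumulated
# per kernel (accumulator 0 for kernel 0, accumulator 1 for kernel 1).
# Widths and bit values are illustrative assumptions.

def xnor2(a, b):                      # 2-bit-wide XNOR + Hamming weight
    return sum(1 - (x ^ y) for x, y in zip(a, b))

# each kernel row is a storage word, shown here as 2-bit slices:
# kernel 0 = A0 A1 B0 B1 ..., kernel 1 = a0 a1 b0 b1 ...
kernels = [[(1, 0), (1, 1)],          # kernel 0: (A0, A1), (B0, B1)
           [(0, 1), (1, 0)]]          # kernel 1: (a0, a1), (b0, b1)
data_slices = [(1, 1), (0, 1)]        # (X0, X1), then (Y0, Y1)

accumulators = [0] * len(kernels)
for step, data in enumerate(data_slices):
    register = data                   # steps 1 / 8: load slice once
    for k, kernel in enumerate(kernels):
        # steps 2-7 / 9-11: XNOR against this kernel's matching slice,
        # then accumulate into that kernel's own accumulator
        accumulators[k] += xnor2(register, kernel[step])

print(accumulators)                   # one partial result per kernel
```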
It can be seen that, with the mapping relationship described above between the storage locations of the kernel elements and the storage locations of the corresponding elements of the data to be convolved, the corresponding elements to be convolved can be determined quickly and fed into the XNOR gates.

When the bit width of the storage unit is smaller than the matrix bit width shown in Fig. 9, the kernel elements and the elements of the data to be convolved can also be stored by folding the matrix into blocks, as shown in Fig. 10. Fig. 10 likewise uses the same symbols as Fig. 7 for the kernel elements and the data elements; the difference from Fig. 9 is that, when data belonging to the same block of the data to be convolved needs to be read, the position at which that data is stored in the register bank must also be taken into account.

It can be seen from the embodiments of the present invention that, based on the properties of binarized operations, the present invention provides a simplified hardware structure for performing convolution operations, a binary convolutional neural network processor based on that structure, and corresponding calculation methods; by reducing the bit width of the data being computed during operation, it improves computational efficiency and reduces storage capacity and energy consumption.

Moreover, the present invention adopts the layer-interleaved data mapping for data storage and computation, which simplifies the retrieval of the data to be convolved and of the kernel data during the convolution calculation, reduces hardware overhead, and improves data utilization.

It should be noted that not all of the steps described in the above embodiments are necessary; those skilled in the art may make appropriate omissions, substitutions, modifications, and the like according to actual needs.

Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail above with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements of the technical solutions of the present invention do not depart from the spirit and scope of the technical solutions of the present invention, and all such modifications and replacements shall fall within the scope of the claims of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710316252.9A CN107153873B (en) | 2017-05-08 | 2017-05-08 | A kind of two-value convolutional neural networks processor and its application method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710316252.9A CN107153873B (en) | 2017-05-08 | 2017-05-08 | A kind of two-value convolutional neural networks processor and its application method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107153873A true CN107153873A (en) | 2017-09-12 |
CN107153873B CN107153873B (en) | 2018-06-01 |
Family
ID=59794343
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710316252.9A Active CN107153873B (en) | 2017-05-08 | 2017-05-08 | A kind of two-value convolutional neural networks processor and its application method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107153873B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105005774A (en) * | 2015-07-28 | 2015-10-28 | 中国科学院自动化研究所 | Face relative relation recognition method based on convolutional neural network and device thereof |
CN105354568A (en) * | 2015-08-24 | 2016-02-24 | 西安电子科技大学 | Convolutional neural network based vehicle logo identification method |
CN105975931A (en) * | 2016-05-04 | 2016-09-28 | 浙江大学 | Convolutional neural network face recognition method based on multi-scale pooling |
Cited By (45)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019055224A1 (en) * | 2017-09-14 | 2019-03-21 | Xilinx, Inc. | System and method for implementing neural networks in integrated circuits |
EP3682378A1 (en) * | 2017-09-14 | 2020-07-22 | Xilinx, Inc. | System and method for implementing neural networks in integrated circuits |
US10839286B2 (en) | 2017-09-14 | 2020-11-17 | Xilinx, Inc. | System and method for implementing neural networks in integrated circuits |
CN107657312A (en) * | 2017-09-18 | 2018-02-02 | 东南大学 | Towards the two-value real-time performance system of voice everyday words identification |
CN108205704B (en) * | 2017-09-27 | 2021-10-29 | 深圳市商汤科技有限公司 | Neural network chip |
CN108205704A (en) * | 2017-09-27 | 2018-06-26 | 深圳市商汤科技有限公司 | A kind of neural network chip |
CN109754061A (en) * | 2017-11-07 | 2019-05-14 | 上海寒武纪信息科技有限公司 | Execution method of convolution expansion instruction and related products |
CN109754061B (en) * | 2017-11-07 | 2023-11-24 | 上海寒武纪信息科技有限公司 | Execution method of convolution expansion instruction and related product |
US11531889B2 (en) | 2017-11-10 | 2022-12-20 | Institute Of Computing Technology, Chinese Academy Of Sciences | Weight data storage method and neural network processor based on the method |
CN107977704A (en) * | 2017-11-10 | 2018-05-01 | 中国科学院计算技术研究所 | Weighted data storage method and the neural network processor based on this method |
CN107977704B (en) * | 2017-11-10 | 2020-07-31 | 中国科学院计算技术研究所 | Weight data storage method and neural network processor based on the method |
CN107967132B (en) * | 2017-11-27 | 2020-07-31 | 中国科学院计算技术研究所 | An adder and multiplier for a neural network processor |
CN107967132A (en) * | 2017-11-27 | 2018-04-27 | 中国科学院计算技术研究所 | A kind of adder and multiplier for neural network processor |
US12056595B2 (en) | 2017-12-05 | 2024-08-06 | Samsung Electronics Co., Ltd. | Method and apparatus for processing convolution operation in neural network using sub-multipliers |
CN109871936B (en) * | 2017-12-05 | 2024-03-08 | 三星电子株式会社 | Method and apparatus for processing convolution operations in a neural network |
CN109871936A (en) * | 2017-12-05 | 2019-06-11 | 三星电子株式会社 | Method and apparatus for handling the convolution algorithm in neural network |
CN108108811A (en) * | 2017-12-18 | 2018-06-01 | 北京地平线信息技术有限公司 | Convolutional calculation method and electronic equipment in neutral net |
CN109978148B (en) * | 2017-12-28 | 2020-06-23 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related product |
CN109978148A (en) * | 2017-12-28 | 2019-07-05 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and Related product |
CN109993286A (en) * | 2017-12-29 | 2019-07-09 | 深圳云天励飞技术有限公司 | Computational method of sparse neural network and related products |
CN110110283A (en) * | 2018-02-01 | 2019-08-09 | 北京中科晶上科技股份有限公司 | A kind of convolutional calculation method |
CN108829610B (en) * | 2018-04-02 | 2020-08-04 | 浙江大华技术股份有限公司 | Memory management method and device in neural network forward computing process |
CN108647777A (en) * | 2018-05-08 | 2018-10-12 | 济南浪潮高新科技投资发展有限公司 | A kind of data mapped system and method for realizing that parallel-convolution calculates |
CN110147873B (en) * | 2018-05-18 | 2020-02-18 | 中科寒武纪科技股份有限公司 | Convolutional neural network processor and training method |
CN110147873A (en) * | 2018-05-18 | 2019-08-20 | 北京中科寒武纪科技有限公司 | The processor and training method of convolutional neural networks |
CN108681773A (en) * | 2018-05-23 | 2018-10-19 | 腾讯科技(深圳)有限公司 | Accelerated method, device, terminal and the readable storage medium storing program for executing of data operation |
US11599785B2 (en) | 2018-11-13 | 2023-03-07 | International Business Machines Corporation | Inference focus for offline training of SRAM inference engine in binary neural network |
US11797851B2 (en) | 2018-11-13 | 2023-10-24 | International Business Machines Corporation | Inference focus for offline training of SRAM inference engine in binary neural network |
CN110046705A (en) * | 2019-04-15 | 2019-07-23 | 北京异构智能科技有限公司 | Device for convolutional neural networks |
CN110033085A (en) * | 2019-04-15 | 2019-07-19 | 北京异构智能科技有限公司 | Tensor processor |
CN110046705B (en) * | 2019-04-15 | 2022-03-22 | 广州异构智能科技有限公司 | Apparatus for convolutional neural network |
CN110059805A (en) * | 2019-04-15 | 2019-07-26 | 北京异构智能科技有限公司 | Method for two value arrays tensor processor |
CN110033085B (en) * | 2019-04-15 | 2021-08-31 | 广州异构智能科技有限公司 | Tensor processor |
CN110033086A (en) * | 2019-04-15 | 2019-07-19 | 北京异构智能科技有限公司 | Hardware accelerator for neural network convolution algorithm |
CN110033086B (en) * | 2019-04-15 | 2022-03-22 | 广州异构智能科技有限公司 | Hardware accelerator for neural network convolution operations |
CN110263809B (en) * | 2019-05-16 | 2022-12-16 | 华南理工大学 | Pooling feature map processing method, target detection method, system, device and medium |
CN110263809A (en) * | 2019-05-16 | 2019-09-20 | 华南理工大学 | Pond characteristic pattern processing method, object detection method, system, device and medium |
CN111985602A (en) * | 2019-05-24 | 2020-11-24 | 华为技术有限公司 | Neural network computing device, method and computing device |
CN110265002B (en) * | 2019-06-04 | 2021-07-23 | 北京清微智能科技有限公司 | Speech recognition method, apparatus, computer equipment, and computer-readable storage medium |
CN110265002A (en) * | 2019-06-04 | 2019-09-20 | 北京清微智能科技有限公司 | Audio recognition method, device, computer equipment and computer readable storage medium |
CN111126579B (en) * | 2019-11-05 | 2023-06-27 | 复旦大学 | In-memory computing device suitable for binary convolutional neural network computation |
CN111126579A (en) * | 2019-11-05 | 2020-05-08 | 复旦大学 | An in-memory computing device suitable for binary convolutional neural network computing |
CN111340208B (en) * | 2020-03-04 | 2023-05-23 | 开放智能机器(上海)有限公司 | Vectorization calculation depth convolution calculation method and device |
CN111340208A (en) * | 2020-03-04 | 2020-06-26 | 开放智能机器(上海)有限公司 | Depth convolution calculation method and device for vectorization calculation |
CN112596912A (en) * | 2020-12-29 | 2021-04-02 | 清华大学 | Acceleration operation method and device for convolution calculation of binary or ternary neural network |
Also Published As
Publication number | Publication date |
---|---|
CN107153873B (en) | 2018-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107153873B (en) | A kind of two-value convolutional neural networks processor and its application method | |
CN107203808B (en) | A kind of two-value Convole Unit and corresponding two-value convolutional neural networks processor | |
CN108427990B (en) | Neural network computing system and method | |
CN110263925B (en) | A hardware acceleration implementation device for forward prediction of convolutional neural network based on FPGA | |
Shafiee et al. | ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars | |
CN107844826B (en) | Neural network processing unit and processing system comprising same | |
CN109543140B (en) | A Convolutional Neural Network Accelerator | |
EP3407266B1 (en) | Artificial neural network calculating device and method for sparse connection | |
US11880768B2 (en) | Method and apparatus with bit-serial data processing of a neural network | |
US9886377B2 (en) | Pipelined convolutional operations for processing clusters | |
CN107169563B (en) | Processing system and method applied to two-value weight convolutional network | |
CN108665063B (en) | Bidirectional parallel processing convolution acceleration system for BNN hardware accelerator | |
WO2022037257A1 (en) | Convolution calculation engine, artificial intelligence chip, and data processing method | |
CN111831254A (en) | Image processing acceleration method, image processing model storage method and corresponding device | |
CN106970896A (en) | The vectorization implementation method of the two-dimensional matrix convolution of vector processor-oriented | |
CN107145939A (en) | A neural network optimization method and device | |
CN107256424B (en) | Three-value weight convolution network processing system and method | |
JP2021510219A (en) | Multicast Network On-Chip Convolutional Neural Network Hardware Accelerator and Its Behavior | |
CN107423816A (en) | A kind of more computational accuracy Processing with Neural Network method and systems | |
US20230297819A1 (en) | Processor array for processing sparse binary neural networks | |
JP2018116469A (en) | Arithmetic system and arithmetic method for neural network | |
CN110766127B (en) | Neural network computing special circuit and related computing platform and implementation method thereof | |
KR102038390B1 (en) | Artificial neural network module and scheduling method thereof for highly effective parallel processing | |
Sommer et al. | Efficient hardware acceleration of sparsely active convolutional spiking neural networks | |
CN111105023A (en) | Data stream reconstruction method and reconfigurable data stream processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||