CN111710356A - Encoding type flash memory device and encoding method - Google Patents
Encoding type flash memory device and encoding method
- Publication number
- CN111710356A CN111710356A CN202010472550.9A CN202010472550A CN111710356A CN 111710356 A CN111710356 A CN 111710356A CN 202010472550 A CN202010472550 A CN 202010472550A CN 111710356 A CN111710356 A CN 111710356A
- Authority
- CN
- China
- Prior art keywords
- flash memory
- encoding
- source line
- memory array
- memory device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C16/00—Erasable programmable read-only memories
- G11C16/02—Erasable programmable read-only memories electrically programmable
- G11C16/04—Erasable programmable read-only memories electrically programmable using variable threshold transistors, e.g. FAMOS
- G11C16/0483—Erasable programmable read-only memories electrically programmable using variable threshold transistors, e.g. FAMOS comprising cells having several storage transistors connected in series
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
- G06F7/501—Half or full adders, i.e. basic adder cells for one denomination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C16/00—Erasable programmable read-only memories
- G11C16/02—Erasable programmable read-only memories electrically programmable
- G11C16/06—Auxiliary circuits, e.g. for writing into memory
- G11C16/08—Address circuits; Decoders; Word-line control circuits
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C16/00—Erasable programmable read-only memories
- G11C16/02—Erasable programmable read-only memories electrically programmable
- G11C16/06—Auxiliary circuits, e.g. for writing into memory
- G11C16/24—Bit-line control circuits
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11C—STATIC STORES
- G11C16/00—Erasable programmable read-only memories
- G11C16/02—Erasable programmable read-only memories electrically programmable
- G11C16/06—Auxiliary circuits, e.g. for writing into memory
- G11C16/26—Sensing or reading circuits; Data output circuits
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Pure & Applied Mathematics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Neurology (AREA)
- Mathematical Optimization (AREA)
- Read Only Memory (AREA)
- Semiconductor Memories (AREA)
Abstract
The invention discloses an encoding type flash memory device and an encoding method. The encoding type flash memory device comprises at least one flash memory array structural unit, a plurality of comparators and a plurality of adders. Each flash memory array structural unit is a 3D NAND FLASH array structural unit used to implement an encoding operation that generates a source line voltage on each of the plurality of source lines of the flash memory array structural unit. Each of the plurality of comparators is connected to a corresponding source line and converts the source line voltage of that source line into a binary output. Each of the plurality of adders is connected, through the corresponding source lines, to at least two of the plurality of comparators and sums the at least two corresponding outputs. The encoding type flash memory device and encoding method of the present invention can realize efficient and accurate fully-connected layer or convolutional layer operations, thereby implementing a deep neural network.
Description
Technical Field
The present invention relates to the technical field of semiconductor devices and integrated circuits, and in particular to an encoding type flash memory device and an encoding method for implementing a deep neural network.
Background Art
Deep neural networks are currently widely used in fields such as image processing and speech recognition and have shown excellent performance. Most deep neural networks in the prior art rely on the traditional von Neumann architecture for computation, but because the von Neumann computing architecture separates storage units from computing units, transferring data over the bus increases latency and energy consumption, creating a bottleneck for improving the data processing capability and energy efficiency of deep neural networks. To this end, the prior art has further proposed an encoding type flash memory device that performs analog computation based on the 3D NAND FLASH structure. In implementing a deep neural network, however, this encoding type flash memory device still has the following three main problems:
First, when the threshold voltage fluctuation between different devices is high, the computational accuracy decreases;
Second, the results of the analog computation must pass through complex analog-to-digital conversion circuits, which to some extent reduces the energy efficiency gained from in-memory computation in 3D NAND FLASH;
Third, limited by the operating mode of 3D NAND FLASH, in which only the data on a single word line WL can be operated on at any one time, the efficiency of the overall computing architecture is not maximized and its parallelism is not saturated.
Summary of the Invention
(1) Technical Problems to Be Solved
To solve the technical problems in the prior art that encoding type flash memory devices performing analog computation based on the 3D NAND FLASH structure suffer from low computational accuracy, reduced energy efficiency, unmaximized computing architecture efficiency and unsaturated parallelism, the present invention provides an encoding type flash memory device and an encoding method.
(2) Technical Solutions
One aspect of the present invention discloses an encoding type flash memory device, comprising: at least one flash memory array structural unit, a plurality of comparators and a plurality of adders. Each flash memory array structural unit is a 3D NAND FLASH array structural unit used to implement an encoding operation that generates a source line voltage on each of the plurality of source lines of the flash memory array structural unit. Each of the plurality of comparators is connected to a corresponding source line and converts the source line voltage of that source line into a binary output. Each of the plurality of adders is connected, through the corresponding source lines, to at least two of the plurality of comparators and sums the at least two corresponding outputs, so as to implement a deep neural network.
According to an embodiment of the present invention, each flash memory array structural unit comprises a plurality of operation units and a plurality of redundant units. Each operation unit is the transistor at the intersection of a word line and a source line in the flash memory array structural unit, with one operation unit assigned to each source line. The redundant units are the transistors of the non-operation units in each flash memory array structural unit and are kept in the on state while the encoding operation is performed.
According to an embodiment of the present invention, each flash memory array structural unit further comprises a string select line and a ground select line. The string select line is connected to the bit line end of the flash memory array structural unit, and the ground select line is connected to the source line end of the flash memory array structural unit. A high level is applied to the string select line and the ground select line while the encoding operation is performed.
According to an embodiment of the present invention, the encoding operation is a fully-connected operation or a convolutional layer operation, and the number of adders equals the number of summation results of the fully-connected or convolutional layer operation.
Another aspect of the present invention discloses an encoding method implemented on the above encoding type flash memory device. The encoding method comprises: performing an encoding operation based on at least one flash memory array structural unit to generate a source line voltage on each of the plurality of source lines of each flash memory array structural unit; converting, by the plurality of comparators, the source line voltage of each correspondingly connected source line into a binary output; and summing, by the plurality of adders, the at least two outputs of at least two of the plurality of comparators, so as to implement a deep neural network.
According to an embodiment of the present invention, the encoding operation is a fully-connected operation or a convolutional layer operation.
According to an embodiment of the present invention, when the encoding operation is a fully-connected operation, before the encoding operation of the at least one flash memory array structural unit generates the source line voltage on each source line, the encoding method further comprises: pre-storing each non-zero-bit element of the weight matrix of the fully-connected layer into the corresponding operation unit among the plurality of operation units of the encoding type flash memory device.
According to an embodiment of the present invention, when the encoding operation is a convolutional layer operation, before the encoding operation of the at least one flash memory array structural unit generates the source line voltage on each source line, the encoding method further comprises: pre-storing each convolution kernel element of the convolution matrix of the convolutional layer into the corresponding operation unit among the plurality of operation units of the encoding type flash memory device.
According to an embodiment of the present invention, after the non-zero-bit elements of the weight matrix of the fully-connected layer, or the convolution kernel elements of the convolution matrix of the convolutional layer, have been pre-stored into the corresponding operation units, the encoding method further comprises: applying the input elements of the corresponding input vector to the corresponding word lines among the plurality of word lines to generate the word line voltage on each word line, such that the plurality of redundant units of the encoding type flash memory device are in the on state.
According to an embodiment of the present invention, after the input elements of the corresponding input vector have been applied to the word lines so that the plurality of redundant units are in the on state, the encoding method further comprises: applying a high level to the string select line and the ground select line of the encoding type flash memory device to generate the source line voltage on each of the plurality of source lines of each flash memory array structural unit.
According to an embodiment of the present invention, when the encoding operation is a fully-connected operation, generating the source line voltage on each source line based on the at least one flash memory array structural unit comprises: performing the encoding operation on the corresponding input vector and weight matrix simultaneously across a plurality of flash memory array structural units, or performing the encoding operation on the corresponding input vector and weight matrix on a single flash memory array structural unit in a time-division multiplexed manner; wherein the input vector is multi-bit data.
According to an embodiment of the present invention, when the encoding operation is a convolutional layer operation, the encoding method further comprises: shifting and summing the summation results output by the plurality of adders to implement the deep neural network; wherein the convolution matrix of the convolutional layer is multi-bit data.
(3) Beneficial Effects
The present invention discloses an encoding type flash memory device and an encoding method, wherein the encoding type flash memory device comprises at least one flash memory array structural unit, a plurality of comparators and a plurality of adders. Each flash memory array structural unit is a 3D NAND FLASH array structural unit used to implement an encoding operation that generates a source line voltage on each of the plurality of source lines of the flash memory array structural unit. Each comparator is connected to a corresponding source line and converts the source line voltage of that source line into a binary output, and each adder is connected, through the corresponding source lines, to at least two of the comparators and sums the at least two corresponding outputs. The encoding type flash memory device and encoding method of the present invention can realize efficient and accurate fully-connected layer or convolutional layer operations, thereby implementing a deep neural network.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the composition of an encoding type flash memory device corresponding to a single flash memory array structural unit according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of an encoding method for an encoding type flash memory device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the composition of an encoding type flash memory device applied to a fully-connected operation according to a further embodiment of the present invention;
FIG. 4 is a partial schematic flowchart of an encoding method for an encoding type flash memory device applied to a fully-connected operation according to a further embodiment of the present invention;
FIG. 5 is a schematic diagram of the composition of an encoding type flash memory device applied to a convolutional layer operation according to another embodiment of the present invention;
FIG. 6 is a partial schematic flowchart of an encoding method for an encoding type flash memory device applied to a convolutional layer operation according to another embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
To solve the technical problems in the prior art that encoding type flash memory devices performing analog computation based on the 3D NAND FLASH structure suffer from low computational accuracy, reduced energy efficiency, unmaximized computing architecture efficiency and unsaturated parallelism, the present invention provides an encoding type flash memory device and an encoding method.
The encoding type flash memory device and encoding method of the present invention realize the encoding operation of parallel in-memory computing on the 3D NAND FLASH structure based on the following principle: a source line SL can read out a source line current only when all of the memory cells (i.e., transistors) connected in series on that source line are in the on state. Based on this principle, and in contrast to the conventional encoding method that realizes 3D NAND FLASH in-memory computing by operating the word lines WL serially, the technical concept of the encoding method of the encoding type flash memory device of the present invention, as shown in FIG. 1, is as follows:
On each source line SL of the 3D NAND FLASH structure, exactly one memory cell is pre-designated to perform the data operation (i.e., the operation unit);
The value stored in the operation unit is 0 or 1, corresponding respectively to a threshold voltage of Vth = Vth_Low (the operation unit stores 0) or Vth = Vth_High (the operation unit stores 1);
The remaining memory cells (non-operation units) of the 3D NAND FLASH structure are set as redundant units; the value stored in every redundant unit is 0, so the threshold voltage of each redundant unit is Vth = Vth_Low;
When the value 0 or 1 is input to the operation unit, the word line voltage applied to the word line WL on which the operation unit is located is Vg = Vg_High or Vg = Vg_Low, respectively;
It can be seen that, regardless of whether the value input to the operation unit is 0 or 1, all of the redundant units are in the on state. Therefore, the source line current of the source line SL yields the following operation results:
When the source line current read out on the source line SL is 0, the operation unit is in the off state, i.e., Vth = Vth_High (1) and Vg = Vg_Low (1), and the operation result is 1;
When the source line current read out on the source line SL is 1, the operation unit is in the on state, i.e., one of the following three cases holds: Vth = Vth_High (1) with Vg = Vg_High (0); Vth = Vth_Low (0) with Vg = Vg_High (0); or Vth = Vth_Low (0) with Vg = Vg_Low (1). In these cases the operation result is 0.
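Viewed as Boolean logic, the four cases above make the single-cell operation an AND of the stored bit and the input bit: the source line carries no current exactly when both bits are 1. The following is a minimal behavioral sketch in Python; the voltage values are illustrative assumptions, not values specified by the present text, and the function names are inventions of the sketch.

    # Illustrative voltage levels (volts); actual values are device-dependent
    # and are not specified in this text.
    V_TH_LOW, V_TH_HIGH = 1.0, 4.0   # cell threshold: stored 0 / stored 1
    V_G_HIGH, V_G_LOW = 5.0, 2.0     # word line drive: input 0 / input 1

    def cell_result(stored_bit: int, input_bit: int) -> int:
        """Return 1 if the operation unit blocks its string (no SL current)."""
        v_th = V_TH_HIGH if stored_bit else V_TH_LOW  # programmed threshold
        v_g = V_G_LOW if input_bit else V_G_HIGH      # note the inverted mapping
        conducts = v_g > v_th                         # transistor is on if Vg > Vth
        return 0 if conducts else 1

    # Exhaustive check against the four cases enumerated above:
    for w in (0, 1):
        for x in (0, 1):
            assert cell_result(w, x) == (w & x)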
Therefore, by adopting the encoding type flash memory device of the present invention, an encoding method for parallel in-memory computing can be realized within the 3D NAND FLASH structure. Specifically, through an adder-assisted deep neural network based on 3D NAND FLASH, the encoding type flash memory device of the present invention can use computing-in-memory technology to complete the functions of each computing layer in parallel and obtain accurate operation results.
One aspect of the present invention discloses an encoding type flash memory device, as shown in FIG. 1, FIG. 3 and FIG. 5, comprising at least one flash memory array structural unit, a plurality of comparators and a plurality of adders, and used to implement the above-mentioned deep neural network based on 3D NAND FLASH.
The encoding type flash memory device comprises at least one flash memory array structural unit 100, for example the flash memory array structural units 100-1 to 100-N connected in parallel with one another as shown in FIG. 3 and FIG. 4, N flash memory array structural units in total, N ≥ 1. Each of the N flash memory array structural units 100 is a 3D NAND FLASH array structural unit that comprises a plurality of memory cells, a plurality of bit lines BL, a plurality of word lines WL and a plurality of source lines SL.
As shown in FIG. 1, the number of memory cells may be M×M, where M memory cells are connected in series according to a given arrangement rule to form each of M memory cell strings. One end of each memory cell string is connected to a bit line BL and the other end to a source line SL, and each word line WL runs perpendicular to the memory cell strings, connecting the corresponding memory cells across the strings. Accordingly, there are M bit lines BL and M source lines SL, and the number of word lines WL, corresponding to the number of memory cells connected in series on each string, is also M. Each flash memory array structural unit 100 is used to implement the encoding operation so as to generate the source line voltage on each of its plurality of source lines.
Specifically, the pre-designated operation unit on each source line SL stores the value 0 or 1, and the other redundant units store the value 0, so the redundant units are in the on state. The word line voltage applied to the word line WL on which the operation unit is located is Vg = Vg_High or Vg = Vg_Low, and the source line voltage corresponding to the source line SL of the operation unit can then be generated.
Each of the plurality of comparators is connected to a corresponding source line and converts the source line voltage of that source line into a binary output. The comparator of the present invention may be a standard device in the integrated circuit field: a circuit that compares an analog voltage signal with a reference voltage, takes two analog inputs, outputs the binary value 0 or 1, and keeps its output constant when the input voltage fluctuates. In another embodiment of the present invention, the comparator may be replaced by a sampling circuit.
Each of the plurality of adders is connected, through the corresponding source lines, to at least two of the plurality of comparators, and sums the at least two corresponding outputs so as to implement a deep neural network. An adder is a logic device in the field of computer technology used to perform digital addition. The deep neural network involves the processing of data of multiple convolutional layers or fully-connected layers.
Therefore, by outputting binary results through the comparators, the encoding type flash memory device of the present invention can not only maintain high computational accuracy when the threshold voltage fluctuation between different devices is high, but also avoid the numerical complexity of conventional outputs and the complex analog-to-digital conversion circuits required by conventional analog computation, improving the energy efficiency of in-memory computation in the 3D NAND FLASH structure. Finally, with the assistance of the adders, the efficiency and parallelism of the overall computing architecture are maximized, and parallel encoding operations can be completed within one clock cycle.
According to an embodiment of the present invention, as shown in FIG. 1, FIG. 3 and FIG. 5, each flash memory array structural unit comprises a plurality of operation units C and a plurality of redundant units. Each operation unit C is the transistor at the intersection of a word line and a source line in the flash memory array structural unit, with one operation unit C assigned to each source line. The operation unit C is a single memory cell pre-designated within the memory cell string on each source line SL of the 3D NAND FLASH structure, and only this memory cell in the string of that source line SL performs the data operation. The value stored in the operation unit C is 0 or 1, corresponding respectively to a threshold voltage of Vth = Vth_Low (the operation unit C stores 0) or Vth = Vth_High (the operation unit C stores 1). The operation units are used for fully-connected layer or convolutional layer operations: in a fully-connected layer operation, each non-zero-bit element of the weight matrix of the fully-connected layer is stored into the corresponding operation unit C; in a convolutional layer operation, each convolution kernel element of the convolution matrix of the convolutional layer is stored into the corresponding operation unit C.
The plurality of redundant units are the transistors of the non-operation units in each flash memory array structural unit 100 and are kept in the on state while the encoding operation is performed. A redundant unit is a memory cell in the string on a source line other than the operation unit C, i.e., a transistor not located at the designated word line/source line intersection. When the encoding type flash memory device of the present invention performs the encoding operation, the redundant units are in the on state regardless of whether the value input to the operation unit C is 0 or 1.
According to an embodiment of the present invention, as shown in FIG. 1, FIG. 3 and FIG. 5, each flash memory array structural unit 100 further comprises a string select line SSL (string select line) and a ground select line GSL (ground select line). The string select line SSL is connected to the bit line end of the flash memory array structural unit 100. Specifically, the string select line SSL is provided with a plurality of select transistors; one end of each select transistor is connected to a bit line BL and the other end to the bit-line end of a memory cell string, and the string select line SSL connects the select transistors in series along the direction perpendicular to the source lines SL.
The ground select line GSL is connected to the source line end of the flash memory array structural unit 100. Specifically, the ground select line GSL is provided with a plurality of select transistors; one end of each select transistor is connected to a source line SL and the other end to the source end of a memory cell string, and the ground select line GSL connects the select transistors in series along the direction perpendicular to the source lines SL.
A high level is applied to the string select line SSL and the ground select line GSL while the encoding operation is performed.
According to an embodiment of the present invention, the encoding operation is a fully-connected operation or a convolutional layer operation, and the number of adders equals the number of summation results of the fully-connected or convolutional layer operation. Specifically, the number l of adders on each flash memory array structural unit 100 equals the number of summation results of the fully-connected or convolutional layer operation performed on that unit, so the total number L of adders in the encoding type flash memory device is L = l × N, where N is the number of flash memory array structural units 100 connected in parallel in the encoding type flash memory device.
For the fully-connected layer operation, as shown in FIG. 3, the input vector X (1×M) is multiplied by the weight matrix K (M×N) to obtain the output vector Y (1×N), which is expressed mathematically as formula (1):
Yi = X1·K1,i + X2·K2,i + … + XM·KM,i    (1)
where 1 ≤ i ≤ N.
Therefore, the fully-connected operation can have N output results Y, so the number of adders required for the fully-connected operation equals N.
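As a worked instance of formula (1), with illustrative numbers not taken from the embodiments: for M = 3, input X = (1, 0, 1) and weight column (K1,i, K2,i, K3,i) = (1, 1, 0), the output is Yi = 1·1 + 0·1 + 1·0 = 1. Each single-bit product Xj·Kj,i is exactly the single-cell AND operation described above, and the adder supplies the sum.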
For the convolutional layer operation, as shown in FIG. 5, which amounts to fully-connected operations over multiple local regions, each part of the input matrix X (M×N) undergoes a matrix-vector multiplication with the convolution matrix K (k×k); the convolution operation is completed and the output matrix Y (m×n) is obtained, which is expressed mathematically as formula (2):
Yi,j = Xi,j·Kk,k + Xi,j+1·Kk,k-1 + Xi+1,j·Kk-1,k + … + Xi+k-1,j+k-1·K1,1    (2)
where 1 ≤ i ≤ M-k+1 (= m) and 1 ≤ j ≤ N-k+1 (= n).
Therefore, the convolutional layer operation can have (M-k+1)×(N-k+1) output results Y, which together form a complete output matrix, so the number of adders required for the convolutional layer operation equals (M-k+1)×(N-k+1).
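As a worked instance of these dimensions: a 3×3 input (M = N = 3) convolved with a 2×2 kernel (k = 2) gives m = n = 3 - 2 + 1 = 2, i.e., a 2×2 output matrix and hence 4 adders, which is exactly the configuration of the FIG. 5 example discussed below.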
Whether a convolutional layer operation or a fully-connected operation, both amount to a vector-matrix multiplication; compared with the fully-connected layer, the convolutional layer adds the shift operation of the convolution kernel. When the two types of vector-matrix multiplication are implemented with the flash memory array structural unit 100 of the 3D NAND FLASH structure, the main difference is that for the convolutional layer the input is updated after each cycle to the next local region of the input image to be convolved.
The embodiments of the encoding type flash memory device of the present invention have been described in detail above with reference to FIG. 1, FIG. 3 and FIG. 5.
The process of applying 3D NAND FLASH to vector-matrix multiplication to realize parallel data processing is now described in detail.
Another aspect of the present invention discloses an encoding method, as shown in FIG. 2, implemented on the above encoding type flash memory device, wherein the encoding method comprises:
Step S410: performing an encoding operation based on at least one flash memory array structural unit to generate the source line voltage on each of the plurality of source lines of each flash memory array structural unit. Specifically, the pre-designated operation unit on each source line SL stores the value 0 or 1 and the other redundant units store the value 0, so the redundant units are in the on state; the word line voltage applied to the word line WL on which the operation unit is located is Vg = Vg_High or Vg = Vg_Low, and the source line voltage corresponding to the source line SL of the operation unit can then be generated.
Step S420: converting, by the plurality of comparators, the source line voltage of each source line corresponding to each comparator into a binary output. The comparator of the present invention takes two analog inputs, outputs the binary value 0 or 1, and keeps its output constant when the input voltage fluctuates.
Step S430: summing, by the plurality of adders, the at least two outputs of at least two of the plurality of comparators, so as to implement a deep neural network. An adder is a logic device in the field of computer technology used to perform digital addition. The deep neural network involves the processing of data of multiple convolutional layers or fully-connected layers.
The encoding method based on the encoding type flash memory device of the present invention realizes parallel in-memory computing within the 3D NAND FLASH structure. Specifically, through the adder-assisted deep neural network based on 3D NAND FLASH, the encoding type flash memory device of the present invention can use computing-in-memory technology to complete the functions of each computing layer in parallel and obtain accurate operation results.
Therefore, by outputting binary results through the comparators, the encoding method of the present invention can not only maintain high computational accuracy when the threshold voltage fluctuation between different devices is high, but also avoid the numerical complexity of conventional outputs and the complex analog-to-digital conversion circuits required by conventional analog computation, improving the energy efficiency of in-memory computation in the 3D NAND FLASH structure. Finally, with the assistance of the adders, the efficiency and parallelism of the overall computing architecture are maximized, and parallel encoding operations can be completed within one clock cycle.
The encoding method of the encoding type flash memory device of the present invention having a single flash memory array structural unit is further explained below.
According to an embodiment of the present invention, as shown in FIG. 1, for the above encoding type flash memory device, before the encoding operation of a given flash memory array structural unit 100 begins, the non-zero-bit elements among the elements (K1 to KM) of the weight matrix K (M×N) are stored into the corresponding operation units C on the M source lines SL of the 3D NAND FLASH structural unit: the non-zero-bit element K1 is stored at the intersection of word line WL1 and source line SL1, the non-zero-bit element K2 at the intersection of word line WL2 and source line SL2, and so on, with the non-zero-bit element KM stored at the intersection of word line WLM and source line SLM.
As described above, the value 0 or 1 stored in an operation unit C corresponds to an operation unit threshold voltage of Vth = Vth_Low or Vth_High. Only one device on each source line SL is an operation unit C; the remaining units are set as redundant units, each storing the value 0, i.e., with threshold voltage Vth = Vth_Low, so that the redundant units remain in the on state.
The elements (X1 to XM) of the input vector (1×M) are converted into corresponding voltage values and applied to word lines WL1 to WLM; each voltage value corresponds to an input of 0 or 1 on the word line WL, so that the word line voltage applied to the word line WL is Vg = Vg_High or Vg_Low. This guarantees that the input data on the word lines WL of the redundant units does not affect the state of the redundant units, i.e., the redundant units stay in the on state.
Finally, a high level is applied simultaneously to the string select line SSL and the ground select line GSL of the flash memory array structural unit 100; the output of each source line SL then depends only on the state of the corresponding operation unit C. Specifically: when the weight stored in the operation unit C and the input data applied through the word line WL are both 1, i.e., the threshold voltage of the operation unit C is Vth = Vth_High and the word line voltage of its word line WL is Vg = Vg_Low, the operation unit C is in the off state, no source line current flows on the corresponding source line SL, and the operation result is 1. In the remaining cases, i.e., Vth = Vth_High (1) with Vg = Vg_High (0); Vth = Vth_Low (0) with Vg = Vg_High (0); or Vth = Vth_Low (0) with Vg = Vg_Low (1), the operation unit is in the on state, source line current flows on the corresponding source line SL, and the operation result is 0.
Through the comparator 200 connected to the end of each source line SL, the presence or absence of source line current is read out and converted into the calculation result 0 or 1, i.e., the binary output of the comparator 200. The outputs obtained from the operation units C of the different source lines SL are summed by the adder, yielding the summation result of the input data and the weight data; the deep neural network is thereby implemented.
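The single-array flow just described (weights programmed into the operation units, inputs driven on the word lines, comparators reading out the source line currents, one adder summing a group of results) can be modeled behaviorally as in the following Python sketch. The function names are inventions of the sketch, and the current values are abstracted to 0/1 as in the text above.

    def comparator(sl_current: int) -> int:
        """Convert presence/absence of SL current into the binary result: no current -> 1."""
        return 0 if sl_current else 1

    def fc_group(weights: list[int], inputs: list[int]) -> int:
        """One output of the fully-connected layer: M source lines, one operation unit per SL.

        weights[j] is the bit stored in the operation unit at (WLj, SLj);
        inputs[j] is the bit driven on word line WLj. Redundant units store 0
        and always conduct, so each SL current depends only on its operation unit.
        """
        results = []
        for w, x in zip(weights, inputs):
            # The operation unit blocks its string only when w == 1 and x == 1.
            sl_current = 0 if (w == 1 and x == 1) else 1
            results.append(comparator(sl_current))
        return sum(results)  # the adder

    # Example: weight column K = (1, 1, 0), input X = (1, 0, 1)
    # -> Y = 1*1 + 1*0 + 0*1 = 1
    assert fc_group([1, 1, 0], [1, 0, 1]) == 1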
The encoding method of the encoding type flash memory device of the present invention having a plurality of flash memory array structural units is further explained below.
According to an embodiment of the present invention, the encoding operation is a fully-connected operation or a convolutional layer operation.
For the fully-connected layer operation, as shown in FIG. 3, the input vector X (1×M) is multiplied by the weight matrix K (M×N) to obtain the output vector Y (1×N), which is expressed mathematically as formula (1):
Yi = X1·K1,i + X2·K2,i + … + XM·KM,i    (1)
where 1 ≤ i ≤ N.
For the convolutional layer operation, as shown in FIG. 5, which amounts to fully-connected operations over multiple local regions, each part of the input matrix X (M×N) undergoes a matrix-vector multiplication with the convolution matrix K (k×k); the convolution operation is completed and the output matrix Y (m×n) is obtained, which is expressed mathematically as formula (2):
Yi,j = Xi,j·Kk,k + Xi,j+1·Kk,k-1 + Xi+1,j·Kk-1,k + … + Xi+k-1,j+k-1·K1,1    (2)
where 1 ≤ i ≤ M-k+1 (= m) and 1 ≤ j ≤ N-k+1 (= n).
According to a further embodiment of the present invention, as shown in FIG. 3 and FIG. 4, the encoding method of this embodiment is implemented on the encoding type flash memory device shown in FIG. 3. When the encoding operation is a fully-connected operation, before step S410 the encoding method further comprises:
Step S510: pre-storing each non-zero-bit element of the weight matrix of the fully-connected layer into the corresponding operation unit C among the plurality of operation units C of the encoding type flash memory device.
According to a further embodiment of the present invention, after step S510 the encoding method further comprises:
Step S520: applying the input elements of the corresponding input vector to the corresponding word lines among the plurality of word lines to generate the word line voltage on each word line, such that the plurality of redundant units of the encoding type flash memory device are in the on state.
According to a further embodiment of the present invention, after step S520 the encoding method further comprises:
Step S530: applying a high level to the string select line and the ground select line of the encoding type flash memory device to generate the source line voltage on each of the plurality of source lines of each flash memory array structural unit.
Specifically, when the encoding operation is a fully-connected operation, as shown in FIG. 3 and FIG. 4, one flash memory array structural unit 100 (i.e., one 3D NAND FLASH structural unit) of the encoding type flash memory device can serve as a single string responsible for the fully-connected operation of a single-bit input vector X (1×M) with a single-bit weight matrix K (M×N).
There are N groups of storage-computing modules on the same string, corresponding to the outputs Y-1 to Y-N respectively. Each group of storage-computing modules contains M source lines SL, corresponding to the input word lines WL-1 to WL-M respectively. Before the operation starts, the storage-computing resources of the 3D NAND FLASH structural unit must be configured: the non-zero-bit elements of the weight matrix are written into the corresponding operation units C of the 3D NAND FLASH structural unit according to the correspondence between the input vector and the elements of the weight matrix in the vector-matrix multiplication. This corresponds to step S510.
The input elements of the input vectors (X1 to XM) of the different strings are applied to the corresponding word lines WL-1 to WL-M and converted into the input word line voltages on WL-1 to WL-M respectively, so that the redundant units on the 3D NAND FLASH structural unit are in the on state. This corresponds to step S520.
When a high level is applied to the string select line SSL and the ground select line GSL, the encoding operation begins: because each source line SL has only one FLASH cell (i.e., the operation unit C), which stores a weight value (a non-zero-bit element of the aforementioned weight matrix) and performs the operation, the operation result (0 or 1) is reflected by the source line voltage. This corresponds to step S530.
The source line current value corresponding to the source line voltage is read out by the comparator and converted into a binary output. The N adders respectively add the outputs of the corresponding N groups of comparators (each group of comparators corresponding to M source lines SL), yielding the fully-connected layer outputs Y-1 to Y-N.
According to a further embodiment of the present invention, different strings (SSL-1 to SSL-N) cooperatively process the fully-connected operation of multi-bit inputs and a multi-bit weight matrix. When the encoding operation is a fully-connected operation and the input vector is multi-bit data (which may comprise the data of a multi-bit input vector and a multi-bit weight matrix), step S410 comprises:
performing the encoding operation on the corresponding input vector and weight matrix simultaneously across a plurality of flash memory array structural units; since the plurality of flash memory array structural units are connected in parallel in the encoding type flash memory device of the present invention, the fully-connected operation of multi-bit inputs and a multi-bit weight matrix can be carried out simultaneously, with each of the plurality of flash memory array structural units performing the fully-connected operation on single-bit data;
or performing the encoding operation on the corresponding input vector and weight matrix on a single flash memory array structural unit in a time-division multiplexed manner; when the number of flash memory array structural units is limited, the fully-connected operation of multi-bit inputs and a multi-bit weight matrix can be performed on one flash memory array structural unit in several passes at different times, with the flash memory array structural unit performing the fully-connected operation on single-bit data in each pass, as sketched below.
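One plausible way to recombine the single-bit passes into the multi-bit result, stated here as an assumption since the recombination arithmetic is not spelled out above, is to decompose the inputs and weights into bit planes, run each plane pair through the single-bit fully-connected pass (the fc_group sketch above), and shift-and-add the partial sums:

    def bit_plane(values: list[int], b: int) -> list[int]:
        """Extract bit b of each element of a multi-bit vector."""
        return [(v >> b) & 1 for v in values]

    def fc_multibit(weights: list[int], inputs: list[int],
                    w_bits: int, x_bits: int) -> int:
        """Multi-bit dot product built from single-bit fc_group passes.

        Each (weight bit plane, input bit plane) pair is one single-bit pass,
        executed on a separate array unit or time-multiplexed on one unit;
        the partial sums are recombined with binary shifts.
        """
        total = 0
        for wb in range(w_bits):
            for xb in range(x_bits):
                partial = fc_group(bit_plane(weights, wb), bit_plane(inputs, xb))
                total += partial << (wb + xb)
        return total

    # Example with 2-bit values: W = (2, 3), X = (3, 1) -> W·X = 2*3 + 3*1 = 9
    assert fc_multibit([2, 3], [3, 1], w_bits=2, x_bits=2) == 9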
It can be seen that the encoding method of this embodiment of the present invention, with an in-memory computing architecture based on 3D NAND FLASH structural units, can complete parallel fully-connected operations within one clock cycle.
According to another embodiment of the present invention, as shown in FIG. 5 and FIG. 6, the encoding method of this embodiment is implemented on the encoding type flash memory device shown in FIG. 5. When the encoding operation is a convolutional layer operation, before step S410 the encoding method further comprises:
Step S610: pre-storing each convolution kernel element of the convolution matrix of the convolutional layer into the corresponding operation unit among the plurality of operation units of the encoding type flash memory device.
According to another embodiment of the present invention, after step S610 the encoding method further comprises:
Step S620: applying the input elements of the corresponding input vector to the corresponding word lines among the plurality of word lines to generate the word line voltage on each word line, such that the plurality of redundant units of the encoding type flash memory device are in the on state.
According to another embodiment of the present invention, after step S620 the encoding method further comprises:
Step S630: applying a high level to the string select line and the ground select line of the encoding type flash memory device to generate the source line voltage on each of the plurality of source lines of each flash memory array structural unit.
Specifically, when the encoding operation is a convolutional layer operation, as shown in FIG. 5 and FIG. 6, each string of the 3D NAND FLASH structural unit is responsible for the convolution of a single-bit input matrix X (M×N) with a convolution kernel K (k×k). Taking FIG. 5 as an example, a 3×3 input image (X1,1, X1,2, X1,3, X2,1, X2,2, X2,3, X3,1, X3,2, X3,3) convolved with a 2×2 convolution kernel (K1,1, K1,0, K0,1, K0,0) yields the 2×2 output image (Y1,1, Y1,2, Y2,1, Y2,2). To implement this convolution operation, the input image (i.e., the corresponding input matrix) is split into four submatrices (X1,1, X1,2, X2,1, X2,2), (X1,2, X1,3, X2,2, X2,3), (X2,1, X2,2, X3,1, X3,2) and (X2,2, X2,3, X3,2, X3,3), each of which undergoes a vector-matrix multiplication with the convolution kernel (K1,1, K1,0, K0,1, K0,0) to obtain Y1,1, Y1,2, Y2,1 and Y2,2 respectively.
During the encoding operation, each convolution kernel element of the convolution matrix of the convolutional layer is mapped to the operation unit C corresponding to the input elements (X1,1, X1,2, X2,1, X2,2), i.e., to the intersections of word lines WL1, WL2, WL4 and WL5 with the corresponding source lines SL. This corresponds to step S610. Considering that every convolutional layer in a deep neural network has multiple convolution kernels performing multi-dimensional feature extraction on the input image, the other convolution kernels can be written, by the same principle, into the corresponding operation units C of WL1 to WL9.
The input image is split along the row direction into the row vector (X1,1, X1,2, …, X1,N, …, XM,1, XM,2, …, XM,N) and applied to the word lines WL of the 3D NAND FLASH structural unit as input, to be converted into the input word line voltages on the corresponding word lines WL. This corresponds to step S620.
When a high level is applied to the string select line SSL and the ground select line GSL, the operation begins: because each source line SL has only one FLASH cell (i.e., the operation unit C), which stores a convolution kernel element value and performs the operation, the operation result (0 or 1) is reflected by the source line voltage. This corresponds to step S630. Finally, as shown in FIG. 5, the four adders 300 respectively add the outputs of the corresponding four groups of comparators 200 (four source lines SL per group), yielding the convolutional layer outputs (Y1,1, Y1,2, Y2,1, Y2,2); the deep neural network is thereby implemented.
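The FIG. 5 example can likewise be modeled behaviorally: each output pixel is one adder group of k×k source lines whose operation units hold the kernel bits for one local region. The sketch below reuses the single-bit fc_group pass from above; the flattening order of the patch and kernel, and the use of unflipped (correlation-style) indexing rather than the flipped indexing of formula (2), are simplifying assumptions of the sketch.

    def conv2d_1bit(image: list[list[int]], kernel: list[list[int]]) -> list[list[int]]:
        """Single-bit 2D convolution mapped to adder groups, as in the FIG. 5 example.

        Each output pixel Y[i][j] is produced by one adder group: the k*k kernel
        bits sit in the operation units of that group's source lines, and the
        local k*k image patch drives the corresponding word lines.
        """
        M, N, k = len(image), len(image[0]), len(kernel)
        kernel_flat = [kernel[a][b] for a in range(k) for b in range(k)]
        out = []
        for i in range(M - k + 1):
            row = []
            for j in range(N - k + 1):
                patch = [image[i + a][j + b] for a in range(k) for b in range(k)]
                row.append(fc_group(kernel_flat, patch))  # one adder group
            out.append(row)
        return out

    # 3x3 input, 2x2 kernel -> 2x2 output, matching the FIG. 5 configuration
    img = [[1, 0, 1],
           [0, 1, 1],
           [1, 1, 0]]
    ker = [[1, 0],
           [0, 1]]
    assert conv2d_1bit(img, ker) == [[2, 1], [1, 1]]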
According to another embodiment of the present invention, when the encoding operation is a convolutional layer operation and the convolution matrix of the convolutional layer is multi-bit data, the encoding method of the present invention further comprises:
Step S440: shifting and summing the summation results output by the plurality of adders to implement the deep neural network. Specifically, when the convolution kernel is multi-bit data, the summation results of the adders obtained from the encoding operation are shifted and summed, so as to obtain the output image of the input image convolved with the multi-bit convolution kernel.
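Written out, and under the assumption (not made explicit above) that the kernel bit planes are processed from least to most significant, the shift-and-sum of step S440 is Y = Σb 2^b · Y_b, where Y_b is the adder output matrix obtained with bit plane b of the convolution kernel (b = 0, …, B-1) and B is the kernel bit width; for a 2-bit kernel, Y = Y_0 + 2·Y_1.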
It can be seen that the encoding method of the present invention, based on the above encoding flash memory device, can complete parallel convolution operations within a single clock cycle on an in-memory computing architecture built on the 3D NAND FLASH structure.
It can be seen that in the encoding method of the present invention, before the encoding operation starts, the non-zero elements of the weight matrix vector, or the kernel elements of the convolution vector, are written into the corresponding positions (operation units) of the 3D NAND FLASH structural unit according to a defined mapping rule. The value (0 or 1) of each element of the input vector determines the input voltage on the corresponding word line WL of the flash memory array unit (that is, the word-line voltage Vg = Vg_High or Vg_Low), and the output current on the corresponding source line SL then reflects the computation result of that operation unit. Next, through the comparators arranged at the end of each source line SL in the 3D NAND FLASH structural unit, the computation results are converted into binary form (0 or 1) and read out in parallel. On this basis, the adders connected to the comparators sum the comparator outputs from multiple source lines to obtain the final summation result, realizing efficient and accurate fully connected or convolutional layer operations and thereby implementing a deep neural network.
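For the fully connected case mentioned here, the same pattern applies; the sketch below uses assumed array dimensions (nine word lines, four output neurons) purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: 9 word lines (inputs), 4 source-line groups (output neurons)
W = rng.integers(0, 2, size=(4, 9))  # binary weights stored in operation units
x = rng.integers(0, 2, size=9)       # binary inputs driving word-line voltages

# Each operation unit multiplies one stored weight by one input bit; its
# source line is read by a comparator as 0 or 1, and the per-neuron adder
# sums the comparator outputs.
comparator_bits = W * x              # 0/1 result on each source line
y = comparator_bits.sum(axis=1)      # adder output per output neuron
print(y)                             # fully connected layer result
```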
The embodiments of the present invention have thus been described in detail with reference to the accompanying drawings.
It should be noted that implementations not shown or described in the accompanying drawings or in the body of the description take forms known to those of ordinary skill in the art and are not described in detail. Moreover, the above definitions of the elements and methods are not limited to the specific structures, shapes, or manners mentioned in the embodiments, which those of ordinary skill in the art may simply modify or replace.
It should also be noted that directional terms mentioned in the embodiments, such as "up", "down", "front", "rear", "left", and "right", merely refer to directions in the accompanying drawings and are not intended to limit the scope of protection of the present invention. Throughout the drawings, identical elements are denoted by identical or similar reference numerals. Conventional structures or constructions are omitted where they might obscure the understanding of the present invention.
Moreover, the shapes and sizes of the components in the figures do not reflect actual sizes and proportions but merely illustrate the content of the embodiments of the present invention. Furthermore, in the claims, any reference signs placed between parentheses shall not be construed as limiting the claims.
Furthermore, the word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.
Ordinal terms such as "first", "second", and "third" used in the description and claims to modify corresponding elements do not by themselves imply any ordinal relation among those elements, nor any order between one element and another or in a manufacturing method; they serve only to distinguish clearly one element bearing a given name from another element bearing the same name.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose. Also, in a device claim enumerating several means, several of these means may be embodied by one and the same item of hardware.
Similarly, it should be understood that, in order to streamline the present disclosure and aid understanding of one or more of the various disclosed aspects, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above descriptions are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (12)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010472550.9A CN111710356B (en) | 2020-05-29 | 2020-05-29 | Coding type flash memory device and coding method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111710356A (en) | 2020-09-25
CN111710356B CN111710356B (en) | 2022-07-05 |
Family
ID=72538483
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010472550.9A Active CN111710356B (en) | 2020-05-29 | 2020-05-29 | Coding type flash memory device and coding method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111710356B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190206508A1 (en) * | 2018-01-04 | 2019-07-04 | Samsung Electronics Co., Ltd. | Storage device including nonvolatile memory device, nonvolatile memory device, operating method of storage device |
CN209766043U (en) * | 2019-06-26 | 2019-12-10 | 北京知存科技有限公司 | Storage and calculation integrated chip and storage unit array structure |
US20200020393A1 (en) * | 2018-07-11 | 2020-01-16 | Sandisk Technologies Llc | Neural network matrix multiplication in memory cells |
US20200097807A1 (en) * | 2019-11-27 | 2020-03-26 | Intel Corporation | Energy efficient compute near memory binary neural network circuits |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112181354A (en) * | 2020-10-12 | 2021-01-05 | 上海芯旺微电子技术有限公司 | A method of shift saturation synchronization processing and its application |
CN112181354B (en) * | 2020-10-12 | 2021-08-10 | 上海芯旺微电子技术有限公司 | Method for synchronous shift saturation processing and application thereof |
CN113704139A (en) * | 2021-08-24 | 2021-11-26 | 复旦大学 | Data coding method for memory calculation and memory calculation method |
Also Published As
Publication number | Publication date |
---|---|
CN111710356B (en) | 2022-07-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110647983B (en) | Self-supervision learning acceleration system and method based on storage and calculation integrated device array | |
CN109063825B (en) | Convolutional Neural Network Accelerator | |
Sim et al. | Scalable stochastic-computing accelerator for convolutional neural networks | |
US10691537B2 (en) | Storing deep neural network weights in non-volatile storage systems using vertical error correction codes | |
WO2022037257A1 (en) | Convolution calculation engine, artificial intelligence chip, and data processing method | |
US11797643B2 (en) | Apparatus and method for matrix multiplication using processing-in-memory | |
JP2019502225A (en) | Memory device based on multi-layer RRAM crossbar array and data processing method | |
CN106843809A (en) | A kind of convolution algorithm method based on NOR FLASH arrays | |
US20220012016A1 (en) | Analog multiply-accumulate unit for multibit in-memory cell computing | |
CN111710356B (en) | Coding type flash memory device and coding method | |
CN110442323A (en) | Carry out the architecture and method of floating number or fixed-point number multiply-add operation | |
CN115664899B (en) | A channel decoding method and system based on graph neural network | |
CN113571109B (en) | Memory circuit and operation method thereof | |
US20170168775A1 (en) | Methods and Apparatuses for Performing Multiplication | |
US11309026B2 (en) | Convolution operation method based on NOR flash array | |
CN114842893A (en) | Memory device and operation method thereof | |
CN111627479B (en) | Encoded flash memory device, system and encoding method | |
CN112951290B (en) | Memory computing circuit and device based on nonvolatile random access memory | |
TWI788128B (en) | Memory device and operation method thereof | |
WO2022088683A1 (en) | Computing chip, computation force board, and data processing device | |
Haghi et al. | O⁴-DNN: A Hybrid DSP-LUT-Based Processing Unit With Operation Packing and Out-of-Order Execution for Efficient Realization of Convolutional Neural Networks on FPGA Devices | |
KR102541000B1 (en) | 3D Stacked Synapse Array String for Artificial Neural Network | |
Ueyoshi et al. | Robustness of hardware-oriented restricted Boltzmann machines in deep belief networks for reliable processing | |
CN113609801B (en) | An asynchronous timing control circuit design method and device | |
US20220222044A1 (en) | Multiplication-and-accumulation circuits and processing-in-memory devices having the same |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |