WO2021237513A1 - Data compression storage system and method, processor, and computer storage medium
- Publication number
- WO2021237513A1 (PCT/CN2020/092627)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- compression
- feature map
- data
- compressed
- module
Classifications
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
Definitions
- the embodiments of the present invention relate to the field of data processing, and more specifically, to a system, method, processor, and computer storage medium for data compression storage.
- current data compression methods achieve a low compression ratio, so even the compressed data occupies a large storage space. Moreover, the larger the compressed data, the more bandwidth is consumed when reading it from and writing it to the memory.
- the embodiment of the present invention provides a data compression storage system, method, processor, and computer storage medium, which can compress the feature map in the on-chip memory and then store it in the external memory, reducing both the storage space occupied and the bandwidth resources consumed during reading and writing.
- a system for data compression and storage is provided.
- the system is used to compress a feature map in an on-chip memory and then store it in an external memory.
- the system includes a compression instruction generation module, a read arbitration module, at least two compression paths, and a write arbitration module:
- the compression instruction generating module is configured to distribute the compression instruction to each of the at least two compression paths;
- each of the at least two compression paths is configured to read the corresponding original feature map from the on-chip memory according to the compression instruction received from the compression instruction generation module, and to compress the read original feature map;
- the read arbitration module is configured to arbitrate the read feature map commands issued by the at least two compression paths for the original feature map in the on-chip memory;
- the write arbitration module is configured to arbitrate the write requests of the at least two compression paths to write compressed data into the external memory.
- a method for data compression storage is provided.
- the method is used to compress a feature map in an on-chip memory and then store it in an external memory.
- the method includes:
- Each compression path reads the corresponding original feature map from the on-chip memory according to the received compression instruction, and compresses the read original feature map;
- the write requests of the at least two compression paths are arbitrated.
- a processor including:
- a computer storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the method described in the first aspect are implemented.
- the system for data compression storage of the embodiment of the present invention can compress the feature map in the on-chip memory and then store it in the external memory, so that the compressed data occupies a small storage space. On the one hand, this reduces the space occupied in the external memory; on the other hand, it reduces the bandwidth resources consumed during reading and writing and saves power.
- the data compression in the embodiment of the present invention is compressed in parallel by at least two compression paths, which can also improve the efficiency of compression.
- Fig. 1 is a schematic diagram of data storage according to an embodiment of the present invention.
- Fig. 2 is a schematic block diagram of a system for data compression storage according to an embodiment of the present invention.
- Fig. 3 is a schematic diagram of various modules of a system for data compression storage according to an embodiment of the present invention.
- Fig. 4 is a schematic diagram of a flow of compression performed by the system for data compression storage according to an embodiment of the present invention.
- FIG. 5 is another schematic diagram of a process of performing compression by the system for data compression storage according to an embodiment of the present invention.
- Fig. 6 is a schematic diagram of a state machine of the system for data compression storage according to an embodiment of the present invention.
- FIG. 7 is a schematic diagram of each module of a compression path of a system for data compression storage according to an embodiment of the present invention.
- FIG. 8 is a schematic flowchart of a data storage method according to an embodiment of the present invention.
- FIG. 9 is a schematic diagram of a feature map stored in minimum access units according to an embodiment of the present invention.
- FIG. 10 is another schematic diagram of a feature map stored in minimum access units according to an embodiment of the present invention.
- FIG. 11 is a schematic diagram of calculating multiple differences for one data unit according to an embodiment of the present invention.
- Fig. 12 is a schematic structural diagram of a scan coding module according to an embodiment of the present invention.
- Fig. 13 is a schematic diagram of fetching a data unit from a minimum access unit according to an embodiment of the present invention.
- FIG. 14 is a schematic structural diagram of a difference algorithm compression module according to an embodiment of the present invention.
- FIG. 15 is a schematic diagram of several situations of compressed data of a data unit according to an embodiment of the present invention.
- FIG. 16 is a schematic flowchart of a compression process performed by a compression path according to an embodiment of the present invention.
- FIG. 17 is a schematic block diagram of an apparatus for data storage according to an embodiment of the present invention.
- neural networks such as Convolution Neural Networks (CNN).
- a large amount of feature map data will be generated.
- data compression technology is usually used, which can reduce the space occupied in the external memory and the bandwidth consumed during reading and writing.
- the external memory may be, for example, a Double Data Rate Synchronous Dynamic Random Access Memory, or DDR for short.
- a convolutional neural network generally includes a large number of convolutional layers, and each convolutional layer generates a large amount of feature map data.
- when these large amounts of feature map data are read from and written to DDR, they consume valuable external memory bandwidth, so that other modules with high bandwidth requirements (such as the CNN or other modules) cannot access DDR quickly, which degrades computing performance.
- an embodiment of the present invention provides a method for a data compression storage system.
- the feature map data calculated by the convolutional neural network may be located in an on-chip memory.
- the embodiment of the present invention aims to compress the feature map data in the on-chip memory and store the compressed data in the external memory.
- the process can be similar to that shown in Figure 1, where the on-chip memory can be assumed to be on-chip SRAM, and the external memory can be assumed to be DDR.
- the compression system reads the feature map data from the on-chip memory, the feature map data is compressed, and the compressed feature map data is stored in the external memory.
- a system for data compression storage includes at least: a compression instruction generation module, a read arbitration module, at least two compression paths, and a write arbitration module, as shown in FIG. 2.
- the number of compression paths in the embodiment of the present invention is at least two, for example, it can be 3 or more, and can be specifically configured according to the performance of the processor and the size of the feature map data to be processed. In this way, the embodiment of the present invention can flexibly configure the number of compression paths according to the output rate of the feature map data, so as to flexibly meet the performance requirements of different processing tasks and improve the compression performance.
- compression path 1
- compression path 2
- the compression instruction generation module can be used to distribute the compression instructions to various compression paths.
- the compression path can read the corresponding feature map data from the on-chip memory according to the compression instruction, and then compress the read feature map data.
- the read arbitration module can arbitrate the read feature map commands of at least two compression paths for the feature map data in the on-chip memory.
- the write arbitration module can arbitrate the write requests of at least two compression paths to write the compressed data into the external memory.
- the compression instruction generation module can be expressed as the ENC_INSTR_PROC module, which can receive the compression instruction, parse the compression instruction; and further distribute the parsed compression instruction to each compression path.
- the processor may send a compression instruction to the compression instruction generation module.
- after the compression instruction generation module receives the compression instruction, it can distribute the compression instruction to each compression path accordingly, so that each compression path reads the feature map data from the on-chip memory and compresses it.
- the compression instruction distributed to a given compression path may include: the number of feature maps to be compressed by the compression path; the width of the feature maps; the height of the feature maps; the base address of these feature maps in the on-chip memory; the inter-map storage interval of these feature maps in the on-chip memory; the base address at which these feature maps are output to the external memory after being compressed by the compression path; the inter-map storage interval at which these feature maps are output to the external memory after being compressed by the compression path; the base address at which the compression header information of these compressed feature maps is output to the external memory; and the header information storage interval at which the compression header information of these compressed feature maps is output to the external memory.
- the read arbitration module can be represented as the FM_RD_ARB module. Referring to Figure 3, the system can also include a read command buffer module (which can be represented as the RD_CMD_FIFO module) and a read data path identification buffer module (which can be represented as the RDATA_ID_FIFO module); both are connected to the read arbitration module.
- the read arbitration module can obtain the read feature map commands issued by each compression path, that is, the read feature map commands issued by each compression path can be gathered here.
- the read feature map commands of each compression path can be cached in that path's own read command buffer module.
- the read arbitration module can arbitrate the read feature map commands in the read command buffer modules according to the arbitration rules to obtain the arbitration result.
- for example, if the arbitration result indicates that the first compression path wins, the read feature map command of the first compression path is sent to the on-chip memory first, and the path identification (ID) of the first compression path is stored in the read data path identification buffer module. Later, when the feature map data is returned from the on-chip memory, it is sent to the compression path corresponding to the path identification (ID) stored in the read data path identification buffer module.
- the arbitration rules may be a priority mechanism or a fair polling mechanism configured according to the compression instruction, or may be other arbitration-related mechanisms, which are not listed here. It can be understood that if the arbitration rule is a priority mechanism, the compression processing of the high-priority compression path is served first, which protects the task performance of that path.
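The read arbitration described above can be sketched in software as follows. This is a minimal illustration rather than the patented circuit: the function names `arbitrate` and `issue_reads`, the convention that a lower number means higher priority, and the use of Python deques in place of the RD_CMD_FIFO and RDATA_ID_FIFO buffers are all assumptions made for this example.

```python
from collections import deque


def arbitrate(pending, rule="round_robin", last_winner=-1, priorities=None):
    """Pick the winning path index, or None when no path has a command.

    pending[i] is True when path i has a queued read feature map command.
    """
    n = len(pending)
    candidates = [i for i in range(n) if pending[i]]
    if not candidates:
        return None
    if rule == "priority":
        # Assumption for this sketch: a lower number means higher priority.
        return min(candidates, key=lambda i: priorities[i])
    # Fair polling: search starting just after the previous winner.
    for offset in range(1, n + 1):
        i = (last_winner + offset) % n
        if pending[i]:
            return i
    return None


def issue_reads(rd_cmd_fifos, rule="round_robin", priorities=None):
    """Drain the per-path command FIFOs in arbitration order.

    Returns the issued (path_id, command) pairs and the path ID FIFO that
    would later route returned data back to the requesting path.
    """
    rdata_id_fifo = deque()
    issued = []
    last = -1
    while True:
        pending = [bool(fifo) for fifo in rd_cmd_fifos]
        winner = arbitrate(pending, rule, last, priorities)
        if winner is None:
            break
        issued.append((winner, rd_cmd_fifos[winner].popleft()))
        rdata_id_fifo.append(winner)  # remember which path owns the data
        last = winner
    return issued, rdata_id_fifo
```

With fair polling the two paths alternate while both have commands queued; with the priority rule the higher-priority path is drained first.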
- the write arbitration module which can be expressed as the FM_WR_ARB module, can obtain the write requests issued by each compression path, that is, the write requests issued by each compression path can be gathered here.
- each write request can be arbitrated according to the arbitration rules, and the arbitration result can be obtained. If the result of the arbitration indicates that the second compression path wins, the write request of the second compression path is first sent to the external memory, that is, the compressed data obtained by the second compression path is stored in the external memory.
- the arbitration rules may be a priority mechanism or a fair polling mechanism configured according to the compression instruction, or may be other arbitration-related mechanisms, which are not listed here.
- each compression path can include a feature map reading module, a feature map cache module, a data compression module, a data packing module, a compression header generation module, a length alignment module, a compression header cache module, a compressed feature map cache module, a compression header writing module, and a compressed feature map writing module.
- the feature map reading module, which can be expressed as the RD_FM module, can send a read feature map command for the original feature map in the on-chip memory according to the compression instruction received from the compression instruction generation module.
- the read feature map command may include the width and height of the original feature map to be read, the base address of the on-chip memory, and so on.
- the feature map reading module is also used to re-read the original feature map compressed this time from the on-chip memory.
- the feature map cache module which can be expressed as the SRC_FM_FIFO module, can be used to store the original feature map read back from the on-chip memory.
- the data compression module can be used to divide the original feature map in the feature map cache module into multiple data units, and perform differential compression for each of the multiple data units.
- the data compression module may include: a scan coding module and a difference algorithm compression module.
- the scan coding module can be expressed as the SCAN_DPCM module
- the difference algorithm compression module can be expressed as the RES_ENC module.
- the data compression module will be described in more detail below in conjunction with FIG. 7 to FIG. 17.
- the data packing module can be expressed as a DATA_PACK module, which is used to splice the data compressed by the data compression module into complete compressed data. Specifically, the fragmented data compressed by the data compression module is spliced into complete data, for example, into data with a unit of 16 bytes.
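As a rough software analogue of the splicing performed by the data packing module, the sketch below accumulates variable-length fragments and emits complete 16-byte words. The class name `DataPacker` and the byte-granular fragments are assumptions made for illustration; the hardware module may well operate at bit granularity.

```python
class DataPacker:
    """Splice variable-length fragments into fixed-size words (sketch)."""

    def __init__(self, word_bytes=16):
        self.word_bytes = word_bytes
        self.buf = bytearray()  # fragments not yet forming a full word

    def push(self, fragment: bytes) -> list:
        """Append one fragment and return every complete word now available."""
        self.buf += fragment
        words = []
        while len(self.buf) >= self.word_bytes:
            words.append(bytes(self.buf[: self.word_bytes]))
            del self.buf[: self.word_bytes]
        return words
```

Bytes that do not yet fill a word stay buffered until later fragments complete it.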
- the length alignment module can be expressed as the LEN_ALIGN module, which is used to pad the compressed data spliced by the data packing module to a specific length.
- the specific length is related to the chip performance of the external memory. That is, the specific length may be preset according to the performance of the chip of the external memory.
- the compressed data can be padded with invalid data up to the specific length.
- the length of the compressed data is padded from N × 16B to ceil(N/4) × 64B, where ceil denotes rounding up and N is a positive integer.
- by providing the length alignment module, the embodiment of the present invention allows the external memory to work more efficiently, which improves the performance of the entire system.
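The padding rule above, from N × 16B to ceil(N/4) × 64B, can be checked with a short sketch. The function names and the choice of zero as the fill byte are assumptions for this example; the text only states that invalid data is used as padding.

```python
import math


def aligned_length(n_units, unit_bytes=16, align_bytes=64):
    """Length in bytes after padding n_units * unit_bytes up to the alignment."""
    units_per_block = align_bytes // unit_bytes  # 4 for 16B units and 64B blocks
    return math.ceil(n_units / units_per_block) * align_bytes


def pad_to_alignment(data: bytes, align_bytes=64, fill=0x00):
    """Append fill bytes (assumed zero here) until the length is aligned."""
    padded_len = math.ceil(len(data) / align_bytes) * align_bytes
    return data + bytes([fill]) * (padded_len - len(data))
```

For instance, five 16-byte units (80 bytes) are padded to two 64-byte blocks, while four units already fill exactly one block and need no padding.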
- the compression header generation module can be expressed as the ENC_HDR_GEN module, which can generate compression header information corresponding to the compressed data obtained by the data compression module according to the compression instruction received from the compression instruction generation module.
- the compression header information can be generated according to the address information in the compression instruction, the feature map size information, the length of the compression result in the current clock cycle, whether it is the end of the current data unit, and so on.
- on the one hand, the generated compression header information can be used to determine whether the current data unit needs to be bypassed; on the other hand, it can be used to decompress the compressed data later. It can be understood that judging whether a bypass is needed based on the compression header information is optional; that is, the compressed data and the compression header information can be stored without making this judgment.
- the compression header cache module, which can be expressed as the ENC_HDR_FIFO module, is used to buffer the to-be-output compression header information generated by the compression header generation module.
- the compressed feature map cache module, which can be expressed as the ENC_FM_FIFO module, is used to cache the data to be output.
- the cached data may be the length-padded compressed data or, during a bypass operation, the length-padded original feature map read back from the on-chip memory.
- the compression header writing module can be expressed as the ENC_HDR_WR module, which performs the writing operation of the compression header information.
- the compressed feature map writing module can be represented as the ENC_FM_WR module, which performs the data storage operation; specifically, it writes to the external memory either the compressed data in the compressed feature map cache module or, during a bypass operation, the original feature map read back from the on-chip memory.
- the compression header information for the compressed data and the compression header information for the original data when the bypass operation is performed may have different compression identifiers.
- the first compression identifier represents compressed data
- the second compression identifier represents original data.
- the compressed feature map writing module, namely the ENC_FM_WR module, records the base address of this write operation, that is, the address in the external memory to which the data will be written.
- the currently working module may include a data compression module and so on.
- the feature map reading module, that is, the RD_FM module, re-sends the read feature map command, thereby restarting the reading of the original feature map of the current compression unit.
- the original feature map is read from the on-chip memory again, and the read original feature map can be stored in the feature map cache module.
- under the bypass mechanism, after the original feature map is read, it does not pass through the data compression module and the data packing module, but goes directly from the feature map cache module to the length alignment module, which is reused to align the output length.
- the compressed feature map writing module overwrites the previously obtained compressed data with the original feature map, and outputs the original feature map to the external memory.
- by providing the bypass mechanism, the embodiment of the present invention keeps the stored data from occupying more external memory space than the original feature map would.
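The bypass decision can be summarized in a few lines. This sketch assumes the comparison is made on byte lengths and uses hypothetical numeric values for the two compression identifiers; the text only states that compressed data and original data carry different identifiers.

```python
# Hypothetical identifier values, chosen only for this illustration.
COMPRESSED_ID = 0  # first compression identifier: payload is compressed data
ORIGINAL_ID = 1    # second compression identifier: payload is original data


def select_output(original: bytes, compressed: bytes):
    """Return (identifier, payload) to be stored for one compression unit."""
    if len(compressed) > len(original):
        # Bypass: compression expanded the data, so store the original.
        return ORIGINAL_ID, original
    return COMPRESSED_ID, compressed
```

Whichever payload is selected would then pass through length alignment before being written to the external memory.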
- the feature map data obtained through the convolutional neural network in the processor can be compressed and stored in the external memory.
- a schematic flowchart of the method may be as shown in FIG. 4, and the method includes:
- S101 Distribute the compression instruction to each of the at least two compression paths
- each compression path reads a corresponding original feature map from the on-chip memory according to the received compression instruction, and compresses the read original feature map;
- the compression instruction generation module may receive the compression instruction, parse the compression instruction, and then distribute the compression instruction to each compression path (PATH).
- the received compression instruction may include information describing the compression task to be performed by each compression path and the priority of each compression path. After the compression instruction is parsed, the task and the priority of each compression path can be configured according to the parsing result. After that, each compression path can perform compression work in accordance with the received compression instruction.
- the feature map data can be read from the on-chip memory and compressed, and then compression information (such as compression header information including the length) can be calculated.
- if the length of the compressed data is greater than the length of the original feature map data, the original feature map data is read again.
- length alignment is performed, and the compression result is written.
- the written compression result includes not only the length-padded compressed data, or the original feature map data when the bypass operation is performed, but also the compression header information. If every compression path has completed the compression storage process, the flow of the compression instruction ends; otherwise, the system waits for the unfinished compression paths to continue execution.
- the system for data compression storage of the embodiment of the present invention can receive, process, and distribute compression instructions, and can monitor and report completion.
- the workflow shown in FIG. 5 is clear, and can realize the compression and storage of feature map data.
- the system for data compression storage in the embodiment of the present invention may have multiple different states, including but not limited to: idle state, receiving instruction state, parsing instruction state, waiting for completion state, and the like.
- state switching can be implemented according to the state machine shown in FIG. 6.
- the idle state can be expressed as the IDLE state.
- the start signal of the compression command can be expressed as instr_strt.
- the receiving instruction state can be expressed as the RCV_INSTR state.
- the instruction ready signal can be output, and at the same time as, or after, the instruction ready signal is output, the system switches to the parsing instruction state.
- the instruction ready signal can be expressed as instr_rdy.
- the parsing instruction state can be expressed as the PROC_INSTR state.
- the compression instruction received in the receiving instruction state is parsed, and the compression instruction is distributed to each compression path according to the parsing result.
- the compression instructions distributed to each compression path can be expressed as instr_isu.
- the compression instructions distributed to a certain compression path may include: the number of feature maps to be compressed by the compression path, the width of the feature maps, the height of the feature maps, the base addresses of these numbers of feature maps in the on-chip memory, and the number of feature maps.
- the command information included in the compression instruction distributed to compression path 1 may include: (1) FM_NUM, which indicates the number of feature maps to be compressed by compression path 1; (2) FM_WIDTH, which indicates the width of the feature maps to be compressed by compression path 1; (3) FM_HIGHT, which indicates the height of the feature maps to be compressed by compression path 1; (4) FM_SRAM_BADDR, which indicates the base address in the on-chip memory of the feature maps to be compressed by compression path 1; (5) FM_SRAM_LEN, which indicates the storage interval in the on-chip memory of the feature maps to be compressed by compression path 1; (6) FM_DDR_BADDR, which indicates the base address at which the feature maps compressed by compression path 1 are output to the external memory; (7) FM_DDR_LEN, which indicates the storage interval at which the feature maps compressed by compression path 1 are output to the external memory; (8) FM_HDR_BADDR, which indicates the base address of the compressed header information corresponding to the feature map compressed in path
- the waiting state can be expressed as the WAIT_DONE state.
- the completion signal of each compression path can be monitored, and after the completion of all the compression paths is monitored, it can be switched to the idle state.
- an instruction completion signal may be output to the upper-level module.
- the instruction completion signal can be expressed as instr_done.
- the embodiment of the present invention can ensure the normal operation of the system and ensure the safe and orderly storage of the feature map data.
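The state switching described above can be sketched as a small transition function. The state and signal names (IDLE, RCV_INSTR, PROC_INSTR, WAIT_DONE, instr_strt, instr_rdy, instr_isu, instr_done) come from the text; the exact transition conditions, the hypothetical `instr_received` input, and the move from PROC_INSTR to WAIT_DONE right after distribution are assumptions, since FIG. 6 is not reproduced here.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RCV_INSTR = auto()
    PROC_INSTR = auto()
    WAIT_DONE = auto()


def step(state, instr_strt=False, instr_received=False, paths_done=()):
    """Advance one step and return (next_state, list_of_output_signals)."""
    if state is State.IDLE:
        # The start signal of a compression instruction leaves the idle state.
        return (State.RCV_INSTR, []) if instr_strt else (State.IDLE, [])
    if state is State.RCV_INSTR:
        # instr_received is a hypothetical "whole instruction latched" flag.
        if instr_received:
            return State.PROC_INSTR, ["instr_rdy"]
        return State.RCV_INSTR, []
    if state is State.PROC_INSTR:
        # Parse the instruction and distribute it to the compression paths.
        return State.WAIT_DONE, ["instr_isu"]
    if state is State.WAIT_DONE:
        # Return to idle only when every compression path reports completion.
        if paths_done and all(paths_done):
            return State.IDLE, ["instr_done"]
        return State.WAIT_DONE, []
    raise ValueError(f"unknown state: {state}")
```

Driving `step` with a start signal, a received instruction, and finally an all-done report from every path walks the machine through one full cycle back to IDLE.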
- the following describes, in conjunction with Figures 7 to 17, how each compression path performs data compression. It can be understood that, since each compression path compresses the feature map data in a similar way, the following compression process applies to any compression path.
- Figure 7 shows a schematic diagram of each module of a compression path.
- the function of each module is as described above in conjunction with FIG. 3, and in FIG. 7, the dashed box shows a data compression module, which includes a scan coding module and a difference algorithm compression module.
- the scan coding module (SCAN_DPCM module) is a difference calculation module of the difference (Residual, RES) compression method, which can scan (SCAN) the data to be compressed according to the compression performance, and take out a certain amount of data for compression.
- the difference algorithm compression module (RES_ENC module) is the data compression module of the difference compression method. It compresses the difference values output by the scan coding module (SCAN_DPCM module) according to the difference compression algorithm, and at the same time outputs the length of the compression result of the current cycle and whether it is the end of the current data unit.
- the feature map data is the output of the convolutional layer of the convolutional neural network
- because the values of two adjacent pixels of the feature map output by the convolutional layer are often very close or even equal, this characteristic can be fully exploited.
- the difference between adjacent pixels is used directly for compression.
- FIG. 8 is a schematic flowchart of a data storage method according to an embodiment of the present invention.
- the method shown in Figure 8 includes:
- S140 Store the compressed feature map data.
- the compression instruction generation module reads the compression instruction from the on-chip memory.
- the feature map read at one time through the read feature map command may correspond to the smallest access unit of the memory.
- the feature map data corresponding to the smallest access unit of the memory is received in S110.
- the size of the received feature map data may be equal to or smaller than the minimum access unit of the memory.
- the feature map data corresponding to the smallest access unit of the memory can be referred to as a compression unit.
- the feature map data corresponding to the smallest access unit is divided into multiple data units.
- aligning the width and the number of rows of the feature map according to the minimum access unit of the memory can facilitate the access of the feature map on the one hand, and can efficiently use the bandwidth of the read-write memory on the other hand.
- if the storage space required by one row of the feature map data is greater than the minimum access unit, the data located in the same minimum access unit belongs to the same row of the feature map data. If the storage space required for one row of feature map data is less than the minimum access unit, the data belonging to the same row of feature map data is located in the same minimum access unit. The storage space required for one row of feature map data is determined by the width of the feature map and the data bit width of each pixel.
- for example, with a 64B minimum access unit, each row is aligned to 64B, and each 64B unit stores at most one row of the feature map (one row may also require multiple 64B units); the remaining invalid positions can be filled with 0.
- the minimum access unit of the memory can also be 16 bytes or another size, and the data bit width of each pixel can also be 4 bits, 16 bits, or another size; the storage form of the feature map can be determined similarly, and the possibilities are not listed one by one in the embodiment of the present invention.
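The row alignment rule can be illustrated with a small sketch that lays feature map rows out in minimum access units. It assumes 8-bit pixels stored one per byte and a 64-byte access unit; the function name `store_rows` is hypothetical.

```python
def store_rows(rows, unit_bytes=64):
    """Split each row into access units so that no unit mixes two rows.

    rows: iterable of bytes objects, one per feature map row (one byte per
    pixel is an assumption of this sketch). The tail of each row is padded
    with zeros up to the next unit boundary.
    """
    units = []
    for row in rows:
        padded_len = -(-len(row) // unit_bytes) * unit_bytes  # round up
        padded = row + bytes(padded_len - len(row))
        units.extend(
            padded[i : i + unit_bytes] for i in range(0, padded_len, unit_bytes)
        )
    return units
```

A 20-pixel row occupies one unit with 44 zero bytes of padding, while a 100-pixel row occupies two units, so every unit contains data from a single row.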
- the feature map data to be stored corresponding to the read feature map command can be received in S110, and temporarily stored in the feature map cache module. It can be understood that what is received in S110 is the original feature map data before compression.
- S120 and S130 in FIG. 8 may be executed by the data compression module.
- the current compression unit can be set, such as a row of the feature map data or all of the feature map data. Subsequently, the current compression unit is divided into multiple data units. As an example, one data unit may include 8 pixels. In this way, a compression path can compress data units of 8 pixels at a time.
- each compression path can compress a data unit of 8 pixels at a time, which increases the degree of parallelism. On the one hand, this improves the efficiency and speed of compression; on the other hand, it keeps compression from becoming the performance bottleneck of the system.
- compression may be performed through the following process: divide the data unit into one or more groups; if the data of a first group of the groups is all zeros, the compressed data of that group is 0; if the data of a second group of the groups is not all zeros, determine multiple differences between the data in the second group and compress based on the multiple differences.
- the data of the first group is all zeros means: all the data of the first group are zeros. If the data in the second group is not all zeros, it means that at least one data in the second group is not zero.
- the multiple differences between the data in the second group refer to the differences between every two adjacent pixels.
- determining the multiple differences between the data in the second group may include: determining the difference between the first data in the second group and the last data before the second group, and determining the difference between each data in the second group other than the first data and the data immediately preceding it. It is understandable that if the second group includes n0 data, then n0 differences will be obtained. It should also be noted that the multiple differences are signed differences.
- the following can be executed by the scan coding module: divide a data unit into one or more groups; determine whether the data in each group is all zeros; and, if the data in a group is not all zeros, calculate the multiple differences between the data in that group.
- one data unit can be divided into two groups, that is, each group includes 4 pixels. If the 8 pixels of a data unit are represented as {p1, p2, p3, p4, p5, p6, p7, p8}, then the two groups after division are {p1, p2, p3, p4} and {p5, p6, p7, p8}. Subsequently, for the first group {p1, p2, p3, p4}, determine whether the pixel values of these four pixels are all zeros. If they are all zeros, they can be represented by an all-zero indicator, for example a 1-bit "0".
- if the pixel values of these four pixels are not all zeros, that is, at least one pixel is non-zero, they can be represented by a non-all-zero indicator, for example a 1-bit "1".
- the indicator can be obtained by judging whether the two groups are all zeros, as shown in Table 1 below.
- the difference values D1, D2, D3, and D4 can be calculated. If the indicator is "01”, the difference values D5, D6, D7, and D8 can be calculated. If the indicator is "11”, the difference values D1, D2, D3, D4, D5, D6, D7, and D8 can be calculated.
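The all-zero check can be illustrated with a minimal Python sketch. The function name `group_indicator` and the representation of the indicator as a string are assumptions made for illustration; the sketch only assumes, as the text above suggests, that the indicator is the per-group flags in group order, with "0" for an all-zero group and "1" otherwise.

```python
def group_indicator(data_unit, group_size=4):
    """Return one flag per group: '0' if every pixel is zero, else '1'.

    data_unit: list of pixel values (8 pixels in the example from the text).
    group_size: pixels per group (4 in the example from the text).
    """
    groups = [data_unit[i:i + group_size]
              for i in range(0, len(data_unit), group_size)]
    return "".join("0" if all(p == 0 for p in g) else "1" for g in groups)


# First group all zeros, second group not: only D5..D8 need to be computed.
print(group_indicator([0, 0, 0, 0, 3, 4, 4, 2]))  # 01
```

With this representation, an indicator of "00" means that no differences need to be calculated for the data unit at all.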
- the multiple difference values obtained are signed numbers. For example, assuming that each pixel in a data unit is an 8-bit signed number, each difference obtained is a 9-bit signed number whose first bit is its sign bit: a sign bit of 0 indicates a non-negative number, and a sign bit of 1 indicates a negative number.
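As a rough illustration of the difference step, the following Python sketch computes adjacent-pixel differences for one group and prints them as 9-bit signed patterns. The names `group_differences` and `to_9bit` are hypothetical, and the sketch assumes that the value preceding the group (`prev`) is supplied by the caller and that the 9-bit pattern is two's complement.

```python
def group_differences(group, prev):
    """Differences between adjacent pixels of one group.

    The first pixel is compared with `prev`, the last value before the
    group; every other pixel is compared with its predecessor.
    """
    diffs = []
    for p in group:
        diffs.append(p - prev)
        prev = p
    return diffs


def to_9bit(d):
    """Render a difference as a 9-bit two's-complement string (sign bit first)."""
    if not -256 <= d <= 255:
        raise ValueError("difference does not fit in 9 bits")
    return format(d & 0x1FF, "09b")


diffs = group_differences([5, 7, 4, 4], prev=3)
print(diffs)                         # [2, 2, -3, 0]
print([to_9bit(d) for d in diffs])
```

Because 8-bit signed pixels lie in [-128, 127], any difference between two of them lies in [-255, 255] and therefore fits in 9 bits.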
- a schematic structural diagram of the scan coding module (SCAN_DPCM module) in the embodiment of the present invention may be as shown in FIG. 12.
- the register, which can be expressed as SRORAGE_MIN_UNIT, holds one smallest access unit of the temporary storage memory, which can include multiple data units.
- the scan coding module may divide the feature map data in the minimum access unit into multiple data units, that is, all data in one data unit are located in the same minimum access unit.
- regarding the minimum access unit, with reference to FIG. 9 and FIG. 10: if the storage space required for a row of feature map data is greater than the minimum access unit, the data located in the same minimum access unit belongs to the same row of the feature map data. If the storage space required for one row of feature map data is less than the minimum access unit, the data belonging to the same row of feature map data is located in the same minimum access unit.
- SCAN_MUX can select a data unit from the register (SRORAGE_MIN_UNIT), and then divide the data unit into one or more groups for compression. Specifically, SCAN_MUX fetches one data unit at a time from the smallest access unit until the smallest access unit has been fully traversed. To avoid invalid compression operations, a data unit must contain at least one valid pixel. If all data contained in a data unit is invalid padding data, it is an invalid data unit; in that case the data unit can be skipped and processing continues with the next data unit.
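The traversal described above can be sketched in Python as follows. This is only a software analogy of the hardware multiplexer: the function name `iter_valid_data_units` and the `valid_pixels` parameter are hypothetical, and the sketch assumes that valid pixels sit at the front of the minimum access unit with padding at the end.

```python
def iter_valid_data_units(min_access_unit, valid_pixels, unit_size=8):
    """Yield (index, data_unit) for each data unit with at least one valid pixel.

    min_access_unit: list of pixel values, padded to a multiple of unit_size.
    valid_pixels: assumed count of real pixels at the front of the list.
    """
    for start in range(0, len(min_access_unit), unit_size):
        if start >= valid_pixels:
            continue  # the whole data unit is padding, so skip it
        yield start // unit_size, min_access_unit[start:start + unit_size]


# 32 entries, of which only the first 20 are real pixels.
unit = list(range(1, 21)) + [0] * 12
print([index for index, _ in iter_valid_data_units(unit, valid_pixels=20)])
```

In this example the third data unit is kept because it still holds four valid pixels, while the fourth, which is padding only, is skipped.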
- compressing multiple differences in S130 may include: determining the number of storage bits according to multiple non-negative numbers corresponding to the multiple differences, and compressing the multiple differences according to their sign bits and the determined number of storage bits.
- this process can be executed by the difference algorithm compression module. Specifically, the number of storage bits can be determined according to the multiple non-negative numbers corresponding to the multiple differences, and the multiple differences can be compressed according to their sign bits and that number of bits.
- it may include: determining a plurality of non-negative numbers in one-to-one correspondence with the plurality of differences; determining the number of bits required for storage according to the position of the highest non-zero bit among the plurality of non-negative numbers; and compressing the multiple differences according to their sign bits and the number of bits, where the storage length of each compressed difference, excluding its sign bit, is the number of bits.
- the non-negative number corresponding to the difference may refer to the absolute binary value of the difference.
- if the sign bit of a first difference value indicates that the first difference value is a non-negative number, the non-negative number corresponding to the first difference value is the number obtained by removing the sign bit of the first difference value.
- if the sign bit of a second difference value indicates that the second difference value is a negative number, the non-negative number corresponding to the second difference value is obtained by removing the sign bit of the second difference value and inverting the remaining bits.
- the position of the highest non-zero value in the multiple non-negative numbers can be determined by performing a "bitwise OR" operation on multiple non-negative numbers, and then the number of bits required to store multiple differences can be determined.
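The mapping to non-negative numbers and the bitwise OR can be sketched in Python as follows. The names `split_sign` and `storage_bits` are hypothetical, the 9-bit width follows the example in the text, and clamping the result to at least 1 bit is an assumption made here so that a difference of 0 or -1 still has a stored length.

```python
def split_sign(d9):
    """Split a 9-bit two's-complement pattern into (sign bit, non-negative number).

    For a negative value, the remaining 8 bits are inverted.
    """
    sign = (d9 >> 8) & 1
    rest = d9 & 0xFF
    nonneg = rest if sign == 0 else (~rest) & 0xFF
    return sign, nonneg


def storage_bits(diffs):
    """Bits needed per difference (excluding the sign bit), via bitwise OR."""
    d_max = 0
    for d in diffs:
        _, nonneg = split_sign(d & 0x1FF)
        d_max |= nonneg
    return max(d_max.bit_length(), 1)  # assumed lower bound of 1 bit


print(split_sign(-3 & 0x1FF))        # (1, 2)
print(storage_bits([2, 2, -3, 0]))   # 2
```

The OR works because only the position of the highest set bit matters: it is the same for the OR of all non-negative numbers as for the largest of them, so no comparison logic is needed.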
- the stored compressed data may include: a non-all-zero indicator, a bit number indicator, and multiple compressed differences, where the bit number indicator indicates the length of each compressed difference after removing its sign bit. That is to say, each compressed difference consists of a sign bit plus data of that number of bits.
- suppose the scan coding module obtains 8 difference values D1 to D8. The following describes, with reference to FIG. 14, an example process by which the difference algorithm compression module compresses the 8 difference values D1 to D8.
- the sign bits of the eight differences D1 to D8 can be extracted, and then the corresponding non-negative numbers can be determined according to the sign bits.
- taking D1 as an example: F1 represents the highest bit of D1, that is, the sign bit; D1' represents the remaining binary number after the sign bit of D1 is removed; and d1' represents the non-negative number corresponding to the difference D1.
- if D1 itself is a non-negative number (F1 is 0), D1' is the non-negative number corresponding to D1, that is, d1' is determined to be D1'.
- if D1 itself is a negative number (F1 is 1), ~D1' is the non-negative number corresponding to D1, that is, d1' is determined to be (~D1'), where ~ represents bitwise inversion.
- it should be noted that if D1 itself is a negative number (F1 is 1), the absolute value of the negative number represented by D1 is ~D1'+1.
- with 8 bits after the sign bit, the representable decimal range is -256 to 255: the binary number "11111111" represents the decimal number 255, and the same non-negative number with the sign bit "1" for negative numbers in front of it represents the decimal number -256. In other words, when the sign bit is "1", the absolute value of the corresponding negative number is the decimal value of its non-negative number plus 1.
- 8 non-negative numbers corresponding to 8 differences can be obtained: d1’, d2’, d3’, d4’, d5’, d6’, d7’, d8’.
- d_max1 is obtained by performing a bitwise OR operation on d1', d2', d3', and d4', and determines how many bits are needed to represent the 4 difference values D1, D2, D3, and D4 of the first group. The position of the highest 1 bit of d_max1 is detected, which gives len1.
- d_max2 is obtained by performing a bitwise OR operation on d5', d6', d7', and d8', and determines how many bits are needed to represent the 4 difference values D5, D6, D7, and D8 of the second group. The position of the highest 1 bit of d_max2 is detected, which gives len2.
- each of the multiple difference values is a 9-bit difference value (including a 1-bit sign bit). Therefore, the number of storage bits occupied by each difference is at most 8, so that len1 and len2 only need 3 bits.
- a 3-bit binary number can represent [0,7], and in this example, the number of bits corresponding to the compressed difference is [1,8]. For example, suppose len1 is “010”, which means that the number of bits after the difference is compressed is 3; suppose len1 is “111”, which means that the number of bits after the difference is compressed is 8.
- since each compressed difference value d1 to d8 also includes its own sign bit, the actual number of bits occupied by each compressed difference value is in the range [2,9].
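The offset between the 3-bit indicator and the actual bit count can be shown with a small Python sketch. The function names `encode_len` and `decode_len` are hypothetical; the mapping itself (stored value equals bit count minus one) follows the "010" and "111" examples in the text.

```python
def encode_len(nbits):
    """Map a bit count in [1, 8] to the 3-bit indicator (stored as nbits - 1)."""
    if not 1 <= nbits <= 8:
        raise ValueError("bit count must be in [1, 8]")
    return format(nbits - 1, "03b")


def decode_len(field):
    """Recover the bit count from the 3-bit indicator."""
    return int(field, 2) + 1


for nbits in (3, 8):
    field = encode_len(nbits)
    # Each compressed difference occupies nbits plus one sign bit.
    print(field, decode_len(field), nbits + 1)
```

Storing the count minus one is what lets 3 bits cover the full range of 1 to 8, since a count of 0 never needs to be represented.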
- the compressed data can be further obtained based on this.
- the compressed data for a group in the data unit includes: the all-zero/non-all-zero indicator, the bit number indicator (if the group is not all zeros), and the compressed differences (if the group is not all zeros).
- the compressed data may be as shown in FIG. 15, and there may be three situations.
- for example, when ALL0_FLAG1 is 0 and ALL0_FLAG2 is 0, the compressed data (ENC_RESULT) is ALL0_FLAG1, ALL0_FLAG2.
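To make the per-group format concrete, here is a minimal Python sketch that encodes one group into a bit string and decodes it again. It is not the layout of FIG. 15, which is not reproduced here: the field order (flag, 3-bit length indicator, then the differences), the choice to store each difference as its sign bit followed by the low bits of its two's-complement pattern, and the names `encode_group` and `decode_group` are all assumptions.

```python
def encode_group(pixels, prev):
    """Encode one group as a bit string; `prev` is the value before the group."""
    if all(p == 0 for p in pixels):
        return "0"                      # all-zero indicator only
    diffs = []
    for p in pixels:
        diffs.append(p - prev)
        prev = p
    d_max = 0
    for d in diffs:
        d_max |= d if d >= 0 else ~d    # non-negative number for each difference
    nbits = max(d_max.bit_length(), 1)
    width = nbits + 1                   # stored bits plus the sign bit
    mask = (1 << width) - 1
    body = "".join(format(d & mask, f"0{width}b") for d in diffs)
    return "1" + format(nbits - 1, "03b") + body


def decode_group(bits, prev, group_size=4):
    """Inverse of encode_group for a single group."""
    if bits[0] == "0":
        return [0] * group_size
    nbits = int(bits[1:4], 2) + 1
    width = nbits + 1
    pixels, pos = [], 4
    for _ in range(group_size):
        raw = int(bits[pos:pos + width], 2)
        if raw >= 1 << nbits:           # sign bit set: negative difference
            raw -= 1 << width
        prev += raw
        pixels.append(prev)
        pos += width
    return pixels


encoded = encode_group([5, 7, 4, 4], prev=3)
print(encoded, len(encoded))                 # 1001010010101000 16
print(decode_group(encoded, prev=3))         # [5, 7, 4, 4]
```

In this example four 8-bit pixels (32 bits) shrink to 16 bits, and the round trip shows that truncating each difference to the shared width loses no information.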
- the compression length of the compressed data can also be calculated, and the compression length can represent all the bits occupied by the compressed data, for example, refer to the sum of the bits of each data contained in the compressed data shown in FIG. 15.
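The length calculation can be sketched as a simple sum in Python. The function name `compressed_length` and the convention of passing `None` for an all-zero group are hypothetical; the field sizes (1-bit flag, 3-bit length indicator, one sign bit per difference) follow the example in the text.

```python
def compressed_length(group_nbits, group_size=4, len_field_bits=3):
    """Total bits of the compressed data for one data unit.

    group_nbits: one entry per group, None for an all-zero group,
                 otherwise the stored bit count in [1, 8].
    """
    total = 0
    for nbits in group_nbits:
        total += 1                                   # all-zero / non-all-zero flag
        if nbits is not None:
            total += len_field_bits                  # bit number indicator
            total += group_size * (nbits + 1)        # differences with sign bits
    return total


print(compressed_length([None, None]))  # 2
print(compressed_length([None, 2]))     # 17
print(compressed_length([8, 8]))        # 80
```

Under these assumptions the worst case is 80 bits for a data unit whose 8 raw pixels occupy 64 bits, so compression can expand some data units; the exact criterion the system uses to decide on a bypass operation is not given in this passage.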
- the obtained compressed data can be cached in the compressed feature map cache module after length alignment, for the subsequent output process.
- if the compression header generation module determines that a bypass operation is not needed, the compressed data is written to the external memory; otherwise, the original feature map is read again and replaces the compressed data cached in the compressed feature map cache module, and the original feature map is then written into the external memory.
- the second data unit located after the first data unit is read immediately, and a similar compression operation is performed.
- the first data unit and the second data unit may be two adjacent data units that are sequentially compressed in time by the compression path.
- starting the compression process for the second data unit includes: judging whether the data of the second data unit is all zeros. That is to say, while the compressed first data unit is being written into the external memory, the compression process for the second data unit starts with determining whether its data is all zeros. In this way, pipelined compression of multiple data units can be realized, and the resource utilization rate can be improved.
- the difference algorithm compression module compresses the difference. And when the difference algorithm compression module compresses the difference of the data in the first data unit, the scan coding module starts to determine whether the two groups in the second data unit are all zeros.
- different modules may be performing compression processing for different data units. In this way, resource utilization can be further improved, and the efficiency of data unit compression by the compression path can be improved.
- the first data unit and the second data unit herein may be two adjacent data units to be processed in a register (as shown in FIG. 12, in which data of the smallest access unit size of the temporary storage memory is stored).
- the data compression module in the embodiment of the present invention may include a scan coding module and a difference algorithm compression module. This is a multi-stage compression pipeline design that reduces the combinational logic of each stage, so that the generated circuit can support a higher clock frequency, improving chip performance.
- the scan encoding module and the difference algorithm compression module have the circuit design structures shown in FIG. 12 and FIG. 14, respectively, so that one compression path can compress 8 pixel data at a time.
- the process of performing compression by each compression path may be as shown in FIG. 16.
- the steps after reading the feature map data in FIG. 16 are performed by the difference compression model.
- the feature map data is read according to the smallest access unit of the memory; that is, feature map data of the smallest access unit size is read at a time, and after the compression of that data is completed, the feature map data of the next smallest access unit size is read, until all the feature map data indicated by the compression instruction has been read.
- the scan coding module can read a data unit from the feature map data of the smallest access unit size and judge whether the two groups included in the data unit are all zeros. If there is a group that is not all zeros, the original differences are calculated, where an original difference represents the difference between two adjacent pixels in the data unit, such as D1 to D8 above.
- the difference algorithm compression module can compress the original difference values (such as the above D1 to D8) to obtain the compressed difference values (such as the above d1 to d8), and calculate the compression length.
- a flag indicating whether the current data unit is the end of the current compression unit can also be output.
- the next data unit can be read from the feature map data of the smallest access unit size until the compression process of all pixels of the feature map data of the smallest access unit size is completed.
- the method for data compression and storage in the embodiment of the present invention takes into account both the zeros in the feature map and the fact that the values of adjacent pixels of the feature map data are close, and compresses using the difference method. This makes the compressed data occupy less storage space, which on the one hand reduces the space occupied in the external memory and on the other hand reduces the bandwidth consumed during reading and writing, saving power.
- the device may include: a receiving device 210, a dividing device 220, a compression device 230, and a storage device 240.
- the receiving device 210 is configured to receive feature map data to be stored
- the dividing device 220 is configured to divide the feature map data into multiple data units
- the compression device 230 is configured to, for each data unit of the multiple data units, determine whether the data in the data unit is all zeros, and perform compression according to the result of the determination;
- the storage device 240 is configured to store the compressed feature map data.
- the compression device 230 compresses a data unit through the following process: divide the data unit into one or more groups; if the data in a first group of the multiple groups is all zeros, the compressed data is 0; if the data of a second group in the multiple groups is not all zeros, determine multiple differences between the data in the second group and compress according to the multiple differences.
- the compression device 230 is configured to: determine the difference between the first data in the second group and the last data before the second group, and determine the difference between each other data in the second group and the data immediately preceding it.
- the compression device 230 is configured to: determine a plurality of first non-negative numbers in one-to-one correspondence with the multiple differences; determine the number of bits required for storage according to the position of the highest non-zero bit among the first non-negative numbers; and compress the multiple differences according to their sign bits and the number of bits, wherein the compressed length of each difference is the number of bits plus one.
- the compression device 230 is configured to: determine the number of bits required for storage by performing a bitwise OR operation on a plurality of first non-negative numbers.
- the compression device 230 is configured to: if the sign bit of a first difference value indicates that it is non-negative, determine the first non-negative number corresponding to the first difference value as the number obtained by removing the sign bit of the first difference value; and if the sign bit of the first difference value indicates that it is negative, determine the first non-negative number as the first difference value with its sign bit removed and the remaining bits inverted.
- the data stored after compressing the second group includes: a non-all zero indicator, a bit number indicator, and multiple difference values after compression.
- the bit number indicator represents the length of the compressed multiple differences after removing the sign bit.
- the non-all zero indicator is 1.
- the compression device 230 is further configured to: generate compression header information corresponding to the compressed data unit; wherein the storage device 240 is configured to: store the compressed data unit together with the corresponding compression header information in the external memory.
- the compression device 230 is further configured to: determine whether a bypass operation needs to be performed according to the compression header information; if it is determined that the bypass operation needs to be performed, generate bypass compression header information corresponding to the bypass operation.
- the storage device 240 is configured to store uncompressed feature map data and bypass compression header information in an external memory.
- it further includes a reading device configured to: receive a compression instruction; send a feature map read command according to the compression instruction, so as to obtain feature map data corresponding to the read feature map command from the on-chip memory.
- the read feature map command includes the width and height of the feature map data, and the base address of the on-chip memory.
- the receiving device 210 is configured to receive feature map data consistent with the size of the minimum access unit.
- the dividing device 220 is configured to divide the feature map data into multiple data units according to the minimum access unit of the memory, wherein all data in one data unit are located in the same minimum access unit.
- if the storage space required for a row of the feature map data is greater than the minimum access unit, the data located in the same minimum access unit belongs to the same row of the feature map data. If the storage space required for one row of feature map data is less than the minimum access unit, the data belonging to the same row of feature map data is located in the same minimum access unit.
- the feature map data to be stored is the output of the convolutional layer in the neural network.
- the compression device 230 is configured to: while storing the compressed first data unit, start the compression process for the second data unit.
- the first data unit and the second data unit are data units that are compressed sequentially in time.
- the compression device 230 is configured to start the compression process of the second data unit by determining whether the data of the second data unit is all zeros.
- the device shown in FIG. 17 can be used to implement the data storage method shown in FIG. 8. In order to avoid repetition, it will not be repeated here.
- the device shown in FIG. 17 can be any one of the at least two compression paths. It is understandable that the device shown in FIG. 17 is only schematic, and it can also be implemented as the various modules shown in FIG. 7.
- the system for data compression storage in the embodiment of the present invention can be implemented on a processor, for example, it can be a processor of various devices such as a computer, a server, a workstation, a mobile terminal, and a pan/tilt.
- the original feature map may be received or obtained by the processor from other devices, or generated by the processor in the process of executing other operations or algorithms.
- the processor may generate the original feature map in the process of executing a convolutional neural network.
- an embodiment of the present invention also provides a processor.
- the processor may include an on-chip memory and the system as shown in FIG. 3.
- the processor may include on-chip memory and the device as shown in FIG. 17.
- the processor may include a central processing unit (CPU) or other forms of processing units with data processing capabilities and/or instruction execution capabilities, such as a Field-Programmable Gate Array (FPGA) or an Advanced RISC (Reduced Instruction Set Computer) Machine (ARM), and the processor may include other components to perform various desired functions.
- unless otherwise indicated, a feature map refers to the data before compression by the system of the embodiment of the present invention. It can have two dimensions, width and height, or alternatively three dimensions, width, height, and channel.
- the embodiment of the present invention also provides a computer storage medium on which a computer program is stored.
- when the computer program is executed by the processor, the steps of the data storage method shown above can be realized.
- the computer storage medium is a computer-readable storage medium.
- the computer or the processor executes the steps of the method shown in FIG. 4 or FIG. 8.
- the computer or the processor executes the following steps: receiving the feature map data to be stored; dividing the feature map data into multiple data units; for each data unit of the multiple data units, judging whether the data in the data unit is all zeros and compressing the data according to the judgment result; and storing the compressed feature map data.
- the computer storage medium may include, for example, the memory card of a smart phone, the storage component of a tablet computer, the hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disk read-only memory ( CD-ROM), USB memory, or any combination of the above storage media.
- the computer-readable storage medium may be any combination of one or more computer-readable storage media.
- an embodiment of the present invention also provides a computer program product, which contains instructions, which when executed by a computer, cause the computer to execute the steps of the data storage method shown in FIG. 4 or FIG. 8.
- when the instructions are executed by the computer, the computer is caused to: receive the feature map data to be stored; divide the feature map data into a plurality of data units; for each data unit of the plurality of data units, judge whether the data in the data unit is all zeros and compress according to the judgment result; and store the compressed feature map data.
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- the computer instructions may be transmitted from a website, computer, server, or data center.
- the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
- the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), etc.
- the method for data compression and storage in the embodiment of the present invention takes into account both the zeros in the feature map and the fact that the values of adjacent pixels of the feature map data are close, and compresses using the difference method. This makes the compressed data occupy less storage space, which on the one hand reduces the space occupied in the external memory and on the other hand reduces the bandwidth consumed during reading and writing, saving power.
- the disclosed system, device, and method may be implemented in other ways.
- the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present application may be integrated into one processor, or each unit may exist alone physically, or two or more units may be integrated into one unit.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Memory System (AREA)
Abstract
A data compression storage system and method, a processor, and a computer storage medium, used for compressing a feature map in an on-chip memory and then storing it in an external memory. The system comprises: a compression command generating module for distributing compression commands to each of at least two compression paths; the compression paths, each used for reading a corresponding original feature map from the on-chip memory on the basis of the compression command and compressing it; a read arbitration module for performing arbitration on read feature map commands of the at least two compression paths; and a write arbitration module for performing arbitration on write requests of the at least two compression paths. Thus, the system enables compressed data to occupy less storage space, reducing the space occupied in the external memory and also reducing the bandwidth consumed during reads and writes, saving power. In addition, the data compression is performed in parallel by at least two compression paths, increasing compression efficiency.
Description
The embodiments of the present invention relate to the field of data processing, and more specifically, to a system, method, processor, and computer storage medium for data compression storage.
In more and more scenarios, a large amount of data needs to be stored. To make full use of the storage space of a memory and store more data, the data is generally compressed before being stored.
However, current data compression methods achieve only a small amount of compression; that is, even the compressed data occupies a large storage space. Moreover, for larger compressed data, reading from and writing to the memory consumes more bandwidth resources.
Summary of the invention
The embodiments of the present invention provide a system, method, processor, and computer storage medium for data compression storage, which can compress the feature map in the on-chip memory and then store it in the external memory, reducing the storage space and the bandwidth resources consumed during reading and writing.
In a first aspect, a system for data compression storage is provided. The system is used to compress a feature map in an on-chip memory and then store it in an external memory. The system includes a compression instruction generation module, a read arbitration module, at least two compression paths, and a write arbitration module:
The compression instruction generation module is configured to distribute the compression instruction to each of the at least two compression paths;
Each of the at least two compression paths is configured to read the corresponding original feature map from the on-chip memory according to the compression instruction received from the compression instruction generation module, and to compress the original feature map that was read;
The read arbitration module is configured to arbitrate the read feature map commands of the at least two compression paths for the original feature map in the on-chip memory;
The write arbitration module is configured to arbitrate the write requests of the at least two compression paths to write compressed data into the external memory.
In a second aspect, a method for data compression storage is provided. The method is used to compress a feature map in an on-chip memory and then store it in an external memory. The method includes:
Distributing the compression instruction to each of the at least two compression paths;
Each compression path reading the corresponding original feature map from the on-chip memory according to the received compression instruction, and compressing the original feature map that was read;
Storing the compressed feature map in the external memory;
Wherein, when the at least two compression paths read the original feature map in the on-chip memory, the read feature map commands of the at least two compression paths are arbitrated;
Wherein, when the at least two compression paths write the compressed feature map into the external memory, the write requests of the at least two compression paths are arbitrated.
In a third aspect, a processor is provided, including:
An on-chip memory, and
The system for data compression storage described in the first aspect above.
In a fourth aspect, a computer storage medium is provided, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the method described in the first aspect are implemented.
It can be seen that the system for data compression storage of the embodiment of the present invention can compress the feature map in the on-chip memory and then store it in the external memory, so that the compressed data occupies less storage space. On the one hand, this reduces the space occupied in the external memory; on the other hand, it reduces the bandwidth resources consumed during reading and writing and saves power. In addition, the data compression in the embodiment of the present invention is performed in parallel by at least two compression paths, which also improves the efficiency of compression.
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of data storage according to an embodiment of the present invention.
FIG. 2 is a schematic block diagram of a system for data compression storage according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the modules of a system for data compression storage according to an embodiment of the present invention.
FIG. 4 is a schematic flowchart of compression performed by the system for data compression storage according to an embodiment of the present invention.
FIG. 5 is another schematic flowchart of compression performed by the system for data compression storage according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of a state machine of the system for data compression storage according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of the modules of one compression path of the system for data compression storage according to an embodiment of the present invention.
FIG. 8 is a schematic flowchart of a data storage method according to an embodiment of the present invention.
FIG. 9 is a schematic diagram of a minimum access unit storing a feature map according to an embodiment of the present invention.
FIG. 10 is another schematic diagram of a minimum access unit storing a feature map according to an embodiment of the present invention.
FIG. 11 is a schematic diagram of calculating multiple differences for one data unit according to an embodiment of the present invention.
FIG. 12 is a schematic structural diagram of a scan coding module according to an embodiment of the present invention.
FIG. 13 is a schematic diagram of fetching data units from a minimum access unit according to an embodiment of the present invention.
FIG. 14 is a schematic structural diagram of a difference algorithm compression module according to an embodiment of the present invention.
FIG. 15 is a schematic diagram of several cases of the compressed data of one data unit according to an embodiment of the present invention.
FIG. 16 is a schematic flowchart of the compression process performed by one compression path according to an embodiment of the present invention.
FIG. 17 is a schematic block diagram of an apparatus for data storage according to an embodiment of the present invention.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention, without creative effort, fall within the protection scope of the present invention.
With the development of artificial intelligence technology, algorithms such as deep learning are used in more and more fields. One of the cores of deep learning is the neural network, for example the convolutional neural network (CNN). The computation of a convolutional neural network produces a large amount of feature map data. When this feature map data is written to the external memory of the processor, data compression is usually applied, which reduces the space occupied in the external memory and the bandwidth needed for reads and writes. The external memory may be, for example, Double Data Rate Synchronous Dynamic Random Access Memory, or DDR for short.
However, a convolutional neural network generally includes a large number of convolutional layers, and each convolutional layer produces a large amount of feature map data. Reading and writing this data to and from DDR consumes valuable external memory bandwidth, so other bandwidth-hungry modules (such as the CNN itself or other modules) cannot access DDR quickly and their computing performance suffers. Moreover, the increased number of DDR accesses leads to higher power consumption.
In order to further reduce the amount of compressed data, and thereby reduce the bandwidth used when reading and writing DDR and the resulting power consumption, the embodiments of the present invention provide a system and a method for data compression storage. Specifically, the feature map data computed by the convolutional neural network may reside in on-chip memory; the embodiments of the present invention aim to compress the feature map data in the on-chip memory and store the compressed data in the external memory. The process is illustrated in FIG. 1, where the on-chip memory is assumed to be on-chip SRAM and the external memory is assumed to be DDR. The compression system reads the feature map data from the on-chip memory, compresses it, and stores the compressed feature map data in the external memory.
In the embodiments of the present invention, the system for data compression storage includes at least a compression instruction generation module, a read arbitration module, at least two compression paths, and a write arbitration module, as shown in FIG. 2.
It should be understood that the number of compression paths in the embodiments of the present invention is at least two; it may be, for example, three or more, and can be configured according to the performance of the processor and the size of the feature map data to be processed. In this way, the number of compression paths can be configured flexibly according to the output rate of the feature map data, so as to meet the performance requirements of different processing tasks and improve compression performance.
To simplify the illustration, only two compression paths are shown in FIG. 2, namely compression path 1 and compression path 2.
The compression instruction generation module can be used to distribute compression instructions to the compression paths. A compression path can read the corresponding feature map data from the on-chip memory according to its compression instruction and then compress the data it has read. The read arbitration module can arbitrate among the read feature map commands that the at least two compression paths issue for feature map data in the on-chip memory. The write arbitration module can arbitrate among the write requests with which the at least two compression paths write compressed data into the external memory.
The compression instruction generation module, which can be denoted the ENC_INSTR_PROC module, can receive a compression instruction, parse it, and distribute the parsed compression instruction to the compression paths. Specifically, when the on-chip memory holds feature map data that needs to be compressed and stored, the processor may send a compression instruction to the compression instruction generation module. After receiving the compression instruction, the module distributes a corresponding compression instruction to each compression path, so that each compression path reads feature map data from the on-chip memory and compresses it. The compression instruction distributed to a given compression path may include: the number of feature maps to be compressed by that path; the feature map width; the feature map height; the base address of these feature maps in the on-chip memory; the inter-map storage interval of these feature maps in the on-chip memory; the base address in the external memory to which the compressed feature maps are output; the inter-map storage interval of the compressed feature maps in the external memory; the base address in the external memory to which the compression header information of the compressed feature maps is output; and the header information storage interval of that compression header information in the external memory.
The read arbitration module can be denoted the FM_RD_ARB module. Referring to FIG. 3, the system may further include read command buffer modules (denoted RD_CMD_FIFO) and a read data path identifier buffer module (denoted RDATA_ID_FIFO), all connected to the read arbitration module. The read arbitration module collects the read feature map commands issued by the compression paths; that is, the read feature map commands from all compression paths converge here. The read feature map commands of each compression path can be buffered in that path's own read command buffer module. The read arbitration module arbitrates among the read feature map commands in the read command buffer modules according to an arbitration rule and obtains an arbitration result. If the arbitration result indicates that a first compression path wins, the read feature map command of the first compression path is sent to the on-chip memory first, and the path identifier (ID) of the first compression path is stored in the read data path identifier buffer module. When the feature map data is later returned from the on-chip memory, it is forwarded to the compression path corresponding to the path identifier (ID) stored in the read data path identifier buffer module. Optionally, the arbitration rule may be a priority mechanism or a fair round-robin mechanism configured by the compression instruction, or another arbitration mechanism not listed here. It can be understood that, if the arbitration rule is a priority mechanism, the compression processing of a high-priority compression path is served first, which safeguards the task performance of that path.
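The read arbitration described above can be illustrated with a small behavioural model. This is only a sketch of one possible software analogue, not the patented hardware: the class name `ReadArbiter`, its method names, the "lower value wins" priority convention, and the assumption that the on-chip memory returns data in the order the commands were issued are all hypothetical.

```python
from collections import deque


class ReadArbiter:
    """Behavioural model of FM_RD_ARB (hypothetical, for illustration only)."""

    def __init__(self, num_paths, priorities=None):
        # One RD_CMD_FIFO per compression path.
        self.cmd_fifos = [deque() for _ in range(num_paths)]
        # RDATA_ID_FIFO: path IDs in the order their commands were issued.
        self.id_fifo = deque()
        # priorities=None selects fair round-robin; otherwise the path with
        # the lowest priority value wins (an assumed convention).
        self.priorities = priorities
        self.next_rr = 0

    def push_command(self, path_id, command):
        """Buffer a read feature map command from one compression path."""
        self.cmd_fifos[path_id].append(command)

    def arbitrate(self):
        """Pick one pending command; return (path_id, command) or None."""
        pending = [i for i, fifo in enumerate(self.cmd_fifos) if fifo]
        if not pending:
            return None
        if self.priorities is not None:
            winner = min(pending, key=lambda i: self.priorities[i])
        else:
            n = len(self.cmd_fifos)
            # Closest pending path at or after the round-robin pointer.
            winner = min(pending, key=lambda i: (i - self.next_rr) % n)
            self.next_rr = (winner + 1) % n
        self.id_fifo.append(winner)  # remember where the data must go
        return winner, self.cmd_fifos[winner].popleft()

    def route_read_data(self, data):
        """Route returned data to the path recorded in RDATA_ID_FIFO."""
        return self.id_fifo.popleft(), data
```

Passing `priorities=[1, 0]` would model a configuration in which compression path 2 is always served before compression path 1; leaving it unset models the fair round-robin alternative.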
The write arbitration module, which can be denoted the FM_WR_ARB module, collects the write requests issued by the compression paths; that is, the write requests from all compression paths converge here. It arbitrates among the write requests according to an arbitration rule and obtains an arbitration result. If the arbitration result indicates that a second compression path wins, the write request of the second compression path is sent to the external memory first, that is, the compressed data produced by the second compression path is stored in the external memory. Optionally, the arbitration rule may be a priority mechanism or a fair round-robin mechanism configured by the compression instruction, or another arbitration mechanism not listed here.
The compression path module can be denoted the ENC_PATH module. Referring to FIG. 3, each compression path module may include a feature map read module, a feature map buffer module, a data compression module, a data packing module, a compression header generation module, a length alignment module, a compression header buffer module, a compressed feature map buffer module, a compression header write module, and a compressed feature map write module.
The feature map read module, which can be denoted the RD_FM module, sends read feature map commands for the original feature maps in the on-chip memory according to the compression instruction received from the compression instruction generation module. A read feature map command may include the width and height of the original feature map to be read, its base address in the on-chip memory, and so on. Optionally, when a bypass operation needs to be performed, the feature map read module is also used to re-read the original feature map of the current compression from the on-chip memory.
The feature map buffer module, which can be denoted the SRC_FM_FIFO module, stores the original feature maps read back from the on-chip memory.
The data compression module divides the original feature map in the feature map buffer module into multiple data units and performs difference compression on each of them. By way of example, the data compression module may include a scan coding module, denoted the SCAN_DPCM module, and a difference algorithm compression module, denoted the RES_ENC module. The data compression module is described in more detail below with reference to FIG. 7 to FIG. 17.
The data packing module, which can be denoted the DATA_PACK module, splices the data produced by the data compression module into complete compressed data. Specifically, it splices the fragmented output of the data compression module into complete data, for example into units of 16 bytes.
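The splicing performed by the data packing module can be pictured with the following minimal sketch. It works at byte granularity for readability, whereas a hardware packer would typically operate on bit fields; the function name `pack_fragments` and its return convention are assumptions made for this illustration and are not taken from the patent.

```python
def pack_fragments(fragments, unit=16):
    """Splice variable-length fragments into fixed-size units.

    Returns (units, remainder): the list of complete `unit`-byte blocks and
    the trailing bytes that do not yet fill a complete block.
    """
    buffer = bytearray()
    units = []
    for fragment in fragments:
        buffer.extend(fragment)
        # Emit every complete unit as soon as enough bytes have accumulated.
        while len(buffer) >= unit:
            units.append(bytes(buffer[:unit]))
            del buffer[:unit]
    return units, bytes(buffer)
```

For instance, three fragments of 10, 10 and 15 bytes would yield two complete 16-byte units and a 3-byte remainder that is carried over until more data arrives or the data unit ends.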
The length alignment module, which can be denoted the LEN_ALIGN module, pads the compressed data spliced by the data packing module to a specific length. By way of example, when a bypass operation needs to be performed, it is also used to pad the original feature map to the specific length. In other words, it pads the data to be output to a specific length. The specific length is related to the performance of the external memory chip; that is, it may be preset according to the performance of the external memory chip. By way of example, at the end of the current data unit, the compressed data can be padded with invalid data up to a certain length: a compressed length of N×16 B is padded to ceil(N/4)×64 B, where ceil denotes rounding up and N is a positive integer. It can be understood that some external memory chips (such as DDR) only work efficiently when the written data reaches a certain length; by providing the length alignment module, the embodiments of the present invention let the external memory work more efficiently and thereby improve the performance of the whole system.
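The padding rule can be checked with a few lines of arithmetic. The sketch below assumes the example values from the text (16-byte units padded to a 64-byte boundary); the function names and the choice of zero as the filler byte are hypothetical.

```python
UNIT_BYTES = 16   # packing granularity used in the example in the text
ALIGN_BYTES = 64  # example alignment target used in the text


def aligned_length(n_units):
    """Length in bytes after padding N x 16 B up to ceil(N / 4) x 64 B."""
    units_per_block = ALIGN_BYTES // UNIT_BYTES  # 4
    blocks = -(-n_units // units_per_block)      # integer ceiling division
    return blocks * ALIGN_BYTES


def pad_to_alignment(payload, fill=0):
    """Append invalid filler bytes up to the aligned length."""
    n_units = -(-len(payload) // UNIT_BYTES)
    target = aligned_length(n_units)
    return payload + bytes([fill]) * (target - len(payload))
```

With these values, N = 1 to 4 gives 64 B, N = 5 gives 128 B, and a payload that is already a multiple of 64 B is left unchanged.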
The compression header generation module, which can be denoted the ENC_HDR_GEN module, generates the compression header information corresponding to the compressed data produced by the data compression module, according to the compression instruction received from the compression instruction generation module. Specifically, the compression header information can be generated from the address information in the compression instruction, the feature map size information, the length of the compression result in the current clock cycle, whether the current data unit has ended, and similar information. On the one hand, the generated compression header information can be used to determine whether the current data unit needs to be bypassed; on the other hand, it is used later to decompress the compressed data. It can be understood that determining whether a bypass is needed based on the compression header information is optional; that is, the compressed data and the compression header information can be stored without making this determination.
The compression header buffer module, which can be denoted the ENC_HDR_FIFO module, buffers the compression header information generated by the compression header generation module and awaiting output.
The compressed feature map buffer module, which can be denoted the ENC_FM_FIFO module, buffers the data awaiting output. The buffered data may be the length-padded compressed data, or, in a bypass operation, the length-padded original feature map read back from the on-chip memory.
The compression header write module, which can be denoted the ENC_HDR_WR module, performs the write operation for the compression header information.
The compressed feature map write module, which can be denoted the ENC_FM_WR module, performs the data storage operation; specifically, it writes the compressed data in the compressed feature map buffer module, or the original feature map read back from the on-chip memory during a bypass operation, into the external memory.
If the compressed data is larger than the original data, that is, if the compressed data would occupy more storage space than the original data, storing the compressed data is unreasonable; a bypass operation is therefore performed and the original data is stored instead. The workflow of a bypass operation can be summarized as follows:
a. The bypass information of this bypass operation is recorded in the compression header information for use during decompression. It can be understood that the compression header information for compressed data and the compression header information for original data stored by a bypass operation may carry different compression identifiers. For example, a first compression identifier indicates compressed data and a second compression identifier indicates original data.
b. The compressed feature map write module, that is, the ENC_FM_WR module, records the base address of this write operation, that is, the address in the external memory that is about to be written.
c. The modules of the current compression path that are currently working are reset. For example, these may include the data compression module.
d. The feature map read module, that is, the RD_FM module, re-sends the read feature map command, thereby restarting the read of the original feature map of the current compression unit. That is, the original feature map is read from the on-chip memory again, and the data read can be stored in the feature map buffer module.
e. After the bypass mechanism has read the original feature map, the data does not pass through the data compression module or the data packing module; it goes directly from the feature map buffer module to the length alignment module, which is reused to align the output length.
f. The compressed feature map write module overwrites the previously obtained compressed data with the original feature map and outputs the original feature map to the external memory.
Thus, by providing a bypass mechanism, the embodiments of the present invention ensure that less storage space is occupied in the external memory.
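The decision at the heart of the bypass mechanism can be sketched as follows. The identifier values `COMPRESSED_ID` and `BYPASS_ID`, the dictionary used as a stand-in for the compression header, and the function name are hypothetical; the text only states that two distinct compression identifiers may be used and that the bypass is taken when the compressed data is larger than the original.

```python
COMPRESSED_ID = 0  # hypothetical value of the "first compression identifier"
BYPASS_ID = 1      # hypothetical value of the "second compression identifier"


def choose_output(original, compressed):
    """Return (header, payload) for one data unit.

    The original data is kept only when compression made the data larger,
    mirroring the bypass condition described in the text.
    """
    if len(compressed) > len(original):
        payload, identifier = original, BYPASS_ID
    else:
        payload, identifier = compressed, COMPRESSED_ID
    # The header records which form was stored so that decompression can
    # tell compressed data from bypassed original data.
    header = {"compress_id": identifier, "length": len(payload)}
    return header, payload
```

In the hardware flow the original data is re-read from the on-chip memory rather than kept alongside the compressed result; the sketch passes both in only to keep the comparison explicit.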
With the system for data compression storage of the embodiments of the present invention, the feature map data produced in the processor, for example by a convolutional neural network, can be compressed and then stored in the external memory.
By way of example, the system shown in FIG. 3 can execute a method for data compression storage. A schematic flowchart of the method is shown in FIG. 4 and includes:
S101: distributing compression instructions to each of the at least two compression paths;
S102: each compression path reading the corresponding original feature map from the on-chip memory according to the compression instruction it received, and compressing the original feature map it has read;
S103: storing the compressed feature map into the external memory;
wherein, when the at least two compression paths read original feature maps from the on-chip memory, the read feature map commands of the at least two compression paths are arbitrated, and when the at least two compression paths write compressed feature maps into the external memory, the write requests of the at least two compression paths are arbitrated.
Alternatively, the compression process performed by the system shown in FIG. 3 is shown in more detail in FIG. 5.
By way of example, the compression instruction generation module can receive a compression instruction, parse it, and then distribute compression instructions to the compression paths (PATH). The received compression instruction may include information describing the compression task to be performed by each compression path and the priority of each compression path; after the compression instruction has been parsed, the task and the priority of each compression path can be configured according to the parsing result. Each compression path then performs compression according to the compression instruction it received. Specifically, it reads feature map data from the on-chip memory, compresses it, and then computes the compression information (such as compression header information including the length). Whether to perform a bypass operation is determined from the compression information: if the compressed data is longer than the original feature map data, the original feature map data is read again. Once the data to be output for storage has been determined (the compressed data, or the original feature map data in a bypass operation), its length is aligned and the compression result is written out. The compression result written out includes not only the length-padded compressed data, or the original feature map data in a bypass operation, but also the compression header information. If every compression path has completed the compression storage process, the flow for this compression instruction ends; otherwise, the system waits for the unfinished compression paths to continue.
It can be seen that the system for data compression storage of the embodiments of the present invention can receive, process, and distribute compression instructions, and can monitor and report completion. The workflow shown in FIG. 5 is clear and achieves compression and storage of the feature map data.
In addition, the system for data compression storage in the embodiments of the present invention may have several different states, including but not limited to an idle state, a receive instruction state, a parse instruction state, and a wait-for-completion state. By way of example, state switching can be implemented according to the state machine shown in FIG. 6.
The idle state can be denoted the IDLE state. In this state the system waits for the compression instruction start signal and, after receiving it, switches to the receive instruction state. The compression instruction start signal can be denoted instr_strt.
The receive instruction state can be denoted the RCV_INSTR state. In this state the system receives the compression instruction until reception is complete. After reception is complete, it can output an instruction ready signal and, at the same time as or after outputting that signal, switch to the parse instruction state. The instruction ready signal can be denoted instr_rdy.
The parse instruction state can be denoted the PROC_INSTR state. In this state the system parses the compression instruction received in the receive instruction state and distributes compression instructions to the compression paths according to the parsing result. The compression instruction distributed to each compression path can be denoted instr_isu. Specifically, the compression instruction distributed to a given compression path may include: the number of feature maps to be compressed by that path; the feature map width; the feature map height; the base address of these feature maps in the on-chip memory; the inter-map storage interval of these feature maps in the on-chip memory; the base address in the external memory to which the compressed feature maps are output; the inter-map storage interval of the compressed feature maps in the external memory; the base address in the external memory to which the compression header information of the compressed feature maps is output; and the header information storage interval of that compression header information in the external memory.
Taking the compression instruction distributed to compression path 1 as an example, the instruction information it contains may include: (1) FM_NUM, the number of feature maps that compression path 1 needs to compress; (2) FM_WIDTH, the width of the feature maps that compression path 1 needs to compress; (3) FM_HIGHT, the height of the feature maps that compression path 1 needs to compress; (4) FM_SRAM_BADDR, the base address in the on-chip memory of the feature maps that compression path 1 needs to compress; (5) FM_SRAM_LEN, the inter-map storage interval in the on-chip memory of the feature maps that compression path 1 needs to compress; (6) FM_DDR_BADDR, the base address in the external memory to which the feature maps compressed by compression path 1 are output; (7) FM_DDR_LEN, the inter-map storage interval in the external memory of the feature maps compressed by compression path 1; (8) FM_HDR_BADDR, the base address in the external memory to which the compression header information of the feature maps compressed by compression path 1 is output; (9) FM_HDR_LEN, the header information storage interval in the external memory of that compression header information.
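The nine fields listed above can be gathered into a simple record. The sketch below is illustrative only: the class name `PathInstruction` and the three address helpers are hypothetical, and the helpers assume that each "storage interval" acts as a fixed stride between consecutive feature maps (or headers), which the text suggests but does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathInstruction:
    """Per-path compression instruction; field names follow the text."""
    fm_num: int         # FM_NUM: number of feature maps to compress
    fm_width: int       # FM_WIDTH: feature map width
    fm_hight: int       # FM_HIGHT: feature map height (spelling as in the text)
    fm_sram_baddr: int  # FM_SRAM_BADDR: source base address in on-chip memory
    fm_sram_len: int    # FM_SRAM_LEN: inter-map interval in on-chip memory
    fm_ddr_baddr: int   # FM_DDR_BADDR: output base address in external memory
    fm_ddr_len: int     # FM_DDR_LEN: inter-map interval in external memory
    fm_hdr_baddr: int   # FM_HDR_BADDR: header base address in external memory
    fm_hdr_len: int     # FM_HDR_LEN: header interval in external memory

    # Assumption: the k-th item sits at base + k * interval.
    def sram_address(self, index):
        return self.fm_sram_baddr + index * self.fm_sram_len

    def ddr_address(self, index):
        return self.fm_ddr_baddr + index * self.fm_ddr_len

    def hdr_address(self, index):
        return self.fm_hdr_baddr + index * self.fm_hdr_len
```

Under the stride assumption, a path configured with FM_SRAM_BADDR = 0x100 and FM_SRAM_LEN = 0x40 would read its third feature map starting at 0x180.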
The wait-for-completion state may be denoted the WAIT_DONE state. In this state, the system may monitor the completion signal of each compression path and, once all compression paths are observed to have completed, switch to the idle state. Exemplarily, after the compression paths have completed, an instruction completion signal, which may be denoted instr_done, may be output to the upper-level module.
It can be seen that, by defining a state machine for the data compression storage system, the embodiment of the present invention ensures normal operation of the system and safe, orderly storage of the feature map data.
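The state sequence can be sketched as a small transition function. Only PROC_INSTR, WAIT_DONE and instr_done are names taken from the text; the names `IDLE` and `RECV_INSTR`, the `instr_valid` input, and the one-step transitions out of the first three states are assumptions chosen for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # hypothetical name for the idle state
    RECV_INSTR = auto()  # hypothetical name for the receive-instruction state
    PROC_INSTR = auto()  # parse the instruction and distribute instr_isu
    WAIT_DONE = auto()   # wait until every compression path has finished


def next_state(state, instr_valid=False, paths_done=()):
    """Return (next_state, instr_done) for one step of the sketch."""
    if state is State.IDLE:
        return (State.RECV_INSTR if instr_valid else State.IDLE), False
    if state is State.RECV_INSTR:
        return State.PROC_INSTR, False
    if state is State.PROC_INSTR:
        return State.WAIT_DONE, False
    # WAIT_DONE: leave only when all paths report completion
    if paths_done and all(paths_done):
        return State.IDLE, True
    return State.WAIT_DONE, False
```

With two paths, `next_state(State.WAIT_DONE, paths_done=(True, False))` stays in WAIT_DONE, and the machine returns to IDLE with instr_done asserted only when both entries are true.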
For the system shown in FIG. 3, how each compression path performs data compression is described below with reference to FIG. 7 to FIG. 17. It can be understood that, because each compression path compresses feature map data in a similar way, the following compression process may apply to any compression path.
FIG. 7 is a schematic diagram of the modules of one compression path. The function of each module is as described above with reference to FIG. 3. In FIG. 7, the dashed box indicates the data compression module, which includes a scan coding module and a difference-algorithm compression module. The scan coding module may be denoted the SCAN_DPCM module, and the difference-algorithm compression module may be denoted the RES_ENC module. The scan coding module (SCAN_DPCM module) is the difference calculation module of a difference (Residual, RES) compression method; according to the compression performance, it scans (SCAN) the data to be compressed and takes out a certain amount of data for compression. It can be understood that the amount of data that can be compressed in each clock cycle reflects the compression performance of the compression path. The difference-algorithm compression module (RES_ENC module) is the data compression module of the difference compression method; according to the difference compression algorithm, it compresses the differences output by the scan coding module (SCAN_DPCM module), and it also outputs the length of the compression result of the current cycle and whether the current data unit has ended.
In the scenario where the feature map data is the output of a convolutional layer of a convolutional neural network, the values of two adjacent pixels of the feature map output by the convolutional layer are very close or even equal. This property can be fully exploited by compressing the differences between adjacent pixels.
FIG. 8 is a schematic flowchart of a data storage method according to an embodiment of the present invention. The method shown in FIG. 8 includes:
S110: receiving feature map data to be stored;
S120: dividing the feature map data into multiple data units;
S130: for each of the multiple data units, determining whether the data in the data unit is all zeros, and compressing according to the result of the determination;
S140: storing the compressed feature map data.
Exemplarily, before S110 the method may further include: receiving a compression instruction; and sending a read-feature-map command according to the received compression instruction, so that in S110 the feature map data corresponding to the read-feature-map command is obtained from the on-chip memory. Specifically, after a compression path receives the compression instruction, it sends a read-feature-map command to the read arbitration module according to the instruction information in the compression instruction. Optionally, the read-feature-map command includes the width and height of the feature map to be read, its base address in the on-chip memory, and so on. In one embodiment, the compression instruction generation module reads the compression instruction from the on-chip memory.
Exemplarily, the feature map data read at one time by a read-feature-map command may correspond to the minimum access unit of the memory. Optionally, it can be understood that S110 receives feature map data corresponding to the minimum access unit of the memory. As described below, the size of the received feature map data may be equal to or smaller than the minimum access unit of the memory. As an example, the feature map data corresponding to one minimum access unit of the memory may be called a compression unit. Correspondingly, in S120 the feature map data corresponding to the minimum access unit is divided into multiple data units.
In this way, the width and the number of rows of the feature map are aligned to the minimum access unit of the memory, which both simplifies access to the feature map and makes efficient use of the memory read/write bandwidth.
Exemplarily, if the storage space required by one row of the feature map data is larger than the minimum access unit, the data located in the same minimum access unit belongs to the same row of the feature map data. If the storage space required by one row of the feature map data is smaller than the minimum access unit, the data belonging to the same row of the feature map data is located in the same minimum access unit. The storage space required by one row of the feature map data is determined by the width of the feature map and the data bit width of each pixel.
Specifically, assume that the minimum access unit of the memory is 32 bytes (32B for short) and that the data bit width of each pixel is 8 bits. The total storage length of each feature map is then aligned to 32B. Assuming the feature map width is fm_w, then, as shown in FIG. 9: (1) if fm_w >= 17, each row is aligned to 32B, each 32B stores at most 1 row of the feature map (one row may also need several 32B units), and the remaining invalid data may be padded with 0; (2) if fm_w <= 16, each row is aligned to 16B, each 32B stores at most 2 rows of the feature map, and the remaining invalid data is padded with 0. Exemplarily, for ease of understanding, a feature map with fm_w >= 17 may be defined as a large map, and a feature map with fm_w <= 16 as a small map.
As another example, assume that the minimum access unit of the memory is 64 bytes (64B for short) and that the data bit width of each pixel is 8 bits. The total storage length of each feature map is then aligned to 64B. Assuming the feature map width is fm_w, then, as shown in FIG. 10: (1) if fm_w >= 33, each row is aligned to 64B, each 64B stores at most 1 row of the feature map (one row may also need several 64B units), and the remaining invalid data may be padded with 0; (2) if fm_w is in [17, 32], each row is aligned to 32B, each 64B stores at most 2 rows of the feature map, and the remaining invalid data may be padded with 0; (3) if fm_w <= 16, each row is aligned to 16B, each 64B stores at most 4 rows of the feature map, and the remaining invalid data is padded with 0. Exemplarily, for ease of understanding, a feature map with fm_w >= 33 may be defined as a large map, a feature map with fm_w in [17, 32] as a medium map, and a feature map with fm_w <= 16 as a small map.
Those skilled in the art should understand that the minimum access unit of the memory may also be 16 bytes or another size, and that the data bit width of each pixel may also be 4 bits, 16 bits or another size; the storage form of the feature map can be determined similarly in each case and is not listed exhaustively here.
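The alignment rules of the 32B and 64B examples can be captured in one small function. This is a sketch under an assumed generalisation: the row stride is taken to be the smallest of 16B, 32B, 64B, ... that holds one row, capped at the minimum access unit, and the access unit is assumed to be a power of two of at least 16 bytes. The function name `row_layout` and its return convention are hypothetical.

```python
def row_layout(fm_w, unit_bytes=32, bits_per_pixel=8):
    """Return (row_stride_bytes, rows_per_unit) for one feature map row.

    rows_per_unit is 0 when a single row spans more than one access unit.
    Assumes unit_bytes is a power of two and at least 16.
    """
    row_bytes = (fm_w * bits_per_pixel + 7) // 8
    if row_bytes > unit_bytes:
        # Wide row: occupies a whole number of minimum access units.
        units = (row_bytes + unit_bytes - 1) // unit_bytes
        return units * unit_bytes, 0
    stride = 16
    while stride < row_bytes:
        stride *= 2  # 16B -> 32B -> 64B ...
    return stride, unit_bytes // stride
```

For 8-bit pixels this reproduces the cases above: `row_layout(16, 32)` gives a 16B stride with 2 rows per unit, and `row_layout(20, 64)` gives a 32B stride with 2 rows per unit.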
Subsequently, in S110 the feature map data to be stored that corresponds to the read-feature-map command may be received and temporarily held in the feature map cache module. It can be understood that what is received in S110 is the original feature map data before compression.
Exemplarily, S120 and S130 in FIG. 8 may be performed by the data compression module. In S120, a current compression unit may be set, for example one row of the feature map data or all of the feature map data. The current compression unit is then divided into multiple data units. As an example, one data unit may include 8 pixels, so that one compression path compresses a data unit of 8 pixels at a time. When the system shown in FIG. 3 performs parallel compression with at least two compression paths, each compression path can compress a data unit of 8 pixels at a time. This increases parallelism, which improves compression efficiency and throughput and also keeps compression from becoming the performance bottleneck of the system.
Exemplarily, in S130 one data unit may be compressed as follows: dividing the data unit into one or more groups; if the data of a first group among the groups is all zeros, the compressed data is 0; if the data of a second group among the groups is not all zeros, determining multiple differences among the data of the second group and compressing according to the multiple differences.
Here, the data of the first group being all zeros means that every datum of the first group is zero. The data of the second group being not all zeros means that at least one datum of the second group is non-zero.
Exemplarily, if the data of the second group is not all zeros, the multiple differences among the data of the second group refer to the differences between every two adjacent pixels.
If the data of the second group is not all zeros, determining the multiple differences among the data of the second group may include: determining the difference between the first datum of the second group and the last datum preceding the second group, and determining the difference between each other datum of the second group and the first datum of the second group. It can be understood that, if the second group includes n0 data, n0 differences are obtained. It should also be noted that the differences are signed values.
With reference to FIG. 7, in the embodiment of the present invention the scan coding module may: divide one data unit into one or more groups; determine whether the data in each group is all zeros; and, when the data in a group is not all zeros, calculate the multiple differences among the data of that group.
Assuming that one data unit has 8 pixels, the data unit may be divided into two groups of 4 pixels each. If the 8 pixels of a data unit are denoted {p1,p2,p3,p4,p5,p6,p7,p8}, the two groups are, in order, {p1,p2,p3,p4} and {p5,p6,p7,p8}. For the first group {p1,p2,p3,p4}, it is determined whether the values of the four pixels are all zero. If they are, this may be indicated by an all-zero indicator, for example a 1-bit "0". If the four pixel values are not all zero, that is, at least one pixel is non-zero, this may be indicated by a non-all-zero indicator, for example a 1-bit "1". A similar determination may be made for the second group {p5,p6,p7,p8}, yielding an all-zero or a non-all-zero indicator.
In this way, for one data unit {p1,p2,p3,p4,p5,p6,p7,p8}, determining whether each of the two groups is all zeros yields the indicator shown in Table I below.
Table I
Indicator | Meaning |
00 | Both groups of pixels are all zeros |
10 | The first group of pixels is not all zeros; the second group is all zeros |
01 | The first group of pixels is all zeros; the second group is not all zeros |
11 | Neither group of pixels is all zeros |
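The indicator of Table I can be computed with two all-zero checks. The following is a minimal sketch; the function name `group_flags` and the use of a two-character string for the indicator are choices made for illustration.

```python
def group_flags(unit):
    """Return the 2-bit indicator of Table I for a data unit of 8 pixels."""
    if len(unit) != 8:
        raise ValueError("a data unit holds 8 pixels in this example")
    flag1 = "1" if any(unit[:4]) else "0"  # group {p1,p2,p3,p4}
    flag2 = "1" if any(unit[4:]) else "0"  # group {p5,p6,p7,p8}
    return flag1 + flag2
```

For instance, `group_flags([0, 3, 0, 0, 0, 0, 0, 0])` yields "10": the first group is not all zeros and the second group is.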
Further, if the indicator is "10", the differences D1, D2, D3 and D4 may be calculated. If the indicator is "01", the differences D5, D6, D7 and D8 may be calculated. If the indicator is "11", the differences D1, D2, D3, D4, D5, D6, D7 and D8 may be calculated.
Specifically, D1 = p1 - p0; D2 = p2 - p1; D3 = p3 - p1; D4 = p4 - p1; and D5 = p5 - p4; D6 = p6 - p5; D7 = p7 - p5; D8 = p8 - p5.
Here, p0 denotes the last pixel of the data unit preceding this data unit, as shown in FIG. 11. It should be noted that if this data unit is at the start of the current compression unit, that is, there is no preceding data unit, p0 = 0 may be defined.
It should be noted that the differences obtained are signed numbers. For example, if each pixel in a data unit is an 8-bit signed number, each difference is a 9-bit signed number whose first bit is the sign bit: a sign bit of 0 indicates a non-negative number, and a sign bit of 1 indicates a negative number.
Exemplarily, a schematic structural diagram of the scan coding module (SCAN_DPCM module) in the embodiment of the present invention may be as shown in FIG. 12.
A register, which may be denoted SRORAGE_MIN_UNIT, temporarily holds one minimum access unit of the memory, which may include multiple data units.
Specifically, the scan coding module may divide the feature map data within a minimum access unit into multiple data units; that is, all data of one data unit lies within the same minimum access unit.
As described above for the minimum access unit with reference to FIG. 9 and FIG. 10, if the storage space required by one row of the feature map data is larger than the minimum access unit, the data located in the same minimum access unit belongs to the same row of the feature map data. If the storage space required by one row of the feature map data is smaller than the minimum access unit, the data belonging to the same row of the feature map data is located in the same minimum access unit.
SCAN_MUX may select one data unit from the register (SRORAGE_MIN_UNIT); that data unit is subsequently divided into one or more groups for compression. Specifically, SCAN_MUX takes data units from the minimum access unit one at a time until the minimum access unit has been fully traversed. To avoid useless compression operations, a data unit must contain at least one valid pixel. If all data in a data unit is invalid padding, the data unit is invalid; it may be skipped, and the next data unit is fetched.
For example, assume that the minimum access unit of the memory is 32B and that the data bit width of each pixel of the feature map is 8 bits. How useless compression operations are avoided is explained below with reference to FIG. 13: (1) when the feature map width (fm_w) is in [1, 8], the upper 8B of every 16B is discarded; (2) when the feature map width (fm_w) is in [17, 24], the upper 8B of every 32B is discarded.
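The skipping rule can be sketched as a function that lists which 8-byte data units of one 32B access unit contain at least one valid pixel. The sketch assumes 8-bit pixels, a feature map width of at most 32, and the 16B/32B row alignment described for the 32B case; the name `valid_unit_offsets` and the constants are hypothetical.

```python
UNIT_BYTES = 32      # minimum access unit in this example
DATA_UNIT_BYTES = 8  # one data unit = 8 pixels of 8 bits each


def valid_unit_offsets(fm_w):
    """Byte offsets, within one access unit, of data units holding valid pixels."""
    if not 1 <= fm_w <= UNIT_BYTES:
        raise ValueError("sketch covers rows that fit in one access unit")
    stride = 16 if fm_w <= 16 else 32  # row alignment for the 32B case
    offsets = []
    for row_start in range(0, UNIT_BYTES, stride):
        for off in range(0, stride, DATA_UNIT_BYTES):
            if off < fm_w:  # the first pixel of this data unit is valid
                offsets.append(row_start + off)
    return offsets
```

With fm_w = 5 the result is `[0, 16]` (the upper 8B of each 16B is skipped), and with fm_w = 20 it is `[0, 8, 16]` (the upper 8B of the 32B is skipped).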
Returning now to FIG. 12, assume that a fetched data unit contains 8 pixels {p1,p2,p3,p4,p5,p6,p7,p8} and that every 4 pixels form a group, giving the two groups {p1,p2,p3,p4} and {p5,p6,p7,p8}. Whether the 4 pixels of the first group are all zero may then be determined and indicated by the all-zero/non-all-zero indicator ALL0_FLAG1, and whether the 4 pixels of the second group are all zero may be determined and indicated by the all-zero/non-all-zero indicator ALL0_FLAG2.
If the 4 elements of the first group are all zero, ALL0_FLAG1=0; otherwise ALL0_FLAG1=1. If the 4 elements of the second group are all zero, ALL0_FLAG2=0; otherwise ALL0_FLAG2=1. See Table I above for the all-zero/non-all-zero cases of the pixels in the data unit.
Further, if ALL0_FLAG1=1, the differences D1, D2, D3 and D4 may be calculated (CALC_RES). If ALL0_FLAG2=1, the differences D5, D6, D7 and D8 may be calculated (CALC_RES). The registers {D1,D2,D3,D4,D5,D6,D7,D8} may be used to store the differences in a pipelined manner, and the differences satisfy:
D1 = p1 - p0;
D2 = p2 - p1;
D3 = p3 - p1;
D4 = p4 - p1;
D5 = p5 - p4;
D6 = p6 - p5;
D7 = p7 - p5;
D8 = p8 - p5;
Here, referring to FIG. 12, p0 is either 0 or the last pixel of the previous data unit. Specifically, p0 is the last pixel of the previous data unit (PRE_P8 in the figure); however, if there is no previous data unit, that is, the current data unit is at the start of a row of the feature map (row_start), then p0 = 0.
In addition, it can be understood that if each pixel of {p1,p2,p3,p4,p5,p6,p7,p8} is an 8-bit signed number, the differences {D1,D2,D3,D4,D5,D6,D7,D8} are 9-bit signed numbers.
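The difference formulas above can be written as a short function. This is a sketch rather than a model of the hardware: it always computes all eight differences, whereas the text computes a group's differences only when its flag is 1. The function name `differences` is hypothetical.

```python
def differences(unit, p0=0):
    """Return [D1..D8] for pixels [p1..p8].

    p0 is the last pixel of the previous data unit, or 0 at a row start.
    """
    if len(unit) != 8:
        raise ValueError("expected 8 pixels")
    diffs = []
    prev_last = p0
    for start in (0, 4):
        group = unit[start:start + 4]
        first = group[0]
        diffs.append(first - prev_last)             # D1 or D5
        diffs.extend(p - first for p in group[1:])  # relative to the group's first pixel
        prev_last = group[-1]                       # p4 feeds D5
    return diffs
```

For pixels `[10, 12, 9, 10, 11, 11, 13, 8]` with p0 = 7 this gives `[3, 2, -1, 0, 1, 0, 2, -3]`. With 8-bit signed pixels the extreme cases are 127 - (-128) = 255 and -128 - 127 = -255, both of which fit in 9 signed bits.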
Further, compressing the multiple differences in S130 may include: determining a number of storage bits according to multiple non-negative numbers corresponding to the multiple differences, and compressing the multiple differences according to their sign bits and the determined number of storage bits.
In the embodiment of the present invention, the differences may include both large and small values; when all of the differences are small, they can be represented with fewer bits, so that the compressed data occupies less storage space. The number of storage bits may be denoted len and represents the minimum number of bits needed to represent the differences after compression.
This process may be performed by the difference-algorithm compression module, which determines the number of storage bits according to the non-negative numbers corresponding to the differences and compresses the differences according to their sign bits and that number of bits.
Exemplarily, the process may include: determining multiple non-negative numbers in one-to-one correspondence with the multiple differences; determining the number of bits required for storage according to the position of the highest non-zero bit among the non-negative numbers; and compressing the differences according to their sign bits and the number of bits, where the stored length of each compressed difference is the number of bits.
Here, the non-negative number corresponding to a difference may refer to the binary absolute value of that difference. Exemplarily, if the sign bit of a first difference indicates that the first difference is non-negative, the non-negative number corresponding to the first difference is the first difference with its sign bit removed. If the sign bit of a second difference indicates that the second difference is negative, the non-negative number corresponding to the second difference is obtained by removing the sign bit of the second difference and inverting the remaining bits.
The position of the highest non-zero bit among the non-negative numbers may be determined by a bitwise OR of the non-negative numbers, which in turn gives the number of bits required to store the differences.
When the differences are compressed, only the low-order bits of each non-negative number are kept, as many as the determined number of bits; the leading zero bits are deleted, and the sign bit is then prepended to the retained part. When a non-all-zero group is compressed and stored, the stored compressed data may include: the non-all-zero indicator, a bit-number indicator, and the compressed differences, where the bit-number indicator gives the length of each compressed difference excluding its sign bit. That is, each compressed difference consists of one sign bit followed by that number of data bits.
The difference compression process is described below using the data unit {p1,p2,p3,p4,p5,p6,p7,p8} introduced above.
Assume that the first group is not all zeros and the second group is not all zeros, so that the scan coding module produces eight differences D1 to D8. An example process by which the difference-algorithm compression module compresses D1 to D8 is described below with reference to FIG. 14. Referring to FIG. 14, the sign bits of the eight differences D1 to D8 may be extracted, and the corresponding non-negative numbers may then be determined according to the sign bits. Taking D1 as an example, F1 denotes the most significant bit of D1, that is, the sign bit; D1’ denotes the binary number remaining after the sign bit is removed from D1; and d1’ denotes the non-negative number corresponding to D1. Specifically, if D1 is non-negative (F1 is 0), D1’ is the non-negative number corresponding to D1, that is, d1’ = D1’. If D1 is negative (F1 is 1), (~D1’) is the non-negative number corresponding to D1, that is, d1’ = (~D1’), where ~ denotes bitwise inversion. It should be noted that if D1 is negative (F1 is 1), the absolute value of D1 is ~D1’ + 1. For example, a 9-bit signed number, consisting of a sign bit and 8 further bits, covers the decimal range -256 to 255: the 8-bit value "11111111" is decimal 255, whereas a negative difference whose d1’ is "11111111" is decimal -256. In other words, when the sign bit is "1", the absolute value of the negative number is the inverted remaining bits plus 1. Through a similar process, the eight non-negative numbers d1’, d2’, d3’, d4’, d5’, d6’, d7’ and d8’ corresponding to the eight differences are obtained.
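The mapping from a difference to its sign bit and non-negative number can be checked with a few lines of code. The sketch assumes the 9-bit differences are held in two's complement, which is consistent with the rule that the absolute value of a negative difference equals the inverted remaining bits plus 1; the function name `split_sign` is hypothetical.

```python
def split_sign(diff):
    """Split a 9-bit two's-complement difference D into (F, d')."""
    if not -256 <= diff <= 255:
        raise ValueError("difference does not fit in 9 bits")
    raw = diff & 0x1FF   # 9-bit two's-complement pattern of D
    sign = raw >> 8      # F: the most significant bit
    low8 = raw & 0xFF    # D': the remaining 8 bits
    nonneg = (~low8 & 0xFF) if sign else low8  # d'
    return sign, nonneg
```

Here `split_sign(-3)` returns `(1, 2)`, and 2 + 1 = 3 is the absolute value; `split_sign(-256)` returns `(1, 255)`, matching the boundary case discussed above.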
Then, for the first group, d_max1 is the bitwise OR of d1’, d2’, d3’ and d4’, which shows how many bits are needed to represent the four differences D1, D2, D3 and D4 of the first group; the position of the highest 1 bit in d_max1 is detected, and this gives len1. Similarly, for the second group, d_max2 is the bitwise OR of d5’, d6’, d7’ and d8’, which shows how many bits are needed to represent the four differences D5, D6, D7 and D8 of the second group; the position of the highest 1 bit in d_max2 is detected, and this gives len2.
After that, the low-order len1 bits of d1’, d2’, d3’ and d4’ may be kept and the corresponding sign bits F1, F2, F3 and F4 prepended, giving the compressed differences d1, d2, d3 and d4. Likewise, the low-order len2 bits of d5’, d6’, d7’ and d8’ may be kept and the corresponding sign bits F5, F6, F7 and F8 prepended, giving the compressed differences d5, d6, d7 and d8.
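The two steps above (bitwise OR to find the length, then truncation with the sign bit prepended) can be sketched for one group of four differences. The function name `compress_group` is hypothetical, and the minimum length of 1 bit is an assumption made to match the [1, 8] range that the text gives for the number of bits; the text does not spell out the case where the OR result is 0.

```python
def compress_group(diffs):
    """Return (length, codes); each code is the sign bit plus `length` bits of d'."""
    pairs = []
    d_max = 0
    for d in diffs:
        sign = 1 if d < 0 else 0
        nonneg = ~d if d < 0 else d  # for negative d, ~d == -d - 1, i.e. the inverted bits
        pairs.append((sign, nonneg))
        d_max |= nonneg              # bitwise OR of all non-negative numbers
    length = max(d_max.bit_length(), 1)  # assumed minimum of 1 bit
    codes = [f"{sign}{nonneg:0{length}b}" for sign, nonneg in pairs]
    return length, codes
```

For the differences `[3, 2, -1, 0]` the non-negative numbers are 3, 2, 0 and 0, the OR is binary 11, the length is 2, and the codes are "011", "010", "100" and "000".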
It should be understood that, although the process of compressing eight differences is described with reference to FIG. 14, the present invention is not limited to this. For example, if the first group is all zeros, D1, D2, D3 and D4 need not be calculated, and d1, d2, d3 and d4 need not be produced. Likewise, if the second group is all zeros, D5, D6, D7 and D8 need not be calculated, and d5, d6, d7 and d8 need not be produced.
As described above, the embodiment of the present invention assumes that the data bit width of each pixel is 8 bits, so each difference is a 9-bit value including a 1-bit sign bit. Each difference therefore occupies at most 8 storage bits excluding the sign bit, so 3 bits suffice for each of len1 and len2. It can be understood that a 3-bit binary number can represent [0, 7], which in this example corresponds to a compressed bit count of [1, 8]. For example, if len1 is "010", the number of bits of each compressed difference is 3; if len1 is "111", the number of bits of each compressed difference is 8. In addition, because each compressed difference d1 to d8 also includes its sign bit, the number of bits actually occupied by each compressed difference is in [2, 9].
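The 3-bit length field described above stores the bit count minus one. A minimal sketch of that mapping follows; the function names `encode_len` and `decode_len` are hypothetical.

```python
def encode_len(bits):
    """Map a bit count in [1, 8] to the 3-bit field value in [0, 7]."""
    if not 1 <= bits <= 8:
        raise ValueError("bit count must be in [1, 8]")
    return format(bits - 1, "03b")


def decode_len(field):
    """Map a 3-bit field such as '010' back to the bit count."""
    return int(field, 2) + 1
```

This reproduces the two examples in the text: a bit count of 3 is stored as "010" and a bit count of 8 as "111".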
After the compressed differences d1 to d8 are obtained, the compressed data can be derived from them. Specifically, the compressed data for one group of a data unit includes: the all-zero/non-all-zero indicator, the bit-number indicator (if the group is not all zeros), and the compressed differences (if the group is not all zeros). Exemplarily, for the data unit shown in FIG. 11, the compressed data may be as shown in FIG. 15, and three cases are possible.
情形1中,数据单元的两个组均为全零,则只需要两个全零指示符,占用2比特。也就是说,ALL0_FLAG1=0,ALL0_FLAG2=0并且压缩数据(ENC_RESULT)=ALL0_FLAG1,ALL0_FLAG2。In case 1, if both groups of data units are all zeros, only two all-zero indicators are needed, occupying 2 bits. That is, ALL0_FLAG1=0, ALL0_FLAG2=0 and compressed data (ENC_RESULT)=ALL0_FLAG1, ALL0_FLAG2.
情形2中,数据单元的两个组其中一个为全零,另一个为非全零,则需要一个全零指示符和一个非全零指示符,占用2比特;还需要一个比特数指示符,占用3比特;以及需要4个压缩差值,占用位数与比特数的具体值有关。也就是说,ALL0_FLAG1=1,ALL0_FLAG2=0并且压缩数据(ENC_RESULT)=ALL0_FLAG1,ALL0_FLAG2,len1,d1,d2,d3,d4;或者ALL0_FLAG1=0,ALL0_FLAG2=1并且压缩数据(ENC_RESULT)=ALL0_FLAG1,ALL0_FLAG2,len2,d5,d6,d7,d8。In case 2, if one of the two groups of data units is all zeros and the other is non-all zeros, an all-zero indicator and a non-all-zero indicator are required, occupying 2 bits; a bit number indicator is also required, Occupies 3 bits; and 4 compression differences are required. The number of occupied bits is related to the specific value of the number of bits. That is, ALL0_FLAG1=1, ALL0_FLAG2=0 and compressed data (ENC_RESULT)=ALL0_FLAG1, ALL0_FLAG2, len1, d1, d2, d3, d4; or ALL0_FLAG1=0, ALL0_FLAG2=1 and compressed data (ENC_RESULT)=ALL0_FLAG1, ,len2,d5,d6,d7,d8.
情形3中,数据单元的两个组均为非全零,则两个非全零指示符,占用2比特;还需要两个比特数指示符,占用6比特;以及需要8个压缩差值,占用位数与比特数的具体值有关。也就是说,ALL0_FLAG1=1,ALL0_FLAG2=1并且压缩数据(ENC_RESULT)=ALL0_FLAG1,ALL0_FLAG2,len1,len2,d1,d2,d3,d4,d5,d6,d7,d8。In case 3, if both groups of data units are non-all zeros, then two non-all zero indicators occupy 2 bits; two bit number indicators are also required, occupying 6 bits; and 8 compression differences are required, The number of occupied bits is related to the specific value of the number of bits. That is, ALL0_FLAG1=1, ALL0_FLAG2=1 and compressed data (ENC_RESULT)=ALL0_FLAG1, ALL0_FLAG2, len1, len2, d1, d2, d3, d4, d5, d6, d7, d8.
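The three cases above share one field order: the two flags, then the len fields of the non-all-zero groups, then their compressed differences. A minimal sketch of that concatenation is given below. It treats each compressed difference as an opaque bit string (so it makes no assumption about where the sign bit sits inside it); the name `pack_data_unit` and the tuple representation of a group are illustrative assumptions, not part of the embodiment.

```python
from typing import Optional, Sequence, Tuple

# None means the group is all zeros; otherwise (bit count n, four (n + 1)-bit strings).
Group = Optional[Tuple[int, Sequence[str]]]


def pack_data_unit(group1: Group, group2: Group) -> str:
    """Concatenate ALL0_FLAG1, ALL0_FLAG2, len fields, and compressed differences."""
    flags, lens, diffs = "", "", ""
    for group in (group1, group2):
        if group is None:
            flags += "0"  # all-zero indicator
            continue
        flags += "1"  # non-all-zero indicator
        num_bits, compressed = group
        if not 1 <= num_bits <= 8:
            raise ValueError("bit count must be in [1, 8]")
        if len(compressed) != 4 or any(len(d) != num_bits + 1 for d in compressed):
            raise ValueError("expected four fields of num_bits + 1 bits each")
        lens += format(num_bits - 1, "03b")  # 3-bit bit number indicator
        diffs += "".join(compressed)
    return flags + lens + diffs
```

Calling `pack_data_unit(None, None)` gives the 2-bit result of case 1; passing one or two non-`None` groups gives the layouts of cases 2 and 3.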
Exemplarily, the compressed length of the compressed data can also be calculated. The compressed length is the total number of bits occupied by the compressed data, for example the sum of the bit counts of all the fields of the compressed data shown in FIG. 15.
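As a worked illustration, the compressed length of one data unit before length alignment can be written as a single expression. The symbols below are introduced here for illustration only: f_g is the value of ALL0_FLAGg (1 for a non-all-zero group) and n_g is the bit count in [1,8] signalled by len_g.

```latex
L_{\mathrm{enc}} \;=\; 2 \;+\; \sum_{g=1}^{2} f_g \,\bigl( 3 + 4\,(n_g + 1) \bigr),
\qquad f_g \in \{0,1\},\quad n_g \in [1,8]
```

For case 1 this gives 2 bits. For case 2 with a bit count of 3 it gives 2 + 3 + 4 × 4 = 21 bits. For case 3 with both bit counts equal to 8 it gives 2 + 2 × (3 + 36) = 80 bits, which is more than the 8 × 8 = 64 bits of the uncompressed data unit, so a data unit with large differences can grow rather than shrink.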
After that, the compressed data can be length-aligned and then buffered in the compressed feature map cache module for the subsequent output process. Exemplarily, as described with reference to FIG. 3 above, if the compression header generation module determines that bypass (a bypass operation) is not needed, the compressed data is written to the external memory; otherwise, the original feature map is read again, the compressed data in the compressed feature map cache module is replaced with it, and the original feature map that replaced the compressed data is written to the external memory.
Moreover, it should be understood that in S130, after compression is performed on a first data unit of the multiple data units, a second data unit located after the first data unit is read next, and a similar compression operation is performed. The first data unit and the second data unit may be two adjacent data units that the compression path compresses one after the other in time.
Exemplarily, after the first data unit is compressed, the compression process for the second data unit is started while the compressed first data unit is being stored. Starting the compression process for the second data unit includes determining whether the data of the second data unit is all zeros. That is, while the compressed data unit is being written to the external memory, the all-zero check that begins compression is started for the second data unit. In this way, pipelined compression of multiple data units is achieved and resource utilization is improved.
Exemplarily, after the scan coding module determines whether the two groups in the first data unit are all zeros and calculates the differences (if a non-all-zero group exists), the difference algorithm compression module compresses the differences. While the difference algorithm compression module is compressing the differences of the data in the first data unit, the scan coding module begins determining whether the two groups in the second data unit are all zeros. In other words, at the same time, different modules may be processing different data units, which further improves resource utilization and the efficiency with which the compression path compresses data units.
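The overlap between the two modules described above can be visualized with a small schedule sketch. It assumes, purely for illustration, that each stage takes exactly one time step per data unit; the function name `two_stage_schedule` is hypothetical, and real stage latencies are not given in the text.

```python
from typing import List, Optional, Tuple


def two_stage_schedule(num_units: int) -> List[Tuple[Optional[int], Optional[int]]]:
    """Return, per time step, (unit in scan coding stage, unit in difference compression stage)."""
    if num_units <= 0:
        return []
    schedule = []
    for step in range(num_units + 1):
        # Scan coding works on unit `step` while difference compression works on the previous one.
        scan_unit = step if step < num_units else None
        diff_unit = step - 1 if step >= 1 else None
        schedule.append((scan_unit, diff_unit))
    return schedule
```

Under this assumption, N data units finish in N + 1 steps instead of the 2N steps a strictly sequential arrangement would need.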
It can be understood that the first data unit and the second data unit here may be two adjacent data units to be processed in a register (as shown in FIG. 12, the register temporarily holds data of the size of the minimum access unit of the memory).
It can be seen that the data compression module in the embodiment of the present invention may include a scan coding module and a difference algorithm compression module. This is a multi-stage compression pipeline design that reduces the combinational logic of each stage, so that the resulting circuit can support a higher clock frequency and improve chip performance. The scan coding module and the difference algorithm compression module have the circuit structures shown in FIG. 12 and FIG. 14 respectively, so that one compression path can compress 8 pixels of data at a time.
Specifically, the compression process performed by each compression path may be as shown in FIG. 16. Exemplarily, the steps after reading the feature map data in FIG. 16 are performed by the difference compression model. As described above, the feature map data is read in units of the minimum access unit of the memory: feature map data of one minimum access unit is read at a time, and only after that data has been compressed is the next minimum access unit of feature map data read, until all the feature map data indicated by the compression instruction has been read.
The scan coding module can read one data unit of the minimum-access-unit-sized feature map data and determine whether each of the two groups in the data unit is all zeros. If a group is not all zeros, the original differences are calculated; the original differences here are the differences between adjacent pixels in the data unit, such as D1 to D8 above.
The difference algorithm compression module can compress the original differences (such as D1 to D8 above) to obtain the compressed differences (such as d1 to d8 above) and calculate the compressed length. Optionally, it can also output a flag indicating whether the current data unit is the end of the current compression unit.
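To make the step from original differences to compressed differences concrete, the sketch below compresses the four differences of one group and then restores them. It rests on assumptions that are not confirmed by the text: differences are 9-bit two's complement values in [-256, 255], each compressed difference is stored as the sign bit followed by the low n bits of that value, and decoding is done by sign extension. The names `compress_group` and `decompress_group` are hypothetical.

```python
from typing import List, Tuple


def compress_group(diffs: List[int]) -> Tuple[int, List[str]]:
    """Return (n, fields): the shared bit count n and one (n + 1)-bit string per difference."""
    if any(not -256 <= d <= 255 for d in diffs):
        raise ValueError("differences must fit in 9-bit two's complement")
    combined = 0
    for d in diffs:
        # For a negative d, ~d equals the low 8 bits inverted (for example ~(-3) == 2).
        combined |= d if d >= 0 else ~d
    n = max(1, combined.bit_length())  # the len field cannot signal fewer than 1 bit
    mask = (1 << n) - 1
    fields = [("1" if d < 0 else "0") + format(d & mask, f"0{n}b") for d in diffs]
    return n, fields


def decompress_group(n: int, fields: List[str]) -> List[int]:
    """Invert compress_group by sign-extending each (n + 1)-bit field."""
    restored = []
    for field in fields:
        value = int(field[1:], 2)
        if field[0] == "1":
            value -= 1 << n  # sign extension
        restored.append(value)
    return restored
```

For example, the differences 3, -3, 0, -1 share a bit count of 2 under these assumptions, so each occupies 3 bits including the sign bit, and `decompress_group` returns the original four values.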
After the compression of one data unit is completed, the next data unit can be read from the minimum-access-unit-sized feature map data, until all pixels of that feature map data have been compressed.
It can be seen that the method for data compression storage in the embodiment of the present invention takes full account of the zeros in the feature map and of the fact that adjacent pixels of the feature map data have similar values. By compressing with the difference method, the compressed data occupies less storage space, which on the one hand reduces the space occupied in the external memory and on the other hand reduces the bandwidth consumed during reading and writing, saving power.
As another aspect of the embodiments of the present invention, an apparatus for data compression storage is also provided. As shown in FIG. 17, the apparatus may include: a receiving device 210, a dividing device 220, a compression device 230, and a storage device 240.
The receiving device 210 is configured to receive feature map data to be stored;
The dividing device 220 is configured to divide the feature map data into multiple data units;
The compression device 230 is configured to, for each data unit of the multiple data units, determine whether the data in the data unit is all zeros, and perform compression according to the result of the determination;
The storage device 240 is configured to store the compressed feature map data.
In one implementation, the compression device 230 compresses a data unit through the following process: dividing the data unit into one or more groups; if the data of a first group of the groups is all zeros, the compressed data is 0; if the data of a second group of the groups is not all zeros, determining multiple differences between the data in the second group and compressing according to the multiple differences.
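The grouping and all-zero check in this implementation can be sketched as follows. The default group size of 4 reflects the 8-pixel, two-group data unit used in the earlier example; the function name `scan_groups` and the list-of-tuples return shape are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def scan_groups(data_unit: Sequence[int], group_size: int = 4) -> List[Tuple[int, List[int]]]:
    """Split a data unit into groups and flag each: 0 = all zeros, 1 = not all zeros."""
    if group_size <= 0 or len(data_unit) % group_size != 0:
        raise ValueError("data unit length must be a multiple of a positive group_size")
    result = []
    for start in range(0, len(data_unit), group_size):
        group = list(data_unit[start:start + group_size])
        flag = 1 if any(group) else 0  # an all-zero group needs only its 1-bit flag
        result.append((flag, group))
    return result
```

Only the groups flagged 1 need to go on to difference calculation and compression.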
In one implementation, the compression device 230 is configured to: determine the difference between the first data in the second group and the last data located before the second group, and determine the difference between each of the other data in the second group and the first data.
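A minimal sketch of the difference rule as worded in this implementation is given below. The function name `group_differences` and the `previous` parameter are hypothetical; in particular, the text here does not say what value precedes the very first group, so the caller must supply it.

```python
from typing import List, Sequence


def group_differences(group: Sequence[int], previous: int) -> List[int]:
    """First element minus the last data before the group; every other element minus the first."""
    if not group:
        raise ValueError("group must not be empty")
    first = group[0]
    return [first - previous] + [value - first for value in group[1:]]
```

With 8-bit pixels in [0, 255], every result lies in [-255, 255] and therefore fits in the 9-bit signed difference assumed earlier.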
In one implementation, the compression device 230 is configured to: determine multiple first non-negative numbers in one-to-one correspondence with the multiple differences; determine the number of bits required for storage according to the position of the highest non-zero bit among the multiple first non-negative numbers; and compress the multiple differences according to the sign bits of the multiple differences and the number of bits, wherein the length of each compressed difference is the number of bits plus 1.
In one implementation, the compression device 230 is configured to: determine the number of bits required for storage by performing a bitwise OR operation on the multiple first non-negative numbers.
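The bitwise OR makes it possible to find the highest non-zero bit of all the first non-negative numbers at once, because the OR has a 1 in every position where any operand does. A minimal sketch follows; the name `bits_required` is hypothetical, and the clamp to a minimum of 1 follows the earlier statement that the len field represents bit counts in [1,8].

```python
from functools import reduce
from operator import or_
from typing import Iterable


def bits_required(magnitudes: Iterable[int]) -> int:
    """Bit count needed to store every magnitude, found from the OR of all of them."""
    combined = reduce(or_, magnitudes, 0)
    # int.bit_length() is the position of the highest set bit, counted from 1.
    return max(1, combined.bit_length())
```

For example, the values 5, 1, 0, and 2 OR together to 7, whose highest set bit is the third one, so 3 bits suffice for all four.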
In one implementation, the compression device 230 is configured to:
If the sign bit of a first difference indicates that the first difference is a second non-negative number, the first non-negative number corresponding to the first difference is the number obtained by removing the sign bit from the first difference;
If the sign bit of the first difference indicates that the first difference is a negative number, the first non-negative number corresponding to the first difference is the number obtained by removing the sign bit from the first difference and then inverting the remaining bits.
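The two rules above can be sketched as follows, assuming (this is not stated in the text) that a difference is held as a 9-bit two's complement value, so that "removing the sign bit" leaves its low 8 bits. The function name `first_non_negative` is hypothetical.

```python
def first_non_negative(diff: int) -> int:
    """Map a 9-bit signed difference to the non-negative number used to size its storage."""
    if not -256 <= diff <= 255:
        raise ValueError("difference must fit in 9-bit two's complement")
    low8 = diff & 0xFF  # the 9-bit value with its sign bit removed
    if diff >= 0:
        return low8
    return ~low8 & 0xFF  # negative: invert the remaining 8 bits
```

Under this assumption a negative difference d maps to -d - 1, so -1 maps to 0 and -256 maps to 255, and every difference in the 9-bit range yields a number that fits in 8 bits.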
In one implementation, if the data of the second group is not all zeros, the data stored after compressing the second group includes: a non-all-zero indicator, a bit number indicator, and the multiple compressed differences. The bit number indicator represents the length of each of the compressed differences excluding the sign bit.
In one implementation, the non-all-zero indicator is 1.
In one implementation, the compression device 230 is further configured to: generate compression header information corresponding to the compressed data unit; wherein the storage device 240 is configured to: store the compressed data unit and the corresponding compression header information in the external memory.
In one implementation, the compression device 230 is further configured to: determine, according to the compression header information, whether a bypass operation needs to be performed; and if it is determined that the bypass operation needs to be performed, generate bypass compression header information corresponding to the bypass operation. The storage device 240 is configured to: store the uncompressed feature map data and the bypass compression header information in the external memory.
In one implementation, the apparatus further includes a reading device configured to: receive a compression instruction; and send a read feature map command according to the compression instruction, so as to obtain the feature map data corresponding to the read feature map command from the on-chip memory.
In one implementation, the read feature map command includes the width and height of the feature map data and its base address in the on-chip memory.
In one implementation, the receiving device 210 is configured to: receive feature map data whose size matches the minimum access unit.
In one implementation, the compression device 230 is configured to: divide the feature map data into multiple data units according to the minimum access unit of the memory, wherein all data in one data unit is located in the same minimum access unit.
In one implementation, if the storage space required for one row of the feature map data is larger than the minimum access unit, the data located in the same minimum access unit belongs to the same row of the feature map data. If the storage space required for one row of the feature map data is smaller than the minimum access unit, the data belonging to the same row of the feature map data is located in the same minimum access unit.
In one implementation, the feature map data to be stored is the output of a convolutional layer in a neural network.
In one implementation, the compression device 230 is configured to: start the compression process for a second data unit while the compressed first data unit is being stored. The first data unit and the second data unit are data units that are compressed one after the other in time.
In one implementation, the compression device 230 is configured to: start the compression process for the second data unit by determining whether the data of the second data unit is all zeros.
Exemplarily, the apparatus shown in FIG. 17 can be used to implement the data storage method shown in FIG. 8 above; to avoid repetition, the details are not described again here.
In addition, with reference to FIG. 3, the apparatus shown in FIG. 17 may be any one of the at least two compression paths. It can be understood that the apparatus shown in FIG. 17 is only schematic and may also be implemented as the various modules shown in FIG. 7.
It should be understood that the system for data compression storage in the embodiment of the present invention can be implemented on a processor, for example the processor of a computer, a server, a workstation, a mobile terminal, a gimbal, or other such device. The original feature map may be received or obtained by the processor from another device, or may be generated by the processor while executing other operations or algorithms; for example, the processor may generate the original feature map while executing a convolutional neural network.
Exemplarily, an embodiment of the present invention also provides a processor, which may include an on-chip memory and the system shown in FIG. 3. Alternatively, the processor may include an on-chip memory and the apparatus shown in FIG. 17.
In the embodiment of the present invention, the processor may include a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, such as a field-programmable gate array (FPGA) or an Advanced RISC (Reduced Instruction Set Computer) Machine (ARM), and the processor may include other components to perform various desired functions.
It should be understood that, unless otherwise indicated, the terms "feature map", "feature map data", and "original feature map" in the embodiments of the present invention refer to the data before compression by the system of the embodiments of the present invention. This data may have two dimensions, width and height, or optionally three dimensions: width, height, and channel.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.
In addition, an embodiment of the present invention also provides a computer storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the data storage method described above can be implemented. For example, the computer storage medium is a computer-readable storage medium. For example, the computer program instructions, when run by a computer or processor, cause the computer or processor to execute the steps of the method shown in FIG. 4 or FIG. 8.
In one embodiment, the computer program instructions, when run by a computer or processor, cause the computer or processor to execute the following steps: receiving feature map data to be stored; dividing the feature map data into multiple data units; for each data unit of the multiple data units, determining whether the data in the data unit is all zeros and performing compression according to the result of the determination; and storing the compressed feature map data.
The computer storage medium may include, for example, a memory card of a smartphone, a storage component of a tablet computer, a hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
In addition, an embodiment of the present invention also provides a computer program product containing instructions which, when executed by a computer, cause the computer to execute the steps of the data storage method shown in FIG. 4 or FIG. 8 above.
In one embodiment, the instructions, when executed by a computer, cause the computer to execute: receiving feature map data to be stored; dividing the feature map data into multiple data units; for each data unit of the multiple data units, determining whether the data in the data unit is all zeros and performing compression according to the result of the determination; and storing the compressed feature map data.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
It can be seen that the method for data compression storage in the embodiment of the present invention takes full account of the zeros in the feature map and of the fact that adjacent pixels of the feature map data have similar values. By compressing with the difference method, the compressed data occupies less storage space, which on the one hand reduces the space occupied in the external memory and on the other hand reduces the bandwidth consumed during reading and writing, saving power.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processor, each unit may exist physically on its own, or two or more units may be integrated into one unit.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (37)
- 一种用于数据压缩存储的系统,其特征在于,所述系统用于将片上存储器中的特征图进行压缩后再存储在外部存储器中,所述系统包括压缩指令生成模块、读仲裁模块、至少两个压缩路径以及写仲裁模块:A system for data compression storage, characterized in that the system is used to compress a feature map in an on-chip memory and then store it in an external memory, and the system includes a compression instruction generation module, a read arbitration module, and at least Two compression paths and write arbitration module:所述压缩指令生成模块,用于将压缩指令分发到所述至少两个压缩路径中的各个压缩路径;The compression instruction generating module is configured to distribute the compression instruction to each of the at least two compression paths;所述至少两个压缩路径中的每个压缩路径,用于根据从所述压缩指令生成模块接收到的压缩指令,从所述片上存储器读取相应的原始特征图,并且将读取到的原始特征图进行压缩;Each of the at least two compression paths is configured to read the corresponding original feature map from the on-chip memory according to the compression instruction received from the compression instruction generation module, and read the original The feature map is compressed;所述读仲裁模块,用于对所述至少两个压缩路径针对所述片上存储器中的原始特征图的读特征图命令进行仲裁;The read arbitration module is configured to arbitrate the read feature map commands of the at least two compressed paths for the original feature map in the on-chip memory;所述写仲裁模块,用于对所述至少两个压缩路径将压缩后的数据写入所述外部存储器的写请求进行仲裁。The write arbitration module is configured to arbitrate the write requests of the at least two compression paths to write compressed data into the external memory.
- 根据权利要求1所述的系统,其特征在于,所述至少两个压缩路径中的每个压缩路径包括:读特征图模块、特征图缓存模块、数据压缩模块、数据打包模块、压缩头生成模块,The system according to claim 1, wherein each of the at least two compression paths comprises: a feature map reading module, a feature map caching module, a data compression module, a data packing module, and a compression header generation module ,所述读特征图模块,用于根据从所述压缩指令生成模块接收到的压缩指令,发送针对所述片上存储器中的原始特征图的读特征图命令;The read feature map module is configured to send a read feature map command for the original feature map in the on-chip memory according to the compression instruction received from the compression instruction generation module;所述特征图缓存模块,用于存储从所述片上存储器读回的所述原始特征图;The feature map cache module is configured to store the original feature map read back from the on-chip memory;所述数据压缩模块,用于将所述特征图缓存模块内的所述原始特征图划分为多个数据单元,并针对所述多个数据单元中的每个数据单元进行差值压缩;The data compression module is configured to divide the original feature map in the feature map cache module into multiple data units, and perform differential compression for each data unit of the multiple data units;所述数据打包模块,用于将经所述数据压缩模块所压缩之后的数据拼接成完整的压缩数据;The data packing module is used to splice the data compressed by the data compression module into complete compressed data;所述压缩头生成模块,用于根据从所述压缩指令生成模块接收到的压缩指令,生成与所述数据压缩模块得到的所述压缩数据对应的压缩头信息。The compression header generation module is configured to generate compression header information corresponding to the compressed data obtained by the data compression module according to the compression instruction received from the compression instruction generation module.
- 根据权利要求2所述的系统,其特征在于,所述至少两个压缩路径中的每个压缩路径还包括长度对齐模块,用于:The system according to claim 2, wherein each of the at least two compression paths further comprises a length alignment module for:将经所述数据打包模块拼接后的所述压缩数据的长度补齐到特定长度。The length of the compressed data spliced by the data packing module is complemented to a specific length.
- The system according to claim 2 or 3, characterized in that:
  - the compression header generation module is further configured to determine whether a bypass needs to be performed and, when it determines that a bypass is needed, to generate compression header information corresponding to the original feature map;
  - the read-feature-map module is further configured to re-read the original feature map from the on-chip memory when the compression header generation module determines that a bypass is needed.
- The system according to claim 4, characterized in that each of the at least two compression paths further comprises a length alignment module configured to pad the length of the re-read original feature map to a specific length.
- The system according to claim 3 or 5, characterized in that the specific length is preset according to the performance of the external memory chip.
- The system according to any one of claims 2 to 6, characterized in that the data compression module comprises a scan coding module and a difference algorithm compression module, wherein:
  - the scan coding module is configured to divide the original feature map into multiple data units and, for each data unit, to divide the data unit into one or more groups, determine whether all the data in each group are zero, and, for a group that is not all zero, calculate multiple differences between the data in that group;
  - the difference algorithm compression module is configured to determine a number of storage bits according to the multiple non-negative numbers corresponding to the multiple differences, and to compress the multiple differences according to their sign bits and the number of storage bits.
- The system according to any one of claims 2 to 7, characterized in that each of the at least two compression paths further comprises a compression header cache module, a compressed feature map cache module, a compression header write module, and a compressed feature map write module, wherein:
  - the compression header cache module is configured to cache the to-be-output compression header information generated by the compression header generation module;
  - the compressed feature map cache module is configured to cache data to be output, the data to be output being either the length-padded compressed data or, when a bypass is needed, the length-padded original feature map;
  - the compression header write module is configured to perform the write operation for the compression header information in the compression header cache module;
  - the compressed feature map write module is configured to perform the write operation for the data to be output in the compressed feature map cache module.
- The system according to any one of claims 1 to 8, characterized in that the read-feature-map command includes the width and height of the original feature map to be read and its base address in the on-chip memory.
- The system according to any one of claims 1 to 9, characterized in that the system further comprises a read command cache module and a read data path identifier cache module, both connected to the read arbitration module, and the read arbitration module is configured to:
  - obtain the read-feature-map commands issued by each compression path, where the read-feature-map commands of each compression path may be cached in that path's own read command cache module;
  - arbitrate the read-feature-map commands in the read command cache modules according to an arbitration rule to obtain an arbitration result;
  - send the read-feature-map command of the compression path that wins arbitration to the on-chip memory first, and store the path identifier of the winning compression path in the read data path identifier cache module.
- The system according to any one of claims 1 to 10, characterized in that the write arbitration module is specifically configured to:
  - obtain the write requests issued by each compression path;
  - arbitrate the write requests according to an arbitration rule to obtain an arbitration result;
  - send the write request of the compression path that wins arbitration to the external memory first.
- The system according to claim 10 or 11, characterized in that the arbitration rule is a priority mechanism or a fair polling (round-robin) mechanism configured according to the compression instruction.
- The system according to any one of claims 1 to 12, characterized in that the compression instruction generation module is specifically configured to receive a compression instruction, parse the received compression instruction, and distribute the parsed compression instruction to each of the at least two compression paths.
- The system according to any one of claims 1 to 13, characterized in that the compression instruction distributed to each compression path includes:
  - the number of feature maps to be compressed by the compression path;
  - the feature map width;
  - the feature map height;
  - the base address of these feature maps in the on-chip memory;
  - the inter-map storage interval of these feature maps in the on-chip memory;
  - the base address in the external memory to which these feature maps are output after being compressed by the compression path;
  - the inter-map storage interval in the external memory for these feature maps after compression by the compression path;
  - the base address in the external memory to which the compression header information of these compressed feature maps is output;
  - the header information storage interval in the external memory for the compression header information of these compressed feature maps.
- The system according to any one of claims 1 to 14, characterized in that the system switches between the following states: an idle state, a receive-instruction state, a parse-instruction state, and a wait-for-completion state, wherein:
  - in the idle state, the system waits for a compression instruction start signal and, after receiving it, switches to the receive-instruction state;
  - in the receive-instruction state, the system receives a compression instruction and, once reception is complete, outputs an instruction ready signal and switches to the parse-instruction state;
  - in the parse-instruction state, the system parses the compression instruction received in the receive-instruction state and distributes compression instructions to each compression path;
  - in the wait-for-completion state, the system monitors the completion signal of each compression path and may switch to the idle state after observing that all compression paths have completed.
- The system according to any one of claims 1 to 15, characterized in that the feature map is the output of a convolutional layer in a neural network.
- A method for data compression storage, characterized in that the method is used to compress a feature map in an on-chip memory and then store it in an external memory, the method comprising:
  - distributing compression instructions to each of the at least two compression paths;
  - each compression path reading a corresponding original feature map from the on-chip memory according to the received compression instruction, and compressing the original feature map that was read;
  - storing the compressed feature map in the external memory;
  - wherein, when the at least two compression paths read the original feature maps in the on-chip memory, the read-feature-map commands of the at least two compression paths are arbitrated;
  - wherein, when the at least two compression paths write the compressed feature maps into the external memory, the write requests of the at least two compression paths are arbitrated.
- The method according to claim 17, characterized in that compressing the original feature map that was read comprises:
  - dividing the original feature map into multiple data units;
  - performing difference compression on each of the multiple data units.
- The method according to claim 18, wherein compressing the original feature map that was read further comprises splicing the difference-compressed feature map into complete compressed data.
- The method according to claim 19, further comprising padding the length of the spliced compressed data to a specific length.
- The method according to any one of claims 18 to 20, characterized in that performing difference compression on each of the multiple data units comprises, for one data unit:
  - dividing the data unit into one or more groups;
  - determining whether all the data in each group are zero and, for a group that is not all zero, calculating multiple differences between the data in that group;
  - determining a number of storage bits according to the multiple non-negative numbers corresponding to the multiple differences;
  - compressing the multiple differences according to their sign bits and the number of storage bits.
- The method according to any one of claims 18 to 21, characterized in that compressing the original feature map that was read further comprises generating compression header information corresponding to the difference-compressed feature map.
- The method according to claim 22, characterized in that it further comprises:
  - determining, according to the compression header information, whether a bypass operation needs to be performed;
  - when it is determined that a bypass operation needs to be performed, re-reading the original feature map from the on-chip memory and generating bypass compression header information corresponding to the original feature map.
- The method according to claim 23, characterized in that it further comprises padding the length of the re-read original feature map to a specific length and discarding the difference-compressed feature map.
- The method according to claim 20 or 24, characterized in that the specific length is preset according to the performance of the external memory chip.
- The method according to claim 20, 24, or 25, characterized in that padding the length to the specific length comprises adding invalid data until the length reaches the specific length.
- The method according to claim 23, characterized in that determining whether a bypass operation needs to be performed comprises comparing the size of the difference-compressed feature map with the size of the original feature map.
- The method according to any one of claims 17 to 27, characterized in that reading the corresponding original feature map from the on-chip memory comprises reading from the on-chip memory an original feature map whose size matches the minimum access unit.
- The method according to any one of claims 17 to 28, characterized in that the read-feature-map command includes the width and height of the original feature map to be read and its base address in the on-chip memory.
- The method according to any one of claims 17 to 29, characterized in that arbitrating the read-feature-map commands of the at least two compression paths comprises:
  - arbitrating at least two read-feature-map commands from the at least two compression paths according to an arbitration rule to obtain an arbitration result;
  - sending the read-feature-map command of the compression path that wins arbitration to the on-chip memory first, and storing the path identifier of the winning compression path.
- The method according to any one of claims 17 to 30, characterized in that arbitrating the write requests of the at least two compression paths comprises:
  - arbitrating at least two write requests from the at least two compression paths according to an arbitration rule to obtain an arbitration result;
  - sending the write request of the compression path that wins arbitration to the external memory first.
- The method according to claim 30 or 31, characterized in that the arbitration rule is a priority mechanism or a fair polling (round-robin) mechanism configured according to the compression instruction.
- The method according to any one of claims 17 to 32, characterized in that the compression instruction distributed to each compression path includes:
  - the number of feature maps to be compressed by the compression path;
  - the feature map width;
  - the feature map height;
  - the base address of these feature maps in the on-chip memory;
  - the inter-map storage interval of these feature maps in the on-chip memory;
  - the base address in the external memory to which these feature maps are output after being compressed by the compression path;
  - the inter-map storage interval in the external memory for these feature maps after compression by the compression path;
  - the base address in the external memory to which the compression header information of these compressed feature maps is output;
  - the header information storage interval in the external memory for the compression header information of these compressed feature maps.
- The method according to any one of claims 17 to 33, characterized in that it further comprises presetting a state machine to switch states during the data compression storage process, the state machine including an idle state, a receive-instruction state, a parse-instruction state, and a wait-for-completion state, wherein:
  - in the idle state, a compression instruction start signal is awaited and, after it is received, the state machine switches to the receive-instruction state;
  - in the receive-instruction state, a compression instruction is received and, once reception is complete, an instruction ready signal is output and the state machine switches to the parse-instruction state;
  - in the parse-instruction state, the compression instruction received in the receive-instruction state is parsed and compression instructions are distributed to each compression path;
  - in the wait-for-completion state, the completion signal of each compression path is monitored, and the state machine may switch to the idle state after all compression paths are observed to have completed.
- The method according to any one of claims 17 to 34, characterized in that the feature map is the output of a convolutional layer in a neural network.
- A processor, characterized in that it comprises an on-chip memory and the system according to any one of claims 1 to 16.
- A computer storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 17 to 35.
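
The difference compression recited in the claims above (split a data unit into groups, flag all-zero groups, take differences within a non-zero group, derive a storage bit count from the non-negative magnitudes, and keep the sign bits) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the claimed implementation: the group size of 4, the use of adjacent-element differences, keeping the first value of each group uncompressed, and the names `encode_group`, `decode_group`, and `encode_unit` are all hypothetical choices that the claims do not specify.

```python
from typing import List, Optional, Tuple

# Encoded group: None for an all-zero group, otherwise
# (first_value, bit_width, sign_bits, magnitudes). Layout is an assumption.
EncodedGroup = Optional[Tuple[int, int, List[int], List[int]]]


def encode_group(group: List[int]) -> EncodedGroup:
    """Difference-encode one group of feature-map values."""
    if all(v == 0 for v in group):
        return None  # all-zero group: only a flag would need to be stored
    # Differences between neighbouring values (assumed pairing).
    diffs = [b - a for a, b in zip(group, group[1:])]
    magnitudes = [abs(d) for d in diffs]        # the non-negative numbers
    signs = [1 if d < 0 else 0 for d in diffs]  # one sign bit per difference
    # Number of bits needed to store the largest magnitude.
    bit_width = max((m.bit_length() for m in magnitudes), default=0)
    return group[0], bit_width, signs, magnitudes


def decode_group(encoded: EncodedGroup, size: int) -> List[int]:
    """Invert encode_group; `size` is the original group length."""
    if encoded is None:
        return [0] * size
    first, _bit_width, signs, magnitudes = encoded
    values = [first]
    for sign, mag in zip(signs, magnitudes):
        values.append(values[-1] - mag if sign else values[-1] + mag)
    return values


def encode_unit(unit: List[int], group_size: int = 4) -> List[EncodedGroup]:
    """Split a data unit into groups and encode each group."""
    groups = [unit[i:i + group_size] for i in range(0, len(unit), group_size)]
    return [encode_group(g) for g in groups]
```

In this sketch a group whose values change slowly needs only a few bits per difference, which is where the saving would come from; actual bit packing is left out for brevity.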
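
The bypass decision and length padding described in the claims above can be sketched as follows. This is a hedged illustration only: the claims say that sizes are compared and that invalid data is added to reach a specific length, but they do not give the threshold or the padding rule, so the sketch assumes that bypass is chosen when the compressed data is not smaller than the original and that "specific length" means the next multiple of a fixed unit. The names `pad_to_multiple` and `select_output` are hypothetical.

```python
from typing import Tuple


def pad_to_multiple(payload: bytes, unit: int, filler: int = 0x00) -> bytes:
    """Append filler (invalid) bytes until len(payload) is a multiple of unit."""
    if unit <= 0:
        raise ValueError("unit must be positive")
    shortfall = -len(payload) % unit  # bytes missing to reach the next multiple
    return payload + bytes([filler]) * shortfall


def select_output(original: bytes, compressed: bytes, unit: int) -> Tuple[bytes, bool]:
    """Return (padded data to write, bypass flag).

    Assumed rule: bypass (write the original) when compression did not
    make the data smaller; otherwise write the compressed data.
    """
    bypass = len(compressed) >= len(original)
    chosen = original if bypass else compressed
    return pad_to_multiple(chosen, unit), bypass
```

The bypass flag would then be recorded in the compression header so that a reader knows whether the stored payload is raw or compressed.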
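
The read arbitration described in the claims above (per-path command caches, an arbitration rule, and a cache of winning path identifiers) can be modelled in software roughly as below. The sketch assumes the fair polling rule is a plain round-robin and that path identifiers are kept in a FIFO, presumably so that returned read data can be matched to the path that requested it; the class name `RoundRobinArbiter` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


class RoundRobinArbiter:
    """Software model of a fair-polling arbiter over several compression paths."""

    def __init__(self, num_paths: int) -> None:
        # One command queue per compression path (the read command caches).
        self.queues: List[Deque[str]] = [deque() for _ in range(num_paths)]
        self.next_path = 0  # path that gets first chance on the next grant
        # Identifiers of winning paths, in grant order.
        self.path_id_fifo: Deque[int] = deque()

    def submit(self, path_id: int, command: str) -> None:
        """Queue a command issued by the given path."""
        self.queues[path_id].append(command)

    def grant(self) -> Optional[str]:
        """Pick the next command, or None if every queue is empty."""
        n = len(self.queues)
        for offset in range(n):
            path_id = (self.next_path + offset) % n
            if self.queues[path_id]:
                self.next_path = (path_id + 1) % n  # rotate priority
                self.path_id_fifo.append(path_id)
                return self.queues[path_id].popleft()
        return None
```

A fixed-priority rule would differ only in always starting the scan from path 0 instead of rotating `next_path`.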
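
The compression instruction fields listed in the claims above map naturally onto a small record type. The sketch below assumes that each "storage interval" is an address stride between consecutive feature maps (or headers), so that the address of map `i` is `base + i * interval`; that interpretation, the field names, and the `addresses` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class CompressionInstruction:
    """Per-path instruction fields; names are hypothetical."""
    num_maps: int    # number of feature maps to compress
    width: int       # feature map width
    height: int      # feature map height
    src_base: int    # base address in on-chip memory
    src_stride: int  # inter-map interval in on-chip memory
    dst_base: int    # base address of compressed data in external memory
    dst_stride: int  # inter-map interval in external memory
    hdr_base: int    # base address of header information in external memory
    hdr_stride: int  # header information interval in external memory

    def addresses(self) -> Iterator[Tuple[int, int, int]]:
        """Yield (source, destination, header) addresses for each map."""
        for i in range(self.num_maps):
            yield (
                self.src_base + i * self.src_stride,
                self.dst_base + i * self.dst_stride,
                self.hdr_base + i * self.hdr_stride,
            )
```

For example, an instruction for two maps with a source stride of `0x100` would read the second map from `src_base + 0x100`.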
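
The four-state control flow described in the claims above can be written as a small transition function. The claims state the idle, receive, and wait-for-completion transitions explicitly; the step from the parse-instruction state to the wait-for-completion state once distribution is done is an assumption here, as are the state names and the `next_state` signature.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECEIVE = auto()
    PARSE = auto()
    WAIT_DONE = auto()


def next_state(
    state: State,
    *,
    start: bool = False,        # compression instruction start signal
    received: bool = False,     # instruction fully received (ready signal)
    distributed: bool = False,  # instructions handed to every path (assumed)
    all_done: bool = False,     # every compression path reported completion
) -> State:
    """Return the state after one step; stay put if no condition is met."""
    if state is State.IDLE and start:
        return State.RECEIVE
    if state is State.RECEIVE and received:
        return State.PARSE
    if state is State.PARSE and distributed:
        return State.WAIT_DONE
    if state is State.WAIT_DONE and all_done:
        return State.IDLE
    return state
```

Stepping the function with `start`, `received`, `distributed`, and `all_done` in turn walks one full cycle from idle back to idle.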
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/092627 WO2021237513A1 (en) | 2020-05-27 | 2020-05-27 | Data compression storage system and method, processor, and computer storage medium |
PCT/CN2020/099495 WO2021237870A1 (en) | 2020-05-27 | 2020-06-30 | Data encoding method, data decoding method, data processing method, encoder, decoder, system, movable platform, and computer-readable medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/092627 WO2021237513A1 (en) | 2020-05-27 | 2020-05-27 | Data compression storage system and method, processor, and computer storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021237513A1 (en) | 2021-12-02 |
Family
ID=78745231
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/092627 WO2021237513A1 (en) | 2020-05-27 | 2020-05-27 | Data compression storage system and method, processor, and computer storage medium |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021237513A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114866483A (en) * | 2022-03-25 | 2022-08-05 | 新华三大数据技术有限公司 | Data compression flow control method and device and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1561007A (en) * | 2004-02-27 | 2005-01-05 | 中兴通讯股份有限公司 | Device and method for data compression decompression in data transmission |
US20180198994A1 (en) * | 2017-01-11 | 2018-07-12 | Sony Corporation | Compressive sensing capturing device and method |
CN108875751A (en) * | 2017-11-02 | 2018-11-23 | 北京旷视科技有限公司 | Image processing method and device, the training method of neural network, storage medium |
CN109445719A (en) * | 2018-11-16 | 2019-03-08 | 郑州云海信息技术有限公司 | A kind of date storage method and device |
CN110163370A (en) * | 2019-05-24 | 2019-08-23 | 上海肇观电子科技有限公司 | Compression method, chip, electronic equipment and the medium of deep neural network |
CN110494892A (en) * | 2017-05-31 | 2019-11-22 | 三星电子株式会社 | Method and apparatus for handling multi-channel feature figure image |
- 2020-05-27: WO application PCT/CN2020/092627 (WO2021237513A1), status: active, Application Filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114866483A (en) * | 2022-03-25 | 2022-08-05 | 新华三大数据技术有限公司 | Data compression flow control method and device and electronic equipment |
CN114866483B (en) * | 2022-03-25 | 2023-10-03 | 新华三大数据技术有限公司 | Data compression flow control method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12061564B2 (en) | Network-on-chip data processing based on operation field and opcode | |
WO2019227724A1 (en) | Data read/write method and device, and circular queue | |
CN114501024B (en) | Video compression system, method, computer readable storage medium and server | |
US12073102B2 (en) | Method and apparatus for compressing data of storage system, device, and readable storage medium | |
US20160124683A1 (en) | In-memory data compression complementary to host data compression | |
WO2024074012A1 (en) | Video transmission control method, apparatus and device, and nonvolatile readable storage medium | |
CN106685429B (en) | Integer compression method and device | |
CN114201421A (en) | Data stream processing method, storage control node and readable storage medium | |
CN103914404A (en) | Configuration information cache device in coarseness reconfigurable system and compression method | |
WO2023197507A1 (en) | Video data processing method, system, and apparatus, and computer readable storage medium | |
WO2021237513A1 (en) | Data compression storage system and method, processor, and computer storage medium | |
WO2021237510A1 (en) | Data decompression method and system, and processor and computer storage medium | |
CN113177015B (en) | Frame header-based serial port communication method and serial port chip | |
CN107147914A (en) | A kind of embedded system and monochrome bitmap compression method, main frame | |
US11275683B2 (en) | Method, apparatus, device and computer-readable storage medium for storage management | |
WO2021237518A1 (en) | Data storage method and apparatus, processor and computer storage medium | |
CN114422801B (en) | Method, system, device and storage medium for optimizing video compression control logic | |
CN212873459U (en) | System for data compression storage | |
CN114610231A (en) | Control method, system, equipment and medium for large-bit-width data bus segmented storage | |
CN111382856B (en) | Data processing device, method, chip and electronic equipment | |
CN111382852B (en) | Data processing device, method, chip and electronic equipment | |
CN115576661A (en) | Data processing system, method and controller | |
CN112637602A (en) | JPEG interface and digital image processing system | |
CN114442951B (en) | Method, device, storage medium and electronic equipment for transmitting multipath data | |
WO2021092941A1 (en) | Roi-pooling layer computation method and device, and neural network system |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20937416; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 20937416; Country of ref document: EP; Kind code of ref document: A1 |