CN111815502B

CN111815502B - FPGA acceleration method for multi-graph processing based on WebP compression algorithm

Info

Publication number: CN111815502B
Application number: CN202010653783.9A
Authority: CN
Inventors: Yang Xiaocheng (杨晓成)
Original assignee: Shanghai Xuehu Technology Co ltd
Current assignee: Shanghai Xuehu Technology Co ltd
Priority date: 2020-07-08
Filing date: 2020-07-08
Publication date: 2023-11-28
Anticipated expiration: 2040-07-08
Also published as: CN111815502A

Abstract

The invention relates to the technical field of image processing, and in particular to an FPGA acceleration method for multi-image processing based on the WebP compression algorithm. The method comprises: inputting pictures as RGB three-channel data and converting them into the corresponding YUV data; buffering the YUV data generated for each picture in an on-chip DDR buffer, reading the data into a computation module over the bus, and, according to the processing progress of each of a plurality of pictures, reading the corresponding data from the on-chip DDR buffer into a dependent data buffer area; and, each time the computation traversal of one YUV macroblock of one picture is completed, performing the macroblock computation traversal of the next picture, switching between pictures in turn until all macroblocks of the group of pictures have been encoded. The invention provides an effective acceleration scheme: encoding is performed as a parallel pipeline, which is more efficient than serial processing on a CPU and better suited than a CPU to the blocking closed-loop algorithm, and it increases the output frame rate of the whole WebP algorithm.

Description

FPGA acceleration method for multi-graph processing based on WebP compression algorithm

Technical Field

The invention relates to the technical field of image processing, in particular to an FPGA acceleration method for multi-graph processing based on a WebP compression algorithm.

Background

With the spread of image acquisition devices such as mobile phones, tablets, and digital cameras, and with growing picture resolutions, the volume of image data on the internet is growing exponentially. Recent studies have shown that the data stored on data center servers will grow roughly fourfold, from 663 EB in 2016 to 2.6 ZB in 2021, with most of it coming from images and video.

Images currently account for up to 60%-65% of the bytes on most web pages. Image data is particularly important on mobile devices, where smaller images save bandwidth and battery life. WebP is a picture format proposed by Google, based on VP8 coding, to meet ever-increasing bandwidth demands. WebP uses predictive coding: the values of a pixel block are predicted from neighboring, already-coded blocks, and only the difference between the prediction and the actual values is recorded. In most cases this difference is very small, or even zero, which greatly improves the compression ratio. Compared with JPEG, when a JPG image is recompressed to WebP at a quality equivalent to 90% of the original, the file size is reduced by about 50%; at a quality equivalent to 80% of the original, the file size is reduced by 60%-80%. Lossy WebP outperforms JPG mainly because of its more advanced predictive coding; macroblock-adaptive quantization also improves compression efficiency, and boolean arithmetic coding improves compression by 5%-10% compared with Huffman coding.
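To make the predictive-coding idea concrete, here is a minimal Python sketch of DC prediction for a 4x4 luma block. It assumes the VP8-style rule of predicting every pixel as the rounded mean of the 4 reconstructed pixels above and the 4 to the left; the function names are hypothetical, and real WebP encoders choose among several prediction modes per block.

```python
def dc_predict_4x4(top, left):
    """Rounded mean of the 4 pixels above and the 4 pixels to the left (VP8-style DC mode)."""
    assert len(top) == 4 and len(left) == 4
    return (sum(top) + sum(left) + 4) >> 3


def residual(block, pred):
    """Difference between the actual block and a flat prediction; this is what gets coded."""
    return [[px - pred for px in row] for row in block]


top = [100, 101, 102, 103]
left = [100, 100, 101, 101]
block = [[101, 101, 102, 101],
         [101, 100, 101, 101],
         [101, 101, 101, 102],
         [100, 101, 101, 101]]

pred = dc_predict_4x4(top, left)   # (406 + 402 + 4) >> 3 == 101
print(pred)                        # 101
print(residual(block, pred))       # every value is -1, 0 or 1
```

The 16 original pixels are replaced by one predicted value plus a residual that is almost all zeros, which is what makes the subsequent quantization and entropy coding effective.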

In the prior art, as shown in FIG. 1, the WebP lossy compression algorithm first converts the original picture, based on its three RGB channels, into YUV macroblocks (Y represents luminance and U and V represent chrominance). Processing then splits into two branches. One branch performs a simple pre-analysis and segment computation to obtain the parameters required for quantization. The other branch handles the Y, U, and V macroblocks separately and decomposes each macroblock into sub-blocks for further processing, so that each piece of information is analyzed individually, which greatly reduces information loss during encoding. As a result, the whole process, from prediction through DCT, quantization, inverse quantization, and IDCT, forms a closed loop, and within the same picture each macroblock, and likewise each sub-block, depends on the ones before it.
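The first stage, converting RGB into YUV, can be sketched in Python as follows. This is a sketch under stated assumptions: it uses the BT.601 limited-range coefficients commonly used for VP8/WebP (libwebp implements an equivalent fixed-point form with its own rounding) and 4:2:0 chroma subsampling by averaging each 2x2 block, so a 16x16 luma macroblock pairs with two 8x8 chroma blocks. The function names are hypothetical and picture dimensions are assumed to be even.

```python
def _clamp(x):
    return max(0, min(255, int(round(x))))


def rgb_to_yuv(r, g, b):
    """BT.601 limited-range RGB -> YUV for one pixel."""
    y = 16 + 0.257 * r + 0.504 * g + 0.098 * b
    u = 128 - 0.148 * r - 0.291 * g + 0.439 * b
    v = 128 + 0.439 * r - 0.368 * g - 0.071 * b
    return _clamp(y), _clamp(u), _clamp(v)


def rgb_to_yuv420(pixels):
    """pixels: list of rows of (r, g, b). Returns full-size Y and half-size U, V planes."""
    h, w = len(pixels), len(pixels[0])
    yuv = [[rgb_to_yuv(*px) for px in row] for row in pixels]
    y_plane = [[p[0] for p in row] for row in yuv]

    def subsample(c):
        # Average each 2x2 block of channel c, with rounding.
        return [[(yuv[i][j][c] + yuv[i][j + 1][c] + yuv[i + 1][j][c] + yuv[i + 1][j + 1][c] + 2) // 4
                 for j in range(0, w, 2)]
                for i in range(0, h, 2)]

    return y_plane, subsample(1), subsample(2)


print(rgb_to_yuv(255, 255, 255))  # (235, 128, 128)
print(rgb_to_yuv(255, 0, 0))      # (82, 90, 240)
```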

The WebP algorithm is highly complex, and computation of a macroblock cannot start until computation of the preceding macroblock has finished. This produces a blocking design with relatively low processing efficiency. As shown in FIG. 2, when 4 pictures are processed, the whole process runs in this front-to-back blocking mode from time T1 to time T3.
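The blocking dependency can be illustrated with a toy closed-loop coder in Python. It simplifies heavily and is only a sketch: the DCT/IDCT pair is omitted (quantization is applied directly to the residual), each "macroblock" is a flat list of pixels, and prediction is the rounded mean of the previous reconstructed block. What it does show faithfully is that block i is predicted from the reconstructed block i-1, so block i cannot start until block i-1 has been quantized and reconstructed.

```python
def encode_closed_loop(blocks, q):
    """Toy closed-loop coder: predict, quantize, dequantize, reconstruct, repeat.

    The transform stage (DCT/IDCT) is deliberately omitted for brevity.
    Returns (quantized levels, reconstructed blocks).
    """
    levels_out, recon_out = [], []
    recon_prev = None
    for block in blocks:
        # Prediction uses the decoder-visible reconstruction, not the original pixels.
        pred = 128 if recon_prev is None else round(sum(recon_prev) / len(recon_prev))
        levels = [round((px - pred) / q) for px in block]  # quantize the residual
        recon = [pred + lv * q for lv in levels]           # inverse-quantize and reconstruct
        levels_out.append(levels)
        recon_out.append(recon)
        recon_prev = recon  # the next block needs this result: the loop blocks here
    return levels_out, recon_out


levels, recon = encode_closed_loop([[130, 130], [140, 140]], q=5)
print(levels)  # [[0, 0], [2, 2]]
print(recon)   # [[128, 128], [138, 138]]
```

Predicting from the reconstruction rather than from the original keeps encoder and decoder in step, which is exactly why the loop cannot be short-circuited within one picture.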

With the advent of 5G, high-reliability, low-latency, high-bandwidth data transmission has raised the demands on cloud computing performance. To avoid degrading the user experience, the picture compression encoding cycle must be shortened; although the WebP algorithm greatly reduces the amount of encoded data, its overall complexity remains higher than that of other codecs.

Disclosure of Invention

In view of the above technical problems, the present invention provides an FPGA acceleration method for multi-image processing based on the WebP compression algorithm. It offers an effective scheme for accelerating the WebP algorithm on a field-programmable gate array (FPGA): encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU and makes reasonable use of the FPGA's on-board resources. With this FPGA acceleration scheme, the processing time span is shortened so that it runs only from T1 to T2.

An FPGA acceleration method for multi-image processing based on the WebP compression algorithm comprises the following steps:

step S1: the picture is input as RGB three-channel data and converted into the corresponding YUV data;

step S2: the YUV data generated for each picture is buffered in an on-chip DDR buffer and read into the computation module over the bus; according to the processing progress of each of the plurality of pictures, the corresponding data is read from the on-chip DDR buffer and placed in a dependent data buffer area;

step S3: each time the computation traversal of one YUV macroblock of one picture is completed, the macroblock computation traversal of the next picture is performed, and the pictures are switched in turn until all macroblocks of the group of pictures have been encoded.
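The traversal order of step S3 can be sketched as a Python generator. It assumes that each picture's macroblocks are visited in raster order and that a picture which has run out of macroblocks is simply skipped in later rounds, so pictures of different sizes can share one group; the name interleaved_order is hypothetical.

```python
def interleaved_order(mb_counts):
    """Yield (picture, macroblock) pairs, one macroblock per picture per round."""
    progress = [0] * len(mb_counts)
    while any(done < total for done, total in zip(progress, mb_counts)):
        for pic, total in enumerate(mb_counts):
            if progress[pic] < total:  # skip pictures that are already finished
                yield pic, progress[pic]
                progress[pic] += 1


print(list(interleaved_order([2, 3])))
# [(0, 0), (1, 0), (0, 1), (1, 1), (1, 2)]
```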

In a preferred scheme, the method further comprises a parameter buffer area: after the picture is converted into YUV data, segment parameters of the picture are calculated through pre-analysis and buffered in the parameter buffer area.

In a preferred scheme, the parameter buffer area is the FPGA's internal block RAM (BRAM).

In a preferred scheme, the dependent data buffer area is a DDR storage area.
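The segment pre-analysis can be illustrated as follows. In libwebp, the encoder groups macroblocks into at most 4 segments by clustering a per-macroblock complexity measure, and each segment receives its own quantization parameters. The Python sketch below is a simplified stand-in, not libwebp's method: it uses pixel variance as the complexity measure and a plain 1-D k-means with deterministic initialization. The function name and the layout of the returned record (which plays the role of the small per-picture parameter set cached in BRAM) are hypothetical.

```python
from statistics import pvariance


def segment_parameters(macroblocks, num_segments=4, iterations=10):
    """Cluster macroblocks by complexity; return centres and a segment id per macroblock."""
    scores = [pvariance(mb) for mb in macroblocks]  # complexity measure (stand-in)
    lo, hi = min(scores), max(scores)
    k = max(1, min(num_segments, len(scores)))
    # Deterministic start: centres spread evenly between the lowest and highest score.
    centres = [lo + (hi - lo) * i / (k - 1) if k > 1 else lo for i in range(k)]
    labels = [0] * len(scores)
    for _ in range(iterations):
        # Assignment step: each macroblock joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(s - centres[c])) for s in scores]
        # Update step: move each non-empty centre to the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(scores, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return {"centres": centres, "segment_of_mb": labels}


mbs = [[10, 10, 10, 10], [10, 12, 10, 12], [0, 100, 0, 100], [0, 90, 0, 90]]
print(segment_parameters(mbs, num_segments=2)["segment_of_mb"])  # [0, 0, 1, 1]
```

The record is tiny compared with the pixel data, which is consistent with the choice of keeping it in on-chip BRAM rather than DDR.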

The technical scheme has the following advantages or beneficial effects:

The invention provides an FPGA acceleration method for multi-image processing based on the WebP compression algorithm, offering an effective scheme for accelerating the WebP algorithm on a field-programmable gate array (FPGA). Encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU and better suited than a CPU to the blocking closed-loop algorithm, and it increases the output frame rate of the whole WebP algorithm.

Drawings

The invention and its features, aspects and advantages will become more apparent from the detailed description of non-limiting embodiments with reference to the following drawings. Like numbers refer to like parts throughout. The drawings may not be to scale, emphasis instead being placed upon illustrating the principles of the invention.

FIG. 1 shows the prior-art WebP lossy compression algorithm;

FIG. 2 shows the time span of processing 4 pictures, from time T1 to time T3;

FIG. 3 is a schematic diagram of the FPGA acceleration method for multi-image processing based on the WebP compression algorithm.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

As shown in FIG. 3, the invention discloses an FPGA acceleration method for multi-image processing based on the WebP compression algorithm, which provides an effective scheme for accelerating the WebP algorithm on a field-programmable gate array (FPGA). Encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU and makes reasonable use of the FPGA's on-board resources; with this FPGA acceleration scheme, the processing time span is shortened to T1 to T2. The specific method comprises the following steps:

In a preferred embodiment, the method further comprises a parameter buffer area: after the picture is converted into YUV data, segment parameters of the picture are calculated through pre-analysis and buffered in the parameter buffer area, which is the FPGA's internal block RAM (BRAM).

Preferably, the dependent data buffer is a DDR memory area.
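What the dependent data buffer has to hold for each picture can be illustrated by listing which previously reconstructed macroblocks (in raster order) an intra-predicted macroblock needs. The Python sketch below assumes, as a simplification of VP8 intra prediction, that the left, top-left, top, and top-right neighbors are used; the function name is hypothetical. Because macroblock i always depends on macroblock i-1 of the same picture, one picture alone cannot keep a pipeline busy, whereas macroblocks of different pictures never depend on each other.

```python
def mb_dependencies(index, mbs_per_row):
    """Raster-order indices of earlier macroblocks whose reconstructed pixels are needed.

    Assumed neighbourhood: left, top-left, top, top-right (when they exist).
    """
    row, col = divmod(index, mbs_per_row)
    deps = []
    if col > 0:
        deps.append(index - 1)                    # left
    if row > 0:
        if col > 0:
            deps.append(index - mbs_per_row - 1)  # top-left
        deps.append(index - mbs_per_row)          # top
        if col < mbs_per_row - 1:
            deps.append(index - mbs_per_row + 1)  # top-right
    return deps


print(mb_dependencies(0, 4))  # []
print(mb_dependencies(5, 4))  # [4, 0, 1, 2]
print(mb_dependencies(7, 4))  # [6, 2, 3]
```

Under this assumption only the previous macroblock row and the left neighbor are ever needed, which bounds the size of each picture's dependent data region.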

In a specific implementation, as shown in FIG. 3, the host computer writes the picture data into the DDR storage area of the FPGA chip. Following the computation flow, the data of each picture is converted from its three RGB channels and divided into Y, U, and V macroblocks of various sizes, which increases the complexity of the information computation. One branch then computes the segment parameters. Because the segment parameters of each picture are small, they can be cached in the FPGA's internal BRAM, where they wait to be fetched when the macroblocks of the corresponding picture are quantized.

The second branch establishes separate buffer areas for the macroblock data of a plurality of pictures. The macroblock data then enters the computation module through a round-robin arbiter, which traverses N pictures per round. N depends on the total operation latency of the computation module (covering prediction, DCT, quantization, inverse quantization, and IDCT): feeding in one macroblock from each picture in turn hides the latency of the computation module, which indirectly realizes a fully parallel pipelined acceleration scheme. Once the newly IDCT-processed macroblock data of the first picture has been returned to its data buffer area, the next macroblock of the first picture can be fed into the computation module as input.
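The effect of the round-robin arbiter and of the choice of N can be checked with a cycle-level Python model. It is a sketch under stated assumptions, not the patented hardware: the computation module is assumed to be fully pipelined (it accepts one macroblock per cycle) with a fixed latency, in cycles, from prediction to IDCT output, and a picture's next macroblock may be issued only once its previous result is back. The function name simulate is hypothetical.

```python
def simulate(num_pictures, mbs_per_picture, latency):
    """Cycle on which the last macroblock leaves the computation module.

    Assumptions: the module accepts one macroblock per cycle (fully pipelined) and
    returns its reconstructed result `latency` cycles later; a picture's next
    macroblock may be issued only after its previous result has come back.
    """
    next_mb = [0] * num_pictures   # progress of each picture
    ready_at = [0] * num_pictures  # cycle at which each picture may issue again
    remaining = num_pictures * mbs_per_picture
    turn = 0                       # round-robin pointer
    cycle = 0
    last_done = 0
    while remaining:
        for k in range(num_pictures):  # round-robin arbiter, starting at `turn`
            p = (turn + k) % num_pictures
            if next_mb[p] < mbs_per_picture and ready_at[p] <= cycle:
                next_mb[p] += 1
                ready_at[p] = cycle + latency
                last_done = max(last_done, ready_at[p])
                remaining -= 1
                turn = (p + 1) % num_pictures
                break
        cycle += 1
    return last_done


print(simulate(1, 3, 5))  # 15: one picture, fully serial
print(simulate(4, 3, 4))  # 15: 12 macroblocks, versus 4 * 3 * 4 = 48 serially
print(simulate(2, 3, 4))  # 13: N < latency, so the module idles part of each round
```

Under these assumptions, once N is at least the latency the arbiter never stalls and the group finishes in N × M + latency - 1 cycles (M macroblocks per picture), which matches the statement that N is chosen according to the operation period of the computation module.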
Each macroblock that completes the closed loop is written into the DDR storage space allocated to its picture. A picture can be encoded only after all of its data has been computed, so to hold the compressed information of each picture, the processed macroblock data of the N pictures is buffered in a larger buffer space available to the FPGA and then read out of the DDR in turn into the pipelined encoding module.
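The output side can be sketched in Python as a per-picture buffer that releases a picture to the encoding module only once all of its macroblocks have completed the closed loop. The class and method names are hypothetical, and a deque stands in for the queue of pictures awaiting encoding.

```python
from collections import deque


class PictureOutputBuffer:
    """Collects finished macroblocks per picture; a picture is encodable only when complete."""

    def __init__(self, mb_counts):
        self.mb_counts = list(mb_counts)
        self.blocks = [{} for _ in self.mb_counts]  # one storage region per picture
        self.ready = deque()                        # pictures waiting for the encoder

    def write(self, picture, mb_index, data):
        self.blocks[picture][mb_index] = data
        if len(self.blocks[picture]) == self.mb_counts[picture]:
            self.ready.append(picture)

    def pop_ready(self):
        """Return (picture, macroblocks in raster order), or None if nothing is complete."""
        if not self.ready:
            return None
        pic = self.ready.popleft()
        return pic, [self.blocks[pic][i] for i in range(self.mb_counts[pic])]


buf = PictureOutputBuffer([2, 3])
for pic, mb in [(0, 0), (1, 0), (0, 1), (1, 1), (1, 2)]:
    buf.write(pic, mb, f"p{pic}mb{mb}")
print(buf.pop_ready())  # (0, ['p0mb0', 'p0mb1'])
print(buf.pop_ready())  # (1, ['p1mb0', 'p1mb1', 'p1mb2'])
```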

The method is based on the following observation of Google's open-source CPU implementation of the WebP algorithm: the whole process operates on a single picture at a time, extracting information from its many macroblocks. Because the essence of the picture compression algorithm is to filter out similar information within each macroblock and retain information with larger differences, and adjacent macroblocks are treated in the same way, the object of the closed-loop process is a single macroblock. The minimum cycle interval is therefore the number of cycles needed to compute a single macroblock, and the number of cycles needed to process a picture grows in proportion to the number of macroblocks the picture is decomposed into.
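The proportionality can be made concrete with a short Python calculation. It assumes 16x16 luma macroblocks (the macroblock size used by VP8/WebP) and, hypothetically, a fixed per-macroblock loop latency; the numbers are illustrative, not measurements.

```python
def _ceil_div(a, b):
    return -(-a // b)


def macroblock_count(width, height, mb_size=16):
    """Number of 16x16 luma macroblocks covering a picture (partial blocks are padded)."""
    return _ceil_div(width, mb_size) * _ceil_div(height, mb_size)


def serial_cycles(width, height, latency):
    """Cycles for one picture when each macroblock waits `latency` cycles for the previous one."""
    return macroblock_count(width, height) * latency


print(macroblock_count(1920, 1080))  # 8160  (120 x 68)
print(macroblock_count(3840, 2160))  # 32400 (240 x 135)
```

Going from 1080p to 2160p roughly quadruples the macroblock count, and under the serial model the per-picture cycle count quadruples with it.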

To avoid this situation and increase the output frame rate of the picture compression algorithm, the acceleration scheme of the invention feeds macroblocks from a plurality of pictures into the closed-loop process in turn, filling the intermediate blocking period. This is feasible because macroblock computations of different pictures do not interfere with one another and FPGA resources are highly configurable. Compared with a CPU, this parallel pipelined computation mode is better suited to the blocking closed-loop algorithm, and it increases the output frame rate of the whole WebP algorithm.

Those skilled in the art will understand that variations can be implemented by combining the prior art with the above embodiments; such variations do not affect the essence of the present invention and are not described herein.

The preferred embodiments of the present invention have been described above. It is to be understood that the invention is not limited to the specific embodiments described above, wherein devices and structures not described in detail are to be understood as being implemented in a manner common in the art; any person skilled in the art can make many possible variations and modifications to the technical solution of the present invention or modifications to equivalent embodiments without departing from the scope of the technical solution of the present invention, using the methods and technical contents disclosed above, without affecting the essential content of the present invention. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

Claims

1. An FPGA acceleration method for multi-image processing based on the WebP compression algorithm, characterized by comprising the following steps:

step S3: each time the computation traversal of one YUV macroblock of one picture is completed, the macroblock computation traversal of the next picture is performed, and the pictures are switched in turn until all macroblocks of the group of pictures have been encoded;

the method further comprises a parameter buffer area, and further comprises, after the picture is converted into YUV data, calculating segment parameters of the picture by pre-analysis of the YUV data and buffering the segment parameters in the parameter buffer area, wherein the parameter buffer area is the FPGA's internal block RAM (BRAM).

2. The FPGA acceleration method for multi-image processing based on the WebP compression algorithm according to claim 1, wherein the dependent data buffer area is a DDR storage area.