CN111815502B - FPGA acceleration method for multi-graph processing based on WebP compression algorithm - Google Patents
- Publication number
- CN111815502B
- Authority
- CN
- China
- Prior art keywords
- data
- pictures
- picture
- yuv
- webp
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention relates to the technical field of image processing, and in particular to an FPGA acceleration method for multi-image processing based on the WebP compression algorithm. The method comprises: importing pictures as RGB three-channel data and converting them into corresponding YUV data; buffering the YUV data generated for each picture in an on-chip DDR buffer, reading the data into a calculation module over a bus, and, according to the processing progress of the several pictures, reading the corresponding data from the on-chip DDR buffer into a dependent-data buffer area; and, each time the calculation traversal of one YUV macroblock of one picture is completed, performing the macroblock calculation traversal of the next picture, switching between the pictures in turn until all macroblocks of the group of pictures are encoded. The invention provides an effective acceleration scheme: encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU and better suited than a CPU to a blocking closed-loop algorithm, and it improves the output frame rate of the whole WebP algorithm.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to an FPGA acceleration method for multi-image processing based on the WebP compression algorithm.
Background
With the development of image acquisition devices such as mobile phones, tablets and digital cameras, and with the growth in picture resolution, the volume of internet image data is increasing exponentially. Recent studies indicate that the amount of data stored on data center servers will grow roughly fourfold, from 663 EB in 2016 to 2.6 ZB in 2021, with most of that data coming from images and video.
Currently, images account for 60%-65% of the bytes on most web pages. Image data is particularly important for mobile devices, where smaller images save bandwidth and battery life. WebP is a picture format proposed by Google on the basis of VP8 coding to meet ever-increasing bandwidth demands. WebP uses predictive coding: the color values of a pixel block are predicted from those of neighboring blocks, and only the difference between the prediction and the actual values is recorded. In most cases this difference is very small, or even zero, so the compression ratio is greatly improved. Compared with JPEG compression: at a quality equivalent to 90% of the original image, WebP reduces the picture size by about 50%; at a quality equivalent to 80% of the original image, it reduces the picture size by 60%-80%. Lossy WebP outperforms JPEG mainly because of its more advanced predictive coding; macroblock-adaptive quantization further improves compression efficiency, and Boolean arithmetic coding improves compression performance by 5%-10% compared with Huffman coding.
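The predictive-coding idea described above can be illustrated with a minimal sketch. It assumes a simplified DC-style predictor (the rounded mean of the pixels above and to the left of a block); the function names and the sample values are hypothetical and are not taken from the patent or from any WebP library.

```python
def dc_predict(top, left):
    """Predict every pixel of a block as the rounded mean of its neighbours."""
    neighbours = list(top) + list(left)
    return (sum(neighbours) + len(neighbours) // 2) // len(neighbours)


def residual(block, prediction):
    """Only this difference between the block and its prediction is recorded."""
    return [[pixel - prediction for pixel in row] for row in block]


# Hypothetical 4x4 luminance block with its already-coded neighbours.
top = [100, 101, 102, 103]      # row of pixels above the block
left = [100, 100, 101, 101]     # column of pixels left of the block
block = [
    [101, 101, 102, 102],
    [101, 101, 102, 102],
    [100, 101, 101, 102],
    [100, 101, 101, 102],
]

pred = dc_predict(top, left)    # 101 for the values above
res = residual(block, pred)     # every entry is -1, 0 or 1
```

Because the residual values cluster around zero, they need far fewer bits after transform and entropy coding than the raw pixel values would.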
In the prior art, as shown in FIG. 1, the WebP lossy compression algorithm first converts the original picture from its three RGB channels into the corresponding YUV macroblocks (Y represents luminance and UV represents chrominance). Processing then splits into two branches. One branch obtains the calculation parameters required by the quantization process through a simple pre-analysis and segment calculation. The other branch handles the Y, U and V macroblocks separately and processes them further as sub-blocks decomposed from the macroblocks, so that each element of picture information is analyzed and the information loss during encoding is greatly reduced. For this reason, the whole process, from prediction through DCT, quantization, inverse quantization and IDCT, forms a closed loop, and within one picture each macroblock (and each sub-block) depends on the ones before it.
The WebP algorithm is highly complex, and the calculation of a macroblock cannot start until the calculation of the preceding macroblock has finished. This results in a blocking design with relatively low processing efficiency. As shown in FIG. 2, when 4 pictures are processed in this blocking manner, the whole processing spans the period from time T1 to time T3.
With the advent of the 5G era, highly reliable, low-latency, high-bandwidth data transmission places higher demands on cloud computing performance. To avoid degrading the customer experience, the time taken to compress and encode a picture must be shortened. Although the WebP algorithm greatly reduces the encoded size, its overall algorithmic complexity is still higher than that of other codecs.
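The closed loop of prediction, DCT, quantization, inverse quantization and IDCT can be sketched as follows. This is only an illustration under stated assumptions: a floating-point orthonormal 4x4 DCT stands in for the integer transforms a real VP8/WebP encoder uses, a single quantization step `q` replaces the per-coefficient quantizers, and all function names are hypothetical.

```python
import math

N = 4  # sub-block size used in this sketch

# Orthonormal DCT-II basis matrix: C[k][n]
C = [[math.sqrt((1 if k == 0 else 2) / N)
      * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
      for n in range(N)] for k in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block):
    """2-D forward transform: C * block * C^T."""
    return matmul(matmul(C, block), transpose(C))


def idct2(coeffs):
    """2-D inverse transform: C^T * coeffs * C."""
    return matmul(matmul(transpose(C), coeffs), C)


def encode_block(block, prediction, q):
    """One pass through the closed loop for a single sub-block."""
    res = [[block[i][j] - prediction[i][j] for j in range(N)] for i in range(N)]
    levels = [[round(c / q) for c in row] for row in dct2(res)]   # quantization
    dequant = [[level * q for level in row] for row in levels]    # inverse quantization
    recon_res = idct2(dequant)
    recon = [[prediction[i][j] + round(recon_res[i][j]) for j in range(N)]
             for i in range(N)]
    return levels, recon


prediction = [[100] * N for _ in range(N)]
block = [[100 + 3 * i + j for j in range(N)] for i in range(N)]
levels, recon = encode_block(block, prediction, q=4)
max_error = max(abs(block[i][j] - recon[i][j]) for i in range(N) for j in range(N))
```

The reconstructed block `recon`, not the original `block`, is what the next block must be predicted from, so that encoder and decoder stay in sync. That is the source of the front-to-back dependency: the next block cannot start until this loop has finished.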
Disclosure of Invention
In view of the above technical problems, the present invention provides an FPGA acceleration method for multi-image processing based on the WebP compression algorithm. It offers an effective scheme for accelerating the WebP algorithm on a field-programmable gate array (FPGA): encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU and makes reasonable use of the FPGA's on-board resources. With this acceleration scheme, the processing time span can be shortened from T1-T3 to T1-T2.
An FPGA acceleration method for multi-image processing based on the WebP compression algorithm, characterized by comprising the following steps:
Step S1: the pictures are input as RGB three-channel data and converted into corresponding YUV data;
Step S2: the YUV data generated for each picture is buffered in an on-chip DDR buffer and read into a calculation module over a bus; according to the processing progress of the several pictures, the corresponding data is read from the on-chip DDR buffer and placed into a dependent-data buffer area;
Step S3: each time the calculation traversal of one YUV macroblock of one picture is completed, the macroblock calculation traversal of the next picture is performed, and the pictures are switched in turn until all macroblocks of the group of pictures are encoded.
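Step S1 can be sketched as below. The patent does not give the conversion coefficients, so this sketch assumes the common BT.601 studio-range formulas in floating point; the function names are hypothetical, and chroma subsampling is left out for brevity.

```python
def rgb_to_yuv(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YUV (BT.601 studio range, assumed)."""
    y = 16 + 0.257 * r + 0.504 * g + 0.098 * b
    u = 128 - 0.148 * r - 0.291 * g + 0.439 * b
    v = 128 + 0.439 * r - 0.368 * g - 0.071 * b

    def clamp(x):
        return max(0, min(255, int(round(x))))

    return clamp(y), clamp(u), clamp(v)


def picture_to_planes(pixels):
    """Split a picture given as rows of (R, G, B) tuples into Y, U and V planes."""
    y_plane, u_plane, v_plane = [], [], []
    for row in pixels:
        converted = [rgb_to_yuv(*px) for px in row]
        y_plane.append([c[0] for c in converted])
        u_plane.append([c[1] for c in converted])
        v_plane.append([c[2] for c in converted])
    return y_plane, u_plane, v_plane


# A hypothetical 1x2 picture: one white pixel and one black pixel.
y_plane, u_plane, v_plane = picture_to_planes([[(255, 255, 255), (0, 0, 0)]])
```

The resulting planes are what step S2 would buffer per picture before macroblocks are handed to the calculation module.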
In a preferred scheme, the method further comprises a parameter buffer area: after the conversion into YUV data, segment parameters of the picture are calculated through pre-analysis and buffered in the parameter buffer area.
In a preferred scheme, the parameter buffer area is the FPGA's internal block RAM (BRAM).
In a preferred scheme, the dependent-data buffer area is a DDR storage area.
The technical scheme has the following advantages or beneficial effects:
The invention provides an FPGA acceleration method for multi-image processing based on the WebP compression algorithm. It offers an effective scheme for accelerating the WebP algorithm on a field-programmable gate array (FPGA): encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU and better suited than a CPU to a blocking closed-loop algorithm, and it improves the output frame rate of the whole WebP algorithm.
Drawings
The invention and its features, aspects and advantages will become more apparent from the detailed description of non-limiting embodiments with reference to the following drawings. Like numbers refer to like parts throughout. The drawings may not be to scale, emphasis instead being placed upon illustrating the principles of the invention.
FIG. 1 is a flow diagram of the prior-art WebP lossy compression algorithm;
FIG. 2 is a time-span diagram of the processing of 4 pictures from time T1 to time T3;
FIG. 3 is a schematic diagram of the FPGA acceleration method for multi-image processing based on the WebP compression algorithm.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
As shown in FIG. 3, the invention discloses an FPGA acceleration method for multi-image processing based on the WebP compression algorithm. It offers an effective scheme for accelerating the WebP algorithm on a field-programmable gate array (FPGA): encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU and makes reasonable use of the FPGA's on-board resources. With this acceleration scheme, the processing time span is shortened from T1-T3 to T1-T2. The specific method comprises the following steps:
Step S1: the pictures are input as RGB three-channel data and converted into corresponding YUV data;
Step S2: the YUV data generated for each picture is buffered in an on-chip DDR buffer and read into a calculation module over a bus; according to the processing progress of the several pictures, the corresponding data is read from the on-chip DDR buffer and placed into a dependent-data buffer area;
Step S3: each time the calculation traversal of one YUV macroblock of one picture is completed, the macroblock calculation traversal of the next picture is performed, and the pictures are switched in turn until all macroblocks of the group of pictures are encoded.
In a preferred embodiment, the method further comprises a parameter buffer area: after the conversion into YUV data, segment parameters of the picture are calculated through pre-analysis and buffered in the parameter buffer area, which is the FPGA's internal block RAM (BRAM).
Preferably, the dependent-data buffer area is a DDR storage area.
In a specific implementation, as shown in FIG. 3, the host computer writes the picture data into the DDR storage area of the FPGA. Following the calculation flow, the data of each picture is converted from its three RGB channels and divided into Y, U and V macroblocks of various sizes, which increases the complexity of the calculation. The first branch then calculates the segment parameters. Because the segment parameters of each picture are small, they can be cached in the FPGA's internal block RAM (BRAM), where they wait to be fetched when the macroblocks of the corresponding picture are quantized. The second branch sets up separate buffer areas for the macroblock data of the several pictures. The macroblock data then enters the calculation module through a round-robin arbiter, which traverses the N pictures once per round. The value of N depends on the overall operation period of the calculation module (including prediction, DCT, quantization, inverse quantization and IDCT): the operation period of the calculation module is hidden behind the data input of one macroblock from each picture, which indirectly yields a fully parallel pipelined acceleration scheme. Only when the macroblock data produced by the IDCT has been returned to the data buffer area can the next macroblock of the first picture be passed into the calculation module as input.
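The round-robin arbitration described above can be modeled with a small cycle-level simulation. This is a sketch under simplifying assumptions that are not stated in the patent: every macroblock takes `latency` cycles to pass through the calculation module, the module accepts a new macroblock every `issue_interval` cycles, every picture has the same number of macroblocks, and the arbiter visits the pictures in strict rotation. All names are hypothetical.

```python
def round_robin_schedule(num_pictures, blocks_per_picture, latency, issue_interval):
    """Simulate strict round-robin issue of macroblocks from several pictures.

    Returns (finish_cycle, issue_order), where issue_order is a list of
    (picture_index, macroblock_index) pairs in the order they enter the module.
    """
    ready_at = [0] * num_pictures   # cycle at which each picture's next block may enter
    issued = [0] * num_pictures     # macroblocks issued so far, per picture
    cycle = 0                       # earliest cycle the module accepts new input
    finish = 0
    order = []
    pic = 0
    while sum(issued) < num_pictures * blocks_per_picture:
        # Skip pictures that have no macroblocks left.
        while issued[pic] == blocks_per_picture:
            pic = (pic + 1) % num_pictures
        # A block must wait for the previous block of the same picture to return.
        start = max(cycle, ready_at[pic])
        order.append((pic, issued[pic]))
        issued[pic] += 1
        ready_at[pic] = start + latency
        finish = max(finish, ready_at[pic])
        cycle = start + issue_interval
        pic = (pic + 1) % num_pictures
    return finish, order


# One picture alone: each macroblock waits for the previous one (blocking).
single_finish, _ = round_robin_schedule(1, 3, latency=8, issue_interval=2)

# Four pictures interleaved: the waiting time is filled with other pictures' blocks.
multi_finish, multi_order = round_robin_schedule(4, 3, latency=8, issue_interval=2)
```

With these example numbers, one picture of 3 macroblocks finishes at cycle 24, while four such pictures interleaved finish at cycle 30 rather than the 96 cycles that four blocking runs would need.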
Each macroblock that completes the closed loop is written into the DDR storage space of the partition for its picture. Encoding can start only after all the data of one picture has been calculated. Therefore, to hold the compressed information of whole pictures, the processed macroblock data of the N pictures is buffered in the larger DDR buffer space of the FPGA and then read out of the DDR in sequence and fed to the encoding module of the pipeline.
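The buffering rule in this paragraph, where a picture is passed to the encoder only once all of its macroblocks have come back, can be sketched in software as follows. The dictionary stands in for the per-picture DDR partitions, and the function name and data layout are hypothetical.

```python
def collect_and_encode(results, blocks_per_picture):
    """Group interleaved macroblock results by picture.

    `results` yields (picture_id, block_index, data) in the order the
    calculation module emits them. A picture is released to the encoder
    only when all of its macroblocks are present.
    """
    partitions = {}      # stands in for the per-picture DDR partitions
    ready_for_encoder = []
    for picture_id, block_index, data in results:
        partitions.setdefault(picture_id, {})[block_index] = data
        if len(partitions[picture_id]) == blocks_per_picture:
            blocks = [partitions[picture_id][i] for i in range(blocks_per_picture)]
            ready_for_encoder.append((picture_id, blocks))
            del partitions[picture_id]   # partition can be reused
    return ready_for_encoder


# Results of two pictures arrive interleaved, two macroblocks each.
stream = [(0, 0, "a"), (1, 0, "c"), (0, 1, "b"), (1, 1, "d")]
encoded_order = collect_and_encode(stream, blocks_per_picture=2)
```

In this example picture 0 is released first with blocks `["a", "b"]`, then picture 1 with `["c", "d"]`, even though their macroblocks arrived interleaved.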
By contrast, in Google's open-source CPU implementation of the WebP algorithm, the whole process operates on a single picture and extracts information from its macroblocks one after another. The essence of the picture compression algorithm is to filter out the similar information within each macroblock, and between adjacent macroblocks, and to keep the information that differs more. The closed-loop process therefore operates on a single macroblock, the minimum loop interval is the number of cycles needed to calculate one macroblock, and the number of cycles needed to process a picture grows in proportion to the number of macroblocks it is decomposed into.
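The proportional growth described above is easy to quantify. The sketch below assumes 16x16 macroblocks (the VP8 macroblock size) and a purely hypothetical cost of 100 cycles per macroblock; neither figure is given in the patent.

```python
def macroblock_count(width, height, mb_size=16):
    """Number of macroblocks covering a picture, rounding partial blocks up."""
    cols = -(-width // mb_size)    # ceiling division
    rows = -(-height // mb_size)
    return cols * rows


def serial_picture_cycles(width, height, cycles_per_macroblock):
    """Cycles for one picture when each macroblock waits for the previous one."""
    return macroblock_count(width, height) * cycles_per_macroblock


vga_blocks = macroblock_count(640, 480)        # 40 x 30 = 1200
hd_blocks = macroblock_count(1920, 1080)       # 120 x 68 = 8160
vga_cycles = serial_picture_cycles(640, 480, cycles_per_macroblock=100)
hd_cycles = serial_picture_cycles(1920, 1080, cycles_per_macroblock=100)
```

Going from 640x480 to 1920x1080 multiplies the macroblock count, and therefore the blocking processing time, by 6.8.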
To avoid this and increase the output frame rate of the picture compression algorithm, the acceleration scheme of the invention inserts the macroblock processing of several pictures, in turn, into the closed-loop process to fill the intermediate blocking periods. This is feasible for two reasons: macroblock calculations for different pictures do not interfere with one another, and FPGA resources are highly configurable. With parallel pipelined calculation, the FPGA is better suited than a CPU to a blocking closed-loop algorithm, and the output frame rate of the whole WebP algorithm is improved.
Those skilled in the art will understand that variations can be implemented by combining the prior art with the above embodiments; they are not described in detail here. Such variations do not affect the essence of the present invention.
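How many pictures are needed to fill the blocking period, and what that buys, can be estimated with a closed-form model. The formulas below are an illustrative assumption rather than part of the patent: they suppose a fixed module latency per macroblock, a fixed input interval per macroblock, equal macroblock counts per picture, and strict rotation between pictures. All names are hypothetical.

```python
import math


def pictures_to_fill_pipeline(latency, input_interval):
    """Smallest N for which one round of inputs covers the module latency."""
    return math.ceil(latency / input_interval)


def serial_cycles(num_pictures, blocks_per_picture, latency):
    """Blocking processing: every macroblock waits for the previous one."""
    return num_pictures * blocks_per_picture * latency


def interleaved_cycles(num_pictures, blocks_per_picture, latency, input_interval):
    """Round-robin processing of num_pictures pictures through one module."""
    round_time = max(latency, num_pictures * input_interval)
    return ((blocks_per_picture - 1) * round_time
            + (num_pictures - 1) * input_interval
            + latency)


n = pictures_to_fill_pipeline(latency=8, input_interval=2)            # 4
blocking = serial_cycles(n, 3, latency=8)                             # 96
pipelined = interleaved_cycles(n, 3, latency=8, input_interval=2)     # 30
speedup = blocking / pipelined
```

Under this model the gain approaches N as the number of macroblocks per picture grows, and adding pictures beyond N brings no further gain because the input side becomes the bottleneck.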
The preferred embodiments of the present invention have been described above. It is to be understood that the invention is not limited to the specific embodiments described above, wherein devices and structures not described in detail are to be understood as being implemented in a manner common in the art; any person skilled in the art can make many possible variations and modifications to the technical solution of the present invention or modifications to equivalent embodiments without departing from the scope of the technical solution of the present invention, using the methods and technical contents disclosed above, without affecting the essential content of the present invention. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.
Claims (2)
1. An FPGA acceleration method for multi-image processing based on the WebP compression algorithm, characterized by comprising the following steps:
Step S1: the pictures are input as RGB three-channel data and converted into corresponding YUV data;
Step S2: the YUV data generated for each picture is buffered in an on-chip DDR buffer and read into a calculation module over a bus; according to the processing progress of the several pictures, the corresponding data is read from the on-chip DDR buffer and placed into a dependent-data buffer area;
Step S3: each time the calculation traversal of one YUV macroblock of one picture is completed, the macroblock calculation traversal of the next picture is performed, and the pictures are switched in turn until all macroblocks of the group of pictures are encoded;
the method further comprises a parameter buffer area: after the conversion into YUV data, segment parameters of the picture are calculated through pre-analysis of the YUV data and buffered in the parameter buffer area, wherein the parameter buffer area is the FPGA's internal block RAM (BRAM).
2. The FPGA acceleration method for multi-image processing based on the WebP compression algorithm according to claim 1, wherein the dependent-data buffer area is a DDR storage area.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010653783.9A CN111815502B (en) | 2020-07-08 | 2020-07-08 | FPGA acceleration method for multi-graph processing based on WebP compression algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111815502A (en) | 2020-10-23 |
CN111815502B (en) | 2023-11-28 |
Family
ID=72843439
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202010653783.9A | Active | CN111815502B (en) | 2020-07-08 | 2020-07-08 | FPGA acceleration method for multi-graph processing based on WebP compression algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111815502B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112437308B (en) * | 2020-11-12 | 2024-11-01 | 北京深维科技有限公司 | WebP coding method and WebP coding device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105488753A (en) * | 2015-11-27 | 2016-04-13 | 武汉精测电子技术股份有限公司 | Method and device for carrying out two-dimensional Fourier transform and inverse transform on image |
CN107154062A (en) * | 2017-05-12 | 2017-09-12 | 郑州云海信息技术有限公司 | A kind of implementation method of WebP Lossy Compression Algorithms, apparatus and system |
CN107483948A (en) * | 2017-09-18 | 2017-12-15 | 郑州云海信息技术有限公司 | Pixel macroblock processing method in a kind of webp compressions processing |
CN109327698A (en) * | 2018-11-09 | 2019-02-12 | 杭州网易云音乐科技有限公司 | Dynamic previewing map generalization method, system, medium and electronic equipment |
CN110689475A (en) * | 2019-09-10 | 2020-01-14 | 浪潮电子信息产业股份有限公司 | Image data processing method, system, electronic equipment and storage medium |
CN110876078A (en) * | 2018-08-30 | 2020-03-10 | 阿里巴巴集团控股有限公司 | Animation picture processing method and device, storage medium and processor |
CN110913225A (en) * | 2019-11-19 | 2020-03-24 | 北京奇艺世纪科技有限公司 | Image encoding method, image encoding device, electronic device, and computer-readable storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10552936B2 (en) * | 2016-03-02 | 2020-02-04 | Alibaba Group Holding Limited | Solid state storage local image processing system and method |
US10719447B2 (en) * | 2016-09-26 | 2020-07-21 | Intel Corporation | Cache and compression interoperability in a graphics processor pipeline |
Non-Patent Citations (2)
Title |
---|
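Zhenhua Guo et al. An OpenCL Implementation of WebP Accelerator on FPGAs. Applied Reconfigurable Computing. Architectures, Tools, and Applications. 2018, pp. 578-589, Sections 1-3. * |
Han Yu et al. FPGA Design and Implementation Technology for GF-7 Satellite Image Compression. Spacecraft Engineering, Vol. 29, No. 3, pp. 169-176, Sections 1-5. * |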
Also Published As
Publication number | Publication date |
---|---|
CN111815502A (en) | 2020-10-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11057585B2 (en) | Image processing method and device using line input and output | |
KR101941955B1 (en) | Recursive block partitioning | |
CN110351568A (en) | A kind of filtering video loop device based on depth convolutional network | |
TW202228081A (en) | Method and apparatus for reconstruct image from bitstreams and encoding image into bitstreams, and computer program product | |
CN114761968B (en) | Method, system and storage medium for frequency domain static channel filtering | |
JP2010515317A (en) | Apparatus and method for encoding transform coefficient block | |
AU2018357828A1 (en) | Method and apparatus for super-resolution using line unit operation | |
CN111815502B (en) | FPGA acceleration method for multi-graph processing based on WebP compression algorithm | |
WO2023082107A1 (en) | Decoding method, encoding method, decoder, encoder, and encoding and decoding system | |
US9794574B2 (en) | Adaptive tile data size coding for video and image compression | |
WO2022252222A1 (en) | Encoding method and encoding device | |
CN114449262A (en) | Video coding control method, device, equipment and storage medium | |
CN105100799A (en) | Method for reducing intraframe coding time delay in HEVC encoder | |
CN108900842B (en) | Y data compression processing method, device and equipment and WebP compression system | |
CN110446043A (en) | A kind of HEVC fine grained parallel coding method based on multi-core platform | |
US20110299790A1 (en) | Image compression method with variable quantization parameter | |
WO2024078066A1 (en) | Video decoding method and apparatus, video encoding method and apparatus, storage medium, and device | |
EP4300976A1 (en) | Audio/video or image layered compression method and apparatus | |
CN112437308B (en) | WebP coding method and WebP coding device | |
CN114727116A (en) | Encoding method and device | |
CN112738522A (en) | Video coding method and device | |
TWI832661B (en) | Methods, devices and storage media for image coding or decoding | |
CN114119789B (en) | A lightweight HEVC chroma image quality enhancement method based on online learning | |
CN114205614B (en) | A Parallel Hardware Method for Intra Prediction Mode Based on HEVC Standard | |
WO2024078403A1 (en) | Image processing method and apparatus, and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||