CN111815502B - FPGA acceleration method for multi-graph processing based on WebP compression algorithm - Google Patents

FPGA acceleration method for multi-graph processing based on WebP compression algorithm

Info

Publication number
CN111815502B
Authority
CN
China
Prior art keywords
data
pictures
picture
yuv
webp
Prior art date
Legal status
Active
Application number
CN202010653783.9A
Other languages
Chinese (zh)
Other versions
CN111815502A (en)
Inventor
杨晓成
Current Assignee
Shanghai Xuehu Technology Co ltd
Original Assignee
Shanghai Xuehu Technology Co ltd
Priority date
2020-07-08
Filing date
2020-07-08
Publication date
2023-11-28
Application filed by Shanghai Xuehu Technology Co ltd
Priority to CN202010653783.9A
Publication of CN111815502A
Application granted
Publication of CN111815502B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to the technical field of image processing, and in particular to an FPGA acceleration method for processing multiple pictures based on the WebP compression algorithm. The method comprises: inputting pictures as RGB three-channel data and converting them into the corresponding YUV data; buffering the YUV data generated for each picture in an on-chip DDR buffer memory and reading it into a calculation module over the bus, where, according to the processing progress of each of several pictures, the corresponding data is read from the on-chip DDR buffer memory and placed into a dependent-data buffer area; and, each time the computation traversal of one YUV macroblock of one picture is completed, performing the macroblock traversal of the next picture, switching between the pictures in turn until all macroblocks of the group of pictures are encoded. The invention provides an effective acceleration scheme: encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU and better suited than a CPU to the blocking closed-loop algorithm, and the output frame rate of the whole WebP algorithm is improved.

Description

FPGA acceleration method for multi-graph processing based on WebP compression algorithm
Technical Field
The invention relates to the technical field of image processing, in particular to an FPGA acceleration method for multi-graph processing based on a WebP compression algorithm.
Background
With the development of image acquisition devices such as mobile phones, tablets and digital cameras, and with ever-larger picture resolutions, the volume of image data on the internet is growing exponentially. Recent studies have shown that data storage on data center servers will grow roughly fourfold, from 663 EB in 2016 to 2.6 ZB in 2021, with most of it coming from images and video.
Currently, images account for up to 60%-65% of the bytes on most web pages, and page image data matters especially on mobile devices, where less image data saves bandwidth and battery life. To meet ever-growing bandwidth demands, Google proposed WebP, a new picture format based on VP8 coding. WebP uses predictive coding: the pixel values of a block are predicted from neighboring, already-coded pixel blocks, and only the difference between the prediction and the actual values is recorded. In most cases this difference is very small, or even zero, which greatly improves the compression ratio. Compared with JPEG, when WebP compresses a JPG at a quality equivalent to 90% of the original image, the picture size is reduced by about 50%; at a quality equivalent to 80% of the original image, the size is reduced by 60%-80%. Lossy WebP outperforms JPG mainly because of its more advanced predictive coding; macroblock-adaptive quantization also improves compression efficiency, and Boolean arithmetic coding improves compression by 5%-10% over Huffman coding.
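The predictive-coding idea can be illustrated with a minimal Python sketch. It assumes the simplest VP8-style mode, DC prediction, in which a block is predicted as the rounded mean of the reconstructed pixels directly above and to its left; VP8 also offers other prediction modes (vertical, horizontal, TrueMotion) that are not modelled here, and the function names and pixel values are hypothetical.

```python
def dc_predict(above, left):
    """Simplified DC prediction: the rounded mean of the reconstructed
    neighbouring pixels above and to the left of the block."""
    neighbours = list(above) + list(left)
    n = len(neighbours)
    return (sum(neighbours) + n // 2) // n


def residual(block, prediction):
    """Only the difference between the actual pixels and the prediction is kept."""
    return [[p - prediction for p in row] for row in block]


above = [100, 102, 101, 99]   # reconstructed row above the 4x4 block
left = [98, 100, 101, 100]    # reconstructed column to its left
block = [[100, 101, 100, 99],
         [99, 100, 101, 100],
         [100, 100, 100, 101],
         [101, 100, 99, 100]]

pred = dc_predict(above, left)   # 100
res = residual(block, pred)      # every entry is -1, 0 or 1
```

The raw pixels span values around 100, but the residual entries are all in the range -1 to 1, which is what makes the subsequent transform and entropy coding cheap.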
In the prior art, as shown in FIG. 1, the WebP lossy compression algorithm first converts the original picture, analyzed according to its three RGB channels, into YUV macroblocks (Y represents luma, and U and V represent chroma). Processing then splits into two branches. One branch obtains the calculation parameters required during quantization through a simple pre-analysis and segment calculation. The other branch processes the Y, U and V macroblocks separately, further decomposing each macroblock into sub-blocks, so that each piece of information is analyzed and the information loss during encoding is greatly reduced. As a result, prediction, DCT, quantization, inverse quantization and IDCT form a closed loop, and within the same picture each macroblock, and likewise each sub-block, depends on the ones before it.
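A minimal Python sketch of this closed loop, and of the dependency it creates, is given below. It is an illustrative model only: it assumes a floating-point orthonormal 4x4 DCT, a single uniform quantizer step `qstep`, and a simplified DC prediction taken from the previous block's reconstructed right-hand column, whereas VP8 uses an integer transform and its own quantizer tables. All names are hypothetical.

```python
import math

N = 4  # sub-block size (4x4)

# Orthonormal DCT-II basis matrix: X = C x C^T and x = C^T X C.
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
      for n in range(N)] for k in range(N)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(x):
    return matmul(matmul(C, x), transpose(C))


def idct2(coeffs):
    return matmul(matmul(transpose(C), coeffs), C)


def encode_block(block, prediction, qstep):
    """One pass through the closed loop: predict, DCT, quantize, dequantize, IDCT.
    Returns the quantized levels (passed on for entropy coding) and the
    reconstructed block (the reference the next block is predicted from)."""
    resid = [[p - prediction for p in row] for row in block]
    levels = [[round(c / qstep) for c in row] for row in dct2(resid)]
    dequant = [[lv * qstep for lv in row] for row in levels]
    recon = [[min(255, max(0, round(prediction + r))) for r in row]
             for row in idct2(dequant)]
    return levels, recon


def encode_row(blocks, qstep):
    """Encode blocks left to right. Each prediction is the rounded mean of the
    previous block's reconstructed right-hand column, so block k+1 cannot
    start until block k has been fully reconstructed."""
    prediction = 128  # no left neighbour for the first block
    all_levels = []
    for block in blocks:
        levels, recon = encode_block(block, prediction, qstep)
        all_levels.append(levels)
        right_col = [row[-1] for row in recon]
        prediction = (sum(right_col) + N // 2) // N
    return all_levels
```

Because `encode_row` derives each prediction from the previous block's reconstruction, the next block cannot enter the loop until the current one has passed through every stage; this is the blocking behaviour described in the following paragraph.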
The WebP algorithm is highly complex, and the computation of a macroblock must wait until the computation of the previous macroblock has finished, which results in a blocked design with relatively low processing efficiency. As shown in FIG. 2, when 4 pictures are processed, the processing is blocked front-to-back throughout the span from time T1 to time T3.
With the advent of the 5G era, highly reliable, low-latency, high-bandwidth data transmission has raised the demands on cloud computing performance. To avoid degrading the customer experience, the picture compression coding period must be shortened; although the WebP algorithm greatly reduces the amount of coded data, its overall algorithmic complexity is still higher than that of other codecs.
Disclosure of Invention
In view of the above technical problems, the present invention provides an FPGA acceleration method for multi-graph processing based on the WebP compression algorithm. It offers an effective scheme for accelerating the WebP algorithm on a field-programmable gate array (FPGA): encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU and makes reasonable use of the FPGA's on-board resources. With the FPGA acceleration scheme, the processing time span is shortened from T1 to T3 down to T1 to T2.
An FPGA acceleration method for multi-graph processing based on a WebP compression algorithm is characterized by comprising the following steps:
step S1: the picture is input as RGB three-channel data and converted into the corresponding YUV data;
step S2: the YUV data generated for each picture is buffered in an on-chip DDR buffer memory and read into the calculation module over the bus; according to the processing progress of each of the pictures, the corresponding data is read from the on-chip DDR buffer memory and placed into a dependent-data buffer area;
step S3: each time the computation traversal of one YUV macroblock of one picture is completed, the macroblock traversal of the next picture is performed, and the pictures are switched in turn until all macroblocks of the group of pictures have been encoded.
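Step S1 can be sketched in Python as below. The sketch assumes ITU-R BT.601 studio-swing conversion coefficients and 4:2:0 chroma subsampling by 2x2 averaging, with 16x16 luma and 8x8 chroma macroblocks as in VP8; the patent does not state which coefficients the FPGA uses, and the function names are hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert one 8-bit RGB pixel with BT.601 studio-swing coefficients."""
    rf, gf, bf = r / 255.0, g / 255.0, b / 255.0
    y = 16 + 65.481 * rf + 128.553 * gf + 24.966 * bf
    u = 128 - 37.797 * rf - 74.203 * gf + 112.0 * bf
    v = 128 + 112.0 * rf - 93.786 * gf - 18.214 * bf
    return y, u, v


def clamp8(x):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return min(255, max(0, round(x)))


def picture_to_yuv420(rgb):
    """rgb: list of rows of (r, g, b) tuples; width and height must be even.
    Returns a full-resolution Y plane and half-resolution U and V planes."""
    h, w = len(rgb), len(rgb[0])
    yuv = [[rgb_to_yuv(*px) for px in row] for row in rgb]

    def subsample(ch):
        # Average each 2x2 neighbourhood of the chosen chroma channel.
        return [[clamp8(sum(yuv[2 * i + di][2 * j + dj][ch]
                            for di in (0, 1) for dj in (0, 1)) / 4)
                 for j in range(w // 2)] for i in range(h // 2)]

    y_plane = [[clamp8(p[0]) for p in row] for row in yuv]
    return y_plane, subsample(1), subsample(2)


def split_macroblocks(plane, size):
    """Split a plane whose sides are multiples of `size` into size x size
    blocks, in raster order (left to right, top to bottom)."""
    return [[row[x:x + size] for row in plane[y:y + size]]
            for y in range(0, len(plane), size)
            for x in range(0, len(plane[0]), size)]
```

For example, a 32x16 picture yields two 16x16 luma macroblocks via `split_macroblocks(Y, 16)` and two 8x8 blocks per chroma plane via `split_macroblocks(U, 8)`; in the scheme described here, these are the blocks that step S2 buffers in DDR.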
In a preferred scheme, the FPGA acceleration method for multi-graph processing based on the WebP compression algorithm further comprises a parameter buffer area: after the picture has been converted into YUV data, the segment parameters of the picture are calculated through pre-analysis and buffered in the parameter buffer area.
In a preferred scheme, the parameter buffer area is the FPGA's internal block RAM (BRAM).
In a preferred scheme, the dependent data buffer area is a DDR storage area.
The technical scheme has the following advantages or beneficial effects:
the invention provides an FPGA acceleration method for multi-graph processing based on the WebP compression algorithm, which offers an effective scheme for accelerating the WebP algorithm on a field-programmable gate array (FPGA). Encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU and better suited than a CPU to the blocking closed-loop algorithm, and the output frame rate of the whole WebP algorithm is improved.
Drawings
The invention and its features, aspects and advantages will become more apparent from the detailed description of non-limiting embodiments with reference to the following drawings. Like numbers refer to like parts throughout. The drawings may not be to scale, emphasis instead being placed upon illustrating the principles of the invention.
FIG. 1 shows the prior-art WebP lossy compression algorithm;
FIG. 2 is a span diagram of processing 4 pictures from time T1 to time T3;
FIG. 3 is a schematic diagram of the FPGA acceleration method for multi-graph processing based on the WebP compression algorithm.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 3, the invention discloses an FPGA acceleration method for multi-graph processing based on the WebP compression algorithm, which offers an effective scheme for accelerating the WebP algorithm on a field-programmable gate array (FPGA). Encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU and makes reasonable use of the FPGA's on-board resources; with the FPGA acceleration scheme, the processing time span is shortened to the span from T1 to T2. The specific method comprises the following steps:
step S1: the picture is input as RGB three-channel data and converted into the corresponding YUV data;
step S2: the YUV data generated for each picture is buffered in an on-chip DDR buffer memory and read into the calculation module over the bus; according to the processing progress of each of the pictures, the corresponding data is read from the on-chip DDR buffer memory and placed into a dependent-data buffer area;
step S3: each time the computation traversal of one YUV macroblock of one picture is completed, the macroblock traversal of the next picture is performed, and the pictures are switched in turn until all macroblocks of the group of pictures have been encoded.
In a preferred embodiment, the method further comprises a parameter buffer area: after the picture has been converted into YUV data, the segment parameters of the picture are calculated through pre-analysis and buffered in the parameter buffer area, which is the FPGA's internal block RAM (BRAM).
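The pre-analysis branch can be illustrated with the Python sketch below. It is loosely modelled on VP8 segmentation, which allows up to four segments per picture, each with its own quantizer; the choice of block variance as the complexity measure, the 1-D k-means clustering and all function names are assumptions rather than details given in the patent. The last function estimates storage if a per-macroblock segment map is kept alongside the parameters, which suggests why such per-picture data can fit in on-chip BRAM.

```python
import math


def block_variance(block):
    """Complexity measure for one macroblock: the variance of its pixels."""
    vals = [p for row in block for p in row]
    mean = sum(vals) / len(vals)
    return sum((p - mean) ** 2 for p in vals) / len(vals)


def assign_segments(complexities, num_segments=4, iterations=10):
    """Cluster macroblock complexities into num_segments groups with 1-D k-means.
    Returns one segment label per macroblock and the final cluster centres."""
    ordered = sorted(complexities)
    n = len(ordered)
    # Initialise centres at evenly spaced ranks of the sorted values.
    centres = [ordered[(2 * k + 1) * n // (2 * num_segments)]
               for k in range(num_segments)]
    labels = [0] * n
    for _ in range(iterations):
        labels = [min(range(num_segments), key=lambda k: abs(c - centres[k]))
                  for c in complexities]
        for k in range(num_segments):
            members = [c for c, lab in zip(complexities, labels) if lab == k]
            if members:  # keep the old centre if a segment ends up empty
                centres[k] = sum(members) / len(members)
    return labels, centres


def segment_map_bytes(num_macroblocks, num_segments=4):
    """Storage for a per-macroblock segment map, packed into bytes."""
    bits_per_mb = max(1, (num_segments - 1).bit_length())
    return math.ceil(num_macroblocks * bits_per_mb / 8)
```

With four segments each macroblock needs 2 bits, so the 8160 macroblocks of a 1920x1080 picture take 2040 bytes.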
Preferably, the dependent data buffer is a DDR memory area.
In a specific implementation, as shown in FIG. 3, the host computer writes the picture data into the DDR memory area of the FPGA chip. Following the calculation flow, the RGB three-channel data of each picture is converted and divided into Y, U and V macroblocks of various sizes, which increases the complexity of the information calculation. The first branch then calculates the segment parameters. Because the segment parameters of each picture are small, they can be cached in the FPGA's internal BRAM until they are called for when the macroblocks of the corresponding picture are quantized. The second branch establishes separate buffer areas for the macroblock data of several pictures; the macroblock data then enters the calculation module through a round-robin arbiter, which traverses N pictures per round. The value of N depends on the total operation period of the calculation module (prediction, DCT, quantization, inverse quantization and IDCT): the macroblock inputs of the successive pictures hide, or cover, the operation period of the calculation module, which indirectly achieves a fully parallel pipelined acceleration scheme. Only when the newly IDCT-processed macroblock data has been returned to the data buffer area can the next macroblock of the first picture be sent into the calculation module as input.
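The effect of the round-robin arbiter can be checked with a small cycle-level model in Python. The model assumes that the calculation module is fully pipelined, accepts one macroblock per cycle, and returns each macroblock's reconstruction a fixed `latency` cycles later, after which that picture's next macroblock may be issued; `simulate`, `pictures_needed` and the cycle figures are hypothetical and do not describe a specific FPGA design.

```python
import math


def pictures_needed(latency, issue_interval=1):
    """Smallest N for which N interleaved pictures keep the pipeline busy:
    the closed-loop latency must be covered by the other pictures' issues."""
    return math.ceil(latency / issue_interval)


def simulate(num_pictures, mbs_per_picture, latency, group_size):
    """Return the total cycles to process all pictures when `group_size`
    pictures at a time are interleaved by a round-robin arbiter.
    group_size=1 reproduces the serial, blocking behaviour."""
    cycle = 0
    for start in range(0, num_pictures, group_size):
        group = list(range(start, min(start + group_size, num_pictures)))
        remaining = {p: mbs_per_picture for p in group}
        ready_at = {p: cycle for p in group}  # cycle when picture p may issue again
        finish = cycle
        ptr = 0                               # round-robin pointer
        while any(remaining.values()):
            for i in range(len(group)):
                p = group[(ptr + i) % len(group)]
                if remaining[p] and ready_at[p] <= cycle:
                    remaining[p] -= 1
                    ready_at[p] = cycle + latency  # reconstruction returns later
                    finish = max(finish, cycle + latency)
                    ptr = (ptr + i + 1) % len(group)
                    break                          # at most one issue per cycle
            cycle += 1
        cycle = finish                             # next group starts after the last result
    return cycle
```

In this model, with a 4-cycle loop latency, interleaving 4 pictures of 100 macroblocks each cuts the total from 1600 cycles (serial) to 403 cycles; with an 8-cycle latency, 4 pictures are not enough to hide the loop and the model gives 803 cycles, which is why N is tied to the calculation module's operation period.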
Each macroblock that completes the closed loop is written into the DDR storage space of its picture's partition. Encoding can be performed only after all the data of a picture has been calculated and processed; therefore, to buffer the compressed information of a picture, the processed macroblock data of the N pictures is held in a larger buffer space of the FPGA and then fetched from the DDR, in sequence, into the pipelined encoding module.
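The output side can be modelled as per-picture partitions that release a picture to the encoding stage only once all of its macroblocks have completed the closed loop. The Python sketch below is a hypothetical software analogue of that buffering (class and method names are assumptions); it shows how interleaved macroblocks from N pictures can arrive in any order while encoding still proceeds one whole picture at a time.

```python
from collections import deque


class OutputPartitions:
    """Hypothetical model of per-picture DDR partitions for processed macroblocks."""

    def __init__(self, mbs_per_picture):
        self.mbs_per_picture = mbs_per_picture
        self.partitions = {}      # picture id -> {macroblock index: data}
        self.completed = deque()  # pictures ready for encoding, in completion order

    def write(self, picture_id, mb_index, data):
        part = self.partitions.setdefault(picture_id, {})
        is_new = mb_index not in part
        part[mb_index] = data
        # Release the picture exactly once, when its last macroblock arrives.
        if is_new and len(part) == self.mbs_per_picture:
            self.completed.append(picture_id)

    def next_picture(self):
        """Return (picture id, macroblocks in raster order), or None if no
        picture is complete yet."""
        if not self.completed:
            return None
        picture_id = self.completed.popleft()
        part = self.partitions.pop(picture_id)
        return picture_id, [part[i] for i in range(self.mbs_per_picture)]
```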
By contrast, in Google's open-source CPU implementation of the WebP algorithm, the whole process works on one picture at a time, extracting the information of its macroblocks one after another. The essence of the picture compression algorithm is to filter out similar information within each macroblock and retain information with larger differences, and adjacent macroblocks are also retained for reference; consequently, the object of the algorithm's closed loop is a single macroblock, the minimum cycle interval is the number of cycles needed to compute one macroblock, and the number of cycles spent on a picture grows in proportion to the number of macroblocks the picture is decomposed into.
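The proportional growth can be made concrete with a worked example. Assuming 16x16 luma macroblocks and a picture padded up to whole macroblocks, the macroblock count is ceil(W/16) x ceil(H/16); the per-macroblock cycle count passed to `serial_cycles` is a hypothetical placeholder, not a figure from the patent.

```python
import math


def macroblock_count(width, height, mb_size=16):
    """Number of mb_size x mb_size macroblocks covering a width x height picture."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)


def serial_cycles(width, height, cycles_per_mb):
    """Cycles for one picture when each macroblock must finish before the next starts."""
    return macroblock_count(width, height) * cycles_per_mb
```

A 1920x1080 picture needs 120 x 68 = 8160 macroblocks (the last row is padded from 1080 to 1088 pixels), while a 3840x2160 picture needs 240 x 135 = 32400, so its serial cycle count is roughly four times larger.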
To avoid this and raise the output frame rate of the picture compression algorithm, the acceleration scheme of the invention inserts the macroblock processing of several pictures, in turn, into the closed loop to fill the intermediate blocking periods. This is feasible because the macroblock calculations of different pictures do not interfere with one another and FPGA resources are highly configurable. Compared with a CPU, this parallel pipelined computation is better suited to the blocking closed-loop algorithm, and it improves the output frame rate of the whole WebP algorithm.
Those skilled in the art will understand that variations can be implemented by combining the prior art with the above embodiments; such variations do not affect the essence of the present invention and are not described here.
The preferred embodiments of the present invention have been described above. It is to be understood that the invention is not limited to the specific embodiments described above, wherein devices and structures not described in detail are to be understood as being implemented in a manner common in the art; any person skilled in the art can make many possible variations and modifications to the technical solution of the present invention or modifications to equivalent embodiments without departing from the scope of the technical solution of the present invention, using the methods and technical contents disclosed above, without affecting the essential content of the present invention. Therefore, any simple modification, equivalent variation and modification of the above embodiments according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

Claims (2)

1. An FPGA acceleration method for multi-graph processing based on the WebP compression algorithm, characterized by comprising the following steps:
step S1: the picture is input as RGB three-channel data and converted into the corresponding YUV data;
step S2: the YUV data generated for each picture is buffered in an on-chip DDR buffer memory and read into the calculation module over the bus; according to the processing progress of each of the pictures, the corresponding data is read from the on-chip DDR buffer memory and placed into a dependent-data buffer area;
step S3: each time the computation traversal of one YUV macroblock of one picture is completed, the macroblock traversal of the next picture is performed, and the pictures are switched in turn until all macroblocks of the group of pictures have been encoded;
the method further comprises a parameter buffer area: after the picture has been converted into YUV data, segment parameters of the picture are calculated by pre-analysis of the YUV data and buffered in the parameter buffer area, wherein the parameter buffer area is the internal block RAM (BRAM) of the FPGA.
2. The FPGA acceleration method for multi-graph processing based on the WebP compression algorithm according to claim 1, wherein the dependent data buffer area is a DDR storage area.
CN202010653783.9A 2020-07-08 2020-07-08 FPGA acceleration method for multi-graph processing based on WebP compression algorithm Active CN111815502B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010653783.9A CN111815502B (en) 2020-07-08 2020-07-08 FPGA acceleration method for multi-graph processing based on WebP compression algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010653783.9A CN111815502B (en) 2020-07-08 2020-07-08 FPGA acceleration method for multi-graph processing based on WebP compression algorithm

Publications (2)

Publication Number Publication Date
CN111815502A (en) 2020-10-23
CN111815502B (en) 2023-11-28

Family

ID=72843439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010653783.9A Active CN111815502B (en) 2020-07-08 2020-07-08 FPGA acceleration method for multi-graph processing based on WebP compression algorithm

Country Status (1)

Country Link
CN (1) CN111815502B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112437308B (en) * 2020-11-12 2024-11-01 北京深维科技有限公司 WebP coding method and WebP coding device

Citations (7)

Publication number Priority date Publication date Assignee Title
CN105488753A (en) * 2015-11-27 2016-04-13 武汉精测电子技术股份有限公司 Method and device for carrying out two-dimensional Fourier transform and inverse transform on image
CN107154062A (en) * 2017-05-12 2017-09-12 郑州云海信息技术有限公司 A kind of implementation method of WebP Lossy Compression Algorithms, apparatus and system
CN107483948A (en) * 2017-09-18 2017-12-15 郑州云海信息技术有限公司 Pixel macroblock processing method in a kind of webp compressions processing
CN109327698A (en) * 2018-11-09 2019-02-12 杭州网易云音乐科技有限公司 Dynamic previewing map generalization method, system, medium and electronic equipment
CN110689475A (en) * 2019-09-10 2020-01-14 浪潮电子信息产业股份有限公司 Image data processing method, system, electronic equipment and storage medium
CN110876078A (en) * 2018-08-30 2020-03-10 阿里巴巴集团控股有限公司 Animation picture processing method and device, storage medium and processor
CN110913225A (en) * 2019-11-19 2020-03-24 北京奇艺世纪科技有限公司 Image encoding method, image encoding device, electronic device, and computer-readable storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10552936B2 (en) * 2016-03-02 2020-02-04 Alibaba Group Holding Limited Solid state storage local image processing system and method
US10719447B2 (en) * 2016-09-26 2020-07-21 Intel Corporation Cache and compression interoperability in a graphics processor pipeline

Non-Patent Citations (2)

Title
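Zhenhua Guo et al. An OpenCL Implementation of WebP Accelerator on FPGAs. Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2018, pp. 578-589, Sections 1-3. *
FPGA design and implementation technology for Gaofen-7 satellite image compression; Han Yu et al.; Spacecraft Engineering; Vol. 29, No. 3; pp. 169-176, Sections 1-5 *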

Also Published As

Publication number Publication date
CN111815502A (en) 2020-10-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant