
CN106254870B - Video encoding method, system and computer-readable recording medium using adaptive color conversion - Google Patents


Info

Publication number
CN106254870B
CN106254870B (application CN201610357374.8A)
Authority
CN
China
Prior art keywords
coding
size
mode
act
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610357374.8A
Other languages
Chinese (zh)
Other versions
CN106254870A (en)
Inventor
张耀仁
林俊隆
涂日升
林敬杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial Technology Research Institute ITRI
Original Assignee
Industrial Technology Research Institute ITRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/757,556 external-priority patent/US20160360205A1/en
Priority claimed from TW105114323A external-priority patent/TWI597977B/en
Application filed by Industrial Technology Research Institute ITRI filed Critical Industrial Technology Research Institute ITRI
Publication of CN106254870A publication Critical patent/CN106254870A/en
Application granted granted Critical
Publication of CN106254870B publication Critical patent/CN106254870B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/12Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H04N19/122Selection of transform size, e.g. 8x8 or 2x4x8 DCT; Selection of sub-band transforms of varying structure or type
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00Details of colour television systems
    • H04N9/64Circuits for processing colour signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video coding method and system are provided. The method includes the following steps. An original video frame is received. The original video frame is partitioned into coding tree units. A coding unit is determined from a coding tree unit. A coding mode of the coding unit is enabled or disabled. If the coding mode is enabled, it is determined whether to estimate a transform unit size in the enabled coding mode. The transform unit of the coding unit is decided in the enabled coding mode. The size of the coding unit is NxN.

Description

Video encoding method, system and computer-readable recording medium using adaptive color conversion
Technical Field
The present disclosure relates to video encoding and decoding methods and systems.
Background
The demand for high-quality images is steadily increasing. With the arrival of video standards such as 4K and 8K, improving video encoding and decoding efficiency is highly desirable. In addition, consumers expect to transmit and receive high-quality images through various transmission media. For example, consumers wish to view high-quality images over networks on portable devices (e.g., smartphones, tablet computers, notebook computers) as well as on home televisions and computers. Consumers also want to display high-quality images during video conferencing and screen sharing.
The High Efficiency Video Coding (HEVC) standard, H.265, provides a new standard for improving the coding and decoding performance of video compression. HEVC, established by ISO/IEC JTC 1/SC 29/WG 11 MPEG (Moving Picture Experts Group) and ITU-T SG16 VCEG (Video Coding Experts Group), can reduce the data rate of compressed high-quality video compared to the earlier AVC (Advanced Video Coding) standard. The AVC standard is also known as H.264.
HEVC compresses video using various coding tools such as inter prediction and intra prediction. Inter prediction techniques exploit temporal redundancy between different video pictures of a video stream to compress video data. For example, encoded and decoded video pictures containing similar content may be used to encode a current video picture: these already coded pictures can be used to predict the coding region of the current picture. In contrast, intra prediction techniques compress video data using only data within the currently encoded video picture and do not exploit temporal redundancy between different pictures; for example, a region of the current video picture is encoded using another portion of the same picture. Intra prediction includes 35 intra modes: a planar mode, a DC mode, and 33 directional modes.
Compared to the AVC standard, the HEVC standard employs an extended partitioning technique for each input video picture. The AVC standard partitions an input video picture only into macroblocks for encoding and decoding. In contrast, the HEVC standard may partition an input video picture into data units and blocks of different sizes, as described below. The HEVC standard thus provides more flexibility than the AVC standard when encoding and decoding dynamic, highly detailed, and edge-rich video pictures.
The HEVC standard also lists coding tools that can improve the video coding procedure; such tools are called coding extensions. The Screen Content Coding extension (SCC extension) focuses on improving the processing performance of screen-content video under the HEVC standard. Screen content is video rendered from graphics, text, or animation, rather than a scene captured by a camera. The rendered graphics, text, or animation may be dynamic or static, and may appear as video within a camera-captured video scene. Examples of SCC applications include screen mirroring, cloud gaming, wireless display of content, remote computer desktop access, and screen sharing, such as instant screen sharing for video conferencing.
One coding tool within SCC is the Adaptive Color Transform (ACT). ACT is a color-space transform applied to the residual pixel samples of a coding unit (CU). In a particular color space, the color components of a pixel of a CU may be correlated. When that correlation is high, applying ACT helps concentrate the energy of the correlated color components by decorrelating them. This energy compaction can improve coding efficiency and reduce coding cost. Thus, ACT can improve coding performance in HEVC coding.
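As a concrete illustration, the ACT adopted in the HEVC SCC extension is commonly described as a YCgCo-R-style lifting transform on the residual samples. The sketch below shows such a lossless forward/inverse pair; the exact transform used by the claimed method is not specified in this disclosure, so treat this as an assumption, not the patented method.

```python
def act_forward(r, g, b):
    """Lossless YCgCo-R-style forward transform on one residual pixel.

    Decorrelates the R/G/B residual samples into (Y, Cg, Co)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def act_inverse(y, cg, co):
    """Exact inverse of act_forward; integer lifting makes it lossless."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because each lifting step is individually invertible in integer arithmetic, the round trip reconstructs the residual exactly, which is why this family of transforms suits lossless as well as lossy coding.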
However, during the encoding process, an additional Rate-Distortion Optimization (RDO) pass is required to evaluate whether ACT should be enabled. RDO estimates the rate-distortion (RD) cost. These evaluation processes increase coding complexity and coding time. Furthermore, ACT may be unnecessary when the color components of a pixel are already decorrelated; in that case the cost of performing ACT exceeds its coding benefit, and a further decorrelation step provides no gain.
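The RD cost mentioned above is conventionally the Lagrangian J = D + lambda * R. A minimal sketch of how an encoder might compare an ACT-enabled candidate against an ACT-disabled one follows; the candidate numbers and the lambda value are purely hypothetical.

```python
def rd_cost(distortion, rate_bits, lmbda):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lmbda * rate_bits

# Hypothetical candidates for one CU: (distortion, rate in bits).
candidates = {"act_on": (1200.0, 90), "act_off": (1150.0, 130)}
lmbda = 2.5

# Pick the candidate with the lowest Lagrangian cost.
best = min(candidates, key=lambda k: rd_cost(*candidates[k], lmbda))
```

With these made-up numbers, enabling ACT costs fewer bits at slightly higher distortion, and the Lagrangian trade-off favors it; the point of the patent's methods is to avoid running this extra evaluation when it cannot pay off.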
Disclosure of Invention
According to an aspect of the present disclosure, a video encoding method is provided. The video encoding method includes the following steps. An original video frame is received. The original video frame is partitioned into coding tree units. A coding unit is determined from a coding tree unit. A coding mode of the coding unit is enabled or disabled. If the coding mode is enabled, it is determined whether to estimate a transform unit size in the enabled coding mode. The transform unit of the coding unit is decided in the enabled coding mode. The size of the coding unit is NxN.
According to another aspect of the present disclosure, a video encoding system is provided. The video encoding system includes a memory and a processor. The memory stores a set of instructions, and the processor executes the set of instructions to perform the following steps. An original video frame is received. The original video frame is partitioned into coding tree units. A coding unit is determined from a coding tree unit. A coding mode of the coding unit is enabled or disabled. If the coding mode is enabled, it is determined whether to estimate a transform unit size in the enabled coding mode. The transform unit of the coding unit is decided in the enabled coding mode. The size of the coding unit is NxN.
According to another aspect of the present disclosure, a non-transitory computer-readable recording medium is provided. The medium stores a set of instructions executable by one or more processors to perform a video encoding method including the following steps. An original video frame is received. The original video frame is partitioned into coding tree units. A coding unit is determined from a coding tree unit. A coding mode of the coding unit is enabled or disabled. If the coding mode is enabled, it is determined whether to estimate a transform unit size in the enabled coding mode. The transform unit of the coding unit is decided in the enabled coding mode. The size of the coding unit is NxN.
In order to better appreciate the above and other aspects of the present disclosure, reference will now be made in detail to the various embodiments, examples of which are illustrated in the accompanying drawings, wherein:
drawings
FIGS. 1A-1J illustrate video frames and associated segmentation according to several embodiments of the present disclosure.
Fig. 2 shows a video encoder of the present disclosure.
Fig. 3 illustrates an encoding method according to an embodiment of the present disclosure.
Fig. 4 illustrates an encoding method according to another embodiment of the present disclosure.
Fig. 5 illustrates an encoding method according to another embodiment of the present disclosure.
Fig. 6 illustrates an encoding method according to another embodiment of the present disclosure.
Fig. 7 illustrates the calculation flow of the intra prediction mode (IPM) in a non-444 chroma format.
Fig. 8 illustrates a system for performing the encoding and decoding methods of the present disclosure.
[Description of reference numerals]
101: video frame
102: coding tree unit (CTU)
103: luma coding tree block (luma CTB)
104: Cb CTB
105: Cr CTB
106, 111: associated syntax
107-1, 107-2, 107-3, 107-4: luma coding block (luma CB)
108: coding unit (CU)
109: Cb CB
110: Cr CB
112: luma prediction block (PB)
113-1, 113-2, 113-3, 113-4: transform block (TB)
114: transform unit (TU)
200: video encoder
202: frame partitioning module
204: inter prediction enabled ACT module
206: inter prediction disabled ACT module
208: frame buffer
210: mode decision module
212: intra prediction enabled ACT module
214: intra prediction disabled ACT module
216, 218: summing module
220: switch
222: adaptive color transform (ACT) module
224: CCP, transform, and quantization module
226: entropy coding module
228: inverse CCP, transform, and quantization module
230: switch
232: inverse ACT module
300, 400, 500, 600, 700, 800: encoding method
304: component correlation analysis
306: rough mode decision
308: end
310: rate-distortion optimization (RDO) mode decision
311: determination of whether the chroma format is non-444
312: determination of whether the CU size is less than threshold T1
314: TU size determination
316: chroma mode decision
402: determination of whether the CU size is less than threshold T2
702: non-transitory computer-readable medium
704: processor
Detailed Description
Exemplary embodiments will be described in detail below with reference to the accompanying drawings. In the drawings described below, like reference numerals in different drawings represent the same or similar elements unless otherwise specified. The embodiments set forth below are not intended to represent all implementations of the present disclosure. Indeed, these embodiments are merely examples of systems and methods that may correspond to the claims.
Fig. 1A-1J illustrate video pictures and their associated segmentation according to embodiments of the present disclosure.
Fig. 1A shows a video frame 101. The video frame 101 comprises a number of pixels and is partitioned into a number of coding tree units (CTUs) 102. The size of each CTU 102 is L vertical samples by L horizontal samples (LxL), where L may be, for example, 16, 32, or 64. Each sample corresponds to a pixel value at a pixel location of the CTU. The pixel location may be the location of a pixel in the CTU or a location between pixels; in the latter case, the pixel value may be interpolated from one or more pixels near the location. Each CTU 102 includes a luma coding tree block (luma CTB), chroma coding tree blocks (chroma CTBs), and associated syntax.
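The CTU tiling just described can be sketched as follows; the function name `ctu_grid` and the 1920x1080 example are illustrative choices, not taken from the patent.

```python
import math

def ctu_grid(frame_width, frame_height, L=64):
    """Number of LxL CTUs needed to cover a frame.

    Edge CTUs may extend past the picture boundary; a real encoder
    pads or crops those partial CTUs."""
    cols = math.ceil(frame_width / L)
    rows = math.ceil(frame_height / L)
    return cols, rows, cols * rows

# Example: a 1920x1080 frame with 64x64 CTUs needs a 30x17 grid (510 CTUs);
# the bottom row of CTUs only partially overlaps the picture (1080 = 16.875 * 64).
cols, rows, total = ctu_grid(1920, 1080, 64)
```

The same arithmetic applies for L = 16 or L = 32; only the grid dimensions change.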
Fig. 1B illustrates the CTBs that may be included in one CTU 102 of Fig. 1A. For example, the CTU 102 may include a luma CTB 103 and two chroma CTBs (Cb CTB 104 and Cr CTB 105). The CTU 102 may also include associated syntax 106. The Cb CTB 104 is the blue-difference chroma component CTB, representing the blue color-difference variation of the CTB; the Cr CTB 105 is the red-difference chroma component CTB, representing the red color-difference variation. The associated syntax 106 carries information on how the luma CTB 103, Cb CTB 104, and Cr CTB 105 are encoded, and on their further partitioning. The luma CTB 103, Cb CTB 104, and Cr CTB 105 may all have the same size as the CTU 102. Alternatively, the luma CTB 103 may have the same size as the CTU 102 while the Cb CTB 104 and Cr CTB 105 are smaller.
Intra prediction, inter prediction, and other coding tools operate on coding blocks (CBs). To decide whether the encoding procedure uses intra prediction or inter prediction, a CTB may be partitioned into one or more CBs. The partitioning of a CTB into CBs follows a quadtree technique: the CTB may be divided into four CBs, and each CB may be subdivided into four further CBs. This splitting may continue, subject to the size of the CTB.
Fig. 1C shows the luma CTB 103 of Fig. 1B divided into one or more luma CBs 107-1, 107-2, 107-3, or 107-4. Taking a 64x64 luma CTB as an example, the corresponding luma CB 107-1, 107-2, 107-3, or 107-4 may be of size NxN, such as 64x64, 32x32, 16x16, or 8x8. In Fig. 1C, the size of the luma CTB 103 is 64x64, but it may also be 32x32 or 16x16.
Fig. 1D illustrates an example of the quadtree partitioning of the luma CTB 103 of Fig. 1B into the luma CBs 107-1, 107-2, 107-3, or 107-4 of Fig. 1C. In Fig. 1D, the size of the luma CTB 103 is 64x64; however, it may also be 32x32 or 16x16.
In Fig. 1D, the luma CTB 103 is divided into four 32x32 luma CBs 107-2. Each 32x32 luma CB may be further divided into four 16x16 luma CBs 107-3, and each 16x16 luma CB may be further divided into four 8x8 luma CBs 107-4.
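The recursive quadtree splitting described above can be sketched as follows. The `should_split` predicate is a hypothetical stand-in for the encoder's real split decision (which, in practice, is itself driven by rate-distortion optimization).

```python
def quadtree_split(size, min_size, should_split):
    """Recursively split an NxN block into four (N/2)x(N/2) blocks.

    Returns the list of leaf block sizes produced by the split decisions."""
    if size <= min_size or not should_split(size):
        return [size]
    leaves = []
    for _ in range(4):  # a split always yields exactly four quadrants
        leaves.extend(quadtree_split(size // 2, min_size, should_split))
    return leaves

# Splitting a 64x64 CTB all the way down to 8x8 yields 64 leaf CBs of 8x8.
leaves = quadtree_split(64, 8, lambda s: True)
```

Refusing every split (`lambda s: False`) leaves the CTB as a single 64x64 CB, matching the other extreme shown in Fig. 1C.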
A coding unit (CU) is used to code the CBs. A CTB may include only one CU or be partitioned into several CUs. A CU size may thus also be NxN, e.g., 64x64, 32x32, 16x16, or 8x8. Each CU includes one luma CB, two chroma CBs, and associated syntax. The size of the residual CU generated during encoding and decoding may be the same as the size of its corresponding CU.
Fig. 1E is a schematic diagram of the CBs (including the luma CB 107-1 of Fig. 1C) that may be part of a CU 108. For example, the CU 108 may include the luma CB 107-1 and two chroma CBs (Cb CB 109 and Cr CB 110). The CU 108 may also include associated syntax 111, which carries information on how the luma CB 107-1, Cb CB 109, and Cr CB 110 are encoded, such as quadtree information (the size, position, and further partitioning of the luma and chroma CBs). Each CU 108 may have associated prediction blocks (PBs) for the luma CB 107-1, Cb CB 109, and Cr CB 110. The prediction blocks are combined into prediction units (PUs).
Fig. 1F illustrates various possible divisions of the luma CB 107-1 of Fig. 1D into luma PBs 112. The luma CB 107-1 is divided into luma PBs 112 according to, for example, the predictability of different areas of the luma CB 107-1. The luma CB 107-1 may consist of a single luma PB 112 of the same size as the luma CB 107-1; alternatively, it may be divided vertically or horizontally into two equal luma PBs 112, or into four luma PBs 112. Note that Fig. 1F is merely an example; any PB partitioning permitted by the HEVC standard falls within the scope of this disclosure. The partitioning options shown in Fig. 1F are mutually exclusive. For example, in the intra prediction modes of HEVC, CBs of 64x64, 32x32, and 16x16 may be partitioned into a single PB of the same size as the CB, while an 8x8 CB may be split into a single 8x8 PB or four 4x4 PBs.
Once intra prediction or inter prediction is applied, the residual signal, i.e., the sample-wise difference between a prediction block and the corresponding source video block, is converted into another domain for further encoding by a Discrete Cosine Transform (DCT) or a Discrete Sine Transform (DST). To provide these transforms, each CU or CB utilizes one or more transform blocks (TBs).
Fig. 1G illustrates how the luma CB 107-1 of Fig. 1E or Fig. 1F may be divided into different TBs 113-1, 113-2, 113-3, and 113-4. If the luma CB 107-1 is a 64x64 CB, then TB 113-1 is a 32x32 TB, TB 113-2 is a 16x16 TB, TB 113-3 is an 8x8 TB, and TB 113-4 is a 4x4 TB. The luma CB 107-1 may accordingly be divided into 4 TBs 113-1, 16 TBs 113-2, 64 TBs 113-3, or 256 TBs 113-4. One luma CB 107-1 may be divided into TBs 113 of the same size or TBs 113 of different sizes.
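The TB counts above (4, 16, 64, 256) follow directly from the area ratio: a uniform tiling of a 64x64 CB by NxN TBs contains (64/N)^2 blocks. A quick sketch, with the helper name `tb_count` chosen here for illustration:

```python
def tb_count(cb_size, tb_size):
    """How many tb_size x tb_size TBs tile a cb_size x cb_size CB uniformly."""
    assert cb_size % tb_size == 0, "TB size must evenly divide the CB size"
    return (cb_size // tb_size) ** 2

# A 64x64 CB tiled by each TB size listed in the text above.
counts = {n: tb_count(64, n) for n in (32, 16, 8, 4)}
```

The same formula gives the counts for smaller CBs, e.g., a 32x32 CB holds 64 TBs of 4x4.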
The partitioning of a CB into TBs also follows quadtree partitioning. Thus, one CB may be partitioned into one or more TBs, where each TB may be further partitioned into four TBs. This splitting may continue, subject to the size of the CB.
Fig. 1H illustrates a quadtree partitioning of the luma CB 107-1 of Fig. 1E or Fig. 1F into the TBs 113-1, 113-2, 113-3, or 113-4 of Fig. 1G, using various partitioning schemes. In Fig. 1H, the luma CB 107-1 has a size of 64x64; however, it may also be 32x32 or 16x16.
In Fig. 1H, the luma CB 107-1 is divided into four 32x32 TBs 113-1. Each 32x32 TB may be further divided into four 16x16 TBs 113-2, each 16x16 TB into four 8x8 TBs 113-3, and each 8x8 TB into four 4x4 TBs 113-4.
Each TB 113 is then transformed by the DCT or another transform of the HEVC standard. A transform unit (TU) aggregates the TBs 113: each CB employs one or more TBs, and CBs in turn form each CU. Therefore, the structure of a TU differs between CUs 108 and is determined by the CU 108.
Fig. 1I shows the various TBs 113-1, 113-2, 113-3, and 113-4 into which a TU 114 may be partitioned. Each TU aggregates TBs split as in Fig. 1G or Fig. 1H. A 32x32 TU 114 may employ a single 32x32 TB 113-1, or one or more 16x16 TBs 113-2, 8x8 TBs 113-3, or 4x4 TBs 113-4. For a CU using HEVC inter prediction, a TU may be larger than a PU, so that the TU may contain PU boundaries. However, for a CU using HEVC intra prediction, a TU may not cross PU boundaries.
Fig. 1J depicts a quadtree partitioning of the TU 114 of Fig. 1I, utilizing the various TBs 113-1, 113-2, 113-3, or 113-4 of Fig. 1I. In Fig. 1J, the TU 114 has a size of 32x32; however, the size of a TU may also be 16x16, 8x8, or 4x4.
In Fig. 1J, the TU 114 is split into one 32x32 TB 113-1 and four 16x16 TBs 113-2. Each 16x16 TB may be further divided into four 8x8 TBs 113-3, and each 8x8 TB into four 4x4 TBs 113-4.
The CTUs, CTBs, CBs, CUs, PUs, PBs, TUs, or TBs described in this disclosure may have any feature, size, and property of the HEVC standard. The partitioning described in Figs. 1C, 1E, and 1F can also be applied to the chroma CTBs (Cb CTB 104 and Cr CTB 105) and the chroma CBs (Cb CB 109 and Cr CB 110).
Fig. 2 shows a video encoder 200 that performs the encoding methods of the present disclosure. The video encoder 200 may include one or more additional elements providing further HEVC-SCC encoding functions, such as palette mode, sample adaptive offset, and deblocking filtering. Furthermore, the present disclosure contemplates intra prediction modes with ACT as well as other coding modes, such as inter prediction modes with ACT.
The video encoder 200 receives an original (source) video frame as input. The input frame is first passed to a frame partitioning module 202, which partitions it into at least one original (source) CTU; original (source) CUs are obtained from the original CTUs. The sizes of the original CTUs and the original CUs are determined by the frame partitioning module 202. Encoding then proceeds on a CU-by-CU basis. After being output by the frame partitioning module 202, each original CU is input to an inter prediction enabled ACT module 204, an inter prediction disabled ACT module 206, an intra prediction enabled ACT module 212, and an intra prediction disabled ACT module 214.
The original CU of the input picture is encoded by the inter prediction enabled ACT module 204, which determines a prediction of the original CU from the input picture using inter prediction with the adaptive color transform (ACT) enabled. The original CU is also encoded by the inter prediction disabled ACT module 206, which determines a prediction of the original CU using inter prediction with ACT disabled.
Reference CUs stored in a frame buffer 208 may be used in inter prediction. The original PUs and PBs are also derived from the original CU and are used for the inter prediction procedures in the inter prediction enabled ACT module 204 and the inter prediction disabled ACT module 206. Inter prediction uses regions of video pictures at different times for motion estimation. The inter-predicted CUs encoded by the modules 204 and 206 are determined so as to achieve the best picture quality, and the encoded inter-predicted CUs are then input to a mode decision module 210.
The original CU of the input picture is also encoded by the intra prediction enabled ACT module 212, which determines a prediction of the original CU from the input picture using intra prediction with ACT enabled.
The original CU is likewise encoded by the intra prediction disabled ACT module 214, which determines a prediction of the original CU using intra prediction with ACT disabled.
When the intra prediction enabled ACT module 212 and the intra prediction disabled ACT module 214 perform intra prediction, reconstructed CUs of the same picture stored in the frame buffer 208 may be used. The original PUs and PBs are also derived from the original CU and are used for the intra prediction procedures in the two modules. The intra-predicted CUs are determined so as to achieve the best picture quality, and the encoded intra-predicted CUs output from the modules 212 and 214 are input to the mode decision module 210.
In the mode decision module 210, the costs and qualities of encoding the original CU with inter prediction with ACT enabled, inter prediction with ACT disabled, intra prediction with ACT enabled, and intra prediction with ACT disabled are compared. Based on the comparison, the coding mode of the predicted CU (e.g., inter-predicted or intra-predicted) is decided. The selected predicted CU is then passed to summing modules 216 and 218.
In the summing module 216, the selected predicted CU is subtracted from the original CU to provide a remaining (residual) CU. If the selected predicted CU is from the inter prediction enabled ACT module 204 or the intra prediction enabled ACT module 212, the switch 220 switches to position A: the remaining CU is input to an ACT module 222 and then to a CCP, transform, and quantization module 224. However, if the selected predicted CU is from the inter prediction disabled ACT module 206 or the intra prediction disabled ACT module 214, the switch 220 switches to position B: the ACT module 222 is skipped during encoding, and the remaining CU is input directly from the summing module 216 to the CCP, transform, and quantization module 224.
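A minimal sketch of this data flow follows. Here `apply_act` and `ccp_transform_quantize` are hypothetical identity stand-ins for modules 222 and 224; only the routing logic of switch 220 and the subtraction of the summing module 216 are modeled.

```python
def residual_cu(original, predicted):
    """Element-wise residual produced by the summing module 216."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predicted)]

def apply_act(residual):
    """Stand-in for ACT module 222 (the real module color-transforms the residual)."""
    return residual

def ccp_transform_quantize(residual):
    """Stand-in for the CCP, transform, and quantization module 224."""
    return residual

def encode_residual(residual, act_enabled):
    """Switch 220: position A routes through ACT; position B skips it."""
    if act_enabled:
        residual = apply_act(residual)       # position A
    return ccp_transform_quantize(residual)  # position B goes straight here
```

The decoder side mirrors this: switch 230 either routes the reconstructed residual through the inverse ACT (position C) or skips it (position D).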
In the ACT module 222, the adaptive color transform is performed on the remaining CU. The output of the ACT module 222 is coupled to the CCP, transform, and quantization module 224.
The CCP, transform, and quantization module 224 performs cross-component prediction (CCP), a transform (such as the Discrete Cosine Transform (DCT) or the Discrete Sine Transform (DST)), and quantization on the remaining CUs. The output of the CCP, transform, and quantization module 224 is coupled to an entropy coding module 226 and an inverse CCP, transform, and quantization module 228.
The entropy coding module 226 performs entropy encoding of the residual. For example, Context-Adaptive Binary Arithmetic Coding (CABAC) may be performed to encode the remaining CUs. Any other entropy encoding procedure provided by HEVC may also be performed in the entropy coding module 226.
After performing entropy encoding, the encoded bitstream of the CU of the input video picture is output from the video encoder 200. The output encoded bit stream may be stored in a memory, broadcast over a transmission line or network, or provided to a display or the like.
In the inverse CCP, transform, and quantization module 228, the inverse operations of the CCP, transform, and quantization module 224 are performed on the remaining CUs to provide reconstructed remaining CUs.
If the selected predicted CU is from the inter prediction enabled ACT module 204 or the intra prediction enabled ACT module 212, the switch 230 switches to position C: the reconstructed remaining CU is input to an inverse ACT module 232 and then to the summing module 218. However, if the selected predicted CU is from the inter prediction disabled ACT module 206 or the intra prediction disabled ACT module 214, the switch 230 switches to position D: the inverse ACT module 232 is skipped, and the reconstructed remaining CU is input directly to the summing module 218.
The inverse ACT module 232 performs the inverse of the adaptive color transform of the ACT module 222 on the reconstructed remaining CU. The output of the inverse ACT module 232 is input to the summing module 218.
In the summing module 218, the selected predicted CU from the mode decision module 210 is added to the reconstructed remaining CU to provide a reconstructed original (source) CU. The reconstructed original CU is then stored in the frame buffer 208 for use in inter prediction and intra prediction of other CUs.
The encoding methods 300, 400, and 500 described below are performed within the intra prediction enabled ACT module 212. The encoding methods 300, 400, and 500 can improve encoding efficiency and reduce encoding time.
The inter-prediction enabled ACT module 204, the inter-prediction disabled ACT module 206, the intra-prediction enabled ACT module 212, and the intra-prediction disabled ACT module 214 are not limited to being arranged in a parallel manner. In one embodiment, the inter-prediction enabled ACT module 204, the inter-prediction disabled ACT module 206, the intra-prediction enabled ACT module 212, and the intra-prediction disabled ACT module 214 may be arranged in sequence. The arrangement of the inter-prediction enabled ACT module 204, the inter-prediction disabled ACT module 206, the intra-prediction enabled ACT module 212, and the intra-prediction disabled ACT module 214 may vary.
Fig. 3 illustrates an encoding method 300 according to an embodiment of the present disclosure, which determines whether or not estimation of TU size (TU sizing) needs to be performed in an ACT enabled intra prediction coding process. More specifically, the encoding method 300 utilizes a threshold calculation (threshold calculation) for CU size to determine whether to perform the estimation of TU size.
In step 304, component correlation analysis (component correlation analysis) is performed on an original CU to determine whether the coding mode of the ACT of the CU needs to be enabled. The correlation of the color components of the individual pixels within the CU is analyzed. In each pixel, the correlation of the color component is compared with a pixel correlation threshold (pixel correlation threshold) to analyze whether the correlation is higher than, equal to, or lower than the pixel correlation threshold.
In a CU, the total number of pixels above the pixel correlation threshold is calculated, wherein pixels equal to the pixel correlation threshold are also counted as being above the pixel correlation threshold. The total number of pixels is then compared to a CU correlation threshold.
If the total number of pixels is less than the CU correlation threshold, the color components of the CU are determined to have low correlation. Therefore, the CU does not need ACT, so the flow proceeds to step 308, where ACT is disabled for CU encoding.
However, if the total number of pixels is higher than the CU correlation threshold, the color components of the CU are determined to have high correlation. In this case, ACT is needed to remove the component correlation of each pixel of the CU. When high correlation is confirmed, ACT is enabled and the flow proceeds to step 306. In step 306, a rough mode decision is made with intra-prediction enabled ACT.
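The two-level thresholding of step 304 can be sketched as follows; the per-pixel correlation measure (min/max component ratio) and the threshold values are purely illustrative assumptions, since the disclosure does not specify them:

```python
def act_enable_decision(pixels, pixel_corr_threshold, cu_corr_threshold):
    """Component correlation analysis for a CU (step 304).

    `pixels` is a list of (c0, c1, c2) component tuples. Count how many
    pixels have component correlation at or above the pixel threshold
    (equal counts as above, per the text), then compare that count to
    the CU threshold. Returns True when ACT should be enabled.
    """
    high_corr = 0
    for c0, c1, c2 in pixels:
        mx = max(c0, c1, c2)
        corr = min(c0, c1, c2) / mx if mx else 1.0  # illustrative measure
        if corr >= pixel_corr_threshold:
            high_corr += 1
    # The edge case of exact equality with the CU threshold is not
    # specified in the text; >= is assumed here.
    return high_corr >= cu_corr_threshold
```

For example, a block of near-equal RGB components (typical camera content) yields high correlation and enables ACT, while strongly skewed components do not.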
The correlation analysis of step 304 may be further or alternatively performed according to the color space of the CU. For example, at step 304, the color components of pixels within the CU may be analyzed and the color space of the CU may be determined. The color space may be determined as a red, green, and blue (RGB) space or a luminance and chrominance (YUV) space.
If the color space is determined to be an RGB color space, the process proceeds to step 306. In step 306, a Rough mode decision (Rough mode decision) is made with the intra prediction enabled ACT. Since RGB pixel components typically have high correlation, ACT is required to remove the correlation of components of each pixel within a CU to isolate the pixel energy (pixel energy) into single components.
On the other hand, when the color space is determined to be the YUV color space, the flow proceeds to step 308, and ACT is disabled. This is because YUV pixel components generally have low correlation and most of the pixel energy (pixel energy) is stored in a single pixel component. ACT is not required to be enabled for YUV pixel components since further decorrelation action (de-correlation) of CU pixel components does not yield additional coding benefit.
In the intra-prediction enabled ACT module 212, when the encoding method 300 disables ACT, the encoding mode of intra-prediction enabled ACT is disabled and the intra-prediction enabled ACT module 212 does not output a prediction to the mode decision module 210.
In the inter-prediction enabled ACT module 204, when inter-prediction coding is disabled, the coding mode of inter-prediction enabled ACT is disabled, and the inter-prediction enabled ACT module 204 does not output prediction to the mode decision module 210.
In step 306, the rough mode decision is made with intra-prediction enabled ACT. The rough mode decision may be a cost-based mode decision. For example, in the rough mode decision, coding modes may be selected using a low-complexity cost measure so that the decision can be made quickly, typically favoring the highest quality and lowest coding cost.
In step 310, a rate-distortion optimized mode decision (RDO mode decision) is performed in the ACT-enabled coding mode. Here, when ACT, CCP, transform, quantization, and entropy coding are performed, the distortion relative to the original video and the bit cost of the coding mode are calculated. The distortion may be obtained by an error calculation, such as the mean squared error (MSE). Next, the RDO analysis selects the coding mode with the lowest coding cost and the highest coding quality.
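The RDO cost described above is conventionally the Lagrangian J = D + λ·R, with D a distortion measure such as MSE and R the coding bit cost; a minimal sketch (the λ value and data are illustrative):

```python
def rdo_cost(orig, recon, bits, lam):
    """Rate-distortion cost J = D + lambda * R for one coding mode.

    D is the mean squared error between original and reconstructed
    samples; R is the bit cost of the mode; lam is the Lagrange
    multiplier trading distortion against rate.
    """
    mse = sum((o - r) ** 2 for o, r in zip(orig, recon)) / len(orig)
    return mse + lam * bits
```

The mode decision then simply keeps the candidate with the lowest J.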
For example, in the intra-prediction enabled ACT module 212, 35 intra prediction modes (IPMs) are available for encoding. In the rough mode decision of step 306, the intra-prediction enabled ACT module 212 uses a simple, low-complexity coding cost measure to select the IPMs with the lowest coding cost and highest coding quality. For example, the sum of absolute transformed differences (SATD) cost may be used to determine the low-complexity coding cost of each IPM, and the best 3 or 8 IPMs may be selected. The intra-prediction enabled ACT module 212 then makes an RDO mode decision for each selected IPM in step 310. When ACT, CCP, transform, quantization, and entropy coding are performed, the distortion relative to the original video and the coding bit cost of each selected IPM are calculated. The distortion may be obtained by an error calculation, such as the mean squared error (MSE). Then, the IPM with the lowest coding cost and the highest coding quality is selected from the candidates by the RDO analysis.
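A hedged sketch of the two pieces just described: an SATD computed with a 4x4 Hadamard transform (the block size and row ordering are illustrative; encoders typically use 4x4 or 8x8 Hadamard kernels), and the selection of the N lowest-cost IPM candidates for the subsequent RDO stage:

```python
# 4x4 Hadamard matrix; any row ordering works for SATD, which only
# sums coefficient magnitudes.
H = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]

def satd4x4(diff):
    """Sum of absolute transformed differences of a 4x4 residual block."""
    tmp = [[sum(H[i][k] * diff[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    out = [[sum(tmp[i][k] * H[j][k] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return sum(abs(v) for row in out for v in row)

def rough_mode_candidates(satd_costs, n):
    """Keep the n IPMs with the lowest low-complexity cost (step 306).

    `satd_costs` maps IPM index -> SATD-based cost; e.g. n = 3 or 8
    candidates are passed on to the full RDO decision of step 310.
    """
    return sorted(satd_costs, key=satd_costs.get)[:n]
```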
The flow described above for the intra-prediction enabled ACT module 212 may also be implemented in the inter-prediction enabled ACT module 204. For example, when the inter-prediction enabled ACT module 204 performs the encoding method 300, a rough mode decision for the optimal inter prediction of temporally neighboring video pictures is made at step 306, which provides the lowest encoding cost and the highest encoding quality. In step 310, an RDO mode decision for inter prediction is made. Here, when ACT, CCP, transform, quantization, and entropy coding are performed, the distortion relative to the original video and the coding bit cost of the inter prediction are calculated. The distortion may be obtained by an error calculation, such as the mean squared error (MSE). Then, the RDO analysis selects the inter prediction with the lowest coding cost and the highest coding quality.
In step 312, the CU size of the currently processed CU is calculated. The size of a CU may be NxN, where N may be 4, 8, 16, 32, or 64. The value N of the CU is compared with a threshold T1, which may be 4, 8, 16, 32, or 64. Based on the comparison result, it is determined whether the CU size is smaller than the threshold T1, and thereby whether the transform unit size needs to be estimated for the enabled coding mode. If the CU size is smaller than the threshold T1, the flow proceeds to step 314 for the TU size determination. However, if the CU size is equal to or greater than the threshold T1, the flow proceeds to step 316, and the TU size determination of step 314 is skipped. When the TU size determination is skipped, the TU quadtree structure may be set to the maximum possible TU size. For example, when the CU size is equal to or greater than the threshold T1, four 32x32 TUs may be determined for a 64x64 CU. In another embodiment, when the CU size is equal to or greater than the threshold T1, the TU may be the same size as the CU for a 32x32, 16x16, 8x8, or 4x4 CU. For example, if the size of a CU is 32x32, the corresponding TU size may be 32x32.
Since the TU size determination takes time and increases coding cost, step 312 can improve coding time and efficiency: when the TU size determination can be omitted, its coding cost and time are saved. Furthermore, a CU size equal to or greater than the threshold T1 indicates that the content of the CU is not complex. For example, a CU size greater than the threshold T1 may indicate a wide area of a video image without boundaries, motion, or complex images. Therefore, the TU size determination may not be needed to efficiently encode such a CU with high video quality.
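The gate of step 312 and the default TU layout when step 314 is skipped can be sketched as follows (the helper names are hypothetical; the layout rule follows the 64x64 and same-size examples given above):

```python
def needs_tu_size_determination(cu_n, t1):
    """Step 312: perform the TU size determination (step 314) only when
    the CU size NxN is strictly smaller than T1xT1."""
    return cu_n < t1

def default_tu_layout(cu_n):
    """When step 314 is skipped, use the largest possible TU sizes:
    a 64x64 CU gets four 32x32 TUs; smaller CUs get one TU equal to
    the CU size (per the examples in the text)."""
    return [32] * 4 if cu_n == 64 else [cu_n]
```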
In step 314, if the CU size is lower than the threshold T1, the TU size determination is performed. Here, the TUs of the original CU are determined. Based on the RDO cost estimate of step 310, candidate TU sizes are analyzed to find the ACT transform of the CU with the highest efficiency and high video quality. For example, TU sizes of 4x4, 8x8, 16x16, and 32x32 may be analyzed. When the TU size achieving the most efficient ACT transform is determined, that TU size is selected for the ACT transform of the CU and the flow proceeds to step 316. The selected TU size is the optimal TU quadtree structure size.
In step 316, a chroma mode decision is made. The chroma mode is determined according to the prediction mode decision of step 310, and chroma prediction is used to generate a chroma PU and a corresponding chroma TU according to the determined prediction mode. The TU determined in step 312 or step 314 may also be used to generate the chroma TU. The chroma TU is also sub-sampled according to the chroma format. Thus, in one embodiment, when the chroma format is 4:2:0 and the size of the luma TU is 32x32, the determined chroma TU is 16x16.
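The chroma sub-sampling rule can be made concrete with a small sketch (the function name is hypothetical; the 4:2:0 case matches the 32x32 luma / 16x16 chroma example above):

```python
def chroma_tu_size(luma_tu, chroma_format):
    """Derive the chroma TU dimensions from a square luma TU.

    4:2:0 halves both dimensions, 4:2:2 halves the width only,
    and 4:4:4 keeps the luma size. Returns (width, height).
    """
    w = h = luma_tu
    if chroma_format == "4:2:0":
        return w // 2, h // 2
    if chroma_format == "4:2:2":
        return w // 2, h
    return w, h  # 4:4:4
```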
In step 308, the process of selecting the best intra prediction mode and selecting the best TU quadtree structure size by the intra prediction enabled ACT module is completed. Prediction and RDO costs are generated and input to the mode determination module 210 for comparison with the RDO costs input to the mode determination module 210 by other prediction modules. For example, inter-prediction enabled ACT module 204 may generate prediction and RDO costs for ACT-enabled CUs and input the predicted CU and RDO costs to mode determination module 210. The inter-prediction disable ACT module 206 and the intra-prediction disable ACT module 214 also generate predicted CU and RDO costs and input their respective predicted CU and RDO costs to the mode determination module 210. The mode determination module 210 compares the predicted CU and RDO costs input by the inter-prediction enabled ACT module 204, the inter-prediction disabled ACT module 206, the intra-prediction enabled ACT module 212, and the intra-prediction disabled ACT module 214, and determines the predicted CU to be input to the summing modules 216, 218.
Fig. 4 illustrates an encoding method 400 according to another embodiment of the present disclosure, which determines whether ACT needs to be enabled. More specifically, the encoding method 400 utilizes a threshold calculation on the CU size and a determination of the correlation of the color components of the CU pixels. ACT may be enabled or disabled based on the threshold calculation. Elements having the same reference numerals are as described above.
In step 304, component correlation analysis (component correlation analysis) is performed on the original CU to determine whether ACT needs to be enabled or disabled. Step 304 is as described for encoding method 300. If the correlation of the color components of the CU is high, ACT is enabled and flow proceeds to steps 306, 310, 314, 316, and 308 (as in encoding step 300 above). However, if the correlation is low, the flow proceeds to step 402.
In step 402, the size of the currently processed CU is determined. As previously mentioned, the CU size is NxN, where N may be 4, 8, 16, 32, or 64. The value N of the CU is compared with a threshold T2 to determine whether the CU size is smaller than the threshold T2. The threshold T2 may be 4, 8, 16, 32, or 64. If the CU size is smaller than the threshold T2, ACT is enabled and the flow proceeds to step 310 for the RDO mode decision of the encoding method 300. However, if the CU size is equal to or greater than the threshold T2, the flow proceeds to step 308 and ACT is disabled.
In the inter-prediction enabled ACT module 204, when ACT is disabled in the encoding method 400, the output of the inter-prediction enabled ACT module 204 is an inter-prediction CU to which ACT is not applied. Thus, in this case, the CU output by the inter-prediction enabled ACT module 204 is the same as the output of the inter-prediction disabled ACT module 206. Likewise, in the intra-prediction enabled ACT module 212, when ACT is disabled in the encoding method 400, the output of the intra-prediction enabled ACT module 212 is an intra-prediction CU to which ACT is not applied. Thus, in this case, the output CU of the intra-prediction enabled ACT module 212 is the same as the output of the intra-prediction disabled ACT module 214.
Step 402 can improve the encoding time and efficiency, since a CU size equal to or greater than the threshold T2 indicates that the content of the CU is not complex. A CU size greater than the threshold T2 may indicate a wide area of the video image without boundaries, motion, or complex images. When the color components are already sufficiently decorrelated, ACT may not be needed to efficiently encode the CU.
Fig. 5 illustrates an encoding method 500 according to another embodiment of the present disclosure, which determines whether ACT needs to be enabled and whether the TU size estimation needs to be performed by two threshold calculations. More specifically, the encoding method 500 uses a first threshold calculation on the CU size and a correlation decision on the CU pixel color components to determine whether ACT is to be enabled. The encoding method 500 also uses a second threshold calculation on the CU size to determine whether to perform the TU size estimation. Elements having the same reference numerals are as described above.
In step 304, component correlation analysis (component correlation analysis) is performed on the original CU to determine whether ACT needs to be enabled or disabled. Step 304 is as described for encoding method 300. If the correlation of the color components of the CU is high, ACT is enabled and flow proceeds to step 306 for coarse mode determination and RDO mode determination at step 310. Steps 306 and 310 are as described above for encoding method 300. However, if the correlation is low, the flow proceeds to step 402.
In step 402, the size of the CU currently being processed is determined (as described above with respect to the encoding method 400 of fig. 4). If the CU size is less than threshold T2, ACT is enabled and step 310 is entered for RDO mode determination. However, if the CU size is equal to or greater than the threshold T2, the flow proceeds to step 308 and ACT is disabled.
In the inter-prediction enabled ACT module 204, when ACT is disabled in the encoding method 500, the output of the inter-prediction enabled ACT module 204 is an inter-prediction CU to which ACT is not applied. Thus, in this case, the CU output by the inter-prediction enabled ACT module 204 is the same as the output of the inter-prediction disabled ACT module 206.
Likewise, in the intra-prediction enabled ACT module 212, when ACT is disabled in the encoding method 500, the output of the intra-prediction enabled ACT module 212 is an intra-prediction CU to which ACT is not applied. Thus, in this case, the output CU of the intra-prediction enabled ACT module 212 is the same as the output of the intra-prediction disabled ACT module 214.
In step 310, the RDO mode is determined as described in the encoding method 300.
In step 312, the currently processed CU size is calculated as described in the encoding method 300 to determine whether the CU size is smaller than the threshold T1. If the CU size is smaller than the threshold T1, the flow proceeds to step 314 to perform TU size determination. However, if the CU size is equal to or greater than the threshold T1, the flow proceeds to step 316, and the TU size determination of step 314 is skipped. The determination in steps 314 and 316 is similar to the encoding method 300 described above.
The thresholds T1 and T2 may be set to the same or different values.
The encoding method 500 of fig. 5 combines both threshold calculations to improve coding efficiency and time. As described above, a CU size equal to or greater than the threshold T2 indicates that the content of the CU is not complex, and a wide area without boundaries, motion, or complex images can be expected. When the color components are already sufficiently decorrelated, ACT may not be needed to efficiently encode the CU. Moreover, omitting the TU size determination of step 314 saves coding cost.
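Pulling the pieces of Fig. 5 together, the combined gating can be sketched as one function (the name is hypothetical, and the return convention is an illustration of the flow, not a normative interface):

```python
def method_500_gate(high_correlation, cu_n, t1, t2):
    """Combined decision of encoding method 500.

    ACT is enabled when the color components are highly correlated
    (step 304) or the CU size NxN is below T2xT2 (step 402). The TU
    size determination (step 314) runs only on the ACT-enabled path
    and only when the CU size is below T1xT1 (step 312).
    Returns (act_enabled, do_tu_size_determination).
    """
    act = high_correlation or cu_n < t2
    return act, act and cu_n < t1
```

For example, with T1 = 32 and T2 = 64 (as in the experiments below), a low-correlation 64x64 CU disables ACT entirely, while a low-correlation 16x16 CU enables ACT and performs the TU size determination.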
Fig. 6 illustrates an encoding method 600 (similar to the encoding method 300) according to another embodiment of the disclosure, which determines whether the TU size estimation needs to be performed in an ACT-enabled intra prediction procedure. More specifically, the method 600 uses a threshold calculation on the CU size and determines whether the TU size estimation needs to be performed based on that threshold calculation.
In step 304, component correlation analysis is performed on the original CU to determine whether ACT needs to be enabled or disabled. Step 304 is as described for the encoding method 300. If the correlation of the color components of the CU is high, ACT is enabled and the flow proceeds to step 306 for the rough mode decision and then to step 310 for the RDO mode decision. Steps 306 and 310 are as described above for the encoding method 300. However, if the correlation is low in step 304 or the color space is determined to be YUV, the ACT coding mode is still enabled and the flow proceeds directly to step 310, without performing the rough mode decision of step 306. Here, ACT remains enabled for low-correlation pixel components or the YUV color space to check whether decorrelation of the pixel components may yield additional coding benefit.
In step 310, the RDO mode decision is calculated as described above for the encoding method 300.
In step 312, the currently processed CU size is calculated as described in the encoding method 300 to determine whether the CU size is smaller than the threshold T1. If the CU size is smaller than the threshold T1, the flow proceeds to step 314 to perform TU size determination. However, if the CU size is equal to or greater than the threshold T1, the flow proceeds to step 316, and the TU size determination of step 314 is skipped. The determination in steps 314 and 316 is similar to the encoding method 300 described above.
The thresholds T1 and T2 may be set to the same or different values.
A decoding method that performs the inverse of the steps of the encoding methods 300, 400, 500, 600 may efficiently decode video encoded by the encoding methods 300, 400, 500, 600. Thus, the foregoing of the present disclosure is sufficient for understanding the decoding method that performs the inverse steps of the encoding method 300, 400, 500, 600. The disclosure above is also sufficient to understand other decoding procedures required to decode the video encoded by the encoding methods 300, 400, 500, 600.
If a large CU is coded with IPM for screen visual content, the content of that region is likely not complex and it is not necessary to estimate the TU size. Therefore, TU partitioning is disabled for some large CUs coded with IPM in non-444 chroma formats. Fig. 7 illustrates the calculation flow of IPM in a non-444 chroma format. Steps 306 and 310 are as described above for the encoding method 300. In step 310, the RDO mode decision is calculated as described above for the encoding method 300.
In step 311, it is determined whether the chroma format is non-444. If the chroma format is non-444, the flow proceeds to step 312. If the chroma format is 444, the flow proceeds directly to step 314 for the TU size determination.
In step 312, the currently processed CU size is calculated as described in the encoding method 300 to determine whether the CU size is smaller than the threshold T1. If the CU size is smaller than the threshold T1, the flow proceeds to step 314 to perform TU size determination. However, if the CU size is equal to or greater than the threshold T1, the flow proceeds to step 316, and the TU size determination of step 314 is skipped. The determination in steps 314 and 316 is similar to the encoding method 300 described above.
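The gating of Fig. 7 (steps 311 and 312 combined) can be sketched as follows; the function name and string-based chroma format are illustrative assumptions:

```python
def method_700_tu_gate(chroma_format, cu_n, t1):
    """Decide whether to perform the TU size determination (step 314).

    Per Fig. 7: for non-444 chroma formats, the determination runs
    only when the CU size NxN is below T1xT1 (step 312); for 4:4:4,
    the determination always runs.
    """
    if chroma_format != "4:4:4":   # step 311: non-444 format
        return cu_n < t1           # step 312: size gate
    return True                    # 444: go directly to step 314
```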
The thresholds T1 and T2 may be set to the same or different values.
Fig. 8 illustrates a system 700 for performing the encoding and decoding methods of the present disclosure. The system 700 includes a non-transitory computer-readable medium 702, which may be a memory that stores arrays of instructions. Such instructions may be executed by processor 704. It is noted that one or more non-transitory computer-readable media 702 and/or one or more processors 704 may be selectively employed to perform the encoding and decoding methods of the present disclosure.
The non-transitory computer-readable medium 702 may be any type of non-transitory computer-readable recording medium (non-transitory CRM). The non-transitory computer-readable recording medium may include a floppy disk, a flexible disk, a hard disk, a hard drive, a solid state drive, a magnetic tape, any magnetic data storage medium, a compact disc (CD-ROM), any optical data storage medium, any physical medium with a pattern of holes, a dynamic random access memory (RAM), a programmable read only memory (PROM), an erasable programmable read only memory (EPROM), a FLASH-EPROM, any flash memory, a non-volatile memory (NVRAM), a cache, a register, a memory chip, a cartridge, and networked versions of the same. The computer-readable recording medium may store an array of instructions for execution by at least one processor. The instructions include instructions for causing a processor to perform the steps or stages of the encoding and decoding methods of the present disclosure. Furthermore, one or more computer-readable recording media may be used to implement the encoding and decoding methods of the present disclosure. The term "computer-readable recording medium" encompasses tangible objects but excludes carrier signals and transient signals.
The processor 704 may be any form of digital signal processor (DSP), application specific integrated circuit (ASIC), digital signal processing device (DSPD), programmable logic device (PLD), field programmable gate array (FPGA), controller, microcontroller, microprocessor, computer, or any other electronic component capable of performing the encoding and decoding methods of the present disclosure.
Results of the experiment
Experimental results of the encoding method of the present disclosure are described below.
The experiments here employ SCM 4.0 under the common test conditions (CTCs) of the HEVC SCC reference model. The coding performance of the encoding methods of the present disclosure is compared with the HEVC reference model. The HEVC reference model takes encoding time A to encode, and the tested encoding method of the present disclosure takes encoding time B. The coding time percentage is encoding time B divided by encoding time A. The experiments employ the HEVC general test sequences. The video may contain text, images, moving pictures, mixed content, animation, and camera-captured content, in the RGB color space and the YUV color space with image quality of 720p, 1080p, or 1440p. The experiments use full intra prediction, random access, and low B prediction under lossy conditions. Full intra prediction compresses a video picture using only information within the picture currently being compressed, while random access and low B prediction use information from previously coded pictures as well as the picture currently being compressed. In the following description, low B prediction may be referred to as low delay B prediction. In each experiment, the encoding time and the decoding time are recorded as percentages indicating the ratio of the tested encoding and decoding methods relative to the reference model. With respect to the original video source, a positive percentage for each G/Y, B/U, and R/V component represents a bit rate coding loss and a negative percentage represents a bit rate coding gain. For example, a 0.1% value for the G/Y component represents a coding loss of 0.1% for the G/Y component of the encoded video relative to the G/Y component of the original video. In another example, a -0.1% value for the G/Y component indicates a coding gain of 0.1% for the G/Y component of the encoded video relative to the original video.
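The reporting conventions above reduce to simple arithmetic; a small sketch (function names are illustrative):

```python
def coding_time_percentage(time_b, time_a):
    """Coding time of the tested method as a percentage of the
    reference model's time (B / A * 100); below 100 means faster."""
    return 100.0 * time_b / time_a

def bd_rate_sign(pct):
    """Interpret a component bit rate percentage: positive is a
    coding loss, negative is a coding gain."""
    if pct > 0:
        return "loss"
    return "gain" if pct < 0 else "no change"
```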
Please refer to the encoding method 500 of fig. 5 and table 1 below. The experiments for the encoding method 500 are run under the following three settings. In setting one, both the threshold T2 and the threshold T1 are set to 64. In setting two, the threshold T2 is set to 64 and the threshold T1 is set to 32. In setting three, the threshold T2 is set to 64 and the threshold T1 is set to 16. Intra prediction is the predetermined coding mode.
In setting one, when the pixel components have low correlation, CUs with a size of 64x64 or more are encoded without enabling ACT, and CUs with a size less than 64x64 are encoded with ACT enabled. If the CU size is equal to or greater than 64x64, the TU size determination of step 314 is omitted; for CU sizes smaller than 64x64, the TU size determination of step 314 is performed.
In setting two, when the pixel components have low correlation, CUs with a size greater than or equal to 64x64 are encoded without enabling ACT, and CUs with a size less than 64x64 are encoded with ACT enabled. If the CU size is equal to or greater than 32x32, the TU size determination of step 314 is omitted; for CU sizes smaller than 32x32, the TU size determination of step 314 is performed.
In setting three, when the pixel components have low correlation, CUs with a size greater than or equal to 64x64 are encoded without enabling ACT, and CUs with a size less than 64x64 are encoded with ACT enabled. If the CU size is equal to or greater than 16x16, the TU size determination of step 314 is omitted; for CU sizes smaller than 16x16, the TU size determination of step 314 is performed.
[Table data is presented as images in the original document.]
TABLE 1
As shown in table 1, the coding performance is improved for setting one, setting two, and setting three. Setting one reduces the coding complexity by 3%, setting two by 6%, and setting three by 9% (the largest reduction). Therefore, all settings improve coding efficiency. Each setting improves both the coding time and efficiency with a minimal loss of bit rate.
Please refer to the encoding method 500 and tables 2 and 3 below. Here, the experiments are performed with full intra, random access, and low delay B configurations. In experiment one, both the threshold T2 and the threshold T1 are set to 32. In experiment two, both the threshold T2 and the threshold T1 are set to 16. As with the encoding method 500, in experiment one, CUs with a size greater than or equal to 32x32 disable the TU estimation and are encoded without ACT enabled. In experiment two, CUs with a size greater than or equal to 16x16 disable the TU estimation and are encoded without ACT enabled, while CUs with a size less than 16x16 are encoded with ACT enabled. The experiments are performed under lossy conditions with full frame intra block copy.
[Table data is presented as images in the original document.]
TABLE 2
As described in table 2, in experiment one, the full intra mode reduces the coding complexity by 5%, while random access and low delay B each reduce the coding complexity by 1%. The individual settings show very low bit rate loss, with little bit rate change for full intra and random access.
In experiment two, the full intra mode reduces the coding complexity by 8%, random access reduces it by 1%, and low delay B does not change it. Each mode has more bit rate loss than experiment one, but the loss is still minimal (only fractions of a percent). The encoded video loses only a small amount of quality compared to the original video. Such video quality is acceptable for most applications because the encoding method 500 improves the encoding time.
[Table data is presented as images in the original document.]
TABLE 3
As shown in table 3, in both experiment one and experiment two, the bit rate is essentially unchanged overall and on average in each mode. The greatest reduction in coding complexity occurs in the full intra mode (a 1% reduction in each experiment).
Please refer to the encoding method 500 of fig. 5 and table 4 below. Here, the experiments are performed under lossy conditions with 4-CTU intra block copy in 4:4:4 chroma mode. The intra block copy technique copies a block from a previously coded CU into the currently coded video picture using motion vectors. The 4-CTU indicates the range that the motion vector can search.
In experiment one, both the threshold T2 and the threshold T1 are set to 32. In experiment two, both the threshold T2 and the threshold T1 are set to 16. As with the encoding method 500, in experiment one, CUs with a size greater than or equal to 32x32 disable the TU estimation; in experiment two, CUs with a size greater than or equal to 16x16 disable the TU estimation. In experiment one, CUs with a size less than 32x32 enable ACT, and CUs with a size greater than or equal to 32x32 disable ACT. In experiment two, CUs with a size less than 16x16 enable ACT, and CUs with a size greater than or equal to 16x16 disable ACT.
[Table data is presented as images in the original document.]
TABLE 4
As shown in table 4, in both experiment one and experiment two, the full intra, random access, and low delay B modes show minimal bit rate change. The greatest reduction in coding complexity occurs in the full intra mode: 5% in experiment one and 8% in experiment two.
Please refer to the encoding method 400 of fig. 4 and tables 5.1 and 5.2 below. Here, the threshold T2 is set to 64. Therefore, when the component correlation analysis of step 304 finds that the color components of the CU have low correlation, step 402 is performed to determine whether the CU size is less than 64x64. If the CU size is less than 64x64, ACT is enabled and the RDO mode decision of step 310 is performed. If the CU size is greater than or equal to 64x64, ACT is disabled and the flow proceeds to step 308. Experiment one uses a lossy full intra coding mode with the full frame intra block copy technique, and experiment two uses a lossy full intra coding mode with the 4-CTU IBC technique. The chroma mode for each experiment is 4:4:4.
Table 5.1: Experiment one (reproduced as an image in the original publication)
Table 5.2: Experiment two (reproduced as an image in the original publication)
As shown in table 5.1, in the YUV color space with lossy, all-intra coding and full-frame intra block copy, encoding method 400 reduces the encoding time by 1% to 3% with minimal bit-rate loss. As shown in table 5.2, with lossy, all-intra coding and 4-CTU intra block copy, the encoding-time reduction of encoding method 400, again with minimal bit-rate loss, is similar to that of experiment one in table 5.1.
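The size check of step 402 described above, reached only when step 304 finds low correlation among the CU's color components, can be sketched as follows (an illustrative simplification with a hypothetical function name; the return strings merely label the two branches):

```python
def act_decision_low_correlation(cu_size, t2=64):
    """Step 402 of encoding method 400 (sketch): gate ACT by CU size
    once the component correlation analysis has found low correlation."""
    if cu_size < t2:
        return "enable ACT, RDO mode decision (step 310)"
    return "disable ACT (step 308)"
```
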
Please refer to the encoding method 400 and table 6 below. Here, threshold T2 is set to 64. Lossless all-intra encoding is performed in 4:4:4 chroma mode.
TABLE 6 (reproduced as an image in the original publication)
In the YUV color space, encoding method 400 saves 0% to 2% of the encoding time.
Please refer to the encoding method 300 of fig. 3 and table 7 below. Here, threshold T1 is set to 32 in experiment one and to 16 in experiment two. Following encoding method 300, in experiment one the TU size decision of step 314 is skipped when the CU size is greater than or equal to 32x32 and performed when the CU size is less than 32x32. In experiment two, the TU size decision of step 314 is skipped when the CU size is greater than or equal to 16x16 and performed when the CU size is less than 16x16. Both experiments use lossy all-intra coding with ACT enabled.
TABLE 7 (reproduced as an image in the original publication)
Experiment one saves 3% to 6% of the encoding time, and experiment two saves 6% to 10%. Therefore, allowing the TU size decision only for CUs smaller than 32x32 or 16x16 improves coding efficiency.
The foregoing is illustrative of the disclosed technology and is not to be construed as limiting thereof. Modifications and adaptations of the embodiments are within the scope of the present disclosure. For example, the disclosed embodiments include software and hardware, but the systems and methods of the present disclosure may be implemented solely in hardware.
A software developer may develop a computer program based on the methods of the present disclosure using various programming techniques. For example, program segments or program modules may be developed in Java, C++, assembly language, or any other programming language. One or more of such software segments or modules may be installed on a computer system, stored on a non-transitory computer-readable medium, or integrated into existing communication software.
Moreover, although various embodiments have been described above, the scope of the present disclosure includes equivalents, modifications, omissions, combinations (e.g., of various embodiments), uses, or alternatives of the various elements. The elements of the claims are to be interpreted in the broadest sense and are not limited to the examples in the embodiments. In addition, the steps of the methods may be modified (including reordering, inserting, or deleting steps). While the present disclosure has been described with reference to the preferred embodiments, it is not intended to be limited thereto. The scope of the disclosure is to be determined by the claims appended hereto.
Other embodiments will be apparent to those skilled in the art from consideration of the specification. The scope of the present disclosure includes various modifications, implementations, and applications in conjunction with the general knowledge. It is intended that the specification and examples be considered as exemplary only, with a true scope of the disclosure being indicated by the following claims.

Claims (14)

1. A video encoding method, comprising:
receiving an original video picture;
dividing the original video picture into coding tree units;
determining coding units from the coding tree units;
enabling a coding mode with adaptive color transform for a coding unit, wherein the coding unit has a size of NxN;
determining whether N is less than a first threshold;
when N is less than the first threshold, determining a size of a transform unit and determining a chroma mode of the coding unit; and
when N is not less than the first threshold, determining the chroma mode of the coding unit without determining the size of the transform unit.
2. The method of claim 1, further comprising:
determining a color space of the coding unit;
wherein determining the color space of the coding unit comprises determining whether the color space is a red, green, and blue color space or a luminance and chrominance color space.
3. The method of claim 2, further comprising:
performing a cost mode determination when the color space is determined to be the red, green, and blue color space, if the enabled coding mode is an intra prediction mode with adaptive color transform.
4. The method of claim 2, further comprising:
performing a cost mode determination when the color space is determined to be the luminance and chrominance color space, if the enabled coding mode is an intra prediction mode with adaptive color transform.
5. The method of claim 2, further comprising:
determining whether N is less than a second threshold; and
disabling the coding mode of the coding unit when the color space is determined to be the luminance and chrominance color space and N is greater than or equal to the second threshold.
6. The method of claim 2, further comprising:
determining whether N is less than a second threshold; and
enabling the coding mode with adaptive color transform when the color space is determined to be the luminance and chrominance color space and N is less than the second threshold.
7. The method of claim 2, further comprising:
determining whether N is greater than or equal to a second threshold; and
enabling the coding mode with adaptive color transform when the color space is determined to be the luminance and chrominance color space and N is not greater than or equal to the second threshold.
8. The method of claim 1, further comprising:
evaluating the size of the transform unit if the original video picture is not in 4:4:4 format and N is less than the first threshold.
9. A video encoding system, comprising:
a memory storing a set of instructions; and
a processor configured to execute the set of instructions, the set of instructions comprising:
receiving an original video picture;
dividing the original video picture into coding tree units;
determining coding units from the coding tree units;
enabling a coding mode with adaptive color transform for a coding unit, wherein the coding unit has a size of NxN;
determining whether N is less than a first threshold;
when N is less than the first threshold, determining a size of a transform unit and determining a chroma mode of the coding unit; and
when N is not less than the first threshold, determining the chroma mode of the coding unit without determining the size of the transform unit.
10. The system of claim 9, wherein the set of instructions executed by the processor further comprises:
determining a color space of the coding unit;
wherein determining the color space comprises determining whether the color space is a red, green, and blue color space or a luminance and chrominance color space.
11. The system of claim 10, wherein the set of instructions executed by the processor further comprises:
determining whether N is less than a second threshold; and
enabling the coding mode with adaptive color transform when the color space is determined to be the luminance and chrominance color space and N is less than the second threshold.
12. The system of claim 10, wherein the set of instructions executed by the processor further comprises:
determining whether N is greater than or equal to a second threshold; and
enabling the coding mode with adaptive color transform when the color space is determined to be the luminance and chrominance color space and N is not greater than or equal to the second threshold.
13. The system of claim 9, wherein the set of instructions executed by the processor further comprises:
evaluating the size of the transform unit if the original video picture is not in 4:4:4 format and N is less than the first threshold.
14. A non-transitory computer-readable recording medium storing a set of instructions executable by one or more processors to perform a video encoding method, the video encoding method comprising:
receiving an original video picture;
dividing the original video picture into coding tree units;
determining coding units from the coding tree units;
enabling a coding mode with adaptive color transform for a coding unit, wherein the coding unit has a size of NxN;
determining whether N is less than a first threshold;
when N is less than the first threshold, determining a size of a transform unit and determining a chroma mode of the coding unit; and
when N is not less than the first threshold, determining the chroma mode of the coding unit without determining the size of the transform unit.
CN201610357374.8A 2015-06-08 2016-05-26 Video encoding method, system and computer-readable recording medium using adaptive color conversion Active CN106254870B (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US201562172256P 2015-06-08 2015-06-08
US62/172,256 2015-06-08
US14/757,556 US20160360205A1 (en) 2015-06-08 2015-12-24 Video encoding methods and systems using adaptive color transform
US14/757,556 2015-12-24
US201662290992P 2016-02-04 2016-02-04
US62/290,992 2016-02-04
TW105114323 2016-05-09
TW105114323A TWI597977B (en) 2015-06-08 2016-05-09 Video encoding methods and systems using adaptive color transform

Publications (2)

Publication Number Publication Date
CN106254870A CN106254870A (en) 2016-12-21
CN106254870B true CN106254870B (en) 2020-08-18

Family

ID=57626642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610357374.8A Active CN106254870B (en) 2015-06-08 2016-05-26 Video encoding method, system and computer-readable recording medium using adaptive color conversion

Country Status (1)

Country Link
CN (1) CN106254870B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106851272B (en) * 2017-01-20 2019-11-12 杭州当虹科技股份有限公司 A kind of method of HDR and SDR adaptive rate control
US10820017B2 (en) * 2017-03-15 2020-10-27 Mediatek Inc. Method and apparatus of video coding
WO2019076138A1 (en) 2017-10-16 2019-04-25 Huawei Technologies Co., Ltd. Encoding method and apparatus
CN108174214A (en) * 2017-12-08 2018-06-15 重庆邮电大学 A kind of remote table sharing method based on screen content Video coding
JP7121133B2 (en) 2018-02-23 2022-08-17 華為技術有限公司 Position-dependent Spatial Variation Transforms for Video Coding
KR102532021B1 (en) 2018-05-31 2023-05-12 후아웨이 테크놀러지 컴퍼니 리미티드 Spatial adaptive transform of adaptive transform type
WO2020253861A1 (en) 2019-06-21 2020-12-24 Beijing Bytedance Network Technology Co., Ltd. Adaptive in-loop color-space transform for video coding
AU2020321002B2 (en) * 2019-07-26 2023-06-01 Beijing Bytedance Network Technology Co., Ltd. Determination of picture partition mode based on block size
CN117336478A (en) 2019-11-07 2024-01-02 抖音视界有限公司 Quantization characteristics of adaptive intra-annular color space transform for video codec
TWI743919B (en) * 2020-08-03 2021-10-21 緯創資通股份有限公司 Video processing apparatus and processing method of video stream

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104126303B (en) * 2011-11-29 2018-03-06 华为技术有限公司 Unified segmenting structure and Signalling method for high efficiency video coding

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AHG6: On Adaptive Color Transform (ACT) in SCM2.0; PoLin Lai, Shan Liu, Shawmin Lei; 19th JCT-VC Meeting; 2014-10-08; Abstract, Sections 2-4 *
High Efficiency Video Coding (HEVC) Test Model 16 (HM 16) Encoder Description; K. McCann; 18th JCT-VC Meeting; 2014-10-14; Sections 4.1.3 and 4.1.5, Figures 4-4 and 4-6 *
Screen content coding test model 2 (SCM 2); Rajan Joshi; 17th JCT-VC Meeting; 2014-10-17; entire document *

Also Published As

Publication number Publication date
CN106254870A (en) 2016-12-21

Similar Documents

Publication Publication Date Title
CN106254870B (en) Video encoding method, system and computer-readable recording medium using adaptive color conversion
US10390020B2 (en) Video encoding methods and systems using adaptive color transform
TWI621353B (en) Video decoding method
US20160360205A1 (en) Video encoding methods and systems using adaptive color transform
EP2767087B1 (en) Sample adaptive offset merged with adaptive loop filter in video coding
CN113812148A (en) Reference picture resampling and inter-coding tools for video coding
KR101981905B1 (en) Encoding method and device, decoding method and device, and computer-readable storage medium
JP2022538061A (en) Combined inter and intra prediction modes for video coding
WO2013049309A1 (en) Coefficient coding for sample adaptive offset and adaptive loop filter
JP2007053561A (en) Device and method for encoding image
CN113545051B (en) Reconstruction of blocks of video data using block size restriction
JP7317973B2 (en) IMAGE PREDICTION METHOD, DEVICE AND SYSTEM, APPARATUS AND STORAGE MEDIUM
AU2020301234A1 (en) Nonlinear extensions of adaptive loop filtering for video coding
CN114208199A (en) Chroma intra prediction unit for video coding
CN114598873B (en) Decoding method and device for quantization parameter
US20170195675A1 (en) Apparatus and method for performing rate-distortion optimization based on hadamard-quantization cost
EP3104606B1 (en) Video encoding methods and systems using adaptive color transform
WO2015142618A1 (en) Systems and methods for low complexity encoding and background detection
TWI597977B (en) Video encoding methods and systems using adaptive color transform
US10205952B2 (en) Method and apparatus for inter color component prediction
EP3462744A1 (en) Method and apparatus for encoding a picture block
KR20230123947A (en) Adaptive loop filter with fixed filters
EP2899975A1 (en) Video encoder with intra-prediction pre-processing and methods for use therewith
KR102414164B1 (en) A method of video processing providing high-throughput arithmetic coding and a method and appratus for decoding and encoding video using the processing.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant