WO2024152957A1 - Multiple block vectors for intra template matching prediction
- Publication number: WO2024152957A1 (application PCT/CN2024/071524)
- Authority: WO (WIPO, PCT)
Classifications
- H04N19/176 — Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
- H04N19/105 — Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
- H04N19/154 — Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
- H04N19/52 — Processing of motion vectors by encoding by predictive encoding
Definitions
- the present disclosure relates generally to video coding.
- the present disclosure relates to methods of coding pixel blocks by intra template matching prediction (IntraTMP) .
- High-Efficiency Video Coding is an international video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC) .
- HEVC is based on the hybrid block-based motion-compensated DCT-like transform coding architecture.
- the basic unit for compression, termed coding unit (CU), is a 2Nx2N square block of pixels, and each CU can be recursively split into four smaller CUs until the predefined minimum size is reached.
- Each CU contains one or multiple prediction units (PUs) .
- Versatile Video Coding (VVC) is a video coding standard developed by the Joint Video Expert Team (JVET).
- the input video signal is predicted from the reconstructed signal, which is derived from the coded picture regions.
- the prediction residual signal is processed by a block transform.
- the transform coefficients are quantized and entropy coded together with other side information in the bitstream.
- the reconstructed signal is generated from the prediction signal and the reconstructed residual signal after inverse transform on the de-quantized transform coefficients.
- the reconstructed signal is further processed by in-loop filtering for removing coding artifacts.
- the decoded pictures are stored in the frame buffer for predicting the future pictures in the input video signal.
- a coded picture is partitioned into non-overlapped square block regions represented by the associated coding tree units (CTUs) .
- the leaf nodes of a coding tree correspond to the coding units (CUs) .
- a coded picture can be represented by a collection of slices, each comprising an integer number of CTUs. The individual CTUs in a slice are processed in raster-scan order.
- a bi-predictive (B) slice may be decoded using intra prediction or inter prediction with at most two motion vectors and reference indices to predict the sample values of each block.
- a predictive (P) slice is decoded using intra prediction or inter prediction with at most one motion vector and reference index to predict the sample values of each block.
- An intra (I) slice is decoded using intra prediction only.
- a CTU can be partitioned into one or multiple non-overlapped coding units (CUs) using the quadtree (QT) with nested multi-type-tree (MTT) structure to adapt to various local motion and texture characteristics.
- a CU can be further split into smaller CUs using one of the five split types: quad-tree partitioning, vertical binary tree partitioning, horizontal binary tree partitioning, vertical center-side triple-tree partitioning, horizontal center-side triple-tree partitioning.
- Each CU contains one or more prediction units (PUs) .
- the prediction unit, together with the associated CU syntax, works as a basic unit for signaling the predictor information.
- the specified prediction process is employed to predict the values of the associated pixel samples inside the PU.
- Each CU may contain one or more transform units (TUs) for representing the prediction residual blocks.
- a transform unit (TU) comprises a transform block (TB) of luma samples and two corresponding transform blocks of chroma samples, and each TB corresponds to one residual block of samples from one color component.
- An integer transform is applied to a transform block.
- the level values of quantized coefficients together with other side information are entropy coded in the bitstream.
- the coding tree block (CTB), coding block (CB), prediction block (PB), and transform block (TB) are the sample arrays of one color component associated with a CTU, CU, PU, and TU, respectively.
- for each inter-predicted CU, motion parameters consisting of motion vectors, reference picture indices, a reference picture list usage index, and additional information are used for inter-predicted sample generation.
- the motion parameter can be signalled in an explicit or implicit manner.
- when a CU is coded with skip mode, the CU is associated with one PU and has no significant residual coefficients, no coded motion vector delta, and no reference picture index.
- a merge mode is specified whereby the motion parameters for the current CU are obtained from neighbouring CUs, including spatial and temporal candidates, and additional schedules introduced in VVC.
- the merge mode can be applied to any inter-predicted CU.
- the alternative to merge mode is the explicit transmission of motion parameters, where the motion vector, the corresponding reference picture index for each reference picture list, the reference picture list usage flag, and other needed information are signalled explicitly for each CU.
- Some embodiments of the disclosure provide a method for coding a block of pixels by referencing the current picture.
- the video coder may signal or receive an index identifying a search region from multiple search regions in the current picture.
- the video coder derives one or more block vectors by computing matching costs at searching positions within the identified search region.
- the video coder uses first and second block vectors to identify first and second reference blocks respectively in the current picture for a current block.
- the video coder generates a predictor based on the first and second reference blocks identified by first and second block vectors in the current picture.
- the video coder encodes or decodes the current block by using the generated predictor.
- the video coder stores the first and second block vectors in a storage for subsequent encoding operations of the current block or a subsequent block.
- the plurality of search regions may be non-overlapping regions of the current picture, or different regions of the current picture that may overlap.
- the signaled index identifies two or more search regions in the plurality of search regions.
- the different search regions are assigned corresponding indices that are ordered according to the distances of the search regions from a neighboring block vector. (A neighboring block vector is a block vector that was used to code a neighboring block of the current block. )
- the video coder may signal a syntax element to indicate whether an index is used to select a search region from a plurality of search regions.
- the video coder may signal a syntax element to indicate a size of a search range that includes the plurality of search regions.
- the matching cost of a search position is further based on horizontal and vertical differences between a neighboring block vector and the search position.
- the search positions are identified based on the neighboring block vector.
- the starting position(s) of the search are identified based on the neighboring block vector.
- the predictor may be a fusion of the first and second reference blocks identified by the first and second block vectors, and the fusion of the first and second reference blocks is weighted based on the matching costs of the first and second block vectors.
- the first block vector may be retrieved from the storage before the second block vector when the first block vector has a matching cost lower than (or equal to) that of the second block vector, or when the first block vector has a smaller magnitude than that of the second block vector.
- the storage may store two or more to-be-inserted candidates. For example, a first to-be-inserted candidate may store the first and second block vectors, while a second to-be-inserted candidate may store third and fourth block vectors. A coding tool may subsequently retrieve and use the first block vector of the first to-be-inserted candidate and the third block vector of the second to-be-inserted candidate from the storage.
- FIG. 1 conceptually illustrates current picture referencing (CPR) .
- FIG. 2 conceptually illustrates search regions for intra template matching.
- FIG. 3 shows the use of a block vector of a neighboring block for a current block.
- FIG. 4 illustrates multiple search regions for searching the best Intra Template Matching Prediction (IntraTMP) predictor.
- FIG. 5 illustrates different indices being assigned to multiple different search regions of the current block for IntraTMP mode.
- FIG. 6 illustrates fusing multiple candidate predictors into one IntraTMP predictor for the current block.
- FIG. 7 illustrates search regions that are assigned indices reordered according to distances from a neighboring block vector (BV) .
- FIG. 8 illustrates a searching start point that is determined by a neighboring BV.
- FIG. 9 conceptually illustrates a buffer storage storing multiple to-be-inserted candidates.
- FIG. 10 illustrates an example video encoder that may implement intra block copy (IBC) or current picture referencing (CPR) .
- FIG. 11 illustrates portions of the video encoder that implement search and inheritance of multiple block vectors.
- FIG. 12 conceptually illustrates an encoding process for implementing inheritance of multiple block vectors.
- FIG. 13 illustrates an example video decoder that may implement intra block copy (IBC) or current picture referencing (CPR) .
- FIG. 14 illustrates portions of the video decoder that implement search and inheritance of multiple block vectors.
- FIG. 15 conceptually illustrates a decoding process for implementing inheritance of multiple block vectors.
- FIG. 16 conceptually illustrates an electronic system with which some embodiments of the present disclosure are implemented.
- Motion compensation is a video coding process that exploits the pixel correlation between adjacent pictures. It is generally assumed that, in a video sequence, the patterns corresponding to objects or background in a frame are displaced to form corresponding objects in the subsequent frame, or are correlated with other patterns within the current frame. With the estimation of such a displacement (e.g., using block matching techniques), the pattern can be mostly reproduced without needing to be re-coded. Block matching and copying allow the reference block to be selected from within the same picture, but this is observed to be less efficient when applied to camera-captured videos. Part of the reason is that a texture pattern in a spatially neighboring area may be similar to the current coding block but usually with some gradual changes over space. It is therefore less likely for a block to find a good match within the same picture of a camera-captured video, thereby limiting the improvement in coding performance.
- FIG. 1 conceptually illustrates current picture referencing (CPR) .
- a prediction unit, shown as a current block 110, is predicted from a previously reconstructed block 130 within the same picture 100.
- a displacement vector 120 (called block vector or BV) is used to signal the relative displacement from the position of the current block to that of the reference block, which provides the reference samples used for generating a predictor of the current block.
- the prediction errors are then coded using transformation, quantization and entropy coding.
- the reference samples may correspond to the reconstructed samples of the current decoded picture prior to in-loop filter operations, i.e., before both the deblocking and sample adaptive offset (SAO) filters.
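- As an illustrative sketch (not the normative process), the following Python fragment shows the basic block-copy mechanics of CPR/IBC: a block vector (dx, dy) offsets the current block position into the already-reconstructed area of the same picture. All function and variable names here are hypothetical.

```python
import numpy as np

def cpr_predict(recon: np.ndarray, x: int, y: int, bw: int, bh: int,
                bv: tuple) -> np.ndarray:
    """Copy a bw-by-bh reference block from the reconstructed current
    picture `recon` (pre-loop-filter samples), displaced from the
    current block at (x, y) by the block vector `bv` = (dx, dy)."""
    dx, dy = bv
    rx, ry = x + dx, y + dy
    # The reference block must lie entirely inside the already-decoded
    # area; a real coder enforces this as a conformance constraint.
    assert 0 <= rx and 0 <= ry
    assert rx + bw <= recon.shape[1] and ry + bh <= recon.shape[0]
    return recon[ry:ry + bh, rx:rx + bw].copy()
```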
- Intra template matching prediction is a special intra prediction mode that copies the best prediction block from the reconstructed part of the current frame, i.e., the block whose L-shaped template best matches a template (current template) neighboring the block being currently coded (current block).
- the encoder searches for the most similar template to the current template in a reconstructed part of the current frame and uses the corresponding block as a prediction block. The encoder then signals the usage of this mode, and the same prediction operation is performed at the decoder side.
- FIG. 2 conceptually illustrates search regions for intra template matching.
- the search for a matching L-shape can be conducted in several predefined search regions: R1, which is the current CTU; R2, which is the top-left CTU; R3, which is the above CTU; and R4, which is the left CTU.
- the sum of absolute differences (SAD) is used as a cost function, such that within each search region, the decoder searches for the reference template 230 that has the least SAD with respect to a current template 220 neighboring the current block 210.
- the corresponding block 235 of the reference template 230 is used as a prediction block or reference block.
- the dimensions of all search regions (SearchRange_w, SearchRange_h) are set proportional to the block dimensions (BlkW, BlkH), i.e., SearchRange_w = a × BlkW and SearchRange_h = a × BlkH, where ‘a’ is a constant that controls the gain/complexity trade-off. In some embodiments, ‘a’ is set to 5.
- the search range of all search regions is subsampled by a factor of 2, which reduces the number of template matching comparisons by a factor of 4.
- after the subsampled search, a refinement process is performed. The refinement is done via a second template matching search around the best match, with a reduced range defined as min(BlkW, BlkH)/2.
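- The subsampled search and the refinement described above can be sketched as follows. This is a rough illustration assuming a 2-sample-thick L-shaped template and plain SAD matching; the helper names, the template thickness, and the region representation (x0, y0, x1, y1) are assumptions, not part of the disclosure.

```python
import numpy as np

def l_template(recon, x, y, bw, bh, tsz=2):
    """L-shaped template: `tsz` rows above and `tsz` columns left of the
    block at (x, y). Assumes the template area is available (x, y >= tsz)."""
    top = recon[y - tsz:y, x - tsz:x + bw]
    left = recon[y:y + bh, x - tsz:x]
    return np.concatenate([top.ravel(), left.ravel()]).astype(np.int64)

def intratmp_search(recon, x, y, bw, bh, region, step=2):
    """Return the (block_vector, SAD) of the best reference template inside
    `region` = (x0, y0, x1, y1), visiting positions subsampled by `step`."""
    cur = l_template(recon, x, y, bw, bh)
    best = (None, None)
    x0, y0, x1, y1 = region
    for ry in range(max(y0, 2), y1 - bh + 1, step):
        for rx in range(max(x0, 2), x1 - bw + 1, step):
            cost = int(np.abs(cur - l_template(recon, rx, ry, bw, bh)).sum())
            if best[1] is None or cost < best[1]:
                best = ((rx - x, ry - y), cost)
    return best

def refine(recon, x, y, bw, bh, coarse_bv):
    """Second, full-density pass in a window of min(BlkW, BlkH)/2 samples
    around the best match of the subsampled search."""
    r = min(bw, bh) // 2
    bx, by = x + coarse_bv[0], y + coarse_bv[1]
    return intratmp_search(recon, x, y, bw, bh,
                           (bx - r, by - r, bx + bw + r, by + bh + r), step=1)
```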
- intra template matching mode is enabled for CUs with size less than or equal to 64 in width and height. This maximum CU size for Intra template matching is configurable.
- the Intra template matching prediction mode is signaled at CU level through a dedicated flag when decoder-side intra mode derivation (DIMD) is not used for current CU.
- the block vector (BV) derived from the intra template matching prediction (IntraTMP) is used for intra block copy (IBC) prediction mode.
- IntraTMP block vectors are added to the IBC block vector candidate list as spatial candidates.
- an IntraTMP block vector may be stored in an IBC block vector buffer, and the current IBC block can use both its own IBC BV and the IntraTMP BVs of neighboring blocks as BV candidates for the IBC BV candidate list.
- the stored IntraTMP BVs of the neighboring blocks, along with IBC BVs, may be used as spatial BV candidates in IBC candidate list construction.
- FIG. 3 shows the use of a block vector of a neighboring block for a current block.
- a current block 310 in a current picture 300 is coded by IBC, while a neighboring block 320 of the current block is coded by IntraTMP.
- the IntraTMP coded neighboring block 320 has a BV 325 that is derived by template matching to reference a reference block 330.
- the BV 325 can be inherited by the current block 310 as a neighboring BV to be part of its IBC candidate list.
- IntraTMP uses multiple search regions (or ranges) for searching the best IntraTMP predictor. (These search regions or ranges may be partitions of a larger pre-defined area. ) These search regions may be non-overlapping, or some of the search regions may overlap.
- FIG. 4 illustrates multiple search regions for searching the best IntraTMP predictor.
- the current block 410 is coded by IntraTMP by searching for a best match for the current block’s neighboring L-shaped template 415 in multiple search regions 421-426 (labeled as the 1st through the 6th search region) of the current picture 400.
- an index may be signaled to indicate which search region is selected and used for searching the best IntraTMP predictor.
- FIG. 5 illustrates different indices being assigned to multiple different search regions of the current block for IntraTMP mode.
- search regions 421-424 are assigned indices 0, 1, 2, 3, respectively.
- the multiple search regions may be defined according to the distance from the current block. In some embodiments, the multiple search regions may be defined according to spatial direction. In some embodiments, two or more search regions are selected by a signaled index, and the best IntraTMP predictor is searched within these selected search regions. In some embodiments, a high-level syntax may be signaled in a sequence parameter set (SPS), picture parameter set (PPS), picture header (PH), or slice header (SH) to indicate whether the IntraTMP search region selection index is allowed or used for the current sequence, picture, or slice.
- a fusion method may be applied to fuse multiple candidate predictors into one predictor. Specifically, two or more candidate predictors may be found based on their template matching cost in a selected search region, and then the two or more candidate predictors may be fused into one IntraTMP predictor.
- FIG. 6 illustrates fusing multiple candidate predictors into one IntraTMP predictor for the current block. As illustrated, for the current block 410 in the current picture 400, two candidate predictors 621 and 622 are identified in the search region 422 by template matching. The two candidate predictors are fused/combined into a fused predictor 630 for coding the current block 410.
- the fusion of the two predictors 621 and 622 may be weighted, and the fusion weights may be fixed weights, derived from the predictors’ respective matching costs, or derived by a regression-based method. In some embodiments, there may be multiple selected search regions, and the best candidates of each search region may be fused into one IntraTMP predictor.
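- A minimal sketch of such a cost-weighted fusion is given below. The specific weighting rule (weights proportional to the inverse of each candidate’s matching cost) is only one plausible choice; the disclosure leaves the weights open to fixed, cost-derived, or regression-based derivation.

```python
import numpy as np

def fuse_predictors(preds, costs):
    """Blend candidate predictors into one IntraTMP predictor, weighting
    each predictor inversely to its template matching cost (assumed rule).
    `preds` is a list of equally sized sample arrays, `costs` their SADs."""
    w = np.array([1.0 / (c + 1.0) for c in costs])   # +1 avoids divide-by-zero
    w /= w.sum()                                     # normalize to sum to 1
    fused = sum(wi * p.astype(np.float64) for wi, p in zip(w, preds))
    return np.clip(np.rint(fused), 0, 255).astype(np.uint8)  # 8-bit samples assumed
```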
- an index may be signaled to indicate the size of an IntraTMP search region.
- the size of a search range may be a multiple of the block width and/or block height, and this multiple may be determined by the signaled index.
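- For illustration, a hypothetical mapping from the signaled size index to the search range dimensions could look like the sketch below; the candidate multiples are invented for the example, since the disclosure does not fix them.

```python
RANGE_MULTIPLES = (2, 5, 8)  # illustrative candidate values for the multiple 'a'

def search_range_size(blk_w, blk_h, size_idx):
    """Map a signaled index to (SearchRange_w, SearchRange_h), each a
    multiple of the block width/height as described above."""
    a = RANGE_MULTIPLES[size_idx]
    return a * blk_w, a * blk_h
```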
- a high-level syntax may be signaled in SPS, PPS, PH or SH to indicate whether the IntraTMP search region size index is allowed or used for the current sequence, picture, or slice.
- the BV of a neighboring coded IBC block or IntraTMP block may be used to improve the IntraTMP searching process of the current block.
- the neighboring coded block is coded by IBC or IntraTMP mode
- the different (IntraTMP) search regions are assigned respective indices that are reordered according to the search regions’ distances from (the pixel position specified by) the neighboring BV (i.e., the BV of the neighboring IBC/IntraTMP block) .
- FIG. 7 illustrates search regions that are assigned indices reordered according to distances from a neighboring BV.
- a neighboring block 705 of the current block 410 is coded by using a BV 715.
- the BV 715 is inherited by the current block 410 as a neighboring BV.
- the BV 715 points to a position in the search region 422; therefore, the search region 422 has the smallest distance from the position pointed to by the neighboring BV 715 and is assigned index 0.
- search regions 421, 423, and 424 are respectively assigned index 1, index 2, and index 3, also based on their respective distances from the position pointed to by the neighboring BV 715.
- the distance between a search region and the position pointed to by the neighboring BV may be defined as the Euclidean distance between the center position of the search region and the position pointed to by the neighboring BV, or the Euclidean distance between the top-left position of the search region and that position.
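- The reordering can be sketched as follows, using the center-based Euclidean distance variant; the region representation and function names are illustrative.

```python
import math

def reorder_region_indices(regions, cur_pos, neighbor_bv):
    """Assign indices 0..N-1 to search regions in increasing order of the
    Euclidean distance between each region's center and the position
    pointed to by the neighboring BV.  `regions` holds (x0, y0, x1, y1)
    rectangles; `cur_pos` is the top-left of the current block."""
    tx = cur_pos[0] + neighbor_bv[0]
    ty = cur_pos[1] + neighbor_bv[1]
    def dist(r):
        cx, cy = (r[0] + r[2]) / 2.0, (r[1] + r[3]) / 2.0
        return math.hypot(cx - tx, cy - ty)
    order = sorted(range(len(regions)), key=lambda i: dist(regions[i]))
    return {region: idx for idx, region in enumerate(order)}  # region -> index
```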
- a BV cost may be added into the IntraTMP searching process.
- the BV cost of a candidate BV may be the sum of the horizontal and vertical differences between the candidate BV and the neighboring BV, or the Euclidean distance between the neighboring BV and the candidate BV.
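- In code, adding such a BV cost to the template matching cost might look like the following; the weighting factor `lam` is a hypothetical tuning parameter not specified in the disclosure.

```python
def bv_cost(cand_bv, neigh_bv, lam=1):
    """Sum of horizontal and vertical differences between a candidate BV
    and the neighboring BV, scaled by a hypothetical weight `lam`."""
    return lam * (abs(cand_bv[0] - neigh_bv[0]) + abs(cand_bv[1] - neigh_bv[1]))

# The total cost at a search position would then be, e.g.:
#   total_cost = template_sad + bv_cost(candidate_bv, neighboring_bv)
```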
- the starting point of IntraTMP searching process is determined by the neighboring BV.
- FIG. 8 illustrates a searching start point that is determined by a neighboring BV.
- the figure illustrates the search regions 421-424 for the current block 410 in a current picture 400.
- a neighboring block 805 of the current block 410 is coded by using a BV 815.
- the BV 815 is inherited by the current block 410 as a neighboring BV.
- the video coder performs the search from a starting point (in the search region 422) identified by the neighboring BV 815. From that starting point, the search follows a spiral shape.
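- The disclosure does not specify the exact spiral; one plausible square-spiral enumeration starting at the position identified by the neighboring BV is sketched below.

```python
def spiral_positions(start_x, start_y, max_radius):
    """Yield search positions in a square spiral: the start point first,
    then the ring at distance 1, then distance 2, and so on."""
    yield start_x, start_y
    for r in range(1, max_radius + 1):
        for dx in range(-r, r + 1):          # top and bottom edges of the ring
            yield start_x + dx, start_y - r
            yield start_x + dx, start_y + r
        for dy in range(-r + 1, r):          # left and right edges (corners done)
            yield start_x - r, start_y + dy
            yield start_x + r, start_y + dy
```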
- the final predictor of IntraTMP may be a fusion of the best candidate in IntraTMP searching process and the predictor of the neighboring BV.
- BVs may be inherited to be used by other coding tools when multiple BVs are used to encode or decode the current block.
- in some embodiments, the BV with the best TM cost (e.g., SAD or SATD) is stored and inherited.
- the BV with smaller or smallest magnitude is stored and inherited.
- the magnitude of a BV may be defined as the sum of the absolute values of its horizontal and vertical components, the sum of the squared values of its components, the minimum value between the horizontal and vertical components, or the maximum value between the horizontal and vertical components.
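- The four magnitude definitions above can be written compactly as follows; absolute component values are assumed throughout, including for the min/max variants, since BV components may be negative.

```python
def bv_magnitude(bv, mode="abs_sum"):
    """Magnitude of a block vector under the four definitions above."""
    x, y = abs(bv[0]), abs(bv[1])
    return {"abs_sum": x + y,         # sum of absolute component values
            "sq_sum": x * x + y * y,  # sum of squared component values
            "min": min(x, y),         # minimum of the two components
            "max": max(x, y)}[mode]
```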
- two or more BVs may be stored and inherited together.
- Two or more BVs that are used together may be stored together in a storage as a “to-be-inserted” candidate.
- Such a storage may store multiple “to-be-inserted” candidates, each to-be-inserted candidate including two or more BVs that are to be stored and inherited together.
- FIG. 9 conceptually illustrates a buffer storage 900 storing multiple to-be-inserted candidates 910, 920, 930, etc.
- two or more BVs stored in the buffer 900 may be taken together as a to-be-inserted candidate.
- BVs 911 and 912 are used in IntraTMP prediction by fusing multiple predictors, and the two BVs 911 and 912 are stored and inherited together in one to-be-inserted candidate 910.
- the two BVs 911 and 912 stored in the buffer 900 may be taken together as one to-be-inserted candidate 910.
- when retrieving BVs from the buffer, the first BV of a to-be-inserted candidate may be taken first, followed by the second BV of the same candidate.
- for example, the first BV 911 in the to-be-inserted candidate 910 in the buffer 900 may be taken first, followed by the second BV 912 in the same candidate 910.
- the placing order of BVs in the buffer 900 may depend on the TM costs or the magnitudes of the BVs.
- the magnitude of a BV can be the sum of the absolute values of its horizontal and vertical components, the sum of the squared values of its components, the minimum value between the horizontal and vertical components, or the maximum value between the horizontal and vertical components.
- the first BVs (911, 921, 931, etc. ) of all to-be-inserted candidates (910, 920, 930, etc. ) in the buffer 900 may be taken first and followed by the second BVs (912, 922, 932, etc. ) of all to-be-inserted candidates.
- if multiple BVs are used in IntraTMP prediction by fusing multiple predictors, two or more BVs could be stored and inherited, and the inserting order can be different for different coding tools. For example, for IBC-AMVP mode, only the first BV of each to-be-inserted candidate may be considered; for IBC-Merge mode, the first BVs of all to-be-inserted candidates are taken first, followed by the second BVs of all to-be-inserted candidates; for IntraTMP, the BVs of one to-be-inserted candidate are taken together.
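- A sketch of these three retrieval orders is given below; the tool names are used as plain labels, and each to-be-inserted candidate is modeled as a tuple of BVs.

```python
def bvs_for_tool(buffer, tool):
    """Enumerate inherited BVs from a buffer of to-be-inserted candidates,
    e.g. buffer = [(bv911, bv912), (bv921, bv922), (bv931, bv932)]."""
    if tool == "IBC-AMVP":   # only the first BV of each candidate
        return [cand[0] for cand in buffer]
    if tool == "IBC-Merge":  # all first BVs, then all second BVs
        return ([cand[0] for cand in buffer] +
                [cand[1] for cand in buffer if len(cand) > 1])
    if tool == "IntraTMP":   # the BVs of each candidate taken together
        return [bv for cand in buffer for bv in cand]
    raise ValueError(tool)
```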
- the BV inheritance methods described in Section IV may be applied to the CUs coded with the fusion based IntraTMP using multiple BVs, the fusion based IBC using multiple BVs, and/or the GPM prediction using two BVs.
- the method mentioned above may also be applied to CUs coded by IBC by changing the IntraTMP BV to IBC BV.
- the methods described above may also be applied to CUs coded by GPM with BVs by changing the IntraTMP BV to GPM BV.
- any of the foregoing proposed methods can be implemented in encoders and/or decoders.
- any of the proposed methods can be implemented in inter/intra coding of an encoder, and/or a decoder.
- any of the proposed methods can be implemented as a circuit coupled to the inter/intra coding of the encoder and/or the decoder, so as to provide the information needed by the inter/intra coding.
- FIG. 10 illustrates an example video encoder 1000 that may implement intra block copy (IBC) or current picture referencing (CPR) .
- the video encoder 1000 receives input video signal from a video source 1005 and encodes the signal into bitstream 1095.
- the video encoder 1000 has several components or modules for encoding the signal from the video source 1005, at least including some components selected from a transform module 1010, a quantization module 1011, an inverse quantization module 1014, an inverse transform module 1015, an intra-picture estimation module 1020, an intra-prediction module 1025, a motion compensation module 1030, a motion estimation module 1035, an in-loop filter 1045, a reconstructed picture buffer 1050, a MV buffer 1065, a MV prediction module 1075, and an entropy encoder 1090.
- the motion compensation module 1030 and the motion estimation module 1035 are part of an inter-prediction module 1040.
- the modules 1010 –1090 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device or electronic apparatus. In some embodiments, the modules 1010 –1090 are modules of hardware circuits implemented by one or more integrated circuits (ICs) of an electronic apparatus. Though the modules 1010 –1090 are illustrated as being separate modules, some of the modules can be combined into a single module.
- the video source 1005 provides a raw video signal that presents pixel data of each video frame without compression.
- a subtractor 1008 computes the difference between the raw video pixel data of the video source 1005 and the predicted pixel data 1013 from the motion compensation module 1030 or intra-prediction module 1025 as prediction residual 1009.
- the transform module 1010 converts the difference (or the residual pixel data, or residual signal 1009) into transform coefficients (e.g., by performing Discrete Cosine Transform, or DCT).
- the quantization module 1011 quantizes the transform coefficients into quantized data (or quantized coefficients) 1012, which is encoded into the bitstream 1095 by the entropy encoder 1090.
- the inverse quantization module 1014 de-quantizes the quantized data (or quantized coefficients) 1012 to obtain transform coefficients, and the inverse transform module 1015 performs inverse transform on the transform coefficients to produce reconstructed residual 1019.
- the reconstructed residual 1019 is added with the predicted pixel data 1013 to produce reconstructed pixel data 1017.
- the reconstructed pixel data 1017 is temporarily stored in a line buffer (not illustrated) for intra-picture prediction and spatial MV prediction.
- the reconstructed pixels are filtered by the in-loop filter 1045 and stored in the reconstructed picture buffer 1050.
- the reconstructed picture buffer 1050 is a storage external to the video encoder 1000.
- the reconstructed picture buffer 1050 is a storage internal to the video encoder 1000.
- the intra-picture estimation module 1020 performs intra-prediction based on the reconstructed pixel data 1017 to produce intra prediction data.
- the intra-prediction data is provided to the entropy encoder 1090 to be encoded into bitstream 1095.
- the intra-prediction data is also used by the intra-prediction module 1025 to produce the predicted pixel data 1013.
- the motion estimation module 1035 performs inter-prediction by producing MVs to reference pixel data of previously decoded frames stored in the reconstructed picture buffer 1050. These MVs are provided to the motion compensation module 1030 to produce predicted pixel data.
- the video encoder 1000 uses MV prediction to generate predicted MVs, and the difference between the MVs used for motion compensation and the predicted MVs is encoded as residual motion data and stored in the bitstream 1095.
- the MV prediction module 1075 generates the predicted MVs based on reference MVs that were generated for encoding previous video frames, i.e., the motion compensation MVs that were used to perform motion compensation.
- the MV prediction module 1075 retrieves reference MVs from previous video frames from the MV buffer 1065.
- the video encoder 1000 stores the MVs generated for the current video frame in the MV buffer 1065 as reference MVs for generating predicted MVs.
- the MV prediction module 1075 uses the reference MVs to create the predicted MVs.
- the predicted MVs can be computed by spatial MV prediction or temporal MV prediction.
- the difference between the predicted MVs and the motion compensation MVs (MC MVs) of the current frame (residual motion data) are encoded into the bitstream 1095 by the entropy encoder 1090.
- the entropy encoder 1090 encodes various parameters and data into the bitstream 1095 by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding.
- the entropy encoder 1090 encodes various header elements, flags, along with the quantized transform coefficients 1012, and the residual motion data as syntax elements into the bitstream 1095.
- the bitstream 1095 is in turn stored in a storage device or transmitted to a decoder over a communications medium such as a network.
- the in-loop filter 1045 performs filtering or smoothing operations on the reconstructed pixel data 1017 to reduce the artifacts of coding, particularly at boundaries of pixel blocks.
- the filtering or smoothing operations performed by the in-loop filter 1045 include deblock filter (DBF) , sample adaptive offset (SAO) , and/or adaptive loop filter (ALF) .
- FIG. 11 illustrates portions of the video encoder 1000 that implement search and inheritance of multiple block vectors.
- the inter-prediction module 1040 and the intra-prediction module 1025 retrieve pixel samples from the reconstructed picture buffer 1050 to produce the predictor of the current block in the predicted pixel data 1013.
- the intra-prediction module 1025 performs prediction for regular intra modes, where template matching operations may be performed to select one or more intra-prediction directions.
- the inter-prediction module 1040 performs motion estimation and compensation for inter prediction modes (e.g., merge modes) .
- the inter-prediction module 1040 also performs prediction based on samples in the current picture as reference, such as in IBC or CPR or intraTMP mode.
- the inter-prediction module 1040 includes a search engine 1110 that searches the current picture for a template area (reference template) having the lowest matching cost with the template area neighboring the current block.
- the search engine 1110 receives a search region selection indication (e.g., from the entropy encoder 1090) that specifies one or more search regions in the current picture in which the search is to be conducted.
- the indication is based on indices assigned to different search regions that are ordered based on their distances from an inherited neighboring block vector.
- the search engine 1110 may produce one or multiple block vectors that are used to produce the predictor for the current block.
- the used block vector(s) are stored in a reference storage 1140 so they can be inherited and used by subsequent coding operations of the current block or a subsequent block (e.g., as a neighboring block vector).
- the multiple block vectors are used to retrieve samples from multiple reference blocks. These samples may be fused or blended (as a weighted sum) to generate a predictor for the current block.
- FIG. 12 conceptually illustrates an encoding process 1200 for implementing inheritance of multiple block vectors.
- in some embodiments, one or more processing units (e.g., a processor) of a computing device implementing the encoder 1000 perform the process 1200 by executing instructions stored in a computer readable medium.
- an electronic apparatus implementing the encoder 1000 performs the process 1200.
- the encoder receives (at block 1210) data to be encoded as a current block of pixels in a current picture.
- the encoder signals (at block 1220) an index identifying a search region from a plurality of search regions in the current picture.
- the plurality of search regions may be non-overlapping regions of the current picture, or different regions of the current picture that may overlap.
- the signaled index identifies two or more search regions in the plurality of search regions.
- the different search regions are assigned corresponding indices that are ordered according to the distances of the search regions from a neighboring block vector.
- a neighboring block vector is a block vector that was used to code a neighboring block of the current block.
- the encoder may signal a syntax element to indicate whether an index is used to select a search region from a plurality of search regions. In some embodiments, the encoder may signal a syntax element to indicate a size of a search range that includes the plurality of search regions.
- the encoder derives (at block 1230) a block vector by computing matching costs at searching positions within the selected or identified search region.
- the matching cost of a search position is a difference between a reference template at the position and a current template neighboring the current block.
- the matching cost of a search position is further based on horizontal and vertical differences between a neighboring block vector and the search position.
- the search positions are identified based on the neighboring block vector.
- the encoder uses (at block 1240) first and second block vectors to identify first and second reference blocks respectively in the current picture for the current block.
- the first and second block vectors are derived by searching positions within the identified search region.
- the encoder generates (at block 1250) a predictor based on the first and second reference blocks identified by first and second block vectors in the current picture.
- the predictor may be a fusion of first and second reference blocks identified by the first and second block vectors, and the fusion of the first and second reference blocks is weighted based on the matching costs of the first and second block vectors.
- the encoder encodes (at block 1260) the current block by using the generated predictor to produce prediction residuals.
- the encoder stores (at block 1270) the first and second block vectors in a storage for subsequent encoding operations of the current block or a subsequent block.
- the first block vector may be retrieved from the storage before the second block vector when the first block vector has a matching cost lower than (or equal to) that of the second block vector, or when the first block vector has a smaller magnitude than that of the second block vector.
- the storage may store two or more to-be-inserted candidates. For example, a first to-be-inserted candidate may store the first and second block vectors, while a second to-be-inserted candidate may store third and fourth block vectors.
- a coding tool may subsequently retrieve and use the first block vector of the first to-be-inserted candidate and the third block vector of the second to-be-inserted candidate from the storage.
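- Tying the steps of process 1200 together, a high-level encoder-side sketch might look as follows. The search and fusion helpers are assumed to behave like the earlier sketches (a top-2 search returning (BV, cost) pairs, and a cost-weighted fusion); none of the names are normative.

```python
import numpy as np

def encode_block_multi_bv(source, recon, x, y, bw, bh,
                          regions, region_idx, search2_fn, fuse_fn, bv_store):
    """Sketch of process 1200.  `search2_fn(recon, x, y, bw, bh, region)`
    returns the two best (bv, cost) pairs in the identified search region;
    `fuse_fn(preds, costs)` blends the two reference blocks."""
    region = regions[region_idx]                              # index signaled (block 1220)
    (bv1, c1), (bv2, c2) = search2_fn(recon, x, y, bw, bh, region)  # blocks 1230/1240
    p1 = recon[y + bv1[1]:y + bv1[1] + bh, x + bv1[0]:x + bv1[0] + bw]
    p2 = recon[y + bv2[1]:y + bv2[1] + bh, x + bv2[0]:x + bv2[0] + bw]
    predictor = fuse_fn([p1, p2], [c1, c2])                   # block 1250
    residual = source[y:y + bh, x:x + bw].astype(np.int32) - predictor  # block 1260
    bv_store.append((bv1, bv2))                               # block 1270: store both BVs
    return residual
```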
- an encoder may signal (or generate) one or more syntax elements in a bitstream, such that a decoder may parse said one or more syntax elements from the bitstream.
- FIG. 13 illustrates an example video decoder 1300 that may implement intra block copy (IBC) or current picture referencing (CPR) .
- the video decoder 1300 is an image-decoding or video-decoding circuit that receives a bitstream 1395 and decodes the content of the bitstream into pixel data of video frames for display.
- the video decoder 1300 has several components or modules for decoding the bitstream 1395, including some components selected from an inverse quantization module 1311, an inverse transform module 1310, an intra-prediction module 1325, a motion compensation module 1330, an in-loop filter 1345, a decoded picture buffer 1350, a MV buffer 1365, a MV prediction module 1375, and a parser 1390.
- the motion compensation module 1330 is part of an inter-prediction module 1340.
- the modules 1310 –1390 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device.
- the modules 1310 –1390 are modules of hardware circuits implemented by one or more ICs of an electronic apparatus. Though the modules 1310 –1390 are illustrated as being separate modules, some of the modules can be combined into a single module.
- the parser 1390 receives the bitstream 1395 and performs initial parsing according to the syntax defined by a video-coding or image-coding standard.
- the parsed syntax elements include various header elements, flags, as well as quantized data (or quantized coefficients) 1312.
- the parser 1390 parses out the various syntax elements by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding.
- the inverse quantization module 1311 de-quantizes the quantized data (or quantized coefficients) 1312 to obtain transform coefficients, and the inverse transform module 1310 performs inverse transform on the transform coefficients 1316 to produce reconstructed residual signal 1319.
- the reconstructed residual signal 1319 is added with predicted pixel data 1313 from the intra- prediction module 1325 or the motion compensation module 1330 to produce decoded pixel data 1317.
- the decoded pixel data is filtered by the in-loop filter 1345 and stored in the decoded picture buffer 1350.
- the decoded picture buffer 1350 is a storage external to the video decoder 1300.
- the decoded picture buffer 1350 is a storage internal to the video decoder 1300.
- the intra-prediction module 1325 receives intra-prediction data from bitstream 1395 and according to which, produces the predicted pixel data 1313 from the decoded pixel data 1317 stored in the decoded picture buffer 1350.
- the decoded pixel data 1317 is also stored in a line buffer (not illustrated) for intra-picture prediction and spatial MV prediction.
- the content of the decoded picture buffer 1350 is used for display.
- a display device 1355 either retrieves the content of the decoded picture buffer 1350 for display directly, or retrieves the content of the decoded picture buffer to a display buffer.
- the display device receives pixel values from the decoded picture buffer 1350 through a pixel transport.
- the motion compensation module 1330 produces predicted pixel data 1313 from the decoded pixel data 1317 stored in the decoded picture buffer 1350 according to motion compensation MVs (MC MVs) . These motion compensation MVs are decoded by adding the residual motion data received from the bitstream 1395 with predicted MVs received from the MV prediction module 1375.
- the MV prediction module 1375 generates the predicted MVs based on reference MVs that were generated for decoding previous video frames, e.g., the motion compensation MVs that were used to perform motion compensation.
- the MV prediction module 1375 retrieves the reference MVs of previous video frames from the MV buffer 1365.
- the video decoder 1300 stores the motion compensation MVs generated for decoding the current video frame in the MV buffer 1365 as reference MVs for producing predicted MVs.
- the in-loop filter 1345 performs filtering or smoothing operations on the decoded pixel data 1317 to reduce the artifacts of coding, particularly at boundaries of pixel blocks.
- the filtering or smoothing operations performed by the in-loop filter 1345 include deblock filter (DBF) , sample adaptive offset (SAO) , and/or adaptive loop filter (ALF) .
- FIG. 14 illustrates portions of the video decoder 1300 that implement search and inheritance of multiple block vectors.
- the inter-prediction module 1340 and the intra-prediction module 1325 retrieve pixel samples from the decoded picture buffer 1350 to produce the predictor of the current block in the predicted pixel data 1313.
- the intra-prediction module 1325 performs prediction for regular intra modes, where template matching operations may be performed to select one or more intra-prediction directions.
- the inter-prediction module 1340 performs motion estimation and compensation for inter prediction modes (e.g., merge modes) .
- the inter-prediction module 1340 also performs prediction based on samples in the current picture as reference, such as in IBC or CPR or intraTMP mode.
- the inter-prediction module 1340 includes a search engine 1410 that searches the current picture for a template area (reference template) having the lowest matching cost with the template area neighboring the current block.
- the search engine 1410 receives a search region selection indication (e.g., from the entropy decoder 1390) that specifies one or more search regions in the current picture in which the search is to be conducted.
- the indication is based on indices assigned to different search regions that are ordered based on their distances from an inherited neighboring block vector.
- the search engine 1410 may produce one or multiple block vectors that are used to produce the predictor for the current block.
- the used block vector(s) are stored in a reference storage 1440 so they can be inherited and used by subsequent coding operations of the current block or a subsequent block (e.g., as a neighboring block vector).
- the multiple block vectors are used to retrieve samples from multiple reference blocks. These samples may be fused or blended (as a weighted sum) to generate a predictor for the current block.
- FIG. 15 conceptually illustrates a process 1500 for implementing inheritance of multiple block vectors.
- in some embodiments, one or more processing units (e.g., a processor) of a computing device implementing the decoder 1300 perform the process 1500 by executing instructions stored in a computer readable medium.
- an electronic apparatus implementing the decoder 1300 performs the process 1500.
- the decoder receives (at block 1510) data to be decoded as a current block of pixels in a current picture.
- the decoder receives (at block 1520) an index identifying a search region from a plurality of search regions in the current picture.
- the plurality of search regions may be non-overlapping regions of the current picture, or different regions of the current picture that may overlap.
- the signaled index identifies two or more search regions in the plurality of search regions.
- the different search regions are assigned corresponding indices that are ordered according to the distances of the search regions from a neighboring block vector.
- a neighboring block vector is a block vector that was used to code a neighboring block of the current block.
- the decoder may receive a syntax element indicating whether an index is used to select a search region from a plurality of search regions. In some embodiments, the decoder may receive a syntax element indicating a size of a search range that includes the plurality of search regions.
- the decoder derives (at block 1530) a block vector by computing matching costs at searching positions within the selected or identified search region.
- the matching cost of a search position is a difference between a reference template at the position and a current template neighboring the current block.
- the matching cost of a search position is further based on horizontal and vertical differences between a neighboring block vector and the search position.
- the search positions are identified based on the neighboring block vector.
- the decoder uses (at block 1540) first and second block vectors to identify first and second reference blocks respectively in the current picture for the current block.
- the first and second block vectors are derived by searching positions within the identified search region.
- the decoder generates (at block 1550) a predictor based on the first and second reference blocks identified by first and second block vectors in the current picture.
- the predictor may be a fusion of first and second reference blocks identified by the first and second block vectors, and the fusion of the first and second reference blocks is weighted based on the matching costs of the first and second block vectors.
- the decoder reconstructs (at block 1560) the current block by using the generated predictor.
- the decoder may then provide the reconstructed current block for display as part of the reconstructed current picture.
- the decoder stores (at block 1570) the first and second block vectors in a storage for subsequent decoding operations of the current block or a subsequent block.
- the first block vector may be retrieved from the storage before the second block vector when the first block vector has a matching cost lower than (or equal to) that of the second block vector, or when the first block vector has a smaller magnitude than that of the second block vector.
- the storage may store two or more to-be-inserted candidates. For example, a first to-be-inserted candidate may store the first and second block vectors, while a second to-be-inserted candidate may store third and fourth block vectors.
- a coding tool may subsequently retrieve and use the first block vector of the first to-be-inserted candidate and the third block vector of the second to-be-inserted candidate from the storage.
- many of the features described above are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium). When these instructions are executed by one or more computational or processing units (e.g., one or more processors, cores of processors, or other processing units), they cause the processing units to perform the actions indicated in the instructions.
- Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, random-access memory (RAM) chips, hard drives, erasable programmable read only memories (EPROMs) , electrically erasable programmable read-only memories (EEPROMs) , etc.
- the computer readable media does not include carrier waves and electronic signals passing wirelessly or over wired connections.
- the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage which can be read into memory for processing by a processor.
- multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions.
- multiple software inventions can also be implemented as separate programs.
- any combination of separate programs that together implement a software invention described here is within the scope of the present disclosure.
- the software programs when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.
- FIG. 16 conceptually illustrates an electronic system 1600 with which some embodiments of the present disclosure are implemented.
- the electronic system 1600 may be a computer (e.g., a desktop computer, personal computer, tablet computer, etc. ) , phone, PDA, or any other sort of electronic device.
- Such an electronic system includes various types of computer readable media and interfaces for various other types of computer readable media.
- Electronic system 1600 includes a bus 1605, processing unit (s) 1610, a graphics-processing unit (GPU) 1615, a system memory 1620, a network 1625, a read-only memory 1630, a permanent storage device 1635, input devices 1640, and output devices 1645.
- the bus 1605 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 1600.
- the bus 1605 communicatively connects the processing unit (s) 1610 with the GPU 1615, the read-only memory 1630, the system memory 1620, and the permanent storage device 1635.
- the processing unit (s) 1610 retrieves instructions to execute and data to process in order to execute the processes of the present disclosure.
- the processing unit (s) may be a single processor or a multi-core processor in different embodiments. Some instructions are passed to and executed by the GPU 1615.
- the GPU 1615 can offload various computations or complement the image processing provided by the processing unit (s) 1610.
- the read-only-memory (ROM) 1630 stores static data and instructions that are used by the processing unit (s) 1610 and other modules of the electronic system.
- the permanent storage device 1635 is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 1600 is off. Some embodiments of the present disclosure use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1635.
- the system memory 1620 is a read-and-write memory device. However, unlike storage device 1635, the system memory 1620 is a volatile read-and-write memory, such as a random-access memory.
- the system memory 1620 stores some of the instructions and data that the processor uses at runtime.
- processes in accordance with the present disclosure are stored in the system memory 1620, the permanent storage device 1635, and/or the read-only memory 1630.
- the various memory units include instructions for processing multimedia clips in accordance with some embodiments. From these various memory units, the processing unit (s) 1610 retrieves instructions to execute and data to process in order to execute the processes of some embodiments.
- the bus 1605 also connects to the input and output devices 1640 and 1645.
- the input devices 1640 enable the user to communicate information and select commands to the electronic system.
- the input devices 1640 include alphanumeric keyboards and pointing devices (also called “cursor control devices” ) , cameras (e.g., webcams) , microphones or similar devices for receiving voice commands, etc.
- the output devices 1645 display images generated by the electronic system or otherwise output data.
- the output devices 1645 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD) , as well as speakers or similar audio output devices. Some embodiments include devices such as a touchscreen that function as both input and output devices.
- CTR cathode ray tubes
- LCD liquid crystal displays
- bus 1605 also couples electronic system 1600 to a network 1625 through a network adapter (not shown) .
- the computer can be a part of a network of computers (such as a local area network ( “LAN” ) , a wide area network ( “WAN” ) , or an Intranet, or a network of networks, such as the Internet. Any or all components of electronic system 1600 may be used in conjunction with the present disclosure.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
A method for coding a block of pixels by referencing the current picture is provided. The video coder may signal or receive an index identifying a search region from multiple search regions in the current picture. The video coder derives one or more block vectors by computing matching costs at searching positions within the identified search region. The video coder uses first and second block vectors to identify first and second reference blocks, respectively, in the current picture for a current block. The video coder generates a predictor based on the first and second reference blocks identified by the first and second block vectors in the current picture. The video coder encodes or decodes the current block by using the generated predictor. The video coder stores the first and second block vectors in a storage for subsequent encoding operations of the current block or a subsequent block.
Description
CROSS REFERENCE TO RELATED PATENT APPLICATION (S)
The present disclosure is part of a non-provisional application that claims the priority benefit of U.S. Provisional Patent Application Nos. 63/480,550 and 63/488,984, filed on 19 January 2023 and 8 March 2023, respectively. Contents of the above-listed applications are herein incorporated by reference.
The present disclosure relates generally to video coding. In particular, the present disclosure relates to methods of coding pixel blocks by intra template matching prediction (IntraTMP) .
Unless otherwise indicated herein, approaches described in this section are not prior art to the claims listed below and are not admitted as prior art by inclusion in this section.
High-Efficiency Video Coding (HEVC) is an international video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC) . HEVC is based on the hybrid block-based motion-compensated DCT-like transform coding architecture. The basic unit for compression, termed coding unit (CU) , is a 2Nx2N square block of pixels, and each CU can be recursively split into four smaller CUs until the predefined minimum size is reached. Each CU contains one or multiple prediction units (PUs) .
Versatile video coding (VVC) is the latest international video coding standard developed by the Joint Video Expert Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11. The input video signal is predicted from the reconstructed signal, which is derived from the coded picture regions. The prediction residual signal is processed by a block transform. The transform coefficients are quantized and entropy coded together with other side information in the bitstream. The reconstructed signal is generated from the prediction signal and the reconstructed residual signal after inverse transform on the de-quantized transform coefficients. The reconstructed signal is further processed by in-loop filtering for removing coding artifacts. The decoded pictures are stored in the frame buffer for predicting the future pictures in the input video signal.
In VVC, a coded picture is partitioned into non-overlapped square block regions represented by the associated coding tree units (CTUs) . The leaf nodes of a coding tree correspond to the coding units (CUs) . A coded picture can be represented by a collection of slices, each comprising an integer number of CTUs. The individual CTUs in a slice are processed in raster-scan order. A bi-predictive (B) slice may be decoded using intra prediction or inter prediction with at most two motion vectors and reference indices to predict the sample values of each block. A predictive (P) slice is decoded using intra prediction or inter prediction with at most one motion vector and reference index to predict the sample values of each block. An intra (I) slice is decoded using intra prediction only.
A CTU can be partitioned into one or multiple non-overlapped coding units (CUs) using the quadtree (QT) with nested multi-type-tree (MTT) structure to adapt to various local motion and texture characteristics. A CU can be further split into smaller CUs using one of the five split types: quad-tree partitioning, vertical binary tree partitioning, horizontal binary tree partitioning, vertical center-side triple-tree partitioning, horizontal center-side triple-tree partitioning.
Each CU contains one or more prediction units (PUs) . The prediction unit, together with the associated CU syntax, works as a basic unit for signaling the predictor information. The specified prediction process is employed to predict the values of the associated pixel samples inside the PU. Each CU may contain one or more transform units (TUs) for representing the prediction residual blocks. A transform unit (TU) comprises a transform block (TB) of luma samples and two corresponding transform blocks of chroma samples, and each TB corresponds to one residual block of samples from one color component. An integer transform is applied to a transform block. The level values of quantized coefficients together with other side information are entropy coded in the bitstream. The terms coding tree block (CTB) , coding block (CB) , prediction block (PB) , and transform block (TB) are defined to specify the 2-D sample array of one color component associated with the CTU, CU, PU, and TU, respectively. Thus, a CTU consists of one luma CTB, two chroma CTBs, and associated syntax elements. A similar relationship is valid for the CU, PU, and TU.
For each inter-predicted CU, motion parameters consisting of motion vectors, reference picture indices and reference picture list usage index, and additional information are used for inter-predicted sample generation. The motion parameters can be signalled in an explicit or implicit manner. When a CU is coded with skip mode, the CU is associated with one PU and has no significant residual coefficients, no coded motion vector delta, and no reference picture index. A merge mode is specified whereby the motion parameters for the current CU are obtained from neighbouring CUs, including spatial and temporal candidates, as well as additional candidate types introduced in VVC. The merge mode can be applied to any inter-predicted CU. The alternative to merge mode is the explicit transmission of motion parameters, where the motion vector, the corresponding reference picture index for each reference picture list, the reference picture list usage flag, and other needed information are signalled explicitly for each CU.
The following summary is illustrative only and is not intended to be limiting in any way. That is, the following summary is provided to introduce concepts, highlights, benefits and advantages of the novel and non-obvious techniques described herein. Selected, but not all, implementations are further described below in the detailed description. Thus, the following summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining the scope of the claimed subject matter.
Some embodiments of the disclosure provide a method for coding a block of pixels by referencing the current picture. The video coder may signal or receive an index identifying a search region from multiple search regions in the current picture. The video coder derives one or more block vectors by computing matching costs at searching positions within the identified search region. The video coder uses first and second block vectors to identify first and second reference blocks, respectively, in the current picture for a current block. The video coder generates a predictor based on the first and second reference blocks identified by the first and second block vectors in the current picture. The video coder encodes or decodes the current block by using the generated predictor. The video coder stores the first and second block vectors in a storage for subsequent encoding operations of the current block or a subsequent block.
The plurality of search regions may be non-overlapping regions of the current picture, or different regions of the current picture that may overlap. In some embodiments, the signaled index identifies two or more search regions in the plurality of search regions. In some embodiments, the different search regions are assigned corresponding indices that are ordered according to the distances of the search regions from a neighboring block vector. (A neighboring block vector is a block vector that was used to code a neighboring block of the current block. ) In some embodiments, the video coder may signal a syntax element to indicate whether an index is used to select a search region from a plurality of search regions. In some embodiments, the video coder may signal a syntax element to indicate a size of a search range that includes the plurality of search regions.
In some embodiments, the matching cost of a search position is further based on horizontal and vertical differences between a neighboring block vector and the search position. In some embodiments, the search positions are identified based on the neighboring block vector. In some embodiments, the starting position (s) of search are identified based on the neighboring block vector. The predictor may be a fusion of the first and second reference blocks identified by the first and second block vectors, and the fusion of the first and second reference blocks is weighted based on the matching costs of the first and second block vectors.
In some embodiments, the first block vector may be retrieved from the storage before the second block vector, when the first block vector has a matching cost lower than (or equal to) that of the second block vector, or when the first block vector has a smaller magnitude than that of the second block vector. In some embodiments, the storage may store two or more to-be-inserted candidates. For example, a first to-be-inserted candidate may store the first and second block vectors, while a second to-be-inserted candidate may store third and fourth block vectors. A coding tool may subsequently retrieve and use the first block vector of the first to-be-inserted candidate and the third block vector of the second to-be-inserted candidate from the storage.
The accompanying drawings are included to provide a further understanding of the present disclosure, and are incorporated in and constitute a part of the present disclosure. The drawings illustrate implementations of the present disclosure and, together with the description, serve to explain the principles of the present disclosure. It is appreciable that the drawings are not necessarily to scale, as some components may be shown out of proportion to their actual size in order to clearly illustrate the concept of the present disclosure.
FIG. 1 conceptually illustrates current picture referencing (CPR) .
FIG. 2 conceptually illustrates search regions for intra template matching.
FIG. 3 shows the use of a block vector of a neighboring block for a current block.
FIG. 4 illustrates multiple search regions for searching the best Intra Template Matching Prediction (IntraTMP) predictor.
FIG. 5 illustrates different indices being assigned to multiple different search regions of the current block for IntraTMP mode.
FIG. 6 illustrates fusing multiple candidate predictors into one IntraTMP predictor for the current block.
FIG. 7 illustrates search regions that are assigned indices reordered according to distances from a neighboring block vector (BV) .
FIG. 8 illustrates a searching start point that is determined by a neighboring BV.
FIG. 9 conceptually illustrates a buffer storage storing multiple to-be-inserted candidates.
FIG. 10 illustrates an example video encoder that may implement intra block copy (IBC) or current picture referencing (CPR) .
FIG. 11 illustrates portions of the video encoder that implement search and inheritance of multiple block vectors.
FIG. 12 conceptually illustrates an encoding process for implementing inheritance of multiple block vectors.
FIG. 13 illustrates an example video decoder that may implement intra block copy (IBC) or current picture referencing (CPR) .
FIG. 14 illustrates portions of the video decoder that implement search and inheritance of multiple block vectors.
FIG. 15 conceptually illustrates a decoding process for implementing inheritance of multiple block vectors.
FIG. 16 conceptually illustrates an electronic system with which some embodiments of the present disclosure are implemented.
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. Any variations, derivatives and/or extensions based on teachings described herein are within the protective scope of the present disclosure. In some instances, well-known methods, procedures, components, and/or circuitry pertaining to one or more example implementations disclosed herein may be described at a relatively high level without detail, in order to avoid unnecessarily obscuring aspects of teachings of the present disclosure.
I. Current Picture Referencing
Motion compensation is a video coding process that exploits the pixel correlation between adjacent pictures. It is generally assumed that, in a video sequence, patterns corresponding to objects or background in a frame are displaced to form corresponding patterns in a subsequent frame, or are correlated with other patterns within the current frame. With an estimate of such displacement (e.g., obtained using block matching techniques) , a pattern can be mostly reproduced without needing to be re-coded. Block matching and copying allows the reference block to be selected from within the same picture, but it is observed to be less efficient when applied to camera-captured videos. Part of the reason is that the textural pattern in a spatially neighboring area may be similar to the current coding block but usually exhibits gradual changes over space. It is therefore less likely for a block to find a good match within the same picture of a camera-captured video, thereby limiting the improvement in coding performance.
However, the spatial correlation among pixels within the same picture is different for screen content. For a typical video with text and graphics, there are usually repetitive patterns within the same picture. Hence, intra (picture) block compensation has been observed to be very effective. The prediction mode intra block copy (IBC) mode or current picture referencing (CPR) may therefore be used for screen content coding.
FIG. 1 conceptually illustrates current picture referencing (CPR) . As illustrated, a prediction unit (PU) as a current block 110 is predicted from a previously reconstructed block 130 within the same picture 100. A displacement vector 120 (called a block vector or BV) is used to signal the relative displacement from the position of the current block to that of the reference block, which provides the reference samples used for generating a predictor of the current block. The prediction errors are then coded using transform, quantization and entropy coding. The reference samples may correspond to the reconstructed samples of the current decoded picture prior to in-loop filter operations, i.e., before both the deblocking and sample adaptive offset (SAO) filters are applied.
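As a rough illustration of the copy operation just described, the following C++ sketch forms a CPR/IBC predictor by copying reconstructed samples displaced by a block vector. The Frame type and the function name cprPredict are assumptions made for illustration, and clipping of the reference position to the valid reconstructed area is omitted; this is a minimal sketch, not the codec's actual implementation.

```cpp
#include <cstdint>
#include <vector>

// Reconstructed picture samples (before in-loop filtering), row-major.
struct Frame {
    int width, height;
    std::vector<uint8_t> luma;
    int at(int x, int y) const { return luma[y * width + x]; }
};

// Form the predictor of a blkW x blkH block at (blkX, blkY) by copying the
// reconstructed block displaced by the block vector (bvX, bvY).
std::vector<uint8_t> cprPredict(const Frame& f, int blkX, int blkY,
                                int blkW, int blkH, int bvX, int bvY) {
    std::vector<uint8_t> pred(blkW * blkH);
    for (int y = 0; y < blkH; ++y)
        for (int x = 0; x < blkW; ++x)
            pred[y * blkW + x] =
                static_cast<uint8_t>(f.at(blkX + bvX + x, blkY + bvY + y));
    return pred;
}
```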
II. Intra Template Matching Prediction
Intra template matching prediction (IntraTMP) is a special intra prediction mode that copies, from the reconstructed part of the current frame, the prediction block whose L-shaped template best matches the template (current template) neighboring the block currently being coded (current block) . For a predefined search range, the encoder searches for the template most similar to the current template in a reconstructed part of the current frame and uses the corresponding block as a prediction block. The encoder then signals the usage of this mode, and the same prediction operation is performed at the decoder side.
FIG. 2 conceptually illustrates search regions for intra template matching. As illustrated, to encode a current block 210 in a current picture 200, the search for a matching L-shape can be conducted in several predefined search regions: R1, which is the current CTU; R2, which is the top-left CTU; R3 which is the above CTU; and R4, which is the left CTU.
Sum of absolute differences (SAD) is used as a cost function, such that within each search region, the decoder searches for the reference template 230 that has the least SAD with respect to a current template 220 neighboring the current block 210. The corresponding block 235 of the reference template 230 is used as a prediction block or reference block. The dimensions of all search regions (SearchRange_w, SearchRange_h) are set proportional to the block dimensions (BlkW, BlkH) to have a fixed number of SAD comparisons per pixel. That is, SearchRange_w = a × BlkW and SearchRange_h = a × BlkH, where ‘a’ is a constant that controls the gain/complexity trade-off. In some embodiments, ‘a’ is set to 5.
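The template cost and the search-range sizing above can be sketched as follows, reusing the Frame type from the previous sketch. The one-sample-thick L-shaped template and the absence of boundary checks are simplifications for illustration; real templates are typically several samples thick.

```cpp
#include <cstdlib>

// SAD between the L-shaped template of the current block at (curX, curY)
// and the template of a candidate reference block at (refX, refY).
int tmSAD(const Frame& f, int curX, int curY, int refX, int refY,
          int blkW, int blkH) {
    int sad = 0;
    for (int x = 0; x < blkW; ++x)   // row of samples above each block
        sad += std::abs(f.at(curX + x, curY - 1) - f.at(refX + x, refY - 1));
    for (int y = 0; y < blkH; ++y)   // column of samples left of each block
        sad += std::abs(f.at(curX - 1, curY + y) - f.at(refX - 1, refY + y));
    return sad;
}

// Search-range dimensions proportional to the block size, with a = 5.
void searchRange(int blkW, int blkH, int& rangeW, int& rangeH) {
    const int a = 5;                 // gain/complexity trade-off constant
    rangeW = a * blkW;
    rangeH = a * blkH;
}
```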
In some embodiments, to speed up the template matching process, the search range of all search regions is subsampled by a factor of 2. This reduces the number of template matching comparisons by a factor of 4. After finding the best match, a refinement process is performed. The refinement is done via a second template matching search around the best match with a reduced range. The reduced range is defined as min (BlkW, BlkH) /2.
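A minimal sketch of this two-stage search is given below, assuming a cost callback that wraps the template SAD computation. Clamping of the refinement window to the picture boundary is omitted, and all names are illustrative.

```cpp
#include <algorithm>
#include <climits>
#include <functional>

struct BestPos { int x = 0, y = 0, cost = INT_MAX; };

BestPos coarseThenRefine(int x0, int y0, int rangeW, int rangeH,
                         int blkW, int blkH,
                         const std::function<int(int, int)>& cost) {
    BestPos best;
    // Stage 1: subsample both axes by 2, cutting comparisons by a factor of 4.
    for (int y = y0; y < y0 + rangeH; y += 2)
        for (int x = x0; x < x0 + rangeW; x += 2)
            if (int c = cost(x, y); c < best.cost) best = {x, y, c};
    // Stage 2: full-density refinement around the coarse best match,
    // within the reduced range min(BlkW, BlkH) / 2.
    const int r = std::min(blkW, blkH) / 2;
    BestPos refined = best;
    for (int y = best.y - r; y <= best.y + r; ++y)
        for (int x = best.x - r; x <= best.x + r; ++x)
            if (int c = cost(x, y); c < refined.cost) refined = {x, y, c};
    return refined;
}
```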
In some embodiments, intra template matching mode is enabled for CUs with size less than or equal to 64 in width and height. This maximum CU size for Intra template matching is
configurable. The Intra template matching prediction mode is signaled at CU level through a dedicated flag when decoder-side intra mode derivation (DIMD) is not used for current CU.
In some embodiments, the block vector (BV) derived from intra template matching prediction (IntraTMP) is used for the intra block copy (IBC) prediction mode. In some embodiments, IntraTMP block vectors are added to the IBC block vector candidate list as spatial candidates. An IntraTMP block vector may be stored in an IBC block vector buffer, and the current IBC block can use both its own IBC BV and the IntraTMP BVs of neighboring blocks as BV candidates for the IBC BV candidate list. The stored IntraTMP BVs of the neighboring blocks, along with IBC BVs, may be used as spatial BV candidates in IBC candidate list construction.
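The candidate list construction described above might look like the following sketch, where each neighbor contributes its IBC BV and/or its stored IntraTMP BV as a spatial candidate. The types, the duplicate pruning, and the visiting order are assumptions for illustration.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

struct BV { int x, y; };
inline bool operator==(const BV& a, const BV& b) { return a.x == b.x && a.y == b.y; }

struct NeighborInfo {
    std::optional<BV> ibcBv;       // set if the neighbor was IBC-coded
    std::optional<BV> intraTmpBv;  // set if the neighbor was IntraTMP-coded
};

void addSpatialCandidates(const std::vector<NeighborInfo>& neighbors,
                          std::vector<BV>& candList, size_t maxCands) {
    auto push = [&](const BV& bv) {
        if (candList.size() < maxCands &&
            std::find(candList.begin(), candList.end(), bv) == candList.end())
            candList.push_back(bv);             // prune exact duplicates
    };
    for (const auto& n : neighbors) {
        if (n.ibcBv) push(*n.ibcBv);            // regular IBC spatial BV
        if (n.intraTmpBv) push(*n.intraTmpBv);  // inherited IntraTMP BV
    }
}
```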
FIG. 3 shows the use of a block vector of a neighboring block for a current block. In the example, a current block 310 in a current picture 300 is coded by IBC, while a neighboring block 320 of the current block is coded by IntraTMP. The IntraTMP coded neighboring block 320 has a BV 325 that is derived by template matching to reference a reference block 330. The BV 325 can be inherited by the current block 310 as a neighboring BV to be part of its IBC candidate list.
III. Multiple Search Regions for Block Vector
Some embodiments of the disclosure provide methods to improve the performance of prediction by current picture referencing (e.g., IntraTMP) by signaling additional syntax or utilizing the neighboring information. In some embodiments, IntraTMP uses multiple search regions (or ranges) for searching the best IntraTMP predictor. (These search regions or ranges may be partitions of a larger pre-defined area. ) These search regions may be non-overlapping, or some of the search regions may overlap. FIG. 4 illustrates multiple search regions for searching the best IntraTMP predictor. In the figure, the current block 410 is coded by IntraTMP by searching for a best match for the current block’s neighboring L-shaped template 415, in multiple search regions 421-426 (labeled as the 1st search regions through the 6th search region) of the current picture 400.
In some embodiments, an index may be signaled to indicate which search region is selected and used for searching the best IntraTMP predictor. FIG. 5 illustrates different indices being assigned to multiple different search regions of the current block for IntraTMP mode. In the figure, search regions 421-424 are assigned indices 0, 1, 2, 3, respectively.
In some embodiments, the multiple search regions may be defined according to the distance from the current block. In some embodiments, the multiple search regions may be defined according to spatial direction. In some embodiments, two or more search regions are selected by a signaled index, and the best IntraTMP predictor is searched within these selected search regions. In some embodiments, a high-level syntax may be signaled in SPS, PPS, PH or SH to indicate whether the IntraTMP search region selection index is allowed or used for the current sequence, picture, or slice.
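The following sketch shows one plausible way an index parsed from the bitstream could select among predefined search regions. The particular region layout is an assumption; as noted above, the disclosure leaves the exact definition (by distance or by spatial direction) open.

```cpp
#include <vector>

struct Rect { int x, y, w, h; };

// Partition a predefined search area into fixed regions relative to the
// current block, then pick the one named by the signaled index.
// No bounds checking is shown; a real coder would clip to the picture.
Rect selectSearchRegion(int blkX, int blkY, int blkW, int blkH, int index) {
    const int a = 5;                                  // range multiplier
    const int w = a * blkW, h = a * blkH;
    const std::vector<Rect> regions = {
        {blkX - w,     blkY,     w, h},               // 0: left of the block
        {blkX,         blkY - h, w, h},               // 1: above the block
        {blkX - w,     blkY - h, w, h},               // 2: above-left
        {blkX - 2 * w, blkY,     w, h},               // 3: further left
    };
    return regions[index];                            // index from bitstream
}
```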
In some embodiments, a fusion method may be applied to fuse multiple candidate predictors into one predictor. Specifically, two or more candidate predictors may be found based on their template matching costs in a selected search region, and then the two or more candidate predictors may be fused into one IntraTMP predictor. FIG. 6 illustrates fusing multiple candidate predictors into one IntraTMP predictor for the current block. As illustrated, for the current block 410 in the current picture 400, two candidate predictors 621 and 622 are identified in the search region 422 by template matching. The two candidate predictors are fused/combined into a fused predictor 630 for coding the current block 410. The fusion of the two predictors 621 and 622 may be weighted, and the fusion weights may be fixed weights, derived from their respective matching costs, or derived by a regression-based method. In some embodiments, there may be multiple selected search regions, and the best candidates from each search region may be fused into one IntraTMP predictor; the fusion weights could likewise be fixed weights, derived from the matching costs, or derived by a regression-based method.
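A minimal sketch of cost-derived fusion weighting follows: each predictor is weighted by the other predictor's matching cost, so the better match (smaller cost) contributes more. The weighting rule and rounding are assumptions; fixed or regression-derived weights are equally possible per the text above.

```cpp
#include <cstdint>
#include <vector>

std::vector<uint8_t> fusePredictors(const std::vector<uint8_t>& pred0, int cost0,
                                    const std::vector<uint8_t>& pred1, int cost1) {
    // Weight each predictor by the other candidate's cost; fall back to an
    // even split when both costs are zero.
    int w0 = cost1, w1 = cost0, sum = w0 + w1;
    if (sum == 0) { w0 = w1 = 1; sum = 2; }
    std::vector<uint8_t> fused(pred0.size());
    for (size_t i = 0; i < fused.size(); ++i)
        fused[i] = static_cast<uint8_t>(
            (w0 * pred0[i] + w1 * pred1[i] + sum / 2) / sum);  // rounded average
    return fused;
}
```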
In some embodiments, an index may be signaled to indicate the size of an IntraTMP search region. For example, the size of a search range may be a multiple of the block width and/or block height, and this multiple may be determined by the signaled index. In some embodiments, a high-level syntax may be signaled in SPS, PPS, PH or SH to indicate whether the IntraTMP search region size index is allowed or used for the current sequence, picture, or slice.
In some embodiments, the BV of a neighboring coded IBC block or IntraTMP block may be used to improve the IntraTMP searching process of the current block. In some embodiments, if the neighboring coded block is coded by IBC or IntraTMP mode, the different (IntraTMP) search regions are assigned respective indices that are reordered according to the search regions’ distances from (the pixel position specified by) the neighboring BV (i.e., the BV of the neighboring IBC/IntraTMP block) .
FIG. 7 illustrates search regions that are assigned indices reordered according to distances from a neighboring BV. As illustrated, a neighboring block 705 of the current block 410 is coded by using a BV 715. The BV 715 is inherited by the current block 410 as a neighboring BV. In the example illustrated, the BV 715 points at a position in the search region 422; the search region 422 therefore has the smallest distance from the position pointed to by the neighboring BV 715 and is assigned index 0. Search regions 421, 423, and 424 are respectively assigned index 1, index 2, and index 3, also based on their respective distances from the position pointed to by the neighboring BV 715. In some embodiments, the distance of a search region from the position pointed to by the neighboring BV may be defined as the Euclidean distance between the center position of the search region and the position pointed to by the neighboring BV, or the Euclidean distance between the top-left position of the search region and the position pointed to by the neighboring BV.
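The reordering described above may be sketched as follows, using the squared Euclidean distance between each region's center and the position pointed to by the neighboring BV (the Rect type is as in the earlier region-selection sketch; using the top-left corner instead of the center is the other option mentioned above).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Returns region positions sorted by distance: order[0] is the region that
// receives index 0, order[1] receives index 1, and so on.
std::vector<int> reorderRegions(const std::vector<Rect>& regions,
                                int blkX, int blkY, int bvX, int bvY) {
    const int px = blkX + bvX, py = blkY + bvY;   // position pointed to by BV
    std::vector<int> order(regions.size());
    for (size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
    auto dist2 = [&](int i) {                     // squared distance suffices
        const Rect& r = regions[i];
        int64_t dx = (r.x + r.w / 2) - px, dy = (r.y + r.h / 2) - py;
        return dx * dx + dy * dy;
    };
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return dist2(a) < dist2(b); });
    return order;
}
```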
In some embodiments, if a neighboring coded block is coded by IBC or IntraTMP mode, a BV cost may be added into the IntraTMP searching process. The BV cost of a candidate BV may be the sum of the horizontal and vertical differences between the candidate BV and the neighboring BV, or the Euclidean distance between the neighboring BV and the candidate BV.
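A sketch of the combined search cost, using the sum of horizontal and vertical differences as the BV cost, is given below. The lambda weight balancing the two terms is an assumption; the disclosure does not fix one.

```cpp
#include <cstdlib>

// Total cost of a candidate search position: template SAD plus a penalty
// that grows with the candidate BV's distance from the neighboring BV.
int totalCost(int templateSad, int candX, int candY, int nbrX, int nbrY) {
    const int lambda = 4;                                  // assumed weight
    int bvCost = std::abs(candX - nbrX) + std::abs(candY - nbrY);
    return templateSad + lambda * bvCost;
}
```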
In some embodiments, if the neighboring coded block is coded by IBC or IntraTMP mode, the starting point of the IntraTMP searching process is determined by the neighboring BV. FIG. 8 illustrates a searching start point that is determined by a neighboring BV.
The figure illustrates the search regions 421-424 for the current block 410 in a current picture 400. A neighboring block 805 of the current block 410 is coded by using a BV 815. The BV 815 is inherited by the current block 410 as a neighboring BV. The video coder performs the search from a starting point (in the search region 422) identified by the neighboring BV 815. From that starting point, the search follows a spiral shape. In some embodiments, if the neighboring coded block is coded by IBC or IntraTMP mode, the final predictor of IntraTMP may be a fusion of the best candidate in the IntraTMP searching process and the predictor of the neighboring BV.
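One way to generate such a spiral scan is sketched below; the exact traversal order is an assumption, as the text only states that the search follows a spiral shape from the BV-identified starting point.

```cpp
#include <utility>
#include <vector>

// Enumerate candidate positions in a square spiral around the start point.
// Callers are expected to skip positions outside the selected search region.
std::vector<std::pair<int, int>> spiralPositions(int startX, int startY,
                                                 int maxRadius) {
    std::vector<std::pair<int, int>> pos{{startX, startY}};
    int x = startX, y = startY, step = 1;
    while (step <= 2 * maxRadius) {
        for (int i = 0; i < step; ++i) pos.emplace_back(++x, y);  // right
        for (int i = 0; i < step; ++i) pos.emplace_back(x, ++y);  // down
        ++step;
        for (int i = 0; i < step; ++i) pos.emplace_back(--x, y);  // left
        for (int i = 0; i < step; ++i) pos.emplace_back(x, --y);  // up
        ++step;
    }
    return pos;
}
```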
IV. Inheriting Block Vectors
In some embodiments, BVs may be inherited to be used by other coding tools when multiple BVs are used to encode or decode the current block. In some embodiments, if multiple BVs are used in IntraTMP prediction by fusing multiple predictors, the BV with the best TM cost (e.g., SAD or SATD) may be stored and inherited by other tools or subsequent blocks. In some embodiments, the BV with the smaller or smallest magnitude is stored and inherited. The magnitude of a BV may be defined as the sum of the absolute values of its horizontal and vertical components, the sum of the squares of its horizontal and vertical components, the minimum of its horizontal and vertical components, or the maximum of its horizontal and vertical components.
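The four magnitude definitions listed above can be collected into one helper, sketched below. Treating the minimum/maximum definitions as operating on absolute component values is an interpretation; the enum and names are illustrative.

```cpp
#include <algorithm>
#include <cstdlib>

enum class BvMag { AbsSum, SquareSum, MinComp, MaxComp };

int bvMagnitude(int bvX, int bvY, BvMag def) {
    switch (def) {
        case BvMag::AbsSum:    return std::abs(bvX) + std::abs(bvY);
        case BvMag::SquareSum: return bvX * bvX + bvY * bvY;
        case BvMag::MinComp:   return std::min(std::abs(bvX), std::abs(bvY));
        case BvMag::MaxComp:   return std::max(std::abs(bvX), std::abs(bvY));
    }
    return 0;  // unreachable; keeps some compilers quiet
}
```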
In some embodiments, if multiple BVs are used in IntraTMP prediction by fusing multiple predictors, two or more BVs may be stored and inherited together. Two or more BVs that are used together (e.g., for fusing multiple predictors or for a coding tool that requires two or more BVs) may be stored together in a storage as a “to-be-inserted” candidate. Such a storage may store multiple “to-be-inserted” candidates, each to-be-inserted candidate including two or more BVs that are to be stored and inherited together. FIG. 9 conceptually illustrates a buffer storage 900 storing multiple to-be-inserted candidates 910, 920, 930, etc.
In some embodiments, if a coding tool requires two or more BVs, two or more BVs stored in the buffer 900 may be taken together as a to-be-inserted candidate. For example, say BVs 911 and 912 are used in IntraTMP prediction by fusing multiple predictors, and the two BVs 911 and 912 are stored and inherited together in one to-be-inserted candidate 910. If a coding tool requires two or more BVs, the two BVs 911 and 912 stored in the buffer 900 may be taken together as one to-be-inserted candidate 910.
In some embodiments, if a coding tool requires only one BV, the first BV in the buffer may be taken, followed by the second BV in the buffer. For example, the first BV 911 in the to-be-inserted candidate 910 in the buffer 900 may be taken, followed by the second BV 912 in the same candidate 910. The placing order of BVs in the buffer 900 may depend on the TM costs or the magnitudes of the BVs. The magnitude of a BV can be the sum of the absolute values of its horizontal and vertical components, the sum of the squares of its horizontal and vertical components, the minimum of its horizontal and vertical components, or the maximum of its horizontal and vertical components.
In some embodiments, if a coding tool requires only one BV, the first BVs (911, 921, 931, etc. ) of all to-be-inserted candidates (910, 920, 930, etc. ) in the buffer 900 may be taken first and followed by the second BVs (912, 922, 932, etc. ) of all to-be-inserted candidates.
In some embodiments, if multiple BVs are used in IntraTMP prediction by fusing multiple predictors, two or more BVs could be stored and inherited, and the inserting order can be different for different coding tools. For example, for IBC-AMVP mode, only the first BV of each to-be-inserted candidate may be considered; for IBC-Merge mode, the first BVs of all to-be-inserted candidates are taken first, followed by the second BVs of all to-be-inserted candidates; for IntraTMP, the BVs of one to-be-inserted candidate are taken together.
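The three per-tool retrieval orders described above can be sketched as follows (the BV type is as in the earlier candidate-list sketch; ToBeInserted is an assumed container for BVs stored and inherited together).

```cpp
#include <cstddef>
#include <vector>

struct ToBeInserted { std::vector<BV> bvs; };  // e.g., {first BV, second BV}

// IBC-AMVP: only the first BV of each to-be-inserted candidate.
std::vector<BV> forAmvp(const std::vector<ToBeInserted>& buf) {
    std::vector<BV> out;
    for (const auto& c : buf)
        if (!c.bvs.empty()) out.push_back(c.bvs[0]);
    return out;
}

// IBC-Merge: all first BVs, followed by all second BVs.
std::vector<BV> forMerge(const std::vector<ToBeInserted>& buf) {
    std::vector<BV> out;
    for (const auto& c : buf)
        if (!c.bvs.empty()) out.push_back(c.bvs[0]);
    for (const auto& c : buf)
        if (c.bvs.size() > 1) out.push_back(c.bvs[1]);
    return out;
}

// IntraTMP: the BVs of one to-be-inserted candidate are taken together.
const std::vector<BV>& forIntraTmp(const std::vector<ToBeInserted>& buf,
                                   std::size_t idx) {
    return buf[idx].bvs;
}
```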
The BV inheritance methods described in Section IV may be applied to CUs coded with fusion-based IntraTMP using multiple BVs, fusion-based IBC using multiple BVs, and/or GPM prediction using two BVs. In some embodiments, the methods mentioned above may also be applied to CUs coded by IBC by changing the IntraTMP BV to an IBC BV. In some embodiments, the methods described above may also be applied to CUs coded by GPM with BVs by changing the IntraTMP BV to a GPM BV.
Any of the foregoing proposed methods can be implemented in encoders and/or decoders. For example, any of the proposed methods can be implemented in an inter/intra coding module of an encoder and/or a decoder. Alternatively, any of the proposed methods can be implemented as a circuit coupled to the inter/intra coding module of the encoder and/or the decoder, so as to provide the information needed by the inter/intra coding module.
V. Example Video Encoder
FIG. 10 illustrates an example video encoder 1000 that may implement intra block copy (IBC) or current picture referencing (CPR) . As illustrated, the video encoder 1000 receives input video signal from a video source 1005 and encodes the signal into bitstream 1095. The video encoder 1000 has several components or modules for encoding the signal from the video source 1005, at least including some components selected from a transform module 1010, a quantization module 1011, an inverse quantization module 1014, an inverse transform module 1015, an intra-picture estimation module 1020, an intra-prediction module 1025, a motion compensation module 1030, a motion estimation module 1035, an in-loop filter 1045, a reconstructed picture buffer 1050, a MV buffer 1065, a MV prediction module 1075, and an entropy encoder 1090. The motion compensation module 1030 and the motion estimation module 1035 are part of an inter-prediction module 1040.
In some embodiments, the modules 1010–1090 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device or electronic apparatus. In some embodiments, the modules 1010–1090 are modules of hardware circuits implemented by one or more integrated circuits (ICs) of an electronic apparatus. Though the modules 1010–1090 are illustrated as being separate modules, some of the modules can be combined into a single module.
The video source 1005 provides a raw video signal that presents pixel data of each video frame without compression. A subtractor 1008 computes the difference between the raw video pixel data of the video source 1005 and the predicted pixel data 1013 from the motion compensation module 1030 or intra-prediction module 1025 as prediction residual 1009. The transform module 1010 converts the difference (or the residual pixel data, or residual signal 1009) into transform coefficients (e.g., by performing Discrete Cosine Transform, or DCT) . The quantization module 1011 quantizes the transform coefficients into quantized data (or quantized coefficients) 1012, which is encoded into the bitstream 1095 by the entropy encoder 1090.
The inverse quantization module 1014 de-quantizes the quantized data (or quantized coefficients) 1012 to obtain transform coefficients, and the inverse transform module 1015 performs inverse transform on the transform coefficients to produce reconstructed residual 1019. The reconstructed residual 1019 is added with the predicted pixel data 1013 to produce reconstructed pixel data 1017. In some embodiments, the reconstructed pixel data 1017 is temporarily stored in a line buffer (not illustrated) for intra-picture prediction and spatial MV prediction. The reconstructed pixels are filtered by the in-loop filter 1045 and stored in the reconstructed picture buffer 1050. In some embodiments, the reconstructed picture buffer 1050 is a storage external to the video encoder 1000.
In some embodiments, the reconstructed picture buffer 1050 is a storage internal to the video encoder 1000.
The intra-picture estimation module 1020 performs intra-prediction based on the reconstructed pixel data 1017 to produce intra prediction data. The intra-prediction data is provided to the entropy encoder 1090 to be encoded into bitstream 1095. The intra-prediction data is also used by the intra-prediction module 1025 to produce the predicted pixel data 1013.
The motion estimation module 1035 performs inter-prediction by producing MVs to reference pixel data of previously decoded frames stored in the reconstructed picture buffer 1050. These MVs are provided to the motion compensation module 1030 to produce predicted pixel data.
Instead of encoding the complete actual MVs in the bitstream, the video encoder 1000 uses MV prediction to generate predicted MVs, and the difference between the MVs used for motion compensation and the predicted MVs is encoded as residual motion data and stored in the bitstream 1095.
The MV prediction module 1075 generates the predicted MVs based on reference MVs that were generated for encoding previous video frames, i.e., the motion compensation MVs that were used to perform motion compensation. The MV prediction module 1075 retrieves reference MVs of previous video frames from the MV buffer 1065. The video encoder 1000 stores the MVs generated for the current video frame in the MV buffer 1065 as reference MVs for generating predicted MVs.
The MV prediction module 1075 uses the reference MVs to create the predicted MVs. The predicted MVs can be computed by spatial MV prediction or temporal MV prediction. The difference between the predicted MVs and the motion compensation MVs (MC MVs) of the current frame (residual motion data) are encoded into the bitstream 1095 by the entropy encoder 1090.
The entropy encoder 1090 encodes various parameters and data into the bitstream 1095 by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding. The entropy encoder 1090 encodes various header elements and flags, along with the quantized transform coefficients 1012 and the residual motion data, as syntax elements into the bitstream 1095. The bitstream 1095 is in turn stored in a storage device or transmitted to a decoder over a communications medium such as a network.
The in-loop filter 1045 performs filtering or smoothing operations on the reconstructed pixel data 1017 to reduce the artifacts of coding, particularly at boundaries of pixel blocks. In some embodiments, the filtering or smoothing operations performed by the in-loop filter 1045 include deblock filter (DBF) , sample adaptive offset (SAO) , and/or adaptive loop filter (ALF) .
FIG. 11 illustrates portions of the video encoder 1000 that implement search and inheritance of multiple block vectors. As illustrated, the inter-prediction module 1040 and the intra-prediction module 1025 retrieve pixel samples from the reconstructed picture buffer 1050 to produce the predictor of the current block in the predicted pixel data 1013. In some embodiments, the intra-prediction module 1025 performs prediction for regular intra modes, where template matching operations may be performed to select one or more intra-prediction directions.
The inter-prediction module 1040 performs motion estimation and compensation for inter prediction modes (e.g., merge modes) . The inter-prediction module 1040 also performs prediction based on samples in the current picture as reference, such as in IBC or CPR or IntraTMP
mode. The inter-prediction module 1040 includes a search engine 1110 that searches the current picture for a template area (reference template) having the lowest matching cost with the template area neighboring the current block. The search engine 1110 receives a search region selection indication (e.g., from the entropy encoder 1090) that specifies one or more search regions in the current picture in which the search is to be conducted. In some embodiments, the indication is based on indices assigned to different search regions that are ordered based on their distances from an inherited neighboring block vector.
Based on the search, the search engine 1110 may produce one or multiple block vectors that are used to produce the predictor for the current block. The used block vector (s) are stored in a reference storage 1140 so they can be inherited and used by subsequent coding operations of the current block or a subsequent block (e.g., as a neighboring block vector) . When multiple block vectors are identified by the search engine 1110, the multiple block vectors are used to retrieve samples from multiple reference blocks. These samples may be fused or blended (as a weighted sum) to generate a predictor for the current block.
FIG. 12 conceptually illustrates an encoding process 1200 for implementing inheritance of multiple block vectors. In some embodiments, one or more processing units (e.g., a processor) of a computing device implementing the encoder 1000 perform the process 1200 by executing instructions stored in a computer readable medium. In some embodiments, an electronic apparatus implementing the encoder 1000 performs the process 1200.
The encoder receives (at block 1210) data to be encoded as a current block of pixels in a current picture. The encoder signals (at block 1220) an index identifying a search region from a plurality of search regions in the current picture. The plurality of search regions may be non-overlapping regions of the current picture, or different regions of the current picture that may overlap. In some embodiments, the signaled index identifies two or more search regions in the plurality of search regions. In some embodiments, the different search regions are assigned corresponding indices that are ordered according to the distances of the search regions from a neighboring block vector. (A neighboring block vector is a block vector that was used to code a neighboring block of the current block. ) In some embodiments, the encoder may signal a syntax element to indicate whether an index is used to select a search region from a plurality of search regions. In some embodiments, the encoder may signal a syntax element to indicate a size of a search range that includes the plurality of search regions.
The encoder derives (at block 1230) a block vector by computing matching costs at searching positions within the selected or identified search region. The matching cost of a search position is a difference between a reference template at the position and a current template neighboring the current block. In some embodiments, the matching cost of a search position is further based on horizontal and vertical differences between a neighboring block vector and the search position. In some embodiments, the search positions are identified based on the neighboring block vector.
The encoder uses (at block 1240) first and second block vectors to identify first and second reference blocks, respectively, in the current picture for the current block. The first and second block vectors are derived by searching positions within the identified search region. The encoder generates (at block 1250) a predictor based on the first and second reference blocks identified by the first and second block vectors in the current picture. The predictor may be a fusion of the first and second reference blocks identified by the first and second block vectors, and the fusion of the first and second reference blocks is weighted based on the matching costs of the first and second block vectors. The encoder encodes (at block 1260) the current block by using the generated predictor to produce prediction residuals.
The encoder stores (at block 1270) the first and second block vectors in a storage for subsequent encoding operations of the current block or a subsequent block. The first block vector may be retrieved from the storage before the second block vector, when the first block vector has a matching cost lower than (or equal to) that of the second block vector, or when the first block vector has a smaller magnitude than that of the second block vector. In some embodiments, the storage may store two or more to-be-inserted candidates. For example, a first to-be-inserted candidate may store the first and second block vectors, while a second to-be-inserted candidate may store third and fourth block vectors. A coding tool may subsequently retrieve and use the first block vector of the first to-be-inserted candidate and the third block vector of the second to-be-inserted candidate from the storage.
VI. Example Video Decoder
In some embodiments, an encoder may signal (or generate) one or more syntax elements in a bitstream, such that a decoder may parse said one or more syntax elements from the bitstream.
FIG. 13 illustrates an example video decoder 1300 that may implement intra block copy (IBC) or current picture referencing (CPR) . As illustrated, the video decoder 1300 is an image-decoding or video-decoding circuit that receives a bitstream 1395 and decodes the content of the bitstream into pixel data of video frames for display. The video decoder 1300 has several components or modules for decoding the bitstream 1395, including some components selected from an inverse quantization module 1311, an inverse transform module 1310, an intra-prediction module 1325, a motion compensation module 1330, an in-loop filter 1345, a decoded picture buffer 1350, a MV buffer 1365, a MV prediction module 1375, and a parser 1390. The motion compensation module 1330 is part of an inter-prediction module 1340.
In some embodiments, the modules 1310–1390 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device. In some embodiments, the modules 1310–1390 are modules of hardware circuits implemented by one or more ICs of an electronic apparatus. Though the modules 1310–1390 are illustrated as being separate modules, some of the modules can be combined into a single module.
The parser 1390 (or entropy decoder) receives the bitstream 1395 and performs initial parsing according to the syntax defined by a video-coding or image-coding standard. The parsed syntax elements include various header elements and flags, as well as quantized data (or quantized coefficients) 1312. The parser 1390 parses out the various syntax elements by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding.
The inverse quantization module 1311 de-quantizes the quantized data (or quantized coefficients) 1312 to obtain transform coefficients, and the inverse transform module 1310 performs inverse transform on the transform coefficients 1316 to produce reconstructed residual signal 1319. The reconstructed residual signal 1319 is added with predicted pixel data 1313 from the intra-
prediction module 1325 or the motion compensation module 1330 to produce decoded pixel data 1317. The decoded pixel data is filtered by the in-loop filter 1345 and stored in the decoded picture buffer 1350. In some embodiments, the decoded picture buffer 1350 is a storage external to the video decoder 1300. In some embodiments, the decoded picture buffer 1350 is a storage internal to the video decoder 1300.
The intra-prediction module 1325 receives intra-prediction data from bitstream 1395 and according to which, produces the predicted pixel data 1313 from the decoded pixel data 1317 stored in the decoded picture buffer 1350. In some embodiments, the decoded pixel data 1317 is also stored in a line buffer (not illustrated) for intra-picture prediction and spatial MV prediction.
In some embodiments, the content of the decoded picture buffer 1350 is used for display. A display device 1355 either retrieves the content of the decoded picture buffer 1350 for display directly, or retrieves the content of the decoded picture buffer to a display buffer. In some embodiments, the display device receives pixel values from the decoded picture buffer 1350 through a pixel transport.
The motion compensation module 1330 produces predicted pixel data 1313 from the decoded pixel data 1317 stored in the decoded picture buffer 1350 according to motion compensation MVs (MC MVs) . These motion compensation MVs are decoded by adding the residual motion data received from the bitstream 1395 with predicted MVs received from the MV prediction module 1375.
The MV prediction module 1375 generates the predicted MVs based on reference MVs that were generated for decoding previous video frames, e.g., the motion compensation MVs that were used to perform motion compensation. The MV prediction module 1375 retrieves the reference MVs of previous video frames from the MV buffer 1365. The video decoder 1300 stores the motion compensation MVs generated for decoding the current video frame in the MV buffer 1365 as reference MVs for producing predicted MVs.
The in-loop filter 1345 performs filtering or smoothing operations on the decoded pixel data 1317 to reduce the artifacts of coding, particularly at boundaries of pixel blocks. In some embodiments, the filtering or smoothing operations performed by the in-loop filter 1345 include deblock filter (DBF) , sample adaptive offset (SAO) , and/or adaptive loop filter (ALF) .
FIG. 14 illustrates portions of the video decoder 1300 that implement search and inheritance of multiple block vectors. As illustrated, the inter-prediction module 1340 and the intra-prediction module 1325 retrieve pixel samples from the decoded picture buffer 1350 to produce the predictor of the current block in the predicted pixel data 1313. In some embodiments, the intra-prediction module 1325 performs prediction for regular intra modes, where template matching operations may be performed to select one or more intra-prediction directions.
The inter-prediction module 1340 performs motion estimation and compensation for inter prediction modes (e.g., merge modes) . The inter-prediction module 1340 also performs prediction based on samples in the current picture as reference, such as in IBC or CPR or IntraTMP mode. The inter-prediction module 1340 includes a search engine 1410 that searches the current picture for a template area (reference template) having the lowest matching cost with the template area neighboring the current block. The search engine 1410 receives a search region selection indication (e.g., from the entropy decoder 1390) that specifies one or more search regions in the current picture in which the search is to be conducted. In some embodiments, the indication is based
on indices assigned to different search regions that are ordered based on their distances from an inherited neighboring block vector.
Based on the search, the search engine 1410 may produce one or multiple block vectors that are used to produce the predictor for the current block. The used block vector (s) are stored in a reference storage 1440 so they can be inherited and used by subsequent coding operations of the current block or a subsequent block (e.g., as a neighboring block vector) . When multiple block vectors are identified by the search engine 1410, the multiple block vectors are used to retrieve samples from multiple reference blocks. These samples may be fused or blended (as a weighted sum) to generate a predictor for the current block.
FIG. 15 conceptually illustrates a process 1500 for implementing inheritance of multiple block vectors. In some embodiments, one or more processing units (e.g., a processor) of a computing device implementing the decoder 1300 perform the process 1500 by executing instructions stored in a computer readable medium. In some embodiments, an electronic apparatus implementing the decoder 1300 performs the process 1500.
The decoder receives (at block 1510) data to be decoded as a current block of pixels in a current picture. The decoder receives (at block 1520) an index identifying a search region from a plurality of search regions in the current picture. The plurality of search regions may be non-overlapping regions of the current picture, or different regions of the current picture that may overlap. In some embodiments, the signaled index identifies two or more search regions in the plurality of search regions. In some embodiments, the different search regions are assigned corresponding indices that are ordered according to the distances of the search regions from a neighboring block vector. (A neighboring block vector is a block vector that was used to code a neighboring block of the current block. ) In some embodiments, the decoder may receive a syntax element that indicates whether an index is used to select a search region from a plurality of search regions. In some embodiments, the decoder may receive a syntax element that indicates a size of a search range that includes the plurality of search regions.
The decoder derives (at block 1530) a block vector by computing matching costs at searching positions within the selected or identified search region. The matching cost of a search position is a difference between a reference template at the position and a current template neighboring the current block. In some embodiments, the matching cost of a search position is further based on horizontal and vertical differences between a neighboring block vector and the search position. In some embodiments, the search positions are identified based on the neighboring block vector.
The decoder uses (at block 1540) first and second block vectors to identify first and second reference blocks, respectively, in the current picture for the current block. The first and second block vectors are derived by searching positions within the identified search region. The decoder generates (at block 1550) a predictor based on the first and second reference blocks identified by the first and second block vectors in the current picture. The predictor may be a fusion of the first and second reference blocks identified by the first and second block vectors, and the fusion of the first and second reference blocks is weighted based on the matching costs of the first and second block vectors. The decoder reconstructs (at block 1560) the current block by using the generated predictor. The decoder
may then provide the reconstructed current block for display as part of the reconstructed current picture.
The decoder stores (at block 1570) the first and second block vectors in a storage for subsequent decoding operations of the current block or a subsequent block. The first block vector may be retrieved from the storage before the second block vector, when the first block vector has a matching cost lower than (or equal to) that of the second block vector, or when the first block vector has a smaller magnitude than that of the second block vector. In some embodiments, the storage may store two or more to-be-inserted candidates. For example, a first to-be-inserted candidate may store the first and second block vectors, while a second to-be-inserted candidate may store third and fourth block vectors. A coding tool may subsequently retrieve and use the first block vector of the first to-be-inserted candidate and the third block vector of the second to-be-inserted candidate from the storage.
VII. Example Electronic System
Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium) . When these instructions are executed by one or more computational or processing unit (s) (e.g., one or more processors, cores of processors, or other processing units) , they cause the processing unit (s) to perform the actions indicated in the instructions. Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, random-access memory (RAM) chips, hard drives, erasable programmable read only memories (EPROMs) , electrically erasable programmable read-only memories (EEPROMs) , etc. The computer readable media do not include carrier waves and electronic signals passing wirelessly or over wired connections.
In this specification, the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the present disclosure. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.
FIG. 16 conceptually illustrates an electronic system 1600 with which some embodiments of the present disclosure are implemented. The electronic system 1600 may be a computer (e.g., a desktop computer, personal computer, tablet computer, etc.), phone, PDA, or any other sort of electronic device. Such an electronic system includes various types of computer readable media and interfaces for various other types of computer readable media. Electronic system 1600 includes a bus 1605, processing unit(s) 1610, a graphics-processing unit (GPU) 1615, a system memory 1620, a network 1625, a read-only memory 1630, a permanent storage device 1635, input devices 1640, and output devices 1645.
The bus 1605 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 1600. For instance,
the bus 1605 communicatively connects the processing unit(s) 1610 with the GPU 1615, the read-only memory 1630, the system memory 1620, and the permanent storage device 1635.
From these various memory units, the processing unit(s) 1610 retrieves instructions to execute and data to process in order to execute the processes of the present disclosure. The processing unit(s) may be a single processor or a multi-core processor in different embodiments. Some instructions are passed to and executed by the GPU 1615. The GPU 1615 can offload various computations or complement the image processing provided by the processing unit(s) 1610.
The read-only memory (ROM) 1630 stores static data and instructions that are used by the processing unit(s) 1610 and other modules of the electronic system. The permanent storage device 1635, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 1600 is off. Some embodiments of the present disclosure use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1635.
Other embodiments use a removable storage device (such as a floppy disk, flash memory device, etc., and its corresponding disk drive) as the permanent storage device. Like the permanent storage device 1635, the system memory 1620 is a read-and-write memory device. However, unlike the storage device 1635, the system memory 1620 is a volatile read-and-write memory, such as a random-access memory. The system memory 1620 stores some of the instructions and data that the processor uses at runtime. In some embodiments, processes in accordance with the present disclosure are stored in the system memory 1620, the permanent storage device 1635, and/or the read-only memory 1630. For example, the various memory units include instructions for processing multimedia clips in accordance with some embodiments. From these various memory units, the processing unit(s) 1610 retrieves instructions to execute and data to process in order to execute the processes of some embodiments.
The bus 1605 also connects to the input and output devices 1640 and 1645. The input devices 1640 enable the user to communicate information and select commands to the electronic system. The input devices 1640 include alphanumeric keyboards and pointing devices (also called “cursor control devices”), cameras (e.g., webcams), microphones or similar devices for receiving voice commands, etc. The output devices 1645 display images generated by the electronic system or otherwise output data. The output devices 1645 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD), as well as speakers or similar audio output devices. Some embodiments include devices, such as a touchscreen, that function as both input and output devices.
Finally, as shown in FIG. 16, bus 1605 also couples electronic system 1600 to a network 1625 through a network adapter (not shown). In this manner, the computer can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), or an intranet), or a network of networks, such as the Internet. Any or all components of electronic system 1600 may be used in conjunction with the present disclosure.
Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
While the above discussion primarily refers to microprocessors or multi-core processors that execute software, many of the above-described features and applications are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself. In addition, some embodiments execute software stored in programmable logic devices (PLDs), ROM, or RAM devices.
As used in this specification and any claims of this application, the terms “computer,” “server,” “processor,” and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms “display” or “displaying” mean displaying on an electronic device. As used in this specification and any claims of this application, the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.
While the present disclosure has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the present disclosure can be embodied in other specific forms without departing from the spirit of the present disclosure. In addition, a number of the figures (including FIG. 12 and FIG. 15) conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, the process could be implemented using several sub-processes, or as part of a larger macro process. Thus, one of ordinary skill in the art would understand that the present disclosure is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.
Additional Notes
The herein-described subject matter sometimes illustrates different components contained within, or connected with, different other components. It is to be understood that such depicted architectures are merely examples, and that in fact many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermediate components. Likewise, any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable” to each other to achieve the desired functionality. Specific examples of operably couplable include but are not limited to physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.
Further, with respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.
Moreover, it will be understood by those skilled in the art that, in general, terms used herein, and especially in the appended claims, e.g., bodies of the appended claims, are generally intended as “open” terms, e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc. It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to implementations containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an,” e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more;” the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number, e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations. Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention, e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc. In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention, e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc. It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A,” “B,” or “A and B.”
From the foregoing, it will be appreciated that various implementations of the present disclosure have been described herein for purposes of illustration, and that various modifications may be made without departing from the scope and spirit of the present disclosure. Accordingly, the various implementations disclosed herein are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
Claims (21)
- A video coding method comprising:
  receiving data to be encoded or decoded as a current block of pixels of a current picture of a video;
  using first and second block vectors to identify first and second reference blocks respectively in the current picture for the current block;
  generating a predictor based on the first and second reference blocks identified by first and second block vectors in the current picture;
  encoding or decoding the current block by using the generated predictor; and
  storing the first and second block vectors in a storage for subsequent encoding or decoding operations of the current block or a subsequent block.
- The video coding method of claim 1, wherein the first block vector is retrieved from the storage before the second block vector, wherein the first block vector has a matching cost lower than or equal to that of the second block vector, wherein a matching cost of a block vector is determined based on a difference between a reference template identified by the block vector and a current template neighboring the current block.
- The video coding method of claim 1, wherein the first block vector is retrieved from the storage before the second block vector, wherein the first block vector has a smaller magnitude than that of the second block vector.
- The video coding method of claim 1, wherein the storage stores two or more to-be-inserted candidates, wherein a first to-be-inserted candidate comprises the first and second block vectors.
- The video coding method of claim 4, further comprising retrieving the first block vector of a first to-be-inserted candidate and a third block vector of a second to-be-inserted candidate from the storage for use by a coding tool.
- A video coding method comprising:
  receiving data to be encoded or decoded as a current block of pixels of a current picture of a video;
  signaling or receiving an index identifying a search region from a plurality of search regions in the current picture;
  deriving a block vector by computing matching costs at searching positions within the identified search region, wherein the matching cost of a search position is a difference between a reference template at the position and a current template neighboring the current block;
  generating a predictor based on a reference block identified by the block vector in the current picture; and
  encoding or decoding the current block by using the generated predictor.
- The video coding method of claim 6, wherein the plurality of search regions are non-overlapping regions of the current picture.
- The video coding method of claim 6, wherein the plurality of search regions are different regions in the current picture.
- The video coding method of claim 6, wherein a neighboring block vector is used to code a block neighboring the current block by referencing samples of the current picture, wherein the plurality of search regions are assigned corresponding indices that are ordered according to the distances of the search regions from the neighboring block vector.
- The video coding method of claim 6, wherein a neighboring block vector is used to code a block neighboring the current block by referencing samples of the current picture, wherein the matching cost of a search position is further based on horizontal and vertical differences between the neighboring block vector and the search position.
- The video coding method of claim 6, wherein the signaled index identifies two or more search regions in the plurality of search regions, wherein the block vector is derived by searching positions within the identified two or more search regions.
- The video coding method of claim 6, wherein first and second block vectors are derived by searching positions within the identified search region, wherein the predictor is generated by using the first and second block vectors.
- The video coding method of claim 12, wherein the predictor is a fusion of first and second reference blocks identified by the first and second block vectors.
- The video coding method of claim 12, wherein the fusion of the first and second reference blocks is weighted based on the matching costs of the first and second block vectors.
- The video coding method of claim 6, further comprising signaling or receiving a syntax element indicating whether an index is used to select a search region from a plurality of search regions for deriving the block vector.
- The video coding method of claim 6, further comprising signaling or receiving a syntax element indicating a size of a search range comprising the plurality of search regions.
- The video coding method of claim 6, wherein the search positions are identified based on a neighboring block vector that is used to code a block neighboring the current block by referencing samples of the current picture.
- An electronic apparatus comprising:
  a video coder circuit configured to perform operations comprising:
  receiving data to be encoded or decoded as a current block of pixels of a current picture of a video;
  using first and second block vectors to identify first and second reference blocks respectively in the current picture for the current block;
  generating a predictor based on the first and second reference blocks identified by first and second block vectors in the current picture;
  encoding or decoding the current block by using the generated predictor; and
  storing the first and second block vectors in a storage for subsequent encoding operations of the current block or a subsequent block.
- A video coding method comprising:
  receiving data to be encoded or decoded as a current block of pixels of a current picture of a video;
  deriving a block vector by computing matching costs at searching positions within a search region, wherein the matching cost of a search position is a difference between a reference template at the search position and a current template neighboring the current block;
  generating a predictor based on a reference block identified by the block vector in the current picture; and
  encoding or decoding the current block by using the generated predictor.
- The video coding method of claim 19, wherein a neighboring block vector is used to code a block neighboring the current block by referencing samples of the current picture, wherein the matching cost of a search position is further based on horizontal and vertical differences between the neighboring block vector and the search position.
- The video coding method of claim 19, wherein the search positions are identified based on a neighboring block vector that is used to code a block neighboring the current block by referencing samples of the current picture.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| US202363480550P | 2023-01-19 | 2023-01-19 | |
| US63/480,550 | 2023-01-19 | | |
| US202363488984P | 2023-03-08 | 2023-03-08 | |
| US63/488,984 | 2023-03-08 | | |
Publications (1)
| Publication Number | Publication Date |
| --- | --- |
| WO2024152957A1 (en) | 2024-07-25 |
Family
ID=91955355
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| PCT/CN2024/071524 (WO2024152957A1) | Multiple block vectors for intra template matching prediction | 2023-01-19 | 2024-01-10 |
Country Status (1)
| Country | Link |
| --- | --- |
| WO | WO2024152957A1 (en) |
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| US20200084454A1 * | 2015-07-27 | 2020-03-12 | MediaTek Inc. | Method of System for Video Coding Using Intra Block Copy Mode |
| US20200112717A1 * | 2018-10-05 | 2020-04-09 | Qualcomm Incorporated | Intra block copy prediction restrictions in video coding |
| US20220094925A1 * | 2019-06-07 | 2022-03-24 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Region based intra block copy |
| US20220201285A1 * | 2019-07-19 | 2022-06-23 | LG Electronics Inc. | Image encoding/decoding method and apparatus using IBC, and method for transmitting bitstream |
| WO2022063729A1 * | 2020-09-28 | 2022-03-31 | InterDigital VC Holdings France, SAS | Template matching prediction for versatile video coding |
Similar Documents
| Publication | Publication Date | Title |
| --- | --- | --- |
US11178414B2 (en) | Classification for multiple merge tools | |
US11343541B2 (en) | Signaling for illumination compensation | |
US11172203B2 (en) | Intra merge prediction | |
WO2020169082A1 (en) | Intra block copy merge list simplification | |
US11297348B2 (en) | Implicit transform settings for coding a block of pixels | |
WO2019206190A1 (en) | Storage of motion vectors for affine prediction | |
US11245922B2 (en) | Shared candidate list | |
US20240357082A1 (en) | Using template matching for refining candidate selection | |
WO2019161798A1 (en) | Intelligent mode assignment in video coding | |
WO2024152957A1 (en) | Multiple block vectors for intra template matching prediction | |
WO2023202569A1 (en) | Extended template matching for video coding | |
WO2023236916A1 (en) | Updating motion attributes of merge candidates | |
WO2023241347A1 (en) | Adaptive regions for decoder-side intra mode derivation and prediction | |
WO2023217140A1 (en) | Threshold of similarity for candidate list | |
WO2023198187A1 (en) | Template-based intra mode derivation and prediction | |
WO2023208063A1 (en) | Linear model derivation for cross-component prediction by multiple reference lines | |
WO2023197998A1 (en) | Extended block partition types for video coding | |
WO2024222399A1 (en) | Refinement for merge mode motion vector difference | |
WO2023143173A1 (en) | Multi-pass decoder-side motion vector refinement | |
WO2024213123A1 (en) | Intra-block copy with subblock modes and template matching | |
WO2024016955A1 (en) | Out-of-boundary check in video coding | |
WO2023198105A1 (en) | Region-based implicit intra mode derivation and prediction | |
WO2024022144A1 (en) | Intra prediction based on multiple reference lines | |
WO2023236775A1 (en) | Adaptive coding image and video data | |
WO2023241340A1 (en) | Hardware for decoder-side intra mode derivation and prediction |
Legal Events
| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24744133; Country of ref document: EP; Kind code of ref document: A1 |