CN113056915B - Use of collocated blocks in sub-block based temporal motion vector prediction modes - Google Patents
Use of collocated blocks in sub-block based temporal motion vector prediction modes
- Publication number
- CN113056915B CN201980076019.5A CN201980076019A
- Authority
- CN
- China
- Prior art keywords
- block
- motion vector
- sub
- picture
- motion information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/137—Motion inside a coding unit, e.g. average field, frame or block difference
- H04N19/139—Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/157—Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/46—Embedding additional information in the video signal during the compression process
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/537—Motion estimation other than block-based
- H04N19/54—Motion estimation other than block-based using feature points or meshes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
Abstract
Apparatus, systems, and methods for digital video coding are described, including sub-block based inter prediction methods. An example method for video processing includes: determining, for a conversion between a current block of video and a bitstream representation of the video, a maximum number of candidates in a sub-block based Merge candidate list and/or whether to add a sub-block based Merge candidate into the sub-block based Merge candidate list, based on whether Temporal Motion Vector Prediction (TMVP) is enabled or whether a Current Picture Reference (CPR) coding mode is used for the conversion; and performing the conversion based on the determination.
Description
Cross Reference to Related Applications
The present application is the Chinese national phase application of International Patent Application No. PCT/CN2019/120311, filed on November 22, 2019, which claims priority to and the benefit of International Patent Application No. PCT/CN2018/116889, filed on November 22, 2018, International Patent Application No. PCT/CN2018/125420, filed on August 13, 2019, and International Patent Application No. PCT/CN2019/100396, filed on September 22, 2019. The entire disclosures of the above applications are incorporated by reference as part of the disclosure of this application.
Technical Field
This patent document relates to image and video encoding and decoding.
Background
Despite advances in video compression, digital video still accounts for the largest bandwidth usage on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video grows, the bandwidth required for digital video usage is expected to continue to increase.
Disclosure of Invention
Apparatuses, systems, and methods related to digital video coding are described, including sub-block based inter prediction methods. The described methods may be applied to existing video coding standards (e.g., High Efficiency Video Coding (HEVC) and/or Versatile Video Coding (VVC)) and to future video coding standards or video codecs.
In one representative aspect, the disclosed techniques can be used to provide a method of video processing. The method comprises the following steps: determining, for a conversion between a current block of video and a bitstream representation of the video, a maximum number of candidates (ML) in a sub-block based Merge candidate list and/or whether to add a sub-block based Merge candidate into the sub-block based Merge candidate list, based on whether Temporal Motion Vector Prediction (TMVP) is enabled or whether a Current Picture Reference (CPR) coding mode is used for the conversion; and performing the conversion based on the determination.
In another representative aspect, the disclosed techniques can be used to provide a method of video processing. The method comprises the following steps: determining, for a conversion between a current block of video and a bitstream representation of the video, a maximum number of candidates (ML) in a sub-block based Merge candidate list based on whether Temporal Motion Vector Prediction (TMVP), sub-block based temporal motion vector prediction (SbTMVP), and affine coding mode are enabled for the conversion; and performing the conversion based on the determination.
In yet another representative aspect, the disclosed techniques can be used to provide a method of video processing. The method comprises the following steps: determining, for a conversion between a current block of a first video segment of the video and a bitstream representation of the video, that a sub-block based temporal motion vector prediction (SbTMVP) mode is disabled for the conversion because a Temporal Motion Vector Prediction (TMVP) mode is disabled at the first video segment level; and performing the conversion based on the determination, wherein the bitstream representation conforms to a format that specifies whether an indication of the SbTMVP mode is included and/or a position of the indication of the SbTMVP mode relative to an indication of the TMVP mode in the Merge candidate list.
In yet another representative aspect, the disclosed techniques can be used to provide a method of video processing. The method comprises the following steps: performing a conversion between a current block of video coded using a sub-block based temporal motion vector prediction (SbTMVP) tool or a Temporal Motion Vector Prediction (TMVP) tool and a bitstream representation of the video, wherein coordinates of a corresponding position of the current block, or of a sub-block of the current block, are selectively masked using a mask based on compression of motion vectors associated with the SbTMVP tool or the TMVP tool, and wherein applying the mask includes a bitwise AND operation between values of the computed coordinates and a value of the mask.
In yet another representative aspect, the disclosed techniques can be used to provide a method of video processing. The method comprises the following steps: determining, for application of a sub-block based temporal motion vector prediction (SbTMVP) tool to a current block of a video segment of the video, a valid corresponding region of the current block based on one or more characteristics of the current block; and based on the determination, performing a conversion between the current block and the bitstream representation of the video.
In yet another representative aspect, the disclosed techniques can be used to provide a method of video processing. The method comprises the following steps: determining a default motion vector for a current block of video coded using a sub-block based temporal motion vector prediction (SbTMVP) tool; and based on the determination, performing a conversion between the current block and the bitstream representation of the video, wherein the default motion vector is determined without obtaining a motion vector from the block that covers the corresponding position, in the collocated picture, associated with the center position of the current block.
In yet another representative aspect, the disclosed techniques can be used to provide a method of video processing. The method comprises the following steps: inferring, for a current block of a video segment of the video, that a sub-block based temporal motion vector prediction (SbTMVP) tool or a Temporal Motion Vector Prediction (TMVP) tool is disabled for the video segment because the current picture of the current block is a reference picture with an index set to M in reference picture list X, where M and X are integers, and where X = 0 or X = 1; and performing a conversion between the current block and the bitstream representation of the video based on the inference.
In yet another representative aspect, the disclosed techniques can be used to provide a method of video processing. The method comprises the following steps: determining, for a current block of video, that application of a sub-block based temporal motion vector prediction (SbTMVP) tool is enabled in a case where the current picture of the current block is a reference picture with an index set to M in reference picture list X, where M and X are integers; and based on the determination, performing a conversion between the current block and the bitstream representation of the video.
In yet another representative aspect, the disclosed techniques can be used to provide a method of video processing. The method comprises the following steps: performing a conversion between a current block of video and a bitstream representation of the video, wherein the current block is coded with a sub-block based coding tool, and wherein performing the conversion includes coding a sub-block Merge index using a number of binary bits (N) in a unified manner regardless of whether a sub-block based temporal motion vector prediction (SbTMVP) tool is enabled or disabled.
In yet another representative aspect, the disclosed techniques can be used to provide a method of video processing. The method comprises the following steps: determining, for a current block of video coded using a sub-block based temporal motion vector prediction (SbTMVP) tool, a motion vector used by the SbTMVP tool to locate a corresponding block in a picture different from the current picture that includes the current block; and based on the determination, performing a conversion between the current block and the bitstream representation of the video.
In yet another representative aspect, the disclosed techniques can be used to provide a method of video processing. The method comprises the following steps: determining, for a conversion between a current block of video and a bitstream representation of the video, whether to insert a zero-motion affine Merge candidate into a sub-block Merge candidate list based on whether affine prediction is enabled for the conversion of the current block; and performing the conversion based on the determination.
In yet another representative aspect, the disclosed techniques can be used to provide a method of video processing. The method comprises the following steps: inserting, for a conversion between a current block of video that uses a sub-block Merge candidate list and a bitstream representation of the video, zero-motion non-affine padding candidates into the sub-block Merge candidate list if the sub-block Merge candidate list is not full; and performing the conversion after the inserting.
In yet another representative aspect, the disclosed techniques can be used to provide a method of video processing. The method comprises the following steps: determining, for a conversion between a current block of video and a bitstream representation of the video, a motion vector using a rule that specifies that the motion vector is derived from one or more motion vectors of the block covering a corresponding position in the collocated picture; and performing the conversion based on the motion vector.
In yet another example aspect, a video encoder apparatus is disclosed. The video encoder apparatus comprises a processor configured to implement the methods described herein.
In yet another example aspect, a video decoder apparatus is disclosed. The video decoder apparatus comprises a processor configured to implement the methods described herein.
In another aspect, a computer-readable medium having code stored thereon is disclosed. The code, when executed by a processor, causes the processor to implement the methods described in this document.
These and other aspects are described in this document.
Drawings
Fig. 1 is an example of a derivation process for Merge candidate list construction.
FIG. 2 illustrates example locations of airspace Merge candidates.
Fig. 3 shows an example of candidate pairs that consider redundancy checks for spatial Merge candidates.
Fig. 4A and 4B illustrate example locations of second Prediction Units (PUs) of nx2n and 2nxn partitions.
Fig. 5 is an example illustration of motion vector scaling for a temporal Merge candidate.
Fig. 6 shows example candidate locations C0 and C1 of the time domain Merge candidate.
Fig. 7 shows an example of a combined bi-prediction Merge candidate.
Fig. 8 shows an example derivation process for motion vector prediction candidates.
Fig. 9 is an illustration of motion vector scaling for spatial motion vector candidates.
Fig. 10 shows an example of an Alternative Temporal Motion Vector Prediction (ATMVP) motion prediction for a CU.
FIG. 11 shows an example of one CU with four sub-blocks (A-D) and their neighboring blocks (a-D).
Fig. 12 is a flowchart of an example of encoding with different MV precision.
Fig. 13A and 13B show a division type of 135 degrees (division from the upper left corner to the lower right corner) and a division pattern of 45 degrees, respectively.
Fig. 14 shows an example of the positions of adjacent blocks.
Fig. 15 shows an example of upper and left blocks.
Fig. 16A and 16B show examples of 2 Control Point Motion Vectors (CPMV) and 3 CPMV, respectively.
Fig. 17 shows an example of affine Motion Vector Field (MVF) of each sub-block.
Fig. 18A and 18B show examples of 4 and 6 parametric affine models, respectively.
Fig. 19 shows an example of MVP of af_inter of inherited affine candidates.
Fig. 20 shows an example of constructing an affine motion predictor in af_inter.
Fig. 21A and 21B show examples of control point motion vectors in affine coding in af_merge.
Fig. 22 shows an example of candidate positions of the affine Merge mode.
Fig. 23 shows an example of an intra picture block copy operation.
Fig. 24 shows an example of valid corresponding areas in the juxtaposed pictures.
Fig. 25 shows an example flow chart for history-based motion vector prediction.
Fig. 26 shows a modified Merge list construction process.
Fig. 27 shows an example embodiment of the proposed active area when the current block is within the base area.
Fig. 28 illustrates an example embodiment of an active area when a current block is not within a base area.
Fig. 29A and 29B show a prior example and a proposed example, respectively, for identifying the location of default motion information.
Fig. 30-42 are flowcharts of examples for video processing methods.
Fig. 43 is a block diagram of an example of a hardware platform for implementing the visual media decoding or visual media encoding techniques described herein.
FIG. 44 is a block diagram of an example video processing system in which the disclosed techniques may be implemented.
Detailed Description
The present document provides various techniques that may be used by a decoder of a video bitstream to improve the quality of decompressed or decoded digital video or images. Furthermore, the video codec may also implement these techniques during the encoding process in order to reconstruct the decoded frames for further encoding.
Chapter headings are used in this document for ease of understanding and do not limit the embodiments and techniques to the corresponding chapters. Thus, one part of the embodiments may be combined with other part of the embodiments.
1. Overview of the invention
This patent document relates to video coding technologies. In particular, it relates to motion vector coding in video coding. It may be applied to existing video coding standards (e.g., HEVC) or to the standard to be finalized (Versatile Video Coding, VVC). It may also be applicable to future video coding standards or video codecs.
2. Introduction to the invention
Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video and H.264/MPEG-4 Advanced Video Coding (AVC) and H.265/HEVC standards. Since H.262, the video coding standards are based on the hybrid video coding structure, wherein temporal prediction plus transform coding are utilized. To explore the future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded by VCEG and MPEG jointly in 2015. Since then, many new methods have been adopted by JVET and put into the reference software named Joint Exploration Model (JEM). In April 2018, the Joint Video Expert Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard, targeting a 50% bitrate reduction compared to HEVC.
The latest version of the VVC draft, i.e., Versatile Video Coding (Draft 3), can be found at:
http://phenix.it-sudparis.eu/jvet/doc_end_user/documents/12_Macao/wg11/JVET-L1001-v2.zip
The latest reference software for VVC, named VTM, can be found at:
https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/tags/VTM-3.0rc1
2.1 inter prediction in HEVC/H.265
Each inter-predicted PU has motion parameters for one or two reference picture lists. The motion parameters include a motion vector and a reference picture index. The usage of one of the two reference picture lists may also be signaled using inter_pred_idc. Motion vectors may be explicitly coded as deltas relative to predictors.
When a CU is encoded in skip mode, one PU is associated with the CU and there are no significant residual coefficients, no encoded motion vector delta or reference picture index. A Merge mode is specified by which the motion parameters of the current PU can be obtained from neighboring PUs (including spatial and temporal candidates). The Merge mode may be applied to any inter-predicted PU, not just the skip mode. Another option for the Merge mode is explicit transmission of motion parameters, wherein motion vectors (more precisely, motion Vector Differences (MVDs) compared to motion vector predictors), corresponding reference picture indices for each reference picture list, and the use of reference picture lists are explicitly signaled per PU. In this disclosure, such a mode is named Advanced Motion Vector Prediction (AMVP).
When the signaling indicates that one of the two reference picture lists is to be used, the PU is produced from one block of samples. This is referred to as "uni-prediction". Uni-prediction is available for both P slices and B slices.
When the signaling indicates that both of the reference picture lists are to be used, the PU is produced from two blocks of samples. This is referred to as "bi-prediction". Bi-prediction is available for B slices only.
Details regarding inter prediction modes specified in HEVC are provided below. The description will start from the Merge mode.
2.1.1 reference Picture List
In HEVC, the term inter-prediction is used to refer to predictions derived from data elements (e.g., sample values or motion vectors) of reference pictures other than the currently decoded picture. As in h.264/AVC, pictures can be predicted from multiple reference pictures. The reference pictures for inter prediction are organized in one or more reference picture lists. The reference index identifies which reference pictures in the list should be used to create the prediction signal.
One reference picture list, list 0, is used for a P slice, and two reference picture lists, list 0 and list 1, are used for a B slice. It should be noted that the reference pictures included in list 0/1 may be pictures from the past and the future in terms of capture/display order.
2.1.2 Merge mode
2.1.2.1 derivation of candidates for Merge mode
When a PU is predicted using the Merge mode, an index pointing to an entry in the Merge candidate list is parsed from the bitstream and used to retrieve the motion information. The construction of this list is specified in the HEVC standard and can be summarized according to the following sequence of steps:
step 1: original candidate derivation
Step 1.1: spatial candidate derivation
Step 1.2: redundancy check of airspace candidates
Step 1.3: time domain candidate derivation
Step 2: inserting additional candidates
Step 2.1: creating bi-prediction candidates
Step 2.2: inserting zero motion candidates
These steps are also schematically depicted in Fig. 1. For spatial Merge candidate derivation, a maximum of four Merge candidates are selected among candidates that are located in five different positions. For temporal Merge candidate derivation, a maximum of one Merge candidate is selected among the two candidates. Since a constant number of candidates for each PU is assumed at the decoder, additional candidates are generated when the number of candidates obtained from step 1 does not reach the maximum number of Merge candidates (MaxNumMergeCand) signaled in the slice header. Since the number of candidates is constant, the index of the best Merge candidate is encoded using truncated unary binarization (TU). If the size of the CU is equal to 8, all PUs of the current CU share a single Merge candidate list, which is identical to the Merge candidate list of the 2N×2N prediction unit.
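By way of illustration only (this sketch is not part of the standard text or its reference software), the construction order above can be outlined in simplified C++ as follows; MotionInfo, buildMergeList and the input candidate vectors are placeholder names, the pair-wise redundancy check is simplified, and the combined bi-prediction step is omitted.
#include <vector>
#include <algorithm>
struct MotionInfo {
    int mvx, mvy, refIdx;
    bool operator==(const MotionInfo& o) const {
        return mvx == o.mvx && mvy == o.mvy && refIdx == o.refIdx;
    }
};
// spatial/temporal: candidates already derived from the positions in Fig. 2 / Fig. 6.
std::vector<MotionInfo> buildMergeList(const std::vector<MotionInfo>& spatial,
                                       const std::vector<MotionInfo>& temporal,
                                       int maxNumMergeCand) {
    std::vector<MotionInfo> list;
    // Steps 1.1/1.2: spatial candidates with a redundancy check (simplified:
    // the standard only compares selected pairs, not all pairs).
    for (const MotionInfo& c : spatial) {
        if ((int)list.size() >= 4) break;                  // at most four spatial candidates
        if (std::find(list.begin(), list.end(), c) == list.end())
            list.push_back(c);
    }
    // Step 1.3: at most one temporal candidate.
    if (!temporal.empty() && (int)list.size() < maxNumMergeCand)
        list.push_back(temporal.front());
    // Step 2: additional candidates (combined bi-prediction omitted here),
    // then zero-motion candidates until MaxNumMergeCand is reached.
    while ((int)list.size() < maxNumMergeCand)
        list.push_back({0, 0, 0});                         // reference index handling simplified
    return list;
}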
Hereinafter, the operations associated with the foregoing steps are described in detail.
2.1.2.2 Derivation of spatial candidates
In the derivation of the spatial Merge candidates, up to four Merge candidates are selected among candidates located at the positions depicted in Fig. 2. The order of derivation is A1, B1, B0, A0 and B2. Position B2 is considered only when any PU at positions A1, B1, B0, A0 is not available (e.g., because it belongs to another slice or tile) or is intra coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check, which ensures that candidates with the same motion information are excluded from the list, so that coding efficiency is improved. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with an arrow in Fig. 3 are considered, and a candidate is added to the list only if the corresponding candidate used for the redundancy check has different motion information. Another source of duplicate motion information is the "second PU" associated with partitions other than 2N×2N. As an example, Fig. 4A and Fig. 4B depict the second PU for the cases of N×2N and 2N×N, respectively. When the current PU is partitioned as N×2N, the candidate at position A1 is not considered for list construction. In fact, adding this candidate would lead to two prediction units having the same motion information, which is redundant to having just one PU in the coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2N×N.
2.1.2.3 Derivation of temporal candidates
In this step, only one candidate is added to the list. Particularly, in the derivation of this temporal Merge candidate, a scaled motion vector is derived based on the collocated PU belonging to the picture which has the smallest POC difference with the current picture within the given reference picture list. The reference picture list to be used for derivation of the collocated PU is explicitly signaled in the slice header. The scaled motion vector for the temporal Merge candidate is obtained as illustrated by the dashed line in Fig. 5, which is scaled from the motion vector of the collocated PU using the POC distances tb and td, where tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the collocated picture and the collocated picture. The reference picture index of the temporal Merge candidate is set equal to zero. A practical realization of the scaling process is described in the HEVC specification. For a B slice, two motion vectors, one for reference picture list 0 and the other for reference picture list 1, are obtained and combined to form the bi-predictive Merge candidate.
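The scaling can be sketched in fixed point as follows; this is a simplified illustration in the spirit of the HEVC scaling equations (the clipping of tb and td to [-128, 127] and the long-term reference picture special cases are omitted), not a normative implementation.
#include <algorithm>
#include <cstdlib>
static int clip3(int lo, int hi, int v) { return std::min(hi, std::max(lo, v)); }
// tb: POC(current picture) - POC(reference picture of the current picture)
// td: POC(collocated picture) - POC(reference picture of the collocated picture)
int scaleMv(int mv, int tb, int td) {
    if (td == 0 || tb == td) return mv;                    // no scaling needed
    int tx = (16384 + (std::abs(td) >> 1)) / td;
    int distScaleFactor = clip3(-4096, 4095, (tb * tx + 32) >> 6);
    int scaled = distScaleFactor * mv;
    return clip3(-32768, 32767,
                 (scaled >= 0 ? 1 : -1) * ((std::abs(scaled) + 127) >> 8));
}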
Fig. 5 is an illustration of the derivation of motion vector scaling for the temporal Merge candidate.
In the collocated PU (Y) belonging to the reference frame, the position for the temporal candidate is selected between candidates C0 and C1, as depicted in Fig. 6. If the PU at position C0 is not available, is intra coded, or is outside the current coding tree unit (CTU, a.k.a. LCU, largest coding unit) row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal Merge candidate.
Fig. 6 shows examples of candidate positions C0 and C1 for the temporal Merge candidate.
2.1.2.4 Additional candidate insertion
In addition to spatial and temporal Merge candidates, there are two additional types of Merge candidates: the combined bi-predictive Merge candidate and the zero Merge candidate. Combined bi-predictive Merge candidates are generated by utilizing spatial and temporal Merge candidates. The combined bi-predictive Merge candidate is used for B slices only. A combined bi-prediction candidate is generated by combining the first-reference-picture-list motion parameters of an initial candidate with the second-reference-picture-list motion parameters of another candidate. If these two tuples provide different motion hypotheses, they form a new bi-prediction candidate. As an example, Fig. 7 depicts the case in which two candidates in the original list (on the left), which have mvL0 and refIdxL0 or mvL1 and refIdxL1, are used to create a combined bi-predictive Merge candidate that is added to the final list (on the right). There are numerous rules regarding the combinations that are considered to generate these additional Merge candidates.
Zero motion candidates are inserted to fill the remaining entries in the Merge candidate list, thereby achieving MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index that starts from zero and increases each time a new zero motion candidate is added to the list.
More specifically, the following steps are performed in order until the Merge list is full:
1. setting a variable numRef to the number of reference pictures associated with list 0 for the P slice, or to the minimum number of reference pictures in both lists for the B slice;
2. adding non-repeated zero motion candidates:
for a variable i of 0 … numRef-1, for list 0 (if P slices) or two lists (if B slices), a default motion candidate is added (MV is set to (0, 0), and the reference picture index is set to i).
3. Repeated zero motion candidates are added, where MV is set to (0, 0), the reference picture index of list 0 is set to 0 (if P slices), and the reference picture indexes of both lists are set to 0 (if B slices).
Finally, no redundancy check is performed on these candidates.
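A minimal sketch of these three filling steps is shown below; the structure and function names are illustrative only, and a reference index of -1 simply marks an unused list.
#include <vector>
#include <algorithm>
struct ZeroCand { int mvx, mvy, refIdxL0, refIdxL1; };
void fillZeroCandidates(std::vector<ZeroCand>& list, int maxNumMergeCand,
                        int numRefL0, int numRefL1, bool isBSlice) {
    // Step 1: numRef = refs in list 0 (P slice) or min of both lists (B slice)
    int numRef = isBSlice ? std::min(numRefL0, numRefL1) : numRefL0;
    // Step 2: non-repeated zero-motion candidates, reference index i = 0..numRef-1
    for (int i = 0; i < numRef && (int)list.size() < maxNumMergeCand; ++i)
        list.push_back({0, 0, i, isBSlice ? i : -1});
    // Step 3: repeated zero-motion candidates with reference index 0
    while ((int)list.size() < maxNumMergeCand)
        list.push_back({0, 0, 0, isBSlice ? 0 : -1});
}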
2.1.3 Advanced Motion Vector Prediction (AMVP)
AMVP exploits the spatio-temporal correlation of motion vectors with neighboring PUs, which is used for explicit transmission of motion parameters. For each reference picture list, a motion vector candidate list is constructed by first checking the availability of left and above spatially neighboring PU positions and of temporally neighboring PU positions, removing redundant candidates, and adding a zero vector so that the candidate list has a constant length. The encoder can then select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similarly to Merge index signaling, the index of the best motion vector candidate is encoded using a truncated unary. The maximum value to be encoded in this case is 2 (see Fig. 8). In the following sections, details about the derivation process of motion vector prediction candidates are provided.
2.1.3.1 Derivation of AMVP candidates
Fig. 8 outlines the derivation of motion vector prediction candidates.
In motion vector prediction, two types of motion vector candidates are considered: spatial domain motion vector candidates and temporal motion vector candidates. For spatial domain motion vector candidate derivation, two motion vector candidates are ultimately derived based on the motion vector of each PU located at five different locations as shown in fig. 2.
For temporal motion vector candidate derivation, one motion vector candidate is selected from the two candidates, which is derived based on two different collocated positions. After generating the first list of spatio-temporal candidates, the repeated motion vector candidates in the list are removed. If the number of potential candidates is greater than 2, motion vector candidates within the associated reference picture list whose reference picture index is greater than 1 are removed from the list. If the number of space-time motion vector candidates is less than 2, additional zero motion vector candidates are added to the list.
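The overall AMVP list construction summarized above can be sketched as follows; this simplified illustration uses placeholder types, ignores reference index handling, and omits the removal of candidates whose reference picture index is greater than 1.
#include <vector>
#include <algorithm>
#include <utility>
using Mv = std::pair<int, int>;   // (mvx, mvy)
// spatial: up to two derived spatial MV candidates; temporal: up to one.
std::vector<Mv> buildAmvpList(const std::vector<Mv>& spatial,
                              const std::vector<Mv>& temporal) {
    std::vector<Mv> list;
    for (const Mv& mv : spatial)
        if ((int)list.size() < 2) list.push_back(mv);
    for (const Mv& mv : temporal) list.push_back(mv);
    // remove duplicated motion vector candidates
    std::vector<Mv> pruned;
    for (const Mv& mv : list)
        if (std::find(pruned.begin(), pruned.end(), mv) == pruned.end())
            pruned.push_back(mv);
    // keep at most two candidates, then pad with zero motion vectors
    if ((int)pruned.size() > 2) pruned.resize(2);
    while ((int)pruned.size() < 2) pruned.push_back({0, 0});
    return pruned;
}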
2.1.3.2 spatial motion vector candidates
In the derivation of spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which are derived from PUs located at the positions depicted in Fig. 2, those positions being the same as those of motion Merge. The order of derivation for the left side of the current PU is defined as A0, A1, scaled A0, scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, scaled B2. For each side there are therefore four cases that can be used as motion vector candidates, two cases not requiring the use of spatial scaling and two cases where spatial scaling is used. The four different cases are summarized as follows.
No spatial scaling
- (1) Same reference picture list, and same reference picture index (same POC)
- (2) Different reference picture list, but same reference picture (same POC)
Spatial scaling
- (3) Same reference picture list, but different reference picture (different POC)
- (4) Different reference picture list, and different reference picture (different POC)
The no-spatial-scaling cases are checked first, followed by the cases that require spatial scaling. Spatial scaling is considered when the POC differs between the reference picture of the neighboring PU and that of the current PU, regardless of the reference picture list. If all PUs of the left candidates are not available or are intra coded, scaling for the above motion vector is allowed to help the parallel derivation of the left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
In the spatial scaling process, the motion vectors of neighboring PUs are scaled in a similar manner as in the temporal scaling. As shown in fig. 9, the main difference is that the reference picture list and the index of the current PU are given as inputs; the actual scaling process is the same as the time domain scaling process.
2.1.3.3 temporal motion vector candidates
All processes for the derivation of the temporal Merge candidate are the same as for the derivation of spatial motion vector candidates, except for the reference picture index derivation (see Fig. 6). The reference picture index is signaled to the decoder.
2.2 Sub-CU based motion vector prediction methods in JEM
In the JEM with QTBT, each CU can have at most one set of motion parameters for each prediction direction. Two sub-CU level motion vector prediction methods are considered in the encoder by splitting a large CU into sub-CUs and deriving motion information for all the sub-CUs of the large CU. The Alternative Temporal Motion Vector Prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture. In the Spatial-Temporal Motion Vector Prediction (STMVP) method, motion vectors of the sub-CUs are derived recursively by using the temporal motion vector predictor and spatial neighboring motion vectors.
In order to preserve a more accurate motion field for sub-CU motion prediction, motion compression of the reference frame is currently disabled.
Fig. 10 shows an example of ATMVP motion prediction for a CU.
2.2.1 Alternative temporal motion vector prediction (ATMVP)
In the Alternative Temporal Motion Vector Prediction (ATMVP) method, Temporal Motion Vector Prediction (TMVP) is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU. In some implementations, the sub-CUs are square N×N blocks (N is set to 4 by default).
ATMVP predicts the motion vectors of sub-CUs within a CU in two steps. The first step is to identify the corresponding block in the reference picture using a so-called temporal vector. The reference picture is called a motion source picture. The second step is to divide the current CU into sub-CUs and obtain a motion vector and a reference index for each sub-CU from the block corresponding to each sub-CU.
In a first step, a reference picture and a corresponding block are determined from motion information of spatial neighboring blocks of the current CU. To avoid the repeated scanning process of neighboring blocks, the first Merge candidate in the Merge candidate list of the current CU is used. The first available motion vector and its associated reference index are set as the temporal vector and the index of the motion source picture. In this way, in an ATMVP, the corresponding block (sometimes referred to as a collocated block) can be more accurately identified than a TMVP, where the corresponding block is always located at a lower right or center position relative to the current CU.
In the second step, a corresponding block of each sub-CU is identified by the temporal vector in the motion source picture, by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (the smallest motion grid that covers the center sample) is used to derive the motion information for the sub-CU. After the motion information of a corresponding N×N block is identified, it is converted to the motion vectors and reference indices of the current sub-CU, in the same way as TMVP of HEVC, wherein motion scaling and other procedures also apply. For example, the decoder checks whether the low-delay condition is fulfilled (i.e., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) and possibly uses motion vector MVx (the motion vector corresponding to reference picture list X) to predict motion vector MVy (with X being equal to 0 or 1 and Y being equal to 1-X) for each sub-CU.
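The two-step derivation can be sketched as follows; motionAt and scaleToCurrentRef are stubs standing in for access to the motion field of the motion source picture and for TMVP-style scaling, and precision conversion of the temporal vector is ignored.
struct MV { int x, y; };
struct SubCuMotion { MV mv; int refIdx; };
// Stubs standing in for access to the collocated motion field and for motion scaling.
SubCuMotion motionAt(int x, int y) { return { { x % 8, y % 8 }, 0 }; }
MV scaleToCurrentRef(const SubCuMotion& m) { return m.mv; }
// tempVec is assumed to already be in integer-sample precision.
void deriveAtmvp(int cuX, int cuY, int cuW, int cuH, MV tempVec, SubCuMotion* out) {
    const int N = 4;                                   // default sub-CU size
    for (int y = 0; y < cuH; y += N)
        for (int x = 0; x < cuW; x += N) {
            // center sample of the corresponding NxN block, displaced by the temporal vector
            int corrX = cuX + x + N / 2 + tempVec.x;
            int corrY = cuY + y + N / 2 + tempVec.y;
            SubCuMotion m = motionAt(corrX, corrY);    // smallest motion grid covering the center sample
            out[(y / N) * (cuW / N) + (x / N)] = { scaleToCurrentRef(m), m.refIdx };
        }
}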
2.2.2 Spatial-temporal motion vector prediction (STMVP)
In this method, the motion vectors of the sub-CUs are derived recursively, following raster scan order. Fig. 11 illustrates this concept. Consider an 8×8 CU which contains four 4×4 sub-CUs A, B, C, and D. The neighboring 4×4 blocks in the current frame are labelled a, b, c, and d.
The motion derivation for sub-CU A starts by identifying its two spatial neighbors. The first neighbor is the N×N block above sub-CU A (block c). If this block c is not available or is intra coded, the other N×N blocks above sub-CU A are checked (from left to right, starting at block c). The second neighbor is the block to the left of sub-CU A (block b). If block b is not available or is intra coded, the other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b). The motion information obtained from the neighboring blocks for each list is scaled to the first reference frame for a given list. Next, the temporal motion vector predictor (TMVP) of sub-block A is derived by following the same procedure of TMVP derivation as specified in HEVC. The motion information of the collocated block at position D is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
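The per-sub-CU combination can be sketched as follows; the neighbor fetching and scaling to the first reference frame are assumed to have been performed already, and the structure names are illustrative.
#include <vector>
#include <initializer_list>
struct Mv { int x = 0, y = 0; bool valid = false; };
// Average the up-to-three available, already-scaled motion vectors for one list.
Mv stmvpForSubCu(const Mv& aboveScaled, const Mv& leftScaled, const Mv& tmvpScaled) {
    std::vector<Mv> avail;
    for (const Mv& m : { aboveScaled, leftScaled, tmvpScaled })
        if (m.valid) avail.push_back(m);
    Mv out;
    if (avail.empty()) return out;          // fallback handling is done elsewhere
    for (const Mv& m : avail) { out.x += m.x; out.y += m.y; }
    out.x /= (int)avail.size();
    out.y /= (int)avail.size();
    out.valid = true;
    return out;
}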
2.2.3 sub-CU motion prediction mode Signaling
The sub-CU mode is enabled as an additional Merge candidate and no additional syntax elements are needed to signal the mode. Two additional Merge candidates are added to the Merge candidate list for each CU to represent ATMVP mode and STMVP mode. If the sequence parameter set indicates that ATMVP and STMVP are enabled, a maximum of seven Merge candidates are used. The coding logic of the additional Merge candidates is the same as the Merge candidates in the HM, which means that for each CU in the P or B slices, two additional Merge candidates require two more RD checks.
In JEM, all bins (bins) of the Merge index are context coded by CABAC. Whereas in HEVC, only the first bin is context coded, while the remaining bins are context bypass coded.
2.3 inter prediction method in VVC
There are several new coding tools for inter prediction improvement, such as adaptive motion vector difference resolution (AMVR) for signaling MVD, affine prediction mode, Triangular Prediction Mode (TPM), ATMVP, Generalized Bi-Prediction (GBI), and Bi-directional Optical flow (BIO).
2.3.1 Adaptive motion vector difference resolution
In HEVC, motion vector differences (MVDs) (between the motion vector and the predicted motion vector of a PU) are signaled in units of quarter luma samples when use_integer_mv_flag is equal to 0 in the slice header. In VVC, a Locally Adaptive Motion Vector Resolution (LAMVR) is introduced. In VVC, MVD can be coded in units of quarter luma samples, integer luma samples or four luma samples (i.e., 1/4-pel, 1-pel, 4-pel). The MVD resolution is controlled at the coding unit (CU) level, and MVD resolution flags are conditionally signaled for each CU that has at least one non-zero MVD component.
For a CU with at least one non-zero MVD component, a first flag is signaled to indicate whether quarter luma sample MV precision is used in the CU. When the first flag (equal to 1) indicates that the quarter-luma sample MV precision is not used, another flag is signaled to indicate whether the integer-luma sample MV precision or the four-luma sample MV precision is used.
When the first MVD resolution flag of a CU is zero or is not coded for the CU (meaning that all MVDs in the CU are zero), a quarter luma sample MV resolution is used for the CU. When the CU uses integer luminance sample MV precision or four luminance sample MV precision, the MVPs in the AMVP candidate list of the CU are rounded to the corresponding precision.
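The rounding of an MVP to the selected precision can be sketched as follows, assuming motion vector components stored in quarter-luma-sample units; this is an illustrative sketch, not the VVC reference implementation.
// shift = 0 for quarter-pel, 2 for integer-pel, 4 for four-pel precision.
int roundMvComponentToPrecision(int mvQuarterPel, int shift) {
    if (shift == 0) return mvQuarterPel;                     // quarter-pel: unchanged
    int offset = 1 << (shift - 1);                           // rounding offset
    return mvQuarterPel >= 0
               ? ((mvQuarterPel + offset) >> shift) << shift
               : -(((-mvQuarterPel + offset) >> shift) << shift);
}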
In the encoder, CU-level RD checks are used to determine which MVD resolution is to be used for a CU. That is, the CU-level RD check is performed three times for each CU, once for each MVD resolution. In order to accelerate the encoder speed, the following encoding schemes are applied in the JEM.
During the RD check of a CU with the normal quarter luma sample MVD resolution, the motion information of the current CU (integer luma sample accuracy) is stored. The stored motion information (after rounding) is used as the starting point for further small-range motion vector refinement during the RD check for the same CU with integer luma sample and 4 luma sample MVD resolution, so that the time-consuming motion estimation process is not duplicated three times.
The RD check of a CU with 4 luma sample MVD resolution is invoked conditionally. For a CU, when the RD cost of the integer luma sample MVD resolution is much larger than that of the quarter luma sample MVD resolution, the RD check of the 4 luma sample MVD resolution for the CU is skipped.
The encoding process is shown in Fig. 12. First, 1/4-pel MV is tested and the RD cost is calculated and denoted as RDCost0; then integer MV is tested and the RD cost is denoted as RDCost1. If RDCost1 < th × RDCost0 (where th is a positive value), 4-pel MV is tested; otherwise, 4-pel MV is skipped. Basically, motion information, RD cost, etc. are already known for the 1/4-pel MV when checking the integer or 4-pel MV, which can be reused to speed up the encoding process of the integer or 4-pel MV.
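The encoder-side decision described above can be sketched as follows; the cost functions are stubs returning arbitrary example values, and th is the positive threshold mentioned above.
#include <limits>
double rdCostQuarterPel() { return 100.0; }   // stub costs for illustration only
double rdCostIntegerPel() { return 101.0; }
double rdCostFourPel()    { return 103.0; }
// Returns 0 for quarter-pel, 1 for integer-pel, 2 for four-pel MVD resolution.
int selectMvdResolution(double th) {
    double rdCost0 = rdCostQuarterPel();
    double rdCost1 = rdCostIntegerPel();
    double rdCost2 = (rdCost1 < th * rdCost0) ? rdCostFourPel()          // 4-pel tested
                                              : std::numeric_limits<double>::max();  // skipped
    if (rdCost0 <= rdCost1 && rdCost0 <= rdCost2) return 0;
    if (rdCost1 <= rdCost2) return 1;
    return 2;
}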
2.3.2 triangular prediction modes
The concept of the triangular prediction mode (TPM) is to introduce a new triangular partition for motion compensated prediction. As shown in Fig. 13A-13B, it splits a CU into two triangular prediction units, in either the diagonal or the inverse diagonal direction. Each triangular prediction unit in the CU is inter-predicted using its own uni-prediction motion vector and a reference frame index derived from a single uni-prediction candidate list. An adaptive weighting process is performed on the diagonal edge after the triangular prediction units have been predicted. Then, the transform and quantization process are applied to the whole CU. It is noted that this mode only applies to the Merge mode (note: the skip mode is treated as a special Merge mode).
Fig. 13A-13B are diagrams of partitioning a CU into two triangular prediction units (two partition modes): 13A:135 degree partition type (partition from upper left corner to lower right corner); fig. 13B:45 degree split mode.
2.3.2.1 TPM uni-prediction candidate list
The uni-prediction candidate list, called the TPM motion candidate list, consists of five uni-prediction motion vector candidates. It is derived from seven neighboring blocks including five spatial neighboring blocks (1 to 5) and two temporal collocated blocks (6 to 7), as shown in Fig. 14. The motion vectors of the seven neighboring blocks are collected and put into the uni-prediction candidate list in the following order: the uni-prediction motion vectors, the L0 motion vectors of bi-prediction motion vectors, the L1 motion vectors of bi-prediction motion vectors, and the averages of the L0 and L1 motion vectors of bi-prediction motion vectors. If the number of candidates is less than five, zero motion vectors are added to the list. Motion candidates added to this list for the TPM are named TPM candidates, and motion information derived from the spatial/temporal blocks is named regular motion candidates.
And more particularly to the steps of:
1)when adding conventional motion candidates from spatial neighboring blocks, complete clipping operation is performed From A 1 ,B 1 ,B 0 ,A 0 ,B 2 Conventional motion candidates are obtained in Col and Col2 (corresponding to blocks 1-7 in fig. 14).
2) Setting variable numcurrmerrgecand=0
3) For slave A 1 ,B 1 ,B 0 ,A 0 ,B 2 Each conventional motion candidate deduced for Col and Col2 (if not clipped and numCurrMergeCand is less than 5), if the conventional motion candidate is uni-directionally predicted (from list 0 or list 1), it is added directly to the Merge list as a TPM candidate and numCurrMergeCand is incremented by 1. Such TPM candidates are referred to as "candidates for original unidirectional predictions".
Application ofComplete cutting。
4) For slave A 1 ,B 1 ,B 0 ,A 0 ,B 2 Each motion candidate that Col and Col2 derive (if not clipped and numCurrMergeCand is less than 5), if the regular motion candidate is bi-predicted, then the motion information in list 0 is added to the TPM Merge list as a new TPM candidate (i.e., modified to be uni-directional prediction from list 0), and numCurrMergeCand is incremented by 1. Such TPM candidates are referred to as "Truncated (Truncated) list 0 predicted candidates".
Application ofComplete cutting。
5) For slave A 1 ,B 1 ,B 0 ,A 0 ,B 2 Each motion candidate (if not clipped and numCurrMergeCand less than 5) derived by Col and Col2, if the regular motion candidate is bi-predictive, then the motion information in list 1 is added to the TPM Merge list (i.e., modified to uni-directional prediction from list 1), and num CurrMergeCand increases by 1. Such TPM candidates are referred to as "truncated list 1 predicted candidates".
Application ofComplete cutting。
6) For slave A 1 ,B 1 ,B 0 ,A 0 ,B 2 Each motion candidate deduced for Col and Col2 (if not clipped and numCurrMergeCand less than 5), if the normal motion candidate is bi-predictive,
if the band QP of the list 0 reference picture is smaller than the band QP of the list 1 reference picture, then the motion information of list 1 is first scaled to the list 0 reference picture and the average of two MVs (one from the original list 0 and the other is the scaled MV from list 1) is added to the tpmmorge list, such candidates being referred to as average uni-directional predictions from the list 0 motion candidate and numCurrMergeCand increased by 1.
Otherwise, the motion information of list 0 is scaled to list 1 reference picture first, then the average of two MVs (one from original list 1 and the other from list 0 scaled MV) is added to the TPM Merge list, such a TPM candidate is called average unidirectional prediction from list 1 motion candidate, and numCurrMergeCand is incremented by 1.
Application ofComplete cutting。
7) If numCurrMergeCand is less than 5, zero motion vector candidates are added.
When inserting a candidate into the list, the process is called full pruning if the candidate has to be compared with all previously added candidates to check whether it is identical to any one of them.
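To make the ordering of steps 1)-7) concrete, the following is a minimal C++ sketch of this list construction. The candidate structure and helper names (e.g., addIfNotPruned) are illustrative only and are not part of any specification, and step 6) (the averaged candidates, which require MV scaling) is omitted for brevity.

#include <vector>

struct TpmMv { int x = 0, y = 0; };
struct TpmCand {
  bool hasL0 = false, hasL1 = false;      // bi-prediction when both are true
  TpmMv mvL0, mvL1;
  int refIdxL0 = -1, refIdxL1 = -1;
};

static bool sameTpmCand(const TpmCand& a, const TpmCand& b) {
  return a.hasL0 == b.hasL0 && a.hasL1 == b.hasL1 &&
         a.mvL0.x == b.mvL0.x && a.mvL0.y == b.mvL0.y &&
         a.mvL1.x == b.mvL1.x && a.mvL1.y == b.mvL1.y &&
         a.refIdxL0 == b.refIdxL0 && a.refIdxL1 == b.refIdxL1;
}

// "Full pruning": the candidate is compared with every candidate already in the list.
static void addIfNotPruned(std::vector<TpmCand>& list, const TpmCand& c, size_t maxCands) {
  if (list.size() >= maxCands) return;
  for (const TpmCand& e : list)
    if (sameTpmCand(e, c)) return;
  list.push_back(c);
}

// Build the uni-prediction TPM Merge list from the regular motion candidates derived
// from A1, B1, B0, A0, B2, Col and Col2. Step 6) (averaging with scaling) is omitted.
std::vector<TpmCand> buildTpmMergeList(const std::vector<TpmCand>& regular) {
  const size_t maxCands = 5;
  std::vector<TpmCand> tpm;
  for (const TpmCand& r : regular)            // step 3): originally uni-predicted candidates
    if (r.hasL0 != r.hasL1) addIfNotPruned(tpm, r, maxCands);
  for (const TpmCand& r : regular)            // step 4): truncated list-0 predicted candidates
    if (r.hasL0 && r.hasL1) { TpmCand c = r; c.hasL1 = false; addIfNotPruned(tpm, c, maxCands); }
  for (const TpmCand& r : regular)            // step 5): truncated list-1 predicted candidates
    if (r.hasL0 && r.hasL1) { TpmCand c = r; c.hasL0 = false; addIfNotPruned(tpm, c, maxCands); }
  while (tpm.size() < maxCands) {             // step 7): pad with zero motion vectors
    TpmCand z; z.hasL0 = true; z.refIdxL0 = 0;
    tpm.push_back(z);
  }
  return tpm;
}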
2.3.2.2 adaptive weighting procedure
After predicting each of the triangular prediction units, an adaptive weighting process is applied to the diagonal edges between the two triangular prediction units to derive the final prediction of the entire CU. Two weighting factor sets are defined as follows:
first set of weighting factors: {7/8, 6/8, 4/8, 2/8, 1/8} and {7/8, 4/8, 1/8} are used for luminance and chrominance samples, respectively;
second set of weighting factors: {7/8, 6/8, 5/8, 4/8, 3/8, 2/8, 1/8} and {6/8, 4/8, 2/8} are used for luminance and chrominance samples, respectively.
A set of weighting factors is selected based on a comparison of the motion vectors of the two triangular prediction units. The second set of weighting factors is used when the reference pictures of the two triangular prediction units are different from each other or their motion vectors differ by more than 16 pixels. Otherwise, the first set of weighting factors is used.
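The selection rule above can be illustrated with a minimal sketch. The 16-sample threshold follows the text but is interpreted here per MV component in luma samples, and the structure/function names are illustrative assumptions rather than specification names.

// Returns 1 for the second (finer) weighting factor set, 0 for the first set.
struct TriMotion { int refPicIdx; int mvx, mvy; };   // motion of one triangular prediction unit

int selectTpmWeightSet(const TriMotion& pu0, const TriMotion& pu1) {
  bool differentRef = (pu0.refPicIdx != pu1.refPicIdx);
  int dx = pu0.mvx - pu1.mvx;
  int dy = pu0.mvy - pu1.mvy;
  bool largeMvDiff = (dx > 16 || dx < -16 || dy > 16 || dy < -16);   // "differ by more than 16 pixels"
  return (differentRef || largeMvDiff) ? 1 : 0;
}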
2.3.2.3 Signaling of Triangle Prediction Mode (TPM)
A one-bit flag indicating whether to use the TPM may first be signaled. Then, an indication of the two partition modes is further signaled (as shown in fig. 13A-13B), along with a Merge index selected for each of the two partitions.
2.3.2.3.1 Signaling of the TPM flag
Let us denote the width and height of a luma block by W and H, respectively. If W x H < 64, the triangle prediction mode is disabled.
When a block is encoded in affine mode, the triangular prediction mode is also disabled.
When a block is encoded in Merge mode, a one bit flag may be signaled to indicate whether or not the triangulation mode is enabled or disabled for the block.
The flag is coded with 3 contexts, based on the following equation:
Ctx index = ((left block L available && L is coded with TPM) ? 1 : 0) + ((above block A available && A is coded with TPM) ? 1 : 0)
FIG. 15 shows an example of neighboring blocks (A and L) for context selection in TPM flag encoding.
2.3.2.3.2 Signaling of an indication of the two partition modes (as shown in fig. 13) and of the Merge index selected for each of the two partitions
Note that the partition mode and the two partitions' Merge indices are jointly coded. In existing implementations, it is restricted that the two partitions cannot use the same Merge candidate index. Therefore, there are 2 (partition modes) x N (maximum number of Merge candidates) x (N - 1) possibilities, where N is set to 5. One indication is coded, and the mapping between the partition mode, the two Merge indices and the coded indication is derived from the array defined below:
const uint8_t g_TriangleCombination[TRIANGLE_MAX_NUM_CANDS][3]={
{0,1,0},{1,0,1},{1,0,2},{0,0,1},{0,2,0},
{1,0,3},{1,0,4},{1,1,0},{0,3,0},{0,4,0},
{0,0,2},{0,1,2},{1,1,2},{0,0,4},{0,0,3},
{0,1,3},{0,1,4},{1,1,4},{1,1,3},{1,2,1},
{1,2,0},{0,2,1},{0,4,3},{1,3,0},{1,3,2},
{1,3,4},{1,4,0},{1,3,1},{1,2,3},{1,4,1},
{0,4,1},{0,2,3},{1,4,2},{0,3,2},{1,4,3},
{0,3,1},{0,2,4},{1,2,4},{0,4,2},{0,3,4}};
Partition mode (45 degree or 135 degree) = g_TriangleCombination[signaled indication][0];
The Merge index of candidate A = g_TriangleCombination[signaled indication][1];
The Merge index of candidate B = g_TriangleCombination[signaled indication][2];
Once the two motion candidates A and B are derived, the motion information of the two partitions (PU1 and PU2) can be set from either A or B. Whether PU1 uses the motion information of Merge candidate A or of candidate B depends on the prediction directions of the two motion candidates. Table 1 shows the relationship between the two derived motion candidates A and B and the two partitions.
Table 1: Derivation of the partitions' motion information from the two derived Merge candidates (A, B)
merge_triangle_idx is within the range [0, 39], inclusive. The K-th order Exponential Golomb (EG) code is used for the binarization of merge_triangle_idx, where K is set to 1.
K-th order Exp-Golomb (EG) coding
To encode larger numbers with fewer bits (at the cost of encoding smaller numbers using more bits), this can be generalized using a non-negative integer parameter k. To encode a non-negative integer x with an order-k exp-Golomb code:
1. Encode floor(x / 2^k) using the order-0 exp-Golomb code described above, then
2. Encode x mod 2^k in binary using k bits.
Table 1: exp-Golomb-k coding examples
2.3.3 affine motion compensated prediction
In HEVC, only a translational motion model is applied for motion compensated prediction (MCP). However, there may be various kinds of motion in the real world, e.g., zoom in/out, rotation, perspective motion, and other irregular motions. In VVC, a simplified affine transform motion compensation prediction is applied, with a 4-parameter affine model and a 6-parameter affine model. As shown in FIGS. 16A-16B, the affine motion field of a block is described by two control point motion vectors (CPMVs) for the 4-parameter affine model (FIG. 16A) and by 3 CPMVs for the 6-parameter affine model (FIG. 16B).
The motion vector field (MVF) of a block is described by the following equations: the 4-parameter affine model in equation (1) (in which the 4 parameters are defined as the variables a, b, e and f) and the 6-parameter affine model in equation (2) (in which the 6 parameters are defined as the variables a, b, c, d, e and f):

mv^h(x, y) = a*x - b*y + e = ((mv_1^h - mv_0^h)/w)*x - ((mv_1^v - mv_0^v)/w)*y + mv_0^h
mv^v(x, y) = b*x + a*y + f = ((mv_1^v - mv_0^v)/w)*x + ((mv_1^h - mv_0^h)/w)*y + mv_0^v      (1)

mv^h(x, y) = a*x + c*y + e = ((mv_1^h - mv_0^h)/w)*x + ((mv_2^h - mv_0^h)/h)*y + mv_0^h
mv^v(x, y) = b*x + d*y + f = ((mv_1^v - mv_0^v)/w)*x + ((mv_2^v - mv_0^v)/h)*y + mv_0^v      (2)

where (mv_0^h, mv_0^v) is the motion vector of the top-left corner control point, (mv_1^h, mv_1^v) is the motion vector of the top-right corner control point, and (mv_2^h, mv_2^v) is the motion vector of the bottom-left corner control point; all three motion vectors are referred to as control point motion vectors (CPMVs). (x, y) represents the coordinates of a representative point relative to the top-left sample within the current block, and (mv^h(x, y), mv^v(x, y)) is the motion vector derived for the sample located at (x, y). The CP motion vectors may be signaled (e.g., in affine AMVP mode) or derived on-the-fly (e.g., in affine Merge mode). w and h are the width and height of the current block. In practice, the division is implemented by right-shift with a rounding operation. In VTM, the representative point is defined as the center position of a sub-block; e.g., when the coordinates of the top-left corner of a sub-block relative to the top-left sample within the current block are (xs, ys), the coordinates of the representative point are defined as (xs+2, ys+2). For each sub-block (i.e., 4x4 in VTM), the representative point is used to derive the motion vector for the whole sub-block.
To further simplify motion compensated prediction, sub-block based affine transformation prediction is applied. In order to derive the motion vector of each m×n (in the current VVC, M and N are both set to 4) sub-block, as shown in fig. 17, the motion vector of the center sample of each sub-block is calculated according to equations (1) and (2), and rounded to a fractional precision of 1/16. Then, a motion compensated interpolation filter suitable for 1/16 pixels is applied to generate a prediction for each sub-block with a derived motion vector. Affine mode introduces a 1/16 pixel interpolation filter.
After MCP, the high precision motion vector for each sub-block is rounded and saved to the same precision as the conventional motion vector.
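The per-sub-block derivation described above can be sketched as follows for the 4-parameter model of equation (1). This is a minimal illustration only: it uses floating point for clarity, whereas an actual implementation uses fixed-point arithmetic with rounding and right shifts, and the function and type names are assumptions.

#include <cmath>
#include <vector>

struct SubblockMv { double x, y; };   // motion vector in 1/16-sample units

// Derive the MV of each 4x4 sub-block of a W x H block from the two CPMVs of the
// 4-parameter affine model; the representative point is the sub-block center (xs+2, ys+2).
std::vector<SubblockMv> deriveAffineSubblockMvs(SubblockMv cpmv0, SubblockMv cpmv1, int W, int H) {
  std::vector<SubblockMv> out;
  double a = (cpmv1.x - cpmv0.x) / W;   // (mv_1^h - mv_0^h) / w
  double b = (cpmv1.y - cpmv0.y) / W;   // (mv_1^v - mv_0^v) / w
  for (int ys = 0; ys < H; ys += 4) {
    for (int xs = 0; xs < W; xs += 4) {
      double cx = xs + 2.0, cy = ys + 2.0;                      // sub-block center
      double mvx = a * cx - b * cy + cpmv0.x;                   // equation (1), horizontal
      double mvy = b * cx + a * cy + cpmv0.y;                   // equation (1), vertical
      out.push_back({ std::round(mvx), std::round(mvy) });      // rounded to 1/16-sample precision
    }
  }
  return out;
}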
2.3.3.1 Signaling of affine predictions
Similar to the translational motion model, there are also two modes for signaling auxiliary information due to affine prediction. They are affine_inter and affine_merge modes.
2.3.3.2 AF_INTER mode
For CUs with width and height both greater than 8, the af_inter mode may be applied. Affine flags at CU level are signaled in the bitstream to indicate whether af_inter mode is used.
In this mode, for each reference picture list (list 0 or list 1), an affine AMVP candidate list is constructed from three types of affine motion predictors in the following order, where each candidate includes the estimated CPMVs of the current block. The differences between the best CPMVs found at the encoder side (such as mv0, mv1, mv2 in fig. 20) and the estimated CPMVs are signaled. In addition, the index of the affine AMVP candidate from which the estimated CPMVs are derived is further signaled.
1) Inherited affine motion predictor
The checking order is similar to that of spatial MVPs in the HEVC AMVP list construction. First, a left inherited affine motion predictor is derived from the first block in {A1, A0} that is affine coded and has the same reference picture as the current block. Second, an above inherited affine motion predictor is derived from the first block in {B1, B0, B2} that is affine coded and has the same reference picture as the current block. The five blocks A1, A0, B1, B0, B2 are depicted in fig. 19.
Once a neighboring block is found to be coded in affine mode, the CPMVs of the coding unit covering that neighboring block are used to derive predictors of the CPMVs of the current block. For example, if A1 is coded with a non-affine mode and A0 is coded with the 4-parameter affine mode, the left inherited affine MV predictor will be derived from A0. In this case, the CPMVs of the CU covering A0, namely the top-left CPMV and the top-right CPMV shown in FIG. 21, are utilized to derive the estimated CPMVs of the current block for the top-left (coordinates (x0, y0)), top-right (coordinates (x1, y1)) and lower-right (coordinates (x2, y2)) positions of the current block.
2) Constructed affine motion predictors
A constructed affine motion predictor consists of control point motion vectors (CPMVs) that are derived from neighboring inter-coded blocks having the same reference picture, as shown in fig. 20. If the current affine motion model is the 4-parameter affine, the number of CPMVs is 2; otherwise, if the current affine motion model is the 6-parameter affine, the number of CPMVs is 3. The top-left CPMV is derived from the MV at the first block in the group {A, B, C} that is inter coded and has the same reference picture as the current block. The top-right CPMV is derived from the MV at the first block in the group {D, E} that is inter coded and has the same reference picture as the current block. The bottom-left CPMV is derived from the MV at the first block in the group {F, G} that is inter coded and has the same reference picture as the current block.
- If the current affine motion model is the 4-parameter affine, the constructed affine motion predictor is inserted into the candidate list only if both the top-left and the top-right constructed CPMVs are found; they are then used as the estimated CPMVs for the top-left (coordinates (x0, y0)) and top-right (coordinates (x1, y1)) positions of the current block.
- If the current affine motion model is the 6-parameter affine, the constructed affine motion predictor is inserted into the candidate list only if the top-left, the top-right and the bottom-left constructed CPMVs are all found; they are then used as the estimated CPMVs for the top-left (coordinates (x0, y0)), top-right (coordinates (x1, y1)) and lower-right (coordinates (x2, y2)) positions of the current block.
No pruning process is applied when a constructed affine motion predictor is inserted into the candidate list.
3) Conventional AMVP motion predictor
The following applies until the number of affine motion predictors reaches a maximum.
1) An affine motion predictor is derived by setting all CPMVs equal to one of the CPMVs derived for the constructed affine motion predictor above (if available).
2) An affine motion predictor is derived by setting all CPMVs equal to another one of the constructed CPMVs (if available).
3) An affine motion predictor is derived by setting all CPMVs equal to the remaining constructed CPMV (if available).
4) The affine motion predictor is derived by setting all CPMV equal to HEVC TMVP (if available).
5) Affine motion predictors are derived by setting all CPMV to zero MV.
Note that these CPMVs have already been derived for the constructed affine motion predictor.
Fig. 18A shows an example of a 4-parameter affine model. Fig. 18B shows an example of a 6-parameter affine model.
Fig. 19 shows an example of MVP of af_inter of inherited affine candidates.
Fig. 20 shows an example of MVP of af_inter of the constructed affine candidate.
In the AF_INTER mode, when the 4/6-parameter affine mode is used, 2/3 control points are required, and therefore 2/3 MVDs need to be coded for these control points, as shown in fig. 18. In existing implementations, it is proposed to derive the MVs as follows, i.e., mvd_1 and mvd_2 are predicted from mvd_0:

mv_0 = mv̄_0 + mvd_0
mv_1 = mv̄_1 + mvd_1 + mvd_0
mv_2 = mv̄_2 + mvd_2 + mvd_0
Here, mv̄_i, mvd_i and mv_i are respectively the predicted motion vector, the motion vector difference and the motion vector of the top-left pixel (i = 0), the top-right pixel (i = 1) or the bottom-left pixel (i = 2), as shown in fig. 18B. Note that the addition of two motion vectors, e.g., mvA(xA, yA) and mvB(xB, yB), is performed separately on the two components, i.e., newMV = mvA + mvB, with the two components of newMV set to (xA + xB) and (yA + yB), respectively.
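A minimal sketch of the control-point MV reconstruction implied by this relation is given below (mvd_1 and mvd_2 are predicted from mvd_0, so mvd_0 is added back when reconstructing mv_1 and mv_2). The array and function names (mvp for the estimated CPMVs, mvd for the decoded differences) are illustrative assumptions, and the addition is component-wise as stated above.

struct CtrlMv { int x, y; };

void reconstructAffineCpmvs(const CtrlMv mvp[3], const CtrlMv mvd[3], CtrlMv mv[3], bool sixParam) {
  mv[0] = { mvp[0].x + mvd[0].x,              mvp[0].y + mvd[0].y };
  mv[1] = { mvp[1].x + mvd[1].x + mvd[0].x,   mvp[1].y + mvd[1].y + mvd[0].y };
  if (sixParam)   // the third control point exists only for the 6-parameter model
    mv[2] = { mvp[2].x + mvd[2].x + mvd[0].x, mvp[2].y + mvd[2].y + mvd[0].y };
}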
2.3.3.3 AF_MERGE mode
When a CU is coded in AF_MERGE mode, it obtains the first block coded in affine mode from the valid neighboring reconstructed blocks. The selection order of the candidate blocks is from left, above, above-right, below-left to above-left (denoted by A, B, C, D, E in order), as shown in fig. 21. For example, if the neighboring below-left block is coded in affine mode, as denoted by A0 in FIG. 21B, the control point (CP) motion vectors mv_0^N, mv_1^N and mv_2^N of the top-left, top-right and bottom-left corners of the neighboring CU/PU containing block A are fetched. Based on mv_0^N, mv_1^N and mv_2^N, the motion vectors mv_0^C, mv_1^C and mv_2^C of the top-left/top-right/bottom-left corners of the current CU/PU are calculated (mv_2^C is used for the 6-parameter affine model only). It should be noted that in VTM-2.0, if the current block is affine coded, the sub-block located at the top-left corner (e.g., a 4x4 block in VTM) stores mv0 and the sub-block located at the top-right corner stores mv1. If the current block is coded with the 6-parameter affine model, the sub-block located at the bottom-left corner stores mv2; otherwise (the current block is coded with the 4-parameter affine model), LB stores mv2'. The other sub-blocks store the MVs used for MC.
After the CPMVs mv_0^C, mv_1^C and mv_2^C of the current CU are derived, the MVF of the current CU is generated according to the simplified affine motion model of equations (1) and (2). In order to identify whether the current CU is coded in AF_MERGE mode, an affine flag is signaled in the bitstream when there is at least one neighboring block coded in affine mode.
In some existing implementations, the affine Merge candidate list is constructed by:
1) Inserting inherited affine candidates
Inherited affine candidates refer to candidates that are derived from affine motion models of their valid neighbor affine coding blocks. Up to two inherited affine candidates are derived from affine motion models of neighboring blocks and inserted into the candidate list. For the left predictor, the scan order is { A0, A1}; for the predictor described above, the scan order is { B0, B1, B2}.
2) Affine candidates for insertion constructs
If the number of candidates in the affine Merge candidate list is less than MaxNumAffineCand (e.g., 5), constructed affine candidates are inserted into the candidate list. A constructed affine candidate means that the candidate is constructed by combining the neighboring motion information of each control point.
a) Motion information for the control points is first derived from the designated spatial neighbors and temporal neighbors shown in fig. 22. CPk (k=1, 2,3, 4) represents the kth control point. A0 A1, A2, B0, B1, B2 and B3 are spatial locations for predicting CPk (k=1, 2, 3); t is the time domain position used to predict CP 4.
The coordinates of CP1, CP2, CP3 and CP4 are (0, 0), (W, 0), (0, H) and (W, H), respectively, where W and H are the width and height of the current block.
Motion information for each control point is obtained according to the following priority order:
for CP1, the check priority is B2- > B3- > A2. If B2 is available, then B2 is used.
Otherwise, if B2 is not available, B3 is used. If neither B2 nor B3 is available, A2 is used.
If none of these three candidates is available, the motion information of CP1 cannot be obtained.
For CP2, the check priority is B1 -> B0;
For CP3, the check priority is A1 -> A0;
For CP4, T is used.
b) Second, affine Merge candidates are constructed using combinations of control points.
I. Motion information of three control points is required to construct a 6-parameter affine candidate. The three control points may be selected from one of the following four combinations ({CP1, CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4}, {CP1, CP3, CP4}). The combinations {CP1, CP2, CP4}, {CP2, CP3, CP4}, {CP1, CP3, CP4} will be converted into a 6-parameter motion model represented by the top-left, top-right and bottom-left control points.
Motion information of two control points is required to construct a 4-parameter affine candidate. Two control points may be selected from one of the two combinations ({ CP1, CP2}, { CP1, CP3 }). These two combinations will be converted into a 4-parameter motion model represented by the upper left and upper right control points.
Inserting the constructed combination of affine candidates into the candidate list in the following order:
{CP1,CP2,CP3},{CP1,CP2,CP4},{CP1,CP3,CP4},{CP2,CP3,CP4},{CP1,CP2},{CP1,CP3}
i. for each combination, the reference index of list X for each CP is checked, and if they are all the same, this combination has a valid CPMV for list X. If the combination has no valid CPMVs for both list 0 and list 1, then the combination is marked as invalid. Otherwise, it is valid and puts the CPMV in the sub-block Merge list.
3) Padding zero motion vectors
If the number of candidates in the affine Merge candidate list is less than 5, a zero motion vector with a zero reference index is inserted into the candidate list until the list is full.
More specifically, for the sub-block Merge candidate list, the padded candidates are 4-parameter Merge candidates with MVs set to (0, 0) and prediction direction set to uni-prediction from list 0 (for P slices) or bi-prediction (for B slices).
2.3.4 current picture reference
Intra block copying (also known as IBC or intra picture block compensation), also known as Current Picture Reference (CPR), is employed in HEVC screen content coding extension (SCC). The tool is very effective for the coding of screen content video, because repeated patterns of text and graphics rich content often occur in the same picture. Using previously reconstructed blocks having the same or similar patterns as predictors can effectively reduce prediction errors and thus improve coding efficiency. An example of intra block compensation is shown in fig. 23.
Similar to the design of CPR in HEVC SCC, in VVC the use of the IBC mode is signaled at both the sequence and picture level. When the IBC mode is enabled at the sequence parameter set (SPS), it can be enabled at the picture level. When the IBC mode is enabled at the picture level, the current reconstructed picture is treated as a reference picture. Therefore, the use of the IBC mode can be signaled on top of the existing VVC inter modes without requiring syntax changes at the block level.
The main characteristics are as follows:
-being regarded as a regular inter mode. Thus, merge and skip modes may also be used for IBC mode. The Merge candidate list construction is unified, containing Merge candidates from neighboring locations, which are encoded in IBC mode or HEVC inter mode. Depending on the selected Merge index, the current block in Merge or skip mode may be merged into one of the IBC mode encoded neighbors or regular inter mode encoded with a different picture as a reference picture.
Block vector prediction and coding scheme for IBC mode reuse the scheme for motion vector prediction and coding in HEVC inter mode (AMVP and MVD coding).
Motion vectors of IBC mode (also called block vectors) are encoded with integer pixel precision, but are stored in memory with 1/16 pixel precision after decoding, since quarter pixel precision is required in the interpolation and deblocking (deblocking) stage. When used for motion vector prediction in IBC mode, the stored vector predictor will shift to the right by 4.
Search range: limited to being within the current CTU.
When affine mode/triangle mode/GBI/weighted prediction is enabled, CPR is not allowed to be used.
2.3.5 Merge list design in VVC
VVC supports three different Merge list construction processes:
1) Sub-block Merge candidate list: it includes ATMVP and affine Merge candidates. Affine mode and ATMVP mode share one Merge list construction process. Here, the ATMVP and affine Merge candidates may be added in order. The size of the sub-block Merge list is signaled in the slice header and has a maximum value of 5.
2) Uni-prediction TPM Merge list: for the triangle prediction mode, one Merge list construction process is shared by the two partitions, even though each of the two partitions may select its own Merge candidate index. When constructing this Merge list, the spatial neighboring blocks and two temporal blocks of the block are checked. The motion information derived from the spatial neighbors and the temporal blocks is referred to as regular motion candidates in our IDF. These regular motion candidates are further used to derive a plurality of TPM candidates. Note that the transform is performed at the whole block level, even though the two partitions may use different motion vectors to generate their own prediction blocks. The uni-prediction TPM Merge list has a fixed size of 5.
3) Regular Merge list: for the remaining coded blocks, one Merge list construction process is shared. Here, spatial/temporal/HMVP candidates, pairwise combined bi-prediction Merge candidates and zero motion candidates may be inserted in order. The size of the regular Merge list is signaled in the slice header and has a maximum value of 6.
2.3.5.1 subblock Merge candidate list
It is proposed to put all sub-block related motion candidates except the regular Merge list of non-sub-block Merge candidates into a separate Merge list.
The motion candidates associated with the sub-block are placed in a separate Merge list, named "sub-block Merge candidate list".
In one example, the sub-block Merge candidate list includes affine Merge candidates, ATMVP candidates, and/or sub-block-based STMVP candidates.
2.3.5.1.1 another ATMVP embodiment
In this contribution, the ATMVP Merge candidate in the regular Merge list is moved to the first position of the affine Merge list. In this way, all the Merge candidates in the new list (i.e., the Merge candidate list based on the sub-block) are based on the sub-block coding tool.
2.3.5.1.2 ATMVP in VTM-3.0
In VTM-3.0, a special Merge candidate list, called a sub-block Merge candidate list (also called an affine Merge candidate list), is added in addition to the regular Merge candidate list. The sub-block Merge candidate list fills the candidates in the following order:
a. ATMVP candidates (which may be available or unavailable);
b. Inherited affine candidates;
c. Constructed affine candidates;
d. Zero-padded MVs
The maximum number of candidates in the sub-block Merge candidate list (denoted ML) is derived as follows:
1) If the ATMVP usage flag (e.g., the flag may be named "sps_sbtmvp_enabled_flag") is on (equal to 1), but the affine usage flag (e.g., the flag may be named "sps_affine_enabled_flag") is off (equal to 0), ML is set equal to 1.
2) If the ATMVP usage flag is off (equal to 0) and the affine usage flag is off (equal to 0), ML is set equal to 0. In this case, the sub-block Merge candidate list is not used.
3) Otherwise (the affine usage flag is on (equal to 1), and the ATMVP usage flag is either on or off), ML is signaled from the encoder to the decoder. The valid range of ML is 0 <= ML <= 5. (A sketch of this ML derivation is given after this list.)
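The following is a minimal sketch of the ML derivation in cases 1)-3) above. The flag names follow the text; the function name and the way the signaled value is passed in are illustrative assumptions.

int deriveSubblockMergeListSize(bool sps_sbtmvp_enabled_flag,
                                bool sps_affine_enabled_flag,
                                int signaledML /* 0..5, parsed from the bitstream */) {
  if (!sps_affine_enabled_flag)
    return sps_sbtmvp_enabled_flag ? 1 : 0;   // cases 1) and 2): ML is 1 or 0, list unused when 0
  return signaledML;                          // case 3): ML is signaled
}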
When constructing the sub-block Merge candidate list, the ATMVP candidate is first checked. If either of the following conditions is true, the ATMVP candidate is skipped and not put into the sub-block Merge candidate list.
1) ATMVP use flag off;
2) The TMVP usage flag is off (e.g., when signaled at the slice level, the flag may be named "slice_temporal_mvp_enabled_flag");
3) The reference picture with reference index 0 in reference list 0 is the same as the Current Picture (CPR)
ATMVP in VTM-3.0 is much simpler than in JEM. When generating ATMVP Merge candidates, the following procedure applies:
a. As shown in fig. 22, neighboring blocks A1, B0, A0 are examined to find a first inter-coded but not CPR-coded block, denoted block X;
b. Initialize TMV = (0, 0). If there is an MV of block X (denoted MV') that references the collocated reference picture (as signaled in the slice header), TMV is set equal to MV'.
c. Assuming the center point of the current block is (x0, y0), the corresponding position of (x0, y0) in the collocated picture is located as M = (x0 + MV'x, y0 + MV'y). The block Z covering M is found.
i. If Z is intra-coded, ATMVP is not available;
ii. If Z is inter coded, MVZ_0 and MVZ_1 for the two lists of block Z are scaled to (reference list 0, index 0) and (reference list 1, index 0), stored as MVdefault0 and MVdefault1.
d. For each 8x8 sub-block, assuming its center point is (x0S, y0S), the corresponding position of (x0S, y0S) in the collocated picture is located as MS = (x0S + MV'x, y0S + MV'y). The block ZS covering MS is found.
i. If ZS is intra coded, MVdefault0 and MVdefault1 are assigned to the sub-block;
ii. If ZS is inter coded, MVZS_0 and MVZS_1 for the two lists of block ZS are scaled to (reference list 0, index 0) and (reference list 1, index 0) and assigned to the sub-block;
MV clipping and masking in ATMVP:
When a corresponding position, such as M or MS, is located in the collocated picture, it is clipped to be within a predetermined region. The CTU size is S x S, with S = 128 in VTM-3.0. Assuming the top-left position of the collocated CTU is (xCTU, yCTU), a corresponding position M or MS at (xN, yN) will be clipped into the valid region xCTU <= xN < xCTU + S + 4; yCTU <= yN < yCTU + S.
In addition to the clipping, (xN, yN) is also masked as xN = xN & MASK, yN = yN & MASK, where MASK is equal to ~(2^N - 1) and N = 3, so as to set the lowest 3 bits to 0. Hence xN and yN must be multiples of 8 ("~" represents the bitwise complement operator).
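The clipping and masking described above can be summarized in a minimal sketch; the function name is illustrative and S is the CTU size (128 in VTM-3.0).

#include <algorithm>

void clipAndMaskAtmvpPosition(int& xN, int& yN, int xCTU, int yCTU, int S) {
  xN = std::min(std::max(xN, xCTU), xCTU + S + 3);   // xCTU <= xN < xCTU + S + 4
  yN = std::min(std::max(yN, yCTU), yCTU + S - 1);   // yCTU <= yN < yCTU + S
  const int MASK = ~((1 << 3) - 1);                  // clear the lowest 3 bits
  xN &= MASK;                                        // xN and yN become multiples of 8
  yN &= MASK;
}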
Fig. 24 shows an example of valid corresponding areas in the juxtaposed pictures.
2.3.5.1.3 Syntax design in the slice header
2.3.5.2 conventional Merge list
Unlike the Merge list design, in VVC, a history-based motion vector prediction (HMVP) method is employed.
In HMVP, previously coded motion information is stored. The motion information of a previously coded block is defined as an HMVP candidate. Multiple HMVP candidates are stored in a table named the HMVP table, and this table is maintained on-the-fly during the encoding/decoding process. The HMVP table is emptied when starting to encode/decode a new slice. Whenever there is an inter-coded block, the associated motion information is added to the last entry of the table as a new HMVP candidate. The overall coding flow is depicted in fig. 25.
HMVP candidates may be used in the AMVP and Merge candidate list construction process. Fig. 26 shows a modified Merge candidate list construction process (highlighted in blue). When the Merge candidate list is not full after the TMVP candidate is inserted, the Merge candidate list may be filled with HMVP candidates stored in the HMVP table. Considering that a block generally has a higher correlation in motion information with the nearest neighbor block, HMVP candidates in the table are inserted in descending order of index. The last entry in the table is added first to the list and the first entry is added last. Similarly, redundancy removal is applied to HMVP candidates. Once the total number of available Merge candidates reaches the maximum number of Merge candidates allowed to be signaled, the Merge candidate list construction process terminates.
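The HMVP behaviour described in this subsection can be illustrated with a minimal sketch: the table is emptied when a new slice starts, each inter-coded block appends its motion information as the last entry, and Merge-list filling visits the entries from the most recent to the oldest while skipping duplicates. The table size (6 here) and the pruning details are simplifying assumptions of this sketch, not specification values.

#include <deque>
#include <vector>

struct HmvpCand {
  int mvx, mvy, refIdx;
  bool operator==(const HmvpCand& o) const {
    return mvx == o.mvx && mvy == o.mvy && refIdx == o.refIdx;
  }
};

struct HmvpTable {
  std::deque<HmvpCand> entries;
  static const size_t kMaxSize = 6;            // illustrative table size

  void reset() { entries.clear(); }            // called at the start of a new slice

  void add(const HmvpCand& c) {                // new candidate becomes the last entry
    if (entries.size() == kMaxSize) entries.pop_front();
    entries.push_back(c);
  }

  // Append HMVP candidates to the Merge list, last table entry first, skipping duplicates.
  void fillMergeList(std::vector<HmvpCand>& mergeList, size_t maxMergeCands) const {
    for (auto it = entries.rbegin(); it != entries.rend(); ++it) {
      if (mergeList.size() >= maxMergeCands) break;
      bool duplicate = false;
      for (const HmvpCand& m : mergeList)
        if (m == *it) { duplicate = true; break; }
      if (!duplicate) mergeList.push_back(*it);
    }
  }
};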
2.4 MV rounding
In VVC, when an MV is right-shifted, it is required to be rounded toward zero. In a formulated way, for an MV (MVx, MVy) to be right-shifted by N bits, the result MV' (MVx', MVy') is derived as:
MVx' = (MVx + ((1 << N) >> 1) - (MVx >= 0 ? 1 : 0)) >> N;
MVy' = (MVy + ((1 << N) >> 1) - (MVy >= 0 ? 1 : 0)) >> N;
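The formulas above amount to a rounding-toward-zero right shift of each MV component; a minimal sketch (with an illustrative function name) is:

int roundMvComponentTowardZero(int v, int N) {
  int offset = (1 << N) >> 1;
  return (v + offset - (v >= 0 ? 1 : 0)) >> N;   // e.g., N = 1: 3 -> 1, -3 -> -1, 1 -> 0, -1 -> 0
}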
2.5 Reference Picture Resampling (RPR) embodiment
ARC, also known as Reference Picture Resampling (RPR), has been incorporated into some existing and upcoming video standards.
In some embodiments of the RPR, if the collocated picture has a different resolution than the current picture, the TMVP is disabled. Furthermore, when the reference picture has a different resolution than the current picture, BDOF and DMVR are disabled.
In order to process a conventional MC when a reference picture has a different resolution from a current picture, an interpolation section is defined as follows:
8.5.6.3 fractional sample interpolation process
8.5.6.3.1 overview
The inputs to this process are:
-a luminance position (xSb, ySb) specifying an upper left corner sample of the current coding sub-block relative to an upper left corner luminance sample of the current picture;
the variable sbWidth, specifies the width of the current coding sub-block,
the variable sbHeight, specifies the height of the current coded sub-block,
motion vector offset mvOffset,
a refined motion vector refMvLX,
the selected reference picture sample number group refPicLX,
half-pel interpolation filter index hpelIfIdx,
the bidirectional optical flow flag bdofFlag,
the variable cIdx specifies the color component index of the current block.
The output of this process is:
-an array predSamplesLX of (sbwidth+brdextsize) x (sbheight+brdextsize) of predicted sample values.
The prediction block boundary extension size brdExtSize is derived as follows:
brdExtSize=(bdofFlag||(inter_affine_flag[xSb][ySb]&&sps_affine_prof_enabled_flag))?2:0(8-752)
the variable fRefWidth is set equal to PicOutputWidthL of the reference picture in the luminance sample.
The variable fRefHeight is set equal to the PicOutputHeightL of the reference picture in luma samples.
The motion vector mvLX is set equal to (refMvLX-mvOffset).
-if cIdx is equal to 0, applying the following operations:
- The scaling factors and their fixed-point representations are defined as
hori_scale_fp = ((fRefWidth << 14) + (PicOutputWidthL >> 1)) / PicOutputWidthL      (8-753)
vert_scale_fp = ((fRefHeight << 14) + (PicOutputHeightL >> 1)) / PicOutputHeightL      (8-754)
Let (xIntL, yIntL) be a luma location given in full-sample units and (xFracL, yFracL) be an offset given in 1/16-sample units. These variables are used only in this clause for specifying fractional-sample locations inside the reference sample array refPicLX.
- The top-left coordinate of the bounding block for reference sample padding (xSbIntL, ySbIntL) is set equal to (xSb + (mvLX[0] >> 4), ySb + (mvLX[1] >> 4)).
For each luma sample location (xL = 0..sbWidth - 1 + brdExtSize, yL = 0..sbHeight - 1 + brdExtSize) inside the prediction luma sample array predSamplesLX, the corresponding prediction luma sample value predSamplesLX[xL][yL] is derived as follows:
- Let (refxSbL, refySbL) and (refxL, refyL) be luma locations pointed to by a motion vector (refMvLX[0], refMvLX[1]) given in 1/16-sample units. The variables refxSbL, refxL, refySbL and refyL are derived as follows:
refxSbL = ((xSb << 4) + refMvLX[0]) * hori_scale_fp      (8-755)
refxL = ((Sign(refxSbL) * ((Abs(refxSbL) + 128) >> 8) + xL * ((hori_scale_fp + 8) >> 4)) + 32) >> 6      (8-756)
refySbL = ((ySb << 4) + refMvLX[1]) * vert_scale_fp      (8-757)
refyL = ((Sign(refySbL) * ((Abs(refySbL) + 128) >> 8) + yL * ((vert_scale_fp + 8) >> 4)) + 32) >> 6      (8-758)
- The variables xIntL, yIntL, xFracL and yFracL are derived as follows:
xIntL = refxL >> 4      (8-759)
yIntL = refyL >> 4      (8-760)
xFracL = refxL & 15      (8-761)
yFracL = refyL & 15      (8-762)
- If bdofFlag is equal to TRUE or (sps_affine_prof_enabled_flag is equal to TRUE and inter_affine_flag[xSb][ySb] is equal to TRUE), and one or more of the following conditions are true, the prediction luma sample value predSamplesLX[xL][yL] is derived by invoking the luma integer sample fetching process as specified in clause 8.5.6.3.3 with (xIntL + (xFracL >> 3) - 1, yIntL + (yFracL >> 3) - 1) and refPicLX as inputs:
1. xL is equal to 0.
2. xL is equal to sbWidth + 1.
3. yL is equal to 0.
4. yL is equal to sbHeight + 1.
- Otherwise, the prediction luma sample value predSamplesLX[xL][yL] is derived by invoking the luma sample 8-tap interpolation filtering process as specified in clause 8.5.6.3.2 with (xIntL - (brdExtSize > 0 ? 1 : 0), yIntL - (brdExtSize > 0 ? 1 : 0)), (xFracL, yFracL), (xSbIntL, ySbIntL), refPicLX, hpelIfIdx, sbWidth, sbHeight and (xSb, ySb) as inputs.
Otherwise (cIdx is not equal to 0), the following applies:
Let (xIntC, yIntC) be a chroma location given in full-sample units and (xFracC, yFracC) be an offset given in 1/32-sample units. These variables are used only in this clause for specifying general fractional-sample locations inside the reference sample array refPicLX.
- The top-left coordinate of the bounding block for reference sample padding (xSbIntC, ySbIntC) is set equal to ((xSb / SubWidthC) + (mvLX[0] >> 5), (ySb / SubHeightC) + (mvLX[1] >> 5)).
- For each chroma sample location (xC = 0..sbWidth - 1, yC = 0..sbHeight - 1) inside the prediction chroma sample array predSamplesLX, the corresponding prediction chroma sample value predSamplesLX[xC][yC] is derived as follows:
- Let (refxSbC, refySbC) and (refxC, refyC) be chroma locations pointed to by a motion vector (mvLX[0], mvLX[1]) given in 1/32-sample units. The variables refxSbC, refySbC, refxC and refyC are derived as follows:
refxSbC = ((xSb / SubWidthC << 5) + mvLX[0]) * hori_scale_fp      (8-763)
refxC = ((Sign(refxSbC) * ((Abs(refxSbC) + 256) >> 9) + xC * ((hori_scale_fp + 8) >> 4)) + 16) >> 5      (8-764)
refySbC = ((ySb / SubHeightC << 5) + mvLX[1]) * vert_scale_fp      (8-765)
refyC = ((Sign(refySbC) * ((Abs(refySbC) + 256) >> 9) + yC * ((vert_scale_fp + 8) >> 4)) + 16) >> 5      (8-766)
- The variables xIntC, yIntC, xFracC and yFracC are derived as follows:
xIntC = refxC >> 5      (8-767)
yIntC = refyC >> 5      (8-768)
xFracC = refxC & 31      (8-769)
yFracC = refyC & 31      (8-770)
- The prediction sample value predSamplesLX[xC][yC] is derived by invoking the process specified in clause 8.5.6.3.4 with (xIntC, yIntC), (xFracC, yFracC), (xSbIntC, ySbIntC), sbWidth, sbHeight and refPicLX as inputs.
8.5.6.3.2 luminance sample interpolation filtering process
The input of the program is
- a luma location in full-sample units (xIntL, yIntL),
- a luma location in fractional-sample units (xFracL, yFracL),
- a luma location in full-sample units (xSbIntL, ySbIntL), specifying the top-left sample of the bounding block for reference sample padding relative to the top-left luma sample of the reference picture,
- the luma reference sample array refPicLXL,
half-pel interpolation filter index hpelIfIdx,
a variable sbWidth, specifying the width of the current sub-block,
a variable sbHeight, specifying the height of the current sub-block,
-a luminance position (xSb, ySb) specifying an upper left corner sample of the current sub-block relative to an upper left corner luminance sample of the current picture;
the output of this process is the predicted luminance sample value predSampleLX L
The variables shift1, shift2 and shift3 are derived as follows:
setting the variable shift1 equal to Min (4, bitdepth Y -8), variable shift2 is set equal to 6, and variable shift3 is set equal to Max (2, 14-BitDepth) Y )。
Setting the variable picW equal to pic_width_in_luma_samples and the variable picH equal to pic_height_in_luma_samples.
The luma interpolation filter coefficients fL[p] for each 1/16 fractional sample position p equal to xFracL or yFracL are derived as follows:
- If MotionModelIdc[xSb][ySb] is greater than 0 and both sbWidth and sbHeight are equal to 4, the luma interpolation filter coefficients fL[p] are specified in Tables 8-12.
- Otherwise, the luma interpolation filter coefficients fL[p] are specified in Tables 8-11 depending on hpelIfIdx.
The luma locations in full-sample units (xInti, yInti) are derived as follows for i = 0..7:
- If subpic_treated_as_pic_flag[SubPicIdx] is equal to 1, the following applies:
xInti = Clip3(SubPicLeftBoundaryPos, SubPicRightBoundaryPos, xIntL + i - 3)      (8-771)
yInti = Clip3(SubPicTopBoundaryPos, SubPicBotBoundaryPos, yIntL + i - 3)      (8-772)
- Otherwise (subpic_treated_as_pic_flag[SubPicIdx] is equal to 0), the following applies:
xInti = Clip3(0, picW - 1, sps_ref_wraparound_enabled_flag ? ClipH((sps_ref_wraparound_offset_minus1 + 1) * MinCbSizeY, picW, xIntL + i - 3) : xIntL + i - 3)      (8-773)
yInti = Clip3(0, picH - 1, yIntL + i - 3)      (8-774)
The luma locations in full-sample units are further modified as follows for i = 0..7:
xInti = Clip3(xSbIntL - 3, xSbIntL + sbWidth + 4, xInti)      (8-775)
yInti = Clip3(ySbIntL - 3, ySbIntL + sbHeight + 4, yInti)      (8-776)
The predicted luma sample value predSampleLXL is derived as follows:
- If both xFracL and yFracL are equal to 0, the value of predSampleLXL is derived as follows:
predSampleLXL = refPicLXL[xInt3][yInt3] << shift3      (8-777)
- Otherwise, if xFracL is not equal to 0 and yFracL is equal to 0, the value of predSampleLXL is derived as follows:
predSampleLXL = (Σ i=0..7 fL[xFracL][i] * refPicLXL[xInti][yInt3]) >> shift1      (8-778)
- Otherwise, if xFracL is equal to 0 and yFracL is not equal to 0, the value of predSampleLXL is derived as follows:
predSampleLXL = (Σ i=0..7 fL[yFracL][i] * refPicLXL[xInt3][yInti]) >> shift1      (8-779)
- Otherwise, if xFracL is not equal to 0 and yFracL is not equal to 0, the value of predSampleLXL is derived as follows:
- The sample array temp[n] with n = 0..7 is derived as follows:
temp[n] = (Σ i=0..7 fL[xFracL][i] * refPicLXL[xInti][yIntn]) >> shift1      (8-780)
- The predicted luma sample value predSampleLXL is derived as follows:
predSampleLXL = (Σ i=0..7 fL[yFracL][i] * temp[i]) >> shift2      (8-781)
Tables 8-11 - Specification of the luma interpolation filter coefficients fL[p] for each 1/16 fractional sample position p
Tables 8-12 - Specification of the luma interpolation filter coefficients fL[p] for each 1/16 fractional sample position p in affine motion mode
8.5.6.3.3 brightness integer sampling point acquisition process
The inputs of this process are:
- a luma location in full-sample units (xIntL, yIntL),
- the luma reference sample array refPicLXL.
The output of this process is the predicted luma sample value predSampleLXL.
The variable shift is set equal to Max(2, 14 - BitDepthY).
The variable picW is set equal to pic_width_in_luma_samples and the variable picH is set equal to pic_height_in_luma_samples.
The luma locations in full-sample units (xInt, yInt) are derived as follows:
xInt = Clip3(0, picW - 1, sps_ref_wraparound_enabled_flag ? ClipH((sps_ref_wraparound_offset_minus1 + 1) * MinCbSizeY, picW, xIntL) : xIntL)      (8-782)
yInt = Clip3(0, picH - 1, yIntL)      (8-783)
The predicted luma sample value predSampleLXL is derived as follows:
predSampleLXL = refPicLXL[xInt][yInt] << shift3      (8-784)
8.5.6.3.4 chroma sample interpolation process
The inputs of this process are:
- a chroma location in full-sample units (xIntC, yIntC),
- a chroma location in 1/32 fractional-sample units (xFracC, yFracC),
- a chroma location in full-sample units (xSbIntC, ySbIntC), specifying the top-left sample of the bounding block for reference sample padding relative to the top-left chroma sample of the reference picture,
a variable sbWidth, specifying the width of the current sub-block,
a variable sbHeight, specifying the height of the current sub-block,
- the chroma reference sample array refPicLXC.
The output of this process is the predicted chroma sample value predSampleLXC.
The variables shift1, shift2 and shift3 are derived as follows:
The variable shift1 is set equal to Min(4, BitDepthC - 8), the variable shift2 is set equal to 6, and the variable shift3 is set equal to Max(2, 14 - BitDepthC).
The variable picWC is set equal to pic_width_in_luma_samples / SubWidthC and the variable picHC is set equal to pic_height_in_luma_samples / SubHeightC.
The chroma interpolation filter coefficients fC[p] for each 1/32 fractional sample position p equal to xFracC or yFracC are specified in Tables 8-13.
The variable xOffset is set equal to ((sps_ref_wraparound_offset_minus1 + 1) * MinCbSizeY) / SubWidthC.
The chroma locations in full-sample units (xInti, yInti) are derived as follows for i = 0..3:
- If subpic_treated_as_pic_flag[SubPicIdx] is equal to 1, the following applies:
xInti = Clip3(SubPicLeftBoundaryPos / SubWidthC, SubPicRightBoundaryPos / SubWidthC, xIntL + i)      (8-785)
yInti = Clip3(SubPicTopBoundaryPos / SubHeightC, SubPicBotBoundaryPos / SubHeightC, yIntL + i)      (8-786)
- Otherwise (subpic_treated_as_pic_flag[SubPicIdx] is equal to 0), the following applies:
xInti = Clip3(0, picWC - 1, sps_ref_wraparound_enabled_flag ? ClipH(xOffset, picWC, xIntC + i - 1) : xIntC + i - 1)      (8-787)
yInti = Clip3(0, picHC - 1, yIntC + i - 1)      (8-788)
The chroma locations in full-sample units (xInti, yInti) are further modified as follows for i = 0..3:
xInti = Clip3(xSbIntC - 1, xSbIntC + sbWidth + 2, xInti)      (8-789)
yInti = Clip3(ySbIntC - 1, ySbIntC + sbHeight + 2, yInti)      (8-790)
The predicted chroma sample value predSampleLXC is derived as follows:
- If both xFracC and yFracC are equal to 0, the value of predSampleLXC is derived as follows:
predSampleLXC = refPicLXC[xInt1][yInt1] << shift3      (8-791)
- Otherwise, if xFracC is not equal to 0 and yFracC is equal to 0, the value of predSampleLXC is derived as follows:
predSampleLXC = (Σ i=0..3 fC[xFracC][i] * refPicLXC[xInti][yInt1]) >> shift1      (8-792)
- Otherwise, if xFracC is equal to 0 and yFracC is not equal to 0, the value of predSampleLXC is derived as follows:
predSampleLXC = (Σ i=0..3 fC[yFracC][i] * refPicLXC[xInt1][yInti]) >> shift1      (8-793)
- Otherwise, if xFracC is not equal to 0 and yFracC is not equal to 0, the value of predSampleLXC is derived as follows:
- The sample array temp[n] with n = 0..3 is derived as follows:
temp[n] = (Σ i=0..3 fC[xFracC][i] * refPicLXC[xInti][yIntn]) >> shift1      (8-794)
- The predicted chroma sample value predSampleLXC is derived as follows:
predSampleLXC = (fC[yFracC][0] * temp[0] + fC[yFracC][1] * temp[1] + fC[yFracC][2] * temp[2] + fC[yFracC][3] * temp[3]) >> shift2      (8-795)
Tables 8-13 - Specification of the chroma interpolation filter coefficients fC[p] for each 1/32 fractional sample position p
2.6 embodiment employing sub-pictures
With the current syntax design of the sub-picture in the existing implementation, the position and size of the sub-picture is derived as follows:
the sub_present_flag being equal to 1 indicates that there is currently a sub picture parameter in the SPS RBSP syntax. A sub_present_flag equal to 0 indicates that there is currently no sub-picture parameter in the SPS RBSP syntax. .
NOTE 2 (NOTE 2) -when the bitstream is the result of the sub-bitstream extraction process and contains only a subset of the sub-pictures of the input bitstream of the sub-bitstream extraction process, it may be necessary to set the value of the sub-bits_present_flag equal to 1 in the RBSP of the sps.
max_sub_minus1 plus 1 specifies the maximum number of sub-pictures that may be present in the CVS. max_sub_minus1 must be in the range of 0 to 254. The value 255 is reserved for ITU-t|iso/IEC for future use.
The sub_grid_col_width_minus1 plus 1 specifies the width of each element of the sub-picture identifier grid in 4 samples. The syntax element has a length of Ceil (Log 2 (pic_width_max_in_luma_samples/4)) bits.
The variable NumSubPicGridCols is derived as follows:
NumSubPicGridCols = (pic_width_max_in_luma_samples + subpic_grid_col_width_minus1 * 4 + 3) / (subpic_grid_col_width_minus1 * 4 + 4)      (7-5)
subpic_grid_row_height_minus1 plus 1 specifies the height of each element of the sub-picture identifier grid in units of 4 samples. The length of the syntax element is Ceil(Log2(pic_height_max_in_luma_samples / 4)) bits.
The variable NumSubPicGridRows is derived as follows:
NumSubPicGridRows = (pic_height_max_in_luma_samples + subpic_grid_row_height_minus1 * 4 + 3) / (subpic_grid_row_height_minus1 * 4 + 4)      (7-6)
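The two grid-dimension derivations of equations (7-5) and (7-6) can be expressed as a minimal sketch; the function names are illustrative only.

int numSubPicGridCols(int pic_width_max_in_luma_samples, int subpic_grid_col_width_minus1) {
  return (pic_width_max_in_luma_samples + subpic_grid_col_width_minus1 * 4 + 3) /
         (subpic_grid_col_width_minus1 * 4 + 4);        // equation (7-5)
}
int numSubPicGridRows(int pic_height_max_in_luma_samples, int subpic_grid_row_height_minus1) {
  return (pic_height_max_in_luma_samples + subpic_grid_row_height_minus1 * 4 + 3) /
         (subpic_grid_row_height_minus1 * 4 + 4);       // equation (7-6)
}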
subpic_grid_idx[i][j] specifies the sub-picture index of the grid position (i, j). The length of the syntax element is Ceil(Log2(max_subpics_minus1 + 1)) bits.
The variables SubPicTop[subpic_grid_idx[i][j]], SubPicLeft[subpic_grid_idx[i][j]], SubPicWidth[subpic_grid_idx[i][j]], SubPicHeight[subpic_grid_idx[i][j]] and NumSubPics are derived as follows:
subpic_treated_as_pic_flag[i] equal to 1 specifies that the i-th sub-picture of each coded picture in the CVS is treated as a picture in the decoding process excluding in-loop filtering operations.
subpic_treated_as_pic_flag[i] equal to 0 specifies that the i-th sub-picture of each coded picture in the CVS is not treated as a picture in the decoding process excluding in-loop filtering operations. When subpic_treated_as_pic_flag[i] is not present, its value is inferred to be equal to 0.
2.7 Combined inter-intra prediction (CIIP)
Combined Inter Intra Prediction (CIIP) is employed in VVC as a special Merge candidate. It can only be enabled for WxH blocks with W < = 64 and H < = 64.
3. Disadvantages of the prior embodiments
In the current design of VVC, ATMVP has the following problems:
1) Whether ATMVP is applied is not aligned between the slice level and the CU level;
2) In the slice header, ATMVP may be enabled even if TMVP is disabled. Meanwhile, the ATMVP flag is signaled before the TMVP flag.
3) Masking is always done regardless of whether the MV is compressed;
4) The effective corresponding area may be too large;
5) The derivation of TMV is overly complex;
6) In some cases, ATMVP may not be available, thus requiring a better default MV.
7) MV scaling methods in ATMVP may not be efficient;
8) ATMVP should consider CPR;
9) Even if affine prediction is disabled, default zero affine Merge candidates may be placed in the list.
10) The current picture is treated as a long-term reference picture while other pictures are treated as short-term reference pictures. For the ATMVP and TMVP candidates, the motion information from a temporal block in the collocated picture is scaled to the reference picture with a fixed reference index (i.e., 0 for each reference picture list in the current design). However, when the CPR mode is enabled, the current picture is also treated as a reference picture, and the current picture may be added to reference picture list 0 (RefPicList0) with an index equal to 0.
a. For TMVP, if the time domain block is encoded in CPR mode and the reference picture of RefPicList0 is a short reference picture, the TMVP candidate is set to unavailable.
b. If the reference picture of RefPicList0, the index of which is equal to 0, is the current picture, and the current picture is an Intra Random Access Point (IRAP) picture, the ATMVP candidate is set to unavailable.
c. For the ATMVP sub-block within a block, when deriving motion information for the sub-block from the temporal block, if the temporal block is encoded in CPR mode, the default ATMVP candidate (derived from the temporal block identified by the starting TMV and center position of the current block) is used to populate the motion information for the sub-block.
11) The MV is right-shifted to integer precision, but this does not follow the rounding rule in VVC.
12) The MV (MVx, MVy) used in ATMVP to locate a corresponding block in a different picture (e.g., the TMV in section 2.3.5.1.2) is used directly since it points to the collocated picture. This is based on the assumption that all pictures have the same resolution. However, when RPR is enabled, different picture resolutions may be used. A similar problem exists for identifying the corresponding blocks in the collocated picture used to derive sub-block motion information.
13) For a CIIP-coded block, if the width or height of the block is greater than 32 and the maximum transform block size is 32, the intra prediction signal is generated at the CU size, while the inter prediction signal is generated at the TU size (the current block is recursively split into multiple 32x32 blocks). Using the CU size to derive the intra prediction signal results in lower efficiency.
There are some problems with current designs. First, if the reference picture of RefPicList0 with index equal to 0 is the current picture, and the current picture is not an IRAP picture, the ATMVP procedure will still be invoked, but since there is no temporal motion vector that can be scaled to the current picture, it does not find any available ATMVP candidates.
4. Examples of embodiments and techniques
The following list of techniques and embodiments should be considered as examples to explain the general concepts. These techniques should not be interpreted narrowly. Furthermore, these techniques may be combined in any manner in encoder or decoder embodiments.
1. Whether TMVP is allowed and/or whether CPR is used should be taken into account when deciding/parsing the maximum number of candidates in the sub-block Merge candidate list and/or when deciding whether ATMVP candidates should be added to the candidate list. The maximum number of candidates in the sub-block Merge candidate list is denoted ML.
a) In one example, in determining or resolving the maximum number of candidates in the sub-block Merge candidate list, if the ATMVP use flag is off (equal to 0) or TMVP is disabled, it is inferred that the ATMVP is not applicable.
i. In one example, if the ATMVP use flag is on (equal to 1) and TMVP is disabled, the ATMVP candidate is not added to the sub-block Merge candidate list or the ATMVP candidate list.
in one example, the ATMVP use flag is on (equal to 1) and TMVP is disabled and the affine use flag is off (equal to 0), ML is set equal to 0, which means that the sub-block Merge does not apply.
in one example, the ATMVP use flag is on (equal to 1) and TMVP is enabled and the affine use flag is off (equal to 0), ML is set equal to 1.
b) In one example, when deciding or resolving the maximum number of candidates in the sub-block Merge candidate list, if the ATMVP usage flag is off (equal to 0) or the collocated reference picture of the current picture is the current picture itself, it is inferred that the ATMVP is not applicable.
i. In one example, the ATMVP use flag is on (equal to 1) and the collocated reference picture of the current picture is the current picture itself, the ATMVP candidate is not added to the sub-block Merge candidate list or the ATMVP candidate list.
in one example, the ATMVP use flag is on (equal to 1) and the collocated reference picture of the current picture is the current picture itself and the affine use flag is off (equal to 0), ML is set equal to 0, which means that the sub-block Merge does not apply.
in one example, the ATMVP use flag is on (equal to 1) and the collocated reference picture of the current picture is not the current picture itself and the affine use flag is off (equal to 0), then ML is set equal to 1.
c) In one example, in deciding or resolving the maximum number of candidates in the sub-block Merge candidate list, if the ATMVP usage flag is off (equal to 0), or the reference picture with reference picture index 0 in the reference list 0 is the current picture itself, it is inferred that the ATMVP is not applicable.
i. In one example, the ATMVP use flag is on (equal to 1), and the collocated reference picture with reference picture index 0 in reference list 0 is the current picture itself, without adding the ATMVP candidate to the sub-block Merge candidate list or the ATMVP candidate list.
ii. In one example, the ATMVP use flag is on (equal to 1) and the reference picture with reference picture index 0 in reference list 0 is the current picture itself and the affine use flag is off (equal to 0), ML is set equal to 0, which means that the sub-block Merge is not applicable.
in one example, the ATMVP use flag is on (equal to 1) and the reference picture with reference picture index 0 in reference list 0 is not the current picture itself and the affine use flag is off (equal to 0), ML is set equal to 1.
d) In one example, in deciding or resolving the maximum number of candidates in the sub-block Merge candidate list, if the ATMVP usage flag is off (equal to 0), or the reference picture with reference picture index 0 in reference list 1 is the current picture itself, it is inferred that the ATMVP is not applicable.
i. In one example, the collocated reference picture with the ATMVP usage flag on (equal to 1) and reference picture index 0 in reference list 1 is the current picture itself, and the ATMVP candidate is not added to the sub-block Merge candidate list or the ATMVP candidate list.
ii. In one example, the ATMVP use flag is on (equal to 1) and the reference picture with reference picture index 0 in reference list 1 is the current picture itself and the affine use flag is off (equal to 0), ML is set equal to 0, which means that the sub-block Merge is not applicable.
in one example, the ATMVP use flag is on (equal to 1) and the reference picture with reference picture index 0 in reference list 1 is not the current picture itself and the affine use flag is off (equal to 0), ML is set equal to 1.
2. It is proposed that if TMVP is disabled at slice/picture level, ATMVP is implicitly disabled and the ATMVP flag is not signaled.
a) In one example, the ATMVP flag is signaled after the TMVP flag in the slice header/PPS.
b) In one example, the ATMVP or/and TMVP flag may not be signaled in the slice header/PPS, but only in the SPS header.
3. Whether and how to mask the corresponding position in the ATMVP depends on whether and how to compress the MV. Let (xN, yN) be the corresponding position calculated with the coordinates of the current block/sub-block and the starting motion vector (e.g. TMV) in the collocated picture.
a) In one example, if compression of the MVs is not required (e.g., sps_disable_motioncompression signaled in the SPS is 1), (xN, yN) is not masked; otherwise (compression of the MVs is required), (xN, yN) is masked as xN = xN & MASK, yN = yN & MASK, where MASK is equal to ~(2^M - 1) and M may be an integer such as 3 or 4.
b) Suppose the MV compression method makes each 2^K x 2^K block share the same motion information in the MV storage, and the masking in the ATMVP process is defined with MASK equal to ~(2^M - 1). It is proposed that K may be unequal to M, e.g., M = K + 1.
c) The MASK used in ATMVP and TMVP may be the same or different.
4. In one example, the MV compression method may be flexible.
a) In one example, the MV compression method may be selected between no compression, 8×8 compression (m=3 in entry 3. A), or 16×16 compression (m=4 in entry 3. A).
b) In one example, the MV compression method may be signaled in the VPS/SPS/PPS/slice header.
c) In one example, MV compression methods may be set differently in different standard profiles/levels/layers.
5. The valid corresponding region in ATMVP may be adaptive;
a) For example, the valid corresponding region may depend on the width and height of the current block;
b) For example, the effective corresponding region may depend on the MV compression method;
i. In one example, if the MV compression method is not used, the valid corresponding region is smaller;
ii. In one example, if the MV compression method is used, the valid corresponding region is larger.
6. The valid corresponding region in ATMVP may be based on a base region with a size of MxN that is smaller than the CTU region. For example, the CTU size in VTM-3.0 is 128x128, while the base region size may be 64x64. Let W and H denote the width and height of the current block.
a) In one example, if W < =m and H < =n, meaning that the current block is within the base region, the valid corresponding region in the ATMVP is an extension in the collocated base region and the collocated picture. Fig. 27 shows an example.
i. For example, assuming the top-left position of the collocated base region is (xBR, yBR), the corresponding position at (xN, yN) will be clipped into the valid region xBR <= xN < xBR + M + 4; yBR <= yN < yBR + N.
Fig. 27 shows an example embodiment of the proposed active area when the current block is within the base area (BR).
Fig. 28 illustrates an example embodiment of an active area when a current block is not within a base area.
b) In one example, if W > M and H > N, which means that the current block is not within the base region, the current block is divided into several parts. Each portion has a separate effective corresponding region in the ATMVP. For position a in the current block, its corresponding position B in the collocated block should be within the valid corresponding region of the portion where position a is located.
i. For example, the current block is divided into non-overlapping base regions. The effective corresponding area of one base area is its extension in the collocated base area and the collocated picture. Fig. 28 shows an example.
1. For example, assume that a position A in the current block is in one base region R. The collocated base region of R in the collocated picture is denoted CR. The corresponding position of A in the collocated block is position B. If the top-left position of CR is (xCR, yCR), then position B at (xN, yN) will be clipped into the valid region xCR <= xN < xCR + M + 4; yCR <= yN < yCR + N, as illustrated in the sketch following this item.
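The clipping of item 6 can be summarized in a minimal sketch (function name illustrative; the base region is M x N, e.g., 64x64):

void clipToCollocatedBaseRegion(int& xN, int& yN, int xCR, int yCR, int M, int N) {
  if (xN < xCR)          xN = xCR;
  if (xN >= xCR + M + 4) xN = xCR + M + 3;   // xCR <= xN < xCR + M + 4
  if (yN < yCR)          yN = yCR;
  if (yN >= yCR + N)     yN = yCR + N - 1;   // yCR <= yN < yCR + N
}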
7. It is proposed to derive the motion vectors in ATMVP (e.g. TMV in 2.3.5.1.2) for locating the corresponding block in different pictures as:
a) In one example, TMV is always set equal to a default MV, such as (0, 0).
i. In one example, the default MV is signaled in VPS/SPS/PPS/slice header/slice group header/CTU/CU.
b) In one example, TMV is set to one MV stored in the HMVP table by the following method (see the sketch after this item):
i. if the HMVP list is empty, then TMV is set equal to the default MV, such as (0, 0)
Otherwise (HMVP list is not empty),
1. the TMV may be set equal to the first element stored in the HMVP table;
2. Alternatively, TMV may be set equal to the last element stored in the HMVP table;
3. alternatively, the TMV may be set equal to only the specific MV stored in the HMVP table.
a. In one example, a particular MV references reference list 0.
b. In one example, a particular MV references reference list 1.
c. In one example, a particular MV references a particular reference picture in reference list 0, such as a reference picture with index 0.
d. In one example, a particular MV references a particular reference picture in reference list 1, such as a reference picture with index 0.
e. In one example, a particular MV references a collocated picture.
4. Alternatively, if a particular MV stored in the HMVP table is not found (e.g., as mentioned in entry 3), TMV may be set equal to the default MV;
a. in one example, only the first element stored in the HMVP table is searched to find a particular MV.
b. In one example, only the last element stored in the HMVP table is searched to find a particular MV.
c. In one example, some or all of the elements stored in the HMVP table are searched to find a particular MV.
5. Alternatively, in addition, the TMV obtained from the HMVP cannot reference the current picture itself.
6. Alternatively, if the TMV obtained from the HMVP table does not reference a collocated picture, the TMV obtained from the HMVP table may be scaled to the collocated picture.
c) In one example, TMV is set to one MV for one particular neighboring block. No other neighboring blocks are involved.
i. The particular neighboring blocks may be blocks A0, A1, B0, B1, B2 in fig. 22.
ii. TMV may be set equal to the default MV if one of the following conditions is satisfied:
1. The specific neighboring block does not exist;
2. The specific neighboring block is not inter-coded.
iii. TMV may only be set equal to a specific MV stored in the specific neighboring block.
1. In one example, a particular MV references reference list 0.
2. In one example, a particular MV references reference list 1.
3. In one example, a particular MV references a particular reference picture in reference list 0, such as a reference picture with index 0.
4. In one example, a particular MV references a particular reference picture in reference list 1, such as a reference picture with index 0.
5. In one example, a particular MV references a collocated picture.
6. If a particular MV stored in a particular neighboring block cannot be found, TMV may be set equal to the default MV;
if the TMV obtained from a particular neighboring block does not reference a collocated picture, the TMV obtained from the particular neighboring block may be scaled to the collocated picture.
TMV obtained from a particular neighboring block cannot reference the current picture itself.
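Entry 7.b) can be sketched as follows. The HMVP table layout and the two helper callbacks are assumptions made only for the illustration; the sketch only reflects the order of the checks described above (default MV for an empty table, otherwise a stored MV, scaled to the collocated picture when needed).

// Sketch of entry 7.b): derive the starting motion vector TMV from an HMVP table.
#include <vector>

struct Mv { int x, y; };
struct HmvpEntry { Mv mv; int refIdx; int refList; };

Mv deriveTmvFromHmvp(const std::vector<HmvpEntry>& hmvpTable,
                     bool (*refersToCollocatedPicture)(const HmvpEntry&),
                     Mv (*scaleToCollocatedPicture)(const HmvpEntry&))
{
  const Mv defaultMv = { 0, 0 };
  if (hmvpTable.empty())
    return defaultMv;                        // 7.b).i: empty table, use the default MV
  const HmvpEntry& last = hmvpTable.back();  // e.g. 7.b).ii.2: take the last element
  if (refersToCollocatedPicture(last))
    return last.mv;                          // MV already references the collocated picture
  return scaleToCollocatedPicture(last);     // 7.b).ii.6: scale the MV to the collocated picture
}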
8. The MVdefault0 and MVdefault1 used in ATMVP as disclosed in 2.3.5.1.2 may be derived as follows:
a) In one example, MVdefault0 and MVdefault1 are set equal to (0, 0);
b) In one example, MVdefaultX (X = 0 or 1) is derived from the HMVP table:
i. if the HMVP list is empty, MVdefaultX is set equal to a predefined default MV, such as (0, 0).
1. The predefined default MV is signaled in VPS/SPS/PPS/slice header/slice group header/CTU/CU.
Otherwise (HMVP list is not empty),
1. MVdefaultX may be set equal to the first element stored in the HMVP table.
2. MVdefaultX may be set equal to the last element stored in the HMVP table.
3. Only MVdefaultX may be set equal to a specific MV stored in the HMVP table.
a. In one example, a particular MV references reference list X.
b. In one example, a particular MV references a particular reference picture in reference list X, such as a reference picture with index 0.
4. If a particular MV stored in the HMVP table cannot be found, MVdefaultX may be set equal to a predefined default MV;
a. in one example, only the first element stored in the HMVP table is searched.
b. In one example, only the last element stored in the HMVP table is searched.
c. In one example, some or all of the elements stored in the HMVP table are searched.
5. If the MVdefaultX obtained from the HMVP table does not reference a collocated picture, the MVdefaultX obtained from the HMVP table may be scaled to a collocated picture.
6. MVdefaultX obtained from HMVP cannot reference the current picture itself.
c) In one example, MVdefaultX (X = 0 or 1) is derived from neighboring blocks (see the sketch after this item).
i. The neighboring blocks may include blocks A0, A1, B0, B1, B2 in fig. 22.
1. For example, only one of these blocks is used to derive MVdefaultX.
2. Alternatively, some or all of these blocks are used to derive MVdefaultX.
a. These blocks are checked in turn until a valid MVdefaultX is found.
3. If a valid MVdefaultX is not found from the selected one or more neighboring blocks, it is set equal to a predefined default MV, such as (0, 0).
a. The predefined default MV is signaled in VPS/SPS/PPS/slice header/slice group header/CTU/CU.
ii. A valid MVdefaultX cannot be found from a specific neighboring block under the following conditions:
1. The specific neighboring block does not exist;
2. The specific neighboring block is not inter-coded.
MVdefaultX may be set equal to only the specific MVs stored in the specific neighboring blocks;
1. in one example, a particular MV references reference list X.
2. In one example, a particular MV references a particular reference picture in reference list X, such as a reference picture with index 0.
MVdefaultX obtained from a particular neighboring block may be scaled to a particular reference picture, such as the reference picture indexed 0 in reference list X.
MVdefaultX obtained from a particular neighboring block cannot reference the current picture itself.
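A sketch of entry 8.c) is given below; the neighboring-block descriptor and its fields are illustrative assumptions, not part of the proposed method itself.

// Sketch of entry 8.c): derive MVdefaultX by checking the neighboring blocks
// A0, A1, B0, B1, B2 in turn until a valid MV referring to reference list X
// is found; otherwise fall back to a predefined default MV such as (0, 0).
#include <array>

struct Mv { int x, y; };
struct NeighborInfo { bool exists; bool interCoded; bool hasMvInListX; Mv mvListX; };

Mv deriveMvDefaultX(const std::array<NeighborInfo, 5>& neighbors /* A0, A1, B0, B1, B2 */)
{
  for (const NeighborInfo& nb : neighbors)
  {
    if (!nb.exists || !nb.interCoded)   // 8.c).ii: this block cannot provide MVdefaultX
      continue;
    if (nb.hasMvInListX)                // 8.c).iii: only an MV of reference list X is used
      return nb.mvListX;
  }
  return Mv{ 0, 0 };                    // predefined default MV, e.g. (0, 0)
}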
9. For sub-block or non-sub-block ATMVP candidates, if the temporal blocks of the sub-block/whole block in the collocated picture are encoded with CPR mode, default motion candidates may alternatively be utilized.
a) In one example, the default motion candidate may be defined as the motion candidate associated with the center position of the current block (e.g., MVdefault0 and/or MVdefault1 as used in ATMVP as disclosed in 2.3.5.1.2).
b) In one example, the default motion candidate may be defined as a (0, 0) motion vector with the reference picture index equal to 0 for both reference picture lists, if available.
10. It is proposed that the default motion information used in the ATMVP process (e.g., MVdefault0 and MVdefault1 for ATMVP as disclosed in 2.3.5.1.2) can be derived based on the position used in the sub-block motion information derivation process. With the proposed method, no further derivation of motion information is required for that sub-block, since the default motion information is directly assigned.
a) In one example, instead of using the center position of the current block, the center position of a sub-block (e.g., center sub-block) within the current block is utilized.
b) Examples of existing and proposed embodiments are shown in fig. 29A and 29B, respectively.
11. It is proposed that ATMVP candidates are always available for the following methods:
a) Assuming that the center point of the current block is (x0, y0), the corresponding position of (x0, y0) in the collocated picture is located as M = (x0 + MV'x, y0 + MV'y). The block Z covering M is found. If Z is intra coded, MVdefault0 and MVdefault1 are derived by some of the methods set forth in clause 6.
b) Alternatively, some of the methods set forth in clause 8 are directly applied to obtain MVdefault0 and MVdefault1 without locating block Z to obtain motion information.
c) Alternatively, default motion candidates used in the ATMVP process are always available. If it is set to be unavailable based on the current design (e.g., temporal blocks are intra coded), other motion vectors may be utilized instead of default motion candidates.
i. In one example, the solution in international application PCT/CN2018/124639, which is incorporated herein by reference, may be applied.
d) Alternatively, in addition, whether ATMVP candidates are always available depends on other high level syntax information.
i. In one example, the ATMVP candidate may be always set to available only when the ATMVP enable flag at the slice/picture header or other video unit is set to true.
ii. In one example, the above method applies only when the ATMVP enable flag in the slice header/picture header or other video unit is set to true, the current picture is not an IRAP picture, and the current picture is not inserted into RefPicList0 with reference index equal to 0.
e) A fixed index or a fixed group of indices is assigned to the ATMVP candidate. When ATMVP candidates are unavailable, the fixed index/group of indices may be inferred to be other types of motion candidates (such as affine candidates).
12. Whether to put zero motion affine Merge candidates into the sub-block Merge candidate list should be made dependent on whether affine prediction is enabled.
a) For example, if the affine use flag is off (sps_affine_enabled_flag is equal to 0), then the zero motion affine Merge candidate is not put into the sub-block Merge candidate list.
b) Alternatively, further, default motion vector candidates other than affine candidates are added.
13. It is proposed that non-affine filler candidates can be put into the sub-block Merge candidate list.
a) If the sub-block Merge candidate list is not full, zero motion non-affine fill candidates may be added.
b) When such a padding candidate is selected, the affine_flag of the current block should be set to 0.
c) Alternatively, if the sub-block Merge candidate list is not full and the affine use flag is off, a zero-motion non-affine filler candidate is put into the sub-block Merge candidate list (a sketch of this padding is given after this item).
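The padding behaviour of entries 12-13 can be sketched as below; the candidate structure is an assumption made only for this illustration.

// Sketch of entries 12/13: pad the sub-block Merge candidate list with
// zero-motion candidates. An affine zero candidate is only added when affine
// prediction is enabled; otherwise a non-affine zero-motion filler is added,
// and affine_flag is set to 0 for the block when such a filler is selected.
#include <vector>

struct SubblockMergeCand { bool isAffine; int mvx, mvy; };

void padSubblockMergeList(std::vector<SubblockMergeCand>& list,
                          size_t maxNumCand, bool affineEnabled)
{
  while (list.size() < maxNumCand)
    list.push_back(SubblockMergeCand{ /*isAffine=*/affineEnabled, 0, 0 });
}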
14. Assume that MV0 and MV1 represent the MVs in reference list 0 and reference list 1 of the block covering the corresponding position (for example, MV0 and MV1 may be MVZ_0 and MVZ_1 or MVZS_0 and MVZS_1 described in section 2.3.5.1.2). MV0' and MV1' represent the MVs in reference list 0 and reference list 1 to be derived for the current block or sub-block. MV0' and MV1' should be derived by scaling the following:
a) MV0, if the collocated picture is in reference list 1;
b) MV1, if the collocated picture is in reference list 0.
15. When the current picture is treated as a reference picture with index set to M (e.g., 0) in reference picture list X (PicRefListX, e.g., X = 0), the ATMVP and/or TMVP enable/disable flags may be inferred to be false for slices or other types of video units. Here, M may be equal to the target reference picture index to which the motion information of a temporal block is scaled for PicRefListX during the ATMVP/TMVP process.
a) Alternatively, in addition, the above method only applies when the current picture is an Intra Random Access Point (IRAP) picture.
b) In one example, the ATMVP and/or TMVP enable/disable flags may be inferred to be false when the current picture is treated as a reference picture in PicRefListX with an index set to M (e.g., 0) and/or as a reference picture in PicRefListY with an index set to N (e.g., 0). The variables M and N represent the target reference picture indices used in the TMVP or ATMVP process.
c) For the ATMVP process, a conformance bitstream is restricted to follow the rule that the collocated picture from which the motion information of the current block is derived shall not be the current picture.
d) Alternatively, when the above condition is true, the ATMVP or TMVP procedure is not invoked.
16. It is proposed that ATMVP may still be enabled for a current block if a reference picture with index set to M (e.g., 0) in reference picture list X (PicRefListX, e.g., x=0) of the block is the current picture.
a) In one example, the motion information for all sub-blocks points to the current picture.
b) In one example, when motion information of a sub-block is obtained from a temporal block, the temporal block should be encoded by at least one reference picture pointing to a current picture of the temporal block.
c) In one example, when motion information of a sub-block is obtained from a temporal block, a scaling operation is not applied.
17. The coding method of the sub-block Merge index is aligned regardless of whether ATMVP is used.
a) In one example, the first L bins are context coded and the remaining bins are bypass coded. In one example, L is set to 1 (a sketch is given after this item).
b) Alternatively, all bins are context coded.
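Entry 17.a) can be sketched with a truncated-unary binarization in which the first L bins are context coded; the two entropy-coder callbacks below are placeholders and not an actual codec API.

// Sketch of entry 17.a): code the sub-block Merge index so that the first L
// bins are context coded and the remaining bins are bypass coded.
void codeSubblockMergeIdx(int mergeIdx, int maxNumCand, int L,
                          void (*encodeBinCtx)(int bin, int ctxId),
                          void (*encodeBinEP)(int bin))
{
  for (int binIdx = 0; binIdx < maxNumCand - 1; binIdx++)
  {
    const int bin = (mergeIdx > binIdx) ? 1 : 0;   // truncated unary binarization
    if (binIdx < L)
      encodeBinCtx(bin, /*ctxId=*/0);              // first L bins: context coded
    else
      encodeBinEP(bin);                            // remaining bins: bypass coded
    if (bin == 0)
      break;
  }
}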
18. MVs (MVx, MVy) used in ATMVP to locate corresponding blocks in different pictures (e.g., TMVs in 0) can be right-shifted to integer precision (denoted as (MVx ', MVy')) in the same rounding method as in MV scaling.
a) Alternatively, MVs used in ATMVP to locate corresponding blocks in different pictures (e.g., TMVs in 0) may be right shifted to integer precision in the same rounding method as in MV averaging.
b) Alternatively, MVs used in ATMVP to locate corresponding blocks in different pictures (e.g., TMVs in 0) may be right shifted to integer precision in the same rounding method as in the adaptive MV resolution (AMVR) process.
19. MVs (MVx, MVy) used in ATMVP to locate corresponding blocks in different pictures (e.g., TMVs in 0) can be right-shifted to integer precision (denoted as (MVx', MVy')) by rounding to zero (see the sketch after this item).
a) For example, MVx' = (MVx + ((1 << N) >> 1) - (MVx >= 0 ? 1 : 0)) >> N; N is an integer representing the MV resolution, e.g., N = 4.
i. For example, MVx' = (MVx + (MVx >= 0 ? 7 : 8)) >> 4.
b) For example, MVy' = (MVy + ((1 << N) >> 1) - (MVy >= 0 ? 1 : 0)) >> N; N is an integer representing the MV resolution, e.g., N = 4.
i. For example, MVy' = (MVy + (MVy >= 0 ? 7 : 8)) >> 4.
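The rounding of entries 18-19 can be sketched as a single helper; for N = 4 it reduces to the (MVx + (MVx >= 0 ? 7 : 8)) >> 4 form given above.

// Sketch of entry 19: right-shift an MV component to integer precision,
// rounding to the nearest integer with the half-way value rounded toward zero.
int roundMvCompTowardZero(int mvComp, int N /* MV precision, e.g. 4 for 1/16-pel */)
{
  const int offset = (1 << N) >> 1;                      // half of one integer step
  return (mvComp + offset - (mvComp >= 0 ? 1 : 0)) >> N; // ties rounded toward zero
}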
20. In one example, the MVs (MVx, MVy) in entries 18 and 19 are used to locate the corresponding block from which the default motion information used in the ATMVP is derived, such as using the center position of a sub-block together with the shifted MV, or using the upper-left corner position of the current block together with the shifted MV.
a) In one example, the MVs (MVx, MVy) are used to locate the corresponding block from which the motion information of a sub-block in the current block is derived during the ATMVP process, such as using the center position of the sub-block together with the shifted MV.
21. The method presented in entries 18, 19, 20 may also be applied to other codec tools that need to locate reference blocks in different pictures or current pictures by motion vectors.
22. The MV (MVx, MVy) used in ATMVP to locate a corresponding block in a different picture (e.g., TMV in 0) can be scaled even though it points to the collocated picture.
a) In one example, the MV may be scaled if the width and/or height of the collocated picture (or the conformance window therein) is different from the width and/or height of the current picture (or the conformance window therein).
b) Let the width and height (of the conformance window) of the collocated picture be denoted W1 and H1, respectively, and the width and height (of the conformance window) of the current picture be denoted W2 and H2, respectively. The MV (MVx, MVy) can be scaled to MVx' = MVx * W1 / W2 and MVy' = MVy * H1 / H2 (see the sketch after this item).
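Entry 22.b) can be sketched as below; plain integer division is used purely for illustration, whereas an implementation may use a fixed-point scaling factor instead.

// Sketch of entry 22.b): scale the TMV when the (conformance-window)
// dimensions W1 x H1 of the collocated picture differ from the dimensions
// W2 x H2 of the current picture.
struct Mv { int x, y; };

Mv scaleTmvToCollocatedPicture(Mv mv, int W1, int H1, int W2, int H2)
{
  Mv out;
  out.x = mv.x * W1 / W2;   // MVx' = MVx * W1 / W2
  out.y = mv.y * H1 / H2;   // MVy' = MVy * H1 / H2
  return out;
}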
23. The center point of the current block used to derive motion information in the ATMVP process, such as the position (x0, y0) in 2.3.5.1.2, may be further modified by scaling and/or adding an offset.
a) In one example, the center point may be further modified if the width and/or height of the collocated picture (or the conformance window therein) is different from the width and/or height of the current picture (or the conformance window therein).
b) Assume that the upper-left position of the conformance window in the collocated picture is denoted as (X1, Y1) and the upper-left position of the conformance window defined in the current picture is denoted as (X2, Y2). The width and height (of the conformance window) of the collocated picture are denoted W1 and H1, respectively, and the width and height (of the conformance window) of the current picture are denoted W2 and H2, respectively. Then (x0, y0) may be modified to x0' = (x0 - X2) * W1 / W2 + X1 and y0' = (y0 - Y2) * H1 / H2 + Y1.
i. Alternatively, x0' = x0 * W1 / W2 and y0' = y0 * H1 / H2.
24. The corresponding position (e.g., position M in 2.3.5.1.2) used to derive motion information in the ATMVP process may be further modified by scaling and/or adding an offset.
a) In one example, if the width and/or height of the collocated picture (or the conformance window therein) is different from the width and/or height of the current picture (or the conformance window therein), the corresponding position may be further modified.
b) Assume that the upper-left position of the conformance window in the collocated picture is denoted as (X1, Y1) and the upper-left position of the conformance window defined in the current picture is denoted as (X2, Y2). The width and height (of the conformance window) of the collocated picture are denoted W1 and H1, respectively, and the width and height (of the conformance window) of the current picture are denoted W2 and H2, respectively. M = (X, Y) may be modified to X' = (X - X2) * W1 / W2 + X1 and Y' = (Y - Y2) * H1 / H2 + Y1.
i. Alternatively, X' = X * W1 / W2 and Y' = Y * H1 / H2 (a sketch of this mapping is given after this item).
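The position mapping of entries 23.b)/24.b) can be sketched as below; integer division is used purely for illustration.

// Sketch of entries 23.b)/24.b): map a position from the current picture to
// the collocated picture using the conformance-window offsets (X1, Y1) and
// (X2, Y2) and the conformance-window dimensions W1 x H1 and W2 x H2.
struct Pos { int x, y; };

Pos mapToCollocatedPicture(Pos p, int X1, int Y1, int W1, int H1,
                                  int X2, int Y2, int W2, int H2)
{
  Pos out;
  out.x = (p.x - X2) * W1 / W2 + X1;   // x' = (x - X2) * W1 / W2 + X1
  out.y = (p.y - Y2) * H1 / H2 + Y1;   // y' = (y - Y2) * H1 / H2 + Y1
  return out;
}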
Sub-picture correlation
25. In one example, if the positions (i, j) and (i, j-1) belong to different sub-pictures, the width of the sub-picture S ending at the (j-1) column may be set equal to j minus the leftmost column of the sub-picture S.
a) Examples based on the prior embodiments are highlighted below.
26. In one example, the height of the sub-picture S ending at row (NumSubPicGridRows - 1) may be set equal to (NumSubPicGridRows - 1) minus the topmost row of the sub-picture S, plus 1.
a) Examples based on the prior embodiments are highlighted below.
27. In one example, the width of the sub-picture S ending at column (NumSubPicGridColumns - 1) may be set equal to (NumSubPicGridColumns - 1) minus the leftmost column of the sub-picture S, plus 1.
a) Examples based on the prior embodiments are highlighted below.
28. The sub-picture grid must be an integer multiple of the CTU size.
a) Examples based on the prior embodiments are highlighted below.
subpic_grid_col_width_minus1 plus 1 specifies the width of each element of the sub-picture identifier grid in units of CtbSizeY. The syntax element has a length of Ceil(Log2(pic_width_max_in_luma_samples / CtbSizeY)) bits.
The variable NumSubPicGridCols is derived as follows:
NumSubPicGridCols = (pic_width_max_in_luma_samples + subpic_grid_col_width_minus1 * CtbSizeY + CtbSizeY - 1) / (subpic_grid_col_width_minus1 * CtbSizeY + CtbSizeY) (7-5)
subpic_grid_row_height_minus1 plus 1 specifies the height of each element of the sub-picture identifier grid in units of 4 samples. The syntax element has a length of Ceil(Log2(pic_height_max_in_luma_samples / CtbSizeY)) bits.
The variable NumSubPicGridRows is derived as follows:
NumSubPicGridRows = (pic_height_max_in_luma_samples + subpic_grid_row_height_minus1 * CtbSizeY + CtbSizeY - 1) / (subpic_grid_row_height_minus1 * CtbSizeY + CtbSizeY) (7-6)
29. Consistency constraints are added to ensure that sub-pictures cannot overlap each other and that all sub-pictures together cover the entire picture (a sketch of this check is given after this item).
a) Examples based on the existing embodiments are highlighted as follows:
Any subpic_grid_idx[i][j] must be equal to idx if both of the following conditions are satisfied:
i >= SubPicTop[idx] and i < SubPicTop[idx] + SubPicHeight[idx].
j >= SubPicLeft[idx] and j < SubPicLeft[idx] + SubPicWidth[idx].
Any subpic_grid_idx[i][j] must be different from idx if the following conditions are not both satisfied:
i >= SubPicTop[idx] and i < SubPicTop[idx] + SubPicHeight[idx].
j >= SubPicLeft[idx] and j < SubPicLeft[idx] + SubPicWidth[idx].
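The constraint of entry 29 can be verified by a straightforward scan over the sub-picture grid; the sketch below assumes that the SubPicTop/SubPicLeft/SubPicWidth/SubPicHeight arrays have been derived as in the surrounding text.

// Sketch of entry 29: each grid element (i, j) must carry index idx exactly
// when the rectangle of sub-picture idx covers it, which rules out both
// overlapping sub-pictures and uncovered grid elements.
#include <vector>

bool subPicturesPartitionPicture(const std::vector<std::vector<int>>& subpicGridIdx,
                                 const std::vector<int>& subPicTop,
                                 const std::vector<int>& subPicLeft,
                                 const std::vector<int>& subPicWidth,
                                 const std::vector<int>& subPicHeight)
{
  const int numSubPics = (int)subPicTop.size();
  for (int i = 0; i < (int)subpicGridIdx.size(); i++)
    for (int j = 0; j < (int)subpicGridIdx[i].size(); j++)
      for (int idx = 0; idx < numSubPics; idx++)
      {
        const bool covered =
            i >= subPicTop[idx]  && i < subPicTop[idx]  + subPicHeight[idx] &&
            j >= subPicLeft[idx] && j < subPicLeft[idx] + subPicWidth[idx];
        if (covered != (subpicGridIdx[i][j] == idx))
          return false;   // overlap or uncovered grid element detected
      }
  return true;
}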
RPR correlation
30. A syntax element (e.g., a flag) denoted as rpr_flag is signaled to indicate whether RPR can be used in a video unit (e.g., a sequence). The rpr_flag may be signaled in SPS, VPS or DPS.
a) In one example, if it is signaled that RPR is not used (e.g., rpr_flag is 0), then all widths/heights signaled in the PPS must be the same as the maximum width/height signaled in the SPS.
b) In one example, if it is signaled that RPR is not used (e.g., rpr_flag is 0), the widths/heights in the PPS are not signaled and are inferred to be the maximum width/maximum height signaled in the SPS.
c) In one example, if it is signaled that RPR is not used (e.g., rpr_flag is 0), then no conformance window information is used in the decoding process. Otherwise (it is signaled that RPR may be used), the conformance window information can be used in the decoding process.
31. It is proposed that an interpolation filter used to derive a prediction block of a current block in a motion compensation process may be selected depending on whether the resolution of a reference picture is different from the current picture or whether the width and/or height of the reference picture is greater than the width and/or height of the current picture.
a. In one example, an interpolation filter with fewer taps may be applied when condition A is satisfied, where condition A depends on the dimensions of the current picture and/or the reference picture (see the sketch after this item).
i. In one example, condition A is that the resolution of the reference picture is different from that of the current picture.
ii. In one example, condition A is that the width and/or height of the reference picture is greater than the width and/or height of the current picture.
iii. In one example, condition A is W1 > a * W2 and/or H1 > b * H2, where (W1, H1) represents the width and height of the reference picture, (W2, H2) represents the width and height of the current picture, and a and b are two factors, e.g., a = b = 1.5.
iv. In one example, condition A may also depend on whether bi-prediction is used.
1) Condition A is satisfied only when bi-prediction is used for the current block.
v. In one example, condition A may depend on M and N, where M and N represent the width and height of the current block.
1) For example, condition A is satisfied only when M * N <= T, where T is an integer such as 64.
2) For example, condition A is satisfied only when M <= T1 or N <= T2, where T1 and T2 are integers, e.g., T1 = T2 = 4.
3) For example, condition A is satisfied only when M <= T1 and N <= T2, where T1 and T2 are integers, e.g., T1 = T2 = 4.
4) For example, condition A is satisfied only when M <= T or M <= T1 or N <= T2, where T, T1 and T2 are integers, e.g., T = 64, T1 = T2 = 4.
5) In one example, the "smaller than or equal to" conditions in the above sub-items may be replaced with "larger than or equal to" conditions.
In one example, a 1-tap filter is applied; in other words, an unfiltered integer sample is output as the interpolation result.
In one example, a bilinear filter is applied when the resolution of the reference picture is different from that of the current picture.
In one example, a 4-tap filter or a 6-tap filter is applied when the resolution of the reference picture is different from that of the current picture, or when the width and/or height of the reference picture is greater than the width and/or height of the current picture.
1) The 6 tap filter can also be used for affine motion compensation.
2) The 4 tap filter may also be used for interpolation of chroma samples.
b. Whether and/or how the method disclosed in entry 31 is applied may depend on the color component.
i. For example, the method is applied only to the luminance component.
c. Whether and/or how the method disclosed in entry 31 is applied may depend on the interpolation filtering direction.
i. For example, the method is only applied to horizontal filtering.
For example, the method is only applied to vertical filtering.
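Entry 31 can be sketched for the luma component as below; the tap counts and the factor 1.5 follow the examples given above and are not a normative choice.

// Sketch of entry 31: use a luma interpolation filter with fewer taps when
// condition A is satisfied. Condition A here follows entry 31.a.iii with
// a = b = 1.5 (reference picture larger than 1.5x the current picture).
int selectLumaInterpolationTaps(int refW, int refH, int curW, int curH)
{
  const bool conditionA = (2 * refW > 3 * curW) || (2 * refH > 3 * curH);
  return conditionA ? 6 : 8;   // e.g. a 6-tap filter instead of the regular 8-tap filter
}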
CIIP correlation
32. The intra prediction signal used in the CIIP process may be generated at the TU level instead of the CU level (e.g., using reference samples outside the TU instead of reference samples outside the CU).
a) In one example, if the CU width or height is greater than the maximum transform block size, the CU may be partitioned into multiple TUs, and intra/inter prediction may be generated for each TU, e.g., using reference samples outside the TU.
b) In one example, if the maximum transform size K is less than 64 (such as K = 32), then the intra prediction used in CIIP is performed recursively as in a conventional intra-coded block.
c) For example, a CIIP-coded block of size KM × KN is divided into M × N blocks of size K × K, where M and N are integers, and intra prediction is performed for each K × K block. The intra prediction of a later encoded/decoded K × K block may depend on the reconstructed samples of previously encoded/decoded K × K blocks (a sketch is given after this item).
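Entry 32.c) can be sketched as a simple split loop; the intra-prediction call below is a placeholder and not an actual API.

// Sketch of entry 32.c): a KM x KN CIIP block is divided into K x K blocks
// and intra prediction is performed block by block, so that a later K x K
// block may use the reconstructed samples of previously processed blocks.
void ciipIntraPerKxKBlock(int cuX, int cuY, int cuW, int cuH, int K,
                          void (*intraPredictKxK)(int x, int y, int K))
{
  for (int y = cuY; y < cuY + cuH; y += K)
    for (int x = cuX; x < cuX + cuW; x += K)
      intraPredictKxK(x, y, K);   // relies on already reconstructed K x K blocks
}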
5. Additional example embodiments (bold text shows changes to the current standard version)
5.1 example #1: syntax design examples in SPS/PPS/slice header/slice group header
The changes compared to the vtm3.0.1rc1 reference software are highlighted in large bold fonts as follows:
5.2 example #2: syntax design examples in SPS/PPS/slice header/slice group header
7.3.2.1 sequence parameter set RBSP syntax
sps_sbtmvp_enabled_flag equal to 1 specifies that the sub-block based temporal motion vector predictor may be used in decoding of pictures with all slices having slice_type not equal to I in the CVS. sps_sbtmvp_enabled_flag equal to 0 specifies that the sub-block based temporal motion vector predictor is not used in the CVS. When sps_sbtmvp_enabled_flag is not present, it is inferred to be equal to 0.
five_minus_max_num_subblock_merge_cand specifies the maximum number of sub-block based merging motion vector prediction (MVP) candidates supported in the slice subtracted from 5. When five_minus_max_num_subblock_merge_cand is not present, it is inferred to be equal to 5 - sps_sbtmvp_enabled_flag. The maximum number of sub-block based merging MVP candidates, MaxNumSubblockMergeCand, is derived as follows:
MaxNumSubblockMergeCand=5-five_minus_max_num_subblock_merge_cand(7-45)
The value of MaxNumSubblockMergeCand shall be in the range of 0 to 5, inclusive.
Derivation of motion vectors and reference indices in 8.3.4.2 subblock Merge mode
The inputs to this process are:
.. [ does not change the current VVC specification draft ].
The output of this process is:
.. [ do not change the current VVC specification draft ].
The variables numSbX, numSbY and the sub-block merging candidate list subblockMergeCandList are derived by the following ordered steps:
1. When sps_sbtmvp_enabled_flag is equal to 1 and it is not true that (the current picture is an IRAP picture and the reference picture with index 0 of reference picture list 0 is the current picture), the following applies:
- The derivation process for merging candidates from neighbouring coding units as specified in clause 8.3.2.3 is invoked with the luma coding block location (xCb, yCb), the luma coding block width cbWidth, the luma coding block height cbHeight and the luma coding block width as inputs, and the outputs being the availability flags availableFlagA0, availableFlagA1, availableFlagB0, availableFlagB1 and availableFlagB2, the reference indices refIdxLXA0, refIdxLXA1, refIdxLXB0, refIdxLXB1 and refIdxLXB2, the prediction list utilization flags predFlagLXA0, predFlagLXA1, predFlagLXB0, predFlagLXB1 and predFlagLXB2, and the motion vectors mvLXA0, mvLXA1, mvLXB0, mvLXB1 and mvLXB2, with X being 0 or 1.
- The derivation process for sub-block based temporal merging candidates as specified in clause 8.3.4.3 is invoked with the luma location (xCb, yCb), the luma coding block width cbWidth, the luma coding block height cbHeight, the availability flags availableFlagA0, availableFlagA1, availableFlagB0, availableFlagB1, the reference indices refIdxLXA0, refIdxLXA1, refIdxLXB0, refIdxLXB1, the prediction list utilization flags predFlagLXA0, predFlagLXA1, predFlagLXB0, predFlagLXB1 and the motion vectors mvLXA0, mvLXA1, mvLXB0, mvLXB1 as inputs, and the outputs being the availability flag availableFlagSbCol, the number of luma coding sub-blocks in horizontal direction numSbX and in vertical direction numSbY, the reference indices refIdxLXSbCol, the luma motion vectors mvLXSbCol[xSbIdx][ySbIdx] and the prediction list utilization flags predFlagLXSbCol[xSbIdx][ySbIdx] with xSbIdx = 0..numSbX - 1, ySbIdx = 0..numSbY - 1 and X being 0 or 1.
2. When sps_affine_enabled_flag is equal to 1, the sample locations (xNbA0, yNbA0), (xNbA1, yNbA1), (xNbA2, yNbA2), (xNbB0, yNbB0), (xNbB1, yNbB1), (xNbB2, yNbB2), (xNbB3, yNbB3) and the variables numSbX and numSbY are derived as follows:
[ does not change the current VVC Specification ].
5.3 example #3 example of MV rounding
Syntax changes are based on existing implementations.
8.5.5.3 derivation of time-domain Merge candidates based on sub-blocks
…
The location (xColSb, yColSb) of the collocated sub-block inside ColPic is derived as follows.
1. The following applies:
- yColSb = Clip3(yCtb, Min(CurPicHeightInSamplesY - 1, yCtb + (1 << CtbLog2SizeY) - 1), ySb + ((tempMv[1] + 8 - (tempMv[1] >= 0 ? 1 : 0)) >> 4))
- If subpic_treated_as_pic_flag[SubPicIdx] is equal to 1, the following applies:
- xColSb = Clip3(xCtb, Min(SubPicRightBoundaryPos, xCtb + (1 << CtbLog2SizeY) + 3), xSb + ((tempMv[0] + 8 - (tempMv[0] >= 0 ? 1 : 0)) >> 4))
- Otherwise (subpic_treated_as_pic_flag[SubPicIdx] is equal to 0), the following applies:
- xColSb = Clip3(xCtb, Min(CurPicWidthInSamplesY - 1, xCtb + (1 << CtbLog2SizeY) + 3), xSb + ((tempMv[0] + 8 - (tempMv[0] >= 0 ? 1 : 0)) >> 4))
…
8.5.5.4 derivation of temporal Merge base motion data based on sub-blocks
…
The location (xColCb, yColCb) of the collocated block inside ColPic is derived as follows.
- The following applies:
- yColCb = Clip3(yCtb, Min(CurPicHeightInSamplesY - 1, yCtb + (1 << CtbLog2SizeY) - 1), yColCtrCb + ((tempMv[1] + 8 - (tempMv[1] >= 0 ? 1 : 0)) >> 4))
- If subpic_treated_as_pic_flag[SubPicIdx] is equal to 1, the following applies:
- xColCb = Clip3(xCtb, Min(SubPicRightBoundaryPos, xCtb + (1 << CtbLog2SizeY) + 3), xColCtrCb + ((tempMv[0] + 8 - (tempMv[0] >= 0 ? 1 : 0)) >> 4))
- Otherwise (subpic_treated_as_pic_flag[SubPicIdx] is equal to 0), the following applies:
- xColCb = Clip3(xCtb, Min(CurPicWidthInSamplesY - 1, xCtb + (1 << CtbLog2SizeY) + 3), xColCtrCb + ((tempMv[0] + 8 - (tempMv[0] >= 0 ? 1 : 0)) >> 4))
5.4 example #4 second example of MV rounding
8.5.5.3 derivation of time-domain Merge candidates based on sub-blocks
The inputs of this process are:
luminance position (xCb, yCb) of the left upsampled point of the current luma coding block relative to the left upsampled point of the current picture,
a variable cbWidth, specifying the width of the current coded block in the luma samples,
the variable cbHeight specifies the height of the current coded block in the luma samples.
- the availability flag availableFlagA1 of the neighboring coding unit,
- the reference index refIdxLXA1 of the neighboring coding unit, with X being 0 or 1,
- the prediction list utilization flag predFlagLXA1 of the neighboring coding unit, with X being 0 or 1,
- the motion vector mvLXA1 of the neighboring coding unit in 1/16 fractional-sample accuracy, with X being 0 or 1.
The output of this process is:
the availability flag availableglagsbcol,
The number of luminance coding sub-blocks numSbX and numSbY in the horizontal and vertical direction,
reference indices refIdxL0 sbmol and refIdxL1 sbmol,
luminance motion vectors mvL0 sbcl [ xSbIdx ] [ ySbIdx ] and mvL1 sbcl [ xSbIdx ] [ ySbIdx ] with a fraction sample precision of 1/16, where xsbidx=0..numsbx-1, ysbidx=0..numsby-1,
the prediction list uses the flags predflag l0SbCol [ xscidx ] [ yscidx ] and predflag l1SbCol [ xscidx ] [ yscidx ], xscidx=0..numsbx-1, where yscidx=0..numsby-1.
The availability flag availableFlagSbCol is derived as follows.
- availableFlagSbCol is set equal to 0 if one or more of the following conditions is true:
- slice_temporal_mvp_enabled_flag is equal to 0.
-sps_sbtmvp_enabled_flag is equal to 0.
-cbWidth is less than 8.
-cbHeight is less than 8.
Otherwise, the following ordered steps are applied:
1. the position (xCtb, yCtb) of the left upper sample point of the luma coding tree block containing the current coding block and the position (xCtr, yCtr) of the right lower center sample point of the current luma coding block are derived as follows:
-xCtb=(xCb>>CtuLog2Size)<<CtuLog2Size (8-542)
-yCtb=(yCb>>CtuLog2Size)<<CtuLog2Size (8-543)
-xCtr=xCb+(cbWidth/2) (8-544)
-yCtr=yCb+(cbHeight/2) (8-545)
2. The luma location (xColCtrCb, yColCtrCb) is set equal to the top-left sample of the collocated luma coding block inside ColPic covering the location given by (xCtr, yCtr), relative to the top-left luma sample of the collocated picture specified by ColPic.
3. The derivation process for sub-block based temporal merging base motion data as specified in clause 8.5.5.4 is invoked with the location (xCtb, yCtb), the location (xColCtrCb, yColCtrCb), the availability flag availableFlagA1, the prediction list utilization flag predFlagLXA1, the reference index refIdxLXA1 and the motion vector mvLXA1 as inputs, with X being 0 and 1, and the outputs being the motion vectors ctrMvLX, the prediction list utilization flags ctrPredFlagLX of the collocated block, and the temporal motion vector tempMv, with X being 0 and 1.
4. The variable availableFlagSbCol is derived as follows:
- If both ctrPredFlagL0 and ctrPredFlagL1 are equal to 0, availableFlagSbCol is set equal to 0.
- Otherwise, availableFlagSbCol is set equal to 1.
When availableFlagSbCol is equal to 1, the following applies:
variables numSbX, numSbY, sbWidth, sbHeight and refIdxLXSbCol are derived as follows:
-numSbX=cbWidth>>3
(8-546)
-numSbY=cbHeight>>3
(8-547)
-sbWidth=cbWidth/numSbX
(8-548)
-sbHeight=cbHeight/numSbY
(8-549)
-refIdxLXSbCol=0
(8-550)
for xsbdx=0..numsbx-1 and ysbdx=0..numsby-1, the motion vector mvlxsbmol [ xsbdx ] [ ysbdx ] and the prediction list utilization flag predflag lxsbmol [ xsbdx ] [ ysbdx ] are derived as follows:
-specifying the luminance position (xSb, ySb) of the top left sample of the current coded sub-block relative to the top left luminance sample of the current picture is derived as follows:
-xSb=xCb+xSbIdx*sbWidth+sbWidth/2
(8-551)
-ySb=yCb+ySbIdx*sbHeight+sbHeight/2
(8-552)
The positions of the juxtaposed sub-blocks in ColPic (xColSb, yColSb) are derived as follows.
1. The following steps are applied:
- yColSb = Clip3(yCtb, Min(CurPicHeightInSamplesY - 1, yCtb + (1 << CtbLog2SizeY) - 1), ySb + ((tempMv[1] + 8 - (tempMv[1] >= 0 ? 1 : 0)) >> 4)) (8-553)
- If subpic_treated_as_pic_flag[SubPicIdx] is equal to 1, the following applies:
- xColSb = Clip3(xCtb, Min(SubPicRightBoundaryPos, xCtb + (1 << CtbLog2SizeY) + 3), xSb + ((tempMv[0] + 8 - (tempMv[0] >= 0 ? 1 : 0)) >> 4)) (8-554)
- Otherwise (subpic_treated_as_pic_flag[SubPicIdx] is equal to 0), the following applies:
- xColSb = Clip3(xCtb, Min(CurPicWidthInSamplesY - 1, xCtb + (1 << CtbLog2SizeY) + 3), xSb + ((tempMv[0] + 8 - (tempMv[0] >= 0 ? 1 : 0)) >> 4)) (8-555)
the variable currCb specifies the luma coded block within the current picture that covers the current coded sub-block.
The variable colCb specifies the luma coded block within ColPic covering the modified position given by ((xColSb > > 3) < <3, (yColSb > > 3) < < 3).
-setting the luminance position (xccolcb, yccolcb) equal to the left upsampling point of the collocated luminance coding block specified by colCb with respect to the left upsampling point of the collocated picture specified by ColPic.
-invoking a derivation procedure of the collocated motion vector as specified in 8.5.2.12 with currCb, colCb, (xcold, ycold), refIdxL0 set equal to 0 and sbFlag set equal to 1 as inputs, and assigning the outputs to the motion vectors mvL0SbCol [ xsbdx ] [ ysbdx ] and availableglagl 0SbCol of the sub-block.
-invoking a derivation procedure of the collocated motion vector as specified in 8.5.2.12 with currCb, colCb, (xcold, ycold), refIdxL1 set equal to 0 and sbFlag set equal to 1 as inputs, and assigning the outputs to the motion vectors mvL1SbCol [ xsbdx ] [ ysbdx ] and availableglagl 1SbCol of the sub-block.
When availableglagl 0SbCol and availableglagl 1SbCol are both equal to 0, the following steps are applied for X being 0 and 1:
-mvLXSbCol[xSbIdx][ySbIdx]=ctrMvLX (8-556)
-predFlagLXSbCol[xSbIdx][ySbIdx]=ctrPredFlagLX (8-557)
8.5.5.4 derivation of temporal Merge base motion data based on sub-blocks
The inputs of this process are:
the position (xCtb, yCtb) of the left-hand sample of the luma coding tree block containing the current coding block,
-the position (xColCtrCb, yColCtrCb) of the left upsampled point of the concatenated luma coded block covering the lower right centre upsampled point.
- the availability flag availableFlagA1 of the neighboring coding unit,
- the reference index refIdxLXA1 of the neighboring coding unit,
- the prediction list utilization flag predFlagLXA1 of the neighboring coding unit,
- the motion vector mvLXA1 of the neighboring coding unit in 1/16 fractional-sample accuracy.
The output of this process is:
motion vectors ctrMvL0 and ctrMvL1,
prediction utilization flags ctrPredFlagL0 and ctrPredFlagL1,
temporal motion vector tempMv.
The variable tempMv is set as follows:
- tempMv[0] = 0 (8-558)
- tempMv[1] = 0 (8-559)
The variable currPic specifies the current picture.
When availableFlagA1 is equal to TRUE, the following applies:
- tempMv is set equal to mvL0A1 if all of the following conditions are true:
– predFlagL0A1 is equal to 1,
– DiffPicOrderCnt(ColPic, RefPicList[0][refIdxL0A1]) is equal to 0,
- Otherwise, tempMv is set equal to mvL1A1 if all of the following conditions are true:
– slice_type is equal to B,
– predFlagL1A1 is equal to 1,
– DiffPicOrderCnt(ColPic, RefPicList[1][refIdxL1A1]) is equal to 0.
The location of the collocated block within ColPic (xccolcb, yccolcb) is derived as follows.
-applying the following steps:
- yColCb = Clip3(yCtb, Min(CurPicHeightInSamplesY - 1, yCtb + (1 << CtbLog2SizeY) - 1), yColCtrCb + ((tempMv[1] + 8 - (tempMv[1] >= 0 ? 1 : 0)) >> 4)) (8-560)
- If subpic_treated_as_pic_flag[SubPicIdx] is equal to 1, the following applies:
- xColCb = Clip3(xCtb, Min(SubPicRightBoundaryPos, xCtb + (1 << CtbLog2SizeY) + 3), xColCtrCb + ((tempMv[0] + 8 - (tempMv[0] >= 0 ? 1 : 0)) >> 4)) (8-561)
- Otherwise (subpic_treated_as_pic_flag[SubPicIdx] is equal to 0), the following applies:
- xColCb = Clip3(xCtb, Min(CurPicWidthInSamplesY - 1, xCtb + (1 << CtbLog2SizeY) + 3), xColCtrCb + ((tempMv[0] + 8 - (tempMv[0] >= 0 ? 1 : 0)) >> 4)) (8-562)
the array colPredMode is set equal to the prediction mode array CuPredMode [0] of the collocated picture specified by ColPic.
The motion vectors ctrMvL0 and ctrMvL1, and the prediction list utilization flags ctrpredflag l0 and ctrpredflag l1 are derived as follows:
-if colPredMode [ xccolcb ] [ yccolcb ] is equal to mode_inter, the following steps are applied:
the variable currCb specifies the luma coded block covered (xCtrCb, yCtrCb) within the current picture.
The variable colCb specifies a luma coded block within ColPic covering the modified position given by ((xccolcb > > 3) < <3, (yccolcb > > 3) < < 3).
-setting the luminance position (xccolcb, yccolcb) equal to the left upsampling point of the collocated luminance coding block specified by colCb with respect to the left upsampling point of the collocated picture specified by ColPic.
Invoking the concatenated motion vector derivation procedure specified in 8.5.2.12 with currCb, colCb, (xccolcb, yccolcb), refIdxL0 set to 0 and sbFlag set to 1 as inputs, and assigning the outputs to ctrMvL0 and ctrpredflag l0.
Invoking the concatenated motion vector derivation procedure specified in 8.5.2.12 with currCb, colCb, (xccolcb, yccolcb), refIdxL1 set equal to 0 and sbFlag set equal to 1 as inputs, and assigning the outputs to ctrMvL1 and ctrpredflag l1.
-otherwise, applying the steps of:
-ctrPredFlagL0=0 (8-563)
-ctrPredFlagL1=0 (8-564)
5.5 example #5: third example of MV rounding
8.5.5.3 derivation of time domain Merge candidates based on sub-blocks
The inputs of this process are:
luminance position (xCb, yCb) of the left upsampled point of the current luma coding block relative to the left upsampled point of the current picture,
a variable cbWidth, specifying the width of the current coded block in the luma samples,
the variable cbHeight specifies the height of the current coded block in the luma samples.
- the availability flag availableFlagA1 of the neighboring coding unit,
- the reference index refIdxLXA1 of the neighboring coding unit, with X being 0 or 1,
- the prediction list utilization flag predFlagLXA1 of the neighboring coding unit, with X being 0 or 1,
- the motion vector mvLXA1 of the neighboring coding unit in 1/16 fractional-sample accuracy, with X being 0 or 1.
The output of this process is:
the availability flag availableglagsbcol,
the number of luminance coding sub-blocks numSbX and numSbY in the horizontal and vertical direction,
Reference indices refIdxL0 sbmol and refIdxL1 sbmol,
luminance motion vectors mvL0 sbcl [ xSbIdx ] [ ySbIdx ] and mvL1 sbcl [ xSbIdx ] [ ySbIdx ] with a fraction sample precision of 1/16, where xsbidx=0..numsbx-1, ysbidx=0..numsby-1,
the prediction list uses the flags predflag l0SbCol [ xSbIdx ] [ ySbIdx ] and predflag l1SbCol [ xSbIdx ] [ ySbIdx ], where xsbidx=0..numsbx-1, ysbidx=0..numsby-1.
The availability flag availableFlagSbCol is derived as follows.
- availableFlagSbCol is set equal to 0 if one or more of the following conditions is true:
-slice_temporal_mvp_enabled_flag is equal to 0.
-sps_sbtmvp_enabled_flag is equal to 0.
-cbWidth is less than 8.
-cbHeight is less than 8.
Otherwise, the following ordered steps are applied:
5. the position (xCtb, yCtb) of the left upper sample point of the luma coding tree block containing the current coding block and the position (xCtr, yCtr) of the right lower center sample point of the current luma coding block are derived as follows:
-xCtb=(xCb>>CtuLog2Size)<<CtuLog2Size (8-542)
-yCtb=(yCb>>CtuLog2Size)<<CtuLog2Size (8-543)
-xCtr=xCb+(cbWidth/2) (8-544)
-yCtr=yCb+(cbHeight/2) (8-545)
6. the luminance position (xColCtrCb, yColCtrCb) is set equal to the left-hand sample of the collocated luminance coding block within ColPic covering the position given by (xCtr, yCtr) relative to the left-hand luminance sample of the collocated picture specified by ColPic.
7. The derivation process for sub-block based temporal merging base motion data as specified in clause 8.5.5.4 is invoked with the location (xCtb, yCtb), the location (xColCtrCb, yColCtrCb), the availability flag availableFlagA1, the prediction list utilization flag predFlagLXA1, the reference index refIdxLXA1 and the motion vector mvLXA1 as inputs, with X being 0 and 1, and the outputs being the motion vectors ctrMvLX, the prediction list utilization flags ctrPredFlagLX of the collocated block, and the temporal motion vector tempMv, with X being 0 and 1.
8. The variable availableFlagSbCol is derived as follows:
- If both ctrPredFlagL0 and ctrPredFlagL1 are equal to 0, availableFlagSbCol is set equal to 0.
- Otherwise, availableFlagSbCol is set equal to 1.
When availableFlagSbCol is equal to 1, the following applies:
variables numSbX, numSbY, sbWidth, sbHeight and refIdxLXSbCol are derived as follows:
-numSbX=cbWidth>>3
(8-546)
-numSbY=cbHeight>>3
(8-547)
-sbWidth=cbWidth/numSbX
(8-548)
-sbHeight=cbHeight/numSbY
(8-549)
-refIdxLXSbCol=0
(8-550)
for xsbdx=0..numsbx-1 and ysbdx=0..numsby-1, the motion vector mvlxsbmol [ xsbdx ] [ ysbdx ] and the prediction list utilization flag predflag lxsbmol [ xsbdx ] [ ysbdx ] are derived as follows:
-specifying the luminance position (xSb, ySb) of the top left sample of the current coded sub-block relative to the top left luminance sample of the current picture is derived as follows:
-xSb=xCb+xSbIdx*sbWidth+sbWidth/2
(8-551)
-ySb=yCb+ySbIdx*sbHeight+sbHeight/2
(8-552)
the positions of the juxtaposed sub-blocks within ColPic (xColSb, yColSb) are derived as follows:
1. The following steps are applied:
- yColSb = Clip3(yCtb, Min(CurPicHeightInSamplesY - 1, yCtb + (1 << CtbLog2SizeY) - 1), ySb + ((tempMv[1] + (tempMv[1] >= 0 ? 7 : 8)) >> 4)) (8-553)
- If subpic_treated_as_pic_flag[SubPicIdx] is equal to 1, the following applies:
- xColSb = Clip3(xCtb, Min(SubPicRightBoundaryPos, xCtb + (1 << CtbLog2SizeY) + 3), xSb + ((tempMv[0] + (tempMv[0] >= 0 ? 7 : 8)) >> 4)) (8-554)
- Otherwise (subpic_treated_as_pic_flag[SubPicIdx] is equal to 0), the following applies:
- xColSb = Clip3(xCtb, Min(CurPicWidthInSamplesY - 1, xCtb + (1 << CtbLog2SizeY) + 3), xSb + ((tempMv[0] + (tempMv[0] >= 0 ? 7 : 8)) >> 4)) (8-555)
the variable currCb specifies the luma coded block within the current picture that covers the current coded sub-block.
The variable colCb specifies a luma coded block within ColPic covering the modified positions given by ((xColSb > > 3) < <3, (yColSb > > 3) < < 3).
-setting the luminance position (xccolcb, yccolcb) equal to the left upsampling point of the collocated luminance coding block specified by colCb with respect to the left upsampling point of the collocated picture specified by ColPic.
-invoking a concatenated motion vector derivation procedure as specified in 8.5.2.12 with currCb, colCb, (xccolcb, yccolcb), refIdxL0 set equal to 0 and sbFlag set equal to 1 as inputs, and assigning the outputs to the motion vectors mvL0SbCol [ xsbdx ] [ ysbdx ] and availableglagl 0SbCol of the sub-block.
-invoking a concatenated motion vector derivation procedure as specified in 8.5.2.12 with currCb, colCb, (xccolcb, yccolcb), refIdxL1 set equal to 0 and sbFlag set equal to 1 as inputs, and assigning the outputs to the motion vectors mvL1SbCol [ xsbdx ] [ ysbdx ] and availableglagl 1SbCol of the sub-block.
When availableglagl 0SbCol and availableglagl 1SbCol are both equal to 0, the following steps are applied for X being 0 and 1:
-mvLXSbCol[xSbIdx][ySbIdx]=ctrMvLX (8-556)
-predFlagLXSbCol[xSbIdx][ySbIdx]=ctrPredFlagLX (8-557)
8.5.5.4 derivation of temporal Merge base motion data based on sub-blocks
The inputs of this process are:
the position (xCtb, yCtb) of the left-hand sample of the luma coding tree block containing the current coding block,
-the position (xColCtrCb, yColCtrCb) of the left upsampled point of the concatenated luma coded block covering the lower right centre upsampled point.
- the availability flag availableFlagA1 of the neighboring coding unit,
- the reference index refIdxLXA1 of the neighboring coding unit,
- the prediction list utilization flag predFlagLXA1 of the neighboring coding unit,
- the motion vector mvLXA1 of the neighboring coding unit in 1/16 fractional-sample accuracy.
The output of this process is:
motion vectors ctrMvL0 and ctrMvL1,
the prediction list uses the flags ctrprefdflag 0 and ctrpreflag 1,
temporal motion vector tempMv.
The variable tempMv was set as follows:
-tempMv[0]=0 (8-558)
-tempMv[1]=0 (8-559)
the variable currPic specifies the current picture.
When availableFlagA1 is equal to TRUE, the following applies:
- tempMv is set equal to mvL0A1 if all of the following conditions are true:
– predFlagL0A1 is equal to 1,
– DiffPicOrderCnt(ColPic, RefPicList[0][refIdxL0A1]) is equal to 0,
- Otherwise, tempMv is set equal to mvL1A1 if all of the following conditions are true:
– slice_type is equal to B,
– predFlagL1A1 is equal to 1,
– DiffPicOrderCnt(ColPic, RefPicList[1][refIdxL1A1]) is equal to 0.
The location of the collocated block within ColPic (xccolcb, yccolcb) is derived as follows.
-applying the following steps:
- yColCb = Clip3(yCtb, Min(CurPicHeightInSamplesY - 1, yCtb + (1 << CtbLog2SizeY) - 1), yColCtrCb + ((tempMv[1] + (tempMv[1] >= 0 ? 7 : 8)) >> 4)) (8-560)
- If subpic_treated_as_pic_flag[SubPicIdx] is equal to 1, the following applies:
- xColCb = Clip3(xCtb, Min(SubPicRightBoundaryPos, xCtb + (1 << CtbLog2SizeY) + 3), xColCtrCb + ((tempMv[0] + (tempMv[0] >= 0 ? 7 : 8)) >> 4)) (8-561)
- Otherwise (subpic_treated_as_pic_flag[SubPicIdx] is equal to 0), the following applies:
- xColCb = Clip3(xCtb, Min(CurPicWidthInSamplesY - 1, xCtb + (1 << CtbLog2SizeY) + 3), xColCtrCb + ((tempMv[0] + (tempMv[0] >= 0 ? 7 : 8)) >> 4)) (8-562)
the array colPredMode is set equal to the prediction mode array CuPredMode [0] of the collocated picture specified by ColPic.
The motion vectors ctrMvL0 and ctrMvL1, and the prediction list utilization flags ctrpredflag l0 and ctrpredflag l1 are derived as follows:
-if colPredMode [ xccolcb ] [ yccolcb ] is equal to mode_inter, the following steps are applied:
the variable currCb specifies the luma coded block covered (xCtrCb, yCtrCb) within the current picture.
The variable colCb specifies a luma coded block within ColPic covering the modified position given by ((xccolcb > > 3) < <3, (yccolcb > > 3) < < 3).
-setting the luminance position (xccolcb, yccolcb) equal to the left upsampling point of the collocated luminance coding block specified by colCb with respect to the left upsampling point of the collocated picture specified by ColPic.
-invoking the derivation of the collocated motion vector specified in clause 8.5.2.12 with currCb, colCb, (xccolcb, yccolcb), refIdxL0 set to 0 and sbFlag set to 1 as inputs, and assigning the outputs to ctrMvL0 and ctrpredflag l0.
-invoking the derivation of the collocated motion vector specified in clause 8.5.2.12 with currCb, colCb, (xccolcb, yccolcb), refIdxL1 set to 0 and sbFlag set to 1 as inputs, and assigning the outputs to ctrMvL1 and ctrpredflag l1.
-otherwise, applying the steps of:
-ctrPredFlagL0=0 (8-563)
-ctrPredFlagL1=0 (8-564)
8.5.6.3 fractional sample interpolation process
8.5.6.3.1 overview
The inputs of this process are:
-specifying a luminance position (xSb, ySb) of a left upsampled point of the current coded sub-block relative to a left upsampled luminance sample of the current picture;
the variable sbWidth, specifies the width of the current coding sub-block,
the variable sbHeight, specifies the height of the current coded sub-block,
motion vector offset mvOffset,
a refined motion vector refMvLX,
the selected reference picture sample number group refPicLX,
half-pel interpolation filter index hpelIfIdx,
the bidirectional optical flow flag bdofFlag,
the variable cIdx specifies the color component index of the current block.
The output of this process is:
-an array predSamplesLX of (sbwidth+brdextsize) x (sbheight+brdextsize) of predicted sample values.
The prediction block boundary extension size brdExtSize is derived as follows:
-brdExtSize=(bdofFlag||(inter_affine_flag[xSb][ySb]&&sps_affine_prof_enabled_flag))?2:0 (8-752)
the variable fRefWidth is set equal to PicOutputWidthL of the reference picture in the luminance sample.
The variable fRefHeight is set equal to PicOutputHeight L of the reference picture in the luminance sample.
The motion vector mvLX is set equal to (refMvLX-mvOffset).
-if cIdx is equal to 0, applying the following steps:
-scaling factor and its fixed point representation are defined as
-hori_scale_fp=
((fRefWidth<<14)+(PicOutputWidthL
>>1))/PicOutputWidthL (8-753)
-vert_scale_fp=
((fRefHeight<<14)+(PicOutputHeightL
>>1))/PicOutputHeightL (8-754)
- Let (xIntL, yIntL) be a luma location given in full-sample units and (xFracL, yFracL) be an offset given in 1/16-sample units. These variables are used only in this clause for specifying fractional-sample locations inside the reference sample array refPicLX.
- The top-left coordinate of the bounding block for reference sample padding (xSbIntL, ySbIntL) is set equal to (xSb + (mvLX[0] >> 4), ySb + (mvLX[1] >> 4)).
- For each luma sample location (xL = 0..sbWidth - 1 + brdExtSize, yL = 0..sbHeight - 1 + brdExtSize) inside the prediction luma sample array predSamplesLX, the corresponding prediction luma sample value predSamplesLX[xL][yL] is derived as follows:
- Let (refxSbL, refySbL) and (refxL, refyL) be luma locations pointed to by a motion vector (refMvLX[0], refMvLX[1]) given in 1/16-sample units. The variables refxSbL, refxL, refySbL and refyL are derived as follows:
- refxSbL = ((xSb << 4) + refMvLX[0]) * hori_scale_fp (8-755)
- refxL = ((Sign(refxSbL) * ((Abs(refxSbL) + 128) >> 8) + xL * ((hori_scale_fp + 8) >> 4)) + 32) >> 6 (8-756)
- refySbL = ((ySb << 4) + refMvLX[1]) * vert_scale_fp (8-757)
- refyL = ((Sign(refySbL) * ((Abs(refySbL) + 128) >> 8) + yL * ((vert_scale_fp + 8) >> 4)) + 32) >> 6 (8-758)
- The variables xIntL, yIntL, xFracL and yFracL are derived as follows:
- xIntL = refxL >> 4 (8-759)
- yIntL = refyL >> 4 (8-760)
- xFracL = refxL & 15 (8-761)
- yFracL = refyL & 15 (8-762)
- using6TapFlag is set to 1 if all of the following conditions are met:
– cbWidth[0][xSb][ySb] <= 4 || cbHeight[0][xSb][ySb] <= 4 || cbWidth[0][xSb][ySb] * cbHeight[0][xSb][ySb] <= 64.
– PredFlagL0[xSb][ySb] == 1 && PredFlagL1[xSb][ySb] == 1.
- If bdofFlag is equal to TRUE or (sps_affine_prof_enabled_flag is equal to TRUE and inter_affine_flag[xSb][ySb] is equal to TRUE), and one or more of the following conditions are true, the prediction luma sample value predSamplesLX[xL][yL] is derived by invoking the luma integer sample fetching process as specified in clause 8.5.6.3.3 with (xIntL + (xFracL >> 3) - 1, yIntL + (yFracL >> 3) - 1) and refPicLX as inputs:
1. xL is equal to 0.
2. xL is equal to sbWidth + 1.
3. yL is equal to 0.
4. yL is equal to sbHeight + 1.
- Otherwise, the prediction luma sample value predSamplesLX[xL][yL] is derived by invoking the luma sample 8-tap interpolation filtering process as specified in clause 8.5.6.3.2 with (xIntL - (brdExtSize > 0 ? 1 : 0), yIntL - (brdExtSize > 0 ? 1 : 0)), (xFracL, yFracL), (xSbIntL, ySbIntL), refPicLX, hpelIfIdx, sbWidth, sbHeight, (xSb, ySb) and using6TapFlag as inputs.
Otherwise (cIdx is not equal to 0), the following steps are applied:
- Let (xIntC, yIntC) be a chroma location given in full-sample units and (xFracC, yFracC) be an offset given in 1/32-sample units. These variables are used only in this clause for specifying general fractional-sample locations inside the reference sample array refPicLX.
- The top-left coordinate of the bounding block for reference sample padding (xSbIntC, ySbIntC) is set equal to ((xSb / SubWidthC) + (mvLX[0] >> 5), (ySb / SubHeightC) + (mvLX[1] >> 5)).
- For each chroma sample location (xC = 0..sbWidth - 1, yC = 0..sbHeight - 1) inside the prediction chroma sample array predSamplesLX, the corresponding prediction chroma sample value predSamplesLX[xC][yC] is derived as follows:
- Let (refxSbC, refySbC) and (refxC, refyC) be chroma locations pointed to by a motion vector (mvLX[0], mvLX[1]) given in 1/32-sample units. The variables refxSbC, refySbC, refxC and refyC are derived as follows:
- refxSbC = ((xSb / SubWidthC << 5) + mvLX[0]) * hori_scale_fp (8-763)
- refxC = ((Sign(refxSbC) * ((Abs(refxSbC) + 256) >> 9) + xC * ((hori_scale_fp + 8) >> 4)) + 16) >> 5 (8-764)
- refySbC = ((ySb / SubHeightC << 5) + mvLX[1]) * vert_scale_fp (8-765)
- refyC = ((Sign(refySbC) * ((Abs(refySbC) + 256) >> 9) + yC * ((vert_scale_fp + 8) >> 4)) + 16) >> 5 (8-766)
- The variables xIntC, yIntC, xFracC and yFracC are derived as follows:
- xIntC = refxC >> 5 (8-767)
- yIntC = refyC >> 5 (8-768)
- xFracC = refxC & 31 (8-769)
- yFracC = refyC & 31 (8-770)
-deriving the predicted sample value predSamplesLX [ xC ] [ yC ] by invoking the procedure specified in section 8.5.6.3.4 with (xIntC, yIntC), (xFracC, yFracC), (xSbIntC, ySbIntC), sbWidth, sbHeight and refPicLX as inputs.
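For orientation, the fixed-point reference-position mapping of equations (8-753) to (8-762) above can be written as the following non-normative C++ sketch; the function and variable names are chosen only to mirror the text and are not part of the specification.

// Non-normative sketch of equations (8-753) to (8-762): map a luma sample
// (xL, yL) of the current sub-block at (xSb, ySb) with MV (mvx, mvy) in
// 1/16-pel units to an integer position plus 1/16-pel fraction in a
// reference picture of a different output size.
#include <cstdlib>

struct ScaledPos { int xInt, yInt, xFrac, yFrac; };

ScaledPos scaleLumaPosition(int xSb, int ySb, int xL, int yL, int mvx, int mvy,
                            int fRefWidth, int fRefHeight, int picOutW, int picOutH)
{
  const int horiScaleFp = ((fRefWidth  << 14) + (picOutW >> 1)) / picOutW;   // (8-753)
  const int vertScaleFp = ((fRefHeight << 14) + (picOutH >> 1)) / picOutH;   // (8-754)
  const long long refxSb = (long long)((xSb << 4) + mvx) * horiScaleFp;      // (8-755)
  const long long refySb = (long long)((ySb << 4) + mvy) * vertScaleFp;      // (8-757)
  const long long refx = (((refxSb >= 0 ? 1 : -1) * ((std::llabs(refxSb) + 128) >> 8)
                           + (long long)xL * ((horiScaleFp + 8) >> 4)) + 32) >> 6;  // (8-756)
  const long long refy = (((refySb >= 0 ? 1 : -1) * ((std::llabs(refySb) + 128) >> 8)
                           + (long long)yL * ((vertScaleFp + 8) >> 4)) + 32) >> 6;  // (8-758)
  ScaledPos p;
  p.xInt  = (int)(refx >> 4);   // (8-759)
  p.yInt  = (int)(refy >> 4);   // (8-760)
  p.xFrac = (int)(refx & 15);   // (8-761)
  p.yFrac = (int)(refy & 15);   // (8-762)
  return p;
}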
8.5.6.3.2 luminance sample interpolation filtering process
The inputs of this process are:
- a luminance position in full-sample units (xIntL, yIntL),
- a luminance position in fractional-sample units (xFracL, yFracL),
- a luminance position in full-sample units (xSbIntL, ySbIntL), specifying the top-left sample of the boundary block for reference sample padding relative to the top-left luminance sample of the reference picture,
- the luminance reference sample array refPicLXL,
- a half-sample interpolation filter index hpelIfIdx,
- a variable sbWidth specifying the width of the current sub-block,
- a variable sbHeight specifying the height of the current sub-block,
- a luminance position (xSb, ySb) specifying the top-left luminance sample of the current sub-block relative to the top-left luminance sample of the current picture,
- a flag using6TapFlag specifying whether a 6-tap interpolation filter is used.
The output of this process is the predicted luminance sample value predSampleLXL.
The variables shift1, shift2 and shift3 are derived as follows:
setting the variable shift1 equal to Min (4, bitdepth Y -8), variable shift2 is set equal to 6, variable shift3 is set equal to Max (2, 14-BitDepth) Y )。
Setting the variable picW equal to pic_width_in_luma_samples and the variable picH equal to pic_height_in_luma_samples.
The luminance interpolation filter coefficients fL[p] for each 1/16 fractional sample position p equal to xFracL or yFracL are derived as follows:
- If at least one of the following conditions is met, the luminance interpolation filter coefficients fL[p] are specified in Tables 8-12:
–MotionModelIdc[xSb][ySb] is greater than 0, and sbWidth and sbHeight are both equal to 4,
–using6TapFlag is equal to 1.
- Otherwise, the luminance interpolation filter coefficients fL[p] are specified in Tables 8-11 depending on hpelIfIdx.
For i = 0..7, the luminance positions (xInti, yInti) in full-sample units are derived as follows:
- If subpic_treated_as_pic_flag[SubPicIdx] is equal to 1, the following applies:
- xInti = Clip3(SubPicLeftBoundaryPos, SubPicRightBoundaryPos, xIntL + i - 3) (8-771)
- yInti = Clip3(SubPicTopBoundaryPos, SubPicBotBoundaryPos, yIntL + i - 3) (8-772)
- Otherwise (subpic_treated_as_pic_flag[SubPicIdx] is equal to 0), the following applies:
- xInti = Clip3(0, picW - 1, sps_ref_wraparound_enabled_flag ? ClipH((sps_ref_wraparound_offset_minus1 + 1) * MinCbSizeY, picW, xIntL + i - 3) : xIntL + i - 3) (8-773)
- yInti = Clip3(0, picH - 1, yIntL + i - 3) (8-774)
For i = 0..7, the luminance positions in full-sample units are further modified as follows:
- xInti = Clip3(xSbIntL - 3, xSbIntL + sbWidth + 4, xInti) (8-775)
- yInti = Clip3(ySbIntL - 3, ySbIntL + sbHeight + 4, yInti) (8-776)
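A brief sketch of the clamping in equations (8-771) to (8-776) follows; the picture dimensions and positions are hypothetical, and only the Clip3-based restriction of the eight tap positions is shown.

```c
/* Non-normative sketch: clamp the eight luminance tap positions to the picture
 * and then to the boundary block used for reference sample padding, mirroring
 * equations (8-773)-(8-776). All example values are hypothetical. */
#include <stdio.h>

static int Clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

int main(void) {
    int picW = 1920, picH = 1080;    /* pic_width/height_in_luma_samples     */
    int xIntL = 5, yIntL = 1078;     /* example integer luminance position   */
    int xSbIntL = 0, ySbIntL = 1072; /* boundary block for padding           */
    int sbWidth = 8, sbHeight = 8;

    for (int i = 0; i < 8; i++) {
        int xInt_i = Clip3(0, picW - 1, xIntL + i - 3);
        int yInt_i = Clip3(0, picH - 1, yIntL + i - 3);
        /* further restrict to the reference-sample-padded boundary block */
        xInt_i = Clip3(xSbIntL - 3, xSbIntL + sbWidth + 4, xInt_i);
        yInt_i = Clip3(ySbIntL - 3, ySbIntL + sbHeight + 4, yInt_i);
        printf("tap %d -> (%d,%d)\n", i, xInt_i, yInt_i);
    }
    return 0;
}
```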
The predicted luminance sample value predSampleLXL is derived as follows:
- If xFracL and yFracL are both equal to 0, the value of predSampleLXL is derived as follows:
- predSampleLXL = refPicLXL[xInt3][yInt3] << shift3 (8-777)
- Otherwise, if xFracL is not equal to 0 and yFracL is equal to 0, the value of predSampleLXL is derived as follows:
- Otherwise, if xFracL is equal to 0 and yFracL is not equal to 0, the value of predSampleLXL is derived as follows:
- Otherwise, if xFracL is not equal to 0 and yFracL is not equal to 0, the value of predSampleLXL is derived as follows:
- The sample array temp[n] for n = 0..7 is derived as follows:
- The predicted luminance sample value predSampleLXL is derived as follows:
Tables 8-11 - Specification of the luminance interpolation filter coefficients fL[p] for each 1/16 fractional sample position p
Tables 8-12 - Specification of the luminance interpolation filter coefficients fL[p] for each 1/16 fractional sample position p in affine motion mode
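The coefficient tables themselves are not reproduced above; as a reading aid, the selection between Tables 8-11 and 8-12 described in clause 8.5.6.3.2 can be sketched as follows. The function and its arguments are an illustrative assumption, not normative text.

```c
/* Non-normative sketch of the choice between Tables 8-11 and 8-12: the 6-tap
 * affine table is used when the block has an affine motion model with 4x4
 * sub-blocks or when using6TapFlag is 1; otherwise the default table indexed
 * by hpelIfIdx is used. Coefficient values are intentionally omitted. */
#include <stdbool.h>
#include <stdio.h>

static const char *select_luma_filter_table(int motionModelIdc, int sbWidth,
                                            int sbHeight, bool using6TapFlag,
                                            int hpelIfIdx) {
    if ((motionModelIdc > 0 && sbWidth == 4 && sbHeight == 4) || using6TapFlag)
        return "Tables 8-12 (6-tap, affine motion mode)";
    return hpelIfIdx ? "Tables 8-11 (half-sample filter row)"
                     : "Tables 8-11 (default 8-tap row)";
}

int main(void) {
    printf("%s\n", select_luma_filter_table(1, 4, 4, false, 0));
    printf("%s\n", select_luma_filter_table(0, 8, 8, false, 0));
    return 0;
}
```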
Fig. 30 is a flow chart of a method 3000 for video processing. The method 3000 includes, at operation 3010, for a conversion between a current block of video and a bitstream representation of the video, determining a maximum number of candidates (ML) in a sub-block-based Merge candidate list and/or adding sub-block-based temporal motion vector prediction (SbTMVP) candidates to the sub-block-based Merge candidate list, based on whether Temporal Motion Vector Prediction (TMVP) is enabled or whether the Current Picture Reference (CPR) codec mode is used for the conversion.
The method 3000 includes, at operation 3020, performing a conversion based on the determination.
Fig. 31 is a flow chart of a method 3100 for video processing. Method 3100 includes, at operation 3110, determining a maximum candidate number (ML) in a sub-block-based Merge candidate list based on whether Temporal Motion Vector Prediction (TMVP), sub-block-based temporal motion vector prediction (SbTMVP), and affine codec mode are enabled during a transition between a current block of video and a bitstream representation of the video.
The method 3100 includes, at operation 3120, performing a conversion based on the determination.
Fig. 32 is a flow chart of a method 3200 for video processing. The method 3200 includes, at operation 3210, for a conversion between a current block of a first video segment of the video and a bitstream representation of the video, determining that a sub-block based motion vector prediction (SbTMVP) mode is disabled for the conversion because a Temporal Motion Vector Prediction (TMVP) mode is disabled at a first video segment level.
The method 3200 includes, at operation 3220, performing the conversion based on the determination, wherein the bitstream representation conforms to a format that specifies whether an indication of the SbTMVP mode is included and/or a position of the indication of the SbTMVP mode relative to an indication of the TMVP mode in a Merge candidate list.
Fig. 33 is a flow chart of a method 3300 for video processing. Method 3300 includes, at operation 3310, performing a conversion between a current block of video encoded using a sub-block-based temporal motion vector prediction (SbTMVP) tool or a Temporal Motion Vector Prediction (TMVP) tool and a bitstream representation of the video, wherein coordinates of a corresponding position of the current block or a sub-block of the current block are selectively masked using a mask based on compression of a motion vector associated with the SbTMVP tool or the TMVP tool, and wherein application of the mask includes a bitwise AND operation between a value of the calculated coordinates and a value of the mask.
Fig. 34 is a flow chart of a method 3400 for video processing. The method 3400 includes, at operation 3410, determining a valid corresponding region of a current block of a video segment of the video based on one or more characteristics of the current block for application of a sub-block based motion vector prediction (SbTMVP) tool on the current block.
The method 3400 includes, at operation 3420, performing a transition between the current block and a bitstream representation of the video based on the determination.
Fig. 35 is a flow chart of a method 3500 for video processing. The method 3500 includes, at operation 3510, determining a default motion vector for a current block of video encoded using a sub-block based temporal motion vector prediction (SbTMVP) tool.
The method 3500 includes, at operation 3520, performing a conversion between the current block and the bitstream representation of the video based on the determination, wherein the default motion vector is determined without obtaining a motion vector from a block covering the corresponding position in the collocated picture that is associated with the center position of the current block.
Fig. 36 is a flow chart of a method 3600 for video processing. The method 3600 includes, at operation 3610, for a current block of a video segment of a video, inferring that a sub-block-based temporal motion vector prediction (SbTMVP) tool or a Temporal Motion Vector Prediction (TMVP) tool is disabled for the video segment if a current picture of the current block is a reference picture with an index set to M in a reference picture list X, M and X being integers, and X = 0 or X = 1.
Method 3600 includes, at operation 3620, performing a transition between the current block and a bitstream representation of the video based on the inference.
Fig. 37 is a flowchart of a method 3700 for video processing. The method 3700 includes, at operation 3710, determining to enable application of a sub-block based temporal motion vector prediction (SbTMVP) tool in a case where, for a current block of video, a current picture of the current block is a reference picture with an index set to M in a reference picture list X, M and X being integers.
The method 3700 includes, at operation 3720, performing a transition between a current block and a bitstream representation of the video based on the determining.
Fig. 38 is a flow chart of a method 3800 for video processing. Method 3800 includes, at operation 3810, performing a conversion between a current block of video and a bitstream representation of the video, the current block being encoded with a sub-block based codec tool, wherein performing the conversion includes coding a sub-block Merge index using a number of binary bits (N) with a unified method regardless of whether a sub-block based temporal motion vector prediction (SbTMVP) tool is enabled or disabled.
Fig. 39 is a flow chart of a method 3900 for video processing. The method 3900 includes, at operation 3910, determining, for a current block of video encoded using a sub-block based temporal motion vector prediction (SbTMVP) tool, a motion vector used by the SbTMVP tool to locate a corresponding block in a picture different from a current picture including the current block.
The method 3900 includes, at operation 3920, performing a transition between a current block and a bitstream representation of a video based on the determination.
Fig. 40 is a flow chart of a method 4000 for video processing. The method 4000 includes, at operation 4010, determining, for a transition between a current block of video and a bitstream representation of video, whether to insert a zero motion affine Merge candidate into a sub-block Merge candidate list based on whether affine prediction is enabled for the transition of the current block.
Method 4000 includes, at operation 4020, performing a conversion based on the determination.
Fig. 41 is a flow chart of a method 4100 for video processing. The method 4100 includes, at operation 4110, for a transition between a current block of video using a sub-block Merge candidate list and a bitstream representation of the video, inserting a zero-motion non-affine fill candidate into the sub-block Merge candidate list if the sub-block Merge candidate list is not full.
The method 4100 includes, at operation 4120, performing a conversion after the inserting.
Fig. 42 is a flowchart of a method 4200 for video processing. Method 4200 includes, at operation 4210, determining, for a conversion between a current block of video and a bitstream representation of the video, a motion vector using rules that specify that the motion vector is derived from one or more motion vectors of a block covering a corresponding position in a collocated picture.
The method 4200 includes, at operation 4220, performing conversion based on the motion vector.
Fig. 43 is a block diagram of the video processing apparatus 4300. The apparatus 4300 may be used to implement one or more methods described herein. The apparatus 4300 may be embodied in a smart phone, tablet, computer, internet of things (IoT) receiver, or the like. The device 4300 may include one or more processors 4302, one or more memories 4304, and video processing hardware 4306. The processor 4302 may be configured to implement one or more methods described herein. Although some embodiments may operate without memory, memory 4304 (or memories) may be used to store data and code for implementing the methods and techniques described herein. The video processing hardware 4306 may be used to implement some of the techniques described in this document in hardware circuitry.
In some embodiments, the video codec method may be implemented using an apparatus implemented on a hardware platform as described with respect to fig. 43.
Some embodiments of the disclosed technology include making decisions or determinations to enable a video processing tool or mode. In an example, when a video processing tool or mode is enabled, the encoder will use or implement the tool or mode in the processing of video blocks, but will not necessarily modify the resulting bitstream based on the use of the tool or mode. That is, the conversion from video blocks to the bitstream representation of the video will use the video processing tool or mode when it is enabled based on the decision or determination. In another example, when the video processing tool or mode is enabled, the decoder will process the bitstream with the knowledge that the bitstream has been modified based on the video processing tool or mode. That is, the conversion from the bitstream representation of the video to video blocks will be performed using the video processing tool or mode that was enabled based on the decision or determination.
Some embodiments of the disclosed technology include making decisions or determinations to disable a video processing tool or mode. In an example, when a video processing tool or mode is disabled, the encoder will not use the tool or mode in the conversion of video blocks to a bitstream representation of the video. In another example, when a video processing tool or mode is disabled, the decoder will process the bitstream with the knowledge that the bitstream has not been modified using the video processing tool or mode that was disabled based on the decision or determination.
Fig. 44 is a block diagram illustrating an example video processing system 4400 in which various techniques disclosed herein may be implemented. Various embodiments may include some or all of the components of system 4400. The system 4400 may include an input 4402 for receiving video content. The video content may be received in an original or uncompressed format (e.g., 8 or 10 bit multi-component pixel values), or may be received in a compressed or encoded format. Input 4402 may represent a network interface, a peripheral bus interface, or a memory interface. Examples of network interfaces include wired interfaces (such as ethernet, passive Optical Network (PON), etc.) and wireless interfaces (such as Wi-Fi or cellular interfaces).
The system 4400 may include an encoding component 4404 that may implement the various encoding methods described herein. The encoding component 4404 may reduce the average bitrate of the video from the input 4402 to the output of the encoding component 4404 to produce an encoded representation of the video. The encoding techniques are therefore sometimes referred to as video compression or video transcoding techniques. The output of the encoding component 4404 may be stored or transmitted via a connected communication, as represented by component 4406. The stored or transmitted bitstream (or encoded) representation of the video received at input 4402 may be used by component 4408 to generate pixel values or displayable video that is sent to display interface 4410. The process of generating user-viewable video from a bitstream representation is sometimes referred to as video decompression. Further, while certain video processing operations are referred to as "encoding" operations or tools, it should be understood that encoding tools or operations are used at the encoder and that corresponding decoding tools or operations that reverse the results of the encoding will be performed by the decoder.
Examples of the peripheral bus interface or the display interface may include a Universal Serial Bus (USB) or a High Definition Multimedia Interface (HDMI) or Displayport, or the like. Examples of storage interfaces include SATA (serial advanced technology attachment), PCI, IDE interfaces, and the like. The techniques described herein may be embodied in various electronic devices such as mobile phones, laptops, smartphones, or other devices capable of performing digital data processing and/or video display.
In some embodiments, the following technical solutions may be implemented:
A1. A video processing method, comprising: for a conversion between a current block of video and a bitstream representation of the video, determining a maximum number of candidates (ML) in a sub-block based Merge candidate list and/or adding a sub-block based temporal motion vector prediction (SbTMVP) candidate into the sub-block based Merge candidate list, based on whether Temporal Motion Vector Prediction (TMVP) is enabled for the conversion or whether a Current Picture Reference (CPR) codec mode is used for the conversion; and performing the conversion based on the determination.
A2. The method of solution A1, wherein in case of disabling the TMVP tool or disabling the SbTMVP tool, use of the SbTMVP candidate is disabled.
A3. The method of solution A2, wherein determining ML comprises: sbTMVP candidates are excluded from the sub-block-based Merge candidate list based on whether to disable the SbTMVP tool or TMVP tool.
A4. A video processing method, comprising: for a conversion between a current block of video and a bitstream representation of the video, determining a maximum number of candidates (ML) in a sub-block-based Merge candidate list based on whether Temporal Motion Vector Prediction (TMVP), sub-block-based temporal motion vector prediction (SbTMVP), and affine codec mode are enabled for the conversion; and performing the conversion based on the determination.
A5. The method according to solution A4, wherein in case affine codec mode is enabled, ML is dynamically set and signaled in the bitstream representation.
A6. The method according to solution A4, wherein ML is predefined in case affine codec mode is disabled.
A7. The method of solution A2 or A6, wherein determining ML comprises: with TMVP tools disabled, sbTMVP tools enabled, and affine codec mode of the current block disabled, ML is set to zero.
A8. The method of solution A2 or A6, wherein determining ML comprises: in the case where the SbTMVP tool is enabled, the TMVP tool is enabled, and the affine codec mode of the current block is disabled, ML is set to one.
A9. The method of solution A1, wherein the use of the SbTMVP candidate is disabled in case the SbTMVP tool is disabled or the collocated reference picture of the current picture of the current block is the current picture.
A10. The method of solution A9, wherein determining ML comprises: the SbTMVP candidate is excluded from the sub-block-based Merge candidate list based on whether the SbTMVP tool is disabled or whether the collocated reference picture of the current picture is the current picture.
A11. The method of solution A9, wherein determining ML comprises: in the case where the collocated reference picture of the current picture is the current picture and affine encoding of the current block is disabled, ML is set to zero.
A12. The method of solution A9, wherein determining ML comprises: in the case where the SbTMVP tool is enabled, the collocated reference picture of the current picture is not the current picture, and affine encoding of the current block is disabled, ML is set to 1.
A13. The method of solution A1, wherein the use of the SbTMVP candidate is disabled in case the SbTMVP tool is disabled or the reference picture with reference picture index 0 in reference picture list 0 (L0) is the current picture of the current block.
A14. The method of solution A13, wherein determining ML comprises: SbTMVP candidates are excluded from the sub-block-based Merge candidate list based on whether the SbTMVP tool is disabled or whether the reference picture with reference picture index 0 in L0 is a current picture.
A15. The method of solution A13, wherein determining ML comprises: In the case where the SbTMVP tool is enabled, the reference picture with reference picture index 0 in L0 is the current picture, and affine encoding of the current block is disabled, ML is set to zero.
A16. The method of solution A13, wherein determining ML comprises: In the case where the SbTMVP tool is enabled, the reference picture with reference picture index 0 in L0 is not the current picture, and affine encoding of the current block is disabled, ML is set to one.
A17. The method of solution A1, wherein in case of disabling the SbTMVP tool or that the reference picture with reference picture index 0 in reference picture list 1 (L1) is the current picture of the current block, the use of the SbTMVP candidate is disabled.
A18. The method of solution A17, wherein determining ML comprises: SbTMVP candidates are excluded from the sub-block-based Merge candidate list based on whether the SbTMVP tool is disabled or whether the reference picture with reference picture index 0 in L1 is a current picture.
A19. The method of solution A17, wherein determining ML comprises: In the case where the SbTMVP tool is enabled, the reference picture with reference picture index 0 in L1 is the current picture, and affine encoding of the current block is disabled, ML is set to zero.
A20. The method of solution A17, wherein determining ML comprises: In the case where the SbTMVP tool is enabled, the reference picture with reference picture index 0 in L1 is not the current picture, and affine encoding of the current block is disabled, ML is set to one.
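A minimal sketch of how ML could follow from the enabled tools, under the assumptions of solutions A5, A7 and A8, is given below; the function and its arguments are illustrative and not part of any specification.

```c
/* Sketch (an assumption, not a normative derivation) of how the maximum number
 * of sub-block Merge candidates ML could be derived from the enabled tools, in
 * the spirit of solutions A5, A7 and A8. */
#include <stdbool.h>
#include <stdio.h>

static int derive_max_subblock_merge_cand(bool sbtmvp_enabled,
                                          bool tmvp_enabled,
                                          bool affine_enabled,
                                          int  signaled_ml_when_affine) {
    if (affine_enabled)
        return signaled_ml_when_affine;          /* dynamically signaled (A5)  */
    /* affine disabled: only the SbTMVP candidate can enter the list */
    return (sbtmvp_enabled && tmvp_enabled) ? 1 : 0;  /* A8 / A7              */
}

int main(void) {
    printf("ML = %d\n", derive_max_subblock_merge_cand(true, true, false, 5));
    printf("ML = %d\n", derive_max_subblock_merge_cand(true, false, false, 5));
    return 0;
}
```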
A21. A video processing method, comprising: determining, for a conversion between a current block of a first video segment of the video and a bitstream representation of the video, that a sub-block based motion vector prediction (SbTMVP) mode is disabled for the conversion due to a Temporal Motion Vector Prediction (TMVP) mode being disabled at a first video segment level; and performing the conversion based on the determination, wherein the bitstream representation conforms to a format that specifies whether an indication of the SbTMVP mode is included and/or a position of the indication of the SbTMVP mode relative to an indication of the TMVP mode in a Merge candidate list.
A22. The method of solution A21, wherein the first video segment is a sequence, a slice, or a picture.
A23. The method of solution A21, wherein the format specifies that the indication of the SbTMVP mode is omitted when the indication of the TMVP mode is included at the first video segment level.
A24. The method of solution A21, wherein the format specifies that the indication of the SbTMVP mode follows, in decoding order, the indication of the TMVP mode at the first video segment level.
A25. The method according to any one of solutions A21 to A24, wherein, in case the TMVP mode is indicated as disabled, the format specification omits the indication of the SbTMVP mode.
A26. The method of solution A21, wherein the format specification includes an indication of the SbTMVP mode at a sequence level of the video and the indication of the SbTMVP mode is omitted at a second video segment level.
A27. The method of solution A26, wherein the second video segment at the second video segment level is a slice, a tile, or a picture.
A28. The method of any one of solutions A1 to A27, wherein the conversion generates the current block from the bitstream representation.
A29. The method of any one of solutions A1 to A27, wherein the conversion generates the bitstream representation from the current block.
A30. The method of any one of solutions A1 to A27, wherein performing the conversion comprises parsing the bitstream representation based on one or more decoding rules.
A31. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of one or more of solutions A1 to A30.
A32. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method of one or more of solutions A1 to A30.
In some embodiments, the following technical solutions may be implemented:
B1. A video processing method, comprising: performing a conversion between a current block of video encoded using a sub-block based temporal motion vector prediction (SbTMVP) tool or a Temporal Motion Vector Prediction (TMVP) tool and a bitstream representation of the video, wherein coordinates of corresponding locations of the current block or sub-block of the current block are selectively masked using a mask based on compression of a motion vector associated with the SbTMVP tool or the TMVP tool, and wherein application of the mask includes a bitwise AND operation between a value of the calculated coordinates and a value of the mask.
B2. The method according to solution B1, wherein the coordinates are (xN, yN), and the mask MASK is equal to ~(2^M - 1), wherein M is an integer, wherein application of the mask results in masked coordinates (xN', yN'), wherein xN' = xN & MASK and yN' = yN & MASK, and wherein "~" is a bitwise NOT operation and "&" is a bitwise AND operation.
B3. The method according to solution B2, wherein m=3 or m=4.
B4. The method according to solution B2 or B3, wherein, based on the compression of motion vectors, a plurality of blocks of size 2^K × 2^K share the same motion information, and wherein K is an integer not equal to M.
B5. The method according to solution B4, wherein M = K +1.
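The masking described in solutions B1 to B5 can be pictured with the following sketch; the values of M and of the coordinates are hypothetical.

```c
/* Minimal sketch (an assumption about how the masking in solutions B1-B5 could
 * be realized): clear the low M bits of the computed coordinates with a
 * bitwise AND so that positions inside one motion-compression grid cell map to
 * the same stored motion vector. */
#include <stdio.h>

int main(void) {
    const int M = 3;                  /* e.g. K = 2, M = K + 1 per solution B5 */
    const int MASK = ~((1 << M) - 1); /* ~(2^M - 1)                            */

    int xN = 37, yN = 90;             /* example corresponding-position coords */
    int xNp = xN & MASK;              /* xN' = xN & MASK                        */
    int yNp = yN & MASK;              /* yN' = yN & MASK                        */

    printf("(%d,%d) -> (%d,%d)\n", xN, yN, xNp, yNp);  /* (37,90) -> (32,88)   */
    return 0;
}
```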
B6. The method of solution B1, wherein no mask is applied after determining that the motion vector associated with the SbTMVP tool or TMVP tool is not compressed.
B7. The method according to any one of solutions B1 to B6, wherein the mask of the SbTMVP tool is identical to the mask of the TMVP tool.
B8. The method according to any one of the solutions B1 to B6, wherein the mask of the ATMVP tool is different from the mask of the TMVP tool.
B9. The method according to solution B1, wherein the type of compression is no compression, 8 x 8 compression or 16 x 16 compression.
B10. The method according to solution B9, wherein the type of compression is signaled in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice header or a slice group header.
B11. The method of solution B9 or B10, wherein the type of compression is based on a standard profile, level or hierarchy corresponding to the current block.
B12. A video processing method, comprising: determining a valid corresponding region of a current block of a video segment of the video based on one or more characteristics of the current block for application of a sub-block based motion vector prediction (SbTMVP) tool on the current block; and based on the determination, performing a transition between the current block and the bitstream representation of the video.
B13. The method of solution B12, wherein the one or more characteristics include a height or width of the current block.
B14. The method of solution B12, wherein the one or more characteristics include a type of compression of a motion vector associated with the current block.
B15. The method of solution B14, wherein the effective corresponding region is a first size in a case where the type of compression includes no compression, and wherein the effective corresponding region is a second size larger than the first size in a case where the type of compression includes K×K compression.
B16. The method according to solution B12, wherein the size of the effective corresponding region is based on a base region having a size of M×N that is smaller than the size of a Coding Tree Unit (CTU) region, and wherein the size of the current block is W×H.
B17. The method according to solution B16, wherein the size of the CTU region is 128×128, and wherein M = 64 and N = 64.
B18. The method according to solution B16, wherein, in the case where W <= M and H <= N, the effective corresponding region is the collocated base region and an extension in the collocated picture.
B19. The method of solution B16, wherein when determining W > M and H > N, the current block is partitioned into several parts, and wherein each of the several parts includes a separate valid corresponding region for application of the SbTMVP tool.
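The following sketch illustrates, under the assumptions of solutions B16 to B19, how a block larger than the base region could be split into parts; the sizes are examples only.

```c
/* Non-normative sketch for solutions B16-B19: a block no larger than the M x N
 * base region uses a single valid corresponding region; a block with W > M and
 * H > N is split into parts, each with its own valid corresponding region. */
#include <stdio.h>

int main(void) {
    const int M = 64, N = 64;      /* base region, as in solution B17      */
    int W = 128, H = 128;          /* hypothetical current block size      */

    if (W <= M && H <= N) {
        printf("single valid corresponding region\n");
    } else if (W > M && H > N) {
        int partsX = (W + M - 1) / M;   /* parts horizontally */
        int partsY = (H + N - 1) / N;   /* parts vertically   */
        printf("%d x %d parts, each with its own valid corresponding region\n",
               partsX, partsY);
    }
    return 0;
}
```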
B20. A video processing method, comprising: determining a default motion vector for a current block of video encoded using a sub-block-based temporal motion vector prediction (SbTMVP) tool; and based on the determining, performing a transition between the current block and the bitstream representation of the video, wherein a default motion vector is determined without obtaining a motion vector from a block that covers a corresponding position in the collocated picture associated with the center position of the current block.
B21. The method of solution B20, wherein the default motion vector is set to (0, 0).
B22. The method of solution B20, wherein the default motion vector is derived from a history-based motion vector prediction (HMVP) table.
B23. The method according to solution B22, wherein in case the HMVP table is empty, the default motion vector is set to (0, 0).
B24. The method of solution B22, wherein the default motion vector is predefined and signaled in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice header, a slice group header, a Coding Tree Unit (CTU), or a Coding Unit (CU).
B25. The method according to solution B22, wherein in case the HMVP table is non-empty, the default motion vector is set to the first element stored in the HMVP table.
B26. The method of solution B22, wherein in case the HMVP table is non-empty, the default motion vector is set to the last element stored in the HMVP table.
B27. The method of solution B22, wherein in case the HMVP table is non-empty, the default motion vector is set to a specific motion vector stored in the HMVP table.
B28. The method of solution B27 wherein the particular motion vector references reference list 0.
B29. The method of solution B27 wherein the specific motion vector references reference list 1.
B30. The method of solution B27, wherein the particular motion vector references a particular reference picture in reference list 0.
B31. The method of solution B27 wherein the particular motion vector references a particular reference picture in reference list 1.
B32. The method of solution B30 or B31, wherein the specific reference picture has an index of 0.
B33. The method of solution B27 wherein the specific motion vector references a collocated picture.
B34. The method according to solution B22, wherein in case that a specific motion vector is not found by the search process in the HMVP table, the default motion vector is set to a predetermined default motion vector.
B35. The method according to solution B34, wherein the search process searches only the first element or only the last element of the HMVP table.
B36. The method of solution B34, wherein the search process searches only a subset of the elements of the HMVP table.
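A compact sketch of the default-motion-vector fallback of solutions B20 to B26 follows; the table layout and types are assumptions made for illustration.

```c
/* Non-normative sketch of the fallback described in solutions B20-B26:
 * initialize the default motion vector to (0,0) and, if a history-based MVP
 * (HMVP) table is non-empty, take e.g. its first entry instead. */
#include <stddef.h>
#include <stdio.h>

typedef struct { int x, y; } MotionVector;

typedef struct {
    MotionVector entries[6];   /* hypothetical fixed-size table */
    size_t count;
} HmvpTable;

static MotionVector derive_default_mv(const HmvpTable *hmvp) {
    MotionVector mv = {0, 0};              /* default per solutions B21/B23 */
    if (hmvp != NULL && hmvp->count > 0)
        mv = hmvp->entries[0];             /* first stored element, solution B25 */
    return mv;
}

int main(void) {
    HmvpTable table = { .entries = { {4, -2} }, .count = 1 };
    MotionVector mv = derive_default_mv(&table);
    printf("default MV = (%d,%d)\n", mv.x, mv.y);
    return 0;
}
```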
B37. The method of solution B22, wherein the default motion vector does not reference the current picture of the current block.
B38. The method of solution B22, wherein the default motion vector is scaled to the collocated picture if the default motion vector does not reference the collocated picture.
B39. The method of solution B20, wherein the default motion vector is derived from neighboring blocks.
B40. The method of solution B39, wherein the upper right corner of the neighboring block (A0) is directly adjacent to the lower left corner of the current block, or the lower right corner of the neighboring block (A1) is directly adjacent to the lower left corner of the current block, or the lower left corner of the neighboring block (B0) is directly adjacent to the upper right corner of the current block, or the lower right corner of the neighboring block (B1) is directly adjacent to the upper right corner of the current block, or the lower right corner of the neighboring block (B2) is directly adjacent to the upper left corner of the current block.
B41. The method according to solution B40, wherein the default motion vector is derived from only one of the neighboring blocks A0, A1, B0, B1 and B2.
B42. The method of solution B40, wherein the default motion vector is derived from one or more of the neighboring blocks A0, A1, B0, B1 and B2.
B43. The method according to solution B40, wherein in case a valid default motion vector cannot be found in any of the neighboring blocks A0, A1, B0, B1 and B2, the default motion vector is set to a predefined default motion vector.
B44. The method according to solution B43, wherein the predefined default motion vector is signaled in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice header, a slice group header, a Coding Tree Unit (CTU), or a Coding Unit (CU).
B45. The method of solutions B43 and B44, wherein the predefined default motion vector is (0, 0).
B46. The method of solution B39, wherein the default motion vector is set to a specific motion vector from a neighboring block.
B47. The method of solution B46, wherein the particular motion vector references reference list 0.
B48. The method of solution B46, wherein the specific motion vector references reference list 1.
B49. The method of solution B46, wherein the particular motion vector references a particular reference picture in reference list 0.
B50. The method of solution B46, wherein the particular motion vector references a particular reference picture in reference list 1.
B51. The method of solution B49 or B50, wherein the specific reference picture has index 0.
B52. The method of solution B46, wherein the specific motion vector references a collocated picture.
B53. The method according to solution B20, wherein in case that the block covering the corresponding position in the collocated picture is intra coded, a default motion vector is used.
B54. The method according to solution B20, wherein the derivation method is modified without locating a block covering a corresponding position in the collocated picture.
B55. The method according to solution B20, wherein default motion vector candidates are always available.
B56. The method of solution B20, wherein upon determining that the default motion vector candidate is set to unavailable, the default motion vector is derived in an alternative manner.
B57. The method of solution B20, wherein the availability of the default motion vector is based on syntax information in a bitstream representation associated with the video clip.
B58. The method of solution B57, wherein the syntax information includes an indication that the SbTMVP tool is enabled, and wherein the video clip is a slice, or a picture.
B59. The method of solution B58, wherein the current picture of the current block is not an Intra Random Access Point (IRAP) picture, and the current picture is not inserted into reference picture list 0 (L0) with reference index 0.
B60. The method of solution B20, wherein in the case of enabling the SbTMVP tool, a fixed index or fixed index group is assigned to a candidate associated with the SbTMVP tool, and wherein in the case of disabling the SbTMVP tool, a fixed index or fixed index group is assigned to a candidate associated with a codec tool other than the SbTMVP tool.
B61. A video processing method, comprising: for a current block of a video segment of video, inferring that a sub-block based temporal motion vector prediction (SbTMVP) tool or a Temporal Motion Vector Prediction (TMVP) tool is disabled for the video segment in a case where the current picture of the current block is a reference picture with an index set to M in a reference picture list X, wherein M and X are integers, and wherein X = 0 or X = 1; and performing a conversion between the current block and a bitstream representation of the video based on the inference.
B62. The method according to solution B61, wherein M corresponds to a target reference picture index, and for reference picture list X for SbTMVP tool or TMVP tool, motion information of the temporal block is scaled to the target reference picture index.
B63. The method of solution B61, wherein the current picture is an Intra Random Access Point (IRAP) picture.
B64. A video processing method, comprising: for a current block of video, determining to enable application of a sub-block based temporal motion vector prediction (SbTMVP) tool in the case where a current picture of the current block is a reference picture with an index set to M in a reference picture list X, where M and X are integers; and based on the determination, performing a transition between the current block and the bitstream representation of the video.
B65. The method of solution B64, wherein the motion information corresponding to each sub-block of the current block references the current picture.
B66. The method of solution B64, wherein the motion information of the sub-block of the current block is derived from a temporal block, and wherein the temporal block is encoded with at least one reference picture that references the current picture of the temporal block.
B67. The method of solution B66, wherein the converting does not include a scaling operation.
B68. A video processing method, comprising: performing a conversion between a current block of video and a bitstream representation of the video, wherein the current block is encoded with a sub-block based codec tool, and wherein performing the conversion comprises coding a sub-block Merge index using a number of binary bits (N) with a unified method regardless of whether a sub-block based temporal motion vector prediction (SbTMVP) tool is enabled or disabled.
B69. The method of solution B68, wherein a first number of bits (L) of the plurality of bits is context coded, and wherein a second number of bits (N-L) is bypass coded.
B70. The method according to solution B69, wherein L = 1.
B71. The method of solution B68 wherein each of the plurality of binary bits is context coded.
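The unified binarization of solutions B68 to B71 can be sketched as follows; the arithmetic coding engine is omitted, and the truncated unary binarization is an assumption consistent with the bin counts described above.

```c
/* Non-normative sketch for solutions B68-B71: binarize a sub-block Merge index
 * with a truncated unary code and mark the first L bins as context coded and
 * the remaining bins as bypass coded. The actual arithmetic coder is omitted. */
#include <stdio.h>

static void code_subblock_merge_idx(int idx, int maxNumCand, int L) {
    int numBins = maxNumCand - 1;                /* at most N bins               */
    for (int binIdx = 0; binIdx < numBins; binIdx++) {
        int bin = (binIdx < idx) ? 1 : 0;
        int contextCoded = (binIdx < L);         /* first L bins use contexts    */
        printf("bin %d = %d (%s)\n", binIdx, bin,
               contextCoded ? "context" : "bypass");
        if (bin == 0)
            break;                               /* truncated unary terminates   */
    }
}

int main(void) {
    code_subblock_merge_idx(2 /* index */, 5 /* ML */, 1 /* L = 1, solution B70 */);
    return 0;
}
```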
B72. The method of any of solutions B1 to B71, wherein the conversion generates the current block from the bitstream representation.
B73. The method according to any of the solutions B1 to B71, wherein the converting generates a bitstream representation from the current block.
B74. The method according to any of the solutions B1 to B71, wherein performing the conversion comprises parsing the bitstream representation based on one or more decoding rules.
B75. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement a method of one or more of solutions B1 to B74.
B76. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing a method according to one or more of solutions B1 to B74.
In some embodiments, the following technical solutions may be implemented:
C1. A video processing method, comprising: for a current block of video encoded using a sub-block-based temporal motion vector prediction (SbTMVP) tool, determining a motion vector used by the SbTMVP tool to locate a corresponding block in a picture different from a current picture including the current block; and based on the determination, performing a transition between the current block and the bitstream representation of the video.
C2. The method according to solution C1, wherein the motion vector is set as a default motion vector.
C3. The method of solution C2, wherein the default motion vector is (0, 0).
C4. The method according to solution C2, wherein the default motion vector is signaled in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice header, a slice group header, a Coding Tree Unit (CTU), or a Coding Unit (CU).
C5. The method according to solution C1, wherein the motion vector is set to a motion vector stored in a history-based motion vector prediction (HMVP) table.
C6. The method according to solution C5, wherein the motion vector is set as a default motion vector in case the HMVP table is empty.
C7. The method of solution C6, wherein the default motion vector is (0, 0).
C8. The method according to solution C5, wherein in case the HMVP table is non-empty, the motion vector is set to the first motion vector stored in the HMVP table.
C9. The method according to solution C5, wherein in case the HMVP table is non-empty, the motion vector is set to the last motion vector stored in the HMVP table.
C10. The method according to solution C5, wherein in case the HMVP table is non-empty, the motion vector is set to a specific motion vector stored in the HMVP table.
C11. The method of solution C10, wherein the particular motion vector references reference list 0.
C12. The method of solution C10, wherein the specific motion vector references reference list 1.
C13. The method of solution C10, wherein the particular motion vector references a particular reference picture in reference list 0.
C14. The method of solution C10, wherein the particular motion vector references a particular reference picture in reference list 1.
C15. The method of solution C13 or C14, wherein the specific reference picture has index 0.
C16. The method of solution C10, wherein the specific motion vector references a collocated picture.
C17. The method according to solution C5, wherein in case that a specific motion vector is not found by the search process in the HMVP table, the motion vector is set as a default motion vector.
C18. The method according to solution C17, wherein the search process searches only the first element or only the last element of the HMVP table.
C19. The method of solution C17, wherein the search process searches only a subset of the elements of the HMVP table.
C20. The method according to solution C5, wherein the motion vector stored in the HMVP table does not reference the current picture.
C21. The method according to solution C5, wherein a motion vector stored in the HMVP table that does not reference the collocated picture is scaled to the collocated picture.
C22. The method according to solution C1, wherein the motion vector is set to a specific motion vector of a specific neighboring block.
C23. The method of solution C22, wherein the upper right corner of the neighboring block (A0) is directly adjacent to the lower left corner of the current block, or the lower right corner of the neighboring block (A1) is directly adjacent to the lower left corner of the current block, or the lower left corner of the neighboring block (B0) is directly adjacent to the upper right corner of the current block, or the lower right corner of the neighboring block (B1) is directly adjacent to the upper right corner of the current block, or the lower right corner of the neighboring block (B2) is directly adjacent to the upper left corner of the current block.
C24. The method according to solution C1, wherein in case that a specific neighboring block does not exist, the motion vector is set as a default motion vector.
C25. The method according to solution C1, wherein the motion vector is set as a default motion vector without inter-coding a specific neighboring block.
C26. The method of solution C22, wherein the particular motion vector references reference list 0.
C27. The method of solution C22, wherein the specific motion vector references reference list 1.
C28. The method of solution C22, wherein the particular motion vector references a particular reference picture in reference list 0.
C29. The method of solution C22, wherein the particular motion vector references a particular reference picture in reference list 1.
C30. The method of solution C28 or C29, wherein the particular reference picture has an index of 0.
C31. The method of solution C22 or C23, wherein a particular motion vector references a collocated picture.
C32. The method of solution C22 or C23, wherein the motion vector is set to a default motion vector in case a specific neighboring block does not reference a collocated picture.
C33. The method of any one of solutions C24-C32, wherein the default motion vector is (0, 0).
C34. The method according to solution C1, wherein in case that a specific motion vector stored in a specific neighboring block cannot be found, the motion vector is set as a default motion vector.
C35. The method according to solution C22, wherein the specific motion vector is scaled to the collocated picture in case the specific motion vector does not reference the collocated picture.
C36. The method of solution C22, wherein the particular motion vector does not reference the current picture.
C37. The method of any one of solutions C1 to C36, wherein the conversion generates the current block from the bitstream representation.
C38. The method of any one of solutions C1-C36, wherein the converting generates a bitstream representation from the current block.
C39. The method of any of solutions C1-C36, wherein performing the conversion comprises parsing the bitstream representation based on one or more decoding rules.
C40. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement a method of one or more of solutions C1 to C39.
C41. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method of one or more of the solutions C1 to C39.
In some embodiments, the following technical solutions may be implemented:
D1. A video processing method, comprising: for a transition between a current block of video and a bitstream representation of video, determining whether to insert a zero motion affine Merge candidate into a sub-block Merge candidate list based on whether affine prediction is enabled for the transition of the current block; and performing a conversion based on the determination.
D2. The method according to solution D1, wherein the zero motion affine Merge candidate is not inserted into the sub-block Merge candidate list in case the affine use flag in the bitstream representation is off.
D3. The method according to solution D2, further comprising: in the case where the affine use flag is off, a default motion vector candidate as a non-affine candidate is inserted into the sub-block Merge candidate list.
D4. A video processing method, comprising: for a transition between a current block of a video using a sub-block Merge candidate list and a bitstream representation of the video, inserting zero-motion non-affine fill candidates into the sub-block Merge candidate list if the sub-block Merge candidate list is not full; and performing the conversion after the inserting.
D5. The method according to solution D4, further comprising: the affine use flag of the current block is set to zero.
D6. The method of solution D4, wherein the inserting is further based on whether affine use flags in the bitstream representation are off.
D7. A video processing method, comprising: for a conversion between a current block of video and a bitstream representation of the video, determining a motion vector using rules that specify that the motion vector is derived from one or more motion vectors of a block covering a corresponding position in the collocated picture; and performing the conversion based on the motion vector.
D8. The method according to solution D7, wherein the one or more motion vectors comprise MV0 and MV1 representing motion vectors in reference list 0 and reference list 1, respectively, and wherein the motion vectors to be derived comprise MV0' and MV1' representing motion vectors in reference list 0 and reference list 1.
D9. The method according to solution D8, wherein MV0 'and MV1' are derived based on MV0 in case of the collocated picture in reference list 0.
D10. The method according to solution D8, wherein MV0 'and MV1' are derived based on MV1 in case of collocated pictures in reference list 1.
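A small sketch of the derivation in solutions D8 to D10 is shown below; reference-picture scaling is deliberately left out and the data types are illustrative.

```c
/* Sketch under assumptions for solutions D8-D10: when the collocated picture is
 * in reference list 0, both derived motion vectors come from MV0 of the block
 * covering the corresponding position; when it is in list 1, both come from MV1. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { int x, y; } MotionVector;

static void derive_mvs(MotionVector mv0, MotionVector mv1, bool col_in_list0,
                       MotionVector *mv0_out, MotionVector *mv1_out) {
    MotionVector src = col_in_list0 ? mv0 : mv1;  /* D9 / D10 */
    *mv0_out = src;                               /* MV0'     */
    *mv1_out = src;                               /* MV1'     */
}

int main(void) {
    MotionVector mv0 = {3, -1}, mv1 = {-8, 2}, a, b;
    derive_mvs(mv0, mv1, true, &a, &b);
    printf("MV0'=(%d,%d) MV1'=(%d,%d)\n", a.x, a.y, b.x, b.y);
    return 0;
}
```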
D11. The method of any one of solutions D1 to D10, wherein the conversion generates the current block from the bitstream representation.
D12. The method according to any of the solutions D1 to D10, wherein the converting generates a bitstream representation from the current block.
D13. The method of any of solutions D1-D10, wherein performing the conversion comprises parsing the bitstream representation based on one or more decoding rules.
D14. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement a method of one or more of solutions D1 to D13.
D15. A computer program product stored on a non-transitory computer readable medium, the computer program product comprising program code for performing the method of one or more of the solutions D1 to D13.
The disclosure and other solutions, examples, embodiments, modules, and functional operations described in this document may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-volatile computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing unit" or "data processing apparatus" includes all means, devices, and machines for processing data, including for example, a programmable processor, a computer, or multiple processors or groups of computers. The apparatus may include, in addition to hardware, code that creates an execution environment for a computer program, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. The propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. The computer program does not necessarily correspond to a file in a file system. A program may be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processing and logic flows may also be performed by, and apparatus may be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disk; CD ROM and DVD ROM discs. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any subject matter or of the claims, but rather as descriptions of features of particular embodiments of particular technologies. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various functions that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination and the combination of the claims may be directed to a subcombination or variation of a subcombination.
Also, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Furthermore, the separation of various system components in the embodiments described herein should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples are described, and other implementations, enhancements, and variations may be made based on what is described and illustrated in this patent document.
Claims (54)
1. A method of encoding and decoding video data, comprising:
for a conversion between a current block of a video coded using a sub-block based temporal motion vector prediction tool and a bitstream of the video, initializing temporal motion information to default motion information, wherein a motion vector related to the default motion information is (0, 0);
determining whether a reference picture of a specific neighboring block A1 is a collocated picture of the current block, the collocated picture being different from a current picture including the current block;
if the reference picture of the specific neighboring block A1 is the collocated picture of the current block, setting the temporal motion information to specific motion information associated with the specific neighboring block A1 without checking any other neighboring block, and
if the reference picture of the specific neighboring block A1 is not the collocated picture of the current block, keeping the temporal motion information equal to the default motion information,
wherein the specific neighboring block A1 is adjacent to the lower left corner of the current block, and wherein the specific neighboring block A1 covers a luminance position (xCb - 1, yCb + cbHeight - 1), wherein (xCb, yCb) is the luminance position of the top-left sample of the current block relative to the top-left sample of the current picture, and cbHeight is the height of the current block;
generating rounded temporal motion information by rounding the temporal motion information to an integer precision;
locating at least one video region in the collocated picture for the current block based on the rounded temporal motion information;
deriving at least one sub-block motion information related to the current block based on the at least one video region;
constructing a sub-block motion candidate list based on the at least one sub-block motion information;
determining candidates from the sub-block motion candidate list based on sub-block Merge indexes included in the bitstream; and
performing the conversion based on the determined candidate,
wherein deriving at least one sub-block motion information related to the current block based on the at least one video region comprises:
determining whether the at least one video region is coded using an inter mode;
in case the at least one video region is coded using the inter mode, deriving the at least one sub-block motion information using the at least one video region, and
in case the at least one video region is coded using an intra copy mode, refraining from deriving the at least one sub-block motion information using the at least one video region.
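A non-normative sketch of the initialization recited in claim 1 is given below; the rounding offset, data types and example values are assumptions for illustration only.

```c
/* Non-normative sketch of the initialization steps of claim 1: start from a
 * (0,0) default, replace it with the motion of neighboring block A1 only when
 * A1's reference picture is the collocated picture, then round to integer
 * precision before locating the corresponding region. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { int x, y; int ref_poc; bool available; } NeighborMotion;

typedef struct { int x, y; } MotionVector;

static MotionVector init_temporal_mv(NeighborMotion a1, int collocated_poc) {
    MotionVector temp_mv = {0, 0};                      /* default (0,0)      */
    if (a1.available && a1.ref_poc == collocated_poc) { /* only A1 is checked */
        temp_mv.x = a1.x;
        temp_mv.y = a1.y;
    }
    /* round from assumed 1/16-sample precision to integer precision */
    temp_mv.x = (temp_mv.x + (temp_mv.x >= 0 ? 8 : 7)) >> 4;
    temp_mv.y = (temp_mv.y + (temp_mv.y >= 0 ? 8 : 7)) >> 4;
    return temp_mv;
}

int main(void) {
    NeighborMotion a1 = { 33, 21, 16, true };   /* hypothetical A1 motion     */
    MotionVector mv = init_temporal_mv(a1, 16); /* collocated picture POC 16  */
    printf("temporal MV = (%d,%d)\n", mv.x, mv.y);
    return 0;
}
```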
2. The method of claim 1, wherein a number (N) of bins is used to represent the sub-block Merge index.
3. The method of claim 2, wherein a first number (L) of the N bins are context coded, and wherein the remaining (N-L) bins are bypass coded.
4. A method according to claim 3, wherein L = 1.
5. The method of claim 1, wherein the converting comprises decoding the current block from the bitstream.
6. The method of claim 1, wherein the converting comprises encoding the current block into the bitstream.
7. The method of claim 1, wherein the converting comprises generating the bitstream from the current block, and wherein the method further comprises:
storing the bitstream in a non-transitory computer-readable recording medium.
8. The method of claim 1, further comprising:
for a second block of video that is encoded and decoded using a sub-block-based temporal motion vector prediction (SbTMVP) tool, determining a motion vector used by the SbTMVP tool to locate a corresponding block in a picture that is different from a current picture that includes the second block; and
performing, based on the determination, a conversion between the second block and the bitstream of the video.
9. The method of claim 8, wherein the motion vector is set to a default motion vector.
10. The method of claim 9, wherein the default motion vector is (0, 0).
11. The method of claim 9, wherein the default motion vector is signaled in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a slice header, a slice group header, a Coding Tree Unit (CTU), or a Coding Unit (CU).
12. The method of claim 8, wherein the motion vector is set to a motion vector stored in a history-based motion vector prediction (HMVP) table.
13. The method of claim 12, wherein the motion vector is set to a default motion vector if the HMVP table is empty.
14. The method of claim 13, wherein the default motion vector is (0, 0).
15. The method of claim 12, wherein the motion vector is set to a first motion vector stored in the HMVP table if the HMVP table is non-empty.
16. The method of claim 12, wherein the motion vector is set to a last motion vector stored in the HMVP table if the HMVP table is non-empty.
17. The method of claim 12, wherein the motion vector is set to a particular motion vector stored in the HMVP table if the HMVP table is non-empty.
18. The method of claim 17, wherein the particular motion vector references reference list 0.
19. The method of claim 17, wherein the particular motion vector references reference list 1.
20. The method of claim 17, wherein the particular motion vector references a particular reference picture in reference list 0.
21. The method of claim 17, wherein the particular motion vector references a particular reference picture in reference list 1.
22. The method of claim 20 or 21, wherein the particular reference picture has an index of 0.
23. The method of claim 17, wherein the particular motion vector references a collocated picture.
24. The method of claim 12, wherein in the event that a particular motion vector is not found by a search process in the HMVP table, the motion vector is set to a default motion vector.
25. The method of claim 24, wherein the search process searches only the first element or only the last element of the HMVP table.
26. The method of claim 24, wherein the search process searches only a subset of elements of the HMVP table.
27. The method of claim 12, wherein the motion vector stored in the HMVP table does not reference the current picture.
28. The method of claim 12, wherein, in a case that a motion vector stored in the HMVP table does not reference the collocated picture, the motion vector stored in the HMVP table is scaled to the collocated picture.
29. The method of claim 8, wherein the motion vector is set to a particular motion vector of a particular neighboring block.
30. The method of claim 29, wherein the particular neighboring block is a neighboring block A0 whose upper right corner is directly adjacent to the lower left corner of the second block, or a neighboring block A1 whose lower right corner is directly adjacent to the lower left corner of the second block, or a neighboring block B0 whose lower left corner is directly adjacent to the upper right corner of the second block, or a neighboring block B1 whose lower right corner is directly adjacent to the upper right corner of the second block, or a neighboring block B2 whose lower right corner is directly adjacent to the upper left corner of the second block.
31. The method of claim 8, wherein the motion vector is set to a default motion vector in the absence of a particular neighboring block.
32. The method of claim 8, wherein the motion vector is set to a default motion vector in a case that a particular neighboring block is not inter-coded.
33. The method of claim 29, wherein the particular motion vector references reference list 0.
34. The method of claim 29, wherein the particular motion vector references reference list 1.
35. The method of claim 29, wherein the particular motion vector references a particular reference picture in reference list 0.
36. The method of claim 29, wherein the particular motion vector references a particular reference picture in reference list 1.
37. The method of claim 35 or 36, wherein the particular reference picture has an index of 0.
38. The method of claim 29 or 30, wherein the particular motion vector references a collocated picture.
39. The method of claim 29 or 30, wherein the motion vector is set to a default motion vector if the particular neighboring block does not reference the collocated picture.
40. The method of any of claims 31-34, wherein the default motion vector is (0, 0).
41. The method of claim 8, wherein the motion vector is set to a default motion vector in a case that a particular motion vector cannot be found in a particular neighboring block.
42. The method of claim 29, wherein the particular motion vector is scaled to a collocated picture if the particular motion vector does not reference the collocated picture.
43. The method of claim 29, wherein the particular motion vector does not reference the current picture.
44. The method of claim 1, wherein performing the conversion comprises parsing the bitstream based on one or more decoding rules.
45. A device for processing video data, comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to:
initializing, for a conversion between a current block of a video that is coded using a sub-block based temporal motion vector prediction tool and a bitstream of the video, temporal motion information to default motion information, wherein a motion vector associated with the default motion information is (0, 0);
determining whether a reference picture of a particular neighboring block A1 is a collocated picture of the current block, the collocated picture being different from a current picture including the current block;
in response to the reference picture of the particular neighboring block A1 being the collocated picture of the current block, setting the temporal motion information to particular motion information associated with the particular neighboring block A1 without checking any other neighboring block; and
in response to the reference picture of the particular neighboring block A1 not being the collocated picture of the current block, keeping the temporal motion information equal to the default motion information,
wherein the particular neighboring block A1 is adjacent to the lower left corner of the current block, and wherein the particular neighboring block A1 covers a luma location (xCb-1, yCb+cbHeight-1), wherein (xCb, yCb) is the luma location of the top-left sample of the current block relative to the top-left sample of the current picture, and cbHeight is the height of the current block;
generating rounded temporal motion information by rounding the temporal motion information to integer precision;
locating at least one video region in the collocated picture for the current block based on the rounded temporal motion information;
deriving at least one sub-block motion information related to the current block based on the at least one video region;
constructing a sub-block motion candidate list based on the at least one sub-block motion information;
determining a candidate from the sub-block motion candidate list based on a sub-block Merge index included in the bitstream; and
performing the conversion based on the determined candidate,
wherein deriving at least one sub-block motion information related to the current block based on the at least one video region comprises:
determining whether the at least one video region is coded using an inter mode;
in a case that the at least one video region is coded using the inter mode, deriving the at least one sub-block motion information using the at least one video region; and
in a case that the at least one video region is coded using an intra block copy mode, refraining from deriving the at least one sub-block motion information using the at least one video region.
46. The apparatus of claim 45, wherein a number (N) of bins is used to represent the sub-block Merge index.
47. The apparatus of claim 46, wherein a first number (L) of the N bins are context coded, and wherein the remaining (N-L) bins are bypass coded.
48. The apparatus of claim 47, wherein L = 1.
49. A non-transitory computer-readable storage medium storing instructions that cause a processor to:
initializing, for a conversion between a current block of a video that is coded using a sub-block based temporal motion vector prediction tool and a bitstream of the video, temporal motion information to default motion information, wherein a motion vector associated with the default motion information is (0, 0);
determining whether a reference picture of a particular neighboring block A1 is a collocated picture of the current block, the collocated picture being different from a current picture including the current block; in response to the reference picture of the particular neighboring block A1 being the collocated picture of the current block, setting the temporal motion information to particular motion information associated with the particular neighboring block A1 without checking any other neighboring block; and
in response to the reference picture of the particular neighboring block A1 not being the collocated picture of the current block, keeping the temporal motion information equal to the default motion information,
wherein the particular neighboring block A1 is adjacent to the lower left corner of the current block, and wherein the particular neighboring block A1 covers a luma location (xCb-1, yCb+cbHeight-1), wherein (xCb, yCb) is the luma location of the top-left sample of the current block relative to the top-left sample of the current picture, and cbHeight is the height of the current block;
generating rounded temporal motion information by rounding the temporal motion information to integer precision;
locating at least one video region in the collocated picture for the current block based on the rounded temporal motion information;
deriving at least one sub-block motion information related to the current block based on the at least one video region;
constructing a sub-block motion candidate list based on the at least one sub-block motion information;
determining a candidate from the sub-block motion candidate list based on a sub-block Merge index included in the bitstream; and
performing the conversion based on the determined candidate,
wherein deriving at least one sub-block motion information related to the current block based on the at least one video region comprises:
determining whether the at least one video region is coded using an inter mode;
in a case that the at least one video region is coded using the inter mode, deriving the at least one sub-block motion information using the at least one video region; and
in a case that the at least one video region is coded using an intra block copy mode, refraining from deriving the at least one sub-block motion information using the at least one video region.
50. The non-transitory computer-readable storage medium of claim 49, wherein a number (N) of bins is used to represent the sub-block Merge index.
51. The non-transitory computer-readable storage medium of claim 50, wherein a first number (L) of the N bins are context coded, and wherein the remaining (N-L) bins are bypass coded.
52. A method of storing a bitstream of video, comprising:
initializing, for a conversion between a current block of a video that is coded using a sub-block based temporal motion vector prediction tool and a bitstream of the video, temporal motion information to default motion information, wherein a motion vector associated with the default motion information is (0, 0);
determining whether a reference picture of a particular neighboring block A1 is a collocated picture of the current block, the collocated picture being different from a current picture including the current block;
in response to the reference picture of the particular neighboring block A1 being the collocated picture of the current block, setting the temporal motion information to particular motion information associated with the particular neighboring block A1 without checking any other neighboring block; and
in response to the reference picture of the particular neighboring block A1 not being the collocated picture of the current block, keeping the temporal motion information equal to the default motion information,
wherein the particular neighboring block A1 is adjacent to the lower left corner of the current block, and wherein the particular neighboring block A1 covers a luma location (xCb-1, yCb+cbHeight-1), wherein (xCb, yCb) is the luma location of the top-left sample of the current block relative to the top-left sample of the current picture, and cbHeight is the height of the current block;
generating rounded temporal motion information by rounding the temporal motion information to integer precision;
locating at least one video region in the collocated picture for the current block based on the rounded temporal motion information;
deriving at least one sub-block motion information related to the current block based on the at least one video region;
constructing a sub-block motion candidate list based at least on the at least one sub-block motion information;
determining a candidate from the sub-block motion candidate list based on a sub-block Merge index included in the bitstream;
generating the bitstream based on the determined candidate; and
storing the bitstream in a non-transitory computer-readable recording medium,
wherein deriving at least one sub-block motion information related to the current block based on the at least one video region comprises:
determining whether the at least one video region is coded using an inter mode;
in a case that the at least one video region is coded using the inter mode, deriving the at least one sub-block motion information using the at least one video region; and
in a case that the at least one video region is coded using an intra block copy mode, refraining from deriving the at least one sub-block motion information using the at least one video region.
53. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any one of claims 5 to 44 and 52.
54. A non-transitory computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a computer, cause the computer to perform the method of any one of claims 4 to 44 and 52.
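Illustrative sketches (non-normative)
The initialization procedure recited in claim 1 (and mirrored in claims 45, 49 and 52) can be pictured with a minimal Python sketch, given here for illustration only. The `MotionInfo` container, the 1/16-pel precision assumed for the rounding shift, and the round-half-away-from-zero rule are assumptions of the example, not limitations of the claims.
```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MotionInfo:
    mv: Tuple[int, int]   # motion vector; 1/16-pel units assumed for the example
    ref_pic: object       # reference picture the vector points to

def init_temporal_motion(neighbor_a1: Optional[MotionInfo], collocated_pic) -> MotionInfo:
    # Initialize the temporal motion information to the default, whose motion vector is (0, 0).
    temporal = MotionInfo(mv=(0, 0), ref_pic=collocated_pic)
    # Check only the A1 neighbor; no other neighboring block is examined.
    if neighbor_a1 is not None and neighbor_a1.ref_pic is collocated_pic:
        temporal = neighbor_a1
    # Otherwise the temporal motion information stays equal to the default.
    return temporal

def round_mv_to_integer(mv, shift=4):
    # Round the fractional-precision vector to integer precision before it is used to
    # locate the video region in the collocated picture. Symmetric rounding away from
    # zero is an assumption; the claim only requires rounding to integer precision.
    def r(c):
        return (c + (1 << (shift - 1))) >> shift if c >= 0 else -((-c + (1 << (shift - 1))) >> shift)
    return (r(mv[0]), r(mv[1]))
```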
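Claims 2 to 4 describe how the sub-block Merge index is signaled: N bins, the first L of which are context coded and the rest bypass coded. A sketch of that split, assuming a truncated-unary binarization and a hypothetical entropy-coder interface (`encode_context_bin` and `encode_bypass_bin` are placeholders, not a real codec API):
```python
def write_subblock_merge_index(index, max_num_candidates, encoder, L=1):
    # Truncated-unary binarization: a run of 1s of length `index`, terminated
    # by a 0 unless the maximum candidate index is reached.
    c_max = max_num_candidates - 1
    num_bins = index if index == c_max else index + 1
    for bin_pos in range(num_bins):
        bin_val = 1 if bin_pos < index else 0
        if bin_pos < L:
            # The first L bins are context coded (L = 1 in claim 4).
            encoder.encode_context_bin(bin_val, ctx="subblock_merge_idx")
        else:
            # The remaining N - L bins are bypass coded.
            encoder.encode_bypass_bin(bin_val)
```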
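Claims 12 to 28 cover the variant in which the starting motion vector of the SbTMVP tool is taken from a history-based motion vector prediction (HMVP) table. The sketch below walks through a few of those alternatives; the table entries, their `.mv` / `.ref_pic` attributes, and the single-entry search are assumptions of the example rather than a normative decoder API.
```python
def sbtmvp_start_mv_from_hmvp(hmvp_table, collocated_pic, default_mv=(0, 0), use_first=True):
    # Empty table: fall back to the default motion vector (claims 13 and 14).
    if not hmvp_table:
        return default_mv
    # Search only the first or only the last element (claims 15, 16 and 25).
    entry = hmvp_table[0] if use_first else hmvp_table[-1]
    # Use the stored vector when it references the collocated picture (claim 23).
    if entry.ref_pic is collocated_pic:
        return entry.mv
    # When it does not, claim 24 allows falling back to the default vector;
    # claim 28 instead scales the vector to the collocated picture (the scaling
    # itself is omitted from this sketch).
    return default_mv
```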
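Claims 29 to 43 cover the variant in which the starting motion vector comes from a spatial neighbor. Under the same coordinate convention as claim 1, the adjacency relations listed in claim 30 correspond to the luma sample positions sketched below; the helper and its dictionary output are purely illustrative.
```python
def neighbor_positions(xCb, yCb, cbWidth, cbHeight):
    # (xCb, yCb) is the top-left luma sample of the second block relative to the
    # top-left luma sample of the current picture; cbWidth x cbHeight is its size.
    return {
        "A0": (xCb - 1, yCb + cbHeight),      # its upper right corner touches the block's lower left corner
        "A1": (xCb - 1, yCb + cbHeight - 1),  # its lower right corner touches the block's lower left corner
        "B0": (xCb + cbWidth, yCb - 1),       # its lower left corner touches the block's upper right corner
        "B1": (xCb + cbWidth - 1, yCb - 1),   # its lower right corner touches the block's upper right corner
        "B2": (xCb - 1, yCb - 1),             # its lower right corner touches the block's upper left corner
    }
```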
Applications Claiming Priority (9)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNPCT/CN2018/116889 | 2018-11-22 | ||
CN2018116889 | 2018-11-22 | ||
CN2018125420 | 2018-12-29 | ||
CNPCT/CN2018/125420 | 2018-12-29 | ||
CNPCT/CN2019/100396 | 2019-08-13 | ||
CN2019100396 | 2019-08-13 | ||
CN2019107159 | 2019-09-22 | ||
CNPCT/CN2019/107159 | 2019-09-22 | ||
PCT/CN2019/120311 WO2020103943A1 (en) | 2018-11-22 | 2019-11-22 | Using collocated blocks in sub-block temporal motion vector prediction mode |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113056915A CN113056915A (en) | 2021-06-29 |
CN113056915B true CN113056915B (en) | 2024-03-26 |
Family
ID=70773491
Family Applications (5)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201980076019.5A Active CN113056915B (en) | 2018-11-22 | 2019-11-22 | Use of collocated blocks in sub-block based temporal motion vector prediction modes |
CN202311821092.5A Pending CN117692634A (en) | 2018-11-22 | 2019-11-22 | Apparatus, method and computer readable recording medium for digital video encoding and decoding |
CN201980076058.5A Active CN113170198B (en) | 2018-11-22 | 2019-11-22 | Subblock temporal motion vector prediction |
CN201980076059.XA Active CN113056920B (en) | 2018-11-22 | 2019-11-22 | Coordination method of inter prediction based on subblocks |
CN201980076060.2A Active CN113056916B (en) | 2018-11-22 | 2019-11-22 | Selection and signaling of sub-block based motion candidates |
Family Applications After (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311821092.5A Pending CN117692634A (en) | 2018-11-22 | 2019-11-22 | Apparatus, method and computer readable recording medium for digital video encoding and decoding |
CN201980076058.5A Active CN113170198B (en) | 2018-11-22 | 2019-11-22 | Subblock temporal motion vector prediction |
CN201980076059.XA Active CN113056920B (en) | 2018-11-22 | 2019-11-22 | Coordination method of inter prediction based on subblocks |
CN201980076060.2A Active CN113056916B (en) | 2018-11-22 | 2019-11-22 | Selection and signaling of sub-block based motion candidates |
Country Status (6)
Country | Link |
---|---|
US (6) | US11431964B2 (en) |
EP (2) | EP4325849A3 (en) |
JP (3) | JP7319365B2 (en) |
KR (2) | KR102660160B1 (en) |
CN (5) | CN113056915B (en) |
WO (4) | WO2020103942A1 (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12081780B2 (en) | 2018-03-19 | 2024-09-03 | Qualcomm Incorporated | Advanced temporal motion vector prediction |
WO2020035029A1 (en) * | 2018-08-17 | 2020-02-20 | Mediatek Inc. | Method and apparatus of simplified sub-mode for video coding |
WO2020076047A1 (en) * | 2018-10-09 | 2020-04-16 | 삼성전자 주식회사 | Video decoding method and apparatus, and video encoding method and apparatus |
WO2020103942A1 (en) | 2018-11-22 | 2020-05-28 | Beijing Bytedance Network Technology Co., Ltd. | Sub-block temporal motion vector prediction |
CN118433377A (en) * | 2018-12-31 | 2024-08-02 | 北京达佳互联信息技术有限公司 | System and method for signaling motion merge mode in video codec |
SI3909244T1 (en) * | 2019-01-09 | 2023-08-31 | Huawei Technologies Co., Ltd. | Sub-picture identifier signaling in video coding |
JP7332703B2 (en) * | 2019-02-22 | 2023-08-23 | 華為技術有限公司 | Method and apparatus for affine-based inter-prediction of chroma sub-blocks |
PT3931748T (en) * | 2019-03-11 | 2024-06-27 | Huawei Tech Co Ltd | Sub-picture based slice addresses in video coding |
WO2020232264A1 (en) * | 2019-05-15 | 2020-11-19 | Futurewei Technologies, Inc. | Handling of bi-directional optical flow (bio) coding tool for reference picture resampling in video coding |
CN114128289A (en) * | 2019-06-13 | 2022-03-01 | Lg 电子株式会社 | SBTMVP-based picture or video coding |
CA3240498A1 (en) | 2019-06-19 | 2020-12-24 | Lg Electronics Inc. | Coding of information about transform kernel set |
CN114026872B (en) * | 2019-07-05 | 2022-12-06 | 华为技术有限公司 | Video coding and decoding method, coder and decoder and decoding equipment |
KR20220088680A (en) * | 2019-08-06 | 2022-06-28 | 오피 솔루션즈, 엘엘씨 | Implicit signaling of adaptive resolution management based on frame type |
EP3997877A4 (en) | 2019-08-13 | 2023-05-24 | Beijing Bytedance Network Technology Co., Ltd. | Motion precision in sub-block based inter prediction |
CN114424536A (en) | 2019-09-22 | 2022-04-29 | 北京字节跳动网络技术有限公司 | Combined inter-frame intra prediction based on transform units |
US11317093B2 (en) | 2019-09-24 | 2022-04-26 | Tencent America LLC | Method for reference picture resampling with offset in video bitstream |
WO2021071213A1 (en) * | 2019-10-07 | 2021-04-15 | 엘지전자 주식회사 | Image encoding/decoding method and apparatus for performing in-loop filtering on basis of sub-picture structure, and method for transmitting bitstream |
CN112788345B (en) * | 2019-11-11 | 2023-10-24 | 腾讯美国有限责任公司 | Video data decoding method, device, computer equipment and storage medium |
US11632540B2 (en) * | 2019-12-20 | 2023-04-18 | Qualcomm Incorporated | Reference picture scaling ratios for reference picture resampling in video coding |
CN111698502A (en) * | 2020-06-19 | 2020-09-22 | 中南大学 | VVC (variable visual code) -based affine motion estimation acceleration method and device and storage medium |
WO2023109899A1 (en) * | 2021-12-15 | 2023-06-22 | FG Innovation Company Limited | Device and method for coding video data |
WO2024083203A1 (en) * | 2022-10-20 | 2024-04-25 | Douyin Vision Co., Ltd. | Method, apparatus, and medium for video processing |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104170381A (en) * | 2012-03-16 | 2014-11-26 | 高通股份有限公司 | Motion vector coding and bi-prediction in hevc and its extensions |
WO2016056587A1 (en) * | 2014-10-09 | 2016-04-14 | シャープ株式会社 | Displacement arrangement derivation device, displacement vector derivation device, default reference view index derivation device, and depth lookup table derivation device |
WO2017171107A1 (en) * | 2016-03-28 | 2017-10-05 | 엘지전자(주) | Inter-prediction mode based image processing method, and apparatus therefor |
WO2017195608A1 (en) * | 2016-05-13 | 2017-11-16 | シャープ株式会社 | Moving image decoding device |
WO2018066867A1 (en) * | 2016-10-04 | 2018-04-12 | 한국전자통신연구원 | Method and apparatus for encoding and decoding image, and recording medium for storing bitstream |
CN108353166A (en) * | 2015-11-19 | 2018-07-31 | 韩国电子通信研究院 | Method and apparatus for encoding/decoding image |
CN108462873A (en) * | 2017-02-21 | 2018-08-28 | 联发科技股份有限公司 | The method and apparatus that the Candidate Set of block determines is split for quaternary tree plus binary tree |
CN108683922A (en) * | 2014-03-19 | 2018-10-19 | 株式会社Kt | The method and apparatus that multiview video signal is decoded |
Family Cites Families (150)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040001546A1 (en) | 2002-06-03 | 2004-01-01 | Alexandros Tourapis | Spatiotemporal prediction for bidirectionally predictive (B) pictures and motion vector prediction for multi-picture reference motion compensation |
WO2005022919A1 (en) | 2003-08-26 | 2005-03-10 | Thomson Licensing S.A. | Method and apparatus for decoding hybrid intra-inter coded blocks |
GB0500174D0 (en) | 2005-01-06 | 2005-02-16 | Kokaram Anil | Method for estimating motion and occlusion |
JP4178480B2 (en) | 2006-06-14 | 2008-11-12 | ソニー株式会社 | Image processing apparatus, image processing method, imaging apparatus, and imaging method |
US8675738B2 (en) | 2008-08-06 | 2014-03-18 | Mediatek Inc. | Video decoding method without using additional buffers for storing scaled frames and system thereof |
TWI387317B (en) | 2008-12-11 | 2013-02-21 | Novatek Microelectronics Corp | Apparatus for reference picture resampling generation and method thereof and video decoding system |
CN101877785A (en) | 2009-04-29 | 2010-11-03 | 祝志怡 | Hybrid predicting-based video encoding method |
US9654792B2 (en) * | 2009-07-03 | 2017-05-16 | Intel Corporation | Methods and systems for motion vector derivation at a video decoder |
CN106998473B (en) | 2010-04-09 | 2020-03-24 | 三菱电机株式会社 | Moving image encoding device and moving image decoding device |
CN102223526B (en) | 2010-04-15 | 2013-09-11 | 华为技术有限公司 | Method and related device for coding and decoding image |
US9172968B2 (en) | 2010-07-09 | 2015-10-27 | Qualcomm Incorporated | Video coding using directional transforms |
US9124898B2 (en) | 2010-07-12 | 2015-09-01 | Mediatek Inc. | Method and apparatus of temporal motion vector prediction |
US8780973B2 (en) | 2010-08-13 | 2014-07-15 | Texas Instruments Incorporated | Limiting the maximum size of an encoded video picture using sub-picture based rate control |
CN106937124B (en) | 2010-10-28 | 2020-01-10 | 韩国电子通信研究院 | Video decoding apparatus |
WO2012097749A1 (en) | 2011-01-19 | 2012-07-26 | Mediatek Inc. | Method and apparatus for parsing error robustness of temporal motion vector prediction |
KR20220070072A (en) | 2011-02-09 | 2022-05-27 | 엘지전자 주식회사 | Method for storing motion information and method for inducing temporal motion vector predictor using same |
GB2488815C (en) * | 2011-03-09 | 2018-03-28 | Canon Kk | Video decoding |
US9143795B2 (en) | 2011-04-11 | 2015-09-22 | Texas Instruments Incorporated | Parallel motion estimation in video coding |
US10123053B2 (en) * | 2011-05-23 | 2018-11-06 | Texas Instruments Incorporated | Acceleration of bypass binary symbol processing in video coding |
WO2012174990A1 (en) * | 2011-06-24 | 2012-12-27 | Mediatek Inc. | Method and apparatus for removing redundancy in motion vector predictors |
EP2745514A1 (en) * | 2011-08-19 | 2014-06-25 | Telefonaktiebolaget LM Ericsson (PUBL) | Motion vector processing |
US9451252B2 (en) * | 2012-01-14 | 2016-09-20 | Qualcomm Incorporated | Coding parameter sets and NAL unit headers for video coding |
US9420286B2 (en) * | 2012-06-15 | 2016-08-16 | Qualcomm Incorporated | Temporal motion vector prediction in HEVC and its extensions |
US20140003504A1 (en) | 2012-07-02 | 2014-01-02 | Nokia Corporation | Apparatus, a Method and a Computer Program for Video Coding and Decoding |
EP2907318A1 (en) | 2012-10-09 | 2015-08-19 | Cisco Technology, Inc. | Output management of prior decoded pictures at picture format transitions in bitstreams |
WO2014107853A1 (en) * | 2013-01-09 | 2014-07-17 | Mediatek Singapore Pte. Ltd. | Methods for disparity vector derivation |
US9596448B2 (en) | 2013-03-18 | 2017-03-14 | Qualcomm Incorporated | Simplifications on disparity vector derivation and motion vector prediction in 3D video coding |
US9491460B2 (en) | 2013-03-29 | 2016-11-08 | Qualcomm Incorporated | Bandwidth reduction for video coding prediction |
CN109982094A (en) | 2013-04-02 | 2019-07-05 | Vid拓展公司 | For the enhanced temporal motion vector prediction of scalable video |
GB2512829B (en) * | 2013-04-05 | 2015-05-27 | Canon Kk | Method and apparatus for encoding or decoding an image with inter layer motion information prediction according to motion information compression scheme |
WO2014166109A1 (en) * | 2013-04-12 | 2014-10-16 | Mediatek Singapore Pte. Ltd. | Methods for disparity vector derivation |
US9813723B2 (en) | 2013-05-03 | 2017-11-07 | Qualcomm Incorporated | Conditionally invoking a resampling process in SHVC |
US9432667B2 (en) | 2013-06-11 | 2016-08-30 | Qualcomm Incorporated | Processing bitstream constraints relating to inter-layer prediction types in multi-layer video coding |
WO2015003383A1 (en) | 2013-07-12 | 2015-01-15 | Mediatek Singapore Pte. Ltd. | Methods for inter-view motion prediction |
US9628795B2 (en) | 2013-07-17 | 2017-04-18 | Qualcomm Incorporated | Block identification using disparity vector in video coding |
US10244253B2 (en) | 2013-09-13 | 2019-03-26 | Qualcomm Incorporated | Video coding techniques using asymmetric motion partitioning |
US9906813B2 (en) | 2013-10-08 | 2018-02-27 | Hfi Innovation Inc. | Method of view synthesis prediction in 3D video coding |
WO2015082763A1 (en) | 2013-12-02 | 2015-06-11 | Nokia Technologies Oy | Video encoding and decoding |
WO2015085575A1 (en) | 2013-12-13 | 2015-06-18 | Mediatek Singapore Pte. Ltd. | Methods for background residual prediction |
CN104768015B (en) * | 2014-01-02 | 2018-10-26 | 寰发股份有限公司 | Method for video coding and device |
WO2015109598A1 (en) | 2014-01-27 | 2015-07-30 | Mediatek Singapore Pte. Ltd. | Methods for motion parameter hole filling |
US10432928B2 (en) | 2014-03-21 | 2019-10-01 | Qualcomm Incorporated | Using a current picture as a reference for video coding |
CN105493505B (en) * | 2014-06-19 | 2019-08-06 | 微软技术许可有限责任公司 | Unified intra block duplication and inter-frame forecast mode |
US20150373343A1 (en) | 2014-06-20 | 2015-12-24 | Qualcomm Incorporated | Representation format update in multi-layer codecs |
WO2016008157A1 (en) * | 2014-07-18 | 2016-01-21 | Mediatek Singapore Pte. Ltd. | Methods for motion compensation using high order motion model |
US10412387B2 (en) | 2014-08-22 | 2019-09-10 | Qualcomm Incorporated | Unified intra-block copy and inter-prediction |
EP3192261A1 (en) * | 2014-09-12 | 2017-07-19 | VID SCALE, Inc. | Inter-component de-correlation for video coding |
EP3189660B1 (en) | 2014-09-30 | 2023-07-12 | HFI Innovation Inc. | Method of adaptive motion vector resolution for video coding |
WO2016119104A1 (en) | 2015-01-26 | 2016-08-04 | Mediatek Inc. | Motion vector regularization |
US9918105B2 (en) * | 2014-10-07 | 2018-03-13 | Qualcomm Incorporated | Intra BC and inter unification |
US9854237B2 (en) | 2014-10-14 | 2017-12-26 | Qualcomm Incorporated | AMVP and merge candidate list derivation for intra BC and inter prediction unification |
CN107113425A (en) | 2014-11-06 | 2017-08-29 | 三星电子株式会社 | Method for video coding and equipment and video encoding/decoding method and equipment |
WO2016078599A1 (en) | 2014-11-20 | 2016-05-26 | Mediatek Inc. | Method of motion vector and block vector resolution control |
WO2016108188A1 (en) | 2014-12-31 | 2016-07-07 | Nokia Technologies Oy | Inter-layer prediction for scalable video coding and decoding |
US11477477B2 (en) * | 2015-01-26 | 2022-10-18 | Qualcomm Incorporated | Sub-prediction unit based advanced temporal motion vector prediction |
US10230980B2 (en) | 2015-01-26 | 2019-03-12 | Qualcomm Incorporated | Overlapped motion compensation for video coding |
US10070130B2 (en) * | 2015-01-30 | 2018-09-04 | Qualcomm Incorporated | Flexible partitioning of prediction units |
US10057574B2 (en) * | 2015-02-11 | 2018-08-21 | Qualcomm Incorporated | Coding tree unit (CTU) level adaptive loop filter (ALF) |
US11330284B2 (en) | 2015-03-27 | 2022-05-10 | Qualcomm Incorporated | Deriving motion information for sub-blocks in video coding |
WO2016165069A1 (en) | 2015-04-14 | 2016-10-20 | Mediatek Singapore Pte. Ltd. | Advanced temporal motion vector prediction in video coding |
KR102583501B1 (en) | 2015-04-27 | 2023-09-27 | 엘지전자 주식회사 | Video signal processing method and device therefor |
US20160337662A1 (en) | 2015-05-11 | 2016-11-17 | Qualcomm Incorporated | Storage and signaling resolutions of motion vectors |
WO2016182316A1 (en) * | 2015-05-12 | 2016-11-17 | 삼성전자 주식회사 | Image decoding method for performing intra prediction and device thereof, and image encoding method for performing intra prediction and device thereof |
US10271064B2 (en) | 2015-06-11 | 2019-04-23 | Qualcomm Incorporated | Sub-prediction unit motion vector prediction using spatial and/or temporal motion information |
WO2017008263A1 (en) | 2015-07-15 | 2017-01-19 | Mediatek Singapore Pte. Ltd. | Conditional binary tree block partitioning structure |
US10674146B2 (en) * | 2015-09-30 | 2020-06-02 | Lg Electronics Inc. | Method and device for coding residual signal in video coding system |
US10412407B2 (en) * | 2015-11-05 | 2019-09-10 | Mediatek Inc. | Method and apparatus of inter prediction using average motion vector for video coding |
WO2017082670A1 (en) | 2015-11-12 | 2017-05-18 | 엘지전자 주식회사 | Method and apparatus for coefficient induced intra prediction in image coding system |
US20190158870A1 (en) | 2016-01-07 | 2019-05-23 | Mediatek Inc. | Method and apparatus for affine merge mode prediction for video coding system |
US9955186B2 (en) | 2016-01-11 | 2018-04-24 | Qualcomm Incorporated | Block size decision for video coding |
US10469841B2 (en) * | 2016-01-29 | 2019-11-05 | Google Llc | Motion vector prediction using prior frame residual |
US10368083B2 (en) | 2016-02-15 | 2019-07-30 | Qualcomm Incorporated | Picture order count based motion vector pruning |
WO2017143467A1 (en) | 2016-02-22 | 2017-08-31 | Mediatek Singapore Pte. Ltd. | Localized luma mode prediction inheritance for chroma coding |
WO2017147765A1 (en) * | 2016-03-01 | 2017-09-08 | Mediatek Inc. | Methods for affine motion compensation |
US10623774B2 (en) * | 2016-03-22 | 2020-04-14 | Qualcomm Incorporated | Constrained block-level optimization and signaling for video coding tools |
US10834419B2 (en) | 2016-04-13 | 2020-11-10 | Qualcomm Incorporated | Conformance constraint for collocated reference index in video coding |
CN105976395B (en) * | 2016-04-27 | 2018-11-09 | 宁波大学 | A kind of video target tracking method based on rarefaction representation |
US20190191171A1 (en) | 2016-05-13 | 2019-06-20 | Sharp Kabushiki Kaisha | Prediction image generation device, video decoding device, and video coding device |
US10560718B2 (en) * | 2016-05-13 | 2020-02-11 | Qualcomm Incorporated | Merge candidates for motion vector prediction for video coding |
US10560712B2 (en) | 2016-05-16 | 2020-02-11 | Qualcomm Incorporated | Affine motion prediction for video coding |
WO2017209328A1 (en) | 2016-06-03 | 2017-12-07 | 엘지전자 주식회사 | Intra-prediction method and apparatus in image coding system |
CN116614640A (en) * | 2016-07-12 | 2023-08-18 | 韩国电子通信研究院 | Image encoding/decoding method and recording medium therefor |
ES2703458R1 (en) * | 2016-08-03 | 2020-06-04 | Kt Corp | METHOD AND APPARATUS FOR PROCESSING VIDEO SIGNALS |
US11638027B2 (en) * | 2016-08-08 | 2023-04-25 | Hfi Innovation, Inc. | Pattern-based motion vector derivation for video coding |
CN116582668A (en) * | 2016-08-11 | 2023-08-11 | Lx 半导体科技有限公司 | Image encoding/decoding method and image data transmitting method |
US10609423B2 (en) | 2016-09-07 | 2020-03-31 | Qualcomm Incorporated | Tree-type coding for video coding |
US10812791B2 (en) * | 2016-09-16 | 2020-10-20 | Qualcomm Incorporated | Offset vector identification of temporal motion vector predictor |
US10631002B2 (en) * | 2016-09-30 | 2020-04-21 | Qualcomm Incorporated | Frame rate up-conversion coding mode |
CN116866570A (en) | 2016-10-04 | 2023-10-10 | 株式会社Kt | Method and apparatus for processing video signal |
EP3525460A4 (en) | 2016-10-06 | 2020-06-17 | LG Electronics Inc. -1- | Method for processing video on basis of inter prediction mode and apparatus therefor |
CN117528106A (en) | 2016-11-28 | 2024-02-06 | 英迪股份有限公司 | Image encoding method, image decoding method, and method for transmitting bit stream |
CN109983773A (en) * | 2016-11-29 | 2019-07-05 | 联发科技股份有限公司 | The video coding-decoding method and device derived for merging patterns |
US10681370B2 (en) * | 2016-12-29 | 2020-06-09 | Qualcomm Incorporated | Motion vector generation for affine motion model for video coding |
WO2018128223A1 (en) | 2017-01-03 | 2018-07-12 | 엘지전자 주식회사 | Inter-prediction method and apparatus in image coding system |
US10931969B2 (en) | 2017-01-04 | 2021-02-23 | Qualcomm Incorporated | Motion vector reconstructions for bi-directional optical flow (BIO) |
US10542280B2 (en) * | 2017-01-09 | 2020-01-21 | QUALCOMM Incorpated | Encoding optimization with illumination compensation and integer motion vector restriction |
US20180199057A1 (en) * | 2017-01-12 | 2018-07-12 | Mediatek Inc. | Method and Apparatus of Candidate Skipping for Predictor Refinement in Video Coding |
US10701390B2 (en) | 2017-03-14 | 2020-06-30 | Qualcomm Incorporated | Affine motion information derivation |
US11277635B2 (en) | 2017-03-17 | 2022-03-15 | Vid Scale, Inc. | Predictive coding for 360-degree video based on geometry padding |
US10708591B2 (en) | 2017-03-20 | 2020-07-07 | Qualcomm Incorporated | Enhanced deblocking filtering design in video coding |
US10582209B2 (en) * | 2017-03-30 | 2020-03-03 | Mediatek Inc. | Sub-prediction unit temporal motion vector prediction (sub-PU TMVP) for video coding |
US20180310017A1 (en) | 2017-04-21 | 2018-10-25 | Mediatek Inc. | Sub-prediction unit temporal motion vector prediction (sub-pu tmvp) for video coding |
US10742975B2 (en) * | 2017-05-09 | 2020-08-11 | Futurewei Technologies, Inc. | Intra-prediction with multiple reference lines |
US10523934B2 (en) | 2017-05-31 | 2019-12-31 | Mediatek Inc. | Split based motion vector operation reduction |
KR102438181B1 (en) | 2017-06-09 | 2022-08-30 | 한국전자통신연구원 | Method and apparatus for encoding/decoding image and recording medium for storing bitstream |
US10602180B2 (en) * | 2017-06-13 | 2020-03-24 | Qualcomm Incorporated | Motion vector prediction |
GB2563943B (en) | 2017-06-30 | 2021-02-24 | Canon Kk | Method and apparatus for encoding or decoding video data in FRUC mode with reduced memory accesses |
US11363293B2 (en) | 2017-07-03 | 2022-06-14 | Vid Scale, Inc. | Motion-compensation prediction based on bi-directional optical flow |
KR102595689B1 (en) | 2017-09-29 | 2023-10-30 | 인텔렉추얼디스커버리 주식회사 | Method and apparatus for encoding/decoding image and recording medium for storing bitstream |
CN116866562A (en) * | 2017-09-29 | 2023-10-10 | Lx 半导体科技有限公司 | Image encoding/decoding method, storage medium, and image data transmission method |
CN118042095A (en) | 2017-10-09 | 2024-05-14 | 诺基亚技术有限公司 | Apparatus, method and computer program for video encoding and decoding |
CN116866592A (en) | 2017-10-20 | 2023-10-10 | 株式会社Kt | Video signal processing method and storage medium |
JP7382332B2 (en) | 2017-11-01 | 2023-11-16 | ヴィド スケール インコーポレイテッド | Subblock motion derivation and decoder side motion vector refinement for merge mode |
US10931963B2 (en) * | 2017-12-07 | 2021-02-23 | Tencent America LLC | Method and apparatus for video coding |
CN111567043B (en) | 2018-01-11 | 2023-03-28 | 高通股份有限公司 | Method, apparatus and computer-readable storage medium for decoding video data |
US11172229B2 (en) | 2018-01-12 | 2021-11-09 | Qualcomm Incorporated | Affine motion compensation with low bandwidth |
JP7125486B2 (en) | 2018-01-16 | 2022-08-24 | ヴィド スケール インコーポレイテッド | Motion-compensated bi-prediction based on local illumination compensation |
CN108347616B (en) | 2018-03-09 | 2020-02-14 | 中南大学 | Depth prediction method and device based on optional time domain motion vector prediction |
US20190306502A1 (en) | 2018-04-02 | 2019-10-03 | Qualcomm Incorporated | System and method for improved adaptive loop filtering |
US10841575B2 (en) | 2018-04-15 | 2020-11-17 | Arris Enterprises Llc | Unequal weight planar motion vector derivation |
WO2019210857A1 (en) | 2018-04-30 | 2019-11-07 | Mediatek Inc. | Method and apparatus of syntax interleaving for separate coding tree in video coding |
WO2019229683A1 (en) | 2018-05-31 | 2019-12-05 | Beijing Bytedance Network Technology Co., Ltd. | Concept of interweaved prediction |
WO2019234608A1 (en) | 2018-06-05 | 2019-12-12 | Beijing Bytedance Network Technology Co., Ltd. | Partition tree with more than four sub-blocks |
KR20230161539A (en) | 2018-06-07 | 2023-11-27 | 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 | Partial cost calculation |
TWI723433B (en) | 2018-06-21 | 2021-04-01 | 大陸商北京字節跳動網絡技術有限公司 | Improved border partition |
WO2019244117A1 (en) | 2018-06-21 | 2019-12-26 | Beijing Bytedance Network Technology Co., Ltd. | Unified constrains for the merge affine mode and the non-merge affine mode |
TWI729422B (en) | 2018-06-21 | 2021-06-01 | 大陸商北京字節跳動網絡技術有限公司 | Sub-block mv inheritance between color components |
EP3799693A4 (en) | 2018-06-28 | 2021-04-07 | Huawei Technologies Co., Ltd. | Motion vector refinement search with integer pixel resolution |
JP7295230B2 (en) | 2018-06-29 | 2023-06-20 | 北京字節跳動網絡技術有限公司 | Reset lookup table per slice/tile/LCU row |
US10863193B2 (en) | 2018-06-29 | 2020-12-08 | Qualcomm Incorporated | Buffer restriction during motion vector prediction for video coding |
CN112369035B (en) | 2018-06-30 | 2024-10-01 | 有限公司B1影像技术研究所 | Image encoding/decoding method and apparatus |
JP2021530154A (en) | 2018-07-01 | 2021-11-04 | 北京字節跳動網絡技術有限公司Beijing Bytedance Network Technology Co., Ltd. | Efficient affine merge motion vector derivation |
CN110719481B (en) | 2018-07-15 | 2023-04-14 | 北京字节跳动网络技术有限公司 | Cross-component encoded information derivation |
US10897617B2 (en) | 2018-07-24 | 2021-01-19 | Qualcomm Incorporated | Rounding of motion vectors for adaptive motion vector difference resolution and increased motion vector storage precision in video coding |
US10958934B2 (en) | 2018-07-27 | 2021-03-23 | Tencent America LLC | History-based affine merge and motion vector prediction |
BR122021009779A2 (en) | 2018-09-10 | 2021-07-13 | Lg Electronics Inc. | IMAGE DECODING METHOD AND APPARATUS BASED ON AFIM MOVEMENT PREDICTION USING AFIM MVP CANDIDATES LIST IN THE IMAGE CODING SYSTEM |
GB201815443D0 (en) | 2018-09-21 | 2018-11-07 | Canon Kk | Video coding and decoding |
GB2579763B (en) * | 2018-09-21 | 2021-06-09 | Canon Kk | Video coding and decoding |
EP3864850A1 (en) * | 2018-10-10 | 2021-08-18 | InterDigital VC Holdings, Inc. | Affine mode signaling in video encoding and decoding |
CN111093073B (en) | 2018-10-24 | 2024-04-19 | 北京字节跳动网络技术有限公司 | Search-based motion candidate derivation for sub-block motion vector prediction |
CN111107354A (en) | 2018-10-29 | 2020-05-05 | 华为技术有限公司 | Video image prediction method and device |
KR102711051B1 (en) | 2018-11-06 | 2024-09-27 | 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 | Side information signaling for inter prediction using geometric partitions |
WO2020103942A1 (en) | 2018-11-22 | 2020-05-28 | Beijing Bytedance Network Technology Co., Ltd. | Sub-block temporal motion vector prediction |
US11032574B2 (en) | 2018-12-31 | 2021-06-08 | Tencent America LLC | Method and apparatus for video coding |
US10742972B1 (en) | 2019-03-08 | 2020-08-11 | Tencent America LLC | Merge list construction in triangular prediction |
US11233988B2 (en) | 2019-05-17 | 2022-01-25 | Qualcomm Incorporated | Reference picture resampling and inter-coding tools for video coding |
EP3997877A4 (en) | 2019-08-13 | 2023-05-24 | Beijing Bytedance Network Technology Co., Ltd. | Motion precision in sub-block based inter prediction |
US11140402B2 (en) | 2019-09-20 | 2021-10-05 | Tencent America LLC | Signaling of reference picture resampling with constant window size indication in video bitstream |
CN114424536A (en) | 2019-09-22 | 2022-04-29 | 北京字节跳动网络技术有限公司 | Combined inter-frame intra prediction based on transform units |
US11477471B2 (en) | 2020-05-20 | 2022-10-18 | Tencent America LLC | Techniques for signaling combination of reference picture resampling and spatial scalability |
- 2019
- 2019-11-22 WO PCT/CN2019/120306 patent/WO2020103942A1/en active Application Filing
- 2019-11-22 EP EP23220523.7A patent/EP4325849A3/en active Pending
- 2019-11-22 WO PCT/CN2019/120314 patent/WO2020103944A1/en active Application Filing
- 2019-11-22 EP EP19888228.4A patent/EP3857896A4/en active Pending
- 2019-11-22 KR KR1020217013777A patent/KR102660160B1/en active IP Right Grant
- 2019-11-22 WO PCT/CN2019/120311 patent/WO2020103943A1/en active Application Filing
- 2019-11-22 CN CN201980076019.5A patent/CN113056915B/en active Active
- 2019-11-22 CN CN202311821092.5A patent/CN117692634A/en active Pending
- 2019-11-22 CN CN201980076058.5A patent/CN113170198B/en active Active
- 2019-11-22 CN CN201980076059.XA patent/CN113056920B/en active Active
- 2019-11-22 KR KR1020247004622A patent/KR20240024335A/en active Application Filing
- 2019-11-22 CN CN201980076060.2A patent/CN113056916B/en active Active
- 2019-11-22 JP JP2021526695A patent/JP7319365B2/en active Active
- 2019-11-22 WO PCT/CN2019/120301 patent/WO2020103940A1/en unknown
- 2021
- 2021-01-28 US US17/161,198 patent/US11431964B2/en active Active
- 2021-01-28 US US17/161,316 patent/US11632541B2/en active Active
- 2021-01-28 US US17/161,106 patent/US11140386B2/en active Active
- 2021-05-17 US US17/321,802 patent/US12069239B2/en active Active
- 2021-08-16 US US17/403,670 patent/US11671587B2/en active Active
- 2023
- 2023-07-19 JP JP2023117792A patent/JP2023156316A/en active Pending
- 2023-11-30 US US18/525,099 patent/US20240107007A1/en active Pending
- 2023-12-22 JP JP2023217004A patent/JP2024038053A/en active Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104170381A (en) * | 2012-03-16 | 2014-11-26 | 高通股份有限公司 | Motion vector coding and bi-prediction in hevc and its extensions |
CN108683922A (en) * | 2014-03-19 | 2018-10-19 | 株式会社Kt | The method and apparatus that multiview video signal is decoded |
WO2016056587A1 (en) * | 2014-10-09 | 2016-04-14 | シャープ株式会社 | Displacement arrangement derivation device, displacement vector derivation device, default reference view index derivation device, and depth lookup table derivation device |
CN108353166A (en) * | 2015-11-19 | 2018-07-31 | 韩国电子通信研究院 | Method and apparatus for encoding/decoding image |
WO2017171107A1 (en) * | 2016-03-28 | 2017-10-05 | 엘지전자(주) | Inter-prediction mode based image processing method, and apparatus therefor |
WO2017195608A1 (en) * | 2016-05-13 | 2017-11-16 | シャープ株式会社 | Moving image decoding device |
WO2018066867A1 (en) * | 2016-10-04 | 2018-04-12 | 한국전자통신연구원 | Method and apparatus for encoding and decoding image, and recording medium for storing bitstream |
CN108462873A (en) * | 2017-02-21 | 2018-08-28 | 联发科技股份有限公司 | The method and apparatus that the Candidate Set of block determines is split for quaternary tree plus binary tree |
Non-Patent Citations (2)
Title |
---|
CE4-related: History-based Motion Vector Prediction; Li Zhang et al.; 《Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11 11th Meeting: Ljubljana, SI, 10–18 July 2018, JVET-K0104-v5》; entire document *
Description of CE4: Inter prediction and motion vector coding; Haitao Yang et al.; 《Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11 10th Meeting: San Diego, US, 10–20 Apr. 2018, JVET-J1024r2》; entire document *
Also Published As
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113056915B (en) | Use of collocated blocks in sub-block based temporal motion vector prediction modes | |
CN114467308B (en) | Reference picture resampling in video processing | |
JP7481430B2 (en) | Motion Accuracy in Subblock-Based Inter Prediction | |
CN113906738A (en) | Adaptive motion vector difference resolution for affine mode | |
CN114128295A (en) | Construction of candidate list of geometric partitioning mode in video coding and decoding | |
CN113261292B (en) | Construction method of default motion candidate in inter prediction based on subblocks | |
WO2021027862A1 (en) | Motion precision in sub-block based inter prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||