
WO2024027566A1 - Constraining coefficients of a convolution model - Google Patents

Constraining coefficients of a convolution model

Info

Publication number
WO2024027566A1
Authority
WO
WIPO (PCT)
Prior art keywords
coefficients
current block
samples
video
derived
Prior art date: 2022-08-02
Application number
PCT/CN2023/109712
Other languages
English (en)
Inventor
Cheng-Yen Chuang
Ching-Yeh Chen
Original Assignee
Mediatek Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2023-07-28
Publication date
Application filed by Mediatek Inc. filed Critical Mediatek Inc.
Priority to TW112128822A (published as TW202412522A)
Publication of WO2024027566A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking

Definitions

  • the present disclosure relates generally to video coding.
  • the present disclosure relates to methods of coding pixel blocks by cross-component prediction.
  • High-Efficiency Video Coding (HEVC) is an international video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC) .
  • HEVC is based on the hybrid block-based motion-compensated DCT-like transform coding architecture.
  • the basic unit for compression, termed coding unit (CU) , is a 2Nx2N square block of pixels, and each CU can be recursively split into four smaller CUs until the predefined minimum size is reached.
  • Each CU contains one or multiple prediction units (PUs) .
  • Versatile Video Coding (VVC) is a video coding standard developed by the Joint Video Expert Team (JVET) .
  • the input video signal is predicted from the reconstructed signal, which is derived from the coded picture regions.
  • the prediction residual signal is processed by a block transform.
  • the transform coefficients are quantized and entropy coded together with other side information in the bitstream.
  • the reconstructed signal is generated from the prediction signal and the reconstructed residual signal after inverse transform on the de-quantized transform coefficients.
  • the reconstructed signal is further processed by in-loop filtering for removing coding artifacts.
  • the decoded pictures are stored in the frame buffer for predicting the future pictures in the input video signal.
  • a coded picture is partitioned into non-overlapped square block regions represented by the associated coding tree units (CTUs) .
  • the leaf nodes of a coding tree correspond to the coding units (CUs) .
  • a coded picture can be represented by a collection of slices, each comprising an integer number of CTUs. The individual CTUs in a slice are processed in raster-scan order.
  • a bi-predictive (B) slice may be decoded using intra prediction or inter prediction with at most two motion vectors and reference indices to predict the sample values of each block.
  • a predictive (P) slice is decoded using intra prediction or inter prediction with at most one motion vector and reference index to predict the sample values of each block.
  • An intra (I) slice is decoded using intra prediction only.
  • a CTU can be partitioned into one or multiple non-overlapped coding units (CUs) using the quadtree (QT) with nested multi-type-tree (MTT) structure to adapt to various local motion and texture characteristics.
  • a CU can be further split into smaller CUs using one of the five split types: quad-tree partitioning, vertical binary tree partitioning, horizontal binary tree partitioning, vertical center-side triple-tree partitioning, horizontal center-side triple-tree partitioning.
  • Each CU contains one or more prediction units (PUs) .
  • the prediction unit, together with the associated CU syntax, works as a basic unit for signaling the predictor information.
  • the specified prediction process is employed to predict the values of the associated pixel samples inside the PU.
  • Each CU may contain one or more transform units (TUs) for representing the prediction residual blocks.
  • a transform unit (TU) comprises a transform block (TB) of luma samples and two corresponding transform blocks of chroma samples; each TB corresponds to one residual block of samples from one color component.
  • An integer transform is applied to a transform block.
  • the level values of quantized coefficients together with other side information are entropy coded in the bitstream.
  • a coding tree block (CTB) , coding block (CB) , prediction block (PB) , and transform block (TB) are the sample arrays of a single color component corresponding to a CTU, CU, PU, and TU, respectively.
  • for each inter-predicted CU, motion parameters consisting of motion vectors, reference picture indices, a reference picture list usage index, and additional information are used for inter-predicted sample generation.
  • the motion parameters can be signalled in an explicit or implicit manner.
  • when a CU is coded with skip mode, the CU is associated with one PU and has no significant residual coefficients, no coded motion vector delta, and no reference picture index.
  • a merge mode is specified whereby the motion parameters for the current CU are obtained from neighbouring CUs, including spatial and temporal candidates, and additional schedules introduced in VVC.
  • the merge mode can be applied to any inter-predicted CU.
  • the alternative to merge mode is the explicit transmission of motion parameters, where the motion vector, the corresponding reference picture index for each reference picture list, the reference picture list usage flag, and other needed information are signalled explicitly for each CU.
  • Some embodiments of the disclosure provide a method for performing cross component prediction by constraining the coefficients of a component prediction model.
  • a video coder receives data for a block of pixels to be encoded or decoded as a current block of a current picture of a video.
  • the video coder derives a set of coefficients based on corresponding input and output component samples.
  • the video coder constrains the derived set of coefficients based on a set of constraints.
  • the video coder applies the constrained set of coefficients as a component prediction model to generate a predictor for the current block.
  • the video coder encodes or decodes the current block by using the generated predictor.
  • the component prediction model is a cross component model that generates prediction chroma samples based on reconstructed luma samples for the current block, and the corresponding input and output component samples are corresponding luma and chroma samples of a template region neighboring the current block.
  • the predictor may include the generated prediction chroma samples.
  • the component prediction model is a convolution model and the derived set of coefficients are derived by solving a matrix equation between the corresponding input and output component samples.
  • the encoder constrains the derived set of coefficients by clipping a coefficient at a clipping threshold, or by clipping different coefficients at different clipping thresholds, or by confining a coefficient to a predefined range.
  • the derived set of coefficients and the constrained set of coefficients are represented in the encoder in fixed point. For a fixed-point representation having a fractional portion comprising N bits, the predefined range is sized based on 1 << N.
  • when a derived coefficient is out of the predefined range, the set of coefficients is set equal to an identity filter, or a clipping operation is applied to the out-of-range coefficient, or the derived set of coefficients is not used to encode or decode the current block (i.e., CCCM mode is disabled) .
  • FIG. 1 conceptually illustrates chroma and luma samples that are used for derivation of linear model parameters.
  • FIG. 2 shows an example of classifying the neighbouring samples into two groups.
  • FIG. 3 conceptually illustrates the spatial components of a convolutional filter.
  • FIG. 4 illustrates a reference area that is used to derive filter coefficients for a convolution model for a current block.
  • FIG. 5 conceptually illustrates a data path of a video coder that derives and constrains model coefficients.
  • FIG. 6 illustrates an example video encoder that may use component prediction models for encoding pixel blocks.
  • FIG. 7 illustrates portions of the video encoder that derives and uses a component prediction model by constraining its coefficients.
  • FIG. 8 conceptually illustrates a process that applies constraints to coefficients of a component prediction model.
  • FIG. 9 illustrates an example video decoder that may use component prediction models for decoding pixel blocks.
  • FIG. 10 illustrates portions of the video decoder that derives and uses a component prediction model by constraining its coefficients.
  • FIG. 11 conceptually illustrates a process that applies constraints to coefficients of a component prediction model.
  • FIG. 12 conceptually illustrates an electronic system with which some embodiments of the present disclosure are implemented.
  • Cross Component Linear Model (CCLM) or Linear Model (LM) mode is a cross component prediction mode in which the chroma components of a block are predicted from the collocated reconstructed luma samples by linear models.
  • the parameters (e.g., scale and offset) of the linear model are derived from already reconstructed luma and chroma samples that are adjacent to the block.
  • eq. (1) is the linear model P(i, j) = α·rec′_L(i, j) + β, where P(i, j) represents the predicted chroma samples in a CU (or the predicted chroma samples of the current CU) and rec′_L(i, j) represents the down-sampled reconstructed luma samples of the same CU (or the corresponding reconstructed luma samples of the current CU) .
  • the CCLM model parameters α (scaling parameter) and β (offset parameter) are derived based on at most four neighboring chroma samples and their corresponding down-sampled luma samples.
  • in LM_A mode (also denoted as LM-T mode) , only the above template is used to calculate the linear model coefficients.
  • in LM_L mode (also denoted as LM-L mode) , only the left template is used to calculate the linear model coefficients.
  • in LM_LA mode, both the left and above templates are used to calculate the linear model coefficients.
  • FIG. 1 conceptually illustrates chroma and luma samples that are used for derivation of linear model parameters.
  • the figure illustrates a current block 100 having luma component samples and chroma component samples in 4:2:0 format.
  • the luma and chroma samples neighboring the current block are reconstructed samples. These reconstructed samples are used to derive the cross-component linear model (parameters α and β) .
  • the luma samples are down-sampled first before being used for linear model derivation.
  • 16 pairs of reconstructed luma (down-sampled) and chroma samples neighboring the current block are used to derive the linear model parameters.
  • the above neighboring positions are denoted as S[0, -1] … S[W'-1, -1] and the left neighboring positions are denoted as S[-1, 0] … S[-1, H'-1]. The four samples are then selected at positions S[W'/4, -1], S[3W'/4, -1], S[-1, H'/4], and S[-1, 3H'/4] when both the above and left neighboring samples are available.
  • the four neighboring luma samples at the selected positions are down-sampled and compared four times to find two larger values, x0_A and x1_A, and two smaller values, x0_B and x1_B.
  • their corresponding chroma sample values are denoted as y0_A, y1_A, y0_B and y1_B. The averages X_A = (x0_A + x1_A + 1) >> 1, X_B = (x0_B + x1_B + 1) >> 1, Y_A = (y0_A + y1_A + 1) >> 1, and Y_B = (y0_B + y1_B + 1) >> 1 then give the model parameters as α = (Y_A - Y_B) / (X_A - X_B) (eq. (4)) and β = Y_B - α·X_B (eq. (5)) .
  • the operations to calculate the α and β parameters according to eqs. (4) and (5) may be implemented by a look-up table; a floating-point sketch of the derivation follows below.
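As a concrete illustration, the following Python sketch mirrors the derivation just described. The integer averaging and the slope/offset computation follow eqs. (4)-(5); the floating-point division stands in for the look-up-table implementation, and the function and variable names are illustrative rather than taken from the source.

```python
import numpy as np

def derive_cclm_params(luma, chroma):
    """Derive (alpha, beta) from the four selected neighboring sample pairs.

    luma:   four down-sampled reconstructed luma values (np.array of int)
    chroma: the four corresponding reconstructed chroma values
    """
    order = np.argsort(luma)              # four comparisons in practice
    x0B, x1B, x0A, x1A = luma[order]      # two smaller, then two larger values
    y0B, y1B, y0A, y1A = chroma[order]

    # Average the two larger and the two smaller pairs
    XA, XB = (x0A + x1A + 1) >> 1, (x0B + x1B + 1) >> 1
    YA, YB = (y0A + y1A + 1) >> 1, (y0B + y1B + 1) >> 1

    # eqs. (4)-(5): slope and offset of the line through (XB, YB) and (XA, YA)
    alpha = (YA - YB) / (XA - XB) if XA != XB else 0.0
    beta = YB - alpha * XB
    return alpha, beta

def cclm_predict(rec_luma_ds, alpha, beta):
    """eq. (1): P(i, j) = alpha * rec'_L(i, j) + beta."""
    return alpha * rec_luma_ds + beta
```

For example, derive_cclm_params(np.array([40, 60, 75, 90]), np.array([100, 110, 120, 130])) fits the line through the averaged smaller and larger sample pairs.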
  • the above template is extended to contain (W+H) samples for LM-T mode
  • the left template is extended to contain (H+W) samples for LM-L mode.
  • both the extended left template and the extended above templates are used to calculate the linear model coefficients.
  • the two down-sampling filters are as follows, which correspond to “type-0” and “type-2” content, respectively.
  • the α and β parameter computation is performed as part of the decoding process, not just as an encoder search operation. As a result, no syntax is used to convey the α and β values to the decoder.
  • for chroma intra mode coding, a total of 8 intra modes are allowed. These modes include five traditional intra modes and three cross-component linear model modes (LM_LA, LM_A, and LM_L) . Chroma intra mode coding may directly depend on the intra prediction mode of the corresponding luma block. Chroma intra mode signaling and the corresponding luma intra prediction modes are according to the following table:
  • one chroma block may correspond to multiple luma blocks. Therefore, for chroma derived mode (DM) mode, the intra prediction mode of the corresponding luma block covering the center position of the current chroma block is directly inherited.
  • a single unified binarization table (mapping to bin string) is used for chroma intra prediction mode according to the following table:
  • the first bin indicates whether it is regular (0) or LM mode (1) . If it is LM mode, then the next bin indicates whether it is LM_CHROMA (0) or not (1) . If it is not LM_CHROMA, the next bin indicates whether it is LM_L (0) or LM_A (1) ; a sketch of this mapping follows the binarization discussion below.
  • the first bin of the binarization table for the corresponding intra_chroma_pred_mode can be discarded prior to the entropy coding. Or, in other words, the first bin is inferred to be 0 and hence not coded.
  • This single binarization table is used for both sps_cclm_enabled_flag equal to 0 and 1 cases.
  • the first two bins in the table are context coded with their own context models, and the remaining bins are bypass coded.
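A minimal sketch of this bin-string mapping; the mode names are illustrative labels, and LM_CHROMA here denotes the LM_LA mode:

```python
def chroma_lm_mode_bins(mode: str) -> str:
    """Map a chroma intra mode to its bin string per the description above.

    First bin: regular (0) vs. LM (1). For LM modes, the second bin
    distinguishes LM_CHROMA (0); otherwise a third bin picks LM_L or LM_A.
    """
    bins = {"LM_CHROMA": "10", "LM_L": "110", "LM_A": "111"}
    # Regular intra modes start with bin 0 and continue with their own suffix.
    return bins.get(mode, "0")
```

When sps_cclm_enabled_flag is 0, the leading bin is inferred to be 0 and not coded, so only the suffixes for the regular modes remain.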
  • the chroma CUs in a 32x32 / 32x16 chroma coding tree node are allowed to use CCLM in the following way: if the 32x32 chroma node is not split or is partitioned with QT split, all chroma CUs in the 32x32 node can use CCLM; if the 32x32 chroma node is partitioned with horizontal BT, and the 32x16 child node does not split or uses vertical BT split, all chroma CUs in the 32x16 chroma node can use CCLM.
  • in all other luma and chroma coding tree split conditions, CCLM is not allowed for the chroma CU.
  • Multi-Model CCLM (MMLM)
  • Multiple model CCLM mode uses two models for predicting the chroma samples from the luma samples for the whole CU. Similar to CCLM, three multiple model CCLM modes (MMLM_LA, MMLM_A, and MMLM_L) are used to indicate if both above and left neighboring samples, only above neighboring samples, or only left neighboring samples are used in model parameters derivation.
  • neighbouring luma samples and neighbouring chroma samples of the current block are classified into two groups, and each group is used as a training set to derive a linear model (i.e., a particular α and β are derived for a particular group) . Furthermore, the samples of the current luma block are classified based on the same rule as the neighbouring luma samples, as illustrated in the sketch below.
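The following sketch illustrates the two-model idea. The grouping threshold (the mean of the neighboring down-sampled luma samples) and the least-squares fit are assumptions for illustration; the text only requires that neighbors and current samples be classified by the same rule.

```python
import numpy as np

def mmlm_derive(nei_luma, nei_chroma):
    """Split the neighboring samples into two groups and fit one linear
    model (alpha, beta) per group; each group needs at least two samples."""
    threshold = nei_luma.mean()
    g1 = nei_luma <= threshold
    m1 = np.polyfit(nei_luma[g1], nei_chroma[g1], 1)    # (alpha1, beta1)
    m2 = np.polyfit(nei_luma[~g1], nei_chroma[~g1], 1)  # (alpha2, beta2)
    return threshold, m1, m2

def mmlm_predict(rec_luma_ds, threshold, m1, m2):
    """Classify the current block's luma samples by the same rule and
    apply the matching linear model."""
    (a1, b1), (a2, b2) = m1, m2
    return np.where(rec_luma_ds <= threshold,
                    a1 * rec_luma_ds + b1,
                    a2 * rec_luma_ds + b2)
```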
  • a convolutional cross-component model is applied to improve the cross-component prediction performance.
  • the convolutional model has a 7-tap filter: a 5-tap plus-sign-shaped spatial component, a non-linear term, and a bias term.
  • the input to the spatial 5-tap component of the filter includes a center (C) luma sample (which is collocated with the chroma sample to be predicted) and the center luma sample’s above/north (N) , below/south (S) , left/west (W) and right/east (E) neighbors.
  • FIG. 3 conceptually illustrates the spatial components of a convolutional filter.
  • the bias term (denoted as B) represents a scalar offset between the input and output (similarly to the offset term in CCLM) and is set to middle chroma value (512 for 10-bit content) .
  • the filter coefficients c i are calculated by minimizing MSE between the reconstructed (or target) chroma samples in a reference area and their corresponding predicted chroma samples.
  • Each predicted chroma sample is generated from a collocated luma sample and its surrounding luma samples using a derived component prediction model such as Eq. (10) .
  • Eq. (10) is a convolution model based on taps for the center sample and four surrounding samples, predChroma = c0·C + c1·N + c2·S + c3·E + c4·W + c5·P + c6·B, where P is the non-linear term and B is the bias term. More generally, eq. (10) can be expanded to include taps for the center sample and 8 surrounding samples (C, N, S, E, W, NE, NW, SE, SW) . Eq. (10) and its expanded form can more generally be referred to as a component prediction model (as it can be used for cross-component prediction or intra-component prediction) ; a per-sample sketch follows below.
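A per-sample sketch of eq. (10); the plus-shape tap layout follows FIG. 3, while the non-linear-term formula (the squared center sample scaled back to the sample range) and the coefficient ordering are assumptions based on the description above:

```python
def cccm_predict_sample(luma, i, j, c, bit_depth=10):
    """Predict one chroma sample from the down-sampled reconstructed luma
    plane `luma` using the seven model coefficients in `c`."""
    C = luma[i][j]                          # center sample, collocated with
    N, S = luma[i - 1][j], luma[i + 1][j]   # the chroma sample; above/below
    W, E = luma[i][j - 1], luma[i][j + 1]   # left / right neighbors
    mid = 1 << (bit_depth - 1)              # middle value, 512 for 10-bit
    P = (C * C + mid) >> bit_depth          # non-linear term (assumed form)
    B = mid                                 # bias term
    return c[0]*C + c[1]*N + c[2]*S + c[3]*E + c[4]*W + c[5]*P + c[6]*B
```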
  • FIG. 4 illustrates a reference area that is used to derive filter coefficients for a convolution model for a current block.
  • the reference area includes (reference) lines of (chroma) samples above and left of the current block 400 (the current block 400 is a PU in this example) .
  • the reference area extends one PU width to the right and one PU height below the PU boundaries.
  • the reference area may be adjusted to include only available samples.
  • an extension area to the reference area is used to support the “side samples” of the plus-shaped spatial filter (e.g., the N, E, W, S samples described by reference to FIG. 3 above and, in addition, the NW, NE, SW, SE samples) ; these samples are padded when they fall in unavailable areas.
  • the MSE minimization may be performed by calculating an autocorrelation matrix for the luma input and a cross-correlation vector between the luma input and chroma output.
  • the autocorrelation matrix is LDL decomposed and the final filter coefficients are calculated using back-substitution.
  • the process is similar to the calculation of the ALF filter coefficients in ECM; however, in some embodiments, LDL decomposition is chosen instead of Cholesky decomposition to avoid using square root operations, as in the sketch below.
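A minimal sketch of this derivation, assuming each row of X holds the filter inputs (C, N, S, E, W, P, B) at one reference-area position and y holds the corresponding target chroma samples. A production implementation works in fixed point, but the structure, an LDL^T factorization followed by forward and back substitution, is the same:

```python
import numpy as np

def solve_filter_coefficients(X, y):
    """Minimize MSE by solving (X^T X) c = X^T y via LDL^T factorization,
    which avoids the square roots a Cholesky factorization would need."""
    A = X.T @ X                # autocorrelation matrix of the filter inputs
    b = X.T @ y                # cross-correlation with the chroma output
    n = A.shape[0]
    L, D = np.eye(n), np.zeros(n)
    for j in range(n):         # A = L diag(D) L^T, L unit lower-triangular
        D[j] = A[j, j] - (L[j, :j] ** 2) @ D[:j]
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ D[:j]) / D[j]
    z = np.zeros(n)            # forward substitution: L z = b
    for i in range(n):
        z[i] = b[i] - L[i, :i] @ z[:i]
    z /= D                     # scale by the diagonal of D
    c = np.zeros(n)            # back substitution: L^T c = z
    for i in reversed(range(n)):
        c[i] = z[i] - L[i + 1:, i] @ c[i + 1:]
    return c
```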
  • a higher degree model is used to predict chroma samples, instead of the linear model.
  • the higher degree model may include a k-tap spatial term, a non-linear term (denoted as P) , and a bias term (denoted as B) , which may be formulated as: pred(i, j) = a_0·rec_L′(i, j) + Σ_x a_x·neiRec_L′(x) + b·P + c·B
  • rec L ' (i, j) is the down-sampled reconstructed luma sample at position (i, j)
  • neiRec L ' (x) is one of neighboring samples surrounding rec L ' (i, j)
  • a 0 , a x , b, and c are model parameters.
  • the higher degree model can be used in deriving model parameters between color components, or between the reference samples of the current frame/picture and reference frames/pictures.
  • the final derived filter/model coefficients are used to generate a CCCM predictor.
  • the derived filter coefficients may be unreasonable, such as when the model overfits on the reference samples, or when the filter coefficients are too large.
  • FIG. 5 conceptually illustrates a data path 500 of a video coder that derives and constrains model coefficients (e.g., for CCCM) .
  • the constrained model coefficients are used to generate a predictor for the current block.
  • the data path 500 starts with a matrix preparation module 510 that generates an autocorrelation matrix and a cross-correlation vector 515.
  • the autocorrelation matrix is prepared based on the corresponding input component samples 505 (X samples) .
  • the cross-correlation vector is prepared based on the corresponding input and output component samples 505 (X samples and Y samples) .
  • Corresponding input and output component samples may be corresponding reconstructed luma and chroma samples of a reference template region neighboring the current block.
  • the generated autocorrelation matrix and cross-correlation vector 515 are provided to a matrix equation solver module 530 to produce a set of optimized coefficients 535.
  • a coefficient constraint module 540 applies constraints on the optimized coefficients 535 to result in a set of constrained final coefficients 545.
  • the constrained final coefficients 545 are finally used as a component prediction model 550 (e.g., CCLM or CCCM) to generate prediction component samples 565 (e.g., chroma of the current block) based on reference component samples 560 (e.g., luma of the current block) .
  • the generated prediction component samples 565 can be used as a CCCM predictor.
  • a pre-defined clipping threshold is used to clip the optimized coefficients 535 before generating the constrained final coefficients 545.
  • several clipping thresholds for the optimized coefficients 535 are pre-defined, and a syntax may be signaled for indicating the selected threshold.
  • several clipping thresholds for the optimized coefficients 535 are pre-defined, and the selected threshold may be explicitly derived from neighboring reconstructed samples or side information.
  • the clipping thresholds of the different coefficients may be all different or partially different.
  • coefficients of the component prediction model 550 are represented by fixed point, and the bit width of its integer part is constrained to a certain value.
  • the fixed-point format for the optimized coefficients 535 has 48 bits for the integer portion and 16 bits for the fraction portion.
  • the fixed-point format for the constrained coefficients 545 in the example shown in FIG. 5 has 36 bits for the integer portion and 16 bits for the fraction portion.
  • the constrained coefficients 545 may have different bits for the integer portion.
  • the bit-widths of the integer and/or the fractional portion of each coefficient may be entirely different or partially different.
  • the constrained coefficients may be represented in a fixed-point format with 36 bits of integer portion and 14 bits of fraction portion.
  • the video coder may apply a clipping operation to out-of-range coefficients. In some embodiments, if a coefficient is out-of-range, CCCM or CCLM mode may be inferred to not be enabled.
  • the optimized coefficient 535 is clipped into one predefined range before generating the final predictor 565.
  • the range may be [-1, 1), [-2, 2], [-4, 4), or [-8, 8] in terms of floating-point precision.
  • if a fixed-point precision is used to represent a model coefficient and the number of bits in the fractional part is N, then the predefined range is enlarged by (1 << N) .
  • for example, if the supported range is [-8, 8) in floating-point precision and the number of bits in the fractional part is 5, the supported range is changed from [-8, 8) in floating-point precision to [-8*32, 8*32) in fixed-point precision with a 5-bit fractional part.
  • the predefined range may be different among different model coefficients.
  • the predefined range depends on the spatial position of model coefficients. In some embodiments, the predefined range depends on the order of the corresponding inputs. In some embodiments, only one predefined range is used for all coefficients.
  • when one derived coefficient is out of the predefined range, the final coefficient 545 is derived by clipping the coefficient into the predefined range. In some embodiments, when one derived coefficient is out of the predefined range, the final coefficient 545 is set equal to zero. In some embodiments, when one derived coefficient is out of the predefined range, the coefficients of CCCM mode are set equal to the identity filter (i.e., no filtering is applied by the CCCM mode) . A sketch of these fallback options follows below.
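The following sketch combines the constraint options above. The range, the fractional bit count, and the fallback policy are illustrative parameters rather than values fixed by the disclosure, and the identity filter assumes the center tap is coefficient 0:

```python
def constrain_coefficients(coeffs, frac_bits=16, lo=-8, hi=8, mode="clip"):
    """Constrain fixed-point model coefficients to a predefined range.

    The floating-point range [lo, hi) is enlarged by 1 << frac_bits for
    coefficients kept in fixed point, e.g. [-8, 8) -> [-8*32, 8*32) when
    frac_bits is 5. Returns (coefficients, model_enabled).
    """
    scale = 1 << frac_bits
    lo_fx, hi_fx = lo * scale, hi * scale
    if all(lo_fx <= c < hi_fx for c in coeffs):
        return coeffs, True                       # nothing to constrain
    if mode == "clip":                            # clip into the range
        return [min(max(c, lo_fx), hi_fx - 1) for c in coeffs], True
    if mode == "zero":                            # zero out-of-range taps
        return [c if lo_fx <= c < hi_fx else 0 for c in coeffs], True
    if mode == "identity":                        # identity filter: pass the
        ident = [0] * len(coeffs)                 # center sample through
        ident[0] = scale                          # unchanged (1.0 fixed-point)
        return ident, True
    return coeffs, False                          # "disable": CCCM not used
```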
  • any of the foregoing proposed methods can be implemented in encoders and/or decoders.
  • any of the proposed methods can be implemented in an inter/intra/prediction module of an encoder, and/or an inter/intra/prediction module of a decoder.
  • any of the proposed methods can be implemented as a circuit coupled to the inter/intra/prediction module of the encoder and/or the inter/intra/prediction module of the decoder, so as to provide the information needed by the inter/intra/prediction module.
  • FIG. 6 illustrates an example video encoder 600 that may use component prediction models for encoding pixel blocks.
  • the video encoder 600 receives input video signal from a video source 605 and encodes the signal into bitstream 695.
  • the video encoder 600 has several components or modules for encoding the signal from the video source 605, at least including some components selected from a transform module 610, a quantization module 611, an inverse quantization module 614, an inverse transform module 615, an intra-picture estimation module 620, an intra-prediction module 625, a motion compensation module 630, a motion estimation module 635, an in-loop filter 645, a reconstructed picture buffer 650, a MV buffer 665, a MV prediction module 675, and an entropy encoder 690.
  • the motion compensation module 630 and the motion estimation module 635 are part of an inter-prediction module 640.
  • the modules 610–690 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device or electronic apparatus. In some embodiments, the modules 610–690 are modules of hardware circuits implemented by one or more integrated circuits (ICs) of an electronic apparatus. Though the modules 610–690 are illustrated as being separate modules, some of the modules can be combined into a single module.
  • the video source 605 provides a raw video signal that presents pixel data of each video frame without compression.
  • a subtractor 608 computes the difference between the raw video pixel data of the video source 605 and the predicted pixel data 613 from the motion compensation module 630 or intra-prediction module 625 as prediction residual 609.
  • the transform module 610 converts the difference (i.e., the residual pixel data, or prediction residual 609) into transform coefficients (e.g., by performing Discrete Cosine Transform, or DCT) .
  • the quantization module 611 quantizes the transform coefficients into quantized data (or quantized coefficients) 612, which is encoded into the bitstream 695 by the entropy encoder 690.
  • the inverse quantization module 614 de-quantizes the quantized data (or quantized coefficients) 612 to obtain transform coefficients, and the inverse transform module 615 performs inverse transform on the transform coefficients to produce reconstructed residual 619.
  • the reconstructed residual 619 is added with the predicted pixel data 613 to produce reconstructed pixel data 617.
  • the reconstructed pixel data 617 is temporarily stored in a line buffer (not illustrated) for intra-picture prediction and spatial MV prediction.
  • the reconstructed pixels are filtered by the in-loop filter 645 and stored in the reconstructed picture buffer 650.
  • the reconstructed picture buffer 650 is a storage external to the video encoder 600.
  • the reconstructed picture buffer 650 is a storage internal to the video encoder 600.
  • the intra-picture estimation module 620 performs intra-prediction based on the reconstructed pixel data 617 to produce intra prediction data.
  • the intra-prediction data is provided to the entropy encoder 690 to be encoded into bitstream 695.
  • the intra-prediction data is also used by the intra-prediction module 625 to produce the predicted pixel data 613.
  • the motion estimation module 635 performs inter-prediction by producing MVs to reference pixel data of previously decoded frames stored in the reconstructed picture buffer 650. These MVs are provided to the motion compensation module 630 to produce predicted pixel data.
  • the video encoder 600 uses MV prediction to generate predicted MVs, and the difference between the MVs used for motion compensation and the predicted MVs is encoded as residual motion data and stored in the bitstream 695.
  • the MV prediction module 675 generates the predicted MVs based on reference MVs that were generated for encoding previous video frames, i.e., the motion compensation MVs that were used to perform motion compensation.
  • the MV prediction module 675 retrieves reference MVs from previous video frames from the MV buffer 665.
  • the video encoder 600 stores the MVs generated for the current video frame in the MV buffer 665 as reference MVs for generating predicted MVs.
  • the MV prediction module 675 uses the reference MVs to create the predicted MVs.
  • the predicted MVs can be computed by spatial MV prediction or temporal MV prediction.
  • the difference between the predicted MVs and the motion compensation MVs (MC MVs) of the current frame (i.e., the residual motion data) is encoded into the bitstream 695 by the entropy encoder 690.
  • the entropy encoder 690 encodes various parameters and data into the bitstream 695 by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding.
  • the entropy encoder 690 encodes various header elements, flags, along with the quantized transform coefficients 612, and the residual motion data as syntax elements into the bitstream 695.
  • the bitstream 695 is in turn stored in a storage device or transmitted to a decoder over a communications medium such as a network.
  • the in-loop filter 645 performs filtering or smoothing operations on the reconstructed pixel data 617 to reduce the artifacts of coding, particularly at boundaries of pixel blocks.
  • the filtering or smoothing operations performed by the in-loop filter 645 include deblock filter (DBF) , sample adaptive offset (SAO) , and/or adaptive loop filter (ALF) .
  • FIG. 7 illustrates portions of the video encoder 600 that derives and uses a component prediction model by constraining its coefficients.
  • an initial predictor generation module 720 provides an initial predictor 715 to a component prediction model 710.
  • the initial predictor 715 may include component samples (luma or chroma) of a reference block for predicting the component samples of the current block, or component samples of the current block for cross-component prediction.
  • the component prediction model 710 is applied to the initial predictor 715 to generate a refined predictor 725.
  • the samples of the refined predictor 725 may be used as the predicted pixel data 613.
  • the initial predictor 715 may be reconstructed luma samples of the current block, while the refined predictor 725 may be prediction chroma samples of the current block.
  • when the current block is coded by inter-prediction, the motion estimation module 635 provides a MV that is used by the motion compensation module 630 to identify a reference block in a reference picture as the initial predictor 715.
  • when the current block is coded by intra-prediction, the intra-picture estimation module 620 provides an intra mode that is used by the intra-prediction module 625 to generate an intra prediction of the current block as the initial predictor 715.
  • the component samples of the initial predictor 715 may then be used as input to the component prediction model 710.
  • a regression data selection module 730 retrieves the required component samples from the reconstructed picture buffer 650 to serve as regression data.
  • the regression data may be taken from regions in and/or around the current block in the current picture and in and/or around the reference block in a reference picture.
  • the retrieved regression data (i.e., the required component samples) include corresponding input (X) and output (Y) component samples used to determine the coefficients or parameters of the component prediction model 710.
  • a model constructor 705 uses the regression data (X and Y) to derive the coefficients of the component prediction model 710 using techniques such as elimination method, iteration method, or decomposition method.
  • the model constructor 705 applies certain constraints to the derived coefficients before providing the constrained coefficients to be used as the component prediction model 710.
  • the model constructor 705 may constrain the coefficients by clipping at a threshold or confine the coefficients to a predefined range.
  • the model constructor 705 may apply different clipping thresholds for different coefficients.
  • the coefficients are represented by a fixed-point representation having N bits for its fractional portion, and the predefined range is enlarged by 1 << N. Applying constraints to the coefficients of the component prediction model is described in Section IV above.
  • FIG. 8 conceptually illustrates a process 800 that applies constraints to coefficients of a component prediction model.
  • in some embodiments, one or more processing units (e.g., a processor) of a computing device implementing the encoder 600 perform the process 800 by executing instructions stored in a computer readable medium.
  • an electronic apparatus implementing the encoder 600 performs the process 800.
  • the encoder receives (at block 810) data to be encoded as a current block of pixels in a current picture of a video.
  • the encoder derives (at block 820) a set of coefficients based on corresponding input and output component samples for a component prediction model.
  • the component prediction model is a cross component model that generates prediction chroma samples based on reconstructed luma samples for the current block, and the corresponding input and output component samples are corresponding luma and chroma samples of a template region neighboring the current block.
  • the component prediction model is a convolution model and the derived set of coefficients are derived by decomposition and back substitution based on an autocorrelation matrix between the corresponding input and output component samples.
  • the encoder constrains (at block 830) the derived set of coefficients based on a set of constraints.
  • the encoder constrains the derived set of coefficients by clipping a coefficient at a clipping threshold, or by clipping different coefficients at different clipping thresholds, or by confining a coefficient to a predefined range.
  • the derived set of coefficients and the constrained set of coefficients are represented in the encoder in floating point.
  • the coefficients are represented by a fixed-point representation having N bits for its fractional portion, and the predefined range is sized based on 1 << N.
  • when a derived coefficient is out of the predefined range, the set of coefficients is set equal to an identity filter, or a clipping operation is applied to the out-of-range coefficient, or the derived set of coefficients is not used to encode or decode the current block (i.e., CCCM mode is disabled) .
  • the encoder applies (at block 840) the constrained set of coefficients as the component prediction model to generate a predictor for the current block.
  • the predictor may include the generated prediction chroma samples.
  • the encoder encodes (at block 850) the current block by using the generated predictor to produce prediction residuals.
  • an encoder may signal (or generate) one or more syntax elements in a bitstream, such that a decoder may parse said one or more syntax elements from the bitstream.
  • FIG. 9 illustrates an example video decoder 900 that may use component prediction models for decoding pixel blocks.
  • the video decoder 900 is an image-decoding or video-decoding circuit that receives a bitstream 995 and decodes the content of the bitstream into pixel data of video frames for display.
  • the video decoder 900 has several components or modules for decoding the bitstream 995, including some components selected from an inverse quantization module 911, an inverse transform module 910, an intra-prediction module 925, a motion compensation module 930, an in-loop filter 945, a decoded picture buffer 950, a MV buffer 965, a MV prediction module 975, and a parser 990.
  • the motion compensation module 930 is part of an inter-prediction module 940.
  • the modules 910–990 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device. In some embodiments, the modules 910–990 are modules of hardware circuits implemented by one or more ICs of an electronic apparatus. Though the modules 910–990 are illustrated as being separate modules, some of the modules can be combined into a single module.
  • the parser 990 receives the bitstream 995 and performs initial parsing according to the syntax defined by a video-coding or image-coding standard.
  • the parsed syntax element includes various header elements, flags, as well as quantized data (or quantized coefficients) 912.
  • the parser 990 parses out the various syntax elements by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding.
  • the inverse quantization module 911 de-quantizes the quantized data (or quantized coefficients) 912 to obtain transform coefficients, and the inverse transform module 910 performs inverse transform on the transform coefficients 916 to produce reconstructed residual signal 919.
  • the reconstructed residual signal 919 is added with predicted pixel data 913 from the intra-prediction module 925 or the motion compensation module 930 to produce decoded pixel data 917.
  • the decoded pixels data are filtered by the in-loop filter 945 and stored in the decoded picture buffer 950.
  • the decoded picture buffer 950 is a storage external to the video decoder 900.
  • the decoded picture buffer 950 is a storage internal to the video decoder 900.
  • the intra-prediction module 925 receives intra-prediction data from bitstream 995 and according to which, produces the predicted pixel data 913 from the decoded pixel data 917 stored in the decoded picture buffer 950.
  • the decoded pixel data 917 is also stored in a line buffer (not illustrated) for intra-picture prediction and spatial MV prediction.
  • the content of the decoded picture buffer 950 is used for display.
  • a display device 955 either retrieves the content of the decoded picture buffer 950 for display directly, or retrieves the content of the decoded picture buffer to a display buffer.
  • the display device receives pixel values from the decoded picture buffer 950 through a pixel transport.
  • the motion compensation module 930 produces predicted pixel data 913 from the decoded pixel data 917 stored in the decoded picture buffer 950 according to motion compensation MVs (MC MVs) . These motion compensation MVs are decoded by adding the residual motion data received from the bitstream 995 with predicted MVs received from the MV prediction module 975.
  • the MV prediction module 975 generates the predicted MVs based on reference MVs that were generated for decoding previous video frames, e.g., the motion compensation MVs that were used to perform motion compensation.
  • the MV prediction module 975 retrieves the reference MVs of previous video frames from the MV buffer 965.
  • the video decoder 900 stores the motion compensation MVs generated for decoding the current video frame in the MV buffer 965 as reference MVs for producing predicted MVs.
  • the in-loop filter 945 performs filtering or smoothing operations on the decoded pixel data 917 to reduce the artifacts of coding, particularly at boundaries of pixel blocks.
  • the filtering or smoothing operations performed by the in-loop filter 945 include deblock filter (DBF) , sample adaptive offset (SAO) , and/or adaptive loop filter (ALF) .
  • FIG. 10 illustrates portions of the video decoder 900 that derives and uses a component prediction model by constraining its coefficients.
  • an initial predictor generation module 1020 provides an initial predictor 1015 to a component prediction model 1010.
  • the initial predictor 1015 may include component samples (luma or chroma) of a reference block for predicting the component samples of the current block, or component samples of the current block for cross-component prediction.
  • the component prediction model 1010 is applied to the initial predictor 1015 to generate a refined predictor 1025.
  • the samples of the refined predictor 1025 may be used as the predicted pixel data 913.
  • the initial predictor 1015 may be reconstructed luma samples of the current block, while the refined predictor 1025 may be prediction chroma samples of the current block.
  • when the current block is coded by inter-prediction, the entropy decoder 990 provides a MV that is used by the motion compensation module 930 to identify a reference block in a reference picture as the initial predictor 1015. When the current block is coded by intra-prediction, the entropy decoder 990 provides an intra mode that is used by the intra-prediction module 925 to generate an intra prediction of the current block as the initial predictor 1015. The component samples of the initial predictor 1015 may then be used as input to the component prediction model 1010.
  • a regression data selection module 1030 retrieves the required component samples from the decoded picture buffer 950 to serve as regression data.
  • the regression data may be taken from regions in and/or around the current block in the current picture and in and/or around the reference block in a reference picture.
  • the retrieved regression data (i.e., the required component samples) include corresponding input (X) and output (Y) component samples used to determine the coefficients or parameters of the component prediction model 1010.
  • a model constructor 1005 uses the regression data (X and Y) to derive the coefficients of the component prediction model 1010 using techniques such as elimination method, iteration method, or decomposition method.
  • the model constructor 1005 applies certain constraints to the derived coefficients before providing the constrained coefficients to be used as the component prediction model 1010.
  • the model constructor 1005 may constrain the coefficients by clipping at a threshold or confine the coefficients to a predefined range.
  • the model constructor 1005 may apply different clipping thresholds for different coefficients.
  • the coefficients are represented by a fixed-point representation having N bits for its fractional portion, and the predefined range is enlarged by 1 << N. Applying constraints to the coefficients of the component prediction model is described in Section IV above.
  • FIG. 11 conceptually illustrates a process 1100 that applies constraints to coefficients of a component prediction model.
  • in some embodiments, one or more processing units (e.g., a processor) of a computing device implementing the decoder 900 perform the process 1100 by executing instructions stored in a computer readable medium.
  • an electronic apparatus implementing the decoder 900 performs the process 1100.
  • the decoder receives (at block 1110) data to be decoded as a current block of pixels in a current picture of a video.
  • the decoder derives (at block 1120) a set of coefficients based on corresponding input and output component samples for a component prediction model.
  • the component prediction model is a cross component model that generates prediction chroma samples based on reconstructed luma samples for the current block, and the corresponding input and output component samples are corresponding luma and chroma samples of a template region neighboring the current block.
  • the component prediction model is a convolution model and the derived set of coefficients are derived by decomposition and back substitution based on an autocorrelation matrix between the corresponding input and output component samples.
  • the decoder constrains (at block 1130) the derived set of coefficients based on a set of constraints.
  • the decoder constrains the derived set of coefficients by clipping a coefficient at a clipping threshold, or by clipping different coefficients at different clipping thresholds, or by confining a coefficient to a predefined range.
  • the derived set of coefficients and the constrained set of coefficients are represented in the decoder in floating point.
  • the coefficients are represented by a fixed-point representation having N bits for its fractional portion, and the predefined range is sized based on 1 << N.
  • when a derived coefficient is out of the predefined range, the set of coefficients is set equal to an identity filter, or a clipping operation is applied to the out-of-range coefficient, or the derived set of coefficients is not used to encode or decode the current block (i.e., CCCM mode is disabled) .
  • the decoder applies (at block 1140) the constrained set of coefficients as the component prediction model to generate a predictor for the current block.
  • the predictor may include the generated prediction chroma samples.
  • the decoder reconstructs (at block 1150) the current block by using the generated predictor.
  • the decoder may then provide the reconstructed current block for display as part of the reconstructed current picture.
  • some embodiments store instructions on a computer readable storage medium (also referred to as computer readable medium) . When these instructions are executed by one or more computational or processing units (e.g., one or more processors, cores of processors, or other processing units) , they cause the processing unit(s) to perform the actions indicated in the instructions.
  • Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, random-access memory (RAM) chips, hard drives, erasable programmable read only memories (EPROMs) , electrically erasable programmable read-only memories (EEPROMs) , etc.
  • the computer readable media does not include carrier waves and electronic signals passing wirelessly or over wired connections.
  • the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage which can be read into memory for processing by a processor.
  • multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions.
  • multiple software inventions can also be implemented as separate programs.
  • any combination of separate programs that together implement a software invention described here is within the scope of the present disclosure.
  • the software programs when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.
  • FIG. 12 conceptually illustrates an electronic system 1200 with which some embodiments of the present disclosure are implemented.
  • the electronic system 1200 may be a computer (e.g., a desktop computer, personal computer, tablet computer, etc. ) , phone, PDA, or any other sort of electronic device.
  • Such an electronic system includes various types of computer readable media and interfaces for various other types of computer readable media.
  • Electronic system 1200 includes a bus 1205, processing unit (s) 1210, a graphics-processing unit (GPU) 1215, a system memory 1220, a network 1225, a read-only memory 1230, a permanent storage device 1235, input devices 1240, and output devices 1245.
  • the bus 1205 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 1200.
  • the bus 1205 communicatively connects the processing unit (s) 1210 with the GPU 1215, the read-only memory 1230, the system memory 1220, and the permanent storage device 1235.
  • the processing unit (s) 1210 retrieves instructions to execute and data to process in order to execute the processes of the present disclosure.
  • the processing unit (s) may be a single processor or a multi-core processor in different embodiments. Some instructions are passed to and executed by the GPU 1215.
  • the GPU 1215 can offload various computations or complement the image processing provided by the processing unit (s) 1210.
  • the read-only-memory (ROM) 1230 stores static data and instructions that are used by the processing unit (s) 1210 and other modules of the electronic system.
  • the permanent storage device 1235 is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 1200 is off. Some embodiments of the present disclosure use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1235.
  • the system memory 1220 is a read-and-write memory device. However, unlike storage device 1235, the system memory 1220 is a volatile read-and-write memory, such as a random-access memory.
  • the system memory 1220 stores some of the instructions and data that the processor uses at runtime.
  • processes in accordance with the present disclosure are stored in the system memory 1220, the permanent storage device 1235, and/or the read-only memory 1230.
  • the various memory units include instructions for processing multimedia clips in accordance with some embodiments. From these various memory units, the processing unit (s) 1210 retrieves instructions to execute and data to process in order to execute the processes of some embodiments.
  • the bus 1205 also connects to the input and output devices 1240 and 1245.
  • the input devices 1240 enable the user to communicate information and select commands to the electronic system.
  • the input devices 1240 include alphanumeric keyboards and pointing devices (also called “cursor control devices” ) , cameras (e.g., webcams) , microphones or similar devices for receiving voice commands, etc.
  • the output devices 1245 display images generated by the electronic system or otherwise output data.
  • the output devices 1245 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD) , as well as speakers or similar audio output devices. Some embodiments include devices such as a touchscreen that function as both input and output devices.
  • bus 1205 also couples electronic system 1200 to a network 1225 through a network adapter (not shown) .
  • the computer can be a part of a network of computers (such as a local area network (“LAN”) , a wide area network (“WAN”) , or an Intranet) or a network of networks (such as the Internet) . Any or all components of electronic system 1200 may be used in conjunction with the present disclosure.
  • Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media) .
  • computer-readable media include RAM, ROM, read-only compact discs (CD-ROM) , recordable compact discs (CD-R) , rewritable compact discs (CD-RW) , read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM) , and a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.) .
  • the computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
  • in some embodiments, the processes described above are performed by one or more integrated circuits, such as application-specific integrated circuits (ASICs) , field-programmable gate arrays (FPGAs) , or programmable logic devices (PLDs) .
  • in some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.
  • the terms “computer” , “server” , “processor” , and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people.
  • display or displaying means displaying on an electronic device.
  • the terms “computer readable medium, ” “computer readable media, ” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.
  • any two components so associated can also be viewed as being “operably connected” , or “operably coupled” , to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable” , to each other to achieve the desired functionality.
  • operably couplable include but are not limited to physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method for performing cross-component prediction by constraining the coefficients of a component prediction model is provided. A video coder receives data for a block of pixels to be encoded or decoded as a current block of a current picture of a video. The video coder derives a set of coefficients based on corresponding input and output component samples. The video coder constrains the derived set of coefficients based on a set of constraints. The video coder applies the constrained set of coefficients as a component prediction model to generate a predictor for the current block. The video coder encodes or decodes the current block by using the generated predictor.
PCT/CN2023/109712 2022-08-02 2023-07-28 Constraining coefficients of a convolution model WO2024027566A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW112128822A TW202412522A (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263370133P 2022-08-02 2022-08-02
US63/370,133 2022-08-02

Publications (1)

Publication Number Publication Date
WO2024027566A1 (fr)

Family

Family ID: 89848487

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/109712 WO2024027566A1 (fr) Constraining coefficients of a convolution model

Country Status (2)

Country Link
TW (1) TW202412522A (zh)
WO (1) WO2024027566A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200154115A1 * 2018-11-08 2020-05-14 Qualcomm Incorporated Cross-component prediction for video coding
WO2021045654A2 * 2019-12-30 2021-03-11 Huawei Technologies Co., Ltd. Method and apparatus of filtering for cross-component linear model prediction
US20210227229A1 * 2018-10-08 2021-07-22 Huawei Technologies Co., Ltd. Intra prediction method and device
WO2021247881A1 * 2020-06-03 2021-12-09 Beijing Dajia Internet Information Technology Co., Ltd. Chroma coding enhancement when performing prediction from multiple cross-component (PMC) modes
CN114258683A (zh) * 2019-08-20 2022-03-29 Canon Kabushiki Kaisha Cross-component adaptive loop filter for chroma

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210227229A1 * 2018-10-08 2021-07-22 Huawei Technologies Co., Ltd. Intra prediction method and device
US20200154115A1 * 2018-11-08 2020-05-14 Qualcomm Incorporated Cross-component prediction for video coding
CN114258683A (zh) * 2019-08-20 2022-03-29 Canon Kabushiki Kaisha Cross-component adaptive loop filter for chroma
WO2021045654A2 * 2019-12-30 2021-03-11 Huawei Technologies Co., Ltd. Method and apparatus of filtering for cross-component linear model prediction
WO2021247881A1 * 2020-06-03 2021-12-09 Beijing Dajia Internet Information Technology Co., Ltd. Chroma coding enhancement when performing prediction from multiple cross-component (PMC) modes

Also Published As

Publication number Publication date
TW202412522A (zh) 2024-03-16

Similar Documents

Publication Publication Date Title
US11546587B2 (en) Adaptive loop filter with adaptive parameter set
US11172203B2 (en) Intra merge prediction
US10887594B2 (en) Entropy coding of coding units in image and video data
WO2020038465A1 (fr) Coding transform coefficients with throughput constraints
WO2021139770A1 (fr) Signaling quantization related parameters
US11350131B2 (en) Signaling coding of transform-skipped blocks
WO2024027566A1 (fr) Constraining coefficients of a convolution model
WO2024017006A1 (fr) Accessing neighboring samples for cross-component non-linear model derivation
WO2024012243A1 (fr) Unified derivation of cross-component model
WO2023217235A1 (fr) Prediction refinement with convolution model
WO2023208063A1 (fr) Linear model derivation for cross-component prediction by multiple reference lines
WO2023193769A1 (fr) Implicit multi-pass decoder-side motion vector refinement
WO2023236775A1 (fr) Adaptive coding of image and video data
WO2023116704A1 (fr) Multi-model cross-component linear model prediction
WO2024222411A1 (fr) Entropy coding of transform blocks
WO2023093863A1 (fr) Local illumination compensation with coded parameters
US11785204B1 (en) Frequency domain mode decision for joint chroma coding
WO2023071778A1 (fr) Signaling cross-component linear model
WO2023202569A1 (fr) Extended template matching for video coding
WO2023197998A1 (fr) Extended block partition types for video coding
WO2024146511A1 (fr) Representative prediction mode of a block of pixels
WO2024016955A1 (fr) Out-of-boundary check in video coding
WO2023143173A1 (fr) Multi-pass decoder-side motion vector refinement
WO2023186040A1 (fr) Bilateral template with multi-pass decoder-side motion vector refinement
WO2024022146A1 (fr) Using multiple reference lines for prediction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23849274

Country of ref document: EP

Kind code of ref document: A1