WO2014038906A1 - Image decoding method and apparatus using same - Google Patents
- Publication number
- WO2014038906A1 (PCT/KR2013/008120)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- layer
- dimension
- information
- scalability
- prediction
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/184—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being bits, e.g. of the compressed video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/187—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/577—Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
Definitions
- the present invention relates to video compression techniques, and more particularly, to a method and apparatus for performing scalable video coding.
- as the video quality that terminal devices can support and the network environment diversify, video of ordinary quality may be used in one environment while higher-quality video may be used in another.
- a consumer who purchases video content on a mobile terminal can view the same video content on a larger screen and at a higher resolution through a large display in the home.
- UHD Ultra High Definition
- it is necessary to provide scalability in the quality of the image, for example, the image quality, the resolution of the image, the size of the image, and the frame rate of the video. In addition, various image processing methods associated with such scalability should be discussed.
- An object of the present invention is to provide a method and apparatus for describing scalability information in a bitstream.
- Another object of the present invention is to provide a method and apparatus for representing scalability information of various kinds of bitstreams in a flexible manner.
- Another object of the present invention is to provide a method for identifying a scalability layer in a bitstream and an apparatus using the same.
- a method of decoding an image from a bitstream including a plurality of layers involves a dimension type for identifying the scalability of a layer and a dimension ID for identifying a layer to which the dimension type is applied.
- the method may further include receiving at least one of the number of the dimension types, the dimension ID for identifying a layer to which the dimension type is applied, and a layer ID.
- the dimension type and the dimension ID may be determined by referring to a predetermined table.
- the sum of lengths of dimension IDs for identifying layers to which the dimension type of the i-th layer is applied may be equal to the number of bits of the layer ID for the i-th layer.
- the sum of the lengths of the dimension IDs for the i th layer may be 6.
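As a rough illustration of the length constraint above, the sketch below splits a 6-bit layer ID into per-dimension IDs whose bit lengths sum to 6. The function name `split_layer_id`, the most-significant-bit-first packing order, and the example lengths are assumptions made for illustration; the text only states that the dimension ID lengths sum to the number of bits of the layer ID.

```python
def split_layer_id(layer_id, dimension_id_lengths, layer_id_bits=6):
    """Split a layer ID into dimension IDs (hypothetical MSB-first packing)."""
    if sum(dimension_id_lengths) != layer_id_bits:
        raise ValueError("dimension ID lengths must sum to the layer ID bit count")
    if not 0 <= layer_id < (1 << layer_id_bits):
        raise ValueError("layer ID does not fit in the given number of bits")
    dimension_ids = []
    remaining = layer_id_bits
    for length in dimension_id_lengths:
        remaining -= length                 # bits still to the right of this field
        mask = (1 << length) - 1            # keep only this field's bits
        dimension_ids.append((layer_id >> remaining) & mask)
    return dimension_ids


# Example: three hypothetical dimensions using 3, 2, and 1 bits.
print(split_layer_id(0b101101, [3, 2, 1]))
```

Rejecting length lists that do not sum to the layer ID width keeps the sketch aligned with the stated constraint rather than silently truncating bits.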
- the dimension type may include at least one of multi-view scalability, depth scalability, spatial scalability, and quality scalability.
- the method may further include receiving flag information indicating whether to indicate the dimension ID by dividing the number of bits of the layer ID, wherein the dimension ID may be received when the flag information has a value of zero.
- an apparatus for decoding a bitstream including a plurality of layers may include an information grasping unit that parses information about a video parameter set, including a dimension type for identifying scalability of the plurality of layers and information about the length of a dimension ID for identifying the layer to which the dimension type is applied, to determine scalability information, and an upper layer decoding unit that reconstructs the image of the upper layer by using the scalability information.
- a method and apparatus for describing scalability information in a bitstream is provided.
- a method and apparatus for representing scalability information of various types of bitstreams in a flexible manner is provided.
- a method for identifying a scalability layer in a bitstream and an apparatus using the same are provided.
- FIG. 1 is a block diagram schematically illustrating a video encoding apparatus supporting scalability according to an embodiment of the present invention.
- FIG. 2 is a block diagram schematically illustrating a video decoding apparatus supporting scalability according to an embodiment of the present invention.
- FIG. 3 is a conceptual diagram schematically illustrating an embodiment of a scalable video coding structure using multiple layers to which the present invention can be applied.
- FIG. 4 is a diagram illustrating an example of a framework for multi-view coding.
- FIG. 5 is a diagram illustrating an example of a framework for coding 3D video using a depth map.
- FIG. 6 is a diagram illustrating an example of a framework for spatial scalability coding.
- FIG. 7 is a diagram illustrating an example of a framework for image quality scalability coding.
- FIG. 8 is a control block diagram illustrating a video encoding apparatus according to an embodiment of the present invention.
- FIG. 9 is a control block diagram illustrating a video decoding apparatus according to an embodiment of the present invention.
- FIG. 10 is a control flowchart illustrating a method of encoding image information according to the present invention.
- FIG. 11 is a control flowchart illustrating a decoding method of image information according to the present invention.
- each of the components in the drawings described in the present invention is shown independently for convenience of describing the different characteristic functions in the video encoding apparatus / decoding apparatus; this does not mean that each component is implemented as separate hardware or separate software.
- two or more of each configuration may be combined to form one configuration, or one configuration may be divided into a plurality of configurations.
- Embodiments in which each configuration is integrated and / or separated are also included in the scope of the present invention without departing from the spirit of the present invention.
- input signals may be processed for each layer.
- the input signals may differ in at least one of resolution, frame rate, bit depth, color format, and aspect ratio.
- scalable coding includes scalable encoding and scalable decoding.
- prediction between layers is performed by using differences between layers, that is, based on scalability, thereby reducing overlapping transmission / processing of information and increasing compression efficiency.
- FIG. 1 is a block diagram schematically illustrating a video encoding apparatus supporting scalability according to an embodiment of the present invention.
- the encoding apparatus 100 includes an encoder 105 for layer 1 and an encoder 135 for layer 0.
- Layer 0 may be a base layer, a reference layer, or a lower layer
- layer 1 may be an enhancement layer, a current layer, or an upper layer.
- the encoding unit 105 of the layer 1 includes a prediction unit 110, a transform / quantization unit 115, a filtering unit 120, a decoded picture buffer (DPB) 125, an entropy coding unit 130, and a MUX (Multiplexer, 165).
- the encoding unit 135 of the layer 0 includes a prediction unit 140, a transform / quantization unit 145, a filtering unit 150, a DPB 155, and an entropy coding unit 160.
- the prediction units 110 and 140 may perform inter prediction and intra prediction on the input image.
- the prediction units 110 and 140 may perform prediction in predetermined processing units.
- the performing unit of prediction may be a coding unit (CU), a prediction unit (PU), or a transform unit (TU).
- the prediction units 110 and 140 may determine whether to apply inter prediction or intra prediction in CU units, determine the prediction mode in PU units, and perform prediction in PU units or TU units. The prediction performed includes generation of a prediction block and generation of a residual block (residual signal).
- a prediction block may be generated by performing prediction based on information of at least one picture of a previous picture and / or a subsequent picture of the current picture.
- prediction blocks may be generated by performing prediction based on pixel information in a current picture.
- inter prediction methods include a skip mode, a merge mode, a motion vector predictor (MVP) mode, and the like.
- a reference picture may be selected with respect to the current PU that is a prediction target, and a reference block corresponding to the current PU may be selected within the reference picture.
- the prediction units 110 and 140 may generate a prediction block based on the reference block.
- the prediction block may be generated in integer sample units or in sub-integer (fractional) sample units.
- the motion vector may also be expressed in integer-pixel units or in sub-integer pixel units.
- motion information that is, information such as an index of a reference picture, a motion vector, and a residual signal
- residuals may not be generated, transformed, quantized, or transmitted.
- the prediction mode may have 33 directional prediction modes and at least two non-directional modes.
- the non-directional modes may include a DC prediction mode and a planar mode.
- a prediction block may be generated after applying a filter to a reference sample.
- the PU may be a block of various sizes and types. For example, in the case of inter prediction, the PU may be a 2N×2N block, a 2N×N block, an N×2N block, an N×N block (N is an integer), or the like.
- in the case of intra prediction, the PU may be a 2N×2N block or an N×N block (where N is an integer).
- the PU of the N×N block size may be set to apply only in a specific case.
- the N×N block size PU may be used only for the minimum-size CU or only for intra prediction.
- PUs such as N×mN blocks, mN×N blocks, 2N×mN blocks, or mN×2N blocks (m < 1) may be further defined and used.
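To make the symmetric partition shapes concrete, the following sketch lists the PU dimensions for a square CU of side 2N. The function name `pu_partitions`, the mode strings, and the width-by-height reading of "2N×N" are assumptions for illustration; the fractional partitions mentioned last are not modeled.

```python
def pu_partitions(cu_size, mode):
    """Return the (width, height) of each PU for a square CU of side cu_size = 2N."""
    if cu_size <= 0 or cu_size % 2:
        raise ValueError("cu_size must be a positive even number (2N)")
    n = cu_size // 2
    shapes = {
        "2Nx2N": [(cu_size, cu_size)],      # one PU covering the whole CU
        "2NxN": [(cu_size, n)] * 2,         # two PUs stacked vertically
        "Nx2N": [(n, cu_size)] * 2,         # two PUs side by side
        "NxN": [(n, n)] * 4,                # four square PUs
    }
    return shapes[mode]


# Example: a 16x16 CU split into two 16x8 PUs.
print(pu_partitions(16, "2NxN"))
```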
- the prediction unit 110 may perform prediction for layer 1 using the information of the layer 0.
- a method of predicting information of a current layer using information of another layer is referred to as inter-layer prediction for convenience of description.
- Information of the current layer that is predicted using information of another layer may include texture, motion information, unit information, predetermined parameters (eg, filtering parameters, etc.).
- information of another layer used for prediction for the current layer may include texture, motion information, unit information, and predetermined parameters (eg, filtering parameters).
- inter-layer motion prediction is also referred to as inter-layer inter prediction.
- prediction of a current block of layer 1 may be performed using motion information of layer 0 (reference layer or base layer).
- motion information of a reference layer may be scaled.
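One plausible way to scale reference-layer motion information is by the ratio of the picture sizes of the two layers. The sketch below assumes exactly that; the function name `scale_motion_vector`, the tuple representation, and the rounding rule are illustrative assumptions, since the text does not specify how the scaling is done.

```python
from fractions import Fraction


def scale_motion_vector(mv, ref_size, cur_size):
    """Scale a reference-layer motion vector (mvx, mvy) to the current layer's resolution.

    ref_size and cur_size are (width, height) of the reference and current layers.
    """
    ref_w, ref_h = ref_size
    cur_w, cur_h = cur_size
    mvx, mvy = mv
    # Exact rational scaling avoids floating-point error; round() returns an int.
    return (round(mvx * Fraction(cur_w, ref_w)),
            round(mvy * Fraction(cur_h, ref_h)))


# Example: base layer 960x540, enhancement layer 1920x1080.
print(scale_motion_vector((4, -6), (960, 540), (1920, 1080)))
```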
- inter-layer texture prediction may also be referred to as inter-layer intra prediction or intra base layer (BL) prediction.
- Inter layer texture prediction may be applied when a reference block in a reference layer is reconstructed by intra prediction.
- the texture of the reference block in the reference layer may be used as a prediction value for the current block of the enhancement layer.
- the texture of the reference block may be scaled by upsampling.
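The upsampling step can be pictured with a minimal nearest-neighbor sketch, in which every reference-layer sample is repeated to match the enhancement-layer size. This is an illustrative simplification: the function name `upsample_nearest` and the integer scaling factor are assumptions, and the text does not state which interpolation method is used.

```python
def upsample_nearest(block, factor):
    """Upsample a 2D list of samples by an integer factor using sample repetition."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for row in block:
        # Repeat each sample horizontally, then repeat the widened row vertically.
        wide = [sample for sample in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Example: a 2x2 reference block scaled to 4x4.
for row in upsample_nearest([[1, 2], [3, 4]], 2):
    print(row)
```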
- inter-layer unit parameter prediction derives unit (CU, PU, and / or TU) information of a base layer and uses it as unit information of an enhancement layer, or determines unit information of the enhancement layer based on unit information of the base layer.
- the unit information may include information at each unit level.
- information about a partition (CU, PU and / or TU) may include information on transform, information on prediction, and information on coding.
- information on a PU partition and information on prediction (eg, motion information, information on a prediction mode, etc.) may be included.
- the information about the TU may include information about a TU partition, information on transform (transform coefficient, transform method, etc.).
- the unit information may include only the partition information of the processing unit (eg, CU, PU, TU, etc.).
- inter-layer parameter prediction may derive a parameter used in the base layer to reuse it in the enhancement layer or predict a parameter for the enhancement layer based on the parameter used in the base layer.
- interlayer texture prediction, interlayer motion prediction, interlayer unit information prediction, and interlayer parameter prediction have been described as examples of interlayer prediction. However, the interlayer prediction applicable to the present invention is not limited thereto.
- as interlayer prediction, the prediction unit 110 may use interlayer residual prediction, which predicts the residual of the current layer using the residual information of another layer, and may perform prediction on the current block in the current layer based on the predicted residual.
- as interlayer prediction, the prediction unit 110 may perform inter-layer difference prediction, which predicts the current block in the current layer using a difference image between the reconstructed picture of the current layer and the resampled picture of another layer.
- the prediction unit 110 may use interlayer syntax prediction that predicts or generates a texture of a current block using syntax information of another layer as interlayer prediction.
- the syntax information of the reference layer used for prediction of the current block may be information about an intra prediction mode, motion information, and the like.
- inter-layer syntax prediction may be performed by referring to the intra prediction mode from a block of the reference layer to which an intra prediction mode is applied, and by referring to motion information (MV) from a block to which an inter prediction mode is applied.
- even when the reference layer is a P slice or a B slice, the reference block in the slice may be a block to which an intra prediction mode is applied.
- inter-layer prediction may be performed to generate / predict a texture for the current block by using an intra prediction mode of the reference block among syntax information of the reference layer.
- the prediction information of the layer 0 may be used to predict the current block while additionally using unit information or filtering parameter information of the corresponding layer 0 or the corresponding block.
- This combination of inter-layer prediction methods can also be applied to the predictions described below in this specification.
- the transform / quantization units 115 and 145 may perform transform on the residual block in transform block units to generate transform coefficients and quantize the transform coefficients.
- the transform block is a block of samples and is a block to which the same transform is applied.
- the transform block can be a transform unit (TU) and can have a quad tree structure.
- the transform / quantization units 115 and 145 may generate a 2D array of transform coefficients by performing a transform according to the prediction mode applied to the residual block and the size of the block. For example, if intra prediction is applied to a residual block and the block is a 4x4 residual array, the residual block is transformed using a discrete sine transform (DST); otherwise, the residual block may be transformed using a discrete cosine transform (DCT).
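The selection rule in this example can be written as a small decision function. The sketch below only returns the name of the transform to use and does not implement the transforms themselves; the function name `select_transform` and the string labels are assumptions for illustration.

```python
def select_transform(prediction_mode, width, height):
    """Return 'DST' for 4x4 intra-predicted residual blocks and 'DCT' otherwise."""
    if prediction_mode == "intra" and (width, height) == (4, 4):
        return "DST"
    return "DCT"


# Example: only the 4x4 intra case selects the DST.
print(select_transform("intra", 4, 4))
print(select_transform("intra", 8, 8))
print(select_transform("inter", 4, 4))
```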
- DST discrete sine transform
- DCT discrete cosine transform
- the transform / quantization unit 115 and 145 may quantize the transform coefficients to generate quantized transform coefficients.
- the transform / quantization units 115 and 145 may transfer the quantized transform coefficients to the entropy coding units 130 and 160.
- the transform / quantization units 115 and 145 may rearrange the two-dimensional array of quantized transform coefficients into a one-dimensional array according to a predetermined scan order and transfer it to the entropy coding units 130 and 160.
- the transform / quantizers 115 and 145 may transfer the reconstructed block generated based on the residual and the predictive block to the filtering units 120 and 150 for inter prediction.
- the transform / quantization units 115 and 145 may skip transform and perform quantization only or omit both transform and quantization as necessary.
- the transform / quantization units 115 and 145 may omit the transform for a block having a specific prediction method, a block of a specific size, or a block of a specific size to which a specific prediction method is applied.
- the entropy coding units 130 and 160 may perform entropy encoding on the quantized transform coefficients.
- Entropy encoding may use, for example, an encoding method such as Exponential Golomb, Context-Adaptive Binary Arithmetic Coding (CABAC), or the like.
- CABAC Context-Adaptive Binary Arithmetic Coding
- the filtering units 120 and 150 may apply a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) to the reconstructed picture.
- ALF adaptive loop filter
- SAO sample adaptive offset
- the deblocking filter may remove distortion generated at the boundary between blocks in the reconstructed picture.
- the adaptive loop filter may perform filtering based on a value obtained by comparing the reconstructed image with the original image after the block is filtered through the deblocking filter.
- the SAO restores the offset difference from the original image on a pixel-by-pixel basis to the residual block to which the deblocking filter is applied, and is applied in the form of a band offset and an edge offset.
- the filtering units 120 and 150 need not apply all of the deblocking filter, ALF, and SAO; they may apply only the deblocking filter, only the deblocking filter and the ALF, or only the deblocking filter and the SAO.
- the DPBs 125 and 155 may receive the reconstructed block or the reconstructed picture from the filtering units 120 and 150 and store the received reconstruction picture.
- the DPBs 125 and 155 may provide a reconstructed block or picture to the predictors 110 and 140 that perform inter prediction.
- Information output from the entropy coding unit 160 of layer 0 and information output from the entropy coding unit 130 of layer 1 may be multiplexed by the MUX 165 and output as a bitstream.
- although the encoding unit 105 of layer 1 has been described as including the MUX 165, the MUX may be a device or module separate from the encoding unit 105 of layer 1 and the encoding unit 135 of layer 0.
- the encoding device of FIG. 1 may be implemented as an electronic device capable of capturing and encoding an image, including a camera.
- the encoding device may be implemented in or included in a personal terminal such as a television, computer system, portable telephone or tablet PC, or the like.
- FIG. 2 is a block diagram schematically illustrating a video decoding apparatus supporting scalability according to an embodiment of the present invention.
- the decoding apparatus 200 includes a decoder 210 of layer 1 and a decoder 250 of layer 0.
- Layer 0 may be a base layer, a reference layer, or a lower layer
- layer 1 may be an enhancement layer, a current layer, or an upper layer.
- the decoding unit 210 of the layer 1 may include an entropy decoding unit 215, a reordering unit 220, an inverse quantization unit 225, an inverse transform unit 230, a prediction unit 235, a filtering unit 240, and a memory 245.
- the decoding unit 250 of the layer 0 may include an entropy decoding unit 255, a reordering unit 260, an inverse quantization unit 265, an inverse transform unit 270, a prediction unit 275, a filtering unit 280, and a memory 285.
- the DEMUX 205 may demultiplex the information for each layer and deliver the information to the decoding device for each layer.
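Conceptually, demultiplexing amounts to routing each unit of the bitstream to the decoder of the layer it belongs to. The sketch below assumes a hypothetical packet representation of `(layer_id, payload)` tuples; the function name `demultiplex` and this packet format are illustrative and are not taken from the text.

```python
from collections import defaultdict


def demultiplex(packets):
    """Group (layer_id, payload) packets by layer, preserving arrival order."""
    per_layer = defaultdict(list)
    for layer_id, payload in packets:
        per_layer[layer_id].append(payload)
    return dict(per_layer)


# Example: interleaved packets for layer 0 (base) and layer 1 (enhancement).
stream = [(0, b"base-0"), (1, b"enh-0"), (0, b"base-1"), (1, b"enh-1")]
print(demultiplex(stream))
```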
- the entropy decoding units 215 and 255 may perform entropy decoding corresponding to the entropy coding scheme used in the encoding apparatus. For example, when CABAC is used in the encoding apparatus, the entropy decoding units 215 and 255 may also perform entropy decoding using CABAC.
- Information for generating a prediction block, among the information decoded by the entropy decoding units 215 and 255, is provided to the prediction units 235 and 275, and the residual values on which entropy decoding has been performed by the entropy decoding units 215 and 255, that is, the quantized transform coefficients, may be input to the reordering units 220 and 260.
- the reordering units 220 and 260 may rearrange the information of the bitstreams entropy decoded by the entropy decoding units 215 and 255, that is, the quantized transform coefficients, based on the reordering method in the encoding apparatus.
- the reordering units 220 and 260 may rearrange the quantized transform coefficients of the one-dimensional array into the coefficients of the two-dimensional array.
- the reordering units 220 and 260 may generate a two-dimensional array of coefficients (quantized transform coefficients) by performing scanning based on the prediction mode applied to the current block (transform block) and / or the size of the transform block.
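The reordering step can be sketched as the inverse of the encoder-side scan: the same scan order that flattened the 2D coefficients is used to place them back. The anti-diagonal order used here is only an illustrative choice, and all function names are hypothetical; the text says only that the scan depends on the prediction mode and / or the transform block size.

```python
def diagonal_scan_order(size):
    """List (row, col) positions of a size x size block along anti-diagonals."""
    positions = ((r, c) for r in range(size) for c in range(size))
    return sorted(positions, key=lambda pos: (pos[0] + pos[1], pos[0]))


def scan_to_1d(block, order):
    """Encoder side: flatten a 2D coefficient block following the scan order."""
    return [block[r][c] for r, c in order]


def inverse_scan_to_2d(coeffs, order, size):
    """Decoder side: rebuild the 2D coefficient block from the 1D array."""
    block = [[0] * size for _ in range(size)]
    for value, (r, c) in zip(coeffs, order):
        block[r][c] = value
    return block


# Round trip on a 3x3 block of quantized coefficients.
original = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
order = diagonal_scan_order(3)
flat = scan_to_1d(original, order)
print(flat)
print(inverse_scan_to_2d(flat, order, 3) == original)
```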
- the inverse quantizers 225 and 265 may generate transform coefficients by performing inverse quantization based on the quantization parameter provided by the encoding apparatus and the coefficient values of the rearranged block.
- the inverse transform units 230 and 270 may perform inverse transform on the transform performed by the transform unit of the encoding apparatus.
- the inverse transform units 230 and 270 may perform inverse DCT and / or inverse DST on a discrete cosine transform (DCT) and a discrete sine transform (DST) performed by an encoding apparatus.
- DCT discrete cosine transform
- DST discrete sine transform
- the DCT and / or DST in the encoding apparatus may be selectively performed according to a plurality of pieces of information, such as the prediction method, the size of the current block, and the prediction direction, and the inverse transform units 230 and 270 of the decoding apparatus may perform the inverse transform based on the transform information used in the encoding apparatus.
- the inverse transform units 230 and 270 may apply inverse DCT and inverse DST according to a prediction mode / block size.
- the inverse transformers 230 and 270 may apply an inverse DST to a 4x4 luma block to which intra prediction is applied.
- the inverse transform units 230 and 270 may fixedly use a specific inverse transform method regardless of the prediction mode / block size.
- the inverse transform units 230 and 270 may apply only inverse DST to all transform blocks.
- the inverse transform units 230 and 270 may apply only inverse DCT to all transform blocks.
- the inverse transform units 230 and 270 may generate a residual or residual block by inversely transforming the transform coefficients or the block of the transform coefficients.
- the inverse transform units 230 and 270 may also skip the transform as needed or according to the manner of encoding in the encoding apparatus. For example, the inverse transform units 230 and 270 may omit the transform for a block having a specific prediction method, a block of a specific size, or a block of a specific size to which a specific prediction method is applied.
- the prediction units 235 and 275 may generate a prediction block for the current block based on the prediction block generation related information transmitted from the entropy decoding units 215 and 255 and the previously decoded block and / or picture information provided by the memories 245 and 285.
- the prediction units 235 and 275 may perform intra prediction on the current block based on pixel information in the current picture.
- the prediction units 235 and 275 may perform inter prediction on the current block based on information included in at least one of a previous picture or a subsequent picture of the current picture. Some or all of the motion information required for inter prediction may be derived in accordance with the information received from the encoding apparatus.
- the prediction block may be a reconstruction block.
- the prediction unit 235 of layer 1 may perform inter prediction or intra prediction using only information in layer 1, or may perform inter layer prediction using information of another layer (layer 0).
- the prediction unit 235 of the layer 1 may perform prediction on the current block by using one of the motion information of the layer 1, the texture information of the layer 1, the unit information of the layer 1, and the parameter information of the layer 1.
- the predictor 235 of the layer 1 may receive motion information of the layer 0 from the predictor 275 of the layer 0 to perform motion prediction.
- Inter-layer motion prediction is also called inter-layer inter prediction.
- in inter-layer motion prediction, prediction of a current block of the current layer (enhancement layer) may be performed using motion information of the reference layer (base layer).
- the prediction unit 235 may scale and use motion information of the reference layer when necessary.
- the predictor 235 of the layer 1 may receive texture information of the layer 0 from the predictor 275 of the layer 0 to perform texture prediction.
- Texture prediction may also be referred to as inter-layer intra prediction or intra base layer (BL) prediction. Texture prediction may be applied when the reference block of the reference layer is reconstructed by intra prediction. Alternatively, texture prediction may reference the reference layer by allocating a reference index to it.
- the texture of the reference block in the reference layer may be used as a prediction value for the current block of the enhancement layer.
- the texture of the reference block may be scaled by upsampling.
- the predictor 235 of the layer 1 may receive unit parameter information of the layer 0 from the predictor 275 of the layer 0 to perform unit parameter prediction.
- in unit parameter prediction, unit (CU, PU, and / or TU) information of the base layer may be used as unit information of the enhancement layer, or unit information of the enhancement layer may be determined based on unit information of the base layer.
- the predictor 235 of the layer 1 may perform parameter prediction by receiving parameter information regarding the filtering of the layer 0 from the predictor 275 of the layer 0.
- in parameter prediction, the parameters used in the base layer can be derived and reused in the enhancement layer, or the parameters for the enhancement layer can be predicted based on the parameters used in the base layer.
- the prediction information of the layer 0 may be used to predict the current block while additionally using unit information or filtering parameter information of the corresponding layer 0 or the corresponding block.
- This combination of inter-layer prediction methods can also be applied to the predictions described below in this specification.
- the adders 290 and 295 may generate a reconstruction block using the prediction blocks generated by the predictors 235 and 275 and the residual blocks generated by the inverse transformers 230 and 270.
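The addition performed by the adders can be sketched as a sample-wise sum of the prediction block and the residual block. Clipping the result to the valid sample range for a given bit depth is an assumption added here for illustration, as are the function name `reconstruct_block` and the default bit depth of 8.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add the residual to the prediction sample by sample and clip to the sample range."""
    max_value = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(prediction, residual)
    ]


# Example: 2x2 block; the second and third samples are clipped to 255 and 0.
pred = [[100, 250], [5, 128]]
resid = [[10, 10], [-10, 0]]
print(reconstruct_block(pred, resid))
```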
- the adders 290 and 295 can be viewed as separate units (reconstruction block generation units) for generating the reconstruction block.
- Blocks and / or pictures reconstructed by the adders 290 and 295 may be provided to the filtering units 240 and 280.
- the filtering unit 240 of the layer 1 may also filter the reconstructed picture by using parameter information transmitted from the prediction unit 235 of the layer 1 and / or the filtering unit 280 of the layer 0.
- the filtering unit 240 may apply filtering to layer 1 or between layers using the parameters predicted from the parameters of the filtering applied in the layer 0.
- the memories 245 and 285 may store the reconstructed picture or block to use as a reference picture or reference block.
- the memories 245 and 285 may output the stored reconstructed picture through a predetermined output unit (not shown) or a display (not shown).
- the decoding apparatus may also be configured so that reordering, inverse quantization, and inverse transformation are performed in order in a single module, the inverse quantization / inverse transformation unit.
- the prediction unit of layer 1 may also be regarded as including an interlayer prediction unit that performs prediction using information of another layer (layer 0) and an inter / intra predictor that performs prediction without using information of another layer (layer 0).
- the decoding apparatus of FIG. 2 may be implemented as various electronic devices capable of playing back, or playing back and displaying an image.
- the decoding device may be implemented in or included in a set-top box, a television, a computer system, a portable telephone, a personal terminal such as a tablet PC, or the like.
- In encoding and decoding of a video supporting a plurality of layers in a bitstream, that is, scalable coding, there is a strong correlation between the layers, so performing prediction by using this correlation can remove duplicate elements and improve the encoding performance of the image. Performing prediction of the current layer, which is the target of prediction, using information of another layer is referred to as inter-layer prediction in the following.
- Scalable video coding means scalable video encoding from an encoding point of view and scalable video decoding from a decoding point of view.
- the layers may differ from each other in at least one of resolution, frame rate, and color format, and upsampling or downsampling of a layer may be performed to adjust the resolution when inter-layer prediction is performed.
- FIG. 3 is a conceptual diagram schematically illustrating an embodiment of a scalable video coding structure using multiple layers to which the present invention can be applied.
- in FIG. 3, a GOP (Group of Pictures) represents a picture group, that is, a group of pictures.
- a transmission medium is required to transmit image data, and its performance varies from one transmission medium to another depending on the network environment.
- a scalable video coding method may be provided for application to such various transmission media or network environments.
- the scalable video coding method is a coding method that improves encoding / decoding performance by removing redundancy between layers by using texture information, motion information, and residual signals between layers.
- the scalable video coding method may provide various scalability in terms of spatial, temporal, and image quality according to ambient conditions such as a transmission bit rate, a transmission error rate, and a system resource.
- Scalable video coding may be performed using a multi-layer structure to provide a bitstream applicable to various network situations.
- the scalable video coding structure may include a base layer that compresses and processes image data by using a general image encoding method, and may include an enhancement layer that compresses and processes the image data by using the encoding information of the base layer and a general image encoding method together.
- a layer means a set of images and bitstreams divided based on spatial (eg, image size), temporal (eg, coding order, image output order, frame rate), image quality, complexity, and the like.
- the base layer may also be referred to as a reference layer.
- the enhancement layer may mean an enhancement layer.
- the plurality of layers may have a dependency between each other.
- the base layer may be defined as standard definition (SD), a frame rate of 15 Hz, and a bit rate of 1 Mbps.
- the first enhancement layer may be defined as high definition (HD), a frame rate of 30 Hz, and a bit rate of 3.9 Mbps.
- the second enhancement layer may be defined as 4K ultra high definition (4K-UHD), a frame rate of 60 Hz, and a bit rate of 27.2 Mbps.
- the format, frame rate, bit rate, etc. are exemplary and may be determined differently as necessary.
- the number of layers used is not limited to this embodiment and may be determined differently according to a situation.
- the frame rate of the first enhancement layer HD may be reduced and transmitted at 15 Hz or less.
- the scalable video coding method can provide temporal, spatial and image quality scalability by the method described above in the embodiment of FIG. 3.
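- As a rough illustration of how the layered structure of FIG. 3 could be used for adaptation, the sketch below picks the highest layer that fits a given bandwidth. The function name `select_layer` is hypothetical, and the sketch assumes that each listed bit rate is the total rate needed to decode up to that layer, which the description does not state.

```python
# (layer name, format, frame rate in Hz, bit rate in Mbps), taken from the FIG. 3 example.
LAYERS = [
    ("base layer", "SD", 15, 1.0),
    ("first enhancement layer", "HD", 30, 3.9),
    ("second enhancement layer", "4K", 60, 27.2),
]


def select_layer(bandwidth_mbps):
    """Return the highest layer whose bit rate fits the bandwidth, or None.

    Assumption: the bit rate of a layer already covers the layers it depends on.
    """
    chosen = None
    for layer in LAYERS:  # LAYERS is ordered from lowest to highest layer
        if layer[3] <= bandwidth_mbps:
            chosen = layer
    return chosen


if __name__ == "__main__":
    for bandwidth in (0.5, 4.0, 30.0):
        print(bandwidth, select_layer(bandwidth))
```

- A real extractor would also consider temporal sub-layers, for example lowering the frame rate of the HD layer when the bandwidth falls between two operating points.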
- a bitstream including a plurality of layers is composed of Network Abstraction Layer (NAL) units that facilitate the adaptive transmission of video through a packet-switching network.
- the relationship between the plurality of views is similar to the relationship between spatial layers in video supporting a plurality of layers.
- the scalability information of the bitstream is very important to effectively and efficiently convert the bitstream at all nodes in the content delivery path.
- Table 1 shows an example of a NAL unit header.
- forbidden_zero_bit has a value of 0.
- nal_unit_type indicates the type of the corresponding nal unit.
- nuh_reserved_zero_6bits is a field reserved for indicating information on other layers, that is, scalability, in the future, and may include information on a layer ID for identifying the layer.
- temporal_id, which has a length of 3 bits, indicates a temporal layer of the video bitstream.
- the temporal layer refers to a layer of a temporally scalable bitstream composed of a video coding layer (VCL) NAL unit, and the temporal layer has a specific temporal_id value.
- the NAL unit header structure shown in Table 1 may also be used for coding a bitstream supporting a plurality of scalability (eg, multi-view, 3D extension).
- information about scalability in the NAL unit header, for example, information such as a layer ID, may be transmitted through the 6-bit nuh_reserved_zero_6bits of Table 1.
- the types of scalability and the information that maps them to the layer ID may be included in the video parameter set, or in a video parameter set extension for bitstreams that support scalability.
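- The NAL unit header fields described above can be unpacked with a few shifts and masks. The sketch below is illustrative only: Table 1 is not reproduced here, so the field order (as listed in the text) and the 6-bit width of nal_unit_type are assumptions borrowed from the HEVC header layout, and `parse_nal_unit_header` is a hypothetical helper name.

```python
def parse_nal_unit_header(header):
    """Split a 2-byte NAL unit header into its four fields.

    Assumed layout (16 bits, most significant bit first):
    forbidden_zero_bit (1) | nal_unit_type (6) | nuh_reserved_zero_6bits (6) | temporal_id (3)
    """
    if len(header) < 2:
        raise ValueError("a NAL unit header needs at least 2 bytes")
    value = int.from_bytes(header[:2], "big")
    return {
        "forbidden_zero_bit": (value >> 15) & 0x1,
        "nal_unit_type": (value >> 9) & 0x3F,            # 6-bit width is an assumption
        "nuh_reserved_zero_6bits": (value >> 3) & 0x3F,  # may carry the layer ID
        "temporal_id": value & 0x7,
    }


# Example bytes 0x40 0x49 = 0 100000 001001 001 in binary.
fields = parse_nal_unit_header(bytes([0x40, 0x49]))
print(fields)
```

- In this example the 6-bit field holds the bit string "001001", which a decoder could then interpret as a layer ID.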
- the present invention relates to a method for effectively describing scalability information of an image in a bitstream supporting a plurality of layers, signaling the same, and an apparatus for implementing the same.
- Table 2 shows an example for the current video parameter set.
- video_parameter_set_id is a syntax element for identifying a corresponding video parameter set referred to by other syntax elements.
- vps_temporal_id_nesting_flag indicates whether inter prediction is further limited for a coded video sequence that references a video parameter set when vps_max_sub_layers_minus1 is greater than zero. If vps_max_sub_layers_minus1 is 0, vps_temporal_id_nesting_flag should be 1. The vps_temporal_id_nesting_flag syntax element is used for up-switching of temporal sublayers.
- reserved_zero_2bits may be 3 in the bitstream, and values other than 3 may be put to a different use later. In this case, the decoding unit may ignore the reserved_zero_2bits value.
- max_num_layers_minus1 plus 1 represents the maximum number of layers present in the coded video sequence referring to the video parameter set.
- a value obtained by adding 1 to vps_max_sub_layers_minus1 indicates the maximum number of temporal sublayers that may exist in a coded video sequence.
- vps_max_sub_layers_minus1 may have a value between 0 and 6.
- next_essential_info_byte_offset indicates the byte offset from the start of the NAL unit to the next set of fixed length coding information in the video parameter set NAL unit.
- Video parameter set information for layers or views other than the base layer or base view starts at a byte-aligned position of the video parameter set NAL unit, with fixed length coded information.
- the byte offset specified by next_essential_info_byte_offset can facilitate access to essential information in the video parameter set NAL unit without the need for entropy decoding.
- The information located through next_essential_info_byte_offset is essential information for session negotiation and / or capability exchange.
- vps_max_dec_pic_buffering [i] represents the maximum size of the decoded picture buffer required for the coded video sequence, in units of picture storage buffers.
- vps_max_num_reorder_pics [i] represents the maximum allowable number of pictures that can precede any picture of the coded video sequence in decoding order and can follow in output order.
- vps_max_latency_increase [i], when non-zero, is used to calculate the maximum number of latency pictures, which indicates the maximum number of pictures that can precede any picture in the coded video sequence in decoding order and follow it in output order.
- num_hrd_parameters represents the number of hrd_parameters () syntax elements present in the video parameter set, and num_hrd_parameters may have a value equal to or smaller than 1 in the bitstream. If the value is not equal to or smaller than 1, the decoding unit may allow other values in the range of 1 to 1024 indicated by the syntax element as the num_hrd_parameters value.
- bit_equal_to_one has a value of 1.
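- vps_extension_flag equal to 0 indicates that no vps_extension_data_flag syntax element is present in the video parameter set.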
- the vps_extension_data_flag is a value indicating whether data for the layer extension version exists and may have any value.
- vps_extension_flag and vps_extension_data_flag may be 1, and various information about the layers may be included and signaled in the bitstream, for example, in a video parameter set extension.
- the information of the layer that may be included in the video parameter set extension may include all scalability that the layer may have, that is, information about the dimension, and the information about the dimension may be set using a table.
- the information signaled for the layer may include the number of dimensions of a layer, an active range of each dimension of each layer, information on layer identification, and a list of direct reference layers.
- the total number of bits for indicating the dimensions for each layer must match the number of bits allocated for signaling the layer ID signaled in the NAL unit header of Table 1. If the layer ID signaled in the NAL unit header is 6 bits, the total number of bits describing the dimensions applicable to each layer is 6.
- Table 3 below shows a video parameter set extension according to an embodiment of the present invention.
- vps_extension_byte_alignment_reserved_one_bit is 1.
- a value obtained by adding 1 to num_dimensions_minus1 [i] indicates the number of dimension types and dimension IDs signaled in each layer.
- dimension_type [i] represents the j-th scalability dimension type of the i-th layer, as shown in Table 4 below.
- a dimension type means a type of scalability, such as spatial scalability or quality scalability, and a dimension ID refers to an index of a layer that a specific dimension type may have, that is, information for identifying a layer to which a specific dimension type is applied.
- the dimension type (dimension_type) may be mapped to the dimension ID (dimension_id) corresponding to the scalability ID of the layer.
- scalability types may include multi-view scalability, depth scalability, spatial scalability, and quality scalability.
- When dimension_type signaled for the i-th layer is 0, multi-view scalability is applied to the layer; if dimension_type is 1, depth scalability is applied; if dimension_type is 2, spatial scalability is applied; and if dimension_type is 3, image quality scalability is applied. According to Table 4, one layer may have up to four types of scalability.
- the dimension IDs shown in Table 4 are examples of scalability that can be supported in the bitstream; further dimension IDs may be added, and the bitstream may support only some of the four dimensions described in Table 4. Values 4 through 15 of dimension_type may be used to describe additional types of scalability.
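- The mapping between dimension type and dimension ID can be held in a small lookup table shared by the encoder and the decoder. The sketch below reconstructs such a table from the values given in this description (0 to 3, with 4 to 15 left for additional types); the names `DIMENSION_TABLE` and `describe_dimension` are hypothetical and the actual layout of Table 4 may differ.

```python
# dimension_type -> (scalability type, dimension ID), as described for Table 4.
DIMENSION_TABLE = {
    0: ("multi-view scalability", "view order idx"),
    1: ("depth scalability", "depth order idx"),
    2: ("spatial scalability", "dependency ID"),
    3: ("quality scalability", "quality ID"),
}


def describe_dimension(dimension_type):
    """Return (scalability type, dimension ID name) for a dimension_type value."""
    if not 0 <= dimension_type <= 15:
        raise ValueError("dimension_type is expected to be in the range 0 to 15")
    # Values 4 to 15 are left open for additional scalability types.
    return DIMENSION_TABLE.get(dimension_type, ("additional scalability type", None))


if __name__ == "__main__":
    for value in (0, 2, 3, 7):
        print(value, describe_dimension(value))
```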
- when the dimension ID is a view order idx capable of identifying multi-view scalability, this means that the layer is coded with a multi-view coding structure.
- FIG. 4 is a diagram illustrating an example of a framework for multi-view coding.
- image streams may be generated by a total of five cameras for multi-view coding, and the stream of the image generated by camera 1 becomes the base view. Images generated by the remaining cameras 2 to 5 may be coded with reference to other view images, including that of camera 1.
- the video stream generated by the camera 3 may be another view (for example, view 2) to be predicted by referring to the base view and coded.
- the video stream generated by the camera 2 may be another view (for example, view 3) that is predicted by referring to the base view and the view 2 and coded.
- the video stream generated by the camera 5 may be another view (for example, view 4) to be predicted by referring to the base view and coded.
- the video stream generated by the camera 4 may be another view (for example, view 5) that is predicted by referring to the base view and the view 4 and coded.
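- The reference relationships among the five views of FIG. 4 form a dependency graph, and any valid coding order must place a view after the views it refers to. As a sketch, the standard-library `graphlib` module (Python 3.9 or later) can derive one such order; the dictionary `REFERENCES` and the function `coding_order` are hypothetical names, and the order printed is only one of several valid ones.

```python
from graphlib import TopologicalSorter

# view -> set of views it refers to, following the FIG. 4 example.
REFERENCES = {
    "base view": set(),                  # camera 1
    "view 2": {"base view"},             # camera 3
    "view 3": {"base view", "view 2"},   # camera 2
    "view 4": {"base view"},             # camera 5
    "view 5": {"base view", "view 4"},   # camera 4
}


def coding_order(references):
    """Return an order in which every view comes after the views it refers to."""
    return list(TopologicalSorter(references).static_order())


if __name__ == "__main__":
    print(coding_order(REFERENCES))
```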
- the view order idx is a value for identifying the order of the view layer in the bitstream, that is, which layer among the plurality of multiview layers.
- the view order idx may form part of the layer ID associated with the NAL unit.
- when the dimension ID is a depth order idx that identifies depth scalability, this means that the layer is coded with a 3D video coding structure.
- a depth map is used for one or more coded frames to represent a 3D picture, and the depth order idx identifies the layer or order of the depth map (the depth layer / order) in the coded 3D video bitstream.
- FIG. 5 is a diagram illustrating an example of a framework for coding 3D video using a depth map.
- the 3D image may be coded using a plurality of depth maps, and the base layer may be referred to by an upper layer that may be expressed in depth order 1 and depth order 2.
- a layer indicated by depth order N may have a depth dependency on a layer having a lower order than N.
- the depth order idx is a value that identifies a layer of the depth map or a depth order, that is, which layer among the plurality of depth map layers.
- the depth order idx may form part of the layer ID associated with the NAL unit.
- when the dimension ID is a dependency ID capable of identifying spatial scalability, this means that an upper layer refers to a coded image of a lower layer to perform prediction and spatial scalability coding.
- FIG. 6 is a diagram illustrating an example of a framework for spatial scalability coding.
- each spatial scalability layer is composed of a lower layer and an upper layer having a larger spatial resolution (eg, picture width or picture height) than the lower layer.
- the layer with dependency ID N may be a base layer, and the layer with dependency ID N + 1, as an upper layer having a higher resolution than the layer with dependency ID N, may be coded by using coded image information of the base layer.
- the dependency ID indicates a spatial layer order in the bitstream and may form part of a layer ID associated with the NAL unit.
- the dimension ID may also be a quality ID capable of identifying quality scalability.
- FIG. 7 is a diagram illustrating an example of a framework for image quality scalability coding.
- each quality scalability layer consists of a lower layer and a higher layer that has the same spatial resolution (for example, picture width or picture height) as the lower layer but better visual quality.
- a layer having a quality ID of N may be a base layer, and a layer having a quality ID of N + 1, as an upper layer having better image quality than the layer having a quality ID of N, may be coded using the base layer.
- the quality ID indicates the order of the quality scalability layers in the bitstream and may constitute a part of the layer ID associated with the NAL unit.
- the image quality scalability may be applied to the same codec structure as the spatial scalability, and in this case, the image quality scalability and the spatial scalability may be represented by one ID.
- the bitstream may include layers supporting various scalabilities, and each scalability may be expressed by information about a dimension type and information about a dimension ID, mapped to that dimension type, for identifying a layer in the corresponding dimension.
- the dimension type may be signaled from the encoding apparatus to the decoding apparatus, and the mapping relationship between the dimension type and the dimension ID may be previously set with the encoding apparatus and the decoding apparatus through a predetermined table.
- dimension_len [i] represents the length, i.e., the number of bits, of the i th dimension ID. The value of dimension_len [i] can be in the range 1 to 6, and the sum of dimension_len [i] for the i th layer should be 6.
- dimension_len [i] may be signaled with syntax elements such as dimension_id_len or dimension_id_len_minus1. In this case, the sum of the values specified by dimension_id_len or dimension_id_len_minus1 should be 6.
- the information representing the dimensions of the i th layer must be mapped to the layer ID included in the NAL unit header to identify the layer. Therefore, the total sum of dimension_len [i], the lengths of the dimension IDs representing the dimensions, must be less than or equal to the length of the layer ID.
- dimension_len [i] may have a value of 1 to 6, and the total sum of dimension_len [i] should be 6 bits.
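- The constraint on dimension_len can be checked mechanically. The following sketch assumes the 6-bit layer ID of this embodiment and the stricter rule that the lengths sum to exactly 6; `check_dimension_len` is a hypothetical helper, not a syntax element.

```python
LAYER_ID_BITS = 6  # length of the layer ID carried in the NAL unit header


def check_dimension_len(dimension_len):
    """Return True if every length is in 1..6 and the lengths sum to 6 bits."""
    if not dimension_len:
        return False
    if any(not 1 <= length <= LAYER_ID_BITS for length in dimension_len):
        return False
    return sum(dimension_len) == LAYER_ID_BITS


if __name__ == "__main__":
    for lengths in ([3, 3], [2, 2, 2], [6], [3, 2], [0, 6]):
        print(lengths, check_dimension_len(lengths))
```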
- vps_layer_id [i] represents the layer ID of the i-th layer to which dependency information is applied, and each bit of vps_layer_id [i] may be configured as follows.
- the layer ID information included in the video parameter set may be the same as information identifying a layer included in the NAL unit header.
- num_direct_ref_layers [i] indicates the number of layers directly referenced by the i th layer.
- ref_layer_id [i] [j] is information for identifying the j th layer directly referenced by the i th layer.
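- Since only direct references are signaled, a decoder that needs every layer required to decode a target layer may have to follow the references transitively. The sketch below shows one way to do that; the dictionary `direct_refs` stands in for the signaled ref_layer_id [i] [j] values (its length per layer playing the role of num_direct_ref_layers [i]), and both it and the function name are assumptions for illustration.

```python
def all_reference_layers(layer_id, direct_refs):
    """Collect every layer referenced directly or indirectly by layer_id."""
    seen = set()
    stack = list(direct_refs.get(layer_id, []))
    while stack:
        ref = stack.pop()
        if ref not in seen:
            seen.add(ref)
            # Follow the direct references of the referenced layer as well.
            stack.extend(direct_refs.get(ref, []))
    return seen


if __name__ == "__main__":
    # Hypothetical example: layer 1 refers to layer 0, layer 2 refers to layer 1.
    direct_refs = {0: [], 1: [0], 2: [1]}
    print(all_reference_layers(2, direct_refs))
```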
- for example, assume that the bitstream supports spatial and image quality scalability, that for image quality scalability there exist a base layer and a first enhancement layer that refers to it, and that for spatial scalability there exist a base layer, a first enhancement layer, and a second enhancement layer. In this case, it may be signaled as follows.
- since the first dimension type, dimension_type [0], is 2, Table 4 confirms that spatial scalability is supported, because the dimension ID is the dependency ID.
- since the second dimension type, dimension_type [1], is 3, Table 4 confirms that quality scalability is supported, because the dimension ID is the quality ID.
- dimension_length [1] equal to 3 indicates that the length of the dimension ID indicating the image quality scalability is 3 bits.
- the length of the layer ID transmitted in the bitstream is 6 bits, which is the sum of the dimension_length values, and thus the number of bits of vps_layer_id [i] is 6.
- the vps_layer_id may not be signaled.
- vps_layer_id [1] = 1 indicates that vps_layer_id [1] of the first layer is 1, which is signaled as the bit string "000001".
- the first three bits (000) of "000001" may be the dependency ID indicating the spatial dimension, and the following three bits (001) may be the quality ID indicating the image quality dimension.
- vps_layer_id [2] = 8 indicates that vps_layer_id [2] of the second layer is 8, which is signaled as the bit string "001000".
- the first three bits (001) of "001000" may be the dependency ID indicating the spatial dimension, and the following three bits (000) may be the quality ID indicating the image quality dimension.
- the second layer and the first layer directly refer to the same 0th layer.
- vps_layer_id [3] = 9 indicates that vps_layer_id [3] of the third layer is 9, which is signaled as the bit string "001001".
- the first three bits (001) of "001001" may be the dependency ID indicating the spatial dimension, and the following three bits (001) may be the quality ID indicating the image quality dimension.
- vps_layer_id [4] = 16 indicates that vps_layer_id [4] of the fourth layer is 16, which is signaled as the bit string "010000".
- the first three bits (010) of "010000" may be the dependency ID indicating the spatial dimension, and the following three bits (000) may be the quality ID indicating the image quality dimension.
- vps_layer_id [5] = 17 indicates that vps_layer_id [5] of the fifth layer is 17, which is signaled as the bit string "010001".
- the first three bits (010) of "010001" may be the dependency ID indicating the spatial dimension, and the following three bits (001) may be the quality ID indicating the image quality dimension.
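- The bit strings of this example can be checked with a short script. It is a sketch that assumes the layout used above, a 3-bit dependency ID followed by a 3-bit quality ID; `split_layer_id` is a hypothetical helper name.

```python
def split_layer_id(bits):
    """Split a 6-bit layer ID string into (dependency ID, quality ID)."""
    if len(bits) != 6 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of six binary digits")
    return int(bits[:3], 2), int(bits[3:], 2)


# Bit strings signaled for vps_layer_id [1] to vps_layer_id [5] in the example.
EXAMPLE_BIT_STRINGS = ["000001", "001000", "001001", "010000", "010001"]
decoded = [split_layer_id(bits) for bits in EXAMPLE_BIT_STRINGS]

if __name__ == "__main__":
    for bits, (dependency_id, quality_id) in zip(EXAMPLE_BIT_STRINGS, decoded):
        print(bits, "dependency ID", dependency_id, "quality ID", quality_id)
```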
- Table 5 shows a video parameter set extension according to another embodiment of the present invention.
- vps_extension_byte_alignment_reserved_one_bit is 1.
- a value obtained by adding 1 to num_dimensions_minus1 [i] indicates the number of dimension types and dimension IDs signaled in each layer.
- dimension_type [i] indicates the j-th scalability dimension type of the i-th layer as shown in Table 4.
- dimension_len [i] represents the length, that is, the number of bits, for the dimension ID of the i-th layer, and the value of dimension_len [i] may be a value ranging from 1 to 8.
- dimension_len [i] may be signaled with syntax elements such as dimension_id_len or dimension_id_len_minus1.
- vps_layer_id [i] represents the layer ID of the i-th layer to which dependency information is applied, and each bit of vps_layer_id [i] may be configured as follows.
- the number of bits of vps_layer_id [i] may be the sum of the values specified by dimension_len [i], or, when dimension_id_len_minus1 is signaled, the sum of the values specified by dimension_id_len_minus1 plus 1.
- the layer ID information included in the video parameter set may be the same as information identifying a layer included in the NAL unit header.
- num_direct_ref_layers [i] indicates the number of layers directly referenced by the i th layer.
- ref_layer_id [i] [j] is information for identifying the j th layer directly referenced by the i th layer.
- vps_layer_id [i] in Table 5 does not have a fixed bit length.
- the descriptor for vps_layer_id [i] is u (n), which means using n bits of an integer for information transfer, where n can be changed depending on other syntax values.
- the number of bits of the layer ID and vps_layer_id [i] for identifying a layer may be determined according to the total sum of dimension_len [i].
- each dimension_len [i] may have a value of 3 bits, that is, a maximum of 8.
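- Reading a u (n) field whose width depends on earlier syntax values can be sketched as follows. The `BitReader` class and `read_vps_layer_id` function are hypothetical illustrations, not part of the described syntax; the only rule taken from the text is that n is the total sum of dimension_len [i].

```python
class BitReader:
    """Minimal most-significant-bit-first reader over a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def u(self, n):
        """Read n bits as an unsigned integer."""
        if self.pos + n > 8 * len(self.data):
            raise ValueError("not enough bits left")
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_vps_layer_id(reader, dimension_len):
    """Read vps_layer_id with n = sum of dimension_len bits."""
    return reader.u(sum(dimension_len))


if __name__ == "__main__":
    # Two dimensions of 4 bits each give an 8-bit layer ID.
    print(read_vps_layer_id(BitReader(bytes([0x21])), [4, 4]))
```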
- Table 6 shows a video parameter set extension according to another embodiment of the present invention.
- vps_extension_byte_alignment_reserved_one_bit is 1.
- a value obtained by adding 1 to num_dimensions_minus1 [i] indicates the number of dimension types and dimension IDs signaled in each layer.
- dimension_type [i] indicates the j-th scalability dimension type of the i-th layer as shown in Table 4.
- dimension_len [i] represents the length, that is, the number of bits, for the dimension ID of the i-th layer, and the value of dimension_len [i] may be a value ranging from 1 to 8.
- vps_layer_id [i] represents the layer ID of the i-th layer to which dependency information is applied, and each bit of vps_layer_id [i] may be configured as follows.
- the layer ID information included in the video parameter set may be the same as information identifying a layer included in the NAL unit header.
- num_direct_ref_layers [i] indicates the number of layers directly referenced by the i th layer.
- ref_layer_id [i] [j] is information for identifying the j th layer directly referenced by the i th layer.
- ue (v) may be allocated as a descriptor for dimension_len [i].
- ue (v) represents a syntax element that is encoded based on the Exponential-Golomb (Exp-Golomb) method, which indicates that information is coded according to an encoding method that adaptively determines a bit length, not fixed length encoding.
- when the Exponential-Golomb coding scheme is applied, the length of bits may be variably determined according to the Exponential-Golomb code, and thus the number of bits used to indicate dimension_len [i] may be variable.
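- For reference, the unsigned order-0 Exponential-Golomb code behind ue (v) can be sketched in a few lines: a value v is written as the binary form of v + 1, preceded by as many zeros as that form has bits minus one. The helpers `ue_encode` and `ue_decode` are hypothetical names and operate on strings of "0" and "1" for readability rather than on packed bytes.

```python
def ue_encode(value):
    """Encode a non-negative integer as an Exp-Golomb bit string."""
    if value < 0:
        raise ValueError("ue(v) encodes non-negative integers only")
    code = bin(value + 1)[2:]               # binary form of value + 1
    return "0" * (len(code) - 1) + code     # prefix of len(code) - 1 zeros


def ue_decode(bits, pos=0):
    """Decode one ue(v) value starting at pos; return (value, next position)."""
    zeros = 0
    while pos + zeros < len(bits) and bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    if end > len(bits):
        raise ValueError("truncated Exp-Golomb code")
    return int(bits[pos + zeros:end], 2) - 1, end


if __name__ == "__main__":
    for v in range(5):
        print(v, ue_encode(v))
```

- Small values use few bits (0 is a single "1"), which is why the number of bits spent on dimension_len [i] varies with its value.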
- vps_layer_id [i] may also have a variable value.
- FIG. 8 is a control block diagram illustrating a video encoding apparatus according to an embodiment of the present invention.
- the encoding apparatus includes a first encoding unit 810, a second encoding unit 820, and an information generating unit 830.
- the first encoding unit 810 may correspond to the encoding unit 135 for encoding layer 0 in the video encoding apparatus of FIG. 1, and the second encoding unit 820 may correspond to the encoding unit 105 for encoding layer 1 in the video encoding apparatus of FIG. 1.
- the first encoding unit 810 and the second encoding unit 820 perform prediction, transformation, and entropy coding on an image for each layer; since this is similar to the description of the encoding apparatus given with reference to FIG. 1, the details are omitted.
- the encoding apparatus may encode three or more layers instead of two layers, and in this case, may further include a third encoding unit and a fourth encoding unit.
- the information generator 830 generates information on the scalability of the layer when the layers are encoded by the encoders 810 and 820.
- the information generator 830 may be a partial configuration included in the first encoding unit 810 or may be a configuration that may be included in the second encoding unit 820.
- the information generator 830 may be designed to be included in each of the encoders 810 and 820. That is, for convenience of description, the information generator 830 is illustrated in an independent configuration of FIG. 8, but the physical structure and location of the information generator 830 is not limited to FIG. 8.
- the information generator 830 may generate the number of the types of the dimension, the dimension type indicating the scalability type, the information indicating the length of the dimension ID, the dimension ID, the layer ID, and the like.
- the mapping relationship between the dimension type and the dimension ID may be generated based on a predetermined table.
- the number of bits of the layer ID may be equal to the total length of the dimension IDs, that is, the sum of the numbers of bits of the dimension IDs. For example, if the layer ID is 6 bits, the total number of bits of the dimension IDs of the layer is 6 bits.
- the information generator 830 may generate information about the number of layers directly referenced by the corresponding layer and the reference layer ID for identifying the reference layer.
- the information generated by the information generator 830 is transmitted to the video decoding apparatus in the form of a bitstream through an encoding process similarly to other information.
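- On the encoder side, a layer ID consistent with the signaled lengths could be formed by concatenating the dimension IDs. The sketch below is an assumption-laden illustration: `pack_layer_id` is a hypothetical helper, and placing the first dimension in the most significant bits follows the 3 + 3 bit example given earlier rather than any normative rule.

```python
def pack_layer_id(dimension_ids, dimension_len):
    """Concatenate dimension IDs into one layer ID, first dimension in the top bits."""
    if len(dimension_ids) != len(dimension_len):
        raise ValueError("one length is needed per dimension ID")
    layer_id = 0
    for dim_id, length in zip(dimension_ids, dimension_len):
        if not 0 <= dim_id < (1 << length):
            raise ValueError("dimension ID does not fit in its bit length")
        layer_id = (layer_id << length) | dim_id
    return layer_id


if __name__ == "__main__":
    # dependency ID 1 and quality ID 1, each 3 bits long
    print(format(pack_layer_id([1, 1], [3, 3]), "06b"))
```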
- FIG. 9 is a control block diagram illustrating a video decoding apparatus according to an embodiment of the present invention.
- the decoding apparatus includes an information grasping unit 910, a first decoding unit 920, and a second decoding unit 930.
- the information grasping unit 910 grasps, based on a bitstream received from a video encoding apparatus, the inter-layer scalability information used when each of the decoding units 920 and 930 decodes a layer.
- the information grasping unit 910 may be implemented as a parser that parses a bitstream, or may be implemented as an entropy decoder that entropy decodes the bitstream.
- the information grasping unit 910 may be a partial configuration included in the first decoding unit 920 or may be a configuration that may be included in the second decoding unit 930. Alternatively, the information grasping unit 910 may be designed as a plurality of components included in each of the decoding units 920 and 930. That is, for convenience of description, the information grasping unit 910 is illustrated as an independent configuration in FIG. 9, but the physical structure and location of the information grasping unit 910 are not limited to FIG. 9.
- the information received from the encoding apparatus and grasped by the information grasping unit 910 may include the number of dimension types, the dimension type indicating the scalability type, the information indicating the length of the dimension ID, the dimension ID, the layer ID, and the like.
- the mapping relationship between the dimension type and the dimension ID may be grasped based on a predetermined table.
- the information grasping unit 910 may also receive and grasp information about the number of layers directly referenced by the corresponding layer and the reference layer ID for identifying the reference layer.
- the inter-layer scalability information grasped by the information grasping unit 910 is transferred to the decoding units 920 and 930, and the decoding units 920 and 930 may perform inter-layer prediction and reconstruction based on the scalability information.
- the first decoding unit 920 may correspond to the decoding unit 135 for decoding the layer 0 in the video decoding apparatus of FIG. 2, and the second decoding unit 930 may correspond to the decoding unit 105 for decoding the layer 1 in the video decoding apparatus of FIG. 2.
- the first decoding unit 920 and the second decoding unit 930 perform entropy decoding, inverse transform, prediction, and reconstruction on the image of each layer; since this overlaps with the description of the decoding apparatus given with reference to FIG. 2, the details are omitted.
- the decoding apparatus may perform decoding on three or more layers instead of two layers.
- the decoding apparatus may further include a third decoding unit and a fourth decoding unit.
- FIG. 10 is a control flowchart illustrating a method of encoding image information according to the present invention.
- the encoding apparatus encodes information about a video parameter set including information on a plurality of scalabilities (S1001).
- the information about the plurality of scalabilities included in the video parameter set may include the number of types of dimensions, the dimension type indicating the scalability type, the information indicating the length of the dimension ID, the dimension ID, the layer ID, and the like.
- the dimension ID refers to an index of a layer that a specific dimension type may have.
- the mapping relationship between the dimension type and the dimension ID may be set through a table such as a lookup table.
- the number of bits of the layer ID may be equal to the total length of the dimension IDs, that is, the sum of the numbers of bits of the dimension IDs. For example, if the layer ID is 6 bits, the total number of bits of the dimension IDs of the layer is 6 bits.
- the video parameter set may include information about the number of layers directly referenced by the corresponding layer and the reference layer ID for identifying the reference layer.
- the encoding apparatus transmits information about the encoded video parameter set in the bitstream (S1002).
- FIG. 11 is a control flowchart illustrating a decoding method of image information according to the present invention.
- the decoding apparatus receives information about a video parameter set extension including information about a plurality of scalabilities (S1101).
- the decoding apparatus parses the video parameter set to grasp the information of the scalability included in the bitstream, that is, derives the scalability information (S1102). Parsing of the image information may be performed by an entropy decoder or a separate parser.
- the information about the plurality of scalabilities included in the video parameter set may include the number of dimension types, the dimension type indicating the scalability type, information indicating the length of the dimension ID, the dimension ID, the layer ID, and the like.
- the dimension ID refers to an index of a layer that a specific dimension type may have.
- the mapping relationship between the dimension type and the dimension ID may be set through a table, and the decoding apparatus may determine scalability using the table.
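A table of this kind could be held as a simple dictionary, as in the sketch below. The four scalability types are the ones named in the claims; the numeric codes and the names "depth flag" and "quality ID" are placeholders assumed for illustration, since the actual table is not reproduced here.

```python
# Hypothetical lookup table: dimension type code -> (scalability type, dimension ID name).
# The codes 0..3 are illustrative placeholders, not values taken from the source.
DIMENSION_TYPES = {
    0: ("multi-view scalability", "view ID"),
    1: ("depth scalability", "depth flag"),
    2: ("spatial scalability", "dependency ID"),
    3: ("quality scalability", "quality ID"),
}


def describe_dimension(dim_type: int):
    """Return (scalability type, dimension ID name), or None for an unknown code."""
    return DIMENSION_TYPES.get(dim_type)
```

For example, `describe_dimension(2)` returns `("spatial scalability", "dependency ID")` under this assumed numbering.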
- the number of bits of the layer ID may equal the total length of the dimension IDs, that is, the sum of the numbers of bits of the dimension IDs. For example, if the layer ID is 6 bits, the dimension IDs of that layer total 6 bits.
- in a bitstream supporting multiple scalabilities, the layer ID may be mapped to the scalability dimension IDs in one of two ways: a first method that signals the relationship between the layer ID and the scalability dimension IDs, and a second method that partitions or splices the bits of the layer ID and signals which dimension type occupies the allocated bits.
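The bit-count constraint can be checked in a few lines. The sketch below also shows one possible reading of the claim that only one length is received when there are two dimension types: the remaining length would follow from the 6-bit total. The function names are hypothetical.

```python
def lengths_match_layer_id(dimension_id_lengths, layer_id_bits=6):
    """True if the dimension ID lengths add up to the layer ID width."""
    return sum(dimension_id_lengths) == layer_id_bits


def infer_last_length(signaled_lengths, layer_id_bits=6):
    """Derive the one length that was not signaled (an assumed interpretation)."""
    remaining = layer_id_bits - sum(signaled_lengths)
    if remaining <= 0:
        raise ValueError("signaled lengths leave no bits for the last dimension")
    return remaining
```

With a 6-bit layer ID, `lengths_match_layer_id([3, 3])` is true, and `infer_last_length([4])` gives 2.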
- the decoding apparatus may receive the dimension ID.
- the decoding apparatus may determine, from the information indicating the length of the dimension ID, how many bits of the layer ID correspond to each dimension, and may map the layer ID to the dimension ID by identifying the dimension ID that corresponds to that number of bits.
- the dimension ID indicating the multiview scalability may be signaled as 3 bits of information, and the dimension ID indicating the spatial scalability may be signaled as 2 bits of information.
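A minimal sketch of the first method, assuming the dimension IDs of each layer arrive as consecutive fixed-width fields (3 bits for the view ID, then 2 bits for the dependency ID). The bit strings, the field order, and the names `read_dimension_ids` and `layer_to_dims` are illustrative assumptions.

```python
def read_dimension_ids(bits: str, lengths: dict[str, int]) -> dict[str, int]:
    """Read one fixed-width dimension ID per entry of `lengths`, in order."""
    ids = {}
    pos = 0
    for name, width in lengths.items():
        field = bits[pos:pos + width]
        if len(field) != width:
            raise ValueError("bit string ended before all dimension IDs were read")
        ids[name] = int(field, 2)
        pos += width
    return ids


# Signaled lengths: 3 bits for multiview, 2 bits for spatial scalability.
lengths = {"view_id": 3, "dependency_id": 2}

# Made-up explicitly signaled dimension IDs, keyed by layer ID.
signaled = {0: "00000", 1: "00001", 2: "00100"}
layer_to_dims = {lid: read_dimension_ids(b, lengths) for lid, b in signaled.items()}
```

In this example layer 1 maps to view 0 with dependency 1, and layer 2 maps to view 1 with dependency 0.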
- when the layer ID and the dimension ID are mapped by the second method, which splits the bits of the layer ID and signals which dimension type occupies the allocated bits, the layer ID itself is divided so that it directly conveys the dimension IDs. In this case, the dimension ID may not be signaled to the decoding apparatus.
- for example, multi-view scalability and spatial scalability may each be allocated 3 bits to form the layer ID.
- the view ID may be "001" and the dependency ID may be identified as "010".
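The second method amounts to slicing bit fields out of the layer ID. The sketch below assumes a 6-bit layer ID with the view ID in the upper 3 bits and the dependency ID in the lower 3 bits; that ordering and the function name are assumptions for illustration.

```python
def split_layer_id(layer_id: int, lengths: list[tuple[str, int]],
                   total_bits: int = 6) -> dict[str, int]:
    """Split `layer_id` into dimension IDs, most significant field first."""
    if sum(width for _, width in lengths) != total_bits:
        raise ValueError("dimension ID lengths must add up to the layer ID width")
    if not 0 <= layer_id < (1 << total_bits):
        raise ValueError("layer ID does not fit in the given width")
    ids = {}
    shift = total_bits
    for name, width in lengths:
        shift -= width
        ids[name] = (layer_id >> shift) & ((1 << width) - 1)
    return ids


# Layer ID 0b001010: upper bits "001", lower bits "010".
ids = split_layer_id(0b001010, [("view_id", 3), ("dependency_id", 3)])
```

This yields a view ID of 1 ("001") and a dependency ID of 2 ("010"), with no separate dimension ID signaling.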
- the two methods may be distinguished through flag information indicating whether the number of bits of the layer ID is divided and used.
- in the first method, the flag information indicating whether the bits of the layer ID are divided and used may be 0, and in the second method, the flag information may be 1.
- the video parameter set may include information about the number of layers directly referenced by the corresponding layer and the reference layer ID for identifying the reference layer.
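Given the directly referenced layers of each layer, a decoder could work out every layer required to decode a target layer by following the references transitively. This is a sketch of that idea under assumed data (the `direct_refs` table is made up), not a procedure stated in the description.

```python
def layers_needed(target: int, direct_refs: dict[int, list[int]]) -> list[int]:
    """Return the target layer plus every layer it depends on, directly or indirectly."""
    needed = set()
    stack = [target]
    while stack:
        layer = stack.pop()
        if layer in needed:
            continue  # already visited
        needed.add(layer)
        stack.extend(direct_refs.get(layer, []))
    return sorted(needed)


# Hypothetical reference structure: layer ID -> directly referenced layer IDs.
direct_refs = {0: [], 1: [0], 2: [1], 3: [0]}
```

With this table, `layers_needed(2, direct_refs)` returns `[0, 1, 2]`, while `layers_needed(3, direct_refs)` returns `[0, 3]`.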
- according to the present invention, there are provided a method of signaling a layer by describing scalability information in a bitstream, in particular by matching dimension information and layer information with each other, and an apparatus using the same.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Claims (16)
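- A method of decoding an image by decoding a bitstream including a plurality of layers, the method comprising: receiving a video parameter set including information on a dimension type identifying the scalability of a layer and on the length of a dimension ID identifying the layer to which the dimension type is applied; and parsing the video parameter set to derive information on the scalability included in the bitstream.
- The decoding method further comprising receiving at least one of the number of dimension types, a dimension ID identifying the layer to which the dimension type is applied, and a layer ID.
- The decoding method of claim 1, wherein the dimension type and the dimension ID identifying the layer to which the dimension type is applied are identified by referring to a preset table.
- The decoding method of claim 2, wherein the sum of the lengths of the dimension IDs for an i-th layer is equal to the number of bits of the layer ID for the i-th layer.
- The decoding method of claim 2, wherein the sum of the lengths of the dimension IDs for an i-th layer is 6.
- The decoding method of claim 1, wherein the dimension type is at least one of multi-view scalability, depth scalability, spatial scalability, and quality scalability.
- The decoding method wherein, when the number of dimension types is 2, only one piece of information on the length of the dimension ID is received.
- The decoding method of claim 2, further comprising receiving flag information indicating whether the dimension ID is indicated by dividing the bits of the layer ID, wherein the dimension ID is received when the flag information has a value of 0.
- An apparatus for decoding an image by decoding a bitstream including a plurality of layers, the apparatus comprising: an information identification unit that derives scalability information by parsing a video parameter set including information on a dimension type identifying the scalability of a layer and on the length of a dimension ID identifying the layer to which the dimension type is applied; and an upper layer decoding unit that reconstructs an image of an upper layer using the scalability information.
- The decoding apparatus of claim 9, wherein the information identification unit further receives and identifies at least one of the number of dimension types, a dimension ID identifying the layer to which the dimension type is applied, and a layer ID.
- The decoding apparatus of claim 9, wherein the dimension type and the dimension ID identifying the layer to which the dimension type is applied are identified by referring to a preset table.
- The decoding apparatus of claim 10, wherein the sum of the lengths of the dimension IDs for an i-th layer is equal to the number of bits of the layer ID for the i-th layer.
- The decoding apparatus of claim 10, wherein the sum of the lengths of the dimension IDs for an i-th layer is 6.
- The decoding apparatus of claim 9, wherein the dimension type includes at least one of multi-view scalability, depth scalability, spatial scalability, and quality scalability.
- The decoding apparatus of claim 10, wherein, when the number of dimension types is 2, only one piece of information on the length of the dimension ID is received.
- The decoding apparatus of claim 10, wherein the information identification unit further receives flag information indicating whether the dimension ID is indicated by dividing the bits of the layer ID, and the dimension ID is received when the flag information has a value of 0.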
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/421,736 US20150288976A1 (en) | 2012-09-09 | 2013-09-09 | Image decoding method and apparatus using same |
KR1020157001217A KR20150054752A (ko) | 2012-09-09 | 2013-09-09 | 영상 복호화 방법 및 이를 이용하는 장치 |
EP13835268.7A EP2876882A4 (en) | 2012-09-09 | 2013-09-09 | IMAGE DECODING METHOD AND APPARATUS USING THE SAME |
JP2015531013A JP5993092B2 (ja) | 2012-09-09 | 2013-09-09 | 映像復号化方法及びそれを利用する装置 |
CN201380046836.9A CN104620585A (zh) | 2012-09-09 | 2013-09-09 | 图像解码方法和使用其的装置 |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261698711P | 2012-09-09 | 2012-09-09 | |
US61/698,711 | 2012-09-09 | ||
US201261700322P | 2012-09-12 | 2012-09-12 | |
US61/700,322 | 2012-09-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014038906A1 (ko) | 2014-03-13 |
Family
ID=50237429
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2013/008120 WO2014038906A1 (ko) | 2012-09-09 | 2013-09-09 | 영상 복호화 방법 및 이를 이용하는 장치 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20150288976A1 (ko) |
EP (1) | EP2876882A4 (ko) |
JP (1) | JP5993092B2 (ko) |
KR (1) | KR20150054752A (ko) |
CN (1) | CN104620585A (ko) |
WO (1) | WO2014038906A1 (ko) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020014256A (ja) * | 2014-08-07 | 2020-01-23 | ソニー株式会社 | 送信装置、送信方法、受信装置および受信方法 |
WO2021185278A1 (en) * | 2020-03-17 | 2021-09-23 | Huawei Technologies Co., Ltd. | An encoder, a decoder and corresponding methods |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10805605B2 (en) * | 2012-12-21 | 2020-10-13 | Telefonaktiebolaget Lm Ericsson (Publ) | Multi-layer video stream encoding and decoding |
KR20140087971A (ko) * | 2012-12-26 | 2014-07-09 | 한국전자통신연구원 | 계층적 비디오 부호화에서 다중참조계층을 적용한 화면간 부/복호화 방법 및 그 장치 |
US9942545B2 (en) * | 2013-01-03 | 2018-04-10 | Texas Instruments Incorporated | Methods and apparatus for indicating picture buffer size for coded scalable video |
KR20140122202A (ko) * | 2013-04-05 | 2014-10-17 | 삼성전자주식회사 | 계층 식별자 확장에 따른 비디오 스트림 부호화 방법 및 그 장치, 계층 식별자 확장에 따른 따른 비디오 스트림 복호화 방법 및 그 장치 |
US10075729B2 (en) * | 2013-07-15 | 2018-09-11 | Qualcomm Incorporated | Signaling of view ID bit depth in parameter sets |
JP6212212B2 (ja) * | 2013-10-11 | 2017-10-11 | ヴィド スケール インコーポレイテッド | Hevc拡張のための高レベル構文 |
US10187641B2 (en) | 2013-12-24 | 2019-01-22 | Kt Corporation | Method and apparatus for encoding/decoding multilayer video signal |
WO2015125489A1 (en) * | 2014-02-24 | 2015-08-27 | Sharp Kabushiki Kaisha | Restrictions on signaling |
US10708606B2 (en) * | 2014-03-24 | 2020-07-07 | Kt Corporation | Multilayer video signal encoding/decoding method and device |
CN106233736B (zh) * | 2014-04-25 | 2020-06-05 | 索尼公司 | 发送设备、发送方法、接收设备以及接收方法 |
KR101741212B1 (ko) * | 2015-08-25 | 2017-05-29 | 삼성에스디에스 주식회사 | 3차원 오브젝트의 단면 이미지 송신 시스템 및 방법과 이를 수행하기 위한 송신 장치 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060233242A1 (en) * | 2005-04-13 | 2006-10-19 | Nokia Corporation | Coding of frame number in scalable video coding |
KR20080114388A (ko) * | 2007-06-27 | 2008-12-31 | 삼성전자주식회사 | 스케일러블 영상 부호화장치 및 방법과 그 영상 복호화장치및 방법 |
US20090103615A1 (en) * | 2006-05-05 | 2009-04-23 | Edouard Francois | Simplified Inter-layer Motion Prediction for Scalable Video Coding |
KR20090066176A (ko) * | 2007-12-18 | 2009-06-23 | 한국전자통신연구원 | 사용자 선호도를 이용한 svc 비디오의 일반화된 fgs데이터 추출 장치 및 방법 |
KR20100005225A (ko) * | 2007-04-24 | 2010-01-14 | 노키아 코포레이션 | 미디어 파일들에서의 다중 디코딩 시각들을 시그날링 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100885443B1 (ko) * | 2005-04-06 | 2009-02-24 | 엘지전자 주식회사 | 레이어간 예측방식를 사용해 엔코딩된 영상신호를디코딩하는 방법 |
KR20070074453A (ko) * | 2006-01-09 | 2007-07-12 | 엘지전자 주식회사 | 영상 신호의 인코딩 및 디코딩 방법 |
CN101455082B (zh) * | 2006-03-30 | 2013-02-13 | Lg电子株式会社 | 用于解码/编码视频信号的方法和装置 |
WO2012096981A1 (en) * | 2011-01-14 | 2012-07-19 | Vidyo, Inc. | Improved nal unit header |
US9591318B2 (en) * | 2011-09-16 | 2017-03-07 | Microsoft Technology Licensing, Llc | Multi-layer encoding and decoding |
KR20130116782A (ko) * | 2012-04-16 | 2013-10-24 | 한국전자통신연구원 | 계층적 비디오 부호화에서의 계층정보 표현방식 |
-
2013
- 2013-09-09 US US14/421,736 patent/US20150288976A1/en not_active Abandoned
- 2013-09-09 WO PCT/KR2013/008120 patent/WO2014038906A1/ko active Application Filing
- 2013-09-09 KR KR1020157001217A patent/KR20150054752A/ko not_active Application Discontinuation
- 2013-09-09 EP EP13835268.7A patent/EP2876882A4/en not_active Ceased
- 2013-09-09 JP JP2015531013A patent/JP5993092B2/ja not_active Expired - Fee Related
- 2013-09-09 CN CN201380046836.9A patent/CN104620585A/zh active Pending
Also Published As
Publication number | Publication date |
---|---|
EP2876882A4 (en) | 2016-03-09 |
JP5993092B2 (ja) | 2016-09-14 |
EP2876882A1 (en) | 2015-05-27 |
US20150288976A1 (en) | 2015-10-08 |
CN104620585A (zh) | 2015-05-13 |
KR20150054752A (ko) | 2015-05-20 |
JP2015531556A (ja) | 2015-11-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13835268; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 20157001217; Country of ref document: KR; Kind code of ref document: A |
 | WWE | Wipo information: entry into national phase | Ref document number: 14421736; Country of ref document: US |
 | WWE | Wipo information: entry into national phase | Ref document number: 2013835268; Country of ref document: EP |
 | ENP | Entry into the national phase | Ref document number: 2015531013; Country of ref document: JP; Kind code of ref document: A |
 | NENP | Non-entry into the national phase | Ref country code: DE |