WO2013187698A1

WO2013187698A1 - Image decoding method and apparatus using same

Info

Publication number: WO2013187698A1
Application number: PCT/KR2013/005207
Authority: WO
Inventors: 헨드리헨드리; 김정선; 김철근; 전병문; 정상오
Original assignee: 엘지전자 주식회사
Priority date: 2012-06-12
Filing date: 2013-06-12
Publication date: 2013-12-19
Also published as: US20180027253A1; KR20160065217A; KR20200077615A; US10448039B2; KR102028527B1; US9794582B2; CN107087205A; US20150085938A1; CN107087204A; CN107071493A; KR102079803B1; KR20190114021A; KR20150027024A; KR102127370B1; KR101984181B1; KR101626522B1; US11546622B2; US20180255312A1; US10863187B2; CN107087203A

Abstract

The present invention includes an image information decoding method which comprises: a step of receiving a bitstream that includes a network abstraction layer (NAL) unit including information related to an encoded image; and a step of parsing an NAL unit header of the NAL unit. The NAL unit header includes layer information including reserved_one_5bits for identifying an extended layer in an extended bitstream and temporal_id for identifying a temporal layer of a bitstream. The reserved_one_5bits of the layer information is received prior to the temporal_id of the layer information. Thus, a method for describing scalability information in a hierarchical bitstream is provided.

Description

Image decoding method and apparatus using same

The present invention relates to video compression techniques, and more particularly, to a method and apparatus for decoding image information in a bitstream.

Recently, demand for high-resolution, high-quality images has been increasing in various applications. As the resolution and quality of an image increase, the amount of information in the image also increases.

As the amount of information grows, devices with varying capabilities and networks with varying conditions are emerging. As a result, the same content is becoming available at different quality levels.

Specifically, as the video quality that terminal devices can support diversifies and network environments diversify, video of ordinary quality may be used in one environment while higher-quality video may be used in another.

For example, a consumer who purchases video content on a mobile terminal can view the same video content on a larger screen and at a higher resolution through a large display in the home.

In recent years, broadcasts with high definition (HD) resolution have come into service, and many users are already accustomed to high-resolution, high-quality video. Interest is also growing in services for Ultra High Definition (UHD), which has more than four times the resolution of HDTV.

Therefore, in order to provide the video services that users require in various environments at the appropriate quality, it is necessary to provide scalability in video quality, for example in image quality, resolution, picture size, and frame rate, based on a high-efficiency encoding/decoding method for high-capacity video. In addition, various image processing methods associated with such scalability should be discussed.

It is an object of the present invention to provide a method and apparatus for describing scalability information in a hierarchical bitstream.

The present invention provides a method and apparatus for representing scalability information of a bitstream in a flexible manner.

It is also an object of the present invention to provide a method and apparatus for simplifying a video coding layer type in a bitstream.

An embodiment of the present invention includes receiving a bitstream including a Network Abstraction Layer (NAL) unit including information related to an encoded image, and parsing a NAL unit header of the NAL unit. The NAL unit header includes layer information including reserved_one_5bits for identifying an extended layer in an extended bitstream and temporal_id for identifying a temporal layer of the bitstream, and reserved_one_5bits of the layer information may be received before temporal_id.

reserved_one_5bits and temporal_id may be parsed simultaneously.

Another embodiment of the present invention includes receiving a bitstream including a network abstraction layer (NAL) unit including information related to an encoded image, and parsing a NAL unit header of the NAL unit. The NAL unit header includes NAL unit type information corresponding to the NAL unit type, and the NAL unit type may include a clean random access (CRA) picture type, which is a picture that becomes a random access point when a group of pictures has an open structure, and a broken link access (BLA) picture type, which exists in the middle of the bitstream as a random access point when coded pictures are spliced or the bitstream is broken in the middle.

In this case, the CRA picture may have one NAL unit type regardless of leading pictures, that is, pictures that are output before the picture serving as the random access point but decoded after it.

In addition, no leading picture that is output before the random access point picture and decoded after it may exist after the BLA picture.

After the BLA picture, among such leading pictures, there may be no leading picture that is removed without being decoded.

According to one embodiment of the present invention, a method and apparatus for describing scalability information in a hierarchical bitstream is provided.

In addition, according to an embodiment of the present invention, a method and apparatus for representing scalability information of a bitstream in a flexible manner are provided.

Furthermore, according to an embodiment of the present invention, a method and apparatus for simplifying a video coding layer type in a bitstream are provided.

FIG. 1 is a block diagram schematically illustrating a video encoding apparatus according to an embodiment of the present invention.

FIG. 2 is a block diagram schematically illustrating a video decoding apparatus according to an embodiment of the present invention.

FIG. 3 is a conceptual diagram schematically illustrating an embodiment of a scalable video coding structure using multiple layers to which the present invention can be applied.

FIG. 4 is a diagram illustrating a hierarchical structure for coded images processed in the decoding apparatus of the present invention.

FIG. 5 is a diagram for describing a picture that can be randomly accessed.

FIG. 6 is a diagram for explaining an IDR picture.

FIG. 7 is a diagram for explaining a CRA picture.

FIG. 8 is a diagram for explaining that a CRA picture is changed to a BLA picture according to an embodiment of the present invention.

FIG. 9 is a control flowchart illustrating a method of encoding video information according to the present invention.

FIG. 10 is a control flowchart for explaining a method of decoding image information according to the present invention.

As the present invention allows for various changes and numerous embodiments, particular embodiments will be illustrated in the drawings and described in detail in the written description. However, this is not intended to limit the invention to the specific embodiments. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the spirit of the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" indicate the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and do not exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Meanwhile, the components in the drawings described in the present invention are shown independently for convenience in describing their different characteristic functions in the video encoding/decoding apparatus; this does not mean that each component is implemented as separate hardware or separate software. For example, two or more components may be combined into one component, or one component may be divided into a plurality of components. Embodiments in which components are integrated and/or separated are also included in the scope of the present invention as long as they do not depart from its spirit.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The same reference numerals are used for the same components in the drawings, and redundant description of the same components is omitted.

FIG. 1 is a block diagram schematically illustrating a video encoding apparatus according to an embodiment of the present invention. A scalable video encoding/decoding method or apparatus may be implemented by extending a general video encoding/decoding method or apparatus that does not provide scalability, and the block diagram of FIG. 1 shows an embodiment of a video encoding apparatus that may serve as the basis of a scalable video encoding apparatus.

Referring to FIG. 1, the encoding apparatus 100 includes a picture divider 105, a predictor 110, a transformer 115, a quantizer 120, a reordering unit 125, an entropy encoding unit 130, an inverse quantization unit 135, an inverse transform unit 140, a filter unit 145, and a memory 150.

The picture divider 105 may divide the input picture into at least one processing unit block. In this case, the block serving as the processing unit may be a prediction unit (hereinafter "PU"), a transform unit (hereinafter "TU"), or a coding unit (hereinafter "CU").

The processing unit blocks divided by the picture divider 105 may have a quad-tree structure.

The predictor 110 includes an inter predictor that performs inter prediction and an intra predictor that performs intra prediction, as described below. The predictor 110 generates a prediction block by performing prediction on a processing unit of the picture divided by the picture divider 105. The processing unit of the picture in the predictor 110 may be a CU, a TU, or a PU. In addition, the predictor 110 may determine whether the prediction performed on the processing unit is inter prediction or intra prediction, and may determine the specific details (for example, the prediction mode) of each prediction method. In this case, the processing unit on which prediction is performed may differ from the processing unit for which the prediction method and its details are determined. For example, the prediction method and the prediction mode may be determined in units of PUs, while the prediction is performed in units of TUs.

Through inter prediction, a prediction block may be generated by performing prediction based on information of at least one picture of a previous picture and / or a subsequent picture of the current picture. In addition, through intra prediction, a prediction block may be generated by performing prediction based on pixel information in a current picture.

As a method of inter prediction, a skip mode, a merge mode, a motion vector prediction (MVP), and the like can be used. In inter prediction, a reference picture may be selected for a PU and a reference block corresponding to the PU may be selected. The reference block may be selected in integer pixel units. Subsequently, a prediction block is generated in which a residual signal with the current PU is minimized and the size of the motion vector is also minimized.

The prediction block may be generated in integer sample units, or in sub-pixel units such as 1/2-pixel or 1/4-pixel units. In this case, the motion vector may also be expressed in units smaller than an integer pixel.

Information such as the index of the reference picture selected through inter prediction, the motion vector predictor, and the residual signal is entropy encoded and delivered to the decoding apparatus. When the skip mode is applied, the prediction block may be used as the reconstructed block, and thus the residual may not be generated, transformed, quantized, or transmitted.
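As a rough illustration of sub-pixel motion vectors, the sketch below splits a motion vector component into an integer-pixel offset and a fractional phase. Storing the component in quarter-pixel units and the function name `split_quarter_pel` are assumptions made for this example; they are not taken from this description.

```python
def split_quarter_pel(mv: int) -> tuple[int, int]:
    """Split a motion vector component stored in 1/4-pixel units (assumed).

    Returns (integer_offset, fraction), where fraction counts quarter pixels
    in the range 0..3, so that mv == 4 * integer_offset + fraction.
    """
    integer_offset = mv >> 2  # floor division by 4, also for negative values
    fraction = mv & 3         # remaining quarter-pixel steps
    return integer_offset, fraction
```

A nonzero fraction means the prediction block would have to be interpolated between integer sample positions.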

When performing intra prediction, a prediction mode may be determined in units of PUs, and prediction may be performed in units of PUs. In addition, a prediction mode may be determined in units of PUs, and intra prediction may be performed in units of TUs.

In intra prediction, the prediction mode may have 33 directional prediction modes and at least two non-directional modes. The non-directional modes may include a DC prediction mode and a planar mode.

In intra prediction, a prediction block may be generated after applying a filter to a reference sample. In this case, whether to apply the filter to the reference sample may be determined according to the intra prediction mode and / or the size of the current block.

The PU may be a block of various sizes and shapes. For example, in the case of inter prediction, the PU may be a 2N×2N block, a 2N×N block, an N×2N block, or an N×N block (where N is an integer). In the case of intra prediction, the PU may be a 2N×2N block or an N×N block (where N is an integer). In this case, the N×N PU may be set to apply only in specific cases. For example, the N×N PU may be used only for the minimum-size CU or only for intra prediction. In addition to the above PUs, PUs such as N×mN, mN×N, 2N×mN, or mN×2N blocks (m < 1) may be further defined and used.

The residual value (the residual block or the residual signal) between the generated prediction block and the original block is input to the transformer 115. In addition, the prediction mode information, the motion vector information, and the like used for the prediction are encoded by the entropy encoding unit 130 together with the residual value and transmitted to the decoding apparatus.

The transform unit 115 performs transform on the residual block in units of transform blocks and generates transform coefficients.

The transform block is a rectangular block of samples to which the same transform is applied. The transform block can be a transform unit (TU) and can have a quad tree structure.

The transformer 115 may perform the transformation according to the prediction mode applied to the residual block and the size of the block.

For example, if intra prediction has been applied to the residual block and the block is a 4x4 residual array, the residual block is transformed using a discrete sine transform (DST); otherwise, the residual block may be transformed using a discrete cosine transform (DCT).
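The selection rule in the preceding paragraph can be sketched as a small function. The function name and its parameters are hypothetical; only the rule itself (DST for a 4x4 intra residual, DCT otherwise) comes from the text.

```python
def select_transform(is_intra: bool, width: int, height: int) -> str:
    """Return "DST" for a 4x4 intra-predicted residual block, else "DCT"."""
    if is_intra and width == 4 and height == 4:
        return "DST"
    return "DCT"
```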

The transform unit 115 may generate a transform block of transform coefficients by the transform.

The quantization unit 120 may generate quantized transform coefficients by quantizing the residual values transformed by the transform unit 115, that is, the transform coefficients. The value calculated by the quantization unit 120 is provided to the inverse quantization unit 135 and the reordering unit 125.

The reordering unit 125 rearranges the quantized transform coefficients provided from the quantization unit 120. By rearranging the quantized transform coefficients, the encoding efficiency of the entropy encoding unit 130 may be increased.

The reordering unit 125 may rearrange the quantized transform coefficients in the form of a 2D block into a 1D vector form through a coefficient scanning method.
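A minimal sketch of such a rearrangement follows. The description does not fix a particular scan pattern, so the diagonal order used here is an assumption, and all function names are hypothetical. The inverse function shows the corresponding 1D-to-2D restoration a decoder would perform with the same scan order.

```python
def diagonal_scan_order(size: int) -> list[tuple[int, int]]:
    """List (row, column) positions of a size x size block along anti-diagonals.

    Each diagonal is walked from its bottom-left end to its top-right end
    (an assumed pattern chosen for illustration).
    """
    order = []
    for d in range(2 * size - 1):
        for row in range(min(d, size - 1), -1, -1):
            col = d - row
            if col < size:
                order.append((row, col))
    return order


def scan_to_1d(block: list[list[int]]) -> list[int]:
    """Flatten a square 2D coefficient block into a 1D vector."""
    return [block[row][col] for row, col in diagonal_scan_order(len(block))]


def inverse_scan(coefficients: list[int], size: int) -> list[list[int]]:
    """Rebuild the 2D block from the 1D vector using the same scan order."""
    block = [[0] * size for _ in range(size)]
    for value, (row, col) in zip(coefficients, diagonal_scan_order(size)):
        block[row][col] = value
    return block
```

Scanning low-frequency positions first tends to group the nonzero quantized coefficients at the start of the vector, which is the kind of ordering that can help the entropy encoder.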

The entropy encoding unit 130 may perform entropy encoding on the quantized transform coefficients rearranged by the reordering unit 125. Entropy encoding may use, for example, encoding methods such as Exponential Golomb, Context-Adaptive Variable Length Coding (CAVLC), and Context-Adaptive Binary Arithmetic Coding (CABAC). The entropy encoding unit 130 may encode various information received from the reordering unit 125 and the predictor 110, such as quantized transform coefficient information and block type information of the CU, prediction mode information, partition unit information, PU information, transmission unit information, motion vector information, reference picture information, block interpolation information, and filtering information.

In addition, if necessary, the entropy encoding unit 130 may apply certain changes to the parameter set or syntax to be transmitted.
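Of the entropy coding methods named above, Exponential Golomb is the simplest to show in a few lines. The sketch below implements the common order-0 code for unsigned integers. Representing bits as a string of "0" and "1" characters is a simplification for readability, and the function names are hypothetical.

```python
def exp_golomb_encode(value: int) -> str:
    """Encode a non-negative integer as an order-0 Exp-Golomb bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    code = bin(value + 1)[2:]                 # binary form of value + 1
    return "0" * (len(code) - 1) + code       # prefix of leading zeros


def exp_golomb_decode(bits: str) -> int:
    """Decode one order-0 Exp-Golomb codeword from the start of a bit string."""
    leading_zeros = len(bits) - len(bits.lstrip("0"))
    codeword = bits[leading_zeros:2 * leading_zeros + 1]
    return int(codeword, 2) - 1
```

Small values get short codewords: 0 maps to `1`, 1 to `010`, 2 to `011`, and 3 to `00100`.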

The inverse quantizer 135 inversely quantizes the quantized values (quantized transform coefficients) in the quantizer 120, and the inverse transformer 140 inversely transforms the inverse quantized values in the inverse quantizer 135.

The reconstructed block may be generated by combining the residual values generated by the inverse quantizer 135 and the inverse transform unit 140 and the prediction blocks predicted by the prediction unit 110.

FIG. 1 shows the reconstructed block being generated by adding the residual block and the prediction block through an adder. In this case, the adder may be regarded as a separate unit (a reconstructed block generation unit) that generates the reconstructed block.
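The adder step can be sketched as a sample-wise sum of the prediction block and the residual block. Clipping the result to the valid sample range and the default 8-bit depth are assumptions added for this example; the description only states that the two blocks are combined.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add residual samples to prediction samples, one position at a time.

    Both inputs are equally sized 2D lists of integers. The sum is clipped
    to [0, 2**bit_depth - 1] (an assumed convention, not from the text).
    """
    max_value = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```

When the residual is all zeros, the reconstructed block equals the prediction block.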

The filter unit 145 may apply a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) to the reconstructed picture.

The deblocking filter may remove distortion generated at the boundaries between blocks in the reconstructed picture. The adaptive loop filter (ALF) may perform filtering based on a value obtained by comparing the reconstructed image, after its blocks have been filtered by the deblocking filter, with the original image. ALF may be performed only when high efficiency is applied. The SAO restores, on a pixel-by-pixel basis, the offset difference from the original image for the reconstructed block to which the deblocking filter has been applied, and is applied in the form of a band offset, an edge offset, or the like.

Meanwhile, the filter unit 145 may not apply filtering to the reconstructed block used for inter prediction.

The memory 150 may store the reconstructed block or the picture calculated by the filter unit 145. The reconstructed block or picture stored in the memory 150 may be provided to the predictor 110 that performs inter prediction.

FIG. 2 is a block diagram schematically illustrating a video decoding apparatus according to an embodiment of the present invention. As described above with reference to FIG. 1, a scalable video encoding/decoding method or apparatus may be implemented by extending a general video encoding/decoding method or apparatus that does not provide scalability, and the block diagram of FIG. 2 shows an embodiment of a video decoding apparatus that may serve as the basis of a scalable video decoding apparatus.

Referring to FIG. 2, the video decoding apparatus 200 may include an entropy decoding unit 210, a reordering unit 215, an inverse quantization unit 220, an inverse transform unit 225, a predictor 230, a filter unit 235, and a memory 240.

When an image bitstream is input from the video encoding apparatus, the input bitstream may be decoded according to the procedure by which the image information was processed in the video encoding apparatus.

For example, when variable length coding (hereinafter "VLC") is used to perform entropy encoding in the video encoding apparatus, the entropy decoding unit 210 may perform entropy decoding by implementing the same VLC table as the one used in the encoding apparatus. In addition, when CABAC is used to perform entropy encoding in the video encoding apparatus, the entropy decoding unit 210 may correspondingly perform entropy decoding using CABAC.

Among the information decoded by the entropy decoding unit 210, the information for generating the prediction block is provided to the predictor 230, and the residual values on which entropy decoding has been performed by the entropy decoding unit 210, that is, the quantized transform coefficients, may be input to the reordering unit 215.

The reordering unit 215 may reorder the information of the bitstream entropy decoded by the entropy decoding unit 210, that is, the quantized transform coefficients, based on the reordering method in the encoding apparatus.

The reordering unit 215 may reorder the coefficients by restoring coefficients expressed as a one-dimensional vector to the form of a two-dimensional block. The reordering unit 215 may generate an array of coefficients (quantized transform coefficients) in the form of a 2D block by scanning the coefficients based on the prediction mode applied to the current block (transform block) and the size of the transform block.

The inverse quantization unit 220 may perform inverse quantization based on the quantization parameter provided by the encoding apparatus and the coefficient values of the rearranged block.

The inverse transform unit 225 may perform, on the quantization result produced by the video encoding apparatus, the inverse DCT and/or inverse DST corresponding to the DCT and DST performed by the transform unit of the encoding apparatus.

The inverse transform may be performed based on the transmission unit or the image division unit determined by the encoding apparatus. The transform unit of the encoding apparatus may selectively perform DCT and/or DST according to several pieces of information, such as the prediction method, the size of the current block, and the prediction direction, and the inverse transform unit 225 of the decoding apparatus may perform the inverse transform based on information about the transform performed by the transform unit of the encoding apparatus.

The prediction unit 230 may generate the prediction block based on the prediction block generation related information provided by the entropy decoding unit 210 and the previously decoded block and / or picture information provided by the memory 240.

When the prediction mode for the current PU is an intra prediction mode, intra prediction for generating a prediction block based on pixel information in the current picture may be performed.

When the prediction mode for the current PU is an inter prediction mode, inter prediction on the current PU may be performed based on information included in at least one of a previous picture or a subsequent picture of the current picture. In this case, motion information required for inter prediction of the current PU provided by the video encoding apparatus, for example, a motion vector, a reference picture index, and the like, may be derived by checking a skip flag, a merge flag, and the like received from the encoding apparatus.

The reconstructed block may be generated using the prediction block generated by the predictor 230 and the residual block provided by the inverse transform unit 225. FIG. 2 shows the reconstructed block being generated by combining the prediction block and the residual block in an adder. In this case, the adder may be regarded as a separate unit (a reconstructed block generation unit) that generates the reconstructed block.

When the skip mode is applied, the residual is not transmitted and the prediction block may be a reconstruction block.

The reconstructed block and / or picture may be provided to the filter unit 235. The filter unit 235 may apply deblocking filtering, sample adaptive offset (SAO), and / or ALF to the reconstructed block and / or picture.

The memory 240 may store the reconstructed picture or block to use as a reference picture or reference block and provide the reconstructed picture to the output unit.

Among the entropy decoding unit 210, the reordering unit 215, the inverse quantization unit 220, the inverse transform unit 225, the predictor 230, the filter unit 235, and the memory 240 included in the decoding apparatus 200, the components directly related to the decoding of an image, for example the entropy decoding unit 210, the reordering unit 215, the inverse quantization unit 220, the inverse transform unit 225, the predictor 230, and the filter unit 235, may be referred to as a decoder or decoding unit to distinguish them from the other components.

In addition, the decoding apparatus 200 may further include a parsing unit (not shown) which parses information related to an encoded image included in the bitstream. The parsing unit may include the entropy decoding unit 210 or may be included in the entropy decoding unit 210. Such a parser may also be implemented as one component of the decoder.

FIG. 3 is a conceptual diagram schematically illustrating an embodiment of a scalable video coding structure using multiple layers to which the present invention can be applied. In FIG. 3, GOP (Group of Pictures) denotes a group of pictures.

A transmission medium is required to transmit image data, and its performance differs from one transmission medium to another depending on the network environment. A scalable video coding method may be provided for application to such varied transmission media or network environments.

The scalable video coding method is a coding method that improves encoding and decoding performance by removing redundancy between layers by using texture information, motion information, and residual signals between layers. The scalable video coding method may provide various scalability in terms of spatial, temporal, and image quality according to ambient conditions such as a transmission bit rate, a transmission error rate, and a system resource.

Scalable video coding may be performed using a multi-layer structure in order to provide a bitstream applicable to various network situations. For example, a scalable video coding structure may include a base layer, which compresses and processes image data using a general image decoding method, and an enhancement layer, which compresses and processes image data using both the decoding information of the base layer and a general image decoding method.

Here, a layer means a set of images and bitstreams that are distinguished on the basis of spatial aspects (for example, image size), temporal aspects (for example, decoding order, image output order, and frame rate), image quality, complexity, and the like. The base layer may also be called a reference layer or a basic layer, and the enhancement layer may also be called an enhancing layer. In addition, the plurality of layers may have dependencies on one another.

Referring to FIG. 3, for example, the base layer may be defined by standard definition (SD), a frame rate of 15 Hz, and a bit rate of 1 Mbps; the first enhancement layer by high definition (HD), a frame rate of 30 Hz, and a bit rate of 3.9 Mbps; and the second enhancement layer by ultra high definition (4K-UHD), a frame rate of 60 Hz, and a bit rate of 27.2 Mbps. The format, frame rate, bit rate, and so on are exemplary and may be determined differently as necessary. In addition, the number of layers used is not limited to this embodiment and may be determined differently according to the situation.

For example, if the transmission bandwidth is 4 Mbps, the frame rate of the first enhancement layer HD may be reduced and transmitted at 15 Hz or less. The scalable video coding method can provide temporal, spatial and image quality scalability by the method described above in the embodiment of FIG. 3.
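The bandwidth example can be sketched as a simple layer selection. The sketch assumes that the example bit rates are per-layer figures that add up, and that each enhancement layer needs all lower layers. Neither assumption is stated explicitly in the text, although under them a 4 Mbps channel cannot carry the base layer plus the full-rate HD layer, which matches the example. The names `Layer`, `LAYERS`, and `select_layers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    frame_rate_hz: int
    bit_rate_mbps: float


# Example values from the FIG. 3 description, lowest layer first.
LAYERS = [
    Layer("SD base layer", 15, 1.0),
    Layer("HD enhancement layer 1", 30, 3.9),
    Layer("UHD enhancement layer 2", 60, 27.2),
]


def select_layers(bandwidth_mbps, layers=LAYERS):
    """Pick the longest prefix of layers whose summed bit rate fits.

    Assumes additive per-layer bit rates and strict layer dependency.
    """
    selected = []
    total = 0.0
    for layer in layers:
        if total + layer.bit_rate_mbps > bandwidth_mbps:
            break
        total += layer.bit_rate_mbps
        selected.append(layer)
    return selected
```

With `select_layers(4.0)` only the base layer fits at full rate, so sending the HD layer as well would require reducing its rate, for example by lowering its frame rate as described above.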

Scalable video coding has the same meaning as scalable video encoding from an encoding point of view and scalable video decoding from a decoding point of view.

As described above, due to heterogeneous communication networks and various terminals, scalability has become an important function of the current video format.

Meanwhile, a bitstream including a plurality of layers is composed of Network Abstraction Layer (NAL) units that facilitate the adaptive transmission of video over a packet-switching network. As with a plurality of layers, in multi-view video coding, in which a bitstream includes a plurality of views, the relationship between the views is similar to the relationship between spatial layers in video supporting a plurality of layers.

FIG. 4 is a diagram illustrating a hierarchical structure of coded images processed by a decoding apparatus.

A coded image is handled in separate layers: a video coding layer (VCL) that handles the decoding of the image and the image itself, a subsystem that transmits and stores the coded information, and a network abstraction layer (NAL) that lies between the VCL and the subsystem.

The NAL unit, which is a basic unit of the NAL, serves to map a coded image to a bit string of a subsystem such as a file format, a real-time transport protocol (RTP), or the like according to a predetermined standard.

Meanwhile, in the VCL, parameter sets (picture parameter set, sequence parameter set, video parameter set) corresponding to the headers of sequences and pictures, and Supplemental Enhancement Information (SEI) messages additionally needed for the image decoding process, are separated from the information about the image (slice data). The VCL information about an image consists of slice data and a slice header.

As shown, the NAL unit consists of two parts: a NAL unit header and a raw byte sequence payload (RBSP) generated in the VCL. The NAL unit header includes information on the type of the corresponding NAL unit.

The NAL unit is divided into a VCL NAL unit and a non-VCL NAL unit according to the RBSP generated in the VCL. The VCL NAL unit refers to a NAL unit containing information about an image, and the non-VCL NAL unit represents a NAL unit including information (a parameter set or an SEI message) necessary for decoding the image.

The VCL NAL unit may be divided into various types according to the nature and type of pictures included in the corresponding NAL unit.

Meanwhile, scalability information of the bitstream is very important for effectively and efficiently converting the bitstream at all nodes in the content delivery path. In current high efficiency video coding for a single layer, the NAL unit header includes two fields related to layer information, temporal_id and reserved_one_5bits. The temporal_id having a length of 3 bits indicates a temporal layer of the video bitstream, and reserved_one_5bits corresponds to an area for later indicating other layer information. The temporal layer refers to a layer of a temporally scalable bitstream composed of VCL NAL units, and the temporal layer has a specific temporal_id value.
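One way a node in the delivery path might use temporal_id is to drop the NAL units of higher temporal layers. The sketch below assumes that lower temporal layers do not depend on higher ones and that temporal_id has already been parsed from each header. The `NalUnit` record and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    temporal_id: int  # 3-bit value, 0..7
    payload: bytes


def extract_temporal_layers(nal_units, max_temporal_id):
    """Keep only NAL units whose temporal_id does not exceed the target."""
    return [nal for nal in nal_units if nal.temporal_id <= max_temporal_id]
```

Filtering this way only needs the header field, so the node does not have to decode any slice data.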

The present invention relates to a method for effectively describing extraction information and scalability information of an image in a bitstream capable of supporting a plurality of layers, a method for signaling the same, and an apparatus for implementing the same.

In the following description, the bitstream is divided into two types for convenience of description. A bitstream that supports only temporal scalability is referred to as a basic type, and a bitstream that can have scalability supporting space, quality, and view as well as time is referred to as an extended type.

Table 1 shows the syntax of the NAL unit header encoded in the encoding apparatus and decoded in the decoding apparatus according to an embodiment of the present invention.

<Table 1>

In Table 1, among the information contained in the NAL unit header, forbidden_zero_bit shall be 0.

nal_unit_type means the data structure of the RBSP included in the corresponding NAL unit. There are a plurality of types according to the data structure of the RBSP.

nal_ref_flag is a flag indicating whether the corresponding NAL unit is a non-reference picture or a reference picture in the entire bitstream at the time of encoding. If nal_ref_flag is 1, the NAL unit includes a sequence parameter set (SPS), a picture parameter set (PPS), an adaptation parameter set (APS), or a slice of a reference picture. If nal_ref_flag is 0, the NAL unit includes a slice containing part or all of a non-reference picture.

reserved_one_5bits is information that can be utilized in an extension type that supports scalability, such as a 3D video coding extension type, and is used to identify additional layers present in a coded video sequence (CVS). The layer may be a spatially scalable layer, a quality scalable layer, a texture view, a depth view, or the like.

If the bitstream is a basic type, reserved_one_5bits becomes 0 and may be used to determine the amount of data included in the decoding unit of the bitstream.

temporal_id is the identifier of the temporal layer for the NAL unit.

layer_id is the layer identifier for the NAL unit, and all VCL NAL units of one access unit have the same layer_id value. layer_id may be signaled with 8 bits.

The NAL unit header of Table 1 includes information such as nal_unit_type, nal_ref_flag, reserved_one_5bits, temporal_id, and layer_id. However, the information carried in the NAL unit header may be minimized in order to reduce the amount of transmitted information and improve coding and decoding efficiency.

That is, only the minimum information required at the NAL unit header level, for example nal_unit_type, temporal_id, and layer_id, may be included in the NAL unit header, and the remaining information may be transmitted at another syntax level that refers to the NAL unit header.

As shown in Table 1, when the bitstream supports a single layer, that is, when the bitstream is a basic type, reserved_one_5bits and temporal_id are encoded and decoded sequentially. When the bitstream is an extended type rather than the basic type, layer_id is encoded and transmitted to the decoding apparatus.

For example, in the base layer, reserved_one_5bits and temporal_id may be signaled in that order, and in an extended layer, layer_id, which has a form in which reserved_one_5bits and temporal_id are combined, may be signaled.

The layer_id may be additionally signaled in the video parameter set (VPS) as well as the NAL unit header.

Meanwhile, the size of the NAL unit header is fixed at 2 bytes, and all required information must be transmitted within this 2-byte space.
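As a rough illustration of how a decoder might read this 2-byte header for a basic type bitstream, the following Python sketch unpacks the fields named above. Only the 5-bit and 3-bit widths of reserved_one_5bits and temporal_id and the 2-byte total come from the description. The widths and positions assumed for forbidden_zero_bit (1 bit), nal_ref_flag (1 bit), and nal_unit_type (6 bits) are assumptions made for this sketch, since Table 1 is not reproduced here, and the names NalUnitHeader and parse_nal_unit_header are hypothetical.

```python
from typing import NamedTuple


class NalUnitHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_ref_flag: int
    nal_unit_type: int
    reserved_one_5bits: int
    temporal_id: int


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Unpack a 2-byte NAL unit header (assumed layout: 1 + 1 + 6 + 5 + 3 bits)."""
    if len(data) < 2:
        raise ValueError("a NAL unit header needs 2 bytes")
    first, second = data[0], data[1]
    header = NalUnitHeader(
        forbidden_zero_bit=first >> 7,        # assumed: most significant bit
        nal_ref_flag=(first >> 6) & 0x1,      # assumed: next bit
        nal_unit_type=first & 0x3F,           # assumed: remaining 6 bits
        reserved_one_5bits=second >> 3,       # 5 bits, signaled first
        temporal_id=second & 0x7,             # 3 bits, signaled second
    )
    if header.forbidden_zero_bit != 0:
        raise ValueError("forbidden_zero_bit must be 0")
    return header


header = parse_nal_unit_header(bytes([0x44, 0x05]))
print(header.nal_unit_type, header.reserved_one_5bits, header.temporal_id)
```

Under the assumed layout, the second byte alone carries both layer-related fields, which is what makes the single 8-bit read discussed in this description possible.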

For the basic type bitstream, only temporal scalability is supported, which is described by the temporal_id syntax. However, in the extended type, information such as layer information or dependency should be included in the NAL unit header, and reserved_one_5bits, which was not used in the basic type, is used to transmit such information.

As a 5-bit signal, reserved_one_5bits may be insufficient to transmit all the information required by the extended type. That is, 5 bits can represent 32 layers, but when the bitstream supports a large number of scalability types or a plurality of view layers, 5 bits may be insufficient for identifying all layers.

In addition, conventionally, temporal_id is signaled before reserved_one_5bits in the NAL unit header. In other words, temporal_id for identifying a temporal layer is signaled first, followed by reserved_one_5bits for identifying a layer used for an extension type. This separation between temporal scalability and other types of scalability can lead to confusion. Temporal scalability may be one form of scalability, in which case temporal scalability may be treated similarly to other scalability.

In addition, when temporal_id is signaled before reserved_one_5bits, 3 bits are mandatorily allocated to describe temporal scalability. However, temporal scalability is not always used, and an extended type bitstream may support other types of scalability without supporting temporal scalability. In this case, the three bits allocated to temporal scalability carry unnecessary information. Such an unused temporal_id can instead be used to describe other scalability.

In view of this, the present invention uses temporal_id as one piece of information for identifying a layer in the extended type bitstream. In other words, not only reserved_one_5bits but the combination of reserved_one_5bits and temporal_id is used to identify layers in the extended type.

In addition, when reserved_one_5bits is signaled before temporal_id, that is, when the signaling order of the two is changed as in the present invention, parsing efficiency increases. Conventionally, the 3 bits of temporal_id are parsed first and the 5 bits of reserved_one_5bits are parsed afterwards. According to the present invention, however, the two pieces of information, reserved_one_5bits and temporal_id, can be parsed at once. In other words, instead of reading a 5-bit field and a 3-bit field in two separate reads, 8 bits can be read at one time.

The number of parsing operations may vary according to the signaling order of reserved_one_5bits and temporal_id. For example, when the temporal_id value is 101 and temporal_id is signaled later, the value obtained by parsing reserved_one_5bits and temporal_id is 00000101, whereas when temporal_id is signaled first, the parsed value is 10100000. In the latter case, according to the conventional scheme, the decoding apparatus must parse the bitstream twice.
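The two bit patterns in this example can be reproduced with a short Python sketch. The function names pack_reserved_first and pack_temporal_first are hypothetical, and the value 0 for reserved_one_5bits is taken from the basic type case described earlier.

```python
def pack_reserved_first(reserved_one_5bits: int, temporal_id: int) -> int:
    """Order of the present description: 5 bits of reserved_one_5bits, then 3 bits of temporal_id."""
    return (reserved_one_5bits << 3) | temporal_id


def pack_temporal_first(reserved_one_5bits: int, temporal_id: int) -> int:
    """Conventional order: 3 bits of temporal_id, then 5 bits of reserved_one_5bits."""
    return (temporal_id << 5) | reserved_one_5bits


reserved, temporal = 0b00000, 0b101

print(format(pack_reserved_first(reserved, temporal), "08b"))  # 00000101
print(format(pack_temporal_first(reserved, temporal), "08b"))  # 10100000

# With reserved_one_5bits equal to 0, the byte read as a single 8-bit value
# already equals temporal_id when temporal_id is signaled later.
assert pack_reserved_first(reserved, temporal) == temporal
```

In the conventional order the same byte reads as 160, so the decoder has to split it into two fields before the temporal layer is known.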

In addition, if 8 bits are required to signal layer_id, the parsing of reserved_one_5bits and temporal_id simultaneously as described above is treated similarly to parsing layer_id, thereby increasing the implementation efficiency.

When temporal_id is used as one piece of information for identifying a layer in the extended type bitstream, as in the present invention, 8 bits are available to identify a layer, which increases the number of layers that can be represented from 32 to 256.
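The counts of 32 and 256 follow directly from the field widths, as the following small Python check shows. The helper name max_layers is hypothetical.

```python
def max_layers(num_bits: int) -> int:
    """Number of distinct layer identifiers that fit in num_bits bits."""
    return 1 << num_bits


print(max_layers(5))      # 32: reserved_one_5bits alone
print(max_layers(5 + 3))  # 256: reserved_one_5bits combined with temporal_id
```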

In addition, according to the present invention, since temporal scalability is treated the same as other types of scalability, there is no confusion as to whether temporal scalability is the same as other types of scalability. In addition, since temporal_id may be used in combination with other types of scalability, there is an advantage of expressing the scalability more variously.

In addition, in the case of a bitstream that does not apply temporal scalability, bits allocated to temporal scalability in the bitstream may be usefully used to describe other scalability.

Table 2 shows syntax of the NAL unit header according to another embodiment of the present invention.

<Table 2>

layer_id shown in Table 2 is the layer identifier for the NAL unit, and for a bitstream supporting a single layer it serves as the temporal identifier. All VCL NAL units of one access unit have the same layer_id value. In the case of a bitstream supporting a single layer, the range of layer_id may be 0 to 7, and when the access unit is a random access point, the layer_id value of all VCL NAL units of that access unit is 0.

In this embodiment, reserved_one_5bits and temporal_id are combined to use a new syntax called layer_id. As described above, in the case of a bitstream supporting a single layer, that is, a basic type, layer_id may have a range from 0 to 7, and in this case, layer_id may be used as a syntax for signaling the same information as temporal_id.
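A minimal Python sketch of the combined layer_id is given below, assuming that the upper 5 bits correspond to the former reserved_one_5bits and the lower 3 bits to the former temporal_id. That bit placement is an assumption for illustration, and the function names are hypothetical. The second function checks only the two constraints stated for a single-layer bitstream.

```python
from typing import Tuple


def split_layer_id(layer_id: int) -> Tuple[int, int]:
    """Split an 8-bit layer_id into (upper 5 bits, lower 3 bits); the placement is assumed."""
    if not 0 <= layer_id <= 0xFF:
        raise ValueError("layer_id must fit in 8 bits")
    return layer_id >> 3, layer_id & 0x7


def is_valid_basic_type_layer_id(layer_id: int, is_random_access_point: bool) -> bool:
    """Constraints stated for a single-layer bitstream: range 0..7, and 0 at a random access point."""
    if is_random_access_point:
        return layer_id == 0
    return 0 <= layer_id <= 7


print(split_layer_id(5))                          # (0, 5)
print(is_valid_basic_type_layer_id(5, False))     # True
print(is_valid_basic_type_layer_id(5, True))      # False
```

For values 0 to 7 the upper part is 0, so layer_id carries the same information as temporal_id, as the text states.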

As with Table 1, the NAL unit header of Table 2 includes information such as nal_unit_type, nal_ref_flag, reserved_one_5bits, temporal_id, and layer_id, but the information in the NAL unit header may also be minimized to improve transmission efficiency and coding/decoding efficiency.

On the other hand, if all non-reference pictures, in particular the non-reference pictures that mostly belong to the highest temporal layer, are removed by extraction, nal_ref_flag of every picture remaining after extraction is 1. However, some pictures of the extracted bitstream, namely the pictures belonging to the highest temporal layer of the remaining bitstream, become non-reference pictures even though their nal_ref_flag is 1.
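The situation can be made concrete with a small hypothetical example in Python. The four pictures, their names, and their reference lists below are invented for illustration and are not taken from the description. After the pictures with nal_ref_flag equal to 0 are removed, every remaining flag is 1, yet one remaining picture is no longer referenced by anything.

```python
# Hypothetical pictures: nal_ref_flag and the pictures each one references.
pictures = {
    "P0": {"nal_ref_flag": 1, "refs": []},
    "P1": {"nal_ref_flag": 0, "refs": ["P0", "P2"]},
    "P2": {"nal_ref_flag": 1, "refs": ["P0"]},
    "P3": {"nal_ref_flag": 0, "refs": ["P2"]},
}


def drop_non_reference(pics):
    """Keep only pictures whose nal_ref_flag is 1."""
    return {name: p for name, p in pics.items() if p["nal_ref_flag"] == 1}


def actually_referenced(pics):
    """Names of pictures that some remaining picture still references."""
    return {ref for p in pics.values() for ref in p["refs"] if ref in pics}


remaining = drop_non_reference(pictures)
print(sorted(remaining))                        # ['P0', 'P2']
print(sorted(actually_referenced(remaining)))   # ['P0']
```

Here P2 keeps nal_ref_flag equal to 1 although no remaining picture refers to it, so the flag no longer distinguishes anything in the extracted bitstream.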

Thus, the bit allocated to nal_ref_flag can be used for other syntax elements of the NAL unit header, for example temporal_id or reserved_one_5bits. If the one bit allocated to nal_ref_flag is used for temporal_id or reserved_one_5bits, a larger number of layers included in the bitstream can be identified.

For example, when 1 bit allocated to nal_ref_flag is used for reserved_one_5bits, reserved_one_5bits may be a 6-bit signal in which 1 bit is added.

As described above, the NAL unit may be divided into various types according to the nature, type, etc. of a picture included in the corresponding NAL unit.

Table 3 shows an example of the NAL unit type.

<Table 3>

As described in Table 3, the NAL unit type may be classified into a VCL NAL unit and a non-VCL NAL unit according to whether or not it includes information about an image. The VCL NAL unit refers to a NAL unit containing information about an image, and the non-VCL NAL unit represents a NAL unit including information (a parameter set or an SEI message) necessary for decoding the image.

The VCL NAL units may be divided into those of pictures that are randomly accessible and those of pictures that are not. In Table 3, NAL units whose nal_unit_type is 4 to 8 are pictures that can be randomly accessed, and NAL units whose nal_unit_type is 1 to 3 are pictures that cannot be randomly accessed.
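The two ranges stated here can be expressed as a small Python helper. The function name is hypothetical, and values outside 1 to 8 are left unclassified because Table 3 is not reproduced here.

```python
from typing import Optional


def is_random_access_picture(nal_unit_type: int) -> Optional[bool]:
    """True for types 4..8, False for types 1..3, None for values this sketch does not cover."""
    if 4 <= nal_unit_type <= 8:
        return True
    if 1 <= nal_unit_type <= 3:
        return False
    return None


for value in (2, 4, 8, 9):
    print(value, is_random_access_picture(value))
```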

FIG. 5 is a diagram for describing a picture that can be randomly accessed.

A random-accessible picture, that is, an intra random access point (IRAP) picture that becomes a random access point, is the first picture in decoding order in a bitstream during random access and includes only an I slice.

FIG. 5 illustrates an output order or display order of pictures and a decoding order. As shown, the output order and decoding order of the pictures may be different. For convenience, the pictures are divided into predetermined groups and described.

Pictures belonging to the first group (I) precede the IRAP picture in both output order and decoding order, and pictures belonging to the second group (II) precede the IRAP picture in output order but follow it in decoding order. The pictures of the third group (III) follow the IRAP picture in both output order and decoding order.

The pictures of the first group I may be decoded and output regardless of the IRAP picture.

Pictures belonging to the second group (II) that are output prior to the IRAP picture are called leading pictures, and the leading pictures may be problematic in the decoding process when the IRAP picture is used as a random access point.

A picture belonging to the third group (III) whose output and decoding order follows the IRAP picture is called a normal picture. The normal picture is not used as the reference picture of the leading picture.
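The three groups can be told apart by comparing a picture with the IRAP picture in output order and in decoding order. The following Python sketch assumes that output order is given by POC and that decoding order is given by a hypothetical integer index. The function name and parameter names are illustrative.

```python
def classify_relative_to_irap(poc: int, decode_index: int,
                              irap_poc: int, irap_decode_index: int) -> str:
    """Return "I", "II" (leading), "III" (normal), or "IRAP" for the IRAP picture itself."""
    if poc == irap_poc and decode_index == irap_decode_index:
        return "IRAP"
    before_in_output = poc < irap_poc
    before_in_decoding = decode_index < irap_decode_index
    if before_in_output and before_in_decoding:
        return "I"    # precedes the IRAP picture in both orders
    if before_in_output:
        return "II"   # leading picture: output earlier, decoded later
    if not before_in_decoding:
        return "III"  # normal picture: follows in both orders
    raise ValueError("combination not covered by the three groups")


# Hypothetical IRAP picture with POC 32, decoded as the 10th picture.
print(classify_relative_to_irap(30, 5, 32, 10))   # I
print(classify_relative_to_irap(30, 11, 32, 10))  # II
print(classify_relative_to_irap(33, 12, 32, 10))  # III
```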

The random access point at which random access occurs in the bitstream is an IRAP picture, and random access starts as the first picture of the second group (II) is output.

The IRAP picture may be any one of an instantaneous decoding refresh (IDR) picture, a clean random access (CRA) picture, and a broken link access (BLA) picture.

FIG. 6 is a diagram for explaining an IDR picture.

An IDR picture is a picture that becomes a random access point when a group of pictures has a closed structure. Since the IDR picture is an IRAP picture, it includes only I slices, and may be the first picture in decoding order in the bitstream, or may be in the middle of the bitstream. When an IDR picture is decoded, all reference pictures stored in a decoded picture buffer (DPB) are marked as “unused for reference”.

The bars shown in FIG. 6 represent pictures and arrows represent reference relationships as to whether a picture can use another picture as a reference picture. The x mark displayed on the arrow indicates that the picture (s) cannot refer to the picture to which the arrow points.

As shown, the POC of the IDR picture is 32, and the pictures with a POC of 25 to 31, which are output before the IDR picture, are the leading pictures 610. Pictures with a POC greater than 33 correspond to the normal pictures 620.

The leading pictures 610 preceding the IDR picture can use the IDR picture and other leading pictures as reference pictures, but cannot use the past pictures 630 preceding the leading pictures 610 as reference pictures.

The normal pictures 620 following the IDR picture may be decoded with reference to the IDR picture and the leading picture and other normal pictures.

FIG. 7 is a diagram for explaining a CRA picture.

A CRA picture is a picture that becomes a random access point when a group of pictures has an open structure. Since the CRA picture is also an IRAP picture, it contains only I slices, and may be the first picture in decoding order in the bitstream, or may be in the middle of the bitstream for normal play.

Bars shown in FIG. 7 indicate pictures, and arrows indicate reference relationships as to whether a picture can use another picture as a reference picture. The x mark displayed on the arrow indicates that the picture or pictures cannot refer to the picture indicated by the arrow.

The leading pictures 710 preceding the CRA picture may use both the CRA picture and other leading pictures, and past pictures 730 preceding the leading pictures 710 as reference pictures.

On the other hand, the normal pictures 720 following the CRA picture may be decoded with reference to the CRA picture and other normal pictures, but may not use the leading pictures 710 as reference pictures.
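The reference relationships described for FIG. 6 and FIG. 7 can be collected into a lookup table. The Python sketch below records only the combinations that the description states, reading them as "the IDR or CRA picture and other pictures of the same kind". Combinations the text does not mention return None rather than a guess. The table name, the function name, and the string labels are hypothetical.

```python
from typing import Optional

# (IRAP type, kind of picture being decoded) -> {kind of candidate reference: allowed?}
STATED_RULES = {
    ("IDR", "leading"): {"irap": True, "leading": True, "past": False},
    ("IDR", "normal"): {"irap": True, "leading": True, "normal": True},
    ("CRA", "leading"): {"irap": True, "leading": True, "past": True},
    ("CRA", "normal"): {"irap": True, "normal": True, "leading": False},
}


def may_reference(irap_type: str, picture_kind: str, reference_kind: str) -> Optional[bool]:
    """True or False when the description states the rule, None when it is silent."""
    return STATED_RULES.get((irap_type, picture_kind), {}).get(reference_kind)


print(may_reference("IDR", "leading", "past"))  # False: closed structure
print(may_reference("CRA", "leading", "past"))  # True: open structure
print(may_reference("CRA", "normal", "past"))   # None: not stated
```

The single difference between the two leading-picture rows is what separates the closed structure of an IDR picture from the open structure of a CRA picture.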

A BLA picture has similar functions and properties to a CRA picture, and refers to a picture existing in the middle of the bitstream as a random access point when the coded picture is spliced or the bitstream is broken in the middle. However, since the BLA picture is regarded as the start of a new sequence, when the BLA picture is received by the decoder, all parameter information about the picture is received again, unlike the CRA picture.

The BLA picture may be determined by the encoding apparatus, or a CRA picture may be changed into a BLA picture by a system that receives the bitstream from the encoding apparatus. For example, when the bitstream is spliced, the system converts the CRA picture into a BLA picture and provides it to the decoder that decodes the image. In this case, parameter information about the image is newly provided from the system to the decoder. In the present invention, the decoder refers to any device including an image processor for decoding an image, and may be implemented with the decoding apparatus of FIG. 2 or may mean a decoding module, which is a core module for processing an image.

Returning to Table 3, a TFD (tagged for discard) picture is a leading picture that cannot be decoded and can therefore be discarded. The TFD picture is a picture that cannot be decoded normally, for example because it refers to a reference picture that is not available. The TFD picture is excluded from the decoding process and the output process.

A temporal layer access (TLA) picture is a picture indicating a position where up-switching is possible in temporal scalability, and indicates whether up-switching is possible to the sublayer including the TLA picture or to a sublayer having a larger temporal_id than the TLA picture.

In Table 3, nal_unit_type indicating a CRA picture is 4 and 5, and nal_unit_type indicating a BLA picture is 6 and 7. CRA pictures and BLA pictures may be classified as follows.

-Type 4: Coded slice of a CRAT (CRA with TFD) picture

-Type 5: Coded slice of a CRANT (CRA with no TFD) picture

-Type 6: Coded slice of a BLAT (Broken link access unit with TFD) picture

-Type 7: Coded slice of a BLANT (Broken link access unit with no TFD) picture

However, dividing the pictures whose nal_unit_type is 4 to 7 in this way is redundant for distinguishing CRA pictures from BLA pictures. The distinction between a CRA picture accompanied by TFD pictures (CRA with TFD) and a CRA picture not accompanied by TFD pictures (CRA with no TFD) is ambiguous, and it is likewise not practical to distinguish whether a BLA picture is accompanied by TFD pictures. Therefore, dividing CRA pictures and BLA pictures according to the presence or absence of TFD pictures adds redundancy to the NAL unit types and may cause confusion because the distinction is unclear.

Accordingly, an embodiment of the present invention proposes to reconfigure the four types into two types to eliminate redundancy of the NAL unit type.

The new NAL unit types define only a CRA picture and a BLA picture, regardless of the presence or absence of TFD pictures. That is, the CRA picture of Table 3 may be represented by one type that is not divided into Type 4 and Type 5, and the BLA picture may also be represented by one type that is not divided into Type 6 and Type 7.

In the present invention, a picture that becomes a random access point and can be followed by TFD pictures is defined as a CRA picture and represented by one NAL unit type.

On the other hand, a picture that cannot be followed by a TFD picture until a new random access point is defined as a BLA picture and represented by one NAL unit type. That is, there is no TFD picture between the BLA picture and the next random access point.

In summary, a TFD picture, a CRA picture, and a BLA picture are each represented by a separate NAL unit type. The CRA picture and the BLA picture are distinguished only by whether a TFD picture can follow, and neither of them is further divided into two NAL unit types.

By simplifying the four functionally similar NAL unit types to two, the NAL unit type can be more accurately defined, and this simplification has the effect of reducing complexity.
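The reduction from four types to two can be sketched as a simple mapping in Python. The type numbers 4 to 7 and the names CRAT, CRANT, BLAT, and BLANT come from the list above. The names LEGACY_TYPES and simplified_type are hypothetical, and the sketch returns names instead of numbers because the numbering of the simplified types is not given in this passage.

```python
# The four earlier NAL unit types listed above.
LEGACY_TYPES = {
    4: "CRAT",   # CRA with TFD
    5: "CRANT",  # CRA with no TFD
    6: "BLAT",   # BLA with TFD
    7: "BLANT",  # BLA with no TFD
}


def simplified_type(legacy_nal_unit_type: int) -> str:
    """Collapse the four earlier types into the two proposed ones, CRA and BLA."""
    name = LEGACY_TYPES[legacy_nal_unit_type]
    return "CRA" if name.startswith("CRA") else "BLA"


print([simplified_type(t) for t in (4, 5, 6, 7)])  # ['CRA', 'CRA', 'BLA', 'BLA']
```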

FIG. 8A illustrates an original bitstream output from an encoding apparatus, and FIG. 8B illustrates a bitstream provided to a decoder in a system.

A coded slice (CS) of FIG. 8 refers to a slice that is normally encoded, and a number for identifying a picture indicates an output order of a picture, that is, a POC.

If random access occurs in the bitstream of (a), the decoding process starts with the CRA picture. In this case, the system may change the NAL unit type of the CRA picture to that of a BLA picture as shown in (b) and remove from the bitstream all TFD pictures that existed after the CRA picture.

In such a case, the decoder receiving the bitstream of (b) may decode the BLA picture of POC 28 and then sequentially decode the CSs that follow. The decoder may decode a subsequent picture after waiting for a predetermined delay time in order to maintain the picture bit string of the input bitstream, that is, to prevent overflow or underflow of the buffer that stores the pictures.
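The conversion from (a) to (b) can be sketched in Python as follows. A picture is modeled as a hypothetical (kind, POC) pair and the example stream is invented for illustration, with only the POC of 28 for the converted picture taken from the description. TFD pictures are dropped up to the next random access point, in line with the statement that no TFD picture exists between a BLA picture and the next random access point.

```python
from typing import List, Tuple

Picture = Tuple[str, int]  # (kind, POC); a hypothetical model of one coded picture

RANDOM_ACCESS_KINDS = {"IDR", "CRA", "BLA"}


def start_at_cra(stream: List[Picture], cra_index: int) -> List[Picture]:
    """Begin at the CRA picture, relabel it as BLA, and drop the TFD pictures that follow it."""
    kind, poc = stream[cra_index]
    if kind != "CRA":
        raise ValueError("random access must start at a CRA picture in this sketch")
    output = [("BLA", poc)]
    dropping = True
    for kind, poc in stream[cra_index + 1:]:
        if kind in RANDOM_ACCESS_KINDS:
            dropping = False  # the next random access point ends the affected span
        if dropping and kind == "TFD":
            continue
        output.append((kind, poc))
    return output


original = [("CS", 20), ("CRA", 28), ("TFD", 25), ("TFD", 26), ("CS", 29), ("CS", 30)]
print(start_at_cra(original, 1))
# [('BLA', 28), ('CS', 29), ('CS', 30)]
```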

As shown, the encoding apparatus encodes a NAL unit including information related to an image (S901).

The NAL unit header includes layer identification information for identifying the scalable layer in the bitstream supporting the scalable layer. Such layer identification information may be encoded with syntax such as reserved_one_5bits and temporal_id or layer_id.

The encoding apparatus may sequentially encode reserved_one_5bits and temporal_id when the bitstream supports a single layer, and encode layer_id when the bitstream is an extension type other than the basic type.

Alternatively, only one syntax information called layer_id may be encoded by combining reserved_one_5bits and temporal_id regardless of the number of layers supported by the bitstream.

The encoding apparatus encodes information on the NAL unit type in the NAL unit header. Among the VCL NAL units, the pictures that become random access points are the IDR picture, the CRA picture, and the BLA picture, and a picture that is output before the IRAP picture is called a leading picture.

Each of these pictures is identified by different nal_unit_type information.

In the past, CRA pictures and BLA pictures were each represented by two NAL unit types according to whether they were accompanied by TFD pictures, that is, leading pictures that can be removed from the bitstream without being decoded.

According to an embodiment of the present invention, the encoding apparatus encodes a TFD picture, a CRA picture, and a BLA picture as independent NAL unit types regardless of the presence or absence of a TFD picture, and encodes CRA pictures and BLA pictures separately according to whether a TFD picture can follow.

According to another embodiment, the NAL unit header of the NAL unit may not include nal_ref_flag, which is information indicating whether the NAL unit includes a slice containing part or all of a non-reference picture.

In addition, according to another embodiment of the present invention, the encoding apparatus may encode the BLA picture with new nal_unit_type information so as to identify the case where a decodable leading picture other than a TFD picture exists among the leading pictures encoded after the BLA picture.

The encoding apparatus transmits the NAL unit including the information related to the encoded image to the decoding apparatus in a bitstream (S902).

Referring to FIG. 10, the decoding apparatus receives a NAL unit including information related to an image encoded through a bitstream (S1001).

The NAL unit header includes layer identification information for identifying a scalable layer in a bitstream supporting scalable layers, and nal_unit_type information for classifying NAL units according to their properties.

The decoding apparatus parses the header of the NAL unit and the NAL payload (S1002). Parsing of the image information may be performed by an entropy decoding unit or a separate parser.

The decoding apparatus may obtain various information included in the NAL unit header and the NAL payload through parsing.

The decoding apparatus may receive reserved_one_5bits and temporal_id, which are information for identifying a layer in a bitstream supporting a single layer, and parse the two pieces of information at once. In this case, they can be parsed in a pattern similar to that of layer_id, which is information for identifying a layer in a bitstream supporting a plurality of layers.

The decoding apparatus may also parse nal_unit_type to classify the picture type and process the image accordingly. For example, since an IDR picture, a CRA picture, and a BLA picture are pictures that become random access points, image processing corresponding to an I slice is performed, and the TFD picture is not decoded.

If the decoding apparatus can change the CRA picture into a BLA picture, it may remove from the bitstream, or not decode, the TFD pictures received after the picture changed into the BLA picture.

In the exemplary system described above, the methods are described based on a flowchart as a series of steps or blocks, but the invention is not limited to the order of the steps, and certain steps may occur in a different order from, or concurrently with, other steps described above. In addition, since the above-described embodiments include examples of various aspects, a combination of the embodiments should also be understood as an embodiment of the present invention. Accordingly, it is intended that the present invention cover all other replacements, modifications, and variations that fall within the scope of the following claims.

Claims

1. An image information decoding method comprising:
receiving a bitstream including a Network Abstraction Layer (NAL) unit including information related to an encoded image; and
parsing a NAL unit header of the NAL unit,
wherein the NAL unit header includes layer information including reserved_one_5bits for identifying an extended layer in an extended bitstream and temporal_id for identifying a temporal layer of the bitstream, and
wherein, in the layer information, reserved_one_5bits is received before temporal_id.
2. The method of claim 1,
wherein reserved_one_5bits and temporal_id are parsed simultaneously.
3. An image information decoding method comprising:
receiving a bitstream including a Network Abstraction Layer (NAL) unit including information related to an encoded image; and
parsing a NAL unit header of the NAL unit,
wherein the NAL unit header includes NAL unit type information corresponding to a NAL unit type, and
wherein the NAL unit type includes a clean random access (CRA) picture type, for a picture that becomes a random access point when a group of pictures has an open structure, and a broken link access (BLA) picture type, for a picture existing in the middle of the bitstream as a random access point when a coded picture is spliced or the bitstream is broken in the middle.
4. The method of claim 3,
wherein the CRA picture has one NAL unit type irrespective of a leading picture, which is a picture that is output before the picture that becomes the random access point and decoded after it.
5. The method of claim 3,
wherein no leading picture, which is a picture that is output before the picture that becomes the random access point and decoded after it, exists after the BLA picture.
6. The method of claim 3,
wherein, among the leading pictures that are output before the picture that becomes the random access point and decoded after it, no leading picture that is removed without being decoded exists after the BLA picture.