WO2013065301A1

WO2013065301A1 - Video coding device, video coding method, video coding program, transmission device, transmission method, and transmission program, as well as video decoding device, video decoding method, video decoding program, reception device, reception method, and reception program

Info

Publication number: WO2013065301A1
Application number: PCT/JP2012/006981
Authority: WO
Inventors: 博哉中村; 英樹竹原; 福島　茂; 上田　基晴
Original assignee: 株式会社Ｊｖｃケンウッド
Priority date: 2011-10-31
Filing date: 2012-10-31
Publication date: 2013-05-10

Abstract

An inter-prediction information derivation unit (104) derives inter-prediction information candidates from inter-prediction information of coded prediction blocks that are contiguous to the prediction block to be coded within the same picture, and from inter-prediction information of prediction blocks in a coded picture different from that of the prediction block to be coded. The inter-prediction information derivation unit (104) determines, from among the derived candidates, the inter-prediction information candidate to be used for the inter-prediction of the prediction block to be coded. A second coded bit sequence generation unit (110) encodes the index of the inter-prediction information candidate on the basis of the number of inter-prediction information candidates.

Description

Video encoding device, video encoding method, video encoding program, transmission device, transmission method and transmission program, and video decoding device, video decoding method, video decoding program, reception device, reception method and reception program

The present invention relates to a moving image encoding and decoding technique, and more particularly to a moving image encoding and decoding technique using motion compensated prediction.

MPEG-4 AVC/H.264 is a representative video compression coding standard. MPEG-4 AVC/H.264 uses motion compensation, in which a picture is divided into a plurality of rectangular blocks, a picture that has already been encoded and decoded is used as a reference picture, and motion relative to the reference picture is predicted. This method of predicting motion by motion compensation is called inter prediction or motion compensated prediction. In inter prediction in MPEG-4 AVC/H.264, a plurality of pictures can be used as reference pictures, and motion compensation is performed by selecting, for each block, the most suitable reference picture from among them. For this purpose, a reference index is assigned to each reference picture, and the reference picture is specified by this reference index. For B pictures, up to two reference pictures can be selected from the encoded and decoded pictures and used for inter prediction. Prediction from these two reference pictures is distinguished as L0 prediction (list 0 prediction), mainly used as forward prediction, and L1 prediction (list 1 prediction), mainly used as backward prediction.

Furthermore, bi-prediction, which uses the two inter predictions of L0 prediction and L1 prediction simultaneously, is also defined. In bi-prediction, prediction is performed in both directions: the inter prediction signals of L0 prediction and L1 prediction are each multiplied by a weighting coefficient, an offset value is added, and the results are superimposed to generate the final inter prediction image signal. As the weighting coefficients and offset values used for weighted prediction, representative values are set and encoded for each reference picture of each list. The coding information related to inter prediction consists, for each block, of a prediction mode that distinguishes among L0 prediction, L1 prediction, and bi-prediction; a reference index that specifies the reference picture for each reference list; and a motion vector that represents the direction and amount of movement of the block. These pieces of coding information are encoded and decoded.

Furthermore, MPEG-4 AVC/H.264 defines a direct mode, in which the inter prediction information of the block to be encoded or decoded is generated from the inter prediction information of an already encoded or decoded block. Because the direct mode does not require encoding of inter prediction information, coding efficiency is improved.

The temporal direct mode, which exploits the correlation of inter prediction information in the time direction, will be described with reference to FIG. The picture whose L1 reference index is registered as 0 is defined as the base picture colPic. The block in the base picture colPic at the same position as the block to be encoded or decoded is defined as the reference block.

If the reference block was encoded using L0 prediction, the L0 motion vector of the reference block is set as the reference motion vector mvCol. If the reference block was not encoded using L0 prediction but was encoded using L1 prediction, the L1 motion vector of the reference block is set as the reference motion vector mvCol. The picture referred to by the reference motion vector mvCol is the L0 reference picture in the temporal direct mode, and the base picture colPic is the L1 reference picture in the temporal direct mode.

The L0 motion vector mvL0 and the L1 motion vector mvL1 in the temporal direct mode are derived from the reference motion vector mvCol by scaling calculation processing.

The inter-picture distance td is derived by subtracting the POC of the L0 reference picture in the temporal direct mode from the POC of the base picture colPic. Note that POC is a variable associated with the picture to be encoded, and is set to a value that increases by 1 in the picture output order. The difference in POC between two pictures indicates the inter-picture distance in the time axis direction.
td = (POC of base picture colPic) - (POC of L0 reference picture in temporal direct mode)

The inter-picture distance tb is derived by subtracting the POC of the L0 reference picture in the temporal direct mode from the POC of the picture to be encoded or decoded.
tb = (POC of picture to be encoded or decoded) - (POC of L0 reference picture in temporal direct mode)

The motion vector mvL0 of L0 in the temporal direct mode is derived from the reference motion vector mvCol by scaling calculation processing.
mvL0 = tb / td * mvCol

The motion vector mvL1 of L1 is derived by subtracting the reference motion vector mvCol from the motion vector mvL0 of L0 in the temporal direct mode.
mvL1 = mvL0 - mvCol
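As an illustration of the scaling calculation described above, the following Python sketch computes td, tb, mvL0 and mvL1 from POC values. The function name, the argument names, and the use of exact fractions are assumptions made for illustration only; the text above does not specify rounding or fixed-point arithmetic, which an actual codec would define.

```python
from fractions import Fraction


def temporal_direct_mvs(poc_cur, poc_col, poc_l0_ref, mv_col):
    """Derive (td, tb, mvL0, mvL1) for the temporal direct mode.

    poc_cur    -- POC of the picture to be encoded or decoded
    poc_col    -- POC of the base picture colPic
    poc_l0_ref -- POC of the L0 reference picture in temporal direct mode
    mv_col     -- reference motion vector mvCol as an (x, y) pair

    Hypothetical helper: exact rational arithmetic is used here only to
    keep the example simple; no rounding rule is taken from the text.
    """
    td = poc_col - poc_l0_ref          # td = POC(colPic) - POC(L0 ref)
    tb = poc_cur - poc_l0_ref          # tb = POC(current) - POC(L0 ref)
    if td == 0:
        raise ValueError("td is zero; scaling is undefined")
    scale = Fraction(tb, td)
    mv_l0 = tuple(scale * c for c in mv_col)              # mvL0 = tb / td * mvCol
    mv_l1 = tuple(a - b for a, b in zip(mv_l0, mv_col))   # mvL1 = mvL0 - mvCol
    return td, tb, mv_l0, mv_l1


# Current picture midway between the L0 reference (POC 0) and colPic (POC 4).
td, tb, mv_l0, mv_l1 = temporal_direct_mvs(2, 4, 0, (8, -4))
print(td, tb, mv_l0, mv_l1)
```

With the current picture halfway between the two reference pictures, mvL0 is half of mvCol and mvL1 is its negative, which matches the intuition of a block moving at constant velocity.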

JP 2004-129191 A

In the conventional method, the number of motion information candidates referred to for each block in motion compensation is fixed, so coding efficiency may fail to improve.

Under such circumstances, the present inventors have come to recognize the necessity of further compressing the encoded information and reducing the overall code amount in the moving image encoding method using motion compensated prediction.

The present invention has been made in view of such a situation, and an object of the present invention is to provide a moving picture encoding and decoding technique that reduces the code amount of coding information and improves coding efficiency by calculating candidates for the coding information.

In order to solve the above-described problem, a video encoding device according to an aspect of the present invention is a video encoding device that encodes a video using inter prediction in units of blocks obtained by dividing each picture, and includes: a prediction information deriving unit (104) that derives inter prediction information candidates from inter prediction information of an encoded prediction block adjacent to the encoding target prediction block in the same picture as the encoding target prediction block, and from inter prediction information of a prediction block in an encoded picture different from that of the encoding target prediction block; a determination unit that determines, from the derived inter prediction information candidates, the inter prediction information candidate to be used for inter prediction of the encoding target prediction block; a first encoding unit (110) that encodes a syntax element indicating the number of inter prediction information candidates; and a second encoding unit (110) that encodes an index indicating the inter prediction information candidate determined by the determination unit, based on the number of inter prediction information candidates.

Another aspect of the present invention is also a moving picture coding apparatus. This apparatus is a moving picture encoding apparatus that encodes a moving picture using motion compensated prediction in units of blocks obtained by dividing each picture of the moving picture, and includes: a prediction information deriving unit (104) that derives inter prediction information candidates from inter prediction information of a prediction block adjacent to the prediction block to be encoded, or of a prediction block located at or near the same position as the prediction block to be encoded in an encoded picture temporally different from that of the prediction block to be encoded; a candidate number limiting unit (135) that limits the number of inter prediction information candidates according to at least one of a profile indicating a set of processing functions and a level indicating decoding processing capability; a motion compensated prediction unit (105) that selects one inter prediction information candidate from the limited number of candidates and performs inter prediction of the prediction block to be encoded using the selected candidate; and an encoding unit (110) that encodes a syntax element indicating the number of inter prediction information candidates used to limit the number of candidates.

Still another aspect of the present invention is a video encoding method. This method is a moving picture coding method for coding a moving picture using inter prediction in units of blocks obtained by dividing each picture, and includes: a prediction information deriving step of deriving inter prediction information candidates from inter prediction information of an encoded prediction block neighboring the encoding target prediction block in the same picture as the encoding target prediction block, and from inter prediction information of a prediction block in an encoded picture different from that of the encoding target prediction block; a determination step of determining, from the derived inter prediction information candidates, the inter prediction information candidate to be used for inter prediction of the encoding target prediction block; a first encoding step of encoding a syntax element indicating the number of inter prediction information candidates; and a second encoding step of encoding an index indicating the inter prediction information candidate determined in the determination step, based on the number of inter prediction information candidates.

Still another aspect of the present invention is a transmission device. This device includes: a packet processing unit that packetizes an encoded bit string, encoded by a moving picture encoding method that encodes a moving picture using inter prediction in units of blocks obtained by dividing each picture, to obtain encoded data; and a transmission unit that transmits the packetized encoded data. The moving picture encoding method includes: a prediction information deriving step of deriving inter prediction information candidates from inter prediction information of an encoded prediction block adjacent to the encoding target prediction block in the same picture as the encoding target prediction block, and from inter prediction information of a prediction block in an encoded picture different from that of the encoding target prediction block; a determination step of determining, from the derived inter prediction information candidates, the inter prediction information candidate to be used for inter prediction of the encoding target prediction block; a first encoding step of encoding a syntax element indicating the number of inter prediction information candidates; and a second encoding step of encoding an index indicating the inter prediction information candidate determined in the determination step, based on the number of inter prediction information candidates.

Still another aspect of the present invention is a transmission method. This method includes: a packet processing step of packetizing an encoded bit string, encoded by a moving picture encoding method that encodes a moving picture using inter prediction in units of blocks obtained by dividing each picture, to obtain encoded data; and a transmission step of transmitting the packetized encoded data. The moving picture encoding method includes: a prediction information deriving step of deriving inter prediction information candidates from inter prediction information of an encoded prediction block adjacent to the encoding target prediction block in the same picture as the encoding target prediction block, and from inter prediction information of a prediction block in an encoded picture different from that of the encoding target prediction block; a determination step of determining, from the derived inter prediction information candidates, the inter prediction information candidate to be used for inter prediction of the encoding target prediction block; a first encoding step of encoding a syntax element indicating the number of inter prediction information candidates; and a second encoding step of encoding an index indicating the inter prediction information candidate determined in the determination step, based on the number of inter prediction information candidates.

A moving picture decoding apparatus according to an aspect of the present invention is a moving picture decoding apparatus that decodes an encoded bit string in which a moving picture is encoded using inter prediction in units of blocks obtained by dividing each picture, and includes: a prediction information deriving unit (205) that derives inter prediction information candidates from inter prediction information of a decoded prediction block neighboring the prediction block to be decoded in the same picture as the prediction block to be decoded, and from inter prediction information of a prediction block in a decoded picture different from that of the prediction block to be decoded; a first decoding unit (202) that decodes a syntax element indicating the number of inter prediction information candidates and acquires the number of inter prediction information candidates; a second decoding unit (202) that decodes, based on the number of inter prediction information candidates acquired by the first decoding unit, an index indicating the inter prediction information candidate used for inter prediction of the prediction block to be decoded; and a selection unit (205) that selects the inter prediction information candidate indicated by the index from the inter prediction information candidates derived by the prediction information deriving unit.

Another aspect of the present invention is also a video decoding device. This apparatus is a moving picture decoding apparatus that decodes an encoded bit string in which a moving picture is encoded using motion compensated prediction in units of blocks obtained by dividing each picture of the moving picture, and includes: a prediction information deriving unit (205) that derives inter prediction information candidates from inter prediction information of a prediction block adjacent to the prediction block to be decoded, or of a prediction block located at or near the same position as the prediction block to be decoded in a decoded picture temporally different from that of the prediction block to be decoded; a first decoding unit (202) that decodes a syntax element indicating the number of inter prediction information candidates and acquires the number of inter prediction information candidates; a candidate number limiting unit (23) that limits the number of inter prediction information candidates using the number of inter prediction information candidates acquired by the first decoding unit; a second decoding unit (202) that decodes, based on the number of inter prediction information candidates acquired by the first decoding unit, an index indicating the inter prediction information candidate that becomes the inter prediction information of the prediction block to be decoded; and a motion compensation prediction unit (206) that selects the inter prediction information candidate indicated by the decoded index from the limited number of inter prediction information candidates and performs inter prediction of the prediction block to be decoded using the selected candidate.

Still another aspect of the present invention is a moving picture decoding method. This method is a moving picture decoding method for decoding an encoded bit string in which a moving picture is encoded using inter prediction in units of blocks obtained by dividing each picture, and includes: a prediction information deriving step of deriving inter prediction information candidates from inter prediction information of a decoded prediction block neighboring the prediction block to be decoded in the same picture as the prediction block to be decoded, and from inter prediction information of a prediction block in a decoded picture different from that of the prediction block to be decoded; a first decoding step of decoding a syntax element indicating the number of inter prediction information candidates and acquiring the number of inter prediction information candidates; a second decoding step of decoding, based on the number of inter prediction information candidates acquired in the first decoding step, an index indicating the inter prediction information candidate used for inter prediction of the prediction block to be decoded; and a selection step of selecting the inter prediction information candidate indicated by the index from the inter prediction information candidates derived in the prediction information deriving step.

Still another aspect of the present invention is a receiving device. This apparatus is a receiving apparatus that receives and decodes an encoded bit string in which a moving picture is encoded, and includes: a receiving unit that receives encoded data obtained by packetizing an encoded bit string in which a moving picture is encoded using inter prediction in units of blocks obtained by dividing each picture; a restoration unit that performs packet processing on the received encoded data to restore the original encoded bit string; a prediction information deriving unit (205) that derives inter prediction information candidates from inter prediction information of a decoded prediction block neighboring the prediction block to be decoded in the same picture as the prediction block to be decoded, and from inter prediction information of a prediction block in a decoded picture different from that of the prediction block to be decoded; a first decoding unit (202) that decodes, from the restored encoded bit string, a syntax element indicating the number of inter prediction information candidates and acquires the number of inter prediction information candidates; a second decoding unit (202) that decodes, from the restored encoded bit string and based on the number of inter prediction information candidates acquired by the first decoding unit, an index indicating the inter prediction information candidate used for inter prediction of the prediction block to be decoded; and a selection unit (205) that selects the inter prediction information candidate indicated by the index from the inter prediction information candidates derived by the prediction information deriving unit.

Still another aspect of the present invention is a receiving method. This method is a receiving method for receiving and decoding an encoded bit string in which a moving picture is encoded, and includes: a receiving step of receiving encoded data obtained by packetizing an encoded bit string in which a moving picture is encoded using inter prediction in units of blocks obtained by dividing each picture; a restoration step of performing packet processing on the received encoded data to restore the original encoded bit string; a prediction information deriving step of deriving inter prediction information candidates from inter prediction information of a decoded prediction block neighboring the prediction block to be decoded in the same picture as the prediction block to be decoded, and from inter prediction information of a prediction block in a decoded picture different from that of the prediction block to be decoded; a first decoding step of decoding, from the restored encoded bit string, a syntax element indicating the number of inter prediction information candidates and acquiring the number of inter prediction information candidates; a second decoding step of decoding, from the restored encoded bit string and based on the number of inter prediction information candidates acquired in the first decoding step, an index indicating the inter prediction information candidate used for inter prediction of the prediction block to be decoded; and a selection step of selecting the inter prediction information candidate indicated by the index from the inter prediction information candidates derived in the prediction information deriving step.

It should be noted that an arbitrary combination of the above-described components and a conversion of the expression of the present invention between a method, an apparatus, a system, a recording medium, a computer program, etc. are also effective as an aspect of the present invention.

According to the present invention, it is possible to reduce the amount of generated code of encoded information to be transmitted and improve the encoding efficiency.

A block diagram showing the configuration of a moving picture encoding device that performs the motion vector prediction method according to the embodiment.
A block diagram showing the configuration of a moving picture decoding device that performs the motion vector prediction method according to the embodiment.
A diagram explaining tree blocks and coding blocks.
A diagram explaining the partition modes of a prediction block.
A diagram explaining the prediction blocks of spatial merge candidates in merge mode.
A diagram explaining the prediction blocks of spatial merge candidates in merge mode.
A diagram explaining the prediction blocks of spatial merge candidates in merge mode.
A diagram explaining the prediction blocks of spatial merge candidates in merge mode.
A diagram explaining the prediction blocks of temporal merge candidates in merge mode.
A diagram explaining the bitstream syntax in units of prediction blocks for merge mode.
A diagram explaining an example of the entropy code of the merge index syntax element.
A block diagram showing the detailed configuration of the inter prediction information deriving unit of the moving picture encoding device of FIG.
A block diagram showing the detailed configuration of the inter prediction information deriving unit of the moving picture decoding device of FIG.
A flowchart explaining the merge candidate derivation process of merge mode and the procedure for constructing the merge candidate list.
A flowchart explaining the spatial merge candidate derivation procedure of merge mode.
A flowchart explaining the procedure for deriving the reference index of the temporal merge candidate of merge mode.
A flowchart explaining the temporal merge candidate derivation procedure of merge mode.
A flowchart explaining the procedure for deriving the picture of a different time in merge mode.
A flowchart explaining the procedure for deriving the prediction block of the picture of a different time in merge mode.
A flowchart explaining the temporal merge candidate derivation procedure of merge mode.
A flowchart explaining the temporal merge candidate derivation procedure of merge mode.
A flowchart explaining the scaling calculation procedure for a motion vector.
A flowchart explaining the scaling calculation procedure for a motion vector.
A flowchart explaining the procedure for registering merge candidates in the merge candidate list of merge mode.
A flowchart explaining the procedure, common to the encoding side and the decoding side, for setting the final number of merge candidates finalNumMergeCand according to the method of Embodiment 1.
A diagram explaining the bitstream syntax of the slice header for merge mode.
A flowchart explaining the procedure for setting the final number of merge candidates finalNumMergeCand on the encoding side according to the method of Embodiment 2.
A flowchart explaining the procedure for setting the final number of merge candidates finalNumMergeCand on the decoding side according to the method of Embodiment 2.
A diagram explaining the temporal direct mode of conventional MPEG-4 AVC/H.264.

The present embodiment concerns moving picture coding, and in particular aims to improve coding efficiency in moving picture coding in which a picture is divided into rectangular blocks of arbitrary size and shape and motion compensation is performed between pictures in units of blocks. To this end, a plurality of predicted motion vectors are derived from the motion vectors of blocks adjacent to the encoding target block or of blocks in an encoded picture, and the code amount is reduced by calculating and encoding the difference vector between the motion vector of the encoding target block and the selected predicted motion vector. Alternatively, the code amount is reduced by deriving the coding information of the encoding target block from the coding information of blocks adjacent to the encoding target block or of blocks in an encoded picture. In moving picture decoding, a plurality of predicted motion vectors are calculated from the motion vectors of blocks adjacent to the decoding target block or of blocks in a decoded picture, and the motion vector of the decoding target block is calculated from the difference vector decoded from the encoded stream and the selected predicted motion vector, and is used for decoding. Alternatively, the coding information of the decoding target block is derived from the coding information of blocks adjacent to the decoding target block or of blocks in a decoded picture.

First, technologies and technical terms used in the present embodiment are defined.

(About tree blocks and coding blocks)
In the embodiment, a slice, obtained by dividing a picture into one or more parts, is the basic unit of encoding, and a slice type, which is information indicating the kind of slice, is set for each slice. As shown in FIG. 3, a slice is divided evenly into square units of an arbitrary identical size. This unit is defined as a tree block and is used as the basic unit of address management for specifying the block to be processed within the slice (the block to be encoded in the encoding process, and the block to be decoded in the decoding process). Except in the monochrome case, a tree block consists of one luminance signal and two color difference signals. The size of the tree block can be freely set to a power of 2 according to the picture size and the texture in the picture. To optimize the encoding process according to the texture in the picture, the luminance signal and color difference signals in a tree block can, as necessary, be hierarchically divided into four (two vertically and two horizontally), yielding blocks of smaller size. Each such block is defined as a coding block and is the basic unit of processing in encoding and decoding. Except in the monochrome case, a coding block also consists of one luminance signal and two color difference signals. The maximum size of a coding block is the same as the size of the tree block. A coding block of the minimum coding block size is called a minimum coding block, and its size can also be freely set to a power of 2.

In FIG. 3, coding block A is a single coding block formed from a tree block without division. Coding block B is a coding block obtained by dividing a tree block into four. Coding block C is a coding block obtained by dividing into four again a block obtained by dividing the tree block into four. Coding block D is a coding block obtained by hierarchically dividing into four twice more a block obtained by dividing the tree block into four, and is a coding block of the minimum size.
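The hierarchical division into four can be sketched as follows. This is a minimal illustration, assuming a 64x64 tree block (the text gives no concrete size) and taking coding blocks A, B, C and D to sit at division depths 0, 1, 2 and 3 respectively; the function names and the ordering of the four sub-blocks are hypothetical.

```python
def coding_block_size(tree_block_size, depth):
    """Side length of a coding block after `depth` divisions into four."""
    if tree_block_size <= 0 or tree_block_size & (tree_block_size - 1):
        raise ValueError("tree block size must be a power of 2")
    size = tree_block_size >> depth   # each division halves width and height
    if size < 1:
        raise ValueError("depth too large for this tree block size")
    return size


def split_into_four(x, y, size):
    """Split a square block at (x, y) into four equal squares.

    Returns (x, y, size) tuples in the order upper left, upper right,
    lower left, lower right (an ordering assumed for this example).
    """
    half = size // 2
    return [
        (x, y, half),
        (x + half, y, half),
        (x, y + half, half),
        (x + half, y + half, half),
    ]


# Assumed 64x64 tree block: sizes of coding blocks A, B, C, D.
for name, depth in zip("ABCD", range(4)):
    print(name, coding_block_size(64, depth))
```

Under these assumptions the minimum coding block D would be 8x8, one eighth of the tree block side.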

(About prediction mode)
For each coding block, switching is performed between intra prediction (MODE_INTRA) in which prediction is performed from a decoded surrounding image signal and inter prediction (MODE_INTER) in which prediction is performed from an image signal of a decoded picture. A mode for identifying the intra prediction (MODE_INTRA) and the inter prediction (MODE_INTER) is defined as a prediction mode (PredMode). The prediction mode (PredMode) has intra prediction (MODE_INTRA) or inter prediction (MODE_INTER) as a value, and can be selected and encoded.

(About split mode, prediction block, prediction unit)
When a picture is divided into blocks and intra prediction (MODE_INTRA) or inter prediction (MODE_INTER) is performed, the coding block is divided as necessary before prediction so that the unit for switching the intra prediction and inter prediction methods becomes smaller. A mode for identifying the method of dividing the luminance signal and the color difference signals of the coding block is defined as a partition mode (PartMode), and each block resulting from the division is defined as a prediction block. As shown in FIG. 4, four types of partition mode (PartMode) are defined according to the method of dividing the luminance signal of the coding block.
The partition mode (PartMode) in which the luminance signal of the coding block is not divided and is treated as one prediction block (FIG. 4A) is defined as 2N × 2N partition (PART_2Nx2N); the partition mode in which the luminance signal of the coding block is divided in two in the horizontal direction into two prediction blocks, upper and lower (FIG. 4B), is defined as 2N × N partition (PART_2NxN); the partition mode in which the luminance signal of the coding block is divided in the vertical direction into two prediction blocks, left and right (FIG. 4C), is defined as N × 2N partition (PART_Nx2N); and the partition mode in which the luminance signal of the coding block is divided equally both horizontally and vertically into four prediction blocks (FIG. 4D) is defined as N × N partition (PART_NxN). Except for the N × N partition (PART_NxN) of intra prediction (MODE_INTRA), the color difference signals are also divided, for each partition mode (PartMode), at the same vertical and horizontal division ratios as the luminance signal.

In order to identify each prediction block within a coding block, numbers starting from 0 are assigned to the prediction blocks inside the coding block in coding order. This number is defined as the partition index PartIdx. The number written in each prediction block of the coding blocks in FIG. 4 represents the partition index PartIdx of that prediction block. In the 2N × N partition (PART_2NxN) shown in FIG. 4B, the partition index PartIdx of the upper prediction block is 0 and that of the lower prediction block is 1. In the N × 2N partition (PART_Nx2N) shown in FIG. 4C, the partition index PartIdx of the left prediction block is 0 and that of the right prediction block is 1. In the N × N partition (PART_NxN) shown in FIG. 4D, the partition index PartIdx of the upper left prediction block is 0, that of the upper right prediction block is 1, that of the lower left prediction block is 2, and that of the lower right prediction block is 3.
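The relationship between the partition mode and the partition index PartIdx can be sketched in Python as follows. The function name and the (x, y, width, height) tuple layout are assumptions for illustration; the position in the returned list plays the role of PartIdx.

```python
def prediction_blocks(part_mode, x, y, size):
    """Return the luminance prediction blocks of a coding block.

    The coding block has its upper left pixel at (x, y) and side `size`.
    Each entry is (x, y, width, height); the list index is PartIdx.
    Hypothetical helper, not part of the described device.
    """
    half = size // 2
    layouts = {
        # one prediction block covering the whole coding block
        "PART_2Nx2N": [(x, y, size, size)],
        # upper block (PartIdx 0), lower block (PartIdx 1)
        "PART_2NxN": [(x, y, size, half), (x, y + half, size, half)],
        # left block (PartIdx 0), right block (PartIdx 1)
        "PART_Nx2N": [(x, y, half, size), (x + half, y, half, size)],
        # upper left, upper right, lower left, lower right (PartIdx 0..3)
        "PART_NxN": [
            (x, y, half, half),
            (x + half, y, half, half),
            (x, y + half, half, half),
            (x + half, y + half, half, half),
        ],
    }
    return layouts[part_mode]


# PartIdx 1 of a 16x16 coding block in 2N x N partition is the lower half.
print(prediction_blocks("PART_2NxN", 0, 0, 16)[1])
```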

When the prediction mode (PredMode) is inter prediction (MODE_INTER), 2N × 2N partition (PART_2Nx2N), 2N × N partition (PART_2NxN), and N × 2N partition (PART_Nx2N) are defined as partition modes (PartMode) for every coding block other than the coding block D, which is the smallest coding block. Only for the coding block D, the smallest coding block, is N × N partition (PART_NxN) defined in addition to 2N × 2N partition (PART_2Nx2N), 2N × N partition (PART_2NxN), and N × 2N partition (PART_Nx2N). N × N partition (PART_NxN) is not defined for coding blocks other than the smallest one because such a coding block can instead be divided into four smaller coding blocks, which expresses the same division.
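The partition modes and the PartIdx ordering described above can be sketched as follows. This is only an illustrative Python sketch: the function names `prediction_blocks` and `allowed_inter_part_modes`, the use of strings for PartMode, and the `(x, y, width, height)` rectangle tuple are assumptions introduced here and are not part of the embodiment.

```python
from typing import List, Tuple

# (x, y, width, height) in luminance pixels; hypothetical representation.
Rect = Tuple[int, int, int, int]


def prediction_blocks(x0: int, y0: int, size: int, part_mode: str) -> List[Rect]:
    """Return the prediction blocks of a size x size coding block.

    The position in the returned list is the partition index PartIdx.
    """
    n = size // 2
    if part_mode == "PART_2Nx2N":  # one undivided block
        return [(x0, y0, size, size)]
    if part_mode == "PART_2NxN":   # upper block = 0, lower block = 1
        return [(x0, y0, size, n), (x0, y0 + n, size, n)]
    if part_mode == "PART_Nx2N":   # left block = 0, right block = 1
        return [(x0, y0, n, size), (x0 + n, y0, n, size)]
    if part_mode == "PART_NxN":    # upper left, upper right, lower left, lower right
        return [(x0, y0, n, n), (x0 + n, y0, n, n),
                (x0, y0 + n, n, n), (x0 + n, y0 + n, n, n)]
    raise ValueError(f"unknown partition mode: {part_mode}")


def allowed_inter_part_modes(is_smallest_coding_block: bool) -> List[str]:
    """Partition modes defined for inter prediction (MODE_INTER)."""
    modes = ["PART_2Nx2N", "PART_2NxN", "PART_Nx2N"]
    if is_smallest_coding_block:
        modes.append("PART_NxN")  # defined only for the smallest coding block
    return modes
```

For example, `prediction_blocks(0, 0, 16, "PART_2NxN")` yields the upper block at index 0 and the lower block at index 1, matching the numbering in FIG. 4B.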

(Position of tree block, coding block, prediction block, transform block)
The position of each block in the present embodiment, including the tree block, the coding block, the prediction block, and the transform block, is expressed by the two-dimensional coordinates (x, y) of the upper left luminance pixel contained in the block area, with the position of the upper left luminance pixel of the picture taken as the origin (0, 0). The coordinate axes are positive toward the right in the horizontal direction and downward in the vertical direction, and the unit is one pixel of the luminance signal. This applies not only when the color difference format is 4:4:4, in which the luminance signal and the color difference signal have the same image size (number of pixels), but also when the color difference format is 4:2:0 or 4:2:2, in which the luminance signal and the color difference signal have different image sizes: the position of each block of the color difference signal is expressed by the coordinates of the luminance pixel contained in the block area, and the unit is one pixel of the luminance signal. In this way, not only can the position of each block of the color difference signal be specified, but the positional relationship between a luminance signal block and a color difference signal block also becomes clear simply by comparing the coordinate values.
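Because block positions are always expressed in luminance pixel units, locating the corresponding color difference sample requires only the subsampling ratio of the color difference format. The following Python sketch illustrates this; the table `CHROMA_SUBSAMPLING` and the function `chroma_sample_position` are hypothetical names introduced for illustration, while the horizontal and vertical ratios are the usual ones for the 4:4:4, 4:2:2, and 4:2:0 formats.

```python
from typing import Tuple

# (horizontal, vertical) subsampling factors of the color difference signal.
CHROMA_SUBSAMPLING = {
    "4:4:4": (1, 1),  # same image size as the luminance signal
    "4:2:2": (2, 1),  # half width, full height
    "4:2:0": (2, 2),  # half width, half height
}


def chroma_sample_position(x_luma: int, y_luma: int, chroma_format: str) -> Tuple[int, int]:
    """Map a block position given in luminance pixels to the color difference sample grid."""
    sub_w, sub_h = CHROMA_SUBSAMPLING[chroma_format]
    return x_luma // sub_w, y_luma // sub_h
```

With this convention, a luminance block and a color difference block that share the same (x, y) cover the same picture area regardless of the color difference format.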

(Inter prediction mode, reference list)
In the embodiment of the present invention, in inter prediction, in which prediction is performed from the image signal of a decoded picture, a plurality of decoded pictures can be used as reference pictures. In order to identify the reference picture selected from the plurality of reference pictures, a reference index is attached to each prediction block. Any two reference pictures can be selected for each prediction block to perform inter prediction, and the inter prediction modes are L0 prediction (Pred_L0), L1 prediction (Pred_L1), and bi-prediction (Pred_BI). The reference pictures are managed in the list structures L0 (reference list 0) and L1 (reference list 1), and a reference picture is specified by designating a reference index of L0 or L1. L0 prediction (Pred_L0) is inter prediction that refers to a reference picture managed in L0, L1 prediction (Pred_L1) is inter prediction that refers to a reference picture managed in L1, and bi-prediction (Pred_BI) is inter prediction in which both L0 prediction and L1 prediction are performed, referring to one reference picture managed in each of L0 and L1. Only L0 prediction can be used in inter prediction of a P slice, whereas L0 prediction, L1 prediction, and bi-prediction (Pred_BI), which averages or weights and adds L0 prediction and L1 prediction, can be used in inter prediction of a B slice. In the subsequent processing, constants and variables with the suffix LX are assumed to be processed for each of L0 and L1.

(Merge mode, merge candidate)
In the merge mode, inter prediction information such as the prediction mode, the reference index, and the motion vector of the prediction block to be encoded or decoded is not itself encoded or decoded. Instead, the inter prediction information of the prediction block to be encoded or decoded is derived from the inter prediction information of a prediction block adjacent to it within the same picture, or of a prediction block located at or near the same position in an already encoded or decoded picture that differs in time from the picture containing it, and inter prediction is performed with the derived information. A prediction block adjacent to the prediction block to be encoded or decoded within the same picture, together with its inter prediction information, is a spatial merge candidate. A prediction block located at or near the same position as the prediction block to be encoded or decoded in an already encoded or decoded picture that differs in time, together with the inter prediction information derived from the inter prediction information of that prediction block, is a time merge candidate (temporal merge candidate). Each merge candidate is registered in the merge candidate list, and the merge candidate used in inter prediction is specified by the merge index.
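The relationship between merge candidates, the merge candidate list, and the merge index can be pictured with the following minimal Python sketch. It is not the derivation procedure of the embodiment: the class `InterPredInfo`, the functions `build_merge_candidate_list` and `select_merge_candidate`, the ordering (spatial candidates before the time merge candidate), and the use of -1 for an unused reference index are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class InterPredInfo:
    """Inter prediction information carried by one merge candidate (hypothetical layout)."""
    pred_flag_l0: int
    pred_flag_l1: int
    ref_idx_l0: int            # -1 when L0 is unused (assumption)
    ref_idx_l1: int            # -1 when L1 is unused (assumption)
    mv_l0: Tuple[int, int]
    mv_l1: Tuple[int, int]


def build_merge_candidate_list(spatial: Sequence[InterPredInfo],
                               temporal: Sequence[InterPredInfo]) -> List[InterPredInfo]:
    # Assumed ordering: spatial merge candidates first, then time merge candidates.
    return list(spatial) + list(temporal)


def select_merge_candidate(merge_cand_list: Sequence[InterPredInfo],
                           merge_idx: int) -> InterPredInfo:
    # The target prediction block takes over the inter prediction
    # information of the candidate identified by the merge index.
    return merge_cand_list[merge_idx]
```

Only the merge index needs to be carried in the bit stream in this mode, because the encoder and the decoder can build the same list from already processed blocks.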

(About adjacent prediction blocks)
FIGS. 5, 6, 7, and 8 are diagrams for explaining the prediction blocks adjacent to the prediction block to be encoded or decoded within the same picture. FIG. 9 is a diagram for explaining the already encoded or decoded prediction blocks located at or near the same position as the prediction block to be encoded or decoded, in an already encoded or decoded picture that differs in time from the picture containing the prediction block to be encoded or decoded. The prediction blocks adjacent in the spatial direction to the prediction block to be encoded or decoded, and the prediction blocks at the same position at a different time, will be described with reference to FIGS. 5, 6, 7, 8, and 9.

As shown in FIG. 5, within the same picture as the prediction block to be encoded or decoded, a prediction block A adjacent to the left side of the prediction block to be encoded or decoded, a prediction block B adjacent to the upper side, a prediction block C adjacent to the upper right vertex, a prediction block D adjacent to the lower left vertex, and a prediction block E adjacent to the upper left vertex are defined as the prediction blocks adjacent in the spatial direction.

In addition, as shown in FIG. 6, when the prediction blocks adjacent to the left side of the prediction block to be encoded or decoded are smaller than the prediction block to be encoded or decoded and there are a plurality of them, only the lowest prediction block A10 among the prediction blocks adjacent to the left side is used as the prediction block A adjacent to the left side in the present embodiment.

Similarly, when the prediction blocks adjacent to the upper side of the prediction block to be encoded or decoded are smaller than the prediction block to be encoded or decoded and there are a plurality of them, only the rightmost prediction block B10 among the prediction blocks adjacent to the upper side is used as the prediction block B adjacent to the upper side in the present embodiment.

In addition, as shown in FIG. 7, the above conditions also apply when the prediction block F adjacent to the left side of the prediction block to be encoded or decoded is larger than the prediction block to be encoded or decoded. The prediction block F is treated as the prediction block A if it is adjacent to the left side of the prediction block to be encoded or decoded, as the prediction block D if it is adjacent to the lower left vertex, and as the prediction block E if it is adjacent to the upper left vertex. In the example of FIG. 7, the prediction block A, the prediction block D, and the prediction block E are the same prediction block.

In addition, as shown in FIG. 8, the above conditions also apply when the prediction block G adjacent to the upper side of the prediction block to be encoded or decoded is larger than the prediction block to be encoded or decoded. The prediction block G is treated as the prediction block B if it is adjacent to the upper side of the prediction block to be encoded or decoded, as the prediction block C if it is adjacent to the upper right vertex, and as the prediction block E if it is adjacent to the upper left vertex. In the example of FIG. 8, the prediction block B, the prediction block C, and the prediction block E are the same prediction block.

As shown in FIG. 9, the already encoded or decoded prediction blocks T0 and T1, which are located at or near the same position as the prediction block to be encoded or decoded in an already encoded or decoded picture that differs in time from the picture containing the prediction block to be encoded or decoded, are defined as the prediction blocks at the same position at a different time.
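One way to picture the rules for FIGS. 5 to 8 is to associate each of the neighbors A to E with a single luminance pixel position just outside the target prediction block and to ask which prediction block contains that pixel. The Python sketch below takes this approach. The specific pixel positions are an assumption chosen to be consistent with the description (lowest block on the left side for A, rightmost block on the upper side for B); they are not stated in this part of the text, and the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Pos = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (x, y, width, height) in luminance pixels


def spatial_neighbor_positions(xP: int, yP: int, width: int, height: int) -> Dict[str, Pos]:
    """Assumed pixel positions that identify neighbors A to E of a prediction block."""
    return {
        "A": (xP - 1, yP + height - 1),  # lowest pixel along the left side
        "B": (xP + width - 1, yP - 1),   # rightmost pixel along the upper side
        "C": (xP + width, yP - 1),       # beyond the upper right vertex
        "D": (xP - 1, yP + height),      # beyond the lower left vertex
        "E": (xP - 1, yP - 1),           # beyond the upper left vertex
    }


def covering_block(pos: Pos, blocks: Dict[str, Rect]) -> Optional[str]:
    """Return the name of the block that contains pos, or None if there is none."""
    x, y = pos
    for name, (bx, by, bw, bh) in blocks.items():
        if bx <= x < bx + bw and by <= y < by + bh:
            return name
    return None
```

For a target block at (16, 16) of size 8 × 8 and a single large left neighbor F covering x = 0 to 15 and y = 8 to 39, the positions for A, D, and E all fall inside F, which corresponds to the situation of FIG. 7 in which A, D, and E are the same prediction block.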

(About POC)
POC is a variable associated with the picture to be encoded, and is set to a value that increases by 1 in picture output order. Based on the POC values, it is possible to determine whether two pictures are the same picture, to determine which picture precedes the other in output order, and to derive the distance between pictures. For example, if the POCs of two pictures have the same value, they can be determined to be the same picture. When the POCs of two pictures have different values, the picture with the smaller POC value can be determined to be the picture that is output first, and the difference between the POCs of the two pictures indicates the inter-picture distance in the time axis direction.
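The three uses of POC described above reduce to simple integer comparisons, as in the following Python sketch. The function names are hypothetical and introduced only for illustration.

```python
def is_same_picture(poc_a: int, poc_b: int) -> bool:
    """Two pictures with equal POC values are the same picture."""
    return poc_a == poc_b


def is_output_first(poc_a: int, poc_b: int) -> bool:
    """True if picture a is output before picture b."""
    return poc_a < poc_b


def picture_distance(poc_a: int, poc_b: int) -> int:
    """Signed inter-picture distance in the time axis direction (a relative to b)."""
    return poc_a - poc_b
```

The sign of `picture_distance` indicates the direction in output order, and its magnitude indicates how many pictures apart the two pictures are.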

Hereinafter, embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing a configuration of a moving picture coding apparatus according to an embodiment of the present invention. The moving picture coding apparatus according to the embodiment includes an image memory 101, a header information setting unit 117, a motion vector detection unit 102, a difference motion vector calculation unit 103, an inter prediction information derivation unit 104, a motion compensation prediction unit 105, an intra prediction unit 106, a prediction method determination unit 107, a residual signal generation unit 108, an orthogonal transform / quantization unit 109, a first encoded bit string generation unit 118, a second encoded bit string generation unit 110, a third encoded bit string generation unit 111, a multiplexing unit 112, an inverse quantization / inverse orthogonal transform unit 113, a decoded image signal superimposing unit 114, an encoded information storage memory 115, and a decoded image memory 116.

The header information setting unit 117 sets information in units of sequences, pictures, and slices. The set sequence, picture, and slice unit information is supplied to the inter prediction information deriving unit 104 and the first encoded bit string generation unit 118 and, although not shown, is also supplied to all blocks.

The image memory 101 temporarily stores the image signal of the encoding target picture supplied in the order of shooting / display time. The image memory 101 supplies the stored image signal of the picture to be encoded to the motion vector detection unit 102, the prediction method determination unit 107, and the residual signal generation unit 108 in units of predetermined pixel blocks. At this time, the image signals of the pictures stored in the order of shooting / display time are rearranged in the encoding order and output from the image memory 101 in units of pixel blocks.

The motion vector detection unit 102 detects a motion vector for each prediction block size and each prediction mode, in units of prediction blocks, by block matching between the image signal supplied from the image memory 101 and the reference picture supplied from the decoded image memory 116, and supplies the detected motion vector to the motion compensation prediction unit 105, the difference motion vector calculation unit 103, and the prediction method determination unit 107.

The difference motion vector calculation unit 103 calculates a plurality of motion vector predictor candidates using the encoded information of the already encoded image signal stored in the encoded information storage memory 115 and registers them in a motion vector predictor list. It selects the optimum motion vector predictor from the plurality of motion vector predictor candidates registered in the motion vector predictor list, calculates a difference motion vector from the motion vector detected by the motion vector detection unit 102 and the selected motion vector predictor, and supplies the calculated difference motion vector to the prediction method determination unit 107. Furthermore, a motion vector predictor index that identifies the motion vector predictor selected from the candidates registered in the motion vector predictor list is supplied to the prediction method determination unit 107.
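The relationship between the motion vector, the motion vector predictor, and the difference motion vector can be sketched in Python as follows. The selection criterion used here (smallest sum of absolute components of the difference) is only an assumed stand-in for the "optimum" selection, which this part of the description does not specify, and the function names are hypothetical.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def choose_predictor(mv: MV, candidates: List[MV]) -> Tuple[int, MV]:
    """Encoder side: pick a predictor and return (predictor index, difference motion vector)."""
    def cost(i: int) -> int:
        # Assumed cost: magnitude of the resulting difference motion vector.
        return abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1])

    idx = min(range(len(candidates)), key=cost)
    mvp = candidates[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])


def reconstruct_mv(mvp_idx: int, mvd: MV, candidates: List[MV]) -> MV:
    """Decoder side: motion vector = selected predictor + difference motion vector."""
    mvp = candidates[mvp_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

Since both sides build the same candidate list from already processed blocks, transmitting the predictor index and the difference motion vector is sufficient to recover the original motion vector.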

The inter prediction information deriving unit 104 derives merge candidates in the merge mode. It derives a plurality of merge candidates using the encoded information of the already encoded prediction blocks stored in the encoded information storage memory 115, registers them in a merge candidate list described later, and selects a suitable merge candidate from the plurality of merge candidates registered in the merge candidate list. It supplies inter prediction information of each prediction block of the selected merge candidate, such as the flags predFlagL0 [xP] [yP] and predFlagL1 [xP] [yP] indicating whether or not L0 prediction and L1 prediction are used, the reference indices refIdxL0 [xP] [yP] and refIdxL1 [xP] [yP], and the motion vectors mvL0 [xP] [yP] and mvL1 [xP] [yP], to the motion compensation prediction unit 105, and supplies a merge index specifying the selected merge candidate to the prediction method determination unit 107. Here, xP and yP are indexes indicating the position of the upper left pixel of the prediction block in the picture. The detailed configuration and operation of the inter prediction information deriving unit 104 will be described later.

The motion compensation prediction unit 105 generates a predicted image signal by inter prediction (motion compensated prediction) from the reference picture using the motion vectors obtained by the motion vector detection unit 102 and the inter prediction information deriving unit 104, and supplies the predicted image signal to the prediction method determination unit 107. In L0 prediction and L1 prediction, unidirectional prediction is performed. In the case of bi-prediction (Pred_BI), bidirectional prediction is performed: the inter-predicted signals of L0 prediction and L1 prediction are each adaptively multiplied by a weighting factor, an offset value is added, and the results are superimposed to generate the final predicted image signal.
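As a rough illustration of how two inter-predicted signals may be weighted, offset, and superimposed in bi-prediction, consider the following Python sketch. The integer arithmetic (weights, rounding term, right shift, and clipping to an 8-bit range) is an assumption made for this sketch and is not the formula of the embodiment; with the default arguments it reduces to a rounded average of the two signals.

```python
from typing import List


def bi_predict(pred_l0: List[int], pred_l1: List[int],
               w0: int = 1, w1: int = 1, offset: int = 0,
               shift: int = 1, max_val: int = 255) -> List[int]:
    """Combine L0 and L1 predicted samples into one predicted signal (illustrative only)."""
    rounding = (1 << (shift - 1)) if shift > 0 else 0
    out = []
    for a, b in zip(pred_l0, pred_l1):
        # Weighted sum, normalized by the shift, then offset.
        value = ((w0 * a + w1 * b + rounding) >> shift) + offset
        # Clip to the valid sample range.
        out.append(min(max(value, 0), max_val))
    return out
```

Choosing unequal weights, for example `w0=3, w1=1, shift=2`, gives more influence to the L0 prediction, which is one way the "adaptive" weighting mentioned above could be realized.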

The intra prediction unit 106 performs intra prediction for each intra prediction mode. It generates a predicted image signal by intra prediction from the decoded image signal stored in the decoded image memory 116, selects a suitable intra prediction mode from a plurality of intra prediction modes, and supplies the selected intra prediction mode and the predicted image signal corresponding to the selected intra prediction mode to the prediction method determination unit 107.

The prediction method determination unit 107 evaluates the code amount of the encoded information and the residual signal, the amount of distortion between the predicted image signal and the image signal, and the like, and thereby determines, from among a plurality of prediction methods and in units of optimum coding blocks, the prediction mode PredMode that identifies inter prediction (PRED_INTER) or intra prediction (PRED_INTRA) and the partition mode PartMode. In inter prediction (PRED_INTER), it determines for each prediction block whether or not the merge mode is used. In the merge mode, it determines the merge index; when the merge mode is not used, it determines the inter prediction mode, the motion vector predictor index, the reference indices of L0 and L1, the difference motion vector, and the like. It supplies the encoded information corresponding to these determinations to the second encoded bit string generation unit 110.

Furthermore, the prediction method determination unit 107 stores information indicating the determined prediction method and encoded information including a motion vector corresponding to the determined prediction method in the encoded information storage memory 115. The encoded information stored here includes the prediction mode PredMode, the partition mode PartMode, the flags predFlagL0 [xP] [yP] and predFlagL1 [xP] [yP] indicating whether or not L0 prediction and L1 prediction of each prediction block are used, the L0 and L1 reference indices refIdxL0 [xP] [yP] and refIdxL1 [xP] [yP], the L0 and L1 motion vectors mvL0 [xP] [yP] and mvL1 [xP] [yP], and the like. Here, xP and yP are indexes indicating the position of the upper left pixel of the prediction block in the picture. When the prediction mode PredMode is intra prediction (MODE_INTRA), the flag predFlagL0 [xP] [yP] indicating whether to use L0 prediction and the flag predFlagL1 [xP] [yP] indicating whether to use L1 prediction are both 0. On the other hand, when the prediction mode PredMode is inter prediction (MODE_INTER) and the inter prediction mode is L0 prediction (Pred_L0), the flag predFlagL0 [xP] [yP] indicating whether to use L0 prediction is 1, and the flag predFlagL1 [xP] [yP] indicating whether to use L1 prediction is 0. When the inter prediction mode is L1 prediction (Pred_L1), the flag predFlagL0 [xP] [yP] indicating whether to use L0 prediction is 0, and the flag predFlagL1 [xP] [yP] indicating whether to use L1 prediction is 1. When the inter prediction mode is bi-prediction (Pred_BI), the flag predFlagL0 [xP] [yP] indicating whether to use L0 prediction and the flag predFlagL1 [xP] [yP] indicating whether to use L1 prediction are both 1. The prediction method determination unit 107 supplies a predicted image signal corresponding to the determined prediction mode to the residual signal generation unit 108 and the decoded image signal superimposing unit 114.
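The correspondence between the prediction mode, the inter prediction mode, and the pair of flags predFlagL0 and predFlagL1 can be summarized as a small lookup, sketched below in Python. An intra-predicted block uses neither list, so both flags are 0. The function name `pred_flags` and the use of strings for the mode names are assumptions made for illustration.

```python
from typing import Optional, Tuple


def pred_flags(pred_mode: str, inter_pred_mode: Optional[str] = None) -> Tuple[int, int]:
    """Return (predFlagL0, predFlagL1) for one prediction block."""
    if pred_mode == "MODE_INTRA":
        return (0, 0)  # neither L0 nor L1 prediction is used
    if pred_mode != "MODE_INTER":
        raise ValueError(f"unknown prediction mode: {pred_mode}")
    table = {
        "Pred_L0": (1, 0),  # L0 prediction only
        "Pred_L1": (0, 1),  # L1 prediction only
        "Pred_BI": (1, 1),  # both L0 and L1 prediction
    }
    return table[inter_pred_mode]
```

Storing the flags per prediction block lets later blocks check, from the flags alone, whether a neighboring block carries usable L0 or L1 motion information.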

The residual signal generation unit 108 generates a residual signal by performing subtraction between the image signal to be encoded and the predicted image signal, and supplies the residual signal to the orthogonal transform / quantization unit 109.
The orthogonal transform / quantization unit 109 performs orthogonal transform and quantization on the residual signal according to the quantization parameter to generate an orthogonally transformed and quantized residual signal, and supplies it to the third encoded bit string generation unit 111 and the inverse quantization / inverse orthogonal transform unit 113. Further, the orthogonal transform / quantization unit 109 stores the quantization parameter in the encoded information storage memory 115.

The first encoded bit string generation unit 118 encodes the sequence, picture, and slice unit information set by the header information setting unit 117 to generate a first encoded bit string, and supplies it to the multiplexing unit 112.

The second encoded bit string generation unit 110 encodes the encoded information corresponding to the prediction method determined by the prediction method determination unit 107 for each coding block and prediction block. Specifically, it encodes, according to a prescribed syntax rule described later, encoded information such as the prediction mode PredMode and the partition mode PartMode for each coding block and, in the case of inter prediction (PRED_INTER), a flag indicating whether or not the merge mode is used, the merge index in the case of the merge mode, and information on the inter prediction mode, the motion vector predictor index, and the difference motion vector when the merge mode is not used. It thereby generates a second encoded bit string and supplies it to the multiplexing unit 112.

The third encoded bit string generation unit 111 entropy-encodes the residual signal that has been orthogonally transformed and quantized according to a specified syntax rule to generate a third encoded bit string, and supplies the third encoded bit string to the multiplexing unit 112. The multiplexing unit 112 multiplexes the first encoded bit string, the second encoded bit string, and the third encoded bit string in accordance with a specified syntax rule, and outputs a bit stream.

The inverse quantization / inverse orthogonal transform unit 113 performs inverse quantization and inverse orthogonal transform on the orthogonally transformed and quantized residual signal supplied from the orthogonal transform / quantization unit 109 to calculate a residual signal, and supplies it to the decoded image signal superimposing unit 114. The decoded image signal superimposing unit 114 superimposes the predicted image signal according to the determination by the prediction method determination unit 107 and the residual signal subjected to inverse quantization and inverse orthogonal transform by the inverse quantization / inverse orthogonal transform unit 113 to generate a decoded image, and stores it in the decoded image memory 116. Note that the decoded image may be stored in the decoded image memory 116 after filtering processing for reducing distortion such as block distortion due to encoding.
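The local decoding loop described above (subtract the prediction, quantize, inverse quantize, and superimpose on the prediction) can be illustrated by the following simplified Python sketch. The orthogonal transform is deliberately omitted, and a uniform quantizer with a single step size stands in for quantization according to the quantization parameter; both simplifications, as well as all function names, are assumptions made for this illustration.

```python
from typing import List


def make_residual(original: List[int], predicted: List[int]) -> List[int]:
    """Residual signal = image signal to be encoded minus predicted image signal."""
    return [o - p for o, p in zip(original, predicted)]


def quantize(residual: List[int], step: int) -> List[int]:
    """Uniform quantization, rounding half away from zero (transform omitted)."""
    return [((abs(r) + step // 2) // step) * (1 if r >= 0 else -1) for r in residual]


def dequantize(levels: List[int], step: int) -> List[int]:
    return [level * step for level in levels]


def reconstruct(predicted: List[int], residual: List[int], max_val: int = 255) -> List[int]:
    """Decoded image = predicted image signal + decoded residual, clipped to the sample range."""
    return [min(max(p + r, 0), max_val) for p, r in zip(predicted, residual)]
```

Because the encoder reconstructs its reference pictures from the quantized residual rather than from the original image, its decoded image memory holds the same pictures that the decoding side will obtain.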

FIG. 2 is a block diagram showing the configuration of the moving picture decoding apparatus according to the embodiment of the present invention corresponding to the moving picture encoding apparatus of FIG. 1. The moving picture decoding apparatus according to the embodiment includes a separation unit 201, a first encoded bit string decoding unit 212, a second encoded bit string decoding unit 202, a third encoded bit string decoding unit 203, a motion vector calculation unit 204, an inter prediction information derivation unit 205, a motion compensation prediction unit 206, an intra prediction unit 207, an inverse quantization / inverse orthogonal transform unit 208, a decoded image signal superimposing unit 209, an encoded information storage memory 210, and a decoded image memory 211.

The decoding process of the moving picture decoding apparatus in FIG. 2 corresponds to the decoding process provided inside the moving picture encoding apparatus in FIG. 1. Therefore, the motion compensation prediction unit 206, the inverse quantization / inverse orthogonal transform unit 208, the decoded image signal superimposing unit 209, the encoded information storage memory 210, and the decoded image memory 211 in FIG. 2 have functions corresponding respectively to the motion compensation prediction unit 105, the inverse quantization / inverse orthogonal transform unit 113, the decoded image signal superimposing unit 114, the encoded information storage memory 115, and the decoded image memory 116 of the moving picture encoding apparatus in FIG. 1.

The bit stream supplied to the separation unit 201 is separated according to a prescribed syntax rule, and the separated encoded bit strings are supplied to the first encoded bit string decoding unit 212, the second encoded bit string decoding unit 202, and the third encoded bit string decoding unit 203.

The first encoded bit string decoding unit 212 decodes the supplied encoded bit string to obtain information in units of sequences, pictures, and slices. Although not shown, the obtained sequence, picture, and slice unit information is supplied to all blocks.

The second encoded bit string decoding unit 202 decodes the supplied encoded bit string to obtain information in units of coding blocks and encoded information in units of prediction blocks. Specifically, it decodes, according to a prescribed syntax rule described later, encoded information such as the prediction mode PredMode that identifies inter prediction (PRED_INTER) or intra prediction (PRED_INTRA) and the partition mode PartMode for each coding block and, in the case of inter prediction (PRED_INTER), a flag indicating whether or not the merge mode is used, the merge index in the case of the merge mode, and the inter prediction mode, the motion vector predictor index, the difference motion vector, and the like when the merge mode is not used. It supplies the encoded information to the motion vector calculation unit 204, the inter prediction information deriving unit 205, or the intra prediction unit 207.

The third encoded bit string decoding unit 203 decodes the supplied encoded bit string to obtain the orthogonally transformed and quantized residual signal, and supplies the orthogonally transformed and quantized residual signal to the inverse quantization / inverse orthogonal transform unit 208.

When the prediction mode PredMode of the prediction block to be decoded is inter prediction (PRED_INTER) and the merge mode is not used, the motion vector calculation unit 204 derives a plurality of motion vector predictor candidates using the encoded information of the already decoded image signal stored in the encoded information storage memory 210 and registers them in a motion vector predictor list described later. From the motion vector predictor candidates registered in the motion vector predictor list, it selects the motion vector predictor corresponding to the motion vector predictor index decoded and supplied by the second encoded bit string decoding unit 202, calculates a motion vector from the difference motion vector decoded by the second encoded bit string decoding unit 202 and the selected motion vector predictor, supplies the motion vector together with other encoded information to the motion compensation prediction unit 206, and stores it in the encoded information storage memory 210. The encoded information of the prediction block supplied and stored here includes the prediction mode PredMode, the partition mode PartMode, the flags predFlagL0 [xP] [yP] and predFlagL1 [xP] [yP] indicating whether to use L0 prediction and L1 prediction, the L0 and L1 reference indices refIdxL0 [xP] [yP] and refIdxL1 [xP] [yP], the L0 and L1 motion vectors mvL0 [xP] [yP] and mvL1 [xP] [yP], and the like. Here, xP and yP are indexes indicating the position of the upper left pixel of the prediction block in the picture. When the prediction mode PredMode is inter prediction (MODE_INTER) and the inter prediction mode is L0 prediction (Pred_L0), the flag predFlagL0 indicating whether to use L0 prediction is 1, and the flag predFlagL1 indicating whether to use L1 prediction is 0. When the inter prediction mode is L1 prediction (Pred_L1), the flag predFlagL0 indicating whether to use L0 prediction is 0, and the flag predFlagL1 indicating whether to use L1 prediction is 1. When the inter prediction mode is bi-prediction (Pred_BI), the flag predFlagL0 indicating whether to use L0 prediction and the flag predFlagL1 indicating whether to use L1 prediction are both 1.

The inter prediction information deriving unit 205 derives merge candidates when the prediction mode PredMode of the prediction block to be decoded is inter prediction (PRED_INTER) and the merge mode is used. It derives a plurality of merge candidates using the encoded information of the already decoded prediction blocks stored in the encoded information storage memory 210 and registers them in a merge candidate list described later. From the plurality of merge candidates registered in the merge candidate list, it selects the merge candidate corresponding to the merge index decoded and supplied by the second encoded bit string decoding unit 202. It supplies inter prediction information of the selected merge candidate, such as the flags predFlagL0 [xP] [yP] and predFlagL1 [xP] [yP] indicating whether to use L0 prediction and L1 prediction, the L0 and L1 reference indices refIdxL0 [xP] [yP] and refIdxL1 [xP] [yP], and the L0 and L1 motion vectors mvL0 [xP] [yP] and mvL1 [xP] [yP], to the motion compensation prediction unit 206 and stores it in the encoded information storage memory 210. Here, xP and yP are indexes indicating the position of the upper left pixel of the prediction block in the picture. The detailed configuration and operation of the inter prediction information deriving unit 205 will be described later.

The motion compensation prediction unit 206 generates a predicted image signal by inter prediction (motion compensated prediction) from the reference picture stored in the decoded image memory 211, using the inter prediction information calculated by the motion vector calculation unit 204 or the inter prediction information deriving unit 205, and supplies the predicted image signal to the decoded image signal superimposing unit 209. In the case of bi-prediction (Pred_BI), the two motion-compensated predicted image signals of L0 prediction and L1 prediction are adaptively multiplied by weighting factors and superimposed to generate the final predicted image signal.

The intra prediction unit 207 performs intra prediction when the prediction mode PredMode of the prediction block to be decoded is intra prediction (PRED_INTRA). The encoded information decoded by the second encoded bit string decoding unit 202 includes an intra prediction mode. According to the intra prediction mode, the intra prediction unit 207 generates a predicted image signal by intra prediction from the decoded image signal stored in the decoded image memory 211, and supplies the predicted image signal to the decoded image signal superimposing unit 209. The flags predFlagL0 [xP] [yP] and predFlagL1 [xP] [yP] indicating whether to use L0 prediction and L1 prediction are both set to 0 and stored in the encoded information storage memory 210. Here, xP and yP are indexes indicating the position of the upper left pixel of the prediction block in the picture.

The inverse quantization / inverse orthogonal transform unit 208 performs inverse quantization and inverse orthogonal transform on the orthogonally transformed and quantized residual signal decoded by the third encoded bit string decoding unit 203 to obtain an inverse-quantized and inverse-orthogonal-transformed residual signal.

The decoded image signal superimposing unit 209 superimposes the predicted image signal inter-predicted by the motion compensation prediction unit 206 or the predicted image signal intra-predicted by the intra prediction unit 207 on the residual signal subjected to inverse quantization and inverse orthogonal transform by the inverse quantization / inverse orthogonal transform unit 208, thereby decoding the decoded image signal, and stores it in the decoded image memory 211. The decoded image may be stored in the decoded image memory 211 after filtering processing that reduces block distortion or the like due to encoding is performed on it.

Next, the syntax will be described, which is the rule common to encoding and decoding of a moving picture bit stream that is encoded by a moving picture encoding device including the motion vector prediction method according to the present embodiment and decoded by the decoding device.

FIG. 10 shows the syntax rules described in units of prediction blocks. When the value of the prediction mode PredMode of the prediction block is inter prediction (MODE_INTER), merge_flag [x0] [y0], which indicates whether or not the merge mode is used, is set. Here, x0 and y0 are indices indicating the position of the upper left pixel of the prediction block in the picture of the luminance signal, and merge_flag [x0] [y0] is a flag indicating whether or not the prediction block located at (x0, y0) in the picture is in the merge mode.

Next, when merge_flag [x0] [y0] is 1, the mode is merge mode, and the syntax element merge_idx [x0] [y0] is provided. This is the index into the merge list, that is, the list of merge candidates to be referred to. Here, x0 and y0 are indexes indicating the position of the upper left pixel of the prediction block in the picture, and merge_idx [x0] [y0] is the merge index of the prediction block located at (x0, y0) in the picture. When a merge index is entropy encoded or decoded, the smaller the number of merge candidates, the smaller the code amount and the smaller the processing amount with which it can be encoded or decoded. FIG. 11 shows an example of the entropy code of the merge index syntax element merge_idx [x0] [y0]. When the number of merge candidates is 3, the maximum value of the merge index is set to the number of merge candidates minus 1, that is, 2, so the merge index takes the values 0, 1, and 2, and the codes of merge_idx [x0] [y0] are "0", "10", and "11", respectively. When the number of merge candidates is 4, the maximum value of the merge index is set to 3, so the merge index takes the values 0, 1, 2, and 3, and the codes are "0", "10", "110", and "111", respectively. When the number of merge candidates is 5, the maximum value of the merge index is set to 4, so the merge index takes the values 0, 1, 2, 3, and 4, and the codes are "0", "10", "110", "1110", and "1111", respectively. For example, a merge index of 2 is coded as "11" when the number of merge candidates is 3, and as "110" when the number of merge candidates is 4 or 5. That is, when the number of merge candidates can be limited, the merge index can be expressed with a smaller code amount by limiting the maximum value of the merge index to the number of merge candidates minus 1. In the present embodiment, as shown in FIG. 11, the code amount is reduced by switching the code that represents each value of the merge index according to the number of merge candidates.
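The code tables quoted from FIG. 11 are consistent with a truncated unary code whose maximum value is the number of merge candidates minus 1. The following Python fragment is only an illustrative sketch under that reading, not the encoder of the embodiment; the function name encode_merge_idx and the representation of a code as a string of "0" and "1" characters are assumptions made for this example.

```python
def encode_merge_idx(merge_idx: int, num_merge_cand: int) -> str:
    """Return the code of merge_idx as a string of '0'/'1' characters.

    Assumption: the FIG. 11 tables follow a truncated unary pattern whose
    maximum value is num_merge_cand - 1.
    """
    max_idx = num_merge_cand - 1
    if not 0 <= merge_idx <= max_idx:
        raise ValueError("merge_idx out of range")
    if merge_idx < max_idx:
        # merge_idx ones followed by a terminating zero
        return "1" * merge_idx + "0"
    # The largest index needs no terminating zero.
    return "1" * max_idx


for n in (3, 4, 5):
    print(n, [encode_merge_idx(i, n) for i in range(n)])
```

With 3, 4, and 5 merge candidates, this reproduces the three code tables given above; in particular a merge index of 2 becomes "11" for 3 candidates and "110" for 4 or 5 candidates.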

On the other hand, when merge_flag [x0] [y0] is 0, the mode is not merge mode. When the slice type is B slice, a syntax element inter_pred_flag [x0] [y0] for identifying the inter prediction mode is provided; this syntax element distinguishes L0 prediction (Pred_L0), L1 prediction (Pred_L1), and bi-prediction (Pred_BI). For each of L0 and L1, the reference index syntax elements ref_idx_l0 [x0] [y0] and ref_idx_l1 [x0] [y0] for specifying the reference picture are provided, together with the differential motion vector syntax elements mvd_l0 [x0] [y0] [j] and mvd_l1 [x0] [y0] [j], each of which is the difference between the motion vector of the prediction block obtained by motion vector detection and the predicted motion vector. Here, x0 and y0 are indexes indicating the position of the upper left pixel of the prediction block in the picture; ref_idx_l0 [x0] [y0] and mvd_l0 [x0] [y0] [j] are, respectively, the L0 reference index and the L0 differential motion vector of the prediction block located at (x0, y0) in the picture, and ref_idx_l1 [x0] [y0] and mvd_l1 [x0] [y0] [j] are, respectively, the L1 reference index and the L1 differential motion vector of the prediction block located at (x0, y0) in the picture. Further, j denotes the component of the differential motion vector: j = 0 is the x component and j = 1 is the y component. Next, the syntax elements mvp_idx_l0 [x0] [y0] and mvp_idx_l1 [x0] [y0] are provided; these are indexes into the predicted motion vector list, that is, the list of predicted motion vector candidates to be referred to. Here, x0 and y0 are indexes indicating the position of the upper left pixel of the prediction block in the picture, and mvp_idx_l0 [x0] [y0] and mvp_idx_l1 [x0] [y0] are the L0 and L1 predicted motion vector indexes of the prediction block located at (x0, y0) in the picture. In the present embodiment, the number of these candidates is set to 2.

The inter prediction information deriving method according to the embodiment is implemented in the inter prediction information deriving unit 104 of the video encoding device in FIG. 1 and the inter prediction information deriving unit 205 of the video decoding device in FIG.

The inter prediction information derivation method according to the embodiment is performed in both the encoding and the decoding process, for each prediction block constituting a coding block. It is performed when the prediction mode PredMode of the prediction block is inter prediction (MODE_INTER) and the mode is merge mode: in the case of encoding, when the prediction mode, reference index, and motion vector of the prediction block to be encoded are derived using the prediction mode, reference index, and motion vector of an already encoded prediction block; in the case of decoding, when the prediction mode, reference index, and motion vector of the prediction block to be decoded are derived using the prediction mode, reference index, and motion vector of an already decoded prediction block.

In merge mode, the candidates are the prediction block A adjacent to the left, the prediction block B adjacent to the upper side, the prediction block C adjacent to the upper right, the prediction block D adjacent to the lower left, and the prediction block E adjacent to the upper left, described with reference to FIGS. 5, 6, 7, and 8, together with the prediction block Col (either T0 or T1) existing at or near the same position at a different time, described with reference to FIG. The inter prediction information deriving unit 104 of the moving image encoding device and the inter prediction information deriving unit 205 of the moving image decoding device register these candidates in the merge candidate list in an order common to the encoding side and the decoding side. The inter prediction information deriving unit 104 of the video encoding device determines a merge index that specifies an element of the merge candidate list and encodes it via the second encoded bit string generation unit 110. The inter prediction information deriving unit 205 of the video decoding device is supplied with the merge index decoded by the second encoded bit string decoding unit 202 and selects the prediction block corresponding to the merge index from the merge candidate list. Motion compensated prediction is then performed using the inter prediction information of the selected merge candidate, such as its prediction mode, reference index, and motion vector.

The final number of merge candidates finalNumMergeCand registered in the merge candidate list mergeCandList is set in slice units. In this embodiment, finalNumMergeCand is set to a smaller number when the slice type is P slice than when it is B slice. Specifically, when the slice type is P slice, the final merge candidate number finalNumMergeCand is set to 3, and when the slice type is B slice, it is set to 5.

The inter prediction information deriving method according to the embodiment will be described with reference to the drawings. FIG. 12 is a diagram illustrating a detailed configuration of the inter prediction information deriving unit 104 of the video encoding device in FIG. FIG. 13 is a diagram illustrating a detailed configuration of the inter prediction information deriving unit 205 of the video decoding device in FIG.

The portions surrounded by the thick frame lines in FIGS. 12 and 13 indicate the inter prediction information deriving unit 104 and the inter prediction information deriving unit 205, respectively.

Furthermore, the part enclosed by the thick dotted line inside each frame indicates the part that operates according to the inter prediction information derivation method described later. It is installed in the same manner in the moving image decoding device corresponding to the moving image encoding device of the embodiment, so that identical derivation results, consistent between encoding and decoding, are obtained.

The inter prediction information deriving unit 104 includes a spatial merge candidate generating unit 130, a temporal merge candidate reference index deriving unit 131, a temporal merge candidate generating unit 132, a merge candidate registering unit 133, a merge candidate identical determining unit 134, a merge candidate number limiting unit 135, a merge candidate supplementing unit 136, and an encoding information selecting unit 137.

The inter prediction information deriving unit 205 includes a spatial merge candidate generating unit 230, a temporal merge candidate reference index deriving unit 231, a temporal merge candidate generating unit 232, a merge candidate registering unit 233, a merge candidate identical determining unit 234, a merge candidate number limiting unit 235, a merge candidate supplementing unit 236, and an encoded information selecting unit 237.

FIG. 14 is a flowchart explaining the procedure of the merge candidate derivation process and the merge candidate list construction process, which are functions common to the inter prediction information deriving unit 104 of the video encoding device and the inter prediction information deriving unit 205 of the video decoding device according to the embodiment of the present invention.
Hereinafter, the processes will be described in order. In the following description, a case where the slice type slice_type is a B slice will be described unless otherwise specified, but the present invention can also be applied to a P slice. However, when the slice type slice_type is a P slice, only the L0 prediction (Pred_L0) is provided as the inter prediction mode, and there is no L1 prediction (Pred_L1) and bi-prediction (Pred_BI). Therefore, the processing related to L1 can be omitted.

The spatial merge candidate generation unit 130 of the inter prediction information deriving unit 104 of the moving image encoding device and the spatial merge candidate generation unit 230 of the inter prediction information deriving unit 205 of the moving image decoding device derive spatial merge candidates A, B, C, D, and E from the prediction blocks A, B, C, D, and E adjacent to the block to be encoded or decoded. Here, N is defined as denoting any of A, B, C, D, E, or Col. The following are output (step S101): a flag availableFlagN indicating whether the inter prediction information of the prediction block N can be used as the merge candidate N, the L0 reference index refIdxL0N and the L1 reference index refIdxL1N, the L0 prediction flag predFlagL0N indicating whether L0 prediction is performed and the L1 prediction flag predFlagL1N indicating whether L1 prediction is performed, the L0 motion vector mvL0N, and the L1 motion vector mvL1N.
The detailed processing procedure of step S101 will be described later in detail using the flowchart of FIG.

Subsequently, the temporal merge candidate reference index deriving unit 131 of the inter prediction information deriving unit 104 of the video encoding device and the temporal merge candidate reference index deriving unit 231 of the inter prediction information deriving unit 205 of the video decoding device derive the reference index of the temporal merge candidate from the prediction blocks adjacent to the block to be encoded or decoded (step S102). When the slice type slice_type is P slice and inter prediction is performed using the inter prediction information of the temporal merge candidate, only the L0 reference index is derived, because L0 prediction (Pred_L0) is performed. When the slice type slice_type is B slice and inter prediction is performed using the inter prediction information of the temporal merge candidate, the reference indexes of both L0 and L1 are derived, because bi-prediction (Pred_BI) is performed. The detailed processing procedure of step S102 will be described later in detail using the flowchart of FIG.

Subsequently, the temporal merge candidate generating unit 132 of the inter prediction information deriving unit 104 of the video encoding device and the temporal merge candidate generating unit 232 of the inter prediction information deriving unit 205 of the video decoding device derive a temporal merge candidate from a picture at a different time, and output a flag availableFlagCol indicating whether it can be used, an L0 prediction flag predFlagL0Col indicating whether L0 prediction is performed, an L1 prediction flag predFlagL1Col indicating whether L1 prediction is performed, the L0 motion vector mvL0N, and the L1 motion vector mvL1N (step S103). The detailed processing procedure of step S103 will be described later in detail using the flowchart of FIG.

Subsequently, the merge candidate registration unit 133 of the inter prediction information deriving unit 104 of the video encoding device and the merge candidate registration unit 233 of the inter prediction information deriving unit 205 of the video decoding device create the merge candidate list mergeCandList and add the merge candidates A, B, C, D, E, and Col to it (step S104). The detailed processing procedure of step S104 will be described later in detail using the flowchart of FIG.

Subsequently, the merge candidate identity determination unit 134 of the inter prediction information deriving unit 104 of the video encoding device and the merge candidate identity determination unit 234 of the inter prediction information deriving unit 205 of the video decoding device examine the merge candidate list mergeCandList. When merge candidates have motion vectors of the same value for the same reference index, those merge candidates are removed, except for the one that comes earliest in the list order (step S105).
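The identity determination of step S105 can be illustrated with a short Python sketch. It assumes that two merge candidates count as identical when all of their prediction flags, reference indexes, and motion vectors match; the text states the criterion only briefly, so this comparison rule, the class MergeCand, and the function remove_identical are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MergeCand:
    # Hypothetical container for the inter prediction information of one candidate.
    pred_flag_l0: int
    pred_flag_l1: int
    ref_idx_l0: int
    ref_idx_l1: int
    mv_l0: Tuple[int, int]
    mv_l1: Tuple[int, int]


def remove_identical(merge_cand_list: List[MergeCand]) -> List[MergeCand]:
    """Keep only the earliest of any group of identical candidates (step S105)."""
    kept: List[MergeCand] = []
    for cand in merge_cand_list:
        # Dataclass equality compares every field of the candidate.
        if cand not in kept:
            kept.append(cand)
    return kept
```

Because the list is scanned in registration order, the surviving candidate of each identical group is the one with the smallest index, and the relative order of the remaining candidates is unchanged.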

Subsequently, the merge candidate number limiting unit 135 of the inter prediction information deriving unit 104 of the video encoding device and the merge candidate number limiting unit 235 of the inter prediction information deriving unit 205 of the video decoding device count the number of merge candidates registered in the merge candidate list mergeCandList. If the number of registered merge candidates numMergeCand is greater than the final merge candidate number finalNumMergeCand (YES in step S106), all merge candidates whose index i in the merge candidate list mergeCandList is greater than (finalNumMergeCand-1) are deleted, which limits the merge candidates to the final merge candidate number finalNumMergeCand, and the value of numMergeCand is updated to finalNumMergeCand (step S107).

Subsequently, in the merge candidate supplementing unit 136 of the inter prediction information deriving unit 104 of the video encoding device and the merge candidate supplementing unit 236 of the inter prediction information deriving unit 205 of the video decoding device, if the number of merge candidates numMergeCand registered in the merge candidate list mergeCandList is smaller than the final merge candidate number finalNumMergeCand (YES in step S108), merge candidates are supplemented with finalNumMergeCand as the upper limit, and the value of numMergeCand is updated to finalNumMergeCand (step S109). With finalNumMergeCand as the upper limit, in a P slice, merge candidates with different reference indexes are added whose motion vector is (0, 0) (both the horizontal and vertical components are 0) and whose prediction mode is L0 prediction (Pred_L0). In a B slice, the added merge candidates are either bi-prediction (Pred_BI) merge candidates obtained by changing the combination of L0 prediction and L1 prediction between already registered merge candidates, or merge candidates with different reference indexes whose motion vector is (0, 0) and whose prediction mode is bi-prediction (Pred_BI).
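Steps S106 to S109 can be sketched in Python as follows. This is a simplified illustration rather than the procedure of the embodiment: it supplements only with zero-motion-vector candidates and omits the combined bi-prediction candidates of the B slice; the choice of reference indexes 0, 1, 2, ... for the added candidates, the value -1 for an unused L1 reference index, and the names zero_candidate and limit_and_supplement are all assumptions.

```python
def zero_candidate(ref_idx, slice_type):
    """Build a (0, 0) motion vector candidate: Pred_L0 for P, Pred_BI for B."""
    bi = slice_type == "B"
    return {
        "predFlagL0": 1,
        "predFlagL1": 1 if bi else 0,
        "refIdxL0": ref_idx,
        "refIdxL1": ref_idx if bi else -1,  # -1 marks "unused" (assumption)
        "mvL0": (0, 0),
        "mvL1": (0, 0),
    }


def limit_and_supplement(merge_cand_list, final_num_merge_cand, slice_type):
    """Return a list holding exactly final_num_merge_cand candidates."""
    cands = list(merge_cand_list)
    if len(cands) > final_num_merge_cand:           # step S106: YES
        cands = cands[:final_num_merge_cand]        # step S107: drop index > final-1
    ref_idx = 0
    while len(cands) < final_num_merge_cand:        # step S108: YES
        cands.append(zero_candidate(ref_idx, slice_type))  # step S109
        ref_idx += 1                                # next candidate, next reference index
    return cands
```

Whatever the length of the incoming list, the result always holds exactly finalNumMergeCand entries, which is what allows the merge index code to be chosen from the slice-level value alone.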

In the present embodiment, the final merge candidate number finalNumMergeCand is set to a fixed number for each slice. The reason for fixing it is as follows. If finalNumMergeCand varied with the construction state of the merge candidate list, a dependency would arise between entropy decoding and the construction of the merge candidate list: the merge index could not be entropy-decoded until the merge candidate list had been constructed and finalNumMergeCand derived, which would delay the decoding of the merge index and complicate entropy decoding. Furthermore, if entropy decoding depended on the construction state of a merge candidate list that includes the merge candidate Col derived from a prediction block of a picture at a different time, then when an error occurred in decoding the encoded bit string of another picture, the encoded bit string of the current picture would also be affected by that error, so that the correct finalNumMergeCand could not be derived and entropy decoding could not continue normally. If finalNumMergeCand is set to a fixed number in slice units as in this embodiment, it does not need to be derived in prediction block units, the merge index can be entropy-decoded independently of the construction of the merge candidate list, and entropy decoding of the encoded bit string of the current picture can continue without being affected even if an error occurs in decoding the encoded bit string of another picture.
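The independence described here can be illustrated by a decoder-side Python sketch in which the merge index is read using only the slice type. It assumes the code pattern of FIG. 11 (a run of "1" bits ended by a "0", with no terminating "0" for the largest index) and models the bit string as an iterator over "0"/"1" characters; the names FINAL_NUM_MERGE_CAND and decode_merge_idx are hypothetical, and real entropy decoding would be more involved.

```python
# Slice-level values taken from this embodiment: 3 for P slices, 5 for B slices.
FINAL_NUM_MERGE_CAND = {"P": 3, "B": 5}


def decode_merge_idx(bits, slice_type):
    """Read one merge index from an iterator of '0'/'1' characters.

    Only the slice type is consulted; no merge candidate list is needed.
    """
    max_idx = FINAL_NUM_MERGE_CAND[slice_type] - 1
    idx = 0
    # Count leading ones, but never read past the largest possible index.
    while idx < max_idx and next(bits) == "1":
        idx += 1
    return idx


stream = iter("11" + "0" + "10")  # three merge indexes of a P slice, back to back
print([decode_merge_idx(stream, "P") for _ in range(3)])
```

The three indexes are recovered without constructing any merge candidate list, so the parsing of later syntax elements does not have to wait for list construction.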

Next, a method for deriving the merge candidate N from the prediction block N adjacent to the encoding or decoding target block, which is the processing procedure of step S101 in FIG. 14, will be described in detail. FIG. 15 is a flowchart for explaining the spatial merge candidate derivation processing procedure in step S101 of FIG. Here, N is one of A (left side), B (upper side), C (upper right), D (lower left), or E (upper left), representing the region of an adjacent prediction block. In this embodiment, the upper limit value maxNumSpatialMergeCand of the number of spatial merge candidates is set to 4, and a maximum of four spatial merge candidates are derived from the five adjacent prediction blocks. It is also possible to set the upper limit value maxNumSpatialMergeCand of the number of spatial merge candidates registered in the merge candidate list mergeCandList in slice units. Setting a large value for maxNumSpatialMergeCand widens the selection of merge candidates and improves coding efficiency, but it also increases the number of merge candidates subject to the merge candidate identity determination, so the processing amount of the identity determination increases. Therefore, in order to reduce the processing amount, maxNumSpatialMergeCand may be set in slice units to control the processing amount of the spatial merge candidate derivation and of the merge candidate identity determination. For example, when the slice type is P slice, the final merge candidate number finalNumMergeCand is set to 3 in this embodiment, so even if four spatial merge candidates are derived, some of them may be deleted in the process of step S107 in FIG. Therefore, even if maxNumSpatialMergeCand is set to 3 or 2, the decrease in coding efficiency is kept small, and the amount of merge candidate identity determination processing can be reduced in the merge candidate identity determination unit 134 of the inter prediction information deriving unit 104 of the video encoding device and in the merge candidate identity determination unit 234 of the inter prediction information deriving unit 205 of the video decoding device.

In FIG. 15, the following are performed in order (steps S1101 to S1112). With the variable N set to A, the encoding information of the prediction block A adjacent to the left side of the prediction block to be encoded or decoded is examined to derive the merge candidate A. With N set to B, the encoding information of the prediction block B adjacent to the upper side is examined to derive the merge candidate B. With N set to C, the encoding information of the prediction block C adjacent to the upper right is examined to derive the merge candidate C. With N set to D, the encoding information of the prediction block D adjacent to the lower left is examined to derive the merge candidate D. With N set to E, the encoding information of the prediction block E adjacent to the upper left is examined to derive the merge candidate E.

First, when the total number of spatial merge candidates derived so far (those whose availableFlag is 1) has reached the upper limit value maxNumSpatialMergeCand of the number of spatial merge candidates (YES in step S1102), that is, when four spatial merge candidates have been derived, the flag availableFlagN of the merge candidate N is set to 0 (step S1105), the values of the motion vectors mvL0N and mvL1N of the merge candidate N are both set to (0, 0) (step S1106), the values of the flags predFlagL0N and predFlagL1N of the merge candidate N are both set to 0 (step S1107), and this spatial merge candidate derivation process is terminated.
In the present embodiment, four merge candidates are derived from adjacent prediction blocks. Therefore, when four spatial merge candidates are already derived, it is not necessary to perform further spatial merge candidate derivation processing.

On the other hand, when the total number of spatial merge candidates derived so far (those whose availableFlag is 1) has not reached the upper limit value maxNumSpatialMergeCand of the number of spatial merge candidates (NO in step S1102), the prediction block N adjacent to the prediction block to be encoded or decoded is identified, and when the prediction block N can be used, the encoding information of the prediction block N is acquired from the encoding information storage memory 115 or 210 (step S1103).

When the adjacent prediction block N cannot be used or the prediction mode PredMode of the prediction block N is intra prediction (MODE_INTRA) (NO in step S1104), the value of the flag availableFlagN of the merge candidate N is set to 0 (step S1105). The values of the motion vectors mvL0N and mvL1N of the merge candidate N are both set to (0, 0) (step S1106), and the values of the flags predFlagL0N and predFlagL1N of the merge candidate N are both set to 0 (step S1107).

On the other hand, when the adjacent prediction block N can be used and the prediction mode PredMode of the prediction block N is not intra prediction (MODE_INTRA) (YES in step S1104), the inter prediction information of the prediction block N is used as the inter prediction information of the merge candidate N. The value of the flag availableFlagN of the merge candidate N is set to 1 (step S1108), the motion vectors mvL0N and mvL1N of the merge candidate N are set to the same values as the motion vectors mvL0 [xN] [yN] and mvL1 [xN] [yN] of the prediction block N, respectively (step S1109), the reference indexes refIdxL0N and refIdxL1N of the merge candidate N are set to the same values as the reference indexes refIdxL0 [xN] [yN] and refIdxL1 [xN] [yN] of the prediction block N, respectively (step S1110), and the flags predFlagL0N and predFlagL1N of the merge candidate N are set to the flags predFlagL0 [xN] [yN] and predFlagL1 [xN] [yN] of the prediction block N, respectively (step S1111). Here, xN and yN are indexes indicating the position of the upper left pixel of the prediction block N in the picture.

The processes in steps S1102 to S1111 are repeated for N = A, B, C, D, and E (steps S1101 to S1112).
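The loop of steps S1101 to S1112 can be sketched in Python as follows. The sketch assumes that the neighbouring prediction blocks are supplied as dictionaries (or None when a block cannot be used); this data layout and the names unavailable_candidate and derive_spatial_merge_candidates are illustrative assumptions, not part of the embodiment.

```python
MAX_NUM_SPATIAL_MERGE_CAND = 4  # upper limit used in this embodiment


def unavailable_candidate():
    # Steps S1105 to S1107: flag 0, zero motion vectors, both prediction flags 0.
    return {"availableFlag": 0, "mvL0": (0, 0), "mvL1": (0, 0),
            "predFlagL0": 0, "predFlagL1": 0}


def derive_spatial_merge_candidates(neighbors):
    """neighbors maps 'A'..'E' to a block dict, or to None when unusable."""
    candidates = {}
    num_available = 0
    for n in ("A", "B", "C", "D", "E"):                         # S1101 .. S1112
        if num_available == MAX_NUM_SPATIAL_MERGE_CAND:         # S1102: YES
            candidates[n] = unavailable_candidate()
            break                                               # derivation ends here
        block = neighbors.get(n)                                # S1103
        if block is None or block["predMode"] == "MODE_INTRA":  # S1104: NO
            candidates[n] = unavailable_candidate()
            continue
        candidates[n] = {                                       # S1108 .. S1111
            "availableFlag": 1,
            "mvL0": block["mvL0"], "mvL1": block["mvL1"],
            "refIdxL0": block["refIdxL0"], "refIdxL1": block["refIdxL1"],
            "predFlagL0": block["predFlagL0"], "predFlagL1": block["predFlagL1"],
        }
        num_available += 1
    return candidates
```

With the upper limit of 4 and five neighbours, the early exit can only be reached at E, after A, B, C, and D have all yielded usable candidates.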

Next, a method for deriving the reference index of the temporal merge candidate in step S102 of FIG. 14 will be described in detail. The L0 and L1 reference indexes of the temporal merge candidate are derived.

In the present embodiment, the reference index of the temporal merge candidate is derived using the reference indexes of the spatial merge candidates, that is, the reference indexes used in the prediction blocks adjacent to the block to be encoded or decoded. This is because, when the temporal merge candidate is selected, the reference index of the prediction block to be encoded or decoded is highly correlated with the reference indexes of the adjacent prediction blocks that serve as spatial merge candidates. In particular, in the present embodiment, only the reference indexes of the prediction block A adjacent to the left side of the prediction block to be encoded or decoded and the prediction block B adjacent to the upper side are used. This is because, among the adjacent prediction blocks A, B, C, D, and E that are also spatial merge candidates, the prediction blocks A and B, which are in contact with a side of the prediction block to be encoded or decoded, have a higher correlation than the prediction blocks C, D, and E, which are in contact with only a vertex of that prediction block. By limiting the prediction blocks used to the prediction blocks A and B, without using the prediction blocks C, D, and E of relatively low correlation, the improvement in coding efficiency from deriving the reference index of the temporal merge candidate is obtained while the amount of computation and the amount of memory access for the reference index derivation process of the temporal merge candidate are reduced.

In the present embodiment, when both the prediction block A and the prediction block B perform LX prediction (here LX denotes the list, L0 or L1, for which the reference index of the temporal merge candidate is derived, and prediction using LX is called LX prediction), the smaller of the LX reference index values of the prediction block A and the prediction block B is adopted as the LX reference index value of the temporal merge candidate. When only one of the prediction block A and the prediction block B performs LX prediction, the LX reference index value of the prediction block that performs LX prediction is adopted as the LX reference index value of the temporal merge candidate. When neither the prediction block A nor the prediction block B performs LX prediction, the LX reference index value of the temporal merge candidate is set to the default value 0.

When neither the prediction block A nor the prediction block B performs LX prediction, the default value of the LX reference index of the temporal merge candidate is set to 0 because, in inter prediction, the reference picture corresponding to a reference index value of 0 has the highest probability of being selected. However, the present invention is not limited to this. The default value of the reference index may be a value other than 0 (1, 2, etc.), and a syntax element indicating the default value of the reference index may be provided in the encoded stream in sequence units, picture units, or slice units and transmitted, so that the value can be selected on the encoding side.

FIG. 16 is a flowchart for explaining the procedure for deriving the reference index of the temporal merge candidate in step S102 of FIG. 14 of the present embodiment. First, the encoding information of the prediction block A adjacent to the left and the encoding information of the prediction block B adjacent to the upper side are acquired from the encoding information storage memory 115 or 210 (steps S2101 and S2102).
The subsequent processing from step S2104 to step S2110 is performed in each of L0 and L1 (steps S2103 to S2111). Note that LX is set to L0 when deriving the L0 reference index of the temporal merge candidate, and LX is set to L1 when deriving the L1 reference index. However, when the slice type slice_type is a P slice, only the L0 prediction (Pred_L0) is provided as the inter prediction mode, and there is no L1 prediction (Pred_L1) and bi-prediction (Pred_BI). Therefore, the processing related to L1 can be omitted.

When the flag predFlagLX [xA] [yA] indicating whether to perform LX prediction of the prediction block A and the flag predFlagLX [xB] [yB] indicating whether to perform LX prediction of the prediction block B are both not 0 (YES in step S2104), the LX reference index refIdxLXCol of the temporal merge candidate is set to the same value as the smaller of the LX reference index refIdxLX [xA] [yA] of the prediction block A and the LX reference index refIdxLX [xB] [yB] of the prediction block B (step S2105). Here, xA and yA are indexes indicating the position of the upper left pixel of the prediction block A in the picture, and xB and yB are indexes indicating the position of the upper left pixel of the prediction block B in the picture.

In the present embodiment, for the prediction block N (N = A, B), the flag predFlagL0 [xN] [yN] indicating whether to use L0 prediction and the flag predFlagL1 [xN] [yN] indicating whether to use L1 prediction of the prediction block N are both 0 in the following cases: when the prediction block N is outside the slice to be encoded or decoded and cannot be used; when the prediction block N comes after the prediction block to be encoded or decoded in encoding or decoding order, has therefore not yet been encoded or decoded, and cannot be used; or when the prediction mode PredMode of the prediction block N is intra prediction (MODE_INTRA). Here, xN and yN are indexes indicating the position of the upper left pixel of the prediction block N in the picture.
When the prediction mode PredMode of the prediction block N is inter prediction (MODE_INTER) and the inter prediction mode is L0 prediction (Pred_L0), the flag predFlagL0 [xN] [yN] indicating whether to use L0 prediction of the prediction block N is 1, and the flag predFlagL1 [xN] [yN] indicating whether to use L1 prediction is 0. When the inter prediction mode of the prediction block N is L1 prediction (Pred_L1), the flag predFlagL0 [xN] [yN] is 0 and the flag predFlagL1 [xN] [yN] is 1. When the inter prediction mode of the prediction block N is bi-prediction (Pred_BI), the flags predFlagL0 [xN] [yN] and predFlagL1 [xN] [yN] are both 1.
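The flag values can be summarised as a small lookup, shown here as a Python sketch. It relies on the statement, made for the intra prediction unit 207, that intra-predicted blocks store both flags as 0; the function name pred_flags, the boolean parameter usable, and the use of strings for the mode names are assumptions made for illustration.

```python
def pred_flags(usable, pred_mode, inter_pred_mode=None):
    """Return (predFlagL0, predFlagL1) for a neighbouring prediction block N."""
    if not usable or pred_mode == "MODE_INTRA":
        # Unusable or intra-predicted block: neither list is used.
        return (0, 0)
    # Inter-predicted block: the flags follow the inter prediction mode.
    table = {
        "Pred_L0": (1, 0),
        "Pred_L1": (0, 1),
        "Pred_BI": (1, 1),
    }
    return table[inter_pred_mode]
```

For example, pred_flags(True, "MODE_INTER", "Pred_BI") gives (1, 1), while any unusable or intra-predicted block gives (0, 0), so such blocks never contribute a reference index in steps S2104 to S2110.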

When the flag predFlagLX [xA] [yA] indicating whether to perform LX prediction of the prediction block A is not 0 and the flag predFlagLX [xB] [yB] indicating whether to perform LX prediction of the prediction block B is 0 (NO in step S2104, YES in step S2106), the LX reference index refIdxLXCol of the temporal merge candidate is set to the same value as the LX reference index refIdxLX [xA] [yA] of the prediction block A (step S2107). Here, xA and yA are indexes indicating the position of the upper left pixel of the prediction block A in the picture, and xB and yB are indexes indicating the position of the upper left pixel of the prediction block B in the picture.

When the flag predFlagLX [xA] [yA] indicating whether to perform LX prediction of the prediction block A is 0 and the flag predFlagLX [xB] [yB] indicating whether to perform LX prediction of the prediction block B is not 0 (NO in step S2104, NO in step S2106, YES in step S2108), the LX reference index refIdxLXCol of the temporal merge candidate is set to the same value as the LX reference index refIdxLX [xB] [yB] of the prediction block B (step S2109).

When the flag predFlagLX [xA] [yA] indicating whether to perform LX prediction of the prediction block A and the flag predFlagLX [xB] [yB] indicating whether to perform LX prediction of the prediction block B are both 0 (NO in step S2104, NO in step S2106, NO in step S2108), the LX reference index refIdxLXCol of the temporal merge candidate is set to a default value of 0 (step S2110).

The processing from step S2104 to step S2110 is performed for each of L0 and L1 (steps S2103 to S2111), and this reference index derivation processing is terminated.

Next, the method for deriving merge candidates at different times in step S103 of FIG. 14 will be described in detail. FIG. 17 is a flowchart illustrating the temporal merge candidate derivation processing procedure in step S103 of FIG. 14.

First, a picture colPic at a different time is derived from the slice type slice_type described in the slice header in units of slices and from the flag collocated_from_l0_flag, as shown in FIG. 26, which gives an example of a syntax rule common to encoding and decoding of a bitstream (step S3101). The flag collocated_from_l0_flag indicates whether the picture colPic at a different time, used when deriving a candidate for a predicted motion vector in the time direction or a merge candidate, is taken from the reference pictures registered in the L0 reference list or in the L1 reference list of the picture including the prediction block to be processed.

FIG. 18 is a flowchart for explaining the procedure for deriving the picture colPic at a different time in step S3101 of FIG. 17. When the slice type slice_type is a B slice and the flag collocated_from_l0_flag is 0 (YES in step S3201 and YES in step S3202), RefPicList1 [0], that is, the picture with reference index 0 in the reference list L1, becomes the picture colPic at a different time (step S3203). Otherwise, that is, when the slice type slice_type is a B slice and the flag collocated_from_l0_flag is 1 (YES in step S3201, NO in step S3202), or when the slice type slice_type is a P slice (NO in step S3201 and YES in step S3204), RefPicList0 [0], that is, the picture with reference index 0 in the reference list L0, becomes the picture colPic at a different time (step S3205).
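The selection of colPic can be sketched as follows. The function name, the string values "B" and "P" for the slice type, and the use of plain Python lists for the reference lists are assumptions for illustration. Returning None for any other slice type is also an assumption, since the text only covers B slices and P slices.

```python
def derive_col_pic(slice_type, collocated_from_l0_flag,
                   ref_pic_list0, ref_pic_list1):
    """Pick the picture colPic at a different time (steps S3201 to S3205)."""
    if slice_type == "B" and collocated_from_l0_flag == 0:
        # B slice with the flag cleared: first entry of reference list L1.
        return ref_pic_list1[0]
    if slice_type in ("B", "P"):
        # B slice with the flag set, or P slice: first entry of reference list L0.
        return ref_pic_list0[0]
    # Assumption: other slice types have no colPic.
    return None
```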

Next, returning to the flowchart of FIG. 17, a prediction block colPU at a different time is derived, and encoded information is acquired (step S3102).

FIG. 19 is a flowchart for explaining the procedure for deriving the prediction block colPU of the picture colPic at a different time in step S3102 of FIG. 17.

First, a prediction block located in the lower right (outside) of the same position as the processing target prediction block in the picture colPic at different times is set as a prediction block colPU at different times (step S3301). This prediction block corresponds to the prediction block T0 in FIG.

Next, the encoding information of the prediction block colPU at a different time is acquired (step S3302). When the prediction block colPU at a different time cannot be used, or when the prediction mode PredMode of the prediction block colPU at a different time is intra prediction (MODE_INTRA) (YES in step S3303 or YES in step S3304), the prediction block located at the upper left of the center of the same position as the processing target prediction block in the picture colPic at a different time is set as the prediction block colPU at a different time (step S3305). This prediction block corresponds to the prediction block T1 in FIG.
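The fallback from the lower right position to the central position can be sketched as below. The lookup callable `get_block`, the position arguments, and the dictionary representation of a prediction block are hypothetical; the sketch assumes `get_block` returns None for a block that cannot be used.

```python
def derive_col_pu(get_block, pos_t0, pos_t1):
    """Choose the prediction block colPU inside colPic (steps S3301 to S3305).

    get_block(pos) is a hypothetical lookup returning a dict with a
    "pred_mode" entry, or None when the block at pos cannot be used.
    """
    # First try the block at the lower right (outside) position, T0.
    col_pu = get_block(pos_t0)
    if col_pu is None or col_pu["pred_mode"] == "MODE_INTRA":
        # Fall back to the block at the upper left of the center, T1.
        col_pu = get_block(pos_t1)
    return col_pu
```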

Next, returning to the flowchart of FIG. 17, the L0 motion vector predictor mvL0Col derived from the prediction block of another picture at the same position as the prediction block to be encoded or decoded, and the flag availableFlagL0Col indicating whether the temporal merge candidate Col is valid, are derived (step S3103). Likewise, the L1 motion vector predictor mvL1Col and the flag availableFlagL1Col indicating whether the temporal merge candidate Col is valid are derived (step S3104). Further, when the flag availableFlagL0Col or the flag availableFlagL1Col is 1, the flag availableFlagCol indicating whether the temporal merge candidate Col is valid is set to 1.

FIG. 20 is a flowchart for explaining the process of deriving inter prediction information of the temporal merge candidate in step S3103 and step S3104 in FIG. 17. The list, L0 or L1, for which the temporal merge candidate is derived is denoted LX, and prediction using LX is denoted LX prediction. Hereinafter, unless otherwise noted, these terms are used in this sense. LX is L0 when the process is called as step S3103, the L0 derivation process of the temporal merge candidate, and LX is L1 when it is called as step S3104, the L1 derivation process of the temporal merge candidate.

When the prediction block colPU at a different time cannot be used or its prediction mode PredMode is intra prediction (MODE_INTRA) (NO in step S3401 or NO in step S3402), both the flag availableFlagLXCol and the flag predFlagLXCol are set to 0 (step S3403), the motion vector mvLXCol is set to (0, 0) (step S3404), and the process of deriving inter prediction information of the temporal merge candidate ends.

When the prediction block colPU is available and the prediction mode PredMode is not intra prediction (MODE_INTRA) (YES in step S3401 and YES in step S3402), mvCol, refIdxCol, and availableFlagCol are derived by the following procedure.

When the flag PredFlagL0 [xPCol] [yPCol] indicating whether the L0 prediction of the prediction block colPU is used is 0 (YES in step S3405), the prediction mode of the prediction block colPU is Pred_L1. Therefore, the motion vector mvCol is set to the same value as MvL1 [xPCol] [yPCol], which is the L1 motion vector of the prediction block colPU (step S3406), the reference index refIdxCol is set to the same value as the L1 reference index RefIdxL1 [xPCol] [yPCol] (step S3407), and the list ListCol is set to L1 (step S3408). Here, xPCol and yPCol are indexes indicating the position of the upper left pixel of the prediction block colPU in the picture colPic at a different time.

On the other hand, when the L0 prediction flag PredFlagL0 [xPCol] [yPCol] of the prediction block colPU is not 0 (NO in step S3405 in FIG. 20), it is determined whether the L1 prediction flag PredFlagL1 [xPCol] [yPCol] of the prediction block colPU is 0. When the L1 prediction flag PredFlagL1 [xPCol] [yPCol] of the prediction block colPU is 0 (YES in step S3409), the motion vector mvCol is set to the same value as MvL0 [xPCol] [yPCol], which is the L0 motion vector of the prediction block colPU (step S3410), the reference index refIdxCol is set to the same value as the L0 reference index RefIdxL0 [xPCol] [yPCol] (step S3411), and the list ListCol is set to L0 (step S3412).

When neither the L0 prediction flag PredFlagL0 [xPCol] [yPCol] nor the L1 prediction flag PredFlagL1 [xPCol] [yPCol] of the prediction block colPU is 0 (NO in step S3405, NO in step S3409), the inter prediction mode of the prediction block colPU is bi-prediction (Pred_BI), so one of the two motion vectors, L0 or L1, is selected (step S3413).

FIG. 21 is a flowchart showing a procedure for deriving inter prediction information of temporal merge candidates when the inter prediction mode of the prediction block colPU is bi-prediction (Pred_BI).

First, it is determined whether the POCs of all the pictures registered in all the reference lists are smaller than the POC of the current encoding or decoding target picture (step S3501). When the POCs of all the pictures registered in L0 and L1, which are all the reference lists of the prediction block colPU, are smaller than the POC of the current encoding or decoding target picture (YES in step S3501), the selection depends on LX. When LX is L0, that is, when the prediction vector candidate of the L0 motion vector of the encoding or decoding target picture is being derived (YES in step S3502), the L0 inter prediction information of the prediction block colPU is selected. When LX is L1, that is, when the prediction vector candidate of the L1 motion vector of the encoding or decoding target picture is being derived (NO in step S3502), the L1 inter prediction information of the prediction block colPU is selected. On the other hand, when at least one of the POCs of the pictures registered in the reference lists L0 and L1 of the prediction block colPU is larger than the POC of the current encoding or decoding target picture (NO in step S3501), the selection depends on the flag collocated_from_l0_flag. When the flag collocated_from_l0_flag is 0 (YES in step S3503), the L0 inter prediction information of the prediction block colPU is selected. When the flag collocated_from_l0_flag is 1 (NO in step S3503), the L1 inter prediction information of the prediction block colPU is selected.

When the L0 inter prediction information of the prediction block colPU is selected (YES in step S3502 or YES in step S3503), the motion vector mvCol is set to the same value as MvL0 [xPCol] [yPCol] (step S3504), the reference index refIdxCol is set to the same value as RefIdxL0 [xPCol] [yPCol] (step S3505), and the list ListCol is set to L0 (step S3506).

When the L1 inter prediction information of the prediction block colPU is selected (NO in step S3502 or NO in step S3503), the motion vector mvCol is set to the same value as MvL1 [xPCol] [yPCol] (step S3507), the reference index refIdxCol is set to the same value as RefIdxL1 [xPCol] [yPCol] (step S3508), and the list ListCol is set to L1 (step S3509).
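The choice of which list of colPU supplies mvCol, refIdxCol, and ListCol (FIGS. 20 and 21) can be summarized in a short sketch. The dictionary keys, the boolean argument `all_ref_pocs_smaller` standing for the result of step S3501, and the function name are assumptions for illustration; the sketch assumes colPU is available and inter coded.

```python
def select_col_motion(col_pu, lx, all_ref_pocs_smaller, collocated_from_l0_flag):
    """Return (mvCol, refIdxCol, ListCol) for an inter-coded colPU.

    col_pu is a hypothetical dict with keys pred_flag_l0, pred_flag_l1,
    mv_l0, mv_l1, ref_idx_l0, ref_idx_l1. lx is "L0" or "L1".
    """
    if col_pu["pred_flag_l0"] == 0:
        list_col = "L1"            # colPU uses L1 prediction only
    elif col_pu["pred_flag_l1"] == 0:
        list_col = "L0"            # colPU uses L0 prediction only
    elif all_ref_pocs_smaller:
        list_col = lx              # bi-prediction: follow the list being derived
    else:
        # bi-prediction with a reference later than the current picture:
        # flag value 0 selects L0 and flag value 1 selects L1, as in the text.
        list_col = "L0" if collocated_from_l0_flag == 0 else "L1"
    suffix = list_col.lower()
    return col_pu["mv_" + suffix], col_pu["ref_idx_" + suffix], list_col
```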

Referring back to FIG. 20, if inter prediction information can be acquired from the prediction block colPU, both the flag availableFlagLXCol and the flag predFlagLXCol are set to 1 (step S3414).

Subsequently, the motion vector mvCol is scaled to obtain the LX motion vector mvLXCol of the temporal merge candidate (step S3415). The motion vector scaling calculation processing procedure will be described with reference to FIGS. 22 and 23.

FIG. 22 is a flowchart showing the motion vector scaling calculation processing procedure in step S3415 of FIG. 20.

The inter-picture distance td is derived by subtracting the POC of the reference picture corresponding to the reference index refIdxCol referenced in the list ListCol of the prediction block colPU from the POC of the picture colPic at a different time (step S3601). Note that when the reference picture referenced in the list ListCol of the prediction block colPU is earlier in the display order than the picture colPic at a different time, the inter-picture distance td is a positive value, and when the reference picture referenced in the list ListCol of the prediction block colPU is later in the display order than the picture colPic at a different time, the inter-picture distance td is a negative value.
td = (POC of picture colPic at different time) - (POC of reference picture referenced by list ListCol of prediction block colPU)

The inter-picture distance tb is derived by subtracting the POC of the reference picture corresponding to the LX reference index of the temporal merge candidate derived in step S102 of FIG. 14 from the POC of the current encoding or decoding target picture (step S3602). When the reference picture referenced in the list LX of the current encoding or decoding target picture is earlier in the display order than the current encoding or decoding target picture, the inter-picture distance tb is a positive value. When the reference picture referenced in the list LX of the current encoding or decoding target picture is later in the display order, the inter-picture distance tb is a negative value.
tb = (POC of current encoding or decoding target picture) - (POC of reference picture corresponding to LX reference index of temporal merge candidate)

Subsequently, the inter-picture distances td and tb are compared (step S3603). If the inter-picture distances td and tb are equal (YES in step S3603), the LX motion vector mvLXCol of the temporal merge candidate is set to the same value as the motion vector mvCol (step S3604), and the scaling calculation process is terminated.
mvLXCol = mvCol

On the other hand, when the inter-picture distances td and tb are not equal (NO in step S3603), the scaling calculation is performed by multiplying mvCol by the scaling coefficient tb / td according to the following equation (step S3605), and the scaled LX motion vector mvLXCol of the temporal merge candidate is obtained.
mvLXCol = tb / td * mvCol

Further, FIG. 23 shows an example in which the scaling operation in step S3605 is performed with an integer precision operation. The processing from step S3606 to step S3608 in FIG. 23 corresponds to the processing in step S3605 in FIG.

First, similarly to the flowchart of FIG. 22, an inter-picture distance td and an inter-picture distance tb are derived (steps S3601 and S3602).

Subsequently, the inter-picture distances td and tb are compared (step S3603). If the inter-picture distances td and tb are equal (YES in step S3603), the LX motion vector mvLXCol of the temporal merge candidate is set to the same value as the motion vector mvCol, as in the flowchart of FIG. 22 (step S3604), and this scaling calculation process is terminated.
mvLXCol = mvCol

On the other hand, if the inter-picture distances td and tb are not equal (NO in step S3603), a variable tx is derived from the following equation (step S3606).
tx = (16384 + Abs (td / 2)) / td

Subsequently, the scaling coefficient DistScaleFactor is derived by the following equation (step S3607).
DistScaleFactor = (tb * tx + 32) >> 6

Subsequently, a scaled temporal merge candidate LX motion vector mvLXCol is obtained by the following equation (step S3608).
mvLXCol = ClipMv (Sign (DistScaleFactor * mvCol) * ((Abs (DistScaleFactor * mvCol) + 127) >> 8))
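The integer-precision scaling of steps S3603 to S3608 can be traced with a minimal sketch. Two points are assumptions rather than statements of the embodiment: divisions are taken to truncate toward zero, and ClipMv, whose range is not given in this passage, is modeled as a clamp to the signed 16-bit range. The helper names are hypothetical.

```python
def _div_trunc(a, b):
    """Integer division truncating toward zero (assumed division behavior)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q

def _sign(v):
    return (v > 0) - (v < 0)

def _clip_mv(v):
    # Assumption: ClipMv clamps to the signed 16-bit range.
    return max(-32768, min(32767, v))

def scale_mv(mv_col, td, tb):
    """Scale mvCol = (x, y) by roughly tb / td using integer arithmetic."""
    if td == tb:
        return mv_col                                    # step S3604
    tx = _div_trunc(16384 + abs(_div_trunc(td, 2)), td)  # step S3606
    dist_scale_factor = (tb * tx + 32) >> 6              # step S3607
    def scale(c):                                        # step S3608
        p = dist_scale_factor * c
        return _clip_mv(_sign(p) * ((abs(p) + 127) >> 8))
    return (scale(mv_col[0]), scale(mv_col[1]))
```

With td = 2 and tb = 1, tx is 8192 and DistScaleFactor is 128, so a vector (8, -4) is halved to (4, -2), which agrees with the ratio tb / td of the non-integer form.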

Next, a method for registering the merge candidates in the merge candidate list in step S104 of FIG. 14 will be described in detail. FIG. 24 is a flowchart showing the procedure for registering merge candidates in the merge candidate list. In this method, priorities are assigned and the merge candidates are registered in the merge candidate list mergeCandList in descending order of priority, thereby reducing the code amount of the merge index merge_idx [x0] [y0]. The code amount is reduced by placing elements with higher priority toward the front of the merge candidate list. For example, when there are five elements in the merge candidate list mergeCandList, index 0 of the merge candidate list is coded as "0", index 1 as "10", index 2 as "110", index 3 as "1110", and index 4 as "11110". The code amount representing index 0 is then 1 bit, and the code amount is reduced by registering at index 0 an element considered to have a high occurrence frequency.
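The codeword assignment in the five-element example above follows a simple pattern of leading ones closed by a zero. A minimal sketch, with the function name chosen only for illustration:

```python
def merge_index_codeword(merge_idx: int) -> str:
    """Codeword from the five-element example: merge_idx ones, then a zero."""
    return "1" * merge_idx + "0"

# Codeword length grows by one bit per position in the list.
codewords = [merge_index_codeword(i) for i in range(5)]
```

Because the length of the codeword is merge_idx + 1 bits, a candidate at index 0 costs 1 bit and a candidate at index 4 costs 5 bits, which is why frequently chosen candidates belong at the front.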

The merge candidate list mergeCandList has a list structure, and is provided with a storage area that stores, as elements, a merge index indicating the location in the merge candidate list and the merge candidate corresponding to that index. The merge index numbers start from 0, and merge candidates are stored in the storage area of the merge candidate list mergeCandList. In the subsequent processing, the prediction block that is the merge candidate of merge index i registered in the merge candidate list mergeCandList is represented by mergeCandList [i], and is distinguished from the merge candidate list mergeCandList itself by the array notation.

First, when availableFlagA is 1 (YES in step S4101), merge candidate A is registered at the head of the merge candidate list mergeCandList (step S4102).
Subsequently, when availableFlagB is 1 (YES in step S4103), merge candidate B is registered at the end of the merge candidate list mergeCandList (step S4104).
Subsequently, when availableFlagC is 1 (YES in step S4105), the merge candidate C is registered at the end of the merge candidate list mergeCandList (step S4106).
Subsequently, when availableFlagD is 1 (YES in step S4107), the merge candidate D is registered at the end of the merge candidate list mergeCandList (step S4108).
Subsequently, when availableFlagE is 1 (YES in step S4109), the merge candidate E is registered at the end of the merge candidate list mergeCandList (step S4110).
Subsequently, when availableFlagCol is 1 (YES in step S4111), the merge candidate Col is registered at the end of the merge candidate list mergeCandList (step S4112).
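The registration order of steps S4101 to S4112 can be sketched as follows. The dictionary of availability flags and the function name are assumptions for illustration; candidates are represented only by their names.

```python
# Fixed priority order in which candidates are examined.
REGISTRATION_ORDER = ["A", "B", "C", "D", "E", "Col"]

def build_merge_cand_list(available_flags):
    """Register each available candidate at the end of the list, in priority order."""
    merge_cand_list = []
    for name in REGISTRATION_ORDER:
        if available_flags.get(name, 0) == 1:
            merge_cand_list.append(name)
    return merge_cand_list
```

The position of a candidate in the returned list is its merge index, so an unavailable candidate shifts every later candidate one index forward.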

In the merge mode, the prediction block A adjacent to the left and the prediction block B adjacent to the upper side often move together with the prediction block to be encoded or decoded. Therefore, when the inter prediction information of the prediction blocks A and B can be acquired, the merge candidates A and B are registered toward the front of the merge candidate list in preference to the other merge candidates C, D, E, and Col.

In FIG. 12, the encoding information selection unit 137 of the inter prediction information deriving unit 104 of the moving image encoding apparatus selects a merge candidate from the merge candidates registered in the merge candidate list, and supplies the merge index and the inter prediction information of the merge candidate corresponding to the merge index to the motion compensation prediction unit 105.

In the selection of merge candidates, the same method as in the prediction method determination unit 107 can be used. For each merge candidate, the code amount of the coding information and of the residual signal, and the coding distortion between the predicted image signal and the image signal, are derived, and the merge candidate that yields the smallest generated code amount and coding distortion is determined. For each merge candidate, entropy coding is performed on the merge index syntax element merge_idx, which is the coding information in the merge mode, and the code amount of the coding information is calculated. Further, for each merge candidate, the code amount of the encoded prediction residual signal is calculated, where the prediction residual signal is the difference between the predicted image signal, motion-compensated according to the inter prediction information of the merge candidate by the same method as in the motion compensation prediction unit 105, and the encoding target image signal supplied from the image memory 101. The total generated code amount, obtained by adding the code amount of the coding information, that is, of the merge index, and the code amount of the prediction residual signal, is calculated as an evaluation value.

Further, after encoding such a prediction residual signal, it is decoded for distortion amount evaluation, and the encoding distortion is calculated as a ratio representing an error from the original image signal caused by the encoding. By comparing the total generated code amount and the encoding distortion for each merge candidate, encoding information with a small generated code amount and encoding distortion is determined. A merge index corresponding to the determined encoding information is encoded as a flag merge_idx represented by a second syntax pattern in units of prediction blocks.
The generated code amount calculated here is preferably obtained by simulating the encoding process, but a simplified approximation or a rough estimate may also be used.
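One way to compare the total generated code amount and the encoding distortion of the merge candidates is sketched below. The text does not state how the two quantities are combined, so the weighted sum with a multiplier `lam` is purely an assumption (a Lagrangian-style cost commonly used in encoders), and the function name and input layout are hypothetical.

```python
def select_merge_candidate(measurements, lam=1.0):
    """Return the merge index with the smallest assumed cost.

    measurements[i] = (index_bits, residual_bits, distortion) for merge index i.
    Assumed cost: distortion + lam * (index_bits + residual_bits).
    """
    best_idx, best_cost = None, None
    for merge_idx, (index_bits, residual_bits, distortion) in enumerate(measurements):
        total_bits = index_bits + residual_bits   # total generated code amount
        cost = distortion + lam * total_bits
        if best_cost is None or cost < best_cost:
            best_idx, best_cost = merge_idx, cost
    return best_idx
```

With `lam` set to 0 the choice depends on distortion alone, and larger values of `lam` favor candidates that cost fewer bits.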

On the other hand, in FIG. 13, the encoding information selection unit 237 of the inter prediction information deriving unit 205 of the moving image decoding device selects the merge candidate corresponding to the supplied merge index from among the merge candidates registered in the merge candidate list, supplies the inter prediction information of that merge candidate to the motion compensation prediction unit 206, and stores it in the encoded information storage memory 210.

In the present embodiment described above, the final number of merge candidates finalNumMergeCand registered in the merge candidate list mergeCandList is set for each slice. Hereinafter, this embodiment will be described by dividing it into several examples. First, Example 1 of the present embodiment will be described. In Example 1 of this embodiment, a final merge candidate number finalNumMergeCand that is common to the encoding side and the decoding side is defined for each slice type. As in MPEG-4 AVC / H.264, for an apparatus, software, or bitstream conforming to the present embodiment, it is possible to define a profile, which represents a set of processing functions defined by purpose and application, and a level, which represents processing load and processing capability, such as the amount of memory used, mainly related to image size and frame rate. The profile and level indicate the performance of the apparatus or software, or the performance required to decode the bitstream. The value of the final merge candidate number finalNumMergeCand may be defined for each slice type according to either the profile or the level, or the combination of profile and level, or it may be defined for each slice type regardless of profile or level. For example, in a profile configured with a simple function of encoding or decoding using only I slices and P slices, the final merge candidate number finalNumMergeCand of P slices is defined as 3. In a profile configured with a complicated and efficient coding function that uses B slices in addition to I slices and P slices, the final merge candidate number finalNumMergeCand of P slices and of B slices may both be defined as the same number. However, by defining the final merge candidate number finalNumMergeCand of P slices as 3, which is smaller than the final merge candidate number finalNumMergeCand of B slices, the code amount of the merge index of P slices can be reduced, and the amount of processing involved in encoding or decoding the merge index can also be reduced.

When the slice type is a P slice, which can use only L0 prediction, inter prediction is less likely to be selected than in a B slice, which can use L0 prediction, L1 prediction, and bi-prediction, so merge candidates to be registered in the merge candidate list are harder to obtain. In addition, since the inter prediction information of the merge candidates is likely to be identical, the number of merge candidates registered in the merge candidate list is likely to be small. Therefore, in a P slice, even if the final number of merge candidates is set smaller than in a B slice, the encoding efficiency does not decrease as much as it would in a B slice, the code amount of the merge index is suppressed, and the amount of processing involved in encoding or decoding the merge index can be reduced.
One reason for encoding or decoding with P slices rather than B slices with high encoding efficiency is that P slices require less processing. In particular, since a profile configured with a simple function of encoding or decoding using only I slices and P slices is set for encoding or decoding with a small amount of processing, the final merge candidate number of P slices The effect of reducing the amount of processing related to encoding or decoding of the merge index by setting finalNumMergeCand to a small number is great.

FIG. 25 is a flowchart for explaining the procedure for setting the final merge candidate number finalNumMergeCand which is common on the encoding side and the decoding side according to the method of Example 1 of the present embodiment. The final merge candidate number finalNumMergeCand is set by the header information setting unit 117 in the encoding device, and is set by the first encoded bit string decoding unit 212 in the decoding device. When the slice type slice_type is a P slice (YES in step S201 in FIG. 25), the final merge candidate number finalNumMergeCand is set to the specified number of P slices (3 in the present embodiment) (step S203 in FIG. 25). When the slice type slice_type is a B slice (NO in step S201 in FIG. 25, YES in step S202), the final merge candidate number finalNumMergeCand is set to the prescribed number of B slices (5 in this embodiment) (step in FIG. 25). S204). When the slice type slice_type is an I slice (NO in step S201 in FIG. 25, NO in step S202), the final merge candidate number finalNumMergeCand is set to 0 (step S205 in FIG. 25).
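The per-slice-type setting of FIG. 25 can be sketched as a small function shared by both sides. The constant names and the string encoding of the slice type are assumptions for illustration; the values 3 and 5 are the prescribed numbers stated for this embodiment.

```python
NUM_MERGE_CAND_P = 3  # prescribed number for P slices in this embodiment
NUM_MERGE_CAND_B = 5  # prescribed number for B slices in this embodiment

def final_num_merge_cand(slice_type: str) -> int:
    """Set finalNumMergeCand from the slice type (steps S201 to S205)."""
    if slice_type == "P":
        return NUM_MERGE_CAND_P   # step S203
    if slice_type == "B":
        return NUM_MERGE_CAND_B   # step S204
    return 0                      # I slice, step S205
```

Since the encoder and the decoder evaluate the same rule on the same slice_type, no additional syntax element is needed to keep the two sides in agreement.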

Like the final merge candidate number finalNumMergeCand, the upper limit value maxNumSpatialMergeCand of the number of spatial merge candidates may be defined for each slice type according to either the profile or the level, or the combination of the profile and the level, or it may be defined for each slice type regardless of the profile or level. Weighing coding efficiency against processing amount, in cases (profile, level, or slice type) where coding efficiency is important, the upper limit value maxNumSpatialMergeCand of the number of spatial merge candidates is defined as a large value, and in cases where processing amount is important, it is defined as a small value.

Next, Example 2 of the present embodiment will be described. In Example 2 of the present embodiment, a syntax element num_merge_cand indicating the final merge candidate number finalNumMergeCand is set in the slice header in units of slices, as shown in FIG. 26, which gives an example of a syntax rule common to encoding and decoding of a bitstream. However, the upper limit value of the number of final merge candidates is defined for each slice type. Depending on the combination of profile and level, an upper limit value for the number of final merge candidates may be defined for each slice type, or an upper limit value for the number of final merge candidates may be defined for each slice type regardless of the profile or level. For example, in a profile configured with a simple function of encoding or decoding using only I slices and P slices, the upper limit value of the number of final merge candidates for P slices is defined as 3. In a profile configured with a complicated and efficient coding function that uses B slices in addition to I slices and P slices, the upper limit values of the number of final merge candidates for P slices and B slices may both be defined as the same number 5, or the upper limit value of the number of final merge candidates for P slices may be defined as a number smaller than that for B slices.

FIG. 27 is a flowchart for explaining the procedure for setting the final merge candidate number finalNumMergeCand on the encoding side according to the method of Example 2 of the present embodiment, and FIG. 28 is a flowchart for explaining the procedure for setting the final merge candidate number finalNumMergeCand on the decoding side according to the method of Example 2 of the present embodiment. The final merge candidate number finalNumMergeCand is set by the header information setting unit 117 in the encoding device, and is set by the first encoded bit string decoding unit 212 in the decoding device. On the encoding side, when the slice type slice_type is a P slice (YES in step S201 in FIG. 27), the final merge candidate number finalNumMergeCand is set to the same value as the upper limit value defined for P slices or to a value not exceeding that upper limit (3 in this embodiment) (step S206 in FIG. 27). When the slice type slice_type is a B slice (NO in step S201 in FIG. 27, YES in step S202), the final merge candidate number finalNumMergeCand is set to the same value as the upper limit value defined for B slices or to a value not exceeding that upper limit (5 in this embodiment) (step S207 in FIG. 27). When the slice type slice_type is an I slice (NO in step S201 in FIG. 27, NO in step S202), the final merge candidate number finalNumMergeCand is set to 0 (step S205 in FIG. 27). Further, the syntax element num_merge_cand indicating the final merge candidate number finalNumMergeCand set in slice units is entropy-coded (step S208 in FIG. 27). The decoding side decodes the bitstream and derives the final merge candidate number finalNumMergeCand from the syntax element num_merge_cand (step S209 in FIG. 28).

Note that the upper limit value maxNumSpatialMergeCand of the number of spatial merge candidates can be defined together with the final merge candidate number finalNumMergeCand according to the value of the syntax element max_num_spatial_merge_cand. In this case, the final merge candidate number finalNumMergeCand and the upper limit value maxNumSpatialMergeCand of the number of spatial merge candidates may be defined as the same value or as different values. Depending on the profile, level, or slice type, and weighing coding efficiency against processing amount, the upper limit value maxNumSpatialMergeCand of the number of spatial merge candidates is defined as a large value in cases where coding efficiency is important, and as a small value in cases where processing amount is important.

Alternatively, as shown in FIG. 26, a syntax element max_num_spatial_merge_cand indicating the upper limit value maxNumSpatialMergeCand of the number of spatial merge candidates can be set in the slice header in units of slices. The encoding side encodes the syntax element max_num_spatial_merge_cand, and the decoding side performs a decoding process based on the upper limit value maxNumSpatialMergeCand of the number of spatial merge candidates according to the value of the syntax element max_num_spatial_merge_cand obtained by decoding the bitstream. In this case, by setting the upper limit value maxNumSpatialMergeCand of the number of spatial merge candidates within the range of the processing capacity on the encoding side, the processing amount for spatial merge candidate derivation and merge candidate identity determination can be controlled.

The moving image encoded stream output from the moving image encoding apparatus of the embodiment described above has a specific data format so that it can be decoded according to the encoding method used in the embodiment. Therefore, the moving picture decoding apparatus corresponding to the moving picture encoding apparatus can decode the encoded stream of this specific data format.

When a wired or wireless network is used to exchange an encoded stream between a moving image encoding device and a moving image decoding device, the encoded stream may be converted into a data format suitable for the transmission form of the communication path and then transmitted. In that case, a moving image transmission apparatus and a moving image receiving apparatus are provided. The moving image transmission apparatus converts the encoded stream output from the moving image encoding apparatus into encoded data in a data format suitable for the transmission form of the communication channel and transmits the encoded data to the network. The moving image receiving apparatus receives the encoded data from the network, restores the encoded stream, and supplies the encoded stream to the moving image decoding apparatus.

The moving image transmitting apparatus includes a memory that buffers the encoded stream output from the moving image encoding apparatus, a packet processing unit that packetizes the encoded stream, and a transmission unit that transmits the packetized encoded data via the network. The moving image receiving apparatus includes a receiving unit that receives the packetized encoded data via the network, a memory that buffers the received encoded data, and a packet processing unit that performs packet processing on the encoded data to generate an encoded stream and provides it to the moving image decoding apparatus.
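The round trip through the two packet processing units can be illustrated with a minimal sketch. The packet layout used here, a sequence number paired with a fixed-size payload, is purely an assumption for illustration; the embodiment does not specify a packet format, and the function names are hypothetical.

```python
def packetize(encoded_stream: bytes, payload_size: int = 4):
    """Split a buffered encoded stream into (sequence number, payload) packets."""
    offsets = range(0, len(encoded_stream), payload_size)
    return [(seq, encoded_stream[off:off + payload_size])
            for seq, off in enumerate(offsets)]

def depacketize(packets) -> bytes:
    """Restore the encoded stream from received packets, in sequence order."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)
```

Sorting by sequence number on the receiving side means the stream is restored even when the buffered packets arrived out of order.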

The above processing relating to encoding and decoding can be realized as transmission, storage, and reception apparatuses using hardware, and can also be realized by firmware stored in a ROM (Read Only Memory), a flash memory, or the like, or by software running on a computer or the like. The firmware program and the software program can be provided recorded on a computer-readable recording medium, provided from a server through a wired or wireless network, or provided as a data broadcast of terrestrial or satellite digital broadcasting.

The present invention has been described above based on the embodiments. The embodiments are examples, and it will be understood by those skilled in the art that various modifications can be made to the combinations of the respective constituent elements and processes, and that such modifications are also within the scope of the present invention.

101 image memory, 117 header information setting unit, 102 motion vector detection unit, 103 differential motion vector calculation unit, 104 inter prediction information derivation unit, 105 motion compensation prediction unit, 106 intra prediction unit, 107 prediction method determination unit, 108 residual signal generation unit, 109 orthogonal transform / quantization unit, 118 first encoded bit sequence generation unit, 110 second encoded bit sequence generation unit, 111 third encoded bit sequence generation unit, 112 multiplexing unit, 113 dequantization / inverse orthogonal transform unit, 114 decoded image signal superimposing unit, 115 encoding information storage memory, 116 decoded image memory, 130 spatial merge candidate generation unit, 131 time merge candidate reference index deriving unit, 132 time merge candidate generation unit, 133 merge candidate registration unit, 134 merge candidate identity determination unit, 135 merge candidate number limiting unit, 136 merge candidate supplementing unit, 137 encoded information selecting unit, 201 separating unit, 212 first encoded bit sequence decoding unit, 202 second encoded bit sequence decoding unit, 203 third encoded bit sequence decoding unit, 204 motion vector calculation unit, 205 inter prediction information derivation unit, 206 motion compensation prediction unit, 207 intra prediction unit, 208 dequantization / inverse orthogonal transform unit, 209 decoded image signal superimposing unit, 210 encoding information storage memory, 211 decoded image memory, 230 spatial merge candidate generation unit, 231 time merge candidate reference index deriving unit, 232 temporal merge candidate generation unit, 233 merge candidate registration unit, 234 merge candidate identity determination unit, 235 merge candidate number limiting unit, 236 merge candidate refill unit, 237 encoded information selection unit.

The present invention can be used for a moving picture coding technique using motion compensation prediction.

Claims

A moving picture encoding apparatus that encodes a moving picture using inter prediction in units of blocks obtained by dividing each picture, the apparatus comprising:
A prediction information deriving unit that derives inter prediction information candidates from inter prediction information of an encoded prediction block adjacent to a prediction block to be encoded in the same picture as the prediction block to be encoded, and from inter prediction information of a prediction block in an encoded picture different from that of the prediction block to be encoded;
A determination unit that determines, from the derived inter prediction information candidates, the inter prediction information candidate to be used for inter prediction of the prediction block to be encoded;
A first encoding unit that encodes a syntax element indicating the number of inter prediction information candidates; and
A second encoding unit that encodes, based on the number of inter prediction information candidates, an index indicating the inter prediction information candidate determined by the determination unit.
2. The moving picture coding apparatus according to claim 1, wherein the maximum value indicated by the index is a value of the number of candidates of the inter prediction information minus one.
A moving picture encoding apparatus that encodes a moving picture using motion compensated prediction in units of blocks obtained by dividing each picture of the moving picture, the apparatus comprising:
A prediction information deriving unit that derives inter prediction information candidates from inter prediction information of a prediction block adjacent to a prediction block to be encoded, or of a prediction block located at or near the same position as the prediction block to be encoded in an encoded picture temporally different from that of the prediction block to be encoded;
A candidate number limiting unit that limits the number of inter prediction information candidates according to at least one of a profile indicating a set of processing functions and a level indicating decoding processing capability;
A motion compensated prediction unit that selects one inter prediction information candidate from the inter prediction information candidates whose number has been limited, and performs inter prediction on the prediction block to be encoded using the selected inter prediction information candidate; and
An encoding unit that encodes a syntax element indicating the number of inter prediction information candidates used for limiting the number of candidates.
A moving picture encoding method for encoding a moving picture using inter prediction in units of blocks obtained by dividing each picture, the method comprising:
A prediction information deriving step of deriving inter prediction information candidates from inter prediction information of an encoded prediction block adjacent to a prediction block to be encoded in the same picture as the prediction block to be encoded, and from inter prediction information of a prediction block in an encoded picture different from that of the prediction block to be encoded;
A determination step of determining, from the derived inter prediction information candidates, the inter prediction information candidate to be used for inter prediction of the prediction block to be encoded;
A first encoding step of encoding a syntax element indicating the number of inter prediction information candidates; and
A second encoding step of encoding, based on the number of inter prediction information candidates, an index indicating the inter prediction information candidate determined in the determination step.
5. The moving picture coding method according to claim 4, wherein the maximum value indicated by the index is a value of the number of candidates of the inter prediction information minus one.
A moving picture encoding program for encoding a moving picture using inter prediction in units of blocks obtained by dividing each picture, the program causing a computer to execute:
A prediction information deriving step of deriving inter prediction information candidates from inter prediction information of an encoded prediction block adjacent to a prediction block to be encoded in the same picture as the prediction block to be encoded, and from inter prediction information of a prediction block in an encoded picture different from that of the prediction block to be encoded;
A determination step of determining, from the derived inter prediction information candidates, the inter prediction information candidate to be used for inter prediction of the prediction block to be encoded;
A first encoding step of encoding a syntax element indicating the number of inter prediction information candidates; and
A second encoding step of encoding, based on the number of inter prediction information candidates, an index indicating the inter prediction information candidate determined in the determination step.
The moving picture encoding program according to claim 6, wherein the maximum value indicated by the index is a value of the number of candidates of the inter prediction information minus one.
A transmission apparatus comprising:
A packet processing unit that packetizes an encoded bit sequence, encoded by a moving picture encoding method that encodes a moving picture using inter prediction in units of blocks obtained by dividing each picture, to obtain encoded data; and
A transmission unit that transmits the packetized encoded data,
Wherein the moving picture encoding method includes:
A prediction information deriving step of deriving inter prediction information candidates from inter prediction information of an encoded prediction block adjacent to a prediction block to be encoded in the same picture as the prediction block to be encoded, and from inter prediction information of a prediction block in an encoded picture different from that of the prediction block to be encoded;
A determination step of determining, from the derived inter prediction information candidates, the inter prediction information candidate to be used for inter prediction of the prediction block to be encoded;
A first encoding step of encoding a syntax element indicating the number of inter prediction information candidates; and
A second encoding step of encoding, based on the number of inter prediction information candidates, an index indicating the inter prediction information candidate determined in the determination step.
The transmitting apparatus according to claim 8, wherein the maximum value indicated by the index is a value of the number of candidates for the inter prediction information minus one.
A transmission method comprising:
A packet processing step of packetizing an encoded bit sequence, encoded by a moving picture encoding method that encodes a moving picture using inter prediction in units of blocks obtained by dividing each picture, to obtain encoded data; and
A transmission step of transmitting the packetized encoded data,
Wherein the moving picture encoding method includes:
A prediction information deriving step of deriving inter prediction information candidates from inter prediction information of an encoded prediction block adjacent to a prediction block to be encoded in the same picture as the prediction block to be encoded, and from inter prediction information of a prediction block in an encoded picture different from that of the prediction block to be encoded;
A determination step of determining, from the derived inter prediction information candidates, the inter prediction information candidate to be used for inter prediction of the prediction block to be encoded;
A first encoding step of encoding a syntax element indicating the number of inter prediction information candidates; and
A second encoding step of encoding, based on the number of inter prediction information candidates, an index indicating the inter prediction information candidate determined in the determination step.
The transmission method according to claim 10, wherein the maximum value indicated by the index is a value of the number of candidates of the inter prediction information minus one.
A transmission program causing a computer to execute:
A packet processing step of packetizing an encoded bit sequence, encoded by a moving picture encoding method that encodes a moving picture using inter prediction in units of blocks obtained by dividing each picture, to obtain encoded data; and
A transmission step of transmitting the packetized encoded data,
Wherein the moving picture encoding method includes:
A prediction information deriving step of deriving inter prediction information candidates from inter prediction information of an encoded prediction block adjacent to a prediction block to be encoded in the same picture as the prediction block to be encoded, and from inter prediction information of a prediction block in an encoded picture different from that of the prediction block to be encoded;
A determination step of determining, from the derived inter prediction information candidates, the inter prediction information candidate to be used for inter prediction of the prediction block to be encoded;
A first encoding step of encoding a syntax element indicating the number of inter prediction information candidates; and
A second encoding step of encoding, based on the number of inter prediction information candidates, an index indicating the inter prediction information candidate determined in the determination step.
The transmission program according to claim 12, wherein the maximum value indicated by the index is a value of the number of candidates of the inter prediction information minus one.
A moving picture decoding apparatus that decodes an encoded bit sequence in which a moving picture is encoded using inter prediction in units of blocks obtained by dividing each picture, the apparatus comprising:
A prediction information deriving unit that derives inter prediction information candidates from inter prediction information of a decoded prediction block close to a prediction block to be decoded in the same picture as the prediction block to be decoded, and from inter prediction information of a prediction block in a decoded picture different from that of the prediction block to be decoded;
A first decoding unit that decodes a syntax element indicating the number of inter prediction information candidates and obtains the number of inter prediction information candidates;
A second decoding unit that decodes, based on the number of inter prediction information candidates obtained by the first decoding unit, an index indicating the inter prediction information candidate to be used for inter prediction of the prediction block to be decoded; and
A selection unit that selects the inter prediction information candidate indicated by the index from the inter prediction information candidates derived by the prediction information deriving unit.
15. The moving picture decoding apparatus according to claim 14, wherein the maximum value indicated by the index is a value of the number of candidates for the inter prediction information minus one.
A moving picture decoding apparatus that decodes an encoded bit sequence in which a moving picture is encoded using motion compensated prediction in units of blocks obtained by dividing each picture of the moving picture, the apparatus comprising:
A prediction information deriving unit that derives inter prediction information candidates from inter prediction information of a prediction block adjacent to a prediction block to be decoded, or of a prediction block located at or near the same position as the prediction block to be decoded in a decoded picture temporally different from that of the prediction block to be decoded;
A first decoding unit that decodes a syntax element indicating the number of inter prediction information candidates and obtains the number of inter prediction information candidates;
A candidate number limiting unit that limits the number of inter prediction information candidates using the number of inter prediction information candidates obtained by the first decoding unit;
A second decoding unit that decodes, based on the number of inter prediction information candidates obtained by the first decoding unit, an index indicating the inter prediction information candidate to be used as the inter prediction information of the prediction block to be decoded; and
A motion compensated prediction unit that selects the inter prediction information candidate indicated by the decoded index from the inter prediction information candidates whose number has been limited, and performs inter prediction on the prediction block to be decoded using the selected inter prediction information candidate.
A moving picture decoding method for decoding an encoded bit sequence in which a moving picture is encoded using inter prediction in units of blocks obtained by dividing each picture, the method comprising:
A prediction information deriving step of deriving inter prediction information candidates from inter prediction information of a decoded prediction block close to a prediction block to be decoded in the same picture as the prediction block to be decoded, and from inter prediction information of a prediction block in a decoded picture different from that of the prediction block to be decoded;
A first decoding step of decoding a syntax element indicating the number of inter prediction information candidates and obtaining the number of inter prediction information candidates;
A second decoding step of decoding, based on the number of inter prediction information candidates obtained in the first decoding step, an index indicating the inter prediction information candidate to be used for inter prediction of the prediction block to be decoded; and
A selection step of selecting the inter prediction information candidate indicated by the index from the inter prediction information candidates derived in the prediction information deriving step.
18. The moving picture decoding method according to claim 17, wherein the maximum value indicated by the index is a value of the number of candidates of the inter prediction information minus one.
A moving picture decoding program for decoding an encoded bit sequence in which a moving picture is encoded using inter prediction in units of blocks obtained by dividing each picture, the program causing a computer to execute:
A prediction information deriving step of deriving inter prediction information candidates from inter prediction information of a decoded prediction block close to a prediction block to be decoded in the same picture as the prediction block to be decoded, and from inter prediction information of a prediction block in a decoded picture different from that of the prediction block to be decoded;
A first decoding step of decoding a syntax element indicating the number of inter prediction information candidates and obtaining the number of inter prediction information candidates;
A second decoding step of decoding, based on the number of inter prediction information candidates obtained in the first decoding step, an index indicating the inter prediction information candidate to be used for inter prediction of the prediction block to be decoded; and
A selection step of selecting the inter prediction information candidate indicated by the index from the inter prediction information candidates derived in the prediction information deriving step.
20. The moving picture decoding program according to claim 19, wherein the maximum value indicated by the index is a value of the number of candidates of the inter prediction information minus one.
A receiving apparatus that receives and decodes an encoded bit sequence in which a moving picture is encoded, the apparatus comprising:
A receiving unit that receives encoded data obtained by packetizing an encoded bit sequence in which a moving picture is encoded using inter prediction in units of blocks obtained by dividing each picture;
A restoring unit that performs packet processing on the received encoded data and restores the original encoded bit sequence;
A prediction information deriving unit that derives inter prediction information candidates from inter prediction information of a decoded prediction block close to a prediction block to be decoded in the same picture as the prediction block to be decoded, and from inter prediction information of a prediction block in a decoded picture different from that of the prediction block to be decoded;
A first decoding unit that decodes a syntax element indicating the number of inter prediction information candidates from the restored encoded bit sequence and obtains the number of inter prediction information candidates;
A second decoding unit that decodes from the restored encoded bit sequence, based on the number of inter prediction information candidates obtained by the first decoding unit, an index indicating the inter prediction information candidate to be used for inter prediction of the prediction block to be decoded; and
A selection unit that selects the inter prediction information candidate indicated by the index from the inter prediction information candidates derived by the prediction information deriving unit.
The receiving apparatus according to claim 21, wherein the maximum value indicated by the index is a value of the number of candidates for the inter prediction information minus one.
A receiving method for receiving and decoding an encoded bit sequence in which a moving picture is encoded, the method comprising:
A receiving step of receiving encoded data obtained by packetizing an encoded bit sequence in which a moving picture is encoded using inter prediction in units of blocks obtained by dividing each picture;
A restoration step of performing packet processing on the received encoded data and restoring the original encoded bit sequence;
A prediction information deriving step of deriving inter prediction information candidates from inter prediction information of a decoded prediction block close to a prediction block to be decoded in the same picture as the prediction block to be decoded, and from inter prediction information of a prediction block in a decoded picture different from that of the prediction block to be decoded;
A first decoding step of decoding a syntax element indicating the number of inter prediction information candidates from the restored encoded bit sequence and obtaining the number of inter prediction information candidates;
A second decoding step of decoding from the restored encoded bit sequence, based on the number of inter prediction information candidates obtained in the first decoding step, an index indicating the inter prediction information candidate to be used for inter prediction of the prediction block to be decoded; and
A selection step of selecting the inter prediction information candidate indicated by the index from the inter prediction information candidates derived in the prediction information deriving step.
The reception method according to claim 23, wherein the maximum value indicated by the index is a value of the number of candidates of the inter prediction information minus one.
A receiving program for receiving and decoding an encoded bit sequence in which a moving picture is encoded, the program causing a computer to execute:
A receiving step of receiving encoded data obtained by packetizing an encoded bit sequence in which a moving picture is encoded using inter prediction in units of blocks obtained by dividing each picture;
A restoration step of performing packet processing on the received encoded data and restoring the original encoded bit sequence;
A prediction information deriving step of deriving inter prediction information candidates from inter prediction information of a decoded prediction block close to a prediction block to be decoded in the same picture as the prediction block to be decoded, and from inter prediction information of a prediction block in a decoded picture different from that of the prediction block to be decoded;
A first decoding step of decoding a syntax element indicating the number of inter prediction information candidates from the restored encoded bit sequence and obtaining the number of inter prediction information candidates;
A second decoding step of decoding from the restored encoded bit sequence, based on the number of inter prediction information candidates obtained in the first decoding step, an index indicating the inter prediction information candidate to be used for inter prediction of the prediction block to be decoded; and
A selection step of selecting the inter prediction information candidate indicated by the index from the inter prediction information candidates derived in the prediction information deriving step.
The reception program according to claim 25, wherein the maximum value indicated by the index is a value of the number of candidates of the inter prediction information minus one.
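For illustration only, the relationship recited above between the number of inter prediction information candidates and the index (the maximum value of the index being the number of candidates minus one) can be sketched with a truncated unary code. The choice of truncated unary binarization and the function names are assumptions made for this sketch and are not recited in the claims.

```python
from typing import Iterable, List


def encode_merge_index(index: int, num_cand: int) -> List[int]:
    """Truncated unary code for an index in the range 0 .. num_cand - 1.

    The terminating 0 is omitted for the maximum index, so the code
    length depends on the number of candidates.
    """
    if not 0 <= index < num_cand:
        raise ValueError("index must lie in 0 .. num_cand - 1")
    max_index = num_cand - 1
    bits = [1] * index
    if index < max_index:
        bits.append(0)
    return bits


def decode_merge_index(bits: Iterable[int], num_cand: int) -> int:
    """Decode an index, reading at most num_cand - 1 bits."""
    max_index = num_cand - 1
    bit_iter = iter(bits)
    index = 0
    # Stop at the maximum index without reading a terminating bit.
    while index < max_index and next(bit_iter) == 1:
        index += 1
    return index
```

Under this assumed binarization, no bit is written when there is only one candidate, and the decoder must know the number of candidates before it can parse the index, which is why the syntax element indicating that number is decoded first.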