KR20150100355A

KR20150100355A - Method and apparatus of inter prediction

Info

Publication number: KR20150100355A
Application number: KR1020140022070A
Authority: KR
Inventors: 전동산; 석진욱; 김종호; 정순흥; 김연희; 최진수; 김진웅
Original assignee: 한국전자통신연구원 (Electronics and Telecommunications Research Institute)
Priority date: 2014-02-25
Filing date: 2014-02-25
Publication date: 2015-09-02

Abstract

An inter-picture prediction method and apparatus are disclosed. The inter-picture prediction method includes calculating a gradient between a predicted motion vector (PMV), obtained through motion vector prediction for a prediction target block, and an integer motion vector (IMV), obtained through integer-pixel motion prediction based on the PMV; performing motion prediction through fractional-pixel (sub-pixel) interpolation in a predetermined direction based on the gradient; and deriving an optimal motion vector of the prediction target block based on that motion prediction.

Description

[0001] METHOD AND APPARATUS OF INTER PREDICTION

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to image processing and, more particularly, to a method and apparatus for accelerating fractional-pixel motion prediction in order to improve the processing speed of a video encoder.

As the digital transition of broadcasting has been completed around the world, advanced countries such as Japan, the USA, and those of Europe have designated realistic media as a core of next-generation broadcasting and are investing actively in order to lead the post-HD era. UHDTV (Ultra High Definition TV) is a representative next-generation broadcasting service that can meet consumers' quality expectations through highly realistic, high-quality broadcasting, offering video at 4 to 16 times the resolution of HD (High Definition) and multi-channel audio with more than 10 channels.

To compress large-volume UHD content efficiently, a new video coding standard with better compression performance than the conventional H.264/AVC is needed. The Joint Collaborative Team on Video Coding (JCT-VC), formed jointly by ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG (Moving Picture Experts Group), started standardization with a Call for Proposals (CfP), and the Final Draft International Standard (FDIS) of HEVC (High Efficiency Video Coding), the next-generation video coding standard, was completed in the first half of 2013. HEVC provides about twice the compression performance of H.264/AVC by removing spatial and temporal redundancy more efficiently while keeping a structure similar to that of H.264/AVC, the existing video coding standard.

As mentioned above, HEVC is a video standard jointly developed by the ISO/IEC MPEG group and the ITU-T VCEG group. Its official names are ISO/IEC 23008-2 (MPEG-H Part 2) and ITU-T H.265, and both are collectively referred to as HEVC in this specification.

Since the HEVC Version 1 standard was completed in January 2013, ad hoc groups for scalable HEVC (SHVC) and 3D HEVC (3D-HEVC) have been formed, and standardization of HEVC-based scalable video coding and 3D video coding is under way.

The present invention provides an image encoding/decoding method and apparatus capable of improving encoding/decoding efficiency.

The present invention provides a method and apparatus for performing sub-pixel motion prediction at high speed during inter-picture prediction.

According to an embodiment of the present invention, an inter-picture prediction method is provided. The method includes calculating a gradient between a predicted motion vector (PMV), obtained by motion vector prediction for a prediction target block, and an integer motion vector (IMV), obtained by motion prediction based on the PMV; performing motion prediction through fractional-pixel (sub-pixel) interpolation in a predetermined direction based on the gradient; and deriving an optimal motion vector of the prediction target block based on the motion prediction.

Speeding up sub-pixel motion prediction, which accounts for the largest share of complexity in inter-picture coding, reduces the processing time of the entire encoder and can be applied to the development of a real-time encoder.

FIG. 1 is a block diagram illustrating a configuration of an image encoding apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a configuration of an image decoding apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram showing the complexity distribution of the HEVC encoder.
FIG. 4 is a diagram for explaining a method of deriving a motion vector in the HEVC encoder.
FIG. 5 is a flowchart schematically illustrating an inter-picture prediction method according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a process of calculating a gradient between a PMV and an IMV according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the gradient directions for sub-pixel motion prediction according to an embodiment of the present invention.
FIG. 8 is a diagram for explaining a process of performing sub-pixel (1/2- and 1/4-pixel) motion prediction according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In describing the embodiments of the present invention, if the detailed description of related known structures or functions is deemed to obscure the subject matter of the present specification, the description may be omitted.

When an element is referred to herein as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, stating in this specification that a specific configuration is "included" does not exclude configurations other than that configuration; additional configurations may be included within the practice of the present invention or the scope of its technical idea.

The terms first, second, etc. may be used to describe various configurations, but the configurations are not limited by these terms. The terms are used only to distinguish one configuration from another. For example, without departing from the scope of the present invention, the first configuration may be referred to as the second configuration, and similarly, the second configuration may be referred to as the first configuration.

In addition, the components shown in the embodiments of the present invention are shown independently to represent different characteristic functions; this does not mean that each component consists of separate hardware or a single software unit. That is, the components are listed separately for convenience of explanation: at least two of them may be combined into one component, or one component may be divided into a plurality of components that together perform its function. Integrated and separated embodiments of each component are also included in the scope of the present invention as long as they do not depart from its essence.

In addition, some components may not be essential components that perform essential functions of the present invention, but optional components used only to improve performance. The present invention can be implemented with only the components essential to realizing its essence, excluding those used only for performance improvement, and a structure that includes only the essential components, without the optional performance-improving components, is also included in the scope of the present invention.

FIG. 1 is a block diagram illustrating a configuration of an image encoding apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the image encoding apparatus 100 includes a motion prediction unit 111, a motion compensation unit 112, an intra prediction unit 120, a switch 115, a subtractor 125, a transform unit 130, a quantization unit 140, an entropy encoding unit 150, an inverse quantization unit 160, an inverse transform unit 170, an adder 175, a filter unit 180, and a reference picture buffer 190.

The image encoding apparatus 100 may encode an input image in intra mode or inter mode and output a bitstream. In intra mode, the switch 115 is switched to intra, and in inter mode, the switch 115 is switched to inter. Intra prediction means intra-picture (spatial) prediction, and inter prediction means inter-picture (temporal) prediction. The image encoding apparatus 100 may generate a prediction block for an input block of the input image and then encode the residual between the input block and the prediction block. Here, the input image may mean the original picture.

In intra mode, the intra prediction unit 120 may generate a prediction block by performing spatial prediction using the pixel values of already encoded/decoded blocks around the current block.

In inter mode, the motion prediction unit 111 can find a motion vector by searching the reference picture stored in the reference picture buffer 190 for the area that best matches the input block. The motion compensation unit 112 may generate a prediction block by performing motion compensation using the motion vector. Here, the motion vector is a two-dimensional vector used for inter prediction and represents the offset between the current image being encoded/decoded and the reference image.
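The block-matching search described above can be sketched as an exhaustive integer-pel search that minimizes the sum of absolute differences (SAD). This is a minimal sketch, assuming pictures stored as lists of rows of integer samples and motion vectors written as `(dx, dy)`; the function names and the pure-SAD criterion are illustrative choices (real encoders usually add a motion vector rate term), not the apparatus's actual implementation.

```python
from typing import List, Optional, Tuple

Picture = List[List[int]]  # rows of integer samples
MV = Tuple[int, int]       # (dx, dy) in integer pixels


def sad(cur: Picture, ref: Picture, x0: int, y0: int) -> int:
    """Sum of absolute differences between `cur` and the window of `ref` at (x0, y0)."""
    return sum(
        abs(cur[i][j] - ref[y0 + i][x0 + j])
        for i in range(len(cur))
        for j in range(len(cur[0]))
    )


def integer_full_search(cur: Picture, ref: Picture, block_pos: Tuple[int, int],
                        center: MV, search_range: int) -> Tuple[Optional[MV], Optional[int]]:
    """Exhaustive integer-pel search in a square window around `center`.

    block_pos is the (x, y) top-left corner of the block in the current picture.
    Returns (best_mv, best_sad); candidates falling outside `ref` are skipped.
    """
    h, w = len(cur), len(cur[0])
    bx, by = block_pos
    best_mv: Optional[MV] = None
    best_cost: Optional[int] = None
    for dy in range(center[1] - search_range, center[1] + search_range + 1):
        for dx in range(center[0] - search_range, center[0] + search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + w > len(ref[0]) or y + h > len(ref):
                continue
            cost = sad(cur, ref, x, y)
            if best_cost is None or cost < best_cost:  # keep the first minimum found
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Setting `center` to a predicted motion vector turns the same routine into a search centred on that predictor.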

The subtractor 125 may generate a residual block from the difference between the input block and the generated prediction block.

The transform unit 130 may transform the residual block and output transform coefficients. The quantization unit 140 may quantize the input transform coefficients according to a quantization parameter (QP) and output quantized coefficients.
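The quantization step can be illustrated with the QP-to-step-size relation used in H.264/AVC and HEVC, where the step size equals 1 at QP 4 and doubles every 6 QP. This is a floating-point sketch with round-half-away-from-zero, assuming hypothetical function names; actual encoders use integer scaling tables and encoder-chosen rounding offsets.

```python
from typing import List, Sequence


def qstep(qp: int) -> float:
    """Quantization step size: 1.0 at QP 4, doubling every 6 QP."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs: Sequence[float], qp: int) -> List[int]:
    """Simplified scalar quantization with round-half-away-from-zero."""
    step = qstep(qp)
    levels = []
    for c in coeffs:
        mag = int(abs(c) / step + 0.5)
        levels.append(mag if c >= 0 else -mag)
    return levels


def dequantize(levels: Sequence[int], qp: int) -> List[float]:
    """Inverse quantization: reconstruct each coefficient as level * step."""
    step = qstep(qp)
    return [lvl * step for lvl in levels]
```

For example, at QP 10 the step is 2, so the coefficients `[10, -7, 3, 0]` become the levels `[5, -4, 2, 0]` and are reconstructed as `[10.0, -8.0, 4.0, 0.0]`; the difference is the quantization error.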

The entropy encoding unit 150 may output a bitstream by performing entropy encoding based on the values calculated by the quantization unit 140 or the encoding parameter values calculated during encoding. With entropy encoding, fewer bits are allocated to symbols with a high probability of occurrence and more bits to symbols with a low probability of occurrence, so the size of the bit string representing the symbols to be encoded can be reduced. Entropy encoding therefore improves the compression performance of image encoding. For entropy encoding, the entropy encoding unit 150 may use an encoding method such as Exponential-Golomb, Context-Adaptive Variable Length Coding (CAVLC), or Context-Adaptive Binary Arithmetic Coding (CABAC).
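Order-0 Exponential-Golomb coding, one of the methods listed above, shows the principle directly: smaller values, which are assumed to be more probable, receive shorter codewords. The sketch below builds the unsigned `ue(v)` and signed `se(v)` codewords as bit strings; the function names are illustrative.

```python
def exp_golomb_ue(v: int) -> str:
    """Unsigned order-0 Exp-Golomb codeword ue(v) as a bit string."""
    if v < 0:
        raise ValueError("ue(v) is defined for non-negative integers")
    code = bin(v + 1)[2:]                 # binary of v + 1, without the '0b' prefix
    return "0" * (len(code) - 1) + code   # prefix of (length - 1) zeros


def exp_golomb_se(v: int) -> str:
    """Signed Exp-Golomb se(v): maps 0, 1, -1, 2, -2, ... to codeNum 0, 1, 2, 3, 4, ..."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return exp_golomb_ue(code_num)
```

The value 0 costs a single bit (`"1"`), while 3 and 4 already cost five bits each (`"00100"`, `"00101"`).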

Since the image encoding apparatus 100 according to the embodiment of FIG. 1 performs inter-prediction encoding, that is, inter-picture prediction encoding, the currently encoded image needs to be decoded and stored for use as a reference image. Accordingly, the quantized coefficients are inversely quantized in the inverse quantization unit 160 and inversely transformed in the inverse transform unit 170. The inverse-quantized and inverse-transformed coefficients are added to the prediction block through the adder 175, generating a reconstructed block.

The reconstructed block passes through the filter unit 180, which may apply at least one of a deblocking filter, sample adaptive offset (SAO), and an adaptive loop filter (ALF) to the reconstructed block or reconstructed picture. The filter unit 180 may also be referred to as an adaptive in-loop filter. The deblocking filter can remove block distortion occurring at block boundaries. SAO may add an appropriate offset value to pixel values to compensate for coding errors. ALF can perform filtering based on a comparison between the reconstructed image and the original image. The reconstructed block that has passed through the filter unit 180 can be stored in the reference picture buffer 190.

FIG. 2 is a block diagram illustrating a configuration of an image decoding apparatus according to an embodiment of the present invention.

Referring to FIG. 2, the image decoding apparatus 200 includes an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 230, an intra prediction unit 240, a motion compensation unit 250, an adder 255, a filter unit 260, and a reference picture buffer 270.

The image decoding apparatus 200 receives the bitstream output from the encoder, decodes it in intra mode or inter mode, and outputs the reconstructed image. In intra mode, the switch is switched to intra, and in inter mode, the switch is switched to inter.

The image decoding apparatus 200 may obtain a reconstructed residual block from the input bitstream, generate a prediction block, and add the reconstructed residual block and the prediction block to generate a reconstructed block.

The entropy decoding unit 210 may entropy-decode the input bitstream according to a probability distribution to generate symbols, including symbols in the form of quantized coefficients.

When entropy decoding is applied, fewer bits are assigned to symbols with a high probability of occurrence and more bits to symbols with a low probability of occurrence, so the size of the bit string for each symbol can be reduced.

The quantized coefficients are inversely quantized in the inverse quantization unit 220 and inversely transformed in the inverse transformation unit 230. The reconstructed residual block can be generated as a result of inverse quantization / inverse transformation of the quantized coefficients.

In intra mode, the intra prediction unit 240 may generate a prediction block by performing spatial prediction using the pixel values of already decoded blocks around the current block. In inter mode, the motion compensation unit 250 may generate a prediction block by performing motion compensation using a motion vector and a reference image stored in the reference picture buffer 270.

The reconstructed residual block and the prediction block are added through the adder 255, and the resulting block can pass through the filter unit 260. The filter unit 260 may apply at least one of a deblocking filter, SAO, and ALF to the reconstructed block or reconstructed picture, and outputs the reconstructed image. The reconstructed image is stored in the reference picture buffer 270 and can be used for inter prediction.

Meanwhile, HEVC divides the input image into coding units (CUs) as its basic blocks and generates a residual signal through intra-picture prediction (Intra Prediction) and inter-picture prediction (Inter Prediction) based on motion estimation (ME). The residual signal is transformed and/or quantized, and the prediction signal generated by intra-picture and/or inter-picture prediction is used to generate the reference picture. HEVC improves compression performance through more sophisticated intra-/inter-picture prediction and transform/quantization than H.264/AVC, and it improves subjective image quality by using the existing deblocking filter together with SAO (Sample Adaptive Offset).

However, while HEVC improves compression performance through more sophisticated intra-/inter-picture prediction than H.264/AVC, the resulting encoder complexity roughly doubles. Therefore, developing a real-time HEVC encoder requires an efficient fast encoding algorithm that reduces complexity while preserving the compression performance of the newly added coding tools.

FIG. 3 is a diagram showing the complexity distribution of the HEVC encoder. FIG. 3(a) shows the complexity distribution of the HEVC encoder as a whole, and FIG. 3(b) shows the complexity distribution within inter-picture prediction in the HEVC encoder.

As shown in FIG. 3(a), inter-picture prediction accounts for most of the complexity of the HEVC encoder. As shown in FIG. 3(b), among the inter-picture prediction processes (motion prediction, skip, and merge), motion prediction accounts for most of that complexity.

More specifically, analysis of the complexity distribution within motion prediction shows that most of the complexity is spent generating the reference image through the interpolation filter during sub-pixel (fractional-pixel) motion prediction.

In HEVC, inter-picture prediction first obtains two initial motion vector candidates through the Advanced Motion Vector Prediction (AMVP) process and then determines the predicted motion vector (PMV) from these candidates through Rate-Distortion Optimization (RDO). Integer-pixel motion prediction is then performed in the horizontal and vertical directions within a search range centered on the determined PMV, which determines the optimal integer motion vector (IMV). Finally, sub-pixel motion prediction is performed around the determined IMV to obtain the optimal motion vector.
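The role of the PMV in this flow can be sketched with a simplified Lagrangian cost J = D + λ·R. This is a sketch under stated assumptions, not HEVC's normative process or any reference encoder: the distortion function is supplied by the caller, the rate of a motion vector is approximated by the signed Exp-Golomb length of its difference from the PMV (HEVC actually codes motion vector differences with flags plus first-order Exp-Golomb bins), and the index of a two-entry candidate list is assumed to cost one bit. All names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

MV = Tuple[int, int]  # (dx, dy)


def se_bits(v: int) -> int:
    """Length in bits of the signed Exp-Golomb codeword se(v)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def mv_rate(mv: MV, pmv: MV) -> int:
    """Approximate bits for the motion vector difference mv - pmv (assumption)."""
    return se_bits(mv[0] - pmv[0]) + se_bits(mv[1] - pmv[1])


def select_pmv(candidates: Sequence[MV], distortion: Callable[[MV], float],
               lam: float, idx_bits: int = 1) -> int:
    """Index of the AMVP candidate with the lowest J = D(candidate) + lam * idx_bits."""
    costs = [distortion(c) + lam * idx_bits for c in candidates]
    return min(range(len(candidates)), key=costs.__getitem__)


def mv_cost(mv: MV, pmv: MV, distortion: Callable[[MV], float], lam: float) -> float:
    """J = D(mv) + lam * R(mv - pmv), used to compare search positions around the PMV."""
    return distortion(mv) + lam * mv_rate(mv, pmv)
```

Because the rate term grows with the distance from the PMV, positions near the PMV are favoured during the integer search unless a distant position has clearly lower distortion.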

For sub-pixel motion prediction, HEVC generates reference samples at 1/2- and 1/4-pixel positions, as H.264/AVC does. Unlike the conventional method, however, it uses DCT (Discrete Cosine Transform)-based interpolation filters, with separate interpolation filter coefficients for the luma and chroma components.

Tables 1 and 2 show the DCT-based interpolation filter coefficients used in the HEVC encoder. Table 1 shows the interpolation filter coefficients for the luma component, and Table 2 shows those for the chroma components.

Table 1. Luma interpolation filter coefficients

Index              -3    -2    -1     0     1     2     3     4
hfilter[i] (1/2)   -1     4   -11    40    40   -11     4    -1
qfilter[i] (1/4)   -1     4   -10    58    17    -5     1     0

Table 2. Chroma interpolation filter coefficients

Index              -1     0     1     2
filter[i] (1/8)    -2    58    10    -2
filter[i] (1/4)    -4    54    16    -2
filter[i] (3/8)    -6    46    28    -4
filter[i] (1/2)    -4    36    36    -4
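A one-dimensional sketch of how the luma filters produce fractional samples, using HEVC's half-pel taps [-1, 4, -11, 40, 40, -11, 4, -1] and quarter-pel taps [-1, 4, -10, 58, 17, -5, 1, 0], each summing to 64. It assumes 8-bit samples, clamps positions outside the row to the nearest edge sample, and normalizes once by 64 with rounding. The actual HEVC process filters separably in 2-D and keeps higher-precision intermediate values, so results can differ slightly from a conforming implementation. Function names are illustrative.

```python
from typing import Sequence

# Luma taps for sample offsets -3 .. +4 relative to integer position x
HFILTER = (-1, 4, -11, 40, 40, -11, 4, -1)  # 1/2-pel
QFILTER = (-1, 4, -10, 58, 17, -5, 1, 0)    # 1/4-pel


def interp_1d(row: Sequence[int], x: int, taps: Sequence[int], bit_depth: int = 8) -> int:
    """Fractional sample between row[x] and row[x + 1] using 8 taps on row[x-3 .. x+4].

    Indices outside the row are clamped to the edge (simple padding).
    """
    acc = 0
    for k, c in enumerate(taps):
        i = min(max(x - 3 + k, 0), len(row) - 1)
        acc += c * row[i]
    val = (acc + 32) >> 6                      # taps sum to 64: divide with rounding
    return min(max(val, 0), (1 << bit_depth) - 1)  # clip to the valid sample range


def three_quarter(row: Sequence[int], x: int, bit_depth: int = 8) -> int:
    """3/4-pel sample, obtained with the mirrored 1/4-pel filter."""
    return interp_1d(row, x, tuple(reversed(QFILTER)), bit_depth)
```

On a linear ramp `[0, 10, 20, ...]`, the half-pel sample between 30 and 40 comes out as 35, the quarter-pel sample as 32, and the three-quarter-pel sample as 38.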

FIG. 4 is a diagram for explaining a method of deriving a motion vector in the HEVC encoder.

As shown in FIG. 4, to obtain the final motion vector through inter-picture prediction in HEVC, the encoder first finds the integer-pixel motion vector 400 through an integer-pel motion search, and then obtains the half-pixel motion vector 410 and the quarter-pixel motion vector 420 by sequentially performing half-pel and quarter-pel motion searches.

Because the interpolation filter that generates the reference image must always be applied, with no early termination method, it accounts for the largest part of the encoder's complexity.

Therefore, the present invention proposes a method of simplifying the step that generates the reference image through the interpolation filter, which accounts for the greatest complexity in sub-pixel motion prediction. That is, the present invention provides a method of performing inter-picture prediction at high speed by reducing the complexity of the interpolation filter that generates the sub-pixel reference image.

To this end, in the HEVC encoder the present invention obtains the gradient between the predicted motion vector (PMV), obtained through motion vector prediction, and the integer motion vector (IMV), obtained through integer-pixel motion prediction, and then, during sub-pixel motion prediction, selectively performs interpolation and motion prediction only at the positions lying in the same direction as that gradient.

FIG. 5 is a flowchart schematically illustrating an inter-picture prediction method according to an embodiment of the present invention. The method of FIG. 5 may be performed by the encoding apparatus of FIG. 1, more specifically by its motion prediction unit.

Referring to FIG. 5, the encoding apparatus calculates the gradient between the predicted motion vector (PMV), obtained through motion vector prediction for the prediction target block, and the integer motion vector (IMV), obtained through integer-pixel motion prediction based on the PMV (S500).

The gradient between the PMV and the IMV can be calculated by Equation 1:

[Equation 1]
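The form of Equation 1 is not given here. Purely as an assumption for illustration, the sketch below takes the gradient to be the angle, in degrees, of the displacement from the PMV to the IMV, computed with `math.atan2` with the y axis pointing up; under that assumption an equal displacement in x and y gives 45 degrees, the value quoted for the FIG. 6 example. How the method handles a PMV equal to the IMV is not stated, so this hypothetical function returns `None` and leaves that case to the caller.

```python
import math
from typing import Optional, Tuple


def pmv_imv_gradient(pmv: Tuple[float, float], imv: Tuple[float, float]) -> Optional[float]:
    """Angle in degrees, in (-180, 180], of the vector from PMV to IMV.

    ASSUMPTION: stands in for Equation 1; uses atan2 with the y axis pointing up.
    Returns None when PMV and IMV coincide (gradient undefined).
    """
    dx = imv[0] - pmv[0]
    dy = imv[1] - pmv[1]
    if dx == 0 and dy == 0:
        return None
    return math.degrees(math.atan2(dy, dx))
```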

FIG. 6 is a diagram illustrating a process of calculating a gradient between a PMV and an IMV according to an embodiment of the present invention. The shaded pixels in FIG. 6 represent integer-pixel positions.

For example, when the PMV obtained through motion vector prediction for the prediction target block and the IMV obtained through integer-pixel motion prediction based on the PMV are as shown in FIG. 6, the gradient between the PMV and the IMV is 45 degrees.

Referring again to FIG. 5, the encoding apparatus performs motion prediction through fractional-pixel interpolation in a predetermined direction based on the gradient between the PMV and the IMV (S510).

For example, using the gradient information obtained from the PMV and the IMV, motion estimation can be performed by applying sub-pixel interpolation only in the directions selectively defined by the conditions of Equation 2 below. If the gradient is negative, the conditions can be applied symmetrically. Directions (a), (b), (c), and (d) in Equation 2 are as shown in FIG. 7.

For example, when the gradient between the PMV and the IMV is -50 degrees, it corresponds to 130 degrees under Equation 2 and therefore satisfies the third condition. The encoding apparatus thus performs sub-pixel interpolation and motion prediction only in the predefined directions (b) and (c).
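The saving can be made concrete by listing which of the eight half-pel positions around the IMV are interpolated. In this sketch, folding a negative gradient into [0°, 180°) reproduces the -50° to 130° example above, and the selected directions (b) and (c) are taken from that example. Which two neighbours each FIG. 7 direction corresponds to, the `(dx, dy)` convention with the y axis pointing up, and all names are assumptions; the conditions of Equation 2 are not reproduced, so the code does not choose directions from an angle.

```python
from typing import Dict, Iterable, List, Tuple

Offset = Tuple[int, int]  # (dx, dy), y axis pointing up (assumption)

# ASSUMPTION: each FIG. 7 direction selects the two opposite neighbours on that line
DIRECTION_OFFSETS: Dict[str, Tuple[Offset, Offset]] = {
    "a": ((1, 0), (-1, 0)),    # horizontal
    "b": ((0, 1), (0, -1)),    # vertical
    "c": ((-1, 1), (1, -1)),   # 135-degree diagonal
    "d": ((1, 1), (-1, -1)),   # 45-degree diagonal
}

# The eight neighbours a conventional search would interpolate
ALL_NEIGHBOURS: List[Offset] = [
    (dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)
]


def normalize_gradient(deg: float) -> float:
    """Fold a gradient into [0, 180), e.g. -50 degrees -> 130 degrees."""
    return deg % 180.0


def subpel_candidates(directions: Iterable[str], step: int) -> List[Offset]:
    """Offsets, in quarter-pel units, of the fractional positions to interpolate.

    step = 2 for the 1/2-pel stage and 1 for the 1/4-pel stage.
    """
    seen: Dict[Offset, None] = {}
    for d in directions:
        for dx, dy in DIRECTION_OFFSETS[d]:
            seen[(dx * step, dy * step)] = None  # keeps insertion order, drops repeats
    return list(seen)
```

With directions (b) and (c), only 4 of the 8 half-pel neighbours are interpolated and matched, which is where the reduction in interpolation work comes from.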

The encoding apparatus derives the optimal motion vector of the prediction target block through the gradient-based sub-pixel interpolation and motion prediction (S520).

The encoding apparatus can generate prediction samples of the prediction target block based on its optimal motion vector. Residual signal information can then be derived from the prediction sample values and transmitted to the decoding apparatus.

FIG. 7 is a diagram illustrating the gradient directions for sub-pixel motion prediction according to an embodiment of the present invention.

FIG. 7(a) shows a gradient in the horizontal direction.

FIG. 7(b) shows a gradient in the vertical direction.

FIG. 7(c) shows a gradient in the 135-degree diagonal direction.

FIG. 7(d) shows a gradient in the 45-degree diagonal direction.

FIG. 8 is a diagram for explaining a process of performing sub-pixel (1/2- and 1/4-pixel) motion prediction according to an embodiment of the present invention.

In FIG. 8, "0" indicates the position of the integer-pixel motion vector IMV. The encoding apparatus applies the interpolation filter in half-pixel units around the IMV position "0" to interpolate positions "1" to "8" shown in 800 of FIG. 8, using the interpolation filter coefficients shown in Tables 1 and 2 above.

The encoding apparatus performs half-pixel motion prediction for positions "1" to "8" shown in 800 of FIG. 8 and determines the optimal half-pixel motion vector. It then performs quarter-pixel interpolation and motion prediction around the optimal half-pixel motion vector and determines the optimal quarter-pixel motion vector.

For example, when the optimal half-pixel motion vector obtained through RDO is position "6" shown in 800 of FIG. 8, the encoding apparatus applies the interpolation filter in quarter-pixel units around it to interpolate positions "1" to "8" shown in 810 of FIG. 8, again using the interpolation filter coefficients shown in Tables 1 and 2 above. It then performs motion prediction for these positions and determines the optimal quarter-pixel motion vector.
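The half-pel then quarter-pel refinement of FIG. 8 can be sketched as two rounds of neighbour checks in quarter-pel units. `cost` is a hypothetical caller-supplied function (in an encoder it would compare the block against the reference interpolated at that motion vector, typically with a rate term), the centre is re-evaluated at each stage for simplicity, and the optional offset lists show where a gradient-restricted candidate set, as in step S510, would plug in. All names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

MV = Tuple[int, int]  # (x, y) in quarter-pel units

NEIGHBOURS: List[Tuple[int, int]] = [
    (dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)
]


def refine(center: MV, step: int, cost: Callable[[MV], float],
           offsets: Sequence[Tuple[int, int]] = NEIGHBOURS) -> MV:
    """Return the cheapest of `center` and `center + step * offset` (first minimum wins)."""
    best, best_cost = center, cost(center)
    for dx, dy in offsets:
        cand = (center[0] + dx * step, center[1] + dy * step)
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best


def subpel_search(imv: Tuple[int, int], cost: Callable[[MV], float],
                  half_offsets: Sequence[Tuple[int, int]] = NEIGHBOURS,
                  quarter_offsets: Sequence[Tuple[int, int]] = NEIGHBOURS) -> MV:
    """IMV (integer pels) -> best 1/2-pel MV -> best 1/4-pel MV, in quarter-pel units."""
    start = (imv[0] * 4, imv[1] * 4)
    best_half = refine(start, 2, cost, half_offsets)      # positions "1".."8" of 800
    return refine(best_half, 1, cost, quarter_offsets)    # positions "1".."8" of 810
```

With the full neighbour sets this evaluates 18 cost calls (9 per stage); passing only two half-pel offsets cuts the half-pel stage to 3 calls, which mirrors how restricting the interpolation directions reduces work.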

In the above-described embodiments, the methods are described on the basis of flowcharts as a series of steps or blocks, but the present invention is not limited to the order of the steps; some steps may occur in a different order or simultaneously. Those skilled in the art will also understand that the steps shown in the flowcharts are not exclusive, and that other steps may be included or one or more steps may be deleted without affecting the scope of the invention.

The foregoing description is merely illustrative of the technical idea of the present invention, and various changes and modifications may be made by those skilled in the art without departing from the essential characteristics of the present invention. Therefore, the embodiments disclosed in the present invention are intended to illustrate rather than limit the scope of the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments. The scope of protection of the present invention should be construed according to the claims, and all technical ideas within the scope of the claims should be construed as being included in the scope of the present invention.

Claims

1. An inter-picture prediction method comprising: calculating a gradient between a predicted motion vector (PMV), obtained through motion vector prediction for a prediction target block, and an integer motion vector (IMV), obtained through motion prediction based on the PMV;
performing motion prediction through fractional-pixel interpolation in a predetermined direction based on the gradient; and
deriving an optimal motion vector of the prediction target block based on the motion prediction.