CN102158710B - Depth view encoding rate distortion judgment method for virtual view quality - Google Patents
- Publication number
- CN102158710B, CN201110140492, CN201110140492A
- Authority
- CN
- China
- Prior art keywords
- current coding block
- coding block
- dis
- distortion
- rec
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses a depth map coding rate-distortion decision method oriented to virtual view quality. The method comprises: predicting the current coding block to obtain a predicted block; computing the difference between the current coding block and the predicted block, and applying discrete cosine transform, quantization and entropy coding to the difference to obtain the bit rate of the current coding block; converting the pixel gray values of the current coding block and the predicted block into disparity values; computing the distortion of the current coding block from the converted disparity values; and finally computing the rate-distortion cost of the current coding block from the distortion and the bit rate. Because the distortion is determined according to the influence of depth map compression distortion on the quality of the synthesized virtual view, the method better reflects that influence, improves the coding efficiency of three-dimensional video, and can be applied in three-dimensional video coding standards.
Description
Technical field
The present invention relates to a rate-distortion decision criterion that improves depth map coding efficiency, and belongs to the technical field of depth map coding within three-dimensional stereoscopic video coding standards.
Background art
Three-dimensional (3D) stereoscopic video is regarded as the dominant video application of the future: through a 3D stereoscopic display device, users can enjoy truly three-dimensional video content. The related technologies, such as 3D video acquisition, 3D video coding and 3D video display, have attracted wide attention. To promote the standardization of 3D video technology, the Moving Picture Experts Group (MPEG) proposed the concept of Free Viewpoint Television (FTV) in 2002. FTV provides a vivid, realistic and interactive 3D audiovisual experience: the user can watch the 3D video of a scene from different angles, which creates a sense of immersion in the video scene. FTV can be widely applied in fields such as broadcasting and communication, entertainment, education, medical care and video surveillance. To let the user watch 3D video from an arbitrary angle, the server side of an FTV system captures video at certain viewpoints with a calibrated camera array, rectifies the videos of the different viewpoints, and then uses the rectified video information to synthesize the view at the virtual viewpoint by means of virtual view synthesis. MPEG currently recommends the Depth-Image Based Rendering (DIBR) view synthesis technique, in which depth information is generally represented by a depth map. The main steps of virtual view synthesis are as follows:
1) Determine the relative position of the virtual viewpoint within the camera array.
2) Determine the texture videos to be used for synthesizing the virtual view.
3) Determine the depth maps corresponding to the texture videos of step 2).
4) Using the texture videos and depth maps of steps 2) and 3), synthesize the virtual view with the DIBR technique (a simplified sketch of this warping process is given below).
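For illustration only, the following is a minimal sketch of DIBR-style forward warping for a rectified, parallel camera arrangement. It is not taken from the patent; the depth-to-disparity relation, the warping direction and all function names are assumptions.

```python
import numpy as np

def depth_to_disparity(depth_gray, f, b, z_near, z_far):
    """Convert 8-bit depth-map gray values to disparities (in pixels), assuming
    the linear-in-inverse-depth quantization commonly used with DIBR."""
    inv_z = depth_gray / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return f * b * inv_z  # disparity = f * b / Z

def warp_to_virtual_view(texture, depth_gray, f, b, z_near, z_far, alpha=0.5):
    """Forward-warp a texture image toward a virtual viewpoint located at a
    fraction alpha of the baseline between two rectified reference cameras."""
    h, w = depth_gray.shape
    disparity = alpha * depth_to_disparity(depth_gray.astype(np.float64), f, b, z_near, z_far)
    virtual = np.zeros_like(texture)
    z_buffer = np.full((h, w), -np.inf)           # keep the nearest point per target pixel
    for y in range(h):
        for x in range(w):
            xv = int(round(x - disparity[y, x]))  # horizontal shift only (rectified setup)
            if 0 <= xv < w and disparity[y, x] > z_buffer[y, xv]:
                z_buffer[y, xv] = disparity[y, x]
                virtual[y, xv] = texture[y, x]
    return virtual  # disocclusion holes would still need filling in practice
```

In practice a renderer also blends warps from two reference views and fills the remaining holes; the sketch keeps only the core warping step.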
The standardization of FTV is being carried out in two phases. The first phase, from 2006 to 2008, produced Multi-View Video Coding (MVC), an extension of H.264/AVC formulated by the Joint Video Team (JVT). MVC can encode multi-view texture video, but to fully realize the functionality of an FTV system the depth information must be encoded as well. FTV standardization has now entered its second phase, Three-Dimensional Video Coding (3DVC). 3DVC focuses mainly on the representation and coding of depth information and on the joint coding of texture video and depth information. In 3DVC, depth information is represented by depth maps.
The principal indicators for assessing 3DVC performance are the quality of the synthesized virtual view and the bit rates of the texture video and the depth map. The quality of the virtual view is usually measured by the peak signal-to-noise ratio (PSNR), computed as

PSNR = 10·log10(255^2 / MSE),    (1)

where MSE denotes the mean square error between the original view and the synthesized virtual view; it reflects the distortion of the virtual view, which in turn depends on the coding distortion of the texture video and of the depth map. In practical applications the view at the virtual viewpoint does not exist, i.e. there is no original view. However, since 3DVC is mainly concerned with coding performance, its performance is assessed as follows: first, the uncompressed texture video and its corresponding depth map are used to synthesize a virtual view V_orig; then the reconstructed texture video and the reconstructed depth map obtained after encoding are used to synthesize a virtual view V_rec; finally, the MSE between V_rec and V_orig is computed and converted into PSNR, which measures the performance of 3DVC.
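As a concrete illustration (not part of the patent), the evaluation protocol described above can be sketched as follows; synthesize_view stands for any DIBR renderer, such as the simplified warping sketch given earlier, and is an assumed interface.

```python
import numpy as np

def mse(a, b):
    """Mean square error between two images of equal shape."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB, as in Eq. (1)."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)

def evaluate_3dvc(texture_orig, depth_orig, texture_rec, depth_rec, synthesize_view):
    """Synthesize V_orig from uncompressed data and V_rec from reconstructed
    data, then report the PSNR between them as the 3DVC performance measure."""
    v_orig = synthesize_view(texture_orig, depth_orig)  # reference virtual view
    v_rec = synthesize_view(texture_rec, depth_rec)     # virtual view after coding
    return psnr(v_orig, v_rec)
```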
Fig. 1 shows the calculation flow of the existing H.264/AVC rate-distortion criterion. In the H.264/AVC video coding standard, the current coding block is first predicted, and the mean square error (MSE) between the current coding block and the predicted block is computed as the distortion D_H264. The pixel gray values of the predicted block are then subtracted from those of the current coding block to obtain the prediction residual, and the residual is transformed with the discrete cosine transform, quantized and entropy-coded, which yields the bit rate R_H264 of the current coding block. Finally the rate-distortion cost of the current coding block is computed as J_H264 = D_H264 + λ·R_H264, where λ is the Lagrange multiplier.
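A minimal sketch of this conventional criterion, under the simplifying assumption that the distortion is exactly the MSE between the original and predicted blocks as stated above (the function and variable names are illustrative):

```python
import numpy as np

def rd_cost_h264(block, pred_block, rate_bits, lam):
    """Conventional H.264/AVC-style rate-distortion cost J = D + lambda * R,
    with the distortion D taken as the MSE between block and prediction."""
    d = np.mean((block.astype(np.float64) - pred_block.astype(np.float64)) ** 2)
    return float(d + lam * rate_bits)
```

The method proposed below keeps the same J = D + λ·R form but replaces the distortion term D with one driven by disparity errors.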
The rate-distortion criterion adopted by H.264/AVC does not consider the influence of depth map compression distortion on the quality of the synthesized virtual view, so encoding depth maps with the H.264/AVC criterion is not efficient. Theoretical analysis shows that the distortion of the synthesized virtual view is determined directly by the disparity, and the quality of the depth map affects the accuracy of the disparity. During depth map compression, distortion of the depth map itself does not necessarily make the disparity inaccurate; when the depth map is fully accurate, however, the disparity is also fully accurate.
Summary of the invention
Because the rate-distortion criterion adopted by H.264/AVC does not consider the influence of depth map compression distortion on the quality of the synthesized virtual view, the present invention proposes a rate-distortion decision method for depth map coding, oriented to virtual view quality, that determines the distortion according to that influence.
The depth map coding rate-distortion decision method oriented to virtual view quality of the present invention is as follows:
Predict the current coding block to obtain a predicted block; compute the difference between the current coding block and the predicted block, and apply discrete cosine transform, quantization and entropy coding to the difference to obtain the bit rate of the current coding block; convert the pixel gray values of the current coding block and the predicted block into disparity values; compute the distortion of the current coding block from the converted disparity values; finally, compute the rate-distortion cost of the current coding block from the resulting distortion and bit rate. The specific steps are as follows:
(1) Predict the current coding block of the depth map to obtain its predicted block. The gray value of each pixel in the current coding block is denoted L_i and each pixel gray value in the predicted block is denoted L_p,i, with i ∈ {1, ..., N}, where N is the number of pixels in the current coding block. The prediction of the current coding block of the depth map uses any of the intra- or inter-prediction methods specified in the H.264/AVC standard.
(2) Subtract L_p,i from L_i to obtain the difference of the current coding block, and apply discrete cosine transform, quantization and entropy coding to the difference to determine the bit rate R of the current coding block.
(3) Decode the entropy-coded data, and apply inverse quantization and inverse discrete cosine transform to the decoded data to reconstruct the difference signal.
(4) Add the reconstructed difference signal to the predicted block to reconstruct the current block.
(5) Convert each pixel gray value of the current coding block and of the reconstructed current block obtained in step (4) into a disparity value according to the following formula,

dis_i = f·b·[(L_i/255)·(1/Z_near − 1/Z_far) + 1/Z_far],
dis_rec,i = f·b·[(L_rec,i/255)·(1/Z_near − 1/Z_far) + 1/Z_far],

where dis_i denotes the disparity value corresponding to the gray value of the i-th pixel of the current coding block, dis_rec,i denotes the disparity value corresponding to the gray value of the i-th pixel of the reconstructed current block, L_rec,i denotes the pixel gray value of the reconstructed current block, f denotes the camera focal length, b is the spacing between adjacent cameras, Z_near denotes the real depth of the object point nearest to the camera, and Z_far denotes the real depth of the object point farthest from the camera.
(6) Compute the distortion D of the current coding block from the error terms δ_i determined by dis_i and dis_rec,i, where δ_i equals 0 when dis_i and dis_rec,i are equal, and δ_i equals L_i − L_rec,i when dis_i and dis_rec,i are unequal.
(7) From the bit rate R of the current coding block obtained in step (2) and the distortion D obtained in step (6), compute the rate-distortion cost J of the current coding block as J = D + λ·R, where λ is the Lagrange multiplier; λ takes the value specified by the H.264/AVC standard. A code sketch illustrating steps (5)–(7) is given below.
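For illustration only, the following sketch implements steps (5)–(7) under two stated assumptions: the depth-to-disparity relation is the standard 8-bit DIBR conversion written out in step (5), here additionally rounded to a rendering precision (the rounding is an assumption, but it is what allows distinct gray values to map to the same disparity), and the distortion D is taken as the sum of the squared δ_i, which the patent defines only through its figure. All function names are hypothetical.

```python
import numpy as np

def gray_to_disparity(gray, f, b, z_near, z_far, precision=1.0):
    """Step (5): convert 8-bit depth-map gray values to disparity values using
    the standard DIBR relation, then round to the rendering precision."""
    dis = f * b * (gray / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)
    return np.round(dis / precision) * precision

def view_oriented_distortion(block, rec_block, f, b, z_near, z_far):
    """Step (6): delta_i = 0 where the original and reconstructed disparities
    agree, delta_i = L_i - L_rec,i where they differ; D is assumed here to be
    the sum of squared delta_i."""
    blk = block.astype(np.float64)
    rec = rec_block.astype(np.float64)
    dis = gray_to_disparity(blk, f, b, z_near, z_far)
    dis_rec = gray_to_disparity(rec, f, b, z_near, z_far)
    delta = np.where(dis == dis_rec, 0.0, blk - rec)
    return float(np.sum(delta ** 2))

def rd_cost(block, rec_block, rate_bits, lam, f, b, z_near, z_far):
    """Step (7): rate-distortion cost J = D + lambda * R."""
    return view_oriented_distortion(block, rec_block, f, b, z_near, z_far) + lam * rate_bits
```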
The present invention determines the rate-distortion trade-off according to the influence of depth map compression distortion on the quality of the synthesized virtual view, so it reflects that influence better and improves the efficiency of three-dimensional stereoscopic video coding. It only requires converting depth gray values into the corresponding disparity values and computing the distortion from those disparity values; the computation is simple and easy to implement, and does not increase encoder complexity.
Description of drawings
Fig. 1 is the calculation flow chart of the H.264/AVC rate-distortion criterion;
Fig. 2 is the flow chart of the depth map coding rate-distortion decision method of the present invention;
Fig. 3 compares the rate-distortion curves obtained by encoding depth maps with the method of the present invention and with the H.264/AVC method, respectively.
Embodiment
The depth map coding rate-distortion decision method oriented to virtual view quality of the present invention, as shown in Fig. 2, comprises the following steps:
(1) Predict the current coding block of the depth map to obtain its predicted block. The gray value of each pixel in the current coding block is denoted L_i and each pixel gray value in the predicted block is denoted L_p,i, with i ∈ {1, ..., N}, where N is the number of pixels in the current coding block. The prediction of the current coding block of the depth map uses any of the intra- or inter-prediction methods specified in the H.264/AVC standard.
(2) Subtract L_p,i from L_i to obtain the difference of the current coding block, and apply discrete cosine transform, quantization and entropy coding to the difference to determine the bit rate R of the current coding block.
(3) Decode the entropy-coded data, and apply inverse quantization and inverse discrete cosine transform to the decoded data to reconstruct the difference signal.
(4) Add the reconstructed difference signal to the predicted block to reconstruct the current block.
(5) Convert each pixel gray value of the current coding block and of the reconstructed block obtained in step (4) into a disparity value according to the following formula,

dis_i = f·b·[(L_i/255)·(1/Z_near − 1/Z_far) + 1/Z_far],
dis_rec,i = f·b·[(L_rec,i/255)·(1/Z_near − 1/Z_far) + 1/Z_far],

where dis_i denotes the disparity value corresponding to the gray value of the i-th pixel of the current coding block, dis_rec,i denotes the disparity value corresponding to the gray value of the i-th pixel of the reconstructed block, L_rec,i denotes the pixel gray value of the reconstructed block, f denotes the camera focal length, b is the spacing between adjacent cameras, Z_near denotes the real depth of the object point nearest to the camera, and Z_far denotes the real depth of the object point farthest from the camera.
(6) Compute the distortion D of the current coding block from the error terms δ_i determined by dis_i and dis_rec,i, where δ_i equals 0 when dis_i and dis_rec,i are equal, and δ_i equals L_i − L_p,i when dis_i and dis_rec,i are unequal.
(7) From the bit rate R of the current coding block obtained in step (2) and the distortion D obtained in step (6), compute the rate-distortion cost J of the current coding block as
J = D + λ·R,
where λ is the Lagrange multiplier; λ takes the value specified by the H.264/AVC standard. A sketch of how such a cost is typically used for mode decision is given below.
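To illustrate how a block-level cost of this kind is normally used inside an encoder, the following sketch performs standard rate-distortion-optimized mode decision. This usage is not text from the patent; encode_block_with_mode is an assumed encoder hook, and rd_cost is the function sketched after step (7) above (or any cost with the same signature).

```python
def choose_mode(block, candidate_modes, lam, cam, encode_block_with_mode, rd_cost):
    """Pick the prediction mode with the smallest cost J = D + lambda * R, where
    D is the view-oriented distortion defined in steps (5)-(6)."""
    best_mode, best_cost = None, float("inf")
    for mode in candidate_modes:
        # Hypothetical encoder hook: codes `block` with `mode` and returns the
        # reconstructed block together with the number of bits it consumed.
        rec_block, rate_bits = encode_block_with_mode(block, mode)
        cost = rd_cost(block, rec_block, rate_bits, lam,
                       cam["f"], cam["b"], cam["z_near"], cam["z_far"])
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost
```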
The effect of the present invention can be further illustrated by experiment.
The experiments measured, under different quantization parameter settings, the bit rate obtained by encoding the depth map with the present invention and the objective quality (PSNR) of the synthesized virtual view. Fig. 3 compares the rate-distortion curves obtained by encoding depth maps with the present invention and with the existing H.264/AVC method: Fig. 3(a) shows the results for the depth map of the 3D video sequence Bookarrival, Fig. 3(b) for the sequence Kendo, and Fig. 3(c) for the sequence Lovebird1. As Fig. 3 shows, compared with H.264/AVC, the present invention achieves a higher objective quality of the synthesized virtual view at the same depth map bit rate, which indicates that the present invention improves depth map coding efficiency. For the sequence Bookarrival the objective quality of the synthesized virtual view increases by 0.215 dB on average; for Kendo it increases by 0.237 dB on average; for Lovebird1 it increases by 0.45 dB on average.
Claims (3)
1. A depth map coding rate-distortion decision method oriented to virtual view quality, characterized in that:
the current coding block is predicted to obtain a predicted block; the difference between the current coding block and the predicted block is computed, and discrete cosine transform, quantization and entropy coding are applied to the difference to obtain the bit rate of the current coding block; the pixel gray values of the current coding block and the predicted block are converted into disparity values; the distortion of the current coding block is then computed from the converted disparity values; finally, the rate-distortion cost of the current coding block is computed from the resulting distortion and bit rate; the specific steps are as follows:
(1) predict the current coding block of the depth map to obtain its predicted block, the gray value of the i-th pixel in the current coding block being denoted L_i and the i-th pixel gray value in the predicted block being denoted L_p,i, with i ∈ {1, ..., N}, where N is the number of pixels in the current coding block;
(2) subtract L_p,i from L_i to obtain the difference of the current coding block, and apply discrete cosine transform, quantization and entropy coding to the difference to determine the bit rate R of the current coding block;
(3) decode the entropy-coded data, and apply inverse quantization and inverse discrete cosine transform to the decoded data to reconstruct the difference signal;
(4) add the reconstructed difference signal to the predicted block to reconstruct the current block;
(5) convert each pixel gray value of the current coding block and of the reconstructed current block obtained in step (4) into a disparity value according to the following formula,
dis_i = f·b·[(L_i/255)·(1/Z_near − 1/Z_far) + 1/Z_far],
dis_rec,i = f·b·[(L_rec,i/255)·(1/Z_near − 1/Z_far) + 1/Z_far],
where dis_i denotes the disparity value corresponding to the gray value of the i-th pixel of the current coding block, dis_rec,i denotes the disparity value corresponding to the gray value of the i-th pixel of the reconstructed current block, L_rec,i denotes the i-th pixel gray value of the reconstructed current block, f denotes the camera focal length, b is the spacing between adjacent cameras, Z_near denotes the real depth of the object point nearest to the camera, and Z_far denotes the real depth of the object point farthest from the camera;
(6) compute the distortion D of the current coding block from the error terms δ_i determined by dis_i and dis_rec,i, where δ_i equals 0 when dis_i and dis_rec,i are equal, and δ_i equals L_i − L_rec,i when dis_i and dis_rec,i are unequal;
(7) from the bit rate R of the current coding block obtained in step (2) and the distortion D obtained in step (6), compute the rate-distortion cost J of the current coding block as J = D + λ·R, where λ is a Lagrange multiplier.
2. The depth map coding rate-distortion decision method oriented to virtual view quality according to claim 1, characterized in that the prediction of the current coding block of the depth map in said step (1) uses any of the intra- or inter-prediction methods specified in the H.264/AVC standard.
3. The depth map coding rate-distortion decision method oriented to virtual view quality according to claim 1, characterized in that the Lagrange multiplier λ in said step (7) takes the value specified by the H.264/AVC standard.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110140492 CN102158710B (en) | 2011-05-27 | 2011-05-27 | Depth view encoding rate distortion judgment method for virtual view quality |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102158710A CN102158710A (en) | 2011-08-17 |
CN102158710B true CN102158710B (en) | 2012-12-26 |
Family
ID=44439851
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110140492 Expired - Fee Related CN102158710B (en) | 2011-05-27 | 2011-05-27 | Depth view encoding rate distortion judgment method for virtual view quality |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102158710B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013113134A1 (en) * | 2012-02-02 | 2013-08-08 | Nokia Corporation | An apparatus, a method and a computer program for video coding and decoding |
CN102595166B (en) * | 2012-03-05 | 2014-03-05 | 山东大学 | Lagrange factor calculation method applied for depth image encoding |
CN103826135B (en) * | 2013-12-24 | 2017-02-08 | 浙江大学 | Three-dimensional video depth map coding method based on just distinguishable parallax error estimation |
CN104506856B (en) * | 2015-01-14 | 2017-03-22 | 山东大学 | Method of estimating quality of virtual view applicable to 3D (Three-dimensional) video system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2092747B1 (en) * | 2006-12-14 | 2015-08-12 | Thomson Licensing | Method and apparatus for encoding and/or decoding bit depth scalable video data using adaptive enhancement layer prediction |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101478677A (en) * | 2008-12-24 | 2009-07-08 | 西安交通大学 | Scalable multi-description video encoding structure design method based on code rate control |
CN101888566A (en) * | 2010-06-30 | 2010-11-17 | 清华大学 | Estimation method of distortion performance of stereo video encoding rate |
CN102065296A (en) * | 2011-01-06 | 2011-05-18 | 宁波大学 | Three-dimensional video coding method |
Non-Patent Citations (1)
Title |
---|
Yang Haitao, et al. Joint Video-Depth Predictive Coding in Three-Dimensional Television Systems. Acta Optica Sinica, 2009, Vol. 29, No. 12. *
Also Published As
Publication number | Publication date |
---|---|
CN102158710A (en) | 2011-08-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102209243B (en) | Depth map intra prediction method based on linear model | |
CN101159875B (en) | Double forecast video coding/decoding method and apparatus | |
US10009611B2 (en) | Visual quality measure for real-time video processing | |
CN102281446B (en) | Visual-perception-characteristic-based quantification method in distributed video coding | |
CN101835056A (en) | Allocation method for optimal code rates of texture video and depth map based on models | |
CN102413353B (en) | Method for allocating code rates of multi-view video and depth graph in stereo video encoding process | |
Yuan et al. | Rate distortion optimized inter-view frame level bit allocation method for MV-HEVC | |
CN107277550A (en) | Multi-view signal codec | |
CN102625102B (en) | H.264/scalable video coding medius-grain scalability (SVC MGS) coding-oriented rate distortion mode selection method | |
CN102158710B (en) | Depth view encoding rate distortion judgment method for virtual view quality | |
CN103546758A (en) | Rapid depth map sequence interframe mode selection fractal coding method | |
CN107864380A (en) | 3D HEVC fast intra-mode prediction decision-making techniques based on DCT | |
CN102308583A (en) | Apparatus and method for encoding and decoding multi-view image | |
CN102291582A (en) | Distributed video encoding method based on motion compensation refinement | |
US20140340478A1 (en) | Method and apparatus for depth video coding using endurable view synthesis distortion | |
CN101854555B (en) | Video coding system based on prediction residual self-adaptation regulation | |
CN103634600B (en) | A kind of Video Encoding Mode system of selection based on SSIM evaluation, system | |
CN101309404A (en) | Resolution descending video transcoding method and transcoding device thereof | |
CN101883283B (en) | Control method for code rate of three-dimensional video based on SAQD domain | |
CN106534855B (en) | A kind of Lagrange factor calculation method towards SATD | |
CN102595166B (en) | Lagrange factor calculation method applied for depth image encoding | |
CN102572440B (en) | Multi-viewpoint video transmission method based on depth map and distributed video coding | |
CN104282030A (en) | Image compression device and method | |
CN103379349A (en) | Viewpoint composite predictive encoding method, decoding method, corresponding devices and code stream | |
CN102790881A (en) | Transform domain distributed video coder based on frame-level coding end speed control |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C14 | Grant of patent or utility model | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20121226; Termination date: 20160527 |