Multi-mode multi-view video signal coding and compression method
Technical field
The present invention relates to methods for coding and compressing multi-view video signals, and in particular to a multi-mode multi-view video signal coding and compression method based on analysis of the temporal correlation and the inter-view correlation of the multi-view video signal.
Background technology
3DAV (three-dimensional audio and video) is the development direction of the next generation of audio and video technology. As a core technology in 3DAV applications such as FTV (free-viewpoint television) and 3DTV (three-dimensional television), multi-view video coding aims to solve problems such as the compression, interaction, storage and transmission of 3D interactive video. A multi-view video signal is a set of video signals obtained by shooting an actual scene with a camera array. It provides video image information of the scene from different angles, and the information of an arbitrary viewpoint can be synthesized from one or more of the captured views, so that the viewpoint can be switched freely. Multi-view video is a new kind of video that offers depth perception and interactive operation, and it has broad application prospects in interactive multimedia applications (such as digital entertainment, remote monitoring and distance education) built on broadband networks and high-density storage media. Fig. 1 is a schematic diagram of a commonly used multi-view video system. The system performs imaging, coding and compression, transmission, reception, decoding and display of the multi-view video signal, and the coding and compression of the multi-view video signal is the core of the whole system.
Multi-view video signals involve a huge amount of data, which hinders network transmission and storage. They also raise problems of system resource consumption (high computational complexity, high storage requirements, high power consumption and so on) and of random access at the user side (including fast forward, fast rewind, viewpoint switching with the viewing time frozen, viewpoint sliding and other access modes). How to improve the compression efficiency of multi-view video coding, reduce the resource consumption of the system, and give the system flexible random access, partial decoding and rendering has therefore become the goal pursued in current international research on multi-view video coding methods and in standardization work, and is an active research topic.
The basic approach to multi-view video coding is to exploit the temporal correlation and the inter-view correlation of the signal through motion-compensated prediction and disparity-compensated prediction. Both correlations vary with factors such as the camera density of the imaging system, illumination changes, and camera and object motion. When the cameras are dense and the imaging intensity of the views is consistent, the inter-view correlation is strong. When the cameras are sparse and the imaging intensity of the views is inconsistent, the temporal correlation is relatively strong and the inter-view correlation is weak. Camera and object motion also affect the correlations. Therefore, if a multi-view video coding framework with a single prediction structure is used to code signals with different correlation characteristics, one of two things happens. Either a very complex multi-reference-frame prediction mode is adopted to guarantee high compression efficiency, at the price of a multiplied computational and memory complexity of the encoder, reduced random access performance and increased coding delay; or a relatively simple prediction structure is adopted, in which case the encoder cannot fully exploit the temporal and inter-view correlations, which limits the compression efficiency.
Because of differences in camera density, illumination changes, and camera and object motion, multi-view video signals exhibit different statistical correlation characteristics in time and between views. Existing multi-view video coding schemes with a single structure cannot adapt well to signals whose correlation characteristics are this complex and variable, and they have difficulty achieving good overall performance (compression efficiency, random access, system resource consumption, partial decoding and rendering, coding delay and so on). This is a significant problem common to existing multi-view video coding methods.
Summary of the invention
The technical problem to be solved by the present invention is to provide a multi-view video signal coding and compression method that improves the overall performance of multi-view video coding while reducing coding complexity.
The technical solution adopted by the present invention to solve the above technical problem is a multi-mode multi-view video signal coding and compression method in which the encoder is organized into four functional modules: a multi-view video predictive coding module, a correlation statistical analysis module, a prediction mode selection module and a mode update trigger module. For the input multi-view video signal, at the start of coding an initial predictive coding mode may first be determined from known information, such as the camera array parameters, the coding complexity requirement and the random access performance requirement, and coding is performed by the multi-view video predictive coding module. Coding then proceeds in the following steps. Step 1: the prediction mode selection module dynamically selects, from the candidate predictive coding modes, the mode suited to the characteristics of the multi-view video signal currently being coded, according to the correlation characteristics obtained by the correlation statistical analysis module and to the requirements on overall coding performance, namely compression efficiency, coding complexity, random access performance and coding delay. Step 2: the multi-view video predictive coding module codes the input multi-view video signal with the selected predictive coding mode and outputs the compressed bitstream. Step 3: while the mode update trigger condition in the mode update trigger module is not satisfied, the current predictive coding mode is kept; when the trigger condition is satisfied, the correlation statistical analysis module is started again so that the predictive coding mode can be reselected and updated.
The candidate predictive coding modes can be divided into three classes. Class 1 comprises modes suited to multi-view video signals dominated by temporal correlation; these modes rely mainly on motion-compensated prediction. Class 2 comprises modes suited to signals dominated by inter-view correlation; these rely mainly on disparity-compensated prediction. Class 3 comprises modes suited to signals whose temporal and inter-view correlations are balanced; these are joint prediction modes that take both the temporal and the spatial domain into account. Each of the three classes may in turn contain several predictive coding modes, suited to multi-view video signals with different correlation characteristics and to different requirements on overall coding performance (such as coding complexity, compression efficiency, random access performance and coding delay).
The correlation statistical analysis module statistically analyzes the temporal correlation and the inter-view correlation of the group of pictures (GOP) that has been coded or is being coded, and defines a correlation coefficient α that characterizes the relative strength of the temporal correlation and the inter-view correlation of the video signal.
In the correlation statistical analysis module, within the GOP that has been coded or is being coded, the number $n_i^D$ of intra-coded blocks in each frame coded using only disparity-compensated prediction and the number $n_i^P$ of intra-coded blocks in each frame coded using only motion-compensated prediction may be counted, and the proportional relationship between $n_i^D$ and $n_i^P$ is used to describe the relative strength of the temporal and inter-view correlations of the current multi-view video signal.
Alternatively, in the correlation statistical analysis module, the relative strength of the temporal and inter-view correlations of the current multi-view video signal may be analyzed, within the GOP that has been coded or is being coded, from the ratio between the prediction error of the frames coded using only disparity-compensated prediction and the prediction error of the frames coded using only motion-compensated prediction.
The prediction mode selection module selects the prediction mode, according to the correlation characteristics obtained by the correlation statistical analysis module and the requirements on overall coding performance such as compression efficiency, coding complexity, random access performance and coding delay, as follows:
(1) when the temporal correlation is clearly stronger than the inter-view correlation, it is further determined whether the correlation is distributed fairly evenly within the temporal domain or whether the correlation with the nearest time instants is clearly stronger than that with the next-nearest time instants, and a predictive coding mode based mainly on motion-compensated prediction is selected accordingly;
(2) when the temporal correlation is clearly weaker than the inter-view correlation, a predictive coding mode based mainly on disparity-compensated prediction is selected;
(3) when the temporal correlation and the inter-view correlation are roughly equal, a joint prediction mode that takes both the temporal and the spatial domain into account is selected.
The mode update trigger module may adopt a content-based mode update scheme, in which the change in the correlation coefficient α obtained by the correlation statistical analysis module determines whether the prediction mode selection module is reactivated to update the predictive coding mode.
The mode update trigger module may instead adopt a timed update trigger, in which the correlation statistical analysis module is started at regular intervals to analyze the temporal and inter-view correlations of the multi-view video signal, and the prediction mode selection module is then invoked to determine the predictive coding mode.
In the multi-mode multi-view video encoder, the prediction structures of all candidate predictive coding modes may be given a common part: in every candidate mode, the frames at the same time instant as the intra-coded frame and the frames in the same view as the intra-coded frame are coded before the other frames of the GOP, and these frames use the same prediction scheme in all candidate modes. The correlation statistics of the multi-view video signal currently being coded can then be gathered while these first frames are coded. Once they have been coded, the prediction structure for the remaining frames of the current GOP is determined promptly, that is, one predictive coding mode suited to the characteristics of the current signal and to the overall coding performance requirements is finally selected from all the candidates and used for coding.
The content correlation of multi-view video signals in time and between views varies with the density of the cameras, the illumination, and camera and object motion. To address this, the present invention proposes a multi-mode multi-view video coding framework based on analysis of the temporal and inter-view correlations of the signal and on the overall coding performance requirements. Different candidate predictive coding modes are designed to match variations in camera density, illumination, and camera and object motion. Through a simple statistical analysis of the temporal and inter-view correlations, together with the differing requirements on overall coding performance (such as coding complexity, compression efficiency, random access performance and coding delay), the mode suited to the characteristics of the current multi-view video signal is dynamically selected from the candidates, which improves the overall performance of multi-view video coding.
Compared with the prior art, the advantage of the present invention is that, by analyzing the temporal and inter-view correlations of the multi-view video signal, it dynamically selects the predictive coding mode suited to the characteristics of the signal being coded and to the overall coding performance requirements, in place of the existing computationally complex single-mode multi-reference-frame methods that jointly predict in time and space. This effectively reduces the computational complexity of multi-view video coding and improves the random access performance of the multi-view video system while maintaining compression performance.
Description of drawings
Fig. 1 is a schematic diagram of a multi-view video system;
Fig. 2 is a schematic diagram of the structure and coding process of the multi-mode multi-view video encoder of the present invention;
Fig. 3a shows the class 1 candidate predictive coding mode of the embodiment;
Fig. 3b shows the class 2 candidate predictive coding mode of the embodiment;
Fig. 3c shows the class 3 candidate predictive coding mode of the embodiment;
Fig. 4 shows the sequential prediction coding mode PSVP, which uses P frames;
Fig. 5 shows the sequential prediction coding mode BSVP, which uses B frames;
Fig. 6 shows the Mpicture multi-view video predictive coding mode;
Fig. 7 shows the Joint multi-view video test sequence;
Fig. 8 shows the average rate-distortion curves for the Xmas part of the Joint multi-view video test sequence;
Fig. 9 shows the average rate-distortion curves for the exit part of the Joint multi-view video test sequence;
Fig. 10 shows the average rate-distortion curves for the ballroom part of the Joint multi-view video test sequence;
Fig. 11 shows the average rate-distortion curves for the whole Joint multi-view video test sequence.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and an embodiment.
Here a representative 5 × 7 GOP structure is taken as an example (as shown in Fig. 3a, Fig. 3b and Fig. 3c, each GOP has 5 views and 7 time instants, 35 frames in total) to describe in detail the four functional modules of the multi-mode multi-view video encoder and how they work together.
1) Multi-view video predictive coding module
This module is responsible for coding and compressing the multi-view video signal, that is, it codes the current multi-view video signal with the candidate predictive coding mode dynamically selected by the prediction mode selection module.
According to the temporal and inter-view correlations of the multi-view video signal, the candidate multi-view video predictive coding modes are divided into three classes. Class 1 comprises modes suited to signals dominated by temporal correlation; class 2 comprises modes suited to signals dominated by inter-view correlation; class 3 comprises modes suited to signals whose temporal and inter-view correlations are balanced. Each class may in turn contain several predictive coding modes, to accommodate multi-view video signals with different correlation characteristics and different requirements on overall coding performance (such as coding complexity, compression efficiency, random access performance and coding delay).
Fig. 3a, Fig. 3b and Fig. 3c show the three predictive coding modes of different classes that are used. In the figures, I denotes an intra-coded frame, D a disparity-compensated predicted frame, and P a motion-compensated predicted frame. P′ denotes a temporal-spatial bidirectionally predicted frame, which may reference D and P frames, and B′ denotes a temporal-spatial jointly predicted frame, which may reference D, P and P′ frames. The mode of Fig. 3a relies mainly on motion-compensated prediction, is suited to signals dominated by temporal correlation, and belongs to class 1. The mode of Fig. 3b relies mainly on disparity-compensated prediction, is suited to signals dominated by inter-view correlation, and belongs to class 2. The mode of Fig. 3c uses joint temporal and spatial prediction, is suited to signals whose temporal and inter-view correlations are balanced, and belongs to class 3. In this embodiment each of the three classes has only one candidate mode; in actual use of the invention, several different predictive coding modes may be designed as needed.
2) Correlation statistical analysis module
A correlation coefficient α is defined to characterize the relative strength of the temporal correlation and the inter-view correlation of the video signal. It can be obtained by statistically analyzing the temporal and inter-view correlations of the GOP that has been coded or is being coded.
In multi-mode multi-view video coding, the pictures at the same time instant as the I frame but in different views, such as the D frames on the left and right of the I frame in Fig. 3a, Fig. 3b and Fig. 3c, are coded using only disparity-compensated prediction, and the number of I blocks (that is, intra-coded blocks) in the i-th D frame is denoted $n_i^D$. The pictures in the same view as the I frame but at different time instants, such as the P frames above and below the I frame in Fig. 3a, Fig. 3b and Fig. 3c (in time, the frames preceding and following the I frame), are coded using only motion-compensated prediction, and the number of I blocks in the i-th P frame is denoted $n_i^P$. The correlation coefficient α may be defined from the proportional relationship between the counts $n_i^D$ and $n_i^P$, where n and m denote the numbers of D frames and P frames, respectively, used to calculate the coefficient. This coefficient α characterizes the relative strength of the temporal correlation and the inter-view correlation of the video signal. The I-block counts needed to calculate α can be collected during coding itself, so the additional computational cost is very low, and α therefore provides an efficient means of correlation analysis for multi-mode multi-view video coding. This embodiment calculates α from the ratio of the I-block counts in the D frames and the P frames, and the prediction mode selection module uses a threshold method to select one predictive coding mode from the three candidates shown in Fig. 3a, Fig. 3b and Fig. 3c, which is passed to the multi-view video predictive coding module for coding.
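The calculation of α from I-block counts can be illustrated with a minimal sketch. The exact formula is not reproduced in this text, so the form used here is an assumption: α is taken as the mean I-block count per D frame divided by the mean I-block count per P frame. The function name and the handling of zero counts are likewise hypothetical.

```python
from typing import Sequence


def correlation_coefficient(intra_blocks_d: Sequence[int],
                            intra_blocks_p: Sequence[int]) -> float:
    """Ratio of the mean I-block count per D frame to that per P frame.

    Assumed form (not confirmed by the source text):
        alpha = (sum(n_i^D) / n) / (sum(n_i^P) / m)
    where n and m are the numbers of D frames and P frames.
    """
    if not intra_blocks_d or not intra_blocks_p:
        raise ValueError("need at least one D frame and one P frame")
    mean_d = sum(intra_blocks_d) / len(intra_blocks_d)
    mean_p = sum(intra_blocks_p) / len(intra_blocks_p)
    if mean_p == 0:
        # Assumption: no intra blocks anywhere is treated as balanced (1.0);
        # intra blocks only in D frames is treated as an unbounded ratio.
        return float("inf") if mean_d > 0 else 1.0
    return mean_d / mean_p


# Four D frames and six P frames with made-up I-block counts.
alpha = correlation_coefficient([10, 20, 30, 40], [5, 5, 5, 5, 5, 5])
```

Under this assumed form, many intra blocks in the D frames indicate that disparity-compensated prediction often failed, so a large α would point to comparatively weak inter-view correlation.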
Besides the above scheme, the relative strength of the temporal and inter-view correlations of the current multi-view video signal may also be analyzed from the ratio between the prediction error (for example the SAD value) of the frames in the coded or currently coded GOP that use only disparity-compensated prediction (D frames) and the prediction error of the frames that use only motion-compensated prediction (P frames).
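The prediction-error variant can be sketched as follows. SAD (sum of absolute differences) is the error measure named in the text; averaging the per-frame SAD values before taking the ratio, and the function names, are assumptions made for illustration.

```python
from typing import Sequence


def sad(block: Sequence[Sequence[int]],
        prediction: Sequence[Sequence[int]]) -> int:
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block, prediction)
               for a, b in zip(row_a, row_b))


def prediction_error_ratio(sad_d_frames: Sequence[float],
                           sad_p_frames: Sequence[float]) -> float:
    """Mean SAD of the D frames divided by the mean SAD of the P frames.

    Assumption: per-frame SAD totals are averaged before the ratio is taken.
    """
    if not sad_d_frames or not sad_p_frames:
        raise ValueError("need at least one D frame and one P frame")
    mean_p = sum(sad_p_frames) / len(sad_p_frames)
    if mean_p == 0:
        raise ValueError("P-frame prediction error is zero; ratio undefined")
    return (sum(sad_d_frames) / len(sad_d_frames)) / mean_p


# A 2 x 2 toy block against a flat prediction.
example_sad = sad([[1, 2], [3, 4]], [[1, 1], [1, 1]])
```

A ratio above 1 would mean that disparity-compensated prediction leaves more residual energy than motion-compensated prediction for the frames examined.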
3) Prediction mode selection module
Based on the correlation statistics of the multi-view video signal produced by the correlation statistical analysis module, and on the requirements on overall performance of multi-mode multi-view video coding such as compression efficiency, coding complexity, random access performance and coding delay, this module selects from the candidate predictive coding modes the one suited to the characteristics of the current multi-view video signal and to the overall coding performance requirements. The predictive coding mode is selected as follows:
(1) When the temporal correlation is clearly stronger than the inter-view correlation, it may be further determined whether the correlation is distributed fairly evenly within the temporal domain or whether the correlation with the nearest time instants is clearly stronger than that with the next-nearest time instants, in order to choose a suitable class 1 predictive coding mode.
(2) When the temporal correlation is clearly weaker than the inter-view correlation, a class 2 predictive coding mode based mainly on disparity-compensated prediction is selected, and the multi-view video predictive coding module codes with this mode.
(3) When the temporal correlation and the inter-view correlation are roughly equal, a class 3 joint prediction mode that takes both the temporal and the spatial domain into account is selected.
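The three selection rules can be sketched with a simple two-threshold test on α. The text states that a threshold method is used but gives no threshold values, so `low` and `high` below are hypothetical. The sketch also assumes that a large α means the temporal correlation dominates, which depends on how α is defined.

```python
def select_mode_class(alpha: float, low: float = 0.5, high: float = 2.0) -> int:
    """Map the correlation coefficient to a candidate mode class (1, 2 or 3).

    Assumptions: alpha > high means temporal correlation clearly dominates,
    alpha < low means inter-view correlation clearly dominates, and the
    threshold values themselves are illustrative only.
    """
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    if alpha > high:
        return 1  # class 1: mainly motion-compensated prediction
    if alpha < low:
        return 2  # class 2: mainly disparity-compensated prediction
    return 3      # class 3: joint temporal and spatial prediction


chosen = [select_mode_class(a) for a in (3.0, 0.2, 1.0)]
```

In a design with several modes per class, the returned class would be followed by a second choice within that class, for example according to the complexity or random access requirement.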
In this embodiment, the frames on the central cross of Fig. 3a, Fig. 3b and Fig. 3c are coded before the other frames of the GOP in all three candidate predictive coding modes, and these central-cross frames use the same prediction scheme in all three modes. The counts $n_i^D$ and $n_i^P$ required by the correlation statistical analysis module can therefore be collected while these frames are coded, which yields the correlation statistics of the multi-view video signal currently being coded. As soon as the central-cross frames have been coded, the prediction mode for the remaining frames of the GOP is determined, that is, one predictive coding mode suited to the characteristics of the current multi-view video signal is finally selected from the three candidates shown in Fig. 3a, Fig. 3b and Fig. 3c and passed to the multi-view video predictive coding module for coding.
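The two-phase coding order described above can be sketched by enumerating the central-cross frames of a 5 × 7 GOP. The sketch assumes that the I frame sits at the center of the view/time grid, which the term "central cross" suggests but the text does not state explicitly; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = Tuple[int, int]  # (view index, time index)


def central_cross(views: int = 5, times: int = 7) -> Dict[Frame, str]:
    """Frame types on the central cross, assuming the I frame is at the center."""
    center_view, center_time = views // 2, times // 2
    cross: Dict[Frame, str] = {}
    for v in range(views):
        for t in range(times):
            if v == center_view and t == center_time:
                cross[(v, t)] = "I"   # intra-coded frame
            elif t == center_time:
                cross[(v, t)] = "D"   # same instant, other view: disparity only
            elif v == center_view:
                cross[(v, t)] = "P"   # same view, other instant: motion only
    return cross


def remaining_frames(views: int = 5, times: int = 7) -> List[Frame]:
    """Frames whose prediction structure is fixed only after mode selection."""
    cross = central_cross(views, times)
    return [(v, t) for v in range(views) for t in range(times)
            if (v, t) not in cross]


cross = central_cross()
types = list(cross.values())
```

Under this assumption a 5 × 7 GOP has 11 central-cross frames (1 I, 4 D and 6 P), leaving 24 frames to be coded with the mode chosen from the statistics gathered on the cross.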
4) Mode update trigger module
A content-based mode update scheme may be adopted, in which a threshold method applied to the change in the correlation coefficient α obtained by the correlation statistical analysis module determines whether the prediction mode selection module is reactivated to update the predictive coding mode. Alternatively, a timed update trigger may be adopted, in which the correlation statistical analysis module is started at regular intervals to analyze the temporal and inter-view correlations of the multi-view video signal, and the prediction mode selection module is then invoked to determine the predictive coding mode to use. This embodiment adopts the content-based mode update scheme.
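Both trigger variants can be sketched in a few lines. The text specifies only that a threshold on the change in α is used, so measuring the change relative to the α recorded at the last mode selection, the threshold value, and the period counted in GOPs are all assumptions.

```python
def content_trigger(ref_alpha: float, alpha: float,
                    threshold: float = 0.25) -> bool:
    """Content-based trigger: fire when alpha drifts too far from its reference.

    Assumption: the change is measured relative to ref_alpha, the value
    recorded when the current mode was selected.
    """
    if ref_alpha == 0:
        return alpha != 0
    return abs(alpha - ref_alpha) / abs(ref_alpha) > threshold


def timed_trigger(gop_index: int, period: int = 8) -> bool:
    """Timed trigger: fire every `period` GOPs (hypothetical period)."""
    if period <= 0:
        raise ValueError("period must be positive")
    return gop_index % period == 0


small_change = content_trigger(1.0, 1.1)   # 10% drift, below the threshold
large_change = content_trigger(1.0, 2.0)   # 100% drift, above the threshold
```

A content-based trigger reacts to scene changes at the cost of monitoring α continuously, whereas a timed trigger is simpler but may keep an unsuitable mode until the next scheduled analysis.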
The multi-view video coding performance of this embodiment is described below:
1) Random access performance of the multi-mode multi-view video coding scheme
For multi-view video, random access includes access modes such as fast forward, fast rewind, viewpoint switching with the viewing time frozen, and viewpoint sliding. Suppose v views are coded with t frames per view, so that the total number of multi-view video frames s = v × t is finite. Let $x_i$ denote the number of frames that must be decoded beforehand in order to decode the i-th frame, and let $p_i$ be the probability that the user randomly accesses the i-th frame. The mathematical expectation of the random access cost,
$$E(X_n) = \sum_{i=1}^{s} x_i p_i,$$
is then an important index of how well predictive coding mode n supports random access. The higher this cost, the weaker the decoder's support for random access and the more resources are consumed to provide it. Let $k_n$ be the probability that the multi-view video signal is coded with the n-th predictive coding mode and N the number of candidate predictive coding modes; the random access cost of multi-mode multi-view video coding can then be expressed as
$$E(X) = \sum_{n=1}^{N} k_n E(X_n).$$
The probability $k_n$ of coding with each mode in multi-mode multi-view video coding depends directly on the characteristics of the actual multi-view video signal. In this embodiment N = 3, and the modes are assumed to be equally probable, that is, $k_n = 1/3$ (n = 1, 2, 3); the random access costs of the different schemes are then as shown in Table 1. In the table, PSVP and BSVP denote the sequential prediction methods using P frames and B frames, whose predictive coding modes are shown in Fig. 4 and Fig. 5 respectively. Mpicture is the Mpicture multi-view video coding method of Fujii et al. of Japan, whose predictive coding mode is shown in Fig. 6. PSVP, BSVP and Mpicture are all single-mode multi-reference-frame predictive coding methods. MMVC is the multi-mode multi-view video coding method of the present invention with the three candidate predictive coding modes shown in Fig. 3 (this embodiment serving as the representative of the scheme of the invention). Table 1 shows that, in terms of random access performance, PSVP is the worst, while BSVP and Mpicture are somewhat better. The random access cost of the MMVC method of the present invention is the lowest: relative to PSVP, BSVP and Mpicture it is 49% to 72% lower, a marked improvement in random access performance.
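The expected random access cost can be illustrated with a small sketch. The per-frame pre-decode counts depend on the actual prediction structures in the figures, which are not reproduced here, so the counts below are invented toy values and are not meant to reproduce Table 1. Uniform access probabilities are an assumption used when none are supplied.

```python
from typing import Optional, Sequence


def expected_cost(pre_decode_counts: Sequence[int],
                  access_probs: Optional[Sequence[float]] = None) -> float:
    """Expected number of frames decoded beforehand for one coding mode."""
    s = len(pre_decode_counts)
    if s == 0:
        raise ValueError("need at least one frame")
    if access_probs is None:
        access_probs = [1.0 / s] * s  # assumption: every frame equally likely
    if len(access_probs) != s:
        raise ValueError("one probability per frame is required")
    return sum(x * p for x, p in zip(pre_decode_counts, access_probs))


def multimode_cost(mode_costs: Sequence[float],
                   mode_probs: Sequence[float]) -> float:
    """Cost of multi-mode coding: per-mode costs weighted by mode probability."""
    if len(mode_costs) != len(mode_probs):
        raise ValueError("one probability per mode is required")
    return sum(k * e for k, e in zip(mode_probs, mode_costs))


# Toy chain of four frames in which frame i depends on all earlier frames.
chain_cost = expected_cost([0, 1, 2, 3])
# Three hypothetical modes, equally probable as in the embodiment.
mixed_cost = multimode_cost([1.5, 3.0, 4.5], [1 / 3] * 3)
```

For the toy chain the expected cost is 1.5 frames, and mixing three equally probable modes with costs 1.5, 3.0 and 4.5 gives an expected cost of about 3.0.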
2) Computational complexity of the multi-mode multi-view video coding scheme
In a coding framework based on H.264/AVC, high-precision disparity-compensated prediction and motion-compensated prediction account for more than 75% of the computational complexity of the whole multi-view video encoder. The computational complexity of the encoder can therefore be characterized by the average number of disparity-compensated and motion-compensated predictions needed to code one 5 × 7 GOP. The computational complexity of each scheme is compared in Table 1. Because they use multi-reference-frame methods, PSVP, BSVP and Mpicture all have high computational complexity, BSVP and Mpicture especially. Compared with PSVP, BSVP and Mpicture, the scheme of the present invention reduces the computational complexity by 29% to 57%.
Table 1 Comparison of the random access cost and computational complexity of the MMVC scheme of the present invention
Coding scheme | E(X) | Random access cost ratio | Computational complexity | Computational complexity ratio
PSVP | 11.0 | 364% | 58 | 141%
BSVP | 7.5 | 248% | 83 | 202%
Mpicture | 6.0 | 199% | 97 | 237%
MMVC | 3.02 | 100% | 41 | 100%
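The ratio columns of Table 1 and the reduction ranges quoted in the text follow directly from the E(X) and computational complexity columns. The short check below recomputes them; only the four data rows come from the table, while the function names are hypothetical.

```python
from typing import Dict, Tuple

# (random access cost E(X), computational complexity) from Table 1.
TABLE_1: Dict[str, Tuple[float, int]] = {
    "PSVP": (11.0, 58),
    "BSVP": (7.5, 83),
    "Mpicture": (6.0, 97),
    "MMVC": (3.02, 41),
}


def ratios_to(baseline: str) -> Dict[str, Tuple[int, int]]:
    """Cost and complexity of each scheme as a rounded percentage of the baseline."""
    base_cost, base_ops = TABLE_1[baseline]
    return {name: (round(100 * cost / base_cost), round(100 * ops / base_ops))
            for name, (cost, ops) in TABLE_1.items()}


def reduction_percent(baseline_value: float, other_value: float) -> int:
    """Whole-percent reduction of baseline_value relative to other_value (truncated)."""
    return int(100 * (1 - baseline_value / other_value))


ratios = ratios_to("MMVC")
cost_reduction = (reduction_percent(3.02, 6.0), reduction_percent(3.02, 11.0))
ops_reduction = (reduction_percent(41, 58), reduction_percent(41, 97))
```

The recomputed ratios match the table (364%, 248% and 199% for cost; 141%, 202% and 237% for complexity), and the truncated reductions give the quoted ranges of 49% to 72% and 29% to 57%.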
3) Rate-distortion performance of the multi-mode multi-view video coding scheme
To evaluate the coding efficiency of the MMVC scheme of the present invention, multi-view video coding experiments were carried out on an H.264/AVC video coding framework (JM 8.5, main profile), with the quantization parameter QP set to 24, 30, 36 and 40. The multi-view test sequences are Xmas from the Tanimoto Laboratory (camera spacing 9 mm, strong inter-view correlation) and exit (slow motion, large disparity, camera spacing 19.5 cm) and ballroom (intense motion) from MERL. All three sequences were captured with parallel camera systems at a resolution of 640 × 480. Five views and five scenes were chosen, with five GOPs per scene, and spliced into the Joint sequence shown in Fig. 7, so that each view's video has 5 × 5 × 7 = 175 frames. In the experiments, splicing the sequences simulates scene changes in real video. In this embodiment, MMVC adaptively selects a suitable predictive coding mode from the candidates shown in Fig. 3a, Fig. 3b and Fig. 3c according to the video content to code the Joint sequence.
Figs. 8, 9, 10 and 11 compare the rate-distortion performance of the MMVC of this embodiment with that of the sequential prediction methods PSVP and BSVP and of Mpicture when coding the Joint test sequence. Figs. 8, 9 and 10 show the average rate-distortion curves of the Xmas, exit and ballroom parts of the Joint sequence, respectively. The overall average rate-distortion curves of the Joint sequence in Fig. 11 show that the rate-distortion performance of MMVC is essentially equal to that of BSVP, PSVP and Mpicture.
In summary, compared with the prior art, the advantage of the present invention is that, by analyzing the temporal and inter-view correlations of the multi-view video signal, it dynamically selects the predictive coding mode suited to the characteristics of the signal being coded and to the overall coding performance requirements, in place of the existing computationally complex single-mode multi-reference-frame methods that jointly predict in time and space. This effectively reduces the computational complexity of multi-view video coding and improves the random access performance of the multi-view video system while maintaining compression performance.
Obviously, the multi-view video predictive coding modes are not limited to the forms of this embodiment. Therefore, without departing from the spirit and scope of the general concept defined by the claims and their equivalents, the present invention is not limited to the specific details and examples shown and described here.