CN100415002C - Coding and compression method of multi-mode and multi-viewpoint video signal

Info

Publication number: CN100415002C
Application number: CNB2006100528959A
Authority: CN
Inventors: 蒋刚毅; 郁梅; 张云
Original assignee: Ningbo University
Current assignee: Shanghai Spparks Technology Co ltd
Priority date: 2006-08-11
Filing date: 2006-08-11
Publication date: 2008-08-27
Anticipated expiration: 2026-08-11
Also published as: CN1913640A

Abstract

The invention discloses a multi-mode multi-view video coding method. It analyzes the temporal correlation and the inter-view correlation of the multi-view video signal and, taking into account the overall performance requirements placed on multi-view video coding (compression efficiency, coding complexity, random access performance, coding delay and the like), adaptively selects from a set of candidate predictive coding modes the one suited both to the characteristics of the signal currently being coded and to those requirements, and encodes the signal with it. This replaces single-mode, computationally complex multi-reference-frame predictive coding methods that jointly predict in time and space. It reduces the computational complexity of multi-view video coding and improves random access performance while maintaining compression efficiency.

Description

Coding and compression method of multi-mode and multi-viewpoint video signal

Technical Field

The invention relates to a method for coding and compressing multi-view video signals, and in particular to a multi-mode method for coding and compressing multi-view video signals based on analysis of their temporal correlation and inter-view correlation.

Background Art

3DAV (three-dimensional audio and video) is the development direction of the new generation of audio and video technology. As a core technology in 3DAV applications such as FTV (free viewpoint television) and 3DTV (three-dimensional television), multi-view video coding aims to solve the problems of compressing, interacting with, storing and transmitting 3D interactive video. A multi-view video signal is a set of video signals obtained by shooting a real scene with a camera array. It provides video image information of the scene from different angles, and the information of one or more viewpoints can be used to synthesize an arbitrary viewpoint, so that the viewpoint can be switched freely. Multi-view video is a new type of video with stereoscopic perception and interactive operation, and has broad application prospects in interactive multimedia applications oriented to broadband and high-density storage media (such as digital entertainment, remote monitoring and distance education). Fig. 1 is a schematic diagram of a commonly used multi-view video system. Such a system performs imaging, coding and compression, transmission, reception, decoding and display of multi-view video signals, and the coding and compression of the multi-view video signal is the core part of the whole system.

Multi-view video signals involve a huge amount of data, which hinders network transmission and storage, and they raise problems of system resource consumption (high computational complexity, high storage capacity requirements, high power consumption and so on) and of random access at the user end (viewing access modes including fast forward, fast rewind, viewpoint switching, freezing of the viewing instant and viewpoint sliding). How to improve the compression efficiency of multi-view video coding, reduce the resource consumption of the system, and give the system flexible random access, partial decoding and rendering capabilities has therefore become the goal pursued in current international research on multi-view video coding methods and standardization, and is an active research topic.

The basic idea of multi-view video coding and compression is to exploit the temporal correlation and inter-view correlation of the signal through motion-compensated prediction and disparity-compensated prediction. The temporal and inter-view correlations of a multi-view video signal vary with factors such as the camera density of the imaging system, illumination changes, and camera and object motion. When the cameras are dense and the imaging intensity of the viewpoints is consistent, the inter-view correlation is strong; when the cameras are sparse and the imaging intensity of the viewpoints is inconsistent, the temporal correlation is relatively strong and the inter-view correlation is weak. Camera and object motion also affect the correlations. Consequently, if a multi-view video coding framework with a single prediction structure is used to encode signals with different correlation characteristics, one of two things happens. Either a very complex multi-reference-frame prediction mode is used to guarantee high compression efficiency, at the cost of a multiplied increase in the encoder's computational and space complexity, reduced random access performance and increased coding delay; or a relatively simple prediction structure is used, in which case the encoder cannot fully exploit the temporal and inter-view correlations of the signal, which limits the achievable compression efficiency.

Because of differences in camera density, illumination changes, and camera and object motion, multi-view video signals exhibit different statistical characteristics of content correlation in time and between viewpoints. These complex temporal and inter-view correlation characteristics mean that existing multi-view video coding schemes with a single structure do not adapt well to signals whose correlation characteristics are complex and changeable, and it is difficult for them to achieve effective compression in terms of overall performance (compression efficiency, random access, system resource consumption, partial decoding and rendering, coding delay and so on). This is an important problem common to existing multi-view video coding methods.

Summary of the Invention

The technical problem to be solved by the invention is to provide a multi-view video signal coding and compression method that improves the overall performance of multi-view video coding while reducing coding complexity.

The technical solution adopted by the invention to solve the above problem is as follows. In a multi-mode multi-view video signal coding and compression method, the encoder is organized into four functional modules: a multi-view video predictive coding module, a correlation statistical analysis module, a prediction mode selection module and a mode update trigger module. For the input multi-view video signal, at the start of encoding an initial predictive coding mode may first be determined from known information, such as camera array parameters, coding complexity requirements and random access performance requirements, and the multi-view video predictive coding module encodes with it. Encoding then proceeds by the following steps: ① the prediction mode selection module dynamically selects, from the candidate predictive coding modes, the mode suited to the characteristics of the signal currently being coded, based on the correlation characteristics obtained by the correlation statistical analysis module and on the overall performance requirements for multi-view video coding (compression efficiency, coding complexity, random access performance and coding delay); ② the multi-view video predictive coding module encodes the input signal with the selected mode and outputs the compressed bitstream; ③ while the mode update trigger condition in the mode update trigger module is not satisfied, the current predictive coding mode is kept, and when the condition is satisfied, the correlation statistical analysis module is started again so that an updated predictive coding mode can be selected.
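The interplay of the four modules in steps ① to ③ can be sketched as a simple control loop. This is a minimal illustration, not the patented implementation: the function name `encode_sequence`, the callables passed in, and the representation of a group of pictures as an arbitrary Python object are all assumptions made for the example.

```python
from typing import Any, Callable, List, Sequence, Tuple


def encode_sequence(
    gops: Sequence[Any],
    initial_mode: str,
    analyze: Callable[[Any], float],        # correlation statistical analysis module
    select_mode: Callable[[float], str],    # prediction mode selection module
    trigger: Callable[[int, Any], bool],    # mode update trigger module
    encode_gop: Callable[[Any, str], Any],  # multi-view video predictive coding module
) -> List[Tuple[str, Any]]:
    """Encode GOPs one by one, re-selecting the mode only when triggered."""
    mode = initial_mode  # chosen from known information before encoding starts
    stream = []
    for index, gop in enumerate(gops):
        # Step 3: keep the current mode unless the trigger condition is met.
        if index > 0 and trigger(index, gop):
            # Step 1: re-run the analysis and pick a new mode.
            mode = select_mode(analyze(gop))
        # Step 2: encode with the currently selected mode.
        stream.append((mode, encode_gop(gop, mode)))
    return stream


# Toy usage with stand-in callables (each "GOP" is just its correlation value).
result = encode_sequence(
    gops=[0.5, 0.5, 3.0, 3.0],
    initial_mode="joint",
    analyze=lambda gop: gop,
    select_mode=lambda alpha: "temporal" if alpha > 1.0 else "joint",
    trigger=lambda index, gop: index % 2 == 0,
    encode_gop=lambda gop, mode: f"{mode}:{gop}",
)
modes_used = [mode for mode, _ in result]
```

Passing the modules in as callables keeps the loop independent of how the analysis, selection and trigger are actually realized.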

The candidate predictive coding modes can be divided into three classes. Class 1 comprises predictive coding modes suited to multi-view video signals dominated by temporal correlation; these modes rely mainly on motion-compensated prediction. Class 2 comprises modes suited to signals dominated by inter-view correlation; these modes rely mainly on disparity-compensated prediction. Class 3 comprises modes suited to signals in which temporal and inter-view correlations are balanced; these are joint predictive coding modes that take both the temporal and the spatial domain into account. Each of the three classes may in turn contain several predictive coding modes, suited respectively to signals with different correlation characteristics and to different requirements on the overall performance of multi-view video coding (such as coding complexity, compression efficiency, random access performance and coding delay).

The statistical analysis performed by the correlation statistical analysis module is a statistical analysis of the temporal correlation and inter-view correlation of a group of pictures (GOP) that has been encoded or is being encoded. A correlation coefficient α is defined to characterize the relative strength of the temporal correlation and the inter-view correlation of the video signal.

In the correlation statistical analysis module, for a GOP that has been encoded or is being encoded, the number $n_i^D$ of intra-coded blocks in picture frames coded using only disparity-compensated prediction and the number $n_i^P$ of intra-coded blocks in picture frames coded using only motion-compensated prediction can be counted, and the ratio of $n_i^D$ to $n_i^P$ used to describe the relative strength of the temporal and inter-view correlations of the current multi-view video signal.

Alternatively, in the correlation statistical analysis module, for a GOP that has been encoded or is being encoded, the ratio of the prediction error of picture frames coded using only disparity-compensated prediction to the prediction error of picture frames coded using only motion-compensated prediction can be used to analyze the relative strength of the temporal and inter-view correlations of the current multi-view video signal.

The prediction mode selection module selects the prediction mode, based on the correlation characteristics obtained by the correlation statistical analysis module and on the overall performance requirements for multi-view video coding (compression efficiency, coding complexity, random access performance, coding delay and so on), as follows:

(1) When the temporal correlation is clearly stronger than the inter-view correlation, further determine whether the correlation is distributed relatively evenly within the temporal domain or whether the correlation with the nearest time instant is clearly stronger than that with the next-nearest instant, and select a predictive coding mode based mainly on motion-compensated prediction;

(2) When the temporal correlation is clearly weaker than the inter-view correlation, select a predictive coding mode based mainly on disparity-compensated prediction;

(3) When the temporal correlation and the inter-view correlation are roughly equal, select a joint predictive coding mode that takes both the temporal and the spatial domain into account.

The mode update trigger module may use a content-based mode update scheme, in which the change in the correlation coefficient α obtained by the correlation statistical analysis module determines whether the prediction mode selection module is invoked again to update the predictive coding mode.

The mode update trigger module may instead use a timed trigger, periodically starting the correlation statistical analysis module to analyze the temporal and inter-view correlations of the multi-view video signal and invoking the prediction mode selection module to determine the predictive coding mode.

In the multi-mode multi-view video encoder, the prediction structures of all candidate predictive coding modes can be given a common part. That is, in every candidate mode, the picture frames at the same time instant as the intra-coded frame and the picture frames at the same viewpoint as the intra-coded frame are coded before the other frames of the GOP, and these frames are predicted in the same way in all candidate modes. The correlation statistics of the signal currently being coded can therefore be gathered while these first frames are being coded, and as soon as they are finished the prediction structure for the remaining frames of the current GOP can be decided. In other words, one predictive coding mode suited to the characteristics of the current signal and to the overall performance requirements is finally selected from all candidate modes and used for coding.

The invention addresses the fact that the temporal and inter-view content correlations of multi-view video signals vary with camera density, illumination, and camera and object motion. It proposes a multi-mode multi-view video coding framework based on analysis of the temporal and inter-view correlations of the signal and on the overall performance requirements for multi-view video coding. Different candidate predictive coding modes are designed to match variations in camera density, illumination, and camera and object motion. Through a simple statistical analysis of the temporal and inter-view correlations of the signal, together with the different requirements on overall coding performance (such as coding complexity, compression efficiency, random access performance and coding delay), the mode suited to the characteristics of the current signal is dynamically selected from the candidates, thereby improving the overall performance of multi-view video coding.

Compared with the prior art, the advantage of the invention is that, by analyzing the temporal and inter-view correlations of the multi-view video signal, it dynamically selects a predictive coding mode suited to the characteristics of the signal currently being coded and to the overall performance requirements, in place of the existing single-mode, computationally complex multi-reference-frame predictive coding methods that jointly predict in time and space. This effectively reduces the computational complexity of multi-view video coding and improves the random access performance of the multi-view video system while maintaining compression performance.

Brief Description of the Drawings

Fig. 1 is a schematic diagram of a multi-view video system;

Fig. 2 is a schematic diagram of the structure and coding process of the multi-mode multi-view video encoder of the invention;

Fig. 3a shows the class 1 candidate predictive coding mode of the embodiment;

Fig. 3b shows the class 2 candidate predictive coding mode of the embodiment;

Fig. 3c shows the class 3 candidate predictive coding mode of the embodiment;

Fig. 4 shows the sequential predictive coding mode PSVP, which uses P frames;

Fig. 5 shows the sequential predictive coding mode BSVP, which uses B frames;

Fig. 6 shows the Mpicture multi-view video predictive coding mode;

Fig. 7 shows the Joint multi-view video test sequence;

Fig. 8 shows the average rate-distortion curves for the Xmas part of the Joint multi-view video test sequence;

Fig. 9 shows the average rate-distortion curves for the exit part of the Joint multi-view video test sequence;

Fig. 10 shows the average rate-distortion curves for the ballroom part of the Joint multi-view video test sequence;

Fig. 11 shows the average rate-distortion curves for the Joint multi-view video test sequence.

Detailed Description of the Embodiments

The invention is described in further detail below with reference to the embodiment shown in the drawings.

Taking a representative 5×7 GOP structure as an example (as shown in Figs. 3a, 3b and 3c, each GOP has 5 viewpoints and 7 time instants, 35 frames in total), the four functional modules of the multi-mode multi-view video encoder and the way they cooperate are described in detail below.

1) Multi-view video predictive coding module

This module is responsible for coding and compressing the multi-view video signal. It encodes the current signal using the candidate predictive coding mode dynamically selected by the prediction mode selection module.

According to the temporal and inter-view correlation of the multi-view video signal, the candidate predictive coding modes are divided into three classes: class 1 comprises modes suited to signals dominated by temporal correlation; class 2 comprises modes suited to signals dominated by inter-view correlation; class 3 comprises modes suited to signals in which temporal and inter-view correlations are balanced. Each of the three classes may in turn contain several predictive coding modes, to suit signals with different correlation characteristics and different requirements on the overall performance of multi-view video coding (such as coding complexity, compression efficiency, random access performance and coding delay).

Figs. 3a, 3b and 3c show the three classes of predictive coding modes used. In the figures, I denotes an intra-coded frame, D a disparity-compensated predictive coded frame, and P a motion-compensated predictive coded frame; P′ denotes a temporal-spatial bidirectional predictive coded frame, which may reference D and P frames, and B′ denotes a temporal-spatial joint prediction frame, which may reference D, P and P′ frames. The mode of Fig. 3a relies mainly on motion-compensated prediction, is suited to signals dominated by temporal correlation, and belongs to class 1. The mode of Fig. 3b relies mainly on disparity-compensated prediction, is suited to signals dominated by inter-view correlation, and belongs to class 2. The mode of Fig. 3c is a joint prediction that takes both the temporal and the spatial domain into account, is suited to signals in which temporal and inter-view correlations are balanced, and belongs to class 3. In this embodiment each of the three classes has only one candidate mode; in practical use of the invention, several different predictive coding modes may be designed as required.

2) Correlation statistical analysis module

A correlation coefficient α is defined to characterize the relative strength of the temporal correlation and the inter-view correlation of the video signal. It is obtained by statistical analysis of the temporal and inter-view correlations of a GOP that has been encoded or is being encoded.

In multi-mode multi-view video coding, pictures at the same time instant as the I frame but at different viewpoints, such as the D frames to the left and right of the I frame in Figs. 3a, 3b and 3c, are coded using disparity-compensated prediction only; the number of I blocks (intra-coded blocks) in a D frame is denoted $n_i^D$. Pictures at the same viewpoint as the I frame but at different time instants, such as the P frames above and below the I frame in Figs. 3a, 3b and 3c (in terms of time, the frames before and after the I frame), are coded using motion-compensated prediction only; the number of I blocks in a P frame is denoted $n_i^P$. The correlation coefficient α can be defined as

$\alpha = \dfrac{\frac{1}{n} \sum_{i=0}^{n} n_i^D}{\frac{1}{m} \sum_{i=0}^{m} n_i^P}$

where n and m are the numbers of D frames and P frames, respectively, used to compute the correlation coefficient. The coefficient α characterizes the relative strength of the temporal and inter-view correlations of the video signal. Because the I-block counts needed to compute α can be gathered during encoding, the extra computational overhead is very low, so α provides an effective means of analyzing the correlation statistics of the video signal in multi-mode multi-view video coding. This embodiment computes α from the ratio of the numbers of I blocks in D frames and P frames, and the prediction mode selection module uses a thresholding method to choose one of the three candidate predictive coding modes shown in Figs. 3a, 3b and 3c, which is then passed to the multi-view video predictive coding module for encoding.
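As a rough illustration of the definition of α, the following sketch computes the ratio of the mean I-block count over D frames to the mean I-block count over P frames. The function name, the handling of a zero denominator, and the sample counts are assumptions made for the example and do not come from the patent.

```python
import math
from typing import Sequence


def correlation_coefficient(
    d_intra_counts: Sequence[int], p_intra_counts: Sequence[int]
) -> float:
    """Ratio of mean I-block count in D frames to mean I-block count in P frames."""
    if not d_intra_counts or not p_intra_counts:
        raise ValueError("at least one D frame and one P frame are required")
    mean_d = sum(d_intra_counts) / len(d_intra_counts)
    mean_p = sum(p_intra_counts) / len(p_intra_counts)
    if mean_p == 0:
        # Assumption: the source does not say how to treat an empty denominator.
        return math.inf if mean_d > 0 else 1.0
    return mean_d / mean_p


# Hypothetical counts for the 4 D frames and 6 P frames of one 5x7 GOP.
alpha = correlation_coefficient([40, 60, 50, 50], [10, 20, 30, 20, 10, 30])
```

Since an encoder typically falls back to intra blocks where inter prediction works poorly, a large α plausibly indicates that disparity prediction is failing more often than motion prediction, that is, that temporal correlation dominates. That reading is an inference, not a statement from the source.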

As an alternative to the above scheme, the relative strength of the temporal and inter-view correlations of the current multi-view video signal can be analyzed from the ratio of the prediction errors (for example SAD values) of the frames in the encoded or currently coded GOP that use disparity-compensated prediction only (D frames) to the prediction errors of the frames that use motion-compensated prediction only (P frames).
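A minimal sketch of this prediction-error variant follows, assuming that SAD is accumulated per frame and that the per-frame values are averaged before taking the ratio. The averaging step and both function names are assumptions; the source only states that a ratio of prediction errors is used.

```python
from typing import Sequence


def sad(block: Sequence[Sequence[int]], prediction: Sequence[Sequence[int]]) -> int:
    """Sum of absolute differences between a block and its prediction."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block, prediction)
        for a, b in zip(row_a, row_b)
    )


def error_ratio(d_frame_errors: Sequence[float], p_frame_errors: Sequence[float]) -> float:
    """Mean D-frame prediction error divided by mean P-frame prediction error."""
    mean_d = sum(d_frame_errors) / len(d_frame_errors)
    mean_p = sum(p_frame_errors) / len(p_frame_errors)
    return mean_d / mean_p


# Toy values: one 2x2 block, then made-up per-frame SAD totals.
block_sad = sad([[1, 2], [3, 4]], [[1, 1], [5, 4]])
ratio = error_ratio([300, 500], [100, 300])
```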

3) Prediction mode selection module

Based on the correlation statistics produced by the correlation statistical analysis module and on the overall performance requirements for multi-mode multi-view video coding (compression efficiency, coding complexity, random access performance, coding delay and so on), this module selects from the candidate predictive coding modes one that suits the characteristics of the current signal and the overall coding performance requirements. The mode is selected as follows:

(1) When the temporal correlation is clearly stronger than the inter-view correlation, it may further be determined whether the correlation is distributed relatively evenly within the temporal domain or whether the correlation with the nearest time instant is clearly stronger than that with the next-nearest instant, in order to select a suitable class 1 predictive coding mode.

(2) When the temporal correlation is clearly weaker than the inter-view correlation, a class 2 predictive coding mode based mainly on disparity-compensated prediction is selected, and the multi-view video predictive coding module encodes with it.

(3) When the temporal correlation and the inter-view correlation are roughly equal, a class 3 joint predictive coding mode that takes both the temporal and the spatial domain into account is selected.
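The three rules above, combined with the thresholding method of this embodiment, might be sketched as follows. The threshold values 0.5 and 2.0 are hypothetical, since the source gives no numbers, and the direction of the comparison assumes that a large α means temporal correlation dominates (more intra blocks in D frames than in P frames).

```python
from enum import Enum


class Mode(Enum):
    TEMPORAL = 1    # class 1: mainly motion-compensated prediction (Fig. 3a)
    INTER_VIEW = 2  # class 2: mainly disparity-compensated prediction (Fig. 3b)
    JOINT = 3       # class 3: joint temporal-spatial prediction (Fig. 3c)


def select_mode(alpha: float, low: float = 0.5, high: float = 2.0) -> Mode:
    """Pick a candidate class by thresholding alpha (thresholds are assumptions)."""
    if alpha > high:
        return Mode.TEMPORAL
    if alpha < low:
        return Mode.INTER_VIEW
    return Mode.JOINT


choices = [select_mode(a) for a in (2.5, 0.3, 1.1)]
```

In a real encoder the thresholds would presumably be tuned against the performance requirements named in the text, such as coding complexity and random access.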

In this embodiment, in all three candidate predictive coding modes the picture frames on the central cross in Figs. 3a, 3b and 3c are coded before the other frames of the GOP, and these cross frames are predicted in the same way in all three modes. The values $n_i^D$ and $n_i^P$ needed by the correlation statistical analysis module can therefore be gathered while these frames are being coded, which yields the correlation statistics of the signal currently being coded. As soon as the frames on the central cross have been coded, the prediction mode for the remaining frames of the GOP can be decided: one of the three candidate modes shown in Figs. 3a, 3b and 3c that suits the characteristics of the current signal is finally selected and passed to the multi-view video predictive coding module for encoding.
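To make the central cross concrete, the sketch below enumerates its frames for a views × times GOP. It assumes that the I frame sits at the middle viewpoint and middle time instant, which is how the figures are described in the text; the function name and the (view, time) tuple indexing are illustrative only.

```python
from typing import List, Tuple

Frame = Tuple[int, int]  # (view index, time index)


def central_cross(views: int = 5, times: int = 7):
    """Split a GOP into the I frame, D frames, P frames and the remaining frames."""
    i_view, i_time = views // 2, times // 2  # assumed centre position of the I frame
    d_frames: List[Frame] = [(v, i_time) for v in range(views) if v != i_view]
    p_frames: List[Frame] = [(i_view, t) for t in range(times) if t != i_time]
    cross = {(i_view, i_time), *d_frames, *p_frames}
    remaining = [
        (v, t) for v in range(views) for t in range(times) if (v, t) not in cross
    ]
    return (i_view, i_time), d_frames, p_frames, remaining


i_frame, d_frames, p_frames, remaining = central_cross()
```

For the 5×7 GOP this gives 1 I frame, 4 D frames and 6 P frames that are coded first, leaving 24 frames whose prediction structure depends on the selected mode.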

4) Mode update trigger module

A content-based mode update scheme may be used, in which a thresholding method applied to the change in the correlation coefficient α obtained by the correlation statistical analysis module determines whether the prediction mode selection module is invoked again to update the predictive coding mode. Alternatively, a timed trigger may be used, in which the correlation statistical analysis module is started periodically to analyze the temporal and inter-view correlations of the multi-view video signal and the prediction mode selection module is invoked to determine the predictive coding mode to be used. This embodiment uses the content-based mode update scheme.
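Both trigger schemes can be sketched in a few lines. The relative-change criterion, the 25% threshold and the period of 8 GOPs are assumptions chosen for illustration; the source says only that a threshold on the change in α, or a timer, is used.

```python
def content_trigger(
    alpha_at_last_selection: float, alpha_now: float, max_relative_change: float = 0.25
) -> bool:
    """Fire when alpha has drifted too far from its value at the last mode selection."""
    if alpha_at_last_selection == 0:
        return alpha_now != 0
    change = abs(alpha_now - alpha_at_last_selection)
    return change > max_relative_change * alpha_at_last_selection


def timed_trigger(gop_index: int, period: int = 8) -> bool:
    """Fire on every `period`-th GOP regardless of content."""
    return gop_index % period == 0


small_drift = content_trigger(2.0, 2.2)
large_drift = content_trigger(2.0, 3.0)
```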

The performance of multi-view video coding in this embodiment is described below:

1) Random access performance of the multi-mode multi-view video coding scheme

For multi-view video, random access includes fast forward, rewind, view switching, freezing the viewing instant, view sweeping and similar access patterns. Assume that the total number of multi-view video frames to be encoded, s = v × t for v views with t frames per view, is finite. Let x_i denote the number of frames that must be decoded before the i-th frame can be decoded, and let p_i be the probability that the user randomly accesses the i-th frame. The expected random access cost $E_{n} = \sum_{i = 1}^{v \times t} x_{i} p_{i}$ is then an important measure of how well predictive coding mode n supports random access. The higher this cost, the weaker the decoder's support for random access and the more resources are consumed to provide it. Let k_n be the probability that the multi-view video signal is encoded with the n-th predictive coding mode, and let N be the number of candidate predictive coding modes. The random access cost of multi-mode multi-view video coding can then be expressed as $E(X) = \sum_{n = 1}^{N} (k_{n} \times E_{n})$.
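
The two expectations can be evaluated directly once x_i, p_i and k_n are known. The sketch below is only an illustration of the formulas; the function names and the toy numbers in the usage note are hypothetical and are not taken from Figs. 3 to 6.

```python
def mode_cost(x, p):
    """E_n = sum_i x_i * p_i for one predictive coding mode.

    x: frames that must be decoded before frame i can be decoded
    p: probability that the user accesses frame i
    """
    if len(x) != len(p):
        raise ValueError("x and p must have one entry per frame")
    return sum(xi * pi for xi, pi in zip(x, p))


def multi_mode_cost(mode_costs, k):
    """E(X) = sum_n k_n * E_n over the N candidate modes."""
    if len(mode_costs) != len(k):
        raise ValueError("one probability k_n is needed per mode")
    return sum(kn * en for kn, en in zip(k, mode_costs))
```

As a toy case, a chain of four frames in which each frame depends on its predecessor has x = [0, 1, 2, 3]; with uniform access probability 1/4 the cost is 1.5. With three modes of equal probability, `multi_mode_cost` reduces to the arithmetic mean of the three per-mode costs.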

In multi-mode multi-view video coding, the probability k_n with which each mode is used depends directly on the characteristics of the actual multi-view video signal. In this embodiment N = 3, and the modes are assumed to be equally probable, i.e. k_n = 1/3 (n = 1, 2, 3); the resulting random access costs of the different schemes are listed in Table 1. In the table, PSVP and BSVP denote the sequential prediction methods using P frames and B frames respectively, whose predictive coding modes are shown in Fig. 4 and Fig. 5. Mpicture is the Mpicture multi-view video coding method of Fujii et al. (Japan), whose predictive coding mode is shown in Fig. 6. PSVP, BSVP and Mpicture are all single-mode multi-reference-frame predictive coding methods. MMVC is the multi-mode multi-view video coding method of the present invention using the three candidate predictive coding modes shown in Fig. 3 (this embodiment serves as the representative of the inventive scheme). Table 1 shows that, in terms of random access performance, PSVP is the worst, while BSVP and Mpicture are somewhat better. The MMVC method of the present invention has the lowest random access cost: compared with PSVP, BSVP and Mpicture, its random access cost is 49% to 72% lower, a clear improvement in random access performance.

2) Computational complexity of the multi-mode multi-view video coding scheme

High-precision disparity-compensated prediction and motion-compensated prediction based on the H.264/AVC coding framework account for more than 75% of the computational complexity of the entire multi-view video encoder. The complexity of the whole encoder can therefore be characterized by the average number of disparity-compensated and motion-compensated predictions needed to encode one 5×7 group of pictures. The computational complexity of the schemes is compared in Table 1. Because they use multiple reference frames, PSVP, BSVP and Mpicture all have high computational complexity, BSVP and Mpicture in particular. Compared with PSVP, BSVP and Mpicture, the computational complexity of the scheme of the present invention is 29% to 57% lower.

Table 1 Comparison of the random access cost and computational complexity of the MMVC scheme of the present invention

| Coding scheme | E(X) | Random access cost multiple | Computational complexity | Computational complexity multiple |
|---|---|---|---|---|
| PSVP | 11.0 | 364% | 58 | 141% |
| BSVP | 7.5 | 248% | 83 | 202% |
| Mpicture | 6.0 | 199% | 97 | 237% |
| MMVC | 3.02 | 100% | 41 | 100% |
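
The multiples in Table 1 and the reduction ranges quoted in the text (49% to 72% for random access cost, 29% to 57% for computational complexity) follow from the E(X) and complexity columns. The short check below recomputes them; the dictionary layout and function names are illustrative only, and the values are copied from Table 1.

```python
# (E(X), computational complexity) per scheme, copied from Table 1
TABLE_1 = {
    "PSVP": (11.0, 58),
    "BSVP": (7.5, 83),
    "Mpicture": (6.0, 97),
    "MMVC": (3.02, 41),
}


def multiples(table, base="MMVC"):
    """Each scheme's cost and complexity as a rounded percentage of the base scheme."""
    e0, c0 = table[base]
    return {
        name: (round(100 * e / e0), round(100 * c / c0))
        for name, (e, c) in table.items()
    }


def reduction(base_value, other_value):
    """Fractional saving of the base scheme relative to another scheme."""
    return 1 - base_value / other_value
```

For example, `reduction(3.02, 11.0)` is about 0.725 and `reduction(3.02, 6.0)` is about 0.497, which bracket the quoted 49% to 72% range when truncated to whole percentages.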

3) Rate-distortion performance of the multi-mode multi-view video coding scheme

To evaluate the coding efficiency of the MMVC scheme of the present invention, multi-view video coding experiments were carried out on the H.264/AVC (JM 8.5, main profile) video coding framework, with quantization parameters QP of 24, 30, 36 and 40. The test material was taken from the multi-view test sequence sets of the Tanimoto laboratory and MERL: Xmas (camera spacing 9 mm, high inter-view correlation), exit (slow motion, large disparity, camera spacing 19.5 cm) and ballroom (intense motion). All three sequences were captured with parallel camera systems at a resolution of 640×480. Five views and five scenes were selected, with five groups of pictures per scene, and spliced into the Joint sequence shown in Fig. 7, so that each view contains 5×5×7 = 175 frames. In the experiments, the splicing of sequences simulates the scene changes of real video. In this embodiment, MMVC adaptively selects, according to the video content, a suitable predictive coding mode from the candidates shown in Figs. 3a, 3b and 3c to encode the Joint sequence.

Figs. 8, 9, 10 and 11 compare the rate-distortion performance of the MMVC of this embodiment with that of the sequential prediction methods PSVP and BSVP and of Mpicture when encoding the Joint test sequence. Figs. 8, 9 and 10 show the average rate-distortion curves of the Xmas, exit and ballroom portions of the Joint sequence respectively. The overall average rate-distortion curves of the Joint sequence in Fig. 11 show that the rate-distortion performance of MMVC is essentially on par with that of BSVP, PSVP and Mpicture.

In summary, compared with the prior art, the advantage of the present invention is that, by analyzing the temporal correlation and inter-view correlation of the multi-view video signal, it dynamically selects a predictive coding mode suited both to the characteristics of the multi-view video signal currently being encoded and to the overall performance requirements of multi-view video coding. This replaces the existing single-mode, computationally complex multi-reference-frame predictive coding methods that jointly predict in time and space, thereby effectively reducing the computational complexity of multi-view video coding and compression and improving the random access performance of the multi-view video system while maintaining compression performance.

Obviously, the multi-view video predictive coding modes are not limited to the forms used in this embodiment. The invention is therefore not limited to the specific details and examples shown and described here, provided there is no departure from the spirit and scope of the general concept defined by the claims and their equivalents.

Claims

1. A method for encoding and compressing multi-mode multi-view video signals, characterized in that the encoder is organized into four functional modules: a multi-view video predictive coding module, a correlation statistical analysis module, a prediction mode selection module and a mode update trigger module; for the input multi-view video signal, an initial predictive coding mode is first determined from known information and the signal is encoded by the multi-view video predictive coding module, after which coding proceeds according to the following steps: ① the prediction mode selection module, according to the correlation of the multi-view video signal obtained by statistical analysis in the correlation statistical analysis module and the requirements on the overall performance of multi-view video coding, dynamically selects from the candidate predictive coding modes the predictive coding mode suited to the characteristics of the multi-view video signal currently being encoded; ② the multi-view video predictive coding module encodes the input multi-view video signal with the selected predictive coding mode and outputs the encoded and compressed bitstream; ③ when the mode update trigger condition in the mode update trigger module is met, the correlation statistical analysis module is restarted in order to select and update the predictive coding mode.

2. The method for encoding and compressing multi-mode multi-view video signals as claimed in claim 1, wherein the candidate predictive coding modes are divided into three categories: the first category comprises predictive coding modes suited to multi-view video signals dominated by temporal correlation, which are based mainly on motion-compensated prediction; the second category comprises predictive coding modes suited to multi-view video signals dominated by inter-view correlation, which are based mainly on disparity-compensated prediction; the third category comprises predictive coding modes suited to multi-view video signals in which temporal correlation and inter-view correlation are balanced.

3. The method for encoding and compressing multi-mode multi-view video signals as claimed in claim 1, wherein the correlation statistical analysis module statistically analyzes the temporal correlation and the inter-view correlation of a group of pictures that has been encoded or is being encoded, and a correlation coefficient α is defined to characterize the relative strength of the temporal correlation and the inter-view correlation of the video signal.

4. The method for encoding and compressing multi-mode multi-view video signals as claimed in claim 3, characterized in that, in the correlation statistical analysis module, the number n_i^D of intra-coded blocks in image frames encoded using only disparity-compensated prediction and the number n_i^P of intra-coded blocks in image frames encoded using only motion-compensated prediction are counted within the group of pictures that has been encoded or is being encoded, and the ratio between n_i^D and n_i^P is used to describe the relative strength of the temporal and inter-view correlation of the current multi-view video signal.

5. The method for encoding and compressing multi-mode multi-view video signals as claimed in claim 3, characterized in that, in the correlation statistical analysis module, the ratio between the prediction error of image frames encoded using only disparity-compensated prediction and the prediction error of image frames encoded using only motion-compensated prediction, within the group of pictures that has been encoded or is being encoded, is used to analyze the relative strength of the temporal and inter-view correlation of the current multi-view video signal.

6. The method for encoding and compressing multi-mode multi-view video signals as claimed in claim 2, wherein the prediction mode selection module selects the prediction mode according to the correlation characteristics of the multi-view video signal and the overall performance requirements of multi-view video coding as follows:

(1) When the temporal correlation is clearly stronger than the inter-view correlation, further determine whether the correlation is distributed fairly evenly in the time domain or whether the temporal correlation with the nearest time instant is clearly stronger than that with the next-nearest time instant, and select a predictive coding mode based mainly on motion-compensated prediction;

(2) When the temporal correlation is clearly weaker than the inter-view correlation, select a predictive coding mode based mainly on disparity-compensated prediction;

(3) When the temporal correlation is roughly equivalent to the inter-view correlation, choose a joint predictive coding mode that takes both temporal and spatial domains into account.

7. The method for encoding and compressing multi-mode multi-view video signals as claimed in claim 1, wherein the mode update trigger module adopts a content-based mode update scheme and determines, according to the change in the correlation coefficient α obtained by the correlation statistical analysis module, whether to re-enable the prediction mode selection module to update the predictive coding mode.

8. The method for encoding and compressing multi-mode multi-view video signals as claimed in claim 1, wherein the mode update trigger module adopts a timed update trigger: the correlation statistical analysis module is enabled at regular intervals to statistically analyze the temporal correlation and inter-view correlation of the multi-view video signal, and the prediction mode selection module is enabled to determine the predictive coding mode.

9. The method for encoding and compressing multi-mode multi-view video signals as claimed in claim 1, wherein, in all of the candidate predictive coding modes, the image frames at the same time instant as the intra-coded frame and the image frames of the same view as the intra-coded frame are encoded before the other image frames in the group of pictures, and these first-encoded image frames use the same prediction structure in all candidate modes.