KR100738867B1

KR100738867B1 - Method for Coding and Inter-view Balanced Disparity Estimation in Multiview Animation Coding/Decoding System

Info

Publication number: KR100738867B1
Application number: KR1020050030724A
Authority: KR
Inventors: 손광훈; 김용태; 서형갑; 서정동; 이재호; 남승진; 이준용; 박창섭; 정신일
Original assignee: 연세대학교 산학협력단; 한국방송공사
Priority date: 2005-04-13
Filing date: 2005-04-13
Publication date: 2007-07-12
Also published as: KR20060108952A

Abstract

1. Technical field of the claimed invention

The present invention relates to an encoding method for a multiview video encoding/decoding system and to an inter-view balanced disparity estimation method.

2. Technical problem to be solved by the invention

The object of the present invention is to provide an encoding method for a multiview video encoding/decoding system that proposes a new GOP structure, thereby encoding multiview video more simply and efficiently and allowing images of various views to be reconstructed according to the decoder's circumstances.

3. Summary of the solution

The present invention provides an encoding method for a multiview video encoding/decoding system that encodes a group of multiview (GOMV), which is a set of multiview images at the same time instant, the method comprising: encoding the frames of a base GOP (a GOP that contains an I picture) in a GGOP, which is a GOP extended along the view axis; encoding, using the base GOP, the frames of at least one GOP_B (a GOP that contains at least one P picture); and encoding the frames of a GOP_C (a GOP that contains only B pictures) located between the base GOP and at least one GOP_B by performing bidirectional disparity estimation using the base GOP and the GOP_B.

4. Important uses of the invention

The present invention is used in encoding/decoding systems and the like.

Encoding, Decoding, GOP, GGOP, GOMV, IBDE

Description

Method for Coding and Inter-view Balanced Disparity Estimation in Multiview Animation Coding/Decoding System

FIG. 1 is a structural diagram of a multiview profile encoding/decoding system implemented by applying the temporal scalability of the MPEG-2 standard.

FIG. 2 is a structural diagram of a stereo video encoding/decoding system using the MPEG-2 multiview profile.

FIG. 3 is an exemplary diagram illustrating predictive coding that considers only disparity, using two disparity predictions for bidirectional prediction.

FIG. 4 is an exemplary diagram illustrating predictive coding that uses a disparity vector and a motion vector for bidirectional prediction.

FIG. 5 is an exemplary diagram illustrating the picture types defined in MPEG-2.

FIG. 6 is a structural diagram of one embodiment of the GOMV defined according to the present invention.

FIG. 7 is a structural diagram of one embodiment of the GGOP defined according to the present invention.

FIG. 8 is a structural diagram of one embodiment illustrating GOP_B in FIG. 7.

FIG. 9 is a structural diagram of one embodiment illustrating GOP_C in FIG. 7.

FIG. 10 is a structural diagram of one embodiment illustrating the bitstream storage positions of a GGOP that take view scalability into account according to the present invention.

FIG. 11 is a flowchart of one embodiment illustrating the inter-view balanced disparity estimation method of the multiview video encoding/decoding system according to the present invention.

The present invention relates to an encoding method for a multiview video encoding/decoding system and to an inter-view balanced disparity estimation method, and more particularly to such methods that compress multiview images efficiently, efficiently remove the correlation between multiview images, and resolve the imbalance between views.

Since the 1970s, information transmission technologies centered on the telephone and television have developed as media for conveying voice and video, and advances in computer technology in particular have driven data transmission technologies for text and similar content. In addition, rapid progress in communication, computer, and semiconductor technology has ushered in an era of multimedia communication in which data, voice, and video, once processed independently, are transmitted over a single medium.

Alongside these developments, the information and communication services demanded by the advanced information society of the 21st century call for visual media that can present the information to be conveyed in the most realistic, natural, and familiar way.

For television, the most representative visual medium today, research and development has so far focused on high-definition TV (HDTV), which aims to give viewers a sense of presence and realism through optimized and larger screens, higher resolution, and more natural color. However, such systems can present only flat images, which differ from the real world, so they are limited in how realistically they can convey information.

Accordingly, interest has grown in visual media technologies that build on today's advanced communication networks and provide a more natural and realistic user interface for the three-dimensional representation, recording, and reproduction of human visual sensation and perception.

Given current trends in digital services and multimedia, three-dimensional stereoscopic TV broadcasting is expected to be applied to most video fields in the future, and in terms of demand and research prospects it is expected to become one of the major industries of the 21st century. At present, the easiest way to make a person perceive a three-dimensional image is the stereo method: because the human visual system relies heavily on binocular disparity to perceive depth, the method feeds each eye, by suitable means, an image equivalent to what the left or right eye would see.

The stereo method requires two planar images for the observer, so it has the advantage that existing video compression standards such as MPEG (Moving Picture Experts Group), H.261, H.263, and H.264 can be applied to process them.

In theory, transmitting a stereo image requires twice the bandwidth of a conventional image. However, because the two images are captured from positions separated horizontally by about the distance between human eyes, they are highly correlated, and this correlation can be exploited to reduce the amount of data substantially.

The most fundamental operation in stereo image coding is disparity estimation, which finds the similarity between the two images and expresses the displacement as a vector. Disparity estimation resembles the motion estimation used to remove temporal redundancy, so the methods used for motion estimation are usually applied as they are.
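
Because disparity estimation borrows its methods from motion estimation, it can be illustrated with ordinary block matching. The following is a minimal sketch, not the patent's implementation: it assumes grayscale images stored as lists of rows, a purely horizontal search, and the sum of absolute differences (SAD) as the matching cost. The function names and parameters are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(image, top, left, size):
    """Cut a size x size block out of an image stored as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]


def estimate_disparity(target, reference, top, left, size, search_range):
    """Search horizontally in the reference view for the best match.

    Returns (disparity, cost), where disparity is the horizontal offset d
    such that the reference block at column left + d has the lowest SAD.
    Candidates that would fall outside the reference image are skipped.
    """
    width = len(reference[0])
    block = get_block(target, top, left, size)
    best = None
    for d in range(-search_range, search_range + 1):
        ref_left = left + d
        if ref_left < 0 or ref_left + size > width:
            continue
        cost = sad(block, get_block(reference, top, ref_left, size))
        if best is None or cost < best[1]:
            best = (d, cost)
    return best
```

The same routine would serve as a motion estimator if the reference were the previous frame of the same view and the search covered both axes, which is the similarity the paragraph above points out.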

While a stereo image is sufficient to convey depth, reproducing a more immersive scene with a choice of viewpoints requires a multiview image captured from several positions. With a multiview image, what the viewer sees changes as the viewer moves, and the more views there are, the more natural the viewpoint transitions become. In this case, however, the amount of data grows in proportion to the number of views, so an efficient compression method is needed. As with stereo images, multiview images overlap heavily from view to view, so disparity estimation can be used to compress them efficiently.

Compressing multiview images efficiently requires improving the GOP (Group of Pictures) structure, which determines the order in which the images are compressed. The MPEG-2 multiview video encoding/decoding system and the GOP structure are described in detail below with reference to FIGS. 1 to 5.

FIG. 1 is a structural diagram of a multiview profile video encoding/decoding system implemented by applying the temporal scalability of the MPEG-2 standard.

The scalability provided by MPEG-2 allows a single piece of video equipment to decode images of different resolutions or formats at the same time. Among the scalability modes supported by MPEG-2, temporal scalability is a technique for improving visual quality by raising the frame rate. The multiview profile applies this temporal scalability to stereo video.

In practice, an encoding/decoding system built on the stereo video concept has the same structure as the temporal scalability structure of FIG. 1: the left images of the stereo video are input to the base view encoder, and the right images are input to the temporal auxiliary view encoder. The latter is an encoder designed for temporal scalability, that is, an interlayer encoder that generates images temporally between the images of the base layer.

Accordingly, encoding and decoding the left images alone yields an ordinary video, while encoding and decoding the left and right images together yields a stereoscopic video. For transmission or storage, a system multiplexer and a system demultiplexer are needed to combine and separate the two image sequences.

FIG. 2 is a structural diagram of a stereo video encoding/decoding system using the MPEG-2 multiview profile.

As shown in the figure, the base layer is encoded using motion compensation and the discrete cosine transform (DCT) and decoded through the inverse process, while the temporal auxiliary view encoder acts as a temporal interlayer encoder that predicts from the decoded base layer images.

That is, either two disparity predictions or one disparity prediction together with one motion-compensated prediction can be used here. Like the base layer encoder and decoder, the temporal auxiliary view encoder comprises a disparity- and motion-compensated DCT encoder and decoder.

Just as motion prediction/compensation coding requires a motion predictor and compensator, disparity compensation coding requires a disparity predictor and compensator. In addition to block-based motion/disparity prediction and compensation, the encoding process includes the DCT of the difference images between the predicted image and the original image, quantization of the DCT coefficients, and variable-length coding. The decoding process conversely consists of variable-length decoding, inverse quantization, and the inverse DCT.

MPEG-2 coding compresses very efficiently thanks to bidirectional motion prediction for B pictures, and temporal scalability is also quite efficient. High compression can therefore be obtained by coding the right images as B pictures that use only bidirectional prediction.

FIG. 3 is an exemplary diagram illustrating predictive coding that considers only disparity, using two disparity predictions for bidirectional prediction.

As shown in the figure, the left images are encoded with a non-scalable MPEG-2 encoder, and the right images are encoded with the MPEG-2 temporal auxiliary view encoder on the basis of the decoded left images. That is, each right image is coded as a B picture using predictions obtained from two different left images. One of the two reference images is the left image displayed at the same time instant, and the other is the left image that follows it in time.

As in motion estimation/compensation, the two predictions give rise to three prediction modes: forward, backward, and interpolated (bidirectional). Here the forward mode denotes the disparity predicted from the left image at the same time instant, and the backward mode denotes the disparity predicted from the immediately following left image. Because the right image is predicted through disparity vectors from two left images, this type of prediction is called disparity-only predictive coding. The encoder thus estimates two disparity vectors for each frame of the right video, and the decoder uses these two vectors to decode the right video from the left video.
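
The choice among the three prediction modes can be sketched as a cost comparison. The sketch below is illustrative only: it assumes the two disparity-compensated predictions of a block are already available as flat lists of samples, forms the interpolated prediction as their rounded average, and picks the mode with the lowest SAD. The function name `choose_mode` and the rounding rule are assumptions, not details taken from the patent.

```python
def sad(a, b):
    """Sum of absolute differences between two flat sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def choose_mode(target, forward_pred, backward_pred):
    """Pick the prediction mode with the lowest SAD for one block.

    forward_pred:  prediction from the left image at the same time instant
    backward_pred: prediction from the immediately following left image
    The interpolated prediction is the rounded average of the two.
    Returns (best_mode, costs); ties go to the mode listed first.
    """
    interpolated = [(f + b + 1) // 2
                    for f, b in zip(forward_pred, backward_pred)]
    candidates = {
        "forward": forward_pred,
        "backward": backward_pred,
        "interpolated": interpolated,
    }
    costs = {mode: sad(target, pred) for mode, pred in candidates.items()}
    best_mode = min(costs, key=costs.get)
    return best_mode, costs
```

When the target block lies between the two predictions, the interpolated mode tends to win; when it coincides with one of them, that single-direction mode has zero cost and is selected.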

FIG. 4 is an exemplary diagram illustrating predictive coding that uses a disparity vector and a motion vector for bidirectional prediction. As in FIG. 3, B pictures with bidirectional prediction are used, but the two prediction directions consist of one disparity estimation and one motion estimation: a disparity prediction from the left image at the same time instant and a motion prediction from the right image at the immediately preceding time instant.

As with disparity-only predictive coding, this bidirectional prediction also produces three prediction modes, called forward, backward, and interpolated. Here the forward mode denotes motion prediction from the decoded right image, and the backward mode denotes disparity prediction from the decoded left image.

Thus, the MPEG-2 Multi-View Profile (MVP) specification itself is in practice designed for stereo video and says nothing about the encoder structure for multiview video. An encoder is therefore needed that can efficiently deliver multiview video, which gives many people a sense of depth and presence at the same time.

Meanwhile, MPEG-2 defines a standard for video encoding and decoding.

FIG. 5 is an exemplary diagram illustrating the picture types defined in MPEG-2.

As shown in the figure, MPEG-2 defines three picture types: I, P, and B. An I (intra-coded) picture is coded by applying the DCT to the picture alone, without motion vector estimation/compensation. A P (predictive-coded) picture is coded by performing motion estimation/compensation with reference to an I picture or another P picture and then applying the DCT to the residual data. A B (bidirectionally predictive-coded) picture uses motion compensation like a P picture, but performs motion estimation/compensation from two frames on the time axis.

MPEG-2 pictures are arranged in a sequence such as B, B, I, B, B, P, .... The span from one I picture up to the next I picture is called a GOP (Group of Pictures). The number of pictures in a GOP is denoted N, and the number of pictures between an I picture and a P picture, or between two P pictures, is denoted M.
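
The roles of N and M can be made concrete by generating the picture types of one GOP. This sketch takes M to be the spacing between consecutive anchor (I or P) pictures, which is the usual MPEG-2 reading of the parameter and an interpretation of the definition above, and lists the pictures in display order starting from the I picture. The function name is hypothetical.

```python
def gop_picture_types(n, m):
    """Return the picture types of one GOP in display order.

    n: number of pictures in the GOP
    m: spacing between consecutive anchor pictures (I or P)
    The GOP is listed from its I picture; every m-th picture after it
    is a P picture and the pictures in between are B pictures.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return types
```

For N = 12 and M = 3 this yields I B B P B B P B B P B B; the trailing B pictures would take their backward reference from the I picture of the next GOP.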

A conventional 2D video encoding/decoding system has a simple structure because it considers prediction only along the time axis. Compressing multiview images, however, requires considering both the time axis and the spatial axis, so a different structure is needed.

In addition, disparity estimation is the factor that most strongly determines the performance of a multiview video encoding/decoding system, yet mismatch between cameras can degrade it considerably. For this reason, multiview video codecs have not improved much on existing 2D codecs.

The present invention has been proposed to solve the problems described above. Its object is to provide a bitstream generation method for a multiview video encoding/decoding system that proposes a new GOP structure, thereby encoding multiview video more simply and efficiently and allowing images of various views to be reconstructed according to the decoder's circumstances.

Another object of the present invention is to provide an inter-view balanced disparity estimation method for a multiview video encoding/decoding system that, by performing inter-view balanced disparity estimation, efficiently removes the correlation between multiview images and resolves the imbalance between views.

To achieve the above objects, the present invention provides an encoding method for a multiview video encoding/decoding system, characterized in that encoding is performed on a group of multiview (GOMV), which is a set of multiview images at the same time instant.

The present invention also provides an encoding method for a multiview video encoding/decoding system that encodes a group of multiview (GOMV), which is a set of multiview images at the same time instant, the method comprising: encoding the frames of a base GOP (a GOP that contains an I picture) in a GGOP, which is a GOP extended along the view axis; encoding, using the base GOP, the frames of at least one GOP_B (a GOP that contains at least one P picture); and encoding the frames of a GOP_C (a GOP that contains only B pictures) located between the base GOP and at least one GOP_B by performing bidirectional disparity estimation using the base GOP and the GOP_B.

The present invention further provides an inter-view balanced disparity estimation method for a multiview video encoding/decoding system, comprising: an initial step of obtaining a global correction parameter, determining initial representative values based on it, performing disparity estimation with these values, selecting the best-performing index, computing a probability model for each index, and using the model to determine the optimal representative correction parameters; a compression step of determining correction parameters for the chrominance images through the disparity vectors estimated with the representative correction parameters; and an application step of applying each correction value to the images at other time instants.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Note that, in assigning reference numerals to the components in each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

In the present invention, a set of N multiview images at the same time instant is defined as a "group of multiview" (hereinafter simply "GOMV"). A conventional 2D video codec assigns I, P, and B pictures in temporal order and encodes them, but the encoding order can become complicated when this is extended to multiview images. The present invention therefore proposes the GOMV structure to maximize compression efficiency while keeping the encoder structure simple.

FIG. 6 is a structural diagram of one embodiment of the GOMV defined according to the present invention.

As shown in the figure, the GOMV is encoded at the same point in the overall encoder at which a single frame is encoded for 2D video. Each frame within the GOMV is encoded with the same function that encodes a single frame of conventional 2D video. That function is well known to those of ordinary skill in the art to which the present invention pertains, so a detailed description is omitted.

In the present invention, the set of GOPs of a multiview video, that is, the structure obtained by extending the GOP along the view axis, is called a "Group of GOP" (hereinafter "GGOP").

FIG. 7 is a structural diagram of one embodiment of the GGOP defined according to the present invention.

As shown in the figure, the GGOP according to the present invention encodes the other GOPs by disparity estimation with respect to the base GOP (the GOP that contains the I picture). Encoding along the time axis is the same as in conventional GOP coding.

FIGS. 8 and 9 are structural diagrams of one embodiment illustrating GOP_B and GOP_C of FIG. 7, respectively.

Within a GOMV, frames are encoded as follows. As shown in FIG. 8, the frame of the base GOP is encoded first, and the frames belonging to GOP_B, a GOP that contains at least one P picture, are then encoded with reference to that frame. After the GOP_B frames have been encoded, the frames belonging to GOP_C, a GOP composed only of B pictures, are encoded. For these frames, disparity estimation is performed bidirectionally, as shown in FIG. 9, using the previously encoded base GOP and GOP_B.
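
The three-stage order (base GOP, then GOP_B, then GOP_C) can be sketched for a simple camera layout. The arrangement used here is an assumption made only for illustration, since the actual layout is given by FIGS. 7 to 9, which are not reproduced: views are taken to lie on a line, views at an even distance from the base view are treated as GOP_B, and views at an odd distance as GOP_C. All function names are hypothetical.

```python
def classify_views(num_views, base_view=0):
    """Label each view 'base', 'GOP_B' or 'GOP_C' (assumed linear layout)."""
    labels = {}
    for v in range(num_views):
        offset = abs(v - base_view)
        if offset == 0:
            labels[v] = "base"
        elif offset % 2 == 0:
            labels[v] = "GOP_B"
        else:
            labels[v] = "GOP_C"
    return labels


def encoding_order(num_views, base_view=0):
    """Views in encoding order: base first, then GOP_B, then GOP_C."""
    labels = classify_views(num_views, base_view)
    rank = {"base": 0, "GOP_B": 1, "GOP_C": 2}
    return sorted(labels,
                  key=lambda v: (rank[labels[v]], abs(v - base_view), v))


def reference_views(view, num_views, base_view=0):
    """Views used as disparity references when encoding the given view."""
    labels = classify_views(num_views, base_view)
    if labels[view] == "base":
        return []  # the base view is coded without inter-view reference
    if labels[view] == "GOP_B":
        # One-directional: the nearest already-coded view toward the base.
        step = 2 if view > base_view else -2
        return [view - step]
    # GOP_C: bidirectional, from the neighbouring base/GOP_B views.
    return [n for n in (view - 1, view + 1) if 0 <= n < num_views]
```

Under these assumptions, eight views with the base at index 0 are encoded in the order 0, 2, 4, 6, 1, 3, 5, 7. A GOP_C view at the end of the line has only one neighbour, so the sketch falls back to a single reference there.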

That is, GOP_B is compressed spatially with reference to the base GOP and previously encoded GOP_B, and temporally it can use the conventional 2D coding scheme.

Because H.264 supports two or more reference pictures, bidirectional disparity estimation can be performed in space even when motion estimation along the time axis is also bidirectional, which allows more efficient compression. Encoders before H.264 allow estimation in at most two directions, so if motion estimation takes up one direction in a multiview image, disparity estimation is also limited to a single direction. Since H.264 supports multiple reference pictures, efficiency for multiview images can be maximized.

In particular, a frame belonging to GOP_C can use a total of four adjacent reference pictures, above, below, left, and right, which makes it possible to reconstruct a high-quality image with few bits. That is, GOP_C is encoded spatially with the previously encoded base GOP and GOP_B as reference pictures, and temporally it follows the conventional 2D coding scheme.

도 10은 본 발명에 따라 시점 스케일러빌리티를 고려한 GGOP의 비트스트림 저장 위치를 설명하기 위한 일실시예 구조도로서, N이 8인 경우를 설명한 것이다.FIG. 10 is a diagram for explaining a bitstream storage location of a GGOP in consideration of view scalability according to the present invention. FIG.

So that the decoder can support conventional 2D displays, stereo displays, and multi-view displays when it decodes and outputs the video, the encoding method of the present invention supports scalability.

In the GGOP structure of FIG. 7, the base GOP serves as the reference for the other views and can be decoded on its own. A decoder that extracts and decodes only the information for this GOP can therefore fully reconstruct a 2D video. For this reason the base GOP is transmitted through the base layer.

Among the views belonging to GOP_B, the view closest to the base layer is transmitted through enhancement layer #1. With the information of the base layer and enhancement layer #1, the decoder can reconstruct stereo video. The remaining GOP_B views are transmitted through enhancement layer #2. A decoder that receives the base layer and enhancement layers #1 and #2 can reconstruct N/2 views. The frames belonging to GOP_C are transmitted through enhancement layer #3. Reconstructing all N multi-view pictures in a GOMV requires the base layer and enhancement layers #1, #2, and #3.
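
The layer-to-view relationship described above can be sketched as follows. This is a minimal illustration, not the patent's bitstream syntax: the layer labels "base", "enh1", "enh2", and "enh3" and the function name are hypothetical, and the sketch assumes an enhancement layer is usable only when every lower layer has also been received.

```python
def decodable_views(received_layers, n_views):
    """Return how many views a decoder can reconstruct from the layers received.

    received_layers: set of hypothetical labels "base", "enh1", "enh2", "enh3".
    n_views: N, the number of views in the GOMV.
    """
    if "base" not in received_layers:
        return 0  # every other layer depends on the base GOP
    views = 1  # base layer alone: ordinary 2D video
    if "enh1" in received_layers:
        views = 2  # base + #1: stereo pair
        if "enh2" in received_layers:
            views = n_views // 2  # base + #1 + #2: N/2 views
            if "enh3" in received_layers:
                views = n_views  # all layers: every view in the GOMV
    return views
```

For the N = 8 case of FIG. 10, the four cumulative layer sets yield 1, 2, 4, and 8 views respectively.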

Disparity estimation, however, can lose performance because of mismatch between views. A point in three-dimensional space maps to some position in each picture of the multi-view set, and disparity estimation works properly only if those mapped positions have the same luminance and chrominance values. Because of mismatched parameters between cameras, however, the values may differ. When pixels that represent the same position have different luminance and chrominance values, the reliability of disparity estimation falls.

Most of the performance of a multi-view video encoder comes from removing inter-view correlation, which is done by disparity estimation. Poor disparity estimation therefore degrades the performance of the multi-view video encoder.

The inter-view mismatch problem has usually been handled by preprocessing. Preprocessing, however, cannot completely resolve the mismatch between views. In addition, the preprocessed picture may be degraded relative to the original, and preprocessing by its nature lowers sharpness, so it is not a desirable approach. The present invention therefore proposes a disparity estimation scheme that takes the mismatch between multi-view videos into account.

FIG. 11 is a flowchart of an embodiment illustrating the inter-view balanced disparity estimation (IBDE) method of the multi-view video encoding/decoding system according to the present invention.

As shown in the figure, the balanced disparity estimation method of the present invention first obtains the two global correction parameters. Then, taking the global correction parameters as a reference, it sets the initial representative values and obtains the optimal representative values (a[n] and b[n]).

The initial correction parameters are placed at regular intervals around the global correction parameters, taking the characteristics of the picture into account. After disparity estimation is performed with these values, the best-performing indices n and m are selected, and a probability model is computed for each index. From this probability model, the optimal representative correction parameters are selected using the Lloyd-Max technique. The number of representative correction parameters is smaller than the number of initial correction parameters.

The representative correction parameters selected in this way are applied to the cost function in the compression stage. The correction parameters for the chrominance pictures are obtained by using the disparity vectors found with the representative correction parameters to locate the matching region, and then computing the correction parameters for that region.

Hereinafter, as shown in FIG. 11, the inter-view balanced disparity estimation method of the present invention is described in two parts, the 'initial stage' and the 'compression stage'.

First, the 'initial stage' is described.

Imbalance correction algorithms for removing the luminance imbalance between the pictures of a digital stereo pair have been proposed in a number of earlier papers. The present invention basically uses this theory to compute the global correction parameters. The underlying assumption of this approach is that the imbalance is caused by a mismatch of parameters between cameras.

A simple linear transform is applied to the luminance component of the reference view so that it matches the mean and variance of the luminance component of the current view, producing a transformed reference view as a preprocessing step. This is expressed by Equation 1.

In Equation 1, the input and the output are the luminance component of the original reference picture and the luminance component of the corrected reference picture, respectively, and a and b are the correction parameters.

The mean and variance of the transformed reference view are given by Equation 2.

In Equation 2, the first pair of symbols denotes the means of the reference picture and the corrected reference picture, and the second pair denotes their variances. The two parameters can be obtained from Equation 3.

As mentioned above, two conditions must be satisfied: the mean and the standard deviation of the current picture must equal the mean and the standard deviation of the transformed right picture. The values of a and b can therefore be obtained from Equation 4.
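
Equations 1 to 4 are not reproduced in this text, so the following is only a sketch under stated assumptions. It assumes the linear model corrected = a * reference + b suggested by the description, and chooses a and b so that the corrected reference has the same mean and standard deviation as the current picture. Under that assumption the two conditions give a = sigma_current / sigma_reference and b = mu_current - a * mu_reference. The function name and the flat-sequence input format are hypothetical.

```python
from statistics import fmean, pstdev

def global_correction(current_luma, reference_luma):
    """Return (a, b) so that a * reference + b matches the current picture's
    mean and standard deviation. Inputs are flat sequences of luminance samples."""
    mu_cur, mu_ref = fmean(current_luma), fmean(reference_luma)
    sigma_cur, sigma_ref = pstdev(current_luma), pstdev(reference_luma)
    # A flat reference has no spread to rescale, so fall back to a pure offset.
    a = sigma_cur / sigma_ref if sigma_ref > 0 else 1.0
    b = mu_cur - a * mu_ref
    return a, b
```

If the current picture is exactly 1.5 times the reference plus 7, the function recovers a close to 1.5 and b close to 7.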

The same method applies to the chrominance components. Here, however, it is used only for the global correction parameters of the luminance components of the two views. Once the global correction parameters are obtained, the initial correction parameters are determined from them. To limit the complexity of the algorithm, the numbers of entries n and m in the two arrays of initial correction parameters are chosen appropriately in view of the characteristics of the picture. The values in the two arrays are placed at regular intervals as in Equation 5, and the terms of that equation are chosen so that the median of the values in the first array equals the corresponding global correction parameter. The values in the second array are set in the same way.

The SAD (sum of absolute differences) of Equation 6 is used as the cost function for block-based disparity estimation.
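
For reference, the sum of absolute differences between two equally sized blocks can be computed as in this short sketch. Blocks are represented as nested lists of samples, which is an illustrative choice rather than anything specified in the source.

```python
def sad(current_block, reference_block):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(c - r)
        for cur_row, ref_row in zip(current_block, reference_block)
        for c, r in zip(cur_row, ref_row)
    )
```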

To determine the representative correction parameters in the initial stage, a probability model is needed for the initial correction parameters, and it can be obtained only by actually performing disparity estimation. The new SAD cost function with the initial correction parameters applied is given by Equation 7.

In Equation 7, one symbol denotes the disparity vector, n and m are the indices of the initial correction parameters, and two further symbols denote the horizontal and vertical sizes of the block. Using this cost function, the indices of the two initial correction parameters are determined for each macroblock, and a probability model is built. The indices are also stored for later use when the chrominance correction parameters are obtained.
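
The per-macroblock search can be sketched as below. Equation 7 is not reproduced in this text, so this is an illustration under assumptions: the reference sample is corrected as a * sample + b before the absolute difference is taken, disparity is searched horizontally only at integer positions, and the probability model is taken to be the relative frequency of each chosen index. All function and argument names are hypothetical.

```python
from collections import Counter

def balanced_search(cur, ref, y, x, size, search, a_vals, b_vals):
    """Return (cost, disparity, n, m) minimising the corrected SAD for one block.

    cur, ref: 2-D lists of luminance samples; (y, x): top-left corner of the
    block in cur; search: maximum horizontal disparity tried in each direction.
    """
    width = len(ref[0])
    best = None
    for d in range(-search, search + 1):
        x0 = x + d
        if x0 < 0 or x0 + size > width:
            continue  # candidate block would fall outside the reference picture
        for n, a in enumerate(a_vals):
            for m, b in enumerate(b_vals):
                cost = sum(
                    abs(cur[y + j][x + i] - (a * ref[y + j][x0 + i] + b))
                    for j in range(size)
                    for i in range(size)
                )
                if best is None or cost < best[0]:
                    best = (cost, d, n, m)
    return best

def index_probabilities(chosen, count):
    """Relative frequency of each index among the per-block choices."""
    tally = Counter(chosen)
    return [tally[k] / len(chosen) for k in range(count)]
```

Collecting the n and m returned for every macroblock and passing each list to `index_probabilities` gives one empirical distribution per parameter array, which is the kind of input a quantizer design step needs.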

Once the probability model has been built for the initial correction parameters in this way, the representative correction parameters are determined using the Lloyd-Max method. The numbers of representative correction parameters are chosen to be smaller than the numbers of initial correction parameters. After inter-view balanced disparity estimation is performed, the index of the representative correction parameter must be sent so that the decoder can reconstruct the picture. Choosing too many representative correction parameters therefore increases the amount of coded data and can degrade performance.
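
The source names the Lloyd-Max method without giving its steps. The sketch below is the textbook iteration for a one-dimensional discrete source under squared-error distortion: it alternates between assigning each candidate to its nearest representative and moving each representative to the probability-weighted mean of its cell. Applying it separately to each parameter array, the even initial spread, and the stopping tolerance are assumptions made for illustration.

```python
def lloyd_max(values, probs, levels, iterations=50):
    """Lloyd-Max quantiser for a discrete source under squared-error distortion.

    values: candidate parameter values; probs: their probabilities;
    levels: number of representative values to keep.
    """
    lo, hi = min(values), max(values)
    # Start from representatives spread evenly over the candidate range.
    reps = [lo + (hi - lo) * (k + 0.5) / levels for k in range(levels)]
    for _ in range(iterations):
        # Nearest-representative rule: assign every candidate to a cell.
        cells = [[] for _ in range(levels)]
        for v, p in zip(values, probs):
            k = min(range(levels), key=lambda idx: abs(v - reps[idx]))
            cells[k].append((v, p))
        # Centroid rule: move each representative to its cell's weighted mean.
        new_reps = []
        for k, cell in enumerate(cells):
            weight = sum(p for _, p in cell)
            new_reps.append(sum(v * p for v, p in cell) / weight if weight > 0 else reps[k])
        if all(abs(n - r) < 1e-12 for n, r in zip(new_reps, reps)):
            break
        reps = new_reps
    return reps
```

Because the representatives concentrate where the probability mass lies, a small number of them can stand in for a larger set of initial candidates, which keeps the index that must be signalled short.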

Next, the 'compression stage' of FIG. 11 is described.

In this stage, disparity estimation is performed directly with the representative correction parameters obtained in the previous stage. As Equation 7 shows, the SAD cost function of inter-view balanced disparity estimation must loop not only over the two variables that span the search range but also over the correction parameters, which increases the amount of computation.

Moreover, inside the SAD function itself the reference picture is corrected by multiplying by one correction parameter and adding the other, which increases the amount of computation further. To solve this problem, once the representative correction parameters have been determined, a new histogram for the picture is generated as in Equation 8. In doing so, every possible case is considered when the new histogram is generated.

The index in Equation 8 denotes a luminance value from 0 to 255. For every luminance value, the new value corresponding to each pair of representative correction parameters is computed and stored in the new histogram array. Computing this array does not require much computation. Defining the cost function with the new histogram gives Equation 9.
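
What the text calls a new histogram behaves like a lookup table, and the idea can be sketched as follows. Equations 8 and 9 are not reproduced in this text, so the layout `tables[n][m][v]`, the function names, and the decision to leave corrected values unrounded and unclipped are all assumptions made for illustration.

```python
def build_tables(a_reps, b_reps):
    """tables[n][m][v] holds a_reps[n] * v + b_reps[m] for every luminance v in 0..255."""
    return [[[a * v + b for v in range(256)] for b in b_reps] for a in a_reps]

def sad_with_table(current_block, reference_block, table):
    """SAD against the corrected reference, reading corrections from one table.

    reference_block must contain integer samples in 0..255 so that they can
    index the table directly.
    """
    return sum(
        abs(c - table[r])
        for cur_row, ref_row in zip(current_block, reference_block)
        for c, r in zip(cur_row, ref_row)
    )
```

The tables are built once per picture with 256 multiply-add operations per parameter pair, after which each sample in the search costs one table read instead of a multiplication and an addition.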

Equation 9 achieves the same performance as Equation 8 while reducing the amount of computation.

To find the chrominance correction parameters, the representative correction index information is stored during disparity estimation on the luminance picture. Because disparity estimation has been performed, the disparity vector information is also available. These disparity vectors make it possible to determine which part of the reference picture the current macroblock matches.

After the chrominance pictures are matched using disparity compensation, Equation 4 is applied in the same way as for the luminance component, so that the mean and variance of the chrominance signal of the current macroblock agree with those of the matched region of the reference picture. This yields the correction values for the chrominance components.

The chrominance correction parameters could be obtained in the same way as the luminance correction parameters, but in that case index information for the chrominance correction parameters would also have to be sent to the decoder, which increases the amount of coded data. They are therefore inferred from the luminance correction parameters and the disparity vectors.

In this case, knowing only the index information of the luminance correction parameters is enough to determine the correction parameters for both luminance and chrominance. Less information therefore has to be coded, and the performance is confirmed to be similar to that obtained when the indices are derived independently.
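
The chrominance step described above can be sketched as follows, under assumptions the source does not state: the disparity is horizontal and already expressed in chrominance sample units, the matched block lies inside the reference picture, and the mean and standard-deviation matching is the same one assumed for the luminance case (a = sigma_current / sigma_reference, b = mu_current - a * mu_reference). The function name and argument layout are hypothetical.

```python
from statistics import fmean, pstdev

def chroma_correction(cur_chroma, ref_chroma, y, x, size, disparity):
    """Derive (a, b) for one chrominance block from a known disparity.

    cur_chroma, ref_chroma: 2-D lists of chrominance samples; (y, x): top-left
    corner of the current block; disparity: horizontal offset of the match.
    """
    cur = [cur_chroma[y + j][x + i] for j in range(size) for i in range(size)]
    ref = [ref_chroma[y + j][x + disparity + i] for j in range(size) for i in range(size)]
    sigma_ref = pstdev(ref)
    # A flat matched block has no spread to rescale, so use a pure offset.
    a = pstdev(cur) / sigma_ref if sigma_ref > 0 else 1.0
    b = fmean(cur) - a * fmean(ref)
    return a, b
```

No search over candidate parameters is needed here, because the disparity found on the luminance picture already identifies the matching region.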

The method of the present invention described above may be implemented as a program and stored on a computer-readable recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, and the like).

The present invention described above is not limited to the foregoing embodiments and the accompanying drawings. It will be apparent to those of ordinary skill in the art to which the present invention pertains that various substitutions, modifications, and changes are possible without departing from the technical spirit of the present invention.

As described above, the present invention uses the GOMV and GGOP structures to perform multi-view video coding that takes scalability into account while also raising coding efficiency.

In addition, the present invention uses inter-view balanced disparity estimation, which addresses the imbalance between multi-view videos, to remove the correlation between multi-view videos effectively and thereby improve the performance of the multi-view video encoder.

Claims

(Deleted)

A multi-view video encoding/decoding method for encoding a group of multi-view (GOMV), which is a set of a plurality (N) of multi-view pictures at the same time instant, the method comprising:

encoding the frames of a base GOP, which is a GOP containing an I picture, in a GGOP in which the GOP is extended along the view axis;

encoding, using the base GOP, the frames of at least one GOP_B, which is a GOP containing at least one P picture; and

encoding the frames of a GOP_C, which is a GOP containing only B pictures and lying between the base GOP and the at least one GOP_B, by performing bidirectional disparity estimation using the base GOP and the GOP_B.

The multi-view video encoding/decoding method of claim 2, wherein the encoding of the GOP_B performs compression spatially with reference to the base GOP and a previously encoded GOP_B.

The multi-view video encoding/decoding method of claim 2, wherein the encoding of the GOP_C spatially references the base GOP and the GOP_B.

The multi-view video encoding/decoding method of claim 2, wherein the base GOP is transmitted to a decoder through a base layer, and the decoder reconstructs a two-dimensional video.

The multi-view video encoding/decoding method of claim 2, wherein, among the views belonging to the GOP_B, the view closest to the base layer is transmitted to a decoder through a first enhancement layer, and the decoder reconstructs stereo video using the base layer and the first enhancement layer.

The multi-view video encoding/decoding method of claim 2, wherein the GOP_B views other than the view transmitted through the first enhancement layer are transmitted to a decoder through a second enhancement layer, and the decoder reconstructs N/2 views using the base layer and the first and second enhancement layers.

The multi-view video encoding/decoding method of claim 2, wherein the GOP_C is transmitted to a decoder through a third enhancement layer, and the decoder reconstructs N multi-view pictures using the base layer and the first, second, and third enhancement layers.

A multi-view video inter-view balanced disparity estimation method, comprising:

an initial step of obtaining global correction parameters, setting initial representative values based on them, performing disparity estimation with these values, selecting the best-performing indices, computing a probability model for each index, and determining optimal representative correction parameters;

a compression step of determining correction parameters for the chrominance picture using the disparity vectors obtained with the representative correction parameters; and

an application step of applying each correction value to the pictures of the different views.

The multi-view video inter-view balanced disparity estimation method of claim 9, wherein the initial step comprises:

determining the global correction parameters;

determining the initial correction parameters based on the global correction parameters;

performing disparity estimation using a cost function; and

determining the representative correction parameters.

The multi-view video inter-view balanced disparity estimation method of claim 10, wherein the global correction parameters are determined by the following equation.

The multi-view video inter-view balanced disparity estimation method of claim 10, wherein the cost function has the following equation.

The multi-view video inter-view balanced disparity estimation method of claim 10, wherein the representative correction parameters are determined by the Lloyd-Max method.

The multi-view video inter-view balanced disparity estimation method of claim 9, wherein the compression step comprises:

estimating disparity using the representative correction parameters; and

determining the chrominance correction parameters by matching the disparity vectors of the luminance and chrominance pictures.

The multi-view video inter-view balanced disparity estimation method of claim 14, wherein the cost function for disparity estimation has the following equation.

The multi-view video inter-view balanced disparity estimation method of claim 14, wherein determining the chrominance correction parameters comprises determining which part of the reference picture the current macroblock matches, using the representative correction index information stored during disparity estimation on the luminance picture and the disparity vector information.