JP6799468B2

JP6799468B2 - Image processing device, image processing method, and computer program

Info

Publication number: JP6799468B2
Application number: JP2017006084A
Authority: JP
Inventors: 敬介野中
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-01-17
Filing date: 2017-01-17
Publication date: 2020-12-16
Anticipated expiration: 2037-01-17
Also published as: JP2018116421A

Description

The present invention relates to an image processing device, an image processing method, and a computer program.

Techniques have been proposed for generating video from a free viewpoint other than the camera viewpoints (hereinafter, free-viewpoint video), for sports scenes and the like. Based on video captured by multiple cameras, such a technique synthesizes video from a virtual viewpoint at which no camera is placed and displays the result on a screen, which allows the scene to be viewed from various viewpoints.

Among the techniques for synthesizing free-viewpoint video, there is one that synthesizes it at high speed using a simple model called a billboard (see Non-Patent Document 1). In this billboard-based technique, the texture of the object to be modeled is accurately cut out of the video and stood on the ground of a virtual space as a billboard model with no thickness, thereby producing free-viewpoint video.

In the billboard method, a billboard is generally placed so that its lowest point (for example, a person's toes) touches the ground of the virtual space. When the virtual viewpoint moves horizontally, the billboard is rotated to follow that movement; when the virtual viewpoint moves vertically, the orientation of the billboard is left unchanged.

Non-Patent Document 1: Hayashi, K.; Saito, H., "Synthesizing Free-Viewpoint Images from Multiple View Videos in Soccer Stadium," in Computer Graphics, Imaging and Visualisation, 2006 International Conference on, pp. 220-225, 26-28 July 2006

The method described in Non-Patent Document 1 is excellent in that it can synthesize high-quality free-viewpoint video at high speed and the synthesized content data is smaller than with other methods. However, because of the constraint that the billboard must stand vertically on the ground, situations can arise in which the billboard no longer corresponds to the subject in real space. In such cases, the resulting free-viewpoint video may look unnatural.

The present invention has been made in view of this problem, and its object is to provide a technique that enables higher-quality free-viewpoint video to be generated using billboards.

One aspect of the present invention relates to an image processing device. The image processing device comprises: means for acquiring a plurality of images obtained by imaging a subject in real space from a plurality of viewpoints; means for calculating, from the acquired images, coordinates representing the position of the subject in real space; and means for generating a composite image corresponding to a viewpoint not included in the plurality of viewpoints by placing a billboard of the subject, generated from at least one of the acquired images, with reference to the calculated coordinates.
The calculating means includes means for generating a three-dimensional model of the subject from the acquired images, and means for calculating the coordinates of a representative point of the generated three-dimensional model as the coordinates representing the position of the subject in real space. When statistically processing the generated three-dimensional model in space, the means for calculating the coordinates of the representative point calculates the coordinates of the centroid of all voxels included in the three-dimensional model of the subject as the coordinates of the model representative point.

Any combination of the above components, and any mutual substitution of the components and expressions of the present invention among a device, a method, a system, a computer program, a recording medium storing a computer program, and the like, are also effective as aspects of the present invention.

According to the present invention, higher-quality free-viewpoint video can be generated using billboards.

FIG. 1 is a schematic diagram showing the placement of a billboard in the conventional billboard method.
FIG. 2 is a schematic diagram showing a free-viewpoint image distribution system including the image processing device according to the embodiment.
FIG. 3 is a block diagram showing the functions and configuration of the image processing device of FIG. 2.
FIG. 4 is an explanatory diagram showing the correspondence between coordinates on the image plane of a camera and field coordinates.
FIGS. 5(a) to 5(c) are explanatory diagrams showing an example of processing in the background subtraction unit of FIG. 3.
FIG. 6 is a schematic diagram showing the three-dimensional model generated by the three-dimensional processing unit of FIG. 3 and its model representative point.
FIG. 7 is a schematic diagram showing the billboard reference plane.
FIGS. 8(a) to 8(c) are schematic diagrams for explaining the billboard generation processing in the billboard generation unit of FIG. 3.
FIG. 9 is a flowchart showing the flow of a series of processes in the image processing device of FIG. 2.
FIG. 10 is an explanatory diagram of a method, according to a modification, for determining the coordinates at which a billboard should be placed from a plurality of images captured by a plurality of cameras.

In the following, identical or equivalent components, members, and processes shown in the drawings are given the same reference numerals, and duplicate descriptions are omitted as appropriate. Some members that are not important to the description are omitted from the drawings.

In the conventional billboard method, a billboard is placed so that its lowest point (for example, a person's toes) touches the ground of the virtual space (hereinafter, the field). The present inventor independently recognized the following problem arising from this constraint.

FIG. 1 is a schematic diagram showing the placement of a billboard 10 in the conventional billboard method. A camera 14 shoots (also referred to as images) a person 12, the subject, who has jumped high off the field. The billboard 10 of the person 12 is generated from the image obtained from the camera 14. Because of the constraint that the lowest point of the billboard 10 must touch the field, however, the generated billboard 10 is placed not at the actual position 16 of the person 12 but at the intersection 18 of the line of sight 20 of the camera 14 with the field. As seen from a virtual viewpoint, the billboard 10 is then located at coordinates quite different from the true position 16 of the person 12, so the quality of the synthesized video can degrade. This problem can arise frequently in scenes such as volleyball or basketball, where a person jumps high near the camera (hereinafter, the act of a subject leaving the field is called aerial movement).
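The size of this displacement can be illustrated with a short sketch. It assumes a coordinate system not stated in the source (field plane at z = 0, z pointing up), and the camera and subject positions are hypothetical values chosen only for illustration.

```python
def ground_intersection(camera, point):
    """Intersect the ray from `camera` through `point` with the field plane z = 0.

    Assumption: the field is the plane z = 0 and z points upward.
    """
    cx, cy, cz = camera
    px, py, pz = point
    if pz >= cz:
        raise ValueError("the line of sight does not descend toward the field")
    # Points on the ray are camera + s * (point - camera); solve for z == 0.
    s = cz / (cz - pz)
    return (cx + s * (px - cx), cy + s * (py - cy), 0.0)


camera = (0.0, 0.0, 5.0)  # hypothetical camera position in metres
feet = (4.0, 0.0, 1.0)    # lowest point of a person who is 1 m off the field

# Where the conventional method would stand the billboard.
placed = ground_intersection(camera, feet)

# Horizontal distance between the true position and the placed billboard.
error = ((placed[0] - feet[0]) ** 2 + (placed[1] - feet[1]) ** 2) ** 0.5
print(placed, error)
```

With these illustrative numbers, a jump of 1 m moves the billboard 1 m farther from the camera along the field, which is the kind of mismatch described for FIG. 1.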

In the conventional billboard method, when video from a virtual viewpoint is generated, a billboard of the subject cut out of the corresponding video from a single fixed camera is placed in three-dimensional space. Without a precondition such as the subject touching the field, however, it is difficult to determine the subject's true position from the video of a single fixed camera. Moreover, in shooting at relatively close range, such as volleyball or basketball, a person's aerial movement occupies a large proportion of the screen, which readily leads to degraded video quality.

By contrast, the image processing device according to the embodiment estimates the position of the subject in real space from a plurality of images obtained from a plurality of imaging devices (for example, cameras). The estimated position is not limited to the field and may be in the air. By placing the billboard with reference to the estimation result, the image processing device enables a more natural display when billboard-based free-viewpoint video technology is applied to video involving aerial movement. The position of the subject is estimated, for example, by generating a three-dimensional model of the subject that has thickness.

FIG. 2 is a schematic diagram showing a free-viewpoint image distribution system 110 that includes an image processing device 200 according to the embodiment. The free-viewpoint image distribution system 110 comprises a plurality of cameras 116, 118, 120, the image processing device 200 connected to those cameras, and a mobile terminal 114 such as a mobile phone, tablet, smartphone, or HMD (head-mounted display). The image processing device 200 and the mobile terminal 114 are connected via a network 112 such as the Internet. In the free-viewpoint image distribution system 110, the cameras 116, 118, 120, placed for example in an arena, shoot a volleyball player 124 who stands on a field 126 or jumps high. The cameras 116, 118, 120 send the captured video to the image processing device 200, which processes it. The user of the mobile terminal 114 specifies a desired viewpoint to the image processing device 200. The image processing device 200 synthesizes an image of the player 124 as seen from the specified viewpoint (virtual viewpoint) and distributes it to the mobile terminal 114 via the network 112.

Although FIG. 2 describes shooting a volleyball player 124 in an arena, the technical idea of the present embodiment is not limited to this and can be applied whenever a subject capable of aerial movement is shot, for example a fitness instructor, a tennis match, or a soccer match. Nor is it limited to sports scenes: it can be applied broadly to any application in which the same subject can appear in multiple videos obtained from multiple cameras. A stationary terminal such as a desktop PC, laptop PC, or TV receiver may be used in place of the mobile terminal 114.

FIG. 3 is a block diagram showing the functions and configuration of the image processing device 200 according to the embodiment. In hardware terms, each block shown here can be implemented by elements such as a computer's CPU (central processing unit) or by mechanical devices; in software terms, it is implemented by a computer program or the like. What is drawn here are functional blocks realized by their cooperation. Those skilled in the art who have read this specification will therefore understand that these functional blocks can be realized in various forms by combinations of hardware and software.

The image processing device 200 synthesizes an image from an arbitrary virtual viewpoint out of images captured by the cameras 116, 118, 120. Based on multiple camera videos of the same subject, it generates free-viewpoint video in accordance with the billboard method. Whereas the conventional billboard method uses video from only one camera when generating a billboard, the image processing device 200 according to the present embodiment uses coordinate data of a three-dimensional model calculated from multiple camera videos.

In the following, time synchronization among the cameras 116, 118, 120 is assumed to have been performed in advance. A person is assumed as the subject, but the technical idea of the present embodiment is also applicable to other subjects. The virtual viewpoint is, for example, a viewpoint that the user can specify freely; it differs in meaning from the actual viewpoints at which the cameras 116, 118, 120 are placed.

[Overview of image processing device 200]
The image processing device 200 comprises a calibration unit 202, a background subtraction unit 204, a three-dimensional processing unit 206, a reference plane calculation unit 208, a billboard generation unit 210, a projected point calculation unit 212, and a free-viewpoint video generation unit 214. The calibration unit 202 receives as input the videos captured by the cameras 116, 118, 120. For each camera, it establishes the correspondence between the field in real space and the image captured by that camera and outputs it as calibration data. If the cameras are assumed to be fixed, this calibration needs to be performed only once, at the start. The background subtraction unit 204 classifies each image into background and foreground using a known background subtraction method and outputs the binarized image as mask data.

The three-dimensional processing unit 206 generates a three-dimensional model of the subject from the mask data obtained from the videos captured by the cameras 116, 118, 120. It estimates the position of the subject in real space from the generated three-dimensional model and outputs the resulting coordinates. The three-dimensional processing unit 206 includes a three-dimensional model construction unit 216, a model representative point calculation unit 218, and a smoothing unit 220.

The three-dimensional model construction unit 216 uses the foreground masks generated by the background subtraction unit 204 and the calibration data generated by the calibration unit 202 to generate a three-dimensional model that, unlike a billboard, has thickness. The model representative point calculation unit 218 calculates the coordinates of a model representative point, a point that represents the three-dimensional model (for example, one that is robust to changes in a person's posture). The smoothing unit 220 smooths the coordinates of the model representative point calculated by the model representative point calculation unit 218 along the time axis.

The reference plane calculation unit 208 sets a billboard reference plane, the plane within which the billboard is placed, so that it contains the model representative point. The projected point calculation unit 212 calculates the coordinates in the image plane that are referred to when the billboard is generated. The billboard generation unit 210 generates and outputs the billboard of the subject. The free-viewpoint video generation unit 214 generates and outputs free-viewpoint video using the billboard output from the billboard generation unit 210, the billboard reference plane, and the model representative point.

[Calibration unit 202]
The calibration unit 202 acquires videos from the cameras 116, 118, 120. For each camera, it associates characteristic points of the field in an image captured at a given time (such as intersections of the court's white lines) with the corresponding points on the field in real space, and calculates the result as camera parameters. For example, when a typical sports match is shot, the court dimensions are standardized, so the calibration unit 202 can calculate which coordinates in real space (the world coordinate system) a point on the image plane corresponds to. The camera parameters include external parameters and internal parameters. External parameters describe the relationship between real space and the camera and include, for example, parameters representing the camera's position (viewpoint position) and orientation (rotation and so on). Internal parameters are specific to the camera and include, for example, lens distortion.

FIG. 4 is an explanatory diagram showing the correspondence between coordinates on the image plane of a camera 402 and field coordinates. Let (u, v) be coordinates on the two-dimensional image plane of the camera 402 and (x', y') be coordinates on the field plane in the world coordinate system (field coordinates). Their correspondence can be expressed using a homography matrix
$$H = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}$$
and a scalar value s as follows.
$$s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = H \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}$$ … (Equation 1)

Substituting the above sets of corresponding points into Equation 1 yields s and H, which makes it possible to convert in both directions between the coordinates of any pixel on the image plane and field coordinates. The method of calibrating the camera against the field is not limited to the one above.
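As a rough illustration of how corresponding points determine a homography, the following sketch estimates H from four point pairs by solving a linear system and then converts coordinates in both directions. Several things here are assumptions rather than details from the source: H is taken to map field coordinates to image coordinates, its last element is fixed to 1 to remove the scale ambiguity, exactly four correspondences are used, and the court size and pixel positions are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_homography(src_pts, dst_pts):
    """Estimate the 3x3 matrix mapping src -> dst from four point pairs (h33 = 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, pt):
    """Map a 2D point through h, dividing out the scalar s."""
    x, y = pt
    s = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / s,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / s)


# Hypothetical correspondences: corners of a 9 m x 18 m court and their pixels.
field = [(0.0, 0.0), (9.0, 0.0), (9.0, 18.0), (0.0, 18.0)]
image = [(100.0, 400.0), (540.0, 400.0), (420.0, 100.0), (220.0, 100.0)]

H = estimate_homography(field, image)      # field -> image
H_inv = estimate_homography(image, field)  # image -> field

pixel = apply_homography(H, (4.5, 9.0))    # court centre in the image
back = apply_homography(H_inv, pixel)      # and back to field coordinates
print(pixel, back)
```

In practice more than four correspondences and a least-squares fit would usually be preferred for robustness to clicking or detection error; the four-point form is used here only to keep the sketch short.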

Returning to FIG. 3, the camera calibration in the calibration unit 202 may be performed manually or with a known automatic calibration technique. One manual method is to have the user select intersections of white lines on the screen and associate them with a field model measured in advance, thereby estimating the camera parameters. If the image is distorted, the internal parameters are estimated first, as described below. One way to perform the same operation automatically is to extract the white lines in the image using thresholding or the like, extract their straight-line components with the Hough transform, and thereby estimate the image coordinates of the intersections.
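The last step of the automatic method, turning two detected lines into an intersection, can be sketched as follows. The sketch assumes the lines are already available in the normal form commonly produced by a Hough transform, x cos θ + y sin θ = ρ; the line extraction itself is omitted, and the example line parameters are hypothetical.

```python
import math


def hough_intersection(line_a, line_b, eps=1e-9):
    """Intersect two lines given as (rho, theta) with x*cos(theta) + y*sin(theta) = rho.

    Returns (x, y), or None when the lines are (nearly) parallel.
    """
    (rho1, th1), (rho2, th2) = line_a, line_b
    # Determinant of the 2x2 system; equals sin(th2 - th1).
    det = math.cos(th1) * math.sin(th2) - math.sin(th1) * math.cos(th2)
    if abs(det) < eps:
        return None
    x = (rho1 * math.sin(th2) - rho2 * math.sin(th1)) / det
    y = (rho2 * math.cos(th1) - rho1 * math.cos(th2)) / det
    return (x, y)


# Hypothetical white lines: a vertical line x = 120 and a horizontal line y = 80.
vertical = (120.0, 0.0)
horizontal = (80.0, math.pi / 2)

corner = hough_intersection(vertical, horizontal)
print(corner)
```

Intersections found this way could then serve as the image-side points that are matched against the field model.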

When a camera with a wide-angle lens such as a fisheye lens is used for shooting, the calibration unit 202 estimates that camera's internal parameters separately and corrects the image distortion. This estimation may be carried out by shooting a geometric pattern such as a checkerboard in advance with the camera to be used. If shooting with fixed cameras is assumed, the calibration unit 202 needs to calibrate the cameras only once, at the start of video generation. If shooting with a moving camera is assumed, the calibration unit 202 performs the known automatic calibration described above for every frame. The calibration unit 202 generates and outputs calibration data containing the calculated camera parameters.

The following describes the processing applied to the images captured at a given time t by each of the cameras 116, 118, 120.
[Background subtraction unit 204]
The background subtraction unit 204 separates the image from each camera at time t into background and foreground by classifying each of its pixels as one or the other. In the present embodiment, this separation may be achieved using, for example, a known background subtraction method. The background subtraction unit 204 generates a foreground mask by assigning binary values, such as 0 and 1, to the pixels classified as background and foreground, respectively. Separating background from foreground in this way extracts a rough region containing the subject.

FIGS. 5(a) to 5(c) are explanatory diagrams showing an example of processing in the background subtraction unit 204. FIG. 5(a) shows an image 502 captured by a camera at time t, which the background subtraction unit 204 processes as the original image. FIG. 5(b) shows a foreground mask 504 obtained by applying the background subtraction method to the original image of FIG. 5(a). In the foreground mask 504, black portions have been judged to be background and assigned 0, and white portions have been judged to be foreground and assigned 1. FIG. 5(c) shows the texture of a person 506 obtained from the original image of FIG. 5(a) and the foreground mask 504 of FIG. 5(b).
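The relationship among the three panels of FIG. 5 can be sketched with the simplest form of background subtraction: thresholding the per-pixel difference from a static background image. The source only says a known background subtraction method is used, so this particular variant, the grayscale pixel values, and the threshold of 30 are all assumptions for illustration.

```python
def foreground_mask(frame, background, threshold=30):
    """Return a 0/1 mask: 1 where the frame differs enough from the background.

    Assumption: images are 2D lists of grayscale values and `threshold`
    is a hypothetical tuning parameter.
    """
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]


def cut_out_texture(frame, mask, fill=0):
    """Keep the original pixel where the mask is 1 and `fill` elsewhere."""
    return [[p if m else fill for p, m in zip(f_row, m_row)]
            for f_row, m_row in zip(frame, mask)]


background = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
frame = [[10, 12, 10, 10],      # small differences are sensor noise
         [10, 200, 210, 10],    # bright pixels belong to the subject
         [10, 190, 10, 11]]

mask = foreground_mask(frame, background)   # corresponds to FIG. 5(b)
texture = cut_out_texture(frame, mask)      # corresponds to FIG. 5(c)
print(mask)
print(texture)
```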

[Three-dimensional model construction unit 216]
Returning to FIG. 3, the three-dimensional model construction unit 216 acquires the calibration data generated by the calibration unit 202 and the foreground masks generated by the background subtraction unit 204. Using this information, it constructs a three-dimensional model that roughly represents the shape of the subject in a virtual three-dimensional space modeled on real space. If the subject is a person, the three-dimensional model may represent the rough shape of the person's body. The three-dimensional model is expressed as, for example, a polygon mesh model or a voxel model. Unlike a billboard model, the three-dimensional model has thickness.

A known visual hull (volume intersection) method may be used to extract the three-dimensional model from the foreground masks derived from the cameras. In this method, a space called a voxel space, made up of cubes obtained by dividing three-dimensional space uniformly in the vertical, horizontal, and depth directions, is defined. Projecting the silhouette of the foreground mask from each camera into the voxel space gives the rough three-dimensional shape of the actual subject. Let $M_i^t$ be the foreground mask derived from the i-th camera and $N_i^t$ be the silhouette mask obtained by projecting the three-dimensional model onto the image plane of the i-th camera. Then, in theory,
$$N_i^t \subseteq M_i^t$$
holds.

The foreground mask derived from each camera generally contains foreground falsely detected because of noise. Because the above method integrates the information of multiple foreground masks projected into the voxel space, such excess foreground detection can be reduced. In subsequent processing, therefore, the silhouette mask obtained by projecting the three-dimensional model onto a camera's image plane may be used as the foreground mask.
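The carving step and its noise-reducing effect can be sketched on a tiny voxel space. To stay self-contained, the sketch replaces calibrated cameras with three hypothetical orthographic views along the coordinate axes; the grid size, the masks, and the single falsely detected pixel are likewise made up for illustration.

```python
from itertools import product

GRID = 4  # hypothetical 4 x 4 x 4 voxel space


def make_mask(on_pixels):
    """Build a GRID x GRID binary mask with 1 at the given (row, col) pixels."""
    return [[1 if (r, c) in on_pixels else 0 for c in range(GRID)]
            for r in range(GRID)]


# Hypothetical orthographic "cameras": each maps a voxel (x, y, z) to (row, col).
def front(v):
    return (v[2], v[0])  # looks along the y axis


def side(v):
    return (v[2], v[1])  # looks along the x axis


def top(v):
    return (v[1], v[0])  # looks along the z axis


def carve(views):
    """Keep the voxels whose projection is foreground in every view."""
    model = set()
    for voxel in product(range(GRID), repeat=3):
        if all(mask[project(voxel)[0]][project(voxel)[1]] == 1
               for project, mask in views):
            model.add(voxel)
    return model


def silhouette(model, project):
    """Project the carved model back onto one image plane."""
    out = [[0] * GRID for _ in range(GRID)]
    for voxel in model:
        r, c = project(voxel)
        out[r][c] = 1
    return out


body = {(z, x) for z in range(3) for x in (1, 2)}
front_mask = make_mask(body | {(3, 0)})  # (3, 0) is a falsely detected pixel
side_mask = make_mask({(z, y) for z in range(3) for y in (1, 2)})
top_mask = make_mask({(y, x) for y in (1, 2) for x in (1, 2)})

model = carve([(front, front_mask), (side, side_mask), (top, top_mask)])
reprojected = silhouette(model, front)  # silhouette mask for the front view
print(len(model), reprojected)
```

The falsely detected pixel in the front mask has no support in the other two views, so no voxel survives there and the reprojected silhouette is a subset of the original front mask.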

[Model representative point calculation unit 218]
The model representative point calculation unit 218 calculates the coordinates of a model representative point, which represents the position of the subject's three-dimensional model (voxel data) generated by the three-dimensional model construction unit 216, as the coordinates representing the position of the subject in real space. Any point that represents the position of the three-dimensional model can serve as the model representative point. From the standpoint of video quality, including the continuity of motion along the time axis, however, it is desirable to adopt a point that does not depend heavily on changes in the posture of the subject (for example, a person or animal). Because the positions of hands and feet are likely to change abruptly with the posture of a person or animal, it is desirable to adopt a point in the skeleton, such as in the trunk or waist, as the model representative point.

Alternatively, the model representative point calculation unit 218 may determine the model representative point by statistically processing the generated three-dimensional model in the voxel space and calculate its coordinates. For example, the model representative point calculation unit 218 may calculate the coordinates of the center of gravity of all voxels included in the three-dimensional model of the subject as the coordinates of the model representative point. In this case, a model representative point that is robust against changes in the posture of the subject can be calculated. The coordinates of the model representative point may also be obtained by other statistical processing, as long as robustness against changes in the posture of the subject is retained. For example, the three-dimensional model is cut successively by planes parallel to the xy plane, and the plane whose cross section contains the largest number of voxels is identified as the densest xy plane. The densest yz plane and the densest zx plane, whose cross sections contain the largest number of voxels, are identified in the same way. The model representative point calculation unit 218 may determine the point at which the densest xy plane, the densest yz plane, and the densest zx plane intersect as the model representative point. As a more elaborate method, the model representative point calculation unit 218 may track each body part of the person to generate bones for the three-dimensional model and determine a point on the generated bones corresponding to the waist or the spine as the model representative point. Alternatively, the model representative point calculation unit 218 may estimate the position of the person's waist from the images by machine learning and determine the estimated point as the model representative point.
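The two statistical methods described above (the center of gravity of all voxels, and the intersection of the densest axis-aligned cutting planes) can be illustrated with the following minimal sketch. It assumes that the three-dimensional model is given as a collection of occupied voxel indices (x, y, z); the function names `voxel_centroid` and `densest_plane_point` are hypothetical and do not appear in the specification, and ties between equally dense planes are resolved arbitrarily.

```python
from collections import Counter
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]


def voxel_centroid(voxels: Iterable[Voxel]) -> Tuple[float, float, float]:
    """Center of gravity of all occupied voxels (each voxel weighted equally)."""
    pts = list(voxels)
    if not pts:
        raise ValueError("the three-dimensional model contains no voxels")
    n = len(pts)
    return (
        sum(p[0] for p in pts) / n,
        sum(p[1] for p in pts) / n,
        sum(p[2] for p in pts) / n,
    )


def densest_plane_point(voxels: Iterable[Voxel]) -> Voxel:
    """Intersection of the densest yz, zx and xy cutting planes.

    A plane parallel to the yz plane is identified by a constant x index,
    so counting voxels per x index gives the voxel count of each cut.
    The same holds for y (zx planes) and z (xy planes).
    """
    pts = list(voxels)
    if not pts:
        raise ValueError("the three-dimensional model contains no voxels")
    x = Counter(p[0] for p in pts).most_common(1)[0][0]  # densest yz plane
    y = Counter(p[1] for p in pts).most_common(1)[0][0]  # densest zx plane
    z = Counter(p[2] for p in pts).most_common(1)[0][0]  # densest xy plane
    return (x, y, z)
```

The centroid generally falls between voxel centers, whereas the plane-intersection point always lies on the voxel grid; neither is guaranteed to lie inside the model.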

[Smoothing unit 220]
The smoothing unit 220 statistically processes the coordinates of the model representative point of the three-dimensional model in the time axis direction. For example, the smoothing unit 220 smooths, in the time axis direction, the coordinates of the model representative point calculated by the model representative point calculation unit 218. Even when a calculation method that is robust against changes in the posture of a person is selected for the coordinates of the model representative point, noise in the foreground mask or the three-dimensional model is expected to cause unnatural movement (coordinate shifts) in the actual video. To reduce this, the smoothing unit 220 smooths the coordinates in the time axis direction.

For example, the smoothing unit 220 performs smoothing by applying a low-pass filter to the coordinates of the model representative point in the n frames before and after the current frame. The smoothing unit 220 may also generate a smooth trajectory using a spline curve or a B-spline curve whose control points are discretely sampled model representative points, and move the coordinates of the model representative point so that they follow the generated trajectory. In this case, natural movement can be realized. In addition, the smoothing unit 220 may apply a time-series filter such as a Kalman filter to the time-series data of the coordinates of the model representative point, which likewise yields natural movement. Hereinafter, the term model representative point refers to the model representative point smoothed by the smoothing unit 220.
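As one possible illustration of the low-pass filtering over the n frames before and after the current frame, the following sketch applies a centered moving average to the per-frame coordinates. The moving average is only one example of a low-pass filter and is chosen here as an assumption; the function name `smooth_track` and the handling of the first and last frames (the window is simply truncated) are likewise assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def smooth_track(points: Sequence[Point3], n: int) -> List[Point3]:
    """Centered moving average over frames t-n .. t+n.

    Near the start and end of the sequence the window is truncated to the
    frames that exist, so the output has the same length as the input.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    smoothed: List[Point3] = []
    for t in range(len(points)):
        window = points[max(0, t - n): t + n + 1]
        k = len(window)
        smoothed.append(
            (
                sum(p[0] for p in window) / k,
                sum(p[1] for p in window) / k,
                sum(p[2] for p in window) / k,
            )
        )
    return smoothed
```

Because the window uses future frames, this form suits offline processing or processing with a delay of n frames; a causal filter such as the Kalman filter mentioned above would be an alternative when no delay is acceptable.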

FIG. 6 is a schematic diagram showing a three-dimensional model 602 generated by the three-dimensional processing unit 206 and its model representative point 604. The three-dimensional processing unit 206 generates the three-dimensional model 602 of a subject (a person jumping high above the field 610) from images obtained from a plurality of cameras 606 and 608. The three-dimensional processing unit 206 calculates the coordinates of the center of gravity of the three-dimensional model 602 as the coordinates of the model representative point 604. As described later, the billboard of the subject is arranged with reference to the model representative point 604.

[Reference plane calculation unit 208]
Returning to FIG. 3, the reference plane calculation unit 208 determines a billboard reference plane based on the calculated coordinates of the model representative point and the external parameters (for example, the elevation angle α of the camera) generated by the calibration unit 202. FIG. 7 is a schematic diagram showing a billboard reference plane 702. The reference plane calculation unit 208 sets, as the billboard reference plane 702, a plane that contains the model representative point 704 and forms an angle θ with the field 706. The free viewpoint video generation unit 214, described later, arranges the billboard 710 so that it lies on the billboard reference plane 702 thus set. Accordingly, θ is the angle that the billboard 710 of the subject forms with the field 706.

The reference plane calculation unit 208 acquires the elevation angle α of the camera 708 from the external parameters generated by the calibration unit 202. The reference plane calculation unit 208 calculates θ from the elevation angle α of the camera 708 (in degrees) as θ = 90° − α. This yields a natural display. The external parameters represent the relationship between the world coordinate system and the camera coordinate system in terms of an absolute position T and an attitude R (rotation matrix), and the desired α can be calculated from the components of R. The elevation angle α of the camera 708 also depends on the position of the camera 708. For example, when the camera 708 is installed at a higher position, its elevation angle α also becomes larger.
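The specification does not state which components of R are used to obtain α, so the following is only a sketch under explicitly assumed conventions: the world z axis points upward (the field is the xy plane), R maps world coordinates to camera coordinates (x_cam = R·x_world + T), and the optical axis is the camera +z axis. Under these assumptions the optical axis expressed in world coordinates is the third row of R, and α is the angle by which that axis points below the horizontal. The function names are hypothetical.

```python
import math
from typing import Sequence


def camera_elevation_deg(R: Sequence[Sequence[float]]) -> float:
    """Angle (degrees) by which the optical axis points below the horizontal.

    Assumed conventions (not taken from the specification):
    world z is up, x_cam = R @ x_world + T, optical axis = camera +z.
    The optical axis in world coordinates is then the third row of R.
    """
    axis = R[2]
    norm = math.sqrt(sum(c * c for c in axis))
    if norm == 0.0:
        raise ValueError("R has a zero third row and is not a rotation matrix")
    # Clamp to guard against rounding slightly outside [-1, 1].
    s = max(-1.0, min(1.0, -axis[2] / norm))
    return math.degrees(math.asin(s))


def billboard_angle_deg(R: Sequence[Sequence[float]]) -> float:
    """Angle theta between billboard and field: theta = 90 - alpha."""
    return 90.0 - camera_elevation_deg(R)
```

With a camera looking horizontally, α is 0 and the billboard stands perpendicular to the field (θ = 90°); as the camera looks down more steeply, θ decreases accordingly.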

For example, in a scene shot from a relatively short distance, such as volleyball, the result looks more natural when the angle θ that the billboard 710 forms with the field 706 is changed in accordance with the elevation angle α of the camera 708. This is because the image of the person is in fact captured from diagonally above. The approach is particularly suitable in terms of perspective. That is, when a person is photographed from diagonally above at close range, the head, which is closer to the camera, appears relatively larger than the torso, which is farther from the camera. If this image is cut out as a billboard and stood perpendicular to the field, then viewing the billboard from a virtual viewpoint on the field (with the line of sight parallel to the field) makes it look as if the person were leaning toward the camera, which is unnatural. By tilting the billboard toward the opposite side by the elevation angle of the camera, the proportions of the head and the torso can be rendered with less visual discomfort.

[Projected point calculation unit 212]
Returning to FIG. 3, the projected point calculation unit 212 determines the projected point on the billboard side using the calculated coordinates of the model representative point and the camera parameters generated by the calibration unit 202. In the conventional billboard method, the lowest point of the billboard is used as the projected point and is projected onto the field. In the image processing apparatus 200 according to the present embodiment, the three-dimensional model is used as the reference, so the position of the projected point within the billboard differs from frame to frame. The projected point calculation unit 212 determines the projected point by projecting the model representative point (its coordinates) onto the image plane of the camera. The free viewpoint video generation unit 214 arranges the billboard so that its projected point lies on the billboard reference plane and coincides with the model representative point.
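The projection of the model representative point onto the image plane can be sketched with a standard pinhole camera model. The specification only states that camera parameters are used; the distortion-free pinhole model, the intrinsic parameters `fx`, `fy`, `cx`, `cy`, the convention x_cam = R·X + T, and the function name `project_point` are assumptions made for this illustration.

```python
from typing import Sequence, Tuple


def project_point(
    X: Sequence[float],
    R: Sequence[Sequence[float]],
    T: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Tuple[float, float]:
    """Project a world point X to pixel coordinates (u, v).

    Assumes a pinhole camera without lens distortion and the convention
    x_cam = R @ X + T, with the camera looking along its +z axis.
    """
    xc = [sum(R[i][j] * X[j] for j in range(3)) + T[i] for i in range(3)]
    if xc[2] <= 0.0:
        raise ValueError("the point is not in front of the camera")
    u = fx * xc[0] / xc[2] + cx
    v = fy * xc[1] / xc[2] + cy
    return (u, v)
```

The resulting (u, v) is the projected point within the camera image; because the model representative point moves with the subject, this point is recomputed for every frame.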

[Billboard generation unit 210]
The billboard generation unit 210 generates a billboard using the image from the camera, the foreground mask generated by the background subtraction unit 204, and the projected point calculated by the projected point calculation unit 212. FIGS. 8(a) to 8(c) are schematic diagrams for explaining the billboard generation processing in the billboard generation unit 210. Each of FIGS. 8(a) to 8(c) shows a texture 802 obtained by applying the foreground mask to the image from the camera. As shown in FIG. 8(a), when the projected point 806 lies inside the subject and is contained in the circumscribed rectangle 804 of the mask region of the subject, the billboard generation unit 210 cuts out the circumscribed rectangle 804 and uses it as the billboard. As shown in FIG. 8(b), when the projected point 810 lies outside the subject but is still contained in the circumscribed rectangle 804 of the mask region of the subject, the billboard generation unit 210 likewise cuts out the circumscribed rectangle 804 and uses it as the billboard. On the other hand, as shown in FIG. 8(c), when the projected point 814 is not contained in the circumscribed rectangle of the mask region, the billboard generation unit 210 cuts out the smallest rectangular region 812 that contains both the projected point 814 and the mask region and uses it as the billboard. Here, the billboard generation unit 210 assigns the pixel values of the actual image to the billboard pixels that belong to the mask region, and sets the remaining pixels to have no pixel value (treated as transparent).
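The three cases of FIGS. 8(a) to 8(c) reduce to a single rule: take the circumscribed rectangle of the mask region and enlarge it only as far as needed to contain the projected point. The following sketch assumes the mask is a list of rows of 0/1 values and the projected point is given in integer pixel coordinates (u, v); the function names and the use of `None` for transparent pixels are choices made for this illustration only.

```python
from typing import List, Optional, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def billboard_rect(mask: Sequence[Sequence[int]], point: Tuple[int, int]) -> Rect:
    """Smallest rectangle containing the mask region and the projected point."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        raise ValueError("the foreground mask is empty")
    u, v = point
    # If the point is already inside the circumscribed rectangle
    # (cases (a) and (b)), min/max leave the rectangle unchanged.
    return (min(min(xs), u), min(min(ys), v), max(max(xs), u), max(max(ys), v))


def cut_billboard(
    image: Sequence[Sequence[int]],
    mask: Sequence[Sequence[int]],
    rect: Rect,
) -> List[List[Optional[int]]]:
    """Copy image pixels inside the mask; everything else is transparent."""
    left, top, right, bottom = rect
    height, width = len(mask), len(mask[0])
    out: List[List[Optional[int]]] = []
    for y in range(top, bottom + 1):
        row: List[Optional[int]] = []
        for x in range(left, right + 1):
            inside = 0 <= y < height and 0 <= x < width and bool(mask[y][x])
            row.append(image[y][x] if inside else None)
        out.append(row)
    return out
```

The projected point may fall outside the image bounds (for example, after smoothing), which is why `cut_billboard` treats out-of-range positions as transparent rather than indexing the image directly.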

A situation in which the model representative point is set outside the three-dimensional model, as shown in FIGS. 8(b) and 8(c), can arise, for example, when smoothing in the time axis direction by the smoothing unit 220 moves the model representative point outside the three-dimensional model. In addition, when the model representative point calculation unit 218 sets a point on the field directly below the three-dimensional model as the model representative point in order to place the projected point of the billboard on the field, the model representative point is set below the three-dimensional model of a jumping subject.

[Free viewpoint video generation unit 214]
Returning to FIG. 3, the free viewpoint video generation unit 214 generates a composite image corresponding to the virtual viewpoint by placing the billboard generated by the billboard generation unit 210 at the model representative point, using the camera parameters generated by the calibration unit 202. The free viewpoint video generation unit 214 acquires information on the virtual viewpoint specified by the user, for example the coordinates of the virtual viewpoint. The composite image constitutes one frame of the free viewpoint video. The billboard arranged by the free viewpoint video generation unit 214 is not necessarily perpendicular to the field and maintains the angle θ. That is, when the virtual viewpoint moves vertically, the billboard keeps the angle θ with the field, and when the virtual viewpoint moves horizontally, the billboard rotates about an axis perpendicular to the field so as to face the virtual viewpoint. The free viewpoint video generation unit 214 generates the free viewpoint video by performing the above processing continuously for each frame.
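The orientation rule described above (rotation about the vertical axis to face the virtual viewpoint, with the tilt angle θ held fixed) can be sketched as follows. The sketch assumes that the world z axis is perpendicular to the field and that the top of the billboard leans away from the viewer when θ is less than 90°; the lean direction and the function name `billboard_axes` are assumptions for illustration and are not specified in this form in the text.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def billboard_axes(anchor: Vec3, viewpoint: Vec3, theta_deg: float) -> Tuple[Vec3, Vec3]:
    """Return the (right, up) unit vectors spanning the billboard plane.

    Only the horizontal offset of the viewpoint is used, so vertical
    movement of the viewpoint leaves the billboard unchanged.
    Assumes world z is perpendicular to the field.
    """
    hx = viewpoint[0] - anchor[0]
    hy = viewpoint[1] - anchor[1]
    norm = math.hypot(hx, hy)
    if norm == 0.0:
        raise ValueError("viewpoint is directly above or below the billboard")
    hx, hy = hx / norm, hy / norm  # horizontal unit vector toward the viewer
    t = math.radians(theta_deg)
    right = (-hy, hx, 0.0)  # horizontal, points to the viewer's right
    # The up vector forms the angle theta with the field and leans away
    # from the viewer (assumed lean direction).
    up = (-math.cos(t) * hx, -math.cos(t) * hy, math.sin(t))
    return right, up
```

A renderer would place the billboard quad so that its projected point sits at the model representative point and its edges run along `right` and `up`.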

The operation of the image processing apparatus 200 configured as described above will now be described.
FIG. 9 is a flowchart showing the flow of a series of processes in the image processing apparatus 200. The image processing apparatus 200 acquires images containing the subject from cameras arranged at each of a plurality of viewpoints set around the subject on the field (S902). The image processing apparatus 200 generates foreground masks by applying the background subtraction method to each of the acquired images (S906). The image processing apparatus 200 generates a three-dimensional model of the subject from the generated foreground masks (S908). The image processing apparatus 200 calculates the coordinates of the model representative point of the generated three-dimensional model (S910). The image processing apparatus 200 generates a billboard of the subject using an image acquired in step S902 and the corresponding foreground mask (S912). The image processing apparatus 200 synthesizes an image viewed from the virtual viewpoint by placing the generated billboard at the coordinates calculated in step S910 (S914).

Those skilled in the art who have read this specification will understand, based on the description herein, that each unit can be realized by a CPU (not shown), a module of an installed application program, a module of a system program, a semiconductor memory that temporarily stores the contents of data read from a hard disk, and the like.

The image processing apparatus 200 according to the present embodiment can eliminate the unnatural display caused by the constraints imposed by the conventional billboard method. Because the present embodiment determines the position at which the billboard should be placed by using a three-dimensional model (a model with thickness) representing the rough shape of the subject, a more natural display becomes possible while retaining the advantage of billboards, namely a light display load.

Furthermore, because the image processing apparatus 200 according to the present embodiment estimates the position of the subject while it moves through the air in a manner that is robust against the posture of the subject, a display with smooth movement can be realized in the resulting video. In addition, smoothing the estimated coordinates in the time axis direction enables an even smoother rendering.

The configuration and operation of the image processing apparatus 200 according to the embodiment have been described above. This embodiment is an example, and those skilled in the art will understand that various modifications are possible in the combination of the components and processes, and that such modifications also fall within the scope of the present invention.

In the embodiment, a case has been described in which a three-dimensional model of the subject is generated and the coordinates at which the billboard is placed are calculated using the generated three-dimensional model. However, the present invention is not limited to this, and the coordinates representing the position of the subject in real space may be calculated from the plurality of images acquired from the plurality of cameras.

FIG. 10 is an explanatory diagram of a method according to a modification in which the coordinates at which the billboard should be placed are determined from a plurality of images 150a, 152a captured by a plurality of cameras 150, 152, 156. The image processing apparatus according to this modification acquires the plurality of images 150a, 152a captured by the plurality of cameras 150, 152. The image processing apparatus identifies a first ray 150c of the first camera 150 that passes through the image 150b of a target part 154a of the subject 154 appearing in the first image 150a captured by the first camera 150. The image processing apparatus identifies a second ray 152c of the second camera 152 that passes through the image 152b of the target part 154a of the subject 154 appearing in the second image 152a captured by the second camera 152. The image processing apparatus similarly identifies a third ray 156c of the third camera 156.

The image processing apparatus takes the spatial coordinates closest to the rays 150c, 152c, 156c of the plurality of cameras 150, 152, 156 as the coordinates of the target part 154a. The image processing apparatus specifies the determined coordinates of the target part 154a as the coordinates at which the billboard of the subject 154 should be placed. In this case, no three-dimensional model needs to be generated in order to specify the coordinates at which the billboard should be placed, so the amount of processing can be reduced.
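One common way to obtain a point closest to several rays is a least-squares solution that minimizes the sum of squared distances to the lines containing the rays. The specification does not name a particular criterion, so the least-squares formulation below, the treatment of rays as infinite lines, and the function names are assumptions made for this sketch. For each ray with origin o and unit direction d, the matrix (I − d dᵀ) projects onto the plane perpendicular to d, and the optimum x satisfies Σ(I − d dᵀ) x = Σ(I − d dᵀ) o.

```python
import math
from typing import List, Sequence, Tuple


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def closest_point_to_rays(
    origins: Sequence[Sequence[float]],
    directions: Sequence[Sequence[float]],
) -> Tuple[float, float, float]:
    """Least-squares point nearest to all lines (origin + s * direction)."""
    A: List[List[float]] = [[0.0] * 3 for _ in range(3)]
    b: List[float] = [0.0] * 3
    for o, d in zip(origins, directions):
        norm = math.sqrt(sum(c * c for c in d))
        if norm == 0.0:
            raise ValueError("ray direction must be non-zero")
        u = [c / norm for c in d]
        for i in range(3):
            for j in range(3):
                m = (1.0 if i == j else 0.0) - u[i] * u[j]  # (I - u u^T)[i][j]
                A[i][j] += m
                b[i] += m * o[j]
    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("rays are parallel or too few; no unique point")
    # Cramer's rule: replace column k of A by b.
    solution = []
    for k in range(3):
        Ak = [[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        solution.append(_det3(Ak) / det)
    return (solution[0], solution[1], solution[2])
```

Each ray origin would be the camera center and each direction the back-projection of the image point of the target part; at least two non-parallel rays are needed for a unique result.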

110: free viewpoint image distribution system; 112: network; 114: mobile terminal; 200: image processing apparatus.

Claims

An image processing apparatus comprising:
means for acquiring a plurality of images obtained by photographing a subject in a real space from a plurality of viewpoints;
means for calculating, from the acquired plurality of images, coordinates representing the position of the subject in the real space; and
means for generating a composite image corresponding to a viewpoint not included in the plurality of viewpoints by arranging a billboard of the subject, generated from at least one of the acquired plurality of images, with reference to the calculated coordinates,
wherein the means for calculating includes:
means for generating a three-dimensional model of the subject from the acquired plurality of images; and
means for calculating the coordinates of a representative point of the generated three-dimensional model as the coordinates representing the position of the subject in the real space, and
wherein the means for calculating the coordinates of the representative point, when statistically processing the generated three-dimensional model in space, calculates the coordinates of the center of gravity of all voxels included in the three-dimensional model of the subject as the coordinates of the model representative point.

The image processing apparatus according to claim 1, wherein the means for calculating the coordinates of the representative point statistically processes the coordinates of the representative point of the generated three-dimensional model in the time axis direction.

The image processing apparatus according to claim 1 or 2, further comprising:
means for generating, for each of the acquired plurality of images, a mask for separating the foreground from the background; and
means for generating the billboard of the subject using the generated mask,
wherein the means for generating the three-dimensional model of the subject generates the three-dimensional model of the subject based on the generated mask.

The image processing apparatus according to any one of claims 1 to 3, further comprising means for generating the billboard of the subject based on a point on the image plane of the at least one image corresponding to the calculated coordinates.

The image processing apparatus according to any one of claims 1 to 4, further comprising means for setting the angle that the billboard of the subject forms with the field according to the position of the viewpoint corresponding to the at least one image.

An image processing method comprising:
acquiring a plurality of images obtained by photographing a subject in a real space from a plurality of viewpoints;
calculating, from the acquired plurality of images, coordinates representing the position of the subject in the real space;
generating a three-dimensional model of the subject from the acquired plurality of images, and calculating the coordinates of the center of gravity of all voxels included in the three-dimensional model of the subject as the coordinates of a representative point of the three-dimensional model, which serve as the coordinates representing the position of the subject in the real space; and
generating a composite image corresponding to a viewpoint not included in the plurality of viewpoints by arranging a billboard of the subject, generated from at least one of the acquired plurality of images, with reference to the calculated coordinates.

A computer program for causing a computer to realize:
a function of acquiring a plurality of images obtained by photographing a subject in a real space from a plurality of viewpoints;
a function of generating a three-dimensional model of the subject from the acquired plurality of images, and calculating the coordinates of the center of gravity of all voxels included in the three-dimensional model of the subject as the coordinates of a representative point of the three-dimensional model, which serve as the coordinates representing the position of the subject in the real space; and
a function of generating a composite image corresponding to a viewpoint not included in the plurality of viewpoints by arranging a billboard of the subject, generated from at least one of the acquired plurality of images, with reference to the calculated coordinates.