JP7494153B2

JP7494153B2 - Generation device, generation method, and program

Info

Publication number: JP7494153B2
Application number: JP2021146788A
Authority: JP
Inventors: 祥吾水野
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-12-15
Filing date: 2021-09-09
Publication date: 2024-06-03
Anticipated expiration: 2041-09-09
Also published as: JP2022094907A

Description

The present disclosure relates to technology for generating virtual viewpoint images.

In recent years, attention has focused on technology in which multiple image capture devices are arranged around a capture area and the captured images obtained from those devices are used to generate an image (a virtual viewpoint image) as seen from a specified viewpoint (a virtual viewpoint). Because this technology can generate images that show sporting events such as soccer and rugby, concerts, dance performances, and the like from an arbitrary viewpoint, it can give the user a strong sense of presence.

Patent Document 1 discloses a technology for generating and displaying an arbitrary virtual camera image using images captured by multiple cameras arranged so as to surround a subject. According to Patent Document 1, viewing content that uses this virtual camera image generation technology lets the viewer watch the dance, acting, or other performance of a filmed performer (the subject) from various angles.

Patent Document 1: JP 2008-15756 A

However, in dance or acting, for example, a performer who is the subject may be at a position different from the desired position at the time of capture. As a result, the performer is also at a position different from the desired position in the virtual viewpoint image generated from the captured images. Furthermore, when a subject is out of position at the time of capture, the user may, depending on the position of the specified virtual viewpoint, fail to notice in the virtual viewpoint image that the subject is not at the desired position.

The present disclosure has been made in view of the above problem. Its purpose is to generate a virtual viewpoint image that can accommodate a subject being at a position different from the desired position.

The generation device according to the present disclosure comprises: an acquisition means for acquiring shape data representing the shape of a subject, the shape data being based on multiple captured images obtained by multiple image capture devices capturing the subject in a capture area; a first identification means for identifying multiple subject positions corresponding to the positions of multiple subjects in the capture area; a second identification means for identifying a reference position based on the multiple subject positions identified by the first identification means; and a generation means for generating, based on the shape data acquired by the acquisition means, a virtual viewpoint image according to the deviation between a subject position identified by the first identification means and the reference position identified by the second identification means.

According to the present disclosure, it is possible to generate a virtual viewpoint image that can accommodate a subject being at a position different from the desired position.

FIG. 1 is a diagram illustrating the configuration of an image processing system.
FIG. 2 is a diagram showing an example of the installation of multiple image capture devices.
FIG. 3 is a diagram illustrating the hardware configuration of an image generation device.
FIG. 4 is a diagram illustrating the functional configuration of the image generation device in the first embodiment.
FIG. 5 is a flowchart illustrating the processing performed by the image generation device in the first embodiment.
FIG. 6 is a diagram showing an example of the arrangement of subject shape data in the first embodiment.
FIG. 7 is a diagram showing an example of a virtual viewpoint image generated in the first embodiment.
FIG. 8 is a diagram illustrating the functional configuration of the image generation device in the second embodiment.
FIG. 9 is a flowchart illustrating the processing performed by the image generation device in the second embodiment.
FIG. 10 is a diagram showing an example of the arrangement of subject shape data in the second embodiment.
FIG. 11 is a diagram illustrating the functional configuration of the image generation device in the third embodiment.
FIG. 12 is a flowchart illustrating the processing performed by the image generation device in the third embodiment.
FIG. 13 is a diagram showing an example of the arrangement of subject shape data in the third embodiment.
FIG. 14 is a diagram illustrating the functional configuration of the image generation device in the fourth embodiment.
FIG. 15 is a diagram showing an example of the installation of a virtual camera in the fourth embodiment.
FIG. 16 is a diagram showing an example of subjects being captured and the virtual viewpoint images generated.
FIG. 17 is a diagram showing an example of the arrangement of subject shape data outside the capture area in the first embodiment.

Embodiments of the present disclosure are described below with reference to the drawings. The components described in the following embodiments are merely examples, and the present disclosure is not limited to them.

(First Embodiment)
FIG. 1 is a diagram showing an image processing system 100 according to this embodiment. The image processing system 100 includes multiple image capture devices 110, an image generation device 120, and a terminal device 130. Each image capture device 110 is connected to the image generation device 120 via a communication cable such as a LAN (Local Area Network) cable. In this embodiment the communication cable is assumed to be a LAN cable, but the communication cable is not limited to a LAN cable.

Each image capture device 110 is, for example, a digital camera capable of capturing images (still images and video). The image capture devices 110 are installed so as to surround a capture area, for example in a photography studio, and capture images (video). This embodiment is described using the example of capturing multiple performers as subjects, as in a dance scene. The captured images are sent from the image capture devices 110 to the image generation device 120. FIG. 2 is a diagram showing an example of the installation of the image capture devices 110. In this embodiment, each of the image capture devices 110 is installed so as to capture all or part of the photography studio. In other words, the image processing system 100 of this embodiment includes multiple image capture devices 110 for capturing the subjects from multiple directions.

The image generation device 120 accumulates the captured images obtained by the image capture devices 110. When virtual viewpoint information and playback time information are input by a user operation on the terminal device 130, the image generation device 120 generates a virtual viewpoint image based on the captured images and the virtual viewpoint. Here, the virtual viewpoint information includes information indicating the position of a virtual viewpoint in the virtual space constructed from the captured images and the line-of-sight direction from the virtual viewpoint. The information included in the virtual viewpoint information is not limited to these items. For example, it may include information on the width of the field of view (angle of view) of the virtual viewpoint. It may also be configured to include information representing at least one of the position of the virtual viewpoint, the line-of-sight direction from the virtual viewpoint, and the angle of view of the virtual viewpoint. The playback time information is time information measured from the recording start time of the captured images. By operating the terminal device 130 described later to specify a playback time, for example, the user can have a virtual viewpoint image generated for the scene in the recorded captured images that corresponds to the specified playback time.
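As an illustration only, the virtual viewpoint information and playback time information described above could be held in a structure such as the following minimal Python sketch. The class names, field names, and the conversion from playback time to a frame index at a given frame rate are assumptions made for this example; the disclosure does not specify a data format or a frame rate.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class VirtualViewpoint:
    """Hypothetical container for the virtual viewpoint information."""
    position: Vec3                              # position in the virtual space
    direction: Vec3                             # line-of-sight direction
    angle_of_view_deg: Optional[float] = None   # optional angle of view


@dataclass(frozen=True)
class RenderRequest:
    """Hypothetical request sent from the terminal device."""
    viewpoint: VirtualViewpoint
    playback_time_s: float  # seconds elapsed since the recording start time

    def frame_index(self, fps: float) -> int:
        # Assumed mapping: pick the recorded frame nearest to the playback time.
        return round(self.playback_time_s * fps)
```

For example, under these assumptions a request with a playback time of 2.5 seconds on footage recorded at 60 frames per second would select frame 150.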

The image generation device 120 is, for example, a server device, and has a database function and an image processing function. The database holds, as background images, images captured in advance via the image capture devices 110 of the scene with no subject present, such as before the actual shoot begins. For scenes in which subjects are present, the image generation device 120 uses image processing to separate the regions of each captured image that correspond to specific objects, such as performers and other people and the tools the performers use, and holds these regions as foreground images. A specific object may also be an object whose image pattern is determined in advance, such as a prop.

The virtual viewpoint image corresponding to the virtual viewpoint information is generated from the background images and the specific object images managed in the database. Model-based rendering (MBR), for example, is used as the method for generating the virtual viewpoint image. MBR is a method that generates a virtual viewpoint image using a three-dimensional shape generated from multiple captured images of a subject taken from multiple directions. Specifically, it uses the three-dimensional shape (model) of the target scene obtained by a three-dimensional shape reconstruction method, such as the visual hull (volume intersection) method or Multi-View Stereo (MVS), and generates an image of how the scene appears from the virtual viewpoint. A rendering method other than MBR may also be used to generate the virtual viewpoint image. The generated virtual viewpoint image is transmitted to the terminal device 130 via a LAN cable or the like.

The terminal device 130 is, for example, a PC (Personal Computer) or a tablet. The controller 131 is, for example, a mouse, a keyboard, a six-axis controller, or a touch panel, which the user operates to display still images and moving images on the screen. The terminal device 130 displays, for example, the virtual viewpoint image received from the image generation device 120 on the display unit 132. In response to user operations on the connected controller 131, the terminal device 130 also accepts a playback time and instructions to move the virtual viewpoint (instructions on the amount and direction of movement), and sends the image generation device 120 a transmission signal carrying instruction information that corresponds to the accepted instructions.

FIG. 3 is a diagram showing the hardware configuration of the image generation device 120.

The image generation device 120 has a CPU 301, a ROM 302, a RAM 303, an HDD 304, a display unit 305, an input unit 306, and a communication unit 307. The CPU 301 reads out a control program stored in the ROM 302 and executes various processes. The RAM 303 is used as a temporary storage area, such as the main memory and work area of the CPU 301. The HDD 304 stores various data, programs, and the like. The display unit 305 displays various information. The input unit 306 has a keyboard and a mouse and accepts various operations by the user. The communication unit 307 communicates with external devices such as the image capture devices 110 via a network. Ethernet (registered trademark) is one example of the network. As another example, the communication unit 307 may communicate with external devices wirelessly.

In the example shown in FIG. 3, the HDD 304, the display unit 305, and the input unit 306 are included inside the image generation device 120, but the configuration is not limited to this. For example, at least one of the HDD 304, the display unit 305, and the input unit 306 may be connected externally to the image generation device 120 as a separate device. The functions and processing of the image generation device 120 described below are realized by the CPU 301 reading out a program stored in the ROM 302 or the HDD 304 and executing it. The hardware configuration of the terminal device 130 is the same as that of the image generation device 120.

FIG. 4 is a diagram showing the functional configuration of the image generation device 120. The processing performed by the image generation device 120 in this embodiment is described here. This embodiment assumes that a dance scene is captured and a virtual viewpoint image is generated based on the captured images. During the dance, however, a subject's standing position may differ from the desired standing position. FIG. 16(a) is a conceptual diagram of subjects in the virtual space, generated by capturing the capture area enclosed by the line with multiple cameras, together with virtual cameras, that is, virtual cameras corresponding to virtual viewpoints. A virtual camera can be specified to view the subjects from any angle: it may view the subjects from the front, as with the virtual camera 2101, or from the side, as with the virtual camera 2102. FIG. 16(c) is a screen display of the virtual viewpoint image corresponding to the virtual camera 2102. Suppose that the performers want to perform at standing positions that line up in a straight line when viewed from the virtual camera 2102. When the performance is actually captured, however, the standing positions may drift, resulting in a virtual viewpoint image in which the subjects are not aligned on a straight line, as in FIG. 16(c).

Another conceivable use is for performers to check their standing positions by looking at the virtual viewpoint image. In this case, depending on the virtual camera specified, a deviation in standing position may not be easy to recognize. FIG. 16(b) is a screen display of the virtual viewpoint image corresponding to the virtual camera 2101. In the virtual viewpoint image corresponding to the virtual camera 2101, the subjects' standing positions appear to be aligned. In reality, however, the standing positions are misaligned, as the virtual viewpoint image from the virtual camera 2102 shows. The image generation device 120 in this embodiment aims to solve the above problems.

The image generation device 120 has a captured image input unit 401, a foreground/background separation unit 402, a captured image data storage unit 403, a camera parameter holding unit 404, a subject shape generation unit 405, a subject shape centroid calculation unit 406, a subject shape movement unit 407, and a subject position setting unit 408. The image generation device 120 also has a user input unit 409, a virtual viewpoint information setting unit 410, a coloring information calculation unit 411, a virtual viewpoint image generation unit 412, and an image output unit 413. Each processing unit is described below.

The captured image input unit 401 converts the transmission signal input from the image capture devices 110 via the LAN cable into captured image data and outputs it to the foreground/background separation unit 402. Of the captured images input from the captured image input unit 401, the foreground/background separation unit 402 outputs images captured in advance of the scene with no subject present, such as before the subjects begin their performance, to the captured image data storage unit 403 as background image data. It also extracts the subjects from the images captured during the performance and outputs them to the captured image data storage unit 403 as foreground image data.

The captured image data storage unit 403 is a database. Of the captured image data input from the foreground/background separation unit 402, it stores the images captured in advance with no subject present in the HDD 304 as background image data. It also stores the difference data between the background image data and the captured image data in which subjects are present in the HDD 304 as foreground image data. The captured image data storage unit 403 outputs foreground image data to the subject shape generation unit 405. It also outputs the background image data and foreground image data specified by the coloring information calculation unit 411 to the coloring information calculation unit 411.
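The idea of treating the difference between a background image and a captured image as the foreground can be sketched as follows. This is a minimal example under stated assumptions: images are grayscale and represented as nested lists of integers, and a pixel counts as foreground when its absolute difference from the background exceeds a threshold. The function name and the threshold value are hypothetical and do not come from the disclosure, which does not specify how the difference data is computed.

```python
from typing import List

Image = List[List[int]]   # grayscale image as rows of pixel intensities
Mask = List[List[bool]]   # True where a pixel belongs to the foreground


def foreground_mask(background: Image, frame: Image, threshold: int = 30) -> Mask:
    """Mark pixels whose intensity differs from the background by more than threshold."""
    if len(background) != len(frame):
        raise ValueError("background and frame must have the same height")
    mask: Mask = []
    for bg_row, fr_row in zip(background, frame):
        if len(bg_row) != len(fr_row):
            raise ValueError("background and frame must have the same width")
        mask.append([abs(f - b) > threshold for f, b in zip(fr_row, bg_row)])
    return mask
```

A production system would typically work on color images and handle shadows and sensor noise; the sketch only shows the basic differencing step.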

The camera parameter holding unit 404 holds, as camera parameter information, the capture position information of the multiple image capture devices 110 and camera setting information such as the focal length of the lens and the shutter speed of each image capture device 110. The image capture devices 110 are assumed to be installed at predetermined positions, with the camera parameter information acquired in advance. The camera parameter holding unit 404 outputs the camera parameter information to the subject shape generation unit 405 and the coloring information calculation unit 411.

The subject shape generation unit 405 generates shape data representing the shape of a subject using the foreground image data and the camera parameter information. The subject shape generation unit 405 generates the shape data using a three-dimensional shape reconstruction method such as the visual hull (volume intersection) method. It outputs the shape data to the subject shape centroid calculation unit 406 and the coloring information calculation unit 411.

The subject shape centroid calculation unit 406 identifies the position of each subject in the capture area. Specifically, it uses the shape data input from the subject shape generation unit 405 to identify the centroid of the subject's shape as the subject position. In doing so, it calculates the centroid position as seen when the subject is viewed from a predetermined viewpoint position. For example, it calculates the centroid position seen from a viewpoint looking straight down on the subject as the position of the subject. The subject shape centroid calculation unit 406 outputs subject centroid information, consisting of the subject centroid positions, to the subject shape movement unit 407.
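The centroid seen from directly above can be sketched as follows. This is a minimal example, assuming that the shape data is a list of three-dimensional points with the z axis pointing up, so that looking straight down amounts to ignoring z. The function name and the coordinate convention are assumptions made for illustration.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def top_down_centroid(points: List[Point3]) -> Tuple[float, float]:
    """Return the (x, y) centroid of a point cloud as seen from directly above.

    Assumes z is the vertical axis, so the top-down view drops the z coordinate.
    """
    if not points:
        raise ValueError("shape data must contain at least one point")
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return (cx, cy)
```

Averaging the points gives every point equal weight; a silhouette-area centroid of the top-down projection would be another reasonable reading of the text.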

The subject shape movement unit 407 determines the position at which to place a subject's shape data, based on the subject centroid information input from the subject shape centroid calculation unit 406 and the subject destination position information input from the subject position setting unit 408 described below. Here, the subject destination position information represents the position at which the subject should be placed (hereinafter also referred to as the reference position). The subject shape movement unit 407 determines the placement of the shape data according to the deviation between the reference position and the position of the subject.

When the subject destination position information is information on grid points set on the floor surface at a predetermined interval (for example, 3 meters), the subject shape movement unit 407 uses the position of a grid point as the reference position and places the subject's shape data so that the position of the subject's centroid coincides with the position of the grid point. The subject's shape data may instead be regenerated based on the subject destination position information. Alternatively, when the deviation between the reference position and the position of the subject is equal to or greater than a predetermined threshold, the subject shape movement unit 407 may change the position of the shape data so that the deviation becomes smaller than the threshold, and place the shape data at the changed position. The subject shape movement unit 407 outputs the moved shape data to the virtual viewpoint image generation unit 412.
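The grid-based placement and the threshold variant can be sketched together as follows. This is a minimal example under several assumptions that are not stated in the text: the reference position for a subject is taken to be the grid point nearest to its centroid, the grid is aligned with the origin of the floor plane, z is the vertical axis, and a shape whose deviation reaches the threshold is moved all the way onto the grid point. All names are hypothetical.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def nearest_grid_point(position: Point2, spacing: float = 3.0) -> Point2:
    """Return the floor grid point closest to position (grid assumed aligned to the origin)."""
    return (round(position[0] / spacing) * spacing,
            round(position[1] / spacing) * spacing)


def place_shape(points: List[Point3], centroid: Point2,
                spacing: float = 3.0, threshold: float = 0.0) -> List[Point3]:
    """Translate shape data so its centroid lands on the reference grid point.

    The shape is left where it is when the deviation is below threshold.
    """
    ref = nearest_grid_point(centroid, spacing)
    dx, dy = ref[0] - centroid[0], ref[1] - centroid[1]
    if math.hypot(dx, dy) < threshold:
        return list(points)
    # Only the horizontal position changes; heights are preserved.
    return [(x + dx, y + dy, z) for x, y, z in points]
```

With the default threshold of 0.0 every shape is snapped to its grid point, which corresponds to the first case described above; a positive threshold leaves small deviations untouched.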

The subject position setting unit 408 outputs subject destination position information in three-dimensional space, set in advance by the user, to the subject shape movement unit 407. For example, it outputs information on grid points at 3-meter intervals on the floor surface so that multiple subjects are placed in a predetermined section of a predetermined straight line. The subject destination position information is not limited to grid points and may be, for example, a straight line or a curve.

The user input unit 409 converts the transmission signal input from the terminal device 130 via the LAN cable into user input data. When the user input data is playback time information and virtual viewpoint information, the user input unit 409 outputs them to the virtual viewpoint information setting unit 410.

The virtual viewpoint information setting unit 410 updates the current position and direction in the virtual space and the playback time, based on the playback time information and virtual viewpoint information input from the user input unit 409.

It then outputs the playback time information and the virtual viewpoint information to the subject shape generation unit 405, the coloring information calculation unit 411, and the virtual viewpoint image generation unit 412. The origin of the virtual space is assumed to be set in advance, for example at the center of the stadium.

The coloring information calculation unit 411 receives from the captured image data storage unit 403 the foreground image data and background image data that correspond to the playback time information and virtual viewpoint information input from the virtual viewpoint information setting unit 410. It also receives the camera parameters from the camera parameter holding unit 404 and the shape data from the subject shape generation unit 405. Next, it renders (colors) the subject shape as seen from the virtual viewpoint position using the color information of the image data captured by the real cameras at the corresponding time, and holds the resulting coloring information for the subject shape. For example, when a subject based on the shape data is visible from the virtual viewpoint and there is a real camera whose position lies within a predetermined range of the virtual viewpoint position, the foreground image data of that real camera is used as the color of the shape. The coloring information calculation unit 411 outputs the coloring information to the virtual viewpoint image generation unit 412.
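The selection of a real camera near the virtual viewpoint can be sketched as follows. This is a minimal example, assuming that camera positions are known as three-dimensional coordinates and that, when several real cameras lie within the predetermined range, the closest one is preferred. The tie-breaking rule, the function name, and the dictionary of camera identifiers are assumptions; the text only states that a real camera within the range is used.

```python
import math
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


def pick_color_camera(virtual_position: Vec3,
                      camera_positions: Dict[str, Vec3],
                      max_distance: float) -> Optional[str]:
    """Return the id of the nearest real camera within max_distance, or None."""
    best_id: Optional[str] = None
    best_distance = math.inf
    for camera_id, position in camera_positions.items():
        distance = math.dist(virtual_position, position)
        if distance <= max_distance and distance < best_distance:
            best_id, best_distance = camera_id, distance
    return best_id
```

Returning None when no real camera is close enough leaves the caller free to fall back on another strategy, such as blending colors from several cameras.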

The virtual viewpoint image generation unit 412 receives from the captured image data storage unit 403 the foreground image data and background image data that correspond to the playback time information and virtual viewpoint information input from the virtual viewpoint information setting unit 410. It also receives the camera parameters from the camera parameter holding unit 404 and the moved shape data from the subject shape movement unit 407. It then applies projective transformation and image processing to the background image data so that it appears as the background seen from the virtual viewpoint position, and uses the result as the background of the virtual viewpoint image. Next, for the moved subject shape as seen from the virtual viewpoint position, it receives from the coloring information calculation unit 411 the coloring information based on the image data captured by the image capture devices at the corresponding time, and renders (colors) the shape with that color information to generate the virtual viewpoint image. Finally, the virtual viewpoint image generation unit 412 outputs the generated virtual viewpoint image to the image output unit 413. The image output unit 413 converts the image data input from the virtual viewpoint image generation unit 412 into a transmission signal that can be sent to the terminal device 130 and outputs it to the terminal device 130.

次に、画像生成装置１２０の動作について説明する。図５は、実施例１に係る画像生成装置１２０の動作を示すフローチャートである。ＣＰＵ３０１がＲＯＭ３０２またはＨＤＤ３０４に記憶されたプログラムを読み出して実行することにより、以下の処理が行われる。端末装置１３０から仮想視点情報及び再生時刻を指定するための入力が行われると、処理が開始される。 Next, the operation of the image generating device 120 will be described. FIG. 5 is a flowchart showing the operation of the image generating device 120 according to the first embodiment. The CPU 301 reads out and executes a program stored in the ROM 302 or the HDD 304, thereby performing the following processing. When an input is made from the terminal device 130 to specify virtual viewpoint information and a playback time, the processing is started.

仮想視点情報設定部４１０は、ユーザ入力部４０９を介して、仮想視点情報と再生時刻情報が入力されたか否か判断する（Ｓ５０１）。仮想視点情報と再生時刻情報が入力されない場合（Ｓ５０１のＮｏ）、待機する。一方、仮想視点情報と再生時刻情報が入力された場合（Ｓ５０１のＹｅｓ）、仮想視点情報と再生時刻情報とを被写体形状生成部４０５と着色情報算出部４１１と仮想視点画像生成部４１２へ出力する。 The virtual viewpoint information setting unit 410 determines whether or not virtual viewpoint information and playback time information have been input via the user input unit 409 (S501). If virtual viewpoint information and playback time information have not been input (No in S501), the unit waits. On the other hand, if virtual viewpoint information and playback time information have been input (Yes in S501), the unit outputs the virtual viewpoint information and playback time information to the subject shape generation unit 405, the coloring information calculation unit 411, and the virtual viewpoint image generation unit 412.

次に、被写体形状生成部４０５は、仮想視点情報設定部４１０から入力された再生時刻情報に基づく撮影画像データ保存部４０３から入力された前景画像データと、カメラパラメータ保持部４０４から入力されたカメラパラメータ情報を読み込む（Ｓ５０２）。続いて被写体形状生成部４０５は、被写体の３次元形状を推定する（Ｓ５０３）。例えば、視体積交差法などの三次元形状復元手法を用いて被写体の形状データを生成するとする。ここで、形状データとは、複数の点群からなり各点は位置情報を含むものとする。 Next, the subject shape generation unit 405 reads the foreground image data that the captured image data storage unit 403 supplies for the playback time information input from the virtual viewpoint information setting unit 410, together with the camera parameter information input from the camera parameter holding unit 404 (S502). The subject shape generation unit 405 then estimates the three-dimensional shape of the subject (S503). For example, the shape data of the subject is generated using a three-dimensional shape reconstruction method such as the visual hull (volume intersection) method. Here, the shape data consists of a set of points, each of which includes position information.

次に、着色情報算出部４１１は、仮想視点情報設定部４１０から入力された再生時刻情報と仮想視点情報に基づいた前景画像データと背景画像データとを撮影画像データ保存部４０３から入力する。またカメラパラメータをカメラパラメータ保持部４０４から入力し、また形状データを被写体形状生成部４０５から入力する。次に仮想視点位置から見た被写体形状に対して、該当時刻に実カメラで撮影された画像データの色情報でレンダリング（着色処理）、被写体形状の着色情報を保持する（Ｓ５０４）。 Next, the coloring information calculation unit 411 inputs from the captured image data storage unit 403 the foreground image data and background image data based on the playback time information and virtual viewpoint information input from the virtual viewpoint information setting unit 410. It also inputs the camera parameters from the camera parameter holding unit 404 and the shape data from the subject shape generation unit 405. Next, the subject shape as seen from the virtual viewpoint position is rendered (coloring process) using the color information of the image data captured by the real camera at the corresponding time, and the coloring information of the subject shape is stored (S504).

次に、被写体形状重心算出部４０６は、被写体形状生成部４０５から入力された被写体形状の所定視点位置から見た場合の重心位置を、被写体位置として特定する（Ｓ５０５）。なお、本実施例では真上から見た場合の重心位置を被写体形状重心情報とする。図６（ａ）は、仮想空間上で真上から見た被写体の形状データと、形状データの重心位置の概念図である。本実施形態では、複数の被写体を所定格子点上に配置しなおすため、被写体に対して真上から見下ろす視点で、被写体の前後方向軸の中心位置、また左右方向軸の中心位置などを算出して重心位置を算出する。その重心位置は被写体の黒点で示しており、重心位置を直線上や格子点上に配置することになる。 Next, the subject shape center of gravity calculation unit 406 identifies the center of gravity position of the subject shape input from the subject shape generation unit 405 when viewed from a specified viewpoint position as the subject position (S505). In this embodiment, the center of gravity position when viewed from directly above is used as the subject shape center of gravity information. FIG. 6(a) is a conceptual diagram of the shape data of the subject viewed from directly above in virtual space and the center of gravity position of the shape data. In this embodiment, in order to rearrange multiple subjects on specified lattice points, the center of gravity position is calculated by calculating the center of the front-back axis and the center of the left-right axis of the subject from a viewpoint looking down from directly above the subject. The center of gravity position is indicated by a black dot on the subject, and the center of gravity position is positioned on a straight line or a lattice point.
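The top-down center of gravity computed in S505 can be sketched as follows. This is a minimal sketch under stated assumptions: the shape data is taken to be a list of (x, y, z) points with z as the vertical axis, and the centroid is taken as the mean of the points projected onto the floor plane. The function name `top_view_centroid` is hypothetical.

```python
def top_view_centroid(points):
    """Centroid of a point cloud as seen from directly above.

    Assumes each point is (x, y, z) with z vertical, so the top view
    simply drops z and averages x and y.
    """
    if not points:
        raise ValueError("shape data contains no points")
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return (cx, cy)

# Illustrative shape data in meters (not taken from the disclosure).
shape = [(1.0, 2.0, 0.0), (3.0, 2.0, 1.7), (1.0, 4.0, 0.9), (3.0, 4.0, 1.2)]
print(top_view_centroid(shape))  # (2.0, 3.0)
```

A variant closer to the wording "center of the front-back axis and center of the left-right axis" would take the midpoint of the minimum and maximum extent along each axis instead of the mean; which of the two is intended is not specified in the text.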

図５の説明に戻り、被写体形状移動部４０７は、被写体位置設定部４０８より三次元空間上の被写体移動先位置情報があるか否か判定する（Ｓ５０６）。被写体位置設定部４０８より三次元空間上の被写体移動先位置情報がある場合（Ｓ５０６のＹｅｓ）、被写体移動先位置情報を入力する（Ｓ５０７）。一方で、被写体移動先位置情報がない場合（Ｓ５０６のＮｏ）、Ｓ５０９へ進む。図６（ｂ）は、被写体移動先位置情報に基づく被写体の基準位置を表す、床面の格子点位置の概念図である。例えば、複数の被写体を所定の直線上の所定区間に配置されるように床面に対応した３メートル間隔の格子点情報を出力するものとする。 Returning to the explanation of FIG. 5, the subject shape moving unit 407 determines whether or not there is subject destination position information in three-dimensional space from the subject position setting unit 408 (S506). If there is subject destination position information in three-dimensional space from the subject position setting unit 408 (Yes in S506), the subject destination position information is input (S507). On the other hand, if there is no subject destination position information (No in S506), the process proceeds to S509. FIG. 6(b) is a conceptual diagram of lattice point positions on the floor surface, which represent the reference position of the subject based on the subject destination position information. For example, assume that lattice point information at 3-meter intervals corresponding to the floor surface is output so that multiple subjects are placed in a specified section on a specified straight line.

図５の説明に戻り、次に、被写体形状移動部４０７は、被写体形状重心算出部４０６から入力された被写体重心情報と被写体位置設定部４０８から入力された被写体移動先位置情報に基づき、形状データを移動する（Ｓ５０８）。図６（ｃ）は、移動後の被写体形状の概念図である。被写体形状に対して真上から見下ろす視点で、被写体移動先位置情報に基づく格子点位置と被写体形状の重心位置とが一致するように被写体位置を変更し、変更した被写体位置に形状データを移動したことを示している。これにより、移動被写体形状の位置は一定の距離間となるように移動されたことになる。なお、基準位置と被写体位置とのずれが所定の閾値よりも小さくなるように被写体位置が変更され、形状データが配置されてもよい。 Returning to the explanation of FIG. 5, the subject shape moving unit 407 next moves the shape data based on the subject centroid information input from the subject shape centroid calculation unit 406 and the subject destination position information input from the subject position setting unit 408 (S508). FIG. 6(c) is a conceptual diagram of the subject shape after movement. It shows that, when viewed from directly above the subject shape, the subject position has been changed so that the centroid position of the subject shape matches the grid point position based on the subject destination position information, and the shape data has been moved to the changed subject position. As a result, the moved subject shapes are positioned at regular intervals. Note that the subject position may instead be changed, and the shape data positioned, so that the deviation between the reference position and the subject position is smaller than a predetermined threshold.
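The movement in S508 amounts to translating every point of the shape data by the offset between the centroid and its target grid point. The sketch below illustrates this with the 3-meter grid from the example above. Choosing the nearest grid point is an assumption made here for illustration (in the text the destination comes from the subject position setting unit 408), and the names `nearest_grid_point` and `move_shape` are hypothetical.

```python
def nearest_grid_point(position, spacing=3.0):
    """Nearest floor lattice point to an (x, y) position (assumed rule)."""
    x, y = position
    return (round(x / spacing) * spacing, round(y / spacing) * spacing)

def move_shape(points, centroid, target):
    """Translate (x, y, z) points so the top-view centroid lands on target.

    Height (z) is left unchanged because the alignment is done in top view.
    """
    dx = target[0] - centroid[0]
    dy = target[1] - centroid[1]
    return [(x + dx, y + dy, z) for x, y, z in points]

# Illustrative values in meters: the top-view centroid of these points is (2.0, 3.0).
shape = [(1.5, 2.5, 0.0), (2.5, 3.5, 1.7)]
centroid = (2.0, 3.0)
target = nearest_grid_point(centroid)       # (3.0, 3.0)
print(move_shape(shape, centroid, target))  # [(2.5, 2.5, 0.0), (3.5, 3.5, 1.7)]
```

For the threshold variant mentioned at the end of the paragraph, the translation could be skipped or shortened whenever the offset is already smaller than the threshold.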

図５の説明に戻り、仮想視点画像生成部４１２は、移動された形状データに基づいて、仮想視点位置から見た形状データに対して、指定時刻に撮影装置１１０で撮影された画像データを用いてレンダリング（着色処理）して仮想視点画像を生成する（Ｓ５０９）。すなわち、被写体移動前に決定、保持された被写体の色情報を被写体移動後にレンダリングすることで被写体が移動された位置で表示されることになる。仮想視点画像生成部４１２は、仮想視点画像生成部４１２にて生成された仮想視点画像を画像出力部４１３へ出力する（Ｓ５１０）。以上説明した処理は、録画して蓄積した画像データに基づいて行われてもよいし、撮影装置１１０による撮影が行われるのと並行して、リアルタイムで行うことも可能である。 Returning to the explanation of FIG. 5, the virtual viewpoint image generating unit 412 generates a virtual viewpoint image by rendering (coloring) the shape data seen from the virtual viewpoint position using image data captured by the image capturing device 110 at a specified time based on the moved shape data (S509). That is, the color information of the subject determined and held before the subject movement is rendered after the subject movement, so that the subject is displayed at the position to which it has been moved. The virtual viewpoint image generating unit 412 outputs the virtual viewpoint image generated by the virtual viewpoint image generating unit 412 to the image output unit 413 (S510). The above-described processing may be performed based on recorded and stored image data, or may be performed in real time in parallel with the image capturing by the image capturing device 110.

なお、上述した図５の説明では、被写体移動前に決定、保持された被写体の色情報を被写体移動後にレンダリングする、と説明しているが、これに限定されない。例えば、対象とする被写体を指定の位置に移動させると共に全撮影装置１１０の撮影位置と対象の被写体以外の位置も対象の被写体との位置関係を保つように相対的に移動させて、対象とする被写体のみをレンダリングする。これを各被写体に対して順次行い合成することで仮想視点画像を生成するとしてもよい。すなわち、被写体の移動前に色情報を決定するのではなく、被写体移動後に色情報の計算とレンダリングとを行う構成であってもよい。 Note that, in the explanation of FIG. 5 above, it is explained that the color information of the subject that was determined and held before the subject moved is rendered after the subject moved, but this is not limited to this. For example, the target subject is moved to a specified position, and the shooting positions of all the shooting devices 110 and positions other than the target subject are moved relatively to maintain their positional relationship with the target subject, and only the target subject is rendered. This may be performed sequentially for each subject, and the virtual viewpoint image may be generated by combining the results. In other words, instead of determining the color information before the subject moves, the color information may be calculated and rendered after the subject moves.

図７は、図１６（ａ）に示す被写体を撮影した場合に、図５に示す処理を行うことにより生成される仮想視点画像を表す図である。図７（ａ）は、撮影領域に対応する三次元空間における被写体の形状データの位置を表す。図７（ａ）によれば、複数の被写体がダンスなどの演技をしながらも、演技実施時とは異なる立ち位置である所定の格子点位置に移動され、一定の距離をとった立ち位置で、同じ動作をしている様子が仮想カメラで撮影される。仮想カメラ９０１の位置から被写体を撮影しても仮想カメラ９０２の位置から被写体を撮影しても、実際にダンスなどの演技をした立ち位置と比較して、異なる立ち位置になる。 Figure 7 is a diagram showing a virtual viewpoint image generated by performing the processing shown in Figure 5 when the subjects shown in Figure 16(a) are photographed. Figure 7(a) shows the positions of the shape data of the subjects in the three-dimensional space corresponding to the shooting area. According to Figure 7(a), the multiple subjects, while performing a dance or other performance, are moved to predetermined lattice point positions that differ from their standing positions during the actual performance, and the virtual camera captures them performing the same movements at standing positions spaced at regular intervals. Whether the subjects are captured from the position of virtual camera 901 or from the position of virtual camera 902, their standing positions differ from those in which they actually performed.

図７（ｂ）は、端末装置１３０に表示される、仮想カメラ９０１に対応する仮想視点画像の表示例である。被写体の正面から撮影しても一定の位置を常に保って演技させることが可能となる。また、図７（ｃ）は仮想カメラ９０２に対応する仮想視点画像の表示例である。被写体の横方向から撮影しても直線状に配置しなおして表示可能であるため、常に同じ位置で演技させることが可能となる。 Figure 7(b) is an example of a virtual viewpoint image corresponding to virtual camera 901, displayed on the terminal device 130. Even if the subject is photographed from the front, it is possible to have the subject perform the action while always maintaining a constant position. Also, Figure 7(c) is an example of a virtual viewpoint image corresponding to virtual camera 902. Even if the subject is photographed from the side, it can be rearranged and displayed in a straight line, so it is possible to have the subject perform the action while always maintaining the same position.

以上、実施例１の形態によれば、複数被写体を複数カメラで撮影し、仮想視点撮影し表示する場合に、所望の位置に被写体を配置しなおした仮想視点画像を生成することが可能となる。演技の撮影時と比較して、仮想カメラの映像においては被写体の揃った演技を視聴することが可能となる。なお、上述した例では、基準位置が格子点で表されるものとしたが、基準位置は所定の直線や曲線で表されてもよい。この場合は、被写体の位置が線上からずれた場合、被写体位置が線上の任意の点（例えば、現在の被写体の位置から最も近い線上の点）の位置と一致するように形状データが配置される。また、基準位置は、任意の位置の点（座標）であってもよい。このとき、被写体が複数存在する場合は、被写体ごとに基準位置の点が設定されてもよい。 As described above, according to the first embodiment, when multiple subjects are photographed by multiple cameras and photographed from a virtual viewpoint and displayed, it is possible to generate a virtual viewpoint image in which the subjects are repositioned at a desired position. Compared to when the performance is photographed, it is possible to view a performance with the subjects aligned in the image of the virtual camera. In the above example, the reference position is represented by a lattice point, but the reference position may be represented by a predetermined straight line or curve. In this case, if the subject position deviates from the line, the shape data is positioned so that the subject position coincides with the position of an arbitrary point on the line (for example, a point on the line closest to the current subject position). The reference position may also be a point (coordinate) at an arbitrary position. In this case, if there are multiple subjects, a reference position point may be set for each subject.

また、基準位置は、複数の被写体の被写体位置に基づいて特定されてもよい。例えば、３人の被写体が撮影される場合に、３人の被写体が一直線上に位置するように立ち位置を設定したいものとする。このとき、３人のうち２人の被写体位置を結ぶ直線を算出し、算出した直線上の点を基準位置として、残りの１人の被写体位置を変更する。こうすることで、３人の被写体が一直線上に位置するように形状データが配置された仮想視点画像が生成される。なお、被写体の数が３人以外の場合にも適用可能である。 The reference position may also be specified based on the subject positions of multiple subjects. For example, suppose that when three subjects are to be photographed, it is desired to set the standing positions so that the three subjects are positioned in a straight line. In this case, a line connecting the subject positions of two of the three subjects is calculated, and a point on the calculated line is used as the reference position, and the subject position of the remaining person is changed. In this way, a virtual viewpoint image is generated in which shape data is arranged so that the three subjects are positioned in a straight line. Note that this can also be applied to cases where the number of subjects is other than three.
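When the reference position is a straight line, the nearest-point rule described above reduces to an orthogonal projection onto that line, and the three-subject example uses the line through two of the subjects. The following is a minimal sketch in top view; the function name `closest_point_on_line` and the coordinates are illustrative assumptions.

```python
def closest_point_on_line(p, a, b):
    """Orthogonal projection of point p onto the infinite line through a and b.

    All arguments are (x, y) positions in top view.
    """
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    denom = dx * dx + dy * dy
    if denom == 0:
        raise ValueError("a and b must be distinct points")
    # Parameter of the projection along the direction a -> b.
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / denom
    return (ax + t * dx, ay + t * dy)

# Two subjects define the line; the third is moved onto it (illustrative values).
subject_1, subject_2 = (0.0, 0.0), (4.0, 0.0)
subject_3 = (2.0, 1.0)
print(closest_point_on_line(subject_3, subject_1, subject_2))  # (2.0, 0.0)
```

The returned point serves as the reference position for the third subject, and its shape data would then be translated by the difference between that point and its current subject position.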

また、基準位置は、図７（ａ）に示したような撮影領域に対応する三次元空間位置に被写体を配置するに限らず、撮影領域外の任意の三次元空間位置に設定されてもよい。図１７は、撮影領域外の三次元空間位置に被写体の形状データを配置した例を表す。図１７に示す例では、対象とする被写体の形状データを移動するための基準位置が、撮影領域外の位置に設定されている。図１７によれば、複数の被写体は実際にダンスなどの演技を撮影領域内で撮影したとしても、その撮影領域よりも広い領域でダンスなどの演技をしているように仮想カメラ９０１の位置からは撮影される。 The reference position is not limited to placing the subject at a three-dimensional space position corresponding to the shooting area as shown in FIG. 7(a), but may be set to any three-dimensional space position outside the shooting area. FIG. 17 shows an example in which the shape data of the subject is placed at a three-dimensional space position outside the shooting area. In the example shown in FIG. 17, the reference position for moving the shape data of the target subject is set to a position outside the shooting area. According to FIG. 17, even if multiple subjects are actually photographed performing a dance or other performance within the shooting area, they are photographed from the position of the virtual camera 901 as if they are performing a dance or other performance in an area larger than the shooting area.

（第２の実施形態） Second Embodiment
本実施形態では、被写体の所定の特徴に基づき、被写体の位置を変更する例である。図８は、第２の実施形態に係る画像生成装置１１００の機能構成を示す図である。画像生成装置１１００は、図４に示した第１の実施形態に係る画像生成装置１２０の被写体形状重心算出部４０６のかわりに、被写体特徴生成部１１０１を有する。なお、画像生成装置１１００のハードウェア構成は、第１の実施形態に係る画像生成装置１２０と同様であるものとする。また、画像生成装置１２０と同様の構成については、同じ符号を付し、説明を省略する。 In this embodiment, the position of the subject is changed based on a predetermined feature of the subject. FIG. 8 is a diagram showing a functional configuration of an image generating device 1100 according to the second embodiment. The image generating device 1100 has a subject feature generating unit 1101 instead of the subject shape center of gravity calculation unit 406 of the image generating device 120 according to the first embodiment shown in FIG. 4. Note that the hardware configuration of the image generating device 1100 is assumed to be the same as that of the image generating device 120 according to the first embodiment. Also, the same reference numerals are used for the same components as those of the image generating device 120, and the description thereof will be omitted.

被写体特徴生成部１１０１は、被写体形状生成部４０５から入力された形状データと形状データに対応した着色情報から、被写体の所定特徴認識とその位置を算出する。例えば、複数の被写体の顔を特徴として直線上や床面の所定格子点上に配置しなおす場合には、形状データと着色情報とを使用して被写体の顔認識を行い、顔の位置を特定する。これにより、その後の処理において、顔の位置が真上から見下ろす視点で、顔の位置が所定の直線上や格子点上にくるように形状データが配置されるようになる。すなわち、被写体特徴生成部１１０１は、被写体の形状における所定の部位を被写体位置として特定する。被写体特徴生成部１１０１は、被写体特徴位置情報を、被写体形状移動部４０７へ出力する。 The subject feature generation unit 1101 recognizes a predetermined feature of the subject and calculates its position from the shape data input from the subject shape generation unit 405 and the coloring information corresponding to that shape data. For example, when the faces of multiple subjects are used as the feature for rearranging the subjects on a straight line or on predetermined grid points on the floor surface, face recognition is performed using the shape data and coloring information, and the position of each face is identified. As a result, in subsequent processing, the shape data is arranged so that, when viewed from directly above, the face position lies on the predetermined straight line or grid point. In other words, the subject feature generation unit 1101 identifies the position of a predetermined part of the subject's shape as the subject position. The subject feature generation unit 1101 outputs the subject feature position information to the subject shape movement unit 407.

図９は、第２の実施形態に係る画像生成装置１１００による画像処理を示すフローチャートである。なお、Ｓ５０１からＳ５０４までは図５の説明と同一であるため説明は省略する。またＳ５０６からＳ５１０においても図５の説明と同一であるため説明は省略する。 Figure 9 is a flowchart showing image processing by the image generating device 1100 according to the second embodiment. Steps S501 to S504 and steps S506 to S510 are the same as those described for Figure 5, and their explanations are omitted.

Ｓ１２０１において、被写体特徴生成部１１０１は、被写体位置として所定の部位の位置を特定する。このとき、被写体特徴生成部１１０１は、被写体の形状データのうち所定の特徴を有する位置を、例えば顔認識などの画像解析を使用して特定する。 In S1201, the subject feature generation unit 1101 identifies the position of a predetermined part as the subject position. At this time, the subject feature generation unit 1101 identifies the position in the subject's shape data that has the predetermined feature by using image analysis such as face recognition.

図１０は、被写体の顔認識を形状データと着色情報とから解析し、顔の位置を特定し被写体位置とする場合の例である。被写体特徴生成部１１０１は、図１０（ａ）に示すように、被写体の顔の位置を特定する。また、被写体形状移動部４０７は、被写体特徴生成部１１０１が特定した被写体位置に基づいて、形状データを配置する。これにより、図１０（ｂ）に示すように、上から見た場合に顔の位置が格子点と一致するように形状データが配置される。 Figure 10 shows an example of a case where the subject's face is recognized and analyzed from shape data and coloring information, and the position of the face is identified and taken as the subject position. The subject feature generation unit 1101 identifies the position of the subject's face, as shown in Figure 10(a). The subject shape movement unit 407 also positions the shape data based on the subject position identified by the subject feature generation unit 1101. As a result, the shape data is positioned so that the position of the face coincides with the lattice points when viewed from above, as shown in Figure 10(b).

なお、被写体の特徴認識（例えば顔認識など）及び位置の特定は、撮影装置１１０により得られる複数の撮影画像から特徴（顔）が抽出され、さらにカメラパラメータ情報に基づいて特徴の位置（三次元空間上の位置）が算出されるものとする。しかしこれに限定されず、例えば仮想カメラを所定の位置に設定して仮想視点画像を生成し、生成された仮想視点画像を使用して特徴認識及び位置の特定が行われてもよい。 Note that feature recognition (e.g., facial recognition) and position identification of the subject are performed by extracting features (faces) from multiple captured images obtained by the image capture device 110, and then calculating the positions of the features (positions in three-dimensional space) based on camera parameter information. However, this is not limited to the above, and for example, a virtual camera may be set at a predetermined position to generate a virtual viewpoint image, and feature recognition and position identification may be performed using the generated virtual viewpoint image.

以上説明したように、本実施形態の画像生成装置１１００は、被写体の所定の部位の位置を特定し、特定した位置と基準位置とのずれに応じて形状データを配置し仮想視点画像を生成する。所定の部位は、顔に限らず、手や足、靴などであってもよい。これにより、例えば靴のＣＭ（コマーシャル）撮影を行う場合は、靴の特徴を識別し、靴の位置と基準位置とが一致する、又は位置のずれが所定の閾値より小さくなるように、形状データを配置することが可能となる。このようにすることで、靴の位置が所望の位置となるような仮想視点画像を生成することができる。 As described above, the image generating device 1100 of this embodiment identifies the position of a specific part of the subject, arranges shape data according to the deviation between the identified position and a reference position, and generates a virtual viewpoint image. The specific part is not limited to the face, but may be the hands, feet, shoes, etc. As a result, when shooting a shoe commercial, for example, it is possible to identify the characteristics of the shoes and arrange the shape data so that the position of the shoes matches the reference position or the position deviation is smaller than a specific threshold. In this way, a virtual viewpoint image can be generated in which the shoes are positioned in a desired position.

（第３の実施形態） Third Embodiment
第３の実施形態は、時間軸方向における被写体の重心位置に基づき、被写体の位置を変更する例である。図１１は、実施例３に係る画像生成装置１５００の機能構成を示す図である。画像生成装置１５００は、図４に示した第１の実施形態に係る画像生成装置１２０の被写体形状重心算出部４０６のかわりに、被写体形状平均重心算出部１５０１を有する。なお、本実施形態における画像生成装置１５００のハードウェア構成は、上述した実施形態と同様であるものとする。また、同様の機能構成については同じ符号を付し、説明を省略する。 The third embodiment is an example in which the position of the subject is changed based on the position of the center of gravity of the subject in the time axis direction. Fig. 11 is a diagram showing the functional configuration of an image generating device 1500 according to the third embodiment. The image generating device 1500 has a subject shape average center of gravity calculation unit 1501 instead of the subject shape center of gravity calculation unit 406 of the image generating device 120 according to the first embodiment shown in Fig. 4. Note that the hardware configuration of the image generating device 1500 in this embodiment is assumed to be the same as that of the above-mentioned embodiments. Also, the same reference numerals are given to the same functional configurations, and their description is omitted.

被写体形状平均重心算出部１５０１は、撮影装置１１０により得られる撮影画像の動画フレームのそれぞれにおいて、被写体の重心位置を算出する。例えばダンスシーン１０分間が撮影された場合には、各フレームにおいて被写体を真上から見下ろす視点での重心位置を特定し、さらに各動画フレームの重心位置の平均位置を算出する。被写体形状平均重心算出部１５０１は、算出した重心の平均位置を被写体位置とし、平均位置の情報を被写体形状移動部４０７へ出力する。 The subject shape average center of gravity calculation unit 1501 calculates the center of gravity position of the subject in each video frame of the captured image obtained by the imaging device 110. For example, if a 10-minute dance scene is captured, the center of gravity position from a viewpoint looking down on the subject from directly above is identified in each frame, and the average position of the center of gravity positions of each video frame is calculated. The subject shape average center of gravity calculation unit 1501 sets the calculated average position of the centers of gravity as the subject position, and outputs information on the average position to the subject shape movement unit 407.

図１２は、実施例３に係る画像生成装置１５００による画像処理を示すフローチャートである。なお、図１２に示す処理のうち、図５に示す各処理と同じ処理については、説明を省略する。 Figure 12 is a flowchart showing image processing by the image generating device 1500 according to the third embodiment. Among the processes shown in Figure 12, the description of those that are the same as the processes shown in Figure 5 is omitted.

Ｓ５０４の処理の後、被写体形状平均重心算出部１５０１は、被写体形状生成部４０５から入力された複数の撮影フレームの被写体形状の所定視点位置から見た場合の重心位置を算出する。また、被写体形状平均重心算出部１５０１は、各動画フレームにおける被写体の重心に基づいて、重心の平均位置を算出し、平均位置の情報を被写体形状移動部４０７へ出力する（Ｓ１６０１）。 After the processing of S504, the subject shape average center of gravity calculation unit 1501 calculates the center of gravity position when viewed from a specified viewpoint position of the subject shape of the multiple shooting frames input from the subject shape generation unit 405. In addition, the subject shape average center of gravity calculation unit 1501 calculates the average position of the center of gravity based on the center of gravity of the subject in each video frame, and outputs information on the average position to the subject shape movement unit 407 (S1601).

図１３は、被写体の重心の平均位置を表す図である。例えばダンスシーンの一連の中で、被写体は図１３（ａ）に示す矢印の方向に移動したとする。ここでは、被写体の演技中における移動前の時刻のフレームと演技中における移動後のフレームとで、被写体の重心位置が算出される。さらに、各フレームの重心の平均位置が算出される。図１３（ａ）においては、重心の平均位置として、点１３０１が特定される。 Figure 13 is a diagram showing the average position of the center of gravity of a subject. For example, suppose that in a series of dance scenes, the subject moves in the direction of the arrow shown in Figure 13(a). Here, the position of the center of gravity of the subject is calculated for a frame before the movement during the subject's performance and a frame after the movement during the performance. Furthermore, the average position of the center of gravity of each frame is calculated. In Figure 13(a), point 1301 is identified as the average position of the center of gravity.

図１３（ｂ）は、基準位置に基づいて重心の平均位置を移動させた場合の図である。被写体に対して真上から見下ろす視点で、被写体移動先位置情報に基づく格子点位置に、被写体の平均重心位置が一致するように形状データが配置される。これにより、被写体の位置は、一連の撮影の間は平均して一定の位置に配置され、その周辺で移動していくことになる。 Figure 13 (b) shows the case where the average position of the center of gravity is moved based on the reference position. When viewed from directly above the subject, the shape data is positioned so that the average center of gravity position of the subject coincides with the grid point position based on the subject destination position information. As a result, the subject is positioned at a constant position on average during a series of shots, and moves around that position.
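The effect described above (the subject sits at the reference position on average and moves around it) follows from applying one common offset to every frame, rather than a per-frame offset. The sketch below assumes the per-frame top-view centroids are already available as (x, y) pairs; the names `mean_position` and `recenter_track` are hypothetical.

```python
def mean_position(centroids):
    """Average of per-frame top-view centroids, each an (x, y) pair."""
    if not centroids:
        raise ValueError("no frames given")
    n = len(centroids)
    return (sum(c[0] for c in centroids) / n, sum(c[1] for c in centroids) / n)

def recenter_track(centroids, reference):
    """Shift every frame by the same offset so the mean lands on reference."""
    mx, my = mean_position(centroids)
    dx, dy = reference[0] - mx, reference[1] - my
    return [(x + dx, y + dy) for x, y in centroids]

# Illustrative track: the subject walks 2 m along x during the scene.
track = [(1.0, 2.0), (2.0, 2.0), (3.0, 2.0)]
print(recenter_track(track, (0.0, 0.0)))  # [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
```

Because the offset is constant over time, the subject's own motion within the scene is preserved; only where that motion is centered changes.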

以上、本実施形態によれば、撮影時間に対する被写体の重心の平均位置を算出し、所定格子点上に配置しなおすことで、シーンの一連の移動にも違和感なく被写体の再配置が可能となる。なお、本実施例では、平均重心位置を算出するとしたが、これに限らず、撮影の任意のフレームの被写体重心位置を基本位置として使用してもよいものとする。例えば、撮影開始時点のフレームの被写体重心位置を、床面の格子点位置に合わせて再配置することも可能であるし、撮影終了の直前のフレームの被写体重心位置を用いてもよいものとする。また、本実施形態においては、被写体の重心を特定し、重心の平均位置を被写体位置としたが、これに限定されない。例えば、第２の実施形態で説明した所定の部位の位置を特定し、その平均位置を被写体位置としてもよい。 As described above, according to this embodiment, the average position of the subject's center of gravity over the shooting time is calculated and the subject is repositioned on a predetermined lattice point, so that the subject can be repositioned without looking unnatural even when it moves through a scene. Although this embodiment calculates the average center of gravity position, this is not a limitation; the subject's center of gravity position in any captured frame may be used as the base position. For example, the subject's center of gravity position in the frame at the start of shooting may be aligned with a lattice point position on the floor surface, or the center of gravity position in the frame immediately before the end of shooting may be used. Also, although this embodiment identifies the subject's center of gravity and uses the average position of the center of gravity as the subject position, this is not a limitation. For example, the position of the predetermined part described in the second embodiment may be identified, and its average position may be used as the subject position.

（第４の実施形態） Fourth Embodiment
第４の実施形態では、被写体を移動させるのではなく、被写体の移動前後の位置関係に基づいて仮想カメラを相対的に移動して撮影して、移動前仮想カメラ画像と合成する方法について説明する。図１４は、第４の実施形態に係る画像生成装置の機能構成を示す図である。画像生成装置は、被写体位置差分算出部１９０１、複数仮想視点情報設定部１９０２、複数仮想視点画像生成部１９０３を有する。なお、本実施形態に係る画像生成装置のハードウェア構成は、上述した実施形態と同様であるものとする。また、同様の構成については、同じ符号を付し、説明を省略する。 In the fourth embodiment, a method is described in which, instead of moving the subject, the virtual camera is moved relatively based on the positional relationship of the subject before and after the movement, an image is captured from the moved virtual camera, and that image is composited with the pre-movement virtual camera image. Fig. 14 is a diagram showing the functional configuration of an image generating device according to the fourth embodiment. The image generating device has a subject position difference calculation unit 1901, a multiple virtual viewpoint information setting unit 1902, and a multiple virtual viewpoint image generation unit 1903. Note that the hardware configuration of the image generating device according to this embodiment is the same as that of the above-mentioned embodiments. Also, the same reference numerals are used for the same configurations, and their description is omitted.

被写体位置差分算出部１９０１は、被写体形状重心算出部４０６から被写体重心情報を入力する。また被写体位置設定部４０８から被写体の基準位置の情報を入力し、基準位置と被写体重心位置との差分を算出する。 The subject position difference calculation unit 1901 inputs subject center of gravity information from the subject shape center of gravity calculation unit 406. It also inputs information on the reference position of the subject from the subject position setting unit 408, and calculates the difference between the reference position and the subject center of gravity position.

複数仮想視点情報設定部１９０２は、ユーザ入力部４０９から入力された再生時刻情報と仮想視点情報に基づき、仮想空間内の現在位置と方向と再生時刻とを更新し、再生時刻情報と仮想視点情報とを複数仮想視点画像生成部１９０３へ出力する。また、複数仮想視点情報設定部１９０２は、被写体位置差分算出部１９０１から被写体の移動前位置と被写体の移動後位置との位置関係を被写体差分情報として入力した場合、その被写体差分情報に基づく差分仮想視点情報を生成する。 The multiple virtual viewpoint information setting unit 1902 updates the current position, direction, and playback time in the virtual space based on the playback time information and virtual viewpoint information input from the user input unit 409, and outputs the playback time information and virtual viewpoint information to the multiple virtual viewpoint image generation unit 1903. In addition, when the multiple virtual viewpoint information setting unit 1902 receives, from the subject position difference calculation unit 1901, the positional relationship between the pre-movement position and the post-movement position of the subject as subject difference information, it generates difference virtual viewpoint information based on that subject difference information.

複数仮想視点画像生成部１９０３は、複数仮想視点情報設定部１９０２から入力された再生時刻情報と仮想視点情報とに基づいた前景画像データと背景画像データとを撮影画像データ保存部４０３から入力する。またカメラパラメータをカメラパラメータ保持部４０４から入力し、移動後の形状データを被写体形状移動部４０７から入力する。その後、背景画像データを仮想視点位置から背景として見えるように投影変換や画像処理を施して仮想視点画像の背景とする。次に仮想視点位置から見た移動被写体形状に対して、該当時刻に実カメラで撮影された画像データによる色情報でレンダリング（着色処理）して仮想視点画像を生成する。また、被写体位置差分算出部１９０１が被写体差分情報を算出したことによる差分仮想視点情報を入力された場合、差分仮想視点情報の指定位置で仮想視点画像を生成し、ユーザ入力に基づく仮想視点画像と合成する。最後に仮想視点画像を画像出力部４１３へ出力する。 The multiple virtual viewpoint image generating unit 1903 inputs the foreground image data and background image data based on the playback time information and virtual viewpoint information input from the multiple virtual viewpoint information setting unit 1902 from the captured image data storage unit 403. The camera parameters are also input from the camera parameter holding unit 404, and the shape data after movement is input from the object shape moving unit 407. After that, the background image data is subjected to projection transformation and image processing so that it can be seen as the background from the virtual viewpoint position, and becomes the background of the virtual viewpoint image. Next, the moving object shape as seen from the virtual viewpoint position is rendered (colored) with color information based on the image data captured by the real camera at the corresponding time to generate a virtual viewpoint image. In addition, when difference virtual viewpoint information resulting from the object position difference calculation unit 1901 calculating the object difference information is input, a virtual viewpoint image is generated at the specified position of the difference virtual viewpoint information, and is composited with the virtual viewpoint image based on the user input. Finally, the virtual viewpoint image is output to the image output unit 413.

図１５は、仮想空間上の移動させる被写体位置における仮想カメラ位置の概念図である。床面の中心（０、０）の位置を原点として、移動後の被写体位置情報も同様に床面の中心（０、０）の位置とする。なお単位はメートルとする。移動前の被写体の位置が（０、２）であるとすると、基準位置（０、０）に被写体を配置する場合は、仮想カメラ２００１の位置（ｘ、ｙ）を、（０、２）移動した位置（ｘ、ｙ＋２）に移動される。そして、移動させた仮想カメラ２００１から被写体を撮影して仮想視点画像を生成する。さらに、生成した仮想視点画像と、移動させたい被写体以外を撮影した仮想カメラ２００１に対応する仮想視点画像と合成する。 Figure 15 is a conceptual diagram of the virtual camera position at the subject position to be moved in virtual space. The center (0,0) of the floor surface is set as the origin, and the subject position information after the movement is also set as the center (0,0) of the floor surface. The unit is meters. If the position of the subject before the movement is (0,2), when placing the subject at the reference position (0,0), the position (x,y) of the virtual camera 2001 is moved by (0,2) to a position (x,y+2). Then, the subject is photographed from the moved virtual camera 2001 to generate a virtual viewpoint image. Furthermore, the generated virtual viewpoint image is composited with a virtual viewpoint image corresponding to the virtual camera 2001 that photographed an object other than the subject to be moved.
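The camera offset in this example can be checked with a few lines of arithmetic. Shifting the virtual camera by (subject position before movement) minus (reference position) leaves the subject at the same position relative to the camera as if the subject itself had been moved to the reference position. The sketch below works in floor coordinates in meters; the function name `offset_virtual_camera` and the camera position are illustrative assumptions, and camera orientation is assumed unchanged.

```python
def offset_virtual_camera(camera_xy, subject_xy, reference_xy):
    """Shift the virtual camera by the subject's offset from the reference.

    Relative to the shifted camera, the unmoved subject appears where a
    subject placed at reference_xy would appear from the original camera.
    """
    dx = subject_xy[0] - reference_xy[0]
    dy = subject_xy[1] - reference_xy[1]
    return (camera_xy[0] + dx, camera_xy[1] + dy)

camera = (5.0, -4.0)     # hypothetical (x, y) of virtual camera 2001
subject = (0.0, 2.0)     # subject position before movement (from the example)
reference = (0.0, 0.0)   # reference position (from the example)

moved = offset_virtual_camera(camera, subject, reference)
print(moved)  # (5.0, -2.0), i.e. (x, y + 2)

# The subject's position relative to the shifted camera equals the
# reference position relative to the original camera.
rel_shifted = (subject[0] - moved[0], subject[1] - moved[1])
rel_original = (reference[0] - camera[0], reference[1] - camera[1])
print(rel_shifted == rel_original)  # True
```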

As described above, according to this embodiment, a virtual viewpoint image is generated by moving the virtual camera for the subject to be moved, and is then composited with the virtual viewpoint image corresponding to the virtual camera that captures the content other than the subject to be moved. This makes it possible to move the position of the subject solely by compositing virtual viewpoint images.

(Other Embodiments)
In the embodiments described above, an example was described in which the shape data of the subject is rearranged according to the deviation between the subject position and the reference position to generate a virtual viewpoint image. However, the present disclosure is not limited to this, and a virtual viewpoint image may instead be generated that includes information making it possible to identify that the subject position deviates from the reference position. For example, the shape data may be arranged at the identified subject position while information indicating the reference position (for example, lattice points or straight lines) is displayed. Alternatively, the shape data may be arranged at the subject position while a display is provided that makes a subject deviating from the reference position identifiable. Examples include surrounding the shape data of the deviating subject with a line, or changing the color or brightness of the subject to highlight it. With these methods, a user viewing the virtual viewpoint image can recognize that the subject position deviates from the reference position. For example, when a performer wants to check his or her standing position by viewing the generated virtual viewpoint image, the deviation of the performer's position can be easily identified regardless of the position of the virtual viewpoint.
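As a rough illustration of how deviating subjects might be selected for highlighting, the sketch below takes the nearest lattice point as each subject's reference position and flags subjects whose deviation is at or above a threshold. The use of the nearest lattice point, the function names, and the sample interval and threshold are assumptions made for illustration only.

```python
import math
from typing import Dict, List, Tuple

Vec2 = Tuple[float, float]  # (x, y) on the floor plane, in meters


def nearest_lattice_point(position: Vec2, interval: float) -> Vec2:
    """Return the lattice point closest to `position` for a square lattice."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    x = math.floor(position[0] / interval + 0.5) * interval
    y = math.floor(position[1] / interval + 0.5) * interval
    return (x, y)


def find_deviating_subjects(subject_positions: Dict[str, Vec2],
                            interval: float,
                            threshold: float) -> List[str]:
    """List the IDs of subjects whose deviation is at or above `threshold`."""
    flagged: List[str] = []
    for subject_id, position in subject_positions.items():
        reference = nearest_lattice_point(position, interval)
        deviation = math.hypot(position[0] - reference[0],
                               position[1] - reference[1])
        if deviation >= threshold:
            flagged.append(subject_id)  # candidate for outline or highlight
    return flagged
```

For instance, with a 1 m lattice and a 0.3 m threshold, a subject at (0.1, 2.0) would be left as is, while a subject at (1.4, 0.0) would be flagged for highlighting.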

The present disclosure can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

120 Image generating device
405 Subject shape generating unit
406 Subject shape center of gravity calculating unit
408 Subject position setting unit
412 Virtual viewpoint image generating unit

Claims

an acquisition means for acquiring shape data representing a shape of a subject based on a plurality of captured images obtained by capturing images of the subject in a capturing area using a plurality of image capturing devices;
a first specifying means for specifying a plurality of subject positions corresponding to positions of a plurality of subjects in the shooting area;
a second specifying means for specifying a reference position based on a plurality of subject positions specified by the first specifying means;
and a generation means for generating a virtual viewpoint image according to a deviation between a subject position identified by the first identification means and a reference position identified by the second identification means, based on the shape data acquired by the acquisition means.

The generating device according to claim 1, characterized in that the generating means generates a virtual viewpoint image by arranging the shape data based on the reference position in response to the deviation being equal to or greater than a predetermined threshold value.

The generating device according to claim 1, characterized in that the generating means changes the subject position identified by the first identification means so that the deviation becomes smaller than a predetermined threshold value, and generates the virtual viewpoint image by arranging the shape data based on the changed subject position.

The generating device according to claim 1, characterized in that the generating means changes the subject position based on the deviation so that the subject position identified by the first identification means coincides with the reference position, and generates the virtual viewpoint image by arranging the shape data based on the changed subject position.

The generating device according to any one of claims 1 to 4, characterized in that the reference position is identified based on lattice points arranged at a predetermined interval.

The generating device according to any one of claims 1 to 5, characterized in that the reference position is a position outside the imaging area.

The generating device according to any one of claims 1 to 6, characterized in that the second identification means identifies the reference position so that the multiple subject positions are spaced apart by a predetermined distance.

The generating device according to any one of claims 1 to 7, characterized in that the second identification means identifies the reference position so that the multiple subject positions are positions on a predetermined straight line.

The generating device according to any one of claims 1 to 8, characterized in that the first identification means identifies the subject position based on the shape of the subject represented by the shape data acquired by the acquisition means.

The generating device according to claim 9, characterized in that the first identification means identifies, as the subject position, the position of the center of gravity in the shape of the subject represented by the shape data.

The generating device according to claim 9, characterized in that the first identification means identifies, as the subject position, the position of a predetermined part in the shape of the subject represented by the shape data.

The generating device according to any one of claims 1 to 11, characterized in that the generating means generates a virtual viewpoint image including information that allows the deviation to be identified.

The generating device according to claim 12, characterized in that the generating means generates a virtual viewpoint image including information representing the subject position identified by the first identifying means and the reference position identified by the second identifying means as information that can identify the deviation.

The generating device according to claim 12 or 13, characterized in that the generating means generates, as the information that allows the deviation to be identified, a virtual viewpoint image including information that allows identification of a subject for which a position deviating from the reference position has been identified as the subject position by the first identifying means.

an acquisition step of acquiring shape data representing a shape of a subject based on a plurality of captured images obtained by capturing images of the subject in a capturing area using a plurality of image capturing devices;
a first specifying step of specifying a plurality of subject positions corresponding to positions of a plurality of subjects in the shooting area;
a second specifying step of specifying a reference position based on the plurality of subject positions specified by the first specifying step;
and a generation step of generating a virtual viewpoint image according to a deviation between the position identified in the first identification step and the reference position identified in the second identification step, based on the shape data acquired in the acquisition step.

A program for causing a computer to function as a generating device according to any one of claims 1 to 14.

an acquisition means for acquiring shape data representing a shape of a subject based on a plurality of captured images obtained by capturing images of the subject in a capturing area using a plurality of image capturing devices;
a first specifying means for specifying a plurality of subject positions corresponding to positions of a plurality of subjects in the photographing area;
a second specifying means for specifying a reference position based on a plurality of subject positions specified by the first specifying means;
and a generation means for generating a virtual viewpoint image according to a deviation between the subject position identified by the first identification means and the reference position identified by the second identification means, based on the shape data acquired by the acquisition means.