JP2008010966A

JP2008010966A - Moving picture generation system and method

Info

Publication number: JP2008010966A
Application number: JP2006176937A
Authority: JP
Inventors: Kazuhiro Omura (大村和弘)
Original assignee: Xing Inc
Current assignee: Xing Inc
Priority date: 2006-06-27
Filing date: 2006-06-27
Publication date: 2008-01-17
Anticipated expiration: 2026-06-27
Also published as: JP4981370B2

Abstract

PROBLEM TO BE SOLVED: To offer users a new way to enjoy karaoke by compositing a subject image extracted from live camera video with a 3D moving picture whose viewpoint changes at random.

SOLUTION: The moving picture generation system 10 included in a karaoke system 1 photographs a user U singing a karaoke piece with a camera instrument 11 and sends the video sequentially to a chromakey device 20. A 3D moving picture generator 30 generates a 3D moving picture containing three-dimensional images of back dancers, in which the position from which the back dancers are viewed changes at random, and sends it to the chromakey device 20. The chromakey device 20 extracts the subject image of the user U from the video sent by the camera instrument 11 and composites it with the 3D moving picture sent by the 3D moving picture generator 30, producing a composite image in which the user appears in a scene whose view of the back dancers switches at random.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a moving image generation system and a moving image generation method that generate a composite moving image by compositing an image of an actually photographed subject with a moving image containing a three-dimensional human body image (or the like) that can perform various motions and whose viewing position can be changed freely.

Conventionally, three-dimensional moving images have been created using motion capture technology (see Patent Document 1). In motion capture, a plurality of markers serving as data acquisition targets are attached to required parts of a person's body, and the person performs movements such as dancing or sports; motion data (posture information) representing the coordinate values and angles of the marked parts in a three-dimensional coordinate system is thereby acquired.

If a moving object representing a human body (a three-dimensional human body image modeled on a human body) is created from motion data acquired in this way, a moving image can be created in which the moving object performs various motions in postures based on the motion data. In addition, because the position from which the moving object is viewed can be specified in various ways, a moving image based on motion data can freely change the direction from which the moving object is seen: for example, the moving image can be generated with the moving object viewed from directly in front, or viewed obliquely from above.
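How the viewing position changes what appears on screen can be illustrated with a standard look-at camera and pinhole projection. This is a minimal sketch, not the renderer described in this publication: the +Y world "up" axis, the focal length, the function names, and the two example viewpoints are all assumptions chosen for illustration.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v: Vec3) -> Vec3:
    length = math.sqrt(_dot(v, v))
    if length == 0.0:
        # Happens e.g. when looking straight up or down along the assumed up axis.
        raise ValueError("degenerate camera basis (zero-length vector)")
    return (v[0] / length, v[1] / length, v[2] / length)


def project(point: Vec3, eye: Vec3, target: Vec3,
            focal: float = 1.0) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a world-space point (e.g. one motion-capture joint)
    for a camera at `eye` looking at `target`, with +Y assumed to be "up".
    Returns None when the point is at or behind the camera."""
    forward = _normalize(_sub(target, eye))
    right = _normalize(_cross(forward, (0.0, 1.0, 0.0)))
    up = _cross(right, forward)
    d = _sub(point, eye)
    x, y, z = _dot(d, right), _dot(d, up), _dot(d, forward)
    if z <= 0.0:
        return None
    return (focal * x / z, focal * y / z)


# Hypothetical viewpoint information: (eye, target) pairs for
# "directly in front" and "obliquely from above".
VIEWPOINTS = {
    "front": ((0.0, 1.0, 5.0), (0.0, 1.0, 0.0)),
    "above": ((0.0, 5.0, 4.0), (0.0, 1.0, 0.0)),
}
```

The same joint, for example a hand at (1, 1, 0), projects to different image coordinates under each viewpoint, which is what lets the posture data stay fixed while the on-screen angle changes.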

Separately from the motion capture techniques described above, it has long been common practice to extract only the image of a subject from video containing the photographed subject (live-action video) and composite the extracted subject image with another image. Methods for extracting only the subject image from live-action video include the chroma key method, the rotoscoping method, and the difference matting method.
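Difference matting, one of the methods listed, compares each frame against a "clean plate" shot of the same scene without the subject. The sketch below is a minimal illustration under assumptions of our own: 8-bit RGB frames stored as nested lists, and a hand-tuned threshold on the summed channel difference. Practical implementations typically also denoise and soften the mask edges.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # 8-bit (R, G, B)
Image = List[List[Pixel]]      # rows of pixels
Mask = List[List[bool]]        # True = subject pixel


def difference_matte(frame: Image, clean_plate: Image, threshold: int = 60) -> Mask:
    """Mark a pixel as subject when its summed absolute RGB difference from the
    empty-scene clean plate exceeds `threshold` (an assumed tuning value)."""
    if len(frame) != len(clean_plate) or any(
            len(a) != len(b) for a, b in zip(frame, clean_plate)):
        raise ValueError("frame and clean plate must have the same size")
    return [
        [sum(abs(c - d) for c, d in zip(p, q)) > threshold
         for p, q in zip(row_f, row_b)]
        for row_f, row_b in zip(frame, clean_plate)
    ]
```

Small differences such as sensor noise stay below the threshold, while the subject's pixels differ strongly from the empty background.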

Further, the other image with which a subject image extracted by the above methods is composited is not limited to still images or moving images prepared in advance; some techniques composite with images generated on the fly. For example, Patent Document 2 discloses providing, in addition to a processing unit that performs image composition, a video camera that photographs a singer; a graphic image (for example, an image of balloons flying about) is generated in response to detection results based on the singer's video captured by the video camera, and the singer's video is composited with this graphic image. Patent Document 2 also describes detecting the motions of the singer being photographed and, for example, changing the graphic image to be composited when the singer's right hand touches a balloon in the graphic image.

Further, Patent Document 3 discloses providing a processing unit that performs image composition and a camera that photographs an actor, generating three-dimensional CG data by adding CG motion information, input by an operator at a CG operation unit, to three-dimensional model data of a CG (computer graphics) character stored in advance, and three-dimensionally compositing the actor's video captured by the camera with the generated three-dimensional CG data. Patent Document 3 also describes that, when the composited video is displayed, a virtual video as it would be seen from the operator's viewpoint (that is, the viewpoint of the CG character) is presented so that the operator of the CG operation unit can easily follow it.

Patent Document 1: Japanese Patent Laid-Open No. 10-222668
Patent Document 2: Japanese Patent Laid-Open No. 5-232861
Patent Document 3: Japanese Patent Laid-Open No. 2000-23037

In the technique described in Patent Document 2, the graphic image with which the captured image is composited is two-dimensional, so the composited video cannot express three-dimensional depth or three-dimensional changes in the image.

In the technique described in Patent Document 3, the three-dimensional CG data is generated by adding CG motion information input by the operator, so the direction from which the three-dimensional CG data is viewed cannot be changed. The technique merely composites three-dimensional CG data at a fixed angle with the captured video; the angle of the CG data in the composited video varies little, and the content tends to bore users.

The present invention has been made in view of these circumstances, and an object thereof is to provide a moving image generation system and a moving image generation method that can generate composite images with a wide variety of angle changes by using, as the compositing target for the image of a photographed subject, a moving image in which the direction from which a moving object is viewed can be changed.

Another object of the present invention is to provide a moving image generation system that can generate composite images whose content varies every time, by changing the position from which the moving object is viewed based on various conditions.

A further object of the present invention is to provide a moving image generation system that broadens the range of uses of the generated composite images by combining them with karaoke, by allowing them to be stored on a storage medium, and by allowing them to be distributed over a network.

To solve the above problems, a moving image generation system according to the present invention comprises moving image generation means for generating a moving image that includes a moving object whose posture and viewpoint are specified on the basis of posture information, which defines the posture of the moving object in a three-dimensional coordinate system for each unit of time, and viewpoint information, which defines the position from which the moving object is viewed. The system is characterized by further comprising: photographing means for photographing a subject; image extraction means for extracting a subject image contained in the video captured by the photographing means; and composite moving image generation means for generating a composite moving image by compositing the subject image extracted by the image extraction means with the moving image generated by the moving image generation means.

A moving image generation method according to the present invention is a method in which a moving image generation system generates a moving image that includes a moving object whose posture and viewpoint are specified on the basis of posture information, which defines the posture of the moving object in a three-dimensional coordinate system for each unit of time, and viewpoint information, which defines the position from which the moving object is viewed. The method is characterized in that the moving image generation system photographs a subject, extracts a subject image contained in the captured video, and composites the extracted subject image with the moving image to generate a composite moving image.
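The claimed flow (photograph, extract the subject, render the moving object from posture and viewpoint, composite) can be sketched as a per-frame function. This is a hypothetical decomposition for illustration only: `extract` and `render` are placeholders standing in for the image extraction means and the moving image generation means, and a hard binary mask is assumed rather than a soft alpha matte.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]        # True = subject pixel


def composite(subject_frame: Image, mask: Mask, rendered: Image) -> Image:
    """Show the camera pixel where the mask marks the subject,
    otherwise the pixel of the rendered 3D moving image."""
    return [
        [s if m else r for s, m, r in zip(s_row, m_row, r_row)]
        for s_row, m_row, r_row in zip(subject_frame, mask, rendered)
    ]


def composite_frame(camera_frame: Image,
                    extract: Callable[[Image], Mask],
                    render: Callable[[object, object], Image],
                    posture: object,
                    viewpoint: object) -> Image:
    """One frame of the flow: extract the subject, render the 3D scene, composite."""
    mask = extract(camera_frame)                     # image extraction means
    rendered = render(posture, viewpoint)            # moving image generation means
    return composite(camera_frame, mask, rendered)   # composite moving image generation means
```

Keeping extraction and rendering as injected callables mirrors the way the claim separates the means, so either can be swapped (chroma key or difference matting; any renderer) without touching the compositing step.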

In the present invention, the subject image extracted from the captured video is composited with a moving image containing a moving object whose three-dimensional form is specified using posture information and viewpoint information, so the direction from which the moving object appearing in the composite moving image is viewed can be varied freely. As a result, composite images with a wide variety of angle changes can be generated and provided to various users for various purposes. In particular, by photographing a user as the subject of the photographing means, the user can be made to appear in a moving image containing a moving object whose viewpoint can be changed, realizing a new amusement service.

The moving image generation system according to the present invention is further characterized by comprising: music acquisition means for acquiring a piece of music; music playback means for performing playback processing of the music acquired by the music acquisition means; and display processing means for performing display processing of the composite moving image in step with the playback processing of the music playback means.

In the present invention, display processing of the generated composite moving image is performed in step with playback of the acquired music, so a composite moving image containing a moving object that performs varied motions, and whose angle can be changed freely, can be displayed as the music progresses, entertaining the user both aurally and visually.

Further, in the moving image generation system according to the present invention, the music acquisition means acquires music accompanied by characters representing its lyrics; the system comprises character compositing means for compositing the characters accompanying the music with the composite moving image; and the display processing means performs display processing of the composite moving image with which the characters have been composited by the character compositing means.
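To composite the lyric characters, the display side needs the lyric line that is active at the current playback position. A minimal sketch, assuming (hypothetically) that the lyrics accompanying the music are timed lines in milliseconds; the actual data format of the karaoke music is not specified in this section.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical timed-lyrics format: (start_ms, end_ms, text), sorted by start_ms.
LyricLine = Tuple[int, int, str]


def active_lyric(lines: List[LyricLine], t_ms: int) -> Optional[str]:
    """Return the lyric line to overlay at playback time t_ms, or None between lines."""
    starts = [start for start, _, _ in lines]
    i = bisect.bisect_right(starts, t_ms) - 1   # index of the last line that has started
    if i >= 0 and t_ms < lines[i][1]:
        return lines[i][2]
    return None
```

The returned text would then be drawn over each composite frame by the character compositing step; returning None lets the display show no telop during instrumental gaps.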

In the present invention, the characters representing the lyrics accompanying the music are composited with the composite moving image for display, so a system well suited to karaoke can be provided. That is, since the lyric telop (characters) is displayed in the composite moving image, the user can sing while reading the telop. In particular, if the photographing means photographs the singing user and the moving object in the moving image is a human body image modeled on a back dancer, the user can experience, in simulated form, being a singer accompanied by back dancers making varied movements; furthermore, if the user performs choreography to the karaoke song, the user can check his or her own moves by watching the composite moving image. This adds a new function that entertains karaoke users. Moreover, when the present invention is applied to karaoke, not only the singing user but also users watching the displayed composite moving image can enjoy a display in which the singer appears, strengthening the sense of solidarity (togetherness) between the singing and watching users.

Furthermore, in the moving image generation system according to the present invention, the music acquisition means acquires music accompanied by a plurality of pieces of viewpoint information in order of music progression, and the moving image generation means generates the moving image based on the viewpoint information corresponding to the current point of progress of the playback processing by the music playback means.
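One way to realize "viewpoint information corresponding to the current point of progress" is a cue list keyed by playback time, where each cue holds until the next one starts. A minimal sketch; the cue format, millisecond units, and viewpoint names are hypothetical.

```python
import bisect
from typing import List, Tuple

# Hypothetical cue list attached to the music: (start_ms, viewpoint_id),
# in order of music progression.
ViewpointCue = Tuple[int, str]


def viewpoint_at(cues: List[ViewpointCue], t_ms: int) -> str:
    """Return the viewpoint whose cue most recently started at playback time t_ms.
    Before the first cue, the first viewpoint is used."""
    if not cues:
        raise ValueError("no viewpoint cues")
    starts = [start for start, _ in cues]
    i = bisect.bisect_right(starts, t_ms) - 1
    return cues[max(i, 0)][1]
```

A binary search per frame keeps the lookup cheap even for long songs with many cues, and tying cues to playback time (rather than wall-clock time) keeps the camera changes aligned with sections such as the intro or chorus.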

In the present invention, music accompanied by a plurality of pieces of viewpoint information in order of progression is played back, and the moving image is generated based on the viewpoint information corresponding to the current point of playback, so the position from which the moving object in the displayed composite moving image is viewed changes as playback of the music progresses. The user can therefore enjoy angle changes of the moving object in the composite moving image that are linked to playback of the music.

Furthermore, the moving image generation system according to the present invention comprises selection means for randomly selecting one piece of viewpoint information from among a plurality of pieces of viewpoint information, and the moving image generation means generates the moving image based on the selected viewpoint information.
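The selection means can be sketched with Python's standard `random` module. One assumption goes beyond the text: the sketch avoids picking the same viewpoint twice in a row (when more than one is available) so that every switch is visible. The viewpoint names are hypothetical.

```python
import random
from typing import Optional, Sequence


def pick_viewpoint(viewpoints: Sequence[str], rng: random.Random,
                   previous: Optional[str] = None) -> str:
    """Randomly select one viewpoint; as an added assumption, avoid an
    immediate repeat of `previous` whenever another viewpoint exists."""
    if not viewpoints:
        raise ValueError("no viewpoints to choose from")
    candidates = [v for v in viewpoints if v != previous] or list(viewpoints)
    return rng.choice(candidates)
```

Passing an explicit `random.Random` instance makes a run reproducible with a fixed seed (useful for testing) while still allowing unseeded, varied selections in normal use.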

In the present invention, viewpoint information is randomly selected from among a plurality of pieces of viewpoint information and the moving image is generated based on it, so a composite moving image in which the position from which the moving object is viewed varies randomly can be generated, providing the user with composite moving images featuring varied viewpoint changes.

The moving image generation system according to the present invention further comprises: image position detection means for detecting the position of the subject image contained in the video captured by the photographing means; a position correspondence table that associates each position in the video with one of a plurality of pieces of viewpoint information; and means for selecting, from the position correspondence table, the viewpoint information corresponding to the position detected by the image position detection means. The moving image generation means generates the moving image based on the viewpoint information selected from the position correspondence table.
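A position correspondence table can be as simple as a mapping from a coarse region of the frame to a viewpoint. In this hypothetical sketch, the subject position is the mean column of the extracted subject mask, the frame is split into three equal vertical bands, and the viewpoint ids are placeholders; none of these choices is specified in the text.

```python
from typing import Dict, List, Optional

Mask = List[List[bool]]  # True = subject pixel

# Hypothetical position correspondence table: horizontal band -> viewpoint id.
POSITION_TABLE: Dict[str, str] = {
    "left": "viewpoint_1",
    "center": "viewpoint_2",
    "right": "viewpoint_3",
}


def subject_centroid_x(mask: Mask) -> Optional[float]:
    """Image position detection: mean column index of the subject pixels."""
    xs = [x for row in mask for x, on in enumerate(row) if on]
    return sum(xs) / len(xs) if xs else None


def zone_of(x: float, width: int) -> str:
    """Split the frame into three equal vertical bands."""
    if x < width / 3:
        return "left"
    if x < 2 * width / 3:
        return "center"
    return "right"


def viewpoint_for_position(mask: Mask,
                           table: Dict[str, str] = POSITION_TABLE) -> Optional[str]:
    """Look up the viewpoint for the detected subject position; None if no subject."""
    cx = subject_centroid_x(mask)
    if cx is None:
        return None
    return table[zone_of(cx, len(mask[0]))]
```

Returning None when no subject is found lets the caller keep the current viewpoint instead of jumping to an arbitrary one.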

In the present invention, a moving image in which the position from which the moving object is viewed changes is generated based on the position of the subject image in the captured video, so the angle of the moving object in the composite image also varies with the position of the subject, yielding a composite moving image in which the angle of the moving object changes in conjunction with changes in the subject's position. Thus, if the subject is the user, the user can change the angle of the moving object in the composite moving image by actively moving around; by controlling where he or she appears in the captured video, the user can freely control the angle of the moving object. From the standpoint of obtaining a composite moving image with an optimal composition, it is preferable to change the position from which the moving object is viewed, with reference to the detected position of the subject image, so that the moving object does not overlap the subject image.

Furthermore, the moving image generation system according to the present invention comprises: motion detection means for detecting motions of the subject image in the video captured by the photographing means; a motion correspondence table that associates each motion of the subject image with one of a plurality of pieces of viewpoint information; and means for selecting, from the motion correspondence table, the viewpoint information corresponding to the motion detected by the motion detection means. The moving image generation means generates the moving image based on the viewpoint information selected from the motion correspondence table.
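The motion correspondence table works like the position table, except the key is a detected motion. The motion detector below is a deliberately coarse, hypothetical stand-in: it compares the topmost subject row and the mean subject column between two consecutive masks and labels the change as "rise" (arms raised or a jump), "step_left", or "step_right". The labels, threshold, and viewpoint ids are assumptions; left and right are in image coordinates, which are mirrored relative to the user if the camera image is not flipped.

```python
from typing import Dict, List, Optional, Tuple

Mask = List[List[bool]]  # True = subject pixel

# Hypothetical motion correspondence table: detected motion -> viewpoint id.
MOTION_TABLE: Dict[str, str] = {
    "rise": "viewpoint_low_angle",
    "step_left": "viewpoint_left",
    "step_right": "viewpoint_right",
}


def _top_and_center(mask: Mask) -> Optional[Tuple[int, float]]:
    """Topmost subject row and mean subject column, or None if no subject."""
    coords = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    if not coords:
        return None
    top = min(y for _, y in coords)
    cx = sum(x for x, _ in coords) / len(coords)
    return top, cx


def classify_motion(prev: Mask, curr: Mask, min_shift: float = 1.0) -> Optional[str]:
    """Very coarse motion detection between two consecutive subject masks."""
    a, b = _top_and_center(prev), _top_and_center(curr)
    if a is None or b is None:
        return None
    (top0, cx0), (top1, cx1) = a, b
    if top0 - top1 >= min_shift:   # image y grows downward: a smaller top means the subject rose
        return "rise"
    if cx1 - cx0 >= min_shift:
        return "step_right"
    if cx0 - cx1 >= min_shift:
        return "step_left"
    return None


def viewpoint_for_motion(prev: Mask, curr: Mask,
                         table: Dict[str, str] = MOTION_TABLE) -> Optional[str]:
    """Select the viewpoint for the detected motion; None if no motion is detected."""
    motion = classify_motion(prev, curr)
    return table.get(motion) if motion is not None else None
```

A real motion detector would more likely track body parts over several frames; the table lookup itself stays the same whatever detector produces the motion label.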

In the present invention, a moving image in which the position from which the moving object is viewed changes is generated based on the motions of the subject image in the captured video, so the angle of the moving object in the composite moving image also changes in conjunction with the subject's motions. If the subject is the user, the user can therefore control the angle in the composite moving image through his or her own movements, adding a new function that entertains users of the composite moving image service.

Furthermore, in the moving image generation system according to the present invention, the posture information defines postures in the three-dimensional coordinate system for a plurality of moving objects, and the moving image generation means generates a moving image containing the plurality of moving objects based on the posture information.
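Posture information for several moving objects, sampled once per unit of time, can be held per object and queried at any render time. This sketch makes two simplifying assumptions that are not in the text: only joint positions are stored (real motion data also carries joint angles, which are usually interpolated with quaternion slerp rather than linearly), and positions between samples are linearly interpolated. The object ids and joint names are hypothetical.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Frame = Dict[str, Vec3]                 # joint name -> position
PostureInfo = Dict[str, List[Frame]]    # moving-object id -> one frame per unit time


def _lerp(a: Vec3, b: Vec3, w: float) -> Vec3:
    """Linear interpolation between two positions (w = 0 gives a, w = 1 gives b)."""
    return (a[0] + (b[0] - a[0]) * w,
            a[1] + (b[1] - a[1]) * w,
            a[2] + (b[2] - a[2]) * w)


def poses_at(info: PostureInfo, t: float, unit: float) -> Dict[str, Frame]:
    """Posture of every moving object at time t (seconds), interpolating joint
    positions between samples taken every `unit` seconds. Holds the last sample
    once t runs past the end of an object's data."""
    result: Dict[str, Frame] = {}
    for body, frames in info.items():
        if not frames:
            continue
        pos = max(t / unit, 0.0)
        i = min(int(pos), len(frames) - 1)
        j = min(i + 1, len(frames) - 1)
        w = pos - i if j != i else 0.0
        result[body] = {joint: _lerp(frames[i][joint], frames[j][joint], w)
                        for joint in frames[i]}
    return result
```

Each returned posture would then be rendered from the current viewpoint, so several back dancers share one camera while moving independently.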

In the present invention, since the posture information defines postures for a plurality of moving objects, a moving image containing a plurality of moving objects is generated, increasing the number of moving objects in the composite moving image and allowing more dynamic content to be provided to the user. For example, if the present invention is applied to karaoke with the user photographed as the subject and the moving objects rendered as human body images modeled on back dancers, the user can enjoy the simulated atmosphere of singing surrounded by several back dancers, enhancing the fun of karaoke.

The moving image generation system according to the present invention further comprises means for storing background information that defines the position of a background in the three-dimensional coordinate system, and the moving image generation means generates a moving image containing the background based on the stored background information.

In the present invention, a moving image containing a background in addition to the moving object is generated, so a background is added to the composite moving image as well, making its content more detailed and more pleasing to the eye. In particular, if the present invention is applied to karaoke with the user photographed as the subject, the moving objects rendered as human body images modeled on back dancers, and the background set as a stage, the user can enjoy the simulated atmosphere of singing on stage backed by the dancers, further enhancing the visual enjoyment of karaoke.

Furthermore, the moving image generation system according to the present invention is characterized by comprising storage processing means for storing the composite moving image generated by the composite moving image generation means on a storage medium.

In the present invention, the generated composite moving image can be stored on a storage medium. For example, by storing a composite moving image in which the user appears on a removable storage medium such as a DVD and playing it back, the user can enjoy the simulated experience at home as well, and services usable at a wide variety of events, such as wedding after-parties, class reunions, and auditions, can be provided.

Furthermore, the moving image generation system according to the present invention comprises receiving means for receiving a moving image request signal transmitted over a network, and moving image transmission means for transmitting, when the receiving means receives a moving image request signal, a moving image to the sender of that request signal; the moving image transmission means transmits the composite moving image generated by the composite moving image generation means.

In the present invention, the generated composite moving image can be delivered over the network to a requesting user, so it can be distributed widely via the network, broadening the range of uses of the generated composite moving image.

In the present invention, the subject image extracted from the captured video is composited with a moving image containing a moving object whose three-dimensional state is specified using viewpoint information in addition to posture information, so the direction from which the moving object in the generated composite moving image is viewed can be changed; by varying that direction, the composite moving image can entertain the user with a different form of expression every time.

Further, in the present invention, display processing of the generated composite moving image is performed in step with playback of the acquired music, so a composite moving image in which the angle at which the moving object is viewed changes as the music progresses can be displayed to the user, providing enjoyment both aurally and visually.

Further, in the present invention, characters representing the lyrics accompanying the music are composited with the composite moving image for display, so the lyrics are shown within a composite moving image in which the direction from which the moving object is viewed can vary. This realizes a system well suited to karaoke: a user singing karaoke can enjoy a simulated stage experience, and users watching the displayed composite moving image can enjoy seeing acquaintances, friends, and others appear on the screen.

Furthermore, in the present invention, since music accompanied by a plurality of pieces of viewpoint information is used, the position from which the moving object in the composite moving image is viewed changes according to the progress of music playback, entertaining the user with display content that matches the state of playback, such as the intro or the climax of the song.

Further, in the present invention, viewpoint information is selected at random from among a plurality of pieces of viewpoint information that each define a different viewing position, and the moving image is generated based on it, so the position from which the moving object is viewed changes every time, providing the user with composite moving images that do not become tiresome.

Further, in the present invention, the position from which the moving object in the composite moving image is viewed can be changed based on the position of the subject image in the captured video, realizing a system in which the user being photographed can control the position of the moving object by changing where he or she is photographed.

Furthermore, in the present invention, the position from which the moving object in the composite moving image is viewed can be changed based on the motions of the subject image in the captured video, realizing a system in which the user being photographed can control the position of the moving object by changing his or her motions during shooting.

In the present invention, since the posture information defines postures for a plurality of moving objects, the user can be provided with a dynamic composite moving image in which the angles of the plurality of moving objects change.

Further, in the present invention, a moving image containing a background in addition to the moving object is generated, so moving objects and a background whose viewpoint position can be changed appear in the composite moving image, allowing composite moving images with even more varied three-dimensional content to be created.

In the present invention, the generated composite moving image is stored on a storage medium, providing an opportunity for many people to watch it via that storage medium.

Further, in the present invention, the generated composite moving image can be delivered over a network to a requesting user, giving many people an easy opportunity to watch it via the network.

FIG. 1 shows the overall configuration of a karaoke system 1 to which a moving image generation system 10 according to a first embodiment of the present invention is applied. Using the moving image generation system 10, the karaoke system 1 generates a composite moving image in which a user U singing a karaoke song appears and displays it on a large display 2. The user U can experience, in simulated form, singing in front of three back dancers 15a to 15c performing varied movements, while the surrounding users can enjoy the singing user U's performance; the system is thus characterized by offering a new kind of karaoke enjoyment to both singers and watchers.

The karaoke system 1 of the first embodiment employs the chroma key method to capture the subject image of the photographed user U, and a blue wall member 6 is provided behind and around the place where the user U singing the karaoke song stands. The karaoke system 1 also has a sub display 4, arranged facing the wall member 6, on which the singing user U can check the lyrics, his or her own captured image, and so on. Furthermore, in the karaoke system 1, a karaoke music server 5 that distributes karaoke songs is connected to the moving image generation system 10 via a network NW, and the large display 2 and left and right speakers 3a and 3b are provided so that the surrounding users can watch the generated composite moving image and hear the user U singing.
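Because the backing behind the user is a known blue (wall member 6), chroma key extraction can be sketched as a per-pixel test for "strongly blue" pixels. The thresholds, the blue-dominance rule, and the nested-list RGB frame format are assumptions for illustration; the internals of the chroma key device 20 are not described here, and production keyers also handle color spill and soft edges.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # 8-bit (R, G, B)
Image = List[List[Pixel]]
Mask = List[List[bool]]        # True = keep (subject)


def blue_screen_mask(frame: Image, min_blue: int = 100, margin: int = 40) -> Mask:
    """Key out pixels that look like the blue backing: blue channel at least
    `min_blue` and exceeding both red and green by at least `margin`."""
    return [
        [not (b >= min_blue and b - max(r, g) >= margin) for (r, g, b) in row]
        for row in frame
    ]
```

For example, a backing pixel (20, 40, 200) is removed and a skin tone (220, 180, 150) is kept, but a bright-blue costume pixel such as (30, 30, 160) would also be removed, which is why performers in front of a blue backing generally avoid wearing blue.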

The moving image generation system 10 of this embodiment, as applied to the karaoke system 1, corresponds to the area enclosed by the wavy line in FIG. 1, and includes a camera device 11, a chroma key device 20, a 3D moving image generation device 30, a distribution device 40, a karaoke device 41, a storage device 44, and a 3D moving image distribution server 45. Of these, only the camera device 11, the chroma key device 20, and the 3D moving image generation device 30 are essential to the moving image generation system 10; the other parts (the distribution device 40, the karaoke device 41, and so on) are peripheral devices added optionally according to the type of service to which the system is applied.

Because the moving image generation system 10 of the first embodiment is applied to the karaoke system 1, it includes the distribution device 40 and the karaoke device 41. It also includes the storage device 44 and the 3D moving image distribution server 45 so that the generated composite moving image can be stored on a storage medium (DVD) and distributed over the network NW. The devices 11, 20, etc. of the moving image generation system 10 are described below, starting with the added peripheral devices (the distribution device 40, the karaoke device 41, and so on).

The distribution device 40 distributes the composite moving image generated by the chroma key device 20 to a plurality of destinations. Specifically, it sends the composite moving image received from the chroma key device 20 over a third video line V3 both to the large display 2 over a fourth video line V4 and to the karaoke device 41 over a fifth video line V5.
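In the embodiment the distribution device 40 is a piece of video hardware; purely as a software analogy, its fan-out behaviour can be sketched as below. The sketch assumes frames arrive as discrete objects and that each destination is a callable; the names `distribute` and `Frame` are hypothetical and do not come from the source.

```python
from typing import Callable, Iterable, List

Frame = bytes  # hypothetical: one encoded frame of the composite moving image

def distribute(frames: Iterable[Frame], sinks: List[Callable[[Frame], None]]) -> int:
    """Forward every incoming frame, in order, to every destination.

    Returns the number of frames forwarded.
    """
    count = 0
    for frame in frames:
        for sink in sinks:  # e.g. large display 2 and karaoke device 41
            sink(frame)
        count += 1
    return count
```

Each destination receives an identical copy of the stream, which matches the description of V3 being split onto V4 and V5.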

FIG. 2 is a block diagram showing the internal configuration of the karaoke device 41. In the karaoke device 41, a control unit 41a that performs various kinds of control, a communication processing unit 41b, a karaoke music processing unit 41c, a music reproduction processing unit 41d, a memory unit 41e, a telop synthesis unit 41f, an input/output interface 41g, and an infrared light receiving unit 41h are connected via an internal bus 41i.

The communication processing unit 41b corresponds to the music acquisition means. It is connected to the karaoke music server 5 via the network NW and, on instruction from the control unit 41a, sends the karaoke music server 5 a request signal for the karaoke song specified by the user. On receiving the request signal, the karaoke music server 5 transmits the specified karaoke song, and the communication processing unit 41b receives and acquires it. Each karaoke song distributed by the karaoke music server 5 consists of music data accompanied by character data (telop) representing the lyrics.

In this embodiment, the communication processing unit 41b also reads from the memory unit 41e the sound file of the synthesized sound, in which the audio synthesis unit 42a (described later) has mixed the karaoke song with the user's voice, and the moving image file of the composite moving image onto which the telop synthesis unit 41f has superimposed the telop; it associates the two files and uploads (transmits) them to the 3D moving image distribution server 45 over the network NW. This upload is performed on instruction from the control unit 41a, and the files are sent in a file format to which the date and information on the place where the karaoke took place (for example, the name of the karaoke shop) have been added.
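The source states only that the sound file and moving image file are associated and tagged with a date and the venue; it does not specify a file format. A minimal sketch, assuming a JSON metadata document and hypothetical field names, might bundle them like this:

```python
import json
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class UploadRecord:
    # Hypothetical field names; only "date" and "place" are named in the source.
    sound_file: str
    video_file: str
    recorded_on: str  # ISO date string
    venue: str        # e.g. the karaoke shop name

def make_upload_record(sound_file: str, video_file: str, when: date, venue: str) -> str:
    """Pair the sound and video files with their metadata as one JSON document."""
    record = UploadRecord(sound_file, video_file, when.isoformat(), venue)
    return json.dumps(asdict(record), ensure_ascii=False)
```

Keeping both file references in one record is what lets the distribution server later serve them together as a single item of content.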

The karaoke music processing unit 41c separates the karaoke song acquired by the communication processing unit 41b into music data and character data, sending the music data to the music reproduction processing unit 41d and the character data to the telop synthesis unit 41f.
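As an illustration of the separation performed by the karaoke music processing unit 41c, the sketch below models a song as music data plus timed lyric lines. The `KaraokeSong` container and the (start time, text) representation of the telop are assumptions made for this example; the source does not describe the song's internal layout.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class KaraokeSong:
    # Hypothetical layout: raw music data plus lyric lines as (start seconds, text).
    music_data: bytes
    telop: List[Tuple[float, str]]

def split_song(song: KaraokeSong) -> Tuple[bytes, List[Tuple[float, str]]]:
    """Return (music data for unit 41d, time-ordered telop lines for unit 41f)."""
    return song.music_data, sorted(song.telop)
```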

The music reproduction processing unit 41d corresponds to the music reproduction means; it successively reproduces the music data and sends the reproduced music sound to the audio synthesis unit 42a. The audio synthesis unit 42a also receives the voice of the user U singing the karaoke song, picked up by a microphone audio input unit 42b connected to a first audio cable A1. The audio synthesis unit 42a mixes the reproduced music sound with the user's voice and sends the resulting synthesized sound to an amplification unit 42c and, over a second audio cable A2, to the storage device 44. It also converts the synthesized sound into a sound file for upload and sends it to the memory unit 41e. The amplification unit 42c amplifies the synthesized sound and outputs it from the left and right speakers 3a and 3b over a third audio cable A3.
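The mixing done by the audio synthesis unit 42a is not specified further in the source. A common way to combine two digital audio streams, shown here as a sketch that assumes 16-bit PCM samples at the same sample rate, is to add them sample by sample with optional gains and clip the result:

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767

def mix(music: Sequence[int], voice: Sequence[int],
        music_gain: float = 1.0, voice_gain: float = 1.0) -> List[int]:
    """Mix two 16-bit PCM streams sample by sample, clipping to the int16 range.

    Once the shorter stream runs out it is treated as silence.
    """
    n = max(len(music), len(voice))
    out = []
    for i in range(n):
        m = music[i] if i < len(music) else 0
        v = voice[i] if i < len(voice) else 0
        s = round(m * music_gain + v * voice_gain)
        out.append(max(INT16_MIN, min(INT16_MAX, s)))  # hard clipping
    return out
```

Hard clipping is the simplest choice; a real mixer might instead lower the gains to avoid distortion.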

The telop synthesis unit 41f, on the other hand, corresponds to the character synthesis means and is connected to a moving image input unit 42d. The moving image input unit 42d acquires, over the fifth video line V5, the composite moving image generated by the chroma key device 20 (described later) and passes it to the telop synthesis unit 41f. On receiving the composite moving image, the telop synthesis unit 41f superimposes the lyrics telop of the karaoke song on it and sends the resulting composite moving image with telop (see FIG. 16) to a moving image interface unit 42e. The telop synthesis unit 41f also sends the processed composite moving image to the memory unit 41e as a moving image file for upload.

The moving image interface unit 42e corresponds to the display processing means. It converts the received composite moving image into display data (data signals and scanning signals) and sends it at predetermined timing to the sub display 4 over a sixth video line V6. The control unit 41a synchronizes the timing of this display processing in the moving image interface unit 42e with the reproduction processing in the music reproduction processing unit 41d, so that the composite moving image with the karaoke lyrics telop, as shown in FIG. 16, is displayed at timing that matches the progress of the song. In addition, the moving image interface unit 42e sends the moving image data of the composite moving image to the storage device 44 over a seventh video line V7, timed to coincide with the audio synthesis unit 42a sending the sound data to the storage device 44 over the second audio cable A2.
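Keeping the telop in step with playback amounts to asking, at each playback time, which lyric line is current. A minimal sketch, assuming the telop is a time-sorted list of (start seconds, text) pairs (a representation the source does not specify), can answer that with a binary search:

```python
import bisect
from typing import List, Optional, Tuple

def current_telop(telop: List[Tuple[float, str]], t: float) -> Optional[str]:
    """Return the lyric line whose start time is the latest one <= t.

    `telop` must be sorted by start time; returns None before the first line.
    """
    starts = [start for start, _ in telop]
    i = bisect.bisect_right(starts, t) - 1
    return telop[i][1] if i >= 0 else None
```

Driving this lookup from the music player's clock, rather than from a separate timer, is one way to obtain the synchronization the control unit 41a provides.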

The memory unit 41e temporarily stores the sound file and moving image file to be uploaded, in association with each other. The input/output interface 41g is connected to the 3D moving image generation device 30 (described later) through a network cable L1. Under control of the control unit 41a, the input/output interface 41g sends a reproduction start signal to the 3D moving image generation device 30 when the music reproduction processing unit 41d starts reproduction, and sends a reproduction end signal to it when reproduction ends.

The infrared light receiving unit 41h receives infrared light carrying operation instructions emitted from a remote control device 43 operated by the user, and sends the operation instructions contained in the received light to the control unit 41a through the internal bus 41i. The control unit 41a controls the units 41b, 41c, and so on described above in accordance with the received operation instructions.

The storage device 44 shown in FIG. 1 corresponds to the storage processing means, which stores the generated composite moving image on a DVD as the storage medium. Specifically, it receives the composite moving image with telop from the karaoke device 41 over the seventh video line V7, and receives the synthesized sound of the karaoke music and the user's voice from the karaoke device 41 over the second audio cable A2. A storage processing unit 44a in the storage device 44 writes (stores) the received composite moving image and synthesized sound onto a DVD, and the user can obtain the DVD containing them for a fee as one of the service items (service menu) offered with the karaoke.

FIG. 3 is a block diagram showing the main internal configuration of the 3D moving image distribution server 45. The 3D moving image distribution server 45 makes the composite moving image generated by the chroma key device 20 widely available over the network NW, and hosts on the network a website having a web page 47 as shown in FIG. 4. In the 3D moving image distribution server 45, an MPU 45a, a communication interface 45b, a RAM 45c, a ROM 45d, and a hard disk device 45e are connected by an internal bus 45i.

The communication interface 45b is connected to the network NW and sends and receives various signals and data files. In this embodiment it receives the composite moving image and audio files (moving image file and sound file) uploaded from the karaoke device 41 and passes them to a content database 46 stored on the hard disk device 45e. Under control of the MPU 45a, the communication interface 45b also sends the page data of the web page 47 to any client that accesses the website over the network NW, receives content request signals (moving image request signals), and delivers content (moving image files and sound files).

The RAM 45c temporarily stores data, folders, and the like used in the processing of the MPU 45a, and the ROM 45d stores in advance programs and the like that define the basic processing performed by the MPU 45a. The hard disk device 45e stores a server program 45f that defines the basic processing of the server, a distribution program 45g that defines content distribution processing, page data 45h for the web page, and the content database 46 holding the uploaded moving image files and sound files.

The distribution program 45g defines the control processing of the MPU 45a relating to content distribution, including displaying the web page 47 of FIG. 4 on the terminal accessing the website. The web page 47 has a selection field 47a that lists the content stored in the content database 46, labeled with the date the content was created and the karaoke shop where it was created, so that an item can be selected; a decision button 47b for confirming delivery of the content selected in the selection field 47a; and a cancel button 47c for cancelling the selection. When the decision button 47b is selected on the accessing terminal, a moving image request signal requesting the selected content is sent to the 3D moving image distribution server 45.

Accordingly, the distribution program 45g specifies that, when the communication interface 45b of the 3D moving image distribution server 45 receives a moving image request signal, the selected content (moving image file and sound file) is read from the content database 46 and transmitted from the communication interface 45b to the accessing terminal.
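The request flow of the distribution program 45g can be sketched as an in-memory lookup. This is a hypothetical stand-in for the content database 46: the `Content` schema, the class and method names, and the newest-first ordering of the selection field are all assumptions for illustration, not details taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Content:
    # Hypothetical schema for one entry in content database 46.
    content_id: int
    created_on: str   # creation date shown in selection field 47a
    venue: str        # karaoke shop shown in selection field 47a
    video_file: bytes
    sound_file: bytes

class ContentDatabase:
    def __init__(self) -> None:
        self._items: Dict[int, Content] = {}

    def add(self, item: Content) -> None:
        self._items[item.content_id] = item

    def listing(self) -> List[Tuple[int, str]]:
        """Entries for the selection field as (id, 'date / venue'), newest first."""
        items = sorted(self._items.values(), key=lambda c: c.created_on, reverse=True)
        return [(c.content_id, f"{c.created_on} / {c.venue}") for c in items]

    def handle_request(self, content_id: int) -> Optional[Tuple[bytes, bytes]]:
        """Answer a moving image request with (video, sound), or None if unknown."""
        item = self._items.get(content_id)
        return (item.video_file, item.sound_file) if item else None
```

Returning both files from one lookup mirrors the way the karaoke device uploads them as an associated pair.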

Next, the camera device 11, the chroma key device 20, and the 3D moving image generation device 30, which are essential to the moving image generation system 10, are described. The camera device 11 corresponds to the imaging means, which video-records the user U singing the karaoke song as the subject at a predetermined frame rate, and it sends the captured video successively to the chroma key device 20 over a first video line V1.

FIG. 5 is a block diagram showing the internal configuration of the chroma key device 20. The chroma key device 20 has a first input unit 21, a second input unit 22, a subject image extraction unit 23, a synthesis unit 24, and an output unit 25. The first input unit 21 is connected to the first video line V1 and receives the captured video sent from the camera device 11. The second input unit 22 is connected to the second video line V2 and receives the 3D moving image generated by the 3D moving image generation device 30 (see FIG. 14(b)). The subject image extraction unit 23 corresponds to the image extraction means; it uses the chroma key method to extract only the image of the user U (the subject image) from the captured video input to the first input unit 21, and sends the extracted subject image (see FIG. 14(a)) to the synthesis unit 24.
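A simple form of the chroma key extraction done by the subject image extraction unit 23 is to mark pixels close to the blue backdrop as transparent. The sketch below works on a frame stored as nested lists of RGB tuples; the blue-dominance test and its thresholds (`margin`, `min_blue`) are illustrative assumptions, not values from the source, and real keyers usually produce soft alpha rather than a binary mask.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]

def chroma_key(frame: List[List[RGB]], margin: int = 40, min_blue: int = 100) -> List[List[RGBA]]:
    """Make blue-backdrop pixels transparent and keep all other pixels opaque.

    A pixel counts as backdrop when its blue channel is at least `min_blue`
    and exceeds both red and green by more than `margin`.
    """
    out = []
    for row in frame:
        new_row = []
        for r, g, b in row:
            is_backdrop = b >= min_blue and b - r > margin and b - g > margin
            new_row.append((r, g, b, 0 if is_backdrop else 255))
        out.append(new_row)
    return out
```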

The synthesis unit 24 corresponds to the composite moving image generation means; it continuously generates the composite moving image (the content displayed on the large display 2 in FIG. 1) by compositing the subject image sent from the subject image extraction unit 23 onto the 3D moving image input through the second input unit 22. As shown in FIG. 14(a), the synthesis unit 24 composites the subject image H extracted from the captured video W (shown with a wavy line in the figure) so that the lower edge Wa of the frame of the captured video W coincides with the lower edge Ga of the frame of the 3D moving image G shown in FIG. 14(b), generated by the 3D moving image generation device 30, thereby producing frame images G1, G2, etc. of the composite moving image as shown in FIGS. 15(a) and 15(b). The output unit 25 is connected to the third video line V3 and continuously outputs the composite moving image generated by the synthesis unit 24 to the distribution device 40 over the third video line V3.
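The bottom-edge alignment can be expressed as a vertical offset equal to the difference in frame heights. The following sketch overlays an RGBA subject frame onto an RGB background frame using that offset; horizontal centering and the binary treatment of alpha are assumptions added for the example, since the source only specifies that the lower edges Wa and Ga coincide.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]

def composite_bottom_aligned(background: List[List[RGB]],
                             subject: List[List[RGBA]]) -> List[List[RGB]]:
    """Overlay the subject frame on the background so their lower edges coincide.

    Alpha 0 is transparent; any other alpha is treated as opaque.
    The input background is left unmodified.
    """
    bh, bw = len(background), len(background[0])
    sh, sw = len(subject), len(subject[0])
    top = bh - sh            # lower edge of subject frame lands on lower edge of background
    left = (bw - sw) // 2    # assumed horizontal centering
    out = [row[:] for row in background]
    for y in range(sh):
        by = top + y
        if not 0 <= by < bh:
            continue
        for x in range(sw):
            bx = left + x
            if not 0 <= bx < bw:
                continue
            r, g, b, a = subject[y][x]
            if a:
                out[by][bx] = (r, g, b)
    return out
```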

FIG. 6 is a block diagram showing the internal configuration of the 3D moving image generation device 30 (corresponding to the moving image generation means). In this embodiment, a general-purpose personal computer is used as the 3D moving image generation device 30. The 3D moving image generation device 30 stores in advance moving image data D, which contains, for each moving image frame, motion data (corresponding to posture information) of moving bodies obtained by motion capture, and a viewpoint table T, which stores multiple pieces of viewpoint information, each defining a position from which the moving bodies are viewed. Based on the moving image data D and the viewpoint table T, it generates a 3D moving image (see FIG. 14(b)) in which the viewpoint relative to the moving bodies changes randomly.

Inside the computer main body 30a of the 3D moving image generation device 30, a RAM 32, a ROM 33, a moving image output interface 34, an input/output interface 35, and a hard disk device 36 are connected via an internal bus 30b to a control unit 31 (processor) that performs various control processes. The RAM 32 temporarily stores data, folders, and the like used in the processing of the control unit 31, and the ROM 33 stores in advance programs and the like that define the basic processing performed by the control unit 31. The moving image output interface 34 is connected to the second video line V2 and continuously sends the generated 3D moving image to the chroma key device 20. The input/output interface 35 is connected to the network cable L1 and receives the reproduction start signal, reproduction end signal, and so on sent from the karaoke device 41.

The hard disk device 36 stores various programs, data, and the like. In this embodiment it stores, as programs, a system program 37 that defines the base processing for operating the computer main body 30a and a moving image generation program 38 that defines the processing for generating the 3D moving image, along with the moving image data D, the viewpoint table T, and so on.

The moving image data D stored on the hard disk device 36 is moving image content containing three three-dimensional back dancers (back dancer images) as moving bodies. It includes motion data created by attaching markers to real dancers and using motion capture to record each dancer's actual posture in a three-dimensional coordinate system, defined for each unit time (moving image frame) by the coordinates, angles, and so on of the markers. When the 3D moving image is generated, the position (viewpoint) from which the moving bodies are viewed is specified for the moving image data D on the basis of the viewpoint information in the viewpoint table T, so that the 3D moving image shows the three back dancers in their postures as seen from that viewpoint. In addition to the posture information of the three back dancers in the three-dimensional coordinate system, the moving image data D of this embodiment also contains background information for the stage background images (defining their positions in the three-dimensional coordinate system).

FIG. 7 schematically illustrates the setting used to generate the 3D moving image. In the figure, the three back dancers 15a to 15c and the four stage backgrounds 16a to 16d (heart-shaped motif images), placed in an XYZ coordinate system composed of an X axis, a Y axis, and a Z axis, are based on the moving image data D, and the many cameras 17A, 17B, etc. represent the positions in the XYZ coordinate system, defined by the viewpoint table T, from which the moving bodies are viewed. Each camera 17A, 17B, etc. has its own UVW coordinate system, a camera coordinate system distinct from the XYZ coordinate system, and in this embodiment each camera points its V axis, which coincides with its imaging direction (viewing direction), at the back dancer 15b located in the center. Selecting one of the cameras 17A, 17B, etc. fixes the position from which the moving bodies (back dancers 15a to 15c) are viewed in the XYZ coordinate system, and the 3D moving image is generated as if shot by that camera. The number and positions of the cameras shown in FIG. 7 are only an example and can be set as appropriate according to the specifications.

FIG. 8 shows the contents of the viewpoint table T. For each camera (first camera 17A to n-th camera 17N), the viewpoint table T defines its coordinate values in the XYZ coordinate system; each camera entry (with its coordinate values) in the table corresponds to one piece of viewpoint information.

To describe the moving bodies (back dancers) contained in the moving image data D a little further: FIG. 9(a) shows one back dancer 15a (back dancer image) of the three, created with three-dimensional computer graphics. As shown in FIG. 9(b), the back dancer image is created by linking rod-shaped link members called bones B, corresponding to the bones of the human body, and covering them with a skin corresponding to human skin. The points P1 to P17 placed at various parts of the bones B in FIG. 9(b) correspond to the marker positions on the real dancer, and motion data values exist for each of these points P1 to P17. The positions and number of the points P1 to P17 shown in FIG. 9(b) are only an example and can be changed as appropriate according to the positions and number of markers attached to the real dancer.

FIG. 10 illustrates the frames of a 3D moving image containing back dancers whose three-dimensional states are specified for each moving image frame (unit time) in the moving image data D. The 3D moving image consists of moving image frames f1, f2, f3, ... at times t1, t2, t3, ...; by successively generating an image for each of the moving image frames f1, f2, f3, etc., a moving image is obtained in which the back dancers 15a to 15c in those frames move. The moving image data D of this embodiment has 60 moving image frames per second (60 frames/s), but this value is only an example; it can be increased or decreased according to the required image quality, as long as it stays within the frame rate at which the motion data was captured. The stage background images contained in the moving image data D are omitted from FIG. 10.

FIG. 11 schematically shows the contents of the motion data M1a and M2a for one back dancer 15a in the first moving image frame f1 and the second moving image frame f2, at times t1 and t2, contained in the moving image data D. For each of the points P1 to P17 shown in FIG. 9(b), the motion data M1a and M2a hold rotation angles about the X, Y, and Z axes of the XYZ coordinate system shown in FIG. 7, together with coordinate values. Based on this per-frame motion data, images are generated in which the postures of the back dancers 15a to 15c in each moving image frame shown in FIG. 10 are specified. The images in FIG. 10 use a viewpoint from which the central back dancer 15b is seen from the front. FIG. 11 omits the motion data for the other back dancers 15b and 15c and for the stage backgrounds 16a to 16d, but the moving image data D naturally contains this motion data as well.
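The per-frame, per-point layout described for FIGS. 10 and 11 can be modeled as nested mappings indexed by frame number. The sketch below assumes rotation angles in degrees, dancer keys such as "15a", and point keys 1 to 17 for P1 to P17. Holding the last frame when time runs past the data is an added assumption, not behavior described in the source.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

FPS = 60  # frame rate of moving image data D in this embodiment

@dataclass
class PointPose:
    rotation_deg: Tuple[float, float, float]  # rotation about X, Y, Z (degrees assumed)
    position: Tuple[float, float, float]      # coordinates in the XYZ system

# One moving image frame: dancer id -> point number (1..17 for P1..P17) -> pose.
MotionFrame = Dict[str, Dict[int, PointPose]]

def frame_index(t: float, fps: int = FPS) -> int:
    """Index of the moving image frame shown at time t (seconds from start)."""
    return math.floor(t * fps)

def pose_at(frames: List[MotionFrame], dancer: str, point: int, t: float) -> PointPose:
    """Pose of one marker point at time t, holding the last frame at the end."""
    i = min(frame_index(t), len(frames) - 1)
    return frames[i][dancer][point]
```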

Next, the processing defined by the moving image generation program 38 is described. The moving image generation program 38 defines the control processing performed by the control unit 31; upon input of the reproduction start signal from the karaoke device 41, it randomly selects a camera from the viewpoint table T. Specifically, if the viewpoint table T contains N cameras, a random number generation process is performed that produces the number of one of the cameras, each with probability 1/N, and the camera corresponding to the generated number is selected. For example, if the random number generation process yields "2", the control unit 31 selects the second camera 17B from the viewpoint table T in FIG. 8.
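The 1/N selection maps directly onto drawing a uniform integer in 1..N. This sketch assumes the viewpoint table is keyed by camera numbers 1 to N (for example, 2 for the second camera 17B); the coordinates used in the usage test are invented.

```python
import random
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]

def select_camera(table: Dict[int, Vec3], rng: random.Random) -> Tuple[int, Vec3]:
    """Pick camera k from 1..N with probability 1/N each; return (k, position)."""
    k = rng.randint(1, len(table))  # randint is inclusive at both ends
    return k, table[k]
```

Passing in a seeded `random.Random` keeps the selection reproducible for testing while remaining uniform.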

The moving image generation program 38 also specifies that the 3D moving image is generated from the viewpoint of the selected camera. In this embodiment, the moving image generation program 38 performs this camera selection every 8 seconds, and it ends both the 3D moving image generation and the camera selection upon input of the reproduction end signal from the karaoke device 41.

FIG. 12 is a first flowchart summarizing the flow of processing based on the moving image generation program 38 in the 3D moving image generation device 30. The processing procedure of the 3D moving image generation device 30 is described below following this first flowchart. First, the 3D moving image generation device 30 determines whether it has received the reproduction start signal from the karaoke device 41 (S1). If it has not (S1: NO), it waits. If it has (S1: YES), it randomly selects one camera from the plurality of cameras (S2) and, referring to the viewpoint table T, generates the 3D moving image from the viewpoint of the selected camera (S3). It then determines whether 8 seconds have elapsed (S4); if not (S4: NO), it returns to the 3D moving image generation step (S3), and so continues generating, at a predetermined frame rate, a 3D moving image in which the three back dancers perform predetermined movements until 8 seconds have elapsed.

When 8 seconds have elapsed (S4: YES), the 3D moving image generating apparatus 30 determines whether a playback end signal has been received from the karaoke apparatus 41 (S5). If it has not (S5: NO), the apparatus returns to the camera selection step (S2) and selects a new camera. Selecting the new camera automatically changes the viewpoint of the 3D moving image. Steps S2 to S5 repeat until the playback end signal is received. When it is received (S5: YES), the 3D moving image generating apparatus 30 ends the process.
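The S1 to S5 loop can be pictured as an offline schedule of viewpoint switches. The sketch below is a minimal Python illustration, not the apparatus's actual program. It assumes six cameras numbered 1 to 6. It also replaces the real-time playback end signal with a known song length, which is checked only at each 8-second boundary, as in S5.

```python
import random


def schedule_random_cameras(song_length_s, cameras, interval_s=8.0, rng=None):
    """Simulate steps S2-S5: pick a random camera every interval_s seconds.

    Returns a list of (start_time_s, camera) pairs. The song length stands in
    for the playback end signal (an assumption of this sketch).
    """
    rng = rng or random.Random()
    schedule = []
    t = 0.0
    while t < song_length_s:          # S5: stop once playback has ended
        camera = rng.choice(cameras)  # S2: choose one camera at random
        schedule.append((t, camera))  # S3: render from this viewpoint...
        t += interval_s               # S4: ...until interval_s has elapsed
    return schedule


if __name__ == "__main__":
    cams = [1, 2, 3, 4, 5, 6]  # hypothetical camera IDs
    for start, cam in schedule_random_cameras(40, cams, rng=random.Random(0)):
        print(f"{start:5.1f} s -> camera {cam}")
```

`rng.choice` may pick the same camera twice in a row. The flowchart as described does not rule this out.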

FIG. 13 is a time chart showing how the viewpoint of the 3D moving image changes. Because the 3D moving image generating apparatus 30 performs the camera selection described above, the camera in this embodiment switches every 8 seconds from the start of karaoke song playback. In addition to the angle of the 3D moving image G shown in FIG. 14(b), the apparatus therefore generates 3D images in which the viewing position switches every 8 seconds. This applies to the three-dimensional images of both the back dancers 15a to 15c and the stage backgrounds 16a to 16d.

Finally, the chroma key device 20 performs its compositing. As shown in FIGS. 15(a) and 15(b), this yields frame images G1 and G2 of a composite moving image. In these frames, the angle of view switches in various ways on the three-dimensional images of the three back dancers 15a to 15c and the stage backgrounds 16a to 16d behind the subject image H.

The composite moving image made up of frame images such as G1 and G2 is shown on the large display 2. Users around the user U who is singing can therefore also enjoy watching it, which creates a sense of unity with the singing user U. FIG. 15(a) shows the case in which the front sixth camera 17F in FIG. 7 is selected. FIG. 15(b) shows the case in which the left second camera 17B is selected.

FIG. 16 shows a frame image G10 of a composite moving image into which the telop (lyric caption) 19 shown on the sub-display 4 has been composited. This frame image G10 is what the user U singing the karaoke song sees. Because the user U appears on the normal karaoke screen, the user can check his or her own dance moves against the song. The realistically and variously moving back dancers 15a to 15c, together with the stage backgrounds 16a to 16d, also give the user the simulated feeling of singing passionately on stage. Furthermore, the position and angle of view on the back dancers and the stage backgrounds switch at random. The user U can therefore enjoy fresh, dynamic changes of composition every time, like real camera work in a television program.

In the karaoke system 1 of the present invention, a composite moving image such as that shown in FIG. 16 can be recorded to DVD by the storage device 44. It can also be distributed over the network from the 3D moving image distribution server 45. The generated content (the composite moving image) can therefore easily be put to secondary use.

As a result, the karaoke system 1 gives users a new kind of enjoyment and allows content produced by singing karaoke to be provided smoothly. This can help businesses offering karaoke services secure new sources of revenue. Beyond karaoke services, the present invention can also be used at events such as wedding after-parties and class reunions, and at auditions for discovering new singers.

The karaoke system 1 and the moving image generation system 10 of the first embodiment are not limited to the above, and various modifications can be applied. For example, the karaoke apparatus 41 need not acquire karaoke songs from the karaoke song server 5 over the network NW, as shown in FIG. 1. It may instead be provided with a reader for a storage medium (DVD, hard disk device, etc.) that stores many karaoke songs. The apparatus then acquires a song by reading the one specified by the user from that medium.

When the user U singing the karaoke song can also see the large display 2, the sub-display 4 may be omitted. In that case the distribution device 40 can also be omitted. The large display 2 should then preferably show the output moving image from the karaoke apparatus 41, so that it displays the composite moving image with the telop composited in. To simplify the configuration further, the storage device 44 and the 3D moving image distribution server 45 may be omitted. When the moving image generation system 10 is not applied to the karaoke system 1, the karaoke apparatus 41 may also be omitted, so that processing is limited to moving images only.

The time interval at which the 3D moving image generating apparatus 30 randomly selects a camera is not limited to 8 seconds (see step S4 in FIG. 12). Other intervals may be used.

The interval itself may also vary randomly, for example 3 seconds for the first interval, 10 seconds for the second and 7 seconds for the third. The viewpoint then changes at irregular times as well as in random directions. In this case, a step that determines the interval is needed after camera selection. For example, a random number is generated so that each whole number of seconds from 1 to 15 is chosen with probability 1/15.
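The random-interval variant adds one draw per selection. The minimal Python sketch below is an illustration under stated assumptions: the same six hypothetical camera IDs as above, and a song length standing in for the end signal. It draws the hold time with `random.randint(1, 15)`, which is inclusive at both ends, so each value from 1 to 15 has probability 1/15 as the text requires.

```python
import random


def schedule_random_intervals(song_length_s, cameras, rng=None):
    """Pick a random camera, then a random hold time of 1-15 whole seconds.

    Returns (start_time_s, camera, hold_s) triples; consecutive start times
    differ by the previous hold time.
    """
    rng = rng or random.Random()
    schedule = []
    t = 0
    while t < song_length_s:
        camera = rng.choice(cameras)  # camera selection (as in S2)
        hold_s = rng.randint(1, 15)   # added step: decide the next interval
        schedule.append((t, camera, hold_s))
        t += hold_s
    return schedule
```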

The number of back dancers 15a to 15c, which serve as the moving objects, is not limited to three and may be increased or decreased as appropriate. Moving objects other than back dancers, such as animals or animated characters, may also be used. A different stage background may be used, and the background may of course be omitted to simplify the image content. Compositing methods other than the chroma key method may also be used.

FIG. 17 schematically shows the data structure of a karaoke song used in a modification of the first embodiment. In this song, information designating the camera to be selected is attached at predetermined time intervals (for example, every 8 seconds). These designations run in order of song progression, from the beginning (time 0) to the end (time Tn) of the song data.

With this data structure, the song's creator can decide the angle at which the 3D moving image generating apparatus 30 renders the 3D moving image. This has the advantage that the creator controls the angle of the video displayed during the karaoke song. In this modification, the karaoke song server 5 shown in FIG. 1 distributes songs with this data structure so that the karaoke apparatus 41 can acquire them.
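One way to model the FIG. 17 structure is as a song carrying a time-ordered list of camera cues. The Python sketch below is a minimal illustration, and the class and field names are hypothetical. The cameras 4, 6 and 3 follow the example in the text. The cue times 8 s and 16 s are invented stand-ins for t10 and t11, whose values the text does not give.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CameraCue:
    time_s: float  # offset from the start of the song
    camera: int    # camera designated from this time on


@dataclass
class KaraokeSong:
    title: str
    length_s: float                           # corresponds to time Tn
    cues: list = field(default_factory=list)  # CameraCue items, sorted by time

    def camera_at(self, t):
        """Return the camera designated at playback time t (None before the first cue)."""
        times = [cue.time_s for cue in self.cues]
        i = bisect_right(times, t) - 1
        return self.cues[i].camera if i >= 0 else None


song = KaraokeSong("demo", 24.0, [CameraCue(0.0, 4), CameraCue(8.0, 6), CameraCue(16.0, 3)])
```

`bisect_right` makes a cue take effect exactly at its own time stamp. For example, `song.camera_at(8.0)` returns the sixth camera, not the fourth.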

In this modification, when the karaoke apparatus 41 plays back a karaoke song, it sends the attached camera information to the 3D moving image generating apparatus 30 as camera designation signals, in step with playback. For example, when playing back a song with the data structure shown in FIG. 17, the karaoke apparatus 41 sends the following signals to the 3D moving image generating apparatus 30:

- when playback starts, the playback start signal together with a camera designation signal designating the fourth camera;
- t10 seconds after playback starts, a camera designation signal designating the sixth camera;
- t11 seconds after playback starts, a camera designation signal designating the third camera.

FIG. 18 is a second flowchart showing the procedure the 3D moving image generating apparatus 30 follows when the karaoke song of the FIG. 17 modification is used. The 3D moving image generation of this modification is described below according to the second flowchart. First, the apparatus determines whether a playback start signal has been received from the karaoke apparatus 41 (S10). If it has not (S10: NO), the apparatus waits. If it has (S10: YES), the apparatus then determines whether a camera designation signal has been received from the karaoke apparatus 41 (S11).

If no camera designation signal has been received (S11: NO), the 3D moving image generating apparatus 30 waits. If one has been received (S11: YES), the apparatus selects the camera designated by that signal (S12) and generates the 3D moving image from that camera's viewpoint (S13). It then determines whether a new camera designation signal has been received (S14). If one has (S14: YES), it returns to the camera selection step (S12) and selects the camera designated by the new signal.

If no new camera designation signal has been received (S14: NO), the 3D moving image generating apparatus 30 determines whether a playback end signal has been received from the karaoke apparatus 41 (S15). If it has not (S15: NO), the apparatus returns to the generation step (S13) and continues generating the 3D moving image without changing the camera. When the playback end signal is received (S15: YES), the apparatus ends the process.
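The second flowchart is driven by signals rather than by a timer. The minimal Python sketch below replays a list of hypothetical events and records which camera each rendered frame uses:

- `"start"`, `"camera"` and `"end"` stand for the signals from the karaoke apparatus 41;
- `"tick"` stands for each frame the apparatus would render.

The event names are illustrative only.

```python
def run_second_flowchart(events):
    """Replay signals from the karaoke apparatus; return the camera per rendered frame."""
    started = False
    camera = None
    rendered = []
    for kind, *payload in events:
        if not started:
            started = kind == "start"    # S10: wait for the playback start signal
            continue
        if kind == "camera":             # S11/S14: a camera designation arrived
            camera = payload[0]          # S12: select the designated camera
        elif kind == "tick":
            if camera is not None:       # S11: NO -> keep waiting, render nothing
                rendered.append(camera)  # S13: render one frame from this camera
        elif kind == "end":              # S15: YES -> stop
            break
    return rendered
```

No random choice appears anywhere in the loop. This reflects the reduced processing load the next paragraph describes.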

In the modification based on the karaoke song shown in FIG. 17, the 3D moving image generating apparatus 30 therefore no longer needs to select the viewpoint camera at random, which reduces its processing load. In addition, the song's creator can specify the viewpoint of the 3D portion of the displayed composite moving image (back dancer images, background images, etc.). Video content that matches the creator's intent can therefore be produced.

FIG. 19 is a block diagram showing the main part of a moving image generation system 50 according to a second embodiment of the present invention. The second embodiment switches the viewpoint information of the 3D moving image according to the position of the subject image in the video captured by the camera device 51.

To this end, one end of the first video line V1 from the camera device 51 is split in two:

- the first branch line V1a is connected to the chroma key device 60;
- the second branch line V1b is connected to a moving image input interface 77 newly provided in the 3D moving image generating apparatus 70.

The 3D moving image generating apparatus 70 can thus acquire the captured video and detect the position of the subject image contained in it. As in the first embodiment shown in FIG. 1, the camera device 51 is fixed facing a predetermined imaging direction.

In the 3D moving image generating apparatus 70 of the second embodiment, the moving image generation program 79 stored in the hard disk device 76 includes subject-image position detection. The control unit 71 performs this detection as specified by the program 79.

FIG. 20(a) illustrates the subject-image position detection performed by the control unit 71 of the 3D moving image generating apparatus 70. The control unit 71 acquires the captured video W1 from the camera device 51 via the moving image input interface 77. It then divides the entire frame of W1 into a grid of 12 blocks, B1 to B12. Next, it identifies the block in which the subject image H occupies the largest area and takes that block as the detected position of the subject image H. In FIG. 20(a), the subject image H occupies the largest area in block B11, so B11 is the detected position.
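The "largest occupied area" rule is a per-block pixel count. The Python sketch below is a minimal illustration under several assumptions:

- The subject is already available as a binary mask (True for subject pixels), for example from a chroma-key test against the backdrop colour.
- The 12 blocks form 4 columns by 3 rows, numbered row by row from the top-left. The text does not state the exact arrangement or numbering.

```python
def detect_block(mask, cols=4, rows=3):
    """Return the 1-based number of the block where the subject occupies the most pixels.

    mask is a list of rows of booleans (True = subject pixel). Blocks are
    numbered row by row from the top-left. Returns None if no subject pixel exists.
    """
    h, w = len(mask), len(mask[0])
    counts = [0] * (cols * rows)
    for y, row in enumerate(mask):
        by = y * rows // h                # block row of this pixel row
        for x, is_subject in enumerate(row):
            if is_subject:
                bx = x * cols // w        # block column of this pixel
                counts[by * cols + bx] += 1
    best = max(range(len(counts)), key=counts.__getitem__)
    return best + 1 if counts[best] > 0 else None
```

If two blocks tie, `max` keeps the first, so the lower-numbered block wins. The text does not specify a tie rule.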

FIG. 20(b) shows the contents of a position correspondence table 80, which the 3D moving image generating apparatus 70 of the second embodiment additionally stores in the hard disk device 76. The table associates, in advance, a camera from among the plurality of cameras with each of the blocks B1 to B12 of the divided captured video W1. The associations are chosen so that, whichever block the subject image H occupies, the moving objects (back dancers) and background images remain easy to see without overlapping the subject. For example:

- when the subject image H occupies a right-side (or lower-right) block, it is associated with the camera on the opposite, left side or with the diagonally opposite upper-left camera;
- when the subject image H occupies a left-side (or lower-left) block, it is associated with the camera on the opposite, right side or with the diagonally opposite upper-right camera.

The moving image generation program 79 stored in the hard disk device 76 specifies that the viewpoint camera is selected using the position correspondence table 80 of FIG. 20(b). Once the subject-image detection described above has identified one block, the control unit 71 selects the camera (viewpoint information) for that block from the position correspondence table 80. The control unit 71 then looks up the position of the selected camera in the XYZ coordinate system in the viewpoint table T (see FIG. 8) and generates the 3D moving image.
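The two table lookups (block to camera, then camera to XYZ position) can be sketched as follows. This is a minimal Python illustration, not the contents of FIG. 20(b) or FIG. 8. The camera names, the XYZ coordinates and the 4-by-3 block layout are all hypothetical. Only the "look from the opposite side, and diagonally from above for bottom-row blocks" rule follows the text.

```python
# Hypothetical camera positions (x, y, z) standing in for viewpoint table T.
VIEWPOINT_TABLE = {
    "left": (-5.0, 1.5, 6.0),
    "upper_left": (-5.0, 4.0, 6.0),
    "right": (5.0, 1.5, 6.0),
    "upper_right": (5.0, 4.0, 6.0),
}


def build_position_table(cols=4, rows=3):
    """Map each block to a camera on the opposite side of the frame (cf. table 80)."""
    table = {}
    for index in range(cols * rows):
        row, col = divmod(index, cols)
        bottom = row == rows - 1
        if col < cols // 2:   # subject on the left -> view from the right
            camera = "upper_right" if bottom else "right"
        else:                 # subject on the right -> view from the left
            camera = "upper_left" if bottom else "left"
        table[index + 1] = camera
    return table


POSITION_TABLE = build_position_table()


def select_viewpoint(block):
    """Look up the camera for a detected block, then its XYZ position in table T."""
    camera = POSITION_TABLE[block]
    return camera, VIEWPOINT_TABLE[camera]
```

Generating the table from a rule keeps the sketch short. A real table 80 could equally be a hand-authored mapping with 12 entries.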

The following components of the second embodiment are the same as in the first embodiment, and their description is omitted:

- the other parts of the 3D moving image generating apparatus 70 (RAM 72, ROM 73, moving image output interface 74, input/output interface 75, etc.);
- the system program 78, moving image data D and viewpoint table T stored in the hard disk device 76;
- the camera device 51 and chroma key device 60 of the moving image generation system 50.

The moving image generation system 50 is likewise incorporated into a karaoke system 1 such as that shown in FIG. 1. It generates a composite moving image that includes the user U singing a karaoke song. The other components of the karaoke system 1, such as the distribution device 40 and karaoke apparatus 41, are also the same as in the first embodiment. Their description is omitted, and the following description uses the same reference numerals as the first embodiment.

FIG. 21 is a third flowchart showing 3D moving image generation in the 3D moving image generating apparatus 70 of the second embodiment. First, the apparatus determines whether a playback start signal has been received from the karaoke apparatus 41 (S20). If it has not (S20: NO), the apparatus waits. If it has (S20: YES), the apparatus then determines whether captured video is being input from the camera device 51 (S21).

If no captured video is input (S21: NO), the 3D moving image generating apparatus 70 waits for input. When captured video is input (S21: YES), the apparatus:

1. detects the position of the subject image H as shown in FIG. 20(a) (S22);
2. selects the camera for the detected position using the position correspondence table 80 (S23);
3. generates the 3D moving image from the selected camera's viewpoint (S24).

It then determines whether a playback end signal has been received from the karaoke apparatus 41 (S25). If it has not (S25: NO), the apparatus returns to the captured-video input check (S21) and continues. When the playback end signal is received (S25: YES), the apparatus ends the process.
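The third flowchart chains detection and table lookup once per incoming frame. In the minimal Python sketch below, the detector and camera selector are passed in as functions, so the loop itself stays independent of any particular grid or table. The event names are hypothetical. Keeping the previous viewpoint when no subject is found in a frame is an assumption, because the flowchart does not cover that case.

```python
def run_third_flowchart(events, detect_block, select_camera):
    """Replay hypothetical events; return the camera used for each rendered frame.

    detect_block(frame) -> block number or None   (S22)
    select_camera(block) -> camera                 (S23)
    """
    started = False
    camera = None
    rendered = []
    for kind, *payload in events:
        if not started:
            started = kind == "start"            # S20: wait for playback start
            continue
        if kind == "frame":                      # S21: captured video arrived
            block = detect_block(payload[0])     # S22: locate the subject
            if block is not None:
                camera = select_camera(block)    # S23: table lookup
            if camera is not None:
                rendered.append(camera)          # S24: render from that viewpoint
        elif kind == "end":                      # S25: YES -> stop
            break
    return rendered
```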

The 3D moving image generating apparatus 70 of the second embodiment thus generates a 3D moving image whose viewpoint follows the position of the subject image in the captured video. As shown in FIGS. 22(a) and 22(b), the frame images G20, G21, etc. of the composite moving image that the chroma key device 60 finally produces change accordingly. Each time the position of the subject image H changes, the direction from which the back dancers 15a to 15c and the stage backgrounds 16a to 16d are viewed also changes.

In FIG. 22(a), the user U has moved so that the subject image H is on the left of frame image G20. The viewpoint is therefore switched to the camera on the right, opposite the subject, so that the rightmost back dancer 15c appears larger. In FIG. 22(b), the user U has moved so that the subject image H is on the right of frame image G21. The viewpoint is therefore switched to the camera on the left, so that the leftmost back dancer 15a appears larger.

By moving relative to the camera device 51, the user U singing the karaoke song switches the viewpoint on the back dancers 15a to 15c and the stage backgrounds 16a to 16d. The user can thus enjoy the visual effect of changing position while singing.

The modifications described for the first embodiment can also be applied to the second embodiment. FIGS. 23(a) and 23(b) show the processing of a modification specific to the second embodiment. This modification switches the viewpoint information of the 3D moving image according to the motion of the subject image in the video captured by the camera device 51. To this end, the moving image generation program 79 stored in the hard disk device 76 includes processing that detects motion of the captured subject. The control unit 71 detects subject motion as specified by the program 79.

Subject motion is detected by taking the difference between temporally adjacent frames of the captured video sent sequentially from the camera device 51. For example, FIG. 23(a) shows captured video W10 at time t20, and FIG. 23(b) shows captured video W11 at the following time t21. As specified by the moving image generation program 79, the control unit 71 divides each of W10 and W11 into blocks B1 to B12. It then detects the blocks in which the subject image H is located.

The control unit 71 then compares the two frames block by block. It checks whether any block contains the subject image H in the captured video W11 at time t21 but not in the captured video W10 at time t20. If such a block exists, the control unit 71 determines that the subject has moved.

FIGS. 23(a) and 23(b) give an example. The subject image H is not located in the sixth block B6 in FIG. 23(a). In FIG. 23(b), however, part of the subject image H (the right hand of the user U) is located in B6. B6 is therefore identified as the block in which motion occurred, and the subject is determined to have moved.
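The block-wise frame difference reduces to a set difference between the blocks occupied at t21 and at t20. The minimal Python sketch below reuses the assumptions made for position detection: binary subject masks and a 4-by-3 grid numbered row by row. The `min_pixels` noise threshold is a hypothetical addition that the text does not mention.

```python
def occupied_blocks(mask, cols=4, rows=3, min_pixels=1):
    """Return the set of 1-based block numbers containing at least min_pixels subject pixels."""
    h, w = len(mask), len(mask[0])
    counts = {}
    for y, row in enumerate(mask):
        by = y * rows // h
        for x, is_subject in enumerate(row):
            if is_subject:
                block = by * cols + x * cols // w + 1
                counts[block] = counts.get(block, 0) + 1
    return {b for b, n in counts.items() if n >= min_pixels}


def newly_occupied(prev_mask, curr_mask, **grid):
    """Blocks occupied at the later time but not the earlier one (motion if non-empty)."""
    return sorted(occupied_blocks(curr_mask, **grid) - occupied_blocks(prev_mask, **grid))
```

A non-empty result corresponds to "the subject has moved". With the block numbering of FIG. 23, the result for that example would be `[6]`.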

FIG. 24 shows the contents of a motion correspondence table 85 used in the motion-detection modification. The 3D moving image generating apparatus 70 of this modification additionally stores this table in the hard disk device 76. The table associates, in advance, cameras (corresponding to viewpoint information) with each of the blocks B1 to B12 in which motion may occur.

The associations are chosen so that, relative to the block in which motion occurred, the moving objects (back dancers) and background images are easy to see without overlapping. The cameras may instead be associated so that the viewpoint follows the motion:

- if the subject's motion newly occupies a block to the right of the block where the subject H currently is, the camera to the right of the one in use may be associated;
- if a block to the left is newly occupied, the camera to the left of the one in use may be associated.

In the terms of the present invention, these blocks B1 to B12 in substance represent the subject motion to be detected.
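The "follow the motion" alternative can be sketched as a step along an ordered row of cameras. In this minimal Python illustration, the five camera names and their left-to-right order are hypothetical. Blocks use the same assumed 4-column, row-by-row numbering as above. The step is clamped at the outermost cameras, which is a choice the text leaves open.

```python
# Hypothetical cameras ordered from the stage's left to its right.
CAMERAS_LEFT_TO_RIGHT = ["left", "front_left", "front", "front_right", "right"]


def shift_camera(current_camera, subject_block, moved_block, cols=4):
    """Step one camera right/left when motion appears right/left of the subject."""
    subject_col = (subject_block - 1) % cols
    moved_col = (moved_block - 1) % cols
    i = CAMERAS_LEFT_TO_RIGHT.index(current_camera)
    if moved_col > subject_col:
        i = min(i + 1, len(CAMERAS_LEFT_TO_RIGHT) - 1)  # motion to the right
    elif moved_col < subject_col:
        i = max(i - 1, 0)                               # motion to the left
    return CAMERAS_LEFT_TO_RIGHT[i]
```

Motion in the same column as the subject (for example, a hand raised straight up) leaves the camera unchanged in this sketch.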

The moving image generation program 79 of this modification, stored in the hard disk device 76, specifies that the viewpoint camera is selected on the basis of motion detection, using the motion correspondence table 85 of FIG. 24. Specifically, the program first identifies, through the motion detection described above, the block in which motion occurred. It then selects the camera (viewpoint information) for that block from the motion correspondence table 85.

FIG. 25 is a fourth flowchart showing the processing performed by the 3D moving image generating apparatus 70 of the motion-detection modification. First, the apparatus checks whether a playback start signal has been received from the karaoke apparatus 41 (S30). It then checks whether captured video is being input from the camera device 51 (S31). Both checks are made in the same way as in the third flowchart shown in FIG. 21.

When captured video is input from the camera device 51 (S31: YES), the 3D moving image generating apparatus 70 performs subject motion detection as shown in FIGS. 23(a) and 23(b) (S32). It then determines whether the motion produced a change (S33).

- If there was a change (S33: YES), the apparatus selects the camera for the block in which motion occurred, using the motion correspondence table 85 (S34).
- If there was no change (S33: NO), it selects the camera for the block occupied by the subject image H, using the position correspondence table 80 of FIG. 20(b) (S35).

The 3D moving image generating apparatus 70 then generates the 3D moving image from the viewpoint of the selected camera (S36). Next, it determines whether a playback end signal has been received from the karaoke apparatus 41 (S37). If no playback end signal has been received (S37: NO), processing returns to the captured-video input determination (S31) and continues. If the playback end signal has been received (S37: YES), the 3D moving image generating apparatus 70 ends the processing.
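Steps S31 to S37 form a loop that runs until the playback end signal arrives. The sketch below models this loop as a stream of hypothetical `("frame", data)` and `("end", None)` events and makes these assumptions:

- The loop is entered only after the playback start signal (S30) has been received.
- `choose_camera` (standing in for S32 to S35) and `render` (standing in for S36) are placeholder callables supplied by the caller.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def run_generation_loop(
    events: Iterable[Tuple[str, Optional[object]]],
    choose_camera: Callable[[object], str],
    render: Callable[[str], str],
) -> List[str]:
    """Process captured frames until an "end" event (playback end signal) arrives.

    For each ("frame", data) event, a camera is chosen (S32-S35) and the 3D
    moving image is rendered from that viewpoint (S36). Any other event is
    ignored and the loop keeps waiting for input (back to S31).
    """
    rendered: List[str] = []
    for kind, payload in events:
        if kind == "end":             # S37: YES -> end processing
            break
        if kind == "frame":           # S31: YES
            camera = choose_camera(payload)
            rendered.append(render(camera))
    return rendered
```

Events that arrive after the end signal are never processed. This mirrors the flowchart, where S37: YES terminates the processing.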

As described above, the 3D moving image generating apparatus 70 of the modified example of the second embodiment generates a 3D moving image whose viewpoint changes according to the motion of the subject. Consequently, in the composite moving image finally generated by the chroma key device 60, the direction from which the back dancers 15a to 15c and the stage backgrounds 16a to 16d are viewed also switches according to the movements of the user U singing the karaoke piece. The user U can therefore change the viewpoint on the back dancers 15a to 15c and the stage backgrounds 16a to 16d simply by making dance gestures while singing, which makes the gestures themselves enjoyable. Each user can also actively try various gestures to change the viewpoint and entertain the people watching the composite moving image. The fourth flowchart in FIG. 25 combines the motion-detection processing with the position-detection processing (S34, S35). Alternatively, only the motion-detection processing may be performed, with a fixed (default) viewpoint selected when the subject's motion produces no change (S33: NO).

FIG. 1 is a schematic diagram showing the overall configuration of a karaoke system to which the moving image generation system according to the first embodiment of the present invention is applied.
FIG. 2 is a block diagram showing the internal configuration of the karaoke apparatus.
FIG. 3 is a block diagram showing the internal configuration of the 3D moving image distribution server.
FIG. 4 is a schematic diagram showing an example page of the website.
FIG. 5 is a block diagram showing the internal configuration of the chroma key device.
FIG. 6 is a block diagram showing the internal configuration of the 3D moving image generating apparatus.
FIG. 7 is a schematic diagram explaining the positional relationship between a three-dimensional moving object and the viewpoints in a three-dimensional coordinate system.
FIG. 8 is a table showing the contents of the viewpoint table.
FIG. 9(a) is a schematic diagram showing an image of a back dancer created with three-dimensional computer graphics technology, and FIG. 9(b) is a schematic diagram showing the points and bones corresponding to motion capture markers.
FIG. 10 is a schematic diagram showing the state of the back dancer contained in each frame of the moving image data.
FIG. 11 is a diagram showing the contents of the motion data corresponding to each frame.
FIG. 12 is a first flowchart showing the processing method of the 3D moving image generating apparatus of the first embodiment.
FIG. 13 is a time chart showing the cameras selected as playback of the musical piece progresses.
FIG. 14(a) is a schematic diagram showing a subject image extracted from the captured video, and FIG. 14(b) is a schematic diagram showing a frame image of a 3D moving image generated by the 3D moving image generating apparatus.
FIG. 15(a) is a schematic diagram showing an example frame image of the generated composite moving image, and FIG. 15(b) is a schematic diagram showing a frame image of the composite moving image from a viewpoint different from that of FIG. 15(a).
FIG. 16 is a schematic diagram showing an example frame image of a composite moving image with a telop superimposed.
FIG. 17 is a diagram showing the data structure of a karaoke musical piece used in a modification of the first embodiment.
FIG. 18 is a second flowchart showing the 3D moving image generation processing method according to the modification of the first embodiment.
FIG. 19 is a block diagram showing the configuration of the moving image generation system according to the second embodiment of the present invention.
FIG. 20(a) is a view of captured video for explaining the process of detecting the position of the subject image, and FIG. 20(b) is a table showing the contents of the position correspondence table.
FIG. 21 is a third flowchart showing the 3D moving image generation processing method according to the second embodiment.
FIG. 22(a) is a schematic diagram showing an example frame image of a composite moving image, and FIG. 22(b) is a schematic diagram showing a frame image in which the viewpoint has changed as the subject image moved.
FIG. 23 shows captured video for explaining the motion detection processing according to a modification of the second embodiment: FIG. 23(a) shows the earlier of two consecutive states, and FIG. 23(b) shows the later state after the subject has moved.
FIG. 24 is a table showing the contents of the motion correspondence table.
FIG. 25 is a fourth flowchart showing the 3D moving image generation processing method according to the modification of the second embodiment.

Explanation of symbols

1 Karaoke system
2 Large display
4 Sub-display
5 Karaoke music server
6 Wall member
10 Moving image generation system
11 Camera device
15a to 15c Back dancers
16a to 16d Stage backgrounds
20 Chroma key device
23 Subject image extraction unit
24 Composition unit
30 3D moving image generating apparatus
38 Moving image generation program
40 Distribution device
41 Karaoke apparatus
41d Music playback processing unit
40f Telop synthesis unit
44 Storage device
45 3D moving image distribution server
46 Content database
80 Position correspondence table
85 Motion correspondence table
D Moving image data
T Viewpoint table
H Subject image

Claims

1. A moving image generation system provided with moving image generation means for generating a moving image including a moving object whose posture and viewpoint are specified on the basis of (i) posture information defining the posture of the moving object in a three-dimensional coordinate system per unit time and (ii) viewpoint information defining the position from which the moving object is viewed, the moving image generation system comprising:
photographing means for photographing a subject;
image extraction means for extracting a subject image included in the video captured by the photographing means; and
composite moving image generation means for combining the subject image extracted by the image extraction means with the moving image generated by the moving image generation means to generate a composite moving image.

2. The moving image generation system according to claim 1, further comprising:
music acquisition means for acquiring a musical piece;
music playback means for playing back the musical piece acquired by the music acquisition means; and
display processing means for displaying the composite moving image in step with playback by the music playback means.

3. The moving image generation system according to claim 2, wherein
the music acquisition means acquires a musical piece accompanied by characters representing its lyrics,
the system further comprises character synthesis means for superimposing the characters accompanying the musical piece onto the composite moving image, and
the display processing means displays the composite moving image onto which the characters have been superimposed by the character synthesis means.

4. The moving image generation system according to claim 2, wherein
the music acquisition means acquires a musical piece to which a plurality of pieces of viewpoint information are attached in the order in which the piece progresses, and
the moving image generation means generates the moving image on the basis of the viewpoint information corresponding to the current progress point of playback by the music playback means.

5. The moving image generation system according to any one of claims 1 to 3, further comprising means for randomly selecting one piece of viewpoint information from a plurality of pieces of viewpoint information,
wherein the moving image generation means generates the moving image on the basis of the selected viewpoint information.

6. The moving image generation system according to any one of claims 1 to 3, further comprising:
image position detection means for detecting the position of the subject image included in the video captured by the photographing means;
a position correspondence table associating each position in the video with viewpoint information; and
means for selecting, from the position correspondence table, the viewpoint information corresponding to the position detected by the image position detection means,
wherein the moving image generation means generates the moving image on the basis of the viewpoint information selected from the position correspondence table.

7. The moving image generation system according to any one of claims 1, 2, 3 and 6, further comprising:
motion detection means for detecting motion of the subject image in the video captured by the photographing means;
a motion correspondence table associating each motion of the subject image with a respective one of a plurality of pieces of viewpoint information; and
means for selecting, from the motion correspondence table, the viewpoint information corresponding to the motion detected by the motion detection means,
wherein the moving image generation means generates the moving image on the basis of the viewpoint information selected from the motion correspondence table.

8. The moving image generation system according to any one of claims 1 to 7, wherein
the posture information defines the postures of a plurality of moving objects in the three-dimensional coordinate system, and
the moving image generation means generates a moving image including the plurality of moving objects on the basis of the posture information.

9. The moving image generation system according to claim 1, further comprising means for storing background information defining the position of a background in the three-dimensional coordinate system,
wherein the moving image generation means generates a moving image including the background on the basis of the stored background information.

10. The moving image generation system according to any one of claims 1 to 9, further comprising storage processing means for storing the composite moving image generated by the composite moving image generation means in a storage medium.

11. The moving image generation system according to any one of claims 1 to 10, further comprising:
receiving means for receiving a moving image request signal transmitted over a network; and
moving image transmission means for transmitting a moving image to the sender of the moving image request signal when the receiving means receives that signal,
wherein the moving image transmission means transmits the composite moving image generated by the composite moving image generation means.

12. A moving image generation method in which a moving image generation system generates a moving image including a moving object whose posture and viewpoint are specified on the basis of (i) posture information defining the posture of the moving object in a three-dimensional coordinate system per unit time and (ii) viewpoint information defining the position from which the moving object is viewed, the method comprising, by the moving image generation system:
photographing a subject;
extracting a subject image included in the captured video; and
combining the extracted subject image with the moving image to generate a composite moving image.
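The extraction and compositing steps recited in claims 1 and 12 can be illustrated with a minimal per-pixel chroma key. This is a sketch under stated assumptions, not the chroma key device's actual method. The assumptions are:

- A green backdrop, with the hypothetical key colour `(0, 255, 0)`.
- A Euclidean RGB distance tolerance.
- Camera frames and rendered 3D frames of equal size.

Pixels close to the key colour are treated as backdrop and replaced by the rendered 3D frame. All other pixels are kept as the subject image.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]  # image[y][x] -> (r, g, b)


def is_key_color(p: RGB, key: RGB, tol: int) -> bool:
    """True if pixel p lies within Euclidean distance tol of the key colour."""
    return sum((a - b) ** 2 for a, b in zip(p, key)) <= tol * tol


def composite(camera: Image, rendered: Image,
              key: RGB = (0, 255, 0), tol: int = 80) -> Image:
    """Overlay the keyed-out subject from `camera` onto the `rendered` 3D frame.

    Backdrop pixels (near the key colour) take the rendered pixel; all other
    pixels are treated as the subject image and copied from the camera frame.
    """
    return [
        [rend if is_key_color(cam, key, tol) else cam
         for cam, rend in zip(cam_row, rend_row)]
        for cam_row, rend_row in zip(camera, rendered)
    ]
```

A hard per-pixel threshold is the simplest choice. Practical keyers usually add soft edges and spill suppression, which are omitted here.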