JP2005049999A

JP2005049999A - Image input system, image input method, program described to execute method on information processing device, and storage medium with program stored

Info

Publication number: JP2005049999A
Application number: JP2003204006A
Authority: JP
Inventors: Yushi Hasegawa; 雄史長谷川; Takashi Kitaguchi; 貴史北口; Norihiko Murata; 憲彦村田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2003-07-30
Filing date: 2003-07-30
Publication date: 2005-02-24

Abstract

PROBLEM TO BE SOLVED: To provide an image input technique that can generate a high-quality three-dimensional image by easily imaging the same subject from different viewpoints with a digital camera or the like and saving an image group in, for example, the QuickTime(R) VR format.

SOLUTION: An image input system for inputting images with imaging means comprises: an imaging part 11 that images the same subject from a plurality of viewpoints to input a plurality of pieces of image information; an attitude detection part 12 that detects attitude information indicating the attitude of the imaging device 1 when it images the subject; an image conversion part 21 that rotationally transforms the image information input by the imaging part 11 according to the attitude information; and an image recording part 22 that records the image information converted by the image conversion part 21 with the attitude information added to it. If the center position of the subject projected on the imaging plane of the imaging device 1 deviates from the center position of the imaging plane, the captured image information is translated so that the center position of the image information coincides with the center position of the subject.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、デジタルカメラやビデオカメラなどにより同一の被写体を様々な視点で撮影した撮影画像群を表示装置などへ３次元的に表示するための画像処理技術に関する。
【０００２】
【従来の技術】
電子技術の飛躍的な進歩によりデジタルカメラのコストが安価になったことに伴って、デジタルカメラの普及が急速に進んだ。デジタルカメラが小型で持ち運びに便利なこと、様々な機器に容易に備え付けられることなどから、デジタルカメラで撮影した画像情報を利用するアプリケーションの開発がおこなわれている。このようなアプリケーション例の一つに電子商取引の広告が挙げられる。デジタルカメラで商品を撮影した画像をＷｅｂ（多くの利用者がネットワークを介して自分の情報処理装置の画面で見ることができるように提供されている電子化情報源）上に掲載するのである。この際、商品を一方向から撮影した画像だけでは、その方向だけでしか商品の外観を見ることができないので、様々な方向から撮影した複数枚の画像を掲載することが必要となる。
複数枚画像の表現方法としては、画像を単純に並べて表示する方法も可能であるが、複数枚の画像がただ単純に並んでいるだけでは、商品全体のイメージが掴みにくい。これを解決するために、複数枚の画像群から商品の３次元形状を算出し、ポリゴンを生成し、ポリゴン上に画像テクスチャを貼り付けたものを表示することにより商品を３次元画像で表現して商品イメージを掴みやすくさせる手法がある。しかし、デジタルカメラで撮影された画像群から３次元形状を精度良く計測することは困難であり、その結果、計測した３次元形状に画像テクスチャを貼り付けて３次元画像を生成すると画質が劣化するという問題があった。
このような問題の解決方法として、撮影した画像群を利用者の指示に応じて切り替えることにより商品を３次元的に表示するＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲなどの表現方法がある。このような表現方法は、画質を劣化させることなく商品のイメージを掴みやすくさせることができるということで期待されている。しかし、前記したＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲなどの表現方法では、利用者の指示に応じて画像群が切替わるように、商品を撮影した際のデジタルカメラと商品との撮影位置関係を画像群データに保存しなければならない。そのため、デジタルカメラと商品の撮影位置関係を計測する必要があり、一般的にＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲなどの画像群を撮影するときには、ＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲ画像撮影専用装置を使用する。
【０００３】
なお、特開平９−８１７９０号公報記載の従来技術では、加速度センサや角速度センサなどによりカメラの動きを検出し、異なる視点からの光軸が任意の点で交わるように光軸方向を補正して３次元画像を生成する。その際、センサ情報と予め設定した被写体・カメラ間距離から算出した動きベクトルの推定値と、画像処理により求めた動きベクトルとの比較により被写体の検出をおこなう。
また、特開平９−１８６９５７号公報に示された従来技術では、複数の撮像手段の位置ベクトル（Ｘ，Ｙ，Ｚ）と姿勢（θ，φ）を撮影した画像情報に付加して保存することにより、同一の被写体を様々な視点から撮影した画像群を用いて撮影した画像群を３次元的に表示するように記録する。
また、特開平９−２４５１９５号公報に示された従来技術では、異なる視点から撮影された画像群を光線空間に射影し、光線空間全域に光線空間データを作成することにより光線空間データから撮影した視点とは異なる任意視点の画像を生成する。
また、特開平１１−３０６３６３号公報に示された従来技術では、撮影装置に姿勢検出センサを取り付け、撮像装置の位置を移動させた際に変化する姿勢情報をセンサ情報から算出し、並進移動情報は少なくとも２枚以上の画像内にある対応点の位置情報から算出することにより移動した撮像装置の位置情報を精度良く算出し被写体の３次元形状を計測する。
また、特開２００１−１７７８５０公報に示された従来技術では、撮像手段に姿勢センサを取り付けて姿勢情報を検出し、画像信号の変化分から移動速度を求めて並進成分情報を算出する。これにより、姿勢情報と並進成分情報から被写体の３次元空間内の位置を検出し、被写体を３次元空間内の位置から画像平面に投影する。
【特許文献１】特開平９−８１７９０号公報
【特許文献２】特開平９−１８６９５７号公報
【特許文献３】特開平９−２４５１９５号公報
【特許文献４】特開平１１−３０６３６３号公報
【特許文献５】特開２００１−１７７８５０公報
【特許文献６】特開平１１−３７７３６号公報
【０００４】
【発明が解決しようとする課題】
前記したように、ＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲなどの表現方法では、一般的にＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲなどの画像群を撮影する場合、ＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲ画像撮影専用装置を使用するなど特殊な撮影環境でなければＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲの画像群を生成できず、したがって、専用装置を持っていない一般の人々がＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲ形式の画像を生成することは困難であり、そのため、ＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲ形式の表示は商品イメージを掴みやすくする表現方法でありながらも、一部のＷｅｂサイトでしか用いられていないという問題があった。
また、特開平９−８１７９０号公報記載の従来技術では、被写体とカメラの距離が予め設定されているので、特定の撮影条件下でしか３次元画像を生成できないし、光軸の向きを変えるための駆動機構が必要であるので、装置の構造が複雑になる。
また、特開平９−１８６９５７号公報記載の従来技術では、位置ベクトル（Ｘ，Ｙ，Ｚ）と姿勢（θ，φ）が既知である、固定された複数の撮像手段を主に対象にしているので、単眼の撮像手段である場合には、位置ベクトル（Ｘ，Ｙ，Ｚ）と姿勢（θ，φ）を算出することができず、撮影した画像に位置・姿勢情報を付加して保存することができない。また、撮影画像面の中心位置からずれて被写体が投影された場合に関して考慮していないので、撮影時に画像面の中心に被写体が投影されるように撮影しなければ、複数の画像間で被写体の投影されている位置が変化してしまい、利用者の指示により画像を切り替えたときに画像面上で被写体が上下左右に平行移動してしまい、３次元的に表示することができない。
【０００５】
また、特開平９−２４５１９５号公報に示された従来技術では、姿勢情報と並進成分情報が既知である場合を対象にしているので、単眼の撮像手段を使用するときなど姿勢情報や並進成分情報が不明である場合には、光線空間データを生成することができない。したがって、複数枚の画像から、撮影した視点とは異なる任意視点の画像を生成することもできなくなる。
また、特開平１１−３０６３６３号公報に示された従来技術では、被写体の３次元的形状を作成するには、画像内の対応点を数多く検出しなければならないので、誤対応点が含まれる可能性が高くなり、３次元形状の計測精度が低下し、画像劣化を生ずる可能性がある。
また、特開２００１−１７７８５０公報に示された従来技術では、ビデオカメラなどで連続的に被写体を撮影する場合を想定して撮影画像信号の変化分から並進成分情報を算出しているので、デジタルカメラなど非連続的に被写体を撮影した場合には、撮影画像信号の変化分が大きくなり、並進成分情報の精度を正確に算出することができない。
本発明の目的は、このような従来技術の問題を解決することにあり、具体的には、専用装置を持っていない一般の人々でもデジタルカメラなど単眼の撮像装置を使用して、同一の被写体を簡単に様々な視点から撮影し、例えばＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲ形式の画像群を保存することにより、高画質の３次元画像を特定の撮影条件下だけでなく生成できる画像入力技術を提供することにある。
【０００６】
【課題を解決するための手段】
前記の課題を解決するために、請求項１記載の発明では、撮像手段を用いて画像を入力する画像入力装置において、少なくとも２箇所以上の視点から同一の被写体を撮影して複数枚の画像情報を入力する撮像手段と、前記被写体を撮影した際の前記撮像手段の姿勢を示す姿勢情報を検出する姿勢検出手段と、前記撮像手段により入力された前記画像情報を前記姿勢情報に基づいて回転変換する画像変換手段と、その画像変換手段により変換された画像情報に前記姿勢検出手段により検出された姿勢情報を付加して記録する画像記録手段とを備えた。
また、請求項２記載の発明では、請求項１記載の発明において、前記撮像手段の撮像面に投影される被写体の中心位置が撮像面の中心位置から外れた場合、撮影された画像情報の中心位置を並進移動させることにより、画像情報の中心位置と被写体の中心位置とを一致させる構成にした。
また、請求項３記載の発明では、請求項１記載の発明において、前記撮像手段により入力された複数の画像情報を変倍する画像変倍手段を備えた。
また、請求項４記載の発明では、請求項１記載の発明において、前記撮像手段が前記被写体を撮像した際の視点位置とは異なる視点から前記被写体を撮影したときに入力される画像情報を生成する任意視点画像生成手段を備えた。
【０００７】
また、請求項５記載の発明では、撮影した画像を入力する画像入力方法において、少なくとも２箇所以上の視点から同一の被写体を撮影して複数枚の画像情報と前記被写体を撮影した際の撮像素子の姿勢を示す姿勢情報とを取得し、取得した前記画像情報を前記姿勢情報に基づいて回転変換し、回転変換された画像情報に前記姿勢情報を付加して記録する構成にした。
また、請求項６記載の発明では、請求項５記載の発明において、撮影された被写体の中心位置が入力された画像情報の中心位置から外れた場合、その画像情報の中心位置を並進移動させることにより画像情報の中心位置と被写体の中心位置とを一致させる構成にした。
また、請求項７記載の発明では、請求項５記載の発明において、入力された複数の画像情報を変倍する構成にした。
また、請求項８記載の発明では、請求項５記載の発明において、前記被写体を撮像した際の視点位置とは異なる視点から前記被写体を撮影したときに入力される画像情報を生成する構成にした。
また、請求項９記載の発明では、情報処理装置上で実行されるプログラムにおいて、請求項５乃至請求項８のいずれか１項に記載の画像入力方法によった画像入力を実行させるようにプログラミングされている構成にした。
また、請求項１０記載の発明では、プログラムを記憶した記憶媒体において、請求項９記載のプログラムを記憶した。
【０００８】
【発明の実施の形態】
以下、図面により本発明の実施の形態を詳細に説明する。
図１は、本発明の第１の実施形態を示す、画像入力装置の構成ブロック図である。図示したように、この実施例の画像入力装置は、撮像装置１および画像処理装置２を備え、撮像装置１は、撮像部１１、姿勢検出部１２、および情報転送部１３を備え、画像処理装置２は、画像変換部２１および画像記録部２２を備える。なお、撮像装置１と画像処理装置２とを一つの装置として同一筐体内に組み込んでもよい。一つの装置とした場合、情報転送部１３は削減可能である。
前記において、撮像部１１は被写体を撮影して画像情報を入力する。一例を挙げれば、デジタルカメラやビデオカメラなどである。また、姿勢検出部１２は撮像装置１の姿勢情報を検出する。この姿勢検出部１２としては、例えば３軸加速度センサと３軸磁気センサを用いる。または、３軸磁気センサの代わりに角速度センサを用いる構成や、磁気センサと角速度センサを併用する構成なども可能である。
情報転送部１３は、撮像部１１により撮影された画像情報と姿勢検出部１２により検出された姿勢情報を画像処理装置２へ転送する。転送方式としては、有線方式（例えばＵＳＢ、ＳＣＳＩ、ＩＥＥＥ１３９４など）や無線方式（例えば無線ＬＡＮ、ＢｌｕｅＴｏｏｔｈなど）、記憶媒体（例えば半導体メモリ、スマートメディア、メモリスティック、フラッシュメモリなど）の移動などを用いる。
画像変換部２１は、姿勢検出部１２により検出された姿勢情報を用いて情報転送部１３により転送される画像情報の画像変換をおこなう。また、画像記録部２２は、例えば半導体メモリ、ハードディスク記憶装置（ＨＤＤ）、磁気テープ、またはＤＶＤ−ＲＷで構成され、画像変換部２１により変換された画像情報、および情報転送部１３により転送された姿勢情報を所定のデータ形式で記録する。
【０００９】
図２に、撮像装置１の詳細な構成を示す。このような構成で、被写体の像は、固定レンズ３１、ズームレンズ３２、絞り機構３３、およびフォーカスレンズ３５を通し、シャッタ３４により露光時間を制御され、撮像素子３６上に形成される。そして、撮像素子３６からの画像信号はＣＤＳ（ＣｏｒｒｅｌａｔｅｄＤｏｕｂｌｅＳａｍｐｌｉｎｇ：相関二重サンプリング）回路３７においてサンプリングされた後、Ａ／Ｄ変換器３８によりデジタル信号化される。このときのタイミングはＴＧ（ＴｉｍｉｎｇＧｅｎｅｒａｔｏｒ）３９により生成される。
画像信号はその後、ＩＰＰ（ＩｍａｇｅＰｒｅ−Ｐｒｏｃｅｓｓｏｒ）４０によりアパーチャ補正などの画像処理および圧縮処理などを施され、一時的に画像バッファメモリ４１に保存される。一時的に保存された画像情報は、さらに、ＭＰＵ（主制御部）４２内にある記憶領域に保存される。
このＭＰＵ４２は各ユニットの動作を制御するとともに、入力指示スイッチ４３からの画像撮影開始指示を受け付け、さらに、姿勢検出部１２を構成している３軸加速度センサ４５および３軸磁気センサ４６からのセンサ出力信号を第２のＡ／Ｄ変換器４７を介して検出する。検出されたセンサ出力信号は撮影された画像情報に添付され、ＭＰＵ４２内にある記憶領域に保存される。ＭＰＵ４２の記憶領域に保存されたデータは、情報転送部１３を介して画像処理装置２に渡される。なお、情報転送部１３としては、ＰＣ（パーソナルコンピュータ）用の汎用インターフェース、例えば、ＲＳ−２３２Ｃ、ＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）、ＩＥＥＥ１３９４、ネットワークアダプタ、ＩｒＤＡ（ＩｎｆｒａｒｅｄＤａｔａＡｓｓｏｃｉａｔｉｏｎ）、無線ＬＡＮ、ＢｌｕｅＴｏｏｔｈなどを用いる。
【００１０】
図３は画像処理装置２のハードウェア構成を示す構成ブロック図である。図示したように、この画像処理装置２は、画像変換部２１などを構成するとともに、各部の制御を実行するＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）５１およびＳＤＲＡＭ（ＳｙｎｃｈｒｏｎｏｕｓＤｙｎａｍｉｃＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）５２、ＨＤＤ（ＨａｒｄＤｉｓｋＤｒｉｖｅ：ハードディスク記憶装置）５３、マウスなどポインティングデバイスやキーボードやボタンなどからデータや信号を入力させる入力インターフェース部（以下、入力Ｉ／Ｆと略す）５４、ＣＲＴ（ＣａｔｈｏｄｅＲａｙＴｕｂｅ）など表示装置５５、その表示装置５５の表示制御をおこなう表示インターフェース部（以下、表示Ｉ／Ｆと略す）５６、ＣＤ−ＲＷ（ＣｏｍｐａｃｔＤｉｓｋＲｅｗｒｉｔａｂｌｅ）ドライブなど記録装置５７、撮像装置１やプリンタなど外部機器およびインターネットなど通信回線と有線または無線接続する外部インターフェース部（以下、外部Ｉ／Ｆと略す）５８などを備え、バスによりそれらが接続された構成である。なお、ＳＤＲＡＭ５２は、ＣＰＵ５１の作業記憶領域として用いられるとともに、本実施形態によった画像変換処理など画像処理を実行するための処理プログラムや制御プログラムなどの記憶領域として用いられる。処理プログラムは、例えば記録装置５７を介してＳＤＲＡＭ５２に格納されるか、ＨＤＤ５３に一旦保存された後に必要なときにＳＤＲＡＭ５２に格納される。または外部Ｉ／Ｆ５８に接続された通信回線を介してＳＤＲＡＭ５２に格納される。また、処理の対象となる画像情報は、記録装置５７または外部Ｉ／Ｆ５８を介して撮像装置１から入力される。
【００１１】
次に、撮像装置１の姿勢検出部１２による姿勢検出の一例として、姿勢検出部１２を３軸加速度センサ４５と３軸磁気センサ４６で構成した例について説明する。なお、姿勢検出部１２は、ワールド座標系（ＸＹＺ座標）の向きと撮影したときの撮像装置１に固有な装置座標系（ｘｙｚ座標）の向きとを比較することにより撮像装置１の姿勢情報を検出する。
まず、装置座標系とワールド座標系を以下のように定義する。
（ａ）装置座標系：ｘｙｚ座標系（図４参照）
ｘ軸：画像面右向きを正
ｙ軸：画像面下向きを正
ｚ軸：光軸方向；対象に向かう向きを正
原点ｏ：撮像装置１の光学中心
ｆ：カメラの焦点距離
ｐ：撮像装置１の光学中心から対応点までのベクトル成分
（ｂ）ワールド座標系：ＸＹＺ座標系（図５参照）
Ｙ軸：重力加速度の向きを正
Ｚ軸：磁気の向きを正
Ｘ軸：ＸＹＺの順に右手直交系をなす向き
簡単化のため、撮像装置１の移動により生じる運動加速度は無視でき、重力加速度と磁場は直交し、かつ磁場は地磁気以外に存在しないと仮定する。この場合、厳密には地磁気の伏角が存在し、地磁気と重力加速度とは直交しないが、伏角が既知ならば地磁気の向きと重力加速度が直交する場合と同様に計算できる。また、３軸で地磁気を検出すれば伏角が未知でも姿勢情報を計算可能である。つまり、Ｘ軸は東向き、Ｚ軸は北向きを正にとる。
ワールド座標系に対する装置座標系の向きを、ワールド座標系を基準とした（１）式の回転行列ベクトルＲで記述する。

但し、（１）式において、α、β、γはそれぞれワールド座標系を基準としたＸ軸、Ｙ軸、Ｚ軸回りの回転角であり、このとき撮像装置１を、以下の順に回転させたことに相当する。つまり、ＸＹＺ座標系とｘｙｚ座標系が一致している状態から、次の（１）、（２）、（３）による回転の結果、回転行列ベクトルＲが成立する。
（１）撮像装置１を、Ｚ軸回りにγだけ回転する。
（２）撮像装置１を、Ｘ軸回りにαだけ回転する。
（３）撮像装置１を、Ｙ軸回りにβだけ回転する。
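Equation (1) is not reproduced in this text, so the following is only a minimal sketch of how the rotation order listed above could be composed. It assumes right-handed elementary rotation matrices about the fixed world axes, which gives R = R_Y(β) R_X(α) R_Z(γ); the function names are hypothetical and the sign conventions may differ from those of the original equation.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(p, q):
    # 3x3 matrix product
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def attitude_matrix(alpha, beta, gamma):
    # Rotations about fixed world axes: Z (gamma) first, then X (alpha),
    # then Y (beta). Later rotations multiply on the left.
    return matmul(rot_y(beta), matmul(rot_x(alpha), rot_z(gamma)))
```

With all three angles at zero the result is the identity, which corresponds to the device and world coordinate systems coinciding.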
今、重力加速度ベクトルと地磁気ベクトルがワールド座標系においてそれぞれ（２）式で表され、３軸加速度センサ４５および３軸磁気センサ４６により検出された装置座標系を基準とした加速度ベクトルおよび地磁気ベクトルをそれぞれ（３）式で表されるとする。

そのとき、ベクトルｇとベクトルａ、およびベクトルＭとベクトルｍの関係は、回転行列ベクトルＲを用いて以下の（４）式および（５）式で記述される。以下記号Ｒ、ｇ（なおｇはスカラの場合もある）、Ｍ（なおＭはスカラの場合もある）、ａ、ｍはベクトルを表す。
Ｒａ＝ｇ（４）
Ｒｍ＝Ｍ（５）
（４）式からＸ軸回りの回転角αとＺ軸回りの回転角γが（６）式および（７）式のように計算される。

また、Ｚ軸回りの回転角γが既知である場合、（５）式より、地磁気ベクトルｍからＸ軸回りの回転角αが（８）式のように、Ｙ軸回りの回転角βが（９）式のように計算される。

但し、加速度ベクトルａを用いて求めたＸ軸回りの回転角αを用いるならば、（８）式の計算は不要である。以上の計算により、α、β、γおよび回転行列Ｒを、３軸加速度センサ４５と３軸磁気センサ４６から検出することができ、装置座標系をワールド座標系に変換することができる。
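Equations (6) to (9) are missing from this text. As a hedged illustration, the sketch below derives the three angles from relations (4) and (5) under two assumptions that are not confirmed by the source: R = R_Y(β) R_X(α) R_Z(γ) with right-handed elementary rotations, and world vectors g = (0, g, 0) and M = (0, 0, M). The function name `attitude_angles` is hypothetical, and the closed forms may differ in sign from the original equations.

```python
import math

def attitude_angles(a, m):
    """Return (alpha, beta, gamma) from device-frame acceleration a and magnetism m.

    Assumes R a = g with g along world Y, R m = M with M along world Z,
    and R = Ry(beta) * Rx(alpha) * Rz(gamma).
    """
    ax, ay, az = a
    # From a = R^T g: a is proportional to (cos(alpha) sin(gamma),
    # cos(alpha) cos(gamma), -sin(alpha)).
    alpha = math.atan2(-az, math.hypot(ax, ay))
    gamma = math.atan2(ax, ay)
    # Apply Rz(gamma) and then Rx(alpha) to m; what remains is Ry(-beta) M.
    mx, my, mz = m
    cg, sg = math.cos(gamma), math.sin(gamma)
    x1, y1, z1 = cg * mx - sg * my, sg * mx + cg * my, mz
    ca, sa = math.cos(alpha), math.sin(alpha)
    z2 = sa * y1 + ca * z1
    beta = math.atan2(-x1, z2)
    return alpha, beta, gamma
```

Because only the x and z components of the de-rotated magnetic vector are used for β, a vertical (dip) component of the geomagnetic field does not change the result under these assumptions.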
複数枚の画像を撮影した際の姿勢情報の変化量を検出することも可能である。基準画像を撮影したときの姿勢情報を回転行列Ｒ_Ａ、相対画像を撮影したときの姿勢情報を回転行列Ｒ_Ｂとすると、基準画像を撮影した際の撮像装置の姿勢から相対画像を撮影した際の撮像装置の姿勢までの変化量の回転行列Ｒ_ＡＢは、
Ｒ_ＡＢ＝Ｒ_Ｂ／Ｒ_Ａとなる。
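The quotient notation above is not a defined matrix operation. One plausible reading, which is an assumption here, is R_AB = R_A^{-1} R_B. That reading matches the later use of R to express a direction from the second view in the first view's device coordinates. Since the inverse of a rotation matrix is its transpose, a minimal sketch with hypothetical helper names is:

```python
def transpose(r):
    return [list(col) for col in zip(*r)]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def relative_rotation(r_a, r_b):
    # Assumed reading of "R_B / R_A": inverse(R_A) * R_B, where the inverse
    # of a rotation matrix equals its transpose.
    return matmul(transpose(r_a), r_b)
```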
【００１２】
次に、画像変換部２１がおこなう画像変換について説明する。
ＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲなどによった複数の画像を３次元的に表示するための画像群を作成するときには、一般的に画像情報へ前記した姿勢情報としてθ、φの角度情報を付加して記録する。ここで、θおよびφは被写体位置を座標中心としたときに、撮像装置１が移動した際の移動前後の角度である（図６参照）。θおよびφの角度情報を付加した画像情報を図７に示したように複数枚保存し、利用者の指示に応じて画像群を切り替えて表示することにより３次元的に表示するのである。
図８に示したような撮影専用装置３を被写体Ｈと撮像装置１に取り付けて撮影すれば、撮影専用装置３の移動情報からθ、φの角度情報を検出することができる。また、撮影専用装置３を用いると、撮像装置１はθ、φ方向にだけ回転し、ψ方向には回転しないので、画像ごとに被写体Ｈがψ方向に回転して投影されることがない（図９参照。但し、各図にはθの変化の反映を省略している）。それに対して、撮影専用装置３を用いず、撮像装置１を固定しない状態で撮影すると、撮像装置１はθ、φ、ψ方向全部が回転する（図１０参照。但し、各図にはθおよびφの変化の反映を省略している）。つまり、画像群にθ、φの角度情報を付加して表示しても、ψ角度の回転があるので、画像ごとに被写体が回転して投影されてしまうのである。
したがって、画像変換部２１では、回転して投影された画像（図１０参照）を−ψ方向へ回転させることにより、回転していない画像（図９参照）へ変換する。なお、回転変換を実行することにより、画像上に画像情報のない空白部分が生成される（図１１参照）。このような空白部分に、画像背景部分の色を適用して描画してもよい。また、図１２に示したように、隣接する画像情報と空白となる部分の画像情報とが重なっているときには、隣接する画像情報を空白部分に投影して描画してもよい。
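The rotation by -ψ with background fill described above can be sketched as follows. This is only an illustration: it assumes the image is a 2-D list of pixel values, rotates about the image centre with nearest-neighbour sampling, and fills pixels that have no source with a caller-supplied background value. The function name and the sign convention for ψ in image coordinates are assumptions.

```python
import math

def rotate_image(img, psi, background=0):
    """Return img rotated by -psi radians about its centre."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    c, s = math.cos(psi), math.sin(psi)
    out = [[background] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: rotate the output position by +psi to find
            # the source pixel that lands here after a rotation by -psi.
            dx, dy = x - cx, y - cy
            ix = int(round(c * dx - s * dy + cx))
            iy = int(round(s * dx + c * dy + cy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
            # Otherwise the pixel stays blank (background colour).
    return out
```

Filling the blank corners from an overlapping neighbouring image, the other option mentioned in the text, is not shown here.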
【００１３】
次に、画像記録部２２がおこなう画像記録について説明する。
画像記録部２２は、複数の画像を３次元的に表示するために、画像変換部２１により変換された画像情報に姿勢検出部１２により検出された姿勢情報（θ、φの角度情報）を付加して所定のデータ形式で記録・保存する（図１３参照）。このように画像群を記録することにより、利用者の指示に応じて画像群を切り替えて表示することが可能となり、画像群を３次元的に表示することが可能になる。図１４に、撮影専用装置３（図８参照）を用いない場合について動作フローを示す。以下、この動作フローについて説明する。
まず、撮像装置１において２箇所以上の視点から撮影された画像情報と撮像装置１の姿勢情報を取得し、メモリに記憶する（ステップＳ１）。そして、その姿勢情報から撮像装置１の回転角度θ、φ、ψを算出する（ステップＳ２）。このステップＳ２の内訳は次の（Ｓ２−１）（Ｓ２−２）（Ｓ２−３）である。
（Ｓ２−１）３軸加速度センサ４５の出力信号から得られる電圧値の大きさを読み取り、３軸方向の加速度ベクトルの大きさを比較し、重力方向に対する撮像装置１の傾きを検出する。
（Ｓ２−２）３軸磁気センサ４６の出力信号から得られる電圧値の大きさを読み取り、３軸方向の磁気ベクトルの大きさを比較し、地磁気方向に対する撮像装置１の傾きを検出する。
（Ｓ２−３）加速度センサ４５により検出された重力方向に対する傾きと、磁気センサ４６により検出された地磁気方向に対する傾きを合成することにより撮像装置１の回転角度θ、φ、ψを検出する。
【００１４】
次に、画像処理装置２の画像変換部２１が、画像情報を−ψ方向に回転させて画像変換し、画像変換により生じた空白領域を隣接する画像情報で補間する（ステップＳ３）。そして、画像変換部２１は、画像変換した画像情報に回転角度θ、φ情報を付加し、画像記録部２２に記憶する（ステップＳ４）。ＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲ形式の画像を生成するためには、姿勢情報（θ、φ）を用いて複数枚撮影した画像群を図９のように整列させる必要があるので、画像変換した画像情報に回転角度θ、φ情報を付加して記憶するのである。ＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲでは、姿勢情報によって整列された画像の位置情報を用いることにより、利用者のマウス操作に合わせた２次元画像を表示し、３次元的画像に見せるわけである。
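As a rough illustration of why θ and φ are stored with each converted image, the sketch below keeps one record per image and picks the record closest to a requested viewing direction. The record layout, the field names, and the nearest-angle rule are all hypothetical. This is not the QuickTime VR file format, which the text does not describe.

```python
from dataclasses import dataclass

@dataclass
class ViewRecord:
    theta_deg: float   # horizontal angle around the subject
    phi_deg: float     # vertical angle around the subject
    image_path: str    # hypothetical location of the converted image

def nearest_view(records, theta_deg, phi_deg):
    """Pick the stored view whose (theta, phi) is closest to the request."""
    def gap(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)   # angles wrap around at 360 degrees
    return min(records,
               key=lambda r: gap(r.theta_deg, theta_deg) ** 2
                             + gap(r.phi_deg, phi_deg) ** 2)
```

A viewer could map mouse movement to a requested (θ, φ) and display the image returned by `nearest_view`.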
こうして、この実施形態によれば、一般の人々でも、デジタルカメラなど単眼の撮像装置１を使用して、撮像装置１の姿勢を気にすることなく、被写体を様々な視点から複数枚簡単に撮影して、３次元的に表示する画像群データを自動的に生成し、例えばＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲ形式で姿勢情報とともに保存することができる。
【００１５】
次に、本発明の第２の実施形態について説明する。
図１５は、本発明の第２の実施形態を示す、画像入力装置の構成ブロック図である。また、図１６はその説明図である。図示したように、この実施例の画像入力装置は、撮像装置１と画像処理装置２ａを備え、撮像装置１は、撮像部１１、姿勢検出部１２、および情報転送部１３を備えている。また、画像処理装置２ａは、画像変換部２１、画像記録部２２を備えると共に、対応検出部２３、並進成分算出部２４、３次元位置算出部２５、および被写体中心位置算出部２６を備えている。なお、撮像装置１と画像処理装置２ａを一体の装置として構成してもよい。一体の装置とすることにより情報転送部１３を削減することも可能である。また、撮像部１１、姿勢検出部１２、および情報転送部１３は、第１の実施形態と同じ構成である。
画像変換部２１は、姿勢検出部１２により検出された姿勢情報と並進成分算出部２４により算出された並進成分情報を用い、情報転送部１３により転送された画像情報の画像変換をおこなう。画像記録部２２は第１の実施形態と同じ構成である。
また、対応検出部２３は情報転送部１３により転送された画像情報から画像間の対応点の装置座標系の位置情報を検出し、並進成分算出部２４は情報転送部１３により転送された姿勢情報と対応検出部２３により検出された対応点の位置情報から並進成分情報を算出する。また、３次元位置算出部２５は対応点の装置座標系の位置情報と姿勢情報と並進成分情報から対応点のワールド座標系の３次元位置を算出し、被写体中心位置算出部２６は、対応点の３次元位置情報より、被写体の中心位置を算出する。
【００１６】
次に、対応検出部２３における画像間の対応点検出の一例として、２箇所の視点から撮影された基準画像と相対画像間の対応点検出を図１７により説明する。
まず、基準画像内にある画像情報から特徴量の抽出をおこなって特徴点を検出する。なお、画像情報から特徴量を抽出して特徴点を検出する方法は従来から多くの研究がされており、例えば徐剛、辻三郎著「３次元ビジョン」（共立出版）の３章「特徴抽出」などに記述されている。
次に、基準画像から抽出した特徴量を元にして検出した基準画像の特徴点に対応する対応点を相対画像から検出する。図１７に、対応点検出の一例として、相互相関によるブロックマッチングにより対応点を検出する手法を示す。つまり、基準画像におけるｉ番目の特徴点（ｘｉ０，ｙｉ０）と、相対画像における点（ｘｉ０＋ｄｘ，ｙｉ０＋ｄｙ）の対応付けを（２Ｎ＋１）（２Ｐ＋１）の相関窓（図１７参照）におけるブロックマッチングでおこなう場合の相互相関値Ｓｉは（１０）式で計算されるのである。

なお、Ｎ、Ｐは相関窓の大きさを表す任意に定めた定数である。また、（１０）式における各記号の意味は以下の通りである。

Ｋ：定数
この実施形態では、各特徴点に対して、このような相互相関値Ｓｉを最大にする対応点を順次検出することにより基準画像と相対画像との間の対応点を検出するのである。
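Equation (10) and its symbol definitions are missing from this text, so the sketch below substitutes a standard zero-mean normalised cross-correlation over a (2N+1) by (2P+1) window. It is not claimed to be identical to equation (10), and the function names and the `search` radius parameter are assumptions.

```python
import math

def ncc(ref, rel, x0, y0, dx, dy, n, p):
    """Zero-mean normalised cross-correlation of two (2n+1) x (2p+1) windows."""
    a, b = [], []
    for v in range(-p, p + 1):
        for u in range(-n, n + 1):
            a.append(ref[y0 + v][x0 + u])
            b.append(rel[y0 + dy + v][x0 + dx + u])
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((s - ma) * (t - mb) for s, t in zip(a, b))
    den = math.sqrt(sum((s - ma) ** 2 for s in a)
                    * sum((t - mb) ** 2 for t in b))
    return num / den if den > 0 else 0.0

def find_correspondence(ref, rel, x0, y0, n, p, search):
    """Return (score, dx, dy) maximising the correlation within +/- search."""
    h, w = len(rel), len(rel[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip offsets whose window would leave the relative image.
            if not (n <= x0 + dx < w - n and p <= y0 + dy < h - p):
                continue
            score = ncc(ref, rel, x0, y0, dx, dy, n, p)
            if best is None or score > best[0]:
                best = (score, dx, dy)
    return best
```

The feature point is assumed to lie at least N columns and P rows inside the reference image so that its own window is valid.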
【００１７】
次に、並進成分算出部２４における並進成分算出の一例を説明する（特開平１１−３０６３６３号公報に詳細な例がある）。
並進成分の算出では、姿勢検出部１２から得た姿勢情報と対応検出部２３から得た対応点を用いて簡単な計算により並進成分情報を求める。その一例として、図１８に示した視点Ａで撮影された画像Ａと視点Ｂで撮影された画像Ｂの並進成分情報ｔを算出する方法を以下に説明する。
図１８に示した画像内にある点（ｘＡ，ｉ，ｙＡ，ｉ）と（ｘＢ，ｉ，ｙＢ，ｉ）は、それぞれ画像Ａと画像Ｂに被写体の特徴点Ｏｂｊｅｃｔｉを投影した点を示す。ｉは被写体の対応点を識別するための番号を表す。また、画像Ａから画像Ｂまでの姿勢情報の変化を姿勢情報Ｒで表す。また、ｐＡ，ｉは画像Ａ内にある対応点と視点Ａでの光学中心を結ぶベクトルを、ｐＢ，ｉは画像Ｂ内にある対応点と視点Ｂでの光学中心を結ぶベクトルを表す。ｐＡ，ｉとｐＢ，ｉは、それぞれ（ｘ，ｙ，ｚ）座標系と（ｘ’，ｙ’，ｚ’）座標系とで異なるので、ｐＢ，ｉに姿勢情報Ｒを乗ずる（ＲｐＢ，ｉ）ことにより、（ｘ，ｙ，ｚ）座標系でのｐＢ，ｉの向きを記述する。ここで、ｐＡ，ｉとＲｐＢ，ｉとｔのベクトルは、被写体の特徴点Ｏｂｊｅｃｔｉと視点Ａでの光学中心と視点Ｂでの光学中心を結んだ面上に存在する（図１８参照）。よって、ｐＡ，ｉとＲｐＢ，ｉとｔが作るスカラ３重積（立体の体積を表す）は０となるので（１１）式が成り立つ。

求める並進成分情報ｔはベクトルの向きを表すので変数は２つとなる。したがって、被写体の特徴点Ｏｂｊｅｃｔｉに対応する対応点を２組以上検出し、対応点の位置情報を（１１）式に代入し、連立方程式を解くことによって並進成分情報ｔを算出することができる。対応点が２組以上検出される場合には、相互相関値Ｓｉが高いものだけを使用して算出してもよいし、画像ノイズによる誤差を考慮して（１１）式のスカラ３重積を最小化する並進成分情報ｔを求めてもよいし、多数の並進成分情報を求めて並進成分情報群を投票空間に投影して最適な並進成分情報ｔを求めてもよい（岡谷貴之、出口光一郎著「３次元向きセンサを取り付けたカメラを用いた投票によるカメラの並進運動の推定」ヒューマンインタフェース・コンピュータビジョンとイメージメディア２００１．９．１３）。
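Equation (11) is not reproduced in this text. From the description it states that the scalar triple product of p_A,i, R p_B,i and t vanishes, that is, (p_A,i × R p_B,i) · t = 0. The following minimal sketch solves this for exactly two correspondences: each pair yields a normal vector perpendicular to t, so t is parallel to the cross product of the two normals. The function names are hypothetical, the sign of t is left undetermined, and the least-squares and voting variants mentioned above are not shown.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def mat_vec(r, v):
    return tuple(sum(r[i][k] * v[k] for k in range(3)) for i in range(3))

def translation_direction(pairs, r):
    """pairs: [(p_a, p_b), (p_a, p_b)] ray vectors; r: 3x3 rotation from B to A.

    Returns a unit vector parallel to t (sign ambiguous).
    """
    # Each normal n_i = p_a x (R p_b) satisfies n_i . t = 0.
    n1, n2 = (cross(p_a, mat_vec(r, p_b)) for p_a, p_b in pairs[:2])
    t = cross(n1, n2)
    norm = math.sqrt(sum(c * c for c in t))
    if norm == 0.0:
        raise ValueError("degenerate correspondences: normals are parallel")
    return tuple(c / norm for c in t)
```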
【００１８】
次に、３次元位置算出部２５による対応点の３次元位置算出例を説明する。
対応点の３次元位置を算出するには、画像上の対応点の位置情報と姿勢情報と並進成分情報とを用いて３角測量の原理により算出する。並進成分算出部２４が前記した並進成分算出方法により並進成分情報ｔを算出した場合、算出できるのは並進移動方向だけであり、並進移動した距離は算出できない。それに対して、３角測量の原理では、画像Ａ、Ｂを用いて求めた３次元位置と画像Ｂ、Ｃを用いて求めた３次元位置とのスケールが一致しなくなるという問題を生じるが、図１９に示したように３枚の画像Ａ、Ｂ、Ｃに同じ対応点が取得されている場合においては、画像Ａ、Ｂを用いて求めた３次元的な位置と画像Ｂ、Ｃを用いて求めた３次元的な位置とのスケール比を算出することができる。これは画像Ｂ、Ｃ、Ｄを用いたときにも同様なことが成り立つので、全ての画像間で算出した３次元位置のスケール比を算出することが可能となる。撮像装置１が実際に並進移動した距離を算出することはできないが、全ての画像間でスケール比が算出できるので、画像Ａ、Ｂ間の並進成分情報ｔの距離を１と仮定した、対応点の３次元位置を算出することが可能となるのである。
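The triangulation step can be illustrated with a midpoint method: the two viewing rays rarely intersect exactly, so the point halfway between their closest points is taken. This is a sketch under assumptions, not the method confirmed by the source. Viewpoint A is placed at the origin, viewpoint B at t with |t| taken as 1 as the text suggests, and the ray from B is already expressed in A's coordinates (R p_B). The function name is hypothetical.

```python
def triangulate_midpoint(o_a, d_a, o_b, d_b):
    """Midpoint of the shortest segment between rays o_a + s*d_a and o_b + u*d_b."""
    dot = lambda p, q: sum(x * y for x, y in zip(p, q))
    w = [o_a[k] - o_b[k] for k in range(3)]
    a, b, c = dot(d_a, d_a), dot(d_a, d_b), dot(d_b, d_b)
    d, e = dot(d_a, w), dot(d_b, w)
    denom = a * c - b * b
    if denom == 0.0:
        raise ValueError("rays are parallel")
    s = (b * e - c * d) / denom   # parameter along ray A
    u = (a * e - b * d) / denom   # parameter along ray B
    return tuple((o_a[k] + s * d_a[k] + o_b[k] + u * d_b[k]) / 2.0
                 for k in range(3))
```

Because only the direction of t is known, the returned coordinates are in units of the A to B baseline length.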
次に、被写体中心位置算出部２６における被写体中心位置算出例を説明する。
この実施例における被写体の中心位置算出では、３次元位置算出部２５により算出された対応点の３次元位置の平均値を算出してそれを被写体の中心位置とする。なお、検出した対応点の数に画像間で偏りがある場合には、各々の画像間で検出した対応点に重み付けをして平均値を算出してもよい。
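The averaging described above, including the optional weighting, could look like the sketch below. How the weights are chosen is not specified in the text; the function name and the uniform default are assumptions.

```python
def subject_center(points, weights=None):
    """Weighted mean of 3-D corresponding points (uniform weights by default)."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    return tuple(sum(w * p[k] for p, w in zip(points, weights)) / total
                 for k in range(3))
```

For example, points found between an image pair with many correspondences could each be given a smaller weight so that the pair does not dominate the average.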
【００１９】
次に、画像変換部２１における画像変換の一例を説明する。
図８に示したような撮影専用装置を被写体と撮像装置１に取り付けて撮影を実行する際には、撮影画像の中心位置に被写体が投影されるように撮影することは容易であるが（図２０参照）、撮影専用装置を使用しないで撮影する場合には、撮像装置１が固定されていないので、被写体は撮影画像の中心位置からずれて投影されてしまう（図２１参照）。このように、被写体が中心位置からずれて投影されてしまうと、利用者の指示に応じて画像群を切り替えたときに被写体が画像上で上下左右に移動し、利用者の意図した画像とは異なる画像が表示され、３次元的な表示ができなくなる。そのため、この実施形態では、画像変換部２１が被写体の投影位置を撮影画像の中心位置に移動させる画像変換を実行する。
一般的な撮像装置１は、図４に示したように中心射影で被写体を撮像面に投影するが、このとき、図２２に示したように正射影モデルで被写体が撮像面に投影されると近似した場合の画像変換の一例を以下に示す。
まず、姿勢検出部１２により検出された姿勢情報Ｒと並進成分算出部２４により算出された並進成分情報ｔから撮像装置１の各々の撮影位置を求め、撮像装置１の撮像面（図２３に示した例えばＸ−Ｙ平面）とその撮像面の法線（図２３に示したｚ方向の直線）を算出する。撮像面の法線方程式を（１２）式に、撮像面の平面方程式を（１３）式に示す。

ここで、撮像面の法線は点（Ｘ１，Ｙ１，Ｚ１）、（Ｘ２，Ｙ２，Ｚ２）を通る直線であり、撮像面は（Ｘ２，Ｙ２，Ｚ２）を通る平面とする。
前記において、画像Ａの撮影位置をワールド座標系の基準位置とすると、画像Ａの法線と撮像面の方程式は、（Ｘ１，Ｙ１，Ｚ１）＝（０，０，１），（Ｘ２，Ｙ２，Ｚ２）＝（０，０，０）を（１２）、（１３）式へ代入することによって算出される。また、画像Ｂの法線と撮像面の方程式は姿勢情報Ｒと並進成分情報ｔより、点（Ｘ１，Ｙ１，Ｚ１）、（Ｘ２，Ｙ２，Ｚ２）の位置を変換した点（Ｘ’１，Ｙ’１，Ｚ’１）、（Ｘ’２，Ｙ’２，Ｚ’２）（変換式を（１４）（１５）式に示す）を（１２）、（１３）式に代入することによって算出される。同様にして各々の撮影位置における撮像面の法線と平面方程式を算出することが可能である。
【００２０】
次に、被写体中心位置算出部２６により算出された被写体中心位置を通り、法線ベクトルＬ１と平行な直線Ｌ２を算出する（図２４参照）。被写体中心位置を（Ｘ３，Ｙ３，Ｚ３）とすると、画像Ａにおける直線方程式は（１６）式のようになる。同様にして各々の撮影位置で被写体中心位置を通り法線ベクトルＬ１と平行な直線Ｌ２を算出することが可能である（図２４参照）。
続いて、算出した直線Ｌ２と撮像面との交点ｐ４（Ｘ４，Ｙ４，Ｚ４）を算出する（図２４参照）。交点ｐ４は（１３）、（１６）式の連立方程式を解くことによって算出可能である。さらに、撮影画像の中心位置（ＸＣ，ＹＣ，ＺＣ）を交点ｐ４の位置（Ｘ４，Ｙ４，Ｚ４）と一致させるように画像情報を並進移動させるに当たって、画像情報を並進移動させる並進ベクトル（Ｔｘ，Ｔｙ）を算出するため、撮影画像の中心位置（ＸＣ，ＹＣ，ＺＣ）と交点ｐ４の位置（Ｘ４，Ｙ４，Ｚ４）を（１７）、（１８）式より装置座標系（ｘＣ，ｙＣ，ｚＣ）、（ｘ４，ｙ４，ｚ４）に変換する（図２４に矢印で示す）。ここで、撮影画像の中心位置と交点ｐ４の位置は、同じ撮像面内にあるので、装置座標系におけるＺ座標値は同じ値となる。

よって、（１９）式のＴｚは、常にＴｚ＝０となり、並進ベクトル（Ｔｘ，Ｔｙ）は（２０）式のようになる。この並進ベクトル方向に撮影画像を並進移動させることにより、被写体の投影位置を撮影画像の中心位置に移動させるのである。
前記において、画像変換部２１では、第１の実施形態のように投影された画像をψ方向へ回転させる画像変換をしてもよい。また、回転移動および並進移動によって画像上に画像情報がなく、空白になる部分が生成されるが、その空白部分には画像背景部分の色を適用して描画してもよいし、隣接する画像情報と空白となる部分の画像情報が重なっているときには、隣接する画像情報を空白部分に投影して描画してもよい。
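Equations (12) to (20) are missing from this text, so the following is only a sketch of the geometry they describe under the orthographic approximation. The line through the subject centre parallel to the optical axis meets the image plane at a point p4, and the x and y device coordinates of p4 relative to the image centre give the in-plane offset. The sketch assumes R maps device coordinates to world coordinates and that the image centre sits at the camera position; the function name is hypothetical. The sign with which the offset is applied as (Tx, Ty) depends on the missing equations, and the result is in world units, not pixels.

```python
def recentering_offset(r, camera_pos, subject_center):
    """Offset (x, y) of the projected subject centre from the image centre.

    r: 3x3 rotation, device -> world. The world -> device transform is
    therefore the transpose of r.
    """
    d = [subject_center[k] - camera_pos[k] for k in range(3)]
    # Components of d along the device x and y axes (columns 0 and 1 of r).
    x = sum(r[k][0] * d[k] for k in range(3))
    y = sum(r[k][1] * d[k] for k in range(3))
    return (x, y)
```

The component of d along the device z axis is dropped, which reflects the statement that p4 and the image centre share the same z value in device coordinates.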
【００２１】
図２５および図２６に、この実施形態の画像入力動作の動作フローを示す。以下、図２５および図２６に従って、この動作フローを説明する。
まず、撮像装置１において２箇所以上の視点から撮影された画像情報と撮像装置１の姿勢情報を取得しメモリに記憶する（ステップＳ１１）。そして、姿勢情報から撮像装置１の回転角度θ、φ、ψを算出する（ステップＳ１２）。つまり、まず、撮像装置１において、３軸加速度センサ４５の出力信号から得られる電圧値の大きさを読み取り、３軸方向の加速度ベクトルの大きさを比較し、重力方向に対する画像入力装置の傾きを検出するとともに、３軸磁気センサ４６の出力信号から得られる電圧値の大きさを読み取り、３軸方向の磁気ベクトルの大きさを比較し、地磁気方向に対する画像入力装置の傾きを検出する。そして、３軸加速度センサ４５から検出される重力方向に対する傾きと、磁気センサ４６から検出される地磁気方向に対する傾きを合成することにより撮像装置１の回転角度θ、φ、ψを検出する。
【００２２】
続いて、対応検出部２３が基準画像から特徴点を検出し、相対画像から各特徴点に対応する対応点の位置情報を検出する（ステップＳ１３）。具体的にはまず、基準画像を構成する各画素を中心位置にして、特徴量抽出ブロックを作成し、ブロック内の輝度値分布を検出する。さらに、基準画像を領域分割し、分割した領域内で特徴量抽出ブロックの輝度値分布の差が顕著であるブロックを検出し、検出したブロックの中心位置にある画像位置情報と輝度値分布情報をメモリに記憶する（輝度値分布情報の代わりにＲＧＢ値分布情報を用いてもよい）。
続いて、相対画像において、基準画像の場合と同様に特徴量抽出ブロックを作成し、ブロック内の輝度値分布を検出し、記憶されている基準画像の輝度値分布情報と相対画像の特徴量抽出ブロックの輝度値分布情報とのマッチングをおこない、各特徴点に対して最も近い特徴量抽出ブロックの中心位置にある画像位置情報を対応点の位置情報としてメモリに記憶する。但し、特徴点の特徴量抽出ブロックの輝度値分布情報に近い特徴量抽出ブロックが存在しない場合には、記憶した基準画像の特徴点の画像位置情報と輝度値分布情報をメモリから消去する。
次に、対応点の位置情報と姿勢情報から並進成分算出部２４が並進成分情報を算出する（ステップＳ１４）。具体的にはまず、（１１）式に姿勢情報と対応点の位置情報を代入して連立方程式を立てる（一般に３次元空間の中で、剛体の移動を表すベクトルの変数は３個であるが、デジタルカメラを想定した透視投影による撮影なので、画像座標系にある画像情報からワールド座標系にある被写体のスケールは一意に決定されない。よって、並進成分情報は、ベクトルの向きを表し、変数の数は２個となる）。対応点が多数に検出される場合には、並進成分情報の変数の数よりも多く連立方程式が成り立つ。したがって、最小２乗法により多数の連立方程式で誤差が最も小さい解を算出し、並進成分情報としてメモリに記憶する。
【００２３】
次に、姿勢情報、並進成分情報、および対応点の位置情報（装置座標系）から３次元位置算出部２５が対応点の３次元位置情報（ワールド座標系）を算出する（ステップＳ１５）。具体的にはまず、姿勢情報および並進成分情報から基準画像と相対画像との撮影位置関係を求め、３角測量により基準画像と相対画像上にある対応点の位置情報（装置座標系）を求め、その位置関係から対応点の３次元位置情報（ワールド座標系）を算出することにより全画像上の対応点の３次元位置情報を算出する。そして、全ての画像間で算出した３次元位置のスケール比を算出し、初めの画像間で算出した並進成分情報ｔの距離を１と仮定して、全ての画像間のスケール比を一致させ、対応点の３次元位置（ワールド座標系）を算出する。
続いて、被写体中心位置算出部２６が対応点の３次元位置情報（ワールド座標系）から被写体中心位置を算出する（ステップＳ１６）。対応点の３次元位置の平均値を算出し、平均値を被写体中心位置とするのである。さらに、画像変換部２１が、画像情報を並進移動（Ｔｘ，Ｔｙ）させる画像変換をおこなう（ステップＳ１７）。具体的にはまず、姿勢情報と並進成分情報から撮像装置１内にある撮像面と撮像面の法線の方程式を算出し、前記被写体中心位置を通り、法線と平行な直線を算出する。そして、撮像面とその直線との交点を算出し、撮影画像の中心位置をその交点の位置と一致させるように画像情報を並進移動させる。
このあとは、画像変換した画像情報に回転角度θ、φ情報を付加し、その画像情報などを画像記録部２２に保存する（ステップＳ１８）。
こうして、この実施形態によれば、図８に示したような特別な画像撮影専用装置などを使用しなくとも、被写体を撮像装置１により様々な視点から複数枚撮影するだけで３次元的に表示するための画像群データを自動的に生成することが可能になる。
【００２４】
次に、本発明の第３の実施形態について説明する。
第２の実施形態では、被写体中心位置を算出するためにワールド座標上における対応点の３次元位置を算出し、対応点の平均値を被写体中心位置と仮定していたが、第３の実施形態では、各々の画像を撮影した際の撮像装置１の光軸方向を考慮することにより被写体中心位置を算出する。
図２７に、第３の実施形態の画像入力装置の構成ブロック図を示す。図示したように、この画像入力装置は、撮像装置１と画像処理装置２ｂから構成され、撮像装置１は、撮像部１１、姿勢検出部１２、および情報転送部１３を備える。また、画像処理装置２ｂは、画像変換部２１、画像記録部２２、対応検出部２３、並進成分算出部２４、および光軸高密度位置検出部２７などを備える。なお、撮像装置１と画像処理装置２ｂを一つの装置として構成してもよい。一つの装置とすることで、情報転送部１３を削減することも可能である。また、撮像部１１、姿勢検出部１２、情報転送部１３、画像変換部２１、画像記録部２２、対応検出部２３、および並進成分算出部２４は第１の実施形態と同様な構成である。
一方、光軸高密度位置検出部２７は、各々の撮影位置における光軸（ｚ方向）をワールド座標上へ描画したとき、光軸が交わる位置または光軸の密度が最も高い位置を検出する。以下、光軸高密度位置検出部２７が光軸密度の高い位置を検出する動作を説明する。
前記したように、第２の実施形態では、対応点の３次元位置の平均値を被写体中心位置としていたが、第３の実施形態では、撮影者は被写体の中心位置を画像中心に投影されるように撮影しているという仮定のもと、投影された各画像の被写体中心位置を平均すると統計的に被写体中心位置はその平均画像中心位置に一致すると考え、光軸密度の最も高い位置を被写体中心位置とする。
そこで、この実施形態では、姿勢検出部１２により検出された姿勢情報Ｒと並進成分算出部２４により算出された並進成分情報ｔから各々の撮像装置１の撮影位置を求め、（１３）式の（Ｘ２，Ｙ２，Ｚ２）に撮影画像の中心位置（ＸＣ，ＹＣ，ＺＣ）を代入することにより光軸の直線方程式を算出する。図２８は、各々の撮影位置における光軸の直線方程式を算出し、ワールド座標上に投影したものである。このようなワールド座標系の空間を図２９に示したように領域分割し、領域ごとに光軸の通る本数を算出する。そして、各領域の中で最も光軸の通る本数が多かった領域の中心位置を光軸密度の最も高い位置として検出し、検出した位置を被写体中心位置として画像変換部２１が画像変換をおこなう。
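The cell-counting idea above can be sketched as a simple vote: divide world space into cubic cells, mark every cell each optical axis passes through, and take the centre of the cell crossed by the most axes. The sampling along each axis, the parameter names (`cell`, `reach`, `step`), and the function name are assumptions for illustration. The text does not say how the count is implemented.

```python
import math
from collections import Counter

def densest_axis_cell(axes, cell, reach, step):
    """axes: list of (origin, direction) in world coordinates.

    Returns the centre of the cubic cell (edge length `cell`) crossed by
    the largest number of axes, sampling each axis up to distance `reach`.
    """
    counts = Counter()
    for origin, direction in axes:
        norm = math.sqrt(sum(c * c for c in direction))
        unit = [c / norm for c in direction]
        visited = set()   # count each cell at most once per axis
        for i in range(int(reach / step) + 1):
            s = i * step
            point = [origin[k] + s * unit[k] for k in range(3)]
            visited.add(tuple(math.floor(c / cell) for c in point))
        counts.update(visited)
    best, _ = counts.most_common(1)[0]
    return tuple((i + 0.5) * cell for i in best)
```

The step should be small relative to the cell size so that no crossed cell is skipped. Ties between cells are resolved arbitrarily in this sketch.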
【００２５】
図３０に、この実施形態の動作フローを示す。以下、図３０に従ってこの実施形態の動作を説明する。
まず、撮像装置１において、２箇所以上の視点から撮影された画像情報と撮像装置１の姿勢情報を取得し、メモリに記憶する（ステップＳ２１）。そして、その姿勢情報から撮像装置１の回転角度θ、φ、ψを算出する（ステップＳ２２）。具体的にはまず、３軸加速度センサ４５の出力信号から得られる電圧値の大きさを読み取り、３軸方向の加速度ベクトルの大きさを比較し、重力方向に対する画像入力装置の傾きを検出する。さらに、３軸磁気センサ４６の出力信号から得られる電圧値の大きさを読み取り、３軸方向の磁気ベクトルの大きさを比較し、地磁気方向に対する画像入力装置の傾きを検出する。続いて、３軸加速度センサ４５により検出される重力方向に対する傾きと、磁気センサ４６により検出される地磁気方向に対する傾きを合成することにより撮像装置１の回転角度θ、φ、ψを検出する。
次に、対応検出部２３が基準画像から特徴点を検出し、各特徴点に対応する対応点の位置情報を相対画像から検出する（ステップＳ２３）。具体的にはまず、基準画像を構成する各画素を中心位置にして、特徴量抽出ブロックを作成し、ブロック内の輝度値分布を検出する。そして、基準画像を領域分割し、分割した領域内で特徴量抽出ブロックの輝度値分布の差が顕著であるブロックを検出し、検出したブロックの中心位置にある画像位置情報と輝度値分布情報をメモリに記憶する（輝度値分布情報の代わりにＲＧＢ値分布情報を用いてもよい）。
続いて、相対画像において、同様に特徴量抽出ブロックを作成し、ブロック内の輝度値分布を検出し、記憶しておいた基準画像の輝度値分布情報と相対画像の特徴量抽出ブロックの輝度値分布情報とのマッチングをおこない、各特徴点に対して、最も近い特徴量抽出ブロックの中心位置にある画像位置情報を対応点の位置情報としてメモリに記憶する。但し、特徴点の特徴量抽出ブロックの輝度値分布情報に近い特徴量抽出ブロックが存在しない場合には、記憶しておいた特徴点の画像位置情報と輝度値分布情報をメモリから消去する。
【００２６】
次に、並進成分算出部２４が対応点の位置情報と姿勢情報から並進成分情報を算出する（ステップＳ２４）。具体的にはまず、（１１）式に姿勢情報と対応点の位置情報を代入し連立方程式を立てる（一般に３次元空間の中で、剛体の移動を表すベクトルの変数は３個であるが、デジタルカメラを想定した透視投影による撮影なので、ワールド座標系にある被写体のスケールは画像座標系にある画像情報から一意に決定されない。よって、並進成分情報は、ベクトルの向きを表し、変数の数は２個となる）。なお、対応点が多数に検出される場合には、並進成分情報の変数の数よりも多く連立方程式が成り立つ。したがって、最小２乗法を用い、多数の連立方程式により誤差が最も小さい解を算出し、並進成分情報としてメモリに記憶する。
次に、光軸高密度位置検出部２７が姿勢情報および並進成分情報から光軸の直線方程式を算出し、光軸密度の高い位置を検出する（ステップＳ２５）。具体的にはまず、姿勢情報および並進成分情報から各々の画像を撮影したときの光軸方向を算出し、ワールド座標空間を領域分割し、分割した領域ごとに光軸が通る本数を算出する。そして、光軸が最も多く通る領域の中心位置を被写体中心位置と仮定して検出する。
次に、画像変換部２１が画像情報を並進移動（Ｔｘ，Ｔｙ）させる画像変換をおこなう（ステップＳ２６）。具体的にはまず、姿勢情報と並進成分情報から撮像装置１内にある撮像面と撮像面の法線の方程式を算出し、算出した被写体中心位置を通り、法線と平行な直線を算出する。そして、算出した直線と撮像面との交点を算出し、撮影画像の中心位置と算出した交点の位置とを一致させるように画像情報を並進移動させる。
このあとは、画像変換した画像情報に回転角度θ、φ情報を付加し、画像記録部２２にその画像情報などを保存する（ステップＳ２７）。
こうして、この実施形態によれば、図８に示したような特別な画像撮影専用装置などを使用しなくとも、撮像装置１により被写体を様々な視点から複数枚撮影するだけで３次元的に表示するための画像群データを自動的に生成することが可能になる。
【００２７】
次に、本発明の第４の実施形態について説明する。
この実施形態の画像入力装置は、第２または第３の実施形態の画像処理装置に画像変倍手段を追加した構成となる（例えば画像変換部２１内に追加する）。このような構成で、画像変倍手段が各々の撮影画像に投影された被写体の大きさが等しくなるように撮影画像の倍率を変化させるのである。被写体を撮影する際に撮像面から被写体までの距離が異なると、撮像面に投影される被写体の大きさも異なり図３１に示したように投影されるので、画像変倍手段が撮像面から被写体までの距離に応じて撮影した画像を変倍することにより撮像面に投影される被写体の大きさを一定にするのである。以下、その動作を説明する。
まず、画像Ａを撮影した撮像面の位置（Ｘ２，Ｙ２，Ｚ２）から被写体中心位置（Ｘ３，Ｙ３，Ｚ３）までの距離Ｌを（２１）式により算出する。さらに、画像Ｂを撮影した撮像面の位置（Ｘ’２，Ｙ’２，Ｚ’２）から被写体中心位置（Ｘ３，Ｙ３，Ｚ３）までの距離Ｌ’を（２２）式により算出する。そして、画像Ａを基準画像とし、画像Ａの距離Ｌと画像Ｂの距離Ｌ’との比を算出して画像Ｂの変倍率Ｈ_Ｂ＝Ｌ’／Ｌを算出する。さらに、画像Ｂの撮影画像をＨ_Ｂ倍することにより画像Ａと画像Ｂの撮像面に投影される被写体の大きさを一定にする。このような処理を各々の撮影画像に対して実行し、全ての画像面に投影される被写体の大きさを一定にするのである。以下、図３２に示した動作フローを参考にしてこの実施形態の動作フローを説明する。
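Equations (21) and (22) are not reproduced here. Assuming they are ordinary Euclidean distances, the magnification H_B = L'/L described above can be sketched as follows. The function name is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def scale_factor(ref_plane_pos, plane_pos, subject_center):
    """Magnification for an image so the subject matches the reference image.

    L  = distance from the reference imaging plane to the subject centre
    L' = distance from this image's imaging plane to the subject centre
    Returns L' / L.
    """
    l_ref = math.dist(ref_plane_pos, subject_center)
    l_cur = math.dist(plane_pos, subject_center)
    return l_cur / l_ref
```

An image taken from twice the reference distance shows the subject at roughly half the size, so it is enlarged by a factor of 2.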
まず、撮像装置１において２箇所以上の視点から撮影された画像情報と撮像装置１の姿勢情報を取得し、メモリに記憶する（ステップＳ３１）。
続いて、姿勢情報から撮像装置１の回転角度θ、φ、ψを算出する（ステップＳ３２）。具体的にはまず、３軸加速度センサ４５の出力信号から得られる電圧値の大きさを読み取り、３軸方向の加速度ベクトルの大きさを比較し、重力方向に対する画像入力装置の傾きを検出する。さらに、３軸磁気センサ４６の出力信号から得られる電圧値の大きさを読み取り、３軸方向の磁気ベクトルの大きさを比較し、地磁気方向に対する画像入力装置の傾きを検出する。そして、３軸加速度センサ４５から検出される重力方向に対する傾きと、３軸磁気センサ４６から検出される地磁気方向に対する傾きを合成することにより撮像装置１の回転角度θ、φ、ψを検出する。
【００２８】
次に、対応検出部２３が基準画像から特徴点を検出し、各特徴点に対応する対応点の位置情報を相対画像から検出する（ステップＳ３３）。具体的にはまず、基準画像を構成する各画素を中心位置にして特徴量抽出ブロックを作成し、ブロック内の輝度値分布を検出する。続いて、基準画像を領域分割し、分割した領域内で特徴量抽出ブロックの輝度値分布の差が顕著であるブロックを検出し、検出したブロックの中心位置にある画像位置情報と輝度値分布情報をメモリに記憶する。さらに、相対画像において、同様に特徴量抽出ブロックを作成し、ブロック内の輝度値分布を検出する。そして、記憶しておいた基準画像の輝度値分布情報と相対画像の特徴量抽出ブロックの輝度値分布情報とのマッチングをおこない、各特徴点に対して、最も近い特徴量抽出ブロックの中心位置にある画像位置情報を対応点の位置情報としてメモリに記憶する。但し、基準画像の特徴点の特徴量抽出ブロックの輝度値分布情報に近い特徴量抽出ブロックが存在しない場合には、記憶しておいた特徴点の画像位置情報と輝度値分布情報をメモリから消去する。
次に、並進成分算出部２４が、対応点の位置情報と姿勢情報とを（１１）式に代入することにより連立方程式を解き、並進成分情報を算出する（ステップＳ３４）。但し、対応点が多数に検出される場合には、並進成分情報の変数の数よりも多く連立方程式が成り立つので、最小２乗法により多数の連立方程式から誤差の最も小さい解を算出し、並進成分情報としてメモリに記憶する。
次に、光軸高密度位置検出部２７が姿勢情報および並進成分情報から光軸の直線方程式を算出し、光軸密度が高い位置を検出する（ステップＳ３５）。つまり、姿勢情報および並進成分情報から各々の画像を撮影したときの光軸方向を算出し、ワールド座標空間を領域分割し、分割した領域ごとに光軸が通る本数を算出し、光軸が最も多く通る領域の中心位置を被写体中心位置と仮定して検出する。次に、画像変換部２１内の画像変倍手段が、各々の画像を撮影した際の撮像面から被写体中心位置までの距離を算出し、各々の画像間における距離の比を算出する（ステップＳ３６）。具体的にはまず、各々の画像において、撮像面から被写体中心位置までの距離を算出し、算出した距離の比を算出して、画像変倍率Ｈを求める。さらに、得られた距離の比から画像の変倍率を算出し、画像変倍手段が画像情報を変倍して画像変換する（ステップＳ３７）。そして、画像変換した画像情報に回転角度θ、φ情報を付加して保存する（ステップＳ３８）。
【００２９】
次に、本発明の第５の実施形態について説明する。
図３３は、この実施形態の画像入力装置を示す、構成ブロック図である。図示したように、この実施形態の画像入力装置は撮像装置１と画像処理装置２ｃから構成されている。また、撮像装置１は、撮像部１１、姿勢検出部１２、および情報転送部１３から構成され、画像処理装置２ｃは、画像記録部２２、対応検出部２３、並進成分算出部２４、任意視点画像生成部２９などから構成されている。撮像装置１と画像処理装置２ｃを一つの装置として構成してもよい。また、撮像部１１、姿勢検出部１２、情報転送部１３、画像記録部２２、対応検出部２３、および並進成分算出部２４は第１の実施形態と同様の構成である。
このような構成で、この実施例の画像入力装置では、任意視点画像生成部２９が、任意視点画像を生成するために、撮影された画像群、姿勢情報、および並進成分情報から光線空間画像を生成する。姿勢情報と並進成分情報から撮影画像内の各画素値を３次元空間内の光線データに変換し、変換した光線データを光線空間ｆ（Ｘ，Ｙ，Ｚ，θ，φ）に投影し、光線空間上の光線データがない領域に対しては線形補間処理などを実行して光線データを補間合成して、光線空間画像を生成するのである。なお、光線空間画像を生成する手法については、公知の技術であり、柳澤健之、苗村健、金子正秀、原島博「光線空間を用いた３次元物体の操作」（テレビジョン学会誌ｖｏｌ５０，Ｎｏ９，ｐｐ１３４５〜１３５１，１９９６）、苗村健、金子正秀、原島博「光線情報の正投影表現に基づく３次元空間の記述」（テレビジョン学会誌ｖｏｌ５１，Ｎｏ１２，ｐｐ２０８２〜２０９０，１９９７）、苗村健、原島博「３次元画像の符号化技術（後編）」（画像ラボ、ｐｐ５７〜６２，１９９７．４）などに記述されている。
【００３０】
図３４にこの実施例の動作フローを示す。以下、図３４に従ってこの動作フローを説明する。
まず、撮像装置１において、２箇所以上の視点から撮影された画像情報と撮像装置１の姿勢情報を取得し、メモリに記憶する（ステップＳ４１）。さらに、前記した第４の実施形態と同様にして姿勢情報から撮像装置１の回転角度θ、φ、ψを算出する（ステップＳ４２）。
次に、第４の実施形態と同様にして、対応検出部２３が基準画像から特徴点を検出し、各特徴点に対応する対応点の位置情報を相対画像から検出する（ステップＳ４３）。さらに、第４の実施形態と同様にして、並進成分算出部２４が対応点の位置情報と姿勢情報から並進成分情報を算出する（ステップＳ４４）。
次に、任意視点画像生成部２９が姿勢情報および並進成分情報から光線空間画像を生成する（ステップＳ４５）。具体的にはまず、姿勢情報および並進成分情報から撮影画像内の各画素値を３次元空間内の光線データに変換し、その光線データを光線空間上ｆ（Ｘ，Ｙ，Ｚ，θ，φ）に投影する。なお、光線空間上の光線データがない領域に対しては、線形補間処理により光線データを補間合成し、光線空間画像を生成する。
このあとは、生成された光線空間画像を画像記録部２２に記録する（ステップＳ４６）。
【００３１】
以上、図１などに示した実施の形態について説明したが、撮像装置は、ＰＣ（ＰｅｒｓｏｎａｌＣｏｍｐｕｔｅｒ）や携帯情報端末装置などを用いて実現させることもできる。近年、各種処理プログラムを読み取ることにより種々の機能を実現可能なデジタルカメラも登場するようになった。また、向きを回動可能なカメラを具備したノート型ＰＣや、カメラを搭載したＰＤＡ（ＰｅｒｓｏｎａｌＤｉｇｉｔａｌＡｓｓｉｓｔａｎｔ）および携帯電話が普及するようになったが、これらの機器に前記した第１乃至第５の実施形態で説明した画像入力方法を実現する処理プログラムを実行させることによりその機能を実現させることができるのである。
例えば、カメラを具備したノート型ＰＣにおいて、このようなプログラムを実行させる場合について言えば、ＡＦ（ＡｕｔｏＦｏｃｕｓ）などカメラ固有の処理はカメラ内部で実行させ、撮影モードの選択、撮影の開始・終了の制御、焦点検出動作の一時停止、画像および焦点検出領域の表示などは、プログラムを取り込んだ後、ノート型ＰＣの任意のユーザー・インターフェイスを介してそのプログラムを実行させる。以下は、ユーザー・インターフェイスにおける割り付けの一例である。
撮影モードの選択：カーソルキー
撮影の開始・終了の制御：リターンキー
焦点検出動作の一時停止：スペースキー
画像および焦点検出領域の表示：ＬＣＤ（ＰＣが備えている液晶表示装置）に表示
前記において、各実施例で説明した画像入力方法を実現する処理プログラムが書き込まれている記憶媒体からそのプログラムを読み取らせることにより、本発明によった画像入力を実現する実施形態を第６の実施形態として図３５に示す。図３５（ａ）に示したように、そのプログラムが書き込まれているＣＤ−ＲＯＭ６１をカメラ付きノート型ＰＣ６２に装着し、適宜そのプログラムを実行させる。また、図３５（ｂ）に示したように、そのようなプログラムを書き込んだスマートメディア６３を、それを読み取り可能なデジタルスチルカメラ６４に装着し、適宜そのプログラムを実行させることによっても実現させることができる。なお、前記プログラムを書き込んでおく記憶媒体は前記した例に制限されず、例えばＣＤ−ＲＷやＤＶＤ−ＲＯＭなど別の媒体であってもよい。
【００３２】
【発明の効果】
以上説明したように、本発明によれば、請求項１および請求項５記載の発明では、少なくとも２箇所以上の視点から同一の被写体を撮影して複数枚の画像情報とその被写体を撮影した際の撮像素子の姿勢を示す姿勢情報とを取得し、取得した画像情報を姿勢情報に基づいて回転変換し、回転変換させた画像情報に姿勢情報を付加して記録することができるので、専用装置を持っていない一般の人々でもデジタルカメラなど単眼の撮像装置を使用して、同一の被写体を簡単に様々な視点から撮影し、例えばＱｕｉｃｋＴｉｍｅ（登録商標）ＶＲ形式の画像群を保存することができ、したがって、高画質の３次元的画像を特定の撮影条件下だけでなく生成できる。
【００３３】
また、請求項２記載の発明では請求項１記載の発明において、請求項６記載の発明では請求項５記載の発明において、撮影された被写体の中心位置が入力された画像情報の中心位置から外れた場合、その画像情報の中心位置を並進移動させることにより画像情報の中心位置と被写体の中心位置とを一致させることができるので、撮影時に画像情報の中心位置と被写体の中心位置とを一致させる苦労がなくなる。
また、請求項３記載の発明では請求項１記載の発明において、請求項７記載の発明では請求項５記載の発明において、入力された複数の画像情報を変倍することができるので、被写体・撮影位置間の距離のばらつきにより画像面に投影される被写体の大きさが変化しても、画像群データ内の被写体の大きさを同一にすることができる。
また、請求項４記載の発明では請求項１記載の発明において、請求項８記載の発明では請求項５記載の発明において、被写体を撮像した際の視点位置とは異なる視点から被写体を撮影したときに入力される画像情報を生成することができるので、撮影した画像枚数が少ない画像群データでも被写体を３次元的に表示することができる。
また、請求項９記載の発明では、請求項５乃至請求項８のいずれか１項に記載の画像入力方法によった画像入力を実行させるようにプログラミングされているプログラムを情報処理装置上で実行させることができるので、情報処理装置を用いて請求項５乃至請求項８のいずれか１項に記載の発明の効果を得ることができる。
また、請求項１０記載の発明では、請求項９記載のプログラムを着脱可能な記憶媒体に記憶することができるので、その記憶媒体をこれまで請求項５乃至請求項８のいずれか１項に記載の発明によった画像入力をおこなえなかったパーソナルコンピュータなど情報処理装置に装着することにより、そのような情報処理装置においても請求項５乃至請求項８のいずれか１項に記載の発明の効果を得ることができる。
【図面の簡単な説明】
【図１】本発明の第１の実施形態を示す、画像入力装置の構成ブロック図。
【図２】本発明の第１の実施形態を示す、画像入力装置要部の構成ブロック図。
【図３】本発明の第１の実施形態を示す、画像入力装置要部のハードウェア構成図。
【図４】本発明の第１の実施形態を示す、画像入力装置要部の説明図。
【図５】本発明の第１の実施形態を示す、画像入力装置要部の他の説明図。
【図６】本発明の第１の実施形態を示す、画像入力装置要部の他の説明図。
【図７】本発明の第１の実施形態を示す、画像入力装置要部の他の説明図。
【図８】本発明の第１の実施形態を示す、画像入力装置要部の他の説明図。
【図９】本発明の第１の実施形態の画像入力装置に係る説明図。
【図１０】本発明の第１の実施形態の画像入力装置に係る他の説明図。
【図１１】本発明の第１の実施形態を示す、画像入力装置要部の他の説明図。
【図１２】本発明の第１の実施形態を示す、画像入力装置要部の他の説明図。
【図１３】本発明の第１の実施形態を示す、画像入力装置要部のデータ構成図。
【図１４】本発明の第１の実施形態を示す、画像入力方法の動作フロー図。
【図１５】本発明の第２の実施形態を示す、画像入力装置の構成ブロック図。
【図１６】本発明の第２の実施形態を示す、画像入力装置の説明図。
【図１７】本発明の第２の実施形態を示す、画像入力装置要部の説明図。
【図１８】本発明の第２の実施形態を示す、画像入力装置要部の他の説明図。
【図１９】本発明の第２の実施形態を示す、画像入力装置要部の他の説明図。
【図２０】本発明の第２の実施形態の画像入力装置に係る説明図。
【図２１】本発明の第２の実施形態の画像入力装置に係る他の説明図。
【図２２】本発明の第２の実施形態を示す、画像入力装置要部の他の説明図。
【図２３】本発明の第２の実施形態を示す、画像入力装置要部の他の説明図。
【図２４】本発明の第２の実施形態を示す、画像入力装置要部の他の説明図。
【図２５】本発明の第２の実施形態を示す、画像入力方法の動作フロー図。
【図２６】本発明の第２の実施形態を示す、画像入力方法の他の動作フロー図。
【図２７】本発明の第３の実施形態を示す、画像入力装置の構成ブロック図。
【図２８】本発明の第３の実施形態を示す、画像入力装置要部の説明図。
【図２９】本発明の第３の実施形態を示す、画像入力装置要部の他の説明図。
【図３０】本発明の第３の実施形態を示す、画像入力方法の動作フロー図。
【図３１】本発明の第４の実施形態の画像入力装置に係る説明図。
【図３２】本発明の第４の実施形態を示す、画像入力方法の動作フロー図。
【図３３】本発明の第５の実施形態を示す、画像入力装置の構成ブロック図。
【図３４】本発明の第５の実施形態を示す、画像入力方法の動作フロー図。
【図３５】本発明の第６の実施形態を示す、画像入力装置のハードウェア構成図。
【符号の説明】
１ 撮像装置、２ 画像処理装置、１１ 撮像部、１２ 姿勢検出部、１３ 情報転送部、２１ 画像変換部、２２ 画像記録部、２３ 対応検出部、２４ 並進成分算出部、２５ ３次元位置算出部、２６ 被写体中心位置算出部、２７ 光軸高密度位置検出部、２９ 任意視点画像生成部
[0001]
[Technical field to which the invention belongs]
The present invention relates to an image processing technique for three-dimensionally displaying, on a display device or the like, a group of captured images obtained by capturing the same subject from various viewpoints using a digital camera or a video camera.
[0002]
[Prior art]
As dramatic advances in electronics have driven down the cost of digital cameras, their use has spread rapidly. Because digital cameras are small, easy to carry, and easy to build into various devices, applications that use image information captured with them are being developed. One example is advertising for electronic commerce: an image of a product photographed with a digital camera is posted on the Web (an electronic information source provided so that many users can view it over a network on the screens of their own information processing apparatuses). An image taken from a single direction shows the product's appearance from that direction only, so it is necessary to post a plurality of images taken from various directions.
One way to present a plurality of images is simply to display them side by side, but an overall impression of the product is hard to grasp from images that are merely lined up. To address this, there is a technique that calculates the three-dimensional shape of the product from the group of images, generates polygons, and displays them with image textures pasted on, thereby presenting the product as a three-dimensional image that is easier to grasp. However, it is difficult to measure a three-dimensional shape accurately from a group of images taken with a digital camera, and as a result the image quality deteriorates when a three-dimensional image is generated by pasting image textures onto the measured shape.
As a solution to this problem, there are presentation methods such as QuickTime (registered trademark) VR, which display a product three-dimensionally by switching among the captured images in response to the user's instructions. Such methods are promising because they make the product easier to grasp without degrading image quality. However, for the images to switch in response to the user's instructions, the positional relationship between the digital camera and the product at the time of shooting must be stored in the image group data. That relationship therefore has to be measured, and an image group for QuickTime (registered trademark) VR or the like is generally shot using a device dedicated to QuickTime (registered trademark) VR image capture.
[0003]
In the prior art described in JP-A-9-81790, the movement of the camera is detected by an acceleration sensor, an angular velocity sensor, or the like, and the optical axis direction is corrected so that the optical axes from different viewpoints intersect at an arbitrary point, thereby generating a three-dimensional image. The subject is detected by comparing an estimated motion vector, calculated from the sensor information and a preset subject-to-camera distance, with a motion vector obtained by image processing.
In the prior art disclosed in JP-A-9-186957, the position vectors (X, Y, Z) and postures (θ, φ) of a plurality of imaging means are added to the captured image information and stored, so that a group of images of the same subject taken from various viewpoints is recorded in a form that can be displayed three-dimensionally.
In the prior art disclosed in JP-A-9-245195, a group of images taken from different viewpoints is projected into ray space and ray space data is created over the entire ray space, so that an image from an arbitrary viewpoint different from the shooting viewpoints is generated from the ray space data.
In the prior art disclosed in JP-A-11-306363, a posture detection sensor is attached to the imaging apparatus. The posture information that changes when the imaging apparatus is moved is calculated from the sensor information, and the translational movement information is calculated from the positions of corresponding points in at least two images. The position of the moved imaging apparatus is thereby calculated accurately, and the three-dimensional shape of the subject is measured.
In the prior art disclosed in JP-A-2001-177850, a posture sensor is attached to the imaging means to detect posture information, and translation component information is calculated by obtaining the moving speed from changes in the image signal. The position of the subject in three-dimensional space is detected from the posture information and the translation component information, and the subject is projected from that position onto the image plane.
[Patent Document 1] Japanese Patent Laid-Open No. 9-81790
[Patent Document 2] Japanese Patent Laid-Open No. 9-186957
[Patent Document 3] Japanese Patent Laid-Open No. 9-245195
[Patent Document 4] Japanese Patent Laid-Open No. 11-306363
[Patent Document 5] Japanese Patent Laid-Open No. 2001-177850
[Patent Document 6] Japanese Patent Laid-Open No. 11-37736
[0004]
[Problems to be solved by the invention]
As described above, shooting an image group for an expression method such as QuickTime (registered trademark) VR generally requires a special device, such as a device dedicated to shooting QuickTime (registered trademark) VR images. It is therefore difficult for ordinary users who do not have such a device to generate QuickTime (registered trademark) VR images. As a result, although display in the QuickTime (registered trademark) VR format makes the appearance of a product easy to grasp, it is used on only some Web sites.
In the prior art described in Japanese Patent Laid-Open No. 9-81790, the distance between the subject and the camera is set in advance, so a three-dimensional image can be generated only under specific shooting conditions. In addition, because the direction of the optical axis is changed, the structure of the apparatus becomes complicated.
The prior art described in Japanese Patent Laid-Open No. 9-186957 mainly targets a plurality of fixed imaging means whose position vectors (X, Y, Z) and postures (θ, φ) are known. With a monocular imaging means, therefore, the position vector (X, Y, Z) and posture (θ, φ) cannot be calculated, and the position and posture information cannot be added to the captured images and stored. In addition, the case where the subject is projected away from the center of the image plane is not considered. Unless each image is shot so that the subject is projected at the center of the image plane, the projected position of the subject is not constant, and when the images are switched according to the user's instruction the subject shifts vertically and horizontally on the image plane and cannot be displayed three-dimensionally.
[0005]
In addition, the prior art disclosed in Japanese Patent Laid-Open No. 9-245195 assumes that the posture information and the translation component information are known. When they are unknown, as when a monocular imaging means is used, ray space data cannot be generated, and consequently an image from an arbitrary viewpoint different from the captured viewpoints cannot be generated from the plurality of images.
In the prior art disclosed in Japanese Patent Laid-Open No. 11-306363, a large number of corresponding points must be detected in the images in order to create the three-dimensional shape of the subject, and the measurement accuracy of the three-dimensional shape may be lowered, causing image degradation.
In the prior art disclosed in Japanese Patent Laid-Open No. 2001-177850, the translation component information is calculated from the change in the captured image signal on the assumption that the subject is captured continuously by a video camera or the like. When the subject is photographed discontinuously, the change in the captured image signal becomes large, and the translation component information cannot be calculated accurately.
An object of the present invention is to solve these problems of the prior art. Specifically, the object is to provide an image input technique with which even an ordinary user who has no dedicated device can use a monocular imaging device such as a digital camera to photograph the same subject from various viewpoints and store the resulting image group in, for example, the QuickTime (registered trademark) VR format, so that a high-quality three-dimensional image can be generated without being limited to specific shooting conditions.
[0006]
[Means for Solving the Problems]
In order to solve the above problem, the invention according to claim 1 is an image input device that inputs images using imaging means, comprising: imaging means for photographing the same subject from at least two viewpoints and inputting a plurality of pieces of image information; attitude detecting means for detecting attitude information indicating the attitude of the imaging means when the subject is photographed; image conversion means for rotationally converting the image information input by the imaging means based on the attitude information; and image recording means for adding the attitude information detected by the attitude detecting means to the image information converted by the image conversion means and recording the result.
According to a second aspect of the present invention, in the first aspect, when the center position of the subject projected on the imaging surface of the imaging means deviates from the center position of the imaging surface, the center position of the captured image information is translated so that the center position of the image information matches the center position of the subject.
According to a third aspect of the present invention, in the first aspect of the present invention, an image scaling unit for scaling a plurality of pieces of image information input by the imaging unit is provided.
According to a fourth aspect of the present invention, in the first aspect, arbitrary viewpoint image generation means is provided for generating the image information that would be input if the imaging means photographed the subject from a viewpoint different from the viewpoint positions at which the subject was actually photographed.
[0007]
According to a fifth aspect of the present invention, in an image input method for inputting photographed images, the same subject is photographed from at least two viewpoints; a plurality of pieces of image information and posture information indicating the posture of the imaging device when the subject was photographed are acquired; the acquired image information is rotationally converted based on the posture information; and the posture information is added to the rotationally converted image information and recorded.
In the invention described in claim 6, in the invention described in claim 5, when the center position of the photographed subject deviates from the center position of the input image information, the center position of the image information is translated so that it matches the center position of the subject.
In the invention described in claim 7, in the invention described in claim 5, a plurality of input image information is scaled.
In the invention according to claim 8, in the invention according to claim 5, the image information that would be input if the subject were photographed from a viewpoint different from the viewpoint positions at which it was actually photographed is generated.
According to a ninth aspect of the present invention, a program executed on an information processing apparatus is programmed so as to execute image input by the image input method according to any one of claims 5 to 8.
In the invention according to claim 10, the program according to claim 9 is stored in a storage medium.
[0008]
[Detailed Description of the Invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of an image input apparatus according to the first embodiment of the present invention. As shown in the figure, the image input apparatus of this embodiment includes an imaging device 1 and an image processing device 2. The imaging device 1 includes an imaging unit 11, an attitude detection unit 12, and an information transfer unit 13, and the image processing device 2 includes an image conversion unit 21 and an image recording unit 22. The imaging device 1 and the image processing device 2 may also be incorporated in the same housing as a single apparatus, in which case the information transfer unit 13 can be omitted.
The imaging unit 11 captures a subject and inputs image information; examples are a digital camera and a video camera. The attitude detection unit 12 detects the attitude information of the imaging device 1. For example, a triaxial acceleration sensor and a triaxial magnetic sensor are used as the attitude detection unit 12. Alternatively, an angular velocity sensor may be used instead of the triaxial magnetic sensor, or a magnetic sensor and an angular velocity sensor may be used together.
The information transfer unit 13 transfers the image information captured by the imaging unit 11 and the attitude information detected by the attitude detection unit 12 to the image processing device 2. The transfer may use a wired method (for example, USB, SCSI, or IEEE 1394), a wireless method (for example, wireless LAN or Bluetooth), or physical transfer of a storage medium (for example, semiconductor memory, SmartMedia, Memory Stick, or flash memory).
The image conversion unit 21 converts the image information transferred by the information transfer unit 13 using the attitude information detected by the attitude detection unit 12. The image recording unit 22 is composed of, for example, a semiconductor memory, a hard disk drive (HDD), a magnetic tape, or a DVD-RW, and records the image information converted by the image conversion unit 21 and the attitude information transferred by the information transfer unit 13 in a predetermined data format.
[0009]
FIG. 2 shows a detailed configuration of the imaging apparatus 1. With such a configuration, the subject image is formed on the image sensor 36 through the fixed lens 31, zoom lens 32, aperture mechanism 33, and focus lens 35, and the exposure time is controlled by the shutter 34. The image signal from the image sensor 36 is sampled in a CDS (Correlated Double Sampling) circuit 37 and then converted into a digital signal by an A / D converter 38. The timing at this time is generated by a TG (Timing Generator) 39.
Thereafter, the image signal is subjected to image processing such as aperture correction and compression processing by an IPP (Image Pre-Processor) 40 and temporarily stored in the image buffer memory 41. The temporarily stored image information is further stored in a storage area in the MPU (main control unit) 42.
The MPU 42 controls the operation of each unit, accepts an image capture start instruction from the input instruction switch 43, and detects, via the second A/D converter 47, the sensor output signals from the triaxial acceleration sensor 45 and the triaxial magnetic sensor 46 that constitute the attitude detection unit 12. The detected sensor output signals are attached to the captured image information and stored in the storage area in the MPU 42. The data stored in the storage area of the MPU 42 is transferred to the image processing device 2 via the information transfer unit 13. As the information transfer unit 13, a general-purpose PC (personal computer) interface such as RS-232C, USB (Universal Serial Bus), IEEE 1394, a network adapter, IrDA (Infrared Data Association), wireless LAN, or Bluetooth is used.
[0010]
FIG. 3 is a block diagram showing the hardware configuration of the image processing device 2. As shown in the figure, the image processing device 2 comprises: a CPU (Central Processing Unit) 51 that performs the processing of the image conversion unit 21 and the like and controls each unit; an SDRAM (Synchronous Dynamic Random Access Memory) 52; an HDD (Hard Disk Drive) 53; an input interface unit (hereinafter, input I/F) 54 for inputting data and signals from a pointing device such as a mouse, a keyboard, or buttons; a display device 55 such as a CRT (Cathode Ray Tube); a display interface unit (hereinafter, display I/F) 56 for controlling the display device 55; a recording device 57 such as a CD-RW (Compact Disc ReWritable) drive; and an external interface unit (hereinafter, external I/F) 58 for wired or wireless connection to external devices such as the imaging device 1 and a printer and to communication lines such as the Internet. The SDRAM 52 is used as a work area for the CPU 51 and also as a storage area for the processing programs and control programs that execute image processing such as the image conversion processing of this embodiment. A processing program is loaded into the SDRAM 52 via, for example, the recording device 57, or is first stored on the HDD 53 and loaded into the SDRAM 52 when needed. Alternatively, it is loaded into the SDRAM 52 via a communication line connected to the external I/F 58. The image information to be processed is input from the imaging device 1 via the recording device 57 or the external I/F 58.
[0011]
Next, as an example of posture detection by the posture detection unit 12 of the imaging device 1, a configuration in which the posture detection unit 12 consists of the triaxial acceleration sensor 45 and the triaxial magnetic sensor 46 will be described. The posture detection unit 12 detects the posture information of the imaging device 1 by comparing the orientation of the world coordinate system (XYZ coordinates) with the orientation of the device coordinate system (xyz coordinates) fixed to the imaging device 1 at the time the image is taken.
First, the device coordinate system and the world coordinate system are defined as follows.
(A) Device coordinate system: xyz coordinate system (see FIG. 4)
x-axis: rightward on the image plane is positive
y-axis: downward on the image plane is positive
z-axis: optical axis direction; direction toward the object is positive
Origin o: Optical center of the imaging apparatus 1
f: Camera focal length
p: vector component from the optical center of the imaging apparatus 1 to the corresponding point
(B) World coordinate system: XYZ coordinate system (see FIG. 5)
Y axis: Gravitational acceleration direction is positive
Z-axis: Magnetic direction is positive
X axis: direction such that X, Y, Z in this order form a right-handed orthogonal system
For simplicity, it is assumed that the motion acceleration caused by movement of the imaging device 1 can be ignored, that the gravitational acceleration and the magnetic field are orthogonal, and that no magnetic field other than the geomagnetism is present. Strictly speaking, the geomagnetism has a dip angle and is not orthogonal to the gravitational acceleration. However, if the dip angle is known, the calculation can be carried out in the same way as in the orthogonal case, and if the geomagnetism is detected on three axes, the posture information can be calculated even when the dip angle is unknown. Under these definitions, the positive X axis points east and the positive Z axis points north.
The orientation of the device coordinate system with respect to the world coordinate system is described by the rotation matrix R of equation (1), with the world coordinate system as the reference.

In equation (1), α, β, and γ are the rotation angles around the X, Y, and Z axes of the world coordinate system, respectively. The rotation matrix R corresponds to rotating the imaging device 1, starting from the state in which the XYZ coordinate system and the xyz coordinate system coincide, in the following order (1), (2), (3).
(1) The imaging device 1 is rotated by γ around the Z axis.
(2) The imaging device 1 is rotated by α around the X axis.
(3) The imaging device 1 is rotated by β around the Y axis.
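Equation (1) itself is not reproduced in this text. As a hedged reconstruction, assuming right-handed elementary rotations about the fixed world axes applied in the order listed above (Z, then X, then Y), the rotation matrix would take the following form. The sign convention of each elementary rotation is an assumption, not something the text confirms.

```latex
R = R_Y(\beta)\, R_X(\alpha)\, R_Z(\gamma), \qquad
R_X(\alpha) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix},
\quad
R_Y(\beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix},
\quad
R_Z(\gamma) = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}
```

With this form, R maps a direction expressed in the device coordinate system to the same direction expressed in the world coordinate system, which agrees with the relations Ra = g and Rm = M used below.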
Now, let the gravitational acceleration vector g and the geomagnetic vector M be expressed in the world coordinate system by equation (2), and let the acceleration vector a and the geomagnetic vector m detected in the device coordinate system by the triaxial acceleration sensor 45 and the triaxial magnetic sensor 46 be expressed by equation (3).

The relationship between the vectors g and a and the relationship between the vectors M and m are then described by the following equations (4) and (5) using the rotation matrix R. Hereinafter, R denotes a matrix and g, M, a, and m denote vectors (g and M may also denote scalar magnitudes).
Ra = g (4)
Rm = M (5)
From equation (4), the rotation angle α around the X axis and the rotation angle γ around the Z axis are calculated as in equations (6) and (7).

Once the rotation angle γ around the Z axis is known, the rotation angle β around the Y axis is calculated from the geomagnetic vector m using equation (5), as shown in equations (8) and (9).

However, if the rotation angle α around the X axis obtained using the acceleration vector a is used, the calculation of the equation (8) is unnecessary. With the above calculation, α, β, γ and the rotation matrix R can be detected from the triaxial acceleration sensor 45 and the triaxial magnetic sensor 46, and the apparatus coordinate system can be converted into the world coordinate system.
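Equations (2), (3), and (6) to (9) are not reproduced in this text, so the following is only a minimal sketch of the calculation they describe, not the patent's own formulas. It assumes R = R_Y(β) R_X(α) R_Z(γ) with right-handed elementary rotations, a gravity vector (0, g, 0) along the world Y axis, and a geomagnetic vector lying in the world Y-Z plane with a positive Z component. All function names are hypothetical.

```python
import math

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rotation_matrix(alpha, beta, gamma):
    # Assumed composition: rotate about Z, then X, then Y (fixed world axes).
    return matmul(rot_y(beta), matmul(rot_x(alpha), rot_z(gamma)))

def angles_from_sensors(a, m):
    """Recover (alpha, beta, gamma) from device-frame readings a and m."""
    ax, ay, az = a
    # From R a = g with g along world Y: a = (g cos(al) sin(ga), g cos(al) cos(ga), -g sin(al)).
    alpha = math.atan2(-az, math.hypot(ax, ay))
    gamma = math.atan2(ax, ay)
    # Undo the Z and X rotations; what remains of m is rotated about Y only.
    m_level = matvec(matmul(rot_x(alpha), rot_z(gamma)), m)
    beta = math.atan2(-m_level[0], m_level[2])
    return alpha, beta, gamma

# Simulate sensor readings for a known posture, then recover the angles.
R = rotation_matrix(0.3, -1.1, 0.7)
a = matvec(transpose(R), [0.0, 9.8, 0.0])   # gravity seen in the device frame
m = matvec(transpose(R), [0.0, 0.2, 0.4])   # geomagnetism with a dip component
alpha, beta, gamma = angles_from_sensors(a, m)
```

In this sketch the vertical (dip) component of the geomagnetism ends up in the middle element of `m_level` and is simply ignored, which is one way to see why three-axis magnetic detection tolerates an unknown dip angle. The recovery assumes |α| < 90°.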
It is also possible to detect the amount of change in posture between the shots when a plurality of images are taken. Let the posture information when the reference image was taken be the rotation matrix R_A, and let the posture information when the relative image was taken be the rotation matrix R_B. Then the rotation matrix R_AB representing the change from the posture of the imaging device when the reference image was taken to its posture when the relative image was taken is
R_AB = R_B / R_A.
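The quotient notation above does not say on which side the inverse is applied. The sketch below assumes the reading R_AB = R_A^(-1) R_B, which is consistent with equations (4) and (5) (R maps device coordinates to world coordinates) and with the later use of RpB,i to express a direction from the relative image in the coordinate system of the reference image. This reading is an assumption, and the helper names are hypothetical.

```python
import math

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def relative_rotation(R_A, R_B):
    # For a rotation matrix the inverse equals the transpose.
    return matmul(transpose(R_A), R_B)

# Example: two postures that differ by 45 degrees about the world Z axis.
R_A = rot_z(math.radians(30))
R_B = rot_z(math.radians(75))
R_AB = relative_rotation(R_A, R_B)
```

Under this reading, applying R_A after R_AB reproduces R_B, so R_AB carries device-B coordinates into device-A coordinates.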
[0012]
Next, image conversion performed by the image conversion unit 21 will be described.
When an image group for displaying a plurality of images three-dimensionally is created using QuickTime (registered trademark) VR or the like, the angle information θ and φ is generally added to the image information as the posture information described above and recorded. Here, θ and φ are the angles between the positions of the imaging device 1 before and after its movement, with the subject position taken as the coordinate center (see FIG. 6). The pieces of image information to which the angle information θ and φ has been added are stored as shown in FIG. 7, and a three-dimensional display is obtained by switching between the images according to user instructions.
If the dedicated photographing device 3 shown in FIG. 8 is attached to the subject H and the imaging device 1, the angle information θ and φ can be detected from the movement information of the dedicated device 3. Moreover, when the dedicated device 3 is used, the imaging device 1 rotates only in the θ and φ directions and not in the ψ direction, so the subject H is not projected rotated in the ψ direction in any image (see FIG. 9; the effect of the change in θ is omitted from the figure). If, on the other hand, the imaging device 1 is not fixed, it rotates in the θ, φ, and ψ directions (see FIG. 10; the effects of the changes in θ and φ are omitted). In other words, even if the angle information θ and φ is added to the image group for display, the subject is projected with a different rotation in each image because of the rotation through the angle ψ.
Therefore, the image conversion unit 21 converts the rotated projection (see FIG. 10) into a non-rotated image (see FIG. 9) by rotating it in the −ψ direction. This rotation conversion produces blank portions that have no image information (see FIG. 11). A blank portion may be filled with the color of the image background. Alternatively, as shown in FIG. 12, when the image information of an adjacent image overlaps the blank portion, the adjacent image information may be projected onto the blank portion.
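The rotation conversion and the background fill can be sketched as follows. This is a minimal illustration, not the conversion unit's actual implementation: it assumes a grayscale image stored as a list of rows, rotation about the image center, nearest-neighbor sampling, and a particular sign convention for ψ. The function name and the `background` parameter are hypothetical.

```python
import math

def rotate_image(img, psi, background=0):
    """Rotate img by -psi radians about its center.

    Pixels whose source falls outside the original image are the blank
    portions; they are filled with the given background value.
    """
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    c, s = math.cos(psi), math.sin(psi)
    out = [[background] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: rotate the output position by +psi to find
            # the input pixel that lands here after a rotation by -psi.
            dx, dy = x - cx, y - cy
            sx = c * dx - s * dy + cx
            sy = s * dx + c * dy + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out

square = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
quarter_turn = rotate_image(square, math.pi / 2)

strip = [[1, 2, 3, 4, 5]]
strip_turned = rotate_image(strip, math.pi / 2, background=-1)
```

Inverse mapping is used so that every output pixel is assigned exactly once. Filling from an adjacent image, as in FIG. 12, would replace the `background` value with a lookup into that image.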
[0013]
Next, image recording performed by the image recording unit 22 will be described.
The image recording unit 22 adds posture information (angle information of θ and φ) detected by the posture detection unit 12 to the image information converted by the image conversion unit 21 in order to display a plurality of images three-dimensionally. Then, it is recorded / saved in a predetermined data format (see FIG. 13). By recording the image group in this way, it is possible to switch and display the image group in accordance with a user instruction, and to display the image group three-dimensionally. FIG. 14 shows an operation flow in the case where the photographing-dedicated device 3 (see FIG. 8) is not used. Hereinafter, this operation flow will be described.
First, image information taken from two or more viewpoints in the imaging device 1 and attitude information of the imaging device 1 are acquired and stored in a memory (step S1). Then, rotation angles θ, φ, and ψ of the imaging device 1 are calculated from the posture information (step S2). The breakdown of step S2 is the following (S2-1) (S2-2) (S2-3).
(S2-1) The magnitude of the voltage value obtained from the output signal of the triaxial acceleration sensor 45 is read, the magnitude of the acceleration vector in the triaxial direction is compared, and the inclination of the imaging device 1 with respect to the direction of gravity is detected.
(S2-2) The magnitude of the voltage value obtained from the output signal of the triaxial magnetic sensor 46 is read, the magnitudes of the magnetic vectors in the triaxial direction are compared, and the inclination of the imaging device 1 with respect to the geomagnetic direction is detected.
(S2-3) The rotation angles θ, φ, and ψ of the imaging apparatus 1 are detected by combining the inclination with respect to the gravitational direction detected by the acceleration sensor 45 and the inclination with respect to the geomagnetic direction detected by the magnetic sensor 46.
[0014]
Next, the image conversion unit 21 of the image processing device 2 converts the image information by rotating it in the −ψ direction, and interpolates the blank areas produced by the conversion using adjacent image information (step S3). The image conversion unit 21 then adds the rotation angle θ and φ information to the converted image information and stores it in the image recording unit 22 (step S4). To generate images in the QuickTime (registered trademark) VR format, the captured images must be arranged according to the posture information (θ, φ), as shown in FIG.; the rotation angle θ and φ information is therefore added and stored. In QuickTime (registered trademark) VR, the position information of the images arranged according to the posture information is used to display the two-dimensional image that corresponds to the user's mouse operation, which gives a three-dimensional presentation.
Thus, according to this embodiment, even an ordinary user can easily photograph a subject from various viewpoints using a monocular imaging device 1 such as a digital camera, without worrying about the posture of the imaging device 1, and image group data for three-dimensional display can be generated automatically and stored together with the posture information in, for example, the QuickTime (registered trademark) VR format.
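The idea of switching images according to the stored (θ, φ) information can be sketched as a nearest-angle lookup. The record layout, file names, and function names below are hypothetical and do not represent the QuickTime VR file format; angles are assumed to be in degrees.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def select_image(records, theta, phi):
    """Return the stored record whose (theta, phi) is closest to the request."""
    return min(
        records,
        key=lambda r: angle_diff(r["theta"], theta) ** 2
        + angle_diff(r["phi"], phi) ** 2,
    )

# A hypothetical image group: one image every 30 degrees in theta, two rows in phi.
records = [
    {"theta": t, "phi": p, "name": f"img_{t:03d}_{p:02d}"}
    for p in (0, 30)
    for t in range(0, 360, 30)
]

# A user instruction (for example a mouse drag) mapped to a requested view.
chosen = select_image(records, theta=350, phi=5)
```

The wrap-around in `angle_diff` makes a request at θ = 350° select the image stored at θ = 0° rather than the one at θ = 330°.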
[0015]
Next, a second embodiment of the present invention will be described.
FIG. 15 is a block diagram showing the configuration of the image input apparatus according to the second embodiment of the present invention, and FIG. 16 is an explanatory diagram thereof. As shown in the figures, the image input apparatus of this embodiment includes an imaging device 1 and an image processing device 2a. The imaging device 1 includes an imaging unit 11, an attitude detection unit 12, and an information transfer unit 13. The image processing device 2a includes an image conversion unit 21 and an image recording unit 22, and additionally a correspondence detection unit 23, a translation component calculation unit 24, a three-dimensional position calculation unit 25, and a subject center position calculation unit 26. The imaging device 1 and the image processing device 2a may be configured as an integrated apparatus, in which case the information transfer unit 13 can be omitted. The imaging unit 11, the attitude detection unit 12, and the information transfer unit 13 have the same configuration as in the first embodiment.
The image conversion unit 21 performs image conversion of the image information transferred by the information transfer unit 13 using the posture information detected by the posture detection unit 12 and the translation component information calculated by the translation component calculation unit 24. The image recording unit 22 has the same configuration as that of the first embodiment.
The correspondence detection unit 23 detects, from the image information transferred by the information transfer unit 13, the position information in the device coordinate system of the corresponding points between the images. The translation component calculation unit 24 calculates the translation component information from the posture information transferred by the information transfer unit 13 and the position information of the corresponding points detected by the correspondence detection unit 23. The three-dimensional position calculation unit 25 calculates the three-dimensional position of each corresponding point in the world coordinate system from the position information of the corresponding point in the device coordinate system, the posture information, and the translation component information, and the subject center position calculation unit 26 calculates the center position of the subject from the three-dimensional position information.
[0016]
Next, detection of corresponding points between a reference image and a relative image taken from two viewpoints will be described with reference to FIG.
First, feature quantities are extracted from the image information of the reference image in order to detect feature points. Many methods of extracting feature quantities from image information to detect feature points have been studied; they are described, for example, in Chapter 3, "Feature Extraction", of "3D Vision" by Gang Xu and Saburo Tsuji (Kyoritsu Shuppan).
Next, the points in the relative image that correspond to the feature points detected in the reference image are found using the feature quantities extracted from the reference image. As an example of corresponding point detection, FIG. 17 shows a method that detects corresponding points by block matching based on cross-correlation. When the i-th feature point (xi0, yi0) in the reference image is matched against the point (xi0 + dx, yi0 + dy) in the relative image by block matching with a correlation window of size (2N + 1) × (2P + 1) (see FIG. 17), the cross-correlation value Si is calculated by equation (10).

N and P are arbitrarily chosen constants representing the size of the correlation window. The meaning of each symbol in equation (10) is as follows.

K: Constant
In this embodiment, the corresponding points between the reference image and the relative image are detected by sequentially detecting the corresponding points that maximize the cross-correlation value Si for each feature point.
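Equation (10) is not reproduced in this text, so the following sketch uses a zero-mean normalized cross-correlation, one common form of block matching, rather than the patent's exact expression. In particular, taking the constant K to be the product of the two window standard deviations is an assumption, as are the function names and the `search` parameter. Images are grayscale lists of rows.

```python
import math

def window_mean(img, cx, cy, N, P):
    total = 0.0
    for y in range(-P, P + 1):
        for x in range(-N, N + 1):
            total += img[cy + y][cx + x]
    return total / ((2 * N + 1) * (2 * P + 1))

def cross_correlation(ref, rel, x0, y0, dx, dy, N, P):
    """Correlation between the window at (x0, y0) in ref and (x0+dx, y0+dy) in rel."""
    mean_s = window_mean(ref, x0, y0, N, P)
    mean_r = window_mean(rel, x0 + dx, y0 + dy, N, P)
    num = var_s = var_r = 0.0
    for y in range(-P, P + 1):
        for x in range(-N, N + 1):
            ds = ref[y0 + y][x0 + x] - mean_s
            dr = rel[y0 + dy + y][x0 + dx + x] - mean_r
            num += ds * dr
            var_s += ds * ds
            var_r += dr * dr
    K = math.sqrt(var_s * var_r)  # assumed normalization constant
    return num / K if K > 0 else 0.0

def find_corresponding_point(ref, rel, x0, y0, N, P, search):
    """Return (score, x, y) of the point in rel that maximizes the correlation."""
    h, w = len(rel), len(rel[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cx, cy = x0 + dx, y0 + dy
            if cx - N < 0 or cx + N >= w or cy - P < 0 or cy + P >= h:
                continue  # window would leave the relative image
            score = cross_correlation(ref, rel, x0, y0, dx, dy, N, P)
            if best is None or score > best[0]:
                best = (score, cx, cy)
    return best

# Synthetic example: the relative image is the reference shifted by (2, 1).
ref = [[0] * 10 for _ in range(10)]
ref[4][4], ref[3][4], ref[4][5], ref[5][3] = 9, 5, 7, 2
rel = [[0] * 10 for _ in range(10)]
for y in range(9):
    for x in range(8):
        rel[y + 1][x + 2] = ref[y][x]

score, mx, my = find_corresponding_point(ref, rel, 4, 4, N=1, P=1, search=3)
```

The exhaustive search over (dx, dy) mirrors the text's "point that maximizes Si"; a practical implementation would usually restrict the search range using the posture information.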
[0017]
Next, an example of translation component calculation in the translation component calculation unit 24 will be described (Japanese Patent Laid-Open No. 11-306363 has a detailed example).
In calculating the translation component, the translation component information is obtained by simple calculation using the posture information obtained from the posture detection unit 12 and the corresponding points obtained from the correspondence detection unit 23. As an example, a method for calculating the translation component information t of the image A taken at the viewpoint A and the image B taken at the viewpoint B shown in FIG. 18 will be described below.
The points (xA,i, yA,i) and (xB,i, yB,i) shown in FIG. 18 are the projections of the feature point Object i of the subject onto images A and B, respectively, where i is a number identifying the corresponding point. The change in posture from image A to image B is represented by the posture information R. pA,i is the vector connecting the optical center at viewpoint A and the corresponding point in image A, and pB,i is the vector connecting the optical center at viewpoint B and the corresponding point in image B. Since pA,i is expressed in the (x, y, z) coordinate system and pB,i in the (x′, y′, z′) coordinate system, pB,i is multiplied by the posture information R to give RpB,i, which describes the direction of pB,i in the (x, y, z) coordinate system. The vectors pA,i, RpB,i, and t all lie in the plane that passes through the feature point Object i of the subject, the optical center at viewpoint A, and the optical center at viewpoint B (see FIG. 18). The scalar triple product of pA,i, RpB,i, and t, which represents the volume of the solid they span, is therefore 0, and equation (11) holds.

Since the translation component information t to be obtained represents only the direction of a vector, it has two unknowns. It can therefore be calculated by detecting two or more sets of corresponding points for feature points of the subject, substituting the position information of the corresponding points into equation (11), and solving the resulting simultaneous equations. When more than two sets of corresponding points are detected, the calculation may use only those with a high cross-correlation value Si; or, to allow for errors caused by image noise, the translation component information t that minimizes the scalar triple product of equation (11) may be obtained; or many candidate values may be calculated and projected onto a voting space to obtain the optimum translation component information t (Takayuki Okatani and Koichiro Deguchi, "Estimating translational motion of a camera by voting using a camera with a three-dimensional orientation sensor", Human Interface / Computer Vision and Image Media, 2001.9.13).
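The coplanarity condition described above can be written as (pA,i × RpB,i) · t = 0, which is one way of stating that the scalar triple product is zero. With exactly two corresponding points, t is perpendicular to both normals pA,i × RpB,i, so its direction is their cross product. The sketch below illustrates only this two-point case and not the least-squares or voting variants; the function names are hypothetical, and the sign of t remains undetermined.

```python
import math

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def matvec(R, v):
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def translation_direction(pairs, R):
    """Unit direction of t from the first two (pA, pB) pairs; sign is ambiguous."""
    (pA1, pB1), (pA2, pB2) = pairs[:2]
    n1 = cross(pA1, matvec(R, pB1))  # normal of the plane through point 1
    n2 = cross(pA2, matvec(R, pB2))  # normal of the plane through point 2
    return normalize(cross(n1, n2))

# Synthetic check: viewpoint A at the origin, viewpoint B displaced by t_true.
c, s = math.cos(0.2), math.sin(0.2)
R = [[c, 0, s], [0, 1, 0], [-s, 0, c]]        # device-B to device-A rotation
R_T = [list(col) for col in zip(*R)]
t_true = [1.0, 0.2, 0.1]
points = [[0.5, 0.3, 4.0], [-0.7, -0.2, 5.0]]  # feature points in A coordinates
pairs = []
for X in points:
    pA = X                                                   # ray from A
    pB = matvec(R_T, [x - t for x, t in zip(X, t_true)])     # ray from B
    pairs.append((pA, pB))

t_est = translation_direction(pairs, R)
```

The result is a unit vector, which reflects the point made in the text that only the direction of the translation, not its length, can be recovered from equation (11).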
[0018]
Next, an example of calculating the three-dimensional position of the corresponding point by the three-dimensional position calculation unit 25 will be described.
The three-dimensional position of a corresponding point is calculated by the principle of triangulation, using the position information of the corresponding point on the images, the posture information, and the translation component information. When the translation component calculation unit 24 calculates the translation component information t by the method described above, only the direction of the translational movement is obtained; the translation distance cannot be calculated. Consequently, triangulation has the problem that the scale of the three-dimensional positions obtained from images A and B does not match the scale of those obtained from images B and C. However, as shown in FIG. 19, when the same corresponding points are acquired in the three images A, B, and C, the scale ratio between the three-dimensional positions obtained from images A and B and those obtained from images B and C can be calculated. The same holds for images B, C, and D, so the scale ratio of the three-dimensional positions can be calculated across all the images. Although the distance by which the imaging apparatus 1 actually translated cannot be calculated, the scale ratio is available between all the images; the three-dimensional positions of the corresponding points can therefore be calculated on the assumption that the length of the translation component information t between images A and B is 1.
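A minimal triangulation sketch under stated assumptions: viewpoint A is placed at the origin, viewpoint B at t (taken as a unit vector, matching the assumption that the baseline length is 1), and R maps viewpoint-B coordinates into viewpoint-A coordinates. The midpoint method used here is one common way to triangulate two rays; the text does not say which method the three-dimensional position calculation unit 25 uses.

```python
import numpy as np

def triangulate_midpoint(p_a, p_b, R, t):
    """Midpoint of the shortest segment between ray A and ray B.

    Ray A: s * p_a from the origin.  Ray B: t + u * (R @ p_b).
    """
    d_a = np.asarray(p_a, dtype=float)
    d_b = np.asarray(R, dtype=float) @ np.asarray(p_b, dtype=float)
    t = np.asarray(t, dtype=float)
    # Least-squares solution of s * d_a - u * d_b = t for the two ray parameters.
    A = np.stack([d_a, -d_b], axis=1)
    s, u = np.linalg.lstsq(A, t, rcond=None)[0]
    return 0.5 * (s * d_a + (t + u * d_b))
```

Because t carries no absolute length, the returned coordinates are in units of the A-B baseline.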
Next, an example of subject center position calculation in the subject center position calculation unit 26 will be described.
In the calculation of the center position of the subject in this embodiment, the average value of the three-dimensional positions of the corresponding points calculated by the three-dimensional position calculation unit 25 is calculated and used as the center position of the subject. When the detected number of corresponding points is biased between images, the average value may be calculated by weighting the corresponding points detected between the images.
[0019]
Next, an example of image conversion in the image conversion unit 21 will be described.
When photographing is performed with the dedicated photographing device shown in FIG. 8 attached to the subject and the imaging device 1, it is easy to shoot so that the subject is projected at the center position of the captured image (see FIG. 20). When shooting without the dedicated device, however, the imaging device 1 is not fixed, so the subject is projected away from the center position of the captured image (see FIG. 21). If the subject is projected off-center in this way, the subject moves up, down, left, and right on the screen when the displayed image is switched according to the user's instruction, so the images appear unrelated and a three-dimensional display cannot be achieved. Therefore, in this embodiment, the image conversion unit 21 performs image conversion that moves the projection position of the subject to the center position of the captured image.
A general imaging apparatus 1 projects the subject onto the imaging surface by central projection, as shown in FIG. 4. An example of image conversion for the case where this projection is approximated by an orthographic projection model, as shown in the corresponding figure, is given below.
First, each imaging position of the imaging device 1 is obtained from the posture information R detected by the posture detection unit 12 and the translation component information t calculated by the translation component calculation unit 24, and the imaging surface of the imaging device 1 (for example, the XY plane shown in FIG. 23) and the normal of the imaging surface (the straight line in the z direction shown in FIG. 23) are calculated. The equation of the normal of the imaging surface is given by equation (12), and the equation of the plane of the imaging surface by equation (13).

Here, the normal of the imaging surface is a straight line passing through the points (X1, Y1, Z1) and (X2, Y2, Z2), and the imaging surface is a plane passing through (X2, Y2, Z2).
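As a reading aid, a straight line through two points and a plane through the second point perpendicular to that line can be written as below. This is only one standard form consistent with the description; the exact form of equations (12) and (13) is not reproduced in this text, and the parameter s is introduced here purely for illustration.

```latex
\begin{aligned}
(X, Y, Z) &= (X_2, Y_2, Z_2) + s\,(X_1 - X_2,\; Y_1 - Y_2,\; Z_1 - Z_2) \\
0 &= (X_1 - X_2)(X - X_2) + (Y_1 - Y_2)(Y - Y_2) + (Z_1 - Z_2)(Z - Z_2)
\end{aligned}
```

For instance, with (X1, Y1, Z1) = (0, 0, 1) and (X2, Y2, Z2) = (0, 0, 0), the line is the Z axis and the plane is Z = 0.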
Taking the shooting position of image A as the reference position of the world coordinate system, the equations of the normal and the imaging plane of image A are obtained by substituting (X1, Y1, Z1) = (0, 0, 1) and (X2, Y2, Z2) = (0, 0, 0) into equations (12) and (13). The equations of the normal and the imaging plane of image B are calculated by converting the points (X1, Y1, Z1) and (X2, Y2, Z2) with the posture information R and the translation component information t into (X′1, Y′1, Z′1) and (X′2, Y′2, Z′2) (the conversion formulas are shown in equations (14) and (15)) and substituting these into equations (12) and (13). The normal and the plane equation of the imaging surface at every other imaging position can be calculated in the same way.
[0020]
Next, a straight line L2 that passes through the subject center position calculated by the subject center position calculation unit 26 and is parallel to the normal L1 is calculated (see FIG. 24). Assuming that the subject center position is (X3, Y3, Z3), the equation of this straight line for image A is given by equation (16). A straight line L2 that passes through the subject center and is parallel to the normal L1 can be calculated in the same way at each photographing position.
Subsequently, the intersection point p4 (X4, Y4, Z4) between the calculated straight line L2 and the imaging surface is calculated (see FIG. 24). The intersection point p4 can be calculated by solving equations (13) and (16) simultaneously. The image information is then translated so that the center position (XC, YC, ZC) of the captured image coincides with the position (X4, Y4, Z4) of the intersection point p4. To calculate the translation vector (Tx, Ty) for this, the center position (XC, YC, ZC) of the captured image and the position (X4, Y4, Z4) of the intersection point p4 are converted by equations (17) and (18) into (xC, yC, zC) and (x4, y4, z4) in the device coordinate system (the translation is indicated by the arrow in FIG. 24). Since the center position of the captured image and the intersection point p4 lie in the same imaging plane, their z coordinate values in the device coordinate system are equal.

Therefore, Tz in equation (19) is always 0, and the translation vector (Tx, Ty) is given by equation (20). By translating the captured image along this translation vector, the projection position of the subject is moved to the center position of the captured image.
In addition to the above, the image conversion unit 21 may also perform image conversion that rotates the projected image in the ψ direction, as in the first embodiment. The rotation and translation produce blank regions for which there is no image information. A blank region may be filled with the color of the image background, or, where the image information of an adjacent image overlaps the blank region, the adjacent image information may be projected and drawn onto it.
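The centering step can be sketched as follows under the orthographic approximation. The conventions are assumptions made for this illustration, not taken from the equations of the original: the columns of R are the device x, y, z axes expressed in world coordinates, `origin` is the world position of the device-coordinate origin on the imaging surface, and the translation is reported as intersection point minus image center in device coordinates.

```python
import numpy as np

def centering_translation(R, origin, subject_center, image_center=None):
    """Return (Tx, Ty, Tz) in device coordinates; Tz is 0 by construction."""
    R = np.asarray(R, dtype=float)
    origin = np.asarray(origin, dtype=float)
    c = np.asarray(subject_center, dtype=float)
    normal = R[:, 2]                                   # imaging-surface normal (device z axis)
    # Foot of the line through the subject center parallel to the normal.
    p4 = c - np.dot(c - origin, normal) * normal
    if image_center is None:                           # assumption: image center at the origin
        image_center = origin
    p4_dev = R.T @ (p4 - origin)                       # world -> device coordinates
    center_dev = R.T @ (np.asarray(image_center, dtype=float) - origin)
    T = p4_dev - center_dev
    return float(T[0]), float(T[1]), float(T[2])
```

Depending on whether the image content or the image frame is shifted, the sign of (Tx, Ty) may need to be reversed.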
[0021]
FIGS. 25 and 26 show the operation flow of the image input operation of this embodiment. The operation flow is described below with reference to FIGS. 25 and 26.
First, image information taken from two or more viewpoints in the imaging apparatus 1 and attitude information of the imaging apparatus 1 are acquired and stored in a memory (step S11). Then, rotation angles θ, φ, and ψ of the imaging device 1 are calculated from the posture information (step S12). That is, first, in the imaging apparatus 1, the magnitude of the voltage value obtained from the output signal of the triaxial acceleration sensor 45 is read, the magnitude of the acceleration vector in the triaxial direction is compared, and the inclination of the image input apparatus with respect to the direction of gravity is determined. At the same time, the magnitude of the voltage value obtained from the output signal of the triaxial magnetic sensor 46 is read, the magnitude of the magnetic vector in the triaxial direction is compared, and the inclination of the image input device with respect to the geomagnetic direction is detected. Then, the rotation angles θ, φ, and ψ of the imaging device 1 are detected by combining the inclination with respect to the gravitational direction detected from the triaxial acceleration sensor 45 and the inclination with respect to the geomagnetic direction detected from the magnetic sensor 46.
[0022]
Subsequently, the correspondence detection unit 23 detects feature points from the reference image and detects, from the relative image, the position information of the corresponding points for those feature points (step S13). Specifically, a feature quantity extraction block is first created centered on each pixel of the reference image, and the luminance value distribution within the block is detected. The reference image is then divided into regions; within each region, a block whose luminance value distribution differs significantly is detected, and the image position information of the center of the detected block and its luminance value distribution information are stored in the memory (RGB value distribution information may be used instead of luminance value distribution information).
Subsequently, feature quantity extraction blocks are created in the relative image in the same way as for the reference image, and the luminance value distribution within each block is detected. The stored luminance value distribution information of the reference image is matched against the luminance value distribution information of the feature quantity extraction blocks of the relative image, and, for each feature point, the image position information of the center of the closest-matching feature quantity extraction block is stored in the memory as the position information of the corresponding point. However, if no feature quantity extraction block in the relative image is close to the luminance value distribution information of a feature point's block, the stored image position information and luminance value distribution information of that feature point are deleted from the memory.
Next, the translation component calculation unit 24 calculates the translation component information from the position information of the corresponding points and the posture information (step S14). Specifically, simultaneous equations are first set up by substituting the posture information and the position information of the corresponding points into equation (11). (In general, the translational movement of a rigid body in three-dimensional space is represented by a vector with three variables. However, since the images are assumed to be captured by perspective projection with a digital camera, the scale of the subject in the world coordinate system is not uniquely determined from the image information in the image coordinate system; the translation component information therefore represents only the direction of the vector and has two variables.) When a large number of corresponding points are detected, there are more simultaneous equations than variables in the translation component information. In that case the solution with the smallest error over all the equations is calculated by the least squares method and stored in the memory as the translation component information.
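The block matching described above can be sketched as follows. This is a hedged illustration: normalized cross-correlation is used as the similarity measure, the search is exhaustive, and the block half-width `half` and the rejection threshold `min_score` are hypothetical parameters that do not appear in the original text.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized blocks (0.0 for flat blocks)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def find_corresponding_point(ref, rel, point, half=2, min_score=0.9):
    """Find the (row, col) in `rel` whose block best matches the block around `point` in `ref`.

    Returns None when no block is similar enough, mirroring the step in which
    unmatched feature points are discarded.
    """
    y, x = point
    block = ref[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    best_score, best_pos = -1.0, None
    h, w = rel.shape
    for cy in range(half, h - half):
        for cx in range(half, w - half):
            cand = rel[cy - half:cy + half + 1, cx - half:cx + half + 1].astype(float)
            score = ncc(block, cand)
            if score > best_score:
                best_score, best_pos = score, (cy, cx)
    return best_pos if best_score >= min_score else None
```

In practice the search would normally be restricted to a window around the expected position rather than the whole relative image.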
[0023]
Next, the three-dimensional position calculation unit 25 calculates the three-dimensional position information (world coordinate system) of the corresponding points from the posture information, the translation component information, and the position information (device coordinate system) of the corresponding points (step S15). Specifically, the positional relationship between the shooting positions of the reference image and the relative image is first obtained from the posture information and the translation component information, and the three-dimensional position information (world coordinate system) of each corresponding point is calculated by triangulation from this positional relationship and the position information (device coordinate system) of the corresponding point on the reference image and the relative image. This is repeated to obtain the three-dimensional position information of the corresponding points across all the images. Then the scale ratios of the three-dimensional positions calculated between the images are computed, the length of the translation component information t between the first pair of images is assumed to be 1, and the three-dimensional positions (world coordinate system) of the corresponding points are calculated with the scale made consistent across all the images.
Subsequently, the subject center position calculation unit 26 calculates the subject center position from the three-dimensional position information (world coordinate system) of the corresponding points (step S16): the average of the three-dimensional positions of the corresponding points is calculated and taken as the subject center position. The image conversion unit 21 then performs image conversion that translates the image information by (Tx, Ty) (step S17). Specifically, the equations of the imaging surface of the imaging device 1 and of its normal are first calculated from the posture information and the translation component information, and a straight line passing through the subject center position and parallel to the normal is calculated. Then the intersection of this straight line with the imaging surface is calculated, and the image information is translated so that the center position of the captured image coincides with the position of the intersection.
Thereafter, rotation angle θ and φ information is added to the image information that has been subjected to image conversion, and the image information and the like are stored in the image recording unit 22 (step S18).
In this way, according to this embodiment, image group data for displaying a subject three-dimensionally can be generated automatically simply by photographing the subject from various viewpoints with the imaging device 1, without using a dedicated photographing device such as that shown in FIG. 8.
[0024]
Next, a third embodiment of the present invention will be described.
In the second embodiment, the three-dimensional positions of the corresponding points in world coordinates are calculated in order to obtain the subject center position, and the average of the corresponding points is taken as the subject center position. In the third embodiment, by contrast, the subject center position is calculated by taking into account the optical axis direction of the imaging apparatus 1 when each image is captured.
FIG. 27 is a block diagram showing the configuration of the image input apparatus according to the third embodiment. As shown in the figure, this image input device is composed of an imaging device 1 and an image processing device 2b, and the imaging device 1 includes an imaging unit 11, a posture detection unit 12, and an information transfer unit 13. The image processing device 2b includes an image conversion unit 21, an image recording unit 22, a correspondence detection unit 23, a translation component calculation unit 24, an optical axis high-density position detection unit 27, and the like. The imaging device 1 and the image processing device 2b may be configured as one device, in which case the information transfer unit 13 can be omitted. The imaging unit 11, the posture detection unit 12, the information transfer unit 13, the image conversion unit 21, the image recording unit 22, the correspondence detection unit 23, and the translation component calculation unit 24 have the same configuration as in the first embodiment.
On the other hand, the optical axis high-density position detection unit 27 detects the position where the optical axes intersect or the position where the optical axis density is the highest when the optical axes (z direction) at each photographing position are drawn on the world coordinates. Hereinafter, an operation in which the optical axis high-density position detection unit 27 detects a position having a high optical axis density will be described.
As described above, in the second embodiment the average of the three-dimensional positions of the corresponding points is taken as the subject center position. The third embodiment instead assumes that the photographer tries to project the center of the subject onto the center of each image. Under this assumption, when the projected positions are averaged over the images, the subject center statistically coincides with the average image center, so the position with the highest optical axis density is taken as the subject center position.
Therefore, in this embodiment, the shooting position of the imaging apparatus 1 for each image is obtained from the posture information R detected by the posture detection unit 12 and the translation component information t calculated by the translation component calculation unit 24, and the equation of the straight line of the optical axis is calculated by substituting the center position (XC, YC, ZC) of the captured image for (X2, Y2, Z2). FIG. 28 shows the optical axes at the respective shooting positions, calculated in this way and drawn in world coordinates. The space of the world coordinate system is divided into regions as shown in FIG. 29, and the number of optical axes passing through each region is counted. The center position of the region through which the most optical axes pass is detected as the position with the highest optical axis density, and the image conversion unit 21 performs image conversion using the detected position as the subject center position.
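The region-counting idea can be sketched as a simple voting scheme. The details are assumptions for illustration only: cubic regions of edge length `cell`, each optical axis sampled at intervals of `step` up to `max_range` in front of the camera, and each axis counted at most once per region. None of these parameter names or values come from the original text.

```python
import math
from collections import Counter

def densest_cell(axes, cell=1.0, max_range=10.0, step=0.1):
    """Return (center, count) of the grid cell crossed by the most optical axes.

    axes: iterable of (origin, direction) pairs, each a 3-tuple in world coordinates.
    """
    votes = Counter()
    n_samples = int(round(max_range / step))
    for origin, direction in axes:
        norm = math.sqrt(sum(d * d for d in direction))
        visited = set()                      # so one axis votes once per cell
        for k in range(n_samples + 1):
            s = k * step
            point = [o + s * d / norm for o, d in zip(origin, direction)]
            visited.add(tuple(math.floor(c / cell) for c in point))
        votes.update(visited)
    best, count = max(votes.items(), key=lambda kv: kv[1])
    center = tuple((i + 0.5) * cell for i in best)
    return center, count
```

A coarser `cell` makes the vote more tolerant of axes that only approximately intersect, at the cost of a less precise center estimate.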
[0025]
FIG. 30 shows an operation flow of this embodiment. The operation of this embodiment will be described below with reference to FIG.
First, in the imaging device 1, image information taken from two or more viewpoints and attitude information of the imaging device 1 are acquired and stored in the memory (step S21). Then, rotation angles θ, φ, and ψ of the imaging device 1 are calculated from the posture information (step S22). Specifically, first, the magnitude of the voltage value obtained from the output signal of the triaxial acceleration sensor 45 is read, the magnitudes of the acceleration vectors in the triaxial direction are compared, and the inclination of the image input device with respect to the direction of gravity is detected. Further, the magnitude of the voltage value obtained from the output signal of the triaxial magnetic sensor 46 is read, the magnitude of the magnetic vector in the triaxial direction is compared, and the inclination of the image input device with respect to the geomagnetic direction is detected. Subsequently, the rotation angles θ, φ, and ψ of the imaging device 1 are detected by combining the inclination with respect to the gravity direction detected by the triaxial acceleration sensor 45 and the inclination with respect to the geomagnetic direction detected by the magnetic sensor 46.
Next, the correspondence detection unit 23 detects feature points from the reference image and detects, from the relative image, the position information of the corresponding points for those feature points (step S23). Specifically, a feature quantity extraction block is first created centered on each pixel of the reference image, and the luminance value distribution within the block is detected. The reference image is then divided into regions; within each region, a block whose luminance value distribution differs significantly is detected, and the image position information of the center of the detected block and its luminance value distribution information are stored in the memory (RGB value distribution information may be used instead of luminance value distribution information).
Subsequently, feature quantity extraction blocks are created in the relative image in the same way, and the luminance value distribution within each block is detected. The stored luminance value distribution information of the reference image is matched against the luminance value distribution information of the feature quantity extraction blocks of the relative image, and, for each feature point, the image position information of the center of the closest-matching feature quantity extraction block is stored in the memory as the position information of the corresponding point. However, if no feature quantity extraction block is close to the luminance value distribution information of a feature point's block, the stored image position information and luminance value distribution information of that feature point are deleted from the memory.
[0026]
Next, the translation component calculation unit 24 calculates the translation component information from the position information of the corresponding points and the posture information (step S24). Specifically, simultaneous equations are first set up by substituting the posture information and the position information of the corresponding points into equation (11). (In general, the translational movement of a rigid body in three-dimensional space is represented by a vector with three variables. However, since the images are assumed to be captured by perspective projection with a digital camera, the scale of the subject in the world coordinate system is not uniquely determined from the image information in the image coordinate system; the translation component information therefore represents only the direction of the vector and has two variables.) When a large number of corresponding points are detected, there are more simultaneous equations than variables in the translation component information. In that case the solution with the smallest error over all the equations is calculated by the least squares method and stored in the memory as the translation component information.
Next, the optical axis high-density position detection unit 27 calculates a linear equation of the optical axis from the posture information and the translation component information, and detects a position with a high optical axis density (step S25). Specifically, first, the optical axis direction when each image is photographed is calculated from the posture information and the translation component information, the world coordinate space is divided into regions, and the number through which the optical axis passes is calculated for each divided region. Then, detection is performed assuming that the center position of the region through which the optical axis passes most is the subject center position.
Next, the image conversion unit 21 performs image conversion that translates the image information by (Tx, Ty) (step S26). Specifically, the equations of the imaging surface of the imaging apparatus 1 and of its normal are first calculated from the posture information and the translation component information, and a straight line that passes through the calculated subject center position and is parallel to the normal is calculated. Then the intersection of this straight line with the imaging surface is calculated, and the image information is translated so that the center position of the captured image coincides with the position of the intersection.
Thereafter, rotation angle θ and φ information is added to the image information that has been subjected to image conversion, and the image information and the like are stored in the image recording unit 22 (step S27).
Thus, according to this embodiment as well, image group data for displaying a subject three-dimensionally can be generated automatically simply by photographing the subject from various viewpoints with the imaging device 1, without using a dedicated photographing device such as that shown in FIG. 8.
[0027]
Next, a fourth embodiment of the present invention will be described.
The image input apparatus according to this embodiment has a configuration in which image scaling means is added to the image processing apparatus of the second or third embodiment (for example, within the image conversion unit 21). With this configuration, the image scaling means changes the magnification of each captured image so that the size of the subject projected in every captured image becomes equal. If the distance from the imaging surface to the subject differs between shots, the size of the subject projected on the imaging surface also differs, as shown in the figure. The size of the projected subject is therefore made constant by scaling each captured image according to that distance. The operation is described below.
First, the distance L from the position (X2, Y2, Z2) on the imaging surface at which image A was taken to the subject center position (X3, Y3, Z3) is calculated by equation (21). Likewise, the distance L′ from the position (X′2, Y′2, Z′2) on the imaging surface at which image B was taken to the subject center position (X3, Y3, Z3) is calculated by equation (22). Then, taking image A as the reference image, the ratio of the distance L′ of image B to the distance L of image A is calculated to give the magnification of image B, H_B = L′/L. Scaling the captured image B by a factor of H_B makes the size of the subject projected on the imaging surfaces of images A and B equal. This processing is executed for each captured image so that the size of the subject projected on all the imaging surfaces is constant. The operation flow of this embodiment is described below with reference to the operation flow shown in the figure.
First, image information photographed from two or more viewpoints in the imaging device 1 and attitude information of the imaging device 1 are acquired and stored in a memory (step S31).
Subsequently, the rotation angles θ, φ, and ψ of the imaging device 1 are calculated from the posture information (step S32). Specifically, first, the magnitude of the voltage value obtained from the output signal of the triaxial acceleration sensor 45 is read, the magnitudes of the acceleration vectors in the triaxial direction are compared, and the inclination of the image input device with respect to the direction of gravity is detected. Further, the magnitude of the voltage value obtained from the output signal of the triaxial magnetic sensor 46 is read, the magnitude of the magnetic vector in the triaxial direction is compared, and the inclination of the image input device with respect to the geomagnetic direction is detected. Then, the rotation angles θ, φ, and ψ of the imaging device 1 are detected by combining the inclination with respect to the gravity direction detected from the triaxial acceleration sensor 45 and the inclination with respect to the geomagnetic direction detected from the triaxial magnetic sensor 46.
[0028]
Next, the correspondence detection unit 23 detects feature points from the reference image and detects, from the relative image, the position information of the corresponding points for those feature points (step S33). Specifically, a feature quantity extraction block is first created centered on each pixel of the reference image, and the luminance value distribution within the block is detected. The reference image is then divided into regions; within each region, a block whose luminance value distribution differs significantly is detected, and the image position information of the center of the detected block and its luminance value distribution information are stored in the memory. Feature quantity extraction blocks are created in the relative image in the same way, and the luminance value distribution within each block is detected. The stored luminance value distribution information of the reference image is then matched against the luminance value distribution information of the feature quantity extraction blocks of the relative image, and, for each feature point, the image position information of the center of the closest-matching feature quantity extraction block is stored in the memory as the position information of the corresponding point. However, if no feature quantity extraction block is close to the luminance value distribution information of the block of a feature point of the reference image, the stored image position information and luminance value distribution information of that feature point are deleted from the memory.
Next, the translation component calculation unit 24 substitutes the position information of the corresponding points and the posture information into equation (11), solves the simultaneous equations, and calculates the translation component information (step S34). When a large number of corresponding points are detected, there are more simultaneous equations than variables in the translation component information; in that case the solution with the smallest error is calculated by the least squares method and stored in the memory as the translation component information.
Next, the optical axis high-density position detection unit 27 calculates the equation of the straight line of the optical axis from the posture information and the translation component information, and detects a position where the optical axis density is high (step S35). That is, the optical axis direction at the time each image was captured is calculated from the posture information and the translation component information, the world coordinate space is divided into regions, the number of optical axes passing through each region is counted, and the center position of the region through which the most optical axes pass is taken as the subject center position. Next, the image scaling means in the image conversion unit 21 calculates, for each image, the distance from the imaging surface to the subject center position at the time of capture, and calculates the ratio of these distances between images to obtain the image magnification H (step S36). The image scaling means then scales the image information by the magnification obtained from the distance ratio to convert the image (step S37). Finally, the information on the rotation angles θ and φ is added to the converted image information, which is then stored (step S38).
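The magnification used in steps S36 and S37 can be sketched as follows, assuming the distances are plain Euclidean distances from a point on each imaging surface to the subject center and that the magnification of a non-reference image is its distance divided by the reference distance (L′/L). The function names and the rounding of the output size are illustrative choices, not taken from the original.

```python
import math

def scale_factor(ref_plane_point, plane_point, subject_center):
    """Magnification H = L' / L for an image relative to the reference image."""
    L = math.dist(ref_plane_point, subject_center)       # reference image distance
    L_prime = math.dist(plane_point, subject_center)     # distance for the image to be scaled
    return L_prime / L

def scaled_size(size, H):
    """Pixel dimensions (width, height) after scaling by H, rounded to whole pixels."""
    width, height = size
    return round(width * H), round(height * H)
```

An image taken from twice the reference distance shows the subject at half the size, so it is enlarged by H = 2 to match.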
[0029]
Next, a fifth embodiment of the present invention will be described.
FIG. 33 is a block diagram showing the configuration of the image input apparatus according to this embodiment. As shown in the figure, the image input apparatus of this embodiment is composed of an imaging device 1 and an image processing device 2c. The imaging device 1 includes an imaging unit 11, an attitude detection unit 12, and an information transfer unit 13. The image processing device 2c includes an image recording unit 22, a correspondence detection unit 23, a translation component calculation unit 24, an arbitrary viewpoint image generation unit 29, and the like. The imaging device 1 and the image processing device 2c may be configured as one device. The imaging unit 11, the attitude detection unit 12, the information transfer unit 13, the image recording unit 22, the correspondence detection unit 23, and the translation component calculation unit 24 have the same configuration as in the first embodiment.
With this configuration, in the image input apparatus of this embodiment, the arbitrary viewpoint image generation unit 29 generates a ray space image from the captured image group, the posture information, and the translation component information in order to generate an arbitrary viewpoint image. Using the posture information and the translation component information, each pixel value in the photographed image is converted into ray data in three-dimensional space, and the converted ray data is projected onto the ray space f (X, Y, Z, θ, φ). For areas of the ray space where there is no ray data, linear interpolation processing or the like is executed to interpolate and synthesize ray data, thereby generating the ray space image. The method for generating the ray space image is a known technique, described in, for example, Takeyuki Yanagisawa, Ken Naemura, Masahide Kaneko, and Hiroshi Harashima, "Manipulation of a three-dimensional object using light space" (Television Society Journal, vol. 50, no. 9, pp. 1345-1351, 1996); Takeshi Naemura, Masahide Kaneko, and Hiroshi Harashima, "Description of a three-dimensional space based on orthographic expression of ray information" (Television Society Journal, vol. 51, no. 12, pp. 2082-2090, 1997); and Ken Naemura and Hiroshi Harashima, "3D Image Coding Technology (Part 2)" (Image Lab, pp. 57-62, April 1997).
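The conversion of a pixel into ray data (X, Y, Z, θ, φ) can be sketched in Python as follows. This is only an illustration under stated assumptions: a pinhole camera, pixel coordinates measured from the image center, a camera-to-world rotation matrix already built from the posture information, θ taken as the azimuth about the Y axis, and φ as the elevation. The function name `pixel_to_ray` and these angle conventions are hypothetical and are not taken from the cited papers.

```python
import math

def pixel_to_ray(x, y, focal, rotation, translation):
    """Return ray data (X, Y, Z, theta, phi) for one pixel.

    x, y:        pixel coordinates relative to the image center
    focal:       focal length in pixel units
    rotation:    3x3 camera-to-world rotation (nested lists), assumed to
                 be derived from the posture information
    translation: camera position in world coordinates, assumed to be
                 derived from the translation component information
    """
    cam_dir = (x, y, focal)  # viewing direction in the camera frame
    world = [sum(r * c for r, c in zip(row, cam_dir)) for row in rotation]
    norm = math.sqrt(sum(w * w for w in world))
    dx, dy, dz = (w / norm for w in world)
    theta = math.atan2(dx, dz)                   # azimuth (assumed convention)
    phi = math.asin(max(-1.0, min(1.0, dy)))     # elevation (assumed convention)
    X, Y, Z = translation
    return (X, Y, Z, theta, phi)
```

The pixel value itself would be stored at the returned coordinates of the ray space; areas left empty after all pixels are projected are the ones filled by interpolation.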
[0030]
FIG. 34 shows the operation flow of this embodiment. The operation flow will be described below with reference to this figure.
First, in the imaging device 1, image information taken from two or more viewpoints and attitude information of the imaging device 1 are acquired and stored in a memory (step S41). Further, the rotation angles θ, φ, and ψ of the imaging device 1 are calculated from the posture information in the same manner as in the fourth embodiment described above (step S42).
Next, as in the fourth embodiment, the correspondence detection unit 23 detects feature points from the reference image, and detects position information of the corresponding points corresponding to each feature point from the relative image (step S43). Further, in the same manner as in the fourth embodiment, the translation component calculation unit 24 calculates translation component information from the position information and posture information of the corresponding points (step S44).
Next, the arbitrary viewpoint image generation unit 29 generates a ray space image from the posture information and the translation component information (step S45). Specifically, each pixel value in the photographed image is first converted, using the posture information and the translation component information, into ray data in three-dimensional space, and the ray data is projected onto the ray space f (X, Y, Z, θ, φ). For regions of the ray space that have no ray data, ray data is interpolated and synthesized by linear interpolation processing to generate the ray space image.
Thereafter, the generated ray space image is recorded in the image recording unit 22 (step S46).
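The linear interpolation used in step S45 to fill regions without ray data can be sketched on a one-dimensional slice of the ray space. The sketch assumes that missing samples are marked with `None` and that gaps at either end simply copy the nearest known value; the function name `fill_gaps` and both of these choices are illustrative assumptions, and a real ray space would be interpolated in several dimensions.

```python
def fill_gaps(samples):
    """Fill None entries by linear interpolation between known neighbors.

    samples: list of ray values along one axis of the ray space, with
    None where no ray data was projected. Leading and trailing gaps
    copy the nearest known value (an assumption of this sketch).
    """
    known = [i for i, v in enumerate(samples) if v is not None]
    if not known:
        raise ValueError("no ray data to interpolate from")
    out = list(samples)
    for i in range(known[0]):                      # leading gap
        out[i] = samples[known[0]]
    for i in range(known[-1] + 1, len(samples)):   # trailing gap
        out[i] = samples[known[-1]]
    for a, b in zip(known, known[1:]):             # interior gaps
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = (1 - w) * samples[a] + w * samples[b]
    return out
```

For example, a slice with known values at both ends and two missing samples in between is filled with two evenly spaced intermediate values.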
[0031]
Although the embodiments shown in FIG. 1 and the other figures have been described above, the imaging device can also be realized by using a PC (Personal Computer), a portable information terminal device, or the like. In recent years, digital cameras that can realize various functions by reading various processing programs have appeared. In addition, notebook PCs equipped with cameras whose direction can be rotated, as well as PDAs (Personal Digital Assistants) and mobile phones equipped with cameras, have come into widespread use. On such devices, the functions can be realized by executing a processing program that implements the image input method described in the embodiments.
For example, when such a program is executed on a notebook PC equipped with a camera, camera-specific processing such as AF (Auto Focus) is executed inside the camera, while selection of the shooting mode, control of the start and end of shooting, suspension of the focus detection operation, display of the image and the focus detection area, and the like are executed, after the program has been loaded, via an arbitrary user interface of the notebook PC. The following is an example of assignment in the user interface.
Shooting mode selection: Cursor keys
Shooting start / end control: Return key
Pause focus detection: Space key
Display of image and focus detection area: Displayed on LCD (liquid crystal display device of PC)
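The key assignment above could be expressed as a small dispatch table. The following Python sketch is purely illustrative: the class `CaptureSession`, the key names, the mode names, and the action names are hypothetical and do not come from the embodiment, and display on the LCD is omitted.

```python
# Hypothetical mapping from the keys listed above to actions.
KEY_BINDINGS = {
    "cursor": "select_shooting_mode",
    "return": "toggle_shooting",
    "space": "pause_focus_detection",
}

class CaptureSession:
    MODES = ("single", "multi_view")  # hypothetical mode names

    def __init__(self):
        self.mode_index = 0
        self.shooting = False
        self.focus_paused = False

    def select_shooting_mode(self):
        # Cursor keys: step to the next shooting mode.
        self.mode_index = (self.mode_index + 1) % len(self.MODES)

    def toggle_shooting(self):
        # Return key: start shooting if stopped, end it if running.
        self.shooting = not self.shooting

    def pause_focus_detection(self):
        # Space key: suspend the focus detection operation.
        self.focus_paused = True

    def handle_key(self, key):
        """Run the action bound to key; return False for unbound keys."""
        action = KEY_BINDINGS.get(key)
        if action is None:
            return False
        getattr(self, action)()
        return True
```

Keeping the bindings in a table separate from the actions makes it straightforward to reassign keys for a different device, such as a PDA without cursor keys.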
The sixth embodiment, shown in the figure, realizes image input according to the present invention by reading the program from a storage medium on which a processing program implementing the image input method described in each embodiment is written. As shown in FIG. 35A, the CD-ROM 61 on which the program is written is mounted on a notebook PC 62 with a camera, and the program is executed as appropriate. Alternatively, as shown in FIG. 35B, the smart media 63 on which such a program is written can be mounted on a digital still camera 64 capable of reading the program, and the program can be executed as appropriate. The storage medium on which the program is written is not limited to the above examples, and may be another medium such as a CD-RW or a DVD-ROM.
[0032]
[Effect of the invention]
As described above, according to the present invention, the same subject is photographed from at least two viewpoints to acquire a plurality of pieces of image information together with attitude information indicating the attitude of the image sensor when the subject was photographed; the acquired image information is rotationally converted based on the attitude information; and the attitude information is added to the rotationally converted image information, which is then recorded. Therefore, even ordinary users without specialized equipment can easily shoot the same subject from various viewpoints with a monocular imaging device such as a digital camera and store the result as, for example, a QuickTime (registered trademark) VR image group. A high-quality three-dimensional image can thus be generated without being restricted to specific shooting conditions.
[0033]
Further, in the invention of claim 2 (dependent on claim 1) and the invention of claim 6 (dependent on claim 5), when the center position of the photographed subject deviates from the center position of the input image information, the center position of the image information can be matched with the center position of the subject by translating the center position of the image information. There is therefore no need to take the trouble of aligning the center of the image with the center of the subject at the time of shooting.
Further, in the invention of claim 3 (dependent on claim 1) and the invention of claim 7 (dependent on claim 5), the input image information can be scaled. Therefore, even if the size of the subject projected on the image plane changes because the distance to the subject varies between shooting positions, the size of the subject in the image group data can be made the same.
Further, in the invention of claim 4 (dependent on claim 1) and the invention of claim 8 (dependent on claim 5), image information that would be input if the subject were photographed from a viewpoint different from the viewpoint positions at which the subject was actually photographed can be generated. Therefore, the subject can be displayed three-dimensionally even from image group data with a small number of captured images.
In the invention according to claim 9, a program written to execute image input by the image input method according to any one of claims 5 to 8 is executed on an information processing apparatus. Therefore, the effect of the invention according to any one of claims 5 to 8 can be obtained by using the information processing apparatus.
In the invention according to claim 10, the program according to claim 9 can be stored in a removable storage medium. Therefore, by mounting the storage medium on an information processing apparatus, such as a personal computer, that has so far been unable to perform image input according to the invention of any one of claims 5 to 8, the effect of the invention according to any one of claims 5 to 8 can be obtained in that information processing apparatus.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the configuration of an image input apparatus according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the main part of the image input apparatus according to the first embodiment of the present invention.
FIG. 3 is a hardware configuration diagram of the main part of the image input apparatus, showing the first embodiment of the present invention.
FIG. 4 is an explanatory diagram of a main part of the image input device according to the first embodiment of the present invention.
FIG. 5 is another explanatory diagram showing the main part of the image input apparatus according to the first embodiment of the present invention.
FIG. 6 is another explanatory diagram of the main part of the image input apparatus showing the first embodiment of the present invention.
FIG. 7 is another explanatory diagram of the main part of the image input apparatus showing the first embodiment of the present invention.
FIG. 8 is another explanatory diagram of the main part of the image input apparatus showing the first embodiment of the present invention.
FIG. 9 is an explanatory diagram according to the image input apparatus of the first embodiment of the present invention.
FIG. 10 is another explanatory diagram according to the image input apparatus of the first embodiment of the present invention.
FIG. 11 is another explanatory diagram of the main part of the image input apparatus showing the first embodiment of the present invention.
FIG. 12 is another explanatory diagram of the main part of the image input apparatus showing the first embodiment of the present invention.
FIG. 13 is a data configuration diagram of the main part of the image input apparatus, showing the first embodiment of the present invention.
FIG. 14 is an operation flowchart of the image input method according to the first embodiment of the present invention.
FIG. 15 is a block diagram showing the configuration of an image input apparatus according to the second embodiment of the present invention.
FIG. 16 is an explanatory diagram of an image input device according to a second embodiment of the present invention.
FIG. 17 is an explanatory diagram of a main part of an image input device, showing a second embodiment of the present invention.
FIG. 18 is another explanatory diagram of the main part of the image input apparatus showing the second embodiment of the present invention.
FIG. 19 is another explanatory diagram of the main part of the image input device showing the second embodiment of the present invention.
FIG. 20 is an explanatory diagram relating to an image input apparatus according to a second embodiment of the present invention.
FIG. 21 is another explanatory diagram according to the image input apparatus of the second embodiment of the present invention.
FIG. 22 is another explanatory diagram of the main part of the image input device, showing the second embodiment of the present invention.
FIG. 23 is another explanatory diagram of the main part of the image input apparatus showing the second embodiment of the present invention.
FIG. 24 is another explanatory diagram showing the main part of the image input device according to the second embodiment of the present invention.
FIG. 25 is an operation flowchart of the image input method according to the second embodiment of the present invention.
FIG. 26 is a flowchart showing another operation of the image input method according to the second embodiment of the present invention.
FIG. 27 is a block diagram showing the configuration of an image input apparatus according to a third embodiment of the present invention.
FIG. 28 is an explanatory diagram of a main part of an image input device according to a third embodiment of the present invention.
FIG. 29 is another explanatory diagram showing the main part of the image input apparatus according to the third embodiment of the present invention.
FIG. 30 is an operation flowchart of the image input method according to the third embodiment of the present invention.
FIG. 31 is an explanatory diagram relating to an image input apparatus according to a fourth embodiment of the present invention.
FIG. 32 is an operation flowchart of an image input method according to the fourth embodiment of the present invention.
FIG. 33 is a block diagram showing the configuration of an image input apparatus according to a fifth embodiment of the present invention.
FIG. 34 is an operation flowchart of an image input method according to the fifth embodiment of the present invention.
FIG. 35 is a hardware configuration diagram of an image input apparatus according to a sixth embodiment of the present invention.
[Explanation of symbols]
1 Imaging device, 2 Image processing device, 11 Imaging unit, 12 Attitude detection unit, 13 Information transfer unit, 21 Image conversion unit, 22 Image recording unit, 23 Correspondence detection unit, 24 Translation component calculation unit, 25 Three-dimensional position calculation unit, 26 Subject center position calculation unit, 27 Optical axis high-density position detection unit, 29 Arbitrary viewpoint image generation unit

Claims

An image input apparatus that inputs an image using imaging means, comprising: imaging means for capturing the same subject from at least two viewpoints and inputting a plurality of pieces of image information; attitude detection means for detecting attitude information indicating the attitude of the imaging means when it captures the subject; image conversion means for rotationally converting the image information input by the imaging means based on the attitude information; and image recording means for adding the attitude information detected by the attitude detection means to the image information converted by the image conversion means and recording the result.

The image input apparatus according to claim 1, wherein, when the center position of the subject projected on the imaging surface of the imaging means deviates from the center position of the imaging surface, the center position of the captured image information is translated so that the center position of the image information and the center position of the subject coincide.

The image input apparatus according to claim 1, further comprising image scaling means for scaling the plurality of pieces of image information input by the imaging means.

The image input apparatus according to claim 1, further comprising arbitrary viewpoint image generation means for generating image information that would be input if the imaging means captured the subject from a viewpoint different from the viewpoint positions at which the imaging means captured the subject.

An image input method for inputting a photographed image, comprising: acquiring a plurality of pieces of image information obtained by photographing the same subject from at least two viewpoints, together with posture information indicating the posture of the image sensor when the subject was photographed; rotationally converting the acquired image information based on the posture information; and adding the posture information to the rotationally converted image information and recording the result.

The image input method according to claim 5, wherein, when the center position of the photographed subject deviates from the center position of the input image information, the center position of the image information is translated so that the center position of the image information matches the center position of the subject.

The image input method according to claim 5, wherein the input image information is scaled.

The image input method according to claim 5, wherein image information that would be input if the subject were photographed from a viewpoint different from the viewpoint positions at which the subject was photographed is generated.

A program executed on an information processing apparatus, the program being written to execute image input by the image input method according to any one of claims 5 to 8.

A storage medium in which the program according to claim 9 is stored.