JP5503510B2

JP5503510B2 - Posture estimation apparatus and posture estimation program

Info

Publication number: JP5503510B2
Application number: JP2010260468A
Authority: JP
Inventors: 誠喜井上; 周平秦
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2010-11-22
Filing date: 2010-11-22
Publication date: 2014-05-28
Anticipated expiration: 2030-11-22
Also published as: JP2012113438A

Description

The present invention relates to a posture estimation apparatus and a posture estimation program that use image processing to estimate the posture or motion of a target object. The estimate is made from the object as it appears in a captured image, namely a single-viewpoint still image or moving image of the target object.

Various motion capture methods have been proposed that use monocular images (single-viewpoint still images or moving images) captured by a single camera. When the target object is a person, the ability to estimate the person's posture from a single-viewpoint image is useful both for analyzing human motion and for producing character animation with computer graphics (CG).

The following methods, among others, have been proposed for extracting a person region from a captured image and estimating the posture from its shape and texture.
(1) A three-dimensional CG model with a human skeletal structure is prepared. The posture is estimated by matching the captured image against CG images generated by moving the skeleton into various configurations. For example, a person region is extracted from the captured image, and its image features are compared with those of the CG-generated images (see, for example, Non-Patent Document 1).
(2) A person region is extracted from the captured image. The positions of the person's limbs, elbows, and knees are estimated from its shape (silhouette), and the internal skeleton is then inferred (see, for example, Patent Document 1).
(3) A person region is extracted from the captured image, and its shape (silhouette) is compared with the silhouette of a CG-generated image. The comparison is performed by taking the XOR (exclusive OR) of the two images.

Patent Document 1: JP 2004-164480 A

Non-Patent Document 1: "3D human body posture estimation from monocular images based on HOG features", Symposium on Image Recognition and Understanding (MIRU2008), July 2008

However, when image features are used as in method (1), the estimation result is strongly affected by patterns on the clothing the person is wearing. If the clothing differs, the degree of matching between the captured image and the CG image changes, and the posture cannot be estimated accurately.

Method (2) uses silhouettes to reduce the influence of clothing. However, recognizing body parts such as the limbs, elbows, and knees is difficult, and whether a part can be identified correctly depends heavily on the accuracy of the silhouette shape. Errors made when extracting the silhouette, that is, errors at the region extraction stage, therefore make misdetection of body parts likely.

Method (3) is comparatively robust. However, simply matching silhouettes by XOR makes the result sensitive to differences in on-screen position and limb thickness. With a plain silhouette comparison, silhouettes often fail to match because, for example, their orientation has changed slightly, or the legs or arms overlap one another during walking. Moreover, a CG model might be built with, say, a leg of what is considered a standard shape. Even so, the muscularity and thickness of the person in the captured image vary between individuals, so a limb of the same shape but a different thickness does not yield the desired matching result. In short, the prior art cannot yet accurately reproduce the characteristics of a wide variety of motions.
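The weakness of method (3) can be made concrete with a small NumPy sketch. It is an illustration and is not taken from the patent. Two binary silhouettes are scored by the number of pixels that differ under XOR. The rectangular "limb" helper `bar` and all sizes are hypothetical. The point is only that a 2-pixel shift, or a 2-pixel change in width, already produces a large mismatch even though the shape is unchanged.

```python
import numpy as np

def xor_mismatch(a: np.ndarray, b: np.ndarray) -> int:
    """Number of pixels set in exactly one of two binary silhouettes (0 = identical)."""
    return int(np.logical_xor(a.astype(bool), b.astype(bool)).sum())

def bar(top: int, left: int, height: int, width: int, shape=(20, 20)) -> np.ndarray:
    """Hypothetical helper: a rectangular 'limb' silhouette on an empty canvas."""
    img = np.zeros(shape, dtype=bool)
    img[top:top + height, left:left + width] = True
    return img

model = bar(2, 8, 16, 3)     # CG limb, 3 px wide
shifted = bar(2, 10, 16, 3)  # same shape, moved 2 px to the right
thicker = bar(2, 8, 16, 5)   # same position, 5 px wide (a thicker real leg)

print(xor_mismatch(model, model))    # 0
print(xor_mismatch(model, shifted))  # 64
print(xor_mismatch(model, thicker))  # 32
```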

The present invention was made in view of these problems. Its object is to provide a posture estimation apparatus and a posture estimation program that improve the accuracy of matching against the corresponding CG image when the posture or motion of a target object is estimated from a captured image of that object.

To solve the above problem, the posture estimation apparatus according to claim 1 uses image processing to estimate parameters characterizing the posture or motion of a target object. The estimate is made from the object as it appears in a captured image, namely a single-viewpoint still image or moving image of the target object. The apparatus comprises an image input means, a specific region extraction means, a thinning means, a dilation means, a distance transform means, a gradient feature extraction means, and a matching means.

With this configuration, the image input means of the posture estimation apparatus inputs the captured image. It also inputs a CG image generated by synthetically rendering the object in the captured image. The rendering is based on a CG character model, which models that object as an articulated body for computer graphics (CG), and on the joint angle parameters used with that model. If the target object is, for example, a person, the CG character model includes a human body model. The specific region extraction means then extracts, from the input captured image, a silhouette in which a specific region of the object has been binarized. It likewise extracts a binarized silhouette of the specific region of the object from the input CG image. The thinning means applies thinning to each extracted silhouette, and the dilation means applies dilation to each thinned silhouette. The distance transform means then applies a distance transform to each dilated silhouette to generate a grayscale image.

Thinning, dilation, and the distance transform can all be implemented with functions provided in the libraries of common image processing software.
Thinning converts the silhouette in a binary image into a line image one pixel wide, and dilation widens that thin line to a uniform thickness. Consequently, dilating a silhouette that was extracted from the captured image and then thinned does not restore the original silhouette. Instead, it produces a silhouette whose thin line has been widened to a uniform thickness. This removes the influence of the object's thickness in the image. For example, when the silhouette of a person's leg is extracted from the captured image, it can be matched with high accuracy against the leg silhouette of a CG model created in advance with a uniform width, regardless of individual differences in the leg.
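The thinning-then-dilation step can be sketched in NumPy alone. This sketch assumes the classic Zhang-Suen thinning algorithm and a square structuring element of hypothetical radius `radius`. The patent itself only says that library functions are used. For example, scikit-image provides `skimage.morphology.skeletonize` and SciPy provides `scipy.ndimage.binary_dilation`. The names `zhang_suen_thinning`, `dilate`, and `normalize_width` are illustrative.

```python
import numpy as np

def zhang_suen_thinning(mask: np.ndarray) -> np.ndarray:
    """Thin a binary silhouette to 1-pixel-wide lines (Zhang-Suen thinning)."""
    img = np.pad(mask.astype(np.uint8), 1)  # zero border: every pixel has 8 neighbours
    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # two sub-iterations per pass
            to_delete = []
            for r, c in zip(*np.nonzero(img)):
                # Neighbours P2..P9, clockwise starting from north.
                p = [int(img[r - 1, c]), int(img[r - 1, c + 1]), int(img[r, c + 1]),
                     int(img[r + 1, c + 1]), int(img[r + 1, c]), int(img[r + 1, c - 1]),
                     int(img[r, c - 1]), int(img[r - 1, c - 1])]
                b = sum(p)  # number of foreground neighbours
                a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))  # 0->1 transitions
                if not (2 <= b <= 6 and a == 1):
                    continue
                n, e, s, w = p[0], p[2], p[4], p[6]
                if step == 0 and n * e * s == 0 and e * s * w == 0:
                    to_delete.append((r, c))
                elif step == 1 and n * e * w == 0 and n * s * w == 0:
                    to_delete.append((r, c))
            for r, c in to_delete:  # delete simultaneously after the scan
                img[r, c] = 0
            changed = changed or bool(to_delete)
    return img[1:-1, 1:-1].astype(bool)

def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask.astype(bool), radius)
    out = np.zeros((h, w), dtype=bool)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            out |= padded[radius + dr: radius + dr + h, radius + dc: radius + dc + w]
    return out

def normalize_width(silhouette: np.ndarray, radius: int = 2) -> np.ndarray:
    """Thin, then re-widen to a uniform 2*radius+1 px, whatever the original thickness."""
    return dilate(zhang_suen_thinning(silhouette), radius)

thin_leg = np.zeros((40, 30), dtype=bool)
thin_leg[5:35, 12:15] = True    # 3 px wide
thick_leg = np.zeros((40, 30), dtype=bool)
thick_leg[5:35, 10:17] = True   # 7 px wide
a, b = normalize_width(thin_leg), normalize_width(thick_leg)
print(a[20].sum(), b[20].sum())  # 5 5
```

Away from the ends of the bars, both legs become the same 5-pixel-wide band. This equal width is the property that the thinning and dilation steps are meant to provide.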

The distance transform maps each pixel of a binary image, whose values are 0 and 1, to the shortest distance from that pixel to a pixel with value 0. Each pixel inside the silhouette therefore receives the shortest of its distances to the pixels on the silhouette's contour. The result is a grayscale image whose values depend on the original silhouette shape. Pixel values are small near the contour and increase toward the interior, so the edges of the shape appear shaved away. Shading a silhouette in this way makes a brightness gradient appear that reflects the silhouette's directionality. This solves a problem of the prior art. There, a simple comparison of silhouettes obtained by region extraction fails when the silhouette's orientation has changed slightly or when parts overlap. With the shaded image, direction can be recovered even from a silhouette, and the desired matching result can be obtained.
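The distance transform defined above can be written out directly. The sketch below is a brute-force version for small images. It assumes the Euclidean metric, which the patent does not specify. In practice, `scipy.ndimage.distance_transform_edt` computes the same Euclidean distance to the nearest zero-valued pixel much more efficiently.

```python
import numpy as np

def distance_transform(mask: np.ndarray) -> np.ndarray:
    """For each pixel with value 1, the Euclidean distance to the nearest 0 pixel.

    Brute force, O(foreground x background): fine for illustration only.
    Pixels with value 0 keep the value 0.
    """
    mask = mask.astype(bool)
    background = np.argwhere(~mask)  # coordinates of all 0-valued pixels
    if background.size == 0:
        raise ValueError("image has no 0-valued pixel")
    out = np.zeros(mask.shape, dtype=float)
    for r, c in np.argwhere(mask):
        out[r, c] = np.sqrt(((background - (r, c)) ** 2).sum(axis=1)).min()
    return out

leg = np.zeros((11, 9), dtype=bool)
leg[1:10, 2:7] = True        # a 5 px wide bar
dt = distance_transform(leg)
print(dt[5, 2:7])            # [1. 2. 3. 2. 1.]
```

Across the bar the values rise toward the centre line and fall toward the contour. This ridge is the brightness gradient that the subsequent HOG step picks up.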

The gradient feature extraction means then calculates a HOG (Histogram of Oriented Gradients) as the feature of each grayscale image. HOG is a feature obtained by taking, for each pixel of interest in an image, the brightness differences between horizontally and vertically adjacent pixels as a luminance gradient. The matching means then compares the HOG calculated from the silhouette of the object in the captured image with the HOG calculated from the silhouette of the object in the CG image. From this comparison it estimates the joint angle parameters of the object in the captured image. The smaller the HOG difference, the more similar the CG image is to the captured image. The apparatus can therefore take the joint angle parameters used to generate the most similar CG image as the posture estimation result.
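A compact HOG sketch follows. The text above states that adjacent-pixel differences serve as the luminance gradient, and the figures show cells, gradient histograms, and a moving block. Every other parameter here is an assumption based on common HOG defaults rather than the patent's exact settings. Those assumptions are:
- central differences for the gradient;
- 9 unsigned orientation bins over 0 to 180 degrees;
- square cells of `cell` pixels;
- 2x2-cell blocks with L2 normalization, moved one cell at a time;
- Euclidean distance between feature vectors as the "HOG difference".

```python
import numpy as np

def hog(image: np.ndarray, cell: int = 4, bins: int = 9, block: int = 2,
        eps: float = 1e-6) -> np.ndarray:
    """Minimal HOG: neighbour-difference gradients -> per-cell orientation
    histograms -> L2-normalised blocks of block x block cells, slid one cell at a time."""
    img = image.astype(float)
    fx = np.zeros_like(img)
    fy = np.zeros_like(img)
    fx[:, 1:-1] = img[:, 2:] - img[:, :-2]  # horizontal neighbour difference
    fy[1:-1, :] = img[2:, :] - img[:-2, :]  # vertical neighbour difference
    magnitude = np.hypot(fx, fy)
    angle = np.rad2deg(np.arctan2(fy, fx)) % 180.0  # unsigned orientation
    bin_idx = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)

    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            # Magnitude-weighted vote of every pixel in the cell into its orientation bin.
            hist[cy, cx] = np.bincount(bin_idx[rows, cols].ravel(),
                                       weights=magnitude[rows, cols].ravel(),
                                       minlength=bins)

    blocks = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(blocks)

def hog_difference(f1: np.ndarray, f2: np.ndarray) -> float:
    """Euclidean distance between HOG vectors: smaller means more similar."""
    return float(np.linalg.norm(f1 - f2))

vertical = np.zeros((16, 16))
vertical[:, 6:10] = 1.0
horizontal = vertical.T.copy()
f_v, f_h = hog(vertical), hog(horizontal)
print(f_v.shape)                            # (324,)
print(hog_difference(f_v, hog(vertical)))   # 0.0
print(hog_difference(f_v, f_h) > 0)         # True
```

In the patent's pipeline, the input to `hog` would be the distance-transformed silhouette rather than a raw image. That input is what keeps clothing texture out of the feature.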

Suppose silhouettes or grayscale images were compared directly, without computing HOG. Matching would then fail whenever the object's position differs, even if the objects in the captured image and the CG image have the same shape. To address this, the matching means compares HOG features computed from the image silhouettes. The features are derived from the brightness gradients of the object. Matching can therefore be performed accurately, without being affected by on-screen position, even when the object in the grayscale image obtained from the captured image's silhouette is offset from the object in the grayscale image obtained from the CG image's silhouette.
Conversely, if HOG is applied without extracting silhouettes, as in Non-Patent Document 1, matching fails when the target is a person and the clothing differs. To address this, the apparatus first extracts the image silhouette, generates a grayscale image from it, and then computes HOG from that grayscale image. The captured image and the CG image can therefore be matched with high accuracy. The matching is unaffected by clothing patterns and the like, while retaining the robustness of silhouette matching.

The posture estimation apparatus according to claim 2 is preferably the posture estimation apparatus according to claim 1 with two further components: a model sequence storage means and a CG image generation means. These generate the CG image input to the image input means.

With this configuration, the model sequence storage means stores joint angle parameter values, created in advance for each frame, as a model sequence. The sequence models the target object performing a series of predetermined motions. If the target is, for example, a person, the model sequence includes models corresponding to individual motions such as walking and running. The image input means receives the captured image frame by frame, and each such frame is a captured frame image. For each captured frame image, the CG image generation means reads the corresponding joint angle parameter values from the model sequence storage means. From those values and the CG character model, it generates a CG frame image, that is, a per-frame CG image. The specific region extraction means extracts a binarized silhouette of the specific region of the object from the captured frame image, and another from the CG frame image. The thinning means, the dilation means, the distance transform means, and the gradient feature extraction means then process the captured frame images and the CG frame images frame by frame.
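One way to picture the model sequence storage means is a list indexed by frame number, with each entry holding that frame's joint angle values. This is a hypothetical in-memory sketch. The `ModelSequence` class, the joint names, the use of degrees, and the `render` callable are all assumptions. The `render` callable stands in for the CG image generation means and the CG character model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Tuple

@dataclass
class ModelSequence:
    """Hypothetical model sequence store: frame number -> joint angle values (degrees)."""
    motion: str  # e.g. "walk" or "run"
    frames: List[Dict[str, float]] = field(default_factory=list)

    def angles(self, frame_no: int) -> Dict[str, float]:
        # Return a copy so that later parameter changes never overwrite the stored model.
        return dict(self.frames[frame_no])

def cg_frames_for_capture(
    seq: ModelSequence,
    n_captured: int,
    render: Callable[[Dict[str, float]], Any],
) -> Iterator[Tuple[int, Dict[str, float], Any]]:
    """For captured frame t, read model frame t and render the matching CG frame image."""
    for t in range(min(n_captured, len(seq.frames))):
        angles = seq.angles(t)
        yield t, angles, render(angles)

walk = ModelSequence("walk", [{"hip": 10.0, "knee": 5.0}, {"hip": 20.0, "knee": 15.0}])
# Toy renderer: a real one would draw the CG character model with these joint angles.
for t, angles, cg in cg_frames_for_capture(walk, n_captured=2, render=lambda a: sum(a.values())):
    print(t, angles, cg)
# 0 {'hip': 10.0, 'knee': 5.0} 15.0
# 1 {'hip': 20.0, 'knee': 15.0} 35.0
```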

The posture estimation apparatus according to claim 3 is preferably the posture estimation apparatus according to claim 2 further comprising a parameter changing means. In this apparatus, the matching means comprises a difference calculation means and a spatial feature determination means.

With this configuration, the parameter changing means takes the joint angle parameter values read out for a captured frame image from the model sequence storage means. It changes those values within a predetermined range. The CG image generation means generates the CG frame image from the CG character model and either the joint angle parameter values read out for the frame or the changed values.

Within the matching means, the difference calculation means computes difference data between two kinds of HOG. One is the HOG based on the silhouette of the captured frame image. The other is each HOG based on the silhouette of a CG image generated for that captured frame image, using either the values read out from the model sequence storage means or the values changed by the parameter changing means. The spatial feature determination means then holds the frame number of the model sequence fixed. From the HOG difference data calculated for the captured frame image, it determines the joint angle parameter values at which the difference data is smallest. It outputs the frame number and those joint angle parameter values as the estimation result.
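With the frame number fixed, the claim 3 search varies the joint angles within a predetermined range and keeps the minimum HOG difference. It can be sketched as an exhaustive search over angle offsets. In this sketch, `render_feature` is a hypothetical callable that stands for the whole chain of CG rendering, silhouette extraction, thinning, dilation, distance transform, and HOG. The offset grid as the "predetermined range" and the Euclidean HOG difference are also assumptions.

```python
import itertools
from typing import Callable, Dict, Sequence, Tuple

import numpy as np

def spatial_search(
    target_hog: np.ndarray,
    base_angles: Dict[str, float],
    joints: Sequence[str],
    offsets: Sequence[float],
    render_feature: Callable[[Dict[str, float]], np.ndarray],
) -> Tuple[Dict[str, float], float]:
    """Frame number fixed: try base_angles[j] + offset for every offset combination."""
    best_angles = dict(base_angles)
    best_diff = float(np.linalg.norm(render_feature(best_angles) - target_hog))
    for combo in itertools.product(offsets, repeat=len(joints)):
        candidate = dict(base_angles)
        for joint, delta in zip(joints, combo):
            candidate[joint] = base_angles[joint] + delta
        diff = float(np.linalg.norm(render_feature(candidate) - target_hog))
        if diff < best_diff:  # keep the smallest HOG difference seen so far
            best_angles, best_diff = candidate, diff
    return best_angles, best_diff

# Toy stand-in for rendering + HOG: the "feature" is just the angle vector itself.
toy = lambda a: np.array([a["hip"], a["knee"]])
angles, diff = spatial_search(np.array([12.0, 35.0]), {"hip": 10.0, "knee": 30.0},
                              joints=["hip", "knee"], offsets=[-4, -2, 0, 2, 4],
                              render_feature=toy)
print(angles, diff)  # {'hip': 12.0, 'knee': 34.0} 1.0
```

The exhaustive grid grows as `len(offsets) ** len(joints)`. This is tolerable only because the range is small and centred on a model frame that is already close.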

The posture estimation apparatus according to claim 4 is preferably the posture estimation apparatus according to claim 2 in which the matching means comprises a difference calculation means and a temporal feature extraction means.

With this configuration, the difference calculation means of the matching means computes difference data between two kinds of HOG. One is the HOG based on the silhouette of the captured frame image. The other is each HOG based on the silhouette of a CG image generated for that captured frame image, using joint angle parameter values read out from the model sequence storage means. The temporal feature extraction means then varies the frame number of the model sequence for the captured frame image. For each frame number, a CG image is generated from the joint angle parameter values read out from the model sequence storage means. From the HOG difference data for these CG images, the temporal feature extraction means extracts the model sequence frame number at which the difference data is smallest. It outputs that frame number and the corresponding joint angle parameter values as the estimation result.
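The claim 4 temporal search reduces to an argmin over model frames. In this sketch, `render_feature` is a hypothetical stand-in for CG rendering followed by silhouette processing and HOG extraction. The Euclidean HOG difference is an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

import numpy as np

def temporal_search(
    target_hog: np.ndarray,
    sequence: Sequence[Dict[str, float]],
    render_feature: Callable[[Dict[str, float]], np.ndarray],
) -> Tuple[int, Dict[str, float], List[float]]:
    """Vary the model frame number; return the frame whose CG image's HOG is closest."""
    diffs = [float(np.linalg.norm(render_feature(a) - target_hog)) for a in sequence]
    best = int(np.argmin(diffs))
    return best, dict(sequence[best]), diffs

knee_cycle = [{"knee": float(k)} for k in range(0, 50, 10)]  # model frames 0..4
toy = lambda a: np.array([a["knee"]])                        # toy "HOG"
frame, angles, diffs = temporal_search(np.array([22.0]), knee_cycle, toy)
print(frame, angles)  # 2 {'knee': 20.0}
```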

The posture estimation apparatus according to claim 5 is preferably the posture estimation apparatus according to claim 2 further comprising a parameter changing means. In this apparatus, the matching means comprises a difference calculation means, a temporal feature extraction means, and a spatial feature determination means.

With this configuration, the parameter changing means takes the joint angle parameter values read out for a captured frame image from the model sequence storage means. It changes those values within a predetermined range. The CG image generation means generates the CG frame image from the CG character model and either the joint angle parameter values read out for the frame or the changed values. The difference calculation means of the matching means computes difference data between two kinds of HOG. One is the HOG based on the silhouette of the captured frame image. The other is each HOG based on the silhouette of a CG image generated for that captured frame image, using either the values read out from the model sequence storage means or the values changed by the parameter changing means.
In the first stage, the temporal feature extraction means of the matching means varies the frame number of the model sequence for the captured frame image. For each frame number, a CG image is generated from the joint angle parameter values read out from the model sequence storage means. From the HOG difference data for these CG images, the temporal feature extraction means extracts the model sequence frame number at which the difference data is smallest. This aligns the timing of each frame of the pre-created model sequence with each frame of the captured images. In the second stage, the spatial feature determination means fixes the frame number to the extracted value, and the parameter changing means varies the joint angle parameter values. From the HOG difference data calculated for the captured frame image, the spatial feature determination means identifies the joint angle parameter values at which the difference data is smallest. It then outputs the frame number and those joint angle parameter values as the estimation result.
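Claim 5 chains the two searches: first find the best-timed model frame, then fine-tune its joint angles. The following self-contained sketch makes several assumptions. `render_feature` is a hypothetical callable for rendering plus HOG extraction. The HOG difference is Euclidean. The "predetermined range" is a grid of angle offsets applied to the listed joints.

```python
import itertools
from typing import Callable, Dict, Sequence, Tuple

import numpy as np

Feature = Callable[[Dict[str, float]], np.ndarray]

def hog_diff(render_feature: Feature, angles: Dict[str, float], target: np.ndarray) -> float:
    return float(np.linalg.norm(render_feature(angles) - target))

def two_stage_estimate(
    target: np.ndarray,
    sequence: Sequence[Dict[str, float]],
    joints: Sequence[str],
    offsets: Sequence[float],
    render_feature: Feature,
) -> Tuple[int, Dict[str, float], float]:
    # Stage 1 (temporal): choose the model frame whose stored angles match best.
    frame_diffs = [hog_diff(render_feature, a, target) for a in sequence]
    frame_no = int(np.argmin(frame_diffs))
    base = sequence[frame_no]
    # Stage 2 (spatial): keep that frame number, perturb the selected joints.
    best, best_diff = dict(base), frame_diffs[frame_no]
    for combo in itertools.product(offsets, repeat=len(joints)):
        candidate = dict(base)
        for joint, delta in zip(joints, combo):
            candidate[joint] = base[joint] + delta
        diff = hog_diff(render_feature, candidate, target)
        if diff < best_diff:
            best, best_diff = candidate, diff
    return frame_no, best, best_diff

cycle = [{"knee": float(k), "hip": 0.0} for k in range(0, 50, 10)]
toy = lambda a: np.array([a["knee"], a["hip"]])  # toy "HOG"
frame_no, angles, diff = two_stage_estimate(np.array([23.0, 2.0]), cycle,
                                            ["knee", "hip"], range(-3, 4), toy)
print(frame_no, angles, diff)  # 2 {'knee': 23.0, 'hip': 2.0} 0.0
```

Splitting the search this way keeps it cheap. Stage 1 costs one evaluation per model frame, and stage 2 explores only a small neighbourhood of a single frame.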

The posture estimation program according to claim 6 causes a computer to function as an image input means, a specific region extraction means, a thinning means, a dilation means, a distance transform means, a gradient feature extraction means, and a matching means. Its purpose is to estimate, by image processing, parameters characterizing the posture or motion of a target object. The estimate is made from the object as it appears in a captured image, namely a single-viewpoint still image or moving image of the target object.

With this configuration, the posture estimation program uses the image input means to input the captured image. It also inputs a CG image generated by synthetically rendering the object in the captured image. The rendering is based on a CG character model, which models that object as an articulated body for computer graphics (CG), and on the joint angle parameters used with that model. The specific region extraction means extracts a binarized silhouette of a specific region of the object from the input captured image, and likewise from the input CG image. The thinning means applies thinning to each extracted silhouette, and the dilation means applies dilation to each thinned silhouette. The distance transform means applies a distance transform to each dilated silhouette to generate a grayscale image. The gradient feature extraction means then calculates HOG as the feature of each grayscale image.
The matching means then compares the HOG calculated from the silhouette of the object in the captured image with the HOG calculated from the silhouette of the object in the CG image. From this comparison it estimates the joint angle parameters of the object in the captured image.

The present invention provides the following advantageous effects.
According to the invention of claim 1, the posture estimation apparatus is unaffected by clothing patterns and the like on the object in the captured image, and it retains the robustness of silhouette matching. At the same time, the distance transform and the HOG features allow accurate matching that is unaffected by on-screen position. In addition, thinning and dilation give the object in the captured image a fixed thickness, the same as that of the object in the CG image. Matching is therefore also accurate regardless of thickness.
According to the invention of claim 6, the posture estimation program achieves the same effects as the posture estimation apparatus of claim 1.

According to the invention of claim 2, the posture estimation apparatus stores a model sequence corresponding to a series of predetermined motions. Posture estimation can therefore be performed quickly, by matching against CG images whose postures approximate that of the object in the captured image.

According to the invention of claim 3, the posture estimation apparatus can change the joint angle parameter values of the pre-created model sequence. The CG frame image can therefore be fine-tuned to fit the captured frame image.

According to the invention of claim 4, the posture estimation apparatus can align the timing of each frame of the pre-created model sequence with each frame of the captured images. Thus, for example, when the CG character's motion is played in slow motion or at high speed, it can still be given natural, live-action-like movement.

According to the invention of claim 5, the posture estimation apparatus first aligns the motion timing of a CG frame image with the captured frame image. It can then fine-tune that CG frame image to fit the captured frame image. The captured moving image can thus be matched against the CG moving image with high temporal and spatial accuracy. Motion data with high temporal and spatial precision can therefore be obtained robustly from the captured moving image.

FIG. 1 is a block diagram showing the configuration of a posture estimation apparatus according to a first embodiment of the present invention.
FIG. 2 illustrates the image processing of the posture estimation apparatus shown in FIG. 1: (a) a captured image; (b) the person region extracted from the captured image; (c) the lower-body region extracted from the person region; (d) the lower-body image after thinning; (e) the image after dilation; (f) the image after the distance transform; (g) the CG image generated to correspond to the captured image; and (h) the distance-transformed image generated from the CG image by the same processing as the captured image.
FIG. 3 is a flowchart showing the operation of the posture estimation apparatus shown in FIG. 1.
FIG. 4 is a flowchart outlining the HOG calculation process shown in FIG. 3.
FIG. 5 illustrates S21 shown in FIG. 4 and shows the original image.
FIG. 6 illustrates S22 shown in FIG. 4: (a) shows the cell regions obtained from FIG. 5, and (b) shows the gradient histograms obtained from (a).
FIG. 7 illustrates S23 shown in FIG. 4 and shows how the block is moved.
FIG. 8 is a block diagram showing the configuration of a posture estimation apparatus according to a second embodiment of the present invention.
FIG. 9 illustrates the model sequence storage means shown in FIG. 8.
FIG. 10 illustrates the temporal feature extraction means shown in FIG. 8.

Hereinafter, modes for carrying out the posture estimation apparatus according to the present invention (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings.

(First Embodiment)
The posture estimation apparatus 1 shown in FIG. 1 estimates, by image processing, parameters that characterize the posture or motion of a target object from the object appearing in a captured image, that is, a single-viewpoint still image or moving image of the target object.
In the following description, the target object is a person. A moving image of a person performing a predetermined action such as "walking" or "kicking", filmed with a single camera, is input to the posture estimation apparatus 1, which estimates the motion of the person appearing as an object in each captured frame image (the captured image of each frame). The temporal sampling rate of the frames is not particularly limited: for example, a non-interlaced format (e.g., 29.97 fps (frames per second)) or an interlaced format in which each frame is read out as two fields (e.g., 59.94 fps) may be used.

As shown in FIG. 1, the posture estimation apparatus 1 includes CG generation means 2, frame data processing means 3, collation means 4, and image input means 5.
The image input means 5 receives a captured image, together with a CG image generated by pseudo-rendering the object in the captured image. Both images are processed by the frame data processing means 3. The frame number of the captured image input to the image input means 5 (the captured image frame number) is passed to the CG generation means 2, where the CG image generation means 24 uses it as information for generating a CG image that matches the captured image. The image input means 5 may supply the frame data processing means 3 with an image acquired externally, via a storage medium or online, or with an image read from a storage device inside the posture estimation apparatus 1 where it was stored in advance.

The CG generation means 2 generates a CG image depicting the object in the captured image input to the image input means 5. It includes storage means that stores a CG character model 21, parameter changing means 23, CG image generation means 24, and model sequence storage means 25 that stores CG data 22. The storage means for the CG character model 21 may be separate from the storage means for the CG data 22, or the model sequence storage means 25 may be shared by both.

The CG character model 21 models the target object as an articulated object for computer graphics (CG). Because the target object in this embodiment is a person, the CG character model 21 includes a human body structure model whose parameters are the joint angles of the human body, as well as CG parts of the human body created in advance.
The human body structure model is not particularly limited; joints and the like may be defined as appropriate for the motion to be estimated and the required accuracy. For example, when estimating motions such as "walking" or "kicking", finger joints can be ignored and the joints divided into shoulder, elbow, hip, knee, ankle joints, and so on, using a model in which each joint can bend within a predetermined angular range with one to three rotational degrees of freedom depending on the body part. To estimate a motion such as "walking", for example, 24-dimensional joint angle parameters can be used, as described in Non-Patent Document 1.

The CG data 22 are the joint angle parameters of the object in a CG image that is generated in a pseudo manner for collation with the object in the captured image; they are used to render the CG image from the CG character model 21. In FIG. 1, the CG data 22 represent one set of joint angle parameters for generating the single CG image that corresponds to one captured image input to the frame data processing means 3. When, for example, 24-dimensional joint angle parameters are adopted in the human body structure model, one set consists of 24 angle values, each identifying a particular joint and a rotation axis of that joint.

The parameter changing means 23 changes, within a predetermined range, the values of the CG data (joint angle parameters) 22 read frame by frame from the model sequence storage means 25 for the captured frame image. That is, it finely adjusts the values of one set of joint angle parameters stored in advance in the model sequence storage means 25 for one posture of the CG image. Considering a single joint in a person's motion, changing its angle by more than about ±45° would be a relatively large adjustment, so "fine adjustment" here means changing the angle within ±45°, preferably within ±30°. The parameter changing means 23 adjusts a joint angle of the CG data (joint angle parameters) 22 in small steps, for example 1° at a time. After each change, the CG image generation means 24 creates a CG frame image from the changed joint angle values and the CG character model 21; after the pre-collation image processing and the collation, the parameter changing means 23 adjusts the joint angle again, and this process repeats.
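The fine adjustment described above amounts to enumerating candidate parameter sets around a stored posture. The sketch below assumes that one set of joint angles is held in a dictionary keyed by hypothetical joint names such as `"right_knee"`, and that only one joint is perturbed at a time; the ±30° range and 1° step are the example values given in the text, not fixed requirements.

```python
from typing import Dict, Iterator

JointAngles = Dict[str, float]  # hypothetical: joint name -> angle in degrees


def fine_adjustments(base: JointAngles, joint: str,
                     max_delta: float = 30.0,
                     step: float = 1.0) -> Iterator[JointAngles]:
    """Yield copies of `base` with `joint` shifted from -max_delta to +max_delta in `step` increments."""
    n_steps = int(round(max_delta / step))
    for k in range(-n_steps, n_steps + 1):
        candidate = dict(base)                     # leave the stored model-sequence entry untouched
        candidate[joint] = base[joint] + k * step  # one finely adjusted joint angle
        yield candidate


if __name__ == "__main__":
    stored = {"right_hip": 20.0, "right_knee": 10.0}
    candidates = list(fine_adjustments(stored, "right_knee"))
    print(len(candidates), candidates[0]["right_knee"], candidates[-1]["right_knee"])
```

With these defaults each joint yields 61 candidates; perturbing several joints jointly multiplies that count, which is one practical reason to keep the adjustment range narrow.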

The CG image generation means 24 generates a CG frame image, i.e., the CG image for one frame, from the CG character model 21 and either the CG data (joint angle parameters) 22 read frame by frame from the model sequence storage means 25 for the captured frame image (the captured image input frame by frame to the image input means 5) or the joint angle parameter values changed by the parameter changing means 23. The CG image generation means 24 receives the captured image frame number that identifies the captured image input to the image input means 5 and uses it to generate a CG frame image matching the captured frame image. It generates virtual three-dimensional space data from the CG data, renders the CG object and an alpha plane according to the input joint angles, and outputs the rendered CG object together with the alpha plane to the image input means 5. The alpha plane is an image that distinguishes the object (subject) region of the CG frame image from the remaining regions.

The model sequence storage means 25 stores, as a model sequence, the values of the CG data (joint angle parameters) 22 created in advance for each frame as a model of the target object performing a series of predetermined motions; it consists of, for example, an ordinary hard disk or memory. Specifically, it stores one model per basic motion: for example, data linking each frame number to one set of joint angle parameters for a person "walking" (a model sequence), and likewise a model sequence for a person "kicking".

The rendered CG frame images themselves may instead be stored in the model sequence storage means 25. In this embodiment, the posture estimation apparatus 1 includes the CG image generation means 24 to generate the CG images input to the specific region extraction means 31 of the frame data processing means 3; if rendered CG frame images are stored in the posture estimation apparatus 1 in advance, the CG image generation means 24 is not essential.

The frame data processing means 3 processes the captured image and the CG image frame by frame, and includes specific region extraction means 31, thinning means 32, expansion processing means 33, distance transform means 34, and gradient feature extraction means 35. In the block diagram of FIG. 1, for convenience of explanation, the means 31 to 35 that process the captured image carry the suffix "a" and those that process the CG image carry the suffix "b"; in practice, a single instance of each means suffices.

As preprocessing for collating the captured image with the CG image, the specific region extraction means 31 extracts a binarized silhouette of a specific region of the object (person) from the input captured image, and likewise from the input CG image. When the target object is a person, as in this embodiment, the specific region of the object in the image may be part of the body or the whole body. To specify the lower-body region as part of the person region, a threshold range of image positions may be defined in advance, for example "the lower half of the input captured image". The image can be binarized and the silhouette extracted with known techniques, such as predefining the object's position in the image, its size, or a luminance threshold. A concrete example of the image processing is illustrated in the description of the operation below.
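As a minimal sketch of this preprocessing, the function below combines two of the known techniques mentioned: a global luminance threshold for binarization and a fixed position range ("the lower half of the image") for the specific region. It assumes, hypothetically, that the person is darker than the background; the function and parameter names are illustrative only.

```python
import numpy as np


def extract_specific_region(gray: np.ndarray, threshold: float,
                            subject_is_dark: bool = True) -> np.ndarray:
    """Binarize a grayscale frame and keep only its lower half.

    Returns a uint8 mask in which 1 marks object (person) pixels and 0 the background.
    """
    if subject_is_dark:
        mask = gray < threshold   # assumed: dark subject on a bright background
    else:
        mask = gray > threshold   # bright subject on a dark background
    mask = mask.astype(np.uint8)
    height = mask.shape[0]
    mask[: height // 2, :] = 0    # position threshold: keep "the lower half of the image"
    return mask
```

A real system would typically use background subtraction or chroma keying instead of a single global threshold; the position-based cropping step is the part taken directly from the text.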

The thinning means 32 thins each silhouette extracted by the specific region extraction means 31.
The expansion processing means 33 applies expansion (dilation) processing to each silhouette thinned by the thinning means 32.
The distance transform means 34 generates a grayscale image by applying a distance transform to each silhouette expanded by the expansion processing means 33.

The thinning, expansion, and distance transform are preprocessing steps for collating the captured image with the CG image, and can be implemented with library functions available in common image processing software.
Thinning converts the silhouette in a binary image into a line image one pixel wide.
Expansion widens the thin lines to a uniform thickness.
The distance transform assigns to each pixel of a binary image with values 0 and 1 the shortest distance from that pixel to a pixel whose value is 0.
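The three preprocessing operations can be sketched in plain NumPy as below. This is a minimal illustration, not the apparatus's implementation: it assumes Zhang-Suen thinning, a square structuring element for expansion, and a brute-force Euclidean distance transform. In practice one would use library routines such as `skimage.morphology.skeletonize`, `scipy.ndimage.binary_dilation`, and `scipy.ndimage.distance_transform_edt`.

```python
import numpy as np


def thin(binary: np.ndarray) -> np.ndarray:
    """Zhang-Suen thinning of a 0/1 image (the foreground must not touch the border)."""
    img = (binary > 0).astype(np.uint8)
    changed = True
    while changed:
        changed = False
        for sub_iteration in (0, 1):
            to_clear = []
            for r in range(1, img.shape[0] - 1):
                for c in range(1, img.shape[1] - 1):
                    if img[r, c] == 0:
                        continue
                    # Neighbours P2..P9, clockwise starting from north.
                    p = [int(img[r - 1, c]), int(img[r - 1, c + 1]), int(img[r, c + 1]),
                         int(img[r + 1, c + 1]), int(img[r + 1, c]), int(img[r + 1, c - 1]),
                         int(img[r, c - 1]), int(img[r - 1, c - 1])]
                    b = sum(p)  # number of foreground neighbours
                    a = sum(p[k] == 0 and p[(k + 1) % 8] == 1 for k in range(8))  # 0->1 transitions
                    if sub_iteration == 0:
                        ok = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                    else:
                        ok = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((r, c))
            for r, c in to_clear:  # delete simultaneously after scanning the whole image
                img[r, c] = 0
            changed = changed or bool(to_clear)
    return img


def dilate(binary: np.ndarray, radius: int = 1) -> np.ndarray:
    """Binary expansion (dilation) with a (2*radius+1) x (2*radius+1) square element."""
    h, w = binary.shape
    padded = np.pad(binary.astype(np.uint8), radius)
    out = np.zeros((h, w), dtype=np.uint8)
    for dr in range(2 * radius + 1):
        for dc in range(2 * radius + 1):
            out |= padded[dr:dr + h, dc:dc + w]
    return out


def distance_transform(binary: np.ndarray) -> np.ndarray:
    """Euclidean distance from every pixel to the nearest 0-valued pixel (brute force, row by row)."""
    background = np.argwhere(binary == 0).astype(float)  # (K, 2); at least one 0 pixel is assumed
    cols = np.arange(binary.shape[1], dtype=float)
    out = np.empty(binary.shape)
    for r in range(binary.shape[0]):
        dr2 = (background[:, 0] - r) ** 2                  # (K,)
        dc2 = (cols[:, None] - background[None, :, 1]) ** 2  # (W, K)
        out[r] = np.sqrt((dr2[None, :] + dc2).min(axis=1))
    return out


def preprocess(silhouette: np.ndarray, radius: int = 1) -> np.ndarray:
    """Thinning -> expansion -> distance transform, yielding a grayscale image."""
    return distance_transform(dilate(thin(silhouette), radius))
```

The expansion radius, which sets the "uniform thickness" of the widened lines, is not given in the text and is a free parameter here.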

The gradient feature extraction means 35 calculates the HOG (Histogram of Oriented Gradients) as the feature of each grayscale image obtained after the preprocessing for collating the captured image with the CG image. HOG is a feature that captures, for each pixel of interest, the brightness differences between its horizontally and vertically adjacent pixels as a luminance gradient. The calculated HOG is output to the collation means 4 and used to collate the captured image with the CG image. A reference for HOG is N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection," IEEE Computer Vision and Pattern Recognition, pp. 886-893, 2005.

In this embodiment, the specific region extraction means 31 extracts silhouettes by binarizing the captured frame image and the CG frame image, so the thinning means 32, the expansion processing means 33, the distance transform means 34, and the gradient feature extraction means 35 likewise process the captured frame image and the CG frame image frame by frame.

The collation means 4 estimates the joint angle parameters of the object in the captured image by collating the HOG calculated from the silhouette of the object in the captured image with the HOG calculated from the silhouette of the object in the CG image. As shown in FIG. 1, the collation means 4 includes difference calculation means 41, difference data storage means 42, and spatial feature determination means 43.

For a given captured frame image, the difference calculation means 41 calculates difference data between the HOG based on the silhouette of that captured frame image and each HOG based on the silhouette of a CG image generated from either the CG data (joint angle parameters) 22 read from the model sequence storage means 25 or the joint angle parameter values changed by the parameter changing means 23. The calculated difference data are stored in the difference data storage means 42.

The difference data storage means 42 is a storage device, such as a hard disk, that stores a frame number 51, parameters 52, and difference data 53 in association with one another.
The frame number 51 is the frame number of the CG data (joint angle parameters) 22 read from the model sequence storage means 25.
The parameters 52 are the joint angle parameter values corresponding to the frame number 51, or the joint angle parameter values changed by the parameter changing means 23 for that frame number 51.
The difference data 53 are the HOG difference data calculated, for the captured frame image, from the silhouette of the CG frame image generated from the parameters 52.

With the frame number of the model sequence fixed, the spatial feature determination means 43 determines, from the HOG difference data calculated for the captured frame image, the joint angle parameter values for which the difference is smallest.
The collation means 4 outputs this frame number and these joint angle parameter values as the estimation result.

[Operation of the Posture Estimation Apparatus]
Next, the operation of the posture estimation apparatus 1 will be described with reference to FIGS. 2 and 3 (and FIG. 1 as appropriate). FIG. 2 shows processing examples of the specific region extraction means 31, the thinning means 32, the expansion processing means 33, and the distance transform means 34 in the frame data processing means 3 of the posture estimation apparatus 1. In this example, the lower-body motion is estimated from a captured image of a person making a kicking motion as if kicking a ball.

FIG. 3 is a flowchart showing the operation of the posture estimation apparatus shown in FIG. 1.
First, in the posture estimation apparatus 1, the image input means 5 inputs a captured image to the frame data processing means 3 (step S1). In the frame data processing means 3, the captured image shown in FIG. 2(a) is input to the specific region extraction means 31a. The specific region extraction means 31a first binarizes the captured image to extract the silhouette of the person region shown in FIG. 2(b), and then, in this case, extracts the lower-body region of the silhouette as the specific region, as shown in FIG. 2(c) (step S2). The lower-body region can be extracted, for example, by predefining a position-based threshold range such as "the lower half of the image".

Next, the thinning means 32a thins the extracted silhouette as shown in FIG. 2(d) (step S3), and the expansion processing means 33a expands the thinned silhouette as shown in FIG. 2(e) (step S4). The distance transform means 34a then applies a distance transform to the expanded silhouette as shown in FIG. 2(f), producing a grayscale image from the binary image (step S5). The gradient feature extraction means 35a calculates the HOG of this grayscale image derived from the captured image (see FIG. 2(f)) (step S6). A concrete example of the processing performed by the gradient feature extraction means 35 is described later.

Meanwhile, to generate the CG image corresponding to the captured image, the CG data 22 in the example of FIG. 2 set the joint angles of the left hip, left knee, left ankle, right hip, right knee, and right ankle, because attention is focused on the lower-body motion. The CG image generation means 24 then creates the CG frame image shown in FIG. 2(g) from the joint angle values of the CG data 22 corresponding to the captured image and the CG character model 21 (step S7).

Then, as with the captured image, the CG image is processed in turn by the specific region extraction means 31b (step S8), the thinning means 32b (step S9), the expansion processing means 33b (step S10), and the distance transform means 34b (step S11), producing the distance-transformed grayscale image shown in FIG. 2(h). The gradient feature extraction means 35b calculates the HOG of this grayscale image derived from the CG image (see FIG. 2(h)) (step S12).

Next, the collation means 4 compares the HOG features of the grayscale image in FIG. 2(f) with those of the grayscale image in FIG. 2(h). Specifically, the difference calculation means 41 of the collation means 4 calculates the difference data between the HOG based on the silhouette of the CG image and the HOG based on the silhouette of the captured frame image, and stores the difference data 53 in the difference data storage means 42 in association with the frame number 51 and the parameters 52 (step S13).

If not all of the predetermined parameter (joint angle) values, for example those within a ±30° range, have been selected yet (step S14: No), the parameter changing means 23 changes the joint angle parameter value (step S15). That is, the parameter changing means 23 finely adjusts the joint angle of the CG data (joint angle parameters) 22, for example by 1°, and the process returns to step S7: the CG image generation means 24 creates a CG frame image from the changed joint angle value and the CG character model 21, and the same image processing is repeated to obtain a new grayscale image like the one shown in FIG. 2(h).

If, on the other hand, all of the predetermined parameter (joint angle) values have been selected in step S14 (step S14: Yes), the spatial feature determination means 43 of the collation means 4 outputs, as the estimation result, the frame number 51 and the parameters 52 associated with the smallest value among the HOG difference data 53 stored for the captured frame image (step S16). In other words, the collation means 4 outputs the joint angle values with the best match as the estimated parameters.
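The loop of steps S7 to S16 for one captured frame can be sketched as follows. The callback `cg_hog_of` is hypothetical and stands in for rendering by the CG image generation means 24 plus the preprocessing and HOG calculation of steps S8 to S12. The text does not specify how the HOG "difference data" are computed, so the Euclidean distance used here is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Sequence

JointAngles = Dict[str, float]  # hypothetical: joint name -> angle in degrees


@dataclass
class DifferenceRecord:
    frame_number: int       # corresponds to frame number 51
    params: JointAngles     # corresponds to parameters 52
    difference: float       # corresponds to difference data 53


def estimate_frame(captured_hog: Sequence[float],
                   frame_number: int,
                   candidates: Iterable[JointAngles],
                   cg_hog_of: Callable[[JointAngles], Sequence[float]]) -> DifferenceRecord:
    """Try every candidate parameter set for one captured frame and return the best match."""
    records: List[DifferenceRecord] = []          # plays the role of the difference data storage means 42
    for params in candidates:                     # S14/S15: iterate over the candidate angles
        cg_hog = cg_hog_of(params)                # S7-S12: render the CG frame, preprocess, compute HOG
        diff = math.dist(captured_hog, cg_hog)    # S13: difference data (Euclidean, assumed)
        records.append(DifferenceRecord(frame_number, params, diff))
    return min(records, key=lambda rec: rec.difference)  # S16: smallest difference wins
```

Keeping every record, rather than only a running minimum, mirrors the text's description of storing frame number, parameters, and difference together before the determination step.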

This completes the processing of one captured frame image. By performing steps S1 to S16 in the same way for every frame of the captured moving image, the motion of the person in the captured moving image can be estimated.

[HOG Calculation Process]
Next, the HOG calculation process in steps S6 and S12 of FIG. 3 will be described with reference to FIGS. 4 to 7 (and FIG. 1 as appropriate). FIG. 4 is a flowchart outlining the HOG calculation process. Because the HOG calculation process is a known technique, disclosed for example in the HOG reference cited above and in Non-Patent Document 1, only an outline is given below.

In the HOG calculation process, the first stage calculates luminance gradients from the image (step S21). The second stage calculates a gradient orientation histogram for each cell from the calculated luminance gradients (step S22). The third stage normalizes the features for each block of the image using the calculated gradient orientation histograms (step S23). Because HOG is based on histograms of luminance gradients, it is relatively insensitive to, for example, the position and size of the person's lower-body region. For this reason, the gradient feature extraction means 35 of the posture estimation apparatus 1 executes steps S21 to S23.

The first stage (step S21) through the third stage (step S23) of the HOG calculation process are described in turn below. As an example of an original image, a captured image of a walking person as shown in FIG. 5 is used, and the motion is estimated from this image.

<First Stage (Step S21)>
In the first stage, luminance gradients (a luminance gradient image) are obtained from the original image. Specifically, the gradient magnitude m and the gradient direction θ of the luminance are calculated at each pixel of the original image. Taking the upper-left corner of the image as the origin, let u be the horizontal coordinate of a pixel, v its vertical coordinate, and I(u, v) the luminance at pixel (u, v). The gradient magnitude m(u, v) at pixel (u, v) is then given by equation (1), and the gradient direction θ(u, v) at that pixel by equation (2).
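The bodies of equations (1) and (2) are not present in this text. Assuming the standard central-difference formulation of Dalal and Triggs, which is consistent with the description of HOG as brightness differences between horizontally and vertically adjacent pixels, they would read as follows. The auxiliary symbols f_u and f_v are introduced here for illustration, and the two-argument arctangent is used so that θ spans −180° to +180°, as the second stage states.

```latex
\begin{aligned}
f_u(u,v) &= I(u+1,v) - I(u-1,v), \qquad f_v(u,v) = I(u,v+1) - I(u,v-1) \\
m(u,v) &= \sqrt{f_u(u,v)^2 + f_v(u,v)^2} && (1) \\
\theta(u,v) &= \operatorname{atan2}\bigl(f_v(u,v),\, f_u(u,v)\bigr) && (2)
\end{aligned}
```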

<Second Stage (Step S22)>
In the second stage, gradient orientation histograms are calculated from the gradient direction θ (the luminance gradient image). To this end, the luminance gradient image is divided into a grid of cells 101, as shown in FIG. 6(a). In the example of FIG. 6(a), each cell consists of 5 × 5 = 25 pixels, and the luminance gradient image is divided into 6 (horizontal) × 12 (vertical) = 72 cells 101. In FIG. 6(a), the outline of the person is drawn with thin black lines and all other regions are shown in white; if the image were instead color-coded by the angle of the gradient direction θ, every region, including the outline, would appear in color.

The 25 arrows drawn in the cell 101, one per pixel, show by their direction the gradient direction θ at that pixel and by their length the gradient magnitude m. The gradient direction θ is actually computed as a value between −180° and +180°. Because only the orientation of the edge matters, not which way along it the gradient points, 180° is added to negative values, so that from here on the gradient direction lies between 0° and 180°; 0° and 180° then denote the same orientation. The same symbol θ is used for the shifted gradient direction.
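The gradient computation and the fold into 0° to 180° can be sketched in NumPy as below. This sketch assumes central differences for the horizontal and vertical brightness differences and leaves border pixels with zero gradient; neither detail is fixed by the text. Array axis 0 corresponds to v (rows, increasing downward from the upper-left origin) and axis 1 to u (columns).

```python
import numpy as np


def luminance_gradient(image: np.ndarray):
    """Return gradient magnitude m and unsigned gradient direction theta in degrees (0..180)."""
    I = image.astype(float)
    fu = np.zeros_like(I)
    fv = np.zeros_like(I)
    fu[:, 1:-1] = I[:, 2:] - I[:, :-2]        # I(u+1, v) - I(u-1, v)
    fv[1:-1, :] = I[2:, :] - I[:-2, :]        # I(u, v+1) - I(u, v-1)
    m = np.hypot(fu, fv)                      # gradient magnitude
    theta = np.degrees(np.arctan2(fv, fu))    # signed direction in [-180, 180]
    theta = np.where(theta < 0.0, theta + 180.0, theta)  # add 180 to negatives: unsigned [0, 180]
    return m, theta
```

For a ramp that brightens to the right, interior pixels get m = 2 and θ = 0°; a ramp that darkens downward gives θ = −90°, which the fold maps to 90°.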

Here, the 0° to 180° range of the gradient direction θ is divided into 9 intervals; that is, θ is divided into the following intervals (1) to (9). In each interval, for example, the lower limit is excluded and the upper limit is included.
(1) 0-20°
(2) 20-40°
(3) 40-60°
(4) 60-80°
(5) 80-100°
(6) 100-120°
(7) 120-140°
(8) 140-160°
(9) 160-180°
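The interval convention just described (lower limit excluded, upper limit included) can be captured in a small helper. Because 0° and 180° denote the same orientation, this sketch assumes that an angle of exactly 0° is assigned to interval (9); the function name is illustrative.

```python
import math


def orientation_bin(theta_deg: float, n_bins: int = 9) -> int:
    """Return the 1-based interval for (0, 20], (20, 40], ..., (160, 180] degrees."""
    width = 180.0 / n_bins
    if theta_deg <= 0.0:       # 0 deg is the same orientation as 180 deg (assumed to fall in the last interval)
        theta_deg = 180.0
    return min(n_bins, math.ceil(theta_deg / width))
```

Using `ceil` rather than `floor` is what makes each upper limit inclusive: 20° falls in interval (1), while 20.5° falls in interval (2).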

FIG. 6(b) shows an example of the gradient orientation histogram obtained for each cell, that is, with 25 pixels as one unit. In this example, the accumulated luminance gradient is largest in interval (5), 80-100°.

Hereinafter, the position of a cell 101 in the luminance gradient image of FIG. 6(a) is denoted by (i, j) (1 ≤ i ≤ 6, 1 ≤ j ≤ 12). In cell (i, j), the magnitudes in the nine gradient-direction sections are denoted f_1, f_2, f_3, f_4, f_5, f_6, f_7, f_8, f_9. The feature vector F_ij of one cell (i, j) is then 9-dimensional, as in Expression (3).
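Expression (3) itself is not reproduced in this text. Given the definitions above, it presumably just stacks the nine section magnitudes of cell (i, j):

```latex
F_{ij} = \left( f_1,\ f_2,\ f_3,\ f_4,\ f_5,\ f_6,\ f_7,\ f_8,\ f_9 \right) \in \mathbb{R}^{9}
```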

<Third stage (step S23)>
In the third stage, the feature values of the computed gradient direction histograms are normalized block by block. For this purpose, as shown in FIG. 7, blocks 102 are defined on the luminance gradient image divided into cells 101, each block consisting of several adjacent cells 101 selected together. Blocks may partly overlap one another.

In the image example of FIG. 7, 72 cells 101 (6 wide × 12 high) are displayed, and 9 cells 101 (3 wide × 3 high) are selected as one block 102. Using Expression (3), if the cell at the upper-left corner of a block is at (i, j), the feature vector V_k of the block at a given position (identifier k) is 81-dimensional, as in Expression (4).
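Expression (4) is likewise not reproduced here. A plausible reading is that V_k concatenates the nine 9-dimensional cell vectors F of the 3 × 3 block whose upper-left cell is (i, j). The sketch below builds that 81-dimensional vector; the row-major concatenation order and the `cells[row][column]` layout are assumptions, not details from the source.

```python
from typing import List

# cells[row][col] -> 9-bin histogram F of that cell (assumed layout)
Cells = List[List[List[float]]]


def block_vector(cells: Cells, i: int, j: int, size: int = 3) -> List[float]:
    """Concatenate the histograms of the size x size cells whose upper-left cell is (i, j).

    i is the 1-based column and j the 1-based row, as in the text.
    With 9 bins and a 3 x 3 block the result has 9 * 9 = 81 values.
    Assumption: cells are concatenated row by row, left to right.
    """
    v: List[float] = []
    for dj in range(size):        # block rows
        for di in range(size):    # block columns
            v.extend(cells[j - 1 + dj][i - 1 + di])
    return v
```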

As noted above, blocks may partly overlap. In the image example of FIG. 7, consider the block outlined by a thick line, which selects the nine cells in columns 1 to 3 and rows 2 to 4 (call it block b = 1). Shifting this whole block up by one cell yields another block (b = 2; reference signs are omitted in the figure, likewise below). From this position no block can be selected further upward. Shifting the block one cell to the right from this position yields another block (b = 3); a further one-cell shift to the right yields block b = 4, and one more yields block b = 5, beyond which no block can be selected further to the right.

The upper part of FIG. 7 schematically shows the five blocks b = 1 to b = 5, selected by shifting one cell at a time as described above, superimposed on one another. Each block contains nine cells, so shifting a block by one cell makes cells overlap. In FIG. 7, the more blocks a cell belongs to, the larger and darker its pattern is drawn. The pattern schematically shows, for each cell, the nine sections (nine directions) of the gradient direction θ from that cell's histogram and their magnitudes.

In the image example of FIG. 7, shifting the block across the image allows 40 blocks (4 horizontal × 10 vertical positions) to be selected in the course of processing. All of them are identified by an identifier k (k = 1 to 40). In FIG. 7, the horizontal scale "1, 2, 3, 4" indicates the number of block positions selectable by horizontal shifts, taking the upper-left block of the image as the origin, and the vertical scale "1, 4, 7, 10" likewise indicates the number of block positions selectable by vertical shifts. In this example, Expressions (3) and (4) are applied to all 40 blocks, each of which contains 3 × 3 cells. Letting f denote the gradient direction histograms of the cells, the magnitude v normalized by the magnitude of the block feature vector V is given by Expression (5). The vector f has 9 (gradient directions) × 9 (cells per block) × 40 (blocks) = 3240 dimensions.
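The block enumeration and normalization can be sketched as below. Sliding a 3 × 3 block one cell at a time over a 6 × 12 cell grid gives 4 × 10 = 40 positions, and concatenating the 81 normalized values of each block gives 3240 values. Because Expression (5) is not reproduced here, the sketch assumes the L2 normalization v = V / sqrt(‖V‖² + ε²) used in the original HOG formulation; the patent's exact form may differ.

```python
import math
from typing import List

# cells[row][col] -> 9-bin histogram of that cell (assumed layout)
Cells = List[List[List[float]]]


def hog_descriptor(cells: Cells, block: int = 3, eps: float = 1e-6) -> List[float]:
    """Slide a block x block window one cell at a time, normalize each block, concatenate.

    Assumption: per-block L2 normalization with a small eps to avoid division by zero.
    """
    rows, cols = len(cells), len(cells[0])
    descriptor: List[float] = []
    for top in range(rows - block + 1):        # 10 vertical positions for 12 rows
        for left in range(cols - block + 1):   # 4 horizontal positions for 6 columns
            v = [x
                 for r in range(top, top + block)
                 for c in range(left, left + block)
                 for x in cells[r][c]]
            norm = math.sqrt(sum(x * x for x in v) + eps * eps)
            descriptor.extend(x / norm for x in v)
    return descriptor
```

On a 6 × 12 grid of 9-bin cells this returns 40 × 81 = 3240 values, matching the dimension stated above.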

Accordingly, similarity can be evaluated as follows: let v_m be the v obtained from the HOG computed from the silhouette of the captured image and v_cg the v obtained from the HOG computed from the silhouette of the CG image; the smaller the distance between v_m and v_cg, the greater the similarity.

Here, the distance between v_m and v_cg is the difference between the histograms. This difference (difference data) can be, for example, the cumulative sum of positive values obtained by transforming the difference in each histogram class (section of the gradient-direction angle). Methods for computing such a cumulative sum include, for example, the sum of squared differences of the class magnitudes and the sum of absolute differences.
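The two cumulative sums named above can be written directly. This minimal sketch assumes v_m and v_cg are equal-length sequences, such as the 3240-dimensional vectors described earlier; the function names are illustrative.

```python
from typing import Sequence


def _check(v_m: Sequence[float], v_cg: Sequence[float]) -> None:
    if len(v_m) != len(v_cg):
        raise ValueError("HOG vectors must have the same dimension")


def sum_squared_diff(v_m: Sequence[float], v_cg: Sequence[float]) -> float:
    """Sum over classes of the squared difference (smaller means more similar)."""
    _check(v_m, v_cg)
    return sum((a - b) ** 2 for a, b in zip(v_m, v_cg))


def sum_abs_diff(v_m: Sequence[float], v_cg: Sequence[float]) -> float:
    """Sum over classes of the absolute difference (smaller means more similar)."""
    _check(v_m, v_cg)
    return sum(abs(a - b) for a, b in zip(v_m, v_cg))
```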

According to the first embodiment, matching keeps the robustness of silhouette matching and is unaffected by patterns such as those on clothing, while distance conversion and the HOG feature make it highly accurate and insensitive to position on the screen. In addition, thinning and expansion processing give the limbs a constant thickness, so the matching is unaffected by limb thickness. The first embodiment can therefore robustly obtain motion data with high spatial accuracy from a captured moving image.

(Second Embodiment)
The posture estimation device 1B shown in FIG. 8 estimates a posture or motion by considering, when matching captured images against CG images, not only the spatial features but also the temporal features of the motion of the object (person) to be estimated. As shown in FIG. 8, the posture estimation device 1B includes a CG generation means 2, a frame data processing means 3, and a matching means 4B. Components of the posture estimation device 1B that are the same as in the posture estimation device 1 shown in FIG. 1 are given the same reference signs, and their description is omitted as appropriate.

It is assumed that CG data (joint angle parameters) 22 has been created in advance in the model sequence storage means 25 so that a CG moving image resembling the captured moving-image sequence can be generated. For example, as shown in FIG. 9(a), when the posture estimation device 1B receives a captured moving-image sequence of a person going from a stance preparing to kick a ball, through kicking it while stepping around, to turning the body, a similar model sequence is prepared in advance and rendered as a CG moving image. FIG. 9(b) shows an example of a CG moving-image sequence created from the model sequence. The frame sampling period and the number of frames of the two sequences may be the same or different, but are preferably the same.

In the matching means 4B shown in FIG. 8, the posture estimation method is broadly divided into two stages. The first stage is temporal processing over the whole sequence of time-series frames. Each frame of a captured moving image showing a predetermined motion such as "kick" is matched against the frames of a model sequence that can display the same motion, and the most similar model frame is extracted. The first stage thus yields the temporal frame correspondence between the captured moving image and the model sequence.
The second stage is spatial processing for each frame. For each model-sequence frame extracted in the first stage, new frames are created by adjusting the joint angle parameters, and matching against the corresponding captured frame is repeated to find joint angles closer to the actual posture.

For this purpose, as shown in FIG. 8, the matching means 4B includes a difference calculation means 41, a difference data storage means 42, a spatial feature determination means 43, and a temporal feature extraction means 44.
For a captured frame image, the temporal feature extraction means 44 varies the frame number of the model sequence, generates CG images from the CG data (joint angle parameter) 22 values read from the model sequence storage means 25, and, based on the HOG difference data for those CG images, extracts the model-sequence frame number at which the difference data is smallest. Observing the frames in time order shows that the posture of the object changes continuously; this is the same as the temporal change of the motion. The temporal feature extraction means 44 estimates the posture in each frame by matching model frames created in advance against the captured frames, and the resulting continuous change in posture is obtained as the temporal feature of the motion. Then, with the frame number fixed to the one extracted by the temporal feature extraction means 44, the spatial feature determination means 43 identifies, as the parameter changing means 23 varies the joint angle parameter values, the joint angle parameter values at which the HOG difference data calculated for that captured frame image is smallest.

The operation of the posture estimation device 1B is the same as in the first embodiment, except that in the first stage the processing of steps S14 to S16 shown in FIG. 3 is carried out in the same way but adapted to extract the temporal features of the motion, and in the second stage the spatial features of the motion are then determined as in the posture estimation device 1 of FIG. 1. Its description is therefore omitted.

FIG. 10 shows an example of the result of the first stage of operation of the posture estimation device 1B. It is the result of estimating the lower-body motion of the person shown in FIG. 9(a). In the graph of FIG. 10, the horizontal axis shows the frame number of the captured moving image and the vertical axis the frame number of the model sequence. For a given frame number of the captured moving image, the temporal feature extraction means 44 finds the frame number of the CG frame image for which the difference data computed from the HOGs of the silhouettes of the captured frame image and the CG frame image is smallest. In the example of FIG. 10, for captured frame "1", a search over all frame numbers of the model sequence found that the most similar model frame is "0". The same applies to the other frames.
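The exhaustive first-stage search described above is an argmin over model frames for each captured frame. In this sketch, `dist` is a hypothetical stand-in for the HOG difference (for example, one of the sums of differences described earlier), and frames are indexed from 0.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def best_model_frames(captured: Sequence[Vector], model: Sequence[Vector],
                      dist: Callable[[Vector, Vector], float]) -> List[int]:
    """For each captured frame, return the index of the model frame with the smallest difference."""
    return [min(range(len(model)), key=lambda s: dist(c, model[s])) for c in captured]
```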

In a "kick" motion as in this example, the motion progresses in one direction only. For captured frame "8", for example, it is therefore unnecessary to search all frame numbers of the model sequence: using the result already fixed by the preceding search, it suffices to match against model frame "5" and the remaining frames after it.
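Exploiting the one-directional motion, the search for each captured frame can start at the model frame matched to the previous captured frame, as in the example of captured frame "8" and model frame "5". A minimal sketch, again with a hypothetical `dist` callback:

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def monotone_model_frames(captured: Sequence[Vector], model: Sequence[Vector],
                          dist: Callable[[Vector, Vector], float]) -> List[int]:
    """Greedy forward-only matching: each search starts at the previously matched model frame."""
    result: List[int] = []
    start = 0
    for c in captured:
        best = min(range(start, len(model)), key=lambda s: dist(c, model[s]))
        result.append(best)
        start = best  # the motion only moves forward, so earlier model frames are skipped
    return result
```

Compared with the exhaustive search, a late captured frame that happens to resemble an early model frame can no longer be matched backwards in time, and each search is shorter.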

To prevent the temporal feature extraction means 44 from erroneously extracting model frames that deviate greatly from the time series, DP (dynamic programming) matching is preferably used.
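The text does not specify the DP matching formulation. One simple variant, sketched below as an assumption, chooses one model frame per captured frame so that the model frame numbers never decrease and the total HOG difference is minimal. Unlike the greedy searches, a single misleading frame cannot pull the match backwards in time. `cost[t][s]` is a hypothetical precomputed matrix of differences between captured frame t and model frame s.

```python
from typing import List, Sequence


def dp_align(cost: Sequence[Sequence[float]]) -> List[int]:
    """Monotone frame alignment by dynamic programming (a simplified DP-matching variant).

    Returns one model frame index per captured frame, non-decreasing over time,
    minimizing the summed cost.
    """
    T, S = len(cost), len(cost[0])
    acc = [list(cost[0])]               # acc[t][s]: best total cost of frames 0..t ending at s
    back: List[List[int]] = [[-1] * S]  # back[t][s]: model frame chosen for captured frame t-1
    for t in range(1, T):
        row: List[float] = []
        brow: List[int] = []
        best_prev, arg_prev = float("inf"), -1
        for s in range(S):
            # running minimum over acc[t-1][0..s] enforces s_{t-1} <= s_t
            if acc[t - 1][s] < best_prev:
                best_prev, arg_prev = acc[t - 1][s], s
            row.append(cost[t][s] + best_prev)
            brow.append(arg_prev)
        acc.append(row)
        back.append(brow)
    # backtrack from the cheapest model frame for the last captured frame
    s = min(range(S), key=lambda k: acc[T - 1][k])
    path = [s]
    for t in range(T - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    return path[::-1]
```

For the cost matrix `[[0, 5, 5], [5, 0, 5], [0.5, 5, 0.6]]`, a per-frame argmin would return `[0, 1, 0]`, jumping backwards, while this alignment returns `[0, 1, 2]`.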

The temporal feature extraction means 44 need not necessarily produce a graph like FIG. 10, provided it holds the matching search results in table form. Producing such a graph is nevertheless preferable: where the slope is small, the actual person moves more slowly than the model; where the slope is large, the person moves faster than the model; and the change in the person's speed over time reveals individual motion characteristics.
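The slope reading of FIG. 10 can be quantified with a finite difference over the matched frame numbers. This sketch assumes both sequences have the same frame sampling period (the preferred case noted earlier), so that a slope of 1 means the person and the model move at the same speed; the window length is an arbitrary illustrative choice.

```python
from typing import List, Sequence


def relative_speed(path: Sequence[int], window: int = 2) -> List[float]:
    """Local slope of the captured-to-model frame correspondence.

    path[t] is the model frame matched to captured frame t.
    Slope < 1: the person is slower than the model; slope > 1: faster
    (assuming equal frame sampling periods).
    """
    return [(path[t + window] - path[t]) / window for t in range(len(path) - window)]
```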

Thus, with the frame number fixed to one extracted by the temporal feature extraction means 44 in the first stage of matching, the spatial feature determination means 43 identifies the joint angle parameter values at which the difference data is smallest while the parameter changing means 23 varies those values. In an experiment with the example of FIG. 10, further fine-tuning the joint angles of the posture given by a frame number within ±30° made it possible to fit the CG frame image to the posture in the captured frame image.
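The source states only that the joint angles were fine-tuned within ±30° to minimize the difference data; it does not give the search strategy. The sketch below assumes a coordinate-wise grid search in 5° steps, and `cost` is a hypothetical callback standing in for rendering the CG frame, extracting its HOG, and comparing it with the captured frame's HOG.

```python
from typing import Callable, Dict

Angles = Dict[str, float]  # joint name -> angle in degrees (hypothetical representation)


def refine_joint_angles(initial: Angles, cost: Callable[[Angles], float],
                        limit: float = 30.0, step: float = 5.0, sweeps: int = 3) -> Angles:
    """Coordinate-wise grid search within +/- limit degrees of the initial (model-frame) pose.

    Assumption: cost(angles) renders the CG frame, computes its HOG, and returns
    the difference data against the captured frame's HOG.
    """
    best = dict(initial)
    best_cost = cost(best)
    n = int(limit // step)
    for _ in range(sweeps):
        improved = False
        for joint, base in initial.items():
            for k in range(-n, n + 1):
                trial = dict(best)
                trial[joint] = base + k * step  # always within base +/- limit
                c = cost(trial)
                if c < best_cost:
                    best, best_cost, improved = trial, c, True
        if not improved:
            break
    return best
```

Searching one joint at a time keeps the number of evaluations linear in the number of joints, instead of exponential as in an exhaustive joint search.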

According to the second embodiment, the motion features of the object to be estimated are split into two levels, the features of change over time (temporal features) and the features of the posture itself (spatial features), and matching against model frames prepared in advance is carried out in stages. This provides a motion capture method that reproduces motion features faithfully. In other words, the second embodiment can robustly obtain motion data with high temporal and spatial accuracy from a captured moving image.

Embodiments of the present invention have been described above, but the invention is not limited to them. For example, in the posture estimation device 1B of the second embodiment, the matching means 4B includes both the spatial feature determination means 43 and the temporal feature extraction means 44, but it may include only the temporal feature extraction means 44. That is, of the first and second matching stages performed by the posture estimation device 1B, only the first stage may be performed. A posture estimation device configured in this way can align the timing of each frame of a model sequence created in advance with each frame of the captured images, so that, for example, a CG character still moves as naturally as in live action when its motion is played in slow motion or at high speed.

In each embodiment, the CG generation means 2 includes the parameter changing means 23, but in the present invention the parameter changing means 23 need only be included as necessary. For example, it may be omitted when the posture estimation device 1B of the second embodiment performs only the first of its two matching stages.

Likewise, in each embodiment the CG generation means 2 includes the model sequence storage means 25, but in the present invention the model sequence storage means 25 need only be included as necessary. For example, it may be omitted when only one or a few images are input as captured images.

When the input captured images form a moving image and no model sequence exists for the series of predetermined motions performed by the person to be estimated, creating CG images from a human body model for comparison with, for example, captured images of a walking person would require varying the joint angle parameter values exhaustively across all of several tens of joints, starting from an upright posture along the body axis, and verifying and matching the resulting CG images one by one. To address this problem, the posture estimation devices 1 and 1B store a model sequence corresponding to a series of predetermined motions, so a CG image whose posture approximates that of the object in the captured image can easily be found, manually or automatically, and posture estimation by matching can be carried out quickly.

The motion performed by the person to be estimated is not limited to a "kick", and the physique of the person is not limited to that illustrated.
Furthermore, the object to be estimated is not limited to a person. Any object that can perform various motions such as changing its posture, and whose motions can be modeled, may be used: for example, an animal, or an artificial object such as a jointed doll, a robot, a moving body, or various machines.

The posture estimation devices 1 and 1B can be realized by running a general-purpose computer with a program that makes it function as each of the means described above. The program can be provided over a communication line, or written to a recording medium such as a CD-ROM and distributed.

1, 1B Posture estimation device
2 CG generation means
21 CG character model
22 CG data (joint angle parameter)
23 Parameter changing means
24 CG image generation means
25 Model sequence storage means
3 Frame data processing means
31a, 31b Specific area extraction means
32a, 32b Thinning means
33a, 33b Expansion processing means
34a, 34b Distance conversion means
35a, 35b Gradient feature amount extraction means
4 Matching means
4B Matching means
41 Difference calculation means
42 Difference data storage means
43 Spatial feature determination means
44 Temporal feature extraction means
5 Image input means
51 Frame number
52 Parameter
53 Difference data
101 Cell
102 Block

Claims

A posture estimation device that estimates, by image processing, parameters characterizing the posture or motion of a target object to be estimated from an object appearing in a captured image, the captured image being a single-viewpoint still image or moving image of the target object, the device comprising:
image input means for inputting the captured image, and for inputting a CG image generated by pseudo-drawing the object in the captured image based on a CG character model, in which the object is modeled for computer graphics (CG) as an articulated object, and on joint angle parameters used in the CG character model;
specific area extraction means for extracting a binarized silhouette of a specific area of the object from the input captured image, and a binarized silhouette of the specific area of the object from the input CG image;
thinning means for thinning each extracted silhouette;
expansion processing means for performing expansion processing on each thinned silhouette;
distance conversion means for generating a grayscale image by applying distance conversion to each expanded silhouette;
gradient feature amount extraction means for calculating a HOG (Histogram of Oriented Gradients) as the feature amount of each grayscale image; and
matching means for estimating the joint angle parameters of the object in the captured image by comparing the HOG calculated from the silhouette of the object in the captured image with the HOG calculated from the silhouette of the object in the CG image.

The posture estimation device further comprising, in order to generate the CG image input to the image input means:
model sequence storage means for storing, as a model sequence, joint angle parameter values created in advance for each frame as a model of the object to be estimated performing a series of predetermined motions; and
CG image generation means for generating, for each frame, a CG frame image as the CG image, based on the CG character model and on the joint angle parameter values read for each frame from the model sequence storage means in correspondence with the captured frame image, which is the captured image input to the image input means for each frame,
wherein the specific area extraction means extracts a binarized silhouette of the specific area of the object from the captured frame image and a binarized silhouette of the specific area of the object from the CG frame image, and
the thinning means, the expansion processing means, the distance conversion means, and the gradient feature amount extraction means perform image processing on each frame of the captured frame images and the CG frame images.

The posture estimation device according to claim 2, further comprising parameter changing means for changing, within a predetermined range, the joint angle parameter values read for each frame from the model sequence storage means for the captured frame image,
wherein the CG image generation means generates the CG frame image based on the CG character model and on either the joint angle parameter values read for each frame or the changed joint angle parameter values,
the matching means includes:
difference calculation means for calculating difference data between each HOG based on the silhouette of a CG image generated, for the captured frame image, using the joint angle parameters read from the model sequence storage means or the joint angle parameter values changed by the parameter changing means, and the HOG based on the silhouette of the captured frame image; and
spatial feature determination means for identifying, with the frame number of the model sequence fixed, the joint angle parameter values at which the difference data is smallest, based on the HOG difference data calculated for the captured frame image, and
the frame number and the joint angle parameter values are output as the estimation result.

The posture estimation device according to claim 2, wherein the matching means includes:
difference calculation means for calculating difference data between each HOG based on the silhouette of a CG image generated using the joint angle parameter values read from the model sequence storage means for the captured frame image, and the HOG based on the silhouette of the captured frame image; and
temporal feature extraction means for extracting the frame number of the model sequence at which the difference data is smallest, based on the HOG difference data for the CG images generated using the joint angle parameter values read from the model sequence storage means as the frame number of the model sequence is varied for the captured frame image, and
the frame number and the joint angle parameter values are output as the estimation result.

The posture estimation device according to claim 2, further comprising parameter changing means for changing, within a predetermined range, the joint angle parameter values read for each frame from the model sequence storage means for the captured frame image,
wherein the CG image generation means generates the CG frame image based on the CG character model and on either the joint angle parameter values read for each frame or the changed joint angle parameter values,
the matching means includes:
difference calculation means for calculating difference data between each HOG based on the silhouette of a CG image generated, for the captured frame image, using the joint angle parameters read from the model sequence storage means or the joint angle parameter values changed by the parameter changing means, and the HOG based on the silhouette of the captured frame image;
temporal feature extraction means for extracting the frame number of the model sequence at which the difference data is smallest, based on the HOG difference data for the CG images generated using the joint angle parameter values read from the model sequence storage means as the frame number of the model sequence is varied for the captured frame image; and
spatial feature determination means for identifying, with the frame number fixed to the extracted frame number, the joint angle parameter values at which the difference data is smallest, based on the HOG difference data calculated for the captured frame image as the parameter changing means varies the joint angle parameter values, and
the frame number and the joint angle parameter values are output as the estimation result.

A posture estimation program for estimating, by image processing, parameters characterizing the posture or motion of a target object to be estimated from an object appearing in a captured image, the captured image being a single-viewpoint still image or moving image of the target object, the program causing a computer to function as:
image input means for inputting the captured image, and for inputting a CG image generated by pseudo-drawing the object in the captured image based on a CG character model, in which the object is modeled for computer graphics as an articulated object, and on joint angle parameters used in the CG character model;
specific area extraction means for extracting a binarized silhouette of a specific area of the object from the input captured image, and a binarized silhouette of the specific area of the object from the input CG image;
thinning means for thinning each extracted silhouette;
expansion processing means for performing expansion processing on each thinned silhouette;
distance conversion means for generating a grayscale image by applying distance conversion to each expanded silhouette;
gradient feature amount extraction means for calculating a HOG as the feature amount of each grayscale image; and
matching means for estimating the joint angle parameters of the object in the captured image by comparing the HOG calculated from the silhouette of the object in the captured image with the HOG calculated from the silhouette of the object in the CG image.