JP2001250121A

JP2001250121A - Image recognition device and recording medium

Info

Publication number: JP2001250121A
Application number: JP2000059502A
Authority: JP
Inventors: Simon Clippingdale (サイモン クリッピングデル); Takayuki Ito (伊藤 崇之); Tomofumi Yamane (山根 智文)
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2000-03-03
Filing date: 2000-03-03
Publication date: 2001-09-14
Anticipated expiration: 2020-03-03
Also published as: JP4092059B2

Abstract

PROBLEM TO BE SOLVED: To reduce the amount of computation without lowering recognition accuracy. SOLUTION: Gabor wavelet coefficients are extracted, as features that reflect face direction, at predetermined positions such as the eyes and nose from learning image data of a plurality of persons whose face directions are already known. Learning data comprising the mean feature vector calculated for each face direction, together with that direction, is then combined with feature data extracted from the same specific positions (eyes, nose, and so on) of the image data to be recognized in order to recognize the direction of the target face. Because the face direction features are extracted only at specific facial positions whose location changes clearly with face direction, the amount of computation needed for recognition is greatly reduced.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Technical Field of the Invention: The present invention relates to an image recognition device and a recording medium for recognizing the direction of a face image contained in a moving image or a still image.

[0002]

Description of the Related Art: A known document disclosing an image recognition method of this kind is "Face Tracking and Pose Representation", Stephen McKenna, Shaogang Gong, John J. Collins, British Machine Vision Conference, Edinburgh, 1996. This document discusses how to represent the direction of a face in an image and proposes the following image recognition method.

[0003] (Process 1) An image set is prepared that shows the face of each of a plurality of persons in several directions about the vertical axis. For the pixels of each image in this set, Gabor wavelet coefficients with four orientations are measured, and a feature vector is constructed from the absolute values of the coefficients. For each face direction, the mean of the feature vectors over all persons is calculated. The principal components with respect to face direction are then calculated from the mean feature vectors. The computation in Process 2 uses the first few principal component vectors, taken in descending order of eigenvalue.

[0004] (Process 2) A face image to be recognized is input, Gabor wavelet coefficients with four orientations are measured at all pixels of the input image, and a feature vector is constructed from the absolute values of the coefficients. This feature vector is projected onto the principal component vectors obtained in Process 1, and the result represents the direction of the face in the input image about the vertical axis.

[0005]

Problems to Be Solved by the Invention: The above image recognition method can represent the direction of a face with a small number of variables (namely, the projection coefficients), but the proposed method has the following drawbacks.

[0006] (a) Gabor wavelet features are computed at every pixel of each learning image and of the input image, and the feature vector of an image is obtained by rearranging the features of all its pixels. The feature vector therefore has a very large number of dimensions, and the amount of computation becomes very large.

[0007] (b) A pixel with the same coordinates corresponds to different positions on the face in different learning images or input images. Consequently, even if the center position and size of the face region are normalized, differences in face shape between persons adversely affect the representation of face direction, and the accuracy of the direction estimate decreases.

[0008] (c) Because wavelet coefficients are measured at every pixel of the learning images and input images, any background in an image affects recognition and reduces the accuracy of the face direction estimate.

[0009] An object of the present invention is therefore to provide an image recognition method and a recording medium that can reduce the amount of computation for image recognition without impairing the accuracy of face direction estimation.

[0010]

Means for Solving the Problems: To achieve this object, the invention of claim 1 is an image recognition device that extracts features from learning image data whose face direction is known in advance and from image data of a face to be recognized, and uses the extracted features to recognize the face direction in the image data to be recognized. The device comprises: storage means for storing features extracted in advance at specific positions of the face from the learning image data, together with the face directions; designating means for designating positions in the image data to be recognized that correspond to the specific positions; and feature extracting means for extracting, from the image data to be recognized, the features at the designated positions. The direction of the face to be recognized is recognized using the features extracted by the feature extracting means and the face directions and features stored in the storage means.

[0011] The invention of claim 2 is the image recognition device of claim 1, further comprising: input means for inputting the learning image data and the face directions; and learning image data feature extracting means for extracting the features at the specific positions from the input learning image data and storing them in the storage means.

[0012] The invention of claim 3 is the image recognition device of claim 1 or claim 2, wherein the learning image data feature extracting means extracts Gabor wavelet coefficients from the specific positions of a plurality of learning image data, calculates a mean feature vector over a plurality of persons for each face direction, and uses the calculated mean feature vector as the feature; and wherein the extracting means extracts Gabor wavelet coefficients from the positions in the image data to be recognized that correspond to the specific positions, calculates a mean feature vector over a plurality of persons for each face direction, and uses the calculated mean feature vector as the feature.

[0013] The invention of claim 4 is the image recognition device of any one of claims 1 to 3, wherein the specific positions include any of: both ends of the eyes, the point between the eyebrows, the tip of the nose, both ends of the lips, and the upper center of the lips.

[0014] The invention of claim 5 is a recording medium storing a program to be executed by an image recognition device that extracts features from learning image data whose face direction is known in advance and from image data of a face to be recognized, and uses the extracted features to recognize the face direction in the image data to be recognized. Features extracted in advance at specific positions of the face from the learning image data, together with the face directions, are stored beforehand in storage means within the image recognition device. The program comprises: a designating step of designating positions in the image data to be recognized that correspond to the specific positions; and a feature extracting step of extracting, from the image data to be recognized, the features at the designated positions. The direction of the face to be recognized is recognized using the features extracted in the feature extracting step and the face directions and features stored in the storage means.

[0015] The invention of claim 6 is the recording medium of claim 5, wherein the program further comprises: an input step of inputting the learning image data and the face directions; and a learning image data feature extracting step of extracting the features at the specific positions from the input learning image data and storing them in the storage means.

[0016] The invention of claim 7 is the recording medium of claim 5 or claim 6, wherein the learning image data feature extracting step extracts Gabor wavelet coefficients from the specific positions of a plurality of learning image data, calculates a mean feature vector over a plurality of persons for each face direction, and uses the calculated mean feature vector as the feature; and wherein the extracting step extracts Gabor wavelet coefficients from the positions in the image data to be recognized that correspond to the specific positions, calculates a mean feature vector over a plurality of persons for each face direction, and uses the calculated mean feature vector as the feature.

[0017] The invention of claim 8 is the recording medium of any one of claims 5 to 7, wherein the specific positions include any of: both ends of the eyes, the point between the eyebrows, the tip of the nose, both ends of the lips, and the upper center of the lips.

[0018]

Embodiments of the Invention: Embodiments of the present invention are described in detail below with reference to the drawings.

[0019] This embodiment overcomes the drawbacks of the conventional image recognition method by means of the following image recognition method.

[0020] (1) For the feature vector of a learning image or an input image, the above document measures Gabor wavelet coefficients in four orientations at all pixels (128 × 128 = 16384 points). This embodiment instead measures Gabor wavelet coefficients only at a small number of feature points (for example, nine points in the embodiment described later, such as the inner and outer corners of the eyes and both ends of the mouth). Because only a small number of feature points on the face are used, the dimensionality of the feature vector and the amount of computation are far smaller in this embodiment, even when eight orientations are used.
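The size of this reduction can be checked with simple arithmetic. The sketch below uses only the figures stated above and assumes one coefficient per measurement point per orientation; the variable names are illustrative.

```python
# Feature vector dimensionality: prior method vs. this embodiment.
# All figures come from the description; the names are illustrative only.
IMAGE_SIDE = 128
PRIOR_ORIENTATIONS = 4   # Gabor orientations in the cited prior method
FEATURE_POINTS = 9       # feature points in this embodiment
ORIENTATIONS = 8         # Gabor orientations in this embodiment

prior_dims = IMAGE_SIDE * IMAGE_SIDE * PRIOR_ORIENTATIONS
embodiment_dims = FEATURE_POINTS * ORIENTATIONS

print(prior_dims)        # 65536
print(embodiment_dims)   # 72
```

Under these assumptions the feature vector shrinks by a factor of roughly 900, even though the number of orientations doubles.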

[0021] (2) The conventional method processes images without aligning facial features (for example, the inner corners of the eyes). Even if size and the like are normalized, individual differences in face shape may therefore adversely affect the representation of direction and reduce the accuracy of the direction estimate. In this embodiment, by contrast, the position of each feature point on the face is specified, so each feature point always lies at the same place on the face (for example, the inner corner of an eye). As a result:
(a) When the feature vectors are averaged for each direction of the learning images, the positions of the same features are aligned even across different faces, so the mean feature vectors and their principal component vectors represent only the change due to face direction, unaffected by individual face shape.
(b) Likewise, when a feature vector measured from an input image is projected onto the principal component vectors, the measurement locations are the same places on the face rather than the same places in the image, so the estimated face direction is unaffected by individual face shape.

[0022] (3) The conventional image recognition method measures wavelet coefficients at all pixels of the learning images and input images, so any background in an image may affect the result. In this embodiment, the feature vector is measured using only the neighborhood of each feature point, which minimizes the influence of any background in the learning images or input images.

[0023] FIG. 1 shows the functional configuration of an embodiment of the present invention. In FIG. 1, the learning image set is a set of face images of a plurality of persons and includes images with different face directions. The face direction shown in each learning image is given in advance; for example, a learning image is supplied in the form "person ID = n, direction = p". In this embodiment, the learning image set includes images whose directions differ in 10-degree steps between -40 degrees and +40 degrees (a frontal face is defined as 0 degrees).

[0024] The learning image set collected in this embodiment covers 17 persons, and each learning image is 128 × 128 pixels. Details of the normalization are described later. The coordinates of a plurality of feature points (nine in this embodiment) are also assumed to be given for each learning image. The feature point coordinates are obtained manually or by some other method. In this embodiment, the learning image is displayed on the display screen of the image processing device, the operator designates the positions of the feature points with a mouse or the like, and the image processing device acquires the feature point positions of the learning image.

[0025] In this embodiment, the positions of the feature points on the face are as follows.

[0026]
Feature point 0: on the vertical center line of the face, the deepest point at the top of the nose
Feature point 1: on the vertical center line of the face, the boundary between red and white at the middle of the upper lip
Feature point 2: outer corner of the person's right eye
Feature point 3: inner corner of the person's right eye
Feature point 4: inner corner of the person's left eye
Feature point 5: outer corner of the person's left eye
Feature point 6: tip of the nose (the point nearest the viewer in three dimensions)
Feature point 7: right corner of the person's mouth
Feature point 8: left corner of the person's mouth

Whereas the conventional method uses as many measurement points as there are pixels, this embodiment uses only nine; note how much smaller this number is. FIG. 2 schematically shows a given learning image set and the feature point positions in each learning image.

[0027] As preprocessing for face direction recognition, the image recognition device normalizes each original learning image by scaling, rotation, and translation: the face region is translated to the center of the image, the image is scaled so that the line from feature point 0 to feature point 1 is approximately 56 pixels long, and the image is rotated so that the tilt is vertical for a frontal face and 10 degrees for a profile face. Using the learning images and feature point coordinates preprocessed in this way, the image recognition device executes the processing shown in FIG. 1 (drawn there as hardware blocks).
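The normalization described above amounts to a similarity transform. The following minimal sketch applies it to feature point coordinates only (image resampling is omitted). It assumes that the midpoint of the line from feature point 0 to feature point 1 is the point moved to the image center, and it exposes the target tilt as a parameter, since the text does not state the sign convention of the 10-degree tilt. The function name and parameters are hypothetical.

```python
import math

TARGET_LENGTH = 56.0    # desired length of the 0 -> 1 line in pixels (from the text)
CENTER = (64.0, 64.0)   # centre of a 128 x 128 image


def normalize_points(points, tilt_deg=0.0):
    """Scale, rotate and translate feature point coordinates.

    points[0] is the nose bridge and points[1] the upper-lip midpoint,
    given as (x, y) with y increasing downwards. tilt_deg is the tilt of the
    0 -> 1 line from the vertical after normalization (0 for a frontal face).
    Assumption: the midpoint of the 0 -> 1 line is moved to the image centre.
    """
    (x0, y0), (x1, y1) = points[0], points[1]
    dx, dy = x1 - x0, y1 - y0
    scale = TARGET_LENGTH / math.hypot(dx, dy)
    # Rotation that takes the current tilt of the line to the requested tilt.
    angle = math.atan2(dx, dy) - math.radians(tilt_deg)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    mid_x, mid_y = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    normalized = []
    for x, y in points:
        px, py = x - mid_x, y - mid_y
        normalized.append((CENTER[0] + scale * (px * cos_a - py * sin_a),
                           CENTER[1] + scale * (px * sin_a + py * cos_a)))
    return normalized


# Usage with two hypothetical points: nose bridge and upper-lip midpoint.
(nx, ny), (lx, ly) = normalize_points([(50.0, 40.0), (60.0, 70.0)])
print(round(nx, 6), round(ny, 6), round(lx, 6), round(ly, 6))  # 64.0 36.0 64.0 92.0
```

After the transform the two reference points lie on a vertical line, 56 pixels apart and centered in the image; the other feature points are carried along by the same transform.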

[0028] At each feature point, the feature vector measuring unit 1 in FIG. 1 convolves the neighborhood of that point in the learning image with several Gabor wavelet functions and calculates the absolute values of the results (the wavelet coefficients). The definition of the wavelets follows this reference:
- Simon CLIPPINGDALE, Takayuki Ito, "Feature selection experiments for the 'FAVRET' face tracking and recognition system for moving images", IEICE PRMU99-109, November 1999
Only wavelets of the lowest resolution (resolution parameter r = 4 in equations (1) to (3) of that reference) are used. This embodiment uses eight Gabor wavelet functions, although the number is not limited to eight. Each is the mother function rotated in steps of 22.5 degrees. The absolute values of the 8 orientations × 9 feature points = 72 coefficients used in this embodiment form the feature vector that represents one learning image. The upper half of FIG. 3 illustrates the feature vector of each learning image.
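A rough sketch of this measurement, using NumPy, is given below. The kernel is a generic complex Gabor function, not the wavelet defined in the cited PRMU99-109 report; the wavelength, envelope width, and kernel size are hypothetical values chosen only for illustration. Each coefficient is computed as an inner product of the kernel with the neighborhood of the feature point, which for a real image has the same magnitude as the convolution evaluated at that point.

```python
import numpy as np

N_ORIENTATIONS = 8  # kernels rotated in steps of 22.5 degrees (from the text)


def gabor_kernel(theta, wavelength=16.0, sigma=8.0, half_size=16):
    """Complex Gabor kernel oriented at angle theta (radians).

    wavelength, sigma and half_size are hypothetical values; the patent
    takes its wavelet definition from the cited PRMU99-109 report.
    """
    coords = np.arange(-half_size, half_size + 1)
    x, y = np.meshgrid(coords, coords)
    along = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * along / wavelength)


def feature_vector(image, points, half_size=16):
    """Return |coefficient| for every (feature point, orientation) pair."""
    padded = np.pad(image.astype(float), half_size, mode="edge")
    kernels = [gabor_kernel(k * np.pi / N_ORIENTATIONS, half_size=half_size)
               for k in range(N_ORIENTATIONS)]
    features = []
    for px, py in points:
        # Neighbourhood centred on the feature point (indices shifted by the padding).
        patch = padded[py:py + 2 * half_size + 1, px:px + 2 * half_size + 1]
        features.extend(abs(np.sum(patch * kernel)) for kernel in kernels)
    return np.array(features)


# Usage with a random stand-in image and nine hypothetical feature points (x, y).
image = np.random.default_rng(0).random((128, 128))
points = [(64, 36), (64, 92), (30, 45), (52, 45), (76, 45), (98, 45),
          (64, 70), (46, 92), (82, 92)]
vector = feature_vector(image, points)
print(vector.shape)  # (72,)
```

Only nine small neighborhoods are touched per image, so any background outside them never enters the feature vector.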

[0029] The per-direction mean calculating unit 2 calculates, for each face direction, the mean of the feature vectors of all persons. This process is shown by the thick arrows in FIG. 3.
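The averaging step can be sketched as a simple group-by over the labeled learning data. The tuple layout (person ID, direction in degrees, feature vector) mirrors the "person ID = n, direction = p" labeling described earlier; the function name is hypothetical.

```python
import numpy as np


def mean_vectors_by_direction(samples):
    """samples: iterable of (person_id, direction_deg, feature_vector).

    Returns {direction_deg: mean feature vector over all persons}.
    """
    grouped = {}
    for _person_id, direction_deg, vector in samples:
        grouped.setdefault(direction_deg, []).append(np.asarray(vector, dtype=float))
    return {direction: np.mean(vectors, axis=0)
            for direction, vectors in sorted(grouped.items())}


# Usage with toy two-dimensional vectors for two persons.
samples = [(0, -10, [1.0, 2.0]), (1, -10, [3.0, 4.0]), (0, 0, [5.0, 6.0])]
means = mean_vectors_by_direction(samples)
print(means[-10])  # [2. 3.]
```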

[0030] The principal component analysis unit 3 obtains the principal component vectors of the mean feature vectors with respect to face direction. This processing is well known and is described, for example, in the following references:
- Press, Teukolsky, Vetterling & Flannery, "Numerical Recipes in C: Recipes for Numerical Computation in the C Language" (Japanese edition), Gijutsu-Hyoron (技術評論社), ISBN 4-87408-560-1
- Yutaka Tanaka and Kazumasa Wakimoto, "Multivariate Statistical Analysis Methods" (多変量統計解析法), Chapter 2 "Principal Component Analysis", Gendai Sugakusha (現代数学社), Kyoto (1983)

The principal component vectors obtained by the principal component analysis unit 3 are the unit eigenvectors of the sample covariance matrix of the set of mean feature vectors. The lower part of FIG. 3 shows them arranged in descending order of eigenvalue. Only the two principal component vectors e₀ and e₁ with the largest eigenvalues are stored in the memory of the image processing device.

(i) An input image and the feature point coordinates in it are assumed to be given. In this embodiment, the feature point coordinates are obtained by the unified tracking and recognition method described in the following references:
- Japanese Patent Application No. H11-206764
- Simon CLIPPINGDALE, Takayuki Ito, "A unified approach to face detection, tracking and recognition in moving images", IEICE PRMU98-200, January 1999

As with the normalization of the learning images, the input image and its feature point coordinates are normalized in advance (see the upper right of FIG. 4).
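The principal component analysis performed by unit 3 (unit eigenvectors of the sample covariance matrix, sorted by eigenvalue, keeping the top two) can be sketched with NumPy as follows. The function name and the random stand-in data are hypothetical; `np.cov` and `np.linalg.eigh` are used here as one common way to carry out the computation, not necessarily the one used by the inventors.

```python
import numpy as np


def top_principal_components(mean_vectors, n_components=2):
    """Unit eigenvectors of the sample covariance matrix, largest eigenvalue first.

    mean_vectors: array of shape (n_directions, n_dims), one mean feature
    vector per face direction.
    """
    data = np.asarray(mean_vectors, dtype=float)
    covariance = np.cov(data, rowvar=False)                  # (n_dims, n_dims)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)   # ascending order
    order = np.argsort(eigenvalues)[::-1][:n_components]
    return eigenvalues[order], eigenvectors[:, order].T


# Usage: nine directions (-40 to +40 degrees), 72-dimensional stand-in data.
means = np.random.default_rng(1).random((9, 72))
eigenvalues, (e0, e1) = top_principal_components(means)
print(e0.shape, e1.shape)  # (72,) (72,)
```

Because the covariance matrix is symmetric, `eigh` returns orthonormal eigenvectors, so e₀ and e₁ are already unit vectors and no further normalization is needed.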

[0031] The feature vector measuring unit 4 obtains a feature vector representing the input image in the same way as the feature vector measuring unit 1 described above (see the upper left of FIG. 4). The projection unit 5 projects the feature vector obtained from the input image onto the principal component vectors e₀ and e₁; that is, it calculates the inner product of the feature vector with each of e₀ and e₁ (see the upper right of FIG. 4) to obtain the projection coefficients u and v. The projection coefficients serve as coordinates in the principal component space (see the lower right of FIG. 4).
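The projection is just two inner products. A minimal sketch follows, with a hypothetical function name and toy three-dimensional vectors in place of the 72-dimensional ones. The text describes a plain inner product, so no mean is subtracted here; whether the original system centers the feature vector first is not stated.

```python
def project(feature_vector, e0, e1):
    """Return the projection coefficients (u, v) as inner products."""
    u = sum(f * e for f, e in zip(feature_vector, e0))
    v = sum(f * e for f, e in zip(feature_vector, e1))
    return u, v


# Usage with toy vectors.
print(project([3.0, 4.0, 5.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # (3.0, 4.0)
```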

[0032] The vector quantization unit VQ 6 estimates the direction of the face in the input image from the values of the projection coefficients u and v. FIG. 5 shows an example of the input/output function of the vector quantization unit VQ 6.

[0033] Data obtained from actual face images are used to set the input/output function of the vector quantization unit VQ 6. As an example, FIG. 6 shows the projection coefficients obtained when face images of the same 17 persons are used as input images for the principal component space derived from the actual data of those 17 persons. FIG. 6 shows the mean of the projection coefficients of the 17 persons. From the positions of the mean values in this graph (marked with □ in the figure), the region corresponding to each direction can be defined as in FIG. 5. The lines show how the projection coefficients obtained by the image recognition method of this embodiment change with the direction of the input face images of several persons. Most of them follow smooth trajectories of similar shape, which in turn makes it easy to estimate the face direction from the coefficients.
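One simple way to realize such a quantizer is nearest-centroid assignment in the (u, v) plane. This is an assumption made for illustration: the text only says that regions are defined from the mean positions, and the exact region boundaries of FIG. 5 are not given. The centroid values below are hypothetical.

```python
import math


def estimate_direction(u, v, centroids):
    """centroids: {direction_deg: (mean_u, mean_v)}.

    Returns the direction whose mean projection coefficients are nearest to (u, v).
    """
    return min(centroids,
               key=lambda d: math.hypot(u - centroids[d][0], v - centroids[d][1]))


# Hypothetical mean positions for three directions.
centroids = {-10: (4.0, 1.0), 0: (5.0, 0.0), 10: (6.0, 1.2)}
print(estimate_direction(5.2, 0.1, centroids))  # 0
```

With nearest-centroid assignment, the regions are the Voronoi cells of the mean positions; a hand-drawn partition as in FIG. 5 could replace this rule without changing the rest of the pipeline.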

[0034] FIG. 7 shows an example of a specific system configuration of the image recognition device. A general-purpose image processing device, a personal computer, or the like can be used as the image recognition device.

[0035] The image recognition device includes a CPU 7, a system memory 8, a hard disk drive (HDD) 9, an input/output processing unit (I/O) 10 connected to a scanner, a keyboard 11, a mouse 12, a display 13, and so on. A program is read from a recording medium (not shown) and installed on the HDD 9. When the program is executed, it is read from the recording medium or the HDD 9, loaded into the system memory 8, and executed by the CPU 7.

[0036] The system operation of the image recognition device of FIG. 7 is described with reference to FIG. 8.

[0037] When the program starts, a menu selection screen is displayed on the display 13. In this embodiment, the menu offers a learning image registration mode and a face direction recognition mode. When the user selects the learning image registration mode with the keyboard 11 or the mouse 12 (step S10), the selection is identified and the procedure moves from step S20 to step S100.

[0038] The user sets the image originals to be input on the scanner and has them read. The read images are stored in the system memory 8 and displayed one after another on the display 13. Using the mouse 12, the user designates the facial feature points on the display screen, namely both ends of the eyes, the tip of the nose, the point between the eyebrows, both ends of the lips, and the upper center of the lips. The user also enters the face direction and the image identification number (ID) with the keyboard 11 and the mouse 12. The image data corresponding to the designated positions are extracted from the input image stored in the system memory 8. The extracted image data and the face direction are stored, for each read image, in another area of the system memory 8.

[0039] The CPU 7 groups the extracted and stored image data by face direction and then calculates features from these image data. Here processes 1 to 3 described with reference to FIG. 1 are performed, and the calculated features (the principal components of the mean feature vectors) and the corresponding face directions are stored on the HDD 9 (steps S100 → S110 → S120). This completes the learning image registration process.

[0040] To recognize a face direction, the user selects the face direction recognition mode on the menu screen with the mouse 12 or the like (step S10). Following this selection, the procedure advances through steps S10 to S30 and on to step S200.
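The branching between the two menu modes can be sketched as a small dispatcher. The mode strings and callback names are hypothetical, and the mapping of the checks to steps S20 and S30 is inferred from the step sequences quoted in the text rather than from FIG. 8 itself.

```python
def run_selected_mode(mode, register_learning_images, recognize_face_direction):
    """Dispatch a menu choice (step S10) to the matching procedure."""
    if mode == "register":                      # assumed check at step S20
        return register_learning_images()       # steps S100 to S120
    if mode == "recognize":                     # assumed check at step S30
        return recognize_face_direction()       # steps S200 onward
    raise ValueError(f"unknown mode: {mode!r}")


# Usage with stand-in callbacks.
print(run_selected_mode("recognize", lambda: "registered", lambda: "recognized"))  # recognized
```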

[0041] The user sets the image original to be recognized on the scanner and has the scanner read it. The read image is stored in the system memory 8 and also displayed on the display 13 (step S200).

[0042] The user designates the facial feature points in the displayed image, that is, feature points at the same positions as those used when registering the learning images, such as both ends of the eyes, the tip of the nose, and the point between the eyebrows (step S210). The CPU 7 extracts the image data at the designated positions from the system memory 8 and calculates the feature vector. This corresponds to process 4 in FIG. 1. The face direction in the image is then recognized using the principal components of the mean feature vectors and the directions stored on the HDD 9, together with the feature vector obtained from the image to be recognized. This is carried out by processes 5 and 6 in FIG. 1. The recognition result is displayed on the display 13, for example as the angle of the face direction.

[0043] As described above, this embodiment requires less computation than the conventional method and can efficiently represent and estimate the direction of a face about the vertical axis.

[0044] In addition to the embodiment described above, the following variations can be implemented.

1) In the embodiment described above, the feature points of the input image to be recognized and of the learning images were designated manually with the mouse, but the designation can instead be performed automatically by the image recognition apparatus. As one example, the contour lines of the image are extracted and the facial part to which each extracted contour belongs is determined. Color, or image templates of facial parts such as the eyes, nose, and mouth, may be used for this determination. If color is used, one can exploit the fact that the lips contain a large red component and the eyes contain large black and white components. The feature points may be chosen selectively from the eyes, lips, nose, and so on, and are not limited to those of the embodiment described above. Facial parts other than the eyes, lips, and nose can also be used as feature point positions (specific positions), but positions that shift distinctly each time the face orientation changes are preferable.

2) In the embodiment described above, the image recognition apparatus is given a learning function, that is, a function of extracting features from the learning image data. Alternatively, the features may be extracted from the learning image data in advance by another image processing apparatus, and the extracted features and their orientations may be stored on the hard disk of the image processing apparatus described above via a portable recording medium or by communication.

3) As the recording medium on which the program for recognizing the face is recorded, an IC memory such as a ROM or RAM, a hard disk, a CD-ROM, or any other well-known storage device can be used. When the program is downloaded to the image recognition apparatus from another device by communication, the storage device that stores the program in that other device is the recording medium of the present invention.

4) The embodiment described above uses still images for the recognition processing, but it goes without saying that the images to be recognized and the learning images may be moving images. In the case of a moving image, the plurality of still images that make up the moving image are processed.

5) A face database for face image processing (detection, tracking, region extraction, person recognition, etc.) can be constructed using the image recognition method described above. See the references below.

[0045]
・ Japanese Patent Application No. Hei 11-206764
・ Simon CLIPPINGDALE, Takayuki Ito, "Unified approach to face detection, tracking and recognition of moving images", IEICE PRMU98-200, January 1999

Various modifications other than the embodiments described above are possible. As long as a modification follows the technical idea described in the claims, it falls within the technical scope of the present invention.
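As an illustration of the color cue mentioned in variation 1) of paragraph [0044] (lips rich in red, eyes rich in black and white), candidate feature point positions might be located along the following lines. This is a rough sketch only: the thresholds (`margin`, `low`, `high`) and the function names are hypothetical, the input is assumed to be an 8-bit RGB array of shape (height, width, 3), and the contour extraction and template matching also mentioned in the text are not shown.

```python
import numpy as np

def red_dominant_mask(rgb, margin=40):
    """Pixels whose red channel exceeds green and blue by `margin` (assumed threshold)."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return (r - g > margin) & (r - b > margin)

def dark_or_bright_mask(rgb, low=50, high=220):
    """Pixels that are nearly black or nearly white (assumed thresholds)."""
    gray = rgb.astype(int).mean(axis=2)
    return (gray < low) | (gray > high)

def centroid(mask):
    """Center (row, col) of the True pixels, or None if the mask is empty."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return (int(round(rows.mean())), int(round(cols.mean())))
```

In practice such masks would be evaluated only inside a detected face region and combined with the contour or template information, since red or dark pixels also occur elsewhere in an image.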

[0046]

[Effects of the Invention] As described above, image features are extracted from specific positions on the face, such as the eyes, nose, and lips, whose positions change distinctly when the face orientation changes. Compared with the conventional method, which extracts features from all pixels of the image data, the amount of computation involved in recognizing the face orientation is therefore greatly reduced, and recognition performance is not degraded.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the functional configuration of an embodiment of the present invention.

FIG. 2 is an explanatory diagram for explaining the feature points of the embodiment of the present invention.

FIG. 3 is an explanatory diagram showing the information processing of the embodiment of the present invention.

FIG. 4 is an explanatory diagram showing the information processing of the embodiment of the present invention.

FIG. 5 is an explanatory diagram showing the information processing of the embodiment of the present invention.

FIG. 6 is an explanatory diagram showing the projection coefficients of the embodiment of the present invention.

FIG. 7 is a block diagram showing the system configuration of the embodiment of the present invention.

FIG. 8 is a flowchart showing the processing procedure of the embodiment of the present invention.

[Explanation of symbols]

7 CPU
8 System memory
9 HDD
10 I/O
11 Keyboard
12 Mouse
13 Display

Continued from the front page

F-terms (reference): 5L096 EA13 EA15 EA16 FA32 FA38 FA67 FA81 HA09 JA05 JA11 JA22 KA04; 9A001 HH21 HH23

Claims

[Claims]

1. An image recognition apparatus that extracts features from learning image data whose face orientation is known in advance and from image data of a face to be recognized, and recognizes the face orientation in the image data to be recognized using the extracted features, the apparatus comprising: storage means for storing features extracted in advance at specific positions of the face from the learning image data, together with the face orientations; designation means for designating positions in the image data to be recognized that correspond to the specific positions; and feature extraction means for extracting features at the designated positions from the image data to be recognized, wherein the orientation of the face to be recognized is recognized using the features extracted by the feature extraction means and the face orientations and features stored in the storage means.

2. The image recognition apparatus according to claim 1, further comprising: input means for inputting the learning image data and the face orientation; and learning image data feature extraction means for extracting features at the specific positions from the input learning image data and storing them in the storage means.

3. The image recognition apparatus according to claim 1, wherein the learning image data feature extraction means extracts Gabor wavelet coefficients from the specific positions of a plurality of items of learning image data, calculates an average feature vector of a plurality of persons for each face orientation, and uses the calculated average feature vector as the feature, and wherein the extraction means extracts Gabor wavelet coefficients from the positions corresponding to the specific positions in the image data to be recognized, calculates an average feature vector of a plurality of persons for each orientation, and uses the calculated average feature vector as the feature.

4. The image recognition apparatus according to claim 1, wherein the specific positions include any of both ends of the eyes, the space between the eyebrows, the top of the nose, both ends of the lips, and the upper center of the lips.

5. A recording medium on which is recorded a program to be executed by an image recognition apparatus that extracts features from learning image data whose face orientation is known in advance and from image data of a face to be recognized, and recognizes the face orientation in the image data to be recognized using the extracted features, wherein storage means in the image recognition apparatus stores features extracted in advance at specific positions of the face from the learning image data, together with the face orientations, and wherein the program comprises: a designation step of designating positions in the image data to be recognized that correspond to the specific positions; a feature extraction step of extracting features at the designated positions from the image data to be recognized; and a step of recognizing the orientation of the face to be recognized using the features extracted in the feature extraction step and the face orientations and features stored in the storage means.

6. The recording medium according to claim 5, wherein the program further comprises: an input step of inputting the learning image data and the face orientation; and a learning image data feature extraction step of extracting features at the specific positions from the input learning image data and storing them in the storage means.

7. The recording medium according to claim 5, wherein, in the learning image data feature extraction step, Gabor wavelet coefficients are extracted from the specific positions of a plurality of items of learning image data, an average feature vector of a plurality of persons is calculated for each face orientation, and the calculated average feature vector is used as the feature, and wherein, in the extraction step, Gabor wavelet coefficients are extracted from the positions corresponding to the specific positions in the image data to be recognized, an average feature vector of a plurality of persons is calculated for each orientation, and the calculated average feature vector is used as the feature.

8. The recording medium according to claim 5, wherein the specific positions include any of both ends of the eyes, the space between the eyebrows, the top of the nose, both ends of the lips, and the upper center of the lips.