JPH05232908A

JPH05232908A - Instruction input device

Info

Publication number: JPH05232908A
Application number: JP4031804A
Authority: JP
Inventors: Yoshinori Kuno; 義徳久野
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1992-02-19
Filing date: 1992-02-19
Publication date: 1993-09-10

Abstract

PURPOSE: To provide an instruction input device capable of inputting information to a computer or the like based on the appearance of an operator. CONSTITUTION: The instruction input device comprises a video camera 3 for capturing images of an operator 1; image memories 57, 75c for storing the images of the operator input from the camera 3; a discriminating means for reading from the memories 57, 75c an image of the operator's motion at an arbitrary time and an image of the operator's motion at a different time, and discriminating the operator's motion by comparing these images; a judging means for judging the operator's gesture from the motion discriminated by the discriminating means; an input means for specifying the start of the gesture through an input based on a prescribed operation of the operator 1; and a processing means for acting on the gesture judged by the judging means when the start of the gesture is specified by the input means.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an instruction input device that conveys an operator's instructions and intentions to an arbitrary device such as a computer, so that the operator can use that device easily.

[0002]

[Prior Art] Conventionally, instructions are generally entered into a computer as commands using a keyboard or a mouse. To convey more information, or to give instructions more easily, it has been proposed to use the operator's line of sight (that is, information about where the operator is looking) as well as body and hand gestures. Dedicated devices exist for measuring the line of sight, for example an eye camera, but the operator must wear the eye camera and calibrate it beforehand, which is a burden. Hand signs can likewise be input to a computer using a device such as a data glove. In either case, using such a device burdens the operator with putting it on and calibrating it.

[0003] For this reason, it has also been proposed to use an imaging device such as a video camera as the input device, to recognize the operator's facial expression, line of sight, and body and hand gestures from the captured images, and to use the result to instruct a computer. One example is Mase, Watanabe, and Suenaga, "Head Reader: Real-time detection of head motion from images" (Transactions of the Institute of Electronics, Information and Communication Engineers, Vol. J74-D-II, No. 3, pp. 398-406).

[0004] However, recognition by image processing is not always reliable. Even if a particular facial expression is assigned to a given instruction, an expression the operator makes unintentionally may be misrecognized and entered as that instruction. Moreover, even when the recognition itself is correct, for example when an object on the display screen is moved according to the line of sight or the orientation of the face, a casual, unconscious movement of the operator's face is also interpreted as an instruction, which can hinder smooth operation.

[0005] As noted above, recognition by image processing is not always reliable. In particular, when the background behind the operator is complex or the lighting conditions are poor, conventional methods that capture a still image and then run recognition on it often fail to segment the target object.

[0006] For example, when extracting a face from the background, a method that assumes the face is brighter than the background and extracts the high-luminance parts of the image does not work well when the background itself contains bright and dark areas. Processing also fails more often when the brightness of the background changes.

[0007]

[Problems to Be Solved by the Invention] As described above, conventional devices could not reliably use the operator's appearance and the like for instruction input without burdening the operator.

[0008] The present invention has been made in view of these problems. Its object is to provide a device that allows information to be input reliably to a computer or the like from the operator's appearance, without the operator having to wear any special equipment.

[0009]

[Means for Solving the Problems] To achieve the above object, the first invention of the present application comprises first input means capable of inputting an expression of the operator's intention; second input means for accepting an input made by a predetermined action of the operator; and processing means for putting into effect the expression of intention received through the first input means in response to the input made to the second input means by the operator's predetermined action.

[0010] The second invention of the present application comprises imaging means for capturing images of the operator; storage means for storing the images of the operator input from the imaging means; identifying means for reading from the storage means an image of the operator's motion at an arbitrary time and an image of the operator's motion at a different time, and identifying the operator's motion by comparing these images; determining means for determining the operator's expression of intention from the motion identified by the identifying means; input means for indicating the start of an expression of intention through an input made by a predetermined action of the operator; and processing means for putting into effect the expression of intention determined by the determining means when the input means has indicated the start of an expression of intention.

[0011] The third invention of the present application is the instruction input device according to claims 1 and 2, wherein the input through the input means is a motion, identified by the identifying means, by which the operator indicates the start of an expression of intention.

[0012] Further, the fourth invention of the present application is the instruction input device according to claims 1, 2, and 3, wherein the processing means puts into effect the operator's expression of intention starting from the time of the input made by the operator's predetermined action, or from a point a predetermined time before that input.

[0013]

[Operation] With the above configuration, in the instruction input device of the first invention, when an input is made to the second input means by a predetermined action of the operator, the expression of intention input through the first input means is put into effect.

[0014] In the instruction input device of the second invention, the identifying means identifies the operator's motion by comparing an image of the operator's motion at an arbitrary time, input from the imaging means and stored in the storage means, with an image of the operator's motion at a different time. The determining means determines the operator's expression of intention from the motion identified by the identifying means. Further, when the input means indicates the start of an expression of intention, the expression of intention determined by the determining means is put into effect.

[0015] In the instruction input device of the third invention, the input through the input means is made by a motion of the operator identified by the identifying means.

[0016] In the instruction input device of the fourth invention, the operator's expression of intention is put into effect starting from the time of the input made by the operator's predetermined action, or from a point a predetermined time before that input.

[0017]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an instruction input device according to the present invention.

[0018] As shown in FIG. 1, a video camera 3 serving as the imaging means captures the motion of an operator 1; in this embodiment, it captures changes in the facial expression of the operator 1.

[0019] The image processing unit 5 comprises a CPU 51, an A/D converter 53, a buffer 55, and an image memory 57. The CPU 51 controls the flow of data and other operations throughout the image processing unit 5. The A/D converter 53 performs A/D conversion on the images of the motion of the operator 1 captured by the video camera 3, outputting the incoming analog signal as a digital signal. The digital image signal output from the A/D converter 53 is stored in the image memory 57 via the buffer 55.

[0020] The central processing unit 7 comprises a CPU 71, an input-side interface (I/F) 73a, an output-side interface (I/F) 73b, a ROM 75a, a RAM 75b, and an image memory 75c. The CPU 71 controls the flow of data and other operations in the central processing unit 7 and in the instruction input device as a whole. The input-side I/F 73a is connected to an infrared sensor 11, a keyboard 13, a mouse 15, a microphone 17, a foot switch 19, and the like, and relays the various signals input from them. The output-side I/F 73b is connected to a display device 9 and, via a speech synthesis circuit 21, to a speaker 23, and relays signals from the central processing unit 7 to them.

[0021] The ROM 75a stores the programs that control the central processing unit 7 and the instruction input device as a whole, and contains the identifying means, the determining means, and the processing means. The RAM 75b holds, for example, the various signals input via the input-side I/F 73a from the infrared sensor 11, the keyboard 13, the mouse 15, and so on, as well as programs entered and set from the keyboard 13. The image memory 75c, serving as the storage means, stores the images held in the image memory 57 of the image processing unit 5 and images processed by the central processing unit 7.

[0022] The display device 9 displays the images of the motion of the operator 1 captured by the video camera 3, images processed by the image processing unit 5 or the central processing unit 7, and the like.

[0023] The infrared sensor 11 detects that the operator 1 has taken up a position in front of the system, which allows the system to start operating automatically.

[0024] The keyboard 13 is used to enter various instructions into the system, for example commands or the attention-calling operation. The attention-calling operation and similar inputs can also be made with the mouse 15, the microphone 17, or the foot switch 19. In this embodiment, the keyboard 13 is additionally provided with a dedicated key 13a, described later, to improve ease of use.

[0025] The speech synthesis circuit 21 synthesizes a predetermined speech signal based on the signal generated and output by the central processing unit 7, and outputs the speech from the speaker 23.

[0026] The spotlight 25 creates contrast between the face of the operator 1 and the background to make image processing easier. It is positioned to illuminate the operator 1 from slightly off to one side rather than from directly in front.

[0027] The keyboard 13, the dedicated key 13a, the mouse 15, the microphone 17, and the foot switch 19 constitute the input means that indicates the start of an expression of intention through an input made by a predetermined action of the operator, and also the second input means.

[0028] Next, the operation of this embodiment is described. The operator could use many kinds of bodily expression to give instructions such as the attention-calling operation; this description takes as its example the shape and movement of the mouth, one of the features of the face.

[0029] First, when the operator 1 sits down in front of the display device 9, the infrared sensor 11 detects the presence of a person, namely the operator 1, and activates the central processing unit 7. The system then prepares to capture images from the video camera 3, which is aimed at the operator 1, whenever they are needed during the processing described below. Each image is converted from analog to digital by the A/D converter 53 and stored temporarily in the image memory 57 via the buffer 55.

[0030] This image is processed by software stored in the ROM 75a of the central processing unit 7, which performs face and mouth detection, mouth tracking, mouth movement identification, and so on. Various methods can be realized depending on how these three processes (face and mouth detection, mouth tracking, and mouth movement identification) are scheduled over time. Each process is described in turn below.

[0031] First, the face and mouth detection process is described with reference to the flowchart of FIG. 2. In step S1, image capture starts when the infrared sensor 11 detects the operator 1, as described above. In step S3, the image at the current time t_{n+1} and an image captured at a predetermined earlier time t_n are read in, and in step S5 the absolute value of the difference between the two images is computed for each pixel. Then, in step S7, the difference result is binarized: pixels with large difference values are extracted and set to "1".
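Steps S3 to S7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: frames are assumed to be grayscale images stored as lists of rows, and the function names and the threshold of 50 are hypothetical choices, since the patent does not state a threshold.

```python
def abs_difference(prev_frame, curr_frame):
    """Per-pixel absolute difference of two equally sized grayscale frames (step S5)."""
    return [[abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_frame, curr_frame)]


def binarize(diff, threshold):
    """Set pixels whose difference is at least `threshold` to 1, all others to 0 (step S7)."""
    return [[1 if value >= threshold else 0 for value in row] for row in diff]


# Two tiny 4x4 frames: a bright 2x2 patch moves one pixel to the right.
frame_tn = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]
frame_tn1 = [
    [10, 10, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 10, 10],
]

# The threshold value is an assumption chosen for this toy data.
motion_mask = binarize(abs_difference(frame_tn, frame_tn1), threshold=50)
```

Only the pixels that changed between the two frames are set in `motion_mask`; the static background and the overlapping part of the patch drop out, which is why differencing is less sensitive to a cluttered background than thresholding a single still image.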

[0032] Next, in step S9, the horizontal and vertical projections are computed (here, the number of "1" pixels on each horizontal and each vertical line). Because the face of the operator 1 moves at least slightly between the two times t_n and t_{n+1}, the binarization in step S7 extracts a region S such as that shown in FIG. 3, whose projections are as shown in the accompanying plots of the horizontal and vertical projections. Taking the part where the projection values are large, for example region F, yields a rectangular region enclosing the face (hereinafter the face region E_F) (step S11). The lower part of the face can be hard to separate from the projections because of movement of the neck, shoulders, and so on. Since the height-to-width ratio of a face is roughly fixed, the lower limit of the face region may in that case be determined from the width.
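The projection step and the extraction of the face rectangle can be illustrated as follows. This is a sketch under assumptions: the mask is a list of rows of 0/1 values, `min_count` and `aspect_ratio` are hypothetical parameters, and taking the first and last line whose count reaches `min_count` is one simple way to read "the part where the projection values are large".

```python
def projections(mask):
    """Return (row_counts, col_counts): the number of 1-pixels on each row and column."""
    row_counts = [sum(row) for row in mask]
    col_counts = [sum(col) for col in zip(*mask)]
    return row_counts, col_counts


def span_above(counts, min_count):
    """First and last index whose count is at least `min_count`, or None if there is none."""
    hits = [i for i, c in enumerate(counts) if c >= min_count]
    return (hits[0], hits[-1]) if hits else None


def face_region(mask, min_count=1, aspect_ratio=None):
    """Bounding rectangle (top, bottom, left, right) of the moving region.

    If `aspect_ratio` (height / width, an assumed value) is given, the lower
    limit is derived from the width instead of the row projection, which helps
    when neck or shoulder motion extends the region downward.
    """
    row_counts, col_counts = projections(mask)
    rows = span_above(row_counts, min_count)
    cols = span_above(col_counts, min_count)
    if rows is None or cols is None:
        return None  # no face found; the caller would retry with the next frame
    top, bottom = rows
    left, right = cols
    if aspect_ratio is not None:
        width = right - left + 1
        bottom = min(len(mask) - 1, top + round(width * aspect_ratio) - 1)
    return top, bottom, left, right


example_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],  # extra motion below the face, e.g. the neck
]
```

With the example mask, `face_region(example_mask)` includes the bottom row, while `face_region(example_mask, aspect_ratio=1.0)` cuts the rectangle off one row earlier using the width.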

[0033] If the above processing finds no part with large projection values, step S13 concludes that no face was detected and returns to step S3, where the image at the next time is input and the processing is repeated.

[0034] Next, the mouth region (region E_M in FIG. 4) is detected. The position of the mouth relative to the position of the face is fixed within a certain range, as shown in FIG. 4. In step S15, the mouth region E_M is therefore estimated from this predetermined positional relationship, and the area around the mouth is extracted (step S17). The position of the mouth is then narrowed down more precisely within the mouth region E_M. Several methods can be used for this.
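Estimating the mouth region from the face rectangle amounts to applying fixed relative offsets. The sketch below assumes a face rectangle given as (top, bottom, left, right) in pixel indices; the percentages are hypothetical placeholders, because the patent only says that the relationship is predetermined (FIG. 4) and gives no numbers.

```python
def estimate_mouth_region(face, top_pct=65, bottom_pct=95, left_pct=25, right_pct=75):
    """Estimate the mouth rectangle inside a face rectangle.

    `face` is (top, bottom, left, right), inclusive pixel indices.
    The percentage parameters are illustrative assumptions, not values
    taken from the patent.
    """
    top, bottom, left, right = face
    height = bottom - top + 1
    width = right - left + 1
    mouth_top = top + height * top_pct // 100
    mouth_bottom = top + height * bottom_pct // 100 - 1
    mouth_left = left + width * left_pct // 100
    mouth_right = left + width * right_pct // 100 - 1
    return mouth_top, mouth_bottom, mouth_left, mouth_right


example_face = (10, 109, 20, 99)  # a face 100 pixels high and 80 pixels wide
example_mouth = estimate_mouth_region(example_face)
```

Integer arithmetic is used so that the result does not depend on floating-point rounding. The rectangle obtained this way is only a search window; the refinement methods described next locate the mouth within it.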

[0035] One method computes the projections of the absolute-difference image around the mouth, as in the face extraction, and takes the part where the projection values are large. Alternatively, the area around the mouth in the current original image can be binarized to extract the mouth (the high-density, that is dark, part). This can be done either by labeling the connected regions in the binarized result and taking a region with a large area as the mouth, or by computing the horizontal and vertical projections of the binarized result, as for the difference image, and cutting out the region with large projection values. Alternatively, the horizontal and vertical projections of the original image can be computed without binarization and the region around the mouth determined from them in the same way. The mouth contour may also be obtained by edge detection or by snakes (M. Kass, A. Witkin, D. Terzopoulos: Snakes: Active contour models, Proceedings of 1st International Conference on Computer Vision, pp. 269-276, 1987).

[0036] If the above processing cannot detect a region corresponding to the mouth, step S19 returns to step S3 and the processing is run again on the image at the next time.

[0037] FIG. 5 is a flowchart of the mouth tracking process. First, in step S21, the images at the next time t_{m+1} and the previous time t_m are read in. The process then determines where the mouth has moved in the new image relative to time t_m (step S23). For this, the tracking method used in another proposal by the inventor (proposal number 13A90X195-1) can be applied to the area around the mouth cut out from the previous image. Alternatively, the image around the mouth at the previous time can be used as a template: the sum of absolute differences or the correlation coefficient is computed at each point around the previous mouth position in the current image, and the position with the highest similarity is selected. The movement of the mouth may also be tracked by other means, for example the snakes mentioned above. If step S25 finds that tracking succeeded, the same processing is repeated (steps S25, S21, S23). If tracking fails, the process normally proceeds to step S27 and returns to the face and mouth detection process.
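The template-matching variant of the tracking step can be sketched with the sum of absolute differences (SAD), where a lower SAD means higher similarity. The following is an illustration under assumptions: images are lists of rows of grayscale values, and the search radius `search` and the failure threshold `max_sad` are hypothetical parameters that the patent does not specify.

```python
def crop(image, top, left, height, width):
    """Cut a height x width window out of an image stored as a list of rows."""
    return [row[left:left + width] for row in image[top:top + height]]


def sad(a, b):
    """Sum of absolute differences between two equally sized windows."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def track(prev_image, curr_image, prev_pos, size, search=2, max_sad=None):
    """Find the window in curr_image that best matches the previous mouth window.

    Returns the new (top, left) position, or None when the best match is worse
    than `max_sad`, which the caller can treat as a tracking failure.
    """
    top, left = prev_pos
    height, width = size
    template = crop(prev_image, top, left, height, width)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + height > len(curr_image) or l + width > len(curr_image[0]):
                continue  # candidate window would leave the image
            score = sad(template, crop(curr_image, t, l, height, width))
            if best is None or score < best[0]:
                best = (score, (t, l))
    if best is None or (max_sad is not None and best[0] > max_sad):
        return None
    return best[1]


def make_image(mouth_top, mouth_left):
    """Toy 6x6 image: background 100 with a dark 2x2 'mouth' of value 20."""
    image = [[100] * 6 for _ in range(6)]
    for r in (mouth_top, mouth_top + 1):
        for c in (mouth_left, mouth_left + 1):
            image[r][c] = 20
    return image


image_tm = make_image(2, 1)
image_tm1 = make_image(3, 2)
new_position = track(image_tm, image_tm1, prev_pos=(2, 1), size=(2, 2))
```

Returning `None` on a poor match corresponds to the failure branch of the flowchart, after which detection would be run again. A fixed template of this kind is exactly what breaks down when the mouth changes shape, as the next paragraph notes.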

[0038] Such failures tend to occur when the operator changes the shape of the mouth substantially. They are especially frequent when tracking relies on a similarity measure such as a correlation coefficient and the shape of the mouth changes. The first method and the snakes method fail less often under such changes in mouth shape.

[0039] FIG. 6 shows a flowchart of the mouth movement identification process. This process uses images from two successive times, for example the times t_m and t_{m+1} mentioned above. How the capture times of those images relate to other events is discussed later; here the processing is described on the assumption that the two images are given. The mouth movement identification can be realized in the same way using images from three times, which is also described later.

[0040] Because the operator's intention is ultimately interpreted by the mouth movement identification process of FIG. 6, a simple and reliable procedure is used here.

[0041] For communicating intention with the mouth, the usual choices are opening and closing the mouth and moving the mouth left, right, up, or down. In the latter case, however, the mouth itself does not actually move; the position of the mouth changes because the whole face moves.

[0042] First, in step S31, the difference between the images at two times, for example t_m and t_{m+1}, is computed over the region around the mouth found by the tracking process. Here, bright parts of the digital image are assumed to have large values and dark parts small values. (The opposite convention, with small values for bright parts and large values for dark parts, may also be used, in which case the signs in the following description are reversed.) In what follows, the difference computed in step S31 is taken to be the image after the movement minus the image before it.

[0043] FIG. 7 schematically shows, under this convention, the patterns of difference values produced by opening and closing the mouth and by moving it up, down, left, and right. In FIG. 7, the solid lines show the shape before the movement and the broken lines the shape after it, and the figure indicates roughly where the difference is positive and where it is negative. The processing therefore only needs to distinguish these patterns. To do so, the parts of the difference result whose absolute value is at or above a fixed threshold are extracted by binarization, separately for positive and negative values (step S33). For each of the two, the horizontal and vertical projections are computed, as in the face detection (step S35). Alternatively, the difference result may be split into positive and negative parts without binarization, and the projections taken either of the values as they are or of only those values whose absolute value is at or above a fixed value. Schematically, the projections for the various mouth movements look like FIGS. 8 to 10. As these figures show, the mouth movement can be identified from the combination of which peaks are present in the horizontal and vertical projections, together with the widths and relative positions of the peaks. The presence, width, and position of the peaks are therefore examined in each projection (step S37), and the mouth movement is identified using a decision table such as that shown in FIG. 11 (step S39).
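Steps S31 to S37 can be sketched as follows: a signed difference, separate positive and negative masks, and the peaks of each projection. This is an illustration under assumptions: images are lists of rows with bright pixels having large values, the threshold is a hypothetical value, and a "peak" is simplified to a run of consecutive lines whose count reaches `min_count`. The decision table of FIG. 11 is not reproduced in the text, so the final classification step is deliberately left out.

```python
def signed_difference(before, after):
    """Per-pixel difference: image after the movement minus image before it (step S31)."""
    return [[a - b for b, a in zip(row_b, row_a)] for row_b, row_a in zip(before, after)]


def split_masks(diff, threshold):
    """Binarize positive and negative differences separately (step S33)."""
    positive = [[1 if v >= threshold else 0 for v in row] for row in diff]
    negative = [[1 if v <= -threshold else 0 for v in row] for row in diff]
    return positive, negative


def peaks(counts, min_count=1):
    """Runs of consecutive indices with count >= min_count, as (start, end) pairs."""
    runs, start = [], None
    for i, c in enumerate(counts):
        if c >= min_count and start is None:
            start = i
        elif c < min_count and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(counts) - 1))
    return runs


def describe(mask):
    """Peaks of the row and column projections of a 0/1 mask (steps S35 and S37)."""
    row_counts = [sum(row) for row in mask]
    col_counts = [sum(col) for col in zip(*mask)]
    return {"rows": peaks(row_counts), "cols": peaks(col_counts)}


def mouth_image(dark_rows):
    """Toy 5x6 image: background 100, dark mouth pixels (20) in columns 1..4."""
    image = [[100] * 6 for _ in range(5)]
    for r in dark_rows:
        for c in range(1, 5):
            image[r][c] = 20
    return image


closed = mouth_image([2])        # thin dark line: closed mouth
opened = mouth_image([1, 2, 3])  # taller dark area: open mouth

# The threshold of 50 is an assumption chosen for this toy data.
pos_mask, neg_mask = split_masks(signed_difference(closed, opened), threshold=50)
pos_peaks = describe(pos_mask)
neg_peaks = describe(neg_mask)
```

In this toy case, opening the mouth darkens pixels above and below the original line, so only the negative mask has peaks: two in the row projection and one in the column projection. A table lookup over such peak patterns would then give the movement.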

[0044] Next, a method of implementing the invention by combining the above processes and adding the operator's attention-calling operation is described. FIGS. 12 to 18 show examples of how the processes can be combined over time. In these figures, a vertical line segment of medium length marks a time at which face and mouth detection is performed. Only one line is drawn, but the processing uses the images at the marked time and at the next time. A short line segment indicates that tracking is being performed: the image at the time marked by the segment and the image at the preceding time are used to track the current position of the mouth. The triangle and square marks indicate the capture times of the two images used in the mouth movement identification process.

[0045] A long line segment marks the time at which the attention-calling operation occurs. With this operation the operator signals the wish to convey an intention. It is performed by pressing a particular key or key combination on the keyboard 13. A dedicated key 13a may be provided separately from the ordinary keys (or key combinations). It may also be performed by pressing a button of the mouse 15 in a prescribed way, or by adding and operating the foot switch 19. The attention-calling operation conveys the intent to give an instruction and also causes the image at that moment to be captured; processing between that image and an image from another time makes the processing robust against changes in the surroundings, such as the background.

【００４６】注意喚起操作は、上に述べたようにキーボ
ード１３を押すなどで行われるが、キーボード１３など
を押した後、直ぐに離す必要はなく、しばらく押し続け
てもよいように入力制御部１１で入力を制御してもよい
（通常は、キーボード１３は押した時点だけ入力され
る、このような形になっている）。これは、注意喚起操
作をしてからある意思表示の動作を操作者がするが、そ
の動作と同時にキーボード１３を離す動作をするのは、
操作者にとって使いにくい場合があるからである。The alerting operation is performed by pressing the keyboard 13 or the like, as described above. The key need not be released immediately after it is pressed: the input control unit 11 may control the input so that the key can be held down for a while (normally, the keyboard 13 registers an input only at the moment a key is pressed). This is because the operator makes a gesture expressing an intention after the alerting operation, and having to release the keyboard 13 at the same time as making that gesture may be awkward for the operator.

【００４７】図１２は基本的な実施法である。顔・口検
出が成功すると追跡処理が始まる。追跡に失敗すると再
び顔・口検出が起動される。あるいは検出に失敗しなく
ても、時々顔・口検出を行い、追跡結果を修正してもよ
い。そして注意喚起操作を行うとその時点（三角印）と
次の適当な時間後（四角印）の画像を入力して、口の動
き識別処理を行う。この場合操作者は、注意喚起操作を
行ってから、意思伝達の動作を行う。FIG. 12 shows the basic implementation. When face/mouth detection succeeds, the tracking process starts. When tracking fails, face/mouth detection is started again. Alternatively, even when there is no failure, face/mouth detection may be run from time to time to correct the tracking result. When the alerting operation is performed, the image at that time (triangle mark) and the image at a suitable time afterwards (square mark) are input, and the mouth movement identification process is performed. In this case, the operator performs the alerting operation first and then makes the gesture that conveys the intention.
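
As a non-limiting illustration, the alternation between detection and tracking described for FIG. 12 might be sketched as follows. The names `Mode`, `schedule`, and `identification_times`, and the use of a boolean success flag per time step, are assumptions made for this sketch and do not appear in the specification.

```python
from enum import Enum


class Mode(Enum):
    DETECT = "face/mouth detection"
    TRACK = "mouth tracking"


def schedule(outcomes):
    """Return the process run at each time step.

    outcomes[i] is True when the process run at step i succeeded.
    Hypothetical helper: the boolean flag is an assumption of this sketch.
    """
    mode = Mode.DETECT
    history = []
    for succeeded in outcomes:
        history.append(mode)
        # A successful step (detection or tracking) is followed by tracking;
        # any failure sends the next step back to detection.
        mode = Mode.TRACK if succeeded else Mode.DETECT
    return history


def identification_times(alert_time, delay):
    """Times of the two images used for mouth movement identification:
    the alerting operation (triangle mark) and a later time (square mark)."""
    return alert_time, alert_time + delay


if __name__ == "__main__":
    for step, mode in enumerate(schedule([False, True, True, False, True])):
        print(step, mode.value)
    print(identification_times(10, 3))
```

The occasional re-detection that corrects the tracking result is left out of this sketch; it could be added by forcing `Mode.DETECT` at chosen steps.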

【００４８】図１３は図１２と同様であるが、動きの識
別の処理に注意喚起操作の直前（あるいは一定時間前）
の時点の追跡に用いた画像と注意喚起操作の時点の画像
の２枚の画像を用いる。この場合は操作者は意思表示の
動作の完了とともに注意喚起操作を行う。完全に動作が
完了してから注意喚起操作を行うという形でもよいが、
操作者がある意思表示をしようと思うと同時に注意喚起
操作の動作にも入ろうと思い、前者の実際の動作を開始
すると、その直後ほぼ同時に後者の動作を開始するよう
にしてもよい。実際には注意喚起操作から画像入力には
少しの時間遅れがあり、これを適当に調節して、使いや
すいタイミングを設定することができる。この点は、こ
の実施例の場合に限らず、この発明の他の実施例にも適
用することができる。すなわち、注意喚起で入力される
画像については、注意喚起操作から適当な時間遅れをも
って入力されてもよい。FIG. 13 is similar to FIG. 12, except that the movement identification process uses two images: the image used for tracking immediately before the alerting operation (or a fixed time before it) and the image at the time of the alerting operation. In this case, the operator performs the alerting operation upon completing the gesture that expresses the intention. The alerting operation may be performed after the gesture has been fully completed; alternatively, the operator may decide to make the gesture and to perform the alerting operation at the same moment, begin the gesture itself, and then begin the alerting operation almost immediately afterwards. In practice there is a slight delay between the alerting operation and the image input, and this delay can be adjusted to obtain a timing that is easy to use. This point applies not only to this embodiment but also to the other embodiments of the present invention. That is, an image input in response to the alerting operation may be input with a suitable delay after that operation.

【００４９】図１４は追跡の部分を省略した例である。
この場合、注意喚起操作の時点の画像と、その後のある
時点（図中の丸印）で顔・口部分を求め、さらにその後
のある時点の画像と、前のいずれかの時点の画像を用い
て、口の動作識別を行う。FIG. 14 shows an example in which the tracking part is omitted. In this case, the face/mouth portion is obtained from the image at the time of the alerting operation and an image at a certain time after it (circle mark in the figure), and mouth movement identification is then performed using an image at a still later time together with an image from one of the earlier times.

【００５０】さらに簡略化した方法として、顔・口検出
と口の動作識別に同一時点の画像を用いるようにしても
よい。この場合は両方の処理とも、注意喚起操作時点の
画像とその後のある時点の画像を用いて処理を行うこと
になる。口の動きでなく、顔全体の動きで意思を伝える
場合には、この方法で十分である。顔の動きで意思を伝
える方法の例については後で述べる。As a further simplified method, images at the same time may be used for face / mouth detection and mouth movement identification. In this case, both processes are performed using the image at the time of the alerting operation and the image at a certain point after that. This method is sufficient when the intention is to be communicated by the movement of the whole face rather than the movement of the mouth. An example of how to convey your intentions by moving your face will be described later.

【００５１】図１５も追跡の部分を省略した例である。
この例では、画像の入力だけは図中のＸ印で示すよう
に、たえず適当な間隔で行っておく。そして、直前の適
当な枚数だけメモリ５７に蓄えておくようにしておく。
注意喚起操作があったら、その前の適当な時間の蓄えら
れた画像（図中の丸印）と注意喚起操作の時点の画像を
用いて、顔・口検出を行い、さらに注意喚起操作時点と
その次の時点の画像を用いて、口の動き識別を行う。こ
の方法では図１４の方法に比べ、適切な処理を行うのに
必要な時間間隔を自由に設定できる。この方法と図１３
の方法の合成したものとして、四角印の入力時点を図中
の矢印のように注意喚起操作の前にもってきて、図１３
の場合のような操作を行う方法もある。また、図１４の
説明の後半で述べたように、顔・口検出処理を省略する
方法もある。この場合は図１３の場合のような操作法に
なる。FIG. 15 also shows an example in which the tracking part is omitted. In this example, only image input is carried out continuously at suitable intervals, as indicated by the X marks in the figure, and a suitable number of the most recent images are kept in the memory 57. When an alerting operation occurs, face/mouth detection is performed using a stored image from a suitable time before it (circle mark in the figure) and the image at the time of the alerting operation; mouth movement identification is then performed using the image at the time of the alerting operation and the image at the next time. Compared with the method of FIG. 14, this method allows the time interval needed for proper processing to be set freely. As a combination of this method and the method of FIG. 13, the input time of the square mark may be moved to before the alerting operation, as shown by the arrow in the figure, so that the device is operated as in the case of FIG. 13. As mentioned in the latter half of the description of FIG. 14, the face/mouth detection process may also be omitted. In that case the operating procedure is the same as in the case of FIG. 13.
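
As a non-limiting illustration, keeping only a suitable number of the most recent images and retrieving one from before the alerting operation might be sketched as follows. The class `FrameBuffer`, its method names, and the `lead` parameter are hypothetical; the specification only states that recent images are kept in the memory 57.

```python
from collections import deque


class FrameBuffer:
    """Holds the most recent `capacity` frames; older ones are discarded."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest entry automatically.
        self._frames = deque(maxlen=capacity)

    def push(self, timestamp, image):
        self._frames.append((timestamp, image))

    def frame_before(self, alert_time, lead):
        """Return the newest stored frame taken at or before
        alert_time - lead, or None if no such frame is still stored."""
        target = alert_time - lead
        for timestamp, image in reversed(self._frames):
            if timestamp <= target:
                return timestamp, image
        return None


if __name__ == "__main__":
    buffer = FrameBuffer(capacity=3)
    for t in range(5):
        buffer.push(t, f"frame{t}")      # stand-in for real image data
    print(buffer.frame_before(alert_time=4, lead=1))
```

The buffer capacity bounds how far back the earlier image can lie, which is one way to read the remark that the interval can be set freely.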

【００５２】図１６は追跡は行わないが、顔・口検出は
適当な時間ごとに行っておく方法である。図１４の場合
に比べて、顔・口部分検出処理を注意喚起操作後に行う
必要がないので、口の動き識別の処理に直ぐ入れ、早い
応答が可能である。FIG. 16 shows a method in which the tracking is not performed, but the face / mouth detection is performed at appropriate intervals. Compared with the case of FIG. 14, since it is not necessary to perform the face / mouth part detection process after the alerting operation, it is possible to immediately enter the process of identifying the movement of the mouth and provide a quick response.

【００５３】図１７は図１５と図１６の場合を合わせた
もので、追跡は行わないが、顔・口検出の間にも画像入
力はしておき、四角印と三角印の画像を用い口の動き識
別を行い、図１３のような操作をするものである。FIG. 17 combines the cases of FIG. 15 and FIG. 16. Tracking is not performed, but images continue to be input between face/mouth detections; mouth movement identification is performed using the images at the square and triangle marks, and the device is operated as in FIG. 13.

【００５４】図１８は図１９のように操作者が口（顔）
を上下左右に動かすことによりディスプレイ９上のアイ
コンを動かしたり、マルチウィンドウのウィンドウの移
動や、その他のポインティングを行うような場合に使用
される方法である。まず注意喚起操作が起こると、その
時点と次の時点の画像で顔・口の位置検出を行い、その
後、口の追跡処理が実行され、その動きに応じてディス
プレイ画面上の操作対象が動く。操作者が満足する位置
に移動などができたら、操作者は再び注意喚起操作を行
う。この操作の起こった時点の結果が、最終的な移動等
の結果として採用される。注意喚起操作としては毎回キ
ーボード１３を押すような操作をしてもよいが、最初の
操作でキーボートを押し、それを押し続け、２回目の注
意喚起操作は、それを離すことにより行うように入力側
インタフェース７３ａを設定しておいてもよい。また、
顔・口検出処理を省略する実施例も可能である。その場
合は丸印の画像入力は不要である。FIG. 18 shows a method used when, as shown in FIG. 19, the operator moves the mouth (face) up, down, left, or right in order to move an icon on the display 9, move a window in a multi-window environment, or perform other pointing operations. When an alerting operation occurs, the face/mouth position is first detected from the images at that time and the next time; the mouth tracking process is then executed, and the operation target on the display screen moves according to the mouth movement. Once the target has been moved to a position the operator is satisfied with, the operator performs the alerting operation again. The result at the time of this second operation is adopted as the final result of the movement or other action. The alerting operation may consist of pressing the keyboard 13 each time; alternatively, the input-side interface 73a may be set up so that the key is pressed and held for the first operation and the second alerting operation is performed by releasing it. An embodiment in which the face/mouth detection process is omitted is also possible. In that case, the image input at the circle mark is unnecessary.
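
As a non-limiting illustration, the press-and-hold variant described for FIG. 18 might be sketched as follows. The event tuples and the function name `pointing_session` are assumptions made for this sketch; mouth displacements are taken as already measured by the tracking process.

```python
def pointing_session(events, start=(0, 0)):
    """Apply tracked mouth displacements to a screen object between two
    alerting operations (key press, then key release).

    events: iterable of ("press",), ("release",) or ("move", dx, dy).
    Returns the position adopted at the second alerting operation, or
    None if the session never completes.
    """
    x, y = start
    active = False
    for event in events:
        kind = event[0]
        if kind == "press" and not active:
            active = True                      # first alerting operation
        elif kind == "move" and active:
            x, y = x + event[1], y + event[2]  # follow the mouth movement
        elif kind == "release" and active:
            return (x, y)                      # second alerting operation
    return None


if __name__ == "__main__":
    events = [("move", 5, 5), ("press",), ("move", 2, 0),
              ("move", 1, -3), ("release",), ("move", 9, 9)]
    print(pointing_session(events))
```

Movements before the press and after the release are ignored, so only gestures made while the key is held affect the object.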

【００５５】以上が本発明の基本的な実施例であるが、
本発明は以下に述べるような方法をその構成手段の実施
法としてもよい。例えば操作者の意思伝達に顔の大きな
動きを用いる場合は、図２０にフローチャートを示す方
法を用いてもよい。まず、ステップＳ４１で注意喚起操
作の時点とその後（図１７のような方法を用いる場合は
前の時点でも可、この場合は図１５の場合のような使用
法になる）の２枚の画像の差分をとる。以下、図２に示
したステップＳ３乃至ステップＳ１１と同様にして、顔
領域を決定する（ステップＳ４１乃至ステップＳ４
７）。そしてステップＳ４９で顔領域の縦と横の比を求
め、あらかじめ定めておいた値より、横の方に長いなら
顔を横に動かした、縦の方に長いなら縦に動かしたと判
定する。前に述べたように、顔の縦の領域が求めにくい
場合は、横に動いたら横、そうでなければ縦と判定する
ようにしてもよい。The above is the basic embodiment of the present invention. The methods described below may also be used to implement its constituent means. For example, when large movements of the face are used to convey the operator's intention, the method shown in the flowchart of FIG. 20 may be used. First, in step S41, the difference is taken between two images: one at the time of the alerting operation and one after it (when a method such as that of FIG. 17 is used, an earlier image may be used instead, in which case the device is used as in the case of FIG. 15). The face region is then determined in the same manner as steps S3 to S11 shown in FIG. 2 (steps S41 to S47). In step S49, the ratio of the height to the width of the face region is obtained and compared with a predetermined value: if the region is elongated horizontally, the face is judged to have moved horizontally; if it is elongated vertically, the face is judged to have moved vertically. As mentioned earlier, when the vertical extent of the face is difficult to obtain, the movement may be judged as horizontal if the face moved horizontally and as vertical otherwise.
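
As a non-limiting illustration, the judgment of step S49 might be sketched as follows. The sketch takes the bounding box of a binarized difference image as the face region, which is a simplification of the projection-based procedure of FIG. 2; the function names and the value `rest_ratio=1.3` are assumptions, standing in for the predetermined value mentioned in the text.

```python
def bounding_box(mask):
    """Inclusive (top, bottom, left, right) of the nonzero pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], rows[-1], cols[0], cols[-1]


def classify_face_motion(mask, rest_ratio=1.3):
    """Judge the direction of face movement from the region's shape.

    rest_ratio is an assumed height/width ratio used as the predetermined
    value: a flatter region suggests horizontal movement, a taller one
    vertical movement.
    """
    box = bounding_box(mask)
    if box is None:
        return "none"
    top, bottom, left, right = box
    height = bottom - top + 1
    width = right - left + 1
    return "horizontal" if height / width < rest_ratio else "vertical"


if __name__ == "__main__":
    wide = [[0] * 6, [1] * 6, [1] * 6, [0] * 6]
    tall = [[0, 0, 1, 1, 0, 0] for _ in range(4)]
    print(classify_face_motion(wide), classify_face_motion(tall))
```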

【００５６】顔を動かす方法で、顔を縦に何回か振って
うなずいたり、横に何回か振って否定を表したりするよ
うな場合は、２枚の画像ではなくて、何枚かの画像を注
意喚起操作後（あるいは前）入力して、連続時点あるい
は他の適当な組み合わせの画像間で差分し２値化した結
果を加算し、顔領域を決定してもよい。このようにすれ
ば、このような動作を用いる場合の意思の判定がより確
実になる。When the gesture consists of shaking the face up and down several times to nod, or from side to side several times to indicate negation, several images rather than two may be input after (or before) the alerting operation. The differences between images at consecutive times, or between other suitable pairs of images, are binarized and the results added together to determine the face region. This makes the judgment of intention more reliable when such gestures are used.
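
As a non-limiting illustration, adding up binarized differences over several consecutive images might be sketched as follows. Images are represented as lists of rows of grey levels; the function names and the threshold are assumptions made for this sketch.

```python
def binarized_difference(earlier, later, threshold):
    """1 where the grey level changed by more than threshold, else 0."""
    return [[1 if abs(a - b) > threshold else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(earlier, later)]


def accumulated_motion(frames, threshold):
    """Sum the binarized differences of consecutive frames, pixel by pixel."""
    total = [[0] * len(frames[0][0]) for _ in frames[0]]
    for earlier, later in zip(frames, frames[1:]):
        diff = binarized_difference(earlier, later, threshold)
        total = [[t + d for t, d in zip(t_row, d_row)]
                 for t_row, d_row in zip(total, diff)]
    return total


if __name__ == "__main__":
    frames = [[[0, 0, 0]], [[9, 0, 0]], [[9, 9, 0]]]
    print(accumulated_motion(frames, threshold=5))
```

Pixels with a nonzero total mark places where the face moved at some point during the repeated gesture, so the region they cover can then be passed to the face region determination.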

【００５７】次に口の動き識別の処理の別の実施例につ
いて述べる。この方法では、図２１のように口の領域の
中に、さらに小さな領域を設ける。具体的には口の中央
領域ｅ_C、上側領域ｅ_U、右側領域ｅ_R、左側領域
ｅ_L、下側領域ｅ_Dを設ける。図２２に示すフローチャ
ートにおいて、まず各小領域ｅについて２時点の画像の
差分を求める（ステップＳ５１）。そして、小領域ｅご
とに、小領域ｅの全画素に対する差分結果の総和、ある
いは適当なしきい値より絶対値が大きい正の部分の画素
数と負の部分の画素数を求め、両者の差を計算する（ス
テップＳ５３）。その結果に対して図２３の判定表を利
用することにより、口の動き識別を行う（ステップＳ５
５）。図２３では＋は領域の和が正で絶対値が一定値以
上、あるいは正の画素数が負の画素数より一定値以上多
いことを示す。−は逆である。斜線は＋や−を付したも
のより絶対値が小さいものである。Next, another embodiment of the mouth movement identification process will be described. In this method, smaller regions are set within the mouth region, as shown in FIG. 21. Specifically, a central region e_C, an upper region e_U, a right region e_R, a left region e_L, and a lower region e_D are provided. In the flowchart shown in FIG. 22, the difference between the images at the two times is first obtained for each small region e (step S51). Then, for each small region e, either the sum of the difference values over all pixels of the region is obtained, or the number of positive pixels and the number of negative pixels whose absolute difference exceeds a suitable threshold are counted and the difference between the two counts is calculated (step S53). The mouth movement is identified by applying the determination table of FIG. 23 to the result (step S55). In FIG. 23, + indicates that the sum for the region is positive with an absolute value at or above a fixed value, or that the number of positive pixels exceeds the number of negative pixels by a fixed value or more; − indicates the opposite. A hatched cell indicates an absolute value smaller than that of the cells marked + or −.
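
As a non-limiting illustration, the pixel-count variant of steps S51 and S53 for one small region might be sketched as follows. The function name `region_sign`, the half-open region convention, and both threshold parameters are assumptions made for this sketch; the determination table of FIG. 23 is not reproduced here, so the sketch stops at the per-region symbol.

```python
def region_sign(earlier, later, region, pixel_threshold, count_margin):
    """Return "+", "-" or "0" for one small region.

    region is (top, bottom, left, right), half-open in rows and columns.
    Pixels whose difference (later - earlier) exceeds pixel_threshold in
    absolute value are counted as positive or negative; the symbol is
    decided by how far one count exceeds the other.
    """
    top, bottom, left, right = region
    positive = negative = 0
    for r in range(top, bottom):
        for c in range(left, right):
            diff = later[r][c] - earlier[r][c]
            if diff > pixel_threshold:
                positive += 1
            elif diff < -pixel_threshold:
                negative += 1
    balance = positive - negative
    if balance >= count_margin:
        return "+"
    if balance <= -count_margin:
        return "-"
    return "0"   # corresponds to a cell without a + or - mark


if __name__ == "__main__":
    earlier = [[10, 10], [10, 10]]
    later = [[50, 50], [10, 0]]
    print(region_sign(earlier, later, (0, 2, 0, 2), 5, 1))
```

Calling the function once for each of e_C, e_U, e_R, e_L, and e_D would give the five symbols that a determination table could then be matched against.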

【００５８】これまでの実施例では基本的には２枚の画
像を用いていたが、以下のように３枚の画像を用いる方
法もある。この場合も、画像２枚の場合と同様にいろい
ろな時点で画像を入力する方法が実施できるが、ここで
は代表的な例を用いて、顔の動き検出を例に説明する。The embodiments so far have basically used two images, but a method using three images is also possible, as follows. In this case too, images may be input at various times, as in the two-image case; here a representative example is described, taking face movement detection as the example.

【００５９】ここでは、図２４のタイミングチャートに
示すように、画像は絶えず入力されており（図中の画像
入力時点Ｘ印）、注意喚起操作が三角印の時点で起こる
とする。本実施例における顔の動き検出処理には、注意
喚起操作の時点の適当な時間前の画像（図中の丸印）、
注意喚起操作の時点の画像（図中の三角印）、それにそ
の後の適当な時間の画像を用いる（図中の四角印）。Here, as shown in the timing chart of FIG. 24, images are input continuously (X marks in the figure indicate image input times), and the alerting operation occurs at the time of the triangle mark. The face movement detection process of this embodiment uses an image from a suitable time before the alerting operation (circle mark in the figure), the image at the time of the alerting operation (triangle mark in the figure), and an image from a suitable time after it (square mark in the figure).

【００６０】また、操作者１は注意喚起操作の後、意思
表示動作を行うものとする。さらに、この例では意思表
示動作は顔を左右に動かす（左右に回す）、上下に動か
す（見上げる、うなずく）というような動作とする。It is also assumed that the operator 1 makes the gesture expressing the intention after the alerting operation. In this example, the gesture is a movement such as moving the face left and right (turning it sideways) or up and down (looking up, nodding).

【００６１】処理は始めに、三角印と丸印の差分画像を
求め（ステップＳ６１）、そこから図２０のステップＳ
４３乃至ステップＳ４７と同様にして顔の領域を求める
（ステップＳ６３）。注意喚起操作の前なので、顔はあ
まり動いていないので、この処理では差分により顔全体
の輪郭がでる。次に四角印と三角印の時点の画像の差分
を求め（ステップＳ６５）、これも図２０のステップＳ
４３乃至ステップＳ４７と同様にして、顔領域を求める
（ステップＳ６７）。そして、顔領域の大きさについて
ステップＳ６７の結果はステップＳ６３の結果に対し
て、縦方向と横方向のどちらにより大きくなったかを求
める（ステップＳ６９）。大きくなった方向に顔が動か
されたと判定する。この方法では、動作前の大きさと動
作後の大きさの比較を用いるので、正確さが向上する。
また、この場合は近い時点の比較なので、顔の画像上で
の位置の絶対値の変化で動きを判定することもできる。
顔領域の右端、左端、上端、下端の変化した方向に動い
たと判定する。下端は求めにくい場合があるが、そのよ
うな場合は上端がどちらに動いたかで判定できる。Processing begins by obtaining the difference image between the triangle-mark and circle-mark images (step S61) and determining the face region from it in the same manner as steps S43 to S47 of FIG. 20 (step S63). Because this interval precedes the alerting operation, the face has hardly moved, so the difference yields the outline of the whole face. Next, the difference between the square-mark and triangle-mark images is obtained (step S65), and the face region is again determined in the same manner as steps S43 to S47 of FIG. 20 (step S67). It is then determined whether the face region from step S67 has grown more in the vertical or the horizontal direction relative to the region from step S63 (step S69), and the face is judged to have moved in the direction of greater growth. Because this method compares the size before the movement with the size after it, accuracy is improved. Moreover, since images from closely spaced times are compared, the movement can also be judged from the change in the absolute position of the face in the image: the face is judged to have moved in the direction in which the right, left, upper, or lower edge of the face region shifted. The lower edge may be difficult to obtain; in that case, the judgment can be made from the direction in which the upper edge moved.
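
As a non-limiting illustration, the comparison of step S69 might be sketched as follows, given the two face regions already obtained in steps S63 and S67. Regions are written as inclusive (top, bottom, left, right) boxes; the function names are hypothetical, and measuring growth as a ratio of sizes (rather than a difference in pixels) is an assumption of this sketch.

```python
def box_size(box):
    """Height and width of an inclusive (top, bottom, left, right) box."""
    top, bottom, left, right = box
    return bottom - top + 1, right - left + 1


def growth_direction(rest_box, moved_box):
    """Direction in which the face region grew more, or "none".

    rest_box: region from the images before the gesture (step S63).
    moved_box: region from the images spanning the gesture (step S67).
    """
    rest_height, rest_width = box_size(rest_box)
    moved_height, moved_width = box_size(moved_box)
    vertical_growth = moved_height / rest_height
    horizontal_growth = moved_width / rest_width
    if max(vertical_growth, horizontal_growth) <= 1:
        return "none"
    return "vertical" if vertical_growth > horizontal_growth else "horizontal"


if __name__ == "__main__":
    rest = (10, 29, 10, 25)
    print(growth_direction(rest, (10, 29, 6, 29)))
    print(growth_direction(rest, (5, 34, 10, 25)))
```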

【００６２】さらに円滑な、いわゆるヒューマンインタ
ーフェイスを実現するために本発明を適用することもで
きる。以下、図２６を参照してこのような装置構成を説
明する。The present invention can also be applied to realize a smoother human interface. Such a device configuration is described below with reference to FIG. 26.

【００６３】例えば、図２６（ａ）に示す中央処理演算
部７で操作者１に対して所定のメッセージをディスプレ
イ装置９に表示出力し、或いは音声合成装置２１で作成
された音声をスピーカ２３を介して出力する等して、質
問を行ない操作者１の返答を要求する場合、中央処理演
算部７は返答要求後の短時間の後に操作者１から反応が
あることが期待できる。そこで質問を出力したことを注
意喚起操作の時点と見なして、続いて意思表示の認識が
可能な状態となり、待機状態に入る。従って、この場合
には操作者１は特に注意喚起操作をする必要はない。For example, as shown in FIG. 26(a), when the central processing unit 7 asks a question and requests a reply from the operator 1, by displaying a predetermined message on the display device 9 or by outputting speech generated by the voice synthesizer 21 through the speaker 23, a response from the operator 1 can be expected shortly after the request. The output of the question is therefore regarded as the time of the alerting operation; the device then becomes able to recognize an expression of intention and enters a standby state. In this case, the operator 1 does not need to perform any alerting operation.

【００６４】また図２６（ｂ）に示すように、中央処理
演算部７が特に明確な返答要求を行なわない場合など
は、注意喚起操作を操作者１が行うようにする。On the other hand, as shown in FIG. 26(b), when the central processing unit 7 has not made an explicit request for a reply, the operator 1 performs the alerting operation.

【００６５】このようにすれば、操作者１は聞かれれば
そのまま返答し、何か操作者１から特に言いたいとき、
すなわち入力を行ないたいときには注意を引いて確実に
伝達するという形になり、操作者１における操作感が向
上する。In this way, the operator 1 simply answers when asked, and draws the device's attention before conveying something on his or her own initiative, that is, when wishing to make an input, so that it is conveyed reliably. This improves the feel of operation for the operator 1.

【００６６】また上記実施例では、注意喚起操作として
キーボード１３への入力等による操作を用いたが、マイ
クロホン１７を用いて、操作者の音声或いは拍手、足踏
み、手で他のものを叩く等の他の音を検知して、それを
注意喚起操作としてもよい。この場合、キーボード１３
からの入力に比べ音声認識に対する処理時間が掛かる事
から、意思表示の画像入力の開始時点のタイミングには
注意を要する。In the above embodiments, an operation such as input on the keyboard 13 was used as the alerting operation. Alternatively, the microphone 17 may be used to detect the operator's voice or another sound such as clapping, stamping, or striking an object with the hand, and that sound may serve as the alerting operation. In this case, because speech recognition takes longer to process than input from the keyboard 13, care is needed regarding the timing at which image input for the expression of intention starts.

【００６７】また、本実施例では意思表示を検知するセ
ンサとしてビデオカメラ３を用いるようにしたが、任意
の他のセンサを用いてもよい。例えば、マイクロホン１
７を用いて音声や他の音を使ってもよい。この場合、注
意喚起操作の時点で、音を入力し、この入力された音声
を認識する。このとき音声による入力は画像による場合
よりも、認識に係る処理時間を短くすることができる。
また、音声の認識には通常の音声認識手段を用いてもよ
いが、画像の認識処理の場合と同様に、注意喚起操作時
点の音とその後（あるいは前）の音を入力し、その２点
（あるいはそれ以上の数の点）での大きさや周波数の差
を求め、それを意思表示と対応するようにしておけば、
簡単な処理で確実な意思伝達が実現できる。例えば、あ
るキーを押して注意喚起操作を行った場合は音の大きさ
が、後の時点の方が大きくなったら画面上の対象物が上
に動くようにする。In this embodiment the video camera 3 is used as the sensor for detecting the expression of intention, but any other sensor may be used. For example, the microphone 17 may be used to pick up speech or other sounds. In this case, sound is input at the time of the alerting operation and the input sound is recognized. Sound input can take less processing time to recognize than image input. Ordinary speech recognition means may be used; alternatively, as in the image recognition process, the sound at the time of the alerting operation and the sound after (or before) it may be input, the difference in loudness or frequency between the two points (or a larger number of points) obtained, and that difference associated with an expression of intention, so that reliable communication is achieved with simple processing. For example, when the alerting operation is performed by pressing a certain key, the object on the screen moves upward if the sound is louder at the later time.
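
As a non-limiting illustration, comparing the loudness of the sound at the alerting operation with that of a later sound might be sketched as follows. Using the root-mean-square level as the measure of loudness, the `min_change` dead band, and the "down" result for a quieter sound are all assumptions made for this sketch; the text only states that a louder later sound moves the object upward.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def vertical_step(at_alert, later, min_change=0.1):
    """Map the change in loudness between two sound blocks to a movement.

    at_alert: samples captured at the alerting operation.
    later: samples captured a suitable time afterwards.
    """
    change = rms(later) - rms(at_alert)
    if change > min_change:
        return "up"
    if change < -min_change:
        return "down"    # assumed counterpart; not stated in the text
    return "stay"


if __name__ == "__main__":
    quiet = [0.1, -0.1, 0.1, -0.1]
    loud = [0.5, -0.5, 0.5, -0.5]
    print(vertical_step(quiet, loud))
```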

【００６８】さらに上記各実施例では、画像はすべてメ
モリに記憶してから、計算機のソフトフェアで処理する
ようにしている。これを、最新時点の画像はメモリに一
旦記憶すること無く、入力と同時にハードウェア的に、
記憶された前時点の画像と差分を取るようにしてもよ
い。In each of the above embodiments, all images are stored in memory and then processed by software on the computer. Instead, the most recent image may be differenced against the stored earlier image in hardware as it is input, without first being stored in memory.

【００６９】また、実施例では意思伝達処理の部分を操
作者が使う中央処理演算部７を用いて実現しているが、
この意思伝達処理専用のハードウェアを構成して実現す
るようにしてもよい。In the embodiments, the communication processing is implemented on the central processing unit 7 used by the operator, but it may instead be implemented with dedicated hardware.

【００７０】以上、具体的に説明したように本実施例に
よれば、操作者は特別な装置や器具を身体に装着するこ
と無く、また負担を強いられること無く指示を入力する
することができる。これにより、従来の意思伝達の際の
誤りの多発を、本実施例による注意喚起操作により意思
伝達の必要な時点を捕え、その時点と後（あるいは前）
の時点の画像内の変化を使うことにより周囲環境の変化
などに影響されず確実に意思伝達を行うことができる。As specifically described above, according to this embodiment the operator can input instructions without wearing any special device or instrument and without being burdened. Whereas conventional approaches to conveying intention were prone to frequent errors, this embodiment uses the alerting operation to capture the moment at which communication is needed and uses the change between the image at that moment and an image after (or before) it, so that intentions are conveyed reliably without being affected by changes in the surroundings.

【００７１】尚、上記の実施例では顔及び口の動きを意
思表示に適用した場合を例にとって説明したが、本発明
はこれに限定されること無く、例えば目、耳、鼻或いは
手、指等の適宜の部位を単独でまたは適当に組み合わせ
て意思表示に適用することができる。これらの場合につ
いても、画像間の差分で対象の外形を出してその形を射
影などで求めたり、対象の回りに小領域を設定し、その
中の明るさ変化を求めることにより、実施例の場合と同
様に実現できる。The above embodiments have been described taking as an example the case where movements of the face and mouth are used to express intention, but the present invention is not limited to this. Any suitable body part, such as the eyes, ears, nose, hands, or fingers, may be used alone or in suitable combination. In these cases as well, the invention can be realized as in the embodiments, by extracting the outline of the target from the difference between images and obtaining its shape by projection or the like, or by setting small regions around the target and obtaining the brightness changes within them.

【００７２】[0072]

【発明の効果】上述したように本発明は、操作者の所定
の動作を注意喚起として操作者の意思表示を具現化する
ように構成したので、操作者に対して負担を強いること
無くかつ周囲環境の変化などに影響されること無く指示
を入力することができる。As described above, the present invention is configured so that a predetermined operation of the operator serves as an alert upon which the operator's expression of intention is given effect. Instructions can therefore be input without burdening the operator and without being affected by changes in the surroundings.

[Brief description of drawings]

【図１】本発明の一実施例の概略の構成を使用形態と共
に示すブロック図である。FIG. 1 is a block diagram showing a schematic configuration of an embodiment of the present invention together with usage patterns.

【図２】顔・口部分の検出処理の手順を示すフローチャ
ートである。FIG. 2 is a flowchart showing a procedure of face / mouth portion detection processing.

【図３】顔の射影から顔領域を検出する様子を説明する
ための説明図である。FIG. 3 is an explanatory diagram for explaining how to detect a face area from a projection of a face.

【図４】口の射影から口領域を検出する様子を説明する
ための説明図である。FIG. 4 is an explanatory diagram for explaining how to detect a mouth area from a projection of a mouth.

【図５】口部分追跡処理の手順を示すフローチャートで
ある。FIG. 5 is a flowchart showing a procedure of a mouth part tracking process.

【図６】口の動き識別処理の手順を示すフローチャート
である。FIG. 6 is a flowchart showing a procedure of mouth movement identification processing.

【図７】口の動きを各パターン毎に説明する図である。FIG. 7 is a diagram illustrating movement of the mouth for each pattern.

【図８】図７に示す各パターンの内、横方向の口の動き
に伴う射影の形を示す図である。FIG. 8 is a diagram showing the shape of the projection associated with lateral movement of the mouth among the patterns shown in FIG. 7.

【図９】図７に示す各パターンの内、上下方向の口の動
きに伴う射影の形を示す図である。FIG. 9 is a diagram showing the shape of the projection associated with vertical movement of the mouth among the patterns shown in FIG. 7.

【図１０】図７に示す各パターンの内、開口動作に伴う
射影の形を示す図である。FIG. 10 is a diagram showing the shape of the projection associated with the mouth-opening movement among the patterns shown in FIG. 7.

【図１１】図７、図９乃至図１０における口の動き識別
のための判定表を示す図である。FIG. 11 is a diagram showing a determination table for mouth movement identification in FIGS. 7 and 9 to 10.

【図１２】画像入力・処理のタイミングチャートであ
る。FIG. 12 is a timing chart of image input / processing.

【図１３】画像入力・処理のタイミングチャートであ
る。FIG. 13 is a timing chart of image input / processing.

【図１４】画像入力・処理のタイミングチャートであ
る。FIG. 14 is a timing chart of image input / processing.

【図１５】画像入力・処理のタイミングチャートであ
る。FIG. 15 is a timing chart of image input / processing.

【図１６】画像入力・処理のタイミングチャートであ
る。FIG. 16 is a timing chart of image input / processing.

【図１７】画像入力・処理のタイミングチャートであ
る。FIG. 17 is a timing chart of image input / processing.

【図１８】画像入力・処理のタイミングチャートであ
る。FIG. 18 is a timing chart of image input / processing.

【図１９】図１８に示すタイミングチャートの場合の操
作の状態を説明する図である。FIG. 19 is a diagram illustrating the state of operation in the case of the timing chart shown in FIG. 18.

【図２０】顔の動きによる意思伝達処理の手順を示すフ
ローチャートである。FIG. 20 is a flowchart showing a procedure of a communication process based on a movement of a face.

【図２１】口の回りに設定した小領域の説明図である。FIG. 21 is an explanatory diagram of a small area set around the mouth.

【図２２】小領域を用いた口の動き識別処理の手順を示
すフローチャートである。FIG. 22 is a flowchart showing a procedure of mouth movement identification processing using a small area.

【図２３】小領域を用いた口の動き識別処理のための判
定表を示す図である。FIG. 23 is a diagram showing a determination table for mouth movement identification processing using a small area.

【図２４】３時点の画像を用いる場合の画像入力・処理
のタイミングチャートである。FIG. 24 is a timing chart of image input / processing when images at three time points are used.

【図２５】３時点の画像を用いる場合の処理の手順を示
すフローチャートである。FIG. 25 is a flowchart showing a procedure of processing when images at three time points are used.

【図２６】注意喚起操作を使い分ける場合を説明する図
である。FIG. 26 is a diagram illustrating cases in which the alerting operation is used selectively.

[Explanation of symbols]

１ 操作者 ３ ビデオカメラ ５ 画像処理部 ７ 中央処理演算部 ９ ディスプレイ装置 １１ 赤外線センサ １３ キーボード １５ マウス １７ マイクロフォン １９ フットスイッチ ２１ 音声合成回路 ２３ スピーカ ２５ スポットライト
1 Operator
3 Video camera
5 Image processing section
7 Central processing unit
9 Display device
11 Infrared sensor
13 Keyboard
15 Mouse
17 Microphone
19 Foot switch
21 Voice synthesis circuit
23 Speaker
25 Spotlight

Claims

[Claims]

1. An instruction input device comprising: first input means capable of inputting an operator's expression of intention; second input means for inputting a predetermined operation of the operator; and processing means for giving effect to the operator's expression of intention received by the first input means, in response to an input to the second input means relating to the predetermined operation of the operator.

2. An instruction input device comprising: image pickup means for capturing images of an operator; storage means for storing the images of the operator input from the image pickup means; identification means for reading, from the storage means, an image relating to the operator's movement at an arbitrary time and an image relating to the operator's movement at a time different from the arbitrary time, and identifying the operator's movement by comparing these images; determination means for determining the operator's expression of intention from the movement identified by the identification means; input means for indicating the start of the expression of intention by an input based on a predetermined operation of the operator; and processing means for giving effect to the expression of intention determined by the determination means when the start of the expression of intention is indicated by the input means.

3. The instruction input device according to claim 1, wherein the input by the input means is an operation indicating the start of the operator's expression of intention that is identified by the identification means.

4. The instruction input device according to claim 2 or 3, wherein the processing means gives effect to the operator's expression of intention at the time of the input by the predetermined operation of the operator, or at a time a predetermined period earlier than the time of that input.