JP2000132693A

JP2000132693A - Device and method for processing picture, and providing medium

Info

Publication number: JP2000132693A
Application number: JP10305153A
Authority: JP
Inventors: Tetsujiro Kondo; 哲二郎近藤; Tomoyuki Otsuki; 知之大月; Junichi Ishibashi; 淳一石橋; Daisuke Kikuchi; 大介菊地
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1998-10-27
Filing date: 1998-10-27
Publication date: 2000-05-12

Abstract

PROBLEM TO BE SOLVED: To make it easy to measure the direction in which a person is facing. SOLUTION: A matching calculation unit 42 detects the positions of both eyes in an input image of a person's face and calculates the distance between the detected eyes. A calibration unit 43 stores the inter-eye distance A that the matching calculation unit 42 calculates from an image in which the person faces the camera head-on. An angle calculation unit 44 substitutes the distance d calculated by the matching calculation unit 42 and the distance A stored in the calibration unit 43 into $\theta = \cos^{-1}(d/A)$ to calculate the angle θ of the person's facing direction relative to the reference direction.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Technical Field of the Invention] The present invention relates to an image processing apparatus and method and to a providing medium, and in particular to an image processing apparatus and method, and a providing medium, that detect the positions of the eyes in a captured image of a person's face and measure the direction in which the person is facing from the change in the distance between the detected eyes.

【０００２】[0002]

[Prior Art] Remote conference systems have been proposed in which a plurality of conference rooms are connected over a network so that the participants can confer as if seated around a single table. Each conference room in such a system is equipped with displays that show the participants other than oneself (referred to as participant A) and speakers that output the speech of the participant shown on each display. As many video cameras and speakers are installed as there are participants other than participant A.

[0003] Each conference room is also equipped with video cameras for capturing images of the participant in that room and microphones for capturing audio. The video cameras and microphones are installed near (mainly on top of) the displays in each conference room. The video of participant A captured by a video camera and the audio captured by a microphone are output to the display and speaker corresponding to participant A at each venue.

[0004] If directional microphones are used to capture participant A's speech, then when participant A speaks while facing the participant he or she wishes to address, in other words while facing the display that shows that participant, the microphone installed in that direction captures the speech at a higher level than the other microphones. Consequently, when participant A's speech is output from the speakers installed in each conference room, the output level differs from room to room. A participant who hears the speech at a high output level can therefore understand intuitively that participant A is speaking to him or her.

【０００５】[0005]

[Problem to Be Solved by the Invention] In the conference room configuration described above, a plurality of video cameras are installed. These cameras can be used to measure the orientation of a participant's face, to determine from the measurement which conference room's participant is being addressed, and then to transmit the participant's speech only to the corresponding conference room, or to transmit it there at a higher output level.

[0006] However, measuring the orientation of a participant's face in this way requires complex processing such as three-dimensional measurement. Template matching is used for this measurement: images of the participant's face at fixed angular intervals are stored in advance, and the face orientation is measured by calculating the degree of correlation between the stored images (templates) and the image captured at that moment. This requires a large amount of memory to store the template images, and the amount of computation needed to calculate the correlation also increases.

[0007] The present invention has been made in view of this situation. It detects the positions of the eyes in a captured image of a participant and measures the direction in which the participant is facing from the change in the distance between the detected eyes.

【０００８】[0008]

[Means for Solving the Problem] The image processing apparatus according to claim 1 comprises: imaging means for capturing an image of a subject; extraction means for extracting, from the image captured by the imaging means, two points that match a predetermined pattern; first calculation means for calculating the distance between the two points extracted by the extraction means; and second calculation means for calculating the angle of the subject using the distance between the two points calculated by the first calculation means.

[0009] The image processing method according to claim 7 comprises: an imaging step of capturing an image of a subject; an extraction step of extracting, from the image captured in the imaging step, two points that match a predetermined pattern; a first calculation step of calculating the distance between the two points extracted in the extraction step; and a second calculation step of calculating the angle of the subject using the distance between the two points calculated in the first calculation step.

[0010] The providing medium according to claim 8 provides a computer-readable program that causes an image processing apparatus to execute processing comprising: an imaging step of capturing an image of a subject; an extraction step of extracting, from the image captured in the imaging step, two points that match a predetermined pattern; a first calculation step of calculating the distance between the two points extracted in the extraction step; and a second calculation step of calculating the angle of the subject using the distance between the two points calculated in the first calculation step.

[0011] In the image processing apparatus according to claim 1, the image processing method according to claim 7, and the providing medium according to claim 8, an image of a subject is captured, two points that match a predetermined pattern are extracted from the captured image, the distance between the two extracted points is calculated, and the angle of the subject is calculated using the calculated distance.

【００１２】[0012]

[Embodiments of the Invention] Embodiments of the present invention are described below. To clarify the correspondence between each means of the invention recited in the claims and the embodiments that follow, the features of the invention are first described with the corresponding embodiment (one example only) added in parentheses after each means. This description does not, of course, limit each means to the example given.

[0013] The image processing apparatus according to claim 1 comprises: imaging means (for example, the front video camera 31-21 in FIG. 4) for capturing an image of a subject; extraction means (for example, the matching calculation unit 42 in FIG. 5) for extracting, from the image captured by the imaging means, two points that match a predetermined pattern; first calculation means (for example, step S3 in FIG. 6) for calculating the distance between the two points extracted by the extraction means; and second calculation means (for example, step S8 in FIG. 6) for calculating the angle of the subject using the distance between the two points calculated by the first calculation means.

[0014] The image processing apparatus according to claim 3 further comprises storage means (for example, the calibration unit 43 in FIG. 5) for storing the distance A between the two points that the first calculation means calculates from a frontal image of the subject, and the second calculation means calculates the angle θ by substituting the distance A and the distance d calculated by the first calculation means into the equation $\theta = \cos^{-1}(d/A)$.

[0015] The image processing apparatus according to claim 4 further comprises: second extraction means (for example, the matching calculation units 52-1 and 52-2 in FIG. 12) for extracting, from the image captured by the imaging means, one point that matches a second pattern; and normalization means (for example, the distance normalization units 53-1 and 53-2 in FIG. 12) for calculating the distance between that one point and the midpoint of the two points whose distance was calculated by the first calculation means, and for normalizing the distance between the two points by the calculated value. The second calculation means calculates the angle of the subject using the distance between the two points normalized by the normalization means.
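The normalization described in this paragraph can be sketched as follows. This is only an illustrative sketch: the function names and the `third_point` argument (the single point matched by the second pattern, whose identity is not stated in this passage) are assumptions, and points are taken to be (x, y) pixel coordinates.

```python
import math


def midpoint(p, q):
    """Return the midpoint of two (x, y) points."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def normalized_eye_distance(left_eye, right_eye, third_point):
    """Divide the inter-eye distance by the distance from the eye midpoint
    to the point matched by the second pattern (hypothetical argument)."""
    eye_dist = math.dist(left_eye, right_eye)
    ref_dist = math.dist(midpoint(left_eye, right_eye), third_point)
    if ref_dist == 0:
        raise ValueError("third point coincides with the eye midpoint")
    return eye_dist / ref_dist
```

Because both distances scale together when the whole face image is uniformly enlarged or reduced, the ratio stays the same under such scaling, which is presumably why the normalized value is used in place of the raw pixel distance.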

[0016] In the image processing apparatus according to claim 6, the imaging means is two video cameras installed at a predetermined angle α. With d₁ denoting the value normalized by the normalization means from the image captured by one of the video cameras and d₂ denoting the value normalized from the image captured by the other, the apparatus further comprises third calculation means (for example, step S25 in FIG. 13) for calculating the maximum distance A between the two points using the angle α, the distance d₁, and the distance d₂. The second calculation means calculates the angle of the subject using the distance A calculated by the third calculation means.
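This passage does not give the formula used in step S25, so the following is only one possible reconstruction under stated assumptions: both cameras lie in the same horizontal plane with their optical axes α apart, and a face turned by θ from the first camera yields d₁ = A·cos θ and d₂ = A·cos(α − θ). Eliminating θ gives A = √(d₁² + d₂² − 2·d₁·d₂·cos α) / |sin α|. The function name is hypothetical.

```python
import math


def max_eye_distance(d1, d2, alpha):
    """Estimate the frontal (maximum) inter-eye distance A from two views.

    Assumes d1 = A*cos(theta) and d2 = A*cos(alpha - theta), with alpha in
    radians. This model is an assumption, not taken from the source text.
    """
    s = math.sin(alpha)
    if abs(s) < 1e-9:
        raise ValueError("cameras must not be parallel (sin(alpha) == 0)")
    radicand = d1 * d1 + d2 * d2 - 2.0 * d1 * d2 * math.cos(alpha)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(radicand, 0.0)) / abs(s)
```

Under this model no frontal calibration image is needed, since A follows directly from the two simultaneous measurements.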

[0017] FIG. 1 shows the configuration of a video conference system to which the image processing apparatus of the present invention is applied. In this specification, a system means an overall apparatus composed of a plurality of devices. As shown in FIG. 1, a plurality of (four in this embodiment) communication centers 1-1 to 1-4 are interconnected via a network 2 such as an ISDN (Integrated Services Digital Network). Each communication center has one conference room such as that shown in FIG. 2.

[0018] The conference room shown in FIG. 2 contains one table 10, one chair, and three display devices. For example, in the conference room of the communication center 1-4, the chair is placed at position 4 in FIG. 2 and display devices are placed at positions 1 to 3. In the conference room of the communication center 1-3, the chair is placed at position 3 and display devices are placed at positions 1, 2, and 4. In the conference room of the communication center 1-2, the chair is placed at position 2 and display devices are placed at positions 1, 3, and 4. In the conference room of the communication center 1-1, the chair is placed at position 1 and display devices are placed at positions 2 to 4.

[0019] Among the display devices in the conference room of the communication center 1-4, the display device at position 1 in FIG. 2 shows the image of the participant at the communication center 1-1, the display device at position 2 shows the image of the participant at the communication center 1-2, and the display device at position 3 shows the image of the participant at the communication center 1-3. The same applies to the other communication centers: each display device placed at a position other than the participant's chair shows the video of the participant at the corresponding communication center.
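The seating rule described above can be written down compactly. The sketch below is illustrative only; the function name and the string labels are assumptions, and it simply maps each of the four positions of FIG. 2 to either the local chair or the display showing the matching communication center.

```python
def room_layout(own_position, positions=(1, 2, 3, 4)):
    """Return what occupies each table position in one conference room.

    own_position is the position number of the local participant, which
    equals n for communication center 1-n in the described embodiment.
    """
    if own_position not in positions:
        raise ValueError("own_position must be one of the table positions")
    return {
        pos: "chair" if pos == own_position
        else f"display showing communication center 1-{pos}"
        for pos in positions
    }
```

For example, `room_layout(4)` places the chair at position 4 and displays for centers 1-1 to 1-3 at positions 1 to 3, which matches the room of the communication center 1-4.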

[0020] Thus, in the conference room of each communication center, a chair for the participant is placed at the position specific to that communication center, and display devices showing the participants of the other communication centers are placed at the remaining positions. With the conference rooms configured in this way, the participants occupy the same positions in the conference room of every communication center. It is as if four participants were actually seated at their specific positions around the table 10. In each conference room everyone other than the participant who is physically present appears on a display, but the same conference arrangement is realized in every conference room.

[0021] Next, the details of each communication center are described with reference to FIG. 3. The communication centers differ slightly in the arrangement of their display devices but otherwise have nearly identical configurations, so only the communication center 1-4 is described here and descriptions of the other communication centers 1-1 to 1-3 are omitted.

[0022] In the conference room of the communication center 1-4, as shown in FIG. 2, a chair is placed at position 4 and display devices are placed at positions 1 to 3. The participant 24 shown in FIG. 3 therefore sits on the chair at position 4 in FIG. 2. To capture video of the participant 24, the display devices 21 to 23 are provided with a left-side video camera 31-22 installed to the left of the participant 24, a front video camera 31-21 installed in front of the participant 24, and a right-side video camera 31-23 installed to the right of the participant 24. They are further provided with microphones 32-21 to 32-23 that capture the speech of the participant 24 (hereinafter, when the microphones 32-21 to 32-23 need not be distinguished individually, they are referred to simply as the microphone 32; the same applies to the other devices), speaker units 33-21 to 33-23 that output the audio supplied from the other communication centers, and display units 34-21 to 34-23 that display the images corresponding to that audio.

[0023] The speaker units 33-21 to 33-23 and the display units 34-21 to 34-23 output the images transmitted from the communication centers 1-1 to 1-3 and the audio corresponding to those images. For example, the display unit 34-21 of the display device 21 displays the image of the participant at the communication center 1-1, and the speaker unit 33-21 outputs that participant's speech. Likewise, the display unit 34-22 of the display device 22 displays the image of the participant at the communication center 1-2, and the speaker unit 33-22 outputs that participant's speech. The display unit 34-23 of the display device 23 displays the image of the participant at the communication center 1-3, and the speaker unit 33-23 outputs that participant's speech.

[0024] The front video camera 31-21 mounted on the display device 21 captures the participant 24 of the communication center 1-4, the microphone 32-21 captures the participant's speech, and the image and speech of the participant 24 are supplied to the communication center 1-1. The left-side video camera 31-22 mounted on the display device 22 captures the participant 24, the microphone 32-22 captures the participant's speech, and the image and speech of the participant 24 are supplied to the communication center 1-2. The video camera 31-23 mounted on the display device 23 captures the participant 24, the microphone 32-23 captures the participant's speech, and the image and speech of the participant 24 are supplied to the communication center 1-3.

[0025] As shown in FIG. 3, the display devices 21 to 23 are placed at the predetermined positions shown in FIG. 2 so that the participant 24 can see the display units 34-21 to 34-23 of the respective display devices.

[0026] In a conference room configured in this way, identifying the participant (the corresponding one of the display devices 21 to 23) to whom the participant 24 is paying attention, and emphasizing the speech and video of that participant (hereinafter referred to as the interlocutor where appropriate), makes it possible to provide a video conference system that is more realistic and easier to use. The measurement of the face orientation of the participant 24, which is performed to identify the interlocutor to whom the participant 24 is paying attention, is described below.

[0027] FIG. 4 shows the configuration of an apparatus for measuring the face orientation of the participant 24. In FIG. 4, the front video camera 31-21 is used as the video camera that captures the participant 24. The front video camera 31-21 is connected to an arithmetic unit 41 that calculates the face orientation from the captured image of the participant 24.

[0028] FIG. 5 is a block diagram of the internal configuration of the arithmetic unit 41. Image data captured by the front video camera 31-21 and input to the arithmetic unit 41 is fed to the matching calculation unit 42. The matching calculation unit 42 extracts the eye regions from the input image data of the participant 24 and calculates the distance between the two extracted eyes; the details are described later. Of the inter-eye distances output by the matching calculation unit 42, the value calculated from the image data that was input to serve as the reference is output to the calibration unit 43 and stored there.

[0029] The angle calculation unit 44 calculates the angle of the direction in which the participant 24 is facing, using the reference value output from the calibration unit 43 and the inter-eye distance output from the matching calculation unit 42. The calculated angle is used, for example, as data for calculating the control amounts that control the volume of the sound output from the speaker units 33-21 to 33-23 or the video on the display units 34-21 to 34-23.

[0030] The operation of the arithmetic unit 41 is described with reference to the flowchart of FIG. 6. In step S1, an initial image is input: the participant 24 has the front video camera 31-21 capture, as the initial image, an image taken while facing that camera head-on. In step S2, the matching calculation unit 42 of the arithmetic unit 41 detects the positions of the eyes in the resulting frontal image of the participant 24.

[0031] FIG. 7 illustrates how the eye positions are detected. The portion 81 is the dark part of the eyeball (the iris and pupil), and the portions 82 to 85 are skin-colored parts around the eye. In a face, only the eyes have a pattern in which the center is black and four surrounding portions are skin-colored, so both eyes of the participant 24 can be detected by finding the parts that fit this pattern. Specifically, the condition for the portion 81 is that the luminance Y is small and the color differences U and V are near 128. The condition for the portions 82 to 85 is that the color difference U divided by the luminance Y lies between -0.1 and 0.0 and the color difference V divided by the luminance Y lies between 0.1 and 0.3. Each of the portions 81 to 85 has a predetermined size in pixels, for example 3 × 3 pixels, and when at least a certain number of pixels within each portion are judged to have the required color, that location is detected as the position of an eye.

[0032] In FIG. 7, the portions 81 to 84 are the same size, and only the portion 85 has a different size, for example 3 × 15 pixels. The portion 85 is made vertically long in order to distinguish it from the other portions. Furthermore, to allow for cases where the face is tilted, the portions 83 and 84 need not both be detected; it suffices that either one is detected as skin-colored. FIG. 8 shows an example of eyes detected using this pattern. As FIG. 8 shows, the positions of both eyes can be determined by detecting, in the input face image, the parts that have the pattern shown in FIG. 7.
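The color conditions and the pattern rule above might be sketched as follows. Several details are assumptions rather than statements from the text: pixels are taken to be 8-bit (Y, U, V) triples with the chroma channels centered on 128, the U/Y and V/Y ratios are computed after removing that 128 offset, and the thresholds `y_max`, `chroma_tol`, and `min_fraction` are hypothetical values standing in for "small luminance", "near 128", and "a certain number or more".

```python
def is_pupil_pixel(y, u, v, y_max=60, chroma_tol=16):
    """Portion 81: low luminance, U and V both close to 128."""
    return y < y_max and abs(u - 128) <= chroma_tol and abs(v - 128) <= chroma_tol


def is_skin_pixel(y, u, v):
    """Portions 82-85: U/Y in [-0.1, 0.0] and V/Y in [0.1, 0.3]."""
    if y <= 0:
        return False
    return -0.1 <= (u - 128) / y <= 0.0 and 0.1 <= (v - 128) / y <= 0.3


def region_matches(pixels, predicate, min_fraction=0.6):
    """True when enough pixels of a region satisfy the color predicate."""
    pixels = list(pixels)
    if not pixels:
        return False
    hits = sum(1 for (y, u, v) in pixels if predicate(y, u, v))
    return hits >= min_fraction * len(pixels)


def eye_pattern_matches(regions, min_fraction=0.6):
    """regions maps the portion numbers 81..85 to lists of (Y, U, V) pixels.

    Portion 81 must be dark; 82 and 85 must be skin-colored; of 83 and 84
    only one needs to be skin-colored, to tolerate a tilted face.
    """
    pupil_ok = region_matches(regions[81], is_pupil_pixel, min_fraction)
    skin = {k: region_matches(regions[k], is_skin_pixel, min_fraction)
            for k in (82, 83, 84, 85)}
    return pupil_ok and skin[82] and skin[85] and (skin[83] or skin[84])
```

A full detector would slide this test over the image and keep the two best matches; that search is omitted here because the text does not describe it.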

[0033] Once both eyes have been detected in step S2, the process proceeds to step S3. In step S3, the length of the straight line connecting the centers of the portions 81 of the two eyes detected in step S2 is calculated. In step S4, this calculated inter-eye distance is stored in the calibration unit 43 as the calibration value.
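Steps S3 and S4 amount to a Euclidean distance followed by storing one value. A minimal sketch, assuming the eye centers are available as (x, y) pixel coordinates; the class and function names are hypothetical and only loosely mirror the role of the calibration unit 43.

```python
import math


def eye_distance(left_center, right_center):
    """Step S3: length of the straight line joining the two eye centers."""
    return math.dist(left_center, right_center)


class CalibrationStore:
    """Step S4: holds the frontal inter-eye distance A as the reference."""

    def __init__(self):
        self.reference_distance = None

    def calibrate(self, left_center, right_center):
        d = eye_distance(left_center, right_center)
        if d <= 0:
            raise ValueError("eye centers must be distinct")
        self.reference_distance = d
        return d
```

For instance, eye centers at (100, 120) and (160, 120) in the frontal image give a stored reference distance of 60 pixels.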

[0034] After calibration has been performed in this way, a new image of the participant 24 is captured by the front video camera 31-21 in step S5. The matching calculation unit 42 then performs steps S6 and S7 on the captured image; these are the same as steps S2 and S3, so their description is omitted.

[0035] In step S8, the face orientation (angle) of the participant 24 is calculated according to the following equation.

$$\theta = \cos^{-1}\left(\frac{d}{A}\right) \tag{1}$$

Here, θ is the angle of the face orientation of the participant 24 relative to the front video camera 31-21, A is the inter-eye distance stored in the calibration unit 43 in step S4, and d is the inter-eye distance calculated in step S7.

[0036] FIG. 9 illustrates how the angle θ is obtained from equation (1). FIG. 9(A) shows the case in which the participant 24 faces the front video camera 31-21 head-on to input the initial image (θ = 0); the inter-eye distance calculated at that time is the distance A. As shown in FIG. 9(B), when the participant 24 turns the face by the angle θ, the inter-eye distance calculated from the image captured by the front video camera 31-21 becomes the distance d.

[0037] FIG. 9(B) also shows that a triangle is formed with the distance A and the distance d as two sides and the angle θ between them. Because d is the projection of A onto the image plane, this is a right triangle with A as the hypotenuse, so $\cos\theta = d/A$ holds, and equation (1) above follows from it.
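Equation (1) translates directly into code. The sketch below is illustrative; the function name is hypothetical, and the clamping of the ratio to the range 0 to 1 is an added safeguard (not described in the source) against measurement noise making d slightly larger than A.

```python
import math


def face_angle_degrees(d, reference_distance):
    """Return theta = arccos(d / A) in degrees.

    d is the current inter-eye distance and reference_distance is the
    calibrated frontal distance A, both in the same units (e.g. pixels).
    """
    if reference_distance <= 0:
        raise ValueError("reference distance A must be positive")
    ratio = min(max(d / reference_distance, 0.0), 1.0)
    return math.degrees(math.acos(ratio))
```

With A = 60 pixels, a measured d of 30 pixels gives an angle of about 60 degrees, and d = 60 gives 0 degrees. Note that the arccosine returns only the magnitude of the angle, so this function alone does not indicate whether the face has turned left or right.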

[0038] Returning to the flowchart of FIG. 6, once the angle of the face of the participant 24 has been calculated in step S8, the process proceeds to step S9, where the calculated angle is output to a device that controls the volume of the sound output from the speaker unit 33 and the video shown on the display unit 34.

【００３９】[0039]

By repeating the processing of steps S5 to S9, the direction of the face of the participant 24 is detected continually throughout the conference, and the speaker unit 33 and the display unit 34 are controlled according to the detection result. This makes it possible to provide a video conference system with a sense of presence.

【００４０】[0040]

FIG. 10 is a graph showing the result of calculating the direction (angle) of the face of the participant 24 from images captured by a single video camera, as described above. The graph plots the angle on the vertical axis against the frame number on the horizontal axis: the participant 24 swung the face between −45 degrees and 45 degrees, the motion was captured in 210 frames, and the angle calculated for each frame was plotted.

【００４１】[0041]

In the above description, of the three video cameras provided in the conference room (the front video camera 31-21, the left side video camera 31-22, and the video camera 31-21R), only the front video camera 31-21 is equipped with the arithmetic unit 41 to calculate the face direction of the participant 24. However, an arithmetic unit 41 may be attached to all three cameras so that each of them detects the face direction of the participant 24. In this way, even when the front video camera 31-21 cannot detect both eyes, one of the other two cameras can detect them, and the face direction of the participant 24 can still be calculated. This means that the angle of the participant 24 can be calculated over a wide range.

【００４２】[0042]

As described above, calculating the face direction from images of the participant 24 captured by a single video camera requires that a calibration value be held. By using two video cameras, however, the face direction of the participant 24 can be calculated without holding a calibration value. FIG. 11 shows a configuration example of an apparatus for that case.

【００４３】[0043]

In the configuration example shown in FIG. 11, two front video cameras are used: a video camera 31-21L, disposed on the right as seen when facing the participant 24, and a video camera 31-21R, disposed on the right side. Each is connected to the arithmetic unit 51. The arithmetic unit 51 is internally configured as shown in FIG. 12. That is, the image of the participant 24 captured by the video camera 31-21L is input to the matching calculation unit 52-1, and the image of the participant 24 captured by the video camera 31-21R is input to the matching calculation unit 52-2.

【００４４】[0044]

The data output from the matching calculation unit 52-1 is input to the distance normalization unit 53-1, and the data output from the matching calculation unit 52-2 is input to the distance normalization unit 53-2. The data output from the distance normalization units 53-1 and 53-2 is then output to the angle calculation unit 54.

【００４５】[0045]

Next, the operation of the arithmetic unit 51 will be described with reference to the flowchart of FIG. 13. In step S21, the images of the face of the participant 24 captured by the video camera 31-21L and the right side camera 31-23 are each input to the arithmetic unit 51. The image captured by the video camera 31-21L is input to the matching calculation unit 52-1, and the image captured by the video camera 31-21R is input to the matching calculation unit 52-2.

【００４６】[0046]

In step S22, the matching calculation units 52-1 and 52-2 perform the same processing as the matching calculation unit 42 (FIG. 5) to detect the eyes of the participant 24 and calculate the distance between the detected eyes. In step S23, the matching calculation units 52-1 and 52-2 also detect the mouth of the participant 24 in order to calculate the distance used in the normalization performed by the distance normalization units 53-1 and 53-2, described later. As shown in FIG. 14, the mouth is detected by finding a place in the face where a region of a predetermined number of pixels is a red portion 91. Using the mouth position detected in this way, the matching calculation units 52-1 and 52-2 calculate the distance between the mouth and the midpoint between the eyes.
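The two distances produced in steps S22 and S23 can be illustrated as follows. This sketch assumes the eye and mouth positions are already available as (x, y) pixel coordinates; the function name `eye_mouth_distances` and the `Point` alias are hypothetical, and the pattern matching that finds the points is not shown.

```python
import math

Point = tuple[float, float]  # (x, y) in pixels; a hypothetical representation


def eye_mouth_distances(left_eye: Point, right_eye: Point, mouth: Point) -> tuple[float, float]:
    """Return (distance between the eyes, distance from eye midpoint to mouth)."""
    eye_distance = math.dist(left_eye, right_eye)
    midpoint = ((left_eye[0] + right_eye[0]) / 2.0,
                (left_eye[1] + right_eye[1]) / 2.0)
    mouth_distance = math.dist(midpoint, mouth)
    return eye_distance, mouth_distance
```

For example, eyes at (100, 100) and (160, 100) with the mouth at (130, 170) give an eye distance of 60 pixels and a midpoint-to-mouth distance of 70 pixels.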

【００４７】[0047]

The distance between the eyes calculated by the matching calculation unit 52-1 (hereinafter, distance 1) and the distance between the midpoint between the eyes and the mouth (hereinafter, distance 2) are output to the distance normalization unit 53-1. Similarly, distance 1 and distance 2 calculated by the matching calculation unit 52-2 are output to the distance normalization unit 53-2. In step S24, the distance normalization units 53-1 and 53-2 each perform normalization by dividing the input distance 1 by distance 2. That is, for a given person, distance 1 between the eyes and distance 2 from the midpoint between the eyes to the mouth do not change, so the result of dividing distance 1 by distance 2 is always a constant value (ratio).
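The normalization of step S24 is a single division. The sketch below, with the hypothetical name `normalized_eye_distance`, shows why it is useful: scaling both pixel distances by the same factor, as happens when the subject moves toward or away from the camera, leaves the ratio unchanged.

```python
def normalized_eye_distance(distance1: float, distance2: float) -> float:
    """Normalize the eye distance (distance 1) by the eye-midpoint-to-mouth distance (distance 2)."""
    if distance2 <= 0:
        raise ValueError("distance 2 must be positive")
    return distance1 / distance2


# The same face seen at two image scales gives the same normalized value.
near = normalized_eye_distance(120.0, 140.0)
far = normalized_eye_distance(60.0, 70.0)
```

Here `near` and `far` are equal because both distances were halved together.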

【００４８】[0048]

FIG. 15 shows an example of the eyes and mouth detected from an image of the participant 24. As shown, the eyes and mouth can be detected by the method described above.

【００４９】[0049]

The values normalized by the distance normalization units 53-1 and 53-2 are output to the angle calculation unit 54. In step S25, the angle calculation unit 54 calculates, from the input values, the maximum distance A between the eyes (in the embodiment described with reference to FIG. 5, the value stored in the calibration unit 43), and uses the calculated distance A to calculate the angle of the participant A.

【００５０】[0050]

First, the calculation of the distance A will be described, assuming that the video camera 31-21L and the video camera 31-21R are arranged as shown in FIG. 16. The angles dI₁ and dI₂ are the angles of the video camera 31-21L and the video camera 31-21R, respectively, relative to the front of the participant 24. Here, the angle dI₁ of the video camera 31-21L relative to the front of the participant 24 is −45 degrees, and the angle dI₂ of the video camera 31-21R relative to the front of the participant 24 is 45 degrees. The angle of the participant 24 relative to the video camera 31-21L is denoted θ₁, and the angle relative to the video camera 31-21R is denoted θ₂. The value of the angle (θ₁ + θ₂) has the same magnitude as the angle (dI₁ + dI₂).

【００５１】[0051]

When the video cameras 31 are arranged in this way, that is, when the video camera 31-21L and the video camera 31-21R are installed at an angle of 90 degrees to each other, the maximum distance A between the eyes of the participant 24 is calculated based on the following equation.

[Equation 1] A = √(d₁² + d₂²) ... (2)

In equation (2), d₁ is the interocular distance calculated from the image of the face of the participant 24 captured by the video camera 31-21L, and d₂ is the interocular distance calculated from the image of the face of the participant 24 captured by the video camera 31-21R. Both d₁ and d₂ are the normalized values calculated in step S24.

【００５２】[0052]

Equation (2) is obtained by eliminating the angles θ₁ and θ₂ from the following relations (3) to (5).
θ₁ + θ₂ = 90 degrees ... (3)
cos θ₁ = d₁ / A ... (4)
cos θ₂ = d₂ / A ... (5)
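The elimination can be written out step by step. The following is a worked derivation that uses only relations (3) to (5) as given above; it is offered as a reader's check, not as text from the specification.

```latex
\begin{align*}
\theta_2 &= 90^\circ - \theta_1
  && \text{from (3)} \\
\cos\theta_2 &= \cos(90^\circ - \theta_1) = \sin\theta_1 = \frac{d_2}{A}
  && \text{using (5)} \\
\cos^2\theta_1 + \sin^2\theta_1 &= \frac{d_1^2}{A^2} + \frac{d_2^2}{A^2} = 1
  && \text{using (4)} \\
A &= \sqrt{d_1^2 + d_2^2}
\end{align*}
```

With the cameras at 90 degrees, the two normalized eye distances therefore act as the legs of a right triangle whose hypotenuse is the maximum distance A.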

【００５３】[0053]

Incidentally, when the angle between the video camera 31-21L and the video camera 31-21R is 45 degrees, that is, when the angle (θ₁ + θ₂) = 45 degrees, the distance A is calculated according to the following equation.

[Equation 2]

Of course, the present invention can be applied whatever the magnitude of the angle (θ₁ + θ₂).
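For an arbitrary camera angle, the same elimination can be carried out with relation (3) replaced by θ₁ + θ₂ = α, where α (the symbol used in claim 6) is the angle between the two cameras. The derivation below is a sketch under the assumption that both θ₁ and θ₂ lie between 0 and 90 degrees, so that their sines are non-negative; the resulting closed forms are what relations (4) and (5) imply and are not transcribed from the specification.

```latex
\begin{align*}
\cos\alpha &= \cos(\theta_1 + \theta_2)
            = \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2 \\
\left(\frac{d_1 d_2}{A^2} - \cos\alpha\right)^{2}
  &= \left(1 - \frac{d_1^2}{A^2}\right)\left(1 - \frac{d_2^2}{A^2}\right) \\
\frac{d_1^2 + d_2^2 - 2 d_1 d_2 \cos\alpha}{A^2} &= \sin^2\alpha \\
A &= \frac{\sqrt{d_1^2 + d_2^2 - 2 d_1 d_2 \cos\alpha}}{\sin\alpha} \\
\alpha = 90^\circ:\quad A &= \sqrt{d_1^2 + d_2^2} \\
\alpha = 45^\circ:\quad A &= \sqrt{2\left(d_1^2 + d_2^2 - \sqrt{2}\, d_1 d_2\right)}
\end{align*}
```

The 90 degree case reduces to the right-triangle relation, which is a useful consistency check on the general expression.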

【００５４】[0054]

The face direction of the participant 24 is calculated by substituting the distance A calculated in this way, together with either the distance d₁ or the distance d₂, into equation (1). In step S26, the calculated face direction is output to the device that controls the volume of the sound output from the speaker unit 33 and the video displayed on the display unit 34. The processing of steps S21 to S26 is repeated continually during the conference, and when the conference ends (when the video conference system is powered off), the processing of this flowchart is also terminated as interrupt processing.
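The calibration-free calculation of step S25 can be sketched end to end: estimate A from the two normalized eye distances, then apply equation (1) to each camera. The function names are hypothetical. The general-angle expression in `max_eye_distance` is derived here from relations (4) and (5) with θ₁ + θ₂ = α and is an assumption rather than a formula quoted from the specification; for α = 90 degrees it reduces to the square root of d₁² + d₂².

```python
import math


def max_eye_distance(d1: float, d2: float, alpha_deg: float = 90.0) -> float:
    """Maximum eye distance A from two normalized distances and the camera angle alpha.

    Assumes theta1 + theta2 = alpha with both angles between 0 and 90 degrees.
    """
    if not 0.0 < alpha_deg < 180.0:
        raise ValueError("camera angle must be between 0 and 180 degrees")
    alpha = math.radians(alpha_deg)
    return math.sqrt(d1 * d1 + d2 * d2 - 2.0 * d1 * d2 * math.cos(alpha)) / math.sin(alpha)


def angles_from_two_cameras(d1: float, d2: float, alpha_deg: float = 90.0) -> tuple[float, float, float]:
    """Return (A, theta1, theta2) with the angles in degrees, using equation (1) per camera."""
    a = max_eye_distance(d1, d2, alpha_deg)
    # Clamp against rounding so that acos stays inside its domain.
    theta1 = math.degrees(math.acos(min(1.0, d1 / a)))
    theta2 = math.degrees(math.acos(min(1.0, d2 / a)))
    return a, theta1, theta2
```

As a usage example, a face whose normalized frontal eye distance is 1.2 and which is turned 30 degrees from one camera and 60 degrees from the other yields d₁ = 1.2·cos 30° and d₂ = 1.2·cos 60°; passing these values to `angles_from_two_cameras` recovers A ≈ 1.2 and the two angles.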

【００５５】[0055]

Thus, by using two video cameras, the face direction of the participant 24 can be calculated without holding a calibration value. In addition, when the participant 24 moves toward or away from the video camera (a zoom-like movement), the distance between the eyes changes; however, by detecting both the eyes and the mouth and normalizing with the distances obtained, the face direction of the participant 24 can be calculated without being affected by such changes.

【００５６】[0056]

FIG. 17 is a graph showing the result of measuring the face direction of the participant 24 using two video cameras, as described above. As in FIG. 10, the vertical axis is the angle and the horizontal axis is the frame number: the participant 24 swung the face between −45 degrees and 45 degrees, the motion was captured in 210 frames, and the angle calculated for each frame was plotted.

【００５７】[0057]

Thus, by using two video cameras, the face direction of the participant 24 can be calculated without holding a calibration value, and without complicated operations or a large amount of computation.

【００５８】[0058]

In the above description, the present invention is applied to a video conference system, but it can also be applied to other apparatuses. Also, although the above description deals with the case where there are four communication centers, the present invention is not limited to this, and the number of communication centers may be larger or smaller.

【００５９】[0059]

In this specification, the providing medium that provides the user with a computer program for executing the above processing includes information recording media such as magnetic disks and CD-ROMs, as well as transmission media over networks such as the Internet and digital satellite.

【００６０】[0060]

[Effect of the Invention] As described above, according to the image processing apparatus of claim 1, the image processing method of claim 7, and the providing medium of claim 8, an image of a subject is captured, two points that match a predetermined pattern are extracted from the captured image, the distance between the two extracted points is calculated, and the angle of the subject is calculated using the calculated distance. The direction in which the subject is facing can therefore be detected easily.

[Brief description of the drawings]

FIG. 1 is a diagram illustrating a configuration example of a video conference system to which the image processing apparatus of the present invention is applied.

FIG. 2 is a diagram showing the state of the conference room of each communication center in the video conference system of the present invention.

FIG. 3 is a diagram showing the arrangement of the display devices in a communication center in FIG. 1.

FIG. 4 is a diagram illustrating a configuration example of an apparatus that calculates the direction of a participant's face.

FIG. 5 is a block diagram showing the internal configuration of the arithmetic unit.

FIG. 6 is a flowchart illustrating the operation of the arithmetic unit.

FIG. 7 is a diagram illustrating eye detection.

FIG. 8 is a diagram illustrating detected eyes.

FIG. 9 is a diagram illustrating calculation of the angle.

FIG. 10 is a graph in which the calculated angles are plotted.

FIG. 11 is a diagram illustrating a configuration example of an apparatus that calculates the direction of a participant's face.

FIG. 12 is a block diagram showing the internal configuration of the arithmetic unit.

FIG. 13 is a flowchart illustrating the operation of the arithmetic unit.

FIG. 14 is a diagram illustrating eye detection.

FIG. 15 is a diagram illustrating detected eyes.

FIG. 16 is a diagram illustrating calculation of the angle.

FIG. 17 is a graph in which the calculated angles are plotted.

[Explanation of symbols]

1-1 to 1-6: communication centers; 2: network; 21-23: display device; 35-21: front video camera; 35-22: left side video camera; 35-23: right side video camera; 36-21 to 36-23: microphones; 37-21 to 37-23: display units; 38-21 to 38-23: speaker units; 41: arithmetic unit; 42: matching calculation unit; 43: calibration unit; 44: angle calculation unit; 51: arithmetic unit; 52-1, 52-2: matching calculation units; 53-1, 53-2: distance normalization units; 54: angle calculation unit

Continued from the front page
(72) Inventor: Junichi Ishibashi, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
(72) Inventor: Daisuke Kikuchi, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo
F-term (reference): 2F065 AA03 AA07 AA31 BB05 BB07 CC16 DD00 DD06 DD07 FF04 FF61 JJ03 JJ05 JJ19 JJ26 NN20 QQ00 QQ23 QQ24 QQ25 QQ26 QQ28 QQ38 QQ42 SS02 SS13 5B057 BA02 DA06 DB02 DC02 DC08 DC25 5B087 AA00 AE03 BC05 BC12 BC26 BC32 5C064 AA02 AC04 AC06 AC22 5L096 BA18 CA05 EA13 FA66 FA67 GA38

Claims

[Claims]

1. An image processing apparatus comprising: imaging means for capturing an image of a subject; extraction means for extracting two points that match a predetermined pattern from the image captured by the imaging means; first calculating means for calculating the distance between the two points extracted by the extraction means; and second calculating means for calculating an angle of the subject using the distance between the two points calculated by the first calculating means.

2. The image processing apparatus according to claim 1, wherein the two points extracted by the extraction means are eyes, and the pattern is a pattern centered on a black point in which the portions above and below it and one of the portions to its left and right are flesh-colored.

3. The image processing apparatus according to claim 1, further comprising storage means for storing a distance A between the two points calculated by the first calculating means from a frontal image of the subject, wherein the second calculating means calculates the angle θ by substituting the distance A and the distance d calculated by the first calculating means into the equation θ = cos⁻¹(d / A).

4. The image processing apparatus according to claim 1, further comprising: second extraction means for extracting one point that matches a second pattern from the image captured by the imaging means; and normalization means for calculating the distance between the midpoint of the two points whose distance is calculated by the first calculating means and the one point extracted by the second extraction means, and for normalizing the distance between the two points calculated by the first calculating means with the calculated value, wherein the second calculating means calculates the angle of the subject using the distance between the two points normalized by the normalization means.

5. The image processing apparatus according to claim 4, wherein the one point extracted by the second extraction means is a mouth, and the second pattern is a red pattern within a range of a predetermined size.

6. The image processing apparatus according to claim 4, wherein the imaging means is two video cameras installed at a predetermined angle α, the apparatus further comprising third calculating means for calculating a maximum value A of the distance between the two points using the angle α, a distance d₁, and a distance d₂, where the distance d₁ is the value normalized by the normalization means from an image captured by one of the video cameras and the distance d₂ is the value normalized by the normalization means from an image captured by the other video camera, and wherein the second calculating means calculates the angle of the subject using the distance A calculated by the third calculating means.

7. An image processing method comprising: an imaging step of capturing an image of a subject; an extraction step of extracting two points that match a predetermined pattern from the image captured in the imaging step; a first calculation step of calculating the distance between the two points extracted in the extraction step; and a second calculation step of calculating an angle of the subject using the distance between the two points calculated in the first calculation step.

8. A providing medium for providing a computer-readable program that causes an image processing apparatus to execute processing comprising: an imaging step of capturing an image of a subject; an extraction step of extracting two points that match a predetermined pattern from the image captured in the imaging step; a first calculation step of calculating the distance between the two points extracted in the extraction step; and a second calculation step of calculating an angle of the subject using the distance between the two points calculated in the first calculation step.