JP6767664B1

JP6767664B1 - Information processing system, information processing device, and program

Info

Publication number: JP6767664B1
Application number: JP2019195113A
Authority: JP
Inventors: 和利谷山; 加藤　圭; 圭加藤
Original assignee: Fujitsu Client Computing Ltd
Current assignee: Fujitsu Client Computing Ltd
Priority date: 2019-10-28
Filing date: 2019-10-28
Publication date: 2020-10-14
Anticipated expiration: 2039-10-28
Also published as: JP2021069079A; GB2589950A; GB202013486D0

Abstract

PROBLEM TO BE SOLVED: To efficiently perform voice notification to a person located in a predetermined space while suppressing an increase in system scale. SOLUTION: An information processing system 1-1 includes an information processing device 1, a camera 2, and a speaker 3. The camera 2 photographs a person located in the space. The speaker 3 is not integrated with the camera 2, has sound directivity, and is rotationally driven based on instructions from a control unit 1a. The control unit 1a in the information processing device 1 identifies a target person from the image captured by the camera 2, detects the head position of the target person, and calculates the rotation angle of the speaker 3 needed to emit sound toward the head position. The control unit 1a then selects a voice pattern to be emitted to the target person, rotates the speaker 3 by the rotation angle, and outputs the selected voice pattern from the speaker 3. SELECTED DRAWING: FIG. 1

Description

The present invention relates to an information processing system, an information processing device, and a program.

With advances in information processing technology and the increasing resolution of surveillance cameras, systems have been developed that detect a person in an image captured by a surveillance camera and emit sound from a speaker. Building such a system in a store, for example, makes it possible to deter suspicious persons who have entered the store and to convey administrative messages to store clerks.

Japanese Unexamined Patent Publication No. 2017-215806

In such systems, the surveillance camera and the speaker have conventionally been integrated, so that the camera and the speaker face the same direction. To deliver voice notifications to persons located in a space with such a system, however, multiple integrated camera/speaker units must be installed, which increases the system scale and is inefficient.

In one aspect, an object of the present invention is to provide an information processing system, an information processing device, and a program that can efficiently perform voice notification to a person located in a predetermined space while suppressing an increase in system scale.

To solve the above problem, an information processing system is provided. The information processing system includes a camera; a speaker that is not integrated with the camera, has directivity, and is rotationally driven; and a control unit. The control unit identifies a target person from an image captured by the camera, detects the head position of the target person, calculates the rotation angle of the speaker for emitting sound toward the head position, selects a voice pattern to be emitted to the target person, rotates the speaker by the rotation angle, and outputs the voice pattern from the speaker. The control unit associates the two-dimensional captured image with a three-dimensional space, detects the coordinates of the target person's feet and of the top of the head from the two-dimensional image, and maps both sets of coordinates into the three-dimensional space. It detects the ear position by subtracting a predetermined value from the target person's head-top height, which is based on the head-top coordinates mapped into the three-dimensional space, and uses the ear position as the head position of the target person. The control unit also detects the coordinates of the target person's feet from the two-dimensional image at fixed time intervals to acquire time-series coordinate data, calculates from the coordinate data the amount the target person will have moved after a predetermined time has elapsed, and updates the head position based on that movement amount. It holds the delay time from detection of the target person until the voice pattern is output from the speaker, and includes this delay time in the predetermined time when calculating the movement amount. Further, the control unit calculates a first rotation angle that directs the speaker to the detected head position. When the destination of the target person is not predicted, the control unit rotates the speaker by the first rotation angle and outputs the voice pattern from the speaker. When the destination is predicted, the control unit calculates a second rotation angle that directs the speaker to the updated head position, rotates the speaker by the first rotation angle, and, after that rotation is complete, rotates the speaker by the second rotation angle while outputting the voice pattern from the speaker.
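As a non-limiting illustration, the ear-position and movement-prediction steps summarized above could be sketched in Python as follows. The names `Point3D`, `ear_position`, and `predict_feet`, the 0.12 m default ear offset, and the constant-velocity extrapolation are all assumptions made for this sketch; the summary above only states that a predetermined value is subtracted from the head-top height and that the delay time is included in the prediction time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point3D:
    x: float
    y: float
    z: float  # height above the floor, in metres (assumed convention)


def ear_position(head_top: Point3D, ear_offset_m: float = 0.12) -> Point3D:
    """Subtract a predetermined value from the head-top height (offset value is assumed)."""
    return Point3D(head_top.x, head_top.y, head_top.z - ear_offset_m)


def predict_feet(samples, horizon_s: float, delay_s: float):
    """Extrapolate the feet position, assuming constant velocity (an assumption).

    samples: list of (t, x, y) feet coordinates taken at fixed time intervals.
    The held delay time is added to the prediction time, as in the summary above.
    """
    (t0, x0, y0), (t1, x1, y1) = samples[0], samples[-1]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("samples must span a positive time interval")
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    lead = horizon_s + delay_s
    return x1 + vx * lead, y1 + vy * lead
```

The updated head position would then reuse the ear height with the predicted floor coordinates; whether the actual system extrapolates linearly is not stated in this part of the description.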

Further, to solve the above problem, an information processing device that executes the same control as the above information processing system is provided.
Further, a program that causes a computer to execute the same control as the above information processing system is provided.

According to one aspect, voice notification to a person located in a predetermined space can be performed efficiently while suppressing an increase in system scale.

A diagram for explaining an example of the information processing system of the first embodiment.
A diagram showing an example of the configuration of the information processing system of the second embodiment.
A diagram showing an example of the configuration of the speaker.
A diagram showing an example of the hardware configuration of the information processing device.
A diagram showing an example of the operation sequence from photographing a person to making a voice call.
A diagram showing an example of the operation sequence from photographing a person to making a voice call.
A diagram showing an example of the operation sequence from photographing a person to making a voice call.
A diagram showing an example of the voice pattern table.
A diagram for explaining the positions of the camera and a person in 3D space.
A diagram for explaining the positions of the camera and a person in 3D space.
A flowchart showing an example of the overall operation from detecting a person to making a voice call.
A flowchart showing an example of the head position detection process.
A flowchart showing an example of estimating the movement speed of the target person and updating the head position.
A flowchart showing an example of the process of calculating the rotation angle of the speaker.
A flowchart showing an example of the process of calculating the rotation angle as the target person moves.
A flowchart showing an example of the rotational drive of the speaker and the voice call operation.
A flowchart showing an example of the rotational drive of the speaker and the voice call operation.

Hereinafter, the embodiments will be described with reference to the drawings.
[First Embodiment]
FIG. 1 is a diagram for explaining an example of the information processing system of the first embodiment. The information processing system 1-1 includes an information processing device 1, a camera 2, and a speaker 3. The information processing device 1 includes a control unit 1a and a storage unit 1b.

The camera 2 monitors and photographs a person located in a predetermined space. The speaker 3 is not integrated with the camera 2, has sound directivity, and is rotationally driven to output sound based on instructions from the control unit 1a.

The control unit 1a performs image analysis based on AI (Artificial Intelligence) processing on the image captured by the camera 2. Based on the image analysis result, the control unit 1a also performs drive control and audio output control of the speaker 3. The storage unit 1b stores various data required for processing by the control unit 1a.
The processing of the control unit 1a and the storage unit 1b is realized, for example, by a processor (not shown) of the information processing device 1 executing a predetermined program.

The flow of operation of the control unit 1a is as follows.
[Step S1] The control unit 1a identifies the target person from the image captured by the camera 2.
[Step S2] The control unit 1a detects the head position of the target person.

[Step S3] The control unit 1a calculates the rotation angle of the speaker 3 for emitting sound toward the head position.
[Step S4] The control unit 1a selects a voice pattern suitable for the target person.
[Step S5] The control unit 1a rotates the speaker 3 by the calculated rotation angle and outputs the selected voice pattern from the speaker 3.
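Purely as an illustration, the order of steps S1 to S5 could be expressed in Python as a single function that receives each stage as a callable. The function name `voice_notification_cycle` and all parameter names are hypothetical; the description does not specify how the individual stages are implemented, so they are injected rather than defined here.

```python
from typing import Any, Callable, Optional


def voice_notification_cycle(
    image: Any,
    identify: Callable[[Any], Optional[Any]],
    detect_head: Callable[[Any, Any], Any],
    calc_angle: Callable[[Any], Any],
    select_pattern: Callable[[Any], Any],
    drive_speaker: Callable[[Any, Any], None],
) -> bool:
    """Run steps S1 to S5 once; return False when no target person is identified."""
    person = identify(image)            # S1: identify the target person
    if person is None:
        return False
    head = detect_head(image, person)   # S2: detect the head position
    angle = calc_angle(head)            # S3: calculate the speaker rotation angle
    pattern = select_pattern(person)    # S4: select a suitable voice pattern
    drive_speaker(angle, pattern)       # S5: rotate the speaker and output the pattern
    return True
```

Passing the stages as callables keeps the sketch independent of any particular image-analysis or speaker-control library.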

In this way, the information processing system 1-1 uses the speaker 3, which is not integrated with the camera 2, has directivity, and is rotationally driven: the speaker 3 is rotated toward the head position of the target person calculated from the image captured by the camera 2, and outputs sound to the target person. This reduces the number of speakers that must be installed, so voice notification to a person located in a predetermined space can be performed efficiently while suppressing an increase in system scale.

[Second Embodiment]
Next, the second embodiment will be described. In the following description, giving a voice notification to a target person may be referred to as a voice call.

FIG. 2 is a diagram showing an example of the configuration of the information processing system of the second embodiment. The information processing system 1-2 includes an information processing device 10, cameras 20-1, ..., 20-n (collectively referred to as the camera 20), a speaker 30, a terminal 41 (for maintenance management), a terminal 42 (for notification), an AP (access point) 50, a hub 61, and a PoE (Power over Ethernet) hub 62 (Ethernet is a registered trademark).

The information processing device 10 includes a control unit 11 and a storage unit 12. The control unit 11 has the functions of the control unit 1a of FIG. 1, and the storage unit 12 has the functions of the storage unit 1b of FIG. 1. The speaker 30 has the functions of the speaker 3 of FIG. 1.

The hub 61 has ports p1, ..., p4, and the PoE hub 62 has ports p11, p12-1, ..., p12-n. Ports p1, ..., p4 and port p11 are ports to which, for example, a 1 Gbit/s communication line can be connected. Ports p12-1, ..., p12-n are ports to which, for example, a 100 Mbit/s communication line can be connected.

Port p1 of the hub 61 and port p11 of the PoE hub 62 are connected by a LAN (Local Area Network) cable L1. The PoE hub 62 supplies power through the LAN cable L1, of Category 5e or higher, that is used for Ethernet communication.

Therefore, connecting the camera 20 to the PoE hub 62 eliminates the need for an external power source such as an AC (Alternating Current) adapter, because power can be supplied using only the LAN cable L1 that carries the data communication. The camera 20 can thus be installed even where supplying power is difficult, such as outdoors or on a ceiling.

On the hub 61, the terminal 41 is connected to port p2, the information processing device 10 to port p3, and the AP 50 to port p4. On the PoE hub 62, the cameras 20-1, ..., 20-n are connected to ports p12-1, ..., p12-n, respectively. The terminal 42 and the speaker 30 are connected wirelessly to the AP 50.
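For reference, the connections of FIG. 2 described above can be written down as a small graph and queried for the path between two devices. The following Python sketch is illustrative only: the node names, the `LINKS` list, the `path` helper, and the choice of two cameras (n = 2) are assumptions, since the description leaves n open.

```python
from collections import deque

# Links taken from the description of FIG. 2; n = 2 cameras is assumed for illustration.
LINKS = [
    ("hub61", "poe_hub62"),       # port p1 to port p11 via LAN cable L1
    ("hub61", "terminal41"),      # port p2
    ("hub61", "device10"),        # port p3
    ("hub61", "ap50"),            # port p4
    ("poe_hub62", "camera20-1"),  # port p12-1
    ("poe_hub62", "camera20-2"),  # port p12-2
    ("ap50", "terminal42"),       # wireless
    ("ap50", "speaker30"),        # wireless
]


def path(src, dst):
    """Breadth-first search for the device path from src to dst; None if unreachable."""
    adjacency = {}
    for a, b in LINKS:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None
```

With these links, a captured image travels from a camera through the PoE hub 62 and the hub 61 to the information processing device 10, and a command travels from the device through the hub 61 and the AP 50 to the speaker 30.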

<Speaker configuration>
FIG. 3 is a diagram showing an example of the configuration of the speaker. The speaker 30 includes an audio output unit 31 and a rotation mechanism unit 32. The audio output unit 31 has a sound propagation function that uses ultrasonic waves, and outputs sound directionally.

The rotation mechanism unit 32 has a two-axis rotation mechanism for the horizontal and vertical directions. Its horizontal motor rotation mechanism rotates the audio output unit 31 horizontally in the plus direction (arrow h1) and the minus direction (arrow h2), with 0 degrees on the horizontal axis h as the reference.

Likewise, the vertical motor rotation mechanism of the rotation mechanism unit 32 rotates the audio output unit 31 vertically in the plus direction (arrow v1) and the minus direction (arrow v2), with 0 degrees on the vertical axis v as the reference. A mounting part 33 for attaching the speaker 30 to a wall surface or the like is provided on the upper surface of the rotation mechanism unit 32, and the speaker 30 has a wireless LAN communication function (not shown).
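To illustrate how a two-axis mechanism of this kind might be aimed, the following Python sketch converts a speaker position and a target head position into a horizontal and a vertical angle. It is a sketch under stated assumptions, not the calculation used in the embodiment: the function name `pan_tilt_deg`, the coordinate frame (z pointing up, horizontal 0 degrees along the +x axis, vertical 0 degrees level with the floor), and the sign conventions are all hypothetical.

```python
import math


def pan_tilt_deg(speaker_xyz, target_xyz):
    """Return (horizontal, vertical) angles in degrees from the speaker to the target.

    Assumed conventions: horizontal 0 degrees points along +x and increases
    counterclockwise seen from above; vertical 0 degrees is level and increases upward.
    """
    dx = target_xyz[0] - speaker_xyz[0]
    dy = target_xyz[1] - speaker_xyz[1]
    dz = target_xyz[2] - speaker_xyz[2]
    horizontal = math.degrees(math.atan2(dy, dx))
    vertical = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return horizontal, vertical
```

For a speaker mounted at a height of 2.5 m and a head position 1 m lower, the vertical angle comes out negative under these conventions, meaning the audio output unit tilts downward.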

<Hardware configuration>
FIG. 4 is a diagram showing an example of the hardware configuration of the information processing device. The information processing device 10 is controlled as a whole by a processor (computer) 100. The processor 100 realizes the functions of the control unit 11.

A memory 101, an input/output interface 102, and a network interface 104 are connected to the processor 100 via a bus 103. The processor 100 may be a multiprocessor. The processor 100 is, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), or a PLD (Programmable Logic Device). The processor 100 may also be a combination of two or more of a CPU, an MPU, a DSP, an ASIC, and a PLD.

The memory 101 includes the functions of the storage unit 12 and is used as the main storage device of the information processing device 10. The memory 101 temporarily stores at least part of the OS (Operating System) program and the application programs to be executed by the processor 100. The memory 101 also stores various data required for processing by the processor 100.

The memory 101 is also used as the auxiliary storage device of the information processing device 10, and stores the OS program, application programs, and various data. As the auxiliary storage device, the memory 101 may include a semiconductor storage device such as a flash memory or an SSD (Solid State Drive), or a magnetic recording medium such as an HDD (Hard Disk Drive).

The peripheral devices connected to the bus 103 are the input/output interface 102 and the network interface 104. A monitor (for example, an LED (Light Emitting Diode) or LCD (Liquid Crystal Display) monitor) that functions as a display device showing the status of the information processing device 10 in accordance with commands from the processor 100 can be connected to the input/output interface 102.

Further, information input devices such as a keyboard and a mouse can be connected to the input/output interface 102, which transmits the signals sent from the information input device to the processor 100.
The input/output interface 102 also functions as a communication interface for connecting peripheral devices. For example, an optical drive device that uses laser light or the like to read data recorded on an optical disc can be connected to the input/output interface 102. Optical discs include DVD (Digital Versatile Disc), Blu-ray Disc (registered trademark), CD-ROM (Compact Disc Read Only Memory), and CD-R (Recordable)/RW (Rewritable).

A memory device or a memory reader/writer can also be connected to the input/output interface 102. The memory device is a recording medium with a function for communicating with the input/output interface 102. The memory reader/writer is a device that writes data to, or reads data from, a memory card. The memory card is a card-type recording medium.

The network interface 104 connects to a network and performs network interface control. For example, a NIC (Network Interface Card) or a wireless LAN card can be used. Data received by the network interface 104 is output to the memory 101 and the processor 100.

The hardware configuration described above can realize the processing functions of the information processing device 10. For example, the information processing device 10 can perform the processing of the present invention by having the processor 100 execute predetermined programs.

The information processing device 10 realizes the processing functions of the present invention by, for example, executing a program recorded on a computer-readable recording medium. The program describing the processing to be executed by the information processing device 10 can be recorded on various recording media.

For example, the program to be executed by the information processing device 10 can be stored in the auxiliary storage device. The processor 100 loads at least part of the program from the auxiliary storage device into the main storage device and executes it.

The program can also be recorded on a portable recording medium such as an optical disc, a memory device, or a memory card. A program stored on a portable recording medium becomes executable after being installed in the auxiliary storage device, for example under the control of the processor 100. The processor 100 can also read the program directly from the portable recording medium and execute it.

<Voice call operation sequence>
FIG. 5 is a diagram showing an example of the operation sequence from photographing a person to making a voice call.
It shows the operation sequence for making a voice call to a suspicious person.
[Step S11] A person enters the store.
[Step S11a] The camera 20 photographs the person who has entered the store.
[Step S11b] The camera 20 transmits the captured image of the person to the control unit 11.
[Step S11c] The control unit 11 analyzes the captured image by AI processing to detect and track the person.

[Step S12] The person behaves suspiciously in some way.
[Step S12a] The camera 20 photographs the person's suspicious behavior.
[Step S12b] The camera 20 transmits the captured image of the suspicious behavior to the control unit 11.
[Step S12c] The control unit 11 knows in advance the patterns of normal behavior (or suspicious behavior) as behavior patterns of persons, and determines the behavior pattern based on the received captured image. When it detects a behavior pattern that deviates from normal behavior (or a suspicious behavior pattern), it determines that the person is a suspicious person.

[Step S13] The control unit 11 notifies the notification terminal 42 that a suspicious person has been found.
[Step S14] The terminal 42 displays on its screen that a suspicious person has entered the store.
[Step S15] To make a voice call to the suspicious person from the speaker 30, the control unit 11 performs 3D (three-dimensional) space mapping, head position detection, rotation angle calculation, and voice pattern selection.

The 3D space mapping process maps the position of the person into the 3D space. The head position detection process detects the coordinates of the person's head position in the 3D space. The rotation angle calculation process calculates the rotation angle of the speaker 30 so that the speaker 30 faces the detected head position. The voice pattern selection process selects the voice pattern for the voice call from among a plurality of sound sources (specific examples of voice patterns are described later with reference to FIG. 8).

[Step S16] The control unit 11 transmits a voice call command (the calculated rotation angle and the selected voice pattern) to the speaker 30.
[Step S17] Based on the received voice call command, the speaker 30 is driven to the instructed rotation angle.

[Step S18] Based on the received voice call command, the speaker 30 emits the sound of the instructed voice pattern and makes a voice call to the suspicious person. The suspicious person thereby notices the voice call.
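The voice call command of steps S16 to S18 carries the calculated rotation angle and the selected voice pattern. One possible way to represent and serialize such a command is sketched below in Python. The class name `VoiceCallCommand`, its field names, and the use of JSON over the wireless LAN are assumptions made for illustration; the description does not specify a message format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VoiceCallCommand:
    """Hypothetical payload: rotation angles for both axes plus the voice file to play."""
    horizontal_deg: float
    vertical_deg: float
    voice_file: str


def encode(command: VoiceCallCommand) -> bytes:
    """Serialize the command to UTF-8 JSON (format chosen for illustration only)."""
    return json.dumps(asdict(command)).encode("utf-8")


def decode(payload: bytes) -> VoiceCallCommand:
    """Rebuild the command on the speaker side from the received bytes."""
    return VoiceCallCommand(**json.loads(payload.decode("utf-8")))
```

On receipt, a speaker following this sketch would first drive to the two angles and then play the named file, mirroring the order of steps S17 and S18.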

FIG. 6 is a diagram showing an example of the operation sequence from photographing a person to making a voice call. It shows the operation sequence for making a voice call to a specific person. A specific person is a person other than a suspicious person, for example an ordinary customer visiting the store.

[Step S21] A person enters the store.
[Step S21a] The camera 20 photographs the person who has entered the store.
[Step S21b] The camera 20 transmits the captured image of the person to the control unit 11.
[Step S21c] The control unit 11 analyzes the captured image by AI processing to detect and track the specific person. The control unit 11 knows the behavior patterns of persons in advance and determines the behavior pattern based on the received captured image. For example, when it detects a behavior pattern of normal behavior, it determines that the person is a specific person.

[Step S22] The control unit 11 notifies the notification terminal 42 that a specific person has been found.
[Step S23] The terminal 42 displays on its screen that a specific person has entered the store.
[Step S24] To make a voice call to the specific person from the speaker 30, the control unit 11 performs 3D space mapping, head position detection, rotation angle calculation, and voice pattern selection.

[Step S25] The control unit 11 transmits a voice call command (the calculated rotation angle and the selected voice pattern) to the speaker 30.
[Step S26] Based on the received voice call command, the speaker 30 is driven to the instructed rotation angle.

[Step S27] Based on the received voice call command, the speaker 30 emits the sound of the instructed voice pattern and makes a voice call to the specific person. The specific person thereby notices the voice call.

FIG. 7 is a diagram showing an example of the operation sequence from photographing a person to making a voice call. It shows the operation sequence for making a voice call to a specific person in a specific area. A specific person in a specific area is, for example, a store clerk on the sales floor.

[Step S31] A person enters the store.
[Step S31a] The camera 20 photographs the person who has entered the store.
[Step S31b] The camera 20 transmits the captured image of the person to the control unit 11.
[Step S31c] The control unit 11 analyzes the captured image by AI processing, detects the specific person, and maps the person into the 3D space. It also tracks the specific person within the 3D space.

[Step S32] The person enters the specific area.
[Step S32a] The camera 20 photographs the person in the specific area.
[Step S32b] The camera 20 transmits the captured image of the person to the control unit 11.
[Step S32c] The control unit 11 determines that the specific person is in the specific area.

[Step S33] The control unit 11 notifies the notification terminal 42 that a specific person has been found in the specific area.
[Step S34] The terminal 42 displays on its screen that a specific person is in the specific area.

〔ステップＳ３５〕制御部１１は、スピーカ３０から特定人物に声掛けを行うために、頭部位置検出、回転角度算出および音声パターン選択の各処理を行う。
〔ステップＳ３６〕制御部１１は、声掛け命令（算出した回転角度および選択した音声パターン）をスピーカ３０に送信する。
〔ステップＳ３７〕スピーカ３０は、受信した声掛け命令にもとづいて、指示された回転角度に駆動する。 [Step S35] The control unit 11 performs each process of head position detection, rotation angle calculation, and voice pattern selection in order to call out to a specific person from the speaker 30.
[Step S36] The control unit 11 transmits a voice command (calculated rotation angle and selected voice pattern) to the speaker 30.
[Step S37] The speaker 30 is driven to the instructed rotation angle based on the received voice command.

〔ステップＳ３８〕スピーカ３０は、受信した声掛け命令にもとづいて、指示された音声パターンの音声を発して、特定エリア内の特定人物に向けて声掛けを行う。特定エリア内の特定人物は、声掛けに気づくことになる。 [Step S38] The speaker 30 emits a voice of the instructed voice pattern based on the received voice command, and speaks to a specific person in the specific area. A specific person in a specific area will notice the call.

＜音声パターン＞
図８は音声パターンテーブルの一例を示す図である。音声パターンテーブル１２ａは、人物、音声ファイルおよび音声パターン（音声の内容）の項目を有し、該テーブルのデータ構造は、記憶部１２に格納されている。 <Voice pattern>
FIG. 8 is a diagram showing an example of a voice pattern table. The voice pattern table 12a has items of a person, a voice file, and a voice pattern (voice content), and the data structure of the table is stored in the storage unit 12.

テーブル内容として例えば、人物が不審者である場合、音声ファイルには、音声ファイル１．ｗａｖ、音声ファイル２．ｗａｖ、音声ファイル３．ｗａｖが登録されている。音声ファイル１．ｗａｖの音声パターンは“いらっしゃいませ”、音声ファイル２．ｗａｖの音声パターンは“ｘｘエリアにお客様がお待ちです”、音声ファイル３．ｗａｖの音声パターンは“お買い上げありがとうございます”という音声が登録されている。 As an example of the table contents, when the person is a suspicious person, audio file 1.wav, audio file 2.wav, and audio file 3.wav are registered as audio files. The voice pattern registered for audio file 1.wav is "Welcome", the voice pattern for audio file 2.wav is "Customers are waiting in the xx area", and the voice pattern for audio file 3.wav is "Thank you for your purchase".

また、人物が特定人物（例えば、３０歳代男性）である場合、音声ファイルには、音声ファイル４．ｗａｖが登録されている。音声ファイル４．ｗａｖの音声パターンは“○○の商品がおすすめです”という音声が登録されている。 When the person is a specific person (for example, a man in his thirties), audio file 4.wav is registered as the audio file. The voice pattern registered for audio file 4.wav is "○○ products are recommended".

さらに、人物が特定エリア内の特定人物（例えば、店員）である場合、音声ファイルには、音声ファイル５．ｗａｖが登録されている。音声ファイル５．ｗａｖの音声パターンは“ｘｘに来てください”という音声が登録されている。
このように、音声パターンテーブル１２ａには、対象人物に声掛けを行う際に適した音声が登録されている。 Further, when the person is a specific person in the specific area (for example, a clerk), audio file 5.wav is registered as the audio file. The voice pattern registered for audio file 5.wav is "Please come to xx".
As described above, in the voice pattern table 12a, voices suitable for speaking to the target person are registered.
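The voice pattern table 12a described above can be pictured as a simple lookup structure. The following Python sketch is illustrative only: the category keys, the `VoiceEntry` type, the underscore file names, and the `select_voice_pattern` helper are hypothetical names introduced here, while the person categories and phrases follow the description of FIG. 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceEntry:
    audio_file: str  # name of the registered audio file
    content: str     # voice pattern (what the audio says)


# Keys are hypothetical labels for the "person" column of FIG. 8.
VOICE_PATTERN_TABLE = {
    "suspicious_person": [
        VoiceEntry("audio_file_1.wav", "Welcome"),
        VoiceEntry("audio_file_2.wav", "Customers are waiting in the xx area"),
        VoiceEntry("audio_file_3.wav", "Thank you for your purchase"),
    ],
    "specific_person": [
        VoiceEntry("audio_file_4.wav", "○○ products are recommended"),
    ],
    "specific_person_in_area": [
        VoiceEntry("audio_file_5.wav", "Please come to xx"),
    ],
}


def select_voice_pattern(category, index=0):
    """Return the entry registered for a person category.

    `index` chooses among several entries; how the control unit would
    choose among them is not stated in the text, so this is an assumption.
    """
    return VOICE_PATTERN_TABLE[category][index]
```

For example, `select_voice_pattern("specific_person_in_area")` would return the entry for audio file 5.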

＜３Ｄ空間におけるカメラと人物の位置＞
図９、図１０は３Ｄ空間におけるカメラと人物の位置を説明するための図である。なお、図１０は、図９のイメージをｘｚ平面で表現したものである。図９において、３Ｄのｘｙｚ空間に対象人物の足元が座標Ａ（ｘ１、ｙ１、ｚ１＝０）に位置している。また、カメラ２０が座標（ｘ２、ｙ２、ｚ２）に位置している。 <Position of camera and person in 3D space>
FIGS. 9 and 10 are diagrams for explaining the positions of the camera and the person in the 3D space. Note that FIG. 10 represents the image of FIG. 9 in the xz plane. In FIG. 9, the feet of the target person are located at the coordinates A (x1, y1, z1 = 0) in the 3D xyz space. Further, the camera 20 is located at the coordinates (x2, y2, z2).

図１０においては、対象人物は座標（ｘ１、ｚ１）に位置し、カメラ２０は（ｘ２、ｚ２）に位置している。また、対象人物の頭上の座標は（ｘ１、Ｈ）であり、カメラ２０から対象人物の頭上に引いた線分がｘ軸に交わる点が座標Ｂ（ｘ３、ｚ３＝０）である。 In FIG. 10, the target person is located at the coordinates (x1, z1), and the camera 20 is located at (x2, z2). The coordinates of the top of the target person's head (the overhead coordinates) are (x1, H), and the point where the line drawn from the camera 20 through the top of the target person's head intersects the x-axis is the coordinate B (x3, z3 = 0).

＜フローチャート＞
次に図１１から図１７のフローチャートを用いて詳細動作について説明する。図１１は人物を検出してから声掛けを行うまでの全体動作の一例を示すフローチャートである。
〔ステップＳ４１〕制御部１１は、ＡＩ処理による画像解析処理を起動する。
〔ステップＳ４２〕制御部１１は、カメラ２０の撮影画像から人物検出を行い、検出した人物が声掛けの対象人物か否かを判定する。声掛けの対象人物の場合はステップＳ４３に処理が進み、対象人物でない場合は人物検出および当該判定処理を繰り返す。 <Flow chart>
Next, the detailed operation will be described with reference to the flowcharts of FIGS. 11 to 17. FIG. 11 is a flowchart showing an example of the overall operation from the detection of a person to the calling.
[Step S41] The control unit 11 activates the image analysis process by the AI process.
[Step S42] The control unit 11 detects a person from the captured image of the camera 20 and determines whether or not the detected person is the target person for speaking. In the case of the target person to be called, the process proceeds to step S43, and if it is not the target person, the person detection and the determination process are repeated.

〔ステップＳ４３〕制御部１１は、３Ｄ空間における対象人物の頭部位置を検出する。
〔ステップＳ４４〕制御部１１は、対象人物の移動先の予測を行うか否かを判定する。移動先の予測を行う場合はステップＳ４５に処理が進み、移動先の予測を行わない場合はステップＳ４６に処理が進む。 [Step S43] The control unit 11 detects the head position of the target person in the 3D space.
[Step S44] The control unit 11 determines whether or not to predict the movement destination of the target person. If the movement destination is predicted, the process proceeds to step S45, and if the movement destination is not predicted, the process proceeds to step S46.

〔ステップＳ４５〕制御部１１は、対象人物の移動速度の推定と、頭部位置の更新を行う。
〔ステップＳ４６〕制御部１１は、スピーカ３０の回転角度を算出する。
〔ステップＳ４７〕制御部１１は、音声パターンテーブル１２ａを用いて、対象人物に適した音声パターンを選択する。 [Step S45] The control unit 11 estimates the moving speed of the target person and updates the head position.
[Step S46] The control unit 11 calculates the rotation angle of the speaker 30.
[Step S47] The control unit 11 selects a voice pattern suitable for the target person by using the voice pattern table 12a.

〔ステップＳ４８〕制御部１１は、対象人物を追跡しながら声掛けを行うか否かを判定する。追跡しながら声掛けを行う場合はステップＳ４９に処理が進み、追跡せずに声掛けを行う場合はステップＳ５０ａに処理が進む。
〔ステップＳ４９〕制御部１１は、対象人物の移動に伴うスピーカ３０の回転角度を算出する。ステップＳ５０ｂに処理が進む。 [Step S48] The control unit 11 determines whether or not to speak while tracking the target person. If the call is made while tracking, the process proceeds to step S49, and if the call is made without tracking, the process proceeds to step S50a.
[Step S49] The control unit 11 calculates the rotation angle of the speaker 30 as the target person moves. The process proceeds to step S50b.

〔ステップＳ５０ａ〕スピーカ３０は、制御部１１から指示された回転角度に駆動し、また制御部１１から指示された音声パターンで対象人物に声掛けを行う。
〔ステップＳ５０ｂ〕スピーカ３０は、制御部１１から指示された、対象人物の移動に合わせた回転角度に駆動し、また制御部１１から指示された音声パターンで対象人物に声掛けを行う。 [Step S50a] The speaker 30 is driven to a rotation angle instructed by the control unit 11, and also speaks to the target person in a voice pattern instructed by the control unit 11.
[Step S50b] The speaker 30 is driven to a rotation angle in accordance with the movement of the target person instructed by the control unit 11, and also speaks to the target person in a voice pattern instructed by the control unit 11.
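The branching of FIG. 11 can be summarized compactly. The Python sketch below only encodes the order of steps S43 to S50a/S50b as described above; the function name `plan_steps` and its two boolean parameters are hypothetical, and the person detection loop of steps S41 and S42 is left out.

```python
from typing import List


def plan_steps(predict_destination: bool, track_while_calling: bool) -> List[str]:
    """Return the step labels executed once a target person has been detected."""
    steps = ["S43"]                 # detect the head position in 3D space
    if predict_destination:
        steps.append("S45")         # estimate moving speed, update head position
    steps.append("S46")             # calculate the rotation angle of the speaker
    steps.append("S47")             # select a voice pattern from table 12a
    if track_while_calling:
        steps.append("S49")         # rotation angle that follows the movement
        steps.append("S50b")        # rotate and call out while following
    else:
        steps.append("S50a")        # rotate once, then call out
    return steps
```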

図１２は頭部位置の検出処理の一例を示すフローチャートである。図１１のステップＳ４３の詳細フローを示している。
〔ステップＳ４３ａ〕制御部１１は、カメラ２０のキャリブレーションによる補正後のカメラ画面と、３Ｄ空間との対応付けを行う。 FIG. 12 is a flowchart showing an example of the head position detection process. The detailed flow of step S43 of FIG. 11 is shown.
[Step S43a] The control unit 11 associates the camera screen corrected by calibration of the camera 20 with the 3D space.

〔ステップＳ４３ｂ〕制御部１１は、カメラ２０で撮影された撮影画像から対象人物を検出し、対象人物の撮影画像内の座標を取得する。なお、人物検出が行われた場合、例えば、その人物の位置は矩形（矩形情報）で示される。 [Step S43b] The control unit 11 detects the target person from the captured image captured by the camera 20 and acquires the coordinates in the captured image of the target person. When a person is detected, for example, the position of the person is indicated by a rectangle (rectangular information).

〔ステップＳ４３ｃ〕制御部１１は、対象人物の矩形情報から足元の座標を検出する。例えば、人物位置を示す矩形の下底の中間点を算出し、その中間点を足元の座標とする。
〔ステップＳ４３ｄ〕制御部１１は、検出した足元座標を３Ｄ空間の座標上にマッピングする（図９の座標Ａに相当）。 [Step S43c] The control unit 11 detects the coordinates of the feet from the rectangular information of the target person. For example, the midpoint of the lower base of the rectangle indicating the position of the person is calculated, and the midpoint is used as the coordinates of the feet.
[Step S43d] The control unit 11 maps the detected foot coordinates onto the coordinates in the 3D space (corresponding to the coordinates A in FIG. 9).

〔ステップＳ４３ｅ〕制御部１１は、対象人物の矩形情報から頭上の座標を算出する。例えば、人物位置を示す矩形の上底の中間点を算出し、その中間点を頭上の座標とする。
〔ステップＳ４３ｆ〕制御部１１は、２Ｄ画像（撮影画像）の頭上の座標を３Ｄ画像での床上とみなして、３Ｄ空間上に頭上座標をマッピングする（図１０の座標Ｂに相当）。 [Step S43e] The control unit 11 calculates overhead coordinates from the rectangular information of the target person. For example, the midpoint of the upper base of the rectangle indicating the position of the person is calculated, and the midpoint is used as the overhead coordinates.
[Step S43f] The control unit 11 treats the overhead coordinates in the 2D image (captured image) as a point on the floor in the 3D image, and maps the overhead coordinates into the 3D space (corresponding to the coordinate B in FIG. 10).

〔ステップＳ４３ｇ〕制御部１１は、座標Ｂとカメラ２０の座標とを結んだ線分における座標Ａのｘ成分に等しいｚ成分を抽出する（ｘ成分ではなくｙ成分を使ってもよい）。
〔ステップＳ４３ｈ〕制御部１１は、抽出したｚ成分に対して、所定長低い（例えば、２０ｃｍ低い）位置を対象人物の耳の高さＨとする。
〔ステップＳ４３ｉ〕制御部１１は、座標Ａにおけるｚ成分を耳の高さにした座標値を頭部位置とし、この頭部位置を、スピーカ３０を向ける座標として確定する（座標Ｃとする）。 [Step S43g] On the line segment connecting the coordinate B and the coordinates of the camera 20, the control unit 11 extracts the z component of the point whose x component is equal to the x component of the coordinate A (the y component may be used instead of the x component).
[Step S43h] The control unit 11 sets, as the ear height H of the target person, a position that is a predetermined length (for example, 20 cm) below the extracted z component.
[Step S43i] The control unit 11 takes the coordinate A with its z component replaced by the ear height H as the head position, and fixes this head position as the coordinates toward which the speaker 30 is pointed (referred to as the coordinate C).

このように、制御部１１は、撮影画像を３Ｄ空間にマッピングして、３Ｄ空間上で対象人物の頭上の位置を求め、頭上の位置から耳の位置を求めて、耳の位置を頭部位置とする。これにより、頭部位置に向けてスピーカ３０を回転させるので、スピーカ３０からの音声を対象人物に明確に聞かせることができる。 In this way, the control unit 11 maps the captured image to the 3D space, obtains the position of the top of the target person's head in the 3D space, derives the ear position from that position, and takes the ear position as the head position. Because the speaker 30 is rotated toward this head position, the target person can clearly hear the sound from the speaker 30.
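Steps S43g to S43i amount to a linear interpolation along the line from the coordinate B to the camera. The following Python sketch illustrates that reading under stated assumptions: coordinates are in metres, the coordinates A and B are already mapped into the 3D space (the 2D-to-3D mapping of steps S43a to S43f is not shown), and the function name `head_position` and the axis-selection rule are hypothetical.

```python
def head_position(foot_a, floor_b, camera, ear_offset=0.20):
    """Return coordinate C (head position) following steps S43g to S43i.

    foot_a     -- coordinate A, the feet on the floor (x1, y1, 0)
    floor_b    -- coordinate B, the top of the head projected onto the floor
    camera     -- installation coordinates of the camera
    ear_offset -- predetermined length; the text gives 20 cm as an example
    """
    ax, ay, _ = foot_a
    bx, by, bz = floor_b
    cx, cy, cz = camera
    dx, dy = cx - bx, cy - by
    if dx == 0 and dy == 0:
        raise ValueError("camera is directly above coordinate B")
    # S43g: find the point on the segment B-camera whose x (or y) component
    # matches coordinate A. Using the axis with the larger extent is an
    # assumption made here to avoid dividing by a small number.
    if abs(dx) >= abs(dy):
        s = (ax - bx) / dx
    else:
        s = (ay - by) / dy
    top_of_head_z = bz + s * (cz - bz)
    # S43h: the ear height H is a predetermined length below the top of the head.
    ear_height = top_of_head_z - ear_offset
    # S43i: coordinate C is coordinate A with its z component replaced by H.
    return (ax, ay, ear_height)
```

With a camera 3 m above the origin, the coordinate B 6 m away along the x-axis, and the feet at x = 3 m, the interpolated top-of-head height is 1.5 m and the ear height is about 1.3 m.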

図１３は対象人物の移動速度の推定および頭部位置の更新の一例を示すフローチャートである。図１１のステップＳ４５の詳細フローを示している。
〔ステップＳ４５ａ〕制御部１１は、対象人物の過去数秒分の２Ｄ画像内の足元の座標を複数検出する。 FIG. 13 is a flowchart showing an example of estimating the moving speed of the target person and updating the head position. The detailed flow of step S45 of FIG. 11 is shown.
[Step S45a] The control unit 11 detects a plurality of coordinates of the feet of the target person in the 2D image for the past several seconds.

〔ステップＳ４５ｂ〕制御部１１は、検出した過去の足元の座標を３Ｄ空間上の座標に変換する。これにより座標Ａを含む時系列の座標データを得る。
〔ステップＳ４５ｃ〕制御部１１は、時系列の座標データをもとに、ｔ秒後の対象人物の３Ｄ空間上の移動量を推定する。例えば、時系列の座標データから得られる座標間の移動速度をｘｙｚの３方向のベクトルとして求めた上でそれぞれの成分ごとに平均値を求める（移動速度Ｖａ）。そして、移動速度Ｖａに対して時間ｔを乗算することで、ｔ秒後の移動量ｄＬが推定できる。 [Step S45b] The control unit 11 converts the detected coordinates of the past feet into the coordinates in the 3D space. As a result, time-series coordinate data including the coordinate A is obtained.
[Step S45c] The control unit 11 estimates the amount of movement of the target person in 3D space after t seconds based on the time-series coordinate data. For example, the moving speed between coordinates obtained from the time-series coordinate data is obtained as a vector in three directions of xyz, and then the average value is obtained for each component (moving speed Va). Then, by multiplying the movement speed Va by the time t, the movement amount dL after t seconds can be estimated.

ただし、ｔ秒は対象人物を検出した時間から音声を出力するまでの遅延時間に相当するものである。ｔ秒は事前のシステムテスト等で求めておいて、設定値としてあらかじめ保持しておくものとする。 However, t seconds corresponds to the delay time from the time when the target person is detected to the time when the sound is output. It is assumed that t seconds is obtained by a system test or the like in advance and is retained as a set value in advance.

〔ステップＳ４５ｄ〕制御部１１は、座標Ａに対してｘｙｚ方向のｔ秒後の移動量ｄＬを加算する。これにより、座標（Ａ＋ｄＬ）を得られる。座標（Ａ＋ｄＬ）は、声掛けをすべき対象人物の足元の座標になる（座標Ａ２とする）。また、座標Ａ２のｚ成分を耳の高さＨとすることで、これが移動後の頭部位置となり、スピーカ３０を向ける対象の座標となる（頭部位置の座標Ｃの更新）。 [Step S45d] The control unit 11 adds the movement amount dL after t seconds in the xyz direction with respect to the coordinate A. As a result, the coordinates (A + dL) can be obtained. The coordinates (A + dL) are the coordinates of the feet of the target person to be called (referred to as coordinates A2). Further, by setting the z component of the coordinate A2 to the height H of the ear, this becomes the head position after movement and becomes the coordinates of the target to which the speaker 30 is directed (update of the coordinate C of the head position).

ここで、人物に声掛けを行う場合、人物検出からスピーカ３０から音声を出力させるまでに遅延時間が発生する。仮にこの遅延時間を考慮しないと、人物に向けて声掛けを行っても、その人物が移動している場合は、すでにその人物がいないことが起こりうる。 Here, when calling out to a person, a delay occurs between the detection of the person and the output of the voice from the speaker 30. If this delay is not taken into account and the person is moving, the person may no longer be at the targeted position by the time the voice is output.

上記のように、制御部１１は、２次元画像から一定の時間間隔で対象人物の足元の座標を複数検出して時系列の座標データを取得し、座標データから算出した移動量にもとづいて頭部位置の更新を行う。これにより、対象人物の移動後の位置を精度よく検出することができる。 As described above, the control unit 11 detects a plurality of foot coordinates of the target person from the two-dimensional image at regular time intervals to acquire time-series coordinate data, and updates the head position based on the movement amount calculated from the coordinate data. As a result, the position of the target person after movement can be detected with high accuracy.

また、制御部１１は、対象人物の検出からスピーカ３０から音声パターンが出力されるまでの遅延時間を含めて移動量を算出する。これにより、人物が移動していても移動後の人物の頭部に向けてスピーカ３０から音声を出力させることができ、声掛け精度を向上させることができる。 Further, the control unit 11 calculates the movement amount including the delay time from the detection of the target person to the output of the voice pattern from the speaker 30. As a result, even if the person is moving, the speaker 30 can output the voice toward the head of the person after the movement, and the accuracy of speaking can be improved.
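The estimate in steps S45a to S45d can be sketched as follows. This Python example assumes the foot coordinates have already been converted into the 3D space, are sampled at a fixed interval with the coordinate A as the most recent sample, and are given in metres; the function names are hypothetical.

```python
def average_velocity(track, interval):
    """Average per-axis velocity Va from time-series foot coordinates.

    track    -- list of (x, y, z) foot coordinates, oldest first
    interval -- sampling interval in seconds (assumed constant)
    """
    if len(track) < 2:
        raise ValueError("at least two samples are needed")
    sums = [0.0, 0.0, 0.0]
    for prev, curr in zip(track, track[1:]):
        for axis in range(3):
            sums[axis] += (curr[axis] - prev[axis]) / interval
    count = len(track) - 1
    return tuple(total / count for total in sums)


def predict_head(track, interval, delay_t, ear_height):
    """Return (A2, C): predicted foot and head coordinates after delay_t seconds."""
    va = average_velocity(track, interval)
    a = track[-1]  # coordinate A, taken here as the latest sample
    # S45c and S45d: movement amount dL = Va * t, added to coordinate A.
    a2 = tuple(a[axis] + va[axis] * delay_t for axis in range(3))
    # The z component is replaced by the ear height H to get the head position.
    return a2, (a2[0], a2[1], ear_height)
```

Here `delay_t` plays the role of the t seconds that the text says are measured in a system test beforehand and held as a set value.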

図１４はスピーカの回転角度の算出処理の一例を示すフローチャートである。図１１のステップＳ４６の詳細フローを示している。
〔ステップＳ４６ａ〕制御部１１は、３Ｄ空間における、座標Ｃ（頭部位置）からスピーカ３０の設置座標を減算する。この減算処理はスピーカ３０を中心とした座標Ｃのベクトル化を行うものであり、減算結果をベクトルＳと呼ぶ。 FIG. 14 is a flowchart showing an example of the calculation process of the rotation angle of the speaker. The detailed flow of step S46 of FIG. 11 is shown.
[Step S46a] The control unit 11 subtracts the installation coordinates of the speaker 30 from the coordinates C (head position) in the 3D space. This subtraction process vectorizes the coordinates C centered on the speaker 30, and the subtraction result is called a vector S.

〔ステップＳ４６ｂ〕制御部１１は、ベクトルＳの水平方向成分（ｘ成分とｙ成分）から水平方向の回転角（水平回転角）を算出する。水平回転角の算出式は、以下の式（１）になる。 [Step S46b] The control unit 11 calculates the horizontal rotation angle (horizontal rotation angle) from the horizontal components (x component and y component) of the vector S. The formula for calculating the horizontal rotation angle is the following formula (1).

〔ステップＳ４６ｃ〕制御部１１は、式（１）で求めた水平回転角で回転したときの回転方向成分を新たにｒ成分として、ｒ成分をｘ成分とｙ成分から算出する。ｒ成分の算出式は、以下の式（２）になる。 [Step S46c] The control unit 11 calculates the r component from the x component and the y component, using the rotation direction component when rotating at the horizontal rotation angle obtained by the equation (1) as a new r component. The formula for calculating the r component is the following formula (2).

〔ステップＳ４６ｄ〕制御部１１は、上記のｒ成分と、ベクトルＳの垂直方向成分であるｚ成分とから垂直方向の回転角（垂直回転角）を算出する。垂直回転角の算出式は、以下の式（３）になる。 [Step S46d] The control unit 11 calculates a vertical rotation angle (vertical rotation angle) from the above r component and the z component which is a vertical component of the vector S. The formula for calculating the vertical rotation angle is the following formula (3).

制御部１１は、上記のような算出式を用いて、水平回転角および垂直回転角を求めることにより、スピーカ３０の回転角度を容易に精度よく算出することができる。 The control unit 11 can easily and accurately calculate the rotation angle of the speaker 30 by obtaining the horizontal rotation angle and the vertical rotation angle by using the above calculation formula.
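Equations (1) to (3) are not reproduced in this text, so the following Python sketch is only one plausible reading of steps S46a to S46d. It assumes that the horizontal rotation angle is the arctangent of the y and x components of the vector S, that the r component is the length of the horizontal projection of S, and that the vertical rotation angle is the arctangent of the z and r components. The function name, the use of degrees, and the choice of `atan2` are assumptions rather than the formulas of the patent.

```python
import math


def rotation_angles(head_c, speaker_pos):
    """Return (horizontal, vertical) rotation angles in degrees.

    head_c      -- coordinate C (head position) in the 3D space
    speaker_pos -- installation coordinates of the speaker
    """
    # S46a: vector S is the head position relative to the speaker.
    sx, sy, sz = (h - p for h, p in zip(head_c, speaker_pos))
    # S46b: horizontal rotation angle from the x and y components (assumed form).
    horizontal = math.atan2(sy, sx)
    # S46c: r component along the direction given by the horizontal angle.
    r = math.hypot(sx, sy)
    # S46d: vertical rotation angle from the r and z components (assumed form).
    vertical = math.atan2(sz, r)
    return math.degrees(horizontal), math.degrees(vertical)
```

Under these assumptions, a head position one metre in front of and one metre above the speaker gives a horizontal angle of 0 degrees and a vertical angle of 45 degrees. The zero direction and sign conventions of an actual rotation mechanism would need to be matched separately.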

図１５は対象人物の移動に伴う回転角度の算出処理の一例を示すフローチャートである。図１１のステップＳ４９の詳細フローを示している。
〔ステップＳ４９ａ〕制御部１１は、声掛けを行う際に選択した音声パターンの再生時間ｔ２を決定する。 FIG. 15 is a flowchart showing an example of the calculation process of the rotation angle accompanying the movement of the target person. The detailed flow of step S49 of FIG. 11 is shown.
[Step S49a] The control unit 11 determines the reproduction time t2 of the voice pattern selected when speaking.

〔ステップＳ４９ｂ〕制御部１１は、移動速度Ｖａに時間ｔ２を乗算し、乗算結果を移動量として算出する。
〔ステップＳ４９ｃ〕制御部１１は、算出した移動量を座標Ａ２（移動後の足元座標）に加算すると共に、ｚ成分を耳の高さＨとして頭部位置を求める（座標Ｃａとする）。この頭部位置は、声掛け終了時の対象人物の頭部の座標になる。 [Step S49b] The control unit 11 multiplies the movement speed Va by the time t2, and calculates the multiplication result as the movement amount.
[Step S49c] The control unit 11 adds the calculated movement amount to the coordinates A2 (coordinates of the feet after the movement), and obtains the head position with the z component as the height H of the ear (referred to as the coordinates Ca). This head position becomes the coordinates of the head of the target person at the end of the voice call.

〔ステップＳ４９ｄ〕制御部１１は、座標Ｃａからスピーカ３０の設置座標を減算する。これはスピーカ３０を中心とした座標Ｃａのベクトル化に相当するものであり、減算結果をベクトルＳａとする。 [Step S49d] The control unit 11 subtracts the installation coordinates of the speaker 30 from the coordinates Ca. This corresponds to the vectorization of the coordinates Ca centered on the speaker 30, and the subtraction result is the vector Sa.

〔ステップＳ４９ｅ〕制御部１１は、ベクトルＳａのｘ成分とｙ成分から、式（１）を用いて水平方向の回転角（水平回転角）を算出する。
〔ステップＳ４９ｆ〕制御部１１は、式（２）を用いて、水平方向の回転角方向を新たにｒ成分とし、ｘ成分とｙ成分からｒ成分を算出する。
〔ステップＳ４９ｇ〕制御部１１は、ベクトルＳａのｒ成分とｚ成分から垂直方向の回転角（垂直回転角）を算出する。 [Step S49e] The control unit 11 calculates a horizontal rotation angle (horizontal rotation angle) from the x and y components of the vector Sa using the equation (1).
[Step S49f] Using the equation (2), the control unit 11 newly sets the horizontal rotation angle direction as the r component, and calculates the r component from the x component and the y component.
[Step S49g] The control unit 11 calculates the rotation angle (vertical rotation angle) in the vertical direction from the r component and z component of the vector Sa.
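Steps S49b to S49d can be illustrated in the same way. The Python sketch below computes the head position Ca at the end of the call and the vector Sa relative to the speaker; the horizontal and vertical rotation angles would then follow from equations (1) to (3), which are not reproduced in this text. The function name and argument names are hypothetical, and coordinates are assumed to be in metres.

```python
def end_of_call_vector(foot_a2, velocity_va, playback_t2, ear_height, speaker_pos):
    """Return (Ca, Sa) for the end of the voice pattern playback.

    foot_a2     -- coordinate A2, the foot position after the delay
    velocity_va -- moving speed Va as an (x, y, z) tuple
    playback_t2 -- playback time t2 of the selected voice pattern, in seconds
    ear_height  -- ear height H
    speaker_pos -- installation coordinates of the speaker
    """
    # S49b: movement amount during playback is Va * t2.
    moved = tuple(a + v * playback_t2 for a, v in zip(foot_a2, velocity_va))
    # S49c: head position Ca uses the ear height H as its z component.
    head_ca = (moved[0], moved[1], ear_height)
    # S49d: vector Sa is Ca relative to the speaker.
    vector_sa = tuple(c - s for c, s in zip(head_ca, speaker_pos))
    return head_ca, vector_sa
```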

図１６はスピーカの回転駆動および声掛けの動作の一例を示すフローチャートである。図１１のステップＳ５０ａの詳細フローを示している。なお、図１４で上述した、座標Ｃ（最初の頭部位置）にもとづいて算出したスピーカ３０の水平回転角を水平回転角ａ１とし、座標Ｃにもとづいて算出したスピーカ３０の垂直回転角を垂直回転角ｂ１とする。 FIG. 16 is a flowchart showing an example of the rotational drive of the speaker and the call-out operation. It shows the detailed flow of step S50a of FIG. 11. The horizontal rotation angle of the speaker 30 calculated based on the coordinate C (the initial head position) as described with FIG. 14 is referred to as the horizontal rotation angle a1, and the vertical rotation angle of the speaker 30 calculated based on the coordinate C is referred to as the vertical rotation angle b1.

〔ステップＳ５０ａ１〕制御部１１は、スピーカ３０に対して、算出した水平回転角ａ１および垂直回転角ｂ１（第１の回転角度）と、選択した音声パターンとをスピーカ３０に送信する。
〔ステップＳ５０ａ２〕スピーカ３０は、水平回転角ａ１および垂直回転角ｂ１で回転駆動する。
〔ステップＳ５０ａ３〕スピーカ３０は、回転駆動が終了すると、指示された音声パターンで対象人物に向けて声掛けを行う。 [Step S50a1] The control unit 11 transmits the calculated horizontal rotation angle a1 and vertical rotation angle b1 (first rotation angle) and the selected voice pattern to the speaker 30.
[Step S50a2] The speaker 30 is rotationally driven at a horizontal rotation angle a1 and a vertical rotation angle b1.
[Step S50a3] When the rotation drive is completed, the speaker 30 speaks to the target person with the instructed voice pattern.

図１７はスピーカの回転駆動および声掛けの動作の一例を示すフローチャートである。図１１のステップＳ５０ｂの詳細フローを示している。なお、図１５で上述した、座標Ｃａ（移動後の頭部位置）にもとづいて算出したスピーカ３０の水平回転角を水平回転角ａ２とし、座標Ｃａにもとづいて算出したスピーカ３０の垂直回転角を垂直回転角ｂ２とする。 FIG. 17 is a flowchart showing an example of the rotational drive of the speaker and the call-out operation. It shows the detailed flow of step S50b of FIG. 11. The horizontal rotation angle of the speaker 30 calculated based on the coordinates Ca (the head position after movement) as described with FIG. 15 is referred to as the horizontal rotation angle a2, and the vertical rotation angle of the speaker 30 calculated based on the coordinates Ca is referred to as the vertical rotation angle b2.

〔ステップＳ５０ｂ１〕制御部１１は、スピーカ３０に対して、算出した水平回転角ａ１および垂直回転角ｂ１（第１の回転角度）と、選択した音声パターンとをスピーカ３０に送信する。
〔ステップＳ５０ｂ２〕制御部１１は、スピーカ３０に対して、算出した水平回転角ａ２および垂直回転角ｂ２（第２の回転角度）と、時間ｔ２の情報とをスピーカ３０に送信する。なお、時間ｔ２は、上述のように遅延が考慮された音声パターンの再生時間である。 [Step S50b1] The control unit 11 transmits the calculated horizontal rotation angle a1 and vertical rotation angle b1 (first rotation angle) and the selected voice pattern to the speaker 30.
[Step S50b2] The control unit 11 transmits the calculated horizontal rotation angle a2 and vertical rotation angle b2 (second rotation angle) and the information on the time t2 to the speaker 30. The time t2 is the playback time of the voice pattern, with the delay taken into account as described above.

〔ステップＳ５０ｂ３〕スピーカ３０は、水平回転角ａ１および垂直回転角ｂ１（第１の回転角度）で回転駆動する。
〔ステップＳ５０ｂ４〕スピーカ３０は、水平回転角ａ１および垂直回転角ｂ１の回転駆動の終了後、指示された音声パターンで、かつ送信された音声パターンの再生時間（時間ｔ２）で声掛けを行う。さらに、スピーカ３０は、声掛けを行いながら、水平回転角ａ２および垂直回転角ｂ２（第２の回転角度）で回転駆動する。 [Step S50b3] The speaker 30 is rotationally driven at a horizontal rotation angle a1 and a vertical rotation angle b1 (first rotation angle).
[Step S50b4] After the rotational drive to the horizontal rotation angle a1 and the vertical rotation angle b1 is completed, the speaker 30 calls out with the instructed voice pattern for the transmitted playback time of the voice pattern (time t2). Further, while calling out, the speaker 30 is rotationally driven to the horizontal rotation angle a2 and the vertical rotation angle b2 (second rotation angle).

〔ステップＳ５０ｂ５〕スピーカ３０は声掛けを行う。また、スピーカ３０が声掛けを終了すると同時または終了した後に回転駆動が停止する。
このように、制御部１１は、頭部位置の座標から３Ｄ空間上のスピーカ３０の設置位置の座標を減算してスピーカ３０を中心とする頭部位置の座標のベクトルを算出し、ベクトルの水平方向成分にもとづいてスピーカ３０の水平回転角を算出する。
そして、スピーカ３０が水平回転角で回転したときの回転方向成分と、ベクトルの垂直方向成分とにもとづいてスピーカ３０の垂直回転角を算出し、水平回転角および垂直回転角を、スピーカ３０の回転角度とする。これにより、水平方向と垂直方向の２軸回転機構を有するスピーカ３０の回転角度を精度よく求めることができる。 [Step S50b5] The speaker 30 speaks. Further, when the speaker 30 finishes speaking, the rotation drive stops at the same time or after the end.
In this way, the control unit 11 subtracts the coordinates of the installation position of the speaker 30 in the 3D space from the coordinates of the head position to calculate a vector of the head position centered on the speaker 30, and calculates the horizontal rotation angle of the speaker 30 based on the horizontal component of the vector.
Then, the control unit 11 calculates the vertical rotation angle of the speaker 30 based on the rotation direction component obtained when the speaker 30 is rotated by the horizontal rotation angle and on the vertical component of the vector, and takes the horizontal rotation angle and the vertical rotation angle as the rotation angle of the speaker 30. As a result, the rotation angle of the speaker 30, which has a biaxial rotation mechanism for the horizontal and vertical directions, can be obtained accurately.

さらに、制御部１１は、検出した頭部位置にスピーカ３０を向ける第１の回転角度（水平回転角ａ１および垂直回転角ｂ１）を算出し、対象人物の移動先の予測を行わない場合、スピーカ３０を第１の回転角度で回転させ音声パターンをスピーカ３０から出力させる。
また、対象人物の移動先の予測を行う場合、更新後の頭部位置にスピーカ３０を向ける第２の回転角度（水平回転角ａ２および垂直回転角ｂ２）を算出し、スピーカ３０を第１の回転角度で回転させ、第１の回転角度の回転の終了後に、スピーカ３０から音声パターンを出力させながら、スピーカ３０を第２の回転角度で回転させる。
これにより、対象人物の移動に追随するようにスピーカ３０が制御されるので、対象人物が移動することによって、スピーカ３０からの音声が対象人物に到達せずに、対象人物が声掛けを聞き逃してしまうといったことを防止することができる。 Further, when the control unit 11 calculates the first rotation angle (horizontal rotation angle a1 and vertical rotation angle b1) for directing the speaker 30 to the detected head position and does not predict the movement destination of the target person, the speaker 30 is rotated by the first rotation angle to output an audio pattern from the speaker 30.
Further, when predicting the movement destination of the target person, the control unit 11 calculates the second rotation angle (horizontal rotation angle a2 and vertical rotation angle b2) for directing the speaker 30 to the updated head position, rotates the speaker 30 to the first rotation angle, and, after that rotation is completed, rotates the speaker 30 to the second rotation angle while outputting the voice pattern from the speaker 30.
As a result, the speaker 30 is controlled so as to follow the movement of the target person. This prevents a situation in which, because the target person has moved, the voice from the speaker 30 does not reach the target person and the target person misses the call.
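The text states only that the speaker 30 rotates to the second rotation angle while the voice pattern is being output and stops at or after the end of the call. One way to picture this is a schedule of intermediate target angles spread over the playback time t2. The Python sketch below assumes linear interpolation between the first and second rotation angles, which is an illustrative choice and not something the text specifies; the function name and the `steps` parameter are hypothetical.

```python
def sweep_schedule(first, second, playback_t2, steps=10):
    """Return a list of (time, horizontal, vertical) targets.

    first       -- (a1, b1), the first rotation angle
    second      -- (a2, b2), the second rotation angle
    playback_t2 -- playback time t2 of the voice pattern, in seconds
    steps       -- number of intervals in the sweep (assumed parameter)
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    schedule = []
    for i in range(steps + 1):
        fraction = i / steps
        schedule.append((
            fraction * playback_t2,
            first[0] + fraction * (second[0] - first[0]),
            first[1] + fraction * (second[1] - first[1]),
        ))
    return schedule
```

The first entry corresponds to the start of playback at the first rotation angle, and the last entry reaches the second rotation angle when the voice pattern ends.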

このように、第２の実施の形態の情報処理システム１−２では、カメラ２０と非一体型であり指向性を有して回転駆動するスピーカ３０を用いて、カメラ２０で撮影された撮影画像から算出した対象人物の頭部位置に向けてスピーカ３０を回転させて、スピーカ３０から対象人物に音声を出力させる構成とした。これにより、スピーカ設置数を減少させることができるので、システム規模の増加を抑えて、効率よく所定の空間内に位置する人物に対して音声通知を行うことが可能になる。 As described above, the information processing system 1-2 of the second embodiment uses the speaker 30, which is not integrated with the camera 20, has directivity, and is rotationally driven. The speaker 30 is rotated toward the head position of the target person calculated from the image captured by the camera 20, and the speaker 30 outputs the voice to the target person. As a result, the number of speakers installed can be reduced, so that it is possible to suppress an increase in the system scale and efficiently perform voice notification to a person located in a predetermined space.

上記で説明した本発明の情報処理システム１−１、１−２の処理機能は、コンピュータによって実現することができる。この場合、情報処理システム１−１、１−２が有すべき機能の処理内容を記述したプログラムが提供される。そのプログラムをコンピュータで実行することにより、上記処理機能がコンピュータ上で実現される。 The processing functions of the information processing systems 1-1 and 1-2 of the present invention described above can be realized by a computer. In this case, a program that describes the processing contents of the functions that the information processing systems 1-1 and 1-2 should have is provided. By executing the program on a computer, the above processing function is realized on the computer.

処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。コンピュータで読み取り可能な記録媒体としては、磁気記憶部、光ディスク、光磁気記録媒体、半導体メモリ等がある。磁気記憶部には、ハードディスク装置（ＨＤＤ）、フレキシブルディスク（ＦＤ）、磁気テープ等がある。光ディスクには、ＣＤ−ＲＯＭ／ＲＷ等がある。光磁気記録媒体には、ＭＯ（Magneto Optical disk）等がある。 The program that describes the processing content can be recorded on a computer-readable recording medium. Computer-readable recording media include magnetic storage units, optical disks, opto-magnetic recording media, semiconductor memories, and the like. The magnetic storage unit includes a hard disk device (HDD), a flexible disk (FD), a magnetic tape, and the like. Optical discs include CD-ROM / RW and the like. The magneto-optical recording medium includes MO (Magneto Optical disk) and the like.

プログラムを流通させる場合、例えば、そのプログラムが記録されたＣＤ−ＲＯＭ等の可搬型記録媒体が販売される。また、プログラムをサーバコンピュータの記憶部に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することもできる。 When a program is distributed, for example, a portable recording medium such as a CD-ROM on which the program is recorded is sold. It is also possible to store the program in the storage unit of the server computer and transfer the program from the server computer to another computer via the network.

プログラムを実行するコンピュータは、例えば、可搬型記録媒体に記録されたプログラムもしくはサーバコンピュータから転送されたプログラムを、自己の記憶部に格納する。そして、コンピュータは、自己の記憶部からプログラムを読み取り、プログラムに従った処理を実行する。なお、コンピュータは、可搬型記録媒体から直接プログラムを読み取り、そのプログラムに従った処理を実行することもできる。 The computer that executes the program stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage unit. Then, the computer reads the program from its own storage unit and executes processing according to the program. The computer can also read the program directly from the portable recording medium and execute the processing according to the program.

また、コンピュータは、ネットワークを介して接続されたサーバコンピュータからプログラムが転送される毎に、逐次、受け取ったプログラムに従った処理を実行することもできる。また、上記の処理機能の少なくとも一部を、ＤＳＰ、ＡＳＩＣ、ＰＬＤ等の電子回路で実現することもできる。 In addition, the computer can sequentially execute processing according to the received program each time the program is transferred from the server computer connected via the network. Further, at least a part of the above processing functions can be realized by an electronic circuit such as a DSP, ASIC, or PLD.

以上、実施の形態を例示したが、実施の形態で示した各部の構成は同様の機能を有する他のものに置換することができる。また、他の任意の構成物や工程が付加されてもよい。さらに、前述した実施の形態のうちの任意の２以上の構成（特徴）を組み合わせたものであってもよい。 Although the embodiment has been illustrated above, the configuration of each part shown in the embodiment can be replaced with another having the same function. Further, any other components or processes may be added. Further, any two or more configurations (features) of the above-described embodiments may be combined.

１−１情報処理システム
１情報処理装置
１ａ制御部
１ｂ記憶部
２カメラ
３スピーカ
1-1 Information processing system
1 Information processing device
1a Control unit
1b Storage unit
2 Camera
3 Speaker

Claims

An information processing system comprising:
a camera;
a speaker that is not integrated with the camera, has directivity, and is rotationally driven; and
a control unit that identifies a target person from a captured image taken by the camera, detects a head position of the target person, calculates a rotation angle of the speaker for emitting a sound toward the head position, selects a voice pattern to be emitted to the target person, rotates the speaker at the rotation angle, and outputs the voice pattern from the speaker,
wherein the control unit
associates a two-dimensional image of the captured image with a three-dimensional space, detects the coordinates of the feet and of the top of the head (overhead) of the target person from the two-dimensional image, maps the foot coordinates and the overhead coordinates into the three-dimensional space, detects the position of the ear by subtracting a predetermined value from the overhead height of the target person based on the overhead coordinates mapped into the three-dimensional space, and takes the position of the ear as the head position of the target person,
detects a plurality of foot coordinates of the target person from the two-dimensional image at regular time intervals to acquire time-series coordinate data, calculates from the coordinate data the movement amount of the target person after a lapse of a predetermined time, and updates the head position based on the movement amount, and
holds a delay time from the detection of the target person to the output of the voice pattern from the speaker, and calculates the movement amount by including the delay time in the predetermined time, and
wherein the control unit
calculates a first rotation angle at which the speaker is directed to the detected head position,
when the movement destination of the target person is not predicted, rotates the speaker at the first rotation angle and outputs the voice pattern from the speaker, and
when the movement destination of the target person is predicted, calculates a second rotation angle at which the speaker is directed to the updated head position, rotates the speaker at the first rotation angle, and, after the rotation at the first rotation angle is completed, rotates the speaker at the second rotation angle while outputting the voice pattern from the speaker.

The information processing system according to claim 1, wherein the control unit
subtracts the coordinates of the installation position of the speaker in the three-dimensional space from the coordinates of the head position to calculate a vector of the head position centered on the speaker,
calculates a horizontal rotation angle of the speaker based on the horizontal component of the vector,
calculates a vertical rotation angle of the speaker based on the rotation direction component obtained when the speaker is rotated at the horizontal rotation angle and on the vertical component of the vector, and
takes the horizontal rotation angle and the vertical rotation angle as the rotation angle of the speaker.

A control unit that identifies a target person from a captured image taken by a camera, detects the head position of the target person, calculates, for a speaker that is non-integrated with the camera, has directivity, and is rotationally driven, the rotation angle of the speaker for emitting sound toward the head position, selects a voice pattern to be emitted to the target person, rotates the speaker by the rotation angle, and outputs the voice pattern from the speaker, and
A storage unit that stores the voice pattern
are provided.
The control unit
The two-dimensional image of the captured image is associated with the three-dimensional space, the coordinates of the feet and overhead of the target person are detected from the two-dimensional image, and the coordinates of the feet and the coordinates of the overhead are mapped to the three-dimensional space. The position of the ear is detected by subtracting a predetermined value from the overhead height of the target person based on the overhead coordinates mapped to the three-dimensional space, and the position of the ear is used as the head position of the target person.
A plurality of coordinates of the feet of the target person are detected from the two-dimensional image at regular time intervals to acquire time-series coordinate data, the movement amount of the target person after a lapse of a predetermined time is calculated from the coordinate data, and the head position is updated based on the movement amount.
The delay time from the detection of the target person to the output of the voice pattern from the speaker is held, and the movement amount is calculated by including the delay time in the predetermined time.
The control unit
The first rotation angle at which the speaker is directed to the detected head position is calculated.
When the movement destination of the target person is not predicted, the speaker is rotated at the first rotation angle to output the voice pattern from the speaker.
When the movement destination of the target person is predicted, a second rotation angle at which the speaker is directed to the updated head position is calculated, the speaker is rotated by the first rotation angle, and after the rotation by the first rotation angle is completed, the speaker is rotated by the second rotation angle while the voice pattern is output from the speaker.
Information processing device.
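The ear-position step recited in this claim can be illustrated with the short sketch below. It assumes that the feet and overhead coordinates have already been mapped into the three-dimensional space with z as the vertical axis, that the ear lies directly above the feet, and that the predetermined value is 0.12 m. The function name and all three assumptions are hypothetical and are not taken from the claim.

```python
def head_position_from_ear(foot, overhead, ear_offset_m=0.12):
    """Return the ear position, used as the head position of the target person.

    foot, overhead: (x, y, z) coordinates mapped into the three-dimensional space.
    ear_offset_m  : hypothetical predetermined value subtracted from the
                    overhead height, in meters.
    """
    # Overhead height of the person measured from the feet.
    overhead_height = overhead[2] - foot[2]
    # Ear height is the overhead height minus the predetermined value.
    ear_height = overhead_height - ear_offset_m
    # Assumption: the ear is located directly above the foot coordinates.
    return (foot[0], foot[1], foot[2] + ear_height)
```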

On the computer
Identify the target person from the captured image taken by the camera,
The head position of the target person is detected,
For a speaker that is non-integrated with the camera, has directivity, and is rotationally driven, the rotation angle of the speaker for emitting sound toward the head position is calculated.
Select the voice pattern to be emitted to the target person,
The speaker is rotated at the rotation angle to output the voice pattern from the speaker.
The two-dimensional image of the captured image is associated with the three-dimensional space, the coordinates of the feet and overhead of the target person are detected from the two-dimensional image, and the coordinates of the feet and the coordinates of the overhead are mapped to the three-dimensional space. The position of the ear is detected by subtracting a predetermined value from the overhead height of the target person based on the overhead coordinates mapped to the three-dimensional space, and the position of the ear is used as the head position of the target person.
A plurality of coordinates of the feet of the target person are detected from the two-dimensional image at regular time intervals to acquire time-series coordinate data, the movement amount of the target person after a lapse of a predetermined time is calculated from the coordinate data, and the head position is updated based on the movement amount.
The delay time from the detection of the target person to the output of the voice pattern from the speaker is held, and the movement amount is calculated by including the delay time in the predetermined time.
The first rotation angle at which the speaker is directed to the detected head position is calculated.
When the movement destination of the target person is not predicted, the speaker is rotated at the first rotation angle to output the voice pattern from the speaker.
When the movement destination of the target person is predicted, a second rotation angle at which the speaker is directed to the updated head position is calculated, the speaker is rotated by the first rotation angle, and after the rotation by the first rotation angle is completed, the speaker is rotated by the second rotation angle while the voice pattern is output from the speaker.
A program that causes the computer to execute the above processing.
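The two rotation cases recited in the claims can be sketched as the control flow below. The `LoggingSpeaker` class and its `rotate_to` and `start_output` methods are hypothetical stand-ins for a real speaker driver. The sketch assumes that `rotate_to` returns only after the rotation has completed and that `start_output` begins playback without blocking, so that the second rotation proceeds while the voice pattern is being output.

```python
class LoggingSpeaker:
    """Hypothetical stand-in for a rotatable directional speaker."""

    def __init__(self):
        self.log = []

    def rotate_to(self, angle_deg):
        # Assumed to block until the rotation has completed.
        self.log.append(("rotate", angle_deg))

    def start_output(self, pattern):
        # Assumed to start playback and return immediately.
        self.log.append(("output", pattern))


def notify(speaker, pattern, first_angle, second_angle=None):
    """Direct the speaker and output the voice pattern.

    second_angle is None when the movement destination is not predicted.
    """
    # Both cases begin by turning toward the detected head position.
    speaker.rotate_to(first_angle)
    # Output starts only after the first rotation has completed.
    speaker.start_output(pattern)
    if second_angle is not None:
        # Prediction case: keep turning toward the updated head position
        # while the voice pattern is being output.
        speaker.rotate_to(second_angle)
```

Calling `notify(speaker, "greeting", 30.0)` records one rotation followed by the output, while `notify(speaker, "greeting", 30.0, 45.0)` records a second rotation after the output has started.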