JP7176244B2

JP7176244B2 - Robot, robot control method and program

Info

Publication number: JP7176244B2
Application number: JP2018116650A
Authority: JP
Inventors: 哲司牧野; 英里奈市川
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2018-06-20
Filing date: 2018-06-20
Publication date: 2022-11-22
Anticipated expiration: 2038-06-20
Also published as: JP2019219509A

Description

The present invention relates to a robot, a robot control method, and a program.

Robots intended for voice dialogue are used in ordinary homes and similar settings. For such robots to become widespread, it is important to improve the accuracy of dialogue between the robot and a predetermined target, which includes a user. Dialogue between the robot and the predetermined target is difficult when the distance between them is greater than the dialogue distance required for that dialogue, or when there is loud noise.

For example, Patent Literature 1 discloses an interactive robot that estimates the direction of a sound source from a predetermined target, moves in the estimated direction, and interacts with the predetermined target.

Japanese Patent Application Laid-Open No. 2006-181651

However, the technique of Patent Literature 1 merely moves the robot toward the sound source and has it interact with the predetermined target. It is therefore difficult to improve speech recognition accuracy during dialogue between the robot and the predetermined target when the robot is spoken to by a target that is farther away than the dialogue distance, or when it is spoken to while the noise is loud.

The present invention has been made to solve the problem described above, and its object is to provide a robot, a robot control method, and a program that achieve high speech recognition accuracy during dialogue.

To achieve the above object, one aspect of the robot according to the present invention comprises:
operating means for causing the robot to perform an operation;
moving means for moving the robot;
determination means for, when the robot is spoken to by a predetermined target, detecting the distance between the robot and the predetermined target and determining whether the detected distance exceeds an interactive distance within which speech recognition accuracy for dialogue with the predetermined target can be ensured; and
control means for controlling the moving means so that the robot moves toward the predetermined target when the determination means determines that the detected distance exceeds the interactive distance,
wherein, when the robot is spoken to again by the predetermined target while moving toward the predetermined target, the control means controls the operating means, after the determination means determines that the distance between the robot and the predetermined target has become equal to or less than the interactive distance, to perform an operation indicating that the content spoken during the movement could not be recognized.

To achieve the above object, one aspect of the robot control method according to the present invention is a method for controlling a robot that comprises operating means for causing the robot to perform an operation and moving means for moving the robot, the method including:
a determination step of, when the robot is spoken to by a predetermined target, detecting the distance between the robot and the predetermined target and determining whether the detected distance exceeds an interactive distance within which speech recognition accuracy for dialogue with the predetermined target can be ensured; and
a control step of controlling the operating means to make a first response for causing the predetermined target to interrupt the speaking when the determination step determines that the detected distance exceeds the interactive distance,
wherein, when the determination step determines that the distance exceeds the interactive distance, the moving means is controlled so that the robot moves toward the predetermined target, and
when the robot is spoken to again by the predetermined target while moving toward the predetermined target, the operating means is controlled, after it is determined that the distance between the robot and the predetermined target has become equal to or less than the interactive distance, to perform an operation indicating that the content spoken during the movement could not be recognized.

To achieve the above object, one aspect of the program according to the present invention causes a computer that controls a robot comprising operating means for causing the robot to perform an operation and moving means for moving the robot to function as:
determination means for, when the robot is spoken to by a predetermined target, detecting the distance between the robot and the predetermined target and determining whether the detected distance exceeds an interactive distance within which speech recognition accuracy for dialogue with the predetermined target can be ensured; and
control means for controlling the operating means to make a first response for causing the predetermined target to interrupt the speaking when the determination means determines that the detected distance exceeds the interactive distance,
wherein the control means
controls the moving means so that the robot moves toward the predetermined target when the determination means determines that the distance exceeds the interactive distance, and,
when the robot is spoken to again by the predetermined target while moving toward the predetermined target, controls the operating means, after the determination means determines that the distance between the robot and the predetermined target has become equal to or less than the interactive distance, to perform an operation indicating that the content spoken during the movement could not be recognized.

According to the present invention, it is possible to provide a robot, a robot control method, and a program that achieve high speech recognition accuracy during dialogue.

FIG. 1 is a diagram showing a robot according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the robot according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing first dialogue processing according to the first embodiment of the present invention.
FIG. 4 is a diagram explaining the first dialogue processing according to the first embodiment of the present invention.
FIG. 5 is a diagram explaining the first dialogue processing according to the first embodiment of the present invention.
FIG. 6 is a diagram explaining the first dialogue processing according to the first embodiment of the present invention.
FIG. 7 is a diagram explaining the first dialogue processing according to the first embodiment of the present invention.
FIG. 8 is a block diagram showing the configuration of a robot according to a second embodiment of the present invention.
FIG. 9 is a flowchart showing second dialogue processing according to the second embodiment of the present invention.
FIG. 10 is a diagram showing the relationship between the interactive distance and the noise volume according to a modification of the present invention.
FIG. 11 is a diagram showing the relationship between the reference value of the noise volume and the distance between the user and the robot according to a modification of the present invention.

Hereinafter, a robot according to an embodiment for carrying out the present invention will be described with reference to the drawings.

(First embodiment)
As shown in FIG. 1, the robot 100 according to the first embodiment has a stylized human shape. It includes a head 101 on which members imitating eyes, a mouth, and a nose are arranged; a body (housing) 102 on which members imitating legs are arranged; a hand 103 arranged on the body 102; a microphone 104, an imaging unit 105, a speaker 106, and a display unit 107 arranged on the head 101; a moving unit 108 arranged on the bottom; and an operation button 120 provided on the back of the body 102. A control unit 110 and a power supply unit 130 are housed inside the body 102. The hand 103, the microphone 104, the imaging unit 105, the speaker 106, and the display unit 107 function as operating means.

The hand 103 is operated by a drive unit (not shown) under the control of the control unit 110. For example, by placing the hand 103 behind the ear, the robot 100 expresses through a gesture that it cannot hear the voice.

The microphones 104 are arranged at the right ear, the left ear, and the back of the head 101, and pick up sound. The microphone 104 at the right ear picks up sound coming from the front right, the microphone 104 at the left ear picks up sound coming from the front left, and the microphone 104 at the back of the head picks up sound coming from behind. Each microphone 104 outputs the sound it picks up to the control unit 110. In this way, the microphones 104 function as voice input means for inputting voice.

The imaging unit 105 is a camera provided at the position of the nose on the head 101. The imaging unit 105 captures an image of a predetermined target such as the user U and outputs data representing the captured image to the control unit 110. In this way, the imaging unit 105 functions as imaging means for capturing images.

The speaker 106 is provided at the position of the mouth on the head 101 and utters speech under the control of the control unit 110. In this way, the speaker 106 functions as voice output means for outputting voice.

The display unit 107 is provided at the position of the eyes on the head 101 and displays an image of eyes under the control of the control unit 110.

The moving unit 108 consists of motors and tires and moves the robot 100 autonomously under the control of the control unit 110. The moving unit 108 makes the robot 100 move forward, move backward, turn right, turn left, pivot to the right, and pivot to the left. In this way, the moving unit 108 functions as moving means.

The control unit 110 consists of a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory). The ROM is a non-volatile memory such as a flash memory. It stores the programs with which the control unit 110 realizes its various functions, data representing the voice patterns of calling voices, voice data for the notification that starts a dialogue, and voice data used as a countermeasure when the robot is spoken to while moving. The RAM is a volatile memory and is used as a work area in which the control unit 110 executes the programs for its various processes. The RAM also stores whether the talking-while-moving flag is ON or OFF. When the CPU reads the programs stored in the ROM and executes them on the RAM, the control unit 110 functions as a voice analysis unit 111, a movement control unit 112, a determination unit 113, and a dialogue control unit 114, as shown in FIG. 2. The movement control unit 112 and the dialogue control unit 114 function as control means.

The voice analysis unit 111 detects a voice uttered by the user U and determines whether that voice includes a pre-registered calling voice. Calling voices include, for example, the name of the robot, "hey", and "listen for a moment". Because the calling voice is matched against pre-registered voice patterns, it can be detected even when the user U is farther away than the interactive distance D. The voice analysis unit 111 also detects the direction of the sound source from which the calling voice was uttered. Specifically, it detects this direction from the volume difference or phase difference of the calling voice picked up by the microphone 104 at the right ear of the head 101, the microphone 104 at the left ear, and the microphone 104 at the back of the head. In addition, the voice analysis unit 111 determines whether the user U is speaking to the robot 100 while the robot 100 is moving. When it determines that the user U is speaking to the robot 100 during movement, the voice analysis unit 111 sets the talking-while-moving flag to ON.
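As an illustration of direction detection from volume differences, the following is a minimal sketch and not the method disclosed in the patent. It assumes each microphone reports a non-negative loudness value and that the microphones point roughly front right, front left, and straight back; the bearing angles, the name `MIC_BEARINGS_DEG`, and the function `estimate_bearing` are all hypothetical.

```python
import math

# Hypothetical pointing directions in degrees, measured clockwise from the
# robot's forward direction. The text only says the microphones pick up sound
# from the front right, the front left, and behind; the angles are assumed.
MIC_BEARINGS_DEG = {"right": 45.0, "left": -45.0, "back": 180.0}


def estimate_bearing(levels):
    """Return the estimated source bearing in degrees, in (-180, 180].

    `levels` maps a microphone name to a non-negative loudness (e.g. RMS).
    Each microphone's unit direction vector is weighted by its loudness and
    the vectors are summed; the angle of the sum is the estimate.
    Positive bearings are to the robot's right, negative to its left.
    """
    x = y = 0.0
    for name, level in levels.items():
        theta = math.radians(MIC_BEARINGS_DEG[name])
        x += level * math.cos(theta)  # forward component
        y += level * math.sin(theta)  # rightward component
    if x == 0.0 and y == 0.0:
        raise ValueError("levels carry no directional information")
    return math.degrees(math.atan2(y, x))
```

A louder right-ear microphone yields a positive bearing, which a movement controller could interpret as "pivot to the right". A phase-difference (time-of-arrival) approach, which the text also mentions, would need sample-level audio and is not sketched here.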

The movement control unit 112 controls the moving unit 108 to pivot the robot 100 so that it faces the sound source from which the calling voice was uttered. Specifically, the movement control unit 112 pivots the robot 100 to the right or left toward the direction detected by the voice analysis unit 111, or toward the face of the user U detected by the imaging unit 105, so that the robot 100 faces the sound source. The movement control unit 112 also controls the moving unit 108 to move the robot 100 toward the user U. Furthermore, after it has controlled the moving unit 108 so that the robot 100 moves toward the user U from a position beyond the interactive distance D, the movement control unit 112 controls the moving unit 108 to stop the robot 100 when it is determined that the robot 100 has come within the interactive distance D.
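The pivot and stop decisions described above can be sketched as two small functions. This is only an illustration under assumptions: the sign convention (positive bearing means the source is to the right), the dead-band `tolerance_deg`, and the function names `choose_pivot` and `should_stop` are hypothetical and do not come from the patent.

```python
def choose_pivot(bearing_deg, tolerance_deg=5.0):
    """Return (direction, angle) needed to face a source at `bearing_deg`.

    `direction` is "right", "left", or "none"; `angle` is in degrees.
    Positive bearings are assumed to lie to the robot's right.
    """
    # Normalize the bearing into (-180, 180] so the shorter pivot is chosen.
    b = (bearing_deg + 180.0) % 360.0 - 180.0
    if b == -180.0:
        b = 180.0
    if abs(b) < tolerance_deg:
        return ("none", 0.0)  # already facing the source closely enough
    return ("right", b) if b > 0 else ("left", -b)


def should_stop(distance_m, interactive_distance_m=1.0):
    """Stop approaching once the user is within the interactive distance D."""
    return distance_m <= interactive_distance_m
```

The default of 1.0 m mirrors the example value of D given in the text; a real controller would also need obstacle handling, which the source does not describe.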

When the user U speaks to the robot 100, the determination unit 113 determines whether dialogue between the user U and the robot 100 is possible. It detects the face of the user U in the image captured by the imaging unit 105 and detects the distance to that face. The determination unit 113 may detect the distance from the size of the face, or from the distance between the left and right eyes. The determination unit 113 then determines whether dialogue between the user U and the robot 100 is possible according to whether the distance to the face of the user U is equal to or less than the interactive distance D. The interactive distance D is a distance within which speech picked up by the microphones 104 can be recognized well enough to be converted into text, even when no voice pattern has been registered in advance. The interactive distance D is, for example, 1 m. In this way, the determination unit 113 functions as determination means.
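One common way to turn the pixel distance between the eyes into a range estimate is the pinhole camera model, distance = focal length × real size / image size. The sketch below assumes that model; the patent does not state which formula is used. The focal length, the assumed real eye spacing of 0.063 m, and the function names are illustrative assumptions, while the 1 m threshold is the example value of D from the text.

```python
ASSUMED_EYE_SPACING_M = 0.063    # assumed typical adult eye spacing
ASSUMED_FOCAL_LENGTH_PX = 600.0  # hypothetical camera focal length in pixels
INTERACTIVE_DISTANCE_M = 1.0     # example value of D given in the text


def estimate_distance_m(eye_spacing_px,
                        focal_length_px=ASSUMED_FOCAL_LENGTH_PX,
                        eye_spacing_m=ASSUMED_EYE_SPACING_M):
    """Estimate the camera-to-face distance with the pinhole camera model."""
    if eye_spacing_px <= 0:
        raise ValueError("eye spacing in pixels must be positive")
    return focal_length_px * eye_spacing_m / eye_spacing_px


def can_interact(distance_m, interactive_distance_m=INTERACTIVE_DISTANCE_M):
    """Dialogue is treated as possible when the distance is D or less."""
    return distance_m <= interactive_distance_m
```

Under these assumed constants, eyes that appear half as far apart in the image correspond to a face twice as far away, so the check flips from possible to not possible as the user moves back past D.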

The dialogue control unit 114 performs the operation that blocks the start of dialogue (a first response for interrupting the dialogue), the notification that starts dialogue (a second response, different from the first response, for continuing the dialogue), the countermeasure for being spoken to while moving, and the dialogue itself. Until dialogue between the robot 100 and the user U becomes possible, the dialogue control unit 114 performs the operation that blocks the start of dialogue. Specifically, as this operation, it utters speech that blocks the start of dialogue, such as "Yes, I'm coming" or "Wait a moment", or performs a gesture of placing the hand 103 behind the ear to indicate that it cannot hear. When, after performing the blocking operation, it is determined that dialogue between the robot 100 and the user U has become possible, the dialogue control unit 114 performs the notification that starts dialogue. As this notification, it may utter the name of the user U or a phrase such as "What is it?" from the speaker 106, or it may perform a gesture of extending the hand 103 forward to start the dialogue. The dialogue control unit 114 also determines whether the talking-while-moving flag is ON and, if so, utters speech from the speaker 106 as a countermeasure for having been spoken to while moving. This speech includes phrases such as "I couldn't hear you" and "Please say that again".

The operation button 120 is provided on the back of the body 102. It is a button for operating the robot 100 and includes a power button.

The power supply unit 130 consists of a rechargeable battery built into the body 102 and supplies power to each part of the robot 100.

Next, the first dialogue processing executed by the robot 100 configured as described above will be explained. In the first dialogue processing, the robot 100 blocks the start of dialogue when the distance between the user U and the robot 100 exceeds the interactive distance D, and performs the notification that starts dialogue once it has come within the interactive distance D.

When the user U operates the operation button 120 to turn on the power, the robot 100 responds to the power-on instruction and starts the first dialogue processing shown in FIG. 3. The first dialogue processing executed by the robot 100 is described below with reference to the flowchart.

First, the voice analysis unit 111 detects a voice uttered by the user U (step S101). Next, the voice analysis unit 111 determines whether the voice uttered by the user U includes a pre-registered calling voice (step S102). Calling voices include, for example, the name of the robot, "hey", and "listen for a moment". If it is determined that no calling voice is included (step S102; No), steps S101 and S102 are repeated.

If it is determined that the voice uttered by the user U includes a calling voice (step S102; Yes), the voice analysis unit 111 sets the talking-while-moving flag to OFF (step S103). Next, the voice analysis unit 111 detects the direction of the sound source from which the calling voice was uttered (step S104). Specifically, it detects this direction from the volume difference or phase difference of the calling voice picked up by the microphone 104 at the right ear of the head 101, the microphone 104 at the left ear, and the microphone 104 at the back of the head. Next, the movement control unit 112 pivots the robot 100 so that it faces the sound source from which the calling voice was uttered (step S105). Specifically, the movement control unit 112 controls the moving unit 108 to pivot the robot 100 to the right or left until it faces the sound source. Next, the determination unit 113 detects the face of the user U (step S106) and then detects the distance to the face of the user U (step S107). The determination unit 113 may detect the distance from the size of the face, or from the distance between the left and right eyes.

Next, the determination unit 113 determines whether the distance to the face of the user U is equal to or less than the interactive distance D (step S108). The interactive distance D is, for example, 1 m. If it is determined that the distance to the face of the user U is not equal to or less than the interactive distance D (step S108; No), the dialogue control unit 114 performs the operation that blocks the start of dialogue (step S109). As this operation, the robot 100 may utter speech from the speaker 106 that blocks the start of dialogue, such as "Yes, I'm coming" or "Wait a moment", or may perform a gesture of placing the hand 103 behind the ear to indicate that it cannot hear. Next, the movement control unit 112 controls the moving unit 108 to move the robot 100 toward the user U (step S110). Next, the voice analysis unit 111 determines whether the user U is speaking to the robot 100 during the movement (step S111). If it determines that the user U is speaking during the movement (step S111; Yes), the voice analysis unit 111 sets the talking-while-moving flag to ON (step S112), and the process returns to step S106. If it determines that the user U is not speaking during the movement (step S111; No), the process returns to step S106.

If it is determined that the distance to the face of the user U is equal to or less than the interactive distance D (step S108; Yes), the dialogue control unit 114 performs the notification that starts dialogue (step S113). Specifically, the dialogue control unit 114 may utter a phrase such as "What is it?" from the speaker 106, or may perform a gesture of extending the hand 103 forward to start the dialogue. At this time, the movement control unit 112 controls the moving unit 108 to stop the robot 100. Next, the dialogue control unit 114 determines whether the talking-while-moving flag is ON (step S114). If the flag is determined to be ON (step S114; Yes), the robot 100 utters speech from the speaker 106 as a countermeasure for having been spoken to while moving (step S115). This speech includes phrases such as "I couldn't hear you" and "Please say that again". The dialogue control unit 114 then executes the dialogue (step S116). If the flag is determined not to be ON (step S114; No), the dialogue control unit 114 executes the dialogue directly (step S116). After that, the voice analysis unit 111 determines whether an end instruction has been received (step S117). If it determines that no end instruction has been received (step S117; No), the process returns to step S101, and steps S101 to S117 are repeated. If it determines that an end instruction was received during the dialogue (step S117; Yes), the first dialogue processing ends.
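The control flow of steps S103 to S116 for a single calling voice can be sketched as a loop over scripted sensor readings. This is a simplified illustration, not the patented implementation: the `Observation` class, the action names in the log, and the function `first_dialogue_process` are hypothetical, real sensing and actuation are replaced by the scripted list, and the outer loop over steps S101, S102, and S117 is omitted.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One pass through steps S106 to S111 (scripted stand-in for sensors)."""
    distance_m: float                     # S107: measured distance to the face
    spoken_to_while_moving: bool = False  # S111: result if the robot moves


def first_dialogue_process(observations, interactive_distance_m=1.0):
    """Return the list of actions taken after a calling voice is detected."""
    log = []
    talked_while_moving = False          # S103: flag set to OFF
    moving = False
    log.append("turn_to_sound_source")   # S104 and S105
    for obs in observations:             # S106 and S107
        if obs.distance_m <= interactive_distance_m:  # S108; Yes
            if moving:
                log.append("stop")       # the robot stops at the time of S113
            log.append("notify_dialogue_start")       # S113
            if talked_while_moving:      # S114
                log.append("say_did_not_hear")        # S115
            log.append("dialogue")       # S116
            return log
        log.append("block_dialogue_start")            # S109
        log.append("move_toward_user")                # S110
        moving = True
        if obs.spoken_to_while_moving:   # S111
            talked_while_moving = True   # S112: flag set to ON
    log.append("target_not_reached")     # scripted observations ran out
    return log
```

For example, calling `first_dialogue_process([Observation(2.5), Observation(1.6, True), Observation(0.9)])` follows the scenario in which the user speaks during the approach: the log contains two block-and-move cycles and then the start notification, the "did not hear" prompt, and the dialogue. As in the flowchart, the blocking operation of step S109 repeats on every pass through the loop while the user is still beyond D.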

Next, the first dialogue processing executed by the robot 100 according to the present embodiment will be described using a specific example, with reference to FIGS. 4 to 7.

When the user U operates the operation button 120 to turn on the power, the robot 100 responds to the power-on instruction and starts the first dialogue processing. Assume that the robot 100 has traveled on its own to a position farther from the user U than the interactive distance D.

When the user U calls out "listen for a moment" to the robot 100 shown in FIG. 4, the voice analysis unit 111 detects the voice uttered by the user U (step S101; FIG. 3). Because the voice uttered by the user U includes "listen for a moment", the voice analysis unit 111 determines that it includes a pre-registered calling voice (step S102; Yes; FIG. 3).

Next, the voice analysis unit 111 sets the talking-while-moving flag to OFF (step S103; FIG. 3) and detects the direction of the sound source from which the calling voice was uttered (step S104; FIG. 3). Because the user U is located to the right of the robot 100 at this time, the voice analysis unit 111 detects, from the volume difference or phase difference of the calling voice picked up by the microphones 104 at the right ear, the left ear, and the back of the head, that the sound source is to the right. Next, as shown in FIG. 5, the movement control unit 112 pivots the robot 100 to the right so that it faces the sound source from which the calling voice was uttered (step S105; FIG. 3). The determination unit 113 then detects the face of the user U (step S106; FIG. 3) and detects the distance D1 to the face of the user U (step S107; FIG. 3).

Next, the determination unit 113 determines whether the distance to the face of the user U is equal to or less than the interactive distance D (step S108; FIG. 3). Because the distance D1 is greater than the interactive distance D, it is determined that the distance to the face of the user U is not equal to or less than the interactive distance D (step S108; No; FIG. 3), and the dialogue control unit 114 performs the operation that blocks the start of dialogue (step S109; FIG. 3). As this operation, the robot 100 performs a gesture of placing the hand 103 behind the ear to indicate that it cannot hear, as shown in FIG. 6, or utters speech from the speaker 106 that blocks the start of dialogue, such as "Yes, I'm coming" or "Wait a moment". Next, the movement control unit 112 controls the moving unit 108 to move the robot 100 toward the user U (step S110; FIG. 3). When the user U then speaks to the robot 100 during the movement, the voice analysis unit 111 determines that the user U is speaking during the movement (step S111; Yes; FIG. 3). The voice analysis unit 111 then sets the talking-while-moving flag to ON (step S112; FIG. 3), and the process returns to step S106.

ロボット１００が図７に示す位置まで移動すると、判定部１１３は、ユーザの顔までの距離Ｄ２が対話可能距離Ｄ以下であると判定する（ステップＳ１０８；Ｙｅｓ；図３）。次に、対話制御部１１４は、対話を開始する通知を行う（ステップＳ１１３；図３）。具体的には、対話制御部１１４は、手部１０３を前に出して対話を開始するジェスチャーを実行し、スピーカ１０６から「何でしょうか」などの対話を開始する通知を音声により発話する。次に、対話制御部１１４は、移動中話しかけＦｌａｇがＯＮであると判定し（ステップＳ１１４；Ｙｅｓ；図３）、スピーカ１０６から移動中話しかけ対策のための音声「聞こえなかった」を発話（ステップＳ１１５；図３）する。次に、対話制御部１１４は、対話を実行する（ステップＳ１１６；図３）。その後、音声解析部１１１は、対話中に終了指示を受け付けたと判定すると（ステップＳ１１７；Ｙｅｓ；図３）、第１の対話処理を終了する。 When the robot 100 has moved to the position shown in FIG. 7, the determination unit 113 determines that the distance D2 to the user's face is equal to or less than the interactive distance D (step S108; Yes; FIG. 3). Next, the dialogue control unit 114 gives a notification that dialogue will start (step S113; FIG. 3). Specifically, the dialogue control unit 114 performs a gesture of putting the hand part 103 forward to start the dialogue, and utters from the speaker 106 a spoken notification that dialogue will start, such as "What is it?". Next, the dialogue control unit 114 determines that the talking-while-moving flag is ON (step S114; Yes; FIG. 3), and utters from the speaker 106 the voice "I couldn't hear you" as a countermeasure for having been spoken to during movement (step S115; FIG. 3). Next, the dialogue control unit 114 executes the dialogue (step S116; FIG. 3). After that, when the voice analysis unit 111 determines that an end instruction has been received during the dialogue (step S117; Yes; FIG. 3), the first dialogue process ends.
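
The loop through steps S103 to S116 can be summarized as a small simulation. This is a sketch under stated assumptions, not the claimed implementation: the function name, the action labels, the fixed `step` travelled per movement cycle, and the `spoken_during_move` sequence are hypothetical, and the sketch assumes the process returns to the distance check whether or not the user speaks during movement.

```python
from typing import Iterable, List


def first_dialogue_actions(
    distance: float,
    interactive_distance: float,
    step: float,
    spoken_during_move: Iterable[bool],
) -> List[str]:
    """Return the ordered actions of one pass through the first dialogue process."""
    if step <= 0 or interactive_distance < 0:
        raise ValueError("step must be positive and the distance D non-negative")
    actions: List[str] = []
    talked_while_moving = False                 # S103: flag OFF
    spoken = iter(spoken_during_move)
    while distance > interactive_distance:      # S108; No
        actions.append("block_start")           # S109: "cannot hear" gesture or voice
        distance = max(distance - step, 0.0)    # S110: move toward the user
        actions.append("move")
        if next(spoken, False):                 # S111: spoken to during movement?
            talked_while_moving = True          # S112: flag ON
    actions.append("notify_start")              # S113: e.g. "What is it?"
    if talked_while_moving:                     # S114; Yes
        actions.append("say_could_not_hear")    # S115
    actions.append("dialogue")                  # S116
    return actions
```

Calling `first_dialogue_actions(3.0, 1.0, 1.0, [False, True])` yields two block-and-move cycles, then the start notification, the "could not hear" utterance, and the dialogue, mirroring the FIG. 4 to FIG. 7 walkthrough.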

以上のように、本実施の形態のロボット１００によれば、呼びかけ音声を検出し、ユーザＵとロボット１００の距離が、対話の音声認識精度が確保できる対話可能距離Ｄよりも大きい場合に、明示的にまだ対話出来ない旨を表し、ユーザＵに対話可能距離Ｄ以下の位置まで近づくまで不用な対話開始を阻止することで、音声認識精度を向上することができる。また、ロボット１００が対話を開始する通知を行うことで、移動後の対話をスムーズに行うことを可能にする。 As described above, according to the robot 100 of the present embodiment, when a calling voice is detected and the distance between the user U and the robot 100 is greater than the interactive distance D at which the speech recognition accuracy of the dialogue can be ensured, the robot explicitly indicates that dialogue is not yet possible and prevents an unnecessary start of dialogue until it has approached to within the interactive distance D of the user U, so that speech recognition accuracy can be improved. In addition, because the robot 100 gives a notification that dialogue will start, dialogue after the movement can proceed smoothly.

（第２の実施の形態） (Second embodiment)
第１の実施の形態に係るロボット１００は、移動部１０８により自律移動し、ユーザＵがロボット１００に呼びかけたとき、ロボット１００がユーザＵから離れているとユーザＵに移動して近づくものである。これに対して、第２の実施の形態に係るロボット２００は、自律移動せず、雑音の音量がユーザＵと対話をすることができる基準値以上である場合、雑音の音量が基準値以下になるまでの間、ユーザＵに対話の開始を阻止する動作を実行する。 The robot 100 according to the first embodiment moves autonomously by means of the moving unit 108, and when the user U calls the robot 100 while the robot 100 is away from the user U, it moves toward and approaches the user U. In contrast, the robot 200 according to the second embodiment does not move autonomously; when the noise volume is equal to or greater than the reference value at which dialogue with the user U is possible, it performs an operation to prevent the user U from starting dialogue until the noise volume becomes equal to or less than the reference value.

第２の実施の形態に係るロボット２００は、図８に示すように、ロボット１００の構成で有していた移動部１０８及び移動制御部１１２を有さず、ロボット１００の構成に加えて、唇動作検出部１１５を備える。 As shown in FIG. 8, the robot 200 according to the second embodiment does not have the moving unit 108 or the movement control unit 112 included in the configuration of the robot 100, and additionally includes a lip motion detection unit 115.

音声解析部１１１は、ユーザＵから発せられた音声を検出し、ユーザＵから発せられた音声に、予め登録された呼びかけ音声を含んでいるか否かを判定する。また、音声解析部１１１は、ユーザＵから発せられた音声以外の音を雑音として検出する。また、音声解析部１１１は、雑音がテレビジョン、オーディオ機器、楽器などの音響機器から発せられた音であるか否かを判定する。音響機器から発せられた音か否かは、音に音声又は楽器の音を含むか、音にリズムを有するか、等により判定する。 The voice analysis unit 111 detects the voice uttered by the user U, and determines whether or not the voice uttered by the user U includes a pre-registered calling voice. Also, the voice analysis unit 111 detects sounds other than the voice uttered by the user U as noise. Also, the voice analysis unit 111 determines whether or not the noise is sound emitted from an audio device such as a television, an audio device, or a musical instrument. Whether or not the sound is emitted from an audio device is determined based on whether the sound includes a voice or the sound of a musical instrument, whether the sound has a rhythm, and the like.

判定部１１３は、音声解析部１１１により検出された雑音の音量がユーザＵと対話をすることができる基準値以下である場合に、ロボット２００とユーザＵとの間の対話が可能であると判定する。判定部１１３は、唇動作検出部１１５により検出されたユーザＵの唇の動作により、ユーザＵがロボット２００に呼びかけたか否かを判定する。具体的には、判定部１１３は、ユーザＵの唇の動作にロボットの名前、「おーい」、「ちょっと聞いて」と発音したときの動きが含まれるか否かにより判定する。 The determination unit 113 determines that dialogue between the robot 200 and the user U is possible when the noise volume detected by the voice analysis unit 111 is equal to or less than the reference value at which dialogue with the user U is possible. The determination unit 113 also determines, from the lip motion of the user U detected by the lip motion detection unit 115, whether or not the user U has called the robot 200. Specifically, the determination unit 113 makes this determination based on whether or not the lip motion of the user U includes the movements made when pronouncing the robot's name, "Hey", or "Listen".

対話制御部１１４は、判定部１１３により雑音の音量が基準値を超えると判定された場合に、雑音の音量が基準値以下になるまでの間、ユーザＵへ対話の開始を阻止する動作を実行する。また、対話制御部１１４は、音声解析部１１１により雑音がテレビジョン、オーディオ機器、楽器などの音響機器から発せられたものであると判定されると、音響機器からの音量を下げる指示を示す音声を発話する。 When the determination unit 113 determines that the noise volume exceeds the reference value, the dialogue control unit 114 performs an operation to prevent the user U from starting dialogue until the noise volume becomes equal to or less than the reference value. In addition, when the voice analysis unit 111 determines that the noise is emitted from acoustic equipment such as a television, an audio device, or a musical instrument, the dialogue control unit 114 utters a voice instructing that the volume of the acoustic equipment be lowered.

唇動作検出部１１５は、撮像部１０５により撮像された画像からユーザＵの顔を検出し、ユーザＵの唇の動作を検出する。このように、唇動作検出部１１５は、唇動作検出手段として機能する。 The lip motion detection unit 115 detects the face of the user U from the image captured by the imaging unit 105 and detects the motion of the user U's lips. Thus, the lip motion detection unit 115 functions as lip motion detection means.

次に、第２の実施の形態に係るロボット２００が実行する第２の対話処理を説明する。 Next, a second interactive process executed by the robot 200 according to the second embodiment will be described.

ユーザＵが操作ボタン１２０を操作し電源をＯＮにすると、ロボット２００は電源をＯＮにする指示に応答し、図９に示す第２の対話処理を開始する。以下、ロボット２００が実行する第２の対話処理を、フローチャートを用いて説明する。 When the user U operates the operation button 120 to turn on the power, the robot 200 responds to the instruction to turn on the power and starts the second interactive process shown in FIG. The second interactive processing executed by the robot 200 will be described below using a flowchart.

まず、音声解析部１１１は、ユーザＵから発せられた音声を検出する（ステップＳ２０１）。次に、音声解析部１１１は、ユーザＵから発せられた音声に、予め登録された呼びかけ音声を含んでいるか否かを判定する（ステップＳ２０２）。呼びかけ音声は、例えば、ロボット２００の名前、「おーい」、「ちょっと聞いて」などが含まれる。呼びかけ音声が含まれていないと判定されると（ステップＳ２０２；Ｎｏ）、唇動作検出部１１５は、撮像部１０５により撮像された画像からユーザＵの顔を検出する（ステップＳ２０３）。次に、唇動作検出部１１５は、ユーザＵの唇の動作を検出する（ステップＳ２０４）。次に、判定部１１３は、ユーザＵの唇の動作により、ユーザＵがロボット２００に呼びかけたか否かを判定する（ステップＳ２０５）。具体的には、判定部１１３は、ユーザＵの唇の動作にロボット２００の名前、「おーい」、「ちょっと聞いて」と発音したときの動きが含まれるか否かにより判定する。呼びかけがなかったと判定されると（ステップＳ２０５；Ｎｏ）、ステップＳ２０１～ステップＳ２０４を繰り返す。 First, the voice analysis unit 111 detects voice uttered by the user U (step S201). Next, the voice analysis unit 111 determines whether or not the voice uttered by the user U includes a pre-registered calling voice (step S202). The calling voice includes, for example, the name of the robot 200, "hey", "listen", and the like. If it is determined that the calling voice is not included (step S202; No), the lip motion detection unit 115 detects the face of the user U from the image captured by the imaging unit 105 (step S203). Next, the lip motion detector 115 detects the lip motion of the user U (step S204). Next, the determination unit 113 determines whether or not the user U has called out to the robot 200 based on the motion of the lips of the user U (step S205). Specifically, the determination unit 113 determines whether or not the movement of the lips of the user U includes the movement when the name of the robot 200, “hey”, and “listen” are pronounced. If it is determined that there is no calling (step S205; No), steps S201 to S204 are repeated.

音声解析部１１１により呼びかけ音声が含まれていると判定される（ステップＳ２０２；Ｙｅｓ）、又は判定部１１３により、ユーザＵがロボット２００に呼びかけたと判定されると（ステップＳ２０５；Ｙｅｓ）、音声解析部１１１は、ユーザＵから発せられた音声以外の音を雑音として検出する（ステップＳ２０６）。次に、判定部１１３は、音声解析部１１１により検出された雑音の音量がユーザＵと対話をすることができる基準値以下であるか否かを判定する（ステップＳ２０７）。 When the voice analysis unit 111 determines that the calling voice is included (step S202; Yes), or when the determination unit 113 determines that the user U has called the robot 200 (step S205; Yes), the voice analysis unit 111 detects sounds other than the voice uttered by the user U as noise (step S206). Next, the determination unit 113 determines whether or not the noise volume detected by the voice analysis unit 111 is equal to or less than the reference value at which dialogue with the user U is possible (step S207).

判定部１１３により、雑音の音量が基準値以下でないと判定されると（ステップＳ２０７；Ｎｏ）、対話制御部１１４は、対話の開始を阻止する動作を行う（ステップＳ２０８）。対話の開始を阻止する動作には、スピーカ１０６から「ハーイただいま」「ちょっと待って」などの対話の開始を阻止する音声を発話してもよく、手部１０３を耳の後ろに当てて聞こえないジェスチャーを実行してもよい。音声解析部１１１は、雑音がテレビジョン、オーディオ機器、楽器などの音響機器から発せられた音であるか否かを判定する（ステップＳ２０９）。音響機器から発せられた音か否かは、音に音声又は楽器の音を含むか、音にリズムを有するか、等により判定する。雑音が音響機器から発せられた音であると判定されると（ステップＳ２０９；Ｙｅｓ）、対話制御部１１４は、音響機器からの音量を下げる指示、例えば「ボリュームを下げて」を示す音声を発話し（ステップＳ２１０）、ステップＳ２０６に戻る。雑音が音響機器から発せられた音でないと判定されると（ステップＳ２０９；Ｎｏ）、ステップＳ２０６に戻る。 When the determination unit 113 determines that the noise volume is not equal to or less than the reference value (step S207; No), the dialogue control unit 114 performs an operation to prevent the start of dialogue (step S208). As the operation to prevent the start of dialogue, the robot may utter from the speaker 106 a voice that prevents the start of dialogue, such as "Yes, coming right away" or "Wait a moment", or may perform a gesture of putting the hand part 103 behind the ear to indicate that it cannot hear. The voice analysis unit 111 determines whether or not the noise is sound emitted from acoustic equipment such as a television, an audio device, or a musical instrument (step S209). Whether or not the sound is emitted from acoustic equipment is determined based on, for example, whether the sound includes a voice or the sound of a musical instrument, or whether the sound has a rhythm. When it is determined that the noise is sound emitted from acoustic equipment (step S209; Yes), the dialogue control unit 114 utters a voice instructing that the volume of the acoustic equipment be lowered, for example "Turn down the volume" (step S210), and the process returns to step S206. When it is determined that the noise is not sound emitted from acoustic equipment (step S209; No), the process returns to step S206.

判定部１１３により、雑音の音量が基準値以下であると判定されると（ステップＳ２０７；Ｙｅｓ）、対話制御部１１４は、対話を開始する通知を行う（ステップＳ２１１）。具体的には、対話制御部１１４は、スピーカ１０６から「何でしょうか」などの対話を開始する通知を音声により発話してもよく、手部１０３を前に出して対話を開始するジェスチャーを実行してもよい。次に、対話制御部１１４は、対話を実行する（ステップＳ２１２）。その後、音声解析部１１１は、終了指示を受け付けたか否かを判定する（ステップＳ２１３）。終了指示を受け付けていないと判定すると（ステップＳ２１３；Ｎｏ）、ステップＳ２０１に戻り、ステップＳ２０１からステップＳ２１３を繰り返す。対話中に終了指示を受け付けたと判定すると（ステップＳ２１３；Ｙｅｓ）、第２の対話処理を終了する。 When the determination unit 113 determines that the noise volume is equal to or less than the reference value (step S207; Yes), the dialogue control unit 114 gives a notification that dialogue will start (step S211). Specifically, the dialogue control unit 114 may utter from the speaker 106 a spoken notification that dialogue will start, such as "What is it?", or may perform a gesture of putting the hand part 103 forward to start the dialogue. Next, the dialogue control unit 114 executes the dialogue (step S212). After that, the voice analysis unit 111 determines whether or not an end instruction has been received (step S213). When it is determined that no end instruction has been received (step S213; No), the process returns to step S201, and steps S201 to S213 are repeated. When it is determined that an end instruction has been received during the dialogue (step S213; Yes), the second dialogue process ends.
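
The noise-gating part of the second dialogue process (steps S206 to S212) can be sketched as follows. The function name, the action labels, and the representation of each noise measurement as a `(volume, is_audio_device)` pair are assumptions for illustration; the call detection of steps S201 to S205 and the end instruction of step S213 are outside the sketch.

```python
from typing import Iterable, List, Tuple


def second_dialogue_actions(
    noise_readings: Iterable[Tuple[float, bool]], reference: float
) -> List[str]:
    """Return the ordered actions taken while waiting for the noise to subside.

    Each reading is (noise volume, whether the noise was judged to come from
    acoustic equipment such as a television).
    """
    actions: List[str] = []
    for volume, is_audio_device in noise_readings:   # S206: measure noise
        if volume <= reference:                      # S207; Yes
            actions.append("notify_start")           # S211: e.g. "What is it?"
            actions.append("dialogue")               # S212
            return actions
        actions.append("block_start")                # S208: prevent the start
        if is_audio_device:                          # S209; Yes
            actions.append("ask_lower_volume")       # S210: "Turn down the volume"
    # Readings ran out while the noise was still above the reference value.
    return actions
```

For readings of 70 (from a television), 65, and 40 against a reference value of 50, the sketch blocks twice, asks once for the volume to be lowered, and then starts the dialogue.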

以上のように、第２の実施の形態のロボット２００によれば、雑音の音量が基準値を超える場合、対話の開始を阻止する動作を行う。これにより、雑音により対話が困難である場合に対話が中止されるため、音声認識精度を向上することができる。また、ロボット２００が対話を開始する通知を行うことで、移動後の対話をスムーズに行うことを可能にする。また、唇動作検出部１１５は、撮像部１０５により撮像された画像からユーザＵの唇の動作を検出し、判定部１１３は、ユーザＵの唇の動作により、ユーザＵがロボット２００に呼びかけたか否かを判定する。これにより、雑音が大きい場合でもロボット２００は、ユーザＵがロボット２００に呼びかけたか否かを判定できる。 As described above, according to the robot 200 of the second embodiment, an operation to prevent the start of dialogue is performed when the noise volume exceeds the reference value. As a result, dialogue is called off when noise makes it difficult, so speech recognition accuracy can be improved. In addition, because the robot 200 gives a notification that dialogue will start, dialogue after movement can proceed smoothly. Further, the lip motion detection unit 115 detects the lip motion of the user U from the image captured by the imaging unit 105, and the determination unit 113 determines from that lip motion whether or not the user U has called the robot 200. This allows the robot 200 to determine whether or not the user U has called it even when the noise is loud.

（変形例） (Modification)
前述の実施の形態では、第１の実施の形態に係るロボット１００の判定部１１３は、ユーザＵとの距離が対話可能距離Ｄを超えるか否かで、対話が困難であるか否かを判定した。第２の実施の形態に係るロボット２００の判定部１１３は、雑音の音量が基準値を超えるか否かにより、対話が困難であるか否かを判定した。ロボット１００、２００の判定部１１３は、対話が困難であるか否かを判定できればよい。例えば、ロボット１００、２００の判定部１１３は、ユーザＵとの距離と、雑音と、により対話が困難であるか否かを判定してもよい。例えば、ユーザＵとの距離が対話可能距離Ｄを超え、且つ雑音の音量が基準値を超えた場合に、対話が困難であるか否かを判定してもよい。また、雑音の音量レベルに応じて対話可能距離Ｄを変更してもよい。例えば、図１０に示すように、雑音の音量が大きいとき、対話可能距離Ｄを小さくし、雑音の音量が小さいとき、対話可能距離Ｄを大きくする。この場合、ロボット１００、２００は、対話可能距離Ｄを関数により算出してもよく、対話可能距離ＤをあらかじめＲＯＭに記憶したテーブルにより得てもよい。また、ユーザＵと対話をすることができる雑音の音量の基準値は、ユーザＵとロボット１００、２００の距離により変化してもよい。例えば、図１１に示すように、ユーザＵとロボット１００、２００の距離が大きいとき、雑音の音量の基準値を小さくする。この場合、ロボット１００、２００は、雑音の音量の基準値を関数により算出してもよく、雑音の音量の基準値をあらかじめＲＯＭに記憶したテーブルにより得てもよい。また、ロボット１００、２００の判定部１１３は、ユーザＵの声の大きさ、音声認識で文字データに変換できる割合、対話に用いる言語などにより対話が困難であるか否かを判定してもよい。 In the above-described embodiments, the determination unit 113 of the robot 100 according to the first embodiment determined whether or not dialogue is difficult based on whether or not the distance to the user U exceeds the interactive distance D. The determination unit 113 of the robot 200 according to the second embodiment determined whether or not dialogue is difficult based on whether or not the noise volume exceeds the reference value. The determination unit 113 of the robots 100 and 200 need only be able to determine whether or not dialogue is difficult. For example, the determination unit 113 of the robots 100 and 200 may determine whether or not dialogue is difficult based on both the distance to the user U and the noise. For example, dialogue may be determined to be difficult when the distance to the user U exceeds the interactive distance D and the noise volume exceeds the reference value. The interactive distance D may also be changed according to the noise volume level. For example, as shown in FIG. 10, the interactive distance D is made smaller when the noise volume is high and larger when the noise volume is low. In this case, the robots 100 and 200 may calculate the interactive distance D using a function, or may obtain it from a table stored in advance in the ROM. The reference value of the noise volume at which dialogue with the user U is possible may also vary with the distance between the user U and the robots 100 and 200. For example, as shown in FIG. 11, the reference value of the noise volume is made smaller when the distance between the user U and the robots 100 and 200 is large. In this case, the robots 100 and 200 may calculate the reference value of the noise volume using a function, or may obtain it from a table stored in advance in the ROM. Further, the determination unit 113 of the robots 100 and 200 may determine whether or not dialogue is difficult based on the loudness of the voice of the user U, the proportion of speech that speech recognition can convert into text data, the language used for the dialogue, and the like.
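
The table-based variant, in which the interactive distance D shrinks as the noise volume grows, could look like the following sketch. The table values, the decibel and metre units, and the names `NOISE_TO_DISTANCE`, `MIN_DISTANCE_M`, `interactive_distance`, and `dialogue_is_difficult` are hypothetical; the specification states only that D may come from a function or from a table stored in ROM.

```python
import bisect
from typing import List, Tuple

# Hypothetical lookup table: (upper noise bound in dB, interactive distance D in m).
# D decreases as the noise bound increases, following the trend described for FIG. 10.
NOISE_TO_DISTANCE: List[Tuple[float, float]] = [(40.0, 2.0), (55.0, 1.5), (70.0, 1.0)]
MIN_DISTANCE_M = 0.5  # assumed D for noise louder than every table entry


def interactive_distance(noise_db: float) -> float:
    """Look up the interactive distance D for the measured noise volume."""
    bounds = [bound for bound, _ in NOISE_TO_DISTANCE]
    # Index of the first table row whose upper bound is >= noise_db.
    index = bisect.bisect_left(bounds, noise_db)
    if index == len(NOISE_TO_DISTANCE):
        return MIN_DISTANCE_M
    return NOISE_TO_DISTANCE[index][1]


def dialogue_is_difficult(distance_m: float, noise_db: float) -> bool:
    """Dialogue is judged difficult when the user is farther away than D."""
    return distance_m > interactive_distance(noise_db)
```

With this table, a user 1.2 m away is within range in a quiet room (30 dB gives D = 2.0 m) but out of range at 60 dB (D = 1.0 m), so the same position can switch from "dialogue possible" to "dialogue difficult" as the noise rises.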

前述の実施の形態のロボット１００、２００の対話制御部１１４は、目を表す表示部１０７とユーザＵの目とがアイコンタクトをするように、表示部１０７に目の画像を表示してもよい。このようにすることで、ロボット１００、２００がより人間に近い動作を表現できる。対話制御部１１４が行う対話の開始を阻止する動作は、マイク１０４が配置さていている耳部及び目部に配置された表示部１０７に目をとじた画像を表示し、ロボット１００、２００が音声を聞いておらず且つユーザＵも見ていないようなふりをしてもよい。 The dialogue control unit 114 of the robots 100 and 200 of the above-described embodiments may display an image of eyes on the display unit 107 so that the display unit 107, which represents the eyes, makes eye contact with the eyes of the user U. In this way, the robots 100 and 200 can express more human-like behavior. The operation to prevent the start of dialogue performed by the dialogue control unit 114 may involve the ears, where the microphones 104 are arranged, and the display unit 107 arranged at the eyes: an image of closed eyes is displayed, and the robots 100 and 200 pretend to be neither listening to the voice nor looking at the user U.

前述の第１の実施の形態では、移動制御部１１２が、移動部１０８を制御し、ロボット１００をユーザＵに向けて移動し、対話可能距離Ｄ以下に近づくと、ロボット１００を停止するように移動部１０８を制御する例について説明した。移動制御部１１２は、ロボット１００を対話可能距離Ｄ以下に近づくように制御すればよい。例えば、判定部１１３によりロボット１００とユーザＵとの距離が対話可能距離Ｄを超えると判定された場合に、移動制御部１１２は、ロボット１００が任意の方向に移動をしている場合にはロボット１００が移動の方向を変更してユーザＵに向けて移動をするように移動部１０８を制御して、ロボット１００が停止をしている場合にはロボット１００がユーザＵに向けて移動を始めるように移動部１０８を制御してもよい。判定部１１３によりロボット１００とユーザＵとの距離が対話可能距離以下であると判定された場合に、移動制御部１１２は、ロボット１００が移動をしている場合にはロボット１００を停止するように移動部１０８を制御して、ロボット１００が停止をしている場合にはロボット１００の停止を維持するように移動部を制御してもよい。この場合、対話制御部１１４は、ユーザＵに対話の継続をさせるための応答をするように動作手段を制御してもよい。 In the first embodiment described above, an example was described in which the movement control unit 112 controls the moving unit 108 to move the robot 100 toward the user U, and controls the moving unit 108 to stop the robot 100 once it has come within the interactive distance D. The movement control unit 112 need only control the robot 100 so that it comes within the interactive distance D. For example, when the determination unit 113 determines that the distance between the robot 100 and the user U exceeds the interactive distance D, the movement control unit 112 may control the moving unit 108 so that the robot 100 changes direction and moves toward the user U if the robot 100 is moving in an arbitrary direction, and so that the robot 100 starts moving toward the user U if the robot 100 is stopped. When the determination unit 113 determines that the distance between the robot 100 and the user U is equal to or less than the interactive distance, the movement control unit 112 may control the moving unit 108 so that the robot 100 stops if it is moving, and remains stopped if it is already stopped. In this case, the dialogue control unit 114 may control the operation means to make a response that causes the user U to continue the dialogue.

前述の実施の形態では、撮像部１０５が、頭部１０１の鼻の位置に設けられたカメラである例について説明した。撮像部１０５は、頭部１０１の片方の目の位置に設けられたカメラでもよく、両目に設けられたステレオカメラであってもよい。 In the above-described embodiment, an example in which imaging unit 105 is a camera provided at the position of the nose of head 101 has been described. The imaging unit 105 may be a camera provided at the position of one eye of the head 101, or may be a stereo camera provided for both eyes.

前述の実施の形態では、ロボット１００、２００とユーザＵの顔との距離を、撮像部１０５が撮像した画像に写るユーザＵの顔の大きさ、又は左右の目の間の距離から検出する例について説明した。ロボット１００、２００とユーザＵの顔との距離を測定する方法は限定されず、非接触距離センサにより測定されてもよい。非接触距離センサは、例えば、ロボット１００、２００の胴体部１０２に設置された超音波センサ又はレーザーセンサである。撮像部１０５にステレオカメラを用いた場合、ステレオカメラで撮像した画像からロボット１００、２００とユーザＵの顔との距離を検出してもよい。 In the above-described embodiments, an example was described in which the distance between the robots 100 and 200 and the face of the user U is detected from the size of the face of the user U in the image captured by the imaging unit 105, or from the distance between the left and right eyes. The method of measuring the distance between the robots 100 and 200 and the face of the user U is not limited, and the distance may be measured by a non-contact distance sensor. The non-contact distance sensor is, for example, an ultrasonic sensor or a laser sensor installed in the body 102 of the robots 100 and 200. When a stereo camera is used as the imaging unit 105, the distance between the robots 100 and 200 and the face of the user U may be detected from the images captured by the stereo camera.
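
Both image-based options mentioned above can be approximated with the pinhole camera model. The sketch below is an illustration under assumptions, not the method disclosed in the specification: the focal length in pixels, the average real eye spacing of 0.063 m, the stereo baseline, and the function names are all hypothetical parameters that would come from camera calibration in practice.

```python
def distance_from_eye_spacing(
    eye_spacing_px: float, focal_length_px: float, real_eye_spacing_m: float = 0.063
) -> float:
    """Estimate the distance to a face from the pixel distance between the eyes.

    Pinhole model: distance = focal length * real size / size in the image.
    """
    if eye_spacing_px <= 0:
        raise ValueError("eye spacing in pixels must be positive")
    return focal_length_px * real_eye_spacing_m / eye_spacing_px


def distance_from_stereo(
    disparity_px: float, focal_length_px: float, baseline_m: float
) -> float:
    """Estimate the distance from the disparity between two stereo images.

    Rectified stereo: distance = focal length * baseline / disparity.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px
```

For example, with an assumed focal length of 600 px, eyes that appear 63 px apart correspond to roughly 0.6 m; the result could then be compared against the interactive distance D.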

前述の実施の形態では、所定の対象がユーザＵである例について説明した。所定の対象は、ロボット１００と対話できるものであればよく、人であってもよく、犬や猫などの動物であってもよく、ロボット１００以外の他のロボットであってもよい。 In the above embodiment, an example in which the predetermined target is the user U has been described. The predetermined object may be any object that can interact with the robot 100 , and may be a person, an animal such as a dog or a cat, or a robot other than the robot 100 .

前述の実施の形態では、移動部１０８が、モータとタイヤとから構成される例について説明した。移動部１０８は、ロボット１００を移動することができればよく、例えば、複数の足と足を動かすモータとから構成されてもよい。このようにすることで、ロボット１００の形状を人や動物により近づけることができる。 In the above-described embodiment, an example in which the moving unit 108 is composed of a motor and tires has been described. The moving unit 108 only needs to be able to move the robot 100, and may be composed of, for example, a plurality of legs and motors for moving the legs. By doing so, the shape of the robot 100 can be made closer to a person or an animal.

前述の実施の形態では、ロボット１００、２００が、人を模した形状を有する例について説明したが、ロボット１００、２００の形状は、特に限定されず、例えば、犬又は猫を含む動物を模した形状を有してもよく、アニメーションのキャラクタや想像上の生き物を模した形状であってもよい。また、ロボット１００、２００は、対話機能を有するものであればよく、自律走行して床などを掃除する掃除ロボット、巡回監視などを行う警備ロボットなどを含む。 In the above-described embodiments, robots 100 and 200 have a human-like shape, but the shape of robots 100 and 200 is not particularly limited. It may have a shape, and may be a shape imitating an animation character or an imaginary creature. Also, the robots 100 and 200 may be any robots that have an interactive function, and include cleaning robots that autonomously run to clean floors, security robots that patrol and monitor, and the like.

前述の実施の形態では、判定手段は、ロボットが所定の対象から話しかけられた場合、ロボットと所定の対象との間の対話が可能であるか否かを判定したが、対話に限定されず、判定手段は、ロボットが所定の対象から話しかけがあった場合、ロボットと所定の対象との間の音声認識精度の向上が可能であるか否かを判定してもよい。 In the above-described embodiment, when the robot is spoken to by the predetermined target, the determining means determines whether or not the robot can interact with the predetermined target. The determining means may determine whether or not it is possible to improve the speech recognition accuracy between the robot and the predetermined target when the robot is spoken to by the predetermined target.

また、前述の実施の形態では、制御手段は、判定手段によりロボットと所定の対象との間の対話が可能でないと判定された場合に、対話の中断をさせるための第１の応答をするように動作手段を制御したが、同様に対話に限定されず、制御手段は判定手段によりロボットと所定の対象との間の音声認識精度の向上が可能でないと判定された場合に、所定の対象に話しかけの中断をさせるための第１の応答をしてもよい。 Also, in the above-described embodiments, the control means controlled the operation means to make the first response for interrupting the dialogue when the determination means determined that dialogue between the robot and the predetermined target was not possible; however, this is likewise not limited to dialogue, and the control means may make the first response for causing the predetermined target to interrupt the speaking when the determination means determines that the speech recognition accuracy between the robot and the predetermined target cannot be improved.

また、第１の応答は、所定の対象に対話の中断を促す、音声出力手段による音声の出力、又は、所定の対象に対話の中断を促す、動作手段による姿勢の出力を含んでもよい。また、第１の応答は、音声入力手段及び撮像手段が意図的に機能していないふりをして、ロボット１００、２００が音声を聞いておらず、且つ、画像も見ていないような感情表現をすることを含んでもよい。 The first response may include output of voice by the voice output means prompting the predetermined target to interrupt the dialogue, or output of a posture by the operation means prompting the predetermined target to interrupt the dialogue. The first response may also include an emotional expression in which the voice input means and the imaging means deliberately pretend not to be functioning, as if the robots 100 and 200 were neither hearing the voice nor seeing the image.

また、第２の応答は、所定の対象に対話の継続を促す、音声出力手段による音声の出力、又は、所定の対象に対話の継続を促す、動作手段による姿勢の出力を含んでもよい。また、第２の応答は、音声入力手段及び撮像部１０５が意図的に機能しているふりをして、ロボット１００、２００が音声を聞いており、且つ、画像も見ているような感情表現をすることを含んでもよい。 The second response may include output of voice by the voice output means prompting the predetermined target to continue the dialogue, or output of a posture by the operation means prompting the predetermined target to continue the dialogue. The second response may also include an emotional expression in which the voice input means and the imaging unit 105 deliberately pretend to be functioning, as if the robots 100 and 200 were hearing the voice and also seeing the image.

また、前述の実施の形態では、ロボットを例として説明したが、ロボットに限定されず、ＡＩ（Artificial Intelligence）スピーカ等の電子機器でもよい。 Further, in the above-described embodiments, a robot was described as an example, but the present invention is not limited to a robot, and an electronic device such as an AI (Artificial Intelligence) speaker may be used.

また、ＣＰＵ、ＲＡＭ、ＲＯＭ等から構成される制御部１１０が実行する処理を行う中心となる部分は、専用のシステムによらず、通常の情報携帯端末（スマートフォン、タブレットＰＣ（Personal Computer））、パーソナルコンピュータなどを用いて実行可能である。たとえば、前述の動作を実行するためのコンピュータプログラムを、コンピュータが読み取り可能な記録媒体（フレキシブルディスク、ＣＤ－ＲＯＭ（Compact Disc Read Only Memory）、ＤＶＤ－ＲＯＭ（Digital Versatile Disc Read Only Memory）等）に格納して配布し、このコンピュータプログラムを情報携帯端末などにインストールすることにより、前述の処理を実行する情報端末を構成してもよい。また、インターネット等の通信ネットワーク上のサーバ装置が有する記憶装置にこのコンピュータプログラムを格納しておき、通常の情報処理端末などがダウンロード等することで情報処理装置を構成してもよい。 The core part that performs the processing executed by the control unit 110, which is composed of a CPU, RAM, ROM, and the like, can be implemented without a dedicated system, using an ordinary portable information terminal (a smartphone or tablet PC (Personal Computer)), a personal computer, or the like. For example, a computer program for executing the above-described operations may be stored and distributed on a computer-readable recording medium (a flexible disk, CD-ROM (Compact Disc Read Only Memory), DVD-ROM (Digital Versatile Disc Read Only Memory), or the like), and an information terminal that executes the above-described processing may be configured by installing this computer program on a portable information terminal or the like. Alternatively, the computer program may be stored in a storage device of a server device on a communication network such as the Internet, and an information processing device may be configured by having an ordinary information processing terminal or the like download it.

また、制御部１１０の機能を、ＯＳ（Operating System）とアプリケーションプログラムとの分担、又はＯＳとアプリケーションプログラムとの協働により実現する場合などには、アプリケーションプログラム部分のみを記録媒体や記憶装置に格納してもよい。 When the functions of the control unit 110 are realized by division of roles between an OS (Operating System) and an application program, or by cooperation between the OS and the application program, only the application program portion may be stored in the recording medium or storage device.

また、搬送波にコンピュータプログラムを重畳し、通信ネットワークを介して配信することも可能である。例えば、通信ネットワーク上の掲示板（ＢＢＳ：Bulletin Board System）にこのコンピュータプログラムを掲示し、ネットワークを介してこのコンピュータプログラムを配信してもよい。そして、このコンピュータプログラムを起動し、ＯＳの制御下で、他のアプリケーションプログラムと同様に実行することにより、前述の処理を実行できるように構成してもよい。 It is also possible to superimpose a computer program on a carrier wave and distribute it via a communication network. For example, the computer program may be posted on a bulletin board system (BBS: Bulletin Board System) on a communication network and distributed over the network. Then, this computer program may be activated and executed in the same manner as other application programs under the control of the OS so that the above processing can be executed.

本発明は、本発明の広義の精神と範囲とを逸脱することなく、様々な実施形態及び変形が可能とされるものである。また、前述した実施形態は、本発明を説明するためのものであり、本発明の範囲を限定するものではない。つまり、本発明の範囲は、実施形態ではなく、請求の範囲によって示される。そして、請求の範囲内及びそれと同等の発明の意義の範囲内で施される様々な変形が、本発明の範囲内とみなされる。以下に、本願出願の当初の特許請求の範囲に記載された発明を付記する。 The present invention is capable of various embodiments and modifications without departing from the broader spirit and scope of the invention. Moreover, the above-described embodiments are for explaining the present invention, and do not limit the scope of the present invention. In other words, the scope of the present invention is indicated by the claims rather than the embodiments. Various modifications made within the scope of the claims and within the meaning of the invention equivalent thereto are considered to be within the scope of the present invention. The invention described in the original claims of the present application is appended below.

（付記１）
自装置に動作をさせる動作手段と、
前記自装置が所定の対象から話しかけがあった場合、前記自装置と前記所定の対象との間の音声認識精度の向上が可能であるか否かを判定する判定手段と、
前記判定手段により前記自装置と前記所定の対象との間の前記音声認識精度の向上が可能でないと判定された場合に、前記所定の対象に前記話しかけの中断をさせるための第１の応答をするように前記動作手段を制御する制御手段と、
を備える、
ことを特徴とするロボット。 (Appendix 1)
A robot comprising:
operation means for causing the device to perform an action;
determination means for determining, when the device is spoken to by a predetermined target, whether or not the speech recognition accuracy between the device and the predetermined target can be improved; and
control means for controlling the operation means to make a first response for causing the predetermined target to interrupt the speaking when the determination means determines that the speech recognition accuracy between the device and the predetermined target cannot be improved.

（付記２）
前記判定手段は、前記自装置が前記所定の対象から話しかけられた場合、前記自装置と前記所定の対象との間の対話が可能であるか否かを判定し、
前記制御手段は、前記判定手段により前記自装置と前記所定の対象との間の前記対話が可能でないと判定された場合に、前記対話の前記中断をさせるための前記第１の応答をするように前記動作手段を制御する、
ことを特徴とする付記１に記載のロボット。 (Appendix 2)
The robot according to Appendix 1, wherein
the determination means determines, when the device is spoken to by the predetermined target, whether or not dialogue between the device and the predetermined target is possible, and
the control means controls the operation means to make the first response for interrupting the dialogue when the determination means determines that the dialogue between the device and the predetermined target is not possible.

（付記３）
前記制御手段は、前記自装置と前記所定の対象との間の前記対話が可能になるまでの間、前記対話の前記中断をさせるための前記第１の応答をするように前記動作手段を制御する、
ことを特徴とする付記２に記載のロボット。 (Appendix 3)
The robot according to Appendix 2, wherein
the control means controls the operation means to make the first response for interrupting the dialogue until the dialogue between the device and the predetermined target becomes possible.

（付記４）
前記制御手段が、前記第１の応答をするように前記動作手段を制御した後、前記判定手段が前記自装置と前記所定の対象との間の前記対話が可能であると判定すると、前記制御手段は、前記第１の応答とは異なる前記所定の対象に前記対話の継続をさせるための第２の応答をするように前記動作手段を制御する、
ことを特徴とする付記２又は３に記載のロボット。 (Appendix 4)
The robot according to Appendix 2 or 3, wherein,
when the determination means determines that the dialogue between the device and the predetermined target is possible after the control means has controlled the operation means to make the first response, the control means controls the operation means to make a second response, different from the first response, for causing the predetermined target to continue the dialogue.

（付記５）
前記判定手段は、前記自装置が前記所定の対象から話しかけられた場合、前記自装置と前記所定の対象との距離を検出し、前記距離が前記所定の対象と前記対話をするために必要な対話可能距離を超える場合に、前記自装置と前記所定の対象との間の前記対話が可能でないと判定する、
ことを特徴とする付記２乃至４の何れか１つに記載のロボット。 (Appendix 5)
The robot according to any one of Appendices 2 to 4, wherein
the determination means detects the distance between the device and the predetermined target when the device is spoken to by the predetermined target, and determines that the dialogue between the device and the predetermined target is not possible when the distance exceeds the interactive distance required for the dialogue with the predetermined target.

（付記６）
前記自装置を移動させる移動手段を更に備え、
前記制御手段は、前記判定手段により前記距離が前記対話可能距離を超えると判定された場合に、前記自装置が前記所定の対象に向けて移動をするように前記移動手段を制御する、
ことを特徴とする付記５に記載のロボット。 (Appendix 6)
The robot according to Appendix 5, further comprising
moving means for moving the device, wherein
the control means controls the moving means so that the device moves toward the predetermined target when the determination means determines that the distance exceeds the interactive distance.

(Appendix 7)
The robot according to Appendix 5 or 6, wherein the control means controls the operation means so as to make the first response until the distance becomes equal to or less than the interactive distance.
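
The distance test of Appendices 5 to 7 can be illustrated with a minimal Python sketch. The function names, the use of meters, and the numeric threshold in the usage note are assumptions made for illustration only; the source gives no concrete interactive distance and does not say how the distance is measured.

```python
def dialogue_possible(distance_m, interactive_distance_m):
    """Dialogue is judged not possible only when the distance exceeds the threshold."""
    return distance_m <= interactive_distance_m


def responses_during_approach(distance_readings_m, interactive_distance_m):
    """Return one response label per distance reading.

    "first" (the response that interrupts the dialogue) is kept while the
    target is farther away than the interactive distance. Readings at or
    within the interactive distance yield None, because what the robot does
    next is outside Appendices 5 to 7.
    """
    return [
        None if dialogue_possible(d, interactive_distance_m) else "first"
        for d in distance_readings_m
    ]
```

For example, with a hypothetical interactive distance of 1.5 m, the readings `[3.0, 2.2, 1.5, 1.0]` give `["first", "first", None, None]`: the first response is held until the distance becomes equal to or less than the threshold.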

(Appendix 8)
The robot according to any one of Appendices 2 to 7, further comprising:
voice input means for inputting voice; and
voice analysis means for analyzing the voice input by the voice input means,
wherein the determination means determines that the dialogue between the own device and the predetermined target is not possible when the volume of noise analyzed by the voice analysis means exceeds a reference value at which the dialogue with the predetermined target is possible, and
the control means, when the determination means determines that the volume of the noise exceeds the reference value, controls the operation means so as to make the first response until the volume of the noise becomes equal to or less than the reference value.
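
The noise test of Appendix 8 can be sketched as follows. This is an illustrative assumption, not the method of the source: the source does not say how the noise is separated from speech or in what unit its volume is measured. Here the noise samples are assumed to be already isolated floats in the range [-1, 1], the volume is taken as the RMS level in dB relative to full scale, and the reference value of -30 dB is hypothetical.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def noise_blocks_dialogue(noise_samples, reference_dbfs=-30.0):
    """True when the noise volume exceeds the (hypothetical) reference value."""
    return rms_dbfs(noise_samples) > reference_dbfs
```

In this sketch the first response would be kept for as long as `noise_blocks_dialogue` returns `True`, and dropped once the level is equal to or less than the reference value.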

(Appendix 9)
The robot according to any one of Appendices 2 to 8, further comprising:
imaging means for capturing an image; and
lip motion detection means for detecting the face of the predetermined target in the image captured by the imaging means and detecting lip motion in the detected face,
wherein the determination means determines, from the lip motion detected by the lip motion detection means, whether or not the own device has been spoken to by the predetermined target.
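
One simple way to turn detected lip motion into a "spoken to" decision, as in Appendix 9, is sketched below. Everything here is an assumption for illustration: the source does not describe the detection algorithm. The sketch presumes that some face detector (not shown) already yields a normalized mouth-opening value per frame, and it treats the lips as moving when that value varies by at least a hypothetical threshold over a short window of frames.

```python
def lips_are_moving(mouth_openings, min_variation=0.05):
    """Decide whether lip motion is present in a window of frames.

    mouth_openings: per-frame mouth opening, normalized by face height
                    (assumed to come from a separate face detector).
    min_variation:  hypothetical threshold on the spread of the openings.
    """
    if len(mouth_openings) < 2:
        # A single frame cannot show motion.
        return False
    return max(mouth_openings) - min(mouth_openings) >= min_variation
```

A window such as `[0.10, 0.30, 0.12, 0.28]` is classified as moving, while a nearly constant window such as `[0.10, 0.11, 0.10]` is not.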

(Appendix 10)
The robot according to any one of Appendices 2 to 9, wherein, when the determination means determines that the distance between the own device and the predetermined target exceeds the interactive distance, the control means controls the moving means so that the own device changes its direction of movement and moves toward the predetermined target if the own device is moving in an arbitrary direction, and so that the own device starts moving toward the predetermined target if the own device is stopped.

(Appendix 11)
The robot according to any one of Appendices 2 to 10, wherein, when the determination means determines that the distance between the own device and the predetermined target is equal to or less than the interactive distance, the control means controls the moving means so as to stop the own device if the own device is moving and to keep the own device stopped if the own device is stopped, and controls the operation means so as to make a second response for causing the predetermined target to continue the dialogue.
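
The four cases of Appendices 10 and 11 (moving or stopped, beyond or within the interactive distance) can be written as a small decision table. The enum names and the function below are hypothetical labels chosen for this sketch; the source defines only the behavior, not an interface.

```python
from enum import Enum


class Motion(Enum):
    STOPPED = "stopped"
    MOVING = "moving"


class Command(Enum):
    START_TOWARD_TARGET = "start moving toward the target"
    TURN_TOWARD_TARGET = "change direction and move toward the target"
    STOP = "stop"
    STAY_STOPPED = "keep stopped"


def movement_command(motion, distance_m, interactive_distance_m):
    """Return (command for the moving means, whether to make the second response)."""
    if distance_m > interactive_distance_m:
        # Appendix 10: approach the target; no second response yet.
        if motion is Motion.MOVING:
            return Command.TURN_TOWARD_TARGET, False
        return Command.START_TOWARD_TARGET, False
    # Appendix 11: within range, so stop (or stay stopped) and invite the
    # target to continue the dialogue.
    if motion is Motion.MOVING:
        return Command.STOP, True
    return Command.STAY_STOPPED, True
```

Keeping the decision in a pure function like this makes each of the four cases easy to check independently of any motor control code.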

(Appendix 12)
The robot according to Appendix 2, further comprising voice output means for outputting voice, wherein the first response includes a voice output by the voice output means that prompts the predetermined target to interrupt the dialogue, or a posture output by the operation means that prompts the predetermined target to interrupt the dialogue.

(Appendix 13)
The robot according to Appendix 4 or 11, further comprising voice output means for outputting voice, wherein the second response includes a voice output by the voice output means that prompts the predetermined target to continue the dialogue, or a posture output by the operation means that prompts the predetermined target to continue the dialogue.

(Appendix 14)
The robot according to Appendix 2, further comprising:
voice input means for inputting voice; and
imaging means for capturing an image,
wherein the first response includes deliberately pretending that the voice input means and the imaging means are not functioning, thereby expressing an emotion as if the own device were neither hearing voice nor seeing images.

(Appendix 15)
The robot according to Appendix 4 or 11, further comprising:
voice input means for inputting voice; and
imaging means for capturing an image,
wherein the second response includes deliberately pretending that the voice input means and the imaging means are functioning, thereby expressing an emotion as if the own device were hearing voice and also seeing images.

(Appendix 16)
The robot according to any one of Appendices 1 to 15, wherein the predetermined target includes a human, an animal, or another robot.

(Appendix 17)
A control method for a robot comprising operation means for causing the own device to operate, the control method comprising:
a determination step of determining, when the own device is spoken to by a predetermined target, whether or not the speech recognition accuracy between the own device and the predetermined target can be improved; and
a control step of controlling the operation means so as to make a first response for causing the predetermined target to interrupt the speaking when the determination step determines that the speech recognition accuracy between the own device and the predetermined target cannot be improved.

(Appendix 18)
A program causing a computer that controls a robot comprising operation means for causing the own device to operate to function as:
determination means for determining, when the own device is spoken to by a predetermined target, whether or not the speech recognition accuracy between the own device and the predetermined target can be improved; and
control means for controlling the operation means so as to make a first response for causing the predetermined target to interrupt the speaking when the determination means determines that the speech recognition accuracy between the own device and the predetermined target cannot be improved.

DESCRIPTION OF SYMBOLS
100, 200: robot; 101: head; 102: body; 103: hand; 104: microphone; 105: imaging unit; 106: speaker; 107: display unit; 108: moving unit; 110: control unit; 111: voice analysis unit; 112: movement control unit; 113: determination unit; 114: dialogue control unit; 115: lip motion detection unit; 120: operation button; 130: power supply unit

Claims

1. A robot comprising:
operation means for causing the own device to operate;
moving means for moving the own device;
determination means for detecting, when the own device is spoken to by a predetermined target, the distance between the own device and the predetermined target, and determining whether or not the detected distance exceeds an interactive distance at which the speech recognition accuracy of the dialogue with the predetermined target can be ensured; and
control means for controlling the moving means so that the own device moves toward the predetermined target when the determination means determines that the detected distance exceeds the interactive distance,
wherein, when the predetermined target speaks to the own device again while the own device is moving toward the predetermined target, the control means controls the operation means so as to perform an action indicating that the content spoken during the movement has not been recognized, after the determination means determines that the distance between the own device and the predetermined target has become equal to or less than the interactive distance.

2. The robot according to claim 1, wherein, when the determination means determines that the detected distance exceeds the interactive distance, the control means controls the operation means so as to make a first response for causing the predetermined target to interrupt the speaking.

3. The robot according to claim 2, wherein the control means controls the operation means so as to make the first response for interrupting the speaking until the distance between the own device and the predetermined target becomes equal to or less than the interactive distance.

4. The robot according to claim 2 or 3, wherein, when the determination means determines that the distance between the own device and the predetermined target has become equal to or less than the interactive distance after the control means has controlled the operation means so as to make the first response, the control means controls the operation means so as to make a second response, different from the first response, for causing the predetermined target to continue the dialogue.

5. The robot according to claim 4, wherein the second response is made before the action indicating that the content spoken during the movement has not been recognized.
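
The order of events recited in claims 1, 2, 4 and 5 can be traced with a small state-tracking sketch. The class, method names, and action labels are hypothetical and chosen only for illustration; the claims define the behavior, not an interface, and the numeric distances in the usage note are assumed values.

```python
from typing import List


class ApproachController:
    """Tracks whether speech was missed while approaching the target."""

    def __init__(self, interactive_distance_m: float):
        self.interactive_distance_m = interactive_distance_m
        self.moving = False          # currently moving toward the target
        self.missed_speech = False   # spoken to again during the movement

    def on_spoken_to(self, distance_m: float) -> List[str]:
        if distance_m <= self.interactive_distance_m:
            # Close enough: ordinary dialogue handling is outside this sketch.
            return []
        if self.moving:
            # Spoken to again during the approach (claim 1).
            self.missed_speech = True
        self.moving = True
        return ["first_response", "move_toward_target"]

    def on_distance(self, distance_m: float) -> List[str]:
        if not self.moving or distance_m > self.interactive_distance_m:
            return []
        self.moving = False
        actions = ["second_response"]  # claim 4
        if self.missed_speech:
            # Claim 5: this action comes after the second response.
            actions.append("indicate_not_recognized")
            self.missed_speech = False
        return actions
```

With a hypothetical interactive distance of 1.5 m, calling `on_spoken_to(4.0)`, then `on_spoken_to(2.5)` during the approach, then `on_distance(1.2)` yields the second response followed by the action indicating that the speech heard during the movement was not recognized.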

6. The robot according to any one of claims 1 to 5, further comprising:
voice input means for inputting voice; and
a memory in which a calling voice is stored,
wherein the determination means compares the voice input from the voice input means with the calling voice stored in the memory to determine whether or not the own device has been spoken to by the predetermined target.

7. The robot according to any one of claims 1 to 6, wherein, when the determination means determines that the distance between the own device and the predetermined target exceeds the interactive distance, the control means controls the moving means so that the own device changes its direction of movement and moves toward the predetermined target if the own device is moving in an arbitrary direction, and so that the own device starts moving toward the predetermined target if the own device is stopped.

8. The robot according to claim 4 or 5, wherein, when the determination means determines that the distance between the own device and the predetermined target is equal to or less than the interactive distance, the control means controls the moving means so as to stop the own device if the own device is moving and to keep the own device stopped if the own device is stopped, and controls the operation means so as to make the second response for causing the predetermined target to continue the dialogue.

9. The robot according to any one of claims 1 to 8, further comprising voice output means for outputting voice, wherein the action indicating that the content spoken during the movement has not been recognized is an action of outputting, from the voice output means, a voice requesting the predetermined target to repeat what was said, or a voice indicating that what the predetermined target said has not been recognized.

10. The robot according to claim 2, further comprising voice output means for outputting voice, wherein the first response includes a voice output by the voice output means that prompts the predetermined target to interrupt the dialogue, or a posture output by the operation means that prompts the predetermined target to interrupt the dialogue.

11. The robot according to claim 2, further comprising:
voice input means for inputting voice; and
imaging means for capturing an image,
wherein the first response includes deliberately pretending that the voice input means and the imaging means are not functioning, thereby expressing an emotion as if the own device were neither hearing voice nor seeing images.

12. The robot according to claim 4, 5 or 8, further comprising voice output means for outputting voice, wherein the second response includes a voice output by the voice output means that prompts the predetermined target to continue the dialogue, or a posture output by the operation means that prompts the predetermined target to continue the dialogue.

13. The robot according to claim 4, 5 or 8, further comprising:
voice input means for inputting voice; and
imaging means for capturing an image,
wherein the second response includes deliberately pretending that the voice input means and the imaging means are functioning, thereby expressing an emotion as if the own device were hearing voice and also seeing images.

14. The robot according to any one of claims 1 to 13, wherein the predetermined target includes a human, an animal, or another robot.

15. A control method for a robot comprising operation means for causing the own device to operate and moving means for moving the own device, the control method comprising:
a determination step of detecting, when the own device is spoken to by a predetermined target, the distance between the own device and the predetermined target, and determining whether or not the detected distance exceeds an interactive distance at which the speech recognition accuracy of the dialogue with the predetermined target can be ensured; and
a control step of controlling the operation means so as to make a first response for causing the predetermined target to interrupt the speaking when the determination step determines that the detected distance exceeds the interactive distance,
wherein, when the determination step determines that the distance exceeds the interactive distance, the moving means is controlled so that the own device moves toward the predetermined target, and
when the predetermined target speaks to the own device again while the own device is moving toward the predetermined target, the operation means is controlled so as to perform an action indicating that the content spoken during the movement has not been recognized, after it is determined that the distance between the own device and the predetermined target has become equal to or less than the interactive distance.

16. A program causing a computer that controls a robot comprising operation means for causing the own device to operate and moving means for moving the own device to function as:
determination means for detecting, when the own device is spoken to by a predetermined target, the distance between the own device and the predetermined target, and determining whether or not the detected distance exceeds an interactive distance at which the speech recognition accuracy of the dialogue with the predetermined target can be ensured; and
control means for controlling the operation means so as to make a first response for causing the predetermined target to interrupt the speaking when the determination means determines that the detected distance exceeds the interactive distance,
wherein the control means:
controls the moving means so that the own device moves toward the predetermined target when the determination means determines that the distance exceeds the interactive distance; and
when the predetermined target speaks to the own device again while the own device is moving toward the predetermined target, controls the operation means so as to perform an action indicating that the content spoken during the movement has not been recognized, after the determination means determines that the distance between the own device and the predetermined target has become equal to or less than the interactive distance.