WO2005076258A1 - User-adaptive device and control method therefor - Google Patents
- Publication number
- WO2005076258A1 (PCT/JP2005/001219)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- response
- unit
- utterance
- input
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Definitions
- The present invention relates to an apparatus provided with an interface for providing information and services in response to input from a user, and specifically relates to home robots, information terminals, home appliances, and the like.
- An interface is required between a user and a device that the user is expected to use.
- One such interface is the adaptive interface, which adjusts the information and services provided based on the history of interactions between the user and the device. An adaptive interface lets the device adapt to the differences and individuality of each user, realizing an interface that is easy for each user to use.
- Such an adaptive interface (1) receives a specific state or input of the user, (2) determines the user's state, and (3) adjusts the service or interface, thereby aiming to provide the user with an easier-to-use interface.
- Patent Documents 1 and 2 disclose examples of conventional techniques related to the improvement of the adaptive interface.
- Patent Document 1 proposes a method for adjusting how user adaptation proceeds, taking a machine translation system as an example.
- In machine translation, the dictionaries and the vocabulary of translation examples required for translation change depending on the genre of the input document (newspaper articles, manuals, etc.), so the genre is switched adaptively.
- However, this genre switching does not always succeed, so the user is shown candidate genres to switch to, in an attempt to improve the accuracy of genre selection. This aims to address the possibility that adaptation in an adaptive interface may not proceed smoothly.
- Patent Document 2, taking the GUI of a web browser as an example, determines the layout of the interface and the like based on a model called pseudo-emotion. That is, interface elements are treated as generating pseudo-emotions, and the expression of those emotions is represented as the interface layout. Because this exploits the characteristics of human pseudo-emotional change, the user interface is said to be adaptable to human senses.
- Non-Patent Documents 1 to 3 report notable findings regarding the interface between humans and machines.
- Non-Patent Document 1 reports that when a user plays a word game (shiritori) with a human or a computer over a network, the user's duration of interest and responses differed depending on whether the user was told that the opponent was a human or a computer, even though the opponent was the same computer program in both cases.
- Non-Patent Document 2 reports, for a task in which a user asks for a message to be relayed, that the user's utterances change depending on whether the counterpart is a robot, a computer screen, or a human, based on results that include interviews conducted after the task.
- Non-Patent Document 3 shows that in human-to-human communication, not only verbal information but also nonverbal information such as paralanguage, nods, blinks, facial expressions, and gestures becomes mutually entrained between speaker and listener (the so-called entrainment, or pull-in, phenomenon), which facilitates communication. It also points out that physiological entrainment, such as of heart rate variability and respiration, which are closely related to emotional changes, plays an important role.
- Patent Document 1: JP-A-9-81350
- Patent Document 2: JP-A-2000-330676 (particularly paragraph 0062)
- Patent Document 3: JP-A-2003-150194 (particularly paragraphs 0009-0011 and 0072)
- Non-Patent Document 1: Yoshinobu Yamamoto, Takao Matsui, Kazuo Kai, Satoshi Umeda, Yuichiro Anzai, "Interaction with a Computing System: A Study on Factors that Promote Awareness," Knowledge Science, Vol. 1, No. 1, pp. 107-120, Kyoritsu Shuppan, May 1994
- Non-Patent Document 2: Etsuko Harada, "The Effect of Agentity and Social Context in Voice Interface: Examination by Message Experiment," The 19th Annual Meeting of the Japanese Society for Cognitive Science, pp. 14-15, June 2002
- Non-Patent Document 3: Tomio Watanabe, "Through the Development of E-COSMIC, a Physical Communication System that Engages and Engages in Physical Communication," Baby Science, Vol. 2, pp. 4-12, 2002
- The adaptive interface aims to realize a more user-friendly interface by adapting to the user, and many ideas have been devised for better adaptation.
- However, the device side cannot always adapt to the user.
- In view of this, the present invention recognizes that a device having an interface with a user cannot always adapt to the user simply by responding to the user.
- Its object is to prompt changes in the user's behavior and impressions without the user being aware of it, and thereby to realize smooth interaction between the user and the device.
- To this end, the present invention focuses on the following points.
- The content of the information or service to be provided can be considered separately from the method of providing it (the method of responding to the user).
- By adjusting the method of responding to the user, the user's behavior and the impression the user receives from the device may change.
- The present invention actively exploits this viewpoint.
- For example, when a device apologizes to the user, the utterance "Sorry" corresponds to the information content, while the utterance speed, the intonation, and actions such as the on-screen agent bowing its head correspond to the method of responding to the user.
- Even for the same utterance "Sorry", depending on how the information is presented (utterance speed, intonation, the agent's body movements, and so on), the user may feel that the device is not really apologizing, or may become even more displeased.
- Non-Patent Document 1 suggests that, even when dealing with a device, humans may find the interaction enjoyable or boring depending on their own beliefs.
- Non-Patent Document 2 also shows that the user's reaction can change depending on the type of device the user faces. These examples show that (1) users change the form of their reaction to a device based on their impressions of and beliefs about it, and (2) the ease of use a user experiences changes depending on the type of device the user faces.
- This suggests that the user's impressions and reactions can be controlled by adjusting the interface part, that is, the method of responding to the user. This is the gist of the present invention. Adjusting the method of responding to the user can be regarded as a secondary means of conveying information.
- As an index for adjusting the response method, one can use, for example, how far the device's internal processing has progressed, that is, the processing state of the input signal.
- For example, when the device's processing cannot keep up with the user, the device provides the information content to the user while adjusting the providing method so as to convey implicitly that "I would like you to speak more slowly" or "The device you are talking to cannot respond that quickly." As a result, the user is expected to understand, consciously or unconsciously, that the device's processing is not keeping up, and to change their responses to the device naturally.
- Suppose, for example, that the utterance speed is adjusted as the information-providing method, and that the device's internal processing cannot keep up with the user's utterances.
- In that case, the device lowers its own utterance speed (slow speech).
- The user is then expected to sense that the device is not keeping up and to slow the tempo of their own utterances. This is intended to make the entrainment (pull-in) phenomenon of human-to-human communication shown in Non-Patent Document 3 hold between the user and the device as well.
- Patent Document 3 discloses that, when the user speaks too quickly and the speech is misrecognized, the system speaks at a rate slower than the user's, naturally guiding the user to speak at a slower rate that is easier to recognize.
- The method of responding to the user may be adjusted based on information such as the user's state and mood detected from the input signal, or according to how smoothly the interaction between the user and the device proceeds. In addition, if the device learns knowledge about the user, the response method may be adjusted according to the degree of that learning.
- Furthermore, through experiments described later, the inventors of the present application obtained the new finding that users are not necessarily drawn in by the device's guidance even when the device attempts to guide them. Based on this finding, the inventors concluded that it is preferable to combine natural guidance and forced guidance. Combining the two makes it possible to guide users who can be guided naturally without their being aware of it, and to guide users who cannot be guided naturally by means of forced guidance. In other words, users can be guided reliably while minimizing occasions that cause discomfort.
- Specifically, the present invention provides a user-adaptive device that interacts with a user, which obtains an input signal indicating at least one of the user's operation, state, and request;
- processes the obtained input signal to detect information about the user;
- determines the content of a response to the user based on this detection result;
- adjusts the method of responding to the user based on at least one of the processing state of the input signal, the information about the user detected from the input signal, and the degree of learning of knowledge about the user; and outputs the determined response content according to the adjusted response method. The device then detects the user's reaction to the output, and when the reaction does not show the change expected from the adjusted response method, determines a response content that prompts the user to change.
- In other words, according to the present invention, the content of the response to the user is determined from the information about the user detected by processing the input signal, while the method of responding to the user is adjusted based on at least one of the processing state of the input signal, the information about the user detected from the input signal, and the degree of learning of knowledge about the user.
- The response method can thus be adjusted so as to encourage changes in the user's behavior and impressions, which realizes natural guidance of the user and smooth interaction between the user and the device.
- Moreover, when the user's reaction does not show the expected change, a response content that prompts the user to change is determined, so the user can be guided forcibly.
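The combination of natural and forced guidance described above can be sketched as a small turn-by-turn policy. This is a minimal Python illustration under stated assumptions, not the claimed implementation: the names `Action` and `plan_turns`, the overshoot `margin`, the `tolerance` band, the `patience` count of unchanged turns before switching to forced guidance, and the wording of the explicit request are all hypothetical. The list of measured user speech rates stands in for the input processing unit's detection of the user's reaction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    mode: str                     # "natural", "forced" or "matched"
    device_rate: float            # speech rate the device uses this turn
    prompt: Optional[str] = None  # explicit request, only for forced guidance


def plan_turns(user_rates: List[float], target: float, margin: float = 10.0,
               tolerance: float = 3.0, patience: int = 2) -> List[Action]:
    """Hypothetical policy: natural guidance first, forced guidance if it stalls."""
    actions: List[Action] = []
    stalled = 0                      # consecutive turns without the expected change
    prev_gap: Optional[float] = None
    for rate in user_rates:
        gap = abs(rate - target)
        if gap <= tolerance:
            # Expected change has happened: stop guiding, speak at the target.
            actions.append(Action("matched", target))
            stalled = 0
        else:
            no_progress = prev_gap is not None and gap >= prev_gap
            stalled = stalled + 1 if no_progress else 0
            if stalled >= patience:
                # Natural guidance is not working: ask explicitly (forced guidance).
                wording = "more slowly" if rate > target else "a little faster"
                actions.append(Action("forced", target, f"Could you speak {wording}?"))
                stalled = 0
            else:
                # Natural guidance: speak on the far side of the target.
                device_rate = target - margin if rate > target else target + margin
                actions.append(Action("natural", device_rate))
        prev_gap = gap
    return actions


for act in plan_turns([100, 100, 100, 90], target=90):
    print(act)
```

With user rates of 100, 100, 100, 90 and a target of 90, the first two turns use natural guidance at rate 80, the third falls back to an explicit request because the user has not changed, and the fourth simply matches the target.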
- A conventional adaptive interface observes the user's situation and tries to improve convenience by adapting to the user.
- In contrast, the present invention recognizes that it is not always possible to adapt to the user, and instead adjusts the response method to prompt changes in how the user approaches the device and in the user's impressions. As a result, even in situations that would previously have left the user dissatisfied, the dissatisfaction can be eased by prompting a change in the user's impressions, and the device can serve as a smooth interface.
- For example, when the device speaks in a voice dialogue, its utterance speed, vocabulary, and intonation are adjusted as the method of responding to the user.
- When the device operates by means of actuators, the operating speed of the actuators is adjusted as the method of responding to the user.
- When the device recommends useful information, the form of the agent displayed on the screen, for example its facial expression or clothing, is adjusted as the method of responding to the user.
- According to the present invention, adjusting the device's method of responding to the user can prompt changes in the user's impression of and behavior toward the device, realizing smoother communication between the user and the device. Furthermore, when the user's reaction does not change as expected from the adjusted response method, the user can be guided forcibly.
- FIG. 1 is a conceptual diagram of the configuration of the present invention.
- FIG. 2 is a conceptual diagram of the first embodiment, showing a case where a user has a home robot fetch a box.
- FIG. 3 (a) is an example of a dialogue in the situation of FIG. 2, and FIG. 3 (b) is a graph showing the relationship between speech rate and recognition level.
- FIG. 4 is a block diagram showing a configuration of an interface unit in the user adaptive device according to the first embodiment of the present invention.
- FIG. 5 is a flowchart showing an operation of the configuration of FIG. 4.
- FIG. 6 is a block diagram showing a configuration of an interface unit in a user adaptive device according to the second embodiment of the present invention.
- FIG. 7 is a flowchart showing an operation of the configuration of FIG. 6.
- FIG. 8 is an image diagram of the third embodiment, and shows a case where an information terminal recommends information to a user.
- FIG. 9 is a block diagram showing a configuration of an interface unit in a user adaptive device according to the third embodiment of the present invention.
- FIG. 10 shows the utterance speed of each subject when reading aloud on their own (solo utterances), as obtained in Experiment 1.
- FIG. 11 is a schematic diagram showing classifications of changes in utterance speed.
- FIG. 12 is a graph showing the results of Experiment 1.
- FIG. 13 shows the dialog sequence in Experiment 2.
- FIG. 14 is a graph showing the results of Experiment 2.
- According to a first aspect of the present invention, there is provided a user-adaptive device having an interface unit for interacting with a user.
- The interface unit comprises an input unit that obtains an input signal indicating at least one of the user's operation, state, and request;
- an input processing unit that processes the input signal obtained by the input unit and detects information about the user;
- a response content determination unit that determines the content of a response to the user based on the detection result of the input processing unit;
- a response method adjustment unit that adjusts the method of responding to the user based on at least one of the processing state in the input processing unit, the information about the user detected from the input signal, and the degree of learning of knowledge about the user; and
- an output unit that outputs the response content determined by the response content determination unit according to the response method adjusted by the response method adjustment unit.
- The input processing unit detects the user's reaction to the output of the output unit, and when the reaction does not show the change expected from the response method, instructs the response content determination unit to determine a response content that prompts the user to change.
- In a second aspect, there is provided the user-adaptive device according to the first aspect, which conducts voice dialogue with the user, wherein
- the input unit acquires the user's utterance as a voice signal;
- the input processing unit performs speech recognition processing on the voice signal and detects the content of the user's utterance;
- the response content determination unit determines the content of the device's utterance to the user based on the content of the user's utterance detected by the input processing unit; and
- the response method adjustment unit adjusts the utterance method based on the recognition state in the speech recognition processing.
- The present invention also provides the user-adaptive device according to the second aspect, wherein the response method adjustment unit adjusts at least one of utterance speed, vocabulary, and intonation as the utterance method.
- It also provides the user-adaptive device according to the second aspect, wherein the response method adjustment unit adjusts the utterance speed as the utterance method, and sets the device's utterance speed higher than a target value when the user's utterance speed is lower than the target value, or lower than the target value when the user's utterance speed is higher than the target value.
- It also provides a user-adaptive device wherein, when the user's utterance speed approaches the target value, the response method adjustment unit changes the device's utterance speed so that it approaches the target value.
- It also provides the user-adaptive device according to the second aspect, wherein the response method adjustment unit adjusts the utterance speed as the utterance method and determines the target value of the user's utterance speed according to the individual user.
- The present invention also provides the user-adaptive device according to the first aspect, which provides operations to the user, wherein the input unit receives signals indicating the user's state and operations;
- the input processing unit processes the signals input to the input unit to recognize the user's request;
- the response content determination unit determines the content of the operation to be provided to the user in response to the user's request recognized by the input processing unit; and
- the response method adjustment unit recognizes the degree of cooperation between the user and the user-adaptive device from the signals input to the input unit, and adjusts the method of providing the operation according to the recognized degree of cooperation.
- It also provides the user-adaptive device according to the seventh aspect, wherein the response method adjustment unit adjusts the operation speed as the method of providing the operation.
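The text does not say how the degree of cooperation is measured or how it maps to an operation speed. The sketch below assumes, purely for illustration, that cooperation is estimated from the time lags between corresponding user and robot movements, and that the robot slows down when cooperation is poor; `cooperation_degree`, `operation_speed`, `max_lag_s`, and the speed bounds are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def cooperation_degree(lags_s: Sequence[float], max_lag_s: float = 1.0) -> float:
    """Map lags between corresponding user and robot movements to [0, 1].

    1.0 means perfectly synchronized; a lag of max_lag_s seconds or more counts
    as 0. (Hypothetical measure; the source does not define one.)
    """
    if not lags_s:
        return 1.0  # nothing observed yet: no evidence of poor cooperation
    return 1.0 - mean(min(abs(lag) / max_lag_s, 1.0) for lag in lags_s)


def operation_speed(coop: float, slow: float = 0.3, fast: float = 1.0) -> float:
    """Scale actuator speed between slow and fast (fractions of full speed).

    Hypothetical rule: move slowly when cooperation is poor so the user can
    keep up, and at full speed when user and robot are in sync.
    """
    coop = max(0.0, min(1.0, coop))
    return slow + (fast - slow) * coop


print(operation_speed(cooperation_degree([0.1, 0.2, 0.1])))
```

A linear mapping is only the simplest choice; the claim covers adjusting the operation speed according to the degree of cooperation, not any particular formula.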
- The present invention also provides a user-adaptive device that provides information to the user and has a function of learning a preference model of the user, wherein the input unit obtains a signal indicating a request from the user;
- the input processing unit determines the user's request based on the signal obtained by the input unit;
- the response content determination unit determines the information content to be provided to the user by referring to the preference model based on the user's request determined by the input processing unit; and
- the response method adjustment unit adjusts the method of providing information based on the degree of learning of the preference model.
- It also provides the user-adaptive device according to the ninth aspect, wherein the response method adjustment unit adjusts at least one of the vocabulary and the form of an agent displayed on a screen as the method of providing information.
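As one hedged illustration of adjusting vocabulary and agent form by the degree of learning, the sketch below assumes that the degree of learning can be approximated by how much feedback the preference model has seen, and that an immature model should present recommendations tentatively. The proxy, the threshold, the phrases, and the names `learning_degree`, `present`, and `Presentation` are all hypothetical; the source only states that vocabulary or agent form is adjusted according to the learning degree.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    phrase: str            # vocabulary used to introduce the recommendation
    agent_expression: str  # how the on-screen agent looks while presenting


def learning_degree(n_feedback: int, n_saturate: int = 20) -> float:
    """Hypothetical proxy: share of a feedback budget observed so far, capped at 1."""
    return min(n_feedback / n_saturate, 1.0)


def present(item: str, degree: float, threshold: float = 0.5) -> Presentation:
    """Choose wording and agent form from the preference model's learning degree."""
    if degree < threshold:
        # Model still immature: hedge the wording and use a modest agent.
        return Presentation(f"This might interest you: {item}", "tentative")
    # Model well trained: recommend with confidence.
    return Presentation(f"I think you will like {item}.", "confident")


print(present("tonight's cooking show", learning_degree(3)))
print(present("tonight's cooking show", learning_degree(40)))
```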
- The present invention also provides a control method for a user-adaptive device that interacts with a user, the method comprising:
- a first step of obtaining an input signal indicating at least one of the user's operation, state, and request;
- a second step of processing the input signal obtained in the first step to detect information about the user;
- a third step of determining the content of a response to the user based on the detection result in the second step;
- a fourth step of adjusting the method of responding to the user based on at least one of the processing state of the input signal, the information about the user detected from the input signal, and the degree of learning of knowledge about the user; and
- a fifth step of outputting the response content determined in the third step according to the response method adjusted in the fourth step.
- FIG. 1 is a diagram showing an outline of a configuration of an interface unit 10 in a user adaptive device according to the present invention.
- The interface unit 10 shown in FIG. 1 is incorporated as part of a user-adaptive device, such as a robot or an information terminal, that provides functions and information to a user 7. The interface unit 10 provides information about the user 7 to the other components of the device, and receives output from those components to respond to the user 7.
- The input unit 1 receives actions from the user 7.
- Actions from the user 7 include utterances, gestures, facial expressions, operation of switches, keyboards, and mice, and the user's physiological state.
- The input unit 1 is equipped with sensors for taking such information into the device, or can communicate with such sensors, and acquires the user information as electrical signals that can be processed inside the device. That is, the input unit 1 acquires an input signal indicating at least one of the operation, state, and request of the user 7.
- The input processing unit 2 processes the input signal acquired by the input unit 1 and converts it into a higher-level representation such as the state, intention, or request of the user 7. That is, it detects information about the user 7.
- The response content determination unit 3 determines the device's response based on the user's state, intention, and request obtained by the input processing unit 2, and outputs it to the output unit 6. That is, it determines the content of the response to the user 7 based on the detection result of the input processing unit 2.
- This flow of processing from the input unit 1 through the input processing unit 2 and the response content determination unit 3 to the output unit 6 is the same as in a conventional interface.
- In the present invention, however, a response method adjustment unit 4 is additionally provided to adjust the method of responding to the user 7.
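The data flow of FIG. 1 can be sketched as a pipeline in which the response content (unit 3) and the response method (unit 4) are decided separately from the same detected user information. This is a minimal Python sketch, not the patent's implementation: the `InterfaceUnit` class, the dictionary shapes for user information and response method, and the demo rule (speak at 80 when the user's rate exceeds 90) are assumptions chosen to mirror the first embodiment's numbers.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

UserInfo = Dict[str, Any]  # detected user info plus processing state (assumed shape)
Method = Dict[str, Any]    # response method, e.g. {"rate": 80} (assumed shape)


@dataclass
class InterfaceUnit:
    input_unit: Callable[[], Any]                # unit 1: acquire input signal
    input_processing: Callable[[Any], UserInfo]  # unit 2: detect user info
    decide_content: Callable[[UserInfo], str]    # unit 3: WHAT to respond
    adjust_method: Callable[[UserInfo], Method]  # unit 4: HOW to respond
    output_unit: Callable[[str, Method], Any]    # unit 6: render the response

    def step(self) -> Any:
        signal = self.input_unit()
        info = self.input_processing(signal)
        content = self.decide_content(info)  # content and method are chosen
        method = self.adjust_method(info)    # independently from the same info
        return self.output_unit(content, method)


# Hypothetical wiring that mirrors the "bring a box" exchange.
unit = InterfaceUnit(
    input_unit=lambda: "bring a box",
    input_processing=lambda s: {"request": s, "user_rate": 100, "recognition": 60},
    decide_content=lambda info: "Which box is it?",
    adjust_method=lambda info: {"rate": 80 if info["user_rate"] > 90 else 90},
    output_unit=lambda text, m: (text, m["rate"]),
)
print(unit.step())
```

Keeping `decide_content` and `adjust_method` as separate callables reflects the point that content and presentation can be controlled relatively independently.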
- The content of the information and services provided to the user 7 and the method of providing them can be considered separately and controlled relatively independently.
- One feature of the present invention is that the response method is changed to prompt adaptation on the user's side. This adaptation by the user arises as a natural, unintentional reaction, and the response method is adjusted so as to prompt such a natural reaction.
- As a result, the user 7 does not feel dissatisfied with the device and can communicate smoothly with it.
- The response method is adjusted according to how smoothly information and services are exchanged between the user and the device.
- In other words, the response method is adjusted according to how well the user's request matches the operations the device can perform.
- Specifically, the adjustment may be made according to the processing state of the input processing unit 2.
- For example, in a device that conducts voice dialogue with the user, a response method such as the utterance speed is adjusted according to the processing state of the speech recognition process. From this processing state, it can be detected whether speech recognition is proceeding smoothly and whether a voice dialogue with the user has been established.
- The response method may also be adjusted according to information about the user 7 detected from the input signal.
- For example, in voice dialogue, the adjustment may be made according to the user's reaction to the output synthesized speech (for example, whether the user shows signs that the words are not getting through), and in cooperative work with a robot, according to the degree of cooperation between the robot and the user (whether their movements are coordinated without delay).
- Furthermore, when the device learns knowledge about the user, the response method may be adjusted according to the degree of learning.
- The response method may be adjusted based on any one of these pieces of information, or it may be determined by combining several of them.
- Consider the case where a user speaks to a device by voice.
- In this case, the user expects the device to perform some operation. If the device operates as expected, the user can use it naturally, but if many of its operations fall short of expectations, the user may come to distrust the device.
- Voice communication between a humanoid robot and a user is not always achieved smoothly.
- This is because a humanoid robot cannot converse as well as its appearance and tone of voice lead the user to expect. This contrasts with the phenomenon that cat-type or dog-type robots seem able to communicate with users even though their vocabulary and ways of speaking are limited.
- FIG. 2 is a diagram conceptually showing a case where the user has the home robot pick up the luggage.
- 11 is a user
- 12 is a home robot having an interface function according to the present embodiment
- BX1, BX2, and BX3 are boxes.
- The user 11 makes requests to the robot 12 by voice.
- Here, the user 11 asks the robot 12 to "take the white box BX1".
- the robot 12 responds to the user 11 by voice and performs an operation according to the request of the user 11. Further, the robot 12 adjusts its own utterance speed according to the recognition degree of the utterance of the user 11.
- FIG. 3 (a) is an example of the dialogue in the situation of FIG. 2.
- FIG. 3 (a) shows the utterance speed for each utterance and, for the utterances of the user 11, the degree of recognition, which indicates how well the recognition processing of the robot 12 performed.
- FIG. 3 (b) is a graph showing the relationship between speech rate and degree of recognition.
- For convenience, the speech rate and the degree of recognition are expressed in arbitrary units.
- First, the user 11 asks the robot 12 to "bring a box." Assume that the utterance speed at this time is 100 and the degree of recognition is 60. In general, speech recognition processing has an appropriate utterance speed at which recognition performance is maximized. According to the relationship in FIG. 3 (b), recognition performance is best when the utterance speed is around 90, so the target value of the utterance speed is set to 90. Since the current utterance speed of the user 11 is higher than this target value, the robot 12 takes measures to lower it: to prompt the user 11 to adapt, the robot 12 lowers its own utterance speed to 80, below the target value of 90.
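Choosing the target value amounts to picking the speech rate at which the recognition curve peaks. In this minimal Python sketch, `recognition_degree` is a hypothetical parabola, not the curve of FIG. 3 (b); it was chosen only because it passes through the two pairs quoted in the text (rate 100 gives degree 60, rate 90 gives degree 80). `pick_target_rate` and the candidate range 50 to 150 are likewise assumptions.

```python
from typing import Callable, Iterable


def recognition_degree(rate: float) -> float:
    """Hypothetical stand-in for FIG. 3 (b): a parabola peaking at rate 90.

    Chosen only to reproduce the two pairs quoted in the text
    (rate 100 -> degree 60, rate 90 -> degree 80); it is not the real curve.
    """
    return 80.0 - 0.2 * (rate - 90.0) ** 2


def pick_target_rate(curve: Callable[[float], float],
                     candidates: Iterable[float]) -> float:
    """Target = the candidate speech rate with the highest recognition degree."""
    return max(candidates, key=curve)


target = pick_target_rate(recognition_degree, range(50, 151))
print(target)                      # best rate on the hypothetical curve
print(recognition_degree(100))     # degree for the user's first utterance
print(recognition_degree(target))  # degree once the user speaks at the target
```

On this assumed curve the search returns 90, matching the target value in the text; with a measured curve, the same search would run over recorded (rate, degree) pairs instead.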
- Next, since the robot 12 cannot tell which of the boxes BX1, BX2, and BX3 to bring, it generates an utterance to confirm which box the user wants. As a result, the robot 12 asks the user 11, "Which box is it?"
- In response to the question from the robot 12, the user 11 answers "The white box." At this time, the user 11 is influenced by the utterance speed of the robot 12 and lowers their own utterance speed without being particularly aware of it. As a result, the utterance speed changes to 90, and the degree of recognition improves greatly to 80. In other words, the robot 12 not only conveys the utterance content to the user 11 but also acts on the user so that recognition processing can be performed well.
- The robot 12 then correctly recognizes that the task requested by the user 11 is to "take the white box BX1," and hands over the white box BX1 while saying "Here you are."
- At this point, the utterance speed of the user 11 has reached an appropriate value at which recognition processing works well, so there is no longer any need to prompt adaptation. Therefore, the robot 12 adjusts its own utterance speed to 90, the same as that of the user 11.
- Thereafter, the user 11 and the robot 12 can communicate at an utterance speed suited to recognition processing. If the utterance speed of the user 11 changes and the degree of recognition drops, the robot's utterance speed can again be adjusted as described above. In this way, by adjusting its own utterance speed while carrying out the task requested by the user 11, and thereby keeping the utterance speed of the user 11 within an appropriate range, the robot 12 can maintain the system so that recognition processing is always performed properly.
- when the user's utterance speed is higher than the target value, setting the system's utterance speed lower than the target value brings the user's utterance speed closer to the target value.
- conversely, when the user's utterance speed is lower than the target value, setting the system's utterance speed higher than the target value brings the user's utterance speed closer to the target value. In this case as well, once the user's utterance speed approaches the target value, it is preferable to change the system's utterance speed from its initial setting toward the target value. For example, after the user's utterance speed has been guided close to the target value, the system no longer needs to guide it, so it suffices to reset the system's speech rate to the target value, which is now close to the user's utterance speed.
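The guidance rule in the two preceding paragraphs can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the patent's implementation. The name `system_speech_rate`, the fixed `margin` by which the system undershoots or overshoots the target, and the `tolerance` used to decide that the user has reached the target are all hypothetical. Speeds use the same arbitrary units as the box example (user at 100, target 90, robot at 80).

```python
def system_speech_rate(user_rate: float, target: float,
                       margin: float = 10.0, tolerance: float = 5.0) -> float:
    """Hypothetical rule for choosing the system's own speech rate.

    margin and tolerance are illustrative assumptions, not values from the source.
    """
    if abs(user_rate - target) <= tolerance:
        # User is already close to the target: stop guiding and
        # simply speak at the target rate.
        return target
    if user_rate > target:
        # User speaks too fast: speak below the target to pull the user down.
        return target - margin
    # User speaks too slowly: speak above the target to pull the user up.
    return target + margin


# Replaying the box example: user at 100, target 90.
print(system_speech_rate(100, 90))  # first turn: robot slows to 80
print(system_speech_rate(90, 90))   # user has adapted: robot returns to 90
```

A fixed margin is the simplest choice. A margin proportional to the user's distance from the target would be an equally plausible variant.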
- the target value of the user's utterance speed need not be determined solely by the convenience of the device. As described later, utterance speed differs greatly between individuals, so it is preferable to set the target value for each user. That is, by setting the target value close to the user's own utterance speed, within the range in which adequate recognition performance is obtained, the utterance speed can be guided without the user feeling that the device's speech is unnatural. For example, for a person who speaks very slowly, the target value may be set near the lower limit of the range of utterance speeds at which recognition performance is obtained, rather than at the device-side optimum. For a person who speaks fast, the target value may be set near the upper limit of the range of recognizable utterance speeds.
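One way to read the per-user target rule is as a clamp of the user's habitual rate into the recognizable range. The sketch below assumes this range is known as two bounds, `lower` and `upper`. The function name and the example bounds 75 and 110 are hypothetical.

```python
def user_target_rate(user_rate: float, lower: float, upper: float) -> float:
    """Hypothetical per-user target: the user's own rate clamped to the
    range [lower, upper] in which recognition performance is adequate."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    # Slow speakers get a target at the lower limit, fast speakers at the
    # upper limit; speakers already inside the range keep their own rate.
    return min(max(user_rate, lower), upper)


# Illustrative range only: suppose recognition is adequate between 75 and 110.
print(user_target_rate(60, 75, 110), user_target_rate(95, 75, 110),
      user_target_rate(130, 75, 110))  # prints 75 95 110
```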
- FIG. 4 is a block diagram showing a configuration of the interface unit 20 in the user adaptive device according to the present embodiment.
- a recognition state detection unit 24 and a speech speed determination unit 25 correspond to a response method adjustment unit
- a voice input unit 21, a voice recognition unit 22, an utterance content determination unit 23, and a voice output unit 26 correspond to an input unit, an input processing unit, a response content determination unit, and an output unit, respectively.
- the voice input unit 21 is a unit that obtains a voice input from the user 11, and is usually configured by a microphone or the like for obtaining voice.
- the voice uttered by the user 11 is converted by the voice input unit 21 into an electric signal that can be processed inside the interface unit 20.
- the voice recognition unit 22 processes the electrical voice signal acquired by the voice input unit 21 and converts it into the utterance content of the user 11. That is, it performs so-called speech recognition processing.
- the utterance content determination unit 23 determines the utterance content for the user 11 based on the speech recognition result processed by the speech recognition unit 22.
- the utterance content determination unit 23 stores various dialogue examples, rules, and knowledge bases, such as pairing "Thank you" with "You're welcome" and patterns for answering questions like "Where is ...?".
- the recognition state detection unit 24 acquires a signal related to the recognition state from the voice recognition unit 22 and detects whether the recognition state is good. Because the utterance of the user 11 is not always processed correctly by speech recognition, the processing result often contains errors. Most speech recognition processing, however, yields a signal indicating the reliability of the result. For example, in speech recognition using a neural network, the output value produced together with each candidate recognition result can be treated as the reliability of that result. If this output value lies in the range 0 to 1, an output value of 0.9 means the recognition result is usually correct. Conversely, an output value of 0.5 means the result has low reliability but is output provisionally. The recognition state detection unit 24 calculates how good the recognition state is from such values obtained from the voice recognition unit 22.
- the speech speed determination unit 25 determines the speech speed of the utterance to the user 11 based on the degree of the recognition state calculated by the recognition state detection unit 24. Specifically, for example, the value of the speech rate stored as an internal parameter in the interface unit 20 is adjusted. In speech recognition processing, it is generally known that there is an appropriate utterance speed at which the recognition rate is the highest, and that the recognition rate decreases as the speech rate increases.
- the voice output unit 26 includes, for example, a circuit unit with a D/A conversion unit, and a speaker. It generates a synthesized voice for the utterance content determined by the utterance content determination unit 23 and outputs it at the speech rate determined by the speech speed determination unit 25.
- the voice is acquired by the voice input unit 21 and converted into an electric signal (S11). Then, the voice recognition unit 22 performs a voice recognition process using the electric signal generated in step S11 (S12).
- the recognition state detection unit 24 determines whether or not the recognition state of the voice recognition is good based on the signal acquired from the voice recognition unit 22 (S13).
- the determination is made based on, for example, the degree of variation in reliability data, such as the neural network output values described above, across the multiple recognition candidates identified in the speech recognition processing. That is, if only one specific candidate has high reliability, the recognition state is judged to be good, whereas if all candidates have low reliability, the recognition state is judged to be bad. If the recognition process itself fails and no recognition candidate is obtained, the recognition state is also judged to be bad.
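The good/bad judgment of step S13 might look like the following sketch. It assumes that each recognition candidate comes with a reliability in [0, 1], as in the neural-network example above. The threshold `high = 0.8` is an illustrative value. Treating several equally high candidates as a bad state is an assumption the text does not state explicitly.

```python
from typing import Sequence


def recognition_state_is_good(reliabilities: Sequence[float],
                              high: float = 0.8) -> bool:
    """Judge the recognition state from per-candidate reliabilities in [0, 1].

    The threshold `high` is an illustrative assumption.
    """
    if not reliabilities:
        # Recognition failed outright: no candidate at all, so the state is bad.
        return False
    n_high = sum(1 for r in reliabilities if r >= high)
    # Good only when exactly one candidate stands out. All-low is bad, and
    # (an assumption beyond the text) several high candidates count as ambiguous.
    return n_high == 1
```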
- if the recognition state is judged to be good in step S13, the process proceeds to step S14.
- in step S14, the speech speed determination unit 25 does not perform speech rate control for promoting adaptation. That is, for example, the utterance speed is set to the same value as the previous setting, or matched to the utterance speed of the user.
- the reason for matching the user's utterance speed is that, when communication is going well, the utterance speeds of the user and the device are presumed to be similar.
- if the recognition state is judged to be bad, then in step S15 the recognition state detection unit 24 determines whether the utterance speed of the user 11 is too high. That is, it calculates the current utterance speed of the user 11 from the recognition state of the voice recognition unit 22 and compares it with the optimum utterance speed stored in advance in the interface unit 20. If the utterance speed of the user 11 is higher (Yes), the speech speed determination unit 25 sets the utterance speed lower than the current setting (S16). If the utterance speed of the user 11 is lower (No), the speech speed determination unit 25 sets the utterance speed higher than the current setting (S17). The speech rate can be adjusted, for example, by subtracting or adding a fixed amount to the current speech rate, or by multiplying it by a fixed factor smaller or larger than one.
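Steps S14 to S17 can be sketched as a single update function. The step size of 5 and the factor 0.9 are placeholder values. The choice in S14 to follow the user's rate, rather than keep the previous value, is only one of the two options the text allows. The function name is hypothetical.

```python
def next_speech_rate(current: float, user_rate: float, optimum: float,
                     recognition_good: bool, multiplicative: bool = False,
                     step: float = 5.0, factor: float = 0.9) -> float:
    """Hypothetical version of steps S14 to S17; step and factor are assumptions."""
    if recognition_good:
        # S14: no adaptation-promoting control. Here we follow the user's rate
        # (keeping `current` unchanged would be the other option in the text).
        return user_rate
    if user_rate > optimum:
        # S15 "Yes" -> S16: user too fast, so lower the device's rate.
        return current * factor if multiplicative else current - step
    # S15 "No" -> S17: raise the device's rate (dividing by factor < 1 raises it).
    return current / factor if multiplicative else current + step
```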
- next, the utterance content determination unit 23 determines the content of the response to the utterance of the user 11 recognized by the voice recognition unit 22 (S18). Then, the voice output unit 26 outputs the utterance content determined by the utterance content determination unit 23 to the user 11 at the utterance speed determined by the speech speed determination unit 25 (S19).
- through this operation, the utterance speed of the device's voice responses changes slightly in relation to the utterances of the user 11.
- it is conceivable that, during spoken dialogue with the device, the user 11 naturally slows down or speeds up his or her own speech to match the device's utterance speed. This would be due to the entrainment (pull-in) phenomenon observed in human-to-human communication.
- such a change in utterance speed is not made consciously by the user 11. That is, the utterance speed can be controlled naturally without the user 11 being aware of anything, which yields utterance input that is easy to recognize. This improves the recognition rate and lets the dialogue proceed smoothly.
- because the user is prompted to change the utterance speed without being conscious of it, communication can be facilitated.
- the user does not need to adapt to the device intentionally. Instead, the mutual entrainment process that the user naturally engages in with other people is realized. Voice dialogue between the user and the device can therefore proceed smoothly without burdening the user.
- the utterance speed is adjusted as the adjustment of the method of responding to the user, but the present invention is not limited to this.
- for example, the utterance vocabulary may be adjusted. Adjusting the vocabulary can change the impression the user gets when hearing the utterance content. Examples of vocabulary patterns to switch between include:
  - vocabulary used by children versus vocabulary used by adults;
  - vocabularies with different degrees of politeness (e.g., polite versus rough speech);
  - vocabularies with different degrees of intimacy (friendly versus businesslike speech).
- the intonation of the utterance may also be adjusted. Adjusting the intonation is thought to slow down or calm the user's speech, even when the same words are used. Of course, all or some of the response methods, such as utterance speed, vocabulary, and intonation, may be adjusted in combination.
- adjusting the response method does not always change the user's response as the system expects. For example, as the results of the experiments described later suggest, in dialogue between a person and the system, some users do not change their own speaking speed even when the system adjusts its speaking speed. Therefore, if the user's response does not change as expected even after the response method is adjusted, it is preferable to produce output that conveys the request to the user directly.
- specifically, the input processing unit 2 preferably detects the response of the user 7 to the output of the output unit 6. If that response does not show the change expected from the response method adjusted by the response method adjustment unit 4, the input processing unit 2 instructs the response content determination unit 3 to determine response content that prompts the user 7 to make the change. For example, if the utterance speed of the user 7 does not change even when the device's utterance speed is changed, the input processing unit 2 instructs the response content determination unit 3 to determine response content that prompts a change in utterance speed. In response to this instruction, the response content determination unit 3 adds a message such as "Could you speak a little more slowly?" to the utterance content for carrying on the dialogue with the user 7. In this way, a request from the system, such as a request to lower the utterance speed, can be conveyed directly to the user 7.
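The fallback described above, in which the device asks directly when implicit guidance has not worked, could be sketched as follows. The function name, the `min_change` threshold, and the "speak a little faster" wording for the opposite case are hypothetical. Only the slow-down request appears in the text.

```python
def response_with_request(base_reply: str, prev_user_rate: float,
                          user_rate: float, device_direction: int,
                          min_change: float = 2.0) -> str:
    """Append an explicit request when the user did not follow the device.

    device_direction: -1 if the device slowed its speech, +1 if it sped up,
    0 if unchanged. min_change is an illustrative threshold.
    """
    change = user_rate - prev_user_rate
    if device_direction < 0 and change > -min_change:
        # User did not slow down as expected: ask directly.
        return base_reply + " Could you speak a little more slowly?"
    if device_direction > 0 and change < min_change:
        # Mirror-image request (hypothetical wording, not from the source).
        return base_reply + " Could you speak a little faster?"
    return base_reply
```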
- each subject was asked to speak alone, for example by reading a news script aloud. The utterance speed at that time was taken as that subject's standard utterance speed, in the sense that it was not influenced by a dialogue partner.
- the utterance speed was calculated as the number of characters uttered per second (number of characters in the utterance ÷ time required for the utterance).
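The speed measure (characters ÷ seconds) can be sketched as follows. The sketch assumes that the transcript of the utterance and its duration in seconds are already available. The function name is hypothetical, and only whitespace is excluded from the character count.

```python
def utterance_speed(transcript: str, duration_s: float) -> float:
    """Characters uttered per second.

    Only whitespace is excluded from the count; whether punctuation or
    other symbols should be counted is an assumption left open here.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    n_chars = sum(1 for ch in transcript if not ch.isspace())
    return n_chars / duration_s
```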
- FIG. 10 shows the standard utterance speed of each subject. As FIG. 10 shows, among the subjects who took part in the experiment, utterance speeds ranged upward from 6.88 characters/second. The fastest speakers spoke more than 1.5 times as fast as the slowest, a considerable variation. The average utterance speed is 8.84 characters/second.
- the change in the utterance speed was classified into four types from the viewpoint of how it changed with respect to the other party.
- the normal utterance speed of oneself (Mr. A) is $V_{da}$
- the normal utterance speed of the other party (Mr. B) is $V_{db}$
- one's own utterance speed in dialogue example n is $V_{na}$
- the other party's utterance speed in dialogue example n is $V_{nb}$.
- the axis of the speech rate is set in the vertical direction, and the positions of the speech rates Vda, Vna, and Vnb are shown on the axis.
- $D = \mathrm{sign}(V_{nb} - V_{da}) \times \mathrm{sign}(V_{na} - V_{da}) \times \mathrm{abs}(V_{na} - V_{da})$ (Equation 1), where sign is a function that extracts only the plus or minus sign and abs is the absolute value function. D > 0 indicates that the utterance speed is synchronized with the partner, and D < 0 indicates that it is not. The magnitude of D indicates how strongly the utterance speed is synchronized.
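Equation 1 translates directly into code. The sample values in the usage line (A normally at 8.0 characters/second, B speaking at 7.0, A slowing to 7.5) are invented for illustration, and the function names are hypothetical.

```python
def sign(x: float) -> int:
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def discriminant_d(v_da: float, v_na: float, v_nb: float) -> float:
    """Equation 1. D > 0 means A's speed in dialogue n moved from A's normal
    speed Vda toward the same side on which B's speed lies; |D| is how far
    A's speed moved."""
    return sign(v_nb - v_da) * sign(v_na - v_da) * abs(v_na - v_da)


# Hypothetical values: A's normal speed 8.0, A in dialogue 7.5, B in dialogue 7.0.
print(discriminant_d(8.0, 7.5, 7.0))  # 0.5: A slowed toward B
```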
- FIG. 12 is a graph in which the values of the above discriminant D are plotted for the speech data obtained in Experiment 1.
- the horizontal axis is the subject ID
- the vertical axis is the value of the discriminant D
- the unit is characters/second.
- the discriminant D 2.
- the automatic response system used in this experiment realizes dialogue with the subject by detecting the end of the utterance of the user and then playing back a voice file recorded in advance.
- as the audio played back by the system, recordings of one woman reading the dialogue sentences aloud were used.
- from these recordings, versions stretched or compressed in time to 80% and 120% of the original duration were created, while preserving the pitch.
- the file whose utterance time was converted to 80% therefore has the highest utterance speed. In this way, audio files with three utterance speeds were prepared: 80% (fast utterance, High), 100% (as recorded, Middle), and 120% (slow utterance, Low).
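Time-stretching keeps the number of characters fixed, so the speech rate scales with the reciprocal of the duration factor. The 80% file is 1.25 times as fast as the recording, while the 120% file is about 0.83 times as fast, not 0.8 times. A small sketch, with a hypothetical function name:

```python
def stretched_speed(original_speed: float, time_scale: float) -> float:
    """Speed of a recording time-stretched to time_scale x its original
    duration: the same characters spoken over a longer or shorter time."""
    if time_scale <= 0:
        raise ValueError("time_scale must be positive")
    return original_speed / time_scale


# H, M, L conditions of Experiment 2, relative to the recorded speed (1.0).
for label, scale in (("H", 0.8), ("M", 1.0), ("L", 1.2)):
    print(label, round(stretched_speed(1.0, scale), 3))
```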
- FIG. 13 is a table showing the order of conversation in Experiment 2.
- the first number in each column indicates the number of the dialogue sentence, and the HML symbol indicates the utterance speed.
- M indicates the speed of the recorded file, L indicates a slow utterance, and H indicates a fast utterance.
- “2_H” indicates that the user has interacted with H (fast utterance) in the dialogue sentence 2.
- the content of the dialogue was different for each subject every time.
- FIG. 14 is a graph plotting the values of the above discriminant D for the utterance data obtained in Experiment 2. FIG. 14 shows that in most dialogues the subject's utterance speed follows that of the system. Of the 18 dialogues obtained in the experiment, D > 0 in 16.
- unlike information terminals and software agents, robots used in homes and similar settings interact with users not only through the exchange of language and information but also through the exchange of physical objects and collaborative work.
- the things the device provides to the user include manipulation of objects, gestures, and work. In the present embodiment, these are referred to as "operations".
- besides the function that the operation itself provides, an "operation" has a "method" aspect, namely how the operation is provided. The impression the user receives changes greatly depending on this "method".
- an example will be described in which the “method” for providing the “operation” is adjusted to prompt the user to adapt.
- in the first embodiment, the utterance speed is adjusted according to the state of speech recognition inside the device.
- the present embodiment differs greatly from the first embodiment in that it uses the deviation, or degree of cooperation, between the externally output "operation" and the user's operation.
- in normal use, the robot must also operate at a speed similar to the user's operating speed for smooth cooperative operation. There are two cases where the robot should not simply match the user:
  - the user moves very fast, and moving at the same speed would exceed the robot's capability because of mechanical limitations or the like;
  - operating at the same speed as the user would not be safe, for example when handing over hot tea or a knife.
  
  In these cases, instead of operating at the same speed as the user, the robot needs to encourage the user to adapt to the ideal operating speed the robot requires. This allows the user to act cooperatively without constantly feeling dissatisfied with the robot's operating speed.
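Under simplifying assumptions, the speed policy described above can be summarized as "match the user unless a mechanical or safety limit applies". In the sketch below, the limits are plain numbers supplied by the caller. The names, the units, and the use of a single scalar `safe_speed` for hazardous objects are hypothetical simplifications.

```python
import math


def robot_operating_speed(user_speed: float, robot_max_speed: float,
                          hazardous_object: bool = False,
                          safe_speed: float = math.inf) -> float:
    """Hypothetical speed policy: follow the user where possible; otherwise
    move at the robot's own capped speed and rely on entrainment to pull
    the user toward it."""
    limit = robot_max_speed          # mechanical limit of the robot
    if hazardous_object:             # e.g. hot tea or a knife
        limit = min(limit, safe_speed)
    return min(user_speed, limit)
```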
- adjusting the operating speed not only achieves smooth cooperative operation but can also give the user various additional impressions.
- a fast motion can give an impression such as "brisk and reliable"
- a slow motion can give an impression such as “calm”.
- the operation speed of the robot affects the action speed of the user.
- an entrainment (pull-in) effect can occur between the robot and the user.
- for example, when the robot passes an object in a slow motion, the user, influenced by the robot's slow motion, will also receive the object in a slow motion.
- FIG. 6 is a block diagram showing a configuration of an interface unit 30 in a robot as a user adaptive device according to the present embodiment.
- the robot here is assumed to be able to move autonomously and to manipulate objects with its arms, so that it can move by itself and move objects.
- the state input unit 31 and the operation input unit 33 correspond to an input unit
- the motion deviation recognition unit 35 and the operation speed determination unit 36 correspond to a response method adjustment unit.
- the state recognition unit 32, the operation content determination unit 34, and the operation output unit 37 correspond to the input processing unit, the response content determination unit, and the output unit, respectively.
- the state input unit 31 acquires the state of the user 11 facing the robot.
- the state of the user 11 indicates a gesture instruction, a facial expression, an action, and the like to the robot.
- the state input unit 31 includes, for example, a camera for photographing the user 11, a microphone for inputting speech, and sensors for measuring the physiological state of the user 11 (e.g., a 3D position sensor, a perspiration sensor, or an electroencephalograph).
- the state recognition unit 32 processes the signal acquired by the state input unit 31 to recognize and output the state of the user 11.
- the output contents include the request contents of the user 11 for the robot and the physiological state of the user 11 such as being tired or having fun.
- the operation content determination unit 34 receives the output of the state recognition unit 32 and determines what function or operation is actually output to the user 11.
- the motion input unit 33 is provided to determine whether or not the robot and the user 11 are cooperatively operating well.
- the motion input unit 33 includes, for example, a camera that captures the motion of the user 11 and a pressure sensor attached to the robot hand. Elements such as the camera constituting the motion input unit 33 may be shared with the state input unit 31.
- the motion deviation recognition unit 35 receives the output of the motion input unit 33 and recognizes a deviation between the motion of the user 11 and the robot. This shift is used as an index indicating the degree of coordination of the motion between the user 11 and the robot.
- the motion deviation recognition unit 35 recognizes such motion deviations, which lead to user dissatisfaction. It can recognize the deviation between the user and the robot by two measurements:
  - the operating speeds of the user and of the robot itself, measured from camera images;
  - the time from when the robot completes the operation of handing over an object until the user receives it.
  
  That the user has received the object can be detected by a pressure sensor or the like mounted on the robot hand.
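The hand-over timing measurement could be sketched as follows. The sketch assumes that the robot-hand pressure sensor is sampled into a chronological list of `(timestamp, pressure)` pairs. It also assumes that the user's receipt shows up as the pressure falling below a threshold. Both the data layout and the release criterion are assumptions, not details from the text, and the function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def handover_delay(t_handover_done: float,
                   hand_pressure: Sequence[Tuple[float, float]],
                   release_threshold: float) -> Optional[float]:
    """Seconds from the robot finishing the hand-over motion until the user
    takes the object, or None if no receipt is detected.

    hand_pressure holds chronologically ordered (timestamp, pressure)
    samples from the robot hand. Receipt is assumed to show up as pressure
    dropping below release_threshold (an assumption, not from the source).
    """
    for t, pressure in hand_pressure:
        if t >= t_handover_done and pressure < release_threshold:
            return t - t_handover_done
    return None
```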
- the motion speed determination unit 36 determines the ideal cooperative motion speed from two inputs: the motion deviation between the user and the robot recognized by the motion deviation recognition unit 35, and the user state recognized by the state recognition unit 32. In doing so, it takes into account what speed is ideal for the robot, what impression the robot should give the user, and safety.
- the operation output unit 37 outputs the operation or function determined by the operation content determination unit 34 to the user 11 at the operating speed determined by the operation speed determination unit 36.
- FIG. 7 is a flowchart showing the operation of the configuration of FIG. 6. The flow of FIG. 7 is almost the same as the flow of FIG. 5 in the first embodiment. The difference is whether the interface with the user is based on motion or on voice (conversation).
- however, FIG. 7 differs from FIG. 5 in one respect: the present embodiment includes step S23. This step recognizes the deviation between the robot's ideal operating speed and the user's current operating speed, in order to judge the degree of cooperation with the robot. The robot's operating speed is then adjusted accordingly.
- smoother cooperative operation can thus be realized by adjusting the operating speed of the robot based on the motion deviation between the user and the robot.
- an explanation will be given using an information terminal that recommends information to a user as an example.
- an agent is displayed on the screen of the information terminal, and the agent presents information according to the user's preference by voice.
- in the first embodiment, output methods such as the utterance speed are adjusted according to the state of speech recognition.
- in the second embodiment, output methods such as the operating speed are adjusted according to the deviation in cooperative work between the user and the robot.
- in the present embodiment, the information terminal learns the user's preference model. The method of providing information, such as the agent's form (appearance) and vocabulary, is adjusted according to the degree of learning. That is, the present embodiment differs from the first and second embodiments in that the amount of knowledge about the user obtained from outside is reflected in the adjustment of the providing method.
- FIG. 8 is a diagram conceptually showing a case in which restaurant information is provided from an information terminal to a user by an agent.
- 13 is an information terminal having an interface function according to the present embodiment
- 14 is a display screen
- 15A and 15B are agents.
- (a) shows the state when the user's preference model has not been learned much
- (b) shows the state after the user's preference model has been learned.
- the information terminal 13 learns the user's preference model from the interaction with the user.
- in the state of (a), the preference model has not yet been sufficiently learned, so it is not clear what kind of recommended information the user likes. For this reason, if the user has excessive expectations of the information recommendation function, the user's disappointment is greater when the recommendation is not to the user's liking.
- therefore, an agent 15A that looks like a toddler is displayed on the screen 14, and the utterance vocabulary is set to toddler-like baby talk, such as "It's yummy!".
- as a result, even if the user happens not to like the recommended information, the user's impression of the information terminal 13 does not become very bad. This is thought to make the user less likely to get angry or feel uncomfortable.
- FIG. 9 is a block diagram showing a configuration of an interface unit 40 in an information terminal as a user adaptive device according to the present embodiment.
- a response method adjustment unit is configured by the processing state detection unit 43 and the response method determination unit 46.
- the input unit 41, the input processing unit 42, the information content determining unit 45, and the output unit 47 correspond to the input unit, the input processing unit, the response content determining unit, and the output unit, respectively.
- the input unit 41 receives the user's actions through a keyboard, a touch panel, a microphone, or the like.
- the input unit 41 converts the utterance or instruction of the user 11 into an electric signal.
- the input unit 41 also acquires a user's response to information output from an output unit 47 described later.
- the input processing unit 42 receives the signal from the input unit 41 and determines the content of the request from the user 11.
- information on the reaction of the user 11 to the information output from the output unit 47 is also acquired.
- the processing state detection unit 43 receives the output of the input processing unit 42 and updates the user 11's preference model stored in the storage unit 44. For example, the content of the request from the user, the content of the information provided to the user, and the reaction of the user at that time are stored together.
- the past history may simply be stored as-is, or stored in categorized form. That is, the preference model stored in the storage unit 44 gradually becomes more accurate through repeated interaction with the user 11.
- the information content determination unit 45 determines the output content for the current request of the user 11, based on the request content of the user 11 determined by the input processing unit 42 and the preference model stored in the storage unit 44.
- the response method determination unit 46 adjusts a method of providing information such as vocabulary / appearance of the agent according to the learning degree of the preference model stored in the storage unit 44. In other words, the method of providing information is adjusted depending on how accurately the preference model reflects the user's preference. Then, the output unit 47 outputs the information content determined by the information content determination unit 45 according to the providing method determined by the response method determination unit 46.
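The sketch below shows how the response method determination unit 46 might map the learning degree to a providing method. Everything here is hypothetical. The text does not say how the learning degree is measured, so the sketch uses a stand-in: the share of past recommendations the user reacted to positively, drawn from the history kept in the storage unit 44. The threshold and the "adult" style for a well-learned model are also illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ProvidingMethod:
    agent_appearance: str   # how the agent looks on screen
    phrase_template: str    # wording used when recommending an item


def learning_degree(positive_reactions: Sequence[bool]) -> float:
    """Hypothetical proxy for how well the preference model fits the user:
    the share of past recommendations the user reacted to positively."""
    if not positive_reactions:
        return 0.0
    return sum(positive_reactions) / len(positive_reactions)


def providing_method(degree: float, threshold: float = 0.7) -> ProvidingMethod:
    """Choose agent appearance and vocabulary from the learning degree.
    The threshold and the 'adult' style are illustrative assumptions."""
    if degree < threshold:
        # Model still immature: toddler-like agent and tentative wording.
        return ProvidingMethod("toddler", "How about {item}?")
    # Model well learned: confident wording.
    return ProvidingMethod("adult", "{item} is perfect for you.")


history = [True, False, True, True]       # made-up reaction log
method = providing_method(learning_degree(history))
print(method.agent_appearance, method.phrase_template.format(item="this restaurant"))
```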
- adjusting the information providing method indirectly shows the user 11 how well the terminal knows the preferences of the user 11.
- for example, when the preferences of the user 11 have not yet been sufficiently learned, this is conveyed by adjusting the providing method, e.g., by using a tentative phrase such as "How about ...?". Conversely, when the preferences of the user 11 have been properly learned, this is also conveyed by adjusting the providing method, e.g., by using vocabulary such as "... is perfect for you."
- as a result, even when the recommended information happens not to suit the user, the user can accept it naturally.
- the user's preference is gradually learned while the user naturally repeats the interaction with the device without being particularly aware of the learning process on the information terminal side.
- although the present embodiment has been described using information recommendation as an example, the present invention can also be applied to other cases, for example when a user obtains information from an information terminal through dialogue.
- although each embodiment has been described as an individual case, a device with advanced functions, such as a home robot, may have voice dialogue capability, cooperative work capability, information recommendation capability, and so on at the same time. Such a device can adjust its methods of responding to the user concurrently or in an integrated manner. Adjusting multiple response methods at once allows the user to communicate more naturally.
- because the present invention makes communication between the device and the user smoother, it is considered effective for promoting adaptation in devices with user interfaces in general. It is particularly useful for home robots, information terminals, and home appliances.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Manipulator (AREA)
- User Interface Of Digital Computer (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2005517657A JP3924583B2 (ja) | 2004-02-03 | 2005-01-28 | User adaptive device and control method thereof |
US11/449,852 US7684977B2 (en) | 2004-02-03 | 2006-06-08 | User adaptive system and control method thereof |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-026647 | 2004-02-03 | ||
JP2004026647 | 2004-02-03 | ||
JP2004-275476 | 2004-09-22 | ||
JP2004275476 | 2004-09-22 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/449,852 Continuation US7684977B2 (en) | 2004-02-03 | 2006-06-08 | User adaptive system and control method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005076258A1 true WO2005076258A1 (ja) | 2005-08-18 |
Family
ID=34840123
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/001219 WO2005076258A1 (ja) | 2004-02-03 | 2005-01-28 | User adaptive device and control method thereof |
Country Status (3)
Country | Link |
---|---|
US (1) | US7684977B2 (ja) |
JP (1) | JP3924583B2 (ja) |
WO (1) | WO2005076258A1 (ja) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
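JP2016109897A (ja) * | 2014-12-08 | 2016-06-20 | シャープ株式会社 | Electronic device, utterance control method, and program |
KR20160074388A (ko) * | 2014-12-18 | 2016-06-28 | 삼성전자주식회사 | Method and apparatus for controlling an electronic device |
WO2017168936A1 (ja) * | 2016-03-31 | 2017-10-05 | ソニー株式会社 | Information processing device, information processing method, and program |
WO2019073668A1 (ja) * | 2017-10-11 | 2019-04-18 | ソニー株式会社 | Information processing device, information processing method, and program |
JP2019211909A (ja) * | 2018-06-01 | 2019-12-12 | 凸版印刷株式会社 | Information presentation system, information presentation method, and program |
JP2020018794A (ja) * | 2018-08-03 | 2020-02-06 | 株式会社ニデック | Ophthalmic image processing device, OCT device, and ophthalmic image processing program |
JP2021503112A (ja) * | 2017-09-29 | 2021-02-04 | トルーク インコーポレイテッドTorooc Inc. | Method, system, and non-transitory computer-readable recording medium for providing a dialogue service using an autonomous behavior robot |
CN112533526A (zh) * | 2018-08-03 | 2021-03-19 | 尼德克株式会社 | Ophthalmic image processing device, OCT device, and ophthalmic image processing program |
JP2021117296A (ja) * | 2020-01-23 | 2021-08-10 | トヨタ自動車株式会社 | Agent system, terminal device, and agent program |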
US11257459B2 (en) | 2014-12-18 | 2022-02-22 | Samsung Electronics Co., Ltd | Method and apparatus for controlling an electronic device |
Families Citing this family (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7398209B2 (en) | 2002-06-03 | 2008-07-08 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance |
US7693720B2 (en) * | 2002-07-15 | 2010-04-06 | Voicebox Technologies, Inc. | Mobile systems and methods for responding to natural language speech utterance |
US7640160B2 (en) * | 2005-08-05 | 2009-12-29 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance |
US7620549B2 (en) | 2005-08-10 | 2009-11-17 | Voicebox Technologies, Inc. | System and method of supporting adaptive misrecognition in conversational speech |
US7949529B2 (en) | 2005-08-29 | 2011-05-24 | Voicebox Technologies, Inc. | Mobile systems and methods of supporting natural language human-machine interactions |
US8073681B2 (en) | 2006-10-16 | 2011-12-06 | Voicebox Technologies, Inc. | System and method for a cooperative conversational voice user interface |
US7818176B2 (en) | 2007-02-06 | 2010-10-19 | Voicebox Technologies, Inc. | System and method for selecting and presenting advertisements based on natural language processing of voice-based input |
US8140335B2 (en) | 2007-12-11 | 2012-03-20 | Voicebox Technologies, Inc. | System and method for providing a natural language voice user interface in an integrated voice navigation services environment |
US20090209341A1 (en) * | 2008-02-14 | 2009-08-20 | Aruze Gaming America, Inc. | Gaming Apparatus Capable of Conversation with Player and Control Method Thereof |
JP5104448B2 (ja) * | 2008-03-21 | 2012-12-19 | Fujitsu Limited | Business improvement support device and business improvement support program |
US9305548B2 (en) | 2008-05-27 | 2016-04-05 | Voicebox Technologies Corporation | System and method for an integrated, multi-modal, multi-device natural language voice services environment |
US8589161B2 (en) | 2008-05-27 | 2013-11-19 | Voicebox Technologies, Inc. | System and method for an integrated, multi-modal, multi-device natural language voice services environment |
TW201019288A (en) * | 2008-11-13 | 2010-05-16 | Ind Tech Res Inst | System and method for conversation practice in simulated situations |
US8326637B2 (en) | 2009-02-20 | 2012-12-04 | Voicebox Technologies, Inc. | System and method for processing multi-modal device interactions in a natural language voice services environment |
WO2011059997A1 (en) | 2009-11-10 | 2011-05-19 | Voicebox Technologies, Inc. | System and method for providing a natural language content dedication service |
US9171541B2 (en) | 2009-11-10 | 2015-10-27 | Voicebox Technologies Corporation | System and method for hybrid processing in a natural language voice services environment |
US9743820B2 (en) | 2010-02-26 | 2017-08-29 | Whirlpool Corporation | User interface for dishwashing cycle optimization |
FR2962048A1 (fr) * | 2010-07-02 | 2012-01-06 | Aldebaran Robotics S A | Game-playing humanoid robot, and method and system for using said robot |
JP5842245B2 (ja) * | 2011-04-28 | 2016-01-13 | Advanced Telecommunications Research Institute International | Communication robot |
US8738364B2 (en) * | 2011-12-14 | 2014-05-27 | International Business Machines Corporation | Adaptation of vocabulary levels for enhanced collaboration |
US9443514B1 (en) * | 2012-02-08 | 2016-09-13 | Google Inc. | Dynamic voice response control based on a weighted pace of spoken terms |
TW201408052A (zh) * | 2012-08-14 | 2014-02-16 | Kentec Inc | Television device and virtual host display method thereof |
US9223837B2 (en) * | 2013-03-14 | 2015-12-29 | Toyota Motor Engineering & Manufacturing North America, Inc. | Computer-based method and system for providing active and automatic personal assistance using an automobile or a portable electronic device |
EP2933067B1 (en) * | 2014-04-17 | 2019-09-18 | Softbank Robotics Europe | Method of performing multi-modal dialogue between a humanoid robot and user, computer program product and humanoid robot for implementing said method |
CN107003996A (zh) | 2014-09-16 | 2017-08-01 | 声钰科技 | Voice commerce |
WO2016044321A1 (en) | 2014-09-16 | 2016-03-24 | Min Tang | Integration of domain information into state transitions of a finite state transducer for natural language processing |
US9747896B2 (en) | 2014-10-15 | 2017-08-29 | Voicebox Technologies Corporation | System and method for providing follow-up responses to prior natural language inputs of a user |
US10614799B2 (en) | 2014-11-26 | 2020-04-07 | Voicebox Technologies Corporation | System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance |
US10431214B2 (en) | 2014-11-26 | 2019-10-01 | Voicebox Technologies Corporation | System and method of determining a domain and/or an action related to a natural language input |
KR20170034154A (ko) | 2015-09-18 | 2017-03-28 | Samsung Electronics Co., Ltd. | Content providing method and electronic device performing the same |
WO2017100167A1 (en) * | 2015-12-06 | 2017-06-15 | Voicebox Technologies Corporation | System and method of conversational adjustment based on user's cognitive state and/or situational state |
JP6741504B2 (ja) * | 2016-07-14 | 2020-08-19 | Universal Entertainment Corporation | Interview system |
WO2018023106A1 (en) | 2016-07-29 | 2018-02-01 | Erik SWART | System and method of disambiguating natural language processing requests |
US10276149B1 (en) * | 2016-12-21 | 2019-04-30 | Amazon Technologies, Inc. | Dynamic text-to-speech output |
US10628754B2 (en) * | 2017-06-06 | 2020-04-21 | At&T Intellectual Property I, L.P. | Personal assistant for facilitating interaction routines |
CN110278140B (zh) * | 2018-03-14 | 2022-05-24 | Alibaba Group Holding Limited | Communication method and device |
US10573298B2 (en) * | 2018-04-16 | 2020-02-25 | Google Llc | Automated assistants that accommodate multiple age groups and/or vocabulary levels |
KR102228866B1 (ko) * | 2018-10-18 | 2021-03-17 | LG Electronics Inc. | Robot and control method thereof |
JP6993314B2 (ja) * | 2018-11-09 | 2022-01-13 | Hitachi, Ltd. | Dialogue system, device, and program |
JP2020119412A (ja) * | 2019-01-28 | 2020-08-06 | Sony Corporation | Information processing device, information processing method, and program |
US11425523B2 (en) * | 2020-04-10 | 2022-08-23 | Facebook Technologies, Llc | Systems and methods for audio adjustment |
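CN112599151B (zh) * | 2020-12-07 | 2023-07-21 | Ctrip Travel Information Technology (Shanghai) Co., Ltd. | Speech rate evaluation method, system, device, and storage medium |
CN114627876B (zh) * | 2022-05-09 | 2022-08-26 | Hangzhou Hikvision Digital Technology Co., Ltd. | Intelligent speech recognition security defense method and device based on dynamic audio adjustment |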
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6173985A (ja) * | 1984-09-19 | 1986-04-16 | 渡辺 富夫 (Tomio Watanabe) | Training device |
JPS62145322A (ja) * | 1985-12-20 | 1987-06-29 | Canon Inc | Speech output device |
JPH04344930A (ja) * | 1991-05-23 | 1992-12-01 | Nippon Telegr & Teleph Corp <Ntt> | Voice guidance output system |
JP2000194386A (ja) * | 1998-12-24 | 2000-07-14 | Omron Corp | Speech recognition response device and method |
JP2001034293A (ja) * | 1999-06-30 | 2001-02-09 | Internatl Business Mach Corp <Ibm> | Method and device for transcribing speech |
JP2003150194A (ja) * | 2001-11-14 | 2003-05-23 | Seiko Epson Corp | Voice dialogue device, input speech optimization method for the voice dialogue device, and input speech optimization processing program for the voice dialogue device |
JP2003255991A (ja) * | 2002-03-06 | 2003-09-10 | Sony Corp | Dialogue control system, dialogue control method, and robot device |
JP2004258290A (ja) * | 2003-02-26 | 2004-09-16 | Sony Corp | Speech processing device and method, recording medium, and program |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2119397C (en) * | 1993-03-19 | 2007-10-02 | Kim E.A. Silverman | Improved automated voice synthesis employing enhanced prosodic treatment of text, spelling of text and rate of annunciation |
US5717823A (en) * | 1994-04-14 | 1998-02-10 | Lucent Technologies Inc. | Speech-rate modification for linear-prediction based analysis-by-synthesis speech coders |
JPH0844520A (ja) | 1994-07-29 | 1996-02-16 | Toshiba Corp | Dialogue device and operation guidance output method applied to the device |
JPH09212568A (ja) | 1995-08-31 | 1997-08-15 | Sanyo Electric Co Ltd | User-adaptive response device |
JPH0981350A (ja) | 1995-09-11 | 1997-03-28 | Toshiba Corp | Human interface system and user-adaptive control method |
JPH09152926A (ja) | 1995-11-29 | 1997-06-10 | Sanyo Electric Co Ltd | Image information processing device with variable guided-input function |
JPH10111786A (ja) * | 1996-10-03 | 1998-04-28 | Sharp Corp | Rhythm-controlled dialogue device |
JP2000305585A (ja) * | 1999-04-23 | 2000-11-02 | Oki Electric Ind Co Ltd | Speech synthesizer |
JP2000330676A (ja) | 1999-05-19 | 2000-11-30 | Nec Corp | Adaptive user interface generation device and generation method |
JP3514372B2 (ja) * | 1999-06-04 | 2004-03-31 | NEC Corporation | Multimodal dialogue device |
US6505153B1 (en) * | 2000-05-22 | 2003-01-07 | Compaq Information Technologies Group, L.P. | Efficient method for producing off-line closed captions |
US6795808B1 (en) * | 2000-10-30 | 2004-09-21 | Koninklijke Philips Electronics N.V. | User interface/entertainment device that simulates personal interaction and charges external database with relevant data |
US20020150869A1 (en) * | 2000-12-18 | 2002-10-17 | Zeev Shpiro | Context-responsive spoken language instruction |
DE10138408A1 (de) * | 2001-08-04 | 2003-02-20 | Philips Corp Intellectual Pty | Method for supporting the proofreading of speech-recognized text, with a playback speed profile adapted to the recognition reliability |
US7295982B1 (en) * | 2001-11-19 | 2007-11-13 | At&T Corp. | System and method for automatic verification of the understandability of speech |
US20030163311A1 (en) * | 2002-02-26 | 2003-08-28 | Li Gong | Intelligent social agents |
US7096183B2 (en) * | 2002-02-27 | 2006-08-22 | Matsushita Electric Industrial Co., Ltd. | Customizing the speaking style of a speech synthesizer based on semantic analysis |
GB0228245D0 (en) * | 2002-12-04 | 2003-01-08 | Mitel Knowledge Corp | Apparatus and method for changing the playback rate of recorded speech |
US7337108B2 (en) * | 2003-09-10 | 2008-02-26 | Microsoft Corporation | System and method for providing high-quality stretching and compression of a digital audio signal |
US7412378B2 (en) * | 2004-04-01 | 2008-08-12 | International Business Machines Corporation | Method and system of dynamically adjusting a speech output rate to match a speech input rate |
US7865365B2 (en) * | 2004-08-05 | 2011-01-04 | Nuance Communications, Inc. | Personalized voice playback for screen reader |
TWI235823B (en) * | 2004-09-30 | 2005-07-11 | Inventec Corp | Speech recognition system and method thereof |
US8694319B2 (en) * | 2005-11-03 | 2014-04-08 | International Business Machines Corporation | Dynamic prosody adjustment for voice-rendering synthesized data |
- 2005
  - 2005-01-28 JP JP2005517657A patent/JP3924583B2/ja not_active Expired - Fee Related
  - 2005-01-28 WO PCT/JP2005/001219 patent/WO2005076258A1/ja active Application Filing
- 2006
  - 2006-06-08 US US11/449,852 patent/US7684977B2/en not_active Expired - Fee Related
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
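JP2016109897A (ja) * | 2014-12-08 | 2016-06-20 | Sharp Corporation | Electronic device, speech control method, and program |
KR20160074388A (ko) * | 2014-12-18 | 2016-06-28 | Samsung Electronics Co., Ltd. | Method and apparatus for controlling an electronic device |
US11257459B2 (en) | 2014-12-18 | 2022-02-22 | Samsung Electronics Co., Ltd | Method and apparatus for controlling an electronic device |
KR102362042B1 (ko) * | 2014-12-18 | 2022-02-11 | Samsung Electronics Co., Ltd. | Method and apparatus for controlling an electronic device |
WO2017168936A1 (ja) * | 2016-03-31 | 2017-10-05 | Sony Corporation | Information processing device, information processing method, and program |
JPWO2017168936A1 (ja) * | 2016-03-31 | 2019-02-07 | Sony Corporation | Information processing device, information processing method, and program |
US11462213B2 (en) | 2016-03-31 | 2022-10-04 | Sony Corporation | Information processing apparatus, information processing method, and program |
JP2021503112A (ja) * | 2017-09-29 | 2021-02-04 | Torooc Inc. | Method, system, and non-transitory computer-readable recording medium for providing a dialogue service using an autonomously acting robot |
JPWO2019073668A1 (ja) * | 2017-10-11 | 2020-11-05 | Sony Corporation | Information processing device, information processing method, and program |
WO2019073668A1 (ja) * | 2017-10-11 | 2019-04-18 | Sony Corporation | Information processing device, information processing method, and program |
JP2019211909A (ja) * | 2018-06-01 | 2019-12-12 | Toppan Printing Co., Ltd. | Information presentation system, information presentation method, and program |
JP7180127B2 (ja) | 2018-06-01 | 2022-11-30 | Toppan Printing Co., Ltd. | Information presentation system, information presentation method, and program |
CN112533526A (zh) * | 2018-08-03 | 2021-03-19 | Nidek Co., Ltd. | Ophthalmic image processing device, OCT device, and ophthalmic image processing program |
JP2020018794A (ja) * | 2018-08-03 | 2020-02-06 | Nidek Co., Ltd. | Ophthalmic image processing device, OCT device, and ophthalmic image processing program |
JP7210927B2 (ja) | 2018-08-03 | 2023-01-24 | Nidek Co., Ltd. | Ophthalmic image processing device, OCT device, and ophthalmic image processing program |
JP2023024614A (ja) * | 2018-08-03 | 2023-02-16 | Nidek Co., Ltd. | Ophthalmic image processing device, OCT device, and ophthalmic image processing program |
US11961229B2 (en) | 2018-08-03 | 2024-04-16 | Nidek Co., Ltd. | Ophthalmic image processing device, OCT device, and non-transitory computer-readable storage medium |
JP7521575B2 (ja) | 2018-08-03 | 2024-07-24 | Nidek Co., Ltd. | Ophthalmic image processing device, OCT device, and ophthalmic image processing program |
JP2021117296A (ja) * | 2020-01-23 | 2021-08-10 | Toyota Motor Corporation | Agent system, terminal device, and agent program |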
Also Published As
Publication number | Publication date |
---|---|
US20060287850A1 (en) | 2006-12-21 |
JPWO2005076258A1 (ja) | 2007-10-18 |
US7684977B2 (en) | 2010-03-23 |
JP3924583B2 (ja) | 2007-06-06 |
Similar Documents
Publication | Title |
---|---|
WO2005076258A1 (ja) | User-adaptive device and control method thereof |
JP6693111B2 (ja) | Dialogue device, robot, dialogue method, and program |
JP3968133B2 (ja) | Speech recognition dialogue processing method and speech recognition dialogue device |
US5946658A (en) | Cartridge-based, interactive speech recognition method with a response creation capability |
KR101423258B1 (ko) | Method for providing counseling dialogue and device using the same |
JP2017049471A (ja) | Dialogue control device, dialogue control method, and program |
JP6970413B2 (ja) | Dialogue method, dialogue system, dialogue device, and program |
JPWO2017200072A1 (ja) | Dialogue method, dialogue system, dialogue device, and program |
Aneja et al. | Understanding conversational and expressive style in a multimodal embodied conversational agent |
WO2017175351A1 (ja) | Information processing device |
JP2009037050A (ja) | Dialogue device and dialogue program |
Ward et al. | Non-native differences in prosodic-construction use |
Siegert et al. | “Speech Melody and Speech Content Didn’t Fit Together”—Differences in Speech Behavior for Device Directed and Human Directed Interactions |
JP3681145B2 (ja) | Utterance device and utterance method |
CN115088033A (zh) | Synthesized speech audio data generated on behalf of a human participant in a conversation |
JP6682104B2 (ja) | Dialogue method, dialogue system, dialogue device, and program |
JP6551793B2 (ja) | Dialogue method, dialogue system, dialogue device, and program |
McDonnell et al. | “Easier or Harder, Depending on Who the Hearing Person Is”: Codesigning Videoconferencing Tools for Small Groups with Mixed Hearing Status |
Hoque et al. | Robust recognition of emotion from speech |
Cowan et al. | Does voice anthropomorphism affect lexical alignment in speech-based human-computer dialogue? |
Noyes | Talking and writing—how natural in human–machine interaction? |
JP6601625B2 (ja) | Dialogue method, dialogue system, dialogue device, and program |
JP6647636B2 (ja) | Dialogue method, dialogue system, dialogue device, and program |
JP7322374B2 (ja) | Robot control device, robot, robot control method, and program |
Nishimura et al. | Chat-like spoken dialog system for a multi-party dialog incorporating two agents and a user |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed from 20040101) | |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
WWE | WIPO information: entry into national phase | Ref document number: 2005517657; Country of ref document: JP |
WWE | WIPO information: entry into national phase | Ref document number: 11449852; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |
WWW | WIPO information: withdrawn in national office | Country of ref document: DE |
WWP | WIPO information: published in national office | Ref document number: 11449852; Country of ref document: US |
122 | EP: PCT application non-entry in European phase | |