JP5073024B2

JP5073024B2 - Spoken dialogue device

Info

Publication number: JP5073024B2
Application number: JP2010179194A
Authority: JP
Inventors: 優佳小林; 大介山本; 祥恵横山; 美和子土井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2010-08-10
Filing date: 2010-08-10
Publication date: 2012-11-14
Anticipated expiration: 2030-08-10
Also published as: JP2012037790A

Description

Embodiments of the present invention relate to a spoken dialogue device that conducts a dialogue by voice.

For a spoken dialogue device to converse with a user, it must produce utterances that fit the content of the dialogue. Producing a wide variety of utterances, however, requires advanced language processing and a huge database. Methods have therefore been proposed that generate the system's utterances from actual dialogue histories. One such method uses the history of spoken dialogue with the user: the user's speech is recognized, and the recognition result is used as the dialogue history.

JP 2002-288155 A

In spoken dialogue, user utterances vary widely. Even utterances with the same meaning differ in phrasing and sentence endings, and a user does not always say the same sentence. New utterance templates must be generated from spoken dialogue even under these conditions.

In view of the above problem, an object of the embodiments of the present invention is to provide a spoken dialogue device that generates utterance templates from spoken dialogue with a user.

An embodiment of the present invention is a spoken dialogue device that conducts spoken dialogue with a user using utterance templates stored in a template dictionary, the device comprising: a keyword dictionary storage unit that stores a plurality of keywords related to a specific topic together with information indicating the concept corresponding to each keyword; a speech recognition unit that performs speech recognition on an utterance of the user; and a template generation unit that, when the utterance recognized by the speech recognition unit contains a keyword stored in the keyword dictionary, replaces the keyword with the information indicating the corresponding concept, sets the resulting utterance as a new utterance template, and stores it in the template dictionary.

FIG. 1 is a block diagram of the spoken dialogue device according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart of utterance generation in Embodiment 1.
FIG. 3 is a flowchart of utterance template storage in Embodiment 1.
FIG. 4 is a flowchart of utterance template storage in Embodiment 2.
FIG. 5 shows an example of generating a correct sentence from speech recognition results in Embodiment 2.
FIG. 6 shows another example of generating a correct sentence from speech recognition results in Embodiment 2.
FIG. 7 is a flowchart of utterance template storage in Embodiment 3.
FIG. 8 shows an example dialogue in Embodiment 3.
FIG. 9 shows a first conversion example of an utterance template in Embodiment 4.
FIG. 10 shows a second conversion example of an utterance template in Embodiment 4.

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments of the spoken dialogue device 1 according to the present invention are described below with reference to the drawings.

The spoken dialogue device 1 according to Embodiment 1 of the present invention is described with reference to FIGS. 1 to 3.

As shown in FIG. 1(a), the spoken dialogue device 1 of this embodiment is, for example, a device built into a robot 3 that converses with a user 2 by voice.

First, the configuration of the spoken dialogue device 1 is described with reference to the block diagram in FIG. 1(b).

The spoken dialogue device 1 has a microphone 15 for inputting the voice of user 2; a speech recognition unit 13 that recognizes the voice of user 2 and converts it into a text utterance; an utterance generation unit 11 that generates an utterance according to what user 2 said; a speech synthesis unit 14 that converts the utterance for the user into speech; a speaker 16 that outputs the speech; a template generation unit 12 that generates utterance templates; a keyword dictionary database 17; a template dictionary database 18; and a concept dictionary 19.

The keyword dictionary database 17 stores a plurality of keyword dictionaries 171. As shown in FIG. 1, each keyword dictionary 171 is a file in which a plurality of keywords are registered, namely words highly relevant to a specific topic. For example, the keyword dictionary 171 in FIG. 1 is a dictionary for "Kyoto trip", with keywords such as "Kyoto", "Shinkansen", "maiko", and "yudofu". Each keyword is assigned its concept: the concept of "Kyoto" is "place", that of "Shinkansen" is "vehicle", that of "maiko" is "person", and that of "yudofu" is "food". The concept of each keyword in the keyword dictionary 171 is looked up in the concept dictionary 19, in which a wide range of words are classified into several tens to several hundreds of concepts. Here, "concept" means the essential characteristic of a thing.
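The relationship between the concept dictionary 19 and a keyword dictionary 171 can be pictured as two mappings. The following is a minimal sketch, assuming both are held in memory as Python dictionaries; the names `CONCEPT_DICTIONARY` and `build_keyword_dictionary` are hypothetical and do not appear in the source, which describes the dictionaries only as files.

```python
# Hypothetical in-memory stand-in for concept dictionary 19: word -> concept.
CONCEPT_DICTIONARY = {
    "Kyoto": "place",
    "Otaru": "place",
    "Shinkansen": "vehicle",
    "maiko": "person",
    "yudofu": "food",
}


def build_keyword_dictionary(keywords, concept_dictionary):
    """Attach to each keyword the concept looked up in the concept dictionary."""
    return {word: concept_dictionary[word] for word in keywords}


# A keyword dictionary 171 for the topic "Kyoto trip".
kyoto_trip = build_keyword_dictionary(
    ["Kyoto", "Shinkansen", "maiko", "yudofu"], CONCEPT_DICTIONARY
)
```

Here `kyoto_trip["Kyoto"]` yields `"place"`, which is the information later used to pick a matching utterance template.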

As noted above, each keyword dictionary 171 relates to a specific topic and is generated from texts on that topic, such as dialogue records and related information. It is generated as follows.

First, the keyword dictionary database 17 extracts from each text only the words that remain after removing particles, auxiliary verbs, and the like.

Second, for each word, the keyword dictionary database 17 calculates its frequency in the text concerned (tf) and the number N of texts, other than the text concerned, that contain the word. Using the total number of texts N_all, it then calculates the evaluation value idf = log(N_all / N).

Third, the keyword dictionary database 17 calculates tf-idf and extracts as keywords the words whose tf-idf exceeds a reference value.
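The second and third steps can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the texts are assumed to be already reduced to word lists (the first step needs a morphological analyzer, which is omitted), tf-idf is taken as the usual product tf × idf, and N is clamped to 1 when no other text contains the word, a case the source does not address. The function name `extract_keywords` and its parameters are hypothetical.

```python
import math
from collections import Counter


def extract_keywords(texts, target_index, threshold):
    """Return {word: score} for words of texts[target_index] whose tf-idf
    exceeds `threshold`.

    texts: list of word lists, already stripped of particles and
    auxiliary verbs.
    """
    target = texts[target_index]
    tf = Counter(target)   # frequency of each word in the text concerned
    n_all = len(texts)     # total number of texts, N_all
    keywords = {}
    for word, freq in tf.items():
        # N: number of texts, other than the text concerned, containing the word.
        n = sum(
            1 for i, t in enumerate(texts)
            if i != target_index and word in t
        )
        # Assumption: clamp N to 1 so the logarithm stays defined when N == 0.
        idf = math.log(n_all / max(n, 1))
        score = freq * idf  # assumed definition: tf-idf = tf * idf
        if score > threshold:
            keywords[word] = score
    return keywords


texts = [
    ["Kyoto", "Kyoto", "go", "temple"],
    ["go", "movie"],
    ["go", "book", "temple"],
]
kyoto_keywords = extract_keywords(texts, target_index=0, threshold=1.5)
```

With this toy corpus only "Kyoto" passes the threshold, because it is frequent in the target text and absent from the others, whereas "go" appears everywhere.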

Although the keyword dictionaries 171 are generated in advance for each topic, they can also be generated in real time.

The template dictionary database 18 stores a plurality of template dictionaries 181, one for each type of dialogue, such as "travel", "TV", "books", and "movies". As shown in FIG. 1(b), each template dictionary 181 is a file describing a plurality of utterance templates that use the concepts in the concept dictionary 19. Examples of utterance templates are "go to [place]" and "eat [food]"; an utterance is generated by replacing the bracketed part [ ], which names a concept, with a keyword having that concept (for example, "Kyoto"). The data structure of an utterance template may take any form, as long as the position where a keyword is to be inserted and the concept of the keyword to be inserted can be identified.

The method by which the spoken dialogue device 1 generates an utterance for user 2 is described with reference to the flowchart in FIG. 2.

Before starting a dialogue with user 2, the spoken dialogue device 1 decides in advance which topic to talk about and selects a keyword dictionary 171 and a template dictionary 181 according to that topic. As shown in FIG. 2, to talk about the topic "Kyoto trip", the spoken dialogue device 1 selects the keyword dictionary 171 that stores keywords on "Kyoto trip" and the template dictionary 181 that stores utterance templates on "travel". The utterance generation unit 11 generates utterances using the keyword dictionary 171 and the template dictionary 181 thus selected from the keyword dictionary database 17 and the template dictionary database 18.

In step s1, user 2 speaks.

In step s2, the speech recognition unit 13 performs speech recognition on the utterance of user 2.

In step s3, the utterance generation unit 11 selects the keyword "Kyoto" from the keyword dictionary 171 for "Kyoto trip", the topic of the current dialogue with user 2. As for how the keyword is selected, the utterance generation unit 11 may, for example, select a keyword stored in the keyword dictionary 171 that appears in the recognized utterance, or it may select a keyword at random.

In step s4, the utterance generation unit 11 selects, from the template dictionary 181 corresponding to "travel" (the type of topic of the current dialogue with user 2), an utterance template that contains "place", the concept of the selected keyword "Kyoto".

In step s5, the utterance generation unit 11 generates an utterance by combining the selected keyword from the keyword dictionary 171 with the selected utterance template from the template dictionary 181. For example, it combines the keyword "Kyoto" with the utterance template "go to [place]" to generate the utterance "go to Kyoto".

In step s6, the speech synthesis unit 14 speaks the generated utterance through the speaker 16.

In this way, the spoken dialogue device 1 produces utterances by combining a keyword dictionary 171 with a template dictionary 181.
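Steps s3 to s5 can be sketched in a few lines. The sketch below assumes that templates are plain strings with bracketed concept slots and that the keyword dictionary is a mapping from keyword to concept; `KEYWORDS`, `TEMPLATES`, and `generate_utterance` are hypothetical names, and speech recognition and synthesis (steps s1, s2, and s6) are left out.

```python
import random

# Hypothetical contents of one keyword dictionary 171 and one template dictionary 181.
KEYWORDS = {"Kyoto": "place", "Shinkansen": "vehicle", "yudofu": "food"}
TEMPLATES = ["go to [place]", "eat [food]"]


def generate_utterance(recognized, keywords, templates, rng=random):
    """Steps s3 to s5: pick a keyword, pick a matching template, fill it in."""
    # s3: prefer a keyword that appears in the recognized utterance,
    # otherwise fall back to a randomly chosen keyword.
    mentioned = [k for k in keywords if k in recognized]
    keyword = mentioned[0] if mentioned else rng.choice(sorted(keywords))
    concept = keywords[keyword]
    # s4: keep only the templates that contain a slot for this concept.
    slot = f"[{concept}]"
    candidates = [t for t in templates if slot in t]
    if not candidates:
        return None  # no template can carry this keyword
    # s5: replace the slot with the keyword.
    return rng.choice(candidates).replace(slot, keyword)
```

For the recognized utterance "I want to visit Kyoto" this returns "go to Kyoto". Returning `None` when no template matches is a choice made for the sketch; the source does not describe that case.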

Next, the method by which the spoken dialogue device 1 stores a new utterance template in the template dictionary 181 using the dialogue history with user 2 is described. Suppose, for example, that the spoken dialogue device 1 and user 2 have the following dialogue.

Spoken dialogue device 1: "When is a good time to go to Kyoto?"
User 2: "Kyoto is recommended in autumn"

The method by which the spoken dialogue device 1 generates an utterance template from this example dialogue is described with reference to the flowchart in FIG. 3.

In step s21, in response to the spoken dialogue device 1 saying "When is a good time to go to Kyoto?", user 2 says "Kyoto is recommended in autumn".

In step s22, the speech recognition unit 13 performs speech recognition and obtains the recognition result "Kyoto is recommended in autumn".

In step s23, the template generation unit 12 searches the recognized utterance for keywords stored in the keyword dictionary 171. If a keyword is found, the process proceeds to step s24 (yes); if not, the process ends (no). Here, the utterance of user 2 contains the keyword "Kyoto", so the process proceeds to step s24.

In step s24, because the search found a keyword, the template generation unit 12 replaces that part of the utterance with the concept of the keyword. Here, it replaces "Kyoto" with its concept, "place".

In step s25, the template generation unit 12 sets the resulting sentence, "[place] is recommended in autumn", as a new utterance template and stores it in the "travel" template dictionary 181 of the template dictionary database 18.

According to this embodiment, the spoken dialogue device 1 can from then on produce utterances using the new utterance template "[place] is recommended in autumn". Because this utterance template is stored in the "travel" template dictionary 181, it can be combined with other keyword dictionaries 171 in any "travel" dialogue. For example, using the keyword "Otaru" from the "Hokkaido trip" keyword dictionary 171, the spoken dialogue device 1 can generate the utterance "Otaru is recommended in autumn".
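Steps s23 to s25 amount to a search-and-replace followed by an append. A minimal sketch, assuming the same string-based template format as above; `learn_template` is a hypothetical name, and replacing every keyword found (longest first) is a choice made here, since the source illustrates only a single keyword.

```python
def learn_template(recognized, keywords, template_dictionary):
    """Steps s23 to s25: turn a recognized utterance into a new template.

    keywords maps keyword -> concept; template_dictionary is a list of
    templates that is extended in place. Returns the new template, or
    None when the utterance contains no keyword (the "no" branch of s23).
    """
    template = recognized
    found = False
    # Longest keywords first, so that a keyword nested inside a longer one
    # is not replaced prematurely (an implementation choice, not from the source).
    for keyword in sorted(keywords, key=len, reverse=True):
        if keyword in template:                                              # s23
            template = template.replace(keyword, f"[{keywords[keyword]}]")  # s24
            found = True
    if not found:
        return None
    if template not in template_dictionary:
        template_dictionary.append(template)                                # s25
    return template


travel_templates = ["go to [place]"]
new_template = learn_template(
    "Kyoto is recommended in autumn", {"Kyoto": "place"}, travel_templates
)
otaru_utterance = new_template.replace("[place]", "Otaru")
```

After the call, `travel_templates` also contains "[place] is recommended in autumn", and filling the slot with "Otaru" gives the utterance from the Hokkaido example.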

The spoken dialogue device 1 according to Embodiment 2 of the present invention is described with reference to FIGS. 4 to 6.

When a new utterance template is generated from an utterance of user 2, the utterance cannot always be recognized as accurately as in Embodiment 1. In this embodiment, therefore, one utterance template is generated from the recognized sentences of a plurality of utterances by user 2.

Whereas in Embodiment 1 the utterance template is generated in real time immediately after user 2 speaks, the spoken dialogue device 1 of this embodiment generates utterance templates after the dialogue with user 2 has ended.

A first specific example in which the spoken dialogue device 1 of this embodiment generates an utterance template is described with reference to the flowchart in FIG. 4 and to FIG. 5.

In step s31, as in Embodiment 1, the template generation unit 12 searches each of the recognized sentences of the plurality of utterances by user 2 for keywords stored in the keyword dictionary 171. If a keyword is found, the process proceeds to step s32 (yes); if not, the process ends (no). Here, each utterance of user 2 contains the keyword "Kyoto", so the process proceeds to step s32.

In step s32, the template generation unit 12 uses a plurality of recognized sentences containing the same keyword to generate a correctly recognized sentence, in the following order, with FIG. 5 as an example. FIG. 5 shows a case in which the template generation unit 12 has obtained five recognized sentences containing "Kyoto". In FIG. 5(a), the text in parentheses on the right is what user 2 actually said, and the underlined parts of the sentences on the left are parts misrecognized by the speech recognition unit 13.

First, as shown in FIG. 5(b), the template generation unit 12 converts everything other than "Kyoto" (京都) in each of the five recognized sentences into hiragana.

Second, as shown in FIG. 5(b), the template generation unit 12 aligns the five sentences character by character (that is, in units of one hiragana), centered on the keyword "Kyoto", so that they overlap as much as possible.

Third, from the aligned hiragana, the template generation unit 12 selects those that were recognized many times. It may select hiragana whose count is high in absolute terms or hiragana whose count is high in relative terms. Whether a count is high in absolute terms is judged using a threshold. Whether it is high in relative terms is judged from the rank of the count (for example, whether it is the highest).

Fourth, as shown in FIG. 5(c), the template generation unit 12 joins the selected hiragana and the keyword "Kyoto" into a character string, producing the sentence 「京都にいった」 ("went to Kyoto", with the verb still in hiragana) as the correct sentence.

Fifth, as shown in FIG. 5(d), the template generation unit 12 assigns kanji to the correct sentence from the original recognition results, producing the sentence 「京都に行った」 ("went to Kyoto").

In step s33, the template generation unit 12 replaces "Kyoto" with the concept [place] and sets the sentence "went to [place]" as a new utterance template.

In step s34, the template generation unit 12 stores the new utterance template "went to [place]" in the "travel" template dictionary 181 of the template dictionary database 18.

In this first specific example, five different utterances by user 2 are merged into one sentence, so misrecognized parts can be removed. "Kiyomizu-dera", which was spoken only once, is also removed, leaving only the parts that user 2 said often. The speaking habits of user 2 can therefore be captured as well.
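The alignment and voting in the second to fourth steps can be illustrated with a deliberately simplified sketch. It assumes the sentences have already been converted to hiragana (apart from the keyword) and aligns characters purely by their offset from the keyword, which is a simpler scheme than the overlap-maximizing alignment described above. The recognition results are invented for illustration and are not the contents of FIG. 5; `merge_recognitions` and `min_count` are hypothetical names.

```python
from collections import Counter


def merge_recognitions(sentences, keyword, min_count):
    """Merge hiragana recognition results that all contain `keyword`.

    Characters are aligned by their offset from the keyword: suffixes are
    lined up from their first character, prefixes from their last. A
    character survives when it is the most frequent one at its position
    and occurs at least `min_count` times (the absolute-threshold variant
    of the third step).
    """
    prefixes, suffixes = [], []
    for s in sentences:
        before, _, after = s.partition(keyword)
        prefixes.append(before[::-1])  # reversed, so index 0 touches the keyword
        suffixes.append(after)

    def vote(parts):
        out = []
        for i in range(max(len(p) for p in parts)):
            counts = Counter(p[i] for p in parts if len(p) > i)
            char, count = counts.most_common(1)[0]
            if count < min_count:
                break  # stop at the first weakly supported position
            out.append(char)
        return "".join(out)

    return vote(prefixes)[::-1] + keyword + vote(suffixes)


# Invented recognition results; two of them contain a misrecognized character.
results = [
    "京都にいった",
    "京都にいったよ",
    "きのう京都にいった",
    "京都にいっあ",
    "京都にきった",
]
merged = merge_recognitions(results, "京都", min_count=3)
```

With these inputs the misrecognized characters and the parts said only once are voted out, and `merged` is 「京都にいった」. Assigning kanji back (the fifth step) is not covered by the sketch.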

A second specific example in which the spoken dialogue device 1 of this embodiment generates an utterance template is described with reference to FIG. 6.

FIG. 6 shows an example in which the spoken dialogue device 1 generates an utterance template from the recognition results of a user 2 who habitually ends sentences with 「〜じゃん」 ("-jan"). Because a verbal habit is spoken frequently, it is also likely to appear in the recognition results and, as shown in FIG. 6, is likely to remain in the utterance template. As shown in FIG. 6(a), the template generation unit 12 obtains four recognized sentences containing "Shinkansen" (新幹線). In FIG. 6(a), the text in parentheses on the right is what user 2 actually said, and the underlined parts of the sentences on the left are parts misrecognized by the speech recognition unit 13.

First, as shown in FIG. 6(b), the template generation unit 12 converts everything other than the keyword "Shinkansen" in each recognized sentence into hiragana.

Second, as shown in FIG. 6(b), the template generation unit 12 aligns the four sentences in units of one hiragana, centered on the keyword "Shinkansen", so that they overlap as much as possible.

Fourth, as shown in FIG. 6(c), the template generation unit 12 joins the selected hiragana (chosen in the same way as in the first specific example) and the keyword "Shinkansen" into a character string, producing the sentence 「新幹線ははやいじゃん」 ("the Shinkansen is fast, right?", still in hiragana) as the correct sentence.

Fifth, as shown in FIG. 6(d), the template generation unit 12 assigns kanji from the original recognition results, producing the sentence 「新幹線は早いじゃん」.

Sixth, the template generation unit 12 replaces the keyword "Shinkansen" with the concept [vehicle] and sets the sentence 「［乗り物］は早いじゃん」 ("[vehicle] is fast, right?") as a new utterance template.

Seventh, the template generation unit 12 stores the new utterance template 「［乗り物］は早いじゃん」 ("[vehicle] is fast, right?") in the "travel" template dictionary 181 of the template dictionary database 18.

According to the second specific example, because the spoken dialogue device 1 performs no linguistic analysis, even a verbal habit such as 「〜じゃん」 ("-jan"), which carries no linguistic meaning, can be stored as an utterance template.

If a large number of utterance templates are generated by this method and the spoken dialogue device 1 speaks using them, a spoken dialogue device 1 can be built that speaks in a manner similar to user 2.

According to this embodiment, the robot 3 incorporating the spoken dialogue device 1 can produce utterances with character rather than bland ones, and because it speaks much as user 2 does, user 2 can feel a sense of familiarity with it.

The spoken dialogue device 1 according to Embodiment 3 of the present invention is described with reference to FIGS. 7 and 8.

In Embodiment 2, an utterance template was generated from a plurality of recognized sentences. With that method, however, only one utterance template is generated from all the recognized sentences that contain a keyword stored in the same keyword dictionary 171. Moreover, user 2 does not necessarily utter similar sentences, so it is difficult to obtain a correct utterance template by aligning recognized sentences of entirely different utterances.

In this embodiment, therefore, among the recognized sentences that contain a keyword stored in the same keyword dictionary 171, only those with high mutual similarity are grouped together, and the alignment described in Embodiment 2 is performed only among the sentences in the same group to generate an utterance template.

A first specific example in which the spoken dialogue device 1 of this embodiment generates an utterance template is described with reference to the flowchart in FIG. 7.

In step s41, as in Embodiment 1, the template generation unit 12 searches each recognized sentence for keywords stored in the keyword dictionary 171. If a keyword is found, the process proceeds to step s42 (yes); if not, the process ends (no).

In step s42, the template generation unit 12 takes the recognized sentences that contain keywords stored in the keyword dictionary 171 and sorts the sentences containing the same keyword into groups of highly similar sentences. The grouping method is described in detail later.

In step s43, after the grouping, the template generation unit 12 generates an utterance template from the sentences in the same group. The subsequent processing is the same as in Embodiment 2 and is omitted.

Next, the method used by the template generation unit 12 in step s42 to group the sentences is described in detail.

The first grouping method uses the dialogue history with user 2. When the spoken dialogue device 1 makes the same utterance, user 2 is likely to respond with similar utterances. FIGS. 8(a) and 8(b) show example answers by user 2 when the spoken dialogue device 1 asks the same question more than once, and the method is explained using them.

In the example of FIG. 8(a), the spoken dialogue device 1 asks "When is a good time to go to Kyoto?" twice, and user 2 answers "Kyoto is recommended in autumn" and "Kyoto, let me see, autumn is recommended, I'd say", respectively.

In the example of FIG. 8(b), the spoken dialogue device 1 asks "How do you get to Kyoto?" twice, and user 2 answers "Let me see, I suppose you take the Shinkansen" and "You should take the Shinkansen", respectively.

Answers to the same question share many character strings, as the underlined parts of the sentences in FIG. 8 show, so their recognition results also share many character strings. Accordingly, when dialogues are held many times or over a long period, the answers user 2 gives when the spoken dialogue device 1 asks the same question are regarded as highly similar sentences, and the template generation unit 12 puts them in the same group. In the first grouping method, the identical questions asked by the spoken dialogue device 1 need not be consecutive in the dialogue; other exchanges may occur between the first question and the next.
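The first grouping method can be sketched as grouping each user reply under the system utterance that immediately preceded it. The sketch assumes the dialogue history is a chronological list of (speaker, text) pairs and that "the same question" means an identical string; both are assumptions, as are the speaker labels and the name `group_by_question`.

```python
from collections import defaultdict


def group_by_question(dialogue):
    """Group user replies by the system utterance that preceded them.

    dialogue: list of (speaker, text) pairs in chronological order, where
    speaker is "system" or "user" (hypothetical labels).
    """
    groups = defaultdict(list)
    last_question = None
    for speaker, text in dialogue:
        if speaker == "system":
            last_question = text
        elif last_question is not None:
            groups[last_question].append(text)
            last_question = None  # only the reply that directly follows counts
    # Keep only questions that were asked more than once.
    return {q: replies for q, replies in groups.items() if len(replies) > 1}


history = [
    ("system", "When is a good time to go to Kyoto?"),
    ("user", "Kyoto is recommended in autumn"),
    ("system", "How do you get to Kyoto?"),
    ("user", "Let me see, I suppose you take the Shinkansen"),
    ("system", "When is a good time to go to Kyoto?"),
    ("user", "Kyoto, let me see, autumn is recommended, I'd say"),
    ("system", "How do you get to Kyoto?"),
    ("user", "You should take the Shinkansen"),
]
reply_groups = group_by_question(history)
```

The two questions are interleaved here, which reflects the remark that identical questions need not be consecutive; each still ends up with its own group of two replies.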

The second grouping method is one in which the spoken dialogue device 1 asks again, using a question that makes user 2 repeat the same utterance. After user 2 says something, the spoken dialogue device 1 asks back with an utterance such as "Huh? What did you say?". User 2 is then likely to say something similar to the sentence just spoken. Because user 2 does not remember exactly what was said, the repeated sentence will not necessarily be identical, but it is likely to have similar content. The utterances of user 2 immediately before and after such an utterance by the spoken dialogue device 1 are therefore regarded as highly similar and put in the same group. To increase the number of sentences in the same group, the spoken dialogue device 1 may ask back deliberately. In the second grouping method, the asking back must follow consecutively in the dialogue.

The third grouping method uses the mutual similarity of the recognized sentences, with edit distance as the measure of similarity. The template generation unit 12 calculates the edit distance between recognized sentences that contain a keyword stored in the same keyword dictionary 171. It regards sentences with a small edit distance as highly similar and puts two sentences whose edit distance is at or below a threshold in the same group. "Edit distance" is a numerical value indicating how different two character strings are; for example, it is the minimum number of operations (insertions, deletions, or substitutions of characters) needed to transform one character string into the other.
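The edit distance described here is the Levenshtein distance, which can be computed with a standard dynamic-programming table. The sketch below pairs it with a greedy grouping rule; the source states only that two sentences within the threshold share a group, so the way larger groups are formed (`group_by_edit_distance`, a hypothetical name) is an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions that turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def group_by_edit_distance(sentences, threshold):
    """Greedy grouping: a sentence joins the first group that has a member
    within `threshold`; otherwise it starts a new group."""
    groups = []
    for s in sentences:
        for g in groups:
            if any(edit_distance(s, member) <= threshold for member in g):
                g.append(s)
                break
        else:
            groups.append([s])
    return groups


sentence_groups = group_by_edit_distance(
    ["きょうとにいった", "きょうとにいったよ", "しんかんせんははやい"], threshold=2
)
```

The first two sentences differ by one inserted character and fall into one group, while the third starts a group of its own. The result of this greedy rule can depend on input order, which a production implementation might want to avoid.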

According to this embodiment, generating an utterance template by associating highly similar utterance sentences with one another leaves more character strings in the utterance template, so that longer and more complex utterance sentences can be generated.

The voice interaction apparatus 1 according to a fourth embodiment of the present invention will be described with reference to FIGS. 9 and 10.

In this embodiment, a generated utterance template is not only stored as it is; sentences obtained by converting the utterance template according to fixed conversion rules are also stored as utterance templates.

An example of the first conversion rule is described below.

The first conversion rule changes the sentence type of the original utterance template to an affirmative sentence, a negative sentence, or an interrogative sentence. As an example, the rules for converting an affirmative sentence into an interrogative sentence are described. This conversion mainly requires changing the words at the end of the sentence. The following conversion rules depend on the type of the last one or two morphemes obtained when the utterance template is morphologically analyzed. A particle at the end of the sentence is ignored.

・Noun + auxiliary verb → noun + 「なの？」
・Noun → noun + 「なの？」
・Verb + auxiliary verb → base form of verb + 「の？」
・Verb → base form of verb + 「の？」
・Adjective + auxiliary verb → base form of adjective + 「の？」
・Adjective → base form of adjective + 「の？」

As an example, the following describes how the template generation unit 12 converts the utterance template 「［場所］は秋がお勧めだよ」 ("Autumn is recommended for [place]"), shown in FIG. 9(a), into an interrogative sentence. FIG. 9 shows the conversion process.

First, as shown in FIG. 9(b), the template generation unit 12 morphologically analyzes the utterance template. Considering only the morphemes other than particles, the end of this utterance template consists of "noun + auxiliary verb".

Next, as shown in FIG. 9(c), the template generation unit 12 applies the rule "noun + auxiliary verb → noun + 「なの？」" and converts 「お勧めだ」 into 「お勧めなの？」.

The template generation unit 12 then generates the new utterance template 「［場所］は秋がお勧めなの？」 ("Is autumn recommended for [place]?").
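The affirmative-to-interrogative rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the morphological analysis is taken as already done, and the `Morpheme` type, its part-of-speech labels (`"noun"`, `"verb"`, `"adjective"`, `"auxiliary"`, `"particle"`, `"slot"`), and the function name `to_question` are hypothetical. A real system would obtain the surface and base forms from a morphological analyzer.

```python
from typing import NamedTuple, Sequence


class Morpheme(NamedTuple):
    surface: str  # form as it appears in the template
    pos: str      # hypothetical part-of-speech label
    base: str     # dictionary (base) form


# Ending appended after the sentence-final content word, by part of speech.
QUESTION_ENDINGS = {"noun": "なの？", "verb": "の？", "adjective": "の？"}


def to_question(morphemes: Sequence[Morpheme]) -> str:
    """Convert an affirmative template into a question using the end-of-sentence rules."""
    ms = list(morphemes)
    while ms and ms[-1].pos == "particle":  # sentence-final particles are ignored
        ms.pop()
    if ms and ms[-1].pos == "auxiliary":    # "X + auxiliary verb" is treated like "X"
        ms.pop()
    if not ms or ms[-1].pos not in QUESTION_ENDINGS:
        raise ValueError("no conversion rule matches the end of this template")
    head = ms.pop()
    # Nouns keep their surface form; verbs and adjectives use the base form.
    stem = head.surface if head.pos == "noun" else head.base
    return "".join(m.surface for m in ms) + stem + QUESTION_ENDINGS[head.pos]


template = [
    Morpheme("［場所］", "slot", "［場所］"),
    Morpheme("は", "particle", "は"),
    Morpheme("秋", "noun", "秋"),
    Morpheme("が", "particle", "が"),
    Morpheme("お勧め", "noun", "お勧め"),
    Morpheme("だ", "auxiliary", "だ"),
    Morpheme("よ", "particle", "よ"),
]
question = to_question(template)
```

With the example template, the trailing 「よ」 is dropped, 「お勧め＋だ」 matches "noun + auxiliary verb", and the result is 「［場所］は秋がお勧めなの？」.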

An example of the second conversion rule is described below.

The second conversion rule changes the tense of the utterance template. The following rules convert the present tense into the past tense.

・Noun → noun + 「だった」
・Noun + auxiliary verb → noun + 「だった」
・Verb + auxiliary verb → continuative form of verb + 「た」
・Verb → continuative form of verb + 「た」
・Adjective + auxiliary verb → continuative form of adjective + 「た」
・Adjective → continuative form of adjective + 「た」

As shown in FIG. 9(d), the template generation unit 12 applies the rule "noun + auxiliary verb → noun + 「だった」" to the utterance template 「［場所］は秋がお勧めだよ」 and converts it into 「［場所］は秋がお勧めだった」 ("Autumn was recommended for [place]").
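The present-to-past rules can be sketched in the same spirit. In this Python illustration the `Morpheme` type, its labels, the `continuative` field, and the function name `to_past` are hypothetical. In particular, the sketch assumes that the morphological analyzer supplies, for each verb or adjective, the continuative form exactly as it appears before 「た」; deriving that form is outside the scope of the example.

```python
from typing import NamedTuple, Sequence


class Morpheme(NamedTuple):
    surface: str       # form as it appears in the template
    pos: str           # hypothetical part-of-speech label
    continuative: str  # form used directly before 「た」 (assumed analyzer output)


def to_past(morphemes: Sequence[Morpheme]) -> str:
    """Convert a present-tense template into the past tense by its sentence ending."""
    ms = list(morphemes)
    while ms and ms[-1].pos == "particle":  # sentence-final particles are ignored
        ms.pop()
    if ms and ms[-1].pos == "auxiliary":    # "X + auxiliary verb" is treated like "X"
        ms.pop()
    if not ms:
        raise ValueError("template has no content word at its end")
    head = ms.pop()
    prefix = "".join(m.surface for m in ms)
    if head.pos == "noun":
        return prefix + head.surface + "だった"
    if head.pos in ("verb", "adjective"):
        return prefix + head.continuative + "た"
    raise ValueError("no conversion rule matches the end of this template")


template = [
    Morpheme("［場所］", "slot", "［場所］"),
    Morpheme("は", "particle", "は"),
    Morpheme("秋", "noun", "秋"),
    Morpheme("が", "particle", "が"),
    Morpheme("お勧め", "noun", "お勧め"),
    Morpheme("だ", "auxiliary", "だ"),
    Morpheme("よ", "particle", "よ"),
]
past = to_past(template)
```

For the example template the noun rule applies, giving 「［場所］は秋がお勧めだった」.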

An example of the third conversion rule is described below.

The third conversion rule changes some of the words in the utterance template.

The voice interaction apparatus 1 includes a concept dictionary 19, in which words are classified by concept. Words belonging to the same concept are basically used in the same way within a sentence.

For example, 「ときどき」 ("sometimes") and 「よく」 ("often") both have the concept of temporal frequency. As in 「図書館にときどき行く」 ("I sometimes go to the library") and 「図書館によく行く」 ("I often go to the library"), swapping only that word within the same sentence still yields a valid Japanese sentence.

First, the template generation unit 12 morphologically analyzes the utterance template, looks up each morpheme in the concept dictionary 19, and finds the concept of each morpheme. Next, the template generation unit 12 also uses, as new utterance templates, the sentences obtained by replacing a morpheme with another word of the same concept.

As an example, the conversion of the utterance template 「［場所］は秋がお勧めだよ」, shown in FIG. 10(a), is described.

As shown in FIG. 10(b), the template generation unit 12 performs morphological analysis and looks up each morpheme in the concept dictionary 19, finding that the word 「秋」 ("autumn") has the concept 「季節」 ("season").

The template generation unit 12 then extracts other words having the same concept as 「秋」, as shown in FIG. 10(b), and generates utterance templates in which 「秋」 is replaced with the extracted words, as shown in FIG. 10(d), such as 「［場所］は春がお勧めだよ」 (spring), 「［場所］は早春がお勧めだよ」 (early spring), and 「［場所］は梅雨時がお勧めだよ」 (rainy season).
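The concept-based substitution can be sketched in Python as follows. The dictionary contents, the name `CONCEPT_DICTIONARY`, and the function `expand_template` are illustrative assumptions; the patent does not specify the layout of the concept dictionary 19. The template is assumed to be already split into morphemes, and only one morpheme is replaced per generated template.

```python
from typing import Dict, List, Sequence

# Hypothetical concept dictionary: concept name -> words sharing that concept.
CONCEPT_DICTIONARY: Dict[str, List[str]] = {
    "季節": ["秋", "春", "早春", "梅雨時"],
    "頻度": ["ときどき", "よく"],
}


def expand_template(morphemes: Sequence[str],
                    concept_dictionary: Dict[str, List[str]]) -> List[str]:
    """Generate new templates by swapping one morpheme for a same-concept word."""
    ms = list(morphemes)
    # Reverse index: word -> concept it belongs to.
    word_to_concept = {word: concept
                       for concept, words in concept_dictionary.items()
                       for word in words}
    variants = []
    for i, word in enumerate(ms):
        concept = word_to_concept.get(word)
        if concept is None:
            continue  # morpheme has no entry in the concept dictionary
        for alternative in concept_dictionary[concept]:
            if alternative != word:
                variants.append("".join(ms[:i] + [alternative] + ms[i + 1:]))
    return variants


template = ["［場所］", "は", "秋", "が", "お勧め", "だ", "よ"]
new_templates = expand_template(template, CONCEPT_DICTIONARY)
```

Only 「秋」 has an entry in this sample dictionary, so the sketch yields the three templates with 「春」, 「早春」, and 「梅雨時」 in its place; the slot 「［場所］」 is left untouched because it is not a dictionary word.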

Conversions that combine several of the above examples are also possible.

Modifications

Although several embodiments of the present invention have been described, these embodiments are presented by way of example and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the spirit of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and its equivalents.

DESCRIPTION OF SYMBOLS: 1 ... voice interaction apparatus, 11 ... utterance generation unit, 12 ... template generation unit, 13 ... voice recognition unit, 14 ... voice synthesis unit, 171 ... keyword dictionary, 181 ... template dictionary

Claims

A voice interaction apparatus that performs voice interaction with a user using utterance templates stored in a template dictionary, the apparatus comprising:
a keyword dictionary storage unit that stores a plurality of keywords related to a specific topic and information indicating a concept corresponding to each keyword;
a voice recognition unit that recognizes speech of the user; and
a template generation unit that, when a keyword stored in the keyword dictionary is included in an utterance sentence recognized by the voice recognition unit, replaces the keyword with the information indicating the concept corresponding to the keyword, and stores the utterance sentence after the replacement in the template dictionary as a new utterance template.

The template generation unit
divides, for the plurality of speech-recognized utterance sentences, the parts other than the keywords into unit characters,
associates the unit characters of the utterance sentences with one another, using the same keyword as a reference, and
generates the utterance template from a character string connecting those of the associated unit characters whose number of speech recognitions is absolutely or relatively large;
The voice interaction apparatus according to claim 1.

An utterance generation unit for outputting utterances to the user;
The utterance generation unit outputs the same utterance multiple times,
The template generation unit generates the utterance template using each utterance sentence answered by the user with respect to the plurality of the same utterances.
The spoken dialogue apparatus according to claim 2.

An utterance generation unit for outputting utterances to the user;
The utterance generation unit outputs a question sentence such that the user repeats the same utterance sentence as an answer multiple times,
The template generation unit generates the utterance template using each utterance sentence answered by the user with respect to the question sentence a plurality of times.
The spoken dialogue apparatus according to claim 3.

The template generation unit
Among the utterances recognized by the voice recognition unit, the utterance template is generated using only utterances whose similarity is higher than a threshold.
The spoken dialogue apparatus according to claim 1.

The template generation unit
Perform morphological analysis on the utterance template,
Based on the result of the morphological analysis, the original utterance template is changed to generate a new utterance template.
The spoken dialogue apparatus according to claim 2.

The template generation unit
Based on the result of the morphological analysis, changes the sentence type of the utterance template to an affirmative sentence, a negative sentence, or an interrogative sentence, or changes the tense of the utterance template, to generate a new utterance template.
The voice interaction apparatus according to claim 6.

The template generation unit
Based on the result of the morphological analysis, replaces a word other than the keyword in the original utterance template with another word having the same concept as that word, to generate a new utterance template.
The voice interaction apparatus according to claim 6.