JP2003141113A

JP2003141113A - Translating device, voice translating method and program

Info

Publication number: JP2003141113A
Application number: JP2001334721A
Authority: JP
Inventors: Shigeru Kafuku; 滋加福; Koichi Nakagome; 浩一中込
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2001-10-31
Filing date: 2001-10-31
Publication date: 2003-05-16

Abstract

PROBLEM TO BE SOLVED: To realize accurate, high-speed translation processing by presenting example candidates derived from a speech recognition result. SOLUTION: The input voice is speech-recognized, and the word with the highest likelihood among words prepared in advance is extracted as a candidate. Examples are prepared for each scene, and related words are associated with each example. The extracted word is compared with the words of each example, each example is scored according to the degree of matching, and the examples with high scores are presented as candidates. A suitable example is selected from the presented candidates, and the parallel translation of the selected example is output.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a translation device and the like capable of outputting an accurate parallel translation at high speed.

[0002]

[Description of the Related Art] Portable conversation translation devices have conventionally been used to support conversations with local people during overseas travel and on similar occasions. Speech translation in this type of conversation translator recognizes a natural-language spoken sentence uttered by the speaker using an inference technique, converts the recognition result into a character string, and selects a corresponding character string in another language. However, it is nearly impossible to cover exhaustively the various linguistic phenomena that arise in communication between people, and erroneous processing could therefore occur when an uncovered linguistic phenomenon was uttered.

[0003] To address this problem, translators of the following kind have recently appeared. A large number of pairs of source-language examples and their parallel translations are stored in an example database. When a source sentence is input by voice in a predetermined language, the voice sentence is speech-recognized, keywords are extracted from the recognized conversational sentence, and examples containing the extracted words are retrieved from the many examples in the database. Of the retrieved examples, the one with the highest degree of word matching is selected, and the source sentence is translated according to the parallel translation of that example.

[0004]

[Problems to be Solved by the Invention] With the conventional example-based method, a user can convey his or her intention to people whose language the user does not understand, but no consideration was given to processing the other party's reply to that intention. As a result, the user sometimes could not understand or catch the other party's question, and communication with the other party did not proceed smoothly.

[0005] The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a translation device, a speech translation method, and a program that can output translated sentence examples more accurately.

[0006]

[Means for Solving the Problems] To achieve the above object, a translation device according to a first aspect of the present invention is a translation device that translates between a pair of different languages among a plurality of natural languages, comprising: language setting means for setting one of the pair of languages; voice recognition means for acquiring a voice sentence in the language set by the language setting means and performing voice recognition on it; word recognition means for recognizing a word included in the acquired voice sentence based on the voice recognition result of the voice recognition means; sentence example storage means for storing, for each language, a plurality of sentence examples, at least one word related to each sentence example, and a parallel translation sentence example for each sentence example, in association with one another; sentence example candidate selection means for selecting a plurality of corresponding sentence examples based on the degree of matching between the word recognized by the word recognition means and the words stored in the sentence example storage means; sentence example candidate output means for outputting the sentence examples selected by the sentence example candidate selection means so that the user can select among them; parallel translation sentence example output means for acquiring from the sentence example storage means, and outputting, the parallel translation sentence example corresponding to the sentence example that the user selected from among those output by the sentence example candidate output means; and target language switching means for switching the language subject to voice recognition by the voice recognition means from the language set by the language setting means to the other language.

[0007] With this configuration, a voice sentence input in a first language (for example, Japanese) is speech-recognized, words are extracted from it, and the extracted words are compared with the words associated with sentence examples prepared in advance. As a result of the comparison, a predetermined number of sentence examples with a high degree of word matching are selected, and the speaker chooses the most suitable one among them. The parallel translation of the selected sentence example (in a second language, for example, English) can then be presented to the other party by outputting it to an output device such as a display device or a voice synthesizer. The parallel translation of what the speaker wants to convey can therefore be output accurately. In addition, because the target language can be switched to the other language, for example after the parallel translation has been output, the device can also translate the respondent's reply, and translation can proceed without interrupting the conversation.
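The selection flow described in paragraph [0007] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Example` class, the `select_candidates` function, the overlap-count score, and the romanized sample data are all assumptions introduced here, since the text specifies only that candidates are chosen by the degree of word matching.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Example:
    text: str             # sentence example in the first language (romanized here)
    keywords: frozenset   # words associated with the sentence example
    translation: str      # parallel translation in the second language

def select_candidates(recognized_words, examples, top_n=3):
    """Rank examples by how many recognized words match their keywords."""
    recognized = set(recognized_words)
    scored = [(len(recognized & ex.keywords), ex) for ex in examples]
    # Keep only examples with at least one matching word, best score first.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in scored[:top_n]]

# Hypothetical sample data, for illustration only.
EXAMPLES = [
    Example("eki wa doko desu ka", frozenset({"eki", "doko"}), "Where is the station?"),
    Example("toire wa doko desu ka", frozenset({"toire", "doko"}), "Where is the restroom?"),
    Example("kore wa ikura desu ka", frozenset({"kore", "ikura"}), "How much is this?"),
]

candidates = select_candidates(["eki", "doko"], EXAMPLES)
chosen = candidates[0]      # stands in for the user's selection among the candidates
print(chosen.translation)   # the parallel translation shown to the other party
```

In a real device the final choice would come from the user rather than from the top-ranked candidate; the sketch takes the first candidate only to keep the example self-contained.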

[0008] In the above translation device, the sentence example storage means preferably stores the sentence examples and parallel translation sentence examples classified by scene. In this case, the voice recognition means may include scene designation means for designating in advance the scene in which the voice sentence to be recognized is used, and the sentence example candidate selection means may select candidates from the sentence examples corresponding to the scene designated by the scene designation means.

[0009] In this case, the translation device preferably further comprises response sentence example storage means that stores in advance, for each scene, response sentence examples for the parallel translation sentence examples output by the parallel translation sentence example output means, and response sentence example acquisition means that, triggered by the scene designation made through the scene designation means, acquires the response sentence examples corresponding to that scene from the response sentence example storage means. In this case, after the target language has been switched by the target language switching means, the sentence example candidate selection means may select sentence example candidates from the response sentence examples acquired by the response sentence example acquisition means.

[0010] With the above configuration, the sentence examples are classified by scene in advance, and the speaker selects a scene beforehand. Because candidate sentence examples are selected from the sentence examples corresponding to the selected scene, the sentence example the speaker intends can be selected more accurately.
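The effect of scene classification can be illustrated with a short sketch. The dictionary layout, the scene names, and the function `candidates_for_scene` are hypothetical; the text states only that candidates are drawn from the sentence examples belonging to the scene chosen beforehand.

```python
# Hypothetical scene-classified store: scene -> list of (sentence example, keywords).
EXAMPLES_BY_SCENE = {
    "airport": [
        ("Where is the check-in counter?", {"check-in", "counter"}),
        ("Where is the baggage claim?", {"baggage", "claim"}),
    ],
    "restaurant": [
        ("May I see the menu?", {"menu"}),
        ("Is a counter seat available?", {"counter", "seat"}),
    ],
}

def candidates_for_scene(scene, recognized_words):
    """Return the sentence examples of the given scene that share a word with the input."""
    recognized = set(recognized_words)
    pool = EXAMPLES_BY_SCENE.get(scene, [])   # search is restricted to the chosen scene
    return [text for text, keywords in pool if recognized & keywords]

print(candidates_for_scene("restaurant", ["counter"]))
print(candidates_for_scene("airport", ["counter"]))
```

The same recognized word ("counter") yields a different candidate depending on the scene, which is the disambiguation benefit the paragraph describes.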

[0011] In addition, when a scene is selected, the other party's likely responses to the presented parallel translation are predicted and held in memory or the like in advance, so the processing performed when the other party selects a response sentence example can be carried out at high speed. Because the response sentence examples are also classified by scene, the sentence example the respondent intends can be selected accurately.

[0012] In the above translation device, the voice recognition means preferably includes a model storage unit that stores in advance, for the plurality of languages, a phoneme model in which each phoneme contained in speech is modeled and a language model in which phoneme pattern sequence information for a plurality of kinds of words is registered. In this case, the voice recognition means recognizes the input voice sentence by referring to the phoneme model and the language model stored in the model storage unit. The word recognition means includes connection rule storage means that stores in advance information indicating an inter-group connection rule, which defines a connection relationship in which a group indicating a silent state and a group containing words not registered in the language model are placed before and after a group containing the words to be recognized. Based on the inter-group connection rule stored in the connection rule storage means, the word recognition means obtains the maximum likelihood for each group from the voice sentence recognized by the voice recognition means, and extracts the word to be recognized based on the obtained maximum likelihoods.

[0013] With this configuration, silent portions before and after the spoken voice sentence, and words that carry no meaning by themselves, such as particles, can be distinguished so that only the necessary words are extracted. Sentence example candidates can therefore be selected more accurately.

[0014] The above translation device may further comprise position information acquisition means for acquiring position information on the current position. In this case, the language setting means may set one of the pair of languages based on the current position information acquired by the position information acquisition means.

[0015] With this configuration, the translation device is equipped with, for example, a receiver compatible with GPS (Global Positioning System), acquires its current position information, and sets the target language based on that information. For example, the user's own language is set by default, and the language used by the other party is determined from the current position and set automatically. As a result, even where the language used differs by region within the same country, an appropriate language can be set without the user having to be aware of it.
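A position-based language setting of the kind described in paragraph [0015] could be sketched as a lookup from coordinates to a language. The region table below uses abstract placeholder boxes and language names; it is not real geographic data, and `language_for_position` is a name assumed for this illustration.

```python
# Placeholder regions: (name, min_lat, max_lat, min_lon, max_lon, language).
# These boxes are illustrative only and do not describe real places.
REGIONS = [
    ("region_a", 0.0, 10.0, 0.0, 10.0, "language_a"),
    ("region_b", 10.0, 20.0, 0.0, 10.0, "language_b"),
]

def language_for_position(lat, lon, default=None):
    """Return the language of the first region containing the position, else the default."""
    for _name, lat0, lat1, lon0, lon1, language in REGIONS:
        if lat0 <= lat < lat1 and lon0 <= lon < lon1:
            return language
    return default

# The user's own language stays at its default; only the other party's language is looked up.
partner_language = language_for_position(15.0, 5.0)
print(partner_language)
```

Two regions of the same country can map to different languages simply by giving them separate rows, which corresponds to the regional case mentioned in the text.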

[0016] In the above translation device, the voice recognition means may include spoken language determination means that determines the language of an acquired voice sentence by recognizing the voice sentence with reference to all of the phoneme models and language models. In this case, the language setting means may output a message, in a plurality of languages, prompting the user to speak a predetermined phrase in his or her native language, and may set the language that the spoken language determination means determines for the phrase spoken in response to the message.

[0017] With this configuration, for example, when the translation device is started, a message prompting the user to speak a predetermined phrase (for example, a greeting) in his or her native language is shown in a plurality of languages on a display device or the like, and the language spoken is determined by speech-recognizing the phrase uttered in response to the message. The set language can therefore be changed as appropriate for each use.

[0018] The translation device is preferably configured as a portable terminal device.

[0019] With this configuration, the translation device according to the present invention can be carried, for example, on overseas trips and used in situations that require communication in another language.

[0020] In the above translation device, the target language switching means may switch the language subject to voice recognition to the other language, triggered by the output of a parallel translation sentence example by the parallel translation sentence example output means.

[0021] With this configuration, the target language can be switched automatically to the other party's language each time a parallel translation is output.
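The output-triggered switching of paragraphs [0020] and [0021] amounts to a two-state toggle. The class and method names below (`LanguageSwitcher`, `on_translation_output`) are assumptions made for this sketch.

```python
class LanguageSwitcher:
    """Tracks which of two languages is currently the speech recognition target."""

    def __init__(self, first_language, second_language):
        self.pair = (first_language, second_language)
        self.current = first_language   # recognition starts with the user's language

    def on_translation_output(self):
        """Called whenever a parallel translation is output; flips the target language."""
        first, second = self.pair
        self.current = second if self.current == first else first
        return self.current

switcher = LanguageSwitcher("ja", "en")
print(switcher.current)                  # language recognized for the user's utterance
print(switcher.on_translation_output())  # after output, recognition targets the reply
print(switcher.on_translation_output())  # after the reply is translated, back again
```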

[0022] When the translation device is configured as a portable terminal device, it may further comprise speaker determination means for determining which party is speaking. In this case, the target language switching means may switch the target language based on the determination result of the speaker determination means.

[0023] In this case, the speaker determination means may include at least two voice collecting means (for example, directional microphones) that collect the speaker's voice, supply a voice signal to the voice recognition unit, and can identify the input direction of the voice. The speaker determination means determines which party is speaking based on the difference between the input directions identified by the voice collecting means, and the target language switching means may switch the target language based on the utterance timing of the speaker determined by the speaker determination means.

[0024] Alternatively, the translation device may further comprise device direction identifying means that identifies the direction of the translation device when the sentence example candidate output means outputs sentence example candidates and when the parallel translation sentence example output means outputs a parallel translation sentence example, and direction change detecting means that detects a change in the direction of the translation device identified by the device direction identifying means. In this case, the target language switching means may switch the target language based on the detection result of the direction change detecting means.

[0025] Further, in this case, the translation device may further comprise display means and display control means that causes the display means to display the output from the sentence example candidate output means and the parallel translation sentence example output means. In this case, when the target language switching means has switched the target language based on the determination result of the output target determining means, the display control means may change the display direction according to the determined output target.

[0026] With the above configuration, when a conversation proceeds with the translation device being handed back and forth between two parties, the target language can be switched automatically according to which party is speaking. Suppose that the voice collecting device (for example, a microphone) used to input voice to the voice recognition unit of the translation device is directional, and that at least one such device is provided on each of the front and back surfaces of the translation device. When the two parties are conversing face to face, it is then possible to determine from the input direction of the voice which of the two is speaking. The target language can therefore be switched automatically according to which party is providing the voice input for recognition.
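One simple way to realize the front and back microphone arrangement of paragraph [0026] is to compare the signal power captured on each side. The text does not specify how the input direction is identified, so the power comparison, the function names, and the assumption that the front microphone faces the device's owner are all illustrative choices.

```python
def mean_power(samples):
    """Mean squared amplitude of a block of samples (0.0 for an empty block)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0

def detect_speaker(front_samples, back_samples):
    """Return 'user' if the front microphone is at least as loud, else 'partner'.

    Assumes the front microphone faces the user and the back one faces the partner.
    """
    if mean_power(front_samples) >= mean_power(back_samples):
        return "user"
    return "partner"

def target_language(speaker, user_language="ja", partner_language="en"):
    """Pick the speech recognition language according to who is speaking."""
    return user_language if speaker == "user" else partner_language

speaker = detect_speaker([0.01, -0.02, 0.01], [0.4, -0.5, 0.3])
print(speaker, target_language(speaker))
```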

[0027] Alternatively, the target language can be switched automatically based on the content of the output information and the direction of the device. That is, if the person viewing the sentence example candidates is one of the two parties, the person viewing the parallel translation can be regarded as the other. Therefore, if the direction of the translation device is identified both when sentence example candidates are output and when a parallel translation sentence example is output, a handover of the device can be detected by detecting a change in the device direction. If the target language is switched automatically in response to this handover, translation can proceed step by step as the conversation progresses. Further, in this case, by flipping the display screen, for example upside down, according to which party the information is being presented to (that is, whether the output is a sentence example candidate or a parallel translation sentence example), there is no need to flip or turn the device itself when handing it over.
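The handover detection in paragraph [0027] can be sketched as a comparison of two heading readings, one taken when the candidates are output and one when the parallel translation is output. The use of a heading in degrees and the 90-degree threshold are assumptions for illustration; the text does not say how the device direction is measured.

```python
def handover_detected(direction_at_candidates_deg, direction_at_translation_deg,
                      min_change_deg=90.0):
    """Return True if the device heading changed by at least min_change_deg.

    The two headings are compared on a circle, so 350 and 10 degrees differ by 20.
    """
    diff = abs(direction_at_translation_deg - direction_at_candidates_deg) % 360.0
    diff = min(diff, 360.0 - diff)
    return diff >= min_change_deg

def display_rotation(output_kind):
    """Rotate the screen for the partner when showing a parallel translation."""
    return 180 if output_kind == "translation" else 0

print(handover_detected(0.0, 180.0))    # device turned toward the other party
print(handover_detected(350.0, 10.0))   # small change, treated as no handover
print(display_rotation("translation"))
```

Rotating the display by 180 degrees for the translation output reflects the alternative mentioned at the end of the paragraph, where the screen is inverted so that the device itself need not be turned.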

[0028] To achieve the above object, a speech translation method according to a second aspect of the present invention comprises: a language designating step of designating one of a pair of different languages according to the speaker's utterance situation; a voice recognition step of acquiring a voice sentence in the language designated in the language designating step and performing voice recognition on it; a word recognition step of recognizing a word included in the acquired voice sentence based on the voice recognition result of the voice recognition step; a sentence example accumulating step of accumulating in advance, for each language, a plurality of sentence examples, at least one word related to each sentence example, and a parallel translation sentence example for each sentence example, in association with one another; a sentence example candidate selecting step of selecting a plurality of corresponding sentence examples based on the degree of matching between the word recognized in the word recognition step and the words accumulated in the sentence example accumulating step; a sentence example candidate output step of outputting the sentence examples selected in the sentence example candidate selecting step so that they can be selected; and a parallel translation sentence example output step of acquiring and outputting the parallel translation sentence example corresponding to the sentence example selected from among those output in the sentence example candidate output step.

[0029] To achieve the above object, a program according to a third aspect of the present invention causes a computer to function as: a voice sentence acquisition unit that acquires a voice sentence; a language designating unit that designates a pair of different languages among a plurality of natural languages; a voice recognition unit that recognizes the voice sentence acquired by the voice sentence acquisition unit based on one of the languages designated by the language designating unit; a word recognition unit that recognizes a word included in the voice sentence based on the voice recognition result of the voice recognition unit; a sentence example accumulating unit that accumulates, for each language, a plurality of sentence examples, at least one word related to each sentence example, and a parallel translation sentence example for each sentence example, in association with one another; a sentence example candidate selecting unit that selects a plurality of corresponding sentence examples based on the degree of matching between the word recognized by the word recognition unit and the words accumulated in the sentence example accumulating unit; a sentence example candidate output unit that outputs the plurality of sentence examples selected by the sentence example candidate selecting unit so that they can be selected; a parallel translation sentence example output unit that outputs the parallel translation sentence example corresponding to the sentence example selected from among those output by the sentence example candidate output unit; and a target language switching unit that, triggered by the output of the parallel translation sentence example by the parallel translation sentence example output unit, makes the language subject to voice recognition by the voice recognition unit switchable to the other language.

[0030]

[Embodiments of the Invention] Embodiments of the present invention will be described below with reference to the accompanying drawings.

[0031] FIG. 1 shows the external appearance of a translation device 100 according to an embodiment of the present invention. As illustrated, the translation device 100 is a portable device whose external configuration includes a voice input unit 110, an input unit 200, and an output unit 300.

[0032] The voice input unit 110 is composed of, for example, a microphone, and is used to input the voice sentence on which the voice recognition described later is performed.

[0033] The input unit 200 is composed of, for example, a group of buttons, and is used to perform the various selection operations described later.

[0034] The output unit 300 is composed of, for example, a liquid crystal display device, and displays the candidate sentence examples, parallel translation sentence examples, and the like described later.

[0035] Next, the internal configuration of the translation device 100 will be described with reference to FIG. 2. FIG. 2 is a block diagram schematically showing the internal configuration of the translation device 100.

[0036] As illustrated, in addition to the components shown in FIG. 1, the translation device 100 includes a voice recognition unit 120, a grammar file 130, a model storage unit 140, a likelihood calculation unit 150, a word output unit 160, an example management unit 170, an example storage unit 180, and a work area 190.

[0037] More specifically, the input unit 200 includes a language designating unit 210, a scene designating unit 220, and an example designating unit 230. The language designating unit 210 designates the target language. The scene designating unit 220 designates, from among scenes prepared in advance, the scene in which the phrase to be translated is used. The example designating unit 230 is used to select and designate the example the user intends from among the example candidates extracted by the translation device 100.

[0038] The voice recognition unit 120 includes an AD conversion unit 121, a power analysis unit 122, a voice section detection unit 123, and a feature extraction unit 124.

[0039] The AD conversion unit 121 performs A/D conversion on the voice (analog signal) input to the voice input unit 110, converting it into a digital voice signal (for example, a PCM signal) that represents the voice as a time series.

[0040] The power analysis unit 122 divides the digital voice signal produced by the AD conversion unit 121 into a plurality of frames at predetermined time intervals (2.0 to 4.0 milliseconds) using a time window such as a Hamming window, cuts out the voice data from each frame, and obtains its power component.
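The framing and power computation of paragraph [0040] can be sketched with the standard library alone. The function names, the mean-square definition of the power component, and the frame length and hop given in samples are assumptions for this illustration; the text specifies only the frame interval and the use of a window such as a Hamming window.

```python
import math

def hamming(n):
    """Hamming window of length n: 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def frame_powers(samples, frame_len, hop):
    """Split samples into windowed frames and return the mean-square power of each."""
    window = hamming(frame_len)
    powers = []
    # Only complete frames are used; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        powers.append(sum((s * w) ** 2 for s, w in zip(frame, window)) / frame_len)
    return powers

# Example: a short burst surrounded by silence, 8-sample frames advanced by 4 samples.
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 8
print(frame_powers(signal, frame_len=8, hop=4))
```

At a hypothetical sampling rate of 8 kHz, a 2.0 to 4.0 millisecond interval would correspond to 16 to 32 samples; the tiny frame length above is chosen only to keep the printed output short.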

[0041] The voice section detection unit 123 detects, as a voice section, frames in which the power component obtained by the power analysis unit 122 exceeds a predetermined threshold. For example, frames in which the power component falls back below the threshold within one second are not treated as a voice section.
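Threshold-based voice section detection as in paragraph [0041] might look like the sketch below. Reading the one-second rule as a minimum duration for a run of above-threshold frames is an interpretation made here, and `detect_voice_sections` and its parameters are hypothetical names.

```python
def detect_voice_sections(powers, threshold, min_frames):
    """Return (start, end) frame index pairs for runs of frames above the threshold.

    Runs shorter than min_frames are discarded; end is exclusive.
    """
    sections = []
    start = None
    for i, power in enumerate(powers):
        if power > threshold:
            if start is None:
                start = i                      # a new run begins
        elif start is not None:
            if i - start >= min_frames:        # keep only sufficiently long runs
                sections.append((start, i))
            start = None
    # Handle a run that lasts until the final frame.
    if start is not None and len(powers) - start >= min_frames:
        sections.append((start, len(powers)))
    return sections

print(detect_voice_sections([0, 5, 0, 5, 5, 5, 0], threshold=1, min_frames=2))
```

The isolated loud frame at index 1 is rejected as too short, while the three-frame run starting at index 3 is kept.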

[0042] For each voice section detected by the voice section detection unit 123, the feature extraction unit 124 calculates an acoustic feature vector X(t) based on the power components calculated by the power analysis unit 122.

[0043] The grammar file 130 is a file that defines the grammar rules for each language supported by the translation device 100, and is referred to in the word recognition processing described later. In this embodiment, for ease of explanation, the number of languages the translation device 100 supports is two, referred to as the "first language" and the "second language". This embodiment describes an example in which the first language is Japanese and the second language is English.

[0044] The model storage unit 140 includes a first-language acoustic model storage unit 141, a first-language language model storage unit 142, a second-language acoustic model storage unit 143, and a second-language language model storage unit 144. The acoustic model storage units 141 and 143 store, for their respective languages, phoneme models that model all phonemes making up the speech to be recognized. A hidden Markov model (HMM), for example, is used as the phoneme model. The language model storage units 142 and 144 store, for their respective languages, a word dictionary in which phoneme pattern sequence information is registered for each word.

【００４５】尤度計算部１５０は、音響特徴ベクトルＸ
（ｔ）とＨＭＭによる音素モデルとを参照してフレーム
毎の連続音素認識を行い、各フレームの尤度の合計が最
大となるものを暫定的に第１位候補単語として抽出す
る。The likelihood calculation unit 150 performs continuous phoneme recognition frame by frame with reference to the acoustic feature vector X(t) and the HMM phoneme models, and provisionally extracts, as the first-rank candidate word, the one for which the total of the per-frame likelihoods is largest.

【００４６】単語出力部１６０は、尤度計算部１５０が
抽出した単語を出力する。The word output unit 160 outputs the word extracted by the likelihood calculation unit 150.

【００４７】用例管理部１７０は、場面名レジスタ１７
１、用例群名レジスタ１７２、および予測応答用例管理
テーブル１７３を備える。場面名レジスタ１７１は、場
面指定部２２０で指定された場面名を保持する。用例群
名レジスタ１７２は、後述する用例候補抽出処理におい
て対象となる用例群ファイル名を保持する。予測応答用
例管理テーブル１７３は、それぞれの言語について、選
出される用例候補に基づいて、その用例に対する応答を
示す用例を指定するテーブルである。すなわち、図３に
示すように、第１言語の用例群ファイルと、第２言語の
応答用例群ファイル名とが対応付けられて記録されてい
る。また、第２言語の用例群ファイルと、第１言語の応
答用例群ファイル名とを対応させた同様のテーブルも用
意されているものとする。The example management unit 170 includes a scene name register 171, an example group name register 172, and a predicted response example management table 173. The scene name register 171 holds the scene name designated by the scene designation unit 220. The example group name register 172 holds the name of the example group file to be processed in the example candidate extraction process described later. The predicted response example management table 173 is a table that, for each language, designates the examples representing responses to an example, based on the selected example candidate. That is, as shown in FIG. 3, the example group files in the first language and the response example group file names in the second language are recorded in association with each other. A similar table that associates the example group files in the second language with the response example group file names in the first language is also assumed to be prepared.

【００４８】用例格納部１８０は、用例データベース
（第１言語）１８１、予測応答用例データベース（第２
言語）１８２、用例データベース（第２言語）１８３、
および、予測応答用例データベース（第１言語）１８４
を備える。The example storage unit 180 includes an example database (first language) 181, a predicted response example database (second language) 182, an example database (second language) 183, and a predicted response example database (first language) 184.

【００４９】用例データベース１８１および１８３は、
それぞれの言語について、図４に示すような階層化した
ディレクトリ構造で用例データを格納している。この階
層構造は、翻訳を必要とする場面に基づいて構成されて
いる。本実施の形態では、第１階層〜第３階層までを場
面設定階層とし、最下層である第４階層に用例群ファイ
ルを格納するものとする。The example databases 181 and 183 store, for each language, example data in a hierarchical directory structure as shown in FIG. 4. This hierarchical structure is organized according to the scenes in which translation is needed. In the present embodiment, the first to third layers are scene setting layers, and the example group files are stored in the fourth layer, which is the lowest layer.

【００５０】すなわち、図４の例では、「緊急」→「病
院で」→「症状」に対応する用例群ファイルが第４階層
に複数用意されている。ここには、各症状毎（例えば、
「頭痛」、「腹痛」など）に用例群ファイル「hospJ0
1」、「hospJ02」、「hospJ03」が用意されているもの
とする。That is, in the example of FIG. 4, a plurality of example group files corresponding to "emergency" → "at the hospital" → "symptoms" are prepared in the fourth layer. Here, it is assumed that the example group files "hospJ01", "hospJ02", and "hospJ03" are prepared, one for each symptom (for example, "headache", "abdominal pain", and so on).

【００５１】図５に、用例群ファイル「hospJ01」のデ
ータ構造の例を示す。この例では、用例群ファイル１
は、「頭痛」に関する用例を格納しているファイルであ
る。図示するように、用例群ファイル１には、「頭痛」
に関連する用例がｎ種類用意されている。なお、この
「hospJ01」は、「緊急」→「病院で」→「症状」とい
う階層の最下層であるため、「頭痛がします」などの病
院で頭痛を訴える際の用例が用意されている。そして、
各用例には、当該用例を導き出すためのキーワードが対
応付けられている。本実施形態では、キーワードとして
ワード１からワード５の５つを設定しており、数値が大
きくなるほど、より具体性が高いことを示している。さ
らに、各用例の第２言語（英語）での対訳例文が対応付
けられている。FIG. 5 shows an example of the data structure of the example group file "hospJ01". In this example, example group file 1 is a file that stores examples relating to "headache". As shown in the figure, n examples related to "headache" are prepared in example group file 1. Since "hospJ01" is at the lowest layer of the hierarchy "emergency" → "at the hospital" → "symptoms", it contains examples for complaining of a headache at a hospital, such as "I have a headache." Each example is associated with keywords for deriving that example. In the present embodiment, five keywords, word 1 to word 5, are set, and a larger number indicates higher specificity. Furthermore, each example is associated with its parallel translation sentence in the second language (English).
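One possible in-memory representation of such an example group file is sketched below in Python. The class and field names are hypothetical, and pairing the sentence 「頭痛がします」 with the keywords あたま (word 1) and いたい (word 4) is an assumption for illustration; the actual contents of FIG. 5 are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Example:
    number: int                 # example No. within the group file
    sentence: str               # example sentence in the first language
    translation: str            # parallel translation in the second language
    # word slot (1..5) -> keyword; a larger slot number means a more specific keyword
    keywords: Dict[int, str] = field(default_factory=dict)

@dataclass
class ExampleGroupFile:
    name: str
    examples: List[Example] = field(default_factory=list)

# Illustrative entry only: sentence, translation and keyword pairing are assumed.
hosp_j01 = ExampleGroupFile(
    name="hospJ01",
    examples=[
        Example(1, "頭痛がします", "I have a headache.", {1: "あたま", 4: "いたい"}),
    ],
)
```

Keeping the keywords in a slot-indexed mapping makes it straightforward to look up each keyword's points by slot number later.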

【００５２】各用例群ファイルには、図６に示すような
スコアリングテーブルが用意されている。このスコアリ
ングテーブルは、後述する用例候補抽出処理で行われる
スコア計算の際に参照されるものであり、上述のキーワ
ード（ワード１〜ワード５）それぞれの配点を示してい
る。上述したように、ワード番号が高くなるにしたがっ
て具体性の高いワードとしているため、配点もワード番
号が大きくなるにしたがって高くなるように設定してい
る。なお、配点については各用例群ファイル毎に異なら
せてもよい。A scoring table as shown in FIG. 6 is prepared for each example group file. This scoring table is referred to in the score calculation performed in the example candidate extraction process to be described later, and shows the points assigned to each of the above keywords (word 1 to word 5). As described above, the higher the word number, the more specific the word is. Therefore, the score is also set higher as the word number increases. The distribution of points may be different for each example group file.

【００５３】予測応答用例データベース１８２および１
８４は、最終的に提示した用例に対し、相手方が返答す
る用例を蓄積している。すなわち、発話者が用いる用例
が用例データベースに蓄積されており、応答者が用いる
用例が予測応答用例データベースに蓄積されている。こ
のため、第１言語の用例データベース１８１には、第２
言語の予測応答用例データベース１８２が対応し、第２
言語の用例データベース１８３には、第１言語の予測応
答用例データベース１８４が対応するように構成されて
いる。The predicted response example databases 182 and 184 store the examples with which the other party replies to the finally presented example. That is, the examples used by the speaker are stored in the example databases, and the examples used by the responder are stored in the predicted response example databases. Accordingly, the first-language example database 181 corresponds to the second-language predicted response example database 182, and the second-language example database 183 corresponds to the first-language predicted response example database 184.

【００５４】ワークエリア１９０は、用例エリア（第１
言語）１９１、応答用例エリア（第２言語）１９２、用
例エリア（第２言語）１９３、および応答用例エリア
（第１言語）を備える。各エリアには、用例格納部１８
０に格納されている用例ファイルが展開され、後述する
各処理が行われる。The work area 190 includes an example area (first language) 191, a response example area (second language) 192, an example area (second language) 193, and a response example area (first language). The example files stored in the example storage unit 180 are expanded into these areas, and the processes described later are performed on them.

【００５５】次に、本実施の形態にかかる翻訳装置１０
０の動作を図面を参照して説明する。Next, the operation of the translation device 100 according to the present embodiment will be described with reference to the drawings.

【００５６】図７は、翻訳装置１００の動作を説明する
ためのフローチャートである。まず、翻訳装置１００の
出力部３００に、第１言語または第２言語を選択する画
面を表示し、入力部２００を操作することで一方を選択
する（ステップＳ１０１）。すなわち、入力部２００の
動作により、言語指定部２１０がいずれかの言語を設定
する。ここでは、第１言語（日本語）が設定された場合
を例に説明する。FIG. 7 is a flowchart for explaining the operation of the translation device 100. First, a screen for selecting the first language or the second language is displayed on the output unit 300 of the translation device 100, and one of them is selected by operating the input unit 200 (step S101). That is, in response to the operation of the input unit 200, the language designation unit 210 sets one of the languages. Here, the case where the first language (Japanese) is set will be described as an example.

【００５７】次に、出力部３００に、場面を指定するた
めの場面メニューが図８に示すように表示される（ステ
ップＳ１０２）。使用者は、入力部２００を操作して所
望の場面を選択する。なお、上述のように、場面は階層
構造となっているので、場面指定の最下層である第３階
層に用意された場面（以下、「詳細場面」とする）が選
択されるまで、所定のメニュー画面が表示される。Next, a scene menu for designating a scene is displayed on the output unit 300 as shown in FIG. 8 (step S102). The user operates the input unit 200 to select a desired scene. As described above, since the scenes have a hierarchical structure, menu screens continue to be displayed until a scene prepared in the third layer, the lowest layer of scene designation (hereinafter referred to as a "detailed scene"), is selected.

【００５８】詳細場面が選択されると（ステップＳ１０
３：Ｙｅｓ）、場面指定部２２０は、選択された場面の
場面名を用例管理部１７０の場面名レジスタ１７１に格
納する（ステップＳ１０４）。以下、詳細場面「症状」
が選択された場合を例に説明する。When a detailed scene is selected (step S103: Yes), the scene designation unit 220 stores the scene name of the selected scene in the scene name register 171 of the example management unit 170 (step S104). Below, the case where the detailed scene "symptoms" is selected will be described as an example.

【００５９】翻訳装置１００は、場面名レジスタ１７１
に格納された場面名「症状」に対応する用例群ファイル
（すなわち、「hospJ01」、「hospJ02」、「hospJ0
3」）を用例データベース（第１言語）１８１から取得
して、用例エリア（第１言語）１９１に格納する（ステ
ップＳ１０５）。The translation device 100 acquires the example group files corresponding to the scene name "symptoms" stored in the scene name register 171 (that is, "hospJ01", "hospJ02", and "hospJ03") from the example database (first language) 181 and stores them in the example area (first language) 191 (step S105).

【００６０】次に、翻訳装置１００は、出力部３００
に、音声入力部１１０への音声入力を促すメッセージを
表示する。これにより、使用者による音声入力がなされ
ると（ステップＳ１０６：Ｙｅｓ）、音声認識処理を開
始する（ステップＳ２００）。なお、所定時間以上音声
入力がなされない場合（ステップＳ１０６：Ｎｏ、ステ
ップＳ１０７：Ｙｅｓ）は、処理を終了する。Next, the translation device 100 displays, on the output unit 300, a message prompting voice input to the voice input unit 110. When the user then inputs voice (step S106: Yes), the voice recognition process is started (step S200). If no voice is input for a predetermined time or longer (step S106: No, step S107: Yes), the process ends.

【００６１】次に、ステップＳ２００の音声認識処理
を、図９のフローチャートを参照して説明する。ここで
は、音声文「頭がすごく痛いが」が音声入力部１１０に
入力された場合を例に説明する。Next, the voice recognition process of step S200 will be described with reference to the flowchart of FIG. 9. Here, the case where the spoken sentence "atama ga sugoku itai ga" ("My head hurts a lot, but ...") is input to the voice input unit 110 will be described as an example.

【００６２】まず、ＡＤ変換部１２１が、入力された音
声「頭がすごく痛いが」をＡ／Ｄ変換し、パワー分析部
１２２が、変換されたデジタル音声信号を所定時間毎で
複数のフレームに区分し、各フレームのパワー成分を算
出する（ステップＳ２０１）。First, the AD conversion unit 121 A/D-converts the input voice "atama ga sugoku itai ga", and the power analysis unit 122 divides the converted digital audio signal into a plurality of frames at predetermined time intervals and calculates the power component of each frame (step S201).

【００６３】次に、ステップＳ２０１でのパワー分析に
基づいて、音声区間検出部１２３が、音声区間検出を行
う（ステップＳ２０２）。ステップＳ２０１で算出した
各フレームのパワー成分を所定の閾値と比較し、閾値を
上回ったフレームを音声区画として抽出する。Next, based on the power analysis in step S201, the voice section detection unit 123 performs voice section detection (step S202). The power component of each frame calculated in step S201 is compared with a predetermined threshold, and the frames that exceed the threshold are extracted as voice sections.

【００６４】この処理で候補の単語を抽出するのに必要
なだけの音声区画が検出されたかどうかを判定する（ス
テップＳ２０３）。必要なだけの音声区画が検出されて
いなければ（ステップＳ２０３：Ｎｏ）、処理を終了す
る。必要なだけの音声区間が検出された場合には、特徴
抽出部１２４は、ステップＳ２０１で算出した各フレー
ムのパワー成分に基づいて、ステップＳ２０３で抽出し
た音声区間の音響特徴ベクトルＸ（ｔ）を計算する（ス
テップＳ２０４）。It is then determined whether enough voice sections to extract candidate words have been detected by this process (step S203). If not (step S203: No), the process ends. If enough voice sections have been detected, the feature extraction unit 124 calculates the acoustic feature vector X(t) of the voice sections extracted in step S203, based on the power component of each frame calculated in step S201 (step S204).

【００６５】次に、尤度計算部１５０は、ステップＳ２
０４で算出した音声特徴ベクトルＸ（ｔ）と、第１言語
用音響モデル格納部１４１に格納されている音素モデル
としての隠れマルコフモデル（ＨＭＭ）と、文法ファイ
ル１３０に格納されている接続規則に従って、累積尤度
計算を行う（ステップＳ２０５）。Next, the likelihood calculation unit 150 performs cumulative likelihood calculation using the acoustic feature vector X(t) calculated in step S204 and the hidden Markov models (HMMs) stored as phoneme models in the first-language acoustic model storage unit 141, in accordance with the connection rules stored in the grammar file 130 (step S205).

【００６６】このＨＭＭモデルの接続規則について図１
０を参照して説明する。この規則は、音声文に含まれる
単語をワードスポッテイング法を用いて認識する時のＨ
ＭＭモデルの接続関係を定義するものであり、予め文法
ファイル１３０に格納されており、グループという概念
があり、各グループ内に１つ以上の単語を含む構成にな
っている。その構成は、認識対象の単語である複数の単
語（グループ）の前後に無音とガベージモデル（全ての
音素に対応するモデル）を置き、前後に無関係な発話が
あってもそれを吸収するようにするものである。The connection rules for the HMMs will be described with reference to FIG. 10. These rules define the connection relationships among the HMMs used when the words contained in a spoken sentence are recognized by the word spotting method, and are stored in advance in the grammar file 130. They are organized around the concept of groups, each of which contains one or more words. Silence and a garbage model (a model corresponding to all phonemes) are placed before and after the groups of words to be recognized, so that any unrelated speech before or after the keywords is absorbed.

【００６７】ここで、以下の条件にしたがって、最大尤
度値の算出が行われる（ステップＳ３００）。・１フレ
ーム目から尤度計算を行うのは接続規則に定義された開
始グループ（０，１，３グループ）に含まれている単語
のみである。・第０グループには、Ｂという単語（ここ
でＢもＥも単語）だけが含まれている。また第０グルー
プは、第０グループ、第１グループ、第３グループに変
移する可能性があり、Ｂの最終状態は、第０グループ、
第１グループ、第３グループに含まれる全ての単語の先
頭の状態に遷移しており、通常のビタビアルゴニズム
で、累計尤度が大きい遷移が優先され、累積尤度計算が
時間ごとに進められる。Here, the maximum likelihood value is calculated under the following conditions (step S300).
・Likelihood calculation from the first frame is performed only for the words included in the start groups defined in the connection rules (groups 0, 1, and 3).
・Group 0 contains only the word B (here, both B and E are treated as words). Group 0 may transition to group 0, group 1, or group 3: the final state of B transitions to the initial states of all the words included in groups 0, 1, and 3. Under the usual Viterbi algorithm, the transition with the larger cumulative likelihood takes priority, and the cumulative likelihood calculation advances at each time step.

【００６８】この処理を、図１１に示すフローチャート
を参照して説明する。すなわに、各グループ毎に遷移先
グループを判定（ステップＳ３０１）して、最大尤度を
示す単語を判別する（ステップＳ３０２）。そして、ス
テップＳ３０２で判別された単語の最大尤度値を取得す
る（ステップＳ３０３）。This processing will be described with reference to the flowchart shown in FIG. That is, the transition destination group is determined for each group (step S301), and the word indicating the maximum likelihood is determined (step S302). Then, the maximum likelihood value of the word determined in step S302 is acquired (step S303).

【００６９】この際、後処理のバックトレースに必要な
データとして、各フレームごとに各グループの全ての単
語のうち、最終状態の尤度が最も大きいものはどの単語
か、その単語の尤度の値、その単語の開始フレームの値
を求め、保存しておく（ステップＳ２０６）。At this time, as data needed for the backtrace in post-processing, the following are obtained and stored for each frame and each group: which word among all the words in the group has the largest final-state likelihood, the likelihood value of that word, and the start frame of that word (step S206).

【００７０】全フレームの累積尤度計算が終了したら
（ステップＳ２０７：Ｙｅｓ）、保存しておいた尤度計
算結果を遡ること（バックトレース）で認識結果を求め
る（ステップＳ２０８）。When the cumulative likelihood calculation of all the frames is completed (step S207: Yes), the recognition result is obtained by tracing back the stored likelihood calculation result (backtrace) (step S208).

【００７１】最終グループである１，２，４の各グルー
プに含まれる全ての単語の尤度のうち、最も累積尤度の
高いものを求める。例えば、第２グループ（Ｅ）であっ
た場合、Ｅが出力になり、Ｅの開始フレームまで遡り、
その時点で同様に第２グループに接続している１，２，
４のグループに含まれる全ての単語の尤度のうち最も大
きいものを求める。Among all the words included in the final groups 1, 2, and 4, the one with the highest cumulative likelihood is found. For example, if it belongs to group 2 (E), E becomes the output; the trace then goes back to the start frame of E and, at that frame, likewise finds the word with the largest likelihood among all the words included in groups 1, 2, and 4, which connect to group 2.

【００７２】このようにして、開始フレームの０フレー
ムまで遡ると（ステップＳ２０９：Ｙｅｓ）、最大尤度
を示したルートが出力となる。When the trace reaches frame 0, the start frame, in this way (step S209: Yes), the route that showed the maximum likelihood becomes the output.
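The cumulative likelihood calculation and backtrace described in the preceding paragraphs follow the general shape of the Viterbi algorithm. The Python sketch below is a simplified, state-level illustration under stated assumptions: each state stands in for a whole word or filler model, likelihoods are given as log values per frame, and the function name and argument layout are hypothetical. It does not reproduce the group-based bookkeeping (word start frames per group) of the embodiment.

```python
import math

def viterbi(log_emit, allowed, start_states, final_states):
    """log_emit[t][s]: log-likelihood of frame t under state s.
    allowed[s]: states reachable from s (include s itself for self-loops).
    Returns (best state path, cumulative log-likelihood)."""
    n_frames = len(log_emit)
    n_states = len(log_emit[0])
    NEG = -math.inf
    score = [[NEG] * n_states for _ in range(n_frames)]
    back = [[None] * n_states for _ in range(n_frames)]
    # Only the designated start states may be occupied at the first frame.
    for s in start_states:
        score[0][s] = log_emit[0][s]
    for t in range(1, n_frames):
        for prev in range(n_states):
            if score[t - 1][prev] == NEG:
                continue
            for s in allowed[prev]:
                cand = score[t - 1][prev] + log_emit[t][s]
                if cand > score[t][s]:   # keep the transition with larger cumulative likelihood
                    score[t][s] = cand
                    back[t][s] = prev
    # Backtrace from the best final state down to frame 0.
    best = max(final_states, key=lambda s: score[-1][s])
    path = [best]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[-1][best]
```

For instance, with three states (leading silence, one keyword, trailing silence) the returned path marks which frames were attributed to the keyword.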

【００７３】この認識結果から１発話につき最大３個の
キーワードを検出する（ステップＳ２１０）。From this recognition result, a maximum of three keywords are detected per utterance (step S210).

【００７４】本実施例では、入力された音声文が「頭が
すごく痛いが」であるので、図１２に示すように、認識
結果として「ＢＢＢあたまＧすごくＧいたいＧＧＥＥ
Ｅ」が取得される。ここで、Ｂは発音前の無音を示し、
Ｇはキーワード以外の発話を吸収するガベージモデルを
示し、Ｅは発話後の無音を示すので、抽出されるキーワ
ードは「あたま」、「すごく」、「いたい」となる。In the present embodiment, since the input spoken sentence is "atama ga sugoku itai ga" ("My head hurts a lot, but ..."), the recognition result "BBB atama G sugoku G itai GGEEE" is obtained, as shown in FIG. 12. Here, B indicates the silence before the utterance, G indicates the garbage model that absorbs speech other than keywords, and E indicates the silence after the utterance, so the extracted keywords are "atama" (head), "sugoku" (very), and "itai" (hurts).
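The keyword extraction from this recognition result can be illustrated with a short Python sketch. Representing the result as a list of tokens and capping the output by simple truncation are assumptions made for illustration; the function name is hypothetical.

```python
def extract_keywords(tokens, max_keywords=3):
    """Drop filler symbols (B: leading silence, G: garbage model,
    E: trailing silence) and keep at most max_keywords words."""
    fillers = {"B", "G", "E"}
    keywords = [t for t in tokens if t not in fillers]
    return keywords[:max_keywords]

# Recognition result from the text, written as a token sequence.
result = ["B", "B", "B", "あたま", "G", "すごく", "G", "いたい", "G", "G", "E", "E", "E"]
keywords = extract_keywords(result)
```

Here `keywords` holds the three words あたま, すごく, and いたい, matching the keywords named in the paragraph above.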

【００７５】次に、ステップＳ２００の音声認識処理の
結果に基づいて、用例候補抽出処理（ステップＳ４０
０）が実行される。この用例候補抽出処理を、図１３の
フローチャートを参照して説明する。Next, the example candidate extraction process (step S400) is executed based on the result of the voice recognition process of step S200. This example candidate extraction process will be described with reference to the flowchart of FIG. 13.

【００７６】尤度計算部１５０は、上述の音声認識処理
で抽出した単語、すなわち「あたま」、「すごく」、
「いたい」を、単語出力部１６０に出力する（ステップ
Ｓ４０１）。The likelihood calculation unit 150 outputs the words extracted by the voice recognition process described above, that is, "atama", "sugoku", and "itai", to the word output unit 160 (step S401).

【００７７】次に、用例群名レジスタに、対象となる用
例群ファイルの先頭ファイル名を設定する（ステップＳ
４０２）。上記の例では、「症状」に対応する用例群フ
ァイル「hospJ01」、「hospJ01」、「hospJ03」のうち
の先頭ファイルを示す「hospJ01」が設定される。Next, the name of the first of the target example group files is set in the example group name register (step S402). In the above example, "hospJ01", which is the first of the example group files "hospJ01", "hospJ02", and "hospJ03" corresponding to "symptoms", is set.

【００７８】したがって、ステップＳ１０５で用例エリ
ア（第１言語）１９１に格納された用例群ファイルのう
ち、「hospJ01」に対応するものを用いて以下の処理が
行われる。Therefore, the following processing is performed using the example group file stored in the example area (first language) 191 in step S105 and corresponding to "hospJ01".

【００７９】「hospJ01」の各用例毎に、対応するキー
ワードとステップＳ４０１で取得した単語とを個々に比
較し、一致するものがあるか否かを判別する（ステップ
Ｓ４０３，４０４）。ここで一致するものがある場合、
図５に示すスコアリングテーブルに設定されている配点
に基づいてスコア計算を行う（ステップＳ４０５）。For each example in "hospJ01", the associated keywords are compared one by one with the words acquired in step S401 to determine whether any of them match (steps S403 and S404). If there is a match, a score is calculated based on the points set in the scoring table shown in FIG. 5 (step S405).

【００８０】このスコア計算処理を図１４を参照して説
明する。まず、ステップＳ４０１で取得された単語「あ
たま」、「すごく」、「いたい」と、用例Ｎｏ．１のキ
ーワードとを比較すると、「あたま」と「いたい」が一
致する。ここで、用例Ｎｏ．１では、「あたま」がワー
ド１、「いたい」がワード４に設定されているので、ス
コアリングポイントテーブルに基づいて、「あたま」の
８０点と「いたい」の９５点の計１７５点が付けられ
る。This score calculation process will be described with reference to FIG. 14. First, when the words "atama", "sugoku", and "itai" acquired in step S401 are compared with the keywords of example No. 1, "atama" and "itai" match. In example No. 1, "atama" is set as word 1 and "itai" as word 4, so based on the scoring table, the example receives 80 points for "atama" and 95 points for "itai", for a total of 175 points.
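The arithmetic in this worked example can be reproduced with a small Python sketch. Only the two point values stated in the text (word 1 = 80, word 4 = 95) are included in the table; the points for the other word slots are not given here, so they are left out rather than guessed. The function and variable names are hypothetical.

```python
# Points per word slot; only the values stated in the text are filled in.
SCORING_TABLE = {1: 80, 4: 95}

def score_example(example_keywords, recognized_words, table=SCORING_TABLE):
    """example_keywords: dict of word slot -> keyword.
    Sum the points of every slot whose keyword was recognized."""
    recognized = set(recognized_words)
    return sum(
        table.get(slot, 0)
        for slot, keyword in example_keywords.items()
        if keyword in recognized
    )

example_1 = {1: "あたま", 4: "いたい"}   # keywords of example No. 1
total = score_example(example_1, ["あたま", "すごく", "いたい"])
```

With these inputs `total` is 175, the sum of 80 and 95. The recognized word すごく contributes nothing because it is not a keyword of example No. 1.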

【００８１】同様にして、「hospJ01」ファイルのすべ
ての用例についてスコア計算を行う（ステップＳ４０
６：Ｎｏ、Ｓ４０７）。すべての用例についてスコア計
算が終了すると（ステップＳ４０６：Ｙｅｓ）、次の用
例群ファイルについて同様の処理を行う（ステップＳ４
０８：Ｎｏ、Ｓ４０９）。Similarly, scores are calculated for all the examples in the "hospJ01" file (step S406: No, S407). When score calculation has been completed for all the examples (step S406: Yes), the same processing is performed for the next example group file (step S408: No, S409).

【００８２】すべての用例群ファイルについて上述の処
理が終了し（ステップＳ４０８：Ｙｅｓ）、単語の一致
が検出されなかった場合（ステップＳ４１０：Ｎｏ）
は、ステップＳ１１１（図７）にもどり、出力部３００
に音声再入力要求メッセージを表示する。When the above processing has been completed for all the example group files (step S408: Yes) and no word match has been detected (step S410: No), the process returns to step S111 (FIG. 7), and a voice re-input request message is displayed on the output unit 300.

【００８３】一方、単語の一致があり、スコア付与がな
された場合（ステップＳ４１０：Ｙｅｓ）は、付与され
たスコアの上位から所定数の用例を候補として取得し
（ステップＳ４１１）、対応する対訳用例とともに用例
エリア（第１言語）１９１に格納して、図７に示すフロ
ーに戻る。本実施の形態では上位３つの用例を取得する
ものとする。On the other hand, when there is a word match and scores have been assigned (step S410: Yes), a predetermined number of examples are acquired as candidates in descending order of score (step S411) and stored, together with their parallel translations, in the example area (first language) 191, and the process returns to the flow shown in FIG. 7. In the present embodiment, the top three examples are acquired.
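Selecting the top-scoring candidates can be sketched as below. This Python fragment is illustrative only: the tuple layout and function name are assumptions, and an empty result stands in for the "no word match" branch that leads to the voice re-input request.

```python
def select_candidates(scored_examples, limit=3):
    """scored_examples: list of (score, sentence, translation) tuples.
    Return up to `limit` matched examples, highest score first,
    or an empty list if no example received any points."""
    matched = [e for e in scored_examples if e[0] > 0]
    matched.sort(key=lambda e: e[0], reverse=True)
    return matched[:limit]
```

Each returned tuple keeps the parallel translation next to its sentence, so the translation is available as soon as the user picks a candidate.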

【００８４】ステップＳ４１１で取得された候補用例
が、図１５に示すように出力部３００に表示される（ス
テップＳ１０８）。ここで、３つの候補用例は選択可能
に表示されている。すなわち、入力部２００を操作する
ことで、音声入力時に発話した文と最も近い用例を選択
することができる。The candidate examples acquired in step S411 are displayed on the output unit 300 as shown in FIG. 15 (step S108). Here, the three candidate examples are displayed so that they can be selected. That is, by operating the input unit 200, it is possible to select an example closest to the sentence spoken at the time of voice input.

【００８５】ここで、用例の選択がされず、所定の取消
動作が行われた場合（ステップＳ１０９：Ｎｏ、Ｓ１１
０：Ｙｅｓ）は、出力部３００に所定の音声再入力要求
メッセージを表示する（ステップＳ１１１）。Here, if no example is selected and a predetermined cancel operation is performed (step S109: No, S110: Yes), a predetermined voice re-input request message is displayed on the output unit 300 (step S111).

【００８６】一方、用例が選択された場合（ステップＳ
１０９：Ｙｅｓ）は、用例エリア（第１言語）１９１か
ら、選択された用例に対応する対訳用例が取得され、図
１６に示すように出力部３００に表示される（ステップ
Ｓ１１２）。On the other hand, when an example is selected (step S109: Yes), the parallel translation corresponding to the selected example is acquired from the example area (first language) 191 and displayed on the output unit 300 as shown in FIG. 16 (step S112).

【００８７】対訳が表示されると、用例群ファイル更新
処理（ステップＳ５００）が実行される。この用例群フ
ァイル更新処理を図１７に示すフローチャートを参照し
て説明する。When the parallel translation is displayed, the example group file update process (step S500) is executed. This example group file update process will be described with reference to the flowchart shown in FIG.

【００８８】まず、ステップＳ１０９で選択された用例
に対応する用例群ファイル名を用例群名レジスタ１７２
から取得する（ステップＳ５０１）。First, the example group file name corresponding to the example selected in step S109 is acquired from the example group name register 172 (step S501).

【００８９】次に、予測応答用例管理テーブル１７３を
参照し、ステップＳ５０１で取得した用例群ファイル名
に対応した、第２言語についての用例群ファイル名（予
測応答用例群名）を取得する（ステップＳ５０２）。Next, referring to the predicted response example management table 173, the second-language example group file name (predicted response example group name) corresponding to the example group file name acquired in step S501 is acquired (step S502).

【００９０】ステップＳ５０２で取得した予測応答用例
群名を用例群名レジスタ１７２に設定する（ステップＳ
５０３）。The predicted response example group name acquired in step S502 is set in the example group name register 172 (step S503).

【００９１】ステップＳ５０３で設定した予測応答用例
群名に対応するすべての用例群ファイルを、予測応答用
例データベース（第２言語）１８２から取得し、応答用
例エリア（第２言語）１９２に格納して処理を終了する
（ステップＳ５０４、ステップＳ５０５）。All example group files corresponding to the predicted response example group names set in step S503 are acquired from the predicted response example database (second language) 182 and stored in the response example area (second language) 192. The process ends (steps S504 and S505).
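Steps S501 to S504 amount to a table lookup followed by a load into the work area. A minimal Python sketch is given below; the group file name "respE01", the table and database contents, and the function name are all hypothetical placeholders, since the actual entries of FIG. 3 are not reproduced in this text.

```python
def load_predicted_responses(selected_group, table, response_db):
    """Look up the response example group names associated with the selected
    example group and return the matching files from the response database."""
    response_names = table.get(selected_group, [])
    return {name: response_db[name] for name in response_names if name in response_db}

# Hypothetical contents for illustration only.
predicted_response_table = {"hospJ01": ["respE01"]}
response_db = {
    "respE01": ["Where does it hurt?"],
    "respE02": ["How long have you had it?"],
}
response_area = load_predicted_responses("hospJ01", predicted_response_table, response_db)
```

Narrowing the second-language examples in this way means the responder's utterance only has to be matched against the predicted replies.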

【００９２】この後、ステップＳ１１２で表示した対訳
を相手方に提示または発話する。図１６に示すように、
出力部３００には、表示された対訳に対する応答を発話
するよう表示されている。したがって、応答者は、表示
された"I have a terrible headache."（頭がすごく痛
みます）に応答する音声文を、音声入力部１１０に入力
する。After this, the parallel translation displayed in step S112 is shown or spoken to the other party. As shown in FIG. 16, the output unit 300 displays a prompt to speak a response to the displayed parallel translation. The responder therefore inputs, to the voice input unit 110, a spoken sentence responding to the displayed "I have a terrible headache."

【００９３】以降は、第２言語（英語）について上述し
た処理と同様の処理（音声認識処理、用例候補抽出処
理、用例選択、選択用例の対訳表示）が行われる。な
お、ステップＳ５０４にて、予測された応答用例が応答
用例エリア（第２言語）１９２にすでに格納されている
ので、この中から応答用例候補が選出されることにな
る。すなわち、図１８（ａ）に示すように、３つの第２
言語の応答用例候補が出力部３００に選択可能に表示さ
れる。そして応答者が、入力部２００を操作して、最も
ふさわしい用例を選択する。選択された応答用例の対訳
（第１言語：日本語）が、図１８（ｂ）に示すように出
力部３００に表示される。Thereafter, the same processing as described above (voice recognition, example candidate extraction, example selection, and display of the parallel translation of the selected example) is performed for the second language (English). Since the predicted response examples have already been stored in the response example area (second language) 192 in step S504, the response example candidates are selected from among them. That is, as shown in FIG. 18(a), three second-language response example candidates are displayed on the output unit 300 in a selectable manner. The responder then operates the input unit 200 to select the most suitable example. The parallel translation (first language: Japanese) of the selected response example is displayed on the output unit 300 as shown in FIG. 18(b).

【００９４】以上説明したように、本発明の実施の形態
にかかる翻訳装置１００によれば、発話した内容に近い
用例候補を使用者に選択させ、選択された用例の対訳を
表示するとともに、その用例に対する応答を予め予測し
て音声認識するので、的確かつ高速に翻訳処理を行うこ
とができる。As described above, the translation device 100 according to the embodiment of the present invention has the user select an example candidate close to the spoken content, displays the parallel translation of the selected example, and predicts the responses to that example in advance before performing voice recognition, so translation can be performed accurately and at high speed.

【００９５】上記実施の形態では、出力部３００を、例
えば表示装置によって構成し、選択された用例の対訳を
表示するものであった。これに対して、出力部３００を
何らかの電子回路に接続させ、出力部３００からの出力
結果に従って当該電子回路を動作させるものとしてもよ
い。例えば、出力部３００に音声合成装置を接続させる
ことで、対訳を音声出力してもよい。In the above-described embodiment, the output unit 300 is composed of, for example, a display device to display the parallel translation of the selected example. On the other hand, the output unit 300 may be connected to some electronic circuit and the electronic circuit may be operated according to the output result from the output unit 300. For example, by connecting a voice synthesizer to the output unit 300, the parallel translation may be voice output.

【００９６】上記翻訳装置１００は、携帯型装置から構
成されるものとして説明したが、この場合、翻訳装置１
００は専用装置でなくてもよい。例えば、汎用のＰＤＡ
（Personal Digital Assistance：携帯情報端末）に、
上記各処理を実現するためのプログラムをインストール
することにより、本実施の形態にかかる翻訳装置１００
を実現してもよい。また、携帯電話やＰＨＳ（Personal
Handyphone System）端末などの移動体通信端末に同様
のプログラムを搭載することで、上記翻訳装置１００と
して機能させてもよい。The translation device 100 has been described as a portable device, but it need not be a dedicated device. For example, the translation device 100 according to the present embodiment may be realized by installing a program for implementing the above processes on a general-purpose PDA (Personal Digital Assistant). A similar program may also be installed on a mobile communication terminal such as a mobile phone or a PHS (Personal Handyphone System) terminal so that the terminal functions as the translation device 100.

【００９７】上記実施の形態では、説明を容易にするた
め、対応言語を第１言語（日本語）と第２言語（英語）
の２つとしたが、多くの言語の中から、当該話者同士が
使用する言語を選択するように構成してもよい。この場
合、モデル格納部１４０、用例格納部１８０、ワークエ
リア１９０などを適宜拡張することで対応できる。In the above embodiment, for ease of explanation, two languages are supported, the first language (Japanese) and the second language (English); however, the device may be configured so that the languages used by the speakers are selected from among many languages. In this case, the model storage unit 140, the example storage unit 180, the work area 190, and the like can be expanded as appropriate.

【００９８】あるいは、翻訳装置１００に通信機能を持
たせることで、これらの記憶領域を、例えば、外部のホ
ストコンピュータなどに用意し、インターネットなどを
介して処理を行うようにしてもよい。このような構成に
より、記憶容量に制限のある携帯型装置であっても、よ
り多くの言語に対応することができる。Alternatively, by providing the translation device 100 with a communication function, these storage areas may be prepared in, for example, an external host computer and the like, and the processing may be performed via the Internet or the like. With such a configuration, even a portable device having a limited storage capacity can support more languages.

【００９９】また、上記実施の形態では、言語選択を手
動で設定していたが、言語選択の方法は任意である。例
えば、対訳が表示されたことを契機に自動的に言語を切
り替えるようにしてもよい。In the above embodiment, the language selection is manually set, but the language selection method is arbitrary. For example, the language may be automatically switched when the parallel translation is displayed.

【０１００】あるいは、発話者がいずれであるかを判別
して、言語を切り替えるようにしてもよい。この場合、
例えば、音声入力部１１０を、指向性のあるマイクロフ
ォンなどで構成し、さらに翻訳装置１００に少なくとも
２個備えるように構成する。そして、これら複数個のマ
イクロフォンで音声を取得する際、その到達時間の相違
から発話者の方向を検知するように構成する。このよう
な構成とすることで、１つの発話の終了後に、異なる方
向からの音声が検出されることを契機に、言語を切り替
えるように構成することができる。Alternatively, the language may be switched by determining which party is speaking. In this case, for example, the voice input unit 110 is made up of directional microphones or the like, and the translation device 100 is provided with at least two of them. When voice is captured by these microphones, the direction of the speaker is detected from the difference in arrival time. With such a configuration, the language can be switched when, after one utterance ends, voice from a different direction is detected.
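One common way to estimate an arrival-time difference between two microphones is to find the lag that maximizes their cross-correlation. The Python sketch below illustrates that idea only; the specification does not say how the difference is measured, so the cross-correlation approach, the function names, and the "left"/"right" labels are all assumptions.

```python
def best_lag(left, right, max_lag):
    """Return the lag (in samples) maximizing sum(left[i] * right[i + lag]).
    A positive lag means the right channel is delayed, i.e. the sound
    reached the left microphone first."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(left)):
            j = i + lag
            if 0 <= j < len(right):
                score += left[i] * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best

def speaker_side(left, right, max_lag):
    """Classify the speaker direction from the sign of the best lag."""
    lag = best_lag(left, right, max_lag)
    if lag > 0:
        return "left"
    if lag < 0:
        return "right"
    return "center"
```

A language switch could then be triggered when `speaker_side` returns a different side for a new utterance than for the previous one.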

【０１０１】あるいは、翻訳装置１００が出力する情報
の出力対象を判別することで、言語切替を行ってもよ
い。すなわち、候補文例が表示されているときは、発話
者が画面を見ており、対訳文例が表示されているとき
は、相手方に画面を提示していると考えられる。したが
って、例えば、候補文例が表示されているとき、およ
び、対訳文例が表示されているときの翻訳装置１００の
方向を判別するように構成し、この方向の変化に基づい
て、対象言語を切り替えるようにしてもよい。この場
合、翻訳装置１００の方向に応じて、出力部３００上で
の表示方向を、たとえば、上下反転するようにしてもよ
い。これにより、二者間で翻訳装置１００を受け渡しす
る場合に、翻訳装置１００自体を反転させる必要がな
い。Alternatively, the language may be switched by determining to whom the information output by the translation device 100 is directed. That is, while candidate examples are displayed, the speaker is presumably looking at the screen, and while a parallel translation is displayed, the screen is presumably being shown to the other party. Therefore, for example, the device may be configured to determine the orientation of the translation device 100 when candidate examples are displayed and when a parallel translation is displayed, and to switch the target language based on the change in orientation. In this case, the display on the output unit 300 may, for example, be flipped vertically according to the orientation of the translation device 100. This makes it unnecessary to turn the translation device 100 itself around when it is passed between the two parties.

【０１０２】また、翻訳装置１００にＧＰＳ（Global P
ositioning System：全地球測位システム）に対応した
受信装置を備えることで、現在位置に応じて使用言語を
自動的に設定するようにしてもよい。この場合、翻訳装
置１００の所有者の使用言語はデフォルトで設定してお
き、相手方の言語を適宜自動設定する。このような構成
によれば、例えば、同一国内において、地方毎に使用言
語が異なるような場合でも、使用者が意識することなく
適切な言語を設定することができる。In addition, the translation device 100 may be equipped with a receiver compatible with GPS (Global Positioning System) so that the language to be used is set automatically according to the current position. In this case, the language of the owner of the translation device 100 is set by default, and the language of the other party is set automatically as appropriate. With such a configuration, an appropriate language can be set without the user being aware of it even when, for example, different languages are used in different regions of the same country.

【０１０３】[0103]

Instead of fixing the owner's language as a default as described above, the language may be set using voice recognition. For example, when the translation device 100 is started, a message prompting the user to utter a predetermined phrase (for example, a greeting) in his or her native language is displayed in a plurality of languages on the output unit 300 (display device). The voice recognition unit 120 recognizes the phrase uttered in response to this message and thereby identifies the language. In this case, the language is identified by referring to the acoustic models and language models for all languages stored in the model storage unit 140. The identified language can then be set as the target language.
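The language identification described in this paragraph amounts to scoring the utterance against every language's models and keeping the best-scoring language. The following sketch shows only that selection step. The per-language scorers are toy one-dimensional Gaussian log-likelihoods standing in for the acoustic and language models of the model storage unit 140, and all names and values are assumptions made for illustration.

```python
import math
from typing import Callable, Dict, Sequence

# A scorer maps a feature sequence to a log-likelihood (higher is better).
Scorer = Callable[[Sequence[float]], float]


def gaussian_scorer(mean: float, var: float = 1.0) -> Scorer:
    """Toy stand-in for one language's models (hypothetical)."""
    def score(features: Sequence[float]) -> float:
        return sum(-0.5 * (math.log(2.0 * math.pi * var)
                           + (x - mean) ** 2 / var)
                   for x in features)
    return score


def identify_language(features: Sequence[float],
                      scorers: Dict[str, Scorer]) -> str:
    """Score the utterance with every language and return the best one."""
    if not scorers:
        raise ValueError("at least one language model is required")
    return max(scorers, key=lambda lang: scorers[lang](features))


if __name__ == "__main__":
    scorers = {"ja": gaussian_scorer(0.0), "en": gaussian_scorer(3.0)}
    print(identify_language([2.8, 3.1, 2.9], scorers))
```

In a real recognizer each scorer would run the full recognition pass for its language, so the cost grows with the number of languages; restricting the prompt to a short predetermined phrase keeps that pass small.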

【０１０４】[0104]

The translation device 100 described above is not limited to a portable device and can also be implemented on a general-purpose computer, such as a personal computer, as a platform. For example, the voice input unit 110 and the output unit 300 are implemented by a microphone and a display device, respectively, connected to the general-purpose computer. The grammar file 130, the model storage unit 140, and the work area 190 are each implemented by allocating a corresponding area in memory. The models stored in the model storage unit 140 are read from an external device and stored in memory before the voice recognition process is performed. The power analysis unit 122, the voice section detection unit 123, the feature extraction unit 124, and the likelihood calculation unit 150 are implemented by a CPU (Central Processing Unit) executing a program stored in memory.

【０１０５】[0105]

With such a configuration, the speech translation method according to the present invention can be applied, for example, to communicating with speakers of other languages via the Internet or the like.

【０１０６】[0106]

[Effects of the Invention] As described above, according to the present invention, an accurate parallel translation can be output at high speed.

[Brief description of drawings]

FIG. 1 is a diagram showing the external appearance of a translation device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the translation device of FIG. 1.

FIG. 3 is a diagram showing an example of information recorded in the predicted response example management table shown in FIG. 2.

FIG. 4 is a diagram for explaining the structure of information recorded in the example database shown in FIG. 2.

FIG. 5 is a diagram showing an example of an example group file recorded in the example database shown in FIG. 2.

FIG. 6 is a diagram showing an example of a scoring table recorded in the example database together with the example group file shown in FIG. 5.

FIG. 7 is a flowchart for explaining the operation of the translation device according to the embodiment of the present invention.

FIG. 8 is a diagram showing a display example of a scene setting screen displayed on the translation device in the processing shown in FIG. 7.

FIG. 9 is a flowchart for explaining the voice recognition process in the processing shown in FIG. 7.

FIG. 10 is a diagram for explaining the connection rules of the HMM models used in the voice recognition process shown in FIG. 9.

FIG. 11 is a flowchart for explaining the maximum likelihood calculation process in the voice recognition process shown in FIG. 9.

FIG. 12 is a diagram for explaining the mechanism by which a recognized word is output in the voice recognition process shown in FIG. 9.

FIG. 13 is a flowchart for explaining the example candidate extraction process in the processing shown in FIG. 7.

FIG. 14 is a diagram for explaining the score calculation process in the example candidate extraction process shown in FIG. 13.

FIG. 15 is a diagram showing a display example of the example candidate display screen displayed in the processing shown in FIG. 7.

FIG. 16 is a diagram showing a display example of the parallel translation display screen displayed in the processing shown in FIG. 7.

FIG. 17 is a flowchart for explaining the example group file update process in the processing shown in FIG. 7.

FIG. 18 is a diagram showing display examples of screens displayed by a responder's operation in the translation device according to the embodiment of the present invention; (a) shows a display example of the example candidate display screen, and (b) shows a display example of the parallel translation display screen.

[Explanation of symbols]

100 ... translation device, 110 ... voice input unit, 120 ... voice recognition unit, 140 ... model storage unit, 150 ... likelihood calculation unit, 170 ... example management unit, 180 ... example storage unit, 200 ... input unit, 300 ... output unit

Continuation of front page

(51) Int.Cl.⁷ Identification code FI Theme code (reference)
G10L 15/22 G10L 3/00 571U 571V

F-term (reference): 5B091 AA03 BA19 CB12 CB24 CC15 DA11; 5D015 AA05 AA06 BB01 BB02 HH00 KK02

Claims

[Claims]

1. A translation device for translating between a pair of languages among a plurality of natural languages, comprising: language setting means for setting one of the pair of languages from among the plurality of languages; voice recognition means for acquiring a voice sentence in the language set by the language setting means and performing voice recognition on it; word recognition means for recognizing words included in the acquired voice sentence based on the voice recognition result of the voice recognition means; sentence example storage means for storing in advance, for each language, a plurality of sentence examples, at least one word related to each sentence example, and a bilingual sentence example for each sentence example, in association with one another; sentence example candidate selection means for selecting a plurality of corresponding sentence examples based on the degree of coincidence between the words recognized by the word recognition means and the words stored in the sentence example storage means; sentence example candidate output means for outputting the sentence examples selected by the sentence example candidate selection means so that the user can select among them; bilingual sentence example output means for acquiring from the sentence example storage means, and outputting, the bilingual sentence example corresponding to the sentence example selected by the user from among the sentence examples output by the sentence example candidate output means; and target language switching means for switching the language subject to voice recognition by the voice recognition means from the language set by the language setting means to the other language of the pair.

2. The translation device according to claim 1, wherein the sentence example storage means stores the sentence examples and the bilingual sentence examples classified by scene, the voice recognition means comprises scene designation means for designating in advance the scene in which the voice sentence subject to voice recognition is used, and the sentence example candidate selection means selects candidates from the sentence examples corresponding to the scene designated by the scene designation means.

3. The translation device according to claim 2, further comprising: response sentence example storage means for storing in advance, for each scene, response sentence examples for the bilingual sentence examples output by the bilingual sentence example output means; and response sentence example acquisition means for acquiring, from the response sentence example storage means, the response sentence examples corresponding to the scene designated by the scene designation means, wherein, after the target language has been switched by the target language switching means, the sentence example candidate selection means selects sentence example candidates from the response sentence examples acquired by the response sentence example acquisition means.

4. The translation device according to any one of claims 1 to 3, wherein the voice recognition means comprises model storage means for storing in advance, for each of the plurality of languages, phoneme models in which each phoneme contained in speech is modeled and a language model in which phoneme pattern sequence information for a plurality of kinds of words is registered, and recognizes an input voice sentence by referring to the phoneme models and the language model stored in the model storage means, and the word recognition means comprises connection rule storage means for storing in advance information indicating inter-group connection rules that define a connection relationship in which a group indicating a silent state and a group containing words not registered in the language model are arranged before and after a group containing the words to be recognized, obtains, based on the inter-group connection rules stored in the connection rule storage means, the maximum likelihood for each group from the voice sentence recognized by the voice recognition means, and extracts the words to be recognized based on the maximum likelihoods thus obtained.

5. The translation device according to any one of claims 1 to 4, further comprising position information acquisition means for acquiring position information on the current position, wherein the language setting means sets one of the pair of languages based on the current position information acquired by the position information acquisition means.

6. The translation device according to any one of claims 1 to 5, wherein the voice recognition means comprises spoken language determination means for determining the language of the acquired voice sentence by performing voice recognition on it with reference to all of the phoneme models and language models, and the language setting means outputs, in a plurality of languages, a message prompting the user to utter a predetermined phrase in his or her native language, and sets the language of the predetermined phrase uttered in response to the message, as determined by the spoken language determination means.

7. The translation device according to claim 1, wherein the translation device comprises a portable terminal device.

8. The translation device according to any one of claims 1 to 7, wherein the target language switching means switches the language subject to voice recognition to the other language in response to the output of the bilingual sentence example by the bilingual sentence example output means.

9. The translation device according to claim 7, further comprising speaker discrimination means for discriminating which party is the speaker, wherein the target language switching means switches the target language based on the discrimination result of the speaker discrimination means.

10. The translation device according to claim 9, wherein the speaker discrimination means comprises at least two voice collection means, each of which collects a speaker's voice, supplies a voice signal to the voice recognition means, and is capable of identifying the input direction of the voice, and discriminates which party is the speaker based on the difference between the input directions identified by the respective voice collection means, and the target language switching means switches the target language based on the utterance timing of the speaker discriminated by the speaker discrimination means.

11. The translation device according to claim 9, wherein the speaker discrimination means further comprises: device direction identification means for identifying the direction of the translation device when the sentence example candidate output means outputs the sentence example candidates and the direction of the translation device when the bilingual sentence example output means outputs the bilingual sentence example; and direction change detection means for detecting a change in the direction of the translation device identified by the device direction identification means, and the target language switching means switches the target language based on the detection result of the direction change detection means.

12. The translation device according to claim 9, further comprising: display means; and display control means for causing the display means to display the outputs from the sentence example candidate output means and the bilingual sentence example output means, wherein, when the target language switching means switches the target language based on the discrimination result of the speaker discrimination means, the display control means changes the display direction according to the discriminated speaker.

13. A speech translation method comprising: a language designation step of designating one of a pair of languages according to the utterance situation of a speaker; a voice recognition step of acquiring a voice sentence in the language designated in the language designation step and performing voice recognition on it; a word recognition step of recognizing words included in the acquired voice sentence based on the voice recognition result of the voice recognition step; a sentence example accumulation step of accumulating in advance, for each language, a plurality of sentence examples, at least one word related to each sentence example, and a bilingual sentence example for each sentence example, in association with one another; a sentence example candidate selection step of selecting a plurality of corresponding sentence examples based on the degree of coincidence between the words recognized in the word recognition step and the words accumulated in the sentence example accumulation step; a sentence example candidate output step of selectably outputting the sentence examples selected in the sentence example candidate selection step; and a bilingual sentence example output step of acquiring and outputting the bilingual sentence example corresponding to the sentence example selected from among the sentence examples output in the sentence example candidate output step.

14. A program for causing a computer to function as: a voice sentence acquisition unit that acquires a voice sentence; a language designation unit that designates a pair of languages from among a plurality of natural languages; a voice recognition unit that performs voice recognition on the voice sentence acquired by the voice sentence acquisition unit based on one of the languages designated by the language designation unit; a word recognition unit that recognizes words included in the voice sentence based on the voice recognition result of the voice recognition unit; a sentence example storage unit that stores, for each language, a plurality of sentence examples, at least one word related to each sentence example, and a bilingual sentence example for each sentence example, in association with one another; a sentence example candidate selection unit that selects a plurality of corresponding sentence examples based on the degree of coincidence between the words recognized by the word recognition unit and the words stored in the sentence example storage unit; a sentence example candidate output unit that selectably outputs the plurality of sentence examples selected by the sentence example candidate selection unit; a bilingual sentence example output unit that outputs the bilingual sentence example corresponding to the sentence example selected from among the sentence examples output by the sentence example candidate output unit; and a target language switching unit that, triggered by the output of the bilingual sentence example by the bilingual sentence example output unit, switches the language subject to voice recognition by the voice recognition unit to the other language.