JPWO2011033834A1

JPWO2011033834A1 - Speech translation system, speech translation method, and recording medium

Info

Publication number: JPWO2011033834A1
Application number: JP2011531830A
Authority: JP
Inventors: 長田 誠也; 花沢 健; 荒川 隆行; 岡部 浩司; 田中 大介
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2009-09-18
Filing date: 2010-06-18
Publication date: 2013-02-07
Also published as: WO2011033834A1

Abstract

The speech translation system (1) includes the following units:

- a speech recognition unit (11) that recognizes input speech in predetermined units to generate character data;
- a recognition result concatenation unit (12) that concatenates the character data generated by the speech recognition unit (11);
- a sentence determination unit (13) that determines whether the character data concatenated by the recognition result concatenation unit (12) form a sentence;
- a translation unit (14) that translates the concatenated character data;
- an output unit (15) that outputs the translation result from the translation unit (14).

With this configuration, the system outputs translation results of the concatenated character data, including data determined to form sentences. It can therefore output translation results in real time and can also output grammatically correct translation results.

Description

The present invention relates to a speech translation system that translates input speech, a speech translation method, and a recording medium.

In recent years, speech translation systems that combine a speech recognition system with a machine translation system have been proposed.

A speech recognition system may have no device, such as a microphone button, for determining its processing unit. In that case it generally derives the processing unit from a physical phenomenon: the breaks (pauses) in the speech. Machine translation systems, by contrast, process text in units of sentences.

If a speech translation system is built by simply combining these two systems, translation is performed at every break in the speech. This makes it difficult to translate the utterance appropriately. Translation between languages whose word order differs greatly, such as English and Japanese, has been particularly difficult.

To solve this problem, a technique has been proposed that tolerates the ill-formed expressions of spoken Japanese (see, for example, Non-Patent Literature 1). It outputs incremental translation results that native speakers of Japanese can read without finding them unnatural.

Spoken language frequently contains ill-formed expressions not found in written language, such as repetitions, inverted word order, omissions, slips of the tongue, self-corrections, and hesitations. Humans have an advanced ability to understand utterances. They can accept such expressions and easily grasp what an utterance means even when it contains them.

The above technique therefore does not eliminate the ill-formed expressions of spoken Japanese. Instead, it actively tolerates them while incrementally translating input in the other language. With this configuration, the technique outputs translation results in real time.

Non-Patent Literature 1: Shigeki Matsubara, Satoru Asai, Katsuhiko Toyama, Yasuyoshi Inagaki, "An Incremental English-Japanese Spoken Language Translation Method Utilizing Ill-Formed Expressions" (不適格表現を活用する漸進的な英日話し言葉翻訳手法), IEEJ Transactions, Vol. 118-C, No. 1, pp. 71-78 (Jan. 1998).

However, a technique that tolerates ill-formed expressions outputs translation results even when they are not correct sentences by the standards of written language. These results may be acceptable when read in real time, but they are very difficult to read when reread later.

Accordingly, an object of the present invention is to provide a speech translation system, a speech translation method, and a recording medium that can output translation results in real time and can also output grammatically correct translation results.

To solve the problems described above, a speech translation system according to the present invention includes the following units:

- a speech recognition unit that recognizes input speech in predetermined units to generate character data;
- a recognition result concatenation unit that concatenates the character data generated by the speech recognition unit;
- a sentence determination unit that determines whether the character data concatenated by the recognition result concatenation unit form a sentence;
- a translation unit that translates the concatenated character data;
- an output unit that outputs the translation result from the translation unit.

When the sentence determination unit determines that concatenated character data do not form a sentence, the recognition result concatenation unit concatenates further character data to them.

A speech translation method according to the present invention includes the following steps:

- a speech recognition step of recognizing input speech in predetermined units to generate character data;
- a recognition result concatenation step of concatenating the character data generated in the speech recognition step;
- a sentence determination step of determining whether the character data concatenated in the recognition result concatenation step form a sentence;
- a translation step of translating the concatenated character data;
- an output step of outputting the translation result of the translation step.

When the sentence determination step determines that concatenated character data do not form a sentence, the recognition result concatenation step concatenates further character data to them.

A recording medium according to the present invention records a program that causes a computer to execute the following steps:

- a speech recognition step of recognizing input speech in predetermined units to generate character data;
- a recognition result concatenation step of concatenating the character data generated in the speech recognition step;
- a sentence determination step of determining whether the character data concatenated in the recognition result concatenation step form a sentence;
- a translation step of translating the concatenated character data;
- an output step of outputting the translation result of the translation step.

When the sentence determination step determines that concatenated character data do not form a sentence, the recognition result concatenation step concatenates further character data to them.

According to the present invention, the following processing is performed:

1. Input speech is recognized in predetermined units to generate character data.
2. The generated character data are concatenated.
3. The system determines whether the concatenated character data form a sentence.
4. If they do not, further character data are concatenated to them.
5. The concatenated character data are translated, and the translation results are output.

The output therefore includes translation results of concatenated character data determined to form sentences. Translation results can be output in real time, and grammatically correct translation results can also be output.

FIG. 1 is a block diagram showing the configuration of a speech translation system according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a speech translation system according to the second embodiment of the present invention.
FIG. 3 is a flowchart showing the operation of the speech translation system according to the second embodiment of the present invention.
FIG. 4 is a display example of character data and translation data on the output device.
FIG. 5 is a display example of character data and translation data on the output device.
FIG. 6 is a block diagram showing the configuration of a speech translation system according to the third embodiment of the present invention.
FIG. 7 is a display example of character data and translation data on the output device.
FIG. 8 is a display example of character data and translation data on the output device.
FIG. 9 is a display example of character data and translation data on the output device.

Embodiments of the present invention are described in detail below with reference to the drawings.

[First Embodiment]
First, the speech translation system according to the first embodiment of the present invention is described.

As shown in FIG. 1, the speech translation system 1 according to this embodiment includes the following units:

- a speech recognition unit 11 that recognizes input speech in predetermined units to generate character data;
- a recognition result concatenation unit 12 that concatenates the character data generated by the speech recognition unit 11;
- a sentence determination unit 13 that determines whether the character data concatenated by the recognition result concatenation unit 12 form a sentence;
- a translation unit 14 that translates the concatenated character data;
- an output unit 15 that outputs the translation result from the translation unit 14.

When the sentence determination unit 13 determines that concatenated character data do not form a sentence, the recognition result concatenation unit 12 concatenates further character data to them.

The speech translation system 1 consists of a computer and a program installed on it. The computer includes the following devices:

- an arithmetic device such as a CPU;
- a storage device such as memory or an HDD (Hard Disk Drive);
- an input device that detects information input from outside, such as a keyboard, mouse, pointing device, buttons, or touch panel;
- an I/F device that sends and receives information over a communication line such as a LAN (Local Area Network) or WAN (Wide Area Network);
- a display device such as a CRT (Cathode Ray Tube), LCD (Liquid Crystal Display), or FED (Field Emission Display).

The program controls these hardware resources. Hardware and software together realize the speech recognition unit 11, recognition result concatenation unit 12, sentence determination unit 13, translation unit 14, and output unit 15. The program may be provided on a recording medium such as a flexible disk, CD-ROM, DVD-ROM, memory card, or IC memory.

With this configuration, this embodiment performs the following processing:

1. It recognizes input speech in predetermined units to generate character data.
2. It concatenates the generated character data.
3. It determines whether the concatenated character data form a sentence.
4. If they do not, it concatenates further character data to them.
5. It translates the concatenated character data and outputs the translation results.

The output therefore includes translation results of concatenated character data determined to form sentences. Translation results can be output in real time, and grammatically correct translation results can also be output.

[Second Embodiment]
Next, the speech translation system according to the second embodiment of the present invention is described.

<Configuration of Speech Translation System>
As shown in FIG. 2, the speech translation system 2 according to this embodiment includes the following devices:

- an input device 21 that receives the user's speech;
- a data processing device 22 that translates the words the user utters into the input device 21;
- a data storage device 23 that stores data used for information processing in the data processing device 22;
- an output device 24 that outputs the results of that information processing.

≪Configuration of Input Device≫
The input device 21 is a known sound detection device, such as a microphone, that converts detected sound into an electric signal.

≪Configuration of Data Processing Device≫
The data processing device 22 is an information processing device. It processes the electric signal from the input device 21 and translates the words contained in the user's speech. The data processing device 22 includes a speech recognition unit 221, a first translation unit 222, a recognition result concatenation unit 223, a sentence determination unit 224, a second translation unit 225, and a translation result output unit 226.

The speech recognition unit 221 is a functional unit that performs speech recognition. It analyzes the electric signal from the input device 21 (hereinafter, speech data) and extracts the user's speech as character data. Recognition is performed for each predetermined unit. For example, the speech data can be divided at speech breaks, such as points where silence lasts 3 seconds or more.
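The 3-second silence criterion can be illustrated with a minimal sketch. It assumes the audio has already been reduced to one energy value per frame. The frame length, energy threshold, and the name `split_on_silence` are hypothetical; only the 3-second figure comes from the text above.

```python
from typing import List, Optional, Sequence, Tuple


def split_on_silence(
    energies: Sequence[float],
    frame_sec: float = 0.01,        # hypothetical 10 ms frames
    energy_threshold: float = 0.1,  # hypothetical voiced/silent threshold
    min_silence_sec: float = 3.0,   # silence length that ends a unit
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, of speech units
    separated by silences lasting at least min_silence_sec."""
    min_silent_frames = max(1, round(min_silence_sec / frame_sec))
    units: List[Tuple[int, int]] = []
    start: Optional[int] = None        # first voiced frame of the open unit
    last_voiced: Optional[int] = None  # most recent voiced frame of that unit
    for i, energy in enumerate(energies):
        if energy >= energy_threshold:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and last_voiced is not None:
            # Frames last_voiced+1 .. i are silent; close the unit once
            # the silent run reaches the required length.
            if i - last_voiced >= min_silent_frames:
                units.append((start, last_voiced + 1))
                start = None
    if start is not None and last_voiced is not None:
        units.append((start, last_voiced + 1))  # speech still open at the end
    return units
```

Each returned range would then be passed to the recognizer as one predetermined unit. A shorter pause, such as the two silent frames in `[1, 1, 0, 0, 1]` with 1-second frames, does not split a unit.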

The first translation unit 222 is a functional unit that translates the character data output from the speech recognition unit 221 using known machine translation technology.

The recognition result concatenation unit 223 is a functional unit. It concatenates the character data output from the speech recognition unit 221 in time order to form a character data string, and sends the string to the sentence determination unit 224.

The first character data from the speech recognition unit 221 are sent on their own as a character data string. The sentence determination unit 224 (described below) may determine that a string does not form a sentence. In that case, the character data that follow the string's last character data in time order are appended to it, and the result is sent as a new character data string.
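The buffering behavior just described can be modeled by a small class. This is a minimal sketch; the class and method names are hypothetical. Joining with a space is an assumption suited to English input. Japanese text would use an empty separator.

```python
from typing import List


class RecognitionResultConcatenator:
    """Hypothetical model of the recognition result concatenation unit 223."""

    def __init__(self, sep: str = " ") -> None:
        # sep=" " suits English; an empty string would suit Japanese text.
        self._sep = sep
        self._pending: List[str] = []  # units not yet judged to form a sentence

    def push(self, unit: str) -> str:
        """Append the next recognized unit and return the current string."""
        self._pending.append(unit)
        return self._sep.join(self._pending)

    def mark_sentence(self) -> None:
        """Call after the current string was judged to form a sentence;
        the next unit then starts a new string."""
        self._pending.clear()
```

On the first call, `push` returns the unit alone. Until `mark_sentence` is called, each later call returns the growing string, which the sentence determination unit would judge again.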

The sentence determination unit 224 is a functional unit. It determines whether the character data string from the recognition result concatenation unit 223 forms a sentence. The determination is based on a sentence determination model 231 (described below) stored in the data storage device 23.

The second translation unit 225 is a functional unit that translates character data strings that the sentence determination unit 224 has determined form a sentence. It uses known machine translation technology.

The translation result output unit 226 is a functional unit that sends two results to the output device 24. These are the translation result of the first translation unit 222 (hereinafter, the first translation result) and that of the second translation unit 225 (hereinafter, the second translation result).

The data processing device 22 consists of a computer and a program installed on it. The computer includes the following devices:

- an arithmetic device such as a CPU;
- a storage device such as memory or an HDD;
- an input device that detects information input from outside, such as a keyboard, mouse, pointing device, buttons, or touch panel;
- an I/F device that sends and receives information over a communication line such as a LAN or WAN;
- a display device such as a CRT, LCD, or FED.

The program controls these hardware resources. Hardware and software together realize the speech recognition unit 221, first translation unit 222, recognition result concatenation unit 223, sentence determination unit 224, second translation unit 225, and translation result output unit 226. The program may be provided on a recording medium such as a flexible disk, CD-ROM, DVD-ROM, memory card, or IC memory.

≪Configuration of Data Storage Device≫
The data storage device 23 is a known magnetic storage device. It stores a sentence determination model 231, which holds the information used to determine whether character data form a sentence.

The sentence determination model 231 is, for example, a database containing many models learned with an N-gram model. In a database learned with a 3-gram model (N = 3), each model is a combination of three character strings or words. The sentence determination model 231 is built in advance.
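The text does not say exactly how the N-gram database is consulted. One plausible reading is consistent with the worked example later in this section. Under that reading, a string forms a sentence when its last N−1 words, followed by an end-of-sentence marker, appear as an N-gram in the model. The marker `</s>`, the function names, and the toy training sentence below are assumptions.

```python
from typing import Iterable, Set, Tuple

EOS = "</s>"  # hypothetical end-of-sentence token
NGram = Tuple[str, ...]


def build_model(sentences: Iterable[str], n: int = 3) -> Set[NGram]:
    """Collect every n-gram (including the end-of-sentence token) from
    training sentences. Sentences shorter than n tokens contribute nothing
    in this simplified version."""
    model: Set[NGram] = set()
    for sentence in sentences:
        words = sentence.split() + [EOS]
        for i in range(len(words) - n + 1):
            model.add(tuple(words[i:i + n]))
    return model


def is_sentence(text: str, model: Set[NGram], n: int = 3) -> bool:
    """Judge text a sentence if its last n-1 words followed by EOS form
    an n-gram present in the model."""
    words = text.split()
    if len(words) < n - 1:
        return False
    tail = words[len(words) - (n - 1):]  # last n-1 words (empty when n == 1)
    return tuple(tail + [EOS]) in model
```

The training text yields the trigram ("university", "dormitory", `</s>`) but no trigram ("I", "live", `</s>`). "I live" is therefore rejected and "I live in a university dormitory" is accepted. The parameter `n` lets the same check be used with values of N other than 3.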

≪Configuration of Output Device≫
The output device 24 is a known display device such as a CRT, LCD, or FED, or a known audio output device such as a speaker. It outputs the translation results it receives from the translation result output unit 226 of the data processing device 22.

The input device 21, data processing device 22, data storage device 23, and output device 24 of the speech translation system 2 may also be integrated into a single computer with a program installed on it.

<Operation of Speech Translation System>
Next, the operation of the speech translation system 2 according to this embodiment is described with reference to FIG. 3.

First, when the user speaks, the input device 21 converts the speech into an electric signal (speech data). It passes the signal to the speech recognition unit 221 of the data processing device 22 (step S1).

The speech recognition unit 221 performs speech recognition on the incoming speech data and generates character data for each predetermined unit (step S2).

When character data for a predetermined unit have been generated, the first translation unit 222 translates them (step S3). The translated character data (the first translation result) are sent to the translation result output unit 226, which then performs step S7, described below.

Meanwhile, the recognition result concatenation unit 223 concatenates the character data from the speech recognition unit 221 in time order to form a character data string (step S4).

The sentence determination unit 224 then determines whether the character data string forms a sentence. It makes this determination using the sentence determination model 231 in the data storage device 23 (step S5).

If the string does not form a sentence (step S5: NO), the sentence determination unit 224 returns it to the recognition result concatenation unit 223. The concatenation unit then appends the next character data in time order to the end of the string, forming a new character data string.

If the string forms a sentence (step S5: YES), the second translation unit 225 translates it (step S6). The translated string (the second translation result) is sent to the translation result output unit 226.

After the first translation unit 222 and the second translation unit 225 have produced their results, the translation result output unit 226 sends both results to the output device 24. The output device 24 shows them on its display screen or plays them through its speaker (step S7).

This series of operations repeats until the input device 21 receives no more speech.
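Steps S1 to S7 can be sketched as a single loop that fills the two output areas of FIGS. 4 and 5. This is a sketch under stated assumptions. Recognition (S1 and S2) is abstracted as an iterable of units that have already been recognized. `translate` and `is_sentence` are hypothetical stand-ins for a machine translation engine and the sentence determination model 231. The same translator serves both translation units only to keep the sketch short.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (source text, translation)


def run_speech_translation(
    units: Iterable[str],
    translate: Callable[[str], str],
    is_sentence: Callable[[str], bool],
) -> Tuple[List[Pair], List[Pair]]:
    """Return (sequential_area, sentence_area) as lists of (source, translation)."""
    sequential_area: List[Pair] = []
    sentence_area: List[Pair] = []
    pending = ""  # concatenated units not yet judged to form a sentence
    for unit in units:                                   # S1-S2: each recognized unit
        sequential_area.append((unit, translate(unit)))  # S3: first translation
        pending = f"{pending} {unit}".strip()            # S4: concatenation
        if is_sentence(pending):                         # S5: YES
            sentence_area.append((pending, translate(pending)))  # S6
            pending = ""
        # S5: NO -> keep pending and wait for the next unit
    return sequential_area, sentence_area  # S7: both areas go to the output device


# Toy data replaying the worked example below (hypothetical translator and judge).
table = {
    "I live": "私は住んでいます",
    "in a university dormitory": "大学の寮で",
    "I live in a university dormitory": "私は大学の寮に住んでいます",
}
seq, sent = run_speech_translation(
    ["I live", "in a university dormitory"],
    translate=lambda s: table.get(s, "?"),
    is_sentence=lambda s: s.endswith("dormitory"),
)
```

After both units are processed, `seq` holds one entry per recognized unit. `sent` holds only the completed sentence, matching the sequential and sentence output areas of FIG. 5.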

As described above, this embodiment translates the character data of each recognized unit as it arrives, so the recognition result for each unit can be understood in real time. It also concatenates the recognized character data and checks whether they form a sentence. Only strings determined to form sentences are translated and output as second results. This produces translation results that are easy to read when reviewed later.

<Operation Example of Speech Translation System>
For example, suppose the speech "I live" is input to the input device 21. The speech recognition unit 221 detects that no speech immediately follows the utterance "I live" and outputs the character data "I live".

The first translation unit 222 translates these character data and sends the first translation result 「私は住んでいます」 ("I live") to the translation result output unit 226. As shown in FIG. 4, the output device 24 (a display) then shows the character data "I live" and its first translation result 「私は住んでいます」 in its sequential output area.

Meanwhile, because "I live" is the first speech, the recognition result concatenation unit 223 sends "I live" unchanged to the sentence determination unit 224. The sentence determination unit 224 uses the sentence determination model 231 to determine whether the string "I live" forms a sentence.

Assume that the sentence determination model 231 is a database learned with a 3-gram model (N = 3). Assume also that this database contains no model made up of "I", "live", and an end-of-sentence marker. Then nothing in the model allows an end-of-sentence to follow "I live". The sentence determination unit 224 therefore judges that the string does not form a sentence and does not send it to the second translation unit 225. As shown in FIG. 4, nothing is displayed in the sentence output area of the output device 24.

Next, suppose further speech is input to the input device 21, and the speech recognition unit 221 outputs the character data "in a university dormitory".

The first translation unit 222 translates these character data and sends the first translation result 「大学の寮で」 ("in the university dormitory") to the translation result output unit 226. As shown in FIG. 5, the sequential output area of the output device 24 then shows "in a university dormitory" and 「大学の寮で」. They appear below the earlier "I live" / 「私は住んでいます」 entry.

Meanwhile, the recognition result concatenation unit 223 joins two pieces of text. The first is the string "I live", which the sentence determination unit 224 did not accept as a sentence in the previous round. The second is the new character data "in a university dormitory". Together they form the string "I live in a university dormitory".

Suppose the sentence determination model 231 contains a model made up of "university", "dormitory", and an end-of-sentence marker. The sentence determination unit 224 then judges that the string forms a sentence and sends it to the second translation unit 225. The second translation unit 225 translates it and sends the second translation result 「私は大学の寮に住んでいます」 ("I live in a university dormitory") to the translation result output unit 226. As shown in FIG. 5, the sentence output area of the output device 24 then shows the string "I live in a university dormitory" and its second translation result 「私は大学の寮に住んでいます」.

Thus, in this embodiment, each recognized unit of character data is translated as it arrives, so the speech recognition results can be understood in real time. The character data are also concatenated and checked for sentence completeness, and translation is performed on whole sentences. The content can therefore also be understood in coherent units.

In this embodiment, the sentence determination model 231 has been described as an N-gram model with N = 3. However, the value of N is not limited to 3 and may be set freely as appropriate.

This embodiment has used an N-gram model as the sentence determination model 231, but the sentence determination model 231 is not limited to an N-gram model. Various other models may be applied as appropriate, for example a syntactic-analysis model such as the CFG (context-free grammar) rules used to parse character strings. Because the verb “live” usually takes a subject and a prepositional phrase, such a model would lead the sentence determination unit 224 to reject “I live” alone as a sentence and to accept the string as a sentence only once it becomes “I live in a university dormitory”.
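To illustrate the CFG-based alternative, here is a toy recursive-descent recognizer. It uses a hypothetical grammar in which "live" must take a prepositional phrase (S -> NP VP, VP -> VerbPP PP | VerbTr NP, PP -> Prep NP, NP -> Pron | Det Noun+). The grammar, lexicon, and function names are assumptions made for this sketch only. A practical system would use a broad-coverage grammar and parser.

```python
from typing import List, Set

# Hypothetical toy lexicon; not taken from the patent.
LEXICON = {
    "Pron": {"I", "you"},
    "Det": {"a", "the"},
    "Noun": {"university", "dormitory", "library"},
    "VerbPP": {"live"},  # verbs that require a prepositional phrase
    "VerbTr": {"use"},   # transitive verbs that require an object NP
    "Prep": {"in", "at"},
}


def _word(tokens: List[str], i: int, cat: str) -> Set[int]:
    """Match one word of category `cat` at position i; return the end position(s)."""
    return {i + 1} if i < len(tokens) and tokens[i] in LEXICON[cat] else set()


def _np(tokens: List[str], i: int) -> Set[int]:
    """NP -> Pron | Det Noun+ ; every position where an NP starting at i can end."""
    ends = _word(tokens, i, "Pron")
    for j in _word(tokens, i, "Det"):
        k = j
        while k < len(tokens) and tokens[k] in LEXICON["Noun"]:
            k += 1
            ends.add(k)
    return ends


def _pp(tokens: List[str], i: int) -> Set[int]:
    """PP -> Prep NP"""
    return {k for j in _word(tokens, i, "Prep") for k in _np(tokens, j)}


def _vp(tokens: List[str], i: int) -> Set[int]:
    """VP -> VerbPP PP | VerbTr NP"""
    ends = {k for j in _word(tokens, i, "VerbPP") for k in _pp(tokens, j)}
    ends |= {k for j in _word(tokens, i, "VerbTr") for k in _np(tokens, j)}
    return ends


def parses_as_sentence(text: str) -> bool:
    """S -> NP VP, where the whole token sequence must be consumed."""
    tokens = text.split()
    return any(len(tokens) in _vp(tokens, j) for j in _np(tokens, 0))


print(parses_as_sentence("I live"))                            # False: "live" lacks its PP
print(parses_as_sentence("I live in a university dormitory"))  # True
```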

This embodiment has described the case where the speech recognition unit 221 divides the speech data at pauses in speech to generate character data in predetermined units. The positions at which the speech data is divided are not limited to pauses, however. They may be set freely as appropriate, for example on the basis of pitch, loudness, or a specific sound.
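Pause-based segmentation, the default case this paragraph generalizes, can be sketched with a threshold over per-frame features such as energy. The frame values, the threshold, and `min_silence` are hypothetical parameters. The boundary test is passed in as a predicate, so a cue based on pitch or on a specific sound could replace it, as the paragraph allows.

```python
from typing import Callable, List, Optional, Sequence


def segment_frames(
    features: Sequence[float],
    is_silent: Callable[[float], bool],
    min_silence: int = 3,
) -> List[range]:
    """Split per-frame features into speech segments.

    A segment ends once `min_silence` consecutive frames satisfy `is_silent`.
    Returns frame-index ranges that cover the speech frames of each segment.
    """
    segments: List[range] = []
    start: Optional[int] = None  # first speech frame of the current segment
    last_speech = -1             # most recent speech frame seen
    silence_run = 0
    for i, value in enumerate(features):
        if is_silent(value):
            silence_run += 1
            if start is not None and silence_run >= min_silence:
                segments.append(range(start, last_speech + 1))
                start = None
        else:
            silence_run = 0
            last_speech = i
            if start is None:
                start = i
    if start is not None:  # flush a segment still open at end of input
        segments.append(range(start, last_speech + 1))
    return segments


# Hypothetical frame energies: a long pause after frame 1, a short one at frame 7.
energies = [0.9, 0.8, 0.0, 0.0, 0.0, 0.7, 0.6, 0.0, 0.5]
print(segment_frames(energies, lambda e: e < 0.1))  # [range(0, 2), range(5, 9)]
```

The short pause at frame 7 does not split the second segment because it is shorter than `min_silence`.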

This embodiment has described the case where the sentence determination unit 224 judges whether the character data forms a sentence in the units recognized by the speech recognition unit 221. The unit may instead judge that only part of the character data, up to some point, forms a sentence. For example, if the first utterance is “I live” and the second is “in a university dormitory can I use”, the portion up to “I live in a university dormitory” may be judged to be a sentence, and the remaining “can I use” may be judged to be part of the next sentence. In this case, the sentence determination model 231 can make this judgment more accurately by using the N-gram model or the syntactic-analysis results described above.
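One way to realize this mid-utterance split is to scan the concatenated words for a boundary where the preceding words plausibly end a sentence and the following words plausibly begin one. The sketch below assumes a trigram set with hypothetical `<s>` and `</s>` boundary tokens, built from toy training sentences, and returns the first such split. It illustrates the idea and is not the method the patent claims.

```python
from typing import Iterable, Optional, Set, Tuple

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary tokens
Trigrams = Set[Tuple[str, str, str]]


def build_trigrams(sentences: Iterable[str]) -> Trigrams:
    """Collect word trigrams, padding each example sentence with BOS/EOS."""
    model: Trigrams = set()
    for s in sentences:
        w = [BOS] + s.split() + [EOS]
        model.update(zip(w, w[1:], w[2:]))
    return model


def split_at_sentence_end(text: str, model: Trigrams) -> Optional[Tuple[str, str]]:
    """Return (sentence, remainder) for the first boundary the model supports, else None."""
    words = text.split()
    for k in range(2, len(words) + 1):
        # Do the two words before position k plausibly end a sentence?
        if (words[k - 2], words[k - 1], EOS) not in model:
            continue
        rest = words[k:]
        # The remainder must be empty or plausibly start a new sentence.
        second = rest[1] if len(rest) > 1 else EOS
        if not rest or (BOS, rest[0], second) in model:
            return " ".join(words[:k]), " ".join(rest)
    return None


model = build_trigrams(["I live in a university dormitory", "can I use the library"])
print(split_at_sentence_end("I live in a university dormitory can I use", model))
# ('I live in a university dormitory', 'can I use')
```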

This embodiment has also described the case where the sentence determination unit 224 judges whether the character data received so far forms a sentence. When that judgment is difficult, the unit may defer it until the character data of the next utterance has been received. For example, if the first utterance “I live” alone is not enough to decide whether a sentence is formed, the unit may provisionally judge that it is not. It then waits for the next utterance, “in a university dormitory can I use”, and determines that “I live” is not a sentence on its own but that the portion up to “I live in a university dormitory” is.
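The concatenate-and-defer behavior can be sketched as a small buffer. This behavior is also the core of the recognition result concatenation unit in the claims. Each recognized chunk is appended to the text carried over from earlier chunks, any completed sentence is emitted, and the undecided remainder is kept for the next utterance. The class name and the pluggable `split_fn` are hypothetical. `split_fn` stands in for whatever sentence determination model is used, and the `toy_split` stub below is purely illustrative.

```python
from typing import Callable, List, Optional, Tuple

# Returns (sentence, remainder), or None when no sentence can be determined yet.
SplitFn = Callable[[str], Optional[Tuple[str, str]]]


class RecognitionResultConcatenator:
    """Carries undecided text forward until a sentence can be determined (hypothetical sketch)."""

    def __init__(self, split_fn: SplitFn) -> None:
        self.split_fn = split_fn
        self.pending = ""  # text not yet judged to form a sentence

    def feed(self, chunk: str) -> List[str]:
        """Concatenate a newly recognized chunk and return any sentences it completes."""
        self.pending = f"{self.pending} {chunk}".strip()
        sentences: List[str] = []
        while self.pending:
            result = self.split_fn(self.pending)
            if result is None:
                break  # defer the judgment until the next utterance arrives
            sentence, remainder = result
            if not sentence:
                break  # no progress; avoid looping forever
            sentences.append(sentence)
            self.pending = remainder
        return sentences


def toy_split(text: str) -> Optional[Tuple[str, str]]:
    """Hypothetical stand-in for the sentence determination model 231."""
    words = text.split()
    if "dormitory" in words:
        k = words.index("dormitory") + 1
        return " ".join(words[:k]), " ".join(words[k:])
    return None


c = RecognitionResultConcatenator(toy_split)
print(c.feed("I live"))                               # [] : judgment deferred
print(c.feed("in a university dormitory can I use"))  # ['I live in a university dormitory']
print(c.pending)                                      # 'can I use'
```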

This embodiment has described the case where the output device 24 consists of a display. Alternatively, speech synthesis may be applied to, for example, only the first translation results displayed in the sequential output area, and the synthesized speech may be output from a speaker.

[Third Embodiment]
Next, a third embodiment of the present invention will be described. Components of this embodiment that are equivalent to those of the second embodiment described above are given the same names and reference signs, and their description is omitted where appropriate.

<Configuration of Speech Translation System 3>
As shown in FIG. 6, the speech translation system 3 according to this embodiment includes the following components:
- an input device 21 into which the user's speech is input;
- a data processing device 22 that translates the words the user utters from the speech input to the input device 21;
- a data storage device 23 that stores the data used for information processing in the data processing device 22; and
- an output device 24 that outputs the results of the information processing performed by the data processing device 22.

The data processing device 22 is an information processing device. It processes the electrical signal input from the input device 21 and, from the user's speech, translates the words the user has uttered. The data processing device 22 includes a speech recognition unit 221, a recognition result concatenation unit 223, a sentence determination unit 224, a translation unit 301, and a display control unit 302.

Here, the sentence determination unit 224 is a functional unit that determines whether the character data string output from the recognition result concatenation unit 223 forms a sentence. It makes this determination on the basis of the sentence determination model 231 stored in the data storage device 23. The determination result of the sentence determination unit 224 is sent to the translation unit 301 and the display control unit 302.

The translation unit 301 is a functional unit that translates the character data string judged by the sentence determination unit 224. When the concatenated character data string forms a sentence, the translation unit 301 translates the entire string. When it does not form a sentence, the translation unit 301 still translates the string. The translation result is sent to the display control unit 302.

The display control unit 302 displays the character data string and the translation result on the output device 24, which consists of a display. It also controls where on the output device 24 the character data string and the translation result are displayed.

In the speech translation system 3 configured in this way, when a concatenated character data string is judged to form a sentence, the string and its translation result are displayed at the corresponding display position on the output device 24. The display position is then moved, and the remaining character data string and its translation result are displayed.
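Under simplifying assumptions, the behavior of the display control unit 302 can be modeled as a list of text rows. The current row is overwritten while its text is still undecided. Once a sentence is confirmed, the row is fixed and the position advances to the next line. The `Row` and `DisplayController` names and the `translate` callback are hypothetical. The sketch also ignores screen geometry, such as the separate lower-left area shown in FIG. 8.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Row:
    source: str       # recognized character data string
    translation: str  # its translation result


class DisplayController:
    """Hypothetical row-based model of the display control described above."""

    def __init__(self, translate: Callable[[str], str]) -> None:
        self.translate = translate
        self.rows: List[Row] = []
        self.cursor = 0  # index of the row currently being (re)written

    def show(self, text: str, is_sentence: bool) -> None:
        """Write text and its translation at the current position; advance if it is a sentence."""
        row = Row(text, self.translate(text))
        if self.cursor < len(self.rows):
            self.rows[self.cursor] = row  # overwrite the provisional row
        else:
            self.rows.append(row)
        if is_sentence:
            self.cursor += 1  # fix this row and move to the next line


# Toy translation table (hypothetical) mirroring the operation example.
table = {
    "I live": "私は住んでいます",
    "I live in a university dormitory": "私は大学の寮に住んでいます",
    "can I use": "私は使えますか",
}
d = DisplayController(lambda t: table.get(t, t))
d.show("I live", is_sentence=False)
d.show("I live in a university dormitory", is_sentence=True)
d.show("can I use", is_sentence=False)
for r in d.rows:
    print(r.source, "/", r.translation)
```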

<Operation Example of the Speech Translation System>
Next, an operation example of the speech translation system 3 according to this embodiment will be described.

As in the second embodiment, suppose the utterance “I live” is input to the input device 21. The speech recognition unit 221 detects that no speech follows immediately after “I live” and outputs the character data “I live”.

Since this is the first utterance, the recognition result concatenation unit 223 sends the character data “I live” to the sentence determination unit 224 as the character data string. As in the second embodiment, the sentence determination unit 224 uses the sentence determination model 231 to judge whether the received string “I live” forms a sentence, and determines that it does not.

The translation unit 301 then translates the non-sentence string “I live” literally and outputs the translation result 「私は住んでいます」 (“I live”) to the display control unit 302. As shown in FIG. 7, the display control unit 302 displays the character data string “I live” and the translation result 「私は住んでいます」 in the upper-left area of the output device 24, which is set as the first display position. Because the sentence determination unit 224 has judged that the string does not form a sentence, the display control unit 302 keeps the display position of the string and its translation fixed at this upper-left area.

In this state, suppose the speech “in a university dormitory can I use” is next input to the input device 21. The speech recognition unit 221 detects that no speech data follows immediately after this utterance and outputs the character data “in a university dormitory can I use”.

The recognition result concatenation unit 223 takes the string “I live”, which the sentence determination unit 224 did not judge to be a sentence in the previous round, and appends the newly input character data “in a university dormitory can I use”. This produces the string “I live in a university dormitory can I use”. Suppose that, as in the second embodiment, the sentence determination unit 224 matches this string against the 3-gram model. It judges that the portion “I live in a university dormitory” forms a sentence and that the remaining portion “can I use” does not.

Suppose the translation unit 301 translates the portion “I live in a university dormitory”, which the sentence determination unit 224 judged to form a sentence, as 「私は大学の寮に住んでいます」 (“I live in a university dormitory”). It translates the remaining character data “can I use” as 「私は使えますか」 (“Can I use [it]?”).

As shown in FIG. 8, the display control unit 302 displays the string “I live in a university dormitory”, judged to form a sentence, and its translation 「私は大学の寮に住んでいます」 in the upper-left area of the output device 24. In other words, the new string and its translation overwrite the area where the string “I live” and the translation 「私は住んでいます」 were previously displayed.

Because the sentence determination unit 224 has judged that a sentence is formed, the display control unit 302 also moves the display position to the next line.
As shown in FIG. 8, the string “can I use”, judged not to form a sentence, and its translation 「私は使えますか」 are displayed in the lower-left area of the output device 24. They are not displayed at the display position described above, that is, on the next line.

By repeating this processing, the character data strings recognized in predetermined units are translated sequentially on a single display, so the speech recognition results can be understood in near real time. When the user reviews the display later, it shows the translation results of the portions that were concatenated and judged to form sentences. The content can therefore also be understood sentence by sentence.

As described above, in this embodiment a character data string judged not to form a sentence is displayed with its translation as it arrives, which allows near-real-time understanding. For a character data string judged to form a sentence, the sentence-level translation result is displayed. A single display screen thus provides both a real-time translation while the user is speaking and a sentence-by-sentence translation for later review, which makes the sentences easier to understand.

Suppose the translation unit 301 translates “I live in a university dormitory”, which the sentence determination unit 224 judged to form a sentence, as 「私は大学の寮に住んでいます」. The display control unit 302 may then compare the translation result currently shown at the display position, 「私は住んでいます」, with the new translation result, 「私は大学の寮に住んでいます」, and extract the part of the translation that has changed. As shown in FIG. 9, it may then underline the changed part or change its color. This makes the sequentially added character data easy to spot and, as a result, makes the output easier to follow in real time.
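The comparison step can be sketched with Python's standard `difflib.SequenceMatcher`. The function name and the `<u>`/`</u>` markers, which stand in for underlining or a color change, are hypothetical. The patent does not specify a matching algorithm. Comparing character by character suits Japanese output, which has no spaces between words.

```python
import difflib


def highlight_changes(old: str, new: str, start: str = "<u>", end: str = "</u>") -> str:
    """Wrap the parts of `new` that are not present in `old` with emphasis markers."""
    matcher = difflib.SequenceMatcher(None, old, new)
    parts = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        segment = new[j1:j2]
        if tag == "equal":
            parts.append(segment)
        elif tag in ("insert", "replace"):
            parts.append(f"{start}{segment}{end}")
        # "delete": text that exists only in the old result; nothing to show
    return "".join(parts)


print(highlight_changes("私は住んでいます", "私は大学の寮に住んでいます"))
# 私は<u>大学の寮に</u>住んでいます
```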

It should be noted that the present invention is not limited to the first to third embodiments described above.

This application claims priority based on Japanese Patent Application No. 2009-216803, filed on September 18, 2009, the entire disclosure of which is incorporated herein.

The present invention is applicable to, for example, a speech recognition device that generates character strings from speech and a speech translation device that translates such character strings.

Reference Signs List: 1 … speech translation system, 11 … speech recognition unit, 12 … recognition result concatenation unit, 13 … sentence determination unit, 14 … translation unit, 15 … output unit.

To solve the problems described above, a speech translation system according to the present invention includes:
- a speech recognition unit that recognizes input speech in predetermined units to generate character data;
- a recognition result concatenation unit that concatenates the character data generated by the speech recognition unit;
- a sentence determination unit that determines whether the character data concatenated by the recognition result concatenation unit contains a sentence;
- a translation unit that translates the concatenated character data; and
- an output unit that outputs the translation result of the translation unit.

The recognition result concatenation unit further concatenates character data to concatenated character data that the sentence determination unit has determined does not contain a sentence.

A speech translation method according to the present invention includes:
- a speech recognition step of recognizing input speech in predetermined units to generate character data;
- a recognition result concatenation step of concatenating the character data generated in the speech recognition step;
- a sentence determination step of determining whether the character data concatenated in the recognition result concatenation step contains a sentence;
- a translation step of translating the concatenated character data; and
- an output step of outputting the translation result of the translation step.

In the recognition result concatenation step, character data is further concatenated to concatenated character data that the sentence determination step has determined does not contain a sentence.

A recording medium according to the present invention records a program that causes a computer to execute:
- a speech recognition step of recognizing input speech in predetermined units to generate character data;
- a recognition result concatenation step of concatenating the character data generated in the speech recognition step;
- a sentence determination step of determining whether the character data concatenated in the recognition result concatenation step contains a sentence;
- a translation step of translating the concatenated character data; and
- an output step of outputting the translation result of the translation step.

In the recognition result concatenation step, character data is further concatenated to concatenated character data that the sentence determination step has determined does not contain a sentence.

Claims

A speech recognition unit that recognizes input speech in predetermined units to generate character data;
A recognition result concatenation unit that concatenates the character data generated by the speech recognition unit;
A sentence determination unit that determines whether the character data concatenated by the recognition result concatenation unit forms a sentence;
A translation unit that translates the concatenated character data; and
An output unit that outputs the translation result of the translation unit;
A speech translation system comprising the above units, wherein the recognition result concatenation unit further concatenates character data to the concatenated character data that the sentence determination unit has determined does not form a sentence.

The speech translation system according to claim 1, wherein the translation unit comprises:
A first translation unit that translates the character data generated by the speech recognition unit, one predetermined unit at a time; and
A second translation unit that translates the concatenated character data that the sentence determination unit has determined to form a sentence;
And wherein the output unit outputs the translation result of the first translation unit and the translation result of the second translation unit.

The speech translation system according to claim 1, wherein the output unit includes a display screen and displays the concatenated character data and the translation result of the translation unit at a predetermined position on the display screen. When the sentence determination unit determines that a sentence is formed, the output unit displays the character data following the concatenated character data, together with the translation result of the translation unit for it, at a position on the display screen different from the predetermined position.

The speech translation system according to claim 3, wherein the output unit compares the translation result of the translation unit displayed at the predetermined position with the translation result of the translation unit to be displayed next at the predetermined position. It then displays the next translation result at the predetermined position with the difference detected by the comparison emphasized.

A speech recognition step of recognizing input speech in predetermined units to generate character data;
A recognition result concatenation step of concatenating the character data generated in the speech recognition step;
A sentence determination step of determining whether the character data concatenated in the recognition result concatenation step forms a sentence;
A translation step of translating the concatenated character data; and
An output step of outputting the translation result of the translation step;
A speech translation method comprising the above steps, wherein the recognition result concatenation step further concatenates character data to the concatenated character data that the sentence determination step has determined does not form a sentence.

A program that causes a computer to execute:
A speech recognition step of recognizing input speech in predetermined units to generate character data;
A recognition result concatenation step of concatenating the character data generated in the speech recognition step;
A sentence determination step of determining whether the character data concatenated in the recognition result concatenation step forms a sentence;
A translation step of translating the concatenated character data; and
An output step of outputting the translation result of the translation step;
A recording medium on which the program is recorded, wherein the recognition result concatenation step further concatenates character data to the concatenated character data that the sentence determination step has determined does not form a sentence.