JP2009075263A

JP2009075263A - Voice recognition device and computer program

Info

Publication number: JP2009075263A
Application number: JP2007242927A
Authority: JP
Inventors: Toshiki Endo; 俊樹遠藤; Masaki Naito; 正樹内藤; Hisashi Kawai; 恒河井
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2007-09-19
Filing date: 2007-09-19
Publication date: 2009-04-09

Abstract

PROBLEM TO BE SOLVED: To reduce the number of editing operations a user must perform when candidate words derived from the voice recognition result of an input voice are displayed on a screen and the user edits the displayed candidate words.

SOLUTION: The device includes a voice recognition part 13 for performing processing to recognize an input voice and generating a recognition result composed of a string of recognized words; a candidate word generation part 16 for generating a candidate word from the recognition result; a candidate word editing-display part 17 for displaying the candidate word on a screen and updating the recognition result according to the editing content by the user; and an editing operation part 18 with which the user edits the candidate word displayed on the screen. The device has a word connection rule for connecting a plurality of continuous words. The candidate word generation part 16 connects a plurality of continuous words contained in the recognition result according to the word connection rule, and takes the resulting connected word string as the candidate word.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a speech recognition apparatus and a computer program.

Conventionally, in speech recognition using a computer, it is difficult to achieve a recognition rate of 100% because of influences such as the speaker's manner of utterance and background noise at the time of speech input. For this reason, the speech recognition apparatus described in Patent Literature 1, for example, compares each of a plurality of words included in the input speech with a plurality of words stored in advance in a dictionary, takes the word with the highest competition probability among the competing candidates as the speech recognition result, and displays the speech recognition result on the screen as a word string of a plurality of words. It then selects, from the competing candidates, one or more competing words whose competition probability is close to that of the word with the highest competition probability, and displays them on the screen adjacent to the corresponding highest-probability word. In response to a manual operation by the user, an appropriate correction word is selected from the one or more competing words displayed on the screen, and the selected competing word replaces the highest-probability word in the speech recognition result.

Patent Literature 1: JP 2006-146008 A

However, in the conventional speech recognition apparatus described above, the candidate words displayed on the screen from the speech recognition result are in units of single words (see FIGS. 2, 19, and 20 of Patent Literature 1). Consequently, when, for example, the sentence "今日の午後３時に会議です" ("There is a meeting at 3 pm today") is input by voice, candidate words consisting only of a particle such as "の" (no) or "に" (ni), or only of a prefix such as "ご" (go), are displayed. When editing the text on the screen by selecting the correct answer from the displayed candidate words, deleting a candidate word, or changing a candidate word to a new one, the user must therefore confirm, delete, or change each word that is meaningless on its own, such as a particle or prefix, one at a time, which takes time and effort. In particular, when speech recognition is performed on long text such as e-mail, the number of candidate-word editing operations grows and usability suffers.

The present invention has been made in view of these circumstances. Its object is to provide a speech recognition apparatus and a computer program that can reduce the number of operations a user performs to edit candidate words when candidate words derived from the speech recognition result of input speech are displayed on a screen and the user edits the displayed candidate words.

In order to solve the above problem, a speech recognition apparatus according to the present invention comprises: speech recognition means for performing processing to recognize input speech and generating a recognition result composed of a string of recognized words; candidate word generation means for generating candidate words from the recognition result; candidate word display means for displaying the candidate words on a screen; editing operation means for the user to edit the candidate words displayed on the screen; and update means for updating the recognition result according to the content of the user's edits. A word connection rule for connecting a plurality of consecutive words is provided, and the candidate word generation means connects a plurality of consecutive words included in the recognition result according to the word connection rule and takes the connected word string as a candidate word.

In the speech recognition apparatus according to the present invention, the candidate word generation means includes word connection determination means for determining, according to the word connection rule, whether a plurality of consecutive words included in the recognition result can be connected, and word connection means for connecting a plurality of consecutive words included in the recognition result according to the determination result.

In the speech recognition apparatus according to the present invention, the word connection rule defines combinations of a plurality of connectable words and the order of the words in each combination.

The speech recognition apparatus according to the present invention further comprises language probability storage means for storing a language probability for each combination of connectable words under the word connection rule and the order of the combined words. The word connection determination means determines that the relevant words can be connected when the language probability for the combination of connectable words and their order under the word connection rule is equal to or greater than a threshold.

The speech recognition apparatus according to the present invention further comprises threshold specifying means for the user to specify the threshold.

In the speech recognition apparatus according to the present invention, the word connection rule defines combinations of the part-of-speech types of a plurality of connectable words and the order of the part-of-speech types in each combination.

The speech recognition apparatus according to the present invention further comprises part-of-speech connection cost storage means for storing a part-of-speech connection cost for each combination of connectable part-of-speech types under the word connection rule and the order of the combined part-of-speech types. The word connection determination means determines that the relevant words can be connected when the part-of-speech connection cost for the combination of part-of-speech types of the words and their order under the word connection rule is equal to or less than a threshold.

A computer program according to the present invention causes a computer to realize: a speech recognition function of performing processing to recognize input speech and generating a recognition result composed of a string of recognized words; a candidate word generation function of generating candidate words from the recognition result; a candidate word display function of displaying the candidate words on a screen; an editing operation function for the user to edit the candidate words displayed on the screen; and an update function of updating the recognition result according to the content of the user's edits. A word connection rule for connecting a plurality of consecutive words is provided, and the candidate word generation function connects a plurality of consecutive words included in the recognition result according to the word connection rule and takes the connected word string as a candidate word.
As a result, the speech recognition apparatus described above can be realized using a computer.

According to the present invention, candidate words are created not only in word units but also in clause (bunsetsu), phrase, and sentence units, so the user can edit candidate words in clause, phrase, and sentence units as well as in word units. This reduces the number of operations the user performs to edit candidate words.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing the overall configuration of a speech recognition apparatus 1 according to an embodiment of the present invention. In FIG. 1, the speech recognition apparatus 1 includes a speech input unit 11, an acoustic feature extraction unit 12, a speech recognition unit 13, an acoustic model storage unit 14, a language model storage unit 15, a candidate word generation unit 16, a candidate word editing/display unit 17, and an editing operation unit 18.

The speech input unit 11 includes a microphone, an amplifier, an analog-to-digital converter (AD converter), and the like. The speech input unit 11 picks up the speech uttered by the user with the microphone, amplifies the input analog speech signal to an appropriate level, and converts it into digital speech data. This speech data is sent to the acoustic feature extraction unit 12.

The speech input unit 11 may instead include a communication interface connected to a communication line such as a telephone line or an IP (Internet Protocol) network, and send digital speech data received over the communication line to the acoustic feature extraction unit 12. Furthermore, when the speech data is encoded, it is decoded before being sent to the acoustic feature extraction unit 12.

The acoustic feature extraction unit 12 extracts, from the speech data received from the speech input unit 11, the acoustic features used in the subsequent speech recognition processing. The acoustic feature data is sent to the speech recognition unit 13.

The speech recognition unit 13 performs speech recognition processing on the acoustic feature data received from the acoustic feature extraction unit 12. This processing uses the acoustic model stored in the acoustic model storage unit 14 and the language model stored in the language model storage unit 15. The acoustic model and the language model are built in advance, as a preparation stage, by training on training data, and are stored in the storage units 14 and 15.

Through speech recognition processing using the acoustic model and the language model, the speech recognition unit 13 recognizes words from the acoustic feature data and creates a recognition result composed of a string of recognized words. At this time, it creates recognition results not only for the most probable word string but also for the other recognized word strings. For each recognition result, the speech recognition unit 13 calculates the certainty (reliability) of the recognition result from an acoustic score (acoustic likelihood) and a linguistic probability (language probability). The language probability is the probability that a sequence of a fixed number of words (for example, three) appears. From the created recognition results, the speech recognition unit 13 uses those whose reliability ranks within a predetermined rank to create a recognition result in a word network format.

FIG. 2 shows a configuration example of a recognition result in the word network format. A word network format such as that shown in FIG. 2 is conventionally called a lattice format. The example in FIG. 2 corresponds to the case where the user reads out the sentence "今日の午後３時に会議です" ("There is a meeting at 3 pm today"). As shown in FIG. 2, a plurality of recognition results (those whose reliability ranks within the predetermined rank) are used, and the temporally corresponding word boundaries in the recognition results are connected into a network. The contents of the recognition result in FIG. 2 are given for convenience of explanation.

The speech recognition unit 13 includes, in the word network recognition result, part-of-speech information indicating the part-of-speech type of each word, together with the acoustic likelihood, language probability, and reliability of each word. The word network recognition result is sent to the candidate word generation unit 16.
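The contents of such a word network can be pictured with a small data structure. The following is a minimal sketch, not the format used by the apparatus: the class name `LatticeWord`, the field names, the use of integer frame indices for time, and all numeric values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LatticeWord:
    """One word in the word network (lattice); all field names are hypothetical."""
    surface: str                 # recognized word
    pos: str                     # part-of-speech type
    start: int                   # start time as a frame index (assumed unit)
    end: int                     # end time as a frame index (assumed unit)
    acoustic_likelihood: float
    language_probability: float
    reliability: float


def next_words(lattice: List[LatticeWord], word: LatticeWord) -> List[LatticeWord]:
    """Return the words whose start boundary coincides with the end of `word`."""
    return [w for w in lattice if w.start == word.end]


# Illustrative values only; they are not taken from FIG. 2.
lattice = [
    LatticeWord("今日", "noun", 0, 40, -120.0, 0.5, 0.9),
    LatticeWord("の", "particle", 40, 55, -30.0, 0.5, 0.8),
    LatticeWord("を", "particle", 40, 55, -35.0, 0.25, 0.2),
]
followers = [w.surface for w in next_words(lattice, lattice[0])]
```

Here `followers` lists the competing words that can follow "今日" in the network, which is the information the candidate word generation unit 16 needs when it looks for consecutive words to connect.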

The candidate word generation unit 16 generates candidate words from the word network recognition result received from the speech recognition unit 13. By connecting a plurality of consecutive words included in the word network recognition result, the candidate word generation unit 16 creates candidate words in clause (bunsetsu), phrase, and sentence units. The candidate word generation unit 16 connects consecutive words according to a word connection rule, which is a rule for connecting a plurality of consecutive words. The word connection rule is referred to when determining whether a plurality of consecutive words included in the word network recognition result can be connected. The word connection rule is set in the speech recognition apparatus 1 in advance, as a preparation stage.

Among the consecutive words included in the word network recognition result, those that do not satisfy the word connection rule are used as candidate words in word units, as they are. The candidate word generation unit 16 outputs candidate word data, composed of the string of candidate words generated from the word network recognition result, to the candidate word editing/display unit 17. The candidate word data is the recognition result expressed as a string of candidate words.

The candidate word editing/display unit 17 displays the candidate word data received from the candidate word generation unit 16 on the screen. The editing operation unit 18 includes various operation keys for editing: for example, an operation key for the user to select the correct candidate word from the candidate words displayed on the screen, an operation key for the user to delete a candidate word, an operation key for the user to input a new candidate word, and an operation key for the user to indicate that editing of the recognition result is finished. The editing operation unit 18 notifies the candidate word editing/display unit 17 of the edits the user made with the operation keys. The candidate word editing/display unit 17 updates the recognition result according to the edits notified by the editing operation unit 18, and then updates the screen with the candidate word data corresponding to the updated recognition result. As a result, a recognition result reflecting the user's edits is displayed on the screen.

FIG. 3 shows a configuration example of the candidate word generation unit 16 shown in FIG. 1. In FIG. 3, the candidate word generation unit 16 includes a word connection unit 21, a word connection determination unit 22, a candidate word grouping unit 31, an identical candidate word unification unit 32, a candidate word addition unit 33, and a candidate word group storage unit 34.

The word connection unit 21 asks the word connection determination unit 22 to determine whether a plurality of consecutive words included in the word network recognition result can be connected. The word connection determination unit 22 determines whether those words satisfy the word connection rule, and replies to the word connection unit 21 whether the determination is a pass (connectable) or a fail (not connectable).

The word connection unit 21 connects a plurality of consecutive words included in the recognition result according to the determination result. A connected string of words becomes a candidate word, while a word that was not connected becomes a candidate word in word units, as it is. As a result, the word network recognition result shown in FIG. 2 becomes a candidate word network as shown in FIG. 4. FIG. 4 shows a configuration example of the recognition result after the word connection processing. Whereas the network in FIG. 2 is connected in word units, the network in FIG. 4 is connected in units of candidate words, each formed by connecting a plurality of words.
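The interaction between the word connection unit 21 and the word connection determination unit 22 can be sketched as follows. This is a simplified illustration under stated assumptions: it processes a single word string rather than the full network, it tests only the last word of the current candidate against the next word, and the function name `connect_words` and the pair list `example_pairs` are hypothetical.

```python
from typing import Callable, List


def connect_words(words: List[str],
                  can_connect: Callable[[str, str], bool]) -> List[str]:
    """Merge consecutive words into candidate words.

    `can_connect(previous, following)` plays the role of the word connection
    determination: True means pass (connectable), False means fail.
    Words that cannot be connected remain candidate words on their own.
    """
    candidates: List[str] = []
    current: List[str] = []
    for word in words:
        if current and can_connect(current[-1], word):
            current.append(word)
        else:
            if current:
                candidates.append("".join(current))
            current = [word]
    if current:
        candidates.append("".join(current))
    return candidates


# Hypothetical connectable pairs, chosen only for this illustration.
example_pairs = {("今日", "の"), ("午後", "3"), ("3", "時"), ("時", "に"), ("会議", "です")}
words = ["今日", "の", "午後", "3", "時", "に", "会議", "です"]
result = connect_words(words, lambda a, b: (a, b) in example_pairs)
# result == ["今日の", "午後3時に", "会議です"]
```

With these assumed pairs, eight word-level candidates shrink to three, which is the reduction in editing targets that the connection processing aims for.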

The candidate word grouping unit 31 groups the candidate words (words, clauses, phrases) in the recognition result after the word connection processing, based on closeness of reading, time information, and the like. The candidate word grouping unit 31 aligns the start time and end time of the candidate words in the same group with the start time and end time of the candidate word having the maximum reliability. As a result, the recognition result after the word connection processing shown in FIG. 4 is simplified as shown in FIG. 5. FIG. 5 shows a configuration example of the recognition result after the candidate word grouping processing.
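The time alignment step can be sketched as below. The grouping criterion itself (closeness of reading and time information) is not specified in detail, so this sketch assumes the group has already been formed; the dictionary keys and the function name `align_group_times` are hypothetical.

```python
from typing import Dict, List


def align_group_times(group: List[Dict]) -> List[Dict]:
    """Give every candidate in a group the start and end time of the
    candidate with the maximum reliability. Returns new dictionaries."""
    best = max(group, key=lambda c: c["reliability"])
    return [dict(c, start=best["start"], end=best["end"]) for c in group]


# Illustrative values only.
group = [
    {"surface": "会議です", "start": 120, "end": 180, "reliability": 0.7},
    {"surface": "懐疑です", "start": 118, "end": 182, "reliability": 0.2},
]
aligned = align_group_times(group)
```

After alignment, all candidates in the group share one time interval, so the group can be drawn as a single slot with alternatives.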

For the recognition result after the candidate word grouping processing, the identical candidate word unification unit 32 merges candidate words with the same written form within the same group into a single candidate word and recalculates the reliability of that candidate word. The reliability of a candidate word after unification is obtained from the reliabilities of the candidate words before unification, for example as their average, their sum, or their maximum. As a result, the recognition result after the candidate word grouping processing shown in FIG. 5 is simplified as shown in FIG. 6. FIG. 6 shows a configuration example of the recognition result after the identical candidate word unification processing. In the example of FIG. 6, the candidate words in each time interval are arranged from the top in descending order of reliability.
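The unification step and the three ways of recomputing reliability mentioned above can be sketched as follows. The function name `unify`, the `mode` parameter, and the numeric values are assumptions for illustration.

```python
from collections import defaultdict
from typing import List, Tuple


def unify(group: List[Tuple[str, float]], mode: str = "max") -> List[Tuple[str, float]]:
    """Merge candidates with the same written form within one group and
    recompute reliability as the average, sum, or maximum of the originals.
    The result is sorted in descending order of reliability."""
    by_surface = defaultdict(list)
    for surface, reliability in group:
        by_surface[surface].append(reliability)
    combine = {
        "average": lambda values: sum(values) / len(values),
        "sum": sum,
        "max": max,
    }[mode]
    merged = [(surface, combine(values)) for surface, values in by_surface.items()]
    return sorted(merged, key=lambda c: c[1], reverse=True)


# Illustrative values only.
group = [("午後3時に", 0.5), ("午後3時に", 0.25), ("午後3時2", 0.125)]
by_max = unify(group, "max")
by_sum = unify(group, "sum")
by_avg = unify(group, "average")
```

Which of the three combinations is preferable is left open in the description; summing favors candidates that appear in many recognition results, while the maximum keeps the original scale of the reliability values.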

The candidate word addition unit 33 adds candidate words to the recognition result after the identical candidate word unification processing, based on the history of past candidate word groups. The candidate word group storage unit 34 stores the history of past candidate word groups. The candidate word addition unit 33 reads from the candidate word group storage unit 34 the group history for the candidate word having the maximum reliability in the recognition result after the unification processing. If the group history that was read contains a candidate word that is not present in the corresponding group in the recognition result, the candidate word addition unit 33 adds that candidate word to the group in the recognition result. Conversely, if a candidate word present in the group in the recognition result is not present in the group history read from the candidate word group storage unit 34, that candidate word is added to the group history in the candidate word group storage unit 34.
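The two-way exchange between a current group and the stored history can be sketched as below. The description does not say how the history is keyed or where added candidates are placed, so this sketch assumes a dictionary keyed by the maximum-reliability candidate and appends added candidates at the end of the group; `sync_with_history` is a hypothetical name.

```python
from typing import Dict, List


def sync_with_history(group: List[str], history: Dict[str, List[str]]) -> List[str]:
    """`group` is ordered by reliability, so group[0] is the top candidate.

    Candidates found only in the history are appended to the returned group;
    candidates found only in the group are appended to the history in place.
    """
    key = group[0]
    past = history.setdefault(key, [])
    missing_in_group = [c for c in past if c not in group]
    missing_in_history = [c for c in group if c not in past]
    past.extend(missing_in_history)
    return group + missing_in_group


# Illustrative history and group.
history = {"会議です": ["会議です", "懐疑です"]}
updated_group = sync_with_history(["会議です", "会議で"], history)
```

In this example the group gains "懐疑です" from the history, and the history gains "会議で" from the group, so both sides converge over repeated use.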

The candidate word generation unit 16 outputs the candidate word data corresponding to the recognition result after the candidate word addition processing to the candidate word editing/display unit 17.

FIG. 7 shows a configuration example of the candidate word editing/display unit 17 shown in FIG. 1. In FIG. 7, the candidate word editing/display unit 17 includes a candidate word data analysis/update unit 41, a candidate word group/candidate word selection history storage unit 42, and a candidate word display unit 43.

The candidate word data analysis/update unit 41 analyzes the candidate word data, creates a provisional recognition result by connecting the candidate words having the maximum reliability in each time interval, and holds it. The provisional recognition result and the data of the candidate words in the same group as each candidate word are sent to the candidate word display unit 43. At this time, only the amount that can be displayed on the screen is sent to the candidate word display unit 43.
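Building the provisional recognition result amounts to picking the most reliable candidate in each time interval. The following is a minimal sketch; the list-of-groups layout, the function name `provisional_result`, and the values are assumptions.

```python
from typing import List, Tuple


def provisional_result(intervals: List[List[Tuple[str, float]]]) -> List[str]:
    """For each time interval, pick the candidate with the maximum reliability."""
    return [max(group, key=lambda c: c[1])[0] for group in intervals]


# Illustrative candidate groups, one list per time interval.
intervals = [
    [("今日の", 0.9), ("京の", 0.3)],
    [("午後3時2", 0.1), ("午後3時に", 0.8)],
    [("会議です", 0.7), ("懐疑です", 0.2)],
]
best = provisional_result(intervals)
display_text = " ".join(best)  # a blank marks each candidate word boundary
```

Joining with blanks mirrors one way of making candidate word boundaries visible on the screen.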

The candidate word display unit 43 displays the recognition result received from the candidate word data analysis/update unit 41 on the screen of the display device. At this time, the boundaries between candidate words are made explicit, for example with blanks. When a candidate word has other candidate words grouped with it, this is indicated, for example by an underline. In addition, the candidate words of the same group are displayed in a screen separate from the screen displaying the recognition result, in descending order of reliability.

When the candidate word data analysis/update unit 41 receives the user's edits from the editing operation unit 18, it updates the recognition result according to those edits. For example, the recognition result is changed according to edits such as selecting the correct candidate word, deleting a candidate word, changing the order of candidate words, and inputting a new candidate word. When the correct candidate word is selected, the edited location is replaced with the correct candidate word and the other candidate words are deleted. When a candidate word is deleted, all candidate words at the edited location are deleted. When a new candidate word is input, the input candidate word is inserted at the edited location. The candidate word data analysis/update unit 41 sends the edited recognition result and the data of the candidate words in the same group as each candidate word to the candidate word display unit 43.
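The select, delete, and insert edits described above can be sketched as operations on a list of slots, where each slot holds the candidate words of one group. The slot representation, the operation names, and the function `apply_edit` are assumptions for illustration; reordering of candidate words is omitted.

```python
from typing import List, Optional


def apply_edit(slots: List[List[str]], op: str, index: int,
               text: Optional[str] = None) -> List[List[str]]:
    """Return a new slot list with one edit applied at position `index`."""
    updated = [list(slot) for slot in slots]
    if op == "select":
        # Keep only the chosen candidate; the other candidates are deleted.
        if text not in updated[index]:
            raise ValueError("selected text is not a candidate at this location")
        updated[index] = [text]
    elif op == "delete":
        # Delete every candidate at the edited location.
        del updated[index]
    elif op == "insert":
        # Insert the newly input candidate at the edited location.
        updated.insert(index, [text])
    else:
        raise ValueError(f"unknown edit operation: {op}")
    return updated


slots = [["今日の", "京の"], ["午後3時に", "午後3時2"], ["会議です", "懐疑です"]]
after_select = apply_edit(slots, "select", 0, "今日の")
after_delete = apply_edit(slots, "delete", 1)
after_insert = apply_edit(slots, "insert", 2, "定例")
```

Because each slot is a whole candidate word rather than a single word, one edit here replaces what would otherwise be several word-level edits.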

When the candidate word data analysis/update unit 41 receives an instruction from the editing operation unit 18 to move the edited location, it sends the recognition result corresponding to the destination and the data of the candidate words in the same group as each candidate word to the candidate word display unit 43.

The candidate word group/candidate word selection history storage unit 42 holds the candidate word groups and the probability with which the user has selected each candidate word (user selection probability). By referring to the candidate word group/candidate word selection history storage unit 42, the candidate word data analysis/update unit 41 can rearrange the display of the candidate words in the group at the edited location in descending order of the user selection probability stored in the storage unit 42. Whether to perform this change of display order based on the user selection probability is selectable.
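The optional reordering by user selection probability can be sketched as follows. The function name `order_candidates`, the `use_history` switch, and the probability values are assumptions; candidates without a stored probability are treated as 0.0 here.

```python
from typing import Dict, List


def order_candidates(group: List[str],
                     selection_probability: Dict[str, float],
                     use_history: bool = True) -> List[str]:
    """Reorder a group by descending user selection probability.

    When `use_history` is False the incoming order (by reliability) is kept.
    `sorted` is stable, so ties also keep their incoming order.
    """
    if not use_history:
        return list(group)
    return sorted(group,
                  key=lambda c: selection_probability.get(c, 0.0),
                  reverse=True)


# Illustrative values only.
group = ["会議です", "懐疑です", "会議で"]
probabilities = {"懐疑です": 0.75, "会議です": 0.25}
reordered = order_candidates(group, probabilities)
unchanged = order_candidates(group, probabilities, use_history=False)
```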

Next, the word connection rule according to the present embodiment will be described in detail with several examples.

FIG. 8 shows Example 1 of the word connection rule according to the present invention. In FIG. 8, the word connection rule 101 is configured as a list of connectable combinations of two consecutive words. The order of the two words in each connectable combination is also defined, as a preceding word and a following word. The word connection rule 101 is prepared in advance and stored in a memory in the speech recognition apparatus 1.

The word connection determination unit 22 determines that two consecutive words included in the recognition result can be connected when they match a connectable combination in the word connection rule 101, including the order of the two words. Although the example of FIG. 8 covers two consecutive words, entries covering three or more consecutive words may also be included.
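A determination against a rule of this kind reduces to an ordered-pair lookup. The sketch below assumes the rule is held as a set of (preceding word, following word) tuples; the entries in `RULE_101` are hypothetical and are not taken from FIG. 8.

```python
from typing import Set, Tuple

# Hypothetical entries: (preceding word, following word).
RULE_101: Set[Tuple[str, str]] = {
    ("今日", "の"),
    ("時", "に"),
    ("会議", "です"),
}


def can_connect(preceding: str, following: str) -> bool:
    """Pass only if the pair appears in the rule in this exact order."""
    return (preceding, following) in RULE_101


forward_ok = can_connect("今日", "の")
reversed_ok = can_connect("の", "今日")
```

Using ordered tuples means that the reversed pair fails even though both words appear in the rule, which reflects the requirement that the order of the words must also match.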

FIG. 9 shows Example 2 of the word connection rule according to the present invention. In FIG. 9, the word connection rule 102 is configured as a list of connectable combinations of three consecutive words. The order of the three words in each connectable combination is also defined, as the first word, the second word, and the third word. In addition, the list stores the language probability for each combination of three connectable words and their order. The word connection rule 102 is prepared in advance and stored in a memory in the speech recognition apparatus 1. A language model such as an N-gram model can be used as the word connection rule 102.

When three consecutive words included in the recognition result match a connectable combination in the word connection rule 102, including the order of the three combined words, the word connection determination unit 22 determines that the three words are connectable if the corresponding language probability is equal to or greater than a threshold. Although the example of FIG. 9 covers three consecutive words, rules covering two consecutive words, or four or more, may also be included.
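The two-step test for the word connection rule 102 (match the ordered triple, then compare its language probability with the threshold) might look like the following sketch. The entries, probability values, and names are hypothetical and chosen only for illustration.

```python
# Hypothetical contents for word connection rule 102.
# Key: (first word, second word, third word); value: language probability.
WORD_CONNECTION_RULE_102 = {
    ("speech", "recognition", "apparatus"): 0.62,
    ("recognition", "apparatus", "memory"): 0.08,
}


def can_connect_trigram(first, second, third, threshold,
                        rule=WORD_CONNECTION_RULE_102):
    """Return True when the ordered triple is listed and its
    language probability is equal to or greater than the threshold."""
    probability = rule.get((first, second, third))
    if probability is None:
        # The triple is not listed, so it is not connectable.
        return False
    return probability >= threshold


print(can_connect_trigram("speech", "recognition", "apparatus", 0.5))
print(can_connect_trigram("recognition", "apparatus", "memory", 0.5))
```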

The language probabilities stored in the list of the word connection rule 102 may be updated based on the recognition result in word network format generated by the speech recognition unit 13.

The threshold used to judge the language probability may be made user-specifiable. In this case, an operation key is provided for the user to specify the threshold. Lowering the threshold increases the probability that long candidate words are generated. However, because words are connected into a candidate word even when the language probability is low, the probability that incorrect candidate words become more numerous also increases. Conversely, raising the threshold decreases the probability that long candidate words are generated. However, because words are not connected into a candidate word when the language probability is low, the probability that correct candidate words become more numerous increases. By allowing the user to specify the threshold for judging the language probability in this way, the user can adjust the length of the candidate words and the accuracy rate.
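The effect of the user-specified threshold can be illustrated by counting how many rule entries remain connectable at different threshold values. The rule entries and probabilities below are invented for this sketch, and `connectable_entries` is a hypothetical helper.

```python
# Hypothetical rule entries: (word1, word2, word3) -> language probability.
RULE_102 = {
    ("speech", "recognition", "apparatus"): 0.62,
    ("word", "connection", "rule"): 0.35,
    ("recognition", "apparatus", "memory"): 0.08,
}


def connectable_entries(threshold, rule=RULE_102):
    """List the word triples whose language probability is at or
    above the threshold."""
    return sorted(words for words, prob in rule.items() if prob >= threshold)


# A lower threshold admits more (and less certain) connections;
# a higher threshold admits fewer (and more certain) ones.
for threshold in (0.05, 0.30, 0.50):
    print(threshold, len(connectable_entries(threshold)))
```

With these sample values, lowering the threshold lets low-probability triples through, which yields longer candidate words at the price of more incorrect ones, as the paragraph above describes.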

FIG. 10 shows Example 3 of the word connection rule according to the present invention. In FIG. 10, the word connection rule 103 is configured as a list of connectable combinations of part-of-speech types for two consecutive words. The order of the two part-of-speech types in each connectable combination is also defined, as a forward part of speech and a subsequent part of speech. The word connection rule 103 is prepared in advance and stored in a memory in the speech recognition apparatus 1.

Based on the part-of-speech information for two consecutive words included in the recognition result, the word connection determination unit 22 determines that the two words are connectable when their parts of speech match a connectable combination in the word connection rule 103, including the order of the two combined parts of speech. Although the example of FIG. 10 covers the parts of speech of two consecutive words, rules covering the parts of speech of three or more consecutive words may also be included.
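A part-of-speech based determination can be sketched in the same way as the word-pair lookup, except that the lookup key is the pair of part-of-speech types rather than the words themselves. The part-of-speech labels, the `Word` tuple, and the rule entries are assumptions for this example and are not taken from FIG. 10.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """A recognized word with its part-of-speech information."""
    surface: str
    pos: str


# Hypothetical contents for word connection rule 103.
# Each entry is an ordered pair: (forward part of speech, subsequent part of speech).
WORD_CONNECTION_RULE_103 = {
    ("noun", "particle"),
    ("verb", "auxiliary verb"),
}


def can_connect_by_pos(forward, subsequent, rule=WORD_CONNECTION_RULE_103):
    """Return True when the ordered pair of part-of-speech types is listed."""
    return (forward.pos, subsequent.pos) in rule


print(can_connect_by_pos(Word("voice", "noun"), Word("wo", "particle")))
print(can_connect_by_pos(Word("wo", "particle"), Word("voice", "noun")))
```

A rule keyed on part-of-speech types covers every word pair with those types, so the list can stay much shorter than a word-level list.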

FIG. 11 shows Example 4 of the word connection rule according to the present invention. In FIG. 11, the word connection rule 104 is configured as a list of connectable combinations of part-of-speech types for two consecutive words. The order of the two part-of-speech types in each connectable combination is also defined, as a forward part of speech and a subsequent part of speech. In addition, the list stores a part-of-speech connection cost for each connectable combination of two parts of speech and the order of the combined parts of speech. The word connection rule 104 is prepared in advance and stored in a memory in the speech recognition apparatus 1.

Based on the part-of-speech information for two consecutive words included in the recognition result, when the parts of speech of the two words match a connectable combination in the word connection rule 104, including the order of the two combined parts of speech, the word connection determination unit 22 determines that the two words are connectable if the corresponding part-of-speech connection cost is equal to or less than a threshold. Although the example of FIG. 11 covers the parts of speech of two consecutive words, rules covering the parts of speech of three or more consecutive words may also be included.
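The cost-based variant differs from the probability-based one in the direction of the comparison: a pair is connectable when its cost is at or below the threshold. The following sketch uses invented part-of-speech labels and cost values; none of them come from FIG. 11.

```python
# Hypothetical contents for word connection rule 104.
# Key: (forward part of speech, subsequent part of speech);
# value: part-of-speech connection cost (lower means easier to connect).
WORD_CONNECTION_RULE_104 = {
    ("noun", "particle"): 1.0,
    ("verb", "auxiliary verb"): 2.5,
    ("particle", "noun"): 6.0,
}


def can_connect_by_pos_cost(forward_pos, subsequent_pos, threshold,
                            rule=WORD_CONNECTION_RULE_104):
    """Return True when the ordered part-of-speech pair is listed and its
    connection cost is equal to or less than the threshold."""
    cost = rule.get((forward_pos, subsequent_pos))
    if cost is None:
        # The pair is not listed, so it is not connectable.
        return False
    return cost <= threshold


print(can_connect_by_pos_cost("noun", "particle", 3.0))
print(can_connect_by_pos_cost("particle", "noun", 3.0))
```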

The threshold used to judge the part-of-speech connection cost may be made user-specifiable. In this case, an operation key is provided for the user to specify the threshold. Raising the threshold increases the probability that long candidate words are generated. However, because words are connected into a candidate word even when the part-of-speech connection cost is high, the probability that incorrect candidate words become more numerous also increases. Conversely, lowering the threshold decreases the probability that long candidate words are generated. However, because words are not connected into a candidate word when the part-of-speech connection cost is high, the probability that correct candidate words become more numerous increases. By allowing the user to specify the threshold for judging the part-of-speech connection cost in this way, the user can adjust the length of the candidate words and the accuracy rate.

As described above, according to the present embodiment, a plurality of consecutive words included in the recognition result are connected according to the word connection rule, and the connected word string is used as a candidate word. This makes it possible to create candidate words not only in units of words but also in units of clauses, phrases, and sentences. As a result, the user can edit candidate words in units of clauses, phrases, and sentences as well as in units of words, which reduces the number of operations the user must perform to edit the candidate words.
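How connected word strings become candidate words can be sketched with a simple left-to-right pass that keeps extending the current group while each adjacent pair is judged connectable. This greedy chaining strategy, the function `connect_words`, and the sample rule are assumptions for illustration; the text above does not prescribe a particular chaining procedure.

```python
from typing import Callable, List


def connect_words(words: List[str],
                  can_connect: Callable[[str, str], bool],
                  joiner: str = "") -> List[str]:
    """Merge runs of consecutive connectable words into candidate words.

    `can_connect` stands in for the word connection determination, and
    `joiner` is the separator placed between connected words
    (empty for languages written without spaces).
    """
    candidates: List[str] = []
    group: List[str] = []
    for word in words:
        if group and can_connect(group[-1], word):
            # Extend the current candidate word.
            group.append(word)
        else:
            # Close the current candidate word and start a new one.
            if group:
                candidates.append(joiner.join(group))
            group = [word]
    if group:
        candidates.append(joiner.join(group))
    return candidates


# Hypothetical rule: ordered (forward word, subsequent word) pairs.
rule = {("speech", "recognition"), ("recognition", "apparatus")}
recognized = ["the", "speech", "recognition", "apparatus", "works"]
print(connect_words(recognized, lambda a, b: (a, b) in rule, joiner=" "))
```

In this sample, five recognized words become three candidate words, so a user correcting the middle phrase would need one edit rather than three.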

The speech recognition apparatus 1 according to the present embodiment may be realized by dedicated hardware, or it may be configured as a computer system such as a personal computer that realizes the functions of the apparatus shown in FIG. 1 by executing a program for realizing each of those functions.

An input device, a display device, and the like (none of which are shown) are connected to the speech recognition apparatus 1 as peripheral devices. Here, the input device refers to an input device such as a keyboard, a mouse, or the keys of a mobile phone terminal. The display device refers to a CRT (Cathode Ray Tube), a liquid crystal display device, or the like.
These peripheral devices may be connected directly to the speech recognition apparatus 1 or may be connected via a communication line.

Processing related to speech recognition may also be performed by recording a program for realizing the functions of the speech recognition apparatus 1 shown in FIG. 1 on a computer-readable recording medium, and causing a computer system to read and execute the program recorded on the recording medium. The "computer system" here may include an OS and hardware such as peripheral devices.
If a WWW system is used, the "computer system" also includes a homepage providing environment (or display environment).
The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, a writable nonvolatile memory such as a flash memory, or a DVD (Digital Versatile Disk), or to a storage device such as a hard disk built into the computer system.

The "computer-readable recording medium" further includes a medium that holds the program for a certain period of time, such as a volatile memory (for example, a DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
The program may be transmitted from a computer system that stores the program in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line (communication wire) like a telephone line.
The program may realize only a part of the functions described above. Furthermore, the program may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

The embodiment of the present invention has been described in detail above with reference to the drawings. However, the specific configuration is not limited to this embodiment, and design changes and the like that do not depart from the gist of the present invention are also included.
For example, the speech recognition apparatus 1 described above may be configured in combination with various devices that create documents, such as a word processor or an electronic mail device.

FIG. 1 is a block diagram showing the overall configuration of a speech recognition apparatus 1 according to an embodiment of the present invention.
FIG. 2 is a configuration example of a recognition result in word network format.
FIG. 3 is a configuration example of the candidate word generation unit 16 shown in FIG. 1.
FIG. 4 is a configuration example of a recognition result after word connection processing according to an embodiment of the present invention.
FIG. 5 is a configuration example of a recognition result after candidate word grouping processing according to the embodiment.
FIG. 6 is a configuration example of a recognition result after unification processing of identical candidate words according to the embodiment.
FIG. 7 is a configuration example of the candidate word editing and display unit 17 shown in FIG. 1.
FIG. 8 is Example 1 of the word connection rule according to the present invention.
FIG. 9 is Example 2 of the word connection rule according to the present invention.
FIG. 10 is Example 3 of the word connection rule according to the present invention.
FIG. 11 is Example 4 of the word connection rule according to the present invention.

Explanation of symbols

1 ... speech recognition apparatus, 11 ... speech input unit, 12 ... acoustic feature extraction unit, 13 ... speech recognition unit, 14 ... acoustic model storage unit, 15 ... language model storage unit, 16 ... candidate word generation unit, 17 ... candidate word editing and display unit, 18 ... editing operation unit, 21 ... word connection unit, 22 ... word connection determination unit, 101, 102, 103, 104 ... word connection rule

Claims

A speech recognition apparatus comprising:
speech recognition means for performing processing to recognize input speech and generating a recognition result consisting of a string of recognized words;
candidate word generation means for generating candidate words from the recognition result;
candidate word display means for displaying the candidate words on a screen;
editing operation means for allowing a user to edit the candidate words displayed on the screen; and
updating means for updating the recognition result in accordance with the content edited by the user,
wherein a word connection rule for connecting a plurality of consecutive words is provided, and
the candidate word generation means connects a plurality of consecutive words included in the recognition result according to the word connection rule and uses the connected word string as a candidate word.

The speech recognition apparatus according to claim 1, wherein the candidate word generation means comprises:
word connection determination means for determining, according to the word connection rule, whether a plurality of consecutive words included in the recognition result can be connected; and
word connection means for connecting the plurality of consecutive words included in the recognition result according to the determination result.

The speech recognition apparatus according to claim 2, wherein the word connection rule defines a combination of a plurality of connectable words and an order of the combined words.

The speech recognition apparatus according to claim 3, further comprising language probability storage means for storing a language probability for each combination of a plurality of words connectable under the word connection rule and the order of the combined words,
wherein the word connection determination means determines that the corresponding plurality of words are connectable when the language probability for the combination of the plurality of words connectable under the word connection rule and the order of the combined words is equal to or greater than a threshold.

The speech recognition apparatus according to claim 4, further comprising threshold specifying means for allowing a user to specify the threshold.

The speech recognition apparatus according to claim 2, wherein the word connection rule defines a combination of part of speech types of a plurality of connectable words and an order of the combined part of speech types.

The speech recognition apparatus according to claim 6, further comprising part-of-speech connection cost storage means for storing a part-of-speech connection cost for each combination of part-of-speech types of a plurality of words connectable under the word connection rule and the order of the combined part-of-speech types,
wherein the word connection determination means determines that the plurality of words having the corresponding parts of speech are connectable when the part-of-speech connection cost for the combination of part-of-speech types of the plurality of words connectable under the word connection rule and the order of the combined part-of-speech types is equal to or less than a threshold.

The speech recognition apparatus according to claim 7, further comprising threshold specifying means for allowing a user to specify the threshold.

A computer program for causing a computer to implement:
a speech recognition function of performing processing to recognize input speech and generating a recognition result consisting of a string of recognized words;
a candidate word generation function of generating candidate words from the recognition result;
a candidate word display function of displaying the candidate words on a screen;
an editing operation function of allowing a user to edit the candidate words displayed on the screen; and
an update function of updating the recognition result in accordance with the content edited by the user,
wherein a word connection rule for connecting a plurality of consecutive words is provided, and
the candidate word generation function connects a plurality of consecutive words included in the recognition result according to the word connection rule and uses the connected word string as a candidate word.