JPS6370298A

JPS6370298A - Double consonant recognition equipment

Info

Publication number: JPS6370298A
Application number: JP61214858A
Authority: JP
Inventors: 博松浦
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1986-09-11
Filing date: 1986-09-11
Publication date: 1988-03-30

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of Industrial Application) The present invention relates to a double consonant recognition device for use in speech recognition. The double consonant (sokuon) is the sound written with the small kana っ.

(Prior Art and Its Problems) Recognition of double consonants is indispensable in speech recognition. A conventional way of handling a double consonant was, for example, to pronounce "tsu" (つ) and then convert it into the small "tsu" (っ) by a key operation or the like.

However, this method has the problem that the key operation is cumbersome. Automatic recognition has also been performed by exploiting the fact that the っ portion is a long stretch of silence. That is, comparing "katta" (勝った, "won") with "kata" (型, "mold"), the silent interval between "ka" and "ta" is longer in "katta" than in "kata".

However, with this method the length of the silent interval varies from utterance to utterance. The variation becomes even larger when the speakers differ, which makes recognition of double consonants very difficult.

On the other hand, the variation is relatively small when the speaker is the same, but even then it can become large depending on the phonemes before and after the double consonant. As a result, the cases where a double consonant is present and where it is absent cannot be clearly distinguished, and misrecognition is frequent.

The present invention has been made to address these problems. Its object is to provide a double consonant recognition device that recognizes double consonants reliably and is easy to operate.

[Configuration of the Invention] (Means for Solving the Problems) To achieve the above object, the present invention is characterized by comprising: a voice input section that converts speech into an electrical signal; an analysis section that determines acoustic parameters of the speech converted into the electrical signal; a silence/sound detection section that determines from the acoustic parameters whether the signal is silent or sounded; a key input section for entering a key operation that marks a unit of voice input; a display control section that detects the silent interval length from the output of the silence/sound detection section, that recognizes a double consonant and causes it to be displayed when it judges that the silent interval length has reached a certain threshold, that cancels the display of the double consonant when the silent interval length reaches another threshold larger than the first, and that cancels the display of a double consonant at the end of a unit of voice input when a key input marking the unit of voice input is received from the key input section; and a display section that displays phonemes in accordance with commands from the display [control] section.

(Operation) When the silent interval length is smaller than the threshold, no double consonant is displayed. When the silent interval length has reached that threshold but is smaller than the other threshold, the double consonant is displayed. When the silent interval length is larger than the other threshold, the displayed double consonant is erased. A double consonant is also erased by a key operation.
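The three ranges described above can be sketched as a small decision function. This is only an illustration: the function name, the action labels, and the choice to treat "equal to the threshold" as having reached it are assumptions, and the thresholds are assumed to be measured in frames.

```python
def classify_silence(length, ta, tb):
    """Map a silent-interval length to a display action.

    length: number of consecutive silent frames (assumed unit)
    ta:     first threshold, at which a double consonant is displayed
    tb:     second, larger threshold, at which the display is cancelled
    """
    if not ta < tb:
        raise ValueError("ta must be smaller than tb")
    if length < ta:
        return "none"    # too short: no double consonant
    if length < tb:
        return "show"    # between the thresholds: display the double consonant
    return "erase"       # too long: a pause, not a double consonant
```

With hypothetical thresholds of 5 and 20 frames, a 3-frame silence yields "none", a 10-frame silence yields "show", and a 25-frame silence yields "erase".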

(Embodiment) The present invention will now be described in detail with reference to the drawings. Fig. 1 is a block diagram showing the configuration of the double consonant recognition device according to this embodiment. As shown in the figure, the device consists of a voice input section 1, an analysis section 2, a silence/sound detection section 3, a display control section 4, a display section 5, and a key input section 6.

A speech signal input through a microphone or the like is converted into an electrical signal by the voice input section 1 and passed to the analysis section 2. The analysis section 2 consists of, for example, a bank of 16-channel band-pass filters and determines acoustic parameters that effectively represent the characteristics of the speech. These acoustic parameters are not limited to the outputs of the band-pass filter bank. They may instead be some of the various parameters obtained by cepstrum coefficients or correlation analysis, or a combination of several of them.

Here, however, the analysis section 2 determines the power over the entire band in addition to the acoustic parameter data above, and this power is also treated as an acoustic parameter.

The silence/sound detection section 3 determines from the acoustic parameters whether each speech frame is silent or sounded. The decision is made with a threshold. For example, the silence/sound detection section 3 judges a frame to be sounded when its power is equal to or greater than the threshold Tp and silent when its power is smaller than Tp. A signal indicating silent or sounded is then sent to the display control section 4.
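A minimal sketch of this per-frame decision follows. The text does not say how the full-band power is computed, so the mean squared amplitude used here is an assumption, as are the function names; only the comparison against the threshold Tp comes from the description.

```python
def frame_power(samples):
    """Mean squared amplitude of one frame (assumed stand-in for full-band power)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def is_sounded(samples, tp):
    """Sounded when power >= Tp, silent when power < Tp."""
    return frame_power(samples) >= tp


def label_frames(frames, tp):
    """Return one True (sounded) / False (silent) flag per frame."""
    return [is_sounded(frame, tp) for frame in frames]
```

The resulting list of flags is the kind of signal that would be handed on to the display control section 4.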

The display control section 4 has a silence counter that counts silent frames, and it takes the count of silent frames as the silent interval length. When it judges that the silent interval length has reached a certain threshold Ta, it recognizes a double consonant and causes the double consonant to be displayed. When the silent interval length reaches another threshold Tb, larger than Ta, it cancels the display of the double consonant. When a key input marking a unit of voice input is received from the key input section 6, it cancels the display of the double consonant at the end of that unit of voice input. The display section 5 displays the recognized speech. The key input section 6 has an array of operation keys with which the operator can perform key operations such as marking a unit of voice input.
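The silence counter can be sketched as a single pass over the per-frame silent/sounded flags, emitting a "show" event when the count reaches Ta and an "erase" event when it reaches Tb. The function name, the event labels, and the decision to ignore silence before the first sounded frame are assumptions made for illustration.

```python
def double_consonant_events(sounded_flags, ta, tb):
    """Return (frame_index, action) pairs from a sequence of sounded flags.

    A "show" event fires when the run of silent frames reaches ta,
    an "erase" event fires when the same run reaches tb (ta < tb).
    """
    if not 0 < ta < tb:
        raise ValueError("thresholds must satisfy 0 < ta < tb")
    events = []
    count = 0            # the silence counter
    heard_sound = False  # assumption: leading silence is ignored
    for index, sounded in enumerate(sounded_flags):
        if sounded:
            count = 0    # a sounded frame ends the silent interval
            heard_sound = True
            continue
        if not heard_sound:
            continue
        count += 1
        if count == ta:
            events.append((index, "show"))
        elif count == tb:
            events.append((index, "erase"))
    return events
```

Because each event fires only when the counter equals a threshold exactly, a long pause produces one "show" followed by one "erase" and nothing further.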

Next, the operation of this embodiment will be described.

When the operator pronounces "ka" (か), the speech is converted into an electrical signal by the voice input section 1, its acoustic parameters are determined by the analysis section 2, and it is judged to be sounded by the silence/sound detection section 3. "Ka" is then displayed on the display section 5 via the display control section 4.

The operator next pronounces "ta" (た). The cases are described separately according to the time that elapses between pronouncing "ka" and pronouncing "ta".

Fig. 2(a) shows the case where "ka" and "ta" are pronounced in succession. When the pronunciation of "ka" ends, the silence/sound detection section 3 judges the signal to be silent, the silence counter of the display control section 4 is reset, the silent frames are counted, and the silent interval length up to the pronunciation of "ta" is detected. In this case the silent interval length is smaller than the preset threshold Ta, so the display control section 4 regards "ka" and "ta" as having been pronounced in succession, and "kata" (かた) is displayed on the display section 5.

Fig. 2(b) shows the case where the operator pronounces "katta" (かった). As in Fig. 2(a), when the pronunciation of "ka" ends, the silence counter in the display control section 4 is reset and the silent interval length is detected.

The silent interval length in this case is larger than the threshold Ta, so the display control section 4 regards a double consonant as present between "ka" and "ta", and "katta" (かった) is displayed on the display section 5.

Fig. 2(c) shows the case where the operator pronounces "ka", pauses, and pronounces "ta" some time later.

In this case the silence counter runs after the pronunciation of "ka" ends, and the silent interval length is detected.

This silent interval length is larger than the threshold Ta, so the display control section 4 regards a double consonant as present between "ka" and "ta", and "katta" (かった) is displayed on the display section 5 for the time being.

However, this silent interval length is also larger than the second threshold Tb, so the display control section 4 erases the double consonant, and "kata" (かた) is displayed on the display section 5.

Fig. 2(d) shows how a double consonant is kept from appearing at the end of an input unit such as a phrase. When the operator finishes pronouncing "ka", the silence counter runs. The silent interval length in this case is larger than the threshold Ta, so "ka" followed by the double consonant mark (かっ) is displayed on the display section 5. Because a double consonant is displayed at the end of the phrase, the operator erases it by operating a predetermined key on the key input section 6. The display control section 4 erases the double consonant in response to the command from the key input section 6, and "ka" (か) is displayed on the display section 5.
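The four cases of Fig. 2 can be walked through with a small simulation of the display control section. This is a sketch under stated assumptions rather than the patented implementation: the class and method names are hypothetical, syllable recognition is assumed to happen elsewhere and to deliver one kana per sounded syllable, and the thresholds Ta = 3 and Tb = 10 frames are arbitrary illustrative values.

```python
SOKUON = "っ"  # the double consonant mark


class DisplayControl:
    """Hypothetical model of display control section 4."""

    def __init__(self, ta, tb):
        if not 0 < ta < tb:
            raise ValueError("thresholds must satisfy 0 < ta < tb")
        self.ta, self.tb = ta, tb
        self.silence = 0    # silence counter, in frames
        self.armed = False  # assumption: count silence only after a syllable
        self.text = []

    def on_syllable(self, kana):
        """A sounded syllable was recognized: show it, reset the counter."""
        self.text.append(kana)
        self.silence = 0
        self.armed = True

    def on_silent_frame(self):
        """One silent frame arrived from the silence/sound detection section."""
        if not self.armed:
            return
        self.silence += 1
        if self.silence == self.ta:
            self.text.append(SOKUON)   # reached Ta: display double consonant
        elif self.silence == self.tb:
            self._drop_trailing_sokuon()  # reached Tb: cancel the display

    def on_unit_key(self):
        """Key marking the end of a unit of voice input."""
        self._drop_trailing_sokuon()
        self.armed = False  # assumption: ignore silence until the next syllable

    def _drop_trailing_sokuon(self):
        if self.text and self.text[-1] == SOKUON:
            self.text.pop()

    def display(self):
        return "".join(self.text)


def simulate(silent_frames, second=None, key=False, ta=3, tb=10):
    """Say "ka", wait silent_frames, then optionally press the key or say second."""
    ctrl = DisplayControl(ta, tb)
    ctrl.on_syllable("か")
    for _ in range(silent_frames):
        ctrl.on_silent_frame()
    if key:
        ctrl.on_unit_key()
    if second is not None:
        ctrl.on_syllable(second)
    return ctrl.display()
```

Under these assumptions, `simulate(1, "た")` corresponds to Fig. 2(a), `simulate(5, "た")` to Fig. 2(b), `simulate(12, "た")` to Fig. 2(c), and `simulate(4, key=True)` to Fig. 2(d).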

Thus, in this embodiment there is no need to convert "tsu" (つ) into the small "tsu" (っ) by a key operation as was done conventionally. In addition, the cases where a double consonant is present and where it is absent can be clearly distinguished, so reliable speech recognition can be performed.

[Effects of the Invention] As described in detail above, the present invention makes it possible to provide a double consonant recognition device that recognizes double consonants reliably and is easy to operate.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing the configuration of a double consonant recognition device according to an embodiment of the present invention, and Fig. 2 is an explanatory diagram showing the operation of the embodiment. 1: voice input section, 2: analysis section, 3: silence/sound detection section, 4: display control section, 5: display section, 6: key input section.

Claims

[Scope of Claims] A double consonant recognition device comprising: a voice input section that converts speech into an electrical signal; an analysis section that determines acoustic parameters of the speech converted into the electrical signal; a silence/sound detection section that determines from the acoustic parameters whether the signal is silent or sounded; a key input section for entering a key operation that marks a unit of voice input; a display control section that detects a silent interval length from the output of the silence/sound detection section, recognizes a double consonant and causes it to be displayed when it judges that the silent interval length has reached a certain threshold, cancels the display of the double consonant when the silent interval length reaches another threshold larger than said threshold, and cancels the display of a double consonant at the end of a unit of voice input when a key input marking the unit of voice input is received from the key input section; and a display section that displays phonemes in accordance with commands from the display [control] section.