JPS5855993A - Voice data input unit

Info

Publication number: JPS5855993A
Application number: JP56153694A
Authority: JP
Inventors: 岡村　有人; 重光樋口
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1981-09-30
Filing date: 1981-09-30
Publication date: 1983-04-02

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a method and apparatus for reliably and correctly recognizing and inputting spoken codes in voice keys, voice remote controls, and the like.

In conventional systems that recognize speech in order to switch a key ON or OFF, the most difficult point is considered to be misrecognition. In general, such a system recognizes a specific keyword uttered by the user. When recognition fails, the user has to re-enter the keyword from the beginning without knowing which part of the input was inadequate, which wastes considerable time. The object of the present invention is to eliminate the above drawbacks of the prior art and to provide a voice data input device for inputting voice data reliably and without error.

The feature of the present invention is that voice data consisting of a plurality of elements is input one element at a time in synchronization with a timing signal from the device, and each time one element is input, the result determined by the voice recognition circuit is reported to the user, so that voice data can be input reliably and without error.

FIG. 1 is a block diagram of an embodiment of the voice data input device according to the present invention.

Reference numeral 16 denotes a timing signal generation circuit. In this embodiment the signal is, for example, a "beep" tone emitted from speaker 120 via amplifier 18. Reference numeral 12 denotes a speech analysis and recognition circuit for analyzing and recognizing the elements of voice data input from microphone 19. Reference numeral 13 denotes a speech synthesis circuit for announcing by voice the content recognized by the speech analysis and recognition circuit 12.

Suppose the voice data is, for example, the number 3512 (san, go, ichi, ni). FIG. 2 shows the system flow when voice data consisting of these four digit elements is input. In the figure, the left side shows the user-side operations and the right side shows the device-side operations. In voice input operation 21, the four voice elements 3 (san), 5 (go), 1 (ichi), and 2 (ni) are input by voice one at a time. The input data is judged in voice recognition routine 25, and the result is synthesized into speech in voice synthesis routine 26 and echoed back (22) to the user. The user listens to the echo back and judges it. If the data was not judged correctly, the user returns to 21 and inputs the same data again.

Correctly judged data is stored in RAM 123 by data store routine 27.

FIG. 3 shows the timing of data input. It illustrates a case in which, during input of the data 3512 (san, go, ichi, ni), the input of 1 (ichi) does not succeed on the first attempt. In the figure, the rectangles labeled "beep" are the timing pulse tones, the circled kana are the voice data input by the user, and the uncircled kana are the synthesized echo-back speech giving the device's judgment. In this example, 3, 5, and 2 are input correctly, while 1 fails once and is input correctly on the second attempt. After the first voice input 34 of 1 (ichi), the device misrecognizes it as 2 (ni) and synthesizes and outputs the speech 35, "ni". Noticing that the recognition result is wrong, the user inputs the voice data "ichi" 37 again, but first leaves one silent blank, as shown at 36 in the figure.

This blank signifies that the immediately preceding judgment was wrong, and the input of that voice data is redone. When 1 (ichi) 37 is input again in this way, the result is correctly judged as 1, and synthesized speech 38 is output, the user goes on to input the next voice data, 2 (ni) 39. After the whole series of data has been input in this way, the end of data input can be signaled to the device by leaving two or more blanks, as shown at 311 in the figure.

In FIG. 1, the MPU (microprocessor unit) 11 controls the system in accordance with the system software stored in ROM 122 and, as necessary, outputs the content of the recognized voice data, or data associated with it, to the outside via interface 124. The MPU can be implemented using, for example, a 4-bit microprocessor of the Hitachi HMCS40 series. In addition to the above function, interface 124 handles the connection to external devices.
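The slot-by-slot protocol of FIG. 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `run_input_session`, `recognize`, and `BLANK` are hypothetical, each list entry stands for what the user does after one timing beep, and the rule that an echoed result is committed only once the user moves on (or ends the session) is one reading of the description, not something the text spells out.

```python
from typing import Callable, List, Optional

BLANK = None  # a time slot in which the user stays silent (hypothetical marker)


def run_input_session(slots: List[Optional[str]],
                      recognize: Callable[[str], str]) -> List[str]:
    """Handle one utterance or silence per timing beep; return the stored elements."""
    stored: List[str] = []          # stands in for the data kept in RAM
    pending: Optional[str] = None   # last echoed result, not yet confirmed
    blanks = 0                      # consecutive silent slots seen so far
    for slot in slots:
        if slot is BLANK:
            blanks += 1
            if blanks >= 2:         # two or more blanks: end of data input
                break
            continue
        if pending is not None and blanks == 0:
            stored.append(pending)  # user moved on, so the echo was accepted
        # After exactly one blank, the pending result is dropped as wrong.
        pending = recognize(slot)
        print(f"echo back: {pending}")
        blanks = 0
    if pending is not None and blanks != 1:
        stored.append(pending)      # commit the last element at end of input
    return stored
```

Calling it with the slots `["san", "go", "ichi", BLANK, "ichi", "ni", BLANK, BLANK]` and a recognizer that mishears the first "ichi" as "ni" reproduces the scenario of FIG. 3: the wrong "ni" is discarded after the single blank, and the double blank ends the session.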

For speech recognition, a distance calculation method based on PARCOR analysis is used, for example.

The algorithms and techniques of PARCOR analysis are well known and are not detailed here. PARCOR analysis yields the physical parameters of the voice data (PARCOR coefficients, pitch information, amplitude information, and so on).

In this embodiment, the voice data consists of, for example, ten elements from 0 to 9, given by the spoken words 0 (zero), 1 (ichi), 2 (ni), 3 (san), 4 (shi), 5 (go), 6 (roku), 7 (shichi), 8 (hachi), and 9 (ku). The features of these sounds are represented as vectors in an n-dimensional space spanned by n physical parameters such as the PARCOR coefficients. The features of the ten sounds are either stored in advance in ROM 122 as n-dimensional vector data, or, before voice data is input, the reference data (the ten sounds zero, ichi, ni, san, shi, go, roku, shichi, hachi, ku) are input from microphone 19, their physical parameters are calculated by the analysis and recognition circuit 12, and the results are stored in RAM 123.
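For readers unfamiliar with PARCOR coefficients, the following is a minimal sketch of one standard way to obtain them: they correspond to the reflection coefficients produced by the Levinson-Durbin recursion on a frame's autocorrelation. The text does not say which procedure the analysis circuit uses, so the function names and the sign convention here are assumptions (some references define PARCOR coefficients with the opposite sign).

```python
from typing import List, Sequence


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def reflection_coefficients(r: Sequence[float], order: int) -> List[float]:
    """Levinson-Durbin recursion; returns k_1..k_order for A(z) = 1 + sum a_j z^-j."""
    if r[0] <= 0.0:
        raise ValueError("frame has no energy")
    a = [1.0] + [0.0] * order       # predictor coefficients, a[0] fixed at 1
    err = float(r[0])               # prediction error energy
    ks: List[float] = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 0.0:              # perfectly predictable signal: stop early
            ks.extend([0.0] * (order - i))
            break
    return ks
```

A vector of such coefficients, possibly extended with pitch and amplitude values, would play the role of the n-dimensional feature vector described above.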

Accordingly, when the data 3 (san), 5 (go), 1 (ichi), 2 (ni) is input by voice for recognition, the features of each input sound are analyzed by the analysis and recognition circuit 12. The distance between the resulting n-dimensional physical parameter vector a_k (k = 0, 1, ..., 9) and each reference vector b_l (l = 0, 1, ..., 9) stored in advance in ROM 122 or RAM 123 is then computed to find which of 0 to 9 is closest. The computation is performed by MPU 11. Specifically, it proceeds as follows. Let the values of the n parameters (PARCOR coefficients, etc.) of the input voice data be a_kj (k = 0, 1, ..., 9; j = 1, 2, ..., n), and let the corresponding parameters of the prepared reference voice data be b_lj (l = 0, 1, ..., 9; j = 1, 2, ..., n). The distance d_k,l between the input data and the reference data is then given by equation (1).

Here the suffix j indexes the n physical parameters, and α_j is a coefficient for normalizing or weighting the n physical parameters.
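Equation (1) itself is not reproduced in this text. One form consistent with the definitions of a_kj, b_lj, and α_j, offered purely as an assumption, is a weighted squared distance:

```latex
d_{k,l} = \sum_{j=1}^{n} \alpha_j \left( a_{kj} - b_{lj} \right)^{2} \qquad (1)
```

The squared difference is a guess. A weighted absolute difference would fit the surrounding description equally well.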

For the analysis result a_k0 of input voice data k = k0, MPU 11 evaluates equation (1) for every l and takes the l = l0 that gives the smallest value as the recognition result. That is, when k0 = l0, the voice data has been input correctly.
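The nearest-reference decision described above can be sketched as follows. The distance used here is a weighted squared difference, which is an assumption because equation (1) is not reproduced in this text. The names `weighted_distance` and `recognize` and the toy two-dimensional vectors are hypothetical.

```python
from typing import Dict, Sequence


def weighted_distance(a: Sequence[float], b: Sequence[float],
                      alpha: Sequence[float]) -> float:
    """Assumed form of d_k,l: sum over j of alpha_j * (a_j - b_j)^2."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(alpha, a, b))


def recognize(a: Sequence[float],
              references: Dict[int, Sequence[float]],
              alpha: Sequence[float]) -> int:
    """Return the label l0 whose reference vector is closest to the input a."""
    return min(references,
               key=lambda label: weighted_distance(a, references[label], alpha))


# Toy reference vectors with n = 2 parameters (illustrative values only).
references = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
print(recognize([0.9, 0.1], references, alpha=[1.0, 1.0]))
```

Setting an entry of `alpha` to zero removes that parameter from the comparison, which is one way to see the normalizing or weighting role of α_j.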

The speech synthesis circuit 13 synthesizes and utters the echo-back speech according to the above computation result l = l0. The recognition result is expressed as 4-bit data as shown in FIG. 4, and based on that data the address data for speech synthesis (described later) stored in ROM 122 is sent to the speech synthesis circuit 13 via data bus 125.

The speech synthesis circuit 13 consists of a speech synthesis section 14 and a voice memory section 15. The voice memory section 15 stores data such as the PARCOR coefficients, pitch information, and amplitude information of the sounds to be synthesized (zero, ichi, ..., ku); for example, the Hitachi HD38882 is used. The speech synthesis section 14 receives from MPU 11 the start address in the voice memory at which the data needed for synthesis is stored, reads that data from voice memory section 15 on this basis, and synthesizes the speech signal. For example, the HD38880, a speech synthesis LSI made by Hitachi, is used.

FIG. 5 is a schematic representation of the contents of the voice memory. It shows the start address (four hexadecimal digits) of each block in which the data for synthesizing a sound is stored. The synthesized speech is emitted from speaker 120 via amplifier 18.

As shown in the above embodiment, the voice data input device according to the present invention makes it possible to input voice data reliably and without error. Although this embodiment describes echoing back the recognition result of the input voice data by speech synthesis, the invention is not limited to this. Other means, such as a CRT display, may be used with the same effect.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing the configuration of the voice data input device according to the present invention. FIG. 2 is a diagram showing the operation flow of the voice data input device according to the present invention. FIG. 3 is a diagram showing the timing of voice data input. FIG. 4 is a diagram showing the bit patterns of the data when the input voice is analyzed and the recognition result is encoded as data. FIG. 5 is a diagram showing the addresses of the voice data ROM. 12: speech analysis and recognition circuit; 13: speech synthesis circuit; 16: timing signal generation circuit; 21: voice input operation; 22: echo back; 26: speech synthesis routine. Agent: Patent Attorney 薄田利幸

Claims

[Claims]

1. A voice data input device comprising: a voice recognition circuit for recognizing voice data composed of one or more elements; means for echoing back the recognized content to the user in accordance with the recognition result; a circuit for generating a signal that times the input of the voice data and the generation of the echo back by speech synthesis; and means for controlling the above circuits; characterized in that the user judges from the content of the echo back whether it matches the uttered voice data, and if the content differs from the input data, the data can be input again, so that voice data can be input reliably by repeating this procedure.