
JP3092788B2 - Speaker recognition threshold setting method and speaker recognition apparatus using the method - Google Patents

Speaker recognition threshold setting method and speaker recognition apparatus using the method

Info

Publication number
JP3092788B2
JP3092788B2 JP08004508A JP450896A
Authority
JP
Japan
Prior art keywords
speaker
model
voice
threshold value
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP08004508A
Other languages
Japanese (ja)
Other versions
JPH09198086A (en)
Inventor
Tomoko Matsui
Sadaoki Furui
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP08004508A priority Critical patent/JP3092788B2/en
Publication of JPH09198086A publication Critical patent/JPH09198086A/en
Application granted granted Critical
Publication of JP3092788B2 publication Critical patent/JP3092788B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention This invention is used, for example, to verify from an input voice that a speaker is the same person as the one associated with a personal identification number. It relates to a method of setting the threshold value used for speaker determination in a speaker recognition method in which an input voice is converted into a representation using feature parameters, the similarity between the input voice in that representation and voice models registered in advance for each speaker in the same representation is obtained, and the speaker who uttered the input voice is thereby recognized; and to a speaker recognition apparatus to which this method is applied.

[0002]

2. Description of the Related Art FIG. 3 shows the functional configuration of a conventional apparatus, taking text-independent speaker recognition as an example. First, speakers are registered. For each speaker, speech such as uttered sentences (registration speech) is input from an input terminal 11 to feature parameter extraction means 12 and converted into a representation using the feature parameters contained in the speech (for example, cepstrum, pitch, etc.). From the registration speech data converted into this time series of feature parameters, model creation means 13 creates a model of the speech, for example a hidden Markov model (Hidden Markov Model, denoted HMM; represented, for example, as a weighted sum of a plurality of Gaussian distributions). As a method of creating the HMM, the method described in the reference "Tomoko Matsui, Sadaoki Furui: 'Comparison of text-independent speaker recognition methods using VQ and discrete/continuous HMMs', IEICE Technical Report on Speech, SP91-89, 1991" or the like can be used. The HMM obtained in this way for each speaker is registered in a model storage unit 14.

[0003] When recognizing a speaker, the speech uttered by that speaker is input from the input terminal 11 to the feature parameter extraction means 12 and converted into a time series of feature parameters. The similarity between this time series of feature parameters of the input speech and the HMM of each speaker stored in the model storage unit 14 is calculated by similarity calculation means 15. The result is compared, in speaker recognition determination means 16, with a threshold stored in a threshold storage unit 17 that takes into account the range over which the similarity of the true speaker's voice may vary. If the similarity is larger than the threshold, the input speech is determined to be that of the registered speaker whose HMM was used in the similarity calculation; if it is smaller than the threshold, it is determined to be the speech of another person, and this determination result is output.
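As a rough illustration of this scoring-and-thresholding step, the sketch below computes the average log-likelihood of a feature time series under a one-state model, a weighted sum of diagonal Gaussians as described above, and compares it with a stored threshold. All numbers, shapes, and the diagonal-covariance choice are hypothetical and not taken from the patent:

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of a feature time series under a
    weighted sum of diagonal Gaussians (a one-state model, as in the text)."""
    frames = np.asarray(frames)                      # (T, D) time series
    diff = frames[:, None, :] - means                # (T, M, D)
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)   # (M,)
    exponent = -0.5 * (diff ** 2 / variances).sum(axis=2)         # (T, M)
    log_comp = np.log(weights) + log_norm + exponent              # (T, M)
    m = log_comp.max(axis=1, keepdims=True)          # log-sum-exp over mixtures
    per_frame = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return per_frame.mean()

# Toy two-component model in two dimensions (illustrative numbers only).
rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))

own = rng.normal(0.0, 1.0, size=(50, 2))     # speech near the model
other = rng.normal(10.0, 1.0, size=(50, 2))  # speech far from the model
threshold = -5.0                             # hypothetical stored threshold

own_score = gmm_log_likelihood(own, weights, means, variances)
other_score = gmm_log_likelihood(other, weights, means, variances)
```

The "own" series scores above the threshold (accept) and the "other" series far below it (reject), mirroring the decision rule in the text.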

[0004] Conventionally, two error rates were considered in setting the threshold: the false rejection rate and the false acceptance rate. The false rejection rate, obtained from the results of speaker recognition experiments using each registered speaker's own registration speech, is the rate at which the true speaker is incorrectly rejected. The false acceptance rate, obtained from the results of speaker recognition experiments using impostors' speech, is the rate at which an impostor is incorrectly accepted. Depending on the purpose of speaker recognition, the false rejection rate may be more important than the false acceptance rate, or vice versa. When the purpose is not clear, the value giving the equal error rate, at which by Bayes' theorem the false rejection rate and the false acceptance rate are equal, was taken as the optimal threshold (the equal-error-rate threshold). As shown in FIG. 4A, the curve 21 representing the false rejection rate increases as the threshold increases, while the curve 22 representing the false acceptance rate decreases as the threshold increases. Conventionally, with registered speakers other than the true speaker treated as impostors, speaker recognition was performed using all of the registration speech by computing the similarity to each model (HMM) while varying the speaker determination threshold, that is, a speaker recognition experiment was carried out, and the intersection of the false rejection rate curve 21 and the false acceptance rate curve 22 shown in FIG. 4A, namely the threshold φ0 at which both error rates take the equal value ε0, was found and set as the threshold. In other words, the equal-error-rate threshold based on the registration speech was set.
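The equal-error-rate threshold φ0 described here can be estimated empirically by sweeping the threshold until the two error-rate curves cross. A minimal sketch, assuming hypothetical sets of genuine and impostor similarity scores:

```python
import numpy as np

def eer_threshold(genuine_scores, impostor_scores, num_points=1000):
    """Sweep the decision threshold and return (threshold, error rate) at
    the point where false rejection and false acceptance rates cross."""
    genuine = np.asarray(genuine_scores)
    impostor = np.asarray(impostor_scores)
    lo = min(genuine.min(), impostor.min())
    hi = max(genuine.max(), impostor.max())
    best = None
    for phi in np.linspace(lo, hi, num_points):
        frr = np.mean(genuine < phi)     # true speakers rejected (curve 21)
        far = np.mean(impostor >= phi)   # impostors accepted (curve 22)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, phi, (frr + far) / 2.0)
    _, phi0, eer = best
    return phi0, eer

# Hypothetical, well-separated similarity score distributions.
rng = np.random.default_rng(1)
genuine = rng.normal(2.0, 1.0, 1000)     # true-speaker similarities
impostor = rng.normal(-2.0, 1.0, 1000)   # impostor similarities
phi0, eer = eer_threshold(genuine, impostor)
```

With these synthetic distributions the crossing point lands near zero, between the two score clusters, with a small equal error rate.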

[0005]

[Problems to be Solved by the Invention] However, when the true speaker's model is not sufficiently robust against differences in utterance content, utterance variability, and the like, the similarity between the speaker's model and the speech used to create it (the speaker's registration speech) is generally larger than the similarity between the model and the speech the speaker utters at recognition time. Consequently, when a false rejection rate curve is obtained by varying the threshold for the speech the speaker utters at recognition time, the rejection rate is worse than that of the false rejection rate curve 21 based on the registration speech, as shown for example by the dotted curve 23 in FIG. 4A; that is, the rejection rate is larger at the same threshold. In other words, the equal-error-rate threshold φ0 based on the registration speech is larger than the equal-error-rate threshold based on the recognition speech, and as a result, when the registration-speech equal-error-rate threshold φ0 is used for recognition, there is the problem that the false rejection rate becomes large.

[0006] In addition, since the amount of speech data available for registering each speaker is not large, the speaker's model is often not sufficiently robust against differences in utterance content, utterance variability, and the like, and it was also a problem that the false rejection rate could not be determined with high reliability. Furthermore, a speaker's voice varies from utterance to utterance, and in particular varies greatly over periods of two to three months. For this reason, in order to maintain high recognition performance it is desirable to have each speaker provide speech at regular intervals and to update that speaker's model. When the model is updated in this way, the false rejection rate characteristics and the false acceptance rate characteristics also change. It is therefore desirable to reset the threshold whenever the model is updated.

[0007]

[Means for Solving the Problems] According to the method of the invention of claim 1, with registered speakers other than the true speaker treated as impostors, the threshold is set to the value obtained by subtracting a predetermined value from the threshold that gives the equal error rate when a speaker recognition experiment is performed using the registration speech; that is, it is set to a value giving a false acceptance rate higher than that of the equal-error-rate threshold. This higher false acceptance rate exceeds the false acceptance rate at the equal-error-rate threshold by roughly the upper bound of the system error rate of the speaker recognition method. With this configuration, the false rejection rate does not become excessively large even when the model is not robust.

[0008] In the method of the invention of claim 2, the model is updated periodically, and at each update a speaker recognition experiment is performed using the update speech and the updated model, with registered speakers other than the true speaker treated as impostors. The new threshold is the value obtained by subtracting, from the equal-error-rate threshold of this experiment, a value that is smaller than the aforementioned predetermined value and that decreases with the number of updates. With this configuration, as the model is updated it gradually becomes more robust against differences in utterance content, utterance variability, and the like, and the threshold asymptotically approaches, from the value giving the higher false acceptance rate, the ideal equal-error-rate threshold for recognition speech obtained when such an ideal model is used.

[0009] An example of updating the threshold at each model update is given below. The threshold φ is set according to the following equation:

φ = wφ1 + (1 − w)φ0 (1)

Here φ0 is the equal-error-rate threshold obtained when a speaker recognition experiment is performed using the registration speech with registered speakers other than the true speaker treated as impostors, i.e. the initially set threshold, and φ1 is the threshold at which, from the relationship between the false acceptance rate and the threshold (FIG. 4A), the false acceptance rate becomes {equal error rate ε0 + x}% (for example x = 1%). This value {ε0 + x}% corresponds to the upper bound of the false acceptance rate estimated from the performance of the speaker recognition method (the estimated system error rate). w is a parameter that controls the speed at which the threshold asymptotically approaches the equal-error-rate threshold as the speaker's model is updated, and can be defined, for example, by the following equation.
[0010] w = 2/(1 + exp(0.25t)) (2)

Here t is the number of times the speaker's model has been updated (t = 0, 1, 2, ...), and this equation was determined experimentally. According to equations (1) and (2), t = 0 corresponds to the time when the speaker recognition apparatus is built, or when all the speakers to be recognized are newly enrolled; that is, to the threshold first determined using the registration speech, which is set smaller than the equal-error-rate threshold φ0 by Δφ (= φ0 − φ1). Normally, as the number of model updates increases, w becomes smaller, Δφ also becomes smaller, and the threshold approaches φ0. Note that φ0 and φ1 are also recomputed at every model update.
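Equations (1) and (2) can be written directly in code. A minimal sketch, where the sample values of φ0 and φ1 are hypothetical:

```python
import math

def update_weight(t: int) -> float:
    """Equation (2): w = 2 / (1 + exp(0.25 t)), with t the number of
    model updates so far (t = 0, 1, 2, ...)."""
    return 2.0 / (1.0 + math.exp(0.25 * t))

def updated_threshold(phi0: float, phi1: float, t: int) -> float:
    """Equation (1): phi = w*phi1 + (1 - w)*phi0, where phi0 is the
    equal-error-rate threshold and phi1 the threshold giving a false
    acceptance rate of (EER + x)%."""
    w = update_weight(t)
    return w * phi1 + (1.0 - w) * phi0

# At t = 0, w = 1, so the initial threshold is phi1, i.e. phi0 lowered by
# delta_phi = phi0 - phi1; as t grows, w -> 0 and the threshold
# approaches phi0.
phi0, phi1 = 10.0, 8.0    # hypothetical threshold values
```

The schedule starts at φ1 and rises monotonically toward φ0 as updates accumulate, matching the behavior described in the text.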

[0011]

DESCRIPTION OF THE PREFERRED EMBODIMENTS FIG. 1 shows the order of processing in an embodiment of the method of this invention, and FIG. 2 shows the functional configuration of an embodiment of the apparatus of this invention, with parts corresponding to those in FIG. 3 given the same reference numerals. As shown in FIG. 2, this embodiment is provided with a feature parameter storage unit 25 in which the time series of feature parameters of each input speech at registration and at model update is temporarily stored, model update means 26 that updates a model in the model storage unit 14 when a model update is instructed, and threshold calculation means 27 that calculates a threshold at registration and at model update and updates the threshold in the threshold storage unit 17.

[0012] When registration speech or update speech is input to the input terminal 11, it is converted into a time series of feature parameters by the feature parameter extraction means 12, as shown in FIGS. 1 and 2 (S1). At registration, the model creation means 13 creates a model of the speech; at model update, the corresponding model in the model storage unit 14 is updated using the time series of feature parameters of the update speech (S2).

[0013] For this model update, the time series of feature parameters of the registration speech and of each update speech are retained for every speaker, and a new model is created from all the retained time series together with the time series of the newly input update speech, replacing the corresponding model in the model storage unit 14. Alternatively, when the model is an HMM, Bayesian estimation may be used: the HMM parameter vector θ that maximizes the product of the likelihood f(X|θ) of the corresponding speaker's HMM for the time series X of feature parameters of the update speech and a prior probability density function g(θ) reflecting the characteristics of the speech uttered so far is estimated, and that θ is taken as the new HMM.
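The Bayesian update described above can be illustrated, for the mean vectors of a Gaussian mixture, as an interpolation between the prior model and statistics gathered from the update speech. This is a generic MAP-adaptation-style sketch with an assumed relevance factor, not the patent's exact estimation procedure:

```python
import numpy as np

def map_update_means(old_means, counts, frame_sums, relevance=16.0):
    """MAP-style re-estimation of Gaussian mean vectors: each new mean
    interpolates between the old (prior) mean and the sample mean of the
    update frames assigned to that component; components that saw little
    update data stay close to the prior."""
    counts = np.asarray(counts, dtype=float)[:, None]            # (M, 1)
    sample_means = np.asarray(frame_sums) / np.maximum(counts, 1e-8)
    alpha = counts / (counts + relevance)    # more data -> trust the data
    return alpha * sample_means + (1.0 - alpha) * np.asarray(old_means)

old = np.array([[0.0, 0.0], [5.0, 5.0]])      # prior means (2 components)
counts = [100.0, 0.0]                         # component 1 saw no update data
sums = np.array([[120.0, 80.0], [0.0, 0.0]])  # per-component frame sums
new = map_update_means(old, counts, sums)
```

A component with no assigned update frames keeps its prior mean, while a well-observed component moves most of the way toward the sample mean of the new speech.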

[0014] Next, the equal error rate ε0 and its threshold φ0 are calculated (S3), using the registration speech at registration and the update speech at model update. That is, the time series of feature parameters of these speech samples are temporarily stored in the feature parameter storage unit 25, and the similarity between each of them and each model in the model storage unit 14 is calculated by the similarity calculation means 15. For these similarities, the speaker recognition determination means 16 makes determinations at various threshold values; in other words, a speaker recognition experiment is performed using the registration speech (or update speech), with registered speakers other than the true speaker treated as impostors. The false rejection rate curve and the false acceptance rate curve shown in FIG. 4A are thus obtained, and the error rate ε0 at which both error rates are equal and the threshold φ0 at that point are determined.

[0015] Thereafter, the threshold φ1 at which the false acceptance rate becomes (ε0 + x)% is found (S4), and the new threshold φ is calculated as wφ1 + (1 − w)φ0 (S5). This new threshold φ is set as the threshold of the corresponding speaker in the threshold storage unit 17. Then the model update count t is incremented by 1, and the process ends (S6). Steps S3, S4, S5 and S6 are performed by the threshold calculation means 27.

[0016] In FIG. 1, at t = 0 the threshold is calculated using the registration speech obtained at registration, and the threshold φ1, which yields a false acceptance rate x% higher than the error rate ε0 that gives the equal-error-rate threshold at that time, is set as the threshold; even if the model is not robust, the false rejection rate does not become excessively large. Moreover, each time the model is updated, a speaker recognition experiment is performed on the updated model using its update speech, with registered speakers other than the true speaker treated as impostors. The threshold thus obtained is close to the equal-error-rate threshold of a model that has grown closer to a robust one, and since w becomes smaller, the difference from that nearly ideal equal-error-rate threshold becomes small; a threshold slightly smaller than it is set. In other words, the more often the model update is repeated, the more desirable the threshold becomes.

[0017]

[Effects of the Invention] Next, an experimental example confirming the effects of this invention will be described. The experiment used sentence data (average sentence length: 4 seconds) uttered by 20 male speakers in five sessions (sessions A, B, C, D and E) spanning about 15 months. Ten of the men served as registered speakers and the other ten as impostors. These speech samples were converted into a conventionally used feature representation, namely a short-time series of cepstra. The cepstra were extracted at a sampling frequency of 12 kHz, with a frame length of 32 ms, a frame period of 8 ms, and LPC (Linear Predictive Coding) analysis of order 16. For registration, the 10 sentences uttered in session A were used. For the updates, the 10 sentences uttered in session B were used for the first update and the 10 sentences uttered in session C for the second update. For testing, the 5 sentences uttered in each of sessions D and E were used one sentence at a time; that is, each of the models from sessions A, B and C was tested five times per threshold. In setting the threshold, x = 1% was used.

[0018] The effects of this invention were tested in text-independent speaker recognition (see, for example, the reference "Tomoko Matsui, Sadaoki Furui: 'Comparison of text-independent speaker recognition methods using VQ and discrete/continuous HMMs', IEICE Technical Report on Speech, SP91-89, 1991"). The HMM of each speaker was represented by one state with a weighted sum of 64 Gaussian distributions (see the same reference).

[0019] The results were evaluated by the average of the false rejection rate and the false acceptance rate, and are shown in FIG. 4B. The conventional method denotes the results obtained with the equal-error-rate threshold from a speaker recognition experiment performed using all the registration speech, with registered speakers other than the true speaker treated as impostors. It can be seen that the method of this invention shows higher performance than the conventional method. These results demonstrate that the method of this invention is effective.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart showing the processing procedure in an embodiment of the method of this invention.

FIG. 2 is a block diagram showing the functional configuration of an embodiment of the apparatus of this invention.

FIG. 3 is a block diagram showing the functional configuration of a conventional speaker recognition apparatus.

FIG. 4A is a graph showing the relationship of the false rejection rate and the false acceptance rate to the threshold, and FIG. 4B shows experimental results illustrating the effects of this invention.

Continuation of the front page: (56) References: JP-A-8-123475; JP-A-62-209500; JP-A-47-41103; JP-A-7-248791; JP-B-63-29279; S. Furui, "Cepstral Analysis Technique for Automatic Speaker Verification", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-29, No. 2, April 1981, pp. 254-272; Tomoko Matsui et al., "Study of model and threshold updating methods in speaker recognition", Proceedings of the 1996 Spring Meeting of the Acoustical Society of Japan I, 1-5-6, pp. 11-12 (published March 26, 1996); Tomoko Matsui et al., "Normalization of speaker verification likelihood using phoneme- and speaker-independent models", IEICE Technical Report [Speech], Vol. 94, No. 90, SP94-22, pp. 61-66 (published June 16, 1994); Tomoko Matsui et al., "Method of updating models and thresholds in speaker verification", IEICE Technical Report [Speech], Vol. 95, No. 468, SP95-120, pp. 21-26 (published January 19, 1996); Tomoko Matsui et al., "Method of updating models and thresholds in speaker verification", Transactions of the IEICE, Vol. J81-D-II, No. 2, February 1998, pp. 268-276. (58) Field of search (Int. Cl.7, DB name): G10L 15/00 - 17/00; JICST file (JOIS)

Claims (4)

(57) [Claims]

1. A method of setting a threshold for speaker recognition, in a speaker recognition method in which an input voice is converted into a representation using feature parameters, the similarity between the input voice in that representation and voice models registered in advance for each speaker in the same representation is obtained, and the similarity is compared with a speaker determination threshold to recognize the speaker who uttered the input voice, the method comprising: calculating two error rates, a false rejection rate and a false acceptance rate, for each speaker using the voice uttered at the time of registering that speaker's model and the registered model; and setting the speaker determination threshold to a value obtained by subtracting a predetermined value from the threshold at which the two calculated error rates become equal.
2. The threshold setting method for speaker recognition according to claim 1, wherein each time the model for each speaker is updated, the two error rates are calculated using the updated model and the voice uttered at the time of the update, and the speaker determination threshold is updated to a value obtained by subtracting, from the threshold at which the two calculated error rates become equal, a value that is smaller than the predetermined value and smaller than the value used the previous time.
3. The threshold setting method for speaker recognition according to claim 1 or 2, wherein the predetermined value is set approximately equal to the upper bound of the error rate of the speaker recognition method itself.
4. A speaker recognition apparatus with threshold updating, in which an input voice is converted by feature parameter extraction means into a representation using feature parameters, a model of the input voice in this representation is created by model creation means and stored in model storage means, the similarity between the voice in the representation produced by the feature parameter extraction means and each model in the model storage means is calculated by similarity calculation means, and each calculated similarity is compared, by speaker recognition determination means, with a threshold in a threshold storage unit indicating the range over which the similarity of the true speaker's voice may vary, the voice being determined to be that of the speaker if the similarity is larger and that of another person if it is smaller, the apparatus comprising: model update means which, when a model update is instructed, updates the model of the corresponding speaker in the model storage unit using input voice in the representation based on the feature parameters from the feature parameter extraction means; and threshold calculation means which calculates a false rejection rate and a false acceptance rate for the updated model using the speech at the time of its update, and updates the threshold of the corresponding speaker in the threshold storage unit to a value obtained by subtracting a slightly smaller value from the threshold at which the rejection rate and the acceptance rate become equal.
JP08004508A 1996-01-16 1996-01-16 Speaker recognition threshold setting method and speaker recognition apparatus using the method Expired - Lifetime JP3092788B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP08004508A JP3092788B2 (en) 1996-01-16 1996-01-16 Speaker recognition threshold setting method and speaker recognition apparatus using the method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP08004508A JP3092788B2 (en) 1996-01-16 1996-01-16 Speaker recognition threshold setting method and speaker recognition apparatus using the method

Publications (2)

Publication Number Publication Date
JPH09198086A (en) 1997-07-31
JP3092788B2 (en) 2000-09-25

Family

ID=11586004

Family Applications (1)

Application Number Title Priority Date Filing Date
JP08004508A Expired - Lifetime JP3092788B2 (en) 1996-01-16 1996-01-16 Speaker recognition threshold setting method and speaker recognition apparatus using the method

Country Status (1)

Country Link
JP (1) JP3092788B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014036786A (en) * 2012-08-20 2014-02-27 Aisin Seiki Co Ltd Mattress with replaceable cover

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7039951B1 (en) 2000-06-06 2006-05-02 International Business Machines Corporation System and method for confidence based incremental access authentication
US20040044931A1 (en) * 2000-11-29 2004-03-04 Manfred Bromba Method and device for determining an error rate of biometric devices
US7072750B2 (en) * 2001-05-08 2006-07-04 Intel Corporation Method and apparatus for rejection of speech recognition results in accordance with confidence level
KR100819848B1 (en) * 2005-12-08 2008-04-08 한국전자통신연구원 Apparatus and method for speech recognition using automatic update of threshold for utterance verification
JP6407633B2 (en) * 2014-09-02 2018-10-17 株式会社Kddiテクノロジー Communication device, determination method update method and program for voiceprint data
JP6407634B2 (en) * 2014-09-02 2018-10-17 株式会社Kddiテクノロジー Communication device, voice print data determination result notification method, and program
US11837238B2 (en) * 2020-10-21 2023-12-05 Google Llc Assessing speaker recognition performance

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-29, No. 2, April 1981, S. Furui, "Cepstral Analysis Technique for Automatic Speaker Verification", pp. 254-272
Proceedings of the 1996 Spring Meeting of the Acoustical Society of Japan I, 1-5-6, Tomoko Matsui et al., "A Study of Methods for Updating Models and Thresholds in Speaker Recognition", pp. 11-12 (published March 26, 1996)
IEICE Technical Report [Speech], Vol. 94, No. 90, SP94-22, Tomoko Matsui et al., "Normalization of Speaker Verification Likelihood Using a Phoneme- and Speaker-Independent Model", pp. 61-66 (published June 16, 1994)
IEICE Technical Report [Speech], Vol. 95, No. 468, SP95-120, Tomoko Matsui et al., "Methods for Updating Models and Thresholds in Speaker Verification", pp. 21-26 (published January 19, 1996)
Transactions of the IEICE, Vol. J81-D-II, No. 2, February 1998, Tomoko Matsui et al., "Methods for Updating Models and Thresholds in Speaker Verification", pp. 268-276 (published February 25, 1998)

Also Published As

Publication number Publication date
JPH09198086A (en) 1997-07-31

Similar Documents

Publication Publication Date Title
EP3719798B1 (en) Voiceprint recognition method and device based on memorability bottleneck feature
US9536525B2 (en) Speaker indexing device and speaker indexing method
US7013276B2 (en) Method of assessing degree of acoustic confusability, and system therefor
JP4355322B2 (en) Speech recognition method based on reliability of keyword model weighted for each frame, and apparatus using the method
Masuko et al. Imposture using synthetic speech against speaker verification based on spectrum and pitch
JPH075892A (en) Voice recognition method
JP4897040B2 (en) Acoustic model registration device, speaker recognition device, acoustic model registration method, and acoustic model registration processing program
CN106653002A (en) Literal live broadcasting method and platform
Munteanu et al. Automatic speaker verification experiments using HMM
Jin et al. Overview of front-end features for robust speaker recognition
JP3092788B2 (en) Speaker recognition threshold setting method and speaker recognition apparatus using the method
Ozaydin Design of a text independent speaker recognition system
Ilyas et al. Speaker verification using vector quantization and hidden Markov model
KR100551953B1 (en) Apparatus and Method for Distinction Using Pitch and MFCC
JP3216565B2 (en) Speaker model adaptation method for speech model, speech recognition method using the method, and recording medium recording the method
Singh et al. Features and techniques for speaker recognition
JP3036509B2 (en) Method and apparatus for determining threshold in speaker verification
Nair et al. A reliable speaker verification system based on LPCC and DTW
JPH07271392A (en) Degree of similarity normalization method for speaker recognition and speaker recognition device using the method
JP2001350494A (en) Device and method for collating
Sailaja et al. Text independent speaker identification with finite multivariate generalized gaussian mixture model and hierarchical clustering algorithm
Fakotakis et al. A continuous HMM text-independent speaker recognition system based on vowel spotting.
JPH05323990A (en) Talker recognizing method
JPH09198084A (en) Method and device for speaker recognition accompanied by model update
Dutta et al. A comparative study on feature dependency of the Manipuri language based phonetic engine

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20070728

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080728

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090728

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100728

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110728

Year of fee payment: 11

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120728

Year of fee payment: 12

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130728

Year of fee payment: 13

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

EXPY Cancellation because of completion of term