TW201833810A - Method and system of authentication based on voiceprint recognition - Google Patents
Method and system of authentication based on voiceprint recognition
- Publication number
- TW201833810A (application TW106135250A)
- Authority
- TW
- Taiwan
- Prior art keywords
- voiceprint
- vector
- voice data
- verification
- user
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
- H04L63/0861—Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
Abstract
Description
The present invention relates to the field of communication technology, and in particular to a method and system for identity verification based on voiceprint recognition.
At present, the business scope of large financial companies spans insurance, banking, investment, and other areas. Each business area usually requires communication with customers, and this communication takes many forms (for example, telephone or face-to-face). Before communicating, verifying the customer's identity is an important part of ensuring business security. To meet the real-time demands of the business, financial companies usually verify customer identities manually. Because the customer base is large, manual discriminant analysis for identity verification is neither accurate nor efficient.
An object of the present invention is to provide a method and system for identity verification based on voiceprint recognition, aiming to improve the accuracy and efficiency of user identity verification.
To achieve the above object, the present invention provides a method for identity verification based on voiceprint recognition, comprising the steps of: S1, after receiving voice data from a user undergoing identity verification, acquiring the voiceprint features of the voice data and constructing a corresponding voiceprint feature vector based on those features; S2, inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the voice data; S3, calculating the spatial distance between the current voiceprint discrimination vector and the user's pre-stored standard voiceprint discrimination vector, authenticating the user based on that distance, and generating a verification result.
Preferably, step S1 comprises the sub-steps of: S11, performing pre-emphasis, framing, and windowing on the voice data; S12, performing a Fourier transform on each windowed frame to obtain the corresponding spectrum; S13, passing the spectrum through a Mel filter bank to obtain the Mel spectrum; S14, performing cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), and composing the corresponding voiceprint feature vector from those coefficients.
Preferably, step S3 comprises the sub-steps of: S31, calculating the cosine distance between the current voiceprint discrimination vector and the user's pre-stored standard voiceprint discrimination vector, cos θ = (A·B) / (||A|| ||B||), where A is the standard voiceprint discrimination vector and B is the current voiceprint discrimination vector; S32, if the cosine distance is less than or equal to a preset distance threshold, generating information that verification passed; S33, if the cosine distance is greater than the preset distance threshold, generating information that verification failed.
Preferably, the background channel model is a Gaussian mixture model, and step S1 is preceded by the steps of: acquiring a preset number of voice data samples, acquiring the voiceprint features corresponding to each sample, and constructing the voiceprint feature vector of each sample based on those features; dividing the voiceprint feature vectors of the samples into a training set of a first proportion and a validation set of a second proportion, the sum of the two proportions being less than or equal to 1; training the Gaussian mixture model with the voiceprint feature vectors of the training set and, after training, verifying the accuracy of the trained Gaussian mixture model with the validation set; if the accuracy exceeds a preset threshold, ending model training and using the trained Gaussian mixture model as the background channel model of step S2, or, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining on the enlarged sample set.
Preferably, step S3 is replaced by: calculating the spatial distance between the current voiceprint discrimination vector and each pre-stored standard voiceprint discrimination vector, obtaining the minimum spatial distance, authenticating the user based on that minimum distance, and generating a verification result.
To achieve the above object, the present invention also provides a system for identity verification based on voiceprint recognition, comprising: a first acquisition module for acquiring, after receiving voice data from a user undergoing identity verification, the voiceprint features of the voice data and constructing a corresponding voiceprint feature vector based on those features; a construction module for inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the voice data; and a first verification module for calculating the spatial distance between the current voiceprint discrimination vector and the user's pre-stored standard voiceprint discrimination vector, authenticating the user based on that distance, and generating a verification result.
Preferably, the first acquisition module is specifically configured to perform pre-emphasis, framing, and windowing on the voice data; perform a Fourier transform on each windowed frame to obtain the corresponding spectrum; pass the spectrum through a Mel filter bank to obtain the Mel spectrum; and perform cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), from which the corresponding voiceprint feature vector is composed.
Preferably, the first verification module is specifically configured to calculate the cosine distance between the current voiceprint discrimination vector and the user's pre-stored standard voiceprint discrimination vector, cos θ = (A·B) / (||A|| ||B||), where A is the standard voiceprint discrimination vector and B is the current voiceprint discrimination vector; to generate information that verification passed if the cosine distance is less than or equal to a preset distance threshold; and to generate information that verification failed if the cosine distance is greater than the preset distance threshold.
Preferably, the system for identity verification based on voiceprint recognition further comprises: a second acquisition module for acquiring a preset number of voice data samples, acquiring the voiceprint features corresponding to each sample, and constructing the voiceprint feature vector of each sample based on those features; a division module for dividing the voiceprint feature vectors of the samples into a training set of a first proportion and a validation set of a second proportion, the sum of the two proportions being less than or equal to 1; a training module for training the Gaussian mixture model with the voiceprint feature vectors of the training set and, after training, verifying the accuracy of the trained Gaussian mixture model with the validation set; and a processing module for ending model training and using the trained Gaussian mixture model as the background channel model if the accuracy exceeds a preset threshold, or, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining on the enlarged sample set.
Preferably, the first verification module is replaced by a second verification module for calculating the spatial distance between the current voiceprint discrimination vector and each pre-stored standard voiceprint discrimination vector, obtaining the minimum spatial distance, authenticating the user based on that minimum distance, and generating a verification result.
The beneficial effects of the present invention are as follows: the pre-trained background channel model is obtained by mining and comparing a large amount of voice data. This model can precisely characterize the background voiceprint features of a user's speech while preserving the user's own voiceprint features to the greatest extent, and can remove the background features during recognition so as to extract the intrinsic features of the user's voice, thereby greatly improving the accuracy of user identity verification and improving its efficiency.
1‧‧‧Electronic device
10‧‧‧System for identity verification based on voiceprint recognition
101‧‧‧First acquisition module
102‧‧‧Construction module
103‧‧‧First verification module
11‧‧‧Memory
12‧‧‧Processor
13‧‧‧Display
S1‧‧‧Step
S11‧‧‧Sub-step
S12‧‧‧Sub-step
S13‧‧‧Sub-step
S14‧‧‧Sub-step
S2‧‧‧Step
S3‧‧‧Step
S31‧‧‧Sub-step
S32‧‧‧Sub-step
S33‧‧‧Sub-step
FIG. 1 is a flowchart of a preferred embodiment of the method for identity verification based on voiceprint recognition of the present invention.
FIG. 2 is a detailed flowchart of step S1 shown in FIG. 1.
FIG. 3 is a detailed flowchart of step S3 shown in FIG. 1.
FIG. 4 is a schematic diagram of the operating environment of a preferred embodiment of the system for identity verification based on voiceprint recognition of the present invention.
FIG. 5 is a structural schematic diagram of a preferred embodiment of the system for identity verification based on voiceprint recognition of the present invention.
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given are intended only to explain the present invention, not to limit its scope.
As shown in FIG. 1, FIG. 1 is a flowchart of an embodiment of the method for identity verification based on voiceprint recognition of the present invention. The method may be executed by a system for identity verification based on voiceprint recognition; the system may be implemented in software and/or hardware and may be integrated in a server. The method comprises the following steps. Step S1: after receiving voice data from a user undergoing identity verification, acquire the voiceprint features of the voice data and construct a corresponding voiceprint feature vector based on those features. In this embodiment, the voice data is collected by a voice collection device (for example, a microphone), which sends the collected voice data to the system for identity verification based on voiceprint recognition.
When collecting voice data, environmental noise and interference from the collection device should be minimized. The voice collection device should be kept at an appropriate distance from the user, and a device with high distortion should be avoided where possible; the power supply should preferably be mains power with a stable current, and a sensor should be used when recording telephone calls. Before the voiceprint features are extracted, the voice data may be denoised to further reduce interference. So that voiceprint features can be extracted, the collected voice data has a preset data length, or a length greater than the preset data length.
Voiceprint features come in many types, for example wideband voiceprints, narrowband voiceprints, and amplitude voiceprints. The voiceprint features of this embodiment are preferably the Mel-Frequency Cepstrum Coefficients (MFCC) of the voice data. When constructing the corresponding voiceprint feature vector, the voiceprint features of the voice data are composed into a feature data matrix, and this matrix is the voiceprint feature vector of the voice data.
Step S2: input the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the voice data. The voiceprint feature vector is input into the pre-trained background channel model, which is preferably a Gaussian mixture model; the background channel model is used to process the voiceprint feature vector and produce the corresponding current voiceprint discrimination vector (i.e., the i-vector).
Specifically, the calculation proceeds as follows. 1) Selecting Gaussian components: first, the parameters of the universal background channel model are used to compute the log-likelihood of each frame of data under the different Gaussian components. The columns of the log-likelihood matrix are sorted in parallel, the top N Gaussian components are selected, and a matrix of per-frame values under the Gaussian mixture model is finally obtained: Loglike = E(X) * D(X)^-1 * X^T - 0.5 * D(X)^-1 * (X.^2)^T, where Loglike is the log-likelihood matrix, E(X) is the mean matrix trained in the universal background channel model, D(X) is the covariance matrix, X is the data matrix, and X.^2 squares each entry of the matrix.
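As a concrete illustration of this selection step, the following is a minimal numpy sketch under the assumption of a diagonal-covariance universal background model; `means` and `variances` stand in for the mean matrix E(X) and covariance matrix D(X), the per-component constant term of the Gaussian log-density is omitted because it does not affect the per-frame ranking, and the value of N (`top_n`) is arbitrary, as the patent fixes none of these details.

```python
import numpy as np

def select_top_gaussians(feats, means, variances, top_n=20):
    """feats: (n_frames, D) feature matrix; means, variances: (K, D),
    one diagonal-covariance Gaussian per row. Returns the log-likelihoods
    and indices of the top_n best-scoring Gaussians for each frame."""
    # Loglike = E(X) * D(X)^-1 * X^T - 0.5 * D(X)^-1 * (X.^2)^T  -> (K, n_frames)
    loglike = (means / variances) @ feats.T \
        - 0.5 * (1.0 / variances) @ (feats ** 2).T
    order = np.argsort(-loglike, axis=0)       # sort each column (frame) descending
    top_idx = order[:top_n]                    # (top_n, n_frames) component indices
    top_loglike = np.take_along_axis(loglike, top_idx, axis=0)
    return top_loglike, top_idx
```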
2) Computing posterior probabilities: each frame of data X is used to compute X * X^T, yielding a symmetric matrix that can be reduced to its lower triangle; the elements are flattened in order into a single row, giving a vector whose dimension is the number of lower-triangular entries, and the vectors of all N frames are combined into a new data matrix. The covariance matrices used for probability computation in the universal background model are likewise each reduced to a lower triangle, giving a matrix analogous to the new data matrix. The log-likelihood of each frame under the selected Gaussian components is then computed from the mean matrix and covariance matrix of the universal background channel model, followed by a softmax regression and, finally, a normalization step, yielding the posterior probability distribution of each frame under the Gaussian mixture model. The per-frame probability distribution vectors are assembled into a probability matrix.
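The softmax-and-normalize step can be sketched as follows. This is only an illustration of turning per-frame log-likelihoods into the posterior probability matrix; the lower-triangle bookkeeping described above is omitted.

```python
import numpy as np

def posterior_matrix(loglike):
    """loglike: (K, n_frames) log-likelihoods; returns the (n_frames, K)
    posterior probability matrix, one distribution per frame."""
    shifted = loglike - loglike.max(axis=0, keepdims=True)  # numerical stability
    w = np.exp(shifted)                                     # softmax numerator
    posts = w / w.sum(axis=0, keepdims=True)                # normalize per frame
    return posts.T                                          # each row sums to 1
```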
3) Extracting the current voiceprint discrimination vector: first, the first-order and second-order coefficients are computed. The first-order coefficients can be obtained by summing the columns of the probability matrix: Gamma_i = Σ_j loglikes_ji, where Gamma_i is the i-th element of the first-order coefficient vector and loglikes_ji is the i-th element of the j-th row of the probability matrix.
The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix: X = Loglike^T * feats, where X is the second-order coefficient matrix, Loglike is the probability matrix, and feats is the feature data matrix.
After the first-order and second-order coefficients have been computed, the linear and quadratic terms are computed in parallel, and the current voiceprint discrimination vector is then computed from the linear and quadratic terms.
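A short sketch of the two statistics just defined, taking the posterior ("probability") matrix and the feature data matrix as inputs; the subsequent linear/quadratic terms and the i-vector solve itself are omitted, since the patent does not spell them out.

```python
import numpy as np

def sufficient_statistics(posts, feats):
    """posts: (n_frames, K) probability matrix (loglikes_ji in the text);
    feats: (n_frames, D) feature data matrix."""
    gamma = posts.sum(axis=0)   # Gamma_i = sum_j loglikes_ji, column sums -> (K,)
    second = posts.T @ feats    # X = Loglike^T * feats              -> (K, D)
    return gamma, second
```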
Preferably, the background channel model is a Gaussian mixture model, and step S1 above is preceded by: acquiring a preset number of voice data samples, acquiring the voiceprint features corresponding to each sample, and constructing the voiceprint feature vector of each sample based on those features; dividing the voiceprint feature vectors of the samples into a training set of a first proportion and a validation set of a second proportion, the sum of the two proportions being less than or equal to 1; training the Gaussian mixture model with the voiceprint feature vectors of the training set and, after training, verifying the accuracy of the trained Gaussian mixture model with the validation set; if the accuracy exceeds a preset threshold, ending model training and using the trained Gaussian mixture model as the background channel model of step S2, or, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining on the enlarged sample set.
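The training loop just described might look like the following sketch, using scikit-learn's GaussianMixture as the Gaussian mixture model. The 70/20 split, the component count, the accuracy threshold, and the accuracy() scoring function are all assumptions (the patent fixes none of them); accuracy() in particular is a hypothetical stand-in, since the patent does not define how the validation set is scored.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_background_model(sample_features, train_ratio=0.7, val_ratio=0.2,
                           n_components=64, accuracy_threshold=0.95, seed=0):
    """sample_features: list of (n_frames, D) voiceprint feature matrices."""
    idx = np.random.default_rng(seed).permutation(len(sample_features))
    n_train = int(train_ratio * len(idx))
    n_val = int(val_ratio * len(idx))        # train + val proportions sum to <= 1
    train = np.vstack([sample_features[i] for i in idx[:n_train]])
    val = [sample_features[i] for i in idx[n_train:n_train + n_val]]

    gmm = GaussianMixture(n_components=n_components, covariance_type='diag')
    gmm.fit(train)                           # unsupervised EM training

    if accuracy(gmm, val) > accuracy_threshold:   # accuracy() is a stand-in
        return gmm                           # becomes the background channel model
    raise RuntimeError("accuracy too low: add voice data samples and retrain")
```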
When the Gaussian mixture model is trained with the voiceprint feature vectors of the training set, the likelihood of an extracted D-dimensional voiceprint feature can be expressed with K Gaussian components as: P(x) = Σ_{k=1..K} w_k p(x|k), where P(x) is the probability of a voice data sample under the Gaussian mixture model, w_k is the weight of the k-th Gaussian component, p(x|k) is the probability of the sample under the k-th Gaussian component, and K is the number of Gaussian components.
The parameters of the whole Gaussian mixture model can be written as {w_i, μ_i, Σ_i}, where w_i is the weight of the i-th Gaussian component, μ_i its mean, and Σ_i its covariance. The Gaussian mixture model can be trained with the unsupervised EM algorithm. After training, the weight vector, constant vector, N covariance matrices, and the matrix of means multiplied by covariances of the Gaussian mixture model are obtained; together these constitute a trained Gaussian mixture model.
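For illustration, evaluating P(x) = Σ_k w_k p(x|k) with the parameters {w_i, μ_i, Σ_i} can be sketched as follows, assuming diagonal covariance matrices (an assumption; the patent does not restrict Σ_i).

```python
import numpy as np

def mixture_density(x, weights, means, variances):
    """P(x) = sum_k w_k * p(x|k) for diagonal-covariance components.
    x: (D,) feature vector; weights: (K,); means, variances: (K, D)."""
    d = x.shape[0]
    norm = (2 * np.pi) ** (-d / 2) / np.sqrt(np.prod(variances, axis=1))
    expo = np.exp(-0.5 * np.sum((x - means) ** 2 / variances, axis=1))
    return float(np.sum(weights * norm * expo))
```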
Step S3: calculate the spatial distance between the current voiceprint discrimination vector and the user's pre-stored standard voiceprint discrimination vector, authenticate the user based on that distance, and generate a verification result.
There are many possible distances between vectors, including the cosine distance and the Euclidean distance. Preferably, the spatial distance of this embodiment is the cosine distance, which uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals.
The standard voiceprint discrimination vector is a voiceprint discrimination vector obtained and stored in advance. When stored, it carries the identification information of its corresponding user, so it can accurately represent that user's identity. Before the spatial distance is calculated, the stored voiceprint discrimination vector is retrieved according to the identification information provided by the user.
When the calculated spatial distance is less than or equal to the preset distance threshold, verification passes; otherwise, verification fails.
Compared with the prior art, the background channel model pre-trained in this embodiment is obtained by mining and comparing a large amount of voice data. This model can precisely characterize the background voiceprint features of a user's speech while preserving the user's own voiceprint features to the greatest extent, and can remove the background features during recognition so as to extract the intrinsic features of the user's voice, thereby greatly improving the accuracy of user identity verification and improving its efficiency. Moreover, this embodiment makes full use of the vocal-tract-related voiceprint features of the human voice; such features impose no restriction on the spoken text, which gives greater flexibility in recognition and verification.
In a preferred embodiment, as shown in FIG. 2 and building on the embodiment of FIG. 1, step S1 comprises: sub-step S11, performing pre-emphasis, framing, and windowing on the voice data. In this embodiment, the voice data is processed after the voice data of the user undergoing identity verification is received. The pre-emphasis is in fact high-pass filtering, which removes low-frequency content so that the high-frequency characteristics of the voice data stand out; specifically, the transfer function of the high-pass filter is H(Z) = 1 - αZ^-1, where Z is the voice data and α is a constant coefficient, preferably 0.97. Because a sound signal is stationary only over short intervals, it is divided into N short-time segments (i.e., N frames), and, to avoid losing the continuity of the sound, adjacent frames share an overlap region, generally 1/2 of the frame length. After framing, each frame is treated as a stationary signal; however, because of the Gibbs effect, the start and end of each frame are discontinuous, and framing makes the signal deviate further from the original speech, so the voice data must also be windowed.
Sub-step S12: perform a Fourier transform on each windowed frame to obtain the corresponding spectrum. Sub-step S13: pass the spectrum through a Mel filter bank to obtain the Mel spectrum. Sub-step S14: perform cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), and compose the corresponding voiceprint feature vector from those coefficients. The cepstral analysis consists, for example, of taking the logarithm and applying an inverse transform; the inverse transform is generally implemented as a DCT (discrete cosine transform), and the 2nd through 13th coefficients after the DCT are taken as the MFCC coefficients. The MFCC coefficients are the voiceprint features of the frame; the MFCC coefficients of all frames are composed into a feature data matrix, which is the voiceprint feature vector of the voice data.
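Sub-steps S11-S14 can be sketched end-to-end as follows. The frame length, the Hamming window, and the mel_filterbank() helper are assumptions the patent does not fix (it specifies only the pre-emphasis coefficient, the 1/2-frame overlap, and keeping the 2nd-13th DCT coefficients); mel_filterbank() is a hypothetical stand-in for a triangular Mel filter bank.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, frame_len=400, alpha=0.97, n_coeffs=12):
    """signal: 1-D float array of voice samples; returns (n_frames, n_coeffs)."""
    # S11: pre-emphasis H(Z) = 1 - alpha * Z^-1, then framing with 1/2 overlap
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    hop = frame_len // 2                      # adjacent frames overlap by half
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)   # windowing against the Gibbs effect
    # S12: Fourier transform of each windowed frame -> power spectrum
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # S13: Mel filter bank -> Mel spectrum (mel_filterbank() is a stand-in)
    mel_spec = spectrum @ mel_filterbank(n_fft=frame_len).T
    # S14: cepstral analysis: log, then DCT; keep the 2nd-13th coefficients
    cepstra = dct(np.log(mel_spec + 1e-10), axis=1, norm='ortho')
    return cepstra[:, 1:1 + n_coeffs]         # the MFCC feature data matrix
```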
In a preferred embodiment, as shown in FIG. 3 and building on the embodiment of FIG. 1, step S3 comprises: sub-step S31, calculating the cosine distance between the current voiceprint discrimination vector and the user's pre-stored standard voiceprint discrimination vector, cos θ = (A·B) / (||A|| ||B||), where A is the standard voiceprint discrimination vector and B is the current voiceprint discrimination vector; sub-step S32, if the cosine distance is less than or equal to the preset distance threshold, generating information that verification passed; and sub-step S33, if the cosine distance is greater than the preset distance threshold, generating information that verification failed.
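A minimal sketch of sub-steps S31-S33 follows. The threshold value is an assumption, and the distance is computed here as 1 - cos θ, a common convention consistent with the rule that a smaller distance passes; the exact form behind the patent's dropped formula is not recoverable.

```python
import numpy as np

def verify(current_vec, standard_vec, threshold=0.4):
    """S31-S33: cosine distance between i-vectors vs. a preset threshold."""
    cos_theta = np.dot(standard_vec, current_vec) / (
        np.linalg.norm(standard_vec) * np.linalg.norm(current_vec))
    distance = 1.0 - cos_theta          # smaller distance = more similar voices
    if distance <= threshold:
        return "verification passed"    # S32
    return "verification failed"        # S33
```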
In a preferred embodiment, building on the embodiment of FIG. 1, step S3 above is replaced by: calculating the spatial distance between the current voiceprint discrimination vector and each pre-stored standard voiceprint discrimination vector, obtaining the minimum spatial distance, authenticating the user based on that minimum distance, and generating a verification result.
This embodiment differs from the embodiment of FIG. 1 in that the standard voiceprint discrimination vectors are stored without the users' identification information. When verifying a user's identity, the spatial distance between the current voiceprint discrimination vector and each pre-stored standard voiceprint discrimination vector is calculated and the minimum spatial distance is taken; if that minimum distance is less than a preset distance threshold (the same as or different from the distance threshold of the embodiment above), verification passes, otherwise it fails.
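The 1:N variant can be sketched as follows; stored_vecs holds every enrolled standard vector, and the 1 - cos θ distance and the threshold value are the same assumptions as in the sketch above.

```python
import numpy as np

def verify_against_all(current_vec, stored_vecs, threshold=0.4):
    """stored_vecs: (M, D) matrix holding every standard voiceprint vector."""
    cos_theta = stored_vecs @ current_vec / (
        np.linalg.norm(stored_vecs, axis=1) * np.linalg.norm(current_vec))
    min_distance = float(np.min(1.0 - cos_theta))
    return min_distance < threshold  # pass iff the closest enrollee is near enough
```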
Referring to FIG. 4, FIG. 4 is a schematic diagram of the operating environment of a preferred embodiment of the system 10 for identity verification based on voiceprint recognition of the present invention.
In this embodiment, the system 10 for identity verification based on voiceprint recognition is installed and runs in an electronic device 1. The electronic device 1 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a server. The electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a display 13. FIG. 4 shows only the electronic device 1 with components 11-13, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, for example its hard disk or internal memory. In other embodiments, the memory 11 may be an external storage device of the electronic device 1, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 is used to store the application software installed on the electronic device 1 and various kinds of data, for example the program code of the system 10 for identity verification based on voiceprint recognition. The memory 11 may also be used to temporarily store data that has been output or is about to be output.
In some embodiments, the processor 12 may be a Central Processing Unit (CPU), a microprocessor, or another data processing chip, used to run the program code or process the data stored in the memory 11, for example to execute the system 10 for identity verification based on voiceprint recognition.
In some embodiments, the display 13 may be an LED display, a liquid crystal display, a touch liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display 13 is used to display the information processed in the electronic device 1 and to display a visual user interface, for example a voiceprint recognition interface. The components 11-13 of the electronic device 1 communicate with one another over a system bus.
Referring to FIG. 5, FIG. 5 is a functional module diagram of a preferred embodiment of the system 10 for identity verification based on voiceprint recognition of the present invention. In this embodiment, the system 10 may be divided into one or more modules; the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to carry out the present invention. For example, in FIG. 5, the system 10 may be divided into a first acquisition module 101, a construction module 102, and a first verification module 103. A module, as referred to in the present invention, is a series of computer program instruction segments capable of performing a specific function, and is better suited than a whole program for describing how the system 10 executes in the electronic device 1. The first acquisition module 101 is used to acquire, after receiving voice data from a user undergoing identity verification, the voiceprint features of the voice data and to construct a corresponding voiceprint feature vector based on those features. In this embodiment, the voice data is collected by a voice collection device (for example, a microphone), which sends the collected voice data to the system for identity verification based on voiceprint recognition.
When collecting voice data, environmental noise and interference from the collection device should be minimized. The voice collection device should be kept at an appropriate distance from the user, and a device with high distortion should be avoided where possible; the power supply should preferably be mains power with a stable current, and a sensor should be used when recording telephone calls. Before the voiceprint features are extracted, the voice data may be denoised to further reduce interference. So that voiceprint features can be extracted, the collected voice data has a preset data length, or a length greater than the preset data length.
Voiceprint features come in many types, for example wideband voiceprints, narrowband voiceprints, and amplitude voiceprints. The voiceprint features of this embodiment are preferably the Mel-Frequency Cepstrum Coefficients (MFCC) of the voice data. When constructing the corresponding voiceprint feature vector, the voiceprint features of the voice data are composed into a feature data matrix, and this matrix is the voiceprint feature vector of the voice data.
The construction module 102 is used to input the voiceprint feature vector into the pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the voice data. The voiceprint feature vector is input into the pre-trained background channel model, which is preferably a Gaussian mixture model; the background channel model is used to process the voiceprint feature vector and produce the corresponding current voiceprint discrimination vector (i.e., the i-vector).
Specifically, the calculation proceeds as follows.
1) Selecting Gaussian components: first, the parameters of the universal background channel model are used to compute the log-likelihood of each frame of data under the different Gaussian components. The columns of the log-likelihood matrix are sorted in parallel, the top N Gaussian components are selected, and a matrix of per-frame values under the Gaussian mixture model is finally obtained: Loglike = E(X) * D(X)^-1 * X^T - 0.5 * D(X)^-1 * (X.^2)^T, where Loglike is the log-likelihood matrix, E(X) is the mean matrix trained in the universal background channel model, D(X) is the covariance matrix, X is the data matrix, and X.^2 squares each entry of the matrix.
2) Computing posterior probabilities: each frame of data X is used to compute X * X^T, yielding a symmetric matrix that can be reduced to its lower triangle; the elements are flattened in order into a single row, giving a vector whose dimension is the number of lower-triangular entries, and the vectors of all N frames are combined into a new data matrix. The covariance matrices used for probability computation in the universal background model are likewise each reduced to a lower triangle, giving a matrix analogous to the new data matrix. The log-likelihood of each frame under the selected Gaussian components is then computed from the mean matrix and covariance matrix of the universal background channel model, followed by a softmax regression and, finally, a normalization step, yielding the posterior probability distribution of each frame under the Gaussian mixture model. The per-frame probability distribution vectors are assembled into a probability matrix.
3) Extracting the current voiceprint discrimination vector: first, the first-order and second-order coefficients are computed. The first-order coefficients can be obtained by summing the columns of the probability matrix: Gamma_i = Σ_j loglikes_ji, where Gamma_i is the i-th element of the first-order coefficient vector and loglikes_ji is the i-th element of the j-th row of the probability matrix.
The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix: X = Loglike^T * feats, where X is the second-order coefficient matrix, Loglike is the probability matrix, and feats is the feature data matrix.
After the first-order and second-order coefficients have been computed, the linear and quadratic terms are computed in parallel, and the current voiceprint discrimination vector is then computed from the linear and quadratic terms.
Preferably, the background channel model is a Gaussian mixture model, and the system for identity verification based on voiceprint recognition further comprises: a second acquisition module for acquiring a preset number of voice data samples, acquiring the voiceprint features corresponding to each sample, and constructing the voiceprint feature vector of each sample based on those features; a division module for dividing the voiceprint feature vectors of the samples into a training set of a first proportion and a validation set of a second proportion, the sum of the two proportions being less than or equal to 1; a training module for training the Gaussian mixture model with the voiceprint feature vectors of the training set and, after training, verifying the accuracy of the trained Gaussian mixture model with the validation set; and a processing module for ending model training and using the trained Gaussian mixture model as the background channel model if the accuracy exceeds a preset threshold, or, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining on the enlarged sample set.
When the Gaussian mixture model is trained with the voiceprint feature vectors of the training set, the likelihood of an extracted D-dimensional voiceprint feature can be expressed with K Gaussian components as: P(x) = Σ_{k=1..K} w_k p(x|k), where P(x) is the probability of a voice data sample under the Gaussian mixture model, w_k is the weight of the k-th Gaussian component, p(x|k) is the probability of the sample under the k-th Gaussian component, and K is the number of Gaussian components.
The parameters of the whole Gaussian mixture model can be written as {w_i, μ_i, Σ_i}, where w_i is the weight of the i-th Gaussian component, μ_i its mean, and Σ_i its covariance. The Gaussian mixture model can be trained with the unsupervised EM algorithm. After training, the weight vector, constant vector, N covariance matrices, and the matrix of means multiplied by covariances of the Gaussian mixture model are obtained; together these constitute a trained Gaussian mixture model.
The first verification module 103 is used to calculate the spatial distance between the current voiceprint discrimination vector and the user's pre-stored standard voiceprint discrimination vector, to authenticate the user based on that distance, and to generate a verification result.
There are many possible distances between vectors, including the cosine distance and the Euclidean distance. Preferably, the spatial distance of this embodiment is the cosine distance, which uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals.
The standard voiceprint discrimination vector is a voiceprint discrimination vector obtained and stored in advance. When stored, it carries the identification information of its corresponding user, so it can accurately represent that user's identity. Before the spatial distance is calculated, the stored voiceprint discrimination vector is retrieved according to the identification information provided by the user.
When the calculated spatial distance is less than or equal to the preset distance threshold, verification passes; otherwise, verification fails.
In a preferred embodiment, building on the embodiment of FIG. 5, the first acquisition module 101 is specifically configured to perform pre-emphasis, framing, and windowing on the voice data; perform a Fourier transform on each windowed frame to obtain the corresponding spectrum; pass the spectrum through a Mel filter bank to obtain the Mel spectrum; and perform cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), from which the corresponding voiceprint feature vector is composed.
The pre-emphasis is in fact high-pass filtering, which removes low-frequency content so that the high-frequency characteristics of the voice data stand out; specifically, the transfer function of the high-pass filter is H(Z) = 1 - αZ^-1, where Z is the voice data and α is a constant coefficient, preferably 0.97. Because a sound signal is stationary only over short intervals, it is divided into N short-time segments (i.e., N frames), and, to avoid losing the continuity of the sound, adjacent frames share an overlap region, generally 1/2 of the frame length. After framing, each frame is treated as a stationary signal; however, because of the Gibbs effect, the start and end of each frame are discontinuous, and framing makes the signal deviate further from the original speech, so the voice data must also be windowed.
The cepstral analysis consists, for example, of taking the logarithm and applying an inverse transform; the inverse transform is generally implemented as a DCT (discrete cosine transform), and the 2nd through 13th coefficients after the DCT are taken as the MFCC coefficients. The MFCC coefficients are the voiceprint features of the frame; the MFCC coefficients of all frames are composed into a feature data matrix, which is the voiceprint feature vector of the voice data.
In a preferred embodiment, building on the embodiment of FIG. 5, the first verification module 103 is specifically configured to calculate the cosine distance between the current voiceprint discrimination vector and the user's pre-stored standard voiceprint discrimination vector, cos θ = (A·B) / (||A|| ||B||), where A is the standard voiceprint discrimination vector and B is the current voiceprint discrimination vector; to generate information that verification passed if the cosine distance is less than or equal to the preset distance threshold; and to generate information that verification failed if the cosine distance is greater than the preset distance threshold.
In a preferred embodiment, building on the embodiment of FIG. 4, the first verification module above is replaced by a second verification module for calculating the spatial distance between the current voiceprint discrimination vector and each pre-stored standard voiceprint discrimination vector, obtaining the minimum spatial distance, authenticating the user based on that minimum distance, and generating a verification result.
This embodiment differs from the embodiment of FIG. 5 in that the standard voiceprint identification vectors are stored without the users' identification information. When verifying a user's identity, the spatial distance between the current voiceprint identification vector and each pre-stored standard voiceprint identification vector is computed and the smallest spatial distance is taken; if that smallest spatial distance is less than a preset distance threshold (which may be the same as or different from the distance threshold of the preceding embodiment), verification passes; otherwise verification fails.
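The 1:N variant can be sketched in the same way. Euclidean distance is assumed here as the spatial distance, since the text does not fix the metric, and the threshold is again illustrative:

```python
import numpy as np

def identify(current: np.ndarray, enrolled: list[np.ndarray],
             threshold: float = 0.8) -> bool:
    """Match against every stored standard vector; no claimed identity needed."""
    # Compute the spatial distance to each pre-stored standard vector
    # and keep only the smallest one.
    min_distance = min(np.linalg.norm(current - std) for std in enrolled)
    # Pass only if the closest enrolled voiceprint is near enough.
    return min_distance < threshold
```

The trade-off implied by the text is that this variant needs no claimed identity alongside the stored vectors, at the cost of comparing against every enrolled voiceprint on each verification.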
The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201710147695.X | 2017-03-13 | |
CN201710147695.XA (CN107068154A) | 2017-03-13 | 2017-03-13 | The method and system of authentication based on Application on Voiceprint Recognition
Publications (2)

Publication Number | Publication Date
---|---
TW201833810A | 2018-09-16
TWI641965B | 2018-11-21
Family
ID=59622093
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
TW106135250A (TWI641965B) | Method and system of authentication based on voiceprint recognition | 2017-03-13 | 2017-10-13
Country Status (3)

Country | Link
---|---
CN (2) | CN107068154A
TW (1) | TWI641965B
WO (2) | WO2018166112A1
2017
- 2017-03-13: CN application CN201710147695.XA filed, published as CN107068154A (active, pending)
- 2017-06-30: WO application PCT/CN2017/091361 filed, published as WO2018166112A1 (active, application filing)
- 2017-08-20: CN application CN201710715433.9A filed, published as CN107517207A (active, pending)
- 2017-09-30: WO application PCT/CN2017/105031 filed, published as WO2018166187A1 (active, application filing)
- 2017-10-13: TW application TW106135250A filed, published as TWI641965B (active)
Also Published As

Publication Number | Publication Date
---|---
CN107068154A | 2017-08-18
WO2018166112A1 | 2018-09-20
WO2018166187A1 | 2018-09-20
TWI641965B | 2018-11-21
CN107517207A | 2017-12-26