CN107068154A

CN107068154A - Method and system for identity authentication based on voiceprint recognition

Info

Publication number: CN107068154A
Application number: CN201710147695.XA
Authority: CN
Inventors: 王健宗; 丁涵宇; 郭卉; 肖京
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2017-03-13
Filing date: 2017-03-13
Publication date: 2017-08-18
Also published as: WO2018166112A1; TW201833810A; WO2018166187A1; TWI641965B; CN107517207A

Abstract

The present invention relates to a method and system for identity authentication based on voiceprint recognition. The method includes: after receiving speech data of a user to be authenticated, obtaining the voiceprint features of the speech data and building a corresponding voiceprint feature vector from those features; inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector corresponding to the speech data; and calculating the spatial distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the user, authenticating the user based on that distance, and generating a verification result. The present invention can improve the accuracy and efficiency of user identity authentication.

Description

Method and system for identity authentication based on voiceprint recognition

Technical field

The present invention relates to the field of communication technology, and in particular to a method and system for identity authentication based on voiceprint recognition.

Background

At present, the business scope of large financial companies covers multiple lines such as insurance, banking, and investment. Each line generally requires communication with the same customer, and this communication takes many forms (for example, by telephone or face to face). Verifying the customer's identity before communicating is an important part of ensuring business security. To meet the real-time requirements of the business, financial companies usually verify customer identity manually. Because the customer base is very large, manual discrimination and analysis is neither accurate nor efficient.

Summary of the invention

The object of the present invention is to provide a method and system for identity authentication based on voiceprint recognition, with the aim of improving the accuracy and efficiency of user identity authentication.

To achieve the above object, the present invention provides a method for identity authentication based on voiceprint recognition, the method including:

S1, after receiving speech data of a user to be authenticated, obtaining the voiceprint features of the speech data, and building a corresponding voiceprint feature vector based on the voiceprint features;

S2, inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector corresponding to the speech data;

S3, calculating the spatial distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the user, authenticating the user based on the distance, and generating a verification result.

Preferably, step S1 includes:

S11, performing pre-emphasis, framing, and windowing on the speech data;

S12, performing a Fourier transform on each windowed frame to obtain the corresponding spectrum;

S13, passing the spectrum through a Mel filter to obtain the Mel spectrum;

S14, performing cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), and forming the corresponding voiceprint feature vector from the MFCC.

Preferably, step S3 includes:

S31, calculating the cosine distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the user;

S32, if the cosine distance is less than or equal to a preset distance threshold, generating information indicating that verification has passed;

S33, if the cosine distance is greater than the preset distance threshold, generating information indicating that verification has failed.

Preferably, the background channel model is a Gaussian mixture model, and the following steps are performed before step S1:

obtaining a preset number of speech data samples, obtaining the voiceprint features corresponding to each speech data sample, and building the voiceprint feature vector corresponding to each speech data sample from those features;

dividing the voiceprint feature vectors of the speech data samples into a training set of a first ratio and a validation set of a second ratio, the sum of the first ratio and the second ratio being less than or equal to 1;

training the Gaussian mixture model using the voiceprint feature vectors in the training set and, after training is complete, verifying the accuracy of the trained Gaussian mixture model using the validation set;

if the accuracy is greater than a preset threshold, ending model training and using the trained Gaussian mixture model as the background channel model of step S2; or, if the accuracy is less than or equal to the preset threshold, increasing the number of speech data samples and retraining based on the enlarged set of speech data samples.
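The split-train-validate loop described above can be sketched as follows. This is a minimal illustration only: the function names `split_samples` and `train_until_accurate`, the callables `get_samples`, `train`, and `evaluate`, and the default ratios are hypothetical and are not taken from the patent.

```python
import random


def split_samples(vectors, train_ratio, val_ratio, seed=0):
    """Split feature vectors into a training set and a validation set.

    The two ratios must sum to at most 1, as the text requires.
    """
    if train_ratio < 0 or val_ratio < 0 or train_ratio + val_ratio > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    shuffled = list(vectors)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]


def train_until_accurate(get_samples, train, evaluate, threshold,
                         start_count, step,
                         train_ratio=0.7, val_ratio=0.3, max_rounds=10):
    """Retrain with more samples until validation accuracy exceeds threshold.

    get_samples(n), train(train_set) and evaluate(model, val_set) are
    caller-supplied placeholders (assumptions for this sketch).
    """
    count = start_count
    for _ in range(max_rounds):
        train_set, val_set = split_samples(get_samples(count),
                                           train_ratio, val_ratio)
        model = train(train_set)
        if evaluate(model, val_set) > threshold:
            return model, count
        count += step  # accuracy too low: enlarge the sample pool and retry
    raise RuntimeError("accuracy threshold not reached")
```

The `max_rounds` cap is an addition for safety; the text itself does not bound the number of retraining rounds.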

Preferably, step S3 is replaced with: calculating the spatial distance between the current voiceprint discriminant vector and each pre-stored standard voiceprint discriminant vector, obtaining the minimum spatial distance, authenticating the user based on the minimum spatial distance, and generating a verification result.

To achieve the above object, the present invention also provides a system for identity authentication based on voiceprint recognition, the system including:

a first acquisition module, configured to obtain, after receiving speech data of a user to be authenticated, the voiceprint features of the speech data, and to build a corresponding voiceprint feature vector based on the voiceprint features;

a building module, configured to input the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector corresponding to the speech data;

a first verification module, configured to calculate the spatial distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the user, authenticate the user based on the distance, and generate a verification result.

Preferably, the first acquisition module is specifically configured to perform pre-emphasis, framing, and windowing on the speech data; perform a Fourier transform on each windowed frame to obtain the corresponding spectrum; pass the spectrum through a Mel filter to obtain the Mel spectrum; and perform cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), forming the corresponding voiceprint feature vector from the MFCC.

Preferably, the first verification module is specifically configured to calculate the cosine distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the user; if the cosine distance is less than or equal to a preset distance threshold, generate information indicating that verification has passed; and if the cosine distance is greater than the preset distance threshold, generate information indicating that verification has failed.

Preferably, the system for identity authentication based on voiceprint recognition further includes:

a second acquisition module, configured to obtain a preset number of speech data samples, obtain the voiceprint features corresponding to each speech data sample, and build the voiceprint feature vector corresponding to each speech data sample from those features;

a division module, configured to divide the voiceprint feature vectors of the speech data samples into a training set of a first ratio and a validation set of a second ratio, the sum of the first ratio and the second ratio being less than or equal to 1;

a training module, configured to train the Gaussian mixture model using the voiceprint feature vectors in the training set and, after training is complete, verify the accuracy of the trained Gaussian mixture model using the validation set;

a processing module, configured to end model training and use the trained Gaussian mixture model as the background channel model if the accuracy is greater than a preset threshold, or, if the accuracy is less than or equal to the preset threshold, increase the number of speech data samples and retrain based on the enlarged set of speech data samples.

Preferably, the first verification module is replaced with a second verification module, configured to calculate the spatial distance between the current voiceprint discriminant vector and each pre-stored standard voiceprint discriminant vector, obtain the minimum spatial distance, authenticate the user based on the minimum spatial distance, and generate a verification result.

The beneficial effects of the present invention are as follows: the pre-trained background channel model of the present invention is obtained by mining and comparative training on a large amount of speech data. While retaining the user's voiceprint features to the greatest extent, the model can accurately characterize the background voiceprint features present when the user speaks, remove them during recognition, and extract the intrinsic features of the user's voice. This can significantly improve both the accuracy and the efficiency of user identity authentication.

Brief description of the drawings

Fig. 1 is a schematic flowchart of a preferred embodiment of the method for identity authentication based on voiceprint recognition of the present invention;

Fig. 2 is a detailed schematic flowchart of step S1 shown in Fig. 1;

Fig. 3 is a detailed schematic flowchart of step S3 shown in Fig. 1;

Fig. 4 is a schematic diagram of the operating environment of a preferred embodiment of the system for identity authentication based on voiceprint recognition of the present invention;

Fig. 5 is a schematic structural diagram of a preferred embodiment of the system for identity authentication based on voiceprint recognition of the present invention.

Detailed description

The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.

As shown in Fig. 1, which is a schematic flowchart of an embodiment of the method for identity authentication based on voiceprint recognition of the present invention, the method may be performed by a system for identity authentication based on voiceprint recognition. The system may be implemented in software and/or hardware and may be integrated in a server. The method includes the following steps:

Step S1, after receiving speech data of a user to be authenticated, obtaining the voiceprint features of the speech data, and building a corresponding voiceprint feature vector based on the voiceprint features;

In this embodiment, the speech data is collected by a voice capture device (for example, a microphone), which sends the collected speech data to the system for identity authentication based on voiceprint recognition.

When speech data is collected, interference from ambient noise and from the voice capture device itself should be avoided as far as possible. The voice capture device should be kept at a suitable distance from the user, and devices with high distortion should be avoided where possible. The power supply is preferably mains power, and the current should be kept stable. A sensor should be used when recording telephone calls. Before the voiceprint features are extracted, the speech data may be denoised to further reduce interference. So that the voiceprint features can be extracted, the collected speech data has a preset data length or exceeds the preset data length.

Voiceprint features are of many types, such as wideband voiceprint, narrowband voiceprint, and amplitude voiceprint. The voiceprint feature of this embodiment is preferably the Mel-frequency cepstral coefficients (Mel Frequency Cepstrum Coefficient, MFCC) of the speech data. When the corresponding voiceprint feature vector is built, the voiceprint features of the speech data are assembled into a feature data matrix, and this feature data matrix is the voiceprint feature vector of the speech data.

Step S2, inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector corresponding to the speech data;

The voiceprint feature vector is input into the pre-trained background channel model, which is preferably a Gaussian mixture model. The background channel model is used to process the voiceprint feature vector and produce the corresponding current voiceprint discriminant vector (i.e., the i-vector).

Specifically, the calculation process includes:

1) Selecting Gaussian models: first, using the parameters of the universal background channel model, the log-likelihood of each frame of data under the different Gaussian models is calculated. Each column of the log-likelihood matrix is sorted in parallel and the top N Gaussian models are selected, finally yielding a matrix of the values of each frame of data in the Gaussian mixture model:

$$\mathrm{Loglike} = E(X)\,D(X)^{-1}\,X^{T} - 0.5\,D(X)^{-1}\,(X^{.2})^{T},$$

where Loglike is the log-likelihood matrix, $E(X)$ is the mean matrix trained by the universal background channel model, $D(X)$ is the covariance matrix, $X$ is the data matrix, and $X^{.2}$ is the matrix with each element squared.
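As a rough illustration of step 1), the sketch below evaluates the formula for a diagonal-covariance mixture and keeps the N best components per frame. It makes several assumptions that the text does not state: the covariances are diagonal, and a per-component constant (mixture weight plus normalization) is added so that the values are true log-likelihoods, whereas the formula above shows only the data-dependent terms. The function name and argument layout are hypothetical.

```python
import numpy as np


def select_top_gaussians(feats, weights, means, variances, top_n):
    """Per-frame log-likelihoods under each diagonal Gaussian, then top-N.

    feats:     (T, D) data matrix X, one frame per row
    weights:   (K,)   mixture weights (assumed; not in the source formula)
    means:     (K, D) component means, E(X) in the text
    variances: (K, D) diagonal covariances, D(X) in the text
    Returns the (K, T) log-likelihood matrix and a (top_n, T) index array.
    """
    inv_var = 1.0 / variances
    # Data-dependent terms: E(X) D(X)^-1 X^T - 0.5 D(X)^-1 (X.^2)^T
    loglike = (means * inv_var) @ feats.T - 0.5 * inv_var @ (feats ** 2).T
    # Per-component constant (an assumption added for this sketch).
    const = np.log(weights) - 0.5 * np.sum(
        np.log(2 * np.pi * variances) + means ** 2 * inv_var, axis=1)
    loglike += const[:, None]
    # Sort each column (one column per frame), best component first.
    order = np.argsort(-loglike, axis=0)
    return loglike, order[:top_n]
```

Each column of the returned index array lists, for one frame, the components that would be carried forward to the posterior calculation.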

2) Calculating the posterior probability: $X X^{T}$ is computed for each frame of data $X$, giving a symmetric matrix that can be reduced to a lower triangular matrix. Its elements are arranged in order into a single row, so that each frame becomes a vector whose dimension is the number of lower-triangular elements, and the vectors of all N frames are combined into a new data matrix. The covariance matrices used to compute probabilities in the universal background model are likewise each reduced to a lower triangular matrix, giving matrices of the same form as the new data matrix. The log-likelihood of each frame of data under its selected Gaussian models is then calculated from the mean matrix and covariance matrix of the universal background channel model, Softmax regression is applied, and a final normalization yields the posterior probability distribution of each frame over the Gaussian mixture model. The probability distribution vectors of all frames form the probability matrix.
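The Softmax and normalization part of step 2) might look like the sketch below. It covers only that final stage: it takes a log-likelihood matrix as given and does not reproduce the lower-triangular vectorization described above. The function name and array layout are assumptions.

```python
import numpy as np


def frame_posteriors(loglike, selected):
    """Softmax over the selected components of each frame.

    loglike:  (K, T) log-likelihood matrix
    selected: (N, T) indices of the components kept for each frame
    Returns a (T, K) probability matrix; unselected entries are zero.
    """
    n_comp, n_frames = loglike.shape
    cols = np.arange(n_frames)
    picked = loglike[selected, cols]          # (N, T) selected values
    picked = picked - picked.max(axis=0)      # stabilise the exponent
    probs = np.exp(picked)
    probs /= probs.sum(axis=0)                # each frame sums to 1
    post = np.zeros((n_frames, n_comp))
    post[cols, selected] = probs
    return post
```

Subtracting the per-frame maximum before exponentiating leaves the result unchanged but avoids overflow for large log-likelihood values.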

3) Extracting the current voiceprint discriminant vector: the first-order and second-order coefficients are calculated first. The first-order coefficients can be obtained by summing the columns of the probability matrix:

$$\mathrm{Gamma}_i = \sum_{j} \mathrm{loglikes}_{ji},$$

where $\mathrm{Gamma}_i$ is the i-th element of the first-order coefficient vector and $\mathrm{loglikes}_{ji}$ is the element in row j, column i of the probability matrix.

The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix:

$$X = \mathrm{Loglike}^{T} \cdot \mathrm{feats},$$

where $X$ is the second-order coefficient matrix, Loglike is the probability matrix, and feats is the feature data matrix.
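The two coefficient calculations reduce to a column sum and a matrix product, as the short sketch below shows. The function name is hypothetical; the text calls these the first-order and second-order coefficients, which correspond to what the i-vector literature usually calls zeroth-order and first-order statistics.

```python
import numpy as np


def accumulate_coefficients(post, feats):
    """Accumulate the per-component coefficients from the probability matrix.

    post:  (T, K) probability matrix, one frame per row
    feats: (T, D) feature data matrix
    Returns gamma with shape (K,) and the coefficient matrix with shape (K, D).
    """
    gamma = post.sum(axis=0)      # Gamma_i = sum over frames j of post[j, i]
    second = post.T @ feats       # X = Loglike^T * feats
    return gamma, second
```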

After the first-order and second-order coefficients are obtained, the linear term and the quadratic term are calculated in parallel, and the current voiceprint discriminant vector is then calculated from the linear term and the quadratic term.

Preferably, the background channel model is a Gaussian mixture model, and the training steps set out in the summary above (obtaining speech data samples, dividing them into a training set and a validation set, training, and checking accuracy) are performed before step S1.

When the Gaussian mixture model is trained with the voiceprint feature vectors in the training set, the likelihood of an extracted D-dimensional voiceprint feature can be expressed with K Gaussian components as:

$$P(x) = \sum_{k=1}^{K} w_k\, p(x \mid k),$$

where $P(x)$ is the probability that a speech data sample is generated by the Gaussian mixture model, $w_k$ is the weight of each Gaussian model, $p(x \mid k)$ is the probability that the sample is generated by the k-th Gaussian model, and K is the number of Gaussian models.

The parameters of the whole Gaussian mixture model can be expressed as $\{w_i, \mu_i, \Sigma_i\}$, where $w_i$ is the weight of the i-th Gaussian model, $\mu_i$ is the mean of the i-th Gaussian model, and $\Sigma_i$ is the covariance of the i-th Gaussian model. The Gaussian mixture model can be trained with the unsupervised EM algorithm. After training is complete, the weight vector, the constant vector, the N covariance matrices, the matrix of means multiplied by covariances, and so on are obtained; together these constitute the trained Gaussian mixture model.
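To make the mixture likelihood concrete, the sketch below evaluates the weighted sum of component densities for one feature vector. It assumes diagonal covariances for brevity, which the text does not specify, and the function name is hypothetical. EM training itself is not shown.

```python
import numpy as np


def gmm_density(x, weights, means, variances):
    """P(x) = sum_k w_k p(x | k) for diagonal-covariance Gaussians.

    x:         (D,)   one feature vector
    weights:   (K,)   mixture weights w_k, summing to 1
    means:     (K, D) component means
    variances: (K, D) diagonal covariances (an assumption of this sketch)
    """
    x = np.asarray(x, dtype=float)
    diff = x - means
    # Log-density of x under each component.
    log_p = -0.5 * np.sum(np.log(2 * np.pi * variances)
                          + diff ** 2 / variances, axis=1)
    return float(np.dot(weights, np.exp(log_p)))
```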

Step S3, calculating the spatial distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the user, authenticating the user based on the distance, and generating a verification result.

There are many kinds of distance between vectors, including cosine distance and Euclidean distance. Preferably, the spatial distance of this embodiment is the cosine distance, which uses the cosine of the angle between two vectors in vector space as a measure of the difference between two individuals.

The standard voiceprint discriminant vector is a voiceprint discriminant vector obtained and stored in advance. It is stored together with the identification information of its corresponding user and accurately represents that user's identity. Before the spatial distance is calculated, the stored voiceprint discriminant vector is retrieved according to the identification information provided by the user.

When the calculated spatial distance is less than or equal to the preset distance threshold, verification passes; otherwise, verification fails.
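The threshold decision can be sketched as below. The exact cosine-distance formula is not reproduced in this text, so the sketch assumes the common convention of one minus the cosine of the angle, under which a smaller value means a closer match and fits the "less than or equal to the threshold" rule. The function names and the threshold value in the usage are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cos(angle between a and b); 0 means identical direction.

    The "1 - cosine" convention is an assumption of this sketch.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / norm


def verify(current, standard, threshold):
    """Pass when the distance is less than or equal to the threshold."""
    return cosine_distance(current, standard) <= threshold
```

For example, `verify([1.0, 0.0], [1.0, 0.0], 0.3)` passes because the two vectors point the same way, while orthogonal vectors give a distance of 1 and fail against the same threshold.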

Compared with the prior art, the pre-trained background channel model of this embodiment is obtained by mining and comparative training on a large amount of speech data. While retaining the user's voiceprint features to the greatest extent, the model can accurately characterize the background voiceprint features present when the user speaks, remove them during recognition, and extract the intrinsic features of the user's voice, which can significantly improve both the accuracy and the efficiency of user identity authentication. In addition, this embodiment makes full use of the voiceprint features in speech that are related to the vocal tract. Such features place no restriction on the spoken text, so recognition and verification are more flexible.

In a preferred embodiment, as shown in Fig. 2, on the basis of the embodiment of Fig. 1 above, step S1 includes:

Step S11, performing pre-emphasis, framing, and windowing on the speech data. In this embodiment, the speech data is processed after the speech data of the user to be authenticated is received. Pre-emphasis is in effect high-pass filtering: it filters out low-frequency data so that the high-frequency characteristics of the speech data stand out. Specifically, the transfer function of the high-pass filter is $H(Z) = 1 - \alpha Z^{-1}$, where Z is the speech data and α is a constant coefficient, preferably 0.97. Because a speech signal is stationary only over a short period, a segment of speech is divided into N short-time signals (i.e., N frames). To avoid losing the continuity of the sound, adjacent frames share an overlapping region, which is generally 1/2 of the frame length. After framing, each frame is treated as a stationary signal. However, because of the Gibbs effect, the start frame and end frame of the speech data are discontinuous, and after framing the data deviates further from the original speech. Windowing therefore needs to be applied to the speech data.
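Step S11 can be sketched with NumPy as follows. The pre-emphasis coefficient 0.97 and the half-frame overlap come from the text. The Hamming window, the default frame length of 400 samples, and the function name are assumptions, since the text does not name a window type or frame size.

```python
import numpy as np


def preprocess(signal, frame_len=400, alpha=0.97):
    """Pre-emphasise, split into half-overlapping frames, and window.

    Returns an array of shape (n_frames, frame_len).
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Pre-emphasis in the time domain: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    hop = frame_len // 2                       # adjacent frames overlap by 1/2
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = (np.arange(frame_len)[None, :]
           + hop * np.arange(n_frames)[:, None])
    frames = emphasized[idx]
    # Taper each frame so its edges fall towards zero (Hamming is assumed).
    return frames * np.hamming(frame_len)
```

Trailing samples that do not fill a whole frame are dropped in this sketch; padding the last frame would be an equally reasonable choice.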

Step S12, performing a Fourier transform on each windowed frame to obtain the corresponding spectrum;

Step S13, passing the spectrum through a Mel filter to obtain the Mel spectrum;

Step S14, performing cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), and forming the corresponding voiceprint feature vector from the MFCC. Cepstral analysis consists, for example, of taking the logarithm and applying an inverse transform. The inverse transform is generally implemented with the discrete cosine transform (DCT), and the 2nd to 13th coefficients after the DCT are taken as the MFCC coefficients. The MFCC of a frame are the voiceprint features of that frame of speech data. The MFCC of all frames are assembled into a feature data matrix, and this feature data matrix is the voiceprint feature vector of the speech data.
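Steps S12 to S14 can be sketched as below, taking windowed frames as input. The choice of the 2nd to 13th DCT coefficients follows the text. The sample rate, FFT size, number of Mel filters, the triangular filter shape, and the Hz-to-Mel formula are common defaults assumed for illustration, and all function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                          n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):
            bank[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):
            bank[i, k] = (right - k) / (right - centre)
    return bank


def mfcc(frames, sample_rate=16000, n_fft=512, n_filters=26):
    """Windowed frames (T, frame_len) -> feature data matrix (T, 12)."""
    # S12: power spectrum of each windowed frame
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # S13: Mel spectrum
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # S14: logarithm, then DCT-II as the inverse transform
    log_mel = np.log(mel_energy + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(n, 2 * n + 1) / (2 * n_filters))
    cepstra = log_mel @ dct.T
    return cepstra[:, 1:13]     # 2nd to 13th coefficients
```

Each row of the result is the MFCC of one frame, so the whole array plays the role of the feature data matrix described above.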

In a preferred embodiment, as shown in Fig. 3, on the basis of the embodiment of Fig. 1 above, step S3 includes:

Step S31, calculating the cosine distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the user;

Step S32, if the cosine distance is less than or equal to a preset distance threshold, generating information indicating that verification has passed;

Step S33, if the cosine distance is greater than the preset distance threshold, generating information indicating that verification has failed.

In a preferred embodiment, on the basis of the embodiment of Fig. 1 above, step S3 is replaced with: calculating the spatial distance between the current voiceprint discriminant vector and each pre-stored standard voiceprint discriminant vector, obtaining the minimum spatial distance, authenticating the user based on the minimum spatial distance, and generating a verification result.

This embodiment differs from the embodiment of Fig. 1 in that the standard voiceprint discriminant vectors are stored without the identification information of the user. When the identity of the user is verified, the spatial distance between the current voiceprint discriminant vector and each pre-stored standard voiceprint discriminant vector is calculated and the minimum spatial distance is obtained. If the minimum spatial distance is less than a preset distance threshold (which may be the same as or different from the distance threshold of the above embodiment), verification passes; otherwise, verification fails.
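This variant compares the current vector against every stored vector and decides on the smallest distance. A minimal sketch follows, again assuming the "one minus cosine" convention for the distance; the function names and the threshold in the usage are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cos(angle between a and b); the convention is assumed."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / norm


def verify_against_all(current, enrolled, threshold):
    """Return (passed, minimum distance) over all stored standard vectors.

    As in the text, this variant passes only when the minimum distance is
    strictly less than the threshold.
    """
    if not enrolled:
        raise ValueError("no standard vectors are stored")
    best = min(cosine_distance(current, s) for s in enrolled)
    return best < threshold, best
```

Because no user identifier is supplied, the cost of one verification grows linearly with the number of stored vectors.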

Referring to Fig. 4, Fig. 4 is a schematic diagram of the operating environment of a preferred embodiment of the system 10 for identity authentication based on voiceprint recognition of the present invention.

In this embodiment, the system 10 for identity authentication based on voiceprint recognition is installed and runs in an electronic device 1. The electronic device 1 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a server. The electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a display 13. Fig. 4 shows only the electronic device 1 with components 11-13, but it should be understood that not all of the components shown are required, and that more or fewer components may be implemented instead.

In some embodiments the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk or internal memory of the electronic device 1. In other embodiments the memory 11 may be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 is used to store the application software installed on the electronic device 1 and all kinds of data, such as the program code of the system 10 for identity authentication based on voiceprint recognition. The memory 11 may also be used to temporarily store data that has been output or is about to be output.

In some embodiments the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip, used to run the program code stored in the memory 11 or to process data, for example to execute the system 10 for identity authentication based on voiceprint recognition.

In some embodiments the display 13 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display 13 is used to display the information processed in the electronic device 1 and to display a visual user interface, such as a voiceprint recognition interface. The components 11-13 of the electronic device 1 communicate with one another over a system bus.

Referring to Fig. 5, which is a functional module diagram of a preferred embodiment of the system 10 for identity authentication based on voiceprint recognition of the present invention. In this embodiment, the system 10 may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the present invention. For example, in Fig. 5, the system 10 may be divided into a first acquisition module 101, a building module 102, and a first verification module 103. A module, as referred to in the present invention, is a series of computer program instruction segments capable of performing a specific function, and is better suited than a program to describing the execution of the system 10 in the electronic device 1, wherein:

The first acquisition module 101 is configured to obtain, after receiving speech data of a user to be authenticated, the voiceprint features of the speech data, and to build a corresponding voiceprint feature vector based on the voiceprint features;

The building module 102 is configured to input the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector corresponding to the speech data;

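Specifically, the calculation process is the same as that described for step S2 above, with the log-likelihood matrix given by:

$$\mathrm{Loglike} = E(X)\,D(X)^{-1}\,X^{T} - 0.5\,D(X)^{-1}\,(X^{.2})^{T}.$$

Preferably, the background channel model is a Gaussian mixture model, and the system for identity authentication based on voiceprint recognition further includes the second acquisition module, division module, training module, and processing module described above.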

The first verification module 103 is configured to calculate the spatial distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the user, authenticate the user based on the distance, and generate a verification result.

In a preferred embodiment, on the basis of the embodiment of Fig. 5 above, the first acquisition module 101 is specifically configured to perform pre-emphasis, framing, and windowing on the speech data; perform a Fourier transform on each windowed frame to obtain the corresponding spectrum; pass the spectrum through a Mel filter to obtain the Mel spectrum; and perform cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), forming the corresponding voiceprint feature vector from the MFCC.

Pre-emphasis is in effect high-pass filtering: it filters out low-frequency data so that the high-frequency characteristics of the speech data stand out. Specifically, the transfer function of the high-pass filter is $H(Z) = 1 - \alpha Z^{-1}$, where Z is the speech data and α is a constant coefficient, preferably 0.97. Because a speech signal is stationary only over a short period, a segment of speech is divided into N short-time signals (i.e., N frames). To avoid losing the continuity of the sound, adjacent frames share an overlapping region, which is generally 1/2 of the frame length. After framing, each frame is treated as a stationary signal. However, because of the Gibbs effect, the start frame and end frame of the speech data are discontinuous, and after framing the data deviates further from the original speech. Windowing therefore needs to be applied to the speech data.

Cepstral analysis consists, for example, of taking the logarithm and applying an inverse transform. The inverse transform is generally implemented with the discrete cosine transform (DCT), and the 2nd to 13th coefficients after the DCT are taken as the MFCC coefficients. The MFCC of a frame are the voiceprint features of that frame of speech data. The MFCC of all frames are assembled into a feature data matrix, and this feature data matrix is the voiceprint feature vector of the speech data.

In a preferred embodiment, on the basis of the embodiment of Fig. 5 above, the first verification module 103 is specifically configured to calculate the cosine distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the user; if the cosine distance is less than or equal to a preset distance threshold, generate information indicating that verification has passed; and if the cosine distance is greater than the preset distance threshold, generate information indicating that verification has failed.

In a preferred embodiment, on the basis of the embodiment of Fig. 4 above, the first verification module is replaced with a second verification module, which is configured to calculate the space distance between the current voiceprint discriminant vector and each pre-stored standard voiceprint discriminant vector, obtain the minimum space distance, perform identity verification on the user based on the minimum space distance, and generate a verification result.

This embodiment differs from the embodiment of Fig. 5 in that the standard voiceprint discriminant vectors are stored without the identification information of the user. When the identity of the user is verified, the space distance between the current voiceprint discriminant vector and each pre-stored standard voiceprint discriminant vector is calculated and the minimum space distance is obtained. If the minimum space distance is less than a preset distance threshold (which may be the same as or different from the distance threshold of the above embodiment), verification passes; otherwise, verification fails.
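A minimal sketch of this minimum-distance check follows. The text speaks only of a "space distance", so the use of cosine distance (defined here as one minus the cosine similarity) as the default is an assumption, as are the function names and the return shape.

```python
import math

def cosine_distance(a, b):
    """Assumed space distance: 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def verify_against_all(current_vector, standard_vectors, threshold,
                       distance=cosine_distance):
    """Compare against every stored vector and test the smallest distance.

    Returns (passed, min_distance); passes only when the minimum distance is
    strictly less than the threshold, as in the embodiment described above.
    """
    if not standard_vectors:
        return False, None
    min_distance = min(distance(current_vector, s) for s in standard_vectors)
    return min_distance < threshold, min_distance
```

Because the stored vectors carry no user identification, the cost of each check grows linearly with the number of stored vectors.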

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method of identity verification based on voiceprint recognition, characterized in that the method of identity verification based on voiceprint recognition comprises:

S1, after receiving speech data of a user undergoing identity verification, obtaining the voiceprint feature of the speech data, and constructing a corresponding voiceprint feature vector based on the voiceprint feature;

S2, inputting the voiceprint feature vector into a background channel model generated by training in advance, so as to construct the current voiceprint discriminant vector corresponding to the speech data;

S3, calculating the space distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the user, performing identity verification on the user based on the distance, and generating a verification result.

2. The method of identity verification based on voiceprint recognition according to claim 1, characterized in that step S1 comprises:

S14, performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and composing the corresponding voiceprint feature vector based on the Mel-frequency cepstral coefficients (MFCC).

3. The method of identity verification based on voiceprint recognition according to claim 1, characterized in that step S3 comprises:

S31, calculating the cosine distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the user [formula missing from the source text], where the two operands of the formula are the standard voiceprint discriminant vector and the current voiceprint discriminant vector;

4. The method of identity verification based on voiceprint recognition according to any one of claims 1 to 3, characterized in that the background channel model is a Gaussian mixture model, and the method comprises, before step S1:

obtaining a preset number of speech data samples, obtaining the voiceprint feature corresponding to each speech data sample, and constructing the voiceprint feature vector corresponding to each speech data sample based on the voiceprint feature corresponding to that sample;

dividing the voiceprint feature vectors corresponding to the speech data samples into a training set of a first ratio and a validation set of a second ratio, the sum of the first ratio and the second ratio being less than or equal to 1;

training the Gaussian mixture model using the voiceprint feature vectors in the training set, and, after training is completed, verifying the accuracy of the trained Gaussian mixture model using the validation set;

if the accuracy is greater than a preset threshold, ending model training and using the trained Gaussian mixture model as the background channel model of step S2; or, if the accuracy is less than or equal to the preset threshold, increasing the number of speech data samples and retraining based on the increased speech data samples.
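The train, validate, and retrain loop recited above can be sketched as follows. The sketch deliberately leaves the Gaussian mixture model itself abstract: `get_samples`, `train_fn`, and `accuracy_fn` are hypothetical callables supplied by the caller, and the shuffling, the doubling of the sample count, and the `max_rounds` limit are assumptions of this example rather than features of the claim.

```python
import random

def split_dataset(vectors, train_ratio, val_ratio, seed=0):
    """Split into a training set and a validation set; the ratios must sum to at most 1."""
    if train_ratio <= 0 or val_ratio <= 0 or train_ratio + val_ratio > 1 + 1e-9:
        raise ValueError("ratios must be positive and sum to at most 1")
    shuffled = list(vectors)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]

def train_until_accurate(get_samples, train_fn, accuracy_fn, initial_count,
                         accuracy_threshold, train_ratio=0.7, val_ratio=0.3,
                         growth=2, max_rounds=5):
    """Retrain on more samples until validation accuracy exceeds the threshold.

    get_samples(count) -> list of voiceprint feature vectors (hypothetical)
    train_fn(train_set) -> trained model (hypothetical)
    accuracy_fn(model, val_set) -> accuracy in [0, 1] (hypothetical)
    """
    count = initial_count
    for _ in range(max_rounds):
        samples = get_samples(count)
        train_set, val_set = split_dataset(samples, train_ratio, val_ratio)
        model = train_fn(train_set)
        if accuracy_fn(model, val_set) > accuracy_threshold:
            return model, count
        count *= growth  # increase the number of speech data samples and retry
    raise RuntimeError("accuracy threshold not reached within max_rounds")
```

The round limit is there only so the sketch terminates; the claim itself places no bound on how many times the sample count may be increased.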

5. The method of identity verification based on voiceprint recognition according to claim 1 or 2, characterized in that step S3 is replaced with: calculating the space distance between the current voiceprint discriminant vector and each pre-stored standard voiceprint discriminant vector, obtaining the minimum space distance, performing identity verification on the user based on the minimum space distance, and generating a verification result.

6. A system of identity verification based on voiceprint recognition, characterized in that the system of identity verification based on voiceprint recognition comprises:

a first acquisition module, configured to, after speech data of a user undergoing identity verification is received, obtain the voiceprint feature of the speech data and construct a corresponding voiceprint feature vector based on the voiceprint feature;

a construction module, configured to input the voiceprint feature vector into a background channel model generated by training in advance, so as to construct the current voiceprint discriminant vector corresponding to the speech data;

a first verification module, configured to calculate the space distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the user, perform identity verification on the user based on the distance, and generate a verification result.

7. The system of identity verification based on voiceprint recognition according to claim 6, characterized in that the first acquisition module is specifically configured to: perform pre-emphasis, framing, and windowing on the speech data; perform a Fourier transform on each windowed frame to obtain the corresponding spectrum; input the spectrum into a Mel filter to output a Mel spectrum; and perform cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and compose the corresponding voiceprint feature vector based on the Mel-frequency cepstral coefficients (MFCC).

8. The system of identity verification based on voiceprint recognition according to claim 6, characterized in that the first verification module is specifically configured to: calculate the cosine distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the user [formula missing from the source text], where the two operands of the formula are the standard voiceprint discriminant vector and the current voiceprint discriminant vector; if the cosine distance is less than or equal to a preset distance threshold, generate information indicating that verification has passed; and if the cosine distance is greater than the preset distance threshold, generate information indicating that verification has not passed.

9. The system of identity verification based on voiceprint recognition according to any one of claims 6 to 8, characterized in that the system of identity verification based on voiceprint recognition further comprises:

a second acquisition module, configured to obtain a preset number of speech data samples, obtain the voiceprint feature corresponding to each speech data sample, and construct the voiceprint feature vector corresponding to each speech data sample based on the voiceprint feature corresponding to that sample;

a division module, configured to divide the voiceprint feature vectors corresponding to the speech data samples into a training set of a first ratio and a validation set of a second ratio, the sum of the first ratio and the second ratio being less than or equal to 1;

a training module, configured to train a Gaussian mixture model using the voiceprint feature vectors in the training set, and, after training is completed, verify the accuracy of the trained Gaussian mixture model using the validation set;

a processing module, configured to, if the accuracy is greater than a preset threshold, end model training and use the trained Gaussian mixture model as the background channel model, or, if the accuracy is less than or equal to the preset threshold, increase the number of speech data samples and retrain based on the increased speech data samples.

10. The system of identity verification based on voiceprint recognition according to claim 6 or 7, characterized in that the first verification module is replaced with a second verification module, which is configured to calculate the space distance between the current voiceprint discriminant vector and each pre-stored standard voiceprint discriminant vector, obtain the minimum space distance, perform identity verification on the user based on the minimum space distance, and generate a verification result.