CN110047490A - Voiceprint recognition method, apparatus, device, and computer-readable storage medium - Google Patents
Voiceprint recognition method, apparatus, device, and computer-readable storage medium
- Publication number
- CN110047490A (application number CN201910182453.3A)
- Authority
- CN
- China
- Prior art keywords
- voiceprint
- voice
- fusion
- voiceprint feature
- verification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Abstract
The invention discloses a voiceprint recognition method, apparatus, device, and computer-readable storage medium. The method includes: obtaining a verification voice to be identified; extracting a first voiceprint feature of the verification voice using a GMM-UBM model, and extracting a second voiceprint feature of the verification voice using a neural network model; fusing the first voiceprint feature and the second voiceprint feature to obtain a fused voiceprint feature vector of the verification voice; calculating the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of each registered user in a preset registration voiceprint database; and determining the voiceprint recognition result of the verification voice based on the similarity. Because the two models each extract features from the verification voice and both sets of features are used for verification, the extracted features carry more comprehensive information than features extracted by a single model, which improves the accuracy of voiceprint recognition.
Description
Technical field
The present invention relates to the field of voiceprint recognition, and more particularly to a voiceprint recognition method, apparatus, device, and computer-readable storage medium.
Background
A voiceprint recognition system automatically identifies a speaker from the characteristics of his or her voice. Voiceprint recognition belongs to the family of biometric authentication technologies; that is, it verifies a speaker's identity by voice. Because the technology is convenient, stable, measurable, and secure, it is widely used in fields such as banking, social security, public security, smart homes, and mobile payment.
Current voiceprint recognition systems are generally based on the Gaussian mixture model-universal background model (GMM-UBM) proposed in the 1990s, which is simple, flexible, and reasonably robust. Recently, however, with advances in neural network training, voiceprint verification systems based on neural networks have been put into practice, and neural network models outperform the single GMM-UBM on some data sets.
Summary of the invention
The main purpose of the present invention is to provide a voiceprint recognition method, apparatus, device, and computer-readable storage medium, intended to solve the technical problem that the accuracy of voiceprint recognition in the prior art is not high.
To achieve the above object, the present invention provides a voiceprint recognition method comprising the following steps:
obtaining a verification voice to be identified;
extracting a first voiceprint feature of the verification voice using a GMM-UBM model, and extracting a second voiceprint feature of the verification voice using a neural network model;
fusing the first voiceprint feature and the second voiceprint feature of the verification voice to obtain a fused voiceprint feature vector of the verification voice;
calculating the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of each registered user in a preset registration voiceprint database;
determining the voiceprint recognition result of the verification voice based on the similarity.
Optionally, before obtaining the verification voice to be identified, the method further includes:
obtaining the registration voice of a registering user;
extracting a third voiceprint feature of the registration voice using the GMM-UBM model, and extracting a fourth voiceprint feature of the registration voice using the neural network model;
fusing the third voiceprint feature and the fourth voiceprint feature of the registration voice to obtain a fused voiceprint feature vector of the registration voice;
saving the fused voiceprint feature vector of the registration voice into the registration voiceprint database as the voiceprint feature vector of the registered user.
Optionally, extracting the first voiceprint feature of the verification voice using the GMM-UBM model includes:
pre-emphasizing, framing, and windowing the verification voice;
extracting, from the preprocessed verification voice, feature parameters comprising the pitch period, the linear prediction cepstral coefficients, the first-order difference of the linear prediction cepstral coefficients, the energy, the first-order difference of the energy, and the gammatone filter cepstral coefficients, to obtain the first voiceprint feature of the verification voice.
Extracting the second voiceprint feature of the verification voice using the neural network model includes:
arranging the verification voice into a spectrogram of predetermined dimensionality;
identifying the spectrogram of predetermined dimensionality with the neural network to obtain the second voiceprint feature of the verification voice.
Optionally, fusing the first voiceprint feature and the second voiceprint feature of the verification voice to obtain the fused voiceprint feature vector of the verification voice includes:
fusing the first voiceprint feature dimensions and the second voiceprint feature dimensions using a Markov chain Monte Carlo (MCMC) stochastic model to obtain the fused voiceprint feature vector of the verification voice.
Optionally, the first voiceprint feature includes a plurality of first voiceprint sub-features, and the second voiceprint feature includes a plurality of second voiceprint sub-features.
Fusing the first voiceprint feature dimensions and the second voiceprint feature dimensions using the MCMC stochastic model to obtain the fused voiceprint feature vector of the verification voice includes:
setting the total number of features in the fused voiceprint feature vector of the verification voice to K;
given the total of K fused features, determining the fusion ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling;
according to that fusion ratio, simulating the sampling of a joint normal distribution with MCMC Gibbs sampling to determine which first voiceprint sub-features and which second voiceprint sub-features are selected, and composing the selected sub-features into the fused voiceprint feature vector of the verification voice.
Optionally, determining the fusion ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling, given the total of K fused features, includes the following steps (a code sketch follows this list):
Step A: generate a random number in [0, 1] as the parameter p, which represents the proportion of first voiceprint sub-features in the fused voiceprint feature vector of the verification voice;
Step B: initialize an iteration counter to k = 0;
Step C: generate a random number q in [0, 1] and compare it with p; if q < p, select one first voiceprint sub-feature and increment the count of first voiceprint sub-features by 1; if q > p, select one second voiceprint sub-feature and increment the count of second voiceprint sub-features by 1;
Step D: increment k by 1 and check whether k >= K; if so, record the numbers of first and second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification voice as A and B respectively, and end the sampling process; otherwise, return to step C.
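The direct-sampling procedure in steps A through D amounts to K Bernoulli draws against the threshold p. A minimal Python sketch, with illustrative names (the patent gives no code), assuming, consistent with step A's definition of p, that q < p counts toward the first voiceprint sub-features:

```python
import random

def split_counts(K, p):
    """Steps B-D: compare a fresh q ~ U[0,1] against p, K times.
    Returns (A, B), the numbers of first and second voiceprint
    sub-features to select for the fused vector."""
    A = B = 0
    for _ in range(K):          # step D loops until k >= K
        if random.random() < p:
            A += 1              # one more first voiceprint sub-feature
        else:
            B += 1              # one more second voiceprint sub-feature
    return A, B

p = random.random()             # step A: p ~ U[0, 1]
A, B = split_counts(K=10, p=p)  # K = 10 is an illustrative total
```

By construction the expected split is E[A] = pK and E[B] = (1 - p)K, so p directly controls the share of GMM-UBM-derived sub-features in the fused vector.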
Optionally, simulating the sampling of a joint normal distribution with MCMC Gibbs sampling according to the fusion ratio of first and second voiceprint sub-features, to determine which first voiceprint sub-features and which second voiceprint sub-features are selected and to compose the fused voiceprint feature vector of the verification voice, includes the following steps (a code sketch follows this list):
Step E: set the transfer-count threshold to T and initialize the transfer count t = 0;
Step F: let M be the number of candidate first voiceprint sub-features collected for the fused voiceprint feature vector of the verification voice, and generate M random numbers in [0, 1] as the initial state
X(0) = [x1(0), x2(0), ..., xM(0)];
Step G: each time the transfer count t increases by 1, update every variable xi(t), i ∈ {1, 2, ..., M}, according to the conditional distribution derived from the joint probability distribution:
P(xi(t+1) | x1(t+1), x2(t+1), ..., xi-1(t+1), xi+1(t), ..., xM(t)),
where the mean of the joint probability distribution is X; check whether t < T; if so, return to step G; otherwise obtain
P(T) = [P(x1(T)), P(x2(T)), ..., P(xi(T)), ..., P(xM(T))];
Step H: according to the number A of first voiceprint sub-features computed in step D, select the A first voiceprint sub-features with the largest probabilities P(xi(T)) as the first voiceprint sub-features of the fused voiceprint feature vector of the verification voice;
Step I: set the transfer-count threshold to T and initialize the transfer count t = 0;
Step J: let N be the number of candidate second voiceprint sub-features collected for the fused voiceprint feature vector of the verification voice, and generate N random numbers in [0, 1] as the initial state
Y(0) = [y1(0), y2(0), ..., yN(0)];
Step K: each time the transfer count t increases by 1, update every variable yj(t), j ∈ {1, 2, ..., N}, according to the conditional distribution derived from the joint probability distribution:
P(yj(t+1) | y1(t+1), y2(t+1), ..., yj-1(t+1), yj+1(t), ..., yN(t)),
where the mean of the joint probability distribution is Y; check whether t < T; if so, return to step K; otherwise obtain
P(T) = [P(y1(T)), P(y2(T)), ..., P(yj(T)), ..., P(yN(T))];
Step L: according to the number B of second voiceprint sub-features computed in step D, select the B second voiceprint sub-features with the largest probabilities P(yj(T)) as the second voiceprint sub-features of the fused voiceprint feature vector of the verification voice.
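The patent leaves the covariance of the joint normal distribution unspecified, so the sketch below assumes a generic multivariate normal with a user-supplied covariance and uses each coordinate's conditional density at the final sweep as the selection score P(xi(T)). All sizes and parameter values are illustrative:

```python
import numpy as np

def gibbs_scores(mu, cov, T=200, rng=None):
    """Steps F-G: T Gibbs sweeps over a joint normal with mean mu,
    starting from a U[0,1] state; returns each coordinate's conditional
    density at the final sweep, standing in for P(xi(T))."""
    rng = np.random.default_rng() if rng is None else rng
    M = len(mu)
    x = rng.uniform(0.0, 1.0, size=M)                  # step F initial state
    dens = np.zeros(M)
    for _ in range(T):                                 # step G, t = 1..T
        for i in range(M):
            rest = [j for j in range(M) if j != i]
            s12 = cov[i, rest]
            s22i = np.linalg.inv(cov[np.ix_(rest, rest)])
            m = mu[i] + s12 @ s22i @ (x[rest] - mu[rest])   # conditional mean
            v = cov[i, i] - s12 @ s22i @ s12                # conditional var
            x[i] = rng.normal(m, np.sqrt(v))                # conditional draw
            dens[i] = np.exp(-(x[i] - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    return dens

# Step H: keep the A candidates with the largest scores. Steps I-L repeat
# the same procedure for the N second sub-features and the count B.
M, A = 8, 3                                  # illustrative sizes
mu = np.full(M, 0.5)                         # mean X, assumed constant here
cov = 0.05 + 0.2 * np.eye(M)                 # assumed covariance (unspecified)
chosen = np.argsort(gibbs_scores(mu, cov))[::-1][:A]
```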
In addition, to achieve the above object, the present invention also provides a voiceprint recognition apparatus comprising:
a data acquisition module for obtaining a verification voice to be identified;
a data processing module for extracting a first voiceprint feature of the verification voice using a GMM-UBM model and extracting a second voiceprint feature of the verification voice using a neural network model;
a data fusion module for fusing the first voiceprint feature and the second voiceprint feature of the verification voice to obtain the fused voiceprint feature vector of the verification voice;
a data comparison module for calculating the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of each registered user in a preset registration voiceprint database;
a data judgment module for determining the voiceprint recognition result of the verification voice based on the similarity.
Optionally, the data processing module is further configured to obtain the registration voice of a registering user, to extract a third voiceprint feature of the registration voice using the GMM-UBM model, and to extract a fourth voiceprint feature of the registration voice using the neural network model; the data fusion module is further configured to fuse the third voiceprint feature and the fourth voiceprint feature of the registration voice to obtain the fused voiceprint feature vector of the registration voice; and the voiceprint recognition apparatus further includes a data storage module for saving the fused voiceprint feature vector of the registration voice into the registration voiceprint database as the voiceprint feature vector of the registered user.
Optionally, the data processing module further includes:
a first preprocessing unit for pre-emphasizing, framing, and windowing the verification voice;
a first extraction unit for extracting, from the preprocessed verification voice, feature parameters comprising the pitch period, the linear prediction cepstral coefficients, the first-order difference of the linear prediction cepstral coefficients, the energy, the first-order difference of the energy, and the gammatone filter cepstral coefficients, to obtain the first voiceprint feature of the verification voice;
a second preprocessing unit for arranging the verification voice into a spectrogram of predetermined dimensionality;
a second extraction unit for identifying the spectrogram of predetermined dimensionality with the neural network to obtain the second voiceprint feature of the verification voice.
Optionally, the data fusion module includes:
a data fusion unit for fusing the first voiceprint feature dimensions and the second voiceprint feature dimensions using a Markov chain Monte Carlo stochastic model to obtain the fused voiceprint feature vector of the verification voice.
Optionally, the data fusion unit includes:
a setting subunit for setting the total number of features in the fused voiceprint feature vector of the verification voice to K;
a determining subunit for determining, given the total of K fused features, the fusion ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling;
a fusion subunit for simulating, according to that fusion ratio, the sampling of a joint normal distribution with MCMC Gibbs sampling, determining which first voiceprint sub-features and which second voiceprint sub-features are selected, and composing them into the fused voiceprint feature vector of the verification voice.
Optionally, the determining subunit is configured to perform:
Step A: generate a random number in [0, 1] as the parameter p, which represents the proportion of first voiceprint sub-features in the fused voiceprint feature vector of the verification voice;
Step B: initialize an iteration counter to k = 0;
Step C: generate a random number q in [0, 1] and compare it with p; if q < p, select one first voiceprint sub-feature and increment the count of first voiceprint sub-features by 1; if q > p, select one second voiceprint sub-feature and increment the count of second voiceprint sub-features by 1;
Step D: increment k by 1 and check whether k >= K; if so, record the numbers of first and second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification voice as A and B respectively, and end the sampling process; otherwise, return to step C.
Optionally, the fusion subunit is configured to perform:
Step E: set the transfer-count threshold to T and initialize the transfer count t = 0;
Step F: let M be the number of candidate first voiceprint sub-features collected for the fused voiceprint feature vector of the verification voice, and generate M random numbers in [0, 1] as the initial state X(0) = [x1(0), x2(0), ..., xM(0)];
Step G: each time the transfer count t increases by 1, update every variable xi(t), i ∈ {1, 2, ..., M}, according to the conditional distribution derived from the joint probability distribution, P(xi(t+1) | x1(t+1), ..., xi-1(t+1), xi+1(t), ..., xM(t)), where the mean of the joint probability distribution is X; check whether t < T; if so, return to step G; otherwise obtain P(T) = [P(x1(T)), P(x2(T)), ..., P(xM(T))];
Step H: according to the number A computed in step D, select the A first voiceprint sub-features with the largest probabilities P(xi(T)) as the first voiceprint sub-features of the fused voiceprint feature vector of the verification voice;
Step I: set the transfer-count threshold to T and initialize the transfer count t = 0;
Step J: let N be the number of candidate second voiceprint sub-features collected for the fused voiceprint feature vector of the verification voice, and generate N random numbers in [0, 1] as the initial state Y(0) = [y1(0), y2(0), ..., yN(0)];
Step K: each time the transfer count t increases by 1, update every variable yj(t), j ∈ {1, 2, ..., N}, according to the conditional distribution derived from the joint probability distribution, P(yj(t+1) | y1(t+1), ..., yj-1(t+1), yj+1(t), ..., yN(t)), where the mean of the joint probability distribution is Y; check whether t < T; if so, return to step K; otherwise obtain P(T) = [P(y1(T)), P(y2(T)), ..., P(yN(T))];
Step L: according to the number B computed in step D, select the B second voiceprint sub-features with the largest probabilities P(yj(T)) as the second voiceprint sub-features of the fused voiceprint feature vector of the verification voice.
In addition, to achieve the above object, the present invention also provides a voiceprint recognition device comprising a processor, a memory, and a voiceprint recognition program stored in the memory and executable by the processor, the voiceprint recognition program implementing the steps of the above voiceprint recognition method when executed by the processor.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium storing a voiceprint recognition program, the voiceprint recognition program implementing the steps of the above voiceprint recognition method when executed by a processor.
The present invention extracts a first voiceprint feature of the verification voice with a GMM-UBM model and a second voiceprint feature with a neural network model; fuses the first and second voiceprint features to obtain the fused voiceprint feature vector of the verification voice; calculates the similarity between the fused voiceprint feature vector and the voiceprint feature vector of each registered user in the preset voiceprint database; and determines the voiceprint recognition result of the verification voice based on the similarity. In this way, the GMM-UBM model and the neural network model are combined: each extracts features from the verification voice, and both sets of features are used for verification. Compared with extracting features with a single model, the features extracted by the two models carry more comprehensive information, so the verification voice and the registration voice can be compared more thoroughly, and the accuracy of voiceprint recognition is improved.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware structure of the voiceprint recognition device according to an embodiment of the present invention;
Fig. 2 is a flow diagram of an embodiment of the voiceprint recognition method of the present invention;
Fig. 3 is a flow diagram of another embodiment of the voiceprint recognition method of the present invention;
Fig. 4 is a detailed flow diagram of an embodiment of step S20 in Fig. 2;
Fig. 5 is a detailed flow diagram of another embodiment of step S20 in Fig. 2;
Fig. 6 is a flow diagram of an embodiment of step S30 in Fig. 2;
Fig. 7 is a functional block diagram of an embodiment of the voiceprint recognition apparatus of the present invention.
The objects, features, and advantages of the present invention are further described below with reference to the accompanying drawings and embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are only intended to explain the present invention, not to limit it.
Referring to Fig. 1, Fig. 1 is a schematic diagram of the hardware structure of the voiceprint recognition device according to an embodiment of the present invention.
In this embodiment, the voiceprint recognition device may include a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 realizes the connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a stable non-volatile memory such as a disk memory, and may optionally be a storage device independent of the processor 1001.
Those skilled in the art will understand that the hardware structure shown in Fig. 1 does not limit the voiceprint recognition device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
With continued reference to Fig. 1, the memory 1005 in Fig. 1, as a computer-readable storage medium, may include an operating system, a network communication module, and a data management program.
In Fig. 1, the network communication module is mainly used to connect to a server and exchange data with it, and the processor 1001 can call the data management program stored in the memory 1005 to execute the voiceprint recognition method provided by the embodiments of the present invention.
Based on the above voiceprint recognition device, embodiments of the voiceprint recognition method of the present invention are proposed.
Referring to Fig. 2, Fig. 2 is a flow diagram of an embodiment of the voiceprint recognition method of the present invention. In this embodiment, the voiceprint recognition method includes the following steps:
Step S10: obtaining a verification voice to be identified.
In this embodiment, the verification voice is the sound uttered by a user who has completed voice registration; if the user has not registered, the uttered sound is treated as invalid. The verification voice can be acquired in many ways: for example, the sound of a registered user may be picked up by a microphone, which sends it to the processing terminal for voiceprint recognition; or it may be captured by a smart terminal (a mobile phone, a tablet, and so on), which sends the acquired verification voice to the processing terminal of the voiceprint recognition device. Of course, the verification voice can also be acquired with other equipment, which is not enumerated here.
It is worth noting that, when obtaining the verification voice to be identified, the voice can also be screened to reject verification voices of poor quality. Specifically, the duration and the volume of the verification voice can be detected at acquisition time: if the duration of the verification voice is greater than or equal to a preset voice duration, the device reports that the verification voice was obtained successfully; if the duration is less than the preset voice duration, the device reports a failure. This guarantees the quality of the acquired verification voice and ensures that the features extracted from it are distinct and clear, which helps improve the accuracy of voiceprint recognition.
Step S20: extracting a first voiceprint feature of the verification voice using a GMM-UBM model, and extracting a second voiceprint feature of the verification voice using a neural network model.
In this embodiment, the GMM-UBM model (Gaussian mixture model-universal background model) and the neural network model both extract features from the verification voice. Since they are two different models, they may extract the same voiceprint features, different voiceprint features, or partially overlapping voiceprint features; no specific restriction is imposed here. Preferably, the two models extract different voiceprint features: for example, the first voiceprint feature extracted by the GMM-UBM model includes sub-features such as timbre, frequency, amplitude, and volume, while the second voiceprint feature extracted by the neural network model includes sub-features such as fundamental frequency, mel-frequency cepstral coefficients, formants, pitch, and reflection coefficients.
It should be noted that the GMM-UBM model and the neural network model may extract voiceprint features from the same sound segment of the verification voice, from different sound segments, or from partially overlapping sound segments; no specific restriction is imposed here.
Step S30: fusing the first voiceprint feature and the second voiceprint feature of the verification voice to obtain the fused voiceprint feature vector of the verification voice.
In this embodiment, the fused voiceprint feature vector of the verification voice is obtained by fusing the first and second voiceprint features, and there are many ways to fuse them: for example, the two features may be fused by superimposing them in full, or by superimposing only part of their sub-features. Of course, the first and second voiceprint features of the verification voice can also be fused in other ways, which are not enumerated here.
Step S40: calculating the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of each registered user in a preset registration voiceprint database.
In this embodiment, the voiceprint feature vector of a registered user is established by the voiceprint recognition device when the user registers his or her voice. Each user corresponds to one registered voiceprint feature vector, each of which is stored in the data storage module of the voiceprint recognition device; the voiceprint feature vectors of all registered users constitute the preset registration voiceprint database.
There are many ways to calculate the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of a registered user. For example, cosine similarity may be used, i.e., for a fused voiceprint feature vector a and a registered voiceprint feature vector b:
cos(a, b) = (a · b) / (|a| |b|)
The larger the computed value, the more similar the fused voiceprint feature vector and the registered user's voiceprint feature vector; the smaller the value, the less similar they are.
Of course, the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of a registered user can also be calculated with the Pearson correlation coefficient, Euclidean distance, Manhattan distance, and so on, which are not enumerated here.
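A minimal sketch of the cosine-similarity comparison against the registration database; `fused` and `enroll_db` are hypothetical names for the fused vector and a {user_id: voiceprint_vector} mapping, and the threshold value is illustrative (the patent only says "preset"):

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(a, b) = (a . b) / (|a| |b|): larger means more similar."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def best_match(fused, enroll_db, threshold=0.8):
    """Score the fused vector against every registered vector and accept
    the best match only if it clears the preset threshold (step S50)."""
    scores = {uid: cosine_similarity(fused, vec)
              for uid, vec in enroll_db.items()}
    uid = max(scores, key=scores.get)
    return (uid, scores[uid]) if scores[uid] >= threshold else (None, scores[uid])
```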
It is worth noting that a registration voiceprint database generally stores the voiceprint feature vectors of a large number of registered users, and voiceprint recognition requires comparing the fused voiceprint feature vector of the verification voice with each of them, which demands a large amount of computation from the voiceprint recognition device. In view of this, the voiceprint feature vectors of the registered users in the database can be associated with one another; specifically, the similarity between every two registered voiceprint feature vectors in the database can be pre-computed. Then, once the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of some registered user has been calculated, that similarity can be used to screen out the voiceprint feature vectors of other registered users whose similarity to that user's vector is low, thereby reducing the amount of computation required of the voiceprint recognition device.
Step S50: determining the voiceprint recognition result of the verification voice based on the similarity.
In this embodiment, the voiceprint recognition result of the verification voice is determined by comparing the similarity with a preset threshold: when the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of some registered user is equal to or greater than the preset threshold, voiceprint recognition succeeds; when the similarity to every registered user's voiceprint feature vector is below the preset threshold, voiceprint recognition fails.
It should be noted that if the voiceprint feature vectors of several registered users all exceed the preset threshold in similarity with the fused voiceprint feature vector of the verification voice, then among those vectors the one with the highest similarity to the fused voiceprint feature vector is taken as the match for the verification voice.
The present invention extracts a first voiceprint feature of the verification voice with a GMM-UBM model and a second voiceprint feature with a neural network model; fuses the first and second voiceprint features to obtain the fused voiceprint feature vector of the verification voice; calculates the similarity between the fused voiceprint feature vector and the voiceprint feature vector of each registered user in the preset voiceprint database; and determines the voiceprint recognition result of the verification voice based on the similarity. In this way, the GMM-UBM model and the neural network model are combined: each extracts features from the verification voice, and both sets of features are used for verification. Compared with extracting features with a single model, the features extracted by the two models carry more comprehensive information, so the verification voice and the registration voice can be compared more thoroughly, and the accuracy of voiceprint recognition is improved.
Referring to Fig. 3, Fig. 3 is a flow diagram of another embodiment of the voiceprint recognition method of the present invention. Based on the above embodiment, in this embodiment the method further includes the following steps before step S10:
Step S100: obtaining the registration voice of a registering user.
In this embodiment, the registration voice is the sound uttered by a user who needs to register his or her voice. The registration voice can be acquired in many ways: for example, the sound of an unregistered user may be picked up by a microphone, which sends it to the processing terminal for voiceprint recognition; or the user's registration voice may be captured by a smart terminal (a mobile phone, a tablet, and so on), which sends it to the processing terminal of the voiceprint recognition device. Of course, the registration voice can also be acquired with other equipment, which is not enumerated here.
It is worth noting that the voiceprint recognition system uses the registration voice as the verification standard for the user, so the quality of the registration voice directly affects the accuracy of voiceprint recognition. To improve accuracy, the registration voice can also be screened at acquisition time to reject recordings of poor quality. Specifically, the duration and the volume of the registration voice can be detected when it is obtained: if the duration is greater than or equal to a preset voice duration, the device reports that the registration voice was obtained successfully; otherwise it reports a failure. This guarantees the quality of the registration voice and ensures that the features extracted from it are distinct and clear, which helps improve the accuracy of voiceprint recognition.
Step S110: extracting a third voiceprint feature of the registration voice using the GMM-UBM model, and extracting a fourth voiceprint feature of the registration voice using the neural network model.
In this embodiment, the GMM-UBM model and the neural network model both extract features from the registration voice. Since they are two different models, they may extract the same voiceprint features, different voiceprint features, or partially overlapping voiceprint features; no specific restriction is imposed here. Preferably, the two models extract different voiceprint features: for example, the third voiceprint feature extracted by the GMM-UBM model includes sub-features such as timbre, frequency, amplitude, and volume, while the fourth voiceprint feature extracted by the neural network model includes sub-features such as fundamental frequency, mel-frequency cepstral coefficients, formants, pitch, and reflection coefficients.
It is worth noting that the two models may extract voiceprint features from the same sound segment of the registration voice, from different sound segments, or from partially overlapping sound segments; no specific restriction is imposed here.
It should be noted that the sub-features contained in the third voiceprint feature are the same as those contained in the first voiceprint feature, and the sub-features contained in the fourth voiceprint feature are the same as those contained in the second voiceprint feature.
Step S120: fusing the third voiceprint feature and the fourth voiceprint feature of the registration voice to obtain the fused voiceprint feature vector of the registration voice.
In this embodiment, the fused voiceprint feature vector of the registration voice is obtained by fusing the third and fourth voiceprint features, and there are many ways to fuse them: for example, by superimposing the two features in full, or by superimposing only part of their sub-features. Of course, the third and fourth voiceprint features of the registration voice can also be fused in other ways, which are not enumerated here.
Step S130: saving the fused voiceprint feature vector of the registration voice into the registration voiceprint database as the voiceprint feature vector of the registered user.
In this embodiment, the registration voiceprint database resides in the data storage module of the voiceprint recognition device, and the fused voiceprint feature vector of each registration voice is stored in it. When storing the vectors, the database may classify them: for example, classification by similarity, where the fused voiceprint feature vectors of several highly similar registration voices are stored in one subset and several such subsets constitute the registration voiceprint database; or classification by gender, where the fused voiceprint feature vectors of male and female registered users are stored separately. Of course, the fused feature vectors of registration voices can also be stored in other ways, which are not enumerated here.
Referring to Fig. 4, Fig. 4 is a detailed flow diagram of an embodiment of step S20 in Fig. 2. Based on the above embodiments, in this embodiment step S20 includes:
Step S210: pre-emphasizing, framing, and windowing the verification voice.
Pre-emphasis: the average power spectrum of a speech signal is shaped by glottal excitation and lip-nose radiation, so the high-frequency part rolls off at about 6 dB per octave above roughly 800 Hz. When computing the spectrum, the higher the frequency, the smaller the corresponding component and the harder the high-frequency spectrum is to obtain, which is why pre-emphasis is applied. Its purpose is to boost the high-frequency part so that the spectrum of the signal becomes flat across the whole band from low to high frequency, allowing the spectrum to be computed with the same signal-to-noise ratio. Pre-emphasis is generally applied after the speech signal has been digitized, using a first-order filter of the form H(z) = 1 - u*z^-1, where u is usually between 0.9 and 1.
Framing and windowing: because speech is short-time stationary, the signal is divided into frames and windowed after preprocessing so that short-time analysis techniques can be applied. Usually there are about 33 to 100 frames per second. Framing can use either contiguous segmentation or overlapping segmentation, but the latter makes the transition between frames smooth and preserves continuity. The overlapping part of one frame and the next is called the frame shift, and the ratio of frame shift to frame length is generally between 0 and 1/2. Framing intercepts the signal with a movable window of finite length; commonly used window functions include the rectangular window, the Hamming window, and the Hanning window.
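A short sketch of this preprocessing chain (pre-emphasis, overlapping framing, Hamming windowing), assuming the signal is at least one frame long; the parameter values (u = 0.97, 25 ms frames, 10 ms shift, i.e. a shift-to-frame ratio of 0.4) are illustrative choices within the ranges given above:

```python
import numpy as np

def preprocess(signal, fs, u=0.97, frame_ms=25, shift_ms=10):
    """Pre-emphasis H(z) = 1 - u*z^-1, overlapping framing, Hamming window."""
    emph = np.append(signal[0], signal[1:] - u * signal[:-1])  # pre-emphasis
    flen = int(fs * frame_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    n = 1 + (len(emph) - flen) // shift        # assumes len(signal) >= flen
    frames = np.stack([emph[i * shift: i * shift + flen] for i in range(n)])
    return frames * np.hamming(flen)           # window each frame
```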
After preprocessing, feature parameters are extracted from the speech signal. The choice of feature parameters should satisfy several principles: first, the parameters should be easy to extract from the speech signal; second, they should be hard to imitate; third, they should not vary over time and space, i.e., they should be relatively stable; fourth, they should effectively distinguish different speakers. At present, speaker recognition systems rely mainly on low-level acoustic features, which can be divided into temporal features and transform-domain features.
Step S220: extracting, from the preprocessed verification voice, feature parameters comprising the mel-frequency cepstral coefficients, the linear prediction cepstral coefficients, the first-order difference of the linear prediction cepstral coefficients, the energy, the first-order difference of the energy, and the gammatone filter cepstral coefficients, to obtain the first voiceprint feature of the verification voice.
The mel-frequency cepstral coefficients are extracted as follows (a code sketch follows these steps):
(1) Apply the short-time Fourier transform to the processed speech signal to obtain its spectrum, using the fast Fourier transform (FFT) on each frame. Each frame of the time-domain signal x(n) is first zero-padded to form a sequence of length N and then transformed by the FFT to obtain the linear spectrum X(k):
X(k) = sum_{n=0}^{N-1} x(n) e^{-j2πnk/N}, 0 <= k <= N-1
(2) Square the magnitude spectrum X(k) to obtain the energy spectrum, then smooth it with a mel-frequency filter bank and eliminate harmonics, yielding the corresponding mel spectrum. The mel filter bank, built according to the masking properties of hearing, consists of M triangular band-pass filters Hm(k) (0 <= m <= M) placed within the spectral range of speech, with center frequencies f(m) whose spacing widens as m increases. The transfer function of the m-th triangular band-pass filter is:
Hm(k) = 0 for k < f(m-1); (k - f(m-1)) / (f(m) - f(m-1)) for f(m-1) <= k <= f(m); (f(m+1) - k) / (f(m+1) - f(m)) for f(m) <= k <= f(m+1); 0 for k > f(m+1)
(3) Take the logarithm of the mel spectrum output by the filter bank to obtain the log spectrum S(m), which compresses the dynamic range of the speech spectrum and converts multiplicative noise in the frequency domain into an additive component:
S(m) = ln( sum_{k=0}^{N-1} |X(k)|^2 Hm(k) ), 0 <= m <= M
(4) Apply the discrete cosine transform to the log spectrum S(m) to obtain the mel-frequency cepstral coefficient (MFCC) parameters c(n):
c(n) = sum_{m=0}^{M-1} S(m) cos( πn(m + 1/2) / M ), n = 1, 2, ..., L
where L is the order of the MFCC parameters.
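Steps (1) through (4) map directly onto a few lines of NumPy/SciPy. A sketch under the usual assumptions (zero-padded FFT, triangular mel filters, type-II DCT); the FFT size, filter count, and order are illustrative:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, fs, n_fft=512, n_filt=26, L=12):
    """MFCC for one windowed frame, following steps (1)-(4)."""
    spec = np.abs(np.fft.rfft(frame, n_fft))        # (1) zero-padded FFT
    power = spec ** 2                               # (2) energy spectrum
    pts = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filt + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for m in range(1, n_filt + 1):                  # triangular filters Hm(k)
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    S = np.log(fbank @ power + 1e-10)               # (3) log mel spectrum
    return dct(S, type=2, norm='ortho')[:L]         # (4) DCT, first L MFCCs
```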
The short-time normalized energy feature parameter is extracted as follows:
(1) For a frame {Si(n), n = 1, 2, ..., N} of length N in the voice segment, compute the short-time log energy of the frame:
E(i) = ln( sum_{n=1}^{N} Si(n)^2 ), i = 1, 2, ..., L
where L is the number of frames in the voice segment.
(2) Since the energy differs considerably between different voice segments and different speech frames, the log energy must be normalized so that it can be computed as a vector together with the cepstral coefficients above. The normalization is performed with respect to Emax = max E(i), the maximum log energy in the voice segment.
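The exact normalization formula is not given in the text; the sketch below assumes the common choice of subtracting Emax in the log domain, so the normalized energy is non-positive and comparable across segments:

```python
import numpy as np

def normalized_log_energy(frames):
    """Short-time log energy per frame, normalized against the segment
    maximum Emax (subtraction in the log domain is the assumption here)."""
    E = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    return E - E.max()          # relative to Emax, so values are <= 0
```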
The LPCC feature parameters are extracted as follows (a code sketch follows these steps):
(1) Solve for the linear prediction (LPC) coefficients. In linear prediction analysis, the vocal tract is modeled as the all-pole model
H(z) = 1 / A(z) = 1 / (1 - sum_{k=1}^{p} ak z^{-k})
where p is the order of the LPC analysis, ak (k = 1, 2, ..., p) are the linear prediction coefficients, and A(z) is the inverse filter. LPC analysis solves for the coefficients ak; the present invention uses the recursive solution of the autocorrelation equations (the Durbin algorithm).
(2) Derive the cepstral coefficients (LPCC) from the LPC coefficients. The cepstrum of the preprocessed speech signal x(n) is defined as the inverse Z-transform of the logarithm of its Z-transform. Considering only the modulus of X(z) and ignoring its phase gives the cepstrum of the signal:
c(n) = Z^{-1}( log |X(z)| )
The LPCC is obtained not from the input signal x(n) but from the LPC coefficients an, using the recursion:
c1 = a1; cn = an + sum_{k=1}^{n-1} (k/n) ck a(n-k), 1 < n <= p
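A sketch of both steps: Levinson-Durbin on the frame's autocorrelation sequence for the LPC coefficients, then the cepstral recursion above. The autocorrelation method matches the Durbin algorithm the text names; orders are illustrative:

```python
import numpy as np

def lpc(frame, p=12):
    """LPC coefficients a_1..a_p by the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1: len(frame) + p]
    a = np.zeros(p + 1); a[0] = 1.0
    e = r[0]                                              # prediction error
    for i in range(1, p + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / e   # reflection coeff
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]               # update old a_j
        a[i] = k
        e *= 1.0 - k * k
    return -a[1:]     # a_k in the convention A(z) = 1 - sum_k a_k z^-k

def lpcc(a, q=None):
    """LPCC c_1..c_q via c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}."""
    p = len(a); q = q or p
    c = np.zeros(q + 1)
    for n in range(1, q + 1):
        c[n] = (a[n - 1] if n <= p else 0.0) + sum(
            (k / n) * c[k] * a[n - k - 1] for k in range(max(1, n - p), n))
    return c[1:]
```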
The dynamic feature coefficients, namely the first-order differences of the mel-frequency cepstral coefficients, of the linear prediction cepstral coefficients, and of the energy, are extracted as follows (a code sketch follows these steps):
The mel-frequency cepstral coefficients, linear prediction cepstral coefficients, and energy parameters described above characterize only the instantaneous information of the speech spectrum and are static parameters. Experiments show that the dynamic information of the speech spectrum also contains speaker-related information, which can be used to improve the recognition rate of a speaker recognition system.
(1) The dynamic information of the speech cepstrum characterizes how the feature parameters change over time. The change of the cepstrum over time can be expressed as
Δcm(n) = ( sum_{k=-K}^{K} k h(k) cm(n+k) ) / ( sum_{k=-K}^{K} k^2 h(k) )
where cm denotes the m-th order cepstral coefficient, n and k index the cepstral coefficients on the time axis, and h(k) (k = -K, -K+1, ..., K-1, K) is a window function of length 2K+1, usually symmetric. Δcm(n) is the first-order difference coefficient of the orthogonal polynomial fit, as shown above.
(2) In practice the window function is usually rectangular and K is usually 2, so the dynamic parameter is a linear combination of the parameters of the two frames before and the two frames after the current frame. The first-order dynamic parameters of the mel-frequency cepstral coefficients, the linear prediction cepstral coefficients, and the energy are obtained from the above formula.
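With a rectangular window and K = 2, the first-order dynamic parameters reduce to the standard delta regression. A sketch that works for any of the MFCC, LPCC, or energy trajectories (rows are frames, edges padded by repetition):

```python
import numpy as np

def delta(feat, K=2):
    """First-order dynamic parameters: regression over the K frames
    before and after the current frame; feat has shape (n_frames, n_coeffs)."""
    pad = np.pad(feat, ((K, K), (0, 0)), mode='edge')
    num = sum(k * (pad[K + k: K + k + len(feat)] - pad[K - k: K - k + len(feat)])
              for k in range(1, K + 1))
    return num / (2.0 * sum(k * k for k in range(1, K + 1)))
```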
The specific steps of Gammatone filter cepstrum coefficient characteristic parameter extraction are as follows:
(1) Apply a short-time Fourier transform to the pre-processed voice signal to obtain its spectrum. Here the fast Fourier transform (FFT) is used to perform the discrete Fourier transform of each frame of the voice signal: each frame of the time-domain signal x(n) is first padded with several zeros to form a sequence of length N, the fast Fourier transform is then applied to it, and the linear spectrum X(k) is finally obtained. The conversion formula between X(k) and x(n) is
X(k) = Σ_{n=0}^{N−1} x(n) e^(−j2πnk/N), 0 ≤ k ≤ N−1.
(2) Construct the Gammatone filter bank. The Gammatone filter is a standard cochlear auditory filter whose time-domain impulse response is
g(t) = A t^(n−1) e^(−2πb_t t) cos(2πf_i t + φ_i) U(t), t ≥ 0, 1 ≤ i ≤ N,
where A is the filter gain, f_i is the center frequency of the filter, U(t) is the unit step function and φ_i is the phase; to simplify the model, φ_i is set to 0. n is the order of the filter, and experiments show that with n = 4 the filter characteristics of the human ear cochlea are simulated well.
b_t is the decay factor of the filter; it determines the decay rate of the impulse response and is related to the bandwidth of the filter, b_t = 1.019 ERB(f_i), where in psychoacoustics
ERB(f_i) = 24.7 × (4.37 f_i / 1000 + 1).
In the formula, N is the number of filters. The center frequencies of the filter bank are equally spaced on the ERB domain, and the frequency coverage of the entire filter bank is 80 Hz–8000 Hz; each center frequency is computed from the filter cutoff frequency f_H and the filter overlap factor v_i, which specifies the overlap percentage between adjacent filters. After the center frequency of each filter is determined, the corresponding bandwidth can be obtained from the above formula.
(3) Filter with the Gammatone filter bank. The linear spectrum X(k) obtained in step (1) is squared to obtain the energy spectrum, which is then filtered with the Gammatone filter bank G_m(k) to obtain the log spectrum
S(m) = ln( Σ_{k=0}^{N−1} |X(k)|² G_m(k) ), 0 ≤ m < M,
which compresses the dynamic range of the speech spectrum and converts the multiplicative noise in the frequency domain into additive components.
(4) A discrete cosine transform is applied to the log spectrum S(m) to obtain the Gammatone filter cepstrum coefficient characteristic parameters G(n), in the standard form
G(n) = Σ_{m=0}^{M−1} S(m) cos(πn(m + 0.5)/M), n = 1, 2, ….
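A compact sketch of steps (1)–(4); the ERB-rate spacing and the closed-form magnitude response are standard approximations, and the filter count, FFT size and sampling rate are illustrative assumptions (the overlap-factor formula for the center frequencies is not reproduced in the extracted text):

```python
import numpy as np

def erb(f):
    """Psychoacoustic equivalent rectangular bandwidth, ERB(f) = 24.7*(4.37f/1000 + 1)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_bank(M, nfft, fs, f_lo=80.0, f_hi=8000.0, order=4):
    """Magnitude responses G_m(k) of M filters with center frequencies equally
    spaced on the ERB-rate scale over 80 Hz - 8000 Hz (step (2))."""
    erb_rate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    erb_rate_inv = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    centers = erb_rate_inv(np.linspace(erb_rate(f_lo), erb_rate(f_hi), M))
    freqs = np.arange(nfft // 2 + 1) * fs / nfft
    bank = np.empty((M, freqs.size))
    for m, fc in enumerate(centers):
        b = 1.019 * erb(fc)                      # decay factor b_t = 1.019 * ERB(f_i)
        bank[m] = (1.0 + ((freqs - fc) / b) ** 2) ** (-order / 2.0)
    return bank

def gtcc(frame, bank, nfft, n_ceps=13):
    """Steps (1), (3), (4): FFT -> energy spectrum -> filter bank -> log -> DCT."""
    power = np.abs(np.fft.rfft(frame, n=nfft)) ** 2   # |X(k)|^2, zero-padded to nfft
    S = np.log(bank @ power + 1e-12)                  # log spectrum S(m)
    M, m = len(S), np.arange(len(S))
    return np.array([np.sum(S * np.cos(np.pi * n * (m + 0.5) / M))
                     for n in range(1, n_ceps + 1)])

bank = gammatone_bank(M=32, nfft=512, fs=16000)
g = gtcc(np.random.randn(400), bank, nfft=512)
```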
Referring to Fig. 5, Fig. 5 is a detailed flowchart of another embodiment of step S20 in Fig. 2. In this embodiment, step S20 includes:
Step S210': arranging the verification speech into a spectrogram of a predetermined number of dimensions;
Specifically, a feature vector of predetermined dimensions may be extracted from the verification speech at every predetermined time interval, so as to arrange the verification speech into a spectrogram of the predetermined number of dimensions.
The predetermined number of dimensions, the predetermined dimensions and the predetermined time interval can be set as needed and/or according to system performance during implementation; this embodiment places no limit on their sizes.
Step S220': recognizing the spectrogram of the predetermined number of dimensions by the neural network to obtain the second voiceprint feature of the verification speech.
The verification speech is arranged into a spectrogram of a predetermined number of dimensions, and the spectrogram of the predetermined number of dimensions is then recognized by the neural network model to obtain the second voiceprint feature of the verification speech. In this way the second voiceprint feature extracted from the verification speech by the neural network model can better characterize the acoustic features in the speech and improve the accuracy of speech recognition.
It is worth noting that the extraction of the first voiceprint feature and the extraction of the second voiceprint feature from the verification speech do not interfere with each other; that is, steps S210 and S220 are carried out independently of steps S210' and S220', and there is no fixed order between steps S210, S220 and steps S210', S220'.
Further, in an embodiment of the voiceprint recognition method of the present invention, step S30 specifically includes:
fusing the first voiceprint feature dimensions and the second voiceprint feature dimensions using a Markov chain Monte Carlo stochastic model to obtain the fused voiceprint feature vector of the verification speech.
In this embodiment, the Markov chain Monte Carlo stochastic model randomly obtains a number of features from the first voiceprint feature and a number of features from the second voiceprint feature, and then fuses the features obtained from the first voiceprint feature with the features obtained from the second voiceprint feature into the fused voiceprint feature vector of the verification speech.
For example, the Markov chain Monte Carlo stochastic model randomly extracts 10 features from the 15 features of the first voiceprint feature and 15 features from the 20 features of the second voiceprint feature; after fusion, 25 voiceprint features are obtained as the fused voiceprint feature vector of the verification speech.
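The worked example above can be mimicked with a uniform random selection standing in for the MCMC-driven choice (which the following steps make precise); the seeding and feature values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
f1 = rng.normal(size=15)          # first voiceprint feature (15 components)
f2 = rng.normal(size=20)          # second voiceprint feature (20 components)

idx1 = rng.choice(15, size=10, replace=False)   # 10 of 15 first-feature components
idx2 = rng.choice(20, size=15, replace=False)   # 15 of 20 second-feature components
fused = np.concatenate([f1[idx1], f2[idx2]])    # 25-dimensional fused vector
assert fused.shape == (25,)
```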
Referring to Fig. 6, Fig. 6 is a detailed flowchart of an embodiment of step S30 in Fig. 2. In this embodiment, the first voiceprint feature includes a plurality of first voiceprint sub-features, and the second voiceprint feature includes a plurality of second voiceprint sub-features.
Based on the above embodiment, in this embodiment, step S30 includes:
Step S310: setting the total number of features of the fused voiceprint feature vector of the verification speech to K;
Step S320: determining, according to the total number K of features of the fused voiceprint feature vector of the verification speech, the mixing ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling;
Step S330: according to the mixing ratio of first voiceprint sub-features to second voiceprint sub-features, using Gibbs sampling of the MCMC to simulate the sampling process of a joint normal distribution, determining respectively the first voiceprint sub-features chosen from the first voiceprint feature and the second voiceprint sub-features chosen from the second voiceprint feature, and forming the fused voiceprint feature vector of the verification speech.
Further, step S320 specifically includes:
Step A: generating a random number in [0, 1] as the parameter p, where p represents the proportion of the first voiceprint sub-features in the fused voiceprint feature of the verification speech;
Step B: initializing the counter that records the number of iterations to k = 0;
Step C: generating a random number q in [0, 1] and comparing it with the parameter p; when q < p, one second voiceprint sub-feature is chosen and the count of second voiceprint sub-features is increased by 1; when q > p, one first voiceprint sub-feature is chosen and the count of first voiceprint sub-features is increased by 1;
Step D: increasing k by 1 and judging whether k ≤ K; if so, returning to step C; otherwise recording the numbers of first voiceprint sub-features and second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech as A and B respectively, and ending the sampling process.
Suppose the total number of dimensions of the fused voiceprint feature vector of the verification speech is set to K = 8 and the randomly generated parameter is p = 0.4; after 8 iterations of the above process, the number of first voiceprint sub-features to be selected is A = 3 and the number of second voiceprint sub-features is B = 5. The subsequent feature selection will then choose 3 first voiceprint sub-features and 5 second voiceprint sub-features.
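A minimal sketch of steps A–D above, with the branch directions exactly as stated; the seed and the option of fixing p are illustrative assumptions:

```python
import random

def split_counts(K, p=None, seed=None):
    """Steps A-D: decide how many sub-features to take from each voiceprint feature."""
    rnd = random.Random(seed)
    p = rnd.random() if p is None else p   # step A: p drawn uniformly from [0, 1]
    A = B = 0                              # step B: iteration counter starts at k = 0
    for _ in range(K):                     # steps C-D: loop K times
        if rnd.random() < p:               # q < p: one more second voiceprint sub-feature
            B += 1
        else:                              # q > p: one more first voiceprint sub-feature
            A += 1
    return A, B

A, B = split_counts(K=8, p=0.4, seed=7)    # one run of the K = 8, p = 0.4 example above
```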
Further, step S330 specifically includes:
Step E: setting the transfer count threshold to T and initializing the transfer count t = 0;
Step F: counting the number of collected features for the fused voiceprint feature vector of the verification speech, recording it as M, and generating M random numbers in [0, 1] as the initial state
X(0) = [x_1(0), x_2(0), …, x_M(0)];
Step G: each time the transfer count t increases by 1, computing, for each variable x_i(t), i ∈ {1, 2, …, M}, the conditional probability distribution derived from the joint probability distribution
P(x_i(t+1) | x_1(t+1), x_2(t+1), …, x_{i−1}(t+1), x_{i+1}(t), …, x_M(t)),
where the mean of the joint probability distribution is X. Judge whether t < T; if so, return to step G; otherwise obtain
P(T) = [P(x_1(T)), P(x_2(T)), …, P(x_i(T)), …, P(x_M(T))];
Step H: according to the number A of first voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech computed in step D, choosing the A first voiceprint sub-features with the largest corresponding probabilities Px_i(T) as the first voiceprint sub-features of the fused voiceprint feature vector of the verification speech;
Step I: setting the transfer count threshold to T and initializing the transfer count t = 0;
Step J: counting the number of collected features for the fused voiceprint feature vector of the verification speech, recording it as N, and generating N random numbers in [0, 1] as the initial state
Y(0) = [y_1(0), y_2(0), …, y_N(0)];
Step K: each time the transfer count t increases by 1, computing, for each variable y_j(t), j ∈ {1, 2, …, N}, the conditional probability distribution derived from the joint probability distribution
P(y_j(t+1) | y_1(t+1), y_2(t+1), …, y_{j−1}(t+1), y_{j+1}(t), …, y_N(t)),
where the mean of the joint probability distribution is Y. Judge whether t < T; if so, perform step K again; otherwise obtain
P(T) = [P(y_1(T)), P(y_2(T)), …, P(y_j(T)), …, P(y_N(T))];
Step L: according to the number B of second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech computed in step D, choosing the B second voiceprint sub-features with the largest corresponding probabilities Py_j(T) as the second voiceprint sub-features of the fused voiceprint feature vector of the verification speech.
Suppose there are 5 second voiceprint sub-features in total in the fused voiceprint feature vector of the verification speech obtained in the previous step, and in this embodiment X(0) = [0.2, 0.3, 0.4, 0.5, 0.6] as computed in step D. When t = 0, Px_1(1), Px_2(1), Px_3(1), Px_4(1) and Px_5(1) are obtained in turn from the conditional distribution P(x_i(t+1) | x_1(t+1), …, x_{i−1}(t+1), x_{i+1}(t), …, x_M(t)); suppose the computation gives Px_i(1) = [0.5, 0.6, 0.2, 0.8, 0.1]. The cycle continues until the predetermined transfer count is reached, T = 50 in this embodiment, and Px_i(50) is computed; suppose Px_i(50) = [0.6, 0.2, 0.5, 0.8, 0.9]. The two features with the largest corresponding probabilities are then chosen and added to the fused voiceprint feature vector of the verification speech.
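The patent fixes only the mean of the joint distribution, so the sketch below fills in the remaining details with explicit assumptions (an equicorrelated joint normal with correlation rho, and a standard-normal CDF as the probability score, with SciPy assumed available); it illustrates the shape of steps E–H rather than reproducing the exact computation:

```python
import numpy as np
from scipy.stats import norm

def gibbs_scores(M, T=50, rho=0.3, seed=0):
    """Steps E-G: from a random initial state X(0) in [0,1]^M, run T Gibbs sweeps,
    updating each x_i from its conditional given the other coordinates."""
    rng = np.random.default_rng(seed)
    x = rng.random(M)                            # step F: random initial state
    w = rho / (1.0 + (M - 2) * rho)              # conditional weight (equicorrelated normal)
    sd = np.sqrt((1.0 - rho) * (1.0 + (M - 1) * rho) / (1.0 + (M - 2) * rho))
    for _ in range(T):                           # step G: T transfer steps
        mu = x.mean()                            # mean of the joint distribution taken as X-bar
        for i in range(M):
            cond_mu = mu + w * (x.sum() - x[i] - (M - 1) * mu)
            x[i] = rng.normal(cond_mu, sd)
    return norm.cdf(x)                           # P(x_i(T)) used to rank the coordinates

scores = gibbs_scores(M=5)                       # M = 5 as in the worked example
top_A = np.argsort(scores)[::-1][:3]             # step H: keep the A = 3 highest scores
```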
In addition, the present invention also provides a voiceprint recognition device.
Referring to Fig. 7, Fig. 7 is a functional block diagram of an embodiment of the voiceprint recognition device of the present invention.
In this embodiment, the voiceprint recognition device includes:
a data acquisition module 10 for obtaining the verification speech to be identified;
a data processing module 20 for extracting the first voiceprint feature of the verification speech using a GMM-UBM model and extracting the second voiceprint feature of the verification speech using a neural network model;
a data fusion module 30 for performing feature fusion on the first voiceprint feature and the second voiceprint feature of the verification speech to obtain the fused voiceprint feature vector of the verification speech;
a data comparison module 40 for calculating the similarity between the fused voiceprint feature vector of the verification speech and the voiceprint feature vector of each registered user in a preset registration voiceprint database; and
a data judgment module 50 for determining the voiceprint recognition result of the verification speech based on the similarity.
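For the data comparison module, a cosine similarity is one common choice of metric (an assumption here; this passage does not fix the measure):

```python
import numpy as np

def cosine_similarity(fused, enrolled):
    """Similarity between the fused voiceprint vector of the verification speech
    and the stored voiceprint vector of one registered user."""
    return float(np.dot(fused, enrolled) /
                 (np.linalg.norm(fused) * np.linalg.norm(enrolled) + 1e-12))

# The data judgment module could then accept the best-scoring registered user
# if the similarity exceeds a preset threshold (threshold value illustrative).
```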
Further, the data acquisition module 10 is also used to obtain the registration speech of a registering user; the data processing module 20 is also used to extract the third voiceprint feature of the registration speech using the GMM-UBM model and to extract the fourth voiceprint feature of the registration speech using the neural network model; and the data fusion module 30 is also used to perform feature fusion on the third voiceprint feature and the fourth voiceprint feature of the registration speech to obtain the fused voiceprint feature vector of the registration speech.
The voiceprint recognition device further includes a data storage module 60 for saving the fused voiceprint feature vector of the registration speech into the registration voiceprint database as the voiceprint feature vector of the registered user.
Further, the data processing module 20 further includes:
a first pre-processing unit 201 for performing pre-emphasis, framing and windowing pre-processing on the verification speech;
a first extraction unit 202 for extracting, from the pre-processed verification speech, the characteristic parameters of the pitch period, the linear prediction cepstrum coefficients, the first-order difference of the linear prediction cepstrum coefficients, the energy, the first-order difference of the energy and the Gammatone filter cepstrum coefficients, to obtain the first voiceprint feature of the verification speech;
a second pre-processing unit 203 for arranging the verification speech into a spectrogram of a predetermined number of dimensions; and
a second extraction unit 204 for recognizing the spectrogram of the predetermined number of dimensions by the neural network to obtain the second voiceprint feature of the verification speech.
Further, the data fusion module 30 includes:
a data fusion unit 301 for fusing the first voiceprint feature dimensions and the second voiceprint feature dimensions using the Markov chain Monte Carlo stochastic model to obtain the fused voiceprint feature vector of the verification speech.
Further, the data fusion unit 301 includes:
a setting subunit 3011 for setting the total number of features of the fused voiceprint feature vector of the verification speech to K;
a determining subunit 3012 for determining, according to the total number K of features of the fused voiceprint feature vector of the verification speech, the mixing ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling; and
a fusion subunit 3013 for, according to the mixing ratio of first voiceprint sub-features to second voiceprint sub-features, using Gibbs sampling of the MCMC to simulate the sampling process of a joint normal distribution, determining respectively the first voiceprint sub-features chosen from the first voiceprint feature and the second voiceprint sub-features chosen from the second voiceprint feature, and forming the fused voiceprint feature vector of the verification speech.
Further, the determining subunit 3012 is used to perform:
Step A: generating a random number in [0, 1] as the parameter p, where p represents the proportion of the first voiceprint sub-features in the fused voiceprint feature of the verification speech;
Step B: initializing the counter that records the number of iterations to k = 0;
Step C: generating a random number q in [0, 1] and comparing it with the parameter p; when q < p, choosing one second voiceprint sub-feature and increasing the count of second voiceprint sub-features by 1; when q > p, choosing one first voiceprint sub-feature and increasing the count of first voiceprint sub-features by 1;
Step D: increasing k by 1 and judging whether k ≤ K; if so, returning to step C; otherwise recording the numbers of first voiceprint sub-features and second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech as A and B respectively, and ending the sampling process.
Further, the fusion subunit 3013 is used to perform:
Step E: setting the transfer count threshold to T and initializing the transfer count t = 0;
Step F: counting the number of collected features for the fused voiceprint feature vector of the verification speech, recording it as M, and generating M random numbers in [0, 1] as the initial state
X(0) = [x_1(0), x_2(0), …, x_M(0)];
Step G: each time the transfer count t increases by 1, computing, for each variable x_i(t), i ∈ {1, 2, …, M}, the conditional probability distribution derived from the joint probability distribution
P(x_i(t+1) | x_1(t+1), x_2(t+1), …, x_{i−1}(t+1), x_{i+1}(t), …, x_M(t)),
where the mean of the joint probability distribution is X. Judge whether t < T; if so, return to step G; otherwise obtain
P(T) = [P(x_1(T)), P(x_2(T)), …, P(x_i(T)), …, P(x_M(T))];
Step H: according to the number A of first voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech computed in step D, choosing the A first voiceprint sub-features with the largest corresponding probabilities Px_i(T) as the first voiceprint sub-features of the fused voiceprint feature vector of the verification speech;
Step I: setting the transfer count threshold to T and initializing the transfer count t = 0;
Step J: counting the number of collected features for the fused voiceprint feature vector of the verification speech, recording it as N, and generating N random numbers in [0, 1] as the initial state
Y(0) = [y_1(0), y_2(0), …, y_N(0)];
Step K: each time the transfer count t increases by 1, computing, for each variable y_j(t), j ∈ {1, 2, …, N}, the conditional probability distribution derived from the joint probability distribution
P(y_j(t+1) | y_1(t+1), y_2(t+1), …, y_{j−1}(t+1), y_{j+1}(t), …, y_N(t)),
where the mean of the joint probability distribution is Y. Judge whether t < T; if so, perform step K again; otherwise obtain
P(T) = [P(y_1(T)), P(y_2(T)), …, P(y_j(T)), …, P(y_N(T))];
Step L: according to the number B of second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech computed in step D, choosing the B second voiceprint sub-features with the largest corresponding probabilities Py_j(T) as the second voiceprint sub-features of the fused voiceprint feature vector of the verification speech.
In addition, the embodiment of the present invention also provides a kind of computer readable storage medium.
Application on Voiceprint Recognition program is stored on computer readable storage medium of the present invention, wherein the Application on Voiceprint Recognition program is located
When managing device execution, realize such as the step of above-mentioned method for recognizing sound-groove.
Wherein, Application on Voiceprint Recognition program, which is performed realized method, can refer to each reality of method for recognizing sound-groove of the present invention
Example is applied, details are not described herein again.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the software product is stored in a storage medium (such as ROM/RAM) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the invention is not limited to the specific embodiments described, which are merely illustrative rather than restrictive. Inspired by the present invention, those skilled in the art can devise many other forms without departing from the scope protected by the purpose and the claims of the present invention; all equivalent structures or equivalent process transformations made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, fall within the protection of the present invention.
Claims (10)
1. A voiceprint recognition method, characterized in that the voiceprint recognition method comprises the following steps:
obtaining verification speech to be identified;
extracting a first voiceprint feature of the verification speech using a GMM-UBM model, and extracting a second voiceprint feature of the verification speech using a neural network model;
performing feature fusion on the first voiceprint feature and the second voiceprint feature of the verification speech to obtain a fused voiceprint feature vector of the verification speech;
calculating the similarity between the fused voiceprint feature vector of the verification speech and the voiceprint feature vector of each registered user in a preset registration voiceprint database; and
determining the voiceprint recognition result of the verification speech based on the similarity.
2. The voiceprint recognition method according to claim 1, characterized in that, before the obtaining of the verification speech to be identified, the method further comprises:
obtaining the registration speech of a registering user;
extracting a third voiceprint feature of the registration speech using the GMM-UBM model, and extracting a fourth voiceprint feature of the registration speech using the neural network model;
performing feature fusion on the third voiceprint feature and the fourth voiceprint feature of the registration speech to obtain a fused voiceprint feature vector of the registration speech; and
saving the fused voiceprint feature vector of the registration speech into the registration voiceprint database as the voiceprint feature vector of the registered user.
3. The voiceprint recognition method according to claim 1, characterized in that the extracting of the first voiceprint feature of the verification speech using the GMM-UBM model comprises:
performing pre-emphasis, framing and windowing pre-processing on the verification speech; and
extracting, from the pre-processed verification speech, the characteristic parameters of the pitch period, the linear prediction cepstrum coefficients, the first-order difference of the linear prediction cepstrum coefficients, the energy, the first-order difference of the energy and the Gammatone filter cepstrum coefficients, to obtain the first voiceprint feature of the verification speech;
and the extracting of the second voiceprint feature of the verification speech using the neural network model comprises:
arranging the verification speech into a spectrogram of a predetermined number of dimensions; and
recognizing the spectrogram of the predetermined number of dimensions by the neural network to obtain the second voiceprint feature of the verification speech.
4. The voiceprint recognition method according to claim 1, characterized in that the performing of feature fusion on the first voiceprint feature and the second voiceprint feature of the verification speech to obtain the fused voiceprint feature vector of the verification speech comprises:
fusing the first voiceprint feature dimensions and the second voiceprint feature dimensions using a Markov chain Monte Carlo stochastic model to obtain the fused voiceprint feature vector of the verification speech.
5. The voiceprint recognition method according to claim 4, characterized in that the first voiceprint feature includes a plurality of first voiceprint sub-features and the second voiceprint feature includes a plurality of second voiceprint sub-features;
and the fusing of the first voiceprint feature dimensions and the second voiceprint feature dimensions using the Markov chain Monte Carlo stochastic model to obtain the fused voiceprint feature vector of the verification speech comprises:
setting the total number of features of the fused voiceprint feature vector of the verification speech to K;
determining, according to the total number K of features of the fused voiceprint feature vector of the verification speech, the mixing ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling; and
according to the mixing ratio of first voiceprint sub-features to second voiceprint sub-features, using Gibbs sampling of the MCMC to simulate the sampling process of a joint normal distribution, determining respectively the first voiceprint sub-features chosen from the first voiceprint feature and the second voiceprint sub-features chosen from the second voiceprint feature, and forming the fused voiceprint feature vector of the verification speech.
6. The voiceprint recognition method according to claim 5, characterized in that the determining, according to the total number K of features of the fused voiceprint feature vector of the verification speech, of the mixing ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling comprises:
Step A: generating a random number in [0, 1] as the parameter p, where p represents the proportion of the first voiceprint sub-features in the fused voiceprint feature of the verification speech;
Step B: initializing the counter that records the number of iterations to k = 0;
Step C: generating a random number q in [0, 1] and comparing it with the parameter p; when q < p, choosing one second voiceprint sub-feature and increasing the count of second voiceprint sub-features by 1; when q > p, choosing one first voiceprint sub-feature and increasing the count of first voiceprint sub-features by 1; and
Step D: increasing k by 1 and judging whether k ≤ K; if so, returning to step C; otherwise recording the numbers of first voiceprint sub-features and second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech as A and B respectively, and ending the sampling process.
7. The voiceprint recognition method according to claim 6, characterized in that the step of, according to the mixing ratio of first voiceprint sub-features to second voiceprint sub-features, using Gibbs sampling of the MCMC to simulate the sampling process of a joint normal distribution, determining respectively the first voiceprint sub-features chosen from the first voiceprint feature and the second voiceprint sub-features chosen from the second voiceprint feature, and forming the fused voiceprint feature vector of the verification speech comprises:
Step E: setting the transfer count threshold to T and initializing the transfer count t = 0;
Step F: counting the number of collected features for the fused voiceprint feature vector of the verification speech, recording it as M, and generating M random numbers in [0, 1] as the initial state
X(0) = [x_1(0), x_2(0), …, x_M(0)];
Step G: each time the transfer count t increases by 1, computing, for each variable x_i(t), i ∈ {1, 2, …, M}, the conditional probability distribution derived from the joint probability distribution
P(x_i(t+1) | x_1(t+1), x_2(t+1), …, x_{i−1}(t+1), x_{i+1}(t), …, x_M(t)),
where the mean of the joint probability distribution is X; judging whether t < T; if so, returning to step G; otherwise obtaining
P(T) = [P(x_1(T)), P(x_2(T)), …, P(x_i(T)), …, P(x_M(T))];
Step H: according to the number A of first voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech computed in step D, choosing the A first voiceprint sub-features with the largest corresponding probabilities Px_i(T) as the first voiceprint sub-features of the fused voiceprint feature vector of the verification speech;
Step I: setting the transfer count threshold to T and initializing the transfer count t = 0;
Step J: counting the number of collected features for the fused voiceprint feature vector of the verification speech, recording it as N, and generating N random numbers in [0, 1] as the initial state
Y(0) = [y_1(0), y_2(0), …, y_N(0)];
Step K: each time the transfer count t increases by 1, computing, for each variable y_j(t), j ∈ {1, 2, …, N}, the conditional probability distribution derived from the joint probability distribution
P(y_j(t+1) | y_1(t+1), y_2(t+1), …, y_{j−1}(t+1), y_{j+1}(t), …, y_N(t)),
where the mean of the joint probability distribution is Y; judging whether t < T; if so, performing step K again; otherwise obtaining
P(T) = [P(y_1(T)), P(y_2(T)), …, P(y_j(T)), …, P(y_N(T))]; and
Step L: according to the number B of second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech computed in step D, choosing the B second voiceprint sub-features with the largest corresponding probabilities Py_j(T) as the second voiceprint sub-features of the fused voiceprint feature vector of the verification speech.
8. A voiceprint recognition device, characterized in that the voiceprint recognition device comprises:
a data acquisition module for obtaining verification speech to be identified;
a data processing module for extracting a first voiceprint feature of the verification speech using a GMM-UBM model and extracting a second voiceprint feature of the verification speech using a neural network model;
a data fusion module for performing feature fusion on the first voiceprint feature and the second voiceprint feature of the verification speech to obtain a fused voiceprint feature vector of the verification speech;
a data comparison module for calculating the similarity between the fused voiceprint feature vector of the verification speech and the voiceprint feature vector of each registered user in a preset registration voiceprint database; and
a data judgment module for determining the voiceprint recognition result of the verification speech based on the similarity.
9. A voiceprint recognition apparatus, characterized in that the voiceprint recognition apparatus comprises a processor, a memory, and a voiceprint recognition program stored on the memory and executable by the processor, wherein the steps of the voiceprint recognition method according to any one of claims 1 to 7 are implemented when the voiceprint recognition program is executed by the processor.
10. A computer-readable storage medium, characterized in that a voiceprint recognition program is stored on the computer-readable storage medium, and the steps of the voiceprint recognition method according to any one of claims 1 to 7 are implemented when the voiceprint recognition program is executed by a processor.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910182453.3A CN110047490A (en) | 2019-03-12 | 2019-03-12 | Method for recognizing sound-groove, device, equipment and computer readable storage medium |
PCT/CN2019/118656 WO2020181824A1 (en) | 2019-03-12 | 2019-11-15 | Voiceprint recognition method, apparatus and device, and computer-readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910182453.3A CN110047490A (en) | 2019-03-12 | 2019-03-12 | Method for recognizing sound-groove, device, equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110047490A true CN110047490A (en) | 2019-07-23 |
Family
ID=67274752
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910182453.3A Pending CN110047490A (en) | 2019-03-12 | 2019-03-12 | Method for recognizing sound-groove, device, equipment and computer readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110047490A (en) |
WO (1) | WO2020181824A1 (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104900235B (en) * | 2015-05-25 | 2019-05-28 | 重庆大学 | Method for recognizing sound-groove based on pitch period composite character parameter |
US10008209B1 (en) * | 2015-09-25 | 2018-06-26 | Educational Testing Service | Computer-implemented systems and methods for speaker recognition using a neural network |
CN109147797B (en) * | 2018-10-18 | 2024-05-07 | 平安科技(深圳)有限公司 | Customer service method, device, computer equipment and storage medium based on voiceprint recognition |
CN110047490A (en) * | 2019-03-12 | 2019-07-23 | 平安科技(深圳)有限公司 | Method for recognizing sound-groove, device, equipment and computer readable storage medium |
- 2019
- 2019-03-12 CN CN201910182453.3A patent/CN110047490A/en active Pending
- 2019-11-15 WO PCT/CN2019/118656 patent/WO2020181824A1/en active Application Filing
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1567431A (en) * | 2003-07-10 | 2005-01-19 | 上海优浪信息科技有限公司 | Method and system for identifying status of speaker |
US20080010065A1 (en) * | 2006-06-05 | 2008-01-10 | Harry Bratt | Method and apparatus for speaker recognition |
CN103440873A (en) * | 2013-08-27 | 2013-12-11 | 大连理工大学 | Music recommendation method based on similarities |
CN103745002A (en) * | 2014-01-24 | 2014-04-23 | 中国科学院信息工程研究所 | Method and system for recognizing hidden paid posters on basis of fusion of behavior characteristic and content characteristic |
CN104835498A (en) * | 2015-05-25 | 2015-08-12 | 重庆大学 | Voiceprint identification method based on multi-type combination characteristic parameters |
CN105575394A (en) * | 2016-01-04 | 2016-05-11 | 北京时代瑞朗科技有限公司 | Voiceprint identification method based on global change space and deep learning hybrid modeling |
CN106710589A (en) * | 2016-12-28 | 2017-05-24 | 百度在线网络技术(北京)有限公司 | Artificial intelligence-based speech feature extraction method and device |
CN106847309A (en) * | 2017-01-09 | 2017-06-13 | 华南理工大学 | A kind of speech-emotion recognition method |
CN106887225A (en) * | 2017-03-21 | 2017-06-23 | 百度在线网络技术(北京)有限公司 | Acoustic feature extracting method, device and terminal device based on convolutional neural networks |
CN109102812A (en) * | 2017-06-21 | 2018-12-28 | 北京搜狗科技发展有限公司 | A kind of method for recognizing sound-groove, system and electronic equipment |
CN108417217A (en) * | 2018-01-11 | 2018-08-17 | 苏州思必驰信息科技有限公司 | Speaker Identification network model training method, method for distinguishing speek person and system |
CN109767790A (en) * | 2019-02-28 | 2019-05-17 | 中国传媒大学 | A kind of speech-emotion recognition method and system |
Non-Patent Citations (5)
Title |
---|
Zhong Weifeng et al.: "Speaker recognition with deep and shallow feature and model fusion", Acta Acustica, vol. 43, no. 2, pages 264-271 *
Li Tingting et al.: "Analysis and selection of characteristic parameters of speech signals", Information & Computer, no. 5, pages 45-49 *
Lin Shudu et al.: "Speaker recognition based on i-vector and deep learning", Computer Technology and Development, vol. 27, no. 6, pages 66-71 *
Wang Xin et al.: "Robust i-vector speaker recognition algorithm based on DNN processing", Computer Engineering and Applications, no. 54, pages 167-172 *
Hu Qing: "Research on the application of convolutional neural networks in voiceprint recognition", China Masters' Theses Full-text Database, Information Science and Technology Series, no. 3, pages 1-47 *
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020181824A1 (en) * | 2019-03-12 | 2020-09-17 | 平安科技(深圳)有限公司 | Voiceprint recognition method, apparatus and device, and computer-readable storage medium |
CN110517698A (en) * | 2019-09-05 | 2019-11-29 | 科大讯飞股份有限公司 | A kind of determination method, apparatus, equipment and the storage medium of sound-groove model |
CN110517698B (en) * | 2019-09-05 | 2022-02-01 | 科大讯飞股份有限公司 | Method, device and equipment for determining voiceprint model and storage medium |
CN110556126A (en) * | 2019-09-16 | 2019-12-10 | 平安科技(深圳)有限公司 | Voice recognition method and device and computer equipment |
CN110556126B (en) * | 2019-09-16 | 2024-01-05 | 平安科技(深圳)有限公司 | Speech recognition method and device and computer equipment |
CN112687274A (en) * | 2019-10-17 | 2021-04-20 | 北京猎户星空科技有限公司 | Voice information processing method, device, equipment and medium |
CN110880321B (en) * | 2019-10-18 | 2024-05-10 | 平安科技(深圳)有限公司 | Intelligent braking method, device, equipment and storage medium based on voice |
CN110880321A (en) * | 2019-10-18 | 2020-03-13 | 平安科技(深圳)有限公司 | Intelligent braking method, device and equipment based on voice and storage medium |
CN110838294B (en) * | 2019-11-11 | 2022-03-04 | 效生软件科技(上海)有限公司 | Voice verification method and device, computer equipment and storage medium |
CN110838294A (en) * | 2019-11-11 | 2020-02-25 | 效生软件科技(上海)有限公司 | Voice verification method and device, computer equipment and storage medium |
CN111370003A (en) * | 2020-02-27 | 2020-07-03 | 杭州雄迈集成电路技术股份有限公司 | Voiceprint comparison method based on twin neural network |
CN111552832A (en) * | 2020-04-01 | 2020-08-18 | 深圳壹账通智能科技有限公司 | Risk user identification method and device based on voiceprint features and associated map data |
CN112185344A (en) * | 2020-09-27 | 2021-01-05 | 北京捷通华声科技股份有限公司 | Voice interaction method and device, computer readable storage medium and processor |
CN114512134A (en) * | 2020-11-17 | 2022-05-17 | 阿里巴巴集团控股有限公司 | Method and device for voiceprint information extraction, model training and voiceprint recognition |
CN112614493A (en) * | 2020-12-04 | 2021-04-06 | 珠海格力电器股份有限公司 | Voiceprint recognition method, system, storage medium and electronic device |
CN112382300A (en) * | 2020-12-14 | 2021-02-19 | 北京远鉴信息技术有限公司 | Voiceprint identification method, model training method, device, equipment and storage medium |
WO2022233239A1 (en) * | 2021-05-07 | 2022-11-10 | 华为技术有限公司 | Upgrading method and apparatus, and electronic device |
CN115022087A (en) * | 2022-07-20 | 2022-09-06 | 中国工商银行股份有限公司 | Voice recognition verification processing method and device |
CN115022087B (en) * | 2022-07-20 | 2024-02-27 | 中国工商银行股份有限公司 | Voice recognition verification processing method and device |
CN115019804A (en) * | 2022-08-03 | 2022-09-06 | 北京惠朗时代科技有限公司 | Multi-verification type voiceprint recognition method and system for multi-employee intensive sign-in |
CN115831152A (en) * | 2022-11-28 | 2023-03-21 | 国网山东省电力公司应急管理中心 | Sound monitoring device and method for monitoring running state of generator of emergency equipment in real time |
CN115831152B (en) * | 2022-11-28 | 2023-07-04 | 国网山东省电力公司应急管理中心 | Sound monitoring device and method for monitoring operation state of emergency equipment generator in real time |
CN116386647A (en) * | 2023-05-26 | 2023-07-04 | 北京瑞莱智慧科技有限公司 | Audio verification method, related device, storage medium and program product |
CN116386647B (en) * | 2023-05-26 | 2023-08-22 | 北京瑞莱智慧科技有限公司 | Audio verification method, related device, storage medium and program product |
CN118522288A (en) * | 2024-07-24 | 2024-08-20 | 山东第一医科大学附属省立医院(山东省立医院) | Voiceprint recognition-based otorhinolaryngological patient identity verification method |
Also Published As
Publication number | Publication date |
---|---|
WO2020181824A1 (en) | 2020-09-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110047490A (en) | Method for recognizing sound-groove, device, equipment and computer readable storage medium | |
CN104835498B (en) | Method for recognizing sound-groove based on polymorphic type assemblage characteristic parameter | |
TWI641965B (en) | Method and system of authentication based on voiceprint recognition | |
CN106486131B (en) | A kind of method and device of speech de-noising | |
CN104900235B (en) | Method for recognizing sound-groove based on pitch period composite character parameter | |
US8160877B1 (en) | Hierarchical real-time speaker recognition for biometric VoIP verification and targeting | |
CN101923855A (en) | Test-irrelevant voice print identifying system | |
WO2019136912A1 (en) | Electronic device, identity authentication method and system, and storage medium | |
CN107507626B (en) | Mobile phone source identification method based on voice frequency spectrum fusion characteristics | |
CN102324232A (en) | Method for recognizing sound-groove and system based on gauss hybrid models | |
CN109036382A (en) | A kind of audio feature extraction methods based on KL divergence | |
CN113327626A (en) | Voice noise reduction method, device, equipment and storage medium | |
CN113823293B (en) | Speaker recognition method and system based on voice enhancement | |
CN104517066A (en) | Folder encrypting method | |
CN108091326A (en) | A kind of method for recognizing sound-groove and system based on linear regression | |
CN112992155B (en) | Far-field voice speaker recognition method and device based on residual error neural network | |
CN110136726A (en) | A kind of estimation method, device, system and the storage medium of voice gender | |
CN109545226A (en) | A kind of audio recognition method, equipment and computer readable storage medium | |
CN115394318A (en) | Audio detection method and device | |
Herrera-Camacho et al. | Design and testing of a corpus for forensic speaker recognition using MFCC, GMM and MLE | |
Reynolds et al. | Automatic speaker recognition | |
Nagakrishnan et al. | Generic speech based person authentication system with genuine and spoofed utterances: different feature sets and models | |
Saleema et al. | Voice biometrics: the promising future of authentication in the internet of things | |
Shi et al. | Anti-replay: A fast and lightweight voice replay attack detection system | |
Nguyen et al. | Vietnamese speaker authentication using deep models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |