CN110047490A - Voiceprint recognition method, apparatus, device, and computer-readable storage medium - Google Patents
Voiceprint recognition method, apparatus, device, and computer-readable storage medium
- Publication number
- CN110047490A (application number CN201910182453.3A)
- Authority
- CN
- China
- Prior art keywords
- voiceprint
- voice
- fusion
- voiceprint feature
- verification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Abstract
The invention discloses a voiceprint recognition method, apparatus, device, and computer-readable storage medium. The method includes: obtaining a verification voice to be identified; extracting a first voiceprint feature of the verification voice using a GMM-UBM model, and extracting a second voiceprint feature of the verification voice using a neural network model; fusing the first voiceprint feature and the second voiceprint feature to obtain a fused voiceprint feature vector of the verification voice; calculating the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of each registered user in a preset registration voiceprint database; and determining the voiceprint recognition result of the verification voice based on the similarity. Because the two models each extract features from the verification voice and both sets of features are used for verification, the extracted features carry more comprehensive information than features extracted by a single model, which improves the accuracy of voiceprint recognition.
Description
Technical field
The present invention relates to the field of voiceprint recognition, and more particularly to a voiceprint recognition method, apparatus, device, and computer-readable storage medium.
Background
A voiceprint recognition system automatically identifies a speaker from the characteristics of his or her voice. Voiceprint recognition belongs to the family of biometric authentication technologies; that is, it verifies a speaker's identity by voice. Because the technology is convenient, stable, measurable, and secure, it is widely used in fields such as banking, social security, public security, smart homes, and mobile payment.
Current voiceprint recognition systems are generally based on the Gaussian mixture model-universal background model (GMM-UBM) proposed in the 1990s, which is simple, flexible, and reasonably robust. Recently, however, with advances in neural network training, voiceprint verification systems based on neural networks have been put into practice, and neural network models outperform the single GMM-UBM on some data sets.
Summary of the invention
The main purpose of the present invention is to provide a voiceprint recognition method, apparatus, device, and computer-readable storage medium, intended to solve the technical problem that the accuracy of voiceprint recognition in the prior art is not high.
To achieve the above object, the present invention provides a voiceprint recognition method comprising the following steps:
obtaining a verification voice to be identified;
extracting a first voiceprint feature of the verification voice using a GMM-UBM model, and extracting a second voiceprint feature of the verification voice using a neural network model;
fusing the first voiceprint feature and the second voiceprint feature of the verification voice to obtain a fused voiceprint feature vector of the verification voice;
calculating the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of each registered user in a preset registration voiceprint database;
determining the voiceprint recognition result of the verification voice based on the similarity.
Optionally, before obtaining the verification voice to be identified, the method further includes:
obtaining the registration voice of a registering user;
extracting a third voiceprint feature of the registration voice using the GMM-UBM model, and extracting a fourth voiceprint feature of the registration voice using the neural network model;
fusing the third voiceprint feature and the fourth voiceprint feature of the registration voice to obtain a fused voiceprint feature vector of the registration voice;
saving the fused voiceprint feature vector of the registration voice into the registration voiceprint database as the voiceprint feature vector of the registered user.
Optionally, extracting the first voiceprint feature of the verification voice using the GMM-UBM model includes:
pre-emphasizing, framing, and windowing the verification voice;
extracting, from the preprocessed verification voice, feature parameters comprising the pitch period, the linear prediction cepstral coefficients, the first-order difference of the linear prediction cepstral coefficients, the energy, the first-order difference of the energy, and the gammatone filter cepstral coefficients, to obtain the first voiceprint feature of the verification voice.
Extracting the second voiceprint feature of the verification voice using the neural network model includes:
arranging the verification voice into a spectrogram of predetermined dimensionality;
identifying the spectrogram of predetermined dimensionality with the neural network to obtain the second voiceprint feature of the verification voice.
Optionally, fusing the first voiceprint feature and the second voiceprint feature of the verification voice to obtain the fused voiceprint feature vector of the verification voice includes:
fusing the first voiceprint feature dimensions and the second voiceprint feature dimensions using a Markov chain Monte Carlo (MCMC) stochastic model to obtain the fused voiceprint feature vector of the verification voice.
Optionally, the first voiceprint feature includes a plurality of first voiceprint sub-features, and the second voiceprint feature includes a plurality of second voiceprint sub-features.
Fusing the first voiceprint feature dimensions and the second voiceprint feature dimensions using the MCMC stochastic model to obtain the fused voiceprint feature vector of the verification voice includes:
setting the total number of features in the fused voiceprint feature vector of the verification voice to K;
given the total of K fused features, determining the fusion ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling;
according to that fusion ratio, simulating the sampling of a joint normal distribution with MCMC Gibbs sampling to determine which first voiceprint sub-features and which second voiceprint sub-features are selected, and composing the selected sub-features into the fused voiceprint feature vector of the verification voice.
Optionally, determining the fusion ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling, given the total of K fused features, includes the following steps (a code sketch follows this list):
Step A: generate a random number in [0, 1] as the parameter p, which represents the proportion of first voiceprint sub-features in the fused voiceprint feature vector of the verification voice;
Step B: initialize an iteration counter to k = 0;
Step C: generate a random number q in [0, 1] and compare it with p; if q < p, select one first voiceprint sub-feature and increment the count of first voiceprint sub-features by 1; if q > p, select one second voiceprint sub-feature and increment the count of second voiceprint sub-features by 1;
Step D: increment k by 1 and check whether k >= K; if so, record the numbers of first and second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification voice as A and B respectively, and end the sampling process; otherwise, return to step C.
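The direct-sampling procedure in steps A through D amounts to K Bernoulli draws against the threshold p. A minimal Python sketch, with illustrative names (the patent gives no code), assuming, consistent with step A's definition of p, that q < p counts toward the first voiceprint sub-features:

```python
import random

def split_counts(K, p):
    """Steps B-D: compare a fresh q ~ U[0,1] against p, K times.
    Returns (A, B), the numbers of first and second voiceprint
    sub-features to select for the fused vector."""
    A = B = 0
    for _ in range(K):          # step D loops until k >= K
        if random.random() < p:
            A += 1              # one more first voiceprint sub-feature
        else:
            B += 1              # one more second voiceprint sub-feature
    return A, B

p = random.random()             # step A: p ~ U[0, 1]
A, B = split_counts(K=10, p=p)  # K = 10 is an illustrative total
```

By construction the expected split is E[A] = pK and E[B] = (1 - p)K, so p directly controls the share of GMM-UBM-derived sub-features in the fused vector.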
Optionally, simulating the sampling of a joint normal distribution with MCMC Gibbs sampling according to the fusion ratio of first and second voiceprint sub-features, to determine which first voiceprint sub-features and which second voiceprint sub-features are selected and to compose the fused voiceprint feature vector of the verification voice, includes the following steps (a code sketch follows this list):
Step E: set the transfer-count threshold to T and initialize the transfer count t = 0;
Step F: let M be the number of candidate first voiceprint sub-features collected for the fused voiceprint feature vector of the verification voice, and generate M random numbers in [0, 1] as the initial state
X(0) = [x1(0), x2(0), ..., xM(0)];
Step G: each time the transfer count t increases by 1, update every variable xi(t), i ∈ {1, 2, ..., M}, according to the conditional distribution derived from the joint probability distribution:
P(xi(t+1) | x1(t+1), x2(t+1), ..., xi-1(t+1), xi+1(t), ..., xM(t)),
where the mean of the joint probability distribution is X; check whether t < T; if so, return to step G; otherwise obtain
P(T) = [P(x1(T)), P(x2(T)), ..., P(xi(T)), ..., P(xM(T))];
Step H: according to the number A of first voiceprint sub-features computed in step D, select the A first voiceprint sub-features with the largest probabilities P(xi(T)) as the first voiceprint sub-features of the fused voiceprint feature vector of the verification voice;
Step I: set the transfer-count threshold to T and initialize the transfer count t = 0;
Step J: let N be the number of candidate second voiceprint sub-features collected for the fused voiceprint feature vector of the verification voice, and generate N random numbers in [0, 1] as the initial state
Y(0) = [y1(0), y2(0), ..., yN(0)];
Step K: each time the transfer count t increases by 1, update every variable yj(t), j ∈ {1, 2, ..., N}, according to the conditional distribution derived from the joint probability distribution:
P(yj(t+1) | y1(t+1), y2(t+1), ..., yj-1(t+1), yj+1(t), ..., yN(t)),
where the mean of the joint probability distribution is Y; check whether t < T; if so, return to step K; otherwise obtain
P(T) = [P(y1(T)), P(y2(T)), ..., P(yj(T)), ..., P(yN(T))];
Step L: according to the number B of second voiceprint sub-features computed in step D, select the B second voiceprint sub-features with the largest probabilities P(yj(T)) as the second voiceprint sub-features of the fused voiceprint feature vector of the verification voice.
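The patent leaves the covariance of the joint normal distribution unspecified, so the sketch below assumes a generic multivariate normal with a user-supplied covariance and uses each coordinate's conditional density at the final sweep as the selection score P(xi(T)). All sizes and parameter values are illustrative:

```python
import numpy as np

def gibbs_scores(mu, cov, T=200, rng=None):
    """Steps F-G: T Gibbs sweeps over a joint normal with mean mu,
    starting from a U[0,1] state; returns each coordinate's conditional
    density at the final sweep, standing in for P(xi(T))."""
    rng = np.random.default_rng() if rng is None else rng
    M = len(mu)
    x = rng.uniform(0.0, 1.0, size=M)                  # step F initial state
    dens = np.zeros(M)
    for _ in range(T):                                 # step G, t = 1..T
        for i in range(M):
            rest = [j for j in range(M) if j != i]
            s12 = cov[i, rest]
            s22i = np.linalg.inv(cov[np.ix_(rest, rest)])
            m = mu[i] + s12 @ s22i @ (x[rest] - mu[rest])   # conditional mean
            v = cov[i, i] - s12 @ s22i @ s12                # conditional var
            x[i] = rng.normal(m, np.sqrt(v))                # conditional draw
            dens[i] = np.exp(-(x[i] - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    return dens

# Step H: keep the A candidates with the largest scores. Steps I-L repeat
# the same procedure for the N second sub-features and the count B.
M, A = 8, 3                                  # illustrative sizes
mu = np.full(M, 0.5)                         # mean X, assumed constant here
cov = 0.05 + 0.2 * np.eye(M)                 # assumed covariance (unspecified)
chosen = np.argsort(gibbs_scores(mu, cov))[::-1][:A]
```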
In addition, to achieve the above object, the present invention also provides a voiceprint recognition apparatus comprising:
a data acquisition module for obtaining a verification voice to be identified;
a data processing module for extracting a first voiceprint feature of the verification voice using a GMM-UBM model and extracting a second voiceprint feature of the verification voice using a neural network model;
a data fusion module for fusing the first voiceprint feature and the second voiceprint feature of the verification voice to obtain the fused voiceprint feature vector of the verification voice;
a data comparison module for calculating the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of each registered user in a preset registration voiceprint database;
a data judgment module for determining the voiceprint recognition result of the verification voice based on the similarity.
Optionally, the data processing module is further configured to obtain the registration voice of a registering user, to extract a third voiceprint feature of the registration voice using the GMM-UBM model, and to extract a fourth voiceprint feature of the registration voice using the neural network model; the data fusion module is further configured to fuse the third voiceprint feature and the fourth voiceprint feature of the registration voice to obtain the fused voiceprint feature vector of the registration voice; and the voiceprint recognition apparatus further includes a data storage module for saving the fused voiceprint feature vector of the registration voice into the registration voiceprint database as the voiceprint feature vector of the registered user.
Optionally, the data processing module further includes:
a first preprocessing unit for pre-emphasizing, framing, and windowing the verification voice;
a first extraction unit for extracting, from the preprocessed verification voice, feature parameters comprising the pitch period, the linear prediction cepstral coefficients, the first-order difference of the linear prediction cepstral coefficients, the energy, the first-order difference of the energy, and the gammatone filter cepstral coefficients, to obtain the first voiceprint feature of the verification voice;
a second preprocessing unit for arranging the verification voice into a spectrogram of predetermined dimensionality;
a second extraction unit for identifying the spectrogram of predetermined dimensionality with the neural network to obtain the second voiceprint feature of the verification voice.
Optionally, the data fusion module includes:
a data fusion unit for fusing the first voiceprint feature dimensions and the second voiceprint feature dimensions using a Markov chain Monte Carlo stochastic model to obtain the fused voiceprint feature vector of the verification voice.
Optionally, the data fusion unit includes:
a setting subunit for setting the total number of features in the fused voiceprint feature vector of the verification voice to K;
a determining subunit for determining, given the total of K fused features, the fusion ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling;
a fusion subunit for simulating, according to that fusion ratio, the sampling of a joint normal distribution with MCMC Gibbs sampling, determining which first voiceprint sub-features and which second voiceprint sub-features are selected, and composing them into the fused voiceprint feature vector of the verification voice.
Optionally, the determining subunit is configured to perform:
Step A: generate a random number in [0, 1] as the parameter p, which represents the proportion of first voiceprint sub-features in the fused voiceprint feature vector of the verification voice;
Step B: initialize an iteration counter to k = 0;
Step C: generate a random number q in [0, 1] and compare it with p; if q < p, select one first voiceprint sub-feature and increment the count of first voiceprint sub-features by 1; if q > p, select one second voiceprint sub-feature and increment the count of second voiceprint sub-features by 1;
Step D: increment k by 1 and check whether k >= K; if so, record the numbers of first and second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification voice as A and B respectively, and end the sampling process; otherwise, return to step C.
Optionally, the fusion subunit is configured to perform:
Step E: set the transfer-count threshold to T and initialize the transfer count t = 0;
Step F: let M be the number of candidate first voiceprint sub-features collected for the fused voiceprint feature vector of the verification voice, and generate M random numbers in [0, 1] as the initial state X(0) = [x1(0), x2(0), ..., xM(0)];
Step G: each time the transfer count t increases by 1, update every variable xi(t), i ∈ {1, 2, ..., M}, according to the conditional distribution derived from the joint probability distribution, P(xi(t+1) | x1(t+1), ..., xi-1(t+1), xi+1(t), ..., xM(t)), where the mean of the joint probability distribution is X; check whether t < T; if so, return to step G; otherwise obtain P(T) = [P(x1(T)), P(x2(T)), ..., P(xM(T))];
Step H: according to the number A computed in step D, select the A first voiceprint sub-features with the largest probabilities P(xi(T)) as the first voiceprint sub-features of the fused voiceprint feature vector of the verification voice;
Step I: set the transfer-count threshold to T and initialize the transfer count t = 0;
Step J: let N be the number of candidate second voiceprint sub-features collected for the fused voiceprint feature vector of the verification voice, and generate N random numbers in [0, 1] as the initial state Y(0) = [y1(0), y2(0), ..., yN(0)];
Step K: each time the transfer count t increases by 1, update every variable yj(t), j ∈ {1, 2, ..., N}, according to the conditional distribution derived from the joint probability distribution, P(yj(t+1) | y1(t+1), ..., yj-1(t+1), yj+1(t), ..., yN(t)), where the mean of the joint probability distribution is Y; check whether t < T; if so, return to step K; otherwise obtain P(T) = [P(y1(T)), P(y2(T)), ..., P(yN(T))];
Step L: according to the number B computed in step D, select the B second voiceprint sub-features with the largest probabilities P(yj(T)) as the second voiceprint sub-features of the fused voiceprint feature vector of the verification voice.
In addition, to achieve the above object, the present invention also provides a voiceprint recognition device comprising a processor, a memory, and a voiceprint recognition program stored in the memory and executable by the processor, the voiceprint recognition program implementing the steps of the above voiceprint recognition method when executed by the processor.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium storing a voiceprint recognition program, the voiceprint recognition program implementing the steps of the above voiceprint recognition method when executed by a processor.
The present invention extracts a first voiceprint feature of the verification voice with a GMM-UBM model and a second voiceprint feature with a neural network model; fuses the first and second voiceprint features to obtain the fused voiceprint feature vector of the verification voice; calculates the similarity between the fused voiceprint feature vector and the voiceprint feature vector of each registered user in the preset voiceprint database; and determines the voiceprint recognition result of the verification voice based on the similarity. In this way, the GMM-UBM model and the neural network model are combined: each extracts features from the verification voice, and both sets of features are used for verification. Compared with extracting features with a single model, the features extracted by the two models carry more comprehensive information, so the verification voice and the registration voice can be compared more thoroughly, and the accuracy of voiceprint recognition is improved.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware structure of the voiceprint recognition device according to an embodiment of the present invention;
Fig. 2 is a flow diagram of an embodiment of the voiceprint recognition method of the present invention;
Fig. 3 is a flow diagram of another embodiment of the voiceprint recognition method of the present invention;
Fig. 4 is a detailed flow diagram of an embodiment of step S20 in Fig. 2;
Fig. 5 is a detailed flow diagram of another embodiment of step S20 in Fig. 2;
Fig. 6 is a flow diagram of an embodiment of step S30 in Fig. 2;
Fig. 7 is a functional block diagram of an embodiment of the voiceprint recognition apparatus of the present invention.
The objects, features, and advantages of the present invention are further described below with reference to the accompanying drawings and embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are only intended to explain the present invention, not to limit it.
Referring to Fig. 1, Fig. 1 is a schematic diagram of the hardware structure of the voiceprint recognition device according to an embodiment of the present invention.
In this embodiment, the voiceprint recognition device may include a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 realizes the connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a stable non-volatile memory such as a disk memory, and may optionally be a storage device independent of the processor 1001.
Those skilled in the art will understand that the hardware structure shown in Fig. 1 does not limit the voiceprint recognition device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
With continued reference to Fig. 1, the memory 1005 in Fig. 1, as a computer-readable storage medium, may include an operating system, a network communication module, and a data management program.
In Fig. 1, the network communication module is mainly used to connect to a server and exchange data with it, and the processor 1001 can call the data management program stored in the memory 1005 to execute the voiceprint recognition method provided by the embodiments of the present invention.
Based on the above voiceprint recognition device, embodiments of the voiceprint recognition method of the present invention are proposed.
Referring to Fig. 2, Fig. 2 is a flow diagram of an embodiment of the voiceprint recognition method of the present invention. In this embodiment, the voiceprint recognition method includes the following steps:
Step S10: obtaining a verification voice to be identified.
In this embodiment, the verification voice is the sound uttered by a user who has completed voice registration; if the user has not registered, the uttered sound is treated as invalid. The verification voice can be acquired in many ways: for example, the sound of a registered user may be picked up by a microphone, which sends it to the processing terminal for voiceprint recognition; or it may be captured by a smart terminal (a mobile phone, a tablet, and so on), which sends the acquired verification voice to the processing terminal of the voiceprint recognition device. Of course, the verification voice can also be acquired with other equipment, which is not enumerated here.
It is worth noting that, when obtaining the verification voice to be identified, the voice can also be screened to reject verification voices of poor quality. Specifically, the duration and the volume of the verification voice can be detected at acquisition time: if the duration of the verification voice is greater than or equal to a preset voice duration, the device reports that the verification voice was obtained successfully; if the duration is less than the preset voice duration, the device reports a failure. This guarantees the quality of the acquired verification voice and ensures that the features extracted from it are distinct and clear, which helps improve the accuracy of voiceprint recognition.
Step S20: extracting a first voiceprint feature of the verification voice using a GMM-UBM model, and extracting a second voiceprint feature of the verification voice using a neural network model.
In this embodiment, the GMM-UBM model (Gaussian mixture model-universal background model) and the neural network model both extract features from the verification voice. Since they are two different models, they may extract the same voiceprint features, different voiceprint features, or partially overlapping voiceprint features; no specific restriction is imposed here. Preferably, the two models extract different voiceprint features: for example, the first voiceprint feature extracted by the GMM-UBM model includes sub-features such as timbre, frequency, amplitude, and volume, while the second voiceprint feature extracted by the neural network model includes sub-features such as fundamental frequency, mel-frequency cepstral coefficients, formants, pitch, and reflection coefficients.
It should be noted that the GMM-UBM model and the neural network model may extract voiceprint features from the same sound segment of the verification voice, from different sound segments, or from partially overlapping sound segments; no specific restriction is imposed here.
Step S30: fusing the first voiceprint feature and the second voiceprint feature of the verification voice to obtain the fused voiceprint feature vector of the verification voice.
In this embodiment, the fused voiceprint feature vector of the verification voice is obtained by fusing the first and second voiceprint features, and there are many ways to fuse them: for example, the two features may be fused by superimposing them in full, or by superimposing only part of their sub-features. Of course, the first and second voiceprint features of the verification voice can also be fused in other ways, which are not enumerated here.
Step S40: calculating the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of each registered user in a preset registration voiceprint database.
In this embodiment, the voiceprint feature vector of a registered user is established by the voiceprint recognition device when the user registers his or her voice. Each user corresponds to one registered voiceprint feature vector, each of which is stored in the data storage module of the voiceprint recognition device; the voiceprint feature vectors of all registered users constitute the preset registration voiceprint database.
There are many ways to calculate the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of a registered user. For example, cosine similarity may be used, i.e., for a fused voiceprint feature vector a and a registered voiceprint feature vector b:
cos(a, b) = (a · b) / (|a| |b|)
The larger the computed value, the more similar the fused voiceprint feature vector and the registered user's voiceprint feature vector; the smaller the value, the less similar they are.
Of course, the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of a registered user can also be calculated with the Pearson correlation coefficient, Euclidean distance, Manhattan distance, and so on, which are not enumerated here.
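A minimal sketch of the cosine-similarity comparison against the registration database; `fused` and `enroll_db` are hypothetical names for the fused vector and a {user_id: voiceprint_vector} mapping, and the threshold value is illustrative (the patent only says "preset"):

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(a, b) = (a . b) / (|a| |b|): larger means more similar."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def best_match(fused, enroll_db, threshold=0.8):
    """Score the fused vector against every registered vector and accept
    the best match only if it clears the preset threshold (step S50)."""
    scores = {uid: cosine_similarity(fused, vec)
              for uid, vec in enroll_db.items()}
    uid = max(scores, key=scores.get)
    return (uid, scores[uid]) if scores[uid] >= threshold else (None, scores[uid])
```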
It is worth noting that a registration voiceprint database generally stores the voiceprint feature vectors of a large number of registered users, and voiceprint recognition requires comparing the fused voiceprint feature vector of the verification voice with each of them, which demands a large amount of computation from the voiceprint recognition device. In view of this, the voiceprint feature vectors of the registered users in the database can be associated with one another; specifically, the similarity between every two registered voiceprint feature vectors in the database can be pre-computed. Then, once the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of some registered user has been calculated, that similarity can be used to screen out the voiceprint feature vectors of other registered users whose similarity to that user's vector is low, thereby reducing the amount of computation required of the voiceprint recognition device.
Step S50: determining the voiceprint recognition result of the verification voice based on the similarity.
In this embodiment, the voiceprint recognition result of the verification voice is determined by comparing the similarity with a preset threshold: when the similarity between the fused voiceprint feature vector of the verification voice and the voiceprint feature vector of some registered user is equal to or greater than the preset threshold, voiceprint recognition succeeds; when the similarity to every registered user's voiceprint feature vector is below the preset threshold, voiceprint recognition fails.
It should be noted that if the voiceprint feature vectors of several registered users all exceed the preset threshold in similarity with the fused voiceprint feature vector of the verification voice, then among those vectors the one with the highest similarity to the fused voiceprint feature vector is taken as the match for the verification voice.
The present invention extracts a first voiceprint feature of the verification voice with a GMM-UBM model and a second voiceprint feature with a neural network model; fuses the first and second voiceprint features to obtain the fused voiceprint feature vector of the verification voice; calculates the similarity between the fused voiceprint feature vector and the voiceprint feature vector of each registered user in the preset voiceprint database; and determines the voiceprint recognition result of the verification voice based on the similarity. In this way, the GMM-UBM model and the neural network model are combined: each extracts features from the verification voice, and both sets of features are used for verification. Compared with extracting features with a single model, the features extracted by the two models carry more comprehensive information, so the verification voice and the registration voice can be compared more thoroughly, and the accuracy of voiceprint recognition is improved.
Referring to Fig. 3, Fig. 3 is a flow diagram of another embodiment of the voiceprint recognition method of the present invention. Based on the above embodiment, in this embodiment the method further includes the following steps before step S10:
Step S100: obtaining the registration voice of a registering user.
In this embodiment, the registration voice is the sound uttered by a user who needs to register his or her voice. The registration voice can be acquired in many ways: for example, the sound of an unregistered user may be picked up by a microphone, which sends it to the processing terminal for voiceprint recognition; or the user's registration voice may be captured by a smart terminal (a mobile phone, a tablet, and so on), which sends it to the processing terminal of the voiceprint recognition device. Of course, the registration voice can also be acquired with other equipment, which is not enumerated here.
It is worth noting that the voiceprint recognition system uses the registration voice as the verification standard for the user, so the quality of the registration voice directly affects the accuracy of voiceprint recognition. To improve accuracy, the registration voice can also be screened at acquisition time to reject recordings of poor quality. Specifically, the duration and the volume of the registration voice can be detected when it is obtained: if the duration is greater than or equal to a preset voice duration, the device reports that the registration voice was obtained successfully; otherwise it reports a failure. This guarantees the quality of the registration voice and ensures that the features extracted from it are distinct and clear, which helps improve the accuracy of voiceprint recognition.
Step S110: extracting a third voiceprint feature of the registration voice using the GMM-UBM model, and extracting a fourth voiceprint feature of the registration voice using the neural network model.
In this embodiment, the GMM-UBM model and the neural network model both extract features from the registration voice. Since they are two different models, they may extract the same voiceprint features, different voiceprint features, or partially overlapping voiceprint features; no specific restriction is imposed here. Preferably, the two models extract different voiceprint features: for example, the third voiceprint feature extracted by the GMM-UBM model includes sub-features such as timbre, frequency, amplitude, and volume, while the fourth voiceprint feature extracted by the neural network model includes sub-features such as fundamental frequency, mel-frequency cepstral coefficients, formants, pitch, and reflection coefficients.
It is worth noting that the two models may extract voiceprint features from the same sound segment of the registration voice, from different sound segments, or from partially overlapping sound segments; no specific restriction is imposed here.
It should be noted that the sub-features contained in the third voiceprint feature are the same as those contained in the first voiceprint feature, and the sub-features contained in the fourth voiceprint feature are the same as those contained in the second voiceprint feature.
Step S120: fusing the third voiceprint feature and the fourth voiceprint feature of the registration voice to obtain the fused voiceprint feature vector of the registration voice.
In this embodiment, the fused voiceprint feature vector of the registration voice is obtained by fusing the third and fourth voiceprint features, and there are many ways to fuse them: for example, by superimposing the two features in full, or by superimposing only part of their sub-features. Of course, the third and fourth voiceprint features of the registration voice can also be fused in other ways, which are not enumerated here.
Step S130: saving the fused voiceprint feature vector of the registration voice into the registration voiceprint database as the voiceprint feature vector of the registered user.
In this embodiment, the registration voiceprint database resides in the data storage module of the voiceprint recognition device, and the fused voiceprint feature vector of each registration voice is stored in it. When storing the vectors, the database may classify them: for example, classification by similarity, where the fused voiceprint feature vectors of several highly similar registration voices are stored in one subset and several such subsets constitute the registration voiceprint database; or classification by gender, where the fused voiceprint feature vectors of male and female registered users are stored separately. Of course, the fused feature vectors of registration voices can also be stored in other ways, which are not enumerated here.
Referring to Fig. 4, Fig. 4 is a detailed flow diagram of an embodiment of step S20 in Fig. 2. Based on the above embodiments, in this embodiment step S20 includes:
Step S210: pre-emphasizing, framing, and windowing the verification voice.
Pre-emphasis: the average power spectrum of a speech signal is shaped by glottal excitation and lip-nose radiation, so the high-frequency part rolls off at about 6 dB per octave above roughly 800 Hz. When computing the spectrum, the higher the frequency, the smaller the corresponding component and the harder the high-frequency spectrum is to obtain, which is why pre-emphasis is applied. Its purpose is to boost the high-frequency part so that the spectrum of the signal becomes flat across the whole band from low to high frequency, allowing the spectrum to be computed with the same signal-to-noise ratio. Pre-emphasis is generally applied after the speech signal has been digitized, using a first-order filter of the form H(z) = 1 - u*z^-1, where u is usually between 0.9 and 1.
Framing and windowing: because speech is short-time stationary, the signal is divided into frames and windowed after preprocessing so that short-time analysis techniques can be applied. Usually there are about 33 to 100 frames per second. Framing can use either contiguous segmentation or overlapping segmentation, but the latter makes the transition between frames smooth and preserves continuity. The overlapping part of one frame and the next is called the frame shift, and the ratio of frame shift to frame length is generally between 0 and 1/2. Framing intercepts the signal with a movable window of finite length; commonly used window functions include the rectangular window, the Hamming window, and the Hanning window.
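A short sketch of this preprocessing chain (pre-emphasis, overlapping framing, Hamming windowing), assuming the signal is at least one frame long; the parameter values (u = 0.97, 25 ms frames, 10 ms shift, i.e. a shift-to-frame ratio of 0.4) are illustrative choices within the ranges given above:

```python
import numpy as np

def preprocess(signal, fs, u=0.97, frame_ms=25, shift_ms=10):
    """Pre-emphasis H(z) = 1 - u*z^-1, overlapping framing, Hamming window."""
    emph = np.append(signal[0], signal[1:] - u * signal[:-1])  # pre-emphasis
    flen = int(fs * frame_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    n = 1 + (len(emph) - flen) // shift        # assumes len(signal) >= flen
    frames = np.stack([emph[i * shift: i * shift + flen] for i in range(n)])
    return frames * np.hamming(flen)           # window each frame
```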
After preprocessing, feature parameters are extracted from the speech signal. The choice of feature parameters should satisfy several principles: first, the parameters should be easy to extract from the speech signal; second, they should be hard to imitate; third, they should not vary over time and space, i.e., they should be relatively stable; fourth, they should effectively distinguish different speakers. At present, speaker recognition systems rely mainly on low-level acoustic features, which can be divided into temporal features and transform-domain features.
Step S220: extracting, from the preprocessed verification voice, feature parameters comprising the mel-frequency cepstral coefficients, the linear prediction cepstral coefficients, the first-order difference of the linear prediction cepstral coefficients, the energy, the first-order difference of the energy, and the gammatone filter cepstral coefficients, to obtain the first voiceprint feature of the verification voice.
The mel-frequency cepstral coefficients are extracted as follows (a code sketch follows these steps):
(1) Apply the short-time Fourier transform to the processed speech signal to obtain its spectrum, using the fast Fourier transform (FFT) on each frame. Each frame of the time-domain signal x(n) is first zero-padded to form a sequence of length N and then transformed by the FFT to obtain the linear spectrum X(k):
X(k) = sum_{n=0}^{N-1} x(n) e^{-j2πnk/N}, 0 <= k <= N-1
(2) Square the magnitude spectrum X(k) to obtain the energy spectrum, then smooth it with a mel-frequency filter bank and eliminate harmonics, yielding the corresponding mel spectrum. The mel filter bank, built according to the masking properties of hearing, consists of M triangular band-pass filters Hm(k) (0 <= m <= M) placed within the spectral range of speech, with center frequencies f(m) whose spacing widens as m increases. The transfer function of the m-th triangular band-pass filter is:
Hm(k) = 0 for k < f(m-1); (k - f(m-1)) / (f(m) - f(m-1)) for f(m-1) <= k <= f(m); (f(m+1) - k) / (f(m+1) - f(m)) for f(m) <= k <= f(m+1); 0 for k > f(m+1)
(3) Take the logarithm of the mel spectrum output by the filter bank to obtain the log spectrum S(m), which compresses the dynamic range of the speech spectrum and converts multiplicative noise in the frequency domain into an additive component:
S(m) = ln( sum_{k=0}^{N-1} |X(k)|^2 Hm(k) ), 0 <= m <= M
(4) Apply the discrete cosine transform to the log spectrum S(m) to obtain the mel-frequency cepstral coefficient (MFCC) parameters c(n):
c(n) = sum_{m=0}^{M-1} S(m) cos( πn(m + 1/2) / M ), n = 1, 2, ..., L
where L is the order of the MFCC parameters.
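Steps (1) through (4) map directly onto a few lines of NumPy/SciPy. A sketch under the usual assumptions (zero-padded FFT, triangular mel filters, type-II DCT); the FFT size, filter count, and order are illustrative:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, fs, n_fft=512, n_filt=26, L=12):
    """MFCC for one windowed frame, following steps (1)-(4)."""
    spec = np.abs(np.fft.rfft(frame, n_fft))        # (1) zero-padded FFT
    power = spec ** 2                               # (2) energy spectrum
    pts = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filt + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for m in range(1, n_filt + 1):                  # triangular filters Hm(k)
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    S = np.log(fbank @ power + 1e-10)               # (3) log mel spectrum
    return dct(S, type=2, norm='ortho')[:L]         # (4) DCT, first L MFCCs
```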
The short-time normalized energy feature parameter is extracted as follows:
(1) For a frame {Si(n), n = 1, 2, ..., N} of length N in the voice segment, compute the short-time log energy of the frame:
E(i) = ln( sum_{n=1}^{N} Si(n)^2 ), i = 1, 2, ..., L
where L is the number of frames in the voice segment.
(2) Since the energy differs considerably between different voice segments and different speech frames, the log energy must be normalized so that it can be computed as a vector together with the cepstral coefficients above. The normalization is performed with respect to Emax = max E(i), the maximum log energy in the voice segment.
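The exact normalization formula is not given in the text; the sketch below assumes the common choice of subtracting Emax in the log domain, so the normalized energy is non-positive and comparable across segments:

```python
import numpy as np

def normalized_log_energy(frames):
    """Short-time log energy per frame, normalized against the segment
    maximum Emax (subtraction in the log domain is the assumption here)."""
    E = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    return E - E.max()          # relative to Emax, so values are <= 0
```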
The LPCC feature parameters are extracted as follows (a code sketch follows these steps):
(1) Solve for the linear prediction (LPC) coefficients. In linear prediction analysis, the vocal tract is modeled as the all-pole model
H(z) = 1 / A(z) = 1 / (1 - sum_{k=1}^{p} ak z^{-k})
where p is the order of the LPC analysis, ak (k = 1, 2, ..., p) are the linear prediction coefficients, and A(z) is the inverse filter. LPC analysis solves for the coefficients ak; the present invention uses the recursive solution of the autocorrelation equations (the Durbin algorithm).
(2) Derive the cepstral coefficients (LPCC) from the LPC coefficients. The cepstrum of the preprocessed speech signal x(n) is defined as the inverse Z-transform of the logarithm of its Z-transform. Considering only the modulus of X(z) and ignoring its phase gives the cepstrum of the signal:
c(n) = Z^{-1}( log |X(z)| )
The LPCC is obtained not from the input signal x(n) but from the LPC coefficients an, using the recursion:
c1 = a1; cn = an + sum_{k=1}^{n-1} (k/n) ck a(n-k), 1 < n <= p
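A sketch of both steps: Levinson-Durbin on the frame's autocorrelation sequence for the LPC coefficients, then the cepstral recursion above. The autocorrelation method matches the Durbin algorithm the text names; orders are illustrative:

```python
import numpy as np

def lpc(frame, p=12):
    """LPC coefficients a_1..a_p by the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1: len(frame) + p]
    a = np.zeros(p + 1); a[0] = 1.0
    e = r[0]                                              # prediction error
    for i in range(1, p + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / e   # reflection coeff
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]               # update old a_j
        a[i] = k
        e *= 1.0 - k * k
    return -a[1:]     # a_k in the convention A(z) = 1 - sum_k a_k z^-k

def lpcc(a, q=None):
    """LPCC c_1..c_q via c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}."""
    p = len(a); q = q or p
    c = np.zeros(q + 1)
    for n in range(1, q + 1):
        c[n] = (a[n - 1] if n <= p else 0.0) + sum(
            (k / n) * c[k] * a[n - k - 1] for k in range(max(1, n - p), n))
    return c[1:]
```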
The dynamic feature coefficients, namely the first-order differences of the mel-frequency cepstral coefficients, of the linear prediction cepstral coefficients, and of the energy, are extracted as follows (a code sketch follows these steps):
The mel-frequency cepstral coefficients, linear prediction cepstral coefficients, and energy parameters described above characterize only the instantaneous information of the speech spectrum and are static parameters. Experiments show that the dynamic information of the speech spectrum also contains speaker-related information, which can be used to improve the recognition rate of a speaker recognition system.
(1) The dynamic information of the speech cepstrum characterizes how the feature parameters change over time. The change of the cepstrum over time can be expressed as
Δcm(n) = ( sum_{k=-K}^{K} k h(k) cm(n+k) ) / ( sum_{k=-K}^{K} k^2 h(k) )
where cm denotes the m-th order cepstral coefficient, n and k index the cepstral coefficients on the time axis, and h(k) (k = -K, -K+1, ..., K-1, K) is a window function of length 2K+1, usually symmetric. Δcm(n) is the first-order difference coefficient of the orthogonal polynomial fit, as shown above.
(2) In practice the window function is usually rectangular and K is usually 2, so the dynamic parameter is a linear combination of the parameters of the two frames before and the two frames after the current frame. The first-order dynamic parameters of the mel-frequency cepstral coefficients, the linear prediction cepstral coefficients, and the energy are obtained from the above formula.
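With a rectangular window and K = 2, the first-order dynamic parameters reduce to the standard delta regression. A sketch that works for any of the MFCC, LPCC, or energy trajectories (rows are frames, edges padded by repetition):

```python
import numpy as np

def delta(feat, K=2):
    """First-order dynamic parameters: regression over the K frames
    before and after the current frame; feat has shape (n_frames, n_coeffs)."""
    pad = np.pad(feat, ((K, K), (0, 0)), mode='edge')
    num = sum(k * (pad[K + k: K + k + len(feat)] - pad[K - k: K - k + len(feat)])
              for k in range(1, K + 1))
    return num / (2.0 * sum(k * k for k in range(1, K + 1)))
```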
The specific steps of Gammatone filter cepstrum coefficient characteristic parameter extraction are as follows:
(1) Apply a short-time Fourier transform to the pre-processed voice signal to obtain its spectrum. Here the fast Fourier transform (FFT) is used to perform the discrete Fourier transform of each frame of the voice signal: each frame of the time-domain signal x(n) is first padded with several zeros to form a sequence of length N, the fast Fourier transform is then applied to it, and the linear spectrum X(k) is finally obtained. The conversion formula between X(k) and x(n) is
X(k) = Σ_{n=0}^{N−1} x(n) e^(−j2πnk/N), 0 ≤ k ≤ N−1.
(2) Construct the Gammatone filter bank. The Gammatone filter is a standard cochlear auditory filter whose time-domain impulse response is
g(t) = A t^(n−1) e^(−2πb_t t) cos(2πf_i t + φ_i) U(t), t ≥ 0, 1 ≤ i ≤ N,
where A is the filter gain, f_i is the center frequency of the filter, U(t) is the unit step function and φ_i is the phase; to simplify the model, φ_i is set to 0. n is the order of the filter, and experiments show that with n = 4 the filter characteristics of the human ear cochlea are simulated well.
b_t is the decay factor of the filter; it determines the decay rate of the impulse response and is related to the bandwidth of the filter, b_t = 1.019 ERB(f_i), where in psychoacoustics
ERB(f_i) = 24.7 × (4.37 f_i / 1000 + 1).
In the formula, N is the number of filters. The center frequencies of the filter bank are equally spaced on the ERB domain, and the frequency coverage of the entire filter bank is 80 Hz–8000 Hz; each center frequency is computed from the filter cutoff frequency f_H and the filter overlap factor v_i, which specifies the overlap percentage between adjacent filters. After the center frequency of each filter is determined, the corresponding bandwidth can be obtained from the above formula.
(3) Filter with the Gammatone filter bank. The linear spectrum X(k) obtained in step (1) is squared to obtain the energy spectrum, which is then filtered with the Gammatone filter bank G_m(k) to obtain the log spectrum
S(m) = ln( Σ_{k=0}^{N−1} |X(k)|² G_m(k) ), 0 ≤ m < M,
which compresses the dynamic range of the speech spectrum and converts the multiplicative noise in the frequency domain into additive components.
(4) A discrete cosine transform is applied to the log spectrum S(m) to obtain the Gammatone filter cepstrum coefficient characteristic parameters G(n), in the standard form
G(n) = Σ_{m=0}^{M−1} S(m) cos(πn(m + 0.5)/M), n = 1, 2, ….
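A compact sketch of steps (1)–(4); the ERB-rate spacing and the closed-form magnitude response are standard approximations, and the filter count, FFT size and sampling rate are illustrative assumptions (the overlap-factor formula for the center frequencies is not reproduced in the extracted text):

```python
import numpy as np

def erb(f):
    """Psychoacoustic equivalent rectangular bandwidth, ERB(f) = 24.7*(4.37f/1000 + 1)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_bank(M, nfft, fs, f_lo=80.0, f_hi=8000.0, order=4):
    """Magnitude responses G_m(k) of M filters with center frequencies equally
    spaced on the ERB-rate scale over 80 Hz - 8000 Hz (step (2))."""
    erb_rate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    erb_rate_inv = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    centers = erb_rate_inv(np.linspace(erb_rate(f_lo), erb_rate(f_hi), M))
    freqs = np.arange(nfft // 2 + 1) * fs / nfft
    bank = np.empty((M, freqs.size))
    for m, fc in enumerate(centers):
        b = 1.019 * erb(fc)                      # decay factor b_t = 1.019 * ERB(f_i)
        bank[m] = (1.0 + ((freqs - fc) / b) ** 2) ** (-order / 2.0)
    return bank

def gtcc(frame, bank, nfft, n_ceps=13):
    """Steps (1), (3), (4): FFT -> energy spectrum -> filter bank -> log -> DCT."""
    power = np.abs(np.fft.rfft(frame, n=nfft)) ** 2   # |X(k)|^2, zero-padded to nfft
    S = np.log(bank @ power + 1e-12)                  # log spectrum S(m)
    M, m = len(S), np.arange(len(S))
    return np.array([np.sum(S * np.cos(np.pi * n * (m + 0.5) / M))
                     for n in range(1, n_ceps + 1)])

bank = gammatone_bank(M=32, nfft=512, fs=16000)
g = gtcc(np.random.randn(400), bank, nfft=512)
```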
Referring to Fig. 5, Fig. 5 is a detailed flowchart of another embodiment of step S20 in Fig. 2. In this embodiment, step S20 includes:
Step S210': arranging the verification speech into a spectrogram of a predetermined number of dimensions;
Specifically, a feature vector of predetermined dimensions may be extracted from the verification speech at every predetermined time interval, so as to arrange the verification speech into a spectrogram of the predetermined number of dimensions.
The predetermined number of dimensions, the predetermined dimensions and the predetermined time interval can be set as needed and/or according to system performance during implementation; this embodiment places no limit on their sizes.
Step S220': recognizing the spectrogram of the predetermined number of dimensions by the neural network to obtain the second voiceprint feature of the verification speech.
The verification speech is arranged into a spectrogram of a predetermined number of dimensions, and the spectrogram of the predetermined number of dimensions is then recognized by the neural network model to obtain the second voiceprint feature of the verification speech. In this way the second voiceprint feature extracted from the verification speech by the neural network model can better characterize the acoustic features in the speech and improve the accuracy of speech recognition.
It is worth noting that the extraction of the first voiceprint feature and the extraction of the second voiceprint feature from the verification speech do not interfere with each other; that is, steps S210 and S220 are carried out independently of steps S210' and S220', and there is no fixed order between steps S210, S220 and steps S210', S220'.
Further, in an embodiment of the voiceprint recognition method of the present invention, step S30 specifically includes:
fusing the first voiceprint feature dimensions and the second voiceprint feature dimensions using a Markov chain Monte Carlo stochastic model to obtain the fused voiceprint feature vector of the verification speech.
In this embodiment, the Markov chain Monte Carlo stochastic model randomly obtains a number of features from the first voiceprint feature and a number of features from the second voiceprint feature, and then fuses the features obtained from the first voiceprint feature with the features obtained from the second voiceprint feature into the fused voiceprint feature vector of the verification speech.
For example, the Markov chain Monte Carlo stochastic model randomly extracts 10 features from the 15 features of the first voiceprint feature and 15 features from the 20 features of the second voiceprint feature; after fusion, 25 voiceprint features are obtained as the fused voiceprint feature vector of the verification speech.
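The worked example above can be mimicked with a uniform random selection standing in for the MCMC-driven choice (which the following steps make precise); the seeding and feature values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
f1 = rng.normal(size=15)          # first voiceprint feature (15 components)
f2 = rng.normal(size=20)          # second voiceprint feature (20 components)

idx1 = rng.choice(15, size=10, replace=False)   # 10 of 15 first-feature components
idx2 = rng.choice(20, size=15, replace=False)   # 15 of 20 second-feature components
fused = np.concatenate([f1[idx1], f2[idx2]])    # 25-dimensional fused vector
assert fused.shape == (25,)
```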
Referring to Fig. 6, Fig. 6 is a detailed flowchart of an embodiment of step S30 in Fig. 2. In this embodiment, the first voiceprint feature includes a plurality of first voiceprint sub-features, and the second voiceprint feature includes a plurality of second voiceprint sub-features.
Based on the above embodiment, in this embodiment, step S30 includes:
Step S310: setting the total number of features of the fused voiceprint feature vector of the verification speech to K;
Step S320: determining, according to the total number K of features of the fused voiceprint feature vector of the verification speech, the mixing ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling;
Step S330: according to the mixing ratio of first voiceprint sub-features to second voiceprint sub-features, using Gibbs sampling of the MCMC to simulate the sampling process of a joint normal distribution, determining respectively the first voiceprint sub-features chosen from the first voiceprint feature and the second voiceprint sub-features chosen from the second voiceprint feature, and forming the fused voiceprint feature vector of the verification speech.
Further, step S320 specifically includes:
Step A: generating a random number in [0, 1] as the parameter p, where p represents the proportion of the first voiceprint sub-features in the fused voiceprint feature of the verification speech;
Step B: initializing the counter that records the number of iterations to k = 0;
Step C: generating a random number q in [0, 1] and comparing it with the parameter p; when q < p, one second voiceprint sub-feature is chosen and the count of second voiceprint sub-features is increased by 1; when q > p, one first voiceprint sub-feature is chosen and the count of first voiceprint sub-features is increased by 1;
Step D: increasing k by 1 and judging whether k ≤ K; if so, returning to step C; otherwise recording the numbers of first voiceprint sub-features and second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech as A and B respectively, and ending the sampling process.
Suppose the total number of dimensions of the fused voiceprint feature vector of the verification speech is set to K = 8 and the randomly generated parameter is p = 0.4; after 8 iterations of the above process, the number of first voiceprint sub-features to be selected is A = 3 and the number of second voiceprint sub-features is B = 5. The subsequent feature selection will then choose 3 first voiceprint sub-features and 5 second voiceprint sub-features.
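A minimal sketch of steps A–D above, with the branch directions exactly as stated; the seed and the option of fixing p are illustrative assumptions:

```python
import random

def split_counts(K, p=None, seed=None):
    """Steps A-D: decide how many sub-features to take from each voiceprint feature."""
    rnd = random.Random(seed)
    p = rnd.random() if p is None else p   # step A: p drawn uniformly from [0, 1]
    A = B = 0                              # step B: iteration counter starts at k = 0
    for _ in range(K):                     # steps C-D: loop K times
        if rnd.random() < p:               # q < p: one more second voiceprint sub-feature
            B += 1
        else:                              # q > p: one more first voiceprint sub-feature
            A += 1
    return A, B

A, B = split_counts(K=8, p=0.4, seed=7)    # one run of the K = 8, p = 0.4 example above
```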
Further, step S330 specifically includes:
Step E: setting the transfer count threshold to T and initializing the transfer count t = 0;
Step F: counting the number of collected features for the fused voiceprint feature vector of the verification speech, recording it as M, and generating M random numbers in [0, 1] as the initial state
X(0) = [x_1(0), x_2(0), …, x_M(0)];
Step G: each time the transfer count t increases by 1, computing, for each variable x_i(t), i ∈ {1, 2, …, M}, the conditional probability distribution derived from the joint probability distribution
P(x_i(t+1) | x_1(t+1), x_2(t+1), …, x_{i−1}(t+1), x_{i+1}(t), …, x_M(t)),
where the mean of the joint probability distribution is X. Judge whether t < T; if so, return to step G; otherwise obtain
P(T) = [P(x_1(T)), P(x_2(T)), …, P(x_i(T)), …, P(x_M(T))];
Step H: according to the number A of first voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech computed in step D, choosing the A first voiceprint sub-features with the largest corresponding probabilities Px_i(T) as the first voiceprint sub-features of the fused voiceprint feature vector of the verification speech;
Step I: setting the transfer count threshold to T and initializing the transfer count t = 0;
Step J: counting the number of collected features for the fused voiceprint feature vector of the verification speech, recording it as N, and generating N random numbers in [0, 1] as the initial state
Y(0) = [y_1(0), y_2(0), …, y_N(0)];
Step K: each time the transfer count t increases by 1, computing, for each variable y_j(t), j ∈ {1, 2, …, N}, the conditional probability distribution derived from the joint probability distribution
P(y_j(t+1) | y_1(t+1), y_2(t+1), …, y_{j−1}(t+1), y_{j+1}(t), …, y_N(t)),
where the mean of the joint probability distribution is Y. Judge whether t < T; if so, perform step K again; otherwise obtain
P(T) = [P(y_1(T)), P(y_2(T)), …, P(y_j(T)), …, P(y_N(T))];
Step L: according to the number B of second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech computed in step D, choosing the B second voiceprint sub-features with the largest corresponding probabilities Py_j(T) as the second voiceprint sub-features of the fused voiceprint feature vector of the verification speech.
Suppose there are 5 second voiceprint sub-features in total in the fused voiceprint feature vector of the verification speech obtained in the previous step, and in this embodiment X(0) = [0.2, 0.3, 0.4, 0.5, 0.6] as computed in step D. When t = 0, Px_1(1), Px_2(1), Px_3(1), Px_4(1) and Px_5(1) are obtained in turn from the conditional distribution P(x_i(t+1) | x_1(t+1), …, x_{i−1}(t+1), x_{i+1}(t), …, x_M(t)); suppose the computation gives Px_i(1) = [0.5, 0.6, 0.2, 0.8, 0.1]. The cycle continues until the predetermined transfer count is reached, T = 50 in this embodiment, and Px_i(50) is computed; suppose Px_i(50) = [0.6, 0.2, 0.5, 0.8, 0.9]. The two features with the largest corresponding probabilities are then chosen and added to the fused voiceprint feature vector of the verification speech.
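The patent fixes only the mean of the joint distribution, so the sketch below fills in the remaining details with explicit assumptions (an equicorrelated joint normal with correlation rho, and a standard-normal CDF as the probability score, with SciPy assumed available); it illustrates the shape of steps E–H rather than reproducing the exact computation:

```python
import numpy as np
from scipy.stats import norm

def gibbs_scores(M, T=50, rho=0.3, seed=0):
    """Steps E-G: from a random initial state X(0) in [0,1]^M, run T Gibbs sweeps,
    updating each x_i from its conditional given the other coordinates."""
    rng = np.random.default_rng(seed)
    x = rng.random(M)                            # step F: random initial state
    w = rho / (1.0 + (M - 2) * rho)              # conditional weight (equicorrelated normal)
    sd = np.sqrt((1.0 - rho) * (1.0 + (M - 1) * rho) / (1.0 + (M - 2) * rho))
    for _ in range(T):                           # step G: T transfer steps
        mu = x.mean()                            # mean of the joint distribution taken as X-bar
        for i in range(M):
            cond_mu = mu + w * (x.sum() - x[i] - (M - 1) * mu)
            x[i] = rng.normal(cond_mu, sd)
    return norm.cdf(x)                           # P(x_i(T)) used to rank the coordinates

scores = gibbs_scores(M=5)                       # M = 5 as in the worked example
top_A = np.argsort(scores)[::-1][:3]             # step H: keep the A = 3 highest scores
```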
In addition, the present invention also provides a voiceprint recognition device.
Referring to Fig. 7, Fig. 7 is a functional block diagram of an embodiment of the voiceprint recognition device of the present invention.
In this embodiment, the voiceprint recognition device includes:
a data acquisition module 10 for obtaining the verification speech to be identified;
a data processing module 20 for extracting the first voiceprint feature of the verification speech using a GMM-UBM model and extracting the second voiceprint feature of the verification speech using a neural network model;
a data fusion module 30 for performing feature fusion on the first voiceprint feature and the second voiceprint feature of the verification speech to obtain the fused voiceprint feature vector of the verification speech;
a data comparison module 40 for calculating the similarity between the fused voiceprint feature vector of the verification speech and the voiceprint feature vector of each registered user in a preset registration voiceprint database; and
a data judgment module 50 for determining the voiceprint recognition result of the verification speech based on the similarity.
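For the data comparison module, a cosine similarity is one common choice of metric (an assumption here; this passage does not fix the measure):

```python
import numpy as np

def cosine_similarity(fused, enrolled):
    """Similarity between the fused voiceprint vector of the verification speech
    and the stored voiceprint vector of one registered user."""
    return float(np.dot(fused, enrolled) /
                 (np.linalg.norm(fused) * np.linalg.norm(enrolled) + 1e-12))

# The data judgment module could then accept the best-scoring registered user
# if the similarity exceeds a preset threshold (threshold value illustrative).
```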
Further, the data acquisition module 10 is also used to obtain the registration speech of a registering user; the data processing module 20 is also used to extract the third voiceprint feature of the registration speech using the GMM-UBM model and to extract the fourth voiceprint feature of the registration speech using the neural network model; and the data fusion module 30 is also used to perform feature fusion on the third voiceprint feature and the fourth voiceprint feature of the registration speech to obtain the fused voiceprint feature vector of the registration speech.
The voiceprint recognition device further includes a data storage module 60 for saving the fused voiceprint feature vector of the registration speech into the registration voiceprint database as the voiceprint feature vector of the registered user.
Further, the data processing module 20 further includes:
a first pre-processing unit 201 for performing pre-emphasis, framing and windowing pre-processing on the verification speech;
a first extraction unit 202 for extracting, from the pre-processed verification speech, the characteristic parameters of the pitch period, the linear prediction cepstrum coefficients, the first-order difference of the linear prediction cepstrum coefficients, the energy, the first-order difference of the energy and the Gammatone filter cepstrum coefficients, to obtain the first voiceprint feature of the verification speech;
a second pre-processing unit 203 for arranging the verification speech into a spectrogram of a predetermined number of dimensions; and
a second extraction unit 204 for recognizing the spectrogram of the predetermined number of dimensions by the neural network to obtain the second voiceprint feature of the verification speech.
Further, the data fusion module 30 includes:
a data fusion unit 301 for fusing the first voiceprint feature dimensions and the second voiceprint feature dimensions using the Markov chain Monte Carlo stochastic model to obtain the fused voiceprint feature vector of the verification speech.
Further, the data fusion unit 301 includes:
a setting subunit 3011 for setting the total number of features of the fused voiceprint feature vector of the verification speech to K;
a determining subunit 3012 for determining, according to the total number K of features of the fused voiceprint feature vector of the verification speech, the mixing ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling; and
a fusion subunit 3013 for, according to the mixing ratio of first voiceprint sub-features to second voiceprint sub-features, using Gibbs sampling of the MCMC to simulate the sampling process of a joint normal distribution, determining respectively the first voiceprint sub-features chosen from the first voiceprint feature and the second voiceprint sub-features chosen from the second voiceprint feature, and forming the fused voiceprint feature vector of the verification speech.
Further, the determining subunit 3012 is used to perform:
Step A: generating a random number in [0, 1] as the parameter p, where p represents the proportion of the first voiceprint sub-features in the fused voiceprint feature of the verification speech;
Step B: initializing the counter that records the number of iterations to k = 0;
Step C: generating a random number q in [0, 1] and comparing it with the parameter p; when q < p, choosing one second voiceprint sub-feature and increasing the count of second voiceprint sub-features by 1; when q > p, choosing one first voiceprint sub-feature and increasing the count of first voiceprint sub-features by 1;
Step D: increasing k by 1 and judging whether k ≤ K; if so, returning to step C; otherwise recording the numbers of first voiceprint sub-features and second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech as A and B respectively, and ending the sampling process.
Further, the fusion subunit 3013 is used to perform:
Step E: setting the transfer count threshold to T and initializing the transfer count t = 0;
Step F: counting the number of collected features for the fused voiceprint feature vector of the verification speech, recording it as M, and generating M random numbers in [0, 1] as the initial state
X(0) = [x_1(0), x_2(0), …, x_M(0)];
Step G: each time the transfer count t increases by 1, computing, for each variable x_i(t), i ∈ {1, 2, …, M}, the conditional probability distribution derived from the joint probability distribution
P(x_i(t+1) | x_1(t+1), x_2(t+1), …, x_{i−1}(t+1), x_{i+1}(t), …, x_M(t)),
where the mean of the joint probability distribution is X. Judge whether t < T; if so, return to step G; otherwise obtain
P(T) = [P(x_1(T)), P(x_2(T)), …, P(x_i(T)), …, P(x_M(T))];
Step H: according to the number A of first voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech computed in step D, choosing the A first voiceprint sub-features with the largest corresponding probabilities Px_i(T) as the first voiceprint sub-features of the fused voiceprint feature vector of the verification speech;
Step I: setting the transfer count threshold to T and initializing the transfer count t = 0;
Step J: counting the number of collected features for the fused voiceprint feature vector of the verification speech, recording it as N, and generating N random numbers in [0, 1] as the initial state
Y(0) = [y_1(0), y_2(0), …, y_N(0)];
Step K: each time the transfer count t increases by 1, computing, for each variable y_j(t), j ∈ {1, 2, …, N}, the conditional probability distribution derived from the joint probability distribution
P(y_j(t+1) | y_1(t+1), y_2(t+1), …, y_{j−1}(t+1), y_{j+1}(t), …, y_N(t)),
where the mean of the joint probability distribution is Y. Judge whether t < T; if so, perform step K again; otherwise obtain
P(T) = [P(y_1(T)), P(y_2(T)), …, P(y_j(T)), …, P(y_N(T))];
Step L: according to the number B of second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech computed in step D, choosing the B second voiceprint sub-features with the largest corresponding probabilities Py_j(T) as the second voiceprint sub-features of the fused voiceprint feature vector of the verification speech.
In addition, the embodiment of the present invention also provides a kind of computer readable storage medium.
Application on Voiceprint Recognition program is stored on computer readable storage medium of the present invention, wherein the Application on Voiceprint Recognition program is located
When managing device execution, realize such as the step of above-mentioned method for recognizing sound-groove.
Wherein, Application on Voiceprint Recognition program, which is performed realized method, can refer to each reality of method for recognizing sound-groove of the present invention
Example is applied, details are not described herein again.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the software product is stored in a storage medium (such as ROM/RAM) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the invention is not limited to the specific embodiments described, which are merely illustrative rather than restrictive. Inspired by the present invention, those skilled in the art can devise many other forms without departing from the scope protected by the purpose and the claims of the present invention; all equivalent structures or equivalent process transformations made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, fall within the protection of the present invention.
Claims (10)
1. A voiceprint recognition method, characterized in that the voiceprint recognition method comprises the following steps:
obtaining verification speech to be identified;
extracting a first voiceprint feature of the verification speech using a GMM-UBM model, and extracting a second voiceprint feature of the verification speech using a neural network model;
performing feature fusion on the first voiceprint feature and the second voiceprint feature of the verification speech to obtain a fused voiceprint feature vector of the verification speech;
calculating the similarity between the fused voiceprint feature vector of the verification speech and the voiceprint feature vector of each registered user in a preset registration voiceprint database; and
determining the voiceprint recognition result of the verification speech based on the similarity.
2. The voiceprint recognition method according to claim 1, characterized in that, before the obtaining of the verification speech to be identified, the method further comprises:
obtaining the registration speech of a registering user;
extracting a third voiceprint feature of the registration speech using the GMM-UBM model, and extracting a fourth voiceprint feature of the registration speech using the neural network model;
performing feature fusion on the third voiceprint feature and the fourth voiceprint feature of the registration speech to obtain a fused voiceprint feature vector of the registration speech; and
saving the fused voiceprint feature vector of the registration speech into the registration voiceprint database as the voiceprint feature vector of the registered user.
3. The voiceprint recognition method according to claim 1, characterized in that the extracting of the first voiceprint feature of the verification speech using the GMM-UBM model comprises:
performing pre-emphasis, framing and windowing pre-processing on the verification speech; and
extracting, from the pre-processed verification speech, the characteristic parameters of the pitch period, the linear prediction cepstrum coefficients, the first-order difference of the linear prediction cepstrum coefficients, the energy, the first-order difference of the energy and the Gammatone filter cepstrum coefficients, to obtain the first voiceprint feature of the verification speech;
and the extracting of the second voiceprint feature of the verification speech using the neural network model comprises:
arranging the verification speech into a spectrogram of a predetermined number of dimensions; and
recognizing the spectrogram of the predetermined number of dimensions by the neural network to obtain the second voiceprint feature of the verification speech.
4. The voiceprint recognition method according to claim 1, characterized in that the performing of feature fusion on the first voiceprint feature and the second voiceprint feature of the verification speech to obtain the fused voiceprint feature vector of the verification speech comprises:
fusing the first voiceprint feature dimensions and the second voiceprint feature dimensions using a Markov chain Monte Carlo stochastic model to obtain the fused voiceprint feature vector of the verification speech.
5. The voiceprint recognition method according to claim 4, characterized in that the first voiceprint feature includes a plurality of first voiceprint sub-features and the second voiceprint feature includes a plurality of second voiceprint sub-features;
and the fusing of the first voiceprint feature dimensions and the second voiceprint feature dimensions using the Markov chain Monte Carlo stochastic model to obtain the fused voiceprint feature vector of the verification speech comprises:
setting the total number of features of the fused voiceprint feature vector of the verification speech to K;
determining, according to the total number K of features of the fused voiceprint feature vector of the verification speech, the mixing ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling; and
according to the mixing ratio of first voiceprint sub-features to second voiceprint sub-features, using Gibbs sampling of the MCMC to simulate the sampling process of a joint normal distribution, determining respectively the first voiceprint sub-features chosen from the first voiceprint feature and the second voiceprint sub-features chosen from the second voiceprint feature, and forming the fused voiceprint feature vector of the verification speech.
6. The voiceprint recognition method according to claim 5, characterized in that the determining, according to the total number K of features of the fused voiceprint feature vector of the verification speech, of the mixing ratio of first voiceprint sub-features to second voiceprint sub-features by direct sampling comprises:
Step A: generating a random number in [0, 1] as the parameter p, where p represents the proportion of the first voiceprint sub-features in the fused voiceprint feature of the verification speech;
Step B: initializing the counter that records the number of iterations to k = 0;
Step C: generating a random number q in [0, 1] and comparing it with the parameter p; when q < p, choosing one second voiceprint sub-feature and increasing the count of second voiceprint sub-features by 1; when q > p, choosing one first voiceprint sub-feature and increasing the count of first voiceprint sub-features by 1; and
Step D: increasing k by 1 and judging whether k ≤ K; if so, returning to step C; otherwise recording the numbers of first voiceprint sub-features and second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech as A and B respectively, and ending the sampling process.
7. The voiceprint recognition method according to claim 6, characterized in that the step of, according to the mixing ratio of first voiceprint sub-features to second voiceprint sub-features, using Gibbs sampling of the MCMC to simulate the sampling process of a joint normal distribution, determining respectively the first voiceprint sub-features chosen from the first voiceprint feature and the second voiceprint sub-features chosen from the second voiceprint feature, and forming the fused voiceprint feature vector of the verification speech comprises:
Step E: setting the transfer count threshold to T and initializing the transfer count t = 0;
Step F: counting the number of collected features for the fused voiceprint feature vector of the verification speech, recording it as M, and generating M random numbers in [0, 1] as the initial state
X(0) = [x_1(0), x_2(0), …, x_M(0)];
Step G: each time the transfer count t increases by 1, computing, for each variable x_i(t), i ∈ {1, 2, …, M}, the conditional probability distribution derived from the joint probability distribution
P(x_i(t+1) | x_1(t+1), x_2(t+1), …, x_{i−1}(t+1), x_{i+1}(t), …, x_M(t)),
where the mean of the joint probability distribution is X; judging whether t < T; if so, returning to step G; otherwise obtaining
P(T) = [P(x_1(T)), P(x_2(T)), …, P(x_i(T)), …, P(x_M(T))];
Step H: according to the number A of first voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech computed in step D, choosing the A first voiceprint sub-features with the largest corresponding probabilities Px_i(T) as the first voiceprint sub-features of the fused voiceprint feature vector of the verification speech;
Step I: setting the transfer count threshold to T and initializing the transfer count t = 0;
Step J: counting the number of collected features for the fused voiceprint feature vector of the verification speech, recording it as N, and generating N random numbers in [0, 1] as the initial state
Y(0) = [y_1(0), y_2(0), …, y_N(0)];
Step K: each time the transfer count t increases by 1, computing, for each variable y_j(t), j ∈ {1, 2, …, N}, the conditional probability distribution derived from the joint probability distribution
P(y_j(t+1) | y_1(t+1), y_2(t+1), …, y_{j−1}(t+1), y_{j+1}(t), …, y_N(t)),
where the mean of the joint probability distribution is Y; judging whether t < T; if so, performing step K again; otherwise obtaining
P(T) = [P(y_1(T)), P(y_2(T)), …, P(y_j(T)), …, P(y_N(T))]; and
Step L: according to the number B of second voiceprint sub-features to be selected into the fused voiceprint feature vector of the verification speech computed in step D, choosing the B second voiceprint sub-features with the largest corresponding probabilities Py_j(T) as the second voiceprint sub-features of the fused voiceprint feature vector of the verification speech.
8. A voiceprint recognition device, characterized in that the voiceprint recognition device comprises:
a data acquisition module for obtaining verification speech to be identified;
a data processing module for extracting a first voiceprint feature of the verification speech using a GMM-UBM model and extracting a second voiceprint feature of the verification speech using a neural network model;
a data fusion module for performing feature fusion on the first voiceprint feature and the second voiceprint feature of the verification speech to obtain a fused voiceprint feature vector of the verification speech;
a data comparison module for calculating the similarity between the fused voiceprint feature vector of the verification speech and the voiceprint feature vector of each registered user in a preset registration voiceprint database; and
a data judgment module for determining the voiceprint recognition result of the verification speech based on the similarity.
9. A voiceprint recognition apparatus, characterized in that the voiceprint recognition apparatus comprises a processor, a memory, and a voiceprint recognition program stored on the memory and executable by the processor, wherein the steps of the voiceprint recognition method according to any one of claims 1 to 7 are implemented when the voiceprint recognition program is executed by the processor.
10. A computer-readable storage medium, characterized in that a voiceprint recognition program is stored on the computer-readable storage medium, and the steps of the voiceprint recognition method according to any one of claims 1 to 7 are implemented when the voiceprint recognition program is executed by a processor.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910182453.3A CN110047490A (en) | 2019-03-12 | 2019-03-12 | Method for recognizing sound-groove, device, equipment and computer readable storage medium |
PCT/CN2019/118656 WO2020181824A1 (en) | 2019-03-12 | 2019-11-15 | Voiceprint recognition method, apparatus and device, and computer-readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910182453.3A CN110047490A (en) | 2019-03-12 | 2019-03-12 | Method for recognizing sound-groove, device, equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110047490A true CN110047490A (en) | 2019-07-23 |
Family
ID=67274752
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910182453.3A Pending CN110047490A (en) | 2019-03-12 | 2019-03-12 | Method for recognizing sound-groove, device, equipment and computer readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110047490A (en) |
WO (1) | WO2020181824A1 (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104900235B (en) * | 2015-05-25 | 2019-05-28 | 重庆大学 | Method for recognizing sound-groove based on pitch period composite character parameter |
US10008209B1 (en) * | 2015-09-25 | 2018-06-26 | Educational Testing Service | Computer-implemented systems and methods for speaker recognition using a neural network |
CN109147797B (en) * | 2018-10-18 | 2024-05-07 | 平安科技(深圳)有限公司 | Customer service method, device, computer equipment and storage medium based on voiceprint recognition |
CN110047490A (en) * | 2019-03-12 | 2019-07-23 | 平安科技(深圳)有限公司 | Method for recognizing sound-groove, device, equipment and computer readable storage medium |
- 2019
- 2019-03-12 CN CN201910182453.3A patent/CN110047490A/en active Pending
- 2019-11-15 WO PCT/CN2019/118656 patent/WO2020181824A1/en active Application Filing
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1567431A (en) * | 2003-07-10 | 2005-01-19 | 上海优浪信息科技有限公司 | Method and system for identifying status of speaker |
US20080010065A1 (en) * | 2006-06-05 | 2008-01-10 | Harry Bratt | Method and apparatus for speaker recognition |
CN103440873A (en) * | 2013-08-27 | 2013-12-11 | 大连理工大学 | Music recommendation method based on similarities |
CN103745002A (en) * | 2014-01-24 | 2014-04-23 | 中国科学院信息工程研究所 | Method and system for recognizing hidden paid posters on basis of fusion of behavior characteristic and content characteristic |
CN104835498A (en) * | 2015-05-25 | 2015-08-12 | 重庆大学 | Voiceprint identification method based on multi-type combination characteristic parameters |
CN105575394A (en) * | 2016-01-04 | 2016-05-11 | 北京时代瑞朗科技有限公司 | Voiceprint identification method based on global change space and deep learning hybrid modeling |
CN106710589A (en) * | 2016-12-28 | 2017-05-24 | 百度在线网络技术(北京)有限公司 | Artificial intelligence-based speech feature extraction method and device |
CN106847309A (en) * | 2017-01-09 | 2017-06-13 | 华南理工大学 | A kind of speech-emotion recognition method |
CN106887225A (en) * | 2017-03-21 | 2017-06-23 | 百度在线网络技术(北京)有限公司 | Acoustic feature extracting method, device and terminal device based on convolutional neural networks |
CN109102812A (en) * | 2017-06-21 | 2018-12-28 | 北京搜狗科技发展有限公司 | A kind of method for recognizing sound-groove, system and electronic equipment |
CN108417217A (en) * | 2018-01-11 | 2018-08-17 | 苏州思必驰信息科技有限公司 | Speaker Identification network model training method, method for distinguishing speek person and system |
CN109767790A (en) * | 2019-02-28 | 2019-05-17 | 中国传媒大学 | A kind of speech-emotion recognition method and system |
Non-Patent Citations (5)
Title |
---|
Zhong Weifeng et al.: "Speaker recognition with deep and shallow feature and model fusion", Acta Acustica, vol. 43, no. 2, pages 264-271 *
Li Tingting et al.: "Analysis and selection of characteristic parameters of speech signals", Information & Computer, no. 5, pages 45-49 *
Lin Shudu et al.: "Speaker recognition based on i-vector and deep learning", Computer Technology and Development, vol. 27, no. 6, pages 66-71 *
Wang Xin et al.: "Robust i-vector speaker recognition algorithm based on DNN processing", Computer Engineering and Applications, no. 54, pages 167-172 *
Hu Qing: "Research on the application of convolutional neural networks in voiceprint recognition", China Masters' Theses Full-text Database, Information Science and Technology Series, no. 3, pages 1-47 *
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020181824A1 (en) * | 2019-03-12 | 2020-09-17 | 平安科技(深圳)有限公司 | Voiceprint recognition method, apparatus and device, and computer-readable storage medium |
CN110517698A (en) * | 2019-09-05 | 2019-11-29 | 科大讯飞股份有限公司 | A kind of determination method, apparatus, equipment and the storage medium of sound-groove model |
CN110517698B (en) * | 2019-09-05 | 2022-02-01 | 科大讯飞股份有限公司 | Method, device and equipment for determining voiceprint model and storage medium |
CN110556126A (en) * | 2019-09-16 | 2019-12-10 | 平安科技(深圳)有限公司 | Voice recognition method and device and computer equipment |
CN110556126B (en) * | 2019-09-16 | 2024-01-05 | 平安科技(深圳)有限公司 | Speech recognition method and device and computer equipment |
CN112687274A (en) * | 2019-10-17 | 2021-04-20 | 北京猎户星空科技有限公司 | Voice information processing method, device, equipment and medium |
CN110880321B (en) * | 2019-10-18 | 2024-05-10 | 平安科技(深圳)有限公司 | Intelligent braking method, device, equipment and storage medium based on voice |
CN110880321A (en) * | 2019-10-18 | 2020-03-13 | 平安科技(深圳)有限公司 | Intelligent braking method, device and equipment based on voice and storage medium |
CN110838294B (en) * | 2019-11-11 | 2022-03-04 | 效生软件科技(上海)有限公司 | Voice verification method and device, computer equipment and storage medium |
CN110838294A (en) * | 2019-11-11 | 2020-02-25 | 效生软件科技(上海)有限公司 | Voice verification method and device, computer equipment and storage medium |
CN111370003A (en) * | 2020-02-27 | 2020-07-03 | 杭州雄迈集成电路技术股份有限公司 | Voiceprint comparison method based on twin neural network |
CN111552832A (en) * | 2020-04-01 | 2020-08-18 | 深圳壹账通智能科技有限公司 | Risk user identification method and device based on voiceprint features and associated map data |
CN112185344A (en) * | 2020-09-27 | 2021-01-05 | 北京捷通华声科技股份有限公司 | Voice interaction method and device, computer readable storage medium and processor |
CN114512134A (en) * | 2020-11-17 | 2022-05-17 | 阿里巴巴集团控股有限公司 | Method and device for voiceprint information extraction, model training and voiceprint recognition |
CN112614493A (en) * | 2020-12-04 | 2021-04-06 | 珠海格力电器股份有限公司 | Voiceprint recognition method, system, storage medium and electronic device |
CN112382300A (en) * | 2020-12-14 | 2021-02-19 | 北京远鉴信息技术有限公司 | Voiceprint identification method, model training method, device, equipment and storage medium |
WO2022233239A1 (en) * | 2021-05-07 | 2022-11-10 | 华为技术有限公司 | Upgrading method and apparatus, and electronic device |
CN115022087A (en) * | 2022-07-20 | 2022-09-06 | 中国工商银行股份有限公司 | Voice recognition verification processing method and device |
CN115022087B (en) * | 2022-07-20 | 2024-02-27 | 中国工商银行股份有限公司 | Voice recognition verification processing method and device |
CN115019804A (en) * | 2022-08-03 | 2022-09-06 | 北京惠朗时代科技有限公司 | Multi-verification type voiceprint recognition method and system for multi-employee intensive sign-in |
CN115831152A (en) * | 2022-11-28 | 2023-03-21 | 国网山东省电力公司应急管理中心 | Sound monitoring device and method for monitoring running state of generator of emergency equipment in real time |
CN115831152B (en) * | 2022-11-28 | 2023-07-04 | 国网山东省电力公司应急管理中心 | Sound monitoring device and method for monitoring operation state of emergency equipment generator in real time |
CN116386647A (en) * | 2023-05-26 | 2023-07-04 | 北京瑞莱智慧科技有限公司 | Audio verification method, related device, storage medium and program product |
CN116386647B (en) * | 2023-05-26 | 2023-08-22 | 北京瑞莱智慧科技有限公司 | Audio verification method, related device, storage medium and program product |
CN118522288A (en) * | 2024-07-24 | 2024-08-20 | 山东第一医科大学附属省立医院(山东省立医院) | Voiceprint recognition-based otorhinolaryngological patient identity verification method |
Also Published As
Publication number | Publication date |
---|---|
WO2020181824A1 (en) | 2020-09-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110047490A (en) | Method for recognizing sound-groove, device, equipment and computer readable storage medium | |
CN104835498B (en) | Method for recognizing sound-groove based on polymorphic type assemblage characteristic parameter | |
TWI641965B (en) | Method and system of authentication based on voiceprint recognition | |
CN106486131B (en) | A kind of method and device of speech de-noising | |
CN104900235B (en) | Method for recognizing sound-groove based on pitch period composite character parameter | |
US8160877B1 (en) | Hierarchical real-time speaker recognition for biometric VoIP verification and targeting | |
CN101923855A (en) | Test-irrelevant voice print identifying system | |
WO2019136912A1 (en) | Electronic device, identity authentication method and system, and storage medium | |
CN107507626B (en) | Mobile phone source identification method based on voice frequency spectrum fusion characteristics | |
CN102324232A (en) | Method for recognizing sound-groove and system based on gauss hybrid models | |
CN109036382A (en) | A kind of audio feature extraction methods based on KL divergence | |
CN113327626A (en) | Voice noise reduction method, device, equipment and storage medium | |
CN113823293B (en) | Speaker recognition method and system based on voice enhancement | |
CN104517066A (en) | Folder encrypting method | |
CN108091326A (en) | A kind of method for recognizing sound-groove and system based on linear regression | |
CN112992155B (en) | Far-field voice speaker recognition method and device based on residual error neural network | |
CN110136726A (en) | A kind of estimation method, device, system and the storage medium of voice gender | |
CN109545226A (en) | A kind of audio recognition method, equipment and computer readable storage medium | |
CN115394318A (en) | Audio detection method and device | |
Herrera-Camacho et al. | Design and testing of a corpus for forensic speaker recognition using MFCC, GMM and MLE | |
Reynolds et al. | Automatic speaker recognition | |
Nagakrishnan et al. | Generic speech based person authentication system with genuine and spoofed utterances: different feature sets and models | |
Saleema et al. | Voice biometrics: the promising future of authentication in the internet of things | |
Shi et al. | Anti-replay: A fast and lightweight voice replay attack detection system | |
Nguyen et al. | Vietnamese speaker authentication using deep models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |