CN107293306B - An output-based objective speech quality assessment method - Google Patents
An output-based objective speech quality assessment method
- Publication number
- CN107293306B CN107293306B CN201710475912.8A CN201710475912A CN107293306B CN 107293306 B CN107293306 B CN 107293306B CN 201710475912 A CN201710475912 A CN 201710475912A CN 107293306 B CN107293306 B CN 107293306B
- Authority
- CN
- China
- Prior art keywords
- sequence
- monitoring data
- signal
- mel
- expression formula
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
Abstract
The present invention provides an output-based objective speech quality assessment method comprising the following steps: calculate the Mel-frequency cepstral coefficients (MFCCs) of the distorted speech after transmission through the system; obtain a reference model that matches the auditory characteristics of the human ear; perform a consistency measure calculation between the MFCCs of the distorted speech and the reference model; insert a sequence into the original speech and calculate the bit error rate of that sequence as extracted from the distorted speech after transmission through the system; establish, from the consistency measure and the bit error rate, a mapping between subjective MOS scores and the objective measures, yielding an objective prediction model for the MOS score of the speech under evaluation, with which the objective evaluation of speech quality is carried out. The method has simple steps, is easy to use, evaluates speech quality objectively and effectively, and does not depend on subjective assessment.
Description
Technical field
The present invention relates to the field of speech processing technology, and in particular to an output-based objective speech quality assessment method.
Background technology
Objective speech quality assessment refers to the automatic discrimination of speech quality by machine. Depending on whether the original input speech is required, such methods fall into two classes: input-to-output evaluation and output-based evaluation.
Many fields, such as wireless mobile communication, aerospace navigation, and modern military applications, demand evaluation methods with high flexibility, real-time performance, and versatility, and require that speech quality can be assessed even when the original input speech signal is unavailable. Input-to-output evaluation often cannot obtain the corresponding original speech, incurs higher costs for speech storage and the like, and therefore has clear drawbacks in these application scenarios.
The general process of an output-based objective speech quality assessment method is to compute certain characteristic parameters of the speech under evaluation, perform a consistency calculation against the characteristic parameters of reference speech summarized by a trained model, and finally map the result to an estimate of the subjective MOS score. In this process, the choice of characteristic parameters, training model, and MOS mapping method is crucial, as it determines the performance of the assessment system. Since the human ear's perception of sound follows the Bark critical bands, the feature extraction stage must convert linear frequency to a warped frequency scale. Moreover, in applications such as wireless communication, besides analyzing the speech itself, the influence of external factors such as channel quality on speech quality must also be considered.
It is therefore of great significance to design an assessment method that can objectively evaluate speech quality after coding or channel transmission.
Invention content
The purpose of the present invention is to provide an output-based objective speech quality assessment method. Considering the auditory characteristics of the human ear with respect to frequency, together with cepstral analysis of the speech signal, Mel-frequency cepstral coefficients (MFCC) are used to describe the speech features. A speech objective distortion value is obtained by combining the MFCCs with a GMM-HMM trained model; at the same time, the bit error rate is introduced as an objective measure of channel effects. A mapping between subjective MOS scores and the objective measures is then established, yielding a prediction model for the subjective MOS that can be used to objectively evaluate speech quality after coding or channel transmission. The details are as follows:
An output-based objective speech quality assessment method includes the following steps:
Calculate the Mel-frequency cepstral coefficients of the distorted speech after transmission through the system; obtain a reference model that matches the auditory characteristics of the human ear;
Perform a consistency measure calculation between the MFCCs of the distorted speech and the reference model; insert a sequence into the original speech, and calculate the bit error rate of that sequence as extracted from the distorted speech after transmission through the system;
Establish, from the consistency measure and the bit error rate, a mapping between subjective MOS scores and the objective measures, obtaining an objective prediction model for the MOS score of the speech under evaluation; carry out the objective evaluation of speech quality with this prediction model.
Preferably, in the above technical scheme, the calculation of the Mel-frequency cepstral coefficients comprises four steps: pre-processing, FFT, Mel-frequency filtering, and discrete cosine transform.
Preferably, the pre-processing comprises the following steps:
Step 1.1, pre-emphasis: pre-emphasis is realized with a digital filter that boosts high frequencies at 6 dB/octave, whose transfer function is expression 1):
H(z) = 1 − μz⁻¹   1);
where μ is the pre-emphasis factor, with a value of 0.9–1.0;
Step 1.2, endpoint detection: carried out by setting thresholds on the short-time energy and short-time zero-crossing rate (see the code sketch after step 1.3). Let x(m) be a short-time speech signal of length N; its short-time energy E is computed by expression 2):
E = Σ_{m=0}^{N−1} x²(m)   2);
and its short-time zero-crossing rate Z by expression 3):
Z = (1/2) Σ_{m=1}^{N−1} |sgn[x(m)] − sgn[x(m−1)]|   3);
where sgn[·] is the sign function, i.e. sgn[x] = 1 for x ≥ 0 and −1 for x < 0;
Step 1.3, framing and windowing: framing divides the speech into successive frames, each 10–30 ms long; windowing applies a Hamming window to each frame signal.
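As an illustration of step 1.2, the sketch below computes the short-time energy and zero-crossing rate of a frame and applies the thresholds; it is a minimal Python rendering of expressions 2) and 3), with the threshold values left as tuning parameters since the patent does not fix them.

```python
import numpy as np

def short_time_energy(frame):
    # E = sum over m of x^2(m)  (expression 2)
    return np.sum(frame.astype(float) ** 2)

def short_time_zcr(frame):
    # Z = (1/2) * sum over m of |sgn[x(m)] - sgn[x(m-1)]|  (expression 3)
    s = np.where(frame >= 0, 1, -1)
    return 0.5 * np.sum(np.abs(np.diff(s)))

def is_speech(frame, e_thresh, z_thresh):
    # Endpoint decision by thresholding both measures (threshold values
    # are tuning parameters; the patent does not specify them)
    return short_time_energy(frame) > e_thresh or short_time_zcr(frame) > z_thresh
```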
Preferably, the windowing proceeds as follows: let the frame signal be x(n) and the window function w(n); the windowed signal y(n) is given by expression 4):
y(n) = x(n)·w(n), 0 ≤ n ≤ N−1   4);
where N is the number of samples per frame and w(n) = 0.54 − 0.46·cos[2πn/(N−1)], 0 ≤ n ≤ N−1.
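A minimal Python sketch of the pre-processing chain of steps 1.1 and 1.3 follows: pre-emphasis per expression 1), framing, and Hamming windowing per expression 4). The frame and hop lengths are illustrative values (30 ms frames with 50% overlap at 8 kHz), not values fixed by the patent.

```python
import numpy as np

def preemphasis(x, mu=0.95):
    # H(z) = 1 - mu*z^-1  (expression 1)  ->  y[n] = x[n] - mu*x[n-1]
    return np.append(x[0], x[1:] - mu * x[:-1])

def frame_and_window(x, frame_len=240, hop_len=120):
    # Split into frames and weight each by the Hamming window of
    # expression 4): w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))
    n_frames = 1 + (len(x) - frame_len) // hop_len
    win = np.hamming(frame_len)
    frames = np.stack([x[i * hop_len : i * hop_len + frame_len]
                       for i in range(n_frames)])
    return frames * win
```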
Preferably, the Mel-frequency filtering proceeds as follows: the discrete spectrum obtained from the FFT is filtered by a bank of triangular filters, yielding a set of coefficients m₁, m₂, …; the number p of filters in the bank is determined by the cutoff frequency of the signal, and together the filters cover the range from 0 Hz up to the Nyquist frequency, i.e. half the sample rate. Each mᵢ is computed by expression 5), where f[i] is the center frequency of the i-th triangular filter and satisfies Mel(f[i+1]) − Mel(f[i]) = Mel(f[i]) − Mel(f[i−1]), and X(k) is the discrete spectrum of the frame signal x(n) after the FFT.
Preferably, the discrete cosine transform proceeds as follows: the Mel spectrum produced by the Mel-frequency filtering is transformed back to the time domain, yielding the Mel-frequency cepstral coefficients, computed by expression 6):
where MFCC(i) is the i-th Mel-frequency cepstral coefficient, N is the number of samples per frame, and P is the number of filters in the bank.
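The following sketch assembles a triangular Mel filter bank and applies a DCT in the role of expression 6). The Mel scale formula Mel(f) = 2595·log10(1 + f/700) and the parameter values (sample rate, filter and coefficient counts, FFT size) are standard assumptions, as the patent does not spell them out.

```python
import numpy as np

def mel(f):
    # Standard Mel scale (an assumption; not reproduced in the patent)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_inv(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_from_frame(frame, sr=8000, n_filters=24, n_ceps=12, n_fft=256):
    spec = np.abs(np.fft.rfft(frame, n_fft))      # discrete spectrum X(k)
    # Triangular filters with centres equally spaced on the Mel scale,
    # covering 0 Hz up to the Nyquist frequency sr/2
    edges = mel_inv(np.linspace(0.0, mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    m = np.zeros(n_filters)
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        up = (np.arange(l, c) - l) / max(c - l, 1)
        down = (r - np.arange(c, r)) / max(r - c, 1)
        m[i - 1] = np.dot(spec[l:c], up) + np.dot(spec[c:r], down)
    logm = np.log(m + 1e-10)
    # DCT back to the time (cepstral) domain
    i_idx = np.arange(1, n_ceps + 1)[:, None]
    j_idx = np.arange(1, n_filters + 1)[None, :]
    return np.cos(np.pi * i_idx * (j_idx - 0.5) / n_filters) @ logm
```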
Preferably, the reference model matching human auditory characteristics is obtained as follows:
Let the observed feature vector sequence be O = o₁, o₂, …, o_T and the corresponding state sequence be S = s₁, s₂, …, s_N; the HMM of the sequence is then expressed as expression 7):
λ = (π, A, B)   7);
where π = {πᵢ = P(s₁ = i), i = 1, 2, …, N} is the initial state probability vector; A = {aᵢⱼ} is the state transition probability matrix, with aᵢⱼ the probability of jumping from state i to state j; and B = {bᵢ(o_t) = P(o_t | s_t = i), 2 ≤ i ≤ N−1} is the set of state output probability distributions;
For a continuous HMM, the observation sequence is a continuous signal, and the signal space associated with state j is represented by a sum of M Gaussian mixture density functions, as in expressions 8) and 9):
where c_jk is the weight of the k-th Gaussian mixture density of state j; μ_jk is the mean vector of the Gaussian density; C_jk is its covariance matrix; and D is the dimension of the observation sequence O. The HMM parameters are estimated from the observation sequence O = o₁, o₂, …, o_T; the goal of estimation is to find the λ that maximizes the likelihood P(O|λ) of the model given the training data, i.e. λ* = arg max_λ P(O|λ).
The forward probability of the likelihood p(O|λ) is computed by expression 10):
where α₁(i) = πᵢ bᵢ(o₁), 1 ≤ i ≤ N;
The backward probability of the likelihood p(O|λ) is computed by expression 11):
where β_T(i) = 1, 1 ≤ i ≤ N;
Given the observation sequence O = o₁, o₂, …, o_T, the updated λ is obtained by re-estimation. Define ξ_t(i, j) as the probability of being in state sᵢ at time t and in state sⱼ at time t+1, obtained by expression 12):
Under the model λ and observation sequence O, the posterior probability of state sᵢ at time t is given by expression 13):
The re-estimation of the HMM parameters λ is then as follows:
the parameters c_jk, μ_jk, and C_jk of the k-th Gaussian mixture component of state j at time t are re-estimated by expressions 14), 15), and 16):
where γ_t(j, k) denotes the probability of being in the k-th Gaussian mixture component of state j at time t, obtained from the forward and backward probabilities.
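The forward recursion of expression 10) can be realized as below. The sketch assumes the per-state observation likelihoods b_j(o_t) have already been evaluated, and uses per-step scaling, a standard numerical device not mentioned in the patent, to avoid underflow.

```python
import numpy as np

def forward_loglik(obs_prob, pi, A):
    # Scaled forward recursion for log P(O | lambda) (expression 10):
    # obs_prob[t, j] = b_j(o_t); pi = initial probabilities; A = {a_ij}
    T, N = obs_prob.shape
    alpha = pi * obs_prob[0]          # alpha_1(i) = pi_i * b_i(o_1)
    log_p = 0.0
    for t in range(1, T):
        c = alpha.sum()               # scaling factor
        log_p += np.log(c)
        alpha = (alpha / c) @ A * obs_prob[t]   # alpha_{t+1}(j)
    return log_p + np.log(alpha.sum())
```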
Preferably, the consistency measure is computed using expression 17):
where X₁, …, X_N are the Mel-frequency cepstral coefficient vectors of the distorted speech, N is the number of vectors, and C is the consistency measure between the distorted speech and the model.
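One plausible reading of the consistency measure — an assumption, since expression 17) only names the MFCC vectors X₁, …, X_N and the measure C — is the average per-vector log-likelihood of the distorted speech under the reference model:

```python
import numpy as np

def consistency_measure(mfccs, score_fn):
    # Hypothetical reading of expression 17): C as the average per-vector
    # log-likelihood of the distorted speech's MFCCs under the reference
    # model; score_fn(X) should return log P(X | lambda), e.g. the
    # forward log-likelihood above
    X = np.asarray(mfccs)
    return score_fn(X) / len(X)
```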
Preferably, the bit error rate is calculated as follows:
Step A, generate a PN sequence and multiply it with a chaotic sequence. The chaotic sequence is generated by the logistic map, defined as:
x_{k+1} = μ x_k (1 − x_k)
where 0 ≤ μ ≤ 4 is called the bifurcation parameter and x_k ∈ (0, 1). When 3.5699456… < μ ≤ 4, the logistic map works in the chaotic state: the sequence {x_k; k = 0, 1, 2, 3, …} generated from an initial condition under the logistic map is aperiodic, non-convergent, and highly sensitive to the initial value. The monitoring sequence is generated as follows (a code sketch of this generation and of the bit error rate calculation follows step d4):
Step a1, first generate the real-valued sequence and choose a segment, starting at some position, whose length equals the monitoring sequence size;
Step a2, convert the real-valued sequence into a binary sequence: define a threshold Γ and obtain from the real-valued sequence the binary chaotic sequence {Γ(x_k); k = 0, 1, 2, 3, …}, where Γ(x_k) = 0 if x_k ≤ Γ and Γ(x_k) = 1 otherwise;
Step a3, multiply the binary chaotic sequence with a PN sequence to obtain the monitoring sequence;
Step B, insert a synchronization code for the monitoring sequence, so that the embedded monitoring sequence can later be extracted frame by frame;
Step C, embed the monitoring sequence carrying the synchronization code into the speech signal in the wavelet domain, as follows:
Step c1, choose the Daubechies-10 wavelet as the wavelet function;
Step c2, divide the speech signal into frames of 1152 samples each, and apply a 3-level wavelet transform to each frame;
Step c3, quantize the wavelet coefficients and modulate the monitoring sequence onto them, thereby embedding the monitoring sequence into the speech signal. Let f be the coefficient to be quantized, w the bit of the monitoring sequence to be embedded, and Δ the quantization step; the coefficient carrying the monitoring-sequence information after quantization is f′. The specific steps are:
take the modulus and floor of f; when f > 0, with m = ⌊f/Δ⌋ and n = m % 2, then:
when f < 0, with m = ⌊|f|/Δ⌋, n = m % 2, and n = w, then:
the monitoring sequence is embedded into the speech signal bit by bit according to the above formulas;
Step c4, convert the signal with the embedded monitoring sequence back into a time-domain signal;
Step D, extract the embedded monitoring sequence from the received speech and calculate the bit error rate. The extraction proceeds as follows:
Step d1, search for the synchronization code in the speech signal: let L be the length of the signal segment to search; L should exceed the total length of two synchronization codes plus one complete monitoring sequence. Starting from the initial search point I = 1, if 16 consecutive sample values of the signal lie in the range 900–1100, a possible synchronization code has been found and is compared against the preset synchronization code; if it is judged to be the synchronization code, point I is the start position of the monitoring sequence; otherwise set I = I + L;
Step d2, starting from the start position found, apply the discrete wavelet transform to the speech signal;
Step d3, apply to the coefficients f after wavelet decomposition the inverse of the embedding operation, i.e.: when f > 0, m = ⌊f/Δ⌋ and w = m % 2; when f < 0, m = ⌊|f|/Δ⌋ and w = m % 2;
thereby extracting the binary monitoring sequence;
Step d4, compare the extracted monitoring sequence with the inserted one and calculate the bit error rate by expression 18):
BER = Hammingweight(Seq_send XOR Seq_receive) / Seq_length   18);
where Seq_send, Seq_receive, and Seq_length denote the sent monitoring sequence, the received monitoring sequence, and the sequence length, respectively; Hammingweight(·) denotes the Hamming weight of a sequence, and XOR denotes the exclusive-or operation.
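A sketch of steps A and d4 follows: generating the binary chaotic monitoring sequence from the logistic map and computing the bit error rate of expression 18). Combining the chaotic bits with the PN sequence by XOR over {0, 1} is an assumption equivalent to multiplying bipolar (±1) sequences; the initial value, threshold, and offset are illustrative.

```python
import numpy as np

def logistic_sequence(x0, n, mu=3.9):
    # x_{k+1} = mu * x_k * (1 - x_k); chaotic for 3.5699456... < mu <= 4
    xs = np.empty(n)
    x = x0
    for k in range(n):
        x = mu * x * (1 - x)
        xs[k] = x
    return xs

def monitoring_sequence(pn_bits, x0=0.3, gamma=0.5, offset=100):
    # Steps a1-a3: binarize the real-valued chaotic sequence with
    # threshold gamma, then combine with the PN sequence
    xs = logistic_sequence(x0, offset + len(pn_bits))[offset:]
    chaos_bits = (xs > gamma).astype(int)
    return chaos_bits ^ np.asarray(pn_bits)

def bit_error_rate(sent, received):
    # Expression 18): Hammingweight(sent XOR received) / length
    sent, received = np.asarray(sent), np.asarray(received)
    return np.count_nonzero(sent ^ received) / len(sent)
```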
Preferably, the mapping is obtained by expression 19), which predicts the objective MOS score as f(C₁, …, C_N), where f(·) is a multivariate nonlinear regression analysis model, Cᵢ is the consistency measure of the i-th parameter, and N is the number of speech characteristic parameters.
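As one stand-in for the multivariate nonlinear regression f(·) — the patent does not fix its form — a polynomial least-squares fit from the objective measures to subjective MOS might look like this:

```python
import numpy as np

def fit_mos_mapping(measures, mos, degree=2):
    # Least-squares polynomial fit from objective measures (consistency
    # values and BER) to subjective MOS; one possible realization of the
    # regression model f(.) of expression 19)
    measures = np.asarray(measures, dtype=float)  # (n_samples, n_features)
    mos = np.asarray(mos, dtype=float)
    design = np.hstack([np.ones((len(measures), 1))] +
                       [measures ** d for d in range(1, degree + 1)])
    coef, *_ = np.linalg.lstsq(design, mos, rcond=None)

    def predict(new_measures):
        new_measures = np.atleast_2d(np.asarray(new_measures, dtype=float))
        design_new = np.hstack([np.ones((len(new_measures), 1))] +
                               [new_measures ** d for d in range(1, degree + 1)])
        return design_new @ coef

    return predict
```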
Applying the technical scheme of the present invention has the following effects:
1. MFCCs approximate the Mel frequency scale, stretching the low-frequency information of speech and compressing the high-frequency information. They can be used for robust speech analysis and speech recognition, suppress speaker-dependent features, and retain the linguistic quality of the speech segments.
2. The present invention establishes a mapping between subjective MOS and the objective measures together with channel quality, obtaining a prediction model for the subjective MOS whose scores are closer to subjective quality.
3. The method of the present invention has simple steps, is easy to use, evaluates speech quality objectively and effectively, and does not depend on subjective evaluation.
In addition to the objects, features, and advantages described above, the present invention has other objects, features, and advantages, which are described in further detail below with reference to the accompanying drawings.
Description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the present invention; the illustrative embodiments of the present invention and their explanations serve to explain the present invention and do not constitute improper limitations on it. In the drawings:
Fig. 1 is a schematic diagram of the principle of the output-based objective speech quality assessment method in Embodiment 1.
Specific embodiments
Embodiments of the present invention are described in detail below in conjunction with the drawings, but the present invention can be implemented in a multitude of different ways defined and covered by the claims.
Embodiment 1:
An output-based objective speech quality assessment method, referring to Fig. 1, specifically includes: calculating the Mel-frequency cepstral coefficients of the distorted speech after transmission through the system (the original speech becomes distorted speech after transmission; computing the MFCCs is the MFCC parameter extraction process); obtaining a reference model that matches human auditory characteristics (first extract the MFCC parameters of the reference speech, then obtain the GMM-HMM model); performing a consistency measure calculation between the MFCCs of the distorted speech and the reference model (the consistency calculation); inserting a sequence into the original speech and calculating the bit error rate of that sequence as extracted from the distorted speech after transmission; establishing, from the consistency measure and the bit error rate, a mapping between subjective MOS and the objective measures (the MOS mapping in Fig. 1), obtaining an objective prediction model for the MOS of the speech under evaluation, and carrying out the objective evaluation of speech quality with that model (here, the correlation and deviation between the mapped MOS scores and the subjective MOS serve as evaluation criteria). The evaluation speech comes from the ITU speech corpus (the International Telecommunication Union's speech library). The details are as follows:
The calculation of the Mel-frequency cepstral coefficients comprises four steps — pre-processing, FFT (fast Fourier transform), Mel-frequency filtering, and discrete cosine transform — as follows:
The pre-processing comprises the following steps:
Step 1.1, pre-emphasis: pre-emphasis is realized with a digital filter that boosts high frequencies at 6 dB/octave, whose transfer function is expression 1):
H(z) = 1 − μz⁻¹   1);
where μ is the pre-emphasis factor, with a value of 0.9–1.0 (0.95 is taken here);
Step 1.2, endpoint detection: carried out by setting thresholds on the short-time energy and short-time zero-crossing rate. Let x(m) be a short-time speech signal of length N; its short-time energy E is computed by expression 2):
E = Σ_{m=0}^{N−1} x²(m)   2);
and its short-time zero-crossing rate Z by expression 3):
Z = (1/2) Σ_{m=1}^{N−1} |sgn[x(m)] − sgn[x(m−1)]|   3);
where sgn[·] is the sign function, i.e. sgn[x] = 1 for x ≥ 0 and −1 for x < 0;
Step 1.3, framing and windowing: in order to apply the analysis methods of stationary processes, the speech is divided into successive frames, each 10–30 ms long; meanwhile, to reduce the truncation effect on speech frames, a Hamming window is applied to each frame signal, specifically:
Let the frame signal be x(n) and the window function w(n); the windowed signal y(n) is given by expression 4):
y(n) = x(n)·w(n), 0 ≤ n ≤ N−1   4);
where N is the number of samples per frame and w(n) = 0.54 − 0.46·cos[2πn/(N−1)], 0 ≤ n ≤ N−1.
The Mel-frequency filtering proceeds as follows: the discrete spectrum obtained from the FFT is filtered by a bank of triangular filters, yielding a set of coefficients m₁, m₂, …; the number p of filters is determined by the cutoff frequency of the signal, and together the filters cover the range from 0 Hz up to the Nyquist frequency, i.e. half the sample rate. Each mᵢ is computed by expression 5), where f[i] is the center frequency of the i-th triangular filter and satisfies Mel(f[i+1]) − Mel(f[i]) = Mel(f[i]) − Mel(f[i−1]).
Since the Mel spectral coefficients are all real numbers, they can be transformed to the time domain by a discrete cosine transform. The discrete cosine transform thus converts the Mel spectrum produced by the Mel-frequency filtering to the time domain, yielding the Mel-frequency cepstral coefficients, computed by expression 6):
where MFCC(i) is the i-th Mel-frequency cepstral coefficient, N is the number of samples per frame, and P is the number of filters in the bank.
The reference model matching human auditory characteristics is obtained as follows:
Acoustic modeling and training are based on a GMM-HMM. Let the observed feature vector sequence be O = o₁, o₂, …, o_T and the corresponding state sequence be S = s₁, s₂, …, s_N; the HMM (hidden Markov model) of the sequence is then expressed as expression 7):
λ = (π, A, B)   7);
where π = {πᵢ = P(s₁ = i), i = 1, 2, …, N} is the initial state probability vector; A = {aᵢⱼ} is the state transition probability matrix, with aᵢⱼ the probability of jumping from state i to state j; and B = {bᵢ(o_t) = P(o_t | s_t = i), 2 ≤ i ≤ N−1} is the set of state output probability distributions;
For a continuous HMM, the observation sequence is a continuous signal, and the signal space associated with state j is represented by a sum of M Gaussian mixture density functions, as in expressions 8) and 9):
where c_jk is the weight of the k-th Gaussian mixture density of state j; μ_jk is the mean vector of the Gaussian density; C_jk is its covariance matrix; and D is the dimension of the observation sequence O. The HMM parameters are estimated from the observation sequence O = o₁, o₂, …, o_T; the goal of estimation is to find the λ that maximizes the likelihood P(O|λ) of the model given the training data, i.e. λ* = arg max_λ P(O|λ). This can be realized with the EM (expectation-maximization) algorithm, which consists of two parts — the forward-backward probability calculation, and the re-estimation of the HMM parameters and Gaussian mixture parameters — as follows:
The forward probability of the likelihood p(O|λ) is computed by expression 10):
where α₁(i) = πᵢ bᵢ(o₁), 1 ≤ i ≤ N;
The backward probability of the likelihood p(O|λ) is computed by expression 11):
where β_T(i) = 1, 1 ≤ i ≤ N;
Given the observation sequence O = o₁, o₂, …, o_T, the updated λ is obtained by re-estimation. Define ξ_t(i, j) as the probability of being in state sᵢ at time t and in state sⱼ at time t+1, obtained by expression 12):
Under the model λ and observation sequence O, the posterior probability of state sᵢ at time t is given by expression 13):
The re-estimation of the HMM parameters λ is then as follows:
the parameters c_jk, μ_jk, and C_jk of the k-th Gaussian mixture component of state j at time t are re-estimated by expressions 14), 15), and 16):
where γ_t(j, k) denotes the probability of being in the k-th Gaussian mixture component of state j at time t, obtained from the forward and backward probabilities.
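In practice, the Baum-Welch training of the GMM-HMM reference model can be delegated to a library; the sketch below assumes the third-party hmmlearn package and is not part of the patent. State and mixture counts are illustrative values.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM  # third-party package, an assumption

def train_reference_model(mfcc_list, n_states=5, n_mix=4, n_iter=20):
    # Baum-Welch (EM) training of the GMM-HMM on reference-speech MFCCs
    X = np.vstack(mfcc_list)
    lengths = [len(m) for m in mfcc_list]
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=n_iter)
    model.fit(X, lengths)
    return model   # model.score(X) then gives log P(O | lambda)
```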
The consistency measure is computed as follows: after modeling, the consistency measure between the MFCCs of the distorted speech and the reference model is calculated using expression 17):
where X₁, …, X_N are the Mel-frequency cepstral coefficient (MFCC) vectors of the distorted speech, N is the number of vectors, and C is the consistency measure between the distorted speech and the model.
The bit error rate is calculated as follows:
Step A, generate a PN sequence and multiply it with a chaotic sequence. The chaotic sequence is generated by the logistic map, defined as:
x_{k+1} = μ x_k (1 − x_k)
where 0 ≤ μ ≤ 4 is called the bifurcation parameter and x_k ∈ (0, 1). When 3.5699456… < μ ≤ 4, the logistic map works in the chaotic state: the sequence {x_k; k = 0, 1, 2, 3, …} generated from an initial condition under the logistic map is aperiodic, non-convergent, and highly sensitive to the initial value. The monitoring sequence is generated as follows:
Step a1, first generate the real-valued sequence and choose a segment, starting at some position, whose length equals the monitoring sequence size;
Step a2, convert the real-valued sequence into a binary sequence: define a threshold Γ and obtain from the real-valued sequence the binary chaotic sequence {Γ(x_k); k = 0, 1, 2, 3, …};
Step a3, multiply the binary chaotic sequence with a PN sequence (pseudo-noise sequence) to obtain the monitoring sequence;
Step B, insert a synchronization code for the monitoring sequence so that the embedded monitoring sequence can be extracted frame by frame. The purpose of the synchronization code is to prevent the receiving end from being unable to extract the monitoring sequence after the audio passes through channel attenuation. The synchronization code used here is 16 bits long; to locate it accurately, it is embedded in the time domain of the speech signal by setting the amplitude of the 16 samples preceding the monitoring sequence to 1000. During extraction at the receiving end, if the starting point is out of synchronization, a run of 16 consecutive sample values in the range 900–1100 can be searched for to quickly find the starting sample position of the watermark; in this way the embedded monitoring sequence can be extracted frame by frame (see the code sketch after step D);
Step C, embed the monitoring sequence carrying the synchronization code into the speech signal in the wavelet domain. The wavelet domain is chosen for embedding because a monitoring sequence embedded in a transform domain is better concealed and does not audibly affect the original speech. The embedding in the wavelet domain proceeds as follows:
Step c1, since analyzing the same problem with different wavelet bases can produce different results, a suitable wavelet basis must be chosen for the problem at hand; here the Daubechies-10 wavelet is chosen as the wavelet function;
Step c2, divide the speech signal into frames of 1152 samples each, and apply a 3-level wavelet transform to each frame; in view of the auditory characteristics of the human ear, the sequence is embedded in the high-frequency band;
Step c3, quantize the wavelet coefficients and modulate the monitoring sequence onto them, thereby embedding the monitoring sequence into the speech signal. Let f be the coefficient to be quantized, w the bit of the monitoring sequence to be embedded, and Δ the quantization step; the coefficient carrying the monitoring-sequence information after quantization is f′. The specific steps are: first take the modulus and floor of f; when f > 0, with m = ⌊f/Δ⌋ and n = m % 2, then:
when f < 0, with m = ⌊|f|/Δ⌋, n = m % 2, and n = w, then:
the monitoring sequence can be embedded into the speech signal bit by bit according to the above formulas.
Step c4, convert the signal with the embedded monitoring sequence back into a time-domain signal;
Step D, extract the embedded monitoring sequence from the received speech and calculate the bit error rate. Specifically: extraction of the monitoring sequence is the inverse of embedding, so the wavelet function and the number of wavelet decomposition levels remain unchanged. The extraction proceeds as follows (a code sketch of the embedding and extraction follows these steps):
Step d1, search for the synchronization code in the speech signal: let L be the length of the signal segment to search; L should exceed the total length of two synchronization codes plus one complete monitoring sequence. Starting from the initial search point I = 1, if 16 consecutive sample values of the signal lie in the range 900–1100, a possible synchronization code has been found and is compared against the preset synchronization code; if it is judged to be the synchronization code, point I is the start position of the monitoring sequence; otherwise set I = I + L;
Step d2, starting from the start position found, apply the discrete wavelet transform to the speech signal;
Step d3, apply to the coefficients f after wavelet decomposition the inverse of the embedding operation, i.e.:
when f > 0, m = ⌊f/Δ⌋ and w = m % 2;
when f < 0, m = ⌊|f|/Δ⌋ and w = m % 2;
thereby extracting the binary monitoring sequence;
Step d4, compare the extracted monitoring sequence with the inserted one and calculate the bit error rate (one of the objective measures of speech quality) by expression 18):
BER = Hammingweight(Seq_send XOR Seq_receive) / Seq_length   18);
where Seq_send, Seq_receive, and Seq_length denote the sent monitoring sequence, the received monitoring sequence, and the sequence length, respectively; Hammingweight(·) denotes the Hamming weight of a sequence, and XOR denotes the exclusive-or operation.
After the parameter consistency measures of the speech have been calculated under various distortion conditions, a functional mapping can be used to express the relationship between the parameter consistency measures and the objective MOS estimate; the mapping is obtained by expression 19), which predicts the objective MOS score as f(C₁, …, C_N), where f(·) is the prediction function (it can be a linear or nonlinear regression relation or a polynomial fit; in this embodiment, a multivariate nonlinear regression analysis model is preferred in order to predict the MOS value more accurately), Cᵢ is the consistency measure of the i-th parameter, and N is the number of speech characteristic parameters. The larger the bit error rate, the stronger the interference in the channel and the correspondingly greater the speech damage incurred during transmission, so the predicted MOS value is smaller and the speech quality poorer.
The performance of the speech quality evaluation algorithm is measured below by the correlation and the deviation. The correlation mainly reflects whether the mapping from distortion to predicted MOS obtained by the algorithm is reasonable; generally, the correlation and deviation between the MOS scores mapped by the algorithm and the known subjective MOS values serve as the evaluation criteria.
The correlation coefficient ρ and the standard estimation deviation σ are obtained by expressions 20) and 21):
where MOS_o(i) is the predicted MOS value of the i-th speech sample, MOS_s(i) is the known subjective MOS score, and N is the total number of speech pairs; the overbars denote the means of the predicted and subjective MOS values, respectively.
The closer the correlation coefficient ρ is to 1, the closer the predicted MOS values are to the true MOS values; the smaller the deviation σ, the smaller the prediction error and the better the performance of the algorithm.
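Expressions 20) and 21) correspond to the standard correlation coefficient and estimation deviation, which can be computed as follows (treating σ as the root-mean-square prediction error, an assumption consistent with the text):

```python
import numpy as np

def correlation_and_deviation(mos_pred, mos_true):
    # rho (expression 20): Pearson correlation between predicted and
    # subjective MOS; sigma (expression 21): RMS prediction error
    mos_pred = np.asarray(mos_pred, dtype=float)
    mos_true = np.asarray(mos_true, dtype=float)
    rho = np.corrcoef(mos_pred, mos_true)[0, 1]
    sigma = np.sqrt(np.mean((mos_pred - mos_true) ** 2))
    return rho, sigma
```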
The performance of the assessment method of Embodiment 1 is compared with the objective evaluation method ITU-T P.563 of the International Telecommunication Union; the results are detailed in Table 1.
Table 1 shows that the method of the present invention (Embodiment 1) achieves a certain improvement over the ITU-T P.563 algorithm: the average correlation ρ with subjective MOS is higher and the estimation deviation σ is lower. The method of the present invention is therefore valid and feasible.
Table 1: Performance comparison of the method of the present invention (Embodiment 1) and ITU-T P.563 on the processed speech
The foregoing is only a preferred embodiment of the present invention and is not intended to restrict the invention; for those skilled in the art, the invention may be modified and varied in many ways. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (9)
1. An output-based objective speech quality assessment method, characterized by comprising the following steps:
calculating the Mel-frequency cepstral coefficients of the distorted speech after transmission through the system; obtaining a reference model that matches the auditory characteristics of the human ear;
performing a consistency measure calculation between the Mel-frequency cepstral coefficients of the distorted speech and the reference model; inserting a sequence into the original speech, and calculating the bit error rate of that sequence as extracted from the distorted speech after transmission through the system;
establishing, from the consistency measure and the bit error rate, a mapping between subjective MOS scores and the objective measures, obtaining an objective prediction model for the MOS score of the speech under evaluation, and carrying out the objective evaluation of speech quality with the objective prediction model;
wherein the reference model matching human auditory characteristics is obtained as follows:
let the observed feature vector sequence be O = o₁, o₂, …, o_T and the corresponding state sequence be S = s₁, s₂, …, s_N; the HMM of the sequence is then expressed as expression 7):
λ = (π, A, B)   7);
where π = {πᵢ = P(s₁ = i), i = 1, 2, …, N} is the initial state probability vector; A = {aᵢⱼ} is the state transition probability matrix, with aᵢⱼ the probability of jumping from state i to state j; and B = {bᵢ(o_t) = P(o_t | s_t = i), 2 ≤ i ≤ N−1} is the set of state output probability distributions;
for a continuous HMM, the observation sequence is a continuous signal, and the signal space associated with state j is represented by a sum of M Gaussian mixture density functions, as in expressions 8) and 9):
where c_jk is the weight of the k-th Gaussian mixture density of state j; μ_jk is the mean vector of the Gaussian density; C_jk is its covariance matrix; and D is the dimension of the observation sequence O; the HMM parameters are estimated from the observation sequence O = o₁, o₂, …, o_T, the goal of estimation being to find the λ that maximizes the likelihood P(O|λ) of the model given the training data, i.e. λ* = arg max_λ P(O|λ);
the forward probability of the likelihood p(O|λ) is computed by expression 10):
where α₁(i) = πᵢ bᵢ(o₁), 1 ≤ i ≤ N;
the backward probability of the likelihood p(O|λ) is computed by expression 11):
where β_T(i) = 1, 1 ≤ i ≤ N;
given the observation sequence O = o₁, o₂, …, o_T, the updated λ is obtained by re-estimation; define ξ_t(i, j) as the probability of being in state sᵢ at time t and in state sⱼ at time t+1, obtained by expression 12):
under the model λ and observation sequence O, the posterior probability of state sᵢ at time t is given by expression 13):
the re-estimation of the HMM parameters λ is then as follows:
the parameters c_jk, μ_jk, and C_jk of the k-th Gaussian mixture component of state j at time t are re-estimated by expressions 14), 15), and 16):
where γ_t(j, k) denotes the probability of being in the k-th Gaussian mixture component of state j at time t, obtained from the forward and backward probabilities.
2. The output-based objective speech quality assessment method according to claim 1, characterized in that: the calculation of the Mel-frequency cepstral coefficients comprises four steps: pre-processing, FFT, Mel-frequency filtering, and discrete cosine transform.
3. The output-based objective speech quality assessment method according to claim 2, characterized in that the pre-processing comprises the following steps:
step 1.1, pre-emphasis: pre-emphasis is realized with a digital filter that boosts high frequencies at 6 dB/octave, whose transfer function is expression 1):
H(z) = 1 − μz⁻¹   1);
where μ is the pre-emphasis factor, with a value of 0.9–1.0;
step 1.2, endpoint detection: carried out by setting thresholds on the short-time energy and short-time zero-crossing rate; let x(m) be a short-time speech signal of length N; its short-time energy E is computed by expression 2):
E = Σ_{m=0}^{N−1} x²(m)   2);
and its short-time zero-crossing rate Z by expression 3):
Z = (1/2) Σ_{m=1}^{N−1} |sgn[x(m)] − sgn[x(m−1)]|   3);
where sgn[·] is the sign function, i.e. sgn[x] = 1 for x ≥ 0 and −1 for x < 0;
step 1.3, framing and windowing: framing divides the speech into successive frames, each 10–30 ms long; windowing applies a Hamming window to each frame signal.
4. The output-based objective speech quality assessment method according to claim 3, characterized in that the windowing proceeds as follows: let the frame signal be x(n) and the window function w(n); the windowed signal y(n) is given by expression 4):
y(n) = x(n)·w(n), 0 ≤ n ≤ N−1   4);
where N is the number of samples per frame and w(n) = 0.54 − 0.46·cos[2πn/(N−1)], 0 ≤ n ≤ N−1.
5. The output-based objective speech quality assessment method according to claim 2, characterized in that the Mel-frequency filtering proceeds as follows: the discrete spectrum obtained from the FFT is filtered by a bank of triangular filters, yielding a set of coefficients m₁, m₂, …; the number p of filters is determined by the cutoff frequency of the signal, and together the filters cover the range from 0 Hz up to the Nyquist frequency, i.e. half the sample rate; each mᵢ is computed by expression 5):
where f[i] is the center frequency of the i-th triangular filter and satisfies Mel(f[i+1]) − Mel(f[i]) = Mel(f[i]) − Mel(f[i−1]), and X(k) is the discrete spectrum of the frame signal x(n) after the FFT.
6. The output-based objective speech quality assessment method according to claim 2, characterized in that the discrete cosine transform proceeds as follows: the Mel spectrum produced by the Mel-frequency filtering is transformed to the time domain, yielding the Mel-frequency cepstral coefficients, computed by expression 6):
where MFCC(i) is the i-th Mel-frequency cepstral coefficient, N is the number of samples per frame, and P is the number of filters in the bank.
7. The output-based objective speech quality assessment method according to claim 1, characterized in that the consistency measure is computed using expression 17):
where X₁, …, X_N are the Mel-frequency cepstral coefficient vectors of the distorted speech, N is the number of vectors, and C is the consistency measure between the distorted speech and the model.
8. The output-based objective speech quality assessment method according to claim 1, characterized in that the bit error rate is calculated as follows:
step A, generate a PN sequence and multiply it with a chaotic sequence; the chaotic sequence is generated by the logistic map, defined as:
x_{k+1} = μ x_k (1 − x_k)
where 0 ≤ μ ≤ 4 is called the bifurcation parameter and x_k ∈ (0, 1); when 3.5699456… < μ ≤ 4, the logistic map works in the chaotic state, i.e. the sequence {x_k; k = 0, 1, 2, 3, …} generated from an initial condition under the logistic map is aperiodic, non-convergent, and highly sensitive to the initial value; the monitoring sequence is generated as follows:
step a1, first generate the real-valued sequence and choose a segment, starting at some position, whose length equals the monitoring sequence size;
step a2, convert the real-valued sequence into a binary sequence: define a threshold Γ and obtain from the real-valued sequence the binary chaotic sequence {Γ(x_k); k = 0, 1, 2, 3, …};
step a3, multiply the binary chaotic sequence with a PN sequence to obtain the monitoring sequence;
step B, insert a synchronization code for the monitoring sequence, so that the embedded monitoring sequence can later be extracted frame by frame;
step C, embed the monitoring sequence carrying the synchronization code into the speech signal in the wavelet domain, as follows:
step c1, choose the Daubechies-10 wavelet as the wavelet function;
step c2, divide the speech signal into frames of 1152 samples each, and apply a 3-level wavelet transform to each frame;
step c3, quantize the wavelet coefficients and modulate the monitoring sequence onto them, thereby embedding the monitoring sequence into the speech signal; let f be the coefficient to be quantized, w the bit of the monitoring sequence to be embedded, and Δ the quantization step; the coefficient carrying the monitoring-sequence information after quantization is f′; the specific steps are:
take the modulus and floor of f; when f > 0, with m = ⌊f/Δ⌋ and n = m % 2, then:
when f < 0, with m = ⌊|f|/Δ⌋, n = m % 2, and n = w, then:
the monitoring sequence is embedded into the speech signal bit by bit according to the above formulas;
step c4, convert the signal with the embedded monitoring sequence back into a time-domain signal;
step D, extract the embedded monitoring sequence from the received speech and calculate the bit error rate; the extraction proceeds as follows:
step d1, search for the synchronization code in the speech signal: let L be the length of the signal segment to search; L should exceed the total length of two synchronization codes plus one complete monitoring sequence; starting from the initial search point I = 1, if 16 consecutive sample values of the signal lie in the range 900–1100, a possible synchronization code has been found and is compared against the preset synchronization code; if it is judged to be the synchronization code, point I is the start position of the monitoring sequence; otherwise set I = I + L;
step d2, starting from the start position found, apply the discrete wavelet transform to the speech signal;
step d3, apply to the coefficients f after wavelet decomposition the inverse of the embedding operation, i.e.: when f > 0, m = ⌊f/Δ⌋ and w = m % 2; when f < 0, m = ⌊|f|/Δ⌋ and w = m % 2;
thereby extracting the binary monitoring sequence;
step d4, compare the extracted monitoring sequence with the inserted one and calculate the bit error rate by expression 18):
BER = Hammingweight(Seq_send XOR Seq_receive) / Seq_length   18);
where Seq_send, Seq_receive, and Seq_length denote the sent monitoring sequence, the received monitoring sequence, and the sequence length, respectively; Hammingweight(·) denotes the Hamming weight of a sequence, and XOR denotes the exclusive-or operation.
9. The output-based objective speech quality assessment method according to claim 1, characterized in that the mapping is obtained by expression 19), which predicts the objective MOS score as f(C₁, …, C_N), where f(·) is a multivariate nonlinear regression analysis model, Cᵢ is the consistency measure of the i-th parameter, and N is the number of speech characteristic parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710475912.8A CN107293306B (en) | 2017-06-21 | 2017-06-21 | An output-based objective speech quality assessment method
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710475912.8A CN107293306B (en) | 2017-06-21 | 2017-06-21 | An output-based objective speech quality assessment method
Publications (2)
Publication Number | Publication Date |
---|---|
CN107293306A CN107293306A (en) | 2017-10-24 |
CN107293306B true CN107293306B (en) | 2018-06-15 |
Family
ID=60096759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710475912.8A Active CN107293306B (en) | 2017-06-21 | 2017-06-21 | A kind of appraisal procedure of the Objective speech quality based on output |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107293306B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107818797B (en) * | 2017-12-07 | 2021-07-06 | 苏州科达科技股份有限公司 | Voice quality evaluation method, device and system |
CN108364661B (en) * | 2017-12-15 | 2020-11-24 | 海尔优家智能科技(北京)有限公司 | Visual voice performance evaluation method and device, computer equipment and storage medium |
CN110289014B (en) * | 2019-05-21 | 2021-11-19 | 华为技术有限公司 | Voice quality detection method and electronic equipment |
CN110211566A (en) * | 2019-06-08 | 2019-09-06 | 安徽中医药大学 | A kind of classification method of compressed sensing based hepatolenticular degeneration disfluency |
CN111091816B (en) * | 2020-03-19 | 2020-08-04 | 北京五岳鑫信息技术股份有限公司 | Data processing system and method based on voice evaluation |
CN111968677B (en) * | 2020-08-21 | 2021-09-07 | 南京工程学院 | Voice quality self-evaluation method for fitting-free hearing aid |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7327985B2 (en) * | 2003-01-21 | 2008-02-05 | Telefonaktiebolaget Lm Ericsson (Publ) | Mapping objective voice quality metrics to a MOS domain for field measurements |
- 2017-06-21: Application CN201710475912.8A filed; granted as patent CN107293306B (Active)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1713273A (en) * | 2005-07-21 | 2005-12-28 | 复旦大学 | Algorithm of local robust digital voice-frequency watermark for preventing time size pantography |
CN102044248A (en) * | 2009-10-10 | 2011-05-04 | 北京理工大学 | Objective evaluating method for audio quality of streaming media |
CN101847409A (en) * | 2010-03-25 | 2010-09-29 | 北京邮电大学 | Voice integrity protection method based on digital fingerprint |
CN102881289A (en) * | 2012-09-11 | 2013-01-16 | 重庆大学 | Hearing perception characteristic-based objective voice quality evaluation method |
Non-Patent Citations (1)
Title |
---|
A Speech Quality Evaluation Method Based on Auditory Characteristic; Qingxian Li et al.; Proceedings of the 2016 International Conference on Intelligent Control and Computer Application (ICCA 2016); 2016-01-31; pp. 320–323 *
Also Published As
Publication number | Publication date |
---|---|
CN107293306A (en) | 2017-10-24 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |