CN105957520B - Voice state detection method suitable for echo cancellation systems
- Publication number
- CN105957520B (application number CN201610519040.6A)
- Authority
- CN
- China
- Prior art keywords
- voice
- signal
- piecemeal
- training sample
- gauss
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Abstract
The invention is a voice state detection method suitable for echo cancellation systems and relates to the technical field of voice interaction over IP-based networks. The invention constructs a support vector machine (SVM) classifier from noise training samples and speech training samples. The signals to be detected are the near-end and far-end signals after division into blocks. The constructed SVM classifier, which is based on Gaussian mixture models, makes a VAD decision on the current far-end block. If the decision is that no speech is present, filter updating and filtering are stopped and the near-end speech signal is output directly; if far-end speech is present, a double-talk decision is made. During double-talk, filter coefficient updating is stopped and the near-end signal is filtered; otherwise, filter coefficient updating and filtering are performed according to the far-end signal. The invention improves the accuracy of voice activity detection, avoids misjudging the both-ends-silent state as the double-talk state, and prevents erroneous filter updating and filtering when no reference signal is present.
Description
Technical field
The invention relates to the technical field of voice interaction over IP-based networks, and in particular to a voice state detection method suitable for echo cancellation systems.
Background art
Echo cancellation technology is widely used in IP-based voice interaction systems such as teleconferencing systems, in-vehicle Bluetooth systems and IP telephony. Its purpose is to eliminate the acoustic echo that forms when sound played by the loudspeaker travels along multiple propagation paths, is picked up by the microphone, and is sent back to the far end of the system. The core idea of echo cancellation is to model the echo path with an adaptive filter and subtract the estimated echo signal from the signal picked up by the microphone.
Voice state detection plays a crucial role in echo cancellation. Before a speech signal enters the filter, the current voice state must first be determined, and the working state of the filter is set according to the voice state the system is in. Whether the voice state of the system can be determined quickly and accurately has a large effect on echo cancellation performance.
Existing echo cancellation systems typically apply a DTD (Double Talk Detection) algorithm directly to decide whether the system is in the double-talk state, and stop filter coefficient updating during double-talk to prevent the filter from diverging under interference from near-end speech. A common DTD algorithm, the Geigel algorithm, decides whether near-end speech is present by comparing the amplitudes of the near-end and far-end signals: the system is considered to be in the double-talk state when the ratio $\xi^{(g)}$ of the near-end amplitude to the far-end amplitude exceeds a given value T. That is, when

$$\xi^{(g)} = \frac{|y(k)|}{\max\{|x(k-1)|, \ldots, |x(k-N)|\}} > T$$

near-end speech is considered present and the system is in the double-talk state. Here $|y(k)|$ is the near-end speech amplitude and $\max\{|x(k-1)|, \ldots, |x(k-N)|\}$ is the maximum amplitude over the previous N samples of the far-end speech signal. The threshold T is determined by the echo path attenuation and can usually be taken as 0.5; N is usually equal to the filter length.
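The Geigel test described above can be sketched in a few lines of Python. This is only an illustration: the function name `geigel_double_talk`, its arguments, and the handling of an all-zero far-end window are assumptions made for the example and are not taken from the patent.

```python
def geigel_double_talk(near_sample, far_history, threshold=0.5):
    """Return True when the Geigel test declares near-end speech.

    near_sample -- current near-end sample y(k)
    far_history -- the previous N far-end samples x(k-1) ... x(k-N)
    threshold   -- T, typically 0.5
    """
    far_peak = max(abs(x) for x in far_history)
    if far_peak == 0.0:
        # Assumption: with a silent far-end window the ratio is undefined,
        # so any non-zero near-end sample is reported as near-end speech.
        return abs(near_sample) > 0.0
    return abs(near_sample) / far_peak > threshold
```

Because the test looks only at amplitudes, a quiet near-end talker or a weakly attenuated echo path can push the ratio to the wrong side of T, which is the first shortcoming listed next.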
However, this method has the following shortcomings:
1. The Geigel algorithm assumes that near-end speech is much stronger than the far-end echo signal. This does not fully match the actual conditions of echo cancellation, so the algorithm is not very accurate in some cases.
2. Performing DTD directly, without first performing far-end VAD (Voice Activity Detection), can cause the both-ends-silent state to be misjudged as the double-talk state.
3. Filter coefficient updating is stopped only in the double-talk state. Continuing to filter and update coefficients when no far-end speech is present can cause the filter to diverge and can wrongly subtract non-existent far-end speech from the near-end signal.
Summary of the invention
To overcome the three problems above, the invention proposes a voice state detection method that combines VAD and DTD, and designs a new filtering and update strategy based on the detection result, so as to improve detection accuracy, avoid misjudging the voice state, and prevent erroneous filter updating and filtering.
The voice state detection method suitable for echo cancellation systems provided by the invention is implemented in the following steps:
Step 1: Construct a support vector machine (SVM) classifier from noise training samples and speech training samples.
Feature extraction and Gaussian mixture model (GMM) training are performed separately on the noise training samples and the speech training samples, and the corresponding Gaussian supervectors are constructed. The Gaussian supervectors are used to construct the kernel function of the SVM classifier and the SVM models corresponding to the speech signal and the noise signal. The SVM classifier is then built from the constructed kernel function and SVM models.
Step 2: The signals to be detected are the near-end and far-end signals after division into blocks. The constructed GMM-based SVM classifier makes a VAD decision on the current far-end block.
Feature extraction and GMM training are performed on the current far-end block, and its Gaussian supervector is constructed. This supervector is input to the constructed SVM classifier for a decision. If the block is classified as noise, the decision is that no speech is present; filter updating and filtering are stopped and the near-end speech signal is output directly. Otherwise far-end speech is present, and the double-talk decision of the next step is made.
Step 3: Determine whether the system is in the double-talk state.
The normalized cross-correlation $\xi_{XECC}$ of the far-end signal and the error signal is calculated and compared with the set threshold $T_{XECC}$. When $\xi_{XECC} < T_{XECC}$, near-end speech is present and the system is in the double-talk state; filter coefficient updating is stopped and the near-end signal is filtered. When $\xi_{XECC} \geq T_{XECC}$, no near-end speech is present, and filter coefficient updating and filtering are performed according to the far-end signal.
The advantages and positive effects of the invention are as follows:
(1) Voice activity detection on the far-end signal uses a GMM-based support vector machine algorithm. This improves the accuracy of voice activity detection and overcomes the inaccurate detection at low signal-to-noise ratios that affects common energy-based voice activity detection methods.
(2) Far-end voice activity detection is performed before double-talk detection, and double-talk detection is performed only when far-end speech is present. This avoids misjudging the both-ends-silent state as the double-talk state. Using a cross-correlation-based double-talk detection algorithm improves the accuracy of double-talk detection.
(3) Different filtering and update strategies are applied according to the voice state the system is in. A traditional echo cancellation system stops filter coefficient updating only during double-talk; the invention also stops filter coefficient updating and filtering when there is no far-end speech, which further prevents erroneous filter updating and filtering when no reference signal is present.
Brief description of the drawings
Fig. 1 is an overall flow diagram of the voice state detection method suitable for echo cancellation systems according to the invention;
Fig. 2 is a diagram of the two PCM streams used in the simulation of the embodiment;
Fig. 3 shows the echo cancellation result of the embodiment when only energy-based DTD detection is used;
Fig. 4 shows the echo cancellation result of the embodiment when the method of the invention is used;
Fig. 5 shows the Sipdroid echo cancellation result of the embodiment using the echo cancellation library before improvement;
Fig. 6 shows the Sipdroid echo cancellation result of the embodiment using the improved echo cancellation library.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and embodiments.
The method performs VAD on the far-end signal before DTD. When VAD detects that no far-end speech is present, filter coefficient updating and filtering are stopped directly, to prevent filter divergence and erroneous filtering. When VAD detects far-end speech, DTD is performed, and filter coefficient updating is stopped during double-talk. The VAD algorithm used is an SVM (Support Vector Machine) algorithm based on GMMs (Gaussian Mixture Models). It uses GMMs to construct feature supervectors, and the GMM supervectors serve both as the input features of the SVM and in the construction of its kernel. Its accuracy is higher than that of common VAD algorithms based on energy or correlation. The DTD algorithm used is based on the cross-correlation of the far-end signal and the error signal, and its accuracy is also higher than that of the common energy-based Geigel algorithm. Combining far-end VAD with DTD improves the accuracy of voice state detection. Applying different filtering strategies in different voice states prevents filter divergence and erroneous filtering and substantially improves echo cancellation.
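The control flow described above (far-end VAD first, then DTD, then a per-state filter strategy) can be summarized as a small decision function. The sketch below is illustrative only: the names `Action` and `choose_action` are hypothetical, and the VAD result and the cross-correlation statistic are assumed to be computed elsewhere.

```python
from enum import Enum


class Action(Enum):
    BYPASS = "output the near-end signal unchanged"
    FILTER_ONLY = "filter with frozen coefficients"
    UPDATE_AND_FILTER = "update coefficients and filter"


def choose_action(far_end_has_speech, xi_xecc, t_xecc):
    """Map the VAD result and the DTD statistic to a filter strategy."""
    if not far_end_has_speech:
        # No reference signal: neither update nor filter.
        return Action.BYPASS
    if xi_xecc < t_xecc:
        # Double-talk: keep filtering but freeze the coefficients.
        return Action.FILTER_ONLY
    # Far-end speech only: adapt and filter.
    return Action.UPDATE_AND_FILTER
```

Note that the DTD statistic is consulted only after VAD has confirmed far-end speech, which is what keeps the both-ends-silent case from being reported as double-talk.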
The steps of the voice state detection method suitable for echo cancellation systems according to the invention are described below with reference to Fig. 1.
Step 1: Construct an SVM classifier from noise training samples and speech training samples. This comprises steps S101 to S103.
Step S101: Perform feature extraction on the noise signal training samples and the speech signal training samples. The feature used here is the Mel-frequency cepstral coefficient (MFCC). The MFCC extraction process is as follows. The signal is pre-emphasized, divided into blocks and windowed, and the spectrum of each windowed block is obtained by fast Fourier transform (FFT). The spectrum of each block is passed through a Mel-scale filter bank made up of K triangular band-pass filters, numbered from 0 to K-1. The logarithm of the output of each band is taken to obtain the logarithmic energy of each output, giving K log-spectrum values for each block of the speech signal. K is a positive integer, typically 20 to 30. Finally, a cosine transform is applied to the K log-spectrum values to obtain the Mel cepstral coefficients. The discrete cosine transform that converts the log spectrum to the cepstral domain and yields the Mel cepstral coefficients is given by formula (1),
where $S_i(k)$ is the log spectrum obtained after the i-th signal block passes through band-pass filter number k, K is the number of Mel band-pass filters, $m_i(l)$ is the l-th order MFCC parameter of the i-th speech signal block, L is the total order of the extracted MFCC, and i in formula (1) is the block index, a positive integer.
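The final cosine-transform stage can be sketched as follows. Formula (1) itself is not reproduced in this text, so the sketch assumes the common unnormalized DCT-II form $m_i(l) = \sum_{k=0}^{K-1} S_i(k)\cos\big(\pi l (k + 0.5)/K\big)$ for $l = 1, \ldots, L$; the patent's formula may differ by a scale factor or an index offset. The function name `mel_cepstrum` is hypothetical.

```python
import math


def mel_cepstrum(log_spectrum, num_coeffs):
    """Return MFCC orders 1..num_coeffs from K log Mel-band energies.

    Assumption: unnormalized DCT-II; log_spectrum[k] plays the role of S_i(k).
    """
    K = len(log_spectrum)
    return [
        sum(s * math.cos(math.pi * l * (k + 0.5) / K)
            for k, s in enumerate(log_spectrum))
        for l in range(1, num_coeffs + 1)
    ]
```

Under this form a flat log spectrum produces zero for every order from 1 to K-1, so the coefficients describe the shape of the spectral envelope rather than its overall level.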
Step S102: Generate the Gaussian supervectors corresponding to the noise signal training samples and the speech signal training samples.
The MFCC parameters of the noise signal training samples and of the speech signal training samples are used separately to build the Gaussian mixture models for the noise signal and the speech signal. A GMM is essentially a multidimensional probability density function: an N-th order Gaussian mixture model g(x) describes the distribution of frame features in feature space as a linear combination of N single Gaussian distributions. For a given block, g(x) is expressed as:

$$g(x) = \sum_{i=1}^{N} w_i\, p_i(x) \qquad (2)$$

where x is the L-dimensional feature vector formed by the MFCC parameters of this block of the training sample, N is the order of the Gaussian mixture model, $p_i(x)$ is the i-th Gaussian component of the model, and $w_i$ is the weighting factor of component $p_i(x)$. $p_i(x)$ is expressed as:

$$p_i(x) = \frac{1}{(2\pi)^{L/2}\,|\Sigma_i|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu_i)^{T}\,\Sigma_i^{-1}\,(x-\mu_i)\right) \qquad (3)$$

where $\Sigma_i$ is the covariance matrix of the i-th Gaussian component and $\mu_i$ is its mean vector. The parameter set $\lambda$ of the GMM can therefore be expressed as:

$$\lambda = (w_i, \mu_i, \Sigma_i), \quad i = 1, 2, \ldots, N \qquad (4)$$

The corresponding Gaussian mixture model g(x) can be written as:

$$g(x) = \sum_{i=1}^{N} w_i\, \mathcal{N}(x;\, \mu_i, \Sigma_i) \qquad (5)$$

where $\mathcal{N}(\cdot)$ denotes the Gaussian probability density function.
Building a GMM is in fact the process of estimating its parameters through training. The expectation-maximization (EM) algorithm can be used to update the model parameters. The algorithm has two main steps: the expectation (E) step and the maximization (M) step. The E step uses the current parameter set to compute the expected value of the complete-data likelihood function, and the M step obtains new parameters by maximizing this expectation. The E and M steps are iterated until convergence. This finally yields separate GMMs for speech and noise, denoted g(s) and g(n), where s denotes the speech signal and n denotes the noise signal.
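To make the E and M steps concrete, the sketch below performs one EM iteration for a one-dimensional Gaussian mixture. It is a simplified illustration and not the patent's training code: real MFCC features are L-dimensional, and the function name `em_step` and the variance floor are assumptions added for the example.

```python
import math


def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D Gaussian mixture; returns new parameters."""
    n_comp = len(weights)
    # E step: responsibility of each component for each sample.
    resp = []
    for x in data:
        joint = [
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        ]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M step: re-estimate weights, means and variances from responsibilities.
    new_w, new_m, new_v = [], [], []
    for i in range(n_comp):
        r_sum = sum(r[i] for r in resp)
        mean = sum(r[i] * x for r, x in zip(resp, data)) / r_sum
        var = sum(r[i] * (x - mean) ** 2 for r, x in zip(resp, data)) / r_sum
        new_w.append(r_sum / len(data))
        new_m.append(mean)
        new_v.append(max(var, 1e-6))  # assumed floor to avoid zero variance
    return new_w, new_m, new_v
```

Calling `em_step` repeatedly, feeding each result back in, corresponds to iterating the E and M steps until the parameters stop changing.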
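The Gaussian supervectors are constructed from the trained Gaussian mixture models. A Gaussian supervector is built from the parameters of a Gaussian mixture model. The GMM Gaussian supervectors of speech and noise, $m_s$ and $m_n$, are expressed as:

$$m_s = \left[(\mu_1^{s})^{T},\ (\mu_2^{s})^{T},\ \ldots,\ (\mu_N^{s})^{T}\right]^{T} \qquad (6)$$

$$m_n = \left[(\mu_1^{n})^{T},\ (\mu_2^{n})^{T},\ \ldots,\ (\mu_N^{n})^{T}\right]^{T} \qquad (7)$$

where $\mu_1^{s}, \ldots, \mu_N^{s}$ are the mean vectors of the Gaussian components of g(s) and $\mu_1^{n}, \ldots, \mu_N^{n}$ are the mean vectors of the Gaussian components of g(n).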
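As a small illustration of the supervector construction, assuming (as the text describes) that the supervector is formed from the component mean vectors, the N mean vectors of length L can be concatenated into one vector of length N·L. The function name `gaussian_supervector` is hypothetical.

```python
def gaussian_supervector(means):
    """Concatenate the N component mean vectors into one N*L-dimensional list."""
    return [value for mean in means for value in mean]
```

For example, two components with two-dimensional means `[1.0, 2.0]` and `[3.0, 4.0]` give the four-dimensional supervector `[1.0, 2.0, 3.0, 4.0]`.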
Step S103: Construct the SVM classifier from the Gaussian supervectors. The Gaussian supervectors $m_n$ and $m_s$ of the noise signal and the speech signal are used to build the SVM models corresponding to the noise signal and the speech signal, and are also used to construct the K-L kernel function. The kernel function is built from the K-L divergence between the two GMM probability distributions.
The kernel function K(n, s) constructed from the GMM supervectors $m_n$ and $m_s$ of noise and speech is given by formula (8).
Once the kernel function and the SVM models of the speech signal and the noise signal are determined, the SVM classifier is obtained.
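Formula (8) is not reproduced in this text. For orientation only, the sketch below implements a widely used K-L-divergence-derived supervector kernel, $K(a, b) = \sum_{i=1}^{N} w_i\, (\mu_i^{a})^{T} \Sigma_i^{-1} \mu_i^{b}$, under the assumptions that both GMMs share the same weights and diagonal covariances and differ only in their means. Whether the patent's kernel has exactly this form cannot be confirmed from the text; the function name is hypothetical.

```python
def kl_supervector_kernel(means_a, means_b, weights, variances):
    """Linear supervector kernel derived from a K-L divergence bound.

    Assumption: both GMMs share `weights` and diagonal `variances`.
    """
    total = 0.0
    for mu_a, mu_b, w, var in zip(means_a, means_b, weights, variances):
        total += w * sum(a * b / v for a, b, v in zip(mu_a, mu_b, var))
    return total
```

A kernel of this shape is an inner product of mean vectors scaled by $\sqrt{w_i}\,\Sigma_i^{-1/2}$, so it is symmetric in its two arguments and can be used directly in a standard SVM.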
Step 2: Use the constructed GMM-based SVM classifier to make a VAD decision on the current far-end block. The signals to be detected are the near-end and far-end signals after division into blocks. They are first Fourier-transformed to the frequency domain, and the features of each signal block, such as the MFCC and the normalized cross-correlation, are then computed from the signal spectrum. This comprises steps S201 to S203.
Step S201: Extract the MFCC parameters of the current far-end block. The extraction process is the same as in step S101, and the MFCC parameters of the current far-end block are finally obtained from formula (1).
Step S202: Generate the Gaussian supervector of the current far-end block. A Gaussian mixture model is built from the MFCC parameters of the current far-end block, and the trained model is used to construct the block's Gaussian supervector. The supervector is generated as in step S102, as shown in formulas (6) and (7).
Step S203: Input the Gaussian supervector of the current far-end block to the constructed SVM classifier and perform speech/noise classification with the GMM-based SVM algorithm. This gives the VAD decision for far-end speech. If the block is classified as noise, the decision is that no speech is present; filter updating and filtering are stopped and the near-end speech signal is output directly. If the block is classified as speech, far-end speech is present, and the double-talk decision of the next step is made.
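The classification in step S203 can be pictured with the generic SVM decision rule $f = \sum_j \alpha_j y_j K(m_j, m) + b$, where $m$ is the supervector of the current far-end block and $m_j$ are the support supervectors. The sketch below is a generic illustration rather than the patent's classifier: the label convention (+1 for speech, -1 for noise), the zero decision boundary and the function name `svm_vad_decision` are assumptions, and the kernel values are taken as already computed.

```python
def svm_vad_decision(kernel_values, alphas, labels, bias=0.0):
    """Classify one block from kernel values against the support vectors.

    kernel_values[j] -- K(support supervector j, current block supervector)
    alphas[j]        -- non-negative dual coefficient of support vector j
    labels[j]        -- +1 for speech, -1 for noise (assumed convention)
    """
    score = sum(a * y * k for a, y, k in zip(alphas, labels, kernel_values))
    score += bias
    return "speech" if score >= 0.0 else "noise"
```

A result of `"noise"` corresponds to the no-far-end-speech branch (stop updating and filtering), while `"speech"` hands the block on to the double-talk decision.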
Step 3: Determine whether the system is in the double-talk state.
Step S301: Calculate the error signal.
The adaptive filter coefficients model the echo path, so convolving the current far-end block with the adaptive filter coefficients gives the estimated echo signal $x^{T}(n)w(n)$. The error signal e(n) is the difference between the current near-end block d(n) and the estimated echo signal:

$$e(n) = d(n) - x^{T}(n)\,w(n)$$

The adaptive filter coefficients are updated continually by an adaptive algorithm using the error signal and the far-end signal. The update formula of one common update algorithm, the LMS algorithm, is:

$$w(n+1) = w(n) + 2\mu\, e(n)\, x(n) \qquad (9)$$

where $\mu$ is the step size, w(n) is the filter weight vector, e(n) is the error signal, and x(n) is the far-end signal. n denotes the n-th time instant (sample).
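The error computation and the update of formula (9) can be sketched as a single LMS iteration. The function name `lms_step` and the list-based representation are assumptions made for the example; `far_block` is assumed to hold the most recent far-end samples aligned with the filter taps.

```python
def lms_step(weights, far_block, near_sample, step):
    """One LMS iteration: returns (error, updated_weights).

    error   = d(n) - x^T(n) w(n)
    w(n+1)  = w(n) + 2 * step * error * x(n)
    """
    echo_estimate = sum(w * x for w, x in zip(weights, far_block))
    error = near_sample - echo_estimate
    updated = [w + 2.0 * step * error * x for w, x in zip(weights, far_block)]
    return error, updated
```

In the double-talk state the same error would still be computed and output, but the `updated` weights would be discarded so that near-end speech cannot pull the coefficients away from the echo path.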
Step S302: Calculate the normalized cross-correlation of the far-end signal and the error signal. Because cross-correlation in the time domain can be converted to a product in the frequency domain, that is, point-by-point multiplication of the two signal spectra, the normalized cross-correlation can be obtained directly from the far-end signal spectrum X(k) and the error signal spectrum E(k), with lower computational complexity. The frequency-domain calculation of the normalized cross-correlation is given by formula (10),
where $\xi_{XECC}$ denotes the normalized cross-correlation of the far-end signal and the error signal, and k denotes the frequency bin.
Step S303: DTD decision. Compare the normalized cross-correlation $\xi_{XECC}$ of the far-end signal and the error signal with the normalized cross-correlation threshold. When no near-end speech is present, $\xi_{XECC}$ should equal 1; when near-end speech is present, $\xi_{XECC}$ is less than 1. A constant $T_{XECC}$ slightly less than 1 can therefore be set as the threshold. $T_{XECC}$ usually takes a value between 0.9 and 1, and the threshold is updated in real time according to the detection results. The update algorithm is chosen according to the actual situation. A good threshold should make both the false-alarm probability and the miss probability relatively small. For example, an arbitrary constant slightly less than 1 can be chosen first; the near-end speech is then set to 0, the false-alarm probability and the miss probability are calculated, and $T_{XECC}$ is adjusted within a certain range until both probabilities are small.
When the normalized cross-correlation is less than the threshold, that is:

$$\xi_{XECC} < T_{XECC} \qquad (11)$$

the system is in the double-talk state. Filter coefficient updating is stopped, and the near-end signal is filtered directly with the existing filter coefficients. Otherwise no near-end speech is present and only far-end speech exists; in this case both filter coefficient updating and filtering are performed.
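The threshold-selection idea in step S303 can be illustrated with labelled test data. The patent leaves the update algorithm open, so the criterion below (pick the candidate threshold that minimizes the larger of the two error rates) is an assumption chosen for the example, as are the function names.

```python
def error_rates(xi_values, is_double_talk, threshold):
    """Return (false_alarm_rate, miss_rate) of the rule xi < threshold."""
    false_alarms = misses = negatives = positives = 0
    for xi, truth in zip(xi_values, is_double_talk):
        detected = xi < threshold
        if truth:
            positives += 1
            misses += not detected
        else:
            negatives += 1
            false_alarms += detected
    fa_rate = false_alarms / negatives if negatives else 0.0
    miss_rate = misses / positives if positives else 0.0
    return fa_rate, miss_rate


def pick_threshold(xi_values, is_double_talk, candidates):
    """Choose the candidate minimizing the worse of the two error rates."""
    return min(
        candidates,
        key=lambda t: max(error_rates(xi_values, is_double_talk, t)),
    )
```

With statistics near 1 for single-talk blocks and clearly lower values for double-talk blocks, a threshold too close to 1 raises false alarms and one too low raises misses, so the chosen value ends up between the two groups.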
The voice state detection method proposed by the invention was applied in a real echo cancellation system comprising two terminals, and the actual call quality was verified with the VoIP software Sipdroid.
First, the proposed voice state detection method combining VAD and DTD was simulated in MATLAB. The speech signals used in the simulation comprise one 30-second far-end speech PCM (Pulse Code Modulation) stream and one corresponding near-end speech PCM stream, with a sampling frequency of 8000 Hz. In the echo cancellation system the filter length is set to 128, the adaptive filtering algorithm is the BFDAF algorithm (that is, the frequency-domain NLMS algorithm), and the voice state detection algorithm is the method proposed by the invention.
Fig. 2 shows the two PCM streams used in the simulation: from top to bottom, the far-end signal waveform and the near-end signal waveform. The abscissa is time in seconds; the ordinate is amplitude. With the original voice state detection method, that is, using only energy-based DTD detection, the echo cancellation result is as shown in Fig. 3. The figure shows that, without the improved VAD, echo cancellation in the first half is fairly good, although a small amount of residual echo remains. The result in the second half is less satisfactory: much of the original speech is removed and the signal after echo cancellation is considerably distorted.
With the voice state detection method proposed by the invention, the echo cancellation result is as shown in Fig. 4. Comparing the two PCM streams obtained after echo cancellation before and after the improvement shows that echo cancellation improves markedly once the voice state detection method is improved. Residual echo is removed more thoroughly, and the near-end speech shows almost no distortion.
To further verify the effect of the proposed voice state detection method in a real echo cancellation system, a corresponding C program was written for the method, and the method was tested with the voice communication software Sipdroid.
Following the steps of the voice state detection method of the invention, the parts of the WebRTC echo cancellation library that perform VAD and DTD were modified, and the echo cancellation library was then called from Sipdroid. Real two-way calls were made and recorded with Sipdroid in different environments, and the speech PCM streams before and after echo cancellation were saved for analysis of the echo cancellation effect.
To make the extracted speech streams easier to inspect and analyze, in each test the two callers took turns counting from 1 to 10. In different environments, multiple call tests were made with the Sipdroid versions before and after the improvement for comparison.
First, multiple call tests were made on the echo cancellation performance of Sipdroid using the echo cancellation library before improvement, and the far-end, near-end and echo-cancelled PCM streams were extracted. The test result is shown in Fig. 5, which shows only the PCM streams of the counting part. The first PCM stream is the far-end signal, the second is the near-end signal, and the third is the near-end signal after echo cancellation. The echo cancellation result is less than ideal: some residual echo remains in the counting part, as marked by the dashed boxes. Most of the other test results are similar.
Then the same method was used to run multiple call tests on the echo cancellation performance of Sipdroid using the improved echo cancellation library, and the far-end, near-end and echo-cancelled PCM streams were extracted. Fig. 6 is a representative test result. As in Fig. 5, the first PCM stream in the figure is the far-end signal, the second is the near-end signal, and the third is the near-end signal after echo cancellation. With the improved speech detection method of the invention, the echo cancellation result is fairly good: the residual echo in the counting part is removed more thoroughly, as marked by the dashed boxes, and the original speech is preserved unaffected. Repeated tests show that the echo cancellation effect is affected to some degree by the environment, and its stability needs further improvement. In most cases, however, echo cancellation with the voice state detection method of the invention is clearly better than before the improvement.
Claims (5)
1. A voice status detection method suitable for an echo cancellation system, characterized in that it is implemented by the following steps:
Step 1: construct a support vector machine (SVM) classifier using noise-signal training samples and voice-signal training samples;
Perform feature extraction and Gaussian mixture model (GMM) training on the noise-signal training samples and the voice-signal training samples respectively, and construct the corresponding Gaussian supervectors; then use the Gaussian supervectors to construct the kernel function of the SVM classifier and the SVM models corresponding to the voice signal and the noise signal; the SVM classifier is obtained from the constructed kernel function and SVM models;
Step 2: the signals to be detected are the near-end and far-end signals after block partitioning; use the constructed SVM classifier to make a voice activity detection (VAD) decision on the current far-end block;
Perform feature extraction and GMM training on the current far-end block and construct its Gaussian supervector, then input this Gaussian supervector into the constructed SVM classifier for a decision; if the decision is noise, meaning that no voice is present, stop filter updating and filtering and output the near-end voice signal directly; otherwise the far end contains voice, and the double-talk decision of the next step is performed;
Step 3: determine whether the system is in the double-talk state;
Calculate the normalized cross-correlation $\xi_{XECC}$ between the far-end signal and the error signal; compare $\xi_{XECC}$ with the set threshold $T_{XECC}$; when $\xi_{XECC} < T_{XECC}$, the system is in the double-talk state, so stop updating the filter coefficients and filter the near-end signal; otherwise, the near end contains no voice, and the filter coefficients are updated and filtering is performed according to the far-end signal.
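The per-block decision logic of Steps 2 and 3 can be summarized in a short sketch. This is only an illustration under stated assumptions: the names `Action` and `decide_block` are hypothetical, the VAD result and the value of $\xi_{XECC}$ are taken as already computed inputs, and the case $\xi_{XECC} = T_{XECC}$ is treated as "no double talk" because the claim only specifies the strict inequality.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the three outcomes described in claim 1."""
    PASS_THROUGH = "output the near-end block unchanged"
    FILTER_ONLY = "filter with frozen coefficients"
    ADAPT_AND_FILTER = "update coefficients, then filter"


def decide_block(far_end_has_voice: bool, xi_xecc: float, t_xecc: float) -> Action:
    """Choose the echo-canceller action for one block (Steps 2 and 3)."""
    if not far_end_has_voice:
        # Step 2: the SVM-based VAD judged the far-end block to be noise,
        # so neither filter update nor filtering is performed.
        return Action.PASS_THROUGH
    if xi_xecc < t_xecc:
        # Step 3: correlation below the threshold indicates double talk,
        # so the coefficients are frozen but filtering continues.
        return Action.FILTER_ONLY
    # Far-end voice only: adaptation and filtering both proceed.
    return Action.ADAPT_AND_FILTER
```

Keeping the decision separate from the adaptive filter itself makes the three states easy to test in isolation.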
2. The voice status detection method suitable for an echo cancellation system according to claim 1, characterized in that constructing the SVM classifier in Step 1 comprises the following steps:
Step S101: perform feature extraction on the noise-signal training samples and the voice-signal training samples; the feature used is the Mel-frequency cepstral coefficient (MFCC);
The MFCC extraction process is: apply pre-emphasis, block partitioning and windowing to the signal, and obtain the spectral parameters of each windowed block by the fast Fourier transform (FFT); pass the spectral parameters of each block through a Mel-scale filter bank consisting of K triangular band-pass filters, and take the logarithm of the output of each band to obtain the log spectrum; let the K band-pass filters be numbered from 0 to K-1, so that the log spectrum obtained after the i-th block passes through band-pass filter k is $S_i(k)$; the l-th order MFCC parameter $m_i(l)$ of the i-th block is then:
[equation image not reproduced]
where L is the total order of the extracted MFCC;
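The equation for $m_i(l)$ does not survive in this text, so the following sketch shows one common way the last MFCC step is computed, not necessarily the exact expression of the patent. It assumes an unnormalized type-II discrete cosine transform of the log spectrum $S_i(k)$, $k = 0, \dots, K-1$, evaluated for orders $l = 1, \dots, L$; the function name `mfcc_from_log_spectrum` is hypothetical.

```python
import math


def mfcc_from_log_spectrum(log_spectrum, num_coeffs):
    """Return MFCC orders 1..num_coeffs for one block.

    log_spectrum[k] plays the role of S_i(k) for filters k = 0..K-1.
    ASSUMPTION: unnormalized DCT-II,
        m(l) = sum_k S(k) * cos(pi * l * (k + 0.5) / K),
    which may differ from the patent's formula by a scale factor.
    """
    num_filters = len(log_spectrum)
    coeffs = []
    for order in range(1, num_coeffs + 1):
        total = 0.0
        for k, s_k in enumerate(log_spectrum):
            total += s_k * math.cos(math.pi * order * (k + 0.5) / num_filters)
        coeffs.append(total)
    return coeffs
```

With this form, a perfectly flat log spectrum yields coefficients that are zero up to rounding for orders 1 to K-1, which is a convenient sanity check.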
Step S102: generate the Gaussian supervectors of the noise-signal training samples and the voice-signal training samples;
Use the MFCC parameters of the noise-signal training samples and of the voice-signal training samples respectively to establish the Gaussian mixture models corresponding to the noise signal and the voice signal;
For a given block, the N-th order Gaussian mixture model g(x) is expressed as:
$$g(x) = \sum_{i=1}^{N} w_i \, p_i(x)$$
where x is the L-dimensional feature vector formed by the MFCC parameters of this block of the training sample, $p_i(x)$ is the i-th Gaussian component of the Gaussian mixture model, $w_i$ is the weighting factor of the i-th Gaussian component, $\Sigma_i$ is the covariance matrix of the i-th Gaussian component, and $\mu_i$ is the mean vector of the i-th Gaussian component;
The Gaussian mixture model g(x) is further expressed as:
$$g(x) = \sum_{i=1}^{N} w_i \, \mathcal{N}(x; \mu_i, \Sigma_i)$$
where $\mathcal{N}(\cdot)$ denotes the Gaussian probability density function;
The parameters of the Gaussian mixture model are updated using the EM algorithm; let the Gaussian mixture model finally obtained for the voice-signal training samples be g(s), in which the mean vector of each Gaussian component is $\mu_i^{s}$ ($i = 1, \dots, N$), where s denotes the voice signal; let the Gaussian mixture model finally obtained for the noise-signal training samples be g(n), in which the mean vector of each Gaussian component is $\mu_i^{n}$ ($i = 1, \dots, N$), where n denotes the noise signal; using the established Gaussian mixture models, the Gaussian supervectors $m_s$ and $m_n$ of the voice-signal training samples and the noise-signal training samples are constructed respectively as:
$$m_s = \begin{bmatrix} \mu_1^{s} \\ \mu_2^{s} \\ \vdots \\ \mu_N^{s} \end{bmatrix}, \qquad m_n = \begin{bmatrix} \mu_1^{n} \\ \mu_2^{n} \\ \vdots \\ \mu_N^{n} \end{bmatrix}$$
Step S103: construct the SVM classifier using the constructed Gaussian supervectors;
Use the Gaussian supervectors $m_n$ and $m_s$ respectively to establish the SVM models corresponding to the noise signal and the voice signal;
Use the Gaussian supervectors $m_n$ and $m_s$ to construct the kernel function K(n, s) as follows:
[equation image not reproduced]
Once the kernel function and the SVM models of the voice signal and the noise signal are determined, the SVM classifier is obtained.
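As an illustration of Step S102, a Gaussian supervector is conventionally formed by stacking the mean vectors of the N Gaussian components into a single vector of length N·L. The sketch below assumes the means have already been estimated by EM and are supplied as plain lists; the function name `gaussian_supervector` is hypothetical, and the kernel function of Step S103 is not attempted here because its formula is not available in this text.

```python
def gaussian_supervector(component_means):
    """Concatenate N mean vectors (each of length L) into one N*L vector.

    component_means: list of N lists, one mean vector per Gaussian component.
    ASSUMPTION: the supervector is the plain stack of the means, in component
    order, with no weighting or covariance normalization applied.
    """
    if not component_means:
        raise ValueError("at least one Gaussian component is required")
    dim = len(component_means[0])
    supervector = []
    for mean in component_means:
        if len(mean) != dim:
            raise ValueError("all mean vectors must have the same dimension")
        supervector.extend(mean)
    return supervector


# Example with N = 2 components and L = 2 features per component.
speech_means = [[1.0, 2.0], [3.0, 4.0]]
m_s = gaussian_supervector(speech_means)
```

The same function would be applied to the noise-model means to obtain $m_n$, so that both supervectors share one dimensionality and can be compared by a kernel.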
3. The voice status detection method suitable for an echo cancellation system according to claim 1 or 2, characterized in that, in Step 3, the error signal is calculated as follows: the current far-end block is convolved with the adaptive filter coefficients to obtain the estimated echo signal, and the error signal is the difference between the current near-end block and the estimated echo signal.
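The error-signal computation of claim 3 can be sketched in the time domain as follows. This is a minimal illustration, assuming a direct-form FIR convolution truncated to the block length and treating far-end samples before the block as zero; the function name `error_signal` and the handling of block boundaries are assumptions, since the claim does not specify them.

```python
def error_signal(far_block, near_block, filter_coeffs):
    """Return (echo_estimate, error) for one block.

    echo_estimate[n] = sum_j filter_coeffs[j] * far_block[n - j]
    error[n]         = near_block[n] - echo_estimate[n]
    ASSUMPTION: far-end samples before the start of the block are zero.
    """
    if len(far_block) != len(near_block):
        raise ValueError("far-end and near-end blocks must have equal length")
    echo_estimate = []
    for n in range(len(far_block)):
        acc = 0.0
        for j, w in enumerate(filter_coeffs):
            if n - j >= 0:
                acc += w * far_block[n - j]
        echo_estimate.append(acc)
    error = [d - y for d, y in zip(near_block, echo_estimate)]
    return echo_estimate, error
```

When the filter coefficients match the echo path and the near end is silent, the error is zero; any near-end speech remains in the error signal, which is why it is used for the double-talk decision.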
4. The voice status detection method suitable for an echo cancellation system according to claim 1 or 2, characterized in that, in Step 3, the normalized cross-correlation $\xi_{XECC}$ between the far-end signal and the error signal is calculated according to the following formula:
[equation image not reproduced]
where k denotes the frequency bin, X(k) is the far-end signal spectrum, and E(k) is the error signal spectrum.
5. The voice status detection method suitable for an echo cancellation system according to claim 1 or 2, characterized in that, in Step 3, the set threshold $T_{XECC}$ is a value between 0.9 and 1 and is updated in real time according to the decision results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610519040.6A CN105957520B (en) | 2016-07-04 | 2016-07-04 | A kind of voice status detection method suitable for echo cancelling system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105957520A CN105957520A (en) | 2016-09-21 |
CN105957520B CN105957520B (en) | 2019-10-11 |
Family
ID=56903377
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610519040.6A Active CN105957520B (en) | 2016-07-04 | 2016-07-04 | A kind of voice status detection method suitable for echo cancelling system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105957520B (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108429994B (en) * | 2017-02-15 | 2020-10-09 | 阿里巴巴集团控股有限公司 | Audio identification and echo cancellation method, device and equipment |
CN109215672B (en) * | 2017-07-05 | 2021-11-16 | 苏州谦问万答吧教育科技有限公司 | Method, device and equipment for processing sound information |
CN109309764B (en) * | 2017-07-28 | 2021-09-03 | 北京搜狗科技发展有限公司 | Audio data processing method and device, electronic equipment and storage medium |
CN107888792B (en) * | 2017-10-19 | 2019-09-17 | 浙江大华技术股份有限公司 | A kind of echo cancel method, apparatus and system |
CN109068012B (en) * | 2018-07-06 | 2021-04-27 | 南京时保联信息科技有限公司 | Double-end call detection method for audio conference system |
CN109348072B (en) * | 2018-08-30 | 2021-03-02 | 湖北工业大学 | Double-end call detection method applied to echo cancellation system |
CN109473123B (en) | 2018-12-05 | 2022-05-31 | 百度在线网络技术(北京)有限公司 | Voice activity detection method and device |
CN109379501B (en) * | 2018-12-17 | 2021-12-21 | 嘉楠明芯(北京)科技有限公司 | Filtering method, device, equipment and medium for echo cancellation |
CN109448748B (en) * | 2018-12-17 | 2021-08-03 | 嘉楠明芯(北京)科技有限公司 | Filtering method, device, equipment and medium for echo cancellation |
CN109493878B (en) * | 2018-12-17 | 2021-08-31 | 嘉楠明芯(北京)科技有限公司 | Filtering method, device, equipment and medium for echo cancellation |
CN109547655A (en) * | 2018-12-30 | 2019-03-29 | 广东大仓机器人科技有限公司 | A kind of method of the echo cancellation process of voice-over-net call |
CN111294473B (en) * | 2019-01-28 | 2022-01-04 | 展讯通信(上海)有限公司 | Signal processing method and device |
CN112133324A (en) * | 2019-06-06 | 2020-12-25 | 北京京东尚科信息技术有限公司 | Call state detection method, device, computer system and medium |
CN110246516B (en) * | 2019-07-25 | 2022-06-17 | 福建师范大学福清分校 | Method for processing small space echo signal in voice communication |
CN112614500B (en) * | 2019-09-18 | 2024-06-25 | 北京声智科技有限公司 | Echo cancellation method, device, equipment and computer storage medium |
CN110944089A (en) * | 2019-11-04 | 2020-03-31 | 中移(杭州)信息技术有限公司 | Double-talk detection method and electronic equipment |
CN111049848B (en) | 2019-12-23 | 2021-11-23 | 腾讯科技(深圳)有限公司 | Call method, device, system, server and storage medium |
CN111048118B (en) * | 2019-12-24 | 2022-07-26 | 大众问问(北京)信息科技有限公司 | Voice signal processing method and device and terminal |
CN111161748B (en) * | 2020-02-20 | 2022-09-23 | 百度在线网络技术(北京)有限公司 | Double-talk state detection method and device and electronic equipment |
CN114242106B (en) * | 2020-09-09 | 2024-10-29 | 中车株洲电力机车研究所有限公司 | Voice processing method and device |
CN112637833B (en) * | 2020-12-21 | 2022-10-11 | 新疆品宣生物科技有限责任公司 | Communication terminal information detection method and equipment |
CN113223546A (en) * | 2020-12-28 | 2021-08-06 | 南京愔宜智能科技有限公司 | Audio and video conference system and echo cancellation device for same |
CN113241085B (en) * | 2021-04-29 | 2022-07-22 | 北京梧桐车联科技有限责任公司 | Echo cancellation method, device, equipment and readable storage medium |
CN115273909B (en) * | 2022-07-28 | 2024-07-30 | 歌尔科技有限公司 | Voice activity detection method, device, equipment and computer readable storage medium |
CN117437929B (en) * | 2023-12-21 | 2024-03-08 | 睿云联(厦门)网络通讯技术有限公司 | Real-time echo cancellation method based on neural network |
CN118645113B (en) * | 2024-08-14 | 2024-10-29 | 腾讯科技(深圳)有限公司 | Voice signal processing method, device, equipment, medium and product |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012009047A1 (en) * | 2010-07-12 | 2012-01-19 | Audience, Inc. | Monaural noise suppression based on computational auditory scene analysis |
WO2013040414A1 (en) * | 2011-09-16 | 2013-03-21 | Qualcomm Incorporated | Mobile device context information using speech detection |
CN103151039A (en) * | 2013-02-07 | 2013-06-12 | 中国科学院自动化研究所 | Speaker age identification method based on SVM (Support Vector Machine) |
CN103258532A (en) * | 2012-11-28 | 2013-08-21 | 河海大学常州校区 | Method for recognizing Chinese speech emotions based on fuzzy support vector machine |
CN105657110A (en) * | 2016-02-26 | 2016-06-08 | 深圳Tcl数字技术有限公司 | Voice communication echo cancellation method and device |
Also Published As
Publication number | Publication date |
---|---|
CN105957520A (en) | 2016-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105957520B (en) | A kind of voice status detection method suitable for echo cancelling system | |
US11017791B2 (en) | Deep neural network-based method and apparatus for combining noise and echo removal | |
CN109841206B (en) | Echo cancellation method based on deep learning | |
Carbajal et al. | Multiple-input neural network-based residual echo suppression | |
US9633671B2 (en) | Voice quality enhancement techniques, speech recognition techniques, and related systems | |
CN108447495B (en) | Deep learning voice enhancement method based on comprehensive feature set | |
CN107123430A (en) | Echo cancellation method, device, conference tablet and computer storage medium | |
Zhang et al. | FT-LSTM based complex network for joint acoustic echo cancellation and speech enhancement | |
CN109979476B (en) | Method and device for removing reverberation of voice | |
CN112735456A (en) | Speech enhancement method based on DNN-CLSTM network | |
CN107610712B (en) | Voice enhancement method combining MMSE and spectral subtraction | |
CN109767780A (en) | A kind of audio signal processing method, device, equipment and readable storage medium storing program for executing | |
CN106157964A (en) | A kind of determine the method for system delay in echo cancellor | |
Seidel et al. | Y $^ 2$-Net FCRN for Acoustic Echo and Noise Suppression | |
CN107635082A (en) | A kind of both-end sounding end detecting system | |
Nuthakki et al. | Speech enhancement based on deep convolutional neural network | |
CN106161820B (en) | A kind of interchannel decorrelation method for stereo acoustic echo canceler | |
Sawata et al. | Improving character error rate is not equal to having clean speech: Speech enhancement for asr systems with black-box acoustic models | |
CN112382301A (en) | Noise-containing voice gender identification method and system based on lightweight neural network | |
CN115083431A (en) | Echo cancellation method and device, electronic equipment and computer readable medium | |
CN110148421A (en) | A kind of residual echo detection method, terminal and device | |
CN103971697B (en) | Sound enhancement method based on non-local mean filtering | |
JP2001520764A (en) | Speech analysis system | |
CN101533642B (en) | Method for processing voice signal and device | |
CN112133324A (en) | Call state detection method, device, computer system and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |