CN105957520B - A voice state detection method suitable for echo cancellation systems - Google Patents

Info

Publication number
CN105957520B
Authority
CN
China
Prior art keywords
voice
signal
piecemeal
training sample
gauss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610519040.6A
Other languages
Chinese (zh)
Other versions
CN105957520A (en)
Inventor
王珂
明萌
纪红
李曦
张鹤立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201610519040.6A
Publication of CN105957520A
Application granted
Publication of CN105957520B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

The present invention is a voice state detection method suitable for echo cancellation systems and relates to the technical field of voice interaction over IP networks. A support vector machine (SVM) classifier is constructed from noise training samples and speech training samples. The signals to be detected are the near-end and far-end signals after division into blocks. The constructed SVM classifier, which is based on Gaussian mixture models, performs a VAD decision on the current far-end signal block. If the decision is that no speech is present, filter updating and filtering are stopped and the near-end speech signal is output directly. If the far end is judged to contain speech, a double-talk decision is made: in the double-talk state, filter coefficient updating is stopped and the near-end signal is filtered; otherwise, the filter coefficients are updated and filtering is performed according to the far-end signal. The invention improves the accuracy of voice activity detection, avoids misjudging the both-ends-silent state as the double-talk state, and prevents erroneous filter updating and filtering when no reference signal is present.

Description

A voice state detection method suitable for echo cancellation systems
Technical field
The present invention relates to the technical field of voice interaction over IP networks, and in particular to a voice state detection method suitable for echo cancellation systems.
Background art
Echo cancellation is widely used in IP-network voice interaction systems such as teleconferencing systems, in-vehicle Bluetooth systems, and IP phones. Its purpose is to eliminate the acoustic echo that forms when sound played by the loudspeaker reaches the microphone over multiple propagation paths and is sent back to the far end of the system. The core idea of echo cancellation is to model the echo path with an adaptive filter and to subtract the estimated echo signal from the signal picked up by the microphone.
Voice state detection plays a key role in echo cancellation. Before the speech signal enters the filter, the current voice state must be determined, and the operating state of the filter is then set according to the voice state of the system. Whether the voice state can be determined accurately and quickly has a large effect on echo cancellation performance.
Existing echo cancellation systems typically apply a DTD (Double Talk Detection) algorithm directly to decide whether the system is in the double-talk state, and stop updating the filter coefficients during double-talk so that interference from near-end speech does not cause the filter to diverge. A common DTD algorithm, the Geigel algorithm, decides whether near-end speech is present by comparing the amplitudes of the near-end and far-end signals: the system is considered to be in the double-talk state when the amplitude ratio $\xi^{(g)}$ of the near-end signal to the far-end signal exceeds a given value $T$. That is, when

$$\xi^{(g)} = \frac{|y(k)|}{\max\{|x(k-1)|, \ldots, |x(k-N)|\}} > T$$

near-end speech is considered present and the system is in the double-talk state. Here $|y(k)|$ is the amplitude of the near-end signal, and $\max\{|x(k-1)|, \ldots, |x(k-N)|\}$ is the largest amplitude among the previous $N$ samples of the far-end speech signal. The threshold $T$ is determined by the echo path attenuation and is usually taken as 0.5; $N$ is usually equal to the filter length.
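As an illustration of the Geigel test just described, a minimal Python sketch follows. The function name `geigel_double_talk` and its argument names are illustrative and do not come from the patent; the defaults N = 128 and T = 0.5 simply follow the typical values mentioned above. Writing the comparison as a product rather than a ratio is an implementation choice that avoids dividing by zero when the far end is silent.

```python
def geigel_double_talk(y, x, N=128, T=0.5):
    """Flag sample k as double-talk when |y(k)| > T * max(|x(k-1)|, ..., |x(k-N)|).

    y: near-end (microphone) samples, x: far-end samples, same length.
    Sample 0 has no far-end history and is left unflagged.
    """
    flags = [False] * len(y)
    for k in range(1, len(y)):
        # Largest magnitude among the previous N far-end samples
        peak = max(abs(v) for v in x[max(0, k - N):k])
        # Product form of the ratio test, safe when peak is zero
        flags[k] = abs(y[k]) > T * peak
    return flags
```

When the far end is silent, the far-end peak is close to zero, so even low-level near-end noise exceeds `T * peak` and is flagged as double-talk. This is the kind of misjudgment that far-end VAD is meant to prevent.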
However, this method has the following shortcomings:
1. The Geigel algorithm assumes that the near-end speech is much larger than the far-end echo signal, which does not always hold in practical echo cancellation, so it is inaccurate in some cases.
2. Performing DTD directly without far-end VAD (Voice Activity Detection) may cause the both-ends-silent state to be misjudged as the double-talk state.
3. Filter coefficient updating is stopped only in the double-talk state. Continuing to filter and update the coefficients when no far-end speech is present may cause the filter to diverge, and may wrongly subtract non-existent far-end speech from the near-end signal.
Summary of the invention
To overcome the three problems above, the present invention proposes a voice state detection method that combines VAD and DTD, and designs a new filtering and update strategy based on the detection result, so as to improve detection accuracy, avoid misjudging the voice state, and prevent erroneous filter updating and filtering.
The voice state detection method suitable for echo cancellation systems provided by the invention is implemented in the following steps:
Step 1: Construct a support vector machine (SVM) classifier from noise training samples and speech training samples.
Perform feature extraction and Gaussian mixture model (GMM) training on the noise training samples and the speech training samples respectively, and construct the corresponding Gaussian supervectors. Use the Gaussian supervectors to construct the SVM kernel function and the SVM models corresponding to the speech signal and the noise signal, and build the SVM classifier from the constructed kernel function and SVM models.
Step 2: The signals to be detected are the near-end and far-end signals after blocking. Use the constructed GMM-based SVM classifier to perform a VAD decision on the current far-end signal block.
Perform feature extraction and GMM training on the current far-end signal block and construct its Gaussian supervector. Input this supervector into the constructed SVM classifier for a decision. If the block is classified as noise, the result is that no speech is present: stop filter updating and filtering, and output the near-end speech signal directly. Otherwise the far end contains speech, and the double-talk decision of the next step is made.
Step 3: Decide whether the system is in the double-talk state.
Compute the normalized cross-correlation $\xi_{XECC}$ of the far-end signal and the error signal, and compare it with the set threshold $T_{XECC}$. When $\xi_{XECC} < T_{XECC}$, the near end contains speech and the system is in the double-talk state: stop updating the filter coefficients and filter the near-end signal. When $\xi_{XECC} \geq T_{XECC}$, the near end contains no speech: update the filter coefficients and filter according to the far-end signal.
The advantages and positive effects of the present invention are:
(1) Voice activity detection on the far-end signal uses a support vector machine algorithm based on Gaussian mixture models, which improves the accuracy of voice activity detection and overcomes the inaccuracy of common energy-based voice activity detection methods at low signal-to-noise ratios.
(2) Far-end voice activity detection is performed before double-talk detection, and double-talk detection is performed only when the far end contains speech, which avoids misjudging the both-ends-silent state as the double-talk state. A cross-correlation-based double-talk detection algorithm is used, which improves the accuracy of double-talk detection.
(3) Different filtering and update strategies are applied according to the voice state of the system. Whereas a traditional echo cancellation system stops updating the filter coefficients only during double-talk, the invention also stops coefficient updating and filtering when the far end contains no speech, which further prevents erroneous filter updating and filtering when no reference signal is present.
Brief description of the drawings
Fig. 1 is an overall flow diagram of the voice state detection method suitable for echo cancellation systems of the invention;
Fig. 2 shows the two PCM streams used in the simulation of the embodiment of the invention;
Fig. 3 shows the echo cancellation result of the embodiment when only energy-based DTD is used;
Fig. 4 shows the echo cancellation result of the embodiment when the method of the invention is used;
Fig. 5 shows the Sipdroid echo cancellation result of the embodiment with the echo cancellation library before improvement;
Fig. 6 shows the Sipdroid echo cancellation result of the embodiment with the improved echo cancellation library.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments.
The method of the invention performs VAD on the far-end signal before DTD. When VAD detects that no far-end speech is present, filter coefficient updating and filtering are stopped immediately, to prevent filter divergence and erroneous filtering. When VAD detects far-end speech, DTD is then performed, and filter coefficient updating is stopped during double-talk. The VAD algorithm used is an SVM (Support Vector Machine) algorithm based on the GMM (Gaussian Mixture Model): it uses the GMM to construct feature supervectors, and the GMM supervectors serve both as the feature input and in the kernel of the SVM. Its accuracy is higher than that of common VAD algorithms based on energy or correlation. The DTD algorithm used is based on the cross-correlation of the far-end signal and the error signal, and its accuracy is also higher than that of the common energy-based Geigel algorithm. Combining far-end VAD with DTD improves the accuracy of voice state detection. Applying different filtering strategies in different voice states prevents filter divergence and erroneous filtering, and substantially improves echo cancellation.
Each step of the voice state detection method suitable for echo cancellation systems of the invention is described below with reference to Fig. 1.
Step 1: Construct the SVM classifier from noise training samples and speech training samples. This comprises steps S101 to S103.
Step S101: Perform feature extraction on the noise signal training samples and the speech signal training samples. The feature used here is the Mel-frequency cepstral coefficients (MFCC). The MFCC extraction process is as follows. Apply pre-emphasis, blocking, and windowing to the signal, and compute the spectral parameters of each windowed block with the fast Fourier transform (FFT). Pass the spectral parameters of each block through a Mel-scale filter bank made up of K triangular band-pass filters, numbered 0 to K-1, and take the logarithm of the output of each band to obtain its log energy, which gives K log spectra for each signal block. K is a positive integer, typically 20 to 30. Finally, apply the cosine transform to the K log spectra: the discrete cosine transform maps the log spectrum to the cepstral domain and yields the Mel cepstral coefficients according to formula (1).
In formula (1), $S_i(k)$ is the log spectrum obtained after the i-th signal block passes through band-pass filter number k, K is the number of Mel band-pass filters, $m_i(l)$ is the l-th order parameter of the MFCC of the i-th signal block, L is the total order of the extracted MFCC, and i, a positive integer, is the block index.
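The extraction chain of step S101 can be sketched in plain Python as follows. This is only a sketch under stated assumptions. The expression for formula (1) is not reproduced in this text, so the code assumes the common unnormalized cosine transform $m_i(l) = \sum_{k=0}^{K-1} S_i(k)\cos\left(\pi l (k + 0.5)/K\right)$, $l = 1, \ldots, L$. The Mel mapping $2595\log_{10}(1 + f/700)$, the Hamming window, the pre-emphasis factor 0.97, and the default block length, hop, K and L are likewise assumptions, and all function names are illustrative. A direct DFT keeps the sketch dependency-free; a practical implementation would use an FFT, as the text says.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(K, n_fft, fs):
    """K triangular band-pass filters (numbered 0..K-1) over n_fft//2 + 1 bins."""
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * j / (K + 1)) for j in range(K + 2)]
    bins = [int((n_fft + 1) * f / fs) for f in edges]
    bank = [[0.0] * (n_fft // 2 + 1) for _ in range(K)]
    for k in range(K):
        lo, mid, hi = bins[k], bins[k + 1], bins[k + 2]
        for b in range(lo, mid):          # rising edge of the triangle
            bank[k][b] = (b - lo) / (mid - lo)
        for b in range(mid, hi):          # falling edge of the triangle
            bank[k][b] = (hi - b) / (hi - mid)
    return bank

def power_spectrum(frame):
    """Power spectrum by direct DFT (an FFT gives the same values faster)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * b * t / n)
                    for t in range(n))) ** 2
            for b in range(n // 2 + 1)]

def mfcc(signal, fs=8000, frame_len=256, hop=128, K=20, L=12, alpha=0.97):
    """Return one list of L coefficients per block of the input signal."""
    # Pre-emphasis: s'(t) = s(t) - alpha * s(t-1)
    emph = [signal[0]] + [signal[t] - alpha * signal[t - 1]
                          for t in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    bank = mel_filterbank(K, frame_len, fs)
    coeffs = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [emph[start + t] * window[t] for t in range(frame_len)]
        power = power_spectrum(frame)
        # S_i(k): log energy at the output of Mel filter k (small offset avoids log 0)
        S = [math.log(sum(h * p for h, p in zip(bank[k], power)) + 1e-10)
             for k in range(K)]
        # m_i(l): assumed cosine transform of the K log energies, l = 1..L
        coeffs.append([sum(S[k] * math.cos(math.pi * l * (k + 0.5) / K)
                           for k in range(K))
                       for l in range(1, L + 1)])
    return coeffs
```

Each row of the result is the L-dimensional feature vector x of one block, which is what the GMM training in step S102 takes as input.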
Step S102: Generate the Gaussian supervectors corresponding to the noise signal training samples and the speech signal training samples.
The MFCC parameters of the noise signal training samples and of the speech signal training samples are used to build the Gaussian mixture models corresponding to the noise signal and the speech signal respectively. A GMM is essentially a multi-dimensional probability density function: an N-th order Gaussian mixture model g(x) describes the distribution of block features in feature space as a linear combination of N single Gaussian distributions. For a given block, g(x) is expressed as:

$$g(x) = \sum_{i=1}^{N} w_i\, p_i(x) \qquad (2)$$

where x is the L-dimensional feature vector formed from the MFCC parameters of this block of the training sample, N is the order of the Gaussian mixture model, $p_i(x)$ is the i-th Gaussian component of the model, and $w_i$ is the weight of component $p_i(x)$.
$p_i(x)$ is expressed as:

$$p_i(x) = \frac{1}{(2\pi)^{L/2}\,|\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_i)^{T}\,\Sigma_i^{-1}\,(x-\mu_i)\right) \qquad (3)$$

where $\Sigma_i$ is the covariance matrix of the i-th Gaussian component and $\mu_i$ is its mean vector. The parameter set $\lambda$ of the GMM can therefore be written as:

$$\lambda = (w_i, \mu_i, \Sigma_i), \quad i = 1, 2, \ldots, N \qquad (4)$$

The corresponding Gaussian mixture model g(x) can be expressed as:

$$g(x) = \sum_{i=1}^{N} w_i\, \mathcal{N}(x;\, \mu_i, \Sigma_i) \qquad (5)$$

where $\mathcal{N}(\cdot)$ denotes the Gaussian probability density function.
Building the GMM is in effect the process of estimating its parameters by training. The model parameters can be updated with the expectation-maximization (EM) algorithm, which has two main steps: the expectation (E) step and the maximization (M) step. The E step uses the current parameter set to compute the expected value of the complete-data likelihood function, and the M step obtains new parameters by maximizing this expectation. The E and M steps are iterated until convergence. This finally yields separate GMMs for speech and noise, denoted g(s) and g(n), where s denotes the speech signal and n denotes the noise signal.
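A minimal sketch of the E step and M step just described, for a GMM with diagonal covariance matrices. The diagonal-covariance restriction, the initialization from randomly chosen samples, the variance floor, the fixed iteration count, and all names are assumptions made for illustration; the patent does not specify them.

```python
import math
import random

def log_gauss_diag(x, mu, var):
    """Log density of a Gaussian with diagonal covariance (vectors as lists)."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (a - m) ** 2 / v
                      for a, m, v in zip(x, mu, var))

def fit_gmm(X, N=2, n_iter=30, seed=0, floor=1e-6):
    """Fit an N-component diagonal GMM to the rows of X with EM.

    Returns (weights, means, variances, log_likelihood_history).
    """
    rng = random.Random(seed)
    T, L = len(X), len(X[0])
    mu = [list(x) for x in rng.sample(X, N)]          # means start at random samples
    gmean = [sum(x[j] for x in X) / T for j in range(L)]
    gvar = [sum((x[j] - gmean[j]) ** 2 for x in X) / T + floor for j in range(L)]
    var = [list(gvar) for _ in range(N)]              # variances start at the global variance
    w = [1.0 / N] * N
    history = []
    for _ in range(n_iter):
        # E step: responsibility of each component for each sample
        resp, total = [], 0.0
        for x in X:
            logs = [math.log(w[i]) + log_gauss_diag(x, mu[i], var[i])
                    for i in range(N)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))  # log g(x)
            resp.append([math.exp(v - norm) for v in logs])
            total += norm
        history.append(total)
        # M step: re-estimate weights, means and variances from the responsibilities
        for i in range(N):
            Ni = sum(r[i] for r in resp) + floor
            w[i] = Ni / (T + N * floor)
            mu[i] = [sum(r[i] * x[j] for r, x in zip(resp, X)) / Ni
                     for j in range(L)]
            var[i] = [sum(r[i] * (x[j] - mu[i][j]) ** 2
                          for r, x in zip(resp, X)) / Ni + floor
                      for j in range(L)]
    return w, mu, var, history
```

Running `fit_gmm` once on the speech MFCC vectors and once on the noise MFCC vectors would give the parameter sets of g(s) and g(n); a convergence test on the log-likelihood history could replace the fixed iteration count.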
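The Gaussian supervectors are constructed from the trained Gaussian mixture models. A Gaussian supervector is built from the parameters of a Gaussian mixture model. The GMM Gaussian supervectors of speech and noise, $m_s$ and $m_n$, are expressed as:

$$m_s = \left[\mu_1^{s}, \mu_2^{s}, \ldots, \mu_N^{s}\right]^{T} \qquad (6)$$

$$m_n = \left[\mu_1^{n}, \mu_2^{n}, \ldots, \mu_N^{n}\right]^{T} \qquad (7)$$

where $\mu_1^{s}, \ldots, \mu_N^{s}$ are the mean vectors of the Gaussian components of g(s), and $\mu_1^{n}, \ldots, \mu_N^{n}$ are the mean vectors of the Gaussian components of g(n).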
Step S103: Construct the SVM classifier from the Gaussian supervectors. The Gaussian supervectors $m_n$ and $m_s$ of the noise signal and the speech signal are used to build the SVM models corresponding to the noise signal and the speech signal, and to construct the K-L kernel function. This kernel function is built from the K-L divergence between two GMM probability distributions.
The kernel function constructed from the GMM supervectors $m_n$ and $m_s$ of noise and speech is denoted K(n, s) (formula (8)).
Once the kernel function and the SVM models of the speech signal and the noise signal have been determined, the SVM classifier is obtained.
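The expression for formula (8) is not reproduced in this text. As a hedged illustration only, the sketch below uses a form that is common in the GMM-supervector SVM literature, in which an upper bound on the K-L divergence between two GMMs leads to the linear kernel $K(a, b) = \sum_{i=1}^{N} \left(\sqrt{w_i}\,\Sigma_i^{-1/2}\mu_i^{a}\right)^{T}\left(\sqrt{w_i}\,\Sigma_i^{-1/2}\mu_i^{b}\right)$. It assumes that the two GMMs share the same weights and diagonal covariances and differ only in their means; the patent does not state this, and the function names are illustrative.

```python
import math

def normalized_supervector(weights, means, variances):
    """Stack sqrt(w_i) * mu_i / sigma_i over all N components into one vector.

    weights: N floats; means, variances: N lists of L floats (diagonal covariances).
    """
    vec = []
    for w, mu, var in zip(weights, means, variances):
        vec.extend(math.sqrt(w) * m / math.sqrt(v) for m, v in zip(mu, var))
    return vec

def supervector_kernel(weights, variances, means_a, means_b):
    """Assumed K-L-divergence-based kernel: inner product of normalized supervectors."""
    a = normalized_supervector(weights, means_a, variances)
    b = normalized_supervector(weights, means_b, variances)
    return sum(p * q for p, q in zip(a, b))
```

Because this kernel is an inner product of fixed-length vectors, it is symmetric and positive semi-definite, so a kernel matrix filled with these values can be passed to any SVM trainer that accepts precomputed kernels.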
Step 2: Use the constructed GMM-based SVM classifier to perform a VAD decision on the current far-end signal block. The signals to be detected that are input to the SVM classifier are the near-end and far-end signals after blocking. The signals are first transformed to the frequency domain with the Fourier transform, and the features of each signal block, such as the MFCC and the normalized cross-correlation, are then computed from the signal spectrum. This step is divided into steps S201 to S203.
Step S201: Extract the MFCC parameters of the current far-end signal block. The extraction process is the same as in step S101, and the MFCC parameters of the current far-end signal block are obtained from formula (1).
Step S202: Generate the Gaussian supervector of the current far-end signal block. Build a Gaussian mixture model from the MFCC parameters of the current far-end signal block, and construct the corresponding Gaussian supervector from this model. The supervector is generated in the same way as in step S102, as shown in formulas (6) and (7).
Step S203: Input the Gaussian supervector of the current far-end signal block into the constructed SVM classifier, and classify the block as speech or noise with the GMM-based SVM algorithm. This gives the VAD decision for the far-end speech. If the block is classified as noise, the result is that no speech is present: stop filter updating and filtering, and output the near-end speech signal directly. If the block is classified as speech, the far end contains speech, and the double-talk decision of the next step is made.
Step 3: Decide whether the system is in the double-talk state.
Step S301: Compute the error signal.
The adaptive filter coefficients model the echo path, so convolving the current far-end signal block with the adaptive filter coefficients gives the estimated echo signal $x^{T}(n)w(n)$. The error signal $e(n)$ is the difference between the current near-end signal block $d(n)$ and the estimated echo signal $x^{T}(n)w(n)$.
The adaptive filter coefficients are updated continuously by an adaptive algorithm using the error signal and the far-end signal. The update formula of one common update algorithm, the LMS algorithm, is:

$$w(n+1) = w(n) + 2\mu\, e(n)\, x(n) \qquad (9)$$

where $\mu$ is the step size, $w(n)$ is the filter weight vector, $e(n)$ is the error signal, $x(n)$ is the far-end signal, and n denotes the n-th time instant (sample).
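The error computation of step S301 and the LMS update of formula (9) can be sketched sample by sample as follows. The function name, the filter length M, and the step size are illustrative assumptions; the sketch uses the plain time-domain LMS named in the text, not the frequency-domain BFDAF algorithm used later in the simulation.

```python
def lms_identify(x, d, M=4, mu=0.01):
    """Adapt an M-tap filter so that x^T(n) w(n) tracks the echo contained in d.

    x: far-end samples, d: near-end (microphone) samples, same length.
    Returns the final weight vector and the list of error samples e(n).
    """
    w = [0.0] * M
    errors = []
    for n in range(M - 1, len(x)):
        # x(n) = [x(n), x(n-1), ..., x(n-M+1)]
        x_vec = [x[n - j] for j in range(M)]
        # Estimated echo x^T(n) w(n) and error e(n) = d(n) - x^T(n) w(n)
        echo_hat = sum(wj * xj for wj, xj in zip(w, x_vec))
        e = d[n] - echo_hat
        # Formula (9): w(n+1) = w(n) + 2 * mu * e(n) * x(n)
        w = [wj + 2 * mu * e * xj for wj, xj in zip(w, x_vec)]
        errors.append(e)
    return w, errors
```

In the scheme of the invention, the weight update line would be skipped during double-talk, and both the update and the subtraction would be skipped when the far end contains no speech.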
Step S302: Compute the normalized cross-correlation of the far-end signal and the error signal. Because cross-correlation in the time domain can be converted into a point-wise product in the frequency domain, that is, the spectral values of the two signals are multiplied bin by bin, the normalized cross-correlation can be obtained directly from the far-end signal spectrum X(k) and the error signal spectrum E(k), which lowers the computational complexity. The frequency-domain calculation of the normalized cross-correlation is given by formula (10),
where $\xi_{XECC}$ denotes the normalized cross-correlation of the far-end signal and the error signal, and k denotes the frequency bin.
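The expression for formula (10) is not reproduced in this text, so the following is only one plausible frequency-domain form, stated as an assumption: $\xi = \left|\sum_k X(k)E^{*}(k)\right| / \sqrt{\sum_k |X(k)|^2 \sum_k |E(k)|^2}$. The function names are illustrative, and a direct DFT stands in for the FFT to keep the sketch dependency-free.

```python
import cmath
import math

def dft(block):
    """Direct DFT of a real block (an FFT gives the same values faster)."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def normalized_xcorr(x_block, e_block):
    """Assumed normalized cross-correlation of far-end block x and error block e.

    Multiplies the two spectra bin by bin and normalizes by the block energies,
    so the result lies between 0 and 1.
    """
    X, E = dft(x_block), dft(e_block)
    num = abs(sum(xk * ek.conjugate() for xk, ek in zip(X, E)))
    den = math.sqrt(sum(abs(xk) ** 2 for xk in X) * sum(abs(ek) ** 2 for ek in E))
    return num / den if den > 0.0 else 0.0
```

With this assumed definition the value is 1 when the error block is a scaled copy of the far-end block and falls toward 0 as the error becomes dominated by a signal unrelated to the far end, which matches the direction of the threshold test in step S303.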
Step S303: DTD decision. Compare the normalized cross-correlation $\xi_{XECC}$ of the far-end signal and the error signal with the normalized cross-correlation threshold. When the near end contains no speech, $\xi_{XECC}$ should equal 1; when the near end contains speech, $\xi_{XECC}$ is less than 1. A constant $T_{XECC}$ slightly less than 1 can therefore be set as the threshold. $T_{XECC}$ usually takes a value between 0.9 and 1, and the threshold is updated in real time according to the detection results. The update algorithm is chosen according to the actual situation. A good threshold should make both the false-alarm probability and the miss probability relatively small. For example, first choose an arbitrary constant slightly less than 1, then set the near-end speech to 0, compute the false-alarm probability and the miss probability, and adjust $T_{XECC}$ within a certain range until both probabilities are small.
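One way to carry out such an offline threshold adjustment is sketched below. It assumes that a set of $\xi_{XECC}$ values with known double-talk labels is available, for example from a simulation in which the near-end speech is controlled. The labelled-data setup, the candidate grid, and the choice of minimizing the sum of the two error rates are all assumptions, not details from the patent.

```python
def error_rates(xi_values, is_double_talk, T):
    """False-alarm and miss rates of the rule 'double-talk if xi < T'.

    False alarm: double-talk declared on a block without near-end speech.
    Miss: double-talk not declared on a block with near-end speech.
    """
    false_alarms = [xi < T for xi, dt in zip(xi_values, is_double_talk) if not dt]
    misses = [xi >= T for xi, dt in zip(xi_values, is_double_talk) if dt]

    def rate(flags):
        return sum(flags) / len(flags) if flags else 0.0

    return rate(false_alarms), rate(misses)

def pick_threshold(xi_values, is_double_talk, candidates):
    """Return the first candidate T with the smallest false-alarm + miss rate."""
    return min(candidates,
               key=lambda T: sum(error_rates(xi_values, is_double_talk, T)))
```

For example, `pick_threshold(xi, labels, [0.90, 0.93, 0.96, 0.99])` scans four thresholds in the 0.9 to 1 range mentioned above.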
When the normalized cross-correlation is less than the threshold, that is, when

$$\xi_{XECC} < T_{XECC} \qquad (11)$$

the system is in the double-talk state: filter coefficient updating is stopped, and the near-end signal is filtered directly with the existing filter coefficients. Otherwise, near-end speech is absent and only far-end speech is present; in that case both the filter coefficient update and the filtering are performed.
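The three voice states and the filter actions attached to them in steps 2 and 3 can be summarized in a short decision function. The enum, the function name, and the default threshold 0.95 (an arbitrary value inside the stated 0.9 to 1 range) are illustrative assumptions.

```python
from enum import Enum

class VoiceState(Enum):
    FAR_END_SILENT = 0   # VAD finds no far-end speech
    DOUBLE_TALK = 1      # far-end speech present and xi below the threshold
    FAR_END_ONLY = 2     # far-end speech present, no near-end speech

def decide(far_end_has_speech, xi_xecc, t_xecc=0.95):
    """Return (state, update_coefficients, apply_filter) for the current block."""
    if not far_end_has_speech:
        # No reference signal: neither update nor filter, pass the near end through
        return VoiceState.FAR_END_SILENT, False, False
    if xi_xecc < t_xecc:
        # Double-talk: freeze the coefficients but keep filtering
        return VoiceState.DOUBLE_TALK, False, True
    # Only far-end speech: update the coefficients and filter
    return VoiceState.FAR_END_ONLY, True, True
```

Because the VAD result is checked first, the cross-correlation is only consulted when the far end contains speech, which is how the method avoids labelling the both-ends-silent state as double-talk.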
The voice state detection method proposed by the invention was applied in a practical echo cancellation system comprising two terminals, and the actual call quality was verified with the VoIP software Sipdroid.
The proposed voice state detection method combining VAD and DTD was first simulated in MATLAB. The speech signals used in the simulation are one 30-second far-end speech PCM (Pulse Code Modulation) stream and one corresponding near-end speech PCM stream, sampled at 8000 Hz. In the echo cancellation system, the filter length is set to 128, the adaptive filtering algorithm is the BFDAF algorithm (that is, the frequency-domain NLMS algorithm), and the voice state detection algorithm is the method proposed by the invention.
Fig. 2 shows the two PCM streams used in the simulation: the far-end signal waveform at the top and the near-end signal waveform below it. The horizontal axis is time in seconds; the vertical axis is amplitude. With the original voice state detection method, that is, with only energy-based DTD, the echo cancellation result is as shown in Fig. 3. As the figure shows, without the improved VAD, echo cancellation in the first half is fairly good, although a small amount of residual echo remains. The second half is less satisfactory: too much of the original speech is removed, and the signal after echo cancellation is noticeably distorted.
With the voice state detection method proposed by the invention, the echo cancellation result is as shown in Fig. 4. Comparing the PCM streams obtained after echo cancellation before and after the improvement shows that echo cancellation improves markedly once the voice state detection method is improved: residual echo is removed more thoroughly, and the near-end speech shows almost no distortion.
To further verify the effect of the proposed voice state detection method in a practical echo cancellation system, a corresponding C program was written for the method, and the method was tested with the voice communication software Sipdroid.
Following the steps of the voice state detection method of the invention, the parts of the echo cancellation library WebRTC that perform VAD and DTD were modified, and the library was then called from Sipdroid. Real two-way calls were made and recorded with Sipdroid in different environments, and the speech PCM streams before and after echo cancellation were saved for analysis of the echo cancellation effect.
To make the extracted speech streams easier and clearer to inspect, the two callers took turns counting aloud from 1 to 10 in every test. In each environment, multiple call tests were made with both the pre-improvement and the improved Sipdroid versions for comparison.
First, multiple call tests were made on the echo cancellation of Sipdroid with the echo cancellation library before improvement, and the far-end, near-end, and echo-cancelled PCM streams were extracted. The test result is shown in Fig. 5, which shows only the PCM streams of the counting part. The first PCM stream is the far-end signal, the second is the near-end signal, and the third is the near-end signal after echo cancellation. The echo cancellation is less than satisfactory: some residual echo remains in the counting part, as marked by the dashed boxes. Most of the other test results are similar.
Then, multiple call tests were made in the same way on the echo cancellation of Sipdroid with the improved echo cancellation library, and the far-end, near-end, and echo-cancelled PCM streams were extracted. Fig. 6 shows a representative test result. As in Fig. 5, the first PCM stream is the far-end signal, the second is the near-end signal, and the third is the near-end signal after echo cancellation. With the improved voice detection method of the invention, echo cancellation is fairly good: the residual echo in the counting part is removed more thoroughly, as marked by the dashed boxes, while the original speech is retained unaffected. Repeated tests show that the echo cancellation effect varies to some extent between environments, and its stability needs further improvement. In most cases, however, echo cancellation with the voice state detection method of the invention is clearly better than before the improvement.

Claims (5)

1. A voice state detection method suitable for echo cancellation systems, characterized in that it is implemented in the following steps:
Step 1: construct a support vector machine (SVM) classifier from noise signal training samples and speech signal training samples;
perform feature extraction and Gaussian mixture model (GMM) training on the noise signal training samples and the speech signal training samples respectively, and construct the corresponding Gaussian supervectors; then use the Gaussian supervectors to construct the kernel function of the SVM classifier and the SVM models corresponding to the speech signal and the noise signal; build the SVM classifier from the constructed kernel function and SVM models;
Step 2: the signals to be detected are the near-end and far-end signals after blocking; use the constructed SVM classifier to perform a VAD decision on the current far-end signal block, where VAD denotes voice activity detection;
perform feature extraction and GMM training on the current far-end signal block and construct its Gaussian supervector; then input the Gaussian supervector of the current far-end signal block into the constructed SVM classifier for a decision; if the result is noise, meaning that no speech is present, stop filter updating and filtering and output the near-end speech signal directly; otherwise the far end contains speech, and the double-talk decision of the next step is made;
Step 3: decide whether the system is in the double-talk state;
compute the normalized cross-correlation $\xi_{XECC}$ of the far-end signal and the error signal; compare $\xi_{XECC}$ with the set threshold $T_{XECC}$; when $\xi_{XECC} < T_{XECC}$, the system is in the double-talk state, filter coefficient updating is stopped, and the near-end signal is filtered; otherwise, the near end contains no speech, and the filter coefficients are updated and filtering is performed according to the far-end signal.
2. The voice state detection method suitable for an echo cancellation system according to claim 1, characterized in that constructing the SVM classifier in Step 1 comprises the following steps:
Step S101: perform feature extraction on the noise signal training samples and the speech signal training samples; the feature used is the Mel-frequency cepstral coefficient (MFCC);
The MFCC extraction process is as follows: the signal is pre-emphasized, divided into blocks, and windowed, and the spectral parameters of each windowed block are obtained by a fast Fourier transform (FFT); the spectral parameters of each block are passed through a Mel-scale filter bank composed of K triangular band-pass filters, and the logarithm of each band output is taken to obtain the log spectrum. Let the K band-pass filters be numbered from 0 to K-1, and let S_i(k) be the log spectrum obtained after the i-th block passes through band-pass filter k; the l-th order parameter m_i(l) of the MFCC of the i-th block is then:
Wherein L is the total order of the extracted MFCC;
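The extraction chain described in Step S101 (pre-emphasis, blocking, windowing, FFT, Mel filter bank, logarithm, cepstral transform) can be sketched with NumPy as follows. The formula for m_i(l) is not reproduced in this text, so the sketch assumes the commonly used cosine-transform form m_i(l) = Σ_k S_i(k) cos(π l (k + 0.5) / K) for l = 1, ..., L. The sample rate, block length, hop size, Hamming window, pre-emphasis factor, and all function names are illustrative assumptions, not values taken from the patent.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """K triangular band-pass filters spaced evenly on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((num_filters, n_fft // 2 + 1))
    for k in range(num_filters):
        left, center, right = bins[k], bins[k + 1], bins[k + 2]
        for b in range(left, center):      # rising edge of the triangle
            bank[k, b] = (b - left) / max(center - left, 1)
        for b in range(center, right):     # falling edge of the triangle
            bank[k, b] = (right - b) / max(right - center, 1)
    return bank


def mfcc(signal, sample_rate=8000, block_len=256, hop=128,
         num_filters=24, num_ceps=12, pre_emphasis=0.97):
    """Return an array of shape (num_blocks, L) holding m_i(l), l = 1..L."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: first-order high-pass filter.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    # Blocking and windowing (Hamming window assumed).
    num_blocks = 1 + (len(emphasized) - block_len) // hop
    idx = np.arange(block_len)[None, :] + hop * np.arange(num_blocks)[:, None]
    blocks = emphasized[idx] * np.hamming(block_len)
    # Power spectrum of each block via the FFT.
    power = np.abs(np.fft.rfft(blocks, n=block_len, axis=1)) ** 2
    # Mel filter bank followed by the logarithm gives S_i(k).
    bank = mel_filterbank(num_filters, block_len, sample_rate)
    log_spec = np.log(power @ bank.T + 1e-10)
    # Assumed cosine transform over the K filter outputs.
    l = np.arange(1, num_ceps + 1)[:, None]
    k = np.arange(num_filters)[None, :]
    basis = np.cos(np.pi * l * (k + 0.5) / num_filters)
    return log_spec @ basis.T
```

Each row of the returned array is the L-dimensional feature vector x of one block, which is the input to the Gaussian mixture model of Step S102.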
Step S102: generate the Gaussian supervectors of the noise signal training samples and the speech signal training samples;
The MFCC parameters of the noise signal training samples and of the speech signal training samples are used to build the Gaussian mixture models corresponding to the noise signal and the speech signal, respectively;
For a given block, the N-th order Gaussian mixture model g(x) is expressed as:
$$g(x) = \sum_{i=1}^{N} w_i \, p_i(x)$$
Wherein x is the L-dimensional feature vector formed by the MFCC parameters of this block of the training sample, p_i(x) is the i-th Gaussian component of the Gaussian mixture model, and w_i is the weighting factor of the i-th Gaussian component; Σ_i is the covariance matrix of the i-th Gaussian component, and μ_i is the mean vector of the i-th Gaussian component;
The Gaussian mixture model g(x) is further expressed as:
$$g(x) = \sum_{i=1}^{N} w_i \, N(x; \mu_i, \Sigma_i)$$
where N(·) denotes the Gaussian probability density function;
The parameters of the Gaussian mixture model are updated using the EM algorithm. Let the resulting Gaussian mixture model of the speech signal training samples be g(s), in which the mean vectors of the Gaussian components are $\mu_i^{s}$, i = 1, ..., N, where s denotes the speech signal; let the resulting Gaussian mixture model of the noise signal training samples be g(n), in which the mean vectors of the Gaussian components are $\mu_i^{n}$, i = 1, ..., N, where n denotes the noise signal. The Gaussian supervectors m_s and m_n of the speech signal training samples and the noise signal training samples are constructed from the established Gaussian mixture models by stacking the component mean vectors:
$$m_s = \left[ (\mu_1^{s})^T, \ldots, (\mu_N^{s})^T \right]^T, \qquad m_n = \left[ (\mu_1^{n})^T, \ldots, (\mu_N^{n})^T \right]^T$$
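As an illustration of Step S102, the following NumPy sketch fits a Gaussian mixture model with the EM algorithm and stacks the component means into a supervector. It simplifies the claim in one respect that should be read as an assumption: the covariance matrices are restricted to diagonal form to keep the code short. The function names, the iteration count, and the initialization from randomly chosen samples are likewise illustrative choices, not details from the patent.

```python
import numpy as np


def fit_diag_gmm(features, num_components, num_iters=20, seed=0):
    """Fit a diagonal-covariance GMM to features of shape (n, d) with EM."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    # Initialize the means from randomly chosen samples.
    means = features[rng.choice(n, num_components, replace=False)].copy()
    variances = np.tile(features.var(axis=0) + 1e-6, (num_components, 1))
    weights = np.full(num_components, 1.0 / num_components)
    for _ in range(num_iters):
        # E-step: responsibilities from log(w_i) + log N(x; mu_i, diag(var_i)).
        diff = features[:, None, :] - means[None, :, :]
        log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                           + np.sum(np.log(2.0 * np.pi * variances), axis=1))
        log_resp = np.log(weights) + log_prob
        log_resp -= log_resp.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ features) / nk[:, None]
        diff = features[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + 1e-6
    return weights, means, variances


def gaussian_supervector(means):
    """Stack the N component mean vectors (each of length L) into one vector."""
    return means.reshape(-1)
```

With N components and L-dimensional MFCC vectors, `gaussian_supervector` returns a vector of length N·L, which is the quantity passed to the SVM in Step S103.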
Step S103: construct the SVM classifier from the constructed Gaussian supervectors;
The Gaussian supervectors m_n and m_s are used to build the SVM models corresponding to the noise signal and the speech signal, respectively;
The kernel function K(n, s) is constructed from the Gaussian supervectors m_n and m_s as follows:
With the kernel function and the SVM models of the speech signal and the noise signal determined, the SVM classifier is obtained.
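The kernel formula itself is not reproduced in this text. As a hedged illustration, the sketch below implements one kernel that is widely used with GMM supervectors, the linear kernel on weight- and variance-normalized means, K(a, b) = Σ_i w_i (μ_i^a)^T Σ_i^{-1} μ_i^b. Whether the patent uses this exact form is an assumption; the function name and the restriction to diagonal covariances are also illustrative.

```python
import numpy as np


def supervector_kernel(means_a, means_b, weights, variances):
    """Linear GMM-supervector kernel (assumed form, diagonal covariances).

    means_a, means_b: arrays of shape (N, L) holding the component means
    weights:          array of shape (N,) with the mixture weights w_i
    variances:        array of shape (N, L) with the diagonal of each Sigma_i
    """
    means_a = np.asarray(means_a, dtype=float)
    means_b = np.asarray(means_b, dtype=float)
    weights = np.asarray(weights, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Scale each mean by sqrt(w_i) * Sigma_i^(-1/2), then take the inner product.
    scale = np.sqrt(weights)[:, None] / np.sqrt(variances)
    return float(np.sum((scale * means_a) * (scale * means_b)))
```

Because this kernel is an inner product of scaled supervectors, it is symmetric and positive semidefinite, so a Gram matrix built from it could be handed to any SVM solver that accepts a precomputed kernel.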
3. The voice state detection method suitable for an echo cancellation system according to claim 1 or 2, characterized in that, in Step 3, the error signal is calculated as follows: the current far-end signal block is convolved with the adaptive filter coefficients to obtain the estimated echo signal, and the error signal is the difference between the current near-end signal block and the estimated echo signal.
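The error-signal computation of claim 3 can be sketched in a few lines of NumPy. The function name is hypothetical, and the sketch makes one simplifying assumption: the convolution is truncated to the block length and ignores far-end samples from earlier blocks, which a streaming implementation would need to carry over.

```python
import numpy as np


def error_signal(far_end_block, near_end_block, filter_coeffs):
    """Near-end block minus the echo estimated from the far-end block."""
    far_end_block = np.asarray(far_end_block, dtype=float)
    near_end_block = np.asarray(near_end_block, dtype=float)
    # Estimated echo: far-end block convolved with the adaptive filter taps,
    # truncated to the length of the near-end block.
    estimated_echo = np.convolve(far_end_block, filter_coeffs)[:len(near_end_block)]
    return near_end_block - estimated_echo
```

When the filter taps match the true echo path and the near end is silent, the returned error is zero; any near-end speech appears in the error unchanged.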
4. The voice state detection method suitable for an echo cancellation system according to claim 1 or 2, characterized in that, in Step 3, the normalized cross-correlation ξ_XECC between the far-end signal and the error signal is calculated according to the following formula:
Wherein k denotes the frequency bin, X(k) is the far-end signal spectrum, and E(k) is the error signal spectrum.
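The formula for ξ_XECC is not reproduced in this text, so the following is only a sketch of one plausible frequency-domain normalization, |Σ_k X(k) E*(k)| / sqrt(Σ_k |X(k)|² · Σ_k |E(k)|²), which is bounded between 0 and 1. This form, the use of a real FFT over a single block, and the function name are assumptions and should not be read as the patent's definition.

```python
import numpy as np


def normalized_cross_correlation(far_end_block, error_block):
    """Assumed normalized cross-correlation of the spectra X(k) and E(k)."""
    X = np.fft.rfft(np.asarray(far_end_block, dtype=float))
    E = np.fft.rfft(np.asarray(error_block, dtype=float))
    numerator = np.abs(np.sum(X * np.conj(E)))
    denominator = np.sqrt(np.sum(np.abs(X) ** 2) * np.sum(np.abs(E) ** 2))
    if denominator == 0.0:
        return 0.0  # at least one of the blocks is all zeros
    return float(numerator / denominator)
```

The returned value would then be compared with the threshold T_XECC as described in Step 3.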
5. The voice state detection method suitable for an echo cancellation system according to claim 1 or 2, characterized in that, in Step 3, the set threshold T_XECC takes a value between 0.9 and 1 and is updated in real time according to the decision result.
CN201610519040.6A 2016-07-04 2016-07-04 A kind of voice status detection method suitable for echo cancelling system Active CN105957520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610519040.6A CN105957520B (en) 2016-07-04 2016-07-04 A kind of voice status detection method suitable for echo cancelling system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610519040.6A CN105957520B (en) 2016-07-04 2016-07-04 A kind of voice status detection method suitable for echo cancelling system

Publications (2)

Publication Number Publication Date
CN105957520A CN105957520A (en) 2016-09-21
CN105957520B true CN105957520B (en) 2019-10-11

Family

ID=56903377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610519040.6A Active CN105957520B (en) 2016-07-04 2016-07-04 A kind of voice status detection method suitable for echo cancelling system

Country Status (1)

Country Link
CN (1) CN105957520B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108429994B (en) * 2017-02-15 2020-10-09 阿里巴巴集团控股有限公司 Audio identification and echo cancellation method, device and equipment
CN109215672B (en) * 2017-07-05 2021-11-16 苏州谦问万答吧教育科技有限公司 Method, device and equipment for processing sound information
CN109309764B (en) * 2017-07-28 2021-09-03 北京搜狗科技发展有限公司 Audio data processing method and device, electronic equipment and storage medium
CN107888792B (en) * 2017-10-19 2019-09-17 浙江大华技术股份有限公司 A kind of echo cancel method, apparatus and system
CN109068012B (en) * 2018-07-06 2021-04-27 南京时保联信息科技有限公司 Double-end call detection method for audio conference system
CN109348072B (en) * 2018-08-30 2021-03-02 湖北工业大学 Double-end call detection method applied to echo cancellation system
CN109473123B (en) 2018-12-05 2022-05-31 百度在线网络技术(北京)有限公司 Voice activity detection method and device
CN109379501B (en) * 2018-12-17 2021-12-21 嘉楠明芯(北京)科技有限公司 Filtering method, device, equipment and medium for echo cancellation
CN109448748B (en) * 2018-12-17 2021-08-03 嘉楠明芯(北京)科技有限公司 Filtering method, device, equipment and medium for echo cancellation
CN109493878B (en) * 2018-12-17 2021-08-31 嘉楠明芯(北京)科技有限公司 Filtering method, device, equipment and medium for echo cancellation
CN109547655A (en) * 2018-12-30 2019-03-29 广东大仓机器人科技有限公司 A kind of method of the echo cancellation process of voice-over-net call
CN111294473B (en) * 2019-01-28 2022-01-04 展讯通信(上海)有限公司 Signal processing method and device
CN112133324A (en) * 2019-06-06 2020-12-25 北京京东尚科信息技术有限公司 Call state detection method, device, computer system and medium
CN110246516B (en) * 2019-07-25 2022-06-17 福建师范大学福清分校 Method for processing small space echo signal in voice communication
CN112614500B (en) * 2019-09-18 2024-06-25 北京声智科技有限公司 Echo cancellation method, device, equipment and computer storage medium
CN110944089A (en) * 2019-11-04 2020-03-31 中移(杭州)信息技术有限公司 Double-talk detection method and electronic equipment
CN111049848B (en) 2019-12-23 2021-11-23 腾讯科技(深圳)有限公司 Call method, device, system, server and storage medium
CN111048118B (en) * 2019-12-24 2022-07-26 大众问问(北京)信息科技有限公司 Voice signal processing method and device and terminal
CN111161748B (en) * 2020-02-20 2022-09-23 百度在线网络技术(北京)有限公司 Double-talk state detection method and device and electronic equipment
CN114242106B (en) * 2020-09-09 2024-10-29 中车株洲电力机车研究所有限公司 Voice processing method and device
CN112637833B (en) * 2020-12-21 2022-10-11 新疆品宣生物科技有限责任公司 Communication terminal information detection method and equipment
CN113223546A (en) * 2020-12-28 2021-08-06 南京愔宜智能科技有限公司 Audio and video conference system and echo cancellation device for same
CN113241085B (en) * 2021-04-29 2022-07-22 北京梧桐车联科技有限责任公司 Echo cancellation method, device, equipment and readable storage medium
CN115273909B (en) * 2022-07-28 2024-07-30 歌尔科技有限公司 Voice activity detection method, device, equipment and computer readable storage medium
CN117437929B (en) * 2023-12-21 2024-03-08 睿云联(厦门)网络通讯技术有限公司 Real-time echo cancellation method based on neural network
CN118645113B (en) * 2024-08-14 2024-10-29 腾讯科技(深圳)有限公司 Voice signal processing method, device, equipment, medium and product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012009047A1 (en) * 2010-07-12 2012-01-19 Audience, Inc. Monaural noise suppression based on computational auditory scene analysis
WO2013040414A1 (en) * 2011-09-16 2013-03-21 Qualcomm Incorporated Mobile device context information using speech detection
CN103151039A (en) * 2013-02-07 2013-06-12 中国科学院自动化研究所 Speaker age identification method based on SVM (Support Vector Machine)
CN103258532A (en) * 2012-11-28 2013-08-21 河海大学常州校区 Method for recognizing Chinese speech emotions based on fuzzy support vector machine
CN105657110A (en) * 2016-02-26 2016-06-08 深圳Tcl数字技术有限公司 Voice communication echo cancellation method and device

Also Published As

Publication number Publication date
CN105957520A (en) 2016-09-21

Similar Documents

Publication Publication Date Title
CN105957520B (en) A kind of voice status detection method suitable for echo cancelling system
US11017791B2 (en) Deep neural network-based method and apparatus for combining noise and echo removal
CN109841206B (en) Echo cancellation method based on deep learning
Carbajal et al. Multiple-input neural network-based residual echo suppression
US9633671B2 (en) Voice quality enhancement techniques, speech recognition techniques, and related systems
CN108447495B (en) Deep learning voice enhancement method based on comprehensive feature set
CN107123430A (en) Echo cancellation method, device, conference tablet and computer storage medium
Zhang et al. FT-LSTM based complex network for joint acoustic echo cancellation and speech enhancement
CN109979476B (en) Method and device for removing reverberation of voice
CN112735456A (en) Speech enhancement method based on DNN-CLSTM network
CN107610712B (en) Voice enhancement method combining MMSE and spectral subtraction
CN109767780A (en) A kind of audio signal processing method, device, equipment and readable storage medium
CN106157964A (en) A method for determining system delay in echo cancellation
Seidel et al. Y²-Net FCRN for Acoustic Echo and Noise Suppression
CN107635082A (en) A kind of both-end sounding end detecting system
Nuthakki et al. Speech enhancement based on deep convolutional neural network
CN106161820B (en) A kind of interchannel decorrelation method for stereo acoustic echo canceler
Sawata et al. Improving character error rate is not equal to having clean speech: Speech enhancement for asr systems with black-box acoustic models
CN112382301A (en) Noise-containing voice gender identification method and system based on lightweight neural network
CN115083431A (en) Echo cancellation method and device, electronic equipment and computer readable medium
CN110148421A (en) A kind of residual echo detection method, terminal and device
CN103971697B (en) Sound enhancement method based on non-local mean filtering
JP2001520764A (en) Speech analysis system
CN101533642B (en) Method for processing voice signal and device
CN112133324A (en) Call state detection method, device, computer system and medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant