CN104810021B - Preprocessing method and device applied to far-field recognition - Google Patents
- Publication number
- CN104810021B (application CN201510236032.6A)
- Authority
- CN
- China
- Prior art keywords
- signal
- fixed
- echo cancellation
- optimal
- sound echo
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Circuit For Audible Band Transducer (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
Abstract
The present invention proposes a preprocessing method and device applied to far-field recognition. The method includes: performing fixed beamforming on the sound signals to be processed to obtain beam signals; performing acoustic echo cancellation and optimal beam selection on the beam signals obtained by fixed beamforming; and obtaining, from the beam signal after acoustic echo cancellation and optimal beam selection, the preprocessed signal applied to far-field recognition. The method can improve the preprocessing result and, optionally, can reduce the computational load when the number of sound signals is large.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a preprocessing method and device applied to far-field recognition.
Background
Far-field recognition, also called long-distance recognition, generally addresses speech recognition in scenarios where the speaker is more than 2 meters from the speech device. To obtain stable and reliable far-field recognition performance, preprocessing for far-field recognition (far-field pickup) is particularly important.
In the prior art, the far-field pickup chain consists of the following stages in series: acoustic echo cancellation (AEC), sound source localization, adaptive beamforming (ABF), single-microphone enhancement, and post-processing.
However, the prior art requires a sound source localization module. The accuracy of that module is itself unsatisfactory, and because it is cascaded with the subsequent ABF, its errors also degrade ABF performance and therefore the preprocessing result. In addition, because AEC is performed first, the computational load is large when the number of sound signals to be processed is large.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, one object of the present invention is to propose a preprocessing method applied to far-field recognition that can improve the preprocessing result and, optionally, reduce the computational load when the number of sound signals is large.
Another object of the present invention is to propose a preprocessing device applied to far-field recognition.
To achieve the above objects, the preprocessing method applied to far-field recognition proposed by embodiments of the first aspect of the present invention includes: performing fixed beamforming on the sound signals to be processed to obtain beam signals; performing acoustic echo cancellation and optimal beam selection on the beam signals obtained by fixed beamforming; and obtaining, from the beam signal after acoustic echo cancellation and optimal beam selection, the preprocessed signal applied to far-field recognition.
The preprocessing method proposed by embodiments of the first aspect of the present invention does not require a sound source localization module, so it avoids the poor preprocessing results caused by inaccurate sound source localization and thereby improves the preprocessing result. Optionally, fixed beamforming (FBF) is performed first and AEC afterwards; because the number of beams after FBF is usually smaller than the number of sound signals to be processed, the computational load can be reduced.
To achieve the above objects, the preprocessing device applied to far-field recognition proposed by embodiments of the second aspect of the present invention includes: a fixed beamforming module, configured to perform fixed beamforming on the sound signals to be processed to obtain beam signals; a processing module, configured to perform acoustic echo cancellation and optimal beam selection on the beam signals obtained by fixed beamforming; and an acquisition module, configured to obtain, from the beam signal after acoustic echo cancellation and optimal beam selection, the preprocessed signal applied to far-field recognition.
The preprocessing device proposed by embodiments of the second aspect of the present invention does not require a sound source localization module, so it avoids the poor preprocessing results caused by inaccurate sound source localization and thereby improves the preprocessing result. Optionally, FBF is performed first and AEC afterwards; because the number of beams after FBF is usually smaller than the number of sound signals to be processed, the computational load can be reduced.
Additional aspects and advantages of the present invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a preprocessing method applied to far-field recognition according to one embodiment of the present invention;
Fig. 2 is a flowchart of a preprocessing method applied to far-field recognition according to another embodiment of the present invention;
Fig. 3 is a flowchart of a preprocessing method applied to far-field recognition according to another embodiment of the present invention;
Fig. 4 is a structural diagram of a preprocessing device applied to far-field recognition according to another embodiment of the present invention;
Fig. 5 is a structural diagram of a preprocessing device applied to far-field recognition according to another embodiment of the present invention;
Fig. 6 is a structural diagram of a preprocessing device applied to far-field recognition according to another embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar modules or modules having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it. On the contrary, the embodiments of the present invention include all changes, modifications, and equivalents that fall within the spirit and scope of the appended claims.
Fig. 1 is a flowchart of a preprocessing method applied to far-field recognition according to one embodiment of the present invention. The method includes:
S11: Perform fixed beamforming on the sound signals to be processed to obtain beam signals.
The sound signal to be processed may be a microphone signal, that is, the signal picked up by a microphone, which includes the near-end speech signal (the voice control command), room reverberation, and various environmental noises.
For far-field recognition, a microphone array (of directional or omnidirectional microphones) is usually used to improve recognition performance. The sound signals to be processed may therefore specifically be microphone array signals, which comprise multiple channels of microphone signals.
Beamforming techniques include the ABF used in the prior art and also fixed beamforming (FBF).
The spatial beam characteristic of ABF changes adaptively, whereas that of FBF is fixed. An example of a spatial beam characteristic is the signal gain response in a specific direction.
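As an illustration of a beam whose spatial characteristic never changes, the following is a minimal sketch of a delay-and-sum beamformer with fixed integer sample delays. The patent does not specify which FBF design is used, so the delay-and-sum choice, the function name `delay_and_sum`, and the two-microphone example data are all assumptions made for illustration.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Fixed beam: delay channel m by delays[m] samples, then average.

    The delays are constants chosen once for a look direction, so the
    spatial response of the beam does not change over time.
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    n_samples = len(channels[0])
    out = []
    for n in range(n_samples):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = n - d
            if 0 <= idx < n_samples:  # samples outside the buffer count as silence
                acc += ch[idx]
        out.append(acc / len(channels))
    return out


# Hypothetical data: a pulse reaches mic 1 one sample later than mic 0.
mic0 = [0.0, 1.0, 0.0, 0.0]
mic1 = [0.0, 0.0, 1.0, 0.0]

# Delaying mic 0 by one sample aligns the two pulses (steered at the source).
aligned = delay_and_sum([mic0, mic1], delays=[1, 0])
# With zero delays the beam looks elsewhere and the pulse is attenuated.
misaligned = delay_and_sum([mic0, mic1], delays=[0, 0])
```

In this toy case the steered beam keeps the pulse at full amplitude while the unsteered beam halves it, which is the direction-dependent gain response the text refers to.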
In FBF processing, optionally, multiple fixed beams are used. Each fixed beam covers part of the space, and all fixed beams together cover the whole space.
Because the beams cover the whole space, the user's speech can be detected wherever the user is located, which removes any restriction on the user's position.
When the number of sound signals to be processed (for example, microphone array signals) is large, the number of fixed beams used by FBF may be smaller than the number of sound signals, in order to reduce the computational load.
For example, there may be 3 fixed beams, each covering a different 120-degree sector, or 6 fixed beams, each covering a different 60-degree sector.
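The sector examples above can be sketched as a simple partition of the horizontal plane. This is only an illustration: the patent gives the beam counts and sector widths but not where the sectors start, so equal-width sectors beginning at 0 degrees, and the helper names `beam_sectors` and `beam_index_for_angle`, are assumptions.

```python
from typing import List, Tuple


def beam_sectors(num_beams: int) -> List[Tuple[float, float]]:
    """Split 360 degrees into equal sectors, one per fixed beam."""
    width = 360.0 / num_beams
    return [(i * width, (i + 1) * width) for i in range(num_beams)]


def beam_index_for_angle(angle_deg: float, num_beams: int) -> int:
    """Return the index of the fixed beam whose sector contains the angle."""
    width = 360.0 / num_beams
    index = int((angle_deg % 360.0) // width)
    # Guard against floating-point wrap-around at exactly 360 degrees.
    return min(index, num_beams - 1)


three_beams = beam_sectors(3)   # three 120-degree sectors
six_beams = beam_sectors(6)     # six 60-degree sectors
```

Because every angle maps to some sector, a speaker at any position falls inside the coverage of one fixed beam.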
S12: Perform acoustic echo cancellation and optimal beam selection on the beam signals obtained by fixed beamforming.
To eliminate interference signals, a speech recognition interaction system usually includes an acoustic echo cancellation (AEC) module, commonly called the BargeIn function module.
The interference signals are, for example, music or speech synthesis (text to speech, TTS) signals produced by the speech recognition interaction system (hereinafter, the system).
Besides tracking and learning the acoustic transfer function (ATF) from the system's loudspeaker to the microphone, the AEC module must also learn any time-varying components introduced by the processing modules that precede it. If these components change faster than the adaptive filter in the AEC converges, the AEC module can never learn them properly, and the interference signals played by the system are then not cancelled well.
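To make the role of the adaptive filter concrete, here is a minimal sketch of an echo canceller based on the normalized least mean squares (NLMS) algorithm. The patent does not name the adaptive algorithm used inside the AEC module, so NLMS, the function name `nlms_echo_cancel`, the step size, and the synthetic echo path below are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence


def nlms_echo_cancel(mic: Sequence[float], ref: Sequence[float],
                     taps: int = 4, mu: float = 0.5,
                     eps: float = 1e-8) -> List[float]:
    """Subtract an adaptively estimated echo of `ref` from `mic`."""
    w = [0.0] * taps      # adaptive estimate of the echo path
    hist = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        hist = [x] + hist[:-1]
        echo_est = sum(wi * hi for wi, hi in zip(w, hist))
        e = d - echo_est                          # echo-cancelled output
        norm = sum(h * h for h in hist) + eps     # input power normalization
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, hist)]
        residual.append(e)
    return residual


# Hypothetical test signal: the loudspeaker (reference) signal is random,
# and the echo path is a fixed delay-and-attenuate filter.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.6 * (ref[n - 1] if n >= 1 else 0.0)
       + 0.3 * (ref[n - 2] if n >= 2 else 0.0)
       for n in range(len(ref))]
residual = nlms_echo_cancel(mic, ref)
```

Here the echo path is fixed, so the filter converges and the residual becomes very small. If the path (or any processing stage in front of the filter) changed faster than the filter adapts, the residual would stay large, which is the failure mode described above.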
The spatial beam characteristic of ABF varies, and the ABF filter usually changes far faster than the filter of the AEC module, so in the prior art ABF cannot be placed before AEC to raise the signal-to-noise ratio. Yet the performance of AEC depends on the signal-to-noise ratio: the higher the signal-to-noise ratio, the better the result. Because the prior art cannot place ABF before AEC to raise the signal-to-noise ratio, AEC performance suffers, and with it far-field recognition.
In this embodiment, by contrast, FBF is used. Because the spatial beam characteristic of FBF is fixed, it is effectively known to the AEC module and does not need to be tracked, so FBF can be placed before AEC. FBF raises the signal-to-noise ratio, so placing FBF before AEC improves AEC performance and, in turn, far-field recognition.
In addition, when the microphone array signals comprise a large number of channels (for example, more than 6), the prior art performs AEC first, so the number of AEC modules required equals the number of microphone signals and is correspondingly large. In this embodiment, FBF is performed before AEC, so the number of AEC modules required equals the number of FBF beams. The number of FBF beams (for example, 3 or 6) is usually smaller than the number of microphone signals, which significantly reduces the number of AEC modules required and hence the computational load.
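The saving can be shown with a small count. The sketch below assumes a hypothetical array of 8 microphones and 3 fixed beams; the function name `aec_instances` is also an assumption.

```python
def aec_instances(num_mics: int, num_beams: int, fbf_first: bool) -> int:
    """Number of AEC modules needed for a given processing order."""
    # AEC first: one module per microphone channel.
    # FBF first: one module per fixed beam.
    return num_beams if fbf_first else num_mics


prior_art = aec_instances(num_mics=8, num_beams=3, fbf_first=False)
this_embodiment = aec_instances(num_mics=8, num_beams=3, fbf_first=True)
```

With these assumed numbers the prior-art order needs 8 AEC modules and the FBF-first order needs 3.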
Optimal beam selection may follow a preset selection criterion. For example, under the maximum signal-to-noise ratio criterion, the beam with the highest signal-to-noise ratio is selected as the optimal beam.
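A minimal sketch of selection under the maximum signal-to-noise ratio criterion follows. The patent does not say how the signal-to-noise ratio is estimated, so this sketch assumes a per-beam noise power is already available (for example, measured during non-speech frames) and uses the ratio of frame power to noise power as a simple proxy. The names `snr_db` and `select_best_beam` and the sample data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def snr_db(frame: Sequence[float], noise_power: float) -> float:
    """Frame power over an assumed known noise power, in decibels."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12) / max(noise_power, 1e-12))


def select_best_beam(beams: Sequence[Sequence[float]],
                     noise_powers: Sequence[float]) -> Tuple[int, float]:
    """Return (index, snr_db) of the beam with the highest estimated SNR."""
    snrs: List[float] = [snr_db(b, p) for b, p in zip(beams, noise_powers)]
    best = max(range(len(snrs)), key=snrs.__getitem__)
    return best, snrs[best]


# Hypothetical frames from three fixed beams with equal noise floors.
beams = [
    [0.1, -0.1, 0.1, -0.1],
    [0.5, -0.5, 0.5, -0.5],
    [0.2, -0.2, 0.2, -0.2],
]
best_index, best_snr = select_best_beam(beams, noise_powers=[0.01, 0.01, 0.01])
```

In this example the second beam (index 1) has the highest power over the shared noise floor, so it is selected.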
In a specific implementation, AEC may be performed before optimal beam selection, or optimal beam selection may be performed before AEC.
S13: Obtain, from the beam signal after acoustic echo cancellation and optimal beam selection, the preprocessed signal applied to far-field recognition.
After acoustic echo cancellation and optimal beam selection, further post-processing may be performed to improve the result.
Once the preprocessed signal applied to far-field recognition has been obtained, it can be fed to the recognizer (the far-field recognition engine) for recognition.
This embodiment requires no sound source localization, so it effectively avoids the instability and anomalies in overall system performance caused by sound source localization errors. Selecting the optimal beam signal from among the fixed spatial beam signals removes the constraints that conventional methods place on the position of the near-end speaker, so the system adapts seamlessly to a speaker moving continuously around the room, which significantly improves the overall user experience. With fixed beamforming, the spatial beam characteristic does not change over time and can be learned well by the subsequent AEC module, so the FBF module can be moved ahead of the AEC module. On the one hand, this yields a reference signal with a higher signal-to-noise ratio and so effectively improves the convergence speed and performance of the subsequent AEC; on the other hand, because the number of FBF spatial beams is usually smaller than the number of microphones, it effectively reduces the number of AEC module invocations and the overall computation.
Fig. 2 is a flowchart of a preprocessing method applied to far-field recognition according to another embodiment of the present invention. The method includes:
S21: Perform fixed beamforming on the microphone array signals.
To improve AEC performance and reduce computation, the microphone array (of directional or omnidirectional microphones) can first be used to divide the whole space into several spatial beam regions (for example, 3 or 6).
Because fixed beamforming (FBF) is used, the beam characteristic does not change over time and can be learned well by the subsequent AEC module, so the FBF module can be moved ahead of the AEC module. On the one hand, FBF yields a reference signal with a higher signal-to-noise ratio, which effectively improves the convergence speed and performance of the subsequent AEC; on the other hand, the number of spatial beams formed by the FBF module is usually smaller than the number of microphones, which effectively reduces the number of AEC module invocations and the overall computation.
S22: Using as many acoustic echo cancellation modules as there are beam signals obtained by fixed beamforming, perform acoustic echo cancellation on each beam signal to obtain multiple echo-cancelled beam signals.
The FBF module outputs beam signals for several directions. These signals are passed through the AEC modules to remove the interference signals they contain, such as music and TTS played by the system; the echo-free signals can significantly improve far-field recognition performance.
S23: Perform optimal beam selection among the multiple echo-cancelled beam signals, selecting the optimal beam signal.
After the two processing stages above, each spatial beam signal has had environmental interference removed as far as possible, including background noise, room reverberation, and music and TTS played by the system. In this step, the embodiment selects the optimal spatial beam signal from the several spatial beam signals according to a given criterion (for example, the maximum signal-to-noise ratio criterion) and uses it as the output of the step. This removes the sound source localization module of conventional solutions, which not only reduces computation but also avoids error propagation, that is, the instability and anomalies in overall system performance caused by sound source localization errors. It also removes the conventional restriction that the near-end speaker stay in a relatively fixed position, further improving the user experience: by automatically selecting the best signal among several fixed beam signals in place of a sound source localization module, the embodiment adapts seamlessly to a speaker moving continuously around the room.
S24: Perform single-microphone channel enhancement and post-processing on the beam signal after acoustic echo cancellation and optimal beam selection, and take the resulting signal as the preprocessed signal applied to far-field recognition.
As in conventional solutions, various single-microphone noise reduction techniques can be used to further remove residual noise, followed in the back end by dedicated post-processing such as gain amplification and dynamic range control (DRC), to better improve far-field recognition performance.
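As a rough illustration of the back-end post-processing named above, the sketch below applies a fixed gain followed by a very simple static compressor. The patent does not describe the gain or DRC design, so the sample-by-sample hard-knee compressor (with no attack or release smoothing), the function name `apply_gain_and_drc`, and all parameter values are assumptions.

```python
import math
from typing import List, Sequence


def apply_gain_and_drc(samples: Sequence[float], gain: float = 2.0,
                       threshold: float = 0.5,
                       ratio: float = 4.0) -> List[float]:
    """Amplify, then compress magnitudes above `threshold` by `ratio`."""
    out = []
    for s in samples:
        y = s * gain                      # gain amplification
        mag = abs(y)
        if mag > threshold:               # dynamic range control above the knee
            mag = threshold + (mag - threshold) / ratio
        out.append(math.copysign(mag, y))
    return out


processed = apply_gain_and_drc([0.1, 0.5, -1.0])
```

Quiet samples are simply amplified, while loud samples are amplified and then pulled back toward the threshold, which narrows the dynamic range of the signal passed to the recognizer.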
In this embodiment, building on the previous embodiment, AEC may be performed before optimal beam selection. This order does not require the absence of system interference signals, so it applies to a wider range of scenarios.
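The order of steps S21 to S24 can be sketched as a small pipeline in which each stage is passed in as a function. The stage implementations used in the example are toy stand-ins, not real signal processing, and the function name `preprocess_aec_then_select` is an assumption.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def preprocess_aec_then_select(
    mic_signals: Sequence[Signal],
    fbf: Callable[[Sequence[Signal]], List[Signal]],
    aec: Callable[[Signal], Signal],
    select: Callable[[List[Signal]], int],
    post: Callable[[Signal], Signal],
) -> Signal:
    beams = fbf(mic_signals)                  # S21: fixed beamforming
    cleaned = [aec(beam) for beam in beams]   # S22: one AEC pass per beam
    best = cleaned[select(cleaned)]           # S23: optimal beam selection
    return post(best)                         # S24: enhancement and post-processing


# Toy stand-ins, for illustration only.
mics = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]]
fbf = lambda m: [[(a + b) / 2 for a, b in zip(m[0], m[1])],
                 [(a + b) / 2 for a, b in zip(m[2], m[3])]]
aec = lambda beam: [s - 1.0 for s in beam]
select = lambda bs: max(range(len(bs)), key=lambda i: sum(s * s for s in bs[i]))
post = lambda s: [0.5 * v for v in s]

result = preprocess_aec_then_select(mics, fbf, aec, select, post)
```

Four microphone channels become two beams, so only two AEC passes run before the stronger beam is selected and post-processed.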
Fig. 3 is a flowchart of a preprocessing method applied to far-field recognition according to another embodiment of the present invention. The method can be applied when no system interference signal is present, and includes:
S31: Perform fixed beamforming on the microphone array signals.
For the details of fixed beamforming, see the related description in the embodiments above; they are not repeated here.
S32: Perform optimal beam selection among the multiple beam signals obtained by fixed beamforming, selecting one optimal beam signal.
The AEC module contains a dedicated module that detects the talk state. There are broadly three states: near-end speech only, double talk (near-end and far-end speech), and far-end signal only, where far-end speech is the music or TTS signal played by the system. When this dedicated module detects that only near-end speech is currently present, it can be determined that there is no system interference signal, so optimal beam selection can be performed first, for example using the maximum signal-to-noise ratio criterion. For the specific manner of optimal beam selection, see the related description in the embodiments above; it is not repeated here.
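The decision described above can be sketched as a small dispatch on the detected talk state. The enum and function names are hypothetical, and how the talk state is actually detected is outside the scope of this sketch.

```python
from enum import Enum


class TalkState(Enum):
    NEAR_END_ONLY = 1   # only the user's speech is present
    DOUBLE_TALK = 2     # user speech plus system playback
    FAR_END_ONLY = 3    # only system playback (music, TTS)


def select_before_aec(state: TalkState) -> bool:
    """True when optimal beam selection may run before AEC.

    With near-end speech only there is no system interference signal,
    so selecting first and running a single AEC module is possible.
    In the other two states the sketch falls back to AEC before selection.
    """
    return state is TalkState.NEAR_END_ONLY
```

For example, `select_before_aec(TalkState.DOUBLE_TALK)` is false, so the per-beam AEC order would be kept while the system is playing audio.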
S33: Use a single acoustic echo cancellation module to perform acoustic echo cancellation on the optimal beam signal.
After optimal beam selection only one signal is output, so AEC can be performed with just one AEC module, which reduces the computational load.
S34: Perform single-microphone enhancement and post-processing on the beam signal after acoustic echo cancellation and optimal beam selection, and take the resulting signal as the preprocessed signal applied to far-field recognition.
As in conventional solutions, various single-microphone noise reduction techniques can be used to further remove residual noise, followed in the back end by dedicated post-processing such as gain amplification and dynamic range control (DRC), to better improve far-field recognition performance.
In this embodiment, when there is no system interference signal, optimal beam selection may be performed before AEC, which reduces the number of AEC modules and hence the computational load.
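The order of steps S31 to S34 can be sketched in the same style, with the stages passed in as functions. The stage implementations are toy stand-ins, and the name `preprocess_select_then_aec` and the call counter are assumptions added to make the single AEC pass visible.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def preprocess_select_then_aec(
    mic_signals: Sequence[Signal],
    fbf: Callable[[Sequence[Signal]], List[Signal]],
    select: Callable[[List[Signal]], int],
    aec: Callable[[Signal], Signal],
    post: Callable[[Signal], Signal],
) -> Signal:
    beams = fbf(mic_signals)        # S31: fixed beamforming
    best = beams[select(beams)]     # S32: optimal beam selection
    cleaned = aec(best)             # S33: a single AEC pass
    return post(cleaned)            # S34: enhancement and post-processing


# Toy stand-ins, for illustration only.
aec_calls = []


def counting_aec(beam: Signal) -> Signal:
    aec_calls.append(1)             # record each AEC invocation
    return [s - 1.0 for s in beam]


mics = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]]
fbf = lambda m: [[(a + b) / 2 for a, b in zip(m[0], m[1])],
                 [(a + b) / 2 for a, b in zip(m[2], m[3])]]
select = lambda bs: max(range(len(bs)), key=lambda i: sum(s * s for s in bs[i]))
post = lambda s: [0.5 * v for v in s]

result = preprocess_select_then_aec(mics, fbf, select, counting_aec, post)
```

Regardless of how many beams the FBF stage produces, the AEC stage runs exactly once in this order.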
Fig. 4 is a structural diagram of a preprocessing device applied to far-field recognition according to another embodiment of the present invention. The device 40 includes:
A fixed beamforming module 41, configured to perform fixed beamforming on the sound signals to be processed to obtain beam signals;
The sound signal to be processed may be a microphone signal, that is, the signal picked up by a microphone, which includes the near-end speech signal (the voice control command), room reverberation, and various environmental noises.
For far-field recognition, a microphone array (of directional or omnidirectional microphones) is usually used to improve recognition performance. The sound signals to be processed may therefore specifically be microphone array signals, which comprise multiple channels of microphone signals.
Beamforming techniques include the ABF used in the prior art and also fixed beamforming (FBF).
The spatial beam characteristic of ABF changes adaptively, whereas that of FBF is fixed. An example of a spatial beam characteristic is the signal gain response in a specific direction.
In FBF processing, optionally, multiple fixed beams are used. Each fixed beam covers part of the space, and all fixed beams together cover the whole space.
Because the beams cover the whole space, the user's speech can be detected wherever the user is located, which removes any restriction on the user's position.
When the number of sound signals to be processed (for example, microphone array signals) is large, the number of fixed beams used by FBF may be smaller than the number of sound signals, in order to reduce the computational load.
For example, there may be 3 fixed beams, each covering a different 120-degree sector, or 6 fixed beams, each covering a different 60-degree sector.
A processing module 42, configured to perform acoustic echo cancellation and optimal beam selection on the beam signals obtained by fixed beamforming;
To eliminate interference signals, a speech recognition interaction system usually includes an acoustic echo cancellation (AEC) module, commonly called the BargeIn function module.
The interference signals are, for example, music or speech synthesis (text to speech, TTS) signals produced by the speech recognition interaction system (hereinafter, the system).
Besides tracking and learning the acoustic transfer function (ATF) from the system's loudspeaker to the microphone, the AEC module must also learn any time-varying components introduced by the processing modules that precede it. If these components change faster than the adaptive filter in the AEC converges, the AEC module can never learn them properly, and the interference signals played by the system are then not cancelled well.
The spatial beam characteristic of ABF varies, and the ABF filter usually changes far faster than the filter of the AEC module, so in the prior art ABF cannot be placed before AEC to raise the signal-to-noise ratio. Yet the performance of AEC depends on the signal-to-noise ratio: the higher the signal-to-noise ratio, the better the result. Because the prior art cannot place ABF before AEC to raise the signal-to-noise ratio, AEC performance suffers, and with it far-field recognition.
In this embodiment, by contrast, FBF is used. Because the spatial beam characteristic of FBF is fixed, it is effectively known to the AEC module and does not need to be tracked, so FBF can be placed before AEC. FBF raises the signal-to-noise ratio, so placing FBF before AEC improves AEC performance and, in turn, far-field recognition.
In addition, when the microphone array signals comprise a large number of channels (for example, more than 6), the prior art performs AEC first, so the number of AEC modules required equals the number of microphone signals and is correspondingly large. In this embodiment, FBF is performed before AEC, so the number of AEC modules required equals the number of FBF beams. The number of FBF beams (for example, 3 or 6) is usually smaller than the number of microphone signals, which significantly reduces the number of AEC modules required and hence the computational load.
Optimal beam selection may follow a preset selection criterion. For example, under the maximum signal-to-noise ratio criterion, the beam with the highest signal-to-noise ratio is selected as the optimal beam.
In a specific implementation, AEC may be performed before optimal beam selection, or optimal beam selection may be performed before AEC.
For example, referring to Fig. 5, when fixed beamforming yields multiple beam signals, the processing module 42 includes:
Acoustic echo cancellation modules 51, equal in number to the beam signals obtained by fixed beamforming and connected to the fixed beamforming module, configured to perform acoustic echo cancellation on each beam signal obtained by fixed beamforming, yielding multiple echo-cancelled beam signals;
The FBF module outputs beam signals for several directions. These signals are passed through the AEC modules to remove the interference signals they contain, such as music and TTS played by the system; the echo-free signals can significantly improve far-field recognition performance.
An optimal beam selection module 52, connected to the acoustic echo cancellation modules, configured to perform optimal beam selection among the multiple echo-cancelled beam signals and select the optimal beam signal.
After the two modules above, each spatial beam signal has had environmental interference removed as far as possible, including background noise, room reverberation, and music and TTS played by the system. In this step, the embodiment selects the optimal spatial beam signal from the several spatial beam signals according to a given criterion (for example, the maximum signal-to-noise ratio criterion) and uses it as the output of the step. This removes the sound source localization module of conventional solutions, which not only reduces computation but also avoids error propagation, that is, the instability and anomalies in overall system performance caused by sound source localization errors. It also removes the conventional restriction that the near-end speaker stay in a relatively fixed position, further improving the user experience: by automatically selecting the best signal among several fixed beam signals in place of a sound source localization module, the embodiment adapts seamlessly to a speaker moving continuously around the room.
In another example, referring to Fig. 6, when fixed beamforming yields multiple beam signals and no system interference signal is present, the processing module 42 includes:
An optimal beam selection module 61, connected to the fixed beamforming module, configured to perform optimal beam selection among the multiple beam signals obtained by fixed beamforming and select one optimal beam signal;
The AEC module contains a dedicated module that detects the talk state. There are broadly three states: near-end speech only, double talk (near-end and far-end speech), and far-end signal only, where far-end speech is the music or TTS signal played by the system. When this dedicated module detects that only near-end speech is currently present, it can be determined that there is no system interference signal, so optimal beam selection can be performed first, for example using the maximum signal-to-noise ratio criterion. For the specific manner of optimal beam selection, see the related description in the embodiments above; it is not repeated here.
A single acoustic echo cancellation module 62, connected to the optimal beam selection module, configured to perform acoustic echo cancellation on the optimal beam signal.
After optimal beam selection only one signal is output, so AEC can be performed with just one AEC module, which reduces the computational load.
Acquisition module 43, for according to the beam signal after sound Echo cancellation and optimal beam selection, being applied to
Signal after the pre-treatment of far field identification.
After carry out sound Echo cancellation and optimal beam selection, some post processings can be carried out again, further to improve
Treatment effect.
After signal after the preceding processing for obtaining recognizing applied to far field, the signal after the pre-treatment can be input to
Processing is identified in identifier (far field identification engine).
Optionally, the acquisition module 43 specifically for:
Single wheat enhancing is carried out to the beam signal after sound Echo cancellation and optimal beam selection and is post-processed, and by single wheat
Signal after enhancing and post processing is defined as being applied to the signal after the pre-treatment that far field is recognized.
As in conventional technical solutions, various single-microphone noise reduction techniques can be used to further remove residual noise, and dedicated post-processing techniques such as gain amplification and dynamic range control (DRC) can be cascaded at the back end, thereby further improving far-field recognition performance.
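A minimal sketch of the back-end post-processing named above, gain amplification followed by dynamic range control. The patent gives no parameters; the gain, threshold, ratio, and the memoryless compressor design are assumptions for illustration only.

```python
import math

def apply_gain_and_drc(signal, gain_db=6.0, threshold_db=-12.0, ratio=4.0):
    """Amplify, then compress sample magnitudes above the threshold.

    This is a static, memoryless compressor with no attack/release smoothing.
    """
    gain = 10.0 ** (gain_db / 20.0)
    threshold = 10.0 ** (threshold_db / 20.0)
    out = []
    for x in signal:
        y = gain * x
        mag = abs(y)
        if mag > threshold:
            # Above the threshold, the level in dB grows at 1/ratio of the input rate.
            mag = threshold * (mag / threshold) ** (1.0 / ratio)
        out.append(math.copysign(mag, y))
    return out
```

Quiet samples receive the full gain, while loud samples are pulled toward the threshold, which narrows the level range presented to the recognizer.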
In this embodiment, no sound source localization is required, which effectively avoids the instability and abnormal behavior of overall system performance caused by sound source localization errors. Selecting the optimal beam signal among the fixed spatial beam signals effectively breaks through the constraints that conventional methods place on the position of the near-end speaker, so the system adapts seamlessly to scenarios in which the speaker moves continuously around the room, significantly improving the overall user experience. With fixed beamforming, the spatial beam characteristics do not change over time, a property that the subsequent AEC module can learn well, so the FBF module can be moved ahead of the AEC module. On the one hand, this yields a reference signal with a higher signal-to-noise ratio, which effectively improves the convergence speed and performance of the subsequent AEC. On the other hand, because the number of FBF spatial beams is generally smaller than the number of microphones, the number of times the AEC module is invoked, and hence the overall amount of computation, is effectively reduced.
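The property that fixed beams do not change over time can be illustrated with a delay-and-sum beamformer whose delay table is computed once. The embodiment does not prescribe a beamformer design; the uniform circular array geometry, radius, sampling rate, and function names below are assumptions.

```python
import math

def fixed_beam_delays(num_mics, num_beams, radius_m=0.05, fs=16000, c=343.0):
    """Integer sample delays per (beam, mic) for a uniform circular array.

    The table is computed once and never updated at run time.
    """
    table = []
    for b in range(num_beams):
        look = 2.0 * math.pi * b / num_beams  # look direction of this beam
        advances = []
        for m in range(num_mics):
            mic_angle = 2.0 * math.pi * m / num_mics
            # A mic closer to the source hears the far-field wavefront earlier.
            advances.append(radius_m * math.cos(look - mic_angle) / c * fs)
        latest = min(advances)
        # Delay early mics so that all channels line up with the latest one.
        table.append([round(a - latest) for a in advances])
    return table

def fixed_beamform(mics, delays):
    """Delay-and-sum: delay each mic by its fixed amount and average."""
    n = len(mics[0])
    beams = []
    for beam_delays in delays:
        beam = [0.0] * n
        for sig, d in zip(mics, beam_delays):
            for t in range(d, n):
                beam[t] += sig[t - d] / len(mics)
        beams.append(beam)
    return beams
```

With 6 microphones and 3 beams, for example, placing FBF ahead of AEC means at most 3 echo cancellers run instead of 6.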
It should be noted that, in the description of the present invention, the terms "first", "second", and so on are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. In addition, unless otherwise stated, "multiple" means at least two.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order, depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method embodiments may be completed by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such schematic expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is to be understood that they are exemplary and are not to be construed as limiting the invention. Those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.
Claims (11)
1. A preprocessing method applied to far-field recognition, characterized by comprising:
performing fixed beamforming on the speech signals to be processed to obtain beam signals after fixed beamforming;
performing acoustic echo cancellation and optimal beam selection on the beam signals after fixed beamforming;
obtaining, according to the beam signal after acoustic echo cancellation and optimal beam selection, the preprocessed signal applied to far-field recognition.
2. The method according to claim 1, characterized in that the fixed beamforming uses multiple fixed beams, each fixed beam covers part of the space, and all the fixed beams together cover the entire space.
3. The method according to claim 2, characterized in that the number of fixed beams is 3 and the different fixed beams each cover a different 120-degree region of space; or, the number of fixed beams is 6 and the different fixed beams each cover a different 60-degree region of space.
4. The method according to claim 1, characterized in that the fixed beamforming uses multiple fixed beams, and the number of fixed beams is smaller than the number of speech signals to be processed.
5. The method according to any one of claims 1-4, characterized in that, when there are multiple beam signals after fixed beamforming, performing acoustic echo cancellation and optimal beam selection on the beam signals after fixed beamforming comprises:
using as many acoustic echo cancellation modules as there are beam signals after fixed beamforming, performing acoustic echo cancellation on each beam signal after fixed beamforming to obtain multiple beam signals after acoustic echo cancellation;
performing optimal beam selection among the multiple beam signals after acoustic echo cancellation to select the optimal beam signal.
6. The method according to any one of claims 1-4, characterized in that, when there are multiple beam signals after fixed beamforming and no system interference signal exists, performing acoustic echo cancellation and optimal beam selection on the beam signals after fixed beamforming comprises:
performing optimal beam selection among the multiple beam signals after fixed beamforming to select one optimal beam signal;
using one acoustic echo cancellation module, performing acoustic echo cancellation on the optimal beam signal.
7. The method according to any one of claims 1-4, characterized in that obtaining, according to the beam signal after acoustic echo cancellation and optimal beam selection, the preprocessed signal applied to far-field recognition comprises:
performing single-microphone enhancement and post-processing on the beam signal after acoustic echo cancellation and optimal beam selection, and determining the signal after single-microphone enhancement and post-processing as the preprocessed signal applied to far-field recognition.
8. A preprocessing device applied to far-field recognition, characterized by comprising:
a fixed beamforming module, configured to perform fixed beamforming on the speech signals to be processed to obtain beam signals after fixed beamforming;
a processing module, configured to perform acoustic echo cancellation and optimal beam selection on the beam signals after fixed beamforming;
an acquisition module, configured to obtain, according to the beam signal after acoustic echo cancellation and optimal beam selection, the preprocessed signal applied to far-field recognition.
9. The device according to claim 8, characterized in that, when there are multiple beam signals after fixed beamforming, the processing module comprises:
acoustic echo cancellation modules, equal in number to the beam signals after fixed beamforming and connected to the fixed beamforming module, configured to perform acoustic echo cancellation on each beam signal after fixed beamforming to obtain multiple beam signals after acoustic echo cancellation;
an optimal beam selection module, connected to the acoustic echo cancellation modules and configured to perform optimal beam selection among the multiple beam signals after acoustic echo cancellation to select the optimal beam signal.
10. The device according to claim 8, characterized in that, when there are multiple beam signals after fixed beamforming and no system interference signal exists, the processing module comprises:
an optimal beam selection module, connected to the fixed beamforming module and configured to perform optimal beam selection among the multiple beam signals after fixed beamforming to select one optimal beam signal;
one acoustic echo cancellation module, connected to the optimal beam selection module and configured to perform acoustic echo cancellation on the optimal beam signal.
11. The device according to any one of claims 8-10, characterized in that the acquisition module is specifically configured to:
perform single-microphone enhancement and post-processing on the beam signal after acoustic echo cancellation and optimal beam selection, and determine the signal after single-microphone enhancement and post-processing as the preprocessed signal applied to far-field recognition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510236032.6A CN104810021B (en) | 2015-05-11 | 2015-05-11 | The pre-treating method and device recognized applied to far field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510236032.6A CN104810021B (en) | 2015-05-11 | 2015-05-11 | The pre-treating method and device recognized applied to far field |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104810021A CN104810021A (en) | 2015-07-29 |
CN104810021B true CN104810021B (en) | 2017-08-18 |
Family
ID=53694808
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510236032.6A Active CN104810021B (en) | 2015-05-11 | 2015-05-11 | The pre-treating method and device recognized applied to far field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104810021B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105355210B (en) * | 2015-10-30 | 2020-06-23 | 百度在线网络技术(北京)有限公司 | Preprocessing method and device for far-field speech recognition |
CN105304093B (en) * | 2015-11-10 | 2017-07-25 | 百度在线网络技术(北京)有限公司 | Signal front-end processing method and device for speech recognition |
CN105427860B (en) * | 2015-11-11 | 2019-09-03 | 百度在线网络技术(北京)有限公司 | Far field audio recognition method and device |
CN107018470B (en) * | 2016-01-28 | 2019-02-26 | 讯飞智元信息科技有限公司 | A kind of voice recording method and system based on annular microphone array |
CN105940445B (en) | 2016-02-04 | 2018-06-12 | 曾新晓 | A kind of voice communication system and its method |
CN105845131A (en) * | 2016-04-11 | 2016-08-10 | 乐视控股(北京)有限公司 | Far-talking voice recognition method and device |
US20170365271A1 (en) * | 2016-06-15 | 2017-12-21 | Adam Kupryjanow | Automatic speech recognition de-reverberation |
US10090000B1 (en) * | 2017-11-01 | 2018-10-02 | GM Global Technology Operations LLC | Efficient echo cancellation using transfer function estimation |
CN109935226A (en) * | 2017-12-15 | 2019-06-25 | 上海擎语信息科技有限公司 | A kind of far field speech recognition enhancing system and method based on deep neural network |
CN110364166B (en) * | 2018-06-28 | 2022-10-28 | 腾讯科技(深圳)有限公司 | Electronic equipment for realizing speech signal recognition |
CN109599104B (en) * | 2018-11-20 | 2022-04-01 | 北京小米智能科技有限公司 | Multi-beam selection method and device |
CN109697987B (en) * | 2018-12-29 | 2021-05-25 | 思必驰科技股份有限公司 | External far-field voice interaction device and implementation method |
CN109920405A (en) * | 2019-03-05 | 2019-06-21 | 百度在线网络技术(北京)有限公司 | Multi-path voice recognition methods, device, equipment and readable storage medium storing program for executing |
CN110265020B (en) * | 2019-07-12 | 2021-07-06 | 大象声科(深圳)科技有限公司 | Voice wake-up method and device, electronic equipment and storage medium |
CN110517682B (en) * | 2019-09-02 | 2022-08-30 | 腾讯科技(深圳)有限公司 | Voice recognition method, device, equipment and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9053697B2 (en) * | 2010-06-01 | 2015-06-09 | Qualcomm Incorporated | Systems, methods, devices, apparatus, and computer program products for audio equalization |
DE112011105267T5 (en) * | 2011-05-24 | 2014-03-20 | Mitsubishi Electric Corporation | Target sound reinforcement device and vehicle navigation system |
KR101732193B1 (en) * | 2011-10-21 | 2017-05-04 | 삼성전자주식회사 | Method for controlling power charge and wireless power charge device therefor |
CN102968999B (en) * | 2011-11-18 | 2015-04-22 | 斯凯普公司 | Audio signal processing |
2015-05-11: CN application CN201510236032.6A, patent CN104810021B (en), status Active
Also Published As
Publication number | Publication date |
---|---|
CN104810021A (en) | 2015-07-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||