CN112037810B - Echo processing method, device, medium and computing equipment - Google Patents
- Publication number
- CN112037810B (application number CN202011023561.5A)
- Authority
- CN
- China
- Prior art keywords
- audio information
- frame
- frequency band
- value
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M9/00—Arrangements for interconnection not involving centralised switching
- H04M9/08—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
- H04M9/082—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Abstract
The invention provides an echo processing method, electronic device, medium and computing equipment. The method includes: collecting audio information, wherein the audio information includes a reference audio; extracting audio features of a first frequency band in which the reference audio is located, to obtain the audio features of the audio information in the first frequency band; and controlling an echo canceller based on the audio features of the audio information in the first frequency band.
Description
Technical Field
The embodiment of the invention relates to the field of audio information processing, in particular to an echo processing method, device, medium and computing equipment based on noise injection.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
AEC (Acoustic Echo Cancellation) is a signal processing technique that cancels the echo signal in a communication system, ensures that the speaker is not disturbed by echo, and improves call quality. In an ideal environment, many AEC algorithms meet these requirements well. However, because audio signals in real environments are complex, current AEC algorithms face a trade-off: if the echo is cancelled completely, the desired near-end signal is more or less damaged; if the near-end signal is kept lossless, residual echo appears in some cases. How to guarantee both effective echo cancellation and a lossless audio signal therefore remains a problem to be solved.
Disclosure of Invention
The present application is intended to provide an echo processing method, apparatus, medium and computing device, so as to at least solve the above technical problems.
In a first aspect of an embodiment of the present application, there is provided an echo processing method, including:
collecting audio information, wherein the audio information comprises a reference audio;
extracting audio characteristics of a first frequency band where a reference audio is located in the audio information to obtain audio characteristics of the audio information in the first frequency band;
the echo canceller is controlled based on the audio characteristics of the audio information in the first frequency band.
In one embodiment of the present application, the extracting the audio feature of the first frequency band where the reference audio is located in the audio information to obtain the audio feature of the audio information in the first frequency band includes:
acquiring a time domain characteristic value and a frequency domain characteristic value of the audio information in the first frequency band where the reference audio is located;
and taking the time domain characteristic value and the frequency domain characteristic value as the audio frequency characteristic of the audio information in the first frequency band.
In one embodiment of the application, the method further comprises:
acquiring an energy peak value and two energy trough values of the i-th frame of audio information in the first frequency band;
and determining the peak-to-valley ratio of the i-th frame of audio information in the first frequency band based on the energy peak value and the two energy trough values, and taking this peak-to-valley ratio as the frequency domain characteristic value of the i-th frame of audio information in the first frequency band.
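The claims do not fix an exact formula for the peak-to-valley ratio. A minimal sketch, assuming the ratio compares the peak energy against the average of the two trough energies (function name and formula are illustrative, not the patent's definition), could look like this:

```python
def peak_to_valley_ratio(peak_energy, trough_low, trough_high, eps=1e-12):
    """Hypothetical peak-to-valley ratio for one frame in the first band.

    peak_energy: energy at the first frequency point (the injected tone).
    trough_low, trough_high: the two neighbouring trough energies.
    The mean of the two troughs is an illustrative choice; eps guards
    against division by zero.
    """
    return peak_energy / (0.5 * (trough_low + trough_high) + eps)
```

Under this sketch, a frame containing the injected tone produces a large ratio (sharp spectral peak against quiet neighbours), while a frame without it stays near 1.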
In one embodiment of the present invention, the energy peak in the first frequency band is: the energy value corresponding to the first frequency point in the first frequency band;
the two energy trough values are:
the energy value at a first adjacent frequency point, obtained by moving up from the first frequency point by a preset bandwidth within the first frequency band, and the energy value at a second adjacent frequency point, obtained by moving down from the first frequency point by the same preset bandwidth within the first frequency band;
or,
the first energy trough in the part of the first frequency band above the first frequency point, and the first energy trough in the part of the first frequency band below the first frequency point.
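For the second alternative (the first trough above and the first trough below the first frequency point), a simple sketch could scan outward from the peak bin for the first local minima. The function name and the per-bin energy representation are assumptions for illustration:

```python
def first_troughs_around(energies, peak_idx):
    """Find the first local minimum on each side of the peak bin.

    energies: per-bin energy values inside the first frequency band.
    peak_idx: index of the first frequency point (the peak bin).
    Returns (left_trough_energy, right_trough_energy).
    """
    left = peak_idx
    # walk left while energy keeps falling (or stays flat)
    while left > 0 and energies[left - 1] <= energies[left]:
        left -= 1
    right = peak_idx
    # walk right while energy keeps falling (or stays flat)
    while right < len(energies) - 1 and energies[right + 1] <= energies[right]:
        right += 1
    return energies[left], energies[right]
```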
In one embodiment of the invention, the method further comprises:
acquiring L frames of audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
Determining an energy average value of the L-frame audio information and a maximum energy value in the L-frame audio information;
and determining a peak fluctuation value of the ith frame of audio information based on the energy average value and the maximum energy value of the L frame of audio information and the energy value of each frame of audio information in the L frame of audio information at a first frequency point, and taking the peak fluctuation value of the ith frame of audio information as a time domain characteristic value of the ith frame of audio information in a first frequency band.
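The exact peak-fluctuation formula is left open by the claims. A minimal sketch, assuming the feature measures how stable the energy at the first frequency point stays across the L frames (a steadily present injected tone fluctuates little), might be:

```python
def peak_fluctuation(point_energies):
    """Hypothetical time-domain feature for frame i.

    point_energies: energy at the first frequency point for the current
    frame and the L-1 preceding frames (L values in total).
    Returns the mean absolute deviation from the average, normalised by
    the maximum energy -- an illustrative choice, not the patent's formula.
    """
    mean_e = sum(point_energies) / len(point_energies)
    max_e = max(point_energies)
    dev = sum(abs(e - mean_e) for e in point_energies) / len(point_energies)
    return dev / (max_e + 1e-12)
```

Under this sketch a constant-energy tone yields a value near zero, consistent with the later claim that a small time-domain value indicates echo is present.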
In one embodiment of the invention, the method further comprises:
converting the L-frame audio information to obtain a frequency domain signal of each frame of audio information in the L-frame audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
determining an energy representation of each of the L frames of audio information based on a frequency domain signal of each of the frames of audio information;
constructing a feature plane containing the L-frame audio information based on the energy representation of each frame of the L-frame audio information;
wherein the feature plane comprises: the energy value of at least one frequency point of each frame of audio information in the L frames of audio information in the first frequency band; the at least one frequency point comprises a first frequency point.
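The feature plane described above can be pictured as an L x K matrix: one row per frame, one column per selected frequency bin inside the first band. A self-contained sketch using a naive DFT (a real implementation would use an FFT; names are illustrative):

```python
import cmath

def feature_plane(frames, bin_indices):
    """Build an L x K energy matrix from L frames of time-domain samples.

    Each entry is |X[k]|^2, the energy of one frame at one of the
    selected frequency bins inside the first band.
    """
    plane = []
    for frame in frames:
        n = len(frame)
        row = []
        for k in bin_indices:
            # naive DFT at bin k (O(n) per bin, fine for a sketch)
            X = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame))
            row.append(abs(X) ** 2)
        plane.append(row)
    return plane
```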
In one embodiment of the present application, the controlling the echo canceller based on the audio characteristic of the audio information in the first frequency band includes:
determining that echo information exists under the condition that the time domain characteristic value of the audio information in the first frequency band is smaller than a first threshold value and the frequency domain characteristic value is larger than a second threshold value, and controlling to start the echo canceller;
and/or,
and under the condition that the time domain characteristic value of the audio information in the first frequency band is not smaller than a first threshold value and the frequency domain characteristic value is not larger than a second threshold value, determining that echo information does not exist, and controlling to close the echo canceller.
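The dual-threshold decision above can be sketched as follows. The claims only cover the two listed conditions; treating any other combination as "keep the canceller in its current state" is an assumption, as are the function and threshold names:

```python
def control_aec(time_feat, freq_feat, t1, t2):
    """Decide the echo canceller state from the two band features.

    Echo present  -> time-domain value below t1 AND freq-domain value above t2.
    Echo absent   -> time-domain value >= t1 AND freq-domain value <= t2.
    Otherwise     -> hold the current state (assumption; not in the claims).
    """
    if time_feat < t1 and freq_feat > t2:
        return "enable"   # echo detected: turn the canceller on
    if time_feat >= t1 and freq_feat <= t2:
        return "disable"  # no echo: turn the canceller off
    return "hold"
```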
In one embodiment of the application, the method further comprises:
mixing the reference audio with the audio information to be played currently to obtain mixed audio information; and playing the mixed audio information.
In one embodiment of the application, the first frequency band is a frequency band of 14850Hz to 15150 Hz.
In a second aspect of the embodiments of the present application, there is provided an electronic device, including:
the pick-up is used for collecting audio information; wherein, the audio information comprises reference audio;
the processor is used for extracting audio characteristics of a first frequency band where the reference audio is located in the audio information to obtain the audio characteristics of the audio information in the first frequency band; the echo canceller is controlled based on the audio characteristics of the audio information in the first frequency band.
In one embodiment of the present invention, the processor is configured to obtain a time domain feature value and a frequency domain feature value of the audio information in the first frequency band where the reference audio is located; and taking the time domain characteristic value and the frequency domain characteristic value as the audio frequency characteristic of the audio information in the first frequency band.
In one embodiment of the present invention, the processor is configured to obtain an energy peak value and two energy trough values of the i-th frame of audio information in the first frequency band; and determining the peak-to-valley ratio of the ith frame of audio information in a first frequency band based on the energy peak value and the two energy trough values, and taking the peak-to-valley ratio of the ith frame of audio information in the first frequency band as a frequency domain characteristic value of the ith frame of audio information in the first frequency band.
In one embodiment of the present invention, the energy peak in the first frequency band is: the energy value corresponding to the first frequency point in the first frequency band;
the two energy trough values are:
the energy value corresponding to a first adjacent frequency point obtained by increasing the preset bandwidth value by taking the first frequency point as the center in the first frequency band, and the energy value corresponding to a second adjacent frequency point which reduces the preset bandwidth value by taking the first frequency point as the center in the first frequency band;
Or,
a first energy trough value in a frequency band within the first frequency band that is greater than the first frequency point, and a first energy trough value in a frequency band within the first frequency band that is less than the first frequency point.
In one embodiment of the invention, the pickup is further used for acquiring L frames of audio information; wherein the L frames of audio information include: the i-th frame of audio information and the L-1 frames of audio information before the i-th frame; L is an integer greater than or equal to 1;
the processor is used for determining an energy average value of the L-frame audio information and a maximum energy value in the L-frame audio information; and determining a peak fluctuation value of the ith frame of audio information based on the energy average value and the maximum energy value of the L frame of audio information and the energy value of each frame of audio information in the L frame of audio information at a first frequency point, and taking the peak fluctuation value of the ith frame of audio information as a time domain characteristic value of the ith frame of audio information in a first frequency band.
In one embodiment of the present invention, the processor is configured to convert L frames of audio information to obtain a frequency domain signal of each frame of audio information in the L frames of audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
Determining an energy representation of each of the L frames of audio information based on a frequency domain signal of each of the frames of audio information;
constructing a feature plane containing the L-frame audio information based on the energy representation of each frame of the L-frame audio information;
wherein the feature plane comprises: the energy value of at least one frequency point of each frame of audio information in the L frames of audio information in the first frequency band; the at least one frequency point comprises a first frequency point.
In one embodiment of the present invention, the processor is configured to determine that echo information exists and control to turn on the echo canceller when the time domain feature value of the audio information in the first frequency band is smaller than a first threshold value and the frequency domain feature value is larger than a second threshold value;
and/or,
and the processor is used for determining that echo information does not exist and controlling to close the echo canceller under the condition that the time domain characteristic value of the audio information in the first frequency band is not smaller than a first threshold value and the frequency domain characteristic value is not larger than a second threshold value.
In one embodiment of the invention, the electronic device further comprises:
the audio mixer is used for mixing the reference audio with the current audio information to be played to obtain mixed audio information;
and the loudspeaker is used for playing the mixed audio information.
In one embodiment of the application, the first frequency band is the frequency band from 14850 Hz to 15150 Hz.
In a third aspect of the embodiment of the present application, there is provided an echo processing device, including:
the audio acquisition unit is used for acquiring audio information; wherein, the audio information comprises reference audio;
the feature extraction unit is used for extracting audio features of a first frequency band where the reference audio is located in the audio information to obtain the audio features of the audio information in the first frequency band;
and the echo cancellation AEC control unit is used for controlling the echo canceller based on the audio characteristics of the audio information in the first frequency band.
In one embodiment of the present application, the feature extraction unit is configured to obtain a time domain feature value and a frequency domain feature value of the audio information in the first frequency band where the reference audio is located; and taking the time domain characteristic value and the frequency domain characteristic value as the audio frequency characteristic of the audio information in the first frequency band.
In one embodiment of the present application, the feature extraction unit is configured to obtain an energy peak value and two energy trough values of the i-th frame of audio information in the first frequency band; and determining the peak-to-valley ratio of the ith frame of audio information in a first frequency band based on the energy peak value and the two energy trough values, and taking the peak-to-valley ratio of the ith frame of audio information in the first frequency band as a frequency domain characteristic value of the ith frame of audio information in the first frequency band.
In one embodiment of the present invention, the energy peak in the first frequency band is: the energy value corresponding to the first frequency point in the first frequency band;
the two energy trough values are:
the energy value corresponding to a first adjacent frequency point obtained by increasing the preset bandwidth value by taking the first frequency point as the center in the first frequency band, and the energy value corresponding to a second adjacent frequency point which reduces the preset bandwidth value by taking the first frequency point as the center in the first frequency band;
or,
a first energy trough value in a frequency band within the first frequency band that is greater than the first frequency point, and a first energy trough value in a frequency band within the first frequency band that is less than the first frequency point.
In one embodiment of the present invention, the feature extraction unit is configured to obtain L-frame audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
determining an energy average value of the L-frame audio information and a maximum energy value in the L-frame audio information;
and determining a peak fluctuation value of the ith frame of audio information based on the energy average value and the maximum energy value of the L frame of audio information and the energy value of each frame of audio information in the L frame of audio information at a first frequency point, and taking the peak fluctuation value of the ith frame of audio information as a time domain characteristic value of the ith frame of audio information in a first frequency band.
In one embodiment of the invention, the feature extraction unit is configured to
Converting the L-frame audio information to obtain a frequency domain signal of each frame of audio information in the L-frame audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
determining an energy representation of each of the L frames of audio information based on a frequency domain signal of each of the frames of audio information;
constructing a feature plane containing the L-frame audio information based on the energy representation of each frame of the L-frame audio information;
wherein the feature plane comprises: the energy value of at least one frequency point of each frame of audio information in the L frames of audio information in the first frequency band; the at least one frequency point comprises a first frequency point.
In one embodiment of the present invention, the AEC control unit is configured to determine that echo information exists and control to turn on the echo canceller when the time domain feature value of the audio information in the first frequency band is smaller than a first threshold value and the frequency domain feature value is larger than a second threshold value;
and/or,
and the AEC control unit is used for determining that echo information does not exist and controlling the echo canceller to be closed under the condition that the time domain characteristic value of the audio information in the first frequency band is not smaller than the first threshold value and the frequency domain characteristic value is not larger than the second threshold value.
In one embodiment of the application, the apparatus further comprises:
the audio mixing unit is used for mixing the reference audio with the current audio information to be played to obtain mixed audio information;
and the audio output unit is used for playing the mixed audio information.
In one embodiment of the application, the first frequency band is a frequency band of 14850Hz to 15150 Hz.
In a fourth aspect of embodiments of the present application, there is provided a computing device comprising:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods provided by any of the embodiments of the present application.
In a fifth aspect of an embodiment of the present application, there is provided a medium including:
which stores a computer program, characterized in that the program, when being executed by a processor, implements the method provided by any of the embodiments of the application.
According to the embodiment of the invention, audio information containing the reference audio is collected, the audio features of the first frequency band are extracted, and the echo canceller is controlled according to the extracted audio features of the audio information in the first frequency band. In this way, audio outside the first frequency band is left unaffected, so the audio signal is not damaged; and because the features are extracted from the first frequency band containing the reference audio, the characteristics of the audio information can be obtained more accurately, which in turn ensures the accuracy of echo cancellation when the canceller is controlled.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
fig. 1 schematically shows a flow chart of an implementation of an echo processing method according to an embodiment of the invention;
FIG. 2 schematically illustrates a schematic diagram of an ear perception domain according to an embodiment of the present invention;
fig. 3 schematically shows a characteristic plan view in a schematic echo detection method according to an embodiment of the present invention;
FIG. 4 schematically illustrates an AEC process schematic according to an embodiment of the invention;
FIG. 5 schematically illustrates a media schematic according to an embodiment of the invention;
fig. 6 schematically shows a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 7 schematically shows a schematic structural diagram of an echo detection device according to an embodiment of the present invention;
FIG. 8 schematically illustrates a computing device architecture diagram according to an embodiment of the invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable those skilled in the art to better understand and practice the invention and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the invention may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the invention, an echo processing method, an echo processing device, an echo processing medium and a computing device are provided.
Any number of elements in the figures are for illustration and not limitation, and any naming is used for distinction only, and not for any limiting sense.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments thereof.
Summary of The Invention
AEC is a signal processing technology whose function is to cancel the echo signal in a communication system, ensure that the speaker is not disturbed by echo, and improve call quality. The inventors have found that many AEC algorithms perform very well in an ideal environment, such as an anechoic chamber. However, because audio signals in real environments are complex, almost no AEC algorithm can simultaneously achieve both clean echo cancellation and a lossless near-end signal in practice.
In view of this, the present invention provides an echo processing method, apparatus, medium and computing device, which collect audio information containing a reference audio, extract the audio features of a first frequency band, and control an echo canceller according to the extracted audio features of the audio information in the first frequency band. In this way, audio outside the first frequency band is left unaffected, so the audio signal is not damaged; and because the features are extracted from the first frequency band containing the reference audio, the characteristics of the audio information can be obtained more accurately, ensuring the accuracy of echo cancellation when the canceller is controlled.
Having described the basic principles of the present invention, various non-limiting embodiments of the invention are described in detail below.
Exemplary method
A first aspect of the present invention provides an echo processing method, which is described below with reference to fig. 1, according to an exemplary embodiment of the present invention, including:
s101: collecting audio information; wherein, the audio information comprises reference audio;
s102: extracting audio characteristics of a first frequency band where the reference audio is located in the audio information to obtain the audio characteristics of the audio information in the first frequency band;
s103: and controlling the echo canceller based on the audio characteristics of the audio information in the first frequency band.
The scheme provided by the embodiment can be applied to electronic equipment, and the electronic equipment at least needs to be provided with a microphone, a processor and the like. Further, the electronic apparatus may further have an audio output function or be capable of connecting to a device having an audio output function, where the audio output function may be realized by a speaker, an earphone, or the like. In an example, the electronic device may be any one of a mobile phone, a tablet computer, a notebook computer, and the like.
In this embodiment, the reference audio may be mixed with the audio information to be played in advance.
The reference audio may be noise, specifically noise at a first frequency point within the first frequency band. In a preferred example, the first frequency band may be 14850 Hz to 15150 Hz; of course, in practice the first frequency band may be wider or narrower, and this embodiment does not enumerate all options. The first frequency point may likewise be selected according to the actual situation; in a preferred example it is the 15 kHz frequency point, although any other frequency point in the first frequency band may be chosen as the first frequency point, i.e. the frequency point at which the noise is injected.
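A minimal sketch of injecting such a reference audio, assuming the reference is a low-level 15 kHz sinusoid added to the playback samples (the sample rate, amplitude, and function name are illustrative choices; the patent only requires a tone inside the 14850-15150 Hz band):

```python
import math

def mix_reference_tone(samples, sample_rate=48000, freq=15000.0, amplitude=0.01):
    """Add a low-amplitude 15 kHz sinusoid to the playback samples.

    samples: time-domain playback samples (floats in [-1, 1]).
    A 48 kHz sample rate comfortably represents a 15 kHz tone; the small
    amplitude keeps the tone below the ear's perception threshold at
    that frequency.
    """
    return [s + amplitude * math.sin(2 * math.pi * freq * t / sample_rate)
            for t, s in enumerate(samples)]
```

The mixed signal is what the speaker plays; the microphone-side detector then looks for the tone's energy signature in the first band to decide whether echo is present.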
Further, the factors considered when selecting the frequency point of the reference audio may include: first, the noise should not be perceived by the human ear; second, the algorithm should be robust when using the noise, with features that are easy to extract and not easily disturbed by external noise.
In particular, a single-frequency signal at 15 kHz is not easily noticed by the human ear. As shown in fig. 2, the ordinate represents sound level (dB) and the abscissa represents frequency (Hz). The auditory field of the human ear is the light gray region, and the average perception threshold (Threshold of Audibility) is the curve below it: above this threshold a sound is audible, whereas energy below this line is not perceivable by the human ear. As can be seen from fig. 2, the ear's perceptibility is strongest at 1 kHz-2 kHz and weakens as the frequency moves up or down from there. By 15 kHz, the average perception threshold of the human ear has approached 40 dB SPL (Sound Pressure Level). Injected noise at 15 kHz also does not affect objective indicators of speech quality: POLQA (Perceptual Objective Listening Quality Analysis), designed by the ITU (International Telecommunication Union), covers a basic frequency range of 300-3400 Hz, and its maximum (super-wideband) bandwidth is 50-14000 Hz. From this it can also be seen that 15 kHz injected noise has little effect on speech quality.
Second, the robustness and noise immunity of the algorithm are also very important. Very low frequencies, for example below 80 Hz, are likewise hardly noticeable by the human ear, but the environment contains a great deal of low-frequency noise that would seriously affect feature extraction for the injected noise. A single-frequency signal at 15 kHz is far more robust, because other background interference at this frequency is very rare in the environment. This greatly improves the accuracy of feature extraction.
Finally, most headphones and speakers on mobile devices can still reproduce 15 kHz. If the frequency were increased further, however, the frequency response curve of most devices would roll off, so that the high-frequency sound could not be played normally and feature extraction would be affected.
Therefore, the present embodiment sets the first frequency point to 15 kHz. It should be understood that the first frequency point may be another frequency point within a certain bandwidth around 15 kHz, for example 14.99 kHz or 15.12 kHz; in addition, the reference audio, that is, the noise, may itself have a certain bandwidth, for example a band around 15 kHz, and that bandwidth may be smaller than the bandwidth of the first frequency band, which is not enumerated exhaustively here.
Further, in one scenario, the reference audio may be a mix with audio information to be played in another electronic device that performs audio output.
In another scenario, the reference audio may be a mix with the audio information to be played in the present electronic device, where the present electronic device refers to an electronic device that performs the foregoing S101-S103.
In the scenario where the same electronic device mixes the reference audio and the audio information to be played, before executing S101 the method may further include: mixing the reference audio with the audio information currently to be played to obtain mixed audio information; and playing the mixed audio information.
Here, the selection of the reference audio is already described in the foregoing embodiments, and will not be described in detail; the mixing of the reference audio and the audio information to be played can be realized by a mixer, the reference audio and the audio information to be played are input into the mixer, the mixed audio information output by the mixer is obtained, and then the mixed audio information is played. Playing the mixed audio information may be performed by an audio output unit of the electronic device, such as a speaker of the mobile phone or a headset connected to the mobile phone.
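The mixing step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a 15 kHz sinusoid as the reference audio, numpy float buffers in [-1, 1] as the audio signals, and an illustrative sample rate and attenuation factor.

```python
import numpy as np

def mix_reference_audio(playback, sample_rate=48000, ref_freq=15000.0, ref_gain=0.01):
    """Mix a low-level reference tone into the audio to be played.

    playback : 1-D float array in [-1, 1], the audio information to be played.
    ref_freq : frequency point of the injected noise (15 kHz in the patent's example).
    ref_gain : amplitude of the reference tone (illustrative); kept small so the
               tone stays near the ear's high perception threshold at 15 kHz.
    """
    t = np.arange(len(playback)) / sample_rate
    reference = ref_gain * np.sin(2 * np.pi * ref_freq * t)
    mixed = playback + reference
    # Guard against clipping after mixing.
    return np.clip(mixed, -1.0, 1.0)

# Stand-in for the audio information to be played (a 440 Hz tone).
speech = 0.5 * np.sin(2 * np.pi * 440.0 * np.arange(4800) / 48000)
mixed = mix_reference_audio(speech)
```

The mixed buffer would then be handed to the audio output unit (speaker or headset), as described above.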
The played mixed audio information is transmitted in the sound field, wherein in the process of transmitting the sound field, sounds of other frequencies or frequency bands in the sound field may be mixed.
On the basis of the foregoing description, in the present embodiment S101, audio information is collected, and the audio information may include injection noise designed in advance, that is, reference audio. In addition, the audio information may include sounds of other frequencies or frequency bands mixed in the sound field transmission.
With the above processing of adding the reference audio at the first frequency point of the first frequency band, and since this embodiment selects the first frequency band within 14850 Hz to 15150 Hz, the influence on voice quality is small, so the voice quality of the user's normal conversation is not affected. Moreover, because the reference audio is mixed in this band, little other background noise is introduced, which avoids interference with the reference audio and ensures that feature extraction for the first frequency band is more accurate.
After the injected noise (i.e., the reference audio) is collected, the aforementioned S102-S103 are performed, i.e., the determination of "whether there is echo" can be performed through feature extraction.
Before performing the process of acquiring the audio feature in step S102, a feature plane needs to be constructed, and the process of constructing the feature plane may include:
converting the L-frame audio information to obtain a frequency domain signal of each frame of audio information in the L-frame audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
determining an energy representation of each of the L frames of audio information based on a frequency domain signal of each of the frames of audio information;
constructing a feature plane containing the L-frame audio information based on the energy representation of each frame of the L-frame audio information;
Wherein the feature plane comprises: the energy value of at least one frequency point of each frame of audio information in the L frames of audio information in the first frequency band; the at least one frequency point comprises a first frequency point.
Here, the i-th frame of audio information may be currently acquired audio information or currently analyzed audio information.
The L-frame audio information may be the i-th frame audio and L-1 frame audio information preceding the i-th frame audio information.
Converting the L frames of audio information to obtain a frequency domain signal for each frame may be done in either of two ways: each frame of audio information may be converted into a frequency domain signal as it is acquired, with the frequency domain signals of the L frames simply retrieved when the i-th frame of audio information is subsequently processed; or, when processing is needed, the L frames of audio information may be obtained and converted one by one to produce the frequency domain signal of each frame.
The audio information may be converted into a frequency domain signal by transforming the collected audio information from the time domain to the frequency domain with a fast Fourier transform (FFT); the result for the i-th frame is denoted D_i(ω).
The energy representation of each of the L frames of audio information may be determined from its frequency domain signal by computing the energy of the frequency domain signal in the log domain, illustrated for the i-th frame by equation 1:
EN_i(ω) = 10·log10( D_i(e^{jω}) · D_i*(e^{-jω}) )  (equation 1)
where EN_i(ω) represents the energy of the frequency domain signal D_i(ω) of the i-th frame in the log domain (dB Full Scale, dBFS, level relative to full scale), D_i(e^{jω}) is the frequency domain signal, and D_i*(e^{-jω}) is its complex conjugate.
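The per-frame log-domain energy can be sketched with numpy's FFT; a minimal illustration, assuming a Hann-windowed frame, a 48 kHz sample rate, and a small floor added before the logarithm to avoid log(0) — none of which are prescribed by the patent text:

```python
import numpy as np

def frame_energy_db(frame):
    """Log-domain energy EN_i(w) per FFT bin: 10*log10(D * conj(D)), cf. equation 1."""
    D = np.fft.rfft(frame * np.hanning(len(frame)))
    power = (D * np.conj(D)).real          # D_i(e^{jw}) * D_i*(e^{-jw}) = |D|^2
    return 10.0 * np.log10(power + 1e-12)  # small floor avoids log(0)

fs, n = 48000, 1024
t = np.arange(n) / fs
frame = np.sin(2 * np.pi * 15000.0 * t)    # frame dominated by the 15 kHz tone
en = frame_energy_db(frame)
peak_bin = int(np.argmax(en))
peak_freq = peak_bin * fs / n              # FFT bin index -> frequency in Hz
```

For a frame dominated by the injected 15 kHz tone, the maximum-energy bin falls at the first frequency point, which is what the later feature extraction relies on.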
Based on the energy representation of each of the L frames of audio information, a Feature plane (Feature Surface) containing the L frames of audio information is constructed, as follows:
from EN (ω) of each frame of the L-frame audio information, a feature plane can be constructed, which contains: the energy value of at least one frequency point of each frame of audio information in the L frames of audio information in the first frequency band; the at least one frequency point comprises a first frequency point.
For example, as shown in fig. 3, the feature plane contains the energy values of at least one frequency point within the first frequency band, 14850-15150 Hz. In fig. 3, the x-axis of the feature plane represents frequency, the y-axis represents the frame index (the time axis), and the z-axis represents the energy level (dBFs).
By calculating the feature plane, the energy value of each frequency point within the first frequency band can be obtained along both the frequency and time dimensions.
Calculating the feature plane thus covers more frequency point energy values and provides more data for subsequent calculation, improving its efficiency; and because the feature plane only needs to cover L frames of audio information in time, and for each frame only the energy values of the frequency points in the first frequency band, it neither generates an excessive amount of computation nor occupies excessive computing resources.
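The feature plane construction can be sketched as an L x K matrix of per-frame, per-bin log energies restricted to the first frequency band. This is a minimal illustration under assumed parameters (48 kHz sample rate, 1024-sample frames, Hann window); the patent only fixes the 14850-15150 Hz band:

```python
import numpy as np

def build_feature_plane(frames, fs=48000, band=(14850.0, 15150.0)):
    """Feature plane: rows = frames (time axis), columns = FFT bins inside `band`,
    values = log-domain energy, cf. the x/y/z axes of fig. 3."""
    n = len(frames[0])
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    rows = []
    for frame in frames:
        D = np.fft.rfft(frame * np.hanning(n))
        en = 10.0 * np.log10((D * np.conj(D)).real + 1e-12)
        rows.append(en[mask])
    return np.array(rows), freqs[mask]

fs, n, L = 48000, 1024, 8
t = np.arange(n) / fs
frames = [np.sin(2 * np.pi * 15000.0 * t) for _ in range(L)]
plane, band_freqs = build_feature_plane(frames, fs)
```

Slicing this matrix along a row gives the frequency-domain view used below for the peak-to-valley ratio; slicing along a column at the first frequency point gives the time-domain view used for the peak fluctuation.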
After the feature plane is constructed, the feature value extraction is started. The eigenvalues include two dimensions, a frequency domain eigenvalue and a time domain eigenvalue, which may correspond to the x-axis data and the y-axis data in the feature plane of fig. 3, respectively.
And when the step S102 is executed, extracting the audio characteristics of the first frequency band where the reference audio is located in the acquired audio information to obtain the audio characteristics of the audio information in the first frequency band. This can be achieved by:
acquiring a time domain characteristic value and a frequency domain characteristic value of the audio information in the first frequency band where the reference audio is located;
and taking the time domain characteristic value and the frequency domain characteristic value as the audio frequency characteristic of the audio information in the first frequency band.
The determining method for the frequency domain characteristic value may include:
acquiring an energy peak value and two energy trough values of the audio information of the ith frame in a first frequency band;
and determining a peak-to-valley ratio of the i-th frame of audio information in the first frequency band based on the energy peak value and the two energy trough values, and taking that peak-to-valley ratio as the frequency domain feature value of the i-th frame of audio information in the first frequency band.
Specifically, the mode of obtaining the frequency domain eigenvalue can be calculated by the following formula 2:
PT_i = [EN_i(k_0) − EN_i(k_L)] · [EN_i(k_0) − EN_i(k_R)] / EN_i(k_0)²  (equation 2)
where k_0 represents the frequency point corresponding to the peak (15000 Hz), k_L represents the frequency point corresponding to one trough, and k_R represents the frequency point corresponding to the other trough. PT_i denotes the peak-to-valley ratio at the i-th frame and lies in the range (0, 1): the closer it is to 1, the larger the peak; the closer to 0, the smaller the peak.
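Equation 2 can be sketched directly once the three energy values are known. The numbers below are illustrative log-domain energies only (chosen positive, consistent with the stated (0, 1) range of PT_i), not values from the patent:

```python
def peak_to_valley_ratio(en_peak, en_trough_l, en_trough_r):
    """PT_i = [EN_i(k0) - EN_i(kL)] * [EN_i(k0) - EN_i(kR)] / EN_i(k0)^2 (equation 2)."""
    return ((en_peak - en_trough_l) * (en_peak - en_trough_r)) / (en_peak ** 2)

# Illustrative: a 60 dB peak at 15000 Hz between 20 dB troughs at 14950/15050 Hz.
pt = peak_to_valley_ratio(60.0, 20.0, 20.0)  # -> (40*40)/3600, about 0.444
```

A pronounced 15 kHz peak relative to its neighboring troughs drives PT_i toward 1, matching the interpretation given after equation 2.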
Specifically, on the constructed feature plane, referring to fig. 3, the x-axis represents the frequency variation, the y-axis represents the time axis, and the z-axis represents the energy level. For example, in this embodiment, i=5 is taken, that is, the feature plane in fig. 3 is taken through the section corresponding to y=5, so as to obtain a graph of the change of the energy value of the audio information in the 5 th frame. In the graph, the energy value corresponding to the 5 th frame of audio information in the z-axis changes along with the change of the x-axis frequency, so that a series of energy peaks and energy troughs are obtained, the peak-to-valley ratio of the 5 th frame of audio information in a first frequency band is determined according to the energy peaks and the two energy troughs, and the peak-to-valley ratio of the 5 th frame of audio information in the first frequency band is used as the frequency domain characteristic value of the 5 th frame of audio information in the first frequency band.
The energy peak value in the first frequency band is: the energy value corresponding to the first frequency point in the first frequency band;
the two energy trough values are:
the energy value corresponding to a first adjacent frequency point obtained by increasing the preset bandwidth value by taking the first frequency point as the center in the first frequency band, and the energy value corresponding to a second adjacent frequency point which reduces the preset bandwidth value by taking the first frequency point as the center in the first frequency band;
or,
a first energy trough value in a frequency band within the first frequency band that is greater than the first frequency point, and a first energy trough value in a frequency band within the first frequency band that is less than the first frequency point.
Specifically, a first frequency point is selected in the first frequency band. The first frequency point needs to meet specific conditions: for example, noise at that frequency point should not be easily perceived by the human ear, while under the algorithm described in the present invention the noise should be robust and its features easy to extract. Noise at 15000 Hz is not easily perceived, is robust under the algorithm of the invention, and has features that are easy to extract, so 15000 Hz can be selected as the first frequency point. In fig. 3, let x=15000 and y=5, and take the energy value corresponding to the first frequency point in the 5th frame as the energy peak value in the first frequency band.
And selecting a specific preset frequency bandwidth by taking the first frequency point as the center, so as to obtain two energy trough values. For example, x=15000 is still selected as the first frequency point, and the preset frequency bandwidth is 50Hz. In the first frequency band, a preset bandwidth value of 50Hz is increased by taking 15000Hz as a center to obtain an energy value corresponding to 15050Hz of a first adjacent frequency point as an energy trough value, and in the first frequency band, the preset bandwidth value of 50Hz is reduced by taking the first frequency point of 15000Hz as the center to obtain an energy value corresponding to 14950Hz of a second adjacent frequency point as another energy trough value.
Or,
still, 50Hz is selected as a preset bandwidth, and in the first frequency band, the energy trough value in the frequency band corresponding to 15000 Hz-15050 Hz is used as one energy trough value, and the energy trough value in the frequency band corresponding to 14950 Hz-15000 Hz is used as the other energy trough value.
Using the values in the above embodiment, the corresponding frequency domain feature value PT_i can be calculated with equation (2). For example, substituting i=5, k_0=15000, k_L=14950, and k_R=15050 into equation (2) yields:
PT_5 = [EN_5(15000) − EN_5(14950)] · [EN_5(15000) − EN_5(15050)] / EN_5(15000)²
where EN_5(15000) represents the peak value corresponding to the 5th frame of audio information, EN_5(14950) and EN_5(15050) represent the two corresponding trough values, and PT_5 denotes the peak-to-valley ratio at the 5th frame, that is, the frequency domain feature value corresponding to the 5th frame of audio information. PT_5 lies in the range (0, 1): the closer its value is to 1, the larger the frequency domain feature value; the closer to 0, the smaller.
The peak-to-valley ratio is calculated to obtain the relative value of the highest peak value and the valley energy value, so that the characteristic of the audio information can be more accurately and stably represented compared with the frequency domain characteristic value of the audio information obtained by independently obtaining the energy value of a certain frequency point.
The determining method of the time domain feature value may include:
acquiring L frames of audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
determining an energy average value of the L-frame audio information and a maximum energy value in the L-frame audio information;
and determining a peak fluctuation value of the ith frame of audio information based on the energy average value and the maximum energy value of the L frame of audio information and the energy value of each frame of audio information in the L frame of audio information at a first frequency point, and taking the peak fluctuation value of the ith frame of audio information as a time domain characteristic value of the ith frame of audio information in a first frequency band.
Specifically, on the constructed feature plane, referring to fig. 3, the x-axis represents frequency, the y-axis the time axis, and the z-axis the energy level. For example, in this embodiment the first frequency point is 15000 Hz; that is, the feature plane in fig. 3 is cut through the section corresponding to x=15000 to obtain a graph of the energy value of the audio information at the first frequency point over time. In this graph, the energy value at the first frequency point on the z-axis changes along the y-axis (time), giving a series of energy peak values; the peak maximum EN_max and the peak average are obtained by statistics, and the peak fluctuation information is then determined from the energy peak values and used as the time domain feature value Pj_i at the corresponding time.
Pj_i is given by formula (3), where ĒN represents the average energy over the L-frame range, EN_max represents the maximum energy of one frame in the L-frame range, and Pj_i represents the peak fluctuation of the i-th frame over that range. The smaller the value of Pj_i, the smaller the fluctuation and the higher the reliability of the echo detection.
Using a fluctuation value computed from the average energy of L temporally related frames of audio information as the time domain feature value avoids relying on the frequency domain feature value alone as the criterion for judging whether echo exists, and thus avoids misjudging echo because of a sudden change in the frequency domain feature value. This improves the stability of the system and makes it more robust.
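Formula (3) appears only as an image in the original patent, so its exact form is not recoverable from this text. The sketch below therefore assumes one plausible definition, purely for illustration: the deviation of the maximum frame energy from the L-frame average, normalized by that average, which matches the stated inputs (average energy, maximum energy) and the stated interpretation (smaller value = less fluctuation):

```python
def peak_fluctuation(frame_energies):
    """Assumed stand-in for formula (3): Pj_i = (EN_max - EN_avg) / EN_avg,
    over the energies at the first frequency point of the last L frames.
    Smaller Pj_i => smaller fluctuation => more reliable echo detection."""
    en_avg = sum(frame_energies) / len(frame_energies)
    en_max = max(frame_energies)
    return (en_max - en_avg) / en_avg

steady = [50.0, 51.0, 49.0, 50.0]   # stable 15 kHz energy across frames
bursty = [50.0, 90.0, 20.0, 40.0]   # strongly fluctuating energy
pj_steady = peak_fluctuation(steady)
pj_bursty = peak_fluctuation(bursty)
```

Under this assumed definition, a stable injected-tone energy track yields a small Pj_i and a bursty one a large Pj_i, which is the behavior the surrounding text relies on.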
After the detection is completed, step S103 is performed: when echo is present, the echo canceller is opened as normal; when no echo is present, the echo canceller is closed, reducing damage to the audio.
The feature value comprises a frequency domain feature value and a time domain feature value: the frequency domain feature value PT_i characterizes the echo magnitude, while the time domain feature value Pj_i represents the peak fluctuation of the i-th frame over the L-frame range. The smaller the value of Pj_i, the smaller the fluctuation, which in turn indicates a higher reliability of the echo detection.
According to one embodiment of the present invention, when step S103 is performed, the controlling the echo canceller based on the audio characteristic of the audio information in the first frequency band includes:
determining that echo information exists under the condition that the time domain characteristic value of the audio information in the first frequency band is smaller than a first threshold value and the frequency domain characteristic value is larger than a second threshold value, and controlling to start the echo canceller;
and/or the number of the groups of groups,
and under the condition that the time domain characteristic value of the audio information in the first frequency band is not smaller than a first threshold value and the frequency domain characteristic value is not larger than a second threshold value, determining that echo information does not exist, and controlling to close the echo canceller.
Specifically, through analysis of a large amount of experimental data, two thresholds T_p (the second threshold) and T_j (the first threshold) can be obtained, corresponding to the frequency domain feature value PT_i and the time domain feature value Pj_i respectively. The echo detection criterion is determined by equation (4):
Decision = echo present, if PT_i > T_p and Pj_i < T_j; no echo, otherwise  (equation 4)
That is, when the extracted frequency domain feature value PT_i is greater than the threshold T_p and the extracted time domain feature value Pj_i is less than the threshold T_j, the echo detection result (Decision) confirms that the noise exists, and the Decision result is sent to the AEC module to dynamically open the AEC switch. Conversely, when PT_i is not greater than T_p or Pj_i is not less than T_j, the echo detection result confirms that the noise is absent, within the tolerance limit, or small, and the Decision result is sent to the AEC module to dynamically close the AEC switch.
Since most AECs are simply turned on or off, determining whether to open or close the AEC with thresholds makes the method applicable to most AEC implementations. Combining the peak-to-valley ratio as the frequency domain feature value with the fluctuation value as the time domain feature value makes the characterization of the audio information more stable, so judging the two feature values against their corresponding thresholds controls the switch accurately.
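The thresholded decision and the resulting switch control can be sketched as follows; the threshold values here are illustrative placeholders, since the patent obtains T_p and T_j from experimental data:

```python
def aec_should_be_on(pt_i, pj_i, t_p=0.5, t_j=0.1):
    """Echo is deemed present -- and the AEC switch opened -- when
    PT_i > T_p and Pj_i < T_j; otherwise the AEC switch is closed."""
    return pt_i > t_p and pj_i < t_j

on = aec_should_be_on(pt_i=0.9, pj_i=0.02)    # strong, stable 15 kHz feature
off = aec_should_be_on(pt_i=0.2, pj_i=0.02)   # weak peak: no echo detected
```

The boolean result plays the role of the Decision sent to the AEC module to dynamically open or close the AEC switch.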
AEC (Acoustic Echo Cancellation) is a signal processing technique whose function is to eliminate echo signals in a communication system, ensuring that a speaker is not disturbed by echo and improving call quality.
In particular, the processing of echo cancellation of an echo canceller may comprise:
Fig. 4 illustrates the structure of the AEC module in a one-to-one RTC (Real-Time Communication) scenario. x represents the far-end signal, the signal transmitted from the other device over the network. d represents the near-end signal, the signal picked up by the microphone; it contains not only the local sound field signal (the voice signal plus noise) but also the far-end signal played by the local speaker. The aim of the AEC module is to cancel the echo part y in the signal d without damaging the local sound field signal. The core is to estimate the y signal with an Adaptive Filter and then cancel the echo in the signal d.
The algorithmic implementation in echo cancellation is a core of the adaptive filter design. In the processing, the signals of the main channel and the reference channel which are converted into digital signals are processed in a high-speed signal processor according to an adaptive filtering algorithm, and the signals are sent to an output module of the system after the processing is completed. And in the signal output module, an analog signal is obtained through a digital-to-analog converter, and then is subjected to low-pass filtering and is sent to a loudspeaker for output, so that a voice signal after echo cancellation is obtained.
Still more specifically, an adaptive filter is a device that processes an input signal and learns continuously until it reaches a desired value. Even when the input signal is non-stationary, the adaptive filter can continuously adjust its weight vector according to the environment so that the algorithm reaches a specific convergence condition, realizing the adaptive filtering process. Adaptive filters may be classified by input signal type into analog filters and discrete filters. Discrete filters may use digital filters, which fall structurally into infinite impulse response (IIR) filters, whose output depends not only on past and present inputs but also on past outputs, and finite impulse response (FIR) filters, whose output depends on a finite number of past and present inputs. To give the adaptive filter greater stability, with enough filter coefficients to adjust toward a particular convergence criterion, a transversal FIR filter is typically selected for echo cancellation.
In addition, the echo cancellation algorithm may be the Least Mean Square (LMS) algorithm, the Normalized Least Mean Square (NLMS) algorithm, and the like, which this embodiment does not enumerate exhaustively.
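As a sketch of the adaptive-filtering core, here is a minimal NLMS echo canceller with a transversal FIR structure. The filter length, step size, and synthetic echo path are illustrative assumptions, not parameters from the patent:

```python
import numpy as np

def nlms_echo_cancel(far_end, near_end, num_taps=32, mu=0.5, eps=1e-8):
    """Cancel the echo of `far_end` contained in `near_end` using NLMS.
    Returns the error signal e = d - y_hat, i.e. the echo-cancelled output."""
    w = np.zeros(num_taps)              # transversal FIR filter weights
    x_buf = np.zeros(num_taps)          # most recent far-end samples
    e = np.zeros(len(near_end))
    for n in range(len(near_end)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        y_hat = w @ x_buf               # estimated echo y
        e[n] = near_end[n] - y_hat
        # Normalized LMS weight update.
        w += (mu / (x_buf @ x_buf + eps)) * e[n] * x_buf
    return e

rng = np.random.default_rng(0)
x = rng.standard_normal(4000)                 # far-end signal
h = np.array([0.6, 0.0, -0.3, 0.1])           # synthetic echo path
d = np.convolve(x, h)[:len(x)]                # near-end signal = pure echo here
e = nlms_echo_cancel(x, d)
residual = float(np.mean(e[-500:] ** 2))      # residual echo power after convergence
```

With the near-end signal consisting only of echo, the adaptive filter converges toward the echo path h and the residual error approaches zero, which is the cancellation behavior described above.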
Exemplary Medium
Having described the method of an exemplary embodiment of the present invention, a medium of an exemplary embodiment of the present invention will be described with reference to fig. 5.
In some possible embodiments, the aspects of the present invention may also be implemented as a computer-readable medium having stored thereon a program for implementing the steps in the echo processing method according to the various exemplary embodiments of the present invention described in the above section of the "exemplary method" of the present specification when the program is executed by a processor.
Specifically, the processor is configured to implement the following steps when executing the program:
collecting audio information; wherein, the audio information comprises reference audio;
extracting audio characteristics of a first frequency band where the reference audio is located in the audio information to obtain the audio characteristics of the audio information in the first frequency band;
and controlling the echo canceller based on the audio characteristics of the audio information in the first frequency band.
It should be noted that the medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM or flash memory), optical fiber, portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As shown in fig. 5, a medium 50 in accordance with an embodiment of the present invention is depicted that may employ a portable compact disc read-only memory (CD-ROM) and that includes a program and that may run on a device. However, the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take many forms, including, but not limited to: electromagnetic signals, optical signals, or any suitable combination of the preceding. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user's computing device through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).
Exemplary electronic device
Having described the method of an exemplary embodiment of the present invention, an electronic device of an exemplary embodiment of the present invention is described next with reference to fig. 6.
A second aspect of the present invention provides an electronic device 100 relating to echo processing, as shown in fig. 6, comprising:
a sound pickup 101 for collecting audio information; wherein, the audio information comprises reference audio;
the processor 102 is configured to extract audio features of a first frequency band where the reference audio is located in the audio information, so as to obtain audio features of the audio information in the first frequency band; and controlling the echo canceller based on the audio characteristics of the audio information in the first frequency band.
In one embodiment, the processor 102 in the electronic device is configured to obtain a time domain feature value and a frequency domain feature value of the audio information in the first frequency band where the reference audio is located; and taking the time domain characteristic value and the frequency domain characteristic value as the audio frequency characteristic of the audio information in the first frequency band.
In one embodiment, the processor 102 in the electronic device is configured to obtain an energy peak value and two energy trough values of the i-th frame of audio information in the first frequency band; and determining the peak-to-valley ratio of the ith frame of audio information in a first frequency band based on the energy peak value and the two energy trough values, and taking the peak-to-valley ratio of the ith frame of audio information in the first frequency band as a frequency domain characteristic value of the ith frame of audio information in the first frequency band.
In one embodiment, the energy peak in the first frequency band in the electronic device is: the energy value corresponding to the first frequency point in the first frequency band;
the two energy trough values are:
the energy value corresponding to a first adjacent frequency point obtained by increasing the preset bandwidth value by taking the first frequency point as the center in the first frequency band, and the energy value corresponding to a second adjacent frequency point which reduces the preset bandwidth value by taking the first frequency point as the center in the first frequency band;
or,
a first energy trough value in a frequency band within the first frequency band that is greater than the first frequency point, and a first energy trough value in a frequency band within the first frequency band that is less than the first frequency point.
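The peak and trough extraction described above can be sketched in code. This is an illustrative reading of the embodiment only: the function name, the 50 Hz trough offset, and the FFT parameters are assumptions, not taken from the patent.

```python
import numpy as np

def peak_to_valley_ratio(spectrum, freqs, band=(14850.0, 15150.0), guard_hz=50.0):
    """Frequency-domain feature for one frame: ratio of the band's energy peak
    to the two neighboring energy troughs (guard_hz is an assumed preset bandwidth)."""
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_energy = np.abs(spectrum[in_band]) ** 2
    band_freqs = freqs[in_band]

    peak_idx = int(np.argmax(band_energy))   # "first frequency point": the peak bin
    peak = band_energy[peak_idx]
    f_peak = band_freqs[peak_idx]

    # Troughs: energy at bins offset by +/- the preset bandwidth around the peak.
    upper = band_energy[np.argmin(np.abs(band_freqs - (f_peak + guard_hz)))]
    lower = band_energy[np.argmin(np.abs(band_freqs - (f_peak - guard_hz)))]
    return peak / (0.5 * (upper + lower) + 1e-12)
```

A clean 15 kHz reference tone yields a very large ratio, while a flat (tone-free) spectrum yields a ratio near 1, which is what makes this a usable echo indicator.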
In one embodiment, the pickup 101 in the electronic device is further configured to:
acquiring L frames of audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
the processor 102 is configured to determine an average energy value of the L-frame audio information and a maximum energy value in the L-frame audio information; and determining a peak fluctuation value of the ith frame of audio information based on the energy average value and the maximum energy value of the L frame of audio information and the energy value of each frame of audio information in the L frame of audio information at a first frequency point, and taking the peak fluctuation value of the ith frame of audio information as a time domain characteristic value of the ith frame of audio information in a first frequency band.
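The peak fluctuation computation is described only in terms of its inputs (the window's energy average, its maximum, and each frame's energy at the first frequency point); the exact formula is not given. One plausible realization, stated as an assumption, normalizes the spread between maximum and mean:

```python
import numpy as np

def peak_fluctuation(bin_energies):
    """Time-domain feature over L frames: how much the first-frequency-point
    energy fluctuates across the window. The (max - mean) / mean form is an
    assumed reading, not a formula from the patent."""
    e = np.asarray(bin_energies, dtype=float)
    mean, peak = e.mean(), e.max()
    return (peak - mean) / (mean + 1e-12)
```

A steadily present reference tone gives a value near zero, while a bursty or intermittent tone gives a large value, matching the use of a small time-domain feature as evidence of echo.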
In one embodiment, the processor 102 in the electronic device is configured to convert the L-frame audio information to obtain a frequency domain signal of each frame of audio information in the L-frame audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
determining an energy representation of each of the L frames of audio information based on a frequency domain signal of each of the frames of audio information;
constructing a feature plane containing the L-frame audio information based on the energy representation of each frame of the L-frame audio information;
wherein the feature plane comprises: the energy value of at least one frequency point of each frame of audio information in the L frames of audio information in the first frequency band; the at least one frequency point comprises a first frequency point.
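Building the feature plane — L frames transformed to the frequency domain and their in-band bin energies stacked — might look like the following sketch. The window choice and FFT length are illustrative, not mandated by the text.

```python
import numpy as np

def build_feature_plane(frames, fs, band=(14850.0, 15150.0)):
    """Stack per-bin band energies of L frames into an L x K feature plane
    (rows: frames, columns: in-band frequency points)."""
    frames = np.asarray(frames, dtype=float)      # shape (L, frame_len)
    win = np.hanning(frames.shape[1])
    spec = np.fft.rfft(frames * win, axis=1)      # frequency-domain signal per frame
    freqs = np.fft.rfftfreq(frames.shape[1], 1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return np.abs(spec[:, in_band]) ** 2          # energy representation per frame/bin
```

With 10 ms frames at 48 kHz the bin spacing is 100 Hz, so the 14850–15150 Hz band contributes three columns, and a 15 kHz tone concentrates its energy in the middle one.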
In one embodiment, the processor 102 in the electronic device is configured to determine that echo information exists and control to start the echo canceller when the time domain feature value of the audio information in the first frequency band is smaller than a first threshold value and the frequency domain feature value is larger than a second threshold value;
and/or,
the processor 102 is configured to determine that there is no echo information when the time domain feature value of the audio information in the first frequency band is not less than a first threshold value and the frequency domain feature value is not greater than a second threshold value, and control to turn off the echo canceller.
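The two-threshold control logic above leaves the mixed case (one condition met, the other not) undefined; a minimal sketch that keeps the previous canceller state in that case — an assumption, since the text does not specify it — could read:

```python
def control_aec(time_feature, freq_feature, t1, t2, aec_on):
    """Decide the echo canceller state from the two thresholds in the text."""
    if time_feature < t1 and freq_feature > t2:
        return True      # echo information exists: enable the canceller
    if time_feature >= t1 and freq_feature <= t2:
        return False     # no echo information: disable the canceller
    return aec_on        # ambiguous case: keep current state (assumed behavior)
```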
In one embodiment, the electronic device further comprises:
a mixer 103, configured to mix the reference audio with the audio information to be played currently, so as to obtain mixed audio information;
and a speaker 104 for playing the mixed audio information.
In one embodiment, the first frequency band is a frequency band comprising 14850 Hz to 15150 Hz.
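Injecting a reference audio inside the 14850–15150 Hz band into the playback path, as the mixer and speaker embodiment describes, can be illustrated as follows. The 15 kHz tone frequency and the amplitude are arbitrary example values, not taken from the patent.

```python
import numpy as np

def mix_reference(playback, fs, ref_freq=15000.0, amp=0.01):
    """Mix a low-amplitude reference tone in the first frequency band into the
    audio to be played; the tone is near-inaudible but detectable in the pickup."""
    t = np.arange(len(playback)) / fs
    ref = amp * np.sin(2.0 * np.pi * ref_freq * t)
    mixed = np.asarray(playback, dtype=float) + ref
    return np.clip(mixed, -1.0, 1.0)   # guard against clipping after mixing
```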
Exemplary apparatus
Having described an exemplary electronic device of the present invention, an apparatus according to an exemplary embodiment of the present invention is described next with reference to fig. 7.
A third aspect of an embodiment of the present invention provides an echo processing device 200, as shown in fig. 7, including:
an audio acquisition unit 201 for acquiring audio information; wherein, the audio information comprises reference audio;
a feature extraction unit 202, configured to extract an audio feature of a first frequency band where the reference audio is located in the audio information, so as to obtain an audio feature of the audio information in the first frequency band;
the echo cancellation AEC control unit 203 is configured to control the echo canceller based on the audio characteristic of the audio information in the first frequency band.
In one embodiment, the feature extraction unit 202 is configured to obtain a time domain feature value and a frequency domain feature value of the audio information in the first frequency band where the reference audio is located; and taking the time domain characteristic value and the frequency domain characteristic value as the audio frequency characteristic of the audio information in the first frequency band.
In one embodiment, the feature extraction unit 202 is configured to obtain an energy peak value and two energy trough values of the i-th frame of audio information in the first frequency band; and determining the peak-to-valley ratio of the ith frame of audio information in a first frequency band based on the energy peak value and the two energy trough values, and taking the peak-to-valley ratio of the ith frame of audio information in the first frequency band as a frequency domain characteristic value of the ith frame of audio information in the first frequency band.
In one embodiment, the energy peaks in the first frequency band are:
the energy value corresponding to the first frequency point in the first frequency band;
the two energy trough values are:
the energy value corresponding to a first adjacent frequency point obtained by increasing the preset bandwidth value by taking the first frequency point as the center in the first frequency band, and the energy value corresponding to a second adjacent frequency point which reduces the preset bandwidth value by taking the first frequency point as the center in the first frequency band;
or,
a first energy trough value in a frequency band within the first frequency band that is greater than the first frequency point, and a first energy trough value in a frequency band within the first frequency band that is less than the first frequency point.
In one embodiment, the feature extraction unit 202 is configured to obtain L frames of audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; L is an integer of 1 or more;
Determining an energy average value of the L-frame audio information and a maximum energy value in the L-frame audio information;
and determining a peak fluctuation value of the ith frame of audio information based on the energy average value and the maximum energy value of the L frame of audio information and the energy value of each frame of audio information in the L frame of audio information at a first frequency point, and taking the peak fluctuation value of the ith frame of audio information as a time domain characteristic value of the ith frame of audio information in a first frequency band.
In one embodiment, the feature extraction unit 202 is configured to:
converting the L-frame audio information to obtain a frequency domain signal of each frame of audio information in the L-frame audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
determining an energy representation of each of the L frames of audio information based on a frequency domain signal of each of the frames of audio information;
constructing a feature plane containing the L-frame audio information based on the energy representation of each frame of the L-frame audio information;
wherein the feature plane comprises: the energy value of at least one frequency point of each frame of audio information in the L frames of audio information in the first frequency band; the at least one frequency point comprises a first frequency point.
In one embodiment, the AEC control unit 203 is configured to determine that echo information exists and control to turn on the echo canceller when the time domain feature value of the audio information in the first frequency band is smaller than a first threshold value and the frequency domain feature value is larger than a second threshold value;
and/or,
the AEC control unit 203 is configured to determine that echo information does not exist and control to turn off the echo canceller when the time domain feature value of the audio information in the first frequency band is not less than a first threshold value and the frequency domain feature value is not greater than a second threshold value.
In one embodiment, the apparatus further comprises:
the mixing unit 204 is configured to mix the reference audio with the audio information to be played currently to obtain mixed audio information;
an audio output unit 205 for playing the mixed audio information.
In one embodiment, the first frequency band is a frequency band of 14850Hz to 15150 Hz.
Exemplary computing device
Having described the method, electronic device, and apparatus of the exemplary embodiments of the present invention, a description of the computing device of the exemplary embodiments of the present invention follows with reference to FIG. 8.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system."
In some possible implementations, a computing device according to embodiments of the present invention may include at least one processing unit and at least one storage unit. Wherein the storage unit stores program code which, when executed by the processing unit, causes the processing unit to execute the steps in the echo processing method according to various exemplary embodiments of the present invention described in the "exemplary method" section of the present specification.
A computing device 90 according to such an embodiment of the invention is described below with reference to fig. 8. The computing device 90 shown in fig. 8 is merely an example and should not be taken as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 8, computing device 90 is in the form of a general purpose computing device. Components of computing device 90 may include, but are not limited to: the at least one processing unit 901, the at least one storage unit 902, and a bus 903 connecting different system components (including the processing unit 901 and the storage unit 902).
Bus 903 includes a data bus, a control bus, and an address bus.
The storage unit 902 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 9021 and/or cache memory 9022, and may further include readable media in the form of non-volatile memory, such as Read Only Memory (ROM) 9023.
The storage unit 902 may also include a program/utility 9025 having a set (at least one) of program modules 9024, such program modules 9024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The computing device 90 may also communicate with one or more external devices 904 (e.g., keyboard, pointing device, etc.). Such communication may occur through an input/output (I/O) interface 905. Moreover, the computing device 90 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through the network adapter 906. As shown in fig. 8, the network adapter 906 communicates with other modules of the computing device 90 over the bus 903. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computing device 90, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of an echo processing device are mentioned in the above detailed description, such a division is only exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present invention. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.
Furthermore, although the operations of the methods of the present invention are depicted in the drawings in a particular order, this is not required to either imply that the operations must be performed in that particular order or that all of the illustrated operations be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
While the spirit and principles of the present invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. The division into aspects is made for convenience of description only and does not imply that features of the various aspects cannot be used to benefit in combination. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (26)
1. An echo processing method, the method comprising:
collecting audio information; wherein, the audio information comprises reference audio;
extracting audio characteristics of a first frequency band where the reference audio is located in the audio information to obtain the audio characteristics of the audio information in the first frequency band;
Controlling an echo canceller based on the audio characteristics of the audio information in the first frequency band;
the method further comprises the steps of:
acquiring an energy peak value and two energy trough values of the audio information of the ith frame in a first frequency band;
determining the peak-to-valley ratio of the i-th frame of audio information in a first frequency band based on the energy peak value and the two energy trough values, and taking the peak-to-valley ratio of the i-th frame of audio information in the first frequency band as a frequency domain characteristic value of the i-th frame of audio information in the first frequency band; the frequency domain characteristic value of the first frequency band is used for obtaining the audio characteristic of the first frequency band.
2. The method of claim 1, wherein the extracting the audio feature of the first frequency band of the audio information where the reference audio is located to obtain the audio feature of the audio information in the first frequency band includes:
acquiring a time domain characteristic value and a frequency domain characteristic value of the audio information in the first frequency band where the reference audio is located;
and taking the time domain characteristic value and the frequency domain characteristic value as the audio frequency characteristic of the audio information in the first frequency band.
3. The method of claim 1, wherein the energy peaks within the first frequency band are: the energy value corresponding to the first frequency point in the first frequency band;
The two energy trough values are:
the energy value corresponding to a first adjacent frequency point obtained by increasing the preset bandwidth value by taking the first frequency point as the center in the first frequency band, and the energy value corresponding to a second adjacent frequency point which reduces the preset bandwidth value by taking the first frequency point as the center in the first frequency band;
or,
a first energy trough value in a frequency band within the first frequency band that is greater than the first frequency point, and a first energy trough value in a frequency band within the first frequency band that is less than the first frequency point.
4. The method of claim 2, wherein the method further comprises:
acquiring L frames of audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
determining an energy average value of the L-frame audio information and a maximum energy value in the L-frame audio information;
and determining a peak fluctuation value of the ith frame of audio information based on the energy average value and the maximum energy value of the L frame of audio information and the energy value of each frame of audio information in the L frame of audio information at a first frequency point, and taking the peak fluctuation value of the ith frame of audio information as a time domain characteristic value of the ith frame of audio information in a first frequency band.
5. The method of any of claims 2-4, wherein the method further comprises:
converting the L-frame audio information to obtain a frequency domain signal of each frame of audio information in the L-frame audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
determining an energy representation of each of the L frames of audio information based on a frequency domain signal of each of the frames of audio information;
constructing a feature plane containing the L-frame audio information based on the energy representation of each frame of the L-frame audio information;
wherein the feature plane comprises: the energy value of at least one frequency point of each frame of audio information in the L frames of audio information in the first frequency band; the at least one frequency point comprises a first frequency point.
6. The method of claim 2, wherein the controlling the echo canceller based on the audio characteristics of the audio information in the first frequency band comprises:
determining that echo information exists under the condition that the time domain characteristic value of the audio information in the first frequency band is smaller than a first threshold value and the frequency domain characteristic value is larger than a second threshold value, and controlling to start the echo canceller;
and/or,
and under the condition that the time domain characteristic value of the audio information in the first frequency band is not smaller than a first threshold value and the frequency domain characteristic value is not larger than a second threshold value, determining that echo information does not exist, and controlling to close the echo canceller.
7. The method of claim 1, wherein the method further comprises:
mixing the reference audio with the audio information to be played currently to obtain mixed audio information; and playing the mixed audio information.
8. The method of claim 1, wherein the first frequency band is a frequency band of 14850Hz to 15150 Hz.
9. An electronic device, comprising:
the pick-up is used for collecting audio information; wherein, the audio information comprises reference audio;
the processor is used for extracting audio characteristics of a first frequency band where the reference audio is located in the audio information to obtain the audio characteristics of the audio information in the first frequency band; controlling an echo canceller based on the audio characteristics of the audio information in the first frequency band;
the processor is used for acquiring an energy peak value and two energy trough values of the ith frame of audio information in the first frequency band; determining the peak-to-valley ratio of the i-th frame of audio information in a first frequency band based on the energy peak value and the two energy trough values, and taking the peak-to-valley ratio of the i-th frame of audio information in the first frequency band as a frequency domain characteristic value of the i-th frame of audio information in the first frequency band; the frequency domain characteristic value of the first frequency band is used for obtaining the audio characteristic of the first frequency band.
10. The electronic device of claim 9, wherein the processor is configured to obtain a time domain feature value and a frequency domain feature value of audio information in the first frequency band in which the reference audio is located; and taking the time domain characteristic value and the frequency domain characteristic value as the audio frequency characteristic of the audio information in the first frequency band.
11. The electronic device of claim 9, wherein the energy peaks within the first frequency band are: the energy value corresponding to the first frequency point in the first frequency band;
the two energy trough values are:
the energy value corresponding to a first adjacent frequency point obtained by increasing the preset bandwidth value by taking the first frequency point as the center in the first frequency band, and the energy value corresponding to a second adjacent frequency point which reduces the preset bandwidth value by taking the first frequency point as the center in the first frequency band;
or,
a first energy trough value in a frequency band within the first frequency band that is greater than the first frequency point, and a first energy trough value in a frequency band within the first frequency band that is less than the first frequency point.
12. The electronic device of claim 10, wherein the pickup is further configured to:
acquire L frames of audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; L is an integer of 1 or more;
The processor is used for determining an energy average value of the L-frame audio information and a maximum energy value in the L-frame audio information; and determining a peak fluctuation value of the ith frame of audio information based on the energy average value and the maximum energy value of the L frame of audio information and the energy value of each frame of audio information in the L frame of audio information at a first frequency point, and taking the peak fluctuation value of the ith frame of audio information as a time domain characteristic value of the ith frame of audio information in a first frequency band.
13. The electronic device of any of claims 10-12, wherein the processor is configured to convert L frames of audio information to obtain a frequency domain signal for each of the L frames of audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
determining an energy representation of each of the L frames of audio information based on a frequency domain signal of each of the frames of audio information;
constructing a feature plane containing the L-frame audio information based on the energy representation of each frame of the L-frame audio information;
Wherein the feature plane comprises: the energy value of at least one frequency point of each frame of audio information in the L frames of audio information in the first frequency band; the at least one frequency point comprises a first frequency point.
14. The electronic device of claim 10, wherein the processor is configured to determine that echo information exists and control the echo canceller to be turned on if a time domain feature value of the audio information in a first frequency band is less than a first threshold value and a frequency domain feature value is greater than a second threshold value;
and/or,
and the processor is used for determining that echo information does not exist and controlling to close the echo canceller under the condition that the time domain characteristic value of the audio information in the first frequency band is not smaller than a first threshold value and the frequency domain characteristic value is not larger than a second threshold value.
15. The electronic device of claim 9, wherein the electronic device further comprises:
the audio mixer is used for mixing the reference audio with the current audio information to be played to obtain mixed audio information;
and the loudspeaker is used for playing the mixed audio information.
16. The electronic device of claim 9, wherein the first frequency band is a frequency band comprising 14850Hz to 15150 Hz.
17. An echo processing device, the device comprising:
the audio acquisition unit is used for acquiring audio information; wherein, the audio information comprises reference audio;
the characteristic extraction unit is used for extracting the audio characteristics of the first frequency band where the reference audio is located in the audio information to obtain the audio characteristics of the audio information in the first frequency band;
an echo cancellation AEC control unit, configured to control an echo canceller based on an audio characteristic of the audio information in the first frequency band;
the characteristic extraction unit is used for obtaining an energy peak value and two energy trough values of the ith frame of audio information in the first frequency band; determining the peak-to-valley ratio of the i-th frame of audio information in a first frequency band based on the energy peak value and the two energy trough values, and taking the peak-to-valley ratio of the i-th frame of audio information in the first frequency band as a frequency domain characteristic value of the i-th frame of audio information in the first frequency band; the frequency domain characteristic value of the first frequency band is used for acquiring the audio characteristic of the first frequency band.
18. The apparatus of claim 17, wherein the feature extraction unit is configured to obtain a time domain feature value and a frequency domain feature value of audio information in the first frequency band in which the reference audio is located; and taking the time domain characteristic value and the frequency domain characteristic value as the audio frequency characteristic of the audio information in the first frequency band.
19. The apparatus of claim 17, wherein,
the energy peak value in the first frequency band is: the energy value corresponding to the first frequency point in the first frequency band;
the two energy trough values are:
the energy value corresponding to a first adjacent frequency point obtained by increasing the preset bandwidth value by taking the first frequency point as the center in the first frequency band, and the energy value corresponding to a second adjacent frequency point which reduces the preset bandwidth value by taking the first frequency point as the center in the first frequency band;
or,
a first energy trough value in a frequency band within the first frequency band that is greater than the first frequency point, and a first energy trough value in a frequency band within the first frequency band that is less than the first frequency point.
20. The apparatus of claim 18, wherein the feature extraction unit is configured to obtain L-frame audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; l is an integer of 1 or more;
determining an energy average value of the L-frame audio information and a maximum energy value in the L-frame audio information;
and determining a peak fluctuation value of the ith frame of audio information based on the energy average value and the maximum energy value of the L frame of audio information and the energy value of each frame of audio information in the L frame of audio information at a first frequency point, and taking the peak fluctuation value of the ith frame of audio information as a time domain characteristic value of the ith frame of audio information in a first frequency band.
21. The apparatus according to any of claims 18-20, wherein the feature extraction unit is configured to:
convert the L-frame audio information to obtain a frequency domain signal of each frame of audio information in the L-frame audio information; wherein, the L frame audio information comprises: the audio information of the i frame and the audio information of the L-1 frame before the audio information of the i frame; L is an integer of 1 or more;
determining an energy representation of each of the L frames of audio information based on a frequency domain signal of each of the frames of audio information;
constructing a feature plane containing the L-frame audio information based on the energy representation of each frame of the L-frame audio information;
wherein the feature plane comprises: the energy value of at least one frequency point of each frame of audio information in the L frames of audio information in the first frequency band; the at least one frequency point comprises a first frequency point.
22. The apparatus of claim 18, wherein,
the AEC control unit is configured to determine that echo information exists and control to start the echo canceller when the time domain feature value of the audio information in the first frequency band is smaller than a first threshold value and the frequency domain feature value is larger than a second threshold value;
and/or,
and the AEC control unit is used for determining that echo information does not exist and controlling the echo canceller to be closed under the condition that the time domain characteristic value of the audio information in the first frequency band is not smaller than a first threshold value and the frequency domain characteristic value is not larger than a second threshold value.
23. The apparatus of claim 17, wherein the apparatus further comprises:
an audio mixing unit, configured to mix the reference audio with the current audio information to be played, to obtain mixed audio information;
and an audio output unit, configured to play the mixed audio information.
24. The apparatus of claim 17, wherein the first frequency band is the frequency band from 14850 Hz to 15150 Hz.
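Claims 23 and 24 together describe injecting a near-ultrasonic reference into the playback path: if the microphone later picks up energy in the 14850-15150 Hz band, it can only have arrived via the echo path. A minimal sketch, where the sample rate, tone frequency, and gain are illustrative choices rather than values from the patent:

```python
import math

SAMPLE_RATE = 48000    # assumed; must be high enough to carry ~15 kHz
REF_FREQ = 15000.0     # centre of the 14850-15150 Hz first frequency band
REF_GAIN = 0.05        # quiet enough to stay perceptually unobtrusive

def mix_reference(playback, sample_rate=SAMPLE_RATE):
    """Mix a reference tone into the audio to be played (claim 23)."""
    return [s + REF_GAIN * math.sin(2 * math.pi * REF_FREQ * n / sample_rate)
            for n, s in enumerate(playback)]

# Mixing into 10 ms of silence leaves only the low-level reference tone.
mixed = mix_reference([0.0] * 480)
assert max(abs(s) for s in mixed) <= REF_GAIN + 1e-9
```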
25. A computing device, comprising:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-8.
26. A medium storing a computer program, which when executed by a processor performs the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011023561.5A CN112037810B (en) | 2020-09-25 | 2020-09-25 | Echo processing method, device, medium and computing equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112037810A (en) | 2020-12-04 |
CN112037810B (en) | 2023-10-03 |
Family
ID=73574295
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011023561.5A Active CN112037810B (en) | 2020-09-25 | 2020-09-25 | Echo processing method, device, medium and computing equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112037810B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113223538B (en) * | 2021-04-01 | 2022-05-03 | 北京百度网讯科技有限公司 | Voice wake-up method, device, system, equipment and storage medium |
CN113113046B (en) * | 2021-04-14 | 2024-01-19 | 杭州网易智企科技有限公司 | Performance detection method and device for audio processing, storage medium and electronic equipment |
CN113225442B (en) * | 2021-04-16 | 2022-09-02 | 杭州网易智企科技有限公司 | Method and device for eliminating echo |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009041446A1 (en) * | 2007-09-28 | 2009-04-02 | Yamaha Corporation | Echo removal device |
CN107799123A (en) * | 2017-12-14 | 2018-03-13 | 南京地平线机器人技术有限公司 | The method of control echo arrester and the device with echo cancellation performance |
CN111292759A (en) * | 2020-05-11 | 2020-06-16 | 上海亮牛半导体科技有限公司 | Stereo echo cancellation method and system based on neural network |
CN111370015A (en) * | 2020-02-28 | 2020-07-03 | 北京字节跳动网络技术有限公司 | Echo cancellation method, echo cancellation device, electronic equipment and storage medium |
CN111627459A (en) * | 2019-09-19 | 2020-09-04 | 北京安声浩朗科技有限公司 | Audio processing method and device, computer readable storage medium and electronic equipment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5493817B2 (en) * | 2009-12-17 | 2014-05-14 | 沖電気工業株式会社 | Echo canceller |
JP6201949B2 (en) * | 2014-10-08 | 2017-09-27 | 株式会社Jvcケンウッド | Echo cancel device, echo cancel program and echo cancel method |
GB2547063B (en) * | 2014-10-30 | 2018-01-31 | Imagination Tech Ltd | Noise estimator |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112037810B (en) | Echo processing method, device, medium and computing equipment | |
EP3439325B1 (en) | Automatically tuning an audio compressor to prevent distortion | |
US9196258B2 (en) | Spectral shaping for speech intelligibility enhancement | |
CN104521247B (en) | Bluetooth headset hearing aid and anti-noise method and apparatus | |
CN104980337A (en) | Method and device for improving audio processing performance | |
US11380312B1 (en) | Residual echo suppression for keyword detection | |
CN110782914B (en) | Signal processing method and device, terminal equipment and storage medium | |
CN111145771A (en) | Voice signal processing method, processing device, terminal and storage medium thereof | |
CN111356058B (en) | Echo cancellation method and device and intelligent sound box | |
CN105848052B (en) | A kind of Mike's switching method and terminal | |
CN105959874A (en) | Mobile terminal and method of reducing audio frequency noise | |
WO2021110175A1 (en) | Echo cancellation method and device | |
CN110265056A (en) | The control method and loudspeaking unit of source of sound | |
US9491306B2 (en) | Signal processing control in an audio device | |
WO2017131921A1 (en) | Methods and systems for providing consistency in noise reduction during speech and non-speech periods | |
WO2023138252A1 (en) | Audio signal processing method and apparatus, earphone device, and storage medium | |
WO2019239977A1 (en) | Echo suppression device, echo suppression method, and echo suppression program | |
CN112929506B (en) | Audio signal processing method and device, computer storage medium and electronic equipment | |
CN111756906A (en) | Echo suppression method and device for voice signal | |
EP3416357A1 (en) | Terminal audio parameter management method, apparatus and system | |
CN114727194A (en) | Microphone volume control method, device, equipment and storage medium | |
CN113571077B (en) | Echo cancellation method, terminal device, electronic device and medium | |
CN116405822A (en) | Bass enhancement system and method applied to open Bluetooth headset | |
CN107750038B (en) | Volume adjusting method, device, equipment and storage medium | |
CN112489669B (en) | Audio signal processing method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
Effective date of registration: 2021-09-26. Address after: 310052 Room 408, building 3, No. 399, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province. Applicant after: Hangzhou Netease Zhiqi Technology Co.,Ltd. Address before: 310052 Room 301, Building No. 599, Changhe Street Network Business Road, Binjiang District, Hangzhou City, Zhejiang Province. Applicant before: HANGZHOU LANGHE TECHNOLOGY Ltd.
GR01 | Patent grant | ||