EP2660814B1 - Adaptive equalization system - Google Patents
- Publication number
- EP2660814B1 (application EP12166906.3A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- signal
- curve
- equalization coefficients
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
Definitions
- This application relates to sound processing and, more particularly, to adaptive equalization of speech signals.
- a speech signal may be adversely impacted by acoustical or electrical characteristics of the acoustical environment or the electrical audio path associated with the speech signal.
- the in-car acoustics or microphone characteristics may have a significant detrimental impact on the sound quality or intelligibility of a speech signal transmitted to a remote party.
- US 2009/132248 discloses a system that improves the speech intelligibility and the speech quality of a speech segment.
- the system includes a dynamic controller that detects a background noise from an input by modeling a signal.
- a variable gain amplifier adjusts the variable gain of the amplifier in response to an output of the dynamic controller.
- a shaping filter adjusts a speech signal by tilting portions of the speech signal of the dynamic controller.
- an adaptive equalization system that improves the intelligibility of a speech signal.
- the system may automatically adjust the spectral shape of the speech signal to improve speech intelligibility.
- Equalization techniques such as parametric or graphic equalization have long been implemented in audio products to improve sound quality.
- an equalization curve is often tuned for a specific environment based on experience or to a particular target, but then usually remains unchanged during production or real-time use.
- the equalizer is adapted based on a target shape. This system attempts to automatically compensate for deficiencies in the audio path, which makes the output speech more pleasing and intelligible even in the presence of noise.
- the system may achieve this increase in intelligibility without requiring a voicing decision and without requiring advance knowledge of the clean speech and the noise level.
- the system may be implemented in real-time applications where only noisy speech is available.
- Figure 1 illustrates a system that includes an audio signal source 102, an adaptive equalization system 104, and an audio signal output 106.
- the adaptive equalization system 104 receives an input speech signal from the audio signal source 102, processes the signal, and outputs an improved version of the input signal to the audio signal output 106.
- the output signal received by the audio signal output 106 may be more intelligible to a listener than the input signal received by the adaptive equalization system 104.
- the audio signal source 102 may be a microphone, an incoming communication system channel, a pre-processing system, or another signal input device.
- the audio signal output 106 may be a loudspeaker, an outgoing communication system channel, a speech recognition system, a post-processing system, or any other output device.
- the adaptive equalization system 104 includes a computer processor 108 and a memory device 110.
- the computer processor 108 may be implemented as a central processing unit (CPU), microprocessor, microcontroller, application specific integrated circuit (ASIC), or a combination of other types of circuits.
- the computer processor is a digital signal processor ("DSP") including a specialized microprocessor with an architecture optimized for the fast operational needs of digital signal processing.
- the digital signal processor may be designed and customized for a specific application, such as an audio system of a vehicle or a signal processing chip of a mobile communication device (e.g., a phone or tablet computer).
- the memory device 110 may include a magnetic disc, an optical disc, RAM, ROM, DRAM, SRAM, Flash and/or any other type of computer memory.
- the memory device 110 is communicatively coupled with the computer processor 108 so that the computer processor 108 can access data stored on the memory device 110, write data to the memory device 110, and execute programs and modules stored on the memory device 110.
- the memory device 110 includes one or more data storage areas 112 and one or more programs.
- the data and programs are accessible to the computer processor 108 so that the computer processor 108 is particularly programmed to implement the adaptive equalization functionality of the system.
- the programs may include one or more modules executable by the computer processor 108 to perform the desired function.
- the program modules may include a subband processing module 114, a signal power calculation module 116, a background noise level estimation module 118, a speech intelligibility measurement module 120, a spectral shape adjustment module 122, a normalization module 124, and an adaptive equalization module 126.
- the memory device 110 may also store additional programs, modules, or other data to provide additional programming to allow the computer processor 108 to perform the functionality of the adaptive equalization system 104.
- the described modules and programs may be parts of a single program, separate programs, or distributed across several memories and processors. Furthermore, the programs and modules, or any portion of the programs and modules, may instead be implemented in hardware.
- Figure 2 is a flow chart illustrating the functionality of the adaptive equalization system of Figure 1 .
- the functionality of Figure 2 may be achieved by the computer processor 108 accessing data from data storage 112 of Figure 1 and by executing one or more of the modules 114 - 126 of Figure 1 .
- the processor 108 may execute the subband processing module 114 at steps 202 and 222, the signal power calculation module 116 at step 204, the background noise level estimation module 118 at step 206, the speech intelligibility measurement module 120 at step 208, the spectral shape adjustment module 122 at step 210, the normalization module 124 at step 212, and the adaptive equalization module 126 at steps 214, 216, 218, and 220.
- Any of the modules or steps described herein may be combined or divided into a smaller or larger number of steps or modules than what is shown in Figures 1 and 2 .
- the adaptive equalization system may begin its signal processing sequence in Figure 2 with subband analysis at step 202.
- the system may receive an input speech signal that includes speech content, noise content, or both.
- a subband filter processes the input signal to extract frequency information of the input signal.
- the subband filter may be accomplished by various methods, such as a Fast Fourier Transform ("FFT"), critical filter bank, octave filter bank, or one-third octave filter bank.
- the subband analysis at step 202 may include a frequency based transform, such as by a Fast Fourier Transform.
- the subband analysis at step 202 may include a time based filterbank.
- the time based filterbank may be composed of a bank of overlapping bandpass filters, where the center frequencies have non-linear spacing such as octave, one-third octave, Bark, mel, or other spacing techniques.
- Figure 3 illustrates the filter shapes of one implementation of a subband processing filterbank. As shown in Figure 3 , the bands may be narrower at lower frequencies and wider at higher frequencies.
- the lowest and highest filters may be shelving filters so that all the components may be resynthesized to essentially recreate the same input signal when no processing has been applied.
- a frequency based transform may use essentially the same filter shapes applied after transformation of the signal to create the same non-linear spacing of subbands. The frequency based transform may also use a windowed overlap-add analysis.
- the subband processing at step 202 outputs a set of subband signals represented as $X_{n,k}$, which is the kth subband at time n.
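As a rough illustration of the non-linear band spacing described for step 202, the sketch below lists one-third-octave center frequencies using the base-2 relation f_c = f_ref * 2^(k/3), together with the corresponding band widths. The function name, the 1 kHz reference, and the 150 Hz to 8000 Hz range are assumptions chosen for illustration only; the text does not prescribe a specific band layout.

```python
def third_octave_centers(f_low=150.0, f_high=8000.0, f_ref=1000.0):
    """Return base-2 one-third-octave center frequencies within [f_low, f_high].

    All three defaults are illustrative assumptions, not values from the patent.
    """
    centers = []
    k = -30  # start far below the range of interest and walk upward
    while True:
        fc = f_ref * 2.0 ** (k / 3.0)
        if fc > f_high * 1.0001:  # small tolerance against rounding
            break
        if fc >= f_low * 0.9999:
            centers.append(fc)
        k += 1
    return centers


centers = third_octave_centers()
# Band edges sit one sixth of an octave either side of the center,
# so the absolute bandwidth grows with frequency.
widths = [fc * (2.0 ** (1.0 / 6.0) - 2.0 ** (-1.0 / 6.0)) for fc in centers]
```

The growing widths match the observation for Figure 3 that bands are narrower at low frequencies and wider at high frequencies.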
- the system receives the subband signals and determines the subband average signal power of each subband.
- the subband average signal power output from step 204 is also written $X_{n,k}$, here denoting the smoothed power of the kth subband at time n rather than the raw subband signal.
- the averaging may use an Infinite Impulse Response (IIR) smoothing filter governed by a smoothing coefficient.
- the smoothing coefficient is a fixed value.
- the smoothing coefficient may be set at a fixed level of 0.9, which results in a relatively high amount of smoothing. Other higher or lower fixed values are also possible depending on the desired amount of smoothing.
- the smoothing coefficient may be a variable value. For example, the system may decrease the value of the coefficient during times when a lower amount of smoothing is desired, and increase the value of the coefficient during times when a higher amount of smoothing is desired.
- the subband signal is smoothed, filtered, and/or averaged.
- the amount of smoothing may be constant or variable.
- the signal is smoothed in time.
- frequency smoothing may be used.
- the system may include some frequency smoothing when the subband filters have some frequency overlap.
- the amount of smoothing may be variable in order to exclude long stretches of silence from the average or for other reasons.
- the power analysis processing at step 204 outputs a smoothed magnitude/power of the input signal in each subband.
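The time smoothing of step 204 can be sketched as a first-order recursive average. The recursion form below is a common choice and an assumption here, since the original equation is not reproduced in this text; the names `beta`, `smooth_power`, and `track_power` are hypothetical. Only the example value 0.9 comes from the text.

```python
def smooth_power(prev_avg, subband_sample, beta=0.9):
    """One first-order IIR update of the average power in a single subband.

    beta close to 1 gives heavy smoothing; beta close to 0 follows the input.
    The recursion form is an assumed, commonly used one.
    """
    power = abs(subband_sample) ** 2
    return beta * prev_avg + (1.0 - beta) * power


def track_power(frames, beta=0.9):
    """Smooth a sequence of frames, where frames[n][k] is subband k at time n."""
    avg = [0.0] * len(frames[0])
    history = []
    for frame in frames:
        avg = [smooth_power(a, x, beta) for a, x in zip(avg, frame)]
        history.append(avg)
    return history


# Three identical frames with two subbands each.
history = track_power([[1.0, 2.0]] * 3)
```

A variable amount of smoothing could be obtained by passing a different `beta` per frame, for example a value near 1 during silence so that silent stretches barely move the average.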
- the system receives the subband signals and estimates a subband background noise level for each subband.
- the subband background noise level output from step 206 is represented as $B_{n,k}$.
- the background noise level is calculated using the background noise estimation techniques disclosed in U.S. Patent No. 7,844,453 , except that in the event of any inconsistent disclosure or definition from the present specification, the disclosure or definition herein shall be deemed to prevail.
- alternative background noise estimation techniques may be used, such as a noise power estimation technique based on minimum statistics.
- the background noise level calculated at step 206 may be smoothed and averaged in time or frequency.
- the output of the background noise estimation at step 206 may be the magnitude/power of the estimated noise for each subband.
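To make the idea of a per-subband noise floor concrete, the sketch below tracks the minimum of the smoothed power over a sliding window. This is a heavily simplified stand-in inspired by minimum-based noise tracking: it omits the bias compensation and adaptive smoothing of full minimum-statistics estimators, and it is not the technique of U.S. Patent No. 7,844,453. The class name `MinTracker` and the window length are assumptions.

```python
from collections import deque


class MinTracker:
    """Sliding-window minimum of the smoothed power, one window per subband."""

    def __init__(self, num_bands, window=50):
        # window (in frames) is an illustrative assumption
        self.buffers = [deque(maxlen=window) for _ in range(num_bands)]

    def update(self, avg_power):
        """Push one frame of smoothed subband powers and return the noise floor."""
        noise = []
        for buf, p in zip(self.buffers, avg_power):
            buf.append(p)
            noise.append(min(buf))
        return noise


tracker = MinTracker(num_bands=2, window=3)
first = tracker.update([4.0, 1.0])
second = tracker.update([2.0, 5.0])
third = tracker.update([3.0, 6.0])
fourth = tracker.update([7.0, 8.0])  # oldest frame has now left the window
```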
- the system performs a speech intelligibility measurement.
- the speech intelligibility measurement outputs a value, represented as I , that is indicative of the intelligibility of the speech content in the input signal.
- the value may be within the range between zero and one, where a value closer to zero indicates that the speech signal has a relatively low intelligibility and where a value closer to one indicates that the speech signal has a relatively high intelligibility.
- the system calculates a Speech Intelligibility Index ("SII") at step 208.
- the Speech Intelligibility Index may be calculated by the techniques described in the American National Standard, "Methods for the Calculation of the Speech Intelligibility Index," ANSI S3.5-1997 .
- other objective intelligibility measures such as the speech articulation index (“AI”) or speech-transmission index (“STI”) can also be used to predict speech intelligibility.
- the speech intelligibility measurement at step 208 may receive the subband average signal power $X_{n,k}$ and subband background noise power $B_{n,k}$ as inputs. Additionally, the speech intelligibility measurement at step 208 may receive or access other data used to generate the speech intelligibility measurement. For example, the speech intelligibility measurement at step 208 may access a band importance function. In this example, the system uses the subband average signal power $X_{n,k}$ and subband background noise power $B_{n,k}$ to calculate a signal-to-noise ratio in each subband.
- Figure 4 illustrates one implementation of a signal power estimate 402 and a background noise estimate 404 of a speech signal. As shown in Figure 4 , the signal-to-noise ratio varies across the frequency range. In some frequency subbands a high signal-to-noise ratio results (such as in signal portion 406), while in other frequency subbands the signal-to-noise ratio is lower or even negative (such as in signal portion 408).
- the system may calculate the speech intelligibility measurement based on a band importance function.
- the band importance function illustrates the recognition that certain frequency bands are more important than others for speech intelligibility purposes.
- Figure 5 illustrates one implementation of a band importance function 502.
- the portions of the frequency spectrum between 1000 Hertz and 2500 Hertz have a relatively higher importance value than the very low end of the frequency spectrum (e.g., between 160 Hertz and 400 Hertz) or the very high end of the frequency spectrum (e.g., between 5000 Hertz and 8000 Hertz).
- the speech intelligibility measurement at step 208 may weigh the importance of each subband to calculate an output value based on the relative importance values and the subband SNR.
- the speech intelligibility index may be based on the product of a band importance function (e.g., the importance weights of Figure 5 ) and a band audibility function (e.g., the signal-to-noise ratio for each subband). If a first subband has a high signal-to-noise ratio and a high importance value, then it will provide a relatively high contribution to the overall intelligibility measurement. Alternatively, if a different subband has the same signal-to-noise ratio as the first subband but with a lower importance value, then this band will provide a lower contribution to the overall intelligibility measurement than the first subband.
- the importance values used for each band of the band importance function may be set based on the number of bands used and relative importance of each frequency range, as described in the American National Standard, "Methods for the Calculation of the Speech Intelligibility Index," ANSI S3.5-1997 .
- the output of the speech intelligibility measurement of step 208 may be a single measurement for the entire signal or may be a measurement for each subband of the signal.
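The product of band importance and band audibility described for step 208 can be sketched as follows. The mapping of signal-to-noise ratio to audibility, (SNR + 15) / 30 clipped to the range zero to one, follows the basic band audibility form used in SII-style calculations; masking, hearing threshold, and level distortion terms are omitted, so this is a simplified sketch rather than a full ANSI S3.5-1997 implementation. The function names and the importance weights in the usage line are assumptions.

```python
import math


def snr_db(signal_power, noise_power, floor=1e-12):
    """Signal-to-noise ratio of one subband in dB (floor avoids log of zero)."""
    return 10.0 * math.log10(max(signal_power, floor) / max(noise_power, floor))


def intelligibility(signal_power, noise_power, importance):
    """Importance-weighted audibility, a value between zero and one."""
    total = sum(importance)
    score = 0.0
    for x, b, w in zip(signal_power, noise_power, importance):
        # Simplified band audibility: 0 at -15 dB SNR, 1 at +15 dB SNR.
        audibility = min(1.0, max(0.0, (snr_db(x, b) + 15.0) / 30.0))
        score += (w / total) * audibility
    return score


# Band 0 is clean and important; band 1 is buried in noise and less important.
I = intelligibility([1000.0, 1.0], [1.0, 1000.0], [3.0, 1.0])
```

A per-subband output, as mentioned in the text, would simply return the individual weighted terms instead of their sum.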
- the system calculates a target spectral shape to be used later in the process as a reference template for equalization adaptation.
- Speech averaged over a long period of time has a typical subband shape.
- the overall shape may be influenced by whether the talker is male or female and by whether noise is present.
- Two example Long-Term Average Speech Shape ("LTASS") subband shapes are shown in Figure 6.
- Figure 6 shows a first template 602 that represents a talker in quiet conditions, and a second template 604 that represents a talker in noisy conditions.
- the actual LTASS shapes may change based on signal conditions and other factors.
- the system may use the speech intelligibility measurement ( I ) from step 208 to calculate a weighted mix of two predetermined LTASS templates.
- more than two predetermined LTASS templates may be used to calculate the output template shape.
- if the speech intelligibility measurement is relatively high, then the average speech signal processed by the system is likely to be more similar to the LTASS shape in quiet conditions.
- if the speech intelligibility measurement is relatively low, then the average speech signal processed by the system is likely to be more similar to the LTASS shape in noisy conditions.
- the weighted long-term speech curve (e.g., the weighted mix of multiple predetermined templates) that is output from step 210 is used as at least part of the target for adaptation of the equalization coefficients.
- the equalized output during the adaptation process may look relatively similar in magnitude at a subband level to the weighted long-term speech curve template.
- the ability of the shapes to match is a moving target because the equalization coefficients and the weighted long-term speech curve shape may change based on signal conditions.
- the weighted long-term speech curve template is used as a reference when modifying the spectral shape of the speech.
- Standard speech spectra for different vocal efforts (normal, raised, loud, and shout) can be found in the American National Standard, "Methods for the Calculation of the Speech Intelligibility Index," ANSI S3.5-1997.
- those templates may be adjusted to match the actual user environments, such as additive noise level, room acoustics, and microphone frequency response.
- the standard free-field LTASS templates may be adjusted based on the impulse response of the space (e.g., a known impulse response of a vehicle compartment) where the input signal is captured.
- the standard free-field LTASS templates may be adjusted based on the microphone impulse response of the microphone used to capture the input signal.
- $L_1$ and $L_2$ are the reference LTASS templates for quiet and noisy conditions, respectively.
- I is the speech intelligibility index limited to be in the range between zero and one.
- w is limited to be in the range between zero and one.
- the fixed constants (e.g., 0.45 and 0.3) in the weight factor equation are merely examples, and may be adjusted to control the characteristics of the weighted mix of LTASS templates.
- the constant values may be adjusted to more heavily favor the quiet LTASS template over the noisy LTASS template in the weighting equation.
- the output of the weighted long-term speech curve adjustment at step 210 is a weighted long-term speech curve, represented as $L_{n,k}$.
- the weighted long-term speech curve may be generated based on the first predetermined long-term average speech curve (e.g., the quiet conditions template), the second predetermined long-term average speech curve (e.g., the noisy conditions template), and the speech intelligibility measurement.
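The weighted mix of the quiet and noisy templates can be sketched as below. The weight equation itself is not reproduced in this text, so the clipped linear ramp used here is an assumed form; only the example constants 0.45 and 0.3, the limits of zero and one on the weight, and the idea of mixing two templates come from the text. The parameter names `offset` and `span` and both function names are hypothetical.

```python
def template_weight(intelligibility, offset=0.45, span=0.3):
    """Map the intelligibility measurement to a mixing weight in [0, 1].

    Assumed form: a linear ramp from 0 at `offset` to 1 at `offset + span`.
    """
    w = (intelligibility - offset) / span
    return min(1.0, max(0.0, w))


def weighted_ltass(quiet_template, noisy_template, intelligibility):
    """Mix the quiet and noisy templates subband by subband."""
    w = template_weight(intelligibility)
    return [w * q + (1.0 - w) * n for q, n in zip(quiet_template, noisy_template)]


# High intelligibility selects the quiet template, low selects the noisy one.
quiet_like = weighted_ltass([10.0, 20.0], [30.0, 40.0], 1.0)
noisy_like = weighted_ltass([10.0, 20.0], [30.0, 40.0], 0.0)
```

Raising `offset` or shrinking `span` in this sketch would shift how quickly the mix moves toward the quiet template, which is the kind of adjustment the text describes for the constants.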
- the system may perform a normalization function at step 212.
- the weighted long-term speech curve template may be scaled based on the current conditions of the input signal and the noise estimate.
- an overall energy constraint may be enforced so that the average signal power after applying equalization gains would be similar to the original signal power without equalization.
- This is achieved by calculating a scaling factor, which is applied to the weighted long-term speech curve template output from step 210 before the template is used in the equalization coefficient adaptation process.
- This normalization serves to minimize the difference between the average input signal power and the average output signal power. For example, the difference in some implementations may be within 1.8 dB.
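One way to picture the energy constraint of step 212 is to scale the template so that its total energy matches the estimated speech energy of the current input. The specific rule below (speech energy taken as signal power minus noise power, summed over subbands) is an assumption made for illustration; the text states only that a scaling factor is applied to the template. The function names are hypothetical.

```python
def scaling_factor(signal_power, noise_power, template, floor=1e-12):
    """Assumed rule: match the template energy to the estimated speech energy."""
    speech_energy = sum(max(x - b, 0.0) for x, b in zip(signal_power, noise_power))
    template_energy = max(sum(template), floor)  # floor avoids division by zero
    return speech_energy / template_energy


def normalize_template(signal_power, noise_power, template):
    """Scale every subband of the template by the same factor."""
    alpha = scaling_factor(signal_power, noise_power, template)
    return [alpha * t for t in template]


# Speech energy is (5 - 1) + (7 - 1) = 10; template energy is 5; factor is 2.
normalized = normalize_template([5.0, 7.0], [1.0, 1.0], [1.0, 4.0])
```

Because one factor is applied to all subbands, the shape of the template is preserved and only its overall level follows the input.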
- the system may perform adaptive equalization based on the normalized LTASS template to improve speech intelligibility of the input signal.
- the adaptive equalization process includes error signal generation at step 214, application of the prior equalization coefficients at step 216, equalization coefficient control at step 218, and application of the new adapted equalization coefficients at step 220.
- the system generates an error signal $e_{n,k}$.
- the adaptive equalization system serves to adjust its equalization coefficients in order to minimize the value of the error signal.
- the error signal is calculated based on the weighted long-term speech curve template $L_{n,k}$ (with or without normalization), the subband background noise power $B_{n,k}$, and a processed version of the input speech signal.
- the error signal may be determined without including the subband background noise power $B_{n,k}$ in the calculation.
- the processed version of the input speech signal used to generate the error signal may be calculated at step 216, where the system applies a prior version of the equalization coefficients ($G_{n-1,k}$) to a power spectrum of the speech signal to generate an equalized signal.
- This equalized signal is compared to the weighted long-term speech curve template (e.g., the normalized speech curve from step 212) at step 214. Specifically, the system generates a summed signal by summing the background noise level estimate from step 206 with the normalized speech curve from step 212. The difference between the summed signal and the equalized signal from step 216 results in the error signal.
- the system updates its equalization coefficients in a feedback loop that attempts to drive the error signal to zero.
- the updates to the equalization coefficients may be smoothed.
- the coefficient update depends on the step size, the scaling factor from the normalization at step 212, and the background noise estimate $B$.
- the value of the step size variable may be set to control the speed of adaptation.
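The feedback loop of steps 214 to 218 can be sketched as follows. The error computation mirrors the text: the noise estimate is summed with the (normalized) speech curve, and the equalized signal is subtracted. The update rule, a normalized gradient-style step on per-subband power gains, is an assumption, since the original update equation is not reproduced in this text. The names `error_signal`, `adapt_gains`, and `step` are hypothetical.

```python
def error_signal(template, noise_power, equalized_power):
    """Error per subband: (speech curve + noise estimate) minus equalized signal."""
    return [(l + b) - y for l, b, y in zip(template, noise_power, equalized_power)]


def adapt_gains(prev_gains, signal_power, template, noise_power, step=0.1):
    """One adaptation step; returns (new_gains, error).

    prev_gains are power gains from the previous frame. The normalized update
    below is an assumed rule chosen so that the loop drives the error to zero.
    """
    equalized = [g * x for g, x in zip(prev_gains, signal_power)]
    err = error_signal(template, noise_power, equalized)
    new_gains = []
    for g, e, x in zip(prev_gains, err, signal_power):
        g_new = g + step * e / max(x, 1e-12)  # divide by power to normalize
        new_gains.append(max(g_new, 0.0))     # power gains cannot be negative
    return new_gains, err


gains = [1.0, 1.0]
for _ in range(200):
    gains, err = adapt_gains(gains, [2.0, 4.0], [3.0, 1.0], [1.0, 1.0])
```

With stationary inputs the gains settle where the equalized power equals the target, here a boost in the first subband and a cut in the second. A larger `step` adapts faster at the cost of less smoothing of the coefficient updates.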
- the step size may be
- the system may apply one or more limits on the adaptation of the equalization coefficients.
- the system may place a signal-to-noise ratio constraint on the adaptation.
- the system may calculate a signal-to-noise ratio of the speech signal, compare the signal-to-noise ratio to a predetermined upper threshold (e.g., 15 dB) or a predetermined lower threshold (e.g., 6 dB), and limit a boosting gain of the equalization coefficients in response to a determination that the signal-to-noise ratio is above the predetermined upper threshold or below the predetermined lower threshold.
- the system may place an intelligibility constraint on the adaptation of the equalization coefficients.
- the system may determine whether an adaptation of the equalization coefficients based on the weighted long-term speech curve would increase or decrease the speech intelligibility measurement of the speech signal.
- the adaptation of the equalization coefficients may be limited in response to a determination that the adaptation of the equalization coefficients would decrease the speech intelligibility measurement. With this constraint, the adaptation of the equalization coefficients should not decrease the intelligibility contribution of each sub-band. If the intelligibility of each subband is not reduced, then the intelligibility of the entire signal should also not be decreased.
- the system may use step size control to constrain adaptation. For example, adaptation is faster when the average speech is far away from the reference template and slower when close.
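Two of the adaptation limits above can be sketched briefly: the signal-to-noise ratio constraint on boosting gain and an error-dependent step size. The thresholds of 15 dB and 6 dB are the examples given in the text; the boost cap of 0 dB, the step size bounds, and all function and parameter names are assumptions made for illustration.

```python
def limit_boost(gain_db, snr_db, lower=6.0, upper=15.0, max_boost_db=0.0):
    """Cap boosting gain when the SNR is above `upper` or below `lower`.

    Cuts (negative gains) pass through unchanged. max_boost_db is an assumption.
    """
    if snr_db > upper or snr_db < lower:
        return min(gain_db, max_boost_db)
    return gain_db


def adaptive_step(error, base_step=0.05, max_step=0.5, sensitivity=1.0):
    """Larger step when the error is large, smaller when close to the target."""
    return min(max_step, base_step * (1.0 + sensitivity * abs(error)))


boost_high_snr = limit_boost(6.0, 20.0)  # SNR above the upper threshold
boost_mid_snr = limit_boost(6.0, 10.0)   # SNR between the thresholds
cut_low_snr = limit_boost(-3.0, 2.0)     # a cut is not affected by the limit
```

The intelligibility constraint could be layered on top by recomputing the per-subband intelligibility contribution with the candidate gains and rejecting any update that lowers it.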
- the system applies the new adapted version of the equalization coefficients ($G_{n,k}$) to the speech signal on a subband basis.
- the subbands overlap so there is already smoothing over frequency.
- the equalization coefficients may be smoothed over time and/or frequency at step 218.
- the signal is resynthesized from the multiple subbands. For example, the signal may be converted back to a pulse code modulation ("PCM") signal.
- the output signal from step 222 may have a higher level of intelligibility than the input signal received at step 202.
- Each of the processes described herein may be encoded in a computer-readable storage medium (e.g., a computer memory), programmed within a device (e.g., one or more circuits or processors), or may be processed by a controller or a computer. If the processes are performed by software, the software may reside in a local or distributed memory resident to or interfaced to a storage device, a communication interface, or non-volatile or volatile memory in communication with a transmitter.
- the memory may include an ordered listing of executable instructions for implementing logic.
- Logic or any system element described may be implemented through optic circuitry, digital circuitry, through source code, through analog circuitry, or through an analog source, such as through an electrical, audio, or video signal.
- the software may be embodied in any computer-readable or signal-bearing medium, for use by, or in connection with an instruction executable system, apparatus, or device.
- a system may include a computer-based system, a processor-containing system, or another system that may selectively fetch instructions from an instruction executable system, apparatus, or device that may also execute instructions.
- a “computer-readable storage medium,” “machine-readable medium,” “propagated-signal” medium, and/or “signal-bearing medium” may comprise a medium (e.g., a non-transitory medium) that stores, communicates, propagates, or transports software or data for use by or in connection with an instruction executable system, apparatus, or device.
- the machine-readable medium may selectively be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium.
- a non-exhaustive list of examples of a machine-readable medium would include: an electrical connection having one or more wires, a portable magnetic or optical disk, a volatile memory, such as a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or Flash memory), or an optical fiber.
- a machine-readable medium may also include a tangible medium, as the software may be electronically stored as an image or in another format (e.g., through an optical scan), then compiled, and/or interpreted or otherwise processed. The processed medium may then be stored in a computer and/or machine memory.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
- Circuit For Audible Band Transducer (AREA)
Description
- This application relates to sound processing and, more particularly, to adaptive equalization of speech signals.
- A speech signal may be adversely impacted by acoustical or electrical characteristics of the acoustical environment or the electrical audio path associated with the speech signal. For example, for a hands-free telephone system in an automobile, the in-car acoustics or microphone characteristics may have a significant detrimental impact on the sound quality or intelligibility of a speech signal transmitted to a remote party.
- Many speech enhancement systems have been developed to suppress background noise and improve speech quality, but little progress has been made to improve speech intelligibility. In recent years, researchers have investigated why current speech enhancement algorithms do not improve speech intelligibility. As a result, new algorithms have been developed that focus on speech intelligibility improvement. However, some of these algorithms require a voicing decision, which may be difficult to achieve in a noisy environment. Other proposed algorithms need additional training, or they need to know the clean speech and noise level in advance, which may not be possible in some applications.
- US 2009/132248 discloses a system that improves the speech intelligibility and the speech quality of a speech segment. The system includes a dynamic controller that detects a background noise from an input by modeling a signal. A variable gain amplifier adjusts the variable gain of the amplifier in response to an output of the dynamic controller. A shaping filter adjusts a speech signal by tilting portions of the speech signal of the dynamic controller.
- The present invention is set out in the independent claims, with some optional features set out in the claims dependent thereto.
- The system may be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the disclosure. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views.
-
Figure 1 illustrates an adaptive equalization system. -
Figure 2 illustrates the functionality of the adaptive equalization system of Figure 1. -
Figure 3 illustrates one implementation of a subband processing filterbank. -
Figure 4 is a graph illustrating one implementation of a signal power estimate and a background noise estimate of a speech signal. -
Figure 5 is a graph illustrating one implementation of a band importance function. -
Figure 6 is a graph illustrating two possible long-term average speech curve templates. - This detailed description describes an adaptive equalization system that improves the intelligibility of a speech signal. For example, the system may automatically adjust the spectral shape of the speech signal to improve speech intelligibility. Equalization techniques such as parametric or graphic equalization have long been implemented in audio products to improve sound quality. Typically, an equalization curve is tuned for a specific environment based on experience or to a particular target, and then remains unchanged during production or real-time use. In the adaptive equalization system described herein, the equalizer is instead adapted continuously toward a target shape. This system attempts to automatically compensate for deficiencies in the audio path, which makes the output speech more pleasing and intelligible even in the presence of noise. In some implementations, the system may achieve this increase in intelligibility without requiring a voicing decision and without requiring advance knowledge of the clean speech and the noise level. Thus, the system may be implemented in real-time applications where only noisy speech is available.
-
Figure 1 illustrates a system that includes an audio signal source 102, an adaptive equalization system 104, and an audio signal output 106. The adaptive equalization system 104 receives an input speech signal from the audio signal source 102, processes the signal, and outputs an improved version of the input signal to the audio signal output 106. In one implementation, the output signal received by the audio signal output 106 may be more intelligible to a listener than the input signal received by the adaptive equalization system 104. The audio signal source 102 may be a microphone, an incoming communication system channel, a pre-processing system, or another signal input device. The audio signal output 106 may be a loudspeaker, an outgoing communication system channel, a speech recognition system, a post-processing system, or any other output device. - The
adaptive equalization system 104 includes a computer processor 108 and a memory device 110. The computer processor 108 may be implemented as a central processing unit (CPU), microprocessor, microcontroller, application specific integrated circuit (ASIC), or a combination of other types of circuits. In one implementation, the computer processor is a digital signal processor ("DSP") including a specialized microprocessor with an architecture optimized for the fast operational needs of digital signal processing. Additionally, in some implementations, the digital signal processor may be designed and customized for a specific application, such as an audio system of a vehicle or a signal processing chip of a mobile communication device (e.g., a phone or tablet computer). The memory device 110 may include a magnetic disc, an optical disc, RAM, ROM, DRAM, SRAM, Flash and/or any other type of computer memory. The memory device 110 is communicatively coupled with the computer processor 108 so that the computer processor 108 can access data stored on the memory device 110, write data to the memory device 110, and execute programs and modules stored on the memory device 110. - The
memory device 110 includes one or more data storage areas 112 and one or more programs. The data and programs are accessible to the computer processor 108 so that the computer processor 108 is particularly programmed to implement the adaptive equalization functionality of the system. The programs may include one or more modules executable by the computer processor 108 to perform the desired function. For example, the program modules may include a subband processing module 114, a signal power calculation module 116, a background noise level estimation module 118, a speech intelligibility measurement module 120, a spectral shape adjustment module 122, a normalization module 124, and an adaptive equalization module 126. The memory device 110 may also store additional programs, modules, or other data to provide additional programming to allow the computer processor 108 to perform the functionality of the adaptive equalization system 104. The described modules and programs may be parts of a single program, separate programs, or distributed across several memories and processors. Furthermore, the programs and modules, or any portion of the programs and modules, may instead be implemented in hardware. -
Figure 2 is a flow chart illustrating the functionality of the adaptive equalization system of Figure 1. The functionality of Figure 2 may be achieved by the computer processor 108 accessing data from data storage 112 of Figure 1 and by executing one or more of the modules 114 - 126 of Figure 1. For example, the processor 108 may execute the subband processing module 114 at steps 202 and 222, the signal power calculation module 116 at step 204, the background noise level estimation module 118 at step 206, the speech intelligibility measurement module 120 at step 208, the spectral shape adjustment module 122 at step 210, the normalization module 124 at step 212, and the adaptive equalization module 126 at steps 214, 216, 218, and 220, as shown in Figures 1 and 2. - The adaptive equalization system may begin its signal processing sequence in
Figure 2 with subband analysis at step 202. The system may receive an input speech signal that includes speech content, noise content, or both. At step 202, a subband filter processes the input signal to extract frequency information of the input signal. The subband filter may be accomplished by various methods, such as a Fast Fourier Transform ("FFT"), critical filter bank, octave filter bank, or one-third octave filter bank. The subband analysis at step 202 may include a frequency based transform, such as by a Fast Fourier Transform. Alternatively, the subband analysis at step 202 may include a time based filterbank. The time based filterbank may be composed of a bank of overlapping bandpass filters, where the center frequencies have non-linear spacing such as octave, one-third octave, bark, mel, or other spacing techniques. As an example, Figure 3 illustrates the filter shapes of one implementation of a subband processing filterbank. As shown in Figure 3, the bands may be narrower at lower frequencies and wider at higher frequencies. In the filterbank used at step 202, the lowest and highest filters may be shelving filters so that all the components may be resynthesized to essentially recreate the same input signal when no processing has been applied. A frequency based transform may use essentially the same filter shapes applied after transformation of the signal to create the same non-linear spacing or subbands. The frequency based transform may also use a windowed overlap-add analysis. - The subband processing at
step 202 outputs a set of subband signals represented as $X_{n,k}$, which is the kth subband at time n. At step 204, the system receives the subband signals and determines the subband average signal power of each subband. The subband average signal power output from step 204 is represented as $\bar{X}_{n,k}$. In one implementation, for each subband, the subband average signal power is calculated by a first order Infinite Impulse Response ("IIR") filter according to the following equation:
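A first-order IIR recursion consistent with the definitions in this paragraph can be written as shown here. It is a reconstruction under the assumption that β weights the previous average and (1 − β) weights the newest power sample, which agrees with the statement that a value of 0.9 gives a relatively high amount of smoothing:

```latex
\bar{X}_{n,k} = \beta \, \bar{X}_{n-1,k} + (1 - \beta) \, \lvert X_{n,k} \rvert^{2}
```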
Here, $|X_{n,k}|^2$ is the signal power of the kth subband at time n, and β is a coefficient in the range between zero and one. In one implementation, the coefficient β is a fixed value. For example, the coefficient β may be set at a fixed level of 0.9, which results in a relatively high amount of smoothing. Other higher or lower fixed values are also possible depending on the desired amount of smoothing. In other implementations, the coefficient β may be a variable value. For example, the system may decrease the value of the coefficient β during times when a lower amount of smoothing is desired, and increase the value of the coefficient β during times when a higher amount of smoothing is desired. - At
step 204, the subband signal is smoothed, filtered, and/or averaged. The amount of smoothing may be constant or variable. In one implementation, the signal is smoothed in time. In other implementations, frequency smoothing may be used. For example, the system may include some frequency smoothing when the subband filters have some frequency overlap. The amount of smoothing may be variable in order to exclude long stretches of silence from the average or for other reasons. The power analysis processing at step 204 outputs a smoothed magnitude/power of the input signal in each subband. - At
step 206, the system receives the subband signals and estimates a subband background noise level for each subband. The subband background noise level output from step 206 is represented as $B_{n,k}$. In one implementation, the background noise level is calculated using the background noise estimation techniques disclosed in U.S. Patent No. 7,844,453, except that in the event of any inconsistent disclosure or definition from the present specification, the disclosure or definition herein shall be deemed to prevail. In other implementations, alternative background noise estimation techniques may be used, such as a noise power estimation technique based on minimum statistics. The background noise level calculated at step 206 may be smoothed and averaged in time or frequency. The output of the background noise estimation at step 206 may be the magnitude/power of the estimated noise for each subband. - At
step 208, the system performs a speech intelligibility measurement. The speech intelligibility measurement outputs a value, represented as I, that is indicative of the intelligibility of the speech content in the input signal. The value may be within the range between zero and one, where a value closer to zero indicates that the speech signal has a relatively low intelligibility and where a value closer to one indicates that the speech signal has a relatively high intelligibility. In one implementation, the system calculates a Speech Intelligibility Index ("SII") at step 208. The Speech Intelligibility Index may be calculated by the techniques described in the American National Standard, "Methods for the Calculation of the Speech Intelligibility Index," ANSI S3.5-1997. In other implementations, other objective intelligibility measures, such as the speech articulation index ("AI") or speech-transmission index ("STI"), can also be used to predict speech intelligibility. - The speech intelligibility measurement at
step 208 may receive the subband average signal power $\bar{X}_{n,k}$ and subband background noise power $B_{n,k}$ as inputs. Additionally, the speech intelligibility measurement at step 208 may receive or access other data used to generate the speech intelligibility measurement. For example, the speech intelligibility measurement at step 208 may access a band importance function. In this example, the system uses the subband average signal power $\bar{X}_{n,k}$ and subband background noise power $B_{n,k}$ to calculate a signal-to-noise ratio in each subband. Figure 4 illustrates one implementation of a signal power estimate 402 and a background noise estimate 404 of a speech signal. As shown in Figure 4, the signal-to-noise ratio varies across the frequency range. In some frequency subbands a high signal-to-noise ratio results (such as in signal portion 406), while in other frequency subbands the signal-to-noise ratio is lower or even negative (such as in signal portion 408). - At
step 208 of Figure 2, the system may calculate the speech intelligibility measurement based on a band importance function. The band importance function reflects the recognition that certain frequency bands are more important than others for speech intelligibility purposes. Figure 5 illustrates one implementation of a band importance function 502. In the example of Figure 5, the portions of the frequency spectrum between 1000 Hertz and 2500 Hertz have a relatively higher importance value than the very low end of the frequency spectrum (e.g., between 160 Hertz and 400 Hertz) or the very high end of the frequency spectrum (e.g., between 5000 Hertz and 8000 Hertz). The speech intelligibility measurement at step 208 may weigh the importance of each subband to calculate an output value based on the relative importance values and the subband SNR. For example, the speech intelligibility index may be based on the product of a band importance function (e.g., the importance weights of Figure 5) and a band audibility function (e.g., the signal-to-noise ratio for each subband). If a first subband has a high signal-to-noise ratio and a high importance value, then it will provide a relatively high contribution to the overall intelligibility measurement. Alternatively, if a different subband has the same signal-to-noise ratio as the first subband but with a lower importance value, then this band will provide a lower contribution to the overall intelligibility measurement than the first subband. The importance values used for each band of the band importance function may be set based on the number of bands used and relative importance of each frequency range, as described in the American National Standard, "Methods for the Calculation of the Speech Intelligibility Index," ANSI S3.5-1997. The output of the speech intelligibility measurement of step 208 may be a single measurement for the entire signal or may be a measurement for each subband of the signal. - At
step 210, the system calculates a target spectral shape to be used later in the process as a reference template for equalization adaptation. Speech averaged over a long period of time has a typical subband shape. The overall shape may be influenced by whether the talker is male or female and by whether noise is present. Two example Long-Term Average Speech Shape ("LTASS") subband shapes are shown in Figure 6. Specifically, Figure 6 shows a first template 602 that represents a talker in quiet conditions, and a second template 604 that represents a talker in noisy conditions. The actual LTASS shapes may change based on signal conditions and other factors. - At
step 210, the system may use the speech intelligibility measurement (I) from step 208 to calculate a weighted mix of two predetermined LTASS templates. In other implementations, more than two predetermined LTASS templates may be used to calculate the output template shape. As one example, if the speech intelligibility measurement is relatively high, then the average speech signal processed by the system is likely to be more similar to the LTASS shape in the quiet conditions. As another example, if the speech intelligibility measurement is relatively low, then the average speech signal processed by the system is likely to be more similar to the LTASS shape in noisy conditions. The weighted long-term speech curve (e.g., the weighted mix of multiple predetermined templates) that is output from step 210 is used as at least part of the target for adaptation of the equalization coefficients. When considering a long term average, the equalized output during the adaptation process may look relatively similar in magnitude at a subband level to the weighted long-term speech curve template. In some implementations, the ability of the shapes to match is a moving target because the equalization coefficients and the weighted long-term speech curve shape may change based on signal conditions. - The weighted long-term speech curve template is used as a reference when modifying the spectral shape of the speech. Standard speech spectra for different vocal efforts, namely normal, raised, loud and shout, can be found in the American National Standard, "Methods for the Calculation of the Speech Intelligibility Index," ANSI S3.5-1997. However, for different applications, those templates may be adjusted to match the actual user environments, such as additive noise level, room acoustics, and microphone frequency response.
As one example, the standard free-field LTASS templates may be adjusted based on the impulse response of the space (e.g., a known impulse response of a vehicle compartment) where the input signal is captured. As another example, the standard free-field LTASS templates may be adjusted based on the microphone impulse response of the microphone used to capture the input signal.
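The two adjustments just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of the specification: the template is assumed to be stored as per-band levels in dB, the room or microphone is assumed to be described by a short FIR impulse response, and the response is sampled only at each band's center frequency. The names `response_db` and `adjust_template` and all numeric values are hypothetical.

```python
import cmath
import math


def response_db(impulse_response, sample_rate_hz, freq_hz):
    """Magnitude response, in dB, of an FIR impulse response at one frequency."""
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    # Evaluate the discrete-time Fourier transform directly at omega.
    h = sum(c * cmath.exp(-1j * omega * i) for i, c in enumerate(impulse_response))
    return 20.0 * math.log10(max(abs(h), 1e-12))


def adjust_template(template_db, centers_hz, impulse_response, sample_rate_hz):
    """Add the path's magnitude response to each band of a free-field template."""
    return [
        level + response_db(impulse_response, sample_rate_hz, f)
        for level, f in zip(template_db, centers_hz)
    ]


# Hypothetical two-band template and a path that simply halves the amplitude.
adjusted = adjust_template([60.0, 55.0], [500.0, 2000.0], [0.5], 16000.0)
```

Sampling the response at a single frequency per band is the simplest choice; averaging the response over each band's passband would be a natural refinement when the bands are wide.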
- In one implementation, the weighted long-term speech curve output from
step 210 is constantly or repeatedly adjusted based on the speech intelligibility index according to the following equation:
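A convex combination of the two templates is one form consistent with the definitions that follow. It is offered as an assumption rather than a transcription. In particular, assigning w to the quiet template L1, so that a larger weight moves the curve toward quiet conditions, is inferred from the surrounding description:

```latex
L_{n,k} = w \, L1_{k} + (1 - w) \, L2_{k}, \qquad 0 \le w \le 1
```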
Here, L1 and L2 are the reference LTASS templates for quiet and noisy conditions, respectively, and w is a weight factor calculated according to the following equation:
Here, I is the speech intelligibility index limited to be in the range between zero and one. Furthermore, w is limited to be in the range between zero and one. The fixed constants (e.g., 0.45 and 0.3) in the weight factor equation are merely examples, and may be adjusted to control the characteristics of the weighted mix of LTASS templates. For example, the constant values may be adjusted to more heavily favor the quiet LTASS template over the noisy LTASS template in the weighting equation. - The output of the weighted long-term speech curve adjustment at
step 210 is a weighted long-term speech curve, represented as $L_{n,k}$. The weighted long-term speech curve may be generated based on the first predetermined long-term average speech curve (e.g., the quiet conditions template), the second predetermined long-term average speech curve (e.g., the noisy conditions template), and the speech intelligibility measurement. However, before the weighted long-term speech curve can be used as a reference for the adaptive equalization process, the system may perform a normalization function at step 212. In one implementation, the weighted long-term speech curve template may be scaled based on the current conditions of the input signal and the noise estimate. For example, an overall energy constraint may be enforced so that the average signal power after applying equalization gains would be similar to the original signal power without equalization. This is achieved by calculating a scaling factor ($\gamma_n$) which is applied to the weighted long-term speech curve template output from step 210 before the template is used in the equalization coefficient adaptation process. The scaling factor may be calculated by the following equation:
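A minimal sketch of such an energy-matching scale factor follows. It assumes, and this is not taken from the specification, that the scaling factor is the ratio of the total smoothed input power to the total template power, with both expressed in linear power units rather than dB. The specification's own equation may also account for the noise estimate. The function names are hypothetical.

```python
def scaling_factor(avg_power, template, floor=1e-12):
    """Return gamma such that sum(gamma * template) equals sum(avg_power).

    Assumed form: total input power divided by total template power.
    """
    return sum(avg_power) / max(sum(template), floor)


def normalize_template(avg_power, template):
    """Scale every band of the template by the same factor gamma."""
    gamma = scaling_factor(avg_power, template)
    return [gamma * level for level in template]


# Hypothetical three-band example in linear power units.
normalized = normalize_template([2.0, 4.0, 6.0], [1.0, 1.0, 2.0])
```

Because every band is multiplied by the same factor, the shape of the template is preserved and only its overall level follows the input.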
This normalization serves to minimize the difference between the average input signal power and the average output signal power. For example, the difference in some implementations may be within 1.8 dB. - After the normalized LTASS template is available, the system may perform adaptive equalization based on the normalized LTASS template to improve speech intelligibility of the input signal. The adaptive equalization process includes error signal generation at
step 214, application of the prior equalization coefficients at step 216, equalization coefficient control at step 218, and application of the new adapted equalization coefficients at step 220. - At
step 214, the system generates an error signal $e_{n,k}$. The adaptive equalization system serves to adjust its equalization coefficients in order to minimize the value of the error signal. In one implementation, the error signal is calculated based on the weighted long-term speech curve template $L_{n,k}$ (with or without normalization), the subband background noise power $B_{n,k}$, and a processed version of the input speech signal. In another implementation, the error signal may be determined without including the subband background noise power $B_{n,k}$ in the calculation. The processed version of the input speech signal used to generate the error signal may be calculated at step 216, where the system applies a prior version of the equalization coefficients ($G_{n-1,k}$) to a power spectrum of the speech signal to generate an equalized signal. This equalized signal is compared to the weighted long-term speech curve template (e.g., the normalized speech curve from step 212) at step 214. Specifically, the system generates a summed signal by summing the background noise level estimate from step 206 with the normalized speech curve from step 212. The difference between the summed signal and the equalized signal from step 216 results in the error signal. - At
step 218, the system updates its equalization coefficients in a feedback loop that attempts to drive the error signal to zero. In some implementations, the updates to the equalization coefficients may be smoothed. As one example, for the kth sub-band at time n, the equalizing gain may be calculated according to the following equations:
Here, µ is the step size, $\gamma_n$ is the scaling factor, and B is the background noise estimate. The value of the step size variable may be set to control the speed of adaptation. In one implementation, the step size may be set to 0.001, although higher or lower values may also be used depending on the desired speed of adaptation. - The system may apply one or more limits on the adaptation of the equalization coefficients. As one example, the system may place a signal-to-noise ratio constraint on the adaptation. In this example, the system may calculate a signal-to-noise ratio of the speech signal, compare the signal-to-noise ratio to a predetermined upper threshold (e.g., 15 dB) or a predetermined lower threshold (e.g., 6 dB), and limit a boosting gain of the equalization coefficients in response to a determination that the signal-to-noise ratio is above the predetermined upper threshold or below the predetermined lower threshold.
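The error signal of step 214, the coefficient update of step 218, and the signal-to-noise constraint can be sketched together as follows. This is an illustration under assumptions, not the update equations of the specification: the update is assumed to be a plain LMS-style step along the error in linear power units, the signal-to-noise ratio is assumed to be evaluated per subband, and a gain of 1.0 is assumed as the boost ceiling. All function names and the `max_boost` parameter are hypothetical.

```python
import math


def error_signal(template, noise, gamma, prev_gain, power):
    """Per-band error: (noise + scaled template) minus the equalized power."""
    return [
        b + gamma * l - g * x
        for l, b, g, x in zip(template, noise, prev_gain, power)
    ]


def adapt_gains(prev_gain, error, step=0.001):
    """Assumed LMS-style update: move each gain along its error, never below 0."""
    return [max(g + step * e, 0.0) for g, e in zip(prev_gain, error)]


def limit_boost(gains, power, noise, lower_db=6.0, upper_db=15.0, max_boost=1.0):
    """Cap boosting gains in bands whose SNR lies outside [lower_db, upper_db]."""
    limited = []
    for g, x, b in zip(gains, power, noise):
        snr_db = 10.0 * math.log10(max(x, 1e-12) / max(b, 1e-12))
        if snr_db > upper_db or snr_db < lower_db:
            g = min(g, max_boost)  # attenuation is still allowed, boost is not
        limited.append(g)
    return limited


# Single-band toy run: the gain settles where gain * power equals the target.
gain = [1.0]
for _ in range(200):
    err = error_signal([4.0], [0.0], 1.0, gain, [2.0])
    gain = adapt_gains(gain, err, step=0.1)
```

In the toy run the step size is deliberately much larger than the 0.001 mentioned above so that convergence is visible within a few hundred iterations.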
- As another example, the system may place an intelligibility constraint on the adaptation of the equalization coefficients. In this example, the system may determine whether an adaptation of the equalization coefficients based on the weighted long-term speech curve would increase or decrease the speech intelligibility measurement of the speech signal. The adaptation of the equalization coefficients may be limited in response to a determination that the adaptation of the equalization coefficients would decrease the speech intelligibility measurement. With this constraint, the adaptation of the equalization coefficients should not decrease the intelligibility contribution of each sub-band. If the intelligibility of each subband is not reduced, then the intelligibility of the entire signal should also not be decreased.
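The per-subband form of this constraint can be sketched as a simple gate on each candidate coefficient. The sketch assumes a caller-supplied predictor, here called `band_score`, that estimates the intelligibility contribution of a subband for a given gain. That predictor, its signature, and the toy scoring rule in the usage lines are hypothetical and are not defined by the specification.

```python
def constrain_by_intelligibility(prev_gains, new_gains, band_score):
    """Keep a band's new gain only if its predicted contribution does not drop.

    band_score(k, gain) is a hypothetical predictor of the intelligibility
    contribution of band k when that gain is applied.
    """
    accepted = []
    for k, (old, new) in enumerate(zip(prev_gains, new_gains)):
        if band_score(k, new) >= band_score(k, old):
            accepted.append(new)
        else:
            accepted.append(old)  # reject the update for this band
    return accepted


# Toy predictor: importance weight times a gain that saturates at 1.5.
importance = [0.2, 0.8]


def toy_score(k, gain):
    return importance[k] * min(gain, 1.5)


result = constrain_by_intelligibility([1.0, 1.0], [1.2, 0.7], toy_score)
```

Gating each subband independently mirrors the reasoning in the paragraph above: if no subband's contribution is reduced, the overall measurement should not be reduced either.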
- As another example, the system may use step size control to constrain adaptation. For example, adaptation may be made faster when the average speech is far from the reference template and slower when it is close.
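One way to realize such step size control is to let the step grow with the distance between the smoothed speech level and the template in each subband. The sketch below assumes levels in dB, a linear growth rule, and an upper bound on the step. The rule, the function name `variable_step`, and every constant other than the base step of 0.001 are hypothetical choices for illustration.

```python
def variable_step(avg_level_db, template_db, base_step=0.001,
                  max_step=0.01, scale_db=10.0):
    """Per-band step size that grows with distance from the template."""
    steps = []
    for x, l in zip(avg_level_db, template_db):
        distance = abs(x - l)
        # Far from the template: larger step. Close to it: near base_step.
        step = base_step * (1.0 + distance / scale_db)
        steps.append(min(step, max_step))
    return steps


# Hypothetical levels: on target, 20 dB away, and very far away.
steps = variable_step([60.0, 40.0, 260.0], [60.0, 60.0, 60.0])
```

Bounding the step keeps the adaptation stable when a subband is momentarily far from the template, for instance during a transient.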
- At step 220, the system applies the new adapted version of the equalization coefficients ($G_{n,k}$) to the speech signal on a subband basis. In one implementation, the subbands overlap so there is already smoothing over frequency. Additionally, the equalization coefficients may be smoothed over time and/or frequency at
step 218. At step 222, the signal is resynthesized from the multiple subbands. For example, the signal may be converted back to a pulse code modulation ("PCM") signal. The output signal from step 222 may have a higher level of intelligibility than the input signal received at step 202. - Each of the processes described herein may be encoded in a computer-readable storage medium (e.g., a computer memory), programmed within a device (e.g., one or more circuits or processors), or may be processed by a controller or a computer. If the processes are performed by software, the software may reside in a local or distributed memory resident to or interfaced to a storage device, a communication interface, or non-volatile or volatile memory in communication with a transmitter. The memory may include an ordered listing of executable instructions for implementing logic. Logic or any system element described may be implemented through optic circuitry, digital circuitry, through source code, through analog circuitry, or through an analog source, such as through an electrical, audio, or video signal. The software may be embodied in any computer-readable or signal-bearing medium, for use by, or in connection with an instruction executable system, apparatus, or device. Such a system may include a computer-based system, a processor-containing system, or another system that may selectively fetch instructions from an instruction executable system, apparatus, or device that may also execute instructions.
- A "computer-readable storage medium," "machine-readable medium," "propagated-signal" medium, and/or "signal-bearing medium" may comprise a medium (e.g., a non-transitory medium) that stores, communicates, propagates, or transports software or data for use by or in connection with an instruction executable system, apparatus, or device. The machine-readable medium may selectively be, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. A non-exhaustive list of examples of a machine-readable medium would include: an electrical connection having one or more wires, a portable magnetic or optical disk, a volatile memory, such as a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or Flash memory), or an optical fiber. A machine-readable medium may also include a tangible medium, as the software may be electronically stored as an image or in another format (e.g., through an optical scan), then compiled, and/or interpreted or otherwise processed. The processed medium may then be stored in a computer and/or machine memory.
- While various embodiments, features, and benefits of the present system have been described, it will be apparent to those of ordinary skill in the art that many more embodiments, features, and benefits are possible within the scope of the disclosure. For example, other alternate systems may include any combinations of structure and functions described above or shown in the figures.
- The invention is however solely defined by the appended claims.
Claims (15)
- An adaptive equalization method, comprising: calculating (208) a speech intelligibility measurement of a speech signal by a computer processor (108); obtaining (210) a first predetermined long-term average speech curve; obtaining (210) a second predetermined long-term average speech curve; generating (210) a weighted long-term speech curve by the computer processor based on the first predetermined long-term average speech curve, the second predetermined long-term average speech curve, and the speech intelligibility measurement; and adapting (218) equalization coefficients for the speech signal by the computer processor based on the weighted long-term speech curve.
- The method of claim 1, where the first predetermined long-term average speech curve is a first speech template in quiet conditions (602), and the second predetermined long-term average speech curve is a second speech template in noisy conditions (604);
where the step of generating the weighted long-term speech curve comprises: calculating (210) a weight factor from the speech intelligibility measurement; and averaging (210) the first speech template in quiet conditions with the second speech template in noisy conditions based on the weight factor to generate the weighted long-term speech curve. - The method of claim 1, where calculating the speech intelligibility measurement comprises calculating a product of a band importance function and a band audibility function, summed over a plurality of bands of the speech signal.
- The method of claim 1, where calculating the speech intelligibility measurement comprises: calculating (204) a signal power measurement for a frequency band of the speech signal; estimating (206) a background noise level for the frequency band of the speech signal; and calculating (208) the speech intelligibility measurement from the signal power measurement, the background noise level, and a band importance value associated with the frequency band of the speech signal.
- The method of claim 1, where adapting the equalization coefficients comprises: applying (216) a prior version of the equalization coefficients to a power spectrum of the speech signal to generate an equalized signal; and adapting (218) the equalization coefficients to generate an adapted version of the equalization coefficients based on a difference between the equalized signal and the weighted long-term speech curve; the method further comprising applying (220) the adapted version of the equalization coefficients to the speech signal to transform one or more aspects of the speech signal and produce an output speech signal.
- The method of claim 1, where adapting the equalization coefficients comprises: normalizing (212) the weighted long-term speech curve based on a power measurement of the speech signal to generate a normalized speech curve; applying (216) a prior version of the equalization coefficients to a power spectrum of the speech signal to generate an equalized signal; estimating (206) a background noise level of the speech signal; summing (214) the background noise level and the normalized speech curve to generate a summed signal; calculating (214) an error signal based on a difference between the summed signal and the equalized signal; and adapting (218) the equalization coefficients based on the error signal to generate an adapted version of the equalization coefficients.
- The method of claim 1, where adapting the equalization coefficients comprises: calculating a signal-to-noise ratio of the speech signal; comparing the signal-to-noise ratio to a predetermined upper threshold or a predetermined lower threshold; and limiting a boosting gain of the equalization coefficients in response to a determination that the signal-to-noise ratio is above the predetermined upper threshold or below the predetermined lower threshold.
- The method of claim 1, where adapting the equalization coefficients comprises: determining whether an adaptation of the equalization coefficients based on the weighted long-term speech curve would increase or decrease the speech intelligibility measurement of the speech signal; and constraining the adaptation of the equalization coefficients in response to a determination that the adaptation of the equalization coefficients would decrease the speech intelligibility measurement.
- An adaptive equalization system, comprising: a computer processor (108); a speech intelligibility measurement module (120) configured to calculate, when executed by the computer processor, a speech intelligibility measurement of a speech signal; a spectral shape adjustment module (122) configured to generate, when executed by the computer processor, a weighted long-term speech curve based on a first predetermined long-term average speech curve, a second predetermined long-term average speech curve, and the speech intelligibility measurement; and an adaptive equalization module (126) configured to adapt, when executed by the computer processor, equalization coefficients for the speech signal based on the weighted long-term speech curve.
- The system of claim 9, where the first predetermined long-term average speech curve is a first speech template in quiet conditions (602), and the second predetermined long-term average speech curve is a second speech template in noisy conditions (604);
where the spectral shape adjustment module is configured to calculate a weight factor from the speech intelligibility measurement; and
where the spectral shape adjustment module is configured to average the first speech template in quiet conditions with the second speech template in noisy conditions based on the weight factor to generate the weighted long-term speech curve.
- The system of claim 9, where the speech intelligibility measurement module is configured to calculate the speech intelligibility measurement by determining a product of a band importance function and a band audibility function, summed over a plurality of bands of the speech signal.
- The system of claim 9, further comprising: a signal power calculation module (116) configured to calculate, when executed by the computer processor, a signal power measurement for a frequency band of the speech signal; and a background noise level estimation module (118) configured to estimate, when executed by the computer processor, a background noise level for the frequency band of the speech signal; where the speech intelligibility measurement module is configured to calculate the speech intelligibility measurement from the signal power measurement, the background noise level, and a band importance value associated with the frequency band of the speech signal.
- The system of claim 9, where the adaptive equalization module is configured to apply a prior version of the equalization coefficients to a power spectrum of the speech signal to generate an equalized signal;
where the adaptive equalization module is configured to adapt the equalization coefficients to generate an adapted version of the equalization coefficients based on a difference between the equalized signal and the weighted long-term speech curve; and
where the adaptive equalization module is configured to apply the adapted version of the equalization coefficients to the speech signal to transform one or more aspects of the speech signal and produce an output speech signal.
- The system of claim 9, further comprising: a background noise level estimation module (118) configured to calculate, when executed by the computer processor, a background noise level of the speech signal; and a normalization module (124) configured to normalize, when executed by the computer processor, the weighted long-term speech curve based on a power measurement of the speech signal to generate a normalized speech curve; where the adaptive equalization module is configured to apply a prior version of the equalization coefficients to a power spectrum of the speech signal to generate an equalized signal; where the adaptive equalization module is configured to sum the background noise level and the normalized speech curve to generate a summed signal; where the adaptive equalization module is configured to calculate an error signal based on a difference between the summed signal and the equalized signal; and where the adaptive equalization module is configured to adapt the equalization coefficients based on the error signal to generate an adapted version of the equalization coefficients.
- The system of claim 9, further comprising: an adaptation constraint module (126) configured to compare, when executed by the computer processor, a signal-to-noise ratio of the speech signal to a predetermined upper threshold or a predetermined lower threshold, where the adaptation constraint module is configured to limit a boosting gain of the equalization coefficients in response to a determination that the signal-to-noise ratio is above the predetermined upper threshold or below the predetermined lower threshold; and an adaptation constraint module (126) configured to determine, when executed by the computer processor, whether an adaptation of the equalization coefficients based on the weighted long-term speech curve would increase or decrease the speech intelligibility measurement of the speech signal, where the adaptation constraint module is configured to constrain the adaptation of the equalization coefficients in response to a determination that the adaptation of the equalization coefficients would decrease the speech intelligibility measurement.
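The system claims above can be read as a per-frame update loop. The following Python sketch is one hedged illustration of that reading, not the patented implementation. Every function name, the linear audibility mapping over a 30 dB range, the use of the intelligibility measurement itself as the weight factor, the mean-level normalization, the step size `mu`, and the boost limits and SNR thresholds are assumptions introduced here; the claims do not specify them.

```python
import numpy as np


def intelligibility(signal_db, noise_db, importance):
    """Sum over bands of band importance times band audibility."""
    snr_db = signal_db - noise_db
    # Assumption: audibility maps SNR linearly from [-15, +15] dB onto [0, 1].
    audibility = np.clip((snr_db + 15.0) / 30.0, 0.0, 1.0)
    return float(np.sum(importance * audibility))


def weighted_curve(quiet_db, noisy_db, measurement):
    """Average the quiet and noisy speech templates using a weight factor."""
    # Assumption: the weight on the quiet template is the clipped measurement.
    w = min(max(measurement, 0.0), 1.0)
    return w * quiet_db + (1.0 - w) * noisy_db


def normalize_curve(curve_db, signal_db):
    """Shift the curve so its mean level matches the mean signal power."""
    return curve_db + (np.mean(signal_db) - np.mean(curve_db))


def power_sum_db(a_db, b_db):
    """Sum two dB levels in the linear power domain."""
    return 10.0 * np.log10(10.0 ** (a_db / 10.0) + 10.0 ** (b_db / 10.0))


def adapt_step(coeffs_db, signal_db, noise_db, target_db, importance,
               mu=0.1, max_boost_db=12.0, limited_boost_db=3.0,
               snr_low_db=-10.0, snr_high_db=30.0):
    """One adaptation of per-band equalization gains (all values in dB)."""
    equalized_db = signal_db + coeffs_db           # prior coefficients applied
    summed_db = power_sum_db(noise_db, target_db)  # noise level plus speech curve
    error_db = summed_db - equalized_db            # error signal
    candidate = coeffs_db + mu * error_db          # move gains toward the target

    # Limit the boosting gain when the broadband SNR is outside the thresholds.
    snr_db = float(np.mean(signal_db - noise_db))
    outside = snr_db > snr_high_db or snr_db < snr_low_db
    candidate = np.minimum(candidate, limited_boost_db if outside else max_boost_db)

    # Constrain the update if it would lower the intelligibility measurement.
    # Assumption: the noise floor is unaffected by the equalization gains,
    # and "constraining" is modeled as keeping the prior coefficients.
    before = intelligibility(equalized_db, noise_db, importance)
    after = intelligibility(signal_db + candidate, noise_db, importance)
    return coeffs_db if after < before else candidate
```

With a flat 60 dB signal over a 50 dB noise floor and a 60 dB target, one step at the default `mu` raises each band's gain by roughly 0.04 dB. An update that would pull speech down toward the noise floor is rejected, and the prior coefficients are kept.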
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP12166906.3A EP2660814B1 (en) | 2012-05-04 | 2012-05-04 | Adaptive equalization system |
CA2814434A CA2814434C (en) | 2012-05-04 | 2013-05-01 | Adaptive equalization system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP12166906.3A EP2660814B1 (en) | 2012-05-04 | 2012-05-04 | Adaptive equalization system |
Publications (2)
Publication Number | Publication Date |
---|---|
EP2660814A1 EP2660814A1 (en) | 2013-11-06 |
EP2660814B1 EP2660814B1 (en) | 2016-02-03 |
Family
ID=46022132
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP12166906.3A Active EP2660814B1 (en) | 2012-05-04 | 2012-05-04 | Adaptive equalization system |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP2660814B1 (en) |
CA (1) | CA2814434C (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5740353B2 (en) * | 2012-06-05 | 2015-06-24 | 日本電信電話株式会社 | Speech intelligibility estimation apparatus, speech intelligibility estimation method and program thereof |
CN105989853B (en) * | 2015-02-28 | 2020-08-18 | 科大讯飞股份有限公司 | Audio quality evaluation method and system |
GB2553571B (en) * | 2016-09-12 | 2020-03-04 | Jaguar Land Rover Ltd | Apparatus and method for privacy enhancement |
IT202000010435A1 (en) * | 2020-05-08 | 2021-11-08 | Rai Radiotelevisione Italiana Spa | METHOD FOR IMPROVING THE PERCEPTION OF THE QUALITY OF A DIGITAL AUDIO SIGNAL EMITTED BY A RECEIVER OF TELEVISION SIGNALS, PARTICULARLY OF THE FLAT SCREEN TYPE, AND RELATED DEVICE |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DK1522206T3 (en) * | 2002-07-12 | 2007-11-05 | Widex As | Hearing aid and a method of improving speech intelligibility |
US7844453B2 (en) | 2006-05-12 | 2010-11-30 | Qnx Software Systems Co. | Robust noise estimation |
US8296136B2 (en) * | 2007-11-15 | 2012-10-23 | Qnx Software Systems Limited | Dynamic controller for improving speech intelligibility |
- 2012-05-04: EP application EP12166906.3A, patent EP2660814B1 (en), status Active
- 2013-05-01: CA application CA2814434A, patent CA2814434C (en), status Active
Also Published As
Publication number | Publication date |
---|---|
CA2814434C (en) | 2016-08-16 |
EP2660814A1 (en) | 2013-11-06 |
CA2814434A1 (en) | 2013-11-04 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US9536536B2 (en) | Adaptive equalization system | |
US9064498B2 (en) | Apparatus and method for processing an audio signal for speech enhancement using a feature extraction | |
US8626502B2 (en) | Improving speech intelligibility utilizing an articulation index | |
US8219389B2 (en) | System for improving speech intelligibility through high frequency compression | |
US8433582B2 (en) | Method and apparatus for estimating high-band energy in a bandwidth extension system | |
US8170221B2 (en) | Audio enhancement system and method | |
KR101482830B1 (en) | Method and apparatus for bandwidth extension of audio signal | |
CN101976566B (en) | Speech enhancement method and device applying the method | |
US8364479B2 (en) | System for speech signal enhancement in a noisy environment through corrective adjustment of spectral noise power density estimations | |
EP2244254B1 (en) | Ambient noise compensation system robust to high excitation noise | |
US8447044B2 (en) | Adaptive LPC noise reduction system | |
US20050207583A1 (en) | Audio enhancement system and method | |
US10043533B2 (en) | Method and device for boosting formants from speech and noise spectral estimation | |
WO2008121436A1 (en) | Method and apparatus for quickly detecting a presence of abrupt noise and updating a noise estimate | |
US11128954B2 (en) | Method and electronic device for managing loudness of audio signal | |
EP2238593A1 (en) | Method and apparatus for estimating high-band energy in a bandwidth extension system | |
WO2001073758A1 (en) | Spectrally interdependent gain adjustment techniques | |
US8199928B2 (en) | System for processing an acoustic input signal to provide an output signal with reduced noise | |
JP2023536104A (en) | Noise reduction using machine learning | |
EP2660814B1 (en) | Adaptive equalization system | |
WO2009123387A1 (en) | Procedure for processing noisy speech signals, and apparatus and computer program therefor | |
CN116547753A (en) | Machine learning assisted spatial noise estimation and suppression | |
Hayashi et al. | Single channel speech enhancement based on perceptual frequency-weighting |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20120504 |
AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
AX | Request for extension of the european patent | Extension state: BA ME |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: 2236008 ONTARIO INC. |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R079 Ref document number: 602012014293 Country of ref document: DE Free format text: PREVIOUS MAIN CLASS: G10L0021020000 Ipc: G10L0021036400 |
GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
RIC1 | Information provided on ipc code assigned before grant | Ipc: G10L 21/0364 20130101AFI20150629BHEP Ipc: G10L 21/0264 20130101ALN20150629BHEP |
INTG | Intention to grant announced | Effective date: 20150721 |
GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
REG | Reference to a national code | Ref country code: GB Ref legal event code: FG4D |
REG | Reference to a national code | Ref country code: AT Ref legal event code: REF Ref document number: 774023 Country of ref document: AT Kind code of ref document: T Effective date: 20160215 Ref country code: CH Ref legal event code: EP |
REG | Reference to a national code | Ref country code: IE Ref legal event code: FG4D |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R096 Ref document number: 602012014293 Country of ref document: DE |
REG | Reference to a national code | Ref country code: NL Ref legal event code: FP |
REG | Reference to a national code | Ref country code: FR Ref legal event code: PLFP Year of fee payment: 5 |
REG | Reference to a national code | Ref country code: LT Ref legal event code: MG4D |
REG | Reference to a national code | Ref country code: AT Ref legal event code: MK05 Ref document number: 774023 Country of ref document: AT Kind code of ref document: T Effective date: 20160203 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160504 Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160503 Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160603 Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160603 Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20160531 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R097 Ref document number: 602012014293 Country of ref document: DE |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: BE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: LU Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160504 |
REG | Reference to a national code | Ref country code: CH Ref legal event code: PL |
26N | No opposition filed | Effective date: 20161104 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20160531 Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20160531 |
REG | Reference to a national code | Ref country code: IE Ref legal event code: MM4A |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160503 Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 |
REG | Reference to a national code | Ref country code: FR Ref legal event code: PLFP Year of fee payment: 6 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20160504 |
REG | Reference to a national code | Ref country code: FR Ref legal event code: PLFP Year of fee payment: 7 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: HU Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO Effective date: 20120504 Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: MT Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20160531 Ref country code: TR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 Ref country code: MK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: AL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160203 |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R081 Ref document number: 602012014293 Country of ref document: DE Owner name: MALIKIE INNOVATIONS LTD., IE Free format text: FORMER OWNER: 2236008 ONTARIO INC., WATERLOO, ONTARIO, CA Ref country code: DE Ref legal event code: R082 Ref document number: 602012014293 Country of ref document: DE Representative's name: MERH-IP MATIAS ERNY REICHL HOFFMANN PATENTANWA, DE Ref country code: DE Ref legal event code: R081 Ref document number: 602012014293 Country of ref document: DE Owner name: BLACKBERRY LIMITED, WATERLOO, CA Free format text: FORMER OWNER: 2236008 ONTARIO INC., WATERLOO, ONTARIO, CA |
REG | Reference to a national code | Ref country code: GB Ref legal event code: 732E Free format text: REGISTERED BETWEEN 20200730 AND 20200805 |
REG | Reference to a national code | Ref country code: NL Ref legal event code: PD Owner name: BLACKBERRY LIMITED; CA Free format text: DETAILS ASSIGNMENT: CHANGE OF OWNER(S), ASSIGNMENT; FORMER OWNER NAME: 2236008 ONTARIO INC. Effective date: 20201109 |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R082 Ref document number: 602012014293 Country of ref document: DE Ref country code: DE Ref legal event code: R081 Ref document number: 602012014293 Country of ref document: DE Owner name: MALIKIE INNOVATIONS LTD., IE Free format text: FORMER OWNER: BLACKBERRY LIMITED, WATERLOO, ONTARIO, CA |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: NL Payment date: 20240527 Year of fee payment: 13 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB Payment date: 20240521 Year of fee payment: 13 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: DE Payment date: 20240529 Year of fee payment: 13 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR Payment date: 20240527 Year of fee payment: 13 |