KR101741141B1 - Apparatus for suppressing noise and method thereof

Info

Publication number: KR101741141B1
Application number: KR1020150181624A
Authority: KR
Inventors: 이석필; 서지훈; 한혁수
Original assignee: 상명대학교산학협력단
Priority date: 2015-12-18
Filing date: 2015-12-18
Publication date: 2017-05-29
Also published as: WO2017104876A1

Abstract

A noise removal method according to one aspect of the present invention includes: receiving a mixed signal including a speech signal and a noise signal; obtaining the noise signal from an interval of the mixed signal in which the speech signal is absent; obtaining a post-signal-to-noise ratio (post-SNR, that is, the a posteriori SNR) using the noise signal and the mixed signal; estimating a preceding signal-to-noise ratio (preceding SNR, that is, the a priori SNR) of a current frame using the post-SNR, the noise signal of the previous frame, and the preceding SNR of the previous frame; calculating a weight value using the estimated preceding SNR; calculating a filter value for each frequency using the calculated weight value; and multiplying the mixed signal by the calculated filter value to obtain an enhanced estimated speech signal.

Description

APPARATUS FOR SUPPRESSING NOISE AND METHOD THEREOF

The present invention relates to signal processing for speech enhancement, and more particularly to a signal processing method and apparatus that improve the intelligibility of speech by removing wind noise contained in the speech signal.

As the spread of smartphones grows, a variety of speech recognition technologies are being used. Apple's Siri and Google's Google Now are some of the more popular smartphone services using voice recognition.

In a quiet environment, the recognition rate of such a voice recognition service is high, and in a normal call the other party's voice can be heard well. However, when ambient noise or wind noise is mixed with the user's voice and input to the smartphone, the recognition rate of the service drops and the other party's voice cannot be understood well.

In the case of mixed wind sounds, the prior art attempted to reduce the wind noise by simply cutting out a specific band of the signal using a low pass filter (LPF) or a high pass filter (HPF).

Korean Patent Application No. 10-2005-0120682 relates to a method for automatically removing wind noise according to its level: the mixed signal is filtered by a low-pass filter, the level of the filtered signal is measured, a control signal is generated according to the measured level, and the wind noise is removed through a high-pass filter.

However, such simple filtering removes part of the user's voice band along with the wind noise, so the speech recognition rate cannot be improved.

It is an object of the present invention to provide an apparatus and method that obtain a filter coefficient using a preceding signal-to-noise ratio and a post-signal-to-noise ratio, and that use the filter coefficient to remove wind noise.

The objects of the present invention are not limited to the above-mentioned objects, and other objects not mentioned can be clearly understood by those skilled in the art from the following description.

According to one aspect of the present invention, there is provided a noise removal method comprising: receiving a mixed signal including a speech signal and a noise signal; obtaining the noise signal from an interval of the mixed signal in which the speech signal is absent; obtaining a post-SNR using the noise signal and the mixed signal; estimating a preceding SNR of a current frame using the post-SNR, the noise signal of the previous frame, and the preceding SNR of the previous frame; calculating a weight value using the estimated preceding SNR; calculating a filter value for each frequency using the calculated weight value; and multiplying the mixed signal by the calculated filter value to obtain an enhanced estimated speech signal.

According to another aspect of the present invention, there is provided a noise removal apparatus including at least one processor, the processor including: an input unit for receiving a mixed signal including a speech signal and a noise signal; a frequency signal converter for converting the mixed signal into a frequency domain signal; an operation unit for obtaining the noise signal from an interval in which the speech signal is absent, calculating a post-SNR using the noise signal and the mixed signal, estimating a preceding SNR of a current frame using the post-SNR, the noise signal of the previous frame, and the preceding SNR of the previous frame, calculating a weight value using the estimated preceding SNR, and calculating a filter value for each frequency using the calculated weight value; a filter unit for multiplying the mixed signal by the calculated filter value to obtain an enhanced speech signal; and a time domain signal converter for converting the enhanced speech signal into a time domain signal.

According to the present invention, a signal mixed with wind noise is filtered with a filter formed from the preceding signal-to-noise ratio and the post-signal-to-noise ratio. This speech enhancement technique makes it possible to increase the voice recognition rate and to improve the clarity of speech during a call.

FIG. 1 is a flowchart of a noise removal method according to an embodiment of the present invention.
FIG. 2 is a signal flow diagram of a noise removal method according to an embodiment of the present invention.
FIG. 3 is a structural view of a noise removal apparatus according to another embodiment of the present invention.
FIG. 4 is a structural view of a computer apparatus in which a noise removal method according to another embodiment of the present invention is implemented.

The advantages and features of the present invention, and the manner of achieving them, will become apparent from the embodiments described in detail below with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art, and the invention is defined only by the scope of the claims. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. In the present specification, the singular form includes the plural form unless otherwise specified. As used herein, the terms "comprises" and/or "comprising" do not exclude the presence or addition of one or more other components, steps, operations, and/or elements.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 shows a flowchart of a noise removal method according to an embodiment of the present invention.

In order to remove noises for voice signal enhancement, a mixed signal is input first (S110).

Since the input mixed signal is a time domain signal, an FFT (Fast Fourier Transform) operation is performed to convert it into a frequency domain signal. The signal converted into the frequency domain signal through the FFT operation is composed of an amplitude signal and a phase signal. In the present invention, since the operation is performed using only the amplitude signal, the phase signal is transmitted to the output side without modification.
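The magnitude/phase split described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: a naive O(N^2) DFT from the Python standard library stands in for the FFT, and the helper names `dft`, `split_magnitude_phase`, and `recombine` are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def split_magnitude_phase(spectrum):
    """Return per-bin magnitudes and phases of a complex spectrum."""
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def recombine(magnitudes, phases):
    """Rebuild complex bins from (possibly modified) magnitudes and the untouched phases."""
    return [cmath.rect(m, ph) for m, ph in zip(magnitudes, phases)]


# Example frame: a sinusoid with two cycles over eight samples.
frame = [math.sin(math.pi * t / 2) for t in range(8)]
spectrum = dft(frame)
mags, phases = split_magnitude_phase(spectrum)
rebuilt = recombine(mags, phases)  # equals `spectrum` when magnitudes are unchanged
```

Only `mags` would be passed on to the noise removal steps; `phases` is held back and reused unchanged when the output signal is rebuilt.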

To obtain the preceding (a priori) SNR, a noise signal, a mixed signal, and a post-SNR (a posteriori SNR) are required. Since only the mixed signal is received, the noise signal and the post-SNR are estimated from the mixed signal.

First, the noise signal is obtained from an interval of the mixed signal in which no speech is present. The human voice is not always present in the mixed signal, and in particular it is assumed to be absent in a short interval immediately after input of the mixed signal begins. The noise signal is therefore obtained on the assumption that only noise exists in this interval.
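A minimal sketch of this noise estimate, assuming the magnitude spectra of the frames are already available and that the first few frames contain no speech. The function name `estimate_noise_power` and the parameter `num_noise_frames` are hypothetical; the source does not state how long the speech-free interval is.

```python
def estimate_noise_power(magnitude_frames, num_noise_frames):
    """Average the squared magnitude of each frequency bin over the leading frames.

    magnitude_frames: list of frames, each a list of per-bin magnitudes.
    num_noise_frames: number of leading frames assumed to be speech-free (an assumption).
    """
    lead = magnitude_frames[:num_noise_frames]
    if not lead:
        raise ValueError("at least one speech-free frame is required")
    num_bins = len(lead[0])
    return [sum(f[k] ** 2 for f in lead) / len(lead) for k in range(num_bins)]


frames = [[1.0, 2.0], [3.0, 2.0], [10.0, 10.0]]
noise_power = estimate_noise_power(frames, 2)  # third frame is treated as speech
```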

After the noise signal is obtained, the post-SNR can be obtained using the noise signal and the mixed signal. The post-SNR can be obtained by Equation 1 below (S120), which, as recited in claim 2, divides the magnitude of the mixed signal by the magnitude of the noise signal.

$$\mathrm{SNR}_{post}(p,k) = \frac{|Y(p,k)|}{|N(p,k)|} \qquad (1)$$

Y(p, k) and N(p, k) represent the mixed signal and the noise signal at the p-th frame and the k-th frequency index, respectively. The noise signal uses the values assumed in the previous step.
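As a rough sketch of Equation 1, following the wording of claim 2 (magnitude of the mixed signal divided by magnitude of the noise signal). The function name `post_snr` and the small `floor` that guards against division by zero are assumptions added for illustration.

```python
def post_snr(mixed_mag, noise_mag, floor=1e-12):
    """Per-bin post-SNR: mixed-signal magnitude over noise magnitude.

    `floor` is an illustrative safeguard for bins whose noise estimate is zero.
    """
    return [y / max(n, floor) for y, n in zip(mixed_mag, noise_mag)]


ratios = post_snr([2.0, 3.0], [1.0, 6.0])
```

A ratio above 1 indicates that a bin carries more than the estimated noise alone; a ratio at or below 1 suggests the bin is dominated by noise.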

The preceding signal-to-noise ratio is calculated using the calculated post-signal-to-noise ratio (S130), as in Equation 2, which follows the recitation of claim 3.

$$\mathrm{SNR}_{prio}(p,k) = \alpha\,\frac{|\hat{S}(p-1,k)|^{2}}{E\left[|N(p-1,k)|^{2}\right]} + (1-\alpha)\,\max\left(\mathrm{SNR}_{post}(p,k) - 1,\; 0\right) \qquad (2)$$

\hat{S}(p, k) is the estimated speech signal obtained by removing the noise signal from the mixed signal, and E[·] denotes an average. Before the calculation according to the present invention starts, the estimated speech signal is initialized to 0. The speech signal of each frame is then estimated, and the preceding signal-to-noise ratio of the next frame is obtained from it.

α is a preset proportional coefficient that adjusts, in estimating the speech signal, the relative influence of the estimated speech signal and noise signal of the previous frame and of the post-signal-to-noise ratio accumulated from the first frame to the previous frame.

That is, α is a value between 0 and 1. The closer α is to 1, the greater the influence of the previous-frame value; the closer α is to 0, the greater the influence of the value accumulated from the first frame to the previous frame.
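The estimate of the preceding SNR can be sketched as below. This is one reading of claim 3 (a weighted sum of the previous frame's speech-to-noise power ratio and the post-SNR minus 1, clipped at 0), not a confirmed transcription of Equation 2. The function name `preceding_snr`, the argument names, and the `floor` safeguard are hypothetical.

```python
def preceding_snr(prev_speech_mag, noise_power, post, alpha, floor=1e-12):
    """Per-bin preceding (a priori) SNR for the current frame.

    prev_speech_mag: magnitudes of the speech estimate from the previous frame
                     (all zeros for the first frame).
    noise_power:     average squared noise magnitude per bin.
    post:            post-SNR of the current frame per bin.
    alpha:           proportional coefficient in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return [
        alpha * (s * s) / max(npow, floor) + (1.0 - alpha) * max(g - 1.0, 0.0)
        for s, npow, g in zip(prev_speech_mag, noise_power, post)
    ]


first_frame = preceding_snr([0.0], [4.0], [3.0], alpha=0.5)  # speech estimate starts at 0
later_frame = preceding_snr([2.0], [4.0], [3.0], alpha=0.5)
```

With α = 0.5 the two terms contribute equally; moving α toward 1 makes the result depend almost entirely on the previous frame's estimate.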

Once the preceding signal-to-noise ratio has been obtained, the weight value is calculated from it (S140). The weight value can be obtained by Equation (3).

The μ value is a weight parameter. A large preceding signal-to-noise ratio means that the speech signal is large, so the weight value must be large. Conversely, a small preceding signal-to-noise ratio means that the speech signal is small compared with the noise signal, so the weight value must be small.

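Once the weight value and the preceding signal-to-noise ratio are obtained, the filter value H(p, k) used for noise removal can be obtained from the two values (S150). As recited in claim 10, the filter value is the product of the preceding signal-to-noise ratio and the weight value, divided by that product plus 1 (Equation 4).

$$H(p,k) = \frac{\mu(p,k)\,\mathrm{SNR}_{prio}(p,k)}{\mu(p,k)\,\mathrm{SNR}_{prio}(p,k) + 1} \qquad (4)$$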
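The weight and filter computation might look like the sketch below. It rests on two assumptions drawn from the translated claims rather than from the (missing) equations: the weight follows the wording of claims 1 and 6 (square root of the sum of the squared preceding SNR and its absolute value, divided by its absolute value), and the filter value follows claim 10 (weighted preceding SNR divided by itself plus 1). The names `weight` and `filter_gain` are hypothetical.

```python
import math


def weight(prio):
    """Weight value per the wording of claims 1 and 6; `prio` must be non-zero."""
    a = abs(prio)
    return math.sqrt(prio * prio + a) / a


def filter_gain(prio):
    """Filter value per the wording of claim 10: (w * prio) / (w * prio + 1)."""
    if prio == 0.0:
        return 0.0  # the weighted SNR tends to 0 as prio tends to 0
    x = weight(prio) * prio
    return x / (x + 1.0)


gains = [filter_gain(v) for v in (0.0, 0.1, 1.0, 10.0)]
```

Under this reading the weighted preceding SNR simplifies to the square root of (SNR squared plus SNR) for non-negative values, so the filter value rises from 0 toward 1 as the preceding SNR grows: bins dominated by noise are attenuated and bins dominated by speech pass almost unchanged.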

Since the filter value for each frequency index can be obtained in the corresponding frame, the final speech signal with reduced noise is obtained by multiplying the mixed signal by the noise removal filter value (S160). This process is shown in Equation (5).

$$\hat{S}(p,k) = H(p,k)\,Y(p,k) \qquad (5)$$

Y(p, k) represents the mixed signal and, as described above, the estimated speech signal \hat{S}(p, k) is used to determine the preceding signal-to-noise ratio in the next frame.

FIG. 2 shows a flow chart of a signal until a mixed signal mixed with a noise signal is filtered to output a signal in which a noise signal is attenuated.

Finally, since the estimated speech signal is only the amplitude of the speech signal, it is combined with the unmodified phase signal and converted into a time domain signal by an IFFT (Inverse Fast Fourier Transform).

Removing noise by estimating the preceding signal-to-noise ratio is more effective than removing it with a simple filter such as a conventional LPF.
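Steps S110 to S160 and the final inverse transform can be put together in one hedged sketch. It is not the patented implementation: a naive DFT from the standard library stands in for the FFT and IFFT, the weight and filter follow one reading of the translated claims, and the names `dft` and `suppress` and the parameters `num_noise_frames` and `floor` are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT or inverse DFT (stand-in for FFT / IFFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def suppress(frames, num_noise_frames, alpha, floor=1e-12):
    """Return noise-suppressed time-domain frames for equal-length input frames."""
    spectra = [dft(f) for f in frames]
    mags = [[abs(c) for c in s] for s in spectra]
    lead = mags[:num_noise_frames]  # frames assumed to contain only noise
    if not lead:
        raise ValueError("at least one speech-free frame is required")
    bins = range(len(mags[0]))
    noise_power = [max(sum(m[k] ** 2 for m in lead) / len(lead), floor) for k in bins]
    prev_speech = [0.0 for _ in bins]  # speech estimate is initialised to 0
    enhanced = []
    for spec, mag in zip(spectra, mags):
        out_bins = []
        for k in bins:
            post = mag[k] / math.sqrt(noise_power[k])              # post-SNR
            prio = (alpha * prev_speech[k] ** 2 / noise_power[k]
                    + (1.0 - alpha) * max(post - 1.0, 0.0))        # preceding SNR
            x = math.sqrt(prio * prio + prio)                      # weight * prio, prio >= 0
            gain = x / (x + 1.0)                                   # filter value
            prev_speech[k] = gain * mag[k]                         # estimated speech magnitude
            # Reattach the unmodified phase of the mixed signal.
            out_bins.append(cmath.rect(prev_speech[k], cmath.phase(spec[k])))
        enhanced.append([v.real for v in dft(out_bins, inverse=True)])
    return enhanced


noise = [0.1 * (-1) ** t for t in range(8)]
tone = [math.sin(math.pi * t / 2) for t in range(8)]
mixed = [a + b for a, b in zip(noise, tone)]
out = suppress([noise, noise, mixed], num_noise_frames=2, alpha=0.98)
```

In this toy input the noise occupies a single frequency bin that is learned from the first two frames, so the third output frame is close to the clean tone. The values of `alpha` and `num_noise_frames` are illustrative only; real signals would also need overlapping, windowed frames, which the source does not describe.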

FIG. 3 is a structural view of a noise removal apparatus according to another embodiment of the present invention.

The input unit 310 receives a mixed signal in which a speech signal and a noise signal are mixed. The input unit may be a microphone or the like, or may receive a file input such as an audio file or a video file and extract only the mixed signal, that is, the audio signal.

Since the present invention processes a signal in the frequency domain, the frequency signal transforming unit 320 transforms the received signal into a frequency signal through a method such as FFT. Frequency signal conversion can be performed not only by FFT, but also by methods such as DFT (Discrete Fourier Transform), DCT (Discrete Cosine Transform), and Filterbank.

The operation unit 330 extracts a filter value for noise cancellation from the input signal.

The post-SNR is first obtained from the received mixed signal and the noise signal, as shown in Equation (1). The speech signal and the noise signal cannot be separated within the mixed signal. However, it is assumed that no speech is present in the initial part of the input, and the post-SNR is calculated by treating the signal in this interval as the noise signal.

The preceding signal-to-noise ratio is calculated using the thus-calculated post-signal-to-noise ratio, the speech signal estimated in the previous frame, and the average value of the noise signal obtained in the previous frame. In this process, it is possible to control the rate at which the previous frame estimation value and the history value of the previous frames including the previous frame influence the preceding signal-to-noise ratio using the proportional coefficient value.

Increasing the proportion of the previous-frame value makes the estimate more sensitive to changes between frames, but the frequent changes may be unpleasant for the user. Increasing the proportion of the history value suppresses sudden changes, so a more natural speech signal can be heard, but the estimate cannot respond quickly to a signal that changes rapidly in time. An optimal value between the two can therefore be determined by experiment.

Once the preceding signal-to-noise ratio is obtained by Equation (2), the weight value can be obtained. If the preceding signal-to-noise ratio is large, the speech signal is expected to be large, so the weight value is increased; in the opposite case the weight value is decreased. The weight value is obtained by Equation (3).

The filter value can be finally obtained by using the weight value and the preceding signal-to-noise ratio value.

The filter unit 340 multiplies the mixed signal by the filter value thus obtained to obtain a noise-free signal.

Since the signal output by the filter unit 340 is still in the frequency domain, it is converted into a time domain signal by the time signal converter 350 and provided to the output unit so that the user can hear it.

The time signal converter 350 may convert a frequency domain signal into a time domain signal using a method such as IFFT, Inverse DFT (IDFT), Inverse DCT (IDCT), or Inverse Filterbank.

Meanwhile, the noise cancellation method according to the embodiment of the present invention can be implemented in a computer system or recorded on a recording medium. As shown in FIG. 4, the computer system may include at least one processor 421, a memory 423, a user input device 426, a data communication bus 422, a user output device 427, and a storage 428. Each of these components performs data communication via the data communication bus 422.

The computer system may further include a network interface 429 coupled to the network. The processor 421 may be a central processing unit (CPU) or a semiconductor device that processes instructions stored in the memory 423 and / or the storage 428.

The memory 423 and the storage 428 may include various forms of volatile or non-volatile storage media. For example, the memory 423 may include a ROM 424 and a RAM 425.

Accordingly, the noise cancellation method according to the embodiment of the present invention can be implemented as a computer-executable method. When the method is performed in a computer device, computer-readable instructions carry out the noise cancellation method according to the present invention.

Meanwhile, the noise reduction method according to the present invention can be implemented as a computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording media storing data that can be decoded by a computer system. For example, there may be a ROM (Read Only Memory), a RAM (Random Access Memory), a magnetic tape, a magnetic disk, a flash memory, an optical data storage device and the like. The computer-readable recording medium may also be distributed and executed in a computer system connected to a computer network and stored and executed as a code that can be read in a distributed manner.

While the present invention has been described in detail with reference to the accompanying drawings, it is to be understood that the invention is not limited to the above-described embodiments. Those skilled in the art will appreciate that various modifications are possible without departing from the technical idea of the invention. Accordingly, the scope of protection of the present invention should not be limited to the above-described embodiments, but should be determined by the following claims.

310: input unit
320: frequency signal converter
330: operation unit
340: filter unit
350: time signal converter

Claims

1. A noise removal method comprising:
receiving a mixed signal including a speech signal and a noise signal;
obtaining the noise signal from an interval of the mixed signal in which the speech signal is absent;
obtaining a post-signal-to-noise ratio using the noise signal and the mixed signal;
estimating a preceding signal-to-noise ratio of a current frame using the post-signal-to-noise ratio, the noise signal of the previous frame, and the preceding signal-to-noise ratio of the previous frame;
calculating a weight value by dividing the square root of the sum of the square of the preceding signal-to-noise ratio of the current frame and the absolute value of the preceding signal-to-noise ratio of the current frame by the absolute value of the preceding signal-to-noise ratio of the current frame;
calculating a filter value for each frequency using the calculated weight value; and
multiplying the mixed signal by the calculated filter value to obtain an enhanced estimated speech signal.

2. The method of claim 1, wherein, in the obtaining of the post-signal-to-noise ratio, a value obtained by dividing the magnitude of the mixed signal by the magnitude of the noise signal is set as the post-signal-to-noise ratio.

3. The method of claim 1, wherein the preceding signal-to-noise ratio is set to the sum of:
a value obtained by dividing the squared magnitude of the estimated speech signal of the previous frame by the average of the squared magnitude of the noise signal, weighted by a predetermined proportional coefficient; and
a value obtained by multiplying the larger of 0 and the post-signal-to-noise ratio minus 1 by 1 minus the predetermined proportional coefficient, the values being accumulated from the first frame to the previous frame.

4. (Deleted)

5. The method of claim 1, wherein the filter value is a value obtained by dividing the product of the preceding signal-to-noise ratio and the weight value by the sum of 1 and the product of the preceding signal-to-noise ratio and the weight value.

6. A noise cancellation apparatus comprising one or more processors, the processor comprising:
an input unit for receiving a mixed signal including a speech signal and a noise signal;
a frequency signal converter for converting the mixed signal into a frequency domain signal;
an operation unit for obtaining the noise signal from an interval in which the speech signal is absent, calculating a post-signal-to-noise ratio using the noise signal and the mixed signal, estimating a preceding signal-to-noise ratio of a current frame using the post-signal-to-noise ratio, the noise signal of the previous frame, and the preceding signal-to-noise ratio of the previous frame, calculating a weight value by dividing the square root of the sum of the square of the preceding signal-to-noise ratio of the current frame and the absolute value of the preceding signal-to-noise ratio of the current frame by the absolute value of the preceding signal-to-noise ratio of the current frame, and calculating a filter value for each frequency using the calculated weight value;
a filter unit for multiplying the mixed signal by the calculated filter value to obtain an enhanced speech signal; and
a time domain signal converter for converting the enhanced speech signal into a time domain signal.

7. The apparatus of claim 6, wherein the operation unit sets, as the post-signal-to-noise ratio, a value obtained by dividing the magnitude of the mixed signal by the magnitude of the noise signal.

8. The apparatus of claim 6, wherein the operation unit sets, as the preceding signal-to-noise ratio, the sum of:
a value obtained by dividing the squared magnitude of the estimated speech signal of the previous frame by the average of the squared magnitude of the noise signal, weighted by a predetermined proportional coefficient; and
a value obtained by multiplying the larger of 0 and the post-signal-to-noise ratio minus 1 by 1 minus the predetermined proportional coefficient, the values being accumulated from the first frame to the previous frame.

9. (Deleted)

10. The apparatus of claim 6, wherein the operation unit sets, as the filter value, a value obtained by dividing the product of the preceding signal-to-noise ratio and the weight value by the sum of 1 and the product of the preceding signal-to-noise ratio and the weight value.