
KR101741141B1 - Apparatus for suppressing noise and method thereof - Google Patents

Info

Publication number
KR101741141B1
Authority
KR
South Korea
Prior art keywords
signal
noise
value
value obtained
multiplying
Prior art date
Application number
KR1020150181624A
Other languages
Korean (ko)
Inventor
이석필
서지훈
한혁수
Original Assignee
상명대학교산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 상명대학교산학협력단
Priority to KR1020150181624A (KR101741141B1)
Priority to PCT/KR2015/013970 (WO2017104876A1)
Application granted
Publication of KR101741141B1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L2021/0208

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Noise Elimination (AREA)

Abstract

A noise removal method according to an aspect of the present invention includes: receiving a mixed signal including a voice signal and a noise signal; obtaining the noise signal using an interval of the mixed signal in which the voice signal is absent; obtaining an a posteriori signal-to-noise ratio (SNR) using the noise signal and the mixed signal; estimating an a priori SNR of a current frame using the a posteriori SNR, the noise signal of the previous frame, and the a priori SNR of the previous frame; calculating a weight value using the estimated a priori SNR; calculating a filter value for each frequency using the calculated weight value; and multiplying the mixed signal by the calculated filter value to obtain an enhanced estimated speech signal.

Description

APPARATUS FOR SUPPRESSING NOISE AND METHOD THEREOF

The present invention relates to signal processing for speech enhancement, and more particularly to a signal processing method and apparatus that improve the intelligibility of speech by removing wind noise mixed into the speech.

As the spread of smartphones grows, a variety of speech recognition technologies are being used. Apple's Siri and Google's Google Now are some of the more popular smartphone services using voice recognition.

In a quiet environment, the recognition rate of such voice recognition services is high, and in an ordinary call the other party's voice can be heard well. However, when ambient noise or wind noise is mixed with the user's voice and input to the smartphone, the recognition rate of the service drops and the other party's voice cannot be made out clearly.

In the case of mixed wind sounds, the prior art attempted to reduce the wind noise by simply cutting out a specific band of the signal using a low pass filter (LPF) or a high pass filter (HPF).

Korean Patent Application No. 10-2005-0120682 relates to a method for automatically removing wind noise according to its level: the mixed signal is filtered by a low-pass filter, its level is measured, a control signal is generated according to the measured level, and the wind noise is removed through a high-pass filter.

However, such simple filtering cannot improve the speech recognition rate, because it attenuates the user's voice band along with the wind noise.

It is an object of the present invention to provide an apparatus and method that obtain a filter coefficient from an a priori signal-to-noise ratio and an a posteriori signal-to-noise ratio and use it to remove wind noise.

The objects of the present invention are not limited to the above-mentioned objects, and other objects not mentioned can be clearly understood by those skilled in the art from the following description.

According to one aspect of the present invention, there is provided a noise removal method comprising: receiving a mixed signal including a voice signal and a noise signal; obtaining the noise signal using an interval of the mixed signal in which the voice signal is absent; obtaining an a posteriori SNR using the noise signal and the mixed signal; estimating an a priori SNR of a current frame using the a posteriori SNR, the noise signal of the previous frame, and the a priori SNR of the previous frame; calculating a weight value using the estimated a priori SNR; calculating a filter value for each frequency using the calculated weight value; and multiplying the mixed signal by the calculated filter value to obtain an enhanced estimated speech signal.

According to another aspect of the present invention, there is provided a noise removal apparatus including at least one processor, the processor including: an input unit for receiving a mixed signal including a voice signal and a noise signal; a frequency signal converter for converting the mixed signal into a frequency-domain signal; an operation unit for obtaining the noise signal using an interval in which the voice signal is absent, calculating an a posteriori SNR using the noise signal and the mixed signal, estimating an a priori SNR of a current frame using the a posteriori SNR, the noise signal of the previous frame, and the a priori SNR of the previous frame, calculating a weight value using the estimated a priori SNR, and calculating a filter value for each frequency using the calculated weight value; a filter unit for multiplying the mixed signal by the calculated filter value to obtain an enhanced speech signal; and a time-domain signal converter for converting the enhanced speech signal into a time-domain signal.

According to the present invention, a speech enhancement technique that filters a signal mixed with wind noise, using a filter formed from the a priori signal-to-noise ratio and the a posteriori signal-to-noise ratio, can increase the voice recognition rate and improve the clarity of speech during a call.

FIG. 1 is a flowchart of a noise removal method according to an embodiment of the present invention.
FIG. 2 is a signal flow diagram of the noise removal method according to an embodiment of the present invention.
FIG. 3 is a structural diagram of a noise removal apparatus according to another embodiment of the present invention.
FIG. 4 is a structural diagram of a computer apparatus in which a noise removal method according to another embodiment of the present invention is implemented.

The advantages and features of the present invention, and the manner of achieving them, will become apparent from the embodiments described in detail below with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art; the invention is defined only by the scope of the claims. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. In this specification, the singular form includes the plural form unless otherwise specified. As used herein, the terms "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those stated.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 shows a flowchart of a noise removal method according to an embodiment of the present invention.

In order to remove noises for voice signal enhancement, a mixed signal is input first (S110).

Since the input mixed signal is a time domain signal, an FFT (Fast Fourier Transform) operation is performed to convert it into a frequency domain signal. The signal converted into the frequency domain signal through the FFT operation is composed of an amplitude signal and a phase signal. In the present invention, since the operation is performed using only the amplitude signal, the phase signal is transmitted to the output side without modification.
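As a minimal sketch of this step, the following Python splits one frame's spectrum into an amplitude part and a phase part. The helper names `dft` and `split_magnitude_phase` and the 8-sample test frame are assumptions made for illustration, and a naive DFT stands in for the FFT (it produces the same values, only more slowly).

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) discrete Fourier transform; an FFT yields the same values faster."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def split_magnitude_phase(spectrum):
    """Return (magnitudes, phases) for a list of complex frequency bins."""
    magnitudes = [abs(c) for c in spectrum]
    phases = [cmath.phase(c) for c in spectrum]
    return magnitudes, phases


# Hypothetical 8-sample frame: exactly one cycle of a cosine.
frame = [math.cos(2 * math.pi * t / 8) for t in range(8)]
magnitudes, phases = split_magnitude_phase(dft(frame))
```

Only `magnitudes` would be passed to the noise removal steps; `phases` is set aside unchanged for the inverse transform at the output.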

To obtain the a priori SNR, a noise signal, a mixed signal, and a posteriori SNR are required. Since only the mixed signal is received, the residual noise signal and the post-SNR are estimated from the mixed signal.

First, the noise signal is obtained from an interval of the mixed signal in which no speech is present. Human speech is not always present in the mixed signal, and it is assumed to be absent in the short interval immediately after the mixed signal input begins; the noise signal is therefore obtained on the assumption that only noise exists in this interval.
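The assumption that the leading frames contain only noise can be sketched as follows. This is an illustration under stated assumptions rather than the patent's procedure: the function name `estimate_noise_power`, the parameter `num_noise_frames`, and the choice to average squared magnitudes per frequency bin are hypothetical (the claims do refer to an average of the squared noise magnitude, but not to how many frames are used).

```python
def estimate_noise_power(magnitude_frames, num_noise_frames):
    """Average |Y(p, k)|^2 over the first num_noise_frames frames, per frequency bin."""
    if not 0 < num_noise_frames <= len(magnitude_frames):
        raise ValueError("num_noise_frames must be between 1 and the number of frames")
    leading = magnitude_frames[:num_noise_frames]
    num_bins = len(leading[0])
    return [
        sum(frame[k] ** 2 for frame in leading) / num_noise_frames
        for k in range(num_bins)
    ]


# Hypothetical magnitudes: two noise-only frames followed by one frame with speech.
frames = [[1.0, 2.0], [3.0, 2.0], [10.0, 9.0]]
noise_power = estimate_noise_power(frames, num_noise_frames=2)
```

The third frame is deliberately excluded, so its larger magnitudes do not leak into the noise estimate.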

After the noise signal is obtained, the a posteriori SNR can be computed from the noise signal and the mixed signal using Equation 1 (S120).

[Equation 1]

$$\mathrm{SNR}_{post}(p,k) = \frac{|Y(p,k)|}{|N(p,k)|}$$

Here $\mathrm{SNR}_{post}(p,k)$ is the a posteriori signal-to-noise ratio, and $Y(p,k)$ and $N(p,k)$ are the mixed signal and the noise signal at the p-th frame and the k-th frequency index, respectively. The noise signal uses the values assumed in the previous step.
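A short Python sketch of this ratio, following the reading in claims 2 and 7 that the a posteriori SNR is the magnitude of the mixed signal divided by the magnitude of the noise signal. The function name `a_posteriori_snr` and the `floor` guard against division by zero are assumptions added for illustration.

```python
def a_posteriori_snr(mixed_magnitudes, noise_magnitudes, floor=1e-12):
    """Per-bin ratio |Y(p, k)| / |N(p, k)|, with the denominator floored (assumed guard)."""
    return [y / max(n, floor) for y, n in zip(mixed_magnitudes, noise_magnitudes)]


# Hypothetical magnitudes for one frame with three frequency bins.
snr_post = a_posteriori_snr([4.0, 1.0, 0.5], [2.0, 1.0, 1.0])
```

A value above 1 indicates a bin where the mixed signal exceeds the noise estimate; a value at or below 1 indicates a bin dominated by noise.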

The a priori SNR is then calculated using the computed a posteriori SNR (S130).

[Equation 2]

In Equation 2, the estimated speech signal is the signal obtained by removing the noise signal from the mixed signal. Before the computation according to the present invention begins, the estimated speech signal is initialized to 0; once the speech signal of a frame has been estimated, it is used to obtain the a priori SNR of the next frame.

α is a preset proportional coefficient that, in estimating the speech signal, balances the influence of the previous frame's estimated speech signal and noise signal against the influence of the a posteriori SNR accumulated from the first frame to the previous frame.

α takes a value between 0 and 1. The closer α is to 1, the greater the influence of the previous frame's values; the closer α is to 0, the greater the influence of the value accumulated from the first frame to the previous frame.
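One common way to realize a recursive estimate of this kind is the decision-directed rule, sketched below in Python. This is a hedged illustration and not necessarily the patent's Equation 2: the function name `a_priori_snr`, the default `alpha=0.98`, and the `floor` guard are assumptions, and the second term uses only the current frame's a posteriori SNR rather than a value accumulated over earlier frames.

```python
def a_priori_snr(prev_speech_magnitudes, noise_power, snr_post, alpha=0.98, floor=1e-12):
    """Decision-directed estimate per frequency bin (assumed form):

    alpha * |S_hat(p-1, k)|^2 / mean(|N|^2) + (1 - alpha) * max(SNR_post(p, k) - 1, 0)
    """
    return [
        alpha * (s_prev ** 2) / max(n_pow, floor) + (1.0 - alpha) * max(post - 1.0, 0.0)
        for s_prev, n_pow, post in zip(prev_speech_magnitudes, noise_power, snr_post)
    ]


# First frame: the previous speech estimate is initialized to 0, as the text describes.
xi_first = a_priori_snr([0.0, 0.0], [1.0, 1.0], [3.0, 0.5], alpha=0.5)

# Later frame: a previous speech estimate of magnitude 2 over a noise power of 4.
xi_later = a_priori_snr([2.0], [4.0], [3.0], alpha=0.5)
```

With `alpha` close to 1 the first term dominates, so the estimate follows the previous frame's speech estimate; with `alpha` close to 0 it follows the a posteriori SNR term instead.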

Once the a priori SNR has been obtained, the weight value is calculated from it (S140) using Equation 3.

[Equation 3]

μ is a weight parameter. A large a priori SNR means that the speech signal is large, so the weight value must be large. Conversely, a small a priori SNR means that the speech signal is small compared with the noise signal, so the weight value must be small.

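Once the weight value and the a priori SNR have been obtained, the filter value H(p, k) used for noise removal can be computed from the two values using Equation 4 (S150).

[Equation 4]

$$H(p,k) = \frac{W(p,k)\,\mathrm{SNR}_{prio}(p,k)}{1 + W(p,k)\,\mathrm{SNR}_{prio}(p,k)}$$

Here $W(p,k)$ is the weight value and $\mathrm{SNR}_{prio}(p,k)$ is the a priori SNR at the p-th frame and the k-th frequency index.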
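The filter value can be sketched in Python following the wording of claim 10, which divides the product of the a priori SNR and the weight value by that product plus 1. The function name `filter_gain` is hypothetical, and the weights are passed in as plain numbers because the weight formula itself is not reproduced here; a weight of 1 is used in the example only as a placeholder.

```python
def filter_gain(snr_prio, weights):
    """Per-bin gain H = (W * xi) / (1 + W * xi); assumes W * xi is non-negative."""
    return [(w * xi) / (1.0 + w * xi) for xi, w in zip(snr_prio, weights)]


# Placeholder weights of 1: the gain then reduces to the Wiener form xi / (1 + xi).
gains = filter_gain([1.0, 3.0, 0.0], [1.0, 1.0, 1.0])
```

The gain stays between 0 and 1 for non-negative inputs, approaching 1 where the weighted a priori SNR is large and 0 where it is small.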

Since a filter value is obtained for each frequency index of the frame, the final speech signal with reduced noise is obtained by multiplying the mixed signal by the noise removal filter value (S160), as shown in Equation 5.

[Equation 5]

$$\hat{S}(p,k) = H(p,k)\,Y(p,k)$$

Y(p, k) is the mixed signal and, as described above, the estimated speech signal $\hat{S}(p,k)$ is used to determine the a priori SNR of the next frame.

FIG. 2 shows a flow chart of a signal until a mixed signal mixed with a noise signal is filtered to output a signal in which a noise signal is attenuated.

Finally, since the estimated speech signal is the amplitude signal of the speech, it is converted into a time-domain signal by an IFFT (Inverse Fast Fourier Transform) together with the phase signal, which has not been modified.
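The last two steps, scaling the amplitude by the filter value and recombining it with the untouched phase, can be sketched as follows. The helper names `apply_gain` and `to_time_domain` and the 4-bin example spectrum are assumptions for illustration, and a naive inverse DFT stands in for the IFFT.

```python
import cmath
import math


def apply_gain(magnitudes, gains):
    """Multiply each frequency bin's magnitude by its filter value."""
    return [h * y for h, y in zip(gains, magnitudes)]


def to_time_domain(magnitudes, phases):
    """Rebuild complex bins from magnitude and phase, then apply a naive inverse DFT."""
    n = len(magnitudes)
    spectrum = [cmath.rect(m, ph) for m, ph in zip(magnitudes, phases)]
    # Keep the real part: for a real input signal and conjugate-symmetric gains,
    # the imaginary part is only rounding error.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


# Hypothetical 4-bin frame whose time-domain signal is cos(pi * t / 2) = [1, 0, -1, 0].
mixed_magnitudes = [0.0, 2.0, 0.0, 2.0]
mixed_phases = [0.0, 0.0, 0.0, 0.0]
enhanced = to_time_domain(apply_gain(mixed_magnitudes, [0.5] * 4), mixed_phases)
```

A uniform gain of 0.5 simply halves the waveform here; in practice the gain differs per bin, so noisy bins are attenuated more than bins dominated by speech.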

When noise is removed by estimating the preceding signal-to-noise ratio, noise can be more effectively removed than by removing a noise with a simple filter such as a conventional LPF.

FIG. 3 is a structural diagram of a noise removal apparatus according to another embodiment of the present invention.

The input unit 310 receives a mixed signal in which a voice signal and a noise signal are mixed. The input unit may be a microphone or the like, or it may receive a file input such as an audio or video file and extract only the audio signal as the mixed signal.

Since the present invention processes a signal in the frequency domain, the frequency signal transforming unit 320 transforms the received signal into a frequency signal through a method such as FFT. Frequency signal conversion can be performed not only by FFT, but also by methods such as DFT (Discrete Fourier Transform), DCT (Discrete Cosine Transform), and Filterbank.

The operation unit 330 extracts a filter value for noise cancellation from the input signal.

The a posteriori SNR is first obtained from the received mixed signal and the noise signal, as shown in Equation (1). The voice signal and the noise signal cannot be told apart within the mixed signal; however, the voice signal is assumed to be absent from the initial input, so the a posteriori SNR is calculated by treating the signal in this interval as the noise signal.

The a priori SNR is calculated from the a posteriori SNR obtained in this way, the speech signal estimated in the previous frame, and the average of the noise signal obtained in the previous frame. The proportional coefficient controls how strongly the previous frame's estimate and the history of earlier frames, including the previous frame, each influence the a priori SNR.

Increasing the proportion of the previous frame's value makes the estimate sensitive to changes between frames, but the frequent changes may be unpleasant for the user. Increasing the proportion of the history value suppresses sudden changes and yields a natural-sounding speech signal, but the estimate cannot respond quickly to signals that change rapidly over time. An optimal value between the two can therefore be determined experimentally.

Once the a priori SNR has been obtained by Equation (2), the weight value can be obtained. The weight value is increased when the a priori SNR is large, since the speech signal is then expected to be large, and decreased when the a priori SNR is small. The weight value is obtained by Equation (3).

The filter value can be finally obtained by using the weight value and the preceding signal-to-noise ratio value.

The filter unit 340 multiplies the mixed signal by the filter value thus obtained to obtain a signal from which the noise has been removed.

Since the signal output by the filter unit 340 is a frequency-domain signal, it is converted into a time-domain signal by the time signal converter 350 and provided to the output unit, so that the user can hear the speech signal with the noise removed.

The time signal converter 350 may convert a frequency domain signal into a time domain signal using a method such as IFFT, Inverse DFT (IDFT), Inverse DCT (IDCT), or Inverse Filterbank.

Meanwhile, the noise removal method according to an embodiment of the present invention can be implemented in a computer system or recorded on a recording medium. As shown in FIG. 4, the computer system may include at least one processor 421, a memory 423, a user input device 426, a data communication bus 422, a user output device 427, and a storage 428. Each of these components performs data communication via the data communication bus 422.

The computer system may further include a network interface 429 coupled to the network. The processor 421 may be a central processing unit (CPU) or a semiconductor device that processes instructions stored in the memory 423 and / or the storage 428.

The memory 423 and the storage 428 may include various forms of volatile or non-volatile storage media. For example, the memory 423 may include a ROM 424 and a RAM 425.

Accordingly, the noise removal method according to an embodiment of the present invention can be implemented as a computer-executable method. When the method is executed on a computer device, computer-readable instructions perform the noise removal method according to the present invention.

Meanwhile, the noise removal method according to the present invention can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording media that store data readable by a computer system, for example a ROM (Read Only Memory), a RAM (Random Access Memory), a magnetic tape, a magnetic disk, a flash memory, or an optical data storage device. The computer-readable recording medium may also be distributed over computer systems connected through a computer network, so that the code is stored and executed in a distributed manner.

While the present invention has been described in detail with reference to the accompanying drawings, the invention is not limited to the above-described embodiments, and those skilled in the art will appreciate that various modifications are of course possible. Accordingly, the scope of protection of the present invention should not be limited to the above-described embodiments, but should be determined by the following claims.

310: input unit
320: frequency signal converter
330: operation unit
340: filter unit
350: time signal converter

Claims (10)

1. A noise removal method comprising:
receiving a mixed signal including a voice signal and a noise signal;
obtaining the noise signal using an interval of the mixed signal in which the voice signal is absent;
obtaining an a posteriori SNR using the noise signal and the mixed signal;
estimating an a priori SNR of a current frame using the a posteriori SNR, the noise signal of the previous frame, and the a priori SNR of the previous frame;
calculating a weight value by dividing the square root of the sum of the square of the a priori SNR of the current frame and the absolute value of the a priori SNR of the current frame by the absolute value of the a priori SNR of the current frame;
calculating a filter value for each frequency using the calculated weight value; and
multiplying the mixed signal by the calculated filter value to obtain an enhanced estimated speech signal.
2. The method of claim 1, wherein obtaining the a posteriori SNR comprises setting, as the a posteriori SNR, a value obtained by dividing the magnitude of the mixed signal by the magnitude of the noise signal.
3. The method of claim 1, wherein the a priori SNR is set from:
a value obtained by dividing the square of the magnitude of the estimated speech signal of the previous frame by the average of the squared magnitude of the noise signal; and
a value obtained by multiplying the larger of 0 and the a posteriori SNR minus 1 by the value obtained by subtracting the predetermined proportional coefficient from 1, added to a value obtained by adding all values from the first frame to the previous frame.
4. (Deleted)
5. The method of claim 1, wherein calculating the filter value comprises setting, as the filter value, a value obtained by dividing the product of the a priori SNR and the weight value by the sum of 1 and the product of the a priori SNR and the weight value.
6. A noise removal apparatus comprising one or more processors, the processor comprising:
an input unit for receiving a mixed signal including a voice signal and a noise signal;
a frequency signal converter for converting the mixed signal into a frequency-domain signal;
an operation unit for obtaining the noise signal using an interval in which the voice signal is absent, calculating an a posteriori SNR using the noise signal and the mixed signal, estimating an a priori SNR of a current frame using the a posteriori SNR, the noise signal of the previous frame, and the a priori SNR of the previous frame, calculating a weight value by dividing the square root of the sum of the square of the a priori SNR of the current frame and the absolute value of the a priori SNR of the current frame by the absolute value of the a priori SNR of the current frame, and calculating a filter value for each frequency using the calculated weight value;
a filter unit for multiplying the mixed signal by the calculated filter value to obtain an enhanced speech signal; and
a time-domain signal converter for converting the enhanced speech signal into a time-domain signal.
7. The apparatus of claim 6, wherein the operation unit sets, as the a posteriori SNR, a value obtained by dividing the magnitude of the mixed signal by the magnitude of the noise signal.
8. The apparatus of claim 6, wherein the operation unit sets the a priori SNR from:
a value obtained by dividing the square of the magnitude of the estimated speech signal of the previous frame by the average of the squared magnitude of the noise signal; and
a value obtained by multiplying the larger of 0 and the a posteriori SNR minus 1 by the value obtained by subtracting the predetermined proportional coefficient from 1, added to a value obtained by adding all values from the first frame to the previous frame.
9. (Deleted)
10. The apparatus of claim 6, wherein the operation unit sets, as the filter value, a value obtained by dividing the product of the a priori SNR and the weight value by the sum of 1 and the product of the a priori SNR and the weight value.
KR1020150181624A 2015-12-18 2015-12-18 Apparatus for suppressing noise and method thereof KR101741141B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020150181624A KR101741141B1 (en) 2015-12-18 2015-12-18 Apparatus for suppressing noise and method thereof
PCT/KR2015/013970 WO2017104876A1 (en) 2015-12-18 2015-12-18 Noise removal device and method therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150181624A KR101741141B1 (en) 2015-12-18 2015-12-18 Apparatus for suppressing noise and method thereof

Publications (1)

Publication Number Publication Date
KR101741141B1 true KR101741141B1 (en) 2017-05-29

Family

ID=59053261

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150181624A KR101741141B1 (en) 2015-12-18 2015-12-18 Apparatus for suppressing noise and method thereof

Country Status (2)

Country Link
KR (1) KR101741141B1 (en)
WO (1) WO2017104876A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115881155A (en) * 2022-12-02 2023-03-31 宁波硕正电子科技有限公司 Transient noise suppression method, device, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100922580B1 (en) * 2006-11-17 2009-10-21 한국전자통신연구원 Apparatus and method to reduce a noise for VoIP Service
US8352257B2 (en) * 2007-01-04 2013-01-08 Qnx Software Systems Limited Spectro-temporal varying approach for speech enhancement
KR20080075362A (en) * 2007-02-12 2008-08-18 인하대학교 산학협력단 A method for obtaining an estimated speech signal in noisy environments
KR101247652B1 (en) * 2011-08-30 2013-04-01 광주과학기술원 Apparatus and method for eliminating noise
CN103871421B (en) * 2014-03-21 2018-02-02 厦门莱亚特医疗器械有限公司 A kind of self-adaptation noise reduction method and system based on subband noise analysis

Also Published As

Publication number Publication date
WO2017104876A1 (en) 2017-06-22

Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant