Subramanian et al., 2019 - Google Patents
Speech enhancement using end-to-end speech recognition objectives (Subramanian et al., 2019)
- Document ID
- 6244451574881179750
- Authors
- Subramanian A
- Wang X
- Baskar M
- Watanabe S
- Taniguchi T
- Tran D
- Fujita Y
- Publication year
- 2019
- Publication venue
- 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Snippet
Speech enhancement systems, which denoise and dereverberate distorted signals, are usually optimized based on signal reconstruction objectives including the maximum likelihood and minimum mean square error. However, emergent end-to-end neural methods …
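To make the contrast in the snippet concrete, the sketch below shows, side by side, a conventional signal-reconstruction (MSE) loss on an enhanced spectrogram and an end-to-end ASR objective (here a CTC loss) back-propagated through the same enhancement front-end. It is only an illustration under assumed components, not the authors' system: `MaskingEnhancer`, `asr_head`, and all shapes and hyperparameters are hypothetical stand-ins.

```python
# Minimal sketch (PyTorch), NOT the authors' implementation: contrasts a
# signal-reconstruction (MSE) objective with an end-to-end ASR (CTC) objective
# for training a speech-enhancement front-end. All modules/shapes are made up.
import torch
import torch.nn as nn

class MaskingEnhancer(nn.Module):
    """Toy time-frequency masking enhancer (hypothetical architecture)."""
    def __init__(self, n_freq=257, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(n_freq, hidden, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, noisy_mag):              # (B, T, F) magnitude spectrogram
        h, _ = self.rnn(noisy_mag)
        return self.mask(h) * noisy_mag        # masked (enhanced) magnitude

B, T, F, vocab = 2, 100, 257, 30
enhancer = MaskingEnhancer(n_freq=F)
asr_head = nn.Linear(F, vocab)                 # stand-in for a full E2E ASR model

noisy = torch.rand(B, T, F)
clean = torch.rand(B, T, F)                    # parallel clean target (MSE only)
enhanced = enhancer(noisy)

# 1) Signal-reconstruction objective: needs parallel clean/noisy data.
mse_loss = nn.functional.mse_loss(enhanced, clean)

# 2) End-to-end ASR objective: CTC loss on transcripts; gradients flow back
#    into the enhancer, so no parallel clean signal is needed for this term.
log_probs = asr_head(enhanced).log_softmax(-1).transpose(0, 1)   # (T, B, vocab)
targets = torch.randint(1, vocab, (B, 20))                       # dummy labels
ctc_loss = nn.CTCLoss(blank=0)(
    log_probs, targets,
    torch.full((B,), T, dtype=torch.long),     # input lengths
    torch.full((B,), 20, dtype=torch.long),    # target lengths
)

# Either objective (or a weighted mix) can drive the enhancer's parameters;
# the cited work focuses on the ASR-side objective.
(mse_loss + ctc_loss).backward()
```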
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
                - G10L2021/02166—Microphone arrays; Beamforming
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L21/0232—Processing in the frequency domain
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/78—Detection of presence or absence of voice signals
          - G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
        - G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10K—SOUND-PRODUCING DEVICES; ACOUSTICS NOT OTHERWISE PROVIDED FOR
      - G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
        - G10K11/16—Methods or devices for protecting against, or damping of, acoustic waves, e.g. sound
          - G10K11/175—Methods or devices for protecting against, or damping of, acoustic waves, e.g. sound using interference effects; Masking sound
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10K—SOUND-PRODUCING DEVICES; ACOUSTICS NOT OTHERWISE PROVIDED FOR
      - G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
        - G10K11/18—Methods or devices for transmitting, conducting, or directing sound
Similar Documents
Publication | Title
---|---
Subramanian et al. | Speech enhancement using end-to-end speech recognition objectives
Wang et al. | Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR
Chakrabarty et al. | Time–frequency masking based online multi-channel speech enhancement with convolutional recurrent neural networks
Parchami et al. | Recent developments in speech enhancement in the short-time Fourier transform domain
Li et al. | Multichannel speech enhancement based on time-frequency masking using subband long short-term memory
Yoshioka et al. | Making machines understand us in reverberant rooms: Robustness against reverberation for automatic speech recognition
Zhao et al. | A two-stage algorithm for noisy and reverberant speech enhancement
Azarang et al. | A review of multi-objective deep learning speech denoising methods
Xiao et al. | The NTU-ADSC systems for reverberation challenge 2014
Mohammadiha et al. | Speech dereverberation using non-negative convolutive transfer function and spectro-temporal modeling
Subramanian et al. | An investigation of end-to-end multichannel speech recognition for reverberant and mismatch conditions
DEREVERBERATION et al. | REVERB Workshop 2014
Habets et al. | Dereverberation
Martín-Doñas et al. | Dual-channel DNN-based speech enhancement for smartphones
Song et al. | An integrated multi-channel approach for joint noise reduction and dereverberation
Kim et al. | Factorized MVDR deep beamforming for multi-channel speech enhancement
O'Shaughnessy | Speech enhancement—A review of modern methods
Huang et al. | Dereverberation
Kothapally et al. | Monaural speech dereverberation using deformable convolutional networks
Rahmani et al. | An iterative noise cross-PSD estimation for two-microphone speech enhancement
Li et al. | Robust speech dereverberation based on WPE and deep learning
Font | Multi-microphone signal processing for automatic speech recognition in meeting rooms
Cui et al. | Correntropy-based multi-objective multi-channel speech enhancement
Krini et al. | Speech enhancement with partial signal reconstruction based on deep recurrent neural networks and pitch-specific codebooks
Chetupalli et al. | Clean speech AE-DNN PSD constraint for MCLP based reverberant speech enhancement