Subramanian et al., 2019 - Google Patents
Speech enhancement using end-to-end speech recognition objectives (Subramanian et al., 2019)
- Document ID
- 6244451574881179750
- Authors
- Subramanian A
- Wang X
- Baskar M
- Watanabe S
- Taniguchi T
- Tran D
- Fujita Y
- Publication year
- 2019
- Publication venue
- 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Snippet
Speech enhancement systems, which denoise and dereverberate distorted signals, are usually optimized based on signal reconstruction objectives including the maximum likelihood and minimum mean square error. However, emergent end-to-end neural methods …
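To make the contrast in the snippet concrete, the sketch below shows, side by side, a conventional signal-reconstruction (MSE) loss on an enhanced spectrogram and an end-to-end ASR objective (here a CTC loss) back-propagated through the same enhancement front-end. It is only an illustration under assumed components, not the authors' system: `MaskingEnhancer`, `asr_head`, and all shapes and hyperparameters are hypothetical stand-ins.

```python
# Minimal sketch (PyTorch), NOT the authors' implementation: contrasts a
# signal-reconstruction (MSE) objective with an end-to-end ASR (CTC) objective
# for training a speech-enhancement front-end. All modules/shapes are made up.
import torch
import torch.nn as nn

class MaskingEnhancer(nn.Module):
    """Toy time-frequency masking enhancer (hypothetical architecture)."""
    def __init__(self, n_freq=257, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(n_freq, hidden, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, noisy_mag):              # (B, T, F) magnitude spectrogram
        h, _ = self.rnn(noisy_mag)
        return self.mask(h) * noisy_mag        # masked (enhanced) magnitude

B, T, F, vocab = 2, 100, 257, 30
enhancer = MaskingEnhancer(n_freq=F)
asr_head = nn.Linear(F, vocab)                 # stand-in for a full E2E ASR model

noisy = torch.rand(B, T, F)
clean = torch.rand(B, T, F)                    # parallel clean target (MSE only)
enhanced = enhancer(noisy)

# 1) Signal-reconstruction objective: needs parallel clean/noisy data.
mse_loss = nn.functional.mse_loss(enhanced, clean)

# 2) End-to-end ASR objective: CTC loss on transcripts; gradients flow back
#    into the enhancer, so no parallel clean signal is needed for this term.
log_probs = asr_head(enhanced).log_softmax(-1).transpose(0, 1)   # (T, B, vocab)
targets = torch.randint(1, vocab, (B, 20))                       # dummy labels
ctc_loss = nn.CTCLoss(blank=0)(
    log_probs, targets,
    torch.full((B,), T, dtype=torch.long),     # input lengths
    torch.full((B,), 20, dtype=torch.long),    # target lengths
)

# Either objective (or a weighted mix) can drive the enhancer's parameters;
# the cited work focuses on the ASR-side objective.
(mse_loss + ctc_loss).backward()
```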
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
                - G10L2021/02166—Microphone arrays; Beamforming
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L21/0232—Processing in the frequency domain
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/78—Detection of presence or absence of voice signals
          - G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
        - G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10K—SOUND-PRODUCING DEVICES; ACOUSTICS NOT OTHERWISE PROVIDED FOR
      - G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
        - G10K11/16—Methods or devices for protecting against, or damping of, acoustic waves, e.g. sound
          - G10K11/175—Methods or devices for protecting against, or damping of, acoustic waves, e.g. sound using interference effects; Masking sound
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10K—SOUND-PRODUCING DEVICES; ACOUSTICS NOT OTHERWISE PROVIDED FOR
      - G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
        - G10K11/18—Methods or devices for transmitting, conducting, or directing sound
Similar Documents
Publication | Title
---|---
Subramanian et al. | Speech enhancement using end-to-end speech recognition objectives
Wang et al. | Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR
Chakrabarty et al. | Time–frequency masking based online multi-channel speech enhancement with convolutional recurrent neural networks
Parchami et al. | Recent developments in speech enhancement in the short-time Fourier transform domain
Li et al. | Multichannel speech enhancement based on time-frequency masking using subband long short-term memory
Yoshioka et al. | Making machines understand us in reverberant rooms: Robustness against reverberation for automatic speech recognition
Zhao et al. | A two-stage algorithm for noisy and reverberant speech enhancement
Azarang et al. | A review of multi-objective deep learning speech denoising methods
Xiao et al. | The NTU-ADSC systems for reverberation challenge 2014
Mohammadiha et al. | Speech dereverberation using non-negative convolutive transfer function and spectro-temporal modeling
Subramanian et al. | An investigation of end-to-end multichannel speech recognition for reverberant and mismatch conditions
DEREVERBERATION et al. | REVERB Workshop 2014
Habets et al. | Dereverberation
Martín-Doñas et al. | Dual-channel DNN-based speech enhancement for smartphones
Song et al. | An integrated multi-channel approach for joint noise reduction and dereverberation
Kim et al. | Factorized MVDR deep beamforming for multi-channel speech enhancement
O'Shaughnessy | Speech enhancement—A review of modern methods
Huang et al. | Dereverberation
Kothapally et al. | Monaural speech dereverberation using deformable convolutional networks
Rahmani et al. | An iterative noise cross-PSD estimation for two-microphone speech enhancement
Li et al. | Robust speech dereverberation based on WPE and deep learning
Font | Multi-microphone signal processing for automatic speech recognition in meeting rooms
Cui et al. | Correntropy-based multi-objective multi-channel speech enhancement
Krini et al. | Speech enhancement with partial signal reconstruction based on deep recurrent neural networks and pitch-specific codebooks
Chetupalli et al. | Clean speech AE-DNN PSD constraint for MCLP based reverberant speech enhancement