He et al., 2020 - Google Patents
Mask-based blind source separation and MVDR beamforming in ASR
- Document ID
- 5638381240622721531
- Author
- He R
- Long Y
- Li Y
- Liang J
- Publication year
- 2020
- Publication venue
- International Journal of Speech Technology
Snippet
This paper presents a front-end enhancement system for automatic speech recognition (ASR) that addresses the cocktail party problem: recognizing a target speaker's speech when multiple speakers talk at the same time in noisy, real-world environments. Many conventional …
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
Similar Documents
Publication | Title
---|---
Wang et al. | Multi-microphone complex spectral mapping for utterance-wise and continuous speech separation
CN109830245B (en) | A method and system for multi-speaker speech separation based on beamforming
Tan et al. | Neural spectrospatial filtering
Chazan et al. | Multi-microphone speaker separation based on deep DOA estimation
Grais et al. | Raw multi-channel audio source separation using multi-resolution convolutional auto-encoders
Schädler et al. | Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition
CN110970053A (en) | Multichannel speaker-independent voice separation method based on deep clustering
JP6348427B2 (en) | Noise removal apparatus and noise removal program
Haridas et al. | A novel approach to improve the speech intelligibility using fractional delta-amplitude modulation spectrogram
Sun et al. | Joint constraint algorithm based on deep neural network with dual outputs for single-channel speech separation
Gul et al. | Integration of deep learning with expectation maximization for spatial cue-based speech separation in reverberant conditions
Saleem et al. | Low rank sparse decomposition model based speech enhancement using gammatone filterbank and Kullback–Leibler divergence
Nugraha et al. | Deep neural network based multichannel audio source separation
Li et al. | Speech enhancement algorithm based on sound source localization and scene matching for binaural digital hearing aids
He et al. | Mask-based blind source separation and MVDR beamforming in ASR
Li et al. | Speech enhancement based on binaural sound source localization and cosh measure wiener filtering
Sheeja et al. | Speech dereverberation and source separation using DNN-WPE and LWPR-PCA
Li et al. | MAF-Net: multidimensional attention fusion network for multichannel speech separation
Chen et al. | A multichannel learning-based approach for sound source separation in reverberant environments
Ali et al. | The identification and localization of speaker using fusion techniques and machine learning techniques
Hsu et al. | Array configuration-agnostic personalized speech enhancement using long-short-term spatial coherence
Venkatesan et al. | Deep recurrent neural networks based binaural speech segregation for the selection of closest target of interest
Ghalamiosgouei et al. | Robust Speaker Identification Based on Binaural Masks
Al-Ali et al. | Enhanced forensic speaker verification performance using the ICA-EBM algorithm under noisy and reverberant environments
Habib et al. | Auditory inspired methods for localization of multiple concurrent speakers