Wang, 2008 - Google Patents
Time-frequency masking for speech separation and its potential for hearing aid design (Wang, 2008)
- Document ID: 3846170451997686267
- Author: Wang D
- Publication year: 2008
- Publication venue: Trends in Amplification
Snippet
A new approach to the separation of speech from speech-in-noise mixtures is the use of time-frequency (TF) masking. Originating in the field of computational auditory scene analysis, TF masking performs separation in the time-frequency domain. This article introduces the TF …
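To make the snippet's idea concrete, the sketch below implements the ideal binary mask (IBM), the benchmark form of TF masking: decompose the signals into time-frequency units, retain the units where the target dominates the noise, and resynthesize. This is a minimal illustration assuming the oracle setting in which target and noise are available separately before mixing; the function name `ibm_separate`, the 0 dB local-SNR criterion, and the STFT parameters are illustrative assumptions, not details taken from the article.

```python
# Minimal sketch of ideal-binary-mask (IBM) separation.
# Assumptions (not from the cited article): oracle access to target and
# noise, a 0 dB local-SNR criterion, and 512-sample STFT frames.
import numpy as np
from scipy.signal import stft, istft

def ibm_separate(target, noise, fs=16000, lc_db=0.0, nperseg=512):
    """Separate target speech from target+noise using an ideal binary mask.

    `target` and `noise` are equal-length 1-D arrays sampled at `fs` Hz.
    """
    # Complex spectrograms of the premixed sources.
    _, _, T = stft(target, fs=fs, nperseg=nperseg)
    _, _, N = stft(noise, fs=fs, nperseg=nperseg)

    # Local SNR in each time-frequency unit, in dB (eps avoids log of zero).
    eps = 1e-12
    local_snr_db = 10.0 * np.log10((np.abs(T) ** 2 + eps) /
                                   (np.abs(N) ** 2 + eps))

    # Binary mask: keep a T-F unit only where the target exceeds the noise
    # by more than the local criterion (0 dB means target energy > noise).
    mask = (local_snr_db > lc_db).astype(float)

    # Apply the mask to the mixture spectrogram and resynthesize.
    _, _, M = stft(target + noise, fs=fs, nperseg=nperseg)
    _, separated = istft(mask * M, fs=fs, nperseg=nperseg)
    return separated, mask

# Toy usage: a 440 Hz tone stands in for speech, mixed with white noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t)
noise = 0.5 * rng.standard_normal(t.size)
separated, mask = ibm_separate(tone, noise)
```

In practice the mask must be estimated from the mixture alone; that estimation problem is what the classification- and deep-learning-based systems in the Similar Documents list below address.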
Concepts
- masking (title, abstract, description; 131 occurrences)
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
        - G10L15/08—Speech classification or search
      - G10L17/00—Speaker identification or verification
        - G10L17/04—Training, enrolment or model building
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/003—Changing voice quality, e.g. pitch or formants
          - G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
            - G10L21/013—Adapting to target pitch
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0202—Applications
          - G10L21/0208—Noise filtering
            - G10L21/0216—Noise filtering characterised by the method used for estimating noise
              - G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
Similar Documents
| Publication | Title |
| --- | --- |
| Wang | Time-frequency masking for speech separation and its potential for hearing aid design |
| Das et al. | Fundamentals, present and future perspectives of speech enhancement |
| Wang et al. | Supervised speech separation based on deep learning: An overview |
| Kim et al. | An algorithm that improves speech intelligibility in noise for normal-hearing listeners |
| Stern et al. | Hearing is believing: Biologically inspired methods for robust automatic speech recognition |
| Michelsanti et al. | Deep-learning-based audio-visual speech enhancement in presence of Lombard effect |
| Kollmeier et al. | Perception of speech and sound |
| Pertilä et al. | Distant speech separation using predicted time-frequency masks from spatial features |
| Zhao et al. | A deep learning based segregation algorithm to increase speech intelligibility for hearing-impaired listeners in reverberant-noisy conditions |
| Bramsløw et al. | Improving competing voices segregation for hearing impaired listeners using a low-latency deep neural network algorithm |
| Monaghan et al. | Auditory inspired machine learning techniques can improve speech intelligibility and quality for hearing-impaired listeners |
| Keshavarzi et al. | Use of a deep recurrent neural network to reduce wind noise: Effects on judged speech intelligibility and sound quality |
| Wang et al. | Speech enhancement for cochlear implant recipients |
| Zai et al. | Reconstruction of audio waveforms from spike trains of artificial cochlea models |
| Hansen et al. | A speech perturbation strategy based on "Lombard effect" for enhanced intelligibility for cochlear implant listeners |
| Priyanka et al. | Multi-channel speech enhancement using early and late fusion convolutional neural networks |
| Albornoz et al. | Feature extraction based on bio-inspired model for robust emotion recognition |
| Abel et al. | Novel two-stage audiovisual speech filtering in noisy environments |
| Venkatesan et al. | Binaural classification-based speech segregation and robust speaker recognition system |
| Patil et al. | Marathi speech intelligibility enhancement using I-AMS based neuro-fuzzy classifier approach for hearing aid users |
| Zouhir et al. | A bio-inspired feature extraction for robust speech recognition |
| Hummersone | A psychoacoustic engineering approach to machine sound source separation in reverberant environments |
| Li et al. | Improved environment-aware-based noise reduction system for cochlear implant users based on a knowledge transfer approach: Development and usability study |
| Gößling et al. | Perceptual evaluation of binaural MVDR-based algorithms to preserve the interaural coherence of diffuse noise fields |
| Maciejewski et al. | Building corpora for single-channel speech separation across multiple domains |