Wolf et al., 2013 - Google Patents

Channel selection using n-best hypothesis for multi-microphone ASR.

Wolf et al., 2013

Document ID: 12990565254119602760
Author: Wolf M; Nadeu C
Publication year: 2013
Publication venue: INTERSPEECH

External Links

Cited by

Snippet

If speech is captured by several arbitrarily-located microphones in a room, the degree of distortion by noise and reverberation may vary strongly from one channel to another. Channel selection for automatic speech recognition aims to rank the signals according to …

Continue reading at www.isca-archive.org (PDF) (other versions)

238000000034 method 0 abstract description 13

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters

Similar Documents

Publication	Publication Date	Title
Karpagavalli et al.	2016	A review on automatic speech recognition architecture and approaches
Watanabe et al.	2017	New Era for Robust Speech Recognition
Vincent et al.	2013	The second ‘CHiME’speech separation and recognition challenge: An overview of challenge systems and outcomes
Soltau et al.	2005	The IBM 2004 conversational telephony system for rich transcription
O’Shaughnessy	2008	Automatic speech recognition: History, methods and challenges
Kingsbury et al.	1997	Recognizing reverberant speech with RASTA-PLP
Hain et al.	2007	The AMI system for the transcription of speech in meetings
KR101237799B1 (en)	2013-02-28	Improving the robustness to environmental changes of a context dependent speech recognizer
Hori et al.	2015	The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition
Kumatani et al.	2011	Channel selection based on multichannel cross-correlation coefficients for distant speech recognition
JP6110945B2 (en)	2017-04-05	Method and system for selectively biased linear discriminant analysis in automatic speech recognition systems
WO2016063795A1 (en)	2016-04-28	Method for transforming a noisy speech signal to an enhanced speech signal
EP2192575A1 (en)	2010-06-02	Speech recognition based on a multilingual acoustic model
US7181395B1 (en)	2007-02-20	Methods and apparatus for automatic generation of multiple pronunciations from acoustic data
JP6985221B2 (en)	2021-12-22	Speech recognition device and speech recognition method
Araki et al.	2017	Online meeting recognition in noisy environments with time-frequency mask based MVDR beamforming
Yamamoto et al.	2005	Making a robot recognize three simultaneous sentences in real-time
Fujita et al.	2015	Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection
Lakshmi Sarada et al.	2009	Automatic transcription of continuous speech into syllable-like units for Indian languages
Stüker et al.	2006	Cross-system adaptation and combination for continuous speech recognition: the influence of phoneme set and acoustic front-end.
Wolf et al.	2013	Channel selection using n-best hypothesis for multi-microphone ASR.
Kumatani et al.	2011	Maximum kurtosis beamforming with a subspace filter for distant speech recognition
Wang et al.	2015	Noise Robust IOA/CAS Speech Separation and Recognition System For The Third'CHIME'Challenge
Menne	2018	Learning acoustic features from the raw waveform for automatic speech recognition
Alam et al.	2016	CRIM's Speech Recognition System for the 4th CHIME Challenge