Abstract
Previously, Wang et al. [1] proposed a blind dereverberation method based on spectral subtraction using a multi-channel least mean squares (MCLMS) algorithm for distant-talking speech recognition. Preliminary experiments showed that this method is effective for isolated word recognition in a reverberant environment. However, the robustness of the dereverberation method and the factors affecting it were not investigated. In this paper, we analyze the factors that influence compensation parameter estimation for the spectral-subtraction-based dereverberation method, namely the number of channels (microphones), the length of reverberation to be suppressed, and the length of the utterance used for parameter estimation, and we evaluate the method on large vocabulary continuous speech recognition (LVCSR). We conducted speech recognition experiments on distorted speech signals simulated by convolving multi-channel impulse responses with clean speech. The proposed method combined with beamforming achieves a relative word error rate reduction of 19.2% over conventional cepstral mean normalization with beamforming for LVCSR. The experimental results also show that the proposed method is robust in a variety of reverberant environments for both isolated and continuous speech recognition and under various parameter estimation conditions.
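The abstract describes suppressing reverberation by spectral subtraction with compensation parameters estimated beforehand (in the paper, via the MCLMS algorithm). Below is a minimal sketch of the spectral-subtraction step only, assuming the per-frame compensation weights are already available; the function and parameter names are illustrative and not taken from the paper.

```python
import numpy as np

def dereverberate_spectral_subtraction(power_spec, late_reverb_weights, floor=0.05):
    """Suppress late reverberation by subtracting a weighted sum of the
    power spectra of preceding frames from the current frame.

    power_spec:          (T, F) power spectrogram of the reverberant signal
    late_reverb_weights: (D,) compensation parameters, assumed to be estimated
                         in advance (e.g. by an MCLMS-style procedure)
    floor:               spectral floor to keep the result non-negative
    """
    T, F = power_spec.shape
    D = len(late_reverb_weights)
    enhanced = np.copy(power_spec)
    for t in range(T):
        # Estimate the late-reverberation power from up to D preceding frames.
        late_reverb = np.zeros(F)
        for d in range(1, min(D, t) + 1):
            late_reverb += late_reverb_weights[d - 1] * power_spec[t - d]
        # Subtract the estimate and apply a spectral floor.
        enhanced[t] = np.maximum(power_spec[t] - late_reverb, floor * power_spec[t])
    return enhanced
```

The floor keeps the subtracted spectrum from going negative, a common safeguard in spectral subtraction; the choice of 0.05 here is an arbitrary illustrative value, not one reported in the paper.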
References
1. Wang, L., Kitaoka, N., Nakagawa, S.: Distant-talking speech recognition based on spectral subtraction by multi-channel LMS algorithm. IEICE Trans. Information and Systems E94-D(3), 659–667 (2011)
2. Furui, S.: Cepstral analysis technique for automatic speaker verification. IEEE Trans. Acoust. Speech Signal Processing 29(2), 254–272 (1981)
3. Raut, C., Nishimoto, T., Sagayama, S.: Adaptation for long convolutional distortion by maximum likelihood based state filtering approach. In: Proc. of ICASSP-2006, vol. 1, pp. 1133–1136 (2006)
4. Jin, Q., Schultz, T., Waibel, A.: Far-field speaker recognition. IEEE Trans. ASLP 15(7), 2023–2032 (2007)
5. Huang, Y., Benesty, J., Chen, J.: Optimal step size of the adaptive multi-channel LMS algorithm for blind SIMO identification. IEEE Signal Processing Letters 12(3), 173–175 (2005)
6. Huang, Y., Benesty, J., Chen, J.: Acoustic MIMO Signal Processing. Springer, Heidelberg (2006)
7. Nakamura, S., Hiyane, K., Asano, F., Nishiura, T., Yamada, T.: Acoustical sound database in real environments for sound scene understanding and hands-free speech recognition. In: Proc. of LREC 2000, pp. 965–968 (2000)
8. Itou, K., Yamamoto, M., Takeda, K., Takezawa, T., Matsuoka, T., Kobayashi, T., Shikano, K., Itahashi, S.: JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research. J. Acoust. Soc. Jpn (E) 20(3), 199–206 (1999)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Wang, L., Odani, K., Kai, A. (2011). Evaluation of Hands-Free Large Vocabulary Continuous Speech Recognition by Blind Dereverberation Based on Spectral Subtraction by Multi-channel LMS Algorithm. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science(), vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2