Abstract
This communication presents a new method for automatic speech recognition (ASR) in reverberant environments. Our approach consists of selecting the best acoustic model from a library of models trained on artificially reverberated speech databases covering various reverberant conditions. Given a speech utterance recorded in a reverberant room, a Maximum Likelihood estimate of the fullband room reverberation time is computed using a statistical model of the short-term log-energy sequences of anechoic speech. The estimated reverberation time is then used to select the best acoustic model, i.e., the model trained on the speech database whose reverberation time most closely matches the estimate. This model is then used to recognize the reverberated utterance. The proposed model selection approach is shown to significantly improve recognition accuracy on a connected digit task in both simulated and real reverberant environments, outperforming standard channel normalization techniques.
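The two-stage idea (estimate the reverberation time blindly, then pick the acoustic model trained closest to it) can be illustrated with a minimal sketch. It rests on simplifying assumptions that do not come from the article. Reverberation is modeled as convolving frame energies with an exponential decay whose energy falls by 60 dB after T60 seconds. The anechoic log-energies are treated as i.i.d. Gaussian, which stands in for the paper's more elaborate statistical model of log-energy sequences. The 10 ms frame shift, the grid search, and the model names (`am_t60_*`) are all hypothetical.

```python
import math
import random

FRAME_STEP = 0.01  # assumed frame shift in seconds (hypothetical)


def decay_factor(t60, step=FRAME_STEP):
    """Per-frame energy decay: energy drops by 60 dB (factor 1e-6) after t60 s."""
    return 10.0 ** (-6.0 * step / t60)


def reverberate(log_e_anechoic, t60):
    """Toy reverberation model (assumption): convolve frame energies with a**k,
    i.e. the recursion e_r[n] = e_a[n] + a * e_r[n-1]. Returns log-energies."""
    a = decay_factor(t60)
    out, prev = [], 0.0
    for x in log_e_anechoic:
        prev = math.exp(x) + a * prev
        out.append(math.log(prev))
    return out


def log_likelihood(log_e_rev, t60, mu, sigma):
    """log p(reverberant log-energies | t60) when anechoic log-energies are
    i.i.d. N(mu, sigma^2), a stand-in for the paper's anechoic speech model."""
    a = decay_factor(t60)
    total, prev_e = 0.0, 0.0
    for x_r in log_e_rev:
        e_r = math.exp(x_r)
        e_a = e_r - a * prev_e  # invert the decay recursion
        if e_a <= 0.0:
            return -math.inf  # this t60 cannot explain the observed decay
        x_a = math.log(e_a)
        # Gaussian log-density of the recovered anechoic log-energy ...
        total += -0.5 * ((x_a - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        # ... plus the log-Jacobian of the triangular map x_r -> x_a (d x_a/d x_r = e_r/e_a)
        total += x_r - x_a
        prev_e = e_r
    return total


def estimate_t60(log_e_rev, mu, sigma, grid):
    """Maximum Likelihood estimate of t60 by exhaustive search over a grid."""
    return max(grid, key=lambda t: log_likelihood(log_e_rev, t, mu, sigma))


def select_model(t60_hat, library):
    """Return the model whose training reverberation time is closest to t60_hat."""
    return library[min(library, key=lambda t: abs(t - t60_hat))]


def demo(seed=0):
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    clean = [rng.gauss(mu, sigma) for _ in range(2000)]
    observed = reverberate(clean, t60=0.6)
    grid = [round(0.1 + 0.05 * k, 2) for k in range(29)]  # 0.10 .. 1.50 s
    library = {t: f"am_t60_{t}" for t in (0.2, 0.4, 0.6, 0.8, 1.0)}  # hypothetical
    t60_hat = estimate_t60(observed, mu, sigma, grid)
    return t60_hat, select_model(t60_hat, library), observed, library


if __name__ == "__main__":
    t60_hat, model, _, _ = demo()
    print(f"estimated T60 = {t60_hat:.2f} s, selected model = {model}")
```

Candidate reverberation times longer than the true one tend to produce negative "anechoic" energies when the recursion is inverted, so they are ruled out outright. Shorter ones are penalized through the likelihood. Keeping the estimation grid finer than the model library separates the two steps the abstract describes: estimation first, then nearest-model selection.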
About this article
Cite this article
Couvreur, L., Couvreur, C. Blind Model Selection for Automatic Speech Recognition in Reverberant Environments. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 36, 189–203 (2004). https://doi.org/10.1023/B:VLSI.0000015096.78139.82