Blind Model Selection for Automatic Speech Recognition in Reverberant Environments

Laurent Couvreur¹ & Christophe Couvreur²

Abstract

This communication presents a new method for automatic speech recognition in reverberant environments. Our approach consists of selecting the best acoustic model from a library of models trained on artificially reverberated speech databases corresponding to various reverberant conditions. Given a speech utterance recorded in a reverberant room, a Maximum Likelihood estimate of the fullband room reverberation time is computed using a statistical model for short-term log-energy sequences of anechoic speech. The estimated reverberation time is then used to select the best acoustic model, i.e., the model trained on the speech database whose reverberation time most closely matches the estimate, and this model is used to recognize the reverberated utterance. The proposed model selection approach is shown to significantly improve recognition accuracy on a connected digit task in both simulated and real reverberant environments, outperforming standard channel normalization techniques.
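The abstract describes two steps: a Maximum Likelihood estimate of the reverberation time (T60) from the observed utterance, followed by selection of the library model whose training condition is nearest to that estimate. The Python sketch below illustrates both steps under simplifying assumptions that are not taken from the paper. Reverberation is modeled in the frame-energy domain as a first-order exponential decay, y[n] = x[n] + a·y[n-1] with a = 10^(-6Δ/T60) for a frame hop of Δ seconds (energy falls by 60 dB in T60 seconds), and the anechoic frame energies x[n] are treated as i.i.d. exponential variables, a toy stand-in for the paper's statistical model of anechoic log-energy sequences. Under this toy model the ML estimate has a closed form: the map from x to y has unit Jacobian, and the profile likelihood grows as the sum of the x[n] shrinks, so the estimate of a is the largest value that keeps every x[n] non-negative, namely the smallest frame-to-frame energy ratio. The function names, the simulator, the 10 ms hop, and the model labels in the library are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def estimate_t60(energies: Sequence[float], hop_s: float) -> float:
    """Toy ML estimate of the reverberation time T60 in seconds.

    Assumed model (illustrative, not the paper's): y[n] = x[n] + a * y[n-1]
    with y[-1] = 0, x[n] i.i.d. exponential anechoic frame energies, and
    a = 10 ** (-6 * hop_s / T60). The likelihood increases with a as long
    as every x[n] = y[n] - a * y[n-1] stays non-negative, so the ML
    estimate is a = min over n of y[n] / y[n-1].
    """
    if len(energies) < 2:
        raise ValueError("need at least two frames")
    if any(e <= 0.0 for e in energies):
        raise ValueError("frame energies must be positive")
    a_hat = min(cur / prev for prev, cur in zip(energies, energies[1:]))
    if a_hat >= 1.0:
        raise ValueError("no decay observed; T60 is unbounded under this model")
    # Invert a = 10 ** (-6 * hop_s / T60) for T60.
    return -6.0 * hop_s / math.log10(a_hat)


def select_model(t60_est: float, library: Dict[float, str]) -> str:
    """Pick the acoustic model whose training T60 is closest to the estimate."""
    nearest_t60 = min(library, key=lambda t60: abs(t60 - t60_est))
    return library[nearest_t60]


def simulate_reverberant_energies(t60: float, hop_s: float, n_frames: int,
                                  seed: int = 0) -> List[float]:
    """Generate frame energies from the toy model above (for testing only)."""
    rng = random.Random(seed)
    a = 10.0 ** (-6.0 * hop_s / t60)
    energies: List[float] = []
    y_prev = 0.0
    for _ in range(n_frames):
        # Fresh anechoic energy plus the decayed energy of the previous frame.
        y_prev = rng.expovariate(1.0) + a * y_prev
        energies.append(y_prev)
    return energies


if __name__ == "__main__":
    hop = 0.010  # 10 ms frame hop (assumed)
    # Hypothetical library: training T60 in seconds -> acoustic model label.
    library = {0.0: "am_anechoic", 0.3: "am_t60_0.3",
               0.5: "am_t60_0.5", 0.8: "am_t60_0.8"}
    y = simulate_reverberant_energies(t60=0.5, hop_s=hop, n_frames=2000, seed=1)
    t60_hat = estimate_t60(y, hop)
    print(f"estimated T60 = {t60_hat:.3f} s -> {select_model(t60_hat, library)}")
```

Because every observed ratio equals a + x[n]/y[n-1] with x[n] > 0, this toy estimator can only overestimate T60, and its accuracy depends on at least one frame carrying almost no fresh energy. Real speech, background noise, and the paper's actual log-energy model would all call for a more robust estimator; the nearest-neighbour selection step, by contrast, follows directly from the description in the abstract.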

Author information

Authors and Affiliations

Multitel—TCTS, Faculté Polytechnique de Mons, 1 Avenue Copernic, B-7000, Mons, Belgium
Laurent Couvreur
Speech & Language Technology Division, Scansoft, Inc., 32 Guldensporenpark, B-9820, Merelbeke, Belgium
Christophe Couvreur

About this article

Cite this article

Couvreur, L., Couvreur, C. Blind Model Selection for Automatic Speech Recognition in Reverberant Environments. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 36, 189–203 (2004). https://doi.org/10.1023/B:VLSI.0000015096.78139.82

Published: 01 February 2004
Issue Date: February 2004
DOI: https://doi.org/10.1023/B:VLSI.0000015096.78139.82
