- Research Article
- Open access
Robust Speech Recognition Using Factorial HMMs for Home Environments
EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 020593 (2007)
Abstract
We focus on the problem of speech recognition in the presence of nonstationary sudden noise, which is very likely to occur in home environments. As a model compensation method for this problem, we investigated the use of a factorial hidden Markov model (FHMM) architecture built from a clean-speech hidden Markov model (HMM) and a sudden-noise HMM. Whereas in conventional studies this architecture is defined only for the static features of the observation vector, we extended it to dynamic features. In addition, we performed home-environment adaptation of the FHMMs to the characteristics of a given house. A database recorded in home environments by a personal robot called PaPeRo was used to evaluate the proposed method. Isolated-word recognition experiments demonstrated the effectiveness of the proposed method under noisy conditions. Home-dependent word FHMMs (HD-FHMMs) reduced the word error rate by 20.5% compared with that of the clean-speech word HMMs.
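The abstract states only that the FHMM is built from a clean-speech HMM and a sudden-noise HMM. As an illustration, the sketch below assumes the standard factorial construction in which the two Markov chains evolve independently, so the pair can be flattened into a single HMM over product states (i, j) whose transition matrix is the Kronecker product of the two chains. It does not reproduce how the paper combines speech and noise densities into the joint emission p(o_t | i, j), nor its extension to dynamic features; the emission scores are placeholders, and every matrix, value, and function name here is hypothetical.

```python
import numpy as np


def factorial_transition(a_speech: np.ndarray, a_noise: np.ndarray) -> np.ndarray:
    """Transition matrix of the product chain of two independent Markov chains.

    Joint state (i, j) = (speech state i, noise state j) is flattened to index
    i * n_noise + j, which is the ordering np.kron produces.
    """
    return np.kron(a_speech, a_noise)


def forward_loglik(log_b: np.ndarray, a: np.ndarray, pi: np.ndarray) -> float:
    """Scaled forward algorithm; returns log p(o_1, ..., o_T).

    log_b[t, k] = log p(o_t | state k), a[j, k] = P(state k | state j),
    pi[k] = initial probability of state k.
    """
    alpha = pi.astype(float)
    loglik = 0.0
    for t in range(log_b.shape[0]):
        if t > 0:
            alpha = alpha @ a  # one transition step
        shift = log_b[t].max()  # subtract the max log-density to avoid underflow
        alpha = alpha * np.exp(log_b[t] - shift)
        scale = alpha.sum()
        loglik += shift + np.log(scale)
        alpha = alpha / scale  # renormalise so alpha stays a distribution
    return float(loglik)


# Hypothetical 3-state left-to-right speech HMM and 2-state noise HMM
# (state 0 = background, state 1 = sudden noise); values are illustrative only.
a_speech = np.array([[0.6, 0.4, 0.0],
                     [0.0, 0.7, 0.3],
                     [0.0, 0.0, 1.0]])
a_noise = np.array([[0.9, 0.1],
                    [0.5, 0.5]])
pi_speech = np.array([1.0, 0.0, 0.0])
pi_noise = np.array([0.8, 0.2])

a_joint = factorial_transition(a_speech, a_noise)  # 6 x 6
pi_joint = np.kron(pi_speech, pi_noise)

# Placeholder emission scores. In a real FHMM, log_b_joint[t, i*2 + j] would come
# from combining speech state i with noise state j; here the noise chain is
# deliberately ignored, so the joint model must reduce to the speech HMM alone.
rng = np.random.default_rng(0)
log_b_speech = rng.normal(size=(10, 3))
log_b_joint = np.repeat(log_b_speech, 2, axis=1)

ll_joint = forward_loglik(log_b_joint, a_joint, pi_joint)
ll_speech = forward_loglik(log_b_speech, a_speech, pi_speech)
print(ll_joint, ll_speech)  # equal up to floating-point error
```

Flattening the two chains lets ordinary HMM forward or Viterbi code score the factorial model, at the cost of a state space whose size is the product of the two chains' sizes.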
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Betkowska, A., Shinoda, K. & Furui, S. Robust Speech Recognition Using Factorial HMMs for Home Environments. EURASIP J. Adv. Signal Process. 2007, 020593 (2007). https://doi.org/10.1155/2007/20593
DOI: https://doi.org/10.1155/2007/20593