Research Article
Open access
Published: 01 December 2007

Robust Speech Recognition Using Factorial HMMs for Home Environments

Agnieszka Betkowska¹,
Koichi Shinoda¹ &
Sadaoki Furui¹

EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 020593 (2007)

Abstract

We focus on the problem of speech recognition in the presence of nonstationary sudden noise, which is very likely to occur in home environments. As a model compensation method for this problem, we investigated the use of a factorial hidden Markov model (FHMM) architecture developed from a clean-speech hidden Markov model (HMM) and a sudden-noise HMM. While conventional studies define this architecture only for the static features of the observation vector, we extended it to dynamic features. In addition, we performed home-environment adaptation of FHMMs to the characteristics of a given house. A database recorded in home environments by a personal robot called PaPeRo was used to evaluate the proposed method. Isolated word recognition experiments demonstrated the effectiveness of the proposed method under noisy conditions. Home-dependent word FHMMs (HD-FHMMs) reduced the word error rate by 20.5% compared to that of the clean-speech word HMMs.

Author information

Authors and Affiliations

Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Tokyo, 152-8552, Japan
Agnieszka Betkowska, Koichi Shinoda & Sadaoki Furui

Corresponding author

Correspondence to Agnieszka Betkowska.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Betkowska, A., Shinoda, K. & Furui, S. Robust Speech Recognition Using Factorial HMMs for Home Environments. EURASIP J. Adv. Signal Process. 2007, 020593 (2007). https://doi.org/10.1155/2007/20593

Received: 01 February 2006
Revised: 19 August 2006
Accepted: 17 December 2006
Published: 01 December 2007
DOI: https://doi.org/10.1155/2007/20593