Abstract
Automatic Speech Recognition (ASR) is the computer-driven transcription of spoken language into human-readable text. This paper develops an acoustic model for a medium-vocabulary, context-independent, isolated-word Malayalam speech recognizer based on the Hidden Markov Model (HMM). The emission probabilities of the syllable HMMs are estimated with Gaussian Mixture Models (GMMs), and Mel Frequency Cepstral Coefficients (MFCCs) are used as the features extracted from the input speech. The GMM mixture weights are generated from a Dirichlet distribution. The resulting GMMs are checked against several information criteria: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the corrected AIC (AICc), the Kullback Information Criterion (KIC), the corrected KIC (KICc), and the approximated KIC (AKICc). On the medium-vocabulary, speaker-dependent, isolated-word Malayalam speech corpus, a single Gaussian gives an accuracy of 90.91% and a Word Error Rate (WER) of 11.9%. Word accuracy and WER are also calculated from experiments conducted with multivariate Gaussians. With five Gaussian mixture components, the system attains a better word accuracy of 95.24% with a WER of 4.76%, and this choice is verified using the information criteria.
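As a rough illustration of the two ideas named in the abstract, the sketch below draws GMM mixture weights from a Dirichlet distribution and scores a model with AIC, BIC, and AICc. It is not the authors' implementation: the function names, the symmetric concentration parameter `alpha`, the one-dimensional data, and the parameter count `3K - 1` (for a 1-D mixture with K components) are all assumptions made for the example, and the KIC variants are omitted.

```python
import numpy as np


def sample_mixture_weights(n_components, alpha=1.0, seed=0):
    """Draw one weight vector from a symmetric Dirichlet(alpha, ..., alpha).

    The draw is non-negative and sums to 1, so it is a valid set of
    mixture weights. `alpha` and `seed` are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.full(n_components, alpha))


def gmm_log_likelihood(x, weights, means, variances):
    """Total log-likelihood of 1-D samples under a Gaussian mixture."""
    x = np.asarray(x, dtype=float)[:, None]            # shape (n, 1)
    log_pdf = -0.5 * (np.log(2 * np.pi * variances)
                      + (x - means) ** 2 / variances)  # shape (n, K)
    weighted = np.log(weights) + log_pdf
    # Log-sum-exp over components for numerical stability.
    m = weighted.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(weighted - m).sum(axis=1))))


def information_criteria(log_lik, n_params, n_samples):
    """AIC, BIC, and small-sample corrected AIC (lower is better)."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * np.log(n_samples)
    aicc = aic + 2.0 * n_params * (n_params + 1) / (n_samples - n_params - 1)
    return {"AIC": aic, "BIC": bic, "AICc": aicc}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=200)  # synthetic stand-in for one feature dimension
    for k in range(1, 6):
        w = sample_mixture_weights(k)
        means = np.quantile(x, (np.arange(k) + 0.5) / k)  # crude initial means
        variances = np.full(k, x.var())
        ll = gmm_log_likelihood(x, w, means, variances)
        # A 1-D GMM has (k - 1) free weights + k means + k variances.
        print(k, information_criteria(ll, 3 * k - 1, len(x)))
```

In this sketch the models are only initialised, not trained; in practice the criteria would be evaluated on the log-likelihood of a fitted model, and the number of components with the lowest criterion value would be preferred.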
Kerala State Council for Science, Technology and Environment (KSCSTE).
L. Krishna Ramachandran—Research Scholar
S. Elizabeth—Professor
Acknowledgment
This research is supported by the Kerala State Council for Science, Technology and Environment (KSCSTE). I thank KSCSTE for funding the project under the Back-to-lab scheme.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Krishna Ramachandran, L., Elizabeth, S. (2018). Generation of GMM Weights by Dirichlet Distribution and Model Selection Using Information Criterion for Malayalam Speech Recognition. In: Tiwary, U. (eds) Intelligent Human Computer Interaction. IHCI 2018. Lecture Notes in Computer Science, vol 11278. Springer, Cham. https://doi.org/10.1007/978-3-030-04021-5_11
DOI: https://doi.org/10.1007/978-3-030-04021-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04020-8
Online ISBN: 978-3-030-04021-5
eBook Packages: Computer Science, Computer Science (R0)