Abstract
In automatic speech recognition, the phone has been the dominant sub-word unit for more than a decade. Context-dependent phone (triphone) modeling accounts for contextual variation between adjacent phones, and state tying handles triphones that are not seen during training. Recently, the syllable has been gaining momentum as an alternative sub-word unit. Being a larger unit than a phone, the syllable absorbs the severe contextual variation between the phones within it. It is therefore more stable than a phone and models pronunciation variability in a systematic way. The Tamil language has challenging features such as agglutination and morpho-phonology. This paper addresses these issues by using the syllable as a sub-word unit in the acoustic model. Initially, small-vocabulary context-independent word models and medium-vocabulary context-dependent phone models are developed. Subsequently, an algorithm based on the prosodic syllable is proposed and two experiments are conducted. In the first experiment, syllable-based context-independent models are trained and tested. Despite the large number of syllables, this system performs reasonably well compared to the context-independent word models in terms of word error rate and out-of-vocabulary words. In the second experiment, syllable information is integrated into conventional triphone modeling: cross-syllable triphones are replaced with monophones, which reduces the number of context-dependent phone models by 22.76% in untied units. In spite of this reduction in the number of models, the accuracy of the proposed system is comparable to that of the baseline triphone system.
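The second experiment can be illustrated with a minimal sketch. It assumes one plausible reading of "cross-syllable triphones are replaced with monophones": a phone keeps its triphone label only when both of its neighbours lie in the same syllable, and otherwise falls back to a monophone. The function name `syllable_aware_units`, the `left-phone+right` label notation, and the toy phone symbols are illustrative assumptions and are not taken from the paper.

```python
from typing import List


def syllable_aware_units(syllables: List[List[str]]) -> List[str]:
    """Map a syllabified phone sequence to acoustic model unit names.

    Assumed rule (not confirmed by the source): a phone is modeled as a
    triphone "left-phone+right" only if both neighbours belong to the same
    syllable; a phone at a syllable edge is modeled as a monophone.
    """
    units: List[str] = []
    for syllable in syllables:
        last = len(syllable) - 1
        for i, phone in enumerate(syllable):
            if 0 < i < last:
                # Both contexts are syllable-internal: keep the triphone.
                units.append(f"{syllable[i - 1]}-{phone}+{syllable[i + 1]}")
            else:
                # The context would cross a syllable boundary: use a monophone.
                units.append(phone)
    return units


# Hypothetical two-syllable word with toy phone symbols.
word = [["v", "a", "N"], ["k", "a", "m"]]
print(syllable_aware_units(word))
```

Under this reading, only syllable-internal phones contribute context-dependent models, which is one way the inventory of untied triphones could shrink relative to a conventional triphone system.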
Abbreviations
- ANN: Artificial Neural Networks
- ASR: Automatic Speech Recognition
- CD: Context Dependent
- CI: Context Independent
- CIIL: Central Institute of Indian Languages, Mysore
- CMU: Carnegie Mellon University
- HMM: Hidden Markov Model
- LVCSR: Large Vocabulary Continuous Speech Recognition
- SVM: Support Vector Machine
- WER: Word Error Rate
Cite this article
Thangarajan, R., Natarajan, A.M. & Selvam, M. Syllable modeling in continuous speech recognition for Tamil language. Int J Speech Technol 12, 47–57 (2009). https://doi.org/10.1007/s10772-009-9058-0
DOI: https://doi.org/10.1007/s10772-009-9058-0