Syllable modeling in continuous speech recognition for Tamil language

R. Thangarajan¹,
A. M. Natarajan² &
M. Selvam¹


Abstract

In automatic speech recognition, the phone has arguably been the dominant sub-word unit for more than a decade. Context-dependent phone (triphone) modeling accounts for contextual variation between adjacent phones, and state tying addresses the modeling of triphones that are not seen during training. Recently, the syllable has been gaining momentum as a sub-word unit. Being a larger unit than a phone, the syllable absorbs the severe contextual variation between the phones within it. It is therefore more stable than a phone and models pronunciation variability in a systematic way. The Tamil language has challenging features such as agglutination and morpho-phonology. This paper attempts to address these issues by using the syllable as the sub-word unit in the acoustic model. Initially, small-vocabulary context-independent word models and medium-vocabulary context-dependent phone models are developed. Subsequently, an algorithm based on the prosodic syllable is proposed and two experiments are conducted. In the first experiment, syllable-based context-independent models are trained and tested. Despite the large number of syllables, this system performs reasonably well compared with the context-independent word models in terms of word error rate and out-of-vocabulary words. In the second experiment, syllable information is integrated into conventional triphone modeling: cross-syllable triphones are replaced with monophones, which reduces the number of context-dependent phone models by 22.76% in untied units. Despite the reduction in the number of models, the accuracy of the proposed system is comparable to that of the baseline triphone system.
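
The abstract does not spell out how cross-syllable triphones are replaced, so the following is only a minimal sketch of one plausible reading: a phone keeps its triphone label when both of its neighbours lie in the same syllable, and falls back to a monophone otherwise. The function names, the `l-p+r` label format, the `SIL` boundary symbol, and the example syllabification are all assumptions made for illustration, not the authors' algorithm or data.

```python
from typing import List


def baseline_triphones(syllables: List[List[str]]) -> List[str]:
    """Conventional triphone labels that ignore syllable boundaries.

    'SIL' is an assumed padding symbol for the utterance edges.
    """
    phones = [p for syl in syllables for p in syl]
    units = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] if i > 0 else "SIL"
        right = phones[i + 1] if i < len(phones) - 1 else "SIL"
        units.append(f"{left}-{phone}+{right}")
    return units


def syllable_aware_units(syllables: List[List[str]]) -> List[str]:
    """Assumed rule: keep a triphone only when both context phones
    belong to the same syllable; otherwise emit a monophone."""
    units = []
    for syl in syllables:
        for i, phone in enumerate(syl):
            if 0 < i < len(syl) - 1:
                # Both neighbours are inside this syllable.
                units.append(f"{syl[i - 1]}-{phone}+{syl[i + 1]}")
            else:
                # The context would cross a syllable boundary.
                units.append(phone)
    return units


# Hypothetical syllabification, for illustration only.
word = [["v", "a"], ["n", "a", "k"], ["k", "a", "m"]]
print(baseline_triphones(word))
print(syllable_aware_units(word))
```

Under this reading, every phone at a syllable edge shares a single monophone model across all contexts, which is one way the inventory of distinct context-dependent units could shrink relative to the baseline.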

Abbreviations

ANN: Artificial Neural Networks
ASR: Automatic Speech Recognition
CD: Context Dependent
CI: Context Independent
CIIL: Central Institute of Indian Languages, Mysore
CMU: Carnegie Mellon University
HMM: Hidden Markov Model
LVCSR: Large Vocabulary Continuous Speech Recognition
SVM: Support Vector Machine
WER: Word Error Rate

Author information

Authors and Affiliations

Department of Information Technology, Kongu Engineering College, Perundurai, 638 052, Erode, India
R. Thangarajan & M. Selvam
Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, 638 401, Erode, India
A. M. Natarajan

Corresponding author

Correspondence to R. Thangarajan.

About this article

Cite this article

Thangarajan, R., Natarajan, A.M. & Selvam, M. Syllable modeling in continuous speech recognition for Tamil language. Int J Speech Technol 12, 47–57 (2009). https://doi.org/10.1007/s10772-009-9058-0

Received: 01 November 2009
Accepted: 02 November 2009
Published: 18 November 2009
Issue Date: March 2009
DOI: https://doi.org/10.1007/s10772-009-9058-0
