Abstract
This paper describes the development of a Multilingual Phone Recognition System (Multi-PRS) using four Indian languages: Kannada, Telugu, Bengali, and Odia. A Multi-PRS is a universal Phone Recognition System (PRS) that performs phone recognition independently of the language. International Phonetic Alphabet (IPA) based transcription is used to group acoustically similar phonetic units from multiple languages. Multilingual phone recognisers for Indian languages are studied using two broad groups, namely Dravidian languages and Indo-Aryan languages. Dravidian and Indo-Aryan languages are grouped separately to develop Bilingual PRSs. We explore both hidden Markov models (HMMs) and deep neural networks (DNNs) for developing PRSs under both context-dependent and context-independent setups. The state-of-the-art DNNs outperform the HMMs. The performance of the Multi-PRSs is analysed and compared with that of the monolingual PRSs, and the advantages of Multi-PRSs over monolingual PRSs are discussed. Further, we develop tandem Multi-PRSs that use phone posteriors as tandem features to improve on the baseline Multi-PRSs. The tandem Multi-PRSs outperform the baseline Multi-PRSs in all cases.
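As a rough illustration of the IPA-based grouping mentioned in the abstract, the sketch below pools phones from several languages under shared IPA symbols to form a common phone set. It is a minimal sketch under stated assumptions, not the system described in the paper: the native labels, the tiny inventories, and the names `LANG_TO_IPA`, `build_common_phone_set`, and `relabel` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical per-language mappings: native phone label -> IPA symbol.
# These tiny inventories are illustrative only, not the paper's phone sets.
LANG_TO_IPA = {
    "kannada": {"k": "k", "aa": "aː", "sh": "ʃ"},
    "telugu": {"k": "k", "aa": "aː", "ch": "tʃ"},
    "bengali": {"k": "k", "o": "ɔ", "sh": "ʃ"},
    "odia": {"k": "k", "o": "ɔ", "a": "a"},
}


def build_common_phone_set(lang_to_ipa):
    """Pool phones across languages, keyed by their shared IPA symbol.

    Returns a dict mapping each IPA symbol to the set of languages
    whose inventory contains a phone transcribed with that symbol.
    """
    pooled = defaultdict(set)
    for lang, mapping in lang_to_ipa.items():
        for ipa in mapping.values():
            pooled[ipa].add(lang)
    return dict(pooled)


def relabel(transcript, lang, lang_to_ipa):
    """Rewrite a language-specific phone sequence with common IPA labels."""
    return [lang_to_ipa[lang][phone] for phone in transcript]


common = build_common_phone_set(LANG_TO_IPA)
shared_by_all = sorted(p for p, langs in common.items() if len(langs) == len(LANG_TO_IPA))
```

Once every transcript is relabelled this way, data from all languages can be pooled to train one set of phone models, because a phone that shares an IPA symbol across languages is treated as a single unit.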
Notes
MHRD. To know more about Indian Languages. [Online]. Available: http://mhrd.gov.in/sites/upload_files/mhrd/files/upload_document/languagebr.pdf.
Development of Prosodically Guided Phonetic Engine for Searching Speech Databases in Indian Languages. [Online]. Available: http://speech.iiit.ac.in/svldownloads/pro_po_en_report/.
Acknowledgements
We thank Prof. B. Yegnanarayana, Prof. K. Sri Rama Murthy, and Prof. R. Kumaraswamy for providing the Kannada and Telugu datasets. These datasets were developed as part of the consortium project titled “Prosodically guided phonetic engine for searching speech databases in Indian languages”, supported by DIT, New Delhi, India.
Cite this article
Manjunath, K.E., Jayagopi, D.B., Sreenivasa Rao, K. et al. Development and analysis of multilingual phone recognition systems using Indian languages. Int J Speech Technol 22, 157–168 (2019). https://doi.org/10.1007/s10772-018-09589-z
DOI: https://doi.org/10.1007/s10772-018-09589-z