Development and analysis of multilingual phone recognition systems using Indian languages

International Journal of Speech Technology

Abstract

In this paper, the development of a Multilingual Phone Recognition System (Multi-PRS) using four Indian languages (Kannada, Telugu, Bengali, and Odia) is described. Multi-PRS is a universal Phone Recognition System (PRS) that performs phone recognition independently of the language. Transcriptions based on the International Phonetic Alphabet (IPA) are used to group acoustically similar phonetic units from multiple languages. Multilingual phone recognisers for Indian languages are studied using two broad groups, namely Dravidian languages and Indo-Aryan languages, and the two groups are also treated separately to develop bilingual PRSs. Both hidden Markov models (HMMs) and deep neural networks (DNNs) are explored for developing PRSs under context-dependent and context-independent setups, and the state-of-the-art DNNs outperform the HMMs. The performance of the Multi-PRSs is analysed and compared with that of the monolingual PRSs, and the advantages of Multi-PRSs over monolingual PRSs are discussed. Further, tandem Multi-PRSs are developed using phone posteriors as tandem features to improve the performance of the baseline Multi-PRSs; the tandem Multi-PRSs outperform the baseline Multi-PRSs in all cases.
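
The two core ideas summarised above, pooling data across languages through a common IPA-based phone set and appending phone posteriors to the baseline acoustic features to build tandem systems, can be illustrated with a short sketch. The Python snippet below is a minimal illustration under stated assumptions, not the paper's implementation: the phone-to-IPA mapping entries, the feature dimensions, and the names IPA_MAP, to_common_phones, and tandem_features are hypothetical and chosen only for this example.

import numpy as np

# Hypothetical mapping from language-specific phone labels to a common
# IPA-based phone set, so that acoustically similar units from Kannada (kn),
# Telugu (te), Bengali (bn), and Odia (or) share a single label. The entries
# below are illustrative placeholders, not the paper's actual mapping tables.
IPA_MAP = {
    ("kn", "ka"): "k", ("te", "ka"): "k", ("bn", "ko"): "k", ("or", "ko"): "k",
    ("kn", "aa"): "a:", ("te", "aa"): "a:", ("bn", "a"): "a:", ("or", "a"): "a:",
}

def to_common_phones(lang, phones):
    """Map a language-specific phone sequence onto the common phone set."""
    return [IPA_MAP.get((lang, p), p) for p in phones]

def tandem_features(acoustic_feats, phone_posteriors, eps=1e-8):
    """Build tandem features by appending log phone posteriors (for example,
    the per-frame outputs of a DNN phone classifier) to the base features.

    acoustic_feats:   (num_frames, feat_dim) array, e.g. MFCCs.
    phone_posteriors: (num_frames, num_phones) per-frame posterior probabilities.
    """
    log_post = np.log(phone_posteriors + eps)   # compress the dynamic range
    return np.concatenate([acoustic_feats, log_post], axis=1)

if __name__ == "__main__":
    # Toy example: 100 frames of 39-dimensional features, posteriors over 35 phones.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(100, 39))
    post = rng.dirichlet(np.ones(35), size=100)   # each row sums to 1
    print(to_common_phones("kn", ["ka", "aa"]))   # ['k', 'a:']
    print(tandem_features(feats, post).shape)     # (100, 74)

Tandem systems in the literature typically take the logarithm of the posteriors and often decorrelate them (for example with PCA) before appending them to the spectral features; the decorrelation step is omitted here for brevity.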



Notes

  1. MHRD. For more information about Indian languages, see: http://mhrd.gov.in/sites/upload_files/mhrd/files/upload_document/languagebr.pdf.

  2. Development of Prosodically Guided Phonetic Engine for Searching Speech Databases in Indian Languages http://speech.iiit.ac.in/svldownloads/pro_po_en_report/.

  3. Sclite Tool http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sclite.htm.



Acknowledgements

We thank Prof. B. Yegnanarayana, Prof. K. Sri Rama Murthy, and Prof. R. Kumaraswamy for providing the Kannada and Telugu datasets. These datasets were developed as part of the consortium project titled "Prosodically guided phonetic engine for searching speech databases in Indian languages", supported by DIT, New Delhi, India.

Corresponding author

Correspondence to K. Sreenivasa Rao.


Cite this article

Manjunath, K.E., Jayagopi, D.B., Sreenivasa Rao, K. et al. Development and analysis of multilingual phone recognition systems using Indian languages. Int J Speech Technol 22, 157–168 (2019). https://doi.org/10.1007/s10772-018-09589-z

