Abstract
Natural language translation is a well-defined task in language technology that narrows the communication gap among people of diverse linguistic backgrounds. Although neural machine translation attains remarkable translation performance, it requires an adequate amount of training data, which is difficult to obtain for low-resource language pairs. Neural machine translation also handles the rare-word problem, i.e., the translation of low-frequency words, at the subword level, but it remains weak on highly inflected languages. In this work, we explore neural machine translation for the low-resource English-Assamese language pair with a proposed transliteration approach in the data preprocessing step. In this approach, the source language is transliterated into the target-language script, which yields a smaller subword vocabulary across the source and target languages. Moreover, embeddings pre-trained on monolingual data of the transliterated source and target languages are used in the training process. With our approach, neural machine translation significantly improves translation performance for both English-to-Assamese and Assamese-to-English translation and obtains state-of-the-art results.
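The effect of transliteration on vocabulary size can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the character table `TOY_TABLE` is hypothetical and heavily simplified, the helper names `transliterate` and `symbol_vocabulary` are invented here, the example strings are arbitrary rather than real parallel data, and single characters stand in for subword units. It does not reproduce the transliteration tool or the subword segmentation used in the paper.

```python
# Hypothetical, heavily simplified character table (an assumption for
# illustration); real transliteration is context-dependent and relies on a
# dedicated tool, which this sketch does not reproduce.
TOY_TABLE = {
    "a": "আ", "k": "ক", "m": "ম", "n": "ন",
    "r": "ৰ", "s": "স", "t": "ত",
}


def transliterate(text, table=TOY_TABLE):
    """Map each lowercased character through the table; others pass through."""
    return "".join(table.get(ch, ch) for ch in text.lower())


def symbol_vocabulary(sentences):
    """Distinct non-space characters, a crude stand-in for subword units."""
    return {ch for sentence in sentences for ch in sentence if not ch.isspace()}


# Arbitrary toy strings, not real parallel data.
source = ["man ran", "tar"]
target = ["মআন", "কআম"]

# Joint symbol inventory with the source left in Latin script.
joint_before = symbol_vocabulary(source) | symbol_vocabulary(target)

# Joint symbol inventory after the source is written in the target script.
transliterated = [transliterate(s) for s in source]
joint_after = symbol_vocabulary(transliterated) | symbol_vocabulary(target)

print(len(joint_before), len(joint_after))
```

Once both sides use one script, symbols that occur on both sides are counted once, so the joint inventory shrinks. The abstract describes the same effect at the subword level.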
Acknowledgements
We want to thank the Center for Natural Language Processing (CNLP), the Artificial Intelligence (AI) Lab and the Department of Computer Science and Engineering at the National Institute of Technology, Silchar, India, for providing the requisite support and infrastructure to execute this work.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Laskar, S.R., Paul, B., Pakray, P., Bandyopadhyay, S. (2023). Improving English-Assamese Neural Machine Translation Using Transliteration-Based Approach. In: Bhateja, V., Yang, XS., Lin, J.CW., Das, R. (eds) Evolution in Computational Intelligence. FICTA 2022. Smart Innovation, Systems and Technologies, vol 326. Springer, Singapore. https://doi.org/10.1007/978-981-19-7513-4_20
DOI: https://doi.org/10.1007/978-981-19-7513-4_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7512-7
Online ISBN: 978-981-19-7513-4
eBook Packages: Intelligent Technologies and Robotics (R0)