Improving English-Assamese Neural Machine Translation Using Transliteration-Based Approach

  • Conference paper
Evolution in Computational Intelligence (FICTA 2022)

Abstract

Natural language translation is a well-defined task in language technology that narrows the communication gap among people of diverse linguistic backgrounds. Although neural machine translation attains remarkable translation performance, it requires an adequate amount of training data, which is difficult to obtain for low-resource language pairs. Neural machine translation also handles the rare-word problem, i.e., the translation of low-frequency words, at the subword level, but it remains weak on highly inflected languages. In this work, we explore neural machine translation for the low-resource English-Assamese language pair with a proposed transliteration approach in the data preprocessing step. In this approach, the source language is transliterated into the target-language script, which yields a smaller subword vocabulary shared by the source and target languages. Moreover, embeddings pre-trained on the monolingual data of the transliterated source and target languages are used in the training process. With our approach, neural machine translation significantly improves translation performance for English-to-Assamese and Assamese-to-English translation and obtains state-of-the-art results.
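
The effect of the transliteration step on the symbol inventory can be illustrated with a minimal sketch. It is not the authors' pipeline: the character table `TOY_TABLE`, the helper names, and the two sentence pairs are hypothetical and chosen only for illustration, and the table is not linguistically accurate. The sketch compares the set of base characters (the symbols a subword model would start from) for the joint source-target data before and after the source side is mapped into the target script.

```python
# Hypothetical Latin-to-Assamese-script table, for illustration only.
# It is NOT a real transliteration tool and is not linguistically accurate.
TOY_TABLE = {
    "a": "আ", "d": "দ", "e": "এ", "g": "গ", "h": "হ", "i": "ই",
    "m": "ম", "n": "ন", "o": "অ", "s": "স", "t": "ত",
}


def transliterate(text: str) -> str:
    """Map each character through TOY_TABLE; unmapped characters pass through."""
    return "".join(TOY_TABLE.get(ch, ch) for ch in text.lower())


def symbol_vocab(sentences):
    """Return the set of non-space characters used in the sentences."""
    return {ch for s in sentences for ch in s if not ch.isspace()}


# Toy sentences (assumed for illustration, not taken from any corpus).
source = ["assam is a state in india", "the tea is good"]
target = ["অসম ভাৰতৰ এখন ৰাজ্য", "চাহ ভাল"]

# Joint symbol inventory with the source left in Latin script.
joint_before = symbol_vocab(source) | symbol_vocab(target)

# Joint symbol inventory after the source is mapped into the target script.
translit_source = [transliterate(s) for s in source]
joint_after = symbol_vocab(translit_source) | symbol_vocab(target)

print(len(joint_before), len(joint_after))
```

Because the transliterated source reuses characters that already occur on the target side, the joint inventory shrinks. In a real setup a subword learner would then be trained on the transliterated data; that step is omitted here.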

Notes

  1. https://rb.gy/owt9mq

  2. https://github.com/libindic/indic-trans

  3. http://data.statmt.org/pmindia/v1/parallel/


Acknowledgements

We want to thank the Center for Natural Language Processing (CNLP), the Artificial Intelligence (AI) Lab and the Department of Computer Science and Engineering at the National Institute of Technology, Silchar, India, for providing the requisite support and infrastructure to execute this work.

Author information

Corresponding author

Correspondence to Sahinur Rahman Laskar.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Laskar, S.R., Paul, B., Pakray, P., Bandyopadhyay, S. (2023). Improving English-Assamese Neural Machine Translation Using Transliteration-Based Approach. In: Bhateja, V., Yang, XS., Lin, J.CW., Das, R. (eds) Evolution in Computational Intelligence. FICTA 2022. Smart Innovation, Systems and Technologies, vol 326. Springer, Singapore. https://doi.org/10.1007/978-981-19-7513-4_20
