Computer Science > Computation and Language

arXiv:2308.05574 (cs)

[Submitted on 10 Aug 2023]

Title: Exploring Linguistic Similarity and Zero-Shot Learning for Multilingual Translation of Dravidian Languages

Authors: Danish Ebadulla, Rahul Raman, S. Natarajan, Hridhay Kiran Shetty, Ashish Harish Shenoy

Abstract: Current research in zero-shot translation is plagued by several issues, such as high compute requirements, increased training time, and off-target translations. Proposed remedies often come at the cost of additional data or compute requirements. Pivot-based neural machine translation is preferred over a single-encoder model for most settings despite the increased training and evaluation time. In this work, we overcome the shortcomings of zero-shot translation by taking advantage of transliteration and linguistic similarity. We build a single encoder-decoder neural machine translation system for Dravidian-Dravidian multilingual translation and perform zero-shot translation. We compare the data vs. zero-shot accuracy tradeoff and evaluate the performance of our vanilla method against the current state-of-the-art pivot-based method. We also test the theory that morphologically rich languages require large vocabularies by restricting the vocabulary using an optimal-transport-based technique. Our model achieves scores within 3 BLEU of large-scale pivot-based models when it is trained on 50% of the language directions.

Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2308.05574 [cs.CL]
	(or arXiv:2308.05574v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2308.05574

Submission history

From: Rahul Raman
[v1] Thu, 10 Aug 2023 13:38:09 UTC (20 KB)
