Computer Science > Computation and Language

arXiv:2406.03893 (cs)

[Submitted on 6 Jun 2024]

Title: How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?

Authors: Anushka Singh, Ananya B. Sai, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Mitesh M Khapra

Abstract: While machine translation evaluation has been studied primarily for high-resource languages, the increasing availability of data and models has recently drawn interest to evaluation for low-resource languages. In this paper, we study a zero-shot evaluation setting for four low-resource Indian languages: Assamese, Kannada, Maithili, and Punjabi. We collect sufficient Multi-Dimensional Quality Metrics (MQM) and Direct Assessment (DA) annotations to create test sets and meta-evaluate a wide range of automatic evaluation metrics. We observe that even for learned metrics, which are known to exhibit zero-shot performance, the Kendall Tau and Pearson correlations with human annotations are only as high as 0.32 and 0.45, respectively. Synthetic data approaches show mixed results and overall do not do much to close the gap for these languages. This indicates that there is still a long way to go for low-resource evaluation.
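The meta-evaluation described in the abstract compares metric scores against human annotations using Kendall Tau and Pearson correlations. The sketch below illustrates that idea on toy data only. The function names, the choice of the tau-b variant, and the example scores are assumptions made for illustration; they are not taken from the paper, which may compute these correlations differently (for example, at a different granularity or with a different tie treatment).

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def kendall_tau_b(xs, ys):
    """Kendall tau-b: pairwise rank agreement, corrected for ties."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: contributes to neither term
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denom


# Hypothetical toy data: one human score and one metric score per translation.
human_scores = [1, 2, 3, 4, 5]
metric_scores = [0.2, 0.1, 0.4, 0.3, 0.9]

print("Kendall tau-b:", kendall_tau_b(human_scores, metric_scores))
print("Pearson:", pearson(human_scores, metric_scores))
```

Kendall Tau only looks at whether pairs of translations are ranked in the same order by the metric and by the annotators, while Pearson measures linear agreement of the raw scores, which is one reason the two values reported for a metric can differ.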

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2406.03893 [cs.CL]
	(or arXiv:2406.03893v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.03893

Submission history

From: Anushka Singh
[v1] Thu, 6 Jun 2024 09:28:08 UTC (7,611 KB)
