arXiv:2407.11833 (cs)

[Submitted on 16 Jul 2024]

Title: LoFTI: Localization and Factuality Transfer to Indian Locales

Authors: Sona Elza Simon (1), Soumen Kumar Mondal (1), Abhishek Singhania (2), Sayambhu Sen (2), Preethi Jyothi (1) ((1) Indian Institute of Technology Bombay, (2) Amazon Alexa)

Abstract: Large language models (LLMs) encode vast amounts of world knowledge acquired via training on large web-scale datasets crawled from the internet. However, these datasets typically exhibit a geographical bias towards English-speaking Western countries. This results in LLMs producing biased or hallucinated responses to queries that require answers localized to other geographical regions. In this work, we introduce a new benchmark named LoFTI (Localization and Factuality Transfer to Indian Locales) that can be used to evaluate an LLM's localization and factual text transfer capabilities. LoFTI consists of factual statements about entities in source and target locations; the source locations are spread across the globe and the target locations are all within India with varying degrees of hyperlocality (country, states, cities). The entities span a wide variety of categories. We use LoFTI to evaluate Mixtral, GPT-4 and two other Mixtral-based approaches well-suited to the task of localized factual transfer. We demonstrate that LoFTI is a high-quality evaluation benchmark and that all the models, including GPT-4, produce skewed results across varying levels of hyperlocality.

Comments:	21 pages
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2407.11833 [cs.CL]
	(or arXiv:2407.11833v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.11833

Submission history

From: Sona Elza Simon
[v1] Tue, 16 Jul 2024 15:20:43 UTC (3,776 KB)
