Computer Science > Computation and Language

arXiv:2404.06224 (cs)

[Submitted on 9 Apr 2024]

Title: Low-Cost Generation and Evaluation of Dictionary Example Sentences

Authors: Bill Cai, Clarence Boon Liang Ng, Daniel Tan, Shelvia Hotama

Abstract: Dictionary example sentences play an important role in illustrating word definitions and usage, but manually creating quality sentences is challenging. Prior work has demonstrated that language models can be trained to generate example sentences. However, it relied on costly customized models and word sense datasets for both generation and evaluation. Rapid advancements in foundational models present the opportunity to create low-cost, zero-shot methods for the generation and evaluation of dictionary example sentences. We introduce a new automatic evaluation metric called OxfordEval that measures the win rate of generated sentences against existing Oxford Dictionary sentences. OxfordEval shows high alignment with human judgments, enabling large-scale automated quality evaluation. We experiment with various LLMs and configurations to generate dictionary sentences across word classes. We complement this with a novel approach of using masked language models to identify and select sentences that best exemplify word meaning. The eventual model, FM-MLM, achieves a win rate of over 85.1% against Oxford baseline sentences according to OxfordEval, compared to a 39.8% win rate for prior model-generated sentences.
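
As a rough illustration of what a win rate over pairwise comparisons means, the sketch below counts how often a generated sentence is preferred over the baseline sentence. The abstract does not describe how OxfordEval obtains each preference, so the verdict labels ("generated", "oxford"), the function name `win_rate`, and the sample data are all hypothetical; this is a minimal sketch under those assumptions, not the authors' implementation.

```python
from typing import Sequence


def win_rate(preferences: Sequence[str], candidate_label: str = "generated") -> float:
    """Return the fraction of pairwise comparisons won by the candidate.

    Each entry in `preferences` names the sentence a judge preferred in one
    comparison. The labels are hypothetical, not taken from the paper.
    """
    if not preferences:
        raise ValueError("need at least one comparison")
    wins = sum(1 for p in preferences if p == candidate_label)
    return wins / len(preferences)


# Hypothetical verdicts for four words (made-up data, for illustration only).
verdicts = ["generated", "oxford", "generated", "generated"]
print(f"{win_rate(verdicts):.1%}")  # prints 75.0%
```

Under this reading, the reported 85.1% would mean the generated sentence was preferred in roughly 85 of every 100 comparisons against the Oxford baseline.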

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2404.06224 [cs.CL]
	(or arXiv:2404.06224v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.06224

Submission history

From: Clarence Boon Liang Ng
[v1] Tue, 9 Apr 2024 11:26:59 UTC (7,905 KB)
