Computer Science > Computation and Language

arXiv:2305.16581 (cs)

[Submitted on 26 May 2023]

Title: An Investigation of Noise in Morphological Inflection

Authors: Adam Wiemerslage, Changbing Yang, Garrett Nicolai, Miikka Silfverberg, Katharina Kann

Abstract: With a growing focus on morphological inflection systems for languages where high-quality data is scarce, training data noise is a serious but so far largely ignored concern. We aim to close this gap by investigating the types of noise encountered within a pipeline for truly unsupervised morphological paradigm completion and its impact on morphological inflection systems. First, we propose an error taxonomy and annotation pipeline for inflection training data. Then, we compare the effect of different types of noise on multiple state-of-the-art inflection models. Finally, we propose a novel character-level masked language modeling (CMLM) pretraining objective and explore its impact on the models' resistance to noise. Our experiments show that various architectures are impacted differently by separate types of noise, but encoder-decoders tend to be more robust to noise than models trained with a copy bias. CMLM pretraining helps transformers, but has less impact on LSTMs.

Comments:	ACL 2023 Findings
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2305.16581 [cs.CL]
	(or arXiv:2305.16581v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.16581

Submission history

From: Adam Wiemerslage
[v1] Fri, 26 May 2023 02:14:34 UTC (7,526 KB)
