Automatically Identifying Gender Issues in Machine Translation using Perturbations

Abstract

The successful application of neural methods to machine translation has realized huge quality advances for the community. With these improvements, many have noted outstanding challenges, including the modeling and treatment of gendered language. While previous studies have identified issues using synthetic examples, we develop a novel technique to mine examples from real-world data to explore challenges for deployed systems. We use our method to compile an evaluation benchmark spanning examples for four languages from three language families, which we publicly release to facilitate research. The examples in our benchmark expose where model representations are gendered, and the unintended consequences these gendered representations can have in downstream applications.

Anthology ID: 2020.findings-emnlp.180
Volume: Findings of the Association for Computational Linguistics: EMNLP 2020
Month: November
Year: 2020
Address: Online
Editors: Trevor Cohn, Yulan He, Yang Liu
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1991–1995
URL: https://aclanthology.org/2020.findings-emnlp.180
DOI: 10.18653/v1/2020.findings-emnlp.180
Cite (ACL): Hila Gonen and Kellie Webster. 2020. Automatically Identifying Gender Issues in Machine Translation using Perturbations. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1991–1995, Online. Association for Computational Linguistics.
Cite (Informal): Automatically Identifying Gender Issues in Machine Translation using Perturbations (Gonen & Webster, Findings 2020)
PDF: https://aclanthology.org/2020.findings-emnlp.180.pdf
Video: https://slideslive.com/38940033
