
StereoSet: Measuring stereotypical bias in pretrained language models

Moin Nadeem, Anna Bethke, Siva Reddy


Abstract
A stereotype is an over-generalized belief about a particular group of people, e.g., Asians are good at math or African Americans are athletic. Such beliefs (biases) are known to hurt target groups. Since pretrained language models are trained on large real-world data, they are known to capture stereotypical biases. It is important to quantify to what extent these biases are present in them. Although this is a rapidly growing area of research, the existing literature falls short in two important respects: 1) it mainly evaluates the bias of pretrained language models on a small set of artificial sentences, even though these models are trained on natural data, and 2) current evaluations focus on measuring bias without considering the language modeling ability of a model, which could lead to misplaced trust in a model even if it is a poor language model. We address both of these problems. We present StereoSet, a large-scale natural English dataset for measuring stereotypical biases in four domains: gender, profession, race, and religion. We contrast both the stereotypical bias and the language modeling ability of popular models such as BERT, GPT-2, RoBERTa, and XLNet, and show that these models exhibit strong stereotypical biases. Our data and code are available at https://stereoset.mit.edu.
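
The evaluation idea sketched in the abstract, contrasting a model's preference for stereotypical versus anti-stereotypical associations while keeping its language modeling ability in view, can be illustrated with a few lines of code. The snippet below is a minimal sketch, not the paper's released evaluation code: it assumes GPT-2 accessed through the Hugging Face transformers library and uses made-up example sentences rather than actual StereoSet items.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def mean_log_likelihood(text: str) -> float:
    """Average per-token log-likelihood of a sentence under GPT-2."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    # out.loss is the mean negative log-likelihood over the predicted tokens.
    return -out.loss.item()

# Hypothetical sentence pair (illustrative only, not drawn from StereoSet).
stereotype = "The girls in the class were terrible at math."
anti_stereotype = "The girls in the class were brilliant at math."

ll_s = mean_log_likelihood(stereotype)
ll_a = mean_log_likelihood(anti_stereotype)
print("model prefers:", "stereotype" if ll_s > ll_a else "anti-stereotype")

Aggregated over many such contrasting pairs, the fraction of cases in which the model assigns higher likelihood to the stereotypical association gives a bias score, while a separate comparison against unrelated completions gauges language modeling ability; the paper combines the two perspectives rather than reporting bias in isolation.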
Anthology ID:
2021.acl-long.416
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
5356–5371
URL:
https://aclanthology.org/2021.acl-long.416
DOI:
10.18653/v1/2021.acl-long.416
Cite (ACL):
Moin Nadeem, Anna Bethke, and Siva Reddy. 2021. StereoSet: Measuring stereotypical bias in pretrained language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5356–5371, Online. Association for Computational Linguistics.
Cite (Informal):
StereoSet: Measuring stereotypical bias in pretrained language models (Nadeem et al., ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-long.416.pdf
Video:
https://aclanthology.org/2021.acl-long.416.mp4
Code
moinnadeem/StereoSet + additional community code
Data
StereoSet