research-article

Open access

Measuring Model Biases in the Absence of Ground Truth

Authors: Christina Greer, Margaret Mitchell

AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society

Pages 327 - 335

https://doi.org/10.1145/3461702.3462557

Published: 30 July 2021

Abstract

The measurement of bias in machine learning often focuses on model performance across identity subgroups (such as man and woman) with respect to ground-truth labels. However, these methods do not directly measure the associations that a model may have learned, for example between labels and identity subgroups. Further, measuring a model's bias requires a fully annotated evaluation dataset, which may not be easily available in practice.

We present an elegant mathematical solution that tackles both issues simultaneously, using image classification as a working example. By treating a classification model's predictions for a given image as a set of labels analogous to a "bag of words", we rank the biases that a model has learned with respect to different identity labels. We use man and woman as a concrete example of an identity label set (although this set need not be binary), and present rankings of the labels that are most biased towards one identity or the other. We demonstrate how the statistical properties of different association metrics can lead to different rankings of the most "gender biased" labels, and conclude that normalized pointwise mutual information (nPMI) is the most useful in practice. Finally, we announce an open-sourced nPMI visualization tool using TensorBoard.
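As a rough illustration of the idea described above, the following sketch treats each image's predicted labels as a set, computes nPMI between every label and each identity label from co-occurrence counts alone (no ground-truth annotations are used), and ranks the labels by the difference between the two nPMI values. The function names (`npmi`, `rank_by_npmi_gap`), the toy predictions, the use of a plain difference as the bias score, and the convention of assigning -1 to pairs that never co-occur are all assumptions made for this example. They are not taken from the paper or from its TensorBoard tool.

```python
import math
from collections import Counter


def npmi(joint, count_x, count_y, n):
    """Normalized PMI of labels x and y from raw counts over n images.

    nPMI(x, y) = log(p(x, y) / (p(x) * p(y))) / -log(p(x, y)), in [-1, 1].
    """
    if joint == 0:
        return -1.0  # assumed convention: the labels never co-occur
    p_xy = joint / n
    if p_xy == 1.0:
        return 1.0  # both labels in every image; avoids dividing by log(1) = 0
    pmi = math.log(p_xy / ((count_x / n) * (count_y / n)))
    return pmi / -math.log(p_xy)


def rank_by_npmi_gap(predictions, id_a, id_b):
    """Rank labels by nPMI(label, id_a) - nPMI(label, id_b), largest first.

    predictions: one collection of predicted labels per image.
    """
    n = len(predictions)
    counts = Counter()  # number of images containing each label
    joint = Counter()   # number of images containing (label, identity) together
    for labels in predictions:
        labels = set(labels)  # bag-of-labels view: presence only
        counts.update(labels)
        for identity in (id_a, id_b):
            if identity in labels:
                joint.update((label, identity) for label in labels)

    gaps = []
    for label in counts:
        if label in (id_a, id_b):
            continue  # do not rank the identity labels themselves
        gap = (npmi(joint[(label, id_a)], counts[label], counts[id_a], n)
               - npmi(joint[(label, id_b)], counts[label], counts[id_b], n))
        gaps.append((label, gap))
    return sorted(gaps, key=lambda item: item[1], reverse=True)


# Hypothetical model predictions for eight images (illustrative only).
toy_predictions = [
    {"woman", "dress", "smile"},
    {"woman", "dress"},
    {"woman", "bicycle"},
    {"man", "suit", "smile"},
    {"man", "suit"},
    {"man", "bicycle"},
    {"bicycle"},
    {"dress"},
]

for label, gap in rank_by_npmi_gap(toy_predictions, "woman", "man"):
    print(f"{label:10s} {gap:+.3f}")
```

In this toy data, "dress" lands at the top of the ranking (most associated with woman relative to man) and "suit" at the bottom, while "smile" and "bicycle" co-occur equally with both identity labels and receive a gap of zero. Swapping `npmi` for a different association metric changes the scores and can change the ordering.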

Supplementary Material

ZIP File (aiespp034aux.zip), 2.73 MB


Index Terms

Measuring Model Biases in the Absence of Ground Truth
1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Document analysis
2. General and reference
  1. Cross-computing tools and techniques
    1. Measurement
    2. Metrics

Published In

AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society

July 2021

1077 pages

ISBN: 9781450384735

DOI: 10.1145/3461702

Program Chairs:
Marion Fourcade (University of California Berkeley, USA)
Benjamin Kuipers (University of Michigan, USA)
Seth Lazar (Australian National University, Australia)
Deirdre Mulligan (University of California Berkeley, USA)

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

AIES '21

Sponsor:

SIGAI

AIES '21: AAAI/ACM Conference on AI, Ethics, and Society

May 19 - 21, 2021

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 61 of 162 submissions, 38%

Bibliometrics & Citations

Article Metrics

Total Citations: 7
Total Downloads: 1,214
Downloads (Last 12 months): 567
Downloads (Last 6 weeks): 84

Reflects downloads up to 23 Sep 2024

Cited By

Das S, Stanton R, Wallace N. (2023). Algorithmic Fairness. Annual Review of Financial Economics 15:1, 565-593. Online publication date: 1-Nov-2023.
https://doi.org/10.1146/annurev-financial-110921-125930
Chien J, Roberts M, Ustun B. (2023). Algorithmic Censoring in Dynamic Learning Systems. Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, 1-20. Online publication date: 30-Oct-2023.
https://dl.acm.org/doi/10.1145/3617694.3623247
Garg T, Masud S, Suresh T, Chakraborty T. (2023). Handling Bias in Toxic Speech Detection: A Survey. ACM Computing Surveys 55:13s, 1-32. Online publication date: 13-Jul-2023.
https://dl.acm.org/doi/10.1145/3580494
Sariyildiz M, Alahari K, Larlus D, Kalantidis Y. (2023). Fake it Till You Make it: Learning Transferable Representations from Synthetic ImageNet Clones. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8011-8021. Online publication date: Jun-2023.
https://doi.org/10.1109/CVPR52729.2023.00774
Gray M, Samala R, Liu Q, Skiles D, Xu J, Tong W, Wu L. (2023). Measurement and Mitigation of Bias in Artificial Intelligence: A Narrative Literature Review for Regulatory Science. Clinical Pharmacology & Therapeutics 115:4, 687-697. Online publication date: 12-Dec-2023.
https://doi.org/10.1002/cpt.3117
Zhai X, Kolesnikov A, Houlsby N, Beyer L. (2022). Scaling Vision Transformers. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1204-1213. Online publication date: Jun-2022.
https://doi.org/10.1109/CVPR52688.2022.01179
Fabris A, Messina S, Silvello G, Susto G. (2022). Algorithmic fairness datasets: the story so far. Data Mining and Knowledge Discovery 36:6, 2074-2152. Online publication date: 1-Nov-2022.
https://dl.acm.org/doi/10.1007/s10618-022-00854-z
