DOI:10.1145/3462204.3481729

Investigating and Mitigating Biases in Crowdsourced Data

Published: 23 October 2021

Abstract

It is common practice for machine learning systems to rely on crowdsourced label data for training and evaluation. It is also well known that biases present in the label data can induce biases in the trained models. Biases may be introduced by the mechanisms used for deciding what data should or could be labelled, or by the mechanisms employed to obtain the labels. Various approaches have been proposed to detect and correct biases once the label dataset has been constructed. However, proactively reducing biases during the data labelling phase and ensuring data fairness could be more economical than post-processing bias mitigation approaches. In this workshop, we aim to foster discussion on ongoing research on biases in crowdsourced data and to identify future research directions to detect, quantify and mitigate biases before, during and after the labelling process such that both task requesters and crowd workers can benefit. We will explore how specific crowdsourcing workflows, worker attributes, and work practices contribute to biases in the labelled data; how to quantify and mitigate biases as part of the labelling process; and how such mitigation approaches may impact workers and the crowdsourcing ecosystem. The outcome of the workshop will include a collaborative publication of a research agenda to improve or develop novel methods relating to crowdsourcing tools, processes and work practices to address biases in crowdsourced data. We also plan to run a Crowd Bias Challenge prior to the workshop, where participants will be asked to collect labels for a given dataset while minimising potential biases.
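To make "quantifying biases as part of the labelling process" concrete, the minimal sketch below (Python) compares the positive-label rate of different worker groups on the same items; a large gap between groups is one coarse signal that worker attributes, rather than the items themselves, may be driving the labels. The annotation data, group names and the helper function are entirely hypothetical and for illustration only; this is not a method proposed by the workshop organisers.

from collections import defaultdict

# Hypothetical toy annotations as (item_id, worker_group, label) tuples.
# In practice these would come from a crowdsourcing platform export.
annotations = [
    ("img_01", "group_a", 1), ("img_01", "group_b", 0),
    ("img_02", "group_a", 1), ("img_02", "group_b", 1),
    ("img_03", "group_a", 0), ("img_03", "group_b", 0),
    ("img_04", "group_a", 1), ("img_04", "group_b", 0),
]

def positive_rate_by_group(rows):
    """Fraction of positive labels contributed by each worker group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for _, group, label in rows:
        totals[group] += 1
        positives[group] += label
    return {g: positives[g] / totals[g] for g in totals}

rates = positive_rate_by_group(annotations)
# The gap between the most- and least-positive groups is a simple,
# monitorable bias signal during labelling, before the dataset is frozen.
gap = max(rates.values()) - min(rates.values())
print(rates, gap)

Such a signal could be tracked while labels are still being collected, allowing task requesters to rebalance worker pools or task assignment before post-processing corrections become necessary.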



Published In

CSCW '21 Companion: Companion Publication of the 2021 Conference on Computer Supported Cooperative Work and Social Computing
October 2021
370 pages
ISBN:9781450384797
DOI:10.1145/3462204
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 October 2021


Author Tags

  1. biases
  2. crowdsourcing
  3. data quality

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited

Conference

CSCW '21

Acceptance Rates

Overall Acceptance Rate 2,235 of 8,521 submissions, 26%



Cited By

  • (2024) Crowdsourcing Geospatial Data for Earth and Human Observations: A Review. Journal of Remote Sensing, 4. https://doi.org/10.34133/remotesensing.0105. Online publication date: 22-Jan-2024
  • (2024) Developing Strategies for Co-designing Assistive Augmentation Technologies. Proceedings of the Augmented Humans International Conference 2024, 324–326. https://doi.org/10.1145/3652920.3653038. Online publication date: 4-Apr-2024
  • (2024) Data Biasing Removal with Blockchain and Crowd Annotation. Procedia Computer Science, 233:C, 692–702. https://doi.org/10.1016/j.procs.2024.03.258. Online publication date: 1-Jan-2024
  • (2023) Workshop on Understanding and Mitigating Cognitive Biases in Human-AI Collaboration. Companion Publication of the 2023 Conference on Computer Supported Cooperative Work and Social Computing, 512–517. https://doi.org/10.1145/3584931.3611284. Online publication date: 14-Oct-2023
  • (2023) LINGO: Visually Debiasing Natural Language Instructions to Support Task Diversity. Computer Graphics Forum, 42:3, 409–421. https://doi.org/10.1111/cgf.14840. Online publication date: 27-Jun-2023
