Revolt: Collaborative Crowdsourcing for Labeling Machine Learning Datasets

Published: 02 May 2017 · DOI: 10.1145/3025453.3026044

Abstract

Crowdsourcing provides a scalable and efficient way to construct labeled datasets for training machine learning systems. However, creating comprehensive label guidelines for crowdworkers is often prohibitive, even for seemingly simple concepts. Incomplete or ambiguous label guidelines can then result in differing interpretations of concepts and inconsistent labels. Existing approaches for improving label quality, such as worker screening or detection of poor work, are ineffective for this problem and can lead to rejection of honest work and a missed opportunity to capture rich interpretations about data. We introduce Revolt, a collaborative approach that brings ideas from expert annotation workflows to crowd-based labeling. Revolt eliminates the burden of creating detailed label guidelines by harnessing crowd disagreements to identify ambiguous concepts and create rich structures (groups of semantically related items) for post-hoc label decisions. Experiments comparing Revolt to traditional crowdsourced labeling show that Revolt produces high-quality labels without requiring label guidelines, in exchange for an increase in monetary cost. This up-front cost, however, is mitigated by Revolt's ability to produce reusable structures that can accommodate a variety of label boundaries without requiring new data to be collected. Further comparisons of Revolt's collaborative and non-collaborative variants show that collaboration reaches higher label accuracy with lower monetary cost.
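The abstract describes a workflow in which items that crowdworkers label unanimously are accepted directly, items that draw disagreement are flagged as ambiguous and organized into groups of semantically related items, and the label for each group is decided after collection ("post-hoc"), so the same groups can be relabeled under a different label boundary without gathering new data. The sketch below illustrates that flow; it is a minimal illustration rather than the authors' implementation, and all function and variable names (split_by_agreement, apply_post_hoc_boundary, the toy items) are hypothetical.

```python
from collections import Counter

def split_by_agreement(item_labels):
    """item_labels maps item -> list of crowd labels collected for that item."""
    certain, ambiguous = {}, []
    for item, labels in item_labels.items():
        top_label, votes = Counter(labels).most_common(1)[0]
        if votes == len(labels):      # unanimous agreement: accept the label directly
            certain[item] = top_label
        else:                         # disagreement: defer, send to the structuring stage
            ambiguous.append(item)
    return certain, ambiguous

def apply_post_hoc_boundary(groups, group_decisions):
    """groups maps a group name (e.g., a crowd-proposed category) -> its items;
    group_decisions maps a group name -> the final label the requester assigns.
    The same groups can later be relabeled under a different boundary without
    collecting any new crowd data."""
    return {item: group_decisions[name]
            for name, items in groups.items()
            for item in items}

# Toy run: three labelers per item; "promo_email" draws disagreement.
votes = {
    "newsletter":  ["spam", "spam", "spam"],
    "invoice":     ["not_spam", "not_spam", "not_spam"],
    "promo_email": ["spam", "not_spam", "spam"],
}
certain, ambiguous = split_by_agreement(votes)
groups = {"promotional": ambiguous}   # structure built around the ambiguous items
final_labels = {**certain, **apply_post_hoc_boundary(groups, {"promotional": "spam"})}
print(final_labels)  # {'newsletter': 'spam', 'invoice': 'not_spam', 'promo_email': 'spam'}
```

Deciding on a different label boundary later (say, treating promotional mail as not spam) only requires changing the group decision passed to apply_post_hoc_boundary, which is the reusability property the abstract refers to.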

Information

Published In

CHI '17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems
May 2017, 7138 pages
ISBN: 9781450346559
DOI: 10.1145/3025453

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 02 May 2017

Author Tags

1. collaboration
2. crowdsourcing
3. machine learning
4. real-time

Qualifiers

• Research-article

Conference

CHI '17

Acceptance Rates

CHI '17 Paper Acceptance Rate: 600 of 2,400 submissions, 25%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%


