Research Article | Public Access
DOI: 10.1145/2818048.2820016

Parting Crowds: Characterizing Divergent Interpretations in Crowdsourced Annotation Tasks

Published: 27 February 2016

Abstract

Crowdsourcing is a common strategy for collecting the “gold standard” labels required for many natural language applications. Crowdworkers differ in their responses for many reasons, but existing approaches often treat disagreements as “noise” to be removed through filtering or aggregation. In this paper, we introduce the workflow design pattern of crowd parting: separating workers based on shared patterns in responses to a crowdsourcing task. We illustrate this idea using an automated clustering-based method to identify divergent, but valid, worker interpretations in crowdsourced entity annotations collected over two distinct corpora -- Wikipedia articles and Tweets. We demonstrate how the intermediate-level view provided by crowd-parting analysis offers insight into sources of disagreement not easily gleaned from viewing either individual annotation sets or aggregated results. We also discuss several concrete ways this approach could be applied directly to improve the quality and efficiency of crowdsourced annotation tasks.
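
To make the crowd-parting pattern concrete, here is a minimal sketch of one way workers could be separated by shared response patterns: represent each worker as a binary vector over candidate entity spans, compute pairwise distances between those vectors, and cut an agglomerative clustering of the workers into groups whose majority answers summarize distinct interpretations. The annotation matrix, the Jaccard distance, and the average-linkage clustering below are illustrative assumptions for this sketch, not details taken from the paper.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Rows = workers, columns = candidate entity spans; 1 = worker marked the span.
# (Toy data for illustration only.)
annotations = np.array([
    [1, 1, 0, 0, 1, 0],   # worker A
    [1, 1, 0, 0, 1, 1],   # worker B (largely agrees with A)
    [0, 0, 1, 1, 0, 1],   # worker C (a different reading of the task)
    [0, 0, 1, 1, 0, 0],   # worker D (largely agrees with C)
])

# Pairwise dissimilarity between workers' response vectors.
distances = pdist(annotations, metric="jaccard")

# Agglomerative clustering of workers; cut the tree into two clusters.
tree = linkage(distances, method="average")
clusters = fcluster(tree, t=2, criterion="maxclust")

# Each cluster's majority labels summarize one interpretation of the task.
for c in np.unique(clusters):
    members = annotations[clusters == c]
    majority = (members.mean(axis=0) >= 0.5).astype(int)
    print(f"cluster {c}: workers {np.where(clusters == c)[0].tolist()}, "
          f"majority labels {majority.tolist()}")

In practice, the number of clusters and the choice of distance metric would themselves need validation, for example by checking whether each cluster's majority answer corresponds to a coherent, defensible reading of the annotation guidelines.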



    Published In

    CSCW '16: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing
    February 2016
    1866 pages
    ISBN:9781450335928
    DOI:10.1145/2818048

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 February 2016


    Author Tags

    1. Amazon Mechanical Turk
    2. Crowdsourcing
    3. annotation
    4. clustering
    5. natural language processing
    6. user studies

    Qualifiers

    • Research-article

    Funding Sources

    • DARPA

    Conference

    CSCW '16
    CSCW '16: Computer Supported Cooperative Work and Social Computing
    February 27 - March 2, 2016
    San Francisco, California, USA

    Acceptance Rates

    CSCW '16 Paper Acceptance Rate: 142 of 571 submissions, 25%
    Overall Acceptance Rate: 2,235 of 8,521 submissions, 26%


    Article Metrics

    • Downloads (Last 12 months): 197
    • Downloads (Last 6 weeks): 26
    Reflects downloads up to 18 Nov 2024


    Cited By

    • (2025) A partitioning Monte Carlo approach for consensus tasks in crowdsourcing. Expert Systems with Applications 262, 125559. DOI: 10.1016/j.eswa.2024.125559. Online publication date: Mar-2025.
    • (2024) Constructing a Classification Scheme - and its Consequences: A Field Study of Learning to Label Data for Computer Vision in a Hospital Intensive Care Unit. Proceedings of the ACM on Human-Computer Interaction 8, CSCW2, 1-29. DOI: 10.1145/3687029. Online publication date: 8-Nov-2024.
    • (2024) Closing the Knowledge Gap in Designing Data Annotation Interfaces for AI-powered Disaster Management Analytic Systems. Proceedings of the 29th International Conference on Intelligent User Interfaces, 405-418. DOI: 10.1145/3640543.3645214. Online publication date: 18-Mar-2024.
    • (2024) DynamicLabels: Supporting Informed Construction of Machine Learning Label Sets with Crowd Feedback. Proceedings of the 29th International Conference on Intelligent User Interfaces, 209-228. DOI: 10.1145/3640543.3645157. Online publication date: 18-Mar-2024.
    • (2024) Belief Miner: A Methodology for Discovering Causal Beliefs and Causal Illusions from General Populations. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1, 1-37. DOI: 10.1145/3637298. Online publication date: 26-Apr-2024.
    • (2024) Find the Bot!: Gamifying Facial Emotion Recognition for Both Human Training and Machine Learning Data Collection. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-20. DOI: 10.1145/3613904.3642880. Online publication date: 11-May-2024.
    • (2024) Transferring Annotator- and Instance-Dependent Transition Matrix for Learning From Crowds. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 11, 7377-7391. DOI: 10.1109/TPAMI.2024.3388209. Online publication date: Nov-2024.
    • (2024) Experiments in Modeling Disagreement. Artificial Neural Networks in Pattern Recognition, 245-255. DOI: 10.1007/978-3-031-71602-7_21. Online publication date: 19-Sep-2024.
    • (2023) Crowdsourcing Subjective Annotations Using Pairwise Comparisons Reduces Bias and Error Compared to the Majority-vote Method. Proceedings of the ACM on Human-Computer Interaction 7, CSCW2, 1-29. DOI: 10.1145/3610183. Online publication date: 4-Oct-2023.
    • (2023) Judgment Sieve: Reducing Uncertainty in Group Judgments through Interventions Targeting Ambiguity versus Disagreement. Proceedings of the ACM on Human-Computer Interaction 7, CSCW2, 1-26. DOI: 10.1145/3610074. Online publication date: 4-Oct-2023.
