Research Article | Public Access
DOI: 10.1145/2818048.2820016

Parting Crowds: Characterizing Divergent Interpretations in Crowdsourced Annotation Tasks

Published: 27 February 2016

Abstract

Crowdsourcing is a common strategy for collecting the “gold standard” labels required for many natural language applications. Crowdworkers differ in their responses for many reasons, but existing approaches often treat disagreements as “noise” to be removed through filtering or aggregation. In this paper, we introduce the workflow design pattern of crowd parting: separating workers based on shared patterns in responses to a crowdsourcing task. We illustrate this idea using an automated clustering-based method to identify divergent, but valid, worker interpretations in crowdsourced entity annotations collected over two distinct corpora -- Wikipedia articles and Tweets. We demonstrate how the intermediate-level view provided by crowd-parting analysis offers insight into sources of disagreement not easily gleaned from viewing either individual annotation sets or aggregated results. We also discuss several concrete ways this approach could be applied directly to improve the quality and efficiency of crowdsourced annotation tasks.
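
To make the crowd-parting pattern concrete, here is a minimal sketch of one way workers could be separated by shared response patterns: represent each worker as a binary vector over candidate entity spans, compute pairwise distances between those vectors, and cut an agglomerative clustering of the workers into groups whose majority answers summarize distinct interpretations. The annotation matrix, the Jaccard distance, and the average-linkage clustering below are illustrative assumptions for this sketch, not details taken from the paper.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Rows = workers, columns = candidate entity spans; 1 = worker marked the span.
# (Toy data for illustration only.)
annotations = np.array([
    [1, 1, 0, 0, 1, 0],   # worker A
    [1, 1, 0, 0, 1, 1],   # worker B (largely agrees with A)
    [0, 0, 1, 1, 0, 1],   # worker C (a different reading of the task)
    [0, 0, 1, 1, 0, 0],   # worker D (largely agrees with C)
])

# Pairwise dissimilarity between workers' response vectors.
distances = pdist(annotations, metric="jaccard")

# Agglomerative clustering of workers; cut the tree into two clusters.
tree = linkage(distances, method="average")
clusters = fcluster(tree, t=2, criterion="maxclust")

# Each cluster's majority labels summarize one interpretation of the task.
for c in np.unique(clusters):
    members = annotations[clusters == c]
    majority = (members.mean(axis=0) >= 0.5).astype(int)
    print(f"cluster {c}: workers {np.where(clusters == c)[0].tolist()}, "
          f"majority labels {majority.tolist()}")

In practice, the number of clusters and the choice of distance metric would themselves need validation, for example by checking whether each cluster's majority answer corresponds to a coherent, defensible reading of the annotation guidelines.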



    Published In

    CSCW '16: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing
    February 2016
    1866 pages
    ISBN:9781450335928
    DOI:10.1145/2818048

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 February 2016


    Author Tags

    1. Amazon Mechanical Turk
    2. Crowdsourcing
    3. annotation
    4. clustering
    5. natural language processing
    6. user studies

    Qualifiers

    • Research-article

    Funding Sources

    • DARPA

    Conference

    CSCW '16
    CSCW '16: Computer Supported Cooperative Work and Social Computing
    February 27 - March 2, 2016
    San Francisco, California, USA

    Acceptance Rates

    CSCW '16 Paper Acceptance Rate: 142 of 571 submissions, 25%
    Overall Acceptance Rate: 2,235 of 8,521 submissions, 26%


    Article Metrics

    • Downloads (Last 12 months): 197
    • Downloads (Last 6 weeks): 26
    Reflects downloads up to 18 Nov 2024


    Cited By

    • (2025) A partitioning Monte Carlo approach for consensus tasks in crowdsourcing. Expert Systems with Applications 262, 125559. DOI: 10.1016/j.eswa.2024.125559. Online publication date: Mar-2025.
    • (2024) Constructing a Classification Scheme - and its Consequences: A Field Study of Learning to Label Data for Computer Vision in a Hospital Intensive Care Unit. Proceedings of the ACM on Human-Computer Interaction 8, CSCW2, 1-29. DOI: 10.1145/3687029. Online publication date: 8-Nov-2024.
    • (2024) Closing the Knowledge Gap in Designing Data Annotation Interfaces for AI-powered Disaster Management Analytic Systems. Proceedings of the 29th International Conference on Intelligent User Interfaces, 405-418. DOI: 10.1145/3640543.3645214. Online publication date: 18-Mar-2024.
    • (2024) DynamicLabels: Supporting Informed Construction of Machine Learning Label Sets with Crowd Feedback. Proceedings of the 29th International Conference on Intelligent User Interfaces, 209-228. DOI: 10.1145/3640543.3645157. Online publication date: 18-Mar-2024.
    • (2024) Belief Miner: A Methodology for Discovering Causal Beliefs and Causal Illusions from General Populations. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1, 1-37. DOI: 10.1145/3637298. Online publication date: 26-Apr-2024.
    • (2024) Find the Bot!: Gamifying Facial Emotion Recognition for Both Human Training and Machine Learning Data Collection. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-20. DOI: 10.1145/3613904.3642880. Online publication date: 11-May-2024.
    • (2024) Transferring Annotator- and Instance-Dependent Transition Matrix for Learning From Crowds. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 11, 7377-7391. DOI: 10.1109/TPAMI.2024.3388209. Online publication date: Nov-2024.
    • (2024) Experiments in Modeling Disagreement. Artificial Neural Networks in Pattern Recognition, 245-255. DOI: 10.1007/978-3-031-71602-7_21. Online publication date: 19-Sep-2024.
    • (2023) Crowdsourcing Subjective Annotations Using Pairwise Comparisons Reduces Bias and Error Compared to the Majority-vote Method. Proceedings of the ACM on Human-Computer Interaction 7, CSCW2, 1-29. DOI: 10.1145/3610183. Online publication date: 4-Oct-2023.
    • (2023) Judgment Sieve: Reducing Uncertainty in Group Judgments through Interventions Targeting Ambiguity versus Disagreement. Proceedings of the ACM on Human-Computer Interaction 7, CSCW2, 1-26. DOI: 10.1145/3610074. Online publication date: 4-Oct-2023.
