DOI: 10.1145/2684822.2685301 · WSDM Conference Proceedings · Research article · Open access

Leveraging In-Batch Annotation Bias for Crowdsourced Active Learning

Published: 02 February 2015

Abstract

Data annotation bias is found in many situations. Often it can be ignored as just another component of the noise floor. However, it is especially prevalent in crowdsourcing tasks and must be actively managed. Annotation bias on single data items has been studied with regard to data difficulty, annotator bias, and related factors, while annotation bias on batches of multiple data items simultaneously presented to annotators has not been studied. In this paper, we verify the existence of "in-batch annotation bias" between data items in the same batch. We propose a factor-graph-based batch annotation model to quantitatively capture the in-batch annotation bias, and measure the bias during a crowdsourcing annotation process for inappropriate comments on LinkedIn. We discover that annotators tend to make polarized annotations for the entire batch of data items in our task. We further leverage the batch annotation model to propose a novel batch active learning algorithm. We test the algorithm on a real crowdsourcing platform and find that it outperforms algorithms that are naïve to in-batch bias.
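The abstract's two ideas — a batch-level factor that pulls an annotator's labels toward a shared polarized outcome, and batch selection that accounts for that pull — can be sketched in toy form. This page does not include the paper's actual factor graph or algorithm, so everything below (`batch_label_probs`, `pick_batch`, the logistic form, and the `polarization` parameter) is an illustrative assumption, not the authors' method:

```python
import math

def batch_label_probs(item_scores, polarization=1.5):
    """Toy model (not the paper's): probability an annotator labels each item
    in a batch positive, where a shared batch-level factor pulls every label
    toward the batch's average score -- producing the polarized whole-batch
    annotations the abstract describes."""
    mean_score = sum(item_scores) / len(item_scores)
    probs = []
    for s in item_scores:
        # each item's logit mixes its own evidence with the batch context
        logit = s + polarization * mean_score
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs

def pick_batch(candidate_scores, batch_size, polarization=1.5):
    """Toy batch-selection heuristic in the spirit of bias-aware batch active
    learning: among individually uncertain items, prefer a batch whose
    predicted labels remain uncertain after the in-batch pull is applied."""
    # shortlist candidates by individual uncertainty (scores near 0)
    pool = sorted(candidate_scores, key=abs)[: 3 * batch_size]
    best, best_u = None, -1.0
    # brute-force over contiguous windows of the shortlist for simplicity
    for i in range(len(pool) - batch_size + 1):
        batch = pool[i : i + batch_size]
        u = sum(p * (1 - p) for p in batch_label_probs(batch, polarization))
        if u > best_u:
            best, best_u = batch, u
    return best
```

Note how the batch context shifts an individual item's probability: the same item score yields a higher positive-label probability when its batch-mates also look positive, which is one simple way a batch-aware selector can differ from one that scores items independently.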




Published In

WSDM '15: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining
February 2015 · 482 pages
ISBN: 9781450333177 · DOI: 10.1145/2684822
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. active learning
  2. annotation bias
  3. crowdsourcing

Conference

WSDM 2015

Acceptance Rates

WSDM '15 paper acceptance rate: 39 of 238 submissions (16%)
Overall acceptance rate: 498 of 2,863 submissions (17%)

Article Metrics

  • Downloads (last 12 months): 212
  • Downloads (last 6 weeks): 19

Reflects downloads up to 12 Feb 2025

Cited By

  • (2024) Detecting Oncoming Vehicles at Night in Urban Scenarios - An Annotation Proof-of-Concept. 2024 IEEE Intelligent Vehicles Symposium (IV), pages 2117-2124, 2 Jun 2024. DOI: 10.1109/IV55156.2024.10588522
  • (2024) Balancing the Scales: Enhancing Fairness in Facial Emotion Recognition with Latent Alignment. Pattern Recognition, pages 113-128, 4 Dec 2024. DOI: 10.1007/978-3-031-78354-8_8
  • (2023) Causal Structure Learning of Bias for Fair Affect Recognition. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), pages 340-349, Jan 2023. DOI: 10.1109/WACVW58289.2023.00038
  • (2022) Mitigating Observation Biases in Crowdsourced Label Aggregation. 2022 26th International Conference on Pattern Recognition (ICPR), pages 1171-1177, 21 Aug 2022. DOI: 10.1109/ICPR56361.2022.9956439
  • (2022) Cost-effective crowdsourced join queries for entity resolution without prior knowledge. Future Generation Computer Systems 127:C, pages 240-251, 1 Feb 2022. DOI: 10.1016/j.future.2021.09.008
  • (2021) Language Translation as a Socio-Technical System: Case-Studies of Mixed-Initiative Interactions. Proceedings of the 4th ACM SIGCAS Conference on Computing and Sustainable Societies, pages 156-172, 28 Jun 2021. DOI: 10.1145/3460112.3471954
  • (2021) Understanding and Mitigating Annotation Bias in Facial Expression Recognition. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 14960-14971, Oct 2021. DOI: 10.1109/ICCV48922.2021.01471
  • (2020) Between Subjectivity and Imposition. Proceedings of the ACM on Human-Computer Interaction 4:CSCW2, pages 1-25, 15 Oct 2020. DOI: 10.1145/3415186
  • (2019) Understanding and Mitigating Worker Biases in the Crowdsourced Collection of Subjective Judgments. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pages 1-12, 2 May 2019. DOI: 10.1145/3290605.3300637
  • (2018) Cognitive Biases in Crowdsourcing. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 162-170, 2 Feb 2018. DOI: 10.1145/3159652.3159654
