research-article

Attribute augmentation-based label integration for crowdsourcing

Published: 24 December 2022

Abstract

Crowdsourcing provides an effective and low-cost way to collect labels from crowd workers. Because crowd workers often lack professional knowledge, the quality of crowdsourced labels is relatively low. A common approach to this problem is to collect multiple labels for each instance from different crowd workers and then use a label integration method to infer its true label. However, to our knowledge, almost all existing label integration methods use only the original attribute information and pay no attention to the quality of each instance's multiple noisy label set. To address these two issues, this paper proposes a novel three-stage label integration method called attribute augmentation-based label integration (AALI). In the first stage, we design an attribute augmentation method to enrich the original attribute space. In the second stage, we develop a filter to single out reliable instances with high-quality multiple noisy label sets. In the third stage, we use majority voting to initialize the integrated labels of the reliable instances and then use cross-validation to build multiple component classifiers on the reliable instances to predict the labels of all instances. Experimental results on simulated and real-world crowdsourced datasets demonstrate that AALI outperforms all the other state-of-the-art label integration methods compared.
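
The three stages can be illustrated with a minimal Python sketch. The abstract does not say how attributes are augmented, how the filter scores a label set, or which component classifier is used, so everything specific here is an assumption made for illustration: augmentation appends each class's vote share to the attributes, the filter keeps instances whose majority label holds at least `threshold` of the votes, and the component classifier is a toy nearest-centroid model. The function names (`augment`, `is_reliable`, `aali_sketch`) and the data are hypothetical and are not taken from the paper.

```python
from collections import Counter

def majority_vote(labels):
    """Most frequent label; ties go to the smallest label for determinism."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)

def augment(x, labels, classes):
    """Stage 1 (assumed form): append each class's vote share to the attributes."""
    counts = Counter(labels)
    return list(x) + [counts[c] / len(labels) for c in classes]

def is_reliable(labels, threshold=0.7):
    """Stage 2 (assumed form): the majority label holds >= threshold of the votes."""
    return Counter(labels).most_common(1)[0][1] / len(labels) >= threshold

def fit_centroids(X, y):
    """Toy component classifier: one mean attribute vector per class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def predict_centroid(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(sorted(centroids), key=dist)

def aali_sketch(X, noisy_labels, classes, threshold=0.7, k=3):
    # Stage 1: enrich the attribute space.
    Xa = [augment(x, ls, classes) for x, ls in zip(X, noisy_labels)]
    # Stage 2: keep only instances with a high-quality noisy label set.
    reliable = [i for i, ls in enumerate(noisy_labels) if is_reliable(ls, threshold)]
    if k < 2 or len(reliable) < k:
        raise ValueError("need k >= 2 and at least k reliable instances")
    # Stage 3a: majority voting initializes labels of reliable instances.
    y_init = {i: majority_vote(noisy_labels[i]) for i in reliable}
    # Stage 3b: k-fold split of the reliable instances; the training part
    # of each fold builds one component classifier.
    models = []
    for fold in range(k):
        train = [i for pos, i in enumerate(reliable) if pos % k != fold]
        models.append(fit_centroids([Xa[i] for i in train],
                                    [y_init[i] for i in train]))
    # Stage 3c: every instance is labeled by a vote of the component classifiers.
    return [majority_vote([predict_centroid(m, x) for m in models]) for x in Xa]

# Hypothetical data: six instances, three crowd labels each, two classes.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [1.0, 0.9], [0.9, 1.0], [0.8, 0.9]]
noisy_labels = [[0, 0, 0], [0, 0, 0], [1, 0, 1], [1, 1, 1], [1, 1, 1], [1, 1, 0]]
integrated = aali_sketch(X, noisy_labels, classes=[0, 1])
print(integrated)
```

In this toy run, instances 2 and 5 fail the assumed filter because their majority label holds only two of three votes, so their integrated labels come from the component classifiers rather than from majority voting. Instance 2 is the interesting case: its crowd majority is 1, but its attributes sit close to the class-0 instances, so the classifiers assign it 0.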

Cited By

  • (2024) Learning high-dependence Bayesian network classifier with robust topology. Expert Systems with Applications: An International Journal 239:C. 10.1016/j.eswa.2023.122395. Online publication date: 1-Apr-2024
  • (2023) Instance Weighting-Based Noise Correction for Crowdsourcing. Advanced Intelligent Computing Technology and Applications (285-297). 10.1007/978-981-99-4752-2_24. Online publication date: 10-Aug-2023

Published In

Frontiers of Computer Science: Selected Publications from Chinese Universities, Volume 17, Issue 5
Oct 2023
232 pages
ISSN: 2095-2228
EISSN: 2095-2236

Publisher

Springer-Verlag, Berlin, Heidelberg

Publication History

Published: 24 December 2022
Accepted: 18 August 2022
Received: 15 April 2022

Author Tags

1. crowdsourcing
2. label integration
3. attribute augmentation
4. instance filtering

Qualifiers

• Research-article