Abstract
Multilabel classification has excelled in several distinct fields during the past few decades but still has significant limitations. One critical concern is the lack or insufficient availability of labelled instances; data labelling also demands time and budget. Crowdsourcing overcomes the problem of label availability, yet it has drawbacks of its own, such as uncertain label quality and budget limitations. This paper introduces a multilabel reverse auction framework to address the shortage of crowd workers. Each crowd worker must provide a cost and a confidence for each task in a specific domain. Furthermore, two methods for systematic selection are presented to address insufficient domain coverage within the budget limitation: greedy bid selection and multi-cover bid selection. Both approaches choose the least expensive crowd workers while considering worker expertise and domain coverage. Crowd versions of binary relevance and multilabel k-nearest neighbours are also introduced to support label aggregation and reduce the impact of low-quality workers while taking the domain into account. An experimental study on seven multilabel datasets with diverse crowds shows the effectiveness of the approach, which delivers an improvement of more than 16% over a baseline of random selection with majority voting. The proposed method is also compared against five benchmark algorithms and gives promising results when few data and workers are available.
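The abstract does not spell out the selection procedure, so the following is only a rough sketch of how a greedy, budget-limited choice of bids could work. The `Bid` fields, the confidence-per-cost score, and the rule of taking one bid per uncovered domain are illustrative assumptions, not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    # All field names are hypothetical; the paper only says that each
    # worker provides a cost and a confidence per task and domain.
    worker: str
    domain: str
    cost: float        # assumed to be strictly positive
    confidence: float  # assumed to lie in [0, 1]


def greedy_bid_selection(bids, domains, budget):
    """Cover domains one at a time with the best affordable bid.

    Assumption: "best" means highest confidence per unit cost.
    Returns the chosen bids, the amount spent, and any domains
    left uncovered once the budget no longer admits a bid.
    """
    selected = []
    spent = 0.0
    uncovered = set(domains)
    while uncovered:
        # Only bids that cover a still-uncovered domain and fit the budget.
        candidates = [
            b for b in bids
            if b.domain in uncovered and spent + b.cost <= budget
        ]
        if not candidates:
            break
        best = max(candidates, key=lambda b: b.confidence / b.cost)
        selected.append(best)
        spent += best.cost
        uncovered.discard(best.domain)
    return selected, spent, uncovered


# Toy data, invented purely for illustration.
bids = [
    Bid("w1", "music", 2.0, 0.9),
    Bid("w2", "music", 1.0, 0.6),
    Bid("w3", "image", 3.0, 0.9),
    Bid("w4", "text", 4.0, 0.8),
]
selected, spent, uncovered = greedy_bid_selection(
    bids, ["music", "image", "text"], budget=5.0
)
print([b.worker for b in selected], spent, uncovered)
```

With this toy input the budget runs out before the `text` domain is covered, which is the kind of insufficient domain coverage that a multi-cover variant would presumably target.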
Data availability
All data generated or analyzed during this study are included in Madjarov et al. (2012), "An extensive experimental comparison of methods for multilabel learning" (listed in the References).
References
Howe J et al (2006) The rise of crowdsourcing. Wired magazine 14:1–4
LaToza TD, van der Hoek A (2016) Crowdsourcing in software engineering: models, motivations, and challenges. IEEE Softw 33:74–80. https://doi.org/10.1109/MS.2016.12
Lease M, Yilmaz E (2012) Crowdsourcing for information retrieval. ACM SIGIR Forum 45:66–75. https://doi.org/10.1145/2093346.2093356
Muller CL, Chapman L, Johnston S, Kidd C, Illingworth S, Foody G, Overeem A, Leigh RR (2015) Crowdsourcing for climate and atmospheric sciences: current status and future potential. Int J Climatol 35:3185–3203. https://doi.org/10.1002/joc.4210
Xu Z, Liu Y, Yen NY, Mei L, Luo X, Wei X, Hu C (2020) Crowdsourcing based description of urban emergency events using social media big data. IEEE Trans Cloud Comput 8:387–397. https://doi.org/10.1109/TCC.2016.2517638
Mohammadzadeh H, Gharehchopogh FS (2021) A multi-agent system based for solving high-dimensional optimization problems: a case study on email spam detection. Int J Commun Syst. https://doi.org/10.1002/dac.4670
Vuurens J, de Vries AP, Eickhoff C How Much Spam Can You Take? An Analysis of Crowdsourcing Results to Increase Accuracy
Zhong J, Tang K, Zhou Z-H Active Learning from Crowds with Unsure Option
Rubin TN, Chambers A, Smyth P, Steyvers M (2012) Statistical topic models for multilabel document classification. Mach Learn 88:157–208. https://doi.org/10.1007/s10994-011-5272-5
Gharehchopogh FS, Namazi M, Ebrahimi L, Abdollahzadeh B (2023) Advances in sparrow search algorithm: a comprehensive survey. Arch Comput Methods Eng 30:427–455. https://doi.org/10.1007/s11831-022-09804-w
Gharehchopogh FS, Ucan A, Ibrikci T, Arasteh B, Isik G (2023) Slime mould algorithm: a comprehensive survey of its variants and applications. Arch Comput Methods Eng 30:2683–2723. https://doi.org/10.1007/s11831-023-09883-3
Shen Y, Zhang C, Soleimanian Gharehchopogh F, Mirjalili S (2023) An improved whale optimization algorithm based on multi-population evolution for global optimization and engineering design problems. Expert Syst Appl 215:119269. https://doi.org/10.1016/j.eswa.2022.119269
Suyal H, Singh A (2021) Improving multilabel classification in prototype selection scenario. Comput Intell Healthcare Inf 103–119
Rabby G, Berka P (2022) Multi-class classification of COVID-19 documents using machine learning algorithms. J Intell Inf Syst. https://doi.org/10.1007/s10844-022-00768-8
Lo H-Y, Wang J-C, Wang H-M, Lin S-D (2011) Cost-sensitive multilabel learning for audio tag annotation and retrieval. IEEE Trans Multimedia 13:518–529. https://doi.org/10.1109/TMM.2011.2129498
Gharehchopogh FS (2023) An improved Harris Hawks optimization algorithm with multi-strategy for community detection in social network. J Bionic Eng 20:1175–1197. https://doi.org/10.1007/s42235-022-00303-z
Tsoumakas G, Katakis I (2007) Multi-label classification. Int J Data Warehouse Min 3:1–13. https://doi.org/10.4018/jdwm.2007070101
Lughofer E (2022) Evolving multilabel fuzzy classifier. Inf Sci 597:1–23. https://doi.org/10.1016/j.ins.2022.03.045
Mishra NK, Singh PK (2022) Linear ordering problem based classifier chain using genetic algorithm for multilabel classification. Appl Soft Comput 117:108395
Loza Mencía E, Park S-H, Fürnkranz J (2010) Efficient voting prediction for pairwise multilabel classification. Neurocomputing 73:1164–1176. https://doi.org/10.1016/j.neucom.2009.11.024
Trohidis K, Tsoumakas G, Kalliris G, Vlahavas I (2011) Multilabel classification of music by emotion. EURASIP J Audio Speech Music Process 2011:4. https://doi.org/10.1186/1687-4722-2011-426793
Yap XH, Raymer M (2021) Multilabel classification and label dependence in in silico toxicity prediction. Toxicol Vitro 74:105157. https://doi.org/10.1016/j.tiv.2021.105157
Huang J, Li G, Huang Q, Wu X (2016) Learning label-specific features and class-dependent labels for multilabel classification. IEEE Trans Knowl Data Eng 28:3309–3323. https://doi.org/10.1109/TKDE.2016.2608339
Zhao T, Zhang Y, Miao D, Pedrycz W (2022) Selective label enhancement for multilabel classification based on three-way decisions. Int J Approximate Reason 150:172–187. https://doi.org/10.1016/j.ijar.2022.08.008
Zhu X, Li J, Ren J, Wang J, Wang G (2023) Dynamic ensemble learning for multilabel classification. Inf Sci 623:94–111. https://doi.org/10.1016/j.ins.2022.12.022
Li G, Wang J, Zheng Y, Franklin MJ (2016) Crowdsourced Data Management: a Survey. IEEE Trans Knowl Data Eng 28:2296–2319. https://doi.org/10.1109/TKDE.2016.2535242
Tong Y, Zhou Z, Zeng Y, Chen L, Shahabi C (2020) Spatial crowdsourcing: a survey. VLDB J 29:217–250. https://doi.org/10.1007/s00778-019-00568-7
Allahbakhsh M, Benatallah B, Ignjatovic A, Motahari-Nezhad HR, Bertino E, Dustdar S (2013) Quality control in crowdsourcing systems: issues and directions. IEEE Internet Comput 17:76–81. https://doi.org/10.1109/MIC.2013.20
Yadav A, Mishra S, Sairam AS (2022) A multi-objective worker selection scheme in crowdsourced platforms using NSGA-II. Expert Syst Appl 201:116991. https://doi.org/10.1016/j.eswa.2022.116991
Wu G, Chen Z, Liu J, Han D, Qiao B (2021) Task assignment for social-oriented crowdsourcing. Front Comput Sci 15:152316. https://doi.org/10.1007/s11704-019-9119-8
Abdullah NA, Rahman MM, Rahman MdM, Ghauth KI (2020) A Framework for optimal worker selection in spatial crowdsourcing using Bayesian network. IEEE Access 8:120218–120233. https://doi.org/10.1109/ACCESS.2020.3005543
Hu Q, He Q, Huang H, Chiew K, Liu Z (2016) A formalized framework for incorporating expert labels in crowdsourcing environment. J Intell Inf Syst 47:403–425. https://doi.org/10.1007/s10844-015-0371-6
Wang Y, Gao Y, Li Y, Tong X (2020) A worker-selection incentive mechanism for optimizing platform-centric mobile crowdsourcing systems. Comput Networks 171:107144. https://doi.org/10.1016/j.comnet.2020.107144
Dang D, Liu Y, Zhang X, Huang S (2016) A crowdsourcing worker quality evaluation algorithm on mapreduce for big data applications. IEEE Trans Parallel Distrib Syst 27:1879–1888. https://doi.org/10.1109/TPDS.2015.2457924
Fang Y, Sun H, Li G, Zhang R, Huai J (2018) Context-aware result inference in crowdsourcing. Inf Sci 460–461:346–363. https://doi.org/10.1016/j.ins.2018.05.050
Yuen M-C, King I, Leung K-S (2021) Temporal context-aware task recommendation in crowdsourcing systems. Knowl Based Syst 219:106770. https://doi.org/10.1016/j.knosys.2021.106770
Padmanabhan D, Bhat S, Shevade S, Narahari Y (2016) Topic Model Based Multilabel Classification. In: 2016 IEEE 28th international conference on tools with artificial intelligence (ICTAI). IEEE, pp 996–1003
Davtyan M, Eickhoff C, Hofmann T (2015) Exploiting document content for efficient aggregation of crowdsourcing votes. In: Proceedings of the 24th ACM international on conference on information and knowledge management. ACM, New York, NY, USA, pp 783–790
Zhang J, Wu M, Zhou C, Sheng VS (2022) Active crowdsourcing for multilabel annotation. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2022.3194022
Gui X, Lu X, Yu G (2021) Cost-effective batch-mode multilabel active learning. Neurocomputing 463:355–367
Li S-Y, Jiang Y, Chawla NV, Zhou Z-H (2019) Multilabel Learning from Crowds. IEEE Trans Knowl Data Eng 31:1369–1382. https://doi.org/10.1109/TKDE.2018.2857766
Chen Z, Jiang L, Li C (2022) Label augmented and weighted majority voting for crowdsourcing. Inf Sci 606:397–409. https://doi.org/10.1016/j.ins.2022.05.066
Yu G, Tu J, Wang J, Domeniconi C, Zhang X (2021) Active multilabel crowd consensus. IEEE Trans Neural Netw Learn Syst 32:1448–1459. https://doi.org/10.1109/TNNLS.2020.2984729
Adamska P, Juźwin M, Wierzbicki A (2020) Picking peaches or squeezing lemons: selecting crowdsourcing workers for reducing cost of redundancy. pp 510–523
Haruna CR, Hou M, Eghan MJ, Kpiebaareh MY, Tandoh L (2019) An effective and cost-based framework for a qualitative hybrid data deduplication. pp 511–520
Shen S, Ji M, Wu Z, Yang X (2022) An optimization approach for worker selection in crowdsourcing systems. Comput Ind Eng 173:108730. https://doi.org/10.1016/j.cie.2022.108730
Bernstein MS, Brandt J, Miller RC, Karger DR (2011) Crowds in two seconds. In: Proceedings of the 24th annual ACM symposium on User interface software and technology-UIST '11. ACM Press, New York, p 33
Itoh Y, Matsubara S (2021) Adaptive budget allocation for cooperative task solving in crowdsourcing. In: 2021 IEEE international conference on big data (big data). IEEE, pp 3525–3533
Gao H, Liu CH, Tang J, Yang D, Hui P, Wang W (2019) Online quality-aware incentive mechanism for mobile crowd sensing with extra bonus. IEEE Trans Mob Comput 18:2589–2603. https://doi.org/10.1109/TMC.2018.2877459
Vazirani VV (2001) Approximation algorithms. Springer, Berlin
Boutell MR, Luo J, Shen X, Brown CM (2004) Learning multi-label scene classification. Pattern Recognit 37:1757–1771. https://doi.org/10.1016/j.patcog.2004.03.009
Zhang M-L, Zhou Z-H (2007) ML-KNN: A lazy learning approach to multilabel learning. Pattern Recognit 40:2038–2048. https://doi.org/10.1016/j.patcog.2006.12.019
Kim H-C, Ghahramani Z (2012) Bayesian classifier combination. In: Artificial Intelligence and Statistics. pp 619–627
Kim H, Ghahramani Z (2003) The EM-EP algorithm for Gaussian process classification. In: Proceedings of the workshop on probabilistic graphical models for classification at ECML
Kwok JT-Y (1999) Moderating the outputs of support vector machine classifiers. IEEE Trans Neural Netw 10:1018–1031. https://doi.org/10.1109/72.788642
Madjarov G, Kocev D, Gjorgjevikj D, Džeroski S (2012) An extensive experimental comparison of methods for multilabel learning. Pattern Recognit 45:3084–3104. https://doi.org/10.1016/j.patcog.2012.03.004
Johnson NL, Kotz S, Balakrishnan N (1995) Continuous univariate distributions, volume 2. Wiley, Hoboken
Acknowledgements
Not Applicable.
Funding
The authors received no funding for this research.
Contributions
Both authors contributed equally.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethical approval
Not Applicable.
Consent to participate
Not Applicable.
Consent for publication
This submission has not been previously published, and all co-authors agree to its publication.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Suyal, H., Singh, A. Multilabel classification using crowdsourcing under budget constraints. Knowl Inf Syst 66, 841–877 (2024). https://doi.org/10.1007/s10115-023-01973-9
DOI: https://doi.org/10.1007/s10115-023-01973-9