A Multi-label Imbalanced Data Classification Method Based on Label Partition Integration

  • Conference paper
Web Information Systems and Applications (WISA 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14094)

Abstract

Multi-label classification problems are widespread in practice, and class imbalance in multi-label data seriously degrades classification performance. Resampling methods are commonly used to address this imbalance. However, they ignore the correlation between labels, so they may introduce new imbalance while changing the distribution of the original dataset, which can decrease classification performance rather than improve it. In addition, the resampling ratio must be set manually, which causes significant fluctuations in classification performance. To address these issues, ESP, a multi-label imbalanced data classification method based on label partition integration, is proposed. ESP divides the dataset into single-label datasets and label-pair datasets without changing its original distribution, and then trains a binary classification model on each of them. Finally, all binary classification models are integrated into a single multi-label classification model. Experimental results show that ESP outperforms five commonly used resampling methods on two common measures, F-Measure and Accuracy.
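The partition-then-integrate idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' ESP implementation: the abstract does not say how a label-pair dataset is built or how the binary models are combined, so the sketch assumes a pairwise construction (keep the instances where exactly one of the two labels is relevant) and a simple vote average. The nearest-centroid base learner and the names partition, fit_ensemble, and predict_ensemble are hypothetical stand-ins.

```python
from itertools import combinations

def partition(X, Y):
    """Split (X, Y) into single-label and label-pair binary datasets.
    Instances are only selected, never duplicated or synthesized."""
    q = len(Y[0])
    single = {j: (X, [row[j] for row in Y]) for j in range(q)}
    pair = {}
    for j, k in combinations(range(q), 2):
        # ASSUMPTION: keep instances where exactly one of the two labels is
        # relevant; the target is 1 when label j is the relevant one.
        rows = [(x, row[j]) for x, row in zip(X, Y) if row[j] != row[k]]
        pair[(j, k)] = ([x for x, _ in rows], [t for _, t in rows])
    return single, pair

def fit_centroid(X, y):
    """Stand-in base learner: one centroid per class that actually occurs."""
    centroids = {}
    for cls in (0, 1):
        pts = [x for x, t in zip(X, y) if t == cls]
        if pts:
            centroids[cls] = [sum(col) / len(pts) for col in zip(*pts)]
    return centroids

def predict_centroid(centroids, x):
    """Return the class of the nearest centroid, or None if untrained."""
    if not centroids:
        return None
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))

def fit_ensemble(X, Y):
    """Train one binary model per single-label and per label-pair dataset."""
    single, pair = partition(X, Y)
    return ({j: fit_centroid(*d) for j, d in single.items()},
            {jk: fit_centroid(*d) for jk, d in pair.items()})

def predict_ensemble(models, x):
    """ASSUMPTION: integrate by averaging the votes each label receives."""
    single_models, pair_models = models
    q = len(single_models)
    votes, counts = [0] * q, [0] * q
    for j, m in single_models.items():
        p = predict_centroid(m, x)
        if p is not None:
            votes[j] += p
            counts[j] += 1
    for (j, k), m in pair_models.items():
        p = predict_centroid(m, x)
        if p is not None:
            votes[j] += p          # p == 1 is a vote for label j
            votes[k] += 1 - p      # p == 0 is a vote for label k
            counts[j] += 1
            counts[k] += 1
    return [int(c > 0 and v / c >= 0.5) for v, c in zip(votes, counts)]

# Toy data: label 0 clusters near the origin, label 1 far away, label 2 is rare.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
Y = [[1, 0, 0], [1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 1]]
models = fit_ensemble(X, Y)
print(predict_ensemble(models, [0.2, 0.3]))
print(predict_ensemble(models, [5.5, 5.5]))
```

With q labels this yields q single-label models and q(q-1)/2 label-pair models. Any binary classifier could replace the centroid learner, and the 0.5 vote threshold is an arbitrary choice made for the sketch.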

Supported by the Fundamental Research Funds for the Central Universities under Grant No.2021QN1075.

Author information

Corresponding author

Correspondence to Zhongbin Sun.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Diao, Y., Sun, Z., Zhou, Y. (2023). A Multi-label Imbalanced Data Classification Method Based on Label Partition Integration. In: Yuan, L., Yang, S., Li, R., Kanoulas, E., Zhao, X. (eds) Web Information Systems and Applications. WISA 2023. Lecture Notes in Computer Science, vol 14094. Springer, Singapore. https://doi.org/10.1007/978-981-99-6222-8_2

  • DOI: https://doi.org/10.1007/978-981-99-6222-8_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-6221-1

  • Online ISBN: 978-981-99-6222-8

  • eBook Packages: Computer Science (R0)
