Abstract
Multi-label classification problems are widespread in practice, and class imbalance in the data seriously degrades classification performance. Resampling methods are currently used to address imbalance in multi-label data. However, they ignore the correlation between labels, so changing the distribution of the original dataset may introduce new imbalance and reduce classification performance rather than improve it. In addition, the resampling ratio must be set manually, which causes significant fluctuations in classification performance. To address these issues, ESP, a multi-label imbalanced data classification method based on label partition integration, is proposed. ESP divides the dataset into single-label datasets and label-pair datasets without changing its original distribution, and then learns a binary classification model from each of these datasets. Finally, all binary classification models are integrated into a single multi-label classification model. Experimental results show that ESP outperforms five commonly used resampling methods on two common measures, F-Measure and Accuracy.
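The partition-then-integrate idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how label-pair datasets are built, which base learner is used, or how the models are combined. The sketch assumes that a label-pair dataset keeps the instances where exactly one of the two labels is set, uses a nearest-centroid stand-in as the binary learner, and integrates by majority vote. All function names (`partition_labels`, `fit_centroid`, `predict_centroid`, `fit_ensemble`, `predict_ensemble`) are hypothetical.

```python
from itertools import combinations


def partition_labels(X, Y):
    """Split a multi-label dataset into binary datasets without resampling.

    X is a list of feature vectors, Y a list of 0/1 label vectors.
    """
    n_labels = len(Y[0])
    # One binary dataset per label: every instance is kept unchanged.
    single = {j: (X, [y[j] for y in Y]) for j in range(n_labels)}
    # ASSUMPTION: a label-pair dataset keeps only instances where exactly
    # one of the two labels is set; the target is 1 when it is the first.
    pairs = {}
    for a, b in combinations(range(n_labels), 2):
        rows = [(x, y[a]) for x, y in zip(X, Y) if y[a] != y[b]]
        if rows:
            pairs[(a, b)] = ([x for x, _ in rows], [t for _, t in rows])
    return single, pairs


def fit_centroid(X, t):
    """Stand-in binary learner (assumption): one centroid per class."""
    centroids = {}
    for cls in (0, 1):
        rows = [x for x, ti in zip(X, t) if ti == cls]
        if rows:
            centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Return the class whose centroid is closest to x."""
    def sq_dist(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))


def fit_ensemble(X, Y):
    """Train one binary model per single-label and per label-pair dataset."""
    single, pairs = partition_labels(X, Y)
    return ({j: fit_centroid(*d) for j, d in single.items()},
            {p: fit_centroid(*d) for p, d in pairs.items()})


def predict_ensemble(models, x, n_labels):
    """ASSUMPTION: integrate by majority vote over all models touching a label."""
    single_models, pair_models = models
    votes = [[predict_centroid(single_models[j], x)] for j in range(n_labels)]
    for (a, b), model in pair_models.items():
        p = predict_centroid(model, x)
        votes[a].append(p)       # 1 means "label a rather than label b"
        votes[b].append(1 - p)
    return [1 if 2 * sum(v) >= len(v) else 0 for v in votes]


# Toy data: two well-separated clusters, one label each.
X_demo = [[0, 0], [0, 1], [5, 5], [5, 6]]
Y_demo = [[1, 0], [1, 0], [0, 1], [0, 1]]
demo_models = fit_ensemble(X_demo, Y_demo)
demo_prediction = predict_ensemble(demo_models, [0, 0.2], n_labels=2)  # [1, 0]
```

Because no instance is duplicated or discarded in the single-label datasets, the original distribution is left intact and no resampling ratio has to be chosen; any binary classifier could replace the centroid stand-in.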
Supported by the Fundamental Research Funds for the Central Universities under Grant No.2021QN1075.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Diao, Y., Sun, Z., Zhou, Y. (2023). A Multi-label Imbalanced Data Classification Method Based on Label Partition Integration. In: Yuan, L., Yang, S., Li, R., Kanoulas, E., Zhao, X. (eds) Web Information Systems and Applications. WISA 2023. Lecture Notes in Computer Science, vol 14094. Springer, Singapore. https://doi.org/10.1007/978-981-99-6222-8_2
DOI: https://doi.org/10.1007/978-981-99-6222-8_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6221-1
Online ISBN: 978-981-99-6222-8
eBook Packages: Computer Science, Computer Science (R0)