Abstract
This paper considers a category of classification problems in which the classes are not represented equally. Such imbalanced problems arise in a variety of application areas and have been widely studied in pattern recognition. To enhance the original data representation, this paper combines a gravitation-based method with the multiple empirical kernel approach and proposes a sample-level method called gravitational balanced multiple kernel learning (GBMKL). GBMKL uses a gravity strategy to generate gravitation balanced midpoint samples (GBMS) located on the classification boundary; meanwhile, the classification boundary is rectified by the nearest neighbors of the boundary (NNB) samples, which can improve generalization performance. We further design two regularization terms, corresponding to GBMS and NNB, to avoid overfitting. During training and testing, the samples are mapped into multiple empirical kernel spaces to obtain a richer data representation. We conduct extensive computational experiments on 54 imbalanced datasets, including both artificial and real-world datasets selected from the Knowledge Extraction based on Evolutionary Learning (KEEL) datasets. The experimental results show the advantages of the proposed GBMKL approach for imbalanced classification problems. In addition, a parameter analysis of the two regularization terms confirms their positive impact on classification performance.
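The mapping of samples into multiple empirical kernel spaces can be illustrated with a minimal sketch. It uses the standard empirical kernel mapping built from the eigendecomposition of the training kernel matrix and is not taken from the paper's implementation. The Gaussian kernel, the three width values, the eigenvalue tolerance and all function names are assumptions chosen for illustration. The GBMS and NNB steps are not shown, since the abstract does not specify them.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_empirical_kernel_map(X_train, gamma, tol=1e-10):
    """Return a function mapping samples into one empirical kernel space.

    With K = Q diag(lam) Q^T on the training set, a sample x is mapped to
    diag(lam)^(-1/2) Q^T [k(x, x_1), ..., k(x, x_N)]^T, keeping only
    eigenvalues above a relative tolerance (tol is an assumed value).
    """
    K = rbf_kernel(X_train, X_train, gamma)
    eigvals, eigvecs = np.linalg.eigh(K)          # K is symmetric
    keep = eigvals > tol * eigvals.max()
    P = eigvecs[:, keep] / np.sqrt(eigvals[keep])  # shape (N, r)

    def transform(X):
        return rbf_kernel(X, X_train, gamma) @ P

    return transform

# Hypothetical data: 20 samples with 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# One empirical kernel space per (assumed) kernel width.
gammas = (0.1, 1.0, 10.0)
maps = [fit_empirical_kernel_map(X, g) for g in gammas]
Z = [m(X) for m in maps]  # explicit feature vectors in each space
```

In each space, inner products of the mapped training samples reproduce the corresponding kernel matrix up to the discarded eigenvalues, so a linear learner can be trained on the explicit vectors. The same `transform` functions are applied to test samples.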
Acknowledgements
This work is supported by Shanghai Science and Technology Program “Distributed and generative few-shot algorithm and theory research” under Grant No. 20511100600, Natural Science Foundation of China under Grant No. 62076094, Shanghai Science and Technology Program “Federated based cross-domain and cross-task incremental learning” under Grant No. 21511100800, Natural Science Foundations of China under Grant No. 61806078, and National Science Foundation of China for Distinguished Young Scholars under Grant No. 61725301.
Ethics declarations
Conflict of interest
We declare that the manuscript has been approved by all authors for publication, and that no conflict of interest exists in its submission.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Yang, M., Wang, Z., Li, Y. et al. Gravitation balanced multiple kernel learning for imbalanced classification. Neural Comput & Applic 34, 13807–13823 (2022). https://doi.org/10.1007/s00521-022-07187-4
DOI: https://doi.org/10.1007/s00521-022-07187-4