
An efficient ensemble learning method based on multi-objective feature selection

Published: 19 September 2024

Abstract

Ensemble learning (EL) boosts model prediction performance across various domains through two main steps: generating individual classifiers (ICs) and combining them. Creating accurate and diverse ICs is crucial for a strong ensemble, while selecting the best ICs, known as ensemble selection (ES), is critical yet challenging because of the accuracy-diversity trade-off and the lack of agreed-upon diversity metrics. This paper introduces an EL strategy that tackles these challenges through multi-objective feature selection (MOFS) and feature-relevance-guided ensemble selection. The approach first applies a hybrid MOFS algorithm to produce accurate and diverse ICs, and then employs a novel knowledge-based, feature-relevance-guided metric for precise diversity assessment during ES. ES is cast as an optimization problem that maximizes both diversity and accuracy, and an efficient ES algorithm is developed to select the optimal subset of ICs. Extensive tests on public datasets and a real-world prediction task demonstrate the effectiveness of the method, especially in achieving high accuracy.
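To make the two-step idea concrete, the following minimal Python sketch illustrates the general pattern the abstract describes; it is not the authors' algorithm. Random feature subsets stand in for the paper's hybrid MOFS step, mean pairwise disagreement stands in for the knowledge-based feature-relevance-guided diversity metric, and a greedy search with an illustrative weight alpha stands in for the paper's ES optimizer. All parameter values and names here are hypothetical.

# Sketch only: two-step ensemble construction in the spirit of the abstract.
import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1 (stand-in for the hybrid MOFS step): each IC is trained on its own
# feature subset, which promotes diversity across the pool.
pool, subsets = [], []
for _ in range(15):
    feats = rng.choice(X.shape[1], size=10, replace=False)
    clf = DecisionTreeClassifier(max_depth=5).fit(X_tr[:, feats], y_tr)
    pool.append(clf)
    subsets.append(feats)

# Validation predictions and per-IC accuracy.
preds = np.array([c.predict(X_val[:, f]) for c, f in zip(pool, subsets)])
acc = (preds == y_val).mean(axis=1)

def disagreement(sel):
    # Mean pairwise disagreement, a common diversity proxy (the paper uses a
    # feature-relevance-guided metric instead).
    if len(sel) < 2:
        return 0.0
    return float(np.mean([(preds[i] != preds[j]).mean()
                          for i, j in combinations(sel, 2)]))

# Step 2 (stand-in for the ES optimizer): greedily grow the sub-ensemble,
# maximizing alpha * accuracy + (1 - alpha) * diversity.
alpha = 0.7  # illustrative trade-off weight
selected = [int(np.argmax(acc))]
while len(selected) < 7:
    scores = [(alpha * acc[k] + (1 - alpha) * disagreement(selected + [k]), k)
              for k in range(len(pool)) if k not in selected]
    selected.append(max(scores)[1])

# Majority vote of the selected ICs on the validation split.
ens_pred = (preds[selected].mean(axis=0) >= 0.5).astype(int)
print("selected ICs:", selected)
print("sub-ensemble validation accuracy:", (ens_pred == y_val).mean())

Under these assumptions the greedy loop mirrors the accuracy-diversity trade-off the abstract describes: with alpha near 1 it degenerates into picking the most accurate ICs, while smaller alpha favors ICs that disagree with the current selection.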



          Information & Contributors

          Information

          Published In

          cover image Information Sciences: an International Journal
          Information Sciences: an International Journal  Volume 679, Issue C
          Sep 2024
          1581 pages

          Publisher

          Elsevier Science Inc.

          United States


          Author Tags

          1. Ensemble learning
          2. Multi-objective feature selection
          3. Ensemble selection
          4. Knowledge-based
          5. Binary state transition algorithm

          Qualifiers

          • Research-article
