A Tri-Stage Wrapper-Filter Feature Selection Framework for Disease Classification
Figure 1. Flowchart of our proposed tri-phase hybrid wrapper-filter feature selection method.
Figure 2. Comparison of accuracies obtained on four disease datasets using KNN, SVM, and NB classifiers without any feature selection.
Figure 3. Comparison of accuracies, number of features, and computational time obtained on four disease datasets using our proposed tri-stage wrapper-filter feature selection method.
Figure 4. Comparison of number of features obtained by each phase of our proposed tri-stage wrapper-filter feature selection method for all the four disease datasets considering the highest accuracy achieved in each phase.
Abstract
1. Introduction
Contributions
2. Literature Survey
2.1. Arrhythmia
2.2. Leukemia
2.3. DLBCL
2.4. Prostate Cancer
2.5. Motivation
3. Materials and Methods
3.1. Phase 1
3.1.1. Ranker Methods Used
- MI
- CS
- RFF
- XV
3.1.2. Classification Algorithms Used
- KNN
- SVM
- NB
- XGBoost
3.2. Phase 2
3.3. Phase 3
3.3.1. Whale Optimization Algorithm
3.3.2. Exploitation Phase
3.3.3. Exploration Phase
Algorithm 1. Algorithm of WOA
Input: Number of whales (n), Max_Iter
Output: Prey or the fittest whale
Initialize the whale population X_i (i = 1, 2, ..., n)
Calculate the objective value of each solution
while (t < Max_Iter)
    for each solution
        if (p < 0.5)
            if (|A| < 1)
                Update the current solution's position by Equation (5)
            else if (|A| ≥ 1)
                Select a random solution (X_rand)
                Use Equation (12)
            end if
        else if (p ≥ 0.5)
            Update the current solution's position by Equation (9)
        end if
    end for
    Check whether there is any solution present beyond the search space and update it
    Calculate each solution's fitness
end while
return X*
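The control flow of Algorithm 1 can be sketched in Python as follows. This is only a minimal, continuous-valued illustration of the standard WOA of Mirjalili and Lewis, not the binary feature-selection variant evaluated in this paper. The function name `woa_minimize`, the toy `sphere` objective, the branch conditions (p < 0.5, |A| < 1), the spiral constant b = 1, the use of one scalar A and C per whale, and the clipping of out-of-range positions are all assumptions made for the example.

```python
import math
import random


def sphere(x):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def woa_minimize(objective, dim, bounds, n_whales=20, max_iter=100, seed=0):
    """Minimal continuous WOA sketch; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial population and its objective values.
    whales = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_whales)]
    fitness = [objective(w) for w in whales]
    best_idx = min(range(n_whales), key=fitness.__getitem__)
    best, best_fit = whales[best_idx][:], fitness[best_idx]
    b = 1.0  # spiral shape constant (assumed value)

    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter  # decreases linearly from 2 towards 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            l_spiral = rng.uniform(-1.0, 1.0)
            p = rng.random()
            x = whales[i]
            if p < 0.5:
                # |A| < 1: encircle the best solution (exploitation);
                # otherwise move relative to a random whale (exploration).
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [ref[d] - A * abs(C * ref[d] - x[d]) for d in range(dim)]
            else:
                # Spiral update around the best solution.
                new = [
                    abs(best[d] - x[d]) * math.exp(b * l_spiral)
                    * math.cos(2.0 * math.pi * l_spiral) + best[d]
                    for d in range(dim)
                ]
            # Pull any coordinate that left the search space back to the bounds.
            whales[i] = [min(max(v, lo), hi) for v in new]
        # Re-evaluate fitness and keep the fittest whale seen so far.
        for i in range(n_whales):
            fitness[i] = objective(whales[i])
            if fitness[i] < best_fit:
                best, best_fit = whales[i][:], fitness[i]
    return best, best_fit


if __name__ == "__main__":
    position, value = woa_minimize(sphere, dim=5, bounds=(-5.0, 5.0))
    print(position, value)
```

For wrapper feature selection, the objective would instead score a candidate feature subset with a classifier, and positions would be mapped to binary masks; both steps are outside this sketch.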
3.4. Dataset Details
4. Results and Discussion
4.1. Parameter Tuning
4.2. Experimental Outcomes and Analysis
4.3. Comparison with State-of-the-Art Methods
4.4. Statistical Significance Test
4.5. Results on Other UCI Datasets
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Ghosh, M.; Guha, R.; Singh, P.K.; Bhateja, V.; Sarkar, R. A histogram based fuzzy ensemble technique for feature selection. Evol. Intell. 2019, 12, 713–724.
- Ghosh, K.K.; Ahmed, S.; Singh, P.K.; Geem, Z.W.; Sarkar, R. Improved Binary Sailfish Optimizer Based on Adaptive β-Hill Climbing for Feature Selection. IEEE Access 2020, 8, 83548–83560.
- Duval, B.; Hao, J.-K.; Hernandez, J.C.H. A memetic algorithm for gene selection and molecular classification of cancer. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO '09, Montreal, QC, Canada, 8–12 July 2009; pp. 201–208.
- Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28.
- Lu, H.; Chen, J.; Yan, K.; Jin, Q.; Xue, Y.; Gao, Z. A hybrid feature selection algorithm for gene expression data classification. Neurocomputing 2017, 256, 56–62.
- Arrhythmia. Available online: https://www.nhlbi.nih.gov/health-topics/arrhythmia (accessed on 13 August 2021).
- Ophthalmologic Manifestations of Leukemias. Available online: https://emedicine.medscape.com/article/1201870-overview#a6 (accessed on 30 April 2021).
- Filippini, T.; Heck, J.; Malagoli, C.; Del Giovane, C.; Vinceti, M. A Review and Meta-Analysis of Outdoor Air Pollution and Risk of Childhood Leukemia. J. Environ. Sci. Health Part C 2015, 33, 36–66.
- Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018, 68, 394–424.
- Available online: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6497009/ (accessed on 17 August 2021).
- Xu, S.S.; Mak, M.W.; Cheung, C.C. Deep neural networks versus support vector machines for ECG arrhythmia classification. In Proceedings of the IEEE International Conference on Multimedia and Expo Workshops (ICMEW), Hong Kong, China, 10–14 July 2017; pp. 127–132.
- Singh, N.; Singh, P. Cardiac arrhythmia classification using machine learning techniques. In Engineering Vibration, Communication and Information Processing; Ray, K., Sharan, S., Rawat, S., Jain, S., Srivastava, S., Bandyopadhyay, A., Eds.; Springer: Singapore, 2019; Volume 478.
- Sahebi, G.; Movahedi, P.; Ebrahimi, M.; Pahikkala, T.; Plosila, J.; Tenhunen, H. GeFeS: A generalized wrapper feature selection approach for optimizing classification performance. Comput. Biol. Med. 2020, 125, 103974.
- Cui, X.; Li, Y.; Fan, J.; Wang, T.; Zheng, Y. A Hybrid Improved Dragonfly Algorithm for Feature Selection. IEEE Access 2020, 8, 155619–155629.
- Kadam, V.; Jadhav, S.; Yadav, S. Bagging based ensemble of Support Vector Machines with improved elitist GA-SVM features selection for cardiac arrhythmia classification. Int. J. Hybrid Intell. Syst. 2020, 16, 25–33.
- Wang, T.; Chen, P.; Bao, T.; Li, J.; Yu, X. Arrhythmia Classification Algorithm based on SMOTE and Feature Selection. IJPE 2021, 17, 263.
- Wang, Y.; Yang, X.-G.; Lu, Y. Informative gene selection for microarray classification via adaptive elastic net with conditional mutual information. Appl. Math. Model. 2019, 71, 286–297.
- Sun, L.; Wang, L.; Xu, J.; Zhang, S. A Neighborhood Rough Sets-Based Attribute Reduction Method Using Lebesgue and Entropy Measures. Entropy 2019, 21, 138.
- Khamees, M.; Rashed, A.A.-B. Hybrid SCA-CS optimization algorithm for feature selection in classification problems. AIP Conf. Proc. 2020, 2290, 040001.
- Kilicarslan, S.; Adem, K.; Celik, M. Diagnosis and classification of cancer using hybrid model based on ReliefF and convolutional neural network. Med. Hypotheses 2020, 137, 109577.
- Santhakumar, D.; Logeswari, S. Hybrid ant lion mutated ant colony optimizer technique for Leukemia prediction using microarray gene data. J. Ambient Intell. Humaniz. Comput. 2020, 12, 2965–2973.
- Sheikhpour, R.; Fazli, R.; Mehrabani, S. Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method. Iran. J. Pediatr. Hematol. Oncol. 2021.
- Zhou, P.; Hu, X.; Li, P.; Wu, X. Online feature selection for high dimensional class-imbalanced data. Knowl.-Based Syst. 2017, 136, 187–199.
- Kang, C.; Huo, Y.; Xin, L.; Tian, B.; Yu, B. Feature selection and tumor classification for microarray data using relaxed Lasso and generalized multi-class support vector machine. J. Theor. Biol. 2018, 463, 77–91.
- Yan, C.; Ma, J.; Luo, H.; Patel, A. Hybrid binary coral reefs optimization algorithm with simulated annealing for feature selection in high dimensional biomedical datasets. Chemom. Intell. Lab. Syst. 2019, 184, 102–111.
- Bir-Jmel, A.; Douiri, S.M.; Elbernoussi, S. Gene Selection via a New Hybrid Ant Colony Optimization Algorithm for Cancer Classification in High-Dimensional Data. Comput. Math. Methods Med. 2019, 2019, 7828590.
- Alirezanejad, M.; Enayatifar, R.; Motameni, H.; Nematzadeh, H. Heuristic filter feature selection methods for medical datasets. Genomics 2019, 112, 1173–1181.
- Liu, X.-Y.; Liang, Y.; Wang, S.; Yang, Z.-Y.; Ye, H.-S. A Hybrid Genetic Algorithm with Wrapper-Embedded Approaches for Feature Selection. IEEE Access 2018, 6, 22863–22874.
- Prabhakar, S.K.; Lee, S.-W. Transformation Based Tri-Level Feature Selection Approach Using Wavelets and Swarm Computing for Prostate Cancer Classification. IEEE Access 2020, 8, 127462–127476.
- Cahyaningrum, K.; Adiwijaya; Astuti, W. Microarray gene expression classification for cancer detection using artificial neural networks and genetic algorithm hybrid intelligence. In Proceedings of the International Conference on Data Science and Its Applications (ICoDSA), Bandung, Indonesia, 5–6 August 2020; pp. 1–7.
- Deng, X.; Li, M.; Deng, S.; Wang, L. Hybrid gene selection approach using XGBoost and multi-objective genetic algorithm for cancer classification. arXiv 2021, arXiv:2106.05841.
- De Lima, M.D.; Lima, J.D.O.R.E.; Barbosa, R.M. Medical data set classification using a new feature selection algorithm combined with twin-bounded support vector machine. Med. Biol. Eng. Comput. 2020, 58, 519–528.
- Chatterjee, B.; Bhattacharyya, T.; Ghosh, K.K.; Singh, P.K.; Geem, Z.W.; Sarkar, R. Late Acceptance Hill Climbing Based Social Ski Driver Algorithm for Feature Selection. IEEE Access 2020, 8, 75393–75408.
- Ghosh, K.K.; Singh, P.K.; Hong, J.; Geem, Z.W.; Sarkar, R. Binary Social Mimic Optimization Algorithm With X-Shaped Transfer Function for Feature Selection. IEEE Access 2020, 8, 97890–97906.
- Chatterjee, I.; Ghosh, M.; Singh, P.K.; Sarkar, R.; Nasipuri, M. A Clustering-based feature selection framework for handwritten Indic script classification. Expert Syst. 2019, 36.
- Guha, R.; Ghosh, M.; Singh, P.K.; Sarkar, R.; Nasipuri, M. A Hybrid Swarm and Gravitation-based feature selection algorithm for handwritten Indic script classification problem. Complex Intell. Syst. 2021, 1–17.
- Saha, S.; Ghosh, M.; Ghosh, S.; Sen, S.; Singh, P.K.; Geem, Z.W.; Sarkar, R. Feature Selection for Facial Emotion Recognition Using Cosine Similarity-Based Harmony Search Algorithm. Appl. Sci. 2020, 10, 2816.
- Dey, A.; Chattopadhyay, S.; Singh, P.K.; Ahmadian, A.; Ferrara, M.; Sarkar, R. A Hybrid Meta-Heuristic Feature Selection Method Using Golden Ratio and Equilibrium Optimization Algorithms for Speech Emotion Recognition. IEEE Access 2020, 8, 200953–200970.
- Guha, S.; Das, A.; Singh, P.K.; Ahmadian, A.; Senu, N.; Sarkar, R. Hybrid Feature Selection Method Based on Harmony Search and Naked Mole-Rat Algorithms for Spoken Language Identification from Audio Signals. IEEE Access 2020, 8, 182868–182887.
- Das, A.; Guha, S.; Singh, P.K.; Ahmadian, A.; Senu, N.; Sarkar, R. A Hybrid Meta-Heuristic Feature Selection Method for Identification of Indian Spoken Languages from Audio Signals. IEEE Access 2020, 8, 181432–181449.
- Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69.
- Ghosh, M.; Adhikary, S.; Ghosh, K.K.; Sardar, A.; Begum, S.; Sarkar, R. Genetic algorithm based cancerous gene identification from microarray data using ensemble of filter methods. Med. Biol. Eng. Comput. 2018, 57, 159–176.
- Kira, K.; Rendell, L.A. A practical approach to feature selection. In Proceedings of the Ninth International Workshop on Machine Learning, Aberdeen, Scotland, 1–3 July 1992; pp. 249–256, ISBN 978-1-55860-247-2.
- Fix, E.; Hodges, J.L. Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties. Int. Stat. Rev. 1989, 57, 238.
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction with 200 Full-Color Illustrations; Springer: New York, NY, USA, 2001; ISBN 0-387-95284-5.
- Understanding XGBoost Algorithm|What Is XGBoost Algorithm? Available online: https://www.mygreatlearning.com/blog/xgboost-algorithm (accessed on 30 March 2021).
- Mirjalili, S.; Lewis, A. The Whale Optimization Algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
- Tubishat, M.; Abushariah, M.; Idris, N.; Aljarah, I. Improved whale optimization algorithm for feature selection in Arabic sentiment analysis. Appl. Intell. 2018, 49, 1688–1707.
- Hussien, A.G.; Hassanien, A.E.; Houssein, E.; Bhattacharyya, S.; Amin, M. S-Shaped Binary Whale Optimization Algorithm for Feature Selection; Springer: Singapore, 2018; pp. 79–87.
- Mafarja, M.; Mirjalili, S. Whale optimization approaches for wrapper feature selection. Appl. Soft Comput. 2018, 62, 441–453.
- Arrhythmia Data Set. Available online: https://archive.ics.uci.edu/ml/datasets/arrhythmia (accessed on 28 April 2021).
- Data set name: leukemia. Available online: https://file.biolab.si/biolab/supp/bi-cancer/projections/info/leukemia.html (accessed on 28 April 2021).
- Data set name: DLBCL. Available online: https://file.biolab.si/biolab/supp/bi-cancer/projections/info/DLBCL.html (accessed on 28 April 2021).
- Data set name: Prostate. Available online: https://file.biolab.si/biolab/supp/bi-cancer/projections/info/prostata.html (accessed on 28 April 2021).
- Urbanowicz, R.J.; Olson, R.S.; Schmitt, P.; Meeker, M.; Moore, J.H. Benchmarking relief-based feature selection methods for bioinformatics data mining. J. Biomed. Inform. 2018, 85, 168–188.
- Guha, R.; Chatterjee, B.; Sk, K.H.; Ahmed, S.; Bhattacharya, T.; Sarkar, R. Py_FS: A Python Package for Feature Selection using Meta-heuristic Optimization Algorithms. In Proceedings of the 3rd International Conference on Computational Intelligence in Pattern Recognition (CIPR-2021), Kolkata, India, 24–25 April 2021.
- Golub, T.R.; Slonim, D.; Tamayo, P.; Huard, C.; Gaasenbeek, M.; Mesirov, J.P.; Coller, H.; Loh, M.L.; Downing, J.R.; Caligiuri, M.A.; et al. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 1999, 286, 531–537.
- Sheikh, K.H.; Ahmed, S.; Mukhopadhyay, K.; Singh, P.K.; Yoon, J.H.; Geem, Z.W.; Sarkar, R. EHHM: Electrical Harmony Based Hybrid Meta-Heuristic for Feature Selection. IEEE Access 2020, 8, 158125–158141.
- Singh, P.K.; Sarkar, R.; Nasipuri, M. Statistical validation of multiple classifiers over multiple datasets in the field of pattern recognition. Int. J. Appl. Pattern Recognit. 2015, 2, 1–23.
- Singh, P.K.; Sarkar, R.; Nasipuri, M. Significance of non-parametric statistical tests for comparison of classifiers over multiple datasets. Int. J. Comput. Sci. Math. 2016, 7, 410–442.
- One Sample T Test—Clearly Explained with Examples|ML+. Available online: https://www.machinelearningplus.com/statistics/one-sample-t-test/ (accessed on 28 July 2021).
- Connectionist Bench (Sonar, Mines vs. Rocks) Data Set. Available online: http://archive.ics.uci.edu/ml/datasets/connectionist+bench+(sonar,+mines+vs.+rocks) (accessed on 24 July 2021).
- Ionosphere Data Set. Available online: https://archive.ics.uci.edu/ml/datasets/ionosphere (accessed on 24 July 2021).
- Chess (King-Rook vs. King-Pawn) Data Set. Available online: https://archive.ics.uci.edu/ml/datasets/Chess+(King-Rook+vs.+King-Pawn) (accessed on 24 July 2021).
- Thejas, G.S.; Joshi, S.R.; Iyengar, S.S.; Sunitha, N.R.; Badrinath, P. Mini-Batch Normalized Mutual Information: A Hybrid Feature Selection Method. IEEE Access 2019, 7, 116875–116885.
- Mandal, M.; Ghosh, D.; Acharya, S.; Saha, N.; Sarkar, R. MIRFCS: An Ensemble of Filter Methods for Classification of Disease Data. In Proceedings of the 3rd International Conference on Computational Intelligence in Pattern Recognition (CIPR-2021), Kolkata, India, 24–25 April 2021.
Sl. No. | Dataset | Total Number of Attributes | Total Number of Instances | Class Distribution |
---|---|---|---|---|
1. | Arrhythmia | 279 | 452 | Non-affected: 245 Affected: 207 |
2. | Leukemia | 5147 | 72 | ALL: 47 AML: 25 |
3. | DLBCL | 7070 | 77 | Non-DLBCL: 19 DLBCL: 58 |
4. | Prostate cancer | 12,532 | 102 | Normal tissue: 50 Prostate tumor: 52 |
Dataset | Phase 1: Value of ‘m’ | Phase 3: #Search Agents | Phase 3: Value of ‘K’ in KNN |
---|---|---|---|
Arrhythmia | Range: 50–70; Interval value: 5; Final value: 50 | Range: 30–100; Interval value: 5; Final value: 70 | 5 |
Leukemia | Range: 100–150; Interval value: 10; Final value: 100 | Range: 30–100; Interval value: 5; Final value: 50 | 4 |
DLBCL | Range: 150–200; Interval value: 10; Final value: 150 | Range: 30–100; Interval value: 5; Final value: 60 | 4 |
Prostate cancer | Range: 200–250; Interval value: 10; Final value: 250 | Range: 30–100; Interval value: 5; Final value: 50 | 5 |
Dataset | Phase 1 | | Phase 2: Number of Discarded Correlated Features | Phase 2: Number of Non-Correlated Features | |
---|---|---|---|---|---|
Arrhythmia | 50 | 60 | 25 | 35 | 20 |
Leukemia | 100 | 70 | 46 | 24 | 23 |
DLBCL | 150 | 50 | 26 | 24 | 23 |
Prostate cancer | 200 | 75 | 50 | 25 | 25 |
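The split into discarded correlated features and retained non-correlated features reported for Phase 2 can be illustrated with a small pruning routine. This is a sketch under stated assumptions rather than the procedure used here: it assumes Pearson correlation as the measure, a hypothetical cut-off of 0.9, and a greedy rule that keeps the first feature of every correlated pair; `pearson` and `drop_correlated` are illustrative names.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no linear relationship
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedy pruning: keep a feature only if it is not strongly
    correlated (|r| > threshold) with any feature kept before it.

    `columns` is a list of feature columns; returns index lists
    (kept, discarded). The threshold value is an assumption.
    """
    kept, discarded = [], []
    for j, col in enumerate(columns):
        if any(abs(pearson(col, columns[k])) > threshold for k in kept):
            discarded.append(j)
        else:
            kept.append(j)
    return kept, discarded


if __name__ == "__main__":
    features = [
        [1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.0],  # exact multiple of the first column
        [4.0, 1.0, 3.0, 2.0],
    ]
    print(drop_correlated(features))
```

With a rule of this kind, the number of retained features is simply the input count minus the number discarded, which is the relationship visible in the rows above (for example, 60 − 25 = 35 for Arrhythmia).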
Dataset | Original #Features | Proposed Method: #Features Selected | Proposed Method: Accuracy (%) | Proposed Method: Computation Time (s) | Without Feature Selection: KNN Accuracy (%) | Without Feature Selection: SVM Accuracy (%) | Without Feature Selection: NB Accuracy (%) | Without Feature Selection: Computation Time (s) |
---|---|---|---|---|---|---|---|---|
Arrhythmia | 279 | 3 | 94.50 | 238 | 63.06 | 75.44 | 67.71 | 42.8 |
Leukemia | 5147 | 4 | 100 | 358 | 87.67 | 88.92 | 100 | 43.9 |
DLBCL | 7070 | 4 | 100 | 545 | 84.10 | 75.35 | 78.75 | 47.5 |
Prostate cancer | 12,532 | 3 | 100 | 782 | 85.36 | 84.36 | 62.54 | 55.6 |
Dataset | Phase | XGBoost Accuracy (%) | XGBoost Precision (%) | XGBoost Recall (%) | XGBoost F1_Score (%) | No. of Best Features |
---|---|---|---|---|---|---|
Arrhythmia | Phase 1 | 96.46 | 94.21 | 98.57 | 96.34 | 26 |
Arrhythmia | Phase 2 | 96.24 | 93.82 | 98.41 | 96.06 | 17 |
Leukemia | Phase 1 | 96.25 | 90 | 87.5 | 88.73 | 8 |
Leukemia | Phase 2 | 97.32 | 80 | 80 | 80 | 6 |
DLBCL | Phase 1 | 92.14 | 68.33 | 70 | 69.15 | 8 |
DLBCL | Phase 2 | 93.39 | 73.33 | 70 | 71.62 | 9 |
Prostate cancer | Phase 1 | 95.18 | 96.57 | 94.89 | 95.72 | 19 |
Prostate cancer | Phase 2 | 94.18 | 96.57 | 93.55 | 95.03 | 5 |
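The F1_Score column can be cross-checked against the precision and recall columns, since the F1 score is the harmonic mean of the two. The short sketch below recomputes it for two rows of the XGBoost results; the helper name `f1_score_percent` is hypothetical, and differences in the second decimal place are to be expected because the tabulated precision and recall are themselves rounded.

```python
def f1_score_percent(precision, recall):
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (dataset, phase, precision %, recall %, reported F1 %) taken from the table
    rows = [
        ("Leukemia", "Phase 1", 90.0, 87.5, 88.73),
        ("Arrhythmia", "Phase 1", 94.21, 98.57, 96.34),
    ]
    for dataset, phase, p, r, reported in rows:
        computed = f1_score_percent(p, r)
        print(dataset, phase, round(computed, 2), reported)
```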
Dataset | Method | No. of Features Selected | Classification Accuracy (%) |
---|---|---|---|
Arrhythmia | Xu et al. [11] | 236 | 82.96 |
Arrhythmia | Singh et al. [12] | 30 | 85.58 |
Arrhythmia | Sahebi et al. [13] | 135 | 99.02 |
Arrhythmia | Cui et al. [14] | 169 | 74.77 |
Arrhythmia | Kadam et al. [15] | 92 | 88.72 |
Arrhythmia | Wang et al. [16] | 89 | 98.68 |
Arrhythmia | Proposed | 3 | 94.50 |
Dataset | Method | No. of Features Selected | Classification Accuracy (%) |
---|---|---|---|
Leukemia | Wang et al. [17] | 27 | 91.05 |
Leukemia | Sun et al. [18] | 7.6 | 87.5 |
Leukemia | Khamees et al. [19] | 27.91 | 90.88 |
Leukemia | Kilicarslan et al. [20] | 36 | 99.86 |
Leukemia | Santhakumar et al. [21] | NA | 95.45 |
Leukemia | Sheikhpour et al. [22] | 8 | 100 |
Leukemia | Proposed | 4 | 100 |
Dataset | Method | No. of Features Selected | Classification Accuracy (%) |
---|---|---|---|
DLBCL | Zhou et al. [23] | 10 | 95.4 |
DLBCL | Kang et al. [24] | 8 | 100 |
DLBCL | Yan et al. [25] | NA | 77.49 |
DLBCL | Bir-Jmel et al. [26] | 6 | 100 |
DLBCL | Cui et al. [14] | 16 | 100 |
DLBCL | Alirezanejad et al. [27] | 10 | 89 |
DLBCL | Proposed | 4 | 100 |
Dataset | Method | No. of Features Selected | Classification Accuracy (%) |
---|---|---|---|
Prostate cancer | Liu et al. [28] | 22 | 94.17 |
Prostate cancer | Bir-Jmel et al. [26] | 21 | 100 |
Prostate cancer | Sun et al. [18] | 4 | 91.2 |
Prostate cancer | Prabhakar et al. [29] | 100 | 99.48 |
Prostate cancer | Cahyaningrum et al. [30] | 10 | 76.47 |
Prostate cancer | Deng et al. [31] | 54 | 98 |
Prostate cancer | Proposed | 3 | 100 |
Dataset | t-Value | p-Value | Significance Level (0.10) |
---|---|---|---|
Arrhythmia | −1.618637 | 0.083225 | Significant |
Leukemia | −2.790643 | 0.019208 | Significant |
DLBCL | −1.743647 | 0.070839 | Significant |
Prostate cancer | −1.871934 | 0.060057 | Significant |
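The t-values above are consistent with a one-sample t-test in which the accuracies of the competing methods form the sample and the accuracy of the proposed method is the hypothesized mean. The sketch below, which assumes that reading, recomputes the Leukemia statistic from the six accuracies listed for that dataset in the comparison with state-of-the-art methods (91.05, 87.5, 90.88, 99.86, 95.45, 100) against the proposed 100%. The helper name `one_sample_t` is hypothetical, and only the Leukemia row has been checked this way.

```python
import math
import statistics


def one_sample_t(sample, hypothesized_mean):
    """t statistic of a one-sample t-test: (mean - mu0) / (s / sqrt(n)),
    where s is the sample standard deviation (n - 1 in the denominator)."""
    n = len(sample)
    mean = statistics.mean(sample)
    std = statistics.stdev(sample)
    return (mean - hypothesized_mean) / (std / math.sqrt(n))


if __name__ == "__main__":
    # Accuracies (%) of the six existing methods on Leukemia, from the table
    leukemia_accuracies = [91.05, 87.5, 90.88, 99.86, 95.45, 100.0]
    proposed_accuracy = 100.0
    t_value = one_sample_t(leukemia_accuracies, proposed_accuracy)
    print(round(t_value, 4))  # approximately -2.7906
```

Turning the statistic into a p-value additionally requires the Student t distribution with n − 1 degrees of freedom, which Python's standard library does not provide.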
Dataset | No. of Attributes | No. of Instances | No. of Classes | Dataset Domain |
---|---|---|---|---|
Ionosphere | 34 | 351 | 2 | Electromagnetic |
Krvskp | 36 | 3196 | 2 | Game |
Sonar | 60 | 208 | 2 | Biology |
Dataset | Method | No. of Features Selected | Classification Accuracy (%) |
---|---|---|---|
Ionosphere | Ghosh et al. [1] | 20 | 95.36 |
Ionosphere | Thejas et al. [66] | 6 | 97.18 |
Ionosphere | Ghosh et al. [2] | 7 | 98.51 |
Ionosphere | Sheikh et al. [59] | 7 | 98.56 |
Ionosphere | Proposed | 5 | 95.77 |
Krvskp | Chatterjee et al. [33] | 20 | 97.81 |
Krvskp | Sheikh et al. [59] | 15 | 97.81 |
Krvskp | Ghosh et al. [34] | 11 | 98.6 |
Krvskp | Ghosh et al. [2] | 32 | 99.06 |
Krvskp | Proposed | 4 | 95.15 |
Sonar | Ghosh et al. [1] | 27 | 85.07 |
Sonar | Sheikh et al. [59] | 22 | 92.86 |
Sonar | Thejas et al. [66] | 51 | 97.62 |
Sonar | Ghosh et al. [34] | 16 | 100 |
Sonar | Proposed | 4 | 92.85 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mandal, M.; Singh, P.K.; Ijaz, M.F.; Shafi, J.; Sarkar, R. A Tri-Stage Wrapper-Filter Feature Selection Framework for Disease Classification. Sensors 2021, 21, 5571. https://doi.org/10.3390/s21165571