Abstract
Tumors are among the deadliest diseases, and their incidence is rising rapidly. Researchers worldwide are conducting extensive work on the diagnosis and discernment of tumors by employing machine learning algorithms and performing experiments on observations stored as datasets. Tumor-related datasets are high-dimensional and contain many genes, most of which are not prognostic; some are irrelevant or redundant. Here we propose a methodology, IG based on IWSSr with Random Forest (RF), which ranks genes by Information Gain, adds them incrementally inside a wrapper, and evaluates the importance of each candidate gene subset with RF; RF is also used as the final classifier. Experiments are performed on nine publicly available tumor-related datasets, with accuracy, confusion matrix, precision, recall, and F-measure as performance evaluators. The proposed methodology selects 3 of 2000 genes, 5 of 7129 genes, 3 of 7129 genes, 5 of 24,481 genes, 7 of 12,601 genes, 5 of 15,154 genes, 2 of 4026 genes, 5 of 12,582 genes, and 4 of 2308 genes, yielding accuracies of 88.71%, 71.67%, 98.61%, 79.38%, 93.60%, 99.60%, 92.42%, 95.83%, and 92.77% on the Colon, Central Nervous System, Leukemia, Breast Cancer, Lung Cancer, Ovarian Cancer, Lymphoma, MLL, and SRBCT datasets, respectively. Experimental results show that IG based on IWSSr(RF) outperforms state-of-the-art algorithms such as RF, Naïve Bayes, KNN, and Decision Tree, while incurring lower running time than these classification algorithms.
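To make the selection procedure concrete, the sketch below reconstructs the IG based on IWSSr(RF) loop in Python. It is an illustration under stated assumptions, not the authors' implementation: the function name ig_iwssr_rf, the 5-fold cross-validation protocol, the 100-tree forest, and the acceptance rule (keep a change only when it strictly improves cross-validated accuracy) are all assumptions, and Information Gain is approximated here by scikit-learn's mutual information estimator.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score


def ig_iwssr_rf(X, y, cv=5, random_state=0):
    """Select a small gene subset: IG filter ranking, then an incremental
    wrapper with replacement, scored by RF cross-validation accuracy."""
    # Filter step: rank all genes by Information Gain (approximated by
    # estimated mutual information between each gene and the class label).
    ig = mutual_info_classif(X, y, random_state=random_state)
    ranking = np.argsort(ig)[::-1]

    rf = RandomForestClassifier(n_estimators=100, random_state=random_state)

    def score(genes):
        # Wrapper criterion: mean CV accuracy of RF on the candidate subset.
        return cross_val_score(rf, X[:, genes], y, cv=cv).mean()

    selected = [ranking[0]]
    best = score(selected)

    # Wrapper step: scan the remaining genes in IG order. For each gene,
    # try (a) appending it and (b) swapping it in for each selected gene
    # (the "replacement" move of IWSSr); keep the best strictly improving move.
    for g in ranking[1:]:
        candidates = [selected + [g]] + [
            selected[:i] + [g] + selected[i + 1:] for i in range(len(selected))
        ]
        scores = [score(c) for c in candidates]
        i_best = int(np.argmax(scores))
        if scores[i_best] > best:
            best, selected = scores[i_best], candidates[i_best]

    return selected, best
```

Called as selected, acc = ig_iwssr_rf(X, y) on a gene-expression matrix X of shape (samples, genes), a loop of this form tends to terminate with very small subsets, consistent with the 2-to-7-gene subsets reported above; because only strictly improving moves are accepted, the selected set stays compact even when thousands of genes are scanned.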
Data availability
All data are publicly available from (Zhu et al. 2007).
References
Abe S (2005) In: 13th European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, pp 27–29
Aghdam MH, Ghasem-Aghaee N, Basiri ME (2009) Text feature selection using ant colony optimization. Expert Syst Appl 36(3):6843–6853
Almuallim H, Dietterich TG (1994) Learning boolean concepts in the presence of many irrelevant features. Artif Intell 69(1–2):279–305
Bermejo P, de la Ossa L, Gámez JA, Puerta JM (2012) Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking. Knowl-Based Syst 25(1):35–44
Bomze IM, De Klerk E (2002) Solving standard quadratic optimization problems via linear, semidefinite and copositive programming. J Global Optim 24(2):163–185
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Cernuda C, Lughofer E, Hintenaus P, Märzinger W (2014) Enhanced waveband selection in NIR spectra using enhanced genetic operators. J Chemometr 28(3):123–136
Chen X-W (2003) An improved branch and bound algorithm for feature selection. Pattern Recogn Lett 24(12):1925–1933
Cotter SF, Kreutz-Delgado K, Rao BD (2001) Backward sequential elimination for sparse vector subset selection. Signal Process 81(9):1849–1864
Debuse JC, Rayward-Smith VJ (1997) Feature subset selection within a simulated annealing data mining algorithm. J Intell Inf Syst 9(1):57–81
Prachi HM, Sharma P (2019) Intrusion detection using machine learning and feature selection. Int J Comput Netw Inf Secur 11(4):43–52
Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97(457):77–87
George EI, McCulloch RE (1993) Variable selection via Gibbs sampling. J Am Stat Assoc 88(423):881–889
Gheyas IA, Smith LS (2010) Feature subset selection in large dimensionality domains. Pattern Recogn 43(1):5–13
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Hall MA, Holmes G (2003) Benchmarking attribute selection techniques for discrete class data mining. IEEE Trans Knowl Data Eng 15(6):1437–1447
He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. Adv Neural Inf Process Syst 18
Kabir M, Shahjahan M, Murase K (2009) An efficient feature selection using ant colony optimization algorithm. In: International Conference on Neural Information Processing
Kira K, Rendell LA (1992) The feature selection problem: traditional methods and a new algorithm. In: Proceedings of the AAAI Conference on Artificial Intelligence
Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, de Schaetzen V, Duque R, Bersini H, Nowe A (2012) A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Trans Comput Biol Bioinf 9(4):1106–1119
Leardi R, Boggia R, Terrile M (1992) Genetic algorithms as a strategy for feature selection. J Chemom 6(5):267–281
Liu H, Zhou M, Liu Q (2019) An embedded feature selection method for imbalanced data classification. IEEE/CAA J Autom Sin 6(3):703–715
Lughofer E (2011) On-line incremental feature weighting in evolving fuzzy classifiers. Fuzzy Sets Syst 163(1):1–23
Mbaabu O (2022) Introduction to Random Forest in Machine Learning. https://www.section.io/engineering-education/introduction-to-random-forest-in-machine-learning/
Mitchell TJ, Beauchamp JJ (1988) Bayesian variable selection in linear regression. J Am Stat Assoc 83(404):1023–1032
Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106
Ruiz R, Riquelme JC, Aguilar-Ruiz JS (2006) Incremental wrapper-based gene selection from microarray data for cancer classification. Pattern Recogn 39(12):2383–2392
Saeys Y, Inza I, Larranaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517
Sivagaminathan RK, Ramakrishnan S (2007) A hybrid approach for feature subset selection using neural networks and ant colony optimization. Expert Syst Appl 33(1):49–60
Tan P-N, Steinbach M, Kumar V (2006) Introduction to data mining. Pearson Education, New Delhi
Van’t Veer LJ, Dai H, Van De Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, Van Der Kooy K, Marton MJ, Witteveen AT (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871):530–536
Wang J, Wu L, Kong J, Li Y, Zhang B (2013) Maximum weight and minimum redundancy: a novel framework for feature subset selection. Pattern Recogn 46(6):1616–1627
Yang J, Honavar V (1998) Feature subset selection using a genetic algorithm. IEEE Intell Syst Appl 13(2):44–49
Zhu Z, Ong Y-S, Dash M (2007) Markov blanket-embedded genetic algorithm for gene selection. Pattern Recogn 40(11):3236–3248
Cite this article
Fatima, A., Nazir, T., Nazir, A.K. et al. An efficient Incremental Wrapper-based Information Gain Gene Subset Selection (IG based on IWSSr) method for Tumor Discernment. Multimed Tools Appl 83, 64741–64766 (2024). https://doi.org/10.1007/s11042-023-18046-2