
An efficient Incremental Wrapper-based Information Gain Gene Subset Selection (IG based on IWSSr) method for Tumor Discernment

Published in Multimedia Tools and Applications

Abstract

Tumors are among the deadliest diseases, and their incidence is rising rapidly. Researchers worldwide are conducting extensive work on the diagnosis and discernment of tumors by applying machine learning algorithms to observations stored in the form of datasets. Tumor-related datasets are high-dimensional and contain many genes, most of which are not prognostic; some are irrelevant or redundant. We propose a methodology, IG based on IWSSr-Random Forest (RF), which selects the most relevant prognostic genes by ranking genes with Information Gain, adding them incrementally within a wrapper, and evaluating the importance of each candidate subset with an RF. The RF is also used as the final classifier. Experiments are performed on nine publicly available tumor-related datasets, with accuracy, confusion matrix, precision, F-measure, and recall as performance evaluators. The proposed methodology selects 3 of 2000 genes, 5 of 7129 genes, 3 of 7129 genes, 5 of 24,481 genes, 7 of 12,601 genes, 5 of 15,154 genes, 2 of 4026 genes, 5 of 12,582 genes, and 4 of 2308 genes, producing accuracies of 88.71%, 71.67%, 98.61%, 79.38%, 93.60%, 99.60%, 92.42%, 95.83%, and 92.77% on the Colon, Central Nervous System, Leukemia, Breast Cancer, Lung Cancer, Ovarian Cancer, Lymphoma, MLL, and SRBCT datasets, respectively. The experimental results show that IG based on IWSSr(RF) performs well compared with state-of-the-art algorithms such as RF, Naïve Bayes, KNN, and Decision Tree, and it also has nominal time complexity compared with these classification algorithms.
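The filter-then-wrapper pipeline described above can be sketched in a few lines. This is not the authors' code: it is a minimal reimplementation of the general idea (rank genes by an information-theoretic score, then keep each candidate gene only if it improves Random Forest cross-validated accuracy), using scikit-learn's `mutual_info_classif` as a stand-in for Information Gain and omitting the replacement step that the "r" in IWSSr denotes. Function names, hyperparameters, and the synthetic data are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score


def ig_iwss_rf(X, y, max_genes=10, cv=3, seed=0):
    """Rank genes by mutual information, then incrementally keep each
    candidate gene only if it improves RF cross-validated accuracy."""
    # Filter stage: rank all genes from most to least informative.
    ranking = np.argsort(mutual_info_classif(X, y, random_state=seed))[::-1]
    selected, best = [], -np.inf
    # Wrapper stage: try genes in ranked order, keeping improvements only.
    for gene in ranking:
        trial = selected + [gene]
        score = cross_val_score(
            RandomForestClassifier(n_estimators=50, random_state=seed),
            X[:, trial], y, cv=cv).mean()
        if score > best:
            selected, best = trial, score
        if len(selected) >= max_genes:
            break
    return selected, best


# Synthetic stand-in for a microarray dataset: 120 samples, 200 "genes".
X, y = make_classification(n_samples=120, n_features=200,
                           n_informative=5, random_state=0)
genes, acc = ig_iwss_rf(X, y)
print(len(genes), round(acc, 2))
```

Because a gene is retained only when it strictly improves the wrapper score, the selected subset stays small, mirroring the handful of genes (2 to 7) the paper reports selecting from thousands.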


[Figures 1–18 and Algorithm 1 appear in the full article.]


Data availability

All data is publicly available at (Zhu et al., 2007).

References

  1. Abe S (2005) Proceedings of the 13th European Symposium on Artificial Neural Networks. Bruges, Belgium, pp 27–29

  2. Aghdam MH, Ghasem-Aghaee N, Basiri ME (2009) Text feature selection using ant colony optimization. Expert Syst Appl 36(3):6843–6853

  3. Almuallim H, Dietterich TG (1994) Learning boolean concepts in the presence of many irrelevant features. Artif Intell 69(1–2):279–305

  4. Bermejo P, de la Ossa L, Gámez JA, Puerta JM (2012) Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking. Knowl-Based Syst 25(1):35–44

  5. Bomze IM, De Klerk E (2002) Solving standard quadratic optimization problems via linear, semidefinite and copositive programming. J Glob Optim 24(2):163–185

  6. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  7. Cernuda C, Lughofer E, Hintenaus P, Märzinger W (2014) Enhanced waveband selection in NIR spectra using enhanced genetic operators. J Chemom 28(3):123–136

  8. Chen X-W (2003) An improved branch and bound algorithm for feature selection. Pattern Recogn Lett 24(12):1925–1933

  9. Cotter SF, Kreutz-Delgado K, Rao BD (2001) Backward sequential elimination for sparse vector subset selection. Signal Process 81(9):1849–1864

  10. Debuse JC, Rayward-Smith VJ (1997) Feature subset selection within a simulated annealing data mining algorithm. J Intell Inf Syst 9(1):57–81

  11. Prachi HM, Sharma P (2019) Intrusion detection using machine learning and feature selection. Int J Comput Netw Inf Secur 11(4):43–52

  12. Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97(457):77–87

  13. George EI, McCulloch RE (1993) Variable selection via Gibbs sampling. J Am Stat Assoc 88(423):881–889

  14. Gheyas IA, Smith LS (2010) Feature subset selection in large dimensionality domains. Pattern Recogn 43(1):5–13

  15. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  16. Hall MA, Holmes G (2003) Benchmarking attribute selection techniques for discrete class data mining. IEEE Trans Knowl Data Eng 15(6):1437–1447

  17. He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. In: Advances in Neural Information Processing Systems 18

  18. Kabir M, Shahjahan M, Murase K (2009) An efficient feature selection using ant colony optimization algorithm. In: International Conference on Neural Information Processing

  19. Kira K, Rendell LA (1992) The feature selection problem: traditional methods and a new algorithm. In: Proceedings of AAAI

  20. Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, de Schaetzen V, Duque R, Bersini H, Nowe A (2012) A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Trans Comput Biol Bioinform 9(4):1106–1119

  21. Leardi R, Boggia R, Terrile M (1992) Genetic algorithms as a strategy for feature selection. J Chemom 6(5):267–281

  22. Liu H, Zhou M, Liu Q (2019) An embedded feature selection method for imbalanced data classification. IEEE/CAA J Autom Sin 6(3):703–715

  23. Lughofer E (2011) On-line incremental feature weighting in evolving fuzzy classifiers. Fuzzy Sets Syst 163(1):1–23

  24. Mbaabu O (2022) Introduction to random forest in machine learning. https://www.section.io/engineering-education/introduction-to-random-forest-in-machine-learning/

  25. Mitchell TJ, Beauchamp JJ (1988) Bayesian variable selection in linear regression. J Am Stat Assoc 83(404):1023–1032

  26. Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106

  27. Ruiz R, Riquelme JC, Aguilar-Ruiz JS (2006) Incremental wrapper-based gene selection from microarray data for cancer classification. Pattern Recogn 39(12):2383–2392

  28. Saeys Y, Inza I, Larranaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517

  29. Sivagaminathan RK, Ramakrishnan S (2007) A hybrid approach for feature subset selection using neural networks and ant colony optimization. Expert Syst Appl 33(1):49–60

  30. Tan P-N, Steinbach M, Kumar V (2006) Introduction to data mining. Pearson Education Inc., New Delhi

  31. Van't Veer LJ, Dai H, Van De Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, Van Der Kooy K, Marton MJ, Witteveen AT (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871):530–536

  32. Wang J, Wu L, Kong J, Li Y, Zhang B (2013) Maximum weight and minimum redundancy: a novel framework for feature subset selection. Pattern Recogn 46(6):1616–1627

  33. Yang J, Honavar V (1998) Feature subset selection using a genetic algorithm. IEEE Intell Syst Appl 13(2):44–49

  34. Zhu Z, Ong Y-S, Dash M (2007) Markov blanket-embedded genetic algorithm for gene selection. Pattern Recogn 40(11):3236–3248


Author information


Corresponding author

Correspondence to Tahira Nazir.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Fatima, A., Nazir, T., Nazir, A.K. et al. An efficient Incremental Wrapper-based Information Gain Gene Subset Selection (IG based on IWSSr) method for Tumor Discernment. Multimed Tools Appl 83, 64741–64766 (2024). https://doi.org/10.1007/s11042-023-18046-2

