Abstract
Feature selection is an essential task in machine learning, data mining, and pattern recognition, particularly when dealing with a large number of features. Feature selection helps improve prediction accuracy, reduce computation time, and produce more comprehensible models. In feature selection, each feature is either included in the computation or excluded, so for n features there are \(2^{n}\) possible feature subsets. Identifying an optimal feature subset is therefore an NP-hard problem, and exhaustive search becomes infeasible as n grows; however, a near-optimal solution can be obtained in a reasonable amount of time using an approximation algorithm. Many feature selection algorithms use a sequential search strategy, which adds or removes features one at a time and can therefore become trapped in a locally optimal solution. In this paper, we propose a novel clustering-based hybrid feature selection approach using ant colony optimization that selects features randomly and measures the quality of the selected features by K-means clustering, in terms of the silhouette index and the Laplacian score. Because the proposed approach selects features randomly, it explores the feature space more thoroughly, avoids being trapped in a locally optimal solution, and generates a globally optimal solution. This is verified by comparison with another state-of-the-art method.
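The abstract does not specify how the silhouette index and the Laplacian score are combined inside the ant colony search, so the Python sketch below is only one plausible reading under stated assumptions, not the authors' algorithm. It assumes NumPy, a fixed subset size, one pheromone value per feature, a heuristic of 1/(Laplacian score) so that features with a lower score (which, in He et al.'s formulation, better preserve local neighbourhood structure) are more attractive, and the silhouette index of a K-means clustering on the chosen features as the subset quality. The function names and parameters (`aco_feature_selection`, `n_ants`, `rho`, `alpha`, `beta`) are hypothetical. Randomly sampling subsets in this way stands in for an exhaustive search over all \(2^{n}\) subsets.

```python
import numpy as np


def laplacian_scores(X, k=5):
    """Laplacian score per feature (He et al., 2005); a lower score means the
    feature better preserves the local neighbourhood structure of the data."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    t = d2.mean() if d2.mean() > 0 else 1.0  # heat-kernel width (assumed default)
    # Symmetric k-nearest-neighbour graph with heat-kernel edge weights.
    nn = np.argsort(d2 + np.diag(np.full(n, np.inf)), axis=1)[:, :k]
    rows, cols = np.repeat(np.arange(n), k), nn.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-d2[rows, cols] / t)
    S = np.maximum(S, S.T)
    D = np.diag(S.sum(axis=1))
    L = D - S  # graph Laplacian
    ones = np.ones(n)
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r] - (X[:, r] @ D @ ones) / (ones @ D @ ones)  # remove D-weighted mean
        denom = f @ D @ f
        scores[r] = (f @ L @ f) / denom if denom > 0 else np.inf
    return scores


def kmeans(X, k, rng, iters=100):
    """Plain Lloyd's K-means; returns a cluster label for every row of X."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels


def silhouette(X, labels):
    """Mean silhouette index (Rousseeuw, 1987) of a clustering of X."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as worst quality
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters get s(i) = 0 by convention
        a = dist[i, same].sum() / (same.sum() - 1)  # mean intra-cluster distance
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s.mean()


def aco_feature_selection(X, n_select, n_clusters=2, n_ants=10, n_iter=20,
                          rho=0.2, alpha=1.0, beta=1.0, seed=0):
    """Hypothetical ACO loop: ants sample feature subsets at random, biased by
    pheromone (tau) and a Laplacian-score heuristic (eta); each subset is
    scored by the silhouette index of a K-means clustering on it."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    eta = 1.0 / (laplacian_scores(X) + 1e-12)  # low Laplacian score -> attractive
    tau = np.ones(m)
    best, best_q = None, -np.inf
    for _ in range(n_iter):
        for _ in range(n_ants):
            w = tau ** alpha * eta ** beta
            subset = np.sort(rng.choice(m, n_select, replace=False, p=w / w.sum()))
            q = silhouette(X[:, subset], kmeans(X[:, subset], n_clusters, rng))
            if q > best_q:
                best, best_q = subset, q
        tau *= 1.0 - rho                 # pheromone evaporation
        tau[best] += (best_q + 1.0) / 2  # deposit on best subset, silhouette mapped to [0, 1]
    return best, best_q
```

For example, on synthetic data with two well-separated clusters in features 0 and 1 plus three pure-noise features, `aco_feature_selection(X, n_select=2)` should favour features 0 and 1, since they have the lowest Laplacian scores and yield the highest silhouette. Using the Laplacian score only as the ants' heuristic, and the silhouette only as the reward, is a design choice made here for simplicity; the paper may weight or combine the two criteria differently.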
Acknowledgements
This research is funded by the Council of Scientific and Industrial Research (CSIR), Government of India, under grant no. 22(0853)/20/EMR-II.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dwivedi, R., Tiwari, A., Bharill, N. et al. A Novel Clustering-Based Hybrid Feature Selection Approach Using Ant Colony Optimization. Arab J Sci Eng 48, 10727–10744 (2023). https://doi.org/10.1007/s13369-023-07719-7
DOI: https://doi.org/10.1007/s13369-023-07719-7