Abstract
Classification problems such as gene expression array analysis, text processing of Internet documents, combinatorial chemistry, software defect prediction and image retrieval involve datasets with tens or hundreds of thousands of features. Many of these features are irrelevant or redundant; they degrade the performance of learning algorithms and can lead to overfitting, worsening both the accuracy and the computation time of a classifier. The selection of relevant and nonredundant features is therefore an important preprocessing step for any classification problem. Most global optimization techniques can converge to a solution quickly, but they begin by initializing a population randomly, and the choice of this initial population is an important step. In this paper, local search algorithms are first used to generate a subset of relevant and nonredundant features; a global optimization algorithm is then applied to this subset, mitigating, to some extent, the limitations of global optimization techniques such as inconsistent classification results and high time complexity. Computation time and classification accuracy are improved by feeding the feature set obtained from sequential backward selection and the mutual information maximization algorithm into a global optimization technique (genetic algorithm, differential evolution or particle swarm optimization). The computation time of these global optimization techniques is further reduced by using the variance of the population fitness as a stopping criterion. The proposed approach has been tested on the publicly available Sonar, Wdbc and German datasets.
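The pipeline the abstract describes can be sketched in miniature: a local search (here, sequential backward selection) prunes the feature set, and an elitist genetic algorithm seeded with that subset refines it, stopping early when the variance of the population's fitness drops below a threshold. This is a minimal, self-contained illustration, not the paper's implementation: the fitness function is a toy stand-in for classifier accuracy, and all names, constants and GA parameters (`pop_size`, `eps`, mutation rates) are illustrative assumptions.

```python
import random
import statistics

random.seed(42)

# Toy setup: of N_FEATURES candidate features, only the RELEVANT ones
# help the (stand-in) classifier; the rest are noise.
N_FEATURES = 12
RELEVANT = {0, 2, 5, 7}

def fitness(mask):
    """Stand-in for classification accuracy: reward relevant features,
    penalize subset size (a crude proxy for redundancy and cost)."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in RELEVANT)
    return hits - 0.1 * sum(mask)

def sbs(mask):
    """Sequential backward selection: greedily drop the feature whose
    removal most improves fitness; stop when no removal helps."""
    mask, best = list(mask), fitness(mask)
    while True:
        trials = []
        for i, bit in enumerate(mask):
            if bit:
                t = mask.copy()
                t[i] = 0
                trials.append((fitness(t), t))
        if not trials:
            break
        f, t = max(trials, key=lambda x: x[0])
        if f <= best:
            break
        mask, best = t, f
    return mask

def mutate(mask, p):
    return [1 - b if random.random() < p else b for b in mask]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def ga(seed_mask, pop_size=20, eps=1e-9, max_gen=200):
    """Elitist GA seeded with the SBS subset; stops when the variance
    of the population's fitness falls below eps."""
    pop = [list(seed_mask)] + [mutate(seed_mask, 0.2)
                               for _ in range(pop_size - 1)]
    for _ in range(max_gen):
        if statistics.pvariance([fitness(m) for m in pop]) < eps:
            break  # population has converged: variance stopping criterion
        elite = max(pop, key=fitness)
        nxt = [elite]  # elitism: best individual survives unchanged
        while len(nxt) < pop_size:
            a = max(random.sample(pop, 2), key=fitness)  # tournament selection
            b = max(random.sample(pop, 2), key=fitness)
            nxt.append(mutate(crossover(a, b), 0.02))
        pop = nxt
    return max(pop, key=fitness)

seed = sbs([1] * N_FEATURES)   # local search produces the initial subset
best = ga(seed)                # global optimizer refines it
print(sorted(i for i, b in enumerate(best) if b))
```

With elitism and the SBS seed included in the initial population, the GA's best fitness never falls below that of the local-search subset, which is the motivation the abstract gives for seeding the global optimizer with a locally selected feature set.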
Ethics declarations
Conflict of interest
The authors have no financial or personal relationships with other people or organizations that could inappropriately influence their work.
About this article
Cite this article
Tiwari, S., Singh, B. & Kaur, M. An approach for feature selection using local searching and global optimization techniques. Neural Comput & Applic 28, 2915–2930 (2017). https://doi.org/10.1007/s00521-017-2959-y