Abstract
Selecting the most discriminative features is a challenging problem in many applications. Bio-inspired optimization algorithms have been widely applied to many optimization problems, including feature selection. In this paper, the most discriminative features are selected by a new Chaotic Dragonfly Algorithm (CDA), in which chaotic maps are embedded in the search iterations of the Dragonfly Algorithm (DA). Ten chaotic maps were employed to adjust the main parameters of the dragonflies’ movements during the optimization process, in order to accelerate the convergence rate and improve the efficiency of DA. The proposed algorithm is applied to select features from a dataset extracted from the DrugBank database, which contained 6712 drugs. Of these, the 553 drugs that are biotransformed in the liver are used. Each drug is represented by 31 chemical descriptors, and the data cover four toxic effects: irritant, mutagenic, reproductive, and tumorigenic. The proposed model comprises three phases: data pre-processing, feature selection, and classification. In the data pre-processing phase, the Synthetic Minority Over-sampling Technique (SMOTE) is used to address the imbalance of the dataset. In the feature selection phase, the most discriminative features are selected using CDA. Finally, in the classification phase, the features selected by CDA are fed to a Support Vector Machine (SVM) classifier. Experimental results demonstrated that CDA can find a feature subset that maximizes classification performance while minimizing the number of selected features, compared with DA and other meta-heuristic optimization algorithms. Moreover, the experiments showed that the Gauss chaotic map was the most appropriate map for significantly boosting the performance of DA. Additionally, the high values of accuracy (81.82–96.08%), recall (80.84–96.11%), precision (81.45–96.08%), and F-score (81.14–96.1%) obtained for all toxic effects indicate the robustness of the proposed model.
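To make the role of the chaotic map more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the form of the Gauss (mouse) map commonly used in the chaotic-metaheuristics literature, x_{k+1} = 0 if x_k = 0 and (1/x_k) mod 1 otherwise, and a weighted wrapper fitness that trades classification error against subset size. The names `gauss_map`, `chaotic_sequence`, `fitness`, and the weight `alpha` are hypothetical and do not come from the abstract.

```python
def gauss_map(x: float) -> float:
    """One step of the Gauss (mouse) map; assumed form, values stay in [0, 1)."""
    if x == 0.0:
        return 0.0
    return (1.0 / x) % 1.0


def chaotic_sequence(x0: float, n: int) -> list:
    """Generate n successive chaotic values starting from the seed x0.

    In a chaotic variant of a swarm optimizer, values like these would
    replace uniformly random draws when updating movement parameters.
    """
    values = []
    x = x0
    for _ in range(n):
        x = gauss_map(x)
        values.append(x)
    return values


def fitness(error_rate: float, n_selected: int, n_total: int,
            alpha: float = 0.99) -> float:
    """Hypothetical wrapper fitness (lower is better).

    Combines classification error with the fraction of features kept,
    so that smaller subsets win when the error is equal.
    """
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)


if __name__ == "__main__":
    # Seed chosen arbitrarily for illustration.
    print(chaotic_sequence(0.7, 5))
    # Same error, fewer of the 31 descriptors: lower (better) fitness.
    print(fitness(0.10, 10, 31), fitness(0.10, 20, 31))
```

In this sketch the chaotic sequence is deterministic given its seed, which is the property chaotic variants rely on in place of a pseudo-random generator. How the sequence is coupled to the individual dragonfly movement parameters is specified in the body of the paper, not in the abstract.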
Acknowledgment
We would like to thank Dr. Yasmine S. Momen of the Clinical Pathology Department, National Liver Institute, for providing the database used in this work and for her great effort in helping us understand the dataset.
Ethics declarations
Conflict of interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
About this article
Cite this article
Sayed, G.I., Tharwat, A. & Hassanien, A.E. Chaotic dragonfly algorithm: an improved metaheuristic algorithm for feature selection. Appl Intell 49, 188–205 (2019). https://doi.org/10.1007/s10489-018-1261-8
DOI: https://doi.org/10.1007/s10489-018-1261-8