Abstract
Parallel computers execute many computational steps per unit of time, which offers an important benefit in reducing computing time in real-world applications. In this work, a parallel Particle Swarm Optimization (PSO) algorithm is used for gene selection in high-dimensional microarray datasets. The proposed algorithm, called PMSO, runs a set of independent PSOs following an island model, in which a migration policy exchanges solutions with a certain frequency. A feature selection mechanism is embedded in each subalgorithm to find small samples of informative genes among thousands of them. PMSO has been experimentally assessed with different population structures on four well-known cancer datasets. The contributions are twofold: our parallel approach improves on sequential algorithms both in computational time/effort (efficiency of 85%) and in accuracy rate, identifying specific genes that our work suggests are significant for accurate classification.
Additional comparisons with several recent state-of-the-art methods also show competitive results, with improvements of over 100% in the classification rate and very few genes per subset.
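The island model with periodic migration described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy `fitness` function stands in for a classifier-based evaluation, the set `INFORMATIVE` is a made-up ground truth, the `move` operator is a simplified binary update that is not necessarily the one used in PMSO, and the ring topology and `migration_gap` parameter are hypothetical choices. The islands are also stepped sequentially here, whereas a parallel implementation would presumably run each island on its own processor.

```python
import random

N_GENES = 200                  # hypothetical number of genes in the dataset
INFORMATIVE = {3, 17, 42, 99}  # hypothetical set of truly useful genes


def fitness(mask):
    """Toy stand-in for classification accuracy: reward informative genes, penalize subset size."""
    hits = sum(mask[g] for g in INFORMATIVE)
    return hits / len(INFORMATIVE) - 0.001 * sum(mask)


def new_particle(rng):
    # Sparse random bit mask: bit g == 1 means gene g is selected.
    return [1 if rng.random() < 0.05 else 0 for _ in range(N_GENES)]


def move(position, personal_best, swarm_best, rng):
    """Simplified binary update: draw each bit from one of three parents, then rarely flip it."""
    child = []
    for cur, pb, sb in zip(position, personal_best, swarm_best):
        bit = rng.choice((cur, pb, sb))
        if rng.random() < 0.01:
            bit = 1 - bit
        child.append(bit)
    return child


def run_islands(n_islands=4, swarm_size=10, iterations=60, migration_gap=10, seed=0):
    rng = random.Random(seed)
    islands = [[new_particle(rng) for _ in range(swarm_size)] for _ in range(n_islands)]
    bests = [[p[:] for p in isl] for isl in islands]  # personal bests per island
    for it in range(1, iterations + 1):
        # Each island evolves independently between migrations.
        for k in range(n_islands):
            swarm_best = max(bests[k], key=fitness)
            for i in range(swarm_size):
                islands[k][i] = move(islands[k][i], bests[k][i], swarm_best, rng)
                if fitness(islands[k][i]) > fitness(bests[k][i]):
                    bests[k][i] = islands[k][i][:]
        if it % migration_gap == 0:
            # Ring migration: island k sends a copy of its best solution to island k+1,
            # where it replaces the worst personal best if it is an improvement.
            emigrants = [max(bests[k], key=fitness)[:] for k in range(n_islands)]
            for k in range(n_islands):
                target = (k + 1) % n_islands
                worst = min(range(swarm_size), key=lambda i: fitness(bests[target][i]))
                if fitness(emigrants[k]) > fitness(bests[target][worst]):
                    bests[target][worst] = emigrants[k]
    # Return the best gene subset found on any island.
    return max((b for isl in bests for b in isl), key=fitness)
```

Calling `run_islands()` returns a bit mask of length `N_GENES`; the indices set to 1 are the selected genes. In a real wrapper setting, `fitness` would be replaced by the cross-validated accuracy of a classifier trained on the selected genes.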
Cite this article
García-Nieto, J., Alba, E. Parallel multi-swarm optimizer for gene selection in DNA microarrays. Appl Intell 37, 255–266 (2012). https://doi.org/10.1007/s10489-011-0325-9
DOI: https://doi.org/10.1007/s10489-011-0325-9