Authors:
Vijay Sambhe (1); Shanmukha Rajesh (1); Enrique Naredo (2, 1); Douglas Mota Dias (2, 3, 1); Meghana Kshirsagar (2, 1) and Conor Ryan (2, 1)
Affiliations:
(1) University of Limerick, Limerick, Ireland
(2) Lero – Science Foundation Ireland Research Centre for Software, Ireland
(3) UERJ – Rio de Janeiro State University, Brazil
Keyword(s):
DNA Sequences, MAP-Elites, k-mer, NSGA-II, Feature Selection, Genetic Algorithms.
Abstract:
The advent of the Covid-19 pandemic has caused a global crisis that has left health systems vulnerable and challenged the research community to find novel approaches for the early detection of infections. This opens up a window of opportunity to exploit machine learning and artificial intelligence techniques to address some of the issues related to this disease. In this work, we address the classification of ten SARS-CoV-2 protein sequences related to Covid-19, using k-mer frequencies as features and considering two objectives: classification performance and feature selection. The first set of experiments considered the objectives one at a time, using four techniques for feature selection and twelve well-known machine learning methods for classification, three of which are neural-network based. The second set of experiments considered a multi-objective approach, in which we tested the well-known Non-dominated Sorting Genetic Algorithm II (NSGA-II) and the Multi-dimensional Archive of Phenotypic Elites (MAP-Elites, ME), which uses quality-diversity containers to guide the search through elite solutions. The experimental results show that ResNet combined with PCA is the best combination in the single-objective setting. In the multi-objective setting, NSGA-II outperforms ME with two out of three classifiers, while ME obtains competitive results and yields a more diverse set of solutions.
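To make "k-mer frequency as features" concrete, the following is a minimal sketch of turning one sequence into a fixed-length feature vector. It is not the authors' code: the function name `kmer_frequencies`, the default `k = 3`, the nucleotide alphabet `ACGT` (suggested by the "DNA Sequences" keyword; a 20-letter amino-acid alphabet would be passed instead for protein sequences) and the choice to skip windows containing ambiguous characters are all assumptions for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int = 3, alphabet: str = "ACGT") -> list[float]:
    """Hypothetical helper: relative frequency of every possible k-mer.

    The returned vector has len(alphabet) ** k entries, ordered
    lexicographically (e.g. AA, AC, AG, AT, CA, ... for k = 2).
    Overlapping windows are counted; windows containing characters
    outside `alphabet` (such as 'N') are skipped (an assumption).
    """
    seq = seq.upper()
    # All overlapping windows of length k.
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    valid = [w for w in windows if all(c in alphabet for c in w)]
    counts = Counter(valid)
    total = len(valid)
    # Fixed vocabulary so every sequence maps to the same feature space.
    vocab = ("".join(p) for p in product(alphabet, repeat=k))
    return [counts[kmer] / total if total else 0.0 for kmer in vocab]


# Usage: "ACGTAC" has five 2-mers (AC, CG, GT, TA, AC), so AC gets 2/5.
features = kmer_frequencies("ACGTAC", k=2)
print(len(features), features[1])
```

Normalising by the number of valid windows makes sequences of different lengths comparable, and the fixed vocabulary yields a matrix that downstream feature-selection or classification methods can consume directly.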
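The core of NSGA-II is ranking candidate solutions into successive Pareto fronts. The sketch below implements Deb's fast non-dominated sort for two objectives. It assumes, as an illustration only, that the two objectives named in the abstract are encoded as (accuracy to maximise, number of selected features to minimise); the full NSGA-II additionally uses crowding distance, selection, crossover and mutation, which are omitted here, and the example population is made up.

```python
from typing import List, Tuple

# Assumed encoding: (accuracy to maximise, number of selected features to minimise).
Solution = Tuple[float, int]


def dominates(a: Solution, b: Solution) -> bool:
    """True if `a` is no worse than `b` in both objectives and strictly better in one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def fast_non_dominated_sort(pop: List[Solution]) -> List[List[int]]:
    """Partition the indices of `pop` into Pareto fronts, best front first."""
    dominated_set: List[List[int]] = [[] for _ in pop]  # solutions each p dominates
    counts = [0] * len(pop)                             # how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(len(pop)):
        for q in range(len(pop)):
            if dominates(pop[p], pop[q]):
                dominated_set[p].append(q)
            elif dominates(pop[q], pop[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_set[p]:
                counts[q] -= 1
                if counts[q] == 0:  # all of q's dominators are in earlier fronts
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Usage with hypothetical (accuracy, n_features) pairs.
population = [(0.90, 10), (0.85, 5), (0.80, 20), (0.90, 12), (0.70, 3)]
print(fast_non_dominated_sort(population))
```

Here the first front holds the trade-off solutions (high accuracy with few features), and it is the set a decision maker would pick from.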
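MAP-Elites keeps one elite per cell of a grid over a behaviour descriptor, replacing it only when a better solution lands in the same cell. The following is a minimal sketch under stated assumptions, not the paper's configuration: the genome is a hypothetical feature-selection bitmask, the single descriptor is the number of selected features split into `n_bins` equal bins, mutation is a per-bit flip with probability 1/`n_features`, and `fitness` is any user-supplied callable (in practice it would be a classifier's score on the selected features).

```python
import random
from typing import Callable, Dict, List, Tuple

Genome = List[int]  # hypothetical encoding: 1 = feature selected, 0 = dropped


def map_elites(n_features: int, fitness: Callable[[Genome], float], n_bins: int,
               iterations: int, n_init: int, seed: int = 0) -> Dict[int, Tuple[float, Genome]]:
    """Sketch of MAP-Elites; returns {cell index: (fitness, elite genome)}.

    Requires n_init >= 1 so the archive is non-empty before mutation starts.
    """
    rng = random.Random(seed)
    archive: Dict[int, Tuple[float, Genome]] = {}

    def descriptor(g: Genome) -> int:
        # Assumed descriptor: number of selected features, binned into n_bins cells.
        return sum(g) * n_bins // (n_features + 1)

    def try_insert(g: Genome) -> None:
        f = fitness(g)
        cell = descriptor(g)
        # Keep the solution only if its cell is empty or it beats the current elite.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, g)

    # Random initial solutions seed the archive.
    for _ in range(n_init):
        try_insert([rng.randint(0, 1) for _ in range(n_features)])
    # Main loop: pick a random elite, mutate it, try to place the child.
    for _ in range(iterations):
        _, parent = rng.choice(list(archive.values()))
        child = [b ^ 1 if rng.random() < 1.0 / n_features else b for b in parent]
        try_insert(child)
    return archive


# Usage with a toy fitness: reward the first three features, penalise size.
def toy_fitness(g: Genome) -> float:
    return float(sum(g[:3])) - 0.1 * sum(g)

elites = map_elites(n_features=10, fitness=toy_fitness, n_bins=5, iterations=200, n_init=10)
for cell in sorted(elites):
    print(cell, elites[cell])
```

Because every occupied cell keeps its own elite, the result spans solutions of different sizes rather than converging on one region, which is the kind of diversity the abstract attributes to ME.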