Abstract
Given a dataset, exemplars are a subset of data points that can represent the whole set without significant loss of information. Affinity propagation is an exemplar discovery technique that, unlike k-centres clustering, gives uniform preference to all data points. The data points iteratively exchange real-valued messages until clusters, each with a representative exemplar, emerge.
In this paper, we propose a Class Aware Exemplar Discovery (CAED) algorithm, which assigns each data point a preference value based on its ability to differentiate samples of one class from the others. To this end, CAED ranks data points class-wise and assigns each data point a preference value based on its class-wise rank. During message exchange, data points with better representative ability are therefore favored over other data points for selection as exemplars.
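The abstract does not state which statistic CAED uses to rank data points or how a rank becomes a preference value, so the Python sketch below fills both gaps with labeled assumptions. It treats each gene (row of the expression matrix) as a data point, scores it for every class with a hypothetical one-vs-rest separation score, keeps each gene's best class-wise rank, and maps that rank linearly between two hypothetical preference bounds `p_high` and `p_low`. None of these formulas are claimed to be the authors' own.

```python
from statistics import mean, pstdev


def class_wise_scores(X, y):
    """Hypothetical one-vs-rest separation score for every gene and class.

    X[g][j] is the expression of gene g in sample j; y[j] is sample j's class.
    For class c: |mean(in c) - mean(not c)| / (pstdev(in c) + pstdev(not c)).
    """
    scores = {}
    for c in sorted(set(y)):
        inside = [j for j, lab in enumerate(y) if lab == c]
        outside = [j for j, lab in enumerate(y) if lab != c]
        col = []
        for row in X:
            a = [row[j] for j in inside]
            b = [row[j] for j in outside]
            spread = pstdev(a) + pstdev(b) + 1e-9  # guard against zero spread
            col.append(abs(mean(a) - mean(b)) / spread)
        scores[c] = col
    return scores


def class_aware_preferences(X, y, p_high, p_low):
    """Map each gene's best class-wise rank linearly onto [p_low, p_high]."""
    n = len(X)
    best_rank = [n] * n
    for col in class_wise_scores(X, y).values():
        # Rank 1 = most discriminative gene for this class (ties keep gene order).
        order = sorted(range(n), key=lambda g: -col[g])
        for rank, g in enumerate(order, start=1):
            best_rank[g] = min(best_rank[g], rank)
    if n == 1:
        return [p_high]
    return [p_high - (p_high - p_low) * (r - 1) / (n - 1) for r in best_rank]


if __name__ == "__main__":
    y = ["A", "A", "B", "B", "C", "C"]
    X = [
        [5.0, 5.0, 1.0, 1.0, 1.0, 1.0],  # separates class A
        [1.0, 1.0, 5.0, 5.0, 1.0, 1.0],  # separates class B
        [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],  # uninformative
        [1.0, 1.0, 1.0, 1.0, 3.0, 3.0],  # separates class C only
    ]
    print(class_aware_preferences(X, y, p_high=0.0, p_low=-3.0))
    # prints [0.0, 0.0, -3.0, 0.0]
```

In this toy example the last gene discriminates only class C and ranks third for classes A and B, yet its best class-wise rank is 1, so it receives the top preference; a single global ranking could have placed it lower.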
The proposed method is evaluated on 18 gene expression datasets to assess its efficacy in selecting relevant exemplars from large datasets. Experimental evaluation shows improved classification accuracy over affinity propagation and other state-of-the-art feature selection techniques. CAED also converges in fewer iterations than affinity propagation, thereby reducing execution time significantly.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sharma, S., Agrawal, A., Patel, D. (2015). Class Aware Exemplar Discovery from Microarray Gene Expression Data. In: Kumar, N., Bhatnagar, V. (eds) Big Data Analytics. BDA 2015. Lecture Notes in Computer Science, vol 9498. Springer, Cham. https://doi.org/10.1007/978-3-319-27057-9_17
DOI: https://doi.org/10.1007/978-3-319-27057-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27056-2
Online ISBN: 978-3-319-27057-9
eBook Packages: Computer Science, Computer Science (R0)