Abstract
A large pool of techniques has already been developed for analyzing micro-array datasets, but less attention has been paid to multi-class classification problems. In this context, selecting features and quantifying classifier performance may be hard, since only a few training examples are available in each class. This paper presents a framework for multi-class learning that learns a classifier within each class independently and gathers all relevant features into a single dataset. In the next step, this dataset is given as input to a classification algorithm that learns a global classifier across the classes. We analyze two micro-array datasets using the proposed framework. Results demonstrate that our approach is capable of identifying a small number of influential genes within each class, while the global classifier across the classes performs better than existing multi-class learning methods.
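The framework described above has three steps: select relevant features within each class independently, merge them into one dataset, and train a single global multi-class classifier on that dataset. The sketch below illustrates this flow under stated assumptions. The abstract does not say which per-class learner or which global classifier the authors use, so this example substitutes a one-vs-rest signal-to-noise ranking for the per-class step and a nearest-centroid classifier for the global step. All function names, the parameter `k`, and the toy data are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical sketch, not the authors' method: per-class signal-to-noise
# ranking stands in for "learning within each class", and nearest centroid
# stands in for the global multi-class classifier.


def snr_scores(X, y, target, eps=1e-9):
    """One-vs-rest signal-to-noise score for every feature (gene).

    score_j = |mean_in - mean_out| / (sd_in + sd_out), where "in" are the
    samples of class `target` and "out" are all remaining samples.
    """
    scores = []
    for j in range(len(X[0])):
        inside = [row[j] for row, label in zip(X, y) if label == target]
        outside = [row[j] for row, label in zip(X, y) if label != target]
        spread = pstdev(inside) + pstdev(outside)
        scores.append(abs(mean(inside) - mean(outside)) / (spread + eps))
    return scores


def select_per_class(X, y, k):
    """Step 1: rank features independently for each class and keep the top k."""
    selected = {}
    for c in sorted(set(y)):
        scores = snr_scores(X, y, c)
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        selected[c] = ranked[:k]
    return selected


def union_features(selected):
    """Step 2: group the per-class feature sets into one sorted feature list."""
    return sorted(set().union(*selected.values()))


def project(X, features):
    """Restrict every sample to the grouped features (the single dataset)."""
    return [[row[j] for j in features] for row in X]


def fit_centroids(X, y):
    """Step 3: fit the global multi-class learner (here, one centroid per class)."""
    centroids = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


# Made-up toy data: features 0, 1, 2 mark classes A, B, C; features 3, 4 are noise.
X = [
    [5.0, 0.1, 0.2, 1.0, 0.3],
    [5.2, 0.0, 0.1, 0.2, 0.9],
    [0.1, 4.9, 0.0, 0.8, 0.4],
    [0.2, 5.1, 0.2, 0.1, 0.7],
    [0.0, 0.2, 5.3, 0.9, 0.2],
    [0.1, 0.1, 4.8, 0.3, 0.8],
]
y = ["A", "A", "B", "B", "C", "C"]

selected = select_per_class(X, y, k=1)
features = union_features(selected)
centroids = fit_centroids(project(X, features), y)
new_sample = [4.8, 0.3, 0.1, 0.5, 0.5]
print(selected, features, predict(centroids, project([new_sample], features)[0]))
```

Only the union of the per-class features reaches the global learner, so each class can contribute the few genes that separate it from the others, even when those genes carry little signal for the remaining classes.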
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dessì, N., Pes, B. (2009). A Framework for Multi-class Learning in Micro-array Data Analysis. In: Combi, C., Shahar, Y., Abu-Hanna, A. (eds) Artificial Intelligence in Medicine. AIME 2009. Lecture Notes in Computer Science, vol 5651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02976-9_40
DOI: https://doi.org/10.1007/978-3-642-02976-9_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02975-2
Online ISBN: 978-3-642-02976-9
eBook Packages: Computer Science, Computer Science (R0)