Abstract
The main challenge of gene selection from gene expression datasets is to remove redundant genes without affecting the discernibility between objects. A pipelined approach that combines feature ranking with rough set attribute reduction is proposed for gene selection. Feature ranking is first used to narrow down the gene space by selecting the top-ranked genes; rough sets are then used to induce a minimal reduct, which eliminates the redundant attributes. The approach is explored on Leukemia gene expression data, and good results are obtained without preprocessing the data. The experimental results show that the approach succeeds in selecting highly discriminative genes for the cancer classification task.
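The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the ranking criterion or the reduct algorithm, so the signal-to-noise score, the mean-threshold discretization, the greedy dependency-based search, and all function names (`rank_genes`, `discretize`, `dependency`, `greedy_reduct`, `select_genes`) are assumptions made for illustration. The greedy search returns a reduct-like subset and is not guaranteed to find a minimal reduct.

```python
from statistics import mean, pstdev


def rank_genes(X, y, top_k):
    """Return the indices of the top_k genes by absolute signal-to-noise score.

    X is a list of samples (lists of expression values), y holds 0/1 labels.
    The scoring function is an assumption; the abstract does not specify one.
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        diff = abs(mean(a) - mean(b))
        spread = pstdev(a) + pstdev(b)
        if spread > 0:
            score = diff / spread
        else:
            score = float("inf") if diff > 0 else 0.0
        scores.append((score, g))
    # Highest score first; ties broken by gene index for determinism.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scores[:top_k]]


def discretize(X, genes):
    """Binarize each selected gene at its mean (a stand-in for a real discretizer)."""
    cuts = {g: mean(row[g] for row in X) for g in genes}
    return [{g: int(row[g] > cuts[g]) for g in genes} for row in X]


def dependency(table, y, attrs):
    """Fraction of samples whose equivalence class under attrs has one label."""
    classes = {}
    for row, label in zip(table, y):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(label)
    consistent = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    return consistent / len(y)


def greedy_reduct(table, y, attrs):
    """Greedy forward selection followed by a backward pruning pass."""
    target = dependency(table, y, attrs)
    reduct = []
    while dependency(table, y, reduct) < target:
        candidates = [a for a in attrs if a not in reduct]
        best = max(candidates, key=lambda a: dependency(table, y, reduct + [a]))
        reduct.append(best)
    # Drop any attribute whose removal keeps the dependency at the target.
    for a in list(reduct):
        rest = [r for r in reduct if r != a]
        if dependency(table, y, rest) >= target:
            reduct = rest
    return reduct


def select_genes(X, y, top_k):
    """Stage 1: rank and keep top_k genes. Stage 2: reduce them with rough sets."""
    top = rank_genes(X, y, top_k)
    table = discretize(X, top)
    return greedy_reduct(table, y, top)


# Toy data (hypothetical): 4 samples, 3 genes, two classes.
X_toy = [
    [1.0, 5.0, 0.2],
    [1.2, 5.1, 0.9],
    [3.0, 5.0, 0.3],
    [3.1, 5.2, 0.8],
]
y_toy = [0, 0, 1, 1]
print(select_genes(X_toy, y_toy, top_k=2))
```

In this toy example the ranking stage keeps two of the three genes, and the reduction stage then discards any of them that are not needed to tell the two classes apart, which mirrors the narrowing-then-reducing order of the proposed pipeline.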
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sun, L., Miao, D., Zhang, H. (2008). Efficient Gene Selection with Rough Sets from Gene Expression Data. In: Wang, G., Li, T., Grzymala-Busse, J.W., Miao, D., Skowron, A., Yao, Y. (eds) Rough Sets and Knowledge Technology. RSKT 2008. Lecture Notes in Computer Science, vol 5009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79721-0_26
DOI: https://doi.org/10.1007/978-3-540-79721-0_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-79720-3
Online ISBN: 978-3-540-79721-0
eBook Packages: Computer Science (R0)