Abstract
Gene expression array technology has rapidly become a standard tool for biologists. Its use in areas such as diagnostics, toxicology, and genetics calls for good methods for finding patterns and building prediction models from the generated data. Rule induction is a promising candidate because of several attractive properties, notably a high level of expressiveness and interpretability. In this work we investigate the use of rule induction methods for mining gene expression patterns from various cancer types. Three different rule induction methods are evaluated on two public tumor tissue data sets. The methods are shown to achieve prediction accuracy as good as that of the best current methods, while allowing for straightforward interpretation of the prediction models. These models typically consist of small sets of simple rules, which associate a few genes and expression levels with specific types of cancer. We also show that information gain is a useful measure for ranked feature selection in this domain.
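As an illustration of ranked feature selection by information gain, the following is a minimal sketch and not the authors' implementation. It assumes each gene is discretized by a single binary cut point placed between adjacent distinct expression values (the abstract does not state the discretization used), and the function names, gene names, and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Best information gain over all binary cut points on one gene.

    Assumption: a cut point may be placed between any two adjacent
    distinct expression values; the largest resulting gain is returned.
    """
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        best = max(best, base - remainder)
    return best

def rank_genes(expression, labels):
    """Return gene names sorted by decreasing information gain."""
    gains = {gene: information_gain(vals, labels) for gene, vals in expression.items()}
    return sorted(gains, key=gains.get, reverse=True)

# Toy data, invented for illustration only (not taken from the paper).
labels = ["tumor", "tumor", "normal", "normal"]
expression = {
    "gene_a": [5.1, 4.8, 1.2, 0.9],  # one cut point separates the classes
    "gene_b": [2.0, 0.5, 2.1, 0.4],  # no single cut point separates them
}
ranking = rank_genes(expression, labels)
```

In a setting like the one described, only the top-ranked genes would be passed on to the rule learner; how many to keep is a tuning choice that this sketch leaves open.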
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lidén, P., Asker, L., Boström, H. (2002). Rule Induction for Classification of Gene Expression Array Data. In: Elomaa, T., Mannila, H., Toivonen, H. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2002. Lecture Notes in Computer Science, vol 2431. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45681-3_28
DOI: https://doi.org/10.1007/3-540-45681-3_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44037-6
Online ISBN: 978-3-540-45681-0
eBook Packages: Springer Book Archive