Abstract
We present an application of BioDCV, a computational environment for semisupervised profiling with Support Vector Machines, aimed at detecting outliers and deriving informative subtypes of patients with respect to pathological features. First, a sample-tracking curve is extracted for each sample as a by-product of the profiling process. The curves are then clustered according to a distance derived from Dynamic Time Warping. The procedure allows identification of noisy cases, whose removal is shown to improve predictive accuracy and the stability of derived gene profiles. After removal of outliers, the semisupervised process is repeated and subgroups of patients are specified. The procedure is demonstrated through the analysis of a liver cancer dataset of 213 samples described by 1,993 genes and by pathological features.
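As a rough illustration of the curve-comparison step described in the abstract, the following Python sketch computes a classic Dynamic Time Warping distance between per-sample curves and builds the pairwise distance matrix that a clustering step could consume. It is not the BioDCV implementation: the function names, the absolute-difference local cost, and the `flag_outliers` rule (mean distance compared against a multiple of the median) are assumptions introduced here for illustration only, and the toy curves are made up.

```python
from statistics import median
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with absolute difference as local cost (an assumption)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance along a only
                cost[i][j - 1],      # advance along b only
                cost[i - 1][j - 1],  # advance along both
            )
    return cost[n][m]


def pairwise_dtw(curves: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of DTW distances between all pairs of curves."""
    k = len(curves)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(curves[i], curves[j])
    return dist


def flag_outliers(dist: Sequence[Sequence[float]], factor: float = 2.0) -> List[int]:
    """Hypothetical rule, not from the paper: flag curves whose mean DTW
    distance to the other curves exceeds `factor` times the median of
    those means."""
    k = len(dist)
    means = [sum(row) / (k - 1) for row in dist]
    threshold = factor * median(means)
    return [i for i, value in enumerate(means) if value > threshold]


# Toy sample-tracking curves (invented values, one curve per sample).
curves = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
]
dist = pairwise_dtw(curves)
suspects = flag_outliers(dist)
```

The distance matrix `dist` could equally be handed to a hierarchical clustering routine; the thresholding rule above is only a stand-in for whatever cluster-based criterion is used to single out noisy cases.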
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Paoli, S., Jurman, G., Albanese, D., Merler, S., Furlanello, C. (2006). Semisupervised Profiling of Gene Expressions and Clinical Data. In: Bloch, I., Petrosino, A., Tettamanzi, A.G.B. (eds) Fuzzy Logic and Applications. WILF 2005. Lecture Notes in Computer Science, vol 3849. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11676935_35
DOI: https://doi.org/10.1007/11676935_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32529-1
Online ISBN: 978-3-540-32530-7
eBook Packages: Computer Science, Computer Science (R0)