Abstract
One of the aims of modern bioinformatics is to discover the molecular mechanisms that govern protein function. This would allow us to understand the complex processes involved in living systems and possibly to correct dysfunctions. The first step in this direction is the identification of the functional sites of proteins.
In this paper, we propose new kernels for automatic protein active site classification. In particular, we devise innovative attribute-value and tree substructure representations to model the biological and spatial information of proteins in Support Vector Machines. We experimented with these models on the Protein Data Bank, pre-processed to make the active site information explicit. Our results show that structural kernels, used in combination with polynomial kernels, can be effectively applied to discriminate an active site from other regions of a protein. This finding is important because it is the first to show successful identification of catalytic sites for a very large family of proteins belonging to a broad class of enzymes.
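As an illustration of how a structural kernel can be combined with a polynomial kernel, the following minimal sketch sums a polynomial kernel over attribute-value vectors with a subset-tree kernel in the style of Collins and Duffy. It is not the authors' implementation: the abstract does not state how the kernels are combined, so the plain sum is an assumption, and the tree encoding (nested tuples), the node labels (`SITE`, `RES`), the feature vectors, and all function names are hypothetical.

```python
from itertools import product


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree on attribute-value vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def label(node):
    """Label of a node: a leaf is a plain string, an internal node is a tuple."""
    return node if isinstance(node, str) else node[0]


def production(node):
    """Internal node's label together with the ordered labels of its children."""
    return (node[0], tuple(label(child) for child in node[1:]))


def internal_nodes(tree):
    """All internal (tuple) nodes of a tree, root included."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes


def delta(n1, n2, lam=1.0):
    """Decay-weighted count of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, str) or isinstance(c2, str):
            continue  # leaves contribute no further fragments
        score *= 1.0 + delta(c1, c2, lam)
    return score


def tree_kernel(t1, t2, lam=1.0):
    """Subset-tree kernel: sum of delta over all pairs of internal nodes."""
    pairs = product(internal_nodes(t1), internal_nodes(t2))
    return sum(delta(a, b, lam) for a, b in pairs)


def combined_kernel(ex1, ex2):
    """Assumed combination: polynomial kernel plus tree kernel.

    Each example is a (feature_vector, tree) pair. The sum of two valid
    kernels is itself a valid kernel.
    """
    (x1, t1), (x2, t2) = ex1, ex2
    return poly_kernel(x1, x2) + tree_kernel(t1, t2)


# Hypothetical candidate sites: toy feature vectors and residue trees.
site_a = ([1.0, 0.0], ("SITE", ("RES", "SER"), ("RES", "HIS")))
site_b = ([1.0, 1.0], ("SITE", ("RES", "SER"), ("RES", "ASP")))
print(combined_kernel(site_a, site_b))
```

A Gram matrix built from `combined_kernel` over all training examples could then be passed to any SVM implementation that accepts precomputed kernels.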
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cilia, E., Moschitti, A. (2007). Advanced Tree-Based Kernels for Protein Classification. In: Basili, R., Pazienza, M.T. (eds) AI*IA 2007: Artificial Intelligence and Human-Oriented Computing. AI*IA 2007. Lecture Notes in Computer Science, vol 4733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74782-6_20
DOI: https://doi.org/10.1007/978-3-540-74782-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74781-9
Online ISBN: 978-3-540-74782-6
eBook Packages: Computer Science, Computer Science (R0)