Abstract
Predicting the boundaries of structural domains is an important and challenging problem in experimental and computational structural biology. Previous predictions have been based on intuition, biochemical properties, statistics, sequence homology and other aspects of predicted protein structure. In this paper, a promising method for detecting the domain structure of a protein from sequence information alone is presented. The method is based on analyzing multiple sequence alignments derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence, and these measures are combined into a single predictor using support vector machines. On a dataset of single protein chains, the overall accuracy of the method is about 85%. The results demonstrate that the method can be useful not only for predicting the complete 3D structure of a protein but also for studying proteins’ building blocks and for functional analysis.
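The abstract does not name the individual measures or the SVM configuration, so the following is only a minimal sketch of the general pipeline under assumed choices. It treats each column of a query-anchored alignment as one position and uses `-` as the gap symbol. It computes three hypothetical per-column measures: residue entropy, gap fraction, and the fraction of aligned sequences that start or end at that column. It then combines them with a linear SVM trained by batch sub-gradient descent on the hinge loss, which stands in for whatever kernel and solver the authors actually used. The names `position_features`, `train_linear_svm` and `predict` are illustrative, not taken from the paper.

```python
import math
from collections import Counter

GAP = "-"  # assumed gap symbol in the alignment


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column; gaps are ignored."""
    residues = [c for c in column if c != GAP]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def position_features(alignment):
    """Return one feature vector per alignment column.

    Hypothetical measures: [entropy, gap fraction, termination fraction, 1.0].
    The constant 1.0 lets the linear SVM learn a bias term.
    """
    # Column index of the first and last residue of every aligned sequence.
    ends = []
    for seq in alignment:
        cols = [i for i, c in enumerate(seq) if c != GAP]
        ends.append((cols[0], cols[-1]) if cols else (None, None))

    features = []
    for pos in range(len(alignment[0])):
        column = "".join(seq[pos] for seq in alignment)
        gap_frac = column.count(GAP) / len(alignment)
        # Fraction of sequences whose first or last residue sits exactly at this column.
        term_frac = sum(1 for first, last in ends if pos in (first, last)) / len(alignment)
        features.append([column_entropy(column), gap_frac, term_frac, 1.0])
    return features


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, iters=2000):
    """Linear SVM trained by batch sub-gradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)) with labels y_i in {+1, -1}.
    The bias is folded into w via the constant feature, so it is regularized as well.
    """
    n, dim = len(X), len(X[0])
    w = [0.0] * dim
    for _ in range(iters):
        grad = [lam * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) < 1:  # hinge term is active for this example
                for j in range(dim):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, X):
    """+1 = putative domain-boundary position, -1 = domain interior."""
    return [1 if dot(w, xi) >= 0 else -1 for xi in X]
```

To use it, one would label each training column +1 if it lies near an annotated domain boundary and -1 otherwise, train on the features of the training alignments, and call `predict` on the columns of a new alignment. A kernel SVM from an established library would normally replace the hand-written trainer; it is written out here only to keep the sketch self-contained.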
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zou, S., Huang, Y., Wang, Y., Zhou, C. (2006). Prediction of Protein Domains from Sequence Information Using Support Vector Machines. In: Wang, J., Yi, Z., Zurada, J.M., Lu, BL., Yin, H. (eds) Advances in Neural Networks - ISNN 2006. ISNN 2006. Lecture Notes in Computer Science, vol 3973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11760191_99
DOI: https://doi.org/10.1007/11760191_99
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34482-7
Online ISBN: 978-3-540-34483-4
eBook Packages: Computer Science, Computer Science (R0)