Abstract
Over the past several decades, biologists have conducted numerous studies examining both the general and specific functions of proteins. Generally, if two proteins are similar in either structure or amino acid sequence, they are expected to share a common biological function. Protein function is determined primarily by structure rather than by amino acid sequence, so algorithms for protein structure alignment are an essential research tool. The quality of such an algorithm depends on the quality of the similarity measure it uses, since the similarity measure is the objective function that determines the best alignment. However, none of the existing similarity measures has become a gold standard, because each has its own strengths and weaknesses, and they require excessive filtering to find a single alignment. In this paper, we introduce a new strategy that finds not a single alignment but multiple alignments of different lengths. This approach yields high-quality alignments, but it introduces a new problem: its running time is considerably longer than that of methods that find only a single alignment. To address this problem, we propose algorithms that locate a common region (CORE) of multiple alignment candidates and then extend the CORE into multiple alignments. Because the CORE is defined from a final alignment, we introduce CORE*, which is similar to CORE, and propose an algorithm to identify it. By adopting CORE* and dynamic programming, our proposed method produces multiple alignments of various lengths with higher accuracy than previous methods. In our experiments, the alignments identified by our algorithm are, on average, 17% and 15.48% longer than those obtained by TM-align when the comparison is conducted at the super-family and fold levels, respectively.
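The tension between similarity measures mentioned above can be illustrated with a small sketch. It is not the authors' method: it only compares two widely used measures, RMSD and the TM-score, on a toy alignment whose coordinates are assumed to be already superimposed. The function names, the toy coordinates, and the 0.5 floor on `d0` are assumptions made for illustration.

```python
import math

def rmsd(pairs):
    """Root-mean-square distance over aligned, already superimposed residue pairs."""
    squared = [math.dist(p, q) ** 2 for p, q in pairs]
    return math.sqrt(sum(squared) / len(squared))

def tm_score(pairs, target_length):
    """TM-score of an alignment, normalized by the target protein length."""
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # assumed guard for very short targets
    d0 = max(d0, 0.5)  # assumed floor, keeps the scale positive
    total = sum(1.0 / (1.0 + (math.dist(p, q) / d0) ** 2) for p, q in pairs)
    return total / target_length

def toy_pairs(start, count, offset):
    """Hypothetical aligned pairs: residues 3.8 A apart, displaced by `offset` A."""
    return [((i * 3.8, 0.0, 0.0), (i * 3.8, offset, 0.0))
            for i in range(start, start + count)]

# A short, tight alignment versus a longer alignment with looser extra pairs.
core = toy_pairs(0, 20, 1.0)
extended = core + toy_pairs(20, 10, 3.0)

for name, aln in (("core", core), ("extended", extended)):
    print(name, len(aln), round(rmsd(aln), 3), round(tm_score(aln, 100), 3))
```

In this toy case the longer alignment has a worse RMSD but a better TM-score, so the two measures disagree about which alignment is best. That disagreement is one reason to report several alignments of different lengths rather than a single one.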
Additional information
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology of Korea under Grant No. 2012R1A1A3013084.
The preliminary version of the paper was published in the Proceedings of EDB2012.
About this article
Cite this article
Kim, WC., Park, S. & Won, JI. CORE: Common Region Extension Based Multiple Protein Structure Alignment for Producing Multiple Solution. J. Comput. Sci. Technol. 28, 647–656 (2013). https://doi.org/10.1007/s11390-013-1365-x
DOI: https://doi.org/10.1007/s11390-013-1365-x