article

Efficient Parameterized Algorithms for Biopolymer Structure-Sequence Alignment

Authors:

Ying Xu,

Liming CaiAuthors Info & Claims

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Volume 3, Issue 4

Pages 423 - 432

https://doi.org/10.1109/TCBB.2006.52

Published: 01 October 2006 Publication History

Get Access

Abstract

Computational alignment of a biopolymer sequence (e.g., an RNA or a protein) to a structure is an effective approach to predict and search for the structure of new sequences. To identify the structure of remote homologs, the structure-sequence alignment has to consider not only sequence similarity, but also spatially conserved conformations caused by residue interactions and, consequently, is computationally intractable. It is difficult to cope with the inefficiency without compromising alignment accuracy, especially for structure search in genomes or large databases. This paper introduces a novel method and a parameterized algorithm for structure-sequence alignment. Both the structure and the sequence are represented as graphs, where, in general, the graph for a biopolymer structure has a naturally small tree width. The algorithm constructs an optimal alignment by finding in the sequence graph the maximum valued subgraph isomorphic to the structure graph. It has the computational time complexity O(k^{t} N^{2}) for the structure of N residues and its tree decomposition of width t. Parameter k, small in nature, is determined by a statistical cutoff for the correspondence between the structure and the sequence. This paper demonstrates a successful application of the algorithm to RNA structure search used for noncoding RNA identification. An application to protein threading is also discussed.

References

[1]

{1} S. Arnborg and A. Proskurowski, "Linear Time Algorithms for NP-Hard Problems Restricted to Partial k-Trees," Discrete Applied Math., vol. 23, pp. 11-24, 1989.

Digital Library

Google Scholar

[2]

{2} S. Arnborg, J. Lagergren, and D. Seese, "Easy Problems for Tree-Decomposable Graphs," J. Algorithms, vol. 12, pp. 308-340, 1991.

Digital Library

Google Scholar

[3]

{3} J. Bowie, R. Luthy, and D. Eisenberg, "A Method to Identify Protein Sequences that Fold into a Known Three-Dimensional Structure," Science, vol. 253, pp. 164-170, 1991.

Google Scholar

[4]

{4} H.L. Bodlaender, "A Linear Time Algorithm for Finding Tree-Decompositions of Small Treewidth," SIAM J. Computing, vol. 25, pp. 1305-1317, 1996.

Digital Library

Google Scholar

[5]

{5} S.H. Bryant and S.F. Altschul, "Statistics of Sequence-Structure Threading," Current Opinion Structural Biology, vol. 5, pp. 236-244, 1995.

Crossref

Google Scholar

[6]

{6} M. Brown and C. Wilson, "RNA Pseudoknot Modeling Using Intersections of Stochastic Context Free Grammars with Applications to Database Search," Proc. Pacific Symp. Biocomputing, pp. 109-125, 1995.

Google Scholar

[7]

{7} L. Cai, R. Malmberg, and Y. Wu, "Stochastic Modeling of Pseudoknot Structures: A Grammatical Approach," Bioinformatics, vol. 19, pp. i66-i73, 2003.

Crossref

Google Scholar

[8]

{8} T. Dandekar, S. Schuster, B. Snel, M. Huynen, and P. Bork, "Pathway Alignment: Application to the Comparative Analysis of Glycolytic Enzymes," Biochemical J., vol. 1, pp. 115-24, 1999.

Crossref

Google Scholar

[9]

{9} A.T. Dandjinou, N. Lévesque, S. Larose, J. Lucier, S.A. Elela, and R.J. Wellinger, "A Phylogenetically Based Secondary Structure for the Yeast Telomerase RNA," Current Biology, vol. 14, pp. 1148- 1158, 2004.

Crossref

Google Scholar

[10]

{10} J.A. Doudna, "Structural Genomics of RNA," Nature Structural Biology, vol. 7, no. 11 supp., pp. 954-956, 2000.

Crossref

Google Scholar

[11]

{11} R. Downey and M. Fellows, Parameterized Complexity. Springer, 1999.

Digital Library

Google Scholar

[12]

{12} R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge Univ. Press, 1998.

Crossref

Google Scholar

[13]

{13} S.R. Eddy, "Computational Genomics of Non-Coding RNA Genes," Cell, vol. 109, pp. 137-140, 2002.

Crossref

Google Scholar

[14]

{14} S. Eddy and R. Durbin, "RNA Sequence Analysis Using Covariance Models," Nucleic Acids Research, vol. 22, pp. 2079- 2088, 1994.

Crossref

Google Scholar

[15]

{15} D. Eppstein, "Subgraph Isomorphism in Planar Graphs and Related Problems," J. Graph Algorithms and Applications, vol. 3.3, pp. 1-27, 1999.

Crossref

Google Scholar

[16]

{16} S.J. Geobel, B. Hsue, T.F. Dombrowski, and P.S. Masters, "Characterization of the RNA Components of a Putative Molecular Switch in the 3' Untranslated Region of the Murine Coronavirus Genome," J. Virology, vol. 78, pp. 669-682, 2004.

Crossref

Google Scholar

[17]

{17} S. Griffiths-Jones, A. Bateman, M. Marshall, A. Khanna, and S.R. Eddy, "Rfam: An RNA Family Database," Nucleic Acids Research, vol. 31, pp. 439-441, 2003.

Crossref

Google Scholar

[18]

{18} R.J. Klein and S.R. Eddy, "RSEARCH: Finding Homologs of Single Structured RNA Sequences," BMC Bioinformatics, vol. 4, p. 44, 2003.

Crossref

Google Scholar

[19]

{19} R.H. Lathrop, "The Protein Threading Problem with Sequence Amino Acid Interaction Preferences Is NP-Complete," Protein Eng., vol. 7, pp. 1069-1068, 1994.

Crossref

Google Scholar

[20]

{20} R.H. Lathrop, R.G. Rogers Jr., J. Bienkowska, B.K.M. Bryant, L.J. Buturovic, C. Gaitatzes, R. Nambudripad, J.V. White, and T.F. Smith, "Analysis and Algorithms for Protein Sequence-Structure Alignment," Computational Methods in Molecular Biology, S.L. Salzberg, D.B. Searls, and S. Kasif, eds., Elsevier, 1998.

Google Scholar

[21]

{21} A. Leaver-Fay, B. Kuhlman, and J. Snoeyink, "An Adaptive Dynamic Programming Algorithm for the Side Chain Placement Problem," Proc. Pacific Symp. Biocomputing 10, pp. 16-27, 2005.

Google Scholar

[22]

{22} H.-P. Lenhof, K. Reinert, and M. Vingron, "A Polyhedral Approach to RNA Sequence Structure Alignment," J. Computational Biology, vol. 5, no. 3, pp. 517-530, 1998.

Crossref

Google Scholar

[23]

{23} C. Liu, Y. Song, R. Malmberg, and L. Cai, "Profiling and Searching for RNA Pseudoknot Structures in Genomes," Lecture Notes in Computer Science, vol. 3515, pp. 968-975, 2005.

Digital Library

Google Scholar

[24]

{24} T.M. Lowe and S.R. Eddy, "tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence," Nucleic Acids Research, vol. 25, pp. 955-964, 1997.

Google Scholar

[25]

{25} S.B. Lyngso and C.N. Pedersen, "RNA Pseudoknot Prediction in Energy-Based Models," J. Computational Biololgy, vol. 7, no. 3, pp. 409-427, 2000.

Crossref

Google Scholar

[26]

{26} J. Matousek and R. Thomas, "On the Complexity of Finding Iso-and Other Morphisms for Partial k-Trees," Discrete Math., vol. 108, pp. 343-364, 1992.

Digital Library

Google Scholar

[27]

{27} E.M. Marcotte, P. Matteo, H.L. Ng, D.W. Rice, T.O. Yeates, and D. Eisenberg, "Detecting Protein Function and Protein-Protein Interactions from Genome Sequences," Science, vol. 285, pp. 751- 753, 1999.

Crossref

Google Scholar

[28]

{28} N. Nameki, B. Felden, J.F. Atkins, R.F. Gesteland, H. Himeno, and A. Muto, "Functional and Structural Analysis of a Pseudoknot Upstream of the Tag-Encoded Sequence in E. coli tmRNA," J. Molecular Biology, vol. 286, no. 3, pp. 733-744, 1999.

Crossref

Google Scholar

[29]

{29} D.D. Pervouchine, "IRIS: Intermolecular RNA Interaction Search," Genome Informatics, vol. 15, no. 2, pp. 92-101, 2004.

Google Scholar

[30]

{30} K. Reinert, H.-P. Lenhof, P. Mutzel, K. Mehlhorn, and J.D. Kececioglu, "A Branch-and-Cut Algorithm for Multiple Sequence Alignment," Proc. First Ann. Int'l Conf. Computational Molecular Biology, pp. 241-250, 1997.

Digital Library

Google Scholar

[31]

{31} E. Rivas and S.R. Eddy, "Noncoding RNA Gene Detection Using Comparative Sequence Analysis," BMC Bioinformatics, vol. 2, p. 8, 2001.

Crossref

Google Scholar

[32]

{32} N. Robertson and P.D. Seymour, "Graph Minors II. Algorithmic Aspects of Tree-Width," J. Algorithms, vol. 7, pp. 309-322, 1986.

Google Scholar

[33]

{33} Y. Song, K. Ellrott, C. Liu, J. Guo, Y. Xu, and L. Cai, "Efficient Protein Threading with Tree Decomposition," manuscript, 2006.

Google Scholar

[34]

{34} Y. Song, C. Liu, R. Malmberg, F. Pan, and L. Cai, "Tree Decomposition Based Fast Search of RNA Secondary Structures in Genomes," Proc. 2005 IEEE Computational Systems Bioinformatics Conf., pp. 223-234, 2005.

Digital Library

Google Scholar

[35]

{35} Y. Uemura, A. Hasegawa, Y. Kobayashi, and T. Yokomori, "Tree Adjoining Grammars for RNA Structure Prediction," Theoretical Computer Science, vol. 210, pp. 277-303, 1999.

Digital Library

Google Scholar

[36]

{36} G. Wang and R.L. Dunbrack Jr., "PISCES: A Protein Sequence Culling Server," Bioinformatics, vol. 19, pp. 1589-1591, 2003.

Google Scholar

[37]

{37} D. Xu, M.A. Unseren, Y. Xu, and E.C. Uberbacher, "Sequence-Structure Specificity of a Knowledge Based Energy Function at the Secondary Structure Level," Bioinformatics, vol. 16, pp. 257-268, 2000.

Crossref

Google Scholar

[38]

{38} J. Xu, "Rapid Side-Chain Packing via Tree Decomposition," Proc. 2005 Int'l Conf. Research in Computational Biology, pp. 423-439, 2005.

Digital Library

Google Scholar

[39]

{39} J. Xu, F. Jiao, and B. Berger, "A Tree-Decomposition Approach to Protein Structure Prediction," Proc. 2005 IEEE Computational Systems Bioinformatics Conf., pp. 247-256, 2005.

Digital Library

Google Scholar

[40]

{40} J. Xu, M. Li, D. Kim, and Y. Xu, "RAPTOR: Optimal Protein Threading by Linear Programming," J. Bioinformatics and Computational Biology, vol. 1, no. 1, pp. 95-113, 2003.

Crossref

Google Scholar

[41]

{41} Y. Xu, Z. Liu, L. Cai, and D. Xu, "Protein Structure Prediction by Protein Threading," Computational Methods for Protein Structure Prediction and Modeling, Y. Xu, D. Xu and J. Liang, eds., Springer, 2006.

Google Scholar

[42]

{42} Y. Xu, D. Xu, and E.C. Uberbacher, "An Efficient Computational Method for Globally Optimal Threading," J. Computational Biology, vol. 5, no. 3, pp. 597-614, 1998.

Crossref

Google Scholar

Cited By

View all

Haan RKanj ISzeider S(2017)On the Parameterized Complexity of Finding Small Unsatisfiable Subsets of CNF Formulas and CSP InstancesACM Transactions on Computational Logic10.1145/309152818:3(1-46)Online publication date: 11-Aug-2017
https://dl.acm.org/doi/10.1145/3091528
Hassan OKanj ILokshtanov DPerković L(2016)On the Ordered List Subgraph Embedding ProblemsAlgorithmica10.1007/s00453-015-9980-274:3(992-1018)Online publication date: 1-Mar-2016
https://dl.acm.org/doi/10.1007/s00453-015-9980-2
Song YLiu CWang Z(2015)A machine learning approach for accurate annotation of noncoding RNAsIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2014.236675812:3(551-559)Online publication date: 1-May-2015
https://dl.acm.org/doi/10.1109/TCBB.2014.2366758
Show More Cited By

Index Terms

Efficient Parameterized Algorithms for Biopolymer Structure-Sequence Alignment
1. Applied computing
  1. Life and medical sciences
2. Computing methodologies
  1. Artificial intelligence
    1. Search methodologies
      1. Discrete space search
      2. Game tree search

Recommendations

Efficient parameterized algorithm for biopolymer structure-sequence alignment
WABI'05: Proceedings of the 5th International conference on Algorithms in Bioinformatics

Computational alignment of a biopolymer sequence (e.g., an RNA or a protein) to a structure is an effective approach to predict and search for the structure of new sequences. To identify the structure of remote homologs, the structure-sequence alignment ...
Local RNA structure alignment with incomplete sequence

Motivation: Accuracy of automated structural RNA alignment is improved by using models that consider not only primary sequence but also secondary structure information. However, current RNA structural alignment approaches tend to perform poorly on ...
MicroRNA identification based on sequence and structure alignment

Motivation: MicroRNAs (miRNA) are ~22 nt long non-coding RNAs that are derived from larger hairpin RNA precursors and play important regulatory roles in both animals and plants. The short length of the miRNA sequences and relatively low conservation ...

Reviews

Reviewer: Constantin S Chassapis

The structure-sequence alignment between a pseudoknotted ribonucleic acid (RNA) structure and a sequence is a nondeterministic polynomial-time (NP)-hard problem. A way to attack these types of problems is with a parameterized complexity approach, where a parameter, say k, of the problem (in this case, a statistical cut-off for the correspondence between the biopolymer sequence and structure) is included in the problem specification. An algorithm in such a case is called fixed-parameter tractable, and has a running time of the type f(k)Nc, where N is the overall input size (in this case, the number of amino acids of the RNA macromolecule), c is an independent constant, and f is an arbitrary function. Through excellent use of graph theory and by treating the alignment question as a subgraph isomorphism problem, the authors manage to formulate an efficient algorithm of a just O(kt N2) behavior (t is the tree decomposition width of the tree decomposition of the graph that models the biopolymer) in comparison, for example, with an O(Nk) behavior of other algorithms in the literature with similar objectives. The parameterized algorithm presented in the paper has one more novelty, apart from the aforementioned formulation of the structure-sequence alignment into the subgraph isomorphism problem, that previous algorithms do not have: the structure-sequence alignment happens at the structure unit level (a unit is a stretch of residues as a whole) instead of at the residue level, allowing for the consideration of sophisticated biomolecular structure models, like, for example, ones with pseudoknots. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE/ACM Transactions on Computational Biology and Bioinformatics

IEEE/ACM Transactions on Computational Biology and Bioinformatics Volume 3, Issue 4

October 2006

117 pages

ISSN:1545-5963

Issue’s Table of Contents

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published: 01 October 2006

Published in TCBB Volume 3, Issue 4

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

9
Total Citations
View Citations
397
Total Downloads

Downloads (Last 12 months)2
Downloads (Last 6 weeks)0

Reflects downloads up to 19 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Haan RKanj ISzeider S(2017)On the Parameterized Complexity of Finding Small Unsatisfiable Subsets of CNF Formulas and CSP InstancesACM Transactions on Computational Logic10.1145/309152818:3(1-46)Online publication date: 11-Aug-2017
Hassan OKanj ILokshtanov DPerković L(2016)On the Ordered List Subgraph Embedding ProblemsAlgorithmica10.1007/s00453-015-9980-274:3(992-1018)Online publication date: 1-Mar-2016
Song YLiu CWang Z(2015)A machine learning approach for accurate annotation of noncoding RNAsIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2014.236675812:3(551-559)Online publication date: 1-May-2015
Song YChi A(2015)Peptide sequencing via graph path decompositionInformation Sciences: an International Journal10.1016/j.ins.2015.01.003301:C(262-270)Online publication date: 20-Apr-2015
Hamid ATran Q(2015)Emerging Trends in Computational Biology, Bioinformatics, and Systems BiologyundefinedOnline publication date: 21-Aug-2015
Ding LSamad AXue XHuang XMalmberg RCai L(2014)Stochastic k-Tree Grammar and Its Application in Biomolecular Structure ModelingProceedings of the 8th International Conference on Language and Automata Theory and Applications - Volume 837010.1007/978-3-319-04921-2_25(308-322)Online publication date: 10-Mar-2014
Denise ARinaudo P(2013)Optimisation Problems for Pairwise RNA Sequence and Structure ComparisonTransactions on Computational Intelligence XIII - Volume 834210.1007/978-3-642-54455-2_3(70-82)Online publication date: 1-Aug-2013
Ma CWong TLam THon WSadakane KYiu S(2012)An Efficient Alignment Algorithm for Searching Simple Pseudoknots over Long Genomic SequenceIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2012.1049:6(1629-1638)Online publication date: 1-Nov-2012
Wong TChiu YLam TYiu S(2012)Memory Efficient Algorithms for Structural Alignment of RNAs with PseudoknotsIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2011.669:1(161-168)Online publication date: 1-Jan-2012

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Cited By

Index Terms

Recommendations

Efficient parameterized algorithm for biopolymer structure-sequence alignment

Local RNA structure alignment with incomplete sequence

MicroRNA identification based on sequence and structure alignment

Reviews

Access critical reviews of Computing literature here