Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Efficient Parameterized Algorithms for Biopolymer Structure-Sequence Alignment

Published: 01 October 2006 Publication History

Abstract

Computational alignment of a biopolymer sequence (e.g., an RNA or a protein) to a structure is an effective approach to predict and search for the structure of new sequences. To identify the structure of remote homologs, the structure-sequence alignment has to consider not only sequence similarity, but also spatially conserved conformations caused by residue interactions and, consequently, is computationally intractable. It is difficult to cope with the inefficiency without compromising alignment accuracy, especially for structure search in genomes or large databases. This paper introduces a novel method and a parameterized algorithm for structure-sequence alignment. Both the structure and the sequence are represented as graphs, where, in general, the graph for a biopolymer structure has a naturally small tree width. The algorithm constructs an optimal alignment by finding in the sequence graph the maximum valued subgraph isomorphic to the structure graph. It has the computational time complexity O(k^{t} N^{2}) for the structure of N residues and its tree decomposition of width t. Parameter k, small in nature, is determined by a statistical cutoff for the correspondence between the structure and the sequence. This paper demonstrates a successful application of the algorithm to RNA structure search used for noncoding RNA identification. An application to protein threading is also discussed.

References

[1]
{1} S. Arnborg and A. Proskurowski, "Linear Time Algorithms for NP-Hard Problems Restricted to Partial k-Trees," Discrete Applied Math., vol. 23, pp. 11-24, 1989.
[2]
{2} S. Arnborg, J. Lagergren, and D. Seese, "Easy Problems for Tree-Decomposable Graphs," J. Algorithms, vol. 12, pp. 308-340, 1991.
[3]
{3} J. Bowie, R. Luthy, and D. Eisenberg, "A Method to Identify Protein Sequences that Fold into a Known Three-Dimensional Structure," Science, vol. 253, pp. 164-170, 1991.
[4]
{4} H.L. Bodlaender, "A Linear Time Algorithm for Finding Tree-Decompositions of Small Treewidth," SIAM J. Computing, vol. 25, pp. 1305-1317, 1996.
[5]
{5} S.H. Bryant and S.F. Altschul, "Statistics of Sequence-Structure Threading," Current Opinion Structural Biology, vol. 5, pp. 236-244, 1995.
[6]
{6} M. Brown and C. Wilson, "RNA Pseudoknot Modeling Using Intersections of Stochastic Context Free Grammars with Applications to Database Search," Proc. Pacific Symp. Biocomputing, pp. 109-125, 1995.
[7]
{7} L. Cai, R. Malmberg, and Y. Wu, "Stochastic Modeling of Pseudoknot Structures: A Grammatical Approach," Bioinformatics, vol. 19, pp. i66-i73, 2003.
[8]
{8} T. Dandekar, S. Schuster, B. Snel, M. Huynen, and P. Bork, "Pathway Alignment: Application to the Comparative Analysis of Glycolytic Enzymes," Biochemical J., vol. 1, pp. 115-24, 1999.
[9]
{9} A.T. Dandjinou, N. Lévesque, S. Larose, J. Lucier, S.A. Elela, and R.J. Wellinger, "A Phylogenetically Based Secondary Structure for the Yeast Telomerase RNA," Current Biology, vol. 14, pp. 1148- 1158, 2004.
[10]
{10} J.A. Doudna, "Structural Genomics of RNA," Nature Structural Biology, vol. 7, no. 11 supp., pp. 954-956, 2000.
[11]
{11} R. Downey and M. Fellows, Parameterized Complexity. Springer, 1999.
[12]
{12} R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge Univ. Press, 1998.
[13]
{13} S.R. Eddy, "Computational Genomics of Non-Coding RNA Genes," Cell, vol. 109, pp. 137-140, 2002.
[14]
{14} S. Eddy and R. Durbin, "RNA Sequence Analysis Using Covariance Models," Nucleic Acids Research, vol. 22, pp. 2079- 2088, 1994.
[15]
{15} D. Eppstein, "Subgraph Isomorphism in Planar Graphs and Related Problems," J. Graph Algorithms and Applications, vol. 3.3, pp. 1-27, 1999.
[16]
{16} S.J. Geobel, B. Hsue, T.F. Dombrowski, and P.S. Masters, "Characterization of the RNA Components of a Putative Molecular Switch in the 3' Untranslated Region of the Murine Coronavirus Genome," J. Virology, vol. 78, pp. 669-682, 2004.
[17]
{17} S. Griffiths-Jones, A. Bateman, M. Marshall, A. Khanna, and S.R. Eddy, "Rfam: An RNA Family Database," Nucleic Acids Research, vol. 31, pp. 439-441, 2003.
[18]
{18} R.J. Klein and S.R. Eddy, "RSEARCH: Finding Homologs of Single Structured RNA Sequences," BMC Bioinformatics, vol. 4, p. 44, 2003.
[19]
{19} R.H. Lathrop, "The Protein Threading Problem with Sequence Amino Acid Interaction Preferences Is NP-Complete," Protein Eng., vol. 7, pp. 1069-1068, 1994.
[20]
{20} R.H. Lathrop, R.G. Rogers Jr., J. Bienkowska, B.K.M. Bryant, L.J. Buturovic, C. Gaitatzes, R. Nambudripad, J.V. White, and T.F. Smith, "Analysis and Algorithms for Protein Sequence-Structure Alignment," Computational Methods in Molecular Biology, S.L. Salzberg, D.B. Searls, and S. Kasif, eds., Elsevier, 1998.
[21]
{21} A. Leaver-Fay, B. Kuhlman, and J. Snoeyink, "An Adaptive Dynamic Programming Algorithm for the Side Chain Placement Problem," Proc. Pacific Symp. Biocomputing 10, pp. 16-27, 2005.
[22]
{22} H.-P. Lenhof, K. Reinert, and M. Vingron, "A Polyhedral Approach to RNA Sequence Structure Alignment," J. Computational Biology, vol. 5, no. 3, pp. 517-530, 1998.
[23]
{23} C. Liu, Y. Song, R. Malmberg, and L. Cai, "Profiling and Searching for RNA Pseudoknot Structures in Genomes," Lecture Notes in Computer Science, vol. 3515, pp. 968-975, 2005.
[24]
{24} T.M. Lowe and S.R. Eddy, "tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence," Nucleic Acids Research, vol. 25, pp. 955-964, 1997.
[25]
{25} S.B. Lyngso and C.N. Pedersen, "RNA Pseudoknot Prediction in Energy-Based Models," J. Computational Biololgy, vol. 7, no. 3, pp. 409-427, 2000.
[26]
{26} J. Matousek and R. Thomas, "On the Complexity of Finding Iso-and Other Morphisms for Partial k-Trees," Discrete Math., vol. 108, pp. 343-364, 1992.
[27]
{27} E.M. Marcotte, P. Matteo, H.L. Ng, D.W. Rice, T.O. Yeates, and D. Eisenberg, "Detecting Protein Function and Protein-Protein Interactions from Genome Sequences," Science, vol. 285, pp. 751- 753, 1999.
[28]
{28} N. Nameki, B. Felden, J.F. Atkins, R.F. Gesteland, H. Himeno, and A. Muto, "Functional and Structural Analysis of a Pseudoknot Upstream of the Tag-Encoded Sequence in E. coli tmRNA," J. Molecular Biology, vol. 286, no. 3, pp. 733-744, 1999.
[29]
{29} D.D. Pervouchine, "IRIS: Intermolecular RNA Interaction Search," Genome Informatics, vol. 15, no. 2, pp. 92-101, 2004.
[30]
{30} K. Reinert, H.-P. Lenhof, P. Mutzel, K. Mehlhorn, and J.D. Kececioglu, "A Branch-and-Cut Algorithm for Multiple Sequence Alignment," Proc. First Ann. Int'l Conf. Computational Molecular Biology, pp. 241-250, 1997.
[31]
{31} E. Rivas and S.R. Eddy, "Noncoding RNA Gene Detection Using Comparative Sequence Analysis," BMC Bioinformatics, vol. 2, p. 8, 2001.
[32]
{32} N. Robertson and P.D. Seymour, "Graph Minors II. Algorithmic Aspects of Tree-Width," J. Algorithms, vol. 7, pp. 309-322, 1986.
[33]
{33} Y. Song, K. Ellrott, C. Liu, J. Guo, Y. Xu, and L. Cai, "Efficient Protein Threading with Tree Decomposition," manuscript, 2006.
[34]
{34} Y. Song, C. Liu, R. Malmberg, F. Pan, and L. Cai, "Tree Decomposition Based Fast Search of RNA Secondary Structures in Genomes," Proc. 2005 IEEE Computational Systems Bioinformatics Conf., pp. 223-234, 2005.
[35]
{35} Y. Uemura, A. Hasegawa, Y. Kobayashi, and T. Yokomori, "Tree Adjoining Grammars for RNA Structure Prediction," Theoretical Computer Science, vol. 210, pp. 277-303, 1999.
[36]
{36} G. Wang and R.L. Dunbrack Jr., "PISCES: A Protein Sequence Culling Server," Bioinformatics, vol. 19, pp. 1589-1591, 2003.
[37]
{37} D. Xu, M.A. Unseren, Y. Xu, and E.C. Uberbacher, "Sequence-Structure Specificity of a Knowledge Based Energy Function at the Secondary Structure Level," Bioinformatics, vol. 16, pp. 257-268, 2000.
[38]
{38} J. Xu, "Rapid Side-Chain Packing via Tree Decomposition," Proc. 2005 Int'l Conf. Research in Computational Biology, pp. 423-439, 2005.
[39]
{39} J. Xu, F. Jiao, and B. Berger, "A Tree-Decomposition Approach to Protein Structure Prediction," Proc. 2005 IEEE Computational Systems Bioinformatics Conf., pp. 247-256, 2005.
[40]
{40} J. Xu, M. Li, D. Kim, and Y. Xu, "RAPTOR: Optimal Protein Threading by Linear Programming," J. Bioinformatics and Computational Biology, vol. 1, no. 1, pp. 95-113, 2003.
[41]
{41} Y. Xu, Z. Liu, L. Cai, and D. Xu, "Protein Structure Prediction by Protein Threading," Computational Methods for Protein Structure Prediction and Modeling, Y. Xu, D. Xu and J. Liang, eds., Springer, 2006.
[42]
{42} Y. Xu, D. Xu, and E.C. Uberbacher, "An Efficient Computational Method for Globally Optimal Threading," J. Computational Biology, vol. 5, no. 3, pp. 597-614, 1998.

Cited By

View all
  • (2017)On the Parameterized Complexity of Finding Small Unsatisfiable Subsets of CNF Formulas and CSP InstancesACM Transactions on Computational Logic10.1145/309152818:3(1-46)Online publication date: 11-Aug-2017
  • (2016)On the Ordered List Subgraph Embedding ProblemsAlgorithmica10.1007/s00453-015-9980-274:3(992-1018)Online publication date: 1-Mar-2016
  • (2015)A machine learning approach for accurate annotation of noncoding RNAsIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2014.236675812:3(551-559)Online publication date: 1-May-2015
  • Show More Cited By

Recommendations

Reviews

Constantin S Chassapis

The structure-sequence alignment between a pseudoknotted ribonucleic acid (RNA) structure and a sequence is a nondeterministic polynomial-time (NP)-hard problem. A way to attack these types of problems is with a parameterized complexity approach, where a parameter, say k, of the problem (in this case, a statistical cut-off for the correspondence between the biopolymer sequence and structure) is included in the problem specification. An algorithm in such a case is called fixed-parameter tractable, and has a running time of the type f(k)Nc, where N is the overall input size (in this case, the number of amino acids of the RNA macromolecule), c is an independent constant, and f is an arbitrary function. Through excellent use of graph theory and by treating the alignment question as a subgraph isomorphism problem, the authors manage to formulate an efficient algorithm of a just O(kt N2) behavior (t is the tree decomposition width of the tree decomposition of the graph that models the biopolymer) in comparison, for example, with an O(Nk) behavior of other algorithms in the literature with similar objectives. The parameterized algorithm presented in the paper has one more novelty, apart from the aforementioned formulation of the structure-sequence alignment into the subgraph isomorphism problem, that previous algorithms do not have: the structure-sequence alignment happens at the structure unit level (a unit is a stretch of residues as a whole) instead of at the residue level, allowing for the consideration of sophisticated biomolecular structure models, like, for example, ones with pseudoknots. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE/ACM Transactions on Computational Biology and Bioinformatics
IEEE/ACM Transactions on Computational Biology and Bioinformatics  Volume 3, Issue 4
October 2006
117 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published: 01 October 2006
Published in TCBB Volume 3, Issue 4

Author Tags

  1. RNA structure homology search
  2. Structure-sequence alignment
  3. dynamic programming
  4. parameterized algorithm
  5. protein threading.
  6. tree decomposition

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 19 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2017)On the Parameterized Complexity of Finding Small Unsatisfiable Subsets of CNF Formulas and CSP InstancesACM Transactions on Computational Logic10.1145/309152818:3(1-46)Online publication date: 11-Aug-2017
  • (2016)On the Ordered List Subgraph Embedding ProblemsAlgorithmica10.1007/s00453-015-9980-274:3(992-1018)Online publication date: 1-Mar-2016
  • (2015)A machine learning approach for accurate annotation of noncoding RNAsIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2014.236675812:3(551-559)Online publication date: 1-May-2015
  • (2015)Peptide sequencing via graph path decompositionInformation Sciences: an International Journal10.1016/j.ins.2015.01.003301:C(262-270)Online publication date: 20-Apr-2015
  • (2015)Emerging Trends in Computational Biology, Bioinformatics, and Systems BiologyundefinedOnline publication date: 21-Aug-2015
  • (2014)Stochastic k-Tree Grammar and Its Application in Biomolecular Structure ModelingProceedings of the 8th International Conference on Language and Automata Theory and Applications - Volume 837010.1007/978-3-319-04921-2_25(308-322)Online publication date: 10-Mar-2014
  • (2013)Optimisation Problems for Pairwise RNA Sequence and Structure ComparisonTransactions on Computational Intelligence XIII - Volume 834210.1007/978-3-642-54455-2_3(70-82)Online publication date: 1-Aug-2013
  • (2012)An Efficient Alignment Algorithm for Searching Simple Pseudoknots over Long Genomic SequenceIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2012.1049:6(1629-1638)Online publication date: 1-Nov-2012
  • (2012)Memory Efficient Algorithms for Structural Alignment of RNAs with PseudoknotsIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2011.669:1(161-168)Online publication date: 1-Jan-2012

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media