Efficient methods for multiple sequence alignment with guaranteed error bounds

Dan Gusfield¹

187 Accesses
113 Citations
Explore all metrics

Abstract

Multiple string (sequence) alignment is a difficult and important problem in computational biology, where it is central in two related tasks: finding highly conserved subregions or embedded patterns of a set of biological sequences (strings of DNA, RNA or amino acids), and inferring the evolutionary history of a set of taxa from their associated biological sequences. Several precise measures have been proposed for evaluating the goodness of a multiple alignment, but no efficient methods are known which compute the optimal alignment for any of these measures in any but small cases. In this paper, we consider two previously proposed measures, and given two computationaly efficient multiple alignment methods (one for each measure) whose deviation from the optimal value isguaranteed to be less than a factor of two. This is the novel feature of these methods, but the methods have additional virtues as well. For both methods, the guaranteed bounds are much smaller than two when the number of strings is small (1.33 for three strings of any length); for one of the methods we give a related randomized method which is much faster and which gives, with high probability, multiple alignments with fairly small error bounds; and for the other measure, the method given yields a non-obviouslower bound on the value of the optimal alignment.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Literature

Altschul, S. 1989. Gap costs for multiple alignment.J. theor. Biol. 138, 297–309.
MathSciNet Google Scholar
Altschul, S. and D. Lipman. 1989. Trees, stars, and multiple sequence alignment.SIAM J. appl. Math 49, 197–209.
Article MATH MathSciNet Google Scholar
Argos, P. and M. Vingren. 1990. Sensitivity comparisons of protein amino acid sequences. InMethods in Enzymology. Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences, R. Doolittle (Ed.), Vol. 183, pp. 352–365. New York: Academic Press.
Google Scholar
Bacon, D. and W. Anderson. 1986. Multiple sequence alignment.J. molec. Biol. 191, 153–161.
Article Google Scholar
Carrillo, H. and D. Lipman. 1988. The multiple sequence alignment problem in biology.SIAM J. appl. Math 48, 1073–1082.
Article MATH MathSciNet Google Scholar
Doolittle, R. 1986.Of Urfs and Orfs: A Primer on How to Analyze Derived Amino Acid Sequences. Mill Valley, CA: University Science Books.
Google Scholar
Feng, D. and R. Doolittle. 1987. Progressive sequence alignment as a prerequisite to correct phylogenetic trees.J. molec. Evol. 25, 351–360.
Google Scholar
Gusfield, D. 1984.The Steiner Tree Problem in Phylogeny. Technical report No. 334. Yale University Computer Science Department, September 1984.
Johnson, M. and R. Doolittle. 1986. A method for the simultaneous alignment of three or more amino acid sequences.J. molec. Evol. 23, 267–278.
Article Google Scholar
Jukes, T. H. and C. R. Cantor. 1969. Evolution of protein molecules. InMammalian Protein Metabolism, H. N. Munro (Ed.), pp. 21–132. New York: Academic Press.
Google Scholar
Kou, L., G. Markowsky and L. Berman. 1981. A fast algorithm for steiner trees.ACTA Informatica 15.
Lipman, D., S. Altshul and J. Kececioglu. 1980. A tool for multiple sequence alignment.Proc. natn. Acad. Sci., U.S.A. 86, 4412–4415.
Article Google Scholar
Murata, M., J. Richardson and Joel Sussman. 1985. Simultaneous comparison of three protein sequences.Proc. natn. Acad. Sci., U.S.A.,82, 3073–3077.
Article Google Scholar
Sankoff, D. and R. Cedergren. 1983. Simultaneous comparisons of three or more sequences related by a tree. InTime Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, D. Sankoff and J. Kruskal (Eds), pp. 253–264. Reading, MA: Addison Wesley.
Google Scholar
Sankoff, D. and J. Kruskal (Eds). 1983.Time Warps, String Edits, and Macromolecules: the Theory and Pactice of Sequence Comparison. Reading, MA: Addison Wesley.
Google Scholar
Schuler, G. D., S. F. Altschul and D. J. Lipman. 1992. A workbench for multiple alignment construction and analysis. InProteins: Structure, Function and Genetics, in press.
Schwarz, R. and M. Dayhoff. 1979. Matrices for detecting distant relationships. InAtlas of Protein Sequences, M. Dayhoff (Ed.), pp. 353–358. National Biomedical Research Foundation.
Tarjan, R. E. 1983.Data Structures and Network Algorithms. Philadelphia, PA: SIAM.
Google Scholar
Waterman, M. 1986. Multiple sequence alignment by consensus.Nucleic Acids Res. 14, 9095–9102.
MathSciNet Google Scholar
Wong, R. 1980. Worst-case analysis of network design problem heuristics.SIAM J. Algebraic Discrete Methods 1, 51–63.
MATH MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

computer Science Division, University of California, 95616-8755, Davis, CA, U.S.A.
Dan Gusfield

Authors

Dan Gusfield
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

About this article

Cite this article

Gusfield, D. Efficient methods for multiple sequence alignment with guaranteed error bounds. Bltn Mathcal Biology 55, 141–154 (1993). https://doi.org/10.1007/BF02460299

Download citation

Received: 11 July 1991
Revised: 12 November 1991
Issue Date: January 1993
DOI: https://doi.org/10.1007/BF02460299

Efficient methods for multiple sequence alignment with guaranteed error bounds

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Dynamic Programming

PicXAA: A Probabilistic Scheme for Finding the Maximum Expected Accuracy Alignment of Multiple Biological Sequences

Objective Functions

Literature

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Efficient methods for multiple sequence alignment with guaranteed error bounds

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Dynamic Programming

PicXAA: A Probabilistic Scheme for Finding the Maximum Expected Accuracy Alignment of Multiple Biological Sequences

Objective Functions

Literature

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation