Abstract
With the increased democratization of sequencing, the reliance of sequence assembly programs on heuristics is at odds with the need for black-box assembly solutions that can be used reliably by non-specialists. In this work, we present a formal definition for in silico assembly validation and finishing and explore the feasibility of an exact solution for this problem using quadratic programming (FinIS). Based on results for several real and simulated datasets, we demonstrate that FinIS validates the correctness of a larger fraction of the assembly than existing ad hoc tools. Using a test for unique optimal solutions, we show that FinIS can improve on both precision and recall values for the correctness of assembled sequences, when compared to competing programs. Source code and executables for FinIS are freely available at http://sourceforge.net/projects/finis/ .
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Li, Y., Zheng, H., Luo, R., et al.: Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly. Nature Biotechnology 29, 6723–6730 (2011)
Birol, I., Jackman, S.D., Nielsen, C.B., et al.: De novo transcriptome assembly with ABySS. Bioinformatics 25(21), 2872–2877 (2009)
Woyke, T., Teeling, H., Ivanova, N.N., et al.: Symbiosis insights through metagenomic analysis of a microbial consortium. Nature 443, 950–955 (2006)
Nagarajan, N., Pop, M.: Sequencing and genome assembly using next-generation technologies. Methods in Molecular Biology 673, 1–17 (2010)
Baker, M.: De novo genome assembly: what every biologist should know. Nature Methods 9, 333–337 (2012)
Nagarajan, N., Pop, M.: Parametric complexity of sequence assembly: theory and applications to next generation sequencing. Journal of Computational Biology 16(7), 897–908 (2009)
Gao, S., Sung, W.K., Nagarajan, N.: Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. Journal of Computational Biology 18(11), 1681–1691 (2011)
Pop, M., Kosack, S.D., Salzberg, S.L.: Hierarchical scaffolding with bambus. Genome Research 14, 149–159 (2004)
Nagarajan, N., Read, T.D., Pop, M.: Scaffolding and validation of bacterial genome assemblies using optical restriction maps. Bioinformatics 24(10), 1229–1235 (2008)
Pop, M., Phillipy, A., Delcher, A.L., Salzberg, S.L.: Comparative genome assembly. Briefings in Bioinformatics 5(3), 237–248 (2004)
Nagarajan, N., Cook, C., Bonaventura, M.D., et al.: Finishing genomes with limited resources: lessons from an ensemble of microbial genomes. BMC Genomics 11(242) (2010)
Zerbino, D.R., McEwen, G.K., Marguiles, E.H., Birney, E.: Pebble and rock band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler. PLoS ONE 4(12) (2009)
Li, R.H., Zhu, J., Ruan, W., et al.: De novo assembly of human genomes with massively parallel short read sequencing. Genome Research 20, 265–272 (2010)
Tsai, I.J., Otto, T.D., Berriman, M.: Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps. Genome Biology 11, R41 (2010)
Kececioglu, J.D., Myers, E.W.: Combinatorial algorithms for DNA sequence assembly. Algorithmica 13, 7–51 (1993)
Pevzner, P.A., Tang, H., Waterman, M.S.: A Eularian path approach to DNA fragment assembly. Proceedings of the National Academy of Sciences 98(17), 9748–9753 (2001)
Myers, E.W.: The fragment assembly string graph. Bioinformatics 21(2), 79–85 (2005)
Zerbino, D., Birney, E.: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research (2008), doi:10.1101/gr.074492.107
Karger, D., Motwani, R., Ramkumar, G.D.S.: On approximating the longest path in a graph. Algorithmica 18, 421–432 (1993)
Kleinberg, J.M.: Approximation algorithms for disjoint path problems. Ph.D Thesis, Dept. of EECS. MIT (1996)
Fleischner, H.: Algorithms for Eulerian Trails, Eulerian Graphs and Related Topics. Annals of Discrete Mathematics, Part 1 2(50), X.1C13 (1991)
Kingsford, C., Schatz, M.C., Pop, M.: Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics 11(21) (2010)
Richter, D.C., Ott, F., Schmid, R., Huson, D.H.: Metasim: a sequencing simulator for genomics and metagenomics. PloS One 3(10) (2008)
Kelley, D.R., Schatz, M.C., Salzberg, S.L.: Quake: quality-aware detection and correction of sequencing errors. Genome Biology 11, R116 (2010)
Kurtz, S.A., Phillippy, A., Delcher, A.L., et al.: Versatile and open software for comparing large genomes. Genome Biology 5, R12 (2004)
Katoh, K., Misawa, K., Kuma, K., Miyata, T.: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30(14) (2002)
Jarrod, A.C., Isaac, H., Sirisha, S., Shujun, L., Gary, P.S., Daniel, S.R.: Meraculous: De Novo Genome Assembly with Short Paired-End Reads. PLoS ONE 6(8), e23501 (2011), doi:10.1371/journal.pone.0023501
Vandenberghe, L., Boyd, S.: Semidefinite Programming. SIAM Review 38, 49–95 (1996)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gao, S., Bertrand, D., Nagarajan, N. (2012). FinIS: Improved in silico Finishing Using an Exact Quadratic Programming Formulation. In: Raphael, B., Tang, J. (eds) Algorithms in Bioinformatics. WABI 2012. Lecture Notes in Computer Science(), vol 7534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33122-0_25
Download citation
DOI: https://doi.org/10.1007/978-3-642-33122-0_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33121-3
Online ISBN: 978-3-642-33122-0
eBook Packages: Computer ScienceComputer Science (R0)