DOI: 10.1145/2808719.2808736
research-article

Bermuda: de novo assembly of transcripts with new insights for handling uneven coverage

Published: 09 September 2015

Abstract

Motivation: RNA-seq has made it feasible to analyze the whole set of expressed mRNAs. Mapping-based assembly of RNA-seq reads is sometimes infeasible because high-quality references are lacking. De novo assembly, however, is very challenging because expression levels are uneven among transcripts and read coverage varies within a single transcript. Existing methods either apply de Bruijn graphs of single-sized k-mers to assemble the full set of transcripts, or conduct multiple runs of assembly but still apply graphs of single-sized k-mers in each run. A single k-mer size, however, is not suitable for all regions of transcripts with varied coverage.
Contribution: This paper presents Bermuda, a de novo assembler with new insights for handling uneven coverage. In contrast to existing methods that use a single k-mer size for all the transcripts in each run of assembly, Bermuda self-adaptively uses a few k-mer sizes to assemble different regions of a single transcript according to their local coverage. As such, Bermuda can deal with uneven expression levels and coverage not only among transcripts but also within a single transcript. Extensive tests show that Bermuda outperforms popular de novo assemblers in reconstructing unevenly expressed transcripts, with longer length, better contiguity and lower redundancy. Further, Bermuda is computationally efficient with moderate memory consumption.
Availability: Supplementary materials are available through http://ttic.uchicago.edu/~qmtang/
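
The idea of choosing a k-mer size from local coverage can be illustrated with a minimal Python sketch. This is not Bermuda's actual algorithm, which the abstract does not specify. The function names, the candidate k-mer sizes (21, 25, 31) and the coverage thresholds (5.0, 20.0) are all hypothetical and chosen only for illustration. The sketch assumes that mean k-mer count is an acceptable proxy for local read coverage, and that low-coverage regions favor a smaller k (to keep overlaps between reads) while high-coverage regions can afford a larger k.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def local_coverage(seq, counts, k):
    """Mean count of the k-mers in seq: a crude proxy for local coverage."""
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    return sum(counts[seq[i:i + k]] for i in range(n)) / n


def choose_k(coverage, k_sizes=(21, 25, 31), thresholds=(5.0, 20.0)):
    """Pick a smaller k for low coverage and a larger k for high coverage.

    k_sizes and thresholds are hypothetical values, not taken from the paper.
    """
    for k, limit in zip(k_sizes, thresholds):
        if coverage < limit:
            return k
    return k_sizes[-1]


if __name__ == "__main__":
    # Toy reads and a toy k of 3, only to show how the pieces fit together.
    reads = ["ACGT", "ACGA"]
    counts = kmer_counts(reads, 3)
    cov = local_coverage("ACGT", counts, 3)
    print(cov, choose_k(cov))
```

In a real assembler the coverage estimate would be computed per region of a growing contig rather than per read, so different regions of one transcript could end up assembled with different k-mer sizes.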

Published In

BCB '15: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics
September 2015
683 pages
ISBN: 9781450338530
DOI: 10.1145/2808719

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. RNA-seq
2. de novo assembly
3. multiple k-mer
4. uneven coverage

Funding Sources

• National Science Foundation CAREER award
• Alfred P. Sloan Fellowship

Conference

BCB '15

Acceptance Rates

BCB '15 Paper Acceptance Rate: 48 of 141 submissions, 34%
Overall Acceptance Rate: 254 of 885 submissions, 29%