Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1854776.1854801acmconferencesArticle/Chapter ViewAbstractPublication PagesbcbConference Proceedingsconference-collections
research-article

REAL: an efficient REad ALigner for next generation sequencing reads

Published: 02 August 2010 Publication History

Abstract

Motivation: The constant advances in sequencing technology are turning whole-genome sequencing into a routine procedure, resulting in massive amounts of data that need to be processed. Tens of gigabytes of data in the form of short reads need to be mapped back to reference sequences, a few gigabases long. A first generation of short read alignment software successfully employed hash tables, and the current second generation uses Burrows-Wheeler Transform, further improving mapping speed. However, there is still demand for faster and more accurate mapping.
Results: In this paper, we present REad ALigner, an efficient, accurate and consistent tool for aligning short reads obtained from next generation sequencing. It is based on a new, simple, yet efficient mapping algorithm that can match and outperform current BWT-based software.

References

[1]
http://www.1000genomes.org/.
[2]
http://www3.appliedbiosystems.com.
[3]
http://www.454.com.
[4]
http://www.illumina.com.
[5]
S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic Local Alignment Search Tool. Journal of Molecular Biology, 215(3):403--410, October 1990.
[6]
P. Antoniou, J. W. Daykin, C. S. Iliopoulos, D. Kourie, L. Mouchard, and S. P. Pissis. Mapping uniquely occurring short sequences derived from high throughput technologies to a reference genome. In Proceedings of Information Technology and Applications in Biomedicine (ITAB 09), 2009.
[7]
P. Antoniou, C. S. Iliopoulos, L. Mouchard, and S. P. Pissis. A fast and efficient algorithm for mapping short sequences to a reference genome. Advances in Experimental Medicine and Biology. Springer, 2010. (in press).
[8]
M. Burrows and D. Wheeler. A block sorting lossless data compression algorithm. Digital Equipment Corporation, Technical Report 124, 1994.
[9]
A. Cox. ELAND, unpublished. 2007.
[10]
A. V. Dalca and M. Brudno. Genome variation discovery with high-throughput sequencing data. Briefings in Bioinformatics, 11(1):3--14, 2010.
[11]
H. Jiang and W. H. Wong. SeqMap: mapping massive amount of oligonucleotides to the genome. Bioinformatics, 24(20):2395--2396, 2008.
[12]
W. J. Kent. BLAT - the BLAST-like alignment tool. Genome Research, 12:656--664, 2002.
[13]
B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3):R25.
[14]
H. Li and R. Durbin. Fast and accurate short read alignment with Burrow-Wheeler transform. Bioinformatics, 25(14):1754--1760, 2009.
[15]
H. Li, J. Ruan, and R. Durbin. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research, 18:1851--1858, 2008.
[16]
R. Li, Y. Li, K. Kristiansen, and J. Wang. SOAP: short oligonucleotide alignment program. Bioinformatics, 24(5):713--714, 2008.
[17]
R. Li, C. Yu, Y. Li, T.-W. Lam, S.-M. Yiu, K. Kristiansen, and J. Wang. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics, 25(16):1966--1967, 2009.
[18]
R. Li, H. Zhu, J. Ruan, W. Qian, X. Fang, Z. Shi, Y. Li, S. Li, G. Shan, K. Kristiansen, S. Li, H. Yang, J. Wang, and J. Wang. De novo assembly of human genomes with massively parallel short read sequencing. Genome Research, 20(2):265--272, February 2010.
[19]
J. R. Miller, S. Koren, and G. Sutton. Assembly algorithms for next-generation sequencing data. Genomics, 2010.
[20]
A. Mortazavi, B. A. Williams, K. McCue, L. Schaeffer, and B. Wold. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods, 5(7):621--628, May 2008.
[21]
G. Navarro. A guided tour to approximate string matching. ACM Computing Surveys, 33:2001, 1999.
[22]
S. B. Ng, K. J. Buckingham, C. Lee, A. W. Bigham, H. K. Tabor, K. M. Dent, C. D. Huff, P. T. Shannon, E. W. Jabs, D. A. Nickerson, J. Shendure, and M. J. Bamshad. Exome sequencing identifies the cause of a mendelian disorder. Nature Genetics, 42(1):30--35, January 2010.
[23]
W. R. Pearson and D. J. Lipman. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA, 85:2444--2448, 1988.
[24]
S. M. Rumble, P. Lacroute, A. V. Dalca, M. Fiume, A. Sidow, and M. Brudno. SHRiMP: Accurate Mapping of Short Color-space Reads. PLoS Computational Biology, 5(5):e1000386+, May 2009.
[25]
F. Sanger and A. R. Coulson. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. Journal of Molecular Biology, 94:441--448, 1975.
[26]
F. Sanger, S. Nicklen, and A. R. Coulson. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA, 74:5463--5467, 1977.
[27]
J. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein, S. J. M. Jones, and I. Birol. Abyss: A parallel assembler for short read sequence data. Genome Research, 19(6):1117--1123, June 2009.
[28]
J. R. ten Bosch and W. W. Grody. Keeping up with the next generation: Massively parallel sequencing in clinical diagnostics. Journal of Molecular Diagnostics, 10(6):484--492, 2008.
[29]
T. D. Wu and S. Nacu. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics, 26:873--881, 2010.
[30]
H. Xiang, J. Zhu, Q. Chen, F. Dai, X. Li, M. Li, H. Zhang, G. Zhang, D. Li, Y. Dong, L. Zhao, Y. Lin, D. Cheng, J. Yu, J. Sun, X. Zhou, K. Ma, Y. He, Y. Zhao, S. Guo, M. Ye, G. Guo, Y. Li, R. Li, X. Zhang, L. Ma, K. Kristiansen, Q. Guo, J. Jiang, S. Beck, Q. Xia, W. Wang, and J. Wang. Single base-resolution methylome of the silkworm reveals a sparse epigenomic map. Nature Biotechnology, 28(5):516--20, 2010.

Cited By

View all
  • (2020)Semiglobal Sequence Alignment with Gaps Using GPUIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2019.291410517:6(2086-2097)Online publication date: 1-Nov-2020
  • (2019)Accelerating Alignment for Short Reads Allowing Insertion of Gaps on Multi-Core Cluster2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)10.1109/PDCAT46702.2019.00019(41-45)Online publication date: Dec-2019
  • (2017)Searching and Indexing Circular PatternsAlgorithms for Next-Generation Sequencing Data10.1007/978-3-319-59826-0_3(77-90)Online publication date: 19-Sep-2017
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
BCB '10: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
August 2010
705 pages
ISBN:9781450304382
DOI:10.1145/1854776
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 August 2010

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. mapping
  2. next generation sequencing
  3. pattern matching
  4. reads
  5. string algorithms

Qualifiers

  • Research-article

Conference

BCB'10
Sponsor:

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)10
  • Downloads (Last 6 weeks)2
Reflects downloads up to 22 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2020)Semiglobal Sequence Alignment with Gaps Using GPUIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2019.291410517:6(2086-2097)Online publication date: 1-Nov-2020
  • (2019)Accelerating Alignment for Short Reads Allowing Insertion of Gaps on Multi-Core Cluster2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)10.1109/PDCAT46702.2019.00019(41-45)Online publication date: Dec-2019
  • (2017)Searching and Indexing Circular PatternsAlgorithms for Next-Generation Sequencing Data10.1007/978-3-319-59826-0_3(77-90)Online publication date: 19-Sep-2017
  • (2016)Transcriptome activity of isochores during preimplantation process in human and mouseFEBS Letters10.1002/1873-3468.12245590:14(2297-2306)Online publication date: 7-Jul-2016
  • (2015)Pairwise Sequence Alignment with Gaps with GPUProceedings of the 2015 IEEE International Conference on Cluster Computing10.1109/CLUSTER.2015.109(603-610)Online publication date: 8-Sep-2015
  • (2014)Fast algorithms for approximate circular string matchingAlgorithms for Molecular Biology10.1186/1748-7188-9-99:1(9)Online publication date: 2014
  • (2014)JVM: Java Visual Mapping Tool for Next Generation Sequencing ReadAdvance in Structural Bioinformatics10.1007/978-94-017-9245-5_2(11-18)Online publication date: 12-Nov-2014
  • (2013)libgapmis: extending short-read alignmentsBMC Bioinformatics10.1186/1471-2105-14-S11-S414:Suppl 11(S4)Online publication date: 2013
  • (2013)GapsMisProceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics10.1145/2506583.2506584(402-411)Online publication date: 22-Sep-2013
  • (2013)Comparison for the detection of Virus and spam using pattern matching tools2013 The International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE)10.1109/TAEECE.2013.6557291(304-311)Online publication date: May-2013
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media