DOI: 10.1145/1739041.1739074
research-article

Anchoring millions of distinct reads on the human genome within seconds

Published: 22 March 2010

Abstract

With the advent of next-generation DNA sequencing machines, there is an increasing need for computational tools that can accurately and quickly anchor the millions of short DNA sequences (reads) these machines generate onto the genomes of target organisms. In this work, we describe 'Q-Pick', a new and efficient method for solving this problem. Q-Pick rapidly identifies and anchors such reads, which may contain wildcards, in large genomic databases while guaranteeing complete results and efficient operation. Q-Pick has very modest memory and computational requirements and is trivially amenable to SIMD implementation; it can also be easily extended to handle longer reads, e.g. 75-mers or longer. Our experiments indicate that Q-Pick can anchor millions of distinct short reads against both strands of a mammalian genome within seconds, using a single-core processor.
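The abstract does not describe how Q-Pick works internally, so the sketch below is not Q-Pick. It is a naive, exhaustive baseline that only makes the problem statement concrete: every read, which may contain the wildcard `N` (an assumption about the wildcard symbol), is checked against every window of the genome on both strands. Scanning every window is what makes the result complete; the cost of roughly len(genome) × read length comparisons per read is exactly what a method like Q-Pick is designed to avoid. The function names are hypothetical.

```python
"""Naive, exhaustive read anchoring: an illustration of the problem only, not Q-Pick."""

# Complement table; the wildcard 'N' (assumed symbol) maps to itself so it survives
# reverse-complementing.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G, N<->N)."""
    return seq.translate(COMPLEMENT)[::-1]


def matches(pattern: str, window: str, wildcard: str = "N") -> bool:
    """True if pattern equals window, treating `wildcard` in the pattern as any base."""
    return all(p == wildcard or p == g for p, g in zip(pattern, window))


def anchor_reads(genome: str, reads: list[str]) -> dict[str, list[tuple[int, str]]]:
    """Map each read to every (position, strand) where it occurs in `genome`.

    Strand '+' means the read itself matches genome[pos:pos+k]; strand '-' means its
    reverse complement does, i.e. the read lies on the opposite strand. Positions are
    0-based forward-strand coordinates of the window's leftmost base.
    """
    hits: dict[str, list[tuple[int, str]]] = {}
    for read in reads:
        k = len(read)
        rc = reverse_complement(read)
        found: list[tuple[int, str]] = []
        # Scanning every window guarantees completeness, at O(len(genome) * k) cost per read.
        for pos in range(len(genome) - k + 1):
            window = genome[pos:pos + k]
            if matches(read, window):
                found.append((pos, "+"))
            if matches(rc, window):
                found.append((pos, "-"))
        hits[read] = found
    return hits
```

For example, `anchor_reads("ACGTTGCAAGGT", ["GCAA", "ANGT"])` reports `GCAA` at position 5 on the forward strand and at position 3 on the reverse strand (its reverse complement `TTGC` starts there), while the wildcard read `ANGT` matches both `ACGT` and `AGGT` on the forward strand.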


Cited By

  • (2020) An Approach to the Exact Packed String Matching Problem. Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval, pages 173-178. DOI: 10.1145/3443279.3443296. Online publication date: 18 Dec 2020.

    Information

    Published In

    ACM Other conferences
    EDBT '10: Proceedings of the 13th International Conference on Extending Database Technology
    March 2010
    741 pages
    ISBN:9781605589459
    DOI:10.1145/1739041
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    EDBT/ICDT '10
    EDBT/ICDT '10: EDBT/ICDT '10 joint conference
    March 22 - 26, 2010
    Lausanne, Switzerland

    Acceptance Rates

    Overall Acceptance Rate 7 of 10 submissions, 70%

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 13 Nov 2024
