Abstract
The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.
Acknowledgements
This research was partially supported by the National Research Foundation and Ministry of Education Singapore under its Research Centre of Excellence Programme, and by the A*STAR Computational Resource Centre through the use of its high-performance computing facilities.
Author information
Contributions
B.B. designed and implemented the algorithm. C.X. performed the experimental study. C.X. and D.H.H. initiated and guided the project. D.H.H. and B.B. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Spaced seeds.
(a) The four seed shapes of weight 12 that DIAMOND uses by default. Ones and zeros indicate positions to use and ignore, respectively. (b) Illustration of the application of a spaced seed to match letters between a reference and a query sequence.
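The idea in panel (b) can be illustrated with a minimal Python sketch. The shape `1101` and the two toy sequences are hypothetical and chosen only for readability; they are not one of the four weight-12 shapes that DIAMOND uses, and the brute-force comparison is for illustration, not how a production aligner searches.

```python
def seed_weight(shape: str) -> int:
    """Weight of a spaced seed = number of positions that are compared ('1')."""
    return shape.count("1")


def apply_seed(shape: str, seq: str, pos: int) -> str:
    """Extract the letters of seq under the '1' positions of shape, starting at pos."""
    return "".join(seq[pos + i] for i, bit in enumerate(shape) if bit == "1")


def seed_hits(shape: str, reference: str, query: str) -> list[tuple[int, int]]:
    """Return all (reference_pos, query_pos) pairs at which the spaced seed matches.

    Brute force over every pair of windows; fine for a toy example only.
    """
    span = len(shape)
    hits = []
    for r in range(len(reference) - span + 1):
        ref_key = apply_seed(shape, reference, r)
        for q in range(len(query) - span + 1):
            if apply_seed(shape, query, q) == ref_key:
                hits.append((r, q))
    return hits


# Hypothetical toy inputs: the sequences differ only at index 2 (V vs. A).
shape = "1101"          # compare positions 0, 1 and 3; ignore position 2
reference = "MKVLAT"
query = "MKALAT"

spaced = seed_hits(shape, reference, query)       # mismatch falls on the ignored position
contiguous = seed_hits("1111", reference, query)  # every window overlaps the mismatch
```

With these toy inputs the spaced shape still finds a hit at offset 0 in both sequences because the differing letter sits under a zero, whereas the contiguous shape of the same span finds none.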
Supplementary Figure 2 Ratio of main memory accesses.
The ratio K/K’ as a function of the total length of the query sequences, for different seed lengths. The variables K and K’ represent the approximate number of main memory accesses required when using a single index or double index, respectively.
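To make the single-index versus double-index distinction concrete, the following Python sketch indexes the seeds of both the queries and the references, sorts both lists, and walks them in a single linear pass. This is one way a double index can be realised and is an assumption made for illustration, not a description of the DIAMOND source code; the function names, the contiguous seeds of length `k`, and the toy sequences are all hypothetical.

```python
def build_index(seqs: list[str], k: int) -> list[tuple[str, int, int]]:
    """Return a sorted list of (seed, sequence_id, position) for every k-letter window."""
    entries = [
        (seq[i:i + k], sid, i)
        for sid, seq in enumerate(seqs)
        for i in range(len(seq) - k + 1)
    ]
    entries.sort()
    return entries


def merge_join(q_index, r_index):
    """Walk two sorted seed lists together and emit every matching seed pair.

    Both lists are read front to back, so memory is accessed sequentially
    rather than by random lookups into a single reference index.
    """
    hits = []
    i = j = 0
    while i < len(q_index) and j < len(r_index):
        q_key, r_key = q_index[i][0], r_index[j][0]
        if q_key < r_key:
            i += 1
        elif q_key > r_key:
            j += 1
        else:
            # Find the run of equal seeds in each list, then pair them up.
            i_end = i
            while i_end < len(q_index) and q_index[i_end][0] == q_key:
                i_end += 1
            j_end = j
            while j_end < len(r_index) and r_index[j_end][0] == q_key:
                j_end += 1
            for _, qid, qpos in q_index[i:i_end]:
                for _, rid, rpos in r_index[j:j_end]:
                    hits.append((qid, qpos, rid, rpos))
            i, j = i_end, j_end
    return hits


# Hypothetical toy data.
queries = ["MKVLA"]
references = ["AMKVL", "GGGGG"]
hits = merge_join(build_index(queries, 3), build_index(references, 3))
```

Each hit is a tuple `(query_id, query_pos, reference_id, reference_pos)`; in a real aligner such seed matches would then be extended and scored, which this sketch omits.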
Supplementary Figure 3 PCoA analysis of 12 permafrost samples based on a subset of 6 million reads.
BLASTX results are shown on the left, (a) and (c). DIAMOND-fast results are shown on the right, (b) and (d). The upper two panels show the first and second principal coordinates, whereas the lower two panels show the first and third principal coordinates.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–3 and Supplementary Tables 1–3 (PDF 523 kb)
Supplementary Software
DIAMOND v0.4.7 source code (ZIP 2737 kb)
About this article
Cite this article
Buchfink, B., Xie, C. & Huson, D. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59–60 (2015). https://doi.org/10.1038/nmeth.3176
DOI: https://doi.org/10.1038/nmeth.3176