Abstract
The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold-standard BLASTX, they achieve either only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.
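The abstract names double indexing without describing it. One way to read the idea: collect the seeds of the query set and of the reference set into two lists, sort both by seed, and scan the two sorted lists together, in the style of a sort-merge join, so that every seed shared by both sides is handled in a single linear pass. The caption of Supplementary Figure 2 suggests that the motivation is to need fewer main memory accesses than a single index does. The sketch below is a minimal illustration under stated assumptions, not DIAMOND's implementation: the function names `build_index` and `double_index_hits` are hypothetical, and plain contiguous k-mers over the full amino-acid alphabet stand in for the spaced seeds of Supplementary Figure 1.

```python
def build_index(seqs, k):
    """Collect (seed, seq_id, offset) for every length-k seed, sorted by seed.

    Hypothetical helper: contiguous k-mers are an assumption made for brevity.
    """
    entries = [
        (s[off:off + k], sid, off)
        for sid, s in enumerate(seqs)
        for off in range(len(s) - k + 1)
    ]
    entries.sort()  # tuples sort by seed first
    return entries


def double_index_hits(queries, refs, k=3):
    """Return (query_id, query_offset, ref_id, ref_offset) for every shared seed.

    Both sides are indexed and sorted, then scanned together like a sort-merge
    join: each list is traversed once, front to back.
    """
    q_idx = build_index(queries, k)
    r_idx = build_index(refs, k)
    hits = []
    i = j = 0
    while i < len(q_idx) and j < len(r_idx):
        q_seed, r_seed = q_idx[i][0], r_idx[j][0]
        if q_seed < r_seed:
            i += 1
        elif q_seed > r_seed:
            j += 1
        else:
            # Find the run of identical seeds on each side.
            i_end = i
            while i_end < len(q_idx) and q_idx[i_end][0] == q_seed:
                i_end += 1
            j_end = j
            while j_end < len(r_idx) and r_idx[j_end][0] == q_seed:
                j_end += 1
            # Every query occurrence pairs with every reference occurrence.
            for _, qid, qoff in q_idx[i:i_end]:
                for _, rid, roff in r_idx[j:j_end]:
                    hits.append((qid, qoff, rid, roff))
            i, j = i_end, j_end
    return hits
```

For example, `double_index_hits(["MKVL"], ["AMKVA", "KVLW"], k=3)` finds the seed `MKV` shared with the first reference and the seed `KVL` shared with the second. In a full aligner, such seed hits would only be the starting points for extension and scoring.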

Acknowledgements
This research was partially supported by the National Research Foundation and Ministry of Education Singapore under its Research Centre of Excellence Programme, and by the A*STAR Computational Resource Centre through the use of its high-performance computing facilities.
Author information
Contributions
B.B. designed and implemented the algorithm. C.X. performed the experimental study. C.X. and D.H.H. initiated and guided the project. D.H.H. and B.B. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Spaced seeds.
(a) The four seed shapes of weight 12 that DIAMOND uses by default. Ones and zeros indicate positions to use and ignore, respectively. (b) Illustration of the application of a spaced seed to match letters between a reference and a query sequence.
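Panel (b) describes how a spaced seed is applied: letters at the "one" positions of the shape are compared and letters at the "zero" positions are ignored, so a seed can still hit when the two sequences differ at an ignored position. The sketch below is a minimal illustration, assuming a shape is written as a string of `1` and `0` characters. The shape `1101` (weight 3) is a short hypothetical example, not one of DIAMOND's four default weight-12 shapes, and the helper names are also hypothetical.

```python
def apply_shape(seq, pos, shape):
    """Return the letters of seq under the '1' positions of shape, starting at pos.

    Returns None if the shape would run past the end of the sequence.
    """
    if pos + len(shape) > len(seq):
        return None
    return "".join(seq[pos + i] for i, c in enumerate(shape) if c == "1")


def spaced_seed_match(ref, ref_pos, query, query_pos, shape):
    """True if ref and query agree at every '1' position of the shape."""
    a = apply_shape(ref, ref_pos, shape)
    b = apply_shape(query, query_pos, shape)
    return a is not None and a == b


# Hypothetical shape of weight 3: the third position is ignored.
print(spaced_seed_match("MKAL", 0, "MKWL", 0, "1101"))
```

Here `MKAL` and `MKWL` match under `1101` because the mismatch (A versus W) falls on the ignored position, whereas no contiguous seed of length 3 is shared by the two sequences.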
Supplementary Figure 2 Ratio of main memory accesses.
The ratio K/K’ as a function of the total length of the query sequences, for different seed lengths. The variables K and K’ represent the approximate number of main memory accesses required when using a single index or double index, respectively.
Supplementary Figure 3 Principal coordinates analysis (PCoA) of 12 permafrost samples based on a subset of 6 million reads.
BLASTX results are shown on the left, (a) and (c). DIAMOND-fast results are shown on the right, (b) and (d). The upper two panels show the first and second principal coordinates, whereas the lower two panels show the first and third principal coordinates.
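The caption compares PCoA ordinations computed from BLASTX and DIAMOND-fast results, but it does not state the distance measure between samples or the software used. As a generic illustration only, classical PCoA (classical multidimensional scaling) squares the pairwise distances, double-centers them into a matrix B, and uses the leading eigenvectors of B, scaled by the square roots of their eigenvalues, as sample coordinates. In the sketch below, the function name `pcoa` is hypothetical, and power iteration with deflation stands in for a full eigensolver so that the code needs only the standard library.

```python
import math


def pcoa(dist, k=2, iters=1000):
    """Classical PCoA: return k coordinates per sample from a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    # Double centering: B = -1/2 * J D^2 J with J = I - (1/n) 11^T.
    row_mean = [sum(row) / n for row in sq]  # equal to column means for symmetric input
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    # Shift by a Gershgorin bound so all eigenvalues of B + shift*I are >= 0;
    # power iteration then finds the largest (not largest-magnitude) eigenvalue of B.
    shift = max(sum(abs(x) for x in row) for row in b)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        v = [math.sin(i + 1.0 + axis) for i in range(n)]  # arbitrary deterministic start
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient: eigenvalue estimate
        scale = math.sqrt(max(lam, 0.0))  # axes with non-positive eigenvalues collapse to 0
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate: remove the axis just found before searching for the next one.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

Calling `pcoa(distances, k=3)` would yield the first three axes, matching the first/second and first/third coordinate pairs plotted in the panels. The signs of the axes are arbitrary, which is why two ordinations are compared by their overall pattern. A library eigensolver would be the usual choice in practice, since power iteration converges slowly when leading eigenvalues are close. For non-Euclidean distances, B can have negative eigenvalues; this sketch simply sets those axes to zero.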
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–3 and Supplementary Tables 1–3 (PDF 523 kb)
Supplementary Software
DIAMOND v0.4.7 source code (ZIP 2737 kb)
About this article
Cite this article
Buchfink, B., Xie, C. & Huson, D. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59–60 (2015). https://doi.org/10.1038/nmeth.3176
DOI: https://doi.org/10.1038/nmeth.3176