  • Brief Communication

DADA2: High-resolution sample inference from Illumina amplicon data


We present the open-source software package DADA2 for modeling and correcting Illumina-sequenced amplicon errors. DADA2 infers sample sequences exactly and resolves differences of as little as 1 nucleotide. In several mock communities, DADA2 identified more real variants and output fewer spurious sequences than other methods. We applied DADA2 to vaginal samples from a cohort of pregnant women, revealing a diversity of previously undetected Lactobacillus crispatus variants.

Figure 1: Comparison of sequence variants inferred by DADA2 with OTUs constructed by UPARSE.
Figure 2: L. crispatus sequence variants in the human vaginal community during pregnancy.

We thank M. Schirmer and D. MacIntyre for productive correspondence. This work was supported by the NSF (DMS-1162538 to S.P.H.), the NIH (R01AI112401 to S.P.H.), and the Samarth Foundation (Stanford Microbiome Seed Grant to B.J.C. and S.P.H.).

B.J.C. and S.P.H. designed the research; B.J.C., P.J.M., and M.J.R. implemented the algorithm; B.J.C. performed the analysis; B.J.C., P.J.M., M.J.R., and S.P.H. wrote the paper; and A.W.H. and A.J.A.J. generated the Extreme data set designed by B.J.C., P.J.M., and A.W.H.

Correspondence to Benjamin J Callahan.

The authors declare no competing financial interests.

Supplementary Text and Figures

Supplementary Figures 1–8, Supplementary Tables 1–3 and Supplementary Notes 1 and 2 (PDF 1809 kb)

DADA2 software package and scripts for benchmarking and analysis (ZIP 1312 kb)

Callahan, B., McMurdie, P., Rosen, M. et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods 13, 581–583 (2016).

