DADA2: High-resolution sample inference from Illumina amplicon data

Abstract

We present the open-source software package DADA2 for modeling and correcting Illumina-sequenced amplicon errors (https://github.com/benjjneb/dada2). DADA2 infers sample sequences exactly and resolves differences of as little as 1 nucleotide. In several mock communities, DADA2 identified more real variants and output fewer spurious sequences than other methods. We applied DADA2 to vaginal samples from a cohort of pregnant women, revealing a diversity of previously undetected Lactobacillus crispatus variants.

Figure 1: Comparison of sequence variants inferred by DADA2 with OTUs constructed by UPARSE.
Figure 2: L. crispatus sequence variants in the human vaginal community during pregnancy.

Accession codes

Primary accessions

Sequence Read Archive

Acknowledgements

We thank M. Schirmer and D. MacIntyre for productive correspondence. This work was supported by the NSF (DMS-1162538 to S.P.H.), the NIH (R01AI112401 to S.P.H.), and the Samarth Foundation (Stanford Microbiome Seed Grant to B.J.C. and S.P.H.).

Author information

Contributions

B.J.C. and S.P.H. designed the research; B.J.C., P.J.M., and M.J.R. implemented the algorithm; B.J.C. performed the analysis; B.J.C., P.J.M., M.J.R., and S.P.H. wrote the paper; and A.W.H. and A.J.A.J. generated the Extreme data set designed by B.J.C., P.J.M., and A.W.H.

Corresponding author

Correspondence to Benjamin J Callahan.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8, Supplementary Tables 1–3 and Supplementary Notes 1 and 2 (PDF 1809 kb)

Supplementary Software

DADA2 software package and scripts for benchmarking and analysis (ZIP 1312 kb)

About this article

Cite this article

Callahan, B., McMurdie, P., Rosen, M. et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods 13, 581–583 (2016). https://doi.org/10.1038/nmeth.3869
