Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Protocol
  • Published:

Metagenome analysis using the Kraken software suite

Abstract

Metagenomic experiments expose the wide range of microscopic organisms in any microbial environment through high-throughput DNA sequencing. The computational analysis of the sequencing data is critical for the accurate and complete characterization of the microbial community. To facilitate efficient and reproducible metagenomic analysis, we introduce a step-by-step protocol for the Kraken suite, an end-to-end pipeline for the classification, quantification and visualization of metagenomic datasets. Our protocol describes the execution of the Kraken programs, via a sequence of easy-to-use scripts, in two scenarios: (1) quantification of the species in a given metagenomics sample; and (2) detection of a pathogenic agent from a clinical sample taken from a human patient. The protocol, which is executed within 1–2 h, is targeted to biologists and clinicians working in microbiome or metagenomics analysis who are familiar with the Unix command-line environment.

This is a preview of subscription content, access via your institution

Access options

Buy this article

Prices may be subject to local taxes which are calculated during checkout

Fig. 1: Protocol workflow.
Fig. 2: Microbiome plots.
Fig. 3: Pavian output for hierarchical visualization.
Fig. 4: Pavian output for pathogen identification.
Fig. 5: Pavian alignment viewer.
Fig. 6: α- and β-diversity results.
Fig. 7: Pathogen identification results.

Similar content being viewed by others

Data availability

The microbiome analysis used three samples from Taur et al.8, and the pathogen identification used ten samples from Li et al.9, all of which can be found on NCBI with their SRA IDs. Source data are provided with this paper.

Code availability

The following website details and links all software and databases used in this protocol: http://ccb.jhu.edu/data/kraken2_protocol/. We also provide easy-to-use Jupyter notebooks for both workflows, which can be executed in the browser using Google Collab: https://github.com/martin-steinegger/kraken-protocol/.

References

  1. Rappé, M. S. & Giovannoni, S. J.The uncultured microbial majority. Annu. Rev. Microbiol. 57, 369–394 (2003).

    Article  PubMed  Google Scholar 

  2. Wood, D. E. & Salzberg, S. L.Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  3. Breitwieser, F. P., Baker, D. N. & Salzberg, S. L.KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biol. 19, 198 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L.Bracken: estimating species abundance in metagenomics data. PeerJ Comput. Sci. 3, e104 (2017).

    Article  Google Scholar 

  6. Breitwieser, P. & Salzberg, S. L.Pavian: interactive analysis of metagenomics data for microbiome studies and pathogen identification. Bioinformatics 36, 1303–1304 (2020).

    Article  CAS  PubMed  Google Scholar 

  7. Langmead, B. & Salzberg, S. L.Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Taur, Y. et al.Reconstitution of the gut microbiota of antibiotic-treated patients by autologous fecal microbiota transplant. Sci. Transl. Med. 10, eaap9489 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  9. Li, Z. et al.Identifying corneal infections in formalin-fixed specimens using next generation sequencing. Invest. Ophthalmol. Vis. Sci. 59(Jan), 280–288 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J.Basic local alignment search tool. J. Mol. Biol. 215(Oct), 403–410 (1990).

    Article  CAS  PubMed  Google Scholar 

  11. Pruitt, K. D., Tatusova, T. & Maglott, D. R.NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).

    Article  CAS  PubMed  Google Scholar 

  12. O’Leary, N. A. et al.Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).

    Article  PubMed  Google Scholar 

  13. Ounit, R., Wanamaker, S., Close, T. J. & Lonardi, S.CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genomics 16, 236 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  14. Kim, D., Song, L., Breitwieser, F. P. & Salzberg, S. L.Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Menzel, P., Ng, K. L. & Krogh, A.Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7, 11257 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Ye, S. H., Siddle, K. J., Park, D. J. & Sabeti, P. C.Benchmarking metagenomics tools for taxonomic classification. Cell 178, 779–794 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Seppey, M., Manni, M. & Zdobnov, M.LEMMI: a continuous benchmarking platform for metagenomics classifiers. Genome Res. 30, 1208–1216 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Segata, N. et al.Metagenomic microbial community profiling using unique clade-specific marker genes. Nat. Methods 9, 811–814 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Vervier, K., Mahé, P., Tournoud, M., Veyrieras, J. B. & Vert, J. P.Large-scale machine learning for metagenomics sequence classification. Bioinformatics 32, 1023–1032 (2016).

    Article  CAS  PubMed  Google Scholar 

  20. Luo, Y., Yu, Y. W., Zeng, J., Berger, B. & Peng, J.Metagenomic binning through low-density hashing. Bioinformatics 35, 219–226 (2019).

    Article  CAS  PubMed  Google Scholar 

  21. Breitwieser, F. P., Lu, J. & Salzberg, S. L.A review of methods and databases for metagenomic classification and assembly. Brief. Bioinform. 20, 1125–1136 (2017).

    Article  PubMed Central  Google Scholar 

  22. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv https://doi.org/10.48550/arXiv.1303.3997 (2013).

  23. Li, H.Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Stephens, Z. et al.Exogene: a performant workflow for detecting viral integrations from paired-end next-generation sequencing data. PLoS ONE 16, e0250915 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Breitwieser, F. P., Pertea, M., Zimin, A. V. & Salzberg, S. L.Human contamination in bacterial genomes has created thousands of spurious proteins. Genome Res. 29, 954–960 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Steinegger, M. & Salzberg, S. L.Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol. 21, 115 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Lu, J. & Salzberg, S. L.Removing contaminants from databases of draft genomes. PLoS Comput. Biol. 14, e1006277 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  28. Buchfink, B., Xie, C. & Huson, D. H.Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).

    Article  CAS  PubMed  Google Scholar 

  29. Mirdita, M., Steinegger, M., Breitwieser, F., Söding, J. & Levy Karin, E. Fast and sensitive taxonomic assignment to metagenomic contigs. Bioinformatics 37, 3029–3031 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Nasko, D. J., Koren, S., Phillippy, A. M. & Treangen, T. J.RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification. Genome Biol. 19, 165 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  31. Yang, C. et al.A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Comput. Struct. Biotechnol. J. 19, 6301–6314 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Whittaker, R. H.Evolution and measurement of species diversity. Taxon 21, 213–251 (1972).

    Article  Google Scholar 

  33. Berger, W. H. & Parker, F. L. Diversity of planktonic foraminifera in deep-sea sediments. Science 168, 1345–1347 (1970).

    Article  CAS  PubMed  Google Scholar 

  34. Fisher, R. A., Corbet, A. S. & Williams, C. B.The relation between the number of species and the number of individuals in a random sample of an animal population. J. Anim. Ecol. 12, 42–58 (1943).

    Article  Google Scholar 

  35. Simpson, E. H.Measurement of diversity. Nature 163, 688–688 (1949).

    Article  Google Scholar 

  36. Shannon, C. E.A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).

    Article  Google Scholar 

  37. Bray, J. R. & Curtis, J. T.An ordination of the upland forest communities of southern Wisconsin. Ecol. Monogr. 27, 325–349 (1957).

    Article  Google Scholar 

  38. Ondov, B. D., Bergman, N. H. & Phillippy, A. M.Interactive metagenomic visualization in a web browser. BMC Bioinform. 12, 385 (2011).

    Article  Google Scholar 

  39. Danecek, P. et al.Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  40. Grüning, B. et al.Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat. Methods 15, 475–476 (2018).

    Article  PubMed  Google Scholar 

Download references

Acknowledgements

Indexes for tools in the Kraken suite, including the indexes used in this protocol, are made freely available on Amazon Web Services thanks to the AWS Public Dataset Program. B.L. was supported by NIH/NIHMS grant R35GM139602. S.L.S. was supported by NIH grants R35-GM130151 and R01-HG006677. M.S. acknowledges support from the National Research Foundation of Korea grant (2019R1A6A1A10073437, 2020M3A9G7103933, 2021R1C1C102065 and 2021M3A9I4021220); New Faculty Startup Fund; and the Creative-Pioneering Researchers Program through Seoul National University.

Author information

Authors and Affiliations

Authors

Contributions

J.L. and M.S. led the development of the protocol. N.R. executed and designed the microbiome analysis protocol and is the author of the KrakenTools α-diversity tools. J.L. developed the pathogen identification protocol and is the author of Bracken and KrakenTools. M.S. authored the Jupyter notebooks for the protocol. D.E.W. is the senior author of Kraken and Kraken 2. F.B. is the author of KrakenUniq. C.P. is an author for the KrakenTools β-diversity script. B.L. supervised the development of Kraken 2. S.L.S. supervised the development of Kraken, KrakenUniq and Bracken. B.L. and S.L.S. supervised the development of this protocol. All authors contributed to the writing of the manuscript.

Corresponding authors

Correspondence to Jennifer Lu or Martin Steinegger.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Protocols thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Salzberg, S. et al. Neurol. Neuroimmunol. Neuroinflamm. 3, e251 (2016): https://doi.org/10.1212/NXI.0000000000000251

Wood, D. et al. Genome Biol. 15, R46 (2014): https://doi.org/10.1186/gb-2014-15-3-r46

Lu, J. et al. Peer J. Comput. Sci. 3, e104 (2017): https://doi.org/10.7717/peerj-cs.104

Breitwieser, F. et al. Genome Biol. 19, 198 (2018): https://doi.org/10.1186/s13059-018-1568-0

Wood, D. et al. Genome Biol. 20, 257 (2019): https://doi.org/10.1186/s13059-019-1891-0

Breitwieser, F. et al. Bioinformatics 36, 1303–1304 (2020): https://doi.org/10.1093/bioinformatics/btz715

Key data used in this protocol

Taur, Y. et al. Sci. Transl. Med. 10, eaap9489 (2018): https://doi.org/10.1126/scitranslmed.aap9489

Li, Z. et al. Invest. Ophthalmol. Vis. Sci. 59, 280–288 (2018): https://doi.org/10.1167/iovs.17-21617

Supplementary information

Supplementary Table 1

Supplementary Table 1

Supplementary Table 2

Supplementary Table 2

Source data

Source Data Fig. 2

Breport text for plotting Sankey, and krona counts for plotting krona plots.

Source Data Fig. 6

Alpha diversity table text, bray Curtis equation text, and heatmap values for beta diversity.

Source Data Fig. 7

Pathogen sample species heat map data.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Lu, J., Rincon, N., Wood, D.E. et al. Metagenome analysis using the Kraken software suite. Nat Protoc 17, 2815–2839 (2022). https://doi.org/10.1038/s41596-022-00738-y

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/s41596-022-00738-y

This article is cited by

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Search

Quick links

Nature Briefing AI and Robotics

Sign up for the Nature Briefing: AI and Robotics newsletter — what matters in AI and robotics research, free to your inbox weekly.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing: AI and Robotics