Metagenome analysis using the Kraken software suite


Metagenomic experiments expose the wide range of microscopic organisms in any microbial environment through high-throughput DNA sequencing. The computational analysis of the sequencing data is critical for the accurate and complete characterization of the microbial community. To facilitate efficient and reproducible metagenomic analysis, we introduce a step-by-step protocol for the Kraken suite, an end-to-end pipeline for the classification, quantification and visualization of metagenomic datasets. Our protocol describes the execution of the Kraken programs, via a sequence of easy-to-use scripts, in two scenarios: (1) quantification of the species in a given metagenomic sample; and (2) detection of a pathogenic agent from a clinical sample taken from a human patient. The protocol, which can be executed within 1–2 h, is aimed at biologists and clinicians working in microbiome or metagenomics analysis who are familiar with the Unix command-line environment.

Fig. 1: Protocol workflow.
Fig. 2: Microbiome plots.
Fig. 3: Pavian output for hierarchical visualization.
Fig. 4: Pavian output for pathogen identification.
Fig. 5: Pavian alignment viewer.
Fig. 6: α- and β-diversity results.
Fig. 7: Pathogen identification results.

Data availability

The microbiome analysis used three samples from Taur et al.8, and the pathogen identification used ten samples from Li et al.9, all of which can be found on NCBI with their SRA IDs. Source data are provided with this paper.

Code availability

The following website details and links all software and databases used in this protocol: We also provide easy-to-use Jupyter notebooks for both workflows, which can be executed in the browser using Google Colab:


Acknowledgements

Indexes for tools in the Kraken suite, including the indexes used in this protocol, are made freely available on Amazon Web Services thanks to the AWS Public Dataset Program. B.L. was supported by NIH/NIGMS grant R35GM139602. S.L.S. was supported by NIH grants R35-GM130151 and R01-HG006677. M.S. acknowledges support from the National Research Foundation of Korea grants (2019R1A6A1A10073437, 2020M3A9G7103933, 2021R1C1C102065 and 2021M3A9I4021220); the New Faculty Startup Fund; and the Creative-Pioneering Researchers Program through Seoul National University.

Author contributions

J.L. and M.S. led the development of the protocol. N.R. designed and executed the microbiome analysis protocol and is the author of the KrakenTools α-diversity tools. J.L. developed the pathogen identification protocol and is the author of Bracken and KrakenTools. M.S. authored the Jupyter notebooks for the protocol. D.E.W. is the senior author of Kraken and Kraken 2. F.B. is the author of KrakenUniq. C.P. is an author of the KrakenTools β-diversity script. B.L. supervised the development of Kraken 2. S.L.S. supervised the development of Kraken, KrakenUniq and Bracken. B.L. and S.L.S. supervised the development of this protocol. All authors contributed to the writing of the manuscript.

Corresponding authors

Correspondence to Jennifer Lu or Martin Steinegger.

Competing interests

The authors declare no competing interests.

Nature Protocols thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table 1

Supplementary Table 2

Source data

Source Data Fig. 2

Bracken report (breport) text for plotting Sankey diagrams, and Krona counts for plotting Krona plots.

Source Data Fig. 6

Alpha-diversity table text, Bray–Curtis equation text, and heat map values for beta diversity.

Source Data Fig. 7

Pathogen sample species heat map data.

Lu, J., Rincon, N., Wood, D.E. et al. Metagenome analysis using the Kraken software suite. Nat Protoc 17, 2815–2839 (2022).

