Species-level functional profiling of metagenomes and metatranscriptomes

Franzosa, Eric A.; McIver, Lauren J.; Rahnavard, Gholamali; Thompson, Luke R.; Schirmer, Melanie; Weingart, George; Lipson, Karen Schwarzberg; Knight, Rob; Caporaso, J. Gregory; Segata, Nicola; Huttenhower, Curtis

doi:10.1038/s41592-018-0176-y

Article
Published: 30 October 2018

Nature Methods volume 15, pages 962–968 (2018)

Abstract

Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community’s known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species’ genomic versus transcriptional contributions, and strain profiling. Further, we introduce ‘contributional diversity’ to explain patterns of ecological assembly across different microbial community types.
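The tiered strategy described above can be illustrated with a toy sketch. This is not the HUMAnN2 implementation: real alignment (nucleotide mapping to pangenomes, then translated search) is replaced here by plain dictionary lookups, and the names `tiered_search`, `pangenome_index`, and `protein_index` are hypothetical. The point is only the control flow, in which cheap species-resolved matching runs first and only the leftover reads reach the more expensive fallback.

```python
from collections import Counter

def tiered_search(reads, pangenome_index, protein_index):
    """Toy tiered search (illustrative assumption, not HUMAnN2's code).

    pangenome_index: read -> (species, gene_family) for detected species.
    protein_index:   read -> gene_family, standing in for translated search.
    Returns per-(gene_family, species) read counts and the unmapped count.
    """
    abundance = Counter()
    leftover = []

    # Tier 1: explain reads with the pangenomes of detected species.
    for read in reads:
        hit = pangenome_index.get(read)
        if hit is None:
            leftover.append(read)
        else:
            species, family = hit
            abundance[(family, species)] += 1

    # Tier 2: fall back to translated search for reads still unexplained.
    unmapped = 0
    for read in leftover:
        family = protein_index.get(read)
        if family is None:
            unmapped += 1
        else:
            abundance[(family, "unclassified")] += 1

    return abundance, unmapped


reads = ["r1", "r2", "r3", "r1"]
pangenome_index = {"r1": ("species_1", "famA")}
protein_index = {"r2": "famB"}
abundance, unmapped = tiered_search(reads, pangenome_index, protein_index)
```

In this sketch the gene family counts stay stratified by species where the species is known, which mirrors the species-resolved output described in the abstract.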

**Fig. 1: HUMAnN2 functionally profiles microbial communities with high accuracy using tiered search.**

**Fig. 2: Contributional diversity of core human microbiome pathways.**

**Fig. 3: Thermocline-associated microbial enzymes in the marine pelagic zone.**

**Fig. 4: Metatranscriptomic functional profiling and multi’omic data integration with HUMAnN2.**

Data availability

The Human Microbiome Project (HMP) metagenomes analyzed in this work are available via http://hmpdacc.org. The IBDMDB metagenomes and metatranscriptomes analyzed in this work are available via http://ibdmdb.org. The Red Sea metagenomes analyzed in this work were previously deposited as NCBI BioProject PRJNA289734. The synthetic metagenomes and metatranscriptomes used in the evaluation of HUMAnN2 and other methods are available from the authors and at http://huttenhower.sph.harvard.edu/humann2.

Acknowledgements

The authors thank M. Wong, T. Sharpton, and the members of the HUMAnN user group for their feedback on the development and evaluation of HUMAnN2. Funding for this work was provided by NSF 1565100 (to J.G.C.); People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme (FP7/2007–2013) under REA grant agreement PCIG13-GA-2013-618833 and by MIUR “Futuro in Ricerca” RBFR13EWWI_001 (to N.S.); NIH NIDDK U54DE023798, NSF MCB-1453942, NIH NIDDK P30DK043351; and NSF DBI-1053486 (to C.H.).

Author information

These authors contributed equally: Eric A. Franzosa and Lauren J. McIver.

Authors and Affiliations

Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
Eric A. Franzosa, Lauren J. McIver, Gholamali Rahnavard, Melanie Schirmer, George Weingart & Curtis Huttenhower
The Broad Institute of MIT and Harvard, Cambridge, MA, USA
Eric A. Franzosa, Lauren J. McIver, Gholamali Rahnavard, Melanie Schirmer & Curtis Huttenhower
Department of Pediatrics, University of California San Diego, San Diego, CA, USA
Luke R. Thompson & Rob Knight
Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
Karen Schwarzberg Lipson & J. Gregory Caporaso
Department of Computer Science & Engineering, University of California San Diego, San Diego, CA, USA
Rob Knight
Centre for Integrative Biology, University of Trento, Trento, Italy
Nicola Segata

Contributions

E.A.F., L.J.M., and C.H. designed the methods. L.J.M. developed the software implementation. G.R., G.W., and N.S. produced datasets to support the software. E.A.F., L.J.M., G.R., L.R.T., M.S., and K.S.L. designed and carried out the evaluations and applications; R.K., J.G.C., and all other authors participated in interpretation of the resulting data. E.A.F., L.J.M., L.R.T., M.S., K.S.L., and C.H. wrote the paper with feedback from the other authors.

Corresponding author

Correspondence to Curtis Huttenhower.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Expanded overview of the HUMAnN2 method.

(a) HUMAnN2 implements a tiered meta’omic search that aims to explain the origin of microbial community DNA or RNA reads based on the pangenomes of detected microbes before falling back to more computationally expensive translated search. (b) The tiered search produces alignments of reads to coding sequences of known or ambiguous taxonomy. These alignments are processed in a species-specific manner to calculate gene family abundance and reconstruct community metabolic pathways. (c) HUMAnN2 thus provides, for each community meta'ome: per-gene abundances, pathway presence/absence calls and abundances, and downstream visualization and statistical tests

Supplementary Figure 2 Reference hold-out analysis of a complex synthetic metagenome.

We constructed and analyzed with HUMAnN2 a 100-member mock-even synthetic metagenome containing only non-human-associated species (~2× coverage per species). (a) Variation in the number of reads sampled per gene (compared with a genome’s average fold-coverage) makes a non-trivial contribution to the error in per-species gene abundance estimation in HUMAnN2 (roughly 0.1 Bray-Curtis dissimilarity units). (b) Accuracy of community-level gene family abundance estimation decreases linearly with the number of community species missed by HUMAnN2’s taxonomic prescreen (simulated here by excluding sets of species from the underlying pangenome reference collection). (c) HUMAnN2’s overall runtime increases linearly as more species are excluded from the taxonomic prescreen (which results in more work being done during translated search). Runtimes reflect execution using 8 CPU cores
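For readers unfamiliar with the error unit used in panel a, the following is a minimal sketch of Bray-Curtis dissimilarity between an expected and an observed abundance profile. The function name and the dictionary-based profile format are assumptions made for illustration; the caption does not specify how the authors computed the value.

```python
def bray_curtis(expected, observed):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Profiles are dicts mapping gene family -> non-negative abundance.
    Returns 0.0 for identical profiles and 1.0 for disjoint ones.
    """
    keys = set(expected) | set(observed)
    numerator = sum(abs(expected.get(k, 0.0) - observed.get(k, 0.0)) for k in keys)
    denominator = sum(expected.get(k, 0.0) + observed.get(k, 0.0) for k in keys)
    # Two empty profiles are treated as identical.
    return numerator / denominator if denominator else 0.0


identical = bray_curtis({"a": 1.0, "b": 1.0}, {"a": 1.0, "b": 1.0})
disjoint = bray_curtis({"a": 1.0}, {"b": 1.0})
partial = bray_curtis({"a": 3.0, "b": 1.0}, {"a": 1.0, "b": 1.0})
```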

Supplementary Figure 3 HUMAnN2 tiered search performance on human metagenomes.

We applied HUMAnN2’s tiered search to profile 397 first-visit HMP metagenomes on Harvard University’s Odyssey Research Computing Cluster (8 CPU cores per job). Sample counts per body site were as follows: 54 for anterior nares, 65 for buccal mucosa, 68 for supragingival plaque, 73 for tongue dorsum, 76 for stool, and 34 for posterior fornix. (a) At most body sites, ~60% of reads were explained by detected pangenomes, with (b) an additional ~20% explained by downstream translated search (~80% total). Pangenome search performance (c) consistently exceeded translated search performance (d) by 1–2 orders of magnitude. From smallest to largest, box plot elements in panels a–d represent the lower inner fence, first quartile, median, third quartile, and upper inner fence. Horizontal red lines indicate the median value over all samples. (e) Total runtime is largely dictated by the number of reads passed to translated search, and (for HMP samples with <100 million reads) was approximately linear in the number of input reads (~1 h/5 million input reads). (f) Peak memory use was sublinear in the number of input reads and very predictable. The cluster of outliers in f results from large samples that were requeued during their runs: these samples resumed later in the HUMAnN2 workflow and hence display smaller peak memory use

Supplementary Figure 4 HUMAnN2 compared with other methods (details).

We profiled a 10-million-read synthetic gut metagenome using HUMAnN2 (tiered and pure translated search modes), HUMAnN1, COGNIZER, MEGAN, and ShotMAP to produce profiles of COG abundance. Here, expected (gold standard) and observed COG abundances are compared in units of copies per million (CPMs; that is, raw abundance normalized by gene length and number of mapped reads). HUMAnN2’s tiered search was considerably more accurate than the other methods based on pure translated search. HUMAnN2’s pure translated search showed better agreement than other translated search methods, with its largest source of error being underreporting of low-abundance COGs (false negatives). This behavior is expected from the translated search coverage filters used in HUMAnN2, which we use to limit false positive detection events (that is, COGs with zero expected abundance and non-zero observed abundance). Ticks in the x- and y-axis margins represent zero values; x-axis ticks are false negatives and y-axis ticks are false positives
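The copies-per-million unit can be sketched as follows. This is one plausible reading of "normalized by gene length and number of mapped reads", assuming that read counts are first divided by gene length in kilobases and the resulting values are then rescaled to sum to one million. The function name and input format are hypothetical.

```python
def copies_per_million(read_counts, gene_lengths_nt):
    """Convert raw read counts to copies per million (CPM).

    Assumed procedure (not confirmed by the caption):
      1. divide each count by gene length in kilobases;
      2. rescale the length-normalized values to sum to 1e6.
    """
    per_kb = {
        gene: count / (gene_lengths_nt[gene] / 1000.0)
        for gene, count in read_counts.items()
    }
    total = sum(per_kb.values())
    if total == 0:
        return {gene: 0.0 for gene in per_kb}
    return {gene: value / total * 1e6 for gene, value in per_kb.items()}


# Two genes with equal read counts: the shorter gene has more copies.
cpm = copies_per_million({"A": 10, "B": 10}, {"A": 1000, "B": 2000})
```

Length normalization matters here because a long gene recruits more reads than a short gene present at the same copy number.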

Supplementary Figure 5 Protein coverage thresholds in translated search.

If two largely unrelated proteins share local sequence homology, reads drawn from the homologous region will map to both proteins, potentially resulting in false positive detection events. To limit such events, we require a threshold fraction of sites in a protein to recruit reads during translated search before considering the protein ‘detected’. We evaluated potential thresholds by analyzing the results of pure translated search of synthetic metagenomes versus the UniRef90 database. Trade-offs between sensitivity and precision are shown for the 100-member, even, non-human-associated metagenome in a, and the 20-member, staggered, human-gut-associated metagenome in b. When all community genomes are well covered, a 50% coverage threshold (HUMAnN2’s default) yields a marked increase in precision with only minor loss of sensitivity (a). Loss of sensitivity is higher at this threshold when rare (low-coverage) genomes are included, as genes in low-coverage genomes often fail to meet the coverage threshold due to insufficient read sampling (b). These evaluations do not reflect any additional post-processing of translated search results (for example, weighting by alignment quality), which provides additional accuracy improvements
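A minimal sketch of the coverage filter described above, under stated assumptions: alignments are given as 0-based, end-exclusive intervals on the protein, and a protein counts as detected when the covered fraction is at least the threshold (whether the real comparison is strict or inclusive is not stated in the caption). The function names are hypothetical.

```python
def covered_fraction(protein_length, intervals):
    """Fraction of protein sites hit by at least one alignment.

    intervals: iterable of (start, end) pairs, 0-based and end-exclusive,
    in protein coordinates. Overlapping alignments are counted once.
    """
    covered = [False] * protein_length
    for start, end in intervals:
        for site in range(max(start, 0), min(end, protein_length)):
            covered[site] = True
    return sum(covered) / protein_length


def is_detected(protein_length, intervals, threshold=0.5):
    """Apply the coverage filter (default 50%, as in the caption)."""
    return covered_fraction(protein_length, intervals) >= threshold


# Two overlapping alignments cover sites 0-59 of a 100-residue protein.
well_covered = is_detected(100, [(0, 30), (20, 60)])
# A single alignment over a shared local domain covers only 20 sites.
local_only = is_detected(100, [(40, 60)])
```

The second case shows the scenario the filter targets: reads that pile up on one homologous region do not, on their own, make the whole protein count as detected.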

Supplementary Figure 6 HUMAnN2 compared with other methods: synthetic metatranscriptome evaluation.

We profiled a 10-million-read synthetic gut metatranscriptome using HUMAnN2 (tiered and pure translated search modes), HUMAnN1, COGNIZER, MEGAN, and ShotMAP to produce profiles of community-level COG transcript abundance. Twenty species’ genomic abundance values were geometrically staggered (as in the gut metagenome evaluation), while genes (transcripts) were sampled within-species following a log-normal distribution [ln N(0, 1)]. (a) Measures of methods’ accuracy and performance in this evaluation. All methods were allowed to use 8 CPU cores and up to 30 GB of memory. This panel is analogous to Fig. 1e (which focuses on metagenomic COG abundance in the same synthetic community). (b) Observed versus expected COG transcript abundance across the six methods. This panel is analogous to Supplementary Fig. 4. CPM refers to “copies per million.” Ticks in the x- and y-axis margins represent zero values; x-axis ticks are false negatives and y-axis ticks are false positives
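The sampling design in this caption (geometrically staggered species abundances, log-normal within-species transcript levels) can be sketched as below. The stagger ratio of 0.75 and the function names are assumptions chosen for illustration; the caption specifies only the number of species and the ln N(0, 1) distribution.

```python
import random

def staggered_abundances(n_species, ratio=0.75):
    """Relative abundances that shrink geometrically from species to species.

    The ratio is a hypothetical value; the caption does not state it.
    """
    raw = [ratio ** i for i in range(n_species)]
    total = sum(raw)
    return [value / total for value in raw]


def transcript_weights(n_genes, rng):
    """Within-species relative transcript levels drawn from ln N(0, 1)."""
    raw = [rng.lognormvariate(0.0, 1.0) for _ in range(n_genes)]
    total = sum(raw)
    return [value / total for value in raw]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
species_abundance = staggered_abundances(20)
gene_weights = transcript_weights(1000, rng)

# Expected share of community transcripts from gene j of species i.
share = species_abundance[0] * gene_weights[0]
```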

Supplementary Figure 7 HUMAnN2 compared with other methods: novel isolates of known species, UniRef90-based COG gold standard.

We profiled a 10-million-read synthetic metagenome using HUMAnN2 (tiered and pure translated search modes), HUMAnN1, COGNIZER, MEGAN, and ShotMAP to produce profiles of community-level COG abundance. Twenty recent, new isolates of known species (that is, species present in HUMAnN2’s pangenome database) were sampled at staggered relative abundance. (a) Measures of methods’ accuracy and performance in this evaluation. All methods were allowed to use 8 CPU cores and up to 30 GB of memory. This panel and analysis are analogous to those in Fig. 1e. (b) Observed versus expected COG abundance across the six methods. This panel is analogous to Supplementary Fig. 4. CPM refers to “copies per million.” Ticks in the x- and y-axis margins represent zero values; x-axis ticks are false negatives and y-axis ticks are false positives

Supplementary Figure 8 HUMAnN2 compared with other methods: novel isolates of known species, UniRef50-based COG gold standard.

This figure mirrors Supplementary Fig. 7, except that COG annotations are defined based on co-clustering with UniRef50 families (rather than UniRef90). Similarly, HUMAnN2 was run in UniRef50 mode. These changes tend to favor sensitivity over specificity during both isolate genome annotation and profiling. (a) Accuracy and performance of the six functional profiling methods. (b) Observed versus expected COG abundance

Supplementary Figure 9 HUMAnN2 compared with other methods: isolates of novel species, UniRef90-based COG gold standard.

We profiled a 10-million-read synthetic metagenome using HUMAnN2 (tiered and pure translated search modes), HUMAnN1, COGNIZER, MEGAN, and ShotMAP to produce profiles of community-level COG abundance. Twenty recent, new isolates of novel species (that is, species not present in HUMAnN2’s pangenome database) were sampled at staggered relative abundance. Note that, in this context, HUMAnN2’s tiered search relies entirely on the translated search phase to explain sample reads. (a) Measures of methods’ accuracy and performance in this evaluation. All methods were allowed to use 8 CPU cores and up to 30 GB of memory. This panel and analysis are analogous to those in Fig. 1e. (b) Observed versus expected COG abundance across the six methods. This panel is analogous to Supplementary Fig. 4. CPM refers to “copies per million.” Ticks in the x- and y-axis margins represent zero values; x-axis ticks are false negatives and y-axis ticks are false positives. Vertical striping of “expected COG abundance” results from single-copy COGs that were only assigned to one genome (and hence all have the same expected coverage)

Supplementary Figure 10 HUMAnN2 compared with other methods: isolates of novel species, UniRef50-based COG gold standard.

This figure mirrors Supplementary Fig. 9, except that COG annotations are defined based on co-clustering with UniRef50 families (rather than UniRef90). Similarly, HUMAnN2 was run in UniRef50 mode. These changes tend to favor sensitivity over specificity during both isolate genome annotation and profiling. (a) Accuracy and performance of the six functional profiling methods. (b) Observed versus expected COG abundance

Supplementary Figure 11 Contributional diversity at additional oral sites.

This figure follows the format of Fig. 2 from the main text and includes data for two additional oral body sites: buccal mucosa and supragingival plaque. Stars indicate background species-level community diversity

Supplementary Figure 12 Additional examples of core human microbiome pathways with low within-subject and low between-subject contributional diversity.

Bar heights represent the total relative abundance of the pathway and are log-scaled. Contributions of individual species/other/unclassified are linearly scaled within the total bar height

Supplementary Figure 13 Non-vaginal examples of human microbiome pathways with simple but varied contributional diversity.

Bar heights represent the total relative abundance of the pathway and are log-scaled. Contributions of individual species/other/unclassified are linearly scaled within the total bar height

Supplementary Figure 14 Examples of subspecies-level functional variation (gene level).

(a) Strains of Lactobacillus jensenii were well represented in 21 HMP posterior fornix samples. At least two subspecies-level clades appear to be present, defined by the presence of gene block a1 or a2 (highlighted). (b) Strains of Eubacterium eligens were well represented in 51 HMP stool samples. At least three subspecies-level clades appear to be present, defined by the presence/absence of gene blocks b1, b2, and b3 (highlighted)

Supplementary Figure 15 Example of potential niche-adapted subspecies of Haemophilus haemolyticus.

Metagenomic ‘strains’ (UniRef90 gene family presence/absence profiles) of this species differ across the three oral sites where it was detected. Right-side plots illustrate the coreness, variability, and site-specific enrichment of individual genes. Variability peaks at 1.0 for genes detected in exactly 50% of samples. Site-specific enrichment peaks at 1.0 when the gene is 100% prevalent in a focal site and 0% prevalent in all other sites (with –1 corresponding to the exact opposite scenario)
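The two per-gene scores described in this caption can be illustrated with simple functions that satisfy the stated boundary conditions. These are assumptions, not the authors' definitions: the caption fixes only where each score peaks, and other formulas (for example, 4p(1 − p) for variability) would meet the same conditions. Pooling all non-focal sites into one prevalence value is likewise an assumption.

```python
def variability(prevalence):
    """Score in [0, 1] that peaks at 1.0 when a gene is in exactly 50% of samples.

    One possible form: 1 - |2p - 1|.
    """
    return 1.0 - abs(2.0 * prevalence - 1.0)


def site_enrichment(focal_prevalence, other_prevalence):
    """Score in [-1, 1]: prevalence in the focal site minus prevalence elsewhere.

    Equals 1.0 when the gene is in every focal-site sample and no other sample,
    and -1.0 in the opposite scenario.
    """
    return focal_prevalence - other_prevalence


def prevalence(presence_calls):
    """Fraction of samples in which the gene was detected (list of booleans)."""
    return sum(presence_calls) / len(presence_calls)


# A gene found in 2 of 4 focal-site samples and in no other samples.
focal = prevalence([True, True, False, False])
others = prevalence([False, False, False])
gene_variability = variability(focal)
gene_enrichment = site_enrichment(focal, others)
```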

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 and Supplementary Notes 1–7

Reporting Summary

Supplementary Software

The pypi install package for HUMAnN2 v0.11.0 (used in the evaluations from the manuscript)

About this article

Cite this article

Franzosa, E.A., McIver, L.J., Rahnavard, G. et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nat Methods 15, 962–968 (2018). https://doi.org/10.1038/s41592-018-0176-y

Received: 23 May 2017
Accepted: 17 July 2018
Published: 30 October 2018
Issue Date: November 2018
DOI: https://doi.org/10.1038/s41592-018-0176-y
