Species-level functional profiling of metagenomes and metatranscriptomes

Abstract

Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community’s known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species’ genomic versus transcriptional contributions, and strain profiling. Further, we introduce ‘contributional diversity’ to explain patterns of ecological assembly across different microbial community types.

Fig. 1: HUMAnN2 functionally profiles microbial communities with high accuracy using tiered search.
Fig. 2: Contributional diversity of core human microbiome pathways.
Fig. 3: Thermocline-associated microbial enzymes in the marine pelagic zone.
Fig. 4: Metatranscriptomic functional profiling and multi’omic data integration with HUMAnN2.

Data availability

The Human Microbiome Project (HMP) metagenomes analyzed in this work are available via http://hmpdacc.org. The IBDMDB metagenomes and metatranscriptomes analyzed in this work are available via http://ibdmdb.org. The Red Sea metagenomes analyzed in this work were previously deposited as NCBI BioProject PRJNA289734. The synthetic metagenomes and metatranscriptomes used in the evaluation of HUMAnN2 and other methods are available from the authors and at http://huttenhower.sph.harvard.edu/humann2.

Acknowledgements

The authors thank M. Wong, T. Sharpton, and the members of the HUMAnN user group for their feedback on the development and evaluation of HUMAnN2. Funding for this work was provided by NSF 1565100 (to J.G.C.); People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme (FP7/2007–2013) under REA grant agreement PCIG13-GA-2013-618833 and by MIUR “Futuro in Ricerca” RBFR13EWWI_001 (to N.S.); NIH NIDDK U54DE023798, NSF MCB-1453942, NIH NIDDK P30DK043351; and NSF DBI-1053486 (to C.H.).

Author information

Contributions

E.A.F., L.J.M., and C.H. designed the methods. L.J.M. developed the software implementation. G.R., G.W., and N.S. produced datasets to support the software. E.A.F., L.J.M., G.R., L.R.T., M.S., and K.S.L. designed and carried out the evaluations and applications; R.K., J.G.C., and all other authors participated in interpretation of the resulting data. E.A.F., L.J.M., L.R.T., M.S., K.S.L., and C.H. wrote the paper with feedback from the other authors.

Corresponding author

Correspondence to Curtis Huttenhower.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Expanded overview of the HUMAnN2 method.

(a) HUMAnN2 implements a tiered meta’omic search that aims to explain the origin of microbial community DNA or RNA reads based on the pangenomes of detected microbes before falling back to more computationally expensive translated search. (b) The tiered search produces alignments of reads to coding sequences of known or ambiguous taxonomy. These alignments are processed in a species-specific manner to calculate gene family abundance and reconstruct community metabolic pathways. (c) HUMAnN2 thus provides, for each community meta’ome: per-gene abundances, pathway presence/absence calls and abundances, and downstream visualization and statistical tests.
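The tiered logic described in this caption can be illustrated with a toy sketch. It is not the HUMAnN2 implementation: exact substring matching stands in for nucleotide alignment, a precomputed dictionary stands in for translated search, and all names (`tiered_search`, `pangenomes`, `translated_hits`) are hypothetical.

```python
from collections import Counter


def tiered_search(reads, detected_species, pangenomes, translated_hits):
    """Toy two-tier read assignment (illustrative assumption, not HUMAnN2 code).

    reads: dict of read_id -> nucleotide sequence
    detected_species: species reported by a taxonomic prescreen
    pangenomes: dict of species -> {gene_family: nucleotide sequence}
    translated_hits: dict of read_id -> gene_family, standing in for the
        result of a translated search against a protein database
    """
    counts = Counter()
    leftover = []

    # Tier 1: try to explain each read with pangenomes of detected species only.
    for read_id, seq in reads.items():
        hit = None
        for species in detected_species:
            for family, gene_seq in pangenomes.get(species, {}).items():
                if seq in gene_seq:  # substring match stands in for alignment
                    hit = (family, species)
                    break
            if hit:
                break
        if hit:
            counts[hit] += 1
        else:
            leftover.append(read_id)

    # Tier 2: fall back to translated search for reads tier 1 did not explain.
    unexplained = []
    for read_id in leftover:
        family = translated_hits.get(read_id)
        if family is None:
            unexplained.append(read_id)
        else:
            counts[(family, "unclassified")] += 1

    return counts, unexplained
```

Only species passing the prescreen are searched in the first tier, which is why reads from an undetected species fall through to the slower second tier in this sketch.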

Supplementary Figure 2 Reference hold-out analysis of a complex synthetic metagenome.

We constructed a 100-member mock-even synthetic metagenome containing only non-human-associated species (~2× coverage per species) and analyzed it with HUMAnN2. (a) Variation in the number of reads sampled per gene (compared with a genome’s average fold-coverage) makes a non-trivial contribution to the error in per-species gene abundance estimation in HUMAnN2 (roughly 0.1 Bray-Curtis dissimilarity units). (b) Accuracy of community-level gene family abundance estimation decreases linearly with the number of community species missed by HUMAnN2’s taxonomic prescreen (simulated here by excluding sets of species from the underlying pangenome reference collection). (c) HUMAnN2’s overall runtime increases linearly as more species are excluded from the taxonomic prescreen (which results in more work being done during translated search). Runtimes reflect execution using 8 CPU cores.
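For readers unfamiliar with the error measure named in panel a, a minimal sketch of Bray-Curtis dissimilarity between an observed and an expected abundance profile follows. The function name and the dictionary-based profile format are assumptions made for illustration.

```python
def bray_curtis(observed, expected):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles.

    Profiles are dicts of feature -> abundance; a feature missing from one
    profile is treated as having abundance 0 there.
    """
    features = set(observed) | set(expected)
    # Numerator: total absolute difference across features.
    diff = sum(abs(observed.get(f, 0.0) - expected.get(f, 0.0)) for f in features)
    # Denominator: total abundance across both profiles.
    total = sum(observed.get(f, 0.0) + expected.get(f, 0.0) for f in features)
    return diff / total if total else 0.0
```

The value is 0 for identical profiles and 1 for profiles that share no features.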

Supplementary Figure 3 HUMAnN2 tiered search performance on human metagenomes.

We applied HUMAnN2’s tiered search to profile 397 first-visit HMP metagenomes on Harvard University’s Odyssey Research Computing Cluster (8 CPU cores per job). Sample counts per body site were as follows: 54 for anterior nares, 65 for buccal mucosa, 68 for supragingival plaque, 73 for tongue dorsum, 76 for stool, and 34 for posterior fornix. (a) At most body sites, ~60% of reads were explained by detected pangenomes, with (b) an additional ~20% explained by downstream translated search (~80% total). Pangenome search performance (c) consistently exceeded translated search performance (d) by 1–2 orders of magnitude. From smallest to largest, box plot elements in panels a–d represent the lower inner fence, first quartile, median, third quartile, and upper inner fence. Horizontal red lines indicate the median value over all samples. (e) Total runtime is largely dictated by the number of reads passed to translated search, and (for HMP samples with <100 million reads) was approximately linear in the number of input reads (~1 h/5 million input reads). (f) Peak memory use was sublinear in the number of input reads and very predictable. The cluster of outliers in panel f results from large samples that were requeued during their runs: these samples resumed later in the HUMAnN2 workflow and hence display smaller peak memory use.

Supplementary Figure 4 HUMAnN2 compared with other methods (details).

We profiled a 10-million-read synthetic gut metagenome using HUMAnN2 (tiered and pure translated search modes), HUMAnN1, COGNIZER, MEGAN, and ShotMAP to produce profiles of COG abundance. Here, expected (gold standard) and observed COG abundances are compared in units of copies per million (CPMs; that is, raw abundance normalized by gene length and number of mapped reads). HUMAnN2’s tiered search was considerably more accurate than the other methods based on pure translated search. HUMAnN2’s pure translated search showed better agreement than other translated search methods, with its largest source of error being underreporting of low-abundance COGs (false negatives). This behavior is expected from the translated search coverage filters used in HUMAnN2, which we use to limit false positive detection events (that is, COGs with zero expected abundance and non-zero observed abundance). Ticks in the x- and y-axis margins represent zero values; x-axis ticks are false negatives and y-axis ticks are false positives.
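One way to read the CPM definition in this caption is sketched below: read counts are first divided by gene length, and the length-normalized values are then rescaled to sum to one million. The caption states only that abundance is normalized by gene length and mapped reads, so the exact rescaling step, the function name, and the input format are assumptions.

```python
def copies_per_million(read_counts, gene_lengths):
    """Convert per-gene read counts to copies per million (assumed formulation).

    read_counts: dict of gene -> number of mapped reads
    gene_lengths: dict of gene -> gene length in nucleotides
    """
    # Length normalization: reads per kilobase of gene.
    rpk = {gene: n / (gene_lengths[gene] / 1000.0) for gene, n in read_counts.items()}
    total = sum(rpk.values())
    if total == 0:
        return {gene: 0.0 for gene in rpk}
    # Depth normalization: rescale so that all values sum to one million.
    return {gene: 1e6 * value / total for gene, value in rpk.items()}
```

Under this reading, two genes with equal read counts but a twofold difference in length differ twofold in CPM, with the shorter gene receiving the larger value.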

Supplementary Figure 5 Protein coverage thresholds in translated search.

If two largely unrelated proteins share local sequence homology, reads drawn from the homologous region will map to both proteins, potentially resulting in false positive detection events. To limit such events, we require a threshold fraction of sites in a protein to recruit reads during translated search before considering the protein ‘detected’. We evaluated potential thresholds by analyzing the results of pure translated search of synthetic metagenomes versus the UniRef90 database. Trade-offs between sensitivity and precision are shown for the 100-member, even, non-human-associated metagenome in a, and the 20-member, staggered, human-gut-associated metagenome in b. When all community genomes are well covered, a 50% coverage threshold (HUMAnN2’s default) yields a marked increase in precision with only minor loss of sensitivity (a). Loss of sensitivity is higher at this threshold when rare (low-coverage) genomes are included, as genes in low-coverage genomes often fail to meet the coverage threshold due to insufficient read sampling (b). These evaluations do not reflect any additional post-processing of translated search results (for example, weighting by alignment quality), which provides additional accuracy improvements.
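The coverage filter described here can be sketched as follows. This is a simplified illustration, not HUMAnN2's code: alignments are given as 0-based, end-exclusive intervals on the protein, the comparison is assumed to be "at least the threshold", and the function names are hypothetical.

```python
def protein_coverage(protein_length, alignments):
    """Fraction of protein sites covered by at least one aligned read.

    alignments: iterable of (start, end) intervals on the protein,
    0-based and end-exclusive (an assumed convention).
    """
    covered = [False] * protein_length
    for start, end in alignments:
        for site in range(max(start, 0), min(end, protein_length)):
            covered[site] = True
    return sum(covered) / protein_length


def detected_proteins(protein_lengths, hits, threshold=0.5):
    """Keep proteins whose covered fraction meets the threshold (50% by default)."""
    return {
        protein
        for protein, alignments in hits.items()
        if protein_coverage(protein_lengths[protein], alignments) >= threshold
    }
```

A protein that recruits reads only over a short shared domain stays below the threshold and is dropped, which is the false positive scenario the caption describes.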

Supplementary Figure 6 HUMAnN2 compared with other methods: synthetic metatranscriptome evaluation.

We profiled a 10-million-read synthetic gut metatranscriptome using HUMAnN2 (tiered and pure translated search modes), HUMAnN1, COGNIZER, MEGAN, and ShotMAP to produce profiles of community-level COG transcript abundance. Twenty species’ genomic abundance values were geometrically staggered (as in the gut metagenome evaluation), while genes (transcripts) were sampled within-species following a log-normal distribution [ln N(0, 1)]. (a) Measures of methods’ accuracy and performance in this evaluation. All methods were allowed to use 8 CPU cores and up to 30 GB of memory. This panel is analogous to Fig. 1e (which focuses on metagenomic COG abundance in the same synthetic community). (b) Observed versus expected COG transcript abundance across the six methods. This panel is analogous to Supplementary Fig. 4. CPM refers to “copies per million.” Ticks in the x- and y-axis margins represent zero values; x-axis ticks are false negatives and y-axis ticks are false positives.
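The sampling scheme in this caption can be sketched in a few lines. The staggering ratio is not given in the caption, so the value 0.8 below is purely a placeholder, and the function names are hypothetical; only the log-normal parameters (0, 1) come from the text.

```python
import random


def staggered_abundances(n_species, ratio=0.8):
    """Geometrically staggered relative abundances that sum to 1.

    ratio is a placeholder assumption: each species is `ratio` times as
    abundant as the previous one.
    """
    raw = [ratio ** i for i in range(n_species)]
    total = sum(raw)
    return [value / total for value in raw]


def transcript_weights(n_genes, rng):
    """Within-species relative expression drawn from ln N(0, 1)."""
    weights = [rng.lognormvariate(0.0, 1.0) for _ in range(n_genes)]
    total = sum(weights)
    return [w / total for w in weights]


# Example: 20 species, each with 5 genes, using a seeded generator.
rng = random.Random(0)
species_abundance = staggered_abundances(20)
expression = [transcript_weights(5, rng) for _ in species_abundance]
```

Multiplying a species' abundance by each of its gene weights gives the expected share of reads per transcript in such a synthetic community.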

Supplementary Figure 7 HUMAnN2 compared with other methods: novel isolates of known species, UniRef90-based COG gold standard.

We profiled a 10-million-read synthetic metagenome using HUMAnN2 (tiered and pure translated search modes), HUMAnN1, COGNIZER, MEGAN, and ShotMAP to produce profiles of community-level COG abundance. Twenty recent, new isolates of known species (that is, species present in HUMAnN2’s pangenome database) were sampled at staggered relative abundance. (a) Measures of methods’ accuracy and performance in this evaluation. All methods were allowed to use 8 CPU cores and up to 30 GB of memory. This panel and analysis are analogous to those in Fig. 1e. (b) Observed versus expected COG transcript abundance across the six methods. This panel is analogous to Supplementary Fig. 4. CPM refers to “copies per million.” Ticks in the x- and y-axis margins represent zero values; x-axis ticks are false negatives and y-axis ticks are false positives.

Supplementary Figure 8 HUMAnN2 compared with other methods: novel isolates of known species, UniRef50-based COG gold standard.

This figure mirrors Supplementary Fig. 7, except that COG annotations are defined based on co-clustering with UniRef50 families (rather than UniRef90). Similarly, HUMAnN2 was run in UniRef50 mode. These changes tend to favor sensitivity over specificity during both isolate genome annotation and profiling. (a) Accuracy and performance of the six functional profiling methods. (b) Observed versus expected COG abundance.

Supplementary Figure 9 HUMAnN2 compared with other methods: isolates of novel species, UniRef90-based COG gold standard.

We profiled a 10-million-read synthetic metagenome using HUMAnN2 (tiered and pure translated search modes), HUMAnN1, COGNIZER, MEGAN, and ShotMAP to produce profiles of community-level COG abundance. Twenty recent, new isolates of novel species (that is, species not present in HUMAnN2’s pangenome database) were sampled at staggered relative abundance. Note that, in this context, HUMAnN2’s tiered search relies entirely on the translated search phase to explain sample reads. (a) Measures of methods’ accuracy and performance in this evaluation. All methods were allowed to use 8 CPU cores and up to 30 GB of memory. This panel and analysis are analogous to those in Fig. 1e. (b) Observed versus expected COG transcript abundance across the six methods. This panel is analogous to Supplementary Fig. 4. CPM refers to “copies per million.” Ticks in the x- and y-axis margins represent zero values; x-axis ticks are false negatives and y-axis ticks are false positives. Vertical striping of “expected COG abundance” results from single-copy COGs that were only assigned to one genome (and hence all have the same expected coverage).

Supplementary Figure 10 HUMAnN2 compared with other methods: isolates of novel species, UniRef50-based COG gold standard.

This figure mirrors Supplementary Fig. 9, except that COG annotations are defined based on co-clustering with UniRef50 families (rather than UniRef90). Similarly, HUMAnN2 was run in UniRef50 mode. These changes tend to favor sensitivity over specificity during both isolate genome annotation and profiling. (a) Accuracy and performance of the six functional profiling methods. (b) Observed versus expected COG abundance.

Supplementary Figure 11 Contributional diversity at additional oral sites.

This figure follows the format of Fig. 2 from the main text and includes data for two additional oral body sites: buccal mucosa and supragingival plaque. Stars indicate background species-level community diversity.

Supplementary Figure 12 Additional examples of core human microbiome pathways with low within-subject and low between-subject contributional diversity.

Bar heights represent the total relative abundance of the pathway and are log-scaled. Contributions of individual species/other/unclassified are linearly scaled within the total bar height.

Supplementary Figure 13 Non-vaginal examples of human microbiome pathways with simple but varied contributional diversity.

Bar heights represent the total relative abundance of the pathway and are log-scaled. Contributions of individual species/other/unclassified are linearly scaled within the total bar height.

Supplementary Figure 14 Examples of subspecies-level functional variation (gene level).

(a) Strains of Lactobacillus jensenii were well represented in 21 HMP posterior fornix samples. At least two subspecies-level clades appear to be present, defined by the presence of gene block a1 or a2 (highlighted). (b) Strains of Eubacterium eligens were well represented in 51 HMP stool samples. At least three subspecies-level clades appear to be present, defined by the presence/absence of gene blocks b1, b2, and b3 (highlighted).

Supplementary Figure 15 Example of potential niche-adapted subspecies of Haemophilus haemolyticus.

Metagenomic ‘strains’ (UniRef90 gene family presence/absence profiles) of this species differ across the three oral sites where it was detected. Right-side plots illustrate the coreness, variability, and site-specific enrichment of individual genes. Variability peaks at 1.0 for genes detected in exactly 50% of samples. Site-specific enrichment peaks at 1.0 when the gene is 100% prevalent in a focal site and 0% prevalent in all other sites (with –1 corresponding to the exact opposite scenario).
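The caption gives only the boundary behavior of the variability and enrichment scores, not their formulas. The sketch below shows one pair of definitions that satisfies those boundary conditions; both formulas and all function names are assumptions and may differ from the ones used to draw the figure.

```python
def prevalence(calls):
    """Fraction of samples in which the gene was detected."""
    return sum(1 for call in calls if call) / len(calls)


def variability(calls):
    """Assumed score: 1.0 at 50% prevalence, 0.0 at 0% or 100% prevalence."""
    return 1.0 - abs(2.0 * prevalence(calls) - 1.0)


def site_enrichment(calls_by_site, focal_site):
    """Assumed score: focal-site prevalence minus mean prevalence elsewhere.

    Equals 1.0 when the gene is always present in the focal site and never
    present in any other site, and -1.0 in the opposite scenario.
    """
    focal = prevalence(calls_by_site[focal_site])
    others = [
        prevalence(calls)
        for site, calls in calls_by_site.items()
        if site != focal_site
    ]
    return focal - sum(others) / len(others)
```

For example, a gene detected in every plaque sample and in no tongue or buccal sample would score 1.0 for plaque under this definition.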

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 and Supplementary Notes 1–7

Reporting Summary

Supplementary Software

The PyPI install package for HUMAnN2 v0.11.0 (used in the evaluations from the manuscript)

Cite this article

Franzosa, E.A., McIver, L.J., Rahnavard, G. et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nat Methods 15, 962–968 (2018). https://doi.org/10.1038/s41592-018-0176-y
