A library of human gut bacterial isolates paired with longitudinal multiomics data enables mechanistic microbiome research



Our understanding of how the gut microbiome interacts with its human host has been restrained by limited access to longitudinal datasets to examine stability and dynamics, and by having only a few isolates to test mechanistic hypotheses. Here, we present the Broad Institute-OpenBiome Microbiome Library (BIO-ML), a comprehensive collection of 7,758 gut bacterial isolates paired with 3,632 genome sequences and longitudinal multi-omics data. We show that microbial species maintain stable population sizes within and across humans and that commonly used ‘omics’ survey methods are more reliable when using averages over multiple days of sampling. Variation of gut metabolites within people over time is associated with amino acid levels, and differences across people are associated with differences in bile acids. Finally, we show that genomic diversification can be used to infer eco-evolutionary dynamics and in vivo selection pressures for strains within individuals. The BIO-ML is a unique resource designed to enable hypothesis-driven microbiome research.


Fig. 1: The BIO-ML library of human gut bacterial isolates.
Fig. 2: The BIO-ML of isolate genomes is large and diverse.
Fig. 3: Rapid genomic evolution of gut commensal bacteria within people.
Fig. 4: Densely sampled longitudinal data greatly improve ecological inferences.
Fig. 5: Eco-evolutionary dynamics of human gut bacterial strains and impact on community stability.
Fig. 6: Gut metabolome profiles are highly specific to individual people, and this is mostly driven by differences in bile-acid concentrations.

Data Availability

Sequencing and genomic data were deposited on the NCBI, under BioProject PRJNA544527.

BioSample accession numbers for raw sequencing data of isolate genomes: SAMN11846030-SAMN11847029; SAMN11847047-SAMN11848046; SAMN11848055-SAMN11849054; SAMN11849056-SAMN11849687.

BioSample accession numbers for isolate genome assemblies:

SAMN11943001-SAMN11944000; SAMN11944002-SAMN11945001; SAMN11945004-SAMN11946003; SAMN11946038-SAMN11946669.

BioSample accession numbers for raw 16S data:

SAMN11941243-SAMN11942242; SAMN11942243-SAMN11942410.

BioSample accession numbers for metagenomic data:


The processed metabolomics data are available at the Metabolomics Workbench (http://www.metabolomicsworkbench.org), the website of the NIH Common Fund’s Metabolomics Data Repository and Coordinating Center (supported by NIH grant U01-DK097430), where they have been assigned Project ID PR000804.

Scripts and command lines used to analyze the sequencing and genomic data are available at https://github.com/almlab/BIO-ML.

The library of isolates is maintained and stored at the Broad Institute and strains will be made available for purchase upon request by researchers through a Broad Institute online platform: https://www.broadinstitute.org/bio-ml
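The BioSample accessions above are given as inclusive ranges. As a minimal sketch for readers who need individual identifiers, the following Python snippet expands one range into a list. The function name `expand_accession_range` is hypothetical and not part of the authors' scripts, and the sketch assumes every number inside a range is a valid accession for this project, which the source does not state.

```python
import re


def expand_accession_range(span: str) -> list[str]:
    """Expand an inclusive range such as 'SAMN11849056-SAMN11849687'.

    Hypothetical helper: assumes both ends share one alphabetic prefix
    and that the numeric parts are contiguous.
    """
    start, end = (part.strip() for part in span.split("-"))
    m_start = re.fullmatch(r"([A-Z]+)(\d+)", start)
    m_end = re.fullmatch(r"([A-Z]+)(\d+)", end)
    if not m_start or not m_end or m_start.group(1) != m_end.group(1):
        raise ValueError(f"not a valid accession range: {span!r}")
    prefix = m_start.group(1)
    width = len(m_start.group(2))  # keep any zero padding
    first, last = int(m_start.group(2)), int(m_end.group(2))
    if last < first:
        raise ValueError(f"range end precedes start: {span!r}")
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


ids = expand_accession_range("SAMN11849056-SAMN11849687")
print(len(ids), ids[0], ids[-1])
```

The resulting identifiers could then be passed to whichever NCBI download tool the reader prefers.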




Acknowledgements

The authors are thankful to M. Sovie, C. Kim, W. Kelley, E. Lee, W. Pettee, J. Watson and P. Panchal from OpenBiome for their assistance in processing materials and donor metadata used in this study. This work was funded by a grant from the Broad Institute (Broad Next 10 grant 4000017).

Author information

M.P., M.G., S.M.G., R.J.X. and E.J.A. designed the project. M.P. and M.G. built the library of bacterial isolates and whole genomes. M.P. and M.G. analyzed whole-genome sequence data. S.M.K., M.G. and M.P. analyzed the sporulation and ethanol-resistance data. S.M.G. analyzed the 16S data. S.M.G. and X.J. analyzed the metagenomics data. J.A.-P. analyzed the metabolomics data. M.P., S.M.K. and A.R.P. designed the culturing protocols. M.P. and B.B. curate the library of isolates. S.Z. and T.D.L. provided technical advice for WGS library preparation and analysis. P.K.S. and M.S. provided OpenBiome samples and associated metadata. S.R., J.E.A., S.A.R., J.L. and H.V. generated the 16S and metagenomics data. C.C., K.B., A.D., J.S. and K.A.P. generated the metabolomics data. M.P., M.G., S.M.G. and E.J.A. wrote the paper, with input from all authors. E.J.A. and R.J.X. obtained funding and supervised the project.

Correspondence to R. J. Xavier or E. J. Alm.

Ethics declarations

Competing interests

M.S. and E.J.A. are co-founders and shareholders of Finch Therapeutics, a company that specializes in microbiome-targeted therapeutics.

Additional information

Peer review information: Alison Farrell is the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Description of the BIO-ML.

a, 16S phylogenetic tree of the 7,758 isolates in the BIO-ML. Lineages are colored by phylum. b, Depiction of the distribution of 1,347 isolates across 24 bacterial species (y axis) over time (x axis) that were whole-genome sequenced. c, Depiction of the distribution of 1,168 samples across individuals (y axis) and over time (x axis) that were processed for 16S amplicon sequencing. d, Depiction of the distribution of 563 samples across individuals and over time that were processed for shotgun metagenomic sequencing. e, Depiction of the distribution of 179 samples across individuals and over time that were processed for metabolomic study.

Extended Data Fig. 2 Taxonomic coverage and composition of the BIO-ML of isolates and genomes.

a, Abundance-weighted taxonomic coverage of the library of bacterial isolates (7,758 isolates) (y axis), compared to the diversity observed through culture-independent 16S amplicon sequencing (x axis). Eleven donors were used to build the library of isolates. The phylogenetic diversity of isolates was measured with 16S Sanger sequencing, and this was compared to the total diversity observed in the 16S sequence data obtained from 1,168 samples from 90 individual donors of the BIO-ML. Taxonomic coverage was evaluated using different 16S OTU clustering thresholds, from 90% to 100% (ASV) similarity. b, Phylogenomic tree of the 3,632 genomes of the BIO-ML. Branches are colored by phylum.
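One plausible reading of "abundance-weighted taxonomic coverage" is the share of total 16S relative abundance that falls in OTUs represented by at least one cultured isolate. The sketch below illustrates that reading only; the definition, the function name `abundance_weighted_coverage` and the toy values are assumptions, not the authors' stated procedure.

```python
def abundance_weighted_coverage(otu_abundance, isolate_otus):
    """Fraction of total abundance in OTUs that have a cultured isolate.

    otu_abundance: dict mapping OTU id -> relative abundance (16S survey).
    isolate_otus: set of OTU ids to which at least one isolate was assigned.
    Assumed definition, for illustration only.
    """
    total = sum(otu_abundance.values())
    if total == 0:
        raise ValueError("abundances sum to zero")
    covered = sum(a for otu, a in otu_abundance.items() if otu in isolate_otus)
    return covered / total


# Toy values invented for illustration.
abundance = {"otu1": 0.5, "otu2": 0.3, "otu3": 0.2}
cultured = {"otu1", "otu3"}
coverage = abundance_weighted_coverage(abundance, cultured)
print(round(coverage, 3))
```

Repeating the calculation with OTUs defined at each clustering threshold would give one coverage value per threshold, as in the panel.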

Extended Data Fig. 3 The library of genomes contains multiple species within the Faecalibacterium and Akkermansia genera.

Phylogenetic trees of Faecalibacterium (a) and Akkermansia (b) genomes were reconstructed using the concatenated alignment of ribosomal proteins (see Methods). We used RAxML to reconstruct the tree, using the PROTGAMMALGF substitution model. Pairwise Mash distances are represented on the right of each tree. Within each major clade, pairwise Mash distances were lower than 0.05, the threshold used to define species taxonomic units. Between clades, pairwise distances were higher than 0.05. Genomes in the F. prausnitzii and A. muciniphila clades have Mash distances with corresponding NCBI reference genomes that were lower than 0.05. Two different Akkermansia species are present in our genome library. At least two different Faecalibacterium species are present in the genome library.
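To make the 0.05 distance threshold concrete, here is a minimal sketch that groups genomes into putative species units from a table of pairwise distances. It assumes single-linkage grouping (any pair below the threshold joins two groups), which is a simplification chosen for illustration; the caption does not say how clusters were formed. The genome names and distances are invented.

```python
def cluster_by_distance(names, distances, threshold=0.05):
    """Single-linkage grouping of genomes by pairwise distance.

    names: list of genome identifiers.
    distances: dict mapping (name_a, name_b) -> distance.
    Pairs with distance below `threshold` are placed in the same group.
    """
    parent = {n: n for n in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy distances invented for illustration.
names = ["g1", "g2", "g3", "g4"]
distances = {
    ("g1", "g2"): 0.01, ("g3", "g4"): 0.02,
    ("g1", "g3"): 0.12, ("g1", "g4"): 0.13,
    ("g2", "g3"): 0.11, ("g2", "g4"): 0.12,
}
species_units = cluster_by_distance(names, distances)
print(species_units)
```

With these toy values, within-group distances stay below 0.05 and between-group distances stay above it, mirroring the pattern described for the two genera.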

Extended Data Fig. 4 Stability and conservation of microbiome species over time within and across people.

a, Non-metric multidimensional scaling (NMDS) plot showing 16S community structure (Bray–Curtis distances) across long-term time series from ten stool donors. Samples are colored by donors (right). Donors maintain unique microbial signatures over many months to years (ANOSIM, P < 0.0001). b, The black points show the median abundance comparisons, and the red points show the results for a single, randomly drawn sample. Species abundances are conserved across donor pairs. The spread in the red points is larger than for the black points, indicating the median abundances show a tighter correlation across donors (black points Pearson’s R² = 0.25; red points Pearson’s R² = 0.19).
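The comparison in panel b can be sketched as follows: for a pair of donors, compute Pearson's R² once on per-species medians over each donor's time series, and once on a single randomly drawn sample per donor. This is an illustrative sketch only. The data are invented, no transformation is applied, and the helper names `pearson_r2` and `median_profile` are hypothetical.

```python
import random
import statistics


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def median_profile(samples):
    """Per-species median across time points (rows = time points)."""
    return [statistics.median(column) for column in zip(*samples)]


# Toy data invented for illustration: rows are time points, columns are species.
donor_a = [[0.30, 0.10, 0.05], [0.40, 0.08, 0.02], [0.35, 0.12, 0.04]]
donor_b = [[0.28, 0.15, 0.03], [0.20, 0.09, 0.06], [0.33, 0.11, 0.05]]

r2_median = pearson_r2(median_profile(donor_a), median_profile(donor_b))

rng = random.Random(0)  # fixed seed so the single-sample draw is repeatable
r2_single = pearson_r2(rng.choice(donor_a), rng.choice(donor_b))

print(f"median-based R2: {r2_median:.3f}; single-sample R2: {r2_single:.3f}")
```

With real data the single-sample value would be recomputed over many random draws, since any one draw can by chance look better or worse than the median-based estimate.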

Extended Data Fig. 5 Stability and conservation of microbiome functions over time within and across people.

a, NMDS plot showing functional structure (Bray–Curtis distances) across long-term time series from four stool donors. Donors maintain unique functional signatures over many months to years. b, COG abundances are conserved across donor pairs. The black points show the median abundance comparisons, and the red points show the results for a single, randomly drawn sample. The spread in the red points is larger than that for the black points, indicating the median abundances show a tighter correlation across donors (black points Pearson’s R² = 0.88; red points Pearson’s R² = 0.77).

Extended Data Fig. 6 Averaging taxa abundances across time points improves the identification of species–species correlations.

a, Correlation matrix of log median ASV relative abundances across ten donors with long, dense time series (that is, cross-sectional correlations), filtered to include only abundant ASVs with average frequencies of ≥0.01 across the dataset. b, Distribution of correlation coefficients from panel a. Dashed lines show the significance threshold (P < 0.05). Correlations beyond this threshold were used to infer a cross-sectional correlation network from the full dataset. c, The fraction of edges from the cross-sectional correlation network inferred from the full dataset that are captured by random subsampling of donor time series. Choosing a single sample from each donor only captures ~40% of ‘true’ network edges (number of iterations = 10).
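The edge-recovery measure in panel c can be sketched as a set comparison: build a network from the full data, build another from subsampled data, and report the fraction of full-data edges that reappear. In the sketch below a fixed cutoff on |r| stands in for the P < 0.05 significance threshold; that cutoff, the helper names and the toy profiles are assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(profiles, cutoff):
    """Return taxon pairs whose |r| across donors meets the cutoff.

    profiles: dict mapping taxon -> list of per-donor values
    (for example, log median relative abundances).
    """
    edges = set()
    for a, b in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[a], profiles[b])) >= cutoff:
            edges.add((a, b))
    return edges


def fraction_recovered(full_edges, sub_edges):
    """Share of full-data edges that also appear in the subsampled network."""
    if not full_edges:
        return 0.0
    return len(full_edges & sub_edges) / len(full_edges)


# Toy per-donor profiles invented for illustration (five donors).
full_profiles = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [5, 1, 4, 2, 3]}
full_edges = correlation_edges(full_profiles, cutoff=0.6)
print(full_edges, fraction_recovered(full_edges, {("A", "B"), ("B", "C")}))
```

In practice the subsampled network would be rebuilt from one (or a few) randomly chosen samples per donor, and the fraction averaged over repeated draws.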

Extended Data Fig. 7 Metabolomics data capture cross-donor variation as well as within-donor variation through time.

a, PC scores plot of all 179 samples for which metabolomic data were generated. Samples colored in gray correspond to subjects for which metabolomics data had been generated for fewer than six time points. Arrows connecting samples reflect the chronological order in which samples were collected. b, Dendrogram for donors for which metabolomics data had been generated for more than six time points. Metabolomes are colored by subject, as in a. The first two letters indicate the donor ID.

Extended Data Fig. 8 Bacterial taxa–metabolites correlation network reveals strong functional associations in the human gut.

Significant correlations between bacterial taxa and metabolite abundances (|Spearman’s rho| > 0.7, P < 0.01) suggest a link between eating meat and bacterial community composition. Alistipes and Subdoligranulum are strongly associated with the bile acid taurocholate and its derivatives. Subdoligranulum is also associated with carnitine, which has been linked to eating meat. Other taxa are associated with acids and lipids common to the gut environment.
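The |Spearman's rho| > 0.7 filter can be sketched with the standard library alone. The snippet below computes rho from average ranks and keeps taxon-metabolite pairs above the cutoff. It does not compute P values, so the P < 0.01 criterion is not reproduced here; a statistics package would be needed for that. The names and values are invented placeholders.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(taxa, metabolites, cutoff=0.7):
    """Taxon-metabolite pairs with |rho| above the cutoff (no P-value test)."""
    return [
        (t, m)
        for t, tv in sorted(taxa.items())
        for m, mv in sorted(metabolites.items())
        if abs(spearman_rho(tv, mv)) > cutoff
    ]


# Placeholder per-sample values invented for illustration.
taxa = {"taxon_1": [0.1, 0.2, 0.4, 0.8], "taxon_2": [0.5, 0.1, 0.4, 0.2]}
metabolites = {"metabolite_x": [1.0, 4.0, 9.0, 16.0]}
pairs = strong_pairs(taxa, metabolites)
print(pairs)
```

Because Spearman's rho depends only on ranks, it captures monotonic but non-linear associations, which suits abundance data spanning several orders of magnitude.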

Supplementary information

Supplementary Information

Supplementary Methods 1 and 2

Reporting Summary

Supplementary Table

Supplementary Tables 1–4
