FEAST: fast expectation-maximization for microbial source tracking

Abstract

A major challenge of analyzing the compositional structure of microbiome data is identifying its potential origins. Here, we introduce fast expectation-maximization microbial source tracking (FEAST), a ready-to-use scalable framework that can simultaneously estimate the contribution of thousands of potential source environments in a timely manner, thereby helping unravel the origins of complex microbial communities (https://github.com/cozygene/FEAST). The information gained from FEAST may provide insight into quantifying contamination, tracking the formation of developing microbial communities, as well as distinguishing and characterizing bacteria-related health conditions.
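As a rough illustration of the expectation-maximization idea behind this kind of mixture estimation, the Python sketch below estimates mixing proportions for a single sink from fixed, fully known source distributions. It is a simplified textbook EM for a multinomial mixture, not the FEAST implementation: it assumes the sources are known exactly and leaves out any unknown source. The function name `em_mixing_proportions` and the toy data are hypothetical.

```python
import numpy as np


def em_mixing_proportions(sink_counts, sources, n_iter=500, tol=1e-10):
    """Estimate mixing proportions of known sources in one sink (illustrative only).

    sink_counts: (n_taxa,) observed taxon counts in the sink.
    sources:     (n_sources, n_taxa) source relative abundances, each row summing to 1.
    Assumes every taxon observed in the sink is present in at least one source.
    """
    sink_counts = np.asarray(sink_counts, dtype=float)
    sources = np.asarray(sources, dtype=float)
    n_sources = sources.shape[0]
    alpha = np.full(n_sources, 1.0 / n_sources)  # start from a uniform guess

    for _ in range(n_iter):
        # E-step: probability that a read of taxon j came from source i.
        weighted = alpha[:, None] * sources
        denom = weighted.sum(axis=0)
        resp = np.divide(weighted, denom, out=np.zeros_like(weighted), where=denom > 0)
        # M-step: expected fraction of sink reads attributed to each source.
        new_alpha = (resp * sink_counts).sum(axis=1) / sink_counts.sum()
        converged = np.abs(new_alpha - alpha).max() < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


# Toy data (made up): two sources over four taxa, sink mixed at 30% / 70%.
toy_sources = np.array([[0.7, 0.2, 0.1, 0.0],
                        [0.1, 0.1, 0.3, 0.5]])
toy_sink = 10000 * (0.3 * toy_sources[0] + 0.7 * toy_sources[1])
print(em_mixing_proportions(toy_sink, toy_sources))
```

Because the toy sink is built from the exact expected counts, the estimate should approach the mixing proportions used to construct it; real count data would add sampling noise.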

Fig. 1: Methods comparison.
Fig. 2: Running time comparison to current state-of-the-art.
Fig. 3: FEAST estimations of source contribution to the sink; that is, the gut microbiome of the focal infant at 12 months of age.
Fig. 4: The proportion of the unknown sources in kitchen counter samples using FEAST and SourceTracker.
Fig. 5: The receiver operating characteristic curve using FEAST, weighted UniFrac and Jensen–Shannon divergence to classify healthy individuals and patients in ICU with dysbiosis.
Fig. 6: Significant differences in the distribution of the unknown source between sink samples before and during the first event of intestinal domination across 94 patients undergoing allo-HSCT.

Data availability

All of the datasets analyzed in this paper are public and can be accessed as follows. The first dataset was collected and studied by Backhed et al. (ref. 16; accession number ERP005989). The second dataset was collected and studied by Lax et al. (ref. 15; accession number ERP005806). The third dataset was collected and studied by Knights et al. (ref. 10; data from this study are stored at https://github.com/danknights/sourcetracker). The fourth dataset was collected and studied by McDonald et al. (ref. 12; accession number ERP012810) and the American Gut Project (ref. 30; EBI project number PRJEB11419). The fifth dataset was collected and studied by Taur et al. (ref. 18; data from this study are stored at http://www.ncbi.nlm.nih.gov/sra). In our simulations we used the Earth Microbiome Project (ftp://ftp.microbio.me/emp/release1/otu_tables/closed_ref_greengenes/).

Code availability

Code is available at https://github.com/cozygene/FEAST

References

  1. Thompson, L. R. et al. A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551, 457–463 (2017).

  2. Kau, A. L., Ahern, P. P., Griffin, N. W., Goodman, A. L. & Gordon, J. I. Human nutrition, the gut microbiome and the immune system. Nature 474, 327–336 (2011).

  3. Turnbaugh, P. J. & Gordon, J. I. The core gut microbiome, energy balance and obesity. J. Physiol. 587, 4153–4158 (2009).

  4. Ridaura, V. K. et al. Gut microbiota from twins discordant for obesity modulate metabolism in mice. Science 341, 1241214 (2013).

  5. Simpson, J. M., Santo Domingo, J. W. & Reasoner, D. J. Microbial source tracking: state of the science. Environ. Sci. Technol. 36, 5279–5288 (2002).

  6. Wu, C. H. et al. Characterization of coastal urban watershed bacterial communities leads to alternative community-based indicators. PLoS ONE 5, e11285 (2010).

  7. Greenberg, J., Price, B. & Ware, A. Alternative estimate of source distribution in microbial source tracking using posterior probabilities. Water Res. 44, 2629–2637 (2010).

  8. Dufrêne, M. & Legendre, P. Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecol. Monogr. 67, 345–366 (1997).

  9. Smith, A., Sterba-Boatwright, B. & Mott, J. Novel application of a statistical technique, Random Forests, in a bacterial source tracking study. Water Res. 44, 4067–4076 (2010).

  10. Knights, D. et al. Bayesian community-wide culture-independent microbial source tracking. Nat. Methods 8, 761–763 (2011).

  11. Devane, M. L., Weaver, L., Singh, S. K. & Gilpin, B. J. Fecal source tracking methods to elucidate critical sources of pathogens and contaminant microbial transport through New Zealand agricultural watersheds—a review. J. Environ. Manag. 222, 293–303 (2018).

  12. McDonald, D. et al. Extreme dysbiosis of the microbiome in critical illness. mSphere 1, e00199-16 (2016).

  13. Dominguez-Bello, M. G. et al. Partial restoration of the microbiota of cesarean-born infants via vaginal microbial transfer. Nat. Med. 22, 250–253 (2016).

  14. Teaf, C. M., Flores, D., Garber, M. & Harwood, V. J. Toward forensic uses of microbial source tracking. Microbiol. Spectr. 6, https://doi.org/10.1128/microbiolspec.EMF-0014-2017 (2018).

  15. Lax, S. et al. Longitudinal analysis of microbial interaction between humans and the indoor environment. Science 345, 1048–1052 (2014).

  16. Backhed, F. et al. Dynamics and stabilization of the human gut microbiome during the first year of life. Cell Host Microbe 17, 690–703 (2015).

  17. Lozupone, C. & Knight, R. UniFrac: a new phylogenetic method for comparing microbial communities. Appl. Environ. Microbiol. 71, 8228–8235 (2005).

  18. Taur, Y. et al. Intestinal domination and the risk of bacteremia in patients undergoing allogeneic hematopoietic stem cell transplantation. Clin. Infect. Dis. 55, 905–914 (2012).

  19. Turnbaugh, P. J. et al. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444, 1027–1031 (2006).

  20. Ley, R. E. Obesity and the human microbiome. Curr. Opin. Gastroenterol. 26, 5–11 (2010).

  21. Turnbaugh, P. J., Bäckhed, F., Fulton, L. & Gordon, J. I. Diet-induced obesity is linked to marked but reversible alterations in the mouse distal gut microbiome. Cell Host Microbe 3, 213–223 (2008).

  22. Ley, R. E. et al. Obesity alters gut microbial ecology. Proc. Natl Acad. Sci. USA 102, 11070–11075 (2005).

  23. Koren, O. et al. Human oral, gut, and plaque microbiota in patients with atherosclerosis. Proc. Natl Acad. Sci. USA 108, 4592–4598 (2011).

  24. Clemente, J. C., Ursell, L. K., Parfrey, L. W. & Knight, R. The impact of the gut microbiota on human health: an integrative view. Cell 148, 1258–1270 (2012).

  25. Le Chatelier, E. et al. Richness of human gut microbiome correlates with metabolic markers. Nature 500, 541–546 (2013).

  26. Clarke, S. F. et al. The gut microbiota and its relationship to diet and obesity: new insights. Gut Microbes 3, 186–202 (2012).

  27. Jeffery, I. B., Quigley, E. M. M., Öhman, L., Simrén, M. & O’Toole, P. W. The microbiota link to irritable bowel syndrome: an emerging story. Gut Microbes 3, 572–576 (2012).

  28. Marchesi, J. R. et al. Towards the human colorectal cancer microbiome. PLoS ONE 6, e20447 (2011).

  29. Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).

  30. McDonald, D. et al. American Gut: an open platform for citizen science microbiome research. mSystems 3, e00031-18 (2018).

  31. Moon, T. K. The expectation-maximization algorithm. IEEE Signal Process. Mag. 13, 47–60 (1996).

  32. Silverman, J. D., Shenhav, L., Halperin, E. A., Mukherjee, S. A. & David, L. A. Statistical considerations in the design and analysis of longitudinal microbiome studies. Preprint at bioRxiv: https://doi.org/10.1101/448332 (2018).

  33. Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).

  34. Tang, H., Peng, J., Wang, P. & Risch, N. J. Estimation of individual admixture: analytical and study design considerations. Genet. Epidemiol. 28, 289–301 (2005).

  35. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).

  36. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012).

  37. Deloger, M., El Karoui, M. & Petit, M.-A. A genomic distance based on MUM indicates discontinuity between most bacterial species and genera. J. Bacteriol. 191, 91–99 (2009).

  38. Leung, H. C. M. et al. A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio. Bioinformatics 27, 1489–1495 (2011).

  39. Costello, E. K. et al. Bacterial community variation in human body habitats across space and time. Science 326, 1694–1697 (2009).

  40. Lauber, C. L., Hamady, M., Knight, R. & Fierer, N. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl. Environ. Microbiol. 75, 5111–5120 (2009).


Acknowledgements

We thank S. Mukherjee for insightful comments on the manuscript. This research was partially supported by the European Research Council under the European Union’s Horizon 2020 research and innovation program, project number 640384. This work was partially supported by the National Science Foundation (grant number 1705197). T.A.J. was supported by the National Science Foundation (grant number DGE-1644869).

Author information

Contributions

L.S. and E.H. conceived the statistical model. L.S. designed the algorithm and software, and performed computational experiments. L.S., M.T., T.A.J. and L.B. wrote the manuscript. O.F. and D.B. contributed to writing the manuscript. T.A.J. and M.T. contributed to algorithm design. M.T. and L.B. contributed to the computational experiments. I.M., I.P. and E.H. supervised the project.

Corresponding author

Correspondence to Eran Halperin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 The accuracy of FEAST and SourceTracker using data-driven synthetic mixtures.

The accuracy of FEAST and SourceTracker on simulated data. Each simulation was performed using 10 real source environments and simulated sinks. The x-axis is average Jensen-Shannon divergence value across known sources. The y-axis represents correlation across all source environments between true and estimated mixing proportions, measured by (a) the squared Pearson correlation coefficient averaged across sources, and (b) the squared Spearman correlation coefficient averaged across sources.
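The two quantities on the axes of this figure can be sketched in Python as follows. This is only an illustration under stated assumptions: the base-2 logarithm (which bounds the divergence in [0, 1]), the use of pairwise averaging over sources, and the per-source correlation across simulated sinks are guesses at the procedure, and the function names and toy numbers are hypothetical.

```python
from itertools import combinations

import numpy as np


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs) between two abundance vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # taxa with zero abundance in a contribute nothing
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mean_pairwise_jsd(sources):
    """Average divergence over all pairs of sources (assumed averaging scheme)."""
    pairs = combinations(range(len(sources)), 2)
    return float(np.mean([jensen_shannon_divergence(sources[i], sources[j])
                          for i, j in pairs]))


def mean_squared_pearson(true_props, est_props):
    """Squared Pearson correlation per source (columns), averaged across sources.

    Both inputs have shape (n_sinks, n_sources).
    """
    true_props = np.asarray(true_props, dtype=float)
    est_props = np.asarray(est_props, dtype=float)
    r2 = [np.corrcoef(true_props[:, s], est_props[:, s])[0, 1] ** 2
          for s in range(true_props.shape[1])]
    return float(np.mean(r2))


# Toy values (made up): three simulated sinks, two sources.
true_props = np.array([[0.1, 0.9], [0.5, 0.5], [0.8, 0.2]])
est_props = np.array([[0.15, 0.85], [0.45, 0.55], [0.8, 0.2]])
print(mean_squared_pearson(true_props, est_props))
```

Swapping the Pearson correlation for a rank correlation would give the analogous quantity for panel (b).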

Supplementary Figure 2 Evaluation of FEAST and SourceTracker through varying levels of sequencing depth.

Similarity of sequences remained constant (Jensen-Shannon divergence = 0.95, trivial to disambiguate), while sequencing depth varied in the range 100–10,000.

Supplementary Figure 3 The expected variance in FEAST's output.

The expected variance in FEAST's output using the dataset from McDonald et al. We used the gut microbiome of one randomly selected ICU patient as a sink, and the sources considered by McDonald et al.: 126 healthy controls, 126 samples of mammalian corpse decomposition, 126 samples of the gut from healthy children, and 126 samples from indoor house surfaces. By repeating this analysis 100 times and calculating the standard deviation of each source, we demonstrate that the variance in FEAST’s output is very small (that is, sd(dust) = 7.7e-05, sd(healthy adults' feces) = 0.01, sd(healthy children's feces) = 0.01, sd(soil) = 5e-05, sd(unknown) = 8.5e-05).

Supplementary Figure 4 The effect of noisy samples among sources on prediction accuracy.

The effect of noisy samples among sources on prediction accuracy (that is, estimation of the known and unknown sources). As we increase the number of samples per source, FEAST’s prediction accuracy improves; however, this effect is moderate (squared Pearson correlation ranges from 0.9–0.99, Jensen-Shannon divergence values range from 0.87–0.92).

Supplementary Figure 5 The source proportions using SourceTracker.

SourceTracker estimations of source contribution (the gut microbiome of mother, infant at 4 months and infant at birth) to the gut microbiome of 12-month-old infants. According to SourceTracker, differences in maternal contribution between C-section (n = 15) and vaginally delivered (n = 83) infants are not significant (two-sided t-test p-value = 0.6408). Box plots indicate the median (central lines), interquartile range (hinges), and the 5th and 95th percentiles (whiskers).

Supplementary Figure 6 Detecting contamination in lab-settings.

In a lab setting, FEAST and SourceTracker report consistent proportions of contamination, despite minor discrepancies (left: keyboard; right: counter). Estimates on the top row were reported by SourceTracker and estimates on the bottom row were reported by FEAST.

Supplementary Figure 7 Gut microbiome samples from ICU patients are not reminiscent of gut samples from healthy individuals.

Gut samples from ICU patients are not reminiscent of gut samples from healthy individuals. We used the gut microbiome of each ICU patient (at discharge or after 10 days) as a sink, and the sources considered by the original study (McDonald et al. 2016): 126 samples from the American Gut Project (healthy controls), 126 samples of mammalian corpse decomposition, 126 samples of the gut from healthy children (Global Gut study), and 126 samples from indoor house surfaces.

Supplementary Figure 8 Unknown source distribution across sink samples (ICU patients vs. healthy individuals).

The distribution of the unknown source across sink samples—healthy individuals and ICU patients (n = 100).

Supplementary Figure 9 Distinguishing between ICU patients and healthy individuals.

The receiver operating characteristic curve (ROC curve) using FEAST, weighted UniFrac, Bray-Curtis and Jensen-Shannon divergence to classify healthy individuals and ICU patients with dysbiosis. FEAST AUC = 0.91, weighted UniFrac AUC = 0.78, Jensen-Shannon divergence AUC = 0.87, Bray-Curtis AUC = 0.86.
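For readers unfamiliar with the AUC values quoted here, the Python sketch below computes the area under the ROC curve from any per-sample score using its rank-based (Mann-Whitney) form. Which score each method contributes is not stated in this caption, so the example simply assumes one number per sample, for instance an estimated unknown-source proportion; the function name `roc_auc` and the toy scores are hypothetical.

```python
import numpy as np


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores: one value per sample, higher meaning "more likely positive".
    labels: 1/True for positives (e.g. ICU patients), 0/False for negatives.
    Equals the probability that a random positive outscores a random negative,
    with ties counted as one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


# Toy values (made up): four samples, the first two labelled positive.
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_labels = [1, 1, 0, 0]
print(roc_auc(toy_scores, toy_labels))  # 3 of 4 positive-negative pairs ordered correctly -> 0.75
```

The pairwise form is quadratic in the number of samples, which is fine for cohorts of a few hundred; a sort-based version would be preferable for much larger sets.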

Supplementary Figure 10 The source contribution across maternal samples.

Distribution of the median random maternal rank in two scenarios: (a) all maternal and early infant samples (from all the infants in the study) were considered as potential sources (n = 293 sources), and (b) only the maternal samples were considered as potential sources (n = 98 sources). In both scenarios samples taken from infants at age 12 months were considered as sinks (n = 98 sinks). The red vertical line in each figure corresponds to the actual median rank of the maternal contribution.

Supplementary information

Supplementary Figs. 1–10, Supplementary Tables 1 and 2 and Supplementary Notes

Reporting Summary

About this article

Cite this article

Shenhav, L., Thompson, M., Joseph, T.A. et al. FEAST: fast expectation-maximization for microbial source tracking. Nat Methods 16, 627–632 (2019). https://doi.org/10.1038/s41592-019-0431-x

  • DOI: https://doi.org/10.1038/s41592-019-0431-x
