FEAST: fast expectation-maximization for microbial source tracking

Shenhav, Liat; Thompson, Mike; Joseph, Tyler A.; Briscoe, Leah; Furman, Ori; Bogumil, David; Mizrahi, Itzhak; Pe’er, Itsik; Halperin, Eran

doi:10.1038/s41592-019-0431-x

Article
Published: 10 June 2019

Liat Shenhav¹,
Mike Thompson ORCID: orcid.org/0000-0003-1546-0512²,
Tyler A. Joseph³,
Leah Briscoe²,
Ori Furman⁴,
David Bogumil⁴,
Itzhak Mizrahi⁴,
Itsik Pe’er³ &
Eran Halperin ORCID: orcid.org/0000-0002-2373-3691¹,²,⁵,⁶

Nature Methods volume 16, pages 627–632 (2019)

Abstract

A major challenge in analyzing the compositional structure of microbiome data is identifying its potential origins. Here, we introduce fast expectation-maximization microbial source tracking (FEAST), a ready-to-use, scalable framework that can simultaneously estimate the contributions of thousands of potential source environments in a timely manner, thereby helping to unravel the origins of complex microbial communities (https://github.com/cozygene/FEAST). The information gained from FEAST may provide insight into quantifying contamination, tracking the formation of developing microbial communities, and distinguishing and characterizing bacteria-related health conditions.
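FEAST is described as an expectation-maximization (EM) framework that estimates how much each candidate source environment contributes to a sink community. The sketch below illustrates only the core EM idea, under simplifying assumptions that are ours rather than the paper's: the sink's taxon counts are treated as a multinomial draw from a mixture of fixed, known source profiles, and only the mixing proportions are estimated. FEAST additionally estimates an unknown source (which appears in several figures of this article) and this sketch omits it, so it is not the code in the FEAST repository; the function name `em_mixing_proportions` is hypothetical.

```python
def em_mixing_proportions(sink_counts, source_profiles, n_iter=500, tol=1e-10):
    """Estimate mixing proportions of K known sources in one sink via EM.

    sink_counts: list of T non-negative taxon counts observed in the sink.
    source_profiles: list of K relative-abundance vectors over the same T taxa.
    Returns a list of K proportions that sums to 1.
    Hypothetical helper: source profiles are held fixed and there is no unknown source.
    """
    k = len(source_profiles)
    alpha = [1.0 / k] * k  # start from uniform proportions
    for _ in range(n_iter):
        # E-step: each read of taxon j is split among sources in proportion
        # to alpha_i * p_ij; accumulate the expected read count per source.
        expected = [0.0] * k
        for j, count in enumerate(sink_counts):
            if count == 0:
                continue
            weights = [alpha[i] * source_profiles[i][j] for i in range(k)]
            total = sum(weights)
            if total == 0:
                continue  # no source explains this taxon; skipped in this sketch
            for i in range(k):
                expected[i] += count * weights[i] / total
        n_assigned = sum(expected)
        if n_assigned == 0:
            raise ValueError("no sink reads are explained by any source")
        # M-step: the updated alpha is each source's share of the assigned reads.
        new_alpha = [e / n_assigned for e in expected]
        change = max(abs(a - b) for a, b in zip(alpha, new_alpha))
        alpha = new_alpha
        if change < tol:
            break
    return alpha


sources = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
print([round(a, 6) for a in em_mixing_proportions([30, 30, 20, 20], sources)])  # [0.6, 0.4]
```

With two sources that share no taxa, every read is assigned unambiguously, so 60 of 100 reads go to the first source and the estimate is 0.6 versus 0.4. When sources overlap, EM splits shared taxa fractionally and refines the split over iterations.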

**Fig. 2: Running time compared with the current state of the art.**

**Fig. 3: FEAST estimates of source contributions to the sink, that is, the gut microbiome of the focal infant at 12 months of age.**

**Fig. 4: The proportion of the unknown sources in kitchen counter samples using FEAST and SourceTracker.**

**Fig. 5: The receiver operating characteristic curve using FEAST, weighted UniFrac and Jensen–Shannon divergence to classify healthy individuals and patients in ICU with dysbiosis.**

**Fig. 6: Significant differences in the distribution of the unknown source between sink samples before and during the first event of intestinal domination across 94 patients undergoing allo-HSCT.**

Data availability

All of the datasets analyzed in this paper are public and can be accessed under the following accession numbers or locations. The first dataset was collected and studied by Bäckhed et al.¹⁶ (accession number ERP005989). The second dataset was collected and studied by Lax et al.¹⁵ (accession number ERP005806). The third dataset was collected and studied by Knights et al.¹⁰ (data from this study are stored at https://github.com/danknights/sourcetracker). The fourth dataset was collected and studied by McDonald et al.¹² (accession number ERP012810) and the American Gut Project³⁰ (EBI project number PRJEB11419). The fifth dataset was collected and studied by Taur et al.¹⁸ (data from this study are stored at http://www.ncbi.nlm.nih.gov/sra). For our simulations we used data from the Earth Microbiome Project (ftp://ftp.microbio.me/emp/release1/otu_tables/closed_ref_greengenes/).

Code availability

Code is available at https://github.com/cozygene/FEAST

References

1. Thompson, L. R. et al. A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551, 457–463 (2017).
2. Kau, A. L., Ahern, P. P., Griffin, N. W., Goodman, A. L. & Gordon, J. I. Human nutrition, the gut microbiome and the immune system. Nature 474, 327–336 (2011).
3. Turnbaugh, P. J. & Gordon, J. I. The core gut microbiome, energy balance and obesity. J. Physiol. 587, 4153–4158 (2009).
4. Ridaura, V. K. et al. Gut microbiota from twins discordant for obesity modulate metabolism in mice. Science 341, 1241214 (2013).
5. Simpson, J. M., Santo Domingo, J. W. & Reasoner, D. J. Microbial source tracking: state of the science. Environ. Sci. Technol. 36, 5279–5288 (2002).
6. Wu, C. H. et al. Characterization of coastal urban watershed bacterial communities leads to alternative community-based indicators. PLoS ONE 5, e11285 (2010).
7. Greenberg, J., Price, B. & Ware, A. Alternative estimate of source distribution in microbial source tracking using posterior probabilities. Water Res. 44, 2629–2637 (2010).
8. Dufrêne, M. & Legendre, P. Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecol. Monogr. 67, 345–366 (1997).
9. Smith, A., Sterba-Boatwright, B. & Mott, J. Novel application of a statistical technique, Random Forests, in a bacterial source tracking study. Water Res. 44, 4067–4076 (2010).
10. Knights, D. et al. Bayesian community-wide culture-independent microbial source tracking. Nat. Methods 8, 761–763 (2011).
11. Devane, M. L., Weaver, L., Singh, S. K. & Gilpin, B. J. Fecal source tracking methods to elucidate critical sources of pathogens and contaminant microbial transport through New Zealand agricultural watersheds—a review. J. Environ. Manag. 222, 293–303 (2018).
12. McDonald, D. et al. Extreme dysbiosis of the microbiome in critical illness. mSphere 1, e00199-16 (2016).
13. Dominguez-Bello, M. G. et al. Partial restoration of the microbiota of cesarean-born infants via vaginal microbial transfer. Nat. Med. 22, 250–253 (2016).
14. Teaf, C. M., Flores, D., Garber, M. & Harwood, V. J. Toward forensic uses of microbial source tracking. Microbiol. Spectr. 6, https://doi.org/10.1128/microbiolspec.EMF-0014-2017 (2018).
15. Lax, S. et al. Longitudinal analysis of microbial interaction between humans and the indoor environment. Science 345, 1048–1052 (2014).
16. Bäckhed, F. et al. Dynamics and stabilization of the human gut microbiome during the first year of life. Cell Host Microbe 17, 690–703 (2015).
17. Lozupone, C. & Knight, R. UniFrac: a new phylogenetic method for comparing microbial communities. Appl. Environ. Microbiol. 71, 8228–8235 (2005).
18. Taur, Y. et al. Intestinal domination and the risk of bacteremia in patients undergoing allogeneic hematopoietic stem cell transplantation. Clin. Infect. Dis. 55, 905–914 (2012).
19. Turnbaugh, P. J. et al. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444, 1027–1031 (2006).
20. Ley, R. E. Obesity and the human microbiome. Curr. Opin. Gastroenterol. 26, 5–11 (2010).
21. Turnbaugh, P. J., Bäckhed, F., Fulton, L. & Gordon, J. I. Diet-induced obesity is linked to marked but reversible alterations in the mouse distal gut microbiome. Cell Host Microbe 3, 213–223 (2008).
22. Ley, R. E. et al. Obesity alters gut microbial ecology. Proc. Natl Acad. Sci. USA 102, 11070–11075 (2005).
23. Koren, O. et al. Human oral, gut, and plaque microbiota in patients with atherosclerosis. Proc. Natl Acad. Sci. USA 108, 4592–4598 (2011).
24. Clemente, J. C., Ursell, L. K., Parfrey, L. W. & Knight, R. The impact of the gut microbiota on human health: an integrative view. Cell 148, 1258–1270 (2012).
25. Le Chatelier, E. et al. Richness of human gut microbiome correlates with metabolic markers. Nature 500, 541–546 (2013).
26. Clarke, S. F. et al. The gut microbiota and its relationship to diet and obesity: new insights. Gut Microbes 3, 186–202 (2012).
27. Jeffery, I. B., Quigley, E. M. M., Öhman, L., Simrén, M. & O’Toole, P. W. The microbiota link to irritable bowel syndrome: an emerging story. Gut Microbes 3, 572–576 (2012).
28. Marchesi, J. R. et al. Towards the human colorectal cancer microbiome. PLoS ONE 6, e20447 (2011).
29. Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).
30. McDonald, D. et al. American Gut: an open platform for citizen science microbiome research. mSystems 3, e00031-18 (2018).
31. Moon, T. K. The expectation-maximization algorithm. IEEE Signal Process. Mag. 13, 47–60 (1996).
32. Silverman, J. D., Shenhav, L., Halperin, E. A., Mukherjee, S. A. & David, L. A. Statistical considerations in the design and analysis of longitudinal microbiome studies. Preprint at bioRxiv https://doi.org/10.1101/448332 (2018).
33. Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).
34. Tang, H., Peng, J., Wang, P. & Risch, N. J. Estimation of individual admixture: analytical and study design considerations. Genet. Epidemiol. 28, 289–301 (2005).
35. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
36. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012).
37. Deloger, M., El Karoui, M. & Petit, M.-A. A genomic distance based on MUM indicates discontinuity between most bacterial species and genera. J. Bacteriol. 191, 91–99 (2009).
38. Leung, H. C. M. et al. A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio. Bioinformatics 27, 1489–1495 (2011).
39. Costello, E. K. et al. Bacterial community variation in human body habitats across space and time. Science 326, 1694–1697 (2009).
40. Lauber, C. L., Hamady, M., Knight, R. & Fierer, N. Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl. Environ. Microbiol. 75, 5111–5120 (2009).

Acknowledgements

We thank S. Mukherjee for insightful comments on the manuscript. This research was partially supported by the European Research Council under the European Union’s Horizon 2020 research and innovation program, project number 640384. This work was partially supported by the National Science Foundation (grant number 1705197). T.A.J. was supported by the National Science Foundation (grant no. DGE-1644869).

Author information

Authors and Affiliations

Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
Liat Shenhav & Eran Halperin
Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA
Mike Thompson, Leah Briscoe & Eran Halperin
Department of Computer Science, Columbia University, New York, NY, USA
Tyler A. Joseph & Itsik Pe’er
Life Sciences, Ben Gurion University, Be’er Sheva, Israel
Ori Furman, David Bogumil & Itzhak Mizrahi
Department of Anesthesiology and Perioperative Medicine, University of California Los Angeles, Los Angeles, CA, USA
Eran Halperin
Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, USA
Eran Halperin

Contributions

L.S. and E.H. conceived the statistical model. L.S. designed the algorithm and software, and performed computational experiments. L.S., M.T., T.A.J. and L.B. wrote the manuscript. O.F. and D.B. contributed to writing the manuscript. T.A.J. and M.T. contributed to algorithm design. M.T. and L.B. contributed to the computational experiments. I.M., I.P. and E.H. supervised the project.

Corresponding author

Correspondence to Eran Halperin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 The accuracy of FEAST and SourceTracker using data-driven synthetic mixtures.

The accuracy of FEAST and SourceTracker on simulated data. Each simulation was performed using 10 real source environments and simulated sinks. The x-axis is average Jensen-Shannon divergence value across known sources. The y-axis represents correlation across all source environments between true and estimated mixing proportions, measured by (a) the squared Pearson correlation coefficient averaged across sources, and (b) the squared Spearman correlation coefficient averaged across sources.
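Supplementary Figure 1 summarizes source similarity with the Jensen-Shannon divergence and accuracy with squared correlation coefficients. The following is a minimal sketch of both quantities. It assumes base-2 logarithms (so the divergence lies between 0 and 1), reads "average Jensen-Shannon divergence across known sources" as the mean over all pairs of source profiles, and reads "averaged across sources" as computing r² per source over simulated sinks and then taking the mean. These interpretations and all function names are hypothetical.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two relative-abundance vectors (0 to 1)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def average_pairwise_jsd(profiles):
    """Mean JSD over all unordered pairs of source profiles (assumed reading)."""
    pairs = list(combinations(profiles, 2))
    return sum(jensen_shannon_divergence(p, q) for p, q in pairs) / len(pairs)


def squared_pearson(x, y):
    """Squared Pearson correlation; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


def mean_squared_pearson(true_props, est_props):
    """Rows are simulated sinks, columns are sources: r^2 per source, then the mean."""
    n_sources = len(true_props[0])
    scores = [
        squared_pearson([row[i] for row in true_props], [row[i] for row in est_props])
        for i in range(n_sources)
    ]
    return sum(scores) / n_sources


print(jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Two sources with no taxa in common sit at the maximum divergence of 1, so values near 1 (such as the 0.95 used in Supplementary Figure 2) correspond to sources that are easy to tell apart.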

Supplementary Figure 2 Evaluation of FEAST and SourceTracker through varying levels of sequencing depth.

Evaluation of FEAST and SourceTracker through varying levels of sequencing depth. The similarity between sources was held constant (Jensen-Shannon divergence = 0.95, trivial to disambiguate), while sequencing depth varied from 100 to 10,000.
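Supplementary Figures 1 and 2 rely on simulated sinks built from real source environments. One common way to build such a sink, sketched here as an assumption rather than the paper's documented procedure, is to mix the source profiles with chosen proportions and then draw a fixed number of reads (the sequencing depth) by multinomial sampling. The name `simulate_sink` is hypothetical.

```python
import random


def simulate_sink(source_profiles, alpha, depth, rng):
    """Draw `depth` reads from the mixture sum_i alpha_i * p_i.

    source_profiles: K relative-abundance vectors over the same taxa.
    alpha: K mixing proportions summing to 1.
    Returns a list of per-taxon read counts that sums to `depth`.
    """
    n_taxa = len(source_profiles[0])
    mixture = [sum(a * p[j] for a, p in zip(alpha, source_profiles)) for j in range(n_taxa)]
    counts = [0] * n_taxa
    # random.choices with weights performs `depth` independent categorical draws,
    # which together form one multinomial sample over the taxa.
    for taxon in rng.choices(range(n_taxa), weights=mixture, k=depth):
        counts[taxon] += 1
    return counts


rng = random.Random(0)
profiles = [[0.7, 0.3, 0.0], [0.0, 0.2, 0.8]]
for depth in (100, 1000, 10000):  # spans the depth range in Supplementary Figure 2
    print(depth, simulate_sink(profiles, [0.5, 0.5], depth, rng))
```

At low depth, rare taxa are often missed entirely, which is one reason estimation accuracy can degrade as depth decreases even when the sources themselves are easy to distinguish.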

Supplementary Figure 3 The expected variance in FEAST's output.

The expected variance in FEAST's output using the dataset from McDonald et al. We used the gut microbiome of one randomly selected ICU patient as a sink, and the sources considered by McDonald et al.: 126 healthy controls, 126 samples of mammalian corpse decomposition, 126 samples of the gut from healthy children, and 126 samples from indoor house surfaces. By repeating this analysis 100 times and calculating the standard deviation of the estimated contribution of each source, we demonstrate that the variance in FEAST’s output is very small (that is, sd(dust) = 7.7e-05, sd(healthy adults' feces) = 0.01, sd(healthy children's feces) = 0.01, sd(soil) = 5e-05, sd(unknown) = 8.5e-05).

Supplementary Figure 4 The effect of noisy samples among sources on prediction accuracy.

The effect of noisy samples among sources on prediction accuracy (that is, estimation of the known and unknown sources). As we increase the number of samples per source, FEAST’s prediction accuracy improves; however, this effect is moderate (squared Pearson correlation ranges from 0.9 to 0.99, and Jensen-Shannon divergence values range from 0.87 to 0.92).

Supplementary Figure 5 The source proportions using SourceTracker.

SourceTracker estimates of source contributions (the gut microbiome of the mother, the infant at 4 months and the infant at birth) to the gut microbiome of 12-month-old infants. According to SourceTracker, differences between C-section (n = 15) and vaginally delivered (n = 83) infants in terms of maternal contribution are not significant (two-sided t-test p-value = 0.6408). Box plots indicate the median (central lines), interquartile range (hinges), and the 5th and 95th percentiles (whiskers).

Supplementary Figure 6 Detecting contamination in lab settings.

FEAST and SourceTracker report consistent proportions of contamination, despite minor discrepancies, in a lab setting (left: keyboard; right: counter). Estimates in the top row were reported by SourceTracker and estimates in the bottom row by FEAST.

Supplementary Figure 7 Gut microbiome samples from ICU patients are not reminiscent of gut samples from healthy individuals.

Gut samples from ICU patients are not reminiscent of gut samples from healthy individuals. We used the gut microbiome of each ICU patient (at discharge or after 10 days) as a sink, and the sources considered by the original study (McDonald et al. 2016): 126 samples from the American Gut Project (healthy controls), 126 samples of mammalian corpse decomposition, 126 samples of the gut from healthy children (Global Gut study), and 126 samples from indoor house surfaces.

Supplementary Figure 8 Unknown source distribution across sink samples (ICU patients vs. healthy individuals).

The distribution of the unknown source across sink samples—healthy individuals and ICU patients (n = 100).

Supplementary Figure 9 Distinguishing between ICU patients and healthy individuals.

The receiver operating characteristic (ROC) curve using FEAST, weighted UniFrac, Bray-Curtis dissimilarity and Jensen-Shannon divergence to classify healthy individuals and ICU patients with dysbiosis. FEAST AUC = 0.91, weighted UniFrac AUC = 0.78, Jensen-Shannon divergence AUC = 0.87, Bray-Curtis AUC = 0.86.
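An AUC can be read as the probability that a randomly chosen ICU patient receives a higher score than a randomly chosen healthy individual. A minimal sketch of that pairwise (Mann-Whitney) computation follows. The per-sample score is a placeholder for whatever each method produces (for example, an estimated source proportion or a distance to healthy references); how each method's score was defined in the paper is not stated in this section, and `roc_auc` is a hypothetical name.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 for the positive class (here, ICU patients), 0 for healthy individuals.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is quadratic in sample size, which is fine for cohorts of a few hundred samples; sorting-based rank formulas give the same value faster for large cohorts.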

Supplementary Figure 10 The source contribution across maternal samples.

Distribution of the median random maternal rank in two scenarios: (a) all maternal and early infant samples (from all the infants in the study) were considered as potential sources (n = 293 sources), and (b) only the maternal samples were considered as potential sources (n = 98 sources). In both scenarios samples taken from infants at age 12 months were considered as sinks (n = 98 sinks). The red vertical line in each figure corresponds to the actual median rank of the maternal contribution.
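One plausible reading of this analysis, stated as an assumption: for each 12-month sink, rank the estimated contribution of that infant's own mother among all candidate sources, take the median rank across sinks, and compare it with the median ranks obtained when sinks are randomly re-paired with maternal samples. The helpers below are hypothetical and sketch that permutation comparison with a matrix `contrib` whose rows are sinks and whose columns are candidate sources.

```python
import random
import statistics


def rank_of(values, index):
    """1-based rank of values[index] in decreasing order; ties share the better rank."""
    target = values[index]
    return 1 + sum(1 for v in values if v > target)


def median_maternal_rank(contrib, mother_idx):
    """Median, across sinks, of the rank of each sink's own mother among its sources."""
    return statistics.median(rank_of(row, m) for row, m in zip(contrib, mother_idx))


def null_median_ranks(contrib, mother_idx, n_perm, rng):
    """Median ranks after randomly re-pairing sinks with maternal samples."""
    medians = []
    for _ in range(n_perm):
        shuffled = list(mother_idx)
        rng.shuffle(shuffled)
        medians.append(median_maternal_rank(contrib, shuffled))
    return medians


contrib = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]
mothers = [0, 1, 2]  # column index of each sink's own mother
observed = median_maternal_rank(contrib, mothers)  # 1: every mother ranks first
null = null_median_ranks(contrib, mothers, 1000, random.Random(1))
print(observed, sum(m <= observed for m in null) / len(null))
```

The printed fraction is the share of random pairings whose median rank is at least as good as the observed one, which plays the role of the red line's position relative to the null distribution.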

Supplementary information

Supplementary Figs. 1–10, Supplementary Tables 1 and 2 and Supplementary Notes

Reporting Summary

About this article

Cite this article

Shenhav, L., Thompson, M., Joseph, T.A. et al. FEAST: fast expectation-maximization for microbial source tracking. Nat Methods 16, 627–632 (2019). https://doi.org/10.1038/s41592-019-0431-x

Received: 06 August 2018
Accepted: 23 April 2019
Published: 10 June 2019
Issue Date: July 2019
DOI: https://doi.org/10.1038/s41592-019-0431-x