Bayesian community-wide culture-independent microbial source tracking


Contamination is a critical issue in high-throughput metagenomic studies, yet progress toward a comprehensive solution has been limited. We present SourceTracker, a Bayesian approach to estimate the proportion of contaminants in a given community that come from possible source environments. We applied SourceTracker to microbial surveys from neonatal intensive care units (NICUs), offices and molecular biology laboratories, and provide a database of known contaminants for future testing.
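To make concrete what estimating source proportions means, the following is a minimal sketch that treats a sink sample as a mixture of known source profiles and fits the mixing proportions by expectation-maximization. This is a simplified maximum-likelihood stand-in, not the SourceTracker algorithm: it omits the Bayesian treatment and any handling of unknown sources. The function name, the taxon profiles and the counts are all hypothetical values chosen for illustration.

```python
def estimate_source_proportions(sink_counts, source_profiles, n_iter=500):
    """Estimate mixing proportions of known sources in one sink sample via EM.

    sink_counts: taxon counts observed in the sink sample.
    source_profiles: one taxon probability list per source (each sums to 1).
    Illustrative only; not the SourceTracker implementation.
    """
    n_sources = len(source_profiles)
    alpha = [1.0 / n_sources] * n_sources  # start from a uniform mixture
    for _ in range(n_iter):
        expected = [0.0] * n_sources
        for t, count in enumerate(sink_counts):
            if count == 0:
                continue
            # E-step: probability that a read of taxon t came from each source
            weights = [alpha[k] * source_profiles[k][t] for k in range(n_sources)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # taxon absent from every source; its reads are skipped
            for k in range(n_sources):
                expected[k] += count * weights[k] / norm
        assigned = sum(expected)
        if assigned == 0.0:
            return alpha  # no read can be explained by any source
        # M-step: renormalise expected read assignments into proportions
        alpha = [e / assigned for e in expected]
    return alpha


# Hypothetical taxon profiles for two sources over four taxa.
gut = [0.7, 0.2, 0.1, 0.0]
skin = [0.0, 0.1, 0.2, 0.7]
# Hypothetical sink counts built as 25% gut and 75% skin over 1,000 reads.
sink = [175, 125, 175, 525]
proportions = estimate_source_proportions(sink, [gut, skin])
```

On this constructed example the estimate settles close to 0.25 for the first source and 0.75 for the second, because the sink counts were generated exactly from that mixture. Real data would additionally require an explicit unknown-source component and some measure of uncertainty, which this sketch does not provide.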

Figure 1: Comparison of SourceTracker and other models.
Figure 2: SourceTracker proportion estimates for a subset of sink samples.
Figure 3: Relative abundance of common contaminating operational taxonomic units (OTUs).


We acknowledge funding from the US National Institutes of Health (R01HG4872, R01HG4866, U01HL098957 and P01DK78669), the Crohn's and Colitis Foundation of America and the Howard Hughes Medical Institute, and we thank B. Prithiviraj for helpful insight into previous related work.

D.K. designed the algorithm and software, and performed computational experiments; D.K., R.K. and S.T.K. wrote the manuscript; J.K., E.S.C., J.Z., M.C.M., R.G.C. and F.D.B. contributed to writing the manuscript; J.K. and M.C.M. contributed to algorithm design; J.K. processed the data after sequencing; E.S.C. collected the data; R.G.C. and F.D.B. organized and supervised the data collection; R.G.C., F.D.B., R.K. and S.T.K. supervised the project.

Correspondence to Scott T Kelley.

The authors declare no competing financial interests.

Supplementary Text and Figures

Supplementary Figures 1–7 and Supplementary Tables 1–2 (PDF 3401 kb)

Knights, D., Kuczynski, J., Charlson, E. et al. Bayesian community-wide culture-independent microbial source tracking. Nat Methods 8, 761–763 (2011).

