The accurate quantification of microbial growth dynamics for species without complete genome sequences is biologically important but computationally challenging in metagenomics. Here we present the dynamic estimator of microbial communities (DEMIC; https://sourceforge.net/projects/demic/), a multi-sample algorithm based on contigs and coverage values, to infer the relative distances of contigs from the replication origin and to accurately compare bacterial growth rates between samples. We demonstrate the robust performance of DEMIC for various sample sizes and assembly qualities using multiple synthetic and real datasets.
The accession numbers and weblinks for all real datasets are provided in the Methods. Simulated data are available upon request from the corresponding author.
This research was supported by grant R01GM123056 (H.L.) from the National Institutes of Health.
The authors declare no competing interests.
Integrated supplementary information
(a) For most bacteria, DNA replication starts from a fixed origin in a circular genome. (b) Replication forks proceed bidirectionally, and more than two replication forks may occur in fast-growing bacteria. The DNA copy number of a genome region is higher the nearer it is to the fixed replication origin and lower the farther it is from the origin. (c) When a complete genome sequence is available, an ordinary linear regression model can be fitted between genome locations and logarithm-transformed sequencing coverages, and the growth dynamics of a bacterial population can be measured by the coverage ratio between the replication origin (peak) and terminus (trough). This peak-to-trough ratio (PTR) cannot be calculated directly without a complete genome.
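The regression described in panel (c) can be illustrated with a minimal sketch. It is not the DEMIC or PTRC implementation: the function names are hypothetical, the terminus is assumed to lie exactly opposite the origin on the circular genome, and the coverage values are synthetic. Under those assumptions, log2 coverage falls linearly with the relative distance from the origin, and the PTR follows from the fitted slope.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def relative_distance(position, genome_length, origin=0):
    """Distance from the origin on a circular genome, scaled so that
    0 is the origin and 1 is the terminus (assumed opposite the origin)."""
    d = abs(position - origin) % genome_length
    d = min(d, genome_length - d)
    return d / (genome_length / 2)

def estimate_ptr(positions, coverages, genome_length, origin=0):
    """Peak-to-trough ratio from a linear fit of log2 coverage on distance."""
    dists = [relative_distance(p, genome_length, origin) for p in positions]
    logs = [math.log2(c) for c in coverages]
    slope, _ = fit_line(dists, logs)
    # Fitted coverage at the origin divided by fitted coverage at the terminus.
    return 2 ** (-slope)

# Synthetic example: coverage halves between origin and terminus (PTR = 2).
genome_length = 1_000_000
positions = list(range(0, genome_length, 50_000))
coverages = [100 * 2 ** (-relative_distance(p, genome_length)) for p in positions]
print(round(estimate_ptr(positions, coverages, genome_length), 3))
```

The sketch requires known genome positions, which is exactly what is unavailable for draft assemblies; that limitation is the motivation stated in the caption.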
Supplementary Figure 2 The average read coverages of four species in 50 samples of the synthetic dataset.
Each sample is a mixture of two to four real sequencing datasets from different species: Lactobacillus gasseri, Enterococcus faecalis, Citrobacter rodentium and Escherichia coli.
Supplementary Figure 3 Number of growth rate (PTR) estimates by three computational methods for the three species with contig clusters generated by a binning algorithm from the synthetic dataset.
Whereas PTRC and DEMIC successfully estimated all 122 growth rates, iRep only output 59 growth rates by default, and other growth rates were categorized as ‘unfiltered’.
Supplementary Figure 4 Scatterplots and correlations of the PTR estimates from DEMIC (red) and iRep (blue) with PTR values (Pearson’s r value) in 36 sequencing datasets of Lactobacillus gasseri.
The shaded areas indicate the 99% confidence interval.
Supplementary Figure 5 Evaluation of the effects of sample size on the performance of DEMIC (red) and iRep (blue) based on L. gasseri, E. faecalis and C. rodentium (n = 10 for each).
Box plots of correlations between the estimated PTRs and true PTRs (Pearson’s r values) of all evaluations, indicating the median (center line), first and third quartiles (box edges), and 1.5 times the interquartile range (whiskers) of the correlations.
Supplementary Figure 6 Phylogenetic tree generated by iTOL for 45 species from five phyla in simulated data that were randomly selected from records of DoriC.
According to NCBI Taxonomy, six species have synonymous names (at the genus or species level): Desulfotomaculum nigrificans (Desulfotomaculum carboxydivorans), Cutibacterium acnes (Propionibacterium acnes), Sphaerochaeta coccoides (Spirochaeta coccoides), Pseudopropionibacterium propionicum (Propionibacterium propionicum), Acidipropionibacterium acidipropionici (Propionibacterium acidipropionici) and Sediminispirochaeta smaragdinae (Spirochaeta smaragdinae).
A total of 1,336 PTRs were randomly assigned to 45 species from 15 genera of five phyla across 50 samples.
A total of 1,336 average coverages were randomly assigned to 45 species from 15 genera of five phyla across 50 samples.
Supplementary Figure 9 True versus estimated PTRs from DEMIC and iRep for 41 species represented by contig clusters.
Symbol shape indicates whether species were filtered or not in the estimates from iRep.
(a,b) iRep failed to accurately estimate the growth rates of two closely related species, P. terrae and P. polymyxa, that were mixed into the same contig cluster by the binning algorithm. (c) In the contig cluster, P. polymyxa is the dominant species, but a high proportion of contigs are contamination from P. terrae. DEMIC effectively filtered out contigs from P. terrae and kept most of the contigs from P. polymyxa by iteratively updating the contig cluster based on the PC1 distribution of all remaining contigs. (d) DEMIC estimates were highly correlated with the PTRs of P. polymyxa (r = 0.994, n = 28).
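The iterative filtering described in panel (c) can be sketched as follows. This is an illustration under stated assumptions, not DEMIC's code: the caption does not give the rule used to judge the PC1 distribution, so the cut-off here (contigs whose PC1 score lies more than `n_mads` median absolute deviations from the median are dropped) is an assumed stand-in, and all function names and coverage values are hypothetical. PC1 is obtained by power iteration on the scatter matrix of log coverages (contigs as rows, samples as columns).

```python
import math
import statistics

def pc1_scores(rows):
    """Scores of each row on the first principal component (power iteration)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    scatter = [[sum(c[a] * c[b] for c in centered) for b in range(m)]
               for a in range(m)]
    v = [1.0 + j for j in range(m)]  # arbitrary non-zero start vector
    for _ in range(200):
        w = [sum(scatter[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(m)) for c in centered]

def filter_contigs(log_cov, n_mads=5.0, max_iter=20):
    """Repeatedly drop contigs with outlying PC1 scores (assumed MAD rule)."""
    keep = list(log_cov)
    for _ in range(max_iter):
        scores = pc1_scores([log_cov[c] for c in keep])
        med = statistics.median(scores)
        mad = statistics.median([abs(s - med) for s in scores])
        if mad == 0:
            break
        kept = [c for c, s in zip(keep, scores) if abs(s - med) <= n_mads * mad]
        if len(kept) == len(keep):
            break  # nothing removed, the cluster is stable
        keep = kept
    return keep

# Toy data: eight contigs of a dominant species whose log coverage declines
# with distance from the origin in three samples, plus two contaminants.
slopes = [1.0, 0.5, 0.2]
log_cov = {f"c{i}": [5 - s * i / 7 for s in slopes] for i in range(8)}
log_cov["x0"] = [9.0, 1.0, 9.0]
log_cov["x1"] = [9.1, 0.9, 9.0]
print(filter_contigs(log_cov))
```

On this toy input the two contaminant contigs are removed in the first pass and the second pass leaves the cluster unchanged, which ends the loop.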
Supplementary Figure 11 Applicable contig clusters and computational resources of DEMIC in two real metagenomic datasets using MetaBAT as the binning algorithm.
iRep was applied to the same SAM records and contig clusters, whereas PTRC was applied using a complete genome library that is independent of the contig clusters generated by MetaBAT.
Supplementary Figure 12 Applicable contig clusters and computational resources of DEMIC in two real metagenomic datasets using MaxBin as the binning algorithm.
iRep was applied to the same SAM records and contig clusters, whereas PTRC was applied using a complete genome library that is independent of the contig clusters generated by MaxBin.
(a) Overview of the estimated growth rates (ePTRs) for contig clusters from seawater samples of different depths at eight Red Sea stations. (b,c) An example of depth-related variation in growth rates estimated by DEMIC. For contig cluster 36 generated by MetaBAT, which has 60% completeness and an average identity of 92% with Marinobacter adhaerens, the estimated growth rates were significantly lower at a depth of 500 m than at 10 m and 100 m (one-sided Mann-Whitney U test; n = 5, 5, 3 and 6 for 10 m, 100 m, 200 m and 500 m, respectively). The box plots indicate the median (center line), first and third quartiles (box edges), and 1.5 times the interquartile range (whiskers).
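For group sizes as small as those in this caption, a one-sided Mann-Whitney U test can be computed exactly by enumerating every way of splitting the pooled values into two groups. The sketch below is a generic illustration using only the standard library; the function names are hypothetical and the ePTR values are invented for the example, not taken from the study.

```python
from itertools import combinations

def u_statistic(x, y):
    """Number of (x, y) pairs with x > y, counting ties as one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def mann_whitney_less(x, y):
    """Exact one-sided test of H1: values in x tend to be smaller than in y.

    Returns (U, p), where p is the fraction of all regroupings of the pooled
    data whose U statistic is at most the observed one.
    """
    pooled = list(x) + list(y)
    observed = u_statistic(x, y)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if u_statistic(xs, ys) <= observed:
            hits += 1
    return observed, hits / total

# Invented ePTR values for illustration only (n = 6 deep, n = 5 shallow).
deep = [1.10, 1.20, 1.15, 1.05, 1.12, 1.08]
shallow = [1.50, 1.60, 1.40, 1.70, 1.55]
u, p = mann_whitney_less(deep, shallow)
print(u, p)
```

Exhaustive enumeration is only practical for small groups; with 6 and 5 values there are 462 regroupings, so the smallest attainable one-sided p value is 1/462.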
(a) Overview of a subset of contig clusters with estimated bacterial growth rates (ePTRs) in healthy and Crohn’s disease samples at baseline. (b) Completeness and contamination of contig clusters in the datasets. For each contig cluster, the size of the pie represents the number of samples for which DEMIC could estimate growth rates, and its composition represents the proportions of disease/control status and treatment duration among those samples. (c) Some species represented by contig clusters showed different growth rates between healthy and disease subjects, but this difference disappeared completely or partially after anti-TNF or enteral diet treatment (one-sided Mann-Whitney U test, p value < 0.05 after FDR correction). ePTR: estimated PTR from DEMIC. For metabat2.239, n = 3, 9, 8, 3 and 2; for metabat2.259, n = 4, 13, 12, 13 and 7; for metabat2.55, n = 4, 22, 14, 17 and 15 in the control, Crohn’s disease baseline, week 1, week 4 and week 8 groups, respectively. The box plots indicate the median (center line), first and third quartiles (box edges), and 1.5 times the interquartile range (whiskers).
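The caption does not say which FDR procedure was applied. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment; the function name and the input p values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p values from four comparisons.
raw = [0.01, 0.04, 0.03, 0.5]
print([round(q, 4) for q in bh_adjust(raw)])
```

A comparison would then be reported as significant when its adjusted value falls below the chosen threshold, 0.05 in the caption.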
About this article
Cite this article
Gao, Y., Li, H. Quantifying and comparing bacterial growth dynamics in multiple metagenomic samples. Nat Methods 15, 1041–1044 (2018). https://doi.org/10.1038/s41592-018-0182-0