Abstract
Accurately quantifying microbial growth dynamics for species without complete genome sequences is biologically important but computationally challenging in metagenomics. Here we present the dynamic estimator of microbial communities (DEMIC; https://sourceforge.net/projects/demic/), a multi-sample algorithm based on contigs and coverage values, to infer the relative distances of contigs from the replication origin and to accurately compare bacterial growth rates between samples. We demonstrate the robust performance of DEMIC across various sample sizes and assembly qualities using multiple synthetic and real datasets.
Data availability
The accession numbers and weblinks for all real datasets are provided in the Methods. Simulated data are available upon request from the corresponding author.
Acknowledgements
This research was supported by grant R01GM123056 (H.L.) from the National Institutes of Health.
Author information
Contributions
H.L. and Y.G. conceived and designed the project. Y.G. implemented the method. Both authors analyzed the data, and wrote and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Integrated supplementary information
Supplementary Figure 1 Peak-to-trough ratio and pipeline of DEMIC.
(a) For most bacteria, DNA replication starts from a fixed origin in the circular genome. (b) Replication forks proceed bidirectionally, and more than two replication forks may occur in fast-growing bacteria. The DNA copy number of a genome region is higher the nearer it lies to the fixed replication origin and lower the farther away it lies. (c) When a complete genome sequence is available, an ordinary linear regression model can be fitted between genome locations and logarithm-transformed sequencing coverages, and the growth dynamics of a bacterial population can be measured by the coverage ratio between the replication origin (peak) and terminus (trough). The peak-to-trough ratio (PTR) cannot be calculated directly without the complete genome.
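The regression described in panel (c) can be illustrated with a minimal sketch. It assumes each genome region has a relative distance from the origin (0 at the origin, 1 at the terminus) and uses a base-2 logarithm; the function names, the distance parameterization and the toy coverages are illustrative assumptions, not part of DEMIC.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def peak_to_trough_ratio(rel_distance, coverage):
    """Estimate PTR from coverage along the genome.

    rel_distance: assumed scale, 0 at the origin (peak) and 1 at the terminus (trough).
    coverage: sequencing coverage of each region (must be positive).
    """
    log_cov = [math.log2(c) for c in coverage]
    slope, _intercept = fit_line(rel_distance, log_cov)
    # Fitted log2 coverage is `intercept` at the origin and `intercept + slope`
    # at the terminus, so the peak-to-trough ratio is 2 ** (-slope).
    return 2.0 ** (-slope)

# Toy data (hypothetical): coverage falls from 40x at the origin to 20x at the terminus.
rel_distance = [0.0, 0.25, 0.5, 0.75, 1.0]
coverage = [40.0 * 2.0 ** (-d) for d in rel_distance]
ptr = peak_to_trough_ratio(rel_distance, coverage)
```

Because the ratio is taken between two fitted values, the choice of logarithm base does not affect the estimate as long as the same base is used to transform back.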
Supplementary Figure 2 The average read coverages of four species in 50 samples of the synthetic dataset.
Each sample is a mixture of two to four real sequencing datasets from different species: Lactobacillus gasseri, Enterococcus faecalis, Citrobacter rodentium and Escherichia coli.
Supplementary Figure 3 Number of growth rate (PTR) estimates produced by three computational methods for the three species with contig clusters generated by a binning algorithm from the synthetic dataset.
Whereas PTRC and DEMIC successfully estimated all 122 growth rates, iRep only output 59 growth rates by default, and other growth rates were categorized as ‘unfiltered’.
Supplementary Figure 4 Scatterplots and correlations (Pearson’s r) of the PTR estimates from DEMIC (red) and iRep (blue) with PTR values in 36 sequencing datasets of Lactobacillus gasseri.
The shaded areas indicate the 99% confidence interval.
Supplementary Figure 5 Evaluation of effects of sample sizes on the performances of DEMIC (red) and iRep (blue) based on L. gasseri, E. faecalis and C. rodentium (n = 10 for each).
Box plots of correlations between the estimated PTRs and true PTRs (Pearson’s r values) of all evaluations, indicating the median (center line), first and third quartiles (box edges), and 1.5 times the interquartile range (whiskers) of the correlations.
Supplementary Figure 6 Phylogenetic tree generated by iTOL for 45 species from five phyla in simulated data that were randomly selected from records of DoriC.
According to NCBI Taxonomy, six species have synonymous names (genus or species): Desulfotomaculum nigrificans (Desulfotomaculum carboxydivorans), Cutibacterium acnes (Propionibacterium acnes), Sphaerochaeta coccoides (Spirochaeta coccoides), Pseudopropionibacterium propionicum (Propionibacterium propionicum), Acidipropionibacterium acidipropionici (Propionibacterium acidipropionici) and Sediminispirochaeta smaragdinae (Spirochaeta smaragdinae).
Supplementary Figure 7
A total of 1,336 PTRs randomly assigned to 45 species from 15 genera of five phyla and 50 samples.
Supplementary Figure 8
A total of 1,336 average coverages randomly assigned for 45 species from 15 genera of five phyla and 50 samples.
Supplementary Figure 9 True versus estimated PTRs from DEMIC and iRep for 41 species represented by contig clusters.
Symbol shape indicates whether species were filtered or not in the estimates from iRep.
Supplementary Figure 10 An example of contig filtering in DEMIC.
(a-b) iRep failed to accurately estimate the growth rates of two closely related species, P. terrae and P. polymyxa, that were mixed into the same contig cluster by the binning algorithm. (c) In the contig cluster, P. polymyxa is the dominant species, but a high proportion of the contigs are contamination from P. terrae. DEMIC effectively filtered out contigs from P. terrae and kept most of the contigs from P. polymyxa by iteratively updating the contig cluster based on the PC1 distribution of all remaining contigs. (d) DEMIC estimates were highly correlated with PTRs of P. polymyxa (r = 0.994, n = 28).
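The idea of iteratively pruning a contig cluster by its PC1 distribution can be sketched as follows. This is only an illustration under stated assumptions: the outlier rule (dropping contigs whose PC1 score lies more than `k` median absolute deviations from the median), the function names and the toy log coverages are all hypothetical, and DEMIC's actual filtering criterion may differ.

```python
import math
import statistics

def pc1_scores(rows):
    """Project each row (one contig's log coverages across samples) onto PC1."""
    n_cols = len(rows[0])
    col_means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    centered = [[r[j] - col_means[j] for j in range(n_cols)] for r in rows]
    # Power iteration for the leading eigenvector of X^T X (X = centered data).
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(200):
        scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [sum(a * b for a, b in zip(row, v)) for row in centered]

def filter_contigs(log_cov, k=5.0, max_rounds=10):
    """Repeatedly drop contigs with outlying PC1 scores (hypothetical MAD rule)."""
    kept = list(range(len(log_cov)))
    for _ in range(max_rounds):
        scores = pc1_scores([log_cov[i] for i in kept])
        center = statistics.median(scores)
        mad = statistics.median([abs(s - center) for s in scores])
        if mad == 0.0:
            break
        survivors = [i for i, s in zip(kept, scores) if abs(s - center) <= k * mad]
        if len(survivors) == len(kept):
            break  # nothing removed in this round, so the cluster is stable
        kept = survivors
    return kept

# Toy data (hypothetical): six contigs from one species whose log coverage
# varies smoothly with distance from the origin, plus two contaminant contigs.
base = (5.0, 6.0, 5.5)
slope = (-0.5, -1.0, -0.2)
target = [[b + g * s for b, s in zip(base, slope)] for g in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)]
contaminant = [[8.0, 3.0, 8.0], [8.1, 2.9, 8.2]]
kept = filter_contigs(target + contaminant)
```

In this toy case the first round removes the two contaminant contigs, and the second round finds no further outliers, so the indices of the six target contigs remain.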
Supplementary Figure 11 Applicable contig clusters and computational resources of DEMIC in two real metagenomic datasets using MetaBAT as the binning algorithm.
iRep was applied to the same SAM records and contig clusters, whereas PTRC was applied using a complete genome library that is independent of the contig clusters generated by MetaBAT.
Supplementary Figure 12 Applicable contig clusters and computational resources of DEMIC in two real metagenomic datasets using MaxBin as the binning algorithm.
iRep was applied to the same SAM records and contig clusters, whereas PTRC was applied using a complete genome library that is independent of the contig clusters generated by MaxBin.
Supplementary Figure 13 Growth dynamics PTR estimates by DEMIC for the RedSea datasets.
(a) Overview of the estimated growth rates (ePTRs) for contig clusters from seawater samples of different depths in eight RedSea stations. (b-c) An example of depth-related variation in growth rates estimated by DEMIC. The estimated growth rates were significantly lower at a depth of 500 m than at 10 m and 100 m (one-sided Mann-Whitney U test; n = 5, 5, 3 and 6 for 10 m, 100 m, 200 m and 500 m, respectively) for contig cluster 36 generated by MetaBAT, which has 60% completeness and an average identity of 92% with Marinobacter adhaerens. The box plots indicate the median (center line), first and third quartiles (box edges), and 1.5 times the interquartile range (whiskers).
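For readers unfamiliar with the test named in this caption, a one-sided Mann-Whitney U test on two small groups can be sketched with an exact permutation p value. The ePTR values below are invented purely for illustration and are not taken from the RedSea data; the function names are likewise hypothetical.

```python
from itertools import combinations

def u_statistic(x, y):
    """Count pairs (a, b) with a from x greater than b from y; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def exact_one_sided_p(x, y):
    """Exact P(U >= observed) over all relabelings; tests whether x tends to exceed y."""
    pooled = list(x) + list(y)
    observed = u_statistic(x, y)
    hits = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if u_statistic(group_x, group_y) >= observed:
            hits += 1
    return observed, hits / total

# Hypothetical ePTRs: five shallow samples versus six deep samples.
shallow = [1.62, 1.58, 1.71, 1.66, 1.60]
deep = [1.31, 1.28, 1.40, 1.35, 1.25, 1.33]
u, p = exact_one_sided_p(shallow, deep)
```

With group sizes of 5 and 6 there are only 462 possible relabelings, so exhaustive enumeration is cheap; for larger groups a normal approximation or a statistics library would be the practical choice.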
Supplementary Figure 14 Growth dynamics PTR estimates by DEMIC for the PLEASE datasets.
(a) Overview of a subset of contig clusters with estimated bacterial growth rates (ePTRs) in healthy and Crohn’s disease samples at baseline. (b) Completeness and contamination of contig clusters in the datasets. For each contig cluster, the size of the pie represents the number of samples for which DEMIC could estimate growth rates, and its composition represents the proportions of disease/control status and treatment duration among those samples. (c) Some species represented by contig clusters showed different growth rates between healthy and disease subjects, but the difference disappeared completely or partially after anti-TNF or enteral diet treatment (one-sided Mann-Whitney U test, p value < 0.05 after FDR correction). ePTR: estimated PTR from DEMIC. For metabat2.239, n = 3, 9, 8, 3 and 2; for metabat2.259, n = 4, 13, 12, 13 and 7; for metabat2.55, n = 4, 22, 14, 17 and 15 in the control, Crohn’s disease baseline, week 1, week 4 and week 8 groups, respectively. The box plots indicate the median (center line), first and third quartiles (box edges), and 1.5 times the interquartile range (whiskers).
Supplementary information
Supplementary Figures and Tables
Supplementary Figures 1–14 and Supplementary Tables 1–3
Cite this article
Gao, Y., Li, H. Quantifying and comparing bacterial growth dynamics in multiple metagenomic samples. Nat Methods 15, 1041–1044 (2018). https://doi.org/10.1038/s41592-018-0182-0