Quantifying and comparing bacterial growth dynamics in multiple metagenomic samples

Gao, Yuan; Li, Hongzhe

doi:10.1038/s41592-018-0182-0

Brief Communication
Published: 12 November 2018

Nature Methods volume 15, pages 1041–1044 (2018)

Abstract

The accurate quantification of microbial growth dynamics for species without complete genome sequences is biologically important, but computationally challenging in metagenomics. Here we present dynamic estimator of microbial communities (DEMIC; https://sourceforge.net/projects/demic/), a multi-sample algorithm based on contigs and coverage values, to infer the relative distances of contigs from the replication origin and to accurately compare bacterial growth rates between samples. We demonstrate the robust performance of DEMIC for various sample sizes and assembly qualities using multiple synthetic and real datasets.

**Fig. 1: Computational pipeline of DEMIC.**

**Fig. 2: Performance evaluation of DEMIC based on sequencing datasets of three species.**

**Fig. 3: Performance evaluation of DEMIC based on simulated data of 45 closely related species from five phyla.**

Data availability

The accession numbers and weblinks for all real datasets are provided in the Methods. Simulated data are available upon request from the corresponding author.

Acknowledgements

This research was supported by grant R01GM123056 (H.L.) from the National Institutes of Health.

Author information

Authors and Affiliations

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Yuan Gao & Hongzhe Li

Contributions

H.L. and Y.G. conceived and designed the project. Y.G. implemented the method. Both authors analyzed the data, and wrote and edited the manuscript.

Corresponding author

Correspondence to Hongzhe Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Peak-to-trough ratio and pipeline of DEMIC.

(a) For most bacteria, DNA replication starts from a fixed origin in a circular genome. (b) Replication forks proceed bidirectionally, and more than two replication forks may occur in fast-growing bacteria. The DNA copy number of a genome region is higher the nearer it is to the fixed replication origin and lower the farther it is from the origin. (c) When a complete genome sequence is available, an ordinary linear regression model can be fitted between genome locations and logarithm-transformed sequencing coverages, and the growth dynamics of a bacterial population can be measured by the coverage ratio between the replication origin (peak) and terminus (trough). The peak-to-trough ratio (PTR) cannot be directly calculated without the complete genome.
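The regression described in panel (c) can be illustrated with a minimal sketch. It is not the DEMIC or PTRC implementation; it assumes that each genome window has already been assigned a relative distance from the origin (0.0 at the origin, 1.0 at the terminus) and that coverage is log2-transformed. The function names `fit_line` and `estimate_ptr` and the synthetic coverage values are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_ptr(rel_distance, coverage):
    """Peak-to-trough ratio from a linear fit of log2 coverage.

    rel_distance: assumed scaling, 0.0 at the origin and 1.0 at the terminus.
    coverage: positive sequencing coverage of each genome window.
    """
    log_cov = [math.log2(c) for c in coverage]
    intercept, slope = fit_line(rel_distance, log_cov)
    peak = intercept            # fitted log2 coverage at the origin
    trough = intercept + slope  # fitted log2 coverage at the terminus
    return 2.0 ** (peak - trough)


# Synthetic, noise-free example: coverage halves from origin to terminus.
distances = [i / 20 for i in range(21)]
coverages = [40.0 * 2.0 ** (-d) for d in distances]
ptr = estimate_ptr(distances, coverages)
print(round(ptr, 6))
```

Taking the ratio of the fitted values at the two ends, rather than of the raw maximum and minimum coverage, keeps the estimate from being driven by a single noisy window.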

Supplementary Figure 2 The average read coverages of four species in 50 samples of the synthetic dataset.

Each sample is a mixture of two to four real sequencing datasets from different species: Lactobacillus gasseri, Enterococcus faecalis, Citrobacter rodentium and Escherichia coli.

Supplementary Figure 3 Number of growth rate (PTR) estimates by three computational methods for the three species with contig clusters generated by binning algorithm from the synthetic dataset.

Whereas PTRC and DEMIC successfully estimated all 122 growth rates, iRep only output 59 growth rates by default, and other growth rates were categorized as ‘unfiltered’.

Supplementary Figure 4 Scatterplots and correlations of the PTR estimates from DEMIC (red) and iRep (blue) with PTR values (Pearson’s r value) in 36 sequencing datasets of Lactobacillus gasseri.

The shaded areas indicate the 99% confidence interval.

Supplementary Figure 5 Evaluation of effects of sample sizes on the performances of DEMIC (red) and iRep (blue) based on L. gasseri, E. faecalis and C. rodentium (n = 10 for each).

Box plots of correlations between the estimated PTRs and true PTRs (Pearson’s r values) of all evaluations, indicating the median (center line), first and third quartiles (box edges), and 1.5 times the interquartile range (whiskers) of the correlations.
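The correlations summarized in these box plots are Pearson's r values between estimated and true PTRs. As a reminder of what that statistic computes, here is a small standard-library sketch; the function name `pearson_r` and the example values are made up for illustration and are not taken from the paper's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical true and estimated PTRs for four samples of one species.
true_ptr = [1.0, 2.0, 3.0, 4.0]
estimated_ptr = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(true_ptr, estimated_ptr)
print(round(r, 3))
```

Because Pearson's r is invariant to linear rescaling, it rewards estimates that track the true PTRs across samples even when their absolute scale differs.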

Supplementary Figure 6 Phylogenetic tree generated by iTOL for 45 species from five phyla in simulated data that were randomly selected from records of DoriC.

According to NCBI Taxonomy, six species have synonymous names (genus or species): Desulfotomaculum nigrificans (Desulfotomaculum carboxydivorans), Cutibacterium acnes (Propionibacterium acnes), Sphaerochaeta coccoides (Spirochaeta coccoides), Pseudopropionibacterium propionicum (Propionibacterium propionicum), Acidipropionibacterium acidipropionici (Propionibacterium acidipropionici) and Sediminispirochaeta smaragdinae (Spirochaeta smaragdinae).

Supplementary Figure 7

A total of 1,336 PTRs randomly assigned to 45 species from 15 genera of five phyla and 50 samples.

Supplementary Figure 8

A total of 1,336 average coverages randomly assigned for 45 species from 15 genera of five phyla and 50 samples.

Supplementary Figure 9 True versus estimated PTRs from DEMIC and iRep for 41 species represented by contig clusters.

Symbol shape indicates whether species were filtered or not in the estimates from iRep.

Supplementary Figure 10 An example of contig filtering in DEMIC.

(a-b) iRep failed to accurately estimate the growth rates of two closely related species, P. terrae and P. polymyxa, that were mixed into the same contig cluster by the binning algorithm. (c) In the contig cluster, P. polymyxa is the dominant species, but with a high proportion of contamination from P. terrae. DEMIC effectively filtered out contigs from P. terrae and kept most of the contigs from P. polymyxa by iteratively updating the contig cluster based on the PC1 distribution of all remaining contigs. (d) DEMIC estimates were highly correlated with PTRs of P. polymyxa (r = 0.994, n = 28).

Supplementary Figure 11 Applicable contig clusters and computational resources of DEMIC in two real metagenomic datasets using MetaBAT as the binning algorithm.

iRep was applied to the same SAM records and contig clusters, whereas PTRC was applied using a complete genome library that is independent of the contig clusters generated by MetaBAT.

Supplementary Figure 12 Applicable contig clusters and computational resources of DEMIC in two real metagenomic datasets using MaxBin as the binning algorithm.

iRep was applied to the same SAM records and contig clusters, whereas PTRC was applied using a complete genome library that is independent of the contig clusters generated by MaxBin.

Supplementary Figure 13 Growth dynamics PTR estimates by DEMIC for the RedSea datasets.

(a) Overview of the estimated growth rates (ePTRs) for contig clusters from seawater samples at different depths at eight RedSea stations. (b-c) An example of depth-related variation in growth rates estimated by DEMIC. For contig cluster 36 generated by MetaBAT, which has 60% completeness and an average identity of 92% with Marinobacter adhaerens, the estimated growth rates were significantly lower at a depth of 500 m than at 10 m and 100 m (one-sided Mann-Whitney U test; n = 5, 5, 3 and 6 for 10 m, 100 m, 200 m and 500 m, respectively). The box plots indicate the median (center line), first and third quartiles (box edges), and 1.5 times the interquartile range (whiskers).
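The one-sided Mann-Whitney U test used for the depth comparison can be sketched as follows. The caption does not say whether an exact or an approximate p value was used; this sketch assumes an exact permutation p value, which is feasible at group sizes like those reported. The function names and the ePTR values are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Ranks 1..N, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_less(x, y):
    """Exact one-sided test of the alternative that x tends to be smaller than y.

    Returns (U statistic of x, p value). Enumerates every way of choosing
    len(x) of the pooled ranks, so it is only suitable for small samples.
    """
    n1 = len(x)
    ranks = midranks(list(x) + list(y))

    def u_from_rank_sum(rank_sum):
        return rank_sum - n1 * (n1 + 1) / 2

    u_observed = u_from_rank_sum(sum(ranks[:n1]))
    as_extreme = 0
    total = 0
    for chosen in combinations(range(len(ranks)), n1):
        total += 1
        u = u_from_rank_sum(sum(ranks[i] for i in chosen))
        if u <= u_observed + 1e-12:
            as_extreme += 1
    return u_observed, as_extreme / total


# Hypothetical ePTRs for one contig cluster at two depths.
deep = [1.10, 1.20, 1.15]
shallow = [1.50, 1.60, 1.70]
u_stat, p_value = mann_whitney_less(deep, shallow)
print(u_stat, p_value)
```

With three samples per group, complete separation gives the smallest attainable one-sided p value of 1/20, which shows why group sizes matter when interpreting such comparisons.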

Supplementary Figure 14 Growth dynamics PTR estimates by DEMIC for the PLEASE datasets.

(a) Overview of a subset of contig clusters with estimated bacterial growth rates (ePTRs) in healthy and Crohn’s disease samples at baseline. (b) Completeness and contamination of contig clusters in the datasets. For each contig cluster, the size of the pie represents the number of samples for which DEMIC could estimate growth rates, and its composition represents the proportions of those samples by disease/control status and treatment duration. (c) Some species represented by contig clusters showed different growth rates between healthy and disease subjects, but the difference disappeared completely or partially after anti-TNF or enteral diet treatment (one-sided Mann-Whitney U test, p value < 0.05 after FDR correction). ePTR: estimated PTR from DEMIC. For metabat2.239, n = 3, 9, 8, 3 and 2; for metabat2.259, n = 4, 13, 12, 13 and 7; for metabat2.55, n = 4, 22, 14, 17 and 15 in the control, Crohn’s disease baseline, week 1, week 4 and week 8 groups, respectively. The box plots indicate the median (center line), first and third quartiles (box edges), and 1.5 times the interquartile range (whiskers).
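The caption reports p values "after FDR correction" without naming the procedure. The sketch below assumes the widely used Benjamini-Hochberg step-up adjustment, which may not be the one the authors applied; the function name and the raw p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p values from four contig-cluster comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print([round(p, 4) for p in adjusted])
```

A comparison is then called significant when its adjusted value falls below the chosen threshold, 0.05 in the caption above.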

Supplementary information

Supplementary Figures and Tables

Supplementary Figures 1–14 and Supplementary Tables 1–3

Reporting Summary

About this article

Cite this article

Gao, Y., Li, H. Quantifying and comparing bacterial growth dynamics in multiple metagenomic samples. Nat Methods 15, 1041–1044 (2018). https://doi.org/10.1038/s41592-018-0182-0

Received: 20 March 2018
Accepted: 18 August 2018
Published: 12 November 2018
Issue Date: December 2018
DOI: https://doi.org/10.1038/s41592-018-0182-0
