Brief Communication | Published:

Quantifying and comparing bacterial growth dynamics in multiple metagenomic samples

Nature Methods volume 15, pages 1041–1044 (2018)

Abstract

The accurate quantification of microbial growth dynamics for species without complete genome sequences is biologically important but computationally challenging in metagenomics. Here we present dynamic estimator of microbial communities (DEMIC; https://sourceforge.net/projects/demic/), a multi-sample algorithm that uses contigs and their coverage values to infer the relative distances of contigs from the replication origin and to accurately compare bacterial growth rates between samples. We demonstrate the robust performance of DEMIC across various sample sizes and assembly qualities using multiple synthetic and real datasets.
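To make the multi-sample idea concrete, here is a simplified Python sketch. It is not DEMIC's implementation. It assumes that, within one contig cluster, the log coverage of a contig in a given sample is roughly a sample-specific depth term minus a sample-specific slope times the contig's unknown distance from the replication origin. Under that assumption the sample-centred contig-by-sample matrix is close to rank one, so the first principal component (PC1) orders contigs by distance from the origin, and a per-sample regression of log coverage on PC1 gives a relative growth estimate. The function name `estimate_eptr`, the plain SVD and ordinary least-squares fits, and the assumption that the contigs span the full origin-to-terminus distance are all illustrative choices; the published method adds further steps, such as the contig filtering described for Supplementary Figure 10.

```python
import numpy as np


def estimate_eptr(log_cov):
    """Simplified multi-sample sketch (not DEMIC itself).

    log_cov: array of shape (n_contigs, n_samples) with log coverages of the
    contigs in one contig cluster. Returns (pc1_scores, eptr_per_sample).
    """
    # Centre each sample (column) so that sample-wide depth differences vanish.
    centered = log_cov - log_cov.mean(axis=0, keepdims=True)
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, 0] * s[0]  # PC1 score of each contig
    # Per-sample OLS slope of log coverage on the PC1 score.
    slopes = np.array([np.polyfit(scores, log_cov[:, j], 1)[0]
                       for j in range(log_cov.shape[1])])
    # PC1 has an arbitrary sign: orient it so coverage falls along the axis,
    # i.e. larger scores mean farther from the replication origin.
    if np.median(slopes) > 0:
        scores, slopes = -scores, -slopes
    span = scores.max() - scores.min()  # assumed to cover origin -> terminus
    return scores, np.exp(-slopes * span)


# Hypothetical noise-free cluster: 200 contigs, 4 samples.
rng = np.random.default_rng(0)
true_ptr = np.array([1.3, 1.8, 2.5, 1.1])
x = np.linspace(0.0, 0.5, 200)  # unknown distance from origin (fraction of genome)
rng.shuffle(x)                  # contigs come in no particular order
depth = np.log([20.0, 55.0, 8.0, 120.0])
log_cov = depth[None, :] - 2.0 * np.log(true_ptr)[None, :] * x[:, None]
scores, eptr = estimate_eptr(log_cov)
print(np.round(eptr, 3))  # recovers true_ptr on this noise-free input
```

Because PC1 has an arbitrary sign and scale, the sketch fixes the sign from the median slope and converts slopes to ratios using the observed PC1 range. With noisy or incomplete contig clusters that range will not match the origin-to-terminus distance exactly, so values from a sketch like this are best compared across samples rather than read as absolute PTRs.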


Data availability

The accession numbers and weblinks for all real datasets are provided in the Methods. Simulated data are available upon request from the corresponding author.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Myhrvold, C., Kotula, J. W., Hicks, W. M., Conway, N. J. & Silver, P. A. Nat. Commun. 6, 10039 (2015).

  2. Helaine, S. et al. Proc. Natl Acad. Sci. USA 107, 3746–3751 (2010).

  3. Claudi, B. et al. Cell 158, 722–733 (2014).

  4. Abel, S. et al. Nat. Methods 12, 223–226 (2015).

  5. Korem, T. et al. Science 349, 1101–1106 (2015).

  6. Brown, C. T., Olm, M. R., Thomas, B. C. & Banfield, J. F. Nat. Biotechnol. 34, 1256–1263 (2016).

  7. Breitwieser, F. P., Lu, J. & Salzberg, S. L. Brief. Bioinform. https://doi.org/10.1093/bib/bbx120 (2017).

  8. Alneberg, J. et al. Nat. Methods 11, 1144–1146 (2014).

  9. Albertsen, M. et al. Nat. Biotechnol. 31, 533–538 (2013).

  10. Rearick, D. et al. Nucleic Acids Res. 39, 2357–2366 (2011).

  11. Wu, Y. W., Tang, Y. H., Tringe, S. G., Simmons, B. A. & Singer, S. W. Microbiome 2, 26 (2014).

  12. Wu, Y. W., Simmons, B. A. & Singer, S. W. Bioinformatics 32, 605–607 (2016).

  13. Li, D., Liu, C. M., Luo, R., Sadakane, K. & Lam, T. W. Bioinformatics 31, 1674–1676 (2015).

  14. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. Genome Res. 25, 1043–1055 (2015).

  15. Thompson, L. R. et al. ISME J. 11, 138–151 (2017).

  16. Lewis, J. D. et al. Cell Host Microbe 18, 489–500 (2015).

  17. Sangwan, N., Xia, F. & Gilbert, J. A. Microbiome 4, 8 (2016).

  18. Sczyrba, A. et al. Nat. Methods 14, 1063–1071 (2017).

  19. Luo, C. et al. Nat. Biotechnol. 33, 1045–1052 (2015).

  20. Beaulaurier, J. et al. Nat. Biotechnol. 36, 61–69 (2018).

  21. Bates, D., Mächler, M., Bolker, B. M. & Walker, S. C. J. Stat. Softw. 67, 1–48 (2015).

  22. Lê, S., Josse, J. & Husson, F. J. Stat. Softw. 25, 1–18 (2008).

  23. Ross, M. G. et al. Genome Biol. 14, R51 (2013).

  24. Gao, F., Luo, H. & Zhang, C. T. Nucleic Acids Res. 41, D90–D93 (2013).

  25. Schirmer, M., D’Amore, R., Ijaz, U. Z., Hall, N. & Quince, C. BMC Bioinformatics 17, 125 (2016).

  26. Letunic, I. & Bork, P. Nucleic Acids Res. 44, W242–W245 (2016).

  27. Markowitz, V. M. et al. Nucleic Acids Res. 40, D115–D122 (2012).

  28. Kang, D. D., Froula, J., Egan, R. & Wang, Z. PeerJ 3, e1165 (2015).

  29. Langmead, B. & Salzberg, S. L. Nat. Methods 9, 357–359 (2012).

  30. Li, H. et al. Bioinformatics 25, 2078–2079 (2009).


Acknowledgements

This research was supported by grant R01GM123056 (H.L.) from the National Institutes of Health.

Author information

Affiliations

  1. Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA

    • Yuan Gao
    • Hongzhe Li


Contributions

H.L. and Y.G. conceived and designed the project. Y.G. implemented the method. Both authors analyzed the data and wrote and edited the manuscript.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to Hongzhe Li.

Integrated supplementary information

  1. Supplementary Figure 1 Peak-to-trough ratio and pipeline of DEMIC.

    (a) For most bacteria, DNA replication starts from a fixed origin in a circular genome. (b) Replication forks proceed bidirectionally, and more than two replication forks may be present in fast-growing bacteria. A genome region has a higher DNA copy number the nearer it is to the replication origin, and a lower copy number the farther it is from the origin. (c) When a complete genome sequence is available, an ordinary linear regression model can be fitted between genome locations and log-transformed sequencing coverages, and the growth dynamics of a bacterial population can be measured by the coverage ratio between the replication origin (peak) and the terminus (trough). This peak-to-trough ratio (PTR) cannot be calculated directly without a complete genome.
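Panel (c) can be illustrated with a short Python sketch of the complete-genome case. It assumes the origin position is known, expresses each window's position as its circular distance from the origin (0 at the origin, 0.5 at the opposite side, taken here as the terminus), fits log coverage against that distance by ordinary least squares, and reports the fitted origin-to-terminus coverage ratio. The function names and the toy genome are hypothetical, and real data also require handling of noise and low-coverage windows, which this sketch omits.

```python
import numpy as np


def distance_from_origin(positions, origin, genome_len):
    """Circular distance of each position from the origin, as a fraction of
    genome length: 0 at the origin, 0.5 at the opposite side of the genome."""
    d = np.abs(np.asarray(positions, dtype=float) - origin) % genome_len
    return np.minimum(d, genome_len - d) / genome_len


def estimate_ptr(positions, coverage, origin, genome_len):
    """Fit log(coverage) = a + b * d by ordinary least squares and return the
    fitted coverage ratio between d = 0 (peak) and d = 0.5 (trough)."""
    d = distance_from_origin(positions, origin, genome_len)
    slope, _intercept = np.polyfit(d, np.log(coverage), 1)
    # exp(a) / exp(a + 0.5 * b) = exp(-0.5 * b)
    return float(np.exp(-0.5 * slope))


# Hypothetical noise-free toy genome: 5 Mb, origin at 1.2 Mb, true PTR of 2.
genome_len, origin, true_ptr = 5_000_000, 1_200_000, 2.0
positions = np.arange(500, genome_len, 1_000)  # centres of 1-kb windows
d = distance_from_origin(positions, origin, genome_len)
coverage = 100.0 * true_ptr ** (-d / 0.5)      # 100x at the origin, 50x at the terminus
print(round(estimate_ptr(positions, coverage, origin, genome_len), 6))  # 2.0
```

Replacing `coverage` with Poisson-sampled read counts (for example `np.random.default_rng(1).poisson(coverage)`) gives an estimate close to, but not exactly, 2.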

  2. Supplementary Figure 2 The average read coverages of four species in 50 samples of the synthetic dataset.

    Each sample is a mixture of two to four real sequencing datasets from different species: Lactobacillus gasseri, Enterococcus faecalis, Citrobacter rodentium and Escherichia coli.

  3. Supplementary Figure 3 Number of growth rate (PTR) estimates by three computational methods for the three species represented by contig clusters generated by the binning algorithm from the synthetic dataset.

    Whereas PTRC and DEMIC successfully estimated all 122 growth rates, iRep output only 59 growth rates by default; the remaining estimates were categorized as ‘unfiltered’.

  4. Supplementary Figure 4 Scatterplots of the PTR estimates from DEMIC (red) and iRep (blue) against PTR values, with correlations (Pearson’s r), in 36 sequencing datasets of Lactobacillus gasseri.

    The shaded areas indicate 99% confidence intervals.

  5. Supplementary Figure 5 Evaluation of effects of sample sizes on the performances of DEMIC (red) and iRep (blue) based on L. gasseri, E. faecalis and C. rodentium (n = 10 for each).

    Box plots of correlations between the estimated PTRs and true PTRs (Pearson’s r values) of all evaluations, indicating the median (center line), first and third quartiles (box edges), and 1.5 times the interquartile range (whiskers) of the correlations.

  6. Supplementary Figure 6 Phylogenetic tree generated by iTOL for the 45 species from five phyla used in the simulated data, which were randomly selected from DoriC records.

    According to NCBI Taxonomy, six species have synonymous names at the genus or species level: Desulfotomaculum nigrificans (Desulfotomaculum carboxydivorans), Cutibacterium acnes (Propionibacterium acnes), Sphaerochaeta coccoides (Spirochaeta coccoides), Pseudopropionibacterium propionicum (Propionibacterium propionicum), Acidipropionibacterium acidipropionici (Propionibacterium acidipropionici) and Sediminispirochaeta smaragdinae (Spirochaeta smaragdinae).

  7. Supplementary Figure 7

    A total of 1,336 PTRs randomly assigned to 45 species from 15 genera of five phyla and 50 samples.

  8. Supplementary Figure 8

    A total of 1,336 average coverages randomly assigned to 45 species from 15 genera of five phyla and 50 samples.

  9. Supplementary Figure 9 True versus estimated PTRs from DEMIC and iRep for 41 species represented by contig clusters.

    Symbol shape indicates whether each species was filtered in the iRep estimates.

  10. Supplementary Figure 10 An example of contig filtering in DEMIC.

    (a-b) iRep failed to accurately estimate the growth rates of two closely related species, P. terrae and P. polymyxa, which the binning algorithm mixed into the same contig cluster. (c) In this contig cluster, P. polymyxa is the dominant species, but with a high proportion of contamination from P. terrae. DEMIC effectively filtered out contigs from P. terrae and kept most of the contigs from P. polymyxa by iteratively updating the contig cluster based on the PC1 distribution of all remaining contigs. (d) DEMIC estimates were highly correlated with the PTRs of P. polymyxa (r = 0.994, n = 28).
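The iterative filtering in panel (c) can be sketched as a loop that recomputes PC1 on the remaining contigs and trims outliers until the cluster stops changing. The caption does not give DEMIC's exact criterion, so the rule below (drop contigs more than `k` robust standard deviations, 1.4826 × MAD, from the median PC1 score) is a hypothetical stand-in, as are the function names and the toy two-species cluster.

```python
import numpy as np


def pc1_scores(log_cov):
    """PC1 score of each contig after centring each sample (column)."""
    centered = log_cov - log_cov.mean(axis=0, keepdims=True)
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


def filter_contigs(log_cov, k=3.0, max_iter=20):
    """Hypothetical iterative filter: return indices of the contigs kept."""
    keep = np.arange(log_cov.shape[0])
    for _ in range(max_iter):
        scores = pc1_scores(log_cov[keep])
        dev = np.abs(scores - np.median(scores))
        robust_sd = 1.4826 * np.median(dev)  # MAD scaled to match an SD
        if robust_sd == 0:
            break
        inliers = dev <= k * robust_sd
        if inliers.all():
            break                   # cluster has stabilized
        keep = keep[inliers]        # drop outliers, then recompute PC1
    return keep


# Hypothetical contaminated cluster: 150 contigs from a dominant species and
# 40 from a contaminant with a different abundance profile across 4 samples.
x = np.linspace(0.0, 0.5, 150)[:, None]
z = np.linspace(0.0, 0.5, 40)[:, None]
dominant = np.log([20.0, 55.0, 8.0, 120.0]) - 2 * np.log([1.3, 1.8, 2.5, 1.1]) * x
contaminant = np.log([100.0, 5.0, 60.0, 10.0]) - 2 * np.log([2.0, 1.2, 1.5, 1.9]) * z
log_cov = np.vstack([dominant, contaminant])
kept = filter_contigs(log_cov)
print(len(kept), kept.max())  # rows 0-149 belong to the dominant species
```

In this toy, the between-species abundance difference dominates the first PC1, so the minority contaminant falls far from the median and is dropped; on the recomputed PC1 the remaining contigs spread only by distance from the origin and all pass.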

  11. Supplementary Figure 11 Applicable contig clusters and computational resources of DEMIC in two real metagenomic datasets using MetaBAT as the binning algorithm.

    iRep was applied to the same SAM records and contig clusters, whereas PTRC was applied using a complete genome library that is independent of the contig clusters generated by MetaBAT.

  12. Supplementary Figure 12 Applicable contig clusters and computational resources of DEMIC in two real metagenomic datasets using MaxBin as the binning algorithm.

    iRep was applied to the same SAM records and contig clusters, whereas PTRC was applied using a complete genome library that is independent of the contig clusters generated by MaxBin.

  13. Supplementary Figure 13 Growth dynamics (PTR) estimated by DEMIC for the RedSea datasets.

    (a) Overview of the estimated growth rates (ePTRs) for contig clusters from seawater samples collected at different depths at eight RedSea stations. (b-c) An example of depth-related variation in growth rates estimated by DEMIC, for contig cluster 36 generated by MetaBAT, which has 60% completeness and an average identity of 92% with Marinobacter adhaerens. The estimated growth rates were significantly lower at a depth of 500 m than at 10 m and 100 m (one-sided Mann-Whitney U test; n = 5, 5, 3 and 6 for 10 m, 100 m, 200 m and 500 m, respectively). The box plots indicate the median (center line), first and third quartiles (box edges), and 1.5 times the interquartile range (whiskers).

  14. Supplementary Figure 14 Growth dynamics (PTR) estimated by DEMIC for the PLEASE datasets.

    (a) Overview of a subset of contig clusters with estimated bacterial growth rates (ePTRs) in healthy and Crohn’s disease samples at baseline. (b) Completeness and contamination of contig clusters in the datasets. For each contig cluster, the size of the pie represents the number of samples whose growth rates could be estimated by DEMIC, and its composition represents the proportions of disease/control status and treatment duration among those samples. (c) Some species represented by contig clusters showed different growth rates between healthy and disease subjects, but the difference disappeared partially or completely after anti-TNF or enteral diet treatment (one-sided Mann-Whitney U test, p < 0.05 after FDR correction). ePTR: estimated PTR from DEMIC. For metabat2.239, n = 3, 9, 8, 3, 2; for metabat2.259, n = 4, 13, 12, 13, 7; for metabat2.55, n = 4, 22, 14, 17, 15 in the control, Crohn’s disease baseline, week 1, week 4 and week 8 groups, respectively. The box plots indicate the median (center line), first and third quartiles (box edges), and 1.5 times the interquartile range (whiskers).
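The group comparisons in panel (c) combine a one-sided Mann-Whitney U test per contig cluster with FDR correction. The sketch below assumes the Benjamini-Hochberg procedure for the FDR step (the caption does not name the method) and uses `scipy.stats.mannwhitneyu`; the function names and the ePTR values are hypothetical toy inputs, not data from the paper.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Step-up: each adjusted value is the minimum over all larger p-values.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def compare_clusters(comparisons, alternative="greater"):
    """comparisons: {cluster_id: (eptrs_group_a, eptrs_group_b)}.

    Runs a one-sided Mann-Whitney U test per cluster (alternative="greater"
    tests whether group A tends to have higher ePTRs than group B) and
    returns {cluster_id: (p_value, bh_adjusted_p_value)}.
    """
    names = list(comparisons)
    pvals = [mannwhitneyu(a, b, alternative=alternative).pvalue
             for a, b in comparisons.values()]
    qvals = benjamini_hochberg(pvals)
    return {name: (float(p), float(q)) for name, p, q in zip(names, pvals, qvals)}


# Hypothetical ePTRs (not from the paper): group A vs. group B per cluster.
comparisons = {
    "cluster_A": ([1.9, 2.1, 2.0, 2.3], [1.2, 1.4, 1.3, 1.1, 1.5]),
    "cluster_B": ([1.5, 1.6, 1.4], [1.5, 1.7, 1.3, 1.6]),
}
for name, (p, q) in compare_clusters(comparisons).items():
    print(f"{name}: p = {p:.4f}, BH-adjusted p = {q:.4f}")
```

Implementing the correction by hand keeps the step-up logic visible; recent SciPy releases also provide a built-in FDR helper that could replace `benjamini_hochberg`.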

Supplementary information

  1. Supplementary Figures and Tables

    Supplementary Figures 1–14 and Supplementary Tables 1–3

  2. Reporting Summary

About this article


DOI

https://doi.org/10.1038/s41592-018-0182-0