Measurement of bacterial replication rates in microbial communities

Journal name: Nature Biotechnology
Volume: 34
Pages: 1256–1263
DOI: 10.1038/nbt.3704

Abstract

Culture-independent microbiome studies have increased our understanding of the complexity and metabolic potential of microbial communities. However, to understand the contribution of individual microbiome members to community functions, it is important to determine which bacteria are actively replicating. We developed an algorithm, iRep, that uses draft-quality genome sequences and single time-point metagenome sequencing to infer microbial population replication rates. The algorithm calculates an index of replication (iRep) based on the sequencing coverage trend that results from bi-directional genome replication from a single origin of replication. We apply this method to show that microbial replication rates increase after antibiotic administration in human infants. We also show that uncultivated, groundwater-associated, Candidate Phyla Radiation bacteria only rarely replicate quickly in subsurface communities undergoing substantial changes in geochemistry. Our method can be applied to any genome-resolved microbiome study to track organism responses to varying conditions, identify actively growing populations and measure replication rates for use in modeling studies.

Figures

  1. Figure 1: iRep determines replication rates for bacteria using genome-resolved metagenomics.

    (a) Populations of bacteria undergoing rapid cell division differ from slowly growing populations in that the individual cells of a growing population are more actively in the process of replicating their genomes (purple circles). (b) Differences in genome copy number across a population of replicating cells can be determined based on sequencing read coverage over complete genome sequences. The ratio between the coverage at the origin (“peak”) and terminus (“trough”) of replication (PTR) relates to the replication rate of the population. The origin and terminus can be determined based on cumulative GC skew. (c,d) If no complete genome sequence is available, it is possible to calculate the replication rate based on the distribution of coverage values across a draft-quality genome using the iRep method. Coverage is first calculated across overlapping segments of genome fragments. Growing populations will have a wider distribution of coverage values compared with stable populations (histograms). These values are ordered from lowest to highest, and linear regression is used to evaluate the coverage distribution across the genome in order to determine the coverage values associated with the origin and terminus of replication. iRep is calculated as the ratio of these values. (e) Genome-resolved metagenomics involves DNA extraction from a microbiome sample followed by DNA sequencing, assembly, and genome binning. Binning is the grouping together of assembled genome fragments that originated from the same genome. This can be done based on shared characteristics of each fragment, such as sequence composition, taxonomic affiliation, or abundance.
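
The calculation described in panels (c,d) can be sketched in a few lines. This is a minimal illustration, not the published iRep implementation: the function names and the `trim_fraction` parameter are hypothetical, the fit is applied directly to the sorted coverage values, and the released software may differ (for example in how windows are filtered or transformed before regression).

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def irep_sketch(window_coverage, trim_fraction=0.05):
    """Estimate an iRep-like ratio from per-window coverage values.

    trim_fraction is a hypothetical parameter: the share of windows
    dropped from each end of the sorted list before fitting.
    """
    ordered = sorted(window_coverage)
    n = len(ordered)
    k = int(n * trim_fraction)
    # Keep each retained value together with its rank in the full sorted list.
    kept = list(enumerate(ordered))[k:n - k]
    xs = [rank for rank, _ in kept]
    ys = [cov for _, cov in kept]
    slope, intercept = linear_fit(xs, ys)
    # Evaluate the fitted line at both ends of the full ranking.
    terminus_cov = intercept                   # fitted coverage at rank 0
    origin_cov = intercept + slope * (n - 1)   # fitted coverage at the last rank
    return origin_cov / terminus_cov
```

Because the windows are sorted before fitting, the order of the genome fragments does not matter, which is what allows the approach to work on draft-quality genomes.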

  2. Figure 2: iRep is an accurate measure of in situ replication rates.

    (a) iRep, bPTR, and kPTR measurements made for cultured Lactobacillus gasseri (ref. 8) were compared (Pearson's r value); strong agreement was seen between all methods. (b) Colony forming unit (CFU) counts were available for a subset of these samples (ref. 8) and were used to calculate growth rates. All methods were highly correlated with CFU-derived rates after first accounting for the delay between the start of genome replication and an observable change in population size (as noted previously; ref. 8). Replication rates from CFU data were adjusted by variable time offsets before calculating correlations with sequencing-based rates (best correlation shown; d = time adjustment). CFU data are plotted with a −90 min offset. Error bars, mean ± s.d. (c) Using the L. gasseri data, minimum coverage requirements were determined for each method by first measuring the replication rate at 25× coverage, and then comparing it to values calculated after simulating lower coverage. This shows that ≥5× coverage is required. (d) The minimum required genome fraction for iRep was determined by conducting 100 random fragmentations and subsets of the L. gasseri genome. Sequencing was subset to 5× coverage before calculating iRep to show the combined effect of low coverage and missing genomic information. With ≥75% of a genome sequence, most iRep measurements are accurate to within ±0.15. (e) iRep and bPTR measurements were calculated using five genome sequences assembled from premature infant metagenomes, showing that these methods are in agreement in the context of microbiome sequencing data.
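
The time adjustment in panel (b) can be illustrated with a small search over candidate offsets. The sketch below assumes that both rate series were sampled at the same regular time points, so that an offset is a whole number of samples; the function names are hypothetical and the caption does not specify how the adjustment was implemented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_offset(seq_rates, cfu_rates, max_shift):
    """Find the lag (in samples) at which CFU-derived rates best track
    sequencing-based rates. Returns (shift, r)."""
    best = None
    for shift in range(max_shift + 1):
        # Pair sequencing-based rate i with the CFU-derived rate at i + shift.
        a = seq_rates[:len(seq_rates) - shift]
        b = cfu_rates[shift:]
        if len(a) < 3:
            break  # too few pairs for a meaningful correlation
        r = pearson_r(a, b)
        if best is None or r > best[1]:
            best = (shift, r)
    return best
```

A positive shift means that changes in the CFU-derived rate appear later than the corresponding changes in the sequencing-based rate, consistent with genome replication preceding an observable change in population size.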

  3. Figure 3: iRep and bPTR calculations agree for a novel Deltaproteobacterium sampled from groundwater.

    (a) bPTR was calculated after determining the origin and terminus of replication based on regression to coverage calculated across the genome. Coverage was calculated for 10-Kbp windows sampled every 100 bp (Online Methods). The ratio between the coverage at the origin and terminus was determined after applying a median filter. The cumulative GC skew pattern confirms the genome assembly and the locations of the origin and terminus of replication. (b) iRep was determined by first calculating coverage over 5-Kbp windows sampled every 100 bp, and then sorting the resulting values. High- and low-coverage windows were removed, and then the slope of the remaining (trimmed) values was determined and used to evaluate the coverage at the origin and terminus of replication; iRep was calculated as the ratio of these values. (r² was calculated between the trimmed data and the linear regression.)
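
The windowing and smoothing steps named in this caption can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: `depth` is assumed to be a list of per-base read depths for one sequence, the function names are hypothetical, and the filter width is arbitrary because the caption does not give one.

```python
from itertools import accumulate
from statistics import median


def window_coverage(depth, window=5000, step=100):
    """Mean coverage over overlapping windows, sampled every `step` bases."""
    # Prefix sums make every window mean an O(1) lookup.
    prefix = [0] + list(accumulate(depth))
    return [
        (prefix[start + window] - prefix[start]) / window
        for start in range(0, len(depth) - window + 1, step)
    ]


def median_filter(values, width=3):
    """Running median; the neighbourhood is truncated at both ends."""
    half = width // 2
    return [
        median(values[max(0, i - half):i + half + 1])
        for i in range(len(values))
    ]
```

For bPTR the windows would use `window=10000`, and for iRep `window=5000`, both with `step=100` as stated in the caption.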

  4. Figure 4: Replication rates were determined for CPR and human microbiome-associated organisms.

    (a,b) iRep values were measured and compared across studies (a; MW = Mann-Whitney, n = number of measured replication rates), and compared based on taxonomic affiliation (b).
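
For readers unfamiliar with the Mann-Whitney comparison used in panel (a), the statistic can be computed directly from two sets of iRep values. The sketch below is illustrative only: the function names are hypothetical, and the P value uses a plain normal approximation with no tie or continuity correction. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs where a > b count 1, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def normal_approx_p(u, n_a, n_b):
    """Two-sided P value from the normal approximation to U
    (assumption: no tie or continuity correction is applied)."""
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))
```

The normal approximation is only reasonable for moderately large samples; the values of n reported in the figure determine whether it would be appropriate.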

  5. Figure 5: Elevated replication rates are associated with antibiotic administration and were detected before onset of necrotizing enterocolitis (NEC) in premature infants.

    (a,b) iRep distributions were compared between samples collected during or within 5 d after antibiotic administration and samples from other time points (a), and between samples collected from NEC and control infants (b). (c,d) Comparison of iRep values measured for different species (c) and genera (d) sampled from NEC and control infants (shown are taxa with ≥5 observations from either group). (e) iRep for the fastest growing organism observed for each control infant, and for the fastest growing organism from each day of life (DOL) sampled for each NEC infant, reported relative to NEC diagnosis. High replication rates for members of the genus Clostridium were detected in infants surveyed before NEC diagnosis.

  6. Figure 6: Absolute abundance (bars, left axis) and iRep (scatter plot, right axis) values for bacterial species associated with two premature infants.

    The 5 d following antibiotic administration are indicated using a color gradient. (a) Exponential growth was determined by regression to K. oxytoca absolute abundance values (black dashed line). (b) Infant 2 was diagnosed with two cases of necrotizing enterocolitis (NEC; red dashed lines) during the study period.
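
The regression in panel (a) can be illustrated with a log-linear least-squares fit, which is one common way to test for exponential growth. The caption does not state the fitting procedure, so this choice and the function names are assumptions.

```python
import math


def fit_exponential_growth(days, abundances):
    """Fit abundance = n0 * exp(rate * day) by least squares on log abundance.

    Returns (rate, n0), with rate in units of 1/day.
    """
    logs = [math.log(a) for a in abundances]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    stt = sum((t - mean_t) ** 2 for t in days)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    rate = sty / stt
    return rate, math.exp(mean_y - rate * mean_t)


def doubling_time(rate):
    """Doubling time implied by an exponential growth rate."""
    return math.log(2) / rate
```

A population that doubles every day yields a fitted rate of ln 2 per day, so `doubling_time` returns approximately 1.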

  7. Supplementary Fig. 1: Schematic showing steps involved in a genome-resolved metagenomics study that includes iRep analysis.

    Microbiome sample collection and DNA extraction methods should be determined on a per-project basis, and metagenome sequencing can be conducted on the Illumina, PacBio, or another sequencing platform. Sequencing reads are trimmed based on quality scores (e.g., using Sickle; ref. 18) and filtered for contamination (e.g., removal of human genome sequences). High-quality reads are then assembled (e.g., using IDBA_UD; ref. 19), and the resulting scaffolds are binned either manually (e.g., based on GC content, taxonomic affiliation, and coverage, and/or using a clustering algorithm such as ESOM; refs. 20,29,30) or using an automated binning program (e.g., MaxBin, ref. 21; CONCOCT, ref. 22; or ABAWACA, ref. 15). Genome bins can then be assessed for completion and contamination based on an inventory of expected single copy genes (SCGs), either by identifying these genes from genome annotations (see refs. 15,29,55) or by using software such as CheckM (ref. 23). High-quality genomes are then compared with one another and grouped into clusters based on average nucleotide identity (ANI; e.g., based on sharing 98% ANI determined using Mash; ref. 54). A representative of each cluster should be included in a genome database that will be used for iRep analysis, along with genomes from other projects that may be appropriate for the analysis. Reads from each metagenome are then mapped to the genome database (e.g., using Bowtie2; ref. 47), and iRep is calculated from the read mapping data (see Online Methods).
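
The genome clustering step can be sketched as single-linkage grouping at a 98% ANI threshold. The caption does not say which linkage rule was used, so single linkage is an assumption here, and the function name and input layout (a dictionary of pairwise ANI values, however they were obtained) are hypothetical.

```python
def cluster_by_ani(genomes, pairwise_ani, threshold=98.0):
    """Group genomes so that any pair with ANI >= threshold shares a cluster.

    pairwise_ani maps (genome_a, genome_b) tuples to percent ANI.
    Uses union-find, i.e. single-linkage clustering (an assumption).
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in pairwise_ani.items():
        if ani >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())
```

One representative per cluster (for example, the most complete genome) would then be added to the genome database used for read mapping.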

  8. Supplementary Fig. 2: Evaluation of iRep method parameters.

    (a) Gamma distribution used to simulate genome fragmentation for genome completeness analyses. The frequencies of genome fragment sizes from all genomes analyzed in this study are compared with genome fragment sizes simulated using a gamma distribution with parameters: alpha = 0.1, beta = 21,000, min. = 5,000, max. = 200,000. These parameters were first estimated by fitting to the genome data, and then manually adjusted. Similarity between the two distributions shows that this gamma distribution can be used to approximate the level of genome fragmentation expected for draft-quality genome sequences. (b) iRep was calculated from random genome fragmentation simulations in order to survey a range of fragmentation levels (Supplementary Table 1). The analysis was conducted for an L. gasseri sample from the Korem et al. (ref. 8) study in which iRep was determined to be 2.01 using the complete genome with 25× sequencing coverage. This known iRep value was then compared with iRep values determined from each genome fragmentation simulation after subsampling to 75% of the genome and using only 5× sequencing coverage. This enabled analysis of the influence of fragmentation on iRep calculations at the completeness and coverage limits of the method. Results show that 91.8% of iRep values are within the expected range of ±0.15 when genomes have fewer than 175 fragments/Mbp of genome sequence. (c) Four L. gasseri samples from the Korem et al. (ref. 8) study that represent iRep values between 1.50 and 2.01 were selected in order to test different coverage sliding window calculation methods (see Online Methods for a description of each method) and window sizes. For each sample, 100 random genome fragmentations and subsets were conducted in order to assess each method at various levels of genome completion. The results show that the “iRep” and “median iRep” methods using 5-Kbp windows exhibited the least variation. (d) Because the iRep method involves randomly combining coverage data from different genome fragments prior to calculating coverage sliding windows, some sliding windows will include coverage values from different locations on the complete genome sequence. In order to evaluate the variation introduced by the (random) order in which scaffolds are combined, iRep calculations were conducted for ten random orderings of 100 random genome fragmentations conducted using the sample set described in (c). Results show minimal variation in iRep values, as described by the difference between the lowest and highest values determined from each of the ten orderings (“iRep range”). Because of this, we chose not to implement the “median iRep” strategy. (e) Using the sample set described in (c), the iRep method was implemented with 5-Kbp windows and different window slide values in order to test whether the slide value would change the results. Because both 10 and 100 bp window slides produced similar results, we implemented the iRep method using a 100 bp window slide. (f) iRep is not as strongly correlated with bPTR without the GC sequencing bias correction for five genome sequences assembled from premature infant metagenomes (Supplementary Table 4; compare with GC corrected data in Fig. 2e).
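
The fragmentation simulation in panel (a) can be sketched with the standard library. Several choices below are assumptions: beta is treated as a scale parameter (which is how Python's `random.gammavariate` defines it), the minimum and maximum are enforced by rejecting out-of-range draws rather than by clipping, and the function name and `seed` argument are hypothetical.

```python
import random


def simulate_fragment_lengths(genome_length, alpha=0.1, beta=21000,
                              min_len=5000, max_len=200000, seed=0):
    """Split a genome of genome_length bp into simulated fragment lengths."""
    rng = random.Random(seed)
    lengths = []
    remaining = genome_length
    while remaining >= min_len:
        size = int(rng.gammavariate(alpha, beta))
        if size < min_len or size > max_len:
            continue  # rejection sampling: redraw out-of-range sizes
        size = min(size, remaining)  # the last fragment cannot overrun the genome
        lengths.append(size)
        remaining -= size
    # Any leftover shorter than min_len is dropped in this sketch.
    return lengths
```

Dividing `len(lengths)` by the genome size in Mbp gives the fragments/Mbp figure used in panel (b) to describe the level of fragmentation.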

  9. Supplementary Fig. 3: Coverage, GC skew patterns, and bPTR measurements for reconstructed genomes oriented and ordered based on complete reference genome sequences.

    (a-e) Read mapping was conducted using sequences from the sample used for genome recovery. bPTR was calculated after determining the origin and terminus of replication based on cumulative GC skew. Coverage was calculated for 10-Kbp windows sampled every 100 bp (extremely low and high coverage windows were filtered out; see Online Methods). bPTR was calculated as the ratio between the coverage at the origin and terminus after applying a median filter. Cumulative GC skew and coverage patterns confirm the ordering of genome fragments.
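
Locating the origin and terminus from cumulative GC skew can be sketched as follows. The sketch uses the usual convention that cumulative skew reaches its minimum near the origin and its maximum near the terminus; the function names and the non-overlapping 1-Kbp window are assumptions, not parameters taken from the paper.

```python
def cumulative_gc_skew(sequence, window=1000):
    """Cumulative (G - C) / (G + C) over consecutive non-overlapping windows.

    Returns (window_start_positions, cumulative_skew_values).
    """
    positions, cumulative = [], []
    total = 0.0
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if g + c else 0.0
        total += skew
        positions.append(start)
        cumulative.append(total)
    return positions, cumulative


def origin_and_terminus(sequence, window=1000):
    """Window starts where cumulative skew is lowest (origin) and highest (terminus)."""
    positions, cumulative = cumulative_gc_skew(sequence, window)
    origin = positions[cumulative.index(min(cumulative))]
    terminus = positions[cumulative.index(max(cumulative))]
    return origin, terminus
```

The coverage at these two positions, taken from median-filtered window coverage, would then give the ratio reported as bPTR.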

  10. Supplementary Fig. 4: Reference genomes are not representative of organisms surveyed in the premature infant microbiome study.

    Reads were mapped to both reconstructed genomes and closely related reference genomes (Supplementary Table 4), and the percent of each genome covered by sequencing reads is reported. Average nucleotide identity (ANI) is reported between each reconstructed genome and the paired reference genome. The large fractions of reference genomes not represented by metagenome sequencing show that extensive genomic variation is present between surveyed and reference genomes, despite high ANI values in some cases.

  11. Supplementary Fig. 5: Replication rates determined by iRep and kPTR are not in strong agreement for the premature infant study.

    iRep values were determined based on reconstructed genomes, and kPTR values based on complete reference genomes (r = Pearson’s r value; Supplementary Tables 5 and 8).

  12. Supplementary Fig. 6: Coverage, cumulative GC skew, and bPTR measurements for complete reference genomes with similarity to genomes from the adult human microbiome sample.

    (a-e) Reads from the adult human microbiome were mapped to complete reference genome sequences. Coverage was calculated for 10 Kbp windows every 100 bp (extremely low and high coverage windows were filtered out; see Online Methods). The origin and terminus of replication were determined based on coverage. bPTR was calculated as the ratio between the coverage at the origin and terminus after applying a median filter. Cumulative GC skew and coverage patterns suggest the presence of genomic variation or assembly errors for some genomes (b-c, e).

  13. Supplementary Fig. 7: Absolute abundance (bars, left axis) and iRep (scatter plot, right axis) for bacterial species associated with premature infants.

    The five days following antibiotic administration are indicated using a color gradient (DOL = day of life). Half of the infants in the study developed necrotizing enterocolitis (NEC; dotted red lines) during the study period.

Accession codes

Primary accessions

Sequence Read Archive

References

  1. Bremer, H. & Churchward, G. An examination of the Cooper-Helmstetter theory of DNA replication in bacteria and its underlying assumptions. J. Theor. Biol. 69, 645–654 (1977).
  2. Skovgaard, O., Bak, M., Løbner-Olesen, A. & Tommerup, N. Genome-wide detection of chromosomal rearrangements, indels, and mutations in circular chromosomes by short read sequencing. Genome Res. 21, 1388–1393 (2011).
  3. Prescott, D.M. & Kuempel, P.L. Bidirectional replication of the chromosome in Escherichia coli. Proc. Natl. Acad. Sci. USA 69, 2842–2845 (1972).
  4. Wake, R.G. Visualization of reinitiated chromosomes in Bacillus subtilis. J. Mol. Biol. 68, 501–509 (1972).
  5. Sernova, N.V. & Gelfand, M.S. Identification of replication origins in prokaryotic genomes. Brief. Bioinform. 9, 376–391 (2008).
  6. Gao, F., Luo, H. & Zhang, C.-T. DoriC 5.0: an updated database of oriC regions in both bacterial and archaeal genomes. Nucleic Acids Res. 41, D90–D93 (2013).
  7. Anantharaman, K. et al. Analysis of five complete genome sequences for members of the class Peribacteria in the recently recognized Peregrinibacteria bacterial phylum. PeerJ 4, e1607 (2016).
  8. Korem, T. et al. Growth dynamics of gut microbiota in health and disease inferred from single metagenomic samples. Science 349, 1101–1106 (2015).
  9. Cooper, S. & Helmstetter, C.E. Chromosome replication and the division cycle of Escherichia coli B/r. J. Mol. Biol. 31, 519–540 (1968).
  10. Tyson, G.W. et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37–43 (2004).
  11. Baker, B.J. et al. Enigmatic, ultrasmall, uncultivated Archaea. Proc. Natl. Acad. Sci. USA 107, 8806–8811 (2010).
  12. Sharon, I. et al. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res. 23, 111–120 (2013).
  13. Iverson, V. et al. Untangling genomes from metagenomes: revealing an uncultured class of marine Euryarchaeota. Science 335, 587–590 (2012).
  14. Nielsen, H.B. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat. Biotechnol. 32, 822–828 (2014).
  15. Brown, C.T. et al. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature 523, 208–211 (2015).
  16. Castelle, C.J. et al. Genomic expansion of domain archaea highlights roles for organisms from new phyla in anaerobic carbon cycling. Curr. Biol. 25, 690–701 (2015).
  17. Seitz, K.W., Lazar, C.S., Hinrichs, K.-U., Teske, A.P. & Baker, B.J. Genomic reconstruction of a novel, deeply branched sediment archaeal phylum with pathways for acetogenesis and sulfur reduction. ISME J. 10, 1696–1705 (2016).
  18. Joshi, N. Sickle. github.com https://github.com/najoshi/sickle.
  19. Peng, Y., Leung, H.C.M., Yiu, S.M. & Chin, F.Y.L. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012).
  20. Dick, G.J. et al. Community-wide analysis of microbial genome sequence signatures. Genome Biol. 10, R85 (2009).
  21. Wu, Y.-W., Simmons, B.A. & Singer, S.W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2016).
  22. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
  23. Parks, D.H., Imelfort, M., Skennerton, C.T., Hugenholtz, P. & Tyson, G.W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
  24. Wrighton, K.C. et al. Fermentation, hydrogen, and sulfur metabolism in multiple uncultivated bacterial phyla. Science 337, 1661–1665 (2012).
  25. Di Rienzi, S.C. et al. The human gut and groundwater harbor non-photosynthetic bacteria belonging to a new candidate phylum sibling to Cyanobacteria. eLife 2, e01102 (2013).
  26. Castelle, C.J. et al. Extraordinary phylogenetic diversity and metabolic versatility in aquifer sediment. Nat. Commun. 4, 2120 (2013).
  27. Eloe-Fadrosh, E.A. et al. Global metagenomic survey reveals a new bacterial candidate phylum in geothermal springs. Nat. Commun. 7, 10476 (2016).
  28. Lobry, J.R. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol. Biol. Evol. 13, 660–665 (1996).
  29. Raveh-Sadka, T. et al. Gut bacteria are rarely shared by co-hospitalized premature infants, regardless of necrotizing enterocolitis development. eLife 4, e05477 (2015).
  30. Sharon, I. et al. Accurate, multi-kb reads resolve complex populations and detect rare microorganisms. Genome Res. 25, 534–543 (2015).
  31. Paczia, N. et al. Extensive exometabolome analysis reveals extended overflow metabolism in various microorganisms. Microb. Cell Fact. 11, 122 (2012).
  32. Kopf, S.H. et al. Trace incorporation of heavy water reveals slow and heterogeneous pathogen growth rates in cystic fibrosis sputum. Proc. Natl. Acad. Sci. USA 113, E110–E116 (2016).
  33. Luef, B. et al. Diverse uncultivated ultra-small bacterial cells in groundwater. Nat. Commun. 6, 6372 (2015).
  34. Hug, L.A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
  35. Podar, M. et al. Targeted access to the genomes of low-abundance organisms in complex microbial communities. Appl. Environ. Microbiol. 73, 3205–3214 (2007).
  36. Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437 (2013).
  37. Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013).
  38. Kantor, R.S. et al. Small genomes and sparse metabolisms of sediment-associated bacteria from four candidate phyla. MBio 4, e00708–e00713 (2013).
  39. Nelson, W.C. & Stegen, J.C. The reduced genomes of Parcubacteria (OD1) contain signatures of a symbiotic lifestyle. Front. Microbiol. 6, 713 (2015).
  40. Burstein, D. et al. Major bacterial lineages are essentially devoid of CRISPR-Cas viral defence systems. Nat. Commun. 7, 10613 (2016).
  41. Gong, J., Qing, Y., Guo, X. & Warren, A. “Candidatus Sonnebornia yantaiensis”, a member of candidate division OD1, as intracellular bacteria of the ciliated protist Paramecium bursaria (Ciliophora, Oligohymenophorea). Syst. Appl. Microbiol. 37, 35–41 (2014).
  42. Soro, V. et al. Axenic culture of a candidate division TM7 bacterium from the human oral cavity and biofilm interactions with other oral bacteria. Appl. Environ. Microbiol. 80, 6480–6489 (2014).
  43. He, X. et al. Cultivation of a human-associated TM7 phylotype reveals a reduced genome and epibiotic parasitic lifestyle. Proc. Natl. Acad. Sci. USA 112, 244–249 (2015).
  44. Luo, F., Devine, C.E. & Edwards, E.A. Cultivating microbial dark matter in benzene-degrading methanogenic consortia. Environ. Microbiol. 18, 2923–2936 (2016).
  45. Vieira-Silva, S. & Rocha, E.P.C. The systemic imprint of growth and its uses in ecological (meta)genomics. PLoS Genet. 6, e1000808 (2010).
  46. Carini, P. et al. Relic DNA is abundant in soil and obscures estimates of soil microbial diversity. Preprint at bioRxiv http://dx.doi.org/10.1101/043372 (2016).
  47. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
  48. Newville, M., Stensitzki, T., Allen, D.B. & Ingargiola, A. LMFIT: non-linear least-square minimization and curve-fitting for Python (Zenodo, 2014).
  49. Grigoriev, A. Analyzing genomes with cumulative skew diagrams. Nucleic Acids Res. 26, 2286–2290 (1998).
  50. Ross, M.G. et al. Characterizing and measuring bias in sequence data. Genome Biol. 14, R51 (2013).
  51. Richter, M. & Rosselló-Móra, R. Shifting the genomic gold standard for the prokaryotic species definition. Proc. Natl. Acad. Sci. USA 106, 19126–19131 (2009).
  52. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
  53. Rissman, A.I. et al. Reordering contigs of draft genomes using the Mauve aligner. Bioinformatics 25, 2071–2073 (2009).
  54. Ondov, B.D. et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 17, 132 (2016).
  55. Raes, J., Korbel, J.O., Lercher, M.J., von Mering, C. & Bork, P. Prediction of effective genome size in metagenomic samples. Genome Biol. 8, R10 (2007).

Author information

Affiliations

  1. Department of Plant and Microbial Biology, University of California, Berkeley, California, USA.

    • Christopher T Brown &
    • Matthew R Olm
  2. Department of Earth and Planetary Science, University of California, Berkeley, California, USA.

    • Brian C Thomas &
    • Jillian F Banfield
  3. Department of Environmental Science, Policy, and Management, University of California, Berkeley, California, USA.

    • Jillian F Banfield
  4. Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.

    • Jillian F Banfield

Contributions

C.T.B. and J.F.B. developed the iRep and bPTR methods. M.R.O. ordered and oriented draft genome sequences for bPTR calculations and conducted kPTR analyses. C.T.B. conducted the iRep, bPTR, and kPTR comparisons, and determined the accuracy of the iRep method. J.F.B. binned the adult human metagenome and curated the Deltaproteobacterium genome, with input from C.T.B. C.T.B. implemented the iRep method. B.C.T. provided bioinformatics support. C.T.B. and J.F.B. drafted the manuscript. All authors contributed to iRep development, reviewed results, and approved the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: Schematic showing steps involved in a genome-resolved metagenomics study that includes iRep analysis. (136 KB)

  2. Supplementary Figure 2: Evaluation of iRep method parameters. (92 KB)

  3. Supplementary Figure 3: Coverage, GC skew patterns, and bPTR measurements for reconstructed genomes oriented and ordered based on complete reference genome sequences. (103 KB)

  4. Supplementary Figure 4: Reference genomes are not representative of organisms surveyed in the premature infant microbiome study. (13 KB)

  5. Supplementary Figure 5: Replication rates determined by iRep and kPTR are not in strong agreement for the premature infant study. (19 KB)

  6. Supplementary Figure 6: Coverage, cumulative GC skew, and bPTR measurements for complete reference genomes with similarity to genomes from the adult human microbiome sample. (108 KB)

  7. Supplementary Figure 7: Absolute abundance (bars, left axis) and iRep (scatter plot, right axis) for bacterial species associated with premature infants. (98 KB)

PDF files

  1. Supplementary Text and Figures (9.49 MB)

    Supplementary Figures 1–7

Excel files

  1. Supplementary Table 1 (7,154 KB)

    Analysis of the impact of genome completeness on iRep replication rate measurements.

  2. Supplementary Table 2 (49 KB)

    Comparison of iRep, bPTR, and kPTR measurements.

  3. Supplementary Table 3 (71 KB)

    iRep, bPTR, and kPTR measurements for minimum genome sequencing coverage analyses.

  4. Supplementary Table 4 (89 KB)

    Comparison of iRep and bPTR measurements for draft-quality genomes ordered and oriented based on complete genome sequences.

  5. Supplementary Table 5 (167 KB)

    iRep measurements for organisms associated with premature infant microbiomes.

  6. Supplementary Table 6 (39 KB)

    Single copy gene inventory for genomes reconstructed from an adult human gut metagenome.

  7. Supplementary Table 7 (49 KB)

    iRep measurements for organisms associated with an adult human microbiome.

  8. Supplementary Table 8 (47 KB)

    kPTR values determined from the premature infant metagenomes.

  9. Supplementary Table 9 (34 KB)

    kPTR values determined from the adult human metagenome.

  10. Supplementary Table 10 (204 KB)

    iRep measurements for Candidate Phyla Radiation (CPR) organisms.

Zip files

  1. Supplementary Code (91 KB)
