Culture-independent microbiome studies have increased our understanding of the complexity and metabolic potential of microbial communities. However, to understand the contribution of individual microbiome members to community functions, it is important to determine which bacteria are actively replicating. We developed an algorithm, iRep, that uses draft-quality genome sequences and single time-point metagenome sequencing to infer microbial population replication rates. The algorithm calculates an index of replication (iRep) based on the sequencing coverage trend that results from bi-directional genome replication from a single origin of replication. We apply this method to show that microbial replication rates increase after antibiotic administration in human infants. We also show that uncultivated, groundwater-associated, Candidate Phyla Radiation bacteria only rarely replicate quickly in subsurface communities undergoing substantial changes in geochemistry. Our method can be applied to any genome-resolved microbiome study to track organism responses to varying conditions, identify actively growing populations and measure replication rates for use in modeling studies.
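The central idea, that sorting coverage values recovers the origin-to-terminus trend even when the genome is split into unordered fragments, can be illustrated with a small simulation. This is a minimal sketch and not the published iRep implementation. It assumes an idealized coverage profile that falls log-linearly with distance from a single origin. It cuts that profile into fragments and shuffles them to mimic a draft genome, computes mean coverage in 5 Kbp windows every 100 bp, sorts the windows, and fits a line to their log2 values. The 5% trimming at each end and the choice to report the fitted ratio between the 90th and 10th percentiles are assumptions made for illustration. All function names are hypothetical.

```python
import numpy as np


def expected_coverage(genome_len: int, ptr: float, trough_cov: float) -> np.ndarray:
    """Idealized per-base coverage for bidirectional replication from an origin at 0.

    Coverage decays log-linearly with distance from the origin, so the
    origin/terminus (peak-to-trough) ratio equals `ptr`.
    """
    pos = np.arange(genome_len)
    # distance to the origin along the circular chromosome, scaled to [0, 1]
    dist = np.minimum(pos, genome_len - pos) / (genome_len / 2)
    return trough_cov * ptr ** (1.0 - dist)


def shuffle_fragments(coverage: np.ndarray, n_fragments: int,
                      rng: np.random.Generator) -> np.ndarray:
    """Cut the profile into pieces and concatenate them in random order,
    mimicking the unknown fragment order of a draft genome."""
    pieces = np.array_split(coverage, n_fragments)
    order = rng.permutation(n_fragments)
    return np.concatenate([pieces[i] for i in order])


def sliding_window_means(coverage: np.ndarray, window: int = 5_000,
                         slide: int = 100) -> np.ndarray:
    """Mean coverage in `window`-bp windows advanced every `slide` bp."""
    csum = np.concatenate([[0.0], np.cumsum(coverage)])
    starts = np.arange(0, len(coverage) - window + 1, slide)
    return (csum[starts + window] - csum[starts]) / window


def replication_index(window_cov: np.ndarray, trim: float = 0.05) -> float:
    """Sort window coverage, drop the extremes, fit log2(coverage) against
    quantile position, and return the fitted 90th/10th percentile ratio."""
    cov = np.sort(window_cov)
    n = len(cov)
    q = (np.arange(n) + 0.5) / n            # quantile position of each sorted window
    keep = (q >= trim) & (q <= 1.0 - trim)  # discard lowest/highest windows (assumption)
    slope = np.polyfit(q[keep], np.log2(cov[keep]), 1)[0]
    # difference of the fitted line between the 90th and 10th percentiles
    return float(2.0 ** (slope * (0.9 - 0.1)))
```

For the idealized profile the index is close to `ptr ** 0.8`, because the line is evaluated at the 10th and 90th percentiles rather than at the true extremes. Computing it on `shuffle_fragments(...)` output gives nearly the same value as on the ordered profile, which is why fragment order does not need to be known.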
References
1. An examination of the Cooper-Helmstetter theory of DNA replication in bacteria and its underlying assumptions. J. Theor. Biol. 69, 645–654 (1977).
2. Genome-wide detection of chromosomal rearrangements, indels, and mutations in circular chromosomes by short read sequencing. Genome Res. 21, 1388–1393 (2011).
3. Bidirectional replication of the chromosome in Escherichia coli. Proc. Natl. Acad. Sci. USA 69, 2842–2845 (1972).
4. Visualization of reinitiated chromosomes in Bacillus subtilis. J. Mol. Biol. 68, 501–509 (1972).
5. Identification of replication origins in prokaryotic genomes. Brief. Bioinform. 9, 376–391 (2008).
6. DoriC 5.0: an updated database of oriC regions in both bacterial and archaeal genomes. Nucleic Acids Res. 41, D90–D93 (2013).
7. Analysis of five complete genome sequences for members of the class Peribacteria in the recently recognized Peregrinibacteria bacterial phylum. PeerJ 4, e1607 (2016).
8. Growth dynamics of gut microbiota in health and disease inferred from single metagenomic samples. Science 349, 1101–1106 (2015).
9. Chromosome replication and the division cycle of Escherichia coli B/r. J. Mol. Biol. 31, 519–540 (1968).
10. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37–43 (2004).
11. Enigmatic, ultrasmall, uncultivated Archaea. Proc. Natl. Acad. Sci. USA 107, 8806–8811 (2010).
12. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res. 23, 111–120 (2013).
13. Untangling genomes from metagenomes: revealing an uncultured class of marine Euryarchaeota. Science 335, 587–590 (2012).
14. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat. Biotechnol. 32, 822–828 (2014).
15. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature 523, 208–211 (2015).
16. Genomic expansion of domain archaea highlights roles for organisms from new phyla in anaerobic carbon cycling. Curr. Biol. 25, 690–701 (2015).
17. Genomic reconstruction of a novel, deeply branched sediment archaeal phylum with pathways for acetogenesis and sulfur reduction. ISME J. 10, 1696–1705 (2016).
18. Sickle. https://github.com/najoshi/sickle.
19. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012).
20. Community-wide analysis of microbial genome sequence signatures. Genome Biol. 10, R85 (2009).
21. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2016).
22. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
23. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
24. Fermentation, hydrogen, and sulfur metabolism in multiple uncultivated bacterial phyla. Science 337, 1661–1665 (2012).
25. The human gut and groundwater harbor non-photosynthetic bacteria belonging to a new candidate phylum sibling to Cyanobacteria. eLife 2, e01102 (2013).
26. Extraordinary phylogenetic diversity and metabolic versatility in aquifer sediment. Nat. Commun. 4, 2120 (2013).
27. Global metagenomic survey reveals a new bacterial candidate phylum in geothermal springs. Nat. Commun. 7, 10476 (2016).
28. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol. Biol. Evol. 13, 660–665 (1996).
29. Gut bacteria are rarely shared by co-hospitalized premature infants, regardless of necrotizing enterocolitis development. eLife 4, e05477 (2015).
30. Accurate, multi-kb reads resolve complex populations and detect rare microorganisms. Genome Res. 25, 534–543 (2015).
31. Extensive exometabolome analysis reveals extended overflow metabolism in various microorganisms. Microb. Cell Fact. 11, 122 (2012).
32. Trace incorporation of heavy water reveals slow and heterogeneous pathogen growth rates in cystic fibrosis sputum. Proc. Natl. Acad. Sci. USA 113, E110–E116 (2016).
33. Diverse uncultivated ultra-small bacterial cells in groundwater. Nat. Commun. 6, 6372 (2015).
34. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
35. Targeted access to the genomes of low-abundance organisms in complex microbial communities. Appl. Environ. Microbiol. 73, 3205–3214 (2007).
36. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437 (2013).
37. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013).
38. Small genomes and sparse metabolisms of sediment-associated bacteria from four candidate phyla. mBio 4, e00708-13 (2013).
39. The reduced genomes of Parcubacteria (OD1) contain signatures of a symbiotic lifestyle. Front. Microbiol. 6, 713 (2015).
40. Major bacterial lineages are essentially devoid of CRISPR-Cas viral defence systems. Nat. Commun. 7, 10613 (2016).
41. “Candidatus Sonnebornia yantaiensis”, a member of candidate division OD1, as intracellular bacteria of the ciliated protist Paramecium bursaria (Ciliophora, Oligohymenophorea). Syst. Appl. Microbiol. 37, 35–41 (2014).
42. Axenic culture of a candidate division TM7 bacterium from the human oral cavity and biofilm interactions with other oral bacteria. Appl. Environ. Microbiol. 80, 6480–6489 (2014).
43. Cultivation of a human-associated TM7 phylotype reveals a reduced genome and epibiotic parasitic lifestyle. Proc. Natl. Acad. Sci. USA 112, 244–249 (2015).
44. Cultivating microbial dark matter in benzene-degrading methanogenic consortia. Environ. Microbiol. 18, 2923–2936 (2016).
45. The systemic imprint of growth and its uses in ecological (meta)genomics. PLoS Genet. 6, e1000808 (2010).
46. Relic DNA is abundant in soil and obscures estimates of soil microbial diversity. Preprint at bioRxiv http://dx.doi.org/10.1101/043372 (2016).
47. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
48. LMFIT: non-linear least-square minimization and curve-fitting for Python (Zenodo, 2014).
49. Analyzing genomes with cumulative skew diagrams. Nucleic Acids Res. 26, 2286–2290 (1998).
50. Characterizing and measuring bias in sequence data. Genome Biol. 14, R51 (2013).
51. Shifting the genomic gold standard for the prokaryotic species definition. Proc. Natl. Acad. Sci. USA 106, 19126–19131 (2009).
52. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
53. Reordering contigs of draft genomes using the Mauve aligner. Bioinformatics 25, 2071–2073 (2009).
54. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 17, 132 (2016).
55. Prediction of effective genome size in metagenomic samples. Genome Biol. 8, R10 (2007).
- Supplementary Figure 1: Schematic showing steps involved in a genome-resolved metagenomics study that includes iRep analysis.
Microbiome sample collection and DNA extraction methods should be determined on a per-project basis, and metagenome sequencing can be conducted on Illumina, PacBio, or other sequencing platforms. Sequencing reads are trimmed based on quality scores (e.g. using Sickle [18]) and filtered for contamination (e.g. removal of human genome sequences). High-quality reads are then assembled (e.g. using IDBA_UD [19]), and the resulting scaffolds are binned manually (e.g. based on GC content, taxonomic affiliation, and coverage), with a clustering algorithm such as ESOM [20,29,30], and/or with an automated binning program (e.g. MaxBin [21], CONCOCT [22], or ABAWACA [15]). Genome bins can then be assessed for completeness and contamination based on an inventory of expected single-copy genes (SCGs), either by identifying these genes in genome annotations (see [15,29,55]) or by using software such as CheckM [23]. High-quality genomes are then compared with one another and grouped into clusters based on average nucleotide identity (ANI; e.g. genomes sharing 98% ANI, as determined using Mash [54]). A representative of each cluster should be included in the genome database used for iRep analysis, along with genomes from other projects that may be appropriate for the analysis. Reads from each metagenome are then mapped to the genome database (e.g. using Bowtie2 [47]), and iRep is calculated from the read mapping data (see Online Methods).
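The clustering step can be sketched as a greedy dereplication. This is a minimal illustration, assuming that pairwise ANI values (in percent) have already been computed. Mash, for example, reports a distance that approximates 1 − ANI. The sketch also assumes that each genome has a single quality score, such as one derived from its SCG inventory. The input dictionaries, the quality score, and the rule that the highest-quality genome becomes the cluster representative are hypothetical choices, not the procedure used in this study.

```python
from typing import Dict, FrozenSet, List


def dereplicate(quality: Dict[str, float],
                ani: Dict[FrozenSet[str], float],
                threshold: float = 98.0) -> Dict[str, List[str]]:
    """Greedily cluster genomes that share >= `threshold` percent ANI.

    Genomes are visited from highest to lowest (hypothetical) quality score;
    each joins the first existing representative it matches, otherwise it
    becomes a new representative. Missing ANI pairs are treated as 0.
    """
    clusters: Dict[str, List[str]] = {}
    for genome in sorted(quality, key=lambda g: quality[g], reverse=True):
        for rep in clusters:
            if ani.get(frozenset((genome, rep)), 0.0) >= threshold:
                clusters[rep].append(genome)
                break
        else:
            # no representative matched: this genome starts a new cluster
            clusters[genome] = [genome]
    return clusters
```

The keys of the returned dictionary are the cluster representatives that would go into the genome database used for read mapping.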
- Supplementary Figure 2: Evaluation of iRep method parameters.
(a) Gamma distribution used to simulate genome fragmentation for the genome completeness analyses. The frequency of genome fragment sizes from all genomes analyzed in this study is compared with fragment sizes simulated using a gamma distribution with parameters alpha = 0.1, beta = 21,000, min. = 5,000 and max. = 200,000. These parameters were first estimated by fitting to the genome data and then manually adjusted. The similarity between the two distributions shows that this gamma distribution approximates the level of fragmentation expected for draft-quality genome sequences. (b) iRep was calculated from random genome fragmentation simulations to survey a range of fragmentation levels (Supplementary Table 1). The analysis was conducted for an L. gasseri sample from the Korem et al. [8] study, for which iRep was determined to be 2.01 using the complete genome at 25x sequencing coverage. This known iRep value was compared with iRep values determined from each fragmentation simulation after subsampling to 75% of the genome and to only 5x sequencing coverage, which tests the influence of fragmentation at the completeness and coverage limits of the method. Results show that 91.8% of iRep values are within 0.15 of the expected value when genomes have fewer than 175 fragments/Mbp of genome sequence. (c) Four L. gasseri samples from the Korem et al. [8] study, with iRep values between 1.50 and 2.01, were selected to test different sliding-window coverage calculation methods (see Online Methods for a description of each method) and window sizes. For each sample, 100 random genome fragmentations and subsets were generated to assess each method across levels of genome completeness. The results show that the “iRep” and “median iRep” methods using 5 Kbp windows exhibited the least variation.
(d) Because the iRep method combines coverage data from different genome fragments in random order before calculating sliding-window coverage, some windows include coverage values from different locations on the complete genome sequence. To evaluate the variation introduced by the random order in which scaffolds are combined, iRep was calculated for ten random orderings of 100 random genome fragmentations of the sample set described in (c). Results show minimal variation in iRep values, measured as the difference between the lowest and highest values determined from the ten orderings (“iRep range”). Because of this, we chose not to implement the “median iRep” strategy. (e) Using the sample set described in (c), the iRep method was implemented with 5 Kbp windows and different window slide values to test whether the slide value changes the results. Because 10 and 100 bp slides produced similar results, we implemented the iRep method using a 100 bp slide. (f) Without the GC sequencing bias correction, iRep is less strongly correlated with bPTR for five genome sequences assembled from premature infant metagenomes (Supplementary Table 4; compare with GC-corrected data in Fig. 2e).
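The fragmentation simulation in panels (a) and (b) can be sketched with Python's standard library. This is a sketch under stated assumptions. It treats beta as the gamma scale parameter, which is how `random.gammavariate(alpha, beta)` defines it. It enforces the min./max. bounds by redrawing out-of-range lengths, and it lets the final fragment take whatever sequence remains. Both of these choices are assumptions, since the caption does not say how the bounds were applied. The helper names are hypothetical. The 175 fragments/Mbp limit is the one reported in (b).

```python
import random
from typing import List


def simulate_fragment_lengths(genome_len: int, alpha: float = 0.1,
                              beta: float = 21_000.0, min_len: int = 5_000,
                              max_len: int = 200_000, seed: int = 0) -> List[int]:
    """Split a genome of `genome_len` bp into gamma-distributed fragment lengths."""
    rng = random.Random(seed)
    lengths: List[int] = []
    remaining = genome_len
    while remaining > 0:
        # redraw until the length falls within [min_len, max_len] (assumed rule)
        draw = rng.gammavariate(alpha, beta)
        while not (min_len <= draw <= max_len):
            draw = rng.gammavariate(alpha, beta)
        # the final fragment is truncated to the sequence that is left
        length = min(int(draw), remaining)
        lengths.append(length)
        remaining -= length
    return lengths


def fragments_per_mbp(lengths: List[int]) -> float:
    """Number of fragments per megabase of genome sequence."""
    return len(lengths) / (sum(lengths) / 1_000_000)


def within_fragmentation_limit(lengths: List[int], limit: float = 175.0) -> bool:
    """True if the genome has fewer than `limit` fragments per Mbp."""
    return fragments_per_mbp(lengths) < limit
```

Because fragment lengths (other than the last) are at least 5 Kbp, a simulated genome can never exceed about 200 fragments/Mbp. The 175 fragments/Mbp limit therefore excludes only the most fragmented draws.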
- Supplementary Figure 3: Coverage, GC skew patterns, and bPTR measurements for reconstructed genomes oriented and ordered based on complete reference genome sequences.
(a-e) Reads from the sample used for genome recovery were mapped to each genome. bPTR was calculated after determining the origin and terminus of replication based on cumulative GC skew. Coverage was calculated in 10 Kbp windows every 100 bp (extremely low- and high-coverage windows were filtered out; see Online Methods). bPTR was calculated as the ratio between the coverage at the origin and at the terminus after applying a median filter. Cumulative GC skew and coverage patterns confirm the ordering of genome fragments.
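Locating the origin and terminus from cumulative GC skew, in the spirit of the cumulative skew diagrams of ref. [49], can be sketched as follows. The sketch assumes an ordered, oriented sequence, as in this figure. It computes (G − C)/(G + C) in consecutive non-overlapping windows and takes the minimum of the cumulative sum as the origin and the maximum as the terminus. The 1 Kbp window size and the function names are hypothetical.

```python
from typing import List, Tuple


def cumulative_gc_skew(seq: str, window: int = 1_000) -> List[float]:
    """Running sum of (G - C) / (G + C) over consecutive windows."""
    total = 0.0
    cumulative: List[float] = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        total += (g - c) / (g + c) if g + c else 0.0
        cumulative.append(total)
    return cumulative


def origin_terminus(seq: str, window: int = 1_000) -> Tuple[int, int]:
    """Estimate origin (skew minimum) and terminus (skew maximum) in bp.

    Entry i of the cumulative skew describes the sequence up to the end of
    window i, so it maps to position (i + 1) * window on the circular genome.
    """
    skew = cumulative_gc_skew(seq, window)
    ori = min(range(len(skew)), key=skew.__getitem__)
    ter = max(range(len(skew)), key=skew.__getitem__)
    return ((ori + 1) * window) % len(seq), ((ter + 1) * window) % len(seq)
```

The coverage values at these two positions, after median filtering, give bPTR.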
- Supplementary Figure 4: Reference genomes are not representative of organisms surveyed in the premature infant microbiome study.
Reads were mapped to both reconstructed genomes and closely related reference genomes (Supplementary Table 4), and the percent of each genome covered by sequencing reads is reported. Average nucleotide identity (ANI) is reported between each reconstructed genome and the paired reference genome. The large fractions of reference genomes not represented by metagenome sequencing show that extensive genomic variation is present between surveyed and reference genomes, despite high ANI values in some cases.
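The percent of a genome covered by reads can be computed from per-position read depth. This is a minimal sketch, assuming that "covered" means at least one mapped read, which is a threshold chosen here for illustration. It also assumes depths come from `samtools depth -a`, which reports contig, position and depth for every position, including zero-depth positions. The function names are hypothetical.

```python
from typing import Iterable, List, Sequence


def depths_from_samtools(lines: Iterable[str]) -> List[int]:
    """Parse `samtools depth -a` output (contig, position, depth per line)."""
    return [int(line.split("\t")[2]) for line in lines if line.strip()]


def percent_covered(depths: Sequence[int], min_depth: int = 1) -> float:
    """Percent of genome positions with at least `min_depth` mapped reads."""
    if not depths:
        raise ValueError("empty depth profile")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)
```

Running the same calculation on a reconstructed genome and on its paired reference genome exposes the uncovered reference regions that this figure reports.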
- Supplementary Figure 5: Replication rates determined by iRep and kPTR are not in strong agreement for the premature infant study.
- Supplementary Figure 6: Coverage, cumulative GC skew, and bPTR measurements for complete reference genomes with similarity to genomes from the adult human microbiome sample.
(a-e) Reads from the adult human microbiome sample were mapped to complete reference genome sequences. Coverage was calculated in 10 Kbp windows every 100 bp (extremely low- and high-coverage windows were filtered out; see Online Methods). The origin and terminus of replication were determined based on coverage. bPTR was calculated as the ratio between the coverage at the origin and at the terminus after applying a median filter. Cumulative GC skew and coverage patterns suggest the presence of genomic variation or assembly errors for some genomes (b, c, e).
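The coverage-based bPTR calculation described above can be sketched as follows. The sketch assumes that window coverage has already been computed in order along the circular genome, for example in 10 Kbp windows every 100 bp. It discards windows below 1/8 or above 8 times the median and uses a 51-window median filter. Both are hypothetical placeholders; the criteria actually used are in the Online Methods. The origin and terminus are taken as the maximum and minimum of the filtered profile. NumPy 1.20 or later is needed for `sliding_window_view`.

```python
import numpy as np


def circular_median_filter(values: np.ndarray, size: int) -> np.ndarray:
    """Median filter that wraps around the ends of a circular chromosome.

    NaN entries (discarded windows) are ignored when taking each median.
    """
    if size < 3 or size % 2 == 0:
        raise ValueError("size must be an odd number >= 3")
    half = size // 2
    padded = np.concatenate([values[-half:], values, values[:half]])
    windows = np.lib.stride_tricks.sliding_window_view(padded, size)
    return np.nanmedian(windows, axis=1)


def bptr_from_coverage(window_cov: np.ndarray, filter_size: int = 51,
                       low: float = 0.125, high: float = 8.0) -> float:
    """Ratio of filtered coverage at the origin (maximum) to the terminus (minimum)."""
    cov = window_cov.astype(float).copy()
    med = np.median(cov)
    # hypothetical outlier rule: drop windows far below or above the median
    cov[(cov < low * med) | (cov > high * med)] = np.nan
    smooth = circular_median_filter(cov, filter_size)
    ori = int(np.nanargmax(smooth))
    ter = int(np.nanargmin(smooth))
    return float(smooth[ori] / smooth[ter])
```

The median filter slightly flattens the peak and the trough, so a wider filter gives a more robust but somewhat lower bPTR.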
- Supplementary Figure 7: Absolute abundance (bars, left axis) and iRep (scatter plot, right axis) for bacterial species associated with premature infants.
The five days following antibiotic administration are indicated using a color gradient (DOL = day of life). Half of the infants in the study developed necrotizing enterocolitis (NEC; dotted red lines) during the study period.
- Supplementary Text and Figures
Supplementary Figures 1–7
- Supplementary Table 1
Analysis of the impact of genome completeness on iRep replication rate measurements.
- Supplementary Table 2
Comparison of iRep, bPTR, and kPTR measurements.
- Supplementary Table 3
iRep, bPTR, and kPTR measurements for minimum genome sequencing coverage analyses.
- Supplementary Table 4
Comparison of iRep and bPTR measurements for draft-quality genomes ordered and oriented based on complete genome sequences.
- Supplementary Table 5
iRep measurements for organisms associated with premature infant microbiomes.
- Supplementary Table 6
Single copy gene inventory for genomes reconstructed from an adult human gut metagenome.
- Supplementary Table 7
iRep measurements for organisms associated with an adult human microbiome.
- Supplementary Table 8
kPTR values determined from the premature infant metagenomes.
- Supplementary Table 9
kPTR values determined from the adult human metagenome.
- Supplementary Table 10
iRep measurements for Candidate Phyla Radiation (CPR) organisms.