Rates and mechanisms of bacterial mutagenesis from maximum-depth sequencing


In 1943, Luria and Delbrück used a phage-resistance assay to establish spontaneous mutation as a driving force of microbial diversity [1]. Mutation rates are still studied using such assays, but these can only be used to examine the small minority of mutations conferring survival in a particular condition. Newer approaches, such as long-term evolution followed by whole-genome sequencing [2,3], may be skewed by mutational ‘hot’ or ‘cold’ spots [3,4]. Both approaches are affected by numerous caveats [5,6,7]. Here we devise a method, maximum-depth sequencing (MDS), to detect extremely rare variants in a population of cells through error-corrected, high-throughput sequencing. We directly measure locus-specific mutation rates in Escherichia coli and show that they vary across the genome by at least an order of magnitude. Our data suggest that certain types of nucleotide misincorporation occur 10^4-fold more frequently than the basal rate of mutations, but are repaired in vivo. Our data also suggest specific mechanisms of antibiotic-induced mutagenesis, including downregulation of mismatch repair via oxidative stress, transcription–replication conflicts, and, in the case of fluoroquinolones, direct damage to DNA.


Figure 1: Overview of MDS.
Figure 2: Substitution rates and indel frequencies.
Figure 3: Substitution spectra.
Figure 4: Relationships between mutation rates and physiologic conditions.


  1. Luria, S. E. & Delbrück, M. Mutations of bacteria from virus sensitivity to virus resistance. Genetics 28, 491–511 (1943)

  2. Wielgoss, S. et al. Mutation rate inferred from synonymous substitutions in a long-term evolution experiment with Escherichia coli. G3 (Bethesda) 1, 183–186 (2011)

  3. Lee, H., Popodi, E., Tang, H. & Foster, P. L. Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl Acad. Sci. USA 109, E2774–E2783 (2012)

  4. Martincorena, I., Seshasayee, A. S. N. & Luscombe, N. M. Evidence of non-random mutation rates suggests an evolutionary risk management strategy. Nature 485, 95–98 (2012)

  5. Lenski, R. E., Slatkin, M. & Ayala, F. J. Mutation and selection in bacterial populations: alternatives to the hypothesis of directed mutation. Proc. Natl Acad. Sci. USA 86, 2775–2778 (1989)

  6. Wielgoss, S. et al. Mutation rate dynamics in a bacterial population reflect tension between adaptation and genetic load. Proc. Natl Acad. Sci. USA 110, 222–227 (2013)

  7. Drake, J. W. Contrasting mutation rates from specific-locus and long-term mutation-accumulation procedures. G3 (Bethesda) 2, 483–485 (2012)

  8. Kinde, I., Wu, J., Papadopoulos, N., Kinzler, K. W. & Vogelstein, B. Detection and quantification of rare mutations with massively parallel sequencing. Proc. Natl Acad. Sci. USA 108, 9530–9535 (2011)

  9. Minoche, A. E., Dohm, J. C. & Himmelbauer, H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol. 12, R112 (2011)

  10. Lou, D. I. et al. High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing. Proc. Natl Acad. Sci. USA 110, 19872–19877 (2013)

  11. Schmitt, M. W. et al. Detection of ultra-rare mutations by next-generation sequencing. Proc. Natl Acad. Sci. USA 109, 14508–14513 (2012)

  12. Schmitt, M. W. et al. Sequencing small genomic targets with high efficiency and extreme accuracy. Nat. Methods 12, 423–425 (2015)

  13. Acevedo, A., Brodsky, L. & Andino, R. Mutational and fitness landscapes of an RNA virus revealed through population sequencing. Nature 505, 686–690 (2014)

  14. Baba, T. et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol. 2, 2006.0008 (2006)

  15. Taniguchi, Y. et al. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329, 533–538 (2010)

  16. Garibyan, L. et al. Use of the rpoB gene to determine the specificity of base substitution mutations on the Escherichia coli chromosome. DNA Repair (Amst.) 2, 593–608 (2003)

  17. Drake, J. W. A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl Acad. Sci. USA 88, 7160–7164 (1991)

  18. Chen, X. & Zhang, J. No gene-specific optimization of mutation rate in Escherichia coli. Mol. Biol. Evol. 30, 1559–1562 (2013)

  19. McDonald, M. J., Wang, W.-C., Huang, H.-D. & Leu, J.-Y. Clusters of nucleotide substitutions and insertion/deletion mutations are associated with repeat sequences. PLoS Biol. 9, e1000622 (2011)

  20. Zhu, L., Wang, Q., Tang, P., Araki, H. & Tian, D. Genomewide association between insertions/deletions and the nucleotide diversity in bacteria. Mol. Biol. Evol. 26, 2353–2361 (2009)

  21. Kohanski, M. A., DePristo, M. A. & Collins, J. J. Sublethal antibiotic treatment leads to multidrug resistance via radical-induced mutagenesis. Mol. Cell 37, 311–320 (2010)

  22. Dwyer, D. J. et al. Antibiotics induce redox-related physiological alterations as part of their lethality. Proc. Natl Acad. Sci. USA 111, E2100–E2109 (2014)

  23. Gutierrez, A. et al. β-Lactam antibiotics promote bacterial mutagenesis via an RpoS-mediated reduction in replication fidelity. Nat. Commun. 4, 1610 (2013)

  24. Liu, Y. & Imlay, J. A. Cell death from antibiotics without the involvement of reactive oxygen species. Science 339, 1210–1213 (2013)

  25. Lang, G. I. & Murray, A. W. Estimating the per-base-pair mutation rate in the yeast Saccharomyces cerevisiae. Genetics 178, 67–82 (2008)

  26. Pohlhaus, J. R. & Kreuzer, K. N. Norfloxacin-induced DNA gyrase cleavage complexes block Escherichia coli replication forks, causing double-stranded breaks in vivo. Mol. Microbiol. 56, 1416–1429 (2005)

  27. Merrikh, H., Zhang, Y., Grossman, A. D. & Wang, J. D. Replication-transcription conflicts in bacteria. Nat. Rev. Microbiol. 10, 449–458 (2012)

  28. Sangurdekar, D. P., Srienc, F. & Khodursky, A. B. A classification based framework for quantitative description of large-scale microarray data. Genome Biol. 7, R32 (2006)

  29. Dutta, D., Shatalin, K., Epshtein, V., Gottesman, M. E. & Nudler, E. Linking RNA polymerase backtracking to genome instability in E. coli. Cell 146, 533–543 (2011)

  30. Rosenberg, S. M. Evolving responsively: adaptive mutation. Nat. Rev. Genet. 2, 504–515 (2001)

  31. Berend, D. & Tassa, T. Improved bounds on Bell numbers and on moments of sums of random variables. Probab. Math. Stat. 30, 185–205 (2010)

  32. Woods, R. J. et al. Second-order selection for evolvability in a large Escherichia coli population. Science 331, 1433–1436 (2011)

  33. Datsenko, K. A. & Wanner, B. L. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl Acad. Sci. USA 97, 6640–6645 (2000)

  34. Sarkar, S., Ma, W. T. & Sandri, G. H. On fluctuation analysis: a new, simple and efficient method for computing the expected number of mutants. Genetica 85, 173–179 (1992)



We thank A. Heguy and the NYU Genome Technology Center, which is partially supported by the Cancer Center Support Grant, P30CA016087, at the Laura and Isaac Perlmutter Cancer Center. This work used computing resources at the High Performance Computing Facility of the Center for Health Informatics and Bioinformatics at the NYU Langone Medical Center. We thank D. Dwyer and K. Shankarling for materials, and T. Artemyev for his contribution. This work was supported by NIH grant R01GM107329 and HHMI (E.N.) and NCI PSOC grant U54 CA193313 (B.M.). J.J. was supported by the NYU Medical Scientist Training Program and a National Defense Science and Engineering Graduate Fellowship.

Author information




J.J. and I.S. designed the MDS protocols. J.J., A.R., and E.N. designed the biological experiments. J.J., A.R., and Y.A. performed the experiments. J.J., B.M., S.S., and I.S. performed the data analysis. J.J. and E.N. wrote the manuscript with input from all co-authors. B.M. and E.N. supervised the research.

Corresponding authors

Correspondence to Bud Mishra or Evgeny Nudler.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information Nature thanks N. Luscombe, I. Martincorena, J. Wang and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figure 1 MDS accuracy and yield.

a, Mock culture composed of rpoB point mutants of known concentration was sequenced using MDS. Output concentrations of each point mutant recovered from R = 2 analysis are plotted against its input concentration (see Supplementary Information Table 2 for details). b, c, Distribution of the sizes of barcode families in four trials, shown as log10(number of barcode families) per trial versus size of barcode family in reads (R). b, Trials used for the calibration run shown in a (~100 M reads total, divided into four trials). c, Representative quadruplicate trials (from rpoB of wild-type bacteria grown in LB broth with no antibiotics) taking up a total of one quarter of the output of a HiSeq rapid run, a total of ~60 M reads.
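
The barcode-family bookkeeping behind panels b and c can be illustrated with a minimal Python sketch. It assumes each read has already been reduced to a (barcode, sequence) pair, and it assumes a variant is accepted only when a family has at least R reads that all agree; the function names and the toy reads are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter, defaultdict


def barcode_families(reads):
    """Group (barcode, sequence) pairs into families keyed by barcode."""
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)
    return dict(families)


def family_size_histogram(families):
    """Map family size in reads (R) to the number of families of that size."""
    return Counter(len(seqs) for seqs in families.values())


def unanimous_consensus(families, min_reads=2):
    """Keep one sequence per family that has >= min_reads reads, all identical.

    Requiring unanimity is an assumption made for this sketch.
    """
    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) >= min_reads and len(set(seqs)) == 1:
            consensus[barcode] = seqs[0]
    return consensus


# Toy, made-up reads: three families of sizes 3, 2 and 1.
reads = [
    ("AAAC", "ACGT"), ("AAAC", "ACGT"), ("AAAC", "ACGT"),
    ("GGTA", "ACTT"), ("GGTA", "ACGT"),  # family with disagreeing reads
    ("TTCG", "ACGT"),                    # singleton, below the R = 2 cutoff
]
families = barcode_families(reads)
hist = family_size_histogram(families)
calls = unanimous_consensus(families, min_reads=2)
```

Plotting log10 of the histogram counts against family size would give a curve of the kind described for panels b and c; raising min_reads trades yield for stringency.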

Extended Data Figure 2 Dual-barcode MDS.

a, Barcodes are attached to original DNA molecules as per MDS protocol. After linear amplification, a second barcode is attached to the opposite end of each read (see Supplementary Information: testing sample preparation and PCR efficiency). Exponential PCR is then performed. In the analysis phase, reads can be grouped both by primary barcode (that is, a classic MDS barcode family) and a second barcode corresponding to a ‘subfamily’ of reads with the same parent from a particular linear amplification step before exponential amplification. b, The probability that for a given family only reads of one subfamily are recovered (a ‘homogenous’ barcode) decreases exponentially with R. For example, for R = 3, the probability all 3 reads are of the same subfamily is 0.02. c, We show the number of reads in each subfamily, sorted within each column by subfamily size, for the 1,500 largest primary barcode families in the experiment. For families of such size, it is unlikely that a single subfamily will account for more than 25% of the total number of reads recovered from that family.
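
The decay described in panel b can be sketched numerically. The snippet below assumes that the R reads of a family are drawn independently, each landing in subfamily i with probability equal to that subfamily's share of the family; under that assumption the chance that all R reads share one subfamily is the sum of the shares raised to the power R. The eight equal subfamilies are a hypothetical input, not a value from the paper, and the sketch is not meant to reproduce the 0.02 quoted in the caption.

```python
def homogeneous_probability(subfamily_sizes, r):
    """Probability that r independently drawn reads all fall in one subfamily.

    subfamily_sizes: relative sizes (any positive numbers) of the subfamilies.
    Assumes sampling with replacement, which is reasonable for large families.
    """
    total = sum(subfamily_sizes)
    return sum((size / total) ** r for size in subfamily_sizes)


# Hypothetical family with eight equally sized subfamilies.
equal_subfamilies = [1.0] * 8
by_r = {r: homogeneous_probability(equal_subfamilies, r) for r in (2, 3, 4)}
```

For n equal subfamilies the expression reduces to n^(1 − R), which falls by a factor of n for each additional read and is therefore exponential in R.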

Extended Data Figure 3 Substitution frequency controls.

a, Empirically, average substitution frequency (with high frequency substitutions such as C→A excluded) stabilizes as R increases. Note, substitution frequencies are not normalized by number of generations. b, Empirical sequencing C→A error rate at C→A mutational hotspots with neighbouring Cs (same as those in Fig. 3c) versus all other positions. c, C→A substitution frequencies when 10% 8-oxoG is synthetically added to in vitro DNA and in FPG-treated samples. Frequencies are reported from ROI positions with potential 8-oxoG incorporations as described in template ‘rpoB_reverse_complement_8-oxo-Dg’. Frequencies are reported at R = 2 level. For R > 2, no C→A substitutions were found in 72,646 in vitro template sites. Data represent biological triplicates. Error bars are standard deviation.

Extended Data Figure 4 Substitution rates per locus.

Positive frequencies denote synonymous substitutions. Negative frequencies denote nonsynonymous substitutions. a, c, Values are averaged across quadruplicate trials. b, d, In vitro synthesized DNA has undergone 20-cycle PCR amplification using Q5 polymerase.

Extended Data Figure 5 Mutational spectra and contexts.

a, Substitution frequencies of all ROIs after ~120 generations of growth. Note that values are not normalized for the number of generations and are thus true frequencies, rather than mutation rates. b, Mutation frequencies are shown in context of their 5′ (A, C, G, or T on the x axis) and 3′ (A, C, G, or T on the y axis) neighbours. c, The relative relationship between in vivo substitution frequencies and expected errors due to sequencing and PCR (from in vitro DNA assays) is poorly described by a linear approximation (R^2 = 0.27). Furthermore, the recovered frequency from in vivo substitutions (R = 3) is higher than the rate of error (equivalent frequencies would be represented by the dotted line), even with the relatively relaxed read-cutoff threshold of R = 2 (the sequencing + PCR error with an R = 3 cutoff is approximately an order of magnitude lower). Templates are rpoB CDS and mrcA ROIs.

Extended Data Figure 6 Comparing substitution rate and indel rate across 5 ROIs reveals a positive correlation.

Pearson correlation coefficient = 0.76.

Extended Data Figure 7 Rate of rifampicin resistance per generation.

a–d, As calculated in fluctuation assays in wild-type cells grown in exponential phase only (a), wild-type cells grown to saturation (b), katG overexpression mutant grown to saturation (c) and inactive katG (H106Y point mutation) overexpression mutant grown to saturation (d). Growth in LB broth was supplemented with possible subinhibitory doses of ampicillin (amp), norfloxacin (nor), or gentamycin (gen). Rates are mean. Error bars are 95% CI. N = 25 (see Methods: fluctuation assays).
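
For readers unfamiliar with fluctuation assays, the following Python sketch shows the classic Luria–Delbrück p0 estimator. This is deliberately a simpler, swapped-in technique: the Methods are not reproduced in this section, and the citation of Sarkar, Ma & Sandri (ref. 34) suggests the authors used a likelihood-based estimator instead. The culture counts and cell number below are hypothetical.

```python
import math


def p0_mutation_rate(n_cultures, n_without_mutants, final_cells_per_culture):
    """Estimate mutations per cell per generation with the p0 method.

    p0 is the fraction of parallel cultures with no resistant colonies.
    Under a Poisson model the expected number of mutation events per
    culture is m = -ln(p0), and the rate is m divided by the final
    number of cells per culture.
    """
    if not 0 < n_without_mutants < n_cultures:
        raise ValueError("p0 method needs some, but not all, cultures without mutants")
    p0 = n_without_mutants / n_cultures
    m = -math.log(p0)
    return m / final_cells_per_culture


# Hypothetical assay: 25 cultures, 10 with no rifampicin-resistant colonies,
# about 1e9 cells per culture at plating.
rate = p0_mutation_rate(25, 10, 1e9)
```

The p0 method only works when a usable fraction of cultures has no mutants, which is one reason likelihood-based estimators are often preferred in practice.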

Extended Data Figure 8 Transversion and transition rates (per nucleotide-generation).

As calculated in fluctuation assays in anaerobic conditions (a) and in a mutS knockout (b). Note that because the transition (Ts) rate was high in MutS strains, transversion mutations could not be detected. Rates are mean. Error bars are 95% CI. N = 25 (see Methods: fluctuation assays).

Extended Data Figure 9 Rates of rpoB and mrcA substitutions in the presence of antibiotics as calculated by MDS.

Asterisks indicate cultures grown separately and prepared with Phusion rather than Q5. Although not shown, we note that only in-frame (3×) indels were observed in rpoB in fluctuation assays, as expected since frameshift indels would be deleterious. These increased in frequency by a factor of 10 on addition of norfloxacin.

Extended Data Figure 10 Schematic depicting the mathematical derivation of the false positive rate of MDS due to polymerase error.

a, The origin of various terms used in equations (2)–(7). b, Illustration of an example calculation of false positive rate given more ‘intuitive’ values of N, R and P. The false positive rate is calculated in a way that accounts for the possibility that an error in one or more ‘linear’ cycles propagates to a whole family of reads. The number of reads with an error (k) is Poisson distributed according to equation (2). The probability of a false positive is the sum of the probabilities that all R reads come from one of k families, for all possible k, according to equation (3). Note that in practice, P < 10^−6, and in our study N = 12, R > 2, making the false positive rate much lower (see Fig. 1).
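
Equations (2)–(7) are not reproduced in this section, so the following Python sketch is only one plausible reading of the caption, under explicit assumptions: N is the number of linear amplification cycles, P is the per-cycle polymerase error probability at a given site, k (the number of error-bearing linear copies) is Poisson distributed with mean N·P, and each of the R reads is drawn independently and uniformly from the N linear copies. The function name and the truncation of the sum at k = N are choices made for this illustration.

```python
import math


def false_positive_rate(n_cycles, r_reads, p_error):
    """Sum over k of Poisson(k; N*P) * (k/N)**R, truncated at k = N.

    Interpretation (an assumption): a false positive occurs when all R
    reads of a family descend from one of the k error-bearing copies.
    """
    lam = n_cycles * p_error  # assumed mean number of error-bearing copies
    total = 0.0
    for k in range(1, n_cycles + 1):
        poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
        total += poisson_k * (k / n_cycles) ** r_reads
    return total


# Values quoted in the caption as bounds: N = 12, P < 1e-6, R > 2.
fp_r3 = false_positive_rate(n_cycles=12, r_reads=3, p_error=1e-6)
fp_r2 = false_positive_rate(n_cycles=12, r_reads=2, p_error=1e-6)
```

Under these assumptions the k = 1 term dominates, so the rate is roughly N·P / N^R, and each additional required read lowers it by about a factor of N.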

Supplementary information

Supplementary Information

This file contains Supplementary Text and Data, additional references and Supplementary Tables 1-3. (PDF 571 kb)


Cite this article

Jee, J., Rasouly, A., Shamovsky, I. et al. Rates and mechanisms of bacterial mutagenesis from maximum-depth sequencing. Nature 534, 693–696 (2016). https://doi.org/10.1038/nature18313
