The origin of mutations is central to understanding evolution and of key relevance to health. Variation occurs non-randomly across the genome, and mechanisms for this remain to be defined. Here we report that the 5′ ends of Okazaki fragments have significantly increased levels of nucleotide substitution, indicating a replicative origin for such mutations. Using a novel method, emRiboSeq, we map the genome-wide contribution of polymerases, and show that despite Okazaki fragment processing, DNA synthesized by error-prone polymerase-α (Pol-α) is retained in vivo, comprising approximately 1.5% of the mature genome. We propose that DNA-binding proteins that rapidly re-associate post-replication act as partial barriers to Pol-δ-mediated displacement of Pol-α-synthesized DNA, resulting in incorporation of such Pol-α tracts and increased mutation rates at specific sites. We observe a mutational cost to chromatin and regulatory protein binding, resulting in mutation hotspots at regulatory elements, with signatures of this process detectable in both yeast and humans.
Kunkel, T. A. Evolving views of DNA replication (in)fidelity. Cold Spring Harb. Symp. Quant. Biol. 74, 91–101 (2009)
Wolfe, K. H., Sharp, P. M. & Li, W. H. Mutation rates differ among regions of the mammalian genome. Nature 337, 283–285 (1989)
Alexandrov, L. B. & Stratton, M. R. Mutational signatures: the patterns of somatic mutations hidden in cancer genomes. Curr. Opin. Genet. Dev. 24, 52–60 (2014)
Ciccia, A. & Elledge, S. J. The DNA damage response: making it safe to play with knives. Mol. Cell 40, 179–204 (2010)
Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011)
Pollard, K. S. et al. Forces shaping the fastest evolving regions in the human genome. PLoS Genet. 2, e168 (2006)
Prendergast, J. G. & Semple, C. A. Widespread signatures of recent selection linked to nucleosome positioning in the human lineage. Genome Res. 21, 1777–1787 (2011)
Sasaki, S. et al. Chromatin-associated periodicity in genetic variation downstream of transcriptional start sites. Science 323, 401–404 (2009)
Semple, C. A. & Taylor, M. S. Molecular biology. The structure of change. Science 323, 347–348 (2009)
Warnecke, T., Batada, N. N. & Hurst, L. D. The impact of the nucleosome code on protein-coding sequence evolution in yeast. PLoS Genet. 4, e1000250 (2008)
Washietl, S., Machne, R. & Goldman, N. Evolutionary footprints of nucleosome positions in yeast. Trends Genet. 24, 583–587 (2008)
Ying, H., Epps, J., Williams, R. & Huttley, G. Evidence that localized variation in primate sequence divergence arises from an influence of nucleosome placement on DNA repair. Mol. Biol. Evol. 27, 637–649 (2010)
Johnston, L. H. & Nasmyth, K. A. Saccharomyces cerevisiae cell cycle mutant cdc9 is defective in DNA ligase. Nature 274, 891–893 (1978)
Okazaki, R., Okazaki, T., Sakabe, K., Sugimoto, K. & Sugino, A. Mechanism of DNA chain growth. I. Possible discontinuity and unusual secondary structure of newly synthesized chains. Proc. Natl Acad. Sci. USA 59, 598–605 (1968)
Balakrishnan, L. & Bambara, R. A. Okazaki fragment metabolism. Cold Spring Harb. Perspect. Biol. 5, a010173 (2013)
Zheng, L. & Shen, B. Okazaki fragment maturation: nucleases take centre stage. J. Mol. Cell Biol. 3, 23–30 (2011)
Smith, D. J. & Whitehouse, I. Intrinsic coupling of lagging-strand synthesis to chromatin assembly. Nature 483, 434–438 (2012)
Stith, C. M., Sterling, J., Resnick, M. A., Gordenin, D. A. & Burgers, P. M. Flexibility of eukaryotic Okazaki fragment maturation through regulated strand displacement synthesis. J. Biol. Chem. 283, 34129–34140 (2008)
Perera, R. L. et al. Mechanism for priming DNA synthesis by yeast DNA Polymerase alpha. Elife 2, e00482 (2013)
Walsh, E. & Eckert, K. A. Eukaryotic Replicative DNA Polymerases. Nucleic Acid Polymerases 30, 17–41 (2014)
Pavlov, Y. I. et al. Evidence that errors made by DNA polymerase α are corrected by DNA polymerase δ. Curr. Biol. 16, 202–207 (2006)
Maga, G. et al. Okazaki fragment processing: modulation of the strand displacement activity of DNA polymerase δ by the concerted action of replication protein A, proliferating cell nuclear antigen, and flap endonuclease-1. Proc. Natl Acad. Sci. USA 98, 14298–14303 (2001)
Kunkel, T. A., Hamatake, R. K., Motto-Fox, J., Fitzgerald, M. P. & Sugino, A. Fidelity of DNA polymerase I and the DNA polymerase I-DNA primase complex from Saccharomyces cerevisiae. Mol. Cell. Biol. 9, 4447–4458 (1989)
Lujan, S. A. et al. Heterogeneous polymerase fidelity and mismatch repair bias genome variation and composition. Genome Res. 24, 1751–1764 (2014)
Nick McElhinny, S. A., Kissling, G. E. & Kunkel, T. A. Differential correction of lagging-strand replication errors made by DNA polymerases α and δ. Proc. Natl Acad. Sci. USA 107, 21070–21075 (2010)
Niimi, A. et al. Palm mutants in DNA polymerases α and η alter DNA replication fidelity and translesion activity. Mol. Cell. Biol. 24, 2734–2746 (2004)
Nick McElhinny, S. A. et al. Genome instability due to ribonucleotide incorporation into DNA. Nature Chem. Biol. 6, 774–781 (2010)
Reijns, M. A. et al. Enzymatic removal of ribonucleotides from DNA is essential for mammalian genome integrity and development. Cell 149, 1008–1022 (2012)
Sparks, J. L. et al. RNase H2-initiated ribonucleotide excision repair. Mol. Cell 47, 980–986 (2012)
Lujan, S. A., Williams, J. S., Clausen, A. R., Clark, A. B. & Kunkel, T. A. Ribonucleotides are signals for mismatch repair of leading-strand replication errors. Mol. Cell 50, 437–443 (2013)
Lujan, S. A. et al. Mismatch repair balances leading and lagging strand DNA replication fidelity. PLoS Genet. 8, e1003016 (2012)
Miyabe, I., Kunkel, T. A. & Carr, A. M. The major roles of DNA polymerases ε and δ at the eukaryotic replication fork are evolutionarily conserved. PLoS Genet. 7, e1002407 (2011)
Nick McElhinny, S. A., Gordenin, D. A., Stith, C. M., Burgers, P. M. & Kunkel, T. A. Division of labor at the eukaryotic replication fork. Mol. Cell 30, 137–144 (2008)
Larrea, A. A. et al. Genome-wide model for the normal eukaryotic DNA replication fork. Proc. Natl Acad. Sci. USA 107, 17674–17679 (2010)
Pursell, Z. F., Isoz, I., Lundstrom, E. B., Johansson, E. & Kunkel, T. A. Yeast DNA polymerase epsilon participates in leading-strand DNA replication. Science 317, 127–130 (2007)
Raghuraman, M. K. et al. Replication dynamics of the yeast genome. Science 294, 115–121 (2001)
Nick McElhinny, S. A. et al. Abundant ribonucleotide incorporation into DNA by yeast replicative polymerases. Proc. Natl Acad. Sci. USA 107, 4949–4954 (2010)
Kao, H. I. & Bambara, R. A. The protein components and mechanism of eukaryotic Okazaki fragment maturation. Crit. Rev. Biochem. Mol. Biol. 38, 433–452 (2003)
Chon, H. et al. RNase H2 roles in genome integrity revealed by unlinking its activities. Nucleic Acids Res. 41, 3130–3143 (2013)
Boyle, A. P. et al. High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res. 21, 456–464 (2011)
Cooper, G. M. et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 15, 901–913 (2005)
Schmidt, S. et al. Hypermutable non-synonymous sites are under stronger negative selection. PLoS Genet. 4, e1000281 (2008)
Vengrova, S. & Dalgaard, J. Z. The wild-type Schizosaccharomyces pombe mat1 imprint consists of two ribonucleotides. EMBO Rep. 7, 59–65 (2006)
Ghodgaonkar, M. M. et al. Ribonucleotides misincorporated into DNA act as strand-discrimination signals in eukaryotic mismatch repair. Mol. Cell 50, 323–332 (2013)
Liberti, S. E., Larrea, A. A. & Kunkel, T. A. Exonuclease 1 preferentially repairs mismatches generated by DNA polymerase α. DNA Repair 12, 92–96 (2013)
Burgess, R. J. & Zhang, Z. Histone chaperones in nucleosome assembly and human disease. Nature Struct. Mol. Biol. 20, 14–22 (2013)
Villar, D., Flicek, P. & Odom, D. T. Evolution of transcription factor binding in metazoans - mechanisms and functional implications. Nature Rev. Genet. 15, 221–233 (2014)
Clausen, A. R. et al. Tracking replication enzymology in vivo by genome-wide mapping of ribonucleotide incorporation. Nature Struct. Mol. Biol. http://dx.doi.org/10.1038/nsmb.2957 (2015)
Koh, K. D., Balachander, S., Hesselberth, J. R. & Storici, F. Ribose-seq: global mapping of ribonucleotides embedded in genomic DNA. Nature Methods http://dx.doi.org/10.1038/nmeth.3259 (2015)
Daigaku, Y. et al. A global profile of replicative polymerase usage. Nature Struct. Mol. Biol. (in the press)
Kuhn, R. M., Haussler, D. & Kent, W. J. The UCSC genome browser and associated tools. Brief. Bioinform. 14, 144–161 (2013)
Derrien, T. et al. Fast computation and applications of genome mappability. PLoS ONE 7, e30377 (2012)
Eaton, M. L., Galani, K., Kang, S., Bell, S. P. & MacAlpine, D. M. Conserved nucleosome positioning defines replication origins. Genes Dev. 24, 748–753 (2010)
Hesselberth, J. R. et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nature Methods 6, 283–289 (2009)
Jiang, C. & Pugh, B. F. A compiled and systematic reference map of nucleosome positions across the Saccharomyces cerevisiae genome. Genome Biol. 10, R109 (2009)
Liti, G. et al. Population genomics of domestic and wild yeasts. Nature 458, 337–341 (2009)
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007)
Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011)
Korhonen, J., Martinmaki, P., Pizzi, C., Rastas, P. & Ukkonen, E. MOODS: fast search for position weight matrix matches in DNA sequences. Bioinformatics 25, 3181–3182 (2009)
Mathelier, A. et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 42, D142–D147 (2014)
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)
Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012)
Reijns, M. A. et al. The structure of the human RNase H2 complex defines key interaction interfaces relevant to enzyme function and human disease. J. Biol. Chem. 286, 10530–10539 (2011)
We thank N. Hastie and P. Burgers for discussions, I. Adams, J. Caceres, T. Aitman and P. Heyn for comments on the manuscript, and A. Gallacher for technical assistance. We are indebted to J. Williams, A. Clausen and T. Kunkel for sharing yeast strains and unpublished data, and to S. Cerritelli and R. Crouch for RNH201 vectors. Funding: MRC Centenary Award to M.A.M.R.; MRC and Lister Institute of Preventive Medicine to A.P.J.; MRC and Medical Research Foundation to M.S.T.
The authors declare no competing financial interests.
Extended data figures and tables
Extended Data Figure 1 Increased OJ and polymorphism rates correlate at binding sites of different nucleosome classes and at Rap1 binding sites.
a–f, OJ and polymorphism rates are strongly correlated for different classes of nucleosomes. Data presented as in Fig. 1a, for different sub-classes of S. cerevisiae nucleosomes, demonstrating that OJ and polymorphism rates co-vary in all cases. Transcription start site proximal nucleosomes (d) are probably subject to strong and asymmetrically distributed selective constraints, which is likely to explain the modestly reduced correlation for this subset. Such transcription start site proximal nucleosomes were excluded from analyses of other categories presented (b, c, e, f), except ‘all nucleosomes’ (a). g, OJ and polymorphism rates are correlated for the S. cerevisiae TF, Rap1. Data presented, as for Reb1 in Fig. 1b, show increased OJ and polymorphism rates around its binding site, with a dip corresponding to its central recognition sequence. h–j, Increased polymorphism and OJ rates at Rap1 (h), nucleosome (i) and Reb1 (j) binding sites are not caused by biases in nucleotide content. Distributions calculated as for g, Fig. 1a and b, respectively, using a trinucleotide preserving genome shuffle. Pink shaded areas denote 95% confidence intervals for nucleotide substitution rates (100 shuffles). k, l, Polymorphism (red) and between-species (black) substitution rates are highly correlated for nucleosome (k) and Reb1 (l) binding sites. Best fit splines shown only. y axes scaled to demonstrate similar shape distribution. Values plotted as percentage relative to the mean rate for all data points (central 11 nucleotides excluded for calculation of mean in g, l).
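The normalization described at the end of this legend (values expressed as a percentage of the mean rate, with the central 11 nucleotides optionally left out of the mean) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name, the `exclude_half_width` parameter and the toy rates are assumptions introduced here.

```python
def percent_of_mean(positions, rates, exclude_half_width=None):
    """Express each rate as a percentage of the mean rate.

    positions: offsets relative to the binding-site midpoint (0 = midpoint).
    rates: substitution or OJ rate at each position.
    exclude_half_width: if set to 5, positions -5..+5 (the central 11
        nucleotides) are left out when computing the mean (an assumption
        about how the exclusion is applied).
    """
    if exclude_half_width is None:
        kept = list(rates)
    else:
        kept = [r for p, r in zip(positions, rates) if abs(p) > exclude_half_width]
    if not kept:
        raise ValueError("no positions left to compute the mean")
    mean_rate = sum(kept) / len(kept)
    return [100.0 * r / mean_rate for r in rates]


# Hypothetical toy data: flanks at rate 0.02, constrained centre at 0.005.
positions = list(range(-7, 8))
rates = [0.005 if abs(p) <= 5 else 0.02 for p in positions]
relative = percent_of_mean(positions, rates, exclude_half_width=5)
```

With the centre excluded from the mean, the flanking positions sit at 100% and the constrained centre appears as a dip below it.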
a, Schematic of emRiboSeq library preparation. rN, ribonucleotide. b–d, Validation of strand-specific detection of enzymatically generated nicks through linker-ligation. Nb.BtsI nicking endonuclease cleaves the bottom strand of its recognition site releasing a 5′ fragment (cyan) with a free 3′-OH group after denaturation, to which the sequencing adaptor (pink) is ligated, allowing sequencing and mapping of this site to the genome (b). Nb.BtsI libraries have high reproducibility between Δrnh201 POL and Δrnh201 Pol-α* (pol1-L868M) strains after normalizing read counts to sequence tags per million (TPM). Bona fide Nb.BtsI sites were equally represented, at maximal frequency, in both libraries (c). Those with lower frequencies represented sites in close proximity to other Nb.BtsI sites, causing their partial loss during size selection. Additionally, Nb.BtsI-like sites were detected as the result of star activity. Libraries were also prepared using BciVI restriction enzyme digestion, which did not show such star activity (data not shown), allowing calculation of the site specificity for the method (>99.9%). Summed signal at Nb.BtsI sites shows >99.9% strand specificity (blue, correct strand; grey, opposite strand) and >99% single nucleotide resolution (d).
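Normalizing read counts to tags per million, as mentioned in this legend, makes libraries of different sequencing depth comparable. A minimal sketch, assuming per-site raw counts are held in a dictionary; the function name, site labels and counts are hypothetical and not taken from the study:

```python
def tags_per_million(site_counts):
    """Scale raw per-site read counts so the whole library sums to one million."""
    total = sum(site_counts.values())
    if total == 0:
        raise ValueError("library contains no reads")
    return {site: count * 1_000_000 / total for site, count in site_counts.items()}


# Hypothetical libraries sequenced to different depths (10x difference).
library_a = {"site1": 50, "site2": 30, "site3": 20}
library_b = {"site1": 500, "site2": 300, "site3": 200}

tpm_a = tags_per_million(library_a)
tpm_b = tags_per_million(library_b)
```

After scaling, the two toy libraries give identical per-site values, which is the sense in which site representation can be compared between strains.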
a, Point mutations in replicative polymerases elevate ribonucleotide incorporation rates, permitting their contribution to genome synthesis to be tracked. Schematic of replication fork with polymerases and their ribonucleotide incorporation rates (refs 27, 30 and J. S. Williams, A. R. Clausen & T. A. Kunkel, personal communication) as indicated (POL denotes wild-type polymerases; asterisk denotes point mutants). Embedded ribonucleotides indicated by ‘R’; additional incorporation events due to polymerase mutations highlighted by shaded circles. b, c, Mapping of leading/lagging-strand synthesis by Pol-δ* and Pol-ε* yeast strains using emRiboSeq (as in Fig. 3) highlights both experimentally validated (pink dotted lines) and putative (grey dotted lines) replication origins. These often correspond to regions of early replicating DNA (ref. 36) (c). d, Pol-α*. y axes show log2 of the strand ratio calculated in 2,001-nucleotide windows (b–d).
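The log2 strand ratio plotted on the y axes can be illustrated with a short sketch. It assumes per-nucleotide counts for each strand, consecutive non-overlapping windows, and a pseudocount to avoid division by zero; the windowing scheme, the pseudocount and all names here are assumptions for illustration, not details confirmed by the legend.

```python
import math


def log2_strand_ratio(forward, reverse, window=2001, pseudocount=1.0):
    """Return log2(forward / reverse) summed signal for consecutive windows.

    forward, reverse: per-nucleotide counts on each strand (equal length).
    pseudocount: added to both strands so empty windows stay finite
        (an assumption, not stated in the source).
    """
    if len(forward) != len(reverse):
        raise ValueError("strand tracks must have equal length")
    ratios = []
    for start in range(0, len(forward), window):
        fwd = sum(forward[start:start + window]) + pseudocount
        rev = sum(reverse[start:start + window]) + pseudocount
        ratios.append(math.log2(fwd / rev))
    return ratios


# Toy example with a window of 4 nucleotides instead of 2,001.
forward = [3, 3, 3, 3, 0, 0, 0, 0]
reverse = [0, 0, 0, 0, 3, 3, 3, 3]
ratios = log2_strand_ratio(forward, reverse, window=4)
```

In this toy track the ratio changes sign between the two windows, the kind of switch in strand preference that would be expected where the predominant strand changes.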
Extended Data Figure 4 Quantification of in vivo ribonucleotide incorporation by replicative polymerases.
a, b, Representative alkaline gel electrophoresis of genomic DNA from yeast strains with mutant replicative DNA polymerases (a), with accompanying densitometry plots (b). Embedded ribonucleotides are detected by increased fragmentation of genomic DNA following alkaline treatment in an RNase H2-deficient (Δrnh201) background. Increased rates are seen with all three mutant polymerases (indicated by asterisk, as defined in Extended Data Fig. 3a), and are reduced in Pol-ε′ which contains the point mutation Met644Leu, a mutation that increases selectivity for dNTPs over rNTPs (ref. 27). c, Quantification of average ribonucleotide incorporation in polymerase mutants from four independent experiments. DNA isolated from mid-log phase cultures; error bars denote s.e.m. Overall ribonucleotide content is the product of incorporation frequency and the total contribution of each polymerase, so that the total ribonucleotide content detected is highest for Pol-ε*. d, Most of the yeast genome exhibits directional asymmetry in replication (median 4:1 strand ratio). Count of genomic segments calculated for consecutive 2,001-nucleotide windows over the yeast genome based on reanalysis of OF sequencing data (ref. 17) denoted as ‘Okazaki-seq’. The strand asymmetry ratio was calculated after re-orienting all regions such that the predominant lagging strand was the forward strand. e–g, Genome-wide quantification of strand-specific incorporation of wild-type and mutant replicative DNA polymerases determined by emRiboSeq reflects their roles in leading- and lagging-strand replication. A close to linear correlation with Okazaki-seq strand ratios is observed. The strand ratio preference for lagging-strand ribonucleotide incorporation for independent libraries (including stationary phase libraries for POL and Pol-α*; f and g). f, g, Scatter plots illustrating the individual strand ratio data points for 2,001-nucleotide windows, for stationary phase POL (f) and Pol-α* (g) yeast.
Pearson’s correlation = 0.49, P < 2.2 × 10⁻¹⁶ for POL (f); correlation = 0.75, P < 2.2 × 10⁻¹⁶ for Pol-α* (g).
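For readers who want to reproduce this kind of comparison, Pearson's correlation between two per-window strand-ratio tracks can be computed as in the sketch below. The function and the toy values are illustrative assumptions; they are not the study's data and do not reproduce the reported coefficients or P values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 strand ratios for five windows from two methods.
okazaki_seq = [-2.0, -1.0, 0.0, 1.0, 2.0]
emriboseq = [-1.0, -0.4, 0.1, 0.5, 0.9]
r = pearson_r(okazaki_seq, emriboseq)
```

A coefficient near 1 indicates that windows with a strong lagging-strand bias in one data set show the same bias in the other.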
Extended Data Figure 5 Pol-α-synthesized DNA retention is independent of RNase H2 processing of RNA primers.
a, b, The ribonucleotide content of genomic DNA is unchanged between Δrnh201 strains transformed with empty vector (−) or vector expressing Rnh201 separation-of-function mutant (sf), which retains the ability to cleave RNA:DNA hybrids, including RNA primers, but cannot cleave single embedded ribonucleotides (ref. 39). In contrast, the same vector expressing wild-type Rnh201 (wt) fully rescues alkaline sensitivity of the DNA. As complementation with the separation-of-function mutant had no detectable effect on the ribonucleotide content seen in the Pol-α(Leu868Met) Δrnh201 strain, retention of Pol-α-synthesized DNA appears to be independent of a putative role for RNase H2 in RNA primer removal. Representative result shown for n = 3 independent experiments. c, Wild-type and mutant Rnh201 are expressed at equal levels, as shown by immuno-detection of the C-terminal FLAG tag. Loading control, actin.
Extended Data Figure 6 Elevated substitution rates are observed adjacent to many human TF binding sites.
a–d, Nucleotide substitution rates (plotted as GERP scores) are elevated immediately adjacent to REST (a, b) and CTCF binding sites (c, d). Colour intensity shows quartiles of ChIP-seq peak height (pink to brown: lower to higher), reflecting strength of binding/occupancy. Stronger binding correlates with greater increases of proximal substitution rate in the ‘shoulder’ region (asterisk). Increased substitution rates are not a consequence of local sequence composition effects (b, d). Strongest binding quartile of sites (brown) is shown compared to a trinucleotide preserving shuffle (black) based on the flanking sequence (100–300 nucleotides from motif midpoint) of the same genomic locations. Brown dashed line and grey shading denote 95% confidence intervals. e, Substitution rates plotted as GERP scores for human TF binding sites identified in ChIP-seq data sets (in conjunction with binding site motif). Sites aligned (x = 0) on the midpoint of the TF binding site within the ChIP-seq peak (colours as for a–d). Dashed black line shows y = 0, the genome wide expectation for neutral evolution.
a, b, DNase I footprint edges correspond, genome-wide, to increased OJ rates and locally elevated polymorphism rates in S. cerevisiae (a), a pattern that is maintained when footprints associated with Reb1 and Rap1 binding sites are excluded (b). Genome-wide DNase I footprints (n = 6,063) and excluding those within 50 nucleotides of a Reb1 or Rap1 binding site (n = 5,136) were aligned to their midpoint. c, d, Aligning DNase I footprints on their left edge rather than midpoint (to compensate for substantial heterogeneity in footprint size) demonstrates a distinct shoulder of elevated polymorphism rate at the aligned edge (c), with a significant elevation compared to nearby sequence upstream from the footprint (d). DNase I footprints from a were aligned to their left edge (x = 0) with corresponding polymorphism rates shown (c). The increased polymorphism rate cannot be explained by local sequence compositional distortions (d). Nucleotide substitution rates in the 11 nucleotides centred on the DNase footprint edge (pink line), and another 11 nucleotides encompassing positions −35 to −25 relative to the footprint edge (green line) were quantified. Darker pink and green filled circles denote the mean of observed substitution rates and lighter shades denote the mean for the same sites after trinucleotide preserving genomic shuffles. Error bars denote s.d.; statistics by Mann–Whitney test. e, Model shows that correlation of increased nucleotide substitution and OJ rates are consistent with increased mutation frequency across heterogeneous DNase I footprints. Polymorphism is reduced at sequence-specific binding sites within the footprints, owing to functional constraint. Therefore, the effect of OF-related mutagenesis in these regions is most sensitively detected in the region immediately adjacent to the binding site (left of vertical dashed blue line, representing footprints aligned to their left edge). 
This ‘shoulder’ of increased nucleotide substitutions, which represents sites with increased OJ-associated mutation, is followed by a region of depressed substitution rates, owing to selective effects of the functional binding sites within the footprints (to the right of the dashed blue line). Signals further to the right are not interpretable given the heterogeneity in DNase I footprint sizes. Given strong selection at TF and DNase I footprint sites, this ‘shoulder’ of elevated nucleotide substitutions could represent a measure of the local mutation rate for such regions, analogous to that measured by the fourfold degenerate sites in protein coding sequence.
a, OF priming occurs stochastically, with the 5′ end of each OF initially synthesized by Pol-α and the remainder of the OF synthesized by Pol-δ. b, c, OF processing: when Pol-δ encounters the previously synthesized OF, Pol-δ continues to synthesize DNA, displacing the 5′ end of the downstream OF, which is removed by nucleases to result in mature OFs which are then ligated. The OJs of such mature OFs before ligation were detected previously (ref. 17) after depletion of temperature-sensitive DNA ligase I. That study demonstrated that if a protein barrier is encountered (grey circle), Pol-δ progression is impaired, leading to reduced removal of the downstream OF (b). Given that ∼1.5% of the mature genome is synthesized by Pol-α, a proportion of lagging strands will retain Pol-α-synthesized DNA (red). When Pol-δ progression is impaired by protein binding, this will lead to an increased fraction of fragments containing Pol-α-synthesized DNA downstream of such sites (c).
Cite this article
Reijns, M., Kemp, H., Ding, J. et al. Lagging-strand replication shapes the mutational landscape of the genome. Nature 518, 502–506 (2015). https://doi.org/10.1038/nature14183