Differential DNA mismatch repair underlies mutation rate variation across the human genome

This article has been updated


Cancer genome sequencing has revealed considerable variation in somatic mutation rates across the human genome, with mutation rates elevated in heterochromatic late replicating regions and reduced in early replicating euchromatin (refs 1–5). Multiple mechanisms have been suggested to underlie this (refs 2, 6–10), but the actual cause is unknown. Here we identify variable DNA mismatch repair (MMR) as the basis of this variation. Analysing 17 million single-nucleotide variants from the genomes of 652 tumours, we show that regional autosomal mutation rates at megabase resolution are largely stable across cancer types, with differences related to changes in replication timing and gene expression. However, mutations arising after the inactivation of MMR are no longer enriched in late replicating heterochromatin relative to early replicating euchromatin. Thus, differential DNA repair and not differential mutation supply is the primary cause of the large-scale regional mutation rate variation across the human genome.


Figure 1: Changes in megabase-scale regional mutation rate variation between tumour samples.
Figure 2: Reduced regional mutation rate variability in genomes of MSI cancer samples.
Figure 3: Association of mutational signatures to MSI and to replication timing.
Figure 4: Inferring the time of MMR failure by a deconvolution of the mutational signatures.

Change history

  • 26 February 2015

    A sentence was edited in the abstract to match the authors’ findings


References

  1. Hodgkinson, A., Chen, Y. & Eyre-Walker, A. The large-scale distribution of somatic mutations in cancer genomes. Hum. Mutat. 33, 136–143 (2012)
  2. Schuster-Böckler, B. & Lehner, B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488, 504–507 (2012)
  3. Woo, Y. H. & Li, W.-H. DNA replication timing and selection shape the landscape of nucleotide variation in cancer genomes. Nature Commun. 3, 1004 (2012)
  4. Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2010)
  5. Liu, L., De, S. & Michor, F. DNA replication timing and higher-order nuclear organization determine single-nucleotide substitution patterns in cancer genomes. Nature Commun. 4, 1502 (2013)
  6. Stamatoyannopoulos, J. A. et al. Human mutation rate associated with DNA replication timing. Nature Genet. 41, 393–395 (2009)
  7. Waters, L. S. & Walker, G. C. The critical mutagenic translesion DNA polymerase Rev1 is highly expressed during G2/M phase rather than S phase. Proc. Natl Acad. Sci. USA 103, 8971–8976 (2006)
  8. Hsu, T. C. A possible function of constitutive heterochromatin: the bodyguard hypothesis. Genetics 79 (suppl.), 137–150 (1975)
  9. Sima, J. & Gilbert, D. M. Complex correlations: replication timing and mutational landscapes during cancer and genome evolution. Curr. Opin. Genet. Dev. 25, 93–100 (2014)
  10. Chen, C.-L. et al. Impact of replication timing on non-CpG and CpG substitution rates in mammalian genomes. Genome Res. 20, 447–457 (2010)
  11. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013)
  12. Jäger, N. et al. Hypermutation of the inactive X chromosome is a frequent event in cancer. Cell 155, 567–581 (2013)
  13. The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012)
  14. The Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013)
  15. The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of gastric adenocarcinoma. Nature 513, 202–209 (2014)
  16. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013)
  17. Helleday, T., Eshtad, S. & Nik-Zainal, S. Mechanisms underlying mutational signatures in human cancers. Nature Rev. Genet. 15, 585–598 (2014)
  18. Hombauer, H., Srivatsan, A., Putnam, C. D. & Kolodner, R. D. Mismatch repair, but not heteroduplex rejection, is temporally coupled to DNA replication. Science 334, 1713–1716 (2011)
  19. Edelbrock, M. A., Kaliyaperumal, S. & Williams, K. J. DNA mismatch repair efficiency and fidelity are elevated during DNA synthesis in human cells. Mutat. Res. 662, 59–66 (2009)
  20. Amouroux, R., Campalans, A., Epe, B. & Radicella, J. P. Oxidative stress triggers the preferential assembly of base excision repair complexes on open chromatin regions. Nucleic Acids Res. 38, 2878–2890 (2010)
  21. Chaudhuri, S., Wyrick, J. J. & Smerdon, M. J. Histone H3 Lys79 methylation is required for efficient nucleotide excision repair in a silenced locus of Saccharomyces cerevisiae. Nucleic Acids Res. 37, 1690–1700 (2009)
  22. Murga, M. et al. Global chromatin compaction limits the strength of the DNA damage response. J. Cell Biol. 178, 1101–1108 (2007)
  23. Hiratani, I. et al. Genome-wide dynamics of replication timing revealed by in vitro models of mouse embryogenesis. Genome Res. 20, 155–169 (2010)
  24. Lubelsky, Y. et al. DNA replication and transcription programs respond to the same chromatin cues. Genome Res. 24, 1102–1114 (2014)
  25. Hiratani, I. et al. Global reorganization of replication domains during embryonic stem cell differentiation. PLoS Biol. 6, e245 (2008)
  26. Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. Bioinformatics 28, 1811–1817 (2012)
  27. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotechnol. 31, 213–219 (2013)
  28. Roberts, N. D. et al. A comparative analysis of algorithms for somatic SNV detection in cancer. Bioinformatics 29, 2223–2230 (2013)
  29. The Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013)
  30. Derrien, T. et al. Fast computation and applications of genome mappability. PLoS ONE 7, e30377 (2012)
  31. Costello, M. et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 41, e67 (2013)
  32. Kim, T.-M., Laird, P. W. & Park, P. J. The landscape of microsatellite instability in colorectal and endometrial cancer genomes. Cell 155, 858–868 (2013)
  33. Pawlik, T. M., Raut, C. P. & Rodriguez-Bigas, M. A. Colorectal carcinogenesis: MSI-H versus MSI-L. Dis. Markers 20, 199–206 (2004)
  34. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011)
  35. Wagner, G. P., Kin, K. & Lynch, V. J. Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory Biosci. 131, 281–285 (2012)
  36. Hansen, R. S. et al. Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc. Natl Acad. Sci. USA 107, 139–144 (2010)
  37. Thurman, R. E., Day, N., Noble, W. S. & Stamatoyannopoulos, J. A. Identification of higher-order functional domains in the human ENCODE regions. Genome Res. 17, 917–927 (2007)
  38. Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837 (2007)
  39. Jackson, D. A. Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches. Ecology 74, 2204–2214 (1993)
  40. Mebane, W. R. & Sekhon, J. S. Genetic optimization using derivatives: the rgenoud package for R. J. Stat. Softw. 42, 473–487 (2010)



This work was supported by grants from the Spanish Ministry of Economy and Competitiveness (BFU2011-26206 and ‘Centro de Excelencia Severo Ochoa 2013-2017’ SEV-2012-0208), a European Research Council Consolidator grant IR-DC (616434), Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR), the EMBO Young Investigator Program, the EMBL-CRG Systems Biology Program, FP7 project 4DCellFate (277899), FP7 project MAESTRA (ICT-2013-612944) and by Marie Curie Actions.

Author information




F.S. performed all analyses. F.S. and B.L. designed analyses, interpreted the data and wrote the manuscript.

Corresponding author

Correspondence to Ben Lehner.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Overall mutational burden and megabase-scale regional rate variability in tumour samples of MSI-prone cancer types.

a, b, Correlations of tissue specificity (TS; see Methods) in regional mutation rates of diffuse large B-cell lymphoma (DLBC) with TS of gene expression in DLBC (a), or with TS of replication timing in the Gm12878 lymphoblastoid cell line (b). c, Overall mutational load, as SNVs per Mb of alignable genomic DNA (Methods) for MSI-H, MSS (includes MSI-L), PolE mutant tumours, or otherwise hypermutated tumour samples. d, PC plot with PCs 3 and 4, as in Fig. 1e, but showing only tumour samples for colorectal (CRAD), uterine (UCEC) and stomach (STAD) cancers for visual emphasis. e, f, Relative SNV frequencies across 1 Mb windows of chromosome 1p in UCEC and STAD. Unbroken and dotted lines are the median across tumour samples and its 95% confidence interval, respectively. For each tumour sample, relative mutation frequencies are always obtained by dividing by the mean of all 1 Mb windows. MSI/PolE samples are in the MSI-H group; hyper/ultramutators are not in the MSS group.
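The normalization described in this legend (each sample's window counts divided by that sample's mean over all 1 Mb windows, then summarized as a median across samples) can be sketched as follows. This is a minimal illustration only: the function names and toy counts are hypothetical and not taken from the paper, and the 95% confidence interval of the median is omitted.

```python
from statistics import mean, median

def relative_frequencies(window_counts):
    """Divide each 1 Mb window's SNV count by the sample's mean count over all windows."""
    m = mean(window_counts)
    if m == 0:
        raise ValueError("sample has no mutations; relative frequency is undefined")
    return [c / m for c in window_counts]

def median_profile(samples):
    """Median relative frequency per window across samples.

    `samples` is a list of per-sample count lists, all covering the same windows.
    """
    rel = [relative_frequencies(s) for s in samples]
    return [median(col) for col in zip(*rel)]

# Toy data (hypothetical): three samples, four windows each
samples = [[2, 4, 6, 8], [10, 10, 20, 40], [1, 1, 1, 1]]
profile = median_profile(samples)
```

Dividing by the per-sample mean puts hypermutated and lightly mutated samples on the same scale, so the profile reflects where mutations fall rather than how many there are.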

Extended Data Figure 2 Reduced correlation of regional mutation rates to gene expression, heterochromatin and replication timing in genomes and exomes of MSI tumours.

a–c, The 1 Mb windows in the genome were pooled into five equal-frequency bins by the average gene expression levels (log₂ transcripts per million (TPM)) in each window. The median and interquartile range of relative mutation rates across 1 Mb windows is shown for each bin. R² values were always determined on original (not binned) data. P < 0.01 for difference of R after Fisher Z-transform. Gene expression levels are medians over TPM across 15 cancer types. Relative SNV frequencies of each tumour sample were obtained by normalizing by the average SNV density of all genomic 1 Mb windows of that sample. Prior to binning the windows, cancer samples in a group were combined by taking the median of the relative mutation frequencies for each 1 Mb window, as illustrated for CRAD in Fig. 2d. PolE/MSI samples are in the MSI group; ultramutators are not in the MSS group. MSI-L samples are pooled with MSS. d–f, Same as in a–c but for five heterochromatin bins (median H3K9me3 signal over eight tissues and cell lines). g–i, Regional mutation rates in exome sequences of a broader set of 195 MSI-H tumour samples. The 1,709 genomic 1 Mb windows with at least 5 kb alignable protein-coding DNA each were grouped into five equal-frequency bins by the median Repli-Seq signal over 11 cell lines (Methods). Mutations were pooled across all samples in one cancer type with a known MSI-H or MSS status (Methods). a is the slope of the regression line fit to binned data. j, Slopes a determined for individual cancer exomes with a sufficient number of mutations (≥50 SNVs). Number of samples n shown below each group. For all cancer types, MSI-H samples have significantly less negative slopes than MSS (P < 0.01, Mann–Whitney test, one tailed). MSI-H also includes the MSI-H/PolE mutant samples, and MSS includes the MSI-L samples. In the exome analyses, ultramutators were not considered separately.
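Two steps named in this legend, equal-frequency binning of windows by a covariate and a Fisher Z-transform comparison of two correlation coefficients, can be sketched as below. This is a hedged illustration, not the authors' code: the function names are hypothetical, and the Z-test shown is the textbook form for two independent correlations, which is an assumption, since the legend does not state how the comparison was set up.

```python
import math
from statistics import median

def equal_frequency_bins(values, n_bins=5):
    """Assign each value a bin index 0..n_bins-1 so bins hold (nearly) equal numbers of items."""
    n = len(values)
    if n < n_bins:
        raise ValueError("need at least as many values as bins")
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n
    return bins

def binned_medians(covariate, rates, n_bins=5):
    """Median relative mutation rate within each equal-frequency bin of the covariate."""
    grouped = [[] for _ in range(n_bins)]
    for b, r in zip(equal_frequency_bins(covariate, n_bins), rates):
        grouped[b].append(r)
    return [median(g) for g in grouped]

def fisher_z_difference(r1, n1, r2, n2):
    """Z statistic and two-sided P value for a difference between two Pearson correlations.

    Assumes the two correlations come from independent samples of sizes n1 and n2.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 observations")
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher Z-transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p
```

For example, `fisher_z_difference(-0.8, 2500, -0.3, 2500)` (made-up correlations and window counts) compares a strong and a weak negative correlation; the correlations themselves would be computed on unbinned windows, as the legend specifies for R².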

Extended Data Figure 3 Association of mutational signatures to microsatellite instability and to replication timing.

a, Relative frequencies of the 96 mutation contexts (strand symmetric) in MSI versus MSS cancers; the MSS group includes MSI-L samples but not MSS/PolE ultramutators. Mutations were pooled across samples of MSI-prone tissues (CRAD, UCEC and STAD). b, c, Similar to Fig. 3a, b, showing two additional examples of mutational contexts with different MSI propensities and their relative mutation rates across five genomic replication timing bins. d, Lack of correlation between the MSI propensity of a mutational context with its replication timing slope in MSS tumour samples (compare to Fig. 3c, which shows slopes in MSI samples). Ts, transition; Tv, transversion. e, f, Association of per cent MSI-specific signatures (CCN > CAN + GCN > GTN + [C/T]AN > [C/T]GN) across cancer samples and the binned replication timing slopes for two non-MSI transition signatures in the same samples. Slopes averaged over contexts are displayed in each plot. In all panels except a, mutation rates were normalized to number of nucleotides at risk in a 1 Mb window before determining the replication timing slopes.
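The "96 mutation contexts (strand symmetric)" arise from folding each substitution and its two flanking bases onto the strand where the mutated base is a pyrimidine. A minimal sketch of that folding follows; the function names are hypothetical and the pyrimidine-reference convention is an assumption (it is the common one, but the legend does not spell it out).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def canonical_context(trinucleotide, alt):
    """Return (trinucleotide, alt) expressed on the strand where the middle base is C or T."""
    ref = trinucleotide[1]
    if ref == alt:
        raise ValueError("alt allele must differ from the reference base")
    if ref in "CT":
        return trinucleotide, alt
    # Purine reference: reverse-complement the context and complement the alt allele
    rc = "".join(COMPLEMENT[b] for b in reversed(trinucleotide))
    return rc, COMPLEMENT[alt]

def all_contexts():
    """Enumerate every distinct strand-symmetric context."""
    bases = "ACGT"
    return sorted({
        canonical_context(five + ref + three, alt)
        for five in bases for ref in bases for three in bases
        for alt in bases if alt != ref
    })

contexts = all_contexts()
```

The 192 raw combinations (64 trinucleotides × 3 alternative bases) collapse to 96 because each purine-centred change is the same event as a pyrimidine-centred change read from the opposite strand.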

Extended Data Figure 4 The deconvolution of MSI mutational spectra robustly converges onto two equivalent solutions.

a, Agreement of the observed relative frequencies of mutational contexts in each tumour sample with the predictions of model 1 (having median a, b and z coefficients across all solutions in cluster 1). b, Sets of best-fit solutions determined in a hundred optimization runs initialized with different starting conditions. The solutions cluster into two homogeneous clusters (Pearson R > 0.9 between >90% of the solutions within a cluster, in UPGMA hierarchical clustering). c, d, Solutions within both clusters have similar fit to observed data (c) and make extremely similar predictions for mutation spectra in tumour samples (d). e–h, Similar to Fig. 4a, b. Example mutation accumulation diagrams for two mutation contexts typical of MSI tumours, shown for an example MSI tumour TCGA-BR-4280 (e, g) and for an MSS tumour TCGA-CD-8529 (f, h). i, j, Values of the parameters in two solution clusters, with medians and interquartile ranges (shown as whiskers). Each solution encompasses 104 parameters: relative mutation rates a and b for each of 28 mutational contexts (i), and the relative pre-MMR failure time z for each tumour sample of the 24 MSI and 24 MSS samples (j).
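The parameter count stated in this legend follows directly from the numbers it gives: two rates per mutational context plus one z per tumour sample. As a worked check:

```latex
\underbrace{2 \times 28}_{a,\ b\ \text{per context}}
\;+\;
\underbrace{24 + 24}_{z\ \text{per MSI and MSS sample}}
\;=\; 56 + 48 \;=\; 104
```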

Supplementary information

Supplementary Table 1

This table contains genome data sources. (XLSX 53 kb)


Cite this article

Supek, F., Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015).
