Cancer genome sequencing has revealed considerable variation in somatic mutation rates across the human genome, with mutation rates elevated in heterochromatic late replicating regions and reduced in early replicating euchromatin (refs 1–5). Multiple mechanisms have been suggested to underlie this (refs 2, 6–10), but the actual cause is unknown. Here we identify variable DNA mismatch repair (MMR) as the basis of this variation. Analysing ∼17 million single-nucleotide variants from the genomes of 652 tumours, we show that regional autosomal mutation rates at megabase resolution are largely stable across cancer types, with differences related to changes in replication timing and gene expression. However, mutations arising after the inactivation of MMR are no longer enriched in late replicating heterochromatin relative to early replicating euchromatin. Thus, differential DNA repair and not differential mutation supply is the primary cause of the large-scale regional mutation rate variation across the human genome.
Hodgkinson, A., Chen, Y. & Eyre-Walker, A. The large-scale distribution of somatic mutations in cancer genomes. Hum. Mutat. 33, 136–143 (2012)
Schuster-Böckler, B. & Lehner, B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488, 504–507 (2012)
Woo, Y. H. & Li, W.-H. DNA replication timing and selection shape the landscape of nucleotide variation in cancer genomes. Nature Commun. 3, 1004 (2012)
Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2010)
Liu, L., De, S. & Michor, F. DNA replication timing and higher-order nuclear organization determine single-nucleotide substitution patterns in cancer genomes. Nature Commun. 4, 1502 (2013)
Stamatoyannopoulos, J. A. et al. Human mutation rate associated with DNA replication timing. Nature Genet. 41, 393–395 (2009)
Waters, L. S. & Walker, G. C. The critical mutagenic translesion DNA polymerase Rev1 is highly expressed during G2/M phase rather than S phase. Proc. Natl Acad. Sci. USA 103, 8971–8976 (2006)
Hsu, T. C. A possible function of constitutive heterochromatin: the bodyguard hypothesis. Genetics 79 (suppl.). 137–150 (1975)
Sima, J. & Gilbert, D. M. Complex correlations: replication timing and mutational landscapes during cancer and genome evolution. Curr. Opin. Genet. Dev. 25, 93–100 (2014)
Chen, C.-L. et al. Impact of replication timing on non-CpG and CpG substitution rates in mammalian genomes. Genome Res. 20, 447–457 (2010)
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013)
Jäger, N. et al. Hypermutation of the inactive X chromosome is a frequent event in cancer. Cell 155, 567–581 (2013)
The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012)
The Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013)
The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of gastric adenocarcinoma. Nature 513, 202–209 (2014)
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013)
Helleday, T., Eshtad, S. & Nik-Zainal, S. Mechanisms underlying mutational signatures in human cancers. Nature Rev. Genet. 15, 585–598 (2014)
Hombauer, H., Srivatsan, A., Putnam, C. D. & Kolodner, R. D. Mismatch repair, but not heteroduplex rejection, is temporally coupled to DNA replication. Science 334, 1713–1716 (2011)
Edelbrock, M. A., Kaliyaperumal, S. & Williams, K. J. DNA mismatch repair efficiency and fidelity are elevated during DNA synthesis in human cells. Mutat. Res. 662, 59–66 (2009)
Amouroux, R., Campalans, A., Epe, B. & Radicella, J. P. Oxidative stress triggers the preferential assembly of base excision repair complexes on open chromatin regions. Nucleic Acids Res. 38, 2878–2890 (2010)
Chaudhuri, S., Wyrick, J. J. & Smerdon, M. J. Histone H3 Lys79 methylation is required for efficient nucleotide excision repair in a silenced locus of Saccharomyces cerevisiae. Nucleic Acids Res. 37, 1690–1700 (2009)
Murga, M. et al. Global chromatin compaction limits the strength of the DNA damage response. J. Cell Biol. 178, 1101–1108 (2007)
Hiratani, I. et al. Genome-wide dynamics of replication timing revealed by in vitro models of mouse embryogenesis. Genome Res. 20, 155–169 (2010)
Lubelsky, Y. et al. DNA replication and transcription programs respond to the same chromatin cues. Genome Res. 24, 1102–1114 (2014)
Hiratani, I. et al. Global reorganization of replication domains during embryonic stem cell differentiation. PLoS Biol. 6, e245 (2008)
Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. Bioinformatics 28, 1811–1817 (2012)
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotechnol. 31, 213–219 (2013)
Roberts, N. D. et al. A comparative analysis of algorithms for somatic SNV detection in cancer. Bioinformatics 29, 2223–2230 (2013)
The Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013)
Derrien, T. et al. Fast computation and applications of genome mappability. PLoS ONE 7, e30377 (2012)
Costello, M. et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 41, e67 (2013)
Kim, T.-M., Laird, P. W. & Park, P. J. The landscape of microsatellite instability in colorectal and endometrial cancer genomes. Cell 155, 858–868 (2013)
Pawlik, T. M., Raut, C. P. & Rodriguez-Bigas, M. A. Colorectal carcinogenesis: MSI-H versus MSI-L. Dis. Markers 20, 199–206 (2004)
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011)
Wagner, G. P., Kin, K. & Lynch, V. J. Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory Biosci. 131, 281–285 (2012)
Hansen, R. S. et al. Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc. Natl Acad. Sci. USA 107, 139–144 (2010)
Thurman, R. E., Day, N., Noble, W. S. & Stamatoyannopoulos, J. A. Identification of higher-order functional domains in the human ENCODE regions. Genome Res. 17, 917–927 (2007)
Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837 (2007)
Jackson, D. A. Stopping rules in principal components analysis: a comparison of heuristical and statistical approaches. Ecology 74, 2204–2214 (1993)
Mebane, W. R. & Sekhon, J. S. Genetic optimization using derivatives: the rgenoud package for R. J. Stat. Softw. 42, 473–487 (2010)
This work was supported by grants from the Spanish Ministry of Economy and Competitiveness (BFU2011-26206 and ‘Centro de Excelencia Severo Ochoa 2013-2017’ SEV-2012-0208), a European Research Council Consolidator grant IR-DC (616434), Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR), the EMBO Young Investigator Program, the EMBL-CRG Systems Biology Program, FP7 project 4DCellFate (277899), FP7 project MAESTRA (ICT-2013-612944) and by Marie Curie Actions.
The authors declare no competing financial interests.
Extended data figures and tables
Extended Data Figure 1 Overall mutational burden and megabase-scale regional rate variability in tumour samples of MSI-prone cancer types.
a, b, Correlations of tissue specificity (TS; see Methods) in regional mutation rates of diffuse large B-cell lymphoma (DLBC) with TS of gene expression in DLBC (a), or with TS of replication timing in the Gm12878 lymphoblastoid cell line (b). c, Overall mutational load, as SNVs per Mb of alignable genomic DNA (Methods) for MSI-H, MSS (includes MSI-L), PolE mutant tumours, or otherwise hypermutated tumour samples. d, PC plot with PCs 3 and 4, as in Fig. 1e, but showing only tumour samples for colorectal (CRAD), uterine (UCEC) and stomach (STAD) cancers for visual emphasis. e, f, Relative SNV frequencies across 1 Mb windows of chromosome 1p in UCEC and STAD. Unbroken and dotted lines are the median across tumour samples and its 95% confidence interval, respectively. For each tumour sample, relative mutation frequencies are always obtained by dividing by the mean of all 1 Mb windows. MSI/PolE samples are in the MSI-H group; hyper/ultramutators are not in the MSS group.
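The normalization described in this legend (each sample's window counts divided by the mean over all of its 1 Mb windows, followed by a per-window median across samples) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the function names and the toy counts are hypothetical, and the 95% confidence interval of the median is not computed.

```python
from statistics import mean, median


def relative_frequencies(window_counts):
    """Divide each 1 Mb window count by the sample's mean count over all windows."""
    avg = mean(window_counts)
    if avg == 0:
        raise ValueError("sample has no mutations in any window")
    return [count / avg for count in window_counts]


def median_profile(samples):
    """Per-window median of the relative frequencies across tumour samples."""
    relative = [relative_frequencies(sample) for sample in samples]
    # zip(*relative) regroups the values window by window
    return [median(window_values) for window_values in zip(*relative)]


# Hypothetical SNV counts: three tumour samples, four 1 Mb windows each
samples = [
    [2, 4, 6, 8],
    [10, 10, 20, 40],
    [1, 1, 1, 1],
]
profile = median_profile(samples)
print(profile)
```

Dividing by the per-sample mean puts samples with very different mutational loads on a common scale, so that a hypermutated sample does not dominate the median profile.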
Extended Data Figure 2 Reduced correlation of regional mutation rates to gene expression, heterochromatin and replication timing in genomes and exomes of MSI tumours.
a–c, The 1 Mb windows in the genome were pooled into five equal-frequency bins by the average gene expression levels (log₂ transcripts per million (TPM)) in each window. The median and interquartile range of relative mutation rates across 1 Mb windows are shown for each bin. R² values were always determined on original (not binned) data. P < 0.01 for difference of R after Fisher Z-transform. Gene expression levels are medians over TPM across 15 cancer types. Relative SNV frequencies of each tumour sample were obtained by normalizing by the average SNV density of all genomic 1 Mb windows of that sample. Prior to binning the windows, cancer samples in a group were combined by taking the median of the relative mutation frequencies for each 1 Mb window, as illustrated for CRAD in Fig. 2d. PolE/MSI samples are in the MSI group; ultramutators are not in the MSS group. MSI-L samples are pooled with MSS. d–f, Same as in a–c but for five heterochromatin bins (median H3K9me3 signal over eight tissues and cell lines). g–i, Regional mutation rates in exome sequences of a broader set of 195 MSI-H tumour samples. The 1,709 genomic 1 Mb windows with at least 5 kb alignable protein-coding DNA each were grouped into five equal-frequency bins by the median Repli-Seq signal over 11 cell lines (Methods). Mutations were pooled across all samples in one cancer type with a known MSI-H or MSS status (Methods). a is the slope of the regression line fit to binned data. j, Slopes a determined for individual cancer exomes with a sufficient number of mutations (≥50 SNVs). Number of samples n shown below each group. For all cancer types, MSI-H samples have significantly less negative slopes than MSS (P < 0.01, Mann–Whitney test, one tailed). MSI-H also includes the MSI-H/PolE mutant samples, and MSS includes the MSI-L samples. In the exome analyses, ultramutators were not considered separately.
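The binning and slope described in this legend can be illustrated with a short sketch. Several details are assumptions rather than statements from the source: the slope a is regressed here on the bin index (0 to 4), ties in the covariate are broken by window order, and the Fisher Z comparison uses the textbook two-sided test for two independent correlations, which may differ from the exact test the authors applied. All names and data are hypothetical.

```python
from math import atanh, erfc, sqrt
from statistics import median


def equal_frequency_bins(covariate, n_bins=5):
    """Assign each window a bin index 0..n_bins-1 so that bins hold near-equal counts."""
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    bins = [0] * len(covariate)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(covariate)
    return bins


def binned_medians(rates, bins, n_bins=5):
    """Median relative mutation rate of the windows falling in each bin."""
    return [median([r for r, b in zip(rates, bins) if b == k]) for k in range(n_bins)]


def slope(ys):
    """Least-squares slope of ys against the bin index 0, 1, 2, ... (assumed x-axis)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx


def fisher_z_pvalue(r1, n1, r2, n2):
    """Two-sided P value for a difference between two independent correlations."""
    z = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return erfc(abs(z) / sqrt(2))


# Hypothetical data: ten windows, rate falls as the covariate rises
covariate = [9, 0, 5, 3, 7, 1, 8, 2, 6, 4]
rates = [0.2, 2.0, 1.0, 1.4, 0.6, 1.8, 0.4, 1.6, 0.8, 1.2]
bins = equal_frequency_bins(covariate)
meds = binned_medians(rates, bins)
a = slope(meds)
print(bins, meds, a)
```

A less negative slope in one group than in another, as reported for MSI-H versus MSS samples, corresponds to a flatter relationship between the covariate and the regional mutation rate.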
Extended Data Figure 3 Association of mutational signatures to microsatellite instability and to replication timing.
a, Relative frequencies of the 96 mutation contexts (strand symmetric) in MSI versus MSS cancers; the MSS group includes MSI-L samples but not MSS/PolE ultramutators. Mutations were pooled across samples of MSI-prone tissues (CRAD, UCEC and STAD). b, c, Similar to Fig. 3a, b, showing two additional examples of mutational contexts with different MSI propensities and their relative mutation rates across five genomic replication timing bins. d, Lack of correlation between the MSI propensity of a mutational context with its replication timing slope in MSS tumour samples (compare to Fig. 3c, which shows slopes in MSI samples). Ts, transition; Tv, transversion. e, f, Association of per cent MSI-specific signatures (CCN > CAN + GCN > GTN + [C/T]AN > [C/T]GN) across cancer samples and the binned replication timing slopes for two non-MSI transition signatures in the same samples. Slopes averaged over contexts are displayed in each plot. In all panels except a, mutation rates were normalized to number of nucleotides at risk in a 1 Mb window before determining the replication timing slopes.
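The count of 96 strand-symmetric contexts follows from collapsing the 192 possible substitutions in a trinucleotide context onto one strand. The sketch below assumes the widely used convention of reporting each substitution from the strand on which the mutated base is a pyrimidine; the legend says only that the contexts are strand symmetric, so this choice and the function names are assumptions.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def canonical_context(trinucleotide, alt):
    """Report a substitution from the strand whose reference base is C or T."""
    ref = trinucleotide[1]
    if ref in "CT":
        return trinucleotide, alt
    reverse_complement = "".join(COMPLEMENT[base] for base in reversed(trinucleotide))
    return reverse_complement, COMPLEMENT[alt]


def all_contexts():
    """Enumerate every distinct strand-symmetric trinucleotide substitution."""
    contexts = set()
    for five_prime, ref, three_prime, alt in product("ACGT", repeat=4):
        if ref != alt:
            contexts.add(canonical_context(five_prime + ref + three_prime, alt))
    return sorted(contexts)


contexts = all_contexts()
print(len(contexts))
print(canonical_context("TGA", "A"))
```

Under this convention a G>A change in the context TGA is counted together with the C>T change in TCA on the opposite strand.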
Extended Data Figure 4 The deconvolution of MSI mutational spectra robustly converges onto two equivalent solutions.
a, Agreement of the observed relative frequencies of mutational contexts in each tumour sample with the predictions of model 1 (having median a, b and z coefficients across all solutions in cluster 1). b, Sets of best-fit solutions determined in a hundred optimization runs initialized with different starting conditions. The solutions cluster into two homogeneous clusters (Pearson R > 0.9 between >90% of the solutions within a cluster, in UPGMA hierarchical clustering). c, d, Solutions within both clusters have similar fit to observed data (c) and make extremely similar predictions for mutation spectra in tumour samples (d). e–h, Similar to Fig. 4a, b. Example mutation accumulation diagrams for two mutation contexts typical of MSI tumours, shown for an example MSI tumour TCGA-BR-4280 (e, g) and for an MSS tumour TCGA-CD-8529 (f, h). i, j, Values of the parameters in two solution clusters, with medians and interquartile ranges (shown as whiskers). Each solution encompasses 104 parameters: relative mutation rates a and b for each of 28 mutational contexts (i), and the relative pre-MMR failure time z for each tumour sample of the 24 MSI and 24 MSS samples (j).
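The homogeneity criterion quoted in panel b (Pearson R > 0.9 between more than 90% of the solutions within a cluster) can be checked with a few lines of code. This sketch reads the criterion as applying to the fraction of solution pairs, which is an interpretation and not stated in the source; it does not perform the UPGMA clustering itself, and the toy vectors stand in for the 104-parameter solutions (2 × 28 context rates plus 48 per-sample z values).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def is_homogeneous(cluster, r_min=0.9, frac_min=0.9):
    """True if more than frac_min of solution pairs correlate above r_min."""
    pairs = list(combinations(cluster, 2))
    strong = sum(1 for x, y in pairs if pearson(x, y) > r_min)
    return strong / len(pairs) > frac_min


# Hypothetical parameter vectors standing in for best-fit solutions
tight = [[1, 2, 3, 4], [1.1, 2.0, 3.2, 3.9], [2, 4, 6, 8.5]]
mixed = tight + [[4, 3, 2, 1]]
print(is_homogeneous(tight), is_homogeneous(mixed))
```

Adding a single anti-correlated solution to the tight group makes half of the pairs fail the threshold, so the mixed group is rejected.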
Cite this article
Supek, F., Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015). https://doi.org/10.1038/nature14173