Differential DNA repair underlies mutation hotspots at active promoters in cancer genomes


Promoters are DNA sequences that have an essential role in controlling gene expression. While recent whole cancer genome analyses have identified numerous hotspots of somatic point mutations within promoters, many have not yet been shown to perturb gene expression or drive cancer development (refs 1–4). As such, positive selection alone may not adequately explain the frequency of promoter point mutations in cancer genomes. Here we show that increased mutation density at gene promoters can be linked to promoter activity and differential nucleotide excision repair (NER). By analysing 1,161 human cancer genomes across 14 cancer types, we find evidence for increased local density of somatic point mutations within the centres of DNase I-hypersensitive sites (DHSs) in gene promoters. Mutated DHSs were strongly associated with transcription initiation activity, in which active promoters but not enhancers of equal DNase I hypersensitivity were most mutated relative to their flanking regions. Notably, analysis of genome-wide maps of NER (ref. 5) shows that NER is impaired within the DHS centre of active gene promoters, while XPC-deficient skin cancers do not show increased promoter mutation density, pinpointing differential NER as the underlying cause of these mutation hotspots. Consistent with this finding, we observe that melanomas with an ultraviolet-induced DNA damage mutation signature show the greatest enrichment of promoter mutations, whereas cancers that are not highly dependent on NER, such as colon cancer, show no sign of such enrichment. Taken together, our analysis has uncovered a previously unknown mechanism linking transcription initiation and NER as a major contributor to somatic point mutation hotspots at active gene promoters in cancer genomes.


Figure 1: Pan-cancer analysis of mutation distribution across the genome reveals enrichment at promoter DHSs.
Figure 2: Promoter mutation density is strongly linked with transcription activity.
Figure 3: Nucleotide excision repair in ultraviolet-irradiated human cells inversely mirrors mutation density in promoters and ubiquitous enhancers.
Figure 4: Mutation signatures and promoter DHS mutation density in melanoma and lung cancer.


  1. Fredriksson, N. J., Ny, L., Nilsson, J. A. & Larsson, E. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nature Genet. 46, 1258–1263 (2014)
  2. Weinhold, N., Jacobsen, A., Schultz, N., Sander, C. & Lee, W. Genome-wide analysis of noncoding regulatory mutations in cancer. Nature Genet. 46, 1160–1165 (2014)
  3. Melton, C., Reuter, J. A., Spacek, D. V. & Snyder, M. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nature Genet. 47, 710–716 (2015)
  4. Poulos, R. C. et al. Systematic screening of promoter regions pinpoints functional cis-regulatory mutations in a cutaneous melanoma genome. Mol. Cancer Res. 13, 1218–1226 (2015)
  5. Hu, J., Adar, S., Selby, C. P., Lieb, J. D. & Sancar, A. Genome-wide analysis of human global and transcription-coupled excision repair of UV damage at single-nucleotide resolution. Genes Dev. 29, 948–960 (2015)
  6. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013)
  7. Mellon, I., Spivak, G. & Hanawalt, P. C. Selective removal of transcription-blocking DNA damage from the transcribed strand of the mammalian DHFR gene. Cell 51, 241–249 (1987)
  8. Zheng, C. L. et al. Transcription restores DNA repair to heterochromatin, determining regional mutation rates in cancer genomes. Cell Rep. 9, 1228–1234 (2014)
  9. Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2010)
  10. Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190 (2010)
  11. Khurana, E. et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587 (2013)
  12. Liu, L., De, S. & Michor, F. DNA replication timing and higher-order nuclear organization determine single-nucleotide substitution patterns in cancer genomes. Nature Commun. 4, 1502 (2013)
  13. Polak, P. et al. Reduced local mutation density in regulatory DNA of cancer genomes is linked to DNA repair. Nature Biotechnol. 32, 71–75 (2014)
  14. Schuster-Böckler, B. & Lehner, B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488, 504–507 (2012)
  15. Woo, Y. H. & Li, W. H. DNA replication timing and selection shape the landscape of nucleotide variation in cancer genomes. Nature Commun. 3, 1004 (2012)
  16. Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015)
  17. Polak, P. et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature 518, 360–364 (2015)
  18. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014)
  19. De Santa, F. et al. A large fraction of extragenic RNA pol II transcription sites overlap enhancers. PLoS Biol. 8, e1000384 (2010)
  20. Koch, F. et al. Transcription initiation platforms and GTF recruitment at tissue-specific enhancers and promoters. Nature Struct. Mol. Biol. 18, 956–963 (2011)
  21. Tommasi, S., Oxyzoglou, A. B. & Pfeifer, G. P. Cell cycle-independent removal of UV-induced pyrimidine dimers from the promoter and the transcription initiation domain of the human CDC2 gene. Nucleic Acids Res. 28, 3991–3998 (2000)
  22. Tu, Y., Tornaletti, S. & Pfeifer, G. P. DNA repair domains within a human gene: selective repair of sequences near the transcription initiation site. EMBO J. 15, 675–683 (1996)
  23. Tornaletti, S. & Pfeifer, G. P. UV light as a footprinting agent: modulation of UV-induced DNA damage by transcription factors bound at the promoters of three human genes. J. Mol. Biol. 249, 714–728 (1995)
  24. Rochette, P. J. et al. Influence of cytosine methylation on ultraviolet-induced cyclobutane pyrimidine dimer formation in genomic DNA. Mutat. Res. 665, 7–13 (2009)
  25. Cannistraro, V. J., Pondugula, S., Song, Q. & Taylor, J. S. Rapid deamination of cyclobutane pyrimidine dimer photoproducts at TCG sites in a translationally and rotationally positioned nucleosome in vivo. J. Biol. Chem. 290, 26597–26609 (2015)
  26. Gunz, D., Hess, M. T. & Naegeli, H. Recognition of DNA adducts by human nucleotide excision repair. Evidence for a thermodynamic probing mechanism. J. Biol. Chem. 271, 25089–25098 (1996)
  27. The FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014)
  28. Horn, S. et al. TERT promoter mutations in familial and sporadic melanoma. Science 339, 959–961 (2013)
  29. Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013)
  30. Wilks, C. et al. The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data. Database 2014, bau093 (2014)
  31. Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817 (2012)
  32. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010)
  33. Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nature Methods 9, 215–216 (2012)
  34. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)
  35. The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)
  36. Futreal, P. A. et al. A census of human cancer genes. Nature Rev. Cancer 4, 177–183 (2004)
  37. Cooper, G. M. et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 15, 901–913 (2005)
  38. Berezikov, E., Guryev, V., Plasterk, R. H. & Cuppen, E. CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting. Genome Res. 14, 170–178 (2004)
  39. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
  40. Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012)
  41. Piper, J. et al. Wellington: a novel method for the accurate identification of digital genomic footprints from DNase-seq data. Nucleic Acids Res. 41, e201 (2013)
  42. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014)
  43. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009)
  44. Zhou, X. et al. The Human Epigenome Browser at Washington University. Nature Methods 8, 989–990 (2011)
  45. Ramírez, F., Dundar, F., Diehl, S., Gruning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014)



The authors thank TCGA and ICGC, as well as the numerous other groups who have made their data available for public analysis. The authors additionally thank members of Intersect Pty Ltd for providing high-performance computing resources and data storage used in this study. This work was funded by Cancer Institute NSW (13/DATA/1-02) and the Cure Cancer Foundation Australia with the assistance of Cancer Australia, through the Priority-driven Collaborative Cancer Research Scheme (APP1057921) to J.W.H.W. D.P. is supported by a UNSW Australia post-graduate scholarship, R.C.P. is supported by an Australian Postgraduate Award, D.B. is supported by a National Health and Medical Research Council Early Career Fellowship (APP1073768), J.E.P. is funded by the National Health and Medical Research Council (Australia) and J.W.H.W. is supported by an Australian Research Council Future Fellowship (FT130100096).

Author information




Project planning and design: J.E.P. and J.W.H.W. Method design and data analysis: D.P., R.C.P., A.S., D.B. and J.W.H.W. Manuscript writing and figures: D.P., R.C.P. and J.W.H.W. All authors reviewed and edited the final manuscript.

Corresponding author

Correspondence to Jason W. H. Wong.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Relationship between sequence composition and trinucleotide mutation signatures in promoter DHS mutations.

a, The mutation density ratio of promoter DHS/DHS flanking regions (DHS ±1 kb) in melanoma, astrocytoma, lung, ovarian and oesophageal cancers before and after adjustment by percentage GC content or trinucleotide frequencies within the respective regions. Adjustment was performed by dividing the mutation density in the promoter DHS and DHS flanking region by the percentage GC ratio or trinucleotide frequencies in the two regions, respectively. *P < 0.05, **P < 0.01, ***P < 0.001 (χ² test). b, Heatmap showing the relative frequency of each trinucleotide mutation signature across all samples with greater than 8,602 mutations (see Extended Data Fig. 3a). Unsupervised hierarchical clustering was used to define clusters based on the trinucleotide mutation signature of each sample. The promoter DHS/±1 kb flank mutation density ratio for each sample is shown at the leaf of the dendrogram where black and white depicts the highest and lowest ratios, respectively. The cancer type is colour-coded as defined by the key on the right of the figure. c–e, Mutations were separated into 6 classes and relative frequency was evaluated over promoter DHSs and genic regions in melanoma, ovarian and lung cancer according to the template and non-template strands relative to the associated gene. ***P < 0.001 (χ² test).
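
The GC adjustment described for panel a can be illustrated with a minimal Python sketch. It is not taken from the authors' scripts. It assumes that each region's mutation density is divided by that region's own GC fraction before the promoter DHS/flank ratio is formed, and all function names and counts are hypothetical.

```python
def mutation_density(n_mutations, region_bp):
    """Mutations per base pair for one region."""
    if region_bp <= 0:
        raise ValueError("region_bp must be positive")
    return n_mutations / region_bp


def density_ratio(dhs_mut, dhs_bp, flank_mut, flank_bp):
    """Unadjusted promoter DHS / flank mutation density ratio."""
    return mutation_density(dhs_mut, dhs_bp) / mutation_density(flank_mut, flank_bp)


def gc_adjusted_ratio(dhs_mut, dhs_bp, dhs_gc, flank_mut, flank_bp, flank_gc):
    """Ratio after dividing each density by its region's GC fraction.

    Assumption: the adjustment is applied per region, using GC fractions
    between 0 and 1 (exclusive of 0).
    """
    if not (0 < dhs_gc <= 1 and 0 < flank_gc <= 1):
        raise ValueError("GC fractions must lie in (0, 1]")
    dhs = mutation_density(dhs_mut, dhs_bp) / dhs_gc
    flank = mutation_density(flank_mut, flank_bp) / flank_gc
    return dhs / flank


if __name__ == "__main__":
    # Hypothetical counts: a GC-rich DHS compared with a less GC-rich flank.
    raw = density_ratio(300, 100_000, 1000, 1_000_000)
    adj = gc_adjusted_ratio(300, 100_000, 0.6, 1000, 1_000_000, 0.5)
    print(round(raw, 3), round(adj, 3))
```

In this toy case the adjusted ratio is lower than the raw ratio because the DHS is more GC-rich than its flank. The same pattern would apply to a trinucleotide adjustment, with one frequency per trinucleotide context instead of a single GC fraction.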

Extended Data Figure 2 Comparison of mutation signatures in promoter DHS and its ±1-kb flanking region.

a, d, g, Trinucleotide mutation signatures within promoter DHSs and ±1 kb flanking regions in melanoma, ovarian and lung cancer, respectively. All signatures have been normalized by trinucleotide frequencies within their respective regions. b, e, h, Correlation of the normalized trinucleotide mutation signature frequencies in the promoter DHS versus the ±1 kb flanking region in melanoma, ovarian and lung cancers, respectively. The Pearson’s correlation was calculated by linear regression. c, f, i, Comparison of the distribution of each of the 6 mutation classes in promoter DHSs and ±1 kb flanking regions with mutation counts normalized by GC frequency. There are significantly more C > T mutations in melanoma (P < 0.001, χ² test) and more T > N mutations in ovarian (P < 0.001, χ² test) and lung cancers (P < 0.001, χ² test) based on mutation counts normalized by GC frequency.

Extended Data Figure 3 Distribution of promoter DHS mutations in relation to genome mutation load and across chromosomes.

a, Mutation power analysis for detection of significant promoter DHS mutation enrichment. Bootstrapping analysis was performed to assess the number of mutations required in an individual sample to achieve >95% confidence (that is, less than 5% of resampling resulting in a >2-fold enrichment of promoter DHS mutation density relative to its (±1 kb) flanking region). The dotted line marks the number of mutations (8,602) required to detect >2-fold enrichment of promoter DHS mutations relative to flanking region with at least 99% confidence and this threshold was used to select cancer samples for individual analysis. b, The mutation density ratio of the promoter DHS/±1 kb DHS flanking region for individual cancer genomes with at least 8,602 mutations plotted against genomic mutation density. c, The number of promoter DHS melanoma mutations and the number of genes within each chromosome. d, Circos plot showing the location of promoter DHS mutations (red lines) in TCGA melanoma sample TCGA-EE-A3J5.
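
The resampling idea behind panel a can be sketched as follows. This is an illustrative bootstrap, not the authors' procedure: the labelled mutation pool, the region sizes, the function names and the default of 1,000 resamples are all assumptions made for the example. Each resample draws a fixed number of mutations with replacement and records whether the promoter DHS/flank density ratio exceeds the chosen fold threshold.

```python
import random


def enrichment_ratio(n_dhs, n_flank, dhs_bp, flank_bp):
    """Promoter DHS / flank mutation density ratio for one resample."""
    if n_flank == 0:
        # No flank mutations: treat as infinite enrichment only if the DHS has any.
        return float("inf") if n_dhs > 0 else 0.0
    return (n_dhs / dhs_bp) / (n_flank / flank_bp)


def fraction_enriched(labels, n_draw, dhs_bp, flank_bp,
                      n_boot=1000, fold=2.0, seed=0):
    """Fraction of bootstrap resamples whose ratio exceeds `fold`.

    labels: one entry per mutation in the pool, "dhs", "flank" or "other"
            (a hypothetical encoding chosen for this sketch).
    n_draw: number of mutations drawn, with replacement, per resample.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        sample = rng.choices(labels, k=n_draw)
        ratio = enrichment_ratio(sample.count("dhs"), sample.count("flank"),
                                 dhs_bp, flank_bp)
        if ratio > fold:
            hits += 1
    return hits / n_boot


if __name__ == "__main__":
    # Toy pool: most mutations fall outside both regions.
    pool = ["dhs"] * 30 + ["flank"] * 100 + ["other"] * 870
    for n in (100, 1000, 10000):
        print(n, fraction_enriched(pool, n, dhs_bp=1_000, flank_bp=10_000))
```

Scanning `n_draw` over a range of mutation loads, as in the usage loop, shows how the fraction of resamples passing the threshold stabilizes as the number of mutations grows, which is the kind of curve a threshold such as 8,602 mutations would be read from.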

Extended Data Figure 4 Mutation density is increased across the DHS centre of highly expressed genes.

a–c, Melanoma (a), ovarian (b) and lung (c) cancer mutation profiles and melanocyte (a), ovary cell (b) and A549 cell (c) DNase-seq cleavage profiles, respectively, centred around the TSS. Profiles for mutations were stratified by quartiles of gene expression while DNase-seq profiles were averaged across all genes. Mutation profiles were smoothed using 5 bp windows. Mutation density profiles are oriented according to strand. d, Mutation density in melanoma within and outside of digital genomic footprints within TSS −100 bp promoter regions of melanocytes. Mean mutation density is shown, together with 95% confidence intervals across all 36 samples. Footprinted regions represent transcription factor (TF) bound sites whereas non-footprinted regions represent unoccupied sites (**P < 0.01, paired t-test). e, Melanoma mutation and melanocyte DNase-seq cleavage profiles centred around 50 bp footprints identified within TSS −100 bp promoter regions. Mutation profiles were smoothed using 5 bp windows.
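
The caption does not say which smoothing kernel was used for the 5 bp windows. As one plausible reading, the sketch below applies a centred moving average with truncated windows at the profile edges; the function name and the example profile are hypothetical.

```python
def smooth_profile(values, window=5):
    """Centred moving average over a per-base profile.

    Windows are truncated at the ends of the profile, so the output has
    the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    # Hypothetical per-base mutation counts around a TSS.
    profile = [0, 0, 5, 0, 0, 1, 0, 3, 0, 0]
    print(smooth_profile(profile))
```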

Extended Data Figure 5 Association between mutation density of the 6 mutation classes against chromatin accessibility.

Chromatin accessibility was measured by DNase-seq read coverage in bins of 100 bp. Slope (α) was calculated from the linear regression of the binned data.
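
A minimal sketch of this kind of analysis is given below, under two assumptions that the caption does not confirm: that per-base DNase-seq coverage is summed within consecutive 100 bp bins, and that the slope α is the ordinary least-squares slope of mutation density against binned accessibility. Function names and data are hypothetical.

```python
def bin_coverage(per_base_coverage, bin_size=100):
    """Sum per-base read coverage within consecutive bins of `bin_size` bp."""
    if bin_size < 1:
        raise ValueError("bin_size must be positive")
    return [sum(per_base_coverage[i:i + bin_size])
            for i in range(0, len(per_base_coverage), bin_size)]


def ols_slope(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical binned accessibility and matching mutation densities.
    accessibility = [10, 20, 30, 40, 50]
    mutation_density = [1.1, 1.9, 3.2, 3.8, 5.0]
    alpha, intercept = ols_slope(accessibility, mutation_density)
    print(alpha, intercept)
```

Running the regression separately for each of the 6 mutation classes would give one slope per class, which is how the slopes could be compared across classes.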

Extended Data Figure 6 Comparison of promoter with enhancer mutation density and relationship between mutation density and DNA methylation in XPC−/− skin cancer.

a, b, CPD and 6–4PP XR-seq repair density ratio of DNase I hypersensitivity matched (a) active promoters and enhancers, and (b) ubiquitous and permissive enhancers relative to their DHS flanking regions. For promoters, a set of active promoters represented by the top 25% of nucleosome-free promoters based on melanocyte DNase-seq data was used. A corresponding set of enhancers of equal size was selected with matching DNase-seq coverage. The error bar on the enhancer data set shows the interquartile range of repair ratios over 100 randomized samplings of enhancers with matching DNase-seq coverage. For the comparison of ubiquitous and permissive enhancers, the full set of ubiquitous enhancers from FANTOM5 (n = 200) was used and a matching set of equally nucleosome-free permissive enhancers was sampled, with the sampling repeated 100 times as described above (also see Methods). c, d, SCC XPCwt and XPC−/− mutation density ratio of DNase I hypersensitivity matched (c) active promoters and enhancers, and (d) ubiquitous and permissive enhancers relative to their DHS flanking regions. Promoter, enhancer, ubiquitous enhancer and permissive enhancer regions were generated as described for XR-seq data in a and b. e, Mutation density and methylation profile ±5 kb of the TSS of genes for XPC−/− SCC and normal human epithelial keratinocytes (NHEK), respectively. Methylation profiles were generated using the fraction methylation data calculated using whole genome bisulfite sequencing (bisulfite-seq) data from the Human Epigenome Atlas. f, Association between the average methylation level of gene promoters (TSS ±1 kb) and the density of [C/T]CpG mutations in XPC−/− SCC. The average methylation level of each promoter region was calculated using the mean fraction methylation of each [C/T]CpG within the region as measured by bisulfite-seq in NHEK cells. For a–d: *P < 0.05, **P < 0.01, ***P < 0.001, n.s., not significant (χ² test).

Extended Data Figure 7 Schematic diagram of proposed mechanism leading to localized increased promoter mutation density.

DNA damage (such as CPD or 6–4PP) caused by ultraviolet (UV) irradiation is typically recognized by NER machinery such as XPC, to initiate DNA repair (left). In highly transcribed promoters, the transcription pre-initiation complex prevents repair machinery such as XPC from recognizing the DNA lesion (right), leaving it unrepaired and ultimately leading to mutation formation upon DNA replication.

Extended Data Table 1 Comparison of promoter and enhancer mutation densities across 14 cancer types
Extended Data Table 2 Comparison of promoter DHS and promoter DHS flank (±1 kb) mutation density across 14 cancer types
Extended Data Table 3 Logistic regression analysis of mutated promoters (TSS −100 bp) in melanoma, ovarian and lung cancer against various genetic and epigenetic characteristics known to contribute to variations in mutation density in the genome

Supplementary information

Supplementary Data

This file contains Supplementary Text and Data, Supplementary Figures 1-2 and additional references. (PDF 521 kb)

Supplementary Tables

This file contains Supplementary Tables 1-9. (XLSX 16527 kb)

Supplementary Information

This zipped file contains the scripts and annotation files used for data analysis. (ZIP 37196 kb)


Cite this article

Perera, D., Poulos, R., Shah, A. et al. Differential DNA repair underlies mutation hotspots at active promoters in cancer genomes. Nature 532, 259–263 (2016). https://doi.org/10.1038/nature17437
