Promoters are DNA sequences that have an essential role in controlling gene expression. While recent whole cancer genome analyses have identified numerous hotspots of somatic point mutations within promoters, many have not yet been shown to perturb gene expression or drive cancer development (refs 1–4). As such, positive selection alone may not adequately explain the frequency of promoter point mutations in cancer genomes. Here we show that increased mutation density at gene promoters can be linked to promoter activity and differential nucleotide excision repair (NER). By analysing 1,161 human cancer genomes across 14 cancer types, we find evidence for increased local density of somatic point mutations within the centres of DNase I-hypersensitive sites (DHSs) in gene promoters. Mutated DHSs were strongly associated with transcription initiation activity, in which active promoters but not enhancers of equal DNase I hypersensitivity were most mutated relative to their flanking regions. Notably, analysis of genome-wide maps of NER (ref. 5) shows that NER is impaired within the DHS centre of active gene promoters, while XPC-deficient skin cancers do not show increased promoter mutation density, pinpointing differential NER as the underlying cause of these mutation hotspots. Consistent with this finding, we observe that melanomas with an ultraviolet-induced DNA damage mutation signature show greatest enrichment of promoter mutations, whereas cancers that are not highly dependent on NER, such as colon cancer, show no sign of such enrichment. Taken together, our analysis has uncovered the presence of a previously unknown mechanism linking transcription initiation and NER as a major contributor to somatic point mutation hotspots at active gene promoters in cancer genomes.
Fredriksson, N. J., Ny, L., Nilsson, J. A. & Larsson, E. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nature Genet. 46, 1258–1263 (2014)
Weinhold, N., Jacobsen, A., Schultz, N., Sander, C. & Lee, W. Genome-wide analysis of noncoding regulatory mutations in cancer. Nature Genet. 46, 1160–1165 (2014)
Melton, C., Reuter, J. A., Spacek, D. V. & Snyder, M. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nature Genet. 47, 710–716 (2015)
Poulos, R. C. et al. Systematic screening of promoter regions pinpoints functional cis-regulatory mutations in a cutaneous melanoma genome. Mol. Cancer Res. 13, 1218–1226 (2015)
Hu, J., Adar, S., Selby, C. P., Lieb, J. D. & Sancar, A. Genome-wide analysis of human global and transcription-coupled excision repair of UV damage at single-nucleotide resolution. Genes Dev. 29, 948–960 (2015)
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013)
Mellon, I., Spivak, G. & Hanawalt, P. C. Selective removal of transcription-blocking DNA damage from the transcribed strand of the mammalian DHFR gene. Cell 51, 241–249 (1987)
Zheng, C. L. et al. Transcription restores DNA repair to heterochromatin, determining regional mutation rates in cancer genomes. Cell Rep. 9, 1228–1234 (2014)
Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2010)
Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190 (2010)
Khurana, E. et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587 (2013)
Liu, L., De, S. & Michor, F. DNA replication timing and higher-order nuclear organization determine single-nucleotide substitution patterns in cancer genomes. Nature Commun. 4, 1502 (2013)
Polak, P. et al. Reduced local mutation density in regulatory DNA of cancer genomes is linked to DNA repair. Nature Biotechnol. 32, 71–75 (2014)
Schuster-Böckler, B. & Lehner, B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488, 504–507 (2012)
Woo, Y. H. & Li, W. H. DNA replication timing and selection shape the landscape of nucleotide variation in cancer genomes. Nature Commun. 3, 1004 (2012)
Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015)
Polak, P. et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature 518, 360–364 (2015)
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014)
De Santa, F. et al. A large fraction of extragenic RNA pol II transcription sites overlap enhancers. PLoS Biol. 8, e1000384 (2010)
Koch, F. et al. Transcription initiation platforms and GTF recruitment at tissue-specific enhancers and promoters. Nature Struct. Mol. Biol. 18, 956–963 (2011)
Tommasi, S., Oxyzoglou, A. B. & Pfeifer, G. P. Cell cycle-independent removal of UV-induced pyrimidine dimers from the promoter and the transcription initiation domain of the human CDC2 gene. Nucleic Acids Res. 28, 3991–3998 (2000)
Tu, Y., Tornaletti, S. & Pfeifer, G. P. DNA repair domains within a human gene: selective repair of sequences near the transcription initiation site. EMBO J. 15, 675–683 (1996)
Tornaletti, S. & Pfeifer, G. P. UV light as a footprinting agent: modulation of UV-induced DNA damage by transcription factors bound at the promoters of three human genes. J. Mol. Biol. 249, 714–728 (1995)
Rochette, P. J. et al. Influence of cytosine methylation on ultraviolet-induced cyclobutane pyrimidine dimer formation in genomic DNA. Mutat. Res. 665, 7–13 (2009)
Cannistraro, V. J., Pondugula, S., Song, Q. & Taylor, J. S. Rapid deamination of cyclobutane pyrimidine dimer photoproducts at TCG sites in a translationally and rotationally positioned nucleosome in vivo. J. Biol. Chem. 290, 26597–26609 (2015)
Gunz, D., Hess, M. T. & Naegeli, H. Recognition of DNA adducts by human nucleotide excision repair. Evidence for a thermodynamic probing mechanism. J. Biol. Chem. 271, 25089–25098 (1996)
The FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014)
Horn, S. et al. TERT promoter mutations in familial and sporadic melanoma. Science 339, 959–961 (2013)
Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013)
Wilks, C. et al. The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data. Database 2014, bau093 (2014)
Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811–1817 (2012)
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010)
Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nature Methods 9, 215–216 (2012)
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)
The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)
Futreal, P. A. et al. A census of human cancer genes. Nature Rev. Cancer 4, 177–183 (2004)
Cooper, G. M. et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 15, 901–913 (2005)
Berezikov, E., Guryev, V., Plasterk, R. H. & Cuppen, E. CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting. Genome Res. 14, 170–178 (2004)
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012)
Piper, J. et al. Wellington: a novel method for the accurate identification of digital genomic footprints from DNase-seq data. Nucleic Acids Res. 41, e201 (2013)
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014)
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009)
Zhou, X. et al. The Human Epigenome Browser at Washington University. Nature Methods 8, 989–990 (2011)
Ramírez, F., Dundar, F., Diehl, S., Gruning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014)
The authors thank TCGA, ICGC as well as numerous other groups who have made their data available for public analysis. The authors additionally thank members of Intersect Pty Ltd for providing high-performance computing resources and data storage used in this study. This work was funded by Cancer Institute NSW (13/DATA/1-02) and the Cure Cancer Foundation Australia with the assistance of Cancer Australia, through the Priority-driven Collaborative Cancer Research Scheme (APP1057921) to J.W.H.W. D.P. is supported by a UNSW Australia post-graduate scholarship, R.C.P. is supported by an Australian Postgraduate Award, D.B. is supported by a National Health and Medical Research Council Early Career Fellowship (APP1073768), J.E.P. is funded by the National Health and Medical Research Council (Australia) and J.W.H.W. is supported by an Australian Research Council Future Fellowship (FT130100096).
The authors declare no competing financial interests.
Extended data figures and tables
Extended Data Figure 1 Relationship between sequence composition and trinucleotide mutation signatures in promoter DHS mutations.
a, The mutation density ratio of promoter DHS/DHS flanking regions (DHS ±1 kb) in melanoma, astrocytoma, lung, ovarian and oesophageal cancers before and after adjustment by percentage GC content or trinucleotide frequencies within the respective regions. Adjustment was performed by dividing the mutation density in the promoter DHS and DHS flanking region by the percentage GC ratio or trinucleotide frequencies in the two regions, respectively. *P < 0.05, **P < 0.01, ***P < 0.001 (χ² test). b, Heatmap showing the relative frequency of each trinucleotide mutation signature across all samples with greater than 8,602 mutations (see Extended Data Fig. 3a). Unsupervised hierarchical clustering was used to define clusters based on the trinucleotide mutation signature of each sample. The promoter DHS/±1 kb flank mutation density ratio for each sample is shown at the leaf of the dendrogram where black and white depicts the highest and lowest ratios, respectively. The cancer type is colour-coded as defined by the key on the right of the figure. c–e, Mutations were separated into 6 classes and relative frequency was evaluated over promoter DHSs and genic regions in melanoma, ovarian and lung cancer according to the template and non-template strands relative to the associated gene. ***P < 0.001 (χ² test).
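The GC adjustment described in panel a can be illustrated with a minimal sketch. It assumes one plausible reading of the caption: each region's mutation density (mutations per bp) is divided by that region's GC fraction before the promoter DHS/flank ratio is taken. The function names and the example numbers are hypothetical and are not taken from the authors' scripts.

```python
def mutation_density(n_mutations, region_bp):
    """Mutations per base pair in a region."""
    return n_mutations / region_bp


def gc_adjusted_ratio(dhs_mut, dhs_bp, dhs_gc, flank_mut, flank_bp, flank_gc):
    """Promoter DHS / flank mutation density ratio after GC adjustment.

    Assumption: adjustment means dividing each region's density by its
    GC fraction (a value between 0 and 1) before forming the ratio.
    """
    dhs = mutation_density(dhs_mut, dhs_bp) / dhs_gc
    flank = mutation_density(flank_mut, flank_bp) / flank_gc
    return dhs / flank


# Hypothetical counts: a GC-rich DHS versus a less GC-rich flank.
unadjusted = mutation_density(30, 1000) / mutation_density(20, 2000)
adjusted = gc_adjusted_ratio(30, 1000, 0.6, 20, 2000, 0.4)
```

In this made-up example the unadjusted ratio is about 3 and the adjusted ratio is about 2, showing how a higher GC content in the DHS can account for part of an apparent enrichment.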
Extended Data Figure 2 Comparison of mutation signatures in promoter DHS and its ±1-kb flanking region.
a, d, g, Trinucleotide mutation signatures within promoter DHSs and ±1 kb flanking regions in melanoma, ovarian and lung cancer, respectively. All signatures have been normalized by trinucleotide frequencies within their respective regions. b, e, h, Correlation of the normalized trinucleotide mutation signature frequencies in the promoter DHS versus the ±1 kb flanking region in melanoma, ovarian and lung cancers, respectively. The Pearson’s correlation was calculated by linear regression. c, f, i, Comparison of the distribution of each of the 6 mutation classes in promoter DHSs and ±1 kb flanking regions with mutation counts normalized by GC frequency. There are significantly more C > T mutations in melanoma (P < 0.001, χ² test) and more T > N mutations in ovarian (P < 0.001, χ² test) and lung cancers (P < 0.001, χ² test) based on mutation counts normalized by GC frequency.
Extended Data Figure 3 Distribution of promoter DHS mutations in relation to genome mutation load and across chromosomes.
a, Mutation power analysis for detection of significant promoter DHS mutation enrichment. Bootstrapping analysis was performed to assess the number of mutations required in an individual sample to achieve >95% confidence (that is, less than 5% of resampling resulting in a >2-fold enrichment of promoter DHS mutation density relative to its (±1 kb) flanking region). The dotted line marks the number of mutations (8,602) required to detect >2-fold enrichment of promoter DHS mutations relative to flanking region with at least 99% confidence and this threshold was used to select cancer samples for individual analysis. b, The mutation density ratio of the promoter DHS/±1 kb DHS flanking region for individual cancer genomes with at least 8,602 mutations plotted against genomic mutation density. c, The number of promoter DHS melanoma mutations and the number of genes within each chromosome. d, Circos plot showing the location of promoter DHS mutations (red lines) in TCGA melanoma sample TCGA-EE-A3J5.
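The bootstrapping idea in panel a can be sketched as follows. This is an illustrative outline only, not the authors' procedure: it assumes each mutation is labelled 1 if it falls in a promoter DHS and 0 if it falls in the ±1 kb flank, resamples a fixed number of mutations with replacement, and reports the fraction of resamples whose DHS/flank density ratio exceeds a fold threshold. All names, region sizes and labels are hypothetical.

```python
import random


def enrichment_ratio(labels, dhs_bp, flank_bp):
    """DHS/flank mutation density ratio for labels (1 = DHS, 0 = flank)."""
    n_dhs = sum(labels)
    n_flank = len(labels) - n_dhs
    if n_flank == 0:
        return float("inf")  # no flank mutations in this resample
    return (n_dhs / dhs_bp) / (n_flank / flank_bp)


def fraction_enriched(labels, n_draw, dhs_bp, flank_bp,
                      fold=2.0, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples with more than `fold` enrichment."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_boot):
        sample = rng.choices(labels, k=n_draw)  # sampling with replacement
        if enrichment_ratio(sample, dhs_bp, flank_bp) > fold:
            hits += 1
    return hits / n_boot


# Hypothetical input: 90 DHS and 10 flank mutations over equal-sized regions.
labels = [1] * 90 + [0] * 10
strong = fraction_enriched(labels, n_draw=500, dhs_bp=100, flank_bp=100)
```

Repeating this over a range of `n_draw` values would show how the stability of the enrichment estimate depends on the number of mutations in a sample, which is the quantity the caption's threshold addresses.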
Extended Data Figure 4 Mutation density is increased across the DHS centre of highly expressed genes.
a–c, Melanoma (a), ovarian (b) and lung (c) cancer mutation profiles and melanocyte (a), ovary cell (b) and A549 cell (c) DNase-seq cleavage profiles, respectively, centred around the TSS. Profiles for mutations were stratified by quartiles of gene expression while DNase-seq profiles were averaged across all genes. Mutation profiles were smoothed using 5 bp windows. Mutation density profiles are oriented according to strand. d, Mutation density in melanoma within and outside of digital genomic footprints within TSS −100 bp promoter regions of melanocytes. Mean mutation density is shown, together with 95% confidence intervals across all 36 samples. Footprinted regions represent transcription factor (TF) bound sites whereas non-footprinted regions represent unoccupied sites (**P < 0.01, paired t-test). e, Melanoma mutation and melanocyte DNase-seq cleavage profiles centred around 50 bp footprints identified within TSS −100 bp promoter regions. Mutation profiles were smoothed using 5 bp windows.
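The 5 bp smoothing of mutation profiles mentioned above could be done in several ways. The sketch below assumes a centred moving average whose window is truncated at the profile edges; the caption does not say whether a sliding or a binned window was used, so this choice and the function name are assumptions.

```python
def smooth(profile, window=5):
    """Centred moving average; the window is truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        lo = max(0, i - half)
        hi = min(len(profile), i + half + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out


# Hypothetical per-base mutation counts around a TSS.
smoothed = smooth([0, 0, 5, 0, 0])
```

A single spike of 5 at the central position is spread over its neighbours, so the central value becomes 1.0 with a 5 bp window.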
Extended Data Figure 5 Association between mutation density of the 6 mutation classes against chromatin accessibility.
Chromatin accessibility was measured by DNase-seq read coverage in bins of 100 bp. Slope (α) was calculated from the linear regression of the binned data.
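As a rough illustration of how a slope can be obtained from binned data, the sketch below fits an ordinary least-squares line to per-bin values. The pairing of DNase-seq coverage (x) with mutation density (y) per bin and the example numbers are assumptions made for illustration; they are not values from the study.

```python
def ols_slope(x, y):
    """Slope of the ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical bins: coverage per 100 bp bin and the matching mutation density.
coverage = [1, 2, 3, 4]
density = [3, 5, 7, 9]
slope = ols_slope(coverage, density)
```

A positive slope would indicate that mutation density rises with accessibility for a given mutation class, and a negative slope the opposite.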
Extended Data Figure 6 Comparison of promoter with enhancer mutation density and relationship between mutation density and DNA methylation in XPC−/− skin cancer.
a, b, CPD and 6–4PP XR-seq repair density ratio of DNase I hypersensitivity matched (a) active promoters and enhancers, and (b) ubiquitous and permissive enhancers relative to their DHS flanking regions. For promoters, a set of active promoters represented by the top 25% of nucleosome-free promoters based on melanocyte DNase-seq data was used. A corresponding set of enhancers of equal size was selected with matching DNase-seq coverage. The error bar on the enhancer data set shows the interquartile range of repair ratios over 100 randomized samplings of enhancers with matching DNase-seq coverage. For the comparison of ubiquitous and permissive enhancers, the full set of ubiquitous enhancers from FANTOM5 (n = 200) was used, and a matching set of equally nucleosome-free permissive enhancers was sampled, with sampling repeated 100 times as described above (see also Methods). c, d, SCC XPCwt and XPC−/− mutation density ratio of DNase I hypersensitivity matched (c) active promoters and enhancers, and (d) ubiquitous and permissive enhancers relative to their DHS flanking regions. Promoter, enhancer, ubiquitous enhancer and permissive enhancer regions were generated as described for XR-seq data in a and b. e, Mutation density and methylation profile ±5 kb of the TSS of genes for XPC−/− SCC and normal human epithelial keratinocytes (NHEK), respectively. Methylation profiles were generated using the fraction methylation data calculated using whole genome bisulfite sequencing (bisulfite-seq) data from the Human Epigenome Atlas. f, Association between the average methylation level of gene promoters (TSS ±1 kb) and the density of [C/T]CpG mutations in XPC−/− SCC. The average methylation level of each promoter region was calculated using the mean fraction methylation of each [C/T]CpG within the region as measured by bisulfite-seq in NHEK cells. For a–d: *P < 0.05, **P < 0.01, ***P < 0.001, n.s., not significant (χ² test).
Extended Data Figure 7 Schematic diagram of proposed mechanism leading to localized increased promoter mutation density.
DNA damage (such as CPD or 6–4PP) caused by ultraviolet (UV) irradiation is typically recognized by NER machinery such as XPC, to initiate DNA repair (left). In highly transcribed promoters, the transcription pre-initiation complex prevents repair machinery such as XPC from recognizing the DNA lesion (right), leaving it unrepaired and ultimately leading to mutation formation upon DNA replication.
This file contains Supplementary Text and Data, Supplementary Figures 1-2 and additional references. (PDF 521 kb)
This file contains Supplementary Tables 1-9. (XLSX 16527 kb)
This zipped file contains the scripts and annotation files used for data analysis. (ZIP 37196 kb)
About this article
Cite this article
Perera, D., Poulos, R., Shah, A. et al. Differential DNA repair underlies mutation hotspots at active promoters in cancer genomes. Nature 532, 259–263 (2016). https://doi.org/10.1038/nature17437