Global unleashing of transcription elongation waves in response to genotoxic stress restricts somatic mutation rate

Complex molecular responses preserve gene expression accuracy and genome integrity in the face of environmental perturbations. Here we report that, in response to UV irradiation, RNA polymerase II (RNAPII) molecules are dynamically and synchronously released from promoter-proximal regions into elongation to promote uniform and accelerated surveillance of the whole transcribed genome. The maximised influx of de novo released RNAPII correlates with increased damage sensing, as confirmed by the progressive accumulation of RNAPII at dipyrimidine sites and by the average slow-down of elongation rates in gene bodies. In turn, this transcription elongation 'safe' mode guarantees efficient DNA repair regardless of damage location, gene size and transcription level. Accordingly, we detect low and homogeneous rates of mutational signatures associated with UV exposure or cigarette smoke across all active genes. Our study reveals a novel advantage of transcription regulation at the promoter-proximal level and provides unanticipated insights into how active transcription shapes the mutagenic landscape of cancer genomes.

Determination of constitutive gene expression status before stress (steady state, NO UV). a Flow cytometry analysis of the cell cycle. Percentages of cells in each cell-cycle phase are shown for cells starved (+0.5% FBS) for 72 h and for cells released into complete medium (+10% FBS) for 3 to 5 h. b Average plots of RNAPII-ser2P ChIP-seq read densities on active genes (see d) longer than 60 kb, from TSS to TSS + 60 kb, before irradiation (NO UV). We compare cells starved (+0.5% FBS) for 72 h and released into complete medium (+10% FBS) for 3 h with cells that were not starved, and with non-starved cells treated with DMSO for 16 h prior to release into normal medium for 1 h. c Correlation analysis of RNAPII-hypo, -ser5P and -ser2P ChIP-seq densities around promoters before irradiation (Dp = density (reads per million) on promoter-proximal regions, from -250 bp to +100 bp around the TSS). Pearson's correlation coefficients (PCC) were calculated and are reported on the correlation heat map; intense red indicates a stronger correlation. d Gene expression status for the curated gene list (see Supplementary Methods) was defined by intersecting genes with MACS2 peaks (on promoter-proximal regions) for RNAPII-hypo (dark green), -ser5P (purple) and -ser2P (blue) ChIP-seq. For higher stringency, the RNAPII-ser2P list was further filtered by selecting only genes with Dp > 0.7 rpm. RNAPII-ser2P-containing genes were considered active (solid green), while the union of RNAPII-ser5P and -hypo genes that did not overlap -ser2P genes was defined as poised. The remaining genes were considered inactive. See Supplementary Data 2 and 3. e Comparison of gene expression status determined by RNAPII ChIP-seq in (d) with the status determined by nascent RNA-seq (nRNA-seq status: A = active, for genes with RPKM greater than an arbitrary threshold located between the two peaks of the kernel density distribution; I = inactive, for the remaining genes; see also Methods).
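The status rules in (d) reduce to a density threshold plus set operations. The minimal sketch below assumes that peak calls have already been intersected with genes. The function names are hypothetical. It also assumes that "did not overlap -ser2P genes" refers to the filtered (Dp > 0.7 rpm) active list; the legend leaves this point ambiguous.

```python
from typing import Dict, Set

# Hypothetical helper names; thresholds and windows follow the legend.
DP_THRESHOLD_RPM = 0.7  # stringency filter for the RNAPII-ser2P gene list


def dp_rpm(reads_in_window: int, total_mapped_reads: int) -> float:
    """Dp: reads per million mapped reads in the -250 bp..+100 bp TSS window."""
    return reads_in_window * 1e6 / total_mapped_reads


def classify_genes(
    all_genes: Set[str],
    ser2p_peak_genes: Set[str],
    ser5p_peak_genes: Set[str],
    hypo_peak_genes: Set[str],
    ser2p_dp: Dict[str, float],
) -> Dict[str, str]:
    """Label each gene 'active', 'poised' or 'inactive' from peak overlaps."""
    # Active: RNAPII-ser2P peak at the promoter AND Dp above the threshold.
    active = {g for g in ser2p_peak_genes if ser2p_dp.get(g, 0.0) > DP_THRESHOLD_RPM}
    # Poised: ser5P or hypo peak, not in the active set.
    # Assumption: overlap is tested against the filtered (active) ser2P list.
    poised = (ser5p_peak_genes | hypo_peak_genes) - active
    return {
        g: "active" if g in active else "poised" if g in poised else "inactive"
        for g in all_genes
    }
```

In this sketch, a gene with a ser2P peak but Dp ≤ 0.7 rpm and a ser5P peak ends up poised. Under the other reading of the legend it would be inactive.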
f Same as in (c) for RNAPII-ser2P (Dp), nRNA-seq (RPKM gene) and mRNA-seq (RPKM exon). g Heatmaps illustrating the distribution of RNAPII isoform reads before UV irradiation, aligned at individual promoter-proximal regions (rows; -250 bp to +2 kb relative to the TSS) and categorised by gene expression status, as defined in (d). Corresponding nascent (nRNA) and processed mature (mRNA) RNA expression levels (RPKM gene and exon, respectively) are shown for each row.

Fig. 2 Genome-wide reorganisation of RNAPII isoforms during the UV stress response. a Average plots of read densities for RNAPII-ser2P, -ser5P and -hypo on active, poised and inactive genes from TSS to TSS + 2 kb, before (NO UV) and after (+UV) irradiation. b Same as in (a), but showing the difference in binding profiles of RNAPII isoforms after UV irradiation (8 J m⁻²) (Log2 FC = log2(read density +UV / read density NO UV)). c Individual comparison of the escape index (EI, see Fig. 1b) before (NO UV) and after UV (+UV) for all active, poised and inactive genes. Percentages of genes with increased RNAPII escape from promoter-proximal pausing (PPP) regions after UV (ΔEI > 1, dark green dots above the y = x dotted line) are shown. d Summary heatmap representing the changes in percentages (blue = 0% to red = 100%) of genes with ΔEI > 1 for the time-series analysis of RNAPII-hypo, -ser5P and -ser2P and for the gene expression categories defined in (c). A chi-square test (χ²) compares the number of genes in the active and poised categories for each condition and determines whether the observed number of genes with ΔEI > 1 differs from the expected value purely by chance. P values are indicated on the right and are considered significant if < 0.01. Data complied with a normal distribution, and tests were performed with the tool available at http://vassarstats.net/newcs.html. e Analysis of the average positions of ChIP-seq peak summits relative to the TSS per decile, as defined in (a), for RNAPII-ser2P and -ser5P.
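The Log2 FC in (b) and the active-versus-poised comparison in (d) can be illustrated with a short sketch. It assumes a 2 × 2 contingency table of gene counts (ΔEI > 1 or not, active or poised). It also assumes a Pearson chi-square without Yates continuity correction; whether the VassarStats tool applied a correction is not stated here. The function names are hypothetical.

```python
import math
from typing import Tuple


def log2_fc(density_uv: float, density_no_uv: float) -> float:
    """Log2 fold change of read density after vs. before UV."""
    return math.log2(density_uv / density_no_uv)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no Yates correction) for the 2x2 table
        [[a, b],   # active: genes with dEI > 1, genes without
         [c, d]]   # poised: genes with dEI > 1, genes without
    Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column needs at least one gene")
    observed = [a, b, c, d]
    expected = [rows[0] * cols[0] / n, rows[0] * cols[1] / n,
                rows[1] * cols[0] / n, rows[1] * cols[1] / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, `chi_square_2x2(30, 10, 10, 30)` gives a statistic of 20, which falls well below the P < 0.01 significance cut-off. The counts are illustrative, not data from this study.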
Median values are shown with black horizontal bars and error bars represent the interquartile range. **P < 0.0001, *P < 0.0005 (a two-tailed paired t-test was used for statistical analysis). f Pairwise analysis of changes in the deciles' average peak-summit distance to the TSS (NO UV vs +UV irradiation) for RNAPII-ser2P and -ser5P. **P < 0.0001 (two-tailed paired t-test). g Analysis of shifts in ChIP-seq peak-summit positions relative to the TSS for the conditions defined in (a) for RNAPII-ser2P and -ser5P.
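The paired comparison in (f) pairs each decile's average summit distance before UV with the same decile after UV. The minimal sketch below computes the paired t statistic and its degrees of freedom, using hypothetical inputs. The two-tailed P value then comes from the t distribution with n - 1 degrees of freedom; in practice, `scipy.stats.ttest_rel` returns the statistic and P value together.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Paired t statistic and degrees of freedom for matched samples
    (e.g. per-decile mean summit distance to TSS, NO UV vs +UV)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [b - a for a, b in zip(before, after)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("identical differences: t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```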
Median values are shown with horizontal bars and boxes represent the 10th to 90th percentiles. P = 0.0008 (two-tailed paired t-test). h UCSC snapshot of RNAPII-ser2P binding profiles before and after UV stress. Tracks are normalised to the minimum total number of reads sequenced across samples. Input is shown in black. Genes are shown at the bottom. The orange box highlights one representative gene displaying a time-dependent progressing wave of elongating RNAPII. i Average (n = 2,531) elongation rates (kb/min) for RNAPII-ser2P ChIP-seq, calculated from the average wave-front positions determined in Fig. 1e. Colour gradients reflect time intervals. j, k Input and ChIP western blot analysis of RNAPII-CTD phosphorylation patterns with quantification (bottom, mean (±SEM) Log2 FC compared with NO UV) (hypo: dark green, ser5P: purple, ser2P: light blue) before (NO) and after UV irradiation (8 J m⁻²). Input (j) amounts represent 5% of the starting material of the cross-linked chromatin fraction (histone H3 was used as a representative loading control). For ChIPs (k), equal amounts of anti-RPB1 antibody (raised against the N-terminus of the main RNAPII subunit and therefore recognising all RNAPII isoforms: pan-RNAPII) were incubated with equal amounts of cross-linked chromatin. The material was resolved on 4-12% Bis-Tris gels (NuPAGE), transferred and immunoblotted with the indicated antibodies; see also Methods. Hc indicates the heavy chain. Data shown reflect two independent experiments.

Fig. 4 Characterisation of the UV-dependent global and reversible changes in nascent RNA synthesis. a Average plots showing the changes in nRNA-seq read densities after UV irradiation (15 J m⁻²) in starved-released VH10 cells (nRNA was labelled with EU for 10 min before the indicated time; see also Supplementary Methods). Profiles are shown for genes longer than 100 kb, from the transcription start site (TSS) to TSS + 100 kb.
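Panel (i) derives elongation rates from how far the wave front advances between time points. The minimal sketch below computes one rate per interval as distance travelled divided by elapsed time. The function name and the example positions are hypothetical; they are not the measured wave-front positions.

```python
def elongation_rates(times_min, front_kb):
    """Rate (kb/min) for each consecutive interval:
    (front_kb[i+1] - front_kb[i]) / (times_min[i+1] - times_min[i])."""
    if len(times_min) != len(front_kb):
        raise ValueError("times and positions must have the same length")
    rates = []
    for i in range(len(times_min) - 1):
        dt = times_min[i + 1] - times_min[i]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        rates.append((front_kb[i + 1] - front_kb[i]) / dt)
    return rates


# Illustrative fronts at 30, 60 and 120 min after UV.
print(elongation_rates([30, 60, 120], [45, 75, 120]))
```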
During the early recovery period (0.5 to 2 h), Log2 FC scores show an increase in nRNA synthesis in 5' regions and a reduction over the rest of the gene bodies. b Bar plots showing the mean (±SEM) absolute density (rpm) of nRNA-seq reads in distal regions (50-100 kb downstream of the TSS) of active and inactive genes longer than 100 kb, as detected for VH10 cells (samples defined in (a)) and for HF1 cells (hTERT-immortalised wild-type human fibroblasts) using the nascent RNA-seq data (Bru-seq) generated by Andrade-Lima et al.

Supplementary Fig. 9 De novo wave release upon UV genotoxic stress does not promote specific stalling of RNAPII at exon start loci. a Heatmaps illustrating the distribution of RNAPII-ser2P reads aligned around a representative subset of exon start loci spanning active genes (see also Methods and Supplementary Fig. 8c) before (NO UV) and after UV irradiation (+UV, 8 J m⁻²), sorted from left to right by increasing distance relative to the TSS. Exon start loci were then clustered according to their position relative to RNAPII-ser2P wave-front positions. Upstream clusters (i.e. I, II, III for +2 h) are enriched for de novo released RNAPII, and downstream clusters (i.e. IV, V, VI for +2 h) for pri-elongating molecules (see Fig. 4 and Supplementary Fig. 8c). Loci near PPP-specific RNAPII signal were not considered for analysis. b Average plots of read densities (Rd) mapped in (a) for individual clusters (n is indicated).
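The upstream/downstream split of exon start loci in Supplementary Fig. 9a can be sketched as a comparison of each locus with the wave-front position at a given time point. The function name is hypothetical. The fixed distance cut-off for excluding loci near the pause region is also an assumption; the legend only states that loci near PPP-specific signal were excluded.

```python
def split_by_wave_front(exon_starts_bp, wave_front_bp, exclude_within_bp=0):
    """Hypothetical split of exon-start loci (distance from TSS, bp) into
    loci behind the wave front (upstream: de novo released RNAPII) and
    loci ahead of it (downstream: pri-elongating RNAPII, per the legend)."""
    upstream, downstream = [], []
    for pos in sorted(exon_starts_bp):
        if pos < exclude_within_bp:
            continue  # assumed proxy for loci near the promoter-proximal pause
        (upstream if pos < wave_front_bp else downstream).append(pos)
    return upstream, downstream
```

Finer clusters, such as I to VI, could be obtained by further binning each side by distance to the front.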

Flow cytometry
Cell cycle analysis was performed by propidium iodide (PI) staining. Cells were plated in 10 cm plates 48 h prior to serum deprivation and were either released or not from serum deprivation by adding FBS to a final concentration of 10%. Cells were collected, fixed in a solution of PBS, 0.1% glucose and 70% ethanol, and kept for 1 day at -20 °C. After fixation, cells were centrifuged, washed once with PBS and stained with 50 μg/ml PI (Sigma). Next, 100 μg/ml RNase A (Thermo Fisher Scientific) was added, and the cells were immediately transferred into FACS tubes and incubated on a platform shaker in the dark for 40 min. PBS was added to terminate the reaction. Signals were acquired with a FACSCanto II flow cytometer (BD Biosciences) and analysed with FACSDiva software (version 6.0; BD Biosciences). Percentages of cells in each cell-cycle phase were determined for cells starved (+0.5% FBS) for 72 h and for cells released into complete medium (+10% FBS) for 3 to 5 h.
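PI fluorescence scales with DNA content, so G1 cells form a peak at 1× and G2/M cells at about 2× the G1 intensity, with S-phase cells in between. The minimal sketch below assigns events to phases with fixed gates relative to the G1 peak. The gate widths and the function name are assumptions for illustration. The actual percentages were obtained with FACSDiva, and dedicated cell-cycle models fit overlapping distributions rather than using hard gates.

```python
def cell_cycle_percentages(pi_values, g1_peak, tol=0.15):
    """Assign events to phases by PI intensity relative to the G1 peak
    (G1 ~ 1x, G2/M ~ 2x, S in between). Returns percentages per phase."""
    if not pi_values:
        raise ValueError("no events")
    counts = {"G1": 0, "S": 0, "G2/M": 0, "other": 0}
    for v in pi_values:
        ratio = v / g1_peak
        if abs(ratio - 1.0) <= tol:
            counts["G1"] += 1
        elif abs(ratio - 2.0) <= 2 * tol:
            counts["G2/M"] += 1
        elif 1.0 + tol < ratio < 2.0 - 2 * tol:
            counts["S"] += 1
        else:
            counts["other"] += 1  # sub-G1 debris or aggregates beyond G2/M
    total = len(pi_values)
    return {phase: 100.0 * n / total for phase, n in counts.items()}
```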

ChIP-seq
At least two independent cultures of cells per condition were cross-linked with 1% (v/v) formaldehyde for 10 min at room temperature in culture medium, and the reaction was stopped by the addition of glycine (125 mM final) … for 1 h at 55 °C, DNA was purified using Ampure XP beads (Agencourt) according to the manufacturer's protocol without size selection (bead-to-sample ratio of 1.8).

Damaged-DNA immunoblot analysis (dot- or slot-blot)
ChIPed and Input DNA, purified and quantified by Qubit assay (as detailed in the 'ChIP' section), were denatured by boiling for 5 min and then placed on ice in a final concentration of 6× SSC buffer. Samples were then spotted onto a TE-washed LI-COR Odyssey nitrocellulose membrane (LI-COR Biosciences) using the MINIFOLD I slot-blot system as previously described, with minor modifications 3 . Wells were then washed three times in 6× SSC + 0.7× TE buffer. The membrane was incubated at 80 °C for 2 h and blocked for 1 h with a 1:1 solution of PBS and Odyssey Blocking Buffer (OBB, LI-COR Biosciences). The analysis was performed as described, with minor modifications: the membrane was incubated overnight at 4 °C with an antibody against CPDs (CosmoBio USA, CAC-NM-DND-001, Lot: TM C-05) and washed with PBS:OBB (1:1) buffer. Signal intensities were recorded as above for Western blots and quantified using Image Studio software (LI-COR Biosciences). Blots shown are representative of at least two independent biological replicates.
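Two small calculations underlie this assay: the SSC dilution needed to reach 6× final, and a comparison of CPD signal between ChIPed and Input DNA loaded in known amounts. The minimal sketch below assumes a 20× SSC stock and background-subtracted Image Studio intensities. The function names and the per-ng ratio are illustrative assumptions, not the quantification reported here.

```python
def ssc_stock_volume(final_volume_ul, stock_x=20.0, final_x=6.0):
    """Volume of SSC stock giving final_x SSC in final_volume_ul (C1V1 = C2V2).
    Assumption: a 20x SSC stock."""
    return final_volume_ul * final_x / stock_x


def relative_cpd_signal(signal_ip, ng_ip, signal_input, ng_input):
    """CPD signal per ng of ChIPed DNA relative to the same measure for Input.
    Assumes intensities are already background-subtracted."""
    return (signal_ip / ng_ip) / (signal_input / ng_input)
```

For instance, a 200 μl sample needs 60 μl of 20× SSC to reach 6× final.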

mRNA-seq
Total RNA from two independent cultures of serum-starved (synchronised) cells released into 10% FCS was extracted using Trizol reagent (Life Technologies), and genomic DNA was removed with Turbo DNase (Ambion).

Nascent RNA fluorescence microscopy (EU-Fluo)
For fluorescence microscopy of nascent RNA (EU-Fluo), cells were incubated in duplicate with EU (0.1 mM) for 120 min at different times before and after UV irradiation, as indicated in the figure legend and as previously described 4 . Briefly, after EU incubation, cells were washed with PBS, fixed and permeabilised. The click reaction was performed for 1 h at RT in the presence of 25 μM Alexa Fluor® 594 azide, 10 mM sodium ascorbate, 4 mM copper(II) sulfate and RNaseOUT (Thermo Scientific). Results shown are representative of three independent biological replicates.
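The click reaction components are given as final concentrations. The volume of each stock then follows from C1V1 = C2V2, with the rest of the reaction made up with buffer. In the minimal sketch below, the stock concentrations (10 mM azide, 1 M ascorbate, 100 mM CuSO4) and the function name are hypothetical. RNaseOUT is omitted because it is dosed in units rather than molar concentration.

```python
def click_mix_volumes(total_ul, final_conc, stock_conc):
    """Stock volume (uL) per component for a click reaction of total_ul,
    using C1V1 = C2V2. Units must match per component in both dicts."""
    volumes = {name: total_ul * final_conc[name] / stock_conc[name]
               for name in final_conc}
    buffer_ul = total_ul - sum(volumes.values())
    if buffer_ul < 0:
        raise ValueError("stocks too dilute for the requested final concentrations")
    volumes["buffer"] = buffer_ul
    return volumes


# Final concentrations from the protocol; stock concentrations are assumed.
final = {"azide_uM": 25, "ascorbate_mM": 10, "CuSO4_mM": 4}
stocks = {"azide_uM": 10_000, "ascorbate_mM": 1_000, "CuSO4_mM": 100}
print(click_mix_volumes(500.0, final, stocks))
```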

CPDIP-seq
Mapping of CPD lesions was achieved by modifying the MeDIP-seq protocol (https://www.epigenesys.eu/images/stories/protocols/pdf/20111026125309_p33.pdf). Briefly, cells were cross-linked immediately after UV irradiation (8 J/m²) and sheared Input DNA was purified as described in the 'ChIP' section. Purified dsDNA was denatured and the resulting single-stranded (ss)DNA was immunoprecipitated as follows. Protein G Dynabeads were washed and blocked in PBS-BSA 0.1% ('ChIP' section) before being incubated with anti-CPD antibody (Anti Cyclobutane Pyrimidine Dimers, clone TDM-2, cat: NMDND001, lot TMC-05) and ssDNA for 2 h at 4 °C in 1× CPDIP buffer (10 mM Na-phosphate buffer pH 7.0, 140 mM NaCl, 0.05% Triton X-100). The 1 M Na-phosphate stock was prepared fresh by adding 39 ml of 2 M monobasic sodium phosphate (NaH2PO4, 276 g/L) to 61 ml of 2 M dibasic sodium phosphate (Na2HPO4, 284 g/L). Samples were then washed three times in CPDIP buffer, and DNA was eluted and precipitated. A dot-blot was performed to measure the enrichment of pulled-down DNA in UV photolesions (CPDs) compared with the CPD levels detected in similar or indicated amounts of Input DNA (Qubit; see figure legends). CPDIP-bound and Input DNA were checked by qPCR (data not shown) performed in duplicate reactions (see 'ChIP' section). The remaining CPDIP material was then used for library preparation and sequencing after the ssDNA was converted to dsDNA as described in the mRNA-seq section.
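The g/L values for the phosphate stocks follow from molarity × molar mass. The minimal sketch below checks them and computes the dilution to the 10 mM working concentration. The hydration states (monobasic monohydrate, 137.99 g/mol; anhydrous dibasic, 141.96 g/mol) are inferred from the stated 276 and 284 g/L and are not given in the protocol. The function names are hypothetical.

```python
# Molar masses inferred from the stated g/L values (assumption).
MW_NAH2PO4_H2O = 137.99  # g/mol, sodium phosphate monobasic monohydrate
MW_NA2HPO4 = 141.96      # g/mol, sodium phosphate dibasic, anhydrous


def grams_per_litre(molarity: float, mw: float) -> float:
    """Mass of salt per litre for a given molar concentration."""
    return molarity * mw


def dilution_volume_ml(final_volume_ml: float, final_mm: float, stock_mm: float) -> float:
    """Stock volume for a target concentration (C1V1 = C2V2)."""
    return final_volume_ml * final_mm / stock_mm


print(round(grams_per_litre(2.0, MW_NAH2PO4_H2O)))  # ~276 g/L
print(round(grams_per_litre(2.0, MW_NA2HPO4)))      # ~284 g/L
print(dilution_volume_ml(50.0, 10.0, 1000.0))      # ml of 1 M stock per 50 ml buffer
```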

Reads alignment and normalisation
ChIP-seq, CPDIP-seq and Input-seq reads were quality-trimmed and Illumina adaptors were removed with the cutadapt tool 5 , keeping resulting reads of at least 20 bp (cutadapt -a, -q and -m parameters). Reads were then aligned uniquely to the GRCh37/hg19 reference genome using bwa (version 0.7.12) 6 , allowing up to two mismatches (bwa aln default parameters). To gain specificity on tag-enriched regions and increase sensitivity, non-redundant reads were complemented with a fraction of redundant reads 7 for normalising each sample of the set analysed during the experiment. For simplicity, only the steady-state Input is shown; it was used as a representative sample among all sequenced Inputs (high correlation scores, data not shown). For nRNA-seq, reads were also quality-trimmed and adaptor-clipped as above. To remove rRNA reads, the remaining reads were first aligned to the human ribosomal DNA complete repeating unit (U13369.1), allowing up to 1 mismatch, using Bowtie 8 (version 1.1.0) (bowtie -n 1 -k 1 -m 1 and --un parameters). The unmapped reads were then aligned uniquely to the GRCh37/hg19 reference genome allowing up to 2 mismatches (bowtie -n 2 -k 1 and -m 1 parameters). For nRNA-seq, both redundant and non-redundant intronic reads were considered to increase the dynamic range of the analysis and to limit the background of unlabelled mature RNA 9,10 . For preDRB-nRNA-seq (with DRB treatment) experiments, only non-redundant intronic reads were considered (sort -k1,1 -k2g,2 -u Unix command). This limits the effect of variability in transcript coverage, which may generate artefacts during PCR amplification of the libraries, as previously suggested 11,12 . For mRNA-seq, reads were mapped with Tophat 2.0.12 13 (tophat2 --library-type fr-unstranded --keep-
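The `sort -k1,1 -k2g,2 -u` step keeps one read per (chromosome, start) key. The minimal sketch below approximates it for tab-separated BED-like lines under C-locale ordering. It keeps the first line seen per key; exactly which duplicate `sort -u` retains is not modelled. The sketch also includes the RPM and RPKM normalisations used for densities and expression levels. The function names are hypothetical, and this is not the authors' pipeline code.

```python
def non_redundant(bed_lines):
    """Keep one line per (chromosome, numeric start), sorted by chromosome
    (byte order, as in the C locale) then numerically by start; approximates
    `sort -k1,1 -k2g,2 -u`. Strand is ignored, as in that command."""
    kept = {}
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], float(fields[1]))
        if key not in kept:
            kept[key] = line
    return [kept[k] for k in sorted(kept)]


def rpm(count, total_mapped):
    """Reads per million mapped reads."""
    return count * 1e6 / total_mapped


def rpkm(count, length_bp, total_mapped):
    """Reads per kilobase of feature per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped)
```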