Transcriptional overlap links DNA hypomethylation with DNA hypermethylation at adjacent promoters in cancer

Tumor development involves alterations in DNA methylation patterns, which include both gains (hypermethylation) and losses (hypomethylation) in different genomic regions. The mechanisms underlying these two opposite, yet co-existing, alterations in tumors remain unclear. While studying the human MAGEA6/GABRA3 gene locus, we observed that DNA hypomethylation in tumor cells can lead to the activation of a long transcript (CT-GABRA3) that overlaps downstream promoters (GABRQ and GABRA3) and triggers their hypermethylation. Overlapped promoters displayed increases in H3K36me3, a histone mark deposited during transcriptional elongation and known to stimulate de novo DNA methylation. Consistent with such a processive mechanism, increases in H3K36me3 and DNA methylation were observed over the entire region covered by the CT-GABRA3 overlapping transcript. Importantly, experimental induction of CT-GABRA3 by depletion of DNMT1 DNA methyltransferase, resulted in a similar pattern of regional DNA hypermethylation. Bioinformatics analyses in lung cancer datasets identified other genomic loci displaying this process of coupled DNA hypo/hypermethylation, and some of these included tumor suppressor genes, e.g. RERG and PTPRO. Together, our work reveals that focal DNA hypomethylation in tumors can indirectly contribute to hypermethylation of nearby promoters through activation of overlapping transcription, and establishes therefore an unsuspected connection between these two opposite epigenetic alterations.

Cancer development is driven in part by the accumulation of epigenetic alterations, which render chromatin permissive to changes in gene expression patterns. As a result, tumor cells acquire increased plasticity, thereby facilitating their evolution towards full malignancy 1 . Epigenetic alterations concern in particular DNA methylation, a chemical modification of cytosines in CpG sequences that is associated with long-term transcriptional repression 2 . DNA methylation changes in tumors include gains (hypermethylation) within normally unmethylated gene promoters, and at the same time losses (hypomethylation) in other genomic sequences 3 . The mechanisms underlying these contrasting changes in DNA methylation patterns in tumors are only partially elucidated, and evidence so far suggest that DNA hypermethylation and hypomethylation result from two independent processes 4,5 .
DNA hypermethylation has a well-established role in tumor progression, as it can lead to irreversible silencing of genes with tumor suppressor functions 6 . DNA hypomethylation on the other hand, appears to promote tumor development by increasing genomic instability, and by inducing ectopic activation of genes with oncogenic functions 7 . Although genes ectopically activated in hypomethylated tumors were reported to display different tissue specificities 8 , most of them were found to belong to the class of so-called "cancer-germline" (CG) genes, as their expression in healthy adults is normally restricted to testicular germ cells 9 . It has indeed been demonstrated that CG genes rely primarily on DNA methylation for repression in non-expressing cells, and that DNA demethylation is a sufficient trigger for their activation in a variety of tumors [10][11][12] . Evidence has accumulated indicating that some CG genes contribute to tumor progression, notably by encoding proteins that regulate processes of cell proliferation, death resistance, metabolic adaptation, and DNA repair 13,14 .
Recently, we discovered a novel CG transcript (CT-GABRA3) showing DNA hypomethylation-dependent activation in a variety of tumors, including melanoma and lung cancer 15,16 . The CT-GABRA3 transcript is noncoding, extends over a large distance (530 kb), and overlaps the Gamma-Aminobutyric Acid Type A Receptor Vertical bars indicate location of CpG sites with positions relative to the transcription start site. Open and filled squares represent unmethylated and methylated CpG sites, respectively, and each row represents a single clone. CT-GABRA3 expression status (+) or (−) in melanoma cell lines is indicated. (C) Melanoma tissue samples from the TCGA were grouped according to CT-GABRA3 expression status (inferred from RNA-seq data), and the methylation level of three CpG sites embedded within the GABRA3 5′-region (position relative to TSS) were determined through the analysis of Infinium methylation data (probe intensity ratio). *** Welch's t-test, adjusted p-value < 0.001 (D) Methylation level of CpG sites within the MAGEA6/CT-GABRA3 and GABRA3 5′-regions in lung adenocarcinoma (LUAD) cell lines (CpG positions are expressed relative to the TSS). Methylation levels were calculated on the basis of Methyl-seq data from the DBTSS database. CT-GABRA3 expression status in LUAD cell lines was inferred from RNA-seq data. (E) The same analysis as described in D was applied to lung adenocarcinoma samples from the TCGA. ** and ***Mann-Whitney test, adjusted p-value < 0.01 and < 0.001, respectively.  2C). Analysis of TCGA methylomic datasets confirmed the existence of a similar profile of regional DNA hypermethylation, limited to the CT-GABRA3 transcription unit, in melanoma and lung adenocarcinoma tissue samples (Fig. 2D).

Scientific Reports
Experimental evidence demonstrating that DNA hypomethylation/activation of CT-GABRA3 induces de novo methylation of overlapped CpGs. So   www.nature.com/scientificreports/ were only correlative. To establish a direct link between these two events, we explored experimental results obtained by O'Neill and colleagues 23 , who generated three immortalized human fibroblast clones (of male origin) in which DNMT1 DNA methyltransferase was depleted following stable transfection of a specific shRNA vector (Fig. 3A). Interestingly, the authors reported that experimental depletion of DNMT1 resulted not only in losses, but also in gains of DNA methylation 23 . We decided to explore O'Neill's datasets to find out the changes that occurred within the MAGEA6/GABRA3 locus. Previous reports demonstrated that DNMT1 plays a key role in maintaining silencing of CG genes [24][25][26] . Consistently, cDNA microarray data revealed concurrent upregulation of MAGEA6 and (CT-)GABRA3 (microarray probes do not distinguish CT-GABRA3 and GABRA3 variants) in DNMT1-depleted cell clones (Fig. 3B). We then analyzed Infinium methylation assay datasets generated for the different groups of cells to evaluate methylation levels of CpGs located in either the CT-GABRA3 transcription unit (region B) or in the neighboring regions (region A and C, see Fig. 2B). The results revealed that, compared with the control cell line, all three DNMT1-depleted cell clones displayed significant increases of CpG methylation within the CT-GABRA3 transcription unit (region B, Fig. 3C). Methylation levels of CpGs located in adjacent regions A and C, remained instead unchanged. Together these results demonstrate that hypomethylation/activation of CT-GABRA3 is linked with a process of de novo methylation of CpGs located within its transcription unit.
CT-GABRA3 transcription in LUAD cells correlates with regional increases in H3K36me3. We next searched to determine if CT-GABRA3 transcription also modifies histone marks within the overlapped genomic region. To this end, we analyzed ChIP-seq datasets of LUAD cell lines via the DBTSS platform, in order to evaluate the level of various histone marks around the transcription start site of GABRA3. These analyses revealed that CT-GABRA3 transcription in LUAD cells correlates with a decrease in H3K27me3 (Fig. 4A). This is consistent with initial presence of this mark, and its loss upon DNA hypermethylation through the previously described process of "epigenetic switch ", whereby promoter silencing shifts from a H3K27me3-mediated to a DNA methylation-mediated mechanism (supplementary Fig. S4, and 27 ). Strikingly, an enrichment in H3K36me3 within the 5′-region of GABRA3 was also observed (Fig. 4A). We observed no significant change in the repressive histone mark H3K9me3. Activating histone marks H3K4me3 and H3ac remained low in both CT-GABRA3-positive and -negative cell lines (Fig. 4A), consistent with the fact that GABRA3 is repressed in the two groups of cell lines. Similar observations were made for the GABRQ promoter, although the promoter showed a significant decrease of H3K9me3 levels in CT-GABRA3-expressing cell lines (supplementary Fig. S5). H3K36me3 is classically enriched over the body of actively transcribed genes, as it is deposited along with the transcription machinery. It has been shown that H3K36me3 favors local DNA methylation by attracting DNMT3 methyltransferases 17,19,21 . Inspection of the distribution of H3K36me3 within the entire 530 kb genomic region covered by CT-GABRA3 transcription, revealed that enrichment of this histone mark in CT-GABRA3-positive LUAD cell lines already becomes apparent 15 to 20-kb downstream of the transcription start site, and extends up into the GABRA3 promoter and beyond (Fig. 4B). Examination of ChIP-seq signals in neighboring segments (regions A and C, see Fig. 2B), indicated that increases in H3K36me3 were limited to the region overlapped by CT-GABRA3 transcription (region B, Fig. 4C). Together, these observations suggest that CT-GABRA3 transcription is accompanied by deposition of the H3K36me3 histone mark, and leads thereby to increased susceptibility of the entire transcription unit to DNA hypermethylation. This model explains how DNA hypomethylation, and concurrent transcriptional activation, can be connected with hypermethylation of adjacent promoters (Fig. 4D).

Other gene promoters displaying DNA hypermethylation in association with overlapping transcription in lung adenocarcinoma cells. An important issue was to determine if genes other than
GABRA3 and GABRQ, and in particular tumor suppressor genes, rely on a similar process of DNA hypomethylation-induced overlapping transcription to become hypermethylated in tumors. To this end, we examined the RNA-seq and Methyl-seq data obtained from LUAD cell lines (DBTSS) by applying a computational selection procedure to identify genomic loci that displayed the following features: i) ectopic activation in at least one LUAD cell line of a transcript that is not expressed in normal lung, ii) the ectopic transcript overlaps one or several downstream promoter(s) in either sense or anti-sense orientation, iii) the downstream overlapped promoter(s) (OPr) is(are) unmethylated in normal lung, and its hypermethylation is correlated with activation of the overlapping transcript (OTr) (Fig. 5A). This led to a list of 35 genomic loci, besides that containing GABRA3 and GABRQ. In three of these loci, activation of the overlapping transcript was correlated with DNA hypermethylation in not only one but two overlapped genes. Hence, our search identified 38 genes showing promoter hypermethylation in association with activation of an overlapping transcript in LUAD cell lines (Fig. 5B, supplementary Table S1). Overlapped promoters were located 2 kb to 128 kb downstream of the OTr transcription start site, in either sense or antisense orientation, and generally contained a high density of CpGs ( Fig. 5B,C). Moreover, examination of ChIP-seq data revealed that 87% of the overlapped promoters displayed significant enrichment of H3K36me3 in the LUAD cell lines that express the corresponding overlapping transcript (Pearson correlation coefficient > 0.5, adjusted p-value < 0.05; Fig. 5B), thereby supporting the involvement of a silencing mechanism similar to that described for GABRQ and GABRA3 (Fig. 4D). Interestingly, eight among the overlapped hypermethylated genes (WT1, PAX6, GNAS, EPB41L1, CSMD1, CPEB1, RERG, and SMAD6) were previously reported to exhibit tumor suppressive functions. DNA demethylation accounts for the ectopic activation of several overlapping transcripts. We next searched to determine if DNA hypomethylation accounted for activation of the overlapping transcripts in the genomic loci we selected. To this end, we first exploited bisulfite-seq data from normal human tissues in order to sort out OTrs that have their promoter initially methylated in normal lung ( me CpG ≥ 50%, Fig. 5B).    www.nature.com/scientificreports/ In addition, we examined Methyl-seq data from DBTSS, in order to identify OTrs that show significant association between activation and promoter demethylation among the 26 LUAD cell lines (Fig. 5B). Seven OTr genes (besides CT-GABRA3) fulfilled the two criteria, and were therefore considered as being DNA methyla-     CpG density (-400 to +400 relative to TSS)  www.nature.com/scientificreports/ tion dependent. Importantly, 6 out of these 7 genes displayed typical "cancer-germline" features, i.e. preferential expression and promoter demethylation in testis (and for some in placenta), as well as sensitivity to induction upon treatment with the DNA demethylating agent 5′-aza-deoxycytidine (5-azadC) (Fig. 5B, supplementary Fig. S6 and S7). Moreover, 6 of these OTr genes contained an intermediate density of CpGs within their 5′ region (Fig. 5C), a recognized characteristic of DNA methylation-regulated gene promoters 28 . For 7 other OTr genes, high promoter methylation was observed in normal lung, but Methyl-seq data in LUAD cell lines were lacking. Dependency on DNA methylation could therefore not be determined for these genes. The remaining 17 OTr genes were considered to be regulated by mechanisms not involving DNA methylation (Fig. 5B). Together, our selection procedure in LUAD cell lines led to the identification of 7 genes besides GABRQ and GABRA3 (ECH1, ZNF815P, AGO1, RERG, IGHVII-44-2, CNTNAP4, CSMD1) that become hypermethylated in lung tumor cells most likely through a process of DNA hypomethylation-induced overlapping transcription. For RERG, which initially displays elevated mRNA levels in normal lung (mean TPM = 16.38, GTEx), we were able to show that its expression was significantly downregulated in the LUAD cell lines that express the overlapping transcript (supplementary Fig. S8). Moreover, we found that hypermethylation of many other genes are also associated with transcriptional overlap (Fig. 5B), but in these cases activation of the overlapping transcript did not appear to be due to promoter DNA demethylation. Examination of RNA-seq data with the Splice Junctions analysis tool of the Integrative Genome Viewer (IGV) confirmed the presence of a transcript overlapping PTPRO and RERG promoters in testis and several LUAD cell lines (Fig. 6A). The OTr was therefore named CT-RERG (Cancer-Testis RERG). RT-PCR experiments and RNA-seq data in healthy tissues revealed that CT-RERG is expressed not only in testis but also in placenta (Fig. 6B, and supplementary Fig. S6), a feature shared by several CG genes (Fig. 6B). Despite the presence of the entire RERG open reading frame in the CT-RERG mRNA, this transcript variant appeared as a poor substrate of RERG protein translation, probably due to the presence of short upstream open reading in the specific 5′ exons (supplementary Fig. S9).

DNA hypomethylation-induced transcriptional overlap is linked with promoter hypermethylation of
Analysis of bisulfite-seq datasets showed that CpG sites located around the TSS (−/+ 400 bp) of CT-RERG are highly methylated in normal somatic tissues and instead almost completely unmethylated in sperm (Fig. 6C). Evidence supporting a primary role of DNA methylation in the regulation of CT-RERG was provided by the observation that this transcript was induced upon 5-azadC treatment in all of three lung tumor cell lines, which were available in our lab and did not initially express the gene (Fig. 6D).
As for CT-GABRA3, we observed that transcriptional activation of CT-RERG in LUAD cell lines was accompanied by increases in H3K36 tri-methylation over the entire 240 kb-long transcription unit, while neighboring regions remained unaffected (Fig. 6E,F). Consistently, whereas the CT-RERG transcription unit (region B) showed increased DNA methylation in expressing LUAD cell lines, neighboring regions A and C displayed instead reduced DNA methylation levels in these cell lines (Fig. 6G). Analyses of the RERG locus were further extended to in vivo tumor samples. Examination of TCGA datasets demonstrated that CT-RERG transcription is significantly correlated with PTPRO and RERG promoter hypermethylation in lung adenocarcinoma tissues (Fig. 6H). Since PTPRO has also been reported to exert a tumor suppressor function in hepatocellular carcinoma 31 , we analyzed corresponding TCGA datasets to verify association between CT-RERG expression and PTPRO hypermethylation in this tumor type. The results confirmed increased frequencies of PTPRO hypermethylation in the hepatocellular carcinoma samples that express CT-RERG (Fig. 6I). Together these observations confirm that DNA hypomethylation is associated with CT-RERG expression, and consequently with an increased propensity for PTPRO and RERG promoters to become hypermethylated. We noticed however, that a few CT-RERG-negative tumor samples nevertheless displayed DNA hypermethylation of PTPRO and RERG (Fig. 6H,I), thereby suggesting that transcriptional overlap may not be the only mechanism directing epigenetic silencing onto these promoters.

Discussion
It is currently proposed that DNA hypomethylation contributes to tumor progression by inducing genome instability, and by activating genes with oncogenic potential 13,33 . Our study now raises the interesting, and paradoxical, possibility that it also favors tumor development by contributing indirectly to the repression of tumor suppressor genes. We found indeed that focal DNA hypomethylation in tumor cells can lead to aberrant activation of transcripts that overlap downstream promoters and trigger their hypermethylation. Our work establishes therefore an unrecognized connection between DNA hypomethylation and DNA hypermethylation in tumors. This epigenetic coupling, however, applies to discrete genomic sites, and is therefore compatible with the accepted notion that genome-wide DNA hypomethylation is not associated at the global level with higher frequencies of DNA hypermethylation events 5 . Tumor-type specific patterns of DNA hypomethylation and hence of overlapping transcript activation, may instead be partly responsible for the selectivity of DNA hypermethylation events that is observed among tumors of different origins 34,35 . For instance, hypermethylation of PTPRO and RERG promoters was only occasionally detected in renal carcinoma, a tumor type known to display infrequent hypomethylation and activation of CG genes 12 , and in which we found seldom activation of CT-RERG (supplementary Fig. S10). www.nature.com/scientificreports/ tumors, the same process contributes to hypermethylation of alternative promoters that are embedded in the body of transcribed genes 37 . Our results support this concept, and provide a list of promoters that undergo hypermethylation in association with overlapping transcription in lung tumor cells. We show that overlapping transcripts sometimes initiate from a distant site, and are oriented in either sense or antisense direction compared with the hypermethylated promoter. Although overlapping transcription is an efficient way of inducing de novo methylation of downstream CpGs, our analyses showed that overlapped promoters sometimes remained unmethylated. Lack of GABRQ/GABRA3 and PTPRO/RERG promoter hypermethylation was indeed observed in a fraction of tumor samples that nevertheless produced the overlapping transcripts. Hypermethylation of these promoters was also absent in testicular germ cells, where overlapping transcripts are expressed (supplementary Fig S11). It is therefore likely that, under certain conditions, promoters can resist overlapping transcription-induced DNA hypermethylation. Such a mechanism of resistance was previously reported in differentiating embryonic stem cells, and was correlated with elevated transcriptional activity of the overlapped promoter 17 . Moreover, we hypothesize that overlapped promoters are at higher risk of becoming hypermethylated in tumors that exhibit molecular imbalances favoring de novo DNA methylation, for instance through exacerbated activities of DNMT methyltranferases or impaired functioning of TET demethylases [38][39][40][41] .

Scientific Reports
Transcripts that overlap GABRA3 and RERG promoters are in sense orientation, and share all coding exons with the corresponding overlapped genes. Our analyses revealed, however, that these overlapping transcripts do not produce the corresponding proteins, possibly due to the presence of short upstream ORFs in the specific 5′ exons (supplementary Fig. S9) and 16 . When activated, these non-coding overlapping transcripts do have therefore the potential to cause loss of function of the overlapped gene. A corollary to this observation is that analyses of transcriptomic data in tumors might in some cases suggest that a gene is activated, when in fact activation pertains to a non-coding overlapping transcript that actually leads to loss of function of the gene. This may partly explain previous observations linking DNA hypermethylation with transcriptional activation 42,43 . Hence, high-resolution analyses of transcriptomic and methylomic data are required in order clearly understand the links between DNA methylation changes and gene expression in tumors 44 .
Most of the DNA hypomethylation-induced overlapping transcripts we identified correspond to previously unreferenced RNA species with unknown function. It is therefore unclear if they exert pro-tumoral functions. CT-GABRA3, however, was shown to produce two miRNAs (miR-105 and miR-767) with oncogenic potential 16,45 , but the functional significance of the regulatory connection between CT-GABRA3 and GABRA3 remains unexplained.
It is hoped that a better understanding of the processes that underlie epigenetic alterations in tumors will lead to the development of novel tools for the diagnosis and therapy of cancer. In this line, establishing the pattern of expression of overlapping transcripts in tumor samples could serve to predict tumor suppressor genes that are at risk to become hypermethylated. Moreover, epigenetic anti-cancer therapies aiming at reactivating silenced tumor suppressor genes might benefit from the knowledge that some of these genes owe their hypermethylated status to a process of transcriptional overlap, and therefore to the specific contribution of druggable chromatin regulators, such as modifiers and readers of H3K36me3 marks.
RT-PCR and qPCR analyses. RNA of tissue samples was purchased from Ambion (Life Technologies).
RNA of cell lines was extracted using TriPure Isolation Reagent (Roche Diagnostics GmbH). Reverse transcription was performed on 2 μg of total RNA using M-MLV Reverse transcriptase and random hexamers (Invitrogen). For PCR reactions, we used the DreamTaq Kit (Thermo Fisher Scientific), incorporating 1/40 of the reverse transcription mixture in a final reaction volume of 20 μl. PCR reactions were visualized after electrophoresis in an ethidium bromide-stained agarose gel. For qPCR reactions, we used KAPA SYBR FAST (Sigma-Aldrich), incorporating 1/40 of the reverse transcription mixture in a final reaction volume of 10 μl. All reactions were carried out according to the manufacturer's instructions. All primers are listed in the supplementary table S2. Sodium bisulfite sequencing. Sodium bisulfite genomic sequencing of CT-GABRA3 and GABRA3 promoter regions was performed as previously described 16  (2) Data analyses: For regional DNA hypermethylation analysis, we studied the methylation status of all CpGs located between + 1 kb of the TSS and up to the 5′ end of the OTr (= region B: 530 kb for CT-GABRA3 and 240 kb for CT-RERG). Genomic segments of the same size were used to investigate CpG methylation levels in neighboring regions (regions A and C). For the upstream region A, CpGs located within 1 kb upstream from the TSS of the OTr were excluded from the analysis. For analyses in LUAD cell lines, we only retained CpGs for which the methylation status could be determined in > 70% of the cell lines.
ChIP-seq data collection and analysis. Hg38 ChIP-seq data for H3ac, H3K4me3, H3K9me3, H3K27me3, H3K36me3 histone marks (and input) of the 26 LUAD cell lines were downloaded from DBTSS v9. To quantify histone modifications within promoter (TSS+/− 1 kb) or genomic regions of interest in LUAD cell lines, we computed the total number of reads mapped to the corresponding genomic segment, divided by the sum of all reads generated in the same experiment, and multiplied by 10 6 to obtain Reads Per Million (RPM) values.

TCGA consortium datasets.
(1) Data collection: Normalized hg19 RNA-seq data with exon-level quantification and Infinium Human Methylation 450 K datasets for skin cutaneous melanoma (SKCM), lung adenocarcinoma (LUAD), liver hepatocellular carcinoma (LIHC), and kidney papillary cell carcinoma (KIRP) were downloaded from The Cancer Genome Atlas (TCGA) consortium 47 , using TCGAbiolinks v2.14.1 R-package 48 . Hg19 coordinates were converted to hg38 using liftOver v1.10.0 R package. Only unique primary (−01A) and metastatic (-06A) tumor samples, as well as unique normal adjacent tissues (−11A), for which both RNA-seq and Infinium methylation data were available, were analyzed. RNA-seq exon expression levels are expressed as Reads Per Kilobase per Million (RPKM). (2) Expression analyses: Since CT-GABRA3 and CT-RERG transcripts variants are not annotated in TCGA-derived datasets, we resorted to exon quantification to determine their expression status. Presence or absence of the canonical exon 1 allowed to distinguish CT-GABRA3 or CT-RERG transcript variants versus GABRA3 or RERG referenced transcripts, respectively. Thresholds were determined as follows: samples were considered positive for CT-GABRA3 expression if exon 1 displayed ≤ 0.1 RPKM and exon 2 ≥ 1 RPKM; CT-RERG expression was positive when exon 1 displayed ≤ 0.4 RPKM and exon 5 ≥ 1 RPKM. (3) DNA methylation analyses: For regional DNA hypermethylation analyses, Infinium methylation levels (beta values) were examined for all CpG probes located in regions A, B and C regions, demarcated as described here above.

DNMT1 depletion experiments of O'Neill's study. Illumina
HumanHT-12 V4.0 expression data (GSE90012) and Infinium Human Methylation 450 K data (GSE90011) were downloaded from NCBI Gene Expression Omnibus database. The following probes were used for expression analysis of the genes of interest: DNMT1 (ILMN_1715551), MAGEA6 (ILMN_2372681), and (CT-)GABRA3 (ILMN_1715551). For regional hypermethylation analyses, Infinium methylation levels (beta values) were examined for all CpG probes located in regions A, B and C regions, demarcated as described here above.
Bioinformatics workflow for the identification of overlapped promoter hypermethylation. Identification of genomic loci that harbor an activated transcript leading to hypermethylation of a downstream promoter in LUAD cell lines, was performed by using a pipeline conducted in Perl programming language (scripts are available upon request), and applied to RNA-seq and Methyl-seq data of LUAD cell lines (DNA Data Bank of Japan, DBTSS), as well as RNA-seq and WGBS data of normal lung (Sequence Read Archive of NCBI, ENCODE). Initial processing of these RNA-seq datasets was described above. Details on the procedures to select transcripts activated in LUAD cell lines and potential overlapped promoters, and to establish correlations between overlapping transcript expression and overlapped promoter methylation, are described in the supplementary methods.