MEG3 long noncoding RNA regulates the TGF-β pathway genes through formation of RNA–DNA triplex structures

Long noncoding RNAs (lncRNAs) regulate gene expression by association with chromatin, but how they target chromatin remains poorly understood. We have used chromatin RNA immunoprecipitation-coupled high-throughput sequencing to identify 276 lncRNAs enriched in repressive chromatin from breast cancer cells. Using one of the chromatin-interacting lncRNAs, MEG3, we explore the mechanisms by which lncRNAs target chromatin. Here we show that MEG3 and EZH2 share common target genes, including the TGF-β pathway genes. Genome-wide mapping of MEG3 binding sites reveals that MEG3 modulates the activity of TGF-β genes by binding to distal regulatory elements. MEG3 binding sites have GA-rich sequences, which guide MEG3 to the chromatin through RNA–DNA triplex formation. We have found that RNA–DNA triplex structures are widespread and are present over the MEG3 binding sites associated with the TGF-β pathway genes. Our findings suggest that RNA–DNA triplex formation could be a general characteristic of target gene recognition by the chromatin-interacting lncRNAs.

b. ChIP-qPCR result showing enrichment of EZH2 and H3K27me3 chromatin marks, represented as percentage of input, over the human alpha satellite repeats and the HOXA9 promoter, used as positive controls, in BT-549 cells 1,2 . The GAPDH promoter was used as a negative control and showed no enrichment in the EZH2 and H3K27me3 ChIP. DNase I treatment resulted in loss of enrichment over the human alpha satellite repeats and the HOXA9 promoter.
c. Coding potential probability of non-annotated transcripts, calculated using CPAT. Protein-coding mRNAs and known lncRNAs are shown alongside the non-annotated transcripts for comparison. Most of the non-annotated transcripts had a CPAT score below 0.37 (a CPAT score <= 0.37 indicates a non-coding transcript).
d. 1,046 lncRNAs (annotated and non-annotated) were associated with EZH2-specific (17,625) T-to-C conversion. 193 EZH2 enriched lncRNAs (out of 997) carry T-to-C transitions. The P value was obtained by performing a hypergeometric test using all the lncRNAs considered in our analysis.

c. ChRIP validation in actinomycin D untreated BT-549 cells: RT-qPCR data showing the enrichment of the selected annotated and non-annotated lncRNAs in the EZH2 and H3K27me3 ChRIP pulldowns compared to input. IgG was used as non-specific antibody in ChRIP pulldowns. Actin was used as a negative control. Data represent the mean ± SD of two independent biological experiments.

Supplementary Figure 4. Data related to RNA-seq and RT-qPCR validation.
a. Figure showing the overlap of the deregulated protein-coding genes identified by microarray and RNA sequencing after downregulation of MEG3 and EZH2 using siRNA in BT-549 cells. The P values were obtained by performing a hypergeometric test using all protein-coding genes as a background.
b. Pathway analysis of the overlapped genes from microarray and RNA-seq after MEG3 and EZH2 downregulation in BT-549 cells, using KEGG annotation.
c. RT-qPCR analysis of ACTC1, CNN1, and COL5A1 gene expression in BT-549 cells transfected with Ctrlsi alone or with Ctrlsi followed by incubation with TGF-β2 ligand (Ctrlsi+TGF-β ligand). Expression was also measured after MEG3si transfection alone or followed by incubation with TGF-β inhibitor (MEG3sh+TGF-βin). The bar graph shows relative quantification of expression (± SD, n = 3) compared to the Ctrlsi transfection.

b. The TGF-β pathway has a significant number of genes among the networks. Asterisks in red denote statistical significance.

a-o. Location of the MEG3 peaks associated with the TGF-β pathway genes relative to the transcription start site. The scale on the y-axis shows ChOP-seq intensities in log10 scale.

a. ChOP-qPCR validation of the MEG3 binding sites associated with the TGF-β pathway genes (± SD, n = 3). We detected enrichment of the MEG3 peak sequences only when ChOP was performed with antisense oligos. Pulldown with sense oligos or with non-specific GFP oligos did not show any enrichment. RNase A treatment of the sonicated chromatin prior to ChOP pulldown resulted in complete loss of enrichment. β-actin was used as a negative control.
b. ChIP-qPCR result showing enrichment (percentage of input) of EZH2 over the MEG3 peaks associated with the TGF-β genes in Ctrlsh and MEG3sh cells (± SD, n = 3).
c. Enhancer activity of the TGFBR1-associated MEG3 peak. The bar graph represents the relative normalized firefly luciferase activity of the DNA element (around 1,200 bp) containing the MEG3 peak associated with TGFBR1 in Ctrlsh and MEG3sh transduced cells. Firefly luciferase activity was normalized with Renilla luciferase and is presented as luciferase activity in MEG3sh cells relative to Ctrlsh cells (± SD, n = 3). The P value was calculated using Student's t-test. The schematic diagram above the graph shows the location and distance of the MEG3 peak from the TGFBR1 gene promoter. The small gray bar below the MEG3 peak indicates the cloned DNA fragment.
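
Several legends above report P values from a hypergeometric test. As a minimal sketch of how such an enrichment P value can be computed, the snippet below evaluates the upper tail P(X >= k) with exact integer arithmetic. The counts 1,046, 997 and 193 are taken from the legend on T-to-C conversion; the background size `HYPOTHETICAL_BACKGROUND` is an assumption made only for illustration, because the total number of lncRNAs considered is not stated in this section.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric draw without replacement.

    population: total number of items in the background
    successes:  number of background items carrying the feature
    draws:      number of items in the tested subset
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("successes and draws must lie between 0 and population")
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total


# Hypothetical background size: NOT given in the text, chosen for illustration.
HYPOTHETICAL_BACKGROUND = 10_000

p_value = hypergeom_upper_tail(
    k=193,                               # EZH2-enriched lncRNAs with T-to-C transitions
    population=HYPOTHETICAL_BACKGROUND,  # assumed number of lncRNAs considered
    successes=1_046,                     # lncRNAs associated with T-to-C conversion
    draws=997,                           # EZH2-enriched lncRNAs
)
print(f"P(X >= 193) under the assumed background: {p_value:.3g}")
```

The resulting P value depends strongly on the background size, so the real analysis should use the actual number of lncRNAs considered.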

Transcriptome reconstruction using nuclear RNA sample
Transcriptome reconstruction was performed with Cufflinks 7 on the nuclear RNA (Input) sample using the GENCODE v11 annotation 8 . For discovery of non-annotated transcripts we ran Cufflinks in guided mode (parameter: --GTF-guide), which uses RABT (Reference Annotation Based Transcript) assembly 9 . Transcripts were assembled with the standard SOLiD library type parameter fr-secondstrand for strandedness, and transfrags were merged if the distance between them was less than 100 bp (parameter: --overlap-radius). The assembled reads were also multi-read corrected during the Cufflinks procedure. Non-annotated transcripts that overlapped exons of known transcripts (GENCODE v11) were then removed, retaining only intronic and intergenic non-annotated transcripts.
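
The final filtering step above (removing non-annotated transcripts that overlap known exons) can be sketched as follows. This is an illustration only, not the pipeline actually used: the data layout, the function names and the toy coordinates are hypothetical, intervals are assumed to be half-open, and strand is ignored because the text does not say whether the overlap test was strand-specific.

```python
def intervals_overlap(start_a, end_a, start_b, end_b):
    """True if two half-open intervals [start, end) share at least one base."""
    return start_a < end_b and start_b < end_a


def keep_intronic_and_intergenic(novel_transcripts, known_exons):
    """Drop novel transcripts that have any exon overlapping a known exon.

    novel_transcripts: list of dicts with 'id', 'chrom', 'exons' [(start, end), ...]
    known_exons:       dict mapping chromosome -> list of (start, end) tuples
    """
    kept = []
    for transcript in novel_transcripts:
        exons_on_chrom = known_exons.get(transcript["chrom"], [])
        hits_known_exon = any(
            intervals_overlap(start, end, known_start, known_end)
            for start, end in transcript["exons"]
            for known_start, known_end in exons_on_chrom
        )
        if not hits_known_exon:
            kept.append(transcript)
    return kept


# Toy coordinates, invented for illustration.
known = {"chr1": [(100, 200), (500, 600)]}
novel = [
    {"id": "T1", "chrom": "chr1", "exons": [(250, 400)]},  # intronic: kept
    {"id": "T2", "chrom": "chr1", "exons": [(150, 300)]},  # overlaps an exon: dropped
    {"id": "T3", "chrom": "chr2", "exons": [(10, 50)]},    # intergenic: kept
]
retained_ids = [t["id"] for t in keep_intronic_and_intergenic(novel, known)]
print(retained_ids)
```

For genome-scale annotations the nested loop would be replaced by an interval index or a dedicated tool; the quadratic scan is kept here only for readability.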

Transcript abundance, differential expression analysis, finding T to C conversion and peak calling
The predicted known and non-annotated transcripts ( 14 . We have used Freebayes 0.6.3 (Bayesian genetic variant detector) 15 to identify conversion sites in our ChRIP RNA-seq samples and looked for T to C (Thymine to Cytosine) transitions.
We obtained a significantly higher number of T to C changes than of other transitions. T to C transitions were considered only if the read depth (total number of reads covering the transition position) was >= 2 16 . The filtered T to C transitions were mapped to the known lncRNAs and non-annotated transcripts identified by Cufflinks.
Trizol (Life Technologies). As a negative control, a reaction without PRC2 protein was used, which helps to detect background non-specific binding of RNA to the antibody and beads.
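
The T to C filtering described above (read depth >= 2) can be illustrated with a short sketch. The record layout, field names and toy calls are assumptions made for illustration and do not reproduce the output format of the variant caller; the sketch also assumes that calls are already expressed relative to the transcribed strand.

```python
from collections import Counter

MIN_DEPTH = 2  # minimum read depth per transition position, as stated in the text


def count_substitutions(calls, min_depth=MIN_DEPTH):
    """Count each (ref, alt) substitution type among calls passing the depth filter."""
    counts = Counter()
    for call in calls:
        if call["depth"] >= min_depth:
            counts[(call["ref"], call["alt"])] += 1
    return counts


def t_to_c_sites(calls, min_depth=MIN_DEPTH):
    """Return the T-to-C calls that pass the depth filter."""
    return [
        call
        for call in calls
        if call["ref"] == "T" and call["alt"] == "C" and call["depth"] >= min_depth
    ]


# Toy calls, invented for illustration.
toy_calls = [
    {"chrom": "chr14", "pos": 1001, "ref": "T", "alt": "C", "depth": 5},
    {"chrom": "chr14", "pos": 1050, "ref": "T", "alt": "C", "depth": 1},  # too shallow
    {"chrom": "chr14", "pos": 1100, "ref": "G", "alt": "A", "depth": 4},
    {"chrom": "chr14", "pos": 1200, "ref": "T", "alt": "C", "depth": 2},
]
filtered = t_to_c_sites(toy_calls)
substitution_counts = count_substitutions(toy_calls)
print(len(filtered), dict(substitution_counts))
```

Tallying all substitution types alongside the T to C sites is what makes it possible to compare T to C changes with the other transitions.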

Expression analysis by microarray and RNA sequencing after siRNA transfection
Trizol extracted RNA was used in RT-qPCR to detect the enrichment of the bound MEG3 RNA over the negative control.
The biotin-RNA pull-down assay was carried out as in Tsai et al. 20 with the following modifications. Biotin-labeled WT sense, antisense and Δ345-348 MEG3 RNAs were synthesized using biotin RNA labeling mix (Roche). The in vitro synthesized RNAs were incubated with nuclear lysate from BT-549 cells and then captured with streptavidin-magnetic beads.
Bound proteins were eluted in SDS buffer and detected by Western blot with anti-EZH2 antibody (Cell Signalling).

RNA In situ hybridization
A fluorescence-labeled probe to detect MEG3 was generated from a full-length MEG3 cDNA using the BioPrime Array CGH Genomic Labeling system (Invitrogen). BT-549 cells were grown in the Culturewell™ MultiWell cell culture system (Molecular Probes) for 36 hours. Fixation, probe hybridization, washing, image capturing and image processing were performed as in Reinius et al. 21 .

Chromatin oligo affinity precipitation (ChOP)
Biotin labeled antisense DNA probes against full length MEG3 were designed using online

Motif analysis and assignment of MEG3 peaks to genes using GREAT tool
Sequences within ±200 bp of the MEG3 peak summits were extracted, and motif analysis of these sequences was performed using MEME-ChIP 23 . Motifs with the lowest E-value (most significant) were considered. MEG3 peaks were assigned to genes with the GREAT tool, using a proximal domain of 5 kb upstream and 2 kb downstream of transcription start sites and a distal extension of up to 500 kb from gene promoters 24 .
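
The ±200 bp window extraction described above can be sketched as below. The function name, the 0-based half-open (BED-style) coordinate convention and the example positions are assumptions made for illustration; they are not taken from the original analysis.

```python
def summit_window(chrom, summit, flank=200, chrom_length=None):
    """Return a half-open (chrom, start, end) window of +/- flank bp around a summit.

    summit is a 0-based position; the window is clipped at the chromosome start
    and, when chrom_length is given, at the chromosome end.
    """
    start = max(0, summit - flank)
    end = summit + flank + 1  # +1 so the summit base itself is included
    if chrom_length is not None:
        end = min(end, chrom_length)
    return chrom, start, end


# Hypothetical summit positions, invented for illustration.
toy_summits = [("chr9", 1_000), ("chr9", 50)]
windows = [summit_window(chrom, pos) for chrom, pos in toy_summits]
for chrom, start, end in windows:
    print(f"{chrom}\t{start}\t{end}")
```

Each printed line is a BED-style interval whose sequence could then be extracted from the reference genome and passed to a motif finder.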

Chromatin Immuno precipitation (ChIP)
H3K4me1 ChIP in BT-549 cells was performed with 3 μg of anti-H3K4me1 antibody (Santa Cruz) using the protocol followed in Robertson et al. 25 . 100-300 ng of immunopurified DNA from the H3K4me1 ChIP was purified and sequenced on the Illumina platform as reported in Robertson et al. 26 . Sequencing reads were analyzed and the enriched regions were defined by following the criteria from Robertson et al. 25 . Bedtools was used to calculate the closest distance between

Visualization of the pathways using Cytoscape
The functional networks of the 300 genes, which were deregulated after MEG3 knockdown and were associated with at least one of the MEG3 peaks, were constructed with the help of CluePedia 29 on Cytoscape 30 .

Analysis of MEG3 expression in breast cancer subtypes
Expression levels of MEG3 and target genes were assessed in a batch-corrected compendium of seventeen Affymetrix U133A/plus 2 primary breast tumor and three cell line gene expression datasets 31 . Briefly, raw .cel files were downloaded from the NCBI GEO (GSE12276, GSE21653, GSE3744, GSE5460, GSE2109, GSE1561, GSE17907, GSE2990, GSE7390, GSE11121, GSE16716, GSE2034, GSE1456, GSE6532, GSE3494) or caBIG (geral-00143) repositories, summarized with the Ensembl alternative CDF 32 and normalized with RMA 33 , before integration using ComBat 34 to remove dataset-specific bias 35 . The intrinsic molecular subtypes were assigned based on the highest correlation to the Sorlie et al. 36 centroids for each subtype. To compare MEG3 expression in normal breast tissue and invasive ductal carcinoma, the dataset GSE10780 was downloaded from NCBI GEO 37 and pre-processed as above using the Ensembl aCDF and RMA normalization.
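
The subtype assignment by highest correlation to centroids can be illustrated with a minimal sketch. Pearson correlation is assumed here because the text does not name the correlation measure, and the centroid and sample values are invented toy numbers rather than the published centroids.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


def assign_subtype(profile, centroids):
    """Return the best-correlated subtype name and all per-subtype correlations."""
    scores = {name: pearson(profile, centroid) for name, centroid in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy centroids over four genes, invented for illustration.
toy_centroids = {
    "Basal": [2.0, -1.0, 0.5, -2.0],
    "Luminal A": [-1.5, 1.0, -0.5, 2.0],
}
toy_sample = [1.8, -0.7, 0.2, -1.9]
subtype, correlations = assign_subtype(toy_sample, toy_centroids)
print(subtype, correlations)
```

In practice the sample profile and the centroids must be restricted to the same gene set, in the same order, before the correlations are computed.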

Cell invasion assay
The invasion assay was performed using the BioCoat invasion chamber (BD Bioscience, 354480) following the manufacturer's instructions. In brief, cells were seeded at a density of 50,000-60,000 cells/ml in the upper chamber in DMEM medium supplemented with 1% serum. DMEM medium containing 10% serum was added to the lower chamber. The migrated cells were fixed and stained with the SNABB-DIFF Kit (Labex AB, Sweden) and counted under a light microscope.

Predicting Triplex Forming Oligo (TFO) and Triple-helix Target Site (TrTS)
We used Triplexator to predict Triplex Forming Oligos (TFOs) and Triplex Target Sites (TrTS) 38 .

Triplex capture assay
For the in vitro and in vivo triplex capture assays, we followed the protocol of Besch et al. 39
DNA isolation and purification following the same protocol described above.

CD spectroscopy
CD-spectra were recorded on a Jasco J-810 spectropolarimeter. Each spectrum is the average  Supplementary Fig. 13).

Chromatin conformation capture (3C)
3C was performed according to the published protocol 40 . In brief, 2% formaldehyde fixed chromatin from Ctrlsh or MEG3sh BT-549 cells was digested with EcoRI. After ligation and DNA purification, we analyzed the 3C interactions between the TGFBR1 associated distal MEG3 peak and the TGFBR1 promoter in Ctrlsh or MEG3sh BT-549 cells using the primers provided in Supplementary Data 10. In order to measure the 3C interactions accurately, the difference in efficiency between the primers was normalized with a control BAC clone covering the whole TGFBR1 locus. The BAC clone was digested with EcoRI followed by ligation to generate all possible random combinations of ligation products. This ligated DNA was then used to check the efficiencies of the different primer pairs. 3C-qPCR data was normalized with the control 3C primers to eliminate the minor difference in cross-linking between the cell lines and also with the GAPDH primers for input amount (loading control). Luminescence was measured 48 hours post-transfection using Dual-Glo Luciferase Assay System (Promega). Expression of firefly luciferase was normalized with Renilla luciferase and data was represented as relative expression compared to the Ctrlsh cells.
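
The luciferase normalization described in the last two sentences above (firefly signal divided by Renilla signal, then expressed relative to Ctrlsh cells) can be sketched as follows. The function names and the luminescence readings are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean, stdev


def normalized_ratios(readings):
    """Firefly/Renilla ratio for each replicate given (firefly, renilla) pairs."""
    return [firefly / renilla for firefly, renilla in readings]


def relative_activity(sample, control):
    """Mean and SD of the sample ratios expressed relative to the mean control ratio."""
    control_mean = mean(normalized_ratios(control))
    relative = [ratio / control_mean for ratio in normalized_ratios(sample)]
    return mean(relative), stdev(relative)


# Hypothetical raw luminescence readings (firefly, Renilla), n = 3 per condition.
ctrlsh_readings = [(1000.0, 100.0), (1200.0, 100.0), (1100.0, 100.0)]
meg3sh_readings = [(2100.0, 100.0), (2300.0, 100.0), (2200.0, 100.0)]

relative_mean, relative_sd = relative_activity(meg3sh_readings, ctrlsh_readings)
print(f"MEG3sh relative to Ctrlsh: {relative_mean:.2f} ± {relative_sd:.2f}")
```

Dividing by the Renilla signal first corrects for differences in transfection efficiency between wells, so the ratio to Ctrlsh reflects reporter activity rather than plasmid uptake.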

Immunofluorescence
BT-549 cells were plated on glass coverslips at a density of 50,000 to 60,000 cells/ml and allowed to attach for 24 hours. The cells were fixed with 3.7% formaldehyde (pH 7.4) in PBS for 15 minutes, followed by three PBS washes. The cells were permeabilized in PBS containing 0.25% Triton X-100 (PBST) for 10 minutes, followed by three PBS washes. Following permeabilization, the cells were treated with RNase A or with RNase H at 30°C for 20 minutes. Control cells were incubated at 30°C for 20 minutes in PBS without any treatment.