The role of epigenetic modifications, long-range contacts, enhancers and topologically associating domains in the regulation of glioma grade-specific genes

Genome-wide studies have uncovered specific genetic alterations, transcriptomic patterns and epigenetic profiles associated with different glioma types. We have recently created a unique atlas encompassing genome-wide profiles of open chromatin, histone H3K27ac and H3Kme3 modifications, DNA methylation and transcriptomes of 33 glioma samples of different grades. Here, we intersected genome-wide atlas data with topologically associating domains (TADs) and demonstrated that the chromatin organization and epigenetic landscape of enhancers have a strong impact on genes differentially expressed in WHO low grade versus high grade gliomas. We identified TADs enriched in glioma grade-specific genes and/or epigenetic marks. We found the set of transcription factors, including REST, E2F1 and NFKB1, that are most likely to regulate gene expression in multiple TADs, containing specific glioma-related genes. Moreover, many genes associated with the cell–matrix adhesion Gene Ontology group, in particular 14 PROTOCADHERINs, were found to be regulated by long-range contacts with enhancers. Presented results demonstrate the existence of epigenetic differences associated with chromatin organization driving differential gene expression in gliomas of different malignancy.

Testing the significance of the overlap between genes with differential expression and differential deposition of the epigenetic marks To verify the significance of the overlap of DEGs and genes with DEMs, we performed bootstrapping. From a pool of active genes (defined as those with a mean raw count across samples > 10), genes matching the number of: (1) obtained DEGs and (2) obtained genes with DEMs we randomly selected, and the overlap between them was calculated. Then, a random selection out of all, active genes of two sets of genes matching numbers of DEGs and genes with DEMs was repeated 100 times. In the next step, we calculated the number of times the overlap of the randomly selected DEGs and genes with DEMs was equal to or greater than the number of the originally overlapping genes.

Searching for correlation between H3K4me3 and H3K27ac marks and gene expression
In the case of DEGs, the Spearman correlation between expression levels and H3K4me3 and H3K27ac levels was calculated across samples. We compared the obtained correlation for DEGs with their epigenetic marks to randomly selected non-DEGs with their matching epigenetic marks. The randomly selected non-DEGs were sampled from active genes (average expression levels ≥ 10 raw counts across samples) under the condition that their mean expression level was at least the same as that of the DEGs, to avoid possible bias resulting in stronger correlation due to stronger expression. Correlation was calculated across all samples of our dataset, separately for each of the pairwise comparisons: PA vs DA, PA vs GBM/pGBM, DA vs GBM/pGBM. Random selection of active genes matching numbers of each of the two groups was repeated 10 times for each epigenetic mark and grade pairwise comparisons.

Distribution of chromatin states across topologically associating domains
All genes' genomic coordinates (defined as the region from gene start to gene end) were

Identification of TFs potentially regulating DEGs present within the 'glioma TADs' and enhancers
Additionally, prediction of TFs binding sites from DAVID tool was confirmed by an additional analysis of ATAC-seq data 4 employing the BMO tool -a method to predict TF binding sites without using footprints. First, low-quality reads and PCR duplicates were removed and only properly paired and uniquely mapped reads were retained for downstream analysis. Next, ATAC-seq peaks were called using MACS2 5 using "broad" and "nomodel --shift -100 --extsize 200" flags. Resulting peaks were then intersected with human ENCODE blacklist regions (hg38) to discard the peaks within artifact regions. Concurrently, FIMO tool 6 was used to predict occurrence of known motifs across the human genome (fasta file). Altogether, BMO pipeline was run and only the significant binding motifs instances, based on adjusted Benjamini-Yekutieli test (-log10 adjusted p-value < 0.05), were selected.
For the 315 genes encompassed within 'glioma TADs' (Supplementary Table S2) we specified which TFBS are present within their promoters using the BMO tool. Gene promoters were defined as a sequence of 2 kB ± from TSS. Detection of TFBS with BMO was performed using ATAC-seq from eight tumour samples (PA n=4, DA n=2, GBM n=2).
TFBS obtained with BMO allowed us to select TFs which could potentially regulate the expression of the genes of interest. Finally, using BMO, TFBS were also detected in the enhancer region (chr5:141528260-141529747) associated with PCDHGA genes cluster.

Long-range intra-chromosomal contacts of DEGs and enhancers
Following Johnston et al. 7 discovery that genes active in glioma stem cells with multiple loops are usually expressed at higher levels, we tested whether a similar pattern could be found in bulk tumour samples. Identification of enhancers, long-range contacts between genes and enhancers, as well as DNA methylation levels have all been previously described 4 .
In brief, active enhancers were determined by the presence of H3K27ac peaks in nonpromoter regions, (with promoters defined as TSS ± 2kB). We selected contacts between DEGs and enhancers within the 2 Mb range based on chromatin contact maps generated from the Hi-C data from developing human brains 3 . The differentially acetylated enhancers and differentially methylated CpGs within enhancers were defined by Mann-Whitney-Wilcoxon test p-value cut-off < 0.01. Enrichment scores for genes having multiple contacts with enhancers on the same chromosomes and for DEGs having at least 1 contact with differentially acetylated enhancers between PA and GBM/pGBM were calculated using gene set enrichment analysis (GSEA) 8 . Other enriched GO terms were identified using the DAVID tool 9 . GO terms enrichment threshold was p<0.005 with GO_Biological_Process_2018 and Enrichr algorithm enrichment. Results were visualized with the enrichR R package 10 .