Super-Enhancers and Broad H3K4me3 Domains Form Complex Gene Regulatory Circuits Involving Chromatin Interactions

Stretched histone regions, such as super-enhancers and broad H3K4me3 domains, are associated with maintenance of cell identity and cancer. We connected super-enhancers and broad H3K4me3 domains in the K562 chronic myelogenous leukemia cell line as well as the MCF-7 breast cancer cell line with chromatin interactions. Super-enhancers and broad H3K4me3 domains showed higher association with chromatin interactions than their typical counterparts. Interestingly, we identified a subset of super-enhancers that overlap with broad H3K4me3 domains and show high association with cancer-associated genes, including tumor suppressor genes. Beyond cell lines, we also observed chromatin interactions in primary human samples using a Chromosome Conformation Capture (3C)-based method. Several chromatin interactions involving super-enhancers and broad H3K4me3 domains are constitutive and can be found in both cancer and normal samples. Taken together, these results reveal a new layer of complexity in gene regulation by super-enhancers and broad H3K4me3 domains.


EpiSwitch™ validation in K562
The EpiSwitch™ method was tested in K562 cells to assess its efficacy and to determine whether it could validate ChIA-PET data from K562 cells. Two technical repeats were screened for each region: 0/2 indicates that no interaction was observed, while 1/2 and 2/2 indicate that 1 and 2 interactions were observed, respectively, in each individual K562 sample. For bait-hit2, two different primers were designed at slightly different loci, indicated as bait-hit2-1 and bait-hit2-2. Although there are a few differences, the results are mostly similar between bait-hit2-1 and bait-hit2-2. Negative control region 1, which has no reported interaction, was used for this analysis.

Individual heterogeneity in patients
We also detected examples of individual heterogeneity in chromatin interactions in patients. While certain individuals had similar interaction patterns, others had different patterns; for example, MYC Bait-Hit1 was seen in 9 out of 13 peripheral blood samples examined, while MYC Bait-Hit2-1 was seen in 12 out of 13 peripheral blood samples examined (Figure S9a). Interestingly, the MYC Bait-Hit1 interaction was observed in the peripheral blood but not the bone marrow samples of patients AD454 and AD548 (Figure S9a). This particular interaction was also detected in only 4 out of 10 CML patient samples, in contrast to 7 out of 8 non-CML patient samples that were screened (Figure S9a).
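The presence or absence of an interaction in each patient sample follows the rule stated in the legends of Table 7 and Figure S9a: 2 or 3 of the 3 technical repeats must be positive. A minimal sketch of that call is given below; the function name, the sample identifiers, and the counts are hypothetical and are not taken from the study data.

```python
def call_interaction(positive_repeats: int,
                     total_repeats: int = 3,
                     min_positive: int = 2) -> str:
    """Return "Yes" if enough technical repeats detected the interaction.

    The defaults encode the rule from the legends (2 or 3 of 3 repeats
    positive); the function itself is an illustrative assumption.
    """
    if not 0 <= positive_repeats <= total_repeats:
        raise ValueError("positive_repeats must lie between 0 and total_repeats")
    return "Yes" if positive_repeats >= min_positive else "No"


# Hypothetical samples and repeat counts, for illustration only.
example_counts = {"sample_A": 3, "sample_B": 2, "sample_C": 1, "sample_D": 0}
calls = {sample: call_interaction(n) for sample, n in example_counts.items()}

for sample, call in calls.items():
    print(f"{sample}: {example_counts[sample]}/3 repeats positive -> {call}")
```

Counting the "Yes" calls per patient group would then give per-group detection frequencies of the kind quoted in the text.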
Having observed heterogeneity between individuals in terms of chromatin interactions, we asked whether there was individual heterogeneity in enhancer activity levels. Due to limitations in the quantities of the patient samples, we were only able to examine two enhancers using the anti-H3K27ac ChIP-qPCR assay. We examined an enhancer at MYC-335, which is not present in K562 cells but is important in solid cancers, and the super-enhancer at miR1205, which loops over to the MYC promoter in K562 cells. We found that the enhancer activity levels varied between individual samples (Figure S9c). We did not see any clear correlation between enhancer activity levels and chromatin interaction levels (Table S5).
Next, we investigated whether there were any genetic or transcriptomic differences between the patient samples and, if so, what impact genetic differences between samples might have on chromatin interactions and enhancer activity levels. We performed PCR amplification followed by capillary sequencing to determine the genotypes of single nucleotide polymorphisms (SNPs) and to test for novel somatic mutations in the region and at two cancer mutation hotspot regions [24] proximal to the miR1205 super-enhancer in patient samples (Table S4). Although several SNPs varied between samples and several SNPs occurred in CTCF binding regions as measured by ChIP-Seq [25], none showed any correlation with chromatin interaction levels or enhancer levels. It should be noted, however, that no SNPs were found at the CTCF motif. We did not detect any different SNPs or mutations in the two mutation hotspot regions proximal to the miR1205 super-enhancer. We did not detect by Sanger sequencing any novel somatic mutations in either CML or non-CML samples at the miR-1205 super-enhancer (Table S4). Taken together, we did not observe any clear correlations between genetic status or the presence of novel DNA mutations and enhancer status, chromatin interaction levels, or expression of MYC in the miR1205 region examined (Table S5) in these clinical samples.
M: number of interacting proximal broad domains (PBD). N: number of interacting distal typical domains (DTD). O: number of interacting proximal typical domains (PTD). P: genes at proximal broad domains (PBD) that interact with this enhancer element. Q: the CAGE expression levels of the genes indicated by the transcripts in P. R: the RNA-Seq expression levels of the genes indicated by the transcripts in P. S: genes at proximal typical domains (PTD) that interact with this enhancer element.
T: the CAGE expression levels of the genes indicated by the transcripts in S. U: the RNA-Seq expression levels of the genes indicated by the transcripts in S. V: the maximum specificity score of the CAGE clusters at each transcript.
Table 4. Listing of ChIP-Seq libraries used in the study.
Table 5. Listing of patient samples examined (in this document). These samples include both peripheral blood and bone marrow from patients with chronic myelogenous leukemia as well as other diseases, and include knee bone aspirates from normal individuals.
Table 6. Genetic variations in clinical samples (in this document). Six SNPs were found in the miR1205 region. SNPs that fall within the CTCF regions are annotated with *. A>C indicates a change of base from A to C, while A/C indicates two peaks identified from sequencing. No mutations or SNPs were found in the two DNA hotspot regions that were sequenced. The genomic location of the sequenced miR1205 region is chr8:128978404-128981253, while chr8:128,973,357-128,973,382 and chr8:129,066,969-129,067,005 are the genomic locations of DNA hotspot regions 1 and 2 from Weinhold et al., respectively [1].
Table 7. Cross-comparison between chromatin interactions, enhancers, gene expression data and sequencing at the miR1205 super-enhancer region (in this document). Enhancer activity is indicated by ChIP-qPCR of H3K27ac (Figure S9C). N.D. indicates "not done", due to low cell numbers in the clinical samples. Chromatin interactions are indicated as present (Yes) or absent (No) on the basis that 2 or 3 of the 3 tested interactions must be positive in order to conclude that the interaction is present. The source data are shown in Figure S9A. The MYC expression data are indicated by RT-qPCR (Figure S9B). N.D. indicates "not done", due to low cell numbers in the clinical samples. The source data for the genetic signatures are from Table S6.
Supplementary Table 8. Listing of primers used in experiments (in this document). This is a listing of all primers used for ChIP-qPCR, RT-qPCR, DNA sequencing, and EpiSwitch™.
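The legend of Figure S9b describes MYC mRNA as normalized against β-actin and then expressed as a fold difference relative to K562. The source does not state the formula used. The sketch below assumes the common 2^-ΔΔCt calculation with equal amplification efficiency for both primer pairs; the function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_calibrator: float,
                        ct_reference_calibrator: float) -> float:
    """Fold difference of a target gene versus a calibrator sample.

    Assumes the 2^-ddCt method (an assumption, not stated in the source):
      dCt  = Ct(target) - Ct(reference gene), per sample
      ddCt = dCt(sample) - dCt(calibrator)
      fold = 2 ** (-ddCt)
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: MYC and beta-actin in a patient sample,
# with K562 as the calibrator.
fold = relative_expression(ct_target=26.0, ct_reference=18.0,
                           ct_target_calibrator=22.0,
                           ct_reference_calibrator=17.0)
print(f"MYC expression relative to K562: {fold:.3f}")
```

With this definition the calibrator compared against itself has a fold difference of 1, so values below 1 indicate lower MYC expression than in K562.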

Supplementary Figure Legends
Figure S1. Schematic of analyses performed in this paper. The input data are indicated in light green, and the analyses performed are shown in light beige.
Figure S2. Characterizing proximal and distal super-enhancers. a. The signal profile of the super-enhancer calling and some oncogenes (red), tumor suppressor genes (blue), and census cancer genes (underlined) that they target by proximity or by looping. b-c. Screenshots of super-enhancers at FOXA1, GATA3 and ESR1, CDKN1B. d. Fraction of proximal elements found at leukemia-associated genes. e. Histone modifications (H3K27ac, H3K4me1, H3K4me3) at the four types of regulatory elements: proximal super-enhancers (PSE), proximal typical enhancers (PTE), distal super-enhancers (DSE) and distal typical enhancers (DTE). f-g. The specificities (f) and expression levels (g) of CAGE clusters at proximal super-enhancers and proximal typical enhancers. All boxplots presented were prepared in the following manner: the black horizontal line indicates the median, the top and bottom of the box indicate the third and first quartiles respectively, and the whiskers indicate 1.5 times the interquartile range. Widths of boxes are proportional to the square root of the number of data points in each category, and statistical testing was done using Dunn's Test. h-i. The cell-type specificities and expression levels of enhancer RNAs at distal super-enhancers (DSE) and distal typical enhancers (DTE). j. The number of distal enhancers, including distal super-enhancers (DSE) and distal typical enhancers (DTE), associated with enhancer transcription. TPM indicates tags per million sequences.
Figure S3. The overlap of enhancers with chromatin interactions and their effects on the transcription of the remote target genes. a. Fractions of the four types of elements associated with at least one chromatin interaction. b. Boxplot of the number of interactions each type of element has (only interacting elements were included). c. Distribution of distances of the nearest TSS (blue), nearest active TSS (green), and TSSs reached through chromatin interactions (red) to the center of distal super-enhancers. d-e. The expression levels (d) and cell-type specificity scores (e) of CAGE clusters at proximal elements that are connected to the four types of elements as indicated on the x-axis. "None" indicates the set of CAGE clusters located at non-interacting proximal elements. f. The expression levels of tumor suppressor genes (tsg), oncogenes (og), and census cancer genes (ccg) targeted by broad domains through proximity (pse_p) and looping (pse_d and dse_d), measured by RNA-Seq. Data shown are for MCF-7.
Figure S4. Analysis of broad H3K4me3 peaks and chromatin interactions. a. Rank of H3K4me3 peaks by size and the cancer-associated genes targeted by some broad domains. b. Some histone modifications at the four types of H3K4me3 domains. c. The fraction of proximal broad and typical domains located near leukemia-associated genes. The p-value was calculated using Fisher's Exact Test. d. The fraction of each type of H3K4me3 element involved in chromatin interactions. e-f. The cell-type specificity scores (e) and expression levels (f) of CAGE clusters at proximal H3K4me3 elements that are connected to the four types of H3K4me3 elements as indicated on the x-axis. "None" indicates the set of CAGE clusters located at non-interacting proximal elements. g. The expression levels of tumor suppressor genes (tsg), oncogenes (og), and census cancer genes (ccg) targeted by broad domains through proximity (pbd_p) and looping (pbd_d and dbd_d), measured by RNA-Seq. Data shown are for MCF-7.
Figure S5. Normalization of the association between enhancers and chromatin interactions. a-b. Fraction of extended elements with chromatin interactions (a) and boxplot of the number of chromatin interactions each type of element has (b). c. Boxplot of the number of chromatin interactions the four types of enhancers have after normalization against the total Pol2 signal present in the ChIA-PET dataset. d-e. Fraction of elements with chromatin interactions from Hi-C data (d) and boxplot of the number of chromatin interactions from Hi-C data each type of element has (e). f-g. Fraction of extended elements with chromatin interactions from Hi-C data (f) and boxplot of the number of chromatin interactions from Hi-C data each type of element has (g). The data shown are based on K562 cells.
The TP53 Bait region is located near the TP53 promoter. The TP53 Hit 1 region is located near the MPDU1 gene promoter and spans CD68. The TP53 Hit 2 region is located near the KDM6B gene promoter. All regions include proximal super-enhancers and proximal broad domains. The MYC Bait region is located at a proximal typical enhancer, a proximal broad domain, and near the MYC promoter. The MYC Hit 1 region is located at a distal typical enhancer, in a PVT1 intron, and near miR-1205. The MYC Hit 2 region is located at a distal super-enhancer and a distal typical domain, and in a PVT1 intron. b. EpiSwitch™ results at the MYC locus. Results are shown at different input levels of DNA for semi-quantitative measurements, and with primers designed at each locus. Certain loci have two or more primer pairs, which indicate different locations within each locus. Results are shown from peripheral blood and bone marrow from Chronic Myelogenous Leukemia patients and control patients respectively. "1" indicates that the interaction was detected, while "0" indicates that the interaction was not detected. c. EpiSwitch™ results at the TP53 locus. Results are shown at different input levels of DNA for semi-quantitative measurements, and with primers designed at each locus. Certain loci have two or more primer pairs, which indicate different locations within each locus. Results are shown from peripheral blood and bone marrow from Chronic Myelogenous Leukemia patients and control patients respectively. "1" indicates that the interaction was detected, while "0" indicates that the interaction was not detected.
Figure S9. Characterization of chromatin interactions at the MYC locus in human samples by EpiSwitch™, RT-qPCR and ChIP-qPCR. a. The interactions at the MYC locus were screened in triplicate and at multiple using 8 CML peripheral blood samples and peripheral blood samples from 6 other (non-CML) patients. Two CML and two non-CML bone marrow samples were run as a pilot test of any possible differences in bone marrow (AD454 and AD548). "PB" indicates peripheral blood and "BM" indicates bone marrow. Three technical repeats were screened for each region, and the number of interactions detected is indicated as a numerical value out of the three replicates for each individual patient sample. 2 or 3 of the 3 tested interactions must be positive to conclude that the interaction is detected. "*" indicates a patient in blast crisis. It should be noted that this set of data was obtained in earlier experiments, before different input levels of DNA were used for semi-quantitative results. b. RT-qPCR comparison of MYC mRNA expression levels between CML and normal cells. MYC mRNA was quantified by qPCR and normalized against the expression of the β-actin housekeeping gene. The fold difference in MYC mRNA expression was then calculated relative to K562. Error bars indicate the mean + standard deviation. c. ChIP-qPCR of H3K27ac at MYC-335, a super-enhancer near miR-1205, the MYC promoter and a negative control region. IgG showed low enrichment levels comparable to the negative control (results not shown). Error bars indicate the mean + standard deviation.
and expression levels (f, h) of CAGE clusters at proximal elements that are connected to regulatory regions as indicated on the x-axis. "None" indicates the set of CAGE clusters located at non-interacting proximal elements.
Table S8b. Primer sequences used for EpiSwitch™.