Epigenomic profiling of primary gastric adenocarcinoma reveals super-enhancer heterogeneity

Regulatory enhancer elements in solid tumours remain poorly characterized. Here we apply micro-scale chromatin profiling to survey the distal enhancer landscape of primary gastric adenocarcinoma (GC), a leading cause of global cancer mortality. Integrating 110 epigenomic profiles from primary GCs, normal gastric tissues and cell lines, we identify 36,973 predicted enhancers and 3,759 predicted super-enhancers. Cell-line-defined super-enhancers can be subclassified by their somatic alteration status into somatic gain, loss and unaltered categories, each displaying distinct epigenetic, transcriptional and pathway enrichments. Somatic gain super-enhancers are associated with complex chromatin interaction profiles, expression patterns correlated with patient outcome and dense co-occupancy of the transcription factors CDX2 and HNF4α. Somatically altered super-enhancers are also enriched in genetic risk SNPs associated with cancer predisposition. Our results reveal a genome-wide reprogramming of the GC enhancer and super-enhancer landscape during tumorigenesis, contributing to dysregulated local and regional cancer gene expression.

b. An unaltered predicted super-enhancer in T/N20020720, T/N2001206 and T/N980401 at the CMIP locus. c. A predicted super-enhancer detected in FU97 and YCC22 GC cells shows an inactive state in three T/N pairs at the ZNF326 locus.

Supplementary Figure 8: Association between copy number alterations and predicted super-enhancers.
a. An example of a somatic gain predicted super-enhancer detected in a copy number neutral region. b. FGFR2-associated predicted super-enhancers detected at regions of somatic copy number gain in KATO-III cells. c. A somatic gain predicted super-enhancer detected in a region with copy number gain in T/N980447. d. A highly recurrent somatic gain (H3K27ac) predicted super-enhancer detected at the CLDN4 locus. This region was not associated with copy number gain.

Supplementary Figure 9: Long-range interactions between a predicted super-enhancer (black rectangle) at the TM4SF1 locus and the TM4SF4 promoter, detected in OCUM-1 cells using Capture-C. The bottom track shows the summarized interactions from capture point #17.

Supplementary Figure 10: Capture-C interaction profiles.
a. Interactions from the EHBP1 predicted super-enhancer (black rectangle) to the promoters of the TMEM1 and EHBP1 genes. The predicted super-enhancer was detected in OCUM-1 cells, showed somatic gain in primary tumor T20020720 and is associated with up-regulated expression of TMEM1 and EHBP1. b. Interactions from a predicted super-enhancer (black rectangle) at the YWHAZ locus to the YWHAZ promoter. The predicted super-enhancer was detected in SNU16 cells, showed somatic gain in primary tumor T990275 and is associated with up-regulated expression of YWHAZ.

Supplementary Figure 11: 4C interaction profiles.
a. A somatic gain predicted super-enhancer at the ELF3 locus and its interactions with neighbouring genes, including ELF3, RNPEP, ARL8A and LMOD1. Somatic gain activity is associated with up-regulation of ELF3 in primary GCs. Interactions (Q < 0.05, r3Cseq) were detected in OCUM-1 cells using 4C. The 4C signal plot (in units of RPM) was generated using the Basic4CSeq package. Two constituent enhancers, e3 and e4, were deleted independently in OCUM-1 cells using CRISPR/Cas9 genome editing. b. Long-range interactions between a predicted super-enhancer at the KLF5 locus and the KLF5 promoter were detected in OCUM-1 cells. Somatic gain activity in the primary tumor (T76629543) is associated with up-regulation of KLF5 expression in the matched sample. c. Interactions of a predicted super-enhancer at the CABLES1 locus with neighbouring noncoding regions and with the promoters of genes including CABLES1 and RIOK3. (e-f) Differential gene expression between mutant (carrying one predicted enhancer deletion) and wild type cells was measured by RT-qPCR in OCUM-1 and SNU16 cells. Pooled cells were analysed. *P < 0.05, #P = 0.055, one-sided t-test; wt: wild type; lad: DNA ladder (Bioline HyperLadder I); c1-c3: wild type cells using GAPDH primers.

Supplementary Figure 15: Landscape of GC-associated predicted super-enhancers in other cell and tissue types.
Enrichment ratios of recurrent somatic gain predicted super-enhancers identified in GC that overlap super-enhancers detected in 86 cell and tissue samples, relative to randomly selected regions. Cancer cell lines are labelled in red; samples with non-significant enrichment ratios (P > 0.001) are in grey.

Supplementary Figure 16: Consequences of transcription factor silencing on histone modifications and gene expression.
a. Differential average CDX2 (left) and HNF4α (right) binding signal between recurrent somatic gain predicted super-enhancers and unaltered predicted super-enhancers. The predicted super-enhancers were also active in SNU16. b. Global changes in H3K27ac after silencing one or two transcription factors simultaneously (red). Background changes were computed as the difference between two controls (NT CDX2 and NT HNF4α). c. Magnitude of H3K27ac depletion after silencing of transcription factor(s) in OCUM-1 cells. d. Example of H3K27ac depletion in a predicted super-enhancer at the FGL1 locus after CDX2 silencing in OCUM-1 cells. e. Association between H3K27ac depletion in somatic gain predicted super-enhancers and distance to CDX2 or HNF4α binding sites in SNU16 cells. Distances were uniformly distributed and classified into three categories: near, moderate and distal to the binding sites. Statistical significance was evaluated using a one-sided Wilcoxon rank sum test. f. Expression of genes associated with somatic gain predicted super-enhancers in OCUM-1 cells after silencing of single or double transcription factors (NT-siTF). The percentage of genes showing expression changes (FPKM difference > 0, down-regulation; < 0, up-regulation) is indicated. The proportion of down-regulated genes was tested using an empirical approach (see Methods).
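The enrichment ratios in Supplementary Figure 15 compare observed overlaps with the overlaps expected for randomly selected regions. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: random regions are assumed to be length-matched and placed uniformly on the same chromosome, and the P value is an add-one empirical estimate. The helper names (`random_like`, `enrichment_ratio`) and the number of random draws are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# (chromosome, start, end), half-open coordinates
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(query: List[Interval], reference: List[Interval]) -> int:
    """Number of query intervals that overlap at least one reference interval."""
    return sum(any(overlaps(q, r) for r in reference) for q in query)


def random_like(query: List[Interval], chrom_sizes: Dict[str, int],
                rng: random.Random) -> List[Interval]:
    """Length-matched random intervals on the same chromosomes (assumed null model)."""
    placed = []
    for chrom, start, end in query:
        length = end - start
        new_start = rng.randrange(0, chrom_sizes[chrom] - length + 1)
        placed.append((chrom, new_start, new_start + length))
    return placed


def enrichment_ratio(query: List[Interval], reference: List[Interval],
                     chrom_sizes: Dict[str, int], n_random: int = 1000,
                     seed: int = 0) -> Tuple[float, float]:
    """Observed / mean random overlap count, plus a one-sided empirical P value."""
    rng = random.Random(seed)
    observed = count_overlapping(query, reference)
    null = [count_overlapping(random_like(query, chrom_sizes, rng), reference)
            for _ in range(n_random)]
    expected = sum(null) / n_random
    if expected > 0:
        ratio = observed / expected
    else:
        ratio = float("inf") if observed else float("nan")
    # add-one correction keeps the empirical P value strictly positive
    p_value = (1 + sum(n >= observed for n in null)) / (1 + n_random)
    return ratio, p_value
```

Here `query` would hold the recurrent somatic gain predicted super-enhancers and `reference` the super-enhancers of one of the 86 samples; repeating the call per sample gives one ratio and P value per sample.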

Supplementary Figure 17: CDX2 and HNF4α knockdown efficiency by Western blotting and real time (RT) PCR. a. Western blot measuring CDX2 protein abundance before (siNT) and after CDX2 knockdown (siCDX2) in SNU16 and OCUM-1 cells. GAPDH protein abundance was used as a control. b. Western blot measuring HNF4α protein abundance before (siNT) and after HNF4α knockdown (siHNF4α) in SNU16 and OCUM-1 cells. GAPDH protein abundance was used as a control. c. Relative RNA abundance of CDX2 to control, measured by RT-PCR in two replicates in OCUM-1 cells. d. Relative RNA abundance of HNF4α to control, measured by RT-PCR in three replicates in OCUM-1 cells.
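The captions do not state how relative RNA abundance was calculated. A common approach is the 2^-ΔΔCt method, which normalises the target gene to a reference gene and then to the control condition. The sketch below assumes that method, with GAPDH as the reference gene, and averages per-replicate values; both choices and the function names are assumptions for illustration only.

```python
from statistics import mean
from typing import List, Tuple

# One replicate: (Ct target in knockdown, Ct reference in knockdown,
#                 Ct target in control, Ct reference in control)
Replicate = Tuple[float, float, float, float]


def relative_abundance(ct_target_kd: float, ct_ref_kd: float,
                       ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """2^-ddCt: target abundance in knockdown relative to control,
    normalised to a reference gene (GAPDH assumed here)."""
    delta_kd = ct_target_kd - ct_ref_kd
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(delta_kd - delta_ctrl)


def mean_relative_abundance(replicates: List[Replicate]) -> float:
    """Average the per-replicate relative abundances."""
    return mean(relative_abundance(*r) for r in replicates)
```

A value of 0.25, for instance, corresponds to a target Ct shift of two cycles relative to the reference gene, i.e. roughly 75% knockdown, assuming ~100% amplification efficiency.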

Supplementary Tables
Supplementary

Correlation between gene expression and distal predicted regulatory elements
To correlate distal predicted regulatory elements defined by Nano-ChIPseq with gene expression, we identified 80 predicted super-enhancers exhibiting high recurrence across multiple cell lines (P < 0.0001, empirical test). The same approach was used to identify highly recurrent predicted typical enhancers. For both predicted super-enhancers and predicted typical enhancers, genes associated with distal regulatory elements exhibited higher expression than randomly selected genes (Supplementary Fig. 4b). Comparing the expression of genes associated with predicted super-enhancers versus predicted typical enhancers revealed higher overall expression levels (in units of percentile) for predicted super-enhancer-associated genes (P = 5.2 × 10⁻³, one-sided Wilcoxon rank sum test). These results suggest that H3K27ac enrichment at both predicted super-enhancers and predicted typical enhancers is positively associated with target gene expression.
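The one-sided Wilcoxon rank sum test used above can be sketched with the standard library. This minimal version uses the normal approximation without tie or continuity correction, so its P values will differ slightly from exact or tie-corrected implementations (for example SciPy's `mannwhitneyu`). The function names are hypothetical, and the inputs are assumed to be per-gene expression percentiles.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_greater(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """One-sided Wilcoxon rank sum test (H1: x tends to exceed y).
    Returns (U statistic for x, P value) using the normal approximation,
    without tie or continuity correction."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal P
```

For example, `rank_sum_greater(se_percentiles, te_percentiles)` (hypothetical inputs) tests whether predicted super-enhancer-associated genes tend to have higher expression percentiles than predicted typical enhancer-associated genes. For small samples an exact test is preferable to the normal approximation.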

Comparisons of primary gastric non-malignant samples to Epigenome Roadmap
To confirm that our non-malignant gastric tissues reflect gastric epithelia rather than muscle, immune cells or other cell types, we compared the non-malignant gastric H3K27ac profiles from our study with previously published normal gastric profiles and with stomach smooth muscle profiles 11. On average, 70% of the H3K27ac signals in each Nano-ChIPseq profile overlapped with published normal gastric profiles, whereas only 34% overlapped with stomach smooth muscle profiles. This suggests that our non-malignant gastric samples reflect gastric epithelia rather than stomach smooth muscle.
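The overlap percentages above can be computed as the fraction of peaks in one profile that overlap at least one peak in a reference profile. A minimal sketch, assuming half-open peak coordinates and a 1 bp minimum overlap (the source does not state the overlap criterion); the function names are hypothetical. Reference peaks are merged per chromosome so that each query peak needs a single binary search.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def merge_by_chrom(intervals: List[Interval]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Merge overlapping/adjacent intervals per chromosome into sorted, disjoint runs."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        runs = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= runs[-1][1]:
                runs[-1][1] = max(runs[-1][1], end)
            else:
                runs.append([start, end])
        merged[chrom] = ([s for s, _ in runs], [e for _, e in runs])
    return merged


def fraction_overlapping(peaks: List[Interval], reference: List[Interval]) -> float:
    """Fraction of peaks overlapping at least one reference interval by >= 1 bp."""
    if not peaks:
        return 0.0
    merged = merge_by_chrom(reference)
    hits = 0
    for chrom, start, end in peaks:
        if chrom not in merged:
            continue
        starts, ends = merged[chrom]
        # last merged run that starts before this peak ends
        i = bisect_right(starts, end - 1) - 1
        if i >= 0 and ends[i] > start:
            hits += 1
    return hits / len(peaks)
```

Calling `fraction_overlapping` once against the normal gastric reference and once against the smooth muscle reference, then averaging across profiles, mirrors the comparison described in the text.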

Associations between copy number alterations and predicted super-enhancers in gastric cancer
We investigated the extent to which recurrent somatically altered predicted super-enhancers are associated with somatic copy number alterations (sCNAs). We computed overlaps between the predicted super-enhancers and copy number information from the cell lines and primary GCs, using in-house Affymetrix SNP6.0 array data.
Our analysis was restricted to regions covered by at least 6 SNP probes per 10 kb (twice the mean genome-wide probe density), so that regions of sCNA could be identified confidently.
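The probe density filter can be illustrated by counting probes in 10 kb windows and keeping regions whose windows all pass the threshold. A minimal sketch, assuming fixed, non-overlapping 10 kb bins (the source does not say how windows were defined); the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Bin = Tuple[str, int]  # (chromosome, 10 kb bin index)


def well_covered_bins(probes: Iterable[Tuple[str, int]], bin_size: int = 10_000,
                      min_probes: int = 6) -> Set[Bin]:
    """Fixed, non-overlapping bins containing at least `min_probes` SNP probes."""
    counts = Counter((chrom, pos // bin_size) for chrom, pos in probes)
    return {b for b, n in counts.items() if n >= min_probes}


def region_is_covered(region: Tuple[str, int, int], covered: Set[Bin],
                      bin_size: int = 10_000) -> bool:
    """True if every bin spanned by the half-open region passes the density filter."""
    chrom, start, end = region
    first, last = start // bin_size, (end - 1) // bin_size
    return all((chrom, b) in covered for b in range(first, last + 1))
```

Requiring every spanned bin to pass is a conservative choice; a sliding-window or region-level density would be equally plausible readings of the text.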
Confirming the reliability of our sCNA analysis, an average of 98% of copy number gains and 82% of copy number losses in our analysis were also reported in the Cancer Cell Line Encyclopedia 12 for the GC cell lines it includes (FU97, KATO-III, MKN7, OCUM-1, ...). In the cell lines, only 5-6% (± 6% s.d.) of the predicted super-enhancers were associated with copy number gains (average log2 ratio > 0.6). For example, an FGFR2-associated predicted super-enhancer detected in KATO-III overlapped with a copy number gain (Supplementary Fig. 8b), suggesting that the higher H3K27ac read density observed at this locus may be driven by regional genomic amplification. In contrast, the majority of predicted super-enhancers detected in GC cell lines localized to copy number neutral regions, suggesting that the establishment of predicted super-enhancers is largely independent of somatic copy number events. This fraction is greater than expected by random chance (P < 0.01, empirical test).
Similarly, in primary GCs, we computed sCNA/predicted super-enhancer correlations for 1,748 recurrent somatic gain predicted super-enhancers in 19 primary T/N pairs. Only a small fraction of somatic gain predicted super-enhancers (< 2% ± 3% s.d.) overlapped with copy number gains (Supplementary Fig. 8c), and >90% of the somatic gain predicted super-enhancers found in individual T/N pairs were located in copy number neutral regions (Supplementary Fig. 8a). These results suggest that there is no strong association between somatic gain of H3K27ac at predicted super-enhancers and copy number changes 13, and that H3K27ac acquisition at predicted super-enhancers in tumor samples is likely driven by mechanisms other than copy number alteration.
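The text defines a copy number gain as an average log2 ratio > 0.6 but does not say how that average is taken over a region. The sketch below assumes an overlap-length-weighted mean of segmented log2 ratios and excludes regions without segment coverage from the denominator; both choices, and the function names, are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Region = Tuple[str, int, int]          # (chromosome, start, end), half-open
Segment = Tuple[str, int, int, float]  # (chromosome, start, end, log2 ratio)


def mean_log2_ratio(region: Region, segments: List[Segment]) -> Optional[float]:
    """Overlap-length-weighted mean log2 ratio across segments overlapping the region."""
    chrom, start, end = region
    total_bp, weighted = 0, 0.0
    for s_chrom, s_start, s_end, log2r in segments:
        if s_chrom != chrom:
            continue
        overlap = min(end, s_end) - max(start, s_start)
        if overlap > 0:
            total_bp += overlap
            weighted += overlap * log2r
    return weighted / total_bp if total_bp else None


def fraction_at_gain(regions: List[Region], segments: List[Segment],
                     gain_threshold: float = 0.6) -> float:
    """Fraction of regions with copy number data whose mean log2 ratio exceeds the threshold."""
    values = [mean_log2_ratio(r, segments) for r in regions]
    with_data = [v for v in values if v is not None]
    if not with_data:
        return 0.0
    return sum(v > gain_threshold for v in with_data) / len(with_data)
```

Applied per cell line or per T/N pair, `fraction_at_gain` yields the per-sample fractions whose mean and standard deviation are reported above; the empirical comparison with random regions would require a separate null model.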