Integrated molecular analysis reveals complex interactions between genomic and epigenomic alterations in esophageal adenocarcinomas

The incidence of esophageal adenocarcinoma (EAC) is rapidly rising in the United States and Western countries. In this study, we carried out an integrative molecular analysis to identify interactions between genomic and epigenomic alterations in regulating gene expression networks in EAC. We detected significant alterations in DNA copy numbers (CN), gene expression levels, and DNA methylation profiles. The integrative analysis demonstrated that altered expression of 1,755 genes was associated with changes in CN or methylation. We found that expression alterations in 84 genes were associated with changes in both CN and methylation. These data suggest a strong interaction between genetic and epigenetic events to modulate gene expression in EAC. Of note, bioinformatics analysis detected a prominent K-RAS signature and predicted activation of several important transcription factor networks, including β-catenin, MYB, TWIST1, SOX7, GATA3 and GATA6. Notably, we detected hypomethylation and overexpression of several pro-inflammatory genes such as COX2, IL8 and IL23R, suggesting an important role of epigenetic regulation of these genes in the inflammatory cascade associated with EAC. In summary, this integrative analysis demonstrates a complex interaction between genetic and epigenetic mechanisms providing several novel insights for our understanding of molecular events in EAC.

The incidence of esophageal adenocarcinoma (EAC) has increased more than 6-fold over the past three decades in the United States and Western countries [1][2][3] . Chronic Gastroesophageal Reflux Disease (GERD) is a condition where esophageal epithelial cells are abnormally exposed to acidic bile salts and subsequently generates a high level of reactive oxygen species (ROS) and oxidative stress. GERD is the main risk factor for the development of a metaplastic glandular epithelium known as Barrett's esophagus (BE), which can subsequently progress to high-grade dysplasia and EACs 2,3 . EAC is an aggressive malignancy characterized by unfavorable prognosis with 5-year survival at less than 15%, irrespective of treatment and tumour stage 4,5 .
Molecular studies have demonstrated complex patterns in EAC. Studies of DNA copy numbers using comparative genomic hybridization (CGH) have consistently shown complex genomic alterations that include gains and losses of multiple chromosomal regions with high level amplifications in 8q24, 17q21, and 20q13 and losses in 9p21, 17p, and 18q21 [6][7][8] . Recent array-CGH and exome sequencing results have also indicated the presence of massive chromosomal and genomic instability [9][10][11] . The most frequent genetic changes that are implicated in EAC include silencing of p16 gene expression (by deletion or promoter hypermethylation), the loss of p53 expression (by mutation or deletion), and overexpression of cyclin D1 12,13 . Mutation analyses using whole-exome sequencing of EAC tumour-normal pairs confirmed that mutations of p53 are the most frequent alterations which occur in more than 50% of EAC, however, the frequency of mutation of any other individual gene falls below 5% 12,14 . In fact, these studies are all in agreement with the notion of high level of aneuploidy as a prominent feature of high-grade-dysplasia and EAC 15,16 .
Several genes have been reported to be downregulated or silenced in human cancers including EAC through epigenetic mechanisms that include promoter DNA hypermethylation 17,18 . Silencing of gene expression by promoter DNA methylation contributes to tumour development and progression. Examples include tumour suppressor genes CDKN2A (p16), APC, and CDH1; DNA damage repair genes such as MGMT; and antioxidant genes such as glutathione S transferase (GST) family and glutathione peroxidase family members 17,19,20 .
In this study, we have performed comprehensive integrated molecular analyses of gene expression, DNA copy number, and promoter DNA methylation using human EAC tissue samples. This integrated analyses approach identified a subset of genes where mRNA expression is associated with changes in copy number and/or methylation levels. We postulate that those genes that are regulated by more than one molecular mechanism are important drivers for the development of EAC. This could explain why cancer cells develop coordinated genetic and/or epigenetic mechanisms to regulate their expression.

Materials and Methods
Tissues Samples. Tissues were collected from 12 esophageal adenocarcinoma tumour samples and nine adjacent non-tumour histologically normal tissue samples (Supplementary Table S1). All tissue samples were examined for histological confirmation using haematoxylin and eosin staining followed by dissection of tumour tissue samples to enrich cancer cells content to ≥ 70%. All samples were subjected to molecular profiling that included comprehensive gene expression, copy number, and DNA methylation analyses. The use of de-identified specimens from the frozen tissue repository of the Department of Pathology was approved by the Vanderbilt University Institutional Review Board (IRB# 111096). All experiments were performed in accordance with the guidelines and regulations of Vanderbilt University Institutional Review Board and an informed consent was obtained from all human subjects. All experimental methods and protocols were approved by Vanderbilt University Biosafety Committee.
Gene Expression Profiling. Total RNA from the tissue samples was prepared using Qiagen RNeasy Tissue kit (Qiagen, Germantown, MD). The total RNAs were evaluated at the Vanderbilt Microarray Core Lab. Affymetrix Human Gene 1.0 ST arrays (Affymetrix, Santa Clara, CA) were used for gene expression analysis. The RNA preparation, labeling and cDNA array hybridization were carried out following the manufacturer's protocol. Raw data (CEL files) were processed using a robust multiarray averaging (RMA) approach to provide normalized expression data for each probe set on the arrays. Differential expression analysis (12 tumours vs. nine normal) was performed using limma package 21 . P-values were adjusted for multiple comparisons using the false discovery rate (FDR) method 22 . The thresholds for significance were set at p-value of 0.05 and fold change of two to determine the genes that were over-or under-expressed in tumours. Cluster analysis was carried out using R package Heatmap3 23 . Pathway and network analysis was performed using Ingenuity Pathway Analysis. Functional analysis was carried out using Gene Set Enrichment Analysis (GSEA) 24 . Gene expression data are available in Gene Expression Omnibus (GEO accession #GSE92396). DNA Methylation Analyses. DNA from the tissue samples were prepared using Qiagen DNeasy Tissue kit (Qiagen, Germantown, MD). Methylated DNA from each sample was enriched using Invitrogen MethylMiner ™ Methylated DNA Enrichment Kit (ThermoFisher, Waltham, MA) following the manufacturer's protocol. The captured DNA and input DNA were sent to Vanderbilt Microarray Core Lab for processing and hybridization. We used the NimbleGen 385 K array (Roche NimbleGen Inc., Madison, WI) which consists of 385,019 50-mer DNA probes with approximately 8 kb average spatial resolution. Normal samples were labeled with Cy5, and tumour samples were labeled with Cy3. The hybridizations were carried out following the NimbleGen's protocol. The arrays were scanned at 5μ m on an AXON 4000B scanner and analyzed using NimbleScan software v.2.6.0.0. Data were processed by Roche NimbleGen NimbleScan software (v1.9; NimbleGen). Three processing steps were involved: i) assigning scores to each probe, ii) finding peaks, and iii) annotating peaks. Signal intensity data were first extracted and processed to obtain a log2 ratio. A fixed-length window (750 bp) centered around the probe was selected to represent the distribution of the signal intensity of the probe, and the one-sided Kolmogorov-Smirnov (KS) test was applied to determine whether the probe was drawn from a significantly more positive distribution of intensity log-ratios than those in the rest of the array. The "finding peaks" step identified peaks as those consisting of at least two probes with scores above a minimum threshold of 2 (i.e. p-value < 0.05). The annotation step searched peaks in regions spanning 5 kb upstream and 1 kb downstream of the transcription start site (TSS). For a gene whose promoter region harbours multiple peaks, an average value of the scores was taken. In downstream analysis, each gene has one methylation level associated with its promoter region. The portion of probes aberrantly methylated in each TSS window (promoter region) was estimated. This was done by counting the probes in a called peak and dividing the count by the number of interrogated probes within a TSS window overlapping that peak. Calculations were done separately for hypermethylation and hypomethylation. For a gene possessing multiple transcription start sites, a peak might overlap multiple TSS windows associated with that single gene. In such a case, a TSS window around the most upstream TSS was used to represent the promoter region for the gene. The genomic positions were initially annotated by Human Genome Build version HG18 and were converted to use within a TSS window annotated by HG18_refGene. X and Y chromosomes were excluded from the analysis. Array Comparative Genomic Hybridization (aCGH). DNA from the tissue samples was prepared using Qiagen DNeasy Tissue kit (Qiagen, Germantown, MD). aCGH was carried out using NimbleGen aCGH array (Roche NimbleGen Inc.). The array consists of 719,690 oligonucleotide DNA probes with approximately 4 kb average spatial resolution. Pooled DNA from three normal esophageal samples was used as the reference sample. Labeling and microarray processing were done according to the manufacturer's protocol. All samples passed NimbleGen quality control assessment. Copy number (CN) was expressed as the log2 ratio of tumour:control DNA fluorescence intensity. CN data (log2 intensity ratios) were first processed by within-array normalization to remove spatial correlation and GC bias, followed by segmentation using circular binary segmentation (CBS) to translate noisy intensity measurements into regions of equal CN. The median absolute deviation (MAD) of the difference between the observed and segmented values was used to estimate sample-specific experimental variation. For each sample, a segment was declared gain or loss if the segment value was two times the sample MAD from the median segment of the autosomes. Probe value and CN status were determined based on the corresponding segment value and status. X and Y chromosome probes were excluded from analysis. A probe-wise frequency plot of CN alterations was based on gain/loss/normal status.

Results and Discussion
Gene Expression. Gene expression analysis identified 6,715 genes that are differentially expressed (FDR < 0.05) in tumour samples as compared to normal samples (3,488 genes were upregulated and 3,227 genes were downregulated in tumours) (Supplementary Table S2). Even though the majority of the samples were tumour-normal paired, unsupervised cluster analysis using all genes shows a clear separation between tumour and normal samples with one outlier (Supplementary Figure S1). These results confirm the presence of a strong difference in overall gene expression pattern between tumour and normal that is sufficient to distinguish tumours and adjacent normal tissues.
Gene ontology (GO) analysis using these differentially expressed genes identified several GO categories ( Table 1). Many of the identified GO categories have been previously linked to esophageal cancer. For example, we identified two upregulated GO categories related to metallopeptidase. Aberrant expression of these proteins has been associated with esophageal adenocarcinomas [25][26][27] . Our finding suggests that the differential expression of metalloproteinases related genes is important for EAC. The other top upregulated GO categories identified are collagen 28 , proteinaceous extracellular matrix and extracellular matrix part 29,30 . Extracellular matrix interaction can be promoted by mucins -large o-glycoproteins, which are often overexpressed in cancer cells 31,32 . This may partially explain the identification of extracellular related GO categories. However, the stroma cells in tumour microenvironment may also contribute to this finding 27 . The representative downregulated GO categories in this analysis are the intermediate filament and intermediate filament cytoskeleton 33 . Downregulation of cytokeratins has been reported in esophageal cancers 34 .
Among kinases, pathways that were predicted to be activated by IPA in our EAC samples include, ERBB2, FGFR2, and MAPKs. Gene amplification and overexpression of ERBB2, FGFR2 and MAPKs have been reported in gastroesophageal cancers [54][55][56] . Of note, targeting signaling kinases has been a promising therapeutic approach in several cancer types [56][57][58] . In addition to these important signaling molecules, the results predicted activation of CD44 pathway, which plays an important role in the development and progression of cancer 59,60 . CD44 is known to define cells with stem cell properties 61 and its activation has been associated with aggressive tumours and resistance to therapy 62,63 . Therefore, our findings are consistent with the biology of EACs, which are characterized by poor outcome and resistance to therapeutic approaches.
Gene Set Enriched Analysis (GSEA) was carried out based on gene expression data to determine the predominant oncogenic signature. This analysis predicted a signature that is overwhelmingly similar to KRAS activation observed in several cancer types including lung, prostate, and breast (Table 3). Although activating mutations in KRAS have been documented in several cancers [64][65][66] , these mutations are rare in EAC 14,67 . In this context, it is important to note that the observed KRAS signature in our EACs denotes the activation of signaling pathways downstream of KRAS and may contribute to the observed chemoresistance and poor clinical outcome in EACs as noted in other cancers with activated mutant KRAS 68,69 . The complete gene list for the predicted KRAS activation from GSEA analysis can be viewed in Table 3.

DNA Copy Number Alterations and Correlation with Gene Expression. Overall, 661,383 (92%)
probes showed copy number (CN) aberration (gain or loss) in our samples. Specifically, 454,299 (63%) probes showed CN gain and 458,201 (64%) showed CN loss. Supplementary Figure S3 shows the probe-wise frequency of CN changes across the entire genome (autosome) for all 12 tumours. A frequency value of 33% or more (i.e., at least 4 of 12 tumours) was set to identify commonly amplified or deleted probes. After filtering out probes in intergenic regions, 5,946 unique genes (1,115 amplified, 4,831 deleted) met this frequency cutoff. Details and frequency of genes that were amplified or deleted are given in Supplementary Tables S2 and S3. Our findings confirm the previously noted massive alterations in gene copy numbers in EAC 8,10,14,70,71 . A comparison of our results with previous publications is given in Supplementary Table S4.
Integrated analysis of gene expression and DNA methylation identified 107 genes that showed DNA hypermethylation and mRNA downregulation ( Fig. 1 and Supplementary Table S2). Among genes with frequent DNA hypermethylation, we found several gene clusters with significantly higher DNA methylation in EACs, as compared to normal tissues. These included several homeobox genes (HOX), forkhead box (FOX) families, G protein-coupled receptors (GPR), and zinc finger proteins (ZNF) (Supplementary Table S3). HOX genes belong to a large family that encodes for proteins functioning as critical master regulatory transcription factors during embryogenesis; where some of them reported overexpressed while others downregulated in cancer 82,83 . The roles of HOX genes methylation and downregulation in the biology of EAC or as biomarkers for cancer risk need to be determined. Several recent publications have reported DNA methylation of potassium channel related proteins (KCN) in human cancers [84][85][86] , including EAC 87 . Taken together, our findings call for functional analysis of the role of potassium channel related genes in the development and/or progression of EAC.
Our results showed that DSC3 (desmocollin 3) is among the top genes that were significantly downregulated and hypermethylated. This finding is consistent with an earlier report showing silencing of DSC3 by DNA methylation in advanced stages of esophageal adenocarcinoma 88 . We also detected downregulation and promoter hypermethylation of NDRG2. NDRG2 has some tumour suppressor functions as a potential metastasis suppressor gene in several cancers 89,90 . To date, the role of NDRG2 in Barrett's tumourigenesis and EAC has not been explored.
We also identified 244 genes that showed DNA hypomethylation and mRNA overexpression ( Fig. 1 and Supplementary Table S3). DNA hypomethylation has been reported as one of many mechanisms that regulate gene expression 91,92 . EAC develops in the background of chronic inflammation due to chronic gastroesophageal reflux disease where esophageal cells are abnormally exposed to acid and bile salts leading to the development of Barrett's esophagus and its progression to EAC 93,94 . Interestingly, our data demonstrate hypomethylation of several inflammation-related genes in EAC. These included 16 interleukin family members such as IL8, IL23R, and as many as 19 interferon family members (Supplementary Table S3). We have previously reported that IL8 is induced by both neutral and acidic bile acids in esophageal epithelia 95 , consistent with the biology of EAC. Notably, PTGS2, also known as COX2, was also hypomethylated and overexpressed in our study. COX2 is one of the key players in inflammation-related cancers and is activated in esophageal cells following exposure to acid and bile salts and its overexpression has been reported in EAC [96][97][98] . Furthermore, consistent with the etiology of EAC, our analysis predicted activation of several pathways associated with cytokines that play a central role in the inflammatory processes (Table 2). Moreover, several genes that regulate invasion and metastasis were hypomethylated and overexpressed in our analysis of EAC (Supplementary Table S3). These genes included several members of the matrix metalloproteinases (MMPs) family; MMP1, MMP7, MMP12, MMP3, and MMP10. MMPs overexpression has been described in several cancers including EACs, and their activation is associated with matrix remodeling, invasion and angiogenesis 26,99 . We also detected DNA hypomethylation and overexpression of SPP1 and LYN. SPP1 (also known as Osteopontin), is a chemokine-like calcified ECM-associated protein, which plays a crucial role in determining the metastatic potential in gastrointestinal cancers [100][101][102] . LYN (v-yes-1 Yamaguchi sarcoma viral related oncogene homolog) gene is one of the 8 non-receptor tyrosine kinases (Src, Fyn, Yes, Lck, Lyn, Hck, Fgr and Blk) that interact with the intracellular domains of growth factor/cytokine receptors. These findings call for investigating the roles of SPP1 and LYN in Barrett's carcinogenesis and EAC. Taken together, our results indicate the existence of a strong pro-inflammatory and pro-invasive environment in the development of EAC and suggest a previously unexplored interaction between promoter DNA hypomethylation and activation of these genes and networks during esophageal tumourigenesis.

Integrated Analysis of Gene Expression, Copy Number and DNA Methylation.
We have also carried out a fully integrated analysis that takes into account gene expression, methylation, and copy numbers (Figs 2 and 3, and Supplementary Table S2). We found 56 overexpressed genes that are associated with both CN gains and DNA hypomethylation whereas downregulation of 31 genes was associated with both CN losses and DNA hypermethylation (Fig. 1). Among these genes, CDH17 and GATA6 are examples with significant gene overexpression, promoter hypomethylation and copy number amplification in EACs. CDH17 is a member of the cadherin superfamily, encoding a calcium-dependent, membrane-associated glycoprotein that is specifically expressed in the gastrointestinal tract and pancreatic ducts. Recent studies indicated that CDH17 is involved in tumourigenesis and metastasis in gastrointestinal cancers 103 and may serve as a prognostic biomarker 104,105 , although not studied previously in the context of EAC. GATA6 is one of the GATA transcription factor family members, which include GATA1-6 47 . We have discussed the importance of GATA6 in the above sections, and our results show a possible regulation of GATA6 by both genetic and epigenetic mechanisms in cancer cells (CN and methylation), and suggest an important role of GATA6 in esophageal tumourigenesis. On the other hand, ANXA8, ANXA8L1, and PPP2R2C were examples of downregulated genes with both DNA hypermethylation and copy number loss (Supplementary Table S2). Both ANXA8 and ANXA8L1 are members of the annexin family of evolutionarily conserved Ca2+ and phospholipid binding proteins that may function as an anticoagulant. Downregulation of ANXA8 has been shown to correlate with the morphologic changes of the epithelial-to-mesenchymal transition (EMT) and may contribute to cell scattering and metastasis in cholangiocarcinoma 106 . ANXA8 might also play an important role in calcium fluctuation-mediated HIF-1α transcriptional activation and cell viability in pancreatic cancers 107 . PPP2R2C is another interesting gene downregulated in EAC, with simultaneous copy number loss and hypermethylation (Supplementary Table S2). Dysfunction of PPP2R2C through microRNAs has been recently reported in some human cancers 108,109 . Our results indicate that dysfunction of these genes by both genetic and epigenetic interaction mechanisms may play an important role in Barrett's tumourigenesis and EAC. Further studies are needed to elucidate the functions and regulatory mechanisms.

Conclusion
Our study indicates that EACs have complex transcriptomic, genomic, and epigenomic alterations. Frequent copy number alterations highlight the chromosomal instability nature of these cancers. Although this study has some limitations due to a relatively small number of patients' samples that were analyzed, our integrated analysis of gene expression, copy numbers, and methylation provided novel information regarding the complex molecular interactions in EACs. Our integrated approach identified several novel candidate genes and signaling networks that can be investigated for their biological functions and as possible diagnostic or prognostic biomarkers for EAC.