Genomic and transcriptomic alterations associated with drug vulnerabilities and prognosis in adenocarcinoma at the gastroesophageal junction

Adenocarcinoma at the gastroesophageal junction (ACGEJ) has dismal clinical outcomes, and there are currently few specific effective therapies because of limited knowledge on its genomic and transcriptomic alterations. The present study investigates genomic and transcriptomic changes in ACGEJ from Chinese patients and analyzes their drug vulnerabilities and associations with the survival time. Here we show that the major genomic changes of Chinese ACGEJ patients are chromosome instability promoted tumorigenic focal copy-number variations and COSMIC Signature 17-featured single nucleotide variations. We provide a comprehensive profile of genetic changes that are potentially vulnerable to existing therapeutic agents and identify Signature 17-correlated IFN-α response pathway as a prognostic marker that might have practical value for clinical prognosis of ACGEJ. These findings further our understanding on the molecular biology of ACGEJ and may help develop more effective therapeutic strategies.

I n the past few decades, the incidences of adenocarcinoma at the gastroesophageal junction (ACGEJ) are rapidly increasing worldwide 1,2 . While it can be categorized as esophageal or gastric adenocarcinoma 3 , ACGEJ has been treated as the latter in Chinese hospitals because esophageal cancer in China is mostly squamous-cell carcinomas 4 . Surgical resection is the standard treatment for early-stage ACGEJ and for locally advanced or unresectable tumors, pre-and post-operative chemotherapies are often used 5 . Targeted therapies are only for patients with latestage metastatic HER2-positive tumors 6 , although Nivolumab, an immuno-oncology agent targeting PD-L1, has showed encouraging efficacy in patients with unresectable advanced or recurrent HER2-negative ACGEJ 7 . The 5-year survival rates of this cancer are 20−25%, lower than that of esophageal or gastric cancers. Thus, it is urgent to apprehend molecular characteristics that can serve as potential drug targets and (or) prognostic indicators.
In a previous study conducted in Caucasian patients by the Cancer Genome Atlas (TCGA) project 8 , 65% of ACGEJ were categorized as the chromosomal instability (CIN) subtype of gastric cancer, characterized by preponderating focal copynumber variations (CNVs) in the tumor genome. Somatic single nucleotide variations (SNVs) and small insertions and deletions (indels) significantly recur in TP53 and genome-wide single base substitutions (SBSs) typically form a pattern known as Signature 17 in the Catalogue of Somatic Mutations in Cancer (COSMIC) 9,10 . Signature 17 may be the footprint of an early mutational mechanism initiating esophageal and gastric adenocarcinoma 11 , and its characteristic SBS, 5′-C[T > G]T-3′, has been associated with poor survival of esophageal adenocarcinoma 12 .
Due to different genetic makeup and environments including lifestyles, whether ACGEJ genomes in Chinese patients share the above characteristics is unclear. Previous gastric cancer studies on East Asian patients have each collected about 30 ACGEJ samples 13,14 , probably too few to draw rigorous conclusions. Moreover, previous studies did not provide enough information on transcriptomic changes that could further their significant findings on the genomic changes of ACGEJ. In the present study, we have assembled a relatively large set of Chinese ACGEJ patients and have sequenced genomes and transcriptomes of matched tumor and adjacent normal tissue samples. By jointly analyzing these data and comparing with the findings in Caucasian patients, we show that ACGEJ in Chinese patients are also dominated by CIN-associated focal CNVs and that Signature 17 activities correlate with multiple essential genomic and transcriptomic changes of ACGEJ. Furthermore, we deliver a comprehensive profile of genetic alterations potentially vulnerable to existing treatments, and identify genomic and transcriptomic prognostic markers of potential clinical values. These findings may improve current knowledge about ACGEJ and contribute to its precision diagnosis and treatment.

Results
CIN-associated focal CNV is the major feature of ACGEJ genomic alterations. We performed whole-genome sequencing (WGS) on ACGEJ tumor samples containing ≥60% of cancer cells ( Supplementary Fig. 1a) and matched blood samples from 124 Chinese patients (mean coverage 61x and 31x, respectively) and identified 2,558,269 SNVs and 1,258,899 indels. The tumor mutation burden (TMB) ranged from 0 to 13.1 (median 1.8) per megabase (Mb). The most significantly mutated gene was TP53 (FDR q ≤ 0.05) with coding mutations found in 71.0% (88/124) samples. To increase the detection power, we combined the reported coding-region mutations data of 151 ACGEJ samples 8,15,16 with our data and found that TP53 was the only gene recurrently mutated in ≥10% of samples (Supplementary Table 1). CNVs overlapped with a median of 7.8% of the tumor genome and 8.4% of the coding region, respectively (Supplementary Fig. 2a) and the jagged layout of genome-wide CNV distributions (Fig. 1a) indicates widespread focal CNVs. Proteincoding genes subject to CNVs were 5.7 times (median) more than those altered by non-silent SNVs/indels (Fig. 1b). These results suggest that focal CNVs are the major genomic alterations of ACGEJ in Chinese patients.
Since 65% of TCGA ACGEJ samples were deemed CIN and CIN is a common source of CNVs in cancer genomes associated with metastasis, therapeutic resistance, and immune evasion 17,18 , we investigated the prevalence of CIN in our samples. We found a published CIN gene signature (CIN70) 19 significantly overexpressed in tumors compared with adjacent normal tissues (P = 0.001; Fig. 1c) and whole-genome doubling (WGD), a known precursor to CIN 20 , in 59.7% (74/124) of our ACGEJ genomes. We also found other genomic abnormalities suggestive of CIN including chromothripsis (n = 77, 54 with WGD), kataegis (n = 74), and complex structural variations (SVs) such as translocations (median 141.5 per ACGEJ genome) and inversions (median 257.5 per tumor genome) (Supplementary Data 1,2). The expression levels of CIN70 were significantly higher in WGD than in non-WGD tumor genomes (median 0.50 versus 0.28, P = 2.15e-5) and significantly correlated with the number of chromosomal arm and gene level CNVs (Spearman's ρ = 0.30 and 0.27, P = 6.60e-4 and 0.003, respectively) (Fig. 1d). ACGEJ genomes with WGD also had more gene level CNVs than those without WGD (median 609.5 versus 312, P = 8.01e-4). The median ploidy of WGD genomes was 3.1 and they had a significantly larger proportion of autosomal genome losing heterozygosity than non-WGD genomes (39% versus 25%, P = 2.43e-6), indicating frequent single-copy losses after WGD. Together, these results suggest the abundant CNVs observed in our ACGEJ genomes were associated with CIN.
COSMIC Signature 17 is the characteristic ACGEJ mutational signature. We next examined the mutation spectra of our ACGEJ genomes to characterize the mutational signature and found 5′-C [T > G]T-3′ was the most common somatic SBS across the genomes (Fig. 2a). When fitted to reported COSMIC mutational signatures, this SBS becomes a major component of Signature 17. We also found recurrent activities of COSMIC Signatures 1, 3, 5, and 8 (Fig. 2b). In the ACGEJ genomes of our patients, 52.3% (257,673/493,106) of Signature 17 attributed SNVs were located at intergenic regions. We searched for potential cancer-driving regulatory elements in these regions and found Signature 17 SNVs in 75.3% (314/417) of significantly mutated CTCF binding sites 27 (Supplementary Data 3). Because oncogenic SNVs at CTCF binding sites have been linked to CIN 27-29 , we then investigated correlations between Signature 17 and CINrelated genomic alterations. Signature 17 activities were higher in ACGEJ genomes with TP53 coding mutations or WGD or chromothripsis than in genomes without these features (P = 0.002, 0.006, and 0.009, respectively) and were positively correlated with TMB and the number of chromosomal arm level CNVs (Spearman's ρ = 0.26 and 0.32, P = 0.004 and 2.57e-4, respectively; Fig. 2c, d). Chr. We further investigated Signature 17-correlated transcriptomic changes. Since the underlying mutational process represented by Signature 17 has not been well established, we sought for genes whose expressions are correlated with Signature 17 activities and assessed the enrichment of these genes in cancer hallmark pathways and gene signatures (see Methods). Among pathways and gene signatures differentially expressed between our tumor and adjacent normal samples (FDR q ≤ 1e-4), the CIN70 gene signature was enriched with genes positively correlated with Signature 17 activities (e.g., TPX2, NEK2, and KIF4A). Additionally, the mitotic cell-cycle, mitotic spindle, E2F targets, and G2M checkpoint pathways were also enriched with genes positively correlated with Signature 17 (e.g., MYBL2, KIF15, and CCNE1), suggesting elevated cell proliferation of tumors with high Signature 17 activities. Immunity-related pathways including allograft rejection, interferon (IFN)-α or IFN-γ response, and the IL6-JAK-STAT3 signaling were down-regulated in tumor samples and they were enriched with Signature 17 negatively correlated genes (e.g., CXCR3, A2M, and FGL2), suggesting suppressed immune response in Signature 17-high tumors. We also found the oxidative phosphorylation and glycolysis pathways were, respectively, enriched with Signature 17 negatively and positively correlated genes (e.g., CLDN3, NDUFA4L2, and UQCRC2), suggesting metabolic remodeling in Signature 17-high tumors (Fig. 2e).
Comparative analysis on major genomic alterations of our patients and TCGA patients. We compared clinically relevant characteristics of ACGEJ samples collected by this study (n = 124) and by TCGA (n = 105). Due to the controlled access of TCGA Level 1/2 data, we downloaded Level 3 data and applied the same downstream analyses. The patients enrolled in this study were slightly younger than TCGA patients (median 65 and 66 years, respectively, P = 0.050) and exhibited more advanced (stage III or IV) ACGEJ (72 and 41 patients, respectively, P = 0.021). While two cohorts shared similar sex distributions, the TCGA cohort was more ethnically diverse. Eighty-eight ancestryknown TCGA patients contain 78 Caucasians, nine Asians, and one African American. Five TCGA ACGEJ samples were likely microsatellite instable, while no such samples were identified in our cohort. Disregarding these samples, the TMB of the rest of TCGA ACGEJ (n = 100) was still higher than that of ours (median 2.68 and 1.84, respectively). The recurrence rates of functional mutations (annotated as oncogenic or likely oncogenic by OncoKB 21 ) in TP53, PTEN, ARID1A, CDKN2A, and KRAS appeared to be significantly higher in TCGA patient cohort than in our patient cohort. We did not find CDKN2A and KRAS mutations in our samples but detected 4.0% of LIPF mutations absent in TCGA samples (Supplementary Table 2).
Our samples were not significantly different from TCGA samples in the number of chromosomal arm level CNVs (median 17 for both) or gene level CNVs (median 423.5 and 370, respectively). High correlations of the cohort-level recurrences of arm level CNVs (Fig. 3a) indicated similar arm level CNV patterns across two cohorts. The genome-wide distributions of focal CNV regions were also similar in terms of recurrences and amplitudes (Fig. 3b). We assessed the overlaps between significantly recurrent focal CNVs (FDR q < 0.25) across two cohorts and found 46.4% of focal amplifications and 50.8% of focal deletions identified from our samples had ≥50% overlap with those identified from TCGA samples. We then searched for CNV driver genes in highly recurrent focal CNV regions identified in either cohort; any two regions identified from different cohorts and reciprocally overlapping at ≥50% of both their sizes were merged into one big region and considered shared by both cohorts. In these regions, we found 28 genes with highly correlated copy-number and expression changes (Spearman's ρ ≥ 0.3, FDR q < 0.05) in respective cohorts (Fig. 3c). Homologous deletions or amplifications of these genes, which might drive ACGEJ, occurred mostly at comparable frequencies between two cohorts (Fig. 3d), except for more frequent CCNE1 and BCL2L1 amplifications in our samples (P = 0.016 and P = 0.007, respectively).
We then compared the impacts of potentially functional alterations (i.e., oncogene amplification, TSG homologous deletion, and functional mutations in both) at the pathway level. In Fig. 1 Featured ACGEJ genomic and transcriptomic changes. a Frequencies of CNVs detected in 124 ACGEJ samples, with gains in red and losses in blue. Bars represent non-overlapped 1-million-base-pair windows along the genome. b Bar plot comparing the number of genes altered by CNVs and SNVs/ indels in each of 120 ACGEJ genomes with both types of alterations. The y-axis indicates the ratio of CNVs to SNVs/indels minus 1. Only deletions, amplifications, or non-silent coding mutations are counted. Most samples (102/120, 85%) show a preponderance of CNVs over SNVs/indels. c GSEA analysis comparing CIN70 activities in ACGEJ samples and adjacent normal tissue samples. The plot shows overexpressed CIN70 in ACGEJ samples, the normalized enrichment score (NES) and the P value. d Associations between CIN70 activities and the WGD status (+ and − indicating 74 and 50 tumor genomes with and without WGD, respectively; two-sided Wilcoxon rank-sum test), the number of chromosomal (Chr.) arm or gene level CNVs (Spearman's correlation tests) in ACGEJ samples. e GISTIC2.0 identified recurrent focal CNVs in 124 ACGEJ genomes, with 25 potential CNV drivers annotated on the plot. f Box and bar plots comparing chromosomal (Chr.) arm level CNVs, gene level CNVs, and the frequencies of WGD in tumor genomes with and without CCNE1 copy number gains (n = 67 and 57, respectively; two-sided Wilcoxon rank-sum tests). g Associations between CCNE1 expression levels and CIN70 activities (Spearman's correlation test) or WGD status (two-sided Wilcoxon rank-sum test) in ACGEJ samples. Box plots in (d, f, g) show the median (central line), the 25-75% interquartile range (IQR) (box limits), the ±1.5 times IQR (Tukey whiskers), and all data points, among which the lowest and the highest points indicate minimal and maximal values, respectively.   both cohorts, we found recurrent perturbations (>10% samples) to the cell-cycle, receptor tyrosine kinases (RTK)/RAS, PI3K, MYC, TGF-β, and NOTCH signaling pathways, and to genome integrity related genes (e.g., TP53, MDM2, and ERCC2), with different rates (Fig. 3e). In addition, MAPK signaling (e.g., KRAS, MAP2K1), immune signaling (e.g., HLA-A/B), the SWI/SNF chromatin remodeling complex (e.g., ARID1A, SMARCA4), chromatin histone modifiers (e.g., EP300, KMT2C), and the  homologous recombination pathway (e.g., BRCA1, TP53BP1) were recurrently perturbed in the TCGA cohort but not in ours.
Despite different genomic perturbation rates, the activities of these pathways or gene sets in our tumor samples and TCGA samples, as quantified by Gene Set Variation Analysis (GSVA) 31 scores using normalized and batch corrected gene expression data, were similar with no statistically significant difference. So were the estimated fractions of different immune cell types. Together, these results suggest that our patient set and TCGA patient set had similar genomic alterations with some differences.
Genomic alterations indicate potential vulnerability to existing therapeutic agents. The lack of specific and effective drugs for ACGEJ treatments is largely due to our limited knowledge on the genomic characteristics of this disease.  (Fig. 4a). The vulnerabilities of our patients to the three drugs were predicted mainly based on coding mutations in TP53. Anthracyclines has also been tested in a clinical trial as an inhibitor of TOP2A 33 , which was amplified in 11.3% (14/124) of our patients. We found significantly more frequent CIN-related genomic alterations (i.e., WGD, chromothripsis, and kataegis) in patients predictively responsive to any of the three drugs (n = 91) than in other patients (P = 2.00e-5, 0.035, and 2.00e-5, respectively) ( Fig. 4b), consistent with a previous report that CIN-type gastric cancer is sensitive to adjuvant chemotherapy 34 . For targeted therapeutic agents (Fig. 4c, d), 19.4% (24/124) of our patients had ERBB2 amplification and therefore could benefit from FDA approved ERBB2 inhibitors, including Trastuzumab, Pertuzumab, Neratinib, Lapatinib, and Ado-Trastuzumab Emtansine. Of the 100 patients without ERBB2 amplifications, 89 had other genomic alterations likely druggable by 59 classes of agents. Specifically, we predicted that WEE1, CDK4/6, and PARP inhibitors would be effective for 70, 42, and 27 of our patients, respectively, and together they could cover 93.3% (83/89) of our patients without ERBB2 amplification. Among potentially actionable gene alterations found in all 124 patients (Fig. 4c), the TP53 mutations in 88 patients were the major contributors to the predicted efficacy of WEE1 inhibitors. While only 19 patients had CDK4/CDK6 alterations, other alterations such as CDKN2A/ CDKN2B deletions or CCND1 amplifications in another 29 patients might also render them vulnerable to CDK4/6 inhibitors. Using the RNA sequencing data of paired ACGEJ and adjacent normal tissue samples of each patient, we found that 85.2% (404/ 474) of the predictively targetable gene alterations had corresponding expression changes (Supplementary Data 6). Recurrent inconsistency (>10 samples) was only found for FGF3 and FGF4 amplifications (in 13 and 14 samples, respectively), potentially affecting the predicted response rates of two FGFR inhibitors Lucitanib and Dovitinib.
We compared the classes of targeted therapeutic agents with ≥20% predicted response rates in our 89 patients and 85 TCGA ACGEJ patients that had no ERBB2 amplifications (Fig. 4d) and found that of the eight guideline or clinical-trial drug classes (i.e., WEE1, CDK4/6, PARP, FGFR, PI3K, AKT, AURKA-VEGF, and MTOR inhibitors), WEE1 and FGFR inhibitors had higher whereas others had lower response rates in our patients than in TCGA patients. Since CCNE1 was recurrently amplified, overexpressed, and often co-amplified with ERBB2 in our tumor samples, we took a special interest in druggable CDK2 whose activity is regulated by Cyclin E1 produced by CCNE1. The predicted response rate to existing CDK2 inhibitors was 29.8% (37/124) in all our patients and 24.7% (22/89) in our patients without ERBB2 amplifications, compared with 16.2% (17/105) and 15.3% (13/85) in TCGA ACGEJ patients, respectively.
Unfortunately, 11 of our patients had no potentially druggable gene alterations. The tumor genomes of these patients had significantly lower TMB (median 1.87 versus 0.06 per Mb, P = 1.81e-4), fewer CNVs at the chromosomal arm level (median 1 versus 17, P = 7.84e-6) and the gene level (median 0 versus 490, P = 6.10e-6) and less frequent WGD (2/11 versus 72/113, P = 0.007) than the tumor genomes of other patients (Fig. 4e). Based on the hematoxylin and eosin (H&E)-stained histopathological images of these 11 patients' tumor samples ( Supplementary  Fig. 1b), we estimated that the range of their tumor cell contents were 60−95%, which eliminated the possibility of normal tissue contamination.
We performed in vitro experiments to assess the vulnerabilities of cell lines with predicted druggable gene alterations to corresponding therapeutic agents. We tested six chemotherapeutic agents and five targeted therapeutic agents on eight cell lines with different predictive vulnerabilities (Supplementary Fig. 3; Supplementary Data 7) and the results were in line with the predictions (Fig. 4f). For chemotherapeutic agents, cells with oncogenic mutation in NF1 were significantly more sensitive to Vinblastine and cells with oncogenic mutations in ATM and ERCC4 were slightly more sensitive to Cisplatin than those without corresponding mutations. For targeted therapeutic agents, cells with CCNE1 amplifications were highly responsive to CDK2 inhibitor Roniciclib. Cells with oncogenic mutations in DNA repair genes BRCA2, ATM, ATR, CHEK2, and FANCA were more sensitive to PARP inhibitor Olaparib and cells with oncogenic mutations in TP53 or BRCA1 were more sensitive to WEE1 inhibitor MK-1775 than those without corresponding mutations. Cells with CDK4/6 or FGF3/4 amplifications showed only slightly but significantly higher sensitivity to CDK4/6 inhibitor LEE011 or FGFR inhibitor Dovitinib, respectively, than those without corresponding gene amplifications (Fig. 4f).  (Fig. 5a, b). Given that the TMB and gene level CNVs were moderately correlated in our patients (Spearman's ρ = 0.32, P = 2.50e-4), we further classified patients into two groups, the low-TMB high-CNV group (n = 31) versus others (n = 52), and the first group showed significantly shorter survival time than the second group (log-rank P = 0.001), with an adjusted HR of 4.61 (95% CI = 1.80-11.80) (Fig. 5c). Incorporating TMB with gene level CNVs led to a bigger HR than considering the latter alone, suggesting the two genomic alterations may combinedly affect ACGEJ survival time. We also found a significant correlation between Signature 17 activities (≥17.1% or <17.1%) in ACGEJ and survival time in patients (log-rank P = 0.022; Fig. 5d). The HR of high Signature 17 activity for ACGEJ death was 3.94 (95% CI = 1.38-11.28) adjusted for age, sex, tumor stage, TMB, and gene level CNVs, indicating Signature 17 activity is an independent prognostic factor. Figure 5e shows a multivariate Cox model comparing the relative impacts of featured genomic alterations and other non-genomic factors such as for age, sex, and tumor   stage. We next evaluated the survival associations of ACGEJ transcriptomic changes, specifically, aberrant activities of cancer hallmark pathways or gene signatures quantified by GSVA scores. Patients with low GSVA scores of the IFN-α response pathway (< −0.2256) showed significantly shorter survival time (log-rank P = 0.004; Fig. 5f) than those with high scores (≥ −0.2256). The HR of high IFN-α response for ACGEJ death was 0.25 (95% CI = 0.09-0.66) adjusted for age, sex, and tumor stage. Because downregulated IFN-α response pathway was significantly correlated with high Signature 17 activities (Fig. 2e), we then focused analysis on this pathway and further identified two relevant genes, Interferon Induced Protein 44 (IFI44) and IFI30 Lysosomal Thiol Reductase (IFI30), whose high expressions (≥3.22 or ≥28.13 Transcripts Per Million, TPM) were significantly correlated with long survival time (log-rank P = 6.57e-5 and 0.043, respectively; Fig. 5g, h), with respective adjusted HRs of 0.23 (95% CI = 0.09-0.60) and 0.28 (95% CI = 0.09-0.84).

Discussion
ACGEJ is on the rise worldwide, but its molecular characteristics have never been independently profiled and reported, which is impeding clinical treatment and drug development of this malignancy. In the present study, we have assembled and comprehensively analyzed the genome and transcriptome data of totally 124 ACGEJ samples exclusively collected from Chinese patients. The results have indicated that, like Caucasian patients, the ACGEJ genomes of our patients were dominated by CINpromoted tumorigenic focal CNVs while lacking recurrently mutated coding genes other than TP53. Both TMB and gene level CNVs of the ACGEJ genome indicate the prognosis of Chinese patients. Compared with previous reports, the present study has two major advances. Firstly, our integrative analyses on genome and bulk-tissue transcriptome sequencing data have led us to more reliable findings. Secondly, we have gone beyond descriptive genomic profiling to predict effective therapeutic agents and prognosis of patients based on their genomic and transcriptomic changes. The comprehensive findings in the present study may improve our current knowledge of ACGEJ and have implications for clinical care of patients.
In the ACGEJ genomes of our patients, TP53 is the only significantly and recurrently mutated coding gene we have detected. That said, there may be mutated non-coding drivers, such as CTCF-binding-site SNVs attributable to COSMIC Signature 17. As one of the hallmark signatures of upper-gastrointestinal adenocarcinoma, Signature 17 is believed to arise from misincorporation of oxidized DNA precursors into genomic DNA 12,35 . It has been shown that oxidative stress may generate more mutations in introns and intergenic regions than in exons and promoters 36 , and improperly handled oxidative DNA damages can lead to CIN phenotypes in model systems 37,38 and in human cells 39 . Our results are in line with these previous reports, indicating that Signature 17 SNVs may be attributed to oxidative stress and represents an etiological factor forming CIN in ACGEJ genomes. In addition, we have notably provided bulk transcriptomic evidence to support the role of Signature 17 by correlating its activities with a gene expression signature of CIN. More importantly, our analysis on bulk RNA sequencing data has revealed an immunosuppressive microenvironment in tumors with high activities of Signature 17. These results indicate that Signature 17 related genome and transcriptome changes play very important roles in the development and progression of ACGEJ, which may be of clinical relevance.
Comparing the genomic and transcriptomic alterations in our samples with those in TCGA ACGEJ samples (mostly from Caucasian patients) revealed similar CNV patterns and pathway aberrations. However, tumor samples of our patients seem to Fig. 4 Genomic alterations vulnerable to existing treatment options in Chinese ACGEJ in comparison with TCGA ACGEJ. a Chemotherapeutic agents predicted to be responsive in Chinese ACGEJ. Left panel shows genomic alterations vulnerable to each chemotherapeutic agent and right panel shows the percentage of samples responsive to each chemotherapeutic agent, with comparisons between our ACGEJ patient set and the TCGA ACGEJ dataset. b Bar plots comparing ACGEJ samples likely responsive and unresponsive to Anthracyclines, Gemcitabine, or Mytomycin C (n = 91 and 33, respectively) for the frequencies of WGD, chromothripsis, and kataegis (Fisher's exact tests). c Genomic alterations in our ACGEJ samples and the corresponding targeted therapeutic agents. Only genomic alterations detected in ≥5 samples and agents having potential targets in ≥20% samples are shown. d Percentage of samples predicted to be responsive to different targeted therapeutic agents, with comparisons between our patients and TCGA patients without ERBB2 amplification (n = 89 and 85, respectively). Only agents with potential targets in ≥20% of our patients or TCGA patients are shown. e Comparison between likely druggable and undruggable ACGEJ samples (n = 113 and 11, respectively) for TMB, chromosomal (Chr.) arm or gene level CNVs (two-sided Wilcoxon rank-sum tests), and the frequencies of WGD (Fisher's exact test). f In vitro validation of predicted vulnerabilities to chemotherapeutic and targeted therapeutic agents. Box plots compare relative viability (expressed as OD 450 value) of different cell lines, likely vulnerable (blue) or invulnerable (red) to specified therapeutic agents, after treatment with optimal dose (in parentheses) of corresponding agents for 72 h. Each dot represents the result of one independent experiment and each experiment had three replicates on one cell line; n represents the number of cell lines; P values derived from twosided Wilcoxon rank-sum tests. Box plots in (e, f) show the median (central line), the 25-75% interquartile range (IQR) (box limits), the ±1.5 times IQR (Tukey whiskers), and all data points, among which the lowest and the highest points indicate minimal and maximal values, respectively.
have fewer functional mutations than TCGA samples, which might reflect different genetic and environmental backgrounds between Chinese patients and Caucasian patients or might be simply due to different coverage of our WGS data and TCGA whole-exome sequencing data. We could reduce the confounding effect of different sequencing coverage by down-sampling the reads, but that requires access to controlled TCGA data. There are also some differences in the alteration rates of candidate driver genes (e.g., LIPF), potential drug targets (e.g., CCNE1), and survival-associated immune pathways (e.g., IFN-α response) across two cohorts. In these regards, the current study has increased general knowledge on the genetic and molecular basis of ACGEJ and, particularly, filled in the gap for non-Caucasian patients. Thus, our findings could contribute to developing more specific and effective precise care of ACGEJ worldwide.
Because there have been no specific and effective drugs for ACGEJ treatments, we have screened the ACGEJ genome and transcriptome of each patient for changes that may indicate the patient's responses to therapeutic agents currently used for other types of cancer or under clinical trials. Since DNA repair deficiency caused by TP53 inactivation and other events is the major mechanism associated with the efficacy of cytotoxic anticancer drugs, we have predicted that more frequent TP53 inactivating mutations may lead to generally higher response rates to cytotoxic anticancer agents in Chinese patients than in Caucasian patients, suggesting that these chemotherapeutic agents may generally benefit Chinese patients with ACGEJ. We have also found that a considerable proportion of Chinese ACGEJ patients might benefit from targeted anticancer agents 40 such as ERBB2, WEE1, and CDK4/6 inhibitors. The predicted response rates to these agents are somewhat different between Chinese and Caucasian ACGEJ patients. Specifically, we have shown that CCNE1 amplification is a CIN-associated oncogenic event prevalent in Chinese patients and often coexisting with ERBB2 amplifications. Given that CCNE1 amplification has been linked to the resistance to ERBB2 inhibitors 41,42 , it is worthwhile to conduct clinical trials for CDK2 inhibitors 43,44 in Chinese ACGEJ patients. We have validated the reliability of gene-alteration-based drug vulnerability prediction by in vitro assays in a panel of cancer cell lines. Thus, our predictions on the response rates to currently used cytotoxic anticancer agents or targeted drugs approved or tested for other types of cancer have provided molecular evidence for their clinical use or clinical trials for treating ACGEJ.
In the present study, we have identified several prognostic markers from genomic or transcriptomic changes associated with the survival time in ACGEJ patients. At the genomic level, we have found that TMB and gene level CNVs are independently and combinedly correlated with the survival time in our patients. In addition, we have identified Signature 17 activity as another independent marker for ACGEJ survival time, which one may expect because in our patient set Signature 17 is correlated with CIN-related genomic changes such as WGD and chromothripsis, both confer terrible malignant cancer phenotypes 23,45 . Consistent with our result, a previous genomic study in breast cancer has also shown that Signature 17 is enriched in metastatic tumors and linked to poor prognosis of patients 46 . Nevertheless, in the present study we went beyond genome data and incorporated transcriptome data for analysis. As a result, we have found that a Signature 17-correlated cancer hallmark pathway, IFN-α response, is also significantly associated with the survival time in our ACGEJ patient set. Moreover, transcriptome analysis has shown that tumors with high Signature 17 activities had repressed activities of IFN-producing cytotoxic cells. These findings emphasize that immunosuppression in the tumor microenvironment, likely induced by high Signature 17 activities in cancer cells, plays an important role in the progression of ACGEJ. Specifically, we have found significant associations of high expression levels of IFI44 and IFI30, two IFN-α-inducible genes, with good survival in our patient set. Our comparative analyses on TCGA and GEO datasets have yielded both consistent and inconsistent results. The associations of TMB levels, Signature 17 activities, IFN-α response activities, and IFI30 expression with patient survival time are quite consistent in our patient set and TCGA and GEO patient set. The results for gene level CNVs in our cohort and the TCGA cohort seem contradictory. For IFI44, the GEO data agreed with our finding, whereas the TCGA data showed a significantly negative correlation between high IFI44 expression and patient survival. IFI44 has been reported to function in many biological processes through IFN signaling pathways, including anti-proliferative activity in tumors 47,48 . High IFI30 expression in tumor cells has been linked to improved cancer survival 49,50 , possibly due to its ability to enhance the presentation of tumor antigens for T cell recognition 51 or to regulate the cellular redox state and proliferation 52 . Thus, the identified down-regulation of IFI44 and IFI30 might be a simplified prognostic marker for ACGEJ in clinical use.
Despite the advances discussed above, we acknowledge some limitations in the present study. Firstly, we focused on genomic alterations in coding sequences and barely touched upon noncoding elements, which may harbor additional ACGEJ drivers. Secondly, we have not fully investigated the extent of intratumor heterogeneity and how it may affect our findings. Both issues are on our agenda for future studies. In addition, the clinical or preclinical evidence for the anticancer agents we referred to mostly came from studies on Caucasian patients and thus it is uncertain whether and how much they will apply to Chinese patients.
In conclusion, our present study has profiled ACGEJ in Chinese patients as a CNV-dominant CIN-type tumor, revealed the types and the distributions of various druggable changes in tumor genomes, and identified genomic and transcriptomic prognostic markers that have potential clinical implications. These findings have furthered our understanding on ACGEJ and would help develop more effective therapeutic strategies to precisely fight this malignancy, especially for Chinese patients.

Methods
Biospecimen and clinical data. The biospecimen used in this study were obtained from 124 Chinese ACGEJ patients (Supplementary Table 3 Fig. 1a) were used for DNA and RNA sequencing. WGS data were generated from matched tumor and blood samples from 124 patients. The bulk RNA sequencing data were generated from matched tumor and normal tissue samples from 123 of the 124 patients. DNA and total RNA were extracted from tissue samples using the AllPrep DNA/RNA Kit (Qiagen). Blood DNA was extracted using QiaAmp Blood Midi Kit (Qiagen). Library preparation was as previously described 53 . All libraries were sequenced on Illumina HiSeq xTen in 2 × 150 bp paired-end mode.
Segmentation files output by BIC-seq2 (for our samples) or downloaded from the TCGA website (excluding the regions within 3 Mb around centromeres and 1 Mb at both chromosome ends) were used as input for GSITIC2.0 (v.2.0.23) 66 to quantify the CNV status of each gene in each tumor genome. In a tumor genome, we considered a gene undergoing homozygous deletion, copy-number loss, copynumber gain, or amplification if the corresponding GISTIC score = −2, ≤ −1, ≥ 1, or = 2, respectively. Significantly recurrent focal CNV regions (Supplementary Data 8,9) were also determined by GISTIC2.0 (FDR q < 0.25).
We considered a chromosomal arm undergoing copy-number gain or loss if ≥2/3 genes on that arm had GISTIC score ≥1 or ≤ −1, respectively. We defined arm level CNVs as chromosomal arms with copy-number gain or loss and gene level CNVs as gene amplifications or deletions. Only genes with GISTIC scores ±2 were counted when measuring gene level CNVs. We considered an ACGEJ genome undergoing WGD if ≥50% autosomes had a major allele copy number ≥2; the 50% threshold was determined following a previous study 23 (Supplementary Fig. 5). We measured the proportion of CNV affected genome using regions where | log2(reads ratio) | ≥0.3. Kataegis events were detected following the method of a previous WGS study 67 . Chromothripsis events were identified by ShatterSeek (v0.4) 45 based on the merged SV results and FACETS CNV results.
Identification of mutational signatures. We started by extracting de novo signatures from and inferring the prevalence of 30 published COSMIC mutational signatures in the Chinese cohort. Both had led us to recognize the prevalent activities of COSMIC Signatures 1, 3, 5, 8, and 17 (or their analogues). Thus, we only focused on these five signatures and clamped the activities of other signatures to zero in all subsequent analyses. We used deconstructSigs (v1.8.0) 68 to measure the activity of every signature in every tumor genome (the percentage of SNVs across the exome or genome attributable to the signature). For each somatic SNV, we determined its clonality and the mutational signature it most likely belongs to using Palimpsest (v.2.0.0) 69 .
Analysis of bulk-tissue RNA sequencing data. We used HISAT2 (v2.1.0) 70 to align RNA sequence reads to the Ensembl GRCh37 human genome and then used StringTie (v1.3.3b) 71 to re-assemble transcriptome and quantify the expression level of each gene in each sample. The assembly process was guided by the reference annotations from GENCODE (v19) 72 . Gene expression levels were quantified by Transcript per Million (TPM). We considered a gene differentially expressed in tumor and normal samples if the log2 + 1 transformed TPM change was significant in a paired Student's t-test (FDR q < 0.05) and the fold change of mean TPM was >1.2 or <0.8. In other comparing situations, we considered a gene differentially expressed in two groups of samples if the TPM change was significant in a paired Wilcoxon test (P < 0.05) and the fold change of median TPM was >1.2 or <0.8. The activity of specific pathway or gene signature in each tumor or normal sample was quantified using the R package GSVA (v1.30.0) 31 on log2 + 1 transformed TPM. The differential expression of specific pathway or gene signature between tumor and normal samples was assessed using gene set enrichment analysis (GSEA) 73 (implemented in the R package clusterProfiler v3.10.1 74 ). We quantified the fractions of 10 immune cell types (i.e., macrophages (M1 and M2), monocytes, neutrophils, B, NK, CD8 + /CD4 + /regulatory T, myeloid dendritic cells, and uncharacterized cells) in individual tumor and adjacent normal samples using quanTIseq 75 (implemented in the R package immunedeconv v2.0.0).
Identification of Signature17-correlated pathways and gene signatures. We identified these pathways and gene signatures using matched WGS and bulk RNAseq data generated from 123 patients. We first calculated partial Spearman's correlation (using the R package ppcor v1.1) between Signature 17 activities and the expressions of each gene across tumor samples, controlling for tumor purity and the activities of other four signatures (i.e., Signatures 1, 3, 5, and 8). Then we examined known cancer hallmark pathways and gene signatures obtained from a previous study 30 one by one to see whether they are enriched with Signature 17correlated genes using GSEA on the calculated correlation coefficients.
Analysis and integration of public data. To fairly compare the transcriptomic profiles of our ACGEJ samples and TCGA samples, we combined the gene expression data (measured by TPM) of both cohorts and that of normal gastroesophageal junction tissue samples from the Genotype-Tissue Expression (GTEx) Project, performed normalization and batch correction and used the processed data for comparative analyses. We normalized the combined gene expression dataset by keeping genes whose expression levels were quantified in all three individual datasets (n = 19,587) and rescaling the TPM values per sample to maintain a sum of 1 million. The ComBat function implemented in the R package sva (v3.28.0) 76 was used to remove the batch effect of different data sources (i.e., data source as a batch parameter). Tumor and normal samples in the combined dataset were respectively grouped together and appropriately separated from each other (Supplementary Fig. 6). The GSVA scores of featured pathways and gene signatures in either cohort before and after the processing were highly correlated; so were the estimated fractions of different immune cell types.
In vitro validation of drug vulnerability prediction. We used eight human cancer cell lines including the ACGEJ cell line OE19 (purchased from Beijing Beina Chuanglian Biotechnology Institute), esophageal adenocarcinoma cell lines OE33 and SK-GT-4 (purchased from Nanjing COBIOER Biosciences Company Limited), gastric adenocarcinoma cell lines AGS and HGC-27, and colorectal cancer cell lines HCT-116, LoVo and RKO (purchased from the Cell Bank of Type Culture Collection of Chinese Academy of Sciences Shanghai Institute of Biochemistry and Cell Biology). We obtained somatic mutation and CNV data of these cell lines from COSMIC Cell Lines Project (https://cancer.sanger.ac.uk/cell_lines) and predicted each cell line as vulnerable or invulnerable to specific chemotherapeutic or targeted therapeutic agents based on whether it carried corresponding gene alterations. The efficacy of 11 drugs was respectively assessed by cell viability measured using the CCK-8 kit (Dojindo Labs). Briefly, cells were treated with optimal dose of each drug and the viability was measured after incubation for 72 h. All analyses were performed in two independent experiments and each had three replicates.
Survival analysis. We used the log-rank test in univariate survival analyses and the Cox proportional hazards model in multivariate survival analyses (both implemented in the R package survival v2. . The Kaplan-Meier plot was used for presentation. The specific cutoff we used to dichotomize a continuous variable (e.g., gene level CNVs, Signature 17 activities, IFI44 expression) and then group patients was determined by testing a series of values of that variable with fixed increments and then choosing the one by which both log-rank and Cox P were <0.05 and the log-rank P was minimized. This process would not give us any options if there was no statistically significant result in the first place. The cutoffs we used to obtain the presented results were 3.45 per Mb for TMB, 382 for gene level CNVs, 17.1% for Signature 17 activities, −0.2256 for IFN-α response, 3.22 for IFI44 expressions, and 28.13 for IFI30 expressions. The same approach was applied to divide TCGA patients into good or poor survival groups, with the specific cutoffs of 1.73 per Mb for TMB, 42 for gene level CNVs, 48.5% for Signature 17 activity, −0.4892 for IFNα response, 12.09 for IFI44 expressions, and 10.80 for IFI33 expressions.
Identification of prognostic marker genes. To identify prognostic marker genes in the IFN-α response pathway, we first selected 40 genes whose expressions were (a) significantly correlated with the pathway GSVA scores across tumor samples (| Spearman's ρ| > 0.3, P < 0.05) and (b) significantly different between the two survival groups divided based on their IFN-α pathway GSVA scores (t-test P < 0.05). Next, we tested each of the 40 genes and found 13 of them were significantly correlated with patient survival (with at least one cutoff under which log-rank and Cox P < 0.05). We finally picked IFI44 and IFI30 out of the 13 genes using a Lasso-Cox method as implemented in the glmnet package v3.0-2 77 .
Statistical analysis. We used Fisher's exact test for any independence test between two categorical variables and Wilcoxon rank-sum test for any independence test between a continuous variable and a binary categorical variable, when there was no covariate to adjust for. Otherwise, we used an F-test to compare two generalized linear models, one of which included the variable being evaluated as a predictor. Spearman's rank correlation coefficient was used to measure the correlation between two continuous variables.