Introduction

Esophageal cancer (EC) and gastric cancer (GC) are two common gastrointestinal cancers worldwide, with 456,000 new cases and 400,000 estimated deaths per year for EC, and 951,000 new cases and 723,000 estimated deaths per year for GC [1]. The stomach is connected to the esophagus through the gastroesophageal junction, also known as the cardia. Based on anatomical location, GC can be classified into two types: true gastric (non-cardia) cancer and gastroesophageal junction (cardia) cancer [2]. The majority of GC is gastric adenocarcinoma, whereas EC consists of two histopathological types, esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC) [3]. ESCC is the predominant type worldwide, especially in China, whereas EAC is the major type in the United States, Australia, the United Kingdom, and other European countries [4].

Both environmental and genetic factors contribute to the development of EC and GC. There are similarities and differences in the risk factors for ESCC and EAC. Tobacco use is associated with increased risk of both ESCC and EAC [5,6,7]. Alcohol consumption is a specific risk factor for ESCC [8, 9], whereas gastroesophageal reflux disease, obesity, and Barrett’s esophagus are associated with increased susceptibility to EAC [9]. The known environmental risk factors for GC are Helicobacter pylori infection, smoking, obesity, low intake of fresh fruits and vegetables, and high consumption of salted foods [2]. Genome-wide association studies (GWAS) have been conducted to explore genetic variants influencing susceptibility to EC and GC over the past few decades [10,11,12]. A missense variant in PLCE1, rs2274223, was found to be associated with risk of ESCC and gastric cardia cancer [10].

Considering the close anatomical location of the two organs and the similarities in risk factors between EC and GC, we hypothesized that EC and GC might share part of their genetic basis. To test whether individual EC risk variants, or a cumulative genetic risk score computed from established EC risk loci, were also associated with GC risk, we took the risk loci reported in published EC GWAS and tested their association with GC in our large case–control studies.

Materials and methods

Study populations

Participants in the current study were from three published GC GWAS. The Nanjing GWAS (565 cases and 1162 controls) and the Beijing GWAS (468 cases and 1123 controls) were based on two independent case–control studies, which were reported previously [12]. For the National Cancer Institute (NCI) GWAS (1625 cases and 2100 controls), subjects were recruited from Shanxi and Linxian [10]. All cases in the Nanjing and Beijing GWAS were diagnosed with non-cardia GC, whereas cases in the NCI GWAS included both cardia and non-cardia GC. In total, 2631 GC cases and 4373 controls were included in the current study. Basic demographic information of the participants is shown in Supplementary table 1.

Genetic variants selection

We searched the GWAS catalog (https://www.genome.gov/gwastudies/, last accessed on 25 July 2017) for genetic variants associated with EC risk. We also searched the PubMed database (https://www.ncbi.nlm.nih.gov/pubmed/) for recently published EC GWAS. The reported EC risk loci were filtered using the following criteria: (1) the reported P value of the association reached the significance level of 5.00 × 10−8; (2) the minor allele frequency (MAF) of the variant was at least 1% in the Chinese population (1000 Genomes phase III); (3) among variants in linkage disequilibrium (LD), defined as r2 > 0.1, we selected the variant with the lowest P value. Finally, nine GWAS (eight studies for ESCC, one study for EAC) were included in our current study [10, 11, 13,14,15,16,17,18,19]. Forty-two single nucleotide polymorphisms (SNPs) reached the predefined significance level of association, but four had a MAF of < 1% in the Chinese population. After excluding SNPs in LD, 21 SNPs (18 SNPs for ESCC, 3 SNPs for EAC) were included in the final statistical analysis. Detailed information about the eligible EC risk loci is shown in Supplementary table 2.
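The three selection criteria can be illustrated with a minimal Python sketch. It is not the procedure actually used in this study: the function name, the input layout, the variant identifiers, and the greedy ordering (keep the most significant variant first, then drop anything in LD with a variant already kept) are assumptions made for illustration, and the example values are hypothetical.

```python
def select_variants(variants, ld_r2, p_threshold=5e-8, min_maf=0.01, r2_threshold=0.1):
    """Apply the three filters described in the text.

    variants: list of dicts with keys 'id', 'p' (reported P value) and 'maf'.
    ld_r2: dict mapping frozenset({id_a, id_b}) to pairwise r2;
           pairs missing from the dict are treated as r2 = 0 (assumption).
    """
    # Criteria 1 and 2: significance level and minor allele frequency.
    passing = [v for v in variants if v["p"] < p_threshold and v["maf"] >= min_maf]
    # Criterion 3: within a group of variants in LD, keep the lowest P value.
    passing.sort(key=lambda v: v["p"])
    kept = []
    for v in passing:
        in_ld_with_kept = any(
            ld_r2.get(frozenset((v["id"], k["id"])), 0.0) > r2_threshold
            for k in kept
        )
        if not in_ld_with_kept:
            kept.append(v)
    return [v["id"] for v in kept]


# Hypothetical input, not data from the study.
example_variants = [
    {"id": "snpA", "p": 1e-10, "maf": 0.20},
    {"id": "snpB", "p": 1e-9, "maf": 0.30},   # in LD with snpA
    {"id": "snpC", "p": 1e-9, "maf": 0.005},  # too rare
    {"id": "snpD", "p": 1e-6, "maf": 0.20},   # not significant enough
    {"id": "snpE", "p": 2e-8, "maf": 0.10},
]
example_ld = {frozenset(("snpA", "snpB")): 0.5}
selected = select_variants(example_variants, example_ld)
```

In this toy input, snpB is dropped because it is in LD with the more significant snpA, snpC fails the MAF filter, and snpD fails the significance filter.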

Imputation and quality control

After the basic quality control procedures performed in each GWAS, we excluded SNPs with call rate < 95%, MAF < 0.01, or Hardy–Weinberg equilibrium P value < 1.00 × 10−6. Then, we performed imputation for the Nanjing (Affymetrix 6.0), the Beijing (Affymetrix 6.0) and the NCI (Illumina 660W) GWAS separately using SHAPEIT [20] and IMPUTE2 [21]. We used all populations from the 1000 Genomes Project Phase III as the reference set. After imputation, we further excluded SNPs with poor imputation quality (info score < 0.3) and repeated the quality control procedures for SNPs mentioned above. Among the 21 selected SNPs, genotype information for rs76014404 and rs8030672 was not available from the Nanjing/Beijing or the NCI GWAS. Therefore, we used two SNPs (rs2143771 and rs116760846) in complete LD with these SNPs as proxies in the following analyses (Supplementary table 3).
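As a rough illustration of the SNP-level filters (call rate, MAF, and Hardy–Weinberg equilibrium), the following Python sketch checks one SNP from its genotype counts. The function name and the example counts are hypothetical, and the Hardy–Weinberg test shown is a 1-degree-of-freedom chi-square goodness-of-fit test, which is an assumption: the text does not state which test was used, and GWAS software often uses an exact test instead. The imputation info score filter is not covered.

```python
import math


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-6):
    """Return True if a SNP passes the call rate, MAF and HWE filters."""
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0:
        return False
    call_rate = n_called / n_total
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    if call_rate < min_call_rate or maf < min_maf:
        return False
    # Expected genotype counts under Hardy-Weinberg equilibrium.
    freq_b = 1.0 - freq_a
    expected = [n_called * freq_a ** 2,
                2 * n_called * freq_a * freq_b,
                n_called * freq_b ** 2]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    return hwe_p >= min_hwe_p


# Hypothetical genotype counts (AA, AB, BB, missing).
in_equilibrium = passes_qc(25, 50, 25, 0)
no_heterozygotes = passes_qc(50, 0, 50, 0)
low_call_rate = passes_qc(25, 50, 15, 10)
rare_variant = passes_qc(0, 1, 99, 0)
```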

Calculation of weighted genetic scores

A weighted genetic score (WGS) was calculated to evaluate the cumulative effect of EC risk loci on GC risk. We calculated separate WGS for ESCC and EAC, as there is high heterogeneity in the genetic background between these two subtypes. We also combined the two subtypes to obtain a WGS for overall EC. For each individual, the WGS was calculated by multiplying the number of risk alleles at each variant by the corresponding EC-associated beta (βj), which was derived from published studies, and summing across variants. For rs2274223, which was reported in more than one study, we estimated its effect on EC by meta-analysis. To calculate the WGS for the i-th subject, the following formula was used:

$$\mathrm{WGS}_i = \sum_{j=1}^{J} x_{ij}\,\beta_j$$

In this formula, J is the number of variants included in the score, xij is the number of risk alleles for the j-th variant in the i-th subject (xij = 0, 1, or 2), and βj is the coefficient or weight for the j-th variant (the natural logarithm of the odds ratio (OR) reported in the published studies).
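As a minimal sketch of the WGS formula above, the following Python function sums the risk allele counts xij weighted by βj = ln(ORj) for one subject. The function name and the three-variant example are hypothetical; the odds ratios shown are illustrative values and are not taken from the EC GWAS used in this study.

```python
import math


def weighted_genetic_score(risk_allele_counts, odds_ratios):
    """Return the sum over variants of x_ij * ln(OR_j) for one subject."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per variant")
    score = 0.0
    for count, odds_ratio in zip(risk_allele_counts, odds_ratios):
        if count not in (0, 1, 2):
            raise ValueError("risk allele counts must be 0, 1 or 2")
        # beta_j is the natural logarithm of the published odds ratio.
        score += count * math.log(odds_ratio)
    return score


# Hypothetical subject carrying 2, 0 and 1 risk alleles at three variants.
example_odds_ratios = [1.20, 1.10, 1.30]
example_counts = [2, 0, 1]
example_score = weighted_genetic_score(example_counts, example_odds_ratios)
```

Because each weight is a log odds ratio, a subject with no risk alleles has a score of zero, and variants with larger published effects contribute more to the score per allele.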

Differential expression analysis

Expression data (normalized expectation-maximization read counts) were downloaded from The Cancer Genome Atlas database and comprised 87 EAC tissues, 10 EAC paired normal tissues, 85 ESCC tissues, three ESCC paired normal tissues, 413 GC tissues, and 32 GC paired normal tissues. Expression data were log2 transformed to approximate a normal distribution. Paired t tests (10 EAC tissues vs. 10 EAC paired normal tissues, 3 ESCC tissues vs. 3 ESCC paired normal tissues, 32 GC tissues vs. 32 GC paired normal tissues) and two-sample t tests (87 EAC tissues vs. 10 EAC paired normal tissues, 85 ESCC tissues vs. 3 ESCC paired normal tissues, 413 GC tissues vs. 32 GC paired normal tissues) were used to evaluate differential expression between tumor and normal tissues.
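The two comparisons can be sketched in Python as follows. This is an illustration only: the function names and read counts are hypothetical, the pseudocount in the log2 transformation is an assumption (the text does not mention one), and the two-sample statistic shown is the pooled-variance Student form, whereas the text does not say whether equal variances were assumed. Only the t statistic and degrees of freedom are computed; P values would come from the t distribution.

```python
import math
import statistics


def log2_transform(values, pseudocount=1.0):
    # The pseudocount avoids log2(0); it is an assumption, not from the source.
    return [math.log2(v + pseudocount) for v in values]


def paired_t(tumor, normal):
    """Paired t statistic and degrees of freedom for matched samples."""
    if len(tumor) != len(normal):
        raise ValueError("paired samples must have equal length")
    diffs = [t - n for t, n in zip(tumor, normal)]
    n_pairs = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n_pairs)
    return statistics.mean(diffs) / standard_error, n_pairs - 1


def two_sample_t(group_a, group_b):
    """Pooled-variance (Student) t statistic and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / standard_error, n_a + n_b - 2


# Hypothetical read counts for three tumor/normal pairs (not TCGA data).
tumor_log2 = log2_transform([8.0, 16.0, 32.0], pseudocount=0.0)
normal_log2 = log2_transform([4.0, 8.0, 8.0], pseudocount=0.0)
paired_result = paired_t(tumor_log2, normal_log2)
unpaired_result = two_sample_t(tumor_log2, normal_log2)
```

The paired test uses only the within-pair differences, which is why the same six values can give a larger t statistic in the paired analysis than in the two-sample analysis.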

Statistical analysis

Genetic association analysis was conducted using logistic regression models. For the association between each single locus and GC risk, we assumed an additive genetic model. For GC risk-associated variants, we estimated the cumulative effect based on the number of risk alleles. We included the WGS in the logistic regression model both as a continuous variable and as a categorical variable. For the Nanjing and the Beijing GWAS, we adjusted for age, sex, smoking, drinking status, and principal components from principal component analysis (PCA) to account for population stratification; for the NCI GWAS, we adjusted for age, sex, and principal components. Meta-analysis was used to combine results from the three GWAS, and Cochran’s Q was used to test for heterogeneity. A fixed-effect model was applied to estimate the combined effect, and the analysis was repeated with a random-effects model if I2 (calculated as 100% × (Q − (n − 1))/Q, where n is the number of studies) was > 75%. Differential expression analysis was performed using two-sample t tests or paired t tests. Analyses were performed with Stata version 11 or R version 3.2.1, unless otherwise noted.
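The meta-analysis step can be sketched as an inverse-variance fixed-effect combination with Cochran’s Q and I2. The sketch below is written in Python for illustration and is not the Stata or R code used in the study; the function name and the per-study estimates are hypothetical, and truncating I2 at zero when Q is smaller than its degrees of freedom is a common convention assumed here rather than stated in the text. The random-effects model is not shown.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance fixed-effect meta-analysis of per-study log odds ratios."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I2 = 100% x (Q - (n - 1)) / Q, truncated at 0 (assumed convention).
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    return {
        "beta": pooled_beta,
        "se": pooled_se,
        "odds_ratio": math.exp(pooled_beta),
        "q": q,
        "i2": i2,
    }


# Hypothetical log odds ratios and standard errors from two studies.
example_meta = fixed_effect_meta([0.1, 0.3], [0.1, 0.1])
```

Under the rule described in the text, a result with I2 above 75% would be re-estimated with a random-effects model.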

Results

Association between single variant and GC risk

As shown in Table 1, among the 21 known genetic variants associated with EC risk, three were significantly associated with GC risk. Consistent with the previous report [10], the G allele of rs2274223 at 10q23.33 (reported gene: PLCE1) was associated with increased GC risk (OR = 1.26, 95% confidence interval (CI): 1.16–1.38, P = 6.51 × 10−8). Because the PLCE1 locus was identified as a common locus between ESCC and GC in the previous GWAS using NCI samples, we reanalyzed the association between rs2274223 and GC risk after excluding the NCI samples and found that the direction of the association was consistent but the association was not significant (per G allele: OR = 1.07, 95% CI: 0.94–1.23, P = 0.31). In addition, rs10052657 at 5q11.2 (reported gene: PDE4D) and rs671 at 12q24.12 (reported gene: ALDH2) were also associated with GC risk (OR = 1.12, 95% CI: 1.01–1.25, P = 3.28 × 10−2; OR = 0.83, 95% CI: 0.75–0.91, P = 1.14 × 10−4, respectively). Nevertheless, there was substantial heterogeneity among the three studies for all three identified SNPs, and none remained significant when a random-effects model was used in the meta-analysis. We did not find significant associations with GC risk for the remaining 18 variants.

Table 1 Associations of 21 known EC risk variants with GC risk

We further examined the cumulative effect of these three variants (rs2274223, rs10052657, and rs671) on GC risk (Table 2). We found a strong tendency of increased GC risk with greater numbers of risk alleles (OR = 1.31, 95% CI: 1.19–1.44, P = 2.34 × 10−8).

Table 2 Combined analysis of association between three identified genetic variants and risk of GC

Association of EC WGS with GC risk

Because the observed effects of rs671 on ESCC and GC were in opposite directions, we derived the WGS based on the reported effect sizes of the 20 variants (excluding rs671) from the original EC studies (Supplementary table 1) and evaluated the association between EC WGS and risk of GC (Table 3 and Table 4). We found that the EC WGS was significantly associated with increased risk of GC (OR = 1.15, 95% CI: 1.06–1.25, P = 1.20 × 10−3 for continuous WGS and OR = 1.08, 95% CI: 1.03–1.13, P = 9.11 × 10−4 for trend test for WGS categories). This association was observed mainly for the ESCC WGS (OR = 1.16, 95% CI: 1.07–1.27, P = 5.52 × 10−4 for continuous WGS and OR = 1.09, 95% CI: 1.04–1.14, P = 2.71 × 10−4 for trend test for WGS categories) and not for the EAC WGS (OR = 1.02, 95% CI: 0.92–1.13, P = 0.66 for continuous WGS and OR = 0.99, 95% CI: 0.95–1.04, P = 0.70 for trend test for WGS categories).

Table 3 Association between WGS (as a continuous variable) and risk of GC
Table 4 Association between WGS category and risk of GC

Differential expression analysis of candidate genes

We further analyzed whether the expression levels of the genes associated with those three variants (rs2274223, rs10052657, and rs671) were altered in cancer tissues compared with normal tissues of esophagus and stomach (Supplementary figure 1). We found that PDE4D and ALDH2 were downregulated in both ESCC and GC tissues as compared with normal tissues. However, we did not observe differential expression for PLCE1 in either ESCC or GC tissues.

Discussion

In the current study, we investigated whether the known EC risk loci were associated with GC risk using 2631 GC cases and 4373 controls of Chinese ancestry. We found that the G allele of rs2274223, C allele of rs10052657, and G allele of rs671 were associated with increased risk of GC, and higher WGS of ESCC was associated with increased risk of GC.

PLCE1, located on chromosome 10q23, encodes a phospholipase that catalyzes the hydrolysis of phosphatidylinositol-4,5-bisphosphate to generate two second messengers: inositol 1,4,5-trisphosphate and diacylglycerol [22]. It also interacts with small monomeric GTPases of the Ras and Rho families and with heterotrimeric G proteins [23]. Thus, PLCE1 regulates various processes affecting cell growth, survival, differentiation, gene expression, and oncogenesis. Several studies found that the missense variant rs2274223 in PLCE1 was significantly associated with ESCC [24] and gastric cardia cancer [25], but not EAC [26], which is consistent with our findings.

Rs10052657 was another locus significantly associated with GC. However, no additional study has reported an association between rs10052657 and the risk of GC or EC. Rs10052657 is located in intron 5 of PDE4D, a gene whose product hydrolyzes the second messenger cAMP (cyclic adenosine monophosphate) and acts in signal transduction in multiple cell types. Previous studies have found that genetic variants in PDE4D are associated with risk of several cancers, including breast cancer [27]. PDE4D has also been reported to be a driver gene in cancer development and to be involved in cancer progression by accelerating proliferation [28,29,30]. PDE4D was overexpressed in prostate cancer [30, 31], whereas some PDE4D isoforms were downregulated [32, 33]. In our study, we observed downregulation of PDE4D in both ESCC and GC tissues. These findings support the biological plausibility that genetic variants in PDE4D may alter the risk of ESCC and GC, and the underlying mechanism may involve different PDE4D isoforms, as reported in studies of prostate cancer.

Unlike rs10052657, whose C allele increased the risk of both cancers, the A allele of rs671 increased the risk of ESCC but was protective against GC. The SNP rs671 is located in the twelfth exon of ALDH2 at 12q24.12. ALDH2 belongs to the aldehyde dehydrogenase family and participates in the alcohol metabolism pathway. Several studies have shown that ALDH2 is associated with susceptibility to cancers including GC [34, 35], head and neck cancer [36], and colorectal cancer [37]. Rs671 has been reported to be associated with ESCC [38, 39] and overall EC [40], but the conclusions were inconsistent. Rs671 was also reported to influence GC risk, although the direction of those findings was opposite to ours [41, 42]. In those studies, alcohol consumption and rs671 were considered simultaneously when evaluating their associations with GC risk. Recently, one study reported that rs671 may not increase gastric cardia adenocarcinoma (GCA) susceptibility in Chinese Han populations, but that the proportion of carriers of the ALDH2 mutated allele in GCA high-incidence areas was lower than that in low-incidence areas [43]. This evidence suggests that the mutated allele may have a role in decreasing GC risk. In addition, subjects with mutated ALDH2 exhibited a lower level of alcohol consumption than wild-type ALDH2 carriers [44]. Alcohol consumption is an established risk factor for cancers [45, 46], and a healthy lifestyle that includes limiting alcohol intake is beneficial for cancer prevention. Taken together, the effect of rs671 on GC might depend on alcohol consumption, and the A allele may shift from a protective to a hazardous role in GC in subjects with high alcohol intake.

In the WGS-based analysis, we observed a significant association between GC risk and the WGS of ESCC, but not that of EAC, which suggests that ESCC and GC may share a common genetic background. ESCC is the main histological type of EC in Asian countries, whereas the incidence of EAC now exceeds that of ESCC in European and American countries. There are great differences between the two histological types of EC in pathophysiology and pathogenesis [4]. In addition, most ESCC-associated loci were discovered in participants of Asian ancestry. Considering the prevalence of ESCC and GC in China and the well-known shared genetic variant rs2274223, other genetic loci may also participate in the occurrence of both cancers. Therefore, it is reasonable to look for a shared genetic background between ESCC and GC. The lack of association between EAC and GC may be explained by population heterogeneity, as EAC-associated loci were reported in populations of European ancestry. Moreover, only a few EAC risk loci were reported and used in the current study, so the EAC WGS might be less representative of the genetic risk of EAC.

There are some limitations in our study. First, although the association between the ESCC WGS and GC risk was significant, there was substantial heterogeneity among the three studies for the three identified SNPs. The associations weakened when this heterogeneity was taken into account and should be interpreted with caution. Second, we did not conduct subgroup analyses by cardia and non-cardia GC, which limited our ability to assess whether the impact of these genetic factors differs by tumor subtype. Third, there are considerable differences between ESCC and GC, including environmental exposures and patient characteristics, which may have introduced bias into our results. Therefore, our results are preliminary and should be further evaluated in future studies.

In summary, we evaluated the genetic association between EC and GC and found shared genetic susceptibility between ESCC and GC. Future studies with larger sample sizes and multiple populations are needed to clarify the relationship between the genetic bases of EC and GC.