Genome-wide association meta-analysis identifies novel GP2 gene risk variants for pancreatic cancer in the Japanese population

The etiology of pancreatic cancer remains largely unknown. Here, we report the results of a meta-analysis of three genome-wide association studies (GWASs) comprising 2,039 pancreatic cancer cases and 32,592 controls, the largest such study in the Japanese population to date. We identified 3 genome-wide significant loci (13q12.2, 13q22.1, and 16p12.3; P < 5.0 × 10⁻⁸) and 4 suggestive loci (P < 1.0 × 10⁻⁶) for pancreatic cancer. Of these risk loci, 16p12.3 is novel. Its lead SNP, rs78193826 (odds ratio (OR) = 1.46, 95% CI = 1.29-1.66, P = 4.28 × 10⁻⁹), is an Asian-specific, nonsynonymous variant of the glycoprotein 2 (GP2) gene that is predicted to be highly deleterious. Additionally, the gene-based GWAS identified a novel gene, KRT8, which has been linked to exocrine pancreatic and liver diseases. The identified GP2 gene variants were pleiotropic, showing associations with multiple traits, including type 2 diabetes (T2D), hemoglobin A1c (HbA1c) levels, and pancreatic cancer. Mendelian randomization analyses supported a causal relationship between HbA1c levels and pancreatic cancer. These findings suggest that GP2 gene variants are associated with pancreatic cancer susceptibility in the Japanese population and warrant further functional characterization of this locus.


GWAS to populations of non-European ancestry because of differences in minor allele frequencies (MAFs) and patterns of linkage disequilibrium (LD) across diverse populations 7.
In fact, previous GWASs focusing exclusively on populations of East Asian ancestry have identified new susceptibility loci for breast and colorectal cancers 8,9. The majority of the risk loci for pancreatic cancer were discovered in the PanScan GWASs, which included populations of European ancestry. Only two pancreatic cancer GWASs have been conducted in East Asian populations: one in China 10 and one in Japan 11. Together, these studies identified 8 risk loci for pancreatic cancer (5 genome-wide significant loci and 3 loci with suggestive evidence of association), but these loci were not replicated in a previous study using samples from European populations 12. Therefore, the role of common susceptibility loci in East Asian populations remains uncertain and requires further exploration. To detect additional susceptibility loci for pancreatic cancer, we conducted a new GWAS in the Japanese population and then performed a meta-analysis combining all published and unpublished GWAS data in Japan.
The lead SNP maps to rs78193826, a nonsynonymous variant of the GP2 gene (Figure 2). … Table 4). LD maps of these 10 SNPs at 16p12.3 are shown in Supplementary Figure 3.
Complete LD was observed among 9 of these SNPs (all except rs4420538) in the Japanese population. Among the 10 SNPs in this region, only rs4383153 had association summary statistics available in the previous PanScan publications 2,13, but this SNP was not significantly associated with pancreatic cancer risk (Supplementary Table 5). The functional annotation results for the 10 SNPs at 16p12.3 are shown in Supplementary Table 4. The lead SNP rs78193826 was classified as "damaging" by the Sorting Intolerant From Tolerant (SIFT) algorithm and as "possibly damaging" by Polymorphism Phenotyping v2 (PolyPhen-2). Moreover, its combined annotation-dependent depletion (CADD) score was 20.3. For replication, we selected 4 SNPs (rs78193826, rs73541251, rs117267808, and rs4632135) that met either of the following criteria: (1) exonic SNP or (2) intronic SNP with a RegulomeDB score of 3 or less.
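CADD reports scores on a PHRED-like scale, PHRED = -10 × log10(fraction), where the fraction is the proportion of all possible single-nucleotide variants in the reference genome that score at least as high. A score of 10 therefore marks the top 10% and a score of 20 the top 1%. The sketch below assumes only this scaling (the function name is ours, for illustration) and converts the reported score of 20.3 into that fraction.

```python
def cadd_phred_to_top_fraction(phred: float) -> float:
    """Invert CADD's PHRED-like scaling, phred = -10 * log10(fraction),
    to recover the fraction of all possible reference SNVs scoring >= phred."""
    return 10 ** (-phred / 10)


# Reported CADD score for rs78193826
fraction = cadd_phred_to_top_fraction(20.3)
print(f"CADD 20.3 falls within the top {fraction:.2%} of possible SNVs")
```

Under this scaling, a score of 20.3 places rs78193826 at roughly the top 0.93% of all possible substitutions, which is in line with the SIFT and PolyPhen-2 predictions.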
Analysis of an independent replication cohort comprising 507 cases and 879 controls showed that rs4632135 (an intronic variant) was nominally associated with pancreatic cancer risk (P < 0.05), whereas the other 3 SNPs did not reach nominal significance (Table 2). However, the direction and magnitude of the effects for all 4 SNPs were consistent with those observed in the GWAS meta-analysis. Furthermore, for each SNP, the combined analysis of the three GWASs and the replication dataset yielded a lower P value than the GWAS meta-analysis alone (Table 2), suggesting that the novel risk locus at 16p12.3 discovered in our GWAS meta-analysis is unlikely to be a false positive.
To examine whether the associations of T2D and T2D-related quantitative traits with pancreatic cancer are consistent with a causal effect, we performed a Mendelian randomization (MR) analysis using the inverse variance-weighted (IVW) and MR-Egger methods. No significant association was observed between genetically predicted T2D and pancreatic cancer with the IVW method (Figure 3a). Instead, genetically increased …

To complement the SNP-based GWAS, we performed a gene-based GWAS using MAGMA 18 (Supplementary Figure 5). We confirmed the significant associations for GP2 and WNT2B identified by the SNP-based GWAS. Notably, we observed a novel significant association for KRT8 (Bonferroni-corrected threshold, P < 2.84 × 10⁻⁶; Supplementary Table 10 and Supplementary Figures 5 and 6), and this association was further replicated in the PanScan 1 and PanScan 2 datasets (P = 0.024) 13.
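A Bonferroni threshold is the family-wise error rate divided by the number of tests. The number of genes tested is not stated in this section, so the sketch below back-calculates it under the assumption of a family-wise alpha of 0.05. Both helper names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


def implied_number_of_tests(alpha: float, threshold: float) -> float:
    """Back-calculate how many tests a Bonferroni threshold corresponds to."""
    return alpha / threshold


# Assumption: family-wise alpha = 0.05
n_genes = implied_number_of_tests(0.05, 2.84e-6)
print(f"~{n_genes:,.0f} genes tested")
print(bonferroni_threshold(0.05, round(n_genes)))
```

Under this assumption, the reported threshold corresponds to roughly 17,600 genes. Because the threshold is given to three significant figures, this count is only approximate.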

Discussion
The role of inherited common genetic variation in pancreatic cancer susceptibility remains incompletely understood. By combining three GWAS datasets from the Japanese population, we identified and replicated a novel risk locus for pancreatic cancer at 16p12.3.
Furthermore, we provided evidence that the identification of this new locus may be attributable to differences across ethnic populations in the MAF of the lead SNP (rs78193826) and in the LD structure of this region.

Little overlap has been observed between the risk loci reported in previous Chinese or Japanese GWASs and those reported in the PanScan GWASs 2. By including more than twice as many cases as previous Japanese or Chinese GWASs, as well as imputed SNP data, we replicated the majority of the significant risk loci discovered in the PanScan GWASs (Supplementary Table 6). Moreover, for most variants, the direction and magnitude of the effects in our GWAS meta-analysis of Japanese subjects were consistent with those in populations of European ancestry. These findings suggest that GWAS-identified causal variants at many loci are shared across ancestral groups, and that the earlier lack of replication may reflect the insufficient sample sizes of previous Chinese or Japanese GWASs.
Several lines of evidence indicate that rs78193826 is most likely a causal variant at 16p12.3, which harbors the GP2 gene. First, this variant is nonsynonymous: the C-to-T substitution changes valine to methionine, which could affect protein structure and function. Second, functional annotations from several databases consistently predict this variant to be deleterious. Third, the differences in the MAF of rs78193826 and in the LD structure across ethnic populations provide indirect evidence supporting its role as a causal variant in the Japanese population.
The frequency of the minor T allele of rs78193826 is 0.1% in populations of European ancestry but 7% in the Japanese population, and it ranges from 3.9% to 6.6% in other Asian populations. Given this marked difference in MAF, rs78193826 is likely to be a causal variant for pancreatic cancer that is specific to Asian populations. However, trans-ethnic replication and fine mapping are necessary to establish the causal role of this variant.
Genetic variations in the GP2 gene have been linked to several phenotypes in addition to pancreatic cancer. The SNP rs12597579, located upstream of the GP2 gene, has been associated with body mass index (BMI) in a GWAS including East Asians 19. However, rs12597579 was not in LD with rs78193826 (r² = 0.003, calculated from Japanese samples in the 1000 Genomes Project Phase 3), suggesting that rs12597579 may have functions different from those of rs78193826. Notably, the lead GP2 variant (rs117267808) identified in the latest GWAS meta-analysis of T2D in the Japanese population was also identified in our GWAS meta-analysis of pancreatic cancer (Supplementary Table 7). Of the 82 T2D-related SNPs, 5 showed nominally significant associations (P < 0.05) with pancreatic cancer, suggesting that pancreatic cancer and T2D may share specific genetic susceptibility factors. Furthermore, the risk alleles of rs78193826 and rs117267808 were identical for pancreatic cancer and T2D. Together, these findings indicate that GP2 variants may exert pleiotropic effects on multiple traits.
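The r² quoted above is the standard squared allelic correlation between two SNPs: r² = D² / [pA(1 - pA)pB(1 - pB)], where D = pAB - pA·pB. The sketch below computes it from phased haplotypes coded 0/1. It illustrates the definition only and is not the tool used to obtain the reported value. The haplotypes in the examples are made up.

```python
def ld_r2(haplotypes):
    """Squared allelic correlation r^2 between two biallelic SNPs.

    `haplotypes` is a list of phased haplotypes, each a tuple
    (allele_at_snp1, allele_at_snp2) with alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n  # freq of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n  # freq of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # freq of haplotype 1-1
    d = p_ab - p_a * p_b  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Made-up examples: perfect LD vs. independent SNPs
print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))  # 1.0
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # 0.0
```

A value near 0, such as the reported 0.003, means that genotype at one SNP carries almost no information about genotype at the other. This is why the two GP2 signals are treated as distinct.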
The newly identified lead SNP (rs78193826) lies in the gene encoding GP2, a protein present on the inner surface of zymogen granules in pancreatic acinar cells 20 … However, accumulating evidence has demonstrated effects of GP2 on the innate immune response 28,29. GP2 is also expressed in the membranous (M) cells of the intestinal epithelium in humans and mice, where it acts as an uptake receptor for a subset of commensal and pathogenic bacteria 30. For example, GP2 and its closest homolog, uromodulin, have been shown to bind Escherichia coli (E. coli) expressing type 1 fimbriae 31. Moreover, uromodulin-null mice showed increased susceptibility to urinary tract infections 24. These findings suggest that GP2 may also play a role in host defense in the pancreas, given that Proteobacteria have been detected in pancreatic ductal adenocarcinoma samples as well as in the normal human pancreas 32.
Previous epidemiological studies have suggested that HbA1c levels, even within the nondiabetic range, or changes in HbA1c levels in new-onset T2D are associated with pancreatic cancer risk 33,34. Our MR analysis provided corroborating evidence that genetically increased HbA1c levels may be causally associated with pancreatic cancer risk. This result was also partially consistent with a previous MR analysis in which T2D was not causally implicated, but BMI and fasting insulin were causally associated with pancreatic cancer 35.
In conclusion, our GWAS meta-analysis identified a novel risk locus for pancreatic cancer at chromosome 16p12.3, which harbors the GP2 gene, in the Japanese population.
Further fine mapping and functional characterization are required to elucidate the effects of common GP2 gene variants on pancreatic cancer susceptibility. Moreover, our findings highlight genetic susceptibility factors shared between T2D and pancreatic cancer.

Study samples
We performed a GWAS meta-analysis based on three Japanese studies (JaPAN, BBJ, and NCC). … All studies were imputed using the 1000 Genomes Project Phase 3 reference panel. … Table 2). This project was approved by the ethics committee of the NCC.

Quality control after genotype imputation
After genotype imputation, quality control was applied to each study. SNPs with an imputation quality of r² < 0.5 or a MAF of < 0.01 were excluded. SNPs that passed quality control in at least two cohorts were included in the meta-analysis.
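The per-cohort filters and the two-cohort rule can be expressed as a short filter. This is a minimal sketch that assumes each cohort supplies an imputation r² and a MAF for every imputed SNP. The record type, SNP identifiers, and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ImputedSNP:
    snp_id: str
    info_r2: float  # imputation quality (r^2) reported by the imputation software
    maf: float      # minor allele frequency within this cohort


def passes_qc(snp: ImputedSNP, min_r2: float = 0.5, min_maf: float = 0.01) -> bool:
    """Keep a SNP unless r^2 < 0.5 or MAF < 0.01 (the exclusion rules above)."""
    return snp.info_r2 >= min_r2 and snp.maf >= min_maf


def snps_for_meta(cohorts: dict, min_cohorts: int = 2) -> set:
    """Return IDs of SNPs that pass QC in at least `min_cohorts` cohorts.
    Assumes each SNP appears at most once per cohort."""
    counts = {}
    for snps in cohorts.values():
        for snp in snps:
            if passes_qc(snp):
                counts[snp.snp_id] = counts.get(snp.snp_id, 0) + 1
    return {sid for sid, n in counts.items() if n >= min_cohorts}


# Hypothetical per-cohort QC summaries
cohorts = {
    "JaPAN": [ImputedSNP("snpA", 0.90, 0.05), ImputedSNP("snpB", 0.40, 0.20)],
    "BBJ": [ImputedSNP("snpA", 0.80, 0.05), ImputedSNP("snpB", 0.90, 0.20),
            ImputedSNP("snpC", 0.95, 0.005)],
    "NCC": [ImputedSNP("snpC", 0.90, 0.02)],
}
print(snps_for_meta(cohorts))  # {'snpA'}
```

In this toy input, snpB and snpC each pass QC in only one cohort. Only snpA therefore enters the meta-analysis.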

Association analysis for SNPs and pancreatic cancer
The association between SNP allele dosage and pancreatic cancer was tested using logistic regression analysis with adjustment for the top 2 principal components. Other known risk factors, such as age, sex, and cigarette smoking, were not included as covariates because covariate adjustment has been shown to substantially reduce the power to identify disease-associated variants when disease prevalence is less than 2% 41.
The effect sizes and standard errors were used in the subsequent meta-analysis.
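The per-study model regresses case status on allele dosage plus two principal components and reports the per-allele log odds ratio and its standard error. The sketch below fits that model with NumPy using iteratively reweighted least squares (Newton-Raphson), which is a standard way to fit logistic regression. It is not necessarily the software used in the study. The simulated data and the helper names are hypothetical.

```python
import numpy as np


def logistic_irls(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by Newton-Raphson / IRLS.
    X: (n, k) design matrix including an intercept column. Returns (beta, se)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * w[:, None])         # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the Fisher information at the final estimate
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def snp_association(dosage, case, pcs):
    """Per-allele log-OR (and SE) for dosage, adjusted for principal components."""
    X = np.column_stack([np.ones_like(dosage), dosage, pcs])
    beta, se = logistic_irls(X, case)
    return beta[1], se[1]


# Simulated (hypothetical) data: true per-allele log-OR = 0.4
rng = np.random.default_rng(0)
n = 20_000
dosage = rng.binomial(2, 0.3, size=n).astype(float)
pcs = rng.normal(size=(n, 2))
eta = -1.0 + 0.4 * dosage + 0.2 * pcs[:, 0] - 0.1 * pcs[:, 1]
case = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)
b, se = snp_association(dosage, case, pcs)
print(f"log-OR = {b:.3f}, SE = {se:.3f}, OR = {np.exp(b):.2f}")
```

Imputed dosages are continuous values between 0 and 2, so the design matrix takes dosage as a float rather than a hard genotype call.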

Meta-analysis
We performed a meta-analysis of three pancreatic cancer GWASs (JaPAN, BBJ and NCC).
The association results for each SNP across the studies were combined with METAL software using a fixed-effects inverse variance-weighted meta-analysis. Heterogeneity in allelic effects was assessed using the I² index. The meta-analysis included 7,914,378 SNPs with genotype data available from at least two cohorts. A P value threshold of 5 × 10⁻⁸ was used to define genome-wide significance. We assessed the inflation of test statistics using the genomic control inflation factor (λGC).
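The three statistics named above can be written out in a few lines. In METAL, inverse variance weighting corresponds to the STDERR scheme. The sketch below re-expresses the formulas in plain Python and is not METAL itself. It pools per-study log odds ratios with weights 1/SE², derives I² from Cochran's Q, and computes λGC as the median observed 1-df chi-square divided by its null median (about 0.4549). All input values are hypothetical.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364


def ivw_fixed_effects(betas, ses):
    """Fixed-effects inverse variance-weighted meta-analysis of per-study
    log odds ratios. Returns (pooled beta, pooled SE, two-sided P, I^2)."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    # Cochran's Q and the I^2 heterogeneity index (truncated at 0)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, p, i2


def genomic_control_lambda(z_scores):
    """lambda_GC: median observed 1-df chi-square divided by its null median."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


# Hypothetical per-study log-ORs and SEs for one SNP (three cohorts)
betas = [0.35, 0.42, 0.38]
ses = [0.10, 0.12, 0.09]
beta, se, p, i2 = ivw_fixed_effects(betas, ses)
print(f"OR = {math.exp(beta):.2f}, SE = {se:.3f}, P = {p:.2e}, I2 = {i2:.0%}")
```

Because the pooled weight is the sum of the per-study weights, adding a further dataset with a consistent effect direction shrinks the pooled SE. This is why the combined GWAS-plus-replication analysis can yield lower P values even when a replication cohort alone is not significant.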

Replication analysis
The replication cohort comprised 507 cases and 879 controls who were recruited under the same framework as the multi-institutional case-control study of the JaPAN consortium.

Functional annotations
To prioritize the associated SNPs at the novel loci, we adopted a series of bioinformatic approaches to collate functional annotations. We first used ANNOVAR 42 to obtain an aggregate set of functional annotations (including gene location and the predicted impact of amino acid substitutions from tools such as SIFT, PolyPhen-2, and CADD) for SNPs associated with pancreatic cancer at P < 5 × 10⁻⁸. We also explored potential effects on gene regulation by annotating these SNPs using the RegulomeDB database 43.

MR analysis
We performed MR analyses using independent, genome-wide significant T2D-associated or HbA1c-associated SNPs, obtained from two published GWAS meta-analyses in Japanese subjects 16,17, as instrumental variables. For the two-sample MR analysis of T2D and pancreatic cancer, we did not exclude overlapping samples (15.5%, all of which were controls) because retaining these samples was unlikely to introduce substantial bias 44. For the HbA1c GWAS, 106 pancreatic cancer cases were excluded, and the effect sizes for the HbA1c-associated SNPs were re-estimated. After excluding 6 SNPs on the X chromosome (5 for T2D and 1 for HbA1c) and one T2D SNP without genotype data (rs35678078), we analyzed the summary data for 82 T2D-related SNPs and 25 HbA1c-related SNPs, together with their associations with pancreatic cancer risk, using the IVW and MR-Egger regression methods. MR analysis was performed with the MendelianRandomization package 45.
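Both MR estimators work on per-SNP summary statistics. Here bx is the SNP-exposure effect, by is the SNP-outcome effect, and se_by is its standard error. IVW is a weighted regression of by on bx through the origin. MR-Egger adds an intercept after orienting each SNP so that bx is positive, and the intercept estimates average directional pleiotropy. The sketch below is a NumPy re-expression of these textbook formulas, not the MendelianRandomization package. It returns a fixed-effect IVW SE (the package may instead use a multiplicative random-effects model) and omits Egger SEs. The summary statistics are hypothetical.

```python
import numpy as np


def mr_ivw(bx, by, se_by):
    """IVW causal estimate from per-SNP associations
    (first-order weights 1/se_by^2, fixed-effect SE)."""
    w = 1.0 / se_by ** 2
    estimate = np.sum(w * bx * by) / np.sum(w * bx ** 2)
    se = np.sqrt(1.0 / np.sum(w * bx ** 2))
    return estimate, se


def mr_egger(bx, by, se_by):
    """MR-Egger point estimates: weighted regression of by on bx with an
    intercept, after orienting every SNP so that its bx is positive.
    Returns (intercept, slope); the intercept estimates directional pleiotropy."""
    sign = np.where(bx < 0, -1.0, 1.0)
    bx_o, by_o = bx * sign, by * sign
    w = 1.0 / se_by ** 2
    X = np.column_stack([np.ones_like(bx_o), bx_o])
    xtw = X.T * w  # X^T W, with W = diag(w)
    intercept, slope = np.linalg.solve(xtw @ X, xtw @ by_o)
    return intercept, slope


# Hypothetical summary statistics: 4 SNPs with a true causal slope of 0.5
# plus a constant pleiotropic shift of 0.02 on the (oriented) outcome.
bx = np.array([0.10, -0.20, 0.30, 0.40])
by = np.sign(bx) * 0.02 + 0.5 * bx
se_by = np.array([0.05, 0.05, 0.06, 0.04])

ivw_est, ivw_se = mr_ivw(bx, by, se_by)
egger_intercept, egger_slope = mr_egger(bx, by, se_by)
print(f"IVW: {ivw_est:.3f} (SE {ivw_se:.3f})")
print(f"MR-Egger: intercept {egger_intercept:.3f}, slope {egger_slope:.3f}")
```

With the pleiotropic shift in this toy example, the IVW estimate is biased above 0.5. MR-Egger recovers the slope of 0.5 and attributes the shift to its intercept. This contrast is why the two methods are reported together.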

Gene-based analysis
SNP-based P values were combined into gene-based P values using MAGMA software version 1.06 18. SNP summary statistics (P values) from the meta-analysis were used as input for MAGMA. In the gene-based association tests, LD between SNPs was accounted for, and the SNPs located between the first and last exons of a gene were used to calculate the gene-based P value. … HbA1c-associated SNPs.

Figure 3b (axes: beta for HbA1c; beta for pancreatic cancer)