Common germline variants within the CDKN2A/2B region affect risk of pancreatic neuroendocrine tumors

Pancreatic neuroendocrine tumors (PNETs) are heterogeneous neoplasms which represent only 2% of all pancreatic neoplasms by incidence, but 10% by prevalence. Genetic risk factors could have an important role in the disease aetiology; however, only a small number of case-control studies have been performed so far. To further our knowledge, we genotyped 13 SNPs belonging to the pleiotropic CDKN2A/B gene region in 320 PNET cases and 4,436 controls, the largest study on the disease so far. We observed a statistically significant association between homozygosity for the minor allele of the rs2518719 SNP and an increased risk of developing PNET (ORhom = 2.08, 95% CI 1.05–4.11, p = 0.035). This SNP is in linkage disequilibrium with another polymorphic variant associated with increased risk of several cancer types. In silico analysis suggested that the SNP could alter the sequence recognized by the Neuron-Restrictive Silencer Factor (NRSF), whose deregulation has been associated with the development of several tumors. The mechanistic link between the allele and the disease has not yet been completely clarified, but the epidemiological evidence linking this DNA region to increased cancer risk is convincing. In conclusion, our results suggest rs2518719 as a pleiotropic CDKN2A variant associated with the risk of developing PNETs.


PNETs represent only 2% of all pancreatic neoplasms by incidence, but 10% by prevalence 2,5 . PNETs have been poorly studied because of their rarity, and compared with other cancer types very little is known about environmental or genetic risk factors for their occurrence. The role of traditional cancer risk factors such as smoking and alcohol consumption remains controversial [6][7][8] . In contrast, type 2 diabetes (T2D) and family history of cancer have been consistently associated with PNET risk 7,8 .
Therefore genetic risk factors may have a role in disease aetiology, at least in a subset of PNET cases. Despite this, the impact of genetic variability on disease incidence is poorly understood. Only a small number of case-control studies have been performed to uncover the genetic susceptibility to PNETs [9][10][11][12] and no genome-wide association study (GWAS) has been performed yet.
With the goal of furthering our knowledge of PNET susceptibility, we performed a case-control study considering the genetic variability of the CDKN2A/2B region. The selection of the region was motivated by a fairly large body of evidence pointing to a key role of this locus in pancreatic cancer onset and prognosis. For example, CDKN2A is commonly mutated or deregulated in both endocrine and exocrine pancreatic cancer 13 . Genetic polymorphisms in the locus have been reported to be associated with type 2 diabetes mellitus (T2DM), which is one of the few suggested risk factors for PNETs [14][15][16] , suggesting a shared genetic background between T2DM and PNETs. Genetic variants belonging to the CDKN2A/2B region have been identified through GWAS as susceptibility markers for several human traits and diseases, including a large number of tumor types [17][18][19][20][21][22][23][24] . In addition, we have recently shown the association of the CDKN2A/2B-rs3217992 SNP with increased risk of pancreatic ductal adenocarcinoma (PDAC) 25 . The pleiotropic role of this region is consistent with its crucial role in the regulation of the cell cycle 26,27 . Finally, in a manuscript investigating the genetic susceptibility to neuroendocrine tumors (NETs), Ter-Minassian and colleagues suggested an association between four SNPs in this region (representing two independent signals because of high linkage disequilibrium (LD)) and an increased risk of the disease 11 .
Our hypothesis was that common genetic variability at the locus could modulate the risk of developing PNETs, as has been shown for other cancer types.

Results
Data filtering and quality control. The origin of the population by country is shown in Table 1. None of the SNPs were out of Hardy-Weinberg equilibrium (HWE) in controls (p > 0.05). A total of 311 subjects (17 PNET cases and 294 controls) were removed after genotyping because they had a call rate < 75%. After the removal of these subjects the average SNP call rate was 95.54% with a minimum of 89.79% (rs3731246) and a maximum of 99.15% (rs3218009). The quality control analysis showed a concordance rate of 99.68% between the duplicate samples. After exclusions, 320 cases and 4,436 controls were used for statistical analyses.

SNPs main effect.
We observed a statistically significant association between homozygosity for the minor A allele of the rs2518719 SNP and an increased risk of developing PNET (ORhom = 2.08, 95% CI 1.05-4.11, p = 0.035). The association was statistically significant only when comparing rare and common homozygous individuals. None of the other SNPs showed a statistically significant association. The frequencies and distributions of the genotypes, the odds ratios (ORs) for the association of each polymorphism with PNET risk and the corresponding confidence intervals (CIs) are shown in Table 2.
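As a consistency check on the reported estimate, the following minimal sketch back-calculates the standard error of the log odds ratio from the 95% CI and derives the corresponding Wald z statistic and two-sided p value. It assumes the CI was computed as a symmetric Wald interval on the log scale, which the text does not state explicitly; the function name `wald_p_from_ci` is illustrative only.

```python
import math
from statistics import NormalDist


def wald_p_from_ci(odds_ratio, ci_low, ci_high, level=0.95):
    """Back-calculate SE(log OR), Wald z and two-sided p from an OR and its CI.

    Assumption: the interval is log(OR) +/- z_crit * SE (a Wald interval).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
    # The width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = 2 * (1 - norm.cdf(abs(z)))
    return se, z, p


# Values reported for rs2518719 (rare vs. common homozygotes)
se, z, p = wald_p_from_ci(2.08, 1.05, 4.11)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, two-sided p = {p:.3f}")
```

Under this assumption the recovered p value rounds to the reported p = 0.035, so the odds ratio, the interval and the p value are mutually consistent.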
Possible functional effects. We used several bioinformatic tools to predict possible functional relevance for the SNPs showing the most significant associations. RegulomeDB showed a score of 4 for rs2518719, suggesting the presence of a transcription factor binding motif and a DNase sensitivity peak. HaploReg also suggested the presence of a DNase sensitivity peak and indicated that the polymorphism may alter the sequence recognized by the Neuron-Restrictive Silencer Factor (NRSF) regulatory repressor. No significant association between rs2518719 and expression of any gene is reported in the GTEx project. We used the SNAP software to find SNPs in LD with rs2518719 and found 9 variants with an LD of at least 0.760 (rs2188127, rs3731222, rs3731217, rs3731204, rs3731198, rs2811711, rs495490, rs575427, rs647188), but for these variants too there was no evidence of association with gene expression in GTEx.

Discussion
The background of common genetic susceptibility to sporadic PNETs is largely unknown. The rarity of the disease is certainly one reason for the scarcity of information on its genetic susceptibility. Only a small number of studies have been performed, the largest having 101 cases and 432 controls 11 . To further our understanding of the topic we conducted the largest study on the disease, with up to 320 cases and 4,436 controls, taking advantage of the framework of the Pancreatic Disease Research (PANDoRA) Consortium.
The only finding of potential significance was that homozygous carriers of the rare A allele of the rs2518719 SNP had an increased risk of developing the disease. This SNP lies in the second intron of CDKN2A, around 2,000 bp from the start of the 3′ UTR. Rs2518719 is in tight LD with another variant in the gene, rs3731217 (r² = 0.925, D' = 1 in Caucasians, as reported by 1000 Genomes), which lies in the first intron of the gene. This latter SNP is a well-known pleiotropic susceptibility polymorphism and has been found to be associated with increased risk of developing childhood acute lymphoblastic leukemia 28,29 , differentiated thyroid carcinoma 30 and salivary gland carcinoma 31 . In addition, the SNP was also reported to modulate the survival of oropharyngeal cancer patients 32 . Despite all this evidence indicating a role of the variant allele in the development of various diseases, no functional studies have been performed and it is therefore not possible to establish a direct effect of the SNP. It has been suggested that rs3731217 might be involved in the regulation of p53 gene expression, but given the aforementioned lack of direct functional evidence this remains highly speculative 32 . The observation that the same allele is consistently associated with increased risk across different cancer types indicates a pleiotropic role for rs2518719/rs3731217 (or one of the other variants in tight LD) and also strongly suggests that the causal variant alters the function of the protein, the regulation of gene expression or both, in a way that influences the chances of developing cancer in different organs. The 9p21.3 locus in general, and the CDKN2A/CDKN2B genes in particular, are classic examples of pleiotropic regions, since they are associated with a very large number of human traits and diseases [17][18][19][22][23][24] .
Pleiotropic regions are probably more accessible DNA stretches than average, and variability within them is therefore more likely to be non-neutral than variability in a randomly selected DNA sequence. However, the regulation of pleiotropic regions is likely to be more complex than that of other parts of the genome, which makes it harder to understand the effect of genetic variation at each single locus.
The results from HaploReg suggested that rs2518719 could alter the sequence recognized by the NRSF regulatory repressor. NRSF is encoded by the RE1-Silencing Transcription factor (REST) gene, and its deregulation has been associated with the development of several tumors, including colon and lung cancer 33 . Therefore a possible explanation for the pleiotropic effect of this SNP may lie in the regulatory effect of the sequence recognized by NRSF/REST. However, intriguing as it is, this functional explanation also remains speculative, especially considering that NRSF is primarily involved in the silencing of neural genes in non-neuronal tissues 34 . The lack of reliable functional data and eQTLs for the SNP may be explained by the fact that the entire CDKN2A/2B region is under very complex gene regulation.
The results of Ter-Minassian and colleagues, in their study on neuroendocrine tumors, agree with ours: they also suggest an increased risk, specifically of PNET, for individuals carrying the rare allele of rs2518719 and of the linked rs3731217 and rs3731198 11 . In that paper the authors also observed an independent signal from the rs3731211 variant. Considering our sample size and their ORs, we had a power greater than 99% to detect that association, but we did not. Considering also the p value reported (p = 0.042), the most likely explanation for this discrepancy is that the association was due to statistical fluctuation related to the rarity of the disease.
After correction for multiple testing, the association we observed for rs2518719 is not statistically significant; however, considering the concordance with previous reports and the low statistical power imposed by the rarity of the disease, a Bonferroni correction is arguably too strict. We therefore also used the False Positive Report Probability (FPRP 35 ); with a prior of 0.25 the association retains noteworthiness (posterior p = 0.188). We chose a prior of 0.25 because the polymorphism is pleiotropic, or in almost complete LD with a pleiotropic variant, and because it has already been found to be associated with PNET risk. We are aware that the FPRP results are only indicative and that final confirmation of the association can only come from functional studies.
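The FPRP reasoning can be illustrated with a short sketch. It uses the general form FPRP = α(1 − π) / (α(1 − π) + (1 − β)π), where α is the observed p value, π the prior probability and 1 − β the power to reach that significance level under a chosen alternative odds ratio. The alternative OR of 2.0 and the normal approximation for power are assumptions made here for illustration; the text does not state which alternative was used. The standard error is recovered from the reported 95% CI, assuming a Wald interval.

```python
import math
from statistics import NormalDist


def fprp(p_value, prior, odds_ratio_alt, se_log_or):
    """False Positive Report Probability for an observed two-sided p value.

    odds_ratio_alt is the OR assumed under the alternative hypothesis
    (an illustrative choice, not a value taken from the study).
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - p_value / 2)        # critical value matching the observed p
    shift = math.log(odds_ratio_alt) / se_log_or   # expected z statistic under the alternative
    # Two-sided power to reach the observed significance level under the alternative.
    power = norm.cdf(shift - z_alpha) + norm.cdf(-shift - z_alpha)
    false_positive = p_value * (1 - prior)
    return false_positive / (false_positive + power * prior)


# SE of log(OR) recovered from the reported 95% CI (1.05-4.11)
se = (math.log(4.11) - math.log(1.05)) / (2 * NormalDist().inv_cdf(0.975))
value = fprp(p_value=0.035, prior=0.25, odds_ratio_alt=2.0, se_log_or=se)
print(f"FPRP = {value:.3f}")
```

With these assumed inputs the result lands close to the reported value of 0.188. A lower prior or a smaller alternative OR would give a higher FPRP, which is why the choice of prior is justified explicitly in the text.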
The present study has some limitations, such as limited clinical information on the sporadic PNET patients in terms of environmental and familial risk factors and disease stage and grade. However, our results, taken together with the findings of Ter-Minassian and colleagues, suggest that the pleiotropic CDKN2A region is associated with the risk of developing PNETs, as already observed for several other cancer types.

Materials and Methods
Study population. In the present study 320 sporadic PNET patients and 4,436 controls belonging to the Pancreatic Disease Research (PANDoRA) consortium were recruited in four European countries. Cases were sporadic, i.e. not observed in the context of genetic syndromes associated with PNET, such as MEN-1, MEN-2, VHL or TSC. Controls were recruited in the same hospitals as the cases, or at least in the same geographical region. All participants signed a written consent form. The study protocol was approved by the ethics board of the University of Heidelberg and was carried out according to the Declaration of Helsinki. Additional information on the PANDoRA consortium has been given elsewhere 36 .
SNPs selection. We investigated the common genetic variability in the CDKN2A/B region using tagging SNPs and potentially functional SNPs. Tagging SNPs were selected using a pairwise tagging method with a minimum r² of 0.8 and a minimum minor allele frequency of 0.05. We identified 11 SNPs that efficiently tagged the selected gene region. Subsequently we added two putative miR-SNPs (rs1063192 and rs3217992), i.e. polymorphic variants predicted to alter the binding of one or more microRNAs to their target. The final selection therefore comprised 13 SNPs.
Sample preparation and genotyping. For each sample DNA was extracted from whole blood using the AllPrep Isolation Kit (Qiagen, Hilden, Germany) or the Qiagen-mini kit (Qiagen, Hilden, Germany), according to the manufacturer's protocol. Blood was kept frozen before the extraction. Genotyping was performed using KASP (KBioscience, Hoddesdon, UK) and TaqMan (Thermo Fisher Scientific, Waltham, MA, USA) technologies. Genotyping was carried out in 384-well plates using 5 ng of DNA for each sample. The order of DNA samples was randomized on plates to ensure that similar numbers of cases and controls were analyzed in each batch. Detection was performed using an ABI PRISM Viia7 sequence detection system with Viia7 software (Applied Biosystems, Foster City, CA, USA). The personnel performing the genotyping were blinded to the status of the subject (i.e. whether the DNA belonged to a case or a control subject). For quality control, duplicates of 10% of the samples were interspersed throughout the plates. In addition, we discarded all samples that had a call rate < 75%.
Statistical analysis. Using Pearson's chi-square test we checked for departure from Hardy-Weinberg equilibrium (HWE) for all SNPs in the control subjects of the study. Unconditional logistic regression, computing odds ratios (ORs), 95% confidence intervals (95% CIs) and p values, was used to estimate the association between the genotypes of all polymorphisms and PNET risk. The more common allele among the controls was assigned as the reference category and the co-dominant inheritance model was assessed. All analyses were adjusted for age, gender and geographic origin.
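The HWE check described above can be sketched as follows. This is a minimal illustration of Pearson's chi-square test with one degree of freedom for a biallelic SNP, not the software actually used in the study; the function name `hwe_chi_square` and the genotype counts in the example are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.

    Arguments are the observed counts of the three genotypes of a biallelic SNP.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1 - p
    if p <= 0 or q <= 0:
        raise ValueError("SNP is monomorphic; the HWE test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control genotype counts, not taken from the study
chi2, p_value = hwe_chi_square(3200, 1100, 100)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A SNP would be flagged as deviating from HWE when the p value falls below the chosen threshold (0.05 in this study).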
Multiple testing. We used two methods to correct for multiple testing: a robust conservative test and a Bayesian one. The threshold to declare an association significant with a Bonferroni correction is 0.0038 (0.05/13). Considering the extensive prior knowledge of the region, and of the SNPs in particular, we opted to also use the False Positive Report Probability (FPRP) method. The FPRP was developed by Wacholder and colleagues 35 to assess whether an association is 'noteworthy', using a Bayesian approach that includes a priori knowledge of the variable under consideration. For associations with moderate to high prior evidence (e.g. association reported in a previous study, convincing functional evidence) the prior probabilities used are in the range 0.10-0.25, whereas lower prior probabilities are employed with decreasing information on the SNP and/or the relation between the SNP and the disease 35,37 .
Bioinformatic analysis. We used several bioinformatic tools to assess the possible functional relevance of the SNP showing the most significant association with risk of developing PNET. RegulomeDB (http://regulome.stanford.edu/) 38 and HaploReg v2B 39 were used to identify the regulatory potential of the region near each SNP. The GTEx portal web site 40 was used to identify potential associations between the SNP and expression levels of nearby genes (eQTL). In addition we used the SNAP software 41 to find SNPs in LD with the SNP that showed the strongest association with PNET risk, using a threshold of r² = 0.70.
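For readers less familiar with the LD measures used for tagging and for the SNAP query, the sketch below computes D, D' and r² for two biallelic loci from haplotype and allele frequencies. It only illustrates the standard definitions and is not the SNAP implementation; the function name `ld_stats` and the frequencies in the example are hypothetical and were chosen to give a case with D' = 1 and r² just above 0.9.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: every haplotype carrying A also carries B
d, d_prime, r2 = ld_stats(p_ab=0.10, p_a=0.10, p_b=0.107)
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r2 = {r2:.3f}")
print("passes r2 >= 0.70 threshold:", r2 >= 0.70)
```

The example shows why D' can equal 1 while r² stays below 1: D' reaches 1 whenever one of the four haplotypes is absent, whereas r² equals 1 only when the two alleles also have identical frequencies.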