Lack of association of CD44-rs353630 and CHI3L2-rs684559 with pancreatic ductal adenocarcinoma survival

Although pancreatic ductal adenocarcinoma (PDAC) survival is poor, there are differences in patients’ response to the treatments. Detection of predictive biomarkers explaining these differences is of the utmost importance. In a recent study two genetic markers (CD44-rs353630 and CHI3L2-rs684559) were reported to be associated with survival after PDAC resection. We attempted to replicate the associations in 1856 PDAC patients (685 resected with stage I/II) from the PANcreatic Disease ReseArch (PANDoRA) consortium. We also analysed the combined effect of the two genotypes in order to compare our results with what was previously reported. Additional stratified analyses considering TNM stage of the disease and whether the patients received surgery were also performed. We observed no statistically significant associations, except for the heterozygous carriers of CD44-rs353630, who were associated with worse OS (HR = 5.01; 95% CI 1.58–15.88; p = 0.006) among patients with stage I disease. This association is in the opposite direction of those reported previously, suggesting that data obtained in such small subgroups are hardly replicable and should be considered cautiously. The two polymorphisms combined did not show any statistically significant association. Our results suggest that the effect of CD44-rs353630 and CHI3L2-rs684559 cannot be generalized to all PDAC patients.

considering TNM stage of the disease and whether the patients received surgery were also performed. We observed no statistically significant associations, except for the heterozygous carriers of CD44-rs353630, who were associated with worse OS (HR = 5.01; 95% CI 1.58-15.88; p = 0.006) among patients with stage I disease. This association is in the opposite direction of those reported previously, suggesting that data obtained in such small subgroups are hardly replicable and should be considered cautiously. The two polymorphisms combined did not show any statistically significant association. Our results suggest that the effect of CD44-rs353630 and CHI3L2-rs684559 cannot be generalized to all PDAC patients.
Pancreatic ductal adenocarcinoma (PDAC) is a lethal tumour type, with an increasing incidence over the past decades and a five year survival rate still as low as 9% 1 . PDAC is, therefore, projected to be the second cause of cancer-related deaths in the USA by 2030 2 with a similar figure in most Western countries. One of the reasons of this meager survival is the lack of specific symptoms and of biological markers for early diagnosis and risk stratification. Moreover, treatment options are poor, and surgery in the setting of multimodal treatments is the only possible cure. However, there are marked differences in patients' responses to the treatments, which cannot be explained by the traditional prognostic factors such as tumour size, lymph node involvement and distal metastasis 3 . Thus, investigation of biomarkers able to predict the tumour behaviour is of the utmost importance.
Genetic variability accounts for a large part of the risk to develop PDAC 4 . Several germline mutations, associated with well described hereditary syndromes, increase the risk of developing PDAC and subjects carrying such mutations are offered surveillance in the context of research protocols 5 . Besides these high penetrance germline mutations, there are overwhelming evidences on the role of germline genetic polymorphisms in the development of the disease, identified through genome-wide association studies (GWAS) or through large scale case-control studies [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21] . However, the possible association between genetic variants and patients' outcome in PDAC, both in terms of survival and response to treatments, has been poorly investigated with limited success [22][23][24][25][26][27][28][29] . The genetic contribution to PDAC survival has, however, been studied with GWAS. For example, a small-scale GWAS done on 252 PDAC cases, with subsequent validation done on 261 and 572 sets of patients, respectively, reported an association between a single nucleotide polymorphism (SNP) on chromosome 12 and overall survival (OS) of PDAC cases (p = 1.72 × 10 −7 in the combined dataset) 27 . Innocenti et al. reported on a GWAS on OS of 351 pancreatic cancer patients treated with gemcitabine in the context of a randomized clinical trial. This study showed an association of a SNP in IL17F with OS (p = 9.51 × 10 −7 ) 29 . Wu et al. performed the largest GWAS on OS to date in 1005 PDAC cases (10), of whom 642 were of European descent (from prospective cohort studies), used as discovery phase, and 363 retrospectively collected cases of Chinese descent used for replication 25 . In the first stage 131 SNPs at 28 loci showed association with OS of the PDAC cases (p < 10 −5 ). Combining the discovery and the replication phases, a locus on chromosome 11 near the SBF2 gene was the most significantly associated with OS of PDAC patients (p = 1.72 × 10 −7 ). It is worth noticing that none of the findings of the above mentioned GWAS reached genome-wide significance level, conventionally set at p < 5 × 10 −8 , and that only a few loci were successfully replicated across multiple reports. In one of the few studies of this kind, we genotyped 44 SNPs proposed by the GWAS by Wu et al. in one of the largest study so far consisting in 1722 PDAC cases, in the context of the PANDoRA consortium 25 . We validated associations of one SNP in the CTNNA2 gene (rs1567532) and one SNP in the RUNX2 gene (rs12209785) with OS 26 . The lack of replicability of the PDAC survival loci is in striking contrast with the situation observed for GWAS-identified risk loci, which are generally confirmed in multiple independent studies, to some extent even in populations of diverse ethnic background.
Recently, Dimitrakopoulos and colleagues carried out a genome-wide screening of functional variants aimed at identifying novel markers associated with PDAC survival after pancreatic resection 30 . The authors used a dense SNP array and investigated more than 2 million SNPs in 331 PDAC patients in a two-phase approach. The main findings consisted in association of CD44-rs353630 and CHI3L2-rs684559 with PDAC survival. The diplotype combination of the two alleles of the two SNPs associated with better survival (C and G, respectively) reached genome-wide significance (HR = 0.38, 95%CI 0.27-0.53, p = 1.00 × 10 −8 ) in the merged population consisting of the two phases.
One of the reasons behind the lack of success of studies investigating genetic variants as biomarkers of disease outcome is that most studies so far were underpowered and therefore prone to statistical fluctuation, and/or that a proper replication phase in an independent adequately sized population was lacking. Considering the capricious nature of association studies, we aimed at validating the findings by Dimitrakopoulos and colleagues in a large cohort of 1856 PDAC cases among which 685 resected patients with tumour stage I or II from the PANcreatic Disease ReseArch (PANDoRA) consortium.

Results
Population description and data filtering. The population investigated in the present study consisted overall of 1856 PDAC patients: 105 with stage I, 717 with stage II, 348 with stage III and 686 with stage IV at the time of diagnosis. The median age was 66 and both sexes were represented with a slight majority of males (56%). 685 patients with stage I or II were operated with curative intent. A description of the population is shown in Table 1.
The genotyping call rate was 95% for CD44-rs353630 and 96% for CHI3L2-rs684559, the concordance between the duplicated samples was higher than 99%. All the SNPs resulted to be in HWE equilibrium with non-significant p values (p > 0.05) when considering all the individuals together or dividing them by country of origin. The minor frequency (MAF) observed in PANDoRA is 25 Survival analysis. Analysing the whole PDAC case population and adjusting for age, sex and stage we observed no statistically significant association between the two SNPs under investigation and OS of PDAC patients, with the best finding consisting in a non-significant trend for the C/C genotype of the CD44-rs353630 of having a worse survival compared with the T/T homozygous (HR = 1.28, 95% CI 0.98-1.68, p = 0.074). We also performed stratified analysis by disease stage and when analysing the individuals that presented stage I disease at diagnosis we observed a statistically significant association for the heterozygous C/T carriers of CD44-rs353630 and worse OS compared with the reference (HR = 5.01; 95% CI 1.58-15.88; p = 0.006). The analysis considering only the 685 patients who received surgery did not show any statistically significant associations. We also performed an analysis considering the combination of the two markers and we observed no statistically significant result. The frequencies and distribution of the genotypes, the HR and 95% CI for the association with PDAC survival are shown in Table 2. A Kaplan-Meier curve with the survival of resected PDAC patients diagnosed in stage I or II (p = 0.64), according to the combined genotypes of the two SNPs is showed in Fig. 1. We performed additional analyses stratifying by country of origin. We did not observe any statistically significant associations, with the exception of a longer OS for patients from Hungary diagnosed with stage I or II PDAC and heterozygous for the CHI3L2-rs684559 SNP, in comparison with the homozygotes for the common allele (HR = 0.19; 95% CI 0.05-0.71; p = 0.014).
In silico functional characterization. The GTEx portal identified only one eQTL for CD44-rs353630: the weak (p = 0.00025) association between the A allele and a decreased expression of the CD44 gene in the tibial nerve. GTEx predicts that CHI3L2-rs684559 is an eQTLs for the CHI3L2, DENND2D, CHIAP2 and CHIA genes across seven tissues, although none of them in the pancreas (p-values ranging from 0.000026 to 3.7 × 10 −20 ). For CD44-rs353630 RegulomeDB showed a rank of 4 that is the third lowest and indicates that the polymorphism may be situated in a possible transcription factor binding site. On the other hand, for CHI3L2-rs684559 Regu-lomeDB showed a rank of 1b indicating a very high degree of supporting data in the database and high probability of functional relevance. Haploreg showed no SNPs in LD with CD44-rs353630 and no eQTLs, while six SNPs (rs2764546, rs12048900, rs12024633, rs2494004, rs1325284, rs2494006) in LD with CHI3L2-rs68455 and 14 eQTLs in lymphoblastoid cell lines and in the whole blood.

Discussion
In the last decades genetic epidemiological approaches, especially those performed through genome-wide association studies in the context of multicentric studies, have identified thousands of SNPs and have shed light on the biology of many cancer types. As far as regards PDAC, around 30 risk loci showed a clear association with the development of the disease. However, differently from other cancer types such as breast, lung, prostate, colorectal cancers, and multiple myeloma [31][32][33][34][35][36][37] , the involvement of genetic variability in PDAC survival has been investigated with limited success. The most reliable factors in determining PDAC patient survival are, therefore, clinical parameters such as TNM stage, tumour grade, margins of resection and pre-and postoperative levels of C-reactive protein (CRP) and CA-19-9 38 . Unfortunately, these factors have limited accuracy in predicting the outcome of PDAC patients and their response to treatments. Therefore, the identification of additional biological markers would be of the utmost importance as a basis to elaborate novel decisional algorithms aimed at optimizing treatment and improving patients' outcomes. Additionally, discovering new PDAC survival genetic loci may give insight to the molecular mechanisms of survival and suggest, directly or indirectly, new therapeutic targets. Several SNPs have been proposed to be possible prognostic markers in PDAC patients, given the association with survival. However, with only a few exceptions, none has been consistently replicated in following studies [25][26][27][28][29] . Recently two novel candidates, identified through a GWAS approach, have been proposed. A total of 331 PDAC patients were investigated by means of a dense SNP array, encompassing more than 2 million SNPs, finding an www.nature.com/scientificreports/ association between CD44-rs353630 and CHI3L2-rs684559 and PDAC survival in resected patients of whom 90.9% with stage I or II. The diplotype combination of these two alleles was, indeed, associated with longer survival after surgery 30 . We genotyped these two SNPs and analysed them, separately or in combination, in a large set of PDAC cases belonging to the PANDoRA consortium. Considering the paucity of evidences on markers related to PDAC survival this attempt to generalize the results of the two SNPs was necessary. We had more than 99% power to replicate the associations, but we observed none when analysing the whole population or when considering only the resected patients. The most clinically relevant result of the study by Dimitrakopoulos and colleagues is the association of the two combined polymorphisms with better survival in 331 patients resected with intent of radicality. Availability of such a biomarker might be extremely relevant as it would help selection of patients to consider for upfront surgery or neoadjuvant chemotherapy, thus contributing to a personalized medicine approach for PDAC. However, in our subgroup analysis on 685 patients resected with stage I or II disease (therefore with a much larger sample size and a power to detect the association greater than 99%) the same SNP combination did not show any statistically significant association (the best association we observed had a p = 0.48). Further stratifying the population by stage, we observed a statistically significant association between CD44-rs353630 C/T individuals and worse OS among PDAC cases diagnosed with stage I. The estimate is very high (HR > 5) and it goes in the opposite direction of the ones reported by Dimitrakopoulos, which all showed association of the effect alleles/genotypes with better OS. These findings arise from an analysis conducted in a very small subgroup of individuals and only comparing heterozygous with the reference homozygous (N = 34 vs. 13, respectively) and therefore must be taken with caution. The analysis conducted to assess the functional relevance of the SNPs clearly shows that for CD44-rs353630 there is little or no evidence of its involvement in gene expression, while CHI3L2-rs684559 is possibly an eQTL with regulatory potential in several tissues, but not in the pancreas. It is therefore difficult to link the SNPs to a functional role in pancreatic cancer progression or outcome.
Analyses stratified by country of origin did not show anything remarkable, with the exception of a weak association with longer OS in Hungarian patients diagnosed in stage I or II for heterozygous for the CHI3L2-rs684559 SNP. We note that this association is not significant if we consider a significance threshold adjusted for multiple testing (the observed association had p = 0.014, but analyses stratified by 10 countries of origin have an adjusted threshold of significance of p = 0.05/10 = 0.005). Considering also the relatively small number of cases used for this analysis (18 patients diagnosed with stage I and 16 in stage II), and the difficulty of explaining why the polymorphism should be associated with OS only in Hungarian patients, we are inclined to think that this result is due to statistical fluctuation and does not reflect a real association. A possible limitation of the present study is the lack of information on chemotherapeutic regimens administered to the patients, either before or after surgery. This prevents us to explore association between the genetic variability and drug response that may help personalization of treatments. In addition, the included patients were not consecutively enrolled in each center, and that could add a potential bias to the findings. We believe, however that given the samples size this bias should be diluted and unlikely to be responsible for the lack of statistically significant association that we observed.

Material and methods
Study population. The PANDoRA consortium consists of a multicentric study that includes 12 European countries (Italy, Germany, Poland, United Kingdom, Hungary, Czech Republic, Greece, Lithuania, the Netherlands, Denmark, Ukraine, Romania) and Japan and has been described in detail elsewhere 9 .
Briefly, all cases included in the consortium population are defined by a confirmed diagnosis of pancreatic cancer and were collected with a non-consecutive sampling approach in 21 different centers, including departments of surgery, endoscopy and gastroenterology. In addition to pancreatic cancer cases, controls have been selected among the general population, blood donors and among hospitalized subjects with different diagnosis excluding cancer. In the present study, 1856 PDAC cases of European descent were included, for whom information on country of origin, sex, age at diagnosis, overall survival (OS) and disease stage assessed by TNM stage was available. Disease stage was defined according to American Joint Committee on Cancer TNM system 7th version 39 , and patients were categorized as stage I (T1-2, N0, M0), stage II (T3, N0, M0/T1-3, N1, M0), stage III (T4, any N, M0), stage IV (any T, any N, M1). The patients that underwent surgery were categorized by the stage at the time of operation.
All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Commission of the Medical Faculty of the University of Heidelberg (project identification code: S-565/2015). A description of the population is shown in Table 1.
SNPs selection, sample preparation and genotyping. We selected two germline variants, CD44-rs353630 and CHI3L2-rs684559 that were proposed recently by Dimitrakopoulos and colleagues as associated with survival of PDAC patients that received surgery. For each subject DNA was extracted from whole blood Table 2. Survival analysis of CD44-rs353630 and CHI3L2-rs684559. NA, not applicable; HR, Hazard Ratio, CI, confidence interval. Statistically significant results (p < 0.05) are in bold. Numbers may not add up to the total number of subjects in the respective stratum due to missing data. a The analyses were adjusted by tumour stage (stages I through IV). b The analyses were adjusted by tumour stage, country of origin, sex and age. c The analyses were adjusted by, country of origin, sex and age. d The combined analysys was performed using the rs353630 T/T and rs684559 A/A genotypes as reference. Statistical analysis. Deviation from Hardy-Weinberg equilibrium (HWE) was tested for both polymorphisms in the overall population and dividing for country of origin using the Pearson corrected chi square test. Survival analysis was conducted by Cox proportional hazard models with OS as endpoint, computing hazard ratios (HR) and 95% confidence intervals (CI), using an additive and a codominant inheritance models and setting the same alleles used by Dimitrakopoulos and colleagues as the reference category (T for CD44-rs353630 and A for CHI3L2-rs684559). In addition, considering that the most statistically significant association observed by Dimitrakopoulos was the one combining the two genotypes, we also performed this analysis defining two groups: the individuals with the CD44-rs353630 T/T and/or CHI3L2-rs684559 A/A as the reference category and all the other genotype combinations as the other group. For this analysis we considered only individuals with 100% call rate. All analyses were adjusted for sex, age and TNM stage. We also performed an analysis adjusting only for disease stage, since Dimitrakopoulos and colleagues did not adjust for sex and age. As the vast majority (90.9%) of the patients in the study by Dimitrakopoulos and colleagues were resected with stage I or II, stratified analysis was conducted considering patients showing TNM stage I, TNM stage II, and TNM stage I and II together. We also performed a stratified analysis considering TNM stage I and II for patients who received surgery, as well as by country of origin. The probability of survival was calculated by means of a Kaplan-Meier curve.

Functional analysis using bioinformatic tools.
To test the possible functional relevance of CD44-rs353630 and CHI3L2-rs684559 we used several bioinformatic tools. In particular, the Genotype-Tissue Expression (GTEx) project portal (https:// gtexp ortal. org/ home/) was used to identify potential cis-acting expression quantitative trait loci (eQTLs) in order to establish whether the two SNPs could be involved in the expression of nearby genes (accession date 8th May 2020) 40 . We analysed all the tissues present in the database. We have used Regulome DB 2.0 (https:// www. regul omedb. org/ regul ome-search/) to evaluate the effect of the SNPs with regulatory regions of the human non-coding genome 41 . Regulome DB 2.0 assigns to each SNP a rank consisting of 14 steps, that reflect the volume of the supporting data on the functional relevance of the SNP, going from 6 (lowest amount of evidence) to 1a (highest amount of evidence). Finally, we used Haploreg v4.1 (https:// pubs. broad insti tute. org/ mamma ls/ haplo reg/ haplo reg. php) to link the SNPs to functional annotations of the genome 42 . Haploreg is designed to explore the possible mechanistic interaction of the candidate SNPs (and all SNPs in LD with them) with regulatory motifs.

Conclusions
In conclusion, we attempted to replicate the results of a genome-wide investigation on genetic polymorphisms and survival of PDAC patients, with a well-powered study. Our findings suggest no involvement in the PANDoRA population of the two variants reported by Dimitrakopoulos and colleagues and highlight a lack of functional or regulatory role of CD44-rs353630 and CHI3L2-rs684559 in the pancreatic tissue. Our results strongly support the need to consider with caution findings on genetic variables as prognostic markers in PDAC, in the absence of replication in large and independent datasets or of compelling in vitro mechanisms to support the associations. Moreover, it would be desirable for future studies to have information on the therapy and to include this information in the analyses. This could make possible to evaluate the different genetic predisposition in drug response, and further our knowledge of this deadly disease.

Data availability
Owing to ethical and legal reasons, raw data of this work are not publicly deposited. The PANDoRA primary data for this work will be made available to researchers who submit a reasonable request to the corresponding author, conditional to approval by the PANDoRA Steering Committee and Ethics Commission of the Medical Faculty of the University of Heidelberg. Data will be stripped from all information allowing identification of study participants.