Genetic polymorphisms associated with telomere length and risk of developing myeloproliferative neoplasms

Telomere length measured in leukocyte (LTL) has been found to be associated with the risk of developing several cancer types, including myeloproliferative neoplasms (MPNs). LTL is genetically determined by, at least, 11 SNPs previously shown to influence LTL. Their combination in a score has been used as a genetic instrument to measure LTL and evaluate the causative association between LTL and the risk of several cancer types. We tested, for the first time, the “teloscore” in 480 MPN patients and 909 healthy controls in a European multi-center case–control study. We found an increased risk to develop MPNs with longer genetically determined telomeres (OR = 1.82, 95% CI 1.24–2.68, P = 2.21 × 10−3, comparing the highest with the lowest quintile of the teloscore distribution). Analyzing the SNPs individually we confirm the association between TERT-rs2736100-C allele and increased risk of developing MPNs and we report a novel association of the OBFC1-rs9420907-C variant with higher MPN risk (ORallelic = 1.43; 95% CI 1.15–1.77; P = 1.35 × 10−3). Consistently with the results obtained with the teloscore, both risk alleles are also associated with longer LTL. In conclusion, our results suggest that genetically determined longer telomeres could be a risk marker for MPN development.


Introduction
Myeloproliferative neoplasms (MPNs) are a group of hematopoietic stem-cell disorders, characterized by increased production of differentiated cells of the myeloid lineage [1][2][3] . The main types of MPNs are chronic myeloid leukemia (CML), characterized by the presence of the Philadelphia chromosome, and the JAK2/CALR/MPL mutation-related MPNs group, also called Philadelphia chromosome negative, including polycythemia vera (PV), essential thrombocythemia (ET), and primary myelofibrosis (PMF) 4 . The pooled worldwide crude rate incidence of MPNs is 2.58/100,000 per year 5 . MPNs are an elderly disease with a large increase in risk after the age of 60, and are more common in men than in women 3 . Several epidemiologic factors, such as body mass index (BMI), smoking behavior, alcohol consumption, benzene exposure, and physical activity have been associated with one or more specific subgroups of MPNs 1 .
Moreover, there are several evidences for a causative role of somatic mutations in MPN development, such as the V617F mutation in the JAK2 gene and mutations in CALR and MPL 6,7 . In addition, common low penetrance variants have been associated with the disease risk, in particular, seven single nucleotide polymorphisms (SNPs) have been identified through three genome-wide association studies (GWAS) 6,8,9 and, through case-control studies, additional hits such as the 46/1 JAK2 haplotype 10 and several SNPs belonging to genes involved in DNA repair, apoptosis, inflammation, and detoxification 11-16 . In addition, telomere length measured in leukocyte (LTL) has also been investigated as a marker of susceptibility for MPN. LTL is a good proxy for telomere length in other tissues and is relatively stable even across long periods of time 17,18 . LTL has been associated with risk of developing several neoplasms [18][19][20][21][22][23][24][25] .
Five studies have investigated the possible correlation between LTL and MPN risk and all have found a correlation between short telomeres and increased risk of development of the disease [26][27][28][29][30][31] . Even though the findings seem conclusive, all these studies may suffer from common pitfalls that have to be considered when analysing LTL in relation to neoplastic diseases. All the studies were retrospective and therefore could be influenced by reverse causation bias (i.e., the disease and/or therapies causing the decrease in LTL) as described in a review on LTL association studies 32 . In addition, molecular technologies such as FISH or qPCR are prone to measurement errors due to heterogeneity of biological sample collection, storage, and manipulation 33 . Finally, the studies had a small average sample size (130 cases and 155 controls) that could lead to an overinterpretation of the results.
Telomere length is, at least partially, genetically determined, and 11 SNPs have been identified, through GWAS, to affect LTL 23,34,35 . To overcome the biases of using the direct measure of LTL, the genetic score combination of these 11 SNPs has been successfully used as proxy of genetically determined LTL to investigate its association with various cancer types 23,[36][37][38][39][40][41][42][43] . Noteworthy, the associations between LTL and cancer risk have not consistently shown the same trend if the LTL was measured directly through qPCR or through the genetic score 22,23,36,44 . With these premises, we have tested the "teloscore" in 480 MPN cases and 909 controls in a European multi-center case-control study.

Study population
Biological samples were collected in four European countries (Hungary, Italy, Poland, and Spain) and included 480 cases and 909 controls. MPN cases were diagnosed according to the WHO 2008 criteria, and 149 CML, 173 ET, 36 PMF, and 122 PV have been collected. Controls were selected at the same centers where cases were collected and were frequency matched to the cases for age and sex. More in detail: in Spain, Poland and Italy all controls were healthy blood donors arising from the general population that declared to be free of neoplastic pathologies at the moment of the interview. In Hungary, they were recruited at the internal medicine department and were patients diagnosed with any diseases other than malignancies or immunopathological disorders (like arteriosclerosis, liver cirrhosis, myocardial events, stroke, metabolic diseases, nephroliths or colitis). The characteristics of the study population are summarized in Table 1. At diagnosis, all the information about the patients was collected from medical records, namely age, sex, country of origin, and V617F JAK2 mutation status. Following the guidelines of the Declaration of Helsinki, written informed consent was obtained from each participant.

Sample preparation and genotyping
DNA was extracted from whole blood using QIAampR 96 DNA QIAcubeR HT Kit (Qiagen, Hilden, Germany). Genotyping was performed in 384-well plates, using the TaqMan technology (ThermoFisher Applied Biosystems, Waltham MA, USA) according to the manufacturer's instructions. Each plate contained cases and controls, duplicated samples (8%) were used to check the quality of the genotyping, and negative controls were also added to each plate. Information about the case/control status of the samples was not available to the person performing the genotyping. PCR plates were read on a ViiA 7 Real-Time PCR System and genotypes were called with the QuantStudio software (ThermoFisher Applied Biosystems, Waltham MA, USA).

Teloscore computation
The "teloscore" was built weighing the effect of each SNP on LTL, based on the knowledge of the allele associated with LTL increase. The estimate of the per-allele effect on LTL for each SNP (expressed in base pairs) was taken from the literature 23 . The score was computed as follows: to each subject and for each genotype a value equal to 0, 1 or 2 was assigned, based on the number of alleles associated with longer telomeres. Then for each SNP in each subject, we multiplied the number of alleles associated with longer telomeres and the per-allele effect on LTL in base pairs. Finally, we summed up the estimated effect of each genotype for each study subject, obtaining a score for each subject. Only a subset of study subjects had a 100% genotyping call rate (330 cases and 687 controls). Therefore, a scaled teloscore was computed with all the subjects included in the study, by dividing the score by the number of genotypes available for each subject. We have not tested the teloscore in our data (i.e., the association between the SNPs and TL measured with RT-PCR), however, the estimate of the effect of each allele has been computed using thousands of individuals in the original studies and we consider it reliable. We will use the wording "genetically determined" telomere length to identify the telomere length predicted by the score throughout the manuscript. Supplementary Table 1 shows an example of how the teloscore was computed.

Statistical analysis
The association between each SNP and MPN risk was tested with unconditional logistic regression, computing odds ratios (ORs) and 95% confidence intervals (CI), adjusting for age, sex and country of origin. The analysis was performed with allelic (log-additive) and co-dominant models of inheritance, and, considering the Bonferroni correction for multiple testing, the threshold of statistical significance was P = 0.05/(11 SNPs × 2 models) = 0.05/22 = 0.0023.
For the teloscore analysis, cases were distributed in quintiles based on the distribution of values in controls. The analyses were performed with unconditional logistic regression computing ORs and 95% CIs, adjusting for age, sex, and country of origin.

Bioinformatic tools
Each SNP with statistically significant results in the risk association analysis was examined with bio-informatic tools to evaluate a plausible biologic mechanism to explain the association. First, RegulomeDB (https://www. regulomedb.org/regulome-search/) 46 was used to identify the regulatory potential of each SNP analyzed and the SNPs in LD with them. Then, SNPs with the most interesting regulatory potential were analyzed with the Genotype-Tissue Expression (GTEx) portal web site (https://www.gtexportal.org) to identify potential associations between the SNPs and gene expression (i.e., if the SNPs were expression quantitative trait loci, eQTL) 47 .

Results
All the selected SNPs were in Hardy-Weinberg equilibrium in the control population. From a total of 503 cases and 929 controls, we excluded 23 cases and 20 controls with a genotyping call rate lower than 72.7% (i.e. we kept samples with at least 8 called SNPs out of 11), leaving 480 cases and 909 controls to use in the statistical analysis with an average genotyping call rate of 97.0%. The concordance rate with duplicated samples was higher than 99%.

Association of the "teloscore" with MPN risk
We tested the association between the weighted teloscore and MPN risk, using the average scores computed for all the individuals. The observed trend was an increased risk of MPN with longer genetically determined telomere length, comparing individuals in the highest quintile with those in the lowest quintile (OR = 1.82; 95% CI 1.24-2.68; P = 2.21 × 10 −3 ) ( Table 3).
The same trend was observed in the teloscore analyses in the subset of samples with 100% genotyping call rate (330 cases and 687 controls), comparing individuals in the highest quintile with those in the lowest quintile (OR = 1.75; 95% CI 1.12-2.74; P = 0.014) ( Table 3).
TERT-rs2736100 is well-known in the literature to be associated with MPN risk 48 and it might have pleiotropic effects. We replicated the teloscore analysis without this polymorphism, using the other 10 SNPs to test if the association discovered was driven by the presence of TERT-rs2736100. The result observed was in agreement with the result obtained with 11 SNPs (Supplementary  Table 2).

Stratified analyses
Stratified analysis according to JAK2 mutation status was also performed for the teloscore and for each individual SNP. The results of the teloscore were in agreement with the analysis done regardless of JAK2 status, with genetically determined longer LTL associated with increased risk of developing MPNs, although the association reached statistical significance only in the JAK2+ stratum (Supplementary Table 3). The analysis of the individual SNPs showed two additional borderline associations for ZNF208-rs8105767 (p = 0.027) and for ZBTB46-rs755017 (p = 0.037) in the JAK2+ stratum and no additional hits (Supplementary Table 4).

Possible functional effect
We tested with bioinformatic tools the possible functional effect of the individual SNPs associated with risk of MPNs, and we found that RegulomeDB assigns to OBFC1-rs9420907 a rank of 3a that indicates that the region where the SNP lies is probably a transcription factor binding site in a DNase sensible region.

Discussion
Telomeres are structures with a continuous decrease of length in each cell replication, due to the inability to The p-value are bold statistically significant. a The unit for the analysis with the continuous variable was the increment of one quintile.
replicate the ends of the chromosome 49 . LTL has been extensively investigated in relation to cancer risk, with several studies suggesting an association between short LTL, as well as long LTL with increased risk and some studies suggesting a null association or a bimodal association [18][19][20][21][22][23][24][25]50 .
Five relatively small retrospective studies have been attempted to identify the relationship between the LTL and MPNs risk through FISH or qPCR. We used a different strategy and computed a genetic score as proxy for LTL and observed that genetically determined longer telomeres were significantly associated with an increased risk of developing MPNs. These results are in contrast with what was observed before i.e. increased risk of MPNs and short LTL. The discrepancies in the findings could be explained by the fact that the direct measure of LTL is plagued by several possible biases. For example, retrospective studies are often influenced by reverse causation bias, especially if chemotherapy is not taken in consideration when estimating the association. In addition, exposure to several risk factors, such as smoking and BMI, is associated with short LTL and tends to confound the association if not accounted for properly. Finally, the decrease of TL is highly correlated with the increase of age, and the measurement of LTL could be affected by bias due to the different age at blood drawing of the cases and controls included in the studies 32,33 . On the contrary, the genetic risk score has the advantage of being independent from study design and from the environmental factors known to be associated with a variation of LTL. The use of genetic proxies to understand the causative correlation between environmental risk factors and outcome has become more and more popular in the last years and it has been instrumental in identifying several of such associations 23,[36][37][38][39][40][41][42][43] . The sample size of our study is almost four times larger than the average size of the previous studies (480 vs 130) if considering only the cases and almost five times larger if considering cases and controls combined (1389 vs 285).
We have also analyzed the effect of each SNP and we replicated the well-known association between the TERT-rs2736100-C allele with an increased risk of developing MPNs 6,51-53 . The TERT-rs2736100-C allele has been consistently associated with longer LTL 21,35 , possibly through the regulation of the telomerase activity 48 . Another potentially interesting association was observed between the OBFC1-rs9420907-C allele and increased risk of MPN. The gene product of OBFC1 is a member of the CST complex, that is involved in telomere maintenance, and it is a member of the alpha accessory factor (AAF), that is involved in the initiation process of DNA replication 54 . It is interesting to note that also the OBFC1-rs9420907-C allele is associated with long LTL 45 . Regulome DB assigns to the SNP a rank of 3a indicating a mild possibility of this SNP to be involved in the regulation of nearby genes, however, without in vitro studies, it is not possible, with this level of evidence, to draw a definitive conclusion on the variant. The association of these two SNPs is consistent with what we observed for the score, i.e. the alleles associated with increased MPN risk are also associated with longer LTL. Moreover, an additional SNP PXK-rs6772228-T showed the same trend observed for TERT-rs2736100-C and OBFC1-rs9420904-C, with the T allele, known to be associated with longer LTL, correlated with a non-statistically significant increase of risk of developing MPNs.
A possible limitation of this study is that the controls we selected did not undergo screening to detect myeloidspecific clonal mutations and therefore they might present clonal hematopoiesis, which is a known precursor of hematopoietic malignancies 8,55,56 , however, even if some controls do have clonal hematopoiesis, this could only dilute the true associations between the exposure and the outcome, and not create spurious ones.
Telomere length measured through teloscore could be a possible marker of cells that have the innate tendency to have higher dividing potential. Longer TL may facilitate the acquisition and accumulation of new harmful potential mutations that could increase the risk of developing cancer disease 23 .
Taken together, all these evidences point towards an association between longer LTL and increased risk of developing MPNs despite what has been reported previously.