Germline polymorphisms in the Von Hippel-Lindau and Hypoxia-inducible factor 1-alpha genes, gene-environment and gene-gene interactions and renal cell cancer

We investigated the relationship between germline single nucleotide polymorphisms (SNPs) in Von Hippel-Lindau (VHL) and Hypoxia-inducible factor 1-alpha (HIF1A), and their gene-environment and gene-gene interactions, and clear-cell RCC (ccRCC) risk. Furthermore, we assessed the relationship between VHL SNPs and VHL promoter methylation. Three VHL polymorphisms and one HIF1A polymorphism were genotyped in the Netherlands Cohort Study. In 1986, 120,852 participants aged 55–69 completed a self-administered questionnaire on diet and lifestyle and toenail clippings were collected. Toenail DNA was genotyped using the Sequenom MassARRAY platform. After 20.3 years, 3004 subcohort members and 406 RCC cases, of which 263 ccRCC cases, were eligible for multivariate case-cohort analyses. VHL_rs779805 was associated with RCC (Hazard Ratio (HR) 1.53; 95% Confidence Interval (CI) 1.07–2.17) and ccRCC risk (HR 1.88; 95% CI 1.25–2.81). No associations were found for other SNPs. Potential gene-environment interactions were found between alcohol consumption and selected SNPs. However, none remained statistically significant after multiple comparison correction. No gene-gene interactions were observed between VHL and HIF1A. VHL promoter methylation was not associated with VHL SNPs. VHL SNPs may increase (cc)RCC susceptibility. No associations were found between gene-environment and gene-gene interactions and (cc)RCC risk and between VHL promoter methylation and VHL SNPs.

Main SNP effects. In both age- and sex-adjusted analyses and multivariable-adjusted analyses, an association with (cc)RCC risk was observed for VHL_rs779805, but not for VHL_rs1642739, VHL_rs265318 or HIF1A_rs2301111 (Table 2). In multivariable-adjusted analyses, individuals carrying the AG (vs. AA) genotype of VHL_rs779805 had a statistically significantly increased RCC risk (HR 1.32, 95% CI 1.06-1.66), as did individuals carrying the GG (vs. AA) genotype (HR 1.53, 95% CI 1.07-2.17). In addition, the per-allele test for trend was statistically significant (p = 0.004). In multivariable-adjusted analyses for ccRCC risk, the AG (vs. AA) genotype of VHL_rs779805 was associated with a statistically significantly increased ccRCC risk (HR 1.35, 95% CI 1.02-1.78), as was the GG (vs. AA) genotype (HR 1.88, 95% CI 1.25-2.81).
Gene-environment interactions. In multivariable-adjusted models for RCC risk, potential gene-environment interactions were observed between the VHL_rs1642739, VHL_rs779805 and HIF1A_rs2301111 SNPs and alcohol consumption (Table 3). A weak inverse association between alcohol consumption (per 5 g/day) and RCC risk was observed in participants carrying the rare genotype for VHL_rs1642739 and VHL_rs779805, but not in participants carrying the wild-type genotype. For carriers of the wild-type HIF1A_rs2301111 genotype, a weak inverse association was observed between alcohol consumption and RCC risk, but not for individuals carrying the rare genotype. No interaction was observed between any of the selected SNPs and self-reported hypertension (yes, no), smoking status (never, former, current) or BMI (per kg/m2) for RCC risk. For ccRCC, a potential interaction between VHL_rs779805 and alcohol consumption was observed. However, after correction for multiple comparisons using the adaptive Benjamini-Hochberg method, none of the potential gene-environment interactions maintained statistical significance 30 .
In sensitivity analyses, a potential gene-environment interaction was apparent between categorized alcohol consumption (0 g/d, 0.1-4 g/d, 5-14 g/d, 15-29 g/d and 30+ g/d) and VHL_rs1642739 status for ccRCC risk (p = 0.009; Supplementary Table 2). The direction of associations for VHL_rs779805 was similar to that in the main analyses using alcohol consumption (per 5 g/day). Sensitivity analyses between smoking status (ever/never), hypertension (no self-reported hypertension or no self-reported antihypertensive medication, hypertension with self-reported hypertensive medication) and BMI (<20 kg/m2, 20-<25 kg/m2, 25-<30 kg/m2 and 30+ kg/m2) and SNP status showed associations similar to those in the main gene-environment analyses (Supplementary Table 2). As in the main analyses, no interaction in the sensitivity analyses remained statistically significant after multiple comparison correction.
Gene-gene interactions. No gene-gene interactions were observed between the selected VHL SNPs and the HIF1A SNP.
Association between SNPs and VHL promoter methylation status. In total, information on VHL promoter methylation was available from 253 ccRCC cases. Among ccRCC cases, 19 (7.5%) participants had a methylated CpG island in the VHL promoter region, of which 13 had at least one mutant allele for the selected VHL SNPs (Supplementary Table 3). VHL promoter methylation was apparent in three, twelve and two participants for the rare genotype of VHL_rs1642739 (GG vs. GT + TT), VHL_rs779805 (AA vs. AG + GG) and VHL_rs265318 (AA vs. AC + CC), respectively. In multivariable-adjusted analyses, a non-significant inverse association was observed between both VHL_rs1642739 (HR 0.45, 95% CI 0.12-1.69) and VHL_rs265318 (HR 0.38, 95% CI 0.07-2.00) and VHL promoter methylation in ccRCC cases. No association was observed for the VHL_rs779805 SNP (HR 0.99, 95% CI 0.37-2.69).

Discussion
In this study, a statistically significantly increased RCC risk was found for individuals who carry genotypes with at least one variant allele for the VHL_rs779805 SNP. This association was especially pronounced for ccRCC risk. No association was found for VHL_rs1642739, VHL_rs265318 and HIF1A_rs2301111. After adjustment for multiple comparisons, no statistically significant gene-environment interactions were found between the selected SNPs and smoking, hypertension, BMI and alcohol for either RCC or ccRCC. No gene-gene interactions were found between the selected VHL SNPs and the HIF1A SNP.
Several studies have assessed the relationship between the VHL_rs779805 SNP and sporadic RCC 19-21 . Lv et al. found an association between the germline SNP VHL_rs779805 and RCC risk. Similarly, we found a statistically significant positive trend for the G allele and a positive association between the GG genotype for VHL_rs779805 and RCC risk 20 . The aforementioned studies did not report associations between VHL SNPs and ccRCC risk. In our study, rare VHL_rs779805 genotypes had a stronger association with ccRCC risk than with RCC risk. This might indicate that VHL polymorphisms lead to an increased susceptibility for ccRCC in particular. To our knowledge, no other study has investigated the relationship between VHL_rs1642739, VHL_rs265318 and HIF1A_rs2301111 and (cc)RCC risk. In this study, no association was found between (cc)RCC risk and VHL_rs1642739, VHL_rs265318 or HIF1A_rs2301111.
Multiple studies have assessed gene-environment interactions in RCC and ccRCC. RCC risk has been found to be associated with interactions between alcohol consumption and ADH7 26 ; sodium and hypertension and AGTR, AGT and ACE 31 ; calcium and vitamin D intake and RXRA 28 ; tobacco smoking and NAT2, CYP1A1 and GSTM1 25 ; and meat-cooking mutagens and ITPR2 and EPAS1 27 . To our knowledge, we are the first to study gene-environment interactions between the selected VHL and HIF1A SNPs and smoking, hypertension, BMI and alcohol consumption. Only the interaction between VHL_rs779805 and alcohol consumption was associated with both RCC and ccRCC risk. However, this association did not maintain statistical significance after correction for multiple comparisons with the adaptive Benjamini-Hochberg method. Dominant models were used for all gene-environment analyses because of the low MAF of most included SNPs. However, SNPs may not have adhered to a dominant model, as there may be differences in disease susceptibility between heterozygous and homozygous rare genotypes, as was found for VHL_rs779805 (Table 2). This illustrates that our gene-environment analyses may have been hampered by the inability to assess interactions per genotype. Further research is needed to ascertain the interaction between alcohol and VHL SNP status on (cc)RCC risk.
Disruptions in the VHL tumor suppressor gene are thought to play a role in the constitutive activation of hypoxia-inducible factors, as regulated in part by HIF1A, which may lead to carcinogenesis 1 . Therefore, it is plausible for gene-gene interactions to occur. However, in this study, we did not find gene-gene interactions between selected VHL and HIF1A SNPs on the risk of developing (cc)RCC.
Previous studies have found a relationship between VHL promoter hypermethylation and SNPs in VHL_rs779805 in sporadic ccRCC cases 6,23 . Moore et al. also reported a positive association between promoter hypermethylation and VHL_rs265318 and VHL_rs1642739. In contrast, we found no association between promoter methylation status and VHL_rs779805 in ccRCC cases. VHL_rs1642739 and VHL_rs265318 seemed inversely associated with VHL promoter methylation in ccRCC cases. However, this association was based on a limited sample size. While the number of cases with known promoter methylation status was similar in size to the study of Moore et al., our study had a smaller proportion of cases with VHL promoter methylation (7.5% vs. 9.8%) 23 . Banks et al. reported an even higher proportion of sporadic ccRCC cases with a methylated VHL promoter (20.4%), but had a smaller study population 6 . In general, there are large differences in the proportion of methylated VHL promoters per SNP between studies, which may explain these unstable point estimates 23 . Therefore, more research with a larger number of sporadic ccRCC cases is needed to elucidate the relationship between VHL promoter methylation and VHL SNPs.
At present, genome-wide association studies (GWAS) have identified multiple novel risk loci that may contribute to RCC susceptibility. Interestingly, SNPs in the VHL and HIF1A genes have not (yet) been identified as potential risk variants, although the involvement of these genes is biologically plausible based on current evidence on the development of RCC 2,9 . For example, risk loci have been identified in EPAS1 11,13,17,18 , which is known to be involved in the VHL-HIF-1 pathway 32 . While we found no evidence for an association for three of our selected SNPs, VHL_rs779805 was associated with an increased risk of RCC. This finding was in line with two previously published studies, in which a potential association between VHL_rs779805 and RCC risk was found 19,20 . While this particular SNP is present on commonly used SNP arrays, it remains unidentified in large-scale GWAS 11-18 . It is estimated that the currently available risk loci for RCC account for approximately 10% of the familial risk for RCC 11 . Therefore, it may well be possible for minor susceptibility loci to remain unidentified in GWAS, because they tend to convey small-to-moderate changes in risk, whereas major susceptibility loci remain detectable under the stringent false discovery rate correction criteria of GWAS. This could be a reason why SNPs like VHL_rs779805 may remain unidentified unless alternative methodologies are employed 11 . As a result, there is ample opportunity to discover new, rarer RCC risk variants in future research. Additional evidence on risk loci from GWAS, combined with extensive information on direct effects, environmental factors and other potential modulators of disease etiology from candidate SNP studies, should lead to new insights into the biology of RCC and thereby support new prevention, early detection and intervention strategies 11 .
This study also has several strengths: the detailed questionnaire information, the long duration of follow-up and the histological revision of RCC cases by two experienced pathologists. Furthermore, cases in our study were obtained prospectively from a population of 120,852 men and women from 204 Dutch municipalities. Combined with the completeness of follow-up, we assume that these cases are representative of kidney cancer cases in the Netherlands at the time.
In conclusion, this study confirmed the association between the germline SNP VHL_rs779805 and RCC risk. In addition, a slightly stronger association was found for ccRCC than for RCC. Potential gene-environment interactions were found between alcohol and VHL SNPs. However, results did not remain statistically significant after correction for multiple comparisons. No gene-gene interactions were observed between the VHL and HIF1A SNPs. Lastly, tumor promoter methylation was not significantly associated with VHL SNPs.

Methods
Study design. The NLCS is a nationwide prospective cohort study initiated in September 1986 with the inclusion of 120,852 participants aged 55-69 years to study the relationship between diet and cancer. The study design has been described in detail elsewhere 33 . In short, a case-cohort design was used for efficiency in data processing and follow-up for vital status. Cases were derived from the entire cohort, whereas a subcohort of 5000 participants, consisting of 2411 men and 2589 women, was randomly sampled at baseline to estimate person-years at risk for the entire cohort. The subcohort was followed up biennially for migration and vital status information by contacting participants and using computerized municipal registries. Using the subcohort, person-years at risk were calculated from baseline until registration of RCC, or until date of censoring by death, emigration, loss to follow-up or end of follow-up, whichever occurred first. Cancer follow-up for the full cohort was conducted by computerized record linkage with the Netherlands Cancer Registry (NCR), the Netherlands Pathology Registry (PALGA), and the causes of death registry maintained by Statistics Netherlands (CBS) 34 . Follow-up for vital status of the subcohort was nearly 100% complete after 20.3 years. The completeness of cancer follow-up is estimated to be over 96% 35 .
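The person-years rule described above (follow-up runs from baseline to the earliest of RCC registration, censoring, or end of follow-up) can be illustrated with a minimal Python sketch. The function name, argument names and all dates are hypothetical and chosen for illustration only; this is not the code used in the study, which was analysed in Stata.

```python
from datetime import date

def person_years(baseline, end_of_followup, rcc=None, death=None,
                 emigration=None, lost=None):
    """Years at risk from baseline to the earliest exit date.

    All argument names are hypothetical. A date left as None means
    that the corresponding exit did not occur for this participant.
    """
    exits = [d for d in (rcc, death, emigration, lost, end_of_followup)
             if d is not None]
    exit_date = min(exits)  # "whichever occurred first"
    # Guard against exit dates recorded before baseline.
    return max((exit_date - baseline).days, 0) / 365.25

# Hypothetical subcohort member: censored at the end of follow-up.
full = person_years(date(1986, 9, 1), date(2006, 12, 31))
# Hypothetical subcohort member: RCC registered after ten years.
case = person_years(date(1986, 9, 1), date(2006, 12, 31),
                    rcc=date(1996, 9, 1))
```

In a case-cohort analysis, these subcohort person-years stand in for the person-years of the full cohort, which is why only the subcohort needs active follow-up for vital status.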
Individuals with prevalent cancer, excluding skin cancer, at baseline were excluded. After 20.3 years of follow-up, 608 RCC cases were identified (International Classification of Diseases for Oncology 3 (ICD-O-3): C64). Histologically confirmed epithelial RCC cases were eligible for the collection of formalin-fixed paraffin-embedded (FFPE) tumor tissue. Tumor blocks were collected for 454 out of 568 eligible cases (80%). Two experienced pathologists revised the tumor histology according to the WHO classification of RCC tumors 36 . Based on this revision, 366 (81%) of the cases with available tumor blocks were classified as ccRCC cases, 60 (13%) as papillary RCC cases, 15 (3.3%) as chromophobe RCC cases, and 13 (2.9%) as other or undefined RCC cases.
Ethics statement. Individuals invited to participate in the NLCS received an invitation letter with details on the study and the use of their data. In addition, they received the baseline questionnaire, which included an envelope for returning toenail clippings. By completing and returning the baseline questionnaire, individuals consented to participate in the NLCS (response rate 35.5%). Individuals were informed about the possibility to end their participation at any time, at which point all their data would be removed. All methods were performed in accordance with the relevant guidelines and regulations that were applicable at that time (1986). The institutional review boards of Maastricht University (Maastricht) and the Netherlands Organization for Applied Scientific Research TNO (Zeist) approved the NLCS (February 2, 1985 and January 6, 1986, respectively). The institutional review board of Maastricht University (Maastricht) later re-evaluated the original approval of the study protocol and procedures (2010). Based on the re-evaluation, the institutional review board amended the original approval to include the genotyping of SNPs (April 12, 2010).
Participants did not provide written informed consent to the sharing of data.
Gene and SNP selection. Genes and SNPs related to RCC risk were selected through literature search. Priority was given to SNPs with a MAF ≥ 20% in Caucasians and primers had to be compatible with RAAS-pathway SNPs present on the multiplex assay 31 . Consequently, three VHL SNPs (rs779805, rs265318 and rs1642739) and one HIF1A SNP (rs2301111) were selected. All included VHL SNPs were selected based on their association with VHL promoter methylation in previous research 23 . The included HIF1A tag-SNP had the highest MAF of the HIF1A SNPs compatible with the assay.
Tissue collection and DNA isolation. Approximately 90,000 participants provided toenail clippings at baseline, which have been shown to be a valid source of DNA for the genotyping of germline genetic variants 37 . DNA was isolated according to the DNA isolation protocol by Cline et al. 38 . To increase the number of cases with available DNA, DNA was isolated from FFPE healthy tissue, as described by van Houwelingen et al. 5 , for 67 RCC cases without toenail clippings. There were no substantial quality differences between DNA samples from toenail and FFPE healthy tissue 31 . In total, 3582 (75%) subcohort members and 502 (83%) RCC cases were genotyped.
SNP genotyping was performed on the Sequenom MassARRAY platform using the iPLEX assay (Sequenom Inc., Hamburg, Germany), as described previously 31 . This method provides suitable SNP call rates and reproducibility using toenail DNA 37 .
DNA methylation of the CpG island of the VHL gene promoter region, which has been associated with inhibition of VHL gene expression 39 , was determined in RCC tumor blocks by chemical modification of genomic DNA with sodium bisulfite and subsequent methylation-specific PCR analysis (MSP), as described previously 40-42 . MSP primer design was based on the MBD-affinity massive parallel sequencing data. Detailed information on primer sequences and MSP conditions is available elsewhere 24 .
Questionnaire information. All participants completed a mailed, self-administered questionnaire on diet and other risk factors for cancer at baseline (1986) 43 . Information on dietary habits was obtained through a 150-item, semi-quantitative food frequency questionnaire (FFQ) focusing on habitual consumption of food and beverages during the year preceding baseline.
Cigarette smoking status, frequency and duration were based on self-reported information. Participants reported hypertension as diagnosed by a physician, preceding baseline. Participants were asked to report the use of any drugs that they used longer than 6 months. From this information, the use of antihypertensive medication was extracted. BMI was calculated using self-reported height and weight from the baseline questionnaire. Questions on beer, red wine, white wine, sherry, fortified wines, liqueur, and liquor were used to assess the consumption of alcohol. Participants who consumed alcoholic beverages less than once a month were considered non-users. Standard glass sizes were defined as 200 ml for beer, 105 ml for wine, 80 ml for sherry, and 45 ml for both liqueur and liquor 44 . These values corresponded to 8, 10, 11, 7 and 13 grams of alcohol, respectively. Mean daily alcohol consumption was calculated by multiplying the consumption frequency and the standardized item unit.
Statistical analyses. Cox proportional hazards models were used to estimate age- and sex-adjusted and multivariable-adjusted hazard ratios (HR) and 95% confidence intervals (CIs). A priori selected covariables in the multivariable-adjusted model were BMI (kg/m2, continuous), hypertension (yes, no), cigarette smoking status (never, former, current), intensity (cig/d, centered; continuous), duration (years, centered; continuous) and alcohol consumption (g/d, continuous).
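The alcohol intake calculation described in the questionnaire information (consumption frequency multiplied by the alcohol content of a standard glass) can be sketched in Python as follows. The gram values per glass are those stated in the text; the input format, the function names and the exact category cut-offs between the reported bands (for example, between 4 and 5 g/d) are assumptions made for illustration. Red and white wine are assumed to share the single wine value, and fortified wines are omitted because no gram value is given for them.

```python
# Grams of alcohol per standard glass, as stated in the text.
GRAMS_PER_GLASS = {"beer": 8, "wine": 10, "sherry": 11,
                   "liqueur": 7, "liquor": 13}

def mean_daily_alcohol(glasses_per_day):
    """Mean daily alcohol intake in g/d.

    glasses_per_day: hypothetical mapping of beverage name to the
    average number of standard glasses consumed per day.
    """
    return sum(glasses * GRAMS_PER_GLASS[beverage]
               for beverage, glasses in glasses_per_day.items())

def alcohol_category(grams_per_day):
    """Map intake to the categories of the sensitivity analyses.

    The boundaries between bands are an assumption: values are
    assigned to the band whose lower limit they have reached.
    """
    if grams_per_day == 0:
        return "0 g/d"
    if grams_per_day < 5:
        return "0.1-4 g/d"
    if grams_per_day < 15:
        return "5-14 g/d"
    if grams_per_day < 30:
        return "15-29 g/d"
    return "30+ g/d"

# Hypothetical participant: one beer and half a glass of wine per day.
intake = mean_daily_alcohol({"beer": 1, "wine": 0.5})
category = alcohol_category(intake)
per_5g_units = intake / 5  # exposure scale used for "per 5 g/day" estimates
```

Participants classified as non-users would simply enter this calculation with an empty mapping, which yields 0 g/d.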
The most common allele was used as the reference allele. Associations between genotypes and RCC and ccRCC risk were assessed using additive and dominant models. Results of SNPs with a MAF < 0.25 were interpreted using a dominant model for power reasons. SNP allele frequencies in the subcohort were tested for departure from Hardy-Weinberg equilibrium using the Pearson χ2-test, as calculated with the Stata program 'hwsnp' 45 . Gene-environment interactions were tested with the Wald χ2-test. Gene-environment analyses were adjusted for multiple comparisons with the adaptive Benjamini-Hochberg false discovery rate (FDR) procedure with a q-value threshold of 10% 30 . Sensitivity analyses were performed to explore the impact of using alternative categorizations for BMI (<20 kg/m2, 20-<25 kg/m2, 25-<30 kg/m2 and 30+ kg/m2), smoking status (never, ever), hypertension (no self-reported hypertension or no self-reported antihypertensive medication, hypertension with self-reported hypertensive medication) and alcohol consumption (0 g/d, 0.1-4 g/d, 5-14 g/d, 15-29 g/d and 30+ g/d) when assessing gene-environment interactions. Gene-gene interactions between VHL SNPs and the selected HIF1A SNP were tested using the Wald χ2-test. In a case-only analysis, the association between VHL SNPs and VHL tumor promoter methylation status (methylated, unmethylated) was assessed using multiple logistic regression for both RCC and ccRCC.
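As an illustration of the Hardy-Weinberg check mentioned above, the following Python sketch computes a Pearson χ2 statistic with one degree of freedom from genotype counts. It is a generic textbook calculation, not the Stata 'hwsnp' program used in the study; the function name and the example counts are hypothetical.

```python
import math

def hwe_chi2(n_aa, n_ab, n_bb):
    """Pearson chi-square test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed genotype counts (hypothetical names)
    for the two homozygotes and the heterozygote of a biallelic SNP.
    Returns the chi-square statistic and its p-value (1 df).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip cells with zero expectation (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts that match Hardy-Weinberg proportions exactly.
chi2_eq, p_eq = hwe_chi2(25, 50, 25)
# Hypothetical counts with no heterozygotes at all.
chi2_dev, p_dev = hwe_chi2(50, 0, 50)
```

A small p-value in the subcohort would suggest genotyping problems or population structure, which is why the check is run on the subcohort rather than on cases.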
All analyses were performed using Stata Statistical Software: Release 15 (StataCorp., 2017, College Station, TX). The proportional hazards assumption was tested using scaled Schoenfeld residuals 46 . A violation of the assumption was apparent for age. Therefore, all models were adjusted for age as a time-dependent covariable. With the exception of FDR-corrected analyses, a p-value < 0.05 was considered statistically significant.
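For readers unfamiliar with FDR correction, the step-up logic can be sketched in Python as below. Note that this is the standard Benjamini-Hochberg procedure, not the adaptive variant used in the study, which additionally estimates the number of true null hypotheses before applying the thresholds. The function name and the example p-values are hypothetical and are not results from this study.

```python
def benjamini_hochberg(p_values, q=0.10):
    """Standard (non-adaptive) Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the original order of p_values,
    indicating which hypotheses are rejected at FDR level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

# Hypothetical interaction p-values.
flags = benjamini_hochberg([0.01, 0.04, 0.20, 0.50], q=0.10)
none_flagged = benjamini_hochberg([0.04, 0.50, 0.60, 0.70], q=0.10)
```

The second call shows how a nominally significant p-value (0.04) can fail to survive correction when it is the only small value among several tests.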