Prognostic significance of folate metabolism polymorphisms for lung cancer

Functional nonsynonymous single-nucleotide polymorphisms (nsSNPs) of folate metabolism genes can influence the methylation of tumour suppressor genes, thereby potentially impacting on tumour behaviour. To investigate whether such polymorphisms influence lung cancer survival, we genotyped 14 nsSNPs mapping to methylene-tetrahydrofolate reductase (MTHFR), methionine synthase (MTR), methionine synthase reductase (MTRR), DNA methyltransferase (DNMT2), methylenetetrahydrofolate dehydrogenase (MTHFD1) and methenyltetrahydrofolate synthetase (MTHFS) in 619 Caucasian women with incident disease, 465 with non-small cell (NSCLC) and 154 with small cell lung cancer (SCLC). The most significant association detected was with MTHFS Thr202Ala, with carriers of variant alleles having a worse prognosis (hazard ratio (HR)=1.49; 95% confidence interval: 1.14–1.94). Associations were also detected between overall survival (OS) in SCLC and homozygosity for MTHFR 222Val (HR=1.92; 1.03–3.58) and between OS from NSCLC and MTRR 175Leu carrier status (HR=1.36; 1.06–1.75). While there is evidence that variation in the folate metabolism genes may influence prognosis from lung cancer, current data are insufficiently robust to distinguish individual patient outcome.

Lung cancer is a major cause of cancer mortality worldwide. In the United Kingdom, it accounts for more than 33 000 cancer deaths each year (Cancer Research UK). Despite improvements in treatment in recent years, the prognosis from the disease has only marginally improved, with 5-year survival rates for both small cell (SCLC) and non-small cell lung cancer (NSCLC) typically being no better than 15% (Jemal et al, 2002). While the major prognostic determinant in lung cancer is stage at presentation, there is variability in survival for patients with same-stage disease. Hence, it is advantageous to identify further prognostic markers, which may aid identification of those patients who will benefit from therapeutic interventions. Furthermore, identifying genes that influence prognosis has the potential to aid the identification of pathways that could be targeted for therapeutic interventions.
Aberrant DNA methylation is recognised as being a common feature of human neoplasia, with CpG island hypermethylation and global genomic hypomethylation occurring simultaneously in tumours including lung cancers. Moreover, the cellular profile of DNA hypermethylation has been implicated in progression and metastasis of lung cancer (Nakamura et al, 2003; Shimamoto et al, 2004).
Variants of folate metabolism pathway (Figure 1) genes, such as functional polymorphisms of 5,10-methylene-tetrahydrofolate reductase (MTHFR), affect methylation of DNA and tumour suppressor genes (Kamiya et al, 1998; Paz et al, 2002), thereby potentially impacting on tumour behaviour. This, coupled with the observation that polymorphisms of this pathway can affect the efficacy of cytotoxic drugs (Maring et al, 2005), has provided a strong rationale for evaluating such variants as prognostic factors.
Here, we report the impact of polymorphic variation within the folate metabolism pathway genes MTHFR, methionine synthase (MTR), methionine synthase reductase (MTRR), DNA methyltransferase (DNMT2), methylenetetrahydrofolate dehydrogenase (MTHFD1) and methenyltetrahydrofolate synthetase (MTHFS) on lung cancer prognosis in 619 patients. We based our analysis on nonsynonymous single-nucleotide polymorphisms (nsSNPs) as these alter the amino-acid sequence of expressed proteins and are most likely to have functional consequences.

METHODS

Patients
Patients with lung cancer were ascertained through the Genetic Lung Cancer Predisposition Study (GELCAPS). Full details about the design and conduct of the study can be obtained elsewhere (Matakidou et al, 2005). Briefly, patients were recruited through oncology centres in the UK specializing in the management of lung cancer. To ensure that data and samples were collected from bona fide lung cancer cases and to avoid issues of bias from survivorship, only incident cases with histologically or cytologically (only if not adenocarcinoma) confirmed primary disease were ascertained. Demographic characteristics (sex, date of birth, ethnic group, area of residence and country of birth, smoking history, history of lung cancer in a first degree relative), treatment and clinical follow-up were collected from cases using a standardized questionnaire and proformas. The current analysis is based on 619 female patients, all of whom are white Caucasians. Patient characteristics are detailed in Table 1. Ethical approval for the study was obtained from the London Multi-Centre Research Ethics Committee (MREC/98/2/67) in accordance with the tenets of the Declaration of Helsinki. All participants provided informed consent.

Statistical methods
Associations between survival and demographic and clinical variables were assessed by means of the χ² and Fisher's exact tests. Testing for population substructure was based on examining the distribution of SNP genotypes for evidence of Hardy-Weinberg disequilibrium. Overall survival (OS) of patients was the end point of the analyses. Survival time was calculated from the date of diagnosis of lung cancer to the date of death. Patients who were not deceased were censored at the date of last contact. Median follow-up time was computed among censored observations only. Kaplan-Meier survival curves according to genotype were generated and the homogeneity of the survival curves was tested using the log-rank test. Cox regression analysis (Klein and Moeschberger, 1997) was used to estimate hazard ratios (HRs) and their 95% confidence intervals (CI) while adjusting for radiotherapy and stage. Likelihood ratio testing for the inclusion of covariates and interaction terms was performed to determine the best-fitting model. For each SNP, HRs were generated using common allele homozygotes as the reference group (unless otherwise specified). For polymorphisms with fewer than five minor allele homozygotes, minor allele homozygote genotypes were combined with heterozygotes. In addition to studying the impact of individual SNPs on survival, we evaluated OS as a function of the number of 'risk alleles' carried. In this analysis, risk was trichotomised into low, medium and high-risk categories.
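As an illustration of the survival estimate described above, the following is a minimal sketch of the Kaplan-Meier product-limit estimator with right-censoring. It is not the S-Plus code used in the study; the function name `kaplan_meier` and the follow-up times are hypothetical and chosen only to show how deaths and censored observations enter the calculation.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times  -- follow-up per patient (diagnosis to death or last contact)
    events -- 1 if the patient died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # all patients leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up times in months (illustrative only, not study data).
times = [2, 3, 3, 5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 1, 0, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Running the same function separately on each genotype group would give the per-genotype curves that a log-rank test then compares.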
Owing to the exploratory nature of this study, we reported nominal statistical associations for all analyses. We recognise that examining multiple SNPs risks identification of false associations. However, correction for multiple testing may increase the risk of type II errors (Perneger, 1998). Accordingly, we present uncorrected P-values but recognise our exploratory findings require confirmation in another study. This approach minimises loss of true positive results but allows false positive results to be identified (Perneger, 1998; Cuzick, 1999). To adjust for multiple testing, we multiplied P-values of each individual test statistic by the number of SNPs in the corresponding gene to obtain a genewide P-value, the global P-value being the product of the genewide P-value and the number of genes. Statistical analyses were undertaken using S-Plus (Version 8, Insightful Corporation, USA). The power to demonstrate a relationship between SNP genotype and OS was estimated using sample size formulae for comparative binomial trials (Farrington and Manning, 1990). In all analyses, a P-value of 0.05 was considered statistically significant. To assess the level of linkage disequilibrium (LD) between SNPs, we calculated the pairwise LD measure r² between markers mapping to the same gene using the programme PHASE (Stephens et al, 2001), which implements a Markov chain Monte Carlo procedure to estimate two-locus haplotype frequencies. This information was used to investigate the relationship between haplotypes and OS.
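The genewide and global adjustment described above amounts to two multiplications, sketched below. The function names are hypothetical, capping adjusted values at 1 is an added assumption (the text states only that P-values were multiplied), and the nominal P-value in the example is invented for illustration.

```python
def genewide_p(p_nominal, snps_in_gene):
    """Nominal P-value multiplied by the number of SNPs tested in the gene."""
    return min(1.0, p_nominal * snps_in_gene)  # cap at 1 (assumption)


def global_p(p_nominal, snps_in_gene, n_genes):
    """Genewide P-value multiplied by the number of genes tested."""
    return min(1.0, genewide_p(p_nominal, snps_in_gene) * n_genes)


def nominal_threshold(alpha, snps_in_gene, n_genes):
    """Nominal P-value a SNP must reach for global significance at alpha."""
    return alpha / (snps_in_gene * n_genes)


# Hypothetical nominal P-value for a SNP that is the only one typed in its gene.
print(global_p(0.003, snps_in_gene=1, n_genes=7))
print(nominal_threshold(0.05, snps_in_gene=1, n_genes=7))
```

Assuming a single SNP in the gene and the seven genes referred to in the Discussion, the nominal threshold works out at 0.05/7, about 0.007, which is consistent with the global significance threshold quoted there for MTHFS Thr202Ala.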

Bioinformatic analysis
We applied two in silico algorithms, Polymorphism Phenotyping (PolyPhen) and Sorting Intolerant from Tolerant (SIFT), to predict the putative impact of missense variants on protein function (Ng and Henikoff, 2002; Ramensky et al, 2002). SIFT and PolyPhen scores were classified according to the established criteria (Ng and Henikoff, 2002; Xi et al, 2004).

Study population and SNP genotype distributions
One hundred and fifty-four of the patients (25%) had SCLC, somewhat less than half (43%)  Surgery, any chemotherapy and treatment specifically with platinum-based compounds did not satisfy the proportional hazards assumption required for the Cox model. Therefore, we used a stratified Cox model, stratifying on these covariates. Stage at presentation, histology, radiotherapy, smoking, family history of lung cancer and age at diagnosis were initially included as covariates and a backward stepwise selection procedure was conducted to identify the most parsimonious model. Stage and age were included as categorical and continuous variables, respectively. Other factors were coded as binary variables. Factors significantly influencing patient prognosis were stage at presentation (P < 10⁻⁴), histology (P = 0.026) and radiotherapy (P = 0.0042). Smoking, family history of lung cancer and age at diagnosis did not impact on survival.

Relationship between SNP genotype and prognosis
For most SNPs genotyped (92%), minor allele frequencies (MAF) were 5% or higher. One SNP was, however, observed at a comparatively low frequency (i.e. having MAF < 5%). There was no evidence in the data set for population stratification based on testing the distribution of SNP genotypes for Hardy-Weinberg disequilibrium. Thirteen nsSNPs in six genes were assayed. Only SNPs S257T, R415C and P450R, and SNPs H595Y and K175L, all mapping to MTRR, were in strong LD (i.e. r² = 1.0 and r² = 0.81, respectively). Hence, analysis of the relationship between SNP haplotype and prognosis was restricted to this locus.
There was no correlation between the SNP genotype and pathological parameters (stage and histology), but in view of the differences in biology of NSCLC and SCLC we also examined for relationships between genotypes and prognosis in the two cell types separately. Table 2 details the relationships between SNP genotype and OS from lung cancer obtained from Cox regression analysis.
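The quantities reported in this paragraph (MAF, departure from Hardy-Weinberg equilibrium and pairwise r²) can be sketched as follows. This is an illustrative calculation only: the study estimated haplotype frequencies with PHASE, whereas here the haplotype frequency is simply supplied as an input, and the function names, genotype counts and frequencies are hypothetical.

```python
import math


def hwe_chi2(n_common_hom, n_het, n_rare_hom):
    """Return (MAF, chi-square statistic, P-value) for a biallelic SNP."""
    n = n_common_hom + n_het + n_rare_hom
    q = (2 * n_rare_hom + n_het) / (2 * n)  # minor allele frequency
    p = 1.0 - q
    observed = (n_common_hom, n_het, n_rare_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return q, chi2, p_value


def ld_r2(p_ab, p_a, p_b):
    """r^2 from the frequency of the haplotype carrying both minor alleles
    (p_ab) and the two minor allele frequencies (p_a, p_b)."""
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical genotype counts (common homozygote, heterozygote, rare homozygote).
maf, chi2, p_value = hwe_chi2(49, 42, 9)
print(f"MAF={maf:.2f}  chi2={chi2:.3f}  P={p_value:.3f}")

# Hypothetical frequencies: the two minor alleles always travel together.
print(f"r2={ld_r2(0.30, 0.30, 0.30):.2f}")
```

An r² of 1.0 arises when two SNPs have identical allele frequencies and their minor alleles always occur on the same haplotype, so that either SNP fully predicts the other.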
Evaluating OS as a function of the number of 'risk alleles' provided no evidence of an interaction between SNPs (data not shown). Finally, we examined for potential interactive effects between SNPs, response to platinum-based chemotherapy and prognosis. None showed nominally significant interactions at the 5% level.

DISCUSSION
Major strengths of our study are its large size, the fact that it is population-based, included only patients with incident disease, and has involved the systematic follow-up of patients. We are mindful that it is desirable that studies aimed at identifying prognostic markers should be conducted within the context of a clinical trial to minimise bias. Although bias from non-uniform treatment is a potential confounder in studies of some solid tumours, the management of lung cancer is relatively uniform in the UK, as there are only a restricted number of effective chemotherapeutic agents and prognosis is uniformly poor. Support for this assertion is provided by the fact that survival rates observed in our patient cohort were not different to those expected. It is therefore unlikely that any spurious influences as a consequence of study design will have impacted significantly on our findings. It is well known that the allele frequencies of many SNPs vary among different populations. As our analysis was restricted to white patients, our study findings are unlikely to be confounded by population stratification. The main limitation of our study is the inability to pursue an in-depth examination of the effect of non-genetic factors such as circulating folate levels, which may interact with genotype in defining the clinical behaviour of tumours.
Despite such limitations in this study, we have observed significant evidence for associations between survival and variation in MTHFR, MTHFS and MTRR. Our observation that polymorphic variation in the folate metabolism genes influences cancer prognosis is not without precedent (Alberola et al, 2004). We fully acknowledge that we have not captured all variation defined by nsSNPs mapping to all of the folate metabolism genes but our selection was restricted to validated SNPs that could be robustly genotyped using the analytical platform we employed. For example, it would have been desirable to have genotyped nsSNPs mapping to DNMT1 and DNMT3b, given previously published data implicating variation in these genes in development and prognosis of lung cancer (Kassis et al, 2006; Kim et al, 2006; Wang et al, 2006). However, to date only two common (MAF > 0.05) validated nsSNPs map to DNMT1 (Ile311Val and His97Arg) and both unfortunately had low designability for the genotyping platform we employed, thereby precluding evaluation.
We evaluated nsSNPs on the basis that each has the capacity to directly affect the function of expressed proteins, implying a higher probability of being directly causally related to susceptibility. There is good evidence that MTHFR Ala222Val directly affects the function of the expressed protein. For SNPs such as MTHFS Thr202Ala and MTRR Ser175Leu, substitutions are not predicted to be benign. Although such in silico predictions about the functional consequences of amino-acid changes are not definitive, these algorithms have been demonstrated in benchmarking studies to successfully categorise 80% of amino-acid substitutions (Savas et al, 2004; Xi et al, 2004).
The nature of our study precluded us from formally evaluating SNPs in relation to response to radiotherapy as this was only administered to a small number of patients. Similarly only a small number of patients did not receive platinum-based chemotherapy limiting our ability to robustly detect interactions between this type of therapy, genotype and prognosis. Although there may be differences between NSCLC and SCLC, which may reflect differences in biology of the tumour types, our data did not provide real evidence that folate metabolism variation plays a major role in defining differences in prognosis between these tumour types.
+/+, +/− and −/− refer to the common homozygotes, heterozygotes and rare homozygotes, respectively.
In studies of the type we have conducted, there is the issue of adjustment for multiple comparisons. We assessed 13 polymorphisms in seven genes but because more than one polymorphism was tested in some genes, the results are not independent. Hence, for MTHFS Thr202Ala, the statistical threshold for global significance is 0.007. Issues of power are also relevant to the formulation of studies seeking to identify polymorphic variants influencing cancer prognosis. The magnitude of any difference in prognosis associated with individual SNPs is likely to be at best modest; hence, stipulating significance levels of ~10⁻⁴ or less to adjust for multiple testing is inherently unrealistic. For example, at such significance levels, an analysis with 80% power to demonstrate a 5% difference in survival, which is clinically relevant, would require at least 4800 patient samples even if the frequency of the at-risk genotype is 50%. For less frequent genotypes, sample sizes would be impossibly large. On this basis, the imposition of very stringent P-values (as advocated in genome-wide case-control studies) on outcome studies is questionable, creating the serious issue of generating a raft of type II errors (Perneger, 1998).
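To make the power argument concrete, the sketch below uses the textbook normal-approximation sample size formula for comparing two proportions in equally sized groups. This is a stand-in, not the Farrington and Manning formulae used in the study, and the survival proportions (15% versus 20%) are hypothetical values chosen only to represent a 5% absolute difference; the result is not expected to reproduce the figure of 4800 exactly.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Patients needed per genotype group to detect p1 vs p2 (two-sided test).

    Normal-approximation formula for two proportions with equal group sizes,
    i.e. an at-risk genotype frequency of 50%.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    z_beta = z(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical survival proportions differing by 5 percentage points.
for alpha in (0.05, 1e-4):
    n = n_per_group(0.15, 0.20, alpha=alpha)
    print(f"alpha={alpha:g}: {n} per group, {2 * n} in total")
```

Under these assumptions, tightening the significance level from 0.05 to 10⁻⁴ markedly increases the required sample size, which is the point made in the paragraph above.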
Despite the strong biologic plausibility and consistency with literature for several individual associations as discussed herein, some of these associations may be false positives as a result of the inherent pitfalls of the candidate gene approach. Hence, individual associations reported in this article must inevitably be interpreted with caution. Nevertheless, even for those true associations, it is unlikely that any individual SNP would have sufficient power to predict clinical outcomes in a disease as complex as cancer. In this context, combined analyses of two or more SNPs in the same pathway are likely to have superior potential to assist in distinguishing different outcome patterns among patients with the same stage disease as even 5-10% differences in prognosis are relevant in a disease. Furthermore, it is plausible that the impact of variation in the folate metabolism genes is likely to be best seen in situations where the pathway plays a major role in defining the efficacy of chemotherapeutic agents in cancers amenable to treatment with agents such as pyrimidine-antagonists (Maring et al, 2005).
In conclusion, however attractive the notion that polymorphisms of the folate metabolism pathway genes define cancer prognosis, their role in lung cancer on the basis of our data is minor at best and they are unlikely to have clinical utility.