Introduction

Lynch syndrome (LS) is the most common inherited condition predisposing to colorectal cancer (CRC). Individuals with this condition (LS individuals) have an increased risk of developing several types of epithelial cancer, most commonly in the colorectum and endometrium12. A molecular genetic diagnosis of LS is established by identifying either a germline pathogenic variant in one of the DNA mismatch repair (MMR) genes MLH1, MSH2, MSH6 or PMS2 or an EPCAM deletion affecting the expression of MSH23. Lifetime cancer risk is known to differ by gene: carriers of pathogenic variants in MSH6 and PMS2 have a lower risk of developing cancer, especially CRC, and a later age of onset than carriers of variants in MLH1 and MSH24,6,7,8,9,10,11,12. Gender differences are also observed, with women having a lower lifetime risk of developing CRC than men8,9,13,14.

MMR proteins are responsible for the elimination of base-substitution and insertion/deletion mismatches. Impaired or lost function of one or more MMR proteins confers genetic hypermutability and a higher risk of developing several epithelial cancers throughout life1,15. Differences in disease expression are observed within and among families harbouring the same MMR germline variants and are believed to result from environmental and genetic risk modifiers15,17,18.

Genetic variants in the methylenetetrahydrofolate reductase (MTHFR) gene have been proposed as genetic modifiers in LS, affecting disease expression15, 19,20,21. MTHFR is a key enzyme in the folate metabolism pathway. It catalyses the reduction of 5,10-methylenetetrahydrofolate (5,10-MTHF) to 5-methyltetrahydrofolate (5-MTHF), a methyl donor that promotes DNA methylation at the expense of thymidine synthesis20,22. A shift away from thymidine synthesis may cause uracil to be misincorporated into DNA, with excision repair leading to single-strand and double-strand breaks during replication15,19. In individuals with defective DNA MMR, the undesirable effects of high MTHFR activity may be deleterious15.

Two common single nucleotide polymorphisms (SNPs) in the MTHFR gene, C677T (rs1801133) and A1298C (rs1801131), both known to reduce MTHFR activity, have been suggested to protect against the development of cancer in LS individuals20,23,24. The lower MTHFR enzyme activity is hypothesised to reduce the misincorporation of uracil into DNA and hence the number of double-strand breaks that need to be repaired, which would account for the protective effect reported for cancer development.

Through international collaboration, we were able to analyse MTHFR C677T and A1298C in 2,723 LS individuals and investigate their association with age at cancer onset and the risk of developing CRC and any LS-related cancer.

Materials and methods

Our sample cohort consists of Australian, Polish, German, Norwegian and Spanish LS individual samples recruited from diagnostic laboratories or family cancer clinics, all carrying pathogenic or likely pathogenic germline MMR variants. The study complies with the ethical considerations and approvals for each separate sample cohort in the respective country: the Hunter New England Research Ethics Committee (Australia), the ethics committee of the Pomeranian Academy of Medicine (Poland), the ethics committee of the University Hospital Bonn (Germany), the Regional Committees for Medical and Health Research Ethics (Norway) and the IDIBELL Ethics Committee (Spain). All experiments were performed in accordance with institutional guidelines and regulations. Written informed consent was obtained from all participants; for participants under the age of 18 years, consent was obtained from a parent or guardian.

Sample cohort

A total of 2,723 LS individual samples with appropriate clinical information available were included in the current international study from five different countries: 680 LS individuals from Australia, 410 from Poland, 557 from Germany, 204 from Norway and 872 from Spain. Demographic data are shown in Tables 1A and 1B. For analysis purposes, the sample cohort was split in two depending on whether the LS individual with a cancer diagnosis was diagnosed with CRC or any other LS-related cancer (LS cancer). LS cancer in this context refers to CRC and any extra-colonic epithelial cancer associated with LS, including cancers of the uterus, stomach, liver, kidney, ovaries, brain and pancreas, and certain types of skin cancer.

Table 1 Displays demographic data from combined sample cohorts. (A) displays demographics for the studied LS cohort (rs1801131 and rs1801133), while (B) displays demographics for the five countries separately.

Genotyping

Australian and Polish samples

DNA samples were amplified under universal conditions using the Applied Biosystems® 7500 Real-Time (RT) PCR System (Applied Biosystems, Foster City, CA, USA). Post-PCR allelic discrimination was performed using TaqMan® SNP Genotyping Assays (ThermoFisher Scientific) for C677T (rs1801133, assay ID: C___1202883_20) and A1298C (rs1801131, assay ID: C____850486_20). Each reaction mixture contained 0.125 µL 40× Assay Mix, 2.5 µL TaqMan® Universal PCR master mix, 1 µL DNA and Milli-Q® water to make up a final volume of 5 µL. Thermal cycling conditions were set at 60 °C for 1 min, 95 °C for 10 min, 60 cycles of 95 °C for 15 s and 60 °C for 1 min. Positive controls for each SNP genotype were used to ensure the quality of PCR performance, while no template controls (NTCs) monitored for the contamination of reagents.

German samples

Leukocyte-derived DNA was genotyped with the Illumina Infinium Global Screening Array (GSA) v3.0 (Illumina, Inc., San Diego, CA, USA) designed by the Global Screening Array Consortium using a semiautomated protocol. All laboratory procedures were performed in accordance with the manufacturer's instructions. Illumina raw intensity files were uploaded with the Illumina GSA manifest and cluster file into the GenomeStudio software, and genotypes were subsequently exported to PLINK format.

Norwegian samples

SNP genotyping for the two variants was performed using TaqMan® assays (ThermoFisher Scientific) and TaqPath ProAmp Master Mix (Applied Biosystems, ThermoFisher Scientific) according to the manufacturer’s instructions, with minor modifications. The two TaqMan assays targeted rs1801131 (A/C, Chr.1: 11794419 on GRCh38) and rs1801133 (G/A, Chr.1: 11796321 on GRCh38) (ThermoFisher Scientific catalogue nr 4351376). In brief, approximately 0.75 ng DNA was used as input for the 10 µl SNP TaqMan assays run in 384-well plates. A master mix was prepared, containing 5.0 µl TaqPath ProAmp Master Mix, 0.25 µl TaqMan SNP Genotyping Assay (40X), 3 µl genomic DNA or NTC and 1.75 µl water.

The SNP genotyping assay was performed on a real-time PCR instrument (QuantStudio™ 5 Real-Time PCR System, Applied Biosystems, Thermo Fisher Scientific) under the following conditions: pre-read (60 °C for 30 s), initial denature/enzyme activation (95 °C for 5 min), cycling for 40 cycles (95 °C for 15 s, 60 °C for 30 s and 60 °C for 60 s) and post-read (60 °C for 30 s). SNP genotypes were obtained by the QuantStudio™ 5 Real-Time PCR System software.

Spanish samples

Leukocyte-derived DNA samples were genotyped with the Illumina Global Screening Array-24 v2.0 and v3.0 designed by the Global Screening Array Consortium (GSA). Samples were genotyped at once (24 samples/array). As internal controls, 23 unique samples belonging to the HapMap project were also included in duplicate to measure the experiment's reproducibility. Genotyping was performed at CEGEN (Centro Nacional de Genotipado, Instituto de Salud Carlos III, Spain).

Statistics

Statistical analyses were performed using R version 4.1.1 (2021-08-10) (R Foundation for Statistical Computing, Vienna, Austria). Pearson’s Chi-square test was used to evaluate deviation from the expected Hardy–Weinberg equilibrium using a web-based program (http://www.dr-petrek.eu/documents/HWE.xls). For each SNP, variation in age at cancer onset by genotype was examined using Kaplan–Meier plots. Cancer-free individuals were censored at their age at last follow-up. Kaplan–Meier survival curves stratified by genotype are provided with p-values from log-rank tests assessing whether age at cancer onset differed by genotype.
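To illustrate the censoring rule described above, the following minimal Python sketch computes a Kaplan–Meier estimate of the probability of remaining cancer-free by age. It is not the R code used in the study: the function name `kaplan_meier` and the toy ages are hypothetical, and it is assumed that an event flag of 1 marks a cancer diagnosis while 0 marks a cancer-free individual censored at last follow-up.

```python
def kaplan_meier(ages, events):
    """Return [(age, cancer_free_probability)] at each age with a diagnosis.

    ages   -- age at cancer onset, or age at last follow-up if censored
    events -- 1 if cancer was diagnosed at that age, 0 if censored
    """
    data = sorted(zip(ages, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        age = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all individuals sharing the same age.
        while i < len(data) and data[i][0] == age:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((age, survival))
        # Diagnosed and censored individuals both leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Hypothetical toy data: two diagnoses (ages 40 and 50), two censored individuals.
toy_curve = kaplan_meier([40, 45, 50, 55], [1, 0, 1, 0])
for age, prob in toy_curve:
    print(age, round(prob, 3))
```

Censored individuals contribute to the number at risk up to their last follow-up but never lower the curve, which is why the estimate drops only at ages with a diagnosis.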

In the total sample, the association between SNP genotype and age at cancer onset (risk of cancer) was analysed using a Cox proportional hazards gamma shared frailty model to allow for the relatedness of some individuals within a single family group. Two models were fitted: a crude model containing genotype only and a model additionally adjusted for gender, country and gene.

The risk of cancer was also estimated for each SNP by genotype and gene (excluding individuals with pathogenic variants in PMS2 or EPCAM due to low sample numbers in the rare genotypes) using the Cox proportional hazards gamma shared frailty model as above. Two models were fitted: a crude model containing gene and genotype and their interaction, and a model additionally including gender and country as covariates. Hazard ratios, 95% confidence intervals and p-values are reported.

In addition, Kaplan–Meier and Cox proportional hazards gamma shared frailty analyses were performed to explore the relationship between the number of protective alleles across both SNPs and age at cancer onset and cancer risk (aggregated effect of protective alleles). The protective alleles were C for A1298C (rs1801131) and T for C677T (rs1801133).
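The allele count used for the aggregated analysis can be sketched as follows. This is an illustrative Python helper rather than the study's own code: the names `PROTECTIVE`, `count_protective_alleles` and `allele_group` are hypothetical, and genotypes are assumed to be coded as two-letter strings in the A1298C (A/C) and C677T (C/T) conventions. The pooling of 3 and 4 alleles into one group mirrors the grouping reported in the Results.

```python
# Protective allele for each SNP, as stated in the text.
PROTECTIVE = {"rs1801131": "C", "rs1801133": "T"}


def count_protective_alleles(genotypes):
    """Count protective alleles (0-4) across both SNPs.

    genotypes -- dict mapping rsID to a two-letter genotype, e.g. "AC".
    The count is done per SNP because C is protective at rs1801131
    but is the non-protective allele at rs1801133.
    """
    total = 0
    for snp, allele in PROTECTIVE.items():
        total += genotypes[snp].upper().count(allele)
    return total


def allele_group(count):
    """Pool 3 and 4 protective alleles into a single '3-4' group."""
    return "3-4" if count >= 3 else str(count)


# Hypothetical individual: heterozygous at both SNPs.
example = {"rs1801131": "AC", "rs1801133": "CT"}
print(allele_group(count_protective_alleles(example)))
```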

P-values less than 0.025 were considered statistically significant after applying a Bonferroni correction for the two SNPs analysed.
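As a small worked check of this threshold, the sketch below divides a family-wise significance level by the number of tests. The family-wise level of 0.05 is an assumption implied by, but not stated in, the text (0.025 for two SNPs), and the function names are hypothetical.

```python
def bonferroni_threshold(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests


def is_significant(p_value, family_alpha=0.05, n_tests=2):
    """True if p_value falls below the corrected threshold."""
    return p_value < bonferroni_threshold(family_alpha, n_tests)


# Assumed family-wise alpha of 0.05 split across the two SNPs.
print(bonferroni_threshold(0.05, 2))
print(is_significant(0.012), is_significant(0.044))
```

Under this rule a p-value such as 0.044 would pass an uncorrected 0.05 cut-off but not the corrected one.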

Results

The analysis included 2,723 individuals with a molecular genetic diagnosis of LS, carrying pathogenic or likely pathogenic variants in MLH1, MSH2, MSH6, PMS2 or EPCAM (see Table 1A for LS individual demographics). Of these, 127 samples were excluded from the study due to insufficient DNA quantity for genotyping or missing/undetermined genotyping information for both SNPs. Of the samples with informative genotyping data, three had missing/failed information for A1298C and 14 for C677T, making the sample size 2,593 for A1298C (rs1801131) and 2,582 for C677T (rs1801133). Demographics of the sample by country and genotypes for the two SNPs are shown in Tables 1B and 2, respectively. Genotype distributions were consistent with Hardy–Weinberg equilibrium for A1298C (rs1801131) (p = 0.126) and C677T (rs1801133) (p = 0.099). The mean age of cancer onset in this sample population was 47 years (54 years for MSH6 and 44 years for both MLH1 and MSH2 variant carriers).
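For readers who wish to reproduce a Hardy–Weinberg check of the kind reported above, the following Python sketch implements a Pearson chi-square test with one degree of freedom for a biallelic SNP. It is not the web-based spreadsheet used in the study; the function name `hwe_chi_square` and the genotype counts are hypothetical, and the p-value relies on the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def hwe_chi_square(n_hom_ref, n_het, n_hom_alt):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Returns (chi_square, p_value) for the observed genotype counts.
    """
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, not taken from Table 2.
chi2, p_value = hwe_chi_square(30, 40, 30)
print(round(chi2, 3), round(p_value, 4))
```

One degree of freedom is used because three genotype classes are compared after estimating a single allele frequency from the data.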

Table 2 Genotype frequencies and percentages for the sample cohort, total LS cohort and divided by country.

Overall, no significant associations (p < 0.025) were observed when the data set was analysed using LS cancer in LS individuals as the endpoint of analysis. Kaplan–Meier analysis showed that within all genes, LS individuals with the SNP A1298C (rs1801131) AA genotype appeared more likely to develop LS cancer earlier than individuals with genotypes AC or CC, but the difference was not statistically significant. The same was true for Cox regression analysis: LS individuals with SNP A1298C (rs1801131) genotypes AC and CC were less likely to develop LS cancer than those with the AA genotype, but the difference was not significant (see Table 3). Results using CRC as the endpoint of analysis are summarised in Tables 4 and 5.

Table 3 Displays the results for the crude and adjusted (gender, country and gene included as covariates) regression for SNP rs1801131(A > C) and rs1801133(C > T) in the whole sample (LS-related cancer) across all genes including EPCAM and PMS2.
Table 4 Displays the results for the crude and adjusted (gender, country and gene included as covariates) regression for SNP rs1801131(A > C) and rs1801133(C > T) in the CRC sample across all genes including EPCAM and PMS2.
Table 5 Displays results from Cox Regression analysis using the genotype results for A1298C (rs1801131) from the LS CRC sample cohort divided by individual MMR genes adjusted for gender and country.

Risk of CRC

As expected, individuals with germline variants in MSH6 demonstrated a reduced risk of CRC (mean age of onset 54 years) compared to both MLH1 and MSH2 germline variant carriers (both with a mean age of onset of 44 years). This was consistent across all genotypes for both SNPs in the current study (see Figs. 1 and 2). The same was observed when using LS cancer as the endpoint of analysis (data not shown).

Figure 1

Displays C677T (rs1801133) hazard ratios for risk of CRC in MLH1, MSH2 and MSH6 pathogenic variant carriers.

Figure 2

Displays A1298C (rs1801131) hazard ratios for risk of CRC in MLH1, MSH2 and MSH6 pathogenic variant carriers.

In Cox regression analysis adjusted for gender, country of sample origin and mutated MMR gene, LS individuals with A1298C (rs1801131) genotypes AC and CC were less likely to develop CRC than those with genotype AA (17% estimated reduction in risk; HR 0.83 (CI 0.72–0.96), p = 0.012 and 22% reduction in risk; HR 0.78 (CI 0.61–0.99), p = 0.044 respectively, see Table 4). Given the adjusted significance threshold of 0.025, only the AC genotype was associated with a significant reduction in risk. No significant difference between genotypes for C677T (rs1801133) and risk of CRC was observed (see Table 4).
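The percentage reductions quoted alongside the hazard ratios follow from the relation reduction = (1 − HR) × 100. A brief Python sketch of that arithmetic is given below; the function name is hypothetical, and the inputs are the hazard ratios reported in this section.

```python
def percent_risk_reduction(hazard_ratio):
    """Convert a hazard ratio below 1 into a percentage reduction in risk."""
    return (1 - hazard_ratio) * 100


# Hazard ratios reported for A1298C genotypes AC and CC versus AA.
for genotype, hr in (("AC", 0.83), ("CC", 0.78)):
    print(genotype, round(percent_risk_reduction(hr)))
```

The same conversion applies to the confidence limits, so an upper limit close to 1 corresponds to a reduction close to zero.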

In the analysis by mutated MMR gene (PMS2 and EPCAM excluded due to low sample number), among individuals with germline pathogenic variants in the MLH1 gene, those with the CC genotype of A1298C (rs1801131) had a 39% lower risk of developing CRC than individuals with the AA genotype (HR 0.61 (CI 0.42–0.89), p = 0.011, see Table 5 and Fig. 2). No significant association was found for C677T (rs1801133) (see Table 6). Interestingly, MSH2 variant carriers with the AC genotype for rs1801131 had a significantly reduced risk of CRC, with a 26% reduction compared to those with the AA genotype (HR 0.74 (CI 0.58–0.93), p = 0.010, see Table 5 and Fig. 2), whereas those with the CC genotype did not. Again, results were not significant for rs1801133 (see Table 6).

Table 6 Displays results from Cox Regression analysis using the genotype results for C677T (rs1801133) from the LS-related cancer sample cohort divided by individual MMR genes adjusted for gender and country.

Aggregated effect of combined protective alleles

The aggregated effect of combined protective alleles from the two SNPs was explored. Due to low numbers of LS individuals carrying 3 or 4 protective alleles, these were combined into one group (3–4 alleles). A later age of onset of CRC was seen for the LS individuals with 3–4 protective alleles, but the difference was not significant at the adjusted significance threshold (p = 0.04). Cox regression analysis showed that LS individuals with one or two protective alleles were significantly less likely to develop CRC than those with no protective allele. Having one protective allele was associated with a 26% reduction in risk (HR 0.74 (CI 0.59–0.92), p = 0.006), and having two protective alleles with a 27% reduction (HR 0.73 (CI 0.58–0.91), p = 0.006). However, no significant reduction was observed for individuals with 3–4 protective alleles (HR 0.89 (CI 0.40–2.00), p = 0.8), a group whose small size is reflected in the wide confidence interval (see Table 7 and Fig. 3).

Table 7 Displays results from Cox Regression analysis using the combined number of protective alleles for A1298C (rs1801131) and C677T (rs1801133) from the CRC sample cohort adjusted for gene, gender, and country.
Figure 3

Displays the aggregated effect of protective alleles C (A1298C (rs1801131)) and T (C677T (rs1801133)) hazard ratios for risk of CRC in 0, 1, 2 and 3–4 protective alleles.

Discussion

Few studies have investigated the modifying effect of MTHFR SNPs on the risk of CRC in LS individuals, and their results are conflicting19,21. In this analysis, we aimed to verify previous findings on the modifying effect of MTHFR polymorphisms on LS expression by increasing the size of the analysed cohort. The current study explores the role of two common MTHFR SNPs, A1298C (rs1801131) and C677T (rs1801133), and their effect on cancer risk in individuals with a molecular genetic diagnosis of LS.

These SNPs are thought to be involved in the development of cancer, especially CRC, by altering MTHFR activity, which in turn reduces the silencing of tumour suppressor genes and increases the availability of nucleotides for DNA synthesis and repair, thereby protecting against early-onset cancer in LS15,21. The current study indicates that these SNPs, A1298C in particular, are associated with CRC risk but not with LS cancer risk as a whole. The effect of C677T (rs1801133) is well established, with the variant allele resulting in a thermolabile enzyme with 65% (CT) and 30% (TT) enzyme activity, respectively, compared to the wildtype genotype (CC)23,25. Several studies have found that LS individuals carrying one or more variant alleles of this SNP have a reduced risk of CRC19,21,26,27,28,29,30,31,32. For A1298C (rs1801131), the reduction in MTHFR activity results in an enzyme with 85% activity for the AC genotype and 70% for the CC genotype compared to the AA genotype33,34. Research on A1298C (rs1801131) and cancer risk has produced inconclusive results, and studies are often limited by small sample size26,27,32,35. However, some studies suggest that carrying one or two C alleles of A1298C protects against developing CRC21,29,32.

The LS cohorts in the current study are consistent with the published literature: individuals carrying germline MSH6 pathogenic variants have a reduced risk of developing cancer compared to carriers of MLH1 and MSH2 pathogenic variants4,5,7,8,11,12.

Our findings show that, irrespective of the mutated MMR gene, individuals with the AC genotype of the A1298C (rs1801131) SNP have a significantly reduced risk of developing CRC (17%) compared to individuals with the AA genotype. The heterozygous AC genotype has previously been shown to reduce the risk of CRC21,29,32, supporting the protective effect of the C allele. Individuals with the CC genotype also have a 22% reduced risk of CRC compared to the AA genotype, but this reduction was not statistically significant. Our results are similar to those of other studies24,32,35,36. Still, controversial results have been published showing an increased risk for genotype CC19,27, which was not confirmed in the current analysis. The small sample size of the CC group in the current study, reflected in the wide confidence interval, likely limited our power to estimate this effect.

Furthermore, we found that individuals with germline pathogenic variants in MLH1 and the CC genotype of A1298C (rs1801131) had a significantly reduced risk of developing CRC (39%) compared to MLH1 variant carriers with the AA genotype, indicating that the underlying germline MMR variant is important when looking at the modifying effects of MTHFR polymorphisms. These genotypes will be of even more interest once polygenic risk scores become better defined. Our findings also showed that individuals with MSH2 pathogenic variants and A1298C (rs1801131) genotype AC had a significantly reduced risk of developing CRC (26%) compared to individuals with MSH2 pathogenic variants and genotype AA, suggesting that the heterozygous genotype has the strongest protective effect for these individuals.

MTHFR is an important folate-metabolising enzyme that regulates DNA methylation and synthesis. Increased MTHFR activity has been theorised to result in earlier CRC onset, owing to the hypermethylation of tumour suppressor genes and the depletion of nucleotides available for DNA synthesis and repair. A limitation of our study was the inability to account for lifestyle and environmental factors, particularly folate status. It has been well established that adequate dietary folate consumption reduces cancer risk due to the hypermethylation of oncogenes37.

In conclusion, our study explored the association between MTHFR polymorphisms C677T (rs1801133) and A1298C (rs1801131) and the risk of developing CRC in LS individuals. We have shown that two genotypes (AC and CC) of SNP A1298C might have a protective effect on CRC development that differs between MLH1 and MSH2 germline variant carriers, which may explain some of the previous inconsistencies in results for this SNP and risk of CRC in LS individuals. In addition, we show that carrying one or two protective alleles across the two SNPs combined is associated with a reduced risk of CRC. Our study suggests that MTHFR genotypes, together with the underlying germline MMR gene, might be useful in an algorithm predicting the risk of developing CRC for individuals diagnosed with LS. The current study may also provide guidance for CRC risk estimation in LS individuals and contribute to reducing the current health, social and economic burden of cancer development in LS individuals.