A large-scale meta-analysis to refine colorectal cancer risk estimates associated with MUTYH variants

Background: Defective DNA repair has a causal role in hereditary colorectal cancer (CRC). Defects in the base excision repair gene MUTYH are responsible for MUTYH-associated polyposis and for CRC predisposition as an autosomal recessive trait. Numerous reports have suggested that mono-allelic MUTYH variants are low-penetrance risk alleles. We report a large collaborative meta-analysis to assess and refine CRC risk estimates associated with bi-allelic and mono-allelic MUTYH variants and to investigate the influence of age and sex on risk. Methods: MUTYH genotype data were included from 20 565 cases and 15 524 controls. Three logistic regression models were tested: a crude model; a model adjusted for age and sex; and a model adjusted for age, sex and study. Results: All three models produced very similar results. MUTYH bi-allelic carriers had a 28-fold increase in risk (95% confidence interval (CI): 6.95–115). Significant bi-allelic effects were also observed for G396D homozygotes and Y179C/G396D compound heterozygotes, and a marginal mono-allelic effect was observed for the Y179C variant (odds ratio (OR) = 1.34; 95% CI: 1.00–1.80). A pooled meta-analysis of all published and unpublished datasets submitted showed bi-allelic effects for combined MUTYH variants, G396D and Y179C (OR = 10.8, 95% CI: 5.02–23.2; OR = 6.47, 95% CI: 2.33–18.0; OR = 3.35, 95% CI: 1.14–9.89, respectively) and marginal mono-allelic effects for combined MUTYH variants (OR = 1.16, 95% CI: 1.00–1.34) and for Y179C alone (OR = 1.34, 95% CI: 1.01–1.77). Conclusions: Overall, this large study refines estimates of the disease risk associated with mono-allelic and bi-allelic MUTYH variants.

2005; Zhou et al, 2005; Moreno et al, 2006; Tenesa et al, 2006; Webb et al, 2006; Küry et al, 2007; Cleary et al, 2009; Lubbe et al, 2009). Although an increased CRC risk associated with bi-allelic MUTYH mutations is incontrovertible, the risk associated with a single mutant MUTYH allele is controversial (Croitoru et al, 2004; Farrington et al, 2005; Jenkins et al, 2006; Tenesa et al, 2006; Webb et al, 2006; Cleary et al, 2009; Jones et al, 2009; Lubbe et al, 2009). Statistically significant or near-significant mono-allelic MUTYH effects have been reported in several studies, possibly with age-specific effects, but the rarity of the alleles, combined with the small increase in CRC risk they confer, has made these findings difficult to replicate. A recent risk analysis of MAP family members agreed with previous family-based findings (Jenkins et al, 2006) that mono-allelic carriers have a two-fold increased risk of CRC, providing further evidence of a mono-allelic effect of the gene. However, family-based studies can be subject to ascertainment bias, and any mono-allelic effect could be modified by other inherited factors, including alleles at other loci. Furthermore, environmental risk factors also show familial aggregation, so studies in which cases were selected on the basis of family history may be confounded. Bi-allelic carriers may develop CRC because of the predominant effect of MUTYH, whereas in affected siblings with mono-allelic mutations the environmental contribution may be greater, yet the risk is ascribed to the MUTYH allele. Thus, further work is required to resolve the question of mono-allelic carrier risk.
To clarify the role of MUTYH in disease risk, we initiated a multi-centre collaboration allowing a large-scale meta-analysis of individual MUTYH variants, with particular interest in determining whether there are age- and sex-specific effects on the association of MUTYH variants with CRC. In this study, we present the results of this collaborative meta-analysis.

Participating studies
Relevant case-control studies to be invited for inclusion in the meta-analysis of the effect of MUTYH on CRC risk were identified by a literature search of the ISI Web of Science (http://wok.mimas.ac.uk) and PUBMED (http://www.ncbi.nlm.nih.gov/pubmed/) bibliographic databases, using the search terms 'MYH or MUTYH and CRC'. The initial search identified 55 studies, eight of which were considered for our study (Enholm et al, 2003; Croitoru et al, 2004; Fleischmann et al, 2004; Kambara et al, 2004; Wang et al, 2004; Farrington et al, 2005; Peterlongo et al, 2005; Zhou et al, 2005). The inclusion criteria were that patients had to be diagnosed with CRC and that studies had to have genotype data for both cases and controls. Ten additional studies were identified as the project progressed: Webb et al (2006), Moreno et al (2006), Küry et al (2007), Cleary et al (2009), Lubbe et al (2009), Colebatch et al (2006), Balaguer et al (2007) and Avezzù et al (2008), together with unpublished data from Koessler T and Pharoah PD and from Tomlinson (personal communication). The studies of Colebatch et al (2006), Balaguer et al (2007) and Avezzù et al (2008) were used in the pooled meta-analysis of all available published and unpublished datasets.
The principal investigators (PIs) of the selected studies were contacted and asked to participate by providing a minimum dataset including the variables necessary for the analysis (Supplementary Box 1: Study questionnaire; Supplementary Table 1: Data extraction table). Where PIs failed to respond to our invitation, reminder letters were sent. It was not possible to include data from the following studies in the logistic regression analyses because (i) data were only available for cases that were heterozygous or homozygous for a MUTYH mutation (Enholm et al, 2003); (ii) covariate data were only available for cases, as controls were anonymous blood donors (Zhou et al, 2005; Tomlinson, unpublished data); or (iii) we were unable to communicate with the investigators (Kambara et al, 2004 and Wang et al, 2004). The studies by Fleischmann et al (2004) and Webb et al (2006) were not used because they had been superseded by a later study (Lubbe et al, 2009), which was included.

Statistical analysis
Data from all collaborating centres were checked for completeness, coded and merged to form a core database. MUTYH defects were considered pathogenic only if there was published evidence of their pathogenicity. Individuals reported in the original study to have two MUTYH defects were classified as mutated/mutated (MM), those with one defect as wild type/mutant (WM) and those with no mutation as wild type/wild type (WW). Descriptive statistics were produced for all subject characteristics, risk factors and event data. Genotype distributions in the controls of each case-control study were tested for Hardy-Weinberg equilibrium, and genotype distributions between all groups were compared by χ² test.
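As a minimal illustration of the control-group Hardy-Weinberg check, the Python sketch below computes the variant allele frequency and a 1-degree-of-freedom Pearson χ² statistic from WW/WM/MM counts. The function name and example counts are hypothetical, and the source does not state which HWE test implementation was used (exact tests are often preferred for very rare alleles).

```python
import math


def hwe_chi_square(n_ww: int, n_wm: int, n_mm: int) -> tuple[float, float, float]:
    """Pearson chi-square (1 df) test of Hardy-Weinberg equilibrium.

    Takes control genotype counts (WW, WM, MM) and returns
    (variant allele frequency q, chi-square statistic, P-value).
    """
    n = n_ww + n_wm + n_mm
    q = (n_wm + 2 * n_mm) / (2 * n)  # frequency of the variant (M) allele
    if q in (0.0, 1.0):
        raise ValueError("locus is monomorphic in this sample; test undefined")
    p = 1.0 - q
    observed = (n_ww, n_wm, n_mm)
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # HWE expectations
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return q, chi2, p_value


# Hypothetical control counts, not data from this study
q, chi2, p = hwe_chi_square(980, 20, 0)
```

With a rare allele, the expected MM count is far below one, so even a couple of homozygous controls can dominate the statistic (for example, `hwe_chi_square(990, 8, 2)` gives a very small P-value).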
Three logistic regression models were applied to address confounding covariates (model I: crude; model II: including covariates for age and sex; model III: including covariates for age, sex and study) on the combined datasets, investigating the effect of MUTYH defects (WM vs WW and MM vs WW) as well as of the individual mutations Y179C (c.536A>G/p.Tyr179Cys; AA = WW, GG = MM) and G396D (c.1187G>A/p.Gly396Asp; GG = WW, AA = MM; previously known as Y165C and G382D), to identify any variant-specific associations. To assess the effect of age and sex on the association of the variants with disease risk, the three logistic regression models were also applied after stratification by sex and by age (over 55 years vs 55 years or under), as previously described (Farrington et al, 2005). Across all studies, interactions between the MUTYH variants and study code (identifying each individual study) were estimated; similarly, interactions between MUTYH variants and hormone replacement therapy (HRT) were estimated among female participants in three studies (the Scottish SOCCS study and the studies of Croitoru et al, 2004 and Cleary et al, 2009). The associations of the genetic factor (i.e., one of the MUTYH mutations) and of the study code or environmental factor (i.e., HRT) with disease were assessed, and interaction was tested by fitting nested multiplicative models with and without an interaction term. To assess any small-study effects, we performed funnel plot analysis and tested for significance using the Harbord test.
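For a single binary genotype indicator (e.g. WM vs WW), the crude model I logistic regression reduces to the 2×2 odds ratio, and its Wald 95% CI equals the Woolf log-OR interval. The sketch below assumes that simple case and applies it within age strata; models II and III require a full regression fit with covariates (the authors used STATA). The class, function names and counts are hypothetical.

```python
import math
from dataclasses import dataclass

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


@dataclass
class GenotypeCounts:
    """2x2 counts for one comparison, e.g. WM vs WW (hypothetical names)."""
    cases_carrier: int
    cases_ref: int
    controls_carrier: int
    controls_ref: int


def crude_or(t: GenotypeCounts) -> tuple[float, float, float]:
    """Crude OR with Woolf (Wald, log-scale) 95% CI.

    For one binary genotype indicator this matches the OR and Wald CI of an
    unadjusted logistic regression of case status on that indicator.
    """
    cells = (t.cases_carrier, t.cases_ref, t.controls_carrier, t.controls_ref)
    if min(cells) == 0:
        raise ValueError("zero cell: OR undefined without a continuity correction")
    log_or = math.log(
        (t.cases_carrier * t.controls_ref) / (t.cases_ref * t.controls_carrier)
    )
    se = math.sqrt(sum(1 / x for x in cells))
    return math.exp(log_or), math.exp(log_or - Z_95 * se), math.exp(log_or + Z_95 * se)


def stratified_crude_or(
    strata: dict[str, GenotypeCounts],
) -> dict[str, tuple[float, float, float]]:
    """Apply the crude model separately within each stratum (e.g. age group)."""
    return {label: crude_or(t) for label, t in strata.items()}
```

For example, `stratified_crude_or({"age <= 55": GenotypeCounts(30, 1500, 20, 1400), "age > 55": GenotypeCounts(60, 4000, 45, 3800)})` returns one (OR, lower, upper) tuple per age stratum, again using made-up counts.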
Finally, the relationship between the genotype and CRC was analysed by meta-analysis, combining the effect estimates of all published and unpublished datasets.
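The source does not state which fixed-effect estimator was used for pooling, so the following is a sketch that assumes inverse-variance weighting of per-study log ORs (Mantel-Haenszel pooling is a common alternative). It adds 0.5 to all four cells of any table containing a zero and also returns Cochran's Q as a simple between-study heterogeneity statistic. The table layout and function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def log_or_and_var(a: int, b: int, c: int, d: int, cc: float = 0.5) -> tuple[float, float]:
    """Log OR and its variance for one study table.

    a = cases carrier, b = cases reference, c = controls carrier,
    d = controls reference. If any cell is zero, cc is added to all four cells.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def fixed_effect_pool(tables):
    """Inverse-variance fixed-effect pooled OR, its 95% CI and Cochran's Q."""
    estimates = [log_or_and_var(*t) for t in tables]
    weights = [1 / v for _, v in estimates]
    total_w = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / total_w
    se = math.sqrt(1 / total_w)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, estimates))
    return math.exp(pooled), math.exp(pooled - Z_95 * se), math.exp(pooled + Z_95 * se), q
```

Each element of `tables` is one study's (a, b, c, d) tuple; a Q much larger than its degrees of freedom (number of studies minus one) would suggest that study-specific ORs differ.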
All statistical analyses were conducted using Intercooled STATA version 10.0 (Stata Corp, College Station, TX, USA). For the logistic regression analyses, a whole number had to be added to any cells containing 0 (model Ia in Table 2), which reduces the final OR. By using the META command in the STATA meta-analysis programme, a smaller value can be added instead (0.5; model Ib in Table 2), giving a more accurate assessment of risk. However, this is a grouped analysis and therefore cannot be adjusted for confounding covariates, such as age, sex and study, as in models II and III. To account for multiple testing, we applied the Bonferroni correction, giving a P-value threshold for significance of 0.003.
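The effect of the size of the added constant can be seen with a hypothetical bi-allelic comparison in which no controls are MM: the true cross-product OR is infinite, and a larger added constant shrinks the estimate further towards 1. The sketch below compares adding 1 and adding 0.5, and includes the Bonferroni per-test threshold (alpha divided by the number of tests). The counts, the function names and the number of tests are illustrative assumptions; the paper does not report how many tests were counted.

```python
def or_with_constant(a: float, b: float, c: float, d: float, k: float) -> float:
    """OR after adding constant k to every cell.

    a = cases carrier, b = cases reference, c = controls carrier,
    d = controls reference.
    """
    return ((a + k) * (d + k)) / ((b + k) * (c + k))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


# Hypothetical MM vs WW comparison with a zero cell (no MM controls)
counts = (10, 1000, 0, 1000)
or_add_1 = or_with_constant(*counts, k=1)       # whole-number correction: 11.0
or_add_half = or_with_constant(*counts, k=0.5)  # 0.5 correction: 21.0
```

Here adding 0.5 instead of 1 roughly doubles the estimate, which illustrates why the smaller correction used in model Ib gives a less attenuated OR.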

RESULTS
Table 1 summarises the studies included in our combined analysis (a total of 20 565 cases and 15 524 controls). Both variant alleles are rare: the G396D variant allele has a frequency of 0.007 in controls and the Y179C variant allele a frequency of 0.002. Tests for deviation from Hardy-Weinberg equilibrium in controls gave P = 0.99 and P < 0.00005 for the G396D and Y179C variants, respectively.

Bi-allelic effect of MUTYH
All three models of the logistic regression analysis gave consistent results, so the results of the crude analysis (model I) are described below and presented in Table 2; results of the other two models can be found in Supplementary

Colorectal cancer risk associated with mono-allelic MUTYH mutations
The results of the combined analysis show no significant mono-allelic effects for either G396D or combined MUTYH variants (Table 2). However, the Y179C variant was associated with increased disease risk in the heterozygous state in the whole sample set (OR = 1.34; 95% CI: 1.00–1.80), and when stratified by sex, a mono-allelic effect was observed in males (OR = 1.70; 95% CI: 1.06–2.73). After Bonferroni correction, however, these mono-allelic effects did not remain significant.
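To see why these estimates fall short of the Bonferroni threshold of 0.003, a two-sided Wald P-value can be approximated from a reported OR and its 95% CI by recovering the standard error of log(OR) from the CI width. This is a back-of-the-envelope check on rounded published values, assuming symmetric Wald intervals on the log scale, and not the authors' own calculation.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI
BONFERRONI_THRESHOLD = 0.003  # threshold stated in the Statistical analysis section


def wald_p_from_ci(odds_ratio: float, lower: float, upper: float) -> float:
    """Approximate two-sided Wald P-value from an OR and its 95% CI.

    The SE of log(OR) is recovered from the CI width on the log scale;
    rounding in published values makes this an approximation.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail


p_all = wald_p_from_ci(1.34, 1.00, 1.80)   # Y179C heterozygotes, whole sample
p_male = wald_p_from_ci(1.70, 1.06, 2.73)  # Y179C heterozygotes, males
print(f"P(all) ~ {p_all:.3f}, P(males) ~ {p_male:.3f}")
print("below Bonferroni threshold:", p_all < BONFERRONI_THRESHOLD, p_male < BONFERRONI_THRESHOLD)
```

Under these assumptions the whole-sample estimate corresponds to P of roughly 0.05 and the male estimate to roughly 0.03, both an order of magnitude above 0.003.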

The role of study population and HRT in modulating CRC risk
We hypothesised that the origin of the data, that is, the study population, might modify the association between genotype and CRC risk. However, there was no evidence of an interaction between study code and MUTYH genotype (Supplementary Table 5). Similarly, the effect of HRT intake, a known risk factor for CRC (Chan et al, 2006; Theodoratou et al, 2008), might depend on genotype and therefore modulate risk in women. Both the Scottish and Canadian datasets had recorded HRT intake, and these data were used to test for an interaction between HRT and MUTYH genotype. Across both datasets, there was no evidence of any interaction between HRT and MUTYH genotype (Supplementary Table 6).

Meta-analysis of published and unpublished datasets
A meta-analysis of the published and unpublished datasets submitted to us, estimating the effect of combined MUTYH defects, showed a pooled fixed-effect bi-allelic OR of 10.8 (95% CI: 5.02–23.2) for the MM genotype and a pooled fixed-effect mono-allelic OR of 1.16 (95% CI: 1.00–1.34) for the WM genotype (Table 3; Figures 1 and 2). Pooled meta-analysis of the specific variants showed bi-allelic effects for both G396D and Y179C (OR = 6.47, 95% CI: 2.33–18.0; and OR = 3.35, 95% CI: 1.14–9.89, respectively), and, in agreement with the logistic regression results, the Y179C variant also showed a very similar pooled fixed-effect mono-allelic OR of 1.34 (95% CI: 1.01–1.77; Tables 4 and 5; Supplementary Figures 1–4).

Assessment of study publication bias
Funnel plots for both the mono- and bi-allelic effects were created to assess whether study size was significantly influencing the results. These plots appeared asymmetric, but Harbord's test for small-study effects indicated that this asymmetry was not statistically significant (Supplementary Figures 5 and 6).
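Harbord's test can be sketched as follows, assuming its usual score-based form: for each study, Z is the efficient score (observed minus expected carrier cases under the null) and V its hypergeometric variance; Z/√V is regressed on √V by ordinary least squares, and an intercept far from zero suggests small-study effects. The function name, table layout and example counts are hypothetical, and this is not the STATA implementation used by the authors.

```python
import math


def harbord_test(tables):
    """Harbord-style regression test for small-study effects.

    Each table is (a, b, c, d) = (cases carrier, cases reference,
    controls carrier, controls reference). Returns
    (intercept, t statistic, degrees of freedom).
    """
    xs, ys = [], []
    for a, b, c, d in tables:
        n = a + b + c + d
        n_carrier, n_ref = a + c, b + d
        n_cases, n_controls = a + b, c + d
        score = a - n_carrier * n_cases / n  # observed minus expected carrier cases
        var = n_carrier * n_ref * n_cases * n_controls / (n * n * (n - 1))
        xs.append(math.sqrt(var))
        ys.append(score / math.sqrt(var))
    k = len(xs)
    if k < 3:
        raise ValueError("need at least three studies")
    x_bar, y_bar = sum(xs) / k, sum(ys) / k
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    if rss == 0:
        raise ValueError("perfect fit: residual variance is zero")
    se_intercept = math.sqrt(rss / (k - 2) * (1 / k + x_bar ** 2 / sxx))
    return intercept, intercept / se_intercept, k - 2


# Hypothetical per-study tables, not data from this study
example = [(10, 1000, 5, 1000), (3, 200, 1, 210), (25, 3000, 18, 2900), (6, 500, 2, 480)]
intercept, t_stat, df = harbord_test(example)
```

The t statistic is then referred to a t distribution with k − 2 degrees of freedom (for example with `scipy.stats.t.sf`) to obtain a P-value.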

DISCUSSION
This large meta-analysis refines current estimates of the CRC risk associated with mutations in the MUTYH gene. In the logistic regression analysis, bi-allelic carriers of combined MUTYH mutations (MM) had a 28-fold (95% CI: 6.95–115) increase in CRC risk. Bi-allelic carriers of the G396D variant and Y179C/G396D compound heterozygotes were also significantly associated with a similar increase in CRC risk (OR = 23.1, 95% CI: 3.15–169; and OR = 21.6, 95% CI: 2.94–159, respectively). Although the risk estimate was slightly lower from the overall larger pooled meta-analysis of published and unpublished datasets (OR = 10.8 (2006) and Fleischmann et al (2004).
recessive Mendelian disease, whereas the results for Y179C are more complex, and there is therefore some argument against combining the two variants. However, the Y179C/G396D compound heterozygote analysis demonstrates an increase in risk similar to that in G396D bi-allelic carriers, suggesting that the two variants are complementary and that analysis of combined MUTYH mutations, as historically performed, is appropriate for assessing risk for the whole gene. The rarity of the Y179C allele has made it difficult to assess its effect on disease risk; however, the large numbers analysed in this report have allowed us to demonstrate that both bi-allelic and mono-allelic Y179C variants are associated with disease risk. The study population did not appear to modulate disease risk and, although the study replicated the reported decrease in disease risk associated with HRT intake in MUTYH wild-type females (Chan et al, 2006; Theodoratou et al, 2008), we found no interaction between HRT and the MUTYH gene or its variants. Therefore, HRT intake is unlikely to explain any sex variation in risk, and other genetic factors may be involved in modifying CRC risk.
Evidence of a mono-allelic MUTYH effect on CRC has been reported in several case-control studies (Croitoru et al, 2004; Wang et al, 2004; Farrington et al, 2005; Zhou et al, 2005; Tenesa et al, 2006; Cleary et al, 2009) and family-based studies (Jenkins et al, 2006; Jones et al, 2009), but not in others (Kambara et al, 2004; Webb et al, 2006; Balaguer et al, 2007; Lubbe et al, 2009). Our large meta-analysis has demonstrated a marginally significant association for the specific variant Y179C, highlighting the possible increased phenotypic severity of this allele. This agrees with other studies, as well as with biochemical and model organism studies indicating that this variant has a more detrimental effect on protein function (Al-Tassan et al, 2002; Parker et al, 2005; Lubbe et al, 2009; Nielsen et al, 2009; D'Agostino et al, 2010). The pooled analysis of published studies and unpublished datasets submitted to us also indicated a marginally significant mono-allelic MUTYH effect, as well as a mono-allelic Y179C effect.
However, several caveats need to be considered. First, if any of the studied datasets contain cases recruited because of familial clustering of disease, there may be ascertainment bias, artificially inflating the number of MUTYH WM variant allele carriers. Second, screening of the MUTYH gene has predominantly targeted the two most common pathogenic variants, Y179C and G396D; in some studies, the rest of the gene may have been screened in cases heterozygous for these variants but not usually in controls. Hence there is an overall screening bias, and bi-allelic carriers may well have been missed in both cases and controls.
The demonstration of a mono-allelic effect specific to Y179C should be treated with further caution, as analysis of the control datasets showed that the Y179C allele was not in Hardy-Weinberg equilibrium. This may reflect several factors: the rarity of the allele and the fact that both control subjects with bi-allelic mutations, both female, carry Y179C variants. One of these control subjects was found to have polyps on colonoscopy (Cleary et al, 2009) and might therefore be considered a case. The other is relatively young, less than 60 years old (Lubbe et al, 2009), and so could develop cancer over the next few years. However, in this large dataset we have also shown that bi-allelic Y179C carriers have an earlier onset of disease than G396D carriers, consistent with previous reports (Lubbe et al, 2009; Nielsen et al, 2009), which highlights a more severe disease phenotype for this variant.
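A rough worked calculation shows why the rarity of Y179C makes the Hardy-Weinberg test so sensitive. Assuming, for illustration only, that all 15 524 controls were genotyped for Y179C and taking the rounded control allele frequency q = 0.002, the expected number of homozygous controls under equilibrium is far below one, so even a single observed homozygote (a hypothetical count, used only to illustrate scale) contributes a large term to the χ² statistic:

```latex
\begin{aligned}
E[n_{MM}] &= N q^{2} \approx 15\,524 \times (0.002)^{2} \approx 0.062,\\
\frac{\left(O_{MM} - E[n_{MM}]\right)^{2}}{E[n_{MM}]} &\approx \frac{(1 - 0.062)^{2}}{0.062} \approx 14 \qquad \text{for } O_{MM} = 1.
\end{aligned}
```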
In conclusion, inactivation of the MUTYH gene is a recessive risk factor for CRC, with possible modifying effects indicated by increased risk in cases with an early age of onset, although this was not significant in the current dataset. An increased risk associated with mono-allelic MUTYH mutations is indicated; it is small, not currently clinically relevant and probably specific to the Y179C variant. Despite the size of this study, it has not been possible to establish definitively whether age or sex significantly modify disease risk in G396D and Y179C carriers. The evidence presented raises the possibility of a mono-allelic effect for Y179C, but the effect is small (OR 1.34; 95% CI: 1.00–1.80) and sensitive to variation in population allele frequency because of the rarity of the variant (allele frequency 0.002), and it is subject to potential issues of subgroup analysis and multiple testing (indeed, overall significance is lost after Bonferroni correction). Nonetheless, this study appears to be the first to demonstrate that the Y179C variant imparts an increased risk of CRC.