Introduction

Clinical situations suggestive of an inherited predisposition to colorectal cancer (CRC) are, as in other cancers, familial aggregation, early age at tumour onset and the development of multiple primary tumours in the same individual. Molecular diagnosis of inherited CRC is stratified according to the clinical, pathological and genetic presentation of the families (see Patel and Ahnen1). Patients presenting with adenomatous polyposis undergo APC and/or MUTYH analysis, depending on the number of polyps and on whether the family history suggests a dominant or a recessive mode of inheritance. The rare patients presenting with hamartomatous polyps undergo STK11, SMAD4, BMPR1A or PTEN analysis. For the vast majority of patients, who do not present with polyposis, the tumour is tested for a Replication ERror (RER)/MicroSatellite Instability (MSI) phenotype, the somatic signature of Lynch syndrome. Patients with RER+/MSI tumours then undergo analysis of the MMR genes MSH2, MSH6, MLH1 and PMS2.

An important fraction of clinical situations suggestive of an increased genetic risk for CRC cannot be explained by a simple Mendelian model involving one of these key CRC genes. An analysis of the activities of French laboratories by the French National Cancer Institute (INCa) indicates that, among 2000 index cases suspected of having an inherited form of CRC who undergo key CRC gene testing each year, the mutation detection rate is below 15%. Besides the Mendelian forms of CRC, another significant advance in understanding the genetic determinism of CRC has been the identification, through numerous genome-wide association studies (GWAS), of genetic risk factors corresponding to single-nucleotide polymorphisms (SNPs). So far, 20 SNPs, located at 1q41, 3q26.2, 6p21.2, 8q23.3, 8q24.21, 10p14, 11q13.4, 11q23.1, 12q13, 14q22.2, 15q13.3, 16q22.1, 18q21.1, 19q13.11, 20p12.3, 20q13.33 and Xp22.2, have been reported to be associated with an increased CRC risk.2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 Each of these SNPs confers a small increase in CRC risk, with an odds ratio (OR) usually of the order of 1.2. Although for most of these SNPs the association with CRC has been confirmed by replication studies, the small increase in CRC risk conferred by these variants hampers their clinical use. Because these GWAS were performed on very large numbers of patients to achieve sufficient statistical power, it is possible that the heterogeneity of the CRC patient cohorts (in terms of age at CRC onset and/or family history), the heterogeneity of the patients' genetic backgrounds and the lack, in most of these studies, of a strict definition of controls may have diluted the risk associated with these SNPs.

Therefore, to further evaluate the contribution of these SNPs to the genetic determinism of CRC, and to determine whether these SNPs, alone or in combination, could explain a fraction of CRC occurrence in patients whose clinical presentation is suggestive of an increased genetic risk not explained by a deleterious mutation in one of the key CRC genes, we performed a prospective study on carefully selected patients and controls. We selected SNPs whose association with CRC in European populations had been validated by replication studies, that showed the strongest association with CRC and that were located in the vicinity of genes potentially involved in CRC: (1) rs16892766 on 8q23.3, located 140 kb centromeric to EIF3H, which encodes subunit H of eukaryotic translation initiation factor 3;5, 8, 13 in vitro, EIF3H overexpression has been shown to increase the proliferation of CRC-derived cell lines, whereas reduced EIF3H expression has the opposite effect;13 (2) rs6983267 on 8q24.21,3, 5, 6, 8, 14, 15 which affects a binding site for TCF4, a downstream effector of the Wnt/β-catenin signalling pathway that is commonly deregulated in CRC;16 (3) rs4779584 on 15q13.3, between the GREM1 and SCG5 loci.4, 5, 8, 10 GREM1 encodes a bone morphogenetic protein (BMP) antagonist secreted by intestinal subepithelial myofibroblasts; increased GREM1 expression resulting from an upstream duplication has been shown to be the molecular basis of hereditary mixed polyposis syndrome in some families, which indicates that altered GREM1 expression is tumourigenic;17 (4) rs4939827 on 18q21.1, within intron 3 of SMAD7, which encodes an intracellular antagonist of the TGFβ pathway, a pathway frequently inactivated in CRC.2, 5, 6, 11, 15 The association of this SNP with CRC was subsequently attributed to another SNP, rs58920878/Novel 1, also located within SMAD7 intron 3, which regulates SMAD7 expression.18 We also selected the 10p14 SNP rs10795668, initially detected by a large GWAS,5 because a recent meta-analysis confirmed its association with increased CRC risk,19 and rs3802842 on 11q23.1,6 because a strong association with CRC had been reported in a Dutch CRC cohort enriched in familial and/or early-onset CRC cases.8

The medical objective of this study was to determine whether or not the increased risk conferred by these SNPs could be sufficient to justify their genotyping in clinical practice.

Materials and methods

Subjects

Patients of Caucasian origin were selected according to three different criteria suggestive of an increased genetic risk for CRC (Table 1). All patients were recruited by the French network of Cancer Genetics Departments to ensure strict observance of the inclusion criteria. Controls, recruited by the Clinical Investigation Center of Rouen University Hospital, were healthy volunteers of Caucasian origin, aged 45–60 years, with no personal history of CRC and no history of CRC among their first-degree relatives. Informed consent for participation in the study and for genetic analyses was obtained from each subject. The study was approved by the local ethics committee.

Table 1 Inclusion and exclusion criteria

Genotyping

For controls and most of the patients, DNA was extracted from peripheral blood using the FlexiGene Kit (Qiagen, Courtaboeuf, France), and DNA quality was assessed using the NanoVue spectrophotometer (GE Healthcare Life Sciences, Velizy-Villacoublay, France). The variants rs16892766: A>C (GRCh38, chr8:g.116618444A>C), rs6983267: G>T (GRCh38, chr8:g.127401060G>T), rs10795668: G>A (GRCh38, chr10:g.8659256G>A), rs3802842: C>A (GRCh38, chr11:g.111300984C>A), rs4779584: T>C (GRCh38, chr15:g.32702555T>C), rs4939827: T>C (GRCh38, chr18:g.48927093T>C) and rs58920878: C>G (GRCh38, chr18:g.48923195C>G) were genotyped using the SNaPshot method following the manufacturer's protocol (Applied Biosystems, Meylan, France). This genotyping method was validated by Sanger sequencing of the seven SNPs in 10 samples. Genotyping quality was checked by having two different technicians analyse 10% of the samples in duplicate; concordance was 100%.

Statistical analyses

The target sample size was calculated for a 2:1 case-to-control ratio, without individual case–control matching and with a two-sided Type I error of 0.05, so as to obtain at least 80% power to detect an OR of 2 for allele frequencies in the range 0.06–0.90. An OR of 2 was chosen because the aim of this study was to identify genetic variants conferring a clinically relevant increase in CRC risk. Based on these criteria, the target sample size was set at 700 cases and 350 controls. Sample size calculations were performed using the POWER program (from the EPITOME software, available on the US NCI website http://dceg.cancer.gov/tools/design/power).
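The power target above can be explored with a minimal sketch, assuming the standard normal approximation for comparing an exposure frequency between two independent groups (no continuity correction). This is not the algorithm implemented in the POWER/EPITOME program, so its numbers need not match that program; the function name `case_control_power` and the treatment of `p0` as the frequency of the at-risk category in controls are assumptions made for illustration.

```python
from statistics import NormalDist


def case_control_power(p0: float, odds_ratio: float, n_cases: int,
                       n_controls: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, unmatched case-control comparison.

    p0 is the exposure frequency in controls (assumed here: the frequency of
    the at-risk category); the frequency in cases follows from the odds ratio.
    """
    norm = NormalDist()
    # Exposure frequency in cases implied by the odds ratio
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    # Pooled frequency, used for the standard error under the null hypothesis
    p_bar = (n_cases * p1 + n_controls * p0) / (n_cases + n_controls)
    se_null = (p_bar * (1 - p_bar) * (1 / n_cases + 1 / n_controls)) ** 0.5
    # Standard error under the alternative hypothesis
    se_alt = (p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Probability of rejecting in the direction of the true effect
    # (the negligible opposite tail is ignored)
    return norm.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    # 2:1 design from the text: 700 cases, 350 controls
    for p0 in (0.06, 0.30, 0.50, 0.90):
        for odds_ratio in (1.5, 2.0):
            power = case_control_power(p0, odds_ratio, 700, 350)
            print(f"p0={p0:.2f}  OR={odds_ratio}  power={power:.2f}")
```

Under this approximation, power is roughly 0.8 at p0 = 0.06 for an OR of 2, and drops well below 0.5 for an OR of 1.5 at that same frequency; the exact figures depend on the model assumed by the power software.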

For each SNP, genotype frequencies in cases and controls were tested for deviation from Hardy–Weinberg equilibrium using the χ2 goodness-of-fit test. For each SNP, the frequencies of the three possible genotypes were compared between cases and controls using Pearson's χ2 test. For each of the two genotypes carrying the at-risk allele reported in the literature, the OR and its 95% confidence interval (CI) were estimated by logistic regression, with the remaining genotype as the reference and with adjustment for sex. The same analyses were performed after dichotomizing genotypes according to a dominant model for the at-risk allele. To assess the appropriateness of the dominant model, the ability of the three-genotype model to better explain risk (ie, to distinguish between CRC cases and controls) was compared with that of the two-level dominant model using the likelihood ratio test within logistic regression. Bonferroni's correction was applied for multiple testing. To evaluate the potential cumulative effect of at-risk alleles, we compared the distributions of the number of at-risk alleles and of at-risk genotypes between patients and controls using the Cochran–Armitage test for trend; ORs with 95% CIs were obtained from logistic regression. These analyses were performed using SAS software (version 9.2; SAS Institute Inc., Cary, NC, USA). Data are available at: http://www.gwascentral.org/study/HGVST1828.
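The two χ2 tests described above, together with the Bonferroni threshold, can be sketched with the Python standard library alone. This is an illustrative sketch, not the SAS code used in the study: the helper names (`chi2_sf`, `hwe_chi2`, `pearson_chi2`, `bonferroni_threshold`) are hypothetical, the closed-form χ2 tail probabilities cover only the 1 and 2 degrees of freedom needed here, and the genotype counts in the usage example are made up.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, closed form for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 3 genotype classes - 1 - 1 estimated allele frequency = 1 degree of freedom
    return stat, chi2_sf(stat, 1)


def pearson_chi2(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected_count = row_totals[i] * col_totals[j] / n
            stat += (obs - expected_count) ** 2 / expected_count
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Made-up genotype counts: (non-risk homozygote, heterozygote, risk homozygote)
    cases = (300, 500, 229)
    controls = (130, 165, 55)
    print("HWE in controls:", hwe_chi2(*controls))
    stat, df = pearson_chi2([cases, controls])
    print("genotype chi2:", stat, "df:", df, "P:", chi2_sf(stat, df))
    print("Bonferroni threshold for 7 SNPs:", bonferroni_threshold(0.05, 7))
```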

Results

According to the criteria presented in Table 1, we included 1029 patients (412 males (40%), 617 females (60%)) whose clinical presentation was suggestive of an increased genetic risk for CRC but did not correspond to a known Mendelian form of CRC. None of the patients presented with either adenomatous polyposis (defined as at least 10 adenomas) or hamartomatous polyposis. In all patients, Lynch syndrome had been excluded, either on the basis of the absence of an RER/MSI phenotype and normal immunostaining of the MMR proteins in the tumour or, when tumour analysis was not possible, on the basis of the absence of detectable germline MSH2, MLH1 and MSH6 mutations and rearrangements. Patient clinical characteristics are presented in Table 2. As some patients fulfilled several inclusion criteria, the following subgroups overlap: 363 patients had a family history of CRC in two first-degree relatives, one diagnosed before 61 years of age; 887 had been diagnosed with CRC before 51 years of age or with advanced colorectal adenoma before 41 years of age; and 44 presented with multiple primary CRCs, the first diagnosed before 61 years of age. We recruited 350 controls (113 males (32.3%) and 237 females (67.7%)) aged 45–60 years, with no personal history of CRC and no history of CRC among their first-degree relatives. Mean ages (±SD) at inclusion were 48.7±10.2 years for cases and 51.7±4.5 years for controls. The seven SNPs analysed in this study (rs16892766 on 8q23.3, rs6983267 on 8q24.21, rs10795668 on 10p14, rs3802842 on 11q23.1, rs4779584 on 15q13.3, and rs4939827 and rs58920878/Novel 1 on 18q21.1) were in Hardy–Weinberg equilibrium in both controls and patients (data not shown). Table 3 displays genotype frequencies in CRC patients and controls.
As shown in Table 3, the genotype distributions of rs16892766 on 8q23.3, rs4779584 on 15q13.3, and rs4939827 and rs58920878/Novel 1 on 18q21.1 differed significantly between CRC patients and controls, whereas those of rs6983267 on 8q24.21, rs10795668 on 10p14 and rs3802842 on 11q23.1 did not. For all these SNPs, there was no evidence that CRC risk was better explained by distinguishing the three genotypes than by grouping them according to a dominant model (likelihood ratio test, all P-values between 0.19 and 0.74). Under the dominant model, the association with CRC risk was strongest for rs16892766 on 8q23.3, with an OR of 1.88 (95% CI: 1.30–2.72), whereas the remaining three associated SNPs showed weaker, though still significant (even after Bonferroni's correction), associations with very similar ORs in the range 1.42–1.49 (Table 3). Almost all controls homozygous for the rs4939827 C allele (80/89) were also homozygous for the rs58920878/Novel 1 C allele, and the majority of controls homozygous for the rs4939827 T allele (67/104) were homozygous for the rs58920878/Novel 1 G allele, confirming that, as previously shown,18 the two SNPs are in strong linkage disequilibrium and constitute a haplotype block on 18q21.1.
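The two per-SNP quantities reported above, the dominant-model OR and the likelihood ratio test of the three-genotype model against the dominant model, can be illustrated with the minimal sketch below. It makes one simplifying assumption that differs from the study: it ignores the adjustment for sex. Without covariates, logistic regression on a single binary predictor gives the same OR and Wald CI as the 2 × 2 table, and a model with one free parameter per genotype group has a closed-form log-likelihood, so no regression library is needed. The function names are hypothetical and the example counts are made up.

```python
import math
from typing import Iterable, Sequence, Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def dominant_or(cases: Sequence[int], controls: Sequence[int]) -> Tuple[float, float, float]:
    """OR (carriers vs non-carriers of the at-risk allele) with a Wald 95% CI.

    Genotype counts are ordered (non-risk homozygote, heterozygote, risk homozygote).
    """
    a = cases[1] + cases[2]        # at-risk allele carriers among cases
    b = cases[0]                   # non-carriers among cases
    c = controls[1] + controls[2]  # carriers among controls
    d = controls[0]                # non-carriers among controls
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - Z_975 * se),
            math.exp(log_or + Z_975 * se))


def grouped_loglik(groups: Iterable[Tuple[int, int]]) -> float:
    """Maximised log-likelihood of a logistic model with one parameter per group.

    Each group is (n_cases, n_controls); the fitted case probability in a group
    is simply its observed proportion of cases.
    """
    ll = 0.0
    for k, m in groups:
        n = k + m
        if k:
            ll += k * math.log(k / n)
        if m:
            ll += m * math.log(m / n)
    return ll


def lrt_three_genotypes_vs_dominant(cases: Sequence[int],
                                    controls: Sequence[int]) -> Tuple[float, float]:
    """Likelihood ratio test of the 3-genotype model against the dominant model (1 df)."""
    full = grouped_loglik(zip(cases, controls))
    dominant = grouped_loglik([(cases[0], controls[0]),
                               (cases[1] + cases[2], controls[1] + controls[2])])
    stat = max(0.0, 2 * (full - dominant))  # guard against tiny negative rounding
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    cases = (300, 500, 229)      # made-up counts, not study data
    controls = (130, 165, 55)
    print("dominant OR (95% CI):", dominant_or(cases, controls))
    print("LRT 3-genotype vs dominant:", lrt_three_genotypes_vs_dominant(cases, controls))
```

A non-significant likelihood ratio test, as reported above for all SNPs, means that splitting carriers into heterozygotes and homozygotes does not improve the fit, which is what justifies pooling them.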

Table 2 Clinical characteristics of the 1029 patients included in the study
Table 3 Assessment of associations between colorectal tumour risk and each of the seven at-risk alleles of SNPs^a

Associations of CRC risk with SNPs were further assessed within the subgroups defined by the inclusion criteria. Based on the results above, only dominant models were considered, and only three SNPs were assessed: rs16892766 on 8q23.3, rs4779584 on 15q13.3 and rs58920878/Novel 1 on 18q21.1. The rs58920878/Novel 1 SNP was considered rather than rs4939827 on 18q21.1 because previous results showed that it accounts for the association of CRC with 18q21.1,18 but results were similar with either SNP. As shown in Tables 4 and 5, even after Bonferroni's correction for multiple testing, we observed, compared with controls, a significant enrichment of the at-risk allele of the 8q23.3 and 18q21.1 SNPs in patients selected on the basis of CRC in two first-degree relatives, and of the 8q23.3, 15q13.3 and 18q21.1 SNPs in patients who developed CRC before 51 years of age or advanced colorectal adenoma before 41 years of age. The small number of patients selected on the basis of multiple primary CRCs (n=44) precluded a meaningful comparison in this subgroup. Of note, irrespective of the inclusion criteria, the enrichment of the at-risk alleles of the three SNPs in CRC cases was confirmed in young patients using a 51-year threshold for CRC occurrence (data not shown).

Table 4 Assessment of associations between colorectal cancer risk and each of the three at-risk alleles of SNPs^a in patients selected according to the occurrence of CRC in two first-degree relatives, one being diagnosed before 61 years of age
Table 5 Assessment of associations between colorectal cancer risk and each of the three at-risk alleles of SNPs^a in patients with CRC diagnosed before 51 years of age or advanced colorectal adenoma before 41 years of age

We then evaluated the potential cumulative effect of the rs16892766 (8q23.3), rs4779584 (15q13.3) and rs58920878 (18q21.1) at-risk alleles on CRC risk. The overall number of at-risk alleles was obtained by summing, over the three loci, the number of at-risk alleles at each locus, that is, 1 for heterozygous and 2 for homozygous subjects. None of the controls or CRC patients had six at-risk alleles. As only five cases and no controls had five at-risk alleles, subjects with four and five at-risk alleles were grouped into one category. CRC risk increased with the number of at-risk alleles (P<0.0001, trend test): relative to individuals with no at-risk allele, ORs were >2 for individuals harbouring at least two at-risk alleles, reaching a maximum of 3.88 (95% CI: 1.72–8.76) for those with at least four (Table 6 and Figure 1). Given our results above on the appropriateness of a dominant model, the analysis was repeated by counting the number of at-risk genotypes, whether homozygous or heterozygous. Similarly, CRC risk increased with the number of at-risk genotypes (P<0.0001, trend test), with ORs >2 for individuals harbouring at least two at-risk genotypes relative to those with none, and a maximal OR of 6.21 (95% CI: 2.67–14.42) for three at-risk genotypes (Table 6 and Figure 1). We also calculated cumulative ORs using as the reference the number of at-risk alleles or genotypes most frequently observed in the control population, and again found a statistically significant association (P<0.0001, data not shown).
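The allele and genotype counting and the trend test described above can be sketched as follows, assuming each subject is represented by a tuple giving the number of at-risk alleles (0, 1 or 2) at the three loci. The helper names are hypothetical. The Cochran–Armitage statistic is computed in one common form (scores 0, 1, 2, ...), and some implementations use a slightly different variance estimate, so this is not claimed to reproduce the SAS output exactly. The per-category ORs are unadjusted cross-product ratios against the zero category, not the logistic regression estimates of Table 6.

```python
import math
from typing import Callable, List, Sequence, Tuple

Genotype = Tuple[int, ...]  # at-risk allele count (0, 1 or 2) at each locus


def n_risk_alleles(g: Genotype) -> int:
    """Total at-risk alleles across loci (1 per heterozygous, 2 per homozygous locus)."""
    return sum(g)


def n_risk_genotypes(g: Genotype) -> int:
    """Number of loci carrying at least one at-risk allele (dominant coding)."""
    return sum(1 for x in g if x > 0)


def tabulate(subjects: Sequence[Genotype], counter: Callable[[Genotype], int],
             top: int) -> List[int]:
    """Count subjects per category 0..top, pooling values >= top into the last one."""
    counts = [0] * (top + 1)
    for g in subjects:
        counts[min(counter(g), top)] += 1
    return counts


def cochran_armitage(cases: Sequence[int], controls: Sequence[int]) -> Tuple[float, float]:
    """Cochran-Armitage trend test with scores 0, 1, 2, ...; returns (Z, two-sided P)."""
    scores = range(len(cases))
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    p_bar = sum(cases) / n  # overall proportion of cases
    # Observed minus expected number of cases, weighted by score
    u = sum(t * (r - nt * p_bar) for t, r, nt in zip(scores, cases, totals))
    var = p_bar * (1 - p_bar) * (
        sum(t * t * nt for t, nt in zip(scores, totals))
        - sum(t * nt for t, nt in zip(scores, totals)) ** 2 / n)
    z = u / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


def odds_ratios_vs_zero(cases: Sequence[int], controls: Sequence[int]) -> List[float]:
    """Unadjusted OR of each category relative to category 0 (all cells must be > 0)."""
    return [(r * controls[0]) / (s * cases[0]) for r, s in zip(cases, controls)]


if __name__ == "__main__":
    # Made-up subjects: at-risk allele counts at (8q23.3, 15q13.3, 18q21.1)
    case_subjects = [(0, 1, 1), (1, 1, 2), (0, 0, 1), (2, 1, 1), (0, 2, 2), (1, 0, 0)]
    control_subjects = [(0, 0, 1), (0, 1, 0), (0, 0, 0), (1, 1, 0), (0, 0, 2), (0, 0, 0)]
    case_counts = tabulate(case_subjects, n_risk_alleles, top=4)
    control_counts = tabulate(control_subjects, n_risk_alleles, top=4)
    print(case_counts, control_counts, cochran_armitage(case_counts, control_counts))
    # Made-up count table with no empty cells, for the per-category ORs
    print(odds_ratios_vs_zero([20, 60, 80, 50, 30], [30, 60, 50, 20, 10]))
```

An empty category contributes nothing to the trend statistic, but an empty cell makes the cross-product OR undefined, which is one reason sparse upper categories are pooled, as was done above for four and five at-risk alleles.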

Table 6 Assessment of associations between colorectal tumour risk and the number of at-risk alleles or genotypes at the 8q23.3, 15q13.3 and 18q21.1 loci
Figure 1

Number of at-risk alleles or genotypes in cases and controls. (a) Distribution of the number of at-risk alleles in cases and controls. (b) OR according to the number of at-risk alleles. (c) Distribution of the number of at-risk genotypes in cases and controls. (d) OR according to the number of at-risk genotypes.

Discussion

This study on highly selected patients and controls confirms the association of the rs16892766 (8q23.3), rs4779584 (15q13.3) and rs58920878 (18q21.1) at-risk alleles with CRC, with ORs in the range 1.4–1.9, higher than those reported in GWAS (Table 3). In contrast, we did not replicate the other previously reported associations. It could be argued that our study was underpowered to do so. Indeed, our study was adequately powered for ORs of 2 or more and had little power for ORs <1.5. This was a deliberate design choice, given the limited clinical value of genotyping SNPs that confer such small increases in CRC risk.

The higher ORs we found compared with GWAS may reflect our decision to include selected CRC cases with a high likelihood of a strong genetic component, whereas GWAS recruited CRC cases non-selectively. It could also be argued that these higher ORs result from the use in this study of ‘super-controls’, with no personal history of CRC and no history of CRC among their first-degree relatives. Nevertheless, as indicated in Table 3, for all seven SNPs, the minor allele frequency (MAF) in our control sample was in excellent agreement with previous estimates obtained in large Caucasian populations, which suggests that our control sample was probably not biased. A recent study, based on a meta-analysis of previous GWAS and high-coverage sequencing of the at-risk loci, showed that the association of these GWAS tag SNPs with CRC is not due to the combined effects of rare causal variants but reflects linkage disequilibrium with a common variant located in the vicinity of a gene involved in CRC.12 That study also identified SNPs more strongly associated with CRC than the original tag SNPs, with ORs between 1.08 and 1.61.12 It is therefore possible that genotyping these newly identified top SNPs would have yielded even larger ORs than those we observed.

An enrichment of the 15q13.3 and 18q21.1 at-risk alleles had already been reported in a study of 995 selected Dutch CRC cases and 1340 controls,8 with ORs estimated at 1.45 and 1.23, respectively. The rs4779584 at-risk allele on 15q13.3 was also recently found to be enriched in a series of 252 genetically unexplained index patients with >10 colorectal adenomas compared with 745 controls; interestingly, the observed OR (1.5) was also higher than those reported in GWAS.20 As in the Dutch study,8 we could not replicate the previously reported association with the 10p14 SNP rs10795668 (Table 3), possibly because of a lack of statistical power to detect low ORs, as indicated above. Our study and that of Middeldorp et al8 yielded different results for the 8q23.3, 8q24.21 and 11q23.1 SNPs: in the Dutch study, the association with 8q23.3 rs16892766 did not remain significant after correction for multiple testing, whereas the associations with the 8q24.21 and 11q23.1 at-risk alleles were highly significant. This apparent discrepancy might be explained by our study being underpowered to detect these associations, or by differences in patient selection or in genetic background between the French and Dutch samples. Both studies found that CRC risk increased with the number of at-risk alleles, which supports an oligogenic determinism of CRC. Such oligogenic determinism might explain a fraction of the clinical situations suggestive of an increased genetic risk but not explained by a deleterious mutation in a key CRC gene (ie, of the Mendelian type). A large international study,21 performed on over 40 000 individuals mainly of Northern European origin, had previously shown that it should be possible to identify population subgroups at increased risk for CRC by combining SNP genotyping and family history.
Our study, on a limited number of highly selected cases and controls, confirms the clinical relevance of 8q23.3, 15q13.3 and 18q21.1 SNP genotyping. Indeed, the ORs that we found in individuals harbouring at least two at-risk alleles or genotypes are, to our knowledge, the highest reported, and this level of increased risk (OR>2) is of the same magnitude as that estimated for first-degree relatives of patients with CRC;22 these relatives benefit from CRC screening based on either faecal immunochemical testing or colonoscopy every 5 years, starting 5 years before the age of tumour onset in the index case. As our overall finding that at least two at-risk alleles or genotypes were associated with a substantially increased CRC risk also applied specifically to CRC cases diagnosed before 51 years of age, our study suggests that this genotyping might be considered from 40 years of age.