Association studies on 11 published colorectal cancer risk loci

Background: Several recent genome-wide association studies (GWAS) have independently identified numerous loci at which common single-nucleotide polymorphisms (SNPs) modestly influence the risk of developing colorectal cancer (CRC). The aim of this study was to test, in a Swedish-based cohort, 11 loci reported to be associated with an increased or decreased risk of CRC: 8q23.3 (rs16892766), 8q24.21 (rs6983267), 9p24 (rs719725), 10p14 (rs10795668), 11q23.1 (rs3802842), 14q22.2 (rs4444235), 15q13.3 (rs4779584), 16q22.1 (rs9929218), 18q21.1 (rs4939827), 19q13.1 (rs10411210) and 20p12.3 (rs961253). Methods: A cohort of 1786 cases and 1749 controls was genotyped and analysed statistically. Genotype–phenotype analyses were performed for all 11 SNPs against sex, age of onset, family history of CRC and tumour location. Results: Five of the 11 loci (8q23.3, 8q24.21, 10p14, 15q13.3 and 18q21.1) showed statistically significant odds ratios similar to previously published findings. Four loci (11q23.1, 16q22.1, 19q13.1 and 20p12.3) showed weak, non-significant trends broadly similar to previously published results. The loci 9p24 and 14q22.2 could not be confirmed. Affected individuals carried a higher number of risk alleles than controls. Four statistically significant genotype–phenotype associations were found: the G allele of rs6983267 was associated with older age, the G allele of rs10795668 was associated with younger age and with sporadic cases, and the T allele of rs10411210 was associated with younger age. Conclusions: Our study in a Swedish population supports most of the genetic variants published in GWAS. Further studies are needed to validate the genotype–phenotype correlations.

Until some years ago, the candidate-gene approach was the only method available to researchers for identifying potentially pathogenic variants. However, rapid technological development and the consequent acquisition of large amounts of data in the past decade have shifted the focus of research to genome-wide association studies (GWAS). Recent GWAS have identified multiple genetic loci associated with an increased or decreased risk of colorectal cancer (CRC) on 8q23.3, 8q24.21, 9p24, 10p14, 11q23.1, 14q22.2, 15q13.3, 16q22.1, 18q21.1, 19q13.1 and 20p12.3, explaining, at least to some extent, the genetics of CRC as a complex disease (Haiman et al, 2007; Tomlinson et al, 2007, 2008; Zanke et al, 2007; Houlston et al, 2008; Jaeger et al, 2008; Tenesa et al, 2008). Each of these loci is associated with a modest risk and, although the risk alleles are fairly common, they contribute very little to the overall burden of CRC. This case–control study examined the known CRC-associated single-nucleotide polymorphisms (SNPs) in a Swedish-based cohort and compared the results with previous association studies in other populations. It also tested whether CRC patients outnumbered controls among individuals with a higher number of risk alleles, as reported previously (Tomlinson et al, 2010). Genotype–phenotype associations were analysed for age of onset, sex, family history of CRC and tumour location.

Subjects
The case cohort was composed of 1786 consecutive CRC patients of Swedish origin, recruited through the Swedish Low-Risk CRC Study Group from 14 hospitals in central Sweden during 2004–2006. The mean age at diagnosis was 68.6 years (range 28–95 years); 53% were men and 47% were women, and 22% had a family history of CRC among first- or second-degree relatives. The control cohort was composed of 1749 individuals: 1319 blood donors from the general population aged 18–65 years and 430 unaffected spouses of CRC patients (mean age 66.3 years, range 25–92 years), who were cancer-free and had no family history of any type of cancer.

Quality control
Sequencing was performed using the Big-Dye terminator v3.1 cycle sequencing kit (Applied Biosystems), and fragments were separated on an ABI 3730 XL capillary sequencer. Chromatograms were analysed using SeqScape v2.5 (Applied Biosystems). Primers and amplification conditions are available upon request.

Genotype–phenotype analysis
We studied sex, age of onset (early vs late, with late onset defined as >60 years), family history of CRC (any case of CRC among first- or second-degree relatives) and tumour location (colon vs rectum, and right vs left, i.e. proximal vs distal to the splenic flexure).

Statistical analysis
Deviations of the genotype frequencies in cases and controls from those expected under Hardy–Weinberg equilibrium were assessed with χ² tests (one degree of freedom). Allelic frequencies of the SNPs in the case and control groups were compared using a χ² test (allele 1 (common) vs allele 2 (minor)), except for rs6983267, for which the common allele is suggested to be the risk allele. To facilitate comparison, we chose to present the risk and common alleles as in previous publications. Analyses were also performed under several genetic models: the comparison of homozygotes (genotype 11 vs 22), the dominant model (11 vs (12 + 22)), the recessive model ((11 + 12) vs 22) and the allele frequency difference ((1) vs (2)). In addition, Armitage's trend test, which takes into account the individuals' genotypes rather than just alleles (Sasieni, 1997), was performed using the DeFinetti programme, provided as an online resource (http://ihg2.helmholtz-muenchen.de/cgi-bin/hw/hwa1.pl). The significance level for statistical tests was set at 0.05. Odds ratios (ORs), their 95% confidence intervals (CIs) and the corresponding P-values were calculated using the same programme. The analyses were validated using Statistica 7.0 (StatSoft Inc., Tulsa, OK, USA).
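The two one-degree-of-freedom tests described above can be illustrated with a minimal Python sketch. It assumes genotype counts ordered as (11, 12, 22), with allele 1 the common allele, and uses one standard form of the Cochran–Armitage trend statistic with genotype scores 0, 1 and 2. It is not the DeFinetti implementation; the function names and counts are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail P-value of a chi-square statistic with one degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def hwe_chi2(n11, n12, n22):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    n11, n12, n22: counts of common homozygotes, heterozygotes and
    minor homozygotes in one group (cases or controls).
    """
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)  # frequency of allele 1 (common)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n11, n12, n22)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf_1df(stat)


def armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test (1 df) on genotype counts (11, 12, 22).

    scores give the number of allele-2 copies carried by each genotype.
    """
    r_tot, s_tot = sum(cases), sum(controls)
    n_tot = r_tot + s_tot
    n_i = [r + s for r, s in zip(cases, controls)]  # genotype totals
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * n for t, n in zip(scores, n_i))
    sum_t2n = sum(t * t * n for t, n in zip(scores, n_i))
    numerator = n_tot * (n_tot * sum_tr - r_tot * sum_tn) ** 2
    denominator = r_tot * s_tot * (n_tot * sum_t2n - sum_tn ** 2)
    stat = numerator / denominator
    return stat, chi2_sf_1df(stat)


if __name__ == "__main__":
    # Hypothetical genotype counts (11, 12, 22), not data from this study.
    cases, controls = (20, 50, 30), (40, 45, 15)
    print("HWE in controls:", hwe_chi2(*controls))
    print("Trend test:", armitage_trend(cases, controls))
```

Implementations of the trend test differ in minor details of the variance term, so P-values from this sketch may differ slightly from those reported by other software.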
Statistical analysis of the clinical parameters was carried out in Statistica using cross-tabulation; P-values were calculated with Pearson's χ² test, and the significance level was set at 0.05.
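The cross-tabulation test can be sketched in the same way. The minimal Python example below takes a hypothetical genotype-by-phenotype table (for instance genotypes 11, 12 and 22 against early vs late onset) and computes Pearson's χ² statistic from the expected counts under independence. To stay within the standard library it returns closed-form P-values only for 1 or 2 degrees of freedom, which covers 2 × 2 and 3 × 2 tables; it illustrates the test and is not the Statistica procedure.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, P-value). P-values are computed in
    closed form only for 1 or 2 degrees of freedom (2x2, 2x3 or 3x2 tables).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count under independence
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p = math.exp(-stat / 2.0)  # chi-square survival function for 2 df
    else:
        raise ValueError("closed-form P-value implemented only for df = 1 or 2")
    return stat, df, p


if __name__ == "__main__":
    # Hypothetical rows: genotypes 11, 12, 22; columns: early vs late onset.
    print(pearson_chi2([[20, 40], [50, 45], [30, 15]]))
```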

RESULTS
Genotype frequencies of cases and controls, together with ORs and P-values for the different analyses, are shown in Table 1. Significant associations with CRC risk were confirmed for 5 of the 11 genotyped SNPs (rs16892766, rs6983267, rs10795668, rs4779584 and rs4939827), with ORs similar to those in previous publications (Tomlinson et al, 2007, 2008; Jaeger et al, 2008). For rs16892766 on 8q23.3, an increased risk of CRC was identified (P<0.002 for all analyses except the recessive model), with the highest OR, 1.34 (95% CI 1.13–1.60), in heterozygotes. Likewise, the increased risk suggested for rs6983267 on 8q24.21 was confirmed in all analyses, with the highest OR, 1.37 (1.13–1.67), in homozygotes. rs4779584 on 15q13.3 has been associated with an increased risk, which was confirmed for heterozygous individuals (OR = 1.18, 1.02–1.36). The protective effects suggested for rs10795668 on 10p14 and rs4939827 on 18q21.1 were both confirmed for homozygotes and heterozygotes, with ORs of 0.66 (0.52–0.83) and 0.82 (0.70–0.96), respectively. For rs3802842 on 11q23.1, the ORs showed a trend, with an OR of 1.27 in homozygotes (NS). rs9929218 on 16q22.1, rs10411210 on 19q13.1 and rs961253 on 20p12.3 showed weak trends in the same direction as published (NS), whereas rs719725 on 9p24 and rs4444235 on 14q22.2 were not confirmed. The distribution of risk alleles between cases and controls in the Swedish population is shown in Figure 1: there is a clear shift towards a higher number of risk alleles in affected individuals compared with controls.
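ORs of this kind come from 2 × 2 tables built from the genotype counts under each genetic model. A minimal Python sketch of that construction follows. It assumes Woolf's logit method for the 95% CI (the method used by DeFinetti is not stated here), orients every OR as allele 2 vs allele 1, and uses hypothetical function names and counts. For rs6983267, where the common allele is the suggested risk allele, the genotype order would have to be reversed before calling it.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def odds_ratio_ci(a, b, c, d, z=Z_95):
    """Odds ratio with Woolf (logit) confidence interval for a 2 x 2 table.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    Any zero cell raises ZeroDivisionError.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


def genetic_model_ors(cases, controls):
    """ORs (allele 2 vs allele 1) under several genetic models.

    cases, controls: genotype counts (11, 12, 22); 1 = common, 2 = minor allele.
    """
    r11, r12, r22 = cases
    s11, s12, s22 = controls
    return {
        "homozygotes (22 vs 11)": odds_ratio_ci(r22, r11, s22, s11),
        "heterozygotes (12 vs 11)": odds_ratio_ci(r12, r11, s12, s11),
        "dominant (12+22 vs 11)": odds_ratio_ci(r12 + r22, r11, s12 + s22, s11),
        "recessive (22 vs 11+12)": odds_ratio_ci(r22, r11 + r12, s22, s11 + s12),
        # Each individual contributes two alleles to the allelic table.
        "allelic (2 vs 1)": odds_ratio_ci(2 * r22 + r12, 2 * r11 + r12,
                                          2 * s22 + s12, 2 * s11 + s12),
    }


if __name__ == "__main__":
    # Hypothetical counts, not data from this study.
    for model, (or_, lo, hi) in genetic_model_ors((20, 50, 30), (40, 45, 15)).items():
        print(f"{model}: OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A zero cell makes the Woolf interval undefined; a common workaround, not part of this study, is to add 0.5 to every cell.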
Genotype–phenotype analysis was performed for all 11 loci against sex, age, family history and tumour location, and the P-values for all analyses are shown in Table 2. Four associations were found, three for age and one for family history (Table 3). Homozygosity for the risk allele G of rs6983267 was associated with older age (P = 0.0014). In contrast, for rs10795668 the risk allele G was associated with younger age (P = 0.035) and with sporadic cases (P = 0.047). The T allele of rs10411210 was associated with younger age (P = 0.045) in homozygotes (Table 3).

DISCUSSION
We studied SNPs at 11 loci published to be associated with an increased or decreased risk of CRC and were able to show statistically significant results for 5 of them. The first SNP, rs6983267 on 8q24.21, was published by Tomlinson et al (2007), where the most common allele, G, was suggested to be the risk allele. Our results were similar to those of previous studies in other populations (Berndt et al, 2008; Tuupanen et al, 2008; Wokolorczyk et al, 2008; Curtin et al, 2009; Middeldorp et al, 2009). Likewise, the results for rs16892766 on 8q23.3 were similar to those of both the GWAS and one replication study (Wijnen et al, 2009). The protective effect associated with rs10795668 on 10p14 was confirmed for homozygous carriers in the Swedish material. The association of rs4779584 on 15q13.3 with CRC risk, published by Jaeger et al (2008), was confirmed by us. For rs4939827 on 18q21.1, Broderick et al published the variant as protective, which was also shown by us and by one previous study (Curtin et al, 2009). The SNP rs3802842 on 11q23.1 was first published by Tenesa and co-workers and confirmed by others (Middeldorp et al, 2009; Wijnen et al, 2009). Our results were similar, but not statistically significant. This discrepancy could be due to differences in population, sample size or study design. Wijnen et al (2009) used a Dutch population, and used mismatch repair …
No association was detected for rs719725 on 9p24, initially reported in cohorts from Canada, the United States, Newfoundland, Scotland and France, which the authors themselves were unable to replicate in a second French cohort (Zanke et al, 2007). It was later confirmed in American, Canadian and Australian cohorts (Poynter et al, 2007). Even though the distribution of the three genotypes was the same, we hypothesise that this negative result could reflect population specificity, with the causal SNP lying on different haplotypes in different populations, or that the original results were false positives.
A study using British and American cohorts was also unable to detect any association for this SNP (Curtin et al, 2009).
To our knowledge, none of the remaining four SNPs has yet been studied in other populations. Notably, the five confirmed loci were the first to be published, whereas the SNPs on 14q22.2, 16q22.1, 19q13.1 and 20p12.3 were captured only by meta-analysis of large GWAS, suggesting that these four could be more difficult to replicate in follow-up studies. The three SNPs on 16q22.1, 19q13.1 and 20p12.3 did not reach statistical significance in our study. However, the ORs in the Swedish samples suggested association, albeit with wider CIs than in the first report. Finally, we were unable to confirm an association with CRC risk for rs4444235 on 14q22.2, which again could be due to a smaller sample size or possibly a population difference.
Another possible explanation for the differing results could be differences in genotype frequencies among populations or in the genotyping methods used. For all SNPs, the genotype frequencies in the Swedish samples were similar to published data. Regarding methods, SNP arrays were used for the GWAS, whereas other studies used Sequenom's iPLEX Gold (San Diego, CA, USA), genomic sequencing, SNPlex, PCR KASPar or TaqMan. This does not immediately explain the different results in the Swedish material. Because four of the five SNPs genotyped by DeCode and rs4778495 genotyped in Edinburgh using TaqMan were confirmed, whereas none of the five (rs9929218, rs719725, rs4444235, rs10411210 and rs961253) genotyped in our laboratory showed statistically significant results, we validated our TaqMan results by sequencing. In total, 1000 cases and 1000 controls were sequenced for these five SNPs. The concordance was 99.8%, so we do not think that the genotyping method explains the difference between our results and previous publications.
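A concordance figure of this kind is the proportion of samples with identical genotype calls on the two platforms. A minimal Python sketch, assuming calls are stored per sample as two-letter genotype strings (a hypothetical data layout and function name), might look as follows. It counts only samples called on both platforms and treats genotypes as unordered, so "GT" and "TG" agree.

```python
def concordance(calls_a, calls_b):
    """Fraction of samples with identical genotype calls on two platforms.

    calls_a, calls_b: dicts mapping sample ID -> genotype string (e.g. "GT").
    Samples without a call on either platform are excluded.
    Returns (fraction, number concordant, number compared).
    """
    shared = [s for s in calls_a if s in calls_b and calls_a[s] and calls_b[s]]
    if not shared:
        raise ValueError("no samples genotyped on both platforms")

    def norm(genotype):
        # Sort the two alleles so that "TG" and "GT" compare equal.
        return "".join(sorted(genotype))

    agree = sum(norm(calls_a[s]) == norm(calls_b[s]) for s in shared)
    return agree / len(shared), agree, len(shared)


if __name__ == "__main__":
    # Hypothetical calls for one SNP: TaqMan vs sequencing.
    taqman = {"s1": "GG", "s2": "GT", "s3": "TT"}
    sequencing = {"s1": "GG", "s2": "TG", "s3": "TT"}
    print(concordance(taqman, sequencing))
```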
Carrying a single risk variant is neither necessary nor sufficient for developing CRC. However, Figure 1 supports the general idea that CRC patients carry more risk alleles than controls. The distributions of the number of risk alleles carried by cases and controls show a shift towards higher numbers in affected individuals, in line with previously published data (Tomlinson et al, 2010). Even though we did not confirm all SNPs, and used 11 SNPs instead of 10, the distribution of risk alleles (Figure 1) was very similar to the published one, which further strengthens the results and confirms the overall genetic contribution of these alleles (Tomlinson et al, 2010).
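The per-individual risk-allele counts summarised in such a distribution can be sketched as follows. This minimal Python example assumes each individual's genotypes are stored as two-letter strings keyed by SNP and that the risk allele of each SNP is known (for rs6983267, the G allele); the second SNP name and all genotypes are hypothetical. Skipping missing genotypes, as done here, slightly under-counts risk alleles; restricting the analysis to individuals with complete genotype data is an alternative.

```python
from collections import Counter


def risk_allele_count(genotypes, risk_alleles):
    """Number of risk alleles (0 to 2 per SNP) carried by one individual.

    genotypes: dict SNP -> two-letter genotype (empty string if missing).
    risk_alleles: dict SNP -> risk allele. Missing genotypes are skipped.
    """
    return sum(g.count(risk_alleles[snp]) for snp, g in genotypes.items() if g)


def count_distribution(individuals, risk_alleles):
    """Proportion of individuals carrying each total number of risk alleles."""
    counts = Counter(risk_allele_count(ind, risk_alleles) for ind in individuals)
    n = len(individuals)
    return {k: counts[k] / n for k in sorted(counts)}


if __name__ == "__main__":
    # "snp_b" and all genotypes are hypothetical.
    risk = {"rs6983267": "G", "snp_b": "T"}
    cases = [{"rs6983267": "GG", "snp_b": "AT"}, {"rs6983267": "GT", "snp_b": "TT"}]
    controls = [{"rs6983267": "TT", "snp_b": "AA"}, {"rs6983267": "GT", "snp_b": "AA"}]
    print("cases:", count_distribution(cases, risk))
    print("controls:", count_distribution(controls, risk))
```

Computing the distribution separately for cases and controls and plotting the two side by side gives the kind of comparison shown in the figure.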
Interestingly, the genotype–phenotype analysis showed four associations for three SNPs. Other studies have published genotype–phenotype analyses for only one of the loci, 8q24.21, covering sex, tumour site, age at diagnosis and family history (Haiman et al, 2007; Poynter et al, 2007; Tuupanen et al, 2008). We report an association with age for rs6983267 on 8q24.21: the risk allele G was associated with our older patients (P = 0.0014). This was not seen in any of the other studies (Haiman et al, 2007; Poynter et al, 2007; Tuupanen et al, 2008), perhaps because of the different age groups used. In contrast to our study and the two other studies, Tuupanen et al (2008) reported an association with family history for the same SNP. This is unlikely to depend on the definition of family history, because only our study used a classification different from that of the other three. In line with our results, none of these studies found any support for an association with sex or tumour site (Haiman et al, 2007; Poynter et al, 2007; Tuupanen et al, 2008). For rs10795668 on 10p14, we found associations with age and family history: homozygosity for the risk allele was associated with younger age (P = 0.035) and with sporadic cases (P = 0.047). For rs10411210, homozygosity for the risk allele was associated with younger age (P = 0.045). These genotype–phenotype associations need to be replicated before any conclusions can be drawn.
The genetic contribution to CRC as a whole has been estimated to be as high as 35% (Lichtenstein et al, 2000). Although the risk alleles are very common in the general population, under an additive model of inheritance the 10 SNPs discovered so far (9p24 excluded) jointly account for only about 6% of the excess genetic risk. Most of the genetic contribution to CRC development therefore remains unexplained, and further studies aiming to identify additional SNPs, and hopefully also more high-penetrance predisposing genes, are warranted.
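The kind of calculation behind such estimates can be sketched as follows, although the source and exact method of the 6% figure are not given here. A widely used approach assumes Hardy–Weinberg equilibrium and a multiplicative per-allele relative risk r for a risk allele of frequency p (q = 1 − p), i.e. the log-additive model that GWAS usually call "additive" (an assumption about what is meant here). The familial relative risk to first-degree relatives attributable to that locus is then λ = (pr² + q)/(pr + q)², and independent loci combine additively on the log scale, so their joint share of the overall familial risk λ₀ is Σ log λᵢ / log λ₀. The Python sketch below implements that formula with hypothetical inputs; it does not reproduce the figure quoted in the text.

```python
import math


def familial_rr(p, r):
    """Familial relative risk to first-degree relatives due to one locus.

    Assumes Hardy-Weinberg equilibrium, risk-allele frequency p and a
    multiplicative per-allele relative risk r (genotype risks 1, r, r**2).
    """
    q = 1.0 - p
    return (p * r * r + q) / (p * r + q) ** 2


def fraction_explained(loci, lambda_total):
    """Joint share of the overall familial relative risk lambda_total.

    loci: iterable of (p, r) pairs for independent loci; their effects
    combine multiplicatively, i.e. additively on the log scale.
    """
    return sum(math.log(familial_rr(p, r)) for p, r in loci) / math.log(lambda_total)


if __name__ == "__main__":
    # Hypothetical allele frequencies, per-allele ORs and overall lambda.
    loci = [(0.5, 1.2), (0.3, 1.1), (0.1, 1.25)]
    print(f"{fraction_explained(loci, 2.0):.1%} of familial risk explained")
```

Because each common, low-risk allele yields λ only slightly above 1, even many such loci together explain a small fraction of log λ₀, which is consistent with the point made above.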