A two-phase case–control study for colorectal cancer genetic susceptibility: candidate genes from chromosomal regions 9q22 and 3q22

Background: Colorectal cancer (CRC) is the second leading cause of cancer-related death in the Western world. Much of the CRC genetic risk remains unidentified and may be attributable to a large number of common, low-penetrance genetic variants. Genetic linkage studies in CRC families have additionally reported linkage to regions 9q22–31, 3q21–24, 7q31, 11q, 14q and 22q. There are several plausible candidate genes for CRC susceptibility within these linkage regions, including PTCH1, XPA and TGFBR1 in 9q22–31, and EPHB1 and MRAS in 3q21–q24. Methods: CRC cases and matched controls were from EPICOLON, a prospective, multicentre, nationwide Spanish initiative, composed of two independent phases. Phase 1 corresponded to 515 CRC cases and 515 controls, whereas phase 2 consisted of 901 CRC cases and 909 controls. Genotyping was performed for 172 single-nucleotide polymorphisms (SNPs) in 84 genes located within regions 9q22–31 and 3q21–q24. Results: None of the 172 SNPs analysed in our study could be formally associated with CRC risk. However, rs1444601 (TOPBP1) and rs13088006 (CDV3) in region 3q22 showed suggestive results and may have an effect on CRC risk. Conclusions: TOPBP1 and CDV3 genetic variants in region 3q22 may modulate CRC risk. Further validation and meta-analysis should be undertaken in larger CRC cohorts.

Colorectal cancer (CRC) continues to be a major public health problem, although it is a preventable and potentially curable neoplasm. Colorectal cancer is the second most common cancer in Western countries and it also represents the second leading cause of cancer-related death among men and women (Ferlay et al, 2010). Genetic and environmental factors are important in its pathogenesis. Although the majority of CRC is sporadic, inherited susceptibility is relevant in about 30–35% of cases. Germline mutations in known genes such as APC and the DNA mismatch repair family account for <6% of cases (Piñol et al, 2005; Jasperson et al, 2010). Much of the remaining genetic risk may be attributable to a large number of common, low-penetrance genetic variants, each exerting a small influence on risk and following a polygenic model of inheritance (Balmain et al, 2003).
Genetic association studies are among the possible approaches to identify genes that underlie common diseases, either by candidate gene selection or by genome-wide association studies (GWAS). Recent GWAS have robustly demonstrated that common genetic variation contributes to the risk of developing CRC, and an increasing number of genomic regions have been shown to be associated with CRC risk. So far, 14 common, low-penetrance genetic variants have been identified for CRC susceptibility on 8q24.21, 18q21.1, 15q13.3, 8q23.3, 10p14, 11q23.1, 14q22.2, 16q22.1, 19q13, 20p12.3, 1q41, 3q26.2, 12q13.13 and 20q13.33 (Tenesa and Dunlop, 2009; Houlston et al, 2010). Previously, genetic linkage studies in CRC families additionally reported linkage to regions 9q22–31 (Wiesner et al, 2003; Skoglund et al, 2006; Kemp et al, 2006a; Gray-McGuire et al, 2010), 3q21–24 (Kemp et al, 2006b; Papaemmanuil et al, 2008; Picelli et al, 2008; Middeldorp et al, 2010), 7q31 (Neklason et al, 2008), 11q, 14q and 22q. Wiesner et al (2003) reported genetic linkage to chromosomal region 9q22.2–31.2 in a set of 74 affected sibling pairs from 53 kindreds with multiple CRC and/or advanced colorectal adenoma cases. Subsequently, Skoglund et al (2006) confirmed, in one extended family, a linkage region on 9q22.32–31.1 associated with adenoma and CRC predisposition. In addition, Kemp et al (2006a) provided further evidence of a CRC susceptibility locus on 9q22.32–q31.1 in 57 CRC families from the United Kingdom. Recently, Gray-McGuire et al (2010) refined CRC linkage on 9q22–31 and narrowed it down to a 151-kb region using an additional cohort of 69 independent CRC kindreds. On the other hand, Kemp et al (2006b) provided evidence of a novel CRC predisposition region on chromosome 3q21–q24 through a genome-wide linkage analysis in 69 CRC families, which was subsequently extended by Papaemmanuil et al (2008) in 34 additional CRC families. Two other independent linkage studies have also pointed to a region on chromosome 3q21–q24 as linked to CRC susceptibility (Picelli et al, 2008; Middeldorp et al, 2010).
There are several plausible candidate genes for CRC susceptibility within the aforementioned linkage regions, including PTCH1, XPA and TGFBR1 in 9q22–31, and EPHB1 and MRAS in 3q21–q24. However, variation within these genes or other interesting candidate genes has rarely been evaluated in case–control genetic association studies to assess them as components of CRC genetic susceptibility.
Hence, the aim of our study was to select candidate genes/variants from 9q22 and 3q22 and to consider their potential implications in CRC genetic susceptibility. Thus, we performed a two-phase case–control association study in the EPICOLON cohort (1416 CRC cases and 1424 controls) analysing 172 single-nucleotide polymorphisms (SNPs) within 84 genes located within these chromosomal regions.

Study subjects
We included 1416 CRC patients and 1424 controls from the Spanish population. Cases and healthy controls were recruited in the EPICOLON project, a prospective, multicentre, population-based cohort, in two independent phases (2000–2001 and 2006–2008). The EPICOLON cohort has been described in detail elsewhere (Piñol et al, 2005; Abulí et al, 2010). The mean age at CRC diagnosis was 70 years, early-onset CRC (<50 years) was present at a 4–5% frequency and ~15% of cases had a first-degree relative with CRC. Subjects in the discovery phase (phase 1) included 515 CRC cases and 515 controls. Subjects in the replication phase (phase 2) comprised 901 CRC cases and 909 controls. Exclusion criteria for the case–control study were hereditary CRC forms (familial adenomatous polyposis, MUTYH-associated polyposis and Lynch syndrome) and personal history of inflammatory bowel disease. Cases and controls were gender and age matched (±5 years) and controls lacked personal and family cancer history. DNA samples were extracted as described previously (Castellví-Bel et al, 2007; Fernández-Rozadilla et al, 2010). This study was approved by the institutional ethics committees of each participating hospital and written informed consent was obtained from all individuals.

Candidate gene selection
The linkage region on chromosome 9 was delimited between 90 795 373 and 106 903 700 bp, spanning 16.1 Mb, whereas the linkage region on chromosome 3 was contained between 129 274 056 and 140 286 919 bp, covering 11.01 Mb (according to NCBI genome build 37.2, http://www.ncbi.nlm.nih.gov). Gene selection was biased to include genes with previous evidence of being involved in cancer (e.g., DNA repair genes), with a function compatible with cancer involvement (e.g., cell cycle) or with a gene ontology term suggestive of a role in cancer (e.g., DNA binding). Pseudogenes were excluded from gene selection. In all, 41 out of 207 genes and 43 out of 123 genes were selected within the delimited regions on chromosomes 9 and 3, respectively, adding up to a total of 84. A description of all selected genes is summarised in Supplementary Tables 1 and 2. It is worth mentioning that gene selection was limited, as it was based on the gene function annotations currently available in public databases. At present, gene function is estimated to be known for only 10–30% of genes in the human genome.
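As a quick arithmetic check of the interval sizes and gene counts quoted above, the following minimal Python sketch recomputes them from the stated coordinates. The variable names are illustrative only and are not part of the original analysis.

```python
# Region boundaries in bp, as stated in the text (NCBI build 37.2).
regions = {
    "chr9": (90_795_373, 106_903_700),
    "chr3": (129_274_056, 140_286_919),
}

# Size of each interval in bp and in Mb.
sizes_bp = {name: end - start for name, (start, end) in regions.items()}
sizes_mb = {name: size / 1e6 for name, size in sizes_bp.items()}

# Genes selected out of all annotated genes in each region, as stated in the text.
selected = {"chr9": (41, 207), "chr3": (43, 123)}
total_selected = sum(chosen for chosen, _ in selected.values())

for name in regions:
    chosen, available = selected[name]
    print(f"{name}: {sizes_mb[name]:.2f} Mb, {chosen}/{available} genes selected")
print("total genes selected:", total_selected)
```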

SNP selection and genotyping
Overall, 94 SNPs from 41 genes and 78 SNPs from 43 genes were selected in regions 9q22 and 3q22, respectively, adding up to 172 variants. Single-nucleotide polymorphisms were chosen using only a direct strategy, selecting variants within each gene with a putative functional effect by using PupaSuite, a web tool for the selection of genetic variants with potential phenotypic effect (Conde et al, 2006; http://pupasuite.bioinfo.cipf.es). TagSNPs were not selected in our study and segregation in the same haplotype block was not considered for exclusion. Single-nucleotide polymorphisms were always prioritised if they were coding, evolutionarily conserved in mouse, had a putative regulatory effect in promoter, intronic or 3′-UTR regions, or were involved in microRNA binding. Minor allele frequency (MAF) was always >5%. One or two SNPs were usually selected from each gene, although more SNPs per gene were included if gene functionality was considered more important for CRC susceptibility. A description of all selected SNPs from regions 9q22 and 3q22 is available in Supplementary Tables 1 and 2, respectively.
High-throughput genotyping was performed according to the manufacturer's instructions using the SNPlex system (Applied Biosystems, Foster City, CA, USA), or the single-base primer extension chemistry matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry detection platform (Sequenom Inc., San Diego, CA, USA). For rs11466445 (TGFBR1), a specific PCR followed by single-strand conformation polymorphism detection was performed to identify a 9-bp allele difference (deletion of GGCGGCGGC). This variant could not be genotyped with high-throughput technology and was therefore genotyped only in EPICOLON 1 samples. Genotyping was performed at the Santiago de Compostela and Barcelona nodes of the Spanish National Genotyping Centre (www.cegen.org).

Statistical analysis
Genotyping quality control in both cohorts was performed using PLINK v1.03 (Purcell et al, 2007; http://pngu.mgh.harvard.edu/~purcell/plink), excluding SNPs with genotype success rates below 90% and individuals with genotype success rates below 80%. Departure from the Hardy–Weinberg equilibrium for all biallelic SNP markers was tested in controls using a χ² test with a single degree of freedom to exclude genotyping artefacts. After quality control, 991 samples (487 cases and 504 controls) and 170 SNPs remained to be analysed in phase 1, and 1685 samples (847 cases and 838 controls) and 20 SNPs remained to be analysed in phase 2. There was no sign of underlying population stratification in EPICOLON as tested by an independent study. Genotypic/allelic association tests and logistic regression analyses were performed using PLINK v1.03. Genotype frequency differences were evaluated by regression analysis for allelic, genotypic, dominant and recessive models of inheritance. We estimated the crude odds ratio (OR) and 95% confidence intervals. If one of the genotypes had a frequency <5, then Fisher's exact test was used. In EPICOLON phase 1, a liberal P-value threshold (P-value <0.05) was used to avoid false-negative results. We then validated statistically significant phase 1 results by replicating them in another independent CRC cohort (EPICOLON phase 2). Although they were not statistically significant in EPICOLON phase 1, rs1800975 (XPA) and rs357564 (PTCH1) were also moved forward to phase 2 because of their biological interest. Finally, to address the issue of multiple testing, we used Bonferroni's correction (P = 0.0025 for 20 SNPs). Study power was estimated using CATS software (Skol et al, 2006). Association results in EPICOLON for some SNPs were also evaluated in two cohorts described in a previous GWAS (Tomlinson et al, 2007), either by checking the original variant or a proxy SNP highly correlated with it (r² > 0.7). These cohorts were CORGI (1432 CRC cases and 2697 controls) and VQ58 (928 CRC cases and 931 controls).
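The two basic calculations described above, the single-degree-of-freedom χ² test for Hardy–Weinberg equilibrium and the crude odds ratio with its 95% confidence interval, can be sketched in a few lines of Python. This is a minimal illustration only: the study itself used PLINK v1.03, the function names are hypothetical, the genotype and allele counts are invented, and the confidence interval uses the standard log-OR (Woolf) approximation, which is an assumption rather than a documented detail of the original analysis.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.

    Takes the three genotype counts of a biallelic SNP (assumed polymorphic,
    so that no expected count is zero) and returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def crude_odds_ratio(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed):
    """Crude OR from a 2x2 table, with a Woolf (log-scale) 95% CI."""
    odds_ratio = (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)
    se_log_or = math.sqrt(
        1 / case_exposed + 1 / case_unexposed + 1 / ctrl_exposed + 1 / ctrl_unexposed
    )
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical control genotype counts (AA, AB, BB), not taken from the study.
chi2, p_hwe = hwe_chi2(250, 210, 44)
print(f"HWE chi2 = {chi2:.3f}, P = {p_hwe:.3f}")

# Hypothetical allele counts: minor/major alleles in cases and in controls.
or_, lo, hi = crude_odds_ratio(300, 674, 260, 748)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

For the dominant or recessive models, the same 2x2 function applies once genotypes are collapsed into carrier and non-carrier counts.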

RESULTS
After genotyping quality control, data from 170 SNPs in 84 genes were available in EPICOLON phase 1 and all SNPs analysed were in the Hardy–Weinberg equilibrium in controls (P-value >0.01). After association analysis in EPICOLON phase 1, 21 SNPs were statistically significant with an unadjusted P-value <0.05 in any of the tested inheritance models (Table 1). It is worth mentioning that eight of the significant SNPs in phase 1 were located in region 9q22, three of them in the ABCA1 gene. On the other hand, the remaining 13 significant SNPs were located in region 3q22, 2 SNPs each in the A4GNT and ARMC8 genes.
In the EPICOLON phase 1 cohort, a liberal P-value threshold (P-value <0.05) was used to avoid false-negative results. We then validated statistically significant phase 1 results by replicating them in another independent CRC cohort (EPICOLON phase 2). Two SNPs with non-statistically significant results in phase 1 from relevant genes in 9q22 (rs357564 in PTCH1 and rs1800975 in XPA) were carried forward to phase 2 to obtain results in both cohorts. In addition, rs11466445 in TGFBR1, although borderline significant in EPICOLON phase 1, was not genotyped in phase 2 because of its incompatibility with the high-throughput technology used. Genotype frequencies of all variants in the EPICOLON phase 2 control population fitted the Hardy–Weinberg equilibrium (P >0.01), except for rs12236219 and rs3738000, which were excluded. Therefore, results were available in EPICOLON phases 1 and 2 for the remaining 20 SNPs. For them, we performed a joint analysis of data combining both cohorts (1361 CRC cases and 1342 controls) to improve statistical power, as suggested previously (Skol et al, 2006).
Taking into account both EPICOLON phases, six SNPs in region 3q22 (rs1444601, rs13088006, rs939453, rs1131597, rs10934954 and rs2071387) maintained statistical significance with an unadjusted P-value in the overall analysis. Association analysis for these six SNPs in the discovery, replication and overall cohorts is shown in Table 2. Among them, the most interesting results were achieved by rs1444601 in TOPBP1 and rs13088006 in CDV3, which maintained statistical significance with an unadjusted P-value in all phases. However, it is worth mentioning that these observed associations would not remain significant after Bonferroni's correction for multiple testing (P = 0.0025 for 20 SNPs) and, therefore, they should formally be considered not statistically significant.
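The Bonferroni threshold quoted above follows directly from dividing the nominal significance level by the number of SNPs tested in both phases. A minimal Python sketch, in which the function name and the example P-value are hypothetical:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni's correction."""
    return alpha / n_tests


# Nominal alpha of 0.05 and the 20 SNPs analysed in both phases.
threshold = bonferroni_threshold(0.05, 20)
print(f"corrected threshold: {threshold:.4f}")

# Hypothetical unadjusted P-value: nominally significant, but not after correction.
example_p = 0.01
print("nominally significant:", example_p < 0.05)
print("significant after correction:", example_p < threshold)
```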
In addition, association results for rs1444601, rs13088006, rs939453, rs1131597, rs10934954 and rs2071387 were also evaluated in two cohorts described in a previous GWAS (Tomlinson et al, 2007), either by checking the original variant or a proxy SNP highly correlated with it (r² > 0.7) (Table 3). None of these six variants were significantly associated with CRC risk in this previous GWAS.

DISCUSSION
Several genetic linkage studies in CRC families have previously reported disease linkage to chromosomal regions 9q22–31 and 3q21–24. There are several plausible candidate genes for CRC susceptibility within these linkage regions, including PTCH1, XPA and TGFBR1 in 9q22–31, and EPHB1 and MRAS in 3q21–q24. Therefore, our study selected candidate genes/variants from 9q22 and 3q22 and evaluated their potential implication in CRC genetic susceptibility. To this end, we performed a two-phase case–control association study in the EPICOLON cohort (1416 CRC cases and 1424 controls) analysing 172 SNPs within 84 genes that are located within chromosomal regions 9q22–31 and 3q21–q24 and are potentially involved in cancer. For instance, the selected genes included phosphatidylinositol 3-kinases, a gene family involved in multiple cellular functions related to cancer, and MRAS, part of the Ras signalling pathway that is extensively dysregulated in carcinogenesis (Bunney and Katan, 2010). The TGFBR1 gene located in chromosomal region 9q22 was also incorporated in our study, and SNPs within this gene included rs11466445, a polymorphic 9-bp deletion with controversial results regarding its association with CRC risk (Kaklamani et al, 2003; Skoglund et al, 2007). Our results in the first phase showed a borderline significant association (unadjusted P-value = 0.0491, dominant inheritance) but did not reach statistical significance after multiple-testing correction, suggesting that rs11466445 does not increase CRC risk. We also screened six other TGFBR1 variants with potential pathogenic effects and found no evidence of CRC risk association, in agreement with recent studies in Spanish and northern European populations (Castillejo et al, 2009; Carvajal-Carmona et al, 2010). Besides, the transforming growth factor-β (TGF-β) pathway has been strongly implicated in CRC carcinogenesis, and its signalling is dependent on both receptors, TGFBR1 and TGFBR2. Although mutations in the TGFBR2 gene have been explicitly associated with CRC, the contribution of TGFBR1 to CRC is less clear. Valle et al (2008) suggested that germline TGFBR1 allele-specific expression could confer an increased CRC risk, although very recently another study refuted this hypothesis (Seguí et al, 2011). Another interesting gene located in region 9q22 is GALNT12, a member of the N-acetylgalactosaminyltransferase gene superfamily involved in protein glycosylation and highly expressed in the colon. Aberrant glycosylation is a known alteration that leads to the onset and progression of many cancers, including CRC. Recently, a study found germline mutations in the GALNT12 gene in some CRC patients (Guda et al, 2009). We also included eight GALNT12 variants in our study to further investigate whether the SNP variability of this gene could be involved in genetic susceptibility to CRC. However, we did not find any statistically significant association.
Despite using a thorough and extensive two-phase case–control association study, we were not able to find any new susceptibility loci for CRC risk within these regions. However, some of the SNPs in region 3q22 showed interesting results in the joint analysis combining both cohorts (1361 CRC cases and 1342 controls), especially rs1444601 in TOPBP1 and rs13088006 in CDV3, which maintained statistical significance with an unadjusted P-value in all phases. TOPBP1 (topoisomerase DNA II-binding protein 1) represents a very interesting candidate for CRC genetic susceptibility as it contains multiple BRCT (BRCA1 C-terminal) domains and has a critical role in the control of DNA damage and the replication checkpoint (Gong et al, 2010). On the other hand, CDV3 (carnitine deficiency-associated gene expressed in ventricle 3), also known as H41, seems to be involved in cell proliferation and altered in gastric cancer (Oh et al, 2005). It is noteworthy that the TOPBP1 and CDV3 genes lie next to each other in 3q22 and that rs1444601 and rs13088006 are only 34 kb apart. Therefore, it was of interest to know whether they co-segregated. Unfortunately, there were no available data for rs13088006 in HapMap. However, we used our own genotyping data in Haploview and found that they are not in the same haplotype block and are only moderately correlated (r² = 0.35).
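For reference, the r² statistic used above to describe linkage disequilibrium between two biallelic SNPs can be computed from haplotype and allele frequencies. The Python sketch below shows only the definition; it assumes the haplotype frequency is already known, whereas Haploview estimates it from unphased genotypes. The function name and the input frequencies are hypothetical and are not the study's data.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical frequencies for illustration.
print(f"r2 = {ld_r2(0.20, 0.30, 0.35):.3f}")

# Limiting cases: complete LD gives 1, independence gives 0.
print(ld_r2(0.5, 0.5, 0.5), ld_r2(0.25, 0.5, 0.5))
```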
When we checked our association results in two cohorts described in a previous GWAS (Tomlinson et al, 2007), none of these six variants were significantly associated with CRC risk. There could be differences in terms of the allele frequencies and linkage disequilibrium patterns between CORGI/VQ58 and EPICOLON data. This may explain the lack of replication of our suggestive hits in this external GWAS. In addition, the magnitude of the effect of a risk allele may differ between populations because of gene–gene or gene–environment interactions.
Finally, as a limitation of our study, it should be noted that our cohort sample size is probably not large enough to reach stronger conclusions for the analysed variants. However, our study (1416 CRC cases and 1424 controls for EPICOLON cohorts) had an estimated 80% power to detect an OR as low as 1.26 with an MAF of 0.30, 1.25 for an MAF down to 0.20 or 1.31 for an MAF down to 0.08, assuming a dominant model and α = 0.05. It must also be noted that our conclusions were obtained in the EPICOLON cohorts and that results were additionally checked in the CORGI and VQ58 cohorts, adding up to 3721 CRC cases and 4970 controls. In addition, our results apply only to the analysed SNPs, as we did not comprehensively cover all possible low-penetrance genetic variants within the selected genes. Nevertheless, gene/SNP selection was biased to include genes with a plausible function in cancer and SNPs with a putative functional effect.
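The combined sample size quoted above is the sum of the EPICOLON joint-analysis set and the two external cohorts, as this short Python check shows. The cohort sizes are those stated in the text; the dictionary layout is only illustrative.

```python
# (cases, controls) per cohort, as stated in the text.
cohorts = {
    "EPICOLON (joint analysis)": (1361, 1342),
    "CORGI": (1432, 2697),
    "VQ58": (928, 931),
}

total_cases = sum(cases for cases, _ in cohorts.values())
total_controls = sum(controls for _, controls in cohorts.values())

print(total_cases, total_controls)  # 3721 4970
```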
In conclusion, none of the 172 SNPs initially analysed in our study could be formally associated with CRC risk. However, variants in TOPBP1 and CDV3 showed suggestive results and may have an effect on CRC risk. Despite our negative results, we consider that additional case–control studies in larger CRC cohorts and meta-analysis could be useful to confirm or refute the role of TOPBP1 and CDV3 variants in CRC susceptibility.
This work is published under the standard license to publish agreement. After 12 months the work will become freely available and the license terms will switch to a Creative Commons Attribution-NonCommercial-Share Alike 3.0 Unported License.