Common risk variants for colorectal cancer: an evaluation of associations with age at cancer onset

Common genetic risk variants for colorectal cancer (CRC) have been identified at approximately 40 loci by genome-wide association studies (GWAS). We investigated the association of these risk variants by age at onset of CRC using case-only and case-control analysis. A total of 1,962 CRC cases and 2,668 controls from two independent case-control studies conducted by Korea’s National Cancer Center were included in this study. We genotyped 33 GWAS-identified single-nucleotide polymorphisms (SNPs) associated with CRC risk. The risk allele in SNP rs704017, located at 10q22.3 in the ZMIZ1-AS1 gene, was consistently less frequent among CRC patients aged <50 years than among CRC patients aged ≥50 years in the case-only analysis (odds ratio (OR) = 0.78, 95% confidence interval (CI) = 0.66–0.92, P = 2.7 × 10−3, in an additive model), although this did not surpass the threshold for multiple testing. The direction of associations between rs704017 and CRC risk differed by age group in the combined case-control analysis (<50 years: OR = 0.77, 95% CI = 0.60–0.98, P = 0.03 and ≥50 years: OR = 1.13, 95% CI = 0.98–1.29, P = 0.09, in a dominant model); the p-values for heterogeneity (Pheterogeneity = 7.5 × 10−3) and for interaction were statistically significant (Pinteraction = 7.8 × 10−3, in the dominant model). Our results suggest that the CRC susceptibility SNP rs704017 has a hereditary effect on onset age of CRC.

Hereditary factors are thought to contribute about 35% to causation of colorectal cancer (CRC) 1 . This view is supported by the fact that while rare genetic variants with high penetrance do confer a predisposition for inherited forms of CRC, such as the APC gene mutation in familial adenomatous polyposis (FAP) and the mismatch repair (MMR) gene mutation in Lynch syndrome, they account for only about 5% of CRC cases 2 . In order to explain the remaining genetic heritability, genome-wide association studies (GWAS) have identified approximately 40 common genetic loci for sporadic CRC 3 ; susceptibility single-nucleotide polymorphisms (SNPs) are thought to confer weak but cumulative and increasing effects on CRC risk 4 .
Genetic variants in susceptibility SNPs for CRC are likely to influence age at onset 4 . It has been suggested that, compared with late-onset CRC, genetic contributions are enriched in early-onset CRC 5 , which tends to present with clinico-pathologically advanced disease and poor prognosis 6 . Furthermore, the fact that age is distributed differently according to molecular features in sporadic CRC, such as CpG island methylator phenotype (CIMP) 7 , DNA microsatellite instability (MSI) status 8 , precursor adenomas 9 , and mutations in the BRAF or KRAS gene 9 , suggests that distinct genetic backgrounds contribute to early- and late-onset CRC 4 . In addition, a considerable number of genetic variants remain unidentified, and replication studies of previously reported CRC susceptibility SNPs according to age at onset are needed.
We hypothesized that several common genetic variants of susceptibility SNPs could be related to either early or late age at onset of CRC. To test this hypothesis, allele frequencies of 33 susceptibility SNPs identified by previous GWAS were compared between early-onset CRC patients (aged < 50 years) and late-onset CRC patients (aged ≥ 50 years) in a case-only analysis. We also assessed the heterogeneity of associations between SNPs and CRC risk according to age group, and interactions between SNPs and age group, in case-control analyses.
Scientific Reports | 7:40644 | DOI: 10.1038/srep40644
Table 1 shows the baseline characteristics of CRC patients in each study. A total of 1,962 sporadic CRC patients comprising 436 early-onset (aged < 50 years, mean: 42.5 years) patients and 1,526 late-onset (aged ≥ 50 years, mean: 62.2 years) patients were included in this analysis. In both the NCC 2010-2013 and NCC 2000-2004 studies, late-onset CRC patients were more likely than early-onset patients to have a higher body mass index (P NCC 2010-2013 = 0.03, P NCC 2000-2004 < 0.01, and P combined < 0.01) and a lower education level (P NCC 2010-2013 < 0.01, P NCC 2000-2004 < 0.01, and P combined < 0.01). Early-onset patients were more likely than late-onset patients to have reported ever consuming alcohol (P NCC 2000-2004 < 0.01 and P combined < 0.01). In NCC 2010-2013, late-onset CRC patients more frequently reported a history of fecal occult blood testing (FOBT) than early-onset patients. There were no differences in sex, smoking, TNM stage, or CRC site between onset age groups.

Discussion
We found that the risk allele of SNP rs704017 at 10q22.3 (ZMIZ1-AS1) was less frequent among sporadic CRC patients with an early age at onset (< 50 years) than among patients with a late age at onset (≥ 50 years) in our case-only analysis. Furthermore, both heterogeneity and interaction were observed in the association between rs704017 genotypes and CRC risk according to age group (< 50 and ≥ 50 years) in our case-control analysis.
Early-onset CRC comprises approximately 30% hereditary and 70% sporadic cases 10 . The molecular mechanisms driving hereditary early-onset CRC have been well defined as germline mutations, such as the MLH1, MSH2, MSH6, PMS2, and EPCAM mutations in Lynch syndrome and the APC and MUTYH mutations in FAP 11 , whereas those driving sporadic early-onset CRC have not been fully clarified 10 . Although sporadic early-onset CRC is thought to be attributable to common genetic variants with low penetrance 4 , only a few SNPs, including rs10795668 at 10p14, rs3802842 at 11q23.1, and rs4779584 at 15q13.3, have been associated with an increased risk for early-onset CRC 12 .
We found that the risk allele (G) of rs704017 was less frequent among early-onset CRC patients and was associated with increased risk among late-onset CRC patients. Accordingly, it may be that this variant plays a role in genetic predisposition to late-onset CRC. To date, associations of this risk variant with CRC have been reported among East Asians (P = 2.07 × 10−8) and Europeans (P = 4.71 × 10−4) 13 . In those analyses, the mean age of CRC patients was 60.25 years in East Asians and 64.10 years in Europeans, and the analyses included all CRC patients regardless of onset age. Therefore, more association studies are needed to confirm the associations of rs704017 with CRC risk according to onset age.
The SNP rs704017 is located in an intron of the zinc finger MIZ-type containing 1 antisense RNA 1 (ZMIZ1-AS1) gene in the 10q22.3 region. ZMIZ1-AS1 interferes with and inhibits translation of the ZMIZ1 gene. Reduced ZMIZ1 gene expression and greater frequencies of somatic mutations were observed in colon tumors based on data from The Cancer Genome Atlas (TCGA) 14 and the Catalogue of Somatic Mutations in Cancer (COSMIC) 15 . The ZMIZ1 gene encodes a member of the protein inhibitor of activated signal transducer and activator of transcription (STAT) protein family (PIAS). Together with Janus kinase (JAK), the STAT protein belongs to the JAK-STAT signaling pathway, which can control survival, proliferation, and differentiation of various cells 16 . Oncogenic transformation can be promoted by persistently activated STAT proteins resulting from somatic mutations in the JAK-STAT pathway. Such mutations have been identified in patients with a variety of diseases, including myeloproliferative disease, polycythemia vera, megakaryoblastic myeloid leukemia, lymphoblastic leukemia, and uterine leiomyosarcomas 16 , and could also cause CRC 17 .
A large proportion of CRC patients have late-onset sporadic disease without an obvious hereditary syndrome 18 . Although the majority of late-onset CRC is located in the distal colon and is microsatellite stable (MSS), some features more characteristic of late-onset CRC include occurrence in the proximal colon, as well as the presence of MSI via MLH1 gene promoter methylation, chromosomal instability, and a high CpG island methylator phenotype, especially when compared with sporadic early-onset CRC 11 . In addition to these characteristics, constitutively decreased PTEN expression in colon mucosa and p53 were experimentally observed to be associated with a late process of tumorigenesis in CRC 19–21 . Because the PIAS protein family is known to regulate p53 22 and PTEN 23 , tumor development of CRC may also occur late.
On the other hand, rs704017 (G) was less frequent in early-onset than in late-onset CRC and tended to be associated with decreased risk of early-onset CRC. This may be because rs704017 has only small effects on early-onset CRC, in line with both the common disease-common variant hypothesis 24 and the polygenic inheritance model 25 . Moreover, several early-onset sporadic CRC cases without a family history showed features of hereditary CRC, suggesting a role for germline mutations in hMLH1 and hMSH2 in carcinogenesis, in contrast to general sporadic CRC, which is more related to epigenetic changes 26 . Thus, tumorigenesis of early-onset CRC could be more influenced by germline mutators than by somatic mutations.
An age of 50 years has been considered the cut-off for early- vs. late-onset CRC according to previous publications 11,27 . CRC screening is recommended from age 50 years in Korea 28 , as well as in many other national guidelines 29–31 , because screening colonoscopy studies have shown a significantly increased risk of advanced neoplasms among people older than 50 years 32–34 . Additionally, we considered CRC patients aged 65 years or more as late-onset for the sensitivity analysis. We also compared allelic frequencies between CRC patients aged under 30 or 40 (early-onset) and patients aged 50 or 65 years or more (late-onset), but the results were attenuated because of the small sample sizes.
One strength of our study is that we evaluated the association of risk variants according to onset age of CRC throughout both stages of our case-only and case-control analyses. Because case-only analysis is considered to produce more precise estimates than case-control analysis due to both small dispersion and homogeneity 35 , we conducted a case-only analysis before the case-control analysis. From those analyses, we were able to observe the relationship between rs704017 and onset age of CRC.
A limitation of this study is that, after adjustment for multiple testing using the Bonferroni and false-discovery rate (FDR) methods, the association of rs704017 with CRC onset age was not statistically significant. The p-value was 2.7 × 10−3 in the combined dataset when comparing allele frequencies between onset-age groups of CRC patients, whereas the Bonferroni-corrected significance threshold for 33 SNPs was 0.05 divided by 33 (= 1.5 × 10−3), and the FDR-adjusted p-value of rs704017 was estimated to be 0.09. Accordingly, the reported results are presented without adjustment for multiple testing, and further analyses with larger sample sizes are needed to rule out false-positive results and confirm the possible association noted in this study.
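The multiple-testing figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the number of tests, the nominal level, and the rs704017 p-value come from the text, while the function names and the assumption that rs704017 had the smallest of the 33 p-values are ours. Under that assumption, the Benjamini-Hochberg value p × m / rank is an upper bound on the FDR-adjusted p-value, and it comes out close to the reported 0.09.

```python
# Illustrative arithmetic only; not the authors' SAS/STATA code.
N_TESTS = 33          # number of SNPs tested (from the text)
ALPHA = 0.05          # nominal significance level (from the text)
P_RS704017 = 2.7e-3   # case-only p-value in the combined dataset (from the text)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def bh_upper_bound(p: float, rank: int, n_tests: int) -> float:
    """Benjamini-Hochberg value p * m / rank, capped at 1.

    The full procedure also takes a running minimum over larger ranks,
    which can only lower this value, so it is an upper bound.
    """
    return min(1.0, p * n_tests / rank)


threshold = bonferroni_threshold(ALPHA, N_TESTS)
# ASSUMPTION: rs704017 ranks first (smallest p) among the 33 tests.
fdr_if_smallest = bh_upper_bound(P_RS704017, rank=1, n_tests=N_TESTS)

print(f"Bonferroni threshold: {threshold:.2e}")
print(f"rs704017 passes Bonferroni: {P_RS704017 < threshold}")
print(f"BH-adjusted p (if smallest of 33): {fdr_if_smallest:.3f}")
```

Either way, the nominal p-value sits above the Bonferroni threshold, which is the point made in the limitation paragraph.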
In conclusion, we found that the risk variant of rs704017 at 10q22.3 (ZMIZ1-AS1) was significantly less frequent among early-onset sporadic CRC patients, although this did not surpass the threshold for multiple testing. Moreover, the association between rs704017 and risk of CRC tended to be in opposite directions according to the onset age, and heterogeneity and genotype-onset age interaction were observed. To ascertain the role of susceptibility SNPs on the onset age of CRC, further studies are needed.

Study population.
This study used data from two independent, hospital-based case-control studies conducted by the National Cancer Center (NCC) in Korea, NCC 2010-2013 and NCC 2000-2004, the details of which have been reported previously 13,36,37 . NCC 2010-2013 recruited 1,070 newly diagnosed CRC patients who had been surgically treated between 2010 and 2013. The controls were recruited from among people who visited a cancer-screening center at the NCC for a health check-up through a benefit program of the National Health Insurance Corporation between 2007 and 2014. After excluding individuals who did not complete a structured written questionnaire or whose blood sample was insufficient for genotyping, the remaining 703 cases were 1:2 matched with 1,406 controls by sex and age (5-year intervals). Of these, 49 cases and 67 controls who had a first- or second-degree family history of CRC were also excluded. Thus, a total of 654 cases and 1,339 controls from NCC 2010-2013 were included in the analysis.
Data collection.
For CRC patients, general and lifestyle information on age, sex, body mass index, education level, alcohol consumption and smoking habits, and previous FOBT history was obtained in a face-to-face interview conducted by a trained interviewer using a structured, written questionnaire. Clinico-pathological information on tumor-node-metastasis (TNM) stage and CRC site was obtained from patients' medical records from the Center for Colorectal Cancer at the NCC. The control participants completed self-administered questionnaires on general and lifestyle information, after which an interviewer contacted them by phone and confirmed their responses.
Genotyping.
For genotyping, we selected 36 susceptibility SNPs at 27 loci that had been associated with CRC risk by previous GWAS 13,38–47 . For participants in the NCC 2010-2013 study, genomic DNA from blood was extracted using the MagAttract DNA Blood M48 kit and BioRobot M48 automatic extraction equipment (Qiagen, Inc., Valencia, CA, US), according to the manufacturer's instructions. Genotyping was performed using the Agenabio MassArray iPLEX® Gold assay (Agena Bioscience, Inc., San Diego, CA, US), and 32 of the 36 selected SNPs (88.9%) were successfully genotyped. Genotyping for the NCC 2000-2004 study had been conducted using the iPLEX Sequenom MassARRAY platform (Sequenom, Inc., San Diego, CA, US) for 29 susceptibility SNPs as previously described 13,37 , and 28 SNPs overlapped with the 36 SNPs selected for this analysis. Accordingly, one SNP, rs719725, was excluded from both studies. Additionally, two SNPs, rs6691170 and rs16892766, were monomorphic and therefore excluded. Thus, of the originally selected 36 SNPs, a total of 33 GWAS-identified SNPs at 25 loci were included in the analysis (Supplementary Table 1). All experimental methods were approved by the IRB of the NCC and performed in accordance with the manufacturer's guidelines and regulations.

Statistical analysis.
To compare the characteristics of sporadic CRC patients aged < 50 years with those aged ≥ 50 years, we used Student's t-test for continuous variables and the chi-square test for categorical variables. For all selected SNPs, Hardy-Weinberg equilibrium (HWE) was tested among controls. The risk allele frequencies (RAFs) of the SNPs were calculated for early-onset (aged < 50 years) CRC patients, late-onset (aged ≥ 50 years) CRC patients, and all controls. To compare the RAFs of SNPs between onset age groups (< 50 and ≥ 50 years) of the CRC patients under an additive model, a logistic regression model adjusted for sex was used. For the sensitivity analysis, the RAFs of SNPs were also compared between CRC patients aged < 50 years and those aged ≥ 65 years. To investigate associations of susceptibility SNPs with CRC risk according to age group (< 50 and ≥ 50 years) under additive, dominant, and recessive models, we used logistic regression models adjusted for age and sex and stratified by age group. The heterogeneity of the associations between age groups was evaluated with Cochran's Q test. Interactions were assessed with Wald statistics by adding a genotype × age group interaction term to the models. The heterogeneity of the association for SNP rs704017 between NCC 2010-2013 and NCC 2000-2004 was evaluated with Cochran's Q test, and both a random-effects meta-analysis and a pooled analysis were performed. Since there was no statistically significant heterogeneity between the two studies and the two approaches gave similar combined results, the pooled analysis was used for the combined results. To account for multiple comparisons, Bonferroni and false-discovery rate (FDR) corrections were applied. Associations were evaluated by odds ratios (ORs) and 95% confidence intervals (CIs), and p-values less than 0.05 were considered statistically significant. All statistical analyses were two-sided and performed separately in each of the datasets from the two studies and in the combined dataset using SAS version 9.3 software (SAS Institute, Inc., Cary, NC, US) and STATA version 13 software (STATA Corp., College Station, TX, US).
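As an illustration of the Cochran's Q heterogeneity test described above, the following minimal sketch recomputes Q from the age-stratified odds ratios for rs704017 reported in this paper (dominant model, combined dataset). It is not the authors' SAS/STATA code. The standard errors are back-calculated from the rounded 95% CIs, which is our assumption, and the helper names are hypothetical. With these inputs the p-value comes out close to the reported P for heterogeneity of 7.5 × 10−3.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float):
    """Recover log(OR) and its standard error from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(odds_ratio), se


def cochran_q(estimates):
    """Cochran's Q for a list of (log_or, se) pairs, inverse-variance weighted."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))


def chi2_sf_1df(q: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))


# Age-stratified ORs for rs704017 as reported in the text (rounded values).
strata = [
    log_or_and_se(0.77, 0.60, 0.98),  # aged < 50 years
    log_or_and_se(1.13, 0.98, 1.29),  # aged >= 50 years
]

q = cochran_q(strata)
p_het = chi2_sf_1df(q)  # two strata, so Q has 1 degree of freedom
print(f"Q = {q:.2f}, P_heterogeneity = {p_het:.4f}")
```

With only two strata, Q reduces to the squared difference of the log odds ratios divided by the sum of their variances, so the same check can be done by hand.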