Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk

Abstract

Known genetic loci explain only a small proportion of the familial relative risk of colorectal cancer (CRC). We conducted a genome-wide association study of CRC in East Asians with 14,963 cases and 31,945 controls and identified 6 new loci associated with CRC risk (P = 3.42 × 10−8 to 9.22 × 10−21) at 10q22.3, 10q25.2, 11q12.2, 12p13.31, 17p13.3 and 19q13.2. Two of these loci map to genes (TCF7L2 and TGFB1) with established roles in colorectal tumorigenesis. Four other loci are located in or near genes involved in transcriptional regulation (ZMIZ1), genome maintenance (FEN1), fatty acid metabolism (FADS1 and FADS2), cancer cell motility and metastasis (CD9), and cell growth and differentiation (NXN). We also found suggestive evidence for three additional loci associated with CRC risk near genome-wide significance at 8q24.11, 10q21.1 and 10q24.2. Furthermore, we replicated 22 previously reported CRC-associated loci. Our study provides insights into the genetic basis of CRC and suggests the involvement of new biological pathways.

Main

CRC is a leading cause of cancer morbidity and mortality worldwide1. It is well established that genetic factors have an important role in the etiology of CRC2,3. Deleterious germline mutations in known susceptibility genes, notably APC (adenomatous polyposis coli), MLH1, MSH2, MSH6 and PMS2, confer high risk of CRC in hereditary cancer syndromes3,4,5,6. Most sporadic CRC cases, however, do not carry these high-penetrance mutations3,4. Since 2007, genome-wide association studies (GWAS) and subsequent fine-mapping analyses conducted in individuals of European descent have identified 21 low-penetrance susceptibility loci associated with CRC risk7,8,9,10,11,12,13,14,15,16,17. Together, these common loci explain less than 10% of the familial relative risk of CRC in European populations13,14. In a GWAS of 7,456 CRC cases and 11,671 controls conducted as part of the Asia Colorectal Cancer Consortium, we identified 3 new loci at 5q31.1 (near PITX1), 12p13.32 (near CCND2) and 20p12.3 (near HAO1) associated with CRC risk18. In addition, we discovered a new risk variant in the SMAD7 gene associated with CRC in East Asians19. Over the past 2 years, we have doubled the sample size in the Asia Colorectal Cancer Consortium and conducted a 4-stage GWAS, including 14,963 CRC cases and 31,945 controls, to identify additional susceptibility loci for CRC.

Results

Study overview

We performed a fixed-effects meta-analysis to evaluate approximately 2.4 million genotyped or imputed SNPs on 22 autosomes from 5 GWAS (stage 1) conducted in China, Japan and South Korea, including in total 2,098 CRC cases and 6,172 cancer-free controls (Supplementary Tables 1 and 2). There was little evidence of population stratification in these studies (Supplementary Figs. 1 and 2), with genomic inflation factor λ < 1.04 in each of the five studies and the meta-analysis (λ1,000 = 1.01). We selected 8,539 SNPs showing evidence of association with CRC risk (P < 0.05) according to prespecified criteria (Online Methods). We also included the 31 risk-associated variants identified by previous GWAS7,8,9,10,11,12,13,14,15,16,17,18,19,20, resulting in a total of 8,570 SNPs. Of these, 7,113 SNPs were successfully designed using Illumina Infinium assays as part of a large genotyping effort for multiple projects. Using this customized array, we genotyped an independent set of 3,632 CRC cases and 6,404 controls recruited in 3 studies (stage 2) conducted in China. After quality control exclusions, 6,899 SNPs remained for analysis in 3,519 cases and 6,275 controls. We evaluated associations between CRC risk and these SNPs in each study separately and then performed a fixed-effects meta-analysis to obtain summary estimates. Again, we observed little evidence of population stratification, either in the three studies individually (λ < 1.05) or combined (λ = 1.05, λ1,000 = 1.01) (Supplementary Fig. 3). In a meta-analysis of data from stages 1 and 2, we identified 559 SNPs showing evidence of association at P < 0.005. We then evaluated these SNPs using data from a large Japanese CRC GWAS (stage 3) with 2,814 CRC cases and 11,358 controls20. Thirty SNPs in 25 new loci were associated with CRC risk at P < 0.0001 in the meta-analysis of data from stages 1–3 and at P < 0.01 in the meta-analysis of stages 2 and 3. 
Of these SNPs, 29 were successfully genotyped in an independent sample of 6,532 CRC cases and 8,140 controls from 5 additional studies (stage 4) conducted in China, South Korea and Japan.
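The relationship between the reported λ and λ1,000 values can be illustrated with a short sketch. It assumes the commonly used rescaling of the genomic inflation factor to an equivalent study of 1,000 cases and 1,000 controls; this formula is not stated in this section, and the function name is hypothetical.

```python
def lambda_1000(lam: float, n_cases: int, n_controls: int) -> float:
    """Rescale a genomic inflation factor to 1,000 cases and 1,000 controls.

    Assumption: the conventional rescaling
    lambda_1000 = 1 + (lambda - 1) * (1/n_cases + 1/n_controls) / (1/1000 + 1/1000).
    """
    scale = (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
    return 1 + (lam - 1) * scale


# Stage 2 combined analysis as reported in the text:
# lambda = 1.05 with 3,519 cases and 6,275 controls.
stage2 = lambda_1000(1.05, 3519, 6275)
print(round(stage2, 2))  # 1.01
```

Under this assumption, the stage 2 inflation factor of 1.05 rescales to approximately 1.01, in agreement with the value given above.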

Newly identified risk-associated loci for CRC

In the meta-analysis of all data for the 29 SNPs from stages 1–4 with 14,963 CRC cases and 31,945 controls, signals from 10 SNPs, representing 6 new loci, showed convincing evidence of an association with CRC risk at the genome-wide significance level (P < 5 × 10−8), including rs704017 at 10q22.3; rs11196172 at 10q25.2; rs174537, rs4246215, rs174550 and rs1535 at 11q12.2; rs10849432 at 12p13.31; rs12603526 at 17p13.3; and rs1800469 and rs2241714 at 19q13.2 (Table 1, Supplementary Fig. 4 and Supplementary Tables 3 and 4). Associations of CRC risk with the top SNPs in each of the six loci were consistent across almost all studies, with no evidence of heterogeneity (Fig. 1). With the exception of the intergenic SNP rs10849432 at 12p13.31, the remaining nine newly identified risk-associated variants were located in exonic, promoter, 3′ UTR or intronic regions of known genes (Table 1). The linkage disequilibrium (LD) blocks (r2 > 0.5) tagged by rs704017 (10q22.3), rs174537 (11q12.2) and rs1800469 (19q13.2) each span multiple genes (Supplementary Table 5). The LD blocks tagged by rs11196172 (10q25.2) and rs12603526 (17p13.3) each lie within a single gene. The LD block tagged by rs10849432 (12p13.31) does not contain any known gene. Stratification analyses of the newly identified risk variants by tumor anatomical site (colon or rectum), population (Chinese, Korean or Japanese) and sex (male or female) did not identify any significant heterogeneity (Supplementary Tables 6, 7, 8). In addition to the six newly identified loci, three additional regions showed association with CRC risk near genome-wide significance at 8q24.11 (rs6469656; P = 5.38 × 10−8), 10q21.1 (rs4948317; P = 7.14 × 10−8) and 10q24.2 (rs12412391; P = 7.41 × 10−7). Results for all 29 SNPs across stages 1–4 are presented in Supplementary Table 3.

Table 1 Summary results for risk variants in the six newly identified loci associated with CRC in East Asians
Figure 1: Forest plots for risk-associated variants in the six newly identified loci.

(a) rs704017. (b) rs11196172. (c) rs174537. (d) rs10849432. (e) rs12603526. (f) rs1800469. Per-allele OR estimates are presented, with the area of each box proportional to the inverse-variance weight of the estimate. Horizontal lines represent 95% CIs. Diamonds represent summary OR estimates generated under a fixed-effects meta-analysis; width of the diamonds corresponds to 95% CIs. Continuous vertical lines represent the null value; dashed vertical lines represent the summary OR estimates for all studies for each SNP.
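The inverse-variance weighting behind the summary estimates in the forest plots can be sketched as follows. This is a minimal illustration that assumes each study contributes a per-allele OR with a 95% CI, from which the standard error of the log OR is recovered; the function name and the input values are hypothetical and are not study data.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def fixed_effects_meta(estimates):
    """Pool per-study odds ratios under a fixed-effects model.

    estimates: list of (odds_ratio, ci_low, ci_high) tuples with 95% CIs.
    Returns (pooled_or, pooled_ci_low, pooled_ci_high).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, ci_low, ci_high in estimates:
        beta = math.log(odds_ratio)
        # Standard error of log(OR), recovered from the width of the 95% CI.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
        weight = 1.0 / se ** 2  # inverse-variance weight (box area in the plot)
        weighted_sum += weight * beta
        total_weight += weight
    pooled_beta = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_beta),
        math.exp(pooled_beta - Z95 * pooled_se),
        math.exp(pooled_beta + Z95 * pooled_se),
    )


# Hypothetical per-study estimates for illustration only.
pooled = fixed_effects_meta([(1.10, 1.00, 1.21), (1.10, 1.00, 1.21)])
print(pooled)
```

Studies with narrower CIs receive larger weights, so the pooled CI is always narrower than that of any single contributing study.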

We performed conditional analyses for SNPs located within a 1-Mb region centered on the index SNP in each of the six newly identified loci. No second association signal was identified at P < 0.01 after adjusting for the respective index SNP (data not shown). Four SNPs at 11q12.2 and 2 SNPs at 19q13.2 showed association with CRC risk at P < 5 × 10−8, and we thus performed haplotype analysis for these 2 loci using genotype data available for 10,051 CRC cases and 14,415 controls (stages 2 and 4). Two common haplotypes were found in the 11q12.2 locus, accounting for more than 99% of the haplotypes constructed using the four highly correlated SNPs. The haplotype with all four risk-associated alleles (frequency = 0.574 in controls) was strongly associated with CRC risk (odds ratio (OR) = 1.40, 95% confidence interval (CI) = 1.29–1.51; P = 3.69 × 10−16) (Supplementary Table 9). Similarly, we identified two common haplotypes at the 19q13.2 locus, accounting for more than 99% of the haplotypes constructed using the two highly correlated SNPs. The haplotype with the risk-associated allele at both SNPs (frequency = 0.485 in controls) was also associated with increased risk of CRC (OR = 1.16, 95% CI = 1.08–1.26; P = 1.18 × 10−4) (Supplementary Table 10). Overall, these analyses did not identify an independent signal in any of the six newly identified loci.

We examined potential SNP-SNP interactions between the 6 new risk-associated variants identified in this study (rs704017, rs11196172, rs174537, rs10849432, rs12603526 and rs1800469) and also between these 6 SNPs and the risk-associated variants in 25 previously reported loci (Supplementary Table 11). Multiplicative interactions were found with suggestive evidence of association (P < 0.05) for seven pairs of SNPs. None of these interactions, however, remained statistically significant after correcting for multiple comparisons in 180 tests (adjusted P = 0.00028).
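The adjusted significance threshold quoted above is consistent with a Bonferroni correction of a nominal level of 0.05 across 180 tests. The sketch below assumes that correction; the method is not named explicitly in the text, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 180 pairwise interaction tests at a nominal level of 0.05.
threshold = bonferroni_threshold(0.05, 180)
print(f"{threshold:.5f}")  # 0.00028
```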

We evaluated associations of the 10 newly identified SNPs with CRC risk in individuals of European descent using data from 3 consortia, the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO)17, the Colorectal Transdisciplinary (CORECT) Study and the Colon Cancer Family Registry (CCFR)21, with a total sample size of 16,984 CRC cases and 18,262 controls (Supplementary Table 12). In a meta-analysis of data from these consortia, all ten SNPs showed association with CRC risk in the same direction as observed in East Asians (Table 2). Five SNPs in two loci (10q22.3 and 11q12.2) were associated with CRC risk at P < 0.008 (corrected for multiple comparisons of six loci). These associations in individuals of European descent, however, were weaker than in East Asians. Tests showed statistically significant evidence of heterogeneity for risk variants at 11q12.2 and 19q13.2 (P < 0.008). The frequency of the risk-associated allele was also considerably different in East Asians and individuals of European ancestry for SNPs in five loci (Supplementary Table 13). For example, the minor allele (C) of rs12603526 is common in East Asians, whereas the minor allele frequency (MAF) is <0.02 in individuals of European descent. These differences might in part reflect distinct patterns of LD between the index SNPs and causal SNPs in these two populations. As expected, LD patterns for most of the newly identified loci were considerably different in East Asians and individuals of European descent (Supplementary Fig. 5). Large-scale fine-mapping of these loci will be helpful in identifying causal variants.

Table 2 Associations of risk variants in the six newly identified loci with CRC in individuals of European descent

Putative functional variants and candidate genes

We evaluated and annotated putative functional variants and candidate genes in each of the six newly identified loci using data from the 1000 Genomes Project22, HapMap 2 (ref. 23), the Encyclopedia of DNA Elements (ENCODE)24, expression quantitative trait locus (eQTL) databases25,26,27,28, the Catalogue of Somatic Mutations in Cancer (COSMIC)29, The Cancer Genome Atlas (TCGA) CRC project30, the Expression Atlas31, PubMed and Online Mendelian Inheritance in Man (OMIM) (Online Methods). We summarize the results below for each locus.

At the 10q25.2 locus, rs11196172 is located in intron 4 of the TCF7L2 gene. This SNP and other correlated SNPs (r2 > 0.5) fall within a region with strong enhancer activity and a DNase I hypersensitivity site annotated by ENCODE (Supplementary Table 14), suggesting a potential functional role for these SNPs. We found that the risk-associated allele of rs11196172 was significantly associated with higher expression of the TCF7L2 gene (P = 0.003) in colon tumor tissue using TCGA data (Fig. 2). The TCF7L2 gene encodes TCF7L2 (previously known as TCF4), which is a key transcription factor in the Wnt signaling pathway. Aberrant activation of Wnt signaling is found in more than 90% of CRCs30, and TCF7L2 is a known tumor suppressor in CRC. Loss of TCF7L2 function enhances CRC cell growth, whereas gain of function suppresses CRC cell growth32,33. The TCF7L2 gene is one of the most frequently mutated genes in CRC, with estimated point mutation rates of approximately 8–12.5% (refs. 29,30). Although TCF7L2 is the only gene in this locus (Supplementary Fig. 4), we also found that the risk-associated allele of rs11196172 was significantly associated with higher expression of the VTI1A gene (P = 5.1 × 10−4) in colon tumor tissue (Fig. 2). The VTI1A gene is located approximately 131 kb upstream of the TCF7L2 gene, and mRNA levels for these two genes are highly correlated in colon tumor tissue (r = 0.71; P < 0.0001). Recently, a recurrent gene fusion connecting the first three exons of VTI1A to the fourth exon of TCF7L2 was identified in approximately 3% of colorectal tumors34. It is possible that the VTI1A gene might also be involved in the association between rs11196172 and CRC risk.

Figure 2: Association of selected risk variants identified in this study with gene expression in colon tumor tissue.

(a) rs11196172 and TCF7L2. (b) rs11196172 and VTI1A. (c) rs1535 and FADS2. Gene expression levels are represented by reads per kilobase of exon per million mapped reads (RPKM) values, grouped by the three genotypes of each SNP, which are shown in red, blue and green. The median RPKM values and interquartile ranges (IQRs) for each genotype are presented in the overlaid box plots; whiskers extend from 1.5 times the IQR below the lower quartile to 1.5 times the IQR above the upper quartile. In a and b, RPKM values are shown at normal scale, whereas RPKM values in c are shown with a logarithmic scale owing to departure from a normal distribution. P values for associations between SNP genotypes and gene RPKM values were calculated using a linear regression model.
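The genotype-expression test described in the legend can be sketched as a simple linear regression of expression on allele dosage. Several details here are assumptions rather than the method used in the study: genotypes are coded additively as 0, 1 or 2 copies of the risk-associated allele, no covariates are included, the optional transformation uses base-10 logarithms, and the P value uses a normal approximation to the t statistic. All names and input values are hypothetical.

```python
import math
from statistics import NormalDist, mean


def additive_regression(dosages, expression, log_scale=False):
    """Regress expression on allele dosage (0, 1 or 2).

    Returns (slope, standard_error, approximate_two_sided_p).
    Requires some residual variation (standard_error > 0).
    """
    y = [math.log10(v) for v in expression] if log_scale else list(expression)
    n = len(y)
    x_bar, y_bar = mean(dosages), mean(y)
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    sxy = sum((x - x_bar) * (v - y_bar) for x, v in zip(dosages, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((v - intercept - slope * x) ** 2 for x, v in zip(dosages, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    # Normal approximation to the t statistic; reasonable for large samples.
    p_approx = 2 * (1 - NormalDist().cdf(abs(slope / se)))
    return slope, se, p_approx


# Hypothetical RPKM values for six samples, for illustration only.
slope, se, p = additive_regression([0, 0, 1, 1, 2, 2],
                                   [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(slope, se, p)
```

Setting log_scale=True regresses log-transformed values, which may be preferable for skewed expression data such as those shown in c.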

At the 19q13.2 locus, we identified two perfectly correlated SNPs (rs1800469 and rs2241714; r2 = 1) associated with CRC risk. Of these, rs1800469 has previously been investigated with respect to CRC risk in many small candidate gene association studies, with conflicting results5. We herein provide for the first time, to our knowledge, convincing evidence of association for rs1800469 through our GWAS analysis. SNP rs1800469 maps to the promoter of the TGFB1 gene, and rs2241714 is a nonsynonymous SNP that results in an amino acid substitution at residue 11 of the B9D2 protein. The A allele of rs1800469 has been related to higher transcriptional activity for the TGFB1 gene and higher circulating levels of the transforming growth factor (TGF)-β1 protein than the G allele35. Both rs1800469 and rs2241714 are in perfect LD with another nonsynonymous SNP, rs1800470, which causes a proline-to-leucine substitution at residue 10 of the TGF-β1 protein. Although the two nonsynonymous SNPs are predicted to be tolerated36 or benign37, the Pro10 variant encoded by rs1800470 has also been associated with an increase in TGFB1 gene expression, TGF-β1 protein secretion and circulating levels of TGF-β1 protein38,39,40. Whereas rs2241714 is an eQTL for TGFB1, both rs1800469 and rs2241714 are also eQTLs for other genes in this locus (Supplementary Table 15). In addition to these three SNPs, we suggest that many highly correlated SNPs located in the TGFB1 gene might potentially have regulatory functions (Supplementary Table 14). The TGF-β1 protein is a member of the TGF-β signaling pathway. Somatic alterations of certain components in this pathway (TGFBR2, SMAD2, SMAD3 and SMAD4) are estimated to be present in almost half of CRCs41. High-penetrance germline mutations in the SMAD4 gene are known to cause juvenile polyposis, an autosomal dominant polyposis syndrome linked to a high risk of CRC42. 
Germline, allele-specific expression of the TGFBR1 gene has also been shown to contribute to increased risk of CRC43. Thus far, GWAS have identified at least six other independent SNPs that are located in or proximal to genes in the TGF-β signaling pathway (SMAD7, GREM1, BMP2, BMP4 and RHPN2)9,10,13,19. Our finding of an association between a genetic variant in the TGFB1 gene and CRC risk adds further evidence for the critical role of this pathway in colorectal tumorigenesis.

At the 11q12.2 locus, the four perfectly correlated SNPs rs174537, rs4246215, rs174550 and rs1535 lie in intron 24 of MYRF, the 3′ UTR of FEN1, intron 7 of FADS1 and intron 1 of FADS2, respectively. Of these SNPs, rs4246215 is an eQTL for the FEN1 gene in normal colorectal tissue44 and is predicted to affect microRNA (miRNA) binding site activity45. SNP rs174537 is an eQTL for the FADS1 and FADS2 genes in whole blood and other types of tissue (Supplementary Table 15). Using data from TCGA, we identified a strong correlation of rs1535 genotypes with FADS2 gene expression (P = 1.4 × 10−5) in colon tumor tissue (Fig. 2). These findings suggest that the potential functions of these SNPs might be mediated through their effects on their host genes. We also found that the FEN1, FADS1 and FADS2 genes are all highly expressed in colon tumor tissue compared with normal colon tissue (Supplementary Table 16). The FEN1 gene encodes flap structure–specific endonuclease 1, a protein that is essential for DNA repair, replication and degradation and that has a critical role in maintaining genome stability and protecting against carcinogenesis46. FEN1 mutations have been found in several human cancers47. Mouse models with haploinsufficiency for Fen1 showed rapid progression of CRC and reduced survival48. Two other genes in this locus, FADS1 and FADS2, respectively encode delta-5 and delta-6 desaturases, which are key enzymes in the metabolism of polyunsaturated fatty acids. Of these proteins, delta-6 desaturase is responsible for the synthesis of arachidonic acid49, the precursor of prostaglandin E2 (PGE2), which is a key molecule mediating the effect of cyclooxygenase-2 in colorectal carcinogenesis50. Notably, SNPs in perfect LD with the risk-associated variants for CRC identified in this study are strongly associated with circulating arachidonic acid levels49. 
We have shown previously that high levels of the PGE2 metabolite in urine, a marker of endogenous PGE2 production, are strongly related to higher risk of CRC51. Because the LD block of approximately 190 kb tagged by the four risk-associated variants covers many putatively functional SNPs that are located in the FEN1, FADS1 and FADS2 genes (Supplementary Fig. 6 and Supplementary Table 14), it is difficult to pinpoint a single SNP or gene that might be responsible for the association with CRC risk in this locus. Nevertheless, our study provides evidence of a potentially important role for the FEN1, FADS1 and FADS2 genes in the etiology of CRC.

At the 10q22.3 locus, rs704017 is located in intron 3 of the ZMIZ1-AS1 gene and resides in a strong enhancer region predicted using ENCODE data (Supplementary Fig. 6 and Supplementary Table 14). It also maps to a DNase I hypersensitivity site identified in the Caco-2 CRC cell line. In addition to the ZMIZ1-AS1 gene, the LD block tagged by rs704017 also includes the ZMIZ1 gene, whose expression is downregulated in the Caco-2 and HT-29 CRC cell lines31. In line with these observations, we found in TCGA data that ZMIZ1 gene expression is lower in colon tumor tissue compared with normal colon tissue (P = 3.28 × 10−6). In addition, somatic mutations in the ZMIZ1 gene have been reported in more than 2% of colon tumors29. Whereas ZMIZ1-AS1 is a miscellaneous RNA (miscRNA) gene with unknown function, the ZMIZ1 gene encodes the protein ZMIZ1, which regulates the activity of several transcription factors, including AR, SMAD3, SMAD4 and p53. It has been shown that ZMIZ1 might have a broader role in epithelial cancers, including CRC52. SNP rs704010, located in intron 1 of the ZMIZ1 gene, has been associated with breast cancer53. However, this SNP, which is in weak LD (r2 = 0.09) with the risk-associated variant we identified for CRC, was not associated with CRC in this study (data not shown). Given the biological function of the ZMIZ1 gene, it is possible that this gene is involved in the association observed in this locus.

In the 12p13.31 locus, rs10849432 maps to an LD block of approximately 52 kb with no known genes. ENCODE data suggest that rs4764551 and rs4764552, perfectly correlated with rs10849432, might be located in a strong enhancer region (Supplementary Table 14). Notably, rs4764551 also maps to a DNase I hypersensitivity site in the HCT-116 CRC cell line and a binding site for the CTCF protein in the Caco-2 CRC cell line. Using data from TCGA, we showed that the closest genes to rs10849432 (CD9, PLEKHG6 and TNFRSF1A) all have downregulated expression in colon tumor tissue (Supplementary Table 16). The CD9 gene encodes the CD9 antigen, which participates in many cellular processes, including differentiation, adhesion and signal transduction. Notably, CD9 has a critical role in the suppression of cancer cell motility and metastasis54, and overexpression of the CD9 gene is associated with favorable prognosis for patients with CRC55. CD9 is also involved in suppressing Wnt signaling56. Although the function of the PLEKHG6 gene is less clear, somatic mutations in this gene were found in approximately 2% of colon tumors29. The protein encoded by TNFRSF1A is a major receptor for tumor necrosis factor (TNF)-α and is known to be involved in cytokine-induced senescence in cancer57. In addition to evidence for the three nearby genes, we also found that rs4764552 is an eQTL for the LTBR gene (Supplementary Table 15). The LTβR protein has an essential role in lymphoid organ formation and has also been linked to cancer58, including CRC59. On the basis of these data, we propose that the CD9 gene is the most likely candidate to explain the association identified in this locus. However, potential roles for other genes cannot be excluded.

At the 17p13.3 locus, rs12603526 lies in intron 1 of the NXN gene, in a region covering several regulatory elements, including a DNase I hypersensitivity site, a strong enhancer region and a site with an effect on regulatory motifs as annotated by ENCODE (Supplementary Table 14). NXN gene expression was lower in the colon tumor tissue samples included in TCGA (P = 2.83 × 10−5). Nucleoredoxin, encoded by the NXN gene, has functions related to cell growth and differentiation60. Overexpression of the NXN gene has been found to suppress the Wnt signaling pathway, and nucleoredoxin dysfunction might cause activation of the transcription factor TCF (T cell factor), accelerated cell proliferation and enhancement of oncogenicity61. Further research is needed to determine the causal variant and biological mechanism for the association at this locus.

Previously reported CRC-associated loci in East Asians

We evaluated association evidence for 31 SNPs in 25 established CRC susceptibility loci7,8,9,10,11,12,13,14,15,16,17,18,19,20 by analyzing data from stages 1–3 and our previous GWAS18,19 with a total sample size of up to 11,934 CRC cases and 28,282 controls (Table 3 and Supplementary Table 17). We found further evidence to support the associations of the four loci identified previously in our GWAS conducted among East Asians (P = 1.40 × 10−10 to 3.05 × 10−15). Of the 23 SNPs in the 18 susceptibility loci previously identified by GWAS of individuals of European descent, 20 showed association with CRC risk at P < 0.05 in East Asians in the same direction as reported in the original studies7,8,9,10,11,12,13,14,15,16,17. These signals included 6 SNPs in 4 loci (1q41, 8q24.21, 10p14 and 18q21.1) with association at P < 5 × 10−8, 6 SNPs in 6 loci with association at P < 0.002 (significance level adjusted for multiple comparisons of 25 independent loci) and 8 SNPs in 8 additional loci with association at P < 0.05. Three SNPs in three loci were not associated with CRC risk (P > 0.05). Given that our study had a statistical power of >80% to identify an association with an OR of 1.05 at P = 0.05 for SNPs with a MAF of 0.20, it is unlikely that these three SNPs confer substantial risk of CRC in East Asian populations. In general, loci initially identified in individuals of European descent had smaller ORs in East Asians, with evidence of heterogeneity noted for three SNPs (P < 0.002). SNPs rs6691170 and rs16892766, identified by previous GWAS of individuals of European descent, are not polymorphic in East Asians, and SNP rs5934683 is located on the X chromosome. We did not have data to evaluate the associations of these three SNPs with CRC risk in this study.

Table 3 Association evidence in East Asians for risk variants in previously reported CRC susceptibility loci

Familial relative risk explained by CRC-associated loci

The six newly identified loci in this study explain approximately 2.1% of the familial relative risk of CRC in East Asians (Supplementary Table 18). These variants, together with the four SNPs identified in our previous GWAS, explain approximately 4.3% of the familial relative risk of CRC in East Asians. An additional 3.4% of the familial relative risk in East Asians can be explained by 18 independent SNPs initially identified in studies conducted among individuals of European descent and confirmed in this study. On the basis of per-allele OR values derived from previously published GWAS7,8,9,10,11,12,13,14,15,16,17,18 and this study, we estimate that the SNPs in the 31 loci identified thus far explain approximately 9% of the familial relative risk of CRC in individuals of European descent (Supplementary Table 19), a level slightly higher than the 7.7% explained in East Asians.
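For orientation, the contribution of a single variant to the familial relative risk can be sketched under a widely used log-additive (multiplicative per-allele) model. The formula, the function name and the overall familial relative risk of 2.2 are assumptions made for illustration; the calculation actually used is described in the Online Methods, which are not reproduced in this section.

```python
import math


def frr_fraction(risk_allele_freq: float, per_allele_or: float,
                 overall_frr: float) -> float:
    """Fraction of the familial relative risk attributable to one variant.

    Assumption: multiplicative per-allele model, under which the
    locus-specific familial relative risk is
    lambda = (p * r**2 + q) / (p * r + q)**2,
    and the fraction explained is ln(lambda) / ln(overall_frr).
    """
    p = risk_allele_freq
    q = 1.0 - p
    r = per_allele_or
    locus_frr = (p * r * r + q) / (p * r + q) ** 2
    return math.log(locus_frr) / math.log(overall_frr)


# Hypothetical variant: risk allele frequency 0.30, per-allele OR 1.10,
# assumed overall familial relative risk of 2.2 (placeholder value).
print(f"{100 * frr_fraction(0.30, 1.10, 2.2):.2f}%")
```

Under this model, contributions from independent loci add on the log scale, which is why the percentages reported for separate sets of variants can be summed (4.3% + 3.4% = 7.7% in East Asians).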

Discussion

In the largest GWAS conducted thus far among East Asians, we identified six new genetic loci associated with CRC risk and provided suggestive evidence for three additional previously unreported loci. In addition, we replicated 22 previously reported CRC susceptibility loci. Of the six newly identified loci, two map to genes (TCF7L2 and TGFB1) that have established roles in colorectal tumorigenesis. The other four loci are located in or proximal to genes that are functionally important in transcription regulation (ZMIZ1), genome maintenance (FEN1), fatty acid metabolism (FADS1 and FADS2), cancer cell motility and metastasis (CD9), and cell growth and differentiation (NXN). Risk-associated variants at some loci fall within potential functional regions, and two are associated with the expression levels of the TCF7L2 and FADS2 genes. This study expands current understanding of the genetic basis of CRC risk and provides evidence for new genes and biological pathways that might be involved in colorectal tumorigenesis.

On the basis of a large twin study conducted in Sweden, Denmark and Finland2, the heritabilities estimated for CRC, breast cancer and prostate cancer were 35%, 27% and 42%, respectively. Thus far, more than 70 low-penetrance susceptibility loci have been identified in GWAS for breast cancer62 or prostate cancer63, and these loci together explain approximately 14% and 30%, respectively, of the familial relative risk of these cancers in individuals of European descent. For CRC, however, only 31 low-penetrance susceptibility loci have been identified, explaining approximately 9% of the familial relative risk of CRC in individuals of European descent. Compared with GWAS of breast cancer and prostate cancer, studies conducted for CRC have been relatively small. Our study, in which we evaluated approximately 7,000 promising variants identified by GWAS in the replication stages, represents one of the largest efforts thus far to follow up genetic variants identified by GWAS. We identified six new loci, representing the largest number of new loci identified for CRC risk in a single study. Although multiple GWAS with sample sizes larger than the one in this study have been conducted among individuals of European descent13,14,16, we were still able to identify risk-associated variants with relatively large effect sizes. Our study further highlights the value of conducting GWAS in non-European populations to discover new susceptibility loci for CRC.

In summary, we have identified six new loci associated with CRC risk in this large GWAS conducted among East Asians. These new loci contain genes with established connections to colorectal tumorigenesis through major biological pathways such as Wnt and TGF-β signaling, as well as genes with important biological functions that have not yet been well linked to CRC. Our study considerably expands knowledge of the genetic landscape of CRC and provides direction for future studies to characterize the causal variants and functional mechanisms of these GWAS-identified loci.

Methods

Study participants.

This GWAS was conducted as part of the Asia Colorectal Cancer Consortium, comprising a total of 14,963 CRC cases and 31,945 controls of East Asian ancestry from 14 studies conducted in China, South Korea and Japan (Supplementary Table 1). Specifically, stage 1 (GWAS discovery) consisted of 5 studies: Shanghai CRC Study 1 (Shanghai-1; n = 3,102), Shanghai CRC Study 2 (Shanghai-2; n = 908), Guangzhou CRC Study 1 (Guangzhou-1; n = 1,603), Aichi CRC Study 1 (Aichi-1; n = 1,346) and Korean Cancer Prevention Study-II CRC (KCPS-II; n = 1,301). With the exception of Shanghai-2, for which we added 423 controls from other studies64,65, samples for the remaining 4 studies were the same as we reported in our previous study18. Stage 2 consisted of 3 studies: Shanghai CRC Study 3 (Shanghai-3; n = 6,577), Guangzhou CRC Study 2 (Guangzhou-2; n = 809) and Guangzhou CRC Study 3 (Guangzhou-3; n = 2,408). Stage 3 included 1 study: the BioBank Japan CRC Study (BBJ; n = 14,172). Stage 4 consisted of 5 studies: Guangzhou CRC Study 4 (Guangzhou-4; n = 1,791), Aichi CRC Study 2 (Aichi-2; n = 708), Korean–National Cancer Center CRC Study (Korea-NCC; n = 2,721), Seoul CRC Study (Korea-Seoul; n = 1,522) and Hwasun Cancer Epidemiology Study–Colon and Rectum Cancer (HCES-CRC; n = 7,930). We estimated that our study had a statistical power of >80% to identify an association with an OR of 1.10 or greater at P < 5 × 10−8 for SNPs with a MAF of as low as 0.30. We evaluated the generalizability of the newly identified associations with CRC risk in individuals of European descent in data from 3 consortia including 23 studies (Supplementary Table 12) with a total sample size of 16,984 cases and 18,262 controls recruited in the United States, Europe, Canada and Australia: the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO)17, the Colorectal Transdisciplinary (CORECT) Study and the Colon Cancer Family Registry (CCFR)21. 
Summary descriptions of participating studies are presented in the Supplementary Note. Study protocols were approved by the relevant review boards in the respective institutions, and informed consent was obtained from all study participants.

Laboratory procedures.

Genotyping of samples in stage 1 was conducted as described previously18,64,65,66,67,68,69 using the following platforms: the Affymetrix Genome-Wide Human SNP Array 6.0, the Illumina HumanOmniExpress BeadChip, the Illumina Infinium HumanHap550 BeadChip, the Illumina 660W-Quad BeadChip, the Illumina Human610-Quad BeadChip, the Illumina Infinium HumanHap610 BeadChip and the Affymetrix Genome-Wide Human SNP Array 5.0. We used a uniform quality control protocol as recently described18 to filter samples and SNPs. Genotyping and quality control methods are also presented in the Supplementary Note. After quality control exclusions, we obtained 502,145 autosomal SNPs for samples in Shanghai-1, 245,961 SNPs in Shanghai-2, 250,612 SNPs in Guangzhou-1, 232,426 SNPs in Aichi-1 and 312,869 SNPs in KCPS-II (Supplementary Table 2).

Genotyping for 3,632 cases and 6,404 controls in stage 2 was completed using Illumina Infinium assays as part of the customer add-on content for multiple projects to the Illumina HumanExome BeadChip (see URLs). Details of array design, genotyping, genotype calling and quality control are provided in the Supplementary Note. Samples were excluded according to the following criteria: (i) genotype call rate of <98%, (ii) genetically identical or duplicated samples, (iii) sex determined using genetic data inconsistent with epidemiological or clinical data, (iv) first- or second-degree relatives, (v) ancestry outliers or (vi) heterozygosity outliers. Genetic markers were excluded using the following criteria: (i) MAF = 0, (ii) genotype call rate of <98%, (iii) consistency rate of <98% in positive quality control samples, (iv) P for Hardy-Weinberg equilibrium < 1 × 10−5 in controls or (v) caution SNPs revealed by the Exome Chip Design group (see URLs). We obtained a final data set including 6,899 SNPs genotyped in 3,519 cases and 6,275 controls for this project.
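As an illustration only, the marker-level exclusions listed above can be sketched in Python. The record layout (`MarkerStats`), the function names and the use of a 1-d.f. chi-square test for Hardy-Weinberg equilibrium are assumptions made for this sketch; the text does not state which Hardy-Weinberg test was applied. Criteria (iii) and (v) depend on external data and are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class MarkerStats:
    """Hypothetical per-SNP summary in controls; field names are illustrative."""
    n_aa: int          # controls homozygous for allele A
    n_ab: int          # heterozygous controls
    n_bb: int          # controls homozygous for allele B
    call_rate: float   # fraction of samples with a called genotype


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value from a 1-d.f. chi-square test of Hardy-Weinberg equilibrium
    (an assumed choice of test, not confirmed by the source)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0                    # monomorphic marker: test is undefined
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_marker_qc(m: MarkerStats) -> bool:
    """Apply criteria (i), (ii) and (iv) from the stage 2 marker filters."""
    n = m.n_aa + m.n_ab + m.n_bb
    maf = min(2 * m.n_aa + m.n_ab, 2 * m.n_bb + m.n_ab) / (2 * n)
    if maf == 0.0:                                    # (i) MAF = 0
        return False
    if m.call_rate < 0.98:                            # (ii) call rate < 98%
        return False
    if hwe_chisq_p(m.n_aa, m.n_ab, m.n_bb) < 1e-5:    # (iv) HWE P < 1e-5
        return False
    return True
```

In practice an exact Hardy-Weinberg test, as implemented in standard GWAS software, would usually be preferred for low-frequency markers; the chi-square version is used here only to keep the sketch self-contained.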

Cases and controls in stage 3 were genotyped using the Illumina HumanHap610-Quad BeadChip. Quality control filters were based on criteria described previously20. Methods of genotyping and quality control procedures are also presented in the Supplementary Note. After sample and SNP exclusions, we generated a data set comprising 2,814 cases and 11,358 controls with 460,463 SNPs.

Stage 4 genotyping for 29 SNPs was conducted using the iPLEX Sequenom MassARRAY platform according to manufacturer's protocols at the Vanderbilt Molecular Epidemiology Laboratory (Nashville, Tennessee, USA). Details of genotyping and quality control are provided in the Supplementary Note. We filtered out SNPs with (i) genotype call rate of <95%, (ii) genotyping consistency rate of <95% in positive control samples, (iii) an unclear genotype call or (iv) P for Hardy-Weinberg equilibrium of <1 × 10−5 in controls. The average consistency rate of these SNPs passing quality control filters was 99.9% with a median value of 100% in each of the five participating studies included in this stage.

Samples in GECCO, CORECT and CCFR were genotyped with Illumina and Affymetrix arrays17,21. Genotyping, quality control and imputation have been reported previously17,21 and are described in the Supplementary Note.

SNP selection.

Selection of SNPs for stage 2 replication was primarily based on the following criteria: (i) P < 0.05 in meta-analysis, (ii) P for heterogeneity > 0.0001, (iii) imputation R2 > 0.5 in each of the included studies, (iv) MAF > 0.05 in each of the included studies, (v) SNPs uncorrelated with established CRC SNPs (defined as r2 < 0.2 in the HapMap Asian population), (vi) SNPs uncorrelated with other SNPs identified in this project (r2 < 0.2) and (vii) data available in at least two studies (Supplementary Note). We included multiple SNPs in some regions with a prior association P value of <0.002 or with genes of interest. Risk variants identified from previously published GWAS were also included in the assay7,8,9,10,11,12,13,14,15,16,17,18,19,20. In total, 8,570 unique SNPs were selected. Of these, 7,113 SNPs were successfully designed. For stage 3 replication, we selected 559 SNPs according to the following criteria: (i) P < 0.005 in the meta-analysis of data from stages 1 and 2, (ii) association in the same direction in both stages and (iii) P for heterogeneity > 0.0001. For stage 4, we selected 30 SNPs on the basis of the following criteria: (i) P < 0.0001 in the meta-analysis of stages 1–3, (ii) P < 0.01 in the meta-analysis of stages 2 and 3, (iii) association in the same direction in the three stages and (iv) P for heterogeneity > 0.0001.
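The three stage 3 selection criteria amount to a simple filter over per-SNP summary statistics. The following is a minimal sketch under assumed inputs: the `SnpResult` record, its field names and the function name are hypothetical and are not taken from the study's analysis code.

```python
from dataclasses import dataclass


@dataclass
class SnpResult:
    """Hypothetical summary record for one SNP; field names are illustrative."""
    rsid: str
    beta_stage1: float   # per-allele log OR in stage 1
    beta_stage2: float   # per-allele log OR in stage 2
    p_meta: float        # P value from the meta-analysis of stages 1 and 2
    p_het: float         # P value for heterogeneity


def select_for_stage3(snps):
    """Return the SNPs that meet the three stage 3 criteria."""
    selected = []
    for s in snps:
        # (ii) association in the same direction in both stages
        same_direction = s.beta_stage1 * s.beta_stage2 > 0
        # (i) P < 0.005 in the meta-analysis; (iii) P for heterogeneity > 0.0001
        if s.p_meta < 0.005 and same_direction and s.p_het > 0.0001:
            selected.append(s)
    return selected
```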

Statistical and bioinformatics analysis.

Details of imputation and population substructure evaluation are provided in the Supplementary Note. Briefly, stage 1 imputation was performed with the CHB (Han Chinese in Beijing, China) and JPT (Japanese in Tokyo, Japan) HapMap 2 panel as the reference using the MACH v1.0 program70 (see URLs). Stage 3 imputation was conducted with phased data for JPT, CHS (Southern Han Chinese, China) and CHD (Chinese in Metropolitan Denver, Colorado) participants from 1000 Genomes Project phase 1 release v3 as the reference using MACH v1.0 and Minimac71 (see URLs). Regional imputation of genotype data from TCGA30 (see URLs) was performed with the GIANT ALL reference panel from 1000 Genomes Project phase 1 release v3 using MACH v1.0 and Minimac (see URLs). To evaluate imputation quality in our study, we directly genotyped the 10 newly identified risk variants in the approximately 2,800 samples included in stage 1. The concordance between imputed and genotyped data was very high, with mean values ranging from 96.00% to 99.96% for the ten SNPs (Supplementary Table 20). For rs10849432, the imputation quality for the Aichi-1 study was relatively low (R2 = 0.57), and data from this study were therefore not included in our final analysis. We evaluated population structure in studies included in stages 1 and 2 using principal-components analysis with EIGENSTRAT software72 (see URLs). On the basis of adjusted regression models including the first ten principal components, the genomic inflation factor λ was <1.04 in each of the five studies included in stage 1 and 1.0368 in the meta-analysis of all five studies (Supplementary Fig. 2). The λ value was <1.05 in each of the three studies included in stage 2 and 1.0525 in the meta-analysis of all three studies (Supplementary Fig. 3). 
A rescaled inflation statistic, λ1,000, representing the equivalent value for a study with 1,000 cases and 1,000 controls using the formula73 λ1,000 = 1 + 500 × (λ − 1) × (1/Ncases + 1/Ncontrols) was 1.01 in both stages 1 and 2. These findings show little evidence of population stratification in our studies.
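The rescaling can be checked numerically. The sketch below applies the formula to the stage 2 values given in the text (λ = 1.0525; 3,519 cases and 6,275 controls), assuming that the final stage 2 data set is the relevant sample size; the function name is illustrative.

```python
def lambda_1000(lam: float, n_cases: int, n_controls: int) -> float:
    """Rescale a genomic inflation factor to 1,000 cases and 1,000 controls."""
    return 1.0 + 500.0 * (lam - 1.0) * (1.0 / n_cases + 1.0 / n_controls)


# Stage 2 meta-analysis values reported in the text
stage2 = lambda_1000(1.0525, 3519, 6275)
print(round(stage2, 2))  # prints 1.01
```

Larger samples give a stronger downward rescaling of the same raw λ, which is why the statistic is useful for comparing studies of different sizes.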

Associations between SNPs and CRC risk were evaluated on the basis of the log-additive model using Mach2dat70, PLINK (version 1.07)74, R version 3.0.0 and SAS version 9.3 (for all, see URLs). Per-allele OR estimates and 95% CIs were derived from logistic regression models, adjusting for age, sex and the first ten principal components when appropriate. Association analysis was conducted for each participating study separately, and a fixed-effects meta-analysis was conducted to obtain summary results for each of the four stages and all stages combined with the inverse-variance method using the Metal75 program. SNPs showing an association at P < 5 × 10−8 in the combined analysis of all studies were considered genome-wide significant. We also performed stratified analyses for the top SNPs by tumor anatomical site (colon or rectum), population (Chinese, Korean or Japanese) and sex (male or female). We estimated heterogeneity across studies and subgroups with a Cochran's Q test76, with P for heterogeneity < 0.008 set as statistically significant when considering multiple comparisons of six independent loci. Independent signals in a locus were identified using stepwise logistic regression models conditioning on the top risk-associated variant we identified in each of the new loci using R software (see URLs). We estimated haplotype frequencies using Haploview (version 4.2)77 (see URLs) and conducted haplotype association analysis for two loci (11q12.2 and 19q13.2) where two or more SNPs were identified using SAS Genetics v9.3 with logistic regression models. Pairwise SNP-SNP interactions between 6 top risk-associated variants in the newly identified loci with association P < 5 × 10−8 and also between these 6 SNPs and the risk-associated variants in 25 previously reported loci were evaluated using the maximum-likelihood ratio test with inclusion of interaction terms in logistic regression models.
Interactions with P < 0.00028 were considered statistically significant with adjustment for multiple comparisons of 180 tests.
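For readers unfamiliar with the pooling step, an inverse-variance fixed-effects meta-analysis with Cochran's Q can be sketched as follows. This is a generic illustration of the method, not the Metal program itself; the function name and the returned dictionary are assumptions, and the inputs are per-study log ORs with their standard errors.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects pooling of per-study log ORs."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    # Two-sided P value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return {
        "beta": beta,
        "se": se,
        "or": math.exp(beta),
        "p": p,
        "q": q,
        "df": len(betas) - 1,
    }
```

Under the null hypothesis of no heterogeneity, Q is compared against a chi-square distribution with k − 1 degrees of freedom for k studies; the sketch returns Q and the degrees of freedom and leaves that comparison to a statistics library.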

The familial relative risk (λ) for the offspring of an affected individual due to a single locus was estimated using a log-additive model: $\lambda = (pr^2 + q)/(pr + q)^2$, where p is the frequency of the risk allele, q = 1 − p is the frequency of the reference allele and r is the per-allele relative risk78. The proportion of the familial relative risk explained by this locus, assuming a multiplicative interaction between markers in the locus and other loci, was calculated as $\log(\lambda)/\log(\lambda_0)$, where $\lambda_0$ is the overall familial relative risk; $\lambda_0$ was set to 2.2 for CRC, as estimated from a meta-analysis79. Assuming that the risks associated with individual loci combine multiplicatively, the familial relative risks also multiply. Thus, the combined contribution of the familial relative risks from multiple loci is equal to $\sum_i \log(\lambda_i)/\log(\lambda_0)$, where $\lambda_i$ is the familial relative risk due to locus i.
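A short numerical sketch may help make the calculation concrete. It computes the single-locus familial relative risk, (p·r² + q)/(p·r + q)² with q = 1 − p, and the proportion explained by a set of loci as the sum of log λ over loci divided by log λ0, with λ0 = 2.2 as stated in the text. The function names are assumptions, and any allele frequencies and relative risks passed in are the caller's own inputs, not results from this study.

```python
import math


def locus_frr(p: float, r: float) -> float:
    """Familial relative risk due to one locus under a log-additive model.

    p: risk-allele frequency; r: per-allele relative risk.
    """
    q = 1.0 - p
    return (p * r ** 2 + q) / (p * r + q) ** 2


def proportion_explained(loci, lambda0: float = 2.2) -> float:
    """Proportion of the familial relative risk explained by several loci.

    loci: iterable of (risk-allele frequency, per-allele relative risk).
    Per-locus contributions add on the log scale because the familial
    relative risks are assumed to multiply.
    """
    return sum(math.log(locus_frr(p, r)) for p, r in loci) / math.log(lambda0)
```

For a hypothetical variant with a risk-allele frequency of 0.30 and a per-allele relative risk of 1.10, `proportion_explained([(0.30, 1.10)])` is roughly 0.0025, that is, about a quarter of one percent of the familial relative risk.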

We generated forest plots and quantile-quantile plots using R software (see URLs). Regional association plots for SNPs in newly identified loci were generated using the website-based tool LocusZoom (version 1.1)80 (see URLs). LD structure between SNPs was determined on the basis of data from 1000 Genomes Project Pilot 1 or HapMap 2 as provided by the website-based tool SNAP81 (see URLs) and plotted using Haploview, SNAP and the UCSC Genome Browser (see URLs). LD blocks were defined using HapMap recombination rates and hotspots23. All genomic coordinates are based on NCBI Build 36.

To find putative functional variants for newly identified loci, we identified all SNPs in LD (r2 > 0.5) with the risk-associated variants using data from the 1000 Genomes Project22 and HapMap 2 (ref. 23). We mapped the genomic locations of these SNPs to nonsynonymous sites, splice sites, promoters, nearGene-3 regions, nearGene-5 regions, 3′ UTRs, 5′ UTRs, introns and intergenic regions. We evaluated the potential functional effect of nonsynonymous SNPs using the prediction algorithms SIFT36 and PolyPhen-2 (ref. 37) (see URLs). We predicted the putative function of SNPs in promoters, nearGene-3 regions, nearGene-5 regions, 3′ UTRs and 5′ UTRs with the SNPinfo Web Server45 (see URLs). We conducted analyses to evaluate the potential regulatory effect of SNPs in noncoding regions on transcription using the ENCODE tool HaploReg (v2)82 and the UCSC Genome Browser (see URLs) on the basis of their location within regions of promoter or enhancer activity, DNase I hypersensitivity, local histone modification, proteins bound to these regulatory sites, cis-eQTL and transcription factor binding motifs. We obtained additional functional evidence for these SNPs from the published literature.

We identified all genes that localize to 1-Mb windows centered on the top risk-associated variants in our newly identified loci, including SNPs correlated (r2 > 0.5) with the top risk variants. To determine whether these genes might explain the observed associations in these loci, we first examined genome-wide cis-eQTL data in multiple tissues from four major eQTL databases: the Blood eQTL Browser25, the eQTL Browser26, the Genotype-Tissue Expression (GTEx) Project27 and the Multiple Tissue Human Expression Resource (MuTHER) Project28. The significance threshold for these analyses was set to P < 0.008 to account for six tests. Somatic mutations of these genes were evaluated using data from COSMIC29 (see URLs). Expression levels of these genes in CRC cell lines were assessed using data from the Expression Atlas31 (see URLs). To correct for multiple comparisons of the 11 key genes, associations with P < 0.0045 were considered to be statistically significant. We searched the published literature for these genes with respect to CRC in PubMed and OMIM (see URLs).

Expression analysis.

We downloaded RNA sequencing (level 1) and SNP array (level 2) data for 364 colon adenocarcinoma and 18 normal colon tissue samples from TCGA30 (see URLs). To quantify expression levels of candidate genes in the newly identified loci, we normalized gene expression levels using RPKM (reads per kilobase of exon per million mapped reads) values as previously described83. Expression differences between tumor and normal samples for each gene were evaluated on the basis of RPKM values with the Wilcoxon rank-sum test. Associations between gene RPKM values and SNP genotypes were analyzed using a linear regression model including age and sex as covariates. We converted the RPKM value of a gene to log scale for analysis if it was not normally distributed. We considered P < 0.0045 to be statistically significant with adjustment for testing of the 11 key genes.
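For reference, the RPKM normalization follows directly from its definition (reads per kilobase of exon per million mapped reads). The function below is a minimal sketch with assumed argument names; it is not the pipeline used in the study.

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads.

    Equivalent to read_count / (exon_length_bp / 1e3) / (total_mapped_reads / 1e6).
    """
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)
```

For example, 1,000 reads mapped to a gene with 2,000 bp of exonic sequence in a library of 10 million mapped reads gives `rpkm(1000, 2000, 10_000_000)`, which evaluates to 50.0.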

URLs.

1000 Genomes Browser, http://browser.1000genomes.org/index.html; BioBank Japan (in Japanese), http://biobankjp.org/; Blood eQTL browser, http://genenetwork.nl/bloodeqtlbrowser/; Catalogue of Somatic Mutations in Cancer (COSMIC), http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/; database of Genotypes and Phenotypes (dbGaP), http://www.ncbi.nlm.nih.gov/gap; EIGENSTRAT, http://genepath.med.harvard.edu/~reich/EIGENSTRAT.htm; eQTL Browser from the University of Chicago, http://eqtl.uchicago.edu/Home.html; GTEx eQTL Browser, http://www.ncbi.nlm.nih.gov/projects/gap/eqtl/index.cgi/; Expression Atlas, http://www.ebi.ac.uk/gxa/; Haploview, http://www.broad.mit.edu/mpg/haploview/; HaploReg v2, http://www.broadinstitute.org/mammals/haploreg/haploreg.php; HapMap Project, http://hapmap.ncbi.nlm.nih.gov/; Illumina HumanExome-12v1_A BeadChip, http://genome.sph.umich.edu/wiki/Exome_Chip_Design; International Mouse Phenotyping Consortium (IMPC), https://www.mousephenotype.org/; LocusZoom, http://csg.sph.umich.edu/locuszoom/; MACH 1.0, http://www.sph.umich.edu/csg/abecasis/MACH/; Mach2dat, http://genome.sph.umich.edu/wiki/Mach2dat:_Association_with_MACH_output; Minimac, http://genome.sph.umich.edu/wiki/Minimac; Metal, http://www.sph.umich.edu/csg/abecasis/Metal/; Multiple Tissue Human Expression Resource (MuTHER) Project, http://www.muther.ac.uk/; Online Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm.nih.gov/omim/; PLINK version 1.07, http://pngu.mgh.harvard.edu/~purcell/plink/; PolyPhen-2, http://genetics.bwh.harvard.edu/pph2/; R version 3.0.0, http://www.r-project.org/; SAS version 9.3, http://www.sas.com/; SIFT, http://sift.jcvi.org/; SNAP, http://www.broadinstitute.org/mpg/snap/; The Cancer Genome Atlas (TCGA), http://cancergenome.nih.gov/; TRANSFAC, http://www.gene-regulation.com/pub/databases.html; UCSC Genome Browser, http://genome.ucsc.edu/.

References

1. Jemal, A. et al. Global cancer statistics. CA Cancer J. Clin. 61, 69–90 (2011).
2. Lichtenstein, P. et al. Environmental and heritable factors in the causation of cancer—analyses of cohorts of twins from Sweden, Denmark, and Finland. N. Engl. J. Med. 343, 78–85 (2000).
3. de la Chapelle, A. Genetic predisposition to colorectal cancer. Nat. Rev. Cancer 4, 769–780 (2004).
4. Aaltonen, L., Johns, L., Jarvinen, H., Mecklin, J.P. & Houlston, R. Explaining the familial colorectal cancer risk associated with mismatch repair (MMR)-deficient and MMR-stable tumors. Clin. Cancer Res. 13, 356–361 (2007).
5. Ma, X., Zhang, B. & Zheng, W. Genetic variants associated with colorectal cancer risk: comprehensive research synopsis, meta-analysis, and epidemiological evidence. Gut 63, 326–336 (2014).
6. Palles, C. et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nat. Genet. 45, 136–144 (2013).
7. Zanke, B.W. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat. Genet. 39, 989–994 (2007).
8. Tomlinson, I. et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat. Genet. 39, 984–988 (2007).
9. Broderick, P. et al. A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat. Genet. 39, 1315–1317 (2007).
10. Jaeger, E. et al. Common genetic variants at the CRAC1 (HMPS) locus on chromosome 15q13.3 influence colorectal cancer risk. Nat. Genet. 40, 26–28 (2008).
11. Tenesa, A. et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat. Genet. 40, 631–637 (2008).
12. Tomlinson, I.P. et al. A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat. Genet. 40, 623–630 (2008).
13. Houlston, R.S. et al. Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nat. Genet. 40, 1426–1435 (2008).
14. Houlston, R.S. et al. Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nat. Genet. 42, 973–977 (2010).
15. Tomlinson, I.P. et al. Multiple common susceptibility variants near BMP pathway loci GREM1, BMP4, and BMP2 explain part of the missing heritability of colorectal cancer. PLoS Genet. 7, e1002105 (2011).
16. Dunlop, M.G. et al. Common variation near CDKN1A, POLD3 and SHROOM2 influences colorectal cancer risk. Nat. Genet. 44, 770–776 (2012).
17. Peters, U. et al. Identification of genetic susceptibility loci for colorectal tumors in a genome-wide meta-analysis. Gastroenterology 144, 799–807 (2013).
18. Jia, W.H. et al. Genome-wide association analyses in East Asians identify new susceptibility loci for colorectal cancer. Nat. Genet. 45, 191–196 (2013).
19. Zhang, B. et al. Genome-wide association study identifies a new SMAD7 risk variant associated with colorectal cancer risk in East Asians. Int. J. Cancer 10.1002/ijc.28733 (21 January 2014).
20. Cui, R. et al. Common variant in 6q26-q27 is associated with distal colon cancer in an Asian population. Gut 60, 799–805 (2011).
21. Figueiredo, J.C. et al. Genotype-environment interactions in microsatellite stable/microsatellite instability-low colorectal cancer: results from a genome-wide association study. Cancer Epidemiol. Biomarkers Prev. 20, 758–766 (2011).
22. Abecasis, G.R. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
23. Frazer, K.A. et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007).
24. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
25. Westra, H.J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
26. Degner, J.F. et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394 (2012).
27. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
28. Grundberg, E. et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 44, 1084–1089 (2012).
29. Forbes, S.A. et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 39, D945–D950 (2011).
30. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
31. Kapushesky, M. et al. Gene Expression Atlas update—a value-added database of microarray and sequencing-based functional genomics experiments. Nucleic Acids Res. 40, D1077–D1081 (2012).
32. Tang, W. et al. A genome-wide RNAi screen for Wnt/β-catenin pathway components identifies unexpected roles for TCF transcription factors in cancer. Proc. Natl. Acad. Sci. USA 105, 9697–9702 (2008).
33. Angus-Hill, M.L., Elbert, K.M., Hidalgo, J. & Capecchi, M.R. T-cell factor 4 functions as a tumor suppressor whose disruption modulates colon cell proliferation and tumorigenesis. Proc. Natl. Acad. Sci. USA 108, 4914–4919 (2011).
34. Bass, A.J. et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat. Genet. 43, 964–968 (2011).
35. Grainger, D.J. et al. Genetic control of the circulating concentration of transforming growth factor type β1. Hum. Mol. Genet. 8, 93–97 (1999).
36. Kumar, P., Henikoff, S. & Ng, P.C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–1081 (2009).
37. Adzhubei, I.A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).
38. Dunning, A.M. et al. A transforming growth factor β1 signal peptide variant increases secretion in vitro and is associated with increased incidence of invasive breast cancer. Cancer Res. 63, 2610–2615 (2003).
39. Suthanthiran, M. et al. Transforming growth factor-β1 hyperexpression in African-American hypertensives: a novel mediator of hypertension and/or target organ damage. Proc. Natl. Acad. Sci. USA 97, 3479–3484 (2000).
40. Yamada, Y. et al. Association of a polymorphism of the transforming growth factor-β1 gene with genetic susceptibility to osteoporosis in postmenopausal Japanese women. J. Bone Miner. Res. 13, 1569–1576 (1998).
41. Markowitz, S.D. & Bertagnolli, M.M. Molecular origins of cancer: molecular basis of colorectal cancer. N. Engl. J. Med. 361, 2449–2460 (2009).
42. Howe, J.R. et al. Mutations in the SMAD4/DPC4 gene in juvenile polyposis. Science 280, 1086–1088 (1998).
43. Valle, L. et al. Germline allele-specific expression of TGFBR1 confers an increased risk of colorectal cancer. Science 321, 1361–1365 (2008).
44. Liu, L. et al. Functional FEN1 genetic variants contribute to risk of hepatocellular carcinoma, esophageal cancer, gastric cancer and colorectal cancer. Carcinogenesis 33, 119–123 (2012).
45. Xu, Z. & Taylor, J.A. SNPinfo: integrating GWAS and candidate gene information into functional SNP selection for genetic association studies. Nucleic Acids Res. 37, W600–W605 (2009).
46. Zheng, L. et al. Functional regulation of FEN1 nuclease and its link to cancer. Nucleic Acids Res. 39, 781–794 (2011).
47. Zheng, L. et al. Fen1 mutations result in autoimmunity, chronic inflammation and cancers. Nat. Med. 13, 812–819 (2007).
48. Kucherlapati, M. et al. Haploinsufficiency of Flap endonuclease (Fen1) leads to rapid tumor progression. Proc. Natl. Acad. Sci. USA 99, 9924–9929 (2002).
49. Schaeffer, L. et al. Common genetic variants of the FADS1-FADS2 gene cluster and their reconstructed haplotypes are associated with the fatty acid composition in phospholipids. Hum. Mol. Genet. 15, 1745–1756 (2006).
50. Castellone, M.D., Teramoto, H., Williams, B.O., Druey, K.M. & Gutkind, J.S. Prostaglandin E2 promotes colon cancer cell growth through a Gs-axin–β-catenin signaling axis. Science 310, 1504–1510 (2005).
51. Cai, Q. et al. Prospective study of urinary prostaglandin E2 metabolite and colorectal cancer risk. J. Clin. Oncol. 24, 5010–5016 (2006).
52. Rogers, L.M., Riordan, J.D., Swick, B.L., Meyerholz, D.K. & Dupuy, A.J. Ectopic expression of Zmiz1 induces cutaneous squamous cell malignancies in a mouse model of cancer. J. Invest. Dermatol. 133, 1863–1869 (2013).
53. Turnbull, C. et al. Genome-wide association study identifies five new breast cancer susceptibility loci. Nat. Genet. 42, 504–507 (2010).
54. Ovalle, S. et al. The tetraspanin CD9 inhibits the proliferation and tumorigenicity of human colon carcinoma cells. Int. J. Cancer 121, 2140–2152 (2007).
55. Mori, M. et al. Motility related protein 1 (MRP1/CD9) expression in colon cancer. Clin. Cancer Res. 4, 1507–1510 (1998).
56. Lee, J.H. et al. Glycoprotein 90K, downregulated in advanced colorectal cancer tissues, interacts with CD9/CD82 and suppresses the Wnt/β-catenin signal via ISGylation of β-catenin. Gut 59, 907–917 (2010).
57. Braumüller, H. et al. T-helper-1-cell cytokines drive cancer into senescence. Nature 494, 361–365 (2013).
58. Wolf, M.J., Seleznik, G.M., Zeller, N. & Heikenwalder, M. The unexpected role of lymphotoxin β receptor signaling in carcinogenesis: from lymphoid tissue formation to liver and prostate cancer development. Oncogene 29, 5006–5018 (2010).
59. Lukashev, M. et al. Targeting the lymphotoxin-β receptor with agonist antibodies as a potential cancer therapy. Cancer Res. 66, 9617–9624 (2006).
60. Funato, Y. & Miki, H. Nucleoredoxin, a novel thioredoxin family member involved in cell growth and differentiation. Antioxid. Redox Signal. 9, 1035–1057 (2007).
61. Funato, Y., Michiue, T., Asashima, M. & Miki, H. The thioredoxin-related redox-regulating protein nucleoredoxin inhibits Wnt–β-catenin signalling through Dishevelled. Nat. Cell Biol. 8, 501–508 (2006).
62. Michailidou, K. et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat. Genet. 45, 353–361 (2013).
63. Eeles, R.A. et al. Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array. Nat. Genet. 45, 385–391 (2013).
64. Abnet, C.C. et al. A shared susceptibility locus in PLCE1 at 10q23 for gastric adenocarcinoma and esophageal squamous cell carcinoma. Nat. Genet. 42, 764–767 (2010).
65. Amundadottir, L. et al. Genome-wide association study identifies variants in the ABO locus associated with susceptibility to pancreatic cancer. Nat. Genet. 41, 986–990 (2009).
66. Bei, J.X. et al. A genome-wide association study of nasopharyngeal carcinoma identifies three new susceptibility loci. Nat. Genet. 42, 599–603 (2010).
67. Nakata, I. et al. Association between the SERPING1 gene and age-related macular degeneration and polypoidal choroidal vasculopathy in Japanese. PLoS ONE 6, e19108 (2011).
68. Jee, S.H. et al. Adiponectin concentrations: a genome-wide association study. Am. J. Hum. Genet. 87, 545–552 (2010).
69. Zheng, W. et al. Genome-wide association study identifies a new breast cancer susceptibility locus at 6q25.1. Nat. Genet. 41, 324–328 (2009).
70. Li, Y., Willer, C.J., Ding, J., Scheet, P. & Abecasis, G.R. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet. Epidemiol. 34, 816–834 (2010).
71. Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G.R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 44, 955–959 (2012).
72. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
73. Freedman, M.L. et al. Assessing the impact of population stratification on genetic association studies. Nat. Genet. 36, 388–393 (2004).
74. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
75. Willer, C.J., Li, Y. & Abecasis, G.R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
76. Lau, J., Ioannidis, J.P. & Schmid, C.H. Quantitative synthesis in systematic reviews. Ann. Intern. Med. 127, 820–826 (1997).
77. Barrett, J.C., Fry, B., Maller, J. & Daly, M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005).
78. Zheng, W. et al. Common genetic determinants of breast-cancer risk in East Asian women: a collaborative study of 23 637 breast cancer cases and 25 579 controls. Hum. Mol. Genet. 22, 2539–2550 (2013).
79. Johns, L.E. & Houlston, R.S. A systematic review and meta-analysis of familial colorectal cancer risk. Am. J. Gastroenterol. 96, 2992–3003 (2001).
80. Pruim, R.J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
81. Johnson, A.D. et al. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics 24, 2938–2939 (2008).
82. Ward, L.D. & Kellis, M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 40, D930–D934 (2012).
83. Yan, G. et al. Genome sequencing and comparison of two nonhuman primate animal models, the cynomolgus and Chinese rhesus macaques. Nat. Biotechnol. 29, 1019–1023 (2011).

Acknowledgements

The authors are solely responsible for the scientific content of this paper. The sponsors of this study had no role in study design, data collection, analysis or interpretation, writing of the report or the decision for submission. We thank all study participants and research staff of all parent studies for their contributions and commitment to this project, R. Courtney for DNA preparation, J. He for data processing and analyses, X. Guo for suggestions on bioinformatics analysis, and M.J. Daly and B.J. Rammer for editing and preparing the manuscript. The work at the Vanderbilt University School of Medicine was supported by US National Institutes of Health (NIH) grants R37CA070867, R01CA082729, R01CA124558, R01CA148667 and R01CA122364, as well as by Ingram Professorship and Research Reward funds from the Vanderbilt University School of Medicine. Studies (grant support) participating in the Asia Colorectal Cancer Consortium include the Shanghai Women's Health Study (US NIH, R37CA070867), the Shanghai Men's Health Study (US NIH, R01CA082729), the Shanghai Breast and Endometrial Cancer Studies (US NIH, R01CA064277 and R01CA092585; contributing only controls), Shanghai Colorectal Cancer Study 3 (US NIH, R37CA070867 and Ingram Professorship funds), the Guangzhou Colorectal Cancer Study (National Key Scientific and Technological Project, 2011ZX09307-001-04; the National Basic Research Program, 2011CB504303, contributing only controls; the Natural Science Foundation of China, 81072383, contributing only controls), the Japan BioBank Colorectal Cancer Study (grant from the Ministry of Education, Culture, Sports, Science and Technology of the Japanese government), the Hwasun Cancer Epidemiology Study–Colon and Rectum Cancer (HCES-CRC; grants from the Korea Center for Disease Control and Prevention and the Jeonnam Regional Cancer Center), the Aichi Colorectal Cancer Study (Grant-in-Aid for Cancer Research, grant for the Third Term Comprehensive Control Research for Cancer and Grants-in-Aid for Scientific Research from the Japanese Ministry of Education, Culture, Sports, Science and Technology, 17015018 and 221S0001), the Korea-NCC (National Cancer Center) Colorectal Cancer Study (Basic Science Research Program through the National Research Foundation of Korea, 2010-0010276; National Cancer Center Korea, 0910220), the Korea-Seoul Colorectal Cancer Study (none reported) and the KCPS-II Colorectal Cancer Study (National R&D Program for Cancer Control, 1220180; Seoul R&D Program, 10526).

We also thank all participants, staff and investigators from the GECCO, CORECT and CCFR consortia for making it possible to present results from populations of European ancestry for the new CRC-associated loci identified among East Asians. GECCO, CORECT and CCFR are directed by U. Peters, S. Gruber and G. Casey, respectively. Complete lists of investigators from the GECCO, CORECT and CCFR consortia are provided below.

Investigators (institution and location) in the GECCO consortium include (in alphabetical order) John A. Baron (Division of Gastroenterology and Hepatology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA), Sonja I. Berndt (Division of Cancer Epidemiology and Genetics, National Cancer Institute, US NIH, Bethesda, Maryland, USA), Stéphane Bezieau (Service de Génétique Médicale, Centre Hospitalier Universitaire (CHU) Nantes, Nantes, France), Hermann Brenner (Division of Clinical Epidemiology and Aging Research, German Cancer Research Center, Heidelberg, Germany), Bette J. Caan (Division of Research, Kaiser Permanente Medical Care Program, Oakland, California, USA), Christopher S. Carlson (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, School of Public Health, University of Washington, Seattle, Washington, USA), Graham Casey (Department of Preventive Medicine, University of Southern California Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA), Andrew T. Chan (Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA and Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA), Jenny Chang-Claude (Division of Cancer Epidemiology, German Cancer Research Center, Heidelberg, Germany), Stephen J. Chanock (Division of Cancer Epidemiology and Genetics, National Cancer Institute, US NIH, Bethesda, Maryland, USA), David V. Conti (Department of Preventive Medicine, University of Southern California, Los Angeles, California, USA), Keith Curtis (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA), David Duggan (Translational Genomics Research Institute, Phoenix, Arizona, USA), Charles S. Fuchs (Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA and Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA), Steven Gallinger (Department of Surgery, Mount Sinai Hospital, Toronto, Ontario, Canada and Samuel Lunenfeld Research Institute, Toronto, Ontario, Canada), Edward L. Giovannucci (Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA, Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, USA and Department of Nutrition, Harvard School of Public Health, Boston, Massachusetts, USA), Stephen B. Gruber (University of Southern California Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA), Robert W. Haile (Department of Preventive Medicine, University of Southern California, Los Angeles, California, USA), Tabitha A. Harrison (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA), Richard B. Hayes (Division of Epidemiology, Department of Environmental Medicine, New York University School of Medicine, New York, New York, USA), Michael Hoffmeister (Division of Clinical Epidemiology and Aging Research, German Cancer Research Center, Heidelberg, Germany), John L. Hopper (Melbourne School of Population Health, The University of Melbourne, Melbourne, Victoria, Australia), Li Hsu (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA and Department of Biostatistics, University of Washington, Seattle, Washington, USA), Thomas J. Hudson (Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada), David J. Hunter (Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, USA), Carolyn M. Hutter (Division of Cancer Control and Population Sciences, National Cancer Institute, US NIH, Bethesda, Maryland, USA), Rebecca D. Jackson (Division of Endocrinology, Diabetes and Metabolism, Ohio State University, Columbus, Ohio, USA), Mark A. Jenkins (Melbourne School of Population Health, The University of Melbourne, Melbourne, Victoria, Australia), Shuo Jiao (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA), Sébastien Küry (Service de Génétique Médicale, CHU Nantes, Nantes, France), Loic Le Marchand (Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, USA), Mathieu Lemire (Ontario Institute for Cancer Research, Toronto, Ontario, Canada), Noralane M. Lindor (Department of Health Sciences Research, Mayo Clinic, Scottsdale, Arizona, USA), Jing Ma (Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA), Polly A. Newcomb (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA and Department of Epidemiology, University of Washington School of Public Health, Seattle, Washington, USA), Ulrike Peters (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA and Department of Epidemiology, University of Washington School of Public Health, Seattle, Washington, USA), John D. Potter (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA, Department of Epidemiology, University of Washington School of Public Health, Seattle, Washington, USA and Centre for Public Health Research, Massey University, Palmerston North, New Zealand), Conghui Qu (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA), Thomas Rohan (Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Yeshiva University, Bronx, New York, USA), Robert E. Schoen (Department of Medicine and Epidemiology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA), Fredrick R. Schumacher (Department of Preventive Medicine, University of Southern California, Los Angeles, California, USA), Daniela Seminara (Division of Cancer Control and Population Sciences, National Cancer Institute, US NIH, Bethesda, Maryland, USA), Martha L. Slattery (Department of Internal Medicine, University of Utah Health Sciences Center, Salt Lake City, Utah, USA), Stephen N. Thibodeau (Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota, USA and Department of Laboratory Genetics, Mayo Clinic, Rochester, Minnesota, USA), Emily White (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA and Department of Epidemiology, University of Washington School of Public Health, Seattle, Washington, USA) and Brent W. Zanke (Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada).

Investigators (institution and location) from the CORECT consortium include (in alphabetical order) Kendra Blalock (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA), Peter T. Campbell (Epidemiology Research Program, American Cancer Society, Atlanta, Georgia, USA), Graham Casey (Department of Preventive Medicine, University of Southern California Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA), David V. Conti (Department of Preventive Medicine, University of Southern California Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA), Christopher K. Edlund (Department of Preventive Medicine, University of Southern California Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA), Jane Figueiredo (Department of Preventive Medicine, University of Southern California Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA), W. James Gauderman (Department of Preventive Medicine, University of Southern California Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA), Jian Gong (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA), Roger C. Green (Faculty of Medicine, Memorial University of Newfoundland, St. John's, Newfoundland, Canada), Stephen B. Gruber (University of Southern California Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA), John F. Harju (University of Michigan Comprehensive Cancer Center, University of Michigan, Ann Arbor, Michigan, USA), Tabitha A. Harrison (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA), Eric J. Jacobs (Epidemiology Research Program, American Cancer Society, Atlanta, Georgia, USA), Mark A. Jenkins (Melbourne School of Population Health, The University of Melbourne, Melbourne, Victoria, Australia), Shuo Jiao (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA), Li Li (Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, Ohio, USA), Yi Lin (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA), Frank J. Manion (University of Michigan Comprehensive Cancer Center, University of Michigan, Ann Arbor, Michigan, USA), Victor Moreno (Institut d'Investigació Biomèdica de Bellvitge, Institut Catala d'Oncologia, Hospitalet, Barcelona, Spain), Bhramar Mukherjee (University of Michigan Comprehensive Cancer Center, University of Michigan, Ann Arbor, Michigan, USA), Ulrike Peters (Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA), Leon Raskin (University of Southern California Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA), Fredrick R. Schumacher (Department of Preventive Medicine, University of Southern California Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA), Daniela Seminara (Division of Cancer Control and Population Sciences, National Cancer Institute, US NIH, Bethesda, Maryland, USA), Gianluca Severi (Melbourne School of Population Health, The University of Melbourne, Melbourne, Victoria, Australia), Stephanie L. Stenzel (Department of Preventive Medicine, University of Southern California Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA) and Duncan C. Thomas (Department of Preventive Medicine, University of Southern California Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA).

The CCFR consortium is represented by Graham Casey (Department of Preventive Medicine, University of Southern California Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA).

We also thank B. Buecher of ASTERISK; U. Handte-Daub, M. Celik, R. Hettler-Jensen, U. Benscheid and U. Eilber of DACHS; and P. Soule, H. Ranu, I. Devivo, D.J. Hunter, Q. Guo, L. Zhu and H. Zhang of HPFS, NHS and PHS, as well as the following state cancer registries for their help: Alabama, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Nebraska, New Hampshire, New Jersey, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, Tennessee, Texas, Virginia, Washington and Wyoming. We thank C. Berg and P. Prorok of PLCO; T. Riley of Information Management Services, Inc.; B. O'Brien of Westat, Inc.; B. Kopp and W. Shao of SAIC-Frederick; the WHI investigators (see https://www.whi.org/researchers/SitePages/Write%20a%20Paper.aspx) and the GECCO Coordinating Center. Participating studies (grant support) in the GECCO, CORECT and CCFR GWAS meta-analysis are GECCO (US NIH, U01CA137088 and R01CA059045), DALS (US NIH, R01CA048998), DACHS (German Federal Ministry of Education and Research, BR 1704/6-1, BR 1704/6-3, BR 1704/6-4, CH 117/1-1, 01KH0404 and 01ER0814), HPFS (US NIH, P01CA055075, UM1CA167552, R01137178 and P50CA127003), NHS (US NIH, R01137178, P50CA127003 and P01CA087969), OFCCR (US NIH, U01CA074783), PMH (US NIH, R01CA076366), PHS (US NIH, R01CA042182), VITAL (US NIH, K05CA154337), WHI (US NIH, HHSN268201100046C, HHSN268201100001C, HHSN268201100002C, HHSN268201100003C, HHSN268201100004C, HHSN271201100004C and 268200764316C) and PLCO (US NIH, Z01CP 010200, U01HG004446 and U01HG 004438). 
CORECT is supported by the National Cancer Institute as part of the GAME-ON consortium (US NIH, U19CA148107) with additional support from National Cancer Institute grants (R01CA81488 and P30CA014089), the National Human Genome Research Institute at the US NIH (T32HG000040) and the National Institute of Environmental Health Sciences at the US NIH (T32ES013678). CCFR is supported by the National Cancer Institute, US NIH under RFA CA-95-011 and through cooperative agreements with members of the Colon Cancer Family Registry and principal investigators of the Australasian Colorectal Cancer Family Registry (US NIH, U01CA097735), the Familial Colorectal Neoplasia Collaborative Group (US NIH, U01CA074799) (University of Southern California), the Mayo Clinic Cooperative Family Registry for Colon Cancer Studies (US NIH, U01CA074800), the Ontario Registry for Studies of Familial Colorectal Cancer (US NIH, U01CA074783), the Seattle Colorectal Cancer Family Registry (US NIH, U01CA074794) and the University of Hawaii Colorectal Cancer Family Registry (US NIH, U01CA074806). The GWAS work was supported by a National Cancer Institute grant (US NIH, U01CA122839). OFCCR was supported by a GL2 grant from the Ontario Research Fund, Canadian Institutes of Health Research and a Cancer Risk Evaluation (CaRE) Program grant from the Canadian Cancer Society Research Institute. T.J. Hudson and B.W. Zanke are recipients of Senior Investigator Awards from the Ontario Institute for Cancer Research, through support from the Ontario Ministry of Economic Development and Innovation. ASTERISK was funded by a Regional Hospital Clinical Research Program (PHRC) and supported by the Regional Council of Pays de la Loire, the Groupement des Entreprises Françaises dans la Lutte contre le Cancer (GEFLUC), the Association Anne de Bretagne Génétique and the Ligue Régionale Contre le Cancer (LRCC). 
PLCO data sets were accessed with approval through dbGaP (CGEMS prostate cancer scan, phs000207.v1.p1; CGEMS pancreatic cancer scan, phs000206.v4.p3; and GWAS of Lung Cancer and Smoking, phs000093.v2.p2, which was funded by Z01CP 010200, U01HG004446 and U01HG 004438 from the US NIH).

Author information

Contributions

W.Z. conceived and directed the Asia Colorectal Cancer Consortium and the Shanghai-Vanderbilt Colorectal Cancer Genetics Project. W.-H.J. and Y.-X.Z.; K. Matsuda; S.-S.K.; K. Matsuo; X.-O.S., Y.-B.X. and Y.-T.G.; A.S.; S.H.J.; and D.-H.K. directed CRC projects for the Guangzhou Colorectal Cancer Study, the BioBank Japan Colorectal Cancer Study, the Hwasun Cancer Epidemiology Study–Colon and Rectum Cancer (HCES-CRC), the Aichi Colorectal Cancer Study, the Shanghai studies, the Korea-NCC (National Cancer Center) Colorectal Cancer Study, the KCPS-II Colorectal Cancer Study and the Korea-Seoul Colorectal Cancer Study, respectively. B.Z., Q.C. and W.W. coordinated the project. Q.C. directed laboratory operations. J.S. performed the genotyping experiments. B.Z. performed the statistical and bioinformatics analyses. W.W. contributed to the statistical analyses and data interpretation. A.T. conducted the statistical analyses and imputation for BioBank Japan. B.Z., W.W. and J.L. managed the data. Y.Z. and B.Z. performed the expression analysis for TCGA data. B.Z. and W.Z. wrote the manuscript with significant contributions from X.-O.S., Q.C., J.L., W.W., B.L. and Y.Z. All authors contributed to data and biological sample collection in the original studies included in this project and to manuscript revision. All authors have reviewed and approved the content of the paper.

Corresponding author

Correspondence to Wei Zheng.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

A complete list of members and affiliations appears in the Acknowledgments.

Supplementary information

Supplementary Text and Figures

Supplementary Note, Supplementary Tables 1–20 and Supplementary Figures 1–6 (PDF 9786 kb)

About this article

Cite this article

Zhang, B., Jia, WH., Matsuda, K. et al. Large-scale genetic study in East Asians identifies six new loci associated with colorectal cancer risk. Nat Genet 46, 533–542 (2014). https://doi.org/10.1038/ng.2985
