Deciphering colorectal cancer genetics through multi-omic analysis of 100,204 cases and 154,587 controls of European and east Asian ancestries

An Author Correction to this article was published on 13 February 2023

This article has been updated

Abstract

Colorectal cancer (CRC) is a leading cause of mortality worldwide. We conducted a genome-wide association study meta-analysis of 100,204 CRC cases and 154,587 controls of European and east Asian ancestry, identifying 205 independent risk associations, of which 50 were unreported. We performed integrative genomic, transcriptomic and methylomic analyses across large bowel mucosa and other tissues. Transcriptome- and methylome-wide association studies revealed an additional 53 risk associations. We identified 155 high-confidence effector genes functionally linked to CRC risk, many of which had no previously established role in CRC. These have multiple different functions and specifically indicate that variation in normal colorectal homeostasis, proliferation, cell adhesion, migration, immunity and microbial interactions determines CRC risk. Cross-tissue analyses indicated that over a third of effector genes most probably act outside the colonic mucosa. Our findings provide insights into colorectal oncogenesis and highlight potential targets across tissues for new CRC treatment and chemoprevention strategies.

Fig. 1: Summary of the study data and analytical design and the number of previously unreported CRC risk loci discovered.
Fig. 2: Effector genes for CRC risk and the cellular processes in which they act.
Fig. 3: Representation of effector genes and their putative actions in the colorectum.

Data availability

Summary-level data for the full set of Asian and European GWASs are available through the GWAS Catalog (accession no. GCST90129505). For individual-level data, CCFR, CORECT, CORSA_2 and GECCO are deposited in dbGaP (accession nos. phs001415.v1.p1, phs001315.v1.p1, phs001078.v1.p1, phs001903.v1.p1, phs001856.v1.p1 and phs001045.v1.p1). NSCCG and COIN are available in the European Genome–Phenome Archive under accession nos. EGAS00001005412 (NSCCG) and EGAS00001005421 (COIN). UK Biobank data are available through http://www.ukbiobank.ac.uk and Finnish data through THL Biobank. Access to individual-level data for the remaining studies is controlled through oversight committees. CCFR 1 and CCFR 2 data can be requested by submitting an application for collaboration to the CCFR (forms, instructions and contact information can be located at www.coloncfr.org/collaboration). Applications for individual-level data from the QUASAR2 and SCOT clinical trials will be assessed by the translational research steering committees that oversee those studies. Individual-level data from the CORGI (UK1) study will be made available subject to standard institutional agreements. Application forms for these three studies, and for Scotland Phase 1, Scotland Phase 2, SOCCS, DACHS4 and Croatia, will be provided by emailing a request to access.crc.gwas.data@outlook.com. For access to CORSA_1, please contact gecco@fredhutch.org. For Generation Scotland (GS), access is through the GS Access Committee (access@generationscotland.org). Applications for the Lothian Birth Cohort data should be made through https://www.ed.ac.uk/lothian-birth-cohorts/data-access-collaboration. For details of the application process for Aichi1, Aichi2, BBJ, Guanzhou1, HCES, HCES2, Korea and Shanghai cohorts, please go to https://swhs-smhs.app.vumc.org or contact W.Z. at wei.zheng@vanderbilt.edu.
CRC-relevant epigenome data were obtained from the National Center for Biotechnology Information’s Gene Expression Omnibus database under accession nos. GSE77737 and GSE36401. Genetically predicted models of gene expression and methylation have been deposited in the Zenodo repository (https://zenodo.org/deposit/6472285).

Code availability

All bioinformatics and statistical analysis tools used in the present study are open source, details of which are available in Methods and Nature Portfolio Reporting Summary. No customized code was used to process or analyze data. Details on URLs used can be found in Supplementary Note.

Acknowledgements

At the Institute of Cancer Research, this work was supported by Cancer Research UK (CRUK, grant no. C1298/A25514 to R.S.H.). Additional support was provided by the National Cancer Research Network. In Edinburgh, the work was supported by program grant funding from CRUK (grant nos. C348/A12076 to M.G.D. and C6199/A16459 to I.T.), EU European Research Council Advanced Grant EVOCAN, and the infrastructure and staffing of the Edinburgh CRUK Cancer Research Centre. C.F.R. was supported by a Marie Sklodowska-Curie Intra-European Fellowship Action (grant no. IEF-301077) for the INTERMPHEN project and received considerable help from many staff in the Department of Endoscopy at the John Radcliffe Hospital in Oxford. Support from the European Union (grant nos. FP7/2007–2013 and 258236), FP7 collaborative project SYSCOL and COST Actions EuColonGene and TransColonCan are also acknowledged (grant nos. BM1206 and CA17118 to I.T.). We are grateful to many colleagues within UK clinical genetics departments (for CORGI) and to many collaborators who participated in the VICTOR, QUASAR2 and SCOT trials. We also thank colleagues from the UK National Cancer Research Network (for NSCCG). I.T. acknowledges funding from CRUK program grant no. C6199/A27327. The work at Vanderbilt University Medical Center was supported by US National Institutes of Health (NIH; grant nos. R01CA188214, R37CA070867, UM1CA182910, R01CA124558, R01CA158473 and R01CA148667), as well as Anne Potter Wilson Chair funds from the Vanderbilt University School of Medicine (to W.Z.). Sample preparation and genotyping assays at Vanderbilt University were conducted at the Survey and Biospecimen Shared Resources and Vanderbilt Microarray Shared Resource, supported in part by the Vanderbilt-Ingram Cancer Center (grant no. P30CA068485). Statistical analyses were performed on servers maintained by the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University.
Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO), National Cancer Institute (NCI), NIH, US Department of Health and Human Services provided grant nos. U01 CA164930, U01 CA137088, R01 CA059045, R01 CA201407 and R01CA206279. Genotyping services were provided by the Center for Inherited Disease Research (CIDR) contract no. HHSN268201200008I. This research was funded in part through the NIH/NCI Cancer Center support grant no. P30 CA015704. Scientific Computing Infrastructure at the Fred Hutchinson Cancer Research Center was funded by ORIP grant no. S10OD028685 (to U.P.). The Colorectal Cancer Transdisciplinary (CORECT) study was supported by the NCI/NIH, US Department of Health and Human Services (grant nos. U19 CA148107, R01 CA81488, P30 CA014089, R01 CA197350, P01 CA196569 and R01 CA201407) and National Institute of Environmental Health Sciences, NIH (grant no. T32 ES013678). The Colon Cancer Family Registry (CCFR) participant recruitment and collection of data and biospecimens used in the present study were supported by the NCI, NIH (grant no. U01 CA167551). OFCCR was supported through funding allocated to the Ontario Registry for Studies of Familial Colorectal Cancer (grant no. U01 CA074783). The content of this manuscript does not necessarily reflect the views or policies of the NCI or any of the collaborating centers in the CCFR, and the mention of trade names, commercial products or organizations does not imply endorsement by the US Government, any cancer registry or the CCFR.

Author information

Contributions

C.F.R., M.N.T., P.J.L., V.M., G.C., S.B.G., I.T., W.Z., M.G.D., R.S.H. and U.P. designed the study. C.F.R., C.P., S.M.F., J.P.B., P.G.V.S., X.O.S., J.L., Q.C., X.G., Y.L.U., P.B., J.S., T.A.H., D.V.C., M.M., G.R., M.O.S., J.O., D.K., S.J., K.J., S.S.K., A.E.S., M.H.S., Y.A., J.E.K., I.O., W.W., K.E.M., K.O.M., C.T., Z.R., Y.G., W.J., J.L.H., M.A.J., A.K.W., R.K.P., J.C.F., R.W.H., S.G., M.O.W., P.A.N., J.P.C., R.K., T.S.M., R.S.K., D.J.K., I.K., J.B., L.P.M., P.J., P.K., L.A.A., H.R., E.P., J.G.E., T.C., U.H., J.O.K., K.P., T.T., L.R., B.Z., S.M., D.A., J.R.P., D.D.B., E.A.P., N.U., E.M.S., S.B.R., A.G., P.T.C., V.M.S., J.C.C., M.H., H.B., M.L.S., J.D.P., M.B.S., M.J.G., N.M., A.C., S.C.B., L.M., V.A., M.S., B.E.P., D.T.B., G.G.G., C.H.H., M.C.S., G.E.I., K.J.M., A.F.Z., J.K.G., K.A.S., F.L., K.O., Y.S., T.O.K., B.V.G., T.J.H., H.H., R.P., R.B.H., M.E.M., P.P., S.C.L., Y.Y., H.J.L., E.W., L.L., A.T.C., M.C.C., A.L., D.J.H., C.S., P.C.S., D.A.N., R.E.S., J.H., Z.K.S., P.E.V., L.V., V.V., N.P., D.S., A.E.T., S.D.M., S.J.C., F.v.D., E.J.M.F., M.G.D., A.W., A.N., B.A.P., L.M.F., L.S.C., S.O., C.K., C.I.L., R.L.P., C.X.Q., S.B.E., C.M.T., E.R.M., L.L.M., A.H.W., C.E.M., G.A.C., C.H., I.J.D., S.E.H., E.T., S.J.R., M.W., L.Y.O., M.A.D., T.U.S., T.Y., N.S., M.I., V.M., G.C., S.B.G., I.T., W.Z., M.D., R.S.H. and U.P. recruited patients and collected samples. 
C.F.R., M.N.T., P.J.L., S.L.S., V.D.O., C.P., S.E.B., V.S., K.D., S.M.F., P.G.V.S., J.L., Q.C., X.G., Y.L.U., P.B., J.S., J.R.H., T.A.H., D.V.C., C.H.D., M.D., F.R.S., M.M., G.R., M.O.S., W.W., J.L.H., D.D., J.P.C., R.K., R.S.K., D.J.K., K.P., D.A., S.J.W., E.A.R.N., J.R.P., E.A.P., K.V., N.U., E.M.S., P.T.C., J.C.C., M.H., H.B., M.L.S., M.J.G., A.C., S.C.B., L.M., B.E.P., M.C.S., G.E.I., A.F.Z., J.K.G., K.A.S., F.L., R.S., T.O.K., S.I.B., S.T., D.A.C., P.P., H.J.L., E.W., K.F.D., E.W.P., A.T.C., A.L., A.D.J., C.S., P.C.S., J.H., C.K.E., D.C.T., A.E.K., F.v.D., E.J.M.F., L.C.S., M.G.D., A.W., L.M.F., S.O., S.A.B., C.K., Y.L.I., C.X.Q., L.L.M., C.Q., C.E.M., S.E.H., E.T., S.J.R., V.M., G.C., S.B.G., I.T., W.Z., M.D., R.S.H. and U.P. carried out the molecular analysis. C.F.R., M.N.T., P.J.L., M.T., Z.C., S.L.S., V.D.O., L.H., J.F.T., C.P., K.I.S., V.S., K.D., J.R.H., M.M., F.M.N., K.P., A.N.S., A.B.K., C.K.E., W.J.G., D.C.T., Y.L.I., C.X.Q., C.Q., S.B.G., I.T., W.Z., M.D., R.S.H. and U.P. analyzed the data. C.F.R., M.N.T., P.J.L., M.T., Z.C., S.L.S., V.D.O., L.H., J.F.T., K.I.S., J.R.H., A.K.W., J.C.F., R.W.H., P.T.C., K.K.T., M.J.G., A.N.S., B.E.P., D.A.C., P.P., M.C.C., A.B.K., L.C.S., S.O., R.L.P., V.M., G.C., S.B.G., I.T., W.Z., M.D., R.S.H. and U.P. interpreted the data. All authors drafted or substantially revised the manuscript. C.F.R., V.M., S.B.G., I.T., M.D., R.S.H. and U.P. supervised the study and acquired funding.

Corresponding authors

Correspondence to Ian Tomlinson, Wei Zheng, Malcolm Dunlop, Richard Houlston or Ulrike Peters.

Ethics declarations

Competing interests

A.C. is a consultant to Bayer Pharma AG, Boehringer Ingelheim and Pfizer Inc. for work unrelated to this manuscript. A.S. is an employee at Insitro, including consulting fees from BMS. H.H. is SAB for Invitae Genetics, Promega and Genome Medical, Stock/Stock options for Genome Medical and GI OnDemand. J.K. is a consultant for Guardant Health. N.P. is a collaborator for Thrive and Exact, PGDx, CAGE, NeoPhore, Vidium and ManaTbio, and receives royalties for licensed technologies according to JHU rules. R.K.P. collaborates with Eli Lilly, AbbVie, Allergan, Verily and Alimentiv, which includes consulting fees (outside the submitted work). S.A.B. has financial interest in Adaptive Biotechnologies. S.B.G. is co-founder, Brogent International LLC. T.S.M. receives research and honoraria from Merck Serono. One of Z.K.S.’s immediate family members serves as a consultant in ophthalmology for Alcon, Adverum, Gyroscope Therapeutics Limited, Neurogene and RegenexBio (outside the submitted work). V.M. has research projects and owns stocks of Aniling. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–5 and Note with references.

Reporting Summary

Supplementary Tables

Legends and data for Supplementary Tables 1–21.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fernandez-Rozadilla, C., Timofeeva, M., Chen, Z. et al. Deciphering colorectal cancer genetics through multi-omic analysis of 100,204 cases and 154,587 controls of European and east Asian ancestries. Nat Genet 55, 89–99 (2023). https://doi.org/10.1038/s41588-022-01222-9

  • DOI: https://doi.org/10.1038/s41588-022-01222-9
