The genomic landscape of 2,023 colorectal cancers

Colorectal carcinoma (CRC) is a common cause of mortality[1], but a comprehensive description of its genomic landscape is lacking[2–9]. Here we perform whole-genome sequencing of 2,023 CRC samples from participants in the UK 100,000 Genomes Project, thereby providing a highly detailed somatic mutational landscape of this cancer. Integrated analyses identify more than 250 putative CRC driver genes, many not previously implicated in CRC or other cancers, including several recurrent changes outside the coding genome. We extend the molecular pathways involved in CRC development, define four new common subgroups of microsatellite-stable CRC based on genomic features and show that these groups have independent prognostic associations. We also characterize several rare molecular CRC subgroups, some with potential clinical relevance, including cancers with both microsatellite and chromosomal instability. We demonstrate a spectrum of mutational profiles across the colorectum, which reflect aetiological differences. These include the role of Escherichia coli pks+ colibactin in rectal cancers[10] and the importance of the SBS93 signature[11–13], which suggests that diet or smoking is a risk factor. Immune-escape driver mutations[14] are near-ubiquitous in hypermutant tumours and occur in about half of microsatellite-stable CRCs, often in the form of HLA copy number changes. Many driver mutations are actionable, including those associated with rare subgroups (for example, BRCA1 and IDH1), highlighting the role of whole-genome sequencing in optimizing patient care.


nature portfolio | reporting summary
April 2023

Field-specific reporting
Life sciences

Life sciences study design
Recruitment
Participant recruitment was by NHS staff. Recruitment was open to all patients with colorectal carcinoma who were able to provide informed consent. Small biases are likely, arising from patient willingness to take part in research and from clinical features. For example, patients presenting as emergencies were likely to be under-recruited, and patients treated with neoadjuvant therapy may be under-represented owing to a very small cancer or an impure sample following that therapy.
Ethics oversight
Ethical approval was provided to the 100,000 Genomes Project by the HRA Committee East of England - Cambridge South Research Ethics Committee (REC Ref 14/EE/1112). Samples were obtained as part of the 100kGP cancer programme, an initiative for high-throughput tumour sequencing for NHS cancer patients. Patient recruitment was organised by 13 Genomic Medicine Centres (GMCs) and their affiliated hospitals across England. All patients provided written informed consent. Study oversight was subsequently undertaken by Genomics England through regular reporting updates to the GeCIP steering committee and the data Airlock committee.
Sample size
Sample size was determined by the recruitment achieved by NHS staff, by the availability of tumour and matched normal samples for DNA extraction, and by subsequent quality control of the extracted DNA. In addition, some samples were excluded from copy number analysis because no fit consistent with the reported purity metrics could be established.
Data exclusions
Exclusions were based on low sample purity, standard sequencing quality metrics and, for sub-studies, availability of clinicopathological data. Sequence data were also excluded from regions with duplications or repeats, from regions of low mappability, and where sequencing chemistry errors (e.g. strand bias) were present. All criteria were based on standards or norms in the field, although some additional exclusions were made ad hoc on the basis of our own findings.
Replication
Comparisons with previous work in the field were performed wherever possible. Almost all the common colorectal cancer driver mutations and copy number alterations found by other studies were also found by us, and there was overlap with previously reported mutational signatures. However, we replicated only ~7% of previously reported drivers. Some signatures were present at much higher frequencies in our data than in other data sets, and others were absent from our data. We make relevant comparisons with previous data at various points in the manuscript. Because some of our discoveries were of uncommon mutations or cancer subgroups, we did not subdivide our study into test and validation patient sets. We did, however, test the stability of mutational signatures and derived clusters by analysing random subsets of the data.
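The summary does not say how the stability of signatures and clusters was scored on random subsets. As a generic illustration only, not the authors' pipeline, subsampling stability can be sketched as follows. Repeatedly draw a random subset, re-cluster it, and compare the subset labels with the full-data labels for the same items using the Rand index. The function names `rand_index`, `largest_gap_split` and `subsample_stability` are hypothetical. The toy one-dimensional clusterer merely stands in for a real signature-extraction or clustering step.

```python
import random
from itertools import combinations
from statistics import mean


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree (same vs different cluster)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def largest_gap_split(values):
    """Hypothetical toy clusterer: split 1-D values into two groups at the widest gap."""
    if len(values) < 2:
        return [0] * len(values)
    ordered = sorted(values)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    threshold = ordered[gaps.index(max(gaps))]  # values <= threshold form cluster 0
    return [0 if v <= threshold else 1 for v in values]


def subsample_stability(data, cluster_fn, fraction=0.8, n_iter=100, seed=0):
    """Mean Rand index between full-data labels and labels from re-clustered random subsets."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    full_labels = cluster_fn(data)
    size = min(len(data), max(2, round(fraction * len(data))))
    scores = []
    for _ in range(n_iter):
        idx = sorted(rng.sample(range(len(data)), size))  # sample without replacement
        sub_labels = cluster_fn([data[i] for i in idx])
        ref_labels = [full_labels[i] for i in idx]  # full-data labels for the same items
        scores.append(rand_index(ref_labels, sub_labels))
    return mean(scores)


values = [0.1, 0.2, 0.15, 0.3, 5.0, 5.2, 4.9, 5.1]
print(subsample_stability(values, largest_gap_split, fraction=0.75, n_iter=50))  # prints 1.0
```

A score near 1 means the subsets reproduce the full-data partition. Lower scores flag clusters that depend on which samples happen to be included. The Rand index is used here because it ignores arbitrary label numbering between runs.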
Randomization
This was not an intervention-based study, so randomisation was not applicable.
Blinding
N/A. The study has no assessments or procedures for which blinding is appropriate.

Clinical data

Clinical trial registration
N/A
Study protocol
The study protocol is described in https://www.bmj.com/content/361/bmj.k1687
Data collection
Data were collected within Genomics England Genomic Medicine Centres and their satellite hospitals, with central data collection by the Genomics England core team.
Outcomes
Certain studies have used overall survival as an outcome. Other outcomes include fundamental measures recorded on the histopathological reporting proforma for colorectal malignancy (e.g. stage).