Cancer is a disease of the genome, caused by a cell's acquisition of somatic mutations in key cancer genes. These mutations alter pathways involved in regulating cellular growth and interactions with the tissue environment. Until recently, research on the cancer genome was focused on protein-coding genes, which together account for only 1% of the genome. To address this issue, the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Project performed whole genome sequencing and integrative analysis on over 2,600 primary cancers and their matching normal tissues across 38 distinct tumor types. This study revealed the extensive role played by large-scale structural mutations in cancer, identified previously unknown cancer-related mutations in gene regulatory regions, inferred tumor evolution across multiple cancer types, illuminated the interactions between somatic mutations and the transcriptome, and studied the role of germline genetic variants in modulating mutational processes. This collection comprises papers describing the core set of analyses conducted by the PCAWG Consortium, and showcases data, tools, and other resources useful for those who seek to further explore this legacy data set.
Browse the PCAWG publications and associated content, including News and Views, Comment, and Nature editorial. This dedicated collection compiles the PCAWG datasets, other resources and community-generated content.
The flagship paper of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium describes the generation of the integrative analyses of 2,658 cancer whole genomes and their matching normal tissues across 38 tumour types, the structures for international data sharing and standardized analyses, and the main scientific findings from across the consortium studies.
The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium
Whole-genome sequencing data from more than 2,500 cancers of 38 tumour types reveal 16 signatures that can be used to classify somatic structural variants, highlighting the diversity of genomic rearrangements in cancer.
The characterization of 4,645 whole-genome and 19,184 exome sequences, covering most types of cancer, identifies 81 single-base substitution, doublet-base substitution and small-insertion-and-deletion mutational signatures, providing a systematic overview of the mutational processes that contribute to cancer development.
Whole-genome sequencing data for 2,778 cancer samples from 2,658 unique donors across 38 cancer types are used to reconstruct the evolutionary history of cancer, revealing that driver mutations can precede diagnosis by several years to decades.
Analysis of mitochondrial genomes (mtDNA) by using whole-genome sequencing data from 2,658 cancer samples across 38 cancer types identifies hypermutated mtDNA cases, frequent somatic nuclear transfer of mtDNA and high variability of mtDNA copy number in many cancers.
An analysis of 2,954 genomes from 38 cancer subtypes identified 19,166 retrotransposition events in 35% of samples. Aberrant LINE-1 retrotranspositions can lead to the deletion of tumor-suppressor genes as well as the amplification of oncogenes.
Viral pathogen load in cancer genomes is estimated through analysis of sequencing data from 2,656 tumors across 35 cancer types using multiple pathogen-detection pipelines, identifying viruses in 382 genomic and 68 transcriptome datasets.
Analysis of whole-genome sequencing data across 2,658 tumors spanning 38 cancer types shows that chromothripsis is pervasive, with a frequency of more than 50% in several cancer types, contributing to oncogene amplification, gene inactivation and cancer genome evolution.
Joana Carlevaro-Fita, Andrés Lanzós et al. present the Cancer LncRNA Census (CLC), a manually curated dataset of 122 long noncoding RNAs (lncRNAs) with experimentally-validated functions in cancer based on data from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. CLC lncRNAs have unique gene features, and a number display evidence for cancer-driving functions that are conserved from humans to mice.
Multi-omics datasets pose major challenges to data interpretation and hypothesis generation owing to their high-dimensional molecular profiles. Here, the authors develop the ActivePathways method, which uses data fusion techniques for integrative pathway analysis of multi-omics data and candidate gene discovery.
Understanding deregulation of biological pathways in cancer can provide insight into disease etiology and potential therapies. Here, as part of the PanCancer Analysis of Whole Genomes (PCAWG) consortium, the authors present pathway and network analysis of 2,583 whole cancer genomes from 27 tumour types.
Some cancer patients first present with metastases where the location of the primary is unidentified; these are difficult to treat. In this study, using machine learning, the authors develop a method to determine the tissue of origin of a cancer based on whole-genome sequencing data.
In this study, the authors consider the structural variants (SVs) present within cancer cases of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. They report hundreds of genes, including known cancer-associated genes, for which the nearby presence of an SV breakpoint is associated with altered expression.
In somatic cells the mechanisms maintaining the chromosome ends are normally inactivated; however, cancer cells can re-activate these pathways to support continuous growth. Here, the authors characterize the telomeric landscapes across tumour types and identify genomic alterations associated with different telomere maintenance mechanisms.
Analysis of cancer genome sequencing data has enabled the discovery of driver mutations. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, the authors present DriverPower, a software package that identifies coding and non-coding driver mutations within cancer whole genomes via consideration of mutational burden and functional impact evidence.
PCAWG Drivers and Functional Interpretation Working Group ⋯
Many tumours exhibit hypoxia (low oxygen) and hypoxic tumours often respond poorly to therapy. Here, the authors quantify hypoxia in 1,188 tumours from 27 cancer types, showing that elevated hypoxia is linked to increased mutational load and directs evolutionary trajectories.
Cancers evolve as they progress under differing selective pressures. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, the authors present TrackSig, a method that estimates evolutionary trajectories of somatic mutational processes from single bulk tumour data.
There’s an emerging body of evidence to show how biological sex impacts cancer incidence, treatment and underlying biology. Here, using a large pan-cancer dataset, the authors further highlight how sex differences shape the cancer genome.
With the generation of large pan-cancer whole-exome and whole-genome sequencing projects, a question remains about how comparable these datasets are. Here, using The Cancer Genome Atlas samples analysed as part of the Pan-Cancer Analysis of Whole Genomes project, the authors explore the concordance of mutations called by whole exome sequencing and whole genome sequencing techniques.