The flagship paper of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium describes the generation of the integrative analyses of 2,658 cancer whole genomes and their matching normal tissues across 38 tumour types, the structures for international data sharing and standardized analyses, and the main scientific findings from across the consortium studies.
Pan-Cancer Analysis of Whole Genomes
Cancer is a disease of the genome, caused by a cell's acquisition of somatic mutations in key cancer genes. These mutations alter pathways involved in regulating cellular growth and interactions with the tissue environment. Until recently, research on the cancer genome was focused on protein-coding genes, which together account for only 1% of the genome. To address this issue, the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Project performed whole genome sequencing and integrative analysis on over 2,600 primary cancers and their matching normal tissues across 38 distinct tumor types. This study revealed the extensive role played by large-scale structural mutations in cancer, identified previously unknown cancer-related mutations in gene regulatory regions, inferred tumor evolution across multiple cancer types, illuminated the interactions between somatic mutations and the transcriptome, and studied the role of germline genetic variants in modulating mutational processes. This collection comprises papers describing the core set of analyses conducted by the PCAWG Consortium, and showcases data, tools, and other resources useful for those who seek to further explore this legacy data set.
Browse the PCAWG publications and associated content, including News and Views, Comment, and Nature editorial. This dedicated collection compiles the PCAWG datasets, other resources and community-generated content.
Whole-genome sequencing data from more than 2,500 cancers of 38 tumour types reveal 16 signatures that can be used to classify somatic structural variants, highlighting the diversity of genomic rearrangements in cancer.
The characterization of 4,645 whole-genome and 19,184 exome sequences, covering most types of cancer, identifies 81 single-base substitution, doublet-base substitution and small-insertion-and-deletion mutational signatures, providing a systematic overview of the mutational processes that contribute to cancer development.
Whole-genome sequencing data for 2,778 cancer samples from 2,658 unique donors across 38 cancer types are used to reconstruct the evolutionary history of cancer, revealing that driver mutations can precede diagnosis by several years to decades.
Integrative analyses of transcriptome and whole-genome sequencing data for 1,188 tumours across 27 types of cancer are used to provide a comprehensive catalogue of RNA-level alterations in cancer.
Analyses of 2,658 whole genomes across 38 types of cancer identify the contribution of non-coding point mutations and structural variants to driving cancer.
Analysis of mitochondrial genomes (mtDNA) by using whole-genome sequencing data from 2,658 cancer samples across 38 cancer types identifies hypermutated mtDNA cases, frequent somatic nuclear transfer of mtDNA and high variability of mtDNA copy number in many cancers.
A pan-cancer genomic analysis reports the effects of structural variations on chromatin domains (TADs). Most TAD disruptions do not result in appreciable changes in expression of nearby genes.
Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition
An analysis of 2,954 genomes from 38 cancer subtypes identified 19,166 retrotransposition events in 35% of samples. Aberrant LINE-1 retrotranspositions can lead to the deletion of tumor-suppressor genes as well as the amplification of oncogenes.
Viral pathogen load in cancer genomes is estimated through analysis of sequencing data from 2,656 tumors across 35 cancer types using multiple pathogen-detection pipelines, identifying viruses in 382 genomic and 68 transcriptome datasets.
Analysis of whole-genome sequencing data across 2,658 tumors spanning 38 cancer types shows that chromothripsis is pervasive, with a frequency of more than 50% in several cancer types, contributing to oncogene amplification, gene inactivation and cancer genome evolution.
Efficient, large-scale genomic analysis is facilitated on the cloud by a computational tool with error-diagnosing and self-healing capabilities.
Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis
Joana Carlevaro-Fita, Andrés Lanzós et al. present the Cancer LncRNA Census (CLC), a manually curated dataset of 122 long noncoding RNAs (lncRNAs) with experimentally-validated functions in cancer based on data from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. CLC lncRNAs have unique gene features, and a number display evidence for cancer-driving functions that are conserved from humans to mice.
Multi-omics datasets pose major challenges to data interpretation and hypothesis generation owing to their high-dimensional molecular profiles. Here, the authors develop the ActivePathways method, which uses data fusion techniques for integrative pathway analysis of multi-omics data and candidate gene discovery.
Understanding deregulation of biological pathways in cancer can provide insight into disease etiology and potential therapies. Here, as part of the PanCancer Analysis of Whole Genomes (PCAWG) consortium, the authors present pathway and network analysis of 2583 whole cancer genomes from 27 tumour types.
A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns
Some cancer patients first present with metastases where the location of the primary is unidentified; these are difficult to treat. In this study, using machine learning, the authors develop a method to determine the tissue of origin of a cancer based on whole-genome sequencing data.
High-coverage whole-genome analysis of 1220 cancers reveals hundreds of genes deregulated by rearrangement-mediated cis-regulatory alterations
In this study, the authors consider the structural variants (SVs) present within cancer cases of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. They report hundreds of genes, including known cancer-associated genes, for which the nearby presence of an SV breakpoint is associated with altered expression.
In somatic cells the mechanisms maintaining the chromosome ends are normally inactivated; however, cancer cells can re-activate these pathways to support continuous growth. Here, the authors characterize the telomeric landscapes across tumour types and identify genomic alterations associated with different telomere maintenance mechanisms.
Analysis of cancer genome sequencing data has enabled the discovery of driver mutations. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, the authors present DriverPower, a software package that identifies coding and non-coding driver mutations within cancer whole genomes via consideration of mutational burden and functional impact evidence.
The authors present SVclone, a computational method for inferring the cancer cell fraction of structural variants from whole-genome sequencing data.
Many tumours exhibit hypoxia (low oxygen), and hypoxic tumours often respond poorly to therapy. Here, the authors quantify hypoxia in 1188 tumours from 27 cancer types, showing that elevated hypoxia is linked to increased mutational load and directs evolutionary trajectories.
A collection of papers and related content across Nature Research explores insights from the PCAWG project
The future of cancer genomics lies in the clinic.
Efforts to protect people’s privacy in a massive international cancer project offer lessons for data sharing.
A massive international effort has yielded multifaceted studies of more than 2,600 tumours from 38 tissues, generating a wealth of insights into the genetic basis of cancer.