
Pan-Cancer Analysis of Whole Genomes

A collection of research and related content from the ICGC/TCGA consortium on whole‑genome sequencing and integrative analysis of cancer

Cancer is a disease of the genome, caused by a cell's acquisition of somatic mutations in key cancer genes. These mutations alter pathways involved in regulating cellular growth and interactions with the tissue environment. Until recently, research on the cancer genome focused largely on protein-coding genes, which together account for only about 1% of the genome. To extend analysis to the rest of the genome, the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Project performed whole‑genome sequencing and integrative analysis on more than 2,600 primary cancers and their matching normal tissues across 38 distinct tumour types.

Flagship paper

Nature

Pan-cancer analysis of whole genomes

Description of the PCAWG resource of >2,600 whole cancer genomes and their matching normal tissues across 38 tumour types, including data, portals, analysis pipelines and downstream integrative analyses. A full list of authors (pdf 482 kb) is available for download.

Structural variation

One of the advantages of sequencing whole genomes is the ability to move beyond characterization of point mutations. Analysis of structural variants, including insertions, deletions, rearrangements and transposable-element insertions, gives a much richer and more accurate picture of the types of genomic lesions in a tumour. These papers analyse the causes and patterns of structural variation in the PCAWG data set and shed light on its contribution to tumorigenesis, highlighting its potential clinical relevance.

Nature

Patterns of somatic structural variation in human cancer genomes

Analysis of patterns and signatures of structural variants across PCAWG, identifying 16 signatures of structural variation, including a new set of replication-based processes generating clusters of several rearrangements.

Nature Genetics

Disruption of chromatin folding domains by somatic genomic rearrangements in human cancer

Genomic rearrangements can alter the 3D chromatin organization inside the nucleus; this study describes the prevalence and effects of these mutations on chromatin folding domains and gene expression in human cancers.

Nature Genetics

Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition

A computational approach to study retrotransposons (‘jumping genes’) in the human genome finds that they can participate in the origin and development of some human tumours.

Nature Genetics

Comprehensive analysis of chromothripsis in 2,658 human cancers using whole‑genome sequencing

Chromothripsis is found to be much more prevalent across cancers than previously thought, with a frequency of >50% in several cancer types.

Tumour evolution

Cancer cells are subject to selective forces shaped by mutation rates and the microenvironment, among other factors. PCAWG researchers use the information obtained from whole‑genome sequencing to delineate more precisely the parameters that influence tumour evolution, and how that evolution shapes the cancer genome. Looking at cancer through an evolutionary lens can give clues about metastasis, therapy response and resistance.

Nature

The evolutionary history of 2,658 cancers

By reconstructing the life history of cancers from their genomes, the study determines the evolutionary trajectories of cancers, showing that cancers can develop over many years, sometimes even decades, and highlighting opportunities for early cancer detection.

Communications Biology

Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis

A resource of oncogenic and tumour suppressor long noncoding RNAs reveals evidence for deep evolutionary conservation of their functions since the human–mouse divergence.

Mutational signatures

Mutational signatures are characteristic patterns of base changes in specific nucleotide contexts that reflect the collective actions of endogenous or exogenous mutagenic forces in combination with molecular repair processes. Analysing whole‑genome sequences gives a clearer picture of genome-wide mutational signature patterns, providing insights into the aetiology of carcinogenesis.
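To make "specific nucleotide contexts" concrete: single-base substitution signatures are commonly described over 96 channels, the six substitution classes written with a pyrimidine reference base (C>A, C>G, C>T, T>A, T>C, T>G) combined with the 16 possible pairs of 5' and 3' flanking bases. The sketch below is a minimal illustration of that classification, assuming each mutation is supplied as a reference trinucleotide (mutated base in the middle) plus the alternate base. The function names `sbs96_channel` and `catalogue` are hypothetical, and this is not the PCAWG analysis pipeline.

```python
from collections import Counter
from itertools import product

# Complement table used to re-express purine-reference substitutions on the other strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = {"A", "C", "G", "T"}

# The six substitution classes, always written with a pyrimidine (C or T) reference base.
SUBSTITUTION_CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# All 96 channels in a fixed order, labelled like "A[C>T]G" (5' base, substitution, 3' base).
CHANNELS = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTION_CLASSES
    for five, three in product("ACGT", repeat=2)
]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(ref_context: str, alt: str) -> str:
    """Map one substitution to its 96-channel label (hypothetical input format).

    ref_context: reference trinucleotide with the mutated base in the middle.
    alt: alternate base observed at the middle position.
    """
    ref_context, alt = ref_context.upper(), alt.upper()
    if len(ref_context) != 3 or not set(ref_context) <= BASES or alt not in BASES:
        raise ValueError("expected a DNA trinucleotide and a single alternate base")
    if ref_context[1] == alt:
        raise ValueError("alternate base must differ from the reference base")
    if ref_context[1] in "AG":
        # Purine reference: describe the same event on the opposite strand,
        # so e.g. G>A in TGA becomes C>T in TCA.
        ref_context = reverse_complement(ref_context)
        alt = alt.translate(COMPLEMENT)
    return f"{ref_context[0]}[{ref_context[1]}>{alt}]{ref_context[2]}"


def catalogue(mutations):
    """Count (trinucleotide, alt) pairs per channel; returns 96 counts in CHANNELS order."""
    counts = Counter(sbs96_channel(ctx, alt) for ctx, alt in mutations)
    return [counts.get(channel, 0) for channel in CHANNELS]


if __name__ == "__main__":
    example = [("ACG", "T"), ("TGA", "A"), ("CTT", "G")]
    for ctx, alt in example:
        print(ctx, alt, "->", sbs96_channel(ctx, alt))
    print("total mutations counted:", sum(catalogue(example)))
```

Signature-extraction methods then take a matrix of such 96-element catalogues, one per tumour, and factorize it (commonly with non-negative matrix factorization) into signatures and their per-tumour activities; this sketch covers only the classification step.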

Nature

The repertoire of mutational signatures in human cancer

The characterization of 4,645 whole‑genome and 19,184 exome sequences, covering most types of cancer, identifies 81 single-base substitution, doublet-base substitution and small insertion‑and‑deletion mutational signatures, providing a systematic overview of the mutational processes that contribute to cancer development.

Nature Genetics

Comprehensive molecular characterization of mitochondrial genomes in human cancers

Analysis of mitochondrial genomes (mtDNA) using whole‑genome sequencing data from 2,658 cancer samples across 38 cancer types identifies hypermutated mtDNA cases, frequent somatic nuclear transfer of mtDNA and high variability of mtDNA copy number in many cancers.

Nature Communications

Genomic footprints of activated telomere maintenance mechanisms in cancer

Genomic characteristics are described that enable patients whose tumours use alternative lengthening of telomeres to be identified from DNA sequences with high specificity, with relevance for the development of new diagnostic and prognostic tests.

Nature Communications

Divergent mutational processes distinguish hypoxic and normoxic tumours

Cancers grow in different locations around the body, and these locations differ in their oxygen levels; the study investigates how oxygen levels change the ways tumours grow, mutate, evolve and become lethal.

Cancer drivers

A goal of cancer genomics is to determine how alterations drive the development of cancer. Almost all driver mutations identified so far have been found in protein-coding genes. Whole‑genome sequencing of tumours allows for the discovery of recurrent driver mutations in non-coding regions, an under-explored avenue for understanding cancer development and treatment.

Nature

Analyses of non-coding somatic drivers in 2,658 cancer whole genomes

A new framework for analysing non-coding drivers discovers new candidates and shows that they are less frequent than protein-coding disruptions.

Nature Communications

Pathway and network analysis of more than 2,500 whole cancer genomes

Multi-faceted pathway and network analysis of 2,583 whole cancer genomes integrates non-coding and coding mutations across known and new cancer processes.

Nature Genetics

The landscape of viral associations in human cancers

Viral landscape across 38 cancer types identifies known and new links to cancer aetiology.

Gene regulation

Perturbation of normal gene‑expression programs can lead to cancer, but the complexity of gene regulation makes this challenging to analyse. Combining whole‑genome sequencing with transcriptome data gives a more complete picture of the relationship between genome alterations and dysregulated transcription in cancer.

Nature

Genomic basis for RNA alterations in cancer

This study provides a comprehensive catalogue of RNA alterations in cancer, including gene expression, splicing, allelic expression and fusions, and associates them with DNA-level alterations identified from whole‑genome sequencing. Integrated analysis of DNA and RNA changes highlights the heterogeneous mechanisms of cancer gene alterations.

Nature Communications

High-coverage whole‑genome analysis of 1220 cancers reveals hundreds of genes deregulated by rearrangement-mediated cis-regulatory alterations

Hundreds of genes, including known cancer-associated genes, are found to have altered expression associated with the nearby presence of a somatic structural variant (SV) breakpoint.

Tools

PCAWG investigators have developed new methods and resources for high-throughput data analysis to enable mining of whole‑genome cancer data sets. Built on approaches such as cloud-based computing and deep learning, this suite of tools is available to the community for further exploration of cancer omics data.

Nature Biotechnology

Butler enables rapid cloud-based analysis of thousands of human genomes

Butler is an open-source framework for large-scale analysis of scientific data with cloud computing; it applies continuous system monitoring and automated self-healing to recover from failures, allowing for 43% more efficient data processing than prior approaches.

Nature Communications

Integrative pathway enrichment analysis of multivariate omics data

ActivePathways is an integrative method for prioritizing target pathways and genes in complex multi-omics data sets such as coding and non-coding mutation data of cancer genomes.

Nature Communications

A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns

A deep learning classifier accurately discriminates 24 common tumour types based solely on their patterns of somatic mutation, potentially allowing the identity of tumours of uncertain primary origin to be determined using whole‑genome sequencing.

Nature Communications

Combined burden and functional impact tests for cancer driver discovery using DriverPower

A new highly sensitive algorithm is described for distinguishing cancer driver from passenger mutations in whole‑genome and exome sequencing data.

Nature Communications

Inferring structural variant cancer cell fraction

SVclone is a computational method for inferring the cancer cell fraction of structural variant (SV) breakpoints from whole‑genome sequencing data.

Nature Communications

Reconstructing evolutionary trajectories of mutation signature activities in cancer using TrackSig

A new method, TrackSig, uses mutational signatures to inform the accurate reconstruction of tumour subclones and their evolutionary trajectories.

Browse the collection

Stylised illustration showing a virtual landscape of cancer tumours with markers identifying genetic information. Credit: Nik Spencer/Nature

Pan-cancer analysis of whole genomes collection

Browse the PCAWG publications in Nature, Nature Genetics, Nature Biotechnology, Nature Communications and Communications Biology. This dedicated collection compiles the PCAWG data sets, other resources and community-generated content, including News & Views, Comment, and an Editorial.

Consortium member list

A full list of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium working group members and tissue providers (pdf 274 kb) is available for download.