Stylised illustration showing a virtual landscape of cancer tumours with markers identifying genetic information

Pan-Cancer Analysis of Whole Genomes

A collection of research and related content from the ICGC/TCGA consortium on whole‑genome sequencing and integrative analysis of cancer

Cancer is a disease of the genome, caused by a cell's acquisition of somatic mutations in key cancer genes. These mutations alter pathways involved in regulating cellular growth and interactions with the tissue environment. Until recently, research on the cancer genome was focused on protein-coding genes, which together account for only 1% of the genome. To address this issue, the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Project performed whole‑genome sequencing and integrative analysis on more than 2,600 primary cancers and their matching normal tissues across 38 distinct tumour types.

Flagship paper

Description of the PCAWG resource of >2,600 whole cancer genomes and their matching normal tissues across 38 tumour types, including data, portals, analysis pipelines and downstream integrative analyses. A full list of authors (pdf 482 kb) is available for download.

Icon representing a circos plot

Structural variation

One of the advantages of sequencing whole genomes is the ability to move beyond characterization of point mutations. Analysis of structural variations, including insertions, deletions, rearrangements and transposon sequences, gives a much richer and more accurate picture of the types of genomic lesions in a tumour. These papers analyse causes and patterns of structural variation in the PCAWG data set and shed light on their contribution to tumorigenesis, highlighting their potential clinical relevance.

Analysis of patterns and signatures of structural variants across PCAWG, identifying 16 signatures of structural variation, including a new set of replication-based processes generating clusters of several rearrangements.

Genomic rearrangements can alter the 3D chromatin organization inside the nucleus; this study describes the prevalence and effects of these mutations on chromatin folding domains and gene expression in human cancers.

A computational approach to study retrotransposons (‘jumping genes’) in the human genome finds that they can participate in the origin and development of some human tumours.

Chromothripsis is found to be much more prevalent across cancers than previously thought, with a frequency of >50% in several cancer types.

Icon representing an evolutionary tree

Tumour evolution

Cancer cells are subject to selective forces shaped by mutation rates and the microenvironment, among other factors. PCAWG researchers use the information obtained from whole‑genome sequencing to delineate more precisely the parameters that influence tumour evolution, and how it shapes the cancer genome. Looking at cancer through an evolutionary lens can give clues to metastasis, therapy response and resistance.

By reconstructing the life history of cancers from their genomes, the study determines the evolutionary trajectories of cancers, showing that cancers develop over many years, sometimes even decades, and highlighting opportunities for early cancer detection.

A resource of oncogenic and tumour suppressor long noncoding RNAs reveals evidence for deep evolutionary conservation of their functions since human–mouse divergence.

Icon representing a trinucleotide frequency plot

Mutational signatures

Mutational signatures are particular changes in specific nucleotide contexts that reflect the collective actions of endogenous or exogenous mutagenic forces in combination with molecular repair processes. Through analysis of whole‑genome sequences, a clearer picture of genome-wide mutational signature patterns can emerge, providing insights into the aetiology of carcinogenesis.
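To make "changes in specific nucleotide contexts" concrete, the sketch below labels a single-base substitution by its trinucleotide context. It assumes the widely used 96-channel convention, in which each substitution is reported relative to the pyrimidine (C or T) of the mutated base pair; the text above does not specify a scheme, so this is an illustration rather than the PCAWG pipeline. The function names `sbs96_channel` and `count_channels` are hypothetical.

```python
from collections import Counter

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context: str, alt: str) -> str:
    """Label a substitution by its pyrimidine-normalised trinucleotide context.

    context: three reference bases with the mutated base in the middle.
    alt:     the base observed in the tumour at the middle position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set("ACGT"):
        raise ValueError("expected a 3-base context and a 1-base alt over ACGT")
    if context[1] == alt:
        raise ValueError("alt must differ from the reference base")
    # If the reference base is a purine, describe the event on the other strand
    # so that every substitution is expressed as a change from C or T.
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = reverse_complement(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def count_channels(mutations):
    """Count channel labels over an iterable of (context, alt) pairs."""
    return Counter(sbs96_channel(context, alt) for context, alt in mutations)
```

Under this assumed convention, a G>A change in the context TGA and a C>T change in the context TCA fall into the same channel, because they describe the same event on opposite strands. A tumour's channel counts form the kind of profile that signature analysis then decomposes into contributions from individual mutational processes.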

The characterization of 4,645 whole‑genome and 19,184 exome sequences, covering most types of cancer, identifies 81 single-base substitution, doublet-base substitution and small insertion‑and‑deletion mutational signatures, providing a systematic overview of the mutational processes that contribute to cancer development.

Analysis of mitochondrial genomes (mtDNA) using whole‑genome sequencing data from 2,658 cancer samples across 38 cancer types identifies hypermutated mtDNA cases, frequent somatic nuclear transfer of mtDNA and high variability of mtDNA copy number in many cancers.

Genomic characteristics are described that enable the identification of patients with alternative lengthening of telomeres from DNA sequences with high specificity, with relevance for the development of new diagnostic and prognostic tests.

Cancers grow in different locations around the body, and these differ in their levels of oxygen; the study investigates how oxygen levels change the ways tumours grow, mutate, evolve and become lethal.

Icon representing a gene network

Cancer drivers

A goal of cancer genomics is to parse how alterations drive the development of cancer. Almost all identified driver mutations have been found in protein-coding genes. Whole‑genome sequencing of tumours allows for the discovery of recurrent driver mutations in non-coding regions, representing an under-explored avenue for understanding cancer development and treatment.

A new framework for analysing non-coding drivers discovers new candidates and shows that they are less frequent than protein-coding disruptions.

Multi-faceted pathway and network analysis of 2,583 whole cancer genomes integrates non-coding and coding mutations across known and new cancer processes.

Analysis of the viral landscape across 38 cancer types identifies known and new links to cancer aetiology.

Icon representing a heatmap

Gene regulation

Perturbation of normal gene‑expression programs can lead to cancer, but owing to the complexity of gene regulation, this can be challenging to analyse. Information from whole‑genome sequencing combined with transcriptome data allows for a more complete picture of the relationship between genome alterations and dysregulated transcription in cancer.

This study provides a comprehensive catalogue of RNA alterations in cancer, including gene expression, splicing, allelic expression and fusions, and associates them with DNA-level alterations identified from whole‑genome sequencing. Integrated analysis of DNA and RNA changes highlights the heterogeneous mechanisms of cancer gene alterations.

Hundreds of genes, including known cancer-associated genes, are found to have altered expression in conjunction with the nearby presence of a somatic structural variant (SV) breakpoint.

Icon representing two hand tools


PCAWG investigators have developed new methods and resources for high-throughput data analysis, to enable mining of whole‑genome cancer data sets. Using new approaches such as cloud-based computing or deep learning, a suite of tools is made available to the community for further exploration of cancer omics data.

Butler is an open source framework for large-scale analysis of scientific data with cloud computing, which applies continuous system monitoring and automated self-healing to deal with failures, allowing for 43% more efficient data processing than prior approaches.

ActivePathways is an integrative method for prioritizing target pathways and genes in complex multi-omics data sets such as coding and non-coding mutation data of cancer genomes.

Using machine learning, we can accurately discriminate 24 common tumour types based solely on their patterns of somatic mutation, potentially allowing us to determine the identity of tumours of uncertain primary using whole‑genome sequencing.

A new highly sensitive algorithm is described for distinguishing cancer driver from passenger mutations in whole‑genome and exome sequencing data.

SVclone is a computational method for inferring the cancer cell fraction of structural variant (SV) breakpoints from whole‑genome sequencing data.

A new method, TrackSig, uses mutational signatures to inform the accurate reconstruction of tumour subclones and their evolutionary trajectories.

Browse the collection

Stylised illustration showing a virtual landscape of cancer tumours with markers identifying genetic information
Nik Spencer/Nature

Browse the PCAWG publications in Nature, Nature Genetics, Nature Biotechnology, Nature Communications and Communications Biology. This dedicated collection compiles the PCAWG data sets, other resources and community-generated content, including News & Views, Comment, and an Editorial.