Chromosomal instability in cancer consists of dynamic changes to the number and structure of chromosomes [1,2]. The resulting diversity in somatic copy number alterations (SCNAs) may provide the variation necessary for tumour evolution [1,3,4]. Here we use multi-sample phasing and SCNA analysis of 1,421 samples from 394 tumours across 22 tumour types to show that continuous chromosomal instability results in pervasive SCNA heterogeneity. Parallel evolutionary events, which cause disruption in the same genes (such as BCL9, MCL1, ARNT (also known as HIF1B), TERT and MYC) within separate subclones, were present in 37% of tumours. Most recurrent losses probably occurred before whole-genome doubling, which was found as a clonal event in 49% of tumours. However, loss of heterozygosity at the human leukocyte antigen (HLA) locus and loss of chromosome 8p to a single haploid copy recurred at substantial subclonal frequencies, even in tumours with whole-genome doubling, indicating ongoing karyotype remodelling. Focal amplifications that affected chromosomes 1q21 (which encompasses BCL9, MCL1 and ARNT), 5p15.33 (TERT), 11q13.3 (CCND1), 19q12 (CCNE1) and 8q24.1 (MYC) were frequently subclonal yet appeared to be clonal within single samples. Analysis of an independent series of 1,024 metastatic samples revealed that 13 focal SCNAs were enriched in metastatic samples, including gains in chromosome 8q24.1 (encompassing MYC) in clear cell renal cell carcinoma and chromosome 11q13.3 (encompassing CCND1) in HER2+ breast cancer. Chromosomal instability may enable the continuous selection of SCNAs, which are established as ordered events that often occur in parallel, throughout tumour evolution.
TRACERx sequencing datasets used in this paper are described in previous studies [7,39]. Details of all other datasets obtained from third parties used in this study can be found in Supplementary Table 1. Clinical trial information (if applicable) is also available within the associated publications described in Supplementary Table 1.
Zack, T. I. et al. Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013).
Bolhaqueiro, A. C. F. et al. Ongoing chromosomal instability and karyotype evolution in human colorectal cancer organoids. Nat. Genet. 51, 824–834 (2019).
Davoli, T. et al. Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell 155, 948–962 (2013).
Turajlic, S. et al. Deterministic evolutionary trajectories influence primary tumor growth: TRACERx Renal. Cell 173, 595–610 (2018).
McGranahan, N. et al. Cancer chromosomal instability: therapeutic and diagnostic challenges. ‘Exploring aneuploidy: the significance of chromosomal imbalance’ review series. EMBO Rep. 13, 528–538 (2012).
Schwarz, R. F. et al. Spatial and temporal heterogeneity in high-grade serous ovarian cancer: a phylogenetic analysis. PLoS Med. 12, e1001789 (2015).
Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).
Hieronymus, H. et al. Tumor copy number alteration burden is a pan-cancer prognostic factor associated with recurrence and death. eLife 7, e37294 (2018).
Carter, S. et al. A signature of chromosomal instability inferred from gene expression profiles predicts clinical outcome in multiple human cancers. Nat. Genet. 38, 1043–1048 (2006).
Schwarz, R. F. et al. Phylogenetic quantification of intra-tumour heterogeneity. PLOS Comput. Biol. 10, e1003535 (2014).
von der Thüsen, J. H. et al. Prognostic significance of predominant histologic pattern and nuclear grade in resected adenocarcinoma of the lung: potential parameters for a grading system. J. Thorac. Oncol. 8, 37–44 (2013).
Kadota, K. et al. Comprehensive pathological analyses in lung squamous cell carcinoma: single cell invasion, nuclear diameter, and tumor budding are independent prognostic factors for worse outcomes. J. Thorac. Oncol. 9, 1126–1139 (2014).
Laughney, A. M., Elizalde, S., Genovese, G. & Bakhoum, S. F. Dynamics of tumor heterogeneity derived from clonal karyotypic evolution. Cell Rep. 12, 809–820 (2015).
Elizalde, S., Laughney, A. M. & Bakhoum, S. F. A Markov chain for numerical chromosomal instability in clonally expanding populations. PLOS Comput. Biol. 14, e1006447 (2018).
Sottoriva, A. et al. A Big Bang model of human colorectal tumor growth. Nat. Genet. 47, 209–216 (2015).
Williams, M. J., Werner, B., Barnes, C. P., Graham, T. A. & Sottoriva, A. Identification of neutral tumor evolution across cancer types. Nat. Genet. 48, 238–244 (2016).
López, S. et al. Interplay between whole-genome doubling and the accumulation of deleterious alterations in cancer evolution. Nat. Genet. 52, 283–293 (2020).
Fujiwara, T. et al. Cytokinesis failure generating tetraploids promotes tumorigenesis in p53-null cells. Nature 437, 1043–1047 (2005).
Bielski, C. M. et al. Genome doubling shapes the evolution and prognosis of advanced cancers. Nat. Genet. 50, 1189–1195 (2018).
McGranahan, N. et al. Allele-specific HLA loss and immune escape in lung cancer evolution. Cell 171, 1259–1271 (2017).
Snyder, A. et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N. Engl. J. Med. 371, 2189–2199 (2014).
Kim, M. et al. Comparative oncogenomics identifies NEDD9 as a melanoma metastasis gene. Cell 125, 1269–1281 (2006).
Cai, Y. et al. Loss of chromosome 8p governs tumor progression and drug response by altering lipid metabolism. Cancer Cell 29, 751–766 (2016).
Bakhoum, S. F. et al. Chromosomal instability drives metastasis through a cytosolic DNA response. Nature 553, 467–472 (2018).
Lackner, C. et al. Convergent evolution of copy number alterations in multi-centric hepatocellular carcinoma. Sci. Rep. 9, 4611 (2019).
Jakubek, Y. A. et al. Large-scale analysis of acquired chromosomal alterations in non-tumor samples from patients with cancer. Nat. Biotechnol. 38, 90–96 (2020).
Zaccaria, S. & Raphael, B. J. Characterizing the allele- and haplotype-specific copy number landscape of cancer genomes at single-cell resolution with CHISEL. Nat. Biotechnol. https://doi.org/10.1038/s41587-020-0661-6 (2020).
Shih, D. J. H. et al. Genomic characterization of human brain metastases identifies drivers of metastatic lung adenocarcinoma. Nat. Genet. 52, 371–377 (2020).
Turner, K. M. et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature 543, 122–125 (2017).
Worrall, J. T. et al. Non-random mis-segregation of human chromosomes. Cell Rep. 23, 3366–3380 (2018).
Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
Gundem, G. et al. The evolutionary history of lethal metastatic prostate cancer. Nature 520, 353–357 (2015).
Yates, L. R. et al. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med. 21, 751–759 (2015).
Yates, L. R. et al. Genomic evolution of breast cancer metastasis and relapse. Cancer Cell 32, 169–184 (2017).
Mitchell, T. J. et al. Timing the landmark events in the evolution of clear cell renal cell cancer: TRACERx Renal. Cell 173, 611–623 (2018).
Martinez, P. et al. Parallel evolution of tumour subclones mimics diversity between tumours. J. Pathol. 230, 356–364 (2013).
Rosenthal, R. et al. Neoantigen-directed immune escape in lung cancer evolution. Nature 567, 479–485 (2019).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
Castel, S. E., Mohammadi, P., Chung, W. K., Shen, Y. & Lappalainen, T. Rare variant phasing and haplotypic expression from RNA sequencing with phASER. Nat. Commun. 7, 12817 (2016).
Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 46, 912–918 (2014).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385 (2018).
Forbes, S. A. et al. COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res. 43, D805–D811 (2015).
Hartigan, J. A. & Hartigan, P. M. The dip test of unimodality. Ann. Stat. 13, 70–84 (1985).
Maechler, M. diptest: Hartigan’s dip test statistic for unimodality—corrected. R package version 0.75-7 https://cran.r-project.org/package=diptest (2015).
Wolff, A. C. et al. Recommendations for human epidermal growth factor receptor 2 testing in breast cancer: American Society of Clinical Oncology/College of American Pathologists clinical practice guideline update. J. Clin. Oncol. 31, 3997–4013 (2013).
Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
Fungtammasan, A., Walsh, E., Chiaromonte, F., Eckert, K. A. & Makova, K. D. A genome-wide analysis of common fragile sites: what features determine chromosomal instability in the human genome? Genome Res. 22, 993–1005 (2012).
Wang, K. et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 17, 1665–1674 (2007).
Cheng, J. et al. Single-cell copy number variation detection. Genome Biol. 12, R80 (2011).
Whitfield, M. L. et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol. Biol. Cell 13, 1977–2000 (2002).
Abbosh, C. et al. Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution. Nature 545, 446–451 (2017).
Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).
Moulos, P. & Hatzis, P. Systematic integration of RNA-seq statistical algorithms for accurate detection of differential gene expression patterns. Nucleic Acids Res. 43, e25 (2015).
T.B.K.W. was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001169), the UK Medical Research Council (FC001169) and the Wellcome Trust (FC001169) as well as the Marie Curie ITN Project PLOIDYNET (FP7-PEOPLE-2013, 607722), Breast Cancer Research Foundation (BCRF), Royal Society Research Professorships Enhancement Award (RP/EA/180007) and the Foulkes Foundation. E.L.L. receives funding from NovoNordisk Foundation (ID 16584). N.J.B. is a fellow of the Lundbeck Foundation and acknowledges funding from the Aarhus University Research Foundation. E.G. is funded by the European Research Council, FP7-THESEUS-617844 and PROTEUS-835297. J.D. is a postdoctoral fellow of the Research Foundation–Flanders (FWO) and the European Union’s Horizon 2020 research and innovation program (Marie Skłodowska-Curie grant agreement no. 703594-DECODE). R.R. is supported by Royal Society Research Professorships Enhancement Award (RP/EA/180007). K.L. is supported by a UK Medical Research Council Skills Development Fellowship Award (grant number MR/P014712/1). L.Y. was funded by a Wellcome Trust Clinical Career Development Fellowship 214584/Z/18/Z and CRUK Early Detection Pump Prime Award. B.C.B. is supported by an NCI Outstanding Investigator Award (1R35CA220481). G.B.J. is supported by the Swedish Cancer Society, Swedish Research Council and the Berta Kamprad Foundation. S.L. is supported by the National Breast Cancer Foundation of Australia Endowed Chair and the Breast Cancer Research Foundation, New York. N.M.L. and G.D.C. were supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC010110), the UK Medical Research Council (FC010110) and the Wellcome Trust (FC010110). S.T. is funded by Cancer Research UK (grant number C50947/A18176), the National Institute for Health Research (NIHR) Biomedical Research Centre at The Royal Marsden Hospital and Institute of Cancer Research (grant number A109), the Kidney and Melanoma Cancer Fund of The Royal Marsden Cancer Charity, and The Rosetrees Trust (grant number A2204). M.J.-H. has received funding from Cancer Research UK, National Institute for Health Research, Rosetrees Trust, UKI NETs and NIHR University College London Hospitals Biomedical Research Centre. P.V.L. is supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001202), the UK Medical Research Council (FC001202) and the Wellcome Trust (FC001202) and is a Winton Group Leader in recognition of the Winton Charitable Foundation’s support towards the establishment of The Francis Crick Institute. S.F.B. is supported by the Office of the Director, the National Institutes of Health under award number DP5OD026395 High-Risk High-Reward Program, the Department of Defense Breast Cancer Research Breakthrough Award W81XWH-16-1-0315 (project: BC151244), the Burroughs Wellcome Fund Career Award for Medical Scientists, the Parker Institute for Immunotherapy at MSKCC, the Josie Robertson Foundation and MSKCC core grant P30-CA008748. R.F.S. and M.P. thank the Helmholtz Association (Germany) for support. N.M. is a Sir Henry Dale Fellow, jointly funded by the Wellcome Trust and the Royal Society (Grant Number 211179/Z/18/Z) and also receives funding from Cancer Research UK, Rosetrees and the NIHR BRC at University College London Hospitals and the CRUK University College London Experimental Cancer Medicine Centre. C.S. is a Royal Society Napier Research Professor. His work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001169), the UK Medical Research Council (FC001169), and the Wellcome Trust (FC001169). C.S. is funded by Cancer Research UK (TRACERx, PEACE and CRUK Cancer Immunotherapy Catalyst Network), Cancer Research UK Lung Cancer Centre of Excellence, the Rosetrees Trust, Butterfield and Stoneygate Trusts, NovoNordisk Foundation (ID 16584), Royal Society Research Professorships Enhancement Award (RP/EA/180007), the NIHR BRC at University College London Hospitals, the CRUK-UCL Centre, Experimental Cancer Medicine Centre and the Breast Cancer Research Foundation (BCRF). This research is supported by a Stand Up To Cancer-LUNGevity-American Lung Association Lung Cancer Interception Dream Team Translational Research Grant (SU2C-AACR-DT23-17). Stand Up To Cancer is a program of the Entertainment Industry Foundation. Research grants are administered by the American Association for Cancer Research, the Scientific Partner of SU2C. C.S. also receives funding from the European Research Council (ERC) under the European Union’s Seventh Framework Programme (FP7/2007-2013) Consolidator Grant (FP7-THESEUS-617844), European Commission ITN (FP7-PloidyNet 607722), an ERC Advanced Grant (PROTEUS) from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (835297) and Chromavision from the European Union’s Horizon 2020 research and innovation programme (665233). The results published here are based in part on data generated by The Cancer Genome Atlas pilot project established by the NCI and the National Human Genome Research Institute. The data were retrieved through the database of Genotypes and Phenotypes (dbGaP) authorization (accession number phs000178.v9.p8). Information about TCGA and the constituent investigators and institutions of the TCGA research network can be found at http://cancergenome.nih.gov/. This project was enabled through access to the MRC eMedLab Medical Bioinformatics infrastructure, supported by the Medical Research Council (MR/L016311/1).
In particular, we acknowledge the support of the High-Performance Computing at the Francis Crick Institute as well as the UCL Department of Computer Science Cluster and the support team. This publication and the underlying study have been made possible partly on the basis of the data that the Hartwig Medical Foundation and the Center of Personalised Cancer Treatment (CPCT-02, NCT01855477) and DRUP clinical study (NCT02925234) have made available to the project.
G.A.W. has consulted for and has stock options in Achilles Therapeutics. D.A.M. reports speaker fees from AstraZeneca. M.A.B. has consulted for Achilles Therapeutics. C.V. has received travel expenses from Astellas, Roche and Pfizer, and grant support from Bristol Myers Squibb. R.R. has consulted for and has stock options in Achilles Therapeutics. K.L. reports speaker fees from Roche Tissue Diagnostics. P.K.B. has consulted for Angiochem, Roche-Genentech, Eli Lilly, Tesaro, ElevateBio, Pfizer (Array), and received grant or research support from Merck, Bristol Myers Squibb and Eli Lilly and honoraria from Merck, Roche-Genentech and Eli Lilly. L.D. has sponsored research agreements with C2i-genomics, Natera, AstraZeneca and Ferring, and has an advisory/consulting role at Ferring. P.S. serves as an uncompensated consultant for Roche-Genentech. S.L. receives research funding to her institution from Novartis, Bristol Myers Squibb, Merck, Roche-Genentech, Puma Biotechnology, Pfizer, Eli Lilly and Seattle Genetics, has acted as consultant (not compensated) to Seattle Genetics, Pfizer, Novartis, Bristol Myers Squibb, Merck, AstraZeneca and Roche-Genentech and has acted as consultant (paid to her institution) to Aduro Biotech, Novartis, GlaxoSmithKline and G1 Therapeutics. F.A. is a member of the Advisory Boards for Pfizer, AstraZeneca, Eli Lilly, Roche-Genentech, Novartis and Daiichi Sankyo, acknowledges grant support from Pfizer, AstraZeneca, Eli Lilly, Novartis and Daiichi Sankyo and is a co-founder of Pegacsy. V.C.G.T.-H. reports grants and personal fees from Pfizer, Roche, Novartis and Eli Lilly, grants from Eisai and personal fees from Accord. S.T. has received funding from Ventana Medical Systems Inc (grant numbers 10467 and 10530), has received speaking fees from Roche, AstraZeneca, Novartis and Ipsen, and has filed the following European and US patent: Indel mutations as a therapeutic target and predictive biomarker (PCT/GB2018/051892), and European patent: Clear Cell Renal Cell Carcinoma Biomarkers (P113326GB). M.J.-H. is a member of the Advisory Board for Achilles Therapeutics. S.F.B. holds a patent related to some of the work described targeting CIN and the cGAS-STING pathway in advanced cancer, owns equity in, receives compensation from and serves as a consultant and on the Scientific Advisory Board and Board of Directors of Volastra Therapeutics, and has also consulted for Sanofi, received sponsored travel from the Prostate Cancer Foundation, and both travel and compensation from Cancer Research UK. N.M. has stock options in and has consulted for Achilles Therapeutics and holds a European patent in determining HLA LOH (PCT/GB2018/052004). C.S. acknowledges grant support from Pfizer, AstraZeneca, Bristol Myers Squibb, Roche-Ventana, Boehringer-Ingelheim, Archer Dx Inc (collaboration in minimal residual disease sequencing technologies) and Ono Pharmaceutical, is an AstraZeneca Advisory Board Member and Chief Investigator for the MeRmaiD1 clinical trial, has consulted for Pfizer, Novartis, GlaxoSmithKline, MSD, Bristol Myers Squibb, Celgene, AstraZeneca, Illumina, Genentech, Roche-Ventana, GRAIL, Medicxi and the Sarah Cannon Research Institute, has stock options in Apogen Biotechnologies, Epic Bioscience and GRAIL, and has stock options in and is a co-founder of Achilles Therapeutics. C.S. holds European patents relating to assay technology to detect tumour recurrence (PCT/GB2017/053289), targeting neoantigens (PCT/EP2016/059401), identifying patient response to immune checkpoint blockade (PCT/EP2016/071471), determining HLA LOH (PCT/GB2018/052004), predicting survival rates of patients with cancer (PCT/GB2020/050221) and identifying patients who respond to cancer treatment (PCT/GB2018/051912), a US patent relating to detecting tumour mutations (PCT/US2017/28013), and both a European and a US patent related to identifying insertion/deletion mutation targets (PCT/GB2018/051892).
Peer review information Nature thanks Rameen Beroukhim and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a, Schematic of the analyses of allele-specific copy number alterations. Left, the SCNA profiles across the genome for the two samples of a tumour (red, A allele; blue, B allele), with raw allele-specific copy number values for heterozygous SNPs shown as points and inferred allele-specific integer copy number states as lines. The clonality of the SCNAs across the two samples is indicated by a track between the two SCNA profiles, with clonal SCNAs indicated in grey, subclonal SCNAs in yellow and regions with both clonal and subclonal SCNAs in dashed yellow and grey. All SCNA profile plots in the figure are scaled by the number of data points per chromosome. Top right, the approach used to summarise SCNA timing (clonal versus subclonal) across the tumour. Bottom right, the integer SCNA profile across the genome of the inferred MRCA, based on the integer SCNA profiles of the two samples of the tumour. b, c, Multi-sample phasing (b) and SCNA calling relative to ploidy (c). b, Multi-sample phasing, which we used to obtain allele-specific copy number profiles, allowed us to identify previously undetected allelic imbalance (yellow boxes), as well as mirrored subclonal allelic imbalance and parallel SCNAs (purple boxes). c, Chromosomal illustrations and nomenclature of various SCNAs. As SCNAs are reported relative to ploidy, illustrations are provided for the diploid, triploid and tetraploid states. AI, allelic imbalance. d, e, Pan-cancer cohort characteristics. Bar plots summarise our pan-cancer multi-sample cohort by tumour type, showing the total number of patients (d), coloured by the number of samples each tumour contributes, and the total number of tumour samples (e), coloured by sample type.
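The clonal/subclonal summary in panel a and the ploidy-relative calls in panel c can be illustrated with a small Python sketch. It is not the authors' pipeline: it assumes integer allele-specific copy numbers (A, B) per segment and sample, compares total copy number with the rounded sample ploidy, and treats an event as clonal when it is called in every sample of a tumour and subclonal otherwise. The amplification threshold (`AMP_FACTOR`) and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical threshold: call an amplification when total copy number is at least
# AMP_FACTOR times the rounded sample ploidy (an assumption, not the paper's rule).
AMP_FACTOR = 2.0


def call_scnas(cn_a: int, cn_b: int, ploidy: float) -> Set[str]:
    """Label one segment in one sample relative to ploidy (gain/amp/loss, LOH, AI)."""
    base = round(ploidy)  # e.g. 2 for diploid, 3 for triploid, 4 for tetraploid
    total = cn_a + cn_b
    calls: Set[str] = set()
    if total >= AMP_FACTOR * base:
        calls.add("amp")
    elif total > base:
        calls.add("gain")
    elif total < base:
        calls.add("loss")
    if min(cn_a, cn_b) == 0 and total > 0:
        calls.add("loh")  # one parental allele absent, the other retained
    if cn_a != cn_b:
        calls.add("ai")   # allelic imbalance, even when total equals ploidy
    return calls


def classify_clonality(per_sample_calls: List[Set[str]]) -> Dict[str, str]:
    """Mark each event as clonal (called in all samples) or subclonal (called in some)."""
    seen = set().union(*per_sample_calls)
    return {
        event: "clonal" if all(event in calls for calls in per_sample_calls) else "subclonal"
        for event in sorted(seen)
    }


# Made-up segment in a two-sample tumour with ploidy ~2.
sample1 = call_scnas(2, 1, ploidy=2.0)  # {'gain', 'ai'}
sample2 = call_scnas(1, 1, ploidy=2.0)  # set()
print(classify_clonality([sample1, sample2]))  # {'ai': 'subclonal', 'gain': 'subclonal'}
```

Note how a 2:1 segment in a triploid sample (`call_scnas(2, 1, 3.0)`) is neither a gain nor a loss but still shows allelic imbalance, which is why panel c reports events relative to ploidy.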
a, Scatter plots indicating, for each tumour type, the association between the number of samples and the proportion of the genome affected by subclonal SCNAs. ρ and P values are from Spearman correlation tests. b, Scatter plots showing median purity per tumour versus the proportion of the genome affected by subclonal SCNAs. ρ and P values are from Spearman correlation tests. c, Comparison of the proportion of the genome affected by clonal and subclonal SCNAs. The median value for each tumour type is indicated. The size of the dots indicates the number of tumours in the corresponding tumour type. Red dots indicate tumour types with significant differences in the proportion of the genome affected by clonal versus subclonal SCNAs. A two-sided Student’s t-test was used to compare proportions of the genome affected by clonal and subclonal SCNAs. a–c, Tumour types with tumour samples from at least 10 patients were included: bladder urothelial carcinoma (BLCA, n = 26), ER+ breast cancer (ER+ BRCA, n = 19), HER2+ breast cancer (HER2+ BRCA, n = 18), triple-negative breast cancer (TN BRCA, n = 17), colorectal adenocarcinoma (COAD, n = 13), oesophageal adenocarcinoma (ESCA, n = 22), glioma (n = 12), clear cell renal cell carcinoma (KIRC, n = 54), lung adenocarcinoma (LUAD, n = 84), lung squamous cell carcinoma (LUSC, n = 31), prostate adenocarcinoma (PRAD, n = 10), melanoma (SKCM, n = 30) and endometrial carcinoma (UCEC, n = 27). d, Results of the linear regression analysis comparing LUAD and HER2+ breast cancer in the proportion of the genome subject to subclonal SCNAs, with the number of samples from each tumour and the median sample purity of each tumour as additional covariates.
Extended Data Fig. 3 NSCLC SCNAs correlate with cell cycle gene expression and tumour cell characteristics.
a–d, Scatter plots comparing the average cell cycle gene expression in LUAD tumours (n = 36), LUSC tumours (n = 15) and NSCLC-other tumours (n = 7) with the total proportion of the genome affected by SCNAs (a), the proportion of the genome affected by clonal SCNAs (b), the proportion of the genome affected by subclonal SCNAs (c) and the proportion of SCNAs that are subclonal (d). Each dot is coloured according to tumour type. ρ and P values are from Spearman correlation tests. e–h, Associations between tumour cell characteristics and SCNA statistics for LUAD (n = 53), LUSC (n = 27) and NSCLC-other (n = 3). Mitotic index scores for each tumour are compared against total SCNAs (e), clonal SCNAs (f), subclonal SCNAs (g) and the proportion of SCNAs that are subclonal (h) in each tumour. Each dot is coloured according to tumour type. ρ and P values are from Spearman correlation tests. i–l, Association between tumour volume and SCNA metrics. For each tumour for which both digitized slides and tumour volume information were available (n = 83), we performed Spearman correlation tests comparing the tumour volume with the total proportion of the genome affected by SCNAs (i), the proportion of the genome affected by clonal SCNAs (j), the proportion of the genome affected by subclonal SCNAs (k) and the proportion of SCNAs that are subclonal (l). Padj values are P values from linear regression models that include the number of samples, the estimated tumour volume and the SCNA measure under investigation. m–p, Associations between tumour cell characteristics and SCNA statistics for LUAD (n = 53), LUSC (n = 27) and NSCLC-other (n = 3). Anisonucleosis scores for each tumour are compared with the proportion of the genome affected by SCNAs (m), clonal SCNAs (n) and subclonal SCNAs (o), and with the proportion of SCNAs that are subclonal (p) in each tumour. Each dot is coloured according to tumour type. The lines represent the median of each group. es, effect size.
a, Bar plots indicating the number and proportion of tumours of each tumour type that show WGD. Subclonal WGD tumours are indicated in blue. b, Beeswarm plots comparing the proportion of the genome affected by clonal or subclonal SCNAs and mirrored subclonal allelic imbalance (MSAI) in WGD and non-WGD tumours. Black bars indicate the median of each distribution. Two-sided Student’s t-tests were used for each comparison. c, Comparison of the proportion of the genome affected by clonal or subclonal SCNAs in matched WGD and non-WGD samples from tumours with subclonal WGD. Bars indicate, for each patient with subclonal WGD, the difference between the median proportion of the genome affected by SCNAs in WGD and non-WGD samples. The inset beeswarm plots compare the proportion of the genome affected by different types of SCNAs in WGD and non-WGD samples. The black bars in the beeswarm plots represent the medians of each group. d–f, Impact of OG–TSG score on average arm-level copy number changes. Scatter plots showing the average subclonal arm-level change from MRCA in non-WGD (d; n = 171), WGD (e; n = 194) and subclonal WGD (f; n = 29) tumours versus arm OG–TSG score. Shaded areas indicate the 95% confidence interval. ρ and P values are from Spearman correlation tests. g, Scatter plot showing the average clonal (MRCA) copy number in the entire cohort (n = 394) versus chromosome arm size. h–j, Scatter plots showing the average subclonal arm-level change from MRCA in non-WGD (h; n = 171), WGD (i; n = 194) and subclonal WGD (j; n = 29) tumours versus chromosome size. Shaded areas indicate the 95% confidence interval. ρ and P values are from Spearman correlation tests.
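The "average subclonal arm-level change from MRCA" in panels d–f and h–j can be sketched as follows, under the assumption that each sample's arm-level copy number is compared with the MRCA value for the same arm, averaged over samples within a tumour and then over tumours. The input layout and function name are hypothetical, and the per-arm averages would then be correlated with the OG–TSG score or arm size (for example with a Spearman test).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def mean_subclonal_change(tumours: List[dict]) -> Dict[str, float]:
    """Average (sample - MRCA) arm copy-number change, per tumour first, then across tumours.

    Each tumour is a hypothetical dict: {"mrca": {arm: cn}, "samples": [{arm: cn}, ...]}.
    """
    per_arm: Dict[str, List[float]] = defaultdict(list)
    for tumour in tumours:
        for arm, clonal_cn in tumour["mrca"].items():
            deltas = [s[arm] - clonal_cn for s in tumour["samples"] if arm in s]
            if deltas:  # skip arms not measured in any sample of this tumour
                per_arm[arm].append(mean(deltas))
    return {arm: mean(values) for arm, values in per_arm.items()}


# Made-up data: a diploid tumour with subclonal 8p loss / 8q gain, and a WGD tumour.
tumours = [
    {"mrca": {"8p": 2, "8q": 2}, "samples": [{"8p": 1, "8q": 3}, {"8p": 2, "8q": 3}]},
    {"mrca": {"8p": 4, "8q": 4}, "samples": [{"8p": 3, "8q": 4}]},
]
print(mean_subclonal_change(tumours))  # {'8p': -0.75, '8q': 0.5}
```

Averaging within each tumour before pooling keeps tumours with many samples from dominating the per-arm estimate.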
a, List of parameters used for Markov chain modelling. b, Diagrams of the simplified Markov chain for each chromosome arm and bar charts of the resulting probability distributions of arm-level copy number. c–e, Beeswarm plots showing the difference in deviance score on a per-tumour basis for non-WGD (n = 171), WGD (n = 194) and subclonal WGD (n = 29) tumours. Black horizontal bars indicate the median of the distribution. Paired two-tailed Student’s t-tests were performed between the deviance scores of the first and second model included in each comparison. es, effect size. c, Comparison between the unweighted (neutral) model and the weighted model that includes OG–TSG scores. d, Comparison between the unweighted model and the model with scrambled OG–TSG scores. e, Comparison between the weighted model that includes OG–TSG scores and the model with scrambled OG–TSG scores. f, g, For each context (non-WGD, WGD or subclonal WGD), the percentage of samples in which the OG–TSG-weighted model outperforms the unweighted model (f) or scrambled model (g) is shown. h–j, Robustness analysis of the Markov chain model of karyotype evolution. Graphs show the relative performance of the three versions of the model (unweighted, OG–TSG-weighted and scrambled) with varying values of g for non-WGD (p_GD = 0), WGD (p_GD = 0.005) and subclonal WGD (p_GD = 0.012) input. The model with scrambled scores was run for 10 different random permutations of the chromosomes. k, l, Graphs show the performance of the three versions of the model with changing values of p_GD (p_GD = 0.003 in k and p_GD = 0.007 in l) with WGD data. m, n, Graphs show the performance of the three versions of the model with changing values of p_GD (p_GD = 0.01 in m and p_GD = 0.014 in n) with subclonal WGD data. o–q, Graphs show the performance of the three versions of the model when varying p_misseg with non-WGD, WGD and subclonal WGD input data.
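To make the roles of the mis-segregation probability, the genome-doubling probability and g concrete, here is a toy per-arm Markov chain. It is a simplified stand-in, not the published model: arm copy number moves one step up or down with probability `p_misseg` per generation, doubles with probability `p_gd`, and the balance of gains versus losses is tilted by exp(±g × score), where `score` plays the role of an OG–TSG score (g = 0 gives an unweighted, neutral model; a scrambled model would permute scores across arms). The copy-number cap, the absorbing zero state, the generation count and the G-test-style deviance are all assumptions.

```python
import math
from typing import Dict, List

C_MAX = 8  # hypothetical cap on arm-level copy number; state 0 means the arm is lost


def transition_matrix(p_misseg: float, p_gd: float, g: float, score: float) -> List[List[float]]:
    """Row-stochastic matrix over arm copy-number states 0..C_MAX (assumed dynamics)."""
    n = C_MAX + 1
    t = [[0.0] * n for _ in range(n)]
    t[0][0] = 1.0  # nullisomy is treated as absorbing
    # Tilt gains versus losses by the arm's score; g = 0 gives a 50:50 split.
    up, down = math.exp(g * score), math.exp(-g * score)
    p_up = up / (up + down)
    for c in range(1, n):
        t[c][min(2 * c, C_MAX)] += p_gd                          # genome doubling
        t[c][min(c + 1, C_MAX)] += (1 - p_gd) * p_misseg * p_up  # mis-segregation gain
        t[c][c - 1] += (1 - p_gd) * p_misseg * (1 - p_up)        # mis-segregation loss
        t[c][c] += (1 - p_gd) * (1 - p_misseg)                   # faithful segregation
    return t


def arm_distribution(t: List[List[float]], generations: int, start: int = 2) -> List[float]:
    """Copy-number distribution after `generations` steps, conditioned on the arm surviving."""
    n = len(t)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(generations):
        dist = [sum(dist[i] * t[i][j] for i in range(n)) for j in range(n)]
    alive = sum(dist[1:])
    return [0.0] + [p / alive for p in dist[1:]]


def deviance(observed: Dict[int, int], predicted: List[float], eps: float = 1e-12) -> float:
    """G-test style deviance 2 * sum(O * ln(O / E)); lower means a better fit."""
    n_obs = sum(observed.values())
    return 2 * sum(
        o * math.log(o / (n_obs * max(predicted[c], eps)))
        for c, o in observed.items() if o > 0
    )


# Made-up observed arm copy numbers (copy number -> number of tumours).
observed = {1: 3, 2: 10, 3: 12, 4: 5}
for label, g in [("unweighted", 0.0), ("weighted", 1.0)]:
    pred = arm_distribution(transition_matrix(0.05, 0.005, g, score=0.8), generations=50)
    print(label, round(deviance(observed, pred), 2))
```

Comparing deviances of the unweighted, weighted and scrambled variants on the same observed counts mirrors, in spirit, the per-tumour comparisons in panels c–e.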
a–h, The following tumour types were analysed: bladder urothelial carcinoma (a; n = 26), ER+ breast cancer (b; n = 19), HER2+ breast cancer (c; n = 18), triple-negative breast cancer (d; n = 17), colorectal adenocarcinoma (e; n = 13), oesophageal adenocarcinoma (f; n = 22), glioma (g; n = 12) and KIRC (h; n = 54). n numbers represent tumours. Across-genome plots show clonal and subclonal SCNAs. Within each tumour type, the following tracks are shown for each chromosome (top to bottom). Gains/amplifications: the proportion of patients with gains or amplifications; the black line indicates the total proportion of patients with gains/amplifications, and the yellow and grey lines or shades indicate the proportion of patients with subclonal and clonal gains, respectively. MRCA: the MRCA was derived by phylogenetic analysis (see Methods, ‘Ancestral reconstruction and phylogeny inference’); for each locus, the frequency of gains (red) and losses (blue) found in the MRCAs of the tumours is indicated. GISTIC2.0 events: significant focal SCNA events identified by GISTIC2.0 (see Methods, ‘GISTIC2.0 peak definition’ and ‘GISTIC2.0 consensus peak definition’) and recurrent arm-level events (see Methods, ‘Arm-level SCNA definition’). Losses/LOH: the proportion of patients with loss/LOH events; the black line indicates the total proportion of patients with loss/LOH events, and the yellow and grey lines or shades indicate the proportion of patients with subclonal and clonal losses, respectively. The black, yellow and grey lines indicate significance thresholds for total loss/LOH, subclonal loss/LOH and clonal loss/LOH, respectively. MSAI: the proportion of patients with mirrored subclonal allelic imbalance (MSAI) originating from distinct haplotypes identified by multi-sample phasing; the red line indicates the significance threshold determined by a permutation test at the 0.05 level (see Methods, ‘Permutation test for recurrence of SCNA across tumours’).
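One way to obtain a genome-wide significance threshold like the red line above is a rotation-based permutation test, sketched below. This is an assumed null model, not necessarily the one described in the Methods: each tumour's event profile across genomic bins is shifted by an independent random offset (wrapping around the end of the genome), and the threshold is the (1 − alpha) quantile of the maximum per-bin frequency across permutations. The function name and the bin representation are hypothetical.

```python
import random
from typing import List, Sequence


def recurrence_threshold(events: Sequence[Sequence[bool]], n_perm: int = 1000,
                         alpha: float = 0.05, seed: int = 0) -> float:
    """Frequency a bin must exceed to count as recurrent under a rotation null.

    events[t][b] is True if tumour t carries the event (e.g. MSAI) in genomic bin b.
    Rotating each profile keeps every tumour's event sizes but scrambles positions.
    """
    rng = random.Random(seed)
    n_tumours, n_bins = len(events), len(events[0])
    null_max: List[float] = []
    for _ in range(n_perm):
        counts = [0] * n_bins
        for profile in events:
            shift = rng.randrange(n_bins)
            for b in range(n_bins):
                if profile[(b + shift) % n_bins]:
                    counts[b] += 1
        # Genome-wide maximum controls for testing many bins at once.
        null_max.append(max(counts) / n_tumours)
    null_max.sort()
    idx = min(int((1 - alpha) * n_perm), n_perm - 1)
    return null_max[idx]


# Made-up data: 10 tumours, 20 bins, every tumour has the event in bin 0 only.
focal = [[b == 0 for b in range(20)] for _ in range(10)]
threshold = recurrence_threshold(focal, n_perm=200)
observed = [sum(t[b] for t in focal) / len(focal) for b in range(20)]
print([b for b, f in enumerate(observed) if f > threshold])  # [0]
```

Using the maximum over all bins, rather than a per-bin null, is one simple way to keep the false-positive rate near alpha across the whole genome.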
a–e, The following tumour types were analysed: LUAD (a; n = 84), LUSC (b; n = 31), prostate adenocarcinoma (c; n = 10), SKCM (d; n = 30) and endometrial carcinoma (e; n = 27). Across-genome plots show clonal and subclonal SCNAs. For each tumour type, the following tracks are shown for each chromosome (top to bottom). Gains/amplifications: the proportion of patients with gains or amplifications; the black line indicates the total proportion of patients with gains/amplifications, and the yellow and grey lines or shades indicate the proportion of patients with subclonal and clonal gains, respectively. MRCA: the most recent common ancestor (MRCA) of each tumour was derived by phylogenetic analysis (see Methods, ‘Ancestral reconstruction and phylogeny inference’), and for each locus the frequency of gains (red) and losses (blue) found in the MRCAs of the tumours is indicated. GISTIC2.0 events: significant focal SCNA events identified by GISTIC2.0 (see Methods, ‘GISTIC2.0 peak definition’ and ‘GISTIC2.0 consensus peak definition’) and recurrent arm-level events (see Methods, ‘Arm-level SCNA definition’). Losses/LOH: the proportion of patients with loss/LOH events; the black line indicates the total proportion of patients with loss/LOH events, and the yellow and grey lines or shades indicate the proportion of patients with subclonal and clonal losses, respectively. The black, yellow and grey lines indicate significance thresholds for total loss/LOH, subclonal loss/LOH and clonal loss/LOH, respectively. MSAI: the proportion of patients with mirrored subclonal allelic imbalance (MSAI) originating from distinct haplotypes identified by multi-sample phasing; the red line indicates the significance threshold determined by a permutation test at the 0.05 level (see Methods, ‘Permutation test for recurrence of SCNA across tumours’).
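The gain and loss tracks split each tumour's events into clonal and subclonal before computing cohort proportions. A minimal sketch of that bookkeeping follows, assuming a simple multi-region rule in which an event is clonal when every sample of a tumour carries it and subclonal when only some do; the paper's own classification is given in its Methods and may rely on cancer cell fraction estimates instead, and the input layout here is hypothetical.

```python
from typing import List, Tuple


def classify_tumour(samples: List[List[int]]) -> List[str]:
    """samples: per-sample 0/1 gain calls over the same bins for one tumour."""
    status = []
    for calls in zip(*samples):           # iterate over bins
        present = sum(calls)
        if present == 0:
            status.append("none")
        elif present == len(calls):       # seen in every sample
            status.append("clonal")
        else:                             # seen in some samples only
            status.append("subclonal")
    return status


def cohort_proportions(tumours: List[List[List[int]]]) -> List[Tuple[float, float, float]]:
    """Per-bin (clonal, subclonal, total) proportions of tumours with the event."""
    statuses = [classify_tumour(t) for t in tumours]
    n = len(statuses)
    out = []
    for per_bin in zip(*statuses):
        clonal = per_bin.count("clonal") / n
        subclonal = per_bin.count("subclonal") / n
        out.append((clonal, subclonal, clonal + subclonal))
    return out
```

Under this rule a tumour represented by a single sample can only ever show clonal events, which illustrates how an alteration can appear clonal within one sample yet be subclonal across the tumour.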
a, b, Differences between gains and losses for consensus-peak regions (gains, red, n = 255; losses, blue, n = 149) (a) and for chromosome arms (gains, red, n = 95; losses, blue, n = 200) across all tumour types (b). Black horizontal bars indicate the median of the distribution. Significance testing was performed using an unpaired Student’s t-test. c, Classification of chromosomal arm-level events according to timing. Left, heat map of the percentage of subclonal occurrence of all events in each tumour type. The numerator within each cell indicates, in that tumour type, the total number of subclonal occurrences of that event, and the denominator indicates the total number of both clonal and subclonal occurrences of that event in that tumour type. Shading of each cell indicates the percentage of subclonal occurrences of an event within a tumour type, with orange indicating higher subclonality and grey indicating higher clonality. The border of each cell indicates the classification of that event in a tumour type as early (grey border), intermediate (no border) or late (orange border). Right, bar plot of arm-level events ordered by median percentage of subclonal occurrences across tumour types (bottom axis). Bars representing gain events are coloured red and loss events are coloured blue. Horizontal black lines indicate the separation of events into pan-cancer categories of early, intermediate and late, according to tertiles of the median proportion of SCNAs that is subclonal. Dots centred on the same axis positions indicate the total event count of each loss or gain event across tumour types (top axis). d, Enrichment of early, intermediate and late consensus peak events with known cancer-associated genes. Heat map indicating the resulting P values from two-sided Fisher’s exact tests comparing the overlap of genes in early, intermediate and late consensus peaks with previously reported oncogenes and tumour-suppressor genes.
Gain peaks were investigated in relation to oncogenes, while loss peaks were investigated in relation to tumour-suppressor genes. Significant overlaps (Benjamini–Hochberg-adjusted P < 0.05) are indicated with an asterisk (see Methods, ‘Cancer-associated gene and fragile site enrichment’). e, Enrichment of early, intermediate and late consensus peak events with chromosome fragile sites. Heat map indicating the resulting P values from Fisher’s exact tests comparing the overlap of cytobands found in early, intermediate and late consensus peaks with cytobands from previously reported chromosome fragile sites. Significant overlaps (Benjamini–Hochberg-adjusted P < 0.05) are indicated with an asterisk (see Methods, ‘Cancer-associated gene and fragile site enrichment’). f, Prevalence of SNVs and indels in cancer-associated genes. Heat map displaying the proportion of samples from each tumour type with an SNV or indel in the corresponding cancer-associated gene. Yellow asterisks indicate where the SNVs and indels are present clonally in ≥75% of tumours in the corresponding tumour type.
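Panels d and e rest on two-sided Fisher's exact tests followed by Benjamini–Hochberg adjustment. The pure-Python sketch below shows both steps. The 2×2 layout suggested in the comments (genes inside versus outside a peak class, crossed with known oncogene versus not) is an assumption about how the counts were tabulated. In practice `scipy.stats.fisher_exact` and statsmodels' `multipletests(..., method="fdr_bh")` provide the same calculations.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows = genes in / not in an early peak class,
    columns = known oncogene / not. Sums the hypergeometric probabilities of
    all tables with the same margins that are no more likely than the observed one.
    """
    row1, n = a + b, a + b + c + d
    col1 = a + c

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # small relative tolerance so ties are not lost to floating-point error
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `fisher_exact_two_sided(8, 2, 1, 5)` returns 280/8008, about 0.035; an overlap would then be marked with an asterisk only if its adjusted value from `benjamini_hochberg` falls below 0.05.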
a, Across-genome plot showing the frequency of parallel gain/amplification events in red and frequency of parallel LOH events in blue. The dashed red lines indicate the significance threshold determined by a permutation test. b, Example of parallel evolution on chromosome 1 in CRUK0005. log2[R], B-allele frequency (BAF) and allele-specific expression (ASE) plots are shown for chromosome 1 in samples 3 and 4. On the phylogenetic tree, we indicate the branches in which the parallel gains of chromosome 1 were identified. c, Correlation between intratumour heterogeneity (ITH) for each gene at the DNA and RNA levels. The scatter plot shows that the percentage of expressed genes with allele-specific DNA ITH correlates with the percentage of expressed genes with allele-specific RNA ITH. Only the 43 tumours for which we had paired multi-sample exome-sequencing and multi-sample RNA-sequencing data were included in this analysis. d, Prevalence of single haploid copies in WGD tumours. Across-genome plot showing the frequency of loss to a single haploid copy in WGD tumours at the cytoband level. Clonal loss to a single haploid copy is shown in grey. Subclonal loss to a single haploid copy is shown in orange. The solid black line indicates the total frequency, including both clonal and subclonal events, of loss to a single haploid copy. HLA LOH is not shown as only the whole-exome sequencing subset of our cohort could be analysed using the LOHHLA bioinformatics tool (see Methods, ‘HLA LOH detection’). e, Prevalence of LOH in WGD tumours. This across-genome plot at the cytoband level shows the proportion of tumours with LOH. The solid black line indicates the total proportion of tumours with either subclonal or clonal LOH; the yellow shading indicates the proportion of tumours with WGD in the cohort that had subclonal LOH at these cytobands. The dashed grey lines demarcate the borders between separate chromosomes.
f, Prevalence of HLA LOH across tumour types. We indicate for each tumour type the count and proportion of tumours in which HLA LOH was observed. Dark grey and orange bars show tumours for which HLA LOH was observed clonally or subclonally, respectively; light grey bars show tumours for which no HLA LOH was observed.
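Parallel events such as the chromosome 1 gains in CRUK0005 are detected through mirrored subclonal allelic imbalance (MSAI): after multi-sample phasing, one region of a tumour favours one parental haplotype while another region favours the other. The toy check below flags MSAI from phased B-allele frequencies using a fixed mean-shift cutoff. The input format, the `min_shift` value and the cutoff rule are assumptions chosen for illustration; a real pipeline would test each segment statistically and account for purity and read depth.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def imbalance_direction(phased_baf: List[float], min_shift: float = 0.05) -> Optional[str]:
    """Which haplotype a sample favours on one segment.

    phased_baf: per-SNP fraction of reads carrying haplotype A, oriented by a
    (hypothetical) multi-sample phasing step. Returns "A", "B" or None.
    """
    shift = mean(phased_baf) - 0.5
    if shift > min_shift:
        return "A"
    if shift < -min_shift:
        return "B"
    return None  # no clear imbalance in this sample


def mirrored_imbalance(samples: Dict[str, List[float]],
                       min_shift: float = 0.05) -> Tuple[bool, Dict[str, Optional[str]]]:
    """MSAI if at least one sample favours haplotype A and another favours B."""
    dirs = {name: imbalance_direction(baf, min_shift) for name, baf in samples.items()}
    favoured = {d for d in dirs.values() if d is not None}
    return favoured == {"A", "B"}, dirs
```

The point of phasing is visible here: without a shared haplotype orientation, two samples with BAFs of 0.7 and 0.3 would both simply look imbalanced, and the fact that different parental copies were gained or lost would be missed.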
a, Beeswarm plot indicating the total proportion of the genome affected by either clonal or subclonal SCNAs in primary tumour samples (red dots) or metastatic samples (blue dots). The black bars indicate the median of the distribution. A two-sided unpaired Student’s t-test was used in this comparison; the P value and effect size (es) are shown. b, Difference in the percentage of the genome affected by SCNAs between paired metastatic and primary tumour samples (n = 152). The waterfall plot shows whether a greater or lesser proportion of the genome was affected by total SCNAs in the primary or metastatic sample(s) of tumours with at least one primary tumour sample and at least one metastatic sample. Purple bars indicate that a greater proportion of the genome was affected by total SCNAs in the metastatic sample, and pink bars indicate that a greater proportion was affected in the primary tumour sample. A two-sided paired Student’s t-test was used for this comparison. c, Beeswarm plots indicating, for each primary tumour and metastatic sample, the proportion of the genome affected by SCNAs. These are the same samples included in the analysis of a. The black bars indicate the median of the distribution. Two-sided unpaired Student’s t-tests were used for each comparison; P values are indicated at the top of each plot. d, Beeswarm plots indicating, for each primary tumour and metastatic sample, the proportion of SCNAs that is subclonal. These are the same samples included in the analysis of a. The black bars show the median of the distribution. Two-sided unpaired Student’s t-tests were used for each comparison; P values are indicated at the top of each plot. e, Shared and private primary tumour and metastatic LOH. Bar plots separated by tumour type, with each stacked bar representing the LOH identified in a single tumour with both primary tumour and metastatic samples.
Each bar is coloured according to the proportion of LOH identified in that tumour that is shared between the primary tumour and metastatic samples (blue), the proportion of LOH present only in primary tumour samples (green) or the proportion of LOH present only in metastatic samples (red). The grey horizontal lines show the median value of the proportion of LOH shared between primary tumour and metastatic samples for each tumour type. f–i, Chromosomal arm-level events enriched in metastatic samples. We included only the four tumour types with >10 tumours with paired primary tumour–metastatic samples: LUAD (f), ER+ breast cancer (g), HER2+ breast cancer (h) and KIRC (i). In each panel, all chromosome arms are featured. The bar plots show the number of tumours with arm-level SCNAs in each tumour type. The colour of the bars indicates whether that arm-level event was enriched, depleted or maintained in the metastatic sample when compared with the corresponding primary tumour sample from the same patient. Bars facing right represent gain SCNAs; bars facing left represent loss SCNAs. The rectangular blocks between the bar plots indicate whether the arm-level events were recurrent. Orange blocks represent recurrent subclonal events; grey blocks represent recurrent clonal events; blocks that are partially grey and partially orange represent events that are both clonally and subclonally recurrent. The asterisks indicate whether the arm-level event is significantly enriched in metastatic samples in the combined paired (two-sided binomial test) and unpaired (test of equal or given proportions) primary tumour–metastatic analysis.
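Panel e partitions each tumour's LOH into shared, primary-only and metastasis-only fractions, and the paired part of panels f–i uses a two-sided binomial test. The sketch below shows both calculations under stated assumptions: LOH calls are given as sets of hypothetical segment identifiers (for example cytobands), and the binomial test is framed as k patients in whom an arm-level event is present only in the metastasis, out of n discordant patients, tested against p = 0.5. That framing is an assumption rather than the authors' stated design; `scipy.stats.binomtest` offers a comparable exact two-sided test.

```python
from math import comb
from typing import Dict, Set


def loh_partition(primary_loh: Set[str], met_loh: Set[str]) -> Dict[str, float]:
    """Fractions of a tumour's LOH segments that are shared or private.

    Segment identifiers (e.g. cytobands) are hypothetical inputs.
    """
    union = primary_loh | met_loh
    if not union:
        return {"shared": 0.0, "primary_only": 0.0, "met_only": 0.0}
    n = len(union)
    return {
        "shared": len(primary_loh & met_loh) / n,
        "primary_only": len(primary_loh - met_loh) / n,
        "met_only": len(met_loh - primary_loh) / n,
    }


def binomial_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    p_obs = probs[k]
    # small relative tolerance so equally likely outcomes are kept
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))
```

For instance, if an arm gain were metastasis-only in 9 of 10 discordant patients, `binomial_two_sided(9, 10)` gives 22/1024, about 0.021.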
About this article
Cite this article
Watkins, T.B.K., Lim, E.L., Petkovic, M. et al. Pervasive chromosomal instability and karyotype order in tumour evolution. Nature 587, 126–132 (2020). https://doi.org/10.1038/s41586-020-2698-6