Pervasive lesion segregation shapes cancer genome evolution

Aitken, Sarah J.; Anderson, Craig J.; Connor, Frances; Pich, Oriol; Sundaram, Vasavi; Feig, Christine; Rayner, Tim F.; Lukk, Margus; Aitken, Stuart; Luft, Juliet; Kentepozidou, Elissavet; Arnedo-Pac, Claudia; Beentjes, Sjoerd V.; Davies, Susan E.; Drews, Ruben M.; Ewing, Ailith; Kaiser, Vera B.; Khamseh, Ava; López-Arribillaga, Erika; Redmond, Aisling M.; Santoyo-Lopez, Javier; Sentís, Inés; Talmane, Lana; Yates, Andrew D.; Semple, Colin A.; López-Bigas, Núria; Flicek, Paul; Odom, Duncan T.; Taylor, Martin S.

doi:10.1038/s41586-020-2435-1

Article
Published: 24 June 2020

Nature volume 583, pages 265–270 (2020)

Abstract

Cancers arise through the acquisition of oncogenic mutations and grow by clonal expansion¹,². Here we reveal that most mutagenic DNA lesions are not resolved into a mutated DNA base pair within a single cell cycle. Instead, DNA lesions segregate, unrepaired, into daughter cells for multiple cell generations, resulting in the chromosome-scale phasing of subsequent mutations. We characterize this process in mutagen-induced mouse liver tumours and show that DNA replication across persisting lesions can produce multiple alternative alleles in successive cell divisions, thereby generating both multiallelic and combinatorial genetic diversity. The phasing of lesions enables accurate measurement of strand-biased repair processes, quantification of oncogenic selection and fine mapping of sister-chromatid-exchange events. Finally, we demonstrate that lesion segregation is a unifying property of exogenous mutagens, including UV light and chemotherapy agents in human cells and tumours, which has profound implications for the evolution and adaptation of cancer genomes.

**Fig. 1: DEN-initiated tumours have a high burden of T→N and A→N mutations and driver changes in the EGFR–RAS–RAF pathway.**

**Fig. 2: Chromosome-scale and strand asymmetric segregation of DNA lesions.**

**Fig. 3: Identification of the lesion-containing DNA strand enables TCR to be quantified with strand specificity.**

**Fig. 4: Lesion segregation generates multiallelic and combinatorial genetic diversity.**

**Fig. 5: Lesion segregation is a pervasive feature of exogenous mutagens and is evident in human cancers.**

Data availability

The WGS FASTQ files are available from the European Nucleotide Archive (ENA) under accession number PRJEB37808. RNA-seq files are available from Array Express under experiment number E-MTAB-8518. Digitised histology images are available from Biostudies under accession S-BSST383.

Code availability

The analysis pipeline including Conda and Snakemake configuration files can be obtained without restriction from the repository https://git.ecdf.ed.ac.uk/taylor-lab/lce-ls.

References

1. Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).
2. Turajlic, S., Sottoriva, A., Graham, T. & Swanton, C. Resolving genetic heterogeneity in cancer. Nat. Rev. Genet. 20, 404–416 (2019).
3. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
4. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
5. Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836 (2019).
6. Connor, F. et al. Mutational landscape of a chemically-induced mouse model of liver cancer. J. Hepatol. 69, 840–850 (2018).
7. Maronpot, R. R. Biological basis of differential susceptibility to hepatocarcinogenesis among mouse strains. J. Toxicol. Pathol. 22, 11–33 (2009).
8. Wang, C. et al. Whole-genome sequencing reveals genomic signatures associated with the inflammatory microenvironments in Chinese NSCLC patients. Nat. Commun. 9, 2054 (2018).
9. Hayward, N. K. et al. Whole-genome landscapes of major melanoma subtypes. Nature 545, 175–180 (2017).
10. Verna, L., Whysner, J. & Williams, G. M. N-nitrosodiethylamine mechanistic data and risk assessment: bioactivation, DNA-adduct formation, mutagenicity, and tumor initiation. Pharmacol. Ther. 71, 57–81 (1996).
11. Maronpot, R. R., Fox, T., Malarkey, D. E. & Goldsworthy, T. L. Mutations in the ras proto-oncogene: clues to etiology and molecular pathogenesis of mouse liver tumors. Toxicology 101, 125–156 (1995).
12. Buchmann, A., Karcier, Z., Schmid, B., Strathmann, J. & Schwarz, M. Differential selection for B-raf and Ha-ras mutated liver tumors in mice with high and low susceptibility to hepatocarcinogenesis. Mutat. Res. 638, 66–74 (2008).
13. Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 164, 538–549 (2016).
14. Roberts, S. A. et al. Clustered mutations in yeast and in human cancers can arise from damaged long single-strand DNA regions. Mol. Cell 46, 424–435 (2012).
15. Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. Cell 176, 1282–1294 (2019).
16. Tomkova, M., Tomek, J., Kriaucionis, S. & Schuster-Böckler, B. Mutational signature distribution varies with DNA replication timing and strand asymmetry. Genome Biol. 19, 129 (2018).
17. Perry, P. & Evans, H. J. Cytological detection of mutagen-carcinogen exposure by sister chromatid exchange. Nature 258, 121–125 (1975).
18. Guirouilh-Barbat, J., Lambert, S., Bertrand, P. & Lopez, B. S. Is homologous recombination really an error-free process? Front. Genet. 5, 175 (2014).
19. Strick, T. R. & Portman, J. R. Transcription-coupled repair: from cells to single molecules and back again. J. Mol. Biol. 431, 4093–4102 (2019).
20. Hu, J., Adar, S., Selby, C. P., Lieb, J. D. & Sancar, A. Genome-wide analysis of human global and transcription-coupled excision repair of UV damage at single-nucleotide resolution. Genes Dev. 29, 948–960 (2015).
21. Supek, F. & Lehner, B. Clustered mutation signatures reveal that error-prone DNA repair targets mutations to active genes. Cell 170, 534–547 (2017).
22. Seila, A. C. et al. Divergent transcription from active promoters. Science 322, 1849–1851 (2008).
23. Preker, P. et al. PROMoter uPstream transcripts share characteristics with mRNAs and are produced upstream of all three major types of mammalian promoters. Nucleic Acids Res. 39, 7179–7193 (2011).
24. Kuipers, J., Jahn, K., Raphael, B. J. & Beerenwinkel, N. Single-cell sequencing data reveal widespread recurrence and loss of mutational hits in the life histories of tumors. Genome Res. 27, 1885–1894 (2017).
25. Brody, Y. et al. Quantification of somatic mutation flow across individual cell division events by lineage sequencing. Genome Res. 28, 1901–1918 (2018).
26. Zou, X. et al. Validating the concept of mutational signatures with isogenic cell models. Nat. Commun. 9, 1744 (2018).
27. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
28. Parkhomchuk, D., Amstislavskiy, V., Soldatov, A. & Ogryzko, V. Use of high throughput sequencing to observe genome dynamics at a single cell level. Proc. Natl Acad. Sci. USA 106, 20830–20835 (2009).
29. Chan, K. & Gordenin, D. A. Clusters of multiple mutations: incidence and molecular mechanisms. Annu. Rev. Genet. 49, 243–267 (2015).
30. Schwartz, R. & Schäffer, A. A. The evolution of tumour phylogenetics: principles and practice. Nat. Rev. Genet. 18, 213–229 (2017).
31. Kimura, M. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics 61, 893–903 (1969).
32. Zhang, Y. et al. Genetic load and potential mutational meltdown in cancer cell populations. Mol. Biol. Evol. 36, 541–552 (2019).
33. Hill, W. G. & Robertson, A. The effect of linkage on limits to artificial selection. Genet. Res. 8, 269–294 (1966).
34. Tilk, S., Curtis, C., Petrov, D. & McFarland, C. D. Most cancers carry a substantial deleterious load due to Hill-Robertson interference. Preprint at bioRxiv https://doi.org/10.1101/764340 (2019).
35. Thoolen, B. et al. Proliferative and nonproliferative lesions of the rat and mouse hepatobiliary system. Toxicol. Pathol. 38 (Suppl), 5S–81S (2010).
36. Lilue, J. et al. Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nat. Genet. 50, 1574–1583 (2018).
37. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
38. Picard Tools (Broad Institute, 2019); http://broadinstitute.github.io/picard
39. Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).
40. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
41. R Core Team. R: A Language and Environment for Statistical Computing http://www.R-project.org/ (R Foundation for Statistical Computing, Vienna, Austria, 2013).
42. Talevich, E., Shain, A. H., Botton, T. & Bastian, B. C. CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLOS Comput. Biol. 12, e1004873 (2016).
43. Bray, N., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal RNA-seq quantification with kallisto. Nat. Biotechnol. 34, 525–527 (2016).
44. Cunningham, F. et al. Ensembl 2019. Nucleic Acids Res. 47 (D1), D745–D751 (2019).
45. Vietri Rudan, M. et al. Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture. Cell Rep. 10, 1297–1309 (2015).
46. Church, D. M. et al. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol. 7, e1000112 (2009).
47. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
48. Armstrong, J. et al. Progressive alignment with Cactus: a multiple-genome aligner for the thousand-genome era. Preprint at bioRxiv https://doi.org/10.1101/730531 (2019).
49. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
50. Gori, K. & Baez-Ortega, A. sigfit: flexible Bayesian inference of mutational signatures. Preprint at bioRxiv https://doi.org/10.1101/372896 (2018).
51. Mularoni, L., Sabarinathan, R., Deu-Pons, J., Gonzalez-Perez, A. & López-Bigas, N. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 17, 128 (2016).
52. Arnedo-Pac, C., Mularoni, L., Muiños, F., Gonzalez-Perez, A. & Lopez-Bigas, N. OncodriveCLUSTL: a sequence-based clustering method to identify cancer drivers. Bioinformatics 35, 5396 (2019).
53. Killick, R. & Eckley, I. A. changepoint: an R package for changepoint analysis. J. Stat. Softw. 58, 1–19 (2014).
54. Akeson, E. C. et al. Chromosomal inversion discovered in C3H/HeJ mice. Genomics 87, 311–313 (2006).
55. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
56. International Cancer Genome Consortium. International network of cancer genome projects. Nature 464, 993–998 (2010).
57. Blokzijl, F., Janssen, R., van Boxtel, R. & Cuppen, E. MutationalPatterns: comprehensive genome-wide analysis of mutational processes. Genome Med. 10, 33 (2018).
58. Caeiro, F. & Mateus, A. randtests: testing randomness in R. (2014).
59. Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).
60. Singer, B. In vivo formation and persistence of modified nucleosides resulting from alkylating agents. Environ. Health Perspect. 62, 41–48 (1985).

Acknowledgements

We thank M. Roller and F. Markowetz for supervision; L. Mularoni and G. Ritchie for software support; the CRUK Cambridge Institute Core facilities for their valuable contribution: CRUK Biological Resources (A. Mowbray), Preclinical Genome Editing (L. Young, S. Kupczak, M. Cronshaw, P. Mackin, Y. Cheng and L. Hughes-Hallett), Genomics (J. Hadfield and F. Bowater), Bioinformatics (G. Brown, M. Eldridge and R. Bowers), Histopathology and ISH (L.-A. McDuffus, C. Brodie and J. Arnold), and Research Instrumentation (J. Gray); Edinburgh Genomics (Clinical) Facility; the EMBL-EBI technical services cluster (Z. Mears, A. Cristofori, T. Nowak, S. Nanuwa, V. Tabak and A. Checcucci); and W. Bickmore and C. Ponting for comments on the manuscript. This work was supported by: Cancer Research UK (20412 and 22398), the European Research Council (615584, 682398), the Wellcome Trust (WT108749/Z/15/Z, WT106563/Z/14/A and WT202878/B/16/Z), the European Molecular Biology Laboratory, the MRC Human Genetics Unit core funding programme grants (MC_UU_00007/11 and MC_UU_00007/16) and the ERDF/Spanish Ministry of Science, Innovation and Universities-Spanish State Research Agency/DamReMap Project (RTI2018-094095-B-I00). S.J.A. received a Wellcome Trust PhD Training Fellowship for Clinicians (WT106563/Z/14/Z) and is now funded by a National Institute for Health Research (NIHR) Clinical Lectureship. O.P. is funded by a BIST PhD fellowship supported by the Secretariat for Universities and Research of the Ministry of Business and Knowledge of the Government of Catalonia and the Barcelona Institute of Science and Technology. V.S. is supported by an EMBL Interdisciplinary Postdoc (EIPOD) fellowship under Marie Skłodowska Curie actions COFUND (664726). E.K. is supported by the EMBL International PhD Programme. C.A-P. is supported by La Caixa Foundation fellowship (ID 100010434; LCF/BQ/ES18/11670011). S.V.B. is supported by ERC Starter Grant 759967. A.E. is supported by a UKRI Innovation Fellowship (MR/RO26017/1). A.K. is a cross-disciplinary postdoctoral fellow supported by funding from the University of Edinburgh and Medical Research Council (core grant to the MRC Institute of Genetics and Molecular Medicine). I.S. is supported by an FPI fellowship from Spanish Ministry of Economy and Competitiveness (SAF2015-66084-R). IRB Barcelona is a recipient of a Severo Ochoa Centre of Excellence Award from Spanish Ministry of Science, Innovation and Universities (MICINN, Government of Spain) and is supported by CERCA (Generalitat de Catalunya).

Author information

These authors contributed equally: Craig J. Anderson, Frances Connor

Authors and Affiliations

Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
Sarah J. Aitken, Frances Connor, Vasavi Sundaram, Christine Feig, Tim F. Rayner, Margus Lukk, Ruben M. Drews, Aisling M. Redmond, Paul Flicek & Duncan T. Odom
Department of Pathology, University of Cambridge, Cambridge, UK
Sarah J. Aitken
Department of Histopathology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
Sarah J. Aitken & Susan E. Davies
MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
Craig J. Anderson, Stuart Aitken, Juliet Luft, Ailith Ewing, Vera B. Kaiser, Ava Khamseh, Lana Talmane, Colin A. Semple & Martin S. Taylor
Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
Oriol Pich, Claudia Arnedo-Pac, Erika López-Arribillaga, Inés Sentís & Núria López-Bigas
European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
Vasavi Sundaram, Elissavet Kentepozidou, Andrew D. Yates & Paul Flicek
School of Mathematics and Maxwell Institute, University of Edinburgh, Edinburgh, UK
Sjoerd V. Beentjes
Higgs Centre for Theoretical Physics, University of Edinburgh, Edinburgh, UK
Ava Khamseh
Edinburgh Genomics (Clinical), The University of Edinburgh, Edinburgh, UK
Javier Santoyo-Lopez
Universitat Pompeu Fabra (UPF), Barcelona, Spain
Núria López-Bigas
Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
Núria López-Bigas
German Cancer Research Center (DKFZ), Division of Regulatory Genomics and Cancer Evolution, Heidelberg, Germany
Duncan T. Odom

Authors

Sarah J. Aitken
Craig J. Anderson
Frances Connor
Oriol Pich
Vasavi Sundaram
Christine Feig
Tim F. Rayner
Margus Lukk
Stuart Aitken
Juliet Luft
Elissavet Kentepozidou
Claudia Arnedo-Pac
Sjoerd V. Beentjes
Susan E. Davies
Ruben M. Drews
Ailith Ewing
Vera B. Kaiser
Ava Khamseh
Erika López-Arribillaga
Aisling M. Redmond
Javier Santoyo-Lopez
Inés Sentís
Lana Talmane
Andrew D. Yates
Colin A. Semple
Núria López-Bigas
Paul Flicek
Duncan T. Odom
Martin S. Taylor

Consortia

Liver Cancer Evolution Consortium

Sarah J. Aitken, Stuart Aitken, Craig J. Anderson, Claudia Arnedo-Pac, Frances Connor, Ruben M. Drews, Ailith Ewing, Christine Feig, Paul Flicek, Vera B. Kaiser, Elissavet Kentepozidou, Erika López-Arribillaga, Núria López-Bigas, Juliet Luft, Margus Lukk, Duncan T. Odom, Oriol Pich, Tim F. Rayner, Colin A. Semple, Inés Sentís, Vasavi Sundaram, Lana Talmane & Martin S. Taylor

Contributions

S.J.A., F.C., C.F. and D.T.O. conceived the project and designed the experiments. S.J.A., F.C. and C.F. performed the mutagenesis experiments and sequencing experiments. E.L.-A. and A.M.R. performed supporting experiments. J.S.-L. provided contract sequencing. S.J.A. performed the histopathological analyses with S.E.D. providing advice. C.J.A. and M.S.T. designed and implemented computational analysis. M.S.T. discovered lesion segregation. O.P., V.S., T.F.R., M.L., S.A., E.K. and J.L. performed supporting computational analysis. C.A-P., S.V.B., R.M.D., A.E., V.B.K., A.K., I.S. and L.T. contributed to the computational analyses. T.F.R., M.L., S.A. and A.D.Y. curated data. S.J.A., C.A.S., N.L.-B., P.F., D.T.O. and M.S.T. supervised the work. S.J.A., C.A.S., N.L.-B., P.F., D.T.O. and M.S.T. lead the Liver Cancer Evolution Consortium. S.J.A. and P.F. provided scientific and administrative organisation. S.J.A., C.A.S., N.L.-B., P.F., D.T.O. and M.S.T. funded the work. S.J.A., D.T.O. and M.S.T. wrote the manuscript. All authors had the opportunity to edit the manuscript. All authors approved the final manuscript.

Corresponding authors

Correspondence to Duncan T. Odom or Martin S. Taylor.

Ethics declarations

Competing interests

P.F. is a member of the Scientific Advisory Boards of Fabric Genomics, Inc. and Eagle Genomics, Ltd. The other authors declare no competing interests.

Additional information

Peer review information Nature thanks Trevor Graham and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Summary mutation metrics for C3H and CAST tumours.

a, Single nucleotide substitution rates per C3H tumour, rank ordered over x-axis (grey points, median blue line). Insertion/deletion (indel, <11 nt) rates are shown in black. b, Y-axis from a, expanded to show distribution of indel rates with preserved tumour order. c, Number of C3H copy number variant (CNV) segments and their total span as a percent of the haploid genome. Blue shading shows intensity of overlapping points as a percent of all tumours in the plot. d–f, Corresponding plots for CAST derived tumours; f, two extreme x-axis outliers relocated (red) and x-axis value shown. g, h, Mutation spectra deconvolved from the aggregate spectra of 371 C3H tumours, subsequently referred to as the DEN1 and DEN2 signatures. DEN1 is dominated by T→N or their complement A→N changes thought to arise from the O⁴-ethyl-deoxythymidine adduct of T¹⁰. DEN2 substitutions are primarily C→T or their complement G→A changes likely from O⁶-ethyl-2-deoxyguanosine lesions of G¹⁰. i, Oncoplot summarizing mutation load, mutation signature composition, and driver gene mutation complement of C3H tumours. j, Oncoplot of CAST derived tumours as in i. The DEN2 signature is a minor component of most tumours but prominent in a minority (i, j).

Extended Data Fig. 2 Mutational asymmetry across 50% of the autosomal genome and 100% of the haploid X chromosomes.

a, b, Typically 50% of the autosomal genomic span (percent of nucleotides) in tumours is contained in segments with either Watson or Crick strand mutational asymmetry. a, C3H tumours, n = 371. b, CAST tumours, n = 84. c, d, Typically 100% of the haploid X chromosome shows Watson or Crick strand mutational asymmetry. c, C3H tumours (n = 371). d, CAST tumours (n = 84).

Extended Data Fig. 3 The frequency of SCEs correlates with mutation rate, and localizing reference genome assembly errors.

a, The relationship between single nucleotide substitution mutation load and detected SCE events in C3H tumours. DEN is known to produce ethyl adducts on the sugar-phosphate backbone of DNA as well as mutation-inducing modifications to the bases¹⁰ which could lead to strand breaks⁶⁰ triggering SCE. The frequent observation and correlation between rates of SCE and point mutation supports this view. Counts of SCE (y-axis) are based on down-sampling to 10,000 informative mutations per tumour to ensure equal power to detect SCE in each tumour. Tumours with <50% cellularity (pink) have high mutation load and form a sub-group with few detected SCE events; these are suspected to be polyclonal tumours and were excluded from the Pearson’s correlation reported (n = 335 independent tumour samples, implemented in a two-sided test, significance from Fisher’s transform). b, As for a, but showing CAST derived tumours (n = 84, after cellularity exclusions n = 77). c, Evaluation of the relationship between mutation load and ability to detect SCE events. Mutations from C3H tumour 94315_N8 (shown in Fig. 2) randomly down-sampled and segmentation analysis applied. The y-axis shows the percentage of SCE events detected (100 replicates, mean red, 95% C.I. pink). The x-axis is on a log-scale: 95% of C3H and >95% of CAST tumours have mutation counts to the right of the blue vertical line. Down-sampling other tumours gave comparable results. d, The same down-sampling data as shown in c but the y-axis shows the percent of mutations with the correct (same as full data) mutational asymmetry assignment (mean red, 95% C.I. pink). e, Candidate C3H reference genome assembly errors. Genome coordinates shown on the x-axis. Immediate switches between Watson and Crick asymmetry are not expected on autosomes unless both copies of the chromosome have a SCE event at equivalent sites. However, inversions and translocations between the sequenced genomes and the reference assembly are expected to produce immediate asymmetry switches. The discordant segment coverage count (black y-axis) shows the number of informative tumours (those with either Watson or Crick strand asymmetry at the corresponding genome position) that suggest a tumour genome to reference genome discrepancy. Consensus support (brown y-axis) plotted as triangles shows the percentage of informative tumours that support a genomic discrepancy at the indicated position (only shown for values >50% support). The two sites on chromosome 6 in C3H correspond to a previously identified C3H strain specific inversion that is known to be incorrectly oriented in the C3H reference assembly⁵⁴. f, Candidate CAST reference genome assembly errors, plotted as per e. The candidate misassembly on chromosome 14 in both strains occurs at an approximately orthologous position, suggesting a rearrangement shared between strains or a misassembly in the BL6 GRCm38 reference assembly against which other mouse reference genome assemblies have been scaffolded.
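
The correlation test named in panels a and b (Pearson's correlation, two-sided, significance from Fisher's transform) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline; the function names and the five example values are hypothetical, and the normal approximation to the Fisher-transformed statistic is assumed.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_two_sided_p(r, n):
    """Two-sided p-value for r via Fisher's z-transform (needs n > 3 and |r| < 1)."""
    # atanh(r) * sqrt(n - 3) is approximately standard normal when the true correlation is 0.
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical per-tumour values, for illustration only (not data from the study).
mutation_load = [1, 2, 3, 4, 5]
sce_count = [2, 1, 4, 3, 5]
r = pearson_r(mutation_load, sce_count)
p = fisher_two_sided_p(r, len(mutation_load))
```

With the sample sizes quoted in the caption (n = 335 or n = 77), the same call would be `fisher_two_sided_p(r, 335)` or `fisher_two_sided_p(r, 77)`.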

Extended Data Fig. 4 Locally elevated mutation load is driven by SCE.

a, Double strand breaks (DSBs) and other DNA damage can trigger homologous-recombination-mediated DNA repair between sister chromatids. The repair intermediate resolves into separate chromatids through cleavage and ligation; grey triangles denote cleavage sites for one of the possible resolutions that would result in a large-scale SCE event. Although illustrated for double-ended DNA breaks, single ended breaks from collapsed replication forks can be repaired through homologous recombination and could similarly lead to the formation of repair intermediate structures that can be resolved as SCEs. b, Enrichment analysis of SCE sites (red) compared with null expectations from randomly permuting locations into the analysable fraction of the genome (grey distributions), the black boxes denote 95% of 1,000 permutations. SCE events are enriched in later replicating and transcriptionally less active genomic regions (Hi-C defined compartment B), and correspondingly depleted from early replicating active regions. c, Aggregating across n = 9,645 SCE sites, the observed mutation rate approximately doubles at the inferred site of exchange (x = 0). Aggregate mutation rates (brown) were calculated in consecutive 5-kb windows. Compositionally matched null expectation was generated by permuting each exchange site into 100 proxy tumours and calculating median (black) and 95% confidence intervals (grey) while preserving the total number of projected sites per proxy tumour. d, The elevated mutation count is not the result of a high mutation density in a subset of exchange sites, rather it is a subtle increase in mutations across most exchange sites. Heatmap showing mutation counts calculated in consecutive 5-kb windows across each exchange site. Rows represent each exchange site, rank-ordered by total mutation count across each 400-kb interval. e, The distribution of positional uncertainty in exchange site location approximately mirrors the decay profile of elevated mutation frequency. f, Divergence of mutation rate spectra is shown as cosine distance between the analysed window and the genome wide mutation rate spectrum aggregated over all C3H tumours. Despite the elevated mutation frequency, there is no detected distortion of the mutation spectrum. g, A model based on homologous recombination repair intermediate, branch migration that produces heteroduplex segments of (i) mismatch:mismatch (circles) and (ii) lesion:lesion (red triangles) strands. Subsequent strand segregation would increase the mutational diversity of a descendant cell population but not the mutation count per cell (key as per Fig. 2).
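
Two quantities in this caption lend themselves to a short sketch: mutation counts in consecutive 5-kb windows across a 400-kb interval (panels c and d) and the cosine distance between two mutation spectra (panel f). The code below is an illustration under stated assumptions rather than the published implementation: the function names are hypothetical, the interval is assumed to be centred on the exchange site with half-open windows, and spectra are assumed to be plain numeric vectors of equal length.

```python
import math


def window_counts(positions, site, half_span=200_000, window=5_000):
    """Count mutations in consecutive windows across an interval centred on `site`."""
    start = site - half_span
    counts = [0] * (2 * half_span // window)
    for pos in positions:
        offset = pos - start
        # Keep only mutations inside the half-open interval [start, start + 2 * half_span).
        if 0 <= offset < 2 * half_span:
            counts[offset // window] += 1
    return counts


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero spectra of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical mutation coordinates around an exchange site at position 200,000.
counts = window_counts([0, 4_999, 5_000, 399_999, 400_000], site=200_000)
```

A cosine distance near 0 indicates that two spectra have the same shape regardless of total mutation count, which is why it suits a comparison where the local mutation rate is elevated.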

Extended Data Fig. 5 Replication of TCR with lesion strand resolution in Mus musculus castaneus.

a, TCR of template strand lesions is dependent on transcription level (P15 liver, median TPM). Mutation rate estimates (circles) are the aggregate rates for genes binned by expression level across CAST tumours (n = 84). Expression level bin 0 contains n = 2,645 genes; all subsequent bins contain n = 4,323 genes. See Methods for per-gene, per-tumour inclusion criteria. Empirical confidence intervals (99%) were calculated through bootstrap sampling (n = 100 replicates) of genes within the expression level bin. b, Comparison of mutation rates for the 64 trinucleotide contexts: each context has a high and a low expression point linked by a line. c, Sequence-composition-normalized profiles of mutation rate around TSS loci. d, Stratifying the data plotted in c by lesion strand reveals greater detail on the observed mutation patterns, including the pronounced influence of bidirectional transcription initiation.
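
The bootstrap confidence intervals described in panel a can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each gene is represented by a hypothetical `(mutation_count, analysable_sites)` pair, the aggregate rate is taken as total mutations over total sites, and the interval is a simple percentile interval over the replicates.

```python
import math
import random

def bootstrap_rate_ci(genes, n_replicates=100, level=0.99, seed=0):
    """Percentile bootstrap interval for an aggregate mutation rate.

    genes: list of (mutation_count, analysable_sites) pairs for one
    expression bin (a hypothetical input layout, assumed here).
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_replicates):
        # Resample genes with replacement, keeping the bin size fixed.
        sample = rng.choices(genes, k=len(genes))
        mutations = sum(m for m, _ in sample)
        sites = sum(s for _, s in sample)
        rates.append(mutations / sites)
    rates.sort()
    alpha = (1.0 - level) / 2.0
    lower = rates[math.floor(alpha * (n_replicates - 1))]
    upper = rates[math.ceil((1.0 - alpha) * (n_replicates - 1))]
    return lower, upper
```

With only 100 replicates, a 99% percentile interval is effectively bounded by the most extreme resampled rates, so this sketch returns the minimum and maximum replicate values in that case.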

Extended Data Fig. 6 Variant allele frequency distributions demonstrate high rates of non-mutagenic replication over segregating lesions.

a–f, VAF distributions shown as probability density functions (total area under curve = 1) for six example tumours, calculated taking into account observed multiallelic variation. The VAF for identified driver mutations is indicated (brown triangle). Tumour identifiers are shown top right along with the percent of genomic segments (based on mutation asymmetry segmentation) that are multiallelic. Skew shows Pearson’s median skewness coefficient for the VAF distributions. a–c, Tumours with no multiallelic segments exhibit a symmetric VAF distribution, showing minimal sub-clonal structure. d–f, Tumours with all segments multiallelic, illustrating the sub-clonal structure generated by segregating lesions. g, Tumours with a high proportion of multiallelic segments have a left-skewed VAF distribution, indicating frequent non-mutagenic replication over segregating lesions. Percent of genome segments that are multiallelic (x-axis) plotted against VAF distribution skew for 371 C3H tumours. Tumours with low estimated cellularity are indicated in pink and were excluded from correlation analysis (n = 335 independent tumour samples in Pearson’s correlation, two-sided significance from Fisher’s transform). h, As for g, but showing 84 CAST tumours (n = 77 independent tumours included in Pearson’s correlation). i, Mutation asymmetry summary ribbon for example C3H tumour 90797_N2; C3H genome on the x-axis. The percent of mutation sites with robust support for multiallelic variation (y-axis) calculated in 10-Mb windows (grey) and for each asymmetric segment (black). Thresholds for high (black), intermediate (grey) and zero (red) rates of multiallelic sites are shown on the right axis. j, VAF density plots for the example tumour 90797_N2 (shown in i), with mutations in asymmetry segments stratified by the multiallelic rate thresholds defined in i. As with individual tumour based analysis (a–h), high multiallelic rates correspond to a leftward skew of the VAF (black, grey) whereas segments without multiallelic variation (red) show a minimally skewed distribution.
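
Pearson's median skewness coefficient, reported as "Skew" in these panels, has the standard form 3 × (mean − median) / standard deviation. A minimal sketch is given below; the function name is hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, as the legend does not specify which was used.

```python
import statistics

def pearson_median_skewness(vafs):
    """Pearson's median (second) skewness: 3 * (mean - median) / sd.

    vafs: a sequence of variant allele frequencies for one tumour.
    The sample standard deviation is used here as an assumption.
    """
    mean = statistics.fmean(vafs)
    median = statistics.median(vafs)
    sd = statistics.stdev(vafs)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3.0 * (mean - median) / sd
```

A symmetric set of values gives a coefficient near zero, and the sign of the coefficient follows the sign of (mean − median).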

Extended Data Fig. 7 Examples of mutation patterns generated by lesion segregation from a diverse range of clinically relevant mutagens.

a–c, Genome-wide mutation asymmetry plots (shown as per Fig. 2a–c) for mutagen-exposed human iPSCs⁵. Cells exposed to simulated solar radiation illustrate lesion segregation for ultraviolet damage (a). Immediately adjacent mutations (intermutation distance 10⁰) indicate CC→TT dinucleotide changes. Despite a low total mutation load (1,308 nucleotide substitutions, 842 informative T→A changes), the mutational asymmetry of lesion segregation is evident for the aristolochic-acid-exposed clone⁵ (b) and the polycyclic aromatic hydrocarbon DBADE (c) that is found in tobacco smoke. d, Summary mutation asymmetry ribbons (as per Fig. 2d) for all mutagen-exposed clones with rl₂₀ > 5, which illustrate the independence of asymmetry pattern between replicate clones, almost universal asymmetry on chromosome X, and asymmetry over approximately 50% of the autosomal genome. The dominant mutation type is indicated for each mutagen. In those clones with low mutation rates, some sister exchange sites are likely to have been missed, leading to reduced asymmetry signal (for example, on the X chromosome). Segments with <20 informative mutations are shown in white.

Extended Data Fig. 8 Lesion segregation is evident for multiple DNA damaging agents but not for damage independent mutational processes.

a, DEN-induced C3H tumour genomes (n = 371) typically show significant mutational asymmetry across their genome. Wald–Wolfowitz runs test (x-axis) P-values calculated using a normal approximation (two-sided). Nominal P = 0.05 significance threshold indicated by dashed blue line, Bonferroni-corrected threshold shown as solid vertical blue line. P-values <1 × 10⁻¹⁵ are rank-ordered. The rl₂₀ metric (Fig. 5a; Methods) is shown on the y-axis; the horizontal blue line marks the empirical significance threshold of rl₂₀ > 5. b, Many human iPSCs grown from single cells after exogenous mutagen exposure⁵ show significant mutation asymmetry (n = 148 WGS, mutagen-exposed cell lines). Statistical calculations and plotting as in a, with adjustment of Bonferroni correction. Diverse categories of mutagen, denoted by point colour (see Fig. 5b), show asymmetry indicative of lesion segregation. c, Cell lines with genetically perturbed genome replication and maintenance machinery²⁶ and similar mutation load to those in b do not show significant mutation asymmetry (n = 72 WGS, genetically perturbed cell lines). Statistical calculations and plotting as in a, with adjustment of Bonferroni correction.
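
The Wald–Wolfowitz runs test with a two-sided normal approximation, as named in panel a, can be sketched as below. This is an illustrative implementation of the textbook test, not the authors' pipeline: the input is assumed to be a genome-ordered sequence of two labels (for example, hypothetical "T" and "A" labels marking which strand each informative mutation is attributed to), and `bonferroni_threshold` is a hypothetical helper for the corrected threshold.

```python
import math

def runs_test(labels):
    """Two-sided Wald-Wolfowitz runs test (normal approximation).

    labels: ordered sequence containing exactly two distinct values.
    Returns (number_of_runs, z_score, p_value).
    """
    categories = sorted(set(labels))
    if len(categories) != 2:
        raise ValueError("expected exactly two distinct labels")
    n1 = sum(1 for x in labels if x == categories[0])
    n2 = len(labels) - n1
    n = n1 + n2
    # A new run starts wherever adjacent labels differ.
    runs = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if var <= 0:
        raise ValueError("too few observations for a normal approximation")
    z = (runs - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return runs, z, p

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests
```

Long stretches of the same label, as expected under lesion segregation, produce far fewer runs than the null expectation and therefore a strongly negative z-score. For example, `bonferroni_threshold(0.05, 371)` gives the per-tumour threshold that would apply if one test were performed for each of the 371 C3H tumours.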

Extended Data Table 1 A lesion-segregation-based test for oncogenic selection

Supplementary information

Reporting Summary

Supplementary Table 1

Table of tumours sequenced containing key parameters and mutation spectra signature matrices. For each sequenced tumour sample, summarising: sample identifiers, animal and parent identifiers, diagnosis and histopathological features, mutation signatures and counts, cellularity estimates, count of sister chromatid exchange events, mutation asymmetry metrics, and identified driver mutations. Excel format file containing three sheets.

Supplementary Table 2

Table of exogenous mutagen and ICGC scan results. Summary of mutation rate and mutational asymmetry analysis of publicly available data from: (a) exogenous mutagen exposed human induced pluripotent stem-cells and (b) paired tumour-normal whole genome sequencing of clinical cancers from the International Cancer Genome Sequencing Consortium (ICGC). Excel format file containing three sheets.

Supplementary Table 3

Table of key resources and software. Summary of reagents, resources and software used, including where applicable unique identifiers, references and version numbers. Excel format file containing two sheets.

Aitken, S.J., Anderson, C.J., Connor, F. et al. Pervasive lesion segregation shapes cancer genome evolution. Nature 583, 265–270 (2020). https://doi.org/10.1038/s41586-020-2435-1

Received: 06 December 2019
Accepted: 07 May 2020
Published: 24 June 2020
Issue Date: 09 July 2020
DOI: https://doi.org/10.1038/s41586-020-2435-1
