Pervasive lesion segregation shapes cancer genome evolution


Cancers arise through the acquisition of oncogenic mutations and grow by clonal expansion (refs. 1, 2). Here we reveal that most mutagenic DNA lesions are not resolved into a mutated DNA base pair within a single cell cycle. Instead, DNA lesions segregate, unrepaired, into daughter cells for multiple cell generations, resulting in the chromosome-scale phasing of subsequent mutations. We characterize this process in mutagen-induced mouse liver tumours and show that DNA replication across persisting lesions can produce multiple alternative alleles in successive cell divisions, thereby generating both multiallelic and combinatorial genetic diversity. The phasing of lesions enables accurate measurement of strand-biased repair processes, quantification of oncogenic selection and fine mapping of sister-chromatid-exchange events. Finally, we demonstrate that lesion segregation is a unifying property of exogenous mutagens, including UV light and chemotherapy agents in human cells and tumours, which has profound implications for the evolution and adaptation of cancer genomes.


Fig. 1: DEN-initiated tumours have a high burden of T→N and A→N mutations and driver changes in the EGFR–RAS–RAF pathway.
Fig. 2: Chromosome-scale and strand asymmetric segregation of DNA lesions.
Fig. 3: Identification of the lesion-containing DNA strand enables TCR to be quantified with strand specificity.
Fig. 4: Lesion segregation generates multiallelic and combinatorial genetic diversity.
Fig. 5: Lesion segregation is a pervasive feature of exogenous mutagens and is evident in human cancers.

Data availability

The WGS FASTQ files are available from the European Nucleotide Archive (ENA) under accession number PRJEB37808. RNA-seq files are available from ArrayExpress under experiment number E-MTAB-8518. Digitised histology images are available from BioStudies under accession S-BSST383.

Code availability

The analysis pipeline including Conda and Snakemake configuration files can be obtained without restriction from the repository


References

  1. Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).
  2. Turajlic, S., Sottoriva, A., Graham, T. & Swanton, C. Resolving genetic heterogeneity in cancer. Nat. Rev. Genet. 20, 404–416 (2019).
  3. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
  4. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
  5. Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836 (2019).
  6. Connor, F. et al. Mutational landscape of a chemically-induced mouse model of liver cancer. J. Hepatol. 69, 840–850 (2018).
  7. Maronpot, R. R. Biological basis of differential susceptibility to hepatocarcinogenesis among mouse strains. J. Toxicol. Pathol. 22, 11–33 (2009).
  8. Wang, C. et al. Whole-genome sequencing reveals genomic signatures associated with the inflammatory microenvironments in Chinese NSCLC patients. Nat. Commun. 9, 2054 (2018).
  9. Hayward, N. K. et al. Whole-genome landscapes of major melanoma subtypes. Nature 545, 175–180 (2017).
  10. Verna, L., Whysner, J. & Williams, G. M. N-nitrosodiethylamine mechanistic data and risk assessment: bioactivation, DNA-adduct formation, mutagenicity, and tumor initiation. Pharmacol. Ther. 71, 57–81 (1996).
  11. Maronpot, R. R., Fox, T., Malarkey, D. E. & Goldsworthy, T. L. Mutations in the ras proto-oncogene: clues to etiology and molecular pathogenesis of mouse liver tumors. Toxicology 101, 125–156 (1995).
  12. Buchmann, A., Karcier, Z., Schmid, B., Strathmann, J. & Schwarz, M. Differential selection for B-raf and Ha-ras mutated liver tumors in mice with high and low susceptibility to hepatocarcinogenesis. Mutat. Res. 638, 66–74 (2008).
  13. Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 164, 538–549 (2016).
  14. Roberts, S. A. et al. Clustered mutations in yeast and in human cancers can arise from damaged long single-strand DNA regions. Mol. Cell 46, 424–435 (2012).
  15. Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. Cell 176, 1282–1294 (2019).
  16. Tomkova, M., Tomek, J., Kriaucionis, S. & Schuster-Böckler, B. Mutational signature distribution varies with DNA replication timing and strand asymmetry. Genome Biol. 19, 129 (2018).
  17. Perry, P. & Evans, H. J. Cytological detection of mutagen-carcinogen exposure by sister chromatid exchange. Nature 258, 121–125 (1975).
  18. Guirouilh-Barbat, J., Lambert, S., Bertrand, P. & Lopez, B. S. Is homologous recombination really an error-free process? Front. Genet. 5, 175 (2014).
  19. Strick, T. R. & Portman, J. R. Transcription-coupled repair: from cells to single molecules and back again. J. Mol. Biol. 431, 4093–4102 (2019).
  20. Hu, J., Adar, S., Selby, C. P., Lieb, J. D. & Sancar, A. Genome-wide analysis of human global and transcription-coupled excision repair of UV damage at single-nucleotide resolution. Genes Dev. 29, 948–960 (2015).
  21. Supek, F. & Lehner, B. Clustered mutation signatures reveal that error-prone DNA repair targets mutations to active genes. Cell 170, 534–547 (2017).
  22. Seila, A. C. et al. Divergent transcription from active promoters. Science 322, 1849–1851 (2008).
  23. Preker, P. et al. PROMoter uPstream transcripts share characteristics with mRNAs and are produced upstream of all three major types of mammalian promoters. Nucleic Acids Res. 39, 7179–7193 (2011).
  24. Kuipers, J., Jahn, K., Raphael, B. J. & Beerenwinkel, N. Single-cell sequencing data reveal widespread recurrence and loss of mutational hits in the life histories of tumors. Genome Res. 27, 1885–1894 (2017).
  25. Brody, Y. et al. Quantification of somatic mutation flow across individual cell division events by lineage sequencing. Genome Res. 28, 1901–1918 (2018).
  26. Zou, X. et al. Validating the concept of mutational signatures with isogenic cell models. Nat. Commun. 9, 1744 (2018).
  27. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
  28. Parkhomchuk, D., Amstislavskiy, V., Soldatov, A. & Ogryzko, V. Use of high throughput sequencing to observe genome dynamics at a single cell level. Proc. Natl Acad. Sci. USA 106, 20830–20835 (2009).
  29. Chan, K. & Gordenin, D. A. Clusters of multiple mutations: incidence and molecular mechanisms. Annu. Rev. Genet. 49, 243–267 (2015).
  30. Schwartz, R. & Schäffer, A. A. The evolution of tumour phylogenetics: principles and practice. Nat. Rev. Genet. 18, 213–229 (2017).
  31. Kimura, M. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics 61, 893–903 (1969).
  32. Zhang, Y. et al. Genetic load and potential mutational meltdown in cancer cell populations. Mol. Biol. Evol. 36, 541–552 (2019).
  33. Hill, W. G. & Robertson, A. The effect of linkage on limits to artificial selection. Genet. Res. 8, 269–294 (1966).
  34. Tilk, S., Curtis, C., Petrov, D. & McFarland, C. D. Most cancers carry a substantial deleterious load due to Hill-Robertson interference. Preprint at bioRxiv (2019).
  35. Thoolen, B. et al. Proliferative and nonproliferative lesions of the rat and mouse hepatobiliary system. Toxicol. Pathol. 38 (Suppl), 5S–81S (2010).
  36. Lilue, J. et al. Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nat. Genet. 50, 1574–1583 (2018).
  37. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
  38. Picard Tools (Broad Institute, 2019).
  39. Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).
  40. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
  41. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2013).
  42. Talevich, E., Shain, A. H., Botton, T. & Bastian, B. C. CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLOS Comput. Biol. 12, e1004873 (2016).
  43. Bray, N., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal RNA-seq quantification with kallisto. Nat. Biotechnol. 34, 525–527 (2016).
  44. Cunningham, F. et al. Ensembl 2019. Nucleic Acids Res. 47 (D1), D745–D751 (2019).
  45. Vietri Rudan, M. et al. Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture. Cell Rep. 10, 1297–1309 (2015).
  46. Church, D. M. et al. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol. 7, e1000112 (2009).
  47. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
  48. Armstrong, J. et al. Progressive alignment with Cactus: a multiple-genome aligner for the thousand-genome era. Preprint at bioRxiv (2019).
  49. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
  50. Gori, K. & Baez-Ortega, A. sigfit: flexible Bayesian inference of mutational signatures. Preprint at bioRxiv (2018).
  51. Mularoni, L., Sabarinathan, R., Deu-Pons, J., Gonzalez-Perez, A. & López-Bigas, N. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 17, 128 (2016).
  52. Arnedo-Pac, C., Mularoni, L., Muiños, F., Gonzalez-Perez, A. & Lopez-Bigas, N. OncodriveCLUSTL: a sequence-based clustering method to identify cancer drivers. Bioinformatics 35, 5396 (2019).
  53. Killick, R. & Eckley, I. A. changepoint: an R package for changepoint analysis. J. Stat. Softw. 58, 1–19 (2014).
  54. Akeson, E. C. et al. Chromosomal inversion discovered in C3H/HeJ mice. Genomics 87, 311–313 (2006).
  55. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
  56. International Cancer Genome Consortium. International network of cancer genome projects. Nature 464, 993–998 (2010).
  57. Blokzijl, F., Janssen, R., van Boxtel, R. & Cuppen, E. MutationalPatterns: comprehensive genome-wide analysis of mutational processes. Genome Med. 10, 33 (2018).
  58. Caeiro, F. & Mateus, A. randtests: testing randomness in R. (2014).
  59. Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).
  60. Singer, B. In vivo formation and persistence of modified nucleosides resulting from alkylating agents. Environ. Health Perspect. 62, 41–48 (1985).


Acknowledgements

We thank M. Roller and F. Markowetz for supervision; L. Mularoni and G. Ritchie for software support; the CRUK Cambridge Institute Core facilities for their valuable contribution: CRUK Biological Resources (A. Mowbray), Preclinical Genome Editing (L. Young, S. Kupczak, M. Cronshaw, P. Mackin, Y. Cheng and L. Hughes-Hallett), Genomics (J. Hadfield and F. Bowater), Bioinformatics (G. Brown, M. Eldridge and R. Bowers), Histopathology and ISH (L.-A. McDuffus, C. Brodie and J. Arnold), and Research Instrumentation (J. Gray); Edinburgh Genomics (Clinical) Facility; the EMBL-EBI technical services cluster (Z. Mears, A. Cristofori, T. Nowak, S. Nanuwa, V. Tabak and A. Checcucci); and W. Bickmore and C. Ponting for comments on the manuscript. This work was supported by: Cancer Research UK (20412 and 22398), the European Research Council (615584, 682398), the Wellcome Trust (WT108749/Z/15/Z, WT106563/Z/14/A and WT202878/B/16/Z), the European Molecular Biology Laboratory, the MRC Human Genetics Unit core funding programme grants (MC_UU_00007/11 and MC_UU_00007/16) and the ERDF/Spanish Ministry of Science, Innovation and Universities-Spanish State Research Agency/DamReMap Project (RTI2018-094095-B-I00). S.J.A. received a Wellcome Trust PhD Training Fellowship for Clinicians (WT106563/Z/14/Z) and is now funded by a National Institute for Health Research (NIHR) Clinical Lectureship. O.P. is funded by a BIST PhD fellowship supported by the Secretariat for Universities and Research of the Ministry of Business and Knowledge of the Government of Catalonia and the Barcelona Institute of Science and Technology. V.S. is supported by an EMBL Interdisciplinary Postdoc (EIPOD) fellowship under Marie Skłodowska Curie actions COFUND (664726). E.K. is supported by the EMBL International PhD Programme. C.A-P. is supported by La Caixa Foundation fellowship (ID 100010434; LCF/BQ/ES18/11670011). S.V.B. is supported by ERC Starter Grant 759967. A.E. is supported by a UKRI Innovation Fellowship (MR/RO26017/1). A.K. is a cross-disciplinary postdoctoral fellow supported by funding from the University of Edinburgh and Medical Research Council (core grant to the MRC Institute of Genetics and Molecular Medicine). I.S. is supported by an FPI fellowship from Spanish Ministry of Economy and Competitiveness (SAF2015-66084-R). IRB Barcelona is a recipient of a Severo Ochoa Centre of Excellence Award from Spanish Ministry of Science, Innovation and Universities (MICINN, Government of Spain) and is supported by CERCA (Generalitat de Catalunya).

Author information





Contributions

S.J.A., F.C., C.F. and D.T.O. conceived the project and designed the experiments. S.J.A., F.C. and C.F. performed the mutagenesis and sequencing experiments. E.L.-A and A.M.R. performed supporting experiments. J.S.-L. provided contract sequencing. S.J.A. performed the histopathological analyses with S.E.D. providing advice. C.J.A. and M.S.T. designed and implemented computational analysis. M.S.T. discovered lesion segregation. O.P., V.S., T.F.R., M.L., S.A., E.K. and J.L. performed supporting computational analysis. C.A-P., S.V.B., R.M.D., A.E., V.B.K., A.K., I.S. and L.T. contributed to the computational analyses. T.F.R., M.L., S.A. and A.D.Y. curated data. S.J.A., C.A.S., N.L.-B., P.F., D.T.O. and M.S.T. supervised the work. S.J.A., C.A.S., N.L.-B., P.F., D.T.O. and M.S.T. lead the Liver Cancer Evolution Consortium. S.J.A. and P.F. provided scientific and administrative organisation. S.J.A., C.A.S., N.L.-B., P.F., D.T.O. and M.S.T. funded the work. S.J.A., D.T.O. and M.S.T. wrote the manuscript. All authors had the opportunity to edit the manuscript. All authors approved the final manuscript.

Corresponding authors

Correspondence to Duncan T. Odom or Martin S. Taylor.

Ethics declarations

Competing interests

P.F. is a member of the Scientific Advisory Boards of Fabric Genomics, Inc. and Eagle Genomics, Ltd. The other authors declare no competing interests.

Additional information

Peer review information Nature thanks Trevor Graham and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Summary mutation metrics for C3H and CAST tumours.

a, Single nucleotide substitution rates per C3H tumour, rank-ordered along the x-axis (grey points; median, blue line). Insertion/deletion (indel, <11 nt) rates are shown in black. b, Y-axis from a, expanded to show the distribution of indel rates with tumour order preserved. c, Number of C3H copy number variant (CNV) segments and their total span as a percentage of the haploid genome. Blue shading shows the intensity of overlapping points as a percentage of all tumours in the plot. d–f, Corresponding plots for CAST-derived tumours; f, two extreme x-axis outliers relocated (red) and x-axis value shown. g, h, Mutation spectra deconvolved from the aggregate spectra of 371 C3H tumours, subsequently referred to as the DEN1 and DEN2 signatures. DEN1 is dominated by T→N or their complement A→N changes, thought to arise from the O4-ethyl-deoxythymidine adduct of T (ref. 10). DEN2 substitutions are primarily C→T or their complement G→A changes, probably from O6-ethyl-2-deoxyguanosine lesions of G (ref. 10). i, Oncoplot summarizing mutation load, mutation signature composition and driver gene mutation complement of C3H tumours. j, Oncoplot of CAST-derived tumours as in i. The DEN2 signature is a minor component of most tumours but prominent in a minority (i, j).

Extended Data Fig. 2 Mutational asymmetry across 50% of the autosomal genome and 100% of the haploid X chromosomes.

a, b, Typically 50% of the autosomal genomic span (percent of nucleotides) in tumours is contained in segments with either Watson or Crick strand mutational asymmetry. a, C3H tumours, n = 371. b, CAST tumours, n = 84. c, d, Typically 100% of the haploid X chromosome shows Watson or Crick strand mutational asymmetry. c, C3H tumours (n = 371). d, CAST tumours (n = 84).

Extended Data Fig. 3 The frequency of SCEs correlates with mutation rate, and localizing reference genome assembly errors.

a, The relationship between single nucleotide substitution mutation load and detected SCE events in C3H tumours. DEN is known to produce ethyl adducts on the sugar-phosphate backbone of DNA as well as mutation-inducing modifications to the bases (ref. 10), which could lead to strand breaks (ref. 60) that trigger SCE. The frequent observation of SCE and the correlation between rates of SCE and point mutation support this view. Counts of SCE (y-axis) are based on down-sampling to 10,000 informative mutations per tumour to ensure equal power to detect SCE in each tumour. Tumours with <50% cellularity (pink) have high mutation load and form a sub-group with few detected SCE events; these are suspected to be polyclonal tumours and were excluded from the Pearson’s correlation reported (n = 335 independent tumour samples, implemented in a two-sided test, significance from Fisher’s transform). b, As for a, but showing CAST-derived tumours (n = 84, after cellularity exclusions n = 77). c, Evaluation of the relationship between mutation load and ability to detect SCE events. Mutations from C3H tumour 94315_N8 (shown in Fig. 2) were randomly down-sampled and segmentation analysis was applied. The y-axis shows the percentage of SCE events detected (100 replicates, mean red, 95% C.I. pink). The x-axis is on a log-scale: 95% of C3H and >95% of CAST tumours have mutation counts to the right of the blue vertical line. Down-sampling other tumours gave comparable results. d, The same down-sampling data as shown in c but the y-axis shows the percentage of mutations with the correct (same as full data) mutational asymmetry assignment (mean red, 95% C.I. pink). e, Candidate C3H reference genome assembly errors. Genome coordinates are shown on the x-axis. Immediate switches between Watson and Crick asymmetry are not expected on autosomes unless both copies of the chromosome have an SCE event at equivalent sites. However, inversions and translocations between the sequenced genomes and the reference assembly are expected to produce immediate asymmetry switches. The discordant segment coverage count (black y-axis) shows the number of informative tumours (those with either Watson or Crick strand asymmetry at the corresponding genome position) that suggest a tumour genome to reference genome discrepancy. Consensus support (brown y-axis) plotted as triangles shows the percentage of informative tumours that support a genomic discrepancy at the indicated position (only shown for values >50% support). The two sites on chromosome 6 in C3H correspond to a previously identified C3H strain-specific inversion that is known to be incorrectly oriented in the C3H reference assembly (ref. 54). f, Candidate CAST reference genome assembly errors, plotted as per e. The candidate misassembly on chromosome 14 in both strains occurs at an approximately orthologous position, suggesting a rearrangement shared between strains or a misassembly in the BL6 GRCm38 reference assembly against which other mouse reference genome assemblies have been scaffolded.
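The caption for panel a reports a two-sided Pearson's correlation with significance from Fisher's transform. The following is a minimal sketch of that kind of calculation, not the authors' pipeline; the function names `pearson_r` and `fisher_p_value` and the toy inputs are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_p_value(r, n):
    """Two-sided P value for r from n samples via Fisher's z transform.

    atanh(r) is approximately normal with standard error 1/sqrt(n - 3)
    under the null hypothesis of no correlation (requires n > 3, |r| < 1).
    """
    if n <= 3:
        raise ValueError("Fisher's transform needs more than 3 samples")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Toy data standing in for mutation load (x) and SCE count (y) per tumour.
load = [1, 2, 3, 4, 5, 6]
sce = [2, 1, 4, 3, 6, 5]
r = pearson_r(load, sce)
p = fisher_p_value(r, len(load))
```

With real data, `load` and `sce` would hold one value per tumour after the cellularity exclusions described in the caption.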

Extended Data Fig. 4 Locally elevated mutation load is driven by SCE.

a, Double strand breaks (DSBs) and other DNA damage can trigger homologous-recombination-mediated DNA repair between sister chromatids. The repair intermediate resolves into separate chromatids through cleavage and ligation; grey triangles denote cleavage sites for one of the possible resolutions that would result in a large-scale SCE event. Although illustrated for double-ended DNA breaks, single ended breaks from collapsed replication forks can be repaired through homologous recombination and could similarly lead to the formation of repair intermediate structures that can be resolved as SCEs. b, Enrichment analysis of SCE sites (red) compared with null expectations from randomly permuting locations into the analysable fraction of the genome (grey distributions), the black boxes denote 95% of 1,000 permutations. SCE events are enriched in later replicating and transcriptionally less active genomic regions (Hi-C defined compartment B), and correspondingly depleted from early replicating active regions. c, Aggregating across n = 9,645 SCE sites, the observed mutation rate approximately doubles at the inferred site of exchange (x = 0). Aggregate mutation rates (brown) were calculated in consecutive 5-kb windows. Compositionally matched null expectation was generated by permuting each exchange site into 100 proxy tumours and calculating median (black) and 95% confidence intervals (grey) while preserving the total number of projected sites per proxy tumour. d, The elevated mutation count is not the result of a high mutation density in a subset of exchange sites, rather it is a subtle increase in mutations across most exchange sites. Heatmap showing mutation counts calculated in consecutive 5-kb windows across each exchange site. Rows represent each exchange site, rank-ordered by total mutation count across each 400-kb interval. e, The distribution of positional uncertainty in exchange site location approximately mirrors the decay profile of elevated mutation frequency. 
f, Divergence of mutation rate spectra is shown as cosine distance between the analysed window and the genome wide mutation rate spectrum aggregated over all C3H tumours. Despite the elevated mutation frequency, there is no detected distortion of the mutation spectrum. g, A model based on homologous recombination repair intermediate, branch migration that produces heteroduplex segments of (i) mismatch:mismatch (circles) and (ii) lesion:lesion (red triangles) strands. Subsequent strand segregation would increase the mutational diversity of a descendant cell population but not the mutation count per cell (key as per Fig. 2).
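Panel f compares mutation spectra by cosine distance. A minimal sketch of that measure follows, assuming each spectrum is given as an equal-length list of non-negative rates or counts per mutation category; the function name `cosine_distance` and the example vectors are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two spectra."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of categories")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("spectrum with no mutations has no direction")
    return 1.0 - dot / (norm_a * norm_b)

# A window whose spectrum is a scaled copy of the genome-wide spectrum
# has distance ~0: more mutations, but the same relative composition.
genome_wide = [1.0, 2.0, 3.0]
window = [2.0, 4.0, 6.0]
d = cosine_distance(window, genome_wide)
```

Because the measure ignores overall scale, an elevated mutation count at exchange sites does not by itself change the distance, which is why it suits the comparison described in the caption.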

Extended Data Fig. 5 Replication of TCR with lesion strand resolution in Mus musculus castaneus.

a, TCR of template strand lesions is dependent on transcription level (P15 liver, median TPM). Mutation rate estimates (circles) are the aggregate rates for expression level binned genes across CAST tumours (n = 84). Expression level bin 0 contains n = 2,645 genes, all subsequent bins contain n = 4,323 genes. See Methods for per-gene, per-tumour inclusion criteria. Empiric confidence intervals (99%) were calculated through bootstrap sampling (n = 100 replicates) of genes within the expression level bin. b, Comparison of mutation rates for the 64 trinucleotide contexts: each context has a high and a low expression point linked by a line. c, Sequence composition normalized profiles of mutation rate around TSS loci. d, Stratifying the data plotted in c by lesion strand reveals greater detail on the observed mutation patterns, including the pronounced influence of bidirectional transcription initiation.
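Panel a describes empirical 99% confidence intervals from bootstrap sampling (100 replicates) of genes within an expression-level bin. The sketch below shows one common way to do this, a percentile bootstrap of an aggregate rate. The input layout (per-gene mutation count and analysable base count), the function name `bootstrap_rate_ci` and the nearest-rank percentile rule are assumptions, not details taken from the paper.

```python
import random

def bootstrap_rate_ci(genes, n_replicates=100, level=0.99, seed=0):
    """Percentile bootstrap interval for an aggregate mutation rate.

    genes: list of (mutation_count, analysable_bases) tuples for the
    genes in one expression bin (hypothetical input layout).
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_replicates):
        # Resample genes with replacement, keeping the bin size fixed.
        sample = [rng.choice(genes) for _ in genes]
        mutations = sum(m for m, _ in sample)
        bases = sum(b for _, b in sample)
        rates.append(mutations / bases)
    rates.sort()
    tail = (1 - level) / 2
    lower = rates[int(tail * n_replicates)]
    upper = rates[min(n_replicates - 1, int((1 - tail) * n_replicates))]
    return lower, upper

# Toy bin of 20 genes, each with 1,000 analysable bases.
toy_bin = [(count, 1000) for count in range(1, 21)]
lower, upper = bootstrap_rate_ci(toy_bin)
```

Note that with only 100 replicates a 99% percentile interval is close to the minimum and maximum of the replicate rates.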

Extended Data Fig. 6 Variant allele frequency distributions demonstrate high rates of non-mutagenic replication over segregating lesions.

a–f, VAF distributions shown as probability density functions (total area under curve = 1) for six example tumours, calculated taking into account observed multiallelic variation. The VAF for identified driver mutations is indicated (brown triangle). Tumour identifiers are shown top right along with the percentage of genomic segments (based on mutation asymmetry segmentation) that are multiallelic. Skew shows Pearson’s median skewness coefficient for the VAF distributions. a–c, Tumours with no multiallelic segments exhibit a symmetric VAF distribution showing minimal sub-clonal structure. d–f, Tumours with all segments multiallelic, illustrating the sub-clonal structure generated by segregating lesions. g, Tumours with a high proportion of multiallelic segments have a left-skewed VAF distribution indicating frequent non-mutagenic replication over segregating lesions. Percentage of genome segments that are multiallelic (x-axis) plotted against VAF distribution skew for 371 C3H tumours. Tumours with low estimated cellularity are indicated in pink and were excluded from correlation analysis (n = 335 independent tumour samples in Pearson’s correlation, two-sided significance from Fisher’s transform). h, As for g, but showing 84 CAST tumours (n = 77 independent tumours included in Pearson’s correlation). i, Mutation asymmetry summary ribbon for example C3H tumour 90797_N2; C3H genome on the x-axis. The percentage of mutation sites with robust support for multiallelic variation (y-axis) calculated in 10-Mb windows (grey) and for each asymmetric segment (black). Thresholds for high (black), intermediate (grey) and zero (red) rates of multiallelic sites are shown on the right axis. j, VAF density plots for the example tumour 90797_N2 (shown in i) mutations in asymmetry segments stratified by the multiallelic rate thresholds defined in i. As with individual tumour-based analysis (a–h), high multiallelic rates correspond to a leftward skew of the VAF (black, grey) whereas segments without multiallelic variation (red) show a minimally skewed distribution.
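The skew reported in these panels is Pearson's median skewness coefficient, which is 3 × (mean − median) / standard deviation. A minimal sketch follows; the use of the sample standard deviation and the function name `pearson_median_skewness` are assumptions, and the VAF values are invented for illustration.

```python
import statistics

def pearson_median_skewness(values):
    """Pearson's median (second) skewness coefficient.

    Negative values mean the mean lies below the median, that is, a
    longer tail towards low values (a left-skewed distribution).
    """
    sd = statistics.stdev(values)  # sample standard deviation (assumed)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (statistics.mean(values) - statistics.median(values)) / sd

# Invented VAFs: a symmetric set and one with a tail of low-VAF variants.
symmetric_vaf = [0.1, 0.2, 0.3, 0.4, 0.5]
left_tailed_vaf = [0.1, 0.4, 0.45, 0.5, 0.5]
skew_symmetric = pearson_median_skewness(symmetric_vaf)
skew_left = pearson_median_skewness(left_tailed_vaf)
```

In this toy case the symmetric set gives a coefficient of about zero and the set with extra low-VAF variants gives a negative coefficient, the direction of skew the caption associates with multiallelic segments.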

Extended Data Fig. 7 Examples of mutation patterns generated by lesion segregation from a diverse range of clinically relevant mutagens.

a–c, Genome-wide mutation asymmetry plots (shown as per Fig. 2a–c) for mutagen-exposed human iPSCs (ref. 5). Cells exposed to simulated solar radiation illustrate lesion segregation for ultraviolet damage (a). Immediately adjacent mutations (intermutation distance 10^0) indicate CC→TT dinucleotide changes. Despite a low total mutation load (1,308 nucleotide substitutions, 842 informative T→A changes), the mutational asymmetry of lesion segregation is evident for the aristolochic acid-exposed clone (ref. 5) (b) and the polycyclic aromatic hydrocarbon DBADE (c) that is found in tobacco smoke. d, Summary mutation asymmetry ribbons (as per Fig. 2d) for all mutagen-exposed clones with rl20 > 5, which illustrates the independence of asymmetry pattern between replicate clones, almost universal asymmetry on chromosome X, and approximately 50% of the autosomal genome with asymmetry over autosomal chromosomes. The dominant mutation type is indicated for each mutagen. In those clones with low mutation rates, some sister exchange sites are likely to have been missed, leading to reduced asymmetry signal (for example, on the X chromosome). Segments with <20 informative mutations are shown in white.

Extended Data Fig. 8 Lesion segregation is evident for multiple DNA damaging agents but not for damage independent mutational processes.

a, DEN-induced C3H tumour genomes (n = 371) typically show significant mutational asymmetry across their genome. Wald–Wolfowitz runs test (x-axis) P values calculated using a normal approximation (two-sided). Nominal P = 0.05 significance threshold indicated by dashed blue line, Bonferroni-corrected threshold shown as solid vertical blue line. P values <1 × 10^−15 are rank-ordered. The rl20 metric (Fig. 5a; Methods) is shown on the y-axis; the horizontal blue line gives the empirical significance threshold of rl20 > 5. b, Many human iPSCs grown from single cells after exogenous mutagen exposure (ref. 5) show significant mutation asymmetry (n = 148 WGS, mutagen-exposed cell lines). Statistical calculations and plotting as in a, with adjustment of Bonferroni correction. Diverse categories of mutagen, denoted by point colour (see Fig. 5b), show asymmetry indicative of lesion segregation. c, Cell lines with genetically perturbed genome replication and maintenance machinery (ref. 26) and similar mutation load to those in b do not show significant mutation asymmetry (n = 72 WGS, genetically perturbed cell lines). Statistical calculations and plotting as in a, with adjustment of Bonferroni correction.
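The asymmetry test named in this caption is a two-sided Wald–Wolfowitz runs test with a normal approximation. The sketch below illustrates the standard form of that test on a two-symbol sequence. Encoding each informative mutation, in genome order, as "T" or "A" according to the strand of the change is an assumption made for illustration, and `runs_test` is a hypothetical name rather than the authors' code (which, per the reference list, may have used the randtests R package).

```python
import math

def runs_test(seq):
    """Two-sided Wald-Wolfowitz runs test using the normal approximation.

    seq: ordered sequence containing exactly two distinct symbols.
    Returns (number_of_runs, z_score, p_value).
    """
    symbols = sorted(set(seq))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n1 = sum(1 for s in seq if s == symbols[0])
    n2 = len(seq) - n1
    n = n1 + n2
    # A new run starts wherever neighbouring symbols differ.
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if var <= 0:
        raise ValueError("sequence too short for the normal approximation")
    z = (runs - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))
    return runs, z, p

# Ten "T" calls followed by ten "A" calls: far fewer runs than expected
# by chance, as in a genome with long strand-asymmetric segments.
runs, z, p = runs_test("T" * 10 + "A" * 10)
```

Long blocks of same-strand mutations give fewer runs than expected and a negative z score; a Bonferroni-corrected threshold, as in the caption, would divide the nominal 0.05 by the number of genomes tested.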

Extended Data Table 1 A lesion-segregation-based test for oncogenic selection

Supplementary information

Reporting Summary

Supplementary Table 1

Table of tumours sequenced containing key parameters and mutation spectra signature matrices. For each sequenced tumour sample, summarising: sample identifiers, animal and parent identifiers, diagnosis and histopathological features, mutation signatures and counts, cellularity estimates, count of sister chromatid exchange events, mutation asymmetry metrics, and identified driver mutations. Excel format file containing three sheets.

Supplementary Table 2

Table of exogenous mutagen and ICGC scan results. Summary of mutation rate and mutational asymmetry analysis of publicly available data from: (a) exogenous mutagen exposed human induced pluripotent stem-cells and (b) paired tumour-normal whole genome sequencing of clinical cancers from the International Cancer Genome Sequencing Consortium (ICGC). Excel format file containing three sheets.

Supplementary Table 3

Table of key resources and software. Summary of reagents, resources and software used, including where applicable unique identifiers, references and version numbers. Excel format file containing two sheets.

Cite this article

Aitken, S.J., Anderson, C.J., Connor, F. et al. Pervasive lesion segregation shapes cancer genome evolution. Nature 583, 265–270 (2020).
