A systematic CRISPR screen defines mutational mechanisms underpinning signatures caused by replication errors and endogenous DNA damage

Abstract

Mutational signatures are imprints of pathophysiological processes arising through tumorigenesis. We generated isogenic CRISPR–Cas9 knockouts (∆) of 43 genes in human induced pluripotent stem cells, cultured them in the absence of added DNA damage and performed whole-genome sequencing of 173 subclones. ∆OGG1, ∆UNG, ∆EXO1, ∆RNF168, ∆MLH1, ∆MSH2, ∆MSH6, ∆PMS1 and ∆PMS2 produced marked mutational signatures indicative of them being critical mitigators of endogenous DNA modifications. Detailed analyses revealed mutational mechanistic insights, including how 8-oxo-2′-deoxyguanosine elimination is sequence context specific while uracil clearance is sequence context independent. Mismatch repair (MMR) deficiency signatures are engendered by oxidative damage (C > A transversions) and differential misincorporation by replicative polymerases (T > C and C > T transitions), and we propose a reverse template slippage model for T > A transversions. ∆MLH1, ∆MSH6 and ∆MSH2 signatures were similar to each other but distinct from ∆PMS2. Finally, we developed a classifier, MMRDetect, where application to 7,695 whole-genome-sequenced cancers showed enhanced detection of MMR-deficient tumors, with implications for responsiveness to immunotherapies.

Fig. 1: Mutational consequences of DNA replicative/repair pathway gene knockouts.
Fig. 2: Safeguarding the genome from oxidative damage and cytosine deamination.
Fig. 3: Multiple endogenous sources of DNA damage managed by MMR.
Fig. 4: Gene-specific features of signatures of MMRd are recapitulated in other model systems.
Fig. 5: Mutational signature-based MMRd classifier, MMRDetect.
Fig. 6: Impact of experimental validation of cancer-derived mutational signatures on biological understanding and the development of clinical applications.

Data availability

Raw sequence files are deposited at the European Genome-Phenome Archive with accession numbers EGAS00001000800 and EGAS00001000874. Mutation calls have been deposited at Mendeley: https://doi.org/10.17632/ymn3ykkmyx. hiPSCs can be obtained directly from the authors. The curated data are available for general browsing from our reference mutational signatures website, Signal (https://signal.mutationalsignatures.com). Age information relating to human patient samples is not publicly available as this could compromise privacy and lead to identification of the individuals. Publicly available genomic datasets reanalyzed here to compare the performance of MMRDetect and MSIseq are available from the European Genome-Phenome Archive (EGAS0001001178; ref. 72), http://dcc.icgc.org/pcawg/ (ref. 73), https://data.mendeley.com/datasets/2mn4ctdpxp/1 (ref. 74), https://resources.hartwigmedicalfoundation.nl/ (ref. 75) and the Genomics England Research Environment (main program version 8) via https://re.extge.co.uk/ovd/. Source data are provided with this paper. All other data supporting the findings of this study are available from the corresponding author upon reasonable request.

Code availability

The R code used to generate results presented in Figs. 1–5 and the R source code of MMRDetect can be obtained from https://github.com/Nik-Zainal-Group/COMSIG_KO.git.

References

  1. Helleday, T., Eshtad, S. & Nik-Zainal, S. Mechanisms underlying mutational signatures in human cancers. Nat. Rev. Genet. 15, 585–598 (2014).

  2. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

  3. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).

  4. Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).

  5. Haradhvala, N. J. et al. Distinct mutational signatures characterize concurrent loss of polymerase proofreading and mismatch repair. Nat. Commun. 9, 1746 (2018).

  6. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).

  7. Kim, J. et al. Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nat. Genet. 48, 600–606 (2016).

  8. Nik-Zainal, S. et al. The genome as a record of environmental exposure. Mutagenesis 30, 763–770 (2015).

  9. Zou, X. et al. Validating the concept of mutational signatures with isogenic cell models. Nat. Commun. 9, 1744 (2018).

  10. Christensen, S. et al. 5-Fluorouracil treatment induces characteristic T>G mutations in human cancer. Nat. Commun. 10, 4571 (2019).

  11. Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836.e16 (2019).

  12. Lindahl, T. & Nyberg, B. Rate of depurination of native deoxyribonucleic acid. Biochemistry 11, 3610–3618 (1972).

  13. Mardis, E. R. The impact of next-generation sequencing on cancer genomics: from discovery to clinic. Cold Spring Harb. Perspect. Med. 9, a036269 (2019).

  14. Berger, M. F. & Mardis, E. R. The emerging clinical relevance of genomics in cancer medicine. Nat. Rev. Clin. Oncol. 15, 353–365 (2018).

  15. David, S. S., O’Shea, V. L. & Kundu, S. Base-excision repair of oxidative DNA damage. Nature 447, 941–950 (2007).

  16. Kunkel, T. A. & Erie, D. A. DNA mismatch repair. Annu. Rev. Biochem. 74, 681–710 (2005).

  17. Kottemann, M. C. & Smogorzewska, A. Fanconi anaemia and the repair of Watson and Crick DNA crosslinks. Nature 493, 356–363 (2013).

  18. Wood, R. D., Mitchell, M., Sgouros, J. & Lindahl, T. Human DNA repair genes. Science 291, 1284–1289 (2001).

  19. Ceccaldi, R., Rondinelli, B. & D’Andrea, A. D. Repair pathway choices and consequences at the double-strand break. Trends Cell Biol. 26, 52–64 (2016).

  20. Hanawalt, P. C. & Spivak, G. Transcription-coupled DNA repair: two decades of progress and surprises. Nat. Rev. Mol. Cell Biol. 9, 958–970 (2008).

  21. Abid, A., Zhang, M. J., Bagaria, V. K. & Zou, J. Exploring patterns enriched in a dataset with contrastive principal component analysis. Nat. Commun. 9, 2134 (2018).

  22. Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  23. Evans, M. D., Dizdaroglu, M. & Cooke, M. S. Oxidative DNA damage and disease: induction, repair and significance. Mutat. Res. 567, 1–61 (2004).

  24. Degasperi, A. et al. A practical framework and online tool for mutational signature analyses show intertissue variation and driver dependencies. Nat. Cancer 1, 249–263 (2020).

  25. Pilati, C. et al. Mutational signature analysis identifies MUTYH deficiency in colorectal cancers and adrenocortical carcinomas. J. Pathol. 242, 10–15 (2017).

  26. Radicella, J. P., Dherin, C., Desmaze, C., Fox, M. S. & Boiteux, S. Cloning and characterization of hOGG1, a human homolog of the OGG1 gene of Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA 94, 8010–8015 (1997).

  27. Bruner, S. D., Norman, D. P. G. & Verdine, G. L. Structural basis for recognition and repair of the endogenous mutagen 8-oxoguanine in DNA. Nature 403, 859–866 (2000).

  28. Lee, Y. A., Durandin, A., Dedon, P. C., Geacintov, N. E. & Shafirovich, V. Oxidation of guanine in G, GG, and GGG sequence contexts by aromatic pyrenyl radical cations and carbonate radical anions: relationship between kinetics and distribution of alkali-labile lesions. J. Phys. Chem. B 112, 1834–1844 (2008).

  29. Sugiyama, H. & Saito, I. Theoretical studies of GG-specific photocleavage of DNA via electron transfer: significant lowering of ionization potential and 5′-localization of HOMO of stacked GG bases in B-form DNA. J. Am. Chem. Soc. 118, 7063–7068 (1996).

  30. Allgayer, J., Kitsera, N., von der Lippen, C., Epe, B. & Khobta, A. Modulation of base excision repair of 8-oxoguanine by the nucleotide sequence. Nucleic Acids Res. 41, 8559–8571 (2013).

  31. Banerjee, A., Yang, W., Karplus, M. & Verdine, G. L. Structure of a repair enzyme interrogating undamaged DNA elucidates recognition of damaged DNA. Nature 434, 612–618 (2005).

  32. Banerjee, A. & Verdine, G. L. A nucleobase lesion remodels the interaction of its normal neighbor in a DNA glycosylase complex. Proc. Natl Acad. Sci. USA 103, 15020–15025 (2006).

  33. Friedman, J. I. & Stivers, J. T. Detection of damaged DNA bases by DNA glycosylase enzymes. Biochemistry 49, 4957–4967 (2010).

  34. Lutsenko, E. & Bhagwat, A. S. Principal causes of hot spots for cytosine to thymine mutations at sites of cytosine methylation in growing cells. A model, its experimental support and implications. Mutat. Res. 437, 11–20 (1999).

  35. Shen, J. C., Rideout, W. M. 3rd & Jones, P. A. The rate of hydrolytic deamination of 5-methylcytosine in double-stranded DNA. Nucleic Acids Res. 22, 972–976 (1994).

  36. Waters, T. R. & Swann, P. F. Thymine-DNA glycosylase and G to A transition mutations at CpG sites. Mutat Res. 462, 137–147 (2000).

  37. Sanders, M. A. et al. MBD4 guards against methylation damage and germ line deficiency predisposes to clonal hematopoiesis and early-onset AML. Blood 132, 1526–1534 (2018).

  38. Barnes, D. E. & Lindahl, T. Repair and genetic consequences of endogenous DNA base damage in mammalian cells. Ann. Rev. Genet. 38, 445–476 (2004).

  39. Mol, C. D. et al. Crystal structure and mutational analysis of human uracil-DNA glycosylase: structural basis for specificity and catalysis. Cell 80, 869–878 (1995).

  40. Grolleman, J. E. et al. Mutational signature analysis reveals NTHL1 deficiency to cause a multi-tumor phenotype. Cancer Cell 35, 256–266.e5 (2019).

  41. Genschel, J. & Modrich, P. Mechanism of 5′-directed excision in human mismatch repair. Mol. Cell 12, 1077–1086 (2003).

  42. Bolderson, E. et al. Phosphorylation of Exo1 modulates homologous recombination repair of DNA double-strand breaks. Nucleic Acids Res. 38, 1821–1831 (2010).

  43. Mattiroli, F. et al. RNF168 ubiquitinates K13–15 on H2A/H2AX to drive DNA damage signaling. Cell 150, 1182–1195 (2012).

  44. Bohgaki, M. et al. RNF168 ubiquitylates 53BP1 and controls its response to DNA double-strand breaks. Proc. Natl Acad. Sci. USA 110, 20982 (2013).

  45. Doil, C. et al. RNF168 binds and amplifies ubiquitin conjugates on damaged chromosomes to allow accumulation of repair proteins. Cell 136, 435–446 (2009).

  46. Stewart, G. S. et al. The RIDDLE syndrome protein mediates a ubiquitin-dependent signaling cascade at sites of DNA damage. Cell 136, 420–434 (2009).

  47. Gupta, S., Gellert, M. & Yang, W. Mechanism of mismatch recognition revealed by human MutSβ bound to unpaired DNA loops. Nat. Struct. Mol. Biol. 19, 72–78 (2012).

  48. Palombo, F. et al. GTBP, a 160-kilodalton protein essential for mismatch-binding activity in human cells. Science 268, 1912–1914 (1995).

  49. Warren, J. J. et al. Structure of the human MutSα DNA lesion recognition complex. Mol. Cell 26, 579–592 (2007).

  50. Andrianova, M. A., Bazykin, G. A., Nikolaev, S. I. & Seplyarskiy, V. B. Human mismatch repair system balances mutation rates between strands by removing more mismatches from the lagging strand. Genome Res. 27, 1336–1343 (2017).

  51. Lujan, S. A. et al. Mismatch repair balances leading and lagging strand DNA replication fidelity. PLoS Genet. 8, e1003016 (2012).

  52. Morganella, S. et al. The topography of mutational processes in breast cancer genomes. Nat. Commun. 7, 11383 (2016).

  53. Aboul-ela, F., Koh, D., Tinoco, I. Jr. & Martin, F. H. Base–base mismatches. Thermodynamics of double helix formation for dCA3XA3G + dCT3YT3G (X, Y = A,C,G,T). Nucleic Acids Res. 13, 4811–4824 (1985).

  54. Mazurek, A., Berardini, M. & Fishel, R. Activation of human MutS homologs by 8-oxo-guanine DNA damage. J. Biol. Chem. 277, 8260–8266 (2002).

  55. Morikawa, M. et al. Analysis of guanine oxidation products in double-stranded DNA and proposed guanine oxidation pathways in single-stranded, double-stranded or quadruplex DNA. Biomolecules 4, 140–159 (2014).

  56. Pavlov, Y. I., Newlon, C. S. & Kunkel, T. A. Yeast origins establish a strand bias for replicational mutagenesis. Mol. Cell 10, 207–213 (2002).

  57. Mudrak, S. V., Welz-Voegele, C. & Jinks-Robertson, S. The polymerase η translesion synthesis DNA polymerase acts independently of the mismatch repair system to limit mutagenesis caused by 7,8-dihydro-8-oxoguanine in yeast. Mol. Cell. Biol. 29, 5316–5326 (2009).

  58. Meier, B. et al. Mutational signatures of DNA mismatch repair deficiency in C. elegans and human cancers. Genome Res. 28, 666–675 (2018).

  59. Lang, G. I., Parsons, L. & Gammie, A. E. Mutation rates, spectra, and genome-wide distribution of spontaneous mutations in mismatch repair deficient yeast. G3 (Bethesda) 3, 1453–1465 (2013).

  60. Drummond, J. T., Li, G. M., Longley, M. J. & Modrich, P. Isolation of an hMSH2-p160 heterodimer that restores DNA mismatch repair to tumor cells. Science 268, 1909–1912 (1995).

  61. Palombo, F. et al. hMutSβ, a heterodimer of hMSH2 and hMSH3, binds to insertion/deletion loops in DNA. Curr. Biol. 6, 1181–1184 (1996).

  62. de Wind, N. et al. HNPCC-like cancer predisposition in mice through simultaneous loss of Msh3 and Msh6 mismatch-repair protein functions. Nat. Genet. 23, 359–362 (1999).

  63. Poulogiannis, G., Frayling, I. M. & Arends, M. J. DNA mismatch repair deficiency in sporadic colorectal cancer and Lynch syndrome. Histopathology 56, 167–179 (2010).

  64. Heinen, C. D. Mismatch repair defects and Lynch syndrome: the role of the basic scientist in the battle against cancer. DNA Repair 38, 127–134 (2016).

  65. Agu, C. A. et al. Successful generation of human induced pluripotent stem cell lines from blood samples held at room temperature for up to 48 hr. Stem Cell Rep. 5, 660–671 (2015).

  66. Ni Huang, M. et al. MSIseq: software for assessing microsatellite instability from catalogs of somatic mutations. Sci. Rep. 5, 13321 (2015).

  67. Niu, B. et al. MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30, 1015–1016 (2013).

  68. Wang, C. & Liang, C. MSIpred: a python package for tumor microsatellite instability classification from tumor mutation annotation data using a support vector machine. Sci. Rep. 8, 17546 (2018).

  69. Cortes-Ciriano, I., Lee, S., Park, W.-Y., Kim, T.-M. & Park, P. J. A molecular portrait of microsatellite instability across multiple cancers. Nat. Commun. 8, 15180 (2017).

  70. Salipante, S. J., Scroggins, S. M., Hampel, H. L., Turner, E. H. & Pritchard, C. C. Microsatellite instability detection by next generation sequencing. Clin. Chem. 60, 1192–1199 (2014).

  71. Hause, R. J., Pritchard, C. C., Shendure, J. & Salipante, S. J. Classification and characterization of microsatellite instability across 18 cancer types. Nat. Med. 22, 1342–1350 (2016).

  72. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).

  73. Campbell, P. J. et al. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).

  74. Staaf, J. et al. Whole-genome sequencing of triple-negative breast cancers in a population-based clinical study. Nat. Med. 25, 1526–1533 (2019).

  75. Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).

  76. Fujimoto, A. et al. Comprehensive analysis of indels in whole-genome microsatellite regions and microsatellite instability across 21 cancer types. Genome Res. 30, 334–346 (2020).

  77. Campbell, B. B. et al. Comprehensive analysis of hypermutation in human cancer. Cell 171, 1042–1056.e10 (2017).

  78. Bressan, R. B. et al. Efficient CRISPR/Cas9-assisted gene targeting enables rapid and precise genetic manipulation of mammalian neural stem cells. Development 144, 635–648 (2017).

  79. Tate, P. H. & Skarnes, W. C. Bi-allelic gene targeting in mouse embryonic stem cells. Methods 53, 331–338 (2011).

  80. Hodgkins, A. et al. WGE: a CRISPR database for genome engineering. Bioinformatics 31, 3078–3080 (2015).

  81. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).

  82. Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).

  83. Raine, K. M. et al. cgpPindel: identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinformatics 52, 15.7.1–15.7.12 (2015).

  84. Cradick, T. J., Qiu, P., Lee, C. M., Fine, E. J. & Bao, G. COSMID: a web-based tool for identifying and validating CRISPR/Cas off-target sites. Mol. Ther. Nucleic Acids 3, e214 (2014).

  85. The ENCODE Project Consortium et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  86. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  87. R Development Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2017).

  88. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2009).

Acknowledgements

We thank the Wellcome Sanger Institute Cellular Genetics and Phenotyping Facility for assistance and the CASM IT team, J. Foreman and G. Ping for assistance in carrying out and completing this project. We thank the COMSIG Consortium spearheaded by S. Jackson. In Cambridge, this work was funded by the Cancer Research UK (CRUK) Advanced Clinician Scientist Award (C60100/A23916), the Dr. Josef Steiner Cancer Research Award 2019, a Medical Research Council (MRC) Grant-in-Aid to the MRC Cancer Unit, the CRUK Pioneer Award, a Wellcome Strategic Award (WT101126) and Wellcome Sanger Institute faculty funding and supported by the National Institute for Health Research (NIHR) Cambridge Biomedical Research Centre (BRC-1215-20014) and UK Regenerative Medicine Platform (MR/R015724/1). The work of T.I.R. and J.S.C. was funded by a CRUK Centre grant (reference number C309/A25144). Support for the MMRDetect classifier was enabled by access to data and findings generated by the 100,000 Genomes Project. The 100,000 Genomes Project is managed by Genomics England (a wholly owned company of the Department of Health and Social Care), funded by the NIHR and NHS England. The Wellcome Trust, CRUK and the MRC have also funded research infrastructure. The 100,000 Genomes Project uses data provided by patients and collected by the National Health Service as part of their care and support. The views expressed are those of the author(s) and not necessarily those of the NIHR or Department of Health and Social Care. This publication and the underlying research were facilitated by data that were generated by the Hartwig Medical Foundation (HMF) and Center for Personalized Cancer Treatment (CPCT) in the Netherlands and the International Cancer Genome Consortium.

Author information

Contributions

S.N.-Z. and W.C.S. conceived of the study idea. S.N.-Z., X.Z., G.C.C.K., J.J. and W.C.S. wrote the paper. L.S., G.B., V.P.-A., D.R. and S.N.-Z. collected the clinical samples. G.C.C.K., K.U., T.I.R., C.A.A., W.B. and C.G. performed the laboratory work. X.Z., G.C.C.K., A.S.N., A.D., C.B., S.M., T.D.A., T.I.R., J.S.C. and S.N.-Z. performed data curation and formal analysis. R.H., W.B. and J.Y. performed administrative tasks.

Corresponding author

Correspondence to Serena Nik-Zainal.

Ethics declarations

Competing interests

S.N.-Z. holds patents on clinical algorithms of mutational signatures and, during completion of this project, served advisory roles for AstraZeneca, Artios Pharma and the Scottish Genomes Project.

Additional information

Peer review information Nature Cancer thanks Daniel Durocher and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Results of pilot study.

Three genes were selected for knockout (∆): MSH6, UNG and ATP2B4 (negative control). Two genotypes per gene were obtained and grown in culture to gauge the reproducibility of signatures between different genotypes of the same gene knockout. These lines were cultured under normoxic (20%) and hypoxic (3%) conditions for defined culture times of ~15, 30 or 45 days. Two single-cell subclones were derived from each parental line for whole-genome sequencing (equivalent to four subclones per gene edit). One of the UNG genotypes appeared to be heterozygous and was excluded from downstream analysis. a, Substitution burden for knockouts of ATP2B4, UNG and MSH6 under hypoxic and normoxic conditions and at different culture times. b, Cosine similarities between the mutational profile of each subclone and the background signature of culture. c, Indel burden for knockouts of ATP2B4, UNG and MSH6 under hypoxic and normoxic conditions and at different culture times. d, Cosine similarities between the mutational profile of each subclone and the background signature of culture. Overall, the differences between normoxic and hypoxic conditions were not marked, although normoxic conditions produced slightly more mutations. Time in culture made only a marginal, nonlinear difference to the mutation burden. Given the results of the pilot, we weighed the costs and risks of prolonged culture (risk of infection, risk of selection and a marked increase in the cost of experimental reagents) against the minimal return in mutation number. Because we also intended to minimize transitions between hypoxic and normoxic conditions while handling cell cultures, we opted to run the full-scale study under normoxic conditions with 15 days of culture.
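The cosine similarity used in panels b and d compares two mutation count vectors by direction and ignores overall burden. A minimal sketch follows, assuming each profile is a list of per-channel counts in the same channel order; the function name and the toy four-channel profiles are hypothetical and are not taken from the authors' R code.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two equal-length mutation count vectors."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(a * b for a, b in zip(profile_a, profile_b))
    norm_a = math.sqrt(sum(a * a for a in profile_a))
    norm_b = math.sqrt(sum(b * b for b in profile_b))
    if norm_a == 0 or norm_b == 0:
        # An empty profile has no direction; report no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy profiles with four channels (not real data).
background = [10, 5, 0, 1]
subclone = [20, 10, 0, 2]  # twice the burden, same shape
similarity = cosine_similarity(subclone, background)
```

Because the measure is scale-free, a subclone with twice the mutation burden but the same channel proportions scores close to 1, which is why burden (panels a and c) and similarity (panels b and d) are reported separately.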

Extended Data Fig. 2 Detecting mutational consequences of knockouts in the absence of added external DNA damage.

a,b, Schematic illustration of potential components of the background signature (a) and possible mutational consequences of DNA repair gene knockouts for proteins that are critical mitigators of mutagenesis (b). c–e, Mutation burden of whole-genome-sequenced subclones of gene knockouts: substitutions (c), indels (d) and double substitutions (e). Bars represent the mean. Individual data points are shown as orange dots. In all comparative analyses, all gene knockouts were cultured for 15 days and only daughter subclones that were fully clonal (that is, clearly derived from a single cell) were included. N = 2–4, which is the number of clonal knockout subclones cultured under normoxic conditions for 15 days (see Supplementary Table 2). f, 96-channel substitution mutation profiles of 173 gene knockout subclones.
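The 96 channels in panel f follow the widely used convention of six pyrimidine-referenced substitution classes, each split by the bases immediately 5′ and 3′ of the mutated base (6 × 16 = 96). The sketch below shows one way to assign a mutation to a channel; the label format and function names are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# All 96 channel labels, e.g. "A[C>T]G" (label format is an assumption).
ALL_CHANNELS = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product("ACGT", repeat=2)
]


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def channel(trinucleotide, alt):
    """Channel label for a substitution of the middle base to `alt`."""
    ref = trinucleotide[1]
    if ref == alt:
        raise ValueError("alt must differ from the reference base")
    if ref in "AG":
        # Purine reference: describe the event on the opposite strand.
        trinucleotide = reverse_complement(trinucleotide)
        alt = COMPLEMENT[alt]
        ref = trinucleotide[1]
    return f"{trinucleotide[0]}[{ref}>{alt}]{trinucleotide[2]}"
```

For example, `channel("TGC", "T")` yields `G[C>A]A`, the same strand equivalence written as TGC>TTC/GCA>GAA in Extended Data Fig. 4.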

Extended Data Fig. 3 Results of contrastive principal component analysis and t-SNE.

a, Contrastive principal component analysis (cPCA) was employed to discriminate knockout profiles from control profiles (∆ATP2B4). Each plot contains six different genes. Nine gene knockouts separate from the controls. Using this method, ∆ADH5 did not separate clearly from ∆ATP2B4, indicating either no signature or a weak signature. Dot colours indicate the repair/replicative pathway in which each gene is involved: black, control; green, MMR; orange, BER; dark purple, HR and HR regulation; light purple, checkpoint. Each dot represents a subclone. The number of subclones for each gene knockout (N = 2–4) can be found in Supplementary Table 2. b, The t-SNE algorithm was applied to discriminate the mutational profiles of gene knockouts from those of control knockouts. Gene knockouts that produce mutational signatures separate clearly from control subclones and from other knockouts that do not have signatures. Subclones of the gene knockouts that produce signatures cluster together, indicating consistency between subclones.

Extended Data Fig. 4 Oxidative damage-associated mutational signatures.

a, Relative mutation frequency of G>T/C>A in the 256 possible channels defined by the two adjacent bases 5′ and 3′ of each mutated base (4×4×4×4=256) for ∆ATP2B4, ∆OGG1, a head and neck cancer with strong SBS18, and SBS18 itself. b, Left: t-SNE plot of tissue-specific versions of mutational signature 18. Two groups are featured, with predominant peaks at TGC>TTC/GCA>GAA (highlighted in green) and AGA>ATA/TCT>TAT (highlighted in purple), respectively. Right: heatmap of 21 tissue-specific mutational signatures at C>A. We compared experimental signatures to previously published cancer-derived signatures, focusing on 21 tissue-specific variations of Signature 18. Interestingly, we found two distinct groups of Signature 18. Signatures of ∆OGG1, cellular models and signatures derived from head and neck, pancreas, myeloid, bladder, uterus, cervix and lymphoid tumors were most similar to each other, with the predominant G>T/C>A peak at TGC>TTC/GCA>GAA. By contrast, an alternative version of this signature with a predominant G>T/C>A peak at AGA>ATA/TCT>TAT was noted in colorectal, esophagus, stomach, bone, lung, CNS, breast, skin, prostate, liver, head and neck (Signature Head_neck_G), ovary, biliary and kidney cancers. Indeed, many types of oxidative species could fluctuate between tissues and affect trinucleotides variably, resulting in the variation observed in Signature 18.
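The 256-channel count in panel a comes from fixing the mutated base and letting each of the four flanking positions (two on the 5′ side, two on the 3′ side) take any of four bases. A short sketch, assuming contexts are written as five-letter strings centred on a pyrimidine-referenced C; the names below are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

# Two bases 5', the mutated C, two bases 3': 4 * 4 * 4 * 4 = 256 contexts.
PENTA_CONTEXTS = [a + b + "C" + c + d for a, b, c, d in product(BASES, repeat=4)]


def relative_frequencies(observed_contexts):
    """Share of C>A mutations falling in each of the 256 contexts."""
    counts = Counter(observed_contexts)
    total = sum(counts[context] for context in PENTA_CONTEXTS)
    if total == 0:
        return {context: 0.0 for context in PENTA_CONTEXTS}
    return {context: counts[context] / total for context in PENTA_CONTEXTS}


# Hypothetical toy input: four C>A mutations in two contexts.
example = relative_frequencies(["TGCAT", "TGCAT", "AGCAA", "AGCAA"])
```

Normalizing to relative frequency makes profiles with very different burdens, such as a knockout subclone and a tumor, comparable on the same axis.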

Extended Data Fig. 5 Indel signatures and double substitution signatures.

a, 15-channel Indel signatures. b, 186-channel Indel signatures. c, Aggregated double substitution profile of ∆RNF168 and ∆EXO1.

Extended Data Fig. 6 Similarities between ∆EXO1, ∆RNF168 signatures and RefSig5 and results of analysis on transcriptional strand bias and distribution of mutations on replication timing domains.

a, Hierarchical clustering of cancer-derived reference signatures (RefSig) with ∆EXO1 and ∆RNF168 signatures. b, Hierarchical clustering of tissue-specific signature 5 with ∆EXO1 and ∆RNF168 signatures. c, Transcriptional strand bias in nine gene knockouts. Pearson’s chi-squared test (chisq.test()) was used to calculate P values, which were corrected for multiple testing using p.adjust(). Unlike mutational signatures of environmental mutagens, we do not observe striking transcriptional strand bias in signatures generated by DNA repair gene knockouts, except for T>C generated by ∆EXO1 and ∆RNF168. Because transcriptional strand bias is largely induced by nucleotide excision repair (NER) of bulky DNA adducts, its absence indicates that most endogenous DNA damage is not particularly bulky or DNA-deforming. d, Distribution of mutation density across replication timing domains (separated into deciles) for signatures associated with different gene knockouts. Green bars indicate the observed distribution. Blue lines indicate the expected distribution after correction for the trinucleotide density of each domain. Bars and error bars represent mean ± SD of bootstrapping replicates (n=100).
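The strand bias test in panel c can be sketched without R. The caption names chisq.test() and p.adjust() but does not state how the test was set up or which correction method was chosen, so the sketch below makes two labeled assumptions: a goodness-of-fit test of transcribed versus untranscribed strand counts against a 50:50 expectation, and the Holm method, which is the default of p.adjust(). The function names are hypothetical.

```python
import math


def strand_bias_p_value(transcribed, untranscribed):
    """Chi-squared goodness-of-fit P value against a 50:50 split (1 d.f.)."""
    total = transcribed + untranscribed
    if total == 0:
        return 1.0
    expected = total / 2
    statistic = (
        (transcribed - expected) ** 2 + (untranscribed - expected) ** 2
    ) / expected
    # Survival function of chi-squared with 1 degree of freedom.
    return math.erfc(math.sqrt(statistic / 2))


def holm_adjust(p_values):
    """Holm step-down adjustment (assumed here; it is p.adjust's default)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, index in enumerate(order):
        candidate = min(1.0, (n - rank) * p_values[index])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[index] = running_max
    return adjusted
```

In use, one P value would be computed per mutation class and knockout, and the whole set passed to `holm_adjust` before any class is called biased.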

Extended Data Fig. 7 Putative outcomes of all possible base-base mismatches.

Outcomes from 12 possible base-base mismatches. The red and black strands represent lagging and leading strands, respectively. The arrowed strand is the nascent strand. The highlighted pathways are the ones that generate C>A (blue), C>T (red) and T>C mutations (green) in the ∆MSH2 mutational signature.

Extended Data Fig. 8 Distribution of G>T/C>A mutations in polyG tracts of ∆MSH2, ∆MSH6 and ∆MLH1.

a, Relative frequency of occurrence of G>T/C>A in polyG tracts. b, Occurrence of G>T/C>A in polyG tracts.

Extended Data Fig. 9 Gene-specific mutational signatures in MMR-deficiency.

Proportion of different mutation types of substitution (a) and indel (b) signatures for four MMR gene knockouts. c, The ratio of substitution and indel burden. d, Schematic interpretation of the relative mutation burdens of ∆MSH2 and ∆MSH6.

Source data

Extended Data Fig. 10 Development of MMRDetect.

a–e, Distribution of the five parameters across IHC-determined MMR gene abnormal (orange) and MMR gene normal (green) samples. Black dots and error bars represent mean ± SD of the parameters. N_Abnormal = 79 samples (yellow); N_Normal = 257 samples (green). a, Exposure of MMRd signatures. b, Cosine similarity between the substitution profile of cancer samples and that of MMR gene knockouts. c, Number of indels in repetitive regions. d, Cosine similarity between the profile of repeat-mediated deletions of cancer samples and that of knockout-generated indel signatures. e, Cosine similarity between the profile of repeat-mediated insertions of cancer samples and that of knockout-generated indel signatures. P-values were calculated using a two-sided Mann-Whitney test. f, Distribution of coefficients from 10-fold cross-validation using the training data set. Box plots denote median (horizontal line) and 25th to 75th percentiles (boxes). The lower and upper whiskers extend to 1.5× the interquartile range. N = 10 iterations. g, MMRDetect-calculated probabilities for 336 colorectal cancers. With a cut-off of 0.7, 77 out of 336 samples were predicted to be MMR-deficient (probability < 0.7). Colour bars represent the MSI status determined by IHC staining: red – abnormal; blue – normal. Four samples with abnormal IHC staining have probabilities > 0.7, whilst two samples with normal IHC staining have probabilities < 0.7. Validation using MSIseq and a search for coding mutations in MMR genes revealed the four samples to be false-positive cases and the two samples to be false-negative cases for IHC staining. h, Distribution of the number of mutations attributed to repeat-mediated indels, MMRd signatures and non-MMRd signatures across four groups of samples: MMR-deficient samples determined by only MMRDetect (yellow), MMR-deficient samples determined by only MSIseq (purple), MMR-deficient samples determined by both MMRDetect and MSIseq (blue) and non-MMR-deficient samples determined by both MMRDetect and MSIseq (pink). P-values were calculated using a two-sided Mann-Whitney test. The numbers of samples called MMR-deficient by MMRDetect only (blue), by MSIseq only (pink), by both (yellow) and by neither (purple) are 34, 20, 587 and 6,718, respectively.
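Two of the operations in this legend, the cosine similarity between mutation profiles (panels b, d and e) and the 0.7 cut-off on the MMRDetect-calculated probability (panel g), can be sketched in Python as follows. This is an illustration under stated assumptions, not the MMRDetect implementation: the function names, the "MMR-proficient" label and the handling of a probability exactly equal to the cut-off are hypothetical, and the profiles in the usage example are invented.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two mutation profiles of equal length.

    Returns 0.0 if either profile is all zeros (a convention chosen for
    this sketch).
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(a * b for a, b in zip(profile_a, profile_b))
    norm_a = math.sqrt(sum(a * a for a in profile_a))
    norm_b = math.sqrt(sum(b * b for b in profile_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def call_mmr_status(probability, cutoff=0.7):
    """Label a sample from its classifier probability.

    Following panel g, a probability below the cut-off is called
    MMR-deficient. Treating a value equal to the cut-off as proficient is
    an assumption of this sketch.
    """
    return "MMR-deficient" if probability < cutoff else "MMR-proficient"


if __name__ == "__main__":
    # Invented six-channel substitution profiles (C>A, C>G, C>T, T>A, T>C, T>G).
    tumour_profile = [120, 15, 340, 30, 260, 25]
    knockout_profile = [100, 10, 300, 40, 280, 20]
    print(round(cosine_similarity(tumour_profile, knockout_profile), 3))
    print(call_mmr_status(0.12), call_mmr_status(0.93))
```

Cosine similarity depends only on the shape of a profile, not on the total mutation count, which is why it can be used to compare tumours and knockout subclones with very different burdens.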

Source data

Supplementary information

Supplementary Information

Supplementary Table 6.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–5 and 7–15.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 9

Statistical source data.

Source Data Extended Data Fig. 10

Statistical source data.

About this article

Cite this article

Zou, X., Koh, G.C.C., Nanda, A.S. et al. A systematic CRISPR screen defines mutational mechanisms underpinning signatures caused by replication errors and endogenous DNA damage. Nat Cancer 2, 643–657 (2021). https://doi.org/10.1038/s43018-021-00200-0

  • DOI: https://doi.org/10.1038/s43018-021-00200-0
