Somatic mutation landscapes at single-molecule resolution

Abascal, Federico; Harvey, Luke M. R.; Mitchell, Emily; Lawson, Andrew R. J.; Lensing, Stefanie V.; Ellis, Peter; Russell, Andrew J. C.; Alcantara, Raul E.; Baez-Ortega, Adrian; Wang, Yichen; Kwa, Eugene Jing; Lee-Six, Henry; Cagan, Alex; Coorens, Tim H. H.; Chapman, Michael Spencer; Olafsson, Sigurgeir; Leonard, Steven; Jones, David; Machado, Heather E.; Davies, Megan; Øbro, Nina F.; Mahubani, Krishnaa T.; Allinson, Kieren; Gerstung, Moritz; Saeb-Parsy, Kourosh; Kent, David G.; Laurenti, Elisa; Stratton, Michael R.; Rahbari, Raheleh; Campbell, Peter J.; Osborne, Robert J.; Martincorena, Iñigo

doi:10.1038/s41586-021-03477-4

Article
Published: 28 April 2021


Nature volume 593, pages 405–410 (2021)


Abstract

Somatic mutations drive the development of cancer and may contribute to ageing and other diseases¹,². Despite their importance, the difficulty of detecting mutations that are only present in single cells or small clones has limited our knowledge of somatic mutagenesis to a minority of tissues. Here, to overcome these limitations, we developed nanorate sequencing (NanoSeq), a duplex sequencing protocol with error rates of less than five errors per billion base pairs in single DNA molecules from cell populations. This rate is two orders of magnitude lower than typical somatic mutation loads, enabling the study of somatic mutations in any tissue independently of clonality. We used this single-molecule sensitivity to study somatic mutations in non-dividing cells across several tissues, comparing stem cells to differentiated cells and studying mutagenesis in the absence of cell division. Differentiated cells in blood and colon displayed remarkably similar mutation loads and signatures to their corresponding stem cells, despite mature blood cells having undergone considerably more divisions. We then characterized the mutational landscape of post-mitotic neurons and polyclonal smooth muscle, confirming that neurons accumulate somatic mutations at a constant rate throughout life without cell division, with similar rates to mitotically active tissues. Together, our results suggest that mutational processes that are independent of cell division are important contributors to somatic mutagenesis. We anticipate that the ability to reliably detect mutations in single DNA molecules could transform our understanding of somatic mutagenesis and enable non-invasive studies on large-scale cohorts.


**Fig. 1: BotSeqS and NanoSeq sequencing protocols.**

**Fig. 2: Mutation in stem and differentiated cells.**

**Fig. 3: Mutational landscape in neurons and smooth muscle.**


Data availability

Information on data availability for all samples is available in Supplementary Table 1. NanoSeq sequencing data have been deposited in the European Genome-phenome Archive (EGA; https://www.ebi.ac.uk/ega/) under accession number EGAD00001006459. Sperm samples are available from the EGA under accession number EGAD00001007028. Standard sequencing data have been deposited in the EGA under accession number EGAD00001006595. For publicly available samples, references to the original sources are provided in Supplementary Table 1. Substitution and indel rates are available in Supplementary Table 4. Substitution and indel calls for samples sequenced with NanoSeq are available in Supplementary Tables 5, 6. Trinucleotide substitution profiles are available in Supplementary Table 7. A detailed NanoSeq protocol is available in Protocol Exchange⁵³.

Code availability

The bioinformatics pipeline to process NanoSeq sequencing data comprises all steps including processing sequencing data, mapping, calling mutations and calculating corrected burden estimates and substitution profiles. This code is available from https://zenodo.org/record/4604537 (https://doi.org/10.5281/zenodo.4604537). Pipelines to call indels, perform signature extraction and signature fitting with sigfit, simulate the efficiency of the NanoSeq protocol, calculate the mutation burden in specific genomic regions and reproduce most of the main plots are also available from https://zenodo.org/record/4604537. Analyses in R were done with R v.3.3 and v.3.6. R libraries used include: GenomicRanges⁵⁴ (v.1.38.0), Rsamtools (v.2.2.3), MASS (v.7.3-51.5), sigfit⁵² (v.2.0), readxl (v.1.3.1), deconstructSigs (v.1.8.0), lsa (v.0.73.2), deepSNV⁵⁵ (v.1.32.0), lme4 (v.1.1-26), afex (v.0.28-1), lmerTest (v.3.1-3), bootpredictlme4 (v.0.1) and Biostrings (v.2.54.0). Our pipeline makes use of samtools⁵⁶ v.1.9, bcftools⁵⁷ v.1.9, bwa v.0.7.5a-r405 and bedtools⁵⁸ v.2.29.0. We also used the following software: CaVEMan (v.2020), Pindel (v.2020) and MPBoot v.1.1.0.

References

1. Kennedy, S. R., Loeb, L. A. & Herr, A. J. Somatic mutations in aging, cancer and neurodegeneration. Mech. Ageing Dev. 133, 118–126 (2012).
2. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).
3. Martincorena, I. et al. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
4. Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
5. Yizhak, K. et al. RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 364, eaaw0726 (2019).
6. Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
7. Brunner, S. F. et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 574, 538–542 (2019).
8. Li, R. et al. Macroscopic somatic clonal expansion in morphologically normal human urothelium. Science 370, 82–89 (2020).
9. Welch, J. S. et al. The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264–278 (2012).
10. Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).
11. Franco, I. et al. Somatic mutagenesis in satellite cells associates with human skeletal muscle aging. Nat. Commun. 9, 800 (2018).
12. Lodato, M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015).
13. Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).
14. Brazhnik, K. et al. Single-cell analysis reveals different age-related somatic mutation profiles between stem and differentiated cells in human liver. Sci. Adv. 6, eaax2659 (2020).
15. Xing, D., Tan, L., Chang, C. H., Li, H. & Xie, X. S. Accurate SNV detection in single cells by transposon-based whole-genome amplification of complementary strands. Proc. Natl Acad. Sci. USA 118, e2013106118 (2021).
16. Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. Cell 176, 1282–1294.e20 (2019).
17. Salk, J. J., Schmitt, M. W. & Loeb, L. A. Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat. Rev. Genet. 19, 269–285 (2018).
18. Schmitt, M. W. et al. Detection of ultra-rare mutations by next-generation sequencing. Proc. Natl Acad. Sci. USA 109, 14508–14513 (2012).
19. Kennedy, S. R. et al. Detecting ultralow-frequency mutations by duplex sequencing. Nat. Protocols 9, 2586–2606 (2014).
20. Hoang, M. L. et al. Genome-wide quantification of rare somatic mutations in normal human tissues using massively parallel sequencing. Proc. Natl Acad. Sci. USA 113, 9846–9851 (2016).
21. You, X. et al. Detection of genome-wide low-frequency mutations with paired-end and complementary consensus sequencing (PECC-seq) revealed end-repair-derived artifacts as residual errors. Arch. Toxicol. 94, 3475–3485 (2020).
22. Costello, M. et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 41, e67 (2013).
23. Kong, A. et al. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488, 471–475 (2012).
24. Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nat. Genet. 48, 126–133 (2016).
25. Wyles, S. P., Brandt, E. B. & Nelson, T. J. Stem cells: the pursuit of genomic stability. Int. J. Mol. Sci. 15, 20948–20967 (2014).
26. Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
27. Nicholson, A. M. et al. Fixation and spread of somatic mutations in adult human colonic epithelium. Cell Stem Cell 22, 909–918.e8 (2018).
28. Pleguezuelos-Manzano, C. et al. Mutational signature in colorectal cancer caused by genotoxic pks⁺ E. coli. Nature 580, 269–273 (2020).
29. Poduri, A., Evrony, G. D., Cai, X. & Walsh, C. A. Somatic mutation, genomic variation, and neurological disease. Science 341, 1237758 (2013).
30. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
31. Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578, 102–111 (2020).
32. Gabella, G. Cells of visceral smooth muscles. J. Smooth Muscle Res. 48, 65–95 (2012).
33. Yoshida, K. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020).
34. Lawson, A. R. J. et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 370, 75–82 (2020).
35. Gao, Z., Wyman, M. J., Sella, G. & Przeworski, M. Interpreting the dependence of mutation rates on age and time. PLoS Biol. 14, e1002355 (2016).
36. Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836 (2019).
37. Matsumura, S. et al. Genome-wide somatic mutation analysis via Hawk-seq™ reveals mutation profiles associated with chemical mutagens. Arch. Toxicol. 93, 2689–2701 (2019).
38. Ellis, P. et al. Reliable detection of somatic mutations in solid tissues by laser-capture microdissection and low-input DNA sequencing. Nat. Protocols 16, 841–871 (2021).
39. Olafsson, S. et al. Somatic evolution in non-neoplastic IBD-affected colon. Cell 182, 672–684 (2020).
40. Krishnaswami, S. R. et al. Using single nuclei for RNA-seq to capture the transcriptome of postmortem neurons. Nat. Protocols 11, 499–524 (2016).
41. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
42. Tischler, G. & Leonard, S. biobambam: tools for read pair collation based algorithms on BAM files. Source Code Biol. Med. 9, 13 (2014).
43. Gerstung, M. et al. Reliable detection of subclonal single-nucleotide variants in tumour cell populations. Nat. Commun. 3, 811 (2012).
44. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
45. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
46. Zhang, F. et al. Ancestry-agnostic estimation of DNA sample contamination from sequence reads. Genome Res. 30, 185–194 (2020).
47. Benjamini, Y. & Speed, T. P. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res. 40, e72 (2012).
48. Robinson, P. S. et al. Elevated somatic mutation burdens in normal human cells due to defective DNA polymerases. Preprint at https://doi.org/10.1101/2020.06.23.167668 (2020).
49. Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).
50. Raine, K. M. et al. cgpPindel: identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinformatics 52, 15.17.1–15.17.12 (2015).
51. Hoang, D. T. et al. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol. Biol. 18, 11 (2018).
52. Gori, K. & Baez-Ortega, A. sigfit: flexible Bayesian inference of mutational signatures. Preprint at https://doi.org/10.1101/372896 (2020).
53. Lensing, S. V. et al. Somatic mutation landscapes at single-molecule resolution. Protocol Exchange https://doi.org/10.21203/rs.3.pex-1298/v1 (2021).
54. Lawrence, M. et al. Software for computing and annotating genomic ranges. PLOS Comput. Biol. 9, e1003118 (2013).
55. Gerstung, M., Papaemmanuil, E. & Campbell, P. J. Subclonal variant calling with multiple samples and prior knowledge. Bioinformatics 30, 1198–1204 (2014).
56. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
57. Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
58. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
59. Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).


Acknowledgements

We are grateful to the live donors and the families of the deceased organ transplant donors. We thank L. Anderson, K. Roberts, C. Latimer, Q. Lin, members of the CGP-lab, R. Vicario, F. Geissmann, N. Angelopoulos, G. Tischler, T. Bellerby, M. Abascal and K. Chatterjee for assistance in the development of NanoSeq or with this manuscript; all NIHR BioResource Centre Cambridge volunteers for participation; the NIHR BioResource Centre Cambridge and staff for their contribution; the National Institute for Health Research and NHS Blood and Transplant; the Cambridge Blood and Stem Cell Biobank for sample donation and support of this work; the Cambridge Brain Bank for sample donation; and the participants and local coordinators at the TwinsUK study. This research was supported by the Cambridge NIHR BRC Cell Phenotyping Hub. I.M. is funded by Cancer Research UK (C57387/A21777) and the Wellcome Trust. P.J.C. is a Wellcome Trust Senior Clinical Fellow. R.R. is a recipient of a CRUK Career Development fellowship (C66259/A27114). E.L. is supported by a Wellcome/Royal Society Sir Henry Dale Fellowship (grant number 107630/Z/15/Z), the European Hematology Association, BBSRC and by core funding from Wellcome (grant number 203151/Z/16/Z) and MRC to the Wellcome-MRC Cambridge Stem Cell Institute. D.G.K. is supported by a Bloodwise Bennett Fellowship (15008), the Bill and Melinda Gates Foundation (INV-002189) and an ERC Starting Grant (ERC-2016-STG–715371). The TwinsUK study was funded by the Wellcome Trust and European Community’s Seventh Framework Programme (FP7/2007-2013). The TwinsUK study also receives support from the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust in partnership with King’s College London. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health & Social Care.

Author information

Peter Ellis
Present address: Inivata, Babraham Research Campus, Cambridge, UK
Robert J. Osborne
Present address: Biofidelity, Cambridge Science Park, Cambridge, UK
These authors contributed equally: Luke M. R. Harvey, Emily Mitchell, Andrew R. J. Lawson, Stefanie V. Lensing, Peter Ellis

Authors and Affiliations

Wellcome Sanger Institute, Hinxton, UK
Federico Abascal, Luke M. R. Harvey, Emily Mitchell, Andrew R. J. Lawson, Stefanie V. Lensing, Peter Ellis, Andrew J. C. Russell, Raul E. Alcantara, Adrian Baez-Ortega, Yichen Wang, Eugene Jing Kwa, Henry Lee-Six, Alex Cagan, Tim H. H. Coorens, Michael Spencer Chapman, Sigurgeir Olafsson, Steven Leonard, David Jones, Heather E. Machado, Michael R. Stratton, Raheleh Rahbari, Peter J. Campbell, Robert J. Osborne & Iñigo Martincorena
Wellcome–MRC Cambridge Stem Cell Institute, Cambridge Biomedical Campus, Cambridge, UK
Emily Mitchell, Megan Davies, Nina F. Øbro, David G. Kent & Elisa Laurenti
Department of Haematology, University of Cambridge, Cambridge, UK
Nina F. Øbro, Krishnaa T. Mahubani, Elisa Laurenti & Peter J. Campbell
Department of Surgery, University of Cambridge, Cambridge, UK
Krishnaa T. Mahubani & Kourosh Saeb-Parsy
NIHR Cambridge Biomedical Research Centre, Cambridge Biomedical Campus, Cambridge, UK
Krishnaa T. Mahubani & Kourosh Saeb-Parsy
Cambridge Brain Bank, Division of the Human Research Tissue Bank, Addenbrooke’s Hospital, Cambridge, UK
Kieren Allinson
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
Moritz Gerstung
York Biomedical Research Institute, Department of Biology, University of York, York, UK
David G. Kent


Contributions

R.J.O., F.A. and I.M. conceived the project. I.M., P.J.C., R.R. and M.R.S. supervised the project. F.A., R.J.O., E.M. and I.M. wrote the manuscript; all authors reviewed and edited the manuscript. R.J.O. led the development of the protocol with help from F.A., A.R.J.L., P.E., S.V.L. and I.M. R.J.O. and F.A. developed the bioinformatics pipeline with help from R.E.A., S.L. and D.J. F.A. led the analysis of the data with help from A.R.J.L., I.M., A.B.-O., Y.W., L.M.R.H., E.J.K., T.H.H.C., M.S.C. and M.G. E.M. performed the HSC/MPP experiments. L.M.R.H. and A.J.C.R. performed the cell sorting of neuronal nuclei. A.R.J.L. and A.C. performed laser microdissection. E.M., N.F.Ø., H.E.M., M.D., D.G.K., E.L., K.T.M., K.S.-P., K.A., R.R., H.L.-S. and S.O. collected and processed samples. E.M., E.L., M.G. and D.G.K. assisted in the interpretation of blood data.

Corresponding authors

Correspondence to Robert J. Osborne or Iñigo Martincorena.

Ethics declarations

Competing interests

A patent application on NanoSeq has been filed that includes R.J.O., F.A. and I.M.

Additional information

Peer review information: Nature thanks John Dick and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Substitution imbalances and impact of A-tailing.

a, b, Imbalances in the distribution of the six complementary substitutions (for example, G>T versus C>A) across read positions in BotSeqS (a) and NanoSeq (b). c, Origin of G>T over C>A mutation call imbalances in standard sequencing²². d, Origin of imbalances in duplex sequencing/BotSeqS as a result of end repair during library preparation. e, Single-strand consensus calls for pyrimidine (top) and purine (bottom) substitutions for the standard BotSeqS (left) protocol and for NanoSeq with standard (middle) and modified (right) A-tailing protocols. For example, C>T changes are shown at the top, whereas the complementary G>A changes are shown at the bottom. By using ddBTPs, C>A, G>A and T>A errors are reduced, lowering the risk of false-positive double-strand consensus calls.

Extended Data Fig. 2 BotSeqS errors as a function of read-end trimming.

a, BotSeqS estimated burden for the granulocyte sample shown in Fig. 2c applying different extents of trimming to the 5′ ends of reads. Even with extensive trimming we predict at least 600 artifactual mutation calls per diploid genome. b, Substitution imbalances are observed deep into the reads and cannot be avoided with read trimming. Imbalances vary from experiment to experiment, as a consequence of DNA damage in the DNA source or during library preparation (Supplementary Note 1). c, Substitution profiles including the reference profile from single-cell-derived blood colonies and three BotSeqS profiles after trimming of 20, 40 and 60 bp from the 5′ end of reads (in addition to 15 bp trimming of the 3′ end). The text in the figure indicates the observed and expected cosine similarities (Methods) to the reference profile. C>A and C>G errors in BotSeqS remain after extensive trimming.
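The cosine similarities quoted in these captions compare two substitution profiles channel by channel. The following is a minimal sketch of the observed similarity only, assuming each profile is a plain list of non-negative counts or rates in the same channel order; the function name and the toy six-channel data are illustrative and not taken from the published pipeline, and the expected similarity described in the Methods is not reproduced here.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two equal-length substitution profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(a * b for a, b in zip(profile_a, profile_b))
    norm_a = math.sqrt(sum(a * a for a in profile_a))
    norm_b = math.sqrt(sum(b * b for b in profile_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each profile needs at least one non-zero channel")
    return dot / (norm_a * norm_b)


# Toy data (hypothetical): real trinucleotide profiles would have 96 channels.
reference = [10, 40, 5, 30, 10, 5]
observed = [12, 35, 9, 28, 11, 5]
print(round(cosine_similarity(reference, observed), 3))
```

A value of 1 means the two profiles have identical shape regardless of total mutation count, which is why the measure suits comparisons between samples sequenced to different depths.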

Extended Data Fig. 3 Mung Bean NanoSeq.

a, Estimated number of mutations per cord blood cell. Poisson 95% confidence intervals are shown as lines. The red dotted line shows the number of mutations per cord blood cell estimated with the restriction enzyme NanoSeq protocol, with Poisson 95% confidence intervals shown as a red shade. In contrast to Fig. 1f, we did not apply the correction for missing embryonic mutations because here we are comparing two protocols that are equally affected by this limitation. b, Substitution profiles for the standard end-repair protocol (BotSeqS) and for Mung Bean, showing the cosine similarities with the reference profile (Fig. 1c).
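Poisson 95% confidence intervals for a mutation count can be computed in several ways, and the captions do not state which construction was used. As one possibility, the sketch below computes the exact (Garwood-style) interval by bisection on the Poisson cumulative distribution, using only the standard library. The function names are illustrative, and the code is intended for modest counts rather than very large ones.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)


def _bisect_decreasing(func, lo, hi, iterations=200):
    """Root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_exact_ci(count, alpha=0.05):
    """Exact two-sided confidence interval for the mean of a Poisson count."""
    # Generous bracket for the search (an assumption suited to modest counts).
    bracket = count + 10 * math.sqrt(count + 1) + 20
    # Upper limit: the rate at which P(X <= count) falls to alpha / 2.
    upper = _bisect_decreasing(
        lambda lam: poisson_cdf(count, lam) - alpha / 2, 0.0, bracket
    )
    if count == 0:
        return 0.0, upper
    # Lower limit: the rate at which P(X >= count) rises to alpha / 2,
    # that is, P(X <= count - 1) = 1 - alpha / 2.
    lower = _bisect_decreasing(
        lambda lam: poisson_cdf(count - 1, lam) - (1 - alpha / 2), 0.0, bracket
    )
    return lower, upper


low, high = poisson_exact_ci(10)
print(f"95% CI for 10 observed mutations: {low:.2f} to {high:.2f}")
```

To express the interval as mutations per cell, both limits would be multiplied by the same scaling factor applied to the point estimate.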

Extended Data Fig. 4 Optimization of duplicate rates, DNA input requirements and estimation of human contamination.

a, Relationship between sequencing yield, library complexity, duplicate rates and efficiency, based on a truncated Poisson model (Methods). Left, duplicate rate as a function of the sequencing ratio (sequencing reads/DNA fragments in the library). Middle, efficiency (measured as bases called with duplex coverage/bases sequenced) as a function of the duplicate rate. Right, efficiency as a function of sequencing ratio. b, Library yield as fmol per 25 μl as a function of the amount of input DNA in ng. c, Empirical relationship between the estimated fmol in library (measured by qPCR) and the number of unique molecules in the library estimated with Picard tools (Lander–Waterman equation) for our choice of restriction enzyme and fragment size selection (250–500 bp). d, Empirical relationship between duplicate rates and efficiency of the method, measured as duplex bases called/number of bases sequenced (that is, the number of paired-end reads multiplied by 300). The maximum efficiency (around 0.04) is lower than the maximum analytical expectation (0.12; middle panel in a) because of the trimming of read ends (barcodes, restriction sites and 8 bp from each end) and the strict filters that we apply to consider a site callable. e, VerifyBamId contamination estimates for different amounts of simulated contamination from individuals of different ancestry. f, Contamination simulation using two NanoSeq samples to contaminate each other.
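The trade-off in panel a between sequencing ratio, duplicate rate and efficiency can be illustrated with a simplified Poisson model. The sketch below is not the authors' truncated Poisson formulation from the Methods. It assumes that reads per library fragment are Poisson distributed with mean equal to the sequencing ratio, that reads split evenly between the two strands, and that a fragment yields duplex calls only when each strand has at least two reads. All function and parameter names are hypothetical, and the peak of this simplified curve need not match the analytical maximum quoted in the caption.

```python
import math


def duplicate_rate(seq_ratio):
    """Fraction of duplicate reads when reads per fragment ~ Poisson(seq_ratio)."""
    if seq_ratio <= 0:
        raise ValueError("seq_ratio must be positive")
    # Expected fraction of library fragments seen at least once.
    observed_fraction = 1.0 - math.exp(-seq_ratio)
    # Unique reads divided by total reads, subtracted from one.
    return 1.0 - observed_fraction / seq_ratio


def duplex_efficiency(seq_ratio, min_reads_per_strand=2):
    """Duplex-callable fragments per read sequenced (simplified assumption)."""
    if seq_ratio <= 0:
        raise ValueError("seq_ratio must be positive")
    lam = seq_ratio / 2.0  # assumption: reads divide evenly between strands
    p_too_few = sum(
        math.exp(-lam) * lam ** i / math.factorial(i)
        for i in range(min_reads_per_strand)
    )
    p_strand_ok = 1.0 - p_too_few
    # Both strands must independently reach the minimum read count.
    return p_strand_ok ** 2 / seq_ratio


# Scan sequencing ratios from 0.1 to 20 to locate the most efficient depth.
ratios = [step / 10 for step in range(1, 201)]
best_ratio = max(ratios, key=duplex_efficiency)
print(best_ratio, duplex_efficiency(best_ratio), duplicate_rate(best_ratio))
```

Under these assumptions efficiency rises and then falls with sequencing ratio: too little sequencing leaves most fragments without enough reads on both strands, while too much spends reads on fragments that are already callable.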

Extended Data Fig. 5 Correction of standard (CaVEMan-based) mutation burden estimates and validation of NanoSeq indel calling.

a, Comparison of the mutation burden estimates in regions of the genome with at least 20× coverage (c) to the trinucleotide-context-corrected mutation burdens in the subset of c covered by NanoSeq and passing all NanoSeq filters. b, Ratio between the rates shown in a, showing that the corrected burden is approximately 20% higher than the uncorrected burden; box plots show the interquartile range, median and 95% confidence interval for the median. c, Comparison of indel rates between cord blood colonies (indels were called with the Pindel algorithm) and granulocytes from neonates (NanoSeq pipeline), showing Poisson 95% confidence intervals. Given the sparsity of indel calls in cord blood, data from different colonies (n = 100) and granulocytes (n = 2 donors, one of them with 5 replicates) were combined into single point estimates. d, The top two panels show the high similarity between the NanoSeq and Pindel indel profiles for a bladder tumour; the bottom two profiles show the indel spectra in blood from POLE and POLD1 germline mutation carriers, which are very similar to previously reported profiles⁴⁸.

Extended Data Fig. 6 Cell sorting of HSC/MPPs and colon histology.

a, Gating strategy for the isolation of HSC/MPPs from a representative bone marrow sample. Text above the plots indicates the population depicted. Text inside the plots indicates the name of the gates shown in pink. The CD34⁺CD38⁻ population is defined as the bottom 20% CD38⁻ as shown. For all initial samples (bone marrow, peripheral blood and cord blood), the index sorted population is the ‘HSC pool’ gate. Cell population abundance differed between samples but typically viable cells were 60–90% of total cells and singlets were 98–99% of viable cells. Live cells were 90–99% of viable cells and myeloid cells were 15–50% of live cells. CD34⁺ cells were typically 1–15% of myeloid cells. b, c, Colon histology sections showing microbiopsied areas of colonic epithelium and smooth muscle for donors PD34200 (b) and PD37449 (c). For donor PD34200, a single crypt, a pool of six crypts and two smooth muscle areas were sequenced. For donor PD37449, the two single crypts and the pool of six crypts were sequenced. The burden estimates for these microbiopsies are shown in Figs. 2d, 3j, k. d, Substitution profiles for colonic crypts from the three donors in Fig. 2d and cosine similarities to profiles obtained with standard methods.

Extended Data Fig. 7 Neuron nuclei sorting, comparison to single-cell data and accumulation of mutations with age.

a, Gating strategy for the isolation of neuronal nuclei from frontal cortex. Nuclei were sorted by FACS using an Influx cell sorter (BD Biosciences) with a 100-μm nozzle. For each sample an unstained control was used to help to determine the NeuN⁺ population. The text above each column indicates the population depicted and the text inside the plots indicates the population of the gates highlighted in black. Sorting results varied among samples, with 1–60% passing the DAPI gate and, of these, 2–53% passing a conservative NeuN⁺ gate. b, Substitution profiles for all mutations detected in neurons with SNP-phased error-corrected single-cell sequencing data from a previously published study¹³ (top) and with NanoSeq (middle). Bottom, a signature specific to single-cell sequencing data is shown (scF signature from a previous publication¹⁶). c, Mutational signatures extracted from a previously published study¹³, showing their relative contributions in the published dataset. These signatures were obtained using sigfit (Methods) on publicly available mutation calls and are referred to as LDA, LDB and LDC. Note the high similarity between the NanoSeq full spectrum for neurons and LDA (cosine similarity 0.96), and between scF and LDB (cosine similarity 0.97). d, Predicted contribution of LDA, LDB and LDC to each of the previously sequenced neurons¹³. e, Accumulation of mutations attributed to NanoSeq signatures A, B and C with age in healthy donors and in individuals with Alzheimer’s disease. f, Accumulation of mutations attributed to NanoSeq signatures A, B and C in smooth muscle from bladder and colon.

Extended Data Fig. 8 Normalized substitution spectra across different genomic regions.

a, Substitution spectra for neurons, granulocytes, smooth muscle and colonic crypts in chromatin states associated with transcription (states E4 and E5 in ENCODE) and inactive DNA (E9 and E15). Chromatin states were obtained from ENCODE⁵⁹, using the following epigenomes: E073 (frontal cortex), E030 (granulocytes), E076 (smooth muscle) and E075 (colonic mucosa). To enable the direct comparison of spectra across genomic regions with different trinucleotide frequencies, the profiles have been normalized to the genomic trinucleotide frequencies (Methods). b, Transcriptional strand asymmetries in neurons, granulocytes, smooth muscle and colonic crypts. c, Transcriptional strand asymmetries in neurons in quartiles of gene expression.

Extended Data Fig. 9 Additional substitution and indel spectra.

a, NanoSeq mutational spectrum for neurons corrected for trinucleotide frequency in the callable genome. Unlike the usual representation, which shows unnormalized rates, this representation shows mutation rates per available trinucleotide. b, Previously published LDA signature¹³ normalized to trinucleotide frequency in the genome also reveals high C>T rates at CpG dinucleotides. This observation from single-cell data suggests that the high C>T rates at CpG sites in NanoSeq neuron data (a) are not caused by contamination of NeuN⁺ pools with glia or other cells. c, Indel profiles of granulocytes (top) and colonic crypts without the colibactin signature (bottom). d, Indel profiles for the 250 most highly expressed genes in the PCAWG liver hepatocellular carcinoma data³¹.
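The per-trinucleotide representation described in panel a can be illustrated with a short sketch: each mutation count is divided by the number of times its trinucleotide context occurs in the callable sequence. This is a simplified illustration under stated assumptions and not the normalization code used in the study (see Methods); the function name, the dictionary layout and the toy counts are all hypothetical.

```python
def rates_per_trinucleotide(mutation_counts, context_counts):
    """Convert raw mutation counts into rates per available trinucleotide.

    mutation_counts: maps (trinucleotide context, alternate base) to the
        number of mutations observed in that channel.
    context_counts: maps each trinucleotide context to the number of times
        it occurs in the callable sequence (hypothetical input).
    """
    rates = {}
    for (context, alt), n_mutations in mutation_counts.items():
        available = context_counts.get(context, 0)
        if available <= 0:
            raise ValueError(f"no callable sites for context {context}")
        rates[(context, alt)] = n_mutations / available
    return rates


# Toy example: equal raw counts, but the CpG context ACG is ten times rarer,
# so its per-site rate is ten times higher.
toy_mutations = {("ACG", "T"): 10, ("ACA", "T"): 10}
toy_contexts = {"ACG": 100, "ACA": 1000}
toy_rates = rates_per_trinucleotide(toy_mutations, toy_contexts)
```

In the toy input the two channels have identical raw counts, yet the rarer context has the higher per-site rate, which is why a normalized spectrum can show elevated C>T rates at CpG sites that an unnormalized spectrum understates.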

Extended Data Fig. 10 Smooth muscle.

a, Histology of bladder smooth muscle showing two sections from donor PD40842; only one of the two sections was sequenced using NanoSeq. b, Number of mutations detected with CaVEMan in different smooth muscle sections processed with our standard microdissection sequencing protocol³⁸. The orange dots show the expected mutation burdens (with 95% confidence intervals) for these sections based on the donor age and the regression model shown in Fig. 3j. c, Distribution of VAFs for each of the smooth muscle sections using standard whole-genome sequencing. Box plots show the interquartile range, median, 95% confidence interval for the median (notches), and outliers (black dots).

Supplementary information

Supplementary Information

This file contains Supplementary Notes 1-10, including Supplementary Figures 1-5 and Supplementary References.

Reporting Summary

Supplementary Tables

This file contains Supplementary Tables 1-8. Supplementary Table 1 lists samples used in this study and corresponding data availability. Supplementary Table 2 displays sequencing yields for NanoSeq/BotSeqS DNA libraries. Supplementary Table 3 shows in silico restriction enzyme digestion of the human genome. Supplementary Table 4 displays substitution and indel rates. Supplementary Table 5 shows substitution calls (NanoSeq protocol). Supplementary Table 6 shows indel calls (NanoSeq protocol). Supplementary Table 7 displays trinucleotide substitution profiles and Supplementary Table 8 shows linear regression models.

Peer Review File

About this article

Cite this article

Abascal, F., Harvey, L.M.R., Mitchell, E. et al. Somatic mutation landscapes at single-molecule resolution. Nature 593, 405–410 (2021). https://doi.org/10.1038/s41586-021-03477-4

Received: 13 November 2020
Accepted: 22 March 2021
Published: 28 April 2021
Issue Date: 20 May 2021
DOI: https://doi.org/10.1038/s41586-021-03477-4
