Somatic mutation landscapes at single-molecule resolution


Somatic mutations drive the development of cancer and may contribute to ageing and other diseases1,2. Despite their importance, the difficulty of detecting mutations that are only present in single cells or small clones has limited our knowledge of somatic mutagenesis to a minority of tissues. Here, to overcome these limitations, we developed nanorate sequencing (NanoSeq), a duplex sequencing protocol with error rates of less than five errors per billion base pairs in single DNA molecules from cell populations. This rate is two orders of magnitude lower than typical somatic mutation loads, enabling the study of somatic mutations in any tissue independently of clonality. We used this single-molecule sensitivity to study somatic mutations in non-dividing cells across several tissues, comparing stem cells to differentiated cells and studying mutagenesis in the absence of cell division. Differentiated cells in blood and colon displayed remarkably similar mutation loads and signatures to their corresponding stem cells, despite mature blood cells having undergone considerably more divisions. We then characterized the mutational landscape of post-mitotic neurons and polyclonal smooth muscle, confirming that neurons accumulate somatic mutations at a constant rate throughout life without cell division, with similar rates to mitotically active tissues. Together, our results suggest that mutational processes that are independent of cell division are important contributors to somatic mutagenesis. We anticipate that the ability to reliably detect mutations in single DNA molecules could transform our understanding of somatic mutagenesis and enable non-invasive studies on large-scale cohorts.


Fig. 1: BotSeqS and NanoSeq sequencing protocols.
Fig. 2: Mutation in stem and differentiated cells.
Fig. 3: Mutational landscape in neurons and smooth muscle.

Data availability

Information on data availability for all samples is available in Supplementary Table 1. NanoSeq sequencing data have been deposited in the European Genome-phenome Archive (EGA) under accession number EGAD00001006459. Sperm samples are available from the EGA under accession number EGAD00001007028. Standard sequencing data have been deposited in the EGA under accession number EGAD00001006595. For publicly available samples, references to the original sources are provided in Supplementary Table 1. Substitution and indel rates are available in Supplementary Table 4. Substitution and indel calls for samples sequenced with NanoSeq are available in Supplementary Tables 5, 6. Trinucleotide substitution profiles are available in Supplementary Table 7. A detailed NanoSeq protocol is available in Protocol Exchange53.

Code availability

The bioinformatics pipeline to process NanoSeq sequencing data comprises all steps including processing sequencing data, mapping, calling mutations and calculating corrected burden estimates and substitution profiles. This code is available from ( Pipelines to call indels, perform signature extraction and signature fitting with sigfit, simulate the efficiency of the NanoSeq protocol, calculate the mutation burden in specific genomic regions and reproduce most of the main plots are also available from Analyses in R were done with R v.3.3 and v.3.6. R libraries used include: GenomicRanges54 (v.1.38.0), Rsamtools (v.2.2.3), MASS (v.7.3-51.5), sigfit52 (v.2.0), readxl (v.1.3.1), deconstructSigs (v.1.8.0), lsa (v.0.73.2), deepSNV55 (v.1.32.0), lme4 (v.1.1-26), afex (v.0.28-1), lmerTest (v.3.1-3), bootpredictlme4 (v.0.1) and Biostrings (v.2.54.0). Our pipeline makes use of samtools56 v.1.9, bcftools57 v.1.9, bwa v.0.7.5a-r405 and bedtools58 v.2.29.0. We also used the following software: CaVEMan (v.2020), Pindel (v.2020) and MPBoot v.1.1.0.


  1. Kennedy, S. R., Loeb, L. A. & Herr, A. J. Somatic mutations in aging, cancer and neurodegeneration. Mech. Ageing Dev. 133, 118–126 (2012).

  2. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).

  3. Martincorena, I. et al. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).

  4. Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).

  5. Yizhak, K. et al. RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 364, eaaw0726 (2019).

  6. Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).

  7. Brunner, S. F. et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 574, 538–542 (2019).

  8. Li, R. et al. Macroscopic somatic clonal expansion in morphologically normal human urothelium. Science 370, 82–89 (2020).

  9. Welch, J. S. et al. The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264–278 (2012).

  10. Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).

  11. Franco, I. et al. Somatic mutagenesis in satellite cells associates with human skeletal muscle aging. Nat. Commun. 9, 800 (2018).

  12. Lodato, M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015).

  13. Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).

  14. Brazhnik, K. et al. Single-cell analysis reveals different age-related somatic mutation profiles between stem and differentiated cells in human liver. Sci. Adv. 6, eaax2659 (2020).

  15. Xing, D., Tan, L., Chang, C. H., Li, H. & Xie, X. S. Accurate SNV detection in single cells by transposon-based whole-genome amplification of complementary strands. Proc. Natl Acad. Sci. USA 118, e2013106118 (2021).

  16. Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. Cell 176, 1282–1294.e20 (2019).

  17. Salk, J. J., Schmitt, M. W. & Loeb, L. A. Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat. Rev. Genet. 19, 269–285 (2018).

  18. Schmitt, M. W. et al. Detection of ultra-rare mutations by next-generation sequencing. Proc. Natl Acad. Sci. USA 109, 14508–14513 (2012).

  19. Kennedy, S. R. et al. Detecting ultralow-frequency mutations by duplex sequencing. Nat. Protocols 9, 2586–2606 (2014).

  20. Hoang, M. L. et al. Genome-wide quantification of rare somatic mutations in normal human tissues using massively parallel sequencing. Proc. Natl Acad. Sci. USA 113, 9846–9851 (2016).

  21. You, X. et al. Detection of genome-wide low-frequency mutations with paired-end and complementary consensus sequencing (PECC-seq) revealed end-repair-derived artifacts as residual errors. Arch. Toxicol. 94, 3475–3485 (2020).

  22. Costello, M. et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Res. 41, e67 (2013).

  23. Kong, A. et al. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488, 471–475 (2012).

  24. Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nat. Genet. 48, 126–133 (2016).

  25. Wyles, S. P., Brandt, E. B. & Nelson, T. J. Stem cells: the pursuit of genomic stability. Int. J. Mol. Sci. 15, 20948–20967 (2014).

  26. Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).

  27. Nicholson, A. M. et al. Fixation and spread of somatic mutations in adult human colonic epithelium. Cell Stem Cell 22, 909–918.e8 (2018).

  28. Pleguezuelos-Manzano, C. et al. Mutational signature in colorectal cancer caused by genotoxic pks+ E. coli. Nature 580, 269–273 (2020).

  29. Poduri, A., Evrony, G. D., Cai, X. & Walsh, C. A. Somatic mutation, genomic variation, and neurological disease. Science 341, 1237758 (2013).

  30. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).

  31. Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578, 102–111 (2020).

  32. Gabella, G. Cells of visceral smooth muscles. J. Smooth Muscle Res. 48, 65–95 (2012).

  33. Yoshida, K. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020).

  34. Lawson, A. R. J. et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 370, 75–82 (2020).

  35. Gao, Z., Wyman, M. J., Sella, G. & Przeworski, M. Interpreting the dependence of mutation rates on age and time. PLoS Biol. 14, e1002355 (2016).

  36. Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836 (2019).

  37. Matsumura, S. et al. Genome-wide somatic mutation analysis via Hawk-seq™ reveals mutation profiles associated with chemical mutagens. Arch. Toxicol. 93, 2689–2701 (2019).

  38. Ellis, P. et al. Reliable detection of somatic mutations in solid tissues by laser-capture microdissection and low-input DNA sequencing. Nat. Protocols 16, 841–871 (2021).

  39. Olafsson, S. et al. Somatic evolution in non-neoplastic IBD-affected colon. Cell 182, 672–684 (2020).

  40. Krishnaswami, S. R. et al. Using single nuclei for RNA-seq to capture the transcriptome of postmortem neurons. Nat. Protocols 11, 499–524 (2016).

  41. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at (2013).

  42. Tischler, G. & Leonard, S. biobambam: tools for read pair collation based algorithms on BAM files. Source Code Biol. Med. 9, 13 (2014).

  43. Gerstung, M. et al. Reliable detection of subclonal single-nucleotide variants in tumour cell populations. Nat. Commun. 3, 811 (2012).

  44. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  45. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  46. Zhang, F. et al. Ancestry-agnostic estimation of DNA sample contamination from sequence reads. Genome Res. 30, 185–194 (2020).

  47. Benjamini, Y. & Speed, T. P. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res. 40, e72 (2012).

  48. Robinson, P. S. et al. Elevated somatic mutation burdens in normal human cells due to defective DNA polymerases. Preprint at (2020).

  49. Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).

  50. Raine, K. M. et al. cgpPindel: identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinformatics 52, 15.17.1–15.17.12 (2015).

  51. Hoang, D. T. et al. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol. Biol. 18, 11 (2018).

  52. Gori, K. & Baez-Ortega, A. sigfit: flexible Bayesian inference of mutational signatures. Preprint at (2020).

  53. Lensing, S. V. et al. Somatic mutation landscapes at single-molecule resolution. Protocol Exchange (2021).

  54. Lawrence, M. et al. Software for computing and annotating genomic ranges. PLOS Comput. Biol. 9, e1003118 (2013).

  55. Gerstung, M., Papaemmanuil, E. & Campbell, P. J. Subclonal variant calling with multiple samples and prior knowledge. Bioinformatics 30, 1198–1204 (2014).

  56. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  57. Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).

  58. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  59. Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).



We are grateful to the live donors and the families of the deceased organ transplant donors. We thank L. Anderson, K. Roberts, C. Latimer, Q. Lin, members of the CGP-lab, R. Vicario, F. Geissmann, N. Angelopoulos, G. Tischler, T. Bellerby, M. Abascal and K. Chatterjee for assistance in the development of NanoSeq or with this manuscript; all NIHR BioResource Centre Cambridge volunteers for participation; the NIHR BioResource Centre Cambridge and staff for their contribution; the National Institute for Health Research and NHS Blood and Transplant; the Cambridge Blood and Stem Cell Biobank for sample donation and support of this work; the Cambridge Brain Bank for sample donation; and the participants and local coordinators at the TwinsUK study. This research was supported by the Cambridge NIHR BRC Cell Phenotyping Hub. I.M. is funded by Cancer Research UK (C57387/A21777) and the Wellcome Trust. P.J.C. is a Wellcome Trust Senior Clinical Fellow. R.R. is a recipient of a CRUK Career Development fellowship (C66259/A27114). E.L. is supported by a Wellcome/Royal Society Sir Henry Dale Fellowship (grant number 107630/Z/15/Z), the European Hematology Association, BBSRC and by core funding from Wellcome (grant number 203151/Z/16/Z) and MRC to the Wellcome-MRC Cambridge Stem Cell Institute. D.G.K. is supported by a Bloodwise Bennett Fellowship (15008), the Bill and Melinda Gates Foundation (INV-002189) and an ERC Starting Grant (ERC-2016-STG–715371). The TwinsUK study was funded by the Wellcome Trust and European Community’s Seventh Framework Programme (FP7/2007-2013). The TwinsUK study also receives support from the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust in partnership with King’s College London. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health & Social Care.

Author information




R.J.O., F.A. and I.M. conceived the project. I.M., P.J.C., R.R. and M.R.S. supervised the project. F.A., R.J.O., E.M. and I.M. wrote the manuscript; all authors reviewed and edited the manuscript. R.J.O. led the development of the protocol with help from F.A., A.R.J.L., P.E., S.V.L. and I.M. R.J.O. and F.A. developed the bioinformatics pipeline with help from R.E.A., S.L. and D.J. F.A. led the analysis of the data with help from A.R.J.L., I.M., A.B.-O., Y.W., L.M.R.H., E.J.K., T.H.H.C., M.S.C. and M.G. E.M. performed the HSC/MPP experiments. L.M.R.H. and A.J.C.R. performed the cell sorting of neuronal nuclei. A.R.J.L. and A.C. performed laser microdissection. E.M., N.F.Ø., H.E.M., M.D., D.G.K., E.L., K.T.M., K.S.-P., K.A., R.R., H.L.-S. and S.O. collected and processed samples. E.M., E.L., M.G. and D.G.K. assisted in the interpretation of blood data.

Corresponding authors

Correspondence to Robert J. Osborne or Iñigo Martincorena.

Ethics declarations

Competing interests

A patent application on NanoSeq has been filed that includes R.J.O., F.A. and I.M.

Additional information

Peer review information Nature thanks John Dick and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Substitution imbalances and impact of A-tailing.

a, b, Imbalances in the distribution of the six complementary substitutions (for example, G>T versus C>A) across read positions in BotSeqS (a) and NanoSeq (b). c, Origin of G>T over C>A mutation call imbalances in standard sequencing22. d, Origin of imbalances in duplex sequencing/BotSeqS as a result of end repair during library preparation. e, Single-strand consensus calls for pyrimidine (top) and purine (bottom) substitutions for the standard BotSeqS (left) protocol and for NanoSeq with standard (middle) and modified (right) A-tailing protocols. For example, C>T changes are shown at the top, whereas the complementary G>A changes are shown at the bottom. By using ddBTPs, C>A, G>A and T>A errors are reduced, lowering the risk of false-positive double-strand consensus calls.

Extended Data Fig. 2 BotSeqS errors as a function of read-end trimming.

a, BotSeqS estimated burden for the granulocyte sample shown in Fig. 2c applying different extents of trimming to the 5′ ends of reads. Even with extensive trimming we predict at least 600 artifactual mutation calls per diploid genome. b, Substitution imbalances are observed deep into the reads and cannot be avoided with read trimming. Imbalances vary from experiment to experiment, as a consequence of DNA damage in the DNA source or during library preparation (Supplementary Note 1). c, Substitution profiles including the reference profile from single-cell-derived blood colonies and three BotSeqS profiles after trimming of 20, 40 and 60 bp from the 5′ end of reads (in addition to 15 bp trimming of the 3′ end). The text in the figure indicates the observed and expected cosine similarities (Methods) to the reference profile. C>A and C>G errors in BotSeqS remain after extensive trimming.
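The cosine similarities quoted in these captions compare two substitution profiles treated as count vectors. A minimal sketch of that comparison follows, assuming plain Python lists as input; the function name and the toy six-channel counts are illustrative and are not taken from the NanoSeq pipeline, and the "expected" similarity mentioned in the caption is not modelled here.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two equal-length mutation-count vectors."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(a * b for a, b in zip(profile_a, profile_b))
    norm_a = math.sqrt(sum(a * a for a in profile_a))
    norm_b = math.sqrt(sum(b * b for b in profile_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an all-zero profile")
    return dot / (norm_a * norm_b)


# Toy six-channel profiles (C>A, C>G, C>T, T>A, T>C, T>G); the counts are made up.
reference = [10, 5, 40, 8, 20, 4]
observed = [30, 12, 38, 9, 18, 5]
print(round(cosine_similarity(reference, observed), 3))
```

The measure depends only on the shape of the profile, not on the total number of mutations, which is why it suits comparisons between protocols with different yields.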

Extended Data Fig. 3 Mung Bean NanoSeq.

a, Estimated number of mutations per cord blood cell. Poisson 95% confidence intervals are shown as lines. The red dotted line shows the number of mutations per cord blood cell estimated with the restriction enzyme NanoSeq protocol, with Poisson 95% confidence intervals shown as a red shade. In contrast to Fig. 1f, we did not apply the correction for missing embryonic mutations because here we are comparing two protocols that are equally affected by this limitation. b, Substitution profiles for the standard end-repair protocol (BotSeqS) and for Mung Bean, showing the cosine similarities with the reference profile (Fig. 1c).
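The captions report Poisson 95% confidence intervals for mutation counts without stating how they were computed. One common choice is the exact (Garwood) interval, sketched below using only the standard library; treating this as the interval used in the figures is an assumption, and the function names are illustrative.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), with terms computed in log space."""
    if k < 0:
        return 0.0
    if lam == 0:
        return 1.0
    total = sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k + 1)
    )
    return min(total, 1.0)


def _root_of_decreasing(func, lo, hi, iterations=100):
    """Bisection for a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_exact_ci(k, alpha=0.05):
    """Exact two-sided confidence interval for a Poisson mean given an observed count k."""
    bracket = k + 10 * math.sqrt(k + 1) + 10  # generous upper search limit
    if k == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= k) equals alpha / 2.
        lower = _root_of_decreasing(
            lambda lam: poisson_cdf(k - 1, lam) - (1 - alpha / 2), 0.0, bracket
        )
    # Upper limit: the mean at which P(X <= k) equals alpha / 2.
    upper = _root_of_decreasing(
        lambda lam: poisson_cdf(k, lam) - alpha / 2, 0.0, bracket
    )
    return lower, upper


print(poisson_exact_ci(10))
```

To express the interval as a burden per cell, both limits would be scaled by the same factor as the point estimate (for example, genome size divided by the number of callable bases).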

Extended Data Fig. 4 Optimization of duplicate rates, DNA input requirements and estimation of human contamination.

a, Relationship between sequencing yield, library complexity, duplicate rates and efficiency, based on a truncated Poisson model (Methods). Left, duplicate rate as a function of the sequencing ratio (sequencing reads/DNA fragments in the library). Middle, efficiency (measured as bases called with duplex coverage/bases sequenced) as a function of the duplicate rate. Right, efficiency as a function of sequencing ratio. b, Library yield as fmol per 25 μl as a function of the amount of input DNA in ng. c, Empirical relationship between the estimated fmol in library (measured by qPCR) and the number of unique molecules in the library estimated with Picard tools (Lander–Waterman equation) for our choice of restriction enzyme and fragment size selection (250–500 bp). d, Empirical relationship between duplicate rates and efficiency of the method, measured as duplex bases called/number of bases sequenced (that is, the number of paired-end reads multiplied by 300). The maximum efficiency (around 0.04) is lower than the maximum analytical expectation (0.12; middle panel in a) because of the trimming of read ends (barcodes, restriction sites and 8 bp from each end) and the strict filters that we apply to consider a site callable. e, VerifyBamId contamination estimates for different amounts of simulated contamination from individuals of different ancestry. f, Contamination simulation using two NanoSeq samples to contaminate each other.
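The relationship in panel a between sequencing ratio, duplicate rate and efficiency can be illustrated with a simplified Poisson model. The sketch below is not the authors' model and is not expected to reproduce the values in the figure. It assumes that reads per fragment follow a Poisson distribution with mean equal to the sequencing ratio, that reads fall on either strand with equal probability, that a duplex call needs at least two reads per strand, and that reads and fragments have equal length. All function names are hypothetical.

```python
import math


def duplicate_rate(seq_ratio):
    """Fraction of reads that are duplicates when reads per fragment ~ Poisson(seq_ratio).

    Fragments with zero reads are never observed, so the expected number of
    distinct fragments seen, per fragment in the library, is 1 - exp(-seq_ratio).
    """
    if seq_ratio <= 0:
        raise ValueError("seq_ratio must be positive")
    observed_fraction = 1.0 - math.exp(-seq_ratio)
    return 1.0 - observed_fraction / seq_ratio


def duplex_efficiency(seq_ratio, min_reads_per_strand=2):
    """Duplex-covered fragments obtained per read under the stated assumptions."""
    if seq_ratio <= 0:
        raise ValueError("seq_ratio must be positive")
    # Splitting a Poisson count at random between two strands gives two
    # independent Poisson counts, each with half the mean.
    per_strand = seq_ratio / 2.0
    p_too_few = sum(
        math.exp(-per_strand) * per_strand ** i / math.factorial(i)
        for i in range(min_reads_per_strand)
    )
    p_strand_ok = 1.0 - p_too_few
    return p_strand_ok ** 2 / seq_ratio


# Scan sequencing ratios to locate the most efficient one in this toy model.
ratios = [r / 10 for r in range(1, 301)]
best = max(ratios, key=duplex_efficiency)
print(best, duplicate_rate(best), duplex_efficiency(best))
```

The scan shows the trade-off described in the caption: too few reads per fragment leave most molecules without duplex support, whereas too many spend sequencing on redundant copies.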

Extended Data Fig. 5 Correction of standard (CaVEMan-based) mutation burden estimates and validation of NanoSeq indel calling.

a, Comparison of the mutation burden estimates in regions of the genome with at least 20× coverage (c) to the trinucleotide-context-corrected mutation burdens in the subset of c covered by NanoSeq and passing all NanoSeq filters. b, Ratio between the rates shown in a, showing that the corrected burden is approximately 20% higher than the uncorrected burden; box plots show the interquartile range, median and 95% confidence interval for the median. c, Comparison of indel rates between cord blood colonies (indels were called with the Pindel algorithm) and granulocytes from neonates (NanoSeq pipeline), showing Poisson 95% confidence intervals. Given the sparsity of indel calls in cord blood, data from different colonies (n = 100) and granulocytes (n = 2 donors, one of them with 5 replicates) were combined into single point estimates. d, The top two panels show the high similarity between the NanoSeq and Pindel indel profiles for a bladder tumour; the bottom two profiles show the indel spectra in blood from POLE and POLD1 germline mutation carriers, which are very similar to previously reported profiles48.

Extended Data Fig. 6 Cell sorting of HSC/MPPs and colon histology.

a, Gating strategy for the isolation of HSC/MPPs from a representative bone marrow sample. Text above the plots indicates the population depicted. Text inside the plots indicates the name of the gates shown in pink. The CD34+CD38− population is defined as the bottom 20% CD38 as shown. For all initial samples (bone marrow, peripheral blood and cord blood), the index sorted population is the ‘HSC pool’ gate. Cell population abundance differed between samples but typically viable cells were 60–90% of total cells and singlets were 98–99% of viable cells. Live cells were 90–99% of viable cells and myeloid cells were 15–50% of live cells. CD34+ cells were typically 1–15% of myeloid cells. b, c, Colon histology sections showing microbiopsied areas of colonic epithelium and smooth muscle for donors PD34200 (b) and PD37449 (c). For donor PD34200, a single crypt, a pool of six crypts and two smooth muscle areas were sequenced. For donor PD37449, the two single crypts and the pool of six crypts were sequenced. The burden estimates for these microbiopsies are shown in Figs. 2d, 3j, k. d, Substitution profiles for colonic crypts from the three donors in Fig. 2d and cosine similarities to profiles obtained with standard methods.

Extended Data Fig. 7 Neuron nuclei sorting, comparison to single-cell data and accumulation of mutations with age.

a, Gating strategy for the isolation of neuronal nuclei from frontal cortex. Nuclei were sorted by FACS using an Influx cell sorter (BD Biosciences) with a 100-μm nozzle. For each sample an unstained control was used to help to determine the NeuN+ population. The text above each column indicates the population depicted and the text inside the plots indicates the population of the gates highlighted in black. Sorting results varied among samples, with 1–60% passing the DAPI gate and, of these, 2–53% passing a conservative NeuN+ gate. b, Substitution profiles for all mutations detected in neurons with SNP-phased error-corrected single-cell sequencing data from a previously published study13 (top) and with NanoSeq (middle). Bottom, a signature specific to single-cell sequencing data is shown (scF signature from a previous publication16). c, Mutational signatures extracted from a previously published study13, showing their relative contributions in the published dataset. These signatures were obtained using sigfit (Methods) on publicly available mutation calls and are referred to as LDA, LDB and LDC. Note the high similarity between the NanoSeq full spectrum for neurons and LDA (cosine similarity 0.96), and between scF and LDB (cosine similarity 0.97). d, Predicted contribution of LDA, LDB and LDC to each of the previously sequenced neurons13. e, Accumulation of mutations attributed to NanoSeq signatures A, B and C with age in healthy donors and in individuals with Alzheimer’s disease. f, Accumulation of mutations attributed to NanoSeq signatures A, B and C in smooth muscle from bladder and colon.

Extended Data Fig. 8 Normalized substitution spectra across different genomic regions.

a, Substitution spectra for neurons, granulocytes, smooth muscle and colonic crypts in chromatin states associated with transcription (states E4 and E5 in ENCODE) and inactive DNA (E9 and E15). Chromatin states were obtained from ENCODE59, using the following epigenomes: E073 (frontal cortex), E030 (granulocytes), E076 (smooth muscle) and E075 (colonic mucosa). To enable the direct comparison of spectra across genomic regions with different trinucleotide frequencies, the profiles have been normalized to the genomic trinucleotide frequencies (Methods). b, Transcriptional strand asymmetries in neurons, granulocytes, smooth muscle and colonic crypts. c, Transcriptional strand asymmetries in neurons in quartiles of gene expression.
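Normalizing a regional spectrum to genomic trinucleotide frequencies can be sketched as follows. This illustrates one plausible scheme and is not the procedure from the Methods: each count is divided by the frequency of its trinucleotide context in the region, multiplied by the frequency of that context in the whole genome, and the result is rescaled to sum to one. The function name, the dictionary layout and the toy numbers are assumptions.

```python
def normalize_spectrum(counts, region_tri_freq, genome_tri_freq):
    """Re-weight a regional substitution spectrum to the genome's trinucleotide composition.

    counts: {(trinucleotide, alt_base): mutation count} observed in the region
    region_tri_freq: {trinucleotide: relative frequency within the region}
    genome_tri_freq: {trinucleotide: relative frequency within the whole genome}
    Returns {(trinucleotide, alt_base): proportion}, with proportions summing to 1.
    """
    weighted = {}
    for (tri, alt), n in counts.items():
        region_f = region_tri_freq[tri]
        if region_f <= 0:
            raise ValueError(f"trinucleotide {tri} has no sites in the region")
        # Rate per available site in the region, projected onto genome composition.
        weighted[(tri, alt)] = n / region_f * genome_tri_freq[tri]
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("spectrum contains no mutations")
    return {key: value / total for key, value in weighted.items()}


# Toy example: equal raw counts, but ACG is rarer in the region than in the genome.
toy_counts = {("ACG", "T"): 30, ("TCA", "T"): 30}
toy_region = {"ACG": 0.01, "TCA": 0.03}
toy_genome = {"ACG": 0.02, "TCA": 0.02}
normalized = normalize_spectrum(toy_counts, toy_region, toy_genome)
print(normalized)
```

In the toy example the two contexts have the same raw count, yet the ACG context receives the larger normalized share because it has fewer available sites in the region.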

Extended Data Fig. 9 Additional substitution and indel spectra.

a, NanoSeq mutational spectrum for neurons corrected for trinucleotide frequency in the callable genome. Unlike the usual representation, which shows unnormalized rates, this representation shows mutation rates per available trinucleotide. b, Previously published LDA signature13 normalized to trinucleotide frequency in the genome also reveals high C>T rates at CpG dinucleotides. This observation from single-cell data suggests that the high C>T rates at CpG sites in NanoSeq neuron data (a) are not caused by contamination of NeuN+ pools with glia or other cells. c, Indel profiles of granulocytes (top) and colonic crypts without the colibactin signature (bottom). d, Indel profiles for the 250 most highly expressed genes in the PCAWG liver hepatocellular carcinoma data31.

Extended Data Fig. 10 Smooth muscle.

a, Histology of bladder smooth muscle showing two sections from donor PD40842; only one of the two sections was sequenced using NanoSeq. b, Number of mutations detected with CaVEMan in different smooth muscle sections processed with our standard microdissection sequencing protocol38. The orange dots show the expected mutation burdens (with 95% confidence intervals) for these sections based on the donor age and the regression model shown in Fig. 3j. c, Distribution of VAFs for each of the smooth muscle sections using standard whole-genome sequencing. Box plots show the interquartile range, median, 95% confidence interval for the median (notches), and outliers (black dots).

Supplementary information

Supplementary Information

This file contains Supplementary Notes 1-10, including Supplementary Figures 1-5 and Supplementary References.

Reporting Summary

Supplementary Tables

This file contains Supplementary Tables 1-8. Supplementary Table 1 lists samples used in this study and corresponding data availability. Supplementary Table 2 displays sequencing yields for NanoSeq/BotSeqS DNA libraries. Supplementary Table 3 shows in silico restriction enzyme digestion of the human genome. Supplementary Table 4 displays substitution and indel rates. Supplementary Table 5 shows substitution calls (NanoSeq protocol). Supplementary Table 6 shows indel calls (NanoSeq protocol). Supplementary Table 7 displays trinucleotide substitution profiles and Supplementary Table 8 shows linear regression models.

Peer Review File


Cite this article

Abascal, F., Harvey, L.M.R., Mitchell, E. et al. Somatic mutation landscapes at single-molecule resolution. Nature (2021).
