Single-cell genome sequencing of human neurons identifies somatic point mutation and indel enrichment in regulatory elements


Accurate somatic mutation detection from single-cell DNA sequencing is challenging due to amplification-related artifacts. To reduce this artifact burden, an improved amplification technique, primary template-directed amplification (PTA), was recently introduced. We analyzed whole-genome sequencing data from 52 PTA-amplified single neurons using SCAN2, a new genotyper we developed to leverage mutation signatures and allele balance in identifying somatic single-nucleotide variants (SNVs) and small insertions and deletions (indels) in PTA data. Our analysis confirms an increase in nonclonal somatic mutation in single neurons with age, but revises the estimated rate of this accumulation to 16 SNVs per year. We also identify artifacts in other amplification methods. Most importantly, we show that somatic indels increase by at least three per year per neuron and are enriched in functional regions of the genome such as enhancers and promoters. Our data suggest that indels in gene-regulatory elements have a considerable effect on genome integrity in human neurons.

Fig. 1: Improved large-scale amplification characteristics of PTA compared with MDA.
Fig. 2: PTA identifies MDA-induced artifacts.
Fig. 3: SCAN2 mutation signature-based calling approach for somatic SNVs and indels.
Fig. 4: SCAN2 VAF-based somatic SNVs and indels in aging human neurons.
Fig. 5: Enrichment of neuronal mutations in functionally active genomic regions with tissue- and cell-type specificity.

Data availability

All MDA-amplified single neurons and matched bulks listed in Supplementary Table 2 were downloaded from dbGaP, accession no. phs001485.v1.p1. Only neurons from the PFCs of individuals for which additional PTA data were generated were used. Raw sequencing read data for PTA-amplified human neurons can be downloaded from dbGaP, accession no. phs001485.v3.p1. PTA-amplified mESC kindred cells and bulks can be downloaded from the National Center for Biotechnology Information’s Sequence Read Archive, accession no. PRJNA832209.

Code availability

SCAN2 is available for download at Additional scripts used in the present study are available at and Zenodo (ref. 44).


  1. Poduri, A., Evrony, G. D., Cai, X. & Walsh, C. A. Somatic mutation, genomic variation, and neurological disease. Science 341, 43–51 (2013).

  2. Lodato, M. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015).

  3. Martincorena, I. et al. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).

  4. Jaiswal, S. et al. Clonal hematopoiesis and risk of atherosclerotic cardiovascular disease. N. Engl. J. Med. 377, 111–121 (2017).

  5. Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).

  6. Lodato, M. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).

  7. Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).

  8. Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).

  9. Franco, I. et al. Somatic mutagenesis in satellite cells associates with human skeletal muscle aging. Nat. Commun. 9, 800 (2018).

  10. Franco, I. et al. Whole genome DNA sequencing provides an atlas of somatic mutagenesis in healthy human cells and identifies a tumor-prone cell type. Genome Biol. 20, 285 (2019).

  11. Woodworth, M. B., Girskis, K. M. & Walsh, C. A. Building a lineage from single cells: genetic techniques for cell lineage tracking. Nat. Rev. Genet. 18, 230–244 (2017).

  12. Evrony, G., Lee, E., Park, P. J. & Walsh, C. A. Resolving rates of mutation in the brain using single-neuron genomics. eLife 5, e12966 (2016).

  13. Zhang, C. Z. et al. Calibrating genomic and allelic coverage bias in single-cell sequencing. Nat. Commun. 6, 6822 (2015).

  14. Luquette, L. J. et al. Identification of somatic mutations in single cell DNA-seq using a spatial model of allelic imbalance. Nat. Commun. 10, 3908 (2019).

  15. Bohrson, C. et al. Linked-read analysis identifies mutations in single-cell DNA sequencing data. Nat. Genet. 51, 749–754 (2019).

  16. Gonzalez-Pena, V. et al. Accurate genomic variant detection in single cells with primary template-directed amplification. Proc. Natl Acad. Sci. USA 118, e2024176118 (2021).

  17. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

  18. Zafar, H., Wang, Y., Nakhleh, L., Navin, N. & Chen, K. Monovar: single-nucleotide variant detection in single cells. Nat. Methods 13, 505–507 (2016).

  19. Singer, J., Kuipers, J., Jahn, K. & Beerenwinkel, N. Single-cell mutation identification via phylogenetic inference. Nat. Commun. 9, 5144 (2018).

  20. Miller, M. B. et al. Somatic genomic changes in single Alzheimer’s disease neurons. Nature 604, 714–722 (2022).

  21. McConnell, M. J. et al. Mosaic copy number variation in human neurons. Science 342, 632–637 (2013).

  22. Chronister, W. D. et al. Neurons with complex karyotypes are rare in aged human neocortex. Cell Rep. 26, 825–835 (2019).

  23. Dong, X. et al. Accurate identification of single-nucleotide variants in whole-genome-amplified single cells. Nat. Methods 14, 491–493 (2017).

  24. Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. Cell 176, 1282–1294 (2019).

  25. Gymrek, M. PCR-free library preparation greatly reduces stutter noise at short tandem repeats. Preprint at bioRxiv (2016).

  26. Lasken, R. S. & Stockwell, T. B. Mechanism of chimera formation during the multiple displacement amplification reaction. BMC Biotechnol. (2007).

  27. Yoshida, K. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020).

  28. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).

  29. Reid, D. et al. Incorporation of a nucleoside analog maps genome repair sites in postmitotic human neurons. Science 372, 91–94 (2021).

  30. Wu, W. et al. Neuronal enhancers are hotspots for DNA single-strand break repair. Nature 593, 440–444 (2021).

  31. Madabhushi, R. et al. Activity-induced DNA breaks govern the expression of neuronal early-response genes. Cell 161, 1592–1605 (2015).

  32. Roadmap Epigenomics Consortium, Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  33. Nott, A. et al. Brain cell type-specific enhancer-promoter interactome maps and disease-risk association. Science 366, 1134–1139 (2019).

  34. Hauberg, M. et al. Common schizophrenia risk variants are enriched in open chromatin regions of human glutamatergic neurons. Nat. Commun. 11, 5581 (2020).

  35. Alt, F. W. & Schwer, B. DNA double-strand breaks as drivers of neural genomic change, function, and disease. DNA Repair 71, 158–163 (2018).

  36. Xing, D. et al. Accurate SNV detection in single cells by transposon-based whole-genome amplification of complementary strands. Proc. Natl Acad. Sci. USA 118, e2013106118 (2021).

  37. Abascal, F. et al. Somatic mutation landscapes at single-molecule resolution. Nature 593, 405–410 (2021).

  38. Evrony, G. D. et al. Single-neuron sequencing analysis of L1 retrotransposition and somatic mutation in the human brain. Cell 151, 483–496 (2012).

  39. Baslan, T. et al. Genome-wide copy number analysis of single cells. Nat. Protoc. 7, 1024–1041 (2012).

  40. Garvin, T. et al. Interactive analysis and assessment of single-cell copy-number variations. Nat. Methods 12, 1058–1060 (2015).

  41. Bergstrom, E. N. et al. SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events. BMC Genom. 20, 685 (2019).

  42. Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015).

  43. Alexandrov, L. SigProfiler. MATLAB Central File Exchange (2020).

  44. Luquette, L. SCAN2_PTA_paper_2022. Zenodo (2022).

Acknowledgements


We thank R. S. Hill, R. Mathieu and L. (Sahithi) Cheemalamarri at the Boston Children’s Hospital & Harvard Stem Cell Institute Flow Cytometry Research Facility, the Research Computing group at Harvard Medical School, and the Boston Children’s Hospital Intellectual and Developmental Disabilities Research Center Molecular Genetics Core for assistance. Human tissue was obtained from the NIH Neurobiobank at the University of Maryland, and we thank the donors and families for their invaluable contributions to the advancement of science. This work was supported by the Bioinformatics and Integrative Genomics training grant (no. T32HG002295 to L.J.L.), grant nos. K08 AG065502 and T32 HL007627 (to M.B.M.), the Brigham and Women’s Hospital Program for Interdisciplinary Neuroscience through a gift from Lawrence and Tiina Rand (to M.B.M.), the donors of the Alzheimer’s Disease Research program of the BrightFocus Foundation (no. A20201292F to M.B.M.), the Doris Duke Charitable Foundation Clinical Scientist Development Award (no. 2021183 to M.B.M.), PRMRP Discovery Award (no. W81XWH2010028 to Z.Z.), the Edward R. and Anne G. Lefler Center postdoctoral fellowship (to Z.Z.), and grant nos. R00 AG054748 (to M.A.L.), R01 AG070921 (to C.A.W.) and R01NS032457 and U01MH106883 (to P.J.P. and C.A.W.), and the Allen Discovery Center program, a Paul G. Allen Frontiers Group advised program of the Paul G. Allen Family Foundation (to C.A.W.). C.A.W. is an investigator at the Howard Hughes Medical Institute. The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

L.J.L. conceived and implemented SCAN2. C.A.W., M.B.M. and Z.Z. conceived the application of PTA to single neurons. P.J.P. and C.A.W. conceived and supervised the overall project. L.J.L. and Y.Z. performed computational analysis. L.J.L., M.B.M. and Z.Z. analyzed and interpreted results, and wrote the manuscript. All authors reviewed and edited the manuscript. M.B.M., Z.Z., J.G., S.B., S.K. and M.A.L. collected tissue specimens, isolated single neuronal nuclei and performed PTA amplification and amplification quality control studies. C.L.B. and L.J.L. performed LiRA analysis and comparisons to SCAN2. T.H. and C.L. generated mESCs. J.I.G. conceived and performed the mESC kindred experiment. C.L.B., A.G. and J.K. collected and processed all sequencing data. D.G. and H.J. made suggestions for signature analysis. C.G. and J.W. provided PTA reagents and advice on optimal use.

Corresponding authors

Correspondence to Christopher A. Walsh or Peter J. Park.

Ethics declarations

Competing interests

The authors declare the following competing interests: C.G. is Director and cofounder and J.W. is CEO and cofounder of Bioskryb, Inc., the manufacturer of PTA kits used in the present study. C.A.W. is a consultant for Maze Therapeutics (cash, equity), Third Rock Ventures (cash) and Flagship Pioneering (cash), none of which have any relevance to the present study. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Ruben van Boxtel and Federico Abascal for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Allele balance is not generally correlated between PTA amplifications.

a. Genome-wide allele balance (binned in 100 kb windows) for 3 typical PTA cells from the same individual. b. Allele balance for cells in (a) plotted against each other. c-d. Allele balance averaged across the cohort of 52 PTA cells (c) or 75 MDA cells (d); that is, each point represents the average allele balance for a single 100 kb window. A small number of regions show consistent allelic imbalance across many amplifications (arrows). e. Correlation of allele balance profiles between all pairs of PTA cells. Correlation is generally low; cells from the same individual show slightly higher correlations; and a single individual (4638) shows an atypically strong correlation.
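
The comparison in panels b and e can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: it assumes that allele balance at a germline heterozygous SNP is the fraction of reads carrying one allele, averages it within 100 kb windows, and uses Pearson correlation over the windows shared by two cells. The function names and the input tuple layout are hypothetical.

```python
import math
from collections import defaultdict

WINDOW = 100_000  # 100 kb bins, as stated in the caption


def binned_allele_balance(sites, window=WINDOW):
    """Average allele fraction per window.

    sites: iterable of (position, ref_reads, alt_reads) at heterozygous SNPs
    (a hypothetical input layout chosen for this sketch).
    """
    per_window = defaultdict(list)
    for pos, ref, alt in sites:
        depth = ref + alt
        if depth == 0:
            continue  # allele balance is undefined without reads
        per_window[pos // window].append(alt / depth)
    return {w: sum(v) / len(v) for w, v in per_window.items()}


def profile_correlation(profile_a, profile_b):
    """Pearson correlation over windows present in both profiles."""
    shared = sorted(set(profile_a) & set(profile_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared windows")
    xs = [profile_a[w] for w in shared]
    ys = [profile_b[w] for w in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)
```

Applying `profile_correlation` to every pair of cells would give a matrix comparable in spirit to panel e, under the assumptions above.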

Extended Data Fig. 2 SCAN2 performance on simulated sSNVs.

sSNVs were simulated using the synthetic diploid (SD) X chromosome approach (Methods). Sensitivity is the fraction of known spike-ins recovered and false positives (FPs) are defined as calls that are neither known spike-ins nor somatic mutations endogenous to the haploid X chromosomes used to create each SD. Each point in a-d represents a single SD simulation with 10-250 spike-ins. a-b. Comparison of SCAN2 and SCAN-SNV sensitivity (a; lines are R loess() fits) and false discovery rates (b; lines are linear regression fits to FDR ~ 1/mutations per Mb). c-d. Comparison to other single cell SNV genotypers. c. Sensitivity vs. false positives per megabase of analyzed sequence. d. False discovery rate vs. the number of spike-ins per megabase. Lines are parameterized by mean sensitivity S and false positive rate per megabase F measured across all points: FDR = F / (F + xS). SCcaller standard uses a calling threshold of α = 0.05 while stringent calling uses α = 0.01. e-f. Performance of SCAN2 mutation signature-based rescue as a function of the number of sSNVs available for learning the true mutation signature. Sensitivity (e) and false discovery rate (f) are shown relative to the sensitivity or false discovery rate of the same SD simulation using the maximum sSNV catalog of 4,666 sSNVs. ε = 0.0001 was added to all quantities to avoid division by zero. Solid lines are fitted by R’s loess() function. g. Effect of mutation signature of spike-ins on SCAN2 sensitivity. Each point is the average sensitivity of 9 SD simulations with 1000 spike-ins from a single COSMIC SBS signature. Mutation signatures are characterized by their similarity to the PTA SNV artifact signature. Solid line: linear regression on all points except PTAerr. SBS30 (h) is the most similar COSMIC signature to the PTA SNV artifact signature (PTAerr) (i).
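
The quantities defined in this caption can be written down directly. The following Python sketch is an illustration under stated assumptions, not SCAN2 code: calls, spike-ins and endogenous mutations are represented as sets of hypothetical site identifiers, and the function names are invented for this example. The last function evaluates the curve FDR = F / (F + xS) used in panel d.

```python
def sensitivity(calls, spike_ins):
    """Fraction of known spike-ins recovered by the caller."""
    if not spike_ins:
        raise ValueError("need at least one spike-in")
    return len(calls & spike_ins) / len(spike_ins)


def false_positives(calls, spike_ins, endogenous):
    """Calls that are neither spike-ins nor endogenous somatic mutations."""
    return calls - spike_ins - endogenous


def expected_fdr(spike_ins_per_mb, mean_sensitivity, fp_per_mb):
    """FDR = F / (F + x * S).

    x: spike-ins per megabase, S: mean sensitivity,
    F: false positives per megabase.
    """
    true_calls_per_mb = spike_ins_per_mb * mean_sensitivity
    total = fp_per_mb + true_calls_per_mb
    return fp_per_mb / total if total > 0 else 0.0
```

Because F and S are held fixed, the modeled FDR falls as the number of true mutations per megabase rises, which is the shape of the lines in panel d.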

Extended Data Fig. 3 Mutation spectra of SCAN2 and LiRA calls on kindred mouse ESC cells.

a-b. SBS spectra of somatic SNVs called in 4 single cells from the untreated clone. C>A mutations (blue peaks) are characteristic of COSMIC SBS18 and the mutation signature of SNVs acquired during clonal expansion (ref. 5). These peaks persist in the clonally unsupported SNVs (b), suggesting that the method for classifying true positives is overly conservative. c. Spectra for SNVs called in the 4 single cells taken from an aristolochic acid (AAI)-treated clone.

Extended Data Fig. 4 SCAN2 performance on simulated somatic indels.

a-c. SCAN2 and other callers were applied to simulated indels using the synthetic diploid (SD) X chromosome spike-in approach (Methods). SDs received 10, 25 or 50 indel spike-ins each, which correspond, respectively, to genome-wide burdens of approximately 170 (intermediate), 430 (high) and 850 (very high) somatic indels. Performance was measured by the average number of indels called per SD (a), the fraction of false positives per indel call set (b) and the fraction of spike-ins recovered (c). Tested methods were SCAN2 (with and without signature-based rescue), GATK HaplotypeCaller, GATK HaplotypeCaller with filtration by SCAN2’s cross-sample recurrent artifact filter and an adaptation of SCAN-SNV’s somatic SNV discovery approach to indels. Boxplot whiskers, the furthest outlier ≤1.5 times the interquartile range from the box; box, 25th and 75th percentiles; centre bar, median; n=9 SDs per boxplot. d. Distribution of indel lengths among all simulated indels (black) and VAF-based SCAN2 indel calls (red). e. Spike-in indel sensitivity by length for VAF-based SCAN2 calls. f. Sensitivity for VAF-based SCAN2 indel calling stratified by the 83-dimensional indel classification scheme used by COSMIC indel signatures (ID83). Dotted outlines: sensitivity before applying cross-subject filtration. g. ID83-stratified indel sensitivity for SCAN2 calls with signature-based rescue.

Extended Data Fig. 5 Comparison of SCAN2 and LiRA sSNV calls on human neurons.

Single human neurons were analyzed by LiRA (ref. 15), a specific but lower-sensitivity approach for calling somatic SNVs. a-b. SCAN2 and LiRA extrapolations for the total (not called) sSNV burden per diploid Gb of human sequence from MDA- (a) and PTA-amplified (b) single neurons. Solid lines: y=x. c. Linear regression estimates for the number of sSNVs accumulated per neuron per year from several sources and analyses. Horizontal bars represent 95% C.I.s produced by confint applied to an lmer fit by the lme4 R package; centre points from fixef applied to the same fits. (1) LiRA rates taken from ref. 6, which used a larger set of n=91 MDA-amplified PFC neurons; (2) LiRA rates taken from ref. 6 using n=73 of the 75 MDA-amplified PFC neurons from subjects analyzed in this study (the two excluded neurons are 5087pfc-Rp3C5, an extreme outlier, and 4638-MDA-14); (3) rerun of LiRA on n=74 MDA-amplified neurons in (2) using the same input provided to SCAN2; (4) SCAN2 on n=74 MDA-amplified neurons; (5) LiRA on n=34 PTA-amplified neurons from donors also analyzed in ref. 6 (N.B. LiRA’s higher rate estimate in (c) occurs despite lower burden estimations in (b) due to differences in model intercepts: SCAN2 intercept=95.83, LiRA intercept=17.63); (6) SCAN2 on all n=52 PTA-amplified neurons generated here. d. LiRA classification of SCAN2 calls where reads linked to nearby germline heterozygous SNPs are available (black: likely true sSNVs, red: possible false positives). PASS is the highest quality LiRA class. UNCERTAIN_CALL and LOW_POWER indicate a lack of linking reads to make a confident call, but no evidence of artifactual status is detected. All other classes (red) are interpreted as false positives. Percentages show the fraction of all false positive classes among SCAN2 calls. e-f. Raw mutation spectra for SCAN2 calls without (e) and with (f) mutation signature-based calling, stratified by LiRA classification. The similarities between PASS and the two lower quality UNCERTAIN_CALL and LOW_POWER classes suggest that the majority of UNCERTAIN_CALL and LOW_POWER SCAN2 calls are true mutations. Confident false positives (FILTERED_FPs) possess a C>T-dominated signature that lacks C>Ts at CpGs.
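
The per-year rates in panel c come from regressing sSNV burden on age. The Python sketch below is a deliberately simplified stand-in: it fits ordinary least squares (burden = intercept + rate × age) and omits the per-donor random effect that the lme4 `lmer` fits described in the caption would include. The function name is hypothetical and no confidence interval is computed.

```python
def fit_accumulation_rate(ages, burdens):
    """Ordinary least squares fit of burden = intercept + rate * age.

    Simplification: a plain linear fit, not the mixed-effects model
    (lmer) described in the caption.
    Returns (rate_per_year, intercept).
    """
    n = len(ages)
    if n < 2 or n != len(burdens):
        raise ValueError("need two or more paired observations")
    mean_age = sum(ages) / n
    mean_burden = sum(burdens) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("ages must not all be equal")
    sxy = sum((a - mean_age) * (b - mean_burden)
              for a, b in zip(ages, burdens))
    rate = sxy / sxx
    intercept = mean_burden - rate * mean_age
    return rate, intercept
```

The intercept matters when comparing methods: as the caption notes, two fits can disagree on the rate while agreeing less on absolute burden because their intercepts differ.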

Extended Data Fig. 6 Somatic indel mutation spectra in human neurons and other cells.

a. Spectrum of 1541 indels from PTA neurons from this study, same as Fig. 4e. b-d. Somatic indel spectra from other studies: clonally expanded single skeletal muscle stem cells (b), clonally expanded single kidney (excluding hypermutated kidney cells, designated KT2 in the original study), epidermis and fat cells (c) and clonally expanded bronchial epithelial cells from children and never-smokers (d). e. COSMIC signatures with clock-like or age-associated annotations. f. Non-aging COSMIC signatures with >5% contribution to single neurons. g. Per-neuron COSMIC signature fits, corrected for ID83 sensitivity (Methods). Correlation (ρ) between age and exposure and P-value of two-sided t-test for correlation=0 (p) are shown for each COSMIC signature. P-values were not adjusted for multiple comparisons. Colors correspond to subject IDs as shown in Fig. 4. Note that y-axes are not the same scale.

Extended Data Fig. 7 PTA sensitivity over genomic regions for SNVs and indels.

a. Absolute sensitivity for spatial measurements that divide the genome into roughly equally sized deciles (median GTEx expression for a single tissue type, brain BA9 prefrontal cortex, and phyloP 100way conservation). b-c. Relative sensitivities: sensitivity inside the tested region divided by sensitivity in the complement of that region. Enhancers and promoters from Nott et al. 2019, ATAC-seq from Hauberg et al. 2020, DNA repair hotspots from Wu et al. 2021 and Reid et al. 2021, H3K27ac peaks from Roadmap Epigenomics. Each point represents one PTA neuron; crosses represent the 7 PTA neurons sequenced to 60x, circles represent 30x depth samples. Boxplot whiskers, the furthest outlier ≤1.5 times the interquartile range from the box; box, 25th and 75th percentiles; centre bar, median.
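
The relative sensitivity in panels b-c is a ratio of two detection fractions. A minimal Python sketch follows, assuming each known mutation is annotated with whether it lies in the tested region and whether it was detected. The input layout and function name are hypothetical, and how the authors obtained per-region sensitivity is not specified in this caption.

```python
def relative_sensitivity(known_mutations):
    """Sensitivity inside a region divided by sensitivity outside it.

    known_mutations: iterable of (in_region, detected) booleans,
    one pair per known mutation (hypothetical layout).
    """
    counts = {True: [0, 0], False: [0, 0]}  # in_region -> [detected, total]
    for in_region, detected in known_mutations:
        bucket = counts[bool(in_region)]
        bucket[0] += int(bool(detected))
        bucket[1] += 1
    in_hit, in_total = counts[True]
    out_hit, out_total = counts[False]
    if in_total == 0 or out_total == 0 or out_hit == 0:
        raise ValueError("sensitivity undefined inside or outside the region")
    return (in_hit / in_total) / (out_hit / out_total)
```

A value near 1 means that detection power is similar inside and outside the region, so an observed enrichment there is less likely to be explained by sensitivity differences alone.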

Extended Data Fig. 8 ChromHMM states and neuronal mutations.

Enrichment analysis of ChromHMM states from 127 tissues from the Roadmap Epigenomics Project. Active regions include 1_Tss, 4_Tx, 5_TxWk, 6_EnhG and 7_Enh; inactive states include 9_Het and 14_ReprPCWk. Red points, brain tissue regardless of significance level; black points, non-brain tissue; grey points, enrichment not significant at the P < 0.1 level. No correction for multiple hypothesis testing was applied.

Extended Data Fig. 9 Patterns of mutation enrichment persist at increasing sequencing depth thresholds.

Analyses presented in Fig. 5 rerun using mutations supported by at least 10, 15, 20, 25 and 30 reads; permutations used for enrichment analysis are also restricted to the subset of the genome with the corresponding sequencing depth. GABA, GABAergic neurons; GLU, glutamatergic neurons; OLIG, oligodendrocytes; MGAS, microglia and astrocytes. Error bars: 95% bootstrapping confidence intervals. For panels a-d, each plot presents an analysis at one depth cutoff; for panels e-i, each plot contains the full range of depth cutoffs, as indicated on the x-axis. Error bars in d-i represent bootstrap 95% C.I.s using n=10,000 bootstrap samples; centre points are the observed mutation count divided by the mean mutation count of the bootstrap samples.
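
An enrichment ratio with a bootstrap interval can be sketched as follows. The caption does not fully specify the resampling scheme, so this Python example shows one common approach (a percentile bootstrap over observed mutations) and should not be read as the authors' procedure. The function name, the 0/1 flag input and the `expected_count` argument (for example, the mean overlap count across permuted mutation sets) are assumptions made for illustration.

```python
import random


def enrichment_with_bootstrap(in_region, expected_count, n_boot=10_000, seed=0):
    """Observed/expected overlap ratio with a 95% percentile bootstrap interval.

    in_region: list of 0/1 flags, one per observed mutation.
    expected_count: overlap count expected under the null (assumed given).
    Returns (ratio, ci_low, ci_high).
    """
    if not in_region:
        raise ValueError("need at least one mutation")
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(in_region)
    ratios = []
    for _ in range(n_boot):
        resampled = rng.choices(in_region, k=n)  # sample with replacement
        ratios.append(sum(resampled) / expected_count)
    ratios.sort()
    low = ratios[int(0.025 * n_boot)]
    high = ratios[min(n_boot - 1, int(0.975 * n_boot))]
    return sum(in_region) / expected_count, low, high
```

To mirror the depth-threshold analysis, both the flags and `expected_count` would need to be recomputed on the subset of the genome that passes each depth cutoff.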

Supplementary information

Supplementary Information

Supplementary Figs. 1–11, Note and Tables 1–3.

Reporting summary

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Luquette, L.J., Miller, M.B., Zhou, Z. et al. Single-cell genome sequencing of human neurons identifies somatic point mutation and indel enrichment in regulatory elements. Nat Genet 54, 1564–1571 (2022).
