Somatic mosaicism reveals clonal distributions of neocortical development


The structure of the human neocortex underlies species-specific traits and reflects intricate developmental programs. Here we sought to reconstruct processes that occur during early development by sampling adult human tissues. We analysed neocortical clones in a post-mortem human brain through a comprehensive assessment of brain somatic mosaicism, whose variants act as neutral lineage recorders1,2. We combined the sampling of 25 distinct anatomic locations with deep whole-genome sequencing in a neurotypical deceased individual and confirmed results with 5 samples collected from each of three additional donors. We identified 259 bona fide mosaic variants from the index case, then deconvolved distinct geographical, cell-type and clade organizations across the brain and other organs. We found that clones derived after the accumulation of 90–200 progenitors in the cerebral cortex tended to respect the midline axis, well before the anterior–posterior or ventral–dorsal axes, representing a secondary hierarchy following the overall patterning of forebrain and hindbrain domains. Clones across neocortically derived cells were consistent with a dual origin from both dorsal and ventral cellular populations, similar to rodents, whereas the microglia lineage appeared distinct from other resident brain cells. Our data provide a comprehensive analysis of brain somatic mosaicism across the neocortex and demonstrate cellular origins and progenitor distribution patterns within the human brain.

Fig. 1: Mosaic variants in a human neocortex are mostly lateralized and region specific.
Fig. 2: Overlapping geoclone AFs reveal evidence of cellular bottlenecks within the neocortical anlage.
Fig. 3: Brain-derived cell types of the cortex separate along the midline in early development.
Fig. 4: Single-nucleus genotyping of mosaic variants resolves cellular lineage.

Data availability

Raw WGS and MPAS (MPAS/snMPAS) data are available through the NDA (NDA study 919) for ID01 and through SRA (PRJNA736951) for ID02–04. Raw ChIP–seq reads are available on SRA (PRJNA736951). The 300× WGS panel of normals is available on SRA (PRJNA660493). Summary tables of the data are included as Supplementary Data 14.

Code availability

Details and code for the data processing and annotation are provided at GitHub (


  1. Freed, D., Stevens, E. L. & Pevsner, J. Somatic mosaicism in the human genome. Genes 5, 1064–1094 (2014).

  2. Woodworth, M. B., Girskis, K. M. & Walsh, C. A. Building a lineage from single cells: genetic techniques for cell lineage tracking. Nat. Rev. Genet. 18, 230–244 (2017).

  3. D’Gama, A. M. & Walsh, C. A. Somatic mosaicism and neurodevelopmental disease. Nat. Neurosci. 21, 1504–1514 (2018).

  4. Bae, T. et al. Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis. Science 359, 550–555 (2018).

  5. Ye, A. Y. et al. A model for postzygotic mosaicisms quantifies the allele fraction drift, mutation rate, and contribution to de novo mutations. Genome Res. 28, 943–951 (2018).

  6. Machiela, M. J. & Chanock, S. J. The ageing genome, clonal mosaicism and chronic disease. Curr. Opin. Genet. Dev. 42, 8–13 (2017).

  7. Kalhor, R. et al. Developmental barcoding of whole mouse via homing CRISPR. Science 361, eaat9804 (2018).

  8. Bowling, S. et al. An engineered CRISPR-Cas9 mouse line for simultaneous readout of lineage histories and gene expression profiles in single cells. Cell 181, 1410–1422 (2020).

  9. Lawson, A. R. J. et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 370, 75–82 (2020).

  10. Li, R. et al. Macroscopic somatic clonal expansion in morphologically normal human urothelium. Science 370, 82–89 (2020).

  11. Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).

  12. Coorens, T. H. H. et al. Extensive phylogenies of human development inferred from somatic mutations. Nature 597, 387–392 (2021).

  13. Park, S. et al. Clonal dynamics in early human embryogenesis inferred from somatic mutation. Nature 597, 393–397 (2021).

  14. Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).

  15. Rakic, P. Neurons in rhesus monkey visual cortex: systematic relation between time of origin and eventual disposition. Science 183, 425–427 (1974).

  16. Bergles, D. E. & Richardson, W. D. Oligodendrocyte development and plasticity. Cold Spring Harb. Perspect. Biol. 8, a020453 (2015).

  17. Bayraktar, O. A., Fuentealba, L. C., Alvarez-Buylla, A. & Rowitch, D. H. Astrocyte development and heterogeneity. Cold Spring Harb. Perspect. Biol. 7, a020362 (2014).

  18. Gao, P., Sultan, K. T., Zhang, X. J. & Shi, S. H. Lineage-dependent circuit assembly in the neocortex. Development 140, 2645–2655 (2013).

  19. Marin, O. & Rubenstein, J. L. A long, remarkable journey: tangential migration in the telencephalon. Nat. Rev. Neurosci. 2, 780–790 (2001).

  20. Lim, L., Mi, D., Llorca, A. & Marin, O. Development and functional diversification of cortical interneurons. Neuron 100, 294–313 (2018).

  21. Prinz, M., Jung, S. & Priller, J. Microglia biology: one century of evolving concepts. Cell 179, 292–311 (2019).

  22. Walsh, C. & Cepko, C. L. Clonal dispersion in proliferative layers of developing cerebral cortex. Nature 362, 632–635 (1993).

  23. Walsh, C. & Cepko, C. L. Widespread dispersion of neuronal clones across functional regions of the cerebral cortex. Science 255, 434–440 (1992).

  24. Gao, P. et al. Deterministic progenitor behavior and unitary production of neurons in the neocortex. Cell 159, 775–788 (2014).

  25. Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).

  26. Lodato, M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015).

  27. Huang, A. Y. et al. Distinctive types of postzygotic single-nucleotide mosaicisms in healthy individuals revealed by genome-wide profiling of multiple organs. PLoS Genet. 14, e1007395 (2018).

  28. Bizzotto, S. et al. Landmarks of human embryonic development inscribed in somatic mutations. Science 371, 1249–1253 (2021).

  29. Rodin, R. E. et al. The landscape of somatic mutation in cerebral cortex of autistic and neurotypical individuals revealed by ultra-deep whole-genome sequencing. Nat. Neurosci. 24, 176–185 (2021).

  30. Wang, Y. et al. Comprehensive identification of somatic nucleotide variants in human brain tissue. Genome Biol. 22, 92 (2021).

  31. Yang, X. et al. Developmental and temporal characteristics of clonal sperm mosaicism. Cell 184, 4772–4783 (2021).

  32. Breuss, M. W. et al. Autism risk in offspring can be assessed through quantification of male sperm mosaicism. Nat. Med. 26, 143–150 (2020).

  33. Dou, Y. et al. Accurate detection of mosaic variants in sequencing data without matched controls. Nat. Biotechnol. 38, 314–319 (2020).

  34. Nott, A. et al. Brain cell type-specific enhancer-promoter interactome maps and disease-risk association. Science 366, 1134–1139 (2019).

  35. Wang, C. F. et al. Lhx2 expression in postmitotic cortical neurons initiates assembly of the thalamocortical somatosensory circuit. Cell Rep. 18, 849–856 (2017).

  36. Kriegstein, A. & Alvarez-Buylla, A. The glial nature of embryonic and adult neural stem cells. Annu. Rev. Neurosci. 32, 149–184 (2009).

  37. Ginhoux, F. & Garel, S. The mysterious origins of microglia. Nat. Neurosci. 21, 897–899 (2018).

  38. Hevner, R. F. Layer-specific markers as probes for neuron type identity in human neocortex and malformations of cortical development. J. Neuropathol. Exp. Neurol. 66, 101–109 (2007).

  39. Huang, A. Y. et al. Parallel RNA and DNA analysis after deep sequencing (PRDD-seq) reveals cell type-specific lineage patterns in human brain. Proc. Natl Acad. Sci. USA 117, 13886–13895 (2020).

  40. Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).

  41. Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214 (2007).

  42. Takaoka, K. & Hamada, H. Cell fate decisions and axis determination in the early mouse embryo. Development 139, 3–14 (2012).

  43. Rossant, J. & Tam, P. P. Blastocyst lineage formation, early embryonic asymmetries and axis patterning in the mouse. Development 136, 701–713 (2009).

  44. Levin, M. Left-right asymmetry in embryonic development: a comprehensive review. Mech. Dev. 122, 3–25 (2005).

  45. Burdine, R. D. & Schier, A. F. Conserved and divergent mechanisms in left-right axis formation. Genes Dev. 14, 763–776 (2000).

  46. King, T. & Brown, N. A. Embryonic asymmetry: the left side gets all the best genes. Curr. Biol. 9, R18–R22 (1999).

  47. Kessaris, N. et al. Competing waves of oligodendrocytes in the forebrain and postnatal elimination of an embryonic lineage. Nat. Neurosci. 9, 173–179 (2006).

  48. Molho-Pessach, V. & Schaffer, J. V. Blaschko lines and other patterns of cutaneous mosaicism. Clin. Dermatol. 29, 205–225 (2011).

  49. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).

  50. Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).

  51. Huang, A. Y. et al. MosaicHunter: accurate detection of postzygotic single-nucleotide mosaicism through next-generation sequencing of unpaired, trio, and paired samples. Nucleic Acids Res. 45, e76 (2017).

  52. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  53. Heinz, S. et al. Transcription elongation can affect genome 3D structure. Cell 174, 1522–1536 (2018).

  54. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).

  55. ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).

  56. Canela, A. et al. Genome organization drives chromosome fragility. Cell 170, 507–521 (2017).

  57. Hansen, R. S. et al. Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc. Natl Acad. Sci. USA 107, 139–144 (2010).

  58. Griffiths, R. C. & Tavare, S. Sampling theory for neutral alleles in a varying environment. Phil. Trans. R. Soc. Lond. B 344, 403–410 (1994).

  59. Popic, V. et al. Fast and scalable inference of multi-sample cancer lineages. Genome Biol. 16, 91 (2015).

We thank the individuals who donate their bodies and tissues for the advancement of research; S. Lee, C. Zhu and I. Tang for feedback; D. Weinberger, J. Kleinman, T. Hyde and R. Narukar for the samples; and R. Sinkovits, A. Majumdar and S. Strande at the San Diego Supercomputer Center. Sequencing was supported by the Rady Children’s Institute for Genomic Medicine and the UCSD Institute for Genomic Medicine. M.W.B. was supported by an EMBO Long-Term Fellowship (no. ALTF 174-2015), the Marie Curie Actions of the European Commission (nos LTFCOFUND2013 and GA-2013-609409) and an Erwin Schrödinger Fellowship by the Austrian Science Fund (no. J 4197-B30). This study was supported by grants to J.G.G. from the Howard Hughes Medical Institute and NIMH (1U01 MH108898, R01 MH124890 and R21 AG070462), to C.K.G. from NIA (RF1 AG061060-02, R01 AG056511-02 and R01 NS096170-04), and by the UC San Diego IGM Genomics Center (S10 OD026929). A.N. was supported by the UK Dementia Research Institute, which receives its funding from UK DRI Ltd, funded by the UK Medical Research Council, Alzheimer’s Society and Alzheimer’s Research UK.

Author information

Authors and Affiliations




M.W.B., X.Y., J.C.M.S., D.A. and J.G.G. conceived the project and designed the experiments. M.W.B., X.Y., J.C.M.S., A.J.L., C.C., G.C., Q.S., T.F.N., S.O., M.A.H., A. Nott and M.P.P. performed the experiments. X.Y., D.A., X.X., M.W.B., J.C.M.S., A. Nguyen and B.C. performed the bioinformatics and data analyses. M.W.B., X.Y., V.S., J.M.-V., S.T.B., S.N., L.V.D.K. and Y.D. organized, handled and sequenced human samples. J.G.G. and C.K.G. provided financial and laboratory resources and supervised the project. M.W.B., X.Y., J.C.M.S., D.A. and J.G.G. wrote the manuscript. All of the authors reviewed the manuscript. A.J.L. and X.X. contributed equally to this work.

Corresponding author

Correspondence to Joseph G. Gleeson.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Distribution and Features of Somatic Variants for ID01.

a, Circos plot of the genomic positions (hg19) of all detected and quantified positive variants. Colors distinguish AFs from different organs; for each variant, the Brain track shows the highest AF across all sequenced bulk brain regions. If a variant was present in both kidneys, the higher of the two AFs is plotted in the kidney track. Bar height: square-root-transformed AF from 0.0 to 0.5. Chromosomes are indicated by number or by ‘X’. Overall, no clustering of the 259 variants was observed across the genome. b, Fraction of variants located in different genomic regions for the six categories based on tissue distribution. Categories of genomic regions are described in Methods. 95% permutation intervals were calculated from 10,000 random permutations of the same number of variants as in each mutation category, drawn from gnomAD (v2.1.1). If the fraction observed for a category fell outside the permutation band, the band was labelled pink. Enrichment across features was largely consistent with random shuffling; the most distinct pattern of enrichment was observed for variants shared across the brain and the organs. c, Relative contribution of the six possible base substitutions, showing an overall C>T predominance. Across the distribution categories, putatively early somatic mutations found across the brain and the organs were most distinct from the other categories, mainly owing to an additional relative increase in C>T mutations (numbers for categories are provided in Fig. 1c).
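The permutation band in panel b can be illustrated with a short sketch. It assumes that each background variant (for example, one drawn from gnomAD v2.1.1) carries a single genomic-region label and that the band is the central 95% of the fraction of randomly drawn variants falling in one category; the function names and the toy background are hypothetical, and the Methods may match background variants differently.

```python
import random


def permutation_interval(background_labels, n_variants, category,
                         n_perm=10_000, level=0.95, seed=0):
    """Central `level` interval of the fraction of `n_variants` random
    background variants (drawn without replacement) that fall in `category`."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_perm):
        draw = rng.sample(background_labels, n_variants)
        fractions.append(sum(label == category for label in draw) / n_variants)
    fractions.sort()
    tail = (1 - level) / 2
    lower = fractions[int(tail * n_perm)]
    upper = fractions[int((1 - tail) * n_perm) - 1]
    return lower, upper


def outside_band(observed_fraction, band):
    """True when an observed fraction would be flagged (labelled pink)."""
    lower, upper = band
    return observed_fraction < lower or observed_fraction > upper


# Hypothetical background of 1,000 labelled variants, 5% of them exonic.
background = ["exon"] * 50 + ["intron"] * 600 + ["intergenic"] * 350
band = permutation_interval(background, 259, "exon", n_perm=2_000)
print(band, outside_band(0.20, band))
```

Sampling without replacement keeps each random set the same size as the observed category, so the band reflects only the chance variation expected for that many variants.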

Extended Data Fig. 2 The distribution of variants in three additional individuals suggests stochasticity.

a, Neocortical biopsies from the validation cohort (ID02, ID03 and ID04) were taken with an 8 mm punch (Sml: red circle) from prefrontal and temporal areas of each neocortical hemisphere (L-PF, L-T, R-PF and R-T). In addition, one cerebellar biopsy (red circle) was taken per individual (15 biopsies total from 3 individuals). The workflow was separated as shown in Fig. 1b. DNA from each Sml punch underwent 300× WGS, and mosaic variants were identified. Quantification/Analysis: bulk DNA from each punch, as well as FANS-sorted cell populations from one individual (ID02), underwent >3,000× MPAS. b, Distribution of 471 bona fide somatic variants within sampled regions across ID02, ID03 and ID04. Cortical-only variants shared between hemispheres are labelled in red, with their number shown in parentheses. c, Square-root-transformed (sqrt-t) maximal allelic fraction (AFmax). Horizontal lines: median; box: quartiles; whiskers: extent of the data excluding outliers; outliers: points more than 1.5 × the interquartile range beyond the box; n numbers are as labelled in b. d, Number of variants found exclusively in each Sml biopsy (from a total of n = 292). e, As in Fig. 1h, hierarchical clustering of 102 (ID02), 235 (ID03) or 134 (ID04) variants based on pairwise Pearson’s correlation of AFs measured by MPAS. Owing to the sampling strategy, single-tissue variants dominate in ID03 and ID04. ‘Enriched’: present in both biopsies on one side but in only one on the contralateral side; dark grey: ‘non-lateral’, that is, variants present only in the cerebellum. Bottom: highlighted clusters (black triangles) reveal increased correlation within lobes and hemispheres.
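The box-plot convention in panel c can be made concrete as follows. This sketch reads the outlier rule as Tukey's convention (points more than 1.5 × IQR beyond the quartiles) and uses Python's inclusive quartile interpolation; both choices are assumptions, because plotting libraries compute quartiles differently, and the example AFs are hypothetical.

```python
import math
import statistics


def sqrt_af_boxplot(afs):
    """Square-root transform allele fractions, then return Tukey box-plot
    statistics: median, quartiles, whisker ends and 1.5 x IQR outliers."""
    values = sorted(math.sqrt(af) for af in afs)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median,
        "quartiles": (q1, q3),
        # Whiskers span the data that are not outliers.
        "whiskers": (min(inliers), max(inliers)),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


stats = sqrt_af_boxplot([0.0001, 0.0004, 0.0009, 0.0016, 0.0025, 0.25])
print(stats["outliers"])
```

The square-root transform spreads out the many low-AF variants so they remain distinguishable next to the few high-AF ones.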

Extended Data Fig. 3 Patterns of Clonal Spread within Lobes Are Predicted by Immediate Proximity for ID01.

a, Scatter plot as in Fig. 2h for 102 (ID02), 235 (ID03), 134 (ID04) or 471 (ID02, ID03 and ID04 combined) variants and 5 sample pairs where mosaicism was detected. Horizontal red line: separates variants detected in 1 sample from those detected in >1 samples; vertical red line: AF at 0.05; OR = 18.996 (95% CI: 9.276–45.276) and P < 2.2e-16. OR and P value for h and i: two-tailed Fisher’s exact test for count data, based on the measured AF and the number of positive samples for each variant. b, 13 punches (8 punches proximal and 4 punches distal to the central punch) were assessed for all 259 variants from ID01 in 3 representative lobes (L-PF, L-T and R-PF) to measure the degree of AF sharing based on proximity. The lobe is projected onto the checkerboard. The central small biopsy (Sml) used for variant discovery is site ‘g’. Lrg: homogenized remaining lobar tissue, which was also assessed for variants. Sample dropouts in grey. c, Local spread of the variant shown in Fig. 2f, a geoclone restricted to R-PF. d, Local spread of four different variants that were restricted to a single Sml punch from one lobe. Variants identified only within a Sml punch were often evident in one or more adjacent punches but often not in the Lrg tissue, likely as a result of dilution within Lrg even at 3,000× coverage. e–g, AF-based hierarchical clustering of variants and tissues in subsamples in L-T (e), R-PF (f) and L-PF (g). Dark grey: sample dropout. Light grey: not closely correlated with colored boxes. Central punch ‘g’ is marked in red. For each Sml punch, we noted a block of private variants not found in any adjacent punch, suggesting that these variants are geographically restricted; for this reason, clustering did not place the punches adjacent to ‘g’ closest to ‘g’. The most closely related pairs in the hierarchy were adjacent samples (for example, in e, the ‘i’ and ‘l’ block and the ‘c’ and ‘h’ block), although not all adjacent samples show correlated AFs.
The degree of sharing by adjacent clones exceeds random chance (P = 0.0003), as determined by 10,000 random shuffles of the sample labels. h, Spearman’s ρ for pairwise comparisons of the central Sml biopsy ‘g’ with all other analyzed sublobar samples. Although some punches correlate more strongly with ‘g’ than others, the correlation was not directly related to distance, suggesting that, while adjacent samples may have correlated AFs (as seen in e–g), inter-biopsy distance is in general a poor predictor of correlation.
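The odds ratio in panel a comes from a two-tailed Fisher's exact test on a 2 × 2 table. Below is a minimal standard-library sketch; it assumes the table counts variants by AF (above 0.05 or not) and by number of positive samples (>1 or 1), which is one reading of the red lines, and the counts shown are hypothetical. It returns the sample cross-product odds ratio, whereas R's fisher.test reports a conditional maximum-likelihood estimate, so the two odds ratios need not match exactly.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-tailed Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample_odds_ratio, p_value). The p-value sums the probabilities
    of every table with the same margins that is no more likely than the
    observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # The small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)


# Hypothetical counts: rows = AF > 0.05 / AF <= 0.05,
# columns = detected in >1 sample / in 1 sample.
print(fisher_exact_2x2(3, 1, 1, 3))
```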

Extended Data Fig. 4 FANS Isolates Enriched Cellular Populations.

a, Available and MPAS-analyzed sorted populations from cortical areas of ID01. Black: available; white: DNA quantity/quality not sufficient for MPAS analysis. b, UCSC genome browser tracks of H3K27ac for brain cell-type nuclei populations. Representative genes are shown for excitatory neurons (NEFL, encoding neurofilament light), OPCs/oligodendrocytes (OPALIN, encoding oligodendrocytic myelin paranodal and inner loop protein), astrocytes (GJA1, encoding gap junction protein alpha 1) and microglia (CX3CR1, encoding the fractalkine receptor). c, PCA of H3K27ac in nuclei from NeuN+, TBR1+, DLX1+, OLIG2+, NeuN-/LHX2+ and PU.1+ brain populations. d, Heatmap of Pearson’s correlation of H3K27ac ChIP–seq log2(normalized tags + 1) in NeuN+, TBR1+, DLX1+, OLIG2+, LHX2+/NeuN- and PU.1+ cell populations. e, Comparison of H3K27ac ChIP–seq of brain nuclei populations from the post-mortem adult brain of ID01 with nuclei populations from surgically resected pediatric brain. Heatmap of Pearson’s correlation of all H3K27ac ChIP–seq log2(normalized tags + 1) values from cell types in the post-mortem tissue (marked with an asterisk) compared with H3K27ac ChIP–seq data sets from surgically resected brain tissue of pediatric patients34.
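Panels d and e correlate H3K27ac signal after a log2(normalized tags + 1) transform. The sketch below shows that computation for a few hypothetical cell-type profiles; it assumes each profile lists already-normalized tag counts over the same regions in the same order, since the region set and normalization are not specified in this caption.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_correlation_matrix(profiles):
    """Pairwise Pearson correlations of log2(normalized tags + 1)."""
    logged = {name: [math.log2(t + 1) for t in tags]
              for name, tags in profiles.items()}
    return {(p, q): pearson(logged[p], logged[q])
            for p in logged for q in logged}


# Hypothetical normalized tag counts at four regions.
profiles = {"NeuN+": [0, 1, 3, 7], "TBR1+": [0, 3, 15, 63], "PU.1+": [7, 3, 1, 0]}
matrix = log_correlation_matrix(profiles)
print(matrix[("NeuN+", "TBR1+")], matrix[("NeuN+", "PU.1+")])
```

The pseudocount of 1 keeps regions with zero tags defined under the logarithm, and the log transform stops a handful of very strong peaks from dominating the correlation.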

Extended Data Fig. 5 Correlations of AFs in Bulk Tissues and Sorted Populations Highlight Features of Mosaic Variants in ID01.

Correlation plots with hierarchical clustering based on Pearson’s correlation coefficients between AFs measured in different bulk tissues (Bulk) or sorted cellular fractions (Sorted Populations). AFs were assessed by MPAS, and correlations were calculated between all possible pairs of the 259 detected variants, as described in Fig. 1h. Color codes show each variant’s left–right distribution and the bulk tissues in which it was detected. The upper half of each diamond shows the correlation used to determine the order in the lower half. The two correlations show that bulk-sample analysis and sorted-fraction analysis contain overlapping but distinct information. For instance, shared lateralized variants appear in both analyses when ‘Bulk’ is used to cluster, but variants restricted to one sample are mostly absent from ‘Sorted Populations’.

Extended Data Fig. 6 Statistical Modeling Estimates an Effective Population Size of ~90–200 Progenitors prior to Left–right Separation.

a–e, Contour plots similar to Fig. 3i, but for bulk tissues, for informative variants of ID01 (n = 187, a), ID02 (n = 95, b), ID03 (n = 226, c), ID04 (n = 131, d) or ID02, ID03 and ID04 combined (n = 452, e); anterior (PF) and posterior (T) brain regions are labelled Ant and Pos. Arrows in e indicate the continuous distribution between anterior and posterior, but not between left and right, as in Fig. 3i. f, Normalized difference of mosaic variant average AF (Rmean − Lmean) of sorted brain-derived cells (that is, non-PU.1+) of ID01 between the left and right hemispheres (Normalized Δ; see Methods) and the negative log10 P value comparing individual values from both hemispheres (two-way ANOVA for side, with side and sorted cell type as the two independent variables; Bonferroni-corrected). Marker size, fill color and edge color indicate a variant’s AFmax, significant lateralization and P < 10^-10, respectively. A variant was considered enriched if P < 0.05 and its Normalized Δ was below −0.5 or above 0.5. g, Allelic fractions of variants enriched in either hemisphere of ID01. X-axis as in a; y-axis: AFmax of the variant. Color indicates enrichment as in f. h, Red: cells with variants that arose at very early developmental stages, before brain lateralization; these are distributed differently in the two hemispheres and potentially shared by non-brain tissues. Blue: cells with variants that arose after the left–right split and are detected only in one hemisphere. i, AFs quantified from the left and right hemispheres for the red variants: the larger the starting population at the time of the left–right separation, the smaller the expected AF difference. j, AFs quantified for fully lateralized variants: the smaller the population immediately after the left–right separation, the higher the AF expected for lateralized variants.
k, Example variant of ID01 used to estimate the maximal effective population size supported by the observed difference between left and right (95% bands of a hypergeometric distribution are plotted in black). Blue and red dashed lines: average AF measured in each hemisphere. Green line: upper bound of the estimated starting population. l, Upper bound of the starting population estimated from all variants of ID01 that were shared between both hemispheres, with non-brain organs, or both, suggesting that they were present before the left–right split. The 5th percentile across all estimated variants was 211 (grey dashed line), and the lowest estimate was 160. m, Minimum starting population estimated from all variants of ID01 unique to one hemisphere; the smallest estimate was 86 (black dashed line). Together, these estimates place the effective founder population prior to the left–right separation at 86–211 progenitors.
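The reasoning in panels i and k can be sketched with a hypergeometric model. The sketch rests on illustrative assumptions, not on the Methods: at the left–right split a pool of N progenitors is divided at random into two equal halves, the variant is heterozygous (so a tissue AF is half the carrier-cell fraction), and each hemisphere's AF mirrors its founders. It scans N and returns the largest value for which the observed left-hemisphere AF still lies inside the central 95% band; larger pools would make the observed left–right difference improbable. All function names are hypothetical.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_band(total, carriers, draws, level=0.95):
    """Central `level` band of the number of carriers among `draws` cells
    sampled without replacement from `total` cells."""
    tail = (1 - level) / 2
    lo_k = max(0, draws - (total - carriers))
    hi_k = min(draws, carriers)
    cdf, lower, upper = 0.0, None, None
    for k in range(lo_k, hi_k + 1):
        cdf += exp(log_comb(carriers, k) + log_comb(total - carriers, draws - k)
                   - log_comb(total, draws))
        if lower is None and cdf >= tail:
            lower = k
        if upper is None and cdf >= 1 - tail:
            upper = k
    return lower, (upper if upper is not None else hi_k)


def max_founder_pool(af_left, af_right, n_max=1000, level=0.95):
    """Largest even pool size N consistent with the observed left AF."""
    best = None
    for n in range(2, n_max + 1, 2):
        # With equal halves, the pooled carrier-cell fraction is AF_L + AF_R.
        carriers = round(n * (af_left + af_right))
        if not 0 < carriers <= n:
            continue
        lower, upper = hypergeom_band(n, carriers, n // 2, level)
        # X carriers among the n/2 left-half cells give a left AF of X / n.
        if lower / n <= af_left <= upper / n:
            best = n
    return best


print(max_founder_pool(0.10, 0.20, n_max=400))
```

Under the same illustrative assumptions, panel j's reasoning runs the other way: a variant arising in one of M progenitors of a hemisphere just after the split would be expected at an AF of about 1/(2M) in that hemisphere, so higher AFs of lateralized variants imply smaller post-split pools.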

Extended Data Fig. 7 Individual Geoclones and Overall AF Correlation of Cell Types is consistent with the Detection of Contributing Ventral and Dorsal Clones.

a, Clone from ID01 with NeuN+, OLIG2+ and LHX2+ cells in one right-sided lobe, suggesting a dorsally and ventrally derived clone with restriction along the left–right and anterior–posterior axes. b, Clone from ID01 with OLIG2+ and LHX2+ cells in R-PF, but not observed in NeuN+ cells or in other lobes, suggesting a dorsally derived clone. c, Clone from ID01 with bilateral LHX2+ cells and PU.1+ cells, suggesting an early low-abundance clone that might have been positively selected in both proliferating populations. N: neurons; OG: oligodendrocytes; AC: astrocytes; MG: microglia. d–f, Correlation plots of AFs for sorted populations of NeuN+, OLIG2+ and LHX2+ cells from ID01. Each data point shows one variant for one region where high-quality data (>1,000×) were available; d: n = 522 pairs/118 variants; e: n = 416/117; f: n = 395/115. Although all three cell types showed a significant positive correlation, neurons correlated more strongly with oligodendrocytes than with astrocytes, consistent with current knowledge about cellular origins. g–i, Correlation plots of AFs for sorted populations of PU.1+ cells with NeuN+, OLIG2+, LHX2+ and TBR1+ cells from ID01. Each data point shows one variant for one region where high-quality data (>1,000×) were available; g: n = 134 pairs/82 variants; h: n = 138/86; i: n = 65/65. Overall, correlation is low but highest for astrocytes, likely driven by clonal patterns similar to c. j–n, Clones from ID02 in which samples of the four cortical areas were sorted for NeuN+ and OLIG2+ cells. Examples show a widely distributed clone (j), an enriched clone (k), unilateral clones (l and m) and a clone restricted to one sample (n). o, Correlation plot of AFs for sorted populations of NeuN+ and OLIG2+ cells from ID02. Each data point shows one variant for one region where high-quality data (>1,000×) were available; n = 108 pairs/71 variants.
Spearman’s ρ and two-tailed P value are shown for each pairwise comparison, together with a simple linear regression (one independent variable) with the least-squares estimated mean in the center and 95% error bands, for d–i and o.
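Panels d–i and o report Spearman's ρ alongside a least-squares line. A standard-library sketch of both is below, with tied values given average ranks; the example AF pairs are hypothetical, and the P values that the caption also reports are left to a statistics package (for example, scipy.stats.spearmanr).

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares fit y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


# Hypothetical AFs of the same variants in two sorted populations.
neun = [0.01, 0.03, 0.03, 0.08, 0.12]
olig2 = [0.02, 0.02, 0.05, 0.07, 0.15]
print(spearman_rho(neun, olig2), least_squares_line(neun, olig2))
```

Because Spearman's ρ depends only on ranks, a few high-AF variants cannot dominate it the way they can dominate the fitted line, which is one reason to report both.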

Extended Data Fig. 8 Excitatory Neuron Marker TBR1 and Inhibitory Neuron Marker DLX1 Enable Dissection of Ventral and Dorsal Clone Contribution.

a, TBR1+ sorted nuclei from ID01 show acetylation of H3K27 at promoters specific to excitatory neurons but not at those specific to inhibitory neurons, whereas DLX1+ sorted nuclei from ID01 show acetylation of H3K27 at promoters specific to inhibitory neurons but not at those specific to excitatory neurons. UCSC genome browser tracks for H3K27ac in NeuN+, TBR1+ and DLX1+ populations at loci for excitatory and inhibitory neuronal markers (SLC1A7: excitatory amino acid transporter 5; GRIN2B: glutamate ionotropic receptor NMDA type subunit 2B; TBR1: T-box brain transcription factor 1; GAD2: glutamate decarboxylase 2; SLC6A1: GABA transporter 1; GAD1: glutamate decarboxylase 1). b–d, Correlation plots of AFs for sorted populations of TBR1+ cells with NeuN+, OLIG2+ and LHX2+ cells. Each data point shows one variant for one region where high-quality data (>1,000×) were available; b: n = 137 pairs/89 variants; c: n = 66 pairs/66 variants; d: n = 140 pairs/96 variants. e–h, Correlation plots of AFs for sorted populations of DLX1+ cells with NeuN+, OLIG2+, LHX2+ and TBR1+ cells. Each data point shows one variant for one region where high-quality data (>1,000×) were available; e: n = 139 pairs/88 variants; f: n = 69 pairs/69 variants; g: n = 145 pairs/96 variants; h: n = 147 pairs/94 variants. Available data are from L-T and R-PF only. i–l, Lollipop plots of the AFs in NeuN+, TBR1+ and DLX1+ cells in L-T and R-PF for 1-180856518-T-G, 8-72947366-G-A, 7-80017095-C-T and 2-139753954-C-T. The two hemispheres show distinct patterns for excitatory and inhibitory markers for all of the variants, likely owing to the stochastic seeding of early cortical cell lineages after midline separation. Spearman’s ρ and two-tailed P value are shown for each pairwise comparison, together with a simple linear regression (one independent variable) with the least-squares estimated mean in the center and 95% error bands, for b–h.

Extended Data Fig. 9 BEAST Lineage Tree Confirms Manual Clade Assignment, and UMAP Embedding of Mosaic Variants Suggests That Clade Variants Are Randomly Intermixed.

a, Lineage tree for all considered cells (n = 71) from ID01, built from the filtered mosaic variants detectable in L-T (n = 33). A representative tree was constructed using the maximum clade credibility method; branch colors represent inferred clades. The scale bar represents expected substitutions per site as a function of branch length. b,c, UMAP embeddings of mosaic variants (n = 259) across 79 samples using the considered AF for tissues. In c, variants are colored according to their lateralization, as shown in Fig. 1h. As expected, lateralization segregates variants in this analysis. d, UMAP embedding as in b, but with variants colored according to the clades determined from snMPAS analysis.
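BEAST samples many trees from the posterior, and the maximum clade credibility (MCC) tree is the sampled tree whose clades have the highest product of posterior clade frequencies. The sketch below illustrates only that selection step. It assumes each sampled tree has already been reduced to a set of clades, each clade a `frozenset` of leaf labels (a hypothetical representation); tree sampling, Newick parsing and branch-length annotation are left to BEAST and its companion tools such as TreeAnnotator.

```python
import math
from collections import Counter


def clade_frequencies(trees):
    """Posterior frequency of each clade: the fraction of sampled trees containing it."""
    counts = Counter(clade for tree in trees for clade in set(tree))
    return {clade: c / len(trees) for clade, c in counts.items()}


def mcc_index(trees):
    """Index of the maximum clade credibility tree among the sampled trees.

    A tree's credibility is the product of its clades' posterior frequencies;
    it is scored as a sum of logs to avoid floating-point underflow.
    """
    freq = clade_frequencies(trees)
    scores = [sum(math.log(freq[c]) for c in set(tree)) for tree in trees]
    return max(range(len(trees)), key=lambda i: scores[i])
```

With three toy trees over cells A, B and C, two of which group A with B, the selected tree is one of the A+B trees, because the clade {A, B} has posterior frequency 2/3 against 1/3 for {B, C}.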

Extended Data Fig. 10 Clades Contribute Unequally to Interrogated Tissues and Cell Types.

a, Genotypes of somatic variants determined by snMPAS, together with their AF information from bulk and FANS-sorted samples from MPAS, were used to reconstruct the lineages in ID01. Coloring is based on the manually identified clades (Fig. 4b). Numbers correspond to variant rank (Fig. 4b). This integrated analysis confirms the existence of the clades and determines the lineage contributions of each clade to individual organs and tissues. b, Relative contributions of the variants labelled in each lineage group presented in a were calculated with a linear regression model. The estimation was optimized by minimizing the absolute error, so that the weighted sum of all predicted lineages reflected the AFs measured in the 25 bulk tissues. c, Relative contributions of lineages from each clade in all sorted populations.
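Panel b describes fitting lineage contributions by minimizing absolute rather than squared error. The exact model is not specified here, so the following is only a hypothetical sketch: it assumes a design matrix `X` (rows: observed AFs, columns: lineage groups) and a vector `y` of measured AFs, and estimates non-negative weights that minimize the sum of absolute residuals. It uses cyclic coordinate descent, in which each one-dimensional subproblem is solved exactly by a weighted median; this is a swapped-in technique for illustration, not necessarily the authors' optimizer.

```python
def weighted_median(values, weights):
    """Lower weighted median: a minimizer of sum_i weights[i] * |values[i] - t|."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]


def l1_nonneg_fit(X, y, sweeps=100):
    """Approximately minimize sum_i |y[i] - sum_j X[i][j] * w[j]| subject to w >= 0.

    X and y are hypothetical inputs (rows: observed AFs; columns: lineage groups).
    """
    n_rows, n_cols = len(X), len(X[0])
    w = [0.0] * n_cols
    for _ in range(sweeps):
        for j in range(n_cols):
            rows = [i for i in range(n_rows) if X[i][j] != 0]
            if not rows:
                continue
            # Residual of each row with column j's contribution left out.
            partial = [y[i] - sum(X[i][k] * w[k] for k in range(n_cols) if k != j) for i in rows]
            # |partial - X[i][j] * t| = |X[i][j]| * |partial / X[i][j] - t|, so the
            # 1-D minimizer over t is a weighted median; clamping keeps t >= 0.
            targets = [r / X[i][j] for r, i in zip(partial, rows)]
            w[j] = max(0.0, weighted_median(targets, [abs(X[i][j]) for i in rows]))
    return w
```

For example, `l1_nonneg_fit([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], [0.2, 0.2, 0.9, 0.1, 0.1])` returns `[0.2, 0.1]`: the outlying 0.9 does not shift the first weight, which is the robustness that motivates an absolute-error criterion. Coordinate descent on this non-smooth objective can stall short of the global optimum when columns overlap, so a linear-programming formulation (for example with `scipy.optimize.linprog`) is the more robust choice in practice.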

Supplementary information

Supplementary Information

Supplementary Note, discussing the clinical relevance of our findings, and Supplementary Figs. 1 (FANS for cell type of origin from post-mortem brain tissue) and 2 (sorting of neuronal and non-neuronal single nuclei for snMPAS), describing the gating strategies of our FANS sorting.

Reporting Summary

Peer Review File

Supplementary Data 1

Raw mosaic SNV/INDEL calls from the 300× WGS, including the genomic position, reference and alternative alleles, and caller agreement for each variant. The candidate variant list used for MPAS and snMPAS panel design and the considered regions are included as separate sheets. Quality control metrics of WGS are also included.

Supplementary Data 2

MPAS and snMPAS genotyping and quantification results. Quality control metrics of MPAS are also included.

Supplementary Data 3

Detailed visual representation for each of the 259 variants from ID01. Geoclones for bulk samples and geographic subsamples as well as lolliplot representations for all of the 259 positively detected mosaic variants based on MPAS results. Plots are further explained in Figs. 2 and 3.

Supplementary Data 4

Visual representation of snMPAS results for each of the 259 variants from ID01. Visualized AF for all 259 bona fide mosaic variants in the 95 sorted single nuclei based on snMPAS results. The y axis shows the square-root-transformed AF. Dots and error bars show the calculated AF and an exact binomial 95% CI. The cell ID shows the individual’s ID (ID01), the cell’s origin and whether the cell was NeuN+ (NeuN) or NeuN− (DAPI). Each cell is further identified by its well position (A01–H11). The title of each page gives the variant ID (chromosome-position-reference-alternative).
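An exact binomial 95% CI is conventionally the Clopper-Pearson interval. The dependency-free sketch below assumes AF is estimated as alternate-allele reads divided by total read depth (the function names are hypothetical). It finds each bound by bisection on the binomial tail probabilities, then applies the square-root transform used for the y axis.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_choose = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_choose + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def _bisect_increasing(g, target, iters=100):
    """Find p in [0, 1] with g(p) = target, for g increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) CI for a binomial proportion k / n."""
    # Lower bound solves P(X >= k | p) = alpha / 2, which increases with p.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound solves P(X <= k | p) = alpha / 2, which decreases with p, so negate it.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: -binom_cdf(k, n, p), -alpha / 2)
    return lower, upper


def sqrt_af_with_ci(alt_reads, depth):
    """Square-root-transformed AF and CI bounds, matching the y-axis scale."""
    lower, upper = clopper_pearson(alt_reads, depth)
    return tuple(math.sqrt(v) for v in (alt_reads / depth, lower, upper))
```

For 5 alternate reads out of 10, `clopper_pearson(5, 10)` gives roughly (0.187, 0.813). Working in log space keeps the binomial terms finite at the high read depths typical of targeted sequencing.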


About this article


Cite this article

Breuss, M.W., Yang, X., Schlachetzki, J.C.M. et al. Somatic mosaicism reveals clonal distributions of neocortical development. Nature 604, 689–696 (2022).





