Extensive phylogenies of human development inferred from somatic mutations


Starting from the zygote, all cells in the human body continuously acquire mutations. Mutations shared between different cells imply a common progenitor and are thus naturally occurring markers for lineage tracing (refs. 1,2). Here we reconstruct extensive phylogenies of normal tissues from three adult individuals using whole-genome sequencing of 511 laser capture microdissections. Reconstructed embryonic progenitors in the same generation of a phylogeny often contribute to different extents to the adult body. The degree of this asymmetry varies between individuals, with ratios between the two reconstructed daughter cells of the zygote ranging from 60:40 to 93:7. Asymmetries pervade subsequent generations and can differ between tissues in the same individual. The phylogenies resolve the spatial embryonic patterning of tissues, revealing contiguous patches of, on average, 301 crypts in the adult colonic epithelium derived from a most recent embryonic cell and also a spatial effect in brain development. Using data from ten additional men, we investigated the developmental split between soma and germline, with results suggesting an extraembryonic contribution to primordial germ cells. This research demonstrates that, despite reaching the same ultimate tissue patterns, early bottlenecks and lineage commitments lead to substantial variation in embryonic patterns both within and between individuals.

Fig. 1: Phylogenies of clonal populations.
Fig. 2: Developmental phylogenies and embryonic asymmetries.
Fig. 3: Embryonic mosaicism in tissues and organs.
Fig. 4: Patterns of mutations in early embryogenesis.

Data availability

The DNA sequencing data are deposited in the European Genome-Phenome Archive (EGA) with the accession codes EGAD00001006641 (whole-genome sequencing) and EGAD00001006643 (targeted sequencing).

Code availability

The bespoke R scripts used for analysis and visualization in this study are available online from GitHub (

References

  1. Behjati, S. et al. Genome sequencing of normal cells reveals developmental lineages and mutational processes. Nature 513, 422–425 (2014).

  2. Ju, Y. S. et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature 543, 714–718 (2017).

  3. Sulston, J. E. & Horvitz, H. R. Post-embryonic cell lineages of the nematode, Caenorhabditis elegans. Dev. Biol. 56, 110–156 (1977).

  4. Keller, P. J., Schmidt, A. D., Wittbrodt, J. & Stelzer, E. H. Reconstruction of zebrafish early embryonic development by scanned light sheet microscopy. Science 322, 1065–1069 (2008).

  5. Strnad, P. et al. Inverted light-sheet microscope for imaging mouse pre-implantation development. Nat. Methods 13, 139–142 (2016).

  6. McKenna, A. et al. Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science 353, aaf7907 (2016).

  7. Alemany, A., Florescu, M., Baron, C. S., Peterson-Maduro, J. & van Oudenaarden, A. Whole-organism clone tracing using single-cell sequencing. Nature 556, 108–112 (2018).

  8. Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).

  9. Coorens, T. H. H. et al. Embryonal precursors of Wilms tumor. Science 366, 1247–1251 (2019).

  10. Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).

  11. Coorens, T. H. H. et al. Lineage-independent tumors in bilateral neuroblastoma. N. Engl. J. Med. 383, 1860–1865 (2020).

  12. Park, S. et al. Clonal dynamics in early human embryogenesis inferred from somatic mutation. Nature (2021).

  13. Coorens, T. H. H. et al. Inherent mosaicism and extensive mutation of human placentas. Nature 592, 80–85 (2021).

  14. Lodato, M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015).

  15. Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).

  16. Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).

  17. Moore, L. et al. The mutational landscape of human somatic and germline cells. Nature (2021).

  18. Ye, A. Y. et al. A model for postzygotic mosaicisms quantifies the allele fraction drift, mutation rate, and contribution to de novo mutations. Genome Res. 28, 943–951 (2018).

  19. Spencer Chapman, M. et al. Lineage tracing of human development through somatic mutations. Nature 595, 85–90 (2021).

  20. Kuijk, E. et al. Early divergence of mutational processes in human fetal tissues. Sci. Adv. 5, eaaw1271 (2019).

  21. Kobayashi, T. & Surani, M. A. On the origin of the human germline. Development 145, dev150433 (2018).

  22. Aitken, S. J. et al. Pervasive lesion segregation shapes cancer genome evolution. Nature 583, 265–270 (2020).

  23. Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).

  24. Forsberg, L. A. et al. Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer. Nat. Genet. 46, 624–628 (2014).

  25. Loftfield, E. et al. Predictors of mosaic chromosome Y loss and associations with mortality in the UK Biobank. Sci. Rep. 8, 12316 (2018).

  26. Voet, T., Vanneste, E. & Vermeesch, J. R. The human cleavage stage embryo is a cradle of chromosomal rearrangements. Cytogenet. Genome Res. 133, 160–168 (2011).

  27. Shahbazi, M. N. et al. Developmental potential of aneuploid human embryos cultured beyond implantation. Nat. Commun. 11, 3987 (2020).

  28. Sancho, M. et al. Competitive interactions eliminate unfit embryonic stem cells at the onset of differentiation. Dev. Cell 26, 19–30 (2013).

  29. Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646 (2020).

  30. Olafsson, S. et al. Somatic evolution in non-neoplastic IBD-affected colon. Cell 182, 672–684.e11 (2020).

  31. Robinson, P. S. et al. Elevated somatic mutation burdens in normal human cells due to defective DNA polymerases. Preprint at (2020).

  32. Ellis, P. et al. Reliable detection of somatic mutations in solid tissues by laser-capture microdissection and low-input DNA sequencing. Nat. Protoc. 16, 841–871 (2020).

  33. Lawson, A. R. J. et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 370, 75–82 (2020).

  34. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  35. Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).

  36. Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).

  37. Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).

  38. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).

  39. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 289–300 (1995).

  40. Yoshida, K. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020).

  41. Gerstung, M., Papaemmanuil, E. & Campbell, P. J. Subclonal variant calling with multiple samples and prior knowledge. Bioinformatics 30, 1198–1204 (2014).

  42. Hoang, H. T. et al. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol. Biol. 18, 11 (2018).

  43. Nguyen, L.-T. et al. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2014).

  44. Novelli, M. et al. X-inactivation patch size in human female tissue confounds the assessment of tumor clonality. Proc. Natl Acad. Sci. USA 100, 3311–3314 (2003).

  45. Gori, K. & Baez-Ortega, A. sigfit: flexible Bayesian inference of mutational signatures. Preprint at (2018).

  46. Coorens, T. H. H. et al. Clonal hematopoiesis and therapy-related myeloid neoplasms following neuroblastoma treatment. Blood 137, 2992–2997 (2021).


Acknowledgements

We thank the staff of the Wellcome Sanger Institute Sample Logistics, Genotyping, Pulldown, Sequencing and Informatics facilities for their contribution, especially L. O’Neill, C. Latimer and K. Roberts for their support with sample management and laboratory work; and S. Behjati, Y. S. Ju, S. Park, F. Abascal, J. Ijaz, P. Nicola and G. Collord for helpful discussions or critical review of the manuscript. This experiment was primarily funded by Wellcome (core funding to the Wellcome Sanger Institute and PhD studentship to T.H.H.C.; 203943/Z/16/Z). L.M. is a recipient of a Cancer Research UK (CRUK) Clinical PhD fellowship (C20/A20917) and the Jean Shank/Pathological Society of Great Britain and Ireland Intermediate Research Fellowship (grant reference no. 1175). T.J.M. is supported by CRUK and the Royal College of Surgeons (C63474/A27176). I.M. is funded by CRUK (C57387/A21777) and the Wellcome Trust. R.R. is funded by CRUK (C66259/A27114).

Author information

Contributions

T.H.H.C., L.M., R.R. and M.R.S. conceived the study design. T.H.H.C. wrote the scripts and performed the analyses with help or input from R.S., J.C., M.D.C.N., M.S.C. and I.M. L.M., P.S.R., A.C. and T.R.W.O. performed the microdissections with support from Y.H. M.J.P. and A.R.J.L. called and analysed mitochondrial variants. T.J.M., A.N. and R.C.F. aided in sample procurement. M.R.S. oversaw the study. T.H.H.C. and M.R.S. wrote the manuscript with input from all other authors.

Corresponding author

Correspondence to Michael R. Stratton.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Nuria Lopez-Bigas, Aaron McKenna, David Posada and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Extended data figures and tables

Extended Data Fig. 1 VAF distributions reflect the clonality of LCM samples.

a, Schematic of three different progenitor or stem cell contributions to the eventual sample. Monoclonal samples consist of the progeny of one cell, while oligoclonal and polyclonal samples are derived from a few and from many progenitors, respectively. b–d, VAF histograms and binomial decompositions for a monoclonal (b), oligoclonal (c) and polyclonal (d) sample. The red and blue dashed lines indicate clonal decomposition through a binomial mixture model, with the estimated peak VAF of clones indicated in the legend. The number indicated in the title of each histogram is the SNV burden.
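
The binomial decomposition in b–d can be illustrated with a small expectation-maximization fit. This is a minimal sketch, not the authors' R code: it assumes that each SNV's alt-read count is binomial given its read depth and the peak VAF of the clone it belongs to, and the function names, starting values and simulated sample are hypothetical. In a diploid monoclonal sample, heterozygous mutations present in every cell give a peak near 0.5, so additional lower peaks point to subclonal structure.

```python
import math
import random


def binom_logpmf(k, n, p):
    """Log-probability of k alt reads out of n at allele fraction p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def binomial_mixture_em(alt, depth, init_vafs, n_iter=500, tol=1e-8):
    """Fit a mixture of binomials by EM; returns (peak VAFs, mixture weights)."""
    vafs = list(init_vafs)
    weights = [1.0 / len(vafs)] * len(vafs)
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each clone for each SNV (log-sum-exp for stability)
        resp, ll = [], 0.0
        for k, n in zip(alt, depth):
            logs = [math.log(w) + binom_logpmf(k, n, p) for w, p in zip(weights, vafs)]
            top = max(logs)
            probs = [math.exp(x - top) for x in logs]
            total = sum(probs)
            ll += top + math.log(total)
            resp.append([q / total for q in probs])
        # M-step: update mixture weights and read-weighted peak VAFs
        for j in range(len(vafs)):
            r = [row[j] for row in resp]
            weights[j] = max(sum(r) / len(alt), 1e-12)
            reads = sum(ri * n for ri, n in zip(r, depth))
            if reads > 0:
                vaf = sum(ri * k for ri, k in zip(r, alt)) / reads
                vafs[j] = min(max(vaf, 1e-6), 1 - 1e-6)  # keep log() finite
        if ll - prev_ll < tol:  # log-likelihood has stopped improving
            break
        prev_ll = ll
    return vafs, weights


def simulate_sample(vafs, n_per_clone, depth, seed=1):
    """Hypothetical oligoclonal sample: alt counts drawn binomially at each clone's VAF."""
    rng = random.Random(seed)
    alt, dep = [], []
    for p in vafs:
        for _ in range(n_per_clone):
            alt.append(sum(rng.random() < p for _ in range(depth)))
            dep.append(depth)
    return alt, dep


alt, dep = simulate_sample([0.5, 0.25], n_per_clone=200, depth=30)
peaks, weights = binomial_mixture_em(alt, dep, init_vafs=[0.6, 0.2])
print([round(p, 3) for p in peaks], [round(w, 3) for w in weights])
```

EM only finds a local optimum, so starting one component near 0.5 is a natural choice here; the number of components still has to be chosen separately, for example by comparing fits with different numbers of clones.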

Extended Data Fig. 2 Model of early embryogenesis and ABC.

a, Heat maps showing estimates of the early embryonic bottleneck obtained through approximate Bayesian computation (ABC), with and without the cell death rate as a parameter (Methods). The darkness of the colour indicates how often each bottleneck size occurs among the accepted simulations. b, Estimates of the mutation rate per cell per division before and after zygotic genome activation (ZGA); the dot indicates the mean of the 20,000 accepted simulations and the line spans the 95% confidence interval.
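
The rejection step of ABC behind estimates such as those in b can be sketched as follows. The model is a deliberately simplified assumption, not the one described in Methods: every branch in each of the first `n_gen` generations gains a Poisson number of SNVs, with rate `mu_pre` before a hypothetical ZGA generation `zga_gen` and `mu_post` afterwards. The uniform priors, the two summary statistics and all parameter values are illustrative.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def summary_stats(mu_pre, mu_post, n_branches, n_gen, zga_gen, rng):
    """Mean SNVs per branch before and after the hypothetical ZGA generation.

    Requires 0 < zga_gen < n_gen so that both phases contain branches.
    """
    pre, post = [], []
    for _ in range(n_branches):
        for g in range(n_gen):
            if g < zga_gen:
                pre.append(poisson(rng, mu_pre))
            else:
                post.append(poisson(rng, mu_post))
    return (sum(pre) / len(pre), sum(post) / len(post))


def abc_rejection(observed, n_sims=2000, accept_frac=0.05, n_branches=20,
                  n_gen=4, zga_gen=2, prior_max=5.0, seed=0):
    """Rejection ABC with uniform priors on both rates; returns posterior means."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        # Draw candidate rates from the prior and simulate summary statistics
        mu_pre, mu_post = rng.uniform(0, prior_max), rng.uniform(0, prior_max)
        stats = summary_stats(mu_pre, mu_post, n_branches, n_gen, zga_gen, rng)
        sims.append((math.dist(stats, observed), mu_pre, mu_post))
    sims.sort()  # closest simulations first
    accepted = sims[:max(1, int(accept_frac * n_sims))]
    post_pre = sum(s[1] for s in accepted) / len(accepted)
    post_post = sum(s[2] for s in accepted) / len(accepted)
    return post_pre, post_post, accepted


# Hypothetical observed summaries: 3 SNVs per branch before ZGA, 1 afterwards
est_pre, est_post, accepted = abc_rejection(observed=(3.0, 1.0))
print(round(est_pre, 2), round(est_post, 2), len(accepted))
```

Accepting a fixed fraction of the closest simulations, rather than using a fixed distance cut-off, mirrors the "5% closest" rule described for Extended Data Fig. 4d; the 20,000 accepted simulations in b imply a far larger simulation budget than this toy uses.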

Extended Data Fig. 3 The most recent common ancestors of tissues and completeness of early lineages.

a–d, Phylogenetic trees with unit branch lengths for PD28690, showing the coalescence (red) of all samples from four tissue types: thyroid follicles (a), seminiferous tubules (b), small bowel crypts (c) and bronchial epithelium (d). The most recent common ancestor for all these tissues is the root of the tree. e, Sum of mean VAFs of branches of the same generation per bulk sample in PD28690 (n = 33), PD43850 (n = 1) and PD43851 (n = 2). The solid black line indicates the mean value across tissue samples. A sum of mean VAFs close to 0.5 indicates that all cells belong to one of the lineages of that generation and are accounted for, that is, no lineages are missing from the phylogeny. This is mostly the case for generations 1 and 2, but the lower total VAF of generation 3 indicates missing lineages.
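
The completeness check in e rests on simple arithmetic: in a diploid sample, a heterozygous SNV carried by a fraction f of cells has an expected VAF of f/2, so the mean VAFs of all branches in one generation should add up to 0.5 when every lineage is represented. A minimal sketch, assuming no copy-number changes at the SNV loci; the function name and input values are hypothetical.

```python
def lineage_completeness(branch_vafs_by_generation, expected_total=0.5):
    """For each generation, return (sum of mean VAFs, fraction of cells accounted for).

    Assumes diploid cells and heterozygous branch SNVs without copy-number change,
    so a lineage making up a fraction f of the sample has mean VAF f / 2.
    Sums above 0.5 (from sampling noise) are capped at full coverage.
    """
    report = {}
    for generation, vafs in sorted(branch_vafs_by_generation.items()):
        total = sum(vafs)
        report[generation] = (total, min(total / expected_total, 1.0))
    return report


# Hypothetical mean VAFs of the branches in each generation for one bulk sample
example = {1: [0.30, 0.20], 2: [0.15, 0.15, 0.12, 0.08], 3: [0.10, 0.06, 0.04]}
for gen, (total, covered) in lineage_completeness(example).items():
    print(f"generation {gen}: sum of VAFs = {total:.2f}, cells accounted for = {covered:.0%}")
```

With these made-up numbers, generations 1 and 2 account for all cells, while generation 3 accounts for only 40%, the pattern the legend describes as missing lineages.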

Extended Data Fig. 4 Embryonic patch size in the colon.

a, Kernel-smoothed 2D histogram of the linear distance (in number of crypts) and the number of shared SNVs between any two crypts from the same biopsy. The red line is drawn at a shared SNV burden of 15, above which crypts were taken to be from the same embryonic patch. b, Histogram of the number of SNVs shared between all pairs of crypts showing a bimodal distribution on either side of an SNV burden of 15 (red line). c, Density plot of the prior distribution of the embryonic patch size radius. d, Plot of the radius versus the Euclidean distance in summary statistics between the simulations and our observed data. The red dots indicate the 5% of simulations closest to the observed data, which were accepted. e, Density plot of the prior distribution (dashed line), the posterior distribution from the rejection method (black line) and the posterior distribution from the neural network regression (red line) of the embryonic patch size radius. f, A QQ-plot of the residuals of the neural network regression.
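
The pairing rule in a and b can be turned into patch assignments as sketched below. Merging linked pairs transitively with a union-find structure is our assumption rather than a step stated in the legend; it is reasonable because, on an error-free phylogeny, the number of SNVs shared by crypts A and C is at least the smaller of the numbers shared by A and B and by B and C. The function name, crypt IDs and SNV sets are hypothetical.

```python
from itertools import combinations


def embryonic_patches(crypt_snvs, threshold=15):
    """Group crypts whose pairwise shared-SNV count exceeds `threshold`.

    `crypt_snvs` maps a crypt ID to the set of SNV IDs called in that crypt.
    Linked pairs are merged transitively with a union-find structure.
    """
    parent = {c: c for c in crypt_snvs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    for a, b in combinations(crypt_snvs, 2):
        if len(crypt_snvs[a] & crypt_snvs[b]) > threshold:
            parent[find(a)] = find(b)  # same embryonic patch

    patches = {}
    for c in crypt_snvs:
        patches.setdefault(find(c), []).append(c)
    return sorted(sorted(p) for p in patches.values())


# Hypothetical SNV sets: A and B share 18 SNVs, C shares only 5 with each
crypts = {
    "A": set(range(20)),
    "B": set(range(18)) | {"b1"},
    "C": set(range(5)) | {"c1", "c2"},
}
print(embryonic_patches(crypts))  # [['A', 'B'], ['C']]
```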

Extended Data Fig. 5 Clonal expansions later in life.

a, Phylogenetic tree for appendiceal crypts in PD28690, with annotated cancer driver mutations. An asterisk indicates that the two neighbouring crypts were taken as biological replicates of one another. Within the clade of crypts that acquired the BRAF mutation, the mutation burdens are Poisson distributed, consistent with a molecular clock (P = 0.99, dispersion test). Accordingly, we can estimate that the BRAF mutation was acquired before 23 years of age. b, c, Phylogeny (b) and sampling overview (c) for prostatic acini in PD28690, showing widespread benign prostatic hyperplasia in one biopsy. d, Histology and sampling overview alongside the phylogeny for a microscopic polyp in the colon of PD28690. e, Phylogeny of seminiferous tubules from PD42034, where a frameshift deletion in MEIOB was acquired after only six post-zygotic SNVs. Parts of the figure are composed of pictures from Servier Medical Art ( Servier Medical Art by Servier is licensed under a Creative Commons Attribution 3.0 Unported License (
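
The molecular-clock check in a relies on a dispersion test. A standard-library sketch of the classical Poisson index-of-dispersion test follows, assuming the one-sided upper-tail form (a test for overdispersion); the function names and mutation burdens are hypothetical. A large P value, such as the P = 0.99 reported above, means the burdens vary no more than a Poisson process would predict.

```python
import math


def regularized_lower_gamma(a, x, max_iter=100_000, tol=1e-15):
    """P(a, x) via its power series; intended for the moderate arguments used here."""
    if x <= 0:
        return 0.0
    term = total = 1.0 / a
    for n in range(1, max_iter):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a)))


def poisson_dispersion_test(burdens):
    """Return (D, p) with D = sum((x - mean)^2) / mean against chi-square(n - 1)."""
    n = len(burdens)
    if n < 2:
        raise ValueError("need at least two burdens")
    mean = sum(burdens) / n
    if mean == 0:
        return 0.0, 1.0
    d = sum((x - mean) ** 2 for x in burdens) / mean
    # Upper tail of chi-square(n - 1): 1 - P((n - 1) / 2, D / 2)
    p = 1.0 - regularized_lower_gamma((n - 1) / 2, d / 2)
    return d, p


# Hypothetical SNV burdens of crypts in one clade
d, p = poisson_dispersion_test([20, 21, 19, 20, 22, 18])
print(round(d, 2), round(p, 3))
```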

Extended Data Fig. 6 Decomposition of polyclonal samples.

Phylogenetic trees with unit branch lengths for four polyclonal samples of the epidermis from PD28690, showing the contribution (blue) of early embryonic progenitors in the phylogeny to the sample. These samples were not used for the reconstruction of the phylogeny because of their lack of a dominant clone, but can still be decomposed into the contributing embryonic lineages that give rise to these polyclonal aggregates. For example, ‘SKN2_D2’, while somatically polyclonal, seems to be derived from a single early lineage.

Extended Data Fig. 7 Targeted resequencing in PD28690.

Cladogram of PD28690 with contribution to 84 bulk samples (none derived from testes) as assessed through targeted resequencing of embryonic and spermatogonia-specific variants. The colour of the branch indicates the mean VAF of substitutions on that branch across all bulk samples. Nodes that gave rise to only seminiferous tubules are annotated with an asterisk. Branches coming from those nodes do not contribute to the bulk samples, confirming that the segregation of primordial germ cell lineages coincides with the observed branching point on the phylogenies.

Extended Data Fig. 8 Early embryogenesis and bottlenecks.

a, Overview of lineage commitments in the early human embryo, up until gastrulation and early organogenesis. The blue arrows indicate the putative contribution of extraembryonic cells to embryonic lineages (for yolk sac haematopoiesis and intercalation of the endoderm) or lineages with an unknown origin (primordial germ cells). b, Schematic of the possible influence of multiple, successive bottlenecks on the eradication of a specific lineage in a certain population of cells. The two daughter lineages of the zygote are coloured in red and blue. Note that this is a toy example merely for illustration and the relative cell numbers or size of the bottlenecks need not represent reality.
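
The toy scenario in b can also be simulated. This neutral sketch assumes no selection, a fixed number of divisions between bottlenecks and hypothetical bottleneck sizes; it shows how each random bottleneck can shift the ratio of the two zygotic lineages and how a narrow one can remove a lineage for good.

```python
import random


def successive_bottlenecks(bottleneck_sizes, divisions_between=3, seed=0):
    """Toy neutral model: two zygotic lineages pass through repeated bottlenecks.

    Between bottlenecks every cell divides `divisions_between` times; at each
    bottleneck only `size` randomly drawn cells found the next population.
    Returns the fraction of each lineage after every bottleneck.
    """
    rng = random.Random(seed)
    cells = ["red", "blue"]  # the two daughter lineages of the zygote
    history = []
    for size in bottleneck_sizes:
        for _ in range(divisions_between):
            cells = [c for c in cells for _ in range(2)]  # each cell divides once
        cells = rng.sample(cells, min(size, len(cells)))  # random bottleneck
        history.append({lin: cells.count(lin) / len(cells) for lin in ("red", "blue")})
    return history


for step, fractions in enumerate(successive_bottlenecks([8, 4, 2]), start=1):
    print(step, fractions)
```

Once a lineage has been lost, no later expansion can restore it, which is why several narrow bottlenecks in a row can produce strongly skewed contributions.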

Extended Data Fig. 9 Patterns of mitochondrial and nuclear SNVs.

a–d, Phylogenies of nuclear SNVs with the VAF of mitochondrial mutations overlaid on them, showing a late shared SNV (a), an SNV that was heteroplasmic in the zygote (b), an SNV that is consistent with a shared subclone or stromal contamination (c) and an SNV recurrently acquired in samples from different tissues (d). e, Mutational spectrum and decomposition of early embryonic nuclear SNVs.

Extended Data Fig. 10 Loss of the Y chromosome.

a, b, Scatterplots showing the ratio between the mean Y-chromosomal coverage and autosomal coverage against the mean autosomal coverage for all samples from PD28690 (a) and PD43851 (b). The dashed red lines indicate the 95% confidence interval around an expected ratio of 0.5. The red dots indicate samples with significant evidence of loss of the Y chromosome. c, Phylogeny of PD28690 with samples exhibiting loss of the Y chromosome marked in red, indicating that all loss-of-Y events were acquired independently.
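
The test in a and b compares the Y:autosome coverage ratio with the 0.5 expected for one Y copy against two copies of each autosome. The legend does not specify how the 95% interval was computed, so the sketch below assumes a simple Poisson-style envelope whose width shrinks as autosomal coverage rises; the error model, threshold and function name are all hypothetical.

```python
import math


def loss_of_y(mean_y_cov, mean_autosomal_cov, expected_ratio=0.5, z=1.96):
    """Flag loss of Y when the Y:autosome coverage ratio falls below an approximate envelope.

    Assumption (illustrative only): Y depth behaves like a Poisson count with mean
    expected_ratio * autosomal depth, so the ratio has standard deviation
    sqrt(expected_ratio / autosomal depth).
    """
    ratio = mean_y_cov / mean_autosomal_cov
    sd = math.sqrt(expected_ratio / mean_autosomal_cov)
    lower, upper = expected_ratio - z * sd, expected_ratio + z * sd
    return ratio, (lower, upper), ratio < lower


print(loss_of_y(15.0, 30.0))  # ratio 0.5: inside the envelope
print(loss_of_y(3.0, 30.0))   # ratio 0.1: flagged as loss of Y
```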

Supplementary information

Supplementary Methods

This Supplementary Methods file includes Supplementary Tables 7–10 and Supplementary Fig. 1, and has the following sections: Proportion of SNVs filtered at each stage; Validation of phylogenies; Validation of non-monophyly of tissues; Recurrent SNVs and the infinite sites model; and Mutation rate in early embryogenesis.

Reporting Summary

Peer Review File

Supplementary Tables

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Coorens, T.H.H., Moore, L., Robinson, P.S. et al. Extensive phylogenies of human development inferred from somatic mutations. Nature 597, 387–392 (2021).
