Starting from the zygote, all cells in the human body continuously acquire mutations. Mutations shared between different cells imply a common progenitor and are thus naturally occurring markers for lineage tracing (refs. 1,2). Here we reconstruct extensive phylogenies of normal tissues from three adult individuals using whole-genome sequencing of 511 laser capture microdissections. Reconstructed embryonic progenitors in the same generation of a phylogeny often contribute to different extents to the adult body. The degree of this asymmetry varies between individuals, with ratios between the two reconstructed daughter cells of the zygote ranging from 60:40 to 93:7. Asymmetries pervade subsequent generations and can differ between tissues in the same individual. The phylogenies resolve the spatial embryonic patterning of tissues, revealing contiguous patches of, on average, 301 crypts in the adult colonic epithelium derived from a most recent embryonic cell and also a spatial effect in brain development. Using data from ten additional men, we investigated the developmental split between soma and germline, with results suggesting an extraembryonic contribution to primordial germ cells. This research demonstrates that, despite reaching the same ultimate tissue patterns, early bottlenecks and lineage commitments lead to substantial variation in embryonic patterns both within and between individuals.
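As an illustration of how a daughter-cell ratio such as 60:40 can be read from sequencing data, the sketch below assumes heterozygous SNVs on diploid autosomes with no copy-number change, so that a lineage contributing a fraction f of a sample's cells carries its mutations at a VAF of about f/2. The function name and input VAFs are hypothetical; this is not the authors' code.

```python
def lineage_fractions(vaf_a: float, vaf_b: float) -> tuple[float, float]:
    """Convert the mean VAFs of the two zygote daughter lineages into relative
    contributions, assuming heterozygous SNVs on copy-neutral diploid autosomes
    (a lineage contributing a fraction f of cells shows VAF ~ f / 2)."""
    frac_a, frac_b = 2 * vaf_a, 2 * vaf_b
    total = frac_a + frac_b  # close to 1 when no lineage is missing
    return frac_a / total, frac_b / total


a, b = lineage_fractions(0.30, 0.20)
print(f"{round(a * 100)}:{round(b * 100)}")
```

Normalising by the total keeps the ratio meaningful even when the two VAFs do not sum exactly to 0.5 because of sampling noise.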
The bespoke R scripts used for analysis and visualization in this study are available online from GitHub (https://github.com/TimCoorens/PanBody_Phylogenies).
Behjati, S. et al. Genome sequencing of normal cells reveals developmental lineages and mutational processes. Nature 513, 422–425 (2014).
Ju, Y. S. et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature 543, 714–718 (2017).
Sulston, J. E. & Horvitz, H. R. Post-embryonic cell lineages of the nematode, Caenorhabditis elegans. Dev. Biol. 56, 110–156 (1977).
Keller, P. J., Schmidt, A. D., Wittbrodt, J. & Stelzer, E. H. Reconstruction of zebrafish early embryonic development by scanned light sheet microscopy. Science 322, 1065–1069 (2008).
Strnad, P. et al. Inverted light-sheet microscope for imaging mouse pre-implantation development. Nat. Methods 13, 139–142 (2016).
McKenna, A. et al. Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science 353, aaf7907 (2016).
Alemany, A., Florescu, M., Baron, C. S., Peterson-Maduro, J. & van Oudenaarden, A. Whole-organism clone tracing using single-cell sequencing. Nature 556, 108–112 (2018).
Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).
Coorens, T. H. H. et al. Embryonal precursors of Wilms tumor. Science 366, 1247–1251 (2019).
Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
Coorens, T. H. H. et al. Lineage-independent tumors in bilateral neuroblastoma. N. Engl. J. Med. 383, 1860–1865 (2020).
Park, S. et al. Clonal dynamics in early human embryogenesis inferred from somatic mutation. Nature https://doi.org/10.1038/s41586-021-03786-8 (2021).
Coorens, T. H. H. et al. Inherent mosaicism and extensive mutation of human placentas. Nature 592, 80–85 (2021).
Lodato, M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015).
Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).
Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
Moore, L. et al. The mutational landscape of human somatic and germline cells. Nature https://doi.org/10.1038/s41586-021-03822-7 (2021).
Ye, A. Y. et al. A model for postzygotic mosaicisms quantifies the allele fraction drift, mutation rate, and contribution to de novo mutations. Genome Res. 28, 943–951 (2018).
Spencer Chapman, M. et al. Lineage tracing of human development through somatic mutations. Nature 595, 85–90 (2021).
Kuijk, E. et al. Early divergence of mutational processes in human fetal tissues. Sci. Adv. 5, eaaw1271 (2019).
Kobayashi, T. & Surani, M. A. On the origin of the human germline. Development 145, dev150433 (2018).
Aitken, S. J. et al. Pervasive lesion segregation shapes cancer genome evolution. Nature 583, 265–270 (2020).
Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).
Forsberg, L. A. et al. Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer. Nat. Genet. 46, 624–628 (2014).
Loftfield, E. et al. Predictors of mosaic chromosome Y loss and associations with mortality in the UK Biobank. Sci. Rep. 8, 12316 (2018).
Voet, T., Vanneste, E. & Vermeesch, J. R. The human cleavage stage embryo is a cradle of chromosomal rearrangements. Cytogenet. Genome Res. 133, 160–168 (2011).
Shahbazi, M. N. et al. Developmental potential of aneuploid human embryos cultured beyond implantation. Nat. Commun. 11, 3987 (2020).
Sancho, M. et al. Competitive interactions eliminate unfit embryonic stem cells at the onset of differentiation. Dev. Cell 26, 19–30 (2013).
Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646 (2020).
Olafsson, S. et al. Somatic evolution in non-neoplastic IBD-affected colon. Cell 182, 672–684.e11 (2020).
Robinson, P. S. et al. Elevated somatic mutation burdens in normal human cells due to defective DNA polymerases. Preprint at https://doi.org/10.1101/2020.06.23.167668 (2020).
Ellis, P. et al. Reliable detection of somatic mutations in solid tissues by laser-capture microdissection and low-input DNA sequencing. Nat. Protoc. 16, 841–871 (2020).
Lawson, A. R. J. et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 370, 75–82 (2020).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).
Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).
Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Yoshida, K. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020).
Gerstung, M., Papaemmanuil, E. & Campbell, P. J. Subclonal variant calling with multiple samples and prior knowledge. Bioinformatics 30, 1198–1204 (2014).
Hoang, H. T. et al. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol. Biol. 18, 11 (2018).
Nguyen, L.-T. et al. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2014).
Novelli, M. et al. X-inactivation patch size in human female tissue confounds the assessment of tumor clonality. Proc. Natl Acad. Sci. USA 100, 3311–3314 (2003).
Gori, K. & Baez-Ortega, A. sigfit: flexible Bayesian inference of mutational signatures. Preprint at https://doi.org/10.1101/372896 (2018).
Coorens, T. H. H. et al. Clonal hematopoiesis and therapy-related myeloid neoplasms following neuroblastoma treatment. Blood 137, 2992–2997 (2021).
We thank the staff of the Wellcome Sanger Institute Sample Logistics, Genotyping, Pulldown, Sequencing and Informatics facilities for their contribution, especially L. O’Neill, C. Latimer and K. Roberts for their support with sample management and laboratory work; and S. Behjati, Y. S. Ju, S. Park, F. Abascal, J. Ijaz, P. Nicola and G. Collord for helpful discussions or critical review of the manuscript. This experiment was primarily funded by Wellcome (core funding to the Wellcome Sanger Institute and PhD studentship to T.H.H.C.; 203943/Z/16/Z). L.M. is a recipient of a Cancer Research UK (CRUK) Clinical PhD fellowship (C20/A20917) and the Jean Shank/Pathological Society of Great Britain and Ireland Intermediate Research Fellowship (grant reference no. 1175). T.J.M. is supported by CRUK and the Royal College of Surgeons (C63474/A27176). I.M. is funded by CRUK (C57387/A21777) and the Wellcome Trust. R.R. is funded by CRUK (C66259/A27114).
The authors declare no competing interests.
Peer review information Nature thanks Nuria Lopez-Bigas, Aaron McKenna, David Posada and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Extended data figures and tables
a, Schematic of three different progenitor or stem cell contributions to the eventual sample. Monoclonal samples consist of the progeny of one cell, while oligoclonal and polyclonal samples are derived from a few and many progenitors, respectively. b–d, VAF histograms and binomial decompositions for a monoclonal (b), oligoclonal (c) and polyclonal (d) sample. The red and blue dashed lines indicate clonal decomposition through a binomial mixture model, with the estimated peak VAF of clones indicated in the legend. The number indicated in the title of each histogram is the SNV burden.
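The caption does not say how the binomial mixture is fitted. A common choice is expectation-maximization (EM), sketched here in pure Python under the assumption of copy-neutral SNVs with known read depths. The function name, the quantile initialisation and the simulated read counts are illustrative only and are not taken from the authors' R scripts.

```python
import math
import random


def binomial_mixture_em(alt, depth, k, n_iter=200):
    """Fit a k-component binomial mixture to per-SNV read counts by EM.

    alt[i] and depth[i] are the variant and total read counts of SNV i.
    Returns (clone VAFs, mixture weights), ordered by decreasing VAF.
    """
    # Deterministic start: evenly spaced quantiles of the observed VAFs.
    obs = sorted(x / n for x, n in zip(alt, depth))
    vafs = [min(max(obs[int((j + 0.5) * len(obs) / k)], 1e-3), 1 - 1e-3)
            for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior responsibility of each clone for each SNV.
        # The binomial coefficient is shared by all clones and cancels out.
        resp = []
        for x, n in zip(alt, depth):
            logs = [math.log(w) + x * math.log(p) + (n - x) * math.log1p(-p)
                    for w, p in zip(weights, vafs)]
            top = max(logs)
            e = [math.exp(v - top) for v in logs]
            s = sum(e)
            resp.append([v / s for v in e])
        # M-step: re-estimate clone weights and VAFs from the responsibilities.
        for j in range(k):
            r = [row[j] for row in resp]
            weights[j] = max(sum(r) / len(alt), 1e-12)
            den = sum(rj * n for rj, n in zip(r, depth))
            if den > 0:
                num = sum(rj * x for rj, x in zip(r, alt))
                vafs[j] = min(max(num / den, 1e-6), 1 - 1e-6)
    order = sorted(range(k), key=lambda j: -vafs[j])
    return [vafs[j] for j in order], [weights[j] for j in order]


# Hypothetical oligoclonal sample: 150 SNVs from a clone at VAF 0.5 and
# 100 SNVs from a subclone at VAF 0.2, all sequenced at 30x.
rng = random.Random(1)
truth = [0.5] * 150 + [0.2] * 100
depth = [30] * len(truth)
alt = [sum(rng.random() < p for _ in range(30)) for p in truth]
vafs, weights = binomial_mixture_em(alt, depth, k=2)
print([round(v, 2) for v in vafs], [round(w, 2) for w in weights])
```

A monoclonal sample would be expected to give a single dominant peak near 0.5; choosing the number of components (for example by an information criterion) is left out of this sketch.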
a, Heat maps showing the results of estimates of the early embryonic bottleneck obtained through approximate Bayesian computation, with and without cell death rate as a parameter (Methods). The darkness of the colour indicates the frequency of the observed bottleneck in the accepted simulations. b, Estimates of the mutation rate per cell per division before and after zygotic genome activation (ZGA); the dot indicates the mean of the 20,000 accepted simulations and the line spans the 95% confidence interval.
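The rejection logic behind such ABC estimates can be illustrated with a deliberately simplified, hypothetical model: one per-division mutation rate with a uniform prior, Poisson-distributed SNV counts per early branch, and the mean count as the summary statistic. Simulations are ranked by distance to the observed statistic, the closest 5% are accepted, and the posterior is summarised by the mean and the central 95% of accepted values. The authors' simulations (bottleneck size, cell death, the switch at ZGA) are described in Methods and are not reproduced here; the counts and function names below are illustrative.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Knuth's Poisson sampler; adequate for the small rates used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def abc_rejection(observed, n_sims=20000, accept_frac=0.05,
                  prior=(0.0, 5.0), seed=0):
    """Rejection ABC for a per-division mutation rate under a toy model in which
    each observed value is the SNV count on one branch spanning one division."""
    rng = random.Random(seed)
    obs_stat = sum(observed) / len(observed)       # summary statistic: mean
    sims = []
    for _ in range(n_sims):
        mu = rng.uniform(*prior)                    # draw from the prior
        sim = [poisson(rng, mu) for _ in observed]  # simulate a data set
        sims.append((abs(sum(sim) / len(sim) - obs_stat), mu))
    sims.sort(key=lambda t: t[0])                   # closest simulations first
    accepted = sorted(mu for _, mu in sims[: int(accept_frac * n_sims)])
    lo = accepted[int(0.025 * len(accepted))]
    hi = accepted[int(0.975 * len(accepted)) - 1]
    return statistics.mean(accepted), (lo, hi)


# Hypothetical SNV counts on ten early branches.
mean_mu, (lo, hi) = abc_rejection([2, 3, 1, 2, 4, 2, 3, 2, 1, 3])
print(f"mu ~ {mean_mu:.2f} (95% interval {lo:.2f}-{hi:.2f})")
```

Sorting on the distance alone (a stable sort) avoids biasing the accepted set when many simulations tie at the same distance.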
Extended Data Fig. 3 The most recent common ancestors of tissues and completeness of early lineages.
a–d, Phylogenetic trees with unit branch lengths for PD28690, showing the coalescence (red) of all samples from four tissue types: thyroid follicles (a), seminiferous tubules (b), small bowel crypts (c) and bronchial epithelium (d). The most recent common ancestor for all these tissues is the root of the tree. e, Sum of mean VAFs of branches of the same generation per bulk sample in PD28690 (n = 33), PD43850 (n = 1) and PD43851 (n = 2). The solid black line indicates the mean value across tissue samples. A total sum of mean VAFs approximating 0.5 indicates that all cells belong to one of the lineages of that generation and are accounted for, that is, no lineages are missing from the phylogeny. This is mostly the case for generations 1 and 2, but the total VAF of generation 3 indicates missing lineages.
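The completeness check in panel e can be sketched as follows, again assuming heterozygous SNVs on copy-neutral diploid autosomes so that a complete generation sums to about 0.5. Expressing the shortfall as a fraction of cells is our illustrative reading, and the function name and VAFs are hypothetical.

```python
def missing_lineage_fraction(branch_vafs, expected=0.5):
    """branch_vafs maps a generation number to the mean VAFs of that generation's
    branches in one bulk sample. A complete generation sums to ~0.5; the
    shortfall estimates the fraction of the sample's cells whose lineage is
    missing from the phylogeny (clipped at zero for sampling noise)."""
    return {gen: max(0.0, 1.0 - sum(vafs) / expected)
            for gen, vafs in branch_vafs.items()}


# Hypothetical bulk sample: generations 1 and 2 are essentially complete,
# whereas generation 3 accounts for only about half of the cells.
example = {
    1: [0.27, 0.23],
    2: [0.14, 0.12, 0.11, 0.12],
    3: [0.06, 0.05, 0.04, 0.07, 0.03],
}
print({g: round(f, 2) for g, f in missing_lineage_fraction(example).items()})
```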
a, Kernel smoothed 2D histogram of the linear distance (in number of crypts) and the number of shared SNVs between any two crypts from the same biopsy. The red line is shown at a shared SNV burden of 15, above which crypts were taken to be from the same embryonic patch. b, Histogram of the number of SNVs shared between all pairs of crypts showing a bimodal distribution on either side of an SNV burden of 15 (red line). c, Density plot of the prior distribution of the embryonic patch size radius. d, Plot of the radius versus the Euclidean distance in summary statistics between the simulations and our observed data. The red dots indicate those within the 5% closest simulations and are accepted. e, Density plot of the prior distribution (dashed line), the posterior distribution from the rejection method (black line) and the posterior distribution from the neural network regression (red line) of the embryonic patch size radius. f, A QQ-plot of the residuals of the neural network regression.
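The threshold rule in panels a and b can be sketched as below. The caption only states the pairwise criterion (more than 15 shared SNVs); merging pairwise calls transitively into connected components with a union-find structure is our assumption, and the crypt and SNV identifiers are hypothetical.

```python
from itertools import combinations


def embryonic_patches(crypt_snvs, threshold=15):
    """Group crypts from one biopsy into putative embryonic patches.

    crypt_snvs maps a crypt ID to its set of SNV IDs. Two crypts are linked when
    they share more than `threshold` SNVs; patches are the connected components
    of these links, found with a union-find structure.
    """
    parent = {c: c for c in crypt_snvs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in combinations(crypt_snvs, 2):
        if len(crypt_snvs[a] & crypt_snvs[b]) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for c in crypt_snvs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical SNV IDs: 5 early SNVs shared by every crypt, 20 SNVs specific
# to each of two patches, and 30 private SNVs per crypt.
early = set(range(5))
patch_a, patch_b = set(range(100, 120)), set(range(200, 220))
crypts = {
    f"c{i}": early | patch | set(range(1000 * i, 1000 * i + 30))
    for i, patch in [(1, patch_a), (2, patch_a), (3, patch_b), (4, patch_b)]
}
patches = embryonic_patches(crypts)
print(patches)
```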
a, Phylogenetic tree for appendiceal crypts in PD28690, with annotated cancer driver mutations. An asterisk indicates that the two neighbouring crypts were taken as biological replicates of one another. Within the clade of crypts that acquired the BRAF mutation, the mutation burdens are Poisson distributed, consistent with a molecular clock (P = 0.99, dispersion test). Accordingly, we can estimate that the BRAF mutation was acquired before 23 years of age. b, c, Phylogeny (b) and sampling overview (c) for prostatic acini in PD28690, showing widespread benign prostatic hyperplasia in one biopsy. d, Histology and sampling overview alongside the phylogeny for a microscopic polyp in the colon of PD28690. e, Phylogeny of seminiferous tubules from PD42034, where a frameshift deletion in MEIOB was acquired after only six post-zygotic SNVs. Parts of the figure are composed of pictures from Servier Medical Art (https://smart.servier.com/). Servier Medical Art by Servier is licensed under a Creative Commons Attribution 3.0 Unported License (https://creativecommons.org/licenses/by/3.0/).
Phylogenetic trees with unit branch lengths for four polyclonal samples of the epidermis from PD28690, showing the contribution (blue) of early embryonic progenitors in the phylogeny to the sample. These samples were not used for the reconstruction of the phylogeny because of their lack of a dominant clone, but can still be decomposed into the contributing embryonic lineages that give rise to these polyclonal aggregates. For example, ‘SKN2_D2’, while somatically polyclonal, seems to be derived from a single early lineage.
Cladogram of PD28690 with contribution to 84 bulk samples (none derived from testes) as assessed through targeted resequencing of embryonic and spermatogonia-specific variants. The colour of the branch indicates the mean VAF of substitutions on that branch across all bulk samples. Nodes that gave rise to only seminiferous tubules are annotated with an asterisk. Branches coming from those nodes do not contribute to the bulk samples, confirming that the segregation of primordial germ cell lineages coincides with the observed branching point on the phylogenies.
a, Overview of lineage commitments in the early human embryo, up until gastrulation and early organogenesis. The blue arrows indicate the putative contribution of extraembryonic cells to embryonic lineages (for yolk sac haematopoiesis and intercalation of the endoderm) or lineages with an unknown origin (primordial germ cells). b, Schematic of the possible influence of multiple, successive bottlenecks on the eradication of a specific lineage in a certain population of cells. The two daughter lineages of the zygote are coloured in red and blue. Note that this is a toy example merely for illustration and the relative cell numbers or size of the bottlenecks need not represent reality.
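The toy example in panel b can be mimicked with a small simulation: the two zygote daughter lineages expand symmetrically, and at each bottleneck only a random subset of cells is kept, so one lineage can drift to extinction even without any difference in fitness. Bottleneck sizes, the number of doublings and the function name are hypothetical and, as in the schematic, are not meant to represent real embryonic cell numbers.

```python
import random


def simulate_bottlenecks(bottleneck_sizes, doublings=3, seed=0):
    """Toy model of successive bottlenecks acting on the two zygote daughter
    lineages ('red' and 'blue'). Between bottlenecks every cell divides
    `doublings` times; at each bottleneck a random subset of cells is kept
    (sampling without replacement). Returns the red fraction after each one."""
    rng = random.Random(seed)
    cells = ["red", "blue"]
    history = []
    for size in bottleneck_sizes:
        cells = cells * 2 ** doublings                    # symmetric expansion
        cells = rng.sample(cells, min(size, len(cells)))  # bottleneck
        history.append(cells.count("red") / len(cells))
    return history


# Fraction of hypothetical embryos in which one zygote lineage is eradicated.
runs = [simulate_bottlenecks([8, 4, 2], seed=s) for s in range(1000)]
lost = sum(r[-1] in (0.0, 1.0) for r in runs) / len(runs)
print(f"one lineage eradicated in {lost:.1%} of runs")
```

Because expansion is symmetric, any change in the red fraction comes from the bottlenecks alone, and a lineage that reaches 0% never reappears.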
a–d, Phylogenies of nuclear SNVs with the VAF of mitochondrial mutations overlaid on them, showing a late shared SNV (a), an SNV that was heteroplasmic in the zygote (b), an SNV that is consistent with a shared subclone or stromal contamination (c) and an SNV recurrently acquired in samples from different tissues (d). e, Mutational spectrum and decomposition of early embryonic nuclear SNVs.
a, b, Scatterplots showing the ratio between the mean Y-chromosomal coverage and autosomal coverage against the mean autosomal coverage for all samples from PD28690 (a) and PD43851 (b). The dashed red lines indicate the 95% confidence interval around an expected ratio of 0.5. The red dots indicate samples with significant evidence of loss of the Y chromosome. c, Phylogeny of PD28690 with samples exhibiting loss of the Y chromosome marked in red, indicating that all loss of the Y chromosome events are acquired independently.
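A minimal sketch of the loss-of-Y call in panels a and b, assuming a one-sided comparison against the lower bound of an approximate 95% interval around the expected ratio of 0.5 (one Y copy against two autosomal copies). The Poisson error model, function name and coverage values are hypothetical; the interval the authors actually used is described in Methods and may differ.

```python
import math


def flag_loss_of_y(samples, z=1.96):
    """Flag samples with significant loss of the Y chromosome.

    samples maps a sample ID to (mean Y coverage, mean autosomal coverage).
    One Y copy against two autosomal copies gives an expected ratio of 0.5.
    Error model (hypothetical): Y depth ~ Poisson(0.5 * c) for autosomal
    coverage c, so the ratio has standard deviation sqrt(0.5 * c) / c.
    """
    flagged = []
    for name, (y_cov, auto_cov) in samples.items():
        lower = 0.5 - z * math.sqrt(0.5 * auto_cov) / auto_cov
        if y_cov / auto_cov < lower:
            flagged.append(name)
    return flagged


# Hypothetical samples: (mean Y coverage, mean autosomal coverage).
coverage = {"A": (15.2, 30.0), "B": (5.0, 30.0), "C": (14.0, 28.0)}
print(flag_loss_of_y(coverage))  # ['B']
```

Under this error model the interval narrows as coverage increases, so deeper samples can reveal smaller losses.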
This Supplementary Methods file includes Supplementary Tables 7–10 and Supplementary Fig. 1, and has the following sections: Proportion of SNVs filtered at each stage; Validation of phylogenies; Validation of non-monophyly of tissues; Recurrent SNVs and the infinite sites model; and Mutation rate in early embryogenesis.
About this article
Cite this article
Coorens, T.H.H., Moore, L., Robinson, P.S. et al. Extensive phylogenies of human development inferred from somatic mutations. Nature 597, 387–392 (2021). https://doi.org/10.1038/s41586-021-03790-y