Abstract
Starting from the zygote, all cells in the human body continuously acquire mutations. Mutations shared between different cells imply a common progenitor and are thus naturally occurring markers for lineage tracing (refs. 1, 2). Here we reconstruct extensive phylogenies of normal tissues from three adult individuals using whole-genome sequencing of 511 laser capture microdissections. Reconstructed embryonic progenitors in the same generation of a phylogeny often contribute to different extents to the adult body. The degree of this asymmetry varies between individuals, with ratios between the two reconstructed daughter cells of the zygote ranging from 60:40 to 93:7. Asymmetries pervade subsequent generations and can differ between tissues in the same individual. The phylogenies resolve the spatial embryonic patterning of tissues, revealing contiguous patches of, on average, 301 crypts in the adult colonic epithelium derived from a most recent embryonic cell and also a spatial effect in brain development. Using data from ten additional men, we investigated the developmental split between soma and germline, with results suggesting an extraembryonic contribution to primordial germ cells. This research demonstrates that, despite reaching the same ultimate tissue patterns, early bottlenecks and lineage commitments lead to substantial variation in embryonic patterns both within and between individuals.
Data availability
The DNA sequencing data are deposited in the European Genome-Phenome Archive (EGA) with the accession codes EGAD00001006641 (whole-genome sequencing) and EGAD00001006643 (targeted sequencing).
Code availability
The bespoke R scripts used for analysis and visualization in this study are available online from GitHub (https://github.com/TimCoorens/PanBody_Phylogenies).
References
Behjati, S. et al. Genome sequencing of normal cells reveals developmental lineages and mutational processes. Nature 513, 422–425 (2014).
Ju, Y. S. et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature 543, 714–718 (2017).
Sulston, J. E. & Horvitz, H. R. Post-embryonic cell lineages of the nematode, Caenorhabditis elegans. Dev. Biol. 56, 110–156 (1977).
Keller, P. J., Schmidt, A. D., Wittbrodt, J. & Stelzer, E. H. Reconstruction of zebrafish early embryonic development by scanned light sheet microscopy. Science 322, 1065–1069 (2008).
Strnad, P. et al. Inverted light-sheet microscope for imaging mouse pre-implantation development. Nat. Methods 13, 139–142 (2016).
McKenna, A. et al. Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science 353, aaf7907 (2016).
Alemany, A., Florescu, M., Baron, C. S., Peterson-Maduro, J. & van Oudenaarden, A. Whole-organism clone tracing using single-cell sequencing. Nature 556, 108–112 (2018).
Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).
Coorens, T. H. H. et al. Embryonal precursors of Wilms tumor. Science 366, 1247–1251 (2019).
Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
Coorens, T. H. H. et al. Lineage-independent tumors in bilateral neuroblastoma. N. Engl. J. Med. 383, 1860–1865 (2020).
Park, S. et al. Clonal dynamics in early human embryogenesis inferred from somatic mutation. Nature https://doi.org/10.1038/s41586-021-03786-8 (2021).
Coorens, T. H. H. et al. Inherent mosaicism and extensive mutation of human placentas. Nature 592, 80–85 (2021).
Lodato, M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015).
Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).
Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
Moore, L. et al. The mutational landscape of human somatic and germline cells. Nature https://doi.org/10.1038/s41586-021-03822-7 (2021).
Ye, A. Y. et al. A model for postzygotic mosaicisms quantifies the allele fraction drift, mutation rate, and contribution to de novo mutations. Genome Res. 28, 943–951 (2018).
Spencer Chapman, M. et al. Lineage tracing of human development through somatic mutations. Nature 595, 85–90 (2021).
Kuijk, E. et al. Early divergence of mutational processes in human fetal tissues. Sci. Adv. 5, eaaw1271 (2019).
Kobayashi, T. & Surani, M. A. On the origin of the human germline. Development 145, dev150433 (2018).
Aitken, S. J. et al. Pervasive lesion segregation shapes cancer genome evolution. Nature 583, 265–270 (2020).
Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).
Forsberg, L. A. et al. Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer. Nat. Genet. 46, 624–628 (2014).
Loftfield, E. et al. Predictors of mosaic chromosome Y loss and associations with mortality in the UK Biobank. Sci. Rep. 8, 12316 (2018).
Voet, T., Vanneste, E. & Vermeesch, J. R. The human cleavage stage embryo is a cradle of chromosomal rearrangements. Cytogenet. Genome Res. 133, 160–168 (2011).
Shahbazi, M. N. et al. Developmental potential of aneuploid human embryos cultured beyond implantation. Nat. Commun. 11, 3987 (2020).
Sancho, M. et al. Competitive interactions eliminate unfit embryonic stem cells at the onset of differentiation. Dev. Cell 26, 19–30 (2013).
Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646 (2020).
Olafsson, S. et al. Somatic evolution in non-neoplastic IBD-affected colon. Cell 182, 672–684.e11 (2020).
Robinson, P. S. et al. Elevated somatic mutation burdens in normal human cells due to defective DNA polymerases. Preprint at https://doi.org/10.1101/2020.06.23.167668 (2020).
Ellis, P. et al. Reliable detection of somatic mutations in solid tissues by laser-capture microdissection and low-input DNA sequencing. Nat. Protoc. 16, 841–871 (2020).
Lawson, A. R. J. et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 370, 75–82 (2020).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).
Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).
Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 289–300 (1995).
Yoshida, K. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020).
Gerstung, M., Papaemmanuil, E. & Campbell, P. J. Subclonal variant calling with multiple samples and prior knowledge. Bioinformatics 30, 1198–1204 (2014).
Hoang, H. T. et al. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol. Biol. 18, 11 (2018).
Nguyen, L.-T. et al. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2014).
Novelli, M. et al. X-inactivation patch size in human female tissue confounds the assessment of tumor clonality. Proc. Natl Acad. Sci. USA 100, 3311–3314 (2003).
Gori, K. & Baez-Ortega, A. sigfit: flexible Bayesian inference of mutational signatures. Preprint at https://doi.org/10.1101/372896 (2018).
Coorens, T. H. H. et al. Clonal hematopoiesis and therapy-related myeloid neoplasms following neuroblastoma treatment. Blood 137, 2992–2997 (2021).
Acknowledgements
We thank the staff of the Wellcome Sanger Institute Sample Logistics, Genotyping, Pulldown, Sequencing and Informatics facilities for their contribution, especially L. O’Neill, C. Latimer and K. Roberts for their support with sample management and laboratory work; and S. Behjati, Y. S. Ju, S. Park, F. Abascal, J. Ijaz, P. Nicola and G. Collord for helpful discussions or critical review of the manuscript. This experiment was primarily funded by Wellcome (core funding to the Wellcome Sanger Institute and PhD studentship to T.H.H.C.; 203943/Z/16/Z). L.M. is a recipient of a Cancer Research UK (CRUK) Clinical PhD fellowship (C20/A20917) and the Jean Shank/Pathological Society of Great Britain and Ireland Intermediate Research Fellowship (grant reference no. 1175). T.J.M. is supported by CRUK and the Royal College of Surgeons (C63474/A27176). I.M. is funded by CRUK (C57387/A21777) and the Wellcome Trust. R.R. is funded by CRUK (C66259/A27114).
Author information
Contributions
T.H.H.C., L.M., R.R. and M.R.S. conceived the study design. T.H.H.C. wrote the scripts and performed the analyses with help or input from R.S., J.C., M.D.C.N., M.S.C. and I.M. L.M., P.S.R., A.C. and T.R.W.O. performed the microdissections with support from Y.H. M.J.P. and A.R.J.L. called and analysed mitochondrial variants. T.J.M., A.N. and R.C.F. aided in sample procurement. M.R.S. oversaw the study. T.H.H.C. and M.R.S. wrote the manuscript with input from all other authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature thanks Nuria Lopez-Bigas, Aaron McKenna, David Posada and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Extended data figures and tables
Extended Data Fig. 1 VAF distributions reflect clonality of LCM sample.
a, Schematic of three different progenitor or stem cell contributions to the eventual sample. Monoclonal samples consist of the progeny of one cell, while oligoclonal and polyclonal are derived from a few and many progenitors, respectively. b–d, VAF histograms and binomial decompositions for a monoclonal (b), oligoclonal (c) and polyclonal (d) sample. The red and blue dashed lines indicate clonal decomposition through a binomial mixture model, with the estimated peak VAF of clones indicated in the legend. The number indicated in the title of each histogram is the SNV burden.
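The binomial decomposition described in this caption can be illustrated with a minimal sketch. It assumes a plain expectation-maximization fit of a binomial mixture to variant and total read counts; the function names, starting values, clone count and simulated data are all hypothetical and are not taken from the authors' R scripts.

```python
import math
import random


def binom_logpmf(k, n, p):
    """Log-probability of k variant reads out of n total reads at VAF p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


def fit_binomial_mixture(alt, depth, n_clones=2, n_iter=200):
    """Fit a binomial mixture by EM; return (weights, peak_vafs) by descending VAF."""
    # Hypothetical starting values: peak VAFs spread evenly over (0, 0.5].
    vafs = [0.5 * (j + 1) / n_clones for j in range(n_clones)]
    weights = [1.0 / n_clones] * n_clones
    for _ in range(n_iter):
        # E-step: posterior probability that each mutation belongs to each clone.
        resp = []
        for k, n in zip(alt, depth):
            logs = [math.log(w) + binom_logpmf(k, n, p) for w, p in zip(weights, vafs)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([q / total for q in probs])
        # M-step: re-estimate each clone's mixing weight and peak VAF.
        for j in range(n_clones):
            rj = [r[j] for r in resp]
            weights[j] = max(sum(rj) / len(alt), 1e-12)
            reads = sum(r * n for r, n in zip(rj, depth))
            if reads > 0:
                vaf = sum(r * k for r, k in zip(rj, alt)) / reads
                vafs[j] = min(max(vaf, 1e-6), 1 - 1e-6)
    order = sorted(range(n_clones), key=lambda j: -vafs[j])
    return [weights[j] for j in order], [vafs[j] for j in order]


def simulate_sample(rng, clones, depth=30):
    """Toy data: clones is a list of (n_mutations, true_vaf) pairs."""
    alt = []
    for n_mut, vaf in clones:
        for _ in range(n_mut):
            alt.append(sum(rng.random() < vaf for _ in range(depth)))
    return alt, [depth] * len(alt)


# Simulated oligoclonal-like sample (toy values, not data from the study).
rng = random.Random(0)
alt, depth = simulate_sample(rng, [(300, 0.5), (200, 0.15)])
weights, peak_vafs = fit_binomial_mixture(alt, depth)
print([round(v, 2) for v in peak_vafs], [round(w, 2) for w in weights])
```

In this sketch a sample whose fit is dominated by one component would read as monoclonal, while several components at lower peak VAFs would suggest an oligoclonal or polyclonal sample. The number of components is fixed in advance here, which is a simplification.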
Extended Data Fig. 2 Model of early embryogenesis and ABC.
a, Heat maps showing the results of estimates of the early embryonic bottleneck obtained through approximate Bayesian computation, with and without cell death rate as a parameter (Methods). The darkness of the colour indicates the frequency of the observed bottleneck in the accepted simulations. b, Estimates of the mutation rate per cell per division before and after zygotic genome activation (ZGA); the dot indicates the mean of the 20,000 accepted simulations and the line spans the 95% confidence interval.
Extended Data Fig. 3 The most recent common ancestors of tissues and completeness of early lineages.
a–d, Phylogenetic trees with unit branch lengths for PD28690, showing the coalescence (red) of all samples from four tissue types: thyroid follicles (a), seminiferous tubules (b), small bowel crypts (c) and bronchial epithelium (d). The most recent common ancestor for all these tissues is the root of the tree. e, Sum of mean VAFs of branches of the same generation per bulk sample in PD28690 (n = 33), PD43850 (n = 1) and PD43851 (n = 2). The solid black line indicates the mean value across tissue samples. A total sum of mean VAFs approximating 0.5 indicates that all cells belong to one of the lineages of that generation and are accounted for, that is, no lineages are missing from the phylogeny. This is mostly the case for generations 1 and 2, but the total VAF of generation 3 indicates missing lineages.
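The completeness check in panel e reduces to simple arithmetic, sketched here under stated assumptions: the mutations are heterozygous autosomal SNVs in diploid cells, so a branch carried by a fraction f of cells has an expected VAF of f / 2. The function name, the tolerance and the toy VAFs are hypothetical.

```python
def generation_summary(branch_vafs, expected_total=0.5, tolerance=0.05):
    """Summarise one generation of a phylogeny for one bulk sample.

    branch_vafs maps a branch name to the VAFs of the SNVs on that branch.
    Returns (sum of mean VAFs, cell fraction per branch, completeness flag).
    """
    mean_vaf = {b: sum(v) / len(v) for b, v in branch_vafs.items()}
    total = sum(mean_vaf.values())
    # Heterozygous SNV in a fraction f of diploid cells -> expected VAF f / 2,
    # so the estimated cell fraction is twice the mean VAF of the branch.
    cell_fraction = {b: 2 * m for b, m in mean_vaf.items()}
    # The tolerance is an arbitrary illustrative threshold.
    complete = abs(total - expected_total) <= tolerance
    return total, cell_fraction, complete


# Toy numbers, not taken from the paper.
generation_1 = {"daughter_A": [0.31, 0.29, 0.30], "daughter_B": [0.19, 0.21]}
generation_3 = {"b1": [0.10], "b2": [0.08, 0.12], "b3": [0.05], "b4": [0.07]}

total_1, fractions_1, complete_1 = generation_summary(generation_1)
total_3, fractions_3, complete_3 = generation_summary(generation_3)
print(round(total_1, 2), complete_1)
print(round(total_3, 2), complete_3)
```

With these toy values the first generation sums to about 0.5 with a roughly 60:40 split between the two daughter lineages, whereas the third generation falls well short of 0.5, the pattern the caption interprets as missing lineages.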
Extended Data Fig. 4 Embryonic patch size in the colon.
a, Kernel smoothed 2D histogram of the linear distance (in number of crypts) and the number of shared SNVs between any two crypts from the same biopsy. The red line is shown at a shared SNV burden of 15, above which crypts were taken to be from the same embryonic patch. b, Histogram of the number of SNVs shared between all pairs of crypts showing a bimodal distribution on either side of an SNV burden of 15 (red line). c, Density plot of the prior distribution of the embryonic patch size radius. d, Plot of the radius versus the Euclidean distance in summary statistics between the simulations and our observed data. The red dots indicate those within the 5% closest simulations and are accepted. e, Density plot of the prior distribution (dashed line), the posterior distribution from the rejection method (black line) and the posterior distribution from the neural network regression (red line) of the embryonic patch size radius. f, A QQ-plot of the residuals of the neural network regression.
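The rejection step in panels c–e (draw from a prior, simulate, keep the 5% of simulations closest to the observed summary statistics by Euclidean distance) can be sketched as follows. The simulator is a deliberately simple one-dimensional stand-in, not the authors' spatial model, and the prior range, separations, pair counts and function names are all hypothetical. The neural network regression adjustment is not covered.

```python
import math
import random

SEPARATIONS = (1, 5, 10, 20)   # hypothetical crypt-to-crypt distances
PAIRS_PER_SEPARATION = 300     # hypothetical number of crypt pairs scored


def simulate_summary(patch_size, rng):
    """Toy simulator: fraction of crypt pairs sharing a patch at each separation.

    On a line tiled by patches of equal length, two crypts d apart fall in the
    same patch with probability max(0, 1 - d / patch_size).
    """
    stats = []
    for d in SEPARATIONS:
        p_same = max(0.0, 1.0 - d / patch_size)
        hits = sum(rng.random() < p_same for _ in range(PAIRS_PER_SEPARATION))
        stats.append(hits / PAIRS_PER_SEPARATION)
    return stats


def abc_rejection(observed, n_sim, accept_frac, rng, prior=(1.0, 40.0)):
    """Draw from a uniform prior, simulate, and keep the closest simulations."""
    draws = []
    for _ in range(n_sim):
        theta = rng.uniform(*prior)
        dist = math.dist(simulate_summary(theta, rng), observed)
        draws.append((dist, theta))
    draws.sort()
    n_keep = round(n_sim * accept_frac)
    return [theta for _, theta in draws[:n_keep]]


rng = random.Random(0)
observed = simulate_summary(12.0, rng)   # stand-in for real data
posterior = abc_rejection(observed, n_sim=1000, accept_frac=0.05, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 1))
```

The accepted parameter values approximate the posterior. Their spread depends on how informative the summary statistics are and on the acceptance fraction, which is why the caption compares the rejection posterior with a regression-adjusted one.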
Extended Data Fig. 5 Clonal expansions later in life.
a, Phylogenetic tree for appendiceal crypts in PD28690, with annotated cancer driver mutations. An asterisk indicates that the two neighbouring crypts were taken as biological replicates of one another. Within the clade of crypts that acquired the BRAF mutation, the mutation burdens are Poisson distributed, consistent with a molecular clock (P = 0.99, dispersion test). Accordingly, we can estimate that the BRAF mutation was acquired before 23 years of age. b, c, Phylogeny (b) and sampling overview (c) for prostatic acini in PD28690, showing widespread benign prostatic hyperplasia in one biopsy. d, Histology and sampling overview alongside the phylogeny for a microscopic polyp in the colon of PD28690. e, Phylogeny of seminiferous tubules from PD42034, where a frameshift deletion in MEIOB was acquired after only six post-zygotic SNVs. Parts of the figure are composed of pictures from Servier Medical Art (https://smart.servier.com/). Servier Medical Art by Servier is licensed under a Creative Commons Attribution 3.0 Unported License (https://creativecommons.org/licenses/by/3.0/).
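The molecular-clock check in panel a relies on a dispersion test. A minimal sketch is given here, assuming the standard index-of-dispersion statistic (sum of squared deviations divided by the mean, compared with a chi-square distribution on n − 1 degrees of freedom); the exact test used by the authors is not specified in this caption, and the burdens below are toy values. The chi-square tail is computed with closed-form expressions for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^i / i!.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return min(1.0, math.exp(-x / 2) * total)
    # Odd df: two-sided normal tail plus a finite correction series.
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return min(1.0, math.erfc(math.sqrt(x / 2)) + 2 * density * total)


def dispersion_test(counts):
    """Index-of-dispersion test for overdispersion relative to a Poisson model.

    Requires at least two counts with a positive mean.
    """
    n = len(counts)
    mean = sum(counts) / n
    stat = sum((c - mean) ** 2 for c in counts) / mean
    return stat, chi2_sf(stat, n - 1)


# Toy mutation burdens for crypts within one clade (not data from the study).
burdens = [48, 52, 50, 47, 53, 51]
stat, p_value = dispersion_test(burdens)
print(round(stat, 2), round(p_value, 2))
```

A large P value here means there is no evidence that the burdens vary more than a Poisson process would produce, which is the sense in which the caption calls the data consistent with a molecular clock.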
Extended Data Fig. 6 Decomposition of polyclonal samples.
Phylogenetic trees with unit branch lengths for four polyclonal samples of the epidermis from PD28690, showing the contribution (blue) of early embryonic progenitors in the phylogeny to the sample. These samples were not used for the reconstruction of the phylogeny because of their lack of a dominant clone, but can still be decomposed into the contributing embryonic lineages that give rise to these polyclonal aggregates. For example, ‘SKN2_D2’, while somatically polyclonal, seems to be derived from a single early lineage.
Extended Data Fig. 7 Targeted resequencing in PD28690.
Cladogram of PD28690 with contribution to 84 bulk samples (none derived from testes) as assessed through targeted resequencing of embryonic and spermatogonia-specific variants. The colour of the branch indicates the mean VAF of substitutions on that branch across all bulk samples. Nodes that gave rise to only seminiferous tubules are annotated with an asterisk. Branches coming from those nodes do not contribute to the bulk samples, confirming that the segregation of primordial germ cell lineages coincides with the observed branching point on the phylogenies.
Extended Data Fig. 8 Early embryogenesis and bottlenecks.
a, Overview of lineage commitments in the early human embryo, up until gastrulation and early organogenesis. The blue arrows indicate the putative contribution of extraembryonic cells to embryonic lineages (for yolk sac haematopoiesis and intercalation of the endoderm) or lineages with an unknown origin (primordial germ cells). b, Schematic of the possible influence of multiple, successive bottlenecks on the eradication of a specific lineage in a certain population of cells. The two daughter lineages of the zygote are coloured in red and blue. Note that this is a toy example merely for illustration and the relative cell numbers or size of the bottlenecks need not represent reality.
Extended Data Fig. 9 Patterns of mitochondrial and nuclear SNVs.
a–d, Phylogenies of nuclear SNVs with the VAF of mitochondrial mutations overlaid on them, showing a late shared SNV (a), an SNV that was heteroplasmic in the zygote (b), an SNV that is consistent with a shared subclone or stromal contamination (c) and an SNV recurrently acquired in samples from different tissues (d). e, Mutational spectrum and decomposition of early embryonic nuclear SNVs.
Extended Data Fig. 10 Loss of the Y chromosome.
a, b, Scatterplots showing the ratio between the mean Y-chromosomal coverage and autosomal coverage against the mean autosomal coverage for all samples from PD28690 (a) and PD43851 (b). The dashed red lines indicate the 95% confidence interval around an expected ratio of 0.5. The red dots indicate samples with significant evidence of loss of the Y chromosome. c, Phylogeny of PD28690 with samples exhibiting loss of the Y chromosome marked in red, indicating that all loss of the Y chromosome events are acquired independently.
Supplementary information
Supplementary Methods
This Supplementary Methods file includes Supplementary Tables 7–10 and Supplementary Fig. 1, and has the following sections: Proportion of SNVs filtered at each stage; Validation of phylogenies; Validation of non-monophyly of tissues; Recurrent SNVs and the infinite sites model; and Mutation rate in early embryogenesis.
About this article
Cite this article
Coorens, T.H.H., Moore, L., Robinson, P.S. et al. Extensive phylogenies of human development inferred from somatic mutations. Nature 597, 387–392 (2021). https://doi.org/10.1038/s41586-021-03790-y
DOI: https://doi.org/10.1038/s41586-021-03790-y