Clonal dynamics in early human embryogenesis inferred from somatic mutation


Cellular dynamics and fate decision in early human embryogenesis remain largely unknown owing to the challenges of performing studies in human embryos (ref. 1). Here, we explored whole genomes of 334 single-cell colonies and targeted deep sequences of 379 bulk tissues obtained from various anatomical locations of seven recently deceased adult human donors. Using somatic mutations as an intrinsic barcode, we reconstructed early cellular phylogenies that demonstrate (1) an endogenous mutation rate that is higher in the first cell division but decreases to approximately one per cell per cell division later in life; (2) universal unequal contribution of early cells to the embryo proper, resulting from early cellular bottlenecks that stochastically set aside epiblast cells within the embryo; (3) examples of varying degrees of early clonal imbalances between tissues on the left and right sides of the body, different germ layers and specific anatomical parts and organs; (4) emergence of a few ancestral cells that will substantially contribute to adult cell pools in blood and liver; and (5) presence of mitochondrial DNA heteroplasmy in the fertilized egg. Our approach also provides insights into the age-related mutational processes and loss of sex chromosomes in normal somatic cells. In sum, this study provides a foundation for future studies to complete cellular phylogenies in human embryogenesis.

Fig. 1: Tracing early cellular phylogenies using somatic mutations.
Fig. 2: Unequal contribution of early lineages to human bodies and early mutation rate.
Fig. 3: Timing of fate determination of early cells.
Fig. 4: mtDNA heteroplasmy in fertilized egg and late-stage mutations.

Data availability

Whole-genome and targeted sequencing data are deposited in the European Genome-phenome Archive (EGA) with accession EGAS00001004824 and are available for general research use.

Code availability

Information on sequenced clones and tissues, detected early mutations, and their anatomical tracking can be browsed through the Somatic Clone Viewer. In-house scripts for genomic analyses and simulation studies are available on GitHub.


  1. Wamaitha, S. E. & Niakan, K. K. Human pre-gastrulation development. Curr Top Dev Biol 128, 295–338 (2018).

  2. Sender, R., Fuchs, S. & Milo, R. Revised estimates for the number of human and bacteria cells in the body. PLoS Biol. 14, e1002533 (2016).

  3. Gasser, R. F., Cork, R. J., Stillwell, B. J. & McWilliams, D. T. Rebirth of human embryology. Dev. Dyn. 243, 621–628 (2014).

  4. Nakamura, T. et al. A developmental coordinate of pluripotency among mice, monkeys and humans. Nature 537, 57–62 (2016).

  5. Rossant, J. Mouse and human blastocyst-derived stem cells: vive les differences. Development 142, 9–12 (2015).

  6. Xiang, L. et al. A developmental landscape of 3D-cultured human pre-gastrulation embryos. Nature 577, 537–542 (2020).

  7. Shahbazi, M. N. & Zernicka-Goetz, M. Deconstructing and reconstructing the mouse and human early embryo. Nat. Cell Biol. 20, 878–887 (2018).

  8. Samuels, M. E. & Friedman, J. M. Genetic mosaics and the germ line lineage. Genes (Basel) 6, 216–237 (2015).

  9. Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).

  10. Ju, Y. S. et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature 543, 714–718 (2017).

  11. Behjati, S. et al. Genome sequencing of normal cells reveals developmental lineages and mutational processes. Nature 513, 422–425 (2014).

  12. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366, 883–892 (2012).

  13. Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).

  14. Lodato, M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015).

  15. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).

  16. Coorens, T. H. H. et al. Extensive phylogenies of human development reveal variable embryonic patterns. Nature (2021).

  17. Sulston, J. E., Schierenberg, E., White, J. G. & Thomson, J. N. The embryonic cell lineage of the nematode Caenorhabditis elegans. Dev. Biol. 100, 64–119 (1983).

  18. Hardy, K., Handyside, A. H. & Winston, R. M. The human blastocyst: cell number, death and allocation during late preimplantation development in vitro. Development 107, 597–604 (1989).

  19. Sancho, M. et al. Competitive interactions eliminate unfit embryonic stem cells at the onset of differentiation. Dev. Cell 26, 19–30 (2013).

  20. Biggins, J. S., Royer, C., Watanabe, T. & Srinivas, S. Towards understanding the roles of position and geometry on cell fate decisions during preimplantation development. Semin. Cell Dev. Biol. 47-48, 74–79 (2015).

  21. Spencer Chapman, M. et al. Lineage tracing of human development through somatic mutations. Nature 595, 85–90 (2021).

  22. Wennekamp, S., Mesecke, S., Nédélec, F. & Hiiragi, T. A self-organization framework for symmetry breaking in the mammalian embryo. Nat. Rev. Mol. Cell Biol. 14, 452–459 (2013).

  23. Blakeley, P. et al. Defining the three cell lineages of the human blastocyst by single-cell RNA-seq. Development 142, 3151–3165 (2015).

  24. Schulz, K. N. & Harrison, M. M. Mechanisms regulating zygotic genome activation. Nat. Rev. Genet. 20, 221–234 (2019).

  25. Gardner, R. L. Normal bias in the direction of fetal rotation depends on blastomere composition during early cleavage in the mouse. PLoS ONE 5, e9610 (2010).

  26. McKenna, A. et al. Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science 353, aaf7907 (2016).

  27. Alemany, A., Florescu, M., Baron, C. S., Peterson-Maduro, J. & van Oudenaarden, A. Whole-organism clone tracing using single-cell sequencing. Nature 556, 108–112 (2018).

  28. Shlush, L. I. Age-related clonal hematopoiesis. Blood 131, 496–504 (2018).

  29. Wai, T., Teoli, D. & Shoubridge, E. A. The mitochondrial DNA genetic bottleneck results from replication of a subpopulation of genomes. Nat. Genet. 40, 1484–1488 (2008).

  30. Cummins, J. M. The role of maternal mitochondria during oogenesis, fertilization and embryogenesis. Reprod. Biomed. Online 4, 176–182 (2002).

  31. Floros, V. I. et al. Segregation of mitochondrial DNA heteroplasmy through a developmental genetic bottleneck in human embryos. Nat. Cell Biol. 20, 144–151 (2018).

  32. Coller, H. A. et al. High frequency of homoplasmic mitochondrial DNA mutations in human tumors can be explained without selection. Nat. Genet. 28, 147–150 (2001).

  33. Wonnapinij, P., Chinnery, P. F. & Samuels, D. C. The distribution of mitochondrial DNA heteroplasmy due to random genetic drift. Am. J. Hum. Genet. 83, 582–593 (2008).

  34. Sanders, M. A. et al. Life without mismatch repair. Preprint at (2021).

  35. Thompson, D. J. et al. Genetic predisposition to mosaic Y chromosome loss in blood. Nature 575, 652–657 (2019).

  36. Lee, J. J.-K. et al. Tracing oncogene rearrangements in the mutational history of lung adenocarcinoma. Cell 177, 1842–1857.e21 (2019).

  37. Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).

  38. Brunner, S. F. et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 574, 538–542 (2019).

  39. Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).

  40. Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646 (2020).

  41. Moore, L. et al. The mutational landscape of human somatic and germline cells. Nature (2021).

  42. Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).

  43. Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).

  44. Yokoyama, A. et al. Age-related remodelling of oesophageal epithelia by mutated cancer drivers. Nature 565, 312–317 (2019).

  45. Zhu, M. et al. Somatic mutations increase hepatic clonal fitness and regeneration in chronic liver disease. Cell 177, 608–621.e12 (2019).

  46. Fasching, L. et al. Early developmental asymmetries in cell lineage trees in living individuals. Science 371, 1245–1248 (2021).

  47. Jones, G. E. & Wise, C. J. Establishment, maintenance, and cloning of human dermal fibroblasts. Methods Mol. Biol. 75, 13–21 (1997).

  48. Rittié, L. & Fisher, G. J. Isolation and culture of skin fibroblasts. Methods Mol. Med. 117, 83–98 (2005).

  49. Vangipuram, M., Ting, D., Kim, S., Diaz, R. & Schüle, B. Skin punch biopsy explant culture for derivation of primary human fibroblasts. J. Vis. Exp. (77), e3779 (2013).

  50. Spinazzola, J. M. & Gussoni, E. Isolation of primary human skeletal muscle cells. Bio Protoc. 7, e2591 (2017).

  51. Oh, J. W. et al. A guide to studying human hair follicle cycling in vivo. J. Invest. Dermatol. 136, 34–44 (2016).

  52. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  53. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  54. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics 11, 11.10.1–11.10.33 (2013).

  55. Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).

  56. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).

  57. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).

  58. Gusfield, D. Efficient algorithms for inferring evolutionary trees. Networks 21, 19–28 (1991).

  59. Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T.-Y. GGTREE: An R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. (2016).

  60. Strnad, P. et al. Inverted light-sheet microscope for imaging mouse pre-implantation development. Nat. Methods 13, 139–142 (2016).

  61. Bertorelle, G., Benazzo, A. & Mona, S. ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol. Ecol. 19, 2609–2625 (2010).

  62. Zernicka-Goetz, M., Morris, S. A. & Bruce, A. W. Making a firm decision: multifaceted regulation of cell fate in the early mouse embryo. Nat. Rev. Genet. 10, 467–477 (2009).

  63. Csilléry, K., François, O. & Blum, M. G. B. abc: an R package for approximate Bayesian computation (ABC). Methods Ecol. Evol. 3, 475–479 (2012).

  64. Krijthe, J. H. Rtsne: t-distributed stochastic neighbor embedding using a Barnes–Hut implementation (R package, 2015).

  65. van den Ameele, J., Li, A. Y. Z., Ma, H. & Chinnery, P. F. Mitochondrial heteroplasmy beyond the oocyte bottleneck. Semin. Cell Dev. Biol. 97, 156–166 (2020).

  66. Taylor, R. W. & Turnbull, D. M. Mitochondrial DNA mutations in human disease. Nat. Rev. Genet. 6, 389–402 (2005).

  67. Hickerson, M. J., Stahl, E. & Takebayashi, N. msBayes: pipeline for testing comparative phylogeographic histories using hierarchical approximate Bayesian computation. BMC Bioinformatics 8, 268 (2007).

  68. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

  69. Borchers, H. W. pracma: practical numerical math functions (R package, 2019).

  70. Favero, F. et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann. Oncol. 26, 64–70 (2015).

  71. Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).


We are deeply indebted to the individuals who donated their cells and tissues for this study. We thank M. R. Stratton, P. J. Campbell, T. H. H. Coorens, R. Rahbari, L. Moore, A. Cagan and M. V. Plikus for their fruitful comments and discussions. We thank J.-Y. Shin, M. S. Jun, H. Jung, J. H. Lee, H. S. Lee, J. Y. Jeon, J. H. Jeon, S. Cho and J. S. Lee for their methodological advice and technical assistance. This work was supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute funded by the Ministry of Health and Welfare of Korea (HI17C1836 to Y.S.J.), the Suh Kyungbae Foundation (SUHF-18010082 to Y.S.J.), and the National Research Foundation (NRF) of Korea funded by the Korean Government (NRF-2020R1A3B2078973 to Y.S.J.; NRF-2019R1I1A3A01060675, NRF-2020R1A5A2017323 and NRF-2021R1C1C1014425 to J.W.O.; NRF-2020R1A6A3A01100621 to S.P.; and NRF-2019H1D3A2A02061168 to S.Y.K.).

Author information

Authors and Affiliations



Y.S.J., J.W.O. and S.P. conceived the study; N.M.M. designed the warm autopsies and developed the entire protocol of the clonal expansion and bulk tissue preparation with help from J.W.O.; J.W.O., N.M.M., J.-W.C., J.M.P., D.K., J.H.C., S.G.K., J.H.H., M.K., D.S.K., J.Y.P., K.Y., T.K., J.Y. and Y.A. conducted autopsies, tissue sampling and clonal expansions. S.A.O. conducted DNA work. S.P. and R.K. conducted most of the genome and statistical analyses with contributions from S.Y.K. and Y.S.J.; J. Lee and J.W.P. contributed to large-scale genome data management. J. Lim conducted mutational signature analysis. Y.K. and D.H. constructed the web tool (SCV). Y.S.J., S.P., R.K., N.M.M. and J.W.O. wrote the manuscript with contributions from all the authors. Y.S.J. and J.W.O. supervised the study.

Corresponding authors

Correspondence to Ji Won Oh or Young Seok Ju.

Ethics declarations

Competing interests

Y.S.J. is a founder and chief executive officer of GENOME INSIGHT Inc. J.W.O. is a founder and chief executive officer of Immune Square Inc.

Additional information

Peer review information Nature thanks Chloé Baron, Aaron Mckenna and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Fig. 1 Sample collection and phylogeny reconstruction.

a, A summary of the seven warm autopsies, in which clones (dots) and bulk tissues were collected. Internal organs contributing to bulk tissues are annotated (black line). The information can also be browsed in the Somatic Clone Viewer. Detailed information for each sample is given in Supplementary Discussion 1 and in Supplementary Tables 2 and 3 for clones and bulk tissues, respectively. b, A scatter plot showing the peak VAF and mean coverage of the WGS of all the clones established in this study. Clones excluded owing to multiclonal origins and/or atypical VAF peaks are coloured in red. In total, 334 clones were included in the downstream analyses. c, A schematic illustration of our approach to reconstructing a developmental phylogenetic tree. Let S = [s1, s2, …, s5] be the set of 5 clones, and G = [g1, g2, …, g23] be the union set of mutations detected in one or more clones from the same individual. We then build a matrix M with rows labelled g1, g2, …, g23 and columns labelled s1, s2, …, s5. If somatic mutation gi is determined from its VAF to be truly present in clone sj, Mij is set to 1 (blue tile); otherwise it is set to 0 (white tile). After removing germline variants (G0), we grouped all mutations with the same sharing pattern between clones into a mutation group. With 8 distinct mutation groups in this example, the mutation matrix M8×5 is defined such that each row represents a mutation group and each column represents a clone. From the mutation matrix M8×5, we reconstructed a phylogenetic tree. d, Mutation matrices constructed from all clones and all the detected embryonic mutations of the five individuals.
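The grouping step described in panel c can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the clone names, the toy mutation calls, and the helpers `group_mutations`, `is_perfect_phylogeny` and `build_tree` are all hypothetical, and the tree step assumes that sharing patterns are always nested or disjoint (a perfect phylogeny).

```python
from collections import defaultdict

def group_mutations(calls, clones):
    """Group mutations by the set of clones that carry them.

    calls maps a mutation id to the set of clones in which it was called
    present. Mutations present in every clone are treated as germline-like
    and dropped, as are mutations present in no clone.
    """
    all_clones = frozenset(clones)
    groups = defaultdict(list)
    for mutation, carriers in calls.items():
        pattern = frozenset(carriers)
        if not pattern or pattern == all_clones:
            continue
        groups[pattern].append(mutation)
    return {pattern: sorted(muts) for pattern, muts in groups.items()}

def is_perfect_phylogeny(patterns):
    """True if every pair of sharing patterns is nested or disjoint."""
    pats = list(patterns)
    for i, a in enumerate(pats):
        for b in pats[i + 1:]:
            if (a & b) and not (a <= b or b <= a):
                return False
    return True

def build_tree(groups):
    """Map each mutation group to its parent group (None for earliest branches).

    The parent of a group is the smallest group whose clone set strictly
    contains it.
    """
    ordered = sorted(groups, key=len)
    parent = {}
    for i, pattern in enumerate(ordered):
        supersets = [q for q in ordered[i + 1:] if pattern < q]
        parent[pattern] = min(supersets, key=len) if supersets else None
    return parent

# Hypothetical toy calls for five clones (not data from the study).
clones = ["s1", "s2", "s3", "s4", "s5"]
calls = {
    "g0": {"s1", "s2", "s3", "s4", "s5"},  # germline-like, removed
    "g1": {"s1", "s2", "s3"},
    "g2": {"s1", "s2", "s3"},
    "g3": {"s4", "s5"},
    "g4": {"s1", "s2"},
    "g5": {"s1"},
    "g6": {"s4"},
}
groups = group_mutations(calls, clones)
tree = build_tree(groups) if is_perfect_phylogeny(groups) else None
```

In this toy case the two earliest branches are the groups shared by {s1, s2, s3} and by {s4, s5}, and each later group hangs from the smallest group that contains it.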

Extended Data Fig. 2 Detecting EEMs.

a, Examples of embryonic mutations found in DB3. A heatmap in the upper panel shows the VAFs of the early mutations detected in the capture phase. Integrative Genomics Viewer screenshots for two early mutations in the WGS of four clones and a polyclonal blood sample are shown in the lower panel. b, The aggregated VAFs for the L1 and L2 mutation sets in bulk tissues (dots). Medians and interquartile ranges (IQRs) are shown in boxplots with whiskers (1.5 × IQR). c, A scatter plot showing the number of clones established and the number of early mutations discovered. The numbers of base substitutions (triangles) and indels (circles) are shown separately. Red lines and shaded areas represent fitted linear regression lines and 95% confidence intervals.

Extended Data Fig. 3 Features of EEMs.

a, The correlation between the numbers of early base substitutions and indels. The red line and shaded area represent the fitted linear regression line and its 95% confidence interval. b, An example of a microsatellite length-changing mutation identified in the L1 branch of DB3. In this study, microsatellite regions were defined as five or more repeats of a 1–6-nucleotide unit in the reference genome. c, Mutational spectra of early and late mutations found in the study. The signatures of the early base substitutions (n=488) and indels (n=49) are delineated by version 3 COSMIC signatures (top). For late mutations, we categorized clones into two groups by the amount of ultraviolet (UV) light-mediated mutations, (1) clones with prevalent UV-mediated mutations and (2) clones lacking UV-mediated mutations, using a 5% cutoff for the proportion of SBS7 mutations. The middle panel displays the mutational spectra of base substitutions (n=74,824) and indels (n=3,404) in clones with SBS7 ≤5% (without UV exposure), while the bottom panel displays the spectra of substitutions (n=1,457,489) and indels (n=31,805) in clones with SBS7 >5% (with UV exposure).
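The microsatellite definition in panel b (five or more repeats of a 1–6-nucleotide unit) can be expressed as a regular expression. The sketch below is only an illustration of that definition on a short string; the function name `find_microsatellites` and the example sequence are hypothetical, and the authors' actual annotation of the reference genome may differ in details such as how overlapping or impure repeats are handled.

```python
import re

# A unit of 1-6 bases (shortest unit preferred) followed by at least
# four further copies of the same unit, i.e. five or more copies in total.
MICROSAT = re.compile(r"([ACGT]{1,6}?)\1{4,}")

def find_microsatellites(seq):
    """Return (start, end, unit, copies) for each non-overlapping repeat tract."""
    hits = []
    for match in MICROSAT.finditer(seq.upper()):
        unit = match.group(1)
        copies = (match.end() - match.start()) // len(unit)
        hits.append((match.start(), match.end(), unit, copies))
    return hits

# Hypothetical example: five copies of "AC" flanked by other bases.
example_hits = find_microsatellites("TTACACACACACGG")
```

Coordinates are 0-based and end-exclusive, following Python's `re` match spans.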

Extended Data Fig. 4 Lineage relationship among physically adjacent clones.

a, Lineages of physically adjacent clones, established less than 1 cm apart, are shown with early phylogenies. Clone pairs from the same late lineage are coloured in red. b, The distribution of distance scores between clones from the same anatomical region. The distance score was calculated as 1/(number of shared mutations + 1). The random distributions (density plots) were generated by randomly assigning clones to lineages on the established phylogenies. The red line represents the observed mean distance score of the clones. Empirical p-values from simulation (n=1,000) are shown.
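The distance score and the randomization in panel b can be sketched as follows. This is a simplified illustration, not the authors' code: the clone names, mutation sets and region labels are hypothetical, the randomization is implemented as a shuffle of mutation profiles among clones, and the p-value is taken as the fraction of shuffles whose mean score is at most the observed one, which is one plausible reading of "empirical p-value".

```python
import random
from itertools import combinations

def distance_score(muts_a, muts_b):
    """1 / (number of shared mutations + 1), as defined in the caption."""
    return 1.0 / (len(muts_a & muts_b) + 1)

def mean_within_region_score(lineage, region):
    """Mean distance score over all clone pairs from the same region."""
    scores = [
        distance_score(lineage[a], lineage[b])
        for a, b in combinations(sorted(lineage), 2)
        if region[a] == region[b]
    ]
    return sum(scores) / len(scores)

def empirical_p(lineage, region, n_sim=1000, seed=0):
    """Compare the observed mean score with means under random assignment."""
    rng = random.Random(seed)
    observed = mean_within_region_score(lineage, region)
    clones = sorted(lineage)
    profiles = [lineage[c] for c in clones]
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(profiles)  # reassign mutation profiles to clones at random
        shuffled = dict(zip(clones, profiles))
        if mean_within_region_score(shuffled, region) <= observed + 1e-12:
            hits += 1
    return observed, hits / n_sim

# Hypothetical toy input: two clones per region, sharing early mutations.
lineage = {
    "c1": {"m1", "m2"},
    "c2": {"m1", "m2"},
    "c3": {"m3"},
    "c4": {"m3", "m4"},
}
region = {"c1": "arm", "c2": "arm", "c3": "leg", "c4": "leg"}
observed_score, p_value = empirical_p(lineage, region)
```

A low observed score relative to the shuffled distribution indicates that clones from the same region share more mutations than expected by chance.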

Extended Data Fig. 5 The patterns of early phylogenies.

a, Annotated phylogenetic trees of DB3, DB6, DB8, DB9 and DB10. Dichotomy (bifurcation) and polytomy (multifurcation) nodes are indicated by black-filled and hollow circles, respectively. At bifurcation nodes, the two daughter lineages are coloured in red (major) or blue (minor) according to their relative contribution in the phylogenies. b, A schematic illustration of informative and uninformative cell divisions. In contrast to a cell division accompanied by spontaneous mutations (informative division, left of the dashed line), a cell division without an intrinsic mutation cannot be reflected in our phylogenetic tree (uninformative division, right of the dashed line) owing to a lack of ‘cellular barcodes’. c, A schematic illustration showing the effect of the initial mutation rate on the pattern of the trees. d, The unequal contribution of the two earliest branches (L1 and L2), consistently found in phylogenies and bulk tissues. Pie graphs represent the proportion of each lineage counted in the phylogenetic trees. Horizontal bar graphs show the VAFs of the lineage-specific mutations in targeted sequencing of bulk tissues. VAFs expected from the phylogenies are shown by red dashed lines. e, The correlation between the VAFs of the early mutations expected from the phylogenies (x-axis) and those observed in bulk tissues (y-axis). Medians and interquartile ranges (IQRs) are shown in boxplots with whiskers (1.5 × IQR). The red line shows y = x for comparison. f, The imbalance ratio at bifurcating nodes, defined as the ratio between the numbers of clones in the major and minor lineages. Late-branched clones from the same lineage were counted as a single clone.
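The two quantities in panels d to f can be illustrated with a short calculation. The sketch below assumes that a mutation carried by a fraction of cells on one of two chromosome copies has an expected VAF of half that fraction; this is a common convention for heterozygous autosomal variants and is an assumption here, not a statement of the authors' exact formula. The clone counts are hypothetical.

```python
def expected_vaf(n_lineage_clones, n_total_clones, ploidy=2):
    """Expected VAF of a lineage-defining mutation from clone counts.

    Assumes the mutation sits on one of `ploidy` chromosome copies and that
    the phylogeny samples cells in proportion to their contribution.
    """
    return n_lineage_clones / n_total_clones / ploidy

def imbalance_ratio(n_clones_a, n_clones_b):
    """Ratio of clone counts in the major versus the minor daughter lineage."""
    major, minor = max(n_clones_a, n_clones_b), min(n_clones_a, n_clones_b)
    if minor == 0:
        raise ValueError("both daughter lineages need at least one clone")
    return major / minor

# Hypothetical counts: 30 clones descend from L1 and 10 from L2.
l1_vaf = expected_vaf(30, 40)    # 0.375
l2_vaf = expected_vaf(10, 40)    # 0.125
ratio = imbalance_ratio(30, 10)  # 3.0
```

Under this convention the expected VAFs of the two earliest lineages sum to 0.5, which gives a quick consistency check against bulk-tissue measurements.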

Extended Data Fig. 6 Cellular bottleneck and phylogenetic tree.

a, A developmental model showing lineage imbalance in the epiblast as the origin of the global unequal contribution of L1 and L2. This model assumes that n cells are selected for the epiblast at the s-cell stage. We presume two different mutation rates in early embryogenesis: R≤2 and R>2 are the mutation rates up to and after the two-cell stage, respectively. b, A cellular genealogy scenario that can explain the features of early phylogenetic trees. Assuming that the mutation rate is constant, a longer branch results from the lineage that contributes less to the embryo. c, The impact of stochastic cellular segregation during embryogenesis on early developmental phylogeny. This illustration shows the consequence of biased selection on cellular phylogeny. If all of the epiblast cells are derived from one cell of the two-cell stage, mutations accumulated at the first division are shared among all embryonic cells. A high initial mutation rate could be masked in this situation.

Extended Data Fig. 7 Imbalanced distribution of early embryonic cells in adult body axes.

a, The phylogenetic trees of five individuals coloured by the significance levels of imbalances between the left and right tissues. Mutations with higher VAF in left tissues are coloured in red, while those with higher VAF in right tissues are coloured in blue. Left-right clonal imbalances are frequently seen from very early lineages (<10 early mutations) in all the individuals. b, t-SNE clustering of bulk tissues using the VAFs of the embryonic mutations for four individuals (DB6 is shown in Fig. 3b). The separation of left and right tissues is shown in the top panels. Other information, such as anatomical regions and organs, is shown in the bottom panels. c, A schematic illustration showing the emergence of the primitive streak and formation of the left-right axis in early embryogenesis. d, Box plots illustrating the median VAFs and interquartile ranges (IQRs) with whiskers (1.5 × IQR) of L1 and L2 mutations in cranial or caudal tissues. There was no significant VAF difference between cranial- and caudal-side tissues (two-sided Wilcoxon test). NS, not significant.

Extended Data Fig. 8 Imbalanced distribution of early embryonic cells in adult anatomical regions.

a, Median VAFs of the early mutations in the bulk tissues according to their dominant germ layers. The horizontal axis shows early mutations sorted by their average VAFs in bulk tissues in descending order, approximately from earlier to later mutations. Tissues with mixed germ layers are excluded from this figure. The lines are curves fitted by locally estimated scatterplot smoothing (LOESS). b, The phylogenetic trees coloured by the significance of imbalances between ectodermal and meso-endodermal tissues. DB3 was not suitable for the analysis owing to a lack of sequenced ectodermal tissues. Comparisons were performed by two-sided Wilcoxon tests. Red asterisks indicate the estimated branching point of the ectoderm-dominant lineage. c, The phylogenetic trees of four individuals (DB3, DB6, DB8 and DB10) coloured by the VAF in blood tissues. An equivalent figure for DB9 is shown in Fig. 3d. Red asterisks indicate the estimated branching point of the blood-enriched lineages. The blue asterisk in DB10 indicates the major lineage of contaminating tumour cells in blood. d, Ratios of the VAFs of embryonic mutations observed in blood tissues to those expected from phylogenetic trees. The molecular time of embryogenesis is shown on the x-axis as the number of mutations. Dots in direct lineages are linked by lines. e, Histograms showing the number of blood-specific mutations (absent in phylogenetic trees). No blood-specific mutations were found in DB3 (n=0). DB10 is excluded owing to tumour contamination in the blood. f, The phylogenetic tree of DB10 coloured by VAF in cancer tissue, with the branching point of the ancestral cell of the breast cancer shown by a red asterisk. g, A schematic representation of the clonal imbalances and their timing in early embryogenesis.

Extended Data Fig. 9 Heteroplasmic mtDNA variants in fertilized eggs.

a, VAFs of the MT:16,256 C>T substitution (frequently found in DB10 clones) in the WGS of 279 clones explored in the study. Applying a VAF cutoff of 0.5%, the variant was detected only in 14 clones, all established from DB10. b, A developmental model for inferring the heteroplasmic level of a mitochondrial variant in a fertilized egg. We assumed that f% of the mtDNA in a fertilized egg carries a functionally neutral mtDNA variant (mutant-mtDNA), which randomly segregates to daughter cells in successive cell divisions. Two summary statistics were drawn from this model: (1) the proportion of samples harbouring mutant-mtDNA (p), and (2) the median heteroplasmic level of mutant-mtDNA (h). We compared the summary statistics (p, h) of each simulation to the observed summary statistics, and constructed the posterior distribution of f using the neural network regression algorithm of approximate Bayesian computation. For details, see the Methods section. c, A histogram of the null distribution of the statistic for the goodness-of-fit test assuming our model. d, Cross-validation to assess the accuracy of parameter inference. e, A possible scenario underlying the recurrent mtDNA mutation. A mitochondrial bottleneck during cleavage and random segregation of mtDNA during mitosis may underlie the early mtDNA variant. f, mtDNA variants and their heteroplasmic levels in the fertilized egg, with 95% confidence intervals, estimated by simulation (n=500,000).
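The inference in panel b can be illustrated with a much simpler form of approximate Bayesian computation. The sketch below swaps the neural network regression step for plain rejection sampling (keeping the simulations closest to the observed statistics), and every number in it is a placeholder: the mtDNA copy number per cell, the number of divisions, the uniform prior on f, and the observed (p, h) pair are all hypothetical. It also simulates each clone independently from the zygote, ignoring shared ancestry, and takes h as the median among clones where the variant is detected; both are simplifying assumptions.

```python
import random
import statistics

def simulate_clones(f, n_clones, n_divisions, copies, rng):
    """Heteroplasmy level per clone after random segregation of mtDNA.

    At each division the daughter draws `copies` mtDNA molecules at random
    from a pool whose mutant fraction equals the mother's current level.
    """
    levels = []
    for _ in range(n_clones):
        frac = f
        for _ in range(n_divisions):
            mutant = sum(rng.random() < frac for _ in range(copies))
            frac = mutant / copies
        levels.append(frac)
    return levels

def summary(levels, cutoff=0.005):
    """Return (p, h): detection proportion and median level among detected."""
    detected = [x for x in levels if x >= cutoff]
    p = len(detected) / len(levels)
    h = statistics.median(detected) if detected else 0.0
    return p, h

def abc_rejection(observed, n_sims, keep, n_clones, n_divisions, copies, seed=0):
    """Keep the `keep` prior draws of f whose (p, h) lie closest to `observed`."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        f = rng.uniform(0.0, 1.0)  # assumed flat prior on the zygote level
        p, h = summary(simulate_clones(f, n_clones, n_divisions, copies, rng))
        dist = ((p - observed[0]) ** 2 + (h - observed[1]) ** 2) ** 0.5
        draws.append((dist, f))
    draws.sort()
    return [f for _, f in draws[:keep]]

# Hypothetical observed statistics and model settings (not from the study).
observed = (0.5, 0.05)
posterior = abc_rejection(observed, n_sims=200, keep=20,
                          n_clones=14, n_divisions=6, copies=100)
posterior_median = statistics.median(posterior)
```

With realistic settings the number of simulations would need to be far larger, and regression-adjusted ABC, as named in the caption, generally uses the simulations more efficiently than plain rejection.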

Extended Data Fig. 10 Features of late mutations.

a, Scatter plot showing the correlation between the number of total mutations and recurrent mutations. b, Total number and signature of somatic mutations in each of the 334 clones. The horizontal axis lists clones in decreasing order of total mutation number. c, Linear correlation between the numbers of SBS7 (UV-mediated) and SBS5 (endogenous, clock-like) mutations in skin fibroblast clones. Approximately one additional SBS5 mutation is acquired per ten SBS7 mutations. The red line represents the result of linear regression. d, The mutational spectrum of late recurrent base substitutions (n=619). e, Massive heterogeneity in UV-mediated mutational burden among clones established from closely adjacent anatomical locations (inter-clonal distance < ~1 cm; top). The number of clones in each location is illustrated at the bottom. f, The rate of purely endogenous mutations in skin fibroblasts, showing a linear correlation with age (24.3 substitutions per year). The red line represents the result of linear regression. The median numbers of endogenous substitutions with interquartile ranges (IQRs) from clones are drawn as boxplots (whiskers = 1.5 × IQR) and scatter plots.

Extended Data Fig. 11 Somatic copy-number changes in normal cells.

a, Bar plot showing the frequency of large-scale copy-number alterations (> 10 Mb) per segment type detected in clones. b, Bar plot showing the counts of whole-chromosomal copy-number changes per chromosome. c, Timing estimation of the copy-number gains (> 50 Mb) observed in the clones. Ages at death are shown in red dashed lines. fb., fibroblast.

Supplementary information

Supplementary Information

This file contains Supplementary Discussion 1-9, including Supplementary Figures 1-6, and additional references.

Reporting Summary

Peer Review File

Supplementary Table 1

Demographic features, clone number, and mutation rate of the participants.

Supplementary Table 2

Cell types, tissue locations, and mutational features of 334 single-cell clones.

Supplementary Table 3

Tissue type, location, and t-SNE information of 379 bulk tissues recapturing early mutations via targeted deep sequencing.

Supplementary Table 4

Information on whole-genome sequences for bulk tissues.

Supplementary Table 5

No. of somatic variants and phylogenetic structure of cellular lineages.

Supplementary Table 6

List of early embryonic mutations detected in this study.

Supplementary Table 7

Number of variants at each filtering step.

Cite this article

Park, S., Mali, N.M., Kim, R. et al. Clonal dynamics in early human embryogenesis inferred from somatic mutation. Nature 597, 393–397 (2021).
