Abstract
Quantifying the number of progenitor cells that found an organ, tissue or cell population is of fundamental importance for understanding the development and homeostasis of a multicellular organism. Previous efforts have relied on marker genes that are specifically expressed in progenitors. This strategy is, however, often hindered by the lack of ideal markers. Here we propose a general statistical method to quantify the progenitors of any tissue or cell population in an organism, even in the absence of progenitor-specific markers, by exploring the cell phylogenetic tree that records the cell division history during development. The method, termed targeting coalescent analysis (TarCA), computes the probability that two randomly sampled cells of a tissue coalesce within the tissue-specific monophyletic clades. The inverse of this probability then serves as a measure of the progenitor number of the tissue. Both mathematical modeling and computer simulations demonstrated the high accuracy of TarCA, which was then validated using real data from nematode, fruit fly and mouse, all with related cell phylogenetic trees. We further showed that TarCA can be used to identify lineage-specific upregulated genes during embryogenesis, revealing incipient cell fate commitments in mouse embryos.
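The core idea of the abstract (a coalescence probability whose inverse measures the progenitor number) can be illustrated with a minimal Python sketch. It is not the authors' implementation (their code is in the TarCA repository). It assumes that the tissue-specific monophyletic clades have already been extracted from the cell phylogeny and are summarized by the number of sampled tissue cells in each clade, and that two cells are drawn uniformly without replacement. The function names `coalescence_probability` and `estimate_progenitor_number` are hypothetical.

```python
from math import comb


def coalescence_probability(clade_sizes):
    """Probability that two sampled cells fall in the same monophyletic clade.

    clade_sizes: number of sampled tissue cells in each tissue-specific
    monophyletic clade (an assumed, pre-computed summary of the tree).
    """
    total = sum(clade_sizes)
    if total < 2:
        raise ValueError("at least two sampled cells are required")
    # Number of cell pairs that share a clade, over all possible cell pairs.
    pairs_within = sum(comb(n, 2) for n in clade_sizes)
    return pairs_within / comb(total, 2)


def estimate_progenitor_number(clade_sizes):
    """Inverse of the coalescence probability (assumed simple estimator)."""
    p = coalescence_probability(clade_sizes)
    if p == 0:
        # No two sampled cells share a clade, so the estimate is unbounded.
        return float("inf")
    return 1.0 / p


# Four equally sized clades of five cells each.
print(estimate_progenitor_number([5, 5, 5, 5]))
```

With equally sized clades the estimate approaches the number of clades as sampling grows. Unequal clade sizes pull the estimate below the raw clade count, which is one way such a pairwise measure differs from simply counting monophyletic clades.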
Data availability
The Yuan et al. dataset of C. elegans and P. marina is available at https://github.com/helloicyvodka/DELTA_code. Cell phylogenies of D. melanogaster can be found in the source data files of Liu et al. (https://doi.org/10.1038/s41592-021-01325-x). The cell phylogenies of mouse embryogenesis are available in the Gene Expression Omnibus database under accession number GSE117542, and the annotation data from Pijuan-Sala et al. are available via http://tome.gs.washington.edu. Simulated phylogenies constructed by Fang et al. are available at https://doi.org/10.5281/zenodo.7112097. The hematopoiesis data can be accessed at https://cospar.readthedocs.io/. All source data supporting the findings of the present study are available at https://github.com/shadowdeng1994/TarCA_sourcedata. Source data are provided with this paper.
Code availability
All custom code for processing the data is available at https://github.com/shadowdeng1994/TarCA.
References
Frankham, R. Relationship of genetic variation to population size in wildlife. Conserv. Biol. 10, 1500–1508 (1996).
Charlesworth, B. Effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10, 195–205 (2009).
Goodell, M. A., Nguyen, H. & Shroyer, N. Somatic stem cell heterogeneity: diversity in the blood, skin and intestinal stem cell compartments. Nat. Rev. Mol. Cell Biol. 16, 299–309 (2015).
Bhartiya, D. et al. Evolving definition of adult stem/progenitor cells. Stem Cell Rev. Rep. 15, 456–458 (2019).
Kimura, M. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, 1983).
Beerman, I. & Rossi, D. J. Epigenetic control of stem cell potential during homeostasis, aging, and disease. Cell Stem Cell 16, 613–625 (2015).
Forsberg, L. A., Gisselsson, D. & Dumanski, J. P. Mosaicism in health and disease—clones picking up speed. Nat. Rev. Genet. 18, 128–142 (2017).
Risques, R. A. & Kennedy, S. R. Aging and the rise of somatic cancer-associated mutations in normal tissues. PLoS Genet. 14, e1007108 (2018).
Mitchell, E. et al. Clonal dynamics of haematopoiesis across the human lifespan. Nature 606, 343–350 (2022).
Li, R. Y. et al. A body map of somatic mutagenesis in morphologically normal human tissues. Nature 597, 398–403 (2021).
Coorens, T. H. H. et al. Extensive phylogenies of human development inferred from somatic mutations. Nature 597, 387–392 (2021).
Moore, L. et al. The mutational landscape of human somatic and germline cells. Nature 597, 381–386 (2021).
Park, S. et al. Clonal dynamics in early human embryogenesis inferred from somatic mutation. Nature 597, 393–397 (2021).
Clevers, H. & Watt, F. M. Defining adult stem cells by function, not by phenotype. Annu. Rev. Biochem. 87, 1015–1027 (2018).
Das, R. N. & Yaniv, K. Discovering new progenitor cell populations through lineage tracing and in vivo imaging. Cold Spring Harb. Perspect. Biol. 12, a035618 (2020).
VanHorn, S. & Morris, S. A. Next-generation lineage tracing and fate mapping to interrogate development. Dev. Cell 56, 7–21 (2021).
Kretzschmar, K. & Watt, F. M. Lineage tracing. Cell 148, 33–45 (2012).
Shu, H. S. et al. Tracing the skeletal progenitor transition during postnatal bone formation. Cell Stem Cell 28, 2122–2136 (2021).
Cattaneo, P. et al. Parallel lineage-tracing studies establish fibroblasts as the prevailing in vivo adipocyte progenitor. Cell Rep. 30, 571–582 (2020).
Wei, Y. et al. Liver homeostasis is maintained by midlobular zone 2 hepatocytes. Science 371, eabb1625 (2021).
Liu, K., Jin, H. & Zhou, B. Genetic lineage tracing with multiple DNA recombinases: a user’s guide for conducting more precise cell fate mapping studies. J. Biol. Chem. 295, 6413–6424 (2020).
He, L. et al. Enhancing the precision of genetic lineage tracing using dual recombinases. Nat. Med. 23, 1488–1498 (2017).
Farrell, J. A. et al. Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis. Science 360, eaar3131 (2018).
Wagner, D. E. & Klein, A. M. Lineage tracing meets single-cell omics: opportunities and challenges. Nat. Rev. Genet. 21, 410–427 (2020).
Liu, K. et al. Mapping single-cell-resolution cell phylogeny reveals cell population dynamics during organ development. Nat. Methods 18, 1506–1514 (2021).
McKenna, A. et al. Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science 353, aaf7907 (2016).
Chan, M. M. et al. Molecular recording of mammalian embryogenesis. Nature 570, 77–82 (2019).
Frumkin, D., Wasserstrom, A., Kaplan, S., Feige, U. & Shapiro, E. Genomic variability within an organism exposes its cell lineage tree. PLoS Comput. Biol. 1, 382–394 (2005).
Fu, Y. X. & Li, W. H. Coalescing into the 21st century: an overview and prospects of coalescent theory. Theor. Popul. Biol. 56, 1–10 (1999).
Yao, Z., Liu, K., Deng, S. & He, X. An instantaneous coalescent method insensitive to population structure. J. Genet. Genomics 48, 219–224 (2021).
Hu, Z., Fu, Y. X., Greenberg, A. J., Wu, C. I. & Zhai, W. Age-dependent transition from cell-level to population-level control in murine intestinal homeostasis revealed by coalescence analysis. PLoS Genet. 9, e1003326 (2013).
Werner, B. et al. Measuring single cell divisions in human tissues from multi-region sequencing data. Nat. Commun. 11, 1035 (2020).
Williams, M. J., Werner, B., Barnes, C. P., Graham, T. A. & Sottoriva, A. Identification of neutral tumor evolution across cancer types. Nat. Genet. 48, 238–244 (2016).
Smadbeck, P. & Stumpf, M. P. H. Coalescent models for developmental biology and the spatio-temporal dynamics of growing tissues. J. R. Soc. Interface 13, 20160112 (2016).
Slatkin, M. Simulating genealogies of selected alleles in a population of variable size. Genet. Res. 78, 49–57 (2001).
Stadler, T., Pybus, O. G. & Stumpf, M. P. H. Phylodynamics for cell biologists. Science 371, eaah6266 (2021).
Fang, W. et al. Quantitative fate mapping: a general framework for analyzing progenitor state dynamics via retrospective lineage barcoding. Cell 185, 4604–4620 (2022).
Konno, N. et al. Deep distributed computing to reconstruct extremely large lineage trees. Nat. Biotechnol. 40, 566–575 (2022).
Sulston, J. E., Schierenberg, E., White, J. G. & Thomson, J. N. The embryonic cell lineage of the nematode Caenorhabditis elegans. Dev. Biol. 100, 64–119 (1983).
Houthoofd, W. et al. Embryonic cell lineage of the marine nematode Pellioditis marina. Dev. Biol. 258, 57–69 (2003).
Lehner, C. F., Jacobs, H. W., Sauer, K. & Meyer, C. A. Regulation of the embryonic cell proliferation by Drosophila cyclin D and cyclin E complexes. Novartis Found. Symposia 237, 43–54 (2001).
Aldaz, S. & Escudero, L. M. Imaginal discs. Curr. Biol. 20, 429–431 (2010).
Beira, J. V. & Paro, R. The legacy of Drosophila imaginal discs. Chromosoma 125, 573–592 (2016).
Madhavan, M. M. & Schneiderman, H. A. Histological analysis of the dynamics of growth of imaginal discs and histoblast nests during the larval development of Drosophila melanogaster. Wilhelm. Roux’s Arch. Dev. Biol. 183, 269–305 (1977).
Crews, S. T. Drosophila embryonic CNS development: neurogenesis, gliogenesis, cell fate, and differentiation. Genetics 213, 1111–1144 (2019).
Lengyel, J. A. & Iwaki, D. D. It takes guts: the Drosophila hindgut as a model system for organogenesis. Dev. Biol. 243, 1–19 (2002).
Nakamura, M. et al. Reduced cell number in the hindgut epithelium disrupts hindgut left–right asymmetry in a mutant of pebble, encoding a RhoGEF, in Drosophila embryos. Mech. Dev. 130, 169–180 (2013).
Rivera-Perez, J. A. & Hadjantonakis, A. K. The dynamics of morphogenesis in the early mouse embryo. Cold Spring Harb. Perspect. Biol. 7, a015867 (2015).
Carlson, B. M. Gastrulation and germ layer formation. in Reference Module in Biomedical Sciences (2015).
Qiu, C. et al. Systematic reconstruction of cellular trajectories across mouse embryogenesis. Nat. Genet. 54, 328–341 (2022).
Nowotschin, S. & Hadjantonakis, A. K. Guts and gastrulation: emergence and convergence of endoderm in the mouse embryo. Curr. Top. Dev. Biol. 136, 429–454 (2020).
Smith, R. J. et al. Single-cell chromatin profiling of the primitive gut tube reveals regulatory dynamics underlying lineage fate decisions. Nat. Commun. 13, 2965 (2022).
Nowotschin, S. et al. The emergent landscape of the mouse gut endoderm at single-cell resolution. Nature 569, 361–367 (2019).
Pijuan-Sala, B. et al. A single-cell molecular map of mouse gastrulation and early organogenesis. Nature 566, 490–495 (2019).
Li, L. C. et al. Single-cell patterning and axis characterization in the murine and human definitive endoderm. Cell Res. 31, 326–344 (2021).
Kwon, G. S., Viotti, M. & Hadjantonakis, A. K. The endoderm of the mouse embryo arises by dynamic widespread intercalation of embryonic and extraembryonic lineages. Dev. Cell 15, 509–520 (2008).
Wang, Z. et al. Insulin-like growth factor-1 signaling in lung development and inflammatory lung diseases. BioMed Res. Int. 2018, 6057589 (2018).
Cebola, I. et al. TEAD and YAP regulate the enhancer network of human embryonic pancreatic progenitors. Nat. Cell Biol. 17, 615–626 (2015).
Gao, N., White, P. & Kaestner, K. H. Establishment of intestinal identity and epithelial-mesenchymal signaling by Cdx2. Dev. Cell 16, 588–599 (2009).
Ikonomou, L. & Kotton, D. N. Derivation of endodermal progenitors from pluripotent stem cells. J. Cell. Physiol. 230, 246–258 (2015).
Weinreb, C., Rodriguez-Fraticelli, A., Camargo, F. D. & Klein, A. M. Lineage tracing on transcriptional landscapes links state to fate during differentiation. Science 367, eaaw3381 (2020).
Wang, S. W., Herriges, M. J., Hurley, K., Kotton, D. N. & Klein, A. M. CoSpar identifies early cell fate biases from single-cell transcriptomic and lineage information. Nat. Biotechnol. 40, 1066–1074 (2022).
Chen, C., Liao, Y. & Peng, G. Connecting past and present: single-cell lineage tracing. Protein Cell 13, 790–807 (2022).
Lopez-Otin, C., Blasco, M. A., Partridge, L., Serrano, M. & Kroemer, G. The hallmarks of aging. Cell 153, 1194–1217 (2013).
Simonsen, M., Mailund, T. & Pedersen, C. N. S. in International Workshop on Algorithms in Bioinformatics 113–122 (Springer, 2008).
Hoang, D. T. et al. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol. Biol. 18, 11 (2018).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Yuan, M. et al. Alignment of cell lineage trees elucidates genetic programs for the development and evolution of cell types. iScience 23, 101273 (2020).
Acknowledgements
We are grateful to W. Zhai, Z. Hu, J. Yang and Y. Zhang for comments. This work was supported by the National Key R&D Program of China (2021YFA1302500 and 2021YFA1302501) and the National Natural Science Foundation of China (32293190, 32293191, 31970570 and 32200492).
Author information
Contributions
X.H. and S.D. conceived the study. S.D. and H.G. performed the simulations and analyzed the data. D.Z. and M.Z. analyzed the third-party data. X.H. supervised the study. X.H., S.D. and H.G. wrote the manuscript with contributions from all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Manu Setty, Michael Stumpf and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Madhura Mukhopadhyay, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Comparison between TarCA and counting the sheer number of monophyletic clades.
Each dot in the figure represents the average estimation accuracy from 1,000 repeats, and the error bar indicates the interquartile range. The dots are colored according to which of the two inference strategies they represent. The results obtained under a 0.1% sampling rate are highlighted with grey shading.
Extended Data Fig. 2 The impact of insufficient informative sites on Np estimation.
a. Schematic diagram showing the reconstruction error introduced by insufficient informative sites. A loss of bifurcating nodes in the phylogenetic tree can lead to the overestimation of Np. b. In each panel, the performance of Np estimation is assessed for a specific number of informative sites, ranging from 10 to 1,000 sites. The x-axis represents the average Np obtained from 100 repeats, while the y-axis denotes the actual number of progenitors. The horizontal error bars indicate the interquartile range of Np estimates and the trendline fitted with a linear model (LM) is shown in blue. The Pearson’s correlation coefficients and the corresponding P-values obtained from two-sided t-tests are shown in the top-left corner of each panel.
Extended Data Fig. 3 The impact of lineage invasions on Np estimation.
a. Schematic of the impact of lineage invasion. b. In each panel, the performance of Np estimation is assessed for a specific proportion of lineage invasions, ranging from 0.1% to 90%. The x-axis represents the average Np obtained from 1,000 repeats, while the y-axis denotes the actual number of progenitors. The horizontal error bars indicate the interquartile range of Np estimates and the trendline fitted with a linear model (LM) is shown in blue. The Pearson’s correlation coefficients and the corresponding P-values obtained from two-sided t-tests are shown in the top-left corner of each panel.
Extended Data Fig. 4 Examining the impact of reconstruction errors on the estimation of the number of progenitors for a focal cell type.
a. The diagram illustrates the process of dense sampling of the focal cell type (O1 in this case) under different error rates. b. Each panel presents a comparison between two inference methods: TarCA and counting the sheer number of monophyletic clades. The x-axis represents the sampling rate of the focal cell type within all its descendants. The y-axis displays the estimated number of progenitor cells and the corresponding estimation accuracy. The given error rate is indicated at the top of each panel. The results inferred using the sheer number of monophyletic clades are depicted with blue dots, while those inferred using TarCA are represented by orange dots. Each dot denotes the average of the data. The error bars indicate the lower and upper quartiles of the data. The red dashed line in each panel represents the actual number of progenitors of the focal cell type (O1 in this case) and serves as a reference for assessing the accuracy of the estimation methods.
Extended Data Fig. 5 Testing TarCA in nematodes.
a. The Np estimated by TarCA closely matches the actual progenitor number for the eight cell types examined in C. elegans, including blast cells (Bla, n = 39), epidermal cells (Epi, n = 93), germ cells (Ger, n = 2), gland cells (Gla, n = 13), intestinal cells (Int, n = 20), muscle cells (Mus, n = 122), neuron cells (Neu, n = 226) and structural cells (Str, n = 46). The Pearson’s and Spearman’s correlations are shown in the top-left corner, with P-values obtained from two-sided correlation tests. b. A similar performance of TarCA is shown in P. marina. The seven cell types included are neuron (Neu, n = 173), epidermal (Epi, n = 103), pharynx (Pha, n = 93), muscle (Mus, n = 80), germ cells (Ger, n = 2), intestine (Int, n = 20) and unknown (Unknown, n = 30). The Pearson’s and Spearman’s correlations are shown in the top-left corner, with P-values obtained from two-sided correlation tests.
Extended Data Fig. 6 Inference of the number of progenitors for different cell populations in the early stage of mouse embryogenesis.
a. Each dot represents the Np corresponding to one of the 10 artificially defined cell populations in Embryo-3. The size of the dot indicates the exact value of Np, which is shown on the left side of the figure. b. The estimated Np values for the 10 cell populations are consistent between two embryos. The color of each dot represents the identity of the corresponding cell population as depicted in panel a. The Pearson’s correlation is shown in the top-left corner, with the P-value obtained from a two-sided correlation test.
Extended Data Fig. 7 Ten representative LUGs on the cell phylogeny of gut endoderm cells in Embryo-3.
The cell phylogeny of gut endoderm cells is shown, and two types of dots are depicted: i) dots in pink represent the cell population with upregulated expression of the focal gene; ii) dots in red show the clades on which the focal cell population with upregulated expression is concentrated.
Extended Data Fig. 8 Examining the distribution of the clusters derived from the whole transcriptome of gut endoderm cells on the cell phylogeny in Embryo-6.
a. UMAP projection of the gut endoderm cells in Embryo-6, colored by clusters derived from Leiden clustering, along with the same projection showing the expression level of five extraembryonic marker genes. b. Each panel considers an individual cluster in the UMAP projection of gut endoderm cells in Embryo-6. The red line represents the observed Np for the cluster, and the histogram shows the expected Np based on 1,000 repeats of random shuffling. The one-sided empirical P-value from these 1,000 repeats is also shown. c. The phylogenetic tree of the gut endoderm cells of Embryo-6, with cell annotations referring to the UMAP projection in panel b.
Extended Data Fig. 9 Comparison between LUGs and genes predicted by CoSpar in hematopoiesis.
The Venn diagram shows the relationship between 1,825 LUGs and 377 genes predicted by CoSpar, with TarCA-specific genes shown in red, overlapping genes shown in purple and CoSpar-specific genes shown in sky blue. The four classic genes selected by Weinreb et al. (ref. 61) are highlighted in orange.
Supplementary information
Supplementary Table 1, Figs. 1–16 and Notes 1–4.
Source data
Statistical source data are provided for Figs. 1–5 and Extended Data Figs. 1–9.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Deng, S., Gong, H., Zhang, D. et al. A statistical method for quantifying progenitor cells reveals incipient cell fate commitments. Nat Methods 21, 597–608 (2024). https://doi.org/10.1038/s41592-024-02189-7
DOI: https://doi.org/10.1038/s41592-024-02189-7