Abstract
Mapping the cell phylogeny of a complex multicellular organism relies on somatic mutations accumulated from zygote to adult. Available cell barcoding methods can record about three mutations per barcode, enabling only low-resolution mapping of the cell phylogeny of complex organisms. Here we developed SMALT, a substitution mutation-aided lineage-tracing system that outperforms the available cell barcoding methods in mapping cell phylogeny. We applied SMALT to Drosophila melanogaster and obtained on average more than 20 mutations on a three-kilobase-pair barcoding sequence in early-adult cells. Using the barcoding mutations, we obtained high-quality cell phylogenetic trees, each comprising several thousand internal nodes with 84–93% median bootstrap support. The obtained cell phylogenies enabled a population genetic analysis that estimates the longitudinal dynamics of the number of actively dividing parental cells (Np) in each organ through development. The Np dynamics revealed the trajectory of cell births and provided insight into the balance of symmetric and asymmetric cell division.
Data availability
Raw data have been deposited in the National Center for Biotechnology Information’s Sequence Read Archive with accession numbers PRJNA716791, PRJNA761270 and PRJNA761271. Source data supporting the findings of the present study are provided as online materials for this paper.
Code availability
Code for processing the data is available at https://github.com/CellLineage/SLOTH.
Acknowledgements
We are grateful to J. Zhang for inspiration, Y. Rong for information on the phiC31 system and for providing related fly strains, the Tsinghua Fly Center at the Tsinghua University for providing fly strains, Y. Zhao for suggestions and guidance with regard to the fly embryo microinjection, and T. Tang, C-I. Wu, X. Shen and members of the Wu laboratory for help with fly work. We thank H. Chen, W. Qian, Y. Zhang, W. Zhai, C-I. Wu, X. Huang, Z. Wang and J. Yang for discussion and comments on the paper. This work was supported by grants of the National Key R&D Program of China (grant no. 2017YFA0103504), the National Natural Science Foundation of China (grant nos. 31630042, 31970570, and 32070687), the Shanghai Municipal Science and Technology Major Project (grant no. 2017SHZDZX01) and the Guangdong Special Support Program (grant no. 2017TX04R395).
Author information
Contributions
X.H., C.Y. and L.L. designed the study. C.Y., K.L. and L.L. conducted the yeast assays. K.L. and H.G. conducted the fly experiments. S.D., C.Y., Z.Y., K.L., J.W. and X.H. analyzed the data. X.H. supervised the study and wrote the paper with input from C.Y., S.D., K.L. and other coauthors.
Ethics declarations
Competing interests
A patent related to the developed technique has been filed.
Additional information
Peer review information Nature Methods thanks Hugo Bellen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Madhura Mukhopadhyay was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 SMALT mutations depend heavily on cell divisions.
a. Schematic of the Tet-On system used in the experiment. AI and GFP are both controlled by the Tet promoter, so that the induced expression of AI can be inferred from GFP. b. A single diploid clone was grown to saturation (~10 h, 10^7 cells per ml), then split into four independent groups (~10^6 cells in 100 μl), followed by ~4 h of starvation in PBS to deplete the original YPDA medium. The yeast cells were then washed twice with ddH2O and grown for ~18 h in the corresponding media (dividing group: YPDA Dox+ and YPDA Dox-; non-dividing group: PBS Dox+ and YEKAC Dox+). Distribution of the mutation rate per site along the 120 barcode sequence. The mutation rate of a site is the number of reads with mutations at the site divided by the total number of reads covering the site (Methods). The relative position in the barcode is indicated and each dot represents a site. Two iSceI binding sites are shaded with grey rectangles and marked with black arrows. A cutoff of 0.1% (dashed line) is used as the threshold for over-background mutations. Clones 1, 2 and 3 represent three independent initial single clones. c. The expression induction (measured by GFP) in Dox+ media supplied with 10 μg/ml doxycycline. The photos were taken after the yeast cells had stayed for ~18 h in each of the media. GFP levels are comparable between the yeast cells in the dividing and non-dividing Dox+ media. The bright-field, GFP and merged images are shown, respectively; scale bar = 100 μm (40X objective with PH40 mode).
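The per-site mutation rate and the 0.1% cutoff described in this legend can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the convention of returning None for uncovered sites are assumptions made here, and the read counts in the usage note are invented.

```python
from typing import List, Optional, Sequence


def per_site_mutation_rate(
    mutant_reads: Sequence[int], covering_reads: Sequence[int]
) -> List[Optional[float]]:
    """Rate at each site = reads with a mutation at the site / reads covering the site.

    Sites with zero coverage get None (an assumption of this sketch).
    """
    if len(mutant_reads) != len(covering_reads):
        raise ValueError("both inputs must have one entry per site")
    rates: List[Optional[float]] = []
    for mutated, covered in zip(mutant_reads, covering_reads):
        rates.append(mutated / covered if covered > 0 else None)
    return rates


def sites_over_background(
    rates: Sequence[Optional[float]], cutoff: float = 0.001
) -> List[int]:
    """Return 0-based indices of sites whose rate exceeds the cutoff (0.1% by default)."""
    return [i for i, r in enumerate(rates) if r is not None and r > cutoff]
```

For example, with hypothetical counts of 0, 5, 1 and 0 mutant reads over 1000, 1000, 2000 and 0 covering reads, only the second site (rate 0.5%) exceeds the 0.1% cutoff.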
Extended Data Fig. 2 Tubulin-GAL4 ubiquitously drives UAS-GFP expression.
a. Images of developing D. melanogaster embryos at the stage of fast cleavage. Embryos were collected within 30 min after egg laying. Bar = 100 μm. b. Images of 10 dissected organs from a third instar larva. Over five larvae were examined and a random sample is shown for each organ. Bar = 200 μm.
Extended Data Fig. 3 Genome-wide off-target analysis of the SMALT system in fly.
We compared flies with the SMALT system (AID+) to those without the system (NC). Six individuals of each category were examined, and their genomes were subjected to Illumina sequencing (PE150). In total we obtained ~0.8 billion reads after trimming with trim_galore, giving on average ~108× sequencing coverage for each genome. The processed sequences were mapped to the Drosophila melanogaster genome (BDGP6) using bwa-mem and duplicates were removed, followed by variant calling with GATK HaplotypeCaller. Loci with an alternative allele frequency over 10% were defined as polymorphisms and excluded from further analysis. a. The number of sites with detected putative somatic mutations in each genome. The six AID+ individuals and six NC individuals are compared under a variety of mutant allele frequencies. The numbers in parentheses are the average coverage of reliably mapped reads on the genome. There are no apparent differences between the AID+ and NC groups. b. The relative frequency of the different mutation types for the putative somatic mutations detected in each of the genomes. The frequency of AID signature mutations (C>T and G>A) is similar between the AID+ and NC individuals. c. To increase the sensitivity, we pooled the reads of all AID+ individuals (and, separately, of all NC individuals) and focused on the five potential off-target sites, each with ≤2 mismatches to the 18 bp iSceI binding site. For each site we considered the two flanking regions of 150 bp in length (hence 18 + 2 × 150 = 318 bp), and identified the fragments (paired reads) fully covering the 318 bp regions. The bar plots represent the frequency of fragments harboring any kind of mutation (n shows the number of fragments analyzed in this region). The error bars show the standard error of the frequency. We observed a higher frequency of mutated fragments in the AID+ group than in the NC group. However, the AID-specific mutations (C>T and G>A) appear similar between the two groups, as shown by the pie charts. This suggests that the higher frequency of mutated fragments is not due to the AID enzymatic activity. A possible explanation is that the strong overexpression of a heterologous protein imposes stress on genome stability by consuming a large amount of cellular energy.
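The region length and the bar heights with error bars in panel c can be illustrated with a short sketch. The legend does not state how the standard error was computed; the binomial standard error sqrt(p(1 - p)/n) used below is an assumption, as are the function names and the example counts.

```python
import math
from typing import Tuple

SITE_LEN = 18    # iSceI binding site length in bp (from the legend)
FLANK_LEN = 150  # length of each flanking region in bp (from the legend)


def region_length(site_len: int = SITE_LEN, flank_len: int = FLANK_LEN) -> int:
    """Length of the analyzed region: the site plus its two flanks."""
    return site_len + 2 * flank_len


def mutated_fraction_with_se(n_mutated: int, n_fragments: int) -> Tuple[float, float]:
    """Fraction of fragments carrying any mutation, with a binomial standard error.

    The binomial form of the standard error is an assumption of this sketch.
    """
    if n_fragments <= 0:
        raise ValueError("n_fragments must be positive")
    if not 0 <= n_mutated <= n_fragments:
        raise ValueError("n_mutated must lie between 0 and n_fragments")
    p = n_mutated / n_fragments
    se = math.sqrt(p * (1.0 - p) / n_fragments)
    return p, se
```

With hypothetical counts of 10 mutated fragments among 1,000 fully covering fragments, the frequency is 1% and the binomial standard error is about 0.3%.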
Extended Data Fig. 4 The cell tree of Fly-1 that comprises 5,003 alleles, with no organ information shown.
The bootstrap supports on the early internal branches are generally low. The first 30 cell generations are highlighted regarding the bootstrap values, with the red line showing the median and the blue lines showing the 25th to 75th percentiles.
Extended Data Fig. 5 The cell tree of Fly-2 that comprises 5,421 alleles, with no organ information shown.
The bootstrap supports on the early internal branches are generally low. The first 30 cell generations are highlighted regarding the bootstrap values, with the red line showing the median and the blue lines showing the 25th to 75th percentiles.
Extended Data Fig. 6 Validation of the coalescent method by simulations.
a. A schematic diagram showing how the coalescent rate (CR) of each small time interval is calculated from a hypothetical phylogenetic tree. The reconstruction is conducted until the 40th generation, when 95% of the terminal nodes in the tree have been examined. b. The development of an organ is simulated and the actual Np trajectory is defined by the given parameters (Methods). With increasing sampling proportion and per-generation mutation rate, the Np trajectory can be reliably reconstructed by computing the CR of each generation (Np = 0.5(1/CR + 1)). c. The estimated Np trajectories are robust against different modes of cell death.
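The conversion from a per-generation coalescent rate to Np quoted in panel b, Np = 0.5(1/CR + 1), is algebraically equivalent to CR = 1/(2Np - 1). A minimal sketch of both directions follows; the function names are chosen here for illustration, and how CR itself is measured from a tree is described in the Methods and is not reproduced.

```python
def np_from_cr(cr: float) -> float:
    """Estimate Np from a per-generation coalescent rate: Np = 0.5 * (1/CR + 1)."""
    if not 0.0 < cr <= 1.0:
        raise ValueError("CR must lie in (0, 1]")
    return 0.5 * (1.0 / cr + 1.0)


def cr_from_np(n_parents: float) -> float:
    """Inverse of the same relation: CR = 1 / (2 * Np - 1)."""
    if n_parents < 1:
        raise ValueError("Np must be at least 1")
    return 1.0 / (2.0 * n_parents - 1.0)
```

A CR of 1 corresponds to a single dividing parental cell, and smaller CR values correspond to larger Np; for instance, a CR of 1/19 gives Np = 10.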
Extended Data Fig. 7 The coalescent method is not sensitive to population structure.
a. The mathematical proof of the equal coalescent probability at the immediately preceding generation of two random cells in a panmictic population (A) and a structured population (B = B1 + B2) of the same total size. b. A and B are two simulated populations with identical total Np trajectory, while B is composed of two divided sub-populations (B1 and B2) each with a distinct Np trajectory. The simulation procedure is similar to that of Extended Data Fig. 6b, and the per-generation mutation rate is set to be one. The coalescent method performs well in the structured population B, evidenced by the achieved consistency between the reconstructed Np trajectories (blue curves) and the actual Np trajectory (black curves), although a higher sampling proportion is required to have the same performance as in the panmictic counterpart population A (yellow curves).
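The equality stated in panel a can be checked by brute-force counting under a simplified model. The sketch below is not the authors' proof: it assumes that every parental cell divides exactly once into two daughters and that two daughters are drawn at random, without replacement, from the whole population. Under that assumption the probability that the pair shares a parent depends only on the total number of parents, not on how they are split into sub-populations.

```python
from fractions import Fraction
from itertools import combinations
from typing import Sequence


def sibling_pair_probability(parents_per_subpop: Sequence[int]) -> Fraction:
    """Probability that two randomly drawn daughters coalesce one generation back.

    Each entry is the number of dividing parents in one sub-population.
    Assumes every parent yields exactly two daughters (a simplification).
    """
    daughters = []
    parent_id = 0
    for n_parents in parents_per_subpop:
        for _ in range(n_parents):
            # Both daughters carry the identifier of their shared parent.
            daughters.extend([parent_id, parent_id])
            parent_id += 1
    if len(daughters) < 2:
        raise ValueError("at least one dividing parent is required")
    pairs = list(combinations(daughters, 2))
    siblings = sum(1 for a, b in pairs if a == b)
    return Fraction(siblings, len(pairs))
```

For a panmictic population of 10 parents and a structured population of 3 + 7 parents, both calls return 1/19, which matches CR = 1/(2Np - 1) with Np = 10.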
Supplementary information
Supplementary Information
Supplementary Tables 1–3, Figs. 1–15, Notes I and II and Datasets I and II.
Supplementary Data
Mutation rate estimated with a maximum-likelihood method.
Supplementary Data
Raw unprocessed image.
Source data
Source Data Fig. 1
Mutation rate estimated with a maximum-likelihood method.
Source Data Fig. 2
Mutation for each readout processed from Sanger sequencing and mutation for each allele processed from PacBio data in binary format.
Source Data Fig. 3
Two phylogenetic trees in Newick format.
Source Data Fig. 4
Estimation result from instantaneous coalescence analysis.
Source Data Fig. 5
Estimation result from instantaneous coalescence analysis.
Source Data Extended Data Fig. 1
Mutation table for each sample.
Source Data Extended Data Fig. 3
Mutation table for each sample.
About this article
Cite this article
Liu, K., Deng, S., Ye, C. et al. Mapping single-cell-resolution cell phylogeny reveals cell population dynamics during organ development. Nat Methods 18, 1506–1514 (2021). https://doi.org/10.1038/s41592-021-01325-x
DOI: https://doi.org/10.1038/s41592-021-01325-x
This article is cited by
- A statistical method for quantifying progenitor cells reveals incipient cell fate commitments. Nature Methods (2024)
- PhyloVelo enhances transcriptomic velocity field mapping using monotonically expressed genes. Nature Biotechnology (2023)
- Split complementation of base editors to minimize off-target edits. Nature Plants (2023)
- Base editors: development and applications in biomedicine. Frontiers of Medicine (2023)
- Clonal tracking in cancer and metastasis. Cancer and Metastasis Reviews (2023)