Molecular recording of mammalian embryogenesis

Abstract

Ontogeny describes the emergence of complex multicellular organisms from single totipotent cells. This field is particularly challenging in mammals, owing to the indeterminate relationship between self-renewal and differentiation, variation in progenitor field sizes, and internal gestation in these animals. Here we present a flexible, high-information, multi-channel molecular recorder with a single-cell readout and apply it as an evolving lineage tracer to assemble mouse cell-fate maps from fertilization through gastrulation. By combining lineage information with single-cell RNA sequencing profiles, we recapitulate canonical developmental relationships between different tissue types and reveal the nearly complete transcriptional convergence of endodermal cells of extra-embryonic and embryonic origins. Finally, we apply our cell-fate maps to estimate the number of embryonic progenitor cells and their degree of asymmetric partitioning during specification. Our approach enables massively parallel, high-resolution recording of lineage and other information in mammalian systems, which will facilitate the construction of a quantitative framework for understanding developmental processes.

Fig. 1: Optimization of a multi-purpose molecular recorder.
Fig. 2: Lineage tracing in mouse from fertilization through gastrulation.
Fig. 3: Assigning cellular phenotype by scRNA-seq.
Fig. 4: Single-cell lineage reconstruction of mouse embryogenesis.
Fig. 5: Disparities between transcriptional identity and lineage history within the endoderm.
Fig. 6: Lineage bias and estimated size of progenitor fields.

Data availability

The data are available in the Gene Expression Omnibus database under accession numbers GSE117542 (for lineage-traced embryos) and GSE122187 (for the gastrulation compendium). Any other relevant data are available from the corresponding authors upon reasonable request.

Code availability

The greedy reconstruction algorithm (named Cassiopeia) is available at https://github.com/YosefLab/Cassiopeia. Other code will be shared upon request.

References

1. Sulston, J. E., Schierenberg, E., White, J. G. & Thomson, J. N. The embryonic cell lineage of the nematode Caenorhabditis elegans. Dev. Biol. 100, 64–119 (1983).

2. Pijuan-Sala, B., Guibentif, C. & Göttgens, B. Single-cell transcriptional profiling: a window into embryonic cell-type specification. Nat. Rev. Mol. Cell Biol. 19, 399–412 (2018).

3. Zernicka-Goetz, M. Patterning of the embryo: the first spatial decisions in the life of a mouse. Development 129, 815–829 (2002).

4. Fincher, C. T., Wurtzel, O., de Hoog, T., Kravarik, K. M. & Reddien, P. W. Cell type transcriptome atlas for the planarian Schmidtea mediterranea. Science 360, eaaq1736 (2018).

5. Plass, M. et al. Cell type atlas and lineage tree of a whole complex animal by single-cell transcriptomics. Science 360, eaaq1723 (2018).

6. Briggs, J. A. et al. The dynamics of gene expression in vertebrate embryogenesis at single-cell resolution. Science 360, eaar5780 (2018).

7. Farrell, J. A. et al. Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis. Science 360, eaar3131 (2018).

8. Wagner, D. E. et al. Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo. Science 360, 981–987 (2018).

9. Ibarra-Soria, X. et al. Defining murine organogenesis at single-cell resolution reveals a role for the leukotriene pathway in regulating blood progenitor formation. Nat. Cell Biol. 20, 127–134 (2018).

10. Han, X. et al. Mapping the mouse cell atlas by microwell-seq. Cell 172, 1091–1107 (2018).

11. Perli, S. D., Cui, C. H. & Lu, T. K. Continuous genetic recording with self-targeting CRISPR–Cas in human cells. Science 353, aag0511 (2016).

12. Kalhor, R. et al. Developmental barcoding of whole mouse via homing CRISPR. Science 361, eaat9804 (2018).

13. Frieda, K. L. et al. Synthetic recording and in situ readout of lineage information in single cells. Nature 541, 107–111 (2017).

14. McKenna, A. et al. Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science 353, aaf7907 (2016).

15. Raj, B. et al. Simultaneous single-cell profiling of lineages and cell types in the vertebrate brain. Nat. Biotechnol. 36, 442–450 (2018).

16. Alemany, A., Florescu, M., Baron, C. S., Peterson-Maduro, J. & van Oudenaarden, A. Whole-organism clone tracing using single-cell sequencing. Nature 556, 108–112 (2018).

17. Spanjaard, B. et al. Simultaneous lineage tracing and cell-type identification using CRISPR–Cas9-induced genetic scars. Nat. Biotechnol. 36, 469–473 (2018).

18. Tanay, A. & Regev, A. Scaling single-cell genomics from phenomenology to mechanism. Nature 541, 331–338 (2017).

19. Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882 (2016).

20. van Overbeek, M. et al. DNA repair profiling reveals nonrandom outcomes at Cas9-mediated breaks. Mol. Cell 63, 633–646 (2016).

21. Schimmel, J., Kool, H., van Schendel, R. & Tijsterman, M. Mutational signatures of non-homologous and polymerase theta-mediated end-joining in embryonic stem cells. EMBO J. 36, 3634–3649 (2017).

22. Lemos, B. R. et al. CRISPR/Cas9 cleavages in budding yeast reveal templated insertions and strand-specific insertion/deletion profiles. Proc. Natl Acad. Sci. USA 115, E2040–E2047 (2018).

23. Gilbert, L. A. et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159, 647–661 (2014).

24. Ihry, R. J. et al. p53 inhibits CRISPR–Cas9 engineering in human pluripotent stem cells. Nat. Med. 24, 939–946 (2018).

25. Haapaniemi, E., Botla, S., Persson, J., Schmierer, B. & Taipale, J. CRISPR–Cas9 genome editing induces a p53-mediated DNA damage response. Nat. Med. 24, 927–930 (2018).

26. Kim, S.-Y., Lee, J.-H., Shin, H.-S., Kang, H.-J. & Kim, Y.-S. The human elongation factor 1 alpha (EF-1α) first intron highly enhances expression of foreign genes from the murine cytomegalovirus promoter. J. Biotechnol. 93, 183–187 (2002).

27. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

28. Kwon, G. S., Viotti, M. & Hadjantonakis, A.-K. The endoderm of the mouse embryo arises by dynamic widespread intercalation of embryonic and extraembryonic lineages. Dev. Cell 15, 509–520 (2008).

29. Eakin, G. S. & Hadjantonakis, A.-K. Sex-specific gene expression in preimplantation mouse embryos. Genome Biol. 7, 205 (2006).

30. Li, C.-S. et al. Trap1a is an X-linked and cell-intrinsic regulator of thymocyte development. Cell. Mol. Immunol. 14, 685–692 (2017).

31. Pijuan-Sala, B. et al. A single-cell molecular map of mouse gastrulation and early organogenesis. Nature 566, 490–495 (2019).

32. Soriano, P. & Jaenisch, R. Retroviruses as probes for mammalian development: allocation of cells to the somatic and germ cell lineages. Cell 46, 19–29 (1986).

33. Jaenisch, R. Mammalian neural crest cells participate in normal embryonic development on microinjection into post-implantation mouse embryos. Nature 318, 181–183 (1985).

34. Nichols, J. & Smith, A. Naive and primed pluripotent states. Cell Stem Cell 4, 487–492 (2009).

35. Wang, Z. & Jaenisch, R. At most three ES cells contribute to the somatic lineages of chimeric mice and of mice produced by ES-tetraploid complementation. Dev. Biol. 275, 192–201 (2004).

36. Baeumler, T. A., Ahmed, A. A. & Fulga, T. A. Engineering synthetic signaling pathways with programmable dCas9-based chimeric receptors. Cell Reports 20, 2639–2653 (2017).

37. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).

38. Hess, G. T. et al. Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells. Nat. Methods 13, 1036–1042 (2016).

39. Hou, J. et al. A systematic screen for genes expressed in definitive endoderm by Serial Analysis of Gene Expression (SAGE). BMC Dev. Biol. 7, 92 (2007).

40. Wang, G., Moffitt, J. R. & Zhuang, X. Multiplexed imaging of high-density libraries of RNAs with MERFISH and expansion microscopy. Sci. Rep. 8, 4847 (2018).

41. Shah, S., Lubeck, E., Zhou, W. & Cai, L. In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron 92, 342–357 (2016).

42. Regev, A. et al. The Human Cell Atlas. eLife 6, e27041 (2017).

43. Tzouanacou, E., Wegener, A., Wymeersch, F. J., Wilson, V. & Nicolas, J.-F. Redefining the progression of lineage segregations during mammalian embryogenesis by clonal analysis. Dev. Cell 17, 365–376 (2009).

Acknowledgements

We thank members of the Weissman, Meissner and Yosef laboratories, particularly J. Charlton and A. Arczewska for assistance with animal imaging, A. Kumar for explorations of extra-embryonic endoderm, and L. Gilbert and M. Horlbeck for guidance on technology design, as well as E. Chow and D. Bogdanoff from the UCSF Center for Advanced Technology for sequencing. This work was funded by National Institutes of Health Grants R01 DA036858 and 1RM1 HG009490-01 (J.S.W.), P50 HG006193 and R01 HD078679 (A.M.), F32 GM116331 (M.J.) and F32 GM125247 (J.J.Q.), and Chan-Zuckerberg Initiative 2018-184034. J.S.W. is a Howard Hughes Medical Institute Investigator. M.M.C. is a Gordon and Betty Moore fellow of the Life Sciences Research Foundation. T.M.N. is a fellow of the Damon Runyon Cancer Research Foundation. S.G., H.K. and A.M. are supported by the Max Planck Society.

Reviewer information

Nature thanks Anna-Katerina Hadjantonakis, Patrick Tam, Valerie Wilson and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

M.M.C., Z.D.S., A.M. and J.S.W. were responsible for the conception, design and interpretation of the experiments, and wrote the manuscript. M.M.C. and Z.D.S. conducted experiments, and M.M.C. developed the analysis with input from Z.D.S. S.G. and H.K. provided annotations for RNA-seq data, and assisted in experimental and analytical optimization. B.A., T.M.N. and M.J. provided vectors, experimental protocols and advice. J.J.Q. and D.Y. prepared several sequencing and transposon libraries and were engaged in discussion. M.G.J., A.K. and N.Y. provided phylogenetic reconstruction strategies.

Corresponding authors

Correspondence to Alexander Meissner or Jonathan S. Weissman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Target-site indel likelihoods from in vitro experiments.

a, Histograms for the relative indel frequency for protospacer sites 1, 2 and 2b within the target site. In this experiment, sgRNA-expressing vectors respective to each position were delivered into K562 cells. Repair outcomes and frequencies are different for each site, but every site produces hundreds of discrete outcomes. The top 20 most-frequent indels for each site are shown. Site 3 was not profiled in this experiment. b, For sites 1 and 2, histograms representing the likelihood that any specific base in the target site is deleted (blue) or has an insertion (red) that begins at that position. The position of the intBC and protospacer sequences (sites) within the target site are represented as a schematic along the bottom, with the protospacer adjacent motif (PAM) for each site proximal to the intBC. Indels start at the double-stranded-break point, three bases from the PAM sequence. c, Simultaneous and continuous molecular recording of multiple clonal populations in K562 cells. We transduced K562 cells with a high-complexity library of unique intBCs, sorted them into wells of 10 cells each and propagated them for 18 days. At the end of the experiment, we detected two populations by their intBCs, which implies that only two clonal lineages expanded from the initial population of ten, and confirmed generation of target-site mutations. Top left, strategy for partitioning a multi-clonal population. Target sites are amplified from a single-cell barcoded cDNA library and the intBCs in each cell are identified as present or absent. Top right, heat map of the overlap of intBCs between all cells. The cells segregate into two populations that represent the descendants of two progenitor cells from the beginning of the experiment. Bottom, table summarizing results of the experiment, including the generation of indels over the experiment duration. These data additionally showcase our ability to combine dynamic recording with tracing based on traditional static barcodes.
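The intBC-overlap partitioning described in panel c can be illustrated with a minimal sketch. It is not the authors' pipeline: the Jaccard overlap measure, the 0.5 threshold, the function names and the toy intBC sequences are all assumptions made for illustration. Cells whose intBC sets overlap sufficiently are linked, and each connected group is read as one putative clonal population.

```python
from itertools import combinations


def intbc_overlap(a, b):
    """Jaccard overlap between two sets of intBCs (an assumed overlap measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def partition_by_intbc(cells, min_overlap=0.5):
    """Group cells into putative clones.

    cells: dict mapping cell barcode -> set of intBCs detected in that cell.
    Two cells are linked when their overlap is >= min_overlap (hypothetical
    threshold); clones are the connected components of the resulting graph.
    """
    names = list(cells)
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if intbc_overlap(cells[a], cells[b]) >= min_overlap:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy data: intBC sequences are invented for illustration only.
cells = {
    "cell1": {"AAAA", "CCCC", "GGGG"},
    "cell2": {"AAAA", "CCCC"},
    "cell3": {"TTTT", "ACGT"},
    "cell4": {"TTTT", "ACGT", "TGCA"},
}
clones = partition_by_intbc(cells)
print(clones)
```

Because target-site capture is incomplete, a tolerant overlap threshold is used here instead of requiring identical intBC sets; the appropriate threshold would depend on the capture rate of a real data set.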

Extended Data Fig. 2 Capturing early differentiation by pooled sequencing of indels generated within an E9.5 embryo.

Scatter plots of indel proportions from dissected bulk tissue of an E9.5 embryo. Placenta is the most distantly related from embryonic tissues, followed by the yolk sac; the three embryonic compartments share the highest similarity. n, number of indels used in the comparison; r, Pearson correlation of the relative indel proportions. Each of the three sites is considered independently per intBC. A heat map representing the correlation coefficients appears in Fig. 2b.
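As a rough illustration of the comparison in this figure, the sketch below converts indel counts from two bulk tissues into relative proportions and computes their Pearson correlation. The function names, the toy counts, and the choice to compare over the union of observed indels (scoring a missing indel as zero) are assumptions, not details taken from the study.

```python
import math


def indel_proportions(counts):
    """Convert raw indel counts into relative proportions."""
    total = sum(counts.values())
    return {indel: c / total for indel, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def tissue_correlation(counts_a, counts_b):
    """Return (n, r): number of indels compared and their Pearson correlation.

    Assumption: indels are compared over the union of both tissues,
    with an indel absent from one tissue given a proportion of 0.
    """
    pa, pb = indel_proportions(counts_a), indel_proportions(counts_b)
    keys = sorted(set(pa) | set(pb))
    xs = [pa.get(k, 0.0) for k in keys]
    ys = [pb.get(k, 0.0) for k in keys]
    return len(keys), pearson(xs, ys)


# Toy counts keyed by (intBC, site, indel); all values are invented.
head = {("bc1", 1, "92:1:I"): 60, ("bc1", 2, "90:3:D"): 30, ("bc1", 3, "85:7:D"): 10}
tail = {("bc1", 1, "92:1:I"): 6, ("bc1", 2, "90:3:D"): 3, ("bc1", 3, "85:7:D"): 1}
placenta = {("bc1", 1, "92:1:I"): 10, ("bc1", 2, "90:3:D"): 30, ("bc1", 3, "85:7:D"): 60}

n_embryo, r_embryo = tissue_correlation(head, tail)
n_placenta, r_placenta = tissue_correlation(head, placenta)
print(n_embryo, r_embryo, r_placenta)
```

Keying each count by intBC and site mirrors the caption's statement that each of the three sites is considered independently per intBC. The sketch does not guard against a tissue in which all proportions are equal, where the correlation is undefined.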

Extended Data Fig. 3 Experimental overview.

a, Schematic of platform used for generation of scRNA-seq libraries and corresponding target-site amplicon libraries, adapted from a previous study (ref. 19). The barcoded and amplified cDNA library is split into two fractions: one fraction is used to generate a global transcription profile and the other is used to specifically amplify the target site. CBC, cell barcode; UMI, unique molecular identifier. b, Summary of lineage-traced embryos detailing the type of guides used, the sampling proportion and sequencing results. Cells from embryo 2 were run on two 10x lanes. Embryo 4 was omitted from further analysis owing to the absence of cells identified as primitive heart tube. The sgRNA array is listed in order from site 1 to site 3: P, perfect complementarity between guide RNA and protospacer; 2, mismatch in region 2; 1, mismatch in region 1.

Extended Data Fig. 4 Target-site capture in mouse embryos.

a, Percentage of cells with at least one target site captured. Cells from embryo 2 were run on two 10x lanes. b, Scatter plot showing the relationship between the mean number of UMIs (a proxy for expression level) sequenced per target site and the percentage of cells in which the target site is detected, which we refer to as ‘target-site capture’. In general, as the mean number of UMIs increases, the percentage of cells also increases. Using a full-length, intron-containing EF1α promoter in mouse embryos leads to a higher number of UMIs, which generally results in better target-site capture. c, Percentage of cells for which a given intBC is detected across all seven embryos profiled in this study. d, Target-site capture and expression level across tissues for embryo 5, which uses a truncated EF1α promoter to direct transcription of the target site. Each row corresponds to a different intBC, indicated in the top left of the histogram. Left, the percentage of cells in each tissue for which the target site is captured. Right, violin plots representing the distribution of UMIs for the target site in each tissue. Dashed line refers to a ten-UMI threshold. The target site may be expressed at different levels in a tissue-specific manner, which leads to higher likelihoods of capture in certain tissues. Biased capture of target sites that carry the intBCs AGGACAAA and ATTGCTTG may also be explained by mosaic integration after the first cell cycle as their capture is preferential to extra-embryonic lineages that are restricted early in development. White dot indicates the median UMI count for cells from a given tissue, edges indicate the interquartile range, and whiskers denote the full range of the data. e, Target-site capture and expression level across tissues for embryo 7, which drives target-site expression from an intron-containing EF1α promoter. Each row corresponds to a different intBC, indicated in the top left of the histogram. 
Left, the percentage of cells in each tissue for which the target site is captured. Right, violin plots representing the distribution of UMIs for the target site in each tissue as in d. Dashed line is a visual threshold for ten UMIs. Although tissue-specific expression may explain some discrepancy in target-site capture, high expression (as estimated from the number of UMIs) can still correspond to low capture rates, as observed for the intBC TGGCGGGG. One possibility is that particular indels may destabilize the transcript and lead to either poor expression or capture. f, Scatter plots that show the relationship between estimated relative indel frequency and the median number of cells that carry the indel. Because the indel frequency within a mouse is dependent on the timing of the mutation, we cannot calculate the underlying indel frequency distribution using the fraction of cells within embryos that carry a given indel. Instead, we estimate this frequency by the presence or absence of an indel using all of the target-site integrations across mice, which reduces biases from cellular expansion but assumes that any given indel occurs only once in the history of each intBC. Because the number of integrations is small, we might expect our estimates to be poor. Here we see that the number of cells marked with an indel increases with indel frequency, which suggests that our frequency estimates are underestimated for particularly frequent indels. This is probably due to the fact that we cannot distinguish between identical indels in the same target site that may have resulted from multiple repair events (convergent indels). The most frequent insertions are of a single base and tend to be highly biased towards a single nucleotide (for example, 92:1:I is uniformly an ‘A’ in 5 out of 7 embryos, and never below 88%).
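The presence-or-absence estimate described in panel f can be sketched as follows. This is an illustrative reading, not the authors' code: each target-site integration contributes at most one observation per indel regardless of how many cells carry it, and the counts are then normalized to sum to 1. The normalization, the function name and the toy data are assumptions.

```python
from collections import Counter


def estimate_indel_frequencies(integrations):
    """Estimate relative indel frequencies by presence or absence.

    integrations: dict mapping an integration ID, e.g. (embryo, intBC),
    to the list of indels observed across its cells (repeats allowed).
    Each integration counts an indel once, which reduces the bias from
    clonal expansion but treats every indel as arising only once per
    integration. Normalizing to a sum of 1 is an assumption made here.
    """
    presence = Counter()
    for observed in integrations.values():
        presence.update(set(observed))  # collapse repeats within an integration
    total = sum(presence.values())
    return {indel: count / total for indel, count in presence.items()}


# Toy data: integration IDs and indels other than 92:1:I are invented.
integrations = {
    ("embryo1", "intBC_a"): ["92:1:I", "92:1:I", "92:1:I", "90:3:D"],
    ("embryo1", "intBC_b"): ["92:1:I", "85:7:D"],
    ("embryo2", "intBC_c"): ["92:1:I"],
}
freqs = estimate_indel_frequencies(integrations)
print(freqs)
```

In this toy case 92:1:I appears in all three integrations and receives a relative frequency of 3/5, even though it is carried by most cells. This illustrates why, as the caption notes, frequent indels that arise convergently within one integration would be underestimated.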

Extended Data Fig. 5 scRNA-seq tissue assignment and wild-type comparison.

a, Box plots representing tissue proportions from E8.0 (top) and E8.5 (bottom) wild-type embryos (n = 10 each), with lineage-traced embryos mapping to each state overlaid as dots. Wild-type embryos display a large variance in the proportions of particular tissues, and the proportions of our lineage-traced embryos generally fall within the range of those recovered from wild type. Large circles indicate embryos that were scored as either E8.0 or E8.5, and the bold red overlay highlights embryo 2, which is used throughout the text. Note that many processes—such as somitogenesis and neural development—are continuous or ongoing between E8.0 to E8.5. For example, from E8.0 to E8.5, the embryonic proportions of anterior neural ectoderm and fore- and midbrain are inversely correlated, as one cell type presumably matures into the other. Many of our embryos scored as E8.0 exhibit intermediate proportions for both tissue types, which supports the possibility that these embryos are slightly less developed than E8.5 but more developed than E8.0. For box plots, the centre line indicates the median, edges indicate the interquartile range, whiskers indicate the Tukey fences, and crosses denote outliers. b, t-SNE plots of scRNA-seq data with corresponding tissue annotations for the six lineage-traced embryos used in this study. Insets, pie charts of the relative proportions for different germ layers. Mesoderm is further separated to include blood (red). Although 36 different states are observed during this developmental interval, only broad classifications of particular groups (for example, neural ectoderm or lateral plate mesoderm) are overlaid to provide a frame of reference. In general, the relative spacing and coherence of different cell states are consistent across different embryos. 
c, Box plots of the Euclidean distance between single-cell transcriptomes and the average transcriptional profile of their assigned cluster (cluster centre) in comparison to their distance from the average of the next-closest possible assignment. Comparison is to the same 712 informative marker genes that were used to assign cells to states, and includes all cells used in this study (Supplementary Methods). Middle bar highlights the median, edges indicate the interquartile range, whiskers indicate the Tukey fences, and grey dots denote outliers. n values refer to the cumulative number of cells assigned to each state across all seven embryos for which single-cell data were collected, including for embryo 4, which was ultimately withheld from further analysis owing to the lack of primitive heart tube development.
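The comparison in panel c can be sketched in a few lines: for each cell, compute the Euclidean distance to the centre of its assigned cluster and to the nearest centre among all other clusters. The sketch assumes that cluster centres are plain per-gene means of the assigned cells and uses two invented marker-gene dimensions; the study itself uses 712 marker genes and may normalize differently.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cluster_centres(profiles, labels):
    """Average expression profile of the cells assigned to each cluster."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], profile)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def assignment_distances(profiles, labels):
    """For each cell, return (distance to assigned centre, distance to next-closest centre)."""
    centres = cluster_centres(profiles, labels)
    result = []
    for profile, label in zip(profiles, labels):
        d_assigned = euclidean(profile, centres[label])
        d_next = min(
            euclidean(profile, centre)
            for other, centre in centres.items()
            if other != label
        )
        result.append((d_assigned, d_next))
    return result


# Toy data: two marker genes, two clusters (values invented).
profiles = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
labels = ["a", "a", "b", "b"]
distances = assignment_distances(profiles, labels)
print(distances)
```

A well-supported assignment shows a small first value relative to the second, which is the pattern the box plots are meant to convey.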

Extended Data Fig. 6 Continuous indel generation by breeding.

a, Strategy for generating lineage-traced mice through breeding. The target site and guide array cassette are integrated into mouse zygotes as in Fig. 2a using sperm from C57BL/6J mice to generate P0 breeder mice, which are capable of transmitting high-copy genomic integrations of the technology. Then, P0 mice are crossed with homozygous transgenic mice that constitutively express Cas9 to enable continuous cutting from fertilization onwards in F1 progeny. Sibling 2 of a cross between a P0 male and a Cas9:eGFP female is shown. b, Bar charts show the degree of mutation (per cent cut, red) for a P0 male (top row) and four F1 offspring generated by breeding with a Cas9:eGFP female before weaning (21 days post partum). Each row represents a mouse and each column represents a target site. Each sibling inherits its own subset of the 23 parental target-site integrations, and demonstrates different levels of mutation throughout gestation and maturation. c, Indel frequencies for the ten most-frequent indels from three siblings in a common target-site integration (column 1 in b). Each mouse shows a large diversity of indels, and the different frequencies observed in each mouse demonstrate an independent mutational path.

Extended Data Fig. 7 Performance of tree-building algorithms used on embryonic data.

a, Table summarizing contemporary Cas9-based lineage tracers that have been applied to vertebrate development, highlighting attributes that differ between the studies. See Supplementary Methods for a more detailed overview of key characteristics of our technology. Single asterisk denotes that the study reports the average fraction recovered by tissue for integrations that cannot be distinguished, such that percentages reported here are effectively equivalent to our ‘≥1 intBC’ metric. Double asterisk indicates that the value refers to a plate-based DNA-sequencing approach that can be applied to all methods to improve target-site recovery. Triple asterisk denotes a range of cells in which at least one intBC is confidently detected and scored. Quadruple asterisk denotes that the study presents a tree reconstruction method, but includes results that predominantly rely on clonal analysis. b, Table of allele complexity, number of nodes and log-likelihood scores for embryos. Tree likelihoods are calculated using indel frequencies estimated from all embryo data (Fig. 2c, Extended Data Fig. 4, Supplementary Methods). Bold scores indicate the reconstruction algorithm selected for each embryo (trees shown in Fig. 4, Extended Data Figs. 8, 9). c, log-likelihood of trees generated using either the greedy or biased sampling approach as a function of complexity, which is measured as the number of unique alleles. There is near-equivalent performance of the two algorithms for low-complexity embryos, but the greedy algorithm produces higher-likelihood trees for embryos with larger numbers of unique alleles.

Extended Data Fig. 8 Single-cell lineage reconstruction of early mouse development for embryo 6.

a, Reconstructed lineage tree comprising 2,690 nodes generated from our most information-dense embryo (embryo 6) that we used to compare shared progenitor scores with embryo 2 in Fig. 4d. Each branch represents an independent indel generation event. b, Example paths from root to leaf from the selected tree (highlighted by colour). Cells for each node in the path are overlaid onto the t-SNE representation in Extended Data Fig. 5, with the tissue proportion for cells within each node included as a pie chart (colours are as in Fig. 3b). In the top path (pink), the lineage bifurcates into two independently fated progenitors that either generate mesoderm (secondary heart field and primitive heart tube) or neural ectoderm (anterior neural ectoderm and neural crest). Note that the middle path (green) also represents an earlier bifurcation from the same tree, and eventually produces neural ectoderm (neural crest and future spinal cord). These paths begin with a pluripotent node that can generate visceral endoderm, but subsequently lose this potential. The bottom path (dark blue) begins in an equivalently pluripotent state but becomes restricted towards the extra-embryonic visceral-endoderm fate. c, Violin plots that represent the relationship between lineage and expression for individual pairs of cells as calculated for embryo 2 in Fig. 4c. Expression Pearson correlation decreases with increasing lineage distance, which shows that closely related cells are more likely to share function. Red dot highlights the median, edges indicate the interquartile range, and whiskers indicate the full range. d, Comprehensive clustering of shared progenitor scores for embryo 6, which has the greatest number of unique alleles and samples multiple extra-embryonic tissue types. 
Shared progenitor score is calculated as the sum of shared nodes between cells from two tissues, normalized by the number of additional tissues that are also produced (a single shared progenitor score is calculated as 2^(−(n − 1)), in which n is the number of clusters present within that node). In general, extra-embryonic tissues that are specified before implantation, such as extra-embryonic endoderm or ectoderm, co-cluster away from embryonic tissues and within their own groups, whereas the amnion and allantois of the extra-embryonic mesoderm cluster with other mesodermal products of the posterior primitive streak. The co-clustering of anterior paraxial mesoderm and somites may reflect the continuous nature of somitogenesis from presomitic mesoderm during this period, with production of only the anterior-most somites by E8.5. Note that the gut endoderm cluster has been further partitioned according to embryonic or extra-embryonic lineage (Fig. 5).
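The shared progenitor score can be illustrated with a short sketch. It assumes each tree node has already been summarized as the set of tissues (clusters) found among its descendant cells; every pair of tissues in a node with n tissues then receives a contribution of 2 raised to the power −(n − 1). The input representation and function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_progenitor_scores(node_tissues):
    """Sum per-node contributions of 2^(-(n - 1)) for every tissue pair.

    node_tissues: iterable of tissue sets, one per tree node (an assumed
    input format). Nodes containing a single tissue contribute nothing,
    because they are not shared between tissues.
    """
    scores = defaultdict(float)
    for tissues in node_tissues:
        tissues = set(tissues)
        n = len(tissues)
        if n < 2:
            continue
        weight = 2.0 ** -(n - 1)  # nodes that yield more tissues count for less
        for a, b in combinations(sorted(tissues), 2):
            scores[(a, b)] += weight
    return dict(scores)


# Toy tree summary: three nodes with invented tissue content.
nodes = [
    {"mesoderm", "ectoderm"},
    {"mesoderm", "ectoderm", "endoderm"},
    {"mesoderm"},
]
scores = shared_progenitor_scores(nodes)
print(scores)
```

Here the mesoderm and ectoderm pair scores 0.5 from the two-tissue node plus 0.25 from the three-tissue node, so a progenitor restricted to two fates carries twice the weight of one that also produces a third tissue.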

Extended Data Fig. 9 Summary of results from additional mouse embryos.

a, b, Representative highest-likelihood tree analyses for additional embryos, including reconstructed trees as shown in Fig. 4a (a) and shared progenitor score heat maps as shown in Fig. 5a (b), normalized to the highest score for each embryo to account for differences in total node numbers. Here the shared progenitor score is calculated as the number of nodes that are shared between tissues, scaled by the number of tissues within each node (a single shared progenitor score is calculated as 2^(−(n − 1)), in which n is the number of clusters present within that node). In general, the clustering of shared progenitors is recapitulated across embryos, with mesoderm and ectoderm sharing the highest relationship and either extra-embryonic ectoderm or extra-embryonic endoderm representing the most-deeply rooted and distinct outgroup, although these scores are sensitive to the number of target sites, the rate of cutting and the number of cells in the cluster. By shared progenitor, primordial germ cells (PGCs) are also frequently distant from other embryonic tissues; however, this often reflects the rarity of these cells, which restricts them to only a few branches of the tree in comparison to better-represented germ layers. The number of heterogeneous nodes from which scores are derived is included for each heat map. c, Violin plots that represent the pairwise relationship between lineage distance and transcriptional profile as shown for embryo 2 in Fig. 4c. Lineage distance is calculated using a modified Hamming distance, and transcriptional similarity by Pearson correlation. The exact dynamic range for lineage distance depends on the number of intBCs included and the cutting rate of the three-guide array. Here distances are binned into perfect (0), close (0 < x < 0.5), intermediate (0.5 ≤ x < 1), and distant (x ≥ 1) relationships for all cells that contain either three or six cut sites, depending on the embryo. As lineage distance increases, transcriptional similarity decreases, which is consistent with functional restriction over development. Red dot highlights the median, edges indicate the interquartile range, and whiskers denote the full range.
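The binning of lineage distances in panel c can be sketched as below. The caption does not define the modified Hamming distance, so the per-site scoring used here is an assumption for illustration only: 0 for identical states, 1 when one cell is uncut and the other carries an indel, and 2 when both carry different indels, averaged over sites. The bins follow the caption, reading the 'close' bin as 0 < x < 0.5.

```python
UNCUT = "None"  # placeholder label for an unedited site (hypothetical)


def modified_hamming(alleles_a, alleles_b):
    """Assumed 'modified Hamming' distance between two cells' site states.

    Per site: 0 if identical, 1 if exactly one cell is uncut, 2 if both
    are edited with different indels. The total is divided by the number
    of sites. This scoring is an illustrative guess, not taken from the source.
    """
    total = 0
    for x, y in zip(alleles_a, alleles_b):
        if x == y:
            continue
        total += 1 if UNCUT in (x, y) else 2
    return total / len(alleles_a)


def bin_distance(x):
    """Map a lineage distance onto the four bins named in the caption."""
    if x == 0:
        return "perfect"
    if x < 0.5:
        return "close"
    if x < 1:
        return "intermediate"
    return "distant"


# Toy cells with three cut sites each; indels other than 92:1:I are invented.
a = ("92:1:I", UNCUT, "85:7:D")
c = ("92:1:I", UNCUT, UNCUT)
d = ("92:1:I", "90:3:D", UNCUT)
e = ("88:2:D", "90:3:D", "80:4:D")

bins = [bin_distance(modified_hamming(a, other)) for other in (a, c, d, e)]
print(bins)
```

Under this scoring the distance can exceed 1, which is consistent with the caption's 'distant' bin of x ≥ 1, but the actual dynamic range depends on the definition given in the Supplementary Methods.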

Extended Data Fig. 10 Expression characteristics of extra-embryonic and embryonic endoderm.

a, Violin plots that represent the pairwise scRNA-seq Pearson correlation coefficients for within- or across-group comparisons according to lineage (X, extra-embryonic; E, embryonic) and cluster assignment (light blue, gut endoderm; dark blue, visceral endoderm). Within-group comparisons for cells with the same lineage and transcriptional cluster identity are shown on the left, and across-group comparisons are presented on the right. Notably, extra-embryonic cells with gut-endoderm identities show higher pairwise correlations to embryonic cells with gut-endoderm identities (column 4) than they do to visceral-endoderm cells, with which they share a closer lineage relationship (column 5). Red dot highlights the median, edges indicate the interquartile range, and whiskers denote the full range. n, number of pairwise comparisons between cells in embryo 2. b, t-SNE plots of scRNA-seq data for embryo 2, with gut-endoderm cells highlighted. Endoderm cells segregate from the rest of the embryo, and cannot be distinguished by embryonic (light blue) or extra-embryonic (dark blue) origin. n, number of cells for embryo 2. Cells of ambiguous origin are not included in the two right-most plots. c, Expression box plots for the extra-embryonic markers Trap1a and Rhox5 from an independent scRNA-seq survey of E8.25 embryos (data and annotations taken from a previous study, ref. 9). Both genes are heterogeneously present in cells identified as mid- and hindgut but uniformly present in canonical extra-embryonic tissues, which is consistent with the presence of a subpopulation of cells of extra-embryonic origin that resides within this otherwise-embryonic cluster. Red lines highlight the median, edges indicate the interquartile range, and whiskers denote the Tukey fence. Outliers were removed for clarity.

Supplementary information

Supplementary Information

This file contains Supplementary Methods and References.

Life Sciences Reporting Summary

Cite this article

Chan, M.M., Smith, Z.D., Grosswendt, S. et al. Molecular recording of mammalian embryogenesis. Nature 570, 77–82 (2019). https://doi.org/10.1038/s41586-019-1184-5
