In vivo clonal tracking reveals evidence of haemangioblast and haematomesoblast contribution to yolk sac haematopoiesis

During embryogenesis, haematopoietic and endothelial lineages emerge closely in time and space. It is thought that the first blood and endothelium derive from a common clonal ancestor, the haemangioblast. However, investigation of candidate haemangioblasts in vitro revealed the capacity for mesenchymal differentiation, a feature more compatible with an earlier mesodermal precursor. To date, no evidence for an in vivo haemangioblast has been discovered. Using single cell RNA-Sequencing and in vivo cellular barcoding, we have unravelled the ancestral relationships that give rise to the haematopoietic lineages of the yolk sac, the endothelium, and the mesenchyme. We show that the mesodermal derivatives of the yolk sac are produced by three distinct precursors with dual-lineage outcomes: the haemangioblast, the mesenchymoangioblast, and a previously undescribed cell type: the haematomesoblast. Between E5.5 and E7.5, this trio of precursors seeds haematopoietic, endothelial, and mesenchymal trajectories.

Haematopoiesis first occurs in the embryonic day (E) 7.0-E10.5 mouse yolk sac to produce the mature haematopoietic lineages (primitive erythrocytes, megakaryocytes, macrophages) 1-4 and erythro-myeloid progenitors 2,5 . Our understanding of how this occurs is predominantly informed by in vitro and ex vivo data that have suggested a differentiation sequence involving mesoderm, the haemangioblast (a precursor that gives rise to both haematopoietic and endothelial lineages), and the haemogenic endothelium (a precursor defined by dual haematopoietic and endothelial marker expression but committed to the blood lineage) [6][7][8][9][10][11][12] . A long-standing question has been whether the yolk sac-derived endothelial and haematopoietic lineages share a common clonal origin independent from local mesenchymal lineages (smooth muscle, fibroblast, mesothelium). The balance of in vitro evidence suggests that although cells from the gastrulating embryo are capable of generating smooth muscle, endothelium, and haematopoietic lineages in vitro 10 , dual endothelial-haematopoietic outcome is a rare occurrence from the extraembryonic mesoderm and haemogenic endothelium 7,8,13 . Lineage tracking studies suggest that dual endothelial-haematopoietic outcome may occur at a higher frequency 14 in vivo, although mesenchymal contribution was not addressed in this study.
Embryonic blood cells are classified with reference to their ancestry. Cells derived directly from the mesoderm without transiting through haematopoietic stem cells or multipotent erythro-myeloid progenitors (EMPs) are termed primitive. Haematopoietic cells that descend from a bona fide EMP or a stem cell are considered to be pro-definitive and definitive, respectively. Haematopoietic stem cells emerge between embryonic day (E) 10.5-E11.5 15,16 ; therefore, haematopoietic lineages that emerge prior to E11.5 must derive directly from the mesoderm or from EMPs. Yolk sac primitive erythrocytes and megakaryocytes can derive from a common precursor 3 , have features that distinguish them from their stem cell-derived counterparts 4,17,18 , and are generated in the absence of EMPs 4, [19][20][21] . Accordingly, both lineages have been proposed to be primitive lineages. In contrast, yolk sac EMP-derived macrophages are largely indistinguishable from those derived from stem cells 22,23 . This, combined with the observation that fetal macrophages are not produced in the absence of EMPs 21 , suggests that macrophages are not a primitive lineage. Although the primitive-definitive classification convention is based on sound deductive reasoning, whether it accurately predicts the in vivo ancestral relationship between the yolk sac haematopoietic lineages remains untested.
e-mail: naik.s@wehi.edu.au; taoudi@wehi.edu.au
To understand the in vivo cellular genealogy of the first mesodermal derivatives, we investigated the yolk sac between E7.25-E10.5 using single-cell transcriptomics and between E5.5-E10.5 using single cell lineage tracking by in vivo barcoding. Herein, we provide in vivo evidence of the haemangioblast. We also show that haemangioblasts are not the sole blood and endothelial precursor. Rather, these lineages arise from a trio of progenitors with distinct patterns of lineage production: the haemangioblast (that produces haematopoietic lineages and conventional endothelium), the mesenchymoangioblast (that produces mesenchyme and conventional endothelium), and the haematomesoblast, a previously undescribed class of haematogenic precursor that produces haematopoietic lineages and mesenchyme and so bridges the haematopoietic and mesenchymal elements of the yolk sac.

A putative mesenchymal axis of yolk sac haematopoiesis
The extraembryonic mesoderm of the E7.75 yolk sac lines the primitive endoderm as a sheet 1 (Fig. 1ai). By E8.5, this sheet has become morphologically and immunophenotypically diversified into mesenchymal cells 1 , the blood band (which contains a mix of primitive erythroid cells, megakaryocyte precursors, EMPs, endothelium, and mesothelial cells) [24][25][26] , and the endothelial zone 24 (Fig. 1aii). The E8.5 yolk sac also contains three immunophenotypically identifiable haematopoietic lineages (primitive erythrocyte, megakaryocyte, and haematopoietic progenitor/colony forming cells (HPC) (a population containing all EMP activity, Supplementary Fig. 1 (Supplementary Fig. 2a-d). Although macrophage-associated genes can be detected at E8.5 29 , significant numbers of bona fide macrophages are not detected before E10.5 ( Supplementary Fig. 2e) 4 . To define the transcriptional identity of cellular intermediates that appear during mesodermal diversification in the yolk sac, we generated a targeted single-cell RNA-sequencing (scRNA-Seq) dataset encompassing all known cellular immunophenotypes associated with endothelial and haematopoietic differentiation in the yolk sac between E7.25-E10.5 ( Supplementary Data 1 and 2). Using the gating strategies described in Supplementary Fig. 2a . This revealed that from E8.5, endothelium and blood lineages were readily identifiable, and that E7.25 and E7.75 populations had a more ambiguous identity, which is consistent with cells being in a state of developmental transition. Of note, a caveat of this approach is that the same broad cell type (e.g., EryP) could have different transcriptional features at different developmental stages.
Although haemato-endothelial precursors were transcriptionally distinct from the mesoderm at E7.25, they still exhibited a robust mesenchymal signature at E7.75 (similar to that of non-haematoendothelial populations) that was abruptly downregulated by E8.5 (Fig. 1ci-iii). In light of the haemangioblast theory, which postulates an early separation of these lineages 6,7,10-12 , a transcriptional connection between the endothelial, haematopoietic, and mesenchymal lineages was unexpected. Importantly, this suggested that mesenchymal and haemato-endothelial fates might segregate later than thought, possibly via an as-yet undiscovered common clonal ancestor.

Tracking mesodermal diversification using cellular barcoding
To test the hypothesis that haemato-endothelial development might occur via mesenchymal as well as endothelial intermediaries, we performed in vivo lineage tracking using inducible cellular barcoding. This approach enables the fate of large numbers of individual cells to be tracked via indelible DNA tagging under physiological conditions [34][35][36][37] . Identification of the same DNA tag (or barcode) in two cells, or cell populations, demonstrates that they share a clonal ancestry. To enable the sensitive recovery of barcodes from small numbers of purified cells, we used a Cre-LoxP-based in situ barcoding mouse line (named the LoxCode line) that can generate a high diversity of cell-specific barcodes 38 following Cre exposure during unperturbed development (Fig. 2a, Supplementary Fig. 4a, b, and Methods). The LoxCode construct contains 14 LoxP sites in alternate orientation interspersed with 13 unique DNA segments of 8-14 bp (termed elements) (Fig. 2a and Supplementary Fig. 4a, b). The theoretical diversity provided by the recombination of the LoxCode is >30 × 10^9 unique barcodes (see Methods). A Sanger-sequence verified LoxCode cassette was introduced into mice at the Gt(ROSA)26Sor locus (Rosa26) using CRISPR technology. Exposure to Cre recombinase led to recombination (inversion/deletion) and the expected formation of LoxCode cassettes composed of 13, 9, 7, 5, 3, or 1 element(s) (Fig. 2b). Construction of the LoxCode line will be described in greater detail elsewhere 39 .
To assess the sensitivity and linearity of barcode detection, control experiments were performed with barcoded LoxCode/Rosa26CreERT2 acute myeloid leukaemia clones. After in vitro exposure to 4-Hydroxytamoxifen (4-OHT), single acute myeloid leukaemia cells were sorted into individual wells and expanded in vitro yielding clonal lines that were sequenced for barcode identification and pooled in known proportions to assess the sensitivity and linearity of barcode detection in a pool. We found that LoxCode sequences between 5 and 9 elements were detected in a near-linear manner (Fig. 2c), providing the potential for reliable quantification of the magnitude of clonal contribution to any lineage (biomass).
LoxCode recombination generates a range of barcodes via 1 to 15 recombination steps (this is referred to as barcode complexity, see Methods). High complexity barcodes are detected in few embryos, suggesting a high likelihood of being clonal (Fig. 2d). In contrast, low complexity barcodes are frequently detected in independent embryos. This suggests that they were made independently in several cells and therefore could not be used for clonal tracking. Our filtering steps involved the selection of infrequently occurring and complex barcodes to ensure clonal tracking (Supplementary Figs. 4c, 5, and Methods). We found that the limit of sensitivity of barcode detection was approximately 1 in 16,000 cells (Supplementary Data 4).
To investigate the temporal dynamics of complex barcode generation after 4-OHT injection, we induced barcoding at E6.5 and collected embryos 1, 6, 12, 24, 48, or 120 h later. Complex barcodes were detected as early as 1 h after induction and steadily accumulated over the first 24 h (Fig. 2e). After 24 h, the proportion of quantifiable barcode reads was stable (Fig. 2e). Thus, when barcodes are generated during lineage diversification, developmental intermediaries can be labelled. This enables reconstruction of in vivo cellular genealogies.

Benchmarking the LoxCode mouse for yolk sac lineage tracking
To benchmark the LoxCode mouse line in the yolk sac, we first performed control experiments involving lineages that were known to be separate (negative control: primitive erythrocytes and non-erythroid haematopoietic lineages 40,41 ), or known to be connected (positive control: yolk sac macrophages and brain macrophages [microglia] 23,42,43 ).
We used inducible Cre lines that would either label all cells (Rosa26-ERT2-Cre, referred to as Rosa26iCre) or would label Cdh5-expressing cells (Cdh5-ERT2-Cre 44 , referred to as Cdh5iCre). At early developmental stages (E6.5-E8.5), Cdh5-expressing cells include precursors to the endothelial and haematopoietic lineages, and at E8.5 the conventional endothelium itself (refs. 28, 29, 32, 45, 46 and Fig. 1c and Supplementary Fig. 2f). Barcode labelling was induced with 4-OHT between E6.5-E8.5 and offspring of barcoded cells were analysed in the E10.5 yolk sac lineages (Fig. 3a and Supplementary Fig. 4d). After collection of E10.5 yolk sac lineages by flow cytometry (Supplementary Data 5), LoxCode libraries were generated, sequenced and analysed following the pipeline described in Supplementary Fig. 5 and Methods. In the negative control experiment, we found that primitive erythrocytes and the non-erythroid haematopoietic lineages (HPC, macrophage, and megakaryocyte lineages, referred to herein as the Haem group) were on separate trajectories from E7.5 (Fig. 3b, c and Supplementary Fig. 6a-c). This was consistent with population-level lineage tracking (Supplementary Fig. 6d), the early segregation of primitive erythrocytes inferred by our scRNA-Seq data (Fig. 1b, 1cii and Supplementary Fig. 2f), the independence of the primitive erythrocytes from the HPCs 40,41 , and the in vivo divergence of the Mk and EryP lineages recently described 47 . This suggested that the LoxCode molecular protocol and analysis pipeline did not create spurious connections. In the positive control experiment, we found that >90% of yolk sac macrophages and cephalic microglia populations shared clonal ancestors at E7.5 (Fig. 3d). This demonstrated that expected ancestries were robustly detected using our method.
In addition, we found that the Haem sub-lineages derived from common clonal ancestors at E6.5 and seeded independent macrophage, HPC, and megakaryocyte trajectories between E7.5 and E8.5.

Endothelial and haematopoietic cells diverge from E7.5
We next investigated the relationship between the Haem group and the endothelium. To this end, barcode formation was induced in E7.5 and E8.5 Cdh5-expressing cells using LoxCode:Cdh5iCre mice, and then recovered in the E10.5 yolk sac. This revealed that Cdh5-expressing clones contributed to either Haem cells or endothelium but not to both lineages (Fig. 4d and Supplementary Fig. 8a, c). We then used LoxCode:Rosa26iCre mice to enable unbiased labelling, which largely confirmed that the endothelium and Haem group were ancestrally independent. However, among 116 clones, two instances of dual lineage contribution were observed (Fig. 4e and Supplementary Fig. 8b).
Although rare and only a minor contributor to the E10.5 biomass, this pattern of contribution was consistent with the haemangioblast theory.
To investigate the ancestral relationship between haematopoietic, endothelial, and mesenchymal lineages in the yolk sac, we collected E10.5 endothelium, primitive erythrocyte, mesenchyme, and the Haem group (Fig. 6a). When LoxCode:Rosa26iCre mice were induced at E7.5 few barcodes were shared between the mesodermal derivatives ( Fig. 6b and Supplementary Fig. 10a) indicating that mesodermal derivatives in the yolk sac were on separate lineage trajectories from E7.5.
Induction of barcode formation at E6.5 yielded a striking pattern of barcode sharing that was consistent with labelling a dynamic developmental continuum that spanned mesoderm with multi-lineage contribution and lineage-restricted trajectories (Fig. 6c and Supplementary Fig. 10b). The E6.5 clonal outcomes revealed a clear picture of how the mesoderm diversifies into its four major outcomes. These included: (1) Multi-outcome mesoderm that contributed to mesenchyme, conventional endothelium, primitive erythrocytes, and/or Haem lineages (i.e., haematogenic endothelium). (2) Haemangioblasts that were restricted to the formation of conventional endothelium and primitive erythrocyte/Haem outcomes, and thus were capable of producing both haematogenic endothelium and conventional endothelium.  Each class of dual-outcome precursor was observed in all of the nine embryos induced at E6.5 (Fig. 6cii), which demonstrates the biological reproducibility and robustness of the observations. Induction of barcode formation at E5.5 identified haemangioblast, mesenchymoangioblast, and haematomesoblast outcomes as well as a greater number of multi-lineage clones (Fig. 6d and Supplementary Fig. 10c). As recombination occurs for at least 24 h in this system, we cannot pinpoint the exact time of emergence of each type of ancestor. However, our data clearly demonstrated a progression from multi > dual > uni-lineage outcome between E5.5 and E7.5.
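The clone categories named in the text can be read as a rule on the set of E10.5 populations in which a given barcode is recovered. The following is a minimal sketch of that rule, not the authors' analysis code. The population labels, the grouping of EryP and Haem into a single haematopoietic fate, and the name `classify_clone` are assumptions made for illustration.

```python
# Hypothetical labels for the sorted E10.5 populations (assumed for illustration).
FATE_OF_POPULATION = {
    "EryP": "haematopoietic",
    "Haem": "haematopoietic",
    "Endothelium": "endothelial",
    "Mesenchyme": "mesenchymal",
}

# Names of the three dual-outcome precursors, keyed by the pair of fates they produce.
DUAL_OUTCOME_NAMES = {
    frozenset({"haematopoietic", "endothelial"}): "haemangioblast",
    frozenset({"mesenchymal", "endothelial"}): "mesenchymoangioblast",
    frozenset({"haematopoietic", "mesenchymal"}): "haematomesoblast",
}


def classify_clone(populations):
    """Classify one barcode by the populations in which it was detected."""
    fates = frozenset(FATE_OF_POPULATION[p] for p in populations)
    if not fates:
        raise ValueError("a clone must be detected in at least one population")
    if len(fates) == 1:
        return "uni-lineage"
    if len(fates) == 3:
        return "multi-lineage"
    return DUAL_OUTCOME_NAMES[fates]


if __name__ == "__main__":
    for detected in (["Endothelium", "Haem"], ["Mesenchyme", "EryP"], ["EryP", "Haem"]):
        print(detected, "->", classify_clone(detected))
```

In this sketch a barcode found in both EryP and Haem counts as uni-lineage because both populations are treated as haematopoietic; splitting them would only require changing the mapping.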

Discussion
Using scRNA-Seq gene expression, we made the surprise discovery that mesenchyme-associated genes were co-expressed with haematopoietic and endothelial genes in the E7.25  early haematopoietic and endothelial development existed. Using the LoxCode mouse to induce cellular barcoding during unperturbed embryonic development, we were able to test our hypothesis in vivo. This powerful approach provided evidence that bona fide haemangioblasts exist in vivo, demonstrated the in vivo relevance of the mesenchymoangioblasts previously identified in vitro 48 , and enabled the discovery of a previously undescribed class of haematogenic precursor, the haematomesoblast, which connects the mesenchymal derivatives of the mesoderm to the haematopoietic lineages.
It is possible that the patterns of dual lineage outcomes observed arose because of lineage bias rather than lack of tripotentiality. Our observed limit of LoxCode barcode detection was 1 in 16,000 cells, which provided a 2-5 fold coverage of the non-erythroid yolk sac lineages. Thus, if a precursor such as the haematomesoblast contributed to a third lineage (e.g., the endothelium), the magnitude of this contribution would have been vanishingly small. Additionally, when investigated at E7.5, more than 90% of the biomass of E10.5 yolk sac macrophages and cephalic microglia was shared (Fig. 3d). This suggested that any possible dropout effect (i.e., causing dual- rather than tri-lineage outcomes) was unlikely to be an issue by virtue of the increased time for clonal amplification (4-5 days compared to 3 days). Indeed, >90% connections were observed for the Haem lineages with an E6.5 induction (Fig. 4a) and the percentage of shared barcodes across PCR technical replicates ranged between 96.2 and 99.6% in the E5.5 induction experiments (Supplementary Fig. 10). A caveat of our study is that all end-point analysis was performed at E10.5; therefore, it remains possible that clonal outcomes that we observed as dual outcome could give rise to a third lineage later in embryogenesis (e.g., given more time, mesenchymoangioblasts could generate haemogenic endothelium and so contribute to haematopoiesis).
Whether the haemangioblast, mesenchymoangioblast, and haematomesoblast precursors represent stable and isolatable populations with only dual lineage outcomes is unclear. A previous study showed that colonies with endothelial and haematopoietic output also contained mesenchymal derivatives 10 . Although this could be an outcome of the complex culture system required to investigate the differentiation potential of these cells, this could also indicate that all cells with a dual lineage output in vivo are fundamentally tripotential ancestors that only differentiate along two lineage trajectories due to local environmental cues. Heterotopic transplantations have shown that transplanted epiblast cells adopt the fate of their new location rather than that of their region of origin 49,50 . This highlighted the importance of regional cues in lineage differentiation. In the yolk sac, haematopoietic outcome is largely restricted to the blood band ( Fig. 1aii and ref. 24). This could either be due to inhibition of the haematopoietic potential of a tripotent mesodermal ancestor at the level of the endothelial zone or the activity of a specific mesenchymoangioblast precursor. Of note, the finding that from E6.5 blood (particularly EryP) and endothelial lineages are seeded by largely ancestrally distinct clones is in keeping with previous in situ 13 and ex vivo 14 tracking studies.
Although the molecular nature of the intermediates remains unclear, our findings indicate that haematopoietic and endothelial lineages are generated via both clonally related and unrelated ancestries.
Regarding the relationship between the haematopoietic lineages in the yolk sac, we have demonstrated that despite previous interpretations of yolk sac megakaryocytes being a primitive lineage co-emerging with the primitive erythrocytes 3,4 , these two lineages diverge between E6.5 and E7.5, prior to the emergence of the haemogenic endothelium. Furthermore, we found that megakaryocyte, macrophage and HPC lineages predominantly derive from a common haematopoietic precursor which yields progeny that diverge between E7.5 and E8.5 and continue to develop in parallel in the yolk sac without substantial trajectory crossover before E10.5.
In summary, these data demonstrate the in vivo existence of the haemangioblast, the in vivo activity of mesenchymoangioblasts, and the discovery of a new class of haematogenic precursor, the haematomesoblast. The haemato-endothelial lineages of the yolk sac are established by the output of this precursor trio (Fig. 7).

Mice
Flk1-gfp 69 , Cdh5ERT2Cre 44 , Rosa26ERT2Cre 70 , and Rosa26ReYFP 71 lines were maintained on a C57BL/6 background. All animal experiments were approved by The Walter and Eliza Hall Institute animal ethics committee.

Confocal imaging
Embryos were fixed in 2% PFA for 20 min at room temperature. Samples were blocked and permeabilized in 0.6% Triton-X/10% FCS/Ca2+/Mg2+ DPBS for 30 min at room temperature. Staining with primary and secondary antibodies (Supplementary Data 1) was performed either overnight at 4°C or for 6-8 h at room temperature in 0.6% Triton-X/10% FCS/Ca2+/Mg2+ DPBS. Nuclei were stained with DAPI for one hour at room temperature. Embryos were transferred to 4 ml silanized glass vials (Supelco) and dehydrated in a gradient of tetrahydrofuran (THF, Sigma) in H2O (50%, 70%, 100%) with 1.5 h washes held at room temperature, and a final overnight incubation in 100% THF held at 4°C 72 . The next morning embryos were transferred to a coverslip mounted with a silicone Fastwell (Grace BioLabs) and cleared in two changes of 100% dibenzyl ether (DBE, Sigma) before imaging on a Zeiss LSM780 confocal microscope. Data were processed using Imaris Software (v9, Bitplane).

Flow cytometry
E7 yolk sacs were dissected and dissociated in 0.25% Trypsin/EDTA (Gibco) at 37°C for five minutes 73 . Samples were washed with 1 ml of FACS buffer (7% FCS/Ca2+/Mg2+-free DPBS) and centrifuged at 500 × g for 5 min. Samples were resuspended in 1 ml of FACS buffer and mechanically disrupted by gentle trituration with a P1000 20 times, then filtered through a 40 μm nylon mesh filter and centrifuged again before being placed on ice. Tissues from all older developmental stages (E8-10.5 yolk sacs) were dissociated in 10% Collagenase-Dispase solution (5 mg/ml stock; Roche) made in Dissection Medium (7% FCS/Ca2+/Mg2+ DPBS) at 37°C for 45-60 min, washed and mechanically disrupted as described above. Single cell suspensions were maintained on ice. Cells were washed and stained in FACS buffer. Staining of single cell suspensions was performed with primary antibodies for 1 h. Dead cells were excluded according to uptake of 7-aminoactinomycin D (7-AAD). Gate placement was determined using appropriate isotype and fluorescence minus one controls. Cells were analysed on either BD LSRFortessa or LSRII cytometers. Flow cytometry cell sorting was performed on a BD FACSAria using a 100 μm nozzle with collection in 1.5 ml Eppendorf tubes containing 700 μl FACS Wash to minimise cell loss with collection tubes, and cells were maintained at 4°C throughout the sorting process. Sorted cells were always reanalysed to determine sort purity. Data were analysed using FlowJo software.

Fluidigm C1 single cell dataset
Capture and cDNA generation. Populations of interest were individually purified by flow cytometry sorting (Supplementary Data 1). Cell counts were performed by haemocytometer; 6000 cells were prepared for processing according to the manufacturer's instructions for capture on the Fluidigm C1 integrated fluidic circuit (IFC) with the capacity for 96 individual cells and a 10-17 μm capture aperture. All scripts used were for the 10-17 μm IFC.
Briefly, Solutions A-C were prepared and held on ice, then the IFC was primed before 3000 cells were loaded in 20 μl of 3:2 FACS Wash:C1 Suspension Reagent for capture (150 cells/μl). After inspecting and imaging each capture site to record the presence and quantity of captured cells and their morphology, Solutions A-C were loaded into the IFC according to the Loading Map, and overnight cDNA and pre-amplification was performed. The following morning, ~3 μl of cDNA was harvested into 10 μl of C1 DNA Dilution Reagent in a 96 well plate. Four single cell samples were run on a Tapestation as quality control to assess whether the overnight cDNA step was successful, before storing the plate at −20°C until sequencing library preparation.
Single-cell RNA library preparation and sequencing. cDNA concentration was assessed for each cell sample using the Qubit or a PicoGreen plate reader as per the manufacturer's instructions. Libraries were prepared using the Nextera XT kit and Illumina 96 index kits according to the Fluidigm protocol modified from the Illumina protocol to use ¼ of the kit per 96 well plate. Briefly, cDNA concentrations were adjusted to be 0.
General statistical analyses (outside scRNA-seq data). Prism 7 (GraphPad) was used for data analysis and graph production. Data are represented as mean ± standard deviation (SD), and were analysed using Student's t-test (two-way, unpaired). One-way ANOVA (using Tukey's P value adjustment) was used for multiple comparisons. Differences were considered statistically significant when p < 0.05; designation of 'ns' indicates differences were not significant. * = p < 0.05, ** = p < 0.01, *** = p < 0.001, **** = p < 0.0001. 'n' was used to designate the number of independent experiments.
Construction of the LoxCode mouse. The LoxCode construct was assembled using degenerate oligonucleotides containing a high diversity of element sequences that were sequentially cloned into pBlueScriptIISk. Sequencing revealed that all barcode elements differed by at least 2 nucleotides in either orientation. The LoxCode mouse was created using CRISPR technology. Cas9-gRNA ribonucleoproteins (guide RNA sequence: 5′-CTCCAGTCTTTCTAGAAGAT-3′) and a circularised vector (pBlueScriptIISk backbone) containing the LoxCode cassette were injected into C57BL/6J oocytes before reimplantation into pseudopregnant females. Pups were screened by PCR for presence of the insertion. Sanger sequencing was performed to confirm the LoxCode sequence. The LoxCode line was bred to homozygosity on a C57BL/6 background. For distribution of LoxCode mice, contact corresponding authors.  Fig. 4c). They were also used to screen barcode size classes regarding linearity of output (sequenced barcodes) versus input (input number of cells) (Fig. 2c).
Isolation of barcoded populations. Embryos were generated by crossing LoxCode/LoxCode mice with Cdh5ERT2Cre/+ or Rosa26ERT2Cre/Rosa26ERT2Cre mice. Noon of the day a vaginal plug was found was counted as E0.5. Barcoding was induced by injection of 4-OHT between E6.5 and E8.5 following a protocol optimised for each line for maximum informative barcode recovery: Cdh5ERT2Cre crosses: intraperitoneal injection of 300 μg/mouse of 4-OHT (dissolved in corn oil, Sigma-Aldrich); Rosa26ERT2Cre crosses: intravenous injection of 100 μg of 4-OHT (dissolved in KolliPhor, Sigma-Aldrich) 74 . Induced mice were kept in separate cages to prevent untimely induction via tamoxifen shedding. Yolk sac or head cell populations were recovered at E10.5. Concepti were dissected out of the uterus in warm buffer (37°C; 7% fetal calf serum in DPBS with Ca2+ and Mg2+) and rinsed three times in this buffer. The umbilical cord was pinched and cut beneath the placenta and the embryo and yolk sac transferred to a clean dish of FACS buffer (37°C; 7% fetal calf serum in Ca2+/Mg2+-free DPBS). The yolk sac was dissected with scissors (avoiding pulling on the tissue to preserve endothelial cells) and the embryo quickly moved to a fresh plate. The yolk sac and blood spilled from the umbilical cord were collected. Embryos were screened for normal development and heartbeat, staged by general morphology and somite counts and used for genotyping before sorting. For head macrophage (microglia) purification, heads were dissected after embryo scoring. Yolk sac and head samples were rinsed in DPBS and dissociated enzymatically with Liberase (100 μg/ml in Ca2+/Mg2+-free DPBS) for 12 min at 37°C. The reaction was stopped by adding 2 ml of cold FACS buffer and immediate centrifugation. Samples were resuspended in 1 ml of FACS buffer with 2.5 mM EDTA, incubated for a few minutes on ice to weaken cell adhesion further, and mechanically dissociated with a P1000 pipette.
Samples were filtered through a 40 μm nylon mesh, centrifuged, and resuspended for antibody labelling. Antibodies were purchased from Biolegend (PDGFRA (APA5), PECAM (390), CX3CR1 (SA011F11)), Invitrogen (CD41 (eBioMWReg30)) or made in-house (Ter119 and CD45 (30-F11)). After antibody staining (1 h, 4°C), cells were washed and counterstained with 7AAD (Invitrogen) for dead cell exclusion. Cells were sorted on an Aria Cell Sorter (Becton Dickinson) and collected in FACS buffer. An aliquot of cells was used to assess sample purity with a 95% threshold (Supplementary Data 5), and the remainder was immediately centrifuged after sort. Cell pellets were lysed with 100 μg/ml Proteinase K in proteinase K buffer, digested for 2 h at 56°C, inactivated for 1 h at 85°C and 5 min at 95°C. Lysates were maintained at −20°C until library preparation.
Analysis of the LoxCode sequencing data. All analyses were carried out using custom C++ and R scripts (available on request). Raw paired-end dual-indexed sequencing data was demultiplexed into individual samples. For each sequence, LoxCode elements (stereotypical in position) were extracted and aligned to those of the original cassette. To compute the minimal number of recombination steps necessary to make each LoxCode (complexity), a reference table with all possible combinations was created. For this, a simulation of all possible recombinations (excisions and inversions) of the original LoxCode construct was carried out, assuming an 82 bp minimal distance between loxP sites 75 . The resulting barcodes were stored and attributed a complexity of one. In a second step, all entries of this table were subjected to the same process. Barcodes generated in that way that were already present in the table were discarded, while new barcodes were added and attributed a complexity of two. This process was repeated 15 times until no new barcodes were generated, establishing the minimum number of recombination events needed to create any specific barcode and an expected theoretical diversity of 30,204,722,030 barcodes 38 . The usage of paired-end MiSeq2 600 cycles kits only allowed the sequencing of 12 out of 13 elements. LoxCodes with fewer than 13 elements (the vast majority of barcodes) were sequenced in full with this protocol. For 13-element LoxCodes, the sequence and orientation of the middle element was imputed from its surrounding elements, assuming a minimal number of recombination steps.
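The complexity table described above amounts to a breadth-first search over recombination products. The sketch below illustrates the idea on a toy cassette and is not the authors' C++/R implementation. It assumes that a barcode can be written as a tuple of signed element identifiers, that loxP sites alternate in orientation (so sites an even number of elements apart excise and sites an odd number apart invert), and that the 82 bp distance rule can be stood in for by a hypothetical `min_span` parameter counted in elements.

```python
from collections import deque


def recombine_once(code, min_span=1):
    """Yield every barcode reachable from `code` by one excision or inversion.

    `code` is a tuple of signed integers (element id, sign = orientation).
    loxP site k sits immediately before element k, so a cassette of n elements
    has sites 0..n. `min_span` is a simplified, assumed stand-in for the
    minimal distance required between two recombining sites.
    """
    n = len(code)
    for i in range(n + 1):
        for j in range(i + min_span, n + 1):
            if (j - i) % 2 == 0:
                # Sites in the same orientation: excise the elements between them.
                yield code[:i] + code[j:]
            else:
                # Sites in opposite orientation: invert the elements between them.
                yield code[:i] + tuple(-e for e in reversed(code[i:j])) + code[j:]


def complexity_table(original, min_span=1):
    """Map every reachable barcode to its minimal number of recombination steps."""
    table = {original: 0}
    queue = deque([original])
    while queue:
        code = queue.popleft()
        for product in recombine_once(code, min_span):
            if product not in table:  # barcodes already in the table are discarded
                table[product] = table[code] + 1
                queue.append(product)
    return table


if __name__ == "__main__":
    toy = complexity_table((1, 2, 3))
    print(len(toy), "barcodes; maximum complexity", max(toy.values()))
```

Because breadth-first search visits barcodes in order of increasing step count, the first time a barcode is added its stored value is the minimal number of steps. The full 13-element cassette is far too large to enumerate this way in plain Python, so the example is limited to three elements.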
To exclude barcodes that could be illegitimate or made independently in two cells of the same embryo, barcoding data was processed following the flowchart in Supplementary Fig. 5. Barcodes not conforming to the expected structure or with limited diversity (1-element, 13-element unrecombined barcodes) were removed. Barcodes potentially resulting from PCR artefacts (Supplementary Fig. 4c) were filtered out if their reads represented less than 10% of those of all their potential parents combined (i.e. any barcode containing all the elements of the potential offspring barcode). A detection threshold of 100 reads was used to remove sequencing errors and potential contamination. Illegitimate barcode filtering and detection thresholds were determined using the control experiments described above. Remaining (legitimate) barcodes were normalised for reads per cell across each dataset. Some barcodes were detected in all or many embryos, raising the possibility of repeated barcode generation in independent cells. We found that barcode classes defined by length and complexity differed in their inherent combinatorial diversity (Supplementary Fig. 5, Box 1) and that high-diversity classes (>2000 possible combinations) had a higher chance of being unique to one embryo in our first two datasets (11 embryos) (Supplementary Fig. 5, Box 2). We used this information to filter for barcodes with the highest probability of clonality: those belonging to length/complexity classes with the highest diversity and likelihood of being detected in one embryo only (coloured orange in Supplementary Fig. 5) and uniquely detected in each dataset analysed. Barcodes passing all filtering steps were termed informative barcodes. Heatmaps were generated using heatmap.2 (gplots R package). Barcode behaviour (display of the number of barcodes with a given behaviour per experiment) and biomass (% of cells of a given lineage deriving from ancestors with a particular fate) analyses were generated using custom R scripts.
For biomass analysis, only 5-9-element barcodes were included, as the frequencies of 1-3- and 13-element barcodes were affected by bead clean-up, PCR, and sequencing biases. Total numbers of barcodes generated in each experiment are shown in Supplementary Data 8.
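The two read-based filters described above (the PCR-artefact filter and the detection threshold) can be sketched as follows. This is a minimal Python illustration, not the scripts used in this study. It assumes that a potential parent is any longer barcode whose set of elements (ignoring orientation) includes every element of the candidate offspring, that barcodes with fewer than 100 reads are discarded, and that the filters are applied to raw read counts in the order shown. The function names, the dictionary layout, and the parameter names are hypothetical, and the structural and clonality filters of Supplementary Fig. 5 are not covered.

```python
def is_potential_parent(parent, child):
    """Assumption: a parent is longer than the child and contains all of
    the child's elements, irrespective of orientation (sign)."""
    child_elements = {abs(e) for e in child}
    parent_elements = {abs(e) for e in parent}
    return len(parent) > len(child) and child_elements <= parent_elements


def filter_barcodes(reads, min_reads=100, parent_fraction=0.10):
    """Return the barcodes of one sample that pass both read-based filters.

    reads: dict mapping a barcode (tuple of signed ints) to its read count.
    """
    kept = {}
    for code, n in reads.items():
        # Detection threshold: drop likely sequencing errors / contamination.
        if n < min_reads:
            continue
        # PCR-artefact filter: compare against all potential parents combined.
        parent_reads = sum(
            m for other, m in reads.items() if is_potential_parent(other, code)
        )
        if parent_reads > 0 and n < parent_fraction * parent_reads:
            continue
        kept[code] = n
    return kept


if __name__ == "__main__":
    sample = {
        (1, 2, 3, 4, 5): 10000,   # no potential parent in this sample
        (1, 2, 5): 2000,          # at least 10% of its parent's reads
        (1, 4, 5): 500,           # below 10% of its parent's reads
        (7, 8, 9): 80,            # below the detection threshold
    }
    print(filter_barcodes(sample))
```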
Duration of barcode formation after 4-OHT administration. Embryos were generated by crossing LoxCode/LoxCode or LoxCode/+ mice with Rosa26ERT2Cre/Rosa26ERT2Cre mice. Noon of the day on which a vaginal plug was found was counted as E0.5. Barcoding was induced at E6.5 by intravenous injection of 100 μg of 4-OHT (dissolved in KolliPhor, Sigma-Aldrich). Whole concepti were collected 1, 6, 12, and 24 h after induction, and yolk sacs only after 48 and 120 h. 8-14 embryos were collected for each timepoint. Samples were processed as described above and sequenced on MiSeq using a 600-cycle kit. Barcode sequences and complexity were extracted as previously described. For each embryo, the proportion of reads or barcodes dedicated to quantifiable (complex) barcodes was determined.
10× Genomics scRNA-Seq dataset. Populations of interest were individually purified from a pool of 33 C57BL/6 E10.5 embryos by flow cytometry (Supplementary Data 1), as described in Supplementary Fig. 4d. To enable identification of the population of origin, each population was labelled with a distinct MultiSeq hashtag 76. 17,000 cells were loaded on the 10× Genomics Chromium system, and 13,780 cells were identified using CellRanger. A high-quality dataset of 7316 cells was obtained after screening for transcriptome quality (>3000 UMI, >1000 features, <5% mitochondrial transcripts) and excluding multiplets and cells with inconclusive hashtag calls. Transcriptomes were scaled using SCTransform. Seurat clusters were identified and annotated using DE gene lists. Good correlation was found between hashtag call and transcriptional identity. In particular, >97% of cells purified as Mes belonged to the mesenchymal cluster (Fig. 5a-c). The others were most likely sorter errors or uncalled doublets. Cells belonging to the mesenchymal cluster were re-scaled and clustered for further identification. Aside from endothelial, haematopoietic, and mesenchymal populations, a small number of extraembryonic endodermal cells were identified (19 cells, cluster 22, Fig. 5a). The majority of those cells were labelled with an endothelial hashtag, most likely due to autofluorescence in the BV421 channel (CD31). Those cells, known to diverge from the epiblast lineage at E4.5 77,78, represent a minimal (<4%) contamination of the endothelium with no ancestral relationship to any of the followed populations in the context of these experiments, and would therefore appear as endothelium-only barcodes.
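The quality screen described above can be expressed as a simple per-cell predicate. The Python sketch below only illustrates the stated thresholds and is not the Seurat/R workflow used for this dataset. The dictionary keys (`n_umi`, `n_features`, `pct_mito`, `hashtag`) and the labels used for inconclusive or multiplet hashtag calls are hypothetical.

```python
# Hypothetical labels for cells whose hashtag call is unusable.
EXCLUDED_CALLS = {"inconclusive", "multiplet"}


def passes_qc(cell, min_umi=3000, min_features=1000, max_pct_mito=5.0):
    """Return True if a cell meets the stated thresholds:
    >3000 UMI, >1000 features, <5% mitochondrial transcripts,
    and a conclusive, single-population hashtag call."""
    return (
        cell["n_umi"] > min_umi
        and cell["n_features"] > min_features
        and cell["pct_mito"] < max_pct_mito
        and cell["hashtag"] not in EXCLUDED_CALLS
    )


if __name__ == "__main__":
    cells = [
        {"n_umi": 8200, "n_features": 2900, "pct_mito": 2.1, "hashtag": "Mes"},
        {"n_umi": 2500, "n_features": 1200, "pct_mito": 1.0, "hashtag": "Mes"},
        {"n_umi": 9000, "n_features": 3100, "pct_mito": 7.5, "hashtag": "Endo"},
        {"n_umi": 9500, "n_features": 3300, "pct_mito": 1.8, "hashtag": "multiplet"},
    ]
    high_quality = [c for c in cells if passes_qc(c)]
    print(len(high_quality), "of", len(cells), "cells retained")
```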

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The Fluidigm scRNA-Seq dataset of E7.25-E10.5 yolk sac lineages has been deposited in NCBI's Gene Expression Omnibus under accession code GSE164336. The 10× scRNA-Seq dataset of E10.5 yolk sac lineages has been deposited in NCBI's Gene Expression Omnibus under accession code GSE204896. Differential expression analyses of the scRNA-Seq data are provided in Supplementary Data 3, 6, and 7. LoxCode mice and/or raw or processed data presented in this manuscript will be made available on request.

Code availability
All code used will be made available upon request.