Introduction

In early fetal development, between embryonic day (E) 8.5 and E9.5 in mouse, equivalent to 17–23 days of human gestation, a series of inductive tissue interactions between the definitive endoderm (DE) and the surrounding splanchnic mesoderm (SM) progressively patterns the naive foregut tube into different progenitor domains. These domains further develop into distinct visceral organs, including the trachea, lung, esophagus, stomach, liver, pancreas, and proximal small intestine1,2. The DE gives rise to the epithelial lining and parenchyma of the respiratory and digestive organs, while the SM gives rise to the mesenchymal tissues, such as smooth muscle, fibroblasts, and mesentery, that surround the visceral organs1,2. This foregut patterning defines the landscape of the thoracic and abdominal cavities, setting the relative position of different organs. Disruptions in this process can lead to life-threatening congenital birth defects.

A critical inductive role for the mesenchyme in gut tube organogenesis was first established in the 1960s, when it was shown that SM transplanted from different anterior–posterior (A-P) regions of the embryo could instruct the adjacent epithelium to adopt the organ identity consistent with the original SM position3. Since then, much has been learned about the mesoderm-derived paracrine signals in endoderm organogenesis1,2. However, most studies to date have focused on individual organ lineages or individual signaling pathways, and thus we still lack a comprehensive understanding of the temporally dynamic combinatorial signaling in the foregut microenvironment that orchestrates organogenesis. Moreover, several fundamental questions about the mesoderm have remained unanswered for decades. How many types of SM are there, and does each fetal organ primordium have its own specific mesenchyme? How are the SM and DE lineages coordinated during organogenesis? What role, if any, does the endoderm have in regionalization of the mesoderm?

Initial specification and patterning of the embryonic mesoderm and endoderm occur during gastrulation, from E6.25 to E8.0 in the mouse, as these germ layers progressively emerge from the primitive streak. The lateral plate mesoderm emerges from the streak after the extra-embryonic mesoderm and is followed by the intermediate, paraxial, and axial mesoderm4,5. Concomitantly, DE cells delaminate from the streak and migrate along the outer surface of the mesoderm, eventually intercalating into the overlying visceral endoderm. By E8.0, morphogenetic processes begin to transform the bilayered sheet of endoderm and mesoderm into a tube, as the anterior DE folds over to form the foregut diverticulum and the adjacent lateral plate mesoderm, containing cardiac progenitors, migrates towards the ventral midline6. The lateral plate mesoderm further splits into an outer somatic mesoderm layer adjacent to the ectoderm, which gives rise to the limbs and body wall, and an inner splanchnic mesoderm layer, which surrounds the epithelial gut tube7,8. The first molecular indication of regional identity in the SM is the differential expression of Hox genes along the A-P axis of the embryo9. However, in contrast to heart development, where cell diversification has been well studied10,11,12, the molecular mechanisms governing foregut SM regionalization are obscure, particularly during the critical 24 h when the foregut DE subdivides into distinct organ primordia.

Recently, single-cell transcriptomics has begun to examine organogenesis at unprecedented resolution13,14,15,16; however, studies in the developing gut have either primarily examined the epithelial component or examined later fetal organs after they have been specified17,18,19. Here we use single-cell transcriptomics of the mouse embryonic foregut to infer a comprehensive cell-state ontogeny of DE and SM lineages, uncovering a diversity of SM progenitor subtypes that develop in close register with the organ-specific epithelium. By projecting the transcriptional profiles of paracrine signaling pathways onto these lineages, we infer a roadmap of the reciprocal endoderm–mesoderm inductive interactions that coordinate organogenesis. We validate key predictions with mouse genetics, showing that differential hedgehog signaling from the epithelium patterns the SM into gut tube mesenchyme versus mesenchyme of the liver. Leveraging the signaling roadmap, we generate distinct subtypes of human SM, which have previously been elusive, from human pluripotent stem cells (hPSCs).

Results

Single-cell transcriptomes define diversity in the foregut

To comprehensively define lineage diversification during foregut organogenesis, we performed single-cell RNA sequencing (scRNA-seq) of the mouse embryonic foregut at three time points that span the period of early patterning and lineage induction: E8.5 (5–10 somites; s), E9.0 (12–15 s), and E9.5 (25–30 s) (Fig. 1a, b). We microdissected the foregut between the posterior pharynx and the midgut, pooling tissue from 15 to 20 embryos for each time point. At E9.5, we isolated anterior and posterior regions separately, containing the lung/esophagus and liver/pancreas primordia, respectively. A total of 31,268 single-cell transcriptomes passed quality control, with an average read depth of 3178 transcripts per cell. Cells were clustered based on the expression of highly variable genes across the population and visualized using uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (tSNE) (Fig. 1c; Supplementary Fig. 1). This identified 24 cell clusters that could be grouped into nine major cell lineages based on well-known marker genes: DE, SM, cardiac, other mesoderm (somatic and paraxial), endothelium, blood, ectoderm, neural, and extraembryonic (Supplementary Fig. 1). DE clusters (4448 cells) were characterized by co-expression of Foxa1/2, Cdh1, and/or Epcam, whereas SM (10,097 cells) was defined by co-expression of Foxf1 (Fig. 1d), Vim, and/or Pdgfra, together with low or absent expression of cardiac and other mesoderm-specific transcripts.
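Clustering starts from a set of highly variable genes. The exact selection criteria are given in Methods and are not reproduced here; as a minimal sketch, assuming library-size normalization, a log transform, and a simple dispersion (variance-to-mean) ranking, the selection step could look like this. `normalize_log`, `select_hvgs`, `target_sum`, and `n_top` are hypothetical names, and the counts are toy values.

```python
import numpy as np


def normalize_log(counts: np.ndarray, target_sum: float = 1e4) -> np.ndarray:
    """Scale each cell (row) to target_sum total counts, then apply log1p."""
    totals = counts.sum(axis=1, keepdims=True).astype(float)
    totals[totals == 0] = 1.0  # leave empty cells at zero instead of dividing by zero
    return np.log1p(counts / totals * target_sum)


def select_hvgs(counts: np.ndarray, n_top: int) -> np.ndarray:
    """Indices of the n_top genes with the highest dispersion (variance / mean) after normalization."""
    expr = normalize_log(counts)
    mean = expr.mean(axis=0)
    var = expr.var(axis=0)
    safe_mean = np.where(mean > 0, mean, 1.0)
    dispersion = np.where(mean > 0, var / safe_mean, 0.0)  # genes never detected get 0
    return np.argsort(dispersion)[::-1][:n_top]


# Toy example: four stable genes and one gene that is switched on in half of the cells.
counts = np.full((100, 5), 5.0)
counts[:50, 4] = 0.0
counts[50:, 4] = 50.0
print(select_hvgs(counts, n_top=1))  # [4]
```

Dispersion is used here only because it is simple; mean-binned or variance-stabilized rankings are common alternatives.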

Fig. 1: Single-cell analysis of the mouse foregut endoderm and mesoderm lineages.

a Representative mouse embryo images at three developmental stages showing the foregut region (dashed) that was microdissected (insets) to generate single cells. At E9.5, anterior foregut (a.fg) and posterior foregut (p.fg) were isolated separately. E, embryonic day; s, somite number; n, number of cells. Scale bar: 1 mm. b Schematic of the RNA-seq workflow. c UMAP visualization of 31,268 cells isolated from pooled samples of all three stages. Cells are colored by major cell lineage. d Whole-mount immunostaining of an E9.5 mouse foregut, showing the Cdh1+ endoderm and the surrounding Foxf1+ splanchnic mesoderm. n = 4/4 embryos. Scale bar: 100 μm. e, f tSNE plots of in silico isolated E9.5 endodermal (e) and splanchnic mesodermal (f) cells. g, h Pseudo-spatial ordering of E9.5 endodermal (g) and mesodermal (h) cells along the anterior–posterior (A-P) axis. i, j Schematic of the predicted locations of E9.5 cell types mapped onto the embryonic mouse foregut endoderm (i, yellow) and mesoderm (j, orange). def, definitive; meso, mesoderm; lg, lung; eso, esophagus; lv, liver; splanch, splanchnic; stm, septum transversum mesenchyme; sto, stomach; pha, pharynx. Source data for (g) and (h) are provided in the Source data file.

Focusing on lineage diversification in the DE and SM, we selected these cells in silico for further analysis. We defined 11 major DE clusters consisting of 26 stage-specific sub-clusters (E9.5, 12 clusters; E9.0, 8 clusters; E8.5, 6 clusters) and 13 major SM groups comprising 36 stage-specific sub-clusters (E9.5, 17 clusters; E9.0, 12 clusters; E8.5, 7 clusters) (Fig. 1e, f, Supplementary Figs. 2, 3, Supplementary Data 1). We annotated clusters by comparing their distinguishing genes with the published expression patterns of over 160 genes in the Mouse Genome Informatics (MGI) database20. These data provide a comprehensive single-cell resolution view of early foregut organogenesis and can be explored at https://research.cchmc.org/ZornLab-singlecell.
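Annotation against curated expression data can be framed as a gene-set overlap problem. The sketch below is illustrative only: it assumes each cluster is summarized by a set of distinguishing genes and each candidate tissue by a hand-curated set of genes reported as expressed there, and it scores the overlap with the Jaccard index. The reference sets are small hypothetical stand-ins built from markers named in this section, not the MGI curation itself, and `annotate_clusters` is a hypothetical helper.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def annotate_clusters(markers: dict, references: dict) -> dict:
    """Map each cluster to (best-matching reference tissue, Jaccard score)."""
    result = {}
    for cluster, genes in markers.items():
        best = max(references, key=lambda tissue: jaccard(genes, references[tissue]))
        result[cluster] = (best, jaccard(genes, references[best]))
    return result


# Hypothetical reference sets assembled from markers named in the text.
references = {
    "lung": {"Nkx2-1", "Foxa2", "Sp5"},
    "esophagus": {"Sox2", "Sox21", "Klf4"},
    "liver": {"Alb", "Prox1", "Afp", "Hnf4a"},
}
markers = {
    "cluster_1": {"Nkx2-1", "Sp5", "Foxa2"},
    "cluster_2": {"Sox2", "Sox21", "Klf4", "Foxa2"},
    "cluster_3": {"Alb", "Afp", "Prox1"},
}
print(annotate_clusters(markers, references))
# cluster_1 -> lung, cluster_2 -> esophagus, cluster_3 -> liver
```

In practice the final annotation also relied on in situ validation, so an overlap score like this would only rank candidate identities.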

Our annotations identified all the major DE organ lineages at E9.5, including Tbx1+ pharynx, two Nkx2-1/Foxa2+ respiratory clusters (trachea and lung), two Sox2+ esophagus clusters (one of which is likely dorsal based on Mnx1 expression), two Sox2/Osr1+ stomach clusters, two Alb/Prox1/Afp+ hepatic clusters (c1_hepatoblasts and c10_early hepatocytes, with higher Alb/Hnf4a expression), Sox17/Pdx1+ ventral hepatopancreatic duct, Mnx1/Pdx1+ dorsal pancreas, and Cdx2+ duodenum (Fig. 1e). Consistent with our dissections, we did not detect any Nkx2-1+/Hhex+ thyroid progenitors. Similar to a recent scRNA-seq analysis of the E8.75 gut epithelium19, we also annotated half a dozen distinct DE progenitor states between E8.5 and E9.0 based on the restricted expression of lineage-specifying transcription factors (TFs), including Otx2+ anterior foregut, Sox2/Sp5-enriched dorsal lateral foregut, Osr1/Irx1-enriched foregut, Hhex+ hepatic endoderm, Nkx2-3+ ventral DE adjacent to the heart, and a small population of Cdx2+ midgut cells (Supplementary Fig. 2 and Supplementary Data 1).

Validation of mesenchymal subtypes

At all stages, the SM cell-type diversity in the foregut was much more complex than previously appreciated (Fig. 1f and Supplementary Fig. 2). Unlike the DE, however, SM populations were typically defined not by one or two markers but by a combination of multiple transcripts (Fig. 2a, b and Supplementary Data 1). In situ hybridization and immunostaining of E9.5 foreguts and embryo sections confirmed that combinations of co-expressed transcripts defined different organ-specific SM subtypes (Fig. 2c–q and Supplementary Movies 1 and 2). The 17 SM cell populations at E9.5 included five Tbx1/Prrx1+ pharyngeal clusters, Isl1/Mtus2+ cardiac outflow tract cells, Nkx6-1/Gata4/Wnt2+ respiratory mesenchyme, and Nkx6-1/Sfrp2/Wnt4+ esophageal mesenchyme (Fig. 2b–j). We annotated three Barx1/Hlx+ stomach mesenchyme populations, one of which was probably ventral based on Gata4 expression (Fig. 2i, k). Finally, we annotated one Hand1/Hoxc8+ duodenal mesenchyme population. We were unable to identify pancreas-specific mesenchyme and suspect that these cells were contained within the stomach or duodenum clusters (Fig. 2p, q).
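Because SM subtypes are defined by marker combinations, the two quantities shown in the Fig. 2b dot plot (average expression and the percentage of cells expressing a marker) can be used to test a combinatorial code directly. This is a sketch under stated assumptions: "detected" means expression above zero, the 30% detection cutoff (`min_fraction`) is arbitrary, the rescaling of averages to 0–2 used in the figure is omitted, and the helper names and toy values are hypothetical. The codes tested are fragments of the respiratory (Nkx6-1/Gata4/Wnt2) and esophageal (Nkx6-1/Sfrp2/Wnt4) signatures.

```python
import numpy as np


def dotplot_stats(expr, labels, genes, panel):
    """Per-cluster (mean expression, fraction of cells with expression > 0) for each gene in panel.

    expr: cells x genes matrix of normalized expression; labels: cluster label per cell;
    genes: column names of expr; panel: marker genes to summarize.
    """
    labels = np.asarray(labels)
    col = {g: i for i, g in enumerate(genes)}
    stats = {}
    for cluster in np.unique(labels):
        sub = expr[labels == cluster]
        stats[str(cluster)] = {
            g: (float(sub[:, col[g]].mean()), float((sub[:, col[g]] > 0).mean()))
            for g in panel
        }
    return stats


def matches_code(cluster_stats, code, min_fraction=0.3):
    """True if every marker of a combinatorial code is detected in >= min_fraction of the cluster."""
    return all(cluster_stats[g][1] >= min_fraction for g in code)


# Toy data: two respiratory-like and two esophageal-like mesenchyme cells.
genes = ["Nkx6-1", "Wnt2", "Sfrp2"]
expr = np.array([[1.0, 2.0, 0.0],
                 [0.5, 1.5, 0.0],
                 [1.2, 0.0, 1.0],
                 [0.8, 0.0, 2.0]])
labels = ["res", "res", "eso", "eso"]
stats = dotplot_stats(expr, labels, genes, genes)
print(matches_code(stats["res"], ["Nkx6-1", "Wnt2"]))  # True
print(matches_code(stats["eso"], ["Nkx6-1", "Wnt2"]))  # False: Wnt2 is not detected
```

Requiring every marker to pass (an AND rule) captures the point that no single transcript, such as the shared Nkx6-1, separates these two mesenchymal populations.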

Fig. 2: Lineage-restricted gene expression in different SM cell types.

a Schematic of the E9.5 foregut indicating the level of sections. b Dot plot showing average scRNA-seq expression (normalized to a scale of 0–2) of marker genes in different E9.5 SM cell clusters. The size of the dot represents the % of cells in a cluster expressing the marker. c–g Whole-mount immunostaining (c) or in situ hybridization (d–g) of dissected E9.5 foregut tissue. n = 2/2 embryos/probe. Scale bar: 100 μm. h–q RNA-scope in situ detection on transverse E9.5 mouse embryo sections (i–iv indicate the A-P level of the section in a). n = 2/2 embryos/probe combination. Scale bar: 50 μm. duo, duodenum; dp, dorsal pancreas; eso, esophagus; ht, heart; lg, lung; liv, liver; oft, outflow tract; pha, pharynx; res, respiratory; stm, septum transversum mesenchyme; sto, stomach; sv, sinus venosus; vp, ventral pancreas.

The liver bud was the most complex, with five distinct mesenchymal populations. Data mining of MGI and in situ validation allowed us to annotate an Alcam/Wnt2/Gata4-enriched septum transversum mesenchyme (stm), a Tbx5/Wnt2/Gata4/Vsnl1+ sinus venosus, a Msx1/Wnt2/Hand1/Col1a1+ fibroblast population, and two Wt1/Gata4/Uroplakin+ mesothelium populations (Fig. 2k–n and Supplementary Fig. 4). Interestingly, we observed restricted expression of Hand1 in the posterior versus the anterior liver bud (Supplementary Fig. 4b) and expression of Msx1 that was mutually exclusive with that of Wnt2 and Wt1 (Supplementary Fig. 4e, f). This indicates extensive compartmentalization of the early liver bud mesenchyme that warrants future investigation.

Pseudotime spatial ordering of foregut cells

Different organs form at precise locations along the A-P axis of the gut. To assess whether this was reflected in the single-cell transcriptional profiles, we employed a pseudotime analysis, an approach that several groups have recently used to examine the positional information of cells in a continuous field of embryonic tissue16,19,21. We therefore analyzed the DE and SM cells at each stage using diffusion maps, a dimensionality reduction method for reconstructing developmental trajectories22,23. Anchoring the most anterior pharyngeal cluster as the root, we plotted the pseudotime density distribution for each cluster based on transition probabilities from root cells to all other cells in the graph (see the “Methods” section). Remarkably, this ordered both the DE and SM cell populations according to their appropriate A-P positions in the embryo, indicating that pseudotime in this analysis serves as an unbiased proxy for pseudo-space (Fig. 1g, j; Supplementary Fig. 2). The data also indicated that, at this time in development, cells in the embryonic gut tube exhibit a continuum of transcriptional signatures, with spatially adjacent cell types having more similar expression profiles than distant cell types. Indeed, the E9.5 clusters from the anterior dissections were located in the anterior half of the pseudo-space continuum relative to those from the posterior dissections, confirming the robustness of the computational ordering. Finally, we examined Hox genes, which are known to be expressed in a colinear fashion along the A-P axis; accordingly, we observed a progressive increase in posterior Hox paralog expression in more posterior clusters, particularly within the SM (Supplementary Fig. 5a).
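The pseudo-space ordering rests on diffusion-based pseudotime. As an illustration only, the sketch below implements a simplified, dense version of diffusion pseudotime (DPT): one global Gaussian kernel width `sigma` (DPT as originally described uses cell-specific widths), a row-stochastic transition matrix, accumulation of random walks of all lengths, and the distance from a root cell as pseudotime; clusters are then ordered by median pseudotime. The actual parameters are in Methods; the helper names and the one-dimensional toy data are hypothetical.

```python
import numpy as np


def diffusion_pseudotime(X: np.ndarray, root: int, sigma: float = 1.0) -> np.ndarray:
    """Simplified diffusion pseudotime (DPT) of every cell relative to a root cell.

    Uses one global Gaussian kernel width and dense matrices, so it only suits small toy data.
    """
    n = X.shape[0]
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dist / (2.0 * sigma ** 2))   # symmetric Gaussian affinities
    d = K.sum(axis=1)
    T = K / d[:, None]                          # row-stochastic transition matrix
    pi = d / d.sum()                            # stationary distribution (K is symmetric)
    Pi = np.outer(np.ones(n), pi)               # long-time limit of T^t
    # Accumulate random walks of every length t >= 1: sum_t (T^t - Pi) = (I - T + Pi)^-1 - I
    M = np.linalg.inv(np.eye(n) - T + Pi) - np.eye(n)
    return np.linalg.norm(M - M[root], axis=1)  # distance of each row from the root row


def order_clusters(pseudotime: np.ndarray, labels) -> list:
    """Order clusters by the median pseudotime of their cells (a proxy for A-P position)."""
    labels = np.asarray(labels)
    return sorted((str(c) for c in np.unique(labels)),
                  key=lambda c: float(np.median(pseudotime[labels == c])))


# Toy example: 20 cells spaced evenly along one axis, root at one end.
X = np.arange(20, dtype=float)[:, None]
labels = ["anterior"] * 7 + ["middle"] * 7 + ["posterior"] * 6
pt = diffusion_pseudotime(X, root=0)
print(order_clusters(pt, labels))
```

Summing over all walk lengths avoids choosing a single diffusion time; the dense matrix inverse limits this version to a few thousand cells at most.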

Combining the pseudo-space analysis, MGI curations and in situ validation, we were able to map each DE and SM population to their approximate locations in the gut tube (Fig. 1i, j; Supplementary Fig. 2). This revealed that the SM diversity mirrored DE lineages, indicating their closely coordinated development from the very beginning of organogenesis.

Transcription factor code of foregut endoderm and mesenchyme

DE organ lineages have historically been defined by the overlapping expression domains of a few key transcription factors (TFs)1,2,24. While some regionally expressed TFs have been reported in the SM, the single-cell RNA-seq data allowed us to define a comprehensive combinatorial code of differentially expressed TFs that distinguish different SM and DE subtypes (Supplementary Fig. 5c and Supplementary Data 1). Selected TF signatures enriched in organ-specific endoderm include Nkx2-5/Pax1 for pharynx, Sox21/Klf4 for esophagus, Nkx2-1/Irx1 for trachea, Nkx2-1/Sp5 for lung, Gata4/Osr1 for stomach, Pdx1/Ptf1a for pancreas, Hhex/Sox17 for hepatopancreatic biliary duct progenitors, Onecut3/Nr5a2 for liver, and Cdx2/Osr2 for duodenal lineages. TF signatures enriched in organ-specific mesenchyme include Tbx1/Irx2 for pharyngeal, Osr1/Hic1 for esophageal, Sp5/Hoxa5 for respiratory, Barx1/Tox3 for gastric, Lhx9/Gata4 for stm, and Hand1/Pitx1 for duodenal mesenchyme. This analysis also identified lineage-restricted SM markers such as the homeodomain TF Nkx6-1. Well known for its expression in the pancreatic endoderm (Fig. 2p)25,26, Nkx6-1 was also specifically expressed in the respiratory and esophageal mesoderm at E9.5 (Fig. 2b, c, h–j, and Supplementary Movies 1 and 2). These TF codes will facilitate lineage-tracing experiments and prompt studies testing their role in mesenchymal differentiation.
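A combinatorial TF code can be extracted by restricting a one-versus-rest comparison to genes annotated as TFs. The sketch assumes log-normalized expression, uses the difference in mean log expression between a cluster and all other cells as a fold-change proxy, and skips significance testing; the TF list, gene panel, and toy values are hypothetical, and the statistics actually used are described in Methods. The toy markers echo the pharyngeal (Tbx1/Irx2) and gastric (Barx1/Tox3) mesenchyme signatures above.

```python
import numpy as np


def tf_code(expr, labels, genes, tfs, top_n=2):
    """Top TF markers per cluster, ranked by mean log-expression difference vs. all other cells."""
    labels = np.asarray(labels)
    tf_idx = [i for i, g in enumerate(genes) if g in tfs]  # keep TF columns only
    code = {}
    for cluster in np.unique(labels):
        inside = expr[labels == cluster][:, tf_idx].mean(axis=0)
        outside = expr[labels != cluster][:, tf_idx].mean(axis=0)
        lfc = inside - outside  # on log-scale data a difference approximates a log fold change
        best = np.argsort(lfc)[::-1][:top_n]
        code[str(cluster)] = [genes[tf_idx[j]] for j in best if lfc[j] > 0]
    return code


genes = ["Tbx1", "Irx2", "Barx1", "Tox3", "Actb"]
tfs = {"Tbx1", "Irx2", "Barx1", "Tox3"}  # Actb is not a TF and is ignored
expr = np.array([[2.0, 1.5, 0.0, 0.0, 3.0],
                 [2.2, 1.0, 0.0, 0.0, 3.0],
                 [0.0, 0.0, 2.0, 1.2, 3.0],
                 [0.0, 0.0, 1.8, 1.0, 3.0]])
labels = ["pharynx", "pharynx", "stomach", "stomach"]
print(tf_code(expr, labels, genes, tfs))
# {'pharynx': ['Tbx1', 'Irx2'], 'stomach': ['Barx1', 'Tox3']}
```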

Synchronized endoderm and mesenchyme lineage trajectories

The transcriptional cell-state complexity of the DE and SM at least doubled in just 24 h between E8.5 and E9.5, reflecting the emergence of more specialized cell types from progenitors. To examine the temporal dynamics of lineage diversification, we visualized the single-cell data using SPRING (Fig. 3a, b), an algorithm that represents k-nearest neighbors in a force-directed graph, facilitating the analysis of developmental trajectories27. Both the DE and SM trajectories progressed from a continuum of closely related cell states at E8.5 to transcriptionally distinct cell populations at E9.5 (Fig. 3a, b and Supplementary Fig. 6), consistent with the transition from multipotent progenitors to organ-specific lineages. Importantly, the cell clusters defined by tSNE were well preserved in SPRING (Supplementary Fig. 6), supporting the robustness of the clustering. One striking observation evident in the structure of the SPRING plots was the apparent coordination of SM and DE lineage diversification over the 24 h from E8.5 to E9.5.
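SPRING positions cells by running a force-directed layout on a k-nearest-neighbor (kNN) graph. Only the graph-construction step is sketched here, assuming Euclidean distances in an already reduced space (for example, principal components) and a symmetrized graph; the value of k and the layout itself are left out, and `knn_graph` is a hypothetical name rather than part of SPRING.

```python
import numpy as np


def knn_graph(X: np.ndarray, k: int) -> np.ndarray:
    """Symmetric boolean adjacency matrix linking each cell to its k nearest neighbors."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)               # a cell is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]    # k closest cells for every cell
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nearest.ravel()] = True
    return A | A.T                               # undirected: keep edges listed by either cell


# Two well-separated groups of three cells each.
X = np.array([[0.0], [1.0], [2.0], [100.0], [101.0], [102.0]])
A = knn_graph(X, k=2)
print(A[:3, 3:].any())  # False: no edges between the groups
```

In the layout step, edges act like springs that pull connected cells together, so continuous trajectories appear as connected branches while unconnected groups, such as the two in the toy example, drift apart.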

Fig. 3: Coordinated endoderm and mesoderm cell trajectories.

a, b Force-directed SPRING visualization of the splanchnic mesoderm (a; n = 10,097) and definitive endoderm (b; n = 4448) cell trajectories. Cells are colored by developmental stage. White arrows indicate cell lineage progression. c, d Confusion matrices summarizing parent–child single-cell voting for SM (c) and DE (d) cells, used to construct the cell-state trees. Each cell at the later time point (y-axis) voted for its most similar cell at the preceding time point (x-axis) based on transcriptome similarity (KNN) (see Methods). All of the votes for a given cluster are tabulated, normalized for cluster size (see Methods for details), and represented as a % of votes in the heatmap. E8.5, E9.0, and E9.5 clusters are designated as (a), (b), and (c), respectively. e, f Cell-state trees of SM (e) and DE (f) lineages predicted by single-cell voting. The top choices linking cell states of sequential time points are shown as solid lines, and prominent second choices as dashed lines. Nodes are colored by stage and annotated with the cluster numbers.

To more clearly visualize the developmental trajectories associated with lineage diversification, we generated a consensus cell-state tree using a single-cell voting method, in which each cell of a later time point votes for its most likely parent at the previous time point based on gene expression similarity. We then tabulated all the cell votes for each cluster (Fig. 3c, d) and represented these in a simple tree manifold (Fig. 3e, f). While we cannot rule out SM migration bringing distant cell types to a given organ, the data supported the notion of transcriptionally related cell states arising from the subdivision of common progenitor populations. Because each time point was generated from pooled embryos of slightly different ages, parent–child relationships could also exist within a given time point. To address this and confirm the single-cell voting results, we assessed each trajectory with Monocle14, a pseudotime analysis that computationally predicts progenitor states in a cell population. In general, the pseudotime analysis agreed with the single-cell voting. In the case of the liver endoderm, however, Monocle predicted a parent–child relationship within E9.0, in which Hhex+ posterior foregut endoderm (cluster e_b2) gives rise to both Prox1/Afp+ hepatoblasts (e_b5) and Prox1/Sox17/Pdx1+ hepatopancreatic biliary progenitors (e_b7) (Supplementary Fig. 7), consistent with in vivo lineage tracing experiments28,29.
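The single-cell voting can be sketched as follows. The sketch assumes Pearson correlation between expression profiles as the similarity measure, a single nearest parent cell per vote, and division of vote counts by parent-cluster size as the size normalization before each child cluster's votes are converted to percentages; the actual similarity measure, neighbor count, and normalization are described in Methods and may differ. `vote_matrix`, `tree_edges`, the 20% cutoff for a dashed second-choice edge, and the toy profiles are hypothetical.

```python
import numpy as np


def _center_unit(M):
    """Center each row and scale it to unit length, so row dot products are Pearson correlations.

    Rows with zero variance would produce NaN and must be filtered beforehand.
    """
    C = M - M.mean(axis=1, keepdims=True)
    return C / np.linalg.norm(C, axis=1, keepdims=True)


def vote_matrix(child_expr, child_labels, parent_expr, parent_labels):
    """Percentage of votes from each child cluster (rows) to each parent cluster (columns)."""
    child_labels, parent_labels = np.asarray(child_labels), np.asarray(parent_labels)
    sim = _center_unit(child_expr) @ _center_unit(parent_expr).T
    nearest = sim.argmax(axis=1)  # each child cell votes for its most similar parent cell
    children, parents = np.unique(child_labels), np.unique(parent_labels)
    votes = np.array([[np.sum(parent_labels[nearest[child_labels == c]] == p) for p in parents]
                      for c in children], dtype=float)
    # Assumed size normalization: divide by parent-cluster size so large parents are not favored.
    votes /= np.array([np.sum(parent_labels == p) for p in parents])
    return children, parents, 100.0 * votes / votes.sum(axis=1, keepdims=True)


def tree_edges(children, parents, pct, second_min=20.0):
    """Top-voted parent per child (solid edge) plus the runner-up if it receives >= second_min %."""
    edges = []
    for i, child in enumerate(children):
        ranked = np.argsort(pct[i])[::-1]
        edges.append((str(parents[ranked[0]]), str(child), "solid"))
        if len(ranked) > 1 and pct[i, ranked[1]] >= second_min:
            edges.append((str(parents[ranked[1]]), str(child), "dashed"))
    return edges


parent_expr = np.array([[5, 4, 0, 0], [4, 5, 0, 1], [0, 0, 5, 4], [1, 0, 4, 5]], dtype=float)
child_expr = np.array([[5, 5, 0, 0], [4, 4, 1, 0], [0, 1, 5, 5]], dtype=float)
children, parents, pct = vote_matrix(child_expr, ["c0", "c0", "c1"],
                                     parent_expr, ["a0", "a0", "a1", "a1"])
print(tree_edges(children, parents, pct))  # [('a0', 'c0', 'solid'), ('a1', 'c1', 'solid')]
```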

Overall, the DE trajectories inferred from the single-cell transcriptomes (Fig. 3b, f) are consistent with experimentally determined fate maps28,29,30, demonstrating the robustness of our analysis and suggesting that the SM trajectories (Fig. 3a, e), which have not previously been well defined, may also represent lineage relationships. That said, we caution that cells with similar transcriptomes are not necessarily lineage-related. Indeed, there are cases in which cells from different lineages, such as the ventral and dorsal pancreas, converge on similar transcriptional profiles. Thus, our results establish a theoretical framework for future experimental analysis of foregut mesenchyme development.

Coordinated development of endoderm and mesoderm progenitors

A close examination of the DE and SM trajectories suggests the coordinated development of a number of cell populations within adjacent endoderm and mesoderm tissue layers. For example, at E8.5 the DE lateral foregut cells (e_a2) and the spatially neighboring SM cells (m_a0) both express the TF Osr1, and the trajectories predict that these two cell populations are progenitors of the respiratory, esophageal, and gastric epithelium and mesenchyme, respectively (Fig. 4a, b). As development proceeds, different cell populations appear to segregate as they progressively express distinct lineage-regulating TFs and growth factors (Fig. 4a–d). In situ validation confirmed that Osr1 is expressed in both the epithelium and mesenchyme of the presumptive esophagus, lung, and stomach at E9.5 (Fig. 4e–g). These data suggest that Osr1+ multipotent progenitors exist in both the endoderm and mesoderm lineages contributing to the same organs, and that some mechanism must coordinate this parallel expression pattern.

Fig. 4: Coordinated development of endoderm and mesoderm progenitor populations.

a, b Graphical illustration of the esophageal–respiratory–gastric cell-state trajectories for SM (a) and DE (b) with key marker genes, suggesting the coordinated development of Osr1+ multi-lineage progenitors. c, d SPRING plots of SM (c) and DE (d) showing the normalized expression of key genes projected onto each cell. e In situ hybridization of Osr1 in dissected foregut, showing that Osr1 is expressed in the respiratory, esophageal, and gastric regions. Scale bar: 100 μm. n = 3/3 embryos. f, g In situ hybridization of Osr1 in sections across the respiratory and gastric regions of the foregut, showing that Osr1 is expressed in both endodermal and mesenchymal cells. Scale bar: 200 μm. n = 3/3 embryos. h SPRING plot of the DE esophageal–respiratory lineages. i, j Normalized Nkx2-1 (i) and Sox2 (j) transcript levels in each cell projected onto the SPRING plot, showing co-expression at the esophageal–tracheal boundary. k Sox2 and Nkx2-1 whole-mount immunostaining of an E9.5 mouse foregut. Scale bar: 50 μm. n = 3/3 embryos. l Sox2, Nkx2-1, and Foxf1 immunostaining of a transverse E9.5 foregut section, confirming a rare population of Sox2/Nkx2-1 co-expressing cells. The panel to the right is a magnification of the boxed region in (l). Scale bar: 50 μm. n = 3/3 embryos.

Furthermore, a close examination of the DE trachea cluster suggested a transitional cell population co-expressing the respiratory marker Nkx2-1 and the esophageal marker Sox2 at E9.5, when the foregut is being patterned along the dorsal–ventral axis (Fig. 4h–j). Immunostaining confirmed a rare Nkx2-1/Sox2+ cell population at the prospective tracheal–esophageal boundary (Fig. 4k, l), a region that recent studies have shown to be critical for tracheoesophageal morphogenesis31,32. In sum, the foregut lineage trajectories predicted from the single-cell transcriptomes represent a valuable resource for further studies.

Predicting a signaling roadmap of organ induction

We next sought to computationally predict the paracrine signaling microenvironment in the foregut that controls these cell fate decisions (Fig. 5a, b). We calculated metagene expression profiles for all the ligands, receptors, and context-independent response genes in each DE and SM cluster for six major signaling pathways implicated in organogenesis: BMP, FGF, Hedgehog (HH), Notch, retinoic acid (RA), and canonical Wnt (see Methods, Supplementary Fig. 8 and Supplementary Data 2). Leveraging our spatial map of each cell population in the foregut (Fig. 1i, j), we ordered cell populations along the A-P axis such that DE and SM cell types most likely to be in direct contact were opposite one another in the signaling diagram (Fig. 5c). We then used the metagene expression levels to predict potential ligand–receptor pairs and the likelihood that a given cell population was responding to local paracrine or autocrine signals (Fig. 5a–c, Supplementary Fig. 9). We benchmarked the metagene expression thresholds on experimentally validated interactions from the literature. We also limited potential ligand–receptor pairings to nearby cell clusters, consistent with the generally accepted view that these pathways act over a relatively short range. Together, this analysis revealed a hypothetical combinatorial signaling network (Fig. 5a–c, Supplementary Fig. 9).
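The metagene and pairing steps can be sketched as follows, under explicit assumptions: a metagene score is the average of per-gene z-scores computed across clusters and clipped to [−2, 2] to mirror the Fig. 5c color scale, and an interaction is predicted when a source cluster's ligand score and a target cluster's response score both pass a single threshold and the two clusters lie within a fixed window along the A-P ordering. The threshold, window, cluster names, and scores are hypothetical; the thresholds actually used were benchmarked against interactions from the literature (see Methods).

```python
import numpy as np


def metagene_scores(cluster_means: np.ndarray, gene_idx: list) -> np.ndarray:
    """Mean per-gene z-score across clusters for a gene set, clipped to [-2, 2].

    cluster_means: clusters x genes matrix of average expression.
    """
    sub = cluster_means[:, gene_idx]
    sd = sub.std(axis=0)
    z = (sub - sub.mean(axis=0)) / np.where(sd > 0, sd, 1.0)  # constant genes contribute 0
    return np.clip(z.mean(axis=1), -2.0, 2.0)


def predict_pairs(position, ligand, response, threshold=0.5, window=1):
    """Predicted (source, target) interactions between clusters close along the A-P axis.

    position: cluster -> A-P rank (DE and SM clusters in contact share similar ranks);
    ligand / response: cluster -> ligand or response metagene score.
    A cluster paired with itself would be a predicted autocrine interaction.
    """
    pairs = []
    for src, l_score in ligand.items():
        if l_score < threshold:
            continue
        for tgt, r_score in response.items():
            if r_score >= threshold and abs(position[src] - position[tgt]) <= window:
                pairs.append((src, tgt))
    return pairs


# Toy HH-like example: ligand in the esophageal DE, response only in nearby gut tube SM.
position = {"DE_eso": 1, "SM_eso": 1, "SM_sto": 2, "DE_liver": 4, "SM_liver": 4}
ligand = {"DE_eso": 1.2, "DE_liver": -1.0}
response = {"SM_eso": 1.5, "SM_sto": 0.8, "SM_liver": -1.3}
print(predict_pairs(position, ligand, response))
# [('DE_eso', 'SM_eso'), ('DE_eso', 'SM_sto')]
```

The distance window is what encodes the short-range assumption; widening it would admit pairings between clusters that are unlikely to be in contact.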

Fig. 5: Computationally inferred receptor–ligand interactions predict a signaling roadmap of foregut organogenesis.

a, b E9.5 foregut immunostaining of Cdh1 (epithelium) and Foxf1 (mesenchyme) in a whole-mount (a; same image as Fig. 1d) and a section (b), showing the epithelial–mesenchymal tissue microenvironment (dashed circle). Scale bars: 100 μm. n = 4/4 embryos. c Predicted receptor–ligand interactions between adjacent foregut cell populations. The schematics show paracrine signaling between the DE (yellow cells) and the SM (brown cells) for six major pathways. E9.5 DE and SM cell clusters are ordered along the anterior-to-posterior axis based on their locations in vivo, with spatially adjacent DE and SM cell types across from one another. The color and size of each node represent the normalized average signaling-response–metagene expression level (scaled from 2 to −2) and the % of expressing cells in each cluster, respectively, predicting the likelihood that a given cell population is responding to the growth factor signal. Thin vertical lines next to clusters indicate different cell populations in spatial proximity that are all responding to the same signaling pathway. Arrows represent the predicted paracrine and autocrine receptor–ligand interactions (see Methods). d BMP-response–metagene expression levels projected onto the DE and SM SPRING plots, colored by normalized scaled expression in each cell. e In situ hybridization of Bmp4 in a foregut transverse section, showing expression in the respiratory mesenchyme and the stm. n = 8/8 embryos. Scale bar: 100 μm. f, g pSmad1 immunostaining in foregut transverse sections, indicating BMP signal response in the respiratory and liver DE and SM. n = 3/3 embryos. Scale bar: 100 μm. h, i Signaling roadmap summarizing the inferred signaling state of all six pathways projected onto the DE (h) and SM (i) cell-state trees, suggesting the combinatorial signals predicted to control lineage diversification. Letters indicate the putative signals at each step, with larger font indicating a stronger signaling response.
a, anterior; p, posterior; hp, hepatopancreatic; stm, septum transversum mesenchyme.

Overall, the computational predictions are consistent with known expression patterns of ligands and receptors and identified most known signaling interactions controlling DE lineage specification. These include mesoderm-derived BMP, FGF, and Wnt promoting DE liver and lung fate, and autocrine Notch signaling in the DE endocrine pancreas1,2,33,34. This suggested that previously undefined SM signaling predictions are also likely to be accurate. To test this, we examined BMP signaling as an example. Consistent with the scRNA-seq data, in situ hybridization confirmed high levels of Bmp4 ligand expression in the stm and the respiratory mesenchyme, while immunostaining for phospho-Smad1/5/8, the cellular effector of BMP signaling, confirmed, as predicted, autocrine signaling in the mesenchyme and paracrine signaling in the epithelium of the developing liver and respiratory regions (Fig. 5e–g).

Projecting the signaling-response–metagene expression levels onto the SPRING plots and cell-state trees revealed spatiotemporally dynamic signaling domains that correlated with cell lineages (Fig. 5d, Supplementary Fig. 10). In general, the transcriptome data predict locally restricted interactions, with the SM being the primary source of BMP, FGF, RA, and Wnt ligands, signaling both to the adjacent DE and within the SM itself (Fig. 5c). In contrast, HH ligands are produced by the DE and signal to the gut tube SM, with no evidence of autocrine activity in the DE (Fig. 5c). Combining the data for all six signaling pathways on the cell-state trees, we generated a comprehensive roadmap of the combinatorial signals predicted to coordinate the temporal and spatial development of each DE and SM lineage (Fig. 5h, i). This analysis predicts a number of previously unappreciated signaling interactions and represents a hypothesis-generating resource for further experimental validation.
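Summarizing the six pathways on the cell-state trees amounts to labeling each tree node with the pathways it is predicted to respond to. A minimal sketch, assuming hypothetical one-letter pathway codes and two hypothetical cutoffs on the response-metagene score, with upper case standing in for the larger font that marks a stronger response in Fig. 5h, i; the node names and scores in the example are toy values, not measurements.

```python
# Hypothetical one-letter codes for the six pathways analyzed.
CODES = {"BMP": "b", "FGF": "f", "HH": "h", "Notch": "n", "RA": "r", "Wnt": "w"}


def roadmap_labels(response, weak=0.5, strong=1.0):
    """Map {node: {pathway: response score}} to {node: letters}; upper case = strong response."""
    labels = {}
    for node, scores in response.items():
        letters = []
        for pathway, code in CODES.items():
            score = scores.get(pathway, float("-inf"))  # a missing pathway counts as no response
            if score >= strong:
                letters.append(code.upper())
            elif score >= weak:
                letters.append(code)
        labels[node] = "".join(letters)
    return labels


toy = {
    "respiratory_SM": {"BMP": 1.4, "FGF": 0.7, "HH": 1.1, "Wnt": 0.2},
    "stm": {"BMP": 1.5, "HH": -1.2, "RA": 0.6},
}
print(roadmap_labels(toy))  # {'respiratory_SM': 'BfH', 'stm': 'Br'}
```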

Epithelial hedgehog signaling patterns foregut mesenchyme

To genetically test the predictive value of the signaling roadmap, we focused on HH activity, which the scRNA-seq data suggested is high in the gut tube SM (esophagus, respiratory, stomach, and duodenum) but low in the pharyngeal and liver SM (Fig. 6a–c). HH ligands stimulate the activation of the Gli2 and Gli3 TFs, which in turn promote the transcription of HH target genes (e.g., Gli1)35. As expected, mouse embryo sections confirmed that Shh ligand was expressed in the gut tube DE, with high levels of Gli1-LacZ expression in the adjacent SM. By contrast, the hepatic endoderm did not express Shh36, and the hepatic SM had very few, if any, Gli1-LacZ-positive cells (Fig. 6d). To define the function of HH in SM patterning, we performed bulk RNA-seq on foreguts from Gli2−/−;Gli3−/− double mutant embryos, which lack all HH activity and fail to specify respiratory fate34. Comparing homozygous mutants to heterozygous littermates, we identified 156 HH/Gli-regulated transcripts (Fig. 6e; Supplementary Data 3). Because the bulk RNA-seq was performed on tissue containing both endoderm and mesoderm, we examined the enrichment of these HH-regulated transcripts in the transcriptomes of DE and SM single-cell clusters. This revealed that most of the transcripts were expressed in the SM rather than the DE. Importantly, transcripts downregulated in Gli2/3 mutants (n = 80) were normally enriched in the gut tube SM, whereas upregulated transcripts (n = 76) were normally enriched in the liver or pharyngeal SM (Fig. 6e–g). Interestingly, HH/Gli-regulated transcripts, including downregulated TFs (Osr1, Tbx4/5, and Foxf1/2) and upregulated TFs (Tbx18, Lhx2, and Wt1), have been implicated in respiratory and hepatic development, respectively (Fig. 6e; Supplementary Data 3)37. This genetic analysis confirmed the predictive value of the signaling roadmap, in which differential HH activity promotes gut tube versus liver and pharyngeal SM (Fig. 5i), in part by regulating other lineage-specifying TFs and signaling proteins.
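The cutoffs reported for the bulk RNA-seq comparison (log2 FC > 1, FDR < 5%; Fig. 6e) can be applied as sketched below. The sketch assumes that per-gene p-values and log2 fold changes (mutant versus heterozygous control) are already available from a differential expression tool, that the FDR is the Benjamini–Hochberg adjusted p-value, and that the fold-change cutoff applies in both directions so that downregulated and upregulated genes are both retained. The helper names and toy values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Step-up: each adjusted value is the minimum over itself and all larger p-values.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def call_de(log2fc, pvals, lfc_cut=1.0, fdr_cut=0.05):
    """Indices of genes down- and upregulated in mutants beyond both cutoffs."""
    log2fc = np.asarray(log2fc, dtype=float)
    significant = benjamini_hochberg(pvals) < fdr_cut
    down = np.flatnonzero(significant & (log2fc < -lfc_cut))
    up = np.flatnonzero(significant & (log2fc > lfc_cut))
    return down, up


down, up = call_de(log2fc=[-2.0, 1.5, 3.0, -0.5], pvals=[0.001, 0.002, 0.5, 0.001])
print(down, up)  # [0] [1]
```

In the toy call, gene 2 has a large fold change but is not significant, and gene 3 is significant but changes too little, so neither is retained.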

Fig. 6: Genetic test of the signaling roadmap revealed that HH promotes gut tube versus liver mesenchyme.

a, b SPRING visualization of HH ligand–metagene expression in DE cells (a) and HH-response–metagene expression in SM cells (b). The color scale shows the normalized expression in each cell. c The normalized HH-response–metagene expression projected onto the SM cell-state tree, showing low HH activity in the liver and pharynx SM but high activity in the gut tube mesenchyme. d Shh is expressed in the gut tube epithelium but not in the hepatic epithelium (outlined). n = 5/5 embryos. Gli1-lacZ, a HH-response transgene, is active in the gut tube mesenchyme but not in the liver stm. n = 3/3 embryos. Scale bars: 100 μm. e Differentially expressed genes between Gli2−/−;Gli3−/− and Gli2+/−;Gli3+/− mouse E9.5 foreguts identified by bulk RNA sequencing (log2 FC > 1, FDR < 5%). n = 3 embryos/genotype. f Heatmap showing the min–max row-normalized average expression of HH/Gli-regulated genes (from e) in E9.5 DE and SM single-cell clusters. g Gene set enrichment analysis (GSEA) reveals specific cell-type enrichment of HH/Gli-regulated genes. h Schematic of HH activity in the foregut.
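The cell-type enrichment in panel g can be understood through the running-sum enrichment score at the core of GSEA. The sketch below computes only the weighted enrichment score for one ranked gene list (for example, genes ranked by fold change between one single-cell cluster and all others, an assumed ranking statistic); it omits the permutation test and normalization that GSEA uses to assess significance. `enrichment_score` is a hypothetical name, and the genes and scores in the example are toy values.

```python
import numpy as np


def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """GSEA-style running-sum enrichment score.

    ranked_genes: genes sorted by decreasing ranking statistic; scores: the matching statistic;
    gene_set: genes of interest (e.g., transcripts downregulated in Gli2/3 mutants).
    """
    scores = np.asarray(scores, dtype=float)
    hits = np.array([g in gene_set for g in ranked_genes])
    n_hits = int(hits.sum())
    if n_hits == 0 or n_hits == len(ranked_genes):
        raise ValueError("gene set must be a proper, non-empty subset of the ranked list")
    weights = np.abs(scores) ** p
    step_hit = np.where(hits, weights / weights[hits].sum(), 0.0)        # steps up at set members
    step_miss = np.where(~hits, 1.0 / (len(ranked_genes) - n_hits), 0.0)  # steps down elsewhere
    running = np.cumsum(step_hit - step_miss)
    return float(running[np.argmax(np.abs(running))])  # maximum deviation from zero


ranked = ["g1", "g2", "g3", "g4"]
scores = [4.0, 3.0, 2.0, 1.0]
print(enrichment_score(ranked, scores, {"g1", "g2"}))  # positive: set concentrated at the top
print(enrichment_score(ranked, scores, {"g3", "g4"}))  # negative: set concentrated at the bottom
```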

Our data, together with previous work, suggest a model in which a reciprocal epithelial–mesenchymal signaling network coordinates DE and SM lineages during organogenesis. In this model, SM-derived RA induces regionally restricted expression of Shh in the DE by E9.034, which then signals back to the SM, establishing broad pharynx, gut tube, and liver domains. Other SM ligands (BMP, FGF, Notch, RA, and Wnt), with distinct regional expression in these three broad domains, then progressively subdivide DE and SM progenitors in a coordinated manner. In the future, it will be important to test this model with cell-specific genetic manipulations.

Differentiation of splanchnic mesenchyme-like lineages from human PSCs

We next tested whether our SM markers and signaling roadmap could be used to direct the differentiation of distinct SM subtypes from human pluripotent stem cells (hPSCs), which to date have been elusive. Previous studies have established protocols to differentiate hPSCs into lateral plate mesoderm (lpm) and cardiac tissue38. Although both the SM and heart are derived from the lpm, the single-cell data suggested that, in the mouse, the early SM experiences more RA signaling than the early cardiac mesoderm. This was confirmed by expression of the RA-responsive RARE:lacZ transgene in E8.5 embryos (Supplementary Fig. 11a, b). Accordingly, addition of RA to the lpm differentiation medium on days (d) 2–4 downregulated the cardiac markers NKX2-5, ISL1, and TBX20 and promoted expression of the SM markers FOXF1, HOXA1, HOXA5, and WNT2 (Fig. 7b; Supplementary Fig. 11c, d). This is consistent with the mouse scRNA-seq data, which show that E8.5 SM expresses Nkx2-5, Isl1, and Tbx20 at lower levels than the cardiac mesoderm. Examination of PAX3, PRRX1, and CD31 confirmed that the d4 SM cultures did not express significant levels of endothelial, somatic, or limb mesenchyme markers (Supplementary Fig. 11c).

Fig. 7: Generation of splanchnic mesoderm-like progenitors from human PSCs.

a Schematic of the protocol to differentiate hPSCs into SM subtypes. Factors in red indicate signals predicted from the mouse single-cell signaling roadmap. b RT-PCR of markers with enriched expression in specific SM subtypes based on the mouse single-cell data: cardiac (NKX2-5), early SM (FOXF1, HOXA1), liver stm/mesothelium (WT1, UPK1B), liver fibroblast (MSX1), respiratory SM (NKX6-1+, MSC−), and esophageal/gastric SM (MSC, BARX1). The histogram shows the means ± S.D. Statistical significance was calculated using a two-sided Tukey's multiple comparisons test. *p < 0.05, **p < 0.005, ***p < 0.0005. Exact p-values are provided in the Source data file. n = 3 independent biological samples. Similar results were obtained from five independent experiments. c Representative immunostaining images of day 7 cell cultures. Similar results were obtained from three independent experiments. Scale bars: 50 μm (upper panels), 10 μm (lower panels). d Quantification of the percentage of cells positive for the indicated immunostaining or RNA-scope in situ hybridization. Histograms show the means ± S.D. n = 3 independent fields. Immunostaining quantification results were similar in two separate experiments, and RNA-scope validation was performed once. Statistical significance was calculated using a two-sided Dunnett's multiple comparisons test. *p < 0.05, **p < 0.005, ***p < 0.0005. Exact p-values are provided in the Source data file. ns, not significant; nt, not tested. Source data are provided as a Source data file.

We next treated the primitive SM with different combinations of HH, RA, Wnt, and BMP agonists or antagonists from d4 to d7 (Fig. 7a) to drive organ-specific SM-like lineages based on the roadmap. As predicted, the HH agonist promoted gut tube identity and efficiently blocked the hepatic fate. In the HH-treated cultures, addition of RA and BMP4 (RA/BMP4) followed by WNT on d6–7 promoted gene expression consistent with respiratory mesenchyme (NKX6-1, TBX5, and WNT2), with low levels of esophageal, gastric, or hepatic markers. In contrast, addition of RA and a BMP4 antagonist on d6–7 promoted an esophageal/gastric-like identity (MSC, BARX1, WNT4, and NKX3-2) (Fig. 7b, c and Supplementary Fig. 12d). In the absence of HH agonist, cells treated with RA/BMP4 had a gene expression profile similar to liver stm and mesothelium (WT1, TBX18, LHX2, and UPK1B), whereas RA/BMP4/WNT-treated cells expressed liver-fibroblast markers (MSX1/2 and HAND1). Immunostaining and RNA-scope confirmed the RT-PCR analysis (Fig. 7c, d and Supplementary Fig. 12a–c), showing that ~70–80% of cells in the liver stm/mesothelium-like cultures were WT1+, MSX1−, NKX6-1−, whereas the other populations appeared to be ~30–40% positive for their respective markers. The remaining cells appeared to be undifferentiated rather than of an alternative lineage. While future work is needed to optimize the culture conditions for each lineage, these data provide proof of principle that the signaling roadmap inferred from the mouse scRNA-seq data can be used to direct the differentiation of different organ-specific SM subtypes from hPSCs.

Discussion

We have used single-cell transcriptomics to define the complexity of DE and SM cell types in the embryonic mouse foregut over the first 24 h of organogenesis as the primitive gut tube is subdivided into distinct organ domains. Our analysis identifies a diversity of distinct cell types in the early foregut mesenchyme, defined by marker genes and a combinatorial code of transcription factors. Cell trajectories indicate that the development of organ-specific DE and SM is closely coordinated, suggesting a tightly regulated signaling network. We computationally predicted a putative ligand-receptor signaling network of the reciprocal epithelial–mesenchymal interactions that are likely to coordinate lineage specification of the two tissue compartments. This study represents a valuable resource for further experimental examination of foregut organogenesis and the data can be explored via the interactive website https://research.cchmc.org/ZornLab-singlecell.

Prior studies of SM regional identity in the early embryo have been limited. Apart from the well-known regionalization of Hox gene expression, most studies have focused on individual organs, such as the gastric or pulmonary mesenchyme9,39,40. By comparing single-cell transcriptomes across the entire foregut, we show extensive regionalization of the early SM into distinct organ-specific mesenchyme subtypes. It is possible that the divergent transcriptional signatures of early SM cell types are used only transiently to define position and molecular programs during fetal organogenesis, and that after organ fate is determined, different SM cell types converge on similar differentiation programs, such as smooth muscle or fibroblasts, which are common to all visceral organs. However, our findings on fetal SM diversification are interesting in light of the emerging idea of organ-specific stromal cells in adults, such as hepatic versus pancreatic stellate cells and pulmonary-specific fibroblasts41,42. For example, Tbx4 is expressed in embryonic respiratory SM and is later specifically maintained in adult pulmonary fibroblasts but not in fibroblasts of other organs42. Future integrated analyses of our data with other scRNA-seq datasets from later fetal and adult organs43,44 should help resolve how transcriptional programs evolve during cellular differentiation, homeostasis, and pathogenesis.

Interestingly, the liver bud contained more distinct SM cell states than any other organ primordium, including the septum transversum mesenchyme (stm), sinus venosus, two mesothelial populations, and a fibroblast population. This may be because, unlike other GI organs that form by epithelial evagination, the hepatic endoderm delaminates and invades the adjacent stm, a process that may require more complex epithelial–mesenchymal interactions with the extracellular matrix. Our transcriptome analysis is consistent with lineage tracing experiments showing that the early stm gives rise to the mesothelium, hepatic stellate cells, stromal fibroblasts, and perivascular smooth muscle45. It will be important to determine whether other organ buds show a similar elaboration of cell types as they differentiate. Alternatively, mesothelium and fibroblasts that originate in the liver may migrate to other organ buds. Indeed, mesenchymal cell movement is one confounding limitation of our study, and there is good evidence that the mesothelium of the liver bud, also known as the proepicardium, migrates to surround the heart and lungs46,47. Going forward, it will be important to use technologies that couple lineage tracing with single-cell transcriptomics and live imaging48 to resolve these questions.

The foregut SM and the cardiac mesoderm are closely related, both arising from the anterior lateral plate mesoderm10,11. A preliminary cross-comparison of our data with recent single-cell RNA-seq studies of the early heart suggests that this common origin is reflected in their transcriptomes12. The developing heart tube is contiguous with the ventral foregut SM (also known as the second heart field, SHF), with the arterial pole attached to the pharyngeal SM and the venous pole attached to the lung/liver SM. Fate mapping studies indicate that the SHF gives rise to heart tissue as well as pharyngeal SM, respiratory SM, and pulmonary vasculature10,49. Indeed, our single-cell transcriptomics and genetic analysis of Gli mutants are consistent with studies indicating that epithelium-derived HH signals are critical for the development of these cardiopulmonary progenitors49,50. How the SHF is subsequently segregated into different cardiac and SM lineages is unclear but could be addressed by integrating our data with other cardiac-centric studies.

One important outcome of our study was the use of the signaling roadmap inferred from the single-cell transcriptomics to direct the differentiation of hPSCs into different SM-like cell types, which to date have been elusive. While more work is needed to optimize the purity of the cell populations and to determine the differentiation potential of each, this system provides a unique opportunity to model human fetal mesenchyme development and to interrogate how combinatorial signaling pathways direct parallel mesenchymal fate choices. These hPSC-derived SM-like tissues may also have important applications in tissue engineering. To date, most hPSC-derived foregut organoids (gastric, esophageal, and pulmonary) tend to lack mesenchyme, unlike hindgut-derived intestinal organoids. This is because the differentiation protocols needed to make foregut epithelium are not compatible with mesenchymal development. The protocols we have defined here should enable the recombination of DE and SM organoids, an important step toward engineering complex foregut tissue for regenerative medicine. Looking forward, our approach of computationally inferring signaling interactions, coupled with manual curation and a deep understanding of tissue anatomy, could be used to predict cell–cell interactions in other organ systems or those that drive pathological states such as the cancer niche.

Methods

Mouse embryo collection and single-cell dissociation

All mouse experiments were performed in accordance with protocols approved by the Cincinnati Children's Hospital Medical Center Institutional Animal Care and Use Committee (IACUC). Mice were housed at 72 °F, 30–70% humidity, and a 14/10 h light/dark cycle. No statistical methods were used to estimate sample size before the experiments; sufficient embryos were used to generate the material needed for the experiments. No randomization was used, as no particular treatment was applied to different groups. Gli2+/−;Gli3+/− mice34 were maintained on a mixed genetic background, and embryos were harvested at embryonic day 9.5. Embryo sex was not determined. For the single-cell analysis, timed matings were set up between C57BL/6J mice, and the day a plug was detected was considered embryonic day 0.5. Staging was validated by counting somites (s): E8.5 (5–10 s), E9.0 (12–15 s), and E9.5 (25–30 s) (Fig. 1a, b). Embryo images were collected on a Nikon confocal microscope (https://www.microscope.healthcare.nikon.com/products/confocal-microscopes) or a Leica MZ10F with Leica Application Suite software version 4.9. We microdissected the foregut between the posterior pharynx and the midgut, removing most of the heart and paraxial tissue and excluding the thyroid. At E9.5, we isolated anterior and posterior regions separately, containing the lung/esophagus and liver/pancreas primordia, respectively. We pooled dissected foregut tissue from 16, 20, 18, and 15 embryos for the E8.5, E9.0, E9.5 anterior, and E9.5 posterior samples, respectively, isolated from 2 to 3 litters.

Single-cell dissociation was performed using a cold active protease protocol as described previously51. Rapidly dissected C57BL/6J mouse embryo tissues were transferred to ice-cold PBS containing 5 mM CaCl2, 10 mg/ml Bacillus licheniformis protease (Sigma), and 125 U/ml DNase (Qiagen) and incubated on ice with mixing by pipetting. After 7 min, single-cell dissociation was confirmed by microscopy. Cells were then transferred to a 15 ml conical tube, and 3 ml ice-cold PBS with 10% FBS (PBS/FBS) was added. Cells were pelleted (1200 × g for 5 min) and resuspended in 2 ml PBS/FBS. Cells were washed three times in 5 ml PBS/0.01% BSA (PBS/BSA) and resuspended at a final concentration of 100,000 cells/ml for scRNA-seq. Single-cell suspensions of each stage were loaded onto the Chromium Single Cell Controller instrument (10x Genomics) to generate single-cell gel beads in emulsion. Single-cell RNA-seq libraries for high-throughput sequencing were prepared using the Chromium Single Cell 5′ Library and Gel Bead Kit (10x Genomics). All samples were multiplexed together and sequenced on an Illumina HiSeq 2500. The technician was blinded during RNA extraction, library preparation, and sequencing.

Immunofluorescence staining and in situ hybridization

Mouse embryos were harvested at the indicated stages and fixed in 4% paraformaldehyde (PFA) overnight at 4 °C. The fixed samples were washed three times with PBS for 10 min each. Either whole-mount tissues or sectioned tissues were used for subsequent procedures. To obtain tissue sections, fixed embryos were immersed in 30% sucrose/PBS overnight, embedded in OCT, and cryosectioned (12 μm) onto Superfrost™ Plus slides. For hPSC cultures, cells were differentiated on Geltrex-coated μ-Slide 8 Well slides (80826, ibidi) and fixed in 4% PFA at room temperature for 30 min. Cells were dehydrated through an ethanol gradient and stored in 100% ethanol at −20 °C.

For immunofluorescence staining, samples were permeabilized with 0.5% Triton X-100/PBS and then blocked with 5% normal donkey serum. Tissues were incubated with primary antibodies, including Cdh1 (1:1000), Foxa2 (1:500), Foxf1 (1:500), Nkx2-1 (1:200), Nkx6-1 (1:100), Wt1 (1:300), Nkx2-5 (1:200), Sox2 (1:200), pSmad1/5/8 (1:500), and β-gal (1:1000) (details in Supplementary Table 1). The next day, samples were washed with PBS and then incubated with fluorescently conjugated secondary antibodies for visualization using Nikon confocal fluorescence microscopes.

For standard in situ hybridization, tissues were treated with Proteinase K and incubated with various digoxigenin (DIG)-tagged probes complementary to different mRNAs. Samples were then incubated with an anti-DIG antibody conjugated to alkaline phosphatase. BM Purple was used for the chromogenic reaction. RNA-scope fluorescent in situ hybridization was conducted with RNA-scope Multiplex Fluorescent Detection Reagents V2 (323110, Advanced Cell Diagnostics, Inc.) and Opal fluorophores (Akoya Biosciences) according to the manufacturer's instructions. Detailed procedures are listed in Supplementary Table 2.

Pre-processing 10x Genomics raw scRNA-seq data

Raw scRNA-seq data were processed using CellRanger (v2.0.0, 10x Genomics) (https://github.com/10XGenomics/cellranger). Reads were aligned to the mouse genome [mm10] to produce counts of genes across barcodes. Barcodes with fewer than ~5k UMI counts were excluded from downstream analysis. Approximately 70% of reads mapped to the transcriptome in each sample. The resulting data comprised 9748 cells in the E8.5, 9265 cells in the E9.0, 7208 cells in the E9.5 anterior, and 5085 cells in the E9.5 posterior samples.

Quality control and cell clustering

Quality control (QC) and clustering were performed using the Seurat [v2.3.4] package in R52,53. Basic filtering retained genes expressed in ≥3 cells and cells with at least 100 detected genes. QC was based on the nGene and percent.mito parameters to remove putative multiplets and cells with high mitochondrial gene expression. After filtering, 9748, 9265, and 12,255 cells were retained in the E8.5, E9.0, and E9.5 samples, respectively. Global scaling was used to normalize counts across all cells in each sample [scale factor: 10,000], and the cell-cycle effect was removed by regressing out the difference between the S-phase and G2M-phase scores from the normalized data using default parameters. We first clustered each developmental stage separately to identify major cell lineages. Approximately 1500 highly variable genes (HVGs) in each population were selected by identifying outliers on a dispersion versus average expression (avgExp) plot. PCA was performed using the HVGs, and the first 20 principal components were used for cell clustering, which was then visualized using t-distributed stochastic neighbor embedding (tSNE). Marker genes defining each cluster were identified using the 'FindAllMarkers' function (Wilcoxon rank-sum test) in Seurat and used to annotate clusters based on well-known cell-type-specific genes. Data wrangling was performed using perl [v5.18], R [v3.4.4], and python [v2.7].
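The normalization and cell-cycle regression described above can be illustrated with a short NumPy sketch. It assumes a dense genes × cells count matrix and per-cell S and G2M phase scores that have already been computed (the toy values below are hypothetical). It follows the idea of global-scaling log-normalization with a scale factor of 10,000 and of regressing a covariate out of each gene by linear regression, but it is not Seurat's code: Seurat also scales the residuals and works on sparse matrices, and the helper names here are illustrative.

```python
import numpy as np


def log_normalize(counts: np.ndarray, scale_factor: float = 10_000.0) -> np.ndarray:
    """Global-scaling normalization of a genes x cells count matrix.

    Each cell's counts are divided by that cell's total count, multiplied by
    scale_factor, and natural-log transformed with log1p.
    """
    totals = counts.sum(axis=0, keepdims=True)  # total counts per cell (1 x cells)
    return np.log1p(counts / totals * scale_factor)


def regress_out(expr: np.ndarray, covariate: np.ndarray) -> np.ndarray:
    """Remove a per-cell covariate from every gene by ordinary least squares.

    expr: genes x cells normalized expression.
    covariate: length-cells vector, here S score minus G2M score (assumption).
    Returns residuals of each gene's fit on [intercept, covariate].
    """
    design = np.column_stack([np.ones_like(covariate), covariate])  # cells x 2
    # Solve design @ beta = expr.T for all genes at once (beta is 2 x genes).
    beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    return expr - (design @ beta).T


# Toy example: 3 genes x 4 cells and hypothetical cell-cycle scores.
counts = np.array([[10, 0, 5, 3],
                   [0, 8, 5, 1],
                   [90, 92, 90, 96]], dtype=float)
s_score = np.array([0.2, -0.1, 0.4, 0.0])
g2m_score = np.array([0.1, 0.3, -0.2, 0.0])

norm = log_normalize(counts)
corrected = regress_out(norm, s_score - g2m_score)
print(corrected.shape)  # (3, 4)
```

After regression, each gene's residuals have zero mean and are uncorrelated with the S − G2M covariate, which is what removing that effect means in a linear model.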

Cells from all three time points were integrated with Seurat (v3.0) using diagonalized canonical correlation analysis (CCA) to reduce the dimensionality of the datasets, followed by L2-normalization of the canonical correlation vectors (CCVs). Mutual nearest neighbors (MNNs), also referred to as integration anchors (cell pairs), were then identified and used to integrate the cells. The first 30 canonical correlation components (CCs) were used for clustering, and non-linear dimensionality reduction approaches (UMAP and tSNE) were used to visualize cells in two dimensions.
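To make the anchor idea concrete, the sketch below follows the published outline of Seurat v3 integration in simplified form: a diagonalized CCA computed as a truncated SVD of the cross-product of two standardized datasets, L2-normalization of each cell's canonical vector, and mutual nearest neighbors as anchors. It assumes both datasets share the same genes (rows) and omits anchor filtering, anchor scoring, and the subsequent correction step; the function names, random toy data, and choice of k are illustrative rather than Seurat's API.

```python
import numpy as np


def standardize(x: np.ndarray) -> np.ndarray:
    """Center and scale each gene (row) of a genes x cells matrix."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid dividing by zero for constant genes
    return (x - mu) / sd


def diagonal_cca(x: np.ndarray, y: np.ndarray, n_cc: int):
    """Canonical correlation vectors when within-dataset covariances are
    treated as diagonal: the truncated SVD of X^T Y (cells_x x cells_y)."""
    u, _, vt = np.linalg.svd(standardize(x).T @ standardize(y), full_matrices=False)
    return u[:, :n_cc], vt[:n_cc].T


def l2_rows(m: np.ndarray) -> np.ndarray:
    """L2-normalize each cell's canonical correlation vector."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def mutual_nearest_neighbors(a: np.ndarray, b: np.ndarray, k: int):
    """(i, j) pairs where b[j] is among a[i]'s k nearest cells in b and
    a[i] is among b[j]'s k nearest cells in a (Euclidean distance)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    nn_ab = np.argsort(d, axis=1)[:, :k]    # neighbors in b of each cell in a
    nn_ba = np.argsort(d.T, axis=1)[:, :k]  # neighbors in a of each cell in b
    return [(i, j) for i in range(a.shape[0]) for j in nn_ab[i] if i in nn_ba[j]]


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 20))  # hypothetical: 50 genes x 20 cells, dataset 1
y = rng.normal(size=(50, 15))  # same 50 genes x 15 cells, dataset 2
u, v = diagonal_cca(x, y, n_cc=5)
anchors = mutual_nearest_neighbors(l2_rows(u), l2_rows(v), k=3)
```

The L2-normalization puts cells from both datasets on the unit sphere, so the neighbor search compares the direction of their canonical vectors rather than their magnitude.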

In silico selection and clustering of DE and SM lineages

Definitive endoderm (DE) clusters (4448 cells) were defined by co-expression of Foxa1/2, Cdh1, and/or Epcam, whereas splanchnic mesoderm (SM) clusters (10,097 cells) were defined by co-expression of Foxf1, Vim, and/or Pdgfra and the absence of cardiac, somatic, and paraxial mesoderm-specific transcripts. Cells from DE and SM clusters were extracted from each time point and re-clustered using Seurat [v2.3.4] to define lineage subtypes. Prior to re-clustering, blood, mitochondrial, ribosomal, and strain-dependent noncoding RNA genes were regressed out of the data. Dimensionality reduction, clustering, and marker prediction were performed for each stage as described above. DE and SM cell subtypes were annotated by manual curation, comparing the cluster marker genes with over 300 published expression profiles in the MGI database20 and our own gene expression validations. DE and SM clusters from all three time points were then integrated separately using the Seurat (v3.0) approach described above.

Transcription factor code for DE and SM lineages

To identify TFs with expression enriched in specific DE and SM cell types, the 'FindAllMarkers' function in Seurat [v3.0] was applied to a set of 1623 TFs encoded in the mouse genome [AnimalTFDB]54. Raw counts of TFs were normalized and scaled in Seurat [v3.0]. Cells in each cluster served as replicates for finding marker TFs for each lineage, using the Wilcoxon rank-sum test. The top five marker TFs were then visualized using the DimHeatmap function in Seurat (v3.0).
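The marker-TF step can be illustrated with a one-versus-rest Wilcoxon rank-sum test in which individual cells act as replicates. The sketch below is a NumPy-only illustration, not Seurat's FindAllMarkers: it ranks TFs by the area under the curve derived from the U statistic and reports a normal-approximation p-value without tie correction, and it omits Seurat's log fold-change and detection-rate filters and multiple-testing correction. The function names, TF labels, and toy expression matrix are hypothetical.

```python
import math

import numpy as np


def average_ranks(values: np.ndarray) -> np.ndarray:
    """1-based ranks, with tied values assigned their average rank."""
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def rank_sum_test(in_cluster: np.ndarray, rest: np.ndarray):
    """Wilcoxon rank-sum (Mann-Whitney U) of in_cluster versus rest.

    Returns (auc, two_sided_p); p uses the normal approximation, no tie correction.
    """
    n1, n2 = len(in_cluster), len(rest)
    ranks = average_ranks(np.concatenate([in_cluster, rest]))
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - n1 * n2 / 2.0) / sd_u
    return u1 / (n1 * n2), math.erfc(abs(z) / math.sqrt(2.0))


def top_marker_tfs(expr, tf_names, labels, cluster, top_n=5):
    """Rank TFs (rows of expr, TFs x cells) by enrichment in `cluster`
    versus all other cells; each cell is one replicate."""
    mask = labels == cluster
    scores = [(name, *rank_sum_test(row[mask], row[~mask]))
              for name, row in zip(tf_names, expr)]
    scores.sort(key=lambda s: (-s[1], s[2]))  # highest AUC first, then smallest p
    return scores[:top_n]


labels = np.array(["lung"] * 4 + ["liver"] * 4)
expr = np.array([[5, 6, 7, 8, 0, 0, 1, 0],   # enriched in lung
                 [0, 1, 0, 0, 4, 5, 6, 7],   # enriched in liver
                 [2, 2, 2, 2, 2, 2, 2, 2]],  # uninformative
                dtype=float)
top = top_marker_tfs(expr, ["Nkx2-1", "Hhex", "Ctrl"], labels, "lung", top_n=2)
```

An AUC of 1.0 means every cell in the cluster expresses the TF more highly than every other cell; 0.5 means no separation.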

Pseudotime analysis of cell population spatial organization

To examine whether pseudotime analysis could inform the spatial organization of cells within the continuous sheets of DE or SM tissue, we performed pseudotime analysis using URD [v1.0]22. First, transition probabilities were calculated for DE and SM cells at each stage using diffusion maps: the calcDM function was used to generate diffusion map components, and the first eight components were used to calculate transition probabilities among cells. To calculate pseudotime, root cells were fixed to the most anterior clusters based on manual annotation. Starting from the root cells, a probabilistic breadth-first graph search using the transition probabilities was performed until all cells in the graph had been visited. Multiple simulations were run, and pseudotime was defined as the average iteration at which each cell was visited from the root cells. The floodPseudotime and floodPseudotimeProcess functions in URD were used to calculate the pseudotime density distribution, which was plotted for each cluster/cell type using the plotDists function. The pseudotime density distributions ordered clusters similarly to the manually curated order of cell types along the A-P axis.
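The flood-style pseudotime described above can be sketched as follows. This is a simplified, hypothetical re-implementation rather than URD's floodPseudotime: it takes an already computed cell-to-cell transition matrix as input (in URD this comes from the diffusion map) and assumes that, in each iteration, an unvisited cell is visited with probability equal to the summed transition probability from all visited cells, capped at 1. Pseudotime is the mean first-visit iteration across simulations, rescaled to [0, 1].

```python
import numpy as np


def flood_once(transitions: np.ndarray, roots, rng, max_iter: int = 10_000) -> np.ndarray:
    """One probabilistic breadth-first 'flood' from the root cells.

    transitions[i, j] is the transition probability from cell i to cell j.
    Returns the iteration at which each cell was first visited (roots = 0);
    cells that are never reached stay NaN.
    """
    n = transitions.shape[0]
    visit_iter = np.full(n, np.nan)
    visited = np.zeros(n, dtype=bool)
    visited[list(roots)] = True
    visit_iter[visited] = 0
    for iteration in range(1, max_iter + 1):
        if visited.all():
            break
        # Assumed rule: summed probability of reaching each cell from the visited set.
        p_visit = np.minimum(transitions[visited].sum(axis=0), 1.0)
        newly = ~visited & (rng.random(n) < p_visit)
        visited |= newly
        visit_iter[newly] = iteration
    return visit_iter


def flood_pseudotime(transitions, roots, n_sims: int = 100, seed: int = 0) -> np.ndarray:
    """Mean first-visit iteration over n_sims floods, rescaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    runs = np.array([flood_once(transitions, roots, rng) for _ in range(n_sims)])
    mean_iter = runs.mean(axis=0)
    return mean_iter / np.nanmax(mean_iter)


# Hypothetical 5-cell chain (anterior root at cell 0) with certain forward transitions.
chain = np.zeros((5, 5))
for i in range(4):
    chain[i, i + 1] = 1.0
pseudotime = flood_pseudotime(chain, roots=[0], n_sims=10)
```

With certain transitions the chain is visited one cell per iteration, so pseudotime increases linearly away from the root; real transition matrices make the visit times stochastic, which is why many simulations are averaged.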

SPRING analysis of cell trajectories

To examine cell trajectories across the three time points, we implemented SPRING [v1.0]27, which uses a k-nearest neighbors (KNN) graph (five nearest neighbors) to obtain a force-directed layout of cells and their neighbors. To capture transcriptional changes across cell states (lineages), the first 40 principal components (PCs) were learned from the latest time point (E9.5), and this PC space was used to transform the entire dataset (E8.5, E9.0, and E9.5). The transformed data were used to generate a distance matrix, which was then used to obtain the KNN graph using default parameters.
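The projection step, in which PCs are learned from E9.5 cells only and all time points are then placed in that space before building the five-nearest-neighbor graph, can be sketched in NumPy. The random matrices stand in for real expression data, only 10 PCs are used because the toy reference has 30 cells (the text uses 40), and SPRING's own gene filtering and force-directed layout are not reproduced; the helper names are hypothetical.

```python
import numpy as np


def fit_pca(reference: np.ndarray, n_pcs: int):
    """Learn PCs from reference cells (cells x genes). Returns (mean, loadings)."""
    mean = reference.mean(axis=0)
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_pcs].T  # loadings: genes x n_pcs


def knn_edges(coords: np.ndarray, k: int = 5):
    """Undirected edge set of a k-nearest-neighbor graph (Euclidean distance)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    return {tuple(sorted((i, int(j)))) for i in range(len(coords)) for j in neighbors[i]}


rng = np.random.default_rng(1)
e95 = rng.normal(size=(30, 100))      # hypothetical E9.5 cells x genes
earlier = rng.normal(size=(40, 100))  # hypothetical E8.5 + E9.0 cells x genes
mean, loadings = fit_pca(e95, n_pcs=10)     # PC space learned from E9.5 only
all_cells = np.vstack([earlier, e95])
projected = (all_cells - mean) @ loadings   # every time point in the E9.5 PC space
edges = knn_edges(projected, k=5)
```

Learning the PCs on the latest time point emphasizes the axes along which lineages have already diverged, so earlier cells are positioned according to how far they have moved toward those E9.5 states.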

Inferring cell-state trees by parent-child single-cell voting

To visualize the trajectories as a simple transcriptional cell-state tree, we used a parent-child single-cell voting approach based on the KNN classification algorithm. First, a normalized counts matrix was generated at each stage using the distinguishing marker genes from all DE or SM clusters as features. These marker genes were used as features to train the KNN, during which the KNN learns the distances among cells in the training set based on feature expression. Each cell was labeled by its Seurat cluster assignment. Cells of a later time point voted for their most likely parent cells in the earlier time point as follows: the KNN was trained on E8.5 cells and tested on E9.0 cells, which voted for E8.5 cells. The KNN yielded a vote probability for each E9.0 cell against each E8.5 cluster, which was then averaged for each E9.0 cluster against each E8.5 cluster. This approach was repeated with E9.5 cells voting for E9.0 parents. The average vote probability for a given cluster was tabulated, normalized for cluster size, and represented as a percentage of total votes in confusion matrices. The top winning votes linking later time points back to the preceding time point were displayed as solid lines on the tree. Prominent second choices with >60% of the winning vote were shown on the tree as dashed lines. We also compared this vote probability with the confusion matrix resulting from the KNN to assess our transcriptional cell-state tree. In more than 99% of cases, these two methods resulted in the same first and second choices, thereby validating the deduced parent-child relationships.
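The voting procedure can be sketched with a plain NumPy KNN vote in place of a library classifier. This is an illustration under stated assumptions: k, the cluster names, and the toy 2-D coordinates are hypothetical, "normalized for cluster size" is interpreted here as averaging vote probabilities within each child cluster, and the >60% rule for dashed links is applied relative to the winning vote.

```python
import numpy as np


def knn_vote_probabilities(train_x, train_labels, test_x, k=15):
    """For each test cell, the fraction of its k nearest training cells that
    belong to each training cluster (a plain KNN class-probability estimate)."""
    classes = np.unique(train_labels)
    d = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nn_labels = train_labels[np.argsort(d, axis=1)[:, :k]]  # test cells x k
    probs = np.stack([(nn_labels == c).mean(axis=1) for c in classes], axis=1)
    return classes, probs


def parent_votes(parent_x, parent_labels, child_x, child_labels, k=15):
    """Mean vote of each child cluster for each parent cluster, as % of its votes."""
    parents, probs = knn_vote_probabilities(parent_x, parent_labels, child_x, k)
    children = np.unique(child_labels)
    table = np.stack([probs[child_labels == c].mean(axis=0) for c in children])
    return parents, children, 100 * table / table.sum(axis=1, keepdims=True)


def tree_links(parents, children, table, second_ratio=0.6):
    """Solid link to the winning parent; dashed link to a runner-up whose vote
    exceeds second_ratio times the winning vote."""
    links = []
    for child, row in zip(children, table):
        order = np.argsort(row)[::-1]
        links.append((str(child), str(parents[order[0]]), "solid"))
        if len(order) > 1 and row[order[1]] > second_ratio * row[order[0]]:
            links.append((str(child), str(parents[order[1]]), "dashed"))
    return links


rng = np.random.default_rng(2)
parent_x = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(10, 0.5, (20, 2))])
parent_labels = np.array(["A"] * 20 + ["B"] * 20)  # hypothetical earlier clusters
child_x = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(10, 0.5, (10, 2))])
child_labels = np.array(["a1"] * 10 + ["b1"] * 10)  # hypothetical later clusters
parents, children, table = parent_votes(parent_x, parent_labels, child_x, child_labels, k=5)
links = tree_links(parents, children, table)
```

Each row of `table` corresponds to one row of the confusion matrices described in the text: the share of a later cluster's votes going to each earlier cluster.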

To validate the cell-state tree, we also performed pseudotime analysis with Monocle [v3.0.0]14 on individual lineages/cell states. tSNE was used for dimensionality reduction, and the principal graph was learned using SimplePPT. All other parameters were set to default.

Calculation of metagene profiles

For six of the major paracrine signaling pathways implicated in foregut organogenesis (BMP, FGF, HH, Notch, RA, and canonical Wnt), we curated a list of all well-established ligands, receptors, and context-independent pathway response genes encoded in the mouse genome (Supplementary Data 2a, Excel tab 1). We then calculated ligand–metagene, receptor–metagene, and response–metagene profiles by summing the normalized expression of each individual gene in each pathway (e.g., Wnt-ligand metagene = Wnt1 + Wnt2 + Wnt2b + Wnt3 + … + Wnt10b expression) in each cell and cluster as follows:

Assume that there are x genes in the gene set and n cells, where Gene1 has counts (a1, a2, …, an), Gene2 has counts (b1, b2, …, bn), and so on.

Step 1: Each gene's counts were normalized by that gene's maximum count across all DE and SM cells (n = 14,545 cells): Gene1_norm = (a1, a2, …, an)/max(a1, a2, …, an).

Step 2: The normalized counts of all genes were summed for each cell to generate metagene_v1: metagene_v1 = Gene1_norm + Gene2_norm + … + Genex_norm.

Let the summed counts be m1, m2, …, mn.

Step 3: The summed counts of metagene_v1 were normalized by the maximum of metagene_v1 to create a metagene profile for each cell: MetaGene = (m1, m2, …, mn)/max(m1, m2, …, mn).
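The three steps above can be written compactly. This NumPy sketch assumes a genes × cells matrix of normalized expression for one gene set (for example, one pathway's ligands) over the DE and SM cells; the function name and toy values are hypothetical, and genes whose maximum is zero are assigned zero rather than producing a division by zero, which is an added assumption.

```python
import numpy as np


def metagene(expr: np.ndarray) -> np.ndarray:
    """Metagene profile from a genes x cells expression matrix for one gene set.

    Step 1: divide each gene by its maximum across all cells.
    Step 2: sum the scaled genes within each cell (metagene_v1).
    Step 3: divide metagene_v1 by its maximum so the profile lies in [0, 1].
    """
    gene_max = expr.max(axis=1, keepdims=True)
    # Genes never detected (max = 0) contribute zero instead of NaN (assumption).
    scaled = np.divide(expr, gene_max, out=np.zeros_like(expr, dtype=float),
                       where=gene_max > 0)
    v1 = scaled.sum(axis=0)
    return v1 / v1.max() if v1.max() > 0 else v1


# Hypothetical Wnt-ligand gene set: 3 genes x 4 cells.
wnt_ligands = np.array([[0.0, 2.0, 4.0, 0.0],
                        [1.0, 1.0, 0.0, 0.0],
                        [0.0, 0.0, 3.0, 3.0]])
profile = metagene(wnt_ligands)
```

Scaling each gene to its own maximum in Step 1 keeps one highly expressed ligand from dominating the pathway score, so every gene in the set contributes on the same 0 to 1 scale.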

The average metagene expression profiles for ligands, receptors, and response genes in each DE and SM cluster were then calculated in Seurat [v3.0] using the 'AverageExpression' function. The average metagene expression profiles across all DE and SM clusters were visualized as a dot plot using Seurat (v3.0). Average metagene expression was scaled from −2 to 2 for dot plot visualization (Supplementary Data 2).

Prediction of receptor–ligand interactions

A given cell type was scored as expressing enough ligand to send a signal, or enough receptor to respond to a ligand, if the average ligand–metagene or receptor–metagene expression level was ≥−1 and the metagene was expressed in ≥25% of cells (except for the Notch ligand–metagene, for which an expression threshold of ≥−1.5 was used owing to low overall expression in all cells). These thresholds were set empirically to be conservative and were benchmarked against experimentally validated signaling interactions in the liver, lung, and pancreas DE reported in the literature. Furthermore, we determined the likelihood that a given cell population was responding based on the context-independent pathway response–metagene expression level being ≥−1 and expressed in ≥25% of cells. Context-independent response genes are those known from the literature to be directly transcribed in most cell types that are responding to ligand–receptor activation.

DE and SM cell clusters of each stage are ordered along the A-P axis consistent with the location of organ primordia in vivo, with spatially adjacent DE and SM cell types placed across from one another in the diagram. To assign receptor–ligand interactions for each cell cluster, we first determined whether a given cluster was responding, based on its response–metagene and receptor–metagene levels meeting the ≥−1 threshold. If the responding cluster also expressed the ligand–metagene at a level ≥−1, autocrine signaling was assigned. For paracrine signaling, we then identified adjacent cell populations, both within the same tissue layer and in the adjacent layer, that expressed the ligand–metagene above the threshold, and established a receptor–ligand interaction (arrow). The signal strength was calculated as the sum of the ligand–metagene and response–metagene values; if this value was ≥1, the signal was considered strong.
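The scoring rules in the two paragraphs above can be expressed as a small rule-based function. This is a hedged sketch: the data class and field names are hypothetical, adjacency between clusters is assumed to have been decided beforehand from the A-P ordering, and the strength rule is read as the sender's ligand–metagene value plus the responder's response–metagene value, which is our interpretation of the text.

```python
from dataclasses import dataclass


@dataclass
class ClusterScores:
    """Scaled average metagene values and % of expressing cells for one pathway
    in one cluster. Field names are illustrative."""
    name: str
    ligand: float
    ligand_pct: float
    receptor: float
    receptor_pct: float
    response: float
    response_pct: float


def sends(c: ClusterScores, pathway: str) -> bool:
    """Enough ligand to signal: level >= -1 (>= -1.5 for Notch) in >= 25% of cells."""
    threshold = -1.5 if pathway == "Notch" else -1.0
    return c.ligand >= threshold and c.ligand_pct >= 25


def responds(c: ClusterScores) -> bool:
    """Receptor and context-independent response metagenes both pass the thresholds."""
    return (c.receptor >= -1 and c.receptor_pct >= 25
            and c.response >= -1 and c.response_pct >= 25)


def call_interactions(target: ClusterScores, adjacent: list, pathway: str):
    """(sender, receiver, kind, strength) calls for one receiving cluster.

    adjacent: clusters next to `target` in the same or the facing tissue layer
    (assumed to be chosen beforehand from the A-P ordering).
    """
    if not responds(target):
        return []
    senders = [target] if sends(target, pathway) else []
    senders += [c for c in adjacent if sends(c, pathway)]
    calls = []
    for s in senders:
        kind = "autocrine" if s is target else "paracrine"
        # Interpretation: sender ligand level + receiver response level >= 1 is strong.
        strength = "strong" if s.ligand + target.response >= 1 else "weak"
        calls.append((s.name, target.name, kind, strength))
    return calls


# Hypothetical scaled values for three clusters.
resp_sm = ClusterScores("respiratory SM", 0.2, 40, 0.5, 60, 1.1, 70)
lung_de = ClusterScores("lung DE", 0.9, 55, -1.4, 10, -0.5, 30)
liver_sm = ClusterScores("liver SM", -1.2, 30, 0.0, 50, 0.0, 50)
wnt_calls = call_interactions(resp_sm, [lung_de, liver_sm], "Wnt")
notch_calls = call_interactions(resp_sm, [lung_de, liver_sm], "Notch")
```

Because the ligand cutoff differs only for Notch, the same hypothetical liver SM cluster is counted as a sender for Notch but not for Wnt.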

Comparison of bulk RNA-Seq vs scRNA-Seq

Foregut tissue was dissected from E9.5 Gli2−/−;Gli3−/− double mutants (n = 3) and Gli2+/−;Gli3+/− heterozygous littermate controls (n = 3). Each dissected foregut was used separately for RNA extraction, library preparation, and bulk RNA-seq. These mice were of mixed strains, and the sex of the embryos was unknown. The CSBB [v3.0] pipeline (https://github.com/csbbcompbio/CSBB-v3.0) was used to align reads to the mouse genome [mm10], and transcripts differentially expressed between the two genotypes were identified using RUVSeq (|logFC| ≥ 1 and FDR ≤ 0.1). Differentially expressed genes (DEGs) were clustered using hierarchical clustering and visualized across samples in Morpheus (https://software.broadinstitute.org/morpheus/). To compare our bulk analysis with scRNA-seq, we visualized the expression of DEGs across cells in the single-cell clusters using the 'DoHeatmap' function in Seurat [v3.0.0]. Cells were arranged according to the A-P axis position of their respective clusters, and genes were ordered according to the hierarchical clustering obtained above. We also performed Gene Set Enrichment Analysis (GSEA) [v3.0]55 to examine statistical enrichment of the DEGs in the gut tube SM (respiratory, esophagus, gastric, and duodenum), pharynx SM, and liver SM clusters. Normalized counts of genes across cells, with the up- and downregulated genes from bulk RNA sequencing as custom gene sets, were used to perform the GSEA analysis.
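To illustrate what GSEA measures here, the sketch below computes the standard weighted running-sum enrichment score for one gene set against a ranked gene list. It is not the GSEA [v3.0] software: permutation-based significance and the normalized enrichment score are omitted, and the ranking metric (for example, how strongly each gene is enriched in a given SM cluster) and its toy values are hypothetical.

```python
import numpy as np


def enrichment_score(ranked_genes, ranking_metric, gene_set, p=1.0):
    """Weighted running-sum enrichment score (ES) for one gene set.

    ranked_genes: genes sorted from most to least associated with the phenotype.
    ranking_metric: the corresponding metric values, in the same order.
    Returns the running sum's maximum deviation from zero (signed).
    """
    hits = np.array([g in gene_set for g in ranked_genes])
    if hits.all() or not hits.any():
        raise ValueError("gene set must overlap the list without covering it")
    weights = np.abs(np.asarray(ranking_metric, dtype=float)) ** p
    # Hits step up in proportion to their weight; misses step down uniformly.
    p_hit = np.cumsum(np.where(hits, weights, 0.0)) / weights[hits].sum()
    p_miss = np.cumsum(~hits) / (~hits).sum()
    running = p_hit - p_miss
    return running[np.argmax(np.abs(running))]


# Hypothetical ranking of genes by enrichment in gut tube SM (high to low).
genes = ["Foxf1", "Tbx4", "Osr1", "Wt1", "Lhx2", "Tbx18"]
metric = [3.0, 2.5, 2.0, -1.0, -2.0, -3.0]
es_down = enrichment_score(genes, metric, {"Foxf1", "Tbx4", "Osr1"})
es_up = enrichment_score(genes, metric, {"Wt1", "Lhx2", "Tbx18"})
```

A positive ES means the gene set is concentrated at the top of the ranking (here, genes downregulated in Gli2/3 mutants ranked as gut tube SM-enriched); a negative ES means it is concentrated at the bottom.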

Maintenance of PSCs

Two hPSC lines were used in this study: (1) WA01-H1 human embryonic stem cells purchased from WiCell (NIH approval numbers NIHhESC-10-0043 and NIHhESC-10-0062) and (2) human iPSC72_3 generated by the CCHMC Pluripotent Stem Cell Facility. Both cell lines have been authenticated as follows: (i) cell identity, via STR profiling by Genetica DNA Laboratory (a LabCorp brand; Burlington, NC); (ii) genetic stability, by standard metaphase spread and G-banded karyotype analysis in the CCHMC Cytogenetics Laboratory; and (iii) functional pluripotency, by teratoma assay demonstrating the ability to differentiate into each of the three germ layers. Both cell lines routinely tested negative for mycoplasma contamination. hPSC lines were maintained under feeder-free conditions in mTeSR1 medium (Stem Cell Technologies, Vancouver, Canada) on six-well Nunclon surface plates (Nunc) coated with Geltrex (ThermoFisher Scientific) at 37 °C with 5% CO2. Cells were checked daily, and differentiated cells were manually removed. Cells were passaged every 4 days using Dispase solution (ThermoFisher Scientific).

Differentiation of PSCs into mesenchyme

Partially confluent hPSCs were dissociated into very fine clumps in Accutase (Invitrogen) and passaged 1:18 onto new Geltrex-coated 24-well plates for immunocytochemistry and 12-well plates for RNA preparation in mTeSR1 with 1 μM thiazovivin (Tocris) (Day −1). Differentiation of hPSCs into lateral plate mesoderm was based on previously described methods38 as follows. A basal medium composed of Advanced DMEM/F12, N2, B27, 15 mM HEPES, 2 mM L-glutamine, and penicillin-streptomycin was used for this and all subsequent differentiation steps. After a brief wash with DMEM/F12, cells were cultured in Day 0 medium (30 ng/ml Activin A (Cell Guidance Systems) + 40 ng/ml BMP4 (R&D Systems) + 6 μM CHIR99021 (Tocris) + 20 ng/ml FGF2 (ThermoFisher Scientific) + 100 nM PIK90 (EMD Millipore)) for 24 h. On Day 1, after a brief wash with DMEM/F12, cells were cultured in Day 1 medium (1 μM A8301 (Tocris) + 30 ng/ml BMP4 + 1 μM C59 (Cellagen Technology)) for 24 h. For cardiac mesoderm generation, cells were cultured in 1 μM A8301 + 30 ng/ml BMP4 + 1 μM C59 + 20 ng/ml FGF2 from Day 2 to Day 4 (medium changed every day) and then, from Day 4, in 200 μg/ml 2-phospho-ascorbic acid (Sigma) + 1 μM XAV939 (Sigma) + 30 ng/ml BMP4 for 3 days. For splanchnic mesoderm generation, cells were cultured in 1 μM A8301 + 30 ng/ml BMP4 + 1 μM C59 + 20 ng/ml FGF2 + 2 μM RA (Sigma) from Day 2 to Day 4 (medium changed every day). To further direct regional splanchnic mesoderm fates, one of three regimens was then used: (i) 2 μM RA + 40 ng/ml BMP4 for 3 days to promote STM fate; (ii) 2 μM RA + 2 μM PMA (Tocris) for 2 days, then 2 μM RA + 2 μM PMA + 100 ng/ml Noggin (R&D Systems) for the last day to promote esophageal/gastric mesenchyme fate; or (iii) 2 μM RA + 40 ng/ml BMP4 + 2 μM PMA for 2 days, then 2 μM RA + 40 ng/ml BMP4 + 2 μM PMA + 1 μM CHIR99021 for the last day to promote respiratory mesenchyme fate. Medium was changed every day. Similar results were obtained with WA01 hES cells and human iPSC 72_3.
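Because the branching schedule above is easy to misread in prose, it can also be encoded as a day-indexed lookup table. This is an illustration only: the structure, lineage keys, and function name are hypothetical, while the concentrations and day windows are transcribed from the text, with each window written as [start, end) in days of differentiation and the regional steps covering Days 4 to 6.

```python
# Factors added to the basal medium (Advanced DMEM/F12, N2, B27, 15 mM HEPES,
# 2 mM L-glutamine, penicillin-streptomycin). Each entry is (start, end, factors),
# covering days start <= day < end. Structure and names are illustrative only.
COMMON = [
    (0, 1, "30 ng/ml Activin A + 40 ng/ml BMP4 + 6 μM CHIR99021 + 20 ng/ml FGF2 + 100 nM PIK90"),
    (1, 2, "1 μM A8301 + 30 ng/ml BMP4 + 1 μM C59"),
]
SM_DAY2_4 = (2, 4, "1 μM A8301 + 30 ng/ml BMP4 + 1 μM C59 + 20 ng/ml FGF2 + 2 μM RA")

PROTOCOLS = {
    "cardiac": COMMON + [
        (2, 4, "1 μM A8301 + 30 ng/ml BMP4 + 1 μM C59 + 20 ng/ml FGF2"),
        (4, 7, "200 μg/ml 2-phospho-ascorbic acid + 1 μM XAV939 + 30 ng/ml BMP4"),
    ],
    "stm": COMMON + [SM_DAY2_4, (4, 7, "2 μM RA + 40 ng/ml BMP4")],
    "esophageal/gastric": COMMON + [
        SM_DAY2_4,
        (4, 6, "2 μM RA + 2 μM PMA"),
        (6, 7, "2 μM RA + 2 μM PMA + 100 ng/ml Noggin"),
    ],
    "respiratory": COMMON + [
        SM_DAY2_4,
        (4, 6, "2 μM RA + 40 ng/ml BMP4 + 2 μM PMA"),
        (6, 7, "2 μM RA + 40 ng/ml BMP4 + 2 μM PMA + 1 μM CHIR99021"),
    ],
}


def medium_on_day(lineage: str, day: int) -> str:
    """Factors added to the basal medium for `lineage` on `day` (Days 0 to 6)."""
    for start, end, factors in PROTOCOLS[lineage]:
        if start <= day < end:
            return factors
    raise ValueError(f"no medium defined for {lineage!r} on day {day}")


print(medium_on_day("respiratory", 6))
```

Writing the schedule this way makes the shared Day 0 to 3 steps explicit and shows that the SM lineages diverge only from Day 4 onward.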

Quantitative RT-PCR

Total RNA was prepared from differentiating human ES cells using a NucleoSpin kit according to the manufacturer's protocol. Reverse transcription was performed using the SuperScript VILO cDNA synthesis kit. QuantStudio 5 and 6 systems were used for qPCR analyses. Primers for qPCR are listed in Supplementary Table 1. Statistical analyses were performed with Prism 8 (GraphPad Software). Significance was determined by one-way ANOVA followed by Tukey's test.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.