Single cell transcriptomics reveals a signaling roadmap coordinating endoderm and mesoderm lineage diversification during foregut organogenesis

Visceral organs, such as the lungs, stomach, liver and pancreas, derive from the fetal foregut through a series of inductive interactions between the definitive endoderm (DE) epithelium and the surrounding splanchnic mesoderm (SM) 1,2. This foregut patterning, which occurs between embryonic day (E) 8.5 and E9.5 in the mouse embryo, equivalent to 17-23 days of human gestation, defines the landscape of the thoracic cavity and disruptions in this process can lead to severe congenital defects. While patterning of the endoderm lineages has been fairly well studies, the SM which is known to provide many paracrine factors required for organogeneis is virtually unstudied 1,2. In particular we lack a comprehensive understanding of the molecular nature of SM regional identity, the mechanisms by which SM signaling boundaries are established, the role of the epithelium in SM patterning and how SM and DE lineages are dynamically coordinated during organogenesis. Here we used single cell transcriptomics to generate a high-resolution expression map of the embryonic mouse foregut. This uncovered an unexpected diversity in SM progenitors that developed in close register with the organ-specific epithelium. These data allowed us to infer a spatial and temporal signaling roadmap of the combinatorial endoderm-mesoderm interactions that orchestrate foregut organogenesis. We validated key predictions with mouse genetics, showing importance of epithelial signaling in mesoderm patterning. Finally, we leveraged the signaling road map to generate different SM subtypes from human pluripotent stem cells (hPSCs), which previously have been elusive.

Visceral organs, such as the lungs, stomach, liver and pancreas, derive from the fetal foregut through a series of inductive interactions between the definitive endoderm (DE) epithelium and the surrounding splanchnic mesoderm (SM) 1,2 . This foregut patterning, which occurs between embryonic day (E) 8.5 and E9.5 in the mouse embryo, equivalent to 17-23 days of human gestation, defines the landscape of the thoracic cavity and disruptions in this process can lead to severe congenital defects. While patterning of the endoderm lineages has been fairly well studies, the SM which is known to provide many paracrine factors required for organogeneis is virtually unstudied 1,2 . In particular we lack a comprehensive understanding of the molecular nature of SM regional identity, the mechanisms by which SM signaling boundaries are established, the role of the epithelium in SM patterning and how SM and DE lineages are dynamically coordinated during organogenesis. Here we used single cell transcriptomics to generate a high-resolution expression map of the embryonic mouse foregut. This uncovered an unexpected diversity in SM progenitors that developed in close register with the organ-specific epithelium.
These data allowed us to infer a spatial and temporal signaling roadmap of the combinatorial endoderm-mesoderm interactions that orchestrate foregut organogenesis.
We validated key predictions with mouse genetics, showing importance of epithelial signaling in mesoderm patterning. Finally, we leveraged the signaling road map to generate different SM subtypes from human pluripotent stem cells (hPSCs), which previously have been elusive.
Single cell transcriptomics allows examination of organogenesis at an unprecedented resolution [3][4][5][6] , however, to date studies have either taken a broad overview of many organ systems or focused only on the epithelial component of the developing gut [7][8][9] . To comprehensively characterize the signaling networks orchestrating foregut organogenesis, we performed single cell RNA sequencing (scRNA-seq) of the mouse embryonic foregut at three time points that span early patterning through organ lineage induction: Embryonic day (E) E8.5 (5-10 somites, 's'), E9.0 (12-15s) and E9.5 (25-30s) (Fig. 1a, b). We micro-dissected the foregut between the posterior pharynx and the midgut, pooling tissue from 15-20 embryos for each time point. At E9.5, we isolated anterior and posterior regions separately, containing lung/esophagus and liver/pancreas primordia, respectively. A total of 31,268 single-cell transcriptomes passed quality control measures with an average read depth of 3,178 transcripts/cell. Cells were clustered based on the expression of highly variable genes across the population and visualized using t-distributed stochastic neighbor embedding (t-SNE) ( Fig. 1c; Supplementary Fig. 1). This identified 9 major cell lineages: DE, SM, cardiac, other mesoderm (somatic and paraxial), endothelium, blood, ectoderm, neural crest and extraembryonic. DE clusters (4,448 cells) were characterized by coexpression of Foxa1/2, Cdh1 and/or Epcam, whereas SM (10,097 cells) was defined by coexpression of Foxf1 (Fig. 1d), Vim and/or Pdgfra as well as being negative for cardiac and other mesoderm specific transcripts.
At E8.5 and E9.0 the level of regional pattern and diversity of SM populations was surprisingly complex and mirrored that of the spatially restricted DE progenitor domains ( Supplementary Fig. S1). By E9.5 we identified 17 SM cell populations (Fig. 1f) Table S1). This revealed new lineage-restricted markers such as homeodomain TF Nkx6-1. Well known for its expression in the pancreatic endoderm at E10.5 13, 14 , Nkx6-1 was also specifically expressed in the respiratory and esophageal mesoderm at E9.5 (Fig. 1k). This TF code will facilitate lineage tracing experiments and prompt studies testing their role in mesenchymal differentiation.
Different organs form at precise locations along the anterior-posterior (A-P) axis of the gut.
To assess whether this was reflected in the single cell transcriptional profiles, we inferred the pseudo-spatial ordering of the cells by plotting the diffusion pseudo-time of each cluster (see Methods). Both the DE and SM cell populations were indeed appropriately ordered along the A-P axis (Fig. 1g-h; Supplementary Fig.1) 6 . E9.5 clusters from the anterior dissections were located in the anterior half of the pseudo-space continuum, compared to posterior tissue, confirming the robustness of the computational ordering. Finally, Hox genes are expressed in a co-linear fashion along the A-P axis and accordingly we observed a progressive increase in posterior Hox paralogs expression in more posterior clusters (not shown). Combining the pseudo-space analysis, MGI curations and in situ validation, we were able to map each DE and SM population to their approximate location in the gut tube (Fig. 1i, j; Supplementary Fig. 1). This revealed that the SM diversity mirrored DE lineages, indicating that they are closely coordinated from the very beginning of organogenesis.
The transcriptional cell state complexity of the DE and SM doubled in just 24 hours between E8.5 and E9.5, reflecting progenitors forming more specialized cell types. To examine the temporal dynamics of lineage diversification we visualized the single cell data using SPRING ( Fig. 2a, b), an algorithm that represents k-nearest neighbors in a force directed graph, facilitating analysis of developmental trajectories 15 16,17 . Paralleling the specialization in the liver epithelium, the trajectory allows mapping of five distinct liver SM-cells, consistent with the liver being the first organ to differentiate.
Overall the DE trajectories are consistent with experimentally determined fate maps [16][17][18] , demonstrating the robustness of our analysis and suggesting that the SM lineages, which have not been well defined, are also correctly. Interestingly the DE and SM trajectories were highly synchronized. For example, at E8.5 the DE lateral foregut population (e_a2) and the spatially neighboring SM population (m_a0) both express the TF Osr1, and the trajectories suggests they are multipotent progenitors, giving rise to respiratory, esophageal and gastric epithelium and mesenchyme, respectively ( Supplementary Fig. 2h, i). Furthermore, a close examination of the DE trachea cluster revealed a transitional cell population co-expressing Nkx2-1 and Sox2, which was identified to be a rare cell population at the prospective tracheal-esophageal boundary ( Fig.   2e-h). Together these analyses revealed novel progenitor populations and the close synchronization of DE and SM development.
We next sought to computationally predict the signaling microenvironment in the foregut that controls these cell fate decisions. We calculated metagene expression profiles for all the ligands, receptors and context-independent response genes in each DE and SM cluster for six major paracrine signaling pathways implicated in organogenesis: BMP, FGF, Hedgehog (HH), Notch, retinoic acid (RA) and canonical Wnt (Methods & Supplementary Fig. S3, Table S2).
Leveraging our spatial map of juxtaposed DE and SM populations in the foregut we then used the metagene expression levels to predict ligand-receptor pairs and the likelihood that a given cell population was responding to local paracrine or autocrine signals ( Fig. 3a-c, Supplementary Fig.   S4). In addition, we projected the signaling response-metagene expression levels onto the lineage trajectories which revealed spatiotemporally dynamic signaling domains that correlated with cell lineages (Fig. 3d, Supplementary Fig. S5). In general, the transcriptome data predicts local interactions, with the SM being the primary source of BMP, FGF, RA and Wnt ligands signaling to both the adjacent DE and within the SM itself (Fig. 3c). In contrast HH ligands are produced by the DE and signal to the gut tube SM, with no evidence of autocrine activity in the DE (Fig. 3c).
The computational predictions identified most known signaling interactions controlling DE lineage specification such as mesoderm derived BMP, FGF and Wnt promoting DE liver and lung fate, and autocrine notch signaling in the DE endocrine pancreas 1,2,19,20 . This suggesting that previously undefined SM signaling predictions are also likely to regulate cell fate. Combining the data for all six signaling pathways onto the cell state trees, we generated a comprehensive roadmap of the combinatorial signals predicted to coordinate the temporal and spatial development of each DE and SM lineage (Fig. 3e,f).
To genetically test the predictive value of the signaling roadmap, we focused on HH activity, which the scRNA-seq suggests is high in gut tube SM (esophagus, respiratory, stomach and duodenum) but low in the pharyngeal and liver SM (Fig. 4a-c). HH ligands stimulate the activation of Gli2 and Gli3 TFs, which in turn promote the transcription of HH-target genes (e.g. 21  Interestingly HH/Gli-regulated transcripts encoded, BMP, FGF and Wnt signaling components, as well as 28 TFs, including downregulated TFs (Osr1, Tbx4/5, Foxf1/2) and upregulated TFs (Tbx18, Lhx2 and Wt1) that have been implicated in respiratory and liver development respectively (Fig. 3e; supplementary Table S3) 23 . This genetic analysis confirmed the predictive value of the signaling roadmap where differential HH activity promotes gut tube versus liver and pharyngeal SM, in part by regulating other lineage specifying TFs and signaling proteins. Most previous efforts to direct differentiation of human PSCs into specific cell and tissue types have depended on combining data from many published papers describing how individual pathways control each developmental step. Here, in one study, using single cell transcriptomics we inferred a roadmap for the entire foregut revealing an unexpected diversity of early mesenchymal lineages that are synchronized with the epithelial identity. Moreover, this roadmap predicts the combinatorial signaling microenvironment orchestrating foregut organogenesis which we leveraged to direct the formation of organ-specific mesenchyme from hPSCs; an important step towards engineering complex foregut tissue for regenerative medicine. A similar approach could be used to identify cell-cell interactions in other organ systems or those that drive pathological states such as the cancer niche.

Embryo collection and single cell dissociation
All mouse experiments were performed in accordance with protocols approved by the Cincinnati Children's Hospital Medical Center Institutional Animal Care and Use Committee (IACUC). No statistical sample size estimates were performed prior to the experiment, sufficient embryos to generate the material need for the experiments were used. No randomization was utilized as no particular treatment was performed in different groups. Timed matings were set up between C57BL/6J mice and the day where a plug was detected was considered embryonic day 0.5.
Single cell dissociation by cold active protease protocol was performed as described previously 25 . Rapidly dissected C57BL/6J mouse embryo tissues were transferred to ice-cold PBS

Immunofluorescence staining and whole mount in situ for dissected foreguts
Mouse embryos were harvested at indicated stages and fixed in 4% paraformaldehyde (PFA) at 4°C for overnight. The fixed samples were washed with PBS 10 min 3 times and the foreguts were micro-dissected when indicated. Embryos or dissected foreguts were then processed as described previously 26  10000] and cell cycle effect was removed by regressing out difference between S phase and G2M phase from normalized data using default parameters. We first clustered each developmental stage separately to identify major cell lineages. Approximately 1500 Highly variable genes (HVG) across each population were selected by marking outliers from dispersion vs. avgExp plot. PCA was performed using HVG, and the first 20 Principal Components were used for cells clustering, which then was visualized using t-distributed stochastic neighbor embedding (tSNE). Marker genes defining each cluster were identified using 'FindAllMarkers' function (Wilcoxon Rank Sum Test) in Seurat and these were used to annotate clusters based on well-known cell type specific genes. Canonical correlation analysis (CCA) was used to visualize all of the cells from all stages together. Briefly, the HVGs from latest stage [E9.5] and 30 CCs (canonical correlation components) were used for multi-CCA (integration analysis) and clustering was performed with default parameters.

In silico selection and clustering for definitive endoderm and splanchnic mesenchyme
Definitive Endoderm (DE) clusters (4,448 cells) were defined by the co-expression of Foxa1/2, Cdh1 and/or Epcam, whereas the splanchnic SM (10,097 cells) were defined by co-expression of .Cells in clusters served as replicates in finding differential TFs for each lineage. Wilcoxon rank sum test was used in identifying differential TFs.

Pseudo-space ordering of cells
To computationally predict the ordering of cells along the anterior-poster axis, the density of marker gene expression in each cluster was plotted against pseudotime using URD [v1.0] 30 .
The most anterior pharyngeal cluster was set as the root of X-axis of the pseudotime continuum. In order to calculate pseudotime, URD first obtains transition probabilities between cells to construct a diffusion map. Diffusion maps for several pair of dimensions help understanding the structure of differentiation. Next, the most anterior pharyngeal cluster was set as the root [cells in this cluster served as start point] and then a probabilistic breadth-first graph search using transition probabilities was carried out. Breadth-first graph moves stepwise outward from root cells, until all the cells in the graph have been traversed through. Lastly, pseudotime equals average iteration that visited each cell, after running several simulations.

Analysis of cell trajectories
To examine cell trajectories across time, we implemented SPRING [v1.0] 15  Prominent second choices with >60% of winning votes were reported on the tree as dashed lines. We also compared this vote probability with the confusion matrix resulting from the SPRING KNN to assess our transcriptional cell-state tree. In more than 99% cases, these two methods resulted in the same first and second choices, thereby validating deduced parentchild relationships.
To validate the SPRING trajectory and cell state tree using pseudotime analysis, Monocle [v3.0.0] 31-33 was deployed on individual lineages/cell states. tSNE was used for Dimensionality Reduction and principle graph was learnt using SimplePPT. All the other parameters were set to default. In general Monocle pseudotime replicated real time and we found one instance of a strong parent-child relationship between cells of a given stage ( Supplementary Fig. S2g).

Calculation of metagene profiles and
For six of the major paracrine signaling pathways implicated in foregut organogenesis (BMP, FGF, HH, Notch RA and canonical Wnt), we curated a list of all the well-established ligands, receptors and context-independent pathway response genes that were encoded in the mouse genome (supplementary table S2a -excel tab1). We then calculated a "ligand-metagene", "receptor-

Prediction of receptor-ligand interactions
A given cell type was scored to be expressing enough ligand to send a signal or enough receptor to respond to ligand if the average ligand-metagene or receptor-metagene expression level was > -1 and expressed in > 25% of cells. (Except for the Notch ligand-metagene where expression threshold of > -1.5 was used due to low overall expression in all cells). These thresholds empirically set to be conservative and benchmarked against experimentally validate signaling interactions in DE liver, lung and pancreas from the literature. Furthermore, we determined the likely hood that a given cell population was responding based on the context-independent pathway response-metagene expression level being > -1 and expressed in > 25% of cells.
Context-independent response genes are those genes that are known from the literature to be directly transcribed in most cell types that are responding to a ligand-receptor activation. Normalized counts of genes across cells and up/down-regulated genes from bulk RNA sequencing were used as custom gene sets to perform the GSEA analysis.

Differentiation of PSCs into mesenchyme
Differentiation of hPSCs into lateral plate mesoderm was induced using previously described methods 24 with modifications. In brief, partially confluent hPSCs were dissociated into very fine clumps in Accutase (Invitrogen) and passaged 1:18 onto new Geltrex-coated 24-well plates for immunocytochemistry and 12-well plates for RNA preparation in mTeSR1 with 1uM thiazovivin (Tocris) (Day -1). Next day, a brief wash with DMEM/F12 is followed with Day0 medium 30ng/ml Significance was determined by one-way ANOVA, followed by Tukey's test.

Immunocytochemistry
Cells were fixed with 4% PFA/PBS for 30min at room temperature. After perforation with 0.5% Triton X-100/PBS for 10 min, cells were incubated with 5% normal donkey serum for 2 hours.
Cells were incubated with primary antibodies (listed in Supplementary Table 4) overnight at 4°C.
Next day, cells were washed with PBS, and then incubated with secondary antibodies for 1 hour at room temperature.

Data and Code Availability
The scRNA-seq and bulk RNA-seq data are available at Gene Expression Omnibus (GEO): GSE136689 and GSE136687. All the code (scripts, R-packages and software) and their documentation has been uploaded to GitHub [https://github.com/ZornLab/Single-celltranscriptomics-reveals-a-signaling-roadmap-coordinating-endoderm-and-mesoderm-lineage].
All the deposited code is available to use with GPLv3.0.