## Main

The traditional account of cellular heterogeneity in the lung based on meticulous histology and expression of few characteristic markers suggests more than 40 cell types in the adult human lung1. The lung cell-type repertoire has been further expanded by recent developments in single-cell genomics allowing the interrogation of hundreds of thousand cells from adult healthy and diseased human lungs2,3,4,5. So far, 58 distinct cell types and states can be categorized into the five major cell classes of epithelial, stromal, immune endothelial and neuronal cells.

Our knowledge of human lung development derives largely from animal models and simplified organoid cultures6,7 underscoring the lack of systematic studies of intact embryonic tissues. In this Resource, we focused on the first trimester of gestation and applied state-of-the-art technologies to capture and map the gene expression profiles of human embryonic lung in time and space. We first defined six main cell categories: mesenchymal, epithelial, endothelial, neuronal and immune cells, and erythroblasts/erythrocytes. Higher-resolution analysis of each of these categories suggested 83 cell identities, corresponding to cell types and transitional states. Next, we defined topological neighbourhoods of spatially related cell identities and used interactome analyses to describe communication niches and tissue-design rules driven by spatial factors and cell interactions. We present an online platform integrating single-cell RNA sequencing (scRNA-seq) with the spatial analyses to facilitate interactive exploration of our data on whole lung tissue sections at different ages.

## Results

### Overview of cell heterogeneity in the embryonic lung

We dissected lungs from 17 embryos, ranging from 5 to 14 weeks post conception (PCW) at approximately weekly intervals (Supplementary Table 1 (1) and Extended Data Fig. 1a–c). Assuming that the two lungs are bilaterally symmetric, we regularly used the right lobes for scRNA-seq and processed the left lobes for spatial analyses. For in situ mapping, we aimed to analyse consecutive sections of the same tissues to independently validate the cell-state topologies. A first clustering and differential expression analysis of 163,236, high-quality complementary DNA libraries (Extended Data Fig. 1d–h) revealed six main cell categories: the mesoderm-derived (1) mesenchymal, (2) endothelial, (3) immune cells and (4) erythroblasts/erythrocytes, as well as (5) the ectoderm-derived neuronal and (6) the endoderm-derived epithelial cells (Extended Data Fig. 2a–g and Supplementary Table 1 (3) and (13)). Next, we dived deeper into each of them by re-clustering the corresponding cells, to expose additional cell states that were hidden in the whole dataset analysis. This revealed an unexpectedly high heterogeneity of 83 distinct cell states (Fig. 1a and Extended Data Fig. 3a).

To further explore the proposed cell-states and map them back to the tissue, we monitored gene expression patterns on tissue sections with spatial transcriptomics (ST) in nine different stages (the interactive viewer8 contains representative sections of 6, 8.5, 10 and 11.5 PCW lungs). Probabilistic analysis of the ST data9 largely validated the scRNA-seq results and spatially mapped the suggested clusters (example in Fig. 1b). The probability estimation of each cluster in every ST spot allowed definition of possible cluster pairs, located consistently in the same ‘niche’ (55-µm-diameter ST spot). We defined four distinct cell neighbourhoods, in characteristic anatomical positions, including proximal and distal airway compartments, vessels and parenchyma (Fig. 1c and Methods). To explore the communication code among cell states in each neighbourhood, we used interactome analyses with CellChat10 and Nichenet11 (interactive viewer and example in Fig. 1d).

To achieve higher resolution, we targeted 177 cell-state markers and selected NOTCH, HH, WNT and RTK/FGF signalling components to validate cell communication events by multiplex HybISS12,13 (Fig. 1e and Extended Data Fig. 2h) and SCRINSHOT14. To facilitate accessibility and easy data exploration, we constructed an interactive viewer combining all modules of our analyses (https://hdca-sweden.scilifelab.se/tissues-overview/lung/). Below, we present the analyses of mesenchymal, epithelial and neuronal cell states and their interactions. Immune and endothelial cells are described in Supplementary Note 1.

### Distinct positions of mesenchymal cell states

The largest cluster in our dataset consisted of mesenchymal cells (Extended Data Fig. 2a). Subclustering revealed six distinct cell types expressing specific markers for known fibroblast, mesothelial, chondroblast and smooth muscle cell types and several immature states, characterized by the general mesenchymal markers COL1A2 (ref. 2) and TBX4 (ref. 15) and the lack of specific cell-type markers (Fig. 2a, Extended Data Fig. 4a and Supplementary Table 1 (4)). Annotation was also based on the spatial mapping of clusters at different timepoints (Fig. 2b and Extended Data Fig. 4b), the relative cluster positioning in the uniform manifold approximation and projection (UMAP) plot16, partition-based graph abstraction (PAGA plot)17 (Fig. 2a) and scVelo18 analyses (Extended Data Fig. 4c) positioning immature cell states in the UMAP-plot centre and the more mature ones at the periphery. We spatially detected: (1) mesothelial cells (cluster (cl)-19), expressing WT1, MSLN, KRT18 and KRT19 at the tissue margins (Extended Data Fig. 4d), (2) pericytes/vascular smooth muscle (cl-14) associated with endothelium (Fig. 1c) and marked by PDGFRB and moderate levels of ACTA2 and TAGLN, (3) SOX9pos COL2A1pos chondroblasts (cl-18) surrounding proximal airways, (4) MYH11pos DACH2pos airway smooth muscle (ASM, cl-13) close to airway epithelium, (5) SERPINF1pos SRFP2pos adventitial fibroblasts (AdvFs, cl-10) and (6) ASPNpos TNCpos airway fibroblasts (AFs, cl-16). AdvF and AF occupied distinct positions in the bronchovascular bundles19, with the AFs being localized closer to airways than AdvF (Fig. 2b (5), (6)). Immature cell states (cl-0, cl-2 and cl-6) showed scattered distribution (Extended Data Fig. 4b). Lastly, 5 of the 21 mesenchymal clusters contained proliferating cells, which were widely distributed at early stages and became more localized around distal airways over time (Fig. 2a and Extended Data Fig. 4e).

### ASM maturation states coincide with distinct topologies

A prominent PAGA-plot trajectory suggested a differentiation path of immature mesenchyme towards ASM. It connected three immature clusters (cl-0, cl-2 and cl-6) to a proliferating ASM cluster (cl-20) and three ASM clusters (cl-8, cl-12 and cl-13) (Fig. 2a). This proposed that the trajectory stems from the immature mesenchyme connects to the immature ASM cl-8 and cl-12, leading to the more mature ASM cl-13 (Fig. 2c,d and Extended Data Fig. 4f). Proliferating ASM cells showed high expression of smooth muscle markers, such as ACTA2 and TAGLN, implying that they represent a more mature state than cl-0 (Extended Data Fig. 4a). Interestingly, cl-20 also selectively expressed genes encoding extracellular matrix (ECM) proteins (Extended Data Fig. 4g), suggesting that proliferating ASM progenitors are transcriptionally distinct and locally contribute to ECM composition. Using pseudotime analysis20,21, we defined differentially expressed gene-modules that might contribute to differentiation along the ASM trajectory (Extended Data Fig. 5a). Characteristic regulators include the myogenic transcription factor (TF) DACH2 (ref. 22), which was detected mainly in intermediate states (cl-8 and c-12) (Extended Data Fig. 5a,b, module 5). LEF1 was expressed in cl-8 but not earlier, in agreement with the published role of WNT signalling in smooth muscle development23,24 and SSRP1, a FACT complex component, which modifies the chromatin structure at the promoters of muscle-specific genes, activating them25 (Extended Data Fig. 5b). The expression of the NOTCH ligand JAG1 was also increased in cl-6 and cl-8, in agreement with previous in vitro analysis26 (Extended Data Fig. 5c). Differentiation into mature ASM states seems to occur in cl-12 and cl-13 and is illustrated by increased expression of ACTA2, TAGLN and MYH11 (ref. 2) (Extended Data Fig. 5a, module 7). NR4A1, a negative regulator of vascular smooth muscle27 proliferation, was among the most highly upregulated TFs in the mature ASM cells (cl-13) (Extended Data Fig. 5b). HHIP, a target and inhibitor of HH-signalling28, and the secreted BMP-inhibitor GREM2 (ref. 29) were enriched in the more mature ASM cluster (Extended Data Figs. 4a and 5d: modules −7 and −9), implicating regulation of these pathways during ASM differentiation.

Spatial analysis localized most clusters of this trajectory in distinct positions along the developing airways (Fig. 2d,e), indicating a link between the ASM maturation states and their topology, with most immature states located peripherally and the mature ones being closer to proximal airways, as in mouse lung15. Mesenchymal cl-0 and cl-2 were dispersed in the parenchyma (Fig. 1d and Extended Data Fig. 4b) and highly expressed WNT2 and RSPO2 (Extended Data Fig. 5a,d). This is consistent with defects in ASM differentiation caused by WNT2 inactivation in mice30. This suggests that precursors are evenly distributed in the peripheral parenchyma and begin to differentiate close to the bud tips.

### Two differentiation trajectories of lung fibroblasts

To complement the mesenchymal cell analysis, we focused on the two suggested fibroblast trajectories, based on the relation of the involved clusters (cl-4, cl-5, cl-16, cl-9 and cl-10) in PAGA plot (Fig. 2a and Extended Data Fig. 5e,f). ST analysis showed that cl-16 is localized around the airways, as early as 6 PCW (Fig. 2b (6)). This cluster is negative for ACTA2 but expresses markers of other adult stromal cell types, such as ASPN for myofibroblasts, SERPINF1 for AdvFs2 and COL13A1 characterizing a recently described lung fibroblast type found in human and mouse31,32,33 (Extended Data Fig. 4a). Its unique profile and close proximity to the ASM layer (Fig. 2e,f) argued that cl-16 corresponds to an undescribed mesenchymal cell type, which we named ‘airway fibroblast (AF)’. On the other hand, AdvFs were localized in bronchovascular bundles, at greater distance from the airways than AFs (Fig. 2b (5)).

scVelo and Slingshot analyses (Extended Data Fig. 5e,f) indicated that the immature fibroblasts of cl-4 either transit to immature AF2 (cl-5) and then to the mature AFs (cl-16) or produce the immature AdvFs (cl-9), which mature to the cl-10. WNT2 and FGF10 were expressed in the immature fibroblasts, similarly to the other immature mesenchymal clusters (Extended Data Fig. 5d) but the Netrin-receptor DCC is more selective for all three immature mesenchymal clusters and especially cl-4, suggesting a decline as differentiation proceeds (Extended Data Fig. 5g and Supplementary Table 1 (5)). Similarly, immature cells expressed DACH1 and ZBTB16, whereas MECOM was gradually increased along the AF trajectory and the BMP-signalling targets ID1 and ID3 (ref. 34) along the adventitial one (Extended Data Fig. 5h). Different secreted ECM proteins such asTNC, ASPN and collagens were differentially expressed along the trajectories (Extended Data Fig. 5i). This suggests distinct roles of the embryonic lung fibroblast types in the creation of the ‘scaffolding’ substrates for resident lung cells.

### AF interactions with smooth muscle

Focusing on the AF trajectory, there was a gradual increase of markers such as COL13A1 and SEMA3E35 in mature cl-16 (Extended Data Fig. 4a). Spatial analyses showed that AFs surround the ASMs, with cl-16 located most proximal to ASM (Fig. 2e,f) and the more immature AF state (cl-5) in more peripheral positions (Fig. 2e). To explore potential communication routes between AF and ASM, we focused on signalling pathways emanating from the one and targeting the other (Extended Data Fig. 6a,b). IGF, WNT and BMP pathways were among the most prominent ones (Extended Data Fig. 6c–e). The IGF1 was mainly expressed in immature ASM2 (mes cl-12), as early as 5 PCW and increased over time (Extended Data Fig. 6f,g). The expression of the corresponding receptor, IGF1R was also evident at that stage, in immature AFs (mes cl-5) showing relatively stable expression until 14 PCW. The predicted IGF1-target gene, LUM, was expressed by AFs (Fig. 2g and Extended Data Fig. 6c) and may facilitate the alignment and formation of collagen bundles around proximal airways, as previously reported36. WNT5A was produced by ASM cells and targeted AFs through the FZD1 receptor, in a communication pattern that intensifies overtime, as indicated by the gradually elevated expression of both proteins (Extended Data Fig. 6d,g,h). Our computational predictions suggested BMP4 as a WNT5A target (Extended Data Fig. 6d), in agreement with previous in vitro experiments37. BMP4 is in turn predicted to upregulate ACTA2 expression in ASM38, suggesting a positive feedback loop, between adjacent AFs and ASM (Extended Data Fig. 6e). Our results identify AFs as an undescribed cell type in contact with ASM and suggest their mutual signalling interactions.

### SCPs produce lung parasympathetic neurons

The trachea and lungs are innervated by the vagus nerve, containing sympathetic, parasympathetic and sensory neurons. These fibres comprise a pre-ganglionic and a post-ganglionic compartment39,40. Only parasympathetic ganglia are localized inside the lung, close to the airways, containing the somata of post-ganglionic neurons that innervate the ASM41 and regulate bronchoconstriction40. The source for parasympathetic neurons in mice42,43 is the neural crest-derived Schwann cell precursors (SCPs), which migrate towards trunk and cephalic ganglionic positions to differentiate into neurons, in an ASCL1-dependent process42.

Subclustering of neuronal cells revealed eight cell states, which can be ordered into one main differentiation trajectory, resembling the transition of SCPs to neurons (Fig. 3a,b). The dataset also contains proliferating SCPs (cl-1, cl-5 and cl-7) (Extended Data Fig. 7a and Supplementary Table 1 (6)). The neuronal cl-0 and cl-3 gradually lose SCP-marker expression while increasing ASCL1, suggesting transient states from SCPs to neurons. cl-2 and cl-6 expressed the neuronal markers PRPH, NRG1 and PHOX2B (Extended Data Fig. 7a), together with the acetylcholine receptors M2 and M3 (CHRM2 and CHRM3) and the nicotinic acetylcholine receptor subunits α3 and α7 (CHRNA3 and CHRNA7). This suggested that they can respond to acetylcholine. Similarly, they expressed acetylcholinesterase (ACHE) and SLC5A7, encoding the high-affinity choline transporter for intraneuronal acetylcholine synthesis44 (Extended Data Fig. 7b). However, the lack NOS1 and VIP (Extended Data Fig. 7a) suggests that they are still immature parasympathetic neurons.

Stereoscope analysis detected the collective signature of both SCPs and neuronal cells in the trachea at 6 PCW (Fig. 3c). Intra-lobar signal was first detected close to the trachea at 7 PCW (Fig. 3d, asterisk). At later timepoints the signal was detected more centrally, within the bronchovascular bundle interstitium19, coinciding with a distinct haematoxylin and eosin (H&E) staining pattern (Fig. 3e) that overlaps with the protein expression of the SCP and neuronal markers PHOX2B, DLL3 and NEFM (Fig. 3f). This suggests that the SCPs, presumably deriving from neural crest, enter the lung and mature to parasympathetic neurons in ganglia embedded in the bronchial interstitium.

To explore the cellular composition and differentiation states in the proposed embryonic ganglia we first stained for PHOX2B (SCPs and neurons), DLL3 (differentiating neurons45) and NF-M (mature neuron projections) (Fig. 3g,h). At 8.5 PCW, we found several clusters of PHOX2Bpos cells in NF-Mpos domains, that contained some DLL3pos cells, which would correspond to differentiating neurons. We further explored this by analysing the characteristic TFs SOX10, ASCL1 and ISL1, which are sequentially activated along the trajectory (Extended Data Fig. 7c–e). We detected SOX10pos SCPs, SOX10pos-ASCL1pos neuronal precursors and ISL1pos neurons, consistent with the differentiation steps proposed by the pseudotime analysis. The selective expression of ASCL1 and DLL3 in subclusters of the ganglionic cells prompted us to interrogate the expression of NOTCH-signalling pathway genes in the clusters (Fig. 3i). The selective expression of JAG1 in SCPs suggested that it activates NOTCH signalling in parasympathetic ganglia, similarly to its role in mouse limb nerves, which also derive from neural crest46.

### Early developmental trajectories of epithelial differentiation

We subclustered epithelial cells into 15 groups (Fig. 4a) and annotated them on the basis of known markers (Extended Data Fig. 8a and Supplementary Table 1 (7)), spatial distribution (Fig. 4b and Extended Data Fig. 8b) and their trajectory relationships illustrated by PAGA plot and scVelo analyses (Extended Data Fig. 8c,d). We detected four distal cell identities (cl-10, cl-2, cl-3 and cl-9) and seven proximal ones, corresponding to ciliated (cl-14), secretory (cl-0), neuroendocrine (NE) cells (cl-11 and cl-12) and their progenitors (cl-6, cl-7 and cl-4). We also found an intermediately located population (cl-1) and three proliferating cell states (cl-8, cl-13 and cl-5), which were preferentially localized in distal airways (Extended Data Fig. 8b). Surprisingly, we did not detect any cluster with characteristic basal cell features but only a few TP63pos cells within cl-7, being negative for typical embryonic47 or adult2 basal markers (Extended Data Fig. 8e,f). Similar to the scRNA-seq analysis, immunofluorescence of 8.5 and 14 PCW lung sections showed TP63pos cells in large airways with only a small fraction being KRT5pos at only 14 PCW (Extended Data Fig. 8g). This suggests that basal cells begin to differentiate at 14 PCW in the intra-lobar airways.

In distal airways, epithelial cl-2, cl-3, cl-9 and cl-10 were positive for SOX9 and ETV5 (refs. 6,48) (Extended Data Fig. 8a,b and Fig. 4b,c). Among them, cl-2 and cl-10 cells highly expressed SOX9 and were located in the most distal part of the bud tips. Trajectory analyses (Extended Data Fig. 8c,d) and their topology suggested that they function as the source of the remaining two distal clusters, which were predominantly composed of later-timepoint cells (>10 PCW) (Extended Data Fig. 9a). Accordingly, cl-9 included SFTPChigh cells co-expressing ACSL3, which participates in lipid metabolism49, a prerequisite for surfactant biosynthesis50 (Extended Data Fig. 9b,c). By contrast, cl-3 cells were found scattered in the distal epithelium as early as 5 PCW (Extended Data Fig. 8b) and expressed elevated CTGF levels (Extended Data Fig. 9d), a growth factor implicated in mouse alveolar development51 and in stimulation of fibroblasts during mouse lung fibrosis52. Immunofluorescence for KRT17, another cl-3 selective marker (Extended Data Fig. 8e) confirmed the existence of sparsely distributed Ecadpos KRT17pos cells in the 14 PCW distal airway epithelium (Fig. 4d). Overall, these cells share gene expression similarities with ‘basaloid’ cells (Extended Data Fig. 9f,g and Supplementary Table 1 (8)), a pathogenic cell state in interstitial pulmonary fibrosis4,53. However, the embryonic clusters are distinguished by marked differences, as they are TP63neg and are localized in the luminal rather than basal part of the epithelium (Fig. 4d).

### Cell communication patterns in the distal lung compartment

We utilized the definitions of cell neighbourhoods (Fig. 1c) to explore candidate cell communication pathways in the distal lung compartment (Viewer: CellChat). FGF signalling was among the most prominent predictions (Fig. 4e) with FGF10 being mainly expressed in scattered mesenchymal cells (cl-0) around the epithelium (Fig. 4f and Extended Data Fig. 4b). This expression pattern differs in the mouse embryonic lungs, where FGF10 is focally expressed at the bud tips to induce branching54. This difference might explain why FGF10 induces cyst formation instead of branching in human explants55. Additional FGF-ligand genes (Fig. 4f,g) were detected in the distal epithelium, defining both mesenchymal and epithelial cells as sources. For example, FGF18 and FGF20 were detected in distal epithelium by both scRNA-seq (cl-2, cl-3, cl-9 and cl-10) and HybISS. The localized expression of FGFR2, FGFR3 and FGFR4 agreed with an independent study55. Potential FGFR downstream targets, such as ETV5 (ref. 56) and SPRY2 (ref. 57), were detected in distal epithelium, suggesting a potential epithelial-intrinsic function for FGF signalling (Fig. 4f,g). Another prominent predicted target of epithelial FGFR activation is SOX9 (Extended Data Fig. 9h), consistent with its reported regulation by FGF/Kras48,55.

### Distinct steps in proximal airway cell differentiation

The secretory (cl-0 and cl-4), ciliated (cl-14) and NE (cl-11 and cl-12) clusters were located in the most proximal airway positions. However, their putative progenitors (cl-6 and cl-7) were found in slightly more distal positions (Fig. 4b, Viewer: HybISS). The FOXJ1pos cl-14 cells expressed only early ciliogenesis genes, suggesting an early differentiation state (Extended Data Fig. 9i and Supplementary Table 1 (24)). The major difference between secretory cl-0 and cl-4 was the high levels of HOPX and KRT17 in cl-4 (Extended Data Fig. 8a), which also expressed activated epithelial markers (Extended Data Fig. 9g), similar to the distal epithelial cl-3. These cl-0 and cl-4 cells showed similar spatial distribution (Fig. 4b and Extended Data Fig. 8b), but cl-4 was enriched for migration-related genes (Extended Data Fig. 9j and Supplementary Table 1 (25)). Thus, cl-4 may correspond to a transient progenitor state giving rise to the ‘default’, static airway secretory cl-0. PAGA plot (Extended Data Fig. 8c) and pseudotime (Fig. 5a,b) analyses suggested that cl-6 cells can function as a source for either secretory cl-0 or NE-progenitor cl-7 cells, which further progresses towards the NE cl-12 and cl-11 states. Differential expression analysis along the two trajectories identified 569 genes that were grouped in nine modules (Supplementary Table 1 (18), top 10, and Fig. 5c). Among the earliest activated genes in the secretory trajectory, we detected YAP1 and the WNT extracellular inhibitor GPC5 (Fig. 5c, module 6) (refs. 58,59). These were followed by increased levels of the characteristic secretory marker SCGB3A2 and the NOTCH-signalling targets HES1 and HES4 (Fig. 5c, module 9), further arguing for an evolutionary conserved role of NOTCH-signalling in airway secretory cell differentiation60 and maintenance61.

### Distinct topologies and possible functions of NE identities

In the NE trajectory, cl-7 probably represents a progenitor expressing low levels of ASCL1, a critical factor in NE cell differentiation62 (Fig. 5c, module 4). The differentially expressed TFs along the secretory and NE trajectories included the direct ASCL1-target, MYCL63, which was transiently expressed along the NE trajectory (Fig. 5d and Extended Data Fig. 9k). The NE progenitor cl-7 was connected by few cells with the NE2 (cl-12), creating a stalk that splits in two directions, one towards the remaining NE2-cells and the other towards NE1-cells (cl-11) (Fig. 5a). In this part, gene module 4 contained ASCL1, its direct target IGFBP5 (ref. 64), together with HES6 (ref. 65) (Fig. 5c). Finally, at the part towards NE1 cells, module 1 contained NEUROD1 (Extended Data Fig. 9l), its target HNF4G63 (Fig. 5c, module 1, and Extended Data Fig. 9m) and SSTR2 (Fig. 5c, module 1). Gene expression comparison between cl-11 and cl-12 (Extended Data Fig. 9n and Supplementary Table 1 (9)) showed that cl-12 produces the characteristic pulmonary neuropeptides GRP and CALCA together with SST, whereas cl-11 expresses GHRL and CRH. Gene Ontology (GO) analysis for enriched biological processes suggested hormone secretion (GO:0030072) and neuronal axon guidance (GO:0007411), as characteristic terms for cl-11 compared with cl-12 (Extended Data Fig. 9o,p and Supplementary Table 1 (26, 27)). The NE1 cells (cl-11) resemble a recently identified NE cell type in human embryos7.

To investigate the spatial arrangement of NE clusters, we used SCRINSHOT to detect a panel of 31 genes, encompassing NE, epithelial and mesenchymal markers (Extended Data Fig. 10a–d). We defined NE-specific patterns by segmenting the sections in hexagonal bins (7 μm width), approximating the size of epithelial cells. Among 20,351 bins expressing general epithelial and characteristic NE genes (Methods), we found three main NE-associated categories, corresponding to NE-progenitors, GRPpos and GHRLpos NE-cells in situ (Extended Data Fig. 10e,f). These expression patterns match the ones of scRNA-seq analysis. GHRLpos NE-cells were located exclusively in the most proximal airways, while NE progenitors and GRPpos NE-cells were less restricted in their location along the airway proximal–distal axis (Extended Data Fig. 10d,g). Immunofluorescence analysis confirmed that GRPpos and GHRLpos NE cells are differentially distributed along the airways (Extended Data Fig. 10h).

As different levels of graded NOTCH-signalling activation are required for NE and non-NE cell-fate specification in the airway epithelium66, we interrogated the proximal clusters for the expression of NOTCH-signalling genes (Fig. 5e). Both NE clusters (cl-11 and cl-12) expressed HES6 (a pathway target and inhibitor65). However, cl-12 expressed higher levels of JAG1 and DLL3 (a NOTCH cell-autonomous inhibitor67), in addition to low levels of JAG2 and DLL1. This suggests that cl-12 cells are a source of NOTCH signalling and that they are less capable of receiving it. The downregulation of DLL3 might be permissive for lower NOTCH-signalling activation, contributing to the cl-11 gene-expression programme defined by the NEUROD1, RFX6, HNF4G and NKX2-2 TFs (Fig. 5d and Extended Data Fig. 9l,m). Upstream, in the trajectory, at the bifurcation of secretory (cl-6) and NE-progenitor (cl-7) states, the repressor REST68 and the receptor NOTCH2 showed similar expression levels, but HES6 and NOTCH1 were higher expressed in the NE-progenitor cluster, suggesting differences in strength or duration of NOTCH signalling69,70. NOTCH2 activation in proximal progenitors (cl-6) is expected to be more potent69,70, promoting the secretory differentiation.

Overall, the pseudotime analysis suggests two sequential but distinct NOTCH-signalling events, utilizing different ligands and intracellular effectors: one promotes secretory differentiation, and the other controls the transition of cl-12 to cl-11 (Fig. 5f). Further interactome analysis revealed another unique communication pattern between the two NE clusters involving somatostatin (SST) expressed by cl-12 and its receptor SSTR2 in cl-11 (Fig. 5g,h).

In summary, we mapped the distinct topologies and developmental trajectories of airway secretory and NE identities from naïve epithelial cells in the embryonic lung. Each trajectory contains distinct candidate regulators of NOTCH signalling for the respective cell-state transitions.

### Mesenchymal cell zonation patterns along two airway axes

Stromal cell populations in fully grown lungs show distinct distributions along the proximal–distal axis of the airways2. They also show specialized radial arrangements surrounding each major airway, with ASM adjacent to the epithelium (centre) and AdvFs and chondroblasts positioned more peripherally. To explore the spatial organization of different mesenchymal trajectories (AF, ASM and AdvF) relative to the growing airways on the tissue level, we defined two axes. A proximal–distal one, which was defined by the graded expression of proximal (SOX2 and SCGB3A2) and distal (ETV5 and TPPP3) epithelial genes, validated by HybISS (Methods) and a radial one, extending from the airway centre towards peripheral positions in the mesenchyme. We positioned the ST spots and HybISS-annotated cells corresponding to immature and differentiated states of AdvFs (mes cl-10), ASM (mes cl-13) and AFs (mes cl-16) relative to these two airway-dependent axes (Fig. 6 and Methods). This analysis revealed that the immature cell states occupy predominantly distal and peripheral positions relatively to the airway branches. By contrast, the more mature mesenchymal clusters are found proximally and centrally located. In particular, the most immature ASM clusters (cl-0, cl-2 and cl-6) were the most peripheral. More differentiated clusters (cl-8, cl-20 and cl-12) were found closer to the airways and in more proximal positions, whereas the most mature ASM (cl-13) was found proximal and tightly associated with the airways. At all three consecutive timepoints (6, 8.5 and 11.5 PCW), the immature fibroblast (mes cl-4) was consistently found more proximal compared with the ASM progenitor clusters (viewer: ST). This argues for the presence of a peripheral central zone of mesenchymal progenitors giving rise to AdvFs, AFs and chondroblasts and reveals an early origin of radial patterning in the mesoderm. We suggest that undifferentiated cells from the distinct progenitor regions proliferate and continuously differentiate while migrating radially towards the centre and their functional positions, similarly to the model of the mesenchymal progenitor niche in the mouse lung15.

### Cell heterogeneity and possible communication patterns

The spatial probabilistic methods (PciSeq71 and Tangram) generated systematic spatial maps of several stages, showing the cellular composition of distinct organ compartments over time (Fig. 7a). On the tissue level, this allows the definition of spatial rules of tissue organization and estimation of developmental origins by interrogating the relative positions of pseudotime trajectories. A graphical representation of the developing lung shows a summary of mature and intermediate cell states, localized in distinct tissue positions, creating cell ‘neighbourhoods’ with specific communication patterns (Fig. 7b).

We integrated our scRNA-seq data with the HybISS, ST and SCRINSHOT spatial analyses, together with the CellChat results in the TissUUmaps viewing tool (https://hdca-sweden.scilifelab.se/tissues-overview/lung/). This portal provides an open interactive atlas of early lung development that directly facilitates exploration, sharing and hypothesis building.

## Discussion

We have generated a systematic topographic atlas of the developing human lung, combining gene expression profiling by scRNA-seq with spatially resolved transcriptomics on intact tissue sections. We identified 83 cell states and inferred developmental trajectories leading to a remarkable heterogeneity reflecting the structural and functional complexity of the lung. Although we present an extensive analysis of weekly intervals during the first trimester, our data have a few limitations. Our first datapoint is at 5 PCW and we analysed only about 180,000 cells. Earlier and broader sampling is likely to uncover additional diversity and infer more precise trajectories than the proposed ones. We aimed to collect and analyse freshly dissociated cells, omitting tracheas, without enrichment for specific populations. The lack of enrichment may have hampered detection of rare, fragile or difficult-to-dissociate cells. Indeed, we detected chondroblasts and mesothelial cells only in the samples deriving from earlier timepoints. We performed iterative clustering, where a conservative first clustering was followed by subclustering of the major populations. Although most of the subclusters showed distinct topologies and gene expression profiles, some of the cell states may result from overclustering, which is difficult to define because of the presence of immature but committed states of distinct cell types. Finally, we have described the spatial diversity of the developing lung mainly at the messenger RNA level, relating this diversity to the proteome and further to physiological functions remains a future task.

We suggest that the diversity of gene expression patterns in the developing human lung can be explained at distinct but hierarchically coupled levels. First, the major cell classes of epithelial, endothelial, immune, stromal and neuronal cells are characterized by distinct gene expression programmes of their ancestries from distinct germ layers: endoderm, mesoderm and ectoderm. We show several levels of subdivisions in each of these classes, during the first trimester. For example, within the endothelial group there are lymphatic, venous, arterial, bronchial and capillary clusters characterized by distinct regulatory and functional gene-expression profiles (Supplementary Note 1). Second, some cell clusters show region-specific gene expression profiles, presumably reflecting their developmental history. This is exemplified by the separation of proximal and distal compartments in the epithelium. The SOX2pos-proximal and the SOX9pos-distal domains are specified earlier and are maintained during the glandular stages. This suggests that transcriptional networks are conveyed into the later diversification of more specialized cell states specific to each region. Our spatial analysis illustrates this by the striking correlation of characteristically different radial arrangements of AFs and ASM states along different positions of the epithelial proximal–distal axis. This suggests that the different values of the proximal–distal axis intersect with distinct values of a radial axis visualized by the organization of surrounding smooth muscle and fibroblast states. The potential regulatory relationships between these axes are unknown. A third level of diversification results from cell communication patterns within local environments reflecting inducible or transient regulation of gene modules. The integration of single-cell sequencing with ST data defined specific neighbourhoods for most of the cell states. Our curated interactome analyses predicted several known and new examples of this organization level. They include the activation of NOTCH signalling between the SCP and neuronal states46, within parasympathetic ganglia.

Lung diseases are major causes of death worldwide72. An outstanding challenge for medical research is to define deviation points from normal cellular trajectories at the start and during the advancement of lung pathologies and to analyse cellular responses after treatments73. Our atlas of early human lung development revealed several distinct cell states and proposed their interactions with neighbours and progression along differentiation trajectories.

As single-cell analysis technologies are increasingly used in the description of detailed cell-state trajectories in disease, we believe that our integrated scRNA-seq data, with spatially resolved transcriptomics and local interactome analyses in an open, interactive portal will provide a useful resource towards understanding and reversal of pulmonary disease progression.

## Methods

### Human lungs

The tissue donors were recruited among pregnant women after their decision to terminate their pregnancy. The referral to hospitals was done by a central office for all abortion clinics in the Stockholm region, and according to our information it was random. The recruitments were done by midwifes who were not involved in the conducted research. Thus, there was no bias regarding which women were recruited. Inclusion criteria: 18 years of age or older and fluent in Swedish. Exclusion criteria: abortions performed for any medical reasons, by socially compromised women and/or by women showing any signs that the consent may not be informed. All women provided written consent for tissue usage for research purposes and for their ability to withdraw their consent at any time. There was no compensation to the tissue donors.

The use of human foetal material from the elective routine abortions was approved by the Swedish National Board of Health and Welfare and the analysis using this material was approved by the Swedish Ethical Review Authority (2018/769-31). After the clinical staff acquired the informed written consent by the donor, the retrieved tissue was transferred to the research prenatal material. The lung samples were retrieved from foetuses between 5 and 14 PCW.

### Tissue treatment for spatial analyses

One of the two lungs (preferentially the left), from each donor, was snap frozen in cryomatrix and further used for histological analyses. We cut 10–12-μm-thick tissue sections with a cryostat (Leica CM3050S or analogue) and collected them onto poly-lysine-coated slides (VWR cat. no. 631-0107) for SCRINSHOT and immunofluorescence or Superfrost Plus (VWR cat. no. 48311-703) for in situ sequencing (ISS). Sections were left to dry in a container with silica gel or at 37 °C for 15 min and then stored at −80 °C until usage.

### Tissue dissociation of human embryonic lungs

For tissue dissociation, tracheas were removed and lungs were finely minced. For later timepoints, lobes were first dissected into smaller pieces. Then, they were digested in 4 U ml−1 Elastase (Worthington, cat no. LS002292), 1 mg ml−1 of DNase (Worthington, cat. no. LK003170) in Hanks’ balanced salt solution (HBSS) (Gibco, cat. no. 14170) at 37 °C ranging between 30 min and 3 h depending on age (older timepoints require longer digestion times). HBSS supplemented with 2% fetal calf serum (FCS) (Gibco, cat. no. 10500064) was used for the whole procedure. The tissues were triturated with glass Pasteur pipettes every 15–20 min to enhance dissociation. After digestion, the cell suspension was filtered in a 15 ml Falcon tube using a 30 μm cell strainer (CellTrics, Sysmex), to remove clumps and debris. The cell suspension was kept ice cold and was diluted (roughly 1:2) with ice-cold HBSS. The filtered cells were pelleted at 200g for 5 min at 4 °C and the pellet resuspended in a small volume of calcium- and magnesium-free HBSS (Gibco, cat. no. 14170) and transferred to 1.5 ml Eppendorf tubes pre-coated with 30% BSA (A9576, Sigma-Aldrich). A Bürker chamber was used for cell counting.

### scRNA-seq of human embryonic lung cells

scRNA-seq was carried out with the Chromium Single Cell 3′ Reagent Kit v2 and v3. Cell suspensions were counted and diluted to concentrations of 800–1,200 cells μl−1 for a target recovery of 5,000 cells on the Chromium platform. Downstream procedures including cDNA synthesis, library preparation and sequencing were performed according to the manufacturer’s instructions (10X Genomics). Libraries were sequenced on an Illumina NovaSeq 6000 (Illumina). We aimed to obtain 75,000 and 200,000 sequencing reads per cell for the v2 and v3 libraries, respectively, to match the different performances of the Chromium Single Cell 3′ Reagent v2 and v3 Kits and to achieve sufficient sequencing saturation. Across all 39 libraries we obtained an average of 187,242 reads per cell. Reads were aligned to the human reference genome GRCh38-3.0.0 and libraries were demultiplexed and aligned with the 10X Genomics pipeline CellRanger (version 3.0.2). Loom files were generated for each sample by running Velocyto (0.17.17) (ref. 76) to map molecules to unspliced and spliced transcripts.

### Bioinformatic analysis for scRNA-seq

All *.loom files were imported to R as ‘Seurat objects’, using the ‘connect’ function of the loomR package and the ‘as.Seurat’ function of SeuratDisk for *.loom files >3.0.0 (refs. 77,78). The counts were obtained using the ‘ReadVelocity’ function of SeuratWrappers package and we created objects with ‘merged’, ‘spliced’, ‘unspliced’ and ‘ambiguous’ counts.

The scRNA-seq datasets from the same donor that were sequenced in the same sequencing run were merged to create donor-specific objects. The only exception was the cells of donor 17 that were analysed as two individual datasets because 10 × 256 was sequenced after 10 × 253, but we identified no ‘batch effect’ separating its cells from the others of the same donor (‘10 × 253’ and ‘10 × 256’ in Viewer).

The individual donor datasets were analysed separately using Seurat package in R, to inspect their quality. Firstly, we removed the cells with low and high number of detected genes, based on their histogram distribution (likely cell fragments and multiplets, respectively). Next, we ran the DoubletFinder package79 to identify and remove possibly cell multiplets, considering that 4% of the analysed cells are multiplets.

To integrate the resulting datasets of 163,000 cells, we used the SCTranform function in Seurat, with 5,000 variable genes. We used 5,000 integration features for the dataset integration, setting as reference dataset the donor 17 that corresponds to the oldest timepoint of our analysis (14 PCW). We observed no profound clustering of the cells according to the examined technical covariates, like the utilized 10X Genomics chemistry or the donor identity, especially for those of the same age (Viewer).

The principal component analysis (PCA) was based on the first 100 top principal components (PCs). For definition of the neighbourhood graph and the clusters, we used the default settings of ‘FindNeighbors’ and ‘FindClusters’ functions of Seurat77,78, with 100 PCs. For identification of cluster selective markers, we used the ‘FindAllMarkers’ function77,78, with MAST80 statistical test and maximum cell number/cluster set to 126, which corresponds to the smallest suggested cluster. To accept a gene as a cluster marker, it had to be expressed in at least 25% of the cells in the cluster, have 0.1 logarithmic fold increase and be expressed in at least 10% more cells in the cluster than the remaining dataset. We also selected the statistically significant markers (adjusted P value <0.001, after Bonferroni correction) for all downstream analyses.

For the analysis of (1) epithelial, (2) endothelial and (3) immune cells, we selected the corresponding clusters of the 163,000 cell dataset and harmonized the cells according to the donor parameter, using the ‘PrepSCTIntegration’ function in Seurat with default settings and 5,000 features (genes) and regressing out stress-related genes (‘AddModuleScore’ function in Seurat)81,82, that have been previously shown to get induced by enzymatic tissue dissociation at 37 °C (ref. 83). Because of the large size of mesenchymal cell subset (>138,000 cells), we used donor 17 as a reference dataset for the harmonization of the different donor datasets. Especially for the analysis of the neuronal cells, we selected the donor datasets with more than 29 cells, that facilitated their decent integration (5 PCW: 49 cells, 5.5 PCW: 187 cells, 6 PCW: 169 cells, 7 PCW: 227 cells, 8 PCW: 38 cells, 8.5 PCW: 52 cells and 14 PCW: 30 cells). The selected 752 cells were further processed as all other categories.

For dimension reduction and clustering of the above main cell-type categories, we applied the same approach as with whole dataset but with the first 50 PCs.

To further filter the cells for possible multiplets, we firstly normalized the counts to 10,000 and then we removed possible red-blood contaminants, setting expression of HBA1 <4, when necessary. For each of the epithelial, endothelial and immune datasets, we detected a cluster that expressed mesenchymal cell markers. Taking into account that (1) mesenchymal cell number is 12 times larger than epithelial, 21 times larger than endothelial and 33 times larger than immune cell number and (2) it is unlikely for immune cells to express mesenchymal cells markers, we considered these clusters doublets and removed them.

For trajectory inference analysis of complex multicellular developmental tissue architecture, we guided our analysis towards understanding key lineage branching points inspired by the graph abstraction concept. We used the cell–cell unweighted shared nearest neighbour graph (G {0,1}cDaN × N) and their assigned one-hot clusters (O {0,1} N × k) to compute for each cluster k the number of edges shared with all clusters (Ek × k), including itself.

$$E = \left( {GO} \right)TO$$

The number of cluster shared edges was then element-wise normalized by its total number of edges (Hadamard division), resulting in transition probabilities (P [0,1] k × k) that range between 0 and 1 for each cluster, representing the proportion of connections shared between each cluster, where J{1} k × k is a square all-ones matrix.

$$P = E \oslash \left( {E \cdot J} \right)$$

Spurious weak connections with transition probabilities below 10−4 were filtered out by setting its value to zero. Edges were then projected onto the cluster centroids on the UMAP embedding for visualization. Cluster transition probabilities on existing edges (p ij > 0) were converted to graph weights (w ij) defined by the inverse of transition probabilities:

$$w\,ij = 1/\left( {p\,ij} \right)$$

and optimal paths from immature (that is, root) to mature cell states were calculated using Dijkstra’s shortest path algorithm implemented in the igraph package84. The indicated clusters, for distinct trajectories, were selected and re-analysed to create a new UMAP plot with ‘RunUMAP’ function in Seurat77,78. The Slingshot package was used for pseudotime analysis. Firstly, we set the root and the end-point clusters with ‘getLineages’ function, and then we calculated the principal curves (‘getCurves’ function), the pseudotime estimates (‘slingPseudotime’ function) and the lineage assignment weights (‘slingCurveWeights’ function). To identify differentially expressed genes along the trajectories, we used the ‘fitGAM’ function of tradeSeq. ‘patternTest’ was used for the analyses of two trajectories and the ‘associationTest’ function for the differential expression analysis along one trajectory. The differentially expressed genes were ordered on the basis of the hierarchical clustering ward.D2 method, using ‘hclust’ function in fastcluster package85 and plotted using a custom script. The ‘clusterboot’ function of fpc package86 was used to calculate stability values of gene modules. For the RNA-velocity analyses, we transformed the Seurat objects to *.h5ad with SeuratWrappers and used scVelo pipeline, filtering for 50 ‘shared counts’ and 5,000 ‘top genes’. As described in the pipeline, the analyses used the packages scvelo, cellrank87 loompy, matplotlib88, numpy89, pandas90 and scanpy91.

For the analyses of aberrant basaloid4 gene expression programmes in the scRNA-seq dataset, we used the ‘AddModuleScore’ function in Seurat77,78 to calculate the aggregated gene-expression scores of their characteristic markers, as they have been defined in the corresponding studies.

For the identification of TFs and co-factors, between the differentially expressed genes, we used the AnimalTFDB 3.0 database92. The Human Protein Atlas was used for screening of secreted and surface (CD) proteins93, and Neuropedia database was used to find differentially expressed neuropeptides94. Statistically significant (adjusted P value <0.001, average logarithmic fold change >0.25) genes were used in Toppgene suite95, for GO analyses, with default settings. Their P values were calculated according to the hypergeometric probability mass function, and the top-ten biological processes were plotted with GraphPad Prism 9 (GraphPad Software, LLC).

### ST

The capture areas of Visium arrays contain 55-µm-diameter spots, with barcoded oligo-dT anchors (unique for each spot) that allow hybridization of the mRNA molecules in a tissue section that are released through its digestion. The anchors are used as primers to facilitate cDNA synthesis and the produced libraries are sequenced. The unique barcodes for each spot allow the spatial resolution of the detected mRNA-species back the tissue, using the spot coordinates.

### ST library preparation

Spatial gene expression libraries (n = 9) (6–13 PCW) were generated with the Visium Spatial Gene Expression Slide & Reagent kit (PN-1000184; 10X Genomics), according to manufacturer’s protocol. Before the analyses, RNA integrity numbers (RIN) were obtained for all samples to assess the quality of the RNA.

Depending on the size of each section, one or more sections of the same sample were placed in each capture area (6.5 × 6.5 mm) of the Visium arrays. The sections were first fixed for 10 min in acetone, stained with Mayer’s H&E Y and imaged with a Zeiss Imager.Z2 Microscope (Carl Zeiss Microscopy GmbH), using the Metafer5 software MetaSystems Hard & Software GmbH). Depending on the age of the lung, the tissue sections were permeabilized for 8–20 min to capture the mRNA molecules. The optimal fixative and permeabilization time for developing lung samples was determined before the Visium experiments using a Visium Spatial Tissue Optimization Slide & Reagent Kit (PN1000193; 10X Genomics). The cDNA synthesis and library preparation were done according to manufacturer’s protocol (PN-1000184 and PN-1000215; 10X Genomics). Sufficient amount of 2–4 nM concentration libraries was used for sequencing for Illumina platform, following the manufacturer’s instructions.

### ST data analysis

Sequenced ST libraries were processed using Space Ranger 1.0.0 Pipeline (10X Genomics). Reads were aligned to the human reference genome to obtain an expression matrix. The count matrix was filtered for all mitochondrial, ribosomal and non-coding genes. Spots with fewer than 300 unique molecular identifier (UMIs), fewer than 100 genes and genes detected in fewer than five spots were excluded from the analysis. After filtering, a total of 18,125 features were retained for final analysis across 66,626 spots (6 PCW: 1,439, 7 PCW: 2,692, 8 PCW: 1,840, 8.5 PCW: 1,882, 9 PCW: 3,284, 10 PCW: 11,720, 11 PCW: 15,534, 12 PCW: 13,287 and 13 PCW: 14,948).

Normalization and dimension reduction were performed using the Seurat and STUtility packages (version 0.1.0, https://ludvigla.github.io/STUtility_web_site/Installation.html). Technical variability across samples was reduced with RunSCT and RunHarmony (version 1.0, https://github.com/immunogenomics/harmony) functions. PCA was used to select the most important components and a total of 30 principal components were used in downstream analyses, in all cases.

### Integration of scRNA-Seq and ST data

For the integration between scRNA-seq and Visium data, we used the Python package stereoscope (v.03). This method uses scRNA-seq data to characterize the expression profile of each cluster and then find the combination of the clusters that best explains the detected gene mRNAs in every ST spot, using a probabilistic model. Thus, it produces a matrix with ST spots as rows and percentages of each cluster as columns.

Raw counts from the scRNA-seq and Visium data were used as input, along with the scRNA-seq cluster labels. For the scRNA-seq data from each donor, we used the top 5,000 most variable genes as input, obtained by the ‘VariableFeatures’ function in Seurat77,78. Stereoscope was run with 25,000 epochs with default parameters (more details in the ‘README’ file in package github page). For the integrated scRNA-seq, that is, all age groups, the entire set of scRNA-seq was used as input to each Visium sample individually and stereoscope was run with 20,000 epochs. For visualization, the output matrix was imported into R and the stereoscope proportion values for each ST spot were plotted as features with the STUtility R package (v.1.0) (ref. 96).

### Interactome analyses of spatially related cell identities

For the definition of cell neighbourhoods, that include cell identities being consistently found with high percentage in the same ST spots, we used the stereoscope data and performed Pearson correlation analysis comparing the frequencies of the different cell types in the analysed ST spots, across all samples and timepoints. We further proceeded with the pairwise connections, that had Pearson’s r higher than 0.04. The interactome analyses were based on (1) CellChat because of its ability to identify cell communications based on the interactions between ligands, receptors and co-factors and (2) Nichenet, which predicts cell communications by estimating ligand–target links, based on their expression levels in the interrogated cells, to identify signalling pathways that facilitate cell communications. We initially kept the genes with average gene expression >0.3 log2(normalized UMI counts + 1) in any of the analysed clusters and then used default settings for the downstream analyses. To analyse the predicted target genes of specific ligands, we used the ligand–target score matrix of NicheNet and selected the same genes as for CellChat, applying an extra filter by keeping the expressed genes in at least 25% of any of the clusters and have 10% increase in the number of positive cells and in the logarithmic fold change. Then, we used Seurat to plot the top-predicted genes, using ‘Dotplot’ function. The ligand and the identified by CellChat receptors were also included at the beginning of the plot.

### HybISS

ISS is a targeted method for detecting RNA species on tissue sections97,98. It utilizes padlock probes that upon specific hybridization to the targeted RNA molecule and enzymatically ligated to become circular. Rolling cycle amplification (RCA) is used to produce large DNA molecules of hundreds of complementary repeats of the padlock probe, that provides high signal-to-noise ratios. Multiplexing is achieved with a four-digit barcode approach that decodes distinct combinations of fluorescence of a given RCA product to the initial targeted RNA species, allowing for spatial expression analysis of several tenths of different genes.

### Gene panel selection

The HybISS gene panel was selected on the basis of two independent criteria: gene potential to be markers of the different identified populations and their role in different key signalling pathways. To select the minimum amount of marker genes needed to uncover the cell type of every cell in the analysed samples, an initial list of candidate marker genes was generated by selecting the top four markers of the main clusters found when analysing individually four samples from different timepoints (5 PCW, 8.5 PCW, 13 PCW and 14 PCW), based on their δpct (difference in the percentage of positives in the cluster against all other cells). This list was curated by assessing the importance of every gene in accurately predicting the different cell types (https://github.com/Moldia/Tools/tree/master/Gene_selection). For this, ISS datasets were simulated by randomly distributing cells in a bidimensional space, assigning a cell type to each cell and simulating the expression of each gene by sampling in a negative binomial distribution with r being the mean expression of a certain gene in a certain cell type. Then, probabilistic cell typing by ISS (pciSeq) was used to assess the cell type of each simulated cell, obtaining the contribution of each gene to predict correctly each cell type. Top-five genes contributing to correctly predict each cell type were kept, and further simulations were run, obtaining a final list of 72 genes that were able to predict correctly all the cell types on simulated datasets. For the pathway gene selection, we interrogated the above four scRNA-seq datasets for the expression of WNT, SHH, NOTCH and RTK pathway components, such as ligands, receptors, transducers, inhibitors and targets. We further proceeded with those that showed non-ubiquitous expression patterns. The final gene panel of 147 markers was sent to CARTANA with accompanying customized ID sequences for in-house HybISS chemistry detection.

### HybISS mRNA detection

The HybISS experiments were performed by the ISS facility at Science for Life Laboratories (SciLifeLab) following the manufacturer’s instructions of CARTANA’s High-Sensitivity library preparation kit, using customized backbones, as described in ref. 97 (probe sequences are provided in Supplementary Table 1 (28–30)). After fixation, the tissue sections were overnight incubated with the probe mix, in a hybridization buffer, followed by stringent washing. Then, they were incubated with ligation mix. After washes, RCA was performed overnight. Finally, labelling for detection was performed as described in <protocols.io> (https://doi.org/10.17504/protocols.io.xy4fpyw). Twelve detection cycles were performed on each sample to avoid optical crowding. Therefore, detected genes were divided in three groups, and their four cycle-based barcode was detected in either detection cycles 1–4, 5–8 or 9–12.

### Imaging of HybISS detection cycles

Imaging was performed using a Zeiss Axio Imager.Z2 epifluorescence microscope (Carl Zeiss Microscopy, GmbH), with a Zeiss Plan-Apochromat 20×/0.8 objective (Carl Zeiss Microscopy, GmbH, 420650-9901) and an automatic multi-slide stage (PILine, M-686K011) to allow re-call of coordinates for the regions of interest, facilitating repetitive cycle imaging. The system was equipped with a Lumencor SPECTRA X light engine LED source (Lumencor), having the 395/25, 438/29, 470/24, 555/28, 635/22 and 730/40 filter paddles. The filters, for wavelength separation, included the quad band Chroma 89402 (DAPI, Cy3, Cy5), the quad band Chroma 89403 (AlexaFluor750) and the single band Zeiss 38HE (AlexaFluor488). Images were obtained with an ORCA-Flash4.0 LT Plus sCMOS camera (2,048 × 2,048, 16-bit, Hamamatsu Photonics K. K.).

### HybISS image processing

Imaging data were processed with an in-house pipeline based on MATLAB (https://github.com/Moldia/iss_starfish). Maximum intensity projection was performed on each field of view to obtain a two-dimensional representation of each tile. Then, stitching of tiles was performed using a MATLAB implementation of MIST algorithm, obtaining, after exporting, different *.tiff images corresponding to each channel and round. Then, data were retiled and formatted to fit the Starfish required input. As genes can be either detected in 1–4, 5–8 or 9–12 detection cycles, each group was then decoded independently. Using Starfish tools, individual tiles were registered across cycles and a top hat filter was applied on each channel to get rid of the background noise. Channel intensities were also normalized, and spots were detected. Finally, decoding was performed on each tile using MetricDistance, obtaining the identity of all the detected RCA products.

### HybISS data analysis

Two different yet complementary strategies were followed to characterize the cellular heterogeneity within the ISS datasets. Probabilistic cell typing for in situ sequencing (PciSeq) was performed to identify the identity of every cell in the tissue. For this, cells were segmented on the basis of DAPI using a watershed segmentation, and reads were assigned to cells as described in ref. 71. In addition, Tangram was used to couple the scRNA-seq with the HybISS datasets, functioning similarly to stereoscope. Gene expression imputation was performed as described in ref. 99. In 5 PCW sections, where nuclear segmentation was not possible, hexagonal binning was used to segment the tissue. In this case, the expression of each hexagonal bin was used as input for probabilistic cell typing and Tangram.

### SCRINSHOT

SCRINSHOT is also a targeted method of RNA-species in situ detection that utilizes padlock probes for signal amplification, similarly to ISS. Its major difference is the usage of SplintR-ligase for padlock probe circularization and the simplest detection approach that assigns a fluorophore to a distinct gene, in each detection cycle. The different chemistry and the omission of decoding results in better sensitivity than ISS. However, it has reduced multiplexity (three to five genes per detection cycle), being more laborious than ISS.

### Gene selection, padlock probe design and mRNA detection

For spatial analysis of the two identified NE-cell identities, we used the highly expressed GRP and GHRL, for easy identification of epi cl-12 and epi cl-11, respectively. Then, we selected markers that are expressed in intermediate and low levels, focusing mainly on TFs, such as ASCL1, RFX6, NKX2-2, ARX and PROX1. Markers such as SCGB3A2, FOXJ1 and TP63 were used to identify the non-NE cells. The SCGB1A1, SFTPC, ETV5, FOXJ1, AGER, SOX2 and SOX9 padlock probes were designed as in SCRINSHOT original publication. For the rest, a unique barcode was inserted in the backbone of all probes that recognize the same mRNA, that allowed their detection by only one detection oligo, reducing substantially the cost (all sequences are found in Supplementary Table 1 (31)). All the reactions were done according to the original SCRINSHOT protocol, except for an increase of the detection-oligo hybridization temperature to 30 °C.

### Imaging of SCRINSHOT signals on tissue sections

For signal acquisition we did 13 detection cycles, using a Zeiss Axio Observer Z.2 fluorescent microscope (Carl Zeiss Microscopy, GmbH) with a Colibri 7 LED light source (Carl Zeiss Microscopy, GmbH, 423052-9770-000), equipped with a Zeiss 20×/0.75 Plan-Apochromat, a Zeiss AxioCam 506 Mono digital camera and an automated stage, that allowed imaging of the same regions in every cycle. For signal detection, we used the following Chroma filters: DAPI (49000), FITC (49003), Cy3 (49304), Cy5 (49307), Texas Red (49310) and Atto740 (49007).

### SCRINSHOT image analysis

The nuclear staining was used to align the images of the same areas between the hybridizations, using Zen2.5 (Carl Zeiss Microscopy GmbH). The images were analysed as 16-bit *.tiff files, without compression or scaling. Images were tiled using a custom script in Fiji100,101. The signal dots were counted using Cell-Profiler 4.13 (ref. 102), Fiji100,101 and R-RStudio103,104,105,106,107 custom scripts. The identified signal-dot coordinates were used to project the signals on DAPI images, using TisUUmaps108.

For the analysis of the 11.5 PCW SCRINSHOT dataset, nuclei images were segmented into hexagonal bins of 7 µm radius. Only bins with a clear proximal epithelial component (SOX2 dots >3, EPCAM dots >3) were further processed. To maintain NE-related bins, we used the analysed genes that were specifically expressed in NE cells according to scRNA-seq (ARX, NKX2-2, GHRL, ACSL1, CALCA, GRP, RFX6, CFC1, PCSK1 and ASCL1). Bins with a presence of at least 12 signals of the above genes were further processed. We also kept bins containing more than ten ASCL1 dots, which was found to be expressed by NE progenitors. We created AnnData objects with the counts for each gene in every bin, in addition to the bin coordinates. We used Scanpy to perform Leiden clustering with 0.1 resolution and represented those clusters using UMAP plots. We further assessed the correlation in expression between the different NE genes and represented the Pearson’s correlation results as heat map. Finally, the suggested clusters were annotated on the basis of the combination of different NE markers, according to the scRNA-seq data.

### Exploration of the zonation patterns in the developing lung using ISS

To calculate the relative position of distinct cell types in the proximal–distal and radial axis, analysed tissues with HybISS were segmented into bins (radius 20 µm). Only bins with more than three detected EPCAM mRNAs were considered to be airway related. We calculated the distance of each bin in the tissue to the closest identified airway-related bin, defining the first axis explored (radial axis considering the airway as the centre). Cells with a radial distance higher than 140 µm were excluded from the analysis. To define the second axis, we explored the diversity within airway-related bins and, by UMAP-dimension reduction, we identified that the first dimension recapitulated the proximal–distal typical patterning, based on the expression of known markers. We used that value as pseudotime to assign a proximal–distal value to each of the detected bins. These values served as the second axis of the analysis, considering the proximal–distal value of the closest epithelial bin as the proximal–distal value of the analysed mesenchymal cells. The distribution of the cells analysed was represented using kernel density estimation (KDE)-based heat maps.

### Exploration of the zonation patterns in the developing lung using ST

To explore the zonation of mesenchymal populations present in the developing lung with ST datasets, we analysed sections from 8.5 PCW. We identified ST spots containing airways by looking at the expression top ten differentially expressed epithelial markers (Extended Data Fig. 2g). Cells containing more than eight UMIs were considered as airway-related ST spots. To define the radial axis, each ST spot was given a value depending on its distance from its closer airway-related ST spot. The proximal–distal axis was calculated on the basis of the compared relative expression levels of known proximal (SOX2 and SCGB3A2) and distal (ETV5 and TPPP3) epithelial markers. On the basis of the relative expression of proximal and distal markers, every epithelial ST spot was given a value between −1 (proximal) and 1 (distal). ST spots that were not airway related were given the proximal–distal score of their closest airway-related ST spot. After rounding the proximal–distal scores of every ST spot, the frequency of every cluster detected using stereoscope was then computed by averaging ST spots with the same proximal–distal and radial coordinates.

### Immunofluorescence

Tissue sections were prepared, using the same protocol as SCRINSHOT. Fresh frozen material was fixed with 4% PFA for 10 min at room temperature, and slides were washed three times for 5 min with phosphate-buffered saline (PBS) 1× (pH 7.4). We incubated the sections with 5% donkey serum (Jackson ImmunoResearch, 017-000-121) in PBS 1× (pH 7.4) with 0.1% Triton X100 (blocking buffer) for 1 h at room temperature, and then they were incubated with primary antibodies in blocking buffer overnight at 4 °C. Slides were washed with PBS 1× (pH 7.4) three times for 5 min and incubated with secondary antibodies in 2% donkey serum in PBS 1× (pH 7.4) with 0.1% Triton X100 for 1 h at room temperature. After three washes with PBS 1× (pH 7.4) for 10 min each, nuclei were counterstained with 0.5 µg ml−1 DAPI (Biolegend, 422801) in PBS 1× (pH 7.4) in 0.1% Triton X100 and slides were mounted with ProLong Diamond Antifade Mountant (Thermo, P36961).

Sections treated with anti-PHOX2B goat, anti-DLL3 rabbit, anti-COL13A1 rabbit and Cy3 anti-Actin, α-Smooth Muscle (ACTA2) mouse monoclonal antibodies were incubated in TE buffer (10 mM Tris and 1 mM EDTA pH 9.0) for 30 min, at 80 °C in a waterbath and cooled on ice for 30 min to facilitate antigen retrieval and washed three times for 5 min with PBS 1× (pH 7.4), before incubation with the blocking solution. Sections treated with anti-Krt5 chicken and anti-p63a rabbit antibodies were incubated in sodium citrate (10 mM pH 6.0) and processed as above.

### Image acquisition for immunofluorescence

Image acquisition was initially done as in SCRINSHOT, with a 10× lens, allowing the identification of informative regions of interest. For high-resolution images, we used a Zeiss LSM800 confocal microscope, equipped with a Plan-Apochromat 40×/1.30 oil lens or a Zeiss LSM780 confocal microscope, equipped with a Plan-Apochromat 63×/1.40 oil DIC M27 objective. Optimal resolution settings were used and images were acquired as optical stacks. For imaging of the ACSL1-CGRP-CDH1 stainings, we used a Leica DMI8 microscope (Leica Microsystems, 11090148013000), with a SOLA light engine light source (Lumencor,16740), equipped with a 40×/ 0.80 HC Fluotar, a Hamamatsu camera (2,048 × 2,048, 16-bit, C13440-20C-CL-301201) and an automated stage (ITK Hydra XY). For the signal detection, we used the following Chroma filters: QUAD-S filter set: DFTC (DC: 425; 505; 575; 660). Imaging was done via the LASX software (Leica Microsystems), and images were analysed with Fiji100,101.

### Browser-based interactive visualization of the scRNA-seq, spatial and interactome analyses

For the browser-based representation of our data, we used the TissUUmaps tool109. In the presented version, we have modified TissUUmaps for accelerated GPU-based rendering, enabling real-time interactive multiscale viewing of millions of data points directly via a web browser. Furthermore, we have added functionality so that ST data and single-cell pciSeq data from ISS can be presented as pie charts for efficient viewing of spatial heterogeneity. TissUUmaps supports FAIR sharing of data by allowing users to select regions of interest and directly download raw data in a flexible *.csv format, enabling further exploration and analysis, of all datasets. We based the interactome browser in the Cell Chat shiny app, described in ref. 10.

### Statistics and reproducibility

No statistical method was used to pre-determine sample size. No data were excluded from the analyses. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment. For differential expression analyses of scRNA-seq datasets, MAST package was used in Seurat, and when it is mentioned in figure legends, the results were filtered according to the adjusted P value that was based on Bonferroni correction using all features in the datasets.

For scRNA-seq experiments, we analysed one 5 PCW lung, one 5.5 PCW lung, two 6 PCW lungs, two 7 PCW lungs (twins), one 8 PCW lung, two 8.5 PCW lung, one 10 PCW lung, two 11.5 PCW lungs, two 12 PCW lungs, two 13 PCW lung and one 14 PCW lung. All attempts at replication with the provided scripts were successful.

For ST experiments, we analysed four sections of 6 PCW lungs, (Figs. 1b, 2b,d and 3c and Extended Data Fig. 4c), eight sections of 7 PCW lungs (Fig. 3d), four sections of 8–8.5 PCW lungs (Figs. 2b and 6a and Extended Data Fig. 4c) and four sections of 11.5 lungs (Figs. 2b and 3e and Extended Data Fig. 4c). Sections of each stage were processed in at least two independent experiments with similar results.

For HybISS experiments, we analysed three sections of 5.5 PCW lungs, (Extended Data Figs. 4d, 6g and 8b), two sections of 6 PCW lungs (Figs. 1e, 2e,f and 4g and Extended Data Fig. 2h) and two sections of 13 PCW lungs (Figs. 6c and 7a and Extended Data Figs. 6g and 8b). Sections of each stage were processed in two independent experiments with similar results.

For SCRINSHOT experiments, we analysed one section of a 6 PCW lung, one section of an 8.5 PCW lung, one section of an 11 PCW lung (Extended Data Fig. 10g) and one section of a 14 PCW lung (Fig. 4c and Extended Data Fig. 10d). The sections were processed in two independent experiments, showing similar distal tip (>500 cases) and NE cell patterns (>100 cases).

For LUM COL13A1 ACTA2 immunofluorescence, we analysed four 8.5 PCW lung sections and one 12 PCW lung section in two experiments. More than ten patterns similar to those shown in Fig. 2g were found in each section. For ACTA2 Ecad MKI67 immunofluorescence, we analysed three 8.5 PCW, two 12 PCW and one 14 PCW lung sections, in two independent experiments with similar results. Extended Data Fig. 4e contains representative images of large airways (8.5 PCW: >20, 12 PCW: >40 and 14 PCW: >50), of airway stalks with tips (8.5 PCW: >20, 12 PCW: >50 and 14 PCW: >50) and of distal tips (8.5 PCW: >20, 12 PCW: >50 and 14 PCW: >50). For the DLL3 NF-M PHOX2B stainings in Fig. 3f–h, we stained three 8.5 PCW and one 12 PCW lung sections in two independent experiments. One 8.5 PCW and one 12 PCW lung sections were independently processed for H&E staining. In both stainings, the different tissues gave similar results. For the SOX10 ASCL1 ISL1 immunofluorescence (Extended Data Fig. 7e), we analysed two 8.5 PCW, two 12 PCW and one 14 PCW lung sections, in two independent experiments, with similar results. For the KRT17 Ecad immunofluorescence (Fig. 4d), we stained two 12 PCW and one 14 PCW in two independent experiments with similar results. For TP63 KRT5 Ecad immunofluorescence, we stained two 8.5 PCW and two 14 PCW lung sections in two independent experiments with similar results (Extended Data Fig. 8g). For the SST SSTR2 GHRL staining, we analysed four 8.5 PCW and one 12 PCW lung sections, in three independent experiments with similar results. For GRP GHRL immunofluorescence four 8.5 PCW and one 12 PCW lung sections were analysed, in three independent experiments with similar results.

For all spatial methods, we acquired images of whole lung sections. Representative areas of interest were identified, imaged and used in the figures.

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.