Abstract
From single-cell RNA-sequencing (scRNA-seq) and spatial transcriptomics (ST), one can extract high-dimensional gene expression patterns that can be described by intercellular communication networks or decoupled gene modules. These two descriptions of information flow are often assumed to occur independently. However, intercellular communication drives directed flows of information that are mediated by intracellular gene modules, in turn triggering outflows of other signals. Methodologies to describe such intercellular flows are lacking. We present FlowSig, a method that infers communication-driven intercellular flows from scRNA-seq or ST data using graphical causal modeling and conditional independence. We benchmark FlowSig using newly generated experimental cortical organoid data and synthetic data generated from mathematical modeling. We demonstrate FlowSig’s utility by applying it to various studies, showing that FlowSig can capture stimulation-induced changes to paracrine signaling in pancreatic islets, demonstrate shifts in intercellular flows due to increasing COVID-19 severity and reconstruct morphogen-driven activator–inhibitor patterns in mouse embryogenesis.
Main
Cells communicate through biochemical signaling to organize biological activities. Inflows of intercellular signals are processed through intracellular gene regulatory mechanisms involving transcription factors (TFs) and their downstream targets, which result in outflows of other signals. These spatiotemporal flows of ‘cause and effect’ drive every biological process. One famous example of an ‘intercellular flow’ is Wolpert’s French Flag Problem^{1}, wherein a spatially propagating morphogen drives coordinated expression of multiple TFs, generating the eponymous ‘flag’. Biological homeostasis is maintained by coordination between intercellular flows, which is perturbed in disease. Disentangling these intercellular flows is critical to understanding health and disease.
scRNA-seq and ST generate simultaneous measurements of 10,000–20,000 genes, yielding high-dimensional snapshots of gene expression in biological tissue. From these data, patterns can be extracted that vary along axes such as trajectory, disease status, space and time. There are two primary categories of methods to extract such patterns. First, one can construct gene expression modules (GEMs), defined by gene sets such that intra-set expression is more correlated than is gene expression between sets^{2,3,4,5,6,7,8,9,10}. Second, one can infer ligand–receptor interaction networks that facilitate intercellular communication directly from non-spatial scRNA-seq^{11,12,13} or spatial data^{13,14,15}. The interplay between ligand–receptor interactions and GEMs drives intercellular flows across tissues, but there are few methods that can infer such flows. We aim to address this gap.
In studies by Sachs et al.^{16} and Chen et al.^{17}, which are similar to this work, graphical causal modeling was used to learn dependencies directly from single-cell data. Sachs et al. inferred a signaling network from multi-perturbation flow cytometry data of phosphoproteins measured in CD8^{+} T cells. Chen et al. inferred person-specific networks between GEMs generated from bulk RNA-seq and scRNA-seq data sampled from head and neck squamous cell carcinoma tumors. There is also the node-centric expression model by Fischer et al.^{18} and the spatial variance component analysis framework by Arnol et al.^{19}, which infer how gene expression depends on the local environment. Other methods construct ‘multicellular representations’ of gene expression programs coordinated by several cell states^{5,20,21,22,23} (see Supplementary Table 1 for a comparison of methods).
Here, we present FlowSig, a method that identifies ligand–receptor interactions whose inflows are mediated by intracellular processes and drive subsequent outflow of other intercellular signals. Using graphical modeling and conditional independence testing, FlowSig learns a completed partial directed acyclic graph (CPDAG) describing intercellular flows between three types of constructed variables: inflowing signals, intracellular gene modules and outflowing signals. To reduce the false discovery rate, we orient the CPDAG according to the biological assumption that inflowing intercellular signals are processed by intracellular modules before being converted to other outflowing signals. FlowSig can be applied to either non-spatial scRNA-seq or ST data. To analyze non-spatial scRNA-seq data, in which ligand–receptor interactions are harder to infer accurately, we incorporate information gained from ‘control versus perturbed’ studies, in which the system has been altered by, for example, external stimulation, disease or time. FlowSig uses differential expression analysis and conditional invariance testing to infer the set of inflow and outflow variables that most significantly shift in distribution and thus most likely drive intercellular flows. In doing so, we reduce the set of possible graphs that could be generated by the data and learn a more accurate CPDAG. We validate FlowSig using (1) synthetic data generated from simulations of mathematical models of intercellular flows and (2) novel experimental data generated from cortical organoids. We benchmark FlowSig against several methods and show the unique insights gained from the platform. FlowSig is applied to scRNA-seq of stimulated human pancreatic islets, identifying specific changes due to stimulation. We analyze the case of multiple perturbations due to different COVID-19 severities resulting in distinct intercellular flow mechanisms. Applying FlowSig to ST data of mouse embryogenesis, we uncover regulatory TFs that enable a ‘flow module’ resembling Turing’s activator–inhibitor system.
Results
FlowSig uses gene expression measurements and output from cell–cell communication inference to learn intercellular flows that describe directed dependencies. These dependencies are oriented from inflowing intercellular signals to intracellular GEMs, which could be individual TFs or cell-wise enrichment for correlated gene sets, and from GEMs to outflowing intercellular signals (Fig. 1a). We model the intercellular flows using graphical causal models, where nodes represent the flow variables (inflowing signals, GEMs and outflowing signals), and learn a directed graph using conditional independence testing and the unknown-target interventional greedy sparsest permutation algorithm (UT-IGSP)^{24}. Statistical conditional independence relations can identify, at best, a set of equivalent directed acyclic graphs (DAGs) that share the same undirected skeleton graph and directed v-structures (connected node triplets (x, y, z) with the directed edges x → y ← z)^{25}. We therefore use UT-IGSP to learn an initial CPDAG, which can contain both directed arcs and undirected edges. We then construct the intercellular flow network by reorienting undirected edges and removing biologically unrealistic arcs so that edges are directed from inflowing signals to GEMs, between GEMs and from GEMs to outflowing signals.
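The final orientation step can be illustrated with a minimal sketch. It assumes the CPDAG is supplied as a list of directed arcs and a list of undirected edges, and that every node carries one of three hypothetical type labels (`"inflow"`, `"gem"`, `"outflow"`). The function name, labels and toy node names are illustrative only and are not FlowSig's API. Treating an undirected GEM–GEM edge as a flow in both directions is also an assumption of this sketch.

```python
# Allowed (source type, target type) pairs under the biological assumption
# inflow -> GEM, GEM -> GEM and GEM -> outflow. Labels are hypothetical.
ALLOWED = {("inflow", "gem"), ("gem", "gem"), ("gem", "outflow")}


def orient_flow_edges(directed, undirected, node_type):
    """Return the set of arcs consistent with inflow -> GEM -> outflow.

    directed   : iterable of (u, v) arcs from the learned CPDAG
    undirected : iterable of (u, v) edges with no learned direction
    node_type  : dict mapping node name -> "inflow" | "gem" | "outflow"
    """
    flows = set()
    # Keep a directed arc only if its direction is biologically admissible.
    for u, v in directed:
        if (node_type[u], node_type[v]) in ALLOWED:
            flows.add((u, v))
    # Orient an undirected edge in every admissible direction
    # (both directions for GEM-GEM edges, an assumption of this sketch).
    for u, v in undirected:
        for a, b in ((u, v), (v, u)):
            if (node_type[a], node_type[b]) in ALLOWED:
                flows.add((a, b))
    return flows


# Toy example with made-up node names.
node_type = {"rFGF": "inflow", "GEM1": "gem", "GEM2": "gem", "BMP4_out": "outflow"}
directed = [("rFGF", "GEM1"), ("BMP4_out", "GEM1"), ("GEM2", "rFGF")]
undirected = [("GEM1", "BMP4_out"), ("GEM1", "GEM2")]
flows = orient_flow_edges(directed, undirected, node_type)
```

In this toy case the arcs `BMP4_out → GEM1` and `GEM2 → rFGF` are dropped as biologically unrealistic, and the undirected edge between `GEM1` and `BMP4_out` is oriented toward the outflow variable.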
Although the core steps in using FlowSig to analyze non-spatial scRNA-seq and ST data are the same, there are several differences. For non-spatial scRNA-seq data, we must overcome a fundamental issue: it is not possible to directly measure the intercellular signals that each cell received. Therefore, we impose two constraints (Fig. 1b). First, we consider only studies comparing a ‘control’ condition against one or more perturbed conditions, for example, healthy versus diseased. We use the additional information gained from perturbation data through conditional invariance testing to narrow down the set of possible flow graphs, reducing the occurrence of false positive edge discovery. Second, for each ligand–receptor interaction inferred from cell–cell communication inference, we extract downstream TF targets from the OmniPath database^{26} to measure signal inflow. Receptor gene expression quantifies the potential for a cell to receive an intercellular signal, and downstream TF expression quantifies the extent to which the cell actually received the signal; we define signal inflow as the product of receptor gene expression and the average expression of downstream TF targets.
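The inflow definition above can be written as a short sketch. It assumes receptor expression is a per-cell vector and the downstream TF targets form a cells-by-TFs matrix. The function name and toy values are hypothetical, and any normalization or handling of multi-subunit receptors is omitted.

```python
import numpy as np


def signal_inflow(receptor_expr, tf_expr):
    """Per-cell signal inflow: receptor expression times mean downstream TF expression.

    receptor_expr : array of shape (n_cells,)
    tf_expr       : array of shape (n_cells, n_tfs), downstream TF targets
    """
    receptor_expr = np.asarray(receptor_expr, dtype=float)
    tf_expr = np.asarray(tf_expr, dtype=float)
    return receptor_expr * tf_expr.mean(axis=1)


# Toy values: cell 0 lacks the receptor, cell 2 lacks downstream TF activity,
# so only cell 1 registers a non-zero inflow.
receptor = np.array([0.0, 2.0, 4.0])
tfs = np.array([[1.0, 3.0],
                [1.0, 3.0],
                [0.0, 0.0]])
inflow = signal_inflow(receptor, tfs)
```

The product form means that inflow is zero whenever either the capacity to receive the signal (receptor) or the evidence of having received it (downstream TFs) is absent.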
ST technologies are currently in their infancy, so there are fewer control versus perturbed ST studies than scRNA-seq studies. However, we can use communication methods such as COMMOT^{14} to spatially constrain and measure the amount of inflowing signal more accurately (Fig. 1c). Therefore, FlowSig uses the greedy sparsest permutation (GSP) algorithm^{27}, which does not use perturbation data, to analyze ST data.
Synthetic validation of FlowSig
We first benchmarked FlowSig using synthetic data generated from mathematical models of intercellular flows (see ‘Generating synthetic data from model simulations’ in the Supplementary Notes). For simplicity, we modeled GEMs as individual TFs. We considered three scenarios. In the first scenario, we examined unidirectional intercellular flow induced by SHH signaling that generates outflow of BMP4 through FOXF1 (ref. ^{28}), with flows learned over a set of five nodes: SHH ligand, unbound PTCH1 receptor, SHH inflow due to SHH–PTCH1 binding, FOXF1 TF and BMP4 ligand (Fig. 2a). The second scenario involved SHH-induced tissue patterning, characterized by the expression of NKX2.2, OLIG2, PAX6 and IRX3 (ref. ^{29}). Flows were inferred over a set of seven nodes: SHH ligand, unbound PTCH1 receptor, SHH inflow (SHH–PTCH1 complex), NKX2.2 TF, OLIG2 TF, PAX6 TF and IRX3 TF (Fig. 2b). In the third scenario, we explored competition between SHH and BMP4 in driving dorsoventral patterning^{30}. Flows were learned over a set of nine nodes, including SHH ligand, unbound PTCH1 receptor, inflowing SHH (SHH–PTCH1 complex), BMP4 ligand, unbound BMPR1A and BMPR2 receptor, inflowing BMP4 (BMP complex) and three GEM variables, dorsal, intermediate and ventral (Fig. 2c). We wanted to validate two core FlowSig assumptions. The first is that accurate measurement of inflowing signal is needed to infer intercellular flows. For all models, we compared the use of bound ligand–receptor complex as signal inflow to total receptor expression (free receptor plus bound complex), the latter of which is directly measured from scRNA-seq and ST data. The second is that including perturbation data increases the accuracy of intercellular flow inference. We quantified the accuracy of FlowSig by measuring the true positive rate (TPR) and true negative rate (TNR) for each scenario. For all scenarios (Fig. 2d–f), we found that the average TPR does not change if we use bound receptor expression to measure signal inflow, or if perturbation data are introduced. However, measuring inflow using bound receptor increases the average TNR. This is especially true for the models describing SHH-driven patterning and competition between SHH and BMP4, in which flows are more complex and multidirectional (Fig. 2e,f). Incorporating perturbation data through conditional invariance testing reduces the variation in TNR values, both in terms of the interquartile range and outliers, resulting in ‘tighter’ estimates of intercellular flows. These results suggest that FlowSig reduces the number of false positive discoveries inferred from baseline GSP and UT-IGSP algorithms.
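As an illustration of the accuracy metrics, the following sketch computes the TPR and TNR of an inferred edge set against a known ground truth. It assumes that every ordered pair of distinct nodes is a candidate directed edge. The helper name, the ground-truth edges and the inferred edges are illustrative and do not reproduce the simulated models.

```python
from itertools import permutations


def edge_rates(nodes, true_edges, inferred_edges):
    """Return (TPR, TNR) for directed edges over all ordered node pairs."""
    true_edges, inferred_edges = set(true_edges), set(inferred_edges)
    candidates = set(permutations(nodes, 2))       # all possible directed edges
    tp = len(true_edges & inferred_edges)          # true edges that were recovered
    fn = len(true_edges - inferred_edges)          # true edges that were missed
    fp = len(inferred_edges - true_edges)          # spurious inferred edges
    tn = len(candidates - true_edges - inferred_edges)
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    tnr = tn / (tn + fp) if tn + fp else float("nan")
    return tpr, tnr


# Toy five-node example; edges are illustrative, not the simulated ground truth.
nodes = ["SHH", "PTCH1", "rSHH", "FOXF1", "BMP4"]
truth = [("rSHH", "FOXF1"), ("FOXF1", "BMP4")]
inferred = [("rSHH", "FOXF1"), ("PTCH1", "FOXF1")]
tpr, tnr = edge_rates(nodes, truth, inferred)
```

Because true edges are sparse relative to all candidate pairs, a single spurious edge moves the TNR only slightly, which is why the spread of TNR values across simulations is informative alongside its mean.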
Benchmarking FlowSig against multicellular representation methods
To provide additional insight into FlowSig’s capabilities, we benchmarked it against methods that construct multicellular program representations from scRNA-seq and ST data, including DIALOGUE^{5}, scITD^{20}, MOFAcellular^{22}, MOFAtalk^{22}, MultiNicheNet^{23} and Tensor-cell2cell^{21}. We also compared FlowSig with direct CellChat output (Supplementary Table 1). All methods were benchmarked using an scRNA-seq dataset of stimulated peripheral blood mononuclear cells sampled from people with lupus, which was generated by Kang et al.^{31}. We summarize key points here (see ‘Comparison to other methods’ in the Supplementary Results for a full discussion). We also evaluated FlowSig’s robustness to different inputs constructed by alternative cell–cell communication and GEM construction methods (see ‘Robustness of FlowSig to different input methodologies’ in the Supplementary Results) and found that different cell–cell communication methods can result in different sets of intercellular flows, owing to discrepancies in inferred ligand–receptor interactions; however, FlowSig will infer intercellular flows through GEMs constructed by different methods that are enriched for the same regulatory TFs.
Analyzing CellChat output directly suggested there were 6,886 potential inflow-to-outflow relationships. Of these, 3,167 were shared across both conditions, 1,511 were unique to the control condition and 2,208 were unique to the stimulated condition. From CellChat results alone, we cannot infer which of these relations are truly intercellular flows, that is, whether the second interaction depends on the first, and we cannot infer the intracellular mediators of these intercellular flows. By contrast, FlowSig inferred only 44 intercellular flows across 6 signal inflow variables, 20 GEMs and 12 signal outflow variables (see ‘Comparison to other methods’ in the Supplementary Results and Supplementary Fig. 1).
DIALOGUE identified four multicellular programs (MCPs) from the Kang et al. dataset. MCP1 was enriched across CD14^{+} monocytes, CD8^{+} T cells and B cells, suggesting that there was coordination through intercellular flows between these cell types (Supplementary Fig. 2a). In MCP4, CD8^{+} T and CD14^{+} cells exhibited significant differential expression between conditions (Supplementary Fig. 2b). DIALOGUE identified upregulation of the signal ligand CCL4 (in CD8^{+} T cells), which FlowSig inferred to drive signal outflow. scITD decomposed the dataset into two latent factors (Supplementary Fig. 3a): Factor 1 was significantly enriched for FlowSig signal outflow ligands CXCL10, CXCL11 and TNFSF10 (Supplementary Fig. 3b) and intercellular-flow-driving interactions (Supplementary Fig. 3c). MOFAcellular decomposed the dataset into five factors (Supplementary Fig. 4a): Factor 1 was enriched for signal outflow variables CXCL11 and TNFSF10 (Supplementary Fig. 4b). Applying MOFAtalk to the ligand–receptor interaction scores inferred from LIANA^{32} yielded four factors (Supplementary Fig. 4c): Factor 1 was enriched for the interactions CCL2–CCR1 and CCL8–CCR1 (between CD14^{+} cells, dendritic cells (DCs) and FGR3^{+} cells) and signal outflow of TNFSF13B (Supplementary Fig. 4d). Tensor-cell2cell extracted six factors from ligand–receptor interaction scores inferred from LIANA (Supplementary Fig. 5a): CD14^{+} cells, DCs and FGR3^{+} cells were identified as key signal receiver groups (Supplementary Fig. 5b). Clustering ligand–receptor interactions identified that CCL2–CCR1, CCL3–CCR1, CCL4–CCR1 and CCL8–CCR1 were upregulated after stimulation (Supplementary Fig. 5c). Finally, MultiNicheNet identified CCL2–CCR1, CCL3–CCR1, CCL4–CCR1 and CCL8–CCR1 as differentially expressed between conditions (Supplementary Fig. 6a). MultiNicheNet also identified outflow of CXCL10, CXCL11 and FASLG and inflow into CCR1 (Supplementary Fig. 6b).
Validating FlowSig using a cortical organoid system
We tested FlowSig against new scRNA-seq data generated from an organoid model of cortical development, for which fibroblast growth factor (FGF) and bone morphogenetic protein (BMP) signaling are known to drive patterning^{33}. We generated cortical organoids from human embryonic stem cells and collected the organoids at day 18 (D18) and D35 in culture for scRNA-seq analysis. In the organoid system, the cell fate for cortical identity is determined by D18, and signal responses to FGF and BMP, as measured by graded TF expression, are established by D35. The continual exposure to FGF and BMP signaling drives drastic changes in gene expression, and thus between D18 and D35 there are transcriptional changes and changes in cell type composition as the organoids mature. Hence, when applying FlowSig to this dataset, rather than assume the D18 and D35 populations are sampled from the same underlying ‘steady state’ distributions of gene transcription, we treat the D35 data as a ‘perturbed’ form of the ‘control’ D18 data due to exposure to FGF and BMP signaling.
We identified differentially flowing signals from the 77 unique ligand–receptor interactions identified by CellChat^{34} analysis. FlowSig identified 26 differentially inflowing signals (Fig. 3a) and 16 differentially outflowing signals (Fig. 3b), including FGF and BMP (see ‘Identifying differentially flowing signal variables’ in the Methods). We used pyLIGER^{35} to construct 20 GEMs from 2,793 highly variable genes (Fig. 3c and Supplementary Fig. 8a–c). Cells from the D18 timepoint were more enriched for GEM2 through GEM4, GEM7, GEM10, GEM18 and GEM19, whereas cells from the D35 timepoint were enriched for GEM8, GEM11, GEM12, GEM16 and GEM20. Altogether, FlowSig constructed 62 variables for intercellular flow inference. After inference, we aggregated signal inflow variables by their parent signaling pathway. For example, we classified both FGFR1 and FGFR3 inflows under the FGF signaling pathway, which were activated by received FGF2 ligand.
To determine the dominant drivers of intercellular flow, we ranked signal inflow variables by their total edge frequency. We found that FGF, midkine (MK), pleiotrophin (PTN) and neuregulin (NRG) were drivers of intercellular flow. FGF inflow, in particular, drove signal outflow, including BMP4, insulin-like growth factor II (IGF-II), nerve growth factor (NGF), NRG1 and NRG3, through numerous GEMs (Fig. 3d). By examining the top GEM-specific TFs mediated by FGF-induced flow (see ‘Interpreting gene expression modules’ in the Methods), we found that EOMES could be a potential regulatory candidate of FGF inflow. We observed that BMP inflow was regulated through many fewer GEMs (Fig. 3e) and could be mediated by PAX6 or NR2F1.
To verify FlowSig analysis, we analyzed a perturbed organoid culture in which we activated the FGF and BMP signaling pathways by adding FGF8b and BMP4, respectively, between D15 and D21. We collected organoid samples at D35 and subjected them to quantitative reverse transcription PCR (RT–qPCR) for gene expression analysis (Fig. 3f,g). Compared with the nonexposed control organoids, we observed that activating FGF signaling significantly downregulated the expression of EOMES (Fig. 3f), whereas elevating BMP signaling simultaneously downregulated PAX6 and upregulated NR2F1 (Fig. 3g). These experimental data demonstrate that FlowSig accurately captures the dominant drivers of intercellular flows from real biological datasets.
FlowSig identifies changes in intercellular flows due to stimulation
To demonstrate how FlowSig recovers intercellular flows driven by an external perturbation, we analyzed scRNA-seq data of human pancreatic islets stimulated by interferon-γ (IFNγ)^{36}. We constructed ten GEMs using pyLIGER that aligned with the five cell-type clusters, alpha, beta 1 to 3, and delta, that we identified independently (Fig. 4a and Supplementary Fig. 9a–c). We used these cell-type annotations as input for preliminary CellChat analysis; that is, for each condition, CellChat infers significant pairwise ligand–receptor interactions between the cell groups defined by these cell-type labels.
IFNγ stimulation increased inflow of the FGF signaling pathway through FGFR1 (through ligands FGF7 and FGF9, specifically), interleukin-6 (IL6) through IL6R and IL6ST, MIF through CD74 and CD44, MDK through NCL and SST through SSTR2 (Fig. 4b). IFNγ stimulation increased outflow of GCG, INHBA and NAMPT, and decreased outflow of ANGPTL2, SPP1, transforming growth factor-β1 (TGFB1), tumor necrosis factor superfamily member 12 (TNFSF12) and UCN3 (Fig. 4c). FlowSig identified that FGF, IL6, MDK and SST were the dominant drivers of intercellular flows that drove the outflow of GCG, INHBA, NAMPT, SPP1, TGFB1, TNFSF12 and UCN3 through GEM1, GEM3, GEM5 and GEM6 (Fig. 4d). We observed that GEM1 is enriched in both the alpha and beta 1 clusters, GEM3 and GEM5 are enriched in the alpha cluster, GEM4 is enriched in the beta 2 cluster and GEM6 is enriched in the beta 1 cluster (Fig. 4a), suggesting that intercellular flows are driven by cell type. These results agree with previous work establishing that, in the pancreas, alpha cells are the main secretors of GCG and beta cells are the main secretors of UCN3, and that SST regulates both GCG and UCN3 (ref. ^{37}). We observed that the same TFs (ID1, NR1D1, TFF3 and ZNF419) contributed to all of these GEMs, suggesting that these TFs mediate intercellular flows across both conditions.
To further explore the effects of IFNγ stimulation, we split the global intercellular flow network into two networks. First, we constructed a network corresponding to outflow signals upregulated by IFNγ stimulation by taking outflowing signals that were differentially expressed for the IFNγ condition (adjusted P < 0.05 and log_{2}(fold change (FC)) > 0.5), the GEMs that connected to these outflow variables and the signal inflow nodes connected to these GEMs. From this node set, we then extracted the subgraph from the global intercellular flow network (Fig. 4e). The second network corresponded to the intercellular flow network of outflowing signals downregulated by IFNγ (adjusted P < 0.05 and log_{2}(FC) < –0.5) and was constructed in a similar manner (Fig. 4f). Both networks contain the same signal inflow nodes and share near-identical GEMs. However, GEM3, which drives GCG and NAMPT outflow and is itself regulated by SSTR2 (SST) signaling, is present only in the ‘upregulated’ network, suggesting that it has a specialized role activated by IFNγ stimulation. GEM3 is primarily enriched within alpha cells, suggesting that stimulation drives outflow of GCG and NAMPT from alpha cells. All other inflowing signals and GEMs are shared across both conditions, suggestive of dual regulatory roles. For example, IL6 signaling drives both upregulation of INHBA and NAMPT and downregulation of SPP1, TGFB1 and UCN3 (through GEM4).
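The subnetwork construction described above can be sketched as follows. The sketch assumes the global flow network is a set of directed `(source, target)` pairs, that node types are stored in a dictionary and that differential expression results are given as `(adjusted P, log2FC)` tuples per outflow variable. The function name, the edge list and the statistics are made up for illustration and are not results from the study.

```python
def outflow_subnetwork(edges, node_type, de_stats, up=True, p_cut=0.05, lfc_cut=0.5):
    """Subgraph of differentially expressed outflows, their GEMs and upstream inflows."""
    def selected(node):
        p_adj, lfc = de_stats[node]
        return p_adj < p_cut and (lfc > lfc_cut if up else lfc < -lfc_cut)

    # 1. Outflow variables passing the differential expression thresholds.
    outflows = {n for n, t in node_type.items()
                if t == "outflow" and n in de_stats and selected(n)}
    # 2. GEMs with an arc into at least one selected outflow.
    gems = {u for u, v in edges if v in outflows and node_type[u] == "gem"}
    # 3. Inflow variables with an arc into at least one of those GEMs.
    inflows = {u for u, v in edges if v in gems and node_type[u] == "inflow"}
    keep = outflows | gems | inflows
    # 4. Induced subgraph of the global network on the retained nodes.
    return {(u, v) for u, v in edges if u in keep and v in keep}


# Toy network; edges and statistics are illustrative values only.
edges = {("SSTR2", "GEM3"), ("GEM3", "GCG"), ("IL6R", "GEM4"),
         ("GEM4", "UCN3"), ("GEM4", "INHBA")}
node_type = {"SSTR2": "inflow", "IL6R": "inflow", "GEM3": "gem", "GEM4": "gem",
             "GCG": "outflow", "UCN3": "outflow", "INHBA": "outflow"}
de_stats = {"GCG": (0.001, 1.2), "INHBA": (0.01, 0.8), "UCN3": (0.002, -1.0)}

up_net = outflow_subnetwork(edges, node_type, de_stats, up=True)
down_net = outflow_subnetwork(edges, node_type, de_stats, up=False)
```

In this toy case `GEM4` appears in both subnetworks while `GEM3` appears only in the upregulated one, mirroring the kind of comparison made between the two networks in the text.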
FlowSig uses multiple perturbations to find disease-driven changes
To demonstrate that FlowSig can handle multiple perturbations, we analyzed scRNA-seq of human bronchoalveolar lavage fluid (BALF) cells sampled from healthy controls and from people with either moderate or severe COVID-19 (ref. ^{38}). We used CellChat and the cell-type annotations from the original study to infer significant ligand–receptor interactions, and found 46, 55 and 54 active signaling pathways for healthy controls and the moderate and severe COVID-19 groups, respectively.
We constructed 20 GEMs using pyLIGER (Supplementary Fig. 10a,b) that captured differences across both condition (Fig. 5a) and cell type (Fig. 5b). FlowSig identified differentially inflowing and outflowing signals specific to each COVID-19 condition with respect to healthy controls (Fig. 5c and Supplementary Fig. 10c,d). We note the differential expression of many inflammatory CC chemokines (CCLs) in severe COVID-19, including CCL2, CCL3, CCL8, CCL3L1 and CCL7, and CXC chemokines such as CXCL2 and CXCL8 (Supplementary Fig. 10d). In moderate COVID-19, we observed differential outflow of fewer inflammatory cytokines, including CCL5 and CCL23.
To analyze the intercellular flows driving these differential outflows, for each set of differentially outflowing signals, we extracted the upstream inflowing signals for which there was a directed path to at least one of the outflowing signals and the corresponding GEMs from the inferred FlowSig network (Fig. 5d–f). Despite the number of differentially outflowing signals increasing with COVID-19 severity, the number of inferred signal inflows decreased from 37 in healthy controls to 32 in moderate COVID-19 (loss of AXL, CD4, F2RL1, ITGAX and ITGB2, TNFRSF12A and TNFRSF14; gain of CAP1) and to 25 in severe COVID-19 (loss of CD27, CXCR3, FPR1, IL6R and IL6ST, LTBR, NCL, NRP2 and PLXNA2, SDC1, TNFRSF13B, TNFRSF17 and TNFRSF25; gain of AXL, CD4, F2RL1 and TNFRSF14). GEMs showed a similar trend: the number of regulatory GEMs decreased from 16 to 13 between healthy and moderate COVID-19 (loss of GEM4, GEM10, GEM12 and GEM14; gain of GEM7). The results from Fig. 5a,b suggest that the shift from healthy to moderate COVID-19 is associated with a downregulation in intercellular flows through epithelial cells (GEM4), plasma and T cells (GEM10) and macrophages and neutrophils (GEM12), but an upregulation of intercellular flows through mast cells (GEM7). From moderate to severe COVID-19, there was a decrease from 13 to 8 (loss of GEM1, GEM2, GEM5, GEM11, GEM13, GEM18 and GEM19; gain of GEM12 and GEM14).
We also calculated the intersections between the signal inflow sets (Fig. 5g) and GEM sets (Fig. 5h) driving signal outflows. We observed that 20 out of 37 signal inflows are shared across all three conditions. There were no signal inflows unique to either moderate or severe COVID-19 alone, whereas inflow through TNFRSF12A (due to TNFSF12) and ITGAX and ITGB2 (due to C3) drive outflows in only healthy controls. Only inflow through CAP1 (from RETN1) is shared between moderate and severe COVID-19 but is absent in healthy controls. There were more signal inflows shared between the healthy and moderate COVID-19 groups than between the healthy and severe COVID-19 groups or between the moderate and severe COVID-19 groups. We observed a similar trend amongst inferred regulatory GEMs. The most shared GEMs were between only the healthy and moderate COVID-19 groups (7 out of 17) and across all three conditions (5 out of 17). GEM4 and GEM10, which are associated with epithelial cells and T cells, respectively, mediated signal outflows in only healthy individuals. Only GEM7, which is associated with mast cells, was shared between the moderate and severe COVID-19 groups but not healthy controls. No GEMs that regulate the differential outflows in severe COVID-19 were unique to the severe COVID-19 group. These results demonstrate how FlowSig can use multiple perturbations to identify trends in intercellular flows. Here, FlowSig identified that increasing severity of COVID-19 is associated with (1) a gradual loss of regulatory intercellular inflows and (2) an increase of inflammatory chemokine outflow that is driven by macrophages and neutrophils.
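The intersection analysis can be illustrated with a small sketch that assigns each variable to the exact combination of conditions in which it appears, as in an UpSet-style summary. The helper name is hypothetical, and the sets below are a small illustrative subset rather than the full inferred inflow sets; `inflow_A` is a placeholder for a variable shared across all three conditions.

```python
def exclusive_intersections(sets):
    """Map each combination of conditions to the items found in exactly those conditions."""
    membership = {}
    for condition, items in sets.items():
        for item in items:
            membership.setdefault(item, set()).add(condition)
    grouped = {}
    for item, conditions in membership.items():
        grouped.setdefault(frozenset(conditions), set()).add(item)
    return grouped


# Illustrative subset of inflow variables per condition (not the full results).
inflow_sets = {
    "healthy": {"TNFRSF12A", "ITGAX", "CD27", "inflow_A"},
    "moderate": {"CAP1", "CD27", "inflow_A"},
    "severe": {"CAP1", "inflow_A"},
}
groups = exclusive_intersections(inflow_sets)
healthy_only = groups[frozenset({"healthy"})]
moderate_severe = groups[frozenset({"moderate", "severe"})]
```

Combinations that contain no variables, such as severe-only in this toy input, are simply absent from the returned dictionary.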
FlowSig identifies regulators of spatial intercellular flow
We applied FlowSig to spatial Stereo-seq data of mouse embryogenesis sampled at stage E9.5 (ref. ^{39}). We used non-negative spatial factorization^{4} to construct 20 spatially resolved GEMs from 712 spatially variable genes (Fig. 6a and Supplementary Fig. 11a). We identified Shh outflow to be highly spatially variable (Moran’s I = 0.37; adjusted P = 0.014; Supplementary Fig. 11b), and inferred Shh inflow across the tissue (Supplementary Fig. 11c), in line with Shh’s importance in development^{40}. FlowSig identified several upstream drivers of Shh outflow, including Bmp4, Cxcl12, Fgf15, Mdk, Ptn and Wnt5a, which regulate Shh outflow through GEM2, GEM5, GEM11 and GEM14 (Fig. 6b) and inferred that received Shh inflow (denoted for brevity as rShh) drives outflow of several signal ligands through GEM2, GEM5, GEM9, GEM11, GEM12, GEM14, GEM15 and GEM17 (Fig. 6c).
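For readers unfamiliar with Moran's I, the statistic used above to quantify spatial variability, the following is a minimal sketch of its standard definition, I = (N / W) Σ_ij w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where W is the sum of all spatial weights. The binary chain adjacency and the toy values are assumptions made for illustration. The construction of the spatial neighbor graph and the permutation-based P value are omitted.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I for values at N spots with an N x N spatial weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                      # centered values
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)


# Four spots on a line, each adjacent to its immediate neighbors (toy weights).
chain = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)

clustered = morans_i([1.0, 1.0, 0.0, 0.0], chain)    # similar values are adjacent
alternating = morans_i([1.0, 0.0, 1.0, 0.0], chain)  # neighbors always differ
```

Positive values indicate that neighboring spots have similar expression, whereas values near zero indicate no spatial structure and negative values indicate a checkerboard-like arrangement.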
We used these spatially resolved measurements to infer both specific upstream regulators of Shh outflow and downstream targets of rShh inflow. For each GEM, we extracted the top 10 TFs by module membership (see ‘Interpreting gene expression modules’ in the Methods). We identified potential upstream TFs of Shh outflow using random forest models^{41}, where we ranked TFs by feature (Gini) importance relative to all potential upstream TFs of Shh (see ‘Inferring upstream TF regulators of spatial signals’ in the Methods; Fig. 6d). We identified Foxa2, Foxp2, Myc, Zc3h7a and Foxa1 as the top five upstream regulatory TFs of Shh outflow. Of these, Foxa1 and Foxa2 have been established to regulate Shh^{42}, as has Foxp2 (ref. ^{43}). Although Myc has been established to be regulated downstream of Shh signaling^{44,45}, its role as an upstream regulator is less clear.
To identify downstream targets of rShh inflow, we used pyGAM^{46} (cubic splines, a gamma error distribution, and log link) to fit expression of the top 10 TFs of each inferred downstream GEM as a function of rShh inflow. We ranked TFs by the Spearman correlation between predicted TF expression and rShh itself (Fig. 6e). The downstream TFs that correlated least with rShh included known downstream targets Barhl1 (ref. ^{47}) and Nkx2-1 (ref. ^{48}), as well as Meox1, Tcf21 and Foxp2, whereas the TFs that were most correlated included known targets like Foxe1 (ref. ^{49}) and Nkx2-2 (ref. ^{50}), as well as Pou3f1, Tlx2 and Nkx2-4. We observe that Foxa2 is implicated both upstream and downstream of Shh outflow and inflow, respectively, suggesting that Foxa2 could drive self-production of Shh.
We observed potential bidirectional flows between Shh and Bmp4, Cxcl12, Igf2, Mdk and Wnt5a. To validate these flows further, we performed the following analysis. For each ligand, we extracted the top GEM-specific TFs that were both upstream of the ligand and downstream of rShh. We used random forest modeling to calculate feature importance for each TF to ligand outflow. Only Wnt5a was significantly regulated by TFs that were also downstream targets of rShh inflow through GEM5 (Fig. 6f). Furthermore, outflowing Wnt5a and inflowing rWnt5a were spatially colocalized with inflowing rShh and outflowing Shh (Supplementary Fig. 11d,e). Foxa2, Nkx6-1 and Sox21 were the top upstream regulators of Wnt5a through GEM5, in which Foxa2 is known to regulate Wnt5a^{51}. To infer whether inflowing rWnt5a regulated Shh outflow, we used pyGAM to fit the top TFs of GEM11 as functions of rWnt5a inflow and ranked them by Spearman correlation of the predicted values with rWnt5a (Fig. 6g). We observed that Myc, one of the top upstream regulators of Shh outflow, negatively correlated with rWnt5a inflow.
These observations suggested the following bidirectional flow between Shh and Wnt5a (Fig. 6h). First, outflow and diffusion of Shh drives inflow of rShh, self-amplifying Shh outflow through Foxa2. Inflow of rShh also drives Wnt5a outflow through Foxa2, Nkx6-1 and/or Sox21. Inflow of rWnt5a through spatial diffusion downregulates Shh outflow through Myc. This module resembles an activator–inhibitor system that can generate potential Turing patterns^{52}, with three key features. First, one or both signals can propagate; here, both Shh and Wnt5a ligands diffuse. Second, one of the signals, Shh, upregulates both itself, through Foxa2, and the other signal, Wnt5a, through Foxa2, Nkx6-1 and Sox21. Third, the other signal, Wnt5a, inhibits the activating signal, Shh. We found that Wnt5a inhibits Shh by downregulating Myc, an upstream regulator of Shh. It has been shown that activator–inhibitor systems can generate Turing patterns, which are defined by their complex spatial variation and are known to drive cell fate patterning in development^{53,54,55}, suggesting that at E9.5, Shh and Wnt5a play similar roles.
Discussion
We developed FlowSig to infer intercellular communication activities that may depend on one another through coordinated GEMs. Key to our method is the construction of variables that measure either intercellular information (received and sent) or intracellular information. FlowSig applies graphical causal modeling and causal structure learning to scRNA-seq and ST data. As high-dimensional omics data continue to accumulate, the field will shift towards more predictive analyses, for which causal inference and causal structure learning models are likely to be key.
FlowSig complements the growing suite of methods for constructing multicellular representation programs. For example, DIALOGUE^{5} uses multilevel modeling to extract coordinated programs involving two or more cell types that have significantly correlated gene expression. Such coordinated programs are likely mediated through the communication-driven intercellular flows that FlowSig can infer. Other methods, such as MOFAcellular^{22} and scITD^{20}, decompose gene expression data into sample-specific and sample-shared latent GEMs that do not distinguish intercellular signal genes from intracellular signal-processing genes. MOFAtalk^{22} and Tensor-cell2cell^{21} extract coordinated programs of intercellular signaling from ligand–receptor interaction scores. Of the methods to which we compared FlowSig, the most similar is MultiNicheNet^{23}, which also constructs an intercellular signaling dependency network but relies on pretrained signaling databases to construct the dataset-specific network. By contrast, FlowSig uses conditional independence and conditional invariance testing to determine dependencies directly from the data.
To construct signal inflow and outflow variables, we used CellChat for nonspatial applications and COMMOT for spatial applications. There is a wide range of cell–cell communication inference methods^{11,13}, albeit with limited overlap in results^{32}. Therefore, the choice of method can affect FlowSig output. Alternative communication methods, including CellPhoneDB^{62} and LIANA^{32}, as well as alternative GEM construction methods, such as cNMF^{7}, can be used as input.
To reduce computation time, we inferred ‘coarse-grained’ intercellular flows, in which intracellular processing mechanisms are modeled through multi-gene GEMs. We assume that these GEMs contain regulatory TFs that mediate signal inflow and outflow. Although we can extract downstream TFs from GEMs, we do not know the precise gene regulatory networks (GRNs) that mediate these signals. One could use methods such as SCENIC^{56} to infer cell-wise enrichment for significant regulons or incorporate data that measure open chromatin accessibility^{57} to identify activated TFs. New data modalities, such as Phospho-seq^{58}, that measure post-translational response and thus signal inflow, will become useful for validation.
FlowSig has several limitations. Because FlowSig uses conditional independence and invariance testing based on partial correlation, the analyzed datasets must have sample sizes large enough to estimate dependencies with sufficient statistical significance^{59}. Partial correlation also assumes that the data are distributed according to a linear Gaussian model, which can be an unrealistic assumption^{60}. In addition, as the number of variables increases, so too does the number of false-positive relations inferred by the graph learning algorithms used by FlowSig. For nonspatial applications, to learn intercellular flows accurately, the perturbation must significantly shift the distribution of one or more variables. However, if the perturbation completely reduces signal variable expression to zero or induces expression of a variable not expressed in the control condition, partial correlation testing cannot be performed for the perturbed variable because it will have an s.d. of zero in one of the conditions. Finally, FlowSig infers a static graph, whereas intercellular flows are dynamic. It will therefore be important to extend FlowSig to capture spatiotemporal flows.
Methods
FlowSig model
FlowSig’s analyses are the same when applied to either nonspatial scRNA-seq or ST data. However, to compensate for the reduced precision of inflowing signal measurements from nonspatial scRNA-seq, we apply FlowSig only to scRNA-seq studies with an appropriate control condition and one or more perturbed conditions representing disease, external stimulation or biological time. We require input from intercellular communication inference and recommend using CellChat^{61} and COMMOT^{14} for nonspatial and spatial data, respectively. FlowSig provides functionality to construct GEMs from nonspatial data using pyLIGER and from spatial data using NSF. However, FlowSig flexibly allows users to use input from other cell–cell communication methods, such as CellPhoneDB^{62} or LIANA^{32}, or from other GEM construction methods, such as cNMF^{7}. We assume that, for each condition, the gene expression matrix (X) has been filtered and variance stabilized, for example by library-size normalization and log transformation. We note that the original, unnormalized counts are also needed to construct GEMs. We use the input to construct augmented ‘flow expression’ matrices for each biological condition that measure inflowing signals, GEM enrichment and outflowing signals, which we define as follows:

1.
We define inflowing signals differently for nonspatial versus spatial data. For nonspatial scRNA-seq data, for each significant ligand–receptor interaction (L–R) inferred from cell–cell communication analysis, we define the inflowing signal amount as \(R\times \overline{{TF}}\), where R is the receptor gene expression and \(\overline{{TF}}=\left({TF}_1 + \dots + {TF}_m\right)/m\) is the average gene expression of the known immediate downstream TF targets that we infer from pathway knowledge databases, such as OmniPath^{26} or exFINDER^{63}, where m is the number of known TF targets (see ‘Constructing downstream TF target sets to measure signal inflow’ below). For interactions involving multi-subunit receptors, \(L\text{–}{R}_{1}+\ldots +{R}_{n}\), where n is the number of receptor subunits, we use the geometric mean of receptor subunit gene expression values, \(R={\left({R}_{1}\cdots {R}_{n}\right)}^{\frac{1}{n}}\), to calculate the inflow signal amount. Our rationale is that receptor gene expression quantifies a cell’s ‘potential’ to receive intercellular signals, and the weighting by average downstream TF expression quantifies the actual downstream activation due to ligand–receptor binding and thus provides a more accurate measure of whether the cell actually received the signal. However, this definition is not exactly the same as the amount of ‘received ligand,’ which may not necessarily trigger downstream activation. By contrast, for ST data, we can measure the inflowing signal directly at each spatial spot using output from spatial cell–cell communication (CCC) inference methods, such as COMMOT^{14}. In general, for a given ligand (L) at ST spot (S), we sum over every L–R interaction in which L is inferred to partake and define the inflowing signal amount as \({\sum }_{R}{C}_{S}^{\,(L\text{–}R)}\), where \({C}_{S}^{\,(L\text{–}R)}\) is the inferred communication score for interaction L–R at spot S.

2.
We define GEM enrichment using output from matrix factorization methods, but GEM enrichment can be constructed from other dimensionality reduction methods in a similar manner. Matrix factorization methods decompose the gene expression matrix X into X = WH^{T}. If X is an N × G matrix, where N is the number of cells and G is the number of genes, then W is an N × K matrix describing cell membership in K GEMs, where K is the number of factors, H is a G × K matrix describing the gene loadings of each GEM, and H^{T} is the transpose of H. If we define \(\widetilde{W}\) to be the normalized factor membership matrix such that the rows sum to unity, we define each GEM enrichment variable as the column \({\widetilde{W}}_{k}\), where k = 1, …, K. To standardize GEM enrichment values so that they are on the same scale as log-transformed gene expression values, we use \(\log\, (1+\alpha \widetilde{W})\), where α is the same scaling factor used to transform the original unnormalized counts through \(Y=\log\, (1+\alpha \widetilde{X})\), and \(\widetilde{X}\) is the normalized gene expression matrix whose rows sum to unity.

3.
Outflowing signals are defined as the gene expression of signal ligands implicated from cell–cell communication analysis. In the case of multi-subunit ligands, \({L}_{1}+\ldots +{L}_{n}\text{–}R\), we use the geometric mean of ligand subunit gene expression values, \({\left({L}_{1}\cdots {L}_{n}\right)}^{\frac{1}{n}}\).
Therefore, we associate cells with a vector containing three types of measurements: signal inflow measurements, which are receptor gene expression weighted by the average expression of their known downstream TF genes; intracellular ‘module’ enrichment, which is the cell’s membership weight to a multigene set module, which measures how strongly the cell expresses those genes in the module; and signal outflow, which is ligand gene expression. When measuring signal inflow, we are not measuring from which cells the signals were sent, but rather how much signal has been received by the cell. Similarly, when measuring signal outflow, we are not measuring how much of the expressed signal ligand was actually received by other cells (as measured by, for example, signal inflow), but simply how much of the signal the cell is expressing.
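As a rough illustration of the three variable types defined above, the following NumPy sketch computes them for toy inputs. The function names, the toy arrays and the default value of `alpha` are hypothetical and are not taken from the FlowSig package; only the formulas follow the definitions in the text.

```python
import numpy as np

def geometric_mean(expr):
    """Geometric mean across subunit columns of a (cells x subunits) array."""
    expr = np.asarray(expr, dtype=float)
    return np.prod(expr, axis=1) ** (1.0 / expr.shape[1])

def inflow_nonspatial(receptor_expr, tf_expr):
    """Inflow = R * mean(TF), with R the geometric mean of receptor subunits."""
    R = geometric_mean(receptor_expr)
    tf_mean = np.asarray(tf_expr, dtype=float).mean(axis=1)
    return R * tf_mean

def gem_enrichment(W, alpha=1e4):
    """Row-normalize factor memberships, then apply log(1 + alpha * W~).

    alpha=1e4 is an assumed default; use the scaling factor that was
    applied to the expression counts.
    """
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    W_tilde = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    return np.log1p(alpha * W_tilde)

def outflow(ligand_expr):
    """Outflow = ligand expression (geometric mean over ligand subunits)."""
    return geometric_mean(ligand_expr)

# Toy data: 2 cells, a two-subunit receptor and two downstream TFs.
receptors = np.array([[4.0, 1.0], [0.0, 2.0]])
tfs = np.array([[1.0, 3.0], [2.0, 2.0]])
inflow = inflow_nonspatial(receptors, tfs)

memberships = np.array([[1.0, 3.0], [0.0, 0.0]])
gems = gem_enrichment(memberships, alpha=1.0)

ligand = np.array([[2.0], [5.0]])
out = outflow(ligand)
```

Note that the geometric mean sets the inflow to zero whenever any receptor subunit is unexpressed, which matches the idea that all subunits are needed to receive the signal.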
FlowSig applies algorithms from causal structure learning that are based on conditional independence testing and, if perturbation data are available, conditional invariance testing, to learn the directed intercellular flow network from the augmented flow expression matrices. Conditional independence testing infers the set of statistical dependencies from the data, whereas conditional invariance testing infers which variables shifted significantly in distribution after perturbation, for example, owing to disease or external stimulation. All conditional independence and conditional invariance tests are performed using partial Pearson’s correlation to generate sufficient statistics. Partial correlation testing relies on the potentially unrealistic assumption that gene expression values follow a linear multivariate Gaussian distribution. We nevertheless use it because it is significantly faster than nonparametric kernel-based tests, and because we can correct for biologically unrealistic edges by analyzing the learned completed partially directed acyclic graph (CPDAG) rather than a single directed acyclic graph (DAG). To learn the CPDAG, we use the UT-IGSP^{24} algorithm when analyzing nonspatial scRNA-seq with perturbation data and the GSP^{27} algorithm for spatial data with no considered perturbation. Both of these methods estimate a CPDAG containing both directed and undirected edges that corresponds to the Markov equivalence class inferred from conditional independence and conditional invariance testing. Graphically, the Markov equivalence class is the set of graphs that have the same skeleton, which is the undirected equivalent of the CPDAG, and the same v-structures, which are directed node triplets (x, y, z) whose edges are oriented such that \(x\to z\leftarrow y\).
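To make the partial-correlation test concrete, here is a minimal sketch of a conditional independence test. It assumes the common Fisher z-transform is used to turn the partial correlation into a P value, which the text does not specify; the function name and the simulated chain x → z → y are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def partial_corr_test(data, i, j, cond=()):
    """Test column i independent of column j given columns in `cond`.

    Returns the partial Pearson correlation and a two-sided P value
    from the Fisher z-transform (assumes roughly linear Gaussian data).
    """
    data = np.asarray(data, dtype=float)
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)  # precision matrix of the selected variables
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    r = float(np.clip(r, -0.999999, 0.999999))
    n = data.shape[0]
    z = np.sqrt(n - len(cond) - 3) * np.arctanh(r)
    p = 2 * norm.sf(abs(z))
    return r, p

# Simulated chain x -> z -> y: x and y are dependent,
# but become independent once we condition on z.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
z_var = x + rng.normal(size=2000)
y = z_var + rng.normal(size=2000)
sim = np.column_stack([x, y, z_var])

r_marginal, p_marginal = partial_corr_test(sim, 0, 1)
r_partial, p_partial = partial_corr_test(sim, 0, 1, cond=(2,))
```

In this toy chain the marginal correlation between x and y is large, whereas the partial correlation given z is close to zero, which is the pattern a structure learning algorithm uses to drop the direct x–y edge.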
FlowSig reorients undirected edges inferred from UT-IGSP or GSP according to the assumption that inflow signal nodes must be directed towards GEM nodes, GEM nodes must be directed towards outflow signal nodes and edges between two GEM nodes can be bidirectional.
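The orientation rule can be sketched on a binary CPDAG adjacency matrix as follows. This is an illustrative reading of the rule, not the package implementation: the function name, the node-type labels and the choice to drop undirected edges that have no permitted direction (for example, inflow to outflow) are assumptions.

```python
import numpy as np

# Permitted directions: inflow -> GEM, GEM -> GEM, GEM -> outflow.
ALLOWED = {("inflow", "gem"), ("gem", "gem"), ("gem", "outflow")}

def orient_undirected_edges(cpdag, node_types):
    """Orient undirected CPDAG edges; leave already-directed arcs untouched.

    cpdag[i, j] = 1 and cpdag[j, i] = 1 encodes an undirected edge i - j;
    cpdag[i, j] = 1 and cpdag[j, i] = 0 encodes a directed arc i -> j.
    """
    cpdag = np.asarray(cpdag, dtype=int)
    flow = np.zeros_like(cpdag)
    n = cpdag.shape[0]
    for i in range(n):
        for j in range(n):
            if i == j or not cpdag[i, j]:
                continue
            if cpdag[j, i]:
                # Undirected edge: keep only the permitted direction(s).
                if (node_types[i], node_types[j]) in ALLOWED:
                    flow[i, j] = 1
            else:
                # Directed arc: passed through unchanged in this sketch.
                flow[i, j] = 1
    return flow

types = ["inflow", "gem", "gem", "outflow"]
cpdag = np.array([
    [0, 1, 1, 0],   # 0 - 1 undirected, 0 -> 2 directed
    [1, 0, 1, 0],   # 1 - 2 undirected
    [0, 1, 0, 1],   # 2 - 3 undirected
    [0, 0, 1, 0],
])
flow = orient_undirected_edges(cpdag, types)
```

Because GEM-to-GEM is permitted in both directions, an undirected edge between two GEM nodes is kept as a bidirectional pair.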
We also use bootstrap aggregation to further validate the learned intercellular flow network. For nonspatial scRNA-seq, we bootstrap by resampling individual cells from each condition with replacement. However, for ST data, we need to account for the spatial dependencies that affect correlation. Therefore, we perform a version of block bootstrapping^{64} as follows. For each bootstrap realization, we divide the tissue into nonoverlapping spatial regions, which we can obtain from k-means clustering on the spatial coordinates, Leiden clustering of the spatial connectivity graph or predefined tissue region annotations. Then, within each ‘block,’ we resample with replacement. For each bootstrap realization, FlowSig outputs an adjacency matrix (A) that corresponds to the estimated CPDAG, where A_{ij} = 1 if an edge has been inferred and A_{ij} = 0 otherwise. For B bootstrap realizations, where B > 0 is the number of bootstrap samples, we then take the averaged adjacency, \(\tilde{A}=\,{B}^{-1}\mathop{\sum }\nolimits_{b=1}^{B}{A}^{(b)},\) as the final CPDAG. To remove low-confidence edges, for every edge in the equivalent undirected skeleton graph of the CPDAG, we calculate the total edge weight as \(w\left(i,{j}\right)=\,{A}_{{ij}}+{A}_{{ji}}\). For a specified threshold, defined by the parameter \({w}^{* } < 1\), if \(w\left(i,{j}\right) < {w}^{* }\), we remove the edge from the network, that is, we set \({A}_{{ij}}={A}_{{ji}}=0\). Once the bootstrap-aggregated CPDAG has been learned, biologically unrealistic arcs are removed and undirected edges are reoriented. For all directed arcs from the filtered CPDAG, we retain only arcs that are directed from inflow signals to GEMs, from GEMs to other GEMs or from GEMs to outflow signals. Similarly, we orient undirected edges so that they follow the same permitted directions. In the case that an edge connects one GEM to another, we include both directions in the final intercellular flow network.
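The bootstrap steps above can be sketched as follows. The function names and toy inputs are hypothetical, and the edge weight is computed from the averaged adjacency, which is our reading of the definition of w(i, j); the graph learning step that would produce each bootstrap adjacency matrix is not shown.

```python
import numpy as np

def block_bootstrap_indices(block_labels, rng):
    """Resample spot indices with replacement within each spatial block."""
    labels = np.asarray(block_labels)
    picks = []
    for block in np.unique(labels):
        members = np.flatnonzero(labels == block)
        picks.append(rng.choice(members, size=members.size, replace=True))
    return np.concatenate(picks)

def aggregate_adjacencies(adjacencies, w_star=0.5):
    """Average bootstrap adjacency matrices and drop low-confidence edges."""
    A_avg = np.mean(np.stack(adjacencies).astype(float), axis=0)
    weight = A_avg + A_avg.T          # total weight of each skeleton edge
    A_avg[weight < w_star] = 0.0      # removes both A_ij and A_ji
    return A_avg

# Block resampling for five spots split into two blocks.
rng = np.random.default_rng(0)
blocks = np.array([0, 0, 0, 1, 1])
resampled = block_bootstrap_indices(blocks, rng)

# Three toy bootstrap CPDAGs on three nodes.
boot = [
    np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]]),
    np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]]),
    np.array([[0, 1, 1], [0, 0, 0], [0, 0, 0]]),
]
A_final = aggregate_adjacencies(boot, w_star=0.5)
```

In the toy example, only the arc from node 0 to node 1 appears in every realization; the other two arcs each appear once, have weight 1/3 and are removed by the threshold.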
Identifying differentially flowing signal variables
When inferring intercellular flows, we prioritize ‘informative’ inflowing and outflowing signal variables. In the case of scRNA-seq analysis, where perturbation data are available, we consider only ‘differentially flowing’ inflow and outflow signals. For all applications in this study, we use a Mann–Whitney U (Wilcoxon rank-sum) test to assign variables as differentially flowing if their adjusted P values (after correction for multiple hypothesis testing) fall below a specified threshold (for example, adjusted P < 0.05), indicating statistical significance, and their log (FC) values are above a specified threshold (for example, log (FC) > 0.5). We analyze inflow signal variables separately from outflow signal variables. That is, we perform two separate Mann–Whitney U tests: one to identify differentially inflowing variables from only the set of inflow signal variables and one to identify differentially outflowing variables from only the set of outflow signal variables. When analyzing ST data, in which perturbation data are not as readily available, FlowSig instead prioritizes inflow and outflow variables that are spatially variable. For all applications considered, we retain variables for which the graph-based global Moran’s I, which we calculate using Squidpy^{65}, is above a specified threshold (for example, I > 0.1).
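A minimal sketch of the differential flow test for one set of variables (inflow or outflow) is shown below. Several details are assumptions rather than statements about FlowSig: the Benjamini–Hochberg correction, the definition of log (FC) as a log2 ratio of mean back-transformed expression, and all function names.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values (assumed correction method)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def differentially_flowing(control, perturbed, p_thresh=0.05, lfc_thresh=0.5):
    """Flag variables (columns) that differ between two conditions.

    Inputs are (cells x variables) arrays of log1p-transformed values.
    """
    control = np.asarray(control, dtype=float)
    perturbed = np.asarray(perturbed, dtype=float)
    pvals = np.array([
        mannwhitneyu(perturbed[:, k], control[:, k],
                     alternative="two-sided").pvalue
        for k in range(control.shape[1])
    ])
    padj = benjamini_hochberg(pvals)
    eps = 1e-9  # avoids division by zero for unexpressed variables
    log_fc = np.log2((np.expm1(perturbed).mean(axis=0) + eps)
                     / (np.expm1(control).mean(axis=0) + eps))
    # Follows the text literally (log FC above threshold); using
    # np.abs(log_fc) instead would also capture decreases.
    return (padj < p_thresh) & (log_fc > lfc_thresh)

rng = np.random.default_rng(1)
ctrl = rng.normal(1.0, 0.1, size=(200, 3))
pert = rng.normal(1.0, 0.1, size=(200, 3))
pert[:, 0] += 2.0  # only the first variable is shifted by the perturbation
flags = differentially_flowing(ctrl, pert)
```

Running the function once on the inflow variables and once on the outflow variables mirrors the two separate tests described in the text.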
Constructing downstream TF target sets to measure signal inflow
To measure signal inflow more accurately from nonspatial scRNA-seq data, we used prior knowledge from OmniPath^{26} to weight the gene expression of receptors that have been implicated in intercellular communication from prior cell–cell communication inference. For each ligand–receptor interaction, we searched the KinaseExtra and PathwayExtra modules for TFs that are the first downstream targets of the relevant receptors. Because OmniPath is built from human data, when constructing the downstream TFs for mouse data, we convert the mouse receptor genes implicated from communication inference to their human orthologs and perform the same procedure as for human data.
Interpreting gene expression modules
TFs are the mediators of signal transduction, that is, signal inflow, and the primary regulators of gene transcription, that is, signal outflow. To gain a deeper functional understanding of intercellular flows, it is important to interpret FlowSig output with respect to both GEMs, which describe the expression patterns of coordinated multi-gene sets, and individual GEM-specific TFs. For both nonspatial and spatial data, we consider only a priori known TFs, which in this case are based on TF lists provided by pySCENIC^{56}. Specifically, we use the list provided in allTFs_mm.txt for mouse data and the list provided in allTFs_hg38.txt for human data.
For nonspatial scRNA-seq data, we used pyLIGER^{35} to construct integrated GEMs. For a dataset describing \({C}\) conditions, pyLIGER uses joint matrix factorization to decompose each condition-specific gene expression counts matrix, \({X}^{\,(c)}\in {{\mathbb{Z}}}_{\ge 0}^{N\times G}\), where \({{\mathbb{Z}}}_{\ge 0}\) is the set of all non-negative integers, N is the number of cells and G is the number of genes, into K GEMs through \({X}^{\,(c)}={F}^{\,(c)}\cdot {\left(W+{V}^{\,\left(c\right)}\right)}^{T}\), where A^{T} denotes the transpose of a matrix A. Here, \({F}^{\,(c)}\in\) \({{\mathbb{R}}}_{\ge 0}^{N\times K}\) is the condition-specific factors matrix, describing the membership of the cells in condition c to each of the K GEMs, and \(W\in {{\mathbb{R}}}_{\ge 0}^{G\times K}\) and \({V}^{\,(c)}\in {{\mathbb{R}}}_{\ge 0}^{G\times K}\) are the condition-shared and condition-specific loadings matrices, respectively, describing the membership of genes to each of the K GEMs. Larger values of \({F}_{{nk}}^{\,\left(c\right)}\) correspond to greater membership of cell n in condition c to GEM k, while larger values of \({W}_{{gk}}+{V}_{{gk}}^{\,\left(c\right)}\) correspond to greater overall membership of gene g to GEM k. We use the columns of \({F}^{\,(c)}\) as our K GEM variables and use the columns of \(W+{V}^{\,(c)}\) to extract the top TFs for each GEM. For each module k, we sort genes by decreasing order of the loadings sum, \({W}_{{gk}}+{V}_{{gk}}^{\,\left(c\right)}\), and then extract the top contributing TFs in the order in which they appear in the sorted list.
For ST data, we use NSF^{4} to construct spatially resolved GEMs. In brief, NSF decomposes the gene expression counts, \(X\in {{\mathbb{Z}}}_{\ge 0}^{N\times G}\), which has N spots and G genes, into K GEMs (factors) through \(X={F}{W}^{T}\), where the factors matrix, \(F\in {{\mathbb{R}}}_{\ge 0}^{N\times K}\), describes the spotwise membership to the K GEMs (factors) and is fit using Gaussian processes whose means and covariances vary with spatial locations. The loadings matrix, \(W\in {{\mathbb{R}}}_{\ge 0}^{G\times K},\) describes the gene weight membership to each of the K GEMs. Larger values of F_{nk} indicate a higher enrichment of spot n for GEM k, which describes a spatially varying gene expression pattern; larger values of W_{gk} indicate greater membership of gene g to GEM k, that is, how much gene g contributes to the gene expression pattern. We use the columns of the factor matrix, F, as our K GEM variables and use the columns of loadings matrix, W, to extract the top contributing TFs for each spatial GEM. For each module k, we sort all genes by decreasing order of their W_{gk} value. We then extract the top contributing TFs by the order in which they appeared in the sorted list.
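The TF extraction step is the same for both factorizations once a genes × K loadings matrix is available (W for NSF, or W + V^{(c)} for pyLIGER). A small sketch, with a hypothetical function name and toy gene list, is given below.

```python
import numpy as np

def top_tfs_per_gem(loadings, gene_names, known_tfs, n_top=10):
    """Return the top contributing known TFs for each GEM.

    loadings: (genes x K) non-negative matrix of gene weights.
    known_tfs: iterable of TF gene symbols (for example, from a TF list file).
    """
    loadings = np.asarray(loadings, dtype=float)
    known = set(known_tfs)
    result = []
    for k in range(loadings.shape[1]):
        # Sort genes by decreasing loading for module k.
        order = np.argsort(-loadings[:, k], kind="stable")
        ranked_tfs = [gene_names[g] for g in order if gene_names[g] in known]
        result.append(ranked_tfs[:n_top])
    return result

genes = ["Shh", "Foxa2", "Actb", "Myc"]
toy_loadings = np.array([
    [0.9, 0.0],
    [0.5, 0.2],
    [0.8, 0.1],
    [0.1, 0.7],
])
top1 = top_tfs_per_gem(toy_loadings, genes, known_tfs={"Foxa2", "Myc"}, n_top=1)
top2 = top_tfs_per_gem(toy_loadings, genes, known_tfs={"Foxa2", "Myc"}, n_top=2)
```

Genes that are not in the known TF list (here Shh and Actb) are skipped even when they have the largest loadings, so the result depends on the TF list supplied.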
Inferring upstream TF regulators of spatial signals
To infer which TFs could potentially regulate inferred signal outflow variables, we borrow the approach of Cang et al.^{14}. After FlowSig infers the global intercellular flow network, for each signal outflow variable that is connected in the network, we first backtrack through the directed network to infer which spatial GEMs are connected to the signal outflow node. For each GEM with a directed edge to the signal outflow variable, we extract the top 10 contributing TFs (see ‘Interpreting gene expression modules’ in Methods). We then use the scikit-learn implementation of the Random Forest regression model^{66} to model the signal ligand gene expression as a function of the TF genes as independent variables. We then rank the TFs with respect to their feature importance, which is calculated from the Gini importance (mean decrease in impurity).
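The ranking step can be sketched with scikit-learn as follows. The function name, the hyperparameters and the simulated data are assumptions for illustration; `feature_importances_` is scikit-learn's impurity-based importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rank_tf_regulators(tf_expr, ligand_expr, tf_names, seed=0):
    """Rank candidate TFs by impurity-based importance for one ligand.

    tf_expr: (spots x TFs) expression of candidate TFs.
    ligand_expr: (spots,) expression of the outflow ligand.
    """
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(tf_expr, ligand_expr)
    importances = model.feature_importances_
    order = np.argsort(-importances)
    return [(tf_names[i], float(importances[i])) for i in order]

# Simulated example in which only the second TF drives the ligand.
rng = np.random.default_rng(0)
tf_matrix = rng.normal(size=(300, 3))
ligand = 2.0 * tf_matrix[:, 1] + 0.1 * rng.normal(size=300)
ranking = rank_tf_regulators(tf_matrix, ligand, ["TF_a", "TF_b", "TF_c"])
```

Impurity-based importances sum to one across the candidate TFs, so they are best read as a relative ranking within one ligand rather than compared across ligands.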
Experimental validation
Human cortical organoid generation
All experiments using human embryonic stem cells (hESCs) were approved by the University of California, Irvine (UCI) Human Stem Cell Research Oversight (hSCRO) Committee. The hESC line H9 was obtained from the WiCell Institute under a material-transfer agreement. The methods for hESC maintenance and cortical organoid production were previously established^{67,68}. In brief, H9 cells were maintained with inactivated mouse embryonic feeders (PMEF-CF, Millipore Sigma) on a 0.1% gelatin-coated plate and cultured in DMEM/F12 (HyClone) with 20% knockout serum replacement (KSR, Invitrogen), nonessential amino acids (NEAAs, Invitrogen), GlutaMAX (Invitrogen), 100 µg ml^{–1} primocin (InvivoGen), 0.1 mM β-mercaptoethanol (Invitrogen) and 10 ng ml^{–1} of fibroblast growth factor 2 (FGF2, Invitrogen) at 5% CO_{2} at 37 °C. The medium was refreshed daily. At ~70–80% confluency, H9 cells were differentiated into cortical organoids. After dissociation, 9,000 cells per well were plated into low-attachment V-bottom 96-well plates (Sumitomo Bakelite, MS9096V) to form aggregates in medium consisting of Glasgow’s Minimal Essential Medium (GMEM, Invitrogen), 20% KSR, 0.1 mM NEAAs, 100 µg ml^{–1} primocin, 0.1 mM β-mercaptoethanol, sodium pyruvate (Invitrogen), Wnt inhibitor IWR1-endo (Calbiochem) and TGFβ inhibitor SB431542 (Stemgent). ROCK inhibitor Y-27632 (20 µM, BioPioneer) was added to the medium from D0 to D6 to prevent cell death. From D0 to D18, the organoids were maintained at 5% CO_{2}, 37 °C, and half of the medium was changed every 2–3 d. From D18 to D35, the organoids were transferred to Petri dishes and cultured in medium consisting of DMEM/F12 with N2 (Invitrogen), GlutaMAX, chemically defined lipid concentrate (CDLC, Invitrogen) and 0.4% methylcellulose (Sigma) at 5% CO_{2}, 40% O_{2} and 37 °C. The medium was refreshed every 2–3 d.
Sample preparation and scRNA-seq
Organoids were collected at D18 (160 organoids) and D35 (25 organoids), dissociated into single cells and subjected to Evercode Cell Fixation (Parse Biosciences). The organoids were dissociated into a single-cell suspension using the Papain Dissociation System (Worthington), following the manufacturer’s manual. The dead cells in the single-cell suspension were removed using the EasySep Dead Cell Removal (Annexin V) Kit (STEMCELL Technologies), following the manufacturer’s manual. The cell suspension was then passed through a 40 µm cell strainer before assessing cell number and viability. Samples with total cell numbers >1,000,000 and >80% viability were further processed for cell fixation and freezing following the Parse Biosciences User Manual. The samples were then sent to the Genomics Research and Technology Hub, UCI, for barcoding and library preparation using the Evercode WT kit (Parse Biosciences). Ten thousand cells per sample and 50,000 reads per cell were targeted for sequencing. The sequencing was done using a NovaSeq 6000 (Illumina). Alignment was performed using split-pipe (Parse Biosciences).
Growth factor exposure and RT–qPCR
Between D15 and D21, the organoids were exposed to 400 ng ml^{–1} FGF8b or 50 ng ml^{–1} BMP4 (with 3 µM CHIR99021) in the culture medium. Untreated organoids were used as a control group. The organoid samples were collected at D35 and lysed using Buffer RLT (Qiagen). RNA was extracted using the RNeasy Mini Kit (Qiagen), following the manufacturer’s manual. Then, 1,000–3,000 ng RNA from each sample was converted to complementary DNA using SuperScript IV First-Strand Synthesis Reaction (Invitrogen). PowerTrack SYBR Green Master Mix (Applied Biosystems), cDNA and primers were mixed and loaded into 384-well plates (Invitrogen). The RT–qPCR was carried out using the QuantStudio 7 Real-Time PCR System (Applied Biosystems). The following primers were used: EOMES (amplicon size 225 bp) forward 5′-CGACAATAACATGCAGGGCAA-3′, reverse 5′-TCATTCAAGTCCTCCACGCC-3′; PAX6 (amplicon size 48 bp) forward 5′-TGTCCAACGGATGTGTGAGTA-3′, reverse 5′-CAGTCTCGTAATACCTGCCCA-3′; COUP-TF1 (NR2F1) (amplicon size 104 bp) forward 5′-ATCGTGCTGTTCACGTCAGAC-3′, reverse 5′-TGGCTCCTCACGTACTCCTC-3′; GAPDH (amplicon size 69 bp) forward 5′-CTCTCTGCTCCTCCTGTTCGAC-3′, reverse 5′-TGAGCGATGTGGCTCGGCT-3′.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The human cortical organoid scRNA-seq data are available at NCBI GEO under accession number GSE239542. The human pancreatic islet scRNA-seq data were originally published by Burkhardt et al.^{36}; the raw gene expression counts and treatment condition metadata were downloaded from NCBI GEO at accession GSE161465. The scRNA-seq data of human COVID-19 BALF samples were originally published in Liao et al.^{38}; the gene expression matrices and cell-type annotation metadata were downloaded from NCBI GEO at GSE145926. The spatial Stereo-seq data of mouse embryogenesis at time E9.5 were published originally in Chen et al.^{39}; the annotated spatial data were extracted from the file ‘Mouse_embryo_all_stage.h5ad’ hosted at https://db.cngb.org/stomics/mosta/download/.
Code availability
FlowSig is available to install as a Python package from GitHub at https://github.com/axelalmet/flowsig. All scripts used to generate the analysis in this manuscript are available on GitHub at https://github.com/axelalmet/FlowSigAnalysis_2023. The processed versions of all datasets used in this study, including cell-type annotation and cell–cell communication output from CellChat and COMMOT for nonspatial and spatial data, respectively, are available at: https://doi.org/10.5281/zenodo.10850397 (ref. ^{69}).
References
Wolpert, L. Positional information and pattern formation. Philos. Trans. R. Soc. Lond. B Biol. Sci. 295, 441–450 (1981).
Gao, C. et al. Iterative single-cell multi-omic integration using online learning. Nat. Biotechnol. 39, 1000–1007 (2021).
Velten, B. et al. Identifying temporal and spatial patterns of variation from multimodal data using MEFISTO. Nat. Methods 19, 179–186 (2022).
Townes, F. W. & Engelhardt, B. E. Nonnegative spatial factorization. Nat. Methods 20, 229–238 (2023).
Jerby-Arnon, L. & Regev, A. DIALOGUE maps multicellular programs in tissue from single-cell or spatial transcriptomics data. Nat. Biotechnol. 40, 1467–1477 (2022).
Sherman, T. D., Gao, T. & Fertig, E. J. CoGAPS 3: Bayesian non-negative matrix factorization for single-cell analysis with asynchronous updates and sparse data structures. BMC Bioinf. 21, 453 (2020).
Kotliar, D. et al. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-seq. eLife 8, e43803 (2019).
Zhao, Y., Cai, H., Zhang, Z., Tang, J. & Li, Y. Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data. Nat. Commun. 12, 5261 (2021).
Lotfollahi, M. et al. Biologically informed deep learning to query gene programs in single-cell atlases. Nat. Cell Biol. 25, 337–350 (2023).
Seninge, L., Anastopoulos, I., Ding, H. & Stuart, J. VEGA is an interpretable generative model for inferring biological network activity in single-cell transcriptomics. Nat. Commun. 12, 5684 (2021).
Almet, A. A., Cang, Z., Jin, S. & Nie, Q. The landscape of cell–cell communication through single-cell transcriptomics. Curr. Opin. Syst. Biol. 26, 12–23 (2021).
Armingol, E., Officer, A., Harismendy, O. & Lewis, N. E. Deciphering cell–cell interactions and communication from gene expression. Nat. Rev. Genet. 22, 71–88 (2021).
Wang, X., Almet, A. A. & Nie, Q. The promising application of cell–cell interaction analysis in cancer from single-cell and spatial transcriptomics. Semin. Cancer Biol. 95, 42–51 (2023).
Cang, Z. et al. Screening cell–cell communication in spatial transcriptomics via collective optimal transport. Nat. Methods 20, 218–228 (2023).
Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 22, 78 (2021).
Sachs, K., Perez, O., Pe’er, D., Lauffenburger, D. A. & Nolan, G. P. Causal protein-signaling networks derived from multiparameter single-cell data. Science 308, 523–529 (2005).
Chen, X. et al. An individualized causal framework for learning intercellular communication networks that define microenvironments of individual tumors. PLoS Comput. Biol. 18, e1010761 (2022).
Fischer, D. S., Schaar, A. C. & Theis, F. J. Modeling intercellular communication in tissues using spatial graphs of cells. Nat. Biotechnol. 41, 332–336 (2023).
Arnol, D., Schapiro, D., Bodenmiller, B., Saez-Rodriguez, J. & Stegle, O. Modeling cell–cell interactions from spatial molecular data with spatial variance component analysis. Cell Rep. 29, 202–211 (2019).
Mitchel, J. et al. Tensor decomposition reveals coordinated multicellular patterns of transcriptional variation that distinguish and stratify disease individuals. Preprint at bioRxiv https://doi.org/10.1101/2022.02.16.480703 (2022).
Armingol, E. et al. Context-aware deconvolution of cell–cell communication with Tensor-cell2cell. Nat. Commun. 13, 3665 (2022).
Flores, R. O. R., Lanzer, J. D., Dimitrov, D., Velten, B. & Saez-Rodriguez, J. Multicellular factor analysis of single-cell data for a tissue-centric understanding of disease. eLife 12, e93161 (2023).
Browaeys, R. et al. MultiNicheNet: a flexible framework for differential cell-cell communication analysis from multi-sample multi-condition single-cell transcriptomics data. Preprint at bioRxiv https://doi.org/10.1101/2023.06.13.544751 (2023).
Squires, C., Wang, Y. & Uhler, C. Permutation-based causal structure learning with unknown intervention targets. In Proc. 36th Conference on Uncertainty in Artificial Intelligence Vol. 124, 1039–1048 (PMLR, 2020).
Verma, T. S. & Pearl, J. Equivalence and Synthesis of Causal Models (1991).
Türei, D. et al. Integrated intra‐ and intercellular signaling knowledge for multicellular omics analysis. Mol. Syst. Biol. 17, 1–16 (2021).
Solus, L., Wang, Y. & Uhler, C. Consistency guarantees for greedy permutation-based causal inference algorithms. Biometrika 108, 795–814 (2021).
Bohnenpoll, T. et al. A SHH–FOXF1–BMP4 signaling axis regulating growth and differentiation of epithelial and mesenchymal tissues in ureter development. PLoS Genet. 13, e1006951 (2017).
Briscoe, J. & Small, S. Morphogen rules: design principles of gradient-mediated embryo patterning. Development 142, 3996–4009 (2015).
Zagorski, M. et al. Decoding of position in the developing neural tube from antiparallel morphogen gradients. Science 356, 1379–1383 (2017).
Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).
Dimitrov, D. et al. Comparison of methods and resources for cell-cell communication inference from single-cell RNA-seq data. Nat. Commun. 13, 3224 (2022).
O’Leary, D. D. M., Chou, S.-J. & Sahara, S. Area patterning of the mammalian cortex. Neuron 56, 252–269 (2007).
Jin, S. et al. Inference and analysis of cell–cell communication using CellChat. Nat. Commun. 12, 1088 (2021).
Lu, L. & Welch, J. D. PyLiger: scalable single-cell multi-omic data integration in Python. Bioinformatics 38, 2946–2948 (2022).
Burkhardt, D. B. et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat. Biotechnol. 39, 619–629 (2021).
Hartig, S. M. & Cox, A. R. Paracrine signaling in islet function and survival. J. Mol. Med. 98, 451–467 (2020).
Liao, M. et al. Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19. Nat. Med. 26, 842–844 (2020).
Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792 (2022).
Briscoe, J. & Thérond, P. P. The mechanisms of Hedgehog signalling and its roles in development and disease. Nat. Rev. Mol. Cell Biol. 14, 416–429 (2013).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Mavromatakis, Y. E. et al. Foxa1 and Foxa2 positively and negatively regulate Shh signalling to specify ventral midbrain progenitor identity. Mech. Dev. 128, 90–103 (2011).
Chiu, Y.-C. et al. Foxp2 regulates neuronal differentiation and neuronal subtype specification. Dev. Neurobiol. 74, 723–738 (2014).
Hatton, B. A. et al. N-myc is an essential downstream effector of Shh signaling during both normal and neoplastic cerebellar growth. Cancer Res. 66, 8655–8661 (2006).
Rao, G., Pedone, C. A., Coffin, C. M., Holland, E. C. & Fults, D. W. c-Myc enhances sonic hedgehog-induced medulloblastoma formation from nestin-expressing neural progenitors in mice. Neoplasia 5, 198–204 (2003).
Servén, D., Brummitt, C. & Abedi, H. pyGAM: Generalized additive models in Python. Preprint at https://doi.org/10.5281/ZENODO.1208724 (2018).
Pöschl, J. et al. Expression of BARHL1 in medulloblastoma is associated with prolonged survival in mice and humans. Oncogene 30, 4721–4730 (2011).
Gulacsi, A. & Anderson, S. A. Shh maintains Nkx2.1 in the MGE by a Gli3-independent mechanism. Cereb. Cortex 16, i89–i95 (2006).
Brancaccio, A. et al. Requirement of the forkhead gene Foxe1, a target of sonic hedgehog signaling, in hair follicle morphogenesis. Hum. Mol. Genet. 13, 2595–2606 (2004).
Briscoe, J. et al. Homeobox gene Nkx2.2 and specification of neuronal identity by graded Sonic hedgehog signalling. Nature 398, 622–627 (1999).
Katoh, M. & Katoh, M. Transcriptional mechanisms of WNT5A based on NF-κB, Hedgehog, TGFβ, and Notch signaling cascades. Int. J. Mol. Med. 23, 763–769 (2009).
Gierer, A. & Meinhardt, H. A theory of biological pattern formation. Kybernetik 12, 30–39 (1972).
Müller, P. et al. Differential diffusivity of Nodal and Lefty underlies a reaction-diffusion patterning system. Science 336, 721–724 (2012).
Glover, J. D. et al. Hierarchical patterning modes orchestrate hair follicle morphogenesis. PLoS Biol. 15, e2002117 (2017).
Raspopovic, J., Marcon, L., Russo, L. & Sharpe, J. Digit patterning is controlled by a Bmp–Sox9–Wnt Turing network modulated by morphogen gradients. Science 345, 566–570 (2014).
Van de Sande, B. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat. Protoc. 15, 2247–2276 (2020).
Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).
Blair, J. D. et al. Phospho-seq: Integrated, multimodal profiling of intracellular protein dynamics in single cells. Preprint at bioRxiv https://doi.org/10.1101/2023.03.27.534442 (2023).
Gamella, J. L., Taeb, A., Heinze-Deml, C. & Bühlmann, P. Characterization and greedy learning of Gaussian structural causal models under unknown interventions. Preprint at https://arxiv.org/abs/2211.14897 (2022).
Li, C. & Fan, X. On nonparametric conditional independence tests for continuous variables. Wiley Interdiscip. Rev. Comput. Stat. 12, e1489 (2020).
Jin, S. et al. Inference and analysis of cell–cell communication using CellChat. Nat. Commun. 12, 1088 (2021).
Garcia-Alonso, L. et al. Single-cell roadmap of human gonadal development. Nature 607, 540–547 (2022).
He, C., Zhou, P. & Nie, Q. exFINDER: identify external communication signals using single-cell transcriptomics data. Nucleic Acids Res. 51, e58 (2023).
Tang, L., Schucany, W. R., Woodward, W. A. & Gunst, R. F. A Parametric Spatial Bootstrap. Report No. SMU-TR-337 (Southern Methodist University, 2006).
Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods 19, 171–178 (2022).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Watanabe, M. et al. Self-organized cerebral organoids with human-specific features predict effective drugs to combat Zika virus infection. Cell Reports 21, 517–532 (2017).
Watanabe, M. et al. TGFβ superfamily signaling regulates the state of human stem cell pluripotency and capacity to create well-structured telencephalic organoids. Stem Cell Reports 17, 2220–2238 (2022).
Almet, A. Processed datasets used in Almet et al. (2024), "Inferring pattern-driving intercellular flows from single-cell and spatial transcriptomics". Zenodo https://doi.org/10.5281/zenodo.10850397 (2024).
Acknowledgements
This work was (partially) supported by NSF grants DMS1763272, CBET2134916 and MCB2028424, NIH grants R01AR079150 and R01DE030565, the Chan Zuckerberg Initiative grant AN0000000062, and a grant from the Simons Foundation (594598). This work was also supported by NIH grant R00HD096105, NSF grant RECODE2225624, a New Investigator Faculty Award and startup funds from the UCI School of Medicine (M.W.), and the FRAXA Postdoctoral Fellowship (Y.C.T.). We thank C. Squires for useful discussions about the UT-IGSP algorithm and X. Wang for initial preprocessing of the cortical organoid scRNA-seq data.
Author information
Contributions
A.A.A. and Q.N. conceived the method. A.A.A. implemented the method. A.A.A. generated the numerical results. Y.C.T. and M.W. generated the experimental results. A.A.A., Y.C.T., M.W. and Q.N. interpreted the results, generated the figures and wrote the paper. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks David Fischer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Notes, Supplementary Results, Supplementary Tables 1–4 and Supplementary Figures 1–11.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Almet, A.A., Tsai, YC., Watanabe, M. et al. Inferring pattern-driving intercellular flows from single-cell and spatial transcriptomics. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02380-w
DOI: https://doi.org/10.1038/s41592-024-02380-w