Abstract
From single-cell RNA-sequencing (scRNA-seq) and spatial transcriptomics (ST), one can extract high-dimensional gene expression patterns that can be described by intercellular communication networks or decoupled gene modules. These two descriptions of information flow are often assumed to occur independently. However, intercellular communication drives directed flows of information that are mediated by intracellular gene modules, in turn triggering outflows of other signals. Methodologies to describe such intercellular flows are lacking. We present FlowSig, a method that infers communication-driven intercellular flows from scRNA-seq or ST data using graphical causal modeling and conditional independence. We benchmark FlowSig using newly generated experimental cortical organoid data and synthetic data generated from mathematical modeling. We demonstrate FlowSig’s utility by applying it to various studies, showing that FlowSig can capture stimulation-induced changes to paracrine signaling in pancreatic islets, demonstrate shifts in intercellular flows due to increasing COVID-19 severity and reconstruct morphogen-driven activator–inhibitor patterns in mouse embryogenesis.
Main
Cells communicate through biochemical signaling to organize biological activities. Inflows of intercellular signals are processed through intracellular gene regulatory mechanisms involving transcription factors (TFs) and their downstream targets, which result in outflows of other signals. These spatiotemporal flows of ‘cause and effect’ drive every biological process. One famous example of an ‘intercellular flow’ is Wolpert’s French Flag Problem1, wherein a spatially propagating morphogen drives coordinated expression of multiple TFs, generating the eponymous ‘flag’. Biological homeostasis is maintained by coordination between intercellular flows, which is perturbed in disease. Disentangling these intercellular flows is critical to understanding health and disease.
scRNA-seq and ST generate simultaneous measurements of 10,000–20,000 genes, yielding high-dimensional snapshots of gene expression in biological tissue. From these data, patterns can be extracted that vary along axes such as trajectory, disease status, space and time. There are two primary categories of methods to extract such patterns. First, one can construct gene expression modules (GEMs), defined by gene sets such that intra-set expression is more correlated than is gene expression between sets2,3,4,5,6,7,8,9,10. Second, one can infer ligand–receptor interaction networks that facilitate intercellular communication directly from non-spatial scRNA-seq11,12,13 or spatial data13,14,15. The interplay between both ligand–receptor interactions and GEMs drives intercellular flows across tissues, but there are few methods that can infer such flows. We aim to address this gap.
Two studies closely related to this work, by Sachs et al.16 and Chen et al.17, used graphical causal modeling to learn dependencies directly from single-cell data. Sachs et al. inferred a signaling network from multi-perturbation flow cytometry data of phosphoproteins measured in CD4+ T cells. Chen et al. inferred person-specific networks between GEMs generated from bulk RNA-seq and scRNA-seq data sampled from head and neck squamous cell carcinoma tumors. There is also the node-centric expression model by Fischer et al.18 and the spatial variance component analysis framework by Arnol et al.19, which infer how gene expression depends on the local environment. Other methods construct ‘multicellular representations’ of gene expression programs coordinated by several cell states5,20,21,22,23 (see Supplementary Table 1 for a comparison of methods).
Here, we present FlowSig, a method that identifies ligand–receptor interactions whose inflows are mediated by intracellular processes and drive subsequent outflow of other intercellular signals. Using graphical modeling and conditional independence testing, FlowSig learns a completed partial directed acyclic graph (CPDAG) describing intercellular flows between three types of constructed variables: inflowing signals, intracellular gene modules and outflowing signals. To reduce the false discovery rate, we orient the CPDAG according to the biological assumption that inflowing intercellular signals are processed by intracellular modules before being converted to other outflowing signals. FlowSig can be applied to either non-spatial scRNA-seq or ST data. To analyze non-spatial scRNA-seq data, in which ligand–receptor interactions are harder to infer accurately, we incorporate information gained from ‘control versus perturbed’ studies, in which the system has been altered by, for example, external stimulation, disease or time. FlowSig uses differential expression analysis and conditional invariance testing to infer the set of inflow and outflow variables that most significantly shift in distribution and thus most likely drive intercellular flows. In doing so, we reduce the set of possible graphs that could have generated the data and learn a more accurate CPDAG. We validate FlowSig using (1) synthetic data generated from simulations of mathematical models of intercellular flows and (2) novel experimental data generated from cortical organoids. We benchmark FlowSig against several methods and show the unique insights gained from the platform. We apply FlowSig to scRNA-seq of stimulated human pancreatic islets, identifying specific changes due to stimulation. We analyze the case of multiple perturbations due to different COVID-19 severities resulting in distinct intercellular flow mechanisms. Applying FlowSig to ST data of mouse embryogenesis, we uncover regulatory TFs that enable a ‘flow module’ resembling Turing’s activator–inhibitor system.
Results
FlowSig uses gene expression measurements and output from cell–cell communication inference to learn intercellular flows that describe directed dependencies. These dependencies are oriented from inflowing intercellular signals to intracellular GEMs, which could be individual TFs or cellwise enrichment for correlated gene sets, and from GEMs to outflowing intercellular signals (Fig. 1a). We model the intercellular flows using graphical causal models, where nodes represent the flow variables (inflowing signals, GEMs and outflowing signals), and learn a directed graph using conditional independence testing and the unknown target interventional greedy sparsest permutation algorithm (UT-IGSP)24. Because statistical conditional independence relations can identify, at best, a set of equivalent directed acyclic graphs (DAGs) with the same undirected skeleton graph and directed v-structures (connected node triplets (x, y, z) with the directed edges x → y ← z)25, we use UT-IGSP to learn an initial CPDAG, which can contain both directed arcs and undirected edges. We then construct the intercellular flow network by reorienting undirected edges and removing biologically unrealistic arcs so that edges are directed from inflowing signals to GEMs, between GEMs and from GEMs to outflowing signals.
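As a rough illustration of this final orientation step, the following sketch filters and orients the edges of a toy CPDAG according to variable type. The function name, the edge representation, the node names and the choice to keep undirected GEM–GEM edges in both directions are all assumptions made for illustration; this is not FlowSig's actual implementation.

```python
# Allowed edge directions under the flow assumption:
# inflow -> GEM, GEM -> GEM, GEM -> outflow.
ALLOWED = {("inflow", "gem"), ("gem", "gem"), ("gem", "outflow")}


def orient_flow_network(arcs, undirected, node_type):
    """Return the directed edges that are consistent with the flow assumption.

    arcs: iterable of (source, target) directed edges from the CPDAG.
    undirected: iterable of (u, v) undirected edges from the CPDAG.
    node_type: dict mapping node name to "inflow", "gem" or "outflow".
    """
    flows = set()
    # Keep a directed arc only if its direction is biologically allowed.
    for u, v in arcs:
        if (node_type[u], node_type[v]) in ALLOWED:
            flows.add((u, v))
    # Orient each undirected edge in every allowed direction
    # (assumption: GEM-GEM edges are kept in both directions).
    for u, v in undirected:
        for a, b in ((u, v), (v, u)):
            if (node_type[a], node_type[b]) in ALLOWED:
                flows.add((a, b))
    return flows


# Toy CPDAG with hypothetical node names.
node_type = {
    "inflow_A": "inflow",
    "GEM_1": "gem",
    "GEM_2": "gem",
    "outflow_B": "outflow",
}
arcs = [("GEM_1", "outflow_B"), ("outflow_B", "GEM_2"), ("inflow_A", "outflow_B")]
undirected = [("GEM_1", "inflow_A"), ("GEM_1", "GEM_2")]
flow_edges = orient_flow_network(arcs, undirected, node_type)
```

In this toy example, the outflow-to-GEM arc and the direct inflow-to-outflow arc are discarded, and the undirected GEM_1–inflow_A edge is oriented from the inflow to the GEM.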
Although the core steps in using FlowSig to analyze non-spatial scRNA-seq and ST data are the same, there are several differences. For non-spatial scRNA-seq data, we must overcome a fundamental issue: it is not possible to directly measure the intercellular signals that each cell received. Therefore, we impose two constraints (Fig. 1b). First, we consider only studies comparing a ‘control’ condition against one or more perturbed conditions, for example, healthy versus diseased. We use the additional information gained from perturbation data through conditional invariance testing to narrow down the set of possible flow graphs, reducing the occurrence of false positive edge discovery. Second, for each ligand–receptor interaction inferred from cell–cell communication inference, we extract downstream TF targets from the OmniPath database26 to measure signal inflow. Receptor gene expression quantifies the potential for a cell to receive an intercellular signal, and downstream TF expression quantifies the extent to which the cell actually received the signal; we define signal inflow as the product of receptor gene expression and the average expression of downstream TF targets.
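The inflow definition for non-spatial data can be sketched in a few lines. The function name, the input layout (one list of per-cell values per gene) and the toy expression values below are hypothetical; only the formula, receptor expression multiplied by the mean expression of downstream TF targets, comes from the text.

```python
from statistics import mean


def signal_inflow(receptor_expr, tf_target_expr):
    """Per-cell inflow = receptor expression * mean downstream TF expression.

    receptor_expr: list of receptor expression values, one per cell.
    tf_target_expr: dict mapping TF name to a per-cell expression list.
    """
    inflow = []
    for cell, receptor in enumerate(receptor_expr):
        tf_values = [targets[cell] for targets in tf_target_expr.values()]
        inflow.append(receptor * mean(tf_values))
    return inflow


# Toy data for three cells (values are made up for illustration).
receptor = [2.0, 0.0, 1.0]
tf_targets = {"TF_A": [1.0, 3.0, 0.0], "TF_B": [3.0, 1.0, 0.0]}
inflow = signal_inflow(receptor, tf_targets)
```

The product form means that inflow is zero whenever either the receptor or all of its downstream TF targets are unexpressed: the second cell expresses the targets but not the receptor, and the third expresses the receptor but none of the targets.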
ST technologies are still in their infancy, so there are fewer control versus perturbed ST studies than scRNA-seq studies. However, we can use communication methods such as COMMOT14 to spatially constrain and measure the amount of inflowing signal more accurately (Fig. 1c). Therefore, to analyze ST data, FlowSig uses the greedy sparsest permutation algorithm (GSP)27, which does not use perturbation data.
Synthetic validation of FlowSig
We first benchmarked FlowSig using synthetic data generated from mathematical models of intercellular flows (see ‘Generating synthetic data from model simulations’ in the Supplementary Notes). For simplicity, we modeled GEMs as individual TFs. We considered three scenarios. In the first scenario, we examined unidirectional intercellular flow induced by SHH signaling that generates outflow of BMP4 through FOXF1 (ref. 28), with flows learned over a set of five nodes: SHH ligand, unbound PTCH1 receptor, SHH inflow due to SHH–PTCH1 binding, FOXF1 TF and BMP4 ligand (Fig. 2a). The second scenario involved SHH-induced tissue patterning, characterized by the expression of NKX2.2, OLIG2, PAX6 and IRX3 (ref. 29). Flows were inferred over a set of seven nodes: SHH ligand, unbound PTCH1 receptor, SHH inflow (SHH–PTCH1 complex), NKX2.2 TF, OLIG2 TF, PAX6 TF and IRX3 TF (Fig. 2b). In the third scenario, we explored competition between SHH and BMP4 in driving dorsoventral patterning30. Flows were learned over a set of nine nodes: SHH ligand, unbound PTCH1 receptor, inflowing SHH (SHH–PTCH1 complex), BMP4 ligand, unbound BMPR1A and BMPR2 receptors, inflowing BMP4 (BMP complex) and three GEM variables, dorsal, intermediate and ventral (Fig. 2c). We wanted to validate two core FlowSig assumptions. The first is that accurate measurement of inflowing signal is needed to infer intercellular flows. For all models, we compared the use of bound ligand–receptor complex as signal inflow to total receptor expression (free receptor plus bound complex), the latter of which is directly measured from scRNA-seq and ST data. The second is that including perturbation data increases the accuracy of intercellular flow inference. We quantified the accuracy of FlowSig by measuring the true positive rate (TPR) and true negative rate (TNR) for each scenario. For all scenarios (Fig. 2d–f), we found that the average TPR does not change if we use bound receptor expression to measure signal inflow, or if perturbation data are introduced. However, measuring inflow using bound receptor increases the average TNR. This is especially true for the models describing SHH-driven patterning and competition between SHH and BMP4, in which flows are more complex and multidirectional (Fig. 2e,f). Incorporating perturbation data through conditional invariance testing reduces the variation in TNR values, both in terms of the interquartile range and outliers, resulting in ‘tighter’ estimates of intercellular flows. These results suggest that FlowSig reduces the number of false positive discoveries inferred from baseline GSP and UT-IGSP algorithms.
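For concreteness, TPR and TNR for an inferred directed network can be computed as in the sketch below. It assumes that every ordered pair of distinct nodes is a candidate edge, which may differ from the candidate set used in the benchmark itself; the function name and the three-node toy network are hypothetical.

```python
def tpr_tnr(true_edges, inferred_edges, nodes):
    """Return (TPR, TNR) of inferred directed edges against a ground truth.

    Assumption: every ordered pair of distinct nodes is a candidate edge.
    """
    candidates = [(u, v) for u in nodes for v in nodes if u != v]
    positives = [e for e in candidates if e in true_edges]
    negatives = [e for e in candidates if e not in true_edges]
    # True positives: real edges that were recovered.
    tp = sum(e in inferred_edges for e in positives)
    # True negatives: absent edges that were correctly left out.
    tn = sum(e not in inferred_edges for e in negatives)
    return tp / len(positives), tn / len(negatives)


# Toy chain inflow -> tf -> outflow with one missed and one spurious edge.
nodes = ["inflow", "tf", "outflow"]
true_edges = {("inflow", "tf"), ("tf", "outflow")}
inferred_edges = {("inflow", "tf"), ("inflow", "outflow")}
tpr, tnr = tpr_tnr(true_edges, inferred_edges, nodes)
```

Here one of the two true edges is recovered (TPR = 0.5) and three of the four absent edges are correctly excluded (TNR = 0.75); a spurious edge lowers the TNR without affecting the TPR.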
Benchmarking FlowSig against multicellular representation methods
To provide additional insight into FlowSig’s capabilities, we benchmarked it against methods that construct multicellular program representations from scRNA-seq and ST data, including DIALOGUE5, scITD20, MOFAcellular22, MOFAtalk22, MultiNicheNet23 and Tensor-cell2cell21. We also compared FlowSig with direct CellChat output (Supplementary Table 1). All methods were benchmarked using an scRNA-seq dataset of stimulated peripheral blood mononuclear cells sampled from people with lupus, which was generated by Kang et al.31. We summarize key points here (see ‘Comparison to other methods’ in the Supplementary Results for a full discussion). We also evaluated FlowSig’s robustness to different inputs constructed by alternative cell–cell communication and GEM construction methods (see ‘Robustness of FlowSig to different input methodologies’ in the Supplementary Results) and found that different cell–cell communication methods can result in different sets of intercellular flows, owing to discrepancies in inferred ligand–receptor interactions; however, FlowSig will infer intercellular flows through GEMs constructed by different methods that are enriched for the same regulatory TFs.
Analyzing CellChat output directly suggested there were 6,886 potential inflow-to-outflow relationships. Of these, 3,167 were shared across both conditions, 1,511 were unique to the control condition and 2,208 were unique to the stimulated condition. From CellChat results alone, we cannot infer which of these relations are truly intercellular flows, that is, whether the second interaction depends on the first, and we cannot infer the intracellular mediators of these intercellular flows. By contrast, FlowSig inferred only 44 intercellular flows across 6 signal inflow variables, 20 GEMs and 12 signal outflow variables (see ‘Comparison to other methods’ in the Supplementary Results and Supplementary Fig. 1).
DIALOGUE identified four multicellular programs (MCPs) from the Kang et al. dataset. MCP1 was enriched across CD14+ monocytes, CD8+ T cells and B cells, suggesting that there was coordination through intercellular flows between these cell types (Supplementary Fig. 2a). In MCP4, CD8+ T and CD14+ cells exhibited significant differential expression between conditions (Supplementary Fig. 2b). DIALOGUE identified upregulation of the signal ligand CCL4 (in CD8+ T cells), which FlowSig inferred to drive signal outflow. scITD decomposed the dataset into two latent factors (Supplementary Fig. 3a): Factor 1 was significantly enriched for FlowSig signal outflow ligands CXCL10, CXCL11 and TNFSF10 (Supplementary Fig. 3b) and intercellular-flow-driving interactions (Supplementary Fig. 3c). MOFAcellular decomposed the dataset into five factors (Supplementary Fig. 4a): Factor 1 was enriched for signal outflow variables CXCL11 and TNFSF10 (Supplementary Fig. 4b). Applying MOFAtalk to the ligand–receptor interaction scores inferred from LIANA32 yielded four factors (Supplementary Fig. 4c): Factor 1 was enriched for the interactions CCL2–CCR1 and CCL8–CCR1 (between CD14+ cells, dendritic cells (DCs) and FGR3+ cells) and signal outflow of TNFSF13B (Supplementary Fig. 4d). Tensor-cell2cell extracted six factors from ligand–receptor interaction scores inferred from LIANA (Supplementary Fig. 5a): CD14+ cells, DCs and FGR3+ cells were identified as key signal receiver groups (Supplementary Fig. 5b). Clustering ligand–receptor interactions identified that CCL2–CCR1, CCL3–CCR1, CCL4–CCR1 and CCL8–CCR1 were upregulated after stimulation (Supplementary Fig. 5c). Finally, MultiNicheNet identified CCL2–CCR1, CCL3–CCR1, CCL4–CCR1 and CCL8–CCR1 as differentially expressed between conditions (Supplementary Fig. 6a). MultiNicheNet also identified outflow of CXCL10, CXCL11 and FASLG and inflow into CCR1 (Supplementary Fig. 6b).
Validating FlowSig using a cortical organoid system
We tested FlowSig against new scRNA-seq data generated from an organoid model of cortical development, for which fibroblast growth factor (FGF) and bone morphogenetic protein (BMP) signaling are known to drive patterning33. We generated cortical organoids from human embryonic stem cells and collected the organoids at day 18 (D18) and D35 in culture for scRNA-seq analysis. In the organoid system, the cell fate for cortical identity is determined by D18, and signal responses to FGF and BMP, as measured by graded TF expression, are established by D35. Continual exposure to FGF and BMP signaling drives drastic changes in gene expression, so between D18 and D35 there are changes in both transcription and cell type composition as the organoids mature. Hence, when applying FlowSig to this dataset, rather than assume the D18 and D35 populations are sampled from the same underlying ‘steady state’ distributions of gene transcription, we treat the D35 data as a ‘perturbed’ form of the ‘control’ D18 data due to exposure to FGF and BMP signaling.
We identified differentially flowing signals from the 77 unique ligand–receptor interactions identified by CellChat34 analysis. FlowSig identified 26 differentially inflowing signals (Fig. 3a) and 16 differentially outflowing signals (Fig. 3b), including FGF and BMP (see ‘Identifying differentially flowing signal variables’ in the Methods). We used pyLIGER35 to construct 20 GEMs from 2,793 highly variable genes (Fig. 3c and Supplementary Fig. 8a–c). Cells from the D18 timepoint were more enriched for GEM-2 through GEM-4, GEM-7, GEM-10, GEM-18 and GEM-19, whereas cells from the D35 timepoint were enriched for GEM-8, GEM-11, GEM-12, GEM-16 and GEM-20. Altogether, FlowSig constructed 62 variables for intercellular flow inference. After inference, we aggregated signal inflow variables by their parent signaling pathway. For example, we classified both FGFR1 and FGFR3 inflows under the FGF signaling pathway, which were activated by received FGF2 ligand.
To determine the dominant drivers of intercellular flow, we ranked signal inflow variables by their total edge frequency. We found that FGF, midkine (MK), pleiotrophin (PTN) and neuregulin (NRG) were drivers of intercellular flow. FGF inflow, in particular, drove signal outflow, including BMP4, insulin-like growth factor-II (IGF-II), nerve growth factor (NGF), NRG1 and NRG3, through numerous GEMs (Fig. 3d). By examining the top GEM-specific TFs mediated by FGF-induced flow (see ‘Interpreting gene expression modules’ in the Methods), we found that EOMES could be a potential regulatory candidate of FGF inflow. We observed that BMP inflow was regulated through far fewer GEMs (Fig. 3e) and could be mediated by PAX6 or NR2F1.
To verify FlowSig analysis, we analyzed a perturbed organoid culture in which we activated the FGF and BMP signaling pathways by adding FGF8b and BMP4, respectively, between D15 and D21. We collected organoid samples at D35 and subjected them to quantitative reverse transcription PCR (RT–qPCR) for gene expression analysis (Fig. 3f,g). Compared with the non-exposed control organoids, we observed that activating FGF signaling significantly downregulated the expression of EOMES (Fig. 3f), whereas elevating BMP signaling simultaneously downregulated PAX6 and upregulated NR2F1 (Fig. 3g). These experimental data demonstrate that FlowSig accurately captures the dominant drivers of intercellular flows from real biological datasets.
FlowSig identifies changes in intercellular flows due to stimulation
To demonstrate how FlowSig recovers intercellular flows driven by an external perturbation, we analyzed scRNA-seq data of human pancreatic islets stimulated by interferon-γ (IFN-γ)36. We constructed ten GEMs using pyLIGER that aligned with the five cell-type clusters, alpha, beta 1 to 3, and delta, that we identified independently (Fig. 4a and Supplementary Fig. 9a–c). We used these cell type annotations as input for preliminary CellChat analysis; that is, for each condition, CellChat infers significant pairwise ligand–receptor interactions between the cell groups defined by these cell-type labels.
IFN-γ stimulation increased inflow of the FGF signaling pathway through FGFR1 (through ligands FGF7 and FGF9, specifically), interleukin-6 (IL-6) through IL-6R and IL-6ST, MIF through CD74 and CD44, MDK through NCL and SST through SSTR2 (Fig. 4b). IFN-γ stimulation increased outflow of GCG, INHBA and NAMPT, and decreased outflow of ANGPTL2, SPP1, transforming growth factor β1 (TGFβ1), tumor necrosis factor superfamily member 12 (TNFSF12) and UCN3 (Fig. 4c). FlowSig identified that FGF, IL-6, MDK and SST were the dominant drivers of intercellular flows that drove the outflow of GCG, INHBA, NAMPT, SPP1, TGFB1, TNFSF12 and UCN3 through GEM-1, GEM-3, GEM-5 and GEM-6 (Fig. 4d). We observed that GEM-1 is enriched in both the alpha and beta 1 clusters, GEM-3 and GEM-5 are enriched in the alpha cluster, GEM-4 is enriched in the beta 2 cluster and GEM-6 is enriched in the beta 1 cluster (Fig. 4a), suggesting that intercellular flows are driven by cell type. These results agree with previous work establishing that, in the pancreas, alpha cells are the main secretors of GCG and beta cells are the main secretors of UCN3, and that SST regulates both GCG and UCN3 (ref. 37). We observed that the same TFs (ID1, NR1D1, TFF3 and ZNF419) contributed to all of these GEMs, suggesting that these TFs mediate intercellular flows across both conditions.
To further explore the effects of IFN-γ stimulation, we split the global intercellular flow network into two networks. First, we constructed a network corresponding to outflow signals upregulated by IFN-γ stimulation by taking outflowing signals that were differentially expressed for the IFN-γ condition (adjusted P < 0.05 and log2(fold change (FC)) > 0.5), the GEMs that connected to these outflow variables and the signal inflow nodes connected to these GEMs. From this node set, we then extracted the subgraph from the global intercellular flow network (Fig. 4e). The second network corresponded to the intercellular flow network of outflowing signals downregulated by IFN-γ (adjusted P < 0.05 and log2(FC) < –0.5) and was constructed in a similar manner (Fig. 4f). Both networks contain the same signal inflow nodes and share near-identical GEMs. However, GEM-3, which drives GCG and NAMPT outflow and is itself regulated by SSTR2 (SST) signaling, is present only in the ‘upregulated’ network, suggesting that it has a specialized role activated by IFN-γ stimulation. GEM-3 is primarily enriched within alpha cells, suggesting that stimulation drives outflow of GCG and NAMPT from alpha cells. All other inflowing signals and GEMs are shared across both networks, suggestive of dual regulatory roles. For example, IL-6 signaling drives both upregulation of INHBA and NAMPT and downregulation of SPP1, TGFB1 and UCN3 (through GEM-4).
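The construction of the ‘upregulated’ and ‘downregulated’ networks can be sketched as follows. The thresholds (adjusted P < 0.05 and |log2(FC)| > 0.5) are taken from the text; the function name, the data layout, the node names and the toy statistics are hypothetical and do not correspond to the reported results.

```python
def outflow_subnetwork(edges, node_type, de_stats, direction,
                       p_cutoff=0.05, lfc_cutoff=0.5):
    """Extract the subgraph that feeds differentially expressed outflows.

    edges: set of (source, target) directed edges of the global network.
    node_type: dict mapping node name to "inflow", "gem" or "outflow".
    de_stats: dict mapping outflow node to (adjusted P, log2 fold change).
    direction: "up" or "down".
    """
    def passes(stat):
        p_adj, lfc = stat
        if p_adj >= p_cutoff:
            return False
        return lfc > lfc_cutoff if direction == "up" else lfc < -lfc_cutoff

    # 1. Differentially expressed outflow signals.
    outflows = {n for n, s in de_stats.items()
                if node_type[n] == "outflow" and passes(s)}
    # 2. GEMs with an edge into one of those outflows.
    gems = {u for u, v in edges if v in outflows and node_type[u] == "gem"}
    # 3. Inflow signals with an edge into one of those GEMs.
    inflows = {u for u, v in edges if v in gems and node_type[u] == "inflow"}
    keep = outflows | gems | inflows
    # 4. Induced subgraph of the global network on the kept nodes.
    return {(u, v) for u, v in edges if u in keep and v in keep}


# Toy global network (hypothetical names and statistics).
node_type = {"in_1": "inflow", "in_2": "inflow", "gem_a": "gem",
             "gem_b": "gem", "out_up": "outflow", "out_down": "outflow"}
edges = {("in_1", "gem_a"), ("gem_a", "out_up"),
         ("in_2", "gem_b"), ("gem_b", "out_down")}
de_stats = {"out_up": (0.01, 1.2), "out_down": (0.02, -0.9)}
up_net = outflow_subnetwork(edges, node_type, de_stats, "up")
down_net = outflow_subnetwork(edges, node_type, de_stats, "down")
```

Because each network is an induced subgraph of the same global network, a GEM or inflow that feeds both an upregulated and a downregulated outflow appears in both networks.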
FlowSig uses multiple perturbations to find disease-driven changes
To demonstrate that FlowSig can handle multiple perturbations, we analyzed scRNA-seq of human bronchoalveolar lavage fluid (BALF) cells sampled from healthy controls and from people with either moderate or severe COVID-19 (ref. 38). We used CellChat and the cell-type annotations from the original study to infer significant ligand–receptor interactions, and found 46, 55 and 54 active signaling pathways for healthy controls and the moderate and severe COVID-19 groups, respectively.
We constructed 20 GEMs using pyLIGER (Supplementary Fig. 10a,b) that captured differences across both condition (Fig. 5a) and cell type (Fig. 5b). FlowSig identified differentially inflowing and outflowing signals specific to each COVID-19 condition with respect to healthy controls (Fig. 5c and Supplementary Fig. 10c,d). We note the differential expression of many inflammatory CC chemokines (CCLs) in severe COVID-19, including CCL2, CCL3, CCL8, CCL3L1 and CCL7, and CXC chemokines such as CXCL2 and CXCL8 (Supplementary Fig. 10d). In moderate COVID-19, we observed differential outflow of fewer inflammatory cytokines, including CCL5 and CCL23.
To analyze the intercellular flows driving these differential outflows, for each set of differentially outflowing signals, we extracted the upstream inflowing signals for which there was a directed path to at least one of the outflowing signals and the corresponding GEMs from the inferred FlowSig network (Fig. 5d–f). Despite the number of differentially outflowing signals increasing with COVID-19 severity, the number of inferred signal inflows decreased from 37 in healthy controls to 32 in moderate COVID-19 (loss of AXL, CD4, F2RL1, ITGAX and ITGB2, TNFRSF12A and TNFRSF14; gain of CAP1) and to 25 in severe COVID-19 (loss of CD27, CXCR3, FPR1, IL-6R and IL-6ST, LTBR, NCL, NRP2 and PLXNA2, SDC1, TNFRSF13B, TNFRSF17 and TNFRSF25; gain of AXL, CD4, F2RL1 and TNFRSF14). GEMs showed a similar trend: the number of regulatory GEMs decreased from 16 to 13 between healthy and moderate COVID-19 (loss of GEM-4, GEM-10, GEM-12 and GEM-14; gain of GEM-7). The results from Fig. 5a,b suggest that the shift from healthy to moderate COVID-19 is associated with a downregulation in intercellular flows through epithelial cells (GEM-4), plasma and T cells (GEM-10) and macrophages and neutrophils (GEM-12), but an upregulation of intercellular flows through mast cells (GEM-7). From moderate to severe COVID-19, the number of regulatory GEMs decreased further from 13 to 8 (loss of GEM-1, GEM-2, GEM-5, GEM-11, GEM-13, GEM-18 and GEM-19; gain of GEM-12 and GEM-14).
We also calculated the intersections between the signal inflow sets (Fig. 5g) and GEM sets (Fig. 5h) driving signal outflows. We observed that 20 out of 37 signal inflows are shared across all three conditions. There were no signal inflows unique to either moderate or severe COVID-19 alone, whereas inflow through TNFRSF12A (due to TNFSF12) and through ITGAX and ITGB2 (due to C3) drives outflows in only healthy controls. Only inflow through CAP1 (from RETN1) is shared between moderate and severe COVID-19 but is absent in healthy controls. There were more signal inflows shared between the healthy and moderate COVID-19 groups than between the healthy and severe COVID-19 groups or between the moderate and severe COVID-19 groups. We observed a similar trend among inferred regulatory GEMs. The most shared GEMs were between only the healthy and moderate COVID-19 groups (7 out of 17) and across all three conditions (5 out of 17). GEM-4 and GEM-10, which are associated with epithelial cells and T cells, respectively, mediated signal outflows in only healthy individuals. Only GEM-7, which is associated with mast cells, was shared between the moderate and severe COVID-19 groups but not healthy controls. No GEMs that regulate the differential outflows in severe COVID-19 were unique to the severe COVID-19 group. These results demonstrate how FlowSig can use multiple perturbations to identify trends in intercellular flows. Here, FlowSig identified that increasing severity of COVID-19 is associated with (1) a gradual loss of regulatory intercellular inflows and (2) an increase of inflammatory chemokine outflow that is driven by macrophages and neutrophils.
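The intersection analysis amounts to grouping each variable by the exact combination of conditions in which it appears. A minimal sketch is given below; the function name and the toy GEM labels are hypothetical and are not the sets reported above.

```python
def membership_partition(sets_by_condition):
    """Group each element by the exact set of conditions containing it."""
    partition = {}
    all_items = set().union(*sets_by_condition.values())
    for item in all_items:
        key = frozenset(cond for cond, members in sets_by_condition.items()
                        if item in members)
        partition.setdefault(key, set()).add(item)
    return partition


# Toy GEM sets per condition (labels are made up for illustration).
gems = {
    "healthy": {"g1", "g2", "g3", "g4"},
    "moderate": {"g1", "g2", "g5"},
    "severe": {"g1", "g5"},
}
partition = membership_partition(gems)
shared_by_all = partition[frozenset(gems)]
healthy_only = partition[frozenset({"healthy"})]
```

Each key of `partition` corresponds to one region of the Venn diagram, so a region with no members (here, GEMs unique to the severe condition) is simply absent from the dictionary.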
FlowSig identifies regulators of spatial intercellular flow
We applied FlowSig to spatial Stereo-seq data of mouse embryogenesis sampled at stage E9.5 (ref. 39). We used non-negative spatial factorization4 to construct 20 spatially resolved GEMs from 712 spatially variable genes (Fig. 6a and Supplementary Fig. 11a). We identified Shh outflow to be highly spatially variable (Moran’s I = 0.37; adjusted P = 0.014; Supplementary Fig. 11b), and inferred Shh inflow across the tissue (Supplementary Fig. 11c), in line with Shh’s importance in development40. FlowSig identified several upstream drivers of Shh outflow, including Bmp4, Cxcl12, Fgf15, Mdk, Ptn and Wnt5a, which regulate Shh outflow through GEM-2, GEM-5, GEM-11 and GEM-14 (Fig. 6b), and inferred that received Shh inflow (denoted for brevity as r-Shh) drives outflow of several signal ligands through GEM-2, GEM-5, GEM-9, GEM-11, GEM-12, GEM-14, GEM-15 and GEM-17 (Fig. 6c).
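For readers unfamiliar with Moran’s I, the statistic can be computed directly from its textbook definition, I = (N / W) Σ_ij w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where W is the sum of all spatial weights. The sketch below uses a binary neighbor matrix on four spots arranged in a line; the weighting scheme and significance testing used in the actual analysis are not specified here, so both the weights and the toy values are assumptions.

```python
def morans_i(values, weights):
    """Moran's I for per-spot values and a spatial weight matrix."""
    n = len(values)
    mean_value = sum(values) / n
    dev = [x - mean_value for x in values]
    weight_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between pairs of spots.
    numerator = sum(weights[i][j] * dev[i] * dev[j]
                    for i in range(n) for j in range(n))
    denominator = sum(d * d for d in dev)
    return (n / weight_sum) * numerator / denominator


# Four spots in a line; each spot neighbors the adjacent spots.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, 0.0, 0.0], chain)    # like values adjacent
alternating = morans_i([1.0, 0.0, 1.0, 0.0], chain)  # unlike values adjacent
```

Spatially clustered expression gives a positive value (1/3 in this toy example), whereas a perfectly alternating pattern gives −1.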
We used these spatially resolved measurements to infer both specific upstream regulators of Shh outflow and downstream targets of r-Shh inflow. For each GEM, we extracted the top 10 TFs by module membership (see ‘Interpreting gene expression modules’ in the Methods). We identified potential upstream TFs of Shh outflow using random forest models41, where we ranked TFs by feature (Gini) importance relative to all potential upstream TFs of Shh (see ‘Inferring upstream TF regulators of spatial signals’ in the Methods; Fig. 6d). We identified Foxa2, Foxp2, Myc, Zc3h7a and Foxa1 as the top five upstream regulatory TFs of Shh outflow. Of these, Foxa1 and Foxa2 have been established to regulate Shh42, as has Foxp2 (ref. 43). Although Myc has been established to be regulated downstream of Shh signaling44,45, its role as an upstream regulator is less clear.
To identify downstream targets of r-Shh inflow, we used pyGAM46 (cubic splines, a gamma error distribution, and log link) to fit expression of the top 10 TFs of each inferred downstream GEM as a function of r-Shh inflow. We ranked TFs by the Spearman correlation between predicted TF expression and r-Shh itself (Fig. 6e). The downstream TFs that correlated least with r-Shh included known downstream targets Barhl1 (ref. 47) and Nkx2-1 (ref. 48), as well as Meox1, Tcf21 and Foxp2, whereas the TFs that were most correlated included known targets like Foxe1 (ref. 49) and Nkx2-2 (ref. 50), as well as Pou3f1, Tlx2 and Nkx2-4. We observe that Foxa2 is implicated both upstream and downstream of Shh outflow and inflow, respectively, suggesting that Foxa2 could drive self-production of Shh.
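The ranking step can be illustrated independently of the GAM fit. The sketch below takes already predicted TF expression values as given (the fitting itself is omitted) and ranks TFs by their Spearman correlation with r-Shh inflow, computed as the Pearson correlation of average ranks. All names and values are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with ties assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Toy r-Shh inflow and predicted TF expression at four spots.
r_shh = [0.1, 0.4, 0.9, 1.5]
predicted = {
    "TF_up": [1.0, 1.2, 2.0, 3.5],
    "TF_mixed": [1.0, 3.0, 2.0, 1.5],
    "TF_down": [3.0, 2.5, 1.0, 0.2],
}
rho = {tf: spearman(values, r_shh) for tf, values in predicted.items()}
ranking = sorted(rho, key=rho.get, reverse=True)
```

Because Spearman correlation depends only on ranks, it captures monotonic but nonlinear responses of the kind that a spline fit with a log link can produce.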
We observed potential bidirectional flows between Shh and Bmp4, Cxcl12, Igf2, Mdk and Wnt5a. To validate these flows further, we performed the following analysis. For each ligand, we extracted the top GEM-specific TFs that were both upstream of the ligand and downstream of r-Shh. We used random forest modeling to calculate the feature importance of each TF for ligand outflow. Only Wnt5a was significantly regulated by TFs that were also downstream targets of r-Shh inflow through GEM-5 (Fig. 6f). Furthermore, outflowing Wnt5a and inflowing r-Wnt5a were spatially colocalized with inflowing r-Shh and outflowing Shh (Supplementary Fig. 11d,e). Foxa2, Nkx6-1 and Sox21 were the top upstream regulators of Wnt5a through GEM-5, of which Foxa2 is known to regulate Wnt5a (ref. 51). To infer whether inflowing r-Wnt5a regulated Shh outflow, we used pyGAM to fit the top TFs of GEM-11 as functions of r-Wnt5a inflow and ranked them by Spearman correlation of the predicted values with r-Wnt5a (Fig. 6g). We observed that Myc, one of the top upstream regulators of Shh outflow, negatively correlated with r-Wnt5a inflow.
These observations suggested the following bidirectional flow between Shh and Wnt5a (Fig. 6h). Outflow and diffusion of Shh drive inflow of r-Shh, self-amplifying Shh outflow through Foxa2. Inflow of r-Shh also drives Wnt5a outflow through Foxa2, Nkx6-1 and/or Sox21. Inflow of r-Wnt5a through spatial diffusion downregulates Shh outflow through Myc. This module resembles an activator–inhibitor system that can generate potential Turing patterns52, with three key features. First, one or both signals can propagate; here, both Shh and Wnt5a ligands diffuse. Second, one of the signals, Shh, upregulates both itself (through Foxa2) and the other signal, Wnt5a (through Foxa2, Nkx6-1 and/or Sox21). Third, the other signal, Wnt5a, inhibits the activating signal, Shh. We found that Wnt5a inhibits Shh by downregulating Myc, an upstream regulator of Shh. Activator–inhibitor systems have been shown to generate Turing patterns, which are defined by their complex spatial variation and are known to drive cell fate patterning in development53,54,55, suggesting that at E9.5, Shh and Wnt5a play similar roles.
Discussion
We developed FlowSig to infer intercellular communication activities that may depend on one another through coordinated GEMs. Key to our method is the construction of variables that measure either intercellular information (received and sent) or intracellular information. FlowSig applies graphical causal modeling and causal structure learning to scRNA-seq and ST data. As high-dimensional omics data continue to accumulate, the field will shift towards more predictive analyses, for which causal inference and causal structure learning models are likely to be key.
FlowSig complements the growing suite of methods for constructing multicellular representation programs. For example, DIALOGUE5 uses multilevel modeling to extract coordinated programs involving two or more cell types that have significantly correlated gene expression. Such coordinated programs are likely mediated through the communication-driven intercellular flows that FlowSig can infer. Other methods, such as MOFAcellular22 and scITD20, decompose gene expression data into sample-specific and sample-shared latent GEMs that do not distinguish intercellular signal genes from intracellular signal-processing genes. MOFAtalk22 and Tensor-cell2cell21 extract coordinated programs of intercellular signaling from ligand–receptor interaction scores. Of the methods to which we compared FlowSig, the most similar is MultiNicheNet23, which also constructs an intercellular signaling dependency network using pretrained signaling databases to construct the dataset-specific network; FlowSig uses conditional independence and conditional invariance testing to determine dependencies directly from the data.
To construct signal inflow and outflow variables, we used CellChat for non-spatial applications and COMMOT for spatial applications. There is a wide range of cell–cell communication inference methods11,13, albeit with limited overlap in results32. Therefore, the choice of method can affect FlowSig output. Alternative communication methods, including CellPhoneDB62 and LIANA32, as well as alternative GEM construction methods, such as cNMF7, can be used as input.
To reduce computation time, we inferred ‘coarse-grained’ intercellular flows, in which intracellular processing mechanisms are modeled through multigene GEMs. We assume that these GEMs contain regulatory TFs that mediate signal inflow and outflow. Although we can extract downstream TFs from GEMs, we do not know the precise gene regulatory networks (GRNs) that mediate these signals. One could use methods such as SCENIC56 to infer cellwise enrichment for significant regulons or incorporate data that measure open chromatin accessibility57 to identify activated TFs. New data modalities, such as Phospho-seq58, that measure post-translational response and thus signal inflow, will become useful for validation.
FlowSig has several limitations. First, because FlowSig uses conditional independence and conditional invariance testing based on partial correlation, the analyzed datasets must have sufficiently large sample sizes to estimate dependencies with sufficient statistical significance59. Second, partial correlation assumes that the data are distributed according to a linear Gaussian model, which can be an unrealistic assumption60. Third, as the number of variables increases, so too does the number of false positive relations inferred by the graph learning algorithms used by FlowSig. Fourth, for non-spatial applications, to learn intercellular flows accurately, the perturbation must significantly shift the distribution of one or more variables. However, if the perturbation completely reduces signal variable expression to zero or induces expression of a variable not expressed in the control condition, partial correlation testing cannot be performed for the perturbed variable because it will have an s.d. of zero in one of the conditions. Finally, FlowSig infers a static graph, whereas intercellular flows are dynamic. Therefore, it will be important to extend FlowSig to capture spatiotemporal flows.
Methods
FlowSig model
FlowSig’s analyses are the same when applied to either non-spatial scRNA-seq or ST data. However, to compensate for the reduced precision of inflowing signal measurements from non-spatial scRNA-seq, we apply FlowSig to only scRNA-seq studies with an appropriate control condition and one or more perturbed conditions representing disease, external stimulation or biological time. We require input from intercellular communication inference and recommend using CellChat61 and COMMOT14 for non-spatial and spatial data, respectively. FlowSig provides functionality to construct GEMs from non-spatial data using pyLIGER and from spatial data using NSF. However, FlowSig also allows users to supply input from other cell–cell communication methods, such as CellPhoneDB62 or LIANA32, or from other GEM construction methods, such as cNMF7. We assume that, for each condition, the gene expression matrix (X) has been filtered and variance stabilized, for example by library-size normalization and log transformation. We note that the original, unnormalized counts are also needed to construct GEMs. We use the input to construct augmented ‘flow expression’ matrices for each biological condition that measure inflowing signals, GEM enrichment and outflowing signals, which we define as follows:
1. We define inflowing signals differently for non-spatial versus spatial data. For non-spatial scRNA-seq data, for each significant ligand–receptor interaction (L–R) inferred from cell–cell communication analysis, we define the inflowing signal amount as \(R\times \overline{TF}\), where R is the receptor gene expression and \(\overline{TF}=\left({TF}_{1}+\dots +{TF}_{m}\right)/m\) is the average gene expression of the m known immediate downstream TF targets, which we obtain from pathway knowledge databases, such as OmniPath26 or exFINDER63 (see ‘Constructing downstream TF target sets to measure signal inflow’ below). For interactions involving multi-unit receptors, \(L\hbox{-}({R}_{1}+\dots +{R}_{n})\), where n is the number of receptor sub-units, we use the geometric mean of the receptor sub-unit gene expression values, \(R={\left({R}_{1}\cdots {R}_{n}\right)}^{\frac{1}{n}}\), to calculate the inflow signal amount. Our rationale is that receptor gene expression quantifies a cell’s ‘potential’ to receive intercellular signals, and the weighting by average downstream TF expression quantifies the actual downstream activation due to ligand–receptor binding and thus provides a more accurate measure of whether the cell actually received the signal. However, this definition is not exactly the same as the amount of ‘received ligand,’ which may not necessarily trigger downstream activation. By contrast, for ST data, we can measure the inflowing signal directly at each spatial spot using output from spatial cell–cell communication inference methods, such as COMMOT14. In general, for a given ligand (L) at ST spot (S), we sum over every interaction L–R in which L is inferred to partake and define the inflowing signal amount as \({\sum }_{R}{C}_{S}^{\,(L-R)}\), where \({C}_{S}^{\,(L-R)}\) is the inferred communication score for interaction L–R at spot S.
2. We define GEM enrichment using output from matrix factorization methods, but GEM enrichment can be constructed from other dimensionality reduction methods in a similar manner. Matrix factorization methods decompose the gene expression matrix X as \(X=W{H}^{T}\). If X is an N × G matrix, where N is the number of cells and G is the number of genes, then W is an N × K matrix describing cell membership in the K GEMs, where K is the number of factors, H is a G × K matrix describing the gene loadings of each GEM and \({H}^{T}\) is the transpose of H. If we define \(\widetilde{W}\) to be the normalized factor membership matrix such that the rows sum to unity, then the kth GEM enrichment variable is the kth column of \(\widetilde{W}\), denoted \({\widetilde{W}}_{k}\), where k = 1, …, K. To place GEM enrichment values on the same scale as log-transformed gene expression values, we use the log-transformed \(\log\, (1+\alpha \widetilde{W})\), where α is the same scaling factor used to transform the original unnormalized counts, \(Y=\log\, (1+\alpha \widetilde{X})\), and \(\widetilde{X}\) is the gene expression matrix normalized such that the rows sum to unity.
3. Outflowing signals are defined as the gene expression of signal ligands implicated from cell–cell communication analysis. In the case of multi-unit ligands, \(({L}_{1}+\dots +{L}_{n})\hbox{-}R\), we use the geometric mean of the ligand sub-unit gene expression values, \({\left({L}_{1}\cdots {L}_{n}\right)}^{\frac{1}{n}}\).
Therefore, we associate cells with a vector containing three types of measurements: signal inflow measurements, which are receptor gene expression weighted by the average expression of their known downstream TF genes; intracellular ‘module’ enrichment, which is the cell’s membership weight to a multigene set module, which measures how strongly the cell expresses those genes in the module; and signal outflow, which is ligand gene expression. When measuring signal inflow, we are not measuring from which cells the signals were sent, but rather how much signal has been received by the cell. Similarly, when measuring signal outflow, we are not measuring how much of the expressed signal ligand was actually received by other cells (as measured by, for example, signal inflow), but simply how much of the signal the cell is expressing.
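The three variable types defined above can be sketched for non-spatial data as follows. This is a minimal illustration with NumPy and is not taken from the FlowSig package: the function names, the toy expression values and the scaling factor `alpha=1e4` are all assumptions made for the example.

```python
import numpy as np

def geometric_mean(subunits):
    """Geometric mean across sub-unit columns (cells x n_subunits)."""
    subunits = np.asarray(subunits, dtype=float)
    return np.prod(subunits, axis=1) ** (1.0 / subunits.shape[1])

def inflow_signal(receptor_expr, tf_expr):
    """R x mean(TF): receptor expression weighted by mean downstream TF expression."""
    receptor_expr = np.asarray(receptor_expr, dtype=float)
    if receptor_expr.ndim == 2:  # multi-unit receptor: collapse sub-units first
        receptor_expr = geometric_mean(receptor_expr)
    return receptor_expr * np.asarray(tf_expr, dtype=float).mean(axis=1)

def gem_enrichment(W, alpha=1e4):
    """Row-normalize factor memberships, then apply log(1 + alpha * W_tilde).

    alpha is a hypothetical value; it should match the factor used to
    log-transform the expression data. Rows of W must not sum to zero.
    """
    W = np.asarray(W, dtype=float)
    W_tilde = W / W.sum(axis=1, keepdims=True)
    return np.log1p(alpha * W_tilde)

# Toy data for three cells (all values are made up for illustration).
receptor = np.array([[1.0, 4.0], [2.0, 2.0], [0.0, 3.0]])  # two receptor sub-units
tfs = np.array([[1.0, 3.0], [0.0, 2.0], [2.0, 2.0]])       # two downstream TFs
ligand = np.array([0.5, 1.5, 0.0])                         # single-unit ligand
W = np.array([[1.0, 3.0], [2.0, 2.0], [4.0, 0.0]])         # memberships in K = 2 GEMs

inflow = inflow_signal(receptor, tfs)
gems = gem_enrichment(W)
outflow = ligand

# Augmented 'flow expression' matrix: [inflow | GEM enrichment | outflow].
flow_matrix = np.column_stack([inflow, gems, outflow])
print(flow_matrix.shape)
```

Note that the third cell has zero inflow even though one receptor sub-unit is expressed, because the geometric mean vanishes when any sub-unit is absent.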
FlowSig applies algorithms from causal structure learning that are based on the concepts of conditional independence testing and, if perturbation data are available, conditional invariance testing, to learn the directed intercellular flow network from the augmented flow expression matrices. Conditional independence testing infers the set of statistical dependencies from the data, whereas conditional invariance testing infers which variables shifted significantly in distribution after perturbation, for example, owing to disease or external stimulation. All conditional independence and conditional invariance testing is performed using partial Pearson’s correlation to generate sufficient statistics. Although partial correlation testing relies on the potentially unrealistic assumption that gene expression values follow a linear multivariate Gaussian model, we use it because it is significantly faster than methods that use nonparametric kernel-based tests, and because we can correct for biologically unrealistic edges by analyzing the learned completed partially directed acyclic graph (CPDAG) rather than a single directed acyclic graph (DAG). To learn the CPDAG, we use the UT-IGSP24 algorithm when analyzing non-spatial scRNA-seq with perturbation data and the GSP27 algorithm for spatial data with no considered perturbation. Both of these methods estimate a CPDAG containing both directed and undirected edges that corresponds to the Markov equivalence class inferred from conditional independence and conditional invariance testing. Graphically, the Markov equivalence class is defined by the set of graphs that have the same skeleton graph, which is the undirected equivalent of the CPDAG, and the same v-structures, which are defined as directed node triplets (x, y, z), where edges are oriented such that \(x\to z\leftarrow y\).
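To make the partial-correlation test concrete, the following sketch tests whether two variables are independent given a conditioning set, using the precision matrix and Fisher's z-transform. It is a generic textbook construction and an assumption about the form of the test, not FlowSig's or UT-IGSP's internal code; the simulated chain inflow → GEM → outflow is hypothetical.

```python
import numpy as np
from scipy import stats

def partial_corr_test(data, i, j, cond=()):
    """Test whether columns i and j are independent given the columns in cond.

    Returns the partial Pearson correlation and a two-sided p-value from
    Fisher's z-transform (valid under a linear Gaussian model).
    """
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)  # precision matrix of the selected variables
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    r = float(np.clip(r, -0.999999, 0.999999))
    n = data.shape[0]
    z = np.sqrt(n - len(cond) - 3) * np.arctanh(r)
    p_value = 2 * stats.norm.sf(abs(z))
    return r, p_value

# Simulated chain: inflow -> GEM -> outflow (linear Gaussian, made-up coefficients).
rng = np.random.default_rng(0)
n = 2000
inflow = rng.normal(size=n)
gem = 0.8 * inflow + rng.normal(scale=0.5, size=n)
outflow = 0.8 * gem + rng.normal(scale=0.5, size=n)
data = np.column_stack([inflow, gem, outflow])

r_marg, p_marg = partial_corr_test(data, 0, 2)            # inflow vs outflow
r_cond, p_cond = partial_corr_test(data, 0, 2, cond=(1,))  # ... given the GEM
print(f"marginal r = {r_marg:.2f}, conditional r = {r_cond:.2f}")
```

In this toy chain, inflow and outflow are strongly correlated marginally, but the partial correlation given the GEM is close to zero, which is the pattern that lets a structure learning algorithm place the GEM between them.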
FlowSig reorients undirected edges inferred from UT-IGSP or GSP according to the assumption that inflow signal nodes must be directed towards GEM nodes, GEM nodes must be directed towards outflow signal nodes and edges between two GEM nodes can be bidirectional.
We also use bootstrap aggregation to further validate the learned intercellular flow network. For non-spatial scRNA-seq, we bootstrap by resampling individual cells from each condition with replacement. However, for ST data, we need to account for the spatial dependencies that affect correlation. Therefore, we perform a version of block bootstrapping64 as follows. For each bootstrap realization, we divide the tissue into non-overlapping spatial regions, which we can obtain from k-means clustering on the spatial coordinates, Leiden clustering of the spatial connectivity graph or predefined tissue region annotations. Then, within each ‘block,’ we resample with replacement. For each bootstrap realization, FlowSig outputs an adjacency matrix (A) that corresponds to the estimated CPDAG, where \({A}_{ij}=1\) if an edge from node i to node j has been inferred and \({A}_{ij}=0\) otherwise. For B bootstrap realizations, where B > 0 is the number of bootstrap samples, we then take the averaged adjacency, \(\tilde{A}=\,{B}^{-1}\mathop{\sum }\nolimits_{b=1}^{B}{A}^{(b)},\) as the final CPDAG. To remove low-confidence edges, for every edge in the equivalent undirected skeleton graph of the CPDAG, we calculate the total edge weight as \(w\left(i,{j}\right)=\,{\tilde{A}}_{{ij}}+{\tilde{A}}_{{ji}}\). For a specified threshold, defined by the parameter \({w}^{* } < 1\), if \(w\left(i,{j}\right) < {w}^{* }\), we remove the edge from the network, that is, we set \({\tilde{A}}_{{ij}}={\tilde{A}}_{{ji}}=0\). Once the bootstrap-aggregated CPDAG has been learned, biologically unrealistic arcs are removed and undirected edges are reoriented. For all directed arcs from the filtered CPDAG, we retain only arcs that are directed from inflow signals to GEMs, from GEMs to other GEMs or from GEMs to outflow signals. Similarly, for undirected edges, we orient edges such that nodes are directed in the same manner. In the case that an edge connects one GEM to another, we include both directions in the final intercellular flow network.
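The bootstrap steps described above can be sketched as three small functions: resampling within spatial blocks, averaging the per-bootstrap adjacency matrices with edge-weight thresholding, and keeping only biologically plausible directions. The function names, node labels and toy adjacency matrices are hypothetical, and in a real analysis each adjacency matrix would come from running GSP or UT-IGSP on one resampled dataset.

```python
import numpy as np

def block_bootstrap_indices(block_labels, rng):
    """Resample spot indices with replacement within each spatial block."""
    block_labels = np.asarray(block_labels)
    resampled = []
    for block in np.unique(block_labels):
        members = np.flatnonzero(block_labels == block)
        resampled.append(rng.choice(members, size=members.size, replace=True))
    return np.concatenate(resampled)

def aggregate_adjacencies(adjacencies, w_star):
    """Average bootstrap adjacencies; drop edges whose weight A_ij + A_ji < w_star."""
    A = np.mean(adjacencies, axis=0)
    weight = A + A.T
    A[weight < w_star] = 0.0
    return A

def keep_flow_edges(A, node_types):
    """Retain only inflow->GEM, GEM->GEM and GEM->outflow entries.

    For an undirected edge (both A_ij and A_ji non-zero), keeping only the
    allowed direction amounts to orienting the edge.
    """
    allowed = {("inflow", "gem"), ("gem", "gem"), ("gem", "outflow")}
    mask = np.array([[(src, dst) in allowed for dst in node_types]
                     for src in node_types])
    np.fill_diagonal(mask, False)
    return np.where(mask, A, 0.0)

rng = np.random.default_rng(0)
idx = block_bootstrap_indices([0, 0, 1, 1, 1], rng)  # five spots in two blocks

# Three toy bootstrap CPDAGs over the nodes (inflow, gem, outflow).
node_types = ["inflow", "gem", "outflow"]
A1 = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])  # inflow->gem, gem->outflow
A2 = np.array([[0, 1, 0], [1, 0, 1], [0, 0, 0]])  # inflow--gem undirected
A3 = np.array([[0, 1, 0], [0, 0, 0], [1, 0, 0]])  # spurious outflow->inflow
A_avg = aggregate_adjacencies([A1, A2, A3], w_star=0.5)
A_flow = keep_flow_edges(A_avg, node_types)
print(A_flow)
```

Here the spurious outflow → inflow arc appears in only one of three bootstraps, so its weight of 1/3 falls below the threshold, and the occasional GEM → inflow orientation is discarded by the direction filter.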
Identifying differentially flowing signal variables
When inferring intercellular flows, we prioritize ‘informative’ inflowing and outflowing signal variables. In the case of scRNA-seq analysis, where perturbation data are available, we consider only ‘differentially flowing’ inflow and outflow signals. For all applications in this study, we use a Mann–Whitney U (Wilcoxon rank-sum) test and designate variables as differentially flowing if their adjusted P values (after correction for multiple hypothesis testing) fall below a specified threshold (for example, adjusted P < 0.05) and their log (FC) values are above a specified threshold (for example, log (FC) > 0.5). We analyzed inflow signal variables separately from outflow signal variables. That is, we performed two separate Mann–Whitney U tests: one to identify differentially inflowing variables from only the set of inflow signal variables and one to identify differentially outflowing variables from only the set of outflow signal variables. When analyzing ST data, in which perturbation data are not as readily available, FlowSig instead prioritizes inflow and outflow variables that are spatially variable. For all applications considered, we retain variables for which the graph-based global Moran’s I, which we calculate using Squidpy65, is above a specified threshold (for example, I > 0.1).
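As an illustration of the differential flow test for non-spatial data, the sketch below applies a Mann–Whitney U test per variable and then filters on adjusted P value and fold change. Several details are assumptions, because the text does not specify them: Benjamini–Hochberg is used as the multiple-testing correction, log (FC) is computed as the log2 ratio of condition means with a small pseudocount, and the absolute value of log (FC) is thresholded so that decreases are also retained.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (assumed correction method)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def differentially_flowing(control, perturbed, p_thresh=0.05,
                           logfc_thresh=0.5, eps=1e-9):
    """Flag variables (columns) that differ between two conditions.

    control, perturbed: cells x variables arrays of non-negative flow values.
    """
    n_vars = control.shape[1]
    p_values = np.array([
        mannwhitneyu(control[:, k], perturbed[:, k],
                     alternative="two-sided").pvalue
        for k in range(n_vars)
    ])
    p_adj = benjamini_hochberg(p_values)
    log_fc = np.log2((perturbed.mean(axis=0) + eps) / (control.mean(axis=0) + eps))
    return (p_adj < p_thresh) & (np.abs(log_fc) > logfc_thresh)

# Toy inflow variables: the first is increased fourfold, the second is unchanged.
rng = np.random.default_rng(0)
control = rng.gamma(shape=2.0, scale=1.0, size=(200, 2))
perturbed = rng.gamma(shape=2.0, scale=1.0, size=(200, 2))
perturbed[:, 0] *= 4.0

selected = differentially_flowing(control, perturbed)
print(selected)
```

Following the text, the function would be called twice, once on the inflow variables and once on the outflow variables, so that the correction is applied within each set.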
Constructing downstream TF target sets to measure signal inflow
To measure signal inflow more accurately from non-spatial scRNA-seq data, we used prior knowledge from OmniPath26 to weight the gene expression of receptors that have been implicated in intercellular communication from prior cell–cell communication inference. For each ligand–receptor interaction, we searched the KinaseExtra and PathwayExtra modules for TFs that are the first downstream targets of the relevant receptors. Because OmniPath has been constructed from human knowledge, when constructing the downstream TF sets for mouse data, we convert the mouse receptor genes implicated from communication inference to their human orthologs and perform the same procedure as for human data.
Interpreting gene expression modules
TFs are the mediators of signal transduction, that is, signal inflow, and the primary regulators of gene transcription, that is, signal outflow. To gain a deeper functional understanding of intercellular flows, it is important to interpret FlowSig output both with respect to GEMs, which describe the expression patterns of coordinated multigene sets, as well as individual GEM-specific TFs. For both non-spatial and spatial data, we consider only a priori known TFs, which in this case are based on TF lists provided by pySCENIC56. Specifically, we use the list provided in allTFs_mm.txt for mouse data and the list provided in allTFs_hg38.txt for human data.
For non-spatial scRNA-seq data, we used pyLIGER35 to construct integrated GEMs. For a dataset describing \({C}\) conditions, pyLIGER uses joint matrix factorization to decompose each condition-specific gene expression counts matrix, \({X}^{\,(c)}\in {{\mathbb{Z}}}_{\ge 0}^{N\times G}\), where \({{\mathbb{Z}}}_{\ge 0}\) is the set of all nonnegative integers, N is the number of cells and G is the number of genes, into K GEMs through \({X}^{\,(c)}={F}^{\,(c)}\cdot {\left(W+{V}^{\,\left(c\right)}\right)}^{T}\), where \({A}^{T}\) denotes the transpose of a matrix A. Here, \({F}^{\,(c)}\in {{\mathbb{R}}}_{\ge 0}^{N\times K}\) is the condition-specific factors matrix, describing the membership of the cells in condition c to each of the K GEMs, and \(W\in {{\mathbb{R}}}_{\ge 0}^{G\times K}\) and \({V}^{\,(c)}\in {{\mathbb{R}}}_{\ge 0}^{G\times K}\) are the condition-shared and condition-specific loadings matrices, respectively, describing the membership of genes to each of the K GEMs. Larger values of \({F}_{{nk}}^{\,\left(c\right)}\) correspond to greater membership of cell n in condition c to GEM k, while larger values of \({W}_{{gk}}+{V}_{{gk}}^{\,\left(c\right)}\) correspond to greater overall membership of gene g to GEM k. We use the columns of \({F}^{\,(c)}\) as our K GEM variables and use the columns of \(W+{V}^{\,(c)}\) to extract the top TFs for each GEM. For each module k, we sort genes by decreasing order of the loadings sum, \({W}_{{gk}}+{V}_{{gk}}^{\,\left(c\right)}\), and then extract the top contributing TFs in the order in which they appear in the sorted lists.
For ST data, we use NSF4 to construct spatially resolved GEMs. In brief, NSF decomposes the gene expression counts, \(X\in {{\mathbb{Z}}}_{\ge 0}^{N\times G}\), which has N spots and G genes, into K GEMs (factors) through \(X={F}{W}^{T}\), where the factors matrix, \(F\in {{\mathbb{R}}}_{\ge 0}^{N\times K}\), describes the spotwise membership to the K GEMs (factors) and is fit using Gaussian processes whose means and covariances vary with spatial locations. The loadings matrix, \(W\in {{\mathbb{R}}}_{\ge 0}^{G\times K},\) describes the gene weight membership to each of the K GEMs. Larger values of \({F}_{nk}\) indicate a higher enrichment of spot n for GEM k, which describes a spatially varying gene expression pattern; larger values of \({W}_{gk}\) indicate greater membership of gene g to GEM k, that is, how much gene g contributes to the gene expression pattern. We use the columns of the factor matrix, F, as our K GEM variables and use the columns of the loadings matrix, W, to extract the top contributing TFs for each spatial GEM. For each module k, we sort all genes by decreasing order of their \({W}_{gk}\) value. We then extract the top contributing TFs in the order in which they appear in the sorted list.
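The TF extraction step is the same for both factorizations: sort genes by their loading for each GEM and keep the known TFs in that order. A minimal sketch follows, in which the function name, the gene list, the TF list and the loading values are illustrative only; for pyLIGER output, the loadings argument would be \(W+{V}^{\,(c)}\), and for NSF output it would be W.

```python
import numpy as np

def top_tfs_per_gem(loadings, gene_names, known_tfs, n_top=10):
    """Return, for each GEM k, the known TFs with the largest loadings.

    loadings: G x K array (genes x GEMs); gene_names: length-G sequence;
    known_tfs: collection of TF gene symbols (for example, read from a TF list file).
    """
    gene_names = np.asarray(gene_names)
    known_tfs = set(known_tfs)
    top = {}
    for k in range(loadings.shape[1]):
        order = np.argsort(-loadings[:, k], kind="stable")  # decreasing loading
        ranked_tfs = [str(g) for g in gene_names[order] if str(g) in known_tfs]
        top[k] = ranked_tfs[:n_top]
    return top

# Toy example with five genes and two GEMs (loadings are made up).
genes = ["Shh", "Foxa2", "Wnt5a", "Myc", "Nkx6-1"]
tf_list = ["Foxa2", "Myc", "Nkx6-1"]
W = np.array([
    [0.9, 0.0],
    [0.7, 0.2],
    [0.1, 0.8],
    [0.0, 0.6],
    [0.3, 0.1],
])

top_tfs = top_tfs_per_gem(W, genes, tf_list, n_top=2)
print(top_tfs)
```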
Inferring upstream TF regulators of spatial signals
To infer which TFs could potentially regulate inferred signal outflow variables, we borrow the approach of Cang et al.14 After FlowSig infers the global intercellular flow network, for each signal outflow variable that is connected in the network, we first backtrack through the directed network to infer which spatial GEMs are connected to the signal outflow node. For each GEM with a directed edge to the signal outflow variable, we extract the top 10 contributing TFs (see ‘Interpreting gene expression modules’ in Methods). We then use the scikit-learn implementation of the Random Forest regression model66 to model the signal ligand gene expression as a function of the TF gene expression values, which serve as the independent variables. We then rank the TFs with respect to their feature importance, which is calculated from the Gini importance (mean decrease in impurity).
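The ranking step can be sketched with scikit-learn's `RandomForestRegressor`, whose `feature_importances_` attribute gives the impurity-based importance of each feature. The hyperparameters (`n_estimators=200`, a fixed `random_state`), the placeholder TF names and the simulated data are assumptions for illustration; the text does not state the settings used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rank_tf_regulators(tf_expr, ligand_expr, tf_names, seed=0):
    """Rank candidate TFs by impurity-based importance for predicting a ligand.

    tf_expr: spots x TFs array; ligand_expr: length-spots array.
    """
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(tf_expr, ligand_expr)
    importances = model.feature_importances_  # mean decrease in impurity
    order = np.argsort(-importances)
    return [(tf_names[i], float(importances[i])) for i in order]

# Simulated data: the ligand depends only on the first TF (made-up relationship).
rng = np.random.default_rng(0)
tf_expr = rng.normal(size=(500, 3))
ligand_expr = 2.0 * tf_expr[:, 0] + 0.1 * rng.normal(size=500)

ranking = rank_tf_regulators(tf_expr, ligand_expr, ["TF_a", "TF_b", "TF_c"])
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances sum to one across features, so they rank TFs relative to each other rather than measure effect sizes, and a high rank indicates predictive association rather than confirmed regulation.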
Experimental validation
Human cortical organoid generation
All experiments using human embryonic stem cells (hESCs) were approved by the University of California, Irvine (UCI) Human Stem Cell Research Oversight (hSCRO) Committee. The hESC line H9 was obtained from the WiCell Institute under a material-transfer agreement. The methods for hESC maintenance and cortical organoid production were previously established67,68. In brief, H9 cells were maintained with inactivated mouse embryonic feeders (PMEF-CF, Millipore Sigma) on a 0.1% gelatin-coated plate and cultured in DMEM/F12 (HyClone) with 20% knockout serum replacement (KSR, Invitrogen), non-essential amino acids (NEAAs, Invitrogen), GlutaMAX (Invitrogen), 100 µg ml–1 primocin (InvivoGen), 0.1 mM β-mercaptoethanol (Invitrogen) and 10 ng ml–1 of fibroblast growth factor 2 (FGF2, Invitrogen) at 5% CO2 at 37 °C. The medium was refreshed daily. At ~70–80% confluency, H9 cells were differentiated into cortical organoids. After dissociation, 9,000 cells per well were plated into low-attachment V-bottom 96-well plates (Sumitomo Bakelite, MS9096V) to form aggregates in medium consisting of Glasgow’s Minimal Essential Medium (GMEM, Invitrogen), 20% KSR, 0.1 mM non-essential amino acids, 100 µg ml–1 primocin, 0.1 mM β-mercaptoethanol, sodium pyruvate (Invitrogen), Wnt inhibitor IWR-1-endo (Calbiochem) and TGF-β inhibitor SB431542 (Stemgent). ROCK inhibitor Y-27632 (20 µM, BioPioneer) was added to the medium from D0 to D6 to prevent cell death. From D0 to D18, the organoids were maintained at 5% CO2, 37 °C, and half of the medium was changed every 2–3 d. From D18 to D35, the organoids were transferred to Petri dishes and cultured in medium consisting of DMEM/F12 with N2 (Invitrogen), GlutaMAX, chemically defined lipid concentrate (CDLC, Invitrogen) and 0.4% methylcellulose (Sigma) at 5% CO2, 40% O2 and 37 °C. The medium was refreshed every 2–3 d.
Sample preparation and scRNA-seq
Organoids were collected at D18 (160 organoids) and D35 (25 organoids), dissociated into single cells and subjected to Evercode Cell Fixation (Parse Biosciences). The organoids were dissociated into a single-cell suspension using the Papain Dissociation System (Worthington), following the manufacturer’s manual. The dead cells in the single-cell suspension were removed using the EasySep Dead Cell Removal (Annexin V) Kit (STEMCELL Technologies), following the manufacturer’s manual. The cell suspension was then passed through a 40 µm cell strainer before assessing cell number and viability. Samples with total cell numbers >1,000,000 and >80% viability were further processed for cell fixation and freezing following the Parse Biosciences User Manual. The samples were then sent to the Genomics Research and Technology Hub, UCI, for barcoding and library preparation using the Evercode WT kit (Parse Biosciences). Ten thousand cells per sample and 50,000 reads per cell were targeted for sequencing. Sequencing was performed on a NovaSeq 6000 (Illumina). Alignment was performed using Split-pipe (Parse Biosciences).
Growth factor exposure and RT–qPCR
Between D15 and D21, the organoids were exposed to 400 ng ml–1 FGF8b or 50 ng ml–1 BMP4 (with 3 µM CHIR99021) in the culture medium. Untreated organoids were used as a control group. The organoid samples were collected at D35 and lysed using Buffer RLT (Qiagen). RNA was extracted using the RNeasy Mini Kit (Qiagen), following the manufacturer’s manual. Then, 1,000–3,000 ng RNA from each sample was converted to complementary DNA using SuperScript IV First-Strand Synthesis Reaction (Invitrogen). PowerTrack SYBR Green Master Mix (Applied Biosystems), cDNA and primers were mixed and loaded into 384-well plates (Invitrogen). The RT–qPCR was carried out using the QuantStudio 7 Real-Time PCR System (Applied Biosystems). The following primers were used: EOMES (amplicon size 225 bp) forward 5′-CGACAATAACATGCAGGGCAA-3′, reverse 5′-TCATTCAAGTCCTCCACGCC-3′; PAX6 (amplicon size 48 bp) forward 5′-TGTCCAACGGATGTGTGAGTA-3′, reverse 5′-CAGTCTCGTAATACCTGCCCA-3′; CoupTF1 (NR2F1) (amplicon size 104 bp) forward 5′-ATCGTGCTGTTCACGTCAGAC-3′, reverse 5′-TGGCTCCTCACGTACTCCTC-3′; GAPDH (amplicon size 69 bp) forward 5′-CTCTCTGCTCCTCCTGTTCGAC-3′, reverse 5′-TGAGCGATGTGGCTCGGCT-3′.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The human cortical organoid scRNA-seq data are available at NCBI GEO under accession number GSE239542. The human pancreatic islet scRNA-seq data were originally published by Burkhardt et al.36; the raw gene expression counts and treatment condition metadata were downloaded from NCBI GEO under accession number GSE161465. The scRNA-seq data of human COVID-19 BALF samples were originally published by Liao et al.38; the gene expression matrices and cell-type annotation metadata were downloaded from NCBI GEO under accession number GSE145926. The spatial Stereo-seq data of mouse embryogenesis at time E9.5 were originally published by Chen et al.39; the annotated spatial data were extracted from the file ‘Mouse_embryo_all_stage.h5ad’ hosted at https://db.cngb.org/stomics/mosta/download/.
Code availability
FlowSig is available to install as a Python package from GitHub at https://github.com/axelalmet/flowsig. All scripts used to generate the analysis in this manuscript are available at GitHub at https://github.com/axelalmet/FlowSigAnalysis_2023. The processed versions of all datasets used in this study, including cell-type annotation and cell–cell communication output from CellChat and COMMOT for non-spatial and spatial data, respectively, are available at: https://doi.org/10.5281/zenodo.10850397 (ref. 69).
References
Wolpert, L. Positional information and pattern formation. Philos. Trans. R. Soc. Lond. B Biol. Sci. 295, 441–450 (1981).
Gao, C. et al. Iterative single-cell multi-omic integration using online learning. Nat. Biotechnol. 39, 1000–1007 (2021).
Velten, B. et al. Identifying temporal and spatial patterns of variation from multimodal data using MEFISTO. Nat. Methods 19, 179–186 (2022).
Townes, F. W. & Engelhardt, B. E. Nonnegative spatial factorization. Nat. Methods 20, 229–238 (2023).
Jerby-Arnon, L. & Regev, A. DIALOGUE maps multicellular programs in tissue from single-cell or spatial transcriptomics data. Nat. Biotechnol. 40, 1467–1477 (2022).
Sherman, T. D., Gao, T. & Fertig, E. J. CoGAPS 3: Bayesian non-negative matrix factorization for single-cell analysis with asynchronous updates and sparse data structures. BMC Bioinf. 21, 453 (2020).
Kotliar, D. et al. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-seq. eLife 8, e43803 (2019).
Zhao, Y., Cai, H., Zhang, Z., Tang, J. & Li, Y. Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data. Nat. Commun. 12, 5261 (2021).
Lotfollahi, M. et al. Biologically informed deep learning to query gene programs in single-cell atlases. Nat. Cell Biol. 25, 337–350 (2023).
Seninge, L., Anastopoulos, I., Ding, H. & Stuart, J. VEGA is an interpretable generative model for inferring biological network activity in single-cell transcriptomics. Nat. Commun. 12, 5684 (2021).
Almet, A. A., Cang, Z., Jin, S. & Nie, Q. The landscape of cell–cell communication through single-cell transcriptomics. Curr. Opin. Syst. Biol. 26, 12–23 (2021).
Armingol, E., Officer, A., Harismendy, O. & Lewis, N. E. Deciphering cell–cell interactions and communication from gene expression. Nat. Rev. Genet. 22, 71–88 (2021).
Wang, X., Almet, A. A. & Nie, Q. The promising application of cell–cell interaction analysis in cancer from single-cell and spatial transcriptomics. Semin. Cancer Biol. 95, 42–51 (2023).
Cang, Z. et al. Screening cell–cell communication in spatial transcriptomics via collective optimal transport. Nat. Methods 20, 218–228 (2023).
Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 22, 78 (2021).
Sachs, K., Perez, O., Pe’er, D., Lauffenburger, D. A. & Nolan, G. P. Causal protein-signaling networks derived from multiparameter single-cell data. Science 308, 523–529 (2005).
Chen, X. et al. An individualized causal framework for learning intercellular communication networks that define microenvironments of individual tumors. PLoS Comput. Biol. 18, e1010761 (2022).
Fischer, D. S., Schaar, A. C. & Theis, F. J. Modeling intercellular communication in tissues using spatial graphs of cells. Nat. Biotechnol. 41, 332–336 (2023).
Arnol, D., Schapiro, D., Bodenmiller, B., Saez-Rodriguez, J. & Stegle, O. Modeling cell–cell interactions from spatial molecular data with spatial variance component analysis. Cell Rep. 29, 202–211 (2019).
Mitchel, J. et al. Tensor decomposition reveals coordinated multicellular patterns of transcriptional variation that distinguish and stratify disease individuals. Preprint at bioRxiv https://doi.org/10.1101/2022.02.16.480703 (2022).
Armingol, E. et al. Context-aware deconvolution of cell–cell communication with Tensor-cell2cell. Nat. Commun. 13, 3665 (2022).
Flores, R. O. R., Lanzer, J. D., Dimitrov, D., Velten, B. & Saez-Rodriguez, J. Multicellular factor analysis of single-cell data for a tissue-centric understanding of disease. eLife 12, e93161 (2023).
Browaeys, R. et al. MultiNicheNet: a flexible framework for differential cell-cell communication analysis from multi-sample multi-condition single-cell transcriptomics data. Preprint at bioRxiv https://doi.org/10.1101/2023.06.13.544751 (2023).
Squires, C., Wang, Y. & Uhler, C. Permutation-based causal structure learning with unknown intervention targets. In Proc. 36th Conference on Uncertainty in Artificial Intelligence Vol. 124, 1039–1048 (PMLR, 2020).
Verma, T. S. & Pearl, J. Equivalence and Synthesis of Causal Models (1991).
Türei, D. et al. Integrated intra‐ and intercellular signaling knowledge for multicellular omics analysis. Mol. Syst. Biol. 17, 1–16 (2021).
Solus, L., Wang, Y. & Uhler, C. Consistency guarantees for greedy permutation-based causal inference algorithms. Biometrika 108, 795–814 (2021).
Bohnenpoll, T. et al. A SHH–FOXF1–BMP4 signaling axis regulating growth and differentiation of epithelial and mesenchymal tissues in ureter development. PLoS Genet. 13, e1006951 (2017).
Briscoe, J. & Small, S. Morphogen rules: design principles of gradient-mediated embryo patterning. Development 142, 3996–4009 (2015).
Zagorski, M. et al. Decoding of position in the developing neural tube from antiparallel morphogen gradients. Science 356, 1379–1383 (2017).
Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).
Dimitrov, D. et al. Comparison of methods and resources for cell-cell communication inference from single-cell RNA-seq data. Nat. Commun. 13, 3224 (2022).
O’Leary, D. D. M., Chou, S.-J. & Sahara, S. Area patterning of the mammalian cortex. Neuron 56, 252–269 (2007).
Jin, S. et al. Inference and analysis of cell–cell communication using CellChat. Nat. Commun. 12, 1088 (2021).
Lu, L. & Welch, J. D. PyLiger: scalable single-cell multi-omic data integration in Python. Bioinformatics 38, 2946–2948 (2022).
Burkhardt, D. B. et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat. Biotechnol. 39, 619–629 (2021).
Hartig, S. M. & Cox, A. R. Paracrine signaling in islet function and survival. J. Mol. Med. 98, 451–467 (2020).
Liao, M. et al. Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19. Nat. Med. 26, 842–844 (2020).
Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792 (2022).
Briscoe, J. & Thérond, P. P. The mechanisms of Hedgehog signalling and its roles in development and disease. Nat. Rev. Mol. Cell Biol. 14, 416–429 (2013).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Mavromatakis, Y. E. et al. Foxa1 and Foxa2 positively and negatively regulate Shh signalling to specify ventral midbrain progenitor identity. Mech. Dev. 128, 90–103 (2011).
Chiu, Y.-C. et al. Foxp2 regulates neuronal differentiation and neuronal subtype specification. Dev. Neurobiol. 74, 723–738 (2014).
Hatton, B. A. et al. N-myc Is an essential downstream effector of shh signaling during both normal and neoplastic cerebellar growth. Cancer Res. 66, 8655–8661 (2006).
Rao, G., Pedone, C. A., Coffin, C. M., Holland, E. C. & Fults, D. W. c-Myc enhances sonic hedgehog-induced medulloblastoma formation from nestin-expressing neural progenitors in mice. Neoplasia 5, 198–204 (2003).
Servén, D., Brummitt, C. & Abedi, H. pyGAM: Generalized additive models in Python. Preprint at https://doi.org/10.5281/ZENODO.1208724 (2018).
Pöschl, J. et al. Expression of BARHL1 in medulloblastoma is associated with prolonged survival in mice and humans. Oncogene 30, 4721–4730 (2011).
Gulacsi, A. & Anderson, S. A. Shh Maintains Nkx2.1 in the MGE by a Gli3-independent mechanism. Cereb. Cortex 16, i89–i95 (2006).
Brancaccio, A. et al. Requirement of the forkhead gene Foxe1, a target of sonic hedgehog signaling, in hair follicle morphogenesis. Hum. Mol. Genet. 13, 2595–2606 (2004).
Briscoe, J. et al. Homeobox gene Nkx2.2 and specification of neuronal identity by graded Sonic hedgehog signalling. Nature 398, 622–627 (1999).
Katoh, M. & Katoh, M. Transcriptional mechanisms of WNT5A based on NF-κB, Hedgehog, TGFβ, and Notch signaling cascades. Int. J. Mol. Med. 23, 763–769 (2009).
Gierer, A. & Meinhardt, H. A theory of biological pattern formation. Kybernetik 12, 30–39 (1972).
Müller, P. et al. Differential diffusivity of nodal and lefty underlies a reaction-diffusion patterning system. Science 336, 721–724 (2012).
Glover, J. D. et al. Hierarchical patterning modes orchestrate hair follicle morphogenesis. PLoS Biol. 15, e2002117 (2017).
Raspopovic, J., Marcon, L., Russo, L. & Sharpe, J. Digit patterning is controlled by a Bmp–Sox9–Wnt Turing network modulated by morphogen gradients. Science 345, 566–570 (2014).
Van de Sande, B. et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat. Protoc. 15, 2247–2276 (2020).
Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486–490 (2015).
Blair, J. D. et al. Phospho-seq: Integrated, multi-modal profiling of intracellular protein dynamics in single cells. Preprint at bioRxiv https://doi.org/10.1101/2023.03.27.534442 (2023).
Gamella, J. L., Taeb, A., Heinze-Deml, C. & Bühlmann, P. Characterization and greedy learning of Gaussian structural causal models under unknown interventions. Preprint at https://arxiv.org/abs/2211.14897 (2022).
Li, C. & Fan, X. On nonparametric conditional independence tests for continuous variables. Wiley Interdiscip. Rev. Comput. Stat. 12, e1489 (2020).
Jin, S. et al. Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 12, 1088 (2021).
Garcia-Alonso, L. et al. Single-cell roadmap of human gonadal development. Nature 607, 540–547 (2022).
He, C., Zhou, P. & Nie, Q. exFINDER: identify external communication signals using single-cell transcriptomics data. Nucleic Acids Res. 51, e58 (2023).
Tang, L., Schucany, W. R., Woodward, W. A. & Gunst, R. F. A Parametric Spatial Bootstrap. Report No. SMU-TR-337 (Southern Methodist University, 2006).
Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods 19, 171–178 (2022).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Watanabe, M. et al. Self-organized cerebral organoids with human-specific features predict effective drugs to combat Zika virus infection. Cell Rep. 21, 517–532 (2017).
Watanabe, M. et al. TGFβ superfamily signaling regulates the state of human stem cell pluripotency and capacity to create well-structured telencephalic organoids. Stem Cell Rep. 17, 2220–2238 (2022).
Almet, A. Processed datasets used in Almet et al. (2024), "Inferring pattern-driving intercellular flows from single-cell and spatial transcriptomics". Zenodo https://doi.org/10.5281/zenodo.10850397 (2024).
Acknowledgements
This work was partially supported by NSF grants DMS1763272, CBET2134916 and MCB2028424, NIH grants R01AR079150 and R01DE030565, the Chan Zuckerberg Initiative grant AN-0000000062 and a grant from the Simons Foundation (594598). It was also supported by NIH grant R00HD096105, NSF grant RECODE2225624, a New Investigator Faculty Award and start-up funds from the UCI School of Medicine (M.W.), and the FRAXA Postdoctoral Fellowship (Y.-C.T.). We thank C. Squires for useful discussions about the UT-IGSP algorithm and X. Wang for initial preprocessing of the cortical organoid scRNA-seq data.
Author information
Contributions
A.A.A. and Q.N. conceived the method. A.A.A. implemented the method. A.A.A. generated the numerical results. Y.-C.T. and M.W. generated the experimental results. A.A.A., Y.-C.T., M.W. and Q.N. interpreted the results, generated the figures and wrote the paper. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks David Fischer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Supplementary information
Supplementary Notes, Supplementary Results, Supplementary Tables 1–4 and Supplementary Figures 1–11.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Almet, A.A., Tsai, YC., Watanabe, M. et al. Inferring pattern-driving intercellular flows from single-cell and spatial transcriptomics. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02380-w