Main

Cell–cell communication (CCC) inference has recently emerged as a major component of the analysis of single-cell and spatially resolved transcriptomics data, with over 100 tools contributing valuable developments1,2. All single-cell methods are based on multiple assumptions, including that gene co-expression across dissociated cells, or groups of cells, reflects CCC within tissues3. Similarly, CCC methods that use spatial information quantify co-localizations at different scales, some summarizing interactions globally, across slides as a whole4,5,6, and others locally at the individual cell or location7,8,9.

Most methods have focused on protein-mediated interactions, predominantly inferred from transcriptomics data2,3, and only a few from multi-omics data10. As a consequence, other modes of intercellular signalling have been typically ignored3 except for limited metabolite-mediated CCC predictions from transcriptomics data11,12,13,14. Emerging multi-omics technologies15 are anticipated to provide a broader picture of molecular mediators and in turn prompt the development of new CCC tools.

While early methods analysed CCC from single-condition datasets, increasing sample numbers and experimental design complexity have prompted various strategies to extract differential CCC insights. These strategies include methods that (1) consider each interaction independently8,16,17, (2) make use of dimensionality reduction to perform pairwise comparisons between conditions18,19 or (3) model all variables, samples and cell types simultaneously20. Approaches 2 and 3 can be thought of as modelling orchestrated CCC events, here referred to as ‘intercellular programmes’.

CCC methods typically rely on pre-existing knowledge2,3, with extensive efforts dedicated to curating and extending protein-mediated19,21,22 and, to a lesser extent, metabolite-mediated knowledge11,13,14,23. In some resources, the interactions are associated with pathways19 or transcriptional regulators24,25, leading to multiple heterogeneous databases and potential inconsistencies caused solely by the choice of resource3.

Finally, all these developments use various infrastructures, with each CCC tool being typically designed for a specific task or data type.

Here, we introduce LIANA+, an all-in-one framework that enables CCC inference beyond a single task or data type. To illustrate the distinguishing features of LIANA+, we applied it to a recent spatially resolved metabolome–transcriptome dataset of a murine Parkinson’s disease model26. In this case study, we identified dopamine-mediated CCC events and the brain subregions where they take place. Moreover, we jointly analysed single-nucleus and spatially resolved human heart data with a complex cross-conditional experimental design27. In this analysis, we hypothesized intercellular and intracellular signalling mechanisms driving fibrosis in ischaemic heart regions.

Results

LIANA+ as an all-in-one solution to model CCC

LIANA+ is a scalable framework (Extended Data Fig. 1, Supplementary Table 1 and Supplementary Note 1) that integrates methods to infer CCC from dissociated data (Fig. 1a) and methods to study global and local relationships from spatially resolved data (Fig. 1b), expanding them to multi-omics technologies. When working with cross-conditional datasets, these methods are supplemented by different strategies to extract deregulated CCC events via hypothesis-free and hypothesis-driven approaches (Fig. 1c). LIANA+ uses standardized scverse28 input and output formats (Fig. 2a), enabling interoperability with external packages and the straightforward extensions of CCC approaches. Moreover, we propose a flexible causal subnetwork search to link CCC events with intracellular signalling (Fig. 1d). Each of these components leverages a comprehensive prior knowledge base that encompasses metabolite- and protein-mediated intercellular and intracellular signalling (Figs. 1e and 2b–e and Supplementary Note 2). Taken together, LIANA+ synthesises heterogeneous methods and resources1,2, providing an all-in-one framework to study intercellular communication (Fig. 1 and Supplementary Table 2).

Fig. 1: LIANA+ framework overview.
figure 1

a, LIANA+ re-implements and adapts eight ligand–receptor methods to infer interactions from single-cell data, along with a flexible consensus that can integrate any combination of these methods. b, LIANA+ implements multi-view learning as well as eight local metrics to respectively capture global and local interactions from spatially resolved omics data. c, LIANA+ includes diverse strategies to identify deregulated CCC events across conditions: (1) differential CCC analysis with PyDESeq2 (refs. 47,48) for hypothesis-driven exploration and (2) unsupervised (hypothesis-free) approaches including standard NMF or higher-order factorizations via Tensor-cell2cell20 and multi-view factor analysis43. d, LIANA+ connects intercellular interactions to intracellular signalling pathways using sign-coherent network optimization. e, LIANA+ is built on a rich knowledge base—OmniPath22 and BioCypher59—which comprise ligand–receptor interactions and annotations, including such mediated by metabolites68, as well as intracellular knowledge, such as signalling pathways and TFs. Finally, all components (ae) of LIANA+ are applicable to both dissociated single-cell and spatially resolved multi-omics data. aFor spatially weighted Spearman correlation, along with the standard metric, we implemented its masked version from scHOT32; for Moran’s R we adapted both the global and local versions from SpatialDM8 and for the spatially weighted product, we also included a max-normalized version.

Fig. 2: LIANA+ uses standardized inputs and outputs to streamline the inference intercellular and intracellular signalling.
figure 2

a, LIANA+ accepts unimodal (AnnData) or multi-modal (MuData) data objects as inputs with (optional) prior knowledge and/or spatial information. These are then transformed into data frames with aggregated interaction results or unimodal objects with statistics at the individual spot or cell level. To enable the inference of CCC across modalities, the methods implemented in LIANA+ accept MuData objects77 as input. These provide essential functionalities to load and store multi-modal data77, and can be thought of as an extension of AnnData objects78, which are the default input of LIANA+ when working with unimodal single-cell or spatial data. b,c, LIANA+ makes use of existing prior knowledge frameworks (b)22,59,68 to annotate interactions according to, for example, pathways, disease or location (c). d, Similarly, it uses this prior knowledge to infer putative causal (sign-coherent) signalling networks49,79, emanating from ligand–receptor interactions down to active TFs. e, The TF activities can be estimated by making use of generalistic regulon prior knowledge80 and standard enrichment analyses81.

LIANA+ enables modelling CCC across spatial modalities

A particular challenge of emerging spatial technologies is that, on a single tissue section, they can combine different technologies15, and hence the observations from each technology may correspond to distinct spatial locations that need to be aligned26,29. LIANA+ handles spatial multi-omics datasets from diverse technologies, including such with distinct observations across modalities (Methods).

Expanding on previous work4, LIANA+ uses a multi-view modelling approach to learn spatial relationships across distinct types of features, spatial contexts and technologies (represented as views; Fig. 1b). This approach enables the joint modelling of combinations of complex tissue structures (for example, cell neighbourhoods) and functions (for example, signalling pathways). For example, relationships between ligands and pathways4 or cell types and pathways27,30. Yet it models relationships globally and hence does not provide information about the tissue locations within which these interactions occur. To complement it, we implemented eight local metrics (Fig. 1b and Supplementary Note 3). These have previously been used to identify co-expression patterns between genes across spatial31,32,33 and pseudotime32 contexts, and have been recently applied to ligand–receptor interactions8,9,34. We illustrate the joint application of multi-view modelling and local metrics to study metabolite-mediated interactions from multi-omics data using a recent murine Parkinson’s disease model dataset26. This spatially resolved dataset provides metabolome and transcriptome measurements, respectively generated using matrix-assisted laser desorption/ionization mass spectrometry imaging and 10X Visium technologies26 (Fig. 3a). Briefly, three mice were subjected to unilateral 6-hydroxydopamine-induced lesions in one hemisphere while the other remained intact26 (Fig. 3b). These 6-hydroxydopamine-induced lesions selectively destroy substantia nigra-originated dopaminergic neurons, impairing dopamine-mediated regulatory mechanisms in the striatum—an area of the brain crucial for movement coordination (Fig. 3b).

Fig. 3: LIANA+ models intercellular communication from spatial multi-omics data.
figure 3

a, Spatially resolved transcriptomics (10X Visium) and metabolomics (matrix-assisted laser desorption/ionization mass spectrometry imaging) yield two matrices with different sets of features for each modality (f1 or f2), and observations (n or m) which correspond to different locations captured on the same tissue section26. b, Parkinson’s disease mouse model annotated for striatum in intact and lesioned hemispheres, with three replicates (3✕). c, Multi-view modelling integrates metabolite peak intensities, brain-specific receptor expression and cell-type proportions to identify their spatial relationships. This approach enables the estimation of joint performance and individual contributions of receptor expression and cell-type proportions in predicting metabolite peak intensities. Cell-type proportions were deconvoluted35 using a murine single-cell atlas36 as a reference. d, Dopamine predictors ranked according to their median importance (y axis; ordinary least squares t-values), with names shown for the top three predictors: Drd2 and MSNs 1 and 2. e, Normalized dopamine peak intensities. f, log1p Drd2 receptor gene expression. gi, Local interactions between dopamine and its canonical D2R receptor, encoded by Drd2, as measured by spatially weighted cosine similarity (g), its corresponding uncorrected permutation P values (h) and interaction categories (i). The images showcase slide B1 from experiment V11L12-109 (ref. 36). Source numerical data are available in source data.

Source data

LIANA+ jointly models global associations across modalities

Using multi-view modelling, we inferred spatial relationships between metabolites, their corresponding brain-specific metabolite receptors and cell types across different spatial contexts (views) (Fig. 3c and Methods). Specifically, we trained a model that predicts metabolite intensities using spatially adjacent receptor expression and cell-type proportions, deconvoluted using Tangram35 with a murine brain single-cell atlas36 as a reference. We then quantified the strength of association (importance) between the metabolite intensities and their predictors. We further calculated the individual contribution of each predictor view as well as the performance of the multi-view model in explaining the variance of the metabolites’ intensities (Methods). For each slide, we carried out this modelling process in the lesioned and intact hemispheres independently (Methods).

Several metabolite peaks were explained relatively well by the joint model (R2 > 0.5), including dopamine and its derivative 3-methoxytyramine, among other potentially deregulated, but unannotated, metabolite peaks (Extended Data Fig. 3a). Focusing on dopamine, we saw a large difference in explained variance between the intact (R2 = 0.535) and lesioned hemispheres (R2 ≈ 0; Extended Data Fig. 3a), which is expected due to the absence of dopamine in the striatum of the lesioned hemisphere26 (Fig. 3e and Extended Data Fig. 4). Looking further into the intact hemisphere, we saw that the cell-type proportions were a better predictor of metabolite intensities than the receptors (Extended Data Fig. 3b,c and Methods), implying that dopamine signalling in this region was more closely associated with the abundance of specific cell types than the expression of brain-specific metabolite receptors. Similarly, we noted that cell types were typically better predictors than receptors for the remainder of the well-explained metabolite peaks (Extended Data Fig. 3b). Focusing on the intact hemispheres, we found that the three best predictors of dopamine (median t-value of >3) were dorsal medium-sized spiny neurons (MSNs) 1 and 2 (ref. 36) and Drd2 dopamine receptor gene expression (Fig. 3d). This reflects anticipated associations with dopamine, as D2R (encoded by Drd2) is a canonical receptor of dopamine. Similarly, GABAergic MSNs 1/2 are key receivers of dopamine signalling and were classified as types D1 or D2 according to the expression of dopamine receptor genes (Drd1 or Drd2, respectively; Fig. 3f and Extended Data Fig. 4d) they express36. Moreover, dopamine’s relationship with its top three predictors differed notably between intact and lesioned hemispheres (Extended Data Fig. 3d–f).

This association of dopamine with the MSN1 cell type corroborated the findings of the original publication26, while our analysis highlighted the interactions of dopamine with MSN2 and its canonical D2 receptor, which were not previously reported there36.

LIANA+ infers local interactions at individual locations

The approach described above models global spatial relationships—that is, it considers all spots to compute a single value per interaction across the slide. As such, it provides a single statistic for the importance of each interaction in a slide but does not provide information about the region or locations where the interactions occur. To complement global relationships identified with LIANA+, we implemented eight metrics to pinpoint local interactions at the individual spot or cell location. Briefly, LIANA+ includes (1) four spatially weighted variants of commonly used similarity metrics (cosine similarity, Pearson and Spearman correlation and Jaccard index), (2) a masked version of Spearman correlation32, (3) simple spatially weighted products and (4) a bivariate extension8,33,37 of the univariate spatial clustering measure—Moran’s I (Methods). We evaluated the performance of these metrics in two separate tasks (Extended Data Fig. 2 and Supplementary Note 4), and chose spatially weighted cosine as the default local metric in LIANA+ due to its interpretability and consistent performance. Along with the local metrics, we further provide local permutation P values and categories, the latter reflecting whether an interaction between the two variables is positive, negative or neither (Methods).

Using spatially weighted cosine similarity, we focused on identifying the specific locations at which putative interactions with dopamine occurred. We saw that, as anticipated, the putative interactions of dopamine and D2R, highlighted above by the global multi-view learning approach, occurred only within the intact striatum regions (Fig. 3g), which was further corroborated by low P values (Fig. 3h) and a positive association (category) between the two variables (Fig. 3i). With similar results for MSN1 and MSN2 cell types (Extended Data Fig. 4e,f). Our analysis additionally hinted at an anticipated asymmetry following unihemispheric lesion38: while the interaction between D2R and dopamine was present in the intact hemisphere (Fig. 3i), Drd2 was also expressed in the dopamine-depleted, lesioned hemisphere (Fig. 3f).

In conclusion, using LIANA+ we captured perturbation-driven changes in dopamine’s distribution and its associations with its canonical D2R receptor and MSN cell types 1 and 2. We also pinpointed the specific regions where these interactions take place, recapitulating and extending perturbed dopamine-signalling hypotheses26 and illustrating how LIANA+ enables CCC analyses from spatial multi-omics data.

Ligand–receptor inference weakly reflects co-localization

Identifying co-localized genes from spatially resolved data alone can help us to pinpoint relevant interactions driving disease. However, it remains limited by a common coverage–resolution trade-off in most spatial technologies39,40: they either quantify a limited fraction of molecules or they capture multiple cells within spots, relying on deconvolution to quantify cell-type frequencies within them39,40. Leveraging Slide-tags, a recent technology capable of measuring full transcriptome single-nucleus data while also preserving spatial information41, we evaluated spatially uniformed CCC methods using cell type and gene expression co-localization as an indirect ground truth3—assuming that co-localization is a proxy of communication (Methods). We found weak associations between ligand–receptor interactions predicted in a spatially agnostic manner and the co-localization of their ligands, receptors and cell types (Fig. 4 and Supplementary Note 5). Therefore, in the next section, we combined dissociated and spatial data to contextualize CCC inference to proteins and cell types known to co-localize.

Fig. 4: Comparison of ligand–receptor inference methods using the spatial co-localization of cell types and ligand–receptors as assumed truth.
figure 4

Each method, as re-implemented in LIANA+, and its individual ligand-receptor (LR) scoring functions are represented by a different box colour. Scoring functions that reflect the magnitude strength of an interaction are shown in dark red, while those that capture cell-type specificity in black. a, Quantification of the performance of each method’s scoring functions using the AUROC. b, The balanced accuracy for each of the methods filtered according to their suggested false positive filtering thresholds. c,d, The normalized F1 scores following filtering for each score (*) (c) and method (d). For each metric, a score of 0.5, denoted by the dashed red line, indicates random performance. Note that while we show 0 to 1 for normalized F1, unlike balanced accuracy and AUROC, it is bound between negative and positive infinity (Methods). Both NATMI and Connectome use expression products (Product*) as a measure of magnitude strength. The boxes were ordered according to the median performance across datasets (central line in cyan within each box; n = 5 datasets). The box hinges represent the first and third quartiles, and the whiskers extend up to 1.5 times the interquartile range above and below the box hinges. Outliers are depicted as individual hollow points beyond the whiskers. Source numerical data are available in source data.

Source data

LIANA+ detects deregulated inter- and intracellular events

With growing sample sizes and increasingly complex experimental setups, versatile methods are required to analyse CCC across conditions. Standard dimensionality reduction approaches, such as non-negative matrix factorization (NMF) or principal component analysis, have been previously used to investigate cross-talk between cell types18,19,42. However, such approaches are limited to capturing the variance across data with only two dimensions at a time, for example, observations and interactions. To address this, LIANA+ couples nine ligand–receptor methods (Fig. 1a and Supplementary Table 3) with higher-order factorizations20,43 for the scalable inference of cross-conditional CCC from dissociated single-cell data (Fig. 1c). The combination of ligand–receptor inference with such factorization methods allows all interactions, cell types and samples to be considered simultaneously20, thus enabling the identification of coordinated CCC events, or intercellular programmes, across conditions. Moreover, such approaches also highlight the specific interactions and cell types that contribute to these differences, thus offering a complete overview of the elements driving the variation across samples. Along with Tensor-cell2cell20,44, which uses tensor-based factorization to decompose CCC across samples20, we implemented in LIANA+ an unsupervised approach that leverages the flexibility of Bayesian multi-view factor analysis43 (Methods and Supplementary Note 6). This approach has as a distinguishing feature the ability to obtain interaction importances per cell-type pair (view) rather than their global importance across all cell types20. Using five public cross-conditional atlases (Supplementary Table 4), we show that, regardless of the ligand–receptor method, both Tensor-cell2cell and multi-view factorization capture intercellular programmes that separate samples according to different conditions (Supplementary Note 6 and Extended Data Fig. 5). To complement the aforementioned unsupervised analyses, LIANA+ uses state-of-the-art45,46 differential expression analysis47,48 to enable the targeted (hypothesis-driven) exploration of CCC across conditions (Methods). Briefly, LIANA+ combines ligand–receptor prior knowledge with differential statistics, calculated at the pseudobulk level, to identify interactions deregulated across cell-type pairs16 (Methods). Furthermore, since intercellular CCC events and intracellular signalling are intertwined, LIANA+ provides a distinctive strategy to predict signalling pathways up- or downstream of CCC events, combining its comprehensive knowledge base with a causal subnetwork search (Fig. 1d and Supplementary Note 7). Specifically, we solve a network optimization problem49 that identifies a subnetwork with a signal flow consistent with the changes in CCC and transcription factors (TFs) (Methods). A distinct feature of our causal subnetwork search is the consideration of sign coherence, thus providing solutions in line with the activator or inhibitory effects of interactions reported in the literature.

To demonstrate the synergistic nature of the components in LIANA+, we analysed a cross-conditional dataset integrating single-nucleus and spatial transcriptomics data. This dataset comprised 29 single-nucleus and 28 10X Visium spatial samples from human myogenic, fibrotic and ischaemic heart regions following myocardial infarction27 (Fig. 5a). Using these data, we had previously shown the role of myofibroblasts and macrophages in fibrosis, characterized by the synthesis of extracellular matrix proteins for scar tissue formation27. Here, we further elucidate the intercellular and corresponding intracellular signalling mechanisms facilitating cardiac tissue repair and remodelling.

Fig. 5: LIANA+ models intercellular communication from dissociated and spatially resolved transcriptomics data.
figure 5

a, Sampling sites from human myocardial infarction patients. b, Overview of NMF applied to local interactions inferred for each location/observation across all slides. c, log2-transformed odds ratios and Fisher’s exact test P values representing the enrichment or depletion of fibrotic, ischaemic and myogenic labels in each of the factors inferred by NMF. d, Pathway enrichment50 of NMF ligand–receptor loadings. Asterisks indicate FDR-corrected P values <0.05, with the colour map distinguishing between positive (red) and negative (blue) enrichment. e, NMF factor 1 scores per observation in an ischaemic tissue section. f,g, Spatially weighted cosine similarity of FN1 (f) and SPP1 (g) with the ITGA5 and ITGB1 complex, respectively, in a selected (ACH0014) sample. h, The procedure to decompose ligand–receptor interactions inferred between cell types from dissociated single-nucleus data samples into factors and corresponding feature sets. i, Multi-view factor scores following ligand–receptor score decomposition with one-way analysis of variance P value across ischaemic, fibrotic and myogenic samples (n = 29). The central line within each box marks the median, with the box hinges representing the first and third quartiles. The whiskers extend up to 1.5 times the interquartile range above and below the box hinges. Outliers are depicted as individual hollow points. j, Multi-view factorization factor 1 variance explained across cell-type pairs (views). Abbreviations include cardiomyocytes (CM), endothelial cells (EN), FB, MY, pericytes (PC) and vascular smooth muscle cells (VM). k, To link deregulated inter- and intracellular signalling events, we combine ligand–receptor prior knowledge with differential contrast statistics between ischaemic samples versus the rest. Then, using knowledge of intracellular protein–protein interactions and TF regulons, we identify sign-coherent subnetworks that connect intercellular interactions (start nodes) with deregulated TFs (end nodes; Methods). l, A subset of interactions, the ligand and/or receptors of which are known to play a role in fibrosis27,53,55 and were deregulated in FB and/or MY types. Only interactions with the highest loadings (>95th percentile) from the NMF analysis on spatially informed local ligand–receptor interactions were included in the differential expression analysis. Asterisks signify genes with FDR <0.05, and the colour bar corresponds to log2FC47,48. m, Sign-coherent signalling network originating from FN1 and SPP1 and propagating down to TFs deregulated in MY in ischaemia. n, Corresponding legend for the network in m. Source numerical data are available in source data.

Source data

LIANA+ infers intercellular programmes from spatial data

To identify intercellular programmes shared across the conditions, we first inferred local ligand–receptor interactions using spatially weighted cosine similarity for each of the 28 spatial transcriptomics slides. Then, we concatenated the resulting interactions across all slides and applied NMF, obtaining five intercellular programmes (factors) across all slides and conditions (Methods). Each factor resulted in scores per observation (spot), along with associated interaction loadings (Fig. 5b). We saw that factors 1, 2 and 5 were enriched in ischaemic samples, and factor 4 was enriched in fibrotic and downregulated in (healthy) myogenic samples. In contrast, factor 3 was positively associated with the myogenic condition (Fig. 5c). To elucidate the functional processes associated with the identified factors, we assessed whether ligand–receptor interactions with high loadings were enriched in a set of canonical pathways50 (Methods and Fig. 5d). We saw that interactions from ischaemia-associated factors 1 and 2 were enriched in hypoxic, epidermal growth factor receptor (EGFR), mitogen-activated protein kinase (MAPK) and JAK-STAT pathways, reflecting anticipated inflammatory patterns in ischaemic regions51. Moreover, we noted that interactions in both the ischaemia-associated factor 4 and the fibrosis-associated factor 5 were enriched in transforming growth factor (TGF)-β signalling—a well-known driver of fibrosis52. Both factors 4 and 5 were negatively associated with pro-inflammatory pathways, such as tumour-necrosis factor-α (TNF-α) and NFK-β, suggesting the semi-orthogonal (independent) positioning of the interactions in the latent space concerning factors 1 and 2. Similar downregulation of pro-inflammatory pathways was noted in the myogenic-associated factor 3, probably related to the reduced presence of immune cells within the functioning cardiac muscle tissue. As expected, ischaemia-associated factor 1 localized to ischaemic regions (Fig. 5e). Among the top 30 interactions were several integrins (ITGB1, ITGAV, ITGA5 and ITGA7) interacting with pro-fibrotic (FN1 and SPP1) gene markers27,53 and matrix glycoproteins (TNC and THBS1) (Extended Data Fig. 6a). These interactions of integrin complexes with TNC are in agreement with the high expression levels of TNC found in other studies that also focus on the early stages of myocardial infarction54. FN1 has been reported as a marker of myofibroblasts27, while SPP1 (ref. 27), FN1 (ref. 55) or both53 as markers of pro-fibrotic macrophages. Moreover, THBS1 and ITGB1 have recently been implicated in a self-amplifying, immune-cell recruitment loop, including FN1+ THBS1-expressing macrophages55. Overall, the identified intercellular programmes aligned well with the existing literature on pro-fibrotic response upon myocardial infarction, including potential interactions of FN1 and SPP1 with ITGB1-containing complexes within ischaemic regions (Fig. 5f,g).

LIANA+ infers intercellular programmes from dissociated data

To investigate fibrosis-associated intercellular programmes further, we performed an unsupervised CCC analysis on the complementary dissociated single-nucleus data, enabling the exploration of specific cell types involved in intercellular signalling. We additionally incorporated co-localization information to mitigate the anticipated high false positive rates (Supplementary Note 5). In particular, we considered interactions only if they were identified as condition relevant by the NMF analysis, and if they occurred between cell-type pairs observed to co-localize (Extended Data Fig. 6b and Methods). Then, we inferred ligand–receptor interactions with LIANA+ for each of the 29 single-nucleus samples, followed by multi-view factorization4 on the obtained ligand–receptor interactions across all cell-type pairs and all samples (Fig. 5h and Methods). This unsupervised approach identified a factor (factor 1), whose sample loadings were significantly different across the conditions (P = 9.9 × 10−12; Fig. 5i). Moreover, this approach enabled us to quantify how well the variance within each cell-type pair is captured by each of the factors (Methods). We saw that within factor 1, interactions with fibroblasts (FB) were prominent sources of CCC, including the FB-to-myeloid signalling axis (R2 = 25.1%) (Fig. 5j), which was also consistent with a strong spatial association between the two cell types (Extended Data Fig. 6b). Moreover, among the interactions with the highest loadings, which were predominantly emitted by FB, we observed interactions involving collagens, integrins and lamins, and FN1 (Extended Data Fig. 6c). In summary, this unsupervised and synergistic analysis highlighted the importance of FB and myeloid cells (MY) as potential drivers of the disease trajectory following early myocardial infarction in both single-cell and spatial data. Thus, our findings pointed to a necessity for an in-depth exploration of FB and MY interactions, concurrent with their known role in the fibrotic process in cardiac tissues upon myocardial infarction27.

LIANA+ identifies deregulated intracellular signalling

To investigate the deregulation of specific ligand–receptor interactions in ischaemia along with downstream signalling that originates from these interactions, we performed differential expression analysis, using both myogenic and fibrotic samples as reference (Fig. 5k and Methods).

We saw deregulation of FB-to-MY signalling and the associated ligand–receptor interactions, in agreement with the preceding unsupervised analyses. Examining the individual ligand and receptor genes, we saw that the expression of THBS1 and TNC was significantly deregulated (false discovery rate (FDR) <0.05) in FB and MY, respectively, while SPP1, THBS1 and FN1 were deregulated in both (Fig. 5l). Additionally, genes involved in ITGB1-containing integrin complexes, interacting with FN1 and SPP1, were upregulated in MY (Fig. 5l). Building on these findings, we delved into the potential intracellular signalling pathways within MY triggered by the FN1/SPP1 and ITGA5 and ITGB1 integrin complex interactions (Fig. 5k). This focus was motivated by the reported role of these proteins in fibrosis27,53,55 and the co-deregulation of their genes in myeloid and fibrotic cell types (Fig. 5l) as well as the FB-to-MY signalling axis highlighted by both the multi-view factorization latent space (Fig. 5i) and spatial analysis (Extended Data Fig. 6b). Using a signed and directed prior knowledge network22 (Methods), we linked the predicted interaction between FN1/SPP1 and ITGA5 and ITGB1 with deregulated downstream TFs (Fig. 5k and Methods). This hinted at a putative signalling network involving kinases MAPK1 and MAPK14, and TF co-regulators ATM, EP300 and YAP (Fig. 5m,n). Specifically, these regulatory proteins were predicted to upregulate SMAD1/3 TFs, key members of canonical TGF-β superfamily signalling56. We also noted the downregulation of FOXO3, which is potentially mediated by SMAD3-dependent TGF-β1 signalling in cardiac myofibroblasts57. Finally, we cross-checked the predictions from our analysis in public heart failure atlases (Supplementary Table 5) and saw that using LIANA+ to integrate single-cell and spatial data improved the reliability of predictions (Supplementary Note 8 and Extended Data Fig. 7).

In conclusion, our analysis generated mechanistic hypotheses about the intercellular and intracellular events linked to the establishment of the myofibroblast phenotype and recruitment of pro-fibrotic SPP1+ macrophages in myocardial infarction27. These results underscore how LIANA+ offers a complete suite to identify disease-related communication patterns along with diverse strategies to interpret the underlying biological processes and intracellular signalling mechanisms.

Discussion

In this work, we introduce LIANA+, which unifies and expands previous methodological developments, redefining them into synergistic components to enable diverse CCC analyses from single-cell and spatial (multi-)omics data.

Dissociated and spatially resolved data generation has commonly focused on transcriptomics2,3, limiting CCC method development to protein-mediated interactions. Yet emerging multi-omics technologies enable the quantification of diverse molecules15,26,58. Building on a flexible and efficient infrastructure28 and a rich biological knowledge base22,59, we combined spatially informed multi-view modelling with local spatial metrics to examine the interactions of metabolites, receptors and cell types in a murine Parkinson’s disease model26. Our analysis revealed dopamine-mediated interactions in specific brain subregions, highlighting how LIANA+ addresses challenges to integrate different data modalities, a scenario we anticipate to grow with the increasing combined use of complementary spatial technologies15,39.

Single-cell technologies capture cellular heterogeneity but typically lose tissue architecture information during dissociation, leading to many false positives in spatially agnostic CCC inference, as our evaluations demonstrate. Using LIANA+, we jointly analysed the CCC events following myocardial infarction from single-nucleus and spatially resolved transcriptomics data27, thereby addressing the resolution limitations of spatial transcriptomics data15,39 and the anticipated high false positive rates from dissociated data. By combining NMF with spatially informed local interactions predicted from spatial transcriptomics data, we highlighted known drivers of fibrosis in ischaemic regions, such as SPP1 (refs. 27,53) and FN1 (refs. 53,55). After constraining the inference of CCC from dissociated transcriptomics data to co-localized cell types, ligands and receptors, we used a multi-view factorization approach43 to identify FB as major sources of CCC events in ischaemia, potentially via a previously reported FB-to-MY signalling axis27. A targeted analysis of CCC and associated downstream signalling events revealed a causal signalling pathway linking FN1/SPP1 ligands to downstream TGF-β-associated TFs, such as SMAD3 and FOXO3 (refs. 56,57). Thus, this application highlights the potential of LIANA+ to generate unsupervised hypotheses and translate those into mechanistic biological insights.

The components of LIANA+ can be flexibly used in different ways beyond those in the illustrated examples. In particular, while we combined unsupervised factorizations with targeted differential analysis to uncover deregulated interactions, each approach can be used independently, depending on the dataset at hand. For example, if a specific hypothesis is available a priori, we may directly use a targeted analysis, while the unsupervised analyses might be better suited for complex experimental designs. Additionally, LIANA+ facilitates the flexible construction of models, allowing for the integration of an arbitrary number of modalities, as illustrated by our application of multi-view modelling4 to study the relationships among metabolites, cell types and receptors. Consequently, with the increasing prevalence of spatial technologies that capture diverse molecular types39, LIANA+ is well suited to study signalling events mediated by these molecules across combinations of modalities and technologies. Furthermore, LIANA+ enables the inference of putative causal signalling networks. To our knowledge, it is currently the only approach that considers the sign of the molecular interactions underlying inter- and intracellular signalling, a key feature of signal transduction. We applied this method to link deregulated protein-mediated CCC and TFs, but it should be applicable to any set of molecules, including metabolite-mediated signalling networks60. Finally, the flexibility of LIANA+ enables other CCC methods, factorizations61,62,63 or network approaches64,65,66 to be incorporated.

The methods implemented in LIANA+ have a number of limitations. First, while each of its methods can flexibly infer interactions between any set of variables, they typically use prior knowledge, which is limited, often exhibiting biases and a trade-off between coverage and quality3,24,67. Most curation efforts have focused on annotating ligand–receptor interactions19,21, and additional prior knowledge efforts are needed in particular for the inference of CCC beyond protein-mediated events. Moreover, contextualizing prior knowledge to specific cell types, tissues or diseases can help to reduce erroneous predictions. As an example, we customized MetalinksDB68, a comprehensive resource for the inference of metabolite-mediated CCC, to brain-specific metabolites. Second, CCC from dissociated single-cell data remains limited to the co-expression of communication partners, which may not translate to the protein level, let alone imply a functional interaction3. Likewise, while spatially resolved data is a step further from its dissociated counterparts, it is limited to the co-localization of molecules. Finally, while our preliminary evaluations support the ability of LIANA+ to generate CCC insights across a range of technologies, systematic benchmarks of CCC methods are still pending. Ours and other evaluations remain limited by the lack of ground truth3, using instead orthogonal modalities, such as spatial data3,69 or downstream signalling3. As emerging technologies70,71,72 that capture bona fide CCC events become measurable at scale and widely available, LIANA+ will facilitate such benchmarks and comparisons. In the meantime, LIANA+—like other CCC inference methods—remains a tool for hypothesis generation.

As illustrated in this manuscript, the synergistic components in LIANA+ can be combined in various ways, and their configurations can be tailored to address diverse and emerging questions and datasets. Given its modularity, new methods can be integrated into the framework and benefit from the established ecosystem of methods and resources. We further envision LIANA+ to be a versatile tool for the study of CCC driven by diverse mediators, beyond protein-mediated and metabolite-mediated interactions, expanding the range of CCC events that could be studied, such as host–microbiome interactions73,74,75. Thus, LIANA+ not only stands as a comprehensive and scalable tool for studying communication events but also serves as a catalyst for future developments in the field.

Methods

Bivariate spatially informed metrics

LIANA+ implements spatially weighted global and local metrics to estimate bivariate co-localizations from spatial data. For statistical testing of the local metrics, we use spot label permutations to generate a null distribution against which empirical local P values are computed. Inspired by GeoDa82, we also categorize local bivariate associations according to the magnitude and sign of the two variables. We describe these in detail in Supplementary Note 3. When working with interactions, the members of which contain heteromeric complexes, we consider the minimum expression of complex members per spot. For any interactions where the members are not expressed in at least 10% of spots are excluded by default.

We provide a detailed tutorial on the bivariate metrics at https://liana-py.readthedocs.io/en/latest/notebooks/bivariate.html.

Learning spatial relationships across multi-views

To learn multivariate interactions from spatially resolved data, we adapted MISTy’s multi-view learning approach4, where a view is a collection of variables (for example, a modality). This approach jointly models different spatial and functional aspects of the data, such that it can fit any number of views and each view can contain any number of variables. As shown in this work, one can use it to model different combinations of RNA expression, cell-type proportions and metabolite peak intensities.

In LIANA+, multi-view objects are represented as subclasses of MuData77, modified to ensure the correct format of the views and corresponding spatial connectivities. Each multi-view object has an intrinsic view (intraview) that contains the target variables of interest for each observation. The other views can be considered as ‘extra views’, composed solely of predictor variables. Extra views can also represent different transformations of the variables within the intraview, for example, incorporating different spatial contexts4. Once the multi-view structure (model design) is defined, each target is modelled by predictors from each view independently. As such, for each target we obtain (1) relationship statistics (‘importances’) for each of the predictors from the distinct views, (2) the relative ‘contribution’ of each view to the joint prediction of each target and (3) the goodness of fit (for example, R2) of the model. The statistics for each predictor (1) signifying its importance in the prediction of a given target variable, are calculated depending on the modelling approach. For example, for random forests, we use the reduction of variance explained that can be attributed to each predictor across all regression trees. For linear models, which were used throughout this manuscript, we calculate the ordinary least squares t-statistics of the estimated parameters under the zero value null hypothesis83. The independent view-specific predictions are combined by a cross-validated linear meta-model4 (by default k = 10) to obtain the contributions of view-specific models (2), along with the goodness of fit of the overall model, for each target variable (3). In particular, we can discern between the contribution of the intraview, modelled as the intrinsic variability among target variables within the same cells/spots, from the joint predictive contribution of extra views, which encode spatial information.

By default, models are based on random forests84 and can capture complex non-linear relationships. Here, we also implemented linear models as a signed alternative that provides the direction of relationships, though LIANA+ also accepts external single-view models.

To enable region-specific CCC modelling, we mask each extra view on the basis of labels (corresponding to regions) assigned to observations within the ‘intra’ view. This process is preceded by the spatial weighting of each extra view.

To facilitate the use of our multi-view learning approach, we provide in-depth tutorials using custom and predefined multi-view model structures at https://liana-py.readthedocs.io/en/latest/notebooks/misty.html.

Estimation of spatial connectivities

As in MISTy4, spatial connectivity weights are calculated using families of radial basis kernels: \({w}_{ij}={{\mathrm{e}}}^{-\frac{{d}_{ij}^{2}}{{l}^{2}}}\), Gaussian \({w}_{ij}={{\mathrm{e}}}^{-\frac{{d}_{ij}^{2}}{2{l}^{2}}}\), linear \({w}_{ij}=1-\frac{{d}_{ij}}{l}\) and exponential kernels \({w}_{ij}={{\mathrm{e}}}^{-\frac{{d}_{ij}}{l}}\); where w is a weight matrix \({w}_{ij}\in [0,1]\) of shape n × n; dij is the Euclidean distance between cells or spots i and j; and l is a parameter controlling the length or bandwidth. We additionally use a cut-off parameter below which spatial connectivities are set to zero.

Throughout the manuscript, unless otherwise specified, we used Gaussian weights with a bandwidth of 150 and a cut-off of 0.1, and the diagonal (self to self) was set to 1 for the local scores. For multi-view learning, the diagonal was set to 0 (by default) to avoid self-referencing. For both the product and normalized product, we additionally apply L1 normalization to the weights. This adjustment accounts for the variability in the number of neighbours across different spots, implicitly accounted for by the remainder of the local metrics.

When working with multi-modal spatial technologies, the different modalities of which have observations with distinct locations, spatial connectivity weights are estimated according to a reference coordinate system. We detail this process in Supplementary Note 9.

Ligand–receptor pathway enrichment

To perform ligand–receptor pathway enrichment, we first convert gene set (pathway) resources, represented as weighted bipartite graphs where each gene belongs to a gene set, into ligand–receptor sets. Specifically, we assign a weight to each ligand–receptor interaction on the basis of the mean weight of the ligands and receptors involved in the interaction, also taking into account the presence of heteromeric subunits. Moreover, we assign a given ligand–receptor interaction to a specific gene set (or pathway) only if all members of the interaction are part of the gene set, and in the case of weighted resources if all members are additionally sign-consistent. Finally, once a ligand–receptor resource is generated, we use decoupler-py to perform enrichment with univariate linear regression81. In this manuscript, we used the PROGENy resource50 to assign pathway annotations to ligand–receptor interactions. In contrast to classic pathway gene sets, PROGENy contains consensually regulated targets of pathway perturbations, not genes thought to be members of the pathways. However, this resource-conversion procedure is applicable to any resource, including undirected resources, such as Gene Ontology terms for which all members of a gene set will be assigned a weight of 1.

Hypothesis testing for deregulated CCC across conditions

To enable hypothesis testing for CCC, similarly to the strategy implemented in MultiNicheNet16, we first generate pseudobulk profiles by summing raw expression counts for each sample and cell type with the decoupler-py package81. After filtering low-quality genes (for example, considering minimum expression in terms of total counts and samples in which the gene is expressed), we perform differential analysis for each cell type independently with DESeq2 (ref. 47), as implemented in PyDESeq2 (ref. 48).

Once feature statistics per cell type are generated, we transform those into a data frame of interaction statistics by joining them to a ligand–receptor resource, while additionally calculating average feature expression and expression proportions per cell type on the basis of a user-provided AnnData object78. Similarly to any other method in LIANA+, interactions expressed in less than 10% (by default) of the cells per cell type are filtered, considering the individual members of heteromeric complexes. A detailed tutorial is available at: https://liana-py.readthedocs.io/en/latest/notebooks/targeted.html.

Sign-consistent intracellular networks

By combining CCC predictions with prior knowledge networks of intracellular signalling, here we uncover putative causal networks linking CCC events to TFs. To accomplish this, we used CORNETO79—a Python package that unifies network inference problems from prior knowledge—to implement a modified version of the integer linear programming formulation implemented in CARNIVAL60, described in Supplementary Note 10.

We show the inference of sign-consistent networks downstream of deregulated CCC events, identified using differential expression analysis with PyDESeq248 at https://liana-py.readthedocs.io/en/latest/notebooks/targeted.html.

NMF on ligand–receptor local scores

A utility function was implemented that takes an AnnData object78 as input and uses Scikit-learn’s84 NMF implementation to decompose the input matrix into two matrices of dimensions k, n and k, d; where d is the number of features, n is the number of observations (cells) and k is the number of components (factors). To estimate k, we additionally provide a heuristic elbow selection procedure, in which the optimal component number (k) is chosen from a sequential range of components using elbow selection as implemented in the kneedle package85. The selection of the optimal k is based on the mean absolute reconstruction error.

Identifying intercellular programmes across samples

Inspired by the CCC factorization approach proposed in Tensor-cell2cell20 and building on our recent application of multi-view factor analysis43 to dissociated cross-condition atlases86, we use the ligand–receptor inference methods from LIANA+ to infer interactions across each sample independently and then transform this into a multi-view structure of cell-type pairs (views), each represented by samples and the ligand–receptor interaction scores. To build the multi-view structure, we use the MuData format77, and only views with at least 20 (by default) interactions in at least three (by default) samples are kept. Moreover, we exclude samples if they have less than ten interactions (by default), and interactions are considered only if they are present in at least 50% of the samples (by default). Then we use the Bayesian multi-view factor analysis approach, implemented within the MOFA+ statistical framework43, to decompose ligand–receptor scores across samples into intercellular communication programmes. Specifically, the outputs of this factorization include a number of factors, each with (1) factor scores per sample, (2) ligand–receptor interaction factor loadings for each cell-type pair (view) and (3) variance explained per view (cell-type pair) for each factor. Tutorials on extracting intercellular programmes from single-cell dissociated data with multi-view factorization and Tensor-cell2cell are available at https://liana-py.readthedocs.io/en/latest/notebooks/mofatalk.html; https://liana-py.readthedocs.io/en/latest/notebooks/liana_c2c.html.

Spot calling evaluation using local metrics

To benchmark how well each local score in LIANA+ preserves biological information, we devised spot classification and regression tasks. In the spot classification task, we used four public breast cancer 10X Visium slides76, with annotations labelled as malignant (containing ‘cancer’ in their annotation) or non-malignant spots (any other spot). For each slide, we calculated ligand–receptor scores using the local metrics in LIANA+. Then for each local metric, we trained and evaluated random forest classifiers, with 100 estimators, using a stratified K-fold cross-validation strategy (k = 10). Area under the receiver operating characteristic curve (AUROC) and weighted F1 were calculated on the test sets, and their average across the folds was used in visualizations.

In the regression task, we used a public dataset with 28 10X Visium slides from left-ventricle heart tissues to compare how well different local metrics capture cell-type specific ligand–receptor events. In particular, we checked how well the local scores LIANA+ predict cell-type proportions per spot, inferred using cell2location42 as done in Kuppe et al.27. We used a random forest regressor, with 100 estimators, utilising a K-fold cross-validation training strategy (k = 5), and calculated the variance explained (R2) and root mean squared error for each score. All classification and regression tasks were performed using Scikit-learn (v.1.3.2).

For the inference of ligand–receptor interactions throughout this work, we used LIANA’s consensus resource—a resource combining the curated ligand–receptor resources in OmniPath22.

Spatial co-localization evaluation

To evaluate the agreement of spatially uniformed ligand–receptor methods in LIANA+ with spatially proximal ligand–receptor and cell-type pairs, we used five processed and recently published spatially informed single nucleus RNA sequencing (slide-tags) datasets41. Making use of spatial information for each dataset, we estimated global Moran’s R using LIANA+ to generate an indirect ground truth—that is, ligand–receptors and cell types co-localized to a greater extend than random (Moran’s R >0 and FDR <0.05). Then we ran all ligand–receptor methods in LIANA+ without taking spatial information into account. AUROC was calculated for the whole distribution of each ligand–receptor method’s scoring functions. To calculate balanced accuracy and normalized F1 (below), we used false positive filtering thresholds as suggested by each of the methods’ authors (if available). For CellPhoneDB, CellChat and Geometric mean, interactions with P values below 0.05 were filtered. For CellChat’s ligand–receptor (LR) probabilities, we additionally only kept interactions for which either the ligand or receptor P values were under 0.05. Similarly, for Connectome and log2 fold changes (log2FC), interactions were kept only if both ligand and receptor P values were under 0.05 and had a positive scaled weight or log-fold change, respectively. For SingleCellSignalR, we considered ligand–receptor interactions with LR scores above 0.6. For LIANA’s rank aggregate, we kept only those with a magnitude rank <0.05. Since the authors of NATMI and scSeqComm do not suggest a threshold, we kept interactions if they were within the top 5% of the specificity weight and interscore distributions, respectively. Similarly, when evaluating the individual scoring functions from each method, we considered only the top 5% as positive predictions.

$${\mathrm{Normalized}}\;{\mathrm{F}}{1}=\frac{1}{2}\left(\frac{{\mathrm{F}}{1}_{{\rm{observed}}}}{{\mathrm{F}}{1}_{{\rm{permuted}}}}\right)$$

where ‘observed’ is the F1 score for the actual ligand–receptor predictions, while the ‘permuted’ F1 was generated by shuffling the predictions of each method (100 times) and calculating an F1 score.

$${\rm{Balanced}}\,{\rm{accuracy}}={\frac{1}{2}\left(\frac{{\mathrm{TP}}}{{\mathrm{TP}}+{\mathrm{FN}}}+\frac{{\mathrm{TN}}}{{\mathrm{TN}}+{\mathrm{FP}}}\right)}$$

where TP is for true positives, TN is for true negatives, FP is for false positives and FN is for false negatives.

Sample label classification

For the condition classification task, building on a similar approach20, we used public, pre-processed, cross-conditional atlases (Supplementary Table 4), each selected such that they include more than five samples per condition following pre-processing. To ensure that only high-quality samples were used in each of the atlases, we removed any samples with less than 1,000 cells or z-transformed total counts above or below a z-score of 3 and −2, respectively. In the Carraro et al. dataset87, we kept samples with more than 700 cells. Moreover, only cell types found in at least five samples and with at least 20 cells in each sample were considered. To ensure that the samples were balanced between the conditions if either condition had a sample ratio higher than 1.5 times the number of samples in the other condition, the overrepresented condition was subsampled to the number of samples in the underrepresented one. Each dataset was normalized to 10,000 total counts per cell and log1p-transformed.

Subsequent to pre-processing, we inferred ligand–receptor interactions at the cell-type level using the re-implemented (and homogenized) methods in LIANA+, independently for each sample. Any interactions not expressed in at least 10% of the cells in both source and receiver cell types were filtered.

Then the output from LIANA+ was converted to the structures used by the factorization approaches employed by MOFA+ and Tensor-cell2cell—a multi-view object and a four-dimensional tensor, respectively. To run both factorization approaches, we considered interactions only if they were present in 33% of the samples, and any interactions missing in a sample were assumed to be biologically meaningful and assigned as zero. For all datasets, we decomposed the CCC events into 10 factors, except Reichart et al.88, which was decomposed into 20 factors due to its larger sample size. Using the factor scores from each method–factorization combination we then performed a classification task, similar to the one from Armingol et al.20. Specifically, a random forest classifier, with 100 estimators, was trained and evaluated on the sample factor scores computed for each score-factorization combination, utilizing a stratified K-fold cross-validation strategy (k = 3), performed over five seeds. Then the mean AUROC and weighted F1 scores were calculated on the testing set’s probabilities and label predictions, respectively.

Prediction reliability in public heart failure atlases

To quantify the reliability of predictions from LIANA+, we evaluated the ability of different predictions to separate heart-failure versus non-heart-failure myeloid samples using six publicly available heart-failure datasets (Supplementary Table 5). Specifically, we used CCC predictions across different stages of the analysis of human myocardial infarction data27 (Fig. 5), essentially progressively reducing the number of features (genes) to only those that were predicted as most relevant (Extended Data Fig. 7). Across each stage, predictions were chosen such that the resulting unique features (that is, ligand or receptor genes) were roughly equal to 25 (±1). For stages 1 and I (Extended Data Fig. 7a), we used the lowest CellPhoneDB P values for MY and the highest Global Moran’s R calculated across all samples from the single-cell and spatial data, respectively. Then we generated myeloid pseudobulks from each sample and using the reduced features from each stage we calculated adjusted Rand index and silhouette scores with respect to the condition labels of the samples. To calculate the adjusted Rand index we used k-means clustering on the z-transformed normalized counts with k = 2. To estimate z-scores and P values for the metrics across datasets, we generated null distributions using 1,000 randomly chosen sets of genes with the same size as the actual predictions.

Analysis of murine Parkinson’s disease model

We obtained a pre-processed dataset of three murine brain sections following 6-hydroxydopamine perturbation in one hemisphere26, with joint metabolite and transcriptome measurements generated with matrix-assisted laser desorption/ionization–mass spectrometry imaging and 10X Visium technologies, respectively. We processed the metabolite and count matrices for each slide separately, applying standard log1p-normalization and standard quality control measures to the gene expression data. For the metabolite intensities, we used total count normalization followed by z-transformation. The observations of two modalities were manually aligned following the identification of tissue-containing observations for the metabolome data, using a procedure similar to the original publication26. We inferred cell-type proportions using Tangram’s cell cluster level approach35, fit with 1,000 epochs and a learning rate of 0.1. As a reference for deconvolution, we used an annotated single-cell dataset from Zeisel et al.36. Specifically, we used the ‘TaxonomyRank4’ cell group label, along with subgroups for dopaminergic and MSNs, which resulted in 48 refined murine brain cell types. Following the alignment of the two modalities, we modelled metabolite peaks (intraview) with cell-type proportions and brain-specific receptors as predictors (extra views). We bypassed modelling the intraview—that is, we did not model each metabolite peak by the remainder of the peaks (as done by default), since we were interested solely in the predictive performance of the extra views (receptors and cell types). We focused on the intersection of the top 250 highly variable metabolite peaks (targets) across the three slides and excluded any predictors with little-to-no variation—that is, genes not within the top 12,500 highly variable genes and cell types with a coefficient of variation below the 20th percentile. Brain-specific receptors were obtained from MetalinksDB68, customized to include only metabolites found in the brain or cerebrospinal fluid. After the pre-processing steps, using our multi-view modelling procedure, we analysed 83 metabolite peaks as targets, along with 45 receptors and 37 cell types as predictors. Finally, we used spatially weighted cosine similarity on the z-transformed matrices of each modality to estimate the local scores and their corresponding P values and categories. For all analyses, we used a bandwidth of 1,000 with a cut-off of 0.1 to calculate the spatial connectivities. A tutorial on spatially resolved multi-omics data integration is shown at https://liana-py.readthedocs.io/en/latest/notebooks/sma.html.

Analysis of human myocardial infarction data

Following basic filtering and standard log1p-normalization, we estimated ligand–receptor local scores using spatially weighted cosine similarity on each of the 28 processed 10X Visium transcriptomics slides27. We considered interactions whose members were expressed in at least 10% of the spots. Then we concatenated the resulting ligand–receptor AnnData objects (slides) and kept only those interactions present in at least ten of the slides. Subsequently, we decomposed the concatenated object with NMF. We used an elbow selection procedure to automatically determine five factors as the optimal number. Then, we used Fisher’s exact test to examine whether specific condition labels were enriched when considering samples with average factor scores above the 75th quantile. Pathway activities of ligand–receptor interaction loadings were calculated using linear regression81 and sets of ligand–receptor pathways, annotated using the PROGENy pathway resource50 with all genes. We used a pre-processed dataset of 29 single-nucleotide samples from the same publication89. Raw gene counts were normalized to 10,000 total counts per cell and log1p-transformed. We inferred ligand–receptor interactions per sample using LIANA’s magnitude rank aggregate—a consensus of multiple magnitude-focused scores (Supplementary Table 3), considering only interactions with all members expressed in at least 5% of the cells per cell type. We further inferred ligand–receptor interactions only if they were deemed as condition relevant in the spatial analysis—that is, those with at least one standard deviation above the mean per NMF factor. Moreover, we considered cell-type pairs to interact only if they had a strong spatial relationship across all 10X Visium slides (target R2 > 0.05; median t-value >1.645), as determined when the modelling of each cell-type proportion (inferred with cell2location42) by the remainder of the cell types. Then we decomposed the obtained ligand–receptor interactions from the dissociated data with multi-view factorization43, considering interactions with at least 15 interactions in 20% of the samples, and views with at least 10 samples. Any missing interaction values were filled with zeroes.

For hypothesis testing, we generated pseudobulk profiles for cell type using decoupler-py81, considering only genes with at least 10 counts across each of the samples or at least 20 counts in total, a large n of 5, and expressed in at least 10% of the samples81. Then within each profile, we performed differential expression analysis with PyDeSeq247,48, contrasting ischaemic samples versus the rest—that is, we treated fibrotic and myogenic samples as baseline references. The output statistics were then converted into a data frame of ligand–receptor differential statistics using LIANA+, keeping only interactions all members of which (including complex subunits) were expressed in at least 5% of the cells in source and target cell types.

Then we estimated TF activities using the Wald statistics from PyDeSeq2 with univariate linear regression81 and CollecTRI80. For the inference of the downstream signalling events, we obtained OmniPath’s protein–protein interaction network, considering interactions with consensus direction and a curation effort >3. Then using CORNETO (v0.9.1-alpha.5), we inferred the plausible causal networks propagating via the interaction between FN1/SPP1 and the ITGA5 and ITGB1 complex, down to all TFs identified as significantly deregulated (FDR <0.05) in MY. We used a gene expression proportion cut-off of 0.1, such that nodes above the cut-off were assigned a penalty of 1, and those below a penalty of 0.01. An edge penalty of 0.02 was also used, and to ensure the solution space was thoroughly explored, the problem was solved 100 times, each time introducing small amounts of uniform noise. We used the Gurobi90 solver under an academic licence. Then the acyclic subnetworks obtained by each iteration were concatenated, such that edges from any solution were kept, and the network union was visualized with CytoScape91.

Statistics and reproducibility

The details for the pre-processing of the datasets used for the analyses are provided in the sections above. If not indicated otherwise in that section or the legends, no data were excluded from training and analysis. Similarly, unless stated otherwise, we used the default parameters for the methods within LIANA+ and any external method that we used. The details of statistical tests used throughout this study were provided in the corresponding method sections and the figure legends.

Protocol

A step-by-step protocol for installing the software and an example application can be found on Nature Protocol Exchange92.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.