## Main

Cells interact on multiple length-scales through direct contact of surface-bound receptors and ligands, tight junctions and mechanical effects, and through indirect mechanisms, including soluble factors. Usually, these communication events cannot be directly observed but are critical to understand emergent phenomena in tissue niches1. Molecular signatures of sender and receiver cell types are used to infer latent cell communication events in a tissue through co-occurrence of ligand and receptor expression across putatively communicating cell types2,3 and through gene expression signatures in the receiving cell2,4. Here we propose node-centric expression models (NCEM) to improve cell communication inference through the use of spatial graphs of cells to constrain axes of cellular communication. We infer cell communication from image-structured molecular profiling assays of RNA or proteins with subcellular resolution (Fig. 1a). We defined an NCEM as a graph neural network that predicts a cell’s observed gene expression vector from its cell type label and its niche5 (Fig. 1b and Methods). Cell–cell dependencies may be caused by diverse molecular mechanisms not limited to ligand–receptor-based communication. Therefore, we consider the effects of niche composition on all genes in an unbiased manner. Previous mathematical models of cell–cell interactions in spatial data differed in at least one out of the following central design choices that constitute an NCEM (Supplementary Table 1): they did not represent statistical dependencies of gene expression6,7, did not model cell communication events6,7,8, did not work on targeted protocols with limited ligand and receptor gene capture8,9,10 or relied on leave-one-gene hold-outs4,9, which can result in false discoveries of dependencies (Extended Data Fig. 1 and Methods).

We demonstrate cell communication inference with NCEMs on six datasets measured with MERFISH11,12, CODEX13, MIBI-TOF14, MELC15 and chip cytometry16 (Extended Data Fig. 2 and Methods). On average, intracell-type variance accounted for 40.6% of the total variance (Supplementary Fig. 1 and Methods). We defined the screened neighborhood sizes such that they cover the range of average node degrees of the given dataset (Extended Data Fig. 2b). We fit a linear model of gene expression based on a niche represented as interaction terms between the receiver cell type and the presence of each (sender) cell type in the neighborhood (Methods). Linear NCEMs were most predictive on an intermediate length scale of 69 µm across the six datasets (Fig. 1c), showing that cell–cell dependencies appear on length scales characteristic of molecular mechanisms of cell communication. NCEMs outperformed nonspatial baseline models consistently by an average ΔR2 (Online Methods) of 0.016 (Fig. 1c). As expected, the ΔR2 is small compared to the baseline model R2 that characterizes between-cell-type variance (0.39–0.79) because cell type identity accounts for a large fraction of variance in single-cell gene expression assays17. The inferred length scales were robust to data downsampling (Extended Data Fig. 3a), out-of-domain data from an unseen genetic knockout condition (Extended Data Fig. 3b–e), to simulated segmentation errors (Extended Data Fig. 3f,g) and to removal of the interaction terms from the linear model (Supplementary Fig. 2). The spatial effect on model performance varies between target cell types, suggesting that cell-type-specific molecular mediators of cell–cell dependency are captured (Supplementary Fig. 3).

NCEMs can be extended to spot transcriptomics if within-cell-type variation can be recovered from spot transcriptomics datasets in deconvolution analyses18,19. Here NCEMs model the expression variation within cell types across spots as a function of the inferred spot composition (Fig. 1d and Methods). We considered data from lymph nodes18,19 (Extended Data Fig. 4a,b) for which a deconvolution was previously demonstrated with cell2location19. Linear NCEMs were substantially better at predicting gene expression states of cell types in particular spots than a nonspatial baseline model, both globally (Fig. 1e) and for each cell type (Fig. 1f). The inferred couplings were stable to moderate subsampling of the transcriptomics spots in the training data (Extended Data Fig. 4d). We found putatively interacting ligand–receptor pairs for almost all type couplings in CellPhoneDB3 and NicheNet2 analyses of matched single-cell RNA sequencing (scRNA-seq) data, thus demonstrating the need for a quantitative description of statistical couplings in niches (Extended Data Fig. 4e–g). We also identified spatial dependencies between entire niches when modeling spot graphs (Supplementary Fig. 4).

Next we interpreted the spatial dependencies in the MERFISH brain data. We found that L2/3 intratelencephalic (IT) cells depend on the presence of Sncg expressing cells, vascular leptomeningeal cells, and L4/5 cells in their niche (Extended Data Fig. 5 and Supplementary Fig. 5). These associations are reproduced by CellPhoneDB (Supplementary Fig. 6a,b). The L2/3 IT cell subclusters are spatially localized in distinct areas of the primary motor cortex12. Indeed, the relative performance of NCEM is spatially structured (Extended Data Fig. 5f and Supplementary Fig. 5e). We quantified these dependencies between cell types as ‘cell type couplings’, the number of significant gene-wise coefficients of the cell type pair in an NCEM fit (Extended Data Fig. 5g and Methods). We discovered a dependency of CD8 T cells on multiple other cell types in the chip cytometry colon data (Extended Data Fig. 6) and a dependency of CD8 T cells on proximity to the tumor–immune boundary14 in colorectal cancer (Extended Data Fig. 7).

Similarly, we interpreted NCEM fits on the deconvoluted Visium lymph node data. We identified a bidirectional dependency of B cells and follicular dendritic cells (FDCs) that is indicative of positive feedback between both cell types in germinal center organization20 (Fig. 2a,b and Extended Data Fig. 4c). Similarly, we found evidence for a dependency of mast cells on B cells (Fig. 2b) and a mast cell subcluster associated with niches enriched in B cells (Fig. 2a). The FDC subcluster associated with niches enriched in B cells (cluster 3) showed increased expression of Cxcl13, a key chemokine for germinal centers20 (Extended Data Fig. 4c), supporting the association of these couplings with germinal centers. We further dissected these couplings based on the gene-wise effects of all senders on one receiver type (‘receiver effect analysis’, Fig. 2c) and of one sender on all receivers (‘sender effect analysis’, Fig. 2d), which contextualizes differential expression results of the FDC–B cell axis (Fig. 2e and Supplementary Data 1). Multiple T cell clusters had a similar effect on B cells in a ‘sender similarity analysis’ (Fig. 2f), in which we correlated the coefficient vectors of sender cell types that correspond to B cell receivers, which demonstrates conservation of cell type identity in the sender profile.

In contrast to linear NCEMs, nonlinear NCEMs can account for weighted or higher-order interactions between cell types (Extended Data Fig. 8a). As for linear NCEMs, we found resolution-dependent prediction performance profiles in nonlinear NCEMs (Extended Data Fig. 8b and Methods) and a dependency between L4/5 IT and L2/3 IT cells (Extended Data Fig. 8c and Methods). Notably, the nonlinear models did not outperform linear models in gene expression prediction, which suggests that the spatial dependencies in the given datasets are well described by linear models (Extended Data Fig. 8b). Next, we considered a conditional variational autoencoder (CVAE) version of an NCEM to model cell-intrinsic latent states. We conditioned the distribution over node expression states on a graph embedding of the niche and the cell type (Extended Data Fig. 9a). Even though CVAE–NCEM attained much higher predictive performance in reconstruction tasks (Extended Data Fig. 9b and Supplementary Fig. 7a), these models did not consistently capture spatial dependencies because niche states were represented in latent variables (Extended Data Fig. 9c–f and Supplementary Fig. 7b–e).

The interpretation of spatial dependencies inferred on targeted spatial molecular profiling assays is constrained by the limited capture of ligand–receptor pairs. We imputed the cell-wise gene expression in the MERFISH fetal liver data using corresponding scRNA-seq data21 (Extended Data Fig. 10). We designed a graph kernel of cell-wise receptor activity to directly model ligand–receptor interactions between neighboring cells (Fig. 2g). The peak predictive performance of the ligand–receptor nonlinear NCEM was much higher compared to the nonlinear NCEM (R2 of 0.799 and 0.947), demonstrating the increased complexity of the input compared to categorical cell type labels. We observed differential receptor signaling as differential latent neuron activation of Kit11 in sinusoidal endothelial cells (SECs) depending on their proximity to arterial endothelial cells (AECs) (Supplementary Data 2).

NCEMs are linear and nonlinear graph neural networks and CVAEs that model cell communication events in spatial omics assays (Supplementary Table 2). We identified statistical dependencies between cells on physiologically relevant length scales and interpreted fits based on model parameters. The statistical identifiability of cell type couplings may improve with increased capture of niche heterogeneity, through the inclusion of three-dimensional data, by increasing the number of cells measured and by increasing the variation in the training data through perturbations of niche structure. Uncertainty in segmentation of cells or nuclei can be improved on the level of the spatial measurement22 or may be addressed in model extensions. We found that linear NCEMs perform well in the presented tasks and are promising candidates for cell communication inference. The complexity of the graph neural network used in the NCEM defines the complexity of the motifs of cell communication that can be discovered and may be expanded given more complex datasets. CVAE–NCEMs may be used to model cell-intrinsic variation together with niche effects. Similarly, the graph kernels tailored to ligand–receptor signaling presented here provide constrained latent variables that explain extrinsic variation and could be used together with variables for cell-intrinsic variation. Nonlinear NCEMs learn a cellular representation within the spatial graph23, and we demonstrated that these representations can model niches and may be exploited for unsupervised analysis of tissue structures.

## Methods

### Data

#### Fetal liver (MERFISH)

Lu et al.11 measured wild type (WT) and Tet2−/− fetal livers with MERFISH in 140 (WT) and 195 (Tet2−/−) images across four WT fetal livers at E14.5 and two Tet2−/− knockout fetal livers at E14.5, with 132 genes observed in 40,864 (WT) and 54,970 (Tet2−/−) cells. We used cell types as originally annotated by Lu et al.: AEC, erythroid cell, erythroid progenitor, hepatocyte, megakaryocyte, macrophage, myeloid and SEC. We removed cells with an unknown label from the dataset. We scaled model outputs by the node size in the respective output layer of each model class to mitigate count noise (Supplementary Fig. 8f).

#### Brain (MERFISH)

Zhang et al.12 measured mouse primary motor cortex with MERFISH in 64 images across two mice, with 254 genes observed in 284,098 cells. We used the cell types originally annotated by Zhang et al.: astrocytes, endothelial, L2/3 IT neurons, L4/5 IT, L5/6 near-projecting neurons, L5 IT, L5 pyramidal tract neurons, L6 cortico-thalamic projection neurons, L6 IT, L6 IT Car3, L6b, Lamp5, microglia, oligodendrocyte precursor cells, oligodendrocytes, perivascular macrophages, pericytes, parvalbumin, smooth muscle cells, Sncg, somatostatin (Sst), Sst Chodl, vascular leptomeningeal cells, vasoactive intestinal polypeptide, and other cells, where L identifies the layer (L1–L6) of the distinctive laminar structure based on cytoarchitectural features (Extended Data Fig. 2a). Parvalbumin, Sst, vasoactive intestinal polypeptide, Sncg and Lamp5 define five subclasses of GABAergic cells. We removed cells labeled as ‘other’ from the dataset. We used an identifier for the respective mouse as domain information.

#### Colon (chip cytometry)

Jarosch et al.16 measured an inflamed colon with chip cytometry in two images from one patient, with 19 genes observed in 11,321 cells. We used the cell types originally annotated by Jarosch et al.: B cells, CD4 T cells, CD8 T cells, GATA3+ epithelial, Ki67 high epithelial, Ki67 low epithelial, lamina propria cells, macrophages, monocytes, PD-L1+ cells, intraepithelial lymphocytes, muscular cells and other lymphocytes (Extended Data Fig. 2a). We coarsened the cell type annotation by combining Ki67 high epithelial and Ki67 low epithelial to a joined annotation of Ki67 epithelial. We log-transformed the gene expression values for use in the analyses presented here to mitigate count noise (Supplementary Fig. 8b).

#### Cancer (MIBI-TOF)

Hartmann et al.14 measured colorectal carcinoma and healthy adjacent tissue with MIBI-TOF in 58 images across four individuals, with 36 genes observed in 63,747 cells. We used the cell types originally annotated by Hartmann et al.: endothelial, epithelial, fibroblast, CD11c myeloid, CD68 myeloid, CD4 T cells, CD8 T cells and other immune cells (Extended Data Fig. 2a). The cohort in this dataset includes two patients with colorectal carcinoma and two healthy controls. We scaled the model outputs by cell-wise size factors.

#### Tonsils (MELC)

Pascual-Reguant et al.15 measured tonsils from patients undergoing tonsillectomy with multiepitope ligand cartography (MELC), an immunohistochemistry approach, in one image across one patient, with 51 genes observed in 9,512 cells. We used the cell types originally annotated by Pascual-Reguant et al.: B cells, endothelial cells, innate lymphoid cell (ILC), monocytes/macrophages/dendritic cells (DC), natural killer (NK) cells, plasma cells, T cytotoxic cells, T helper cells (Extended Data Fig. 2a). We removed cells labeled as ‘other from the dataset.

#### Cancer (CODEX)

Schürch et al.13 measured advanced-stage colorectal cancer with CODEX in 140 images across 35 patients, with 57 genes observed in 272,266 cells. We used the cell types originally annotated by Schürch et al.: B cells, CD11b+ monocytes, CD11c+ dendritic cells, CD11b+ CD68+ macrophages, CD163+ macrophages, CD68+ macrophages, CD68+ macrophages GzmB+, CD68+ CD163+ macrophages, CD3+ T cells, CD4+ T cells, CD4+ T cells CD45RO+, CD4+ T cells GATA3+, CD8+ T cells, NK cells, T regs, adipocytes, dirt, granulocytes, immune cells, immune cells/vasculature, lymphatics, nerves, plasma cells, smooth muscle, stroma, tumor cells, tumor cells/immune cells and vasculature (Extended Data Fig. 2a). We removed cells with an annotation of dirt or an undefined label from the dataset. We merged the macrophage subclusters (CD11b+ CD68+ macrophages, CD163+ macrophages, CD68+ macrophages, CD68+ macrophages GzmB+ and CD68+ CD163+ macrophages) and the CD4+ T cell subclusters (CD4+ T cells, CD4+ T cells CD45RO+ and CD4+ T cells GATA3+). We scaled model outputs by the node size in the respective output layer of each model class to mitigate count noise (Supplementary Fig. 8e).

#### Lymph node (Visium)

We performed deconvolution with cell2location19 on a 10x Visium lymph node dataset based on a scRNA-seq dataset from the same tissue as previously described19. The cell type labels used for deconvolution were B cells, DC, endothelial, follicular dendritic cells, ILC, macrophage, mast, monocytes, NK, natural killer T cells (NKT), CD4 T cells, CD8 T cells, T cells (TIM3), T follicular regulatory cells fr, T regulatory cells reg) and vascular smooth muscle cells.

#### Dataset partitions

We randomly selected 10% of all nodes across all images and patients as the test set. From the remaining nodes, 10% of all nodes are selected as the validation set.

### MERFISH-scRNA-seq integration

We integrated scRNA-seq with MERFISH data to impute the full gene expression in the spatial graph of cells measured in MERFISH. We performed this integration between the MERFISH fetal liver (WT) data and scRNA-seq of E14.5 whole fetal liver cells sequenced by 10x Genomics platform11, which is available as GSE172127 on GEO. The scRNA-seq dataset contains 9,448 cells across 28,692 features. We performed quality control and removed cells with fewer than 500 detected genes, genes expressed in less than three cells and cells with more than 10% of the transcripts coming from mitochondrial genes from the dataset. We applied Tangram21 to generate a spatially resolved representation of the scRNA-seq fetal liver dataset21. We used 131 out of 132 genes from the MERFISH fetal liver (WT) data, which were both present in the MERFISH and the scRNA-seq dataset, to perform the mapping.

### Variance decomposition into inter- and intra-cell-type variance

The variance of a single-cell resolved dataset can be decomposed into intercell-type variance, intracell-type variance and gene variance. The gene variance is independent of cell type definitions and can therefore be considered separately from intra- and inter-cell-type variance:

$$\begin{array}{l}\mathop {\sum}\limits_i^N {\mathop {\sum}\limits_j^J {\left( {y_{i,j} - {\bar {\text {y}}}} \right)^2} }\\ = \underbrace {\mathop {\sum}\nolimits_i^N {\mathop {\sum}\nolimits_j^J {\left( {y_{i,j} - \bar y_{k(i),j}} \right)^2} } }_{{\mathrm{intracell-}}\,{\mathrm{type}}\,{\mathrm{variance}}} + \underbrace {\mathop {\sum}\nolimits_i^N {\mathop {\sum}\nolimits_j^J {\left( {\bar y_{k(i),j} - \bar y_j} \right)^2} } }_{{\mathrm{intercell-}}\,{\mathrm{type}}\,{\mathrm{variance}}} + \underbrace {\mathop {\sum}\nolimits_i^N {\mathop {\sum}\nolimits_j^J {\left( {\bar y_j - {\bar {\text{y}}}} \right)^2} } }_{{\mathrm{gene}}\,{\mathrm{variance}}}\end{array}$$
(1)

where yi,j is the gene expression of cell i out of N and gene j out of J, $$\bar x_{k,j}$$ is the mean expression of each gene j in each cell type k, k(i) is the cell type of cell i, $${\bar {\text{y}}}_j$$ is the mean expression of each gene j and $$\bar y$$ is the mean of the dataset.

### Simulations

#### Segmentation errors

We simulated segmentation errors in which the segment boundary between two neighboring cells is misplaced (Extended Data Fig. 3f,g). We selected a fraction of cells (10% or 50%) in the MERFISH fetal liver data at random, selecting one neighbor at random for each selected cell, and transferred a fraction of the total molecular abundance vector of the selected cell to its neighbor.

#### Spatial dependencies

We simulated single-cell resolved spatial data from scratch by using the cell graph from the chip cytometry colon data (Extended Data Fig. 2) and the simulated node-wise gene expression vectors. We modeled cell types using the number of genes originally defined in the dataset and drew a mean expression value for each gene from a uniform distribution between 0 and 10. We considered two scenarios of dependencies between cells: (1) a dataset without spatial dependencies in which all cells are drawn from one cell type and are independent and identically distributed and (2) a dataset with spatial dependencies in which cells belong to either one of two cell types, where we introduced dependencies on the presence of the respective other cell type in the neighborhood in 50% of the genes, with strong effect sizes drawn from a uniform distribution between 4 and 6. We fitted NCEMs, Misty9 and SVCA4 on both simulated datasets (Supplementary Fig. 2). The simulated images contained over 4,500 cells per image. To reduce the runtime for SVCA for these samples we cropped both images in the lower-right corner to create images with approximately 850 cells each.

### Models

The inputs to NCEMs are (1) a gene expression matrix $$Y \in R^{N \times J}$$ where N is the number of cells and J is the number of genes, (2) a matrix of observed cell types $$X^{\tt{l}} \in R^{N \times L}$$ where L is the number of unique cell type labels and (3) a matrix specifying the batch assignments $$X^{\tt{c}} \in R^{N \times C}$$ of C distinct batches or domains, such as images or patients. We denote the adjacency matrix of connected cells as $$A \in R^{N \times N}$$, which is calculated based on the spatial proximity of cells per image. For linear models and models with an indicator aggregator, we used a binary adjacency matrix Aij = 1 if $$d(x_a,x_b) \le \delta _{{\mathrm{max}}}$$ where $$d( \cdot , \cdot )$$ describes the euclidean distance between nodes a, bN in space and δmax is the neighborhood size (resolution), and Aij = 0 otherwise. For models using graph convolutions, we normalized A such that all rows sum to one: D−1A where D is the diagonal node degree matrix. The output of NCEMs is $$\hat{Y} \in R^{N \times J}$$, a reconstruction of the gene expression matrix. In selected datasets, we applied size factor scaling to the network output Y using the size factors $${{{\mathrm{sf}}}}_i = \frac{{\mathop {\sum}\nolimits_j^J {y_{ij}} }}{{\frac{1}{N}\mathop {\sum}\nolimits_{i^\prime }^N {\mathop {\sum}\nolimits_{j^\prime }^J {y_{i^\prime j^\prime }} } }}$$. The global data handling per dataset is reported in Supplementary Table 3, model hyperparameters for linear models are reported in Supplementary Table 4, and the parameters for nonlinear and CVAE models in Supplementary Table 5.

#### Loss functions

We use a Gaussian log-likelihood, ll, loss as an optimization objective for linear and nonlinear models with $${\mathrm{ll}}(y) = \frac{1}{{N * J}}\mathop {\sum}\limits_i^N {\mathop {\sum}\limits_j^J {( { - {\mathrm{log}}\left( {\sqrt {2\pi } \sigma _j} \right) - 0.5\frac{{\left( {y_{ij} - \hat{y}_{ij}} \right)^2}}{{\sigma _j^2}}} )} }$$ over cells i and genes j, where σj is the predicted standard deviation of a gene (Supplementary Fig. 8). The loss function of CVAE model is the negative data log-likelihood in addition to the Kullback–Leibler divergence between the variation posterior qϕ(z) and the prior p(z) on the latent variables: $${\mathrm{ll}}_{{{{\mathrm{CVAE}}}}} = - {\mathrm{ll}}(y) + D_{{{{\mathrm{KL}}}}}\left( {q_\phi (z)||p(z)} \right)$$.

#### Optimization

We ran grid searches to find the optimal set of hyperparameters for each dataset where the batch size is the number of images per dataset. We selected the number of nodes evaluated per image per batch to improve convergence. We trained all models with the Adam optimizer: linear models with 0.05, the remaining models with multiple learning rates of {0.5, 0.05, 0.005}. Additionally, we used a learning scheduler on the validation loss with a patience of 20 epochs, which reduces the learning rate by a factor of 0.5, so $${\mathrm{lr}}_{\mathrm{new}} = {\mathrm{lr}} \ast 0.5$$ and early stopping with a patience of 100 epochs. The exact description of all grid searches in code are supplied in the benchmarking repository (Code Availability). We trained linear models for hypothesis testing using ordinary least squares estimators on the full dataset.

#### Linear NCEM

The linear nonspatial baseline model infers a reconstruction Ŷ from a node’s cell type and respective domain information via  ﻿Ŷ= XDβ, where XD is the design matrix and $$\beta \in R^{(L + C) \times J}$$ are the parameters learned by the model. The design matrix of nonspatial baseline models is given by $$X^{\tt{D}} = (X^{\tt{l}},X^{\tt{c}}) \in R^{N \times (L + C)}$$. The spatial counterpart model, the linear NCEM, has access to an additional spatial sender–receiver interaction matrix. First, we computed the binary sender cell presence in the neighborhood of each cell $$X^{\tt{S}} = 1_{(AX^l > 0)} \in \{ 0,1\} ^{N \times L}$$, where 1(·) represents an indicator function. To generate a matrix representation of sender–receiver cell interactions, we compute the interaction terms between the cell type of the index cell and the presence of each cell type in its neighborhood as the outer product between Xl and XS. The resulting interaction matrix is $$X^{\tt{TS}} \in \{ 0,1\} ^{N \times L^2}$$, and the design matrix for the linear model with interaction terms is given by $$X^{\tt{D}} = (X^{\tt{l}},X^{\tt{TS}},X^{\tt{c}}) \in R^{N \times (L + L^2 + C)}$$. This design matrix can be related to a graph neural network: Xl and Xc are node-wise condition vectors that can be appended to a local graph embedding centered on an index cell, and XTS is equivalent to an outer product of the one-hot-encoded representation of an index cell with the projection obtained from a single-layer graph neural network that embeds one-encoded cell type feature vectors with a feature-wise max pooling operator across the neighborhood without the index cell. This projection is a cell-type-dimensional indicator for the presence of each cell type in the neighborhood. Linear NCEMs perform parameter inference on Ŷ = XDβ where $$\beta \in R^{(L + L^2 + C) \times J}$$. We also considered an NCEM without interaction terms which does not have receiver-specific sender effects but only global sender effects, which account for the presence of senders in the niche via $$X^{\tt{D}} = (X^{\tt{l}},X^{\tt{S}},X^{\tt{c}}) \in R^{N \times (L + L + C)}$$. We evaluated significance of coefficients corresponding to the interaction matrix XTS with a Wald test.

#### Linear NCEM for deconvoluted spot transcriptomics

The baseline model is the same as for the standard linear NCEM. The corresponding NCEM treats the spot as a neighborhood and uses the deconvoluted cell type abundances per spot $$X^{\tt{F}} \in R^{(N*L)xL}$$ as a vector-shaped neighborhood summary, replacing a kernel on a graph. Note that (N*L) is the number of spots times the number of cell types: this model treats every type- and spot-wise gene expression vector, a result of the deconvolution, as an observation. The overall design matrix of the linear model includes the interaction between the target cell type and the spot composition, and spot-wise covariates: $$X^{\tt{D}} \in R^{(N*L) \times (L + L^2 + C)}$$. Note that here, the spot composition is the same for all L gene expression prediction problems per spot. As for the linear NCEM, we again fit a linear model to this design matrix to predict deconvoluted gene expression. One can define a corresponding nonlinear model that uses the deconvoluted cell type abundances per spot $$X^{\tt{F}} \in R^{N \times L}$$ as a vector-shaped node feature space. Note that N is the number of spots in this nonlinear model. These feature vectors can be connected based on spot proximity in a graph embedding of spots $$f_{{{{\mathrm{enc}}}}}:q_\phi (z_s|g(A,X^{\tt{l}})_s)$$. The cell-type-wise gene expression decoder for spot s and cell type k is then $$f_{{{{\mathrm{dec}}}}}:p_\theta (Y_{sk}|z_s,X^{\tt{k}},X_s^{\tt{c}})$$, where Xk is a one-hot embedding of the cell type k.

#### Nonlinear NCEM

NCEMs include nonlinear models that encode the neighborhood through a graph neural network (NL-NCEM) and decode expression vectors. The corresponding nonspatial baseline model is a nonlinear model (NL) that predicts expression from cell type and graph-level predictors. A local graph embedding is given by $$f_{{{{\mathrm{enc}}}}}:q_\phi (z_i|X_i^l,g(A,X^{\tt{l}})_i,X_i^{\tt{c}})$$, which encodes the cell type labels Xl, some graph-level predictors Xc and the local graph embedding g(A, Xl), based on the adjacency matrix A, into a latent state z. The latent state of cell i is input to a fully connected layer stack given by $$f_{{{{\mathrm{dec}}}}}:p_\theta (Y_i|z_i,X_i^{\tt{l}},X_i^{\tt{c}})$$. If one uses an indicator embedding function as described in the section Linear NCEM and all hidden layers are removed from the NL-NCEM, a single linear transformation of the input remains, which is equivalent to the linear NCEM. Alternatively, g(A, Xl) can be a graph embedding learned by a graph-convolutional network (GCN)5. A one-layer GCN is given by $$g(A,X^l) = {\it{{\mathrm{softmax}}}}( {ReLU(\bar AX^lW)} )$$, where $$W \in R^{L \times H}$$ is a weight matrix, H is the dimension of the learned node representation and $$\bar A$$ is the normalized adjacency matrix.

#### Ligand–receptor NCEM (NL-NCEM-LR)

Here we consider a specific NL-NCEM with a tailored graph kernel. This graph kernel embeds each cell into a receptor dimensional latent space z based on the receptor gene expression on the index cell, ligand gene expression on neighboring cells, and the adjacency matrix A, which encodes the set of neighbors MN of cell i: $$f_{{{{\mathrm{enc}}}}}:z_{ik} = g(A_i,Y_{i,r(k)},Y_{:,l(k)}) = \mathop {\sum}\nolimits_m^M {f_R(Y_{i,r(k)}) * f_L\left( {Y_{m,r(k)}} \right)}$$. Here, r(k) and l(k) encode the gene index of receptor and ligand that correspond to ligandreceptor pair k. The latent state and graph-level predictors Xc are input to a fully connected layer stack that decodes gene expression $$f_{{{{\mathrm{dec}}}}}:p_\theta (Y_i|z_i,X^{\tt{c}})$$. The corresponding nonspatial baseline is a nonlinear model (NL) that receives receptor expression of the index cell as bottleneck activation and has the same decoder. This baseline model is not nested in the nonlinear ligand–receptor NCEM (NL-NCEM-LR) but models a baseline which imputes all genes’ expression based on ligand gene expression within the cell.

#### Conditional variational autoencoder NCEM

A conditional variational autoencoder NCEM (CVAE–NCEM) learns a distribution over node states Y based on a node-wise latent space z. The nonspatial CVAE null model contains the cell type and graph-level predictors as a condition in the variational posterior and the likelihood model. In CVAE–NCEM, the conditions are the cell type labels Xl, some graph-level predictors Xc and the local graph embedding g(A, Xl). The encoder is given by $$f_{{\it{{\mathrm{enc}}}}}:q_\phi ( {z_i|Y_i,X_i^{\tt{l}},g(A,X^{\tt{l}})_i,X_i^{\tt{c}}} )$$ and the decoder is defined by $$f_{{{{\mathrm{dec}}}}}:p_\theta ( {Y_i|z_i,X_i^{\tt{l}},g(A,X^{\tt{l}})_i,X_i^{\tt{c}}} )$$. A CVAE–NCEM for a full dataset depends on both the niche and the type of the cell itself. This setting presents the challenge of encountering a nonidentifiability between variance attributed to latent variables, cell type conditions and neighborhood context. In this study, we consider the CVAE–NCEM trained on the molecular vectors of a single target cell type as a function of the full neighborhood context to remove the nonidentifiability with respect to cell type variation and focus on the nonidentifiability between latent variables and neighborhoods.

### Model evaluation

We evaluated model performance using the coefficient of determination: $$R_i^2 = 1 - \frac{{\mathop {\sum}\nolimits_j^J {\left( {y_{ij} - \hat{y}_{ij}} \right)^2} }}{{\mathop {\sum}\nolimits_j^J {\left( {y_{ij} - \bar y_{ij}} \right)^2} }}$$ for cells i and over genes j. We selected the best performing models based on R2 on a validation dataset and showed this metric evaluated on test data in the manuscript. The performance of CVAEs is additionally assessed in style transfer tasks. In style transfer, the gene expression state and neighborhood of a reference node a from the source domain is encoded to estimate the latent states of this node. This latent representation is then decoded to the target domain of cell b, which implies conditioning the decoding on the target neighborhood:

$$z_a \sim q_\phi \left( {\bf{z}|Y_a,X_a^{\tt{l}},g(A,X^{\tt{l}})_a,X_a^{\tt{c}}} \right)$$
(2)
$$\hat{Y}_b = p_{\theta} \left( {\mathbf{{z}_a},{\mathbf{X}}_b^{\tt{l}},g({\mathbf{A}},{\mathbf{X}}^{\tt{l}})_b,{\mathbf{X}}_b^{\tt{c}}} \right)$$
(3)

where a, b are cell indices, qϕ is the amortized variational posterior and pθ is the decoder network. See also Conditional variational autoencoder NCEM for details on the notation.

### Unsupervised analysis

We used uniform manifold approximation and projection (UMAP) to embed the cells in two dimensions for visualization of high-dimensional data.

We computed the UMAP of B cell, FDC and mast cell substates (Fig. 2a) based on 50 principal components (PCs) and k = 500. We computed the UMAP of the scRNA-seq reference dataset of lymph nodes (Extended Data Fig. 4b) based on 50 PCs with k = 100. We computed the UMAP of the MERFISH brain data12 matrix (Extended Data Fig. 5a) based on the first 35 PCs and the k-nearest neighbor graph with k = 10. We computed the UMAP of L2/3 IT neurons in slice 153 (Extended Data Fig. 5a) and slice 162 (Supplementary Fig. 5a) of the MERFISH brain dataset based on the first 40 PCs with k = 40 and performed Louvain community detection using Scanpy17 to define stable L2/3 IT substates. We computed UMAPs of CD8 T cells in area 1 in the chip cytometry dataset (Extended Data Fig. 6) based on the gene expression matrix directly and k = 22, and UMAPs of CD8 T cells in image 1, 5, 8 and 16 of the MIBI-TOF cancer dataset (Extended Data Fig. 7) based on the gene expression directly and k = 60. We performed Louvain community detection of the latent space in CVAE and CVAE–NCEM IND models (Extended Data Fig. 9d,e and Supplementary Fig. 7c,d) based on the latent space using k = 80 for the MERFISH brain dataset and k = 250 for the chip cytometry colon dataset.

We performed cluster enrichment with Fisher’s exact test. Each contingency table is composed of two categorical variables. The first variable describes the binary assignment of cells to one L2/3 IT subcluster. The second variable describes the presence of a source cell type in their neighborhood. We performed Benjamini and Hochberg false discovery rate correction (FDR) of cluster enrichment P values. A similar approach was used for the cluster enrichment analysis of CD8 T cells in the chip cytometry colon and the MIBI-TOF cancer datasets.

### Type coupling analysis

We performed type coupling analysis, sender effect and receiver effect analysis based on a Wald test on the parameters estimates of linear NCEM obtained on the full dataset as ordinary least squares estimates. We performed FDR-correction of the resulting P values using the Benjamini-Hochberg correction method. The coupling measure between sender and receiver cells is the L1-norm of coefficients of significant coefficients that correspond to the specific receiver–sender interaction term in the linear model, or the number of differentially expressed genes. The sender effect and receiver effect analysis consists of the set of coefficients and their significance for a particular sender and receiver, respectively. The sender similarity analysis is a hierarchical clustering of the Pearson product-moment correlation coefficients of coefficient vectors of sender cell types for one defined receiver cell type.

### Differential receptor activity in NL-NCEM-LR

We used a t-test to obtain a ranking for highly differential receptor signaling in SEC depending on the presence of neighboring AEC. We used the neighborhood size that corresponded to the best performing resolution of the NL-NCEM-LR model.

### Subsampling robustness analysis

We randomly subsampled the spatial transcriptomics spots from the Visium lymph node data to 5%, 25%, 50% and 75% of all spots across three cross validations. We deconvoluted the resulting subsampled slide with cell2location and used this inferred spot composition as input to the NCEM type coupling analysis for spot-transcriptomic data. In order to assess the robustness with respect to identified putative dependencies, we computed the R2 between the inferred coefficient vectors over genes for each cell type pair between the fit to the complete data and the fit to the subsampled slide.

### CellPhoneDB and NicheNet

We inferred putatively communicating ligand–receptor pairs in lymph nodes using CellPhoneDB as implemented in squidpy7 on scRNA-seq data24 with n = 53,275 cells on the 10,000 most variable genes. We quantified sender–receiver interactions as the number of significant ligand–receptor pairs at an FDR-corrected P value of 0.05. Additionally, we considered the presence of nonzero expression of cognate ligand–receptor pairs (Extended Data Fig. 4e). We performed the CellPhoneDB analysis shown in Supplementary Fig. 6 based on n = 1,000 permutations. Additionally, we used randomly subsampled data for the analysis of MERFISH brain12 10% with n = 27,655, MIBI TOF cancer14 40% with n = 25,498 and CODEX cancer13 10% with n = 25,186.

We defined the 5,000 most variable genes per receiver cell type as target genes in a NicheNet analysis. For the following cell types, we limited the number of highly variable genes to the number given in brackets depending on the respective intracell-type heterogeneity: DC (500), endothelial (1,500), erythrocyte (250), HSC (1,000), macrophages (4,000), mast (1,000), monoctyes (2,000), myeloid (2,000), neutrophil (400), stromal cells (1,500) and T Treg (3,000). We defined all remaining genes as background genes for NicheNet. We selected the top-100-ranked ligands from NicheNet and thresholded the putative ligands to be expressed in at least 5% of all sender cells.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.