Introduction

The ability to measure gene expression within a spatial context, referred to as spatial transcriptomics (ST), encompasses a wide range of technologies, including assays based on the well-established fluorescence in situ hybridization (FISH)1,2,3 and groundbreaking in situ spatial barcoding3,4,5,6,7,8. Current ST techniques have the capacity for extensive multiplexing (i.e., hundreds to thousands of genes assayed in the same tissue) and generate an additional data modality representing the spatial position of the measured gene expression. The spatial information from ST experiments has allowed researchers to address questions about the tissue architecture of organs and diseases3,9,10,11. Of particular importance has been the use of ST to assess tissue heterogeneity in many cancerous tissues6,12,13,14,15,16,17,18,19,20,21, as well as infected tissues22. Spatial transcriptomics has also enabled a better understanding of cell-to-cell communication23,24,25 and the identification of potential druggable targets18,26,27.

One common step in ST analysis is the identification of genes that differentiate tissue domains within a sample (i.e., differentially expressed genes among tissue niches)28,29,30. Although detecting spatially variable genes without a priori definition of tissue domains (i.e., clusters) is becoming an increasingly popular choice, many studies identify differentially expressed genes among domains in ST data in a fashion analogous to that used among scRNA-seq cell clusters or cell populations. In those studies, once tissue niches have been identified in the ST samples (via Louvain clustering, for example), researchers often proceed with non-parametric tests such as Wilcoxon’s rank sum test31,32,33 to identify differentially expressed (DE) genes among the niches. Although this approach may be appropriate for cases where transcriptomic differences between the compared domains are substantial (e.g., tumor vs. stroma), it does not account for the spatial dependency, which causes the gene expression of neighboring sampling units (e.g., cells or spots) to be more similar than that of distant sampling units34. Because the spatial dependency in ST data is a driving factor of the gene expression patterns observed in tissues35,36, more sophisticated statistical methods could be used to account for the spatial dependency between sampling units37,38,39. Common approaches in many novel methods include identifying genes with spatial patterns, such as gene expression “hot spots”, or testing for genes showing high expression in each tissue domain (i.e., cluster) detected in a sample35,38,39,40,41,42,43,44. Benchmarking to compare the performance of these approaches has also been done45, which is crucial to aid in method selection. However, despite the wide availability of methods to detect spatially variable genes, less effort has been directed toward quantifying the impact of disregarding spatial dependency in ST data analysis.

Quantifying the impact of non-spatial approaches for detecting differentially expressed genes is an important endeavor, given that failure to account for the spatial autocorrelation in ST experiments may result in inflation of the type I error rate40,46,47,48. An increased type I error rate leads to more genes erroneously being identified as differentially expressed due to inaccuracy in the p-values (i.e., p-values that are too small). The impact of inflated type I error rates is compounded by unreliable estimation of gene expression variation, as the variation estimates do not consider the spatial correlation among the neighboring and distant sampling units. Even in non-spatial scRNA-seq, traditional differential expression methods fail to account for type I error inflation46, which led us to hypothesize that considering the spatial correlation in ST experiments can alleviate this phenomenon.

Using linear mixed models offers a simple alternative for DE analysis in ST data. In bulk RNA-seq analysis, robust and well-established pipelines apply linear model fitting to test for differences in expression between two or more categories49,50. However, their application to ST requires additional considerations, given the spatial nature of this modality. One such consideration, which takes advantage of the flexibility of linear mixed models, is the incorporation of spatial covariance structures and variogram analysis51,52. To implement this approach as an alternative for the analysis of ST data, we performed differential gene expression analysis among groups of regions of interest (ROIs), spots, or cells in multiple ST experiments using a spatially aware implementation. The implementation tested for genes with significantly higher (or lower) expression in one group of ROIs, spots, or cells (e.g., cluster, tissue niche) compared to other clusters or tissue niches by fitting linear mixed models that explicitly account for the random spatial effects via spatial covariance structures. This implementation was tested on publicly available ST data sets generated with 10X Genomics’ Visium platform and NanoString’s GeoMx and Spatial Molecular Imager (CosMx-SMI) platforms. We fitted corresponding non-spatial and spatial models to assess the impact of accounting for the spatial autocorrelation on the downstream DE analysis results.

Results

Comparison of non-spatial and spatial models

Models with or without spatial covariance structures were fitted for each gene to determine the most suitable alternative for capturing the expression differences among tissue domains. The tissue domain or cell type annotations for each ROI, spot, or cell were obtained from the studies that generated the data sets (Table 1; Supplementary Table S1). These studies generated the annotations using histopathology methods (Visium and GeoMx data sets) and cell phenotyping (CosMx data sets). Assessment of the models using the Akaike Information Criterion (AIC), an estimate of model fit, showed that spatial models with an exponential covariance structure provided a more accurate fit to Visium and SMI data than non-spatial models (Fig. 1). Among the four Visium samples, between 28 and 41% of the tests (i.e., gene expression in domain A vs gene expression in other domains) showed a better fit to the data when using a spatial model (i.e., lower AIC) compared to a non-spatial model. For the SMI datasets, the percentage of tests favoring the spatial models varied from 32 to 67%. In contrast, for the analysis of the GeoMx data sets, no more than 16% of the spatial models were favored over the non-spatial models (Fig. 1). When considering only genes with high expression in the samples (above the median expression), the proportion of favored spatial models increased to 48–66% in Visium studies and 51–93% in SMI studies (Fig. 1).

Table 1 Summary of spatial transcriptomics samples used in the differential expression tests.
Figure 1

The results of model comparison between non-spatial models and spatial models with exponential covariance structure using AIC. For each gene × cluster test, the model with the lowest AIC was deemed to be a better fit to the data (solid color: spatial model with lower AIC, translucent color: non-spatial model with lower AIC). The tests were separated according to the average gene expression across all ROIs/spots/cells in the tissue sample (high vs low expression, using the median gene expression as the threshold).

Control of type I error by spatial models

The differential expression p-values tended to be smaller in the non-spatial models than in the spatial models, possibly reflecting inflation of the type I error rate. However, these patterns differed among the ST technologies (Fig. 2). In the Visium experiments, 65–71% of the p-values were larger in the spatial models compared to the non-spatial models. In SMI, 60–66% of the p-values from the spatial models were larger than those from the non-spatial models. In the GeoMx experiments, the p-values from the spatial models were larger in 40–54% of the tests compared to the non-spatial models. These modeling results suggest a potential slight inflation in the type I error rate for the non-spatial models, whereby p-values generated by non-spatial models are too small, likely due to inaccurate estimation of the variance of the test statistic. In other words, the variance estimate in the non-spatial models is too small, resulting in a larger test statistic and an artificially smaller p-value.
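The variance argument above can be illustrated with a small numerical sketch. It is not the analysis performed in this study: the grid layout, the two-domain split, and the variance components (`tau2`, `rho`, `sigma2`) are hypothetical values chosen only for illustration. The sketch computes the variance of the difference in mean expression between two spatially contiguous domains, once under an exponential spatial covariance and once when the same total variance is treated as independent across spots.

```python
import numpy as np

def exponential_covariance(coords, tau2, rho):
    """Covariance tau2 * exp(-d / rho) built from pairwise Euclidean distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return tau2 * np.exp(-dist / rho)

# Hypothetical 10 x 10 grid of spots; the left half is domain A, the right half domain B.
xs, ys = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
in_a = coords[:, 0] < 5

# Assumed (illustrative) spatial variance, range, and residual variance.
tau2, rho, sigma2 = 1.0, 2.0, 1.0
cov = exponential_covariance(coords, tau2, rho) + sigma2 * np.eye(len(coords))

# Contrast vector c such that c @ y = mean(y in A) - mean(y in B).
contrast = np.where(in_a, 1.0 / in_a.sum(), -1.0 / (~in_a).sum())

# Variance of the mean difference when spatial correlation is present (c' Sigma c).
spatial_var = contrast @ cov @ contrast
# Variance obtained when the same total variance is treated as independent noise.
independent_var = (tau2 + sigma2) * (contrast @ contrast)

print(f"variance with spatial correlation: {spatial_var:.4f}")
print(f"variance assuming independence:    {independent_var:.4f}")
```

In this configuration the independence-based variance is the smaller of the two, so a test statistic built on it would be too large. This is the mechanism suggested above for the smaller non-spatial p-values.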

Figure 2

Comparison between non-spatial and spatial (exponential model) differential expression tests. Each point corresponds to the −log10(p-value) resulting from a (non-spatial or spatial) linear model fit between the expression of a gene and a binary variable indicating whether an ROI/spot/cell belongs to a biological annotation. The p-values indicate whether the gene is differentially expressed (model coefficient different from zero) for a specific biological annotation compared to the rest of the ROIs/spots/cells. The solid line indicates a 1:1 correspondence (i.e., non-spatial and spatial models yield the same p-values). The colored dashed lines indicate the linear trend of the p-values for each sample. If a colored line lies below the solid line, p-values from the non-spatial model tend to be larger than those from the spatial model.

In the tests described above, all spots or cells that did not belong to the tissue niche or cell type of interest were pooled into a single reference group. Hence, we also tested for pairwise differentially expressed genes among three cell types in each of the two SMI data sets. Similar to the tests that pooled cell types, 44–64% of the p-values from the spatial models were larger than the non-spatial model p-values (Supplementary Fig. S1).

Discussion

Researchers often aim to detect differences in gene expression between cells or tissue niches, with many methods available for non-spatially informed assays, such as single-cell or “bulk” RNA-seq49,50,53,54. Although spatial statistics methods have existed in the literature for several decades51, only recently have spatial statistics been applied to detect spatially variable genes in biological tissues assayed with ST35,38,39,40,41,42,43,44. In this study, we have shown that detecting differentially expressed genes in ST data benefits from statistical models that consider spatial autocorrelation. This leads to a more accurate estimate of the variance and thus produces more stable estimates of p-values. In other words, the spatial models account for the non-independence among the cells/spots, which is not addressed by traditional non-spatial linear models (i.e., two-sample t-tests assuming independence between observations). Failure to consider this dependency between observations may cause the tests to underestimate the variance of the test statistic, resulting in overly small p-values. Our results highlight the importance of considering the spatial dependency present in spatially resolved transcriptomics data, which is often neglected in studies conducting differential expression analyses. Notably, an excess of small p-values has also been noted in non-spatial scRNA-seq differential expression analysis46.

Our results comparing the models with and without a spatial correlation structure indicated that for densely sampled ST data (e.g., Visium, SMI), spatial models present a better model fit. For non-densely sampled experiments (e.g., GeoMx using ROIs), there was a slight tendency for non-spatial models to fit the data better when compared to spatial models, probably due to less spatial correlation among ROIs that are often sampled distant from one another. Considering this finding, using non-spatial models, such as two-sample t-tests, may be appropriate to study differential gene expression in studies using GeoMx where the ROIs are more spatially distant. Nonetheless, the correlation among ROIs within a single slide and the technical variation among slides in the same study could be considered when testing for differentially expressed genes55. Our results also indicate that for Visium and SMI, the spatial models performed better than non-spatial models in cases where the differential expression test involved a highly expressed gene. Nonetheless, the utility of spatial models in moderating the excess of small p-values might depend on the relative sample size of the groups being compared. If one of the groups is represented by a few cells, the non-spatial and spatial models produce similar p-values (Supplementary Fig. S1). In addition, genes with low expression are likely to show excessive zeroes (a characteristic of ST data56,57), and hence, fitting spatial mixed models may become challenging. Novel application of Bayesian methods to detect spatially variable genes appears robust to excessive zeroes in ST data57,58.

Our results indicated that p-values obtained from the spatial model constituted a more biologically informative ranking metric for gene set enrichment analysis (GSEA). Using Benjamini-Hochberg (FDR) adjusted p-values from the non-spatial and spatial models as ranking metrics, we performed GSEA for the Hallmark gene sets with the R package fgsea59,60. The GSEA was conducted individually for each histopathology-defined domain in the glioblastoma Visium data set61. We observed that across all the significantly enriched Hallmark gene sets, the results were more significant using the p-values from the spatial models as compared to the non-spatial models, with the exceptions of oxidative phosphorylation in the necrosis niche and KRAS signaling downregulation in the necrotic edge niche (Fig. 3). A lower score of the KRAS signaling is expected in the necrotic edge, assuming that the tumor cells in this niche are not actively proliferating62. Although the GSEA was conducted on a single Visium sample (UKF243), and comprehensive testing is required to evaluate the information p-values can provide for pre-ranked GSEA, our analysis suggests that p-values derived from spatial models can be more appropriate for gene set enrichment analysis when using ST data.
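For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following is a minimal sketch of the procedure. It is an illustration only, not the code used in this study; the function name `benjamini_hochberg` and the example p-values are hypothetical. The computation mirrors the standard step-up definition, which is also what R's `p.adjust(method = "BH")` implements.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity starting from the largest p-value, then cap at 1.
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(n)
    adjusted[order] = adjusted_sorted
    return adjusted

# Hypothetical p-values for four genes tested in one tissue domain.
example_p = [0.01, 0.04, 0.03, 0.005]
example_adjusted = benjamini_hochberg(example_p)
print(example_adjusted)
```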

Figure 3

Scores resulting from the GSEA calculated using the fgsea package. Genes were ranked using the p-values obtained by the non-spatial (white) or spatial (dark grey) models. The spatial niches (cellular, necrosis, necrotic edge, vascular_hyper) were generated via histopathology examination by a previous study61. The gene sets depicted here represent Hallmark gene sets showing significant enrichment (adjusted p-value < 0.05). The “vascular_hyper” niche refers to tumor tissue with high vascularization.

Testing for differential gene expression is time-consuming for modern single-cell or spatial applications, as hundreds to thousands of individual tests are performed (i.e., each combination of gene expression in domain A vs gene expression in other domains). In addition, each test often includes hundreds to thousands of cells or spots. When applying spatial models for differential expression, the advantages of accurate estimation come at the cost of longer computation times than the non-spatial models (Fig. 4). Previously, we fitted these models using the long-supported R package nlme. However, the estimation of parameters was exceedingly time-consuming (data not shown). Hence, we switched to using the R package spaMM to fit the statistical models. Using a High-Performance Computing (HPC) environment, testing differential expression of a single gene between two tissue domains can take anywhere from a few seconds to more than 2 h in Visium- or SMI-generated data. Each test was run using a single core and 8 GB of memory, resources not typically available in conventional laptop computers if thousands of tests are run simultaneously. After considering these results, we opted to implement differential gene expression analysis using spaMM (as opposed to nlme) in our R package for spatial transcriptomics analysis, spatialGE63, and we have named this approach STdiff. In the spatialGE R package, we made efforts to parallelize the analyses, but such efforts alone are not enough to achieve feasible computing times on personal computers, so the use of an HPC environment is still required.

Figure 4

Time of execution in log10(minutes) for the non-spatial (nsp) and spatial (sp) tests conducted. Each dot represents a single test involving a gene and a cell type or tissue domain. Execution time corresponds to the time a single CPU core took to run the “fitme” function from the spaMM R package. The number of ROIs, spots, or cells is shown on the x-axis.

We also give a word of caution to researchers completing differential expression analysis on tissue domains or clusters defined on the same expression data, which leads to circularity and could result in overinterpretation of the function of the defined tissue domains. We propose that our approach and any other method that tests for differential expression on clusters defined with the same tested data should be only used to provide biological identity to the clusters (i.e., phenotyping). A non-circular application of these methods would require delineating tissue domains based on images by an expert pathologist, followed by differential expression analyses on the pathologist’s annotations. An example of this application is our testing on the glioblastoma Visium dataset61 included in this study.

In summary, considering spatial dependency is needed when conducting differential expression analysis in densely sampled spatially resolved transcriptomic experiments. In this study, we demonstrate that applying mixed models with spatial correlation structure effectively accounts for the correlation between spots or cells, thereby controlling for the inflated type I error rates observed in non-spatial models. Specifically, we show that spatial models with an exponential correlation structure provide a better fit to ST data than non-spatial models.

Material and methods

Spatial transcriptomic data sets

Spatial transcriptomics technologies are diverse, ranging in cellular and molecular resolution. Hence, we tested the utility of spatial linear mixed models for differential gene expression analysis using a series of data sets that reflected the spectrum of cellular and molecular resolution in ST technologies. We obtained publicly available ST data from spatial-barcoding technologies, including 10X Genomics' Visium and NanoString's GeoMx platforms, as well as from the imaging-based technology of NanoString's CosMx Spatial Molecular Imager (SMI). The Visium data sets were generated by studies of the brain motor cortex64 and glioblastoma61. The GeoMx and SMI data sets were obtained from NanoString's Spatial Organ Atlas repository65. For each technology, we selected two tissue types with two samples for each tissue type (i.e., a total of 4 samples for each technology). More details of the selected samples and their access links are provided in the supplemental materials (Table 1; Supplementary Table S1). Using these data sets, we tested the utility of spatial models to detect DE genes. For this reason, a requisite for sample selection was that each sample contained biologically meaningful annotations (i.e., tissue domains, niches, or clusters) for each ROI/spot/cell. Preparation of expression and annotation data was carried out using the R statistical programming software (version 4.1)66. Data were normalized using library size normalization and log-transformation in the package spatialGE67.
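The normalization step can be sketched as follows. This is an illustration of generic library size normalization followed by a log-transformation, not the spatialGE implementation: the scale factor of 10,000, the use of log(1 + x), and the function name `normalize_log` are all assumptions, because the exact transformation is not spelled out here.

```python
import numpy as np

def normalize_log(counts, scale_factor=10_000.0):
    """Library-size normalize a (spots x genes) count matrix, then log-transform.

    The scale factor and the log(1 + x) form are assumptions for illustration.
    """
    counts = np.asarray(counts, dtype=float)
    # Total counts per ROI/spot/cell (one row per sampling unit).
    library_size = counts.sum(axis=1, keepdims=True)
    if np.any(library_size == 0):
        raise ValueError("every ROI/spot/cell needs at least one count")
    return np.log1p(counts / library_size * scale_factor)

# Hypothetical toy matrix: 2 spots (rows) by 3 genes (columns).
toy_counts = np.array([[1, 3, 0],
                       [0, 10, 10]])
toy_normalized = normalize_log(toy_counts)
print(toy_normalized)
```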

Model

In differential gene expression analysis, the goal is to identify genes for which the average expression in a group is significantly higher or lower than that in other groups. In the context of ST, the sampling units (cells, spots, ROIs) are grouped using either a clustering method or prior knowledge of the tissue (e.g., tissue domains or niches). Hence, the objective remains the same: To detect genes with significantly higher or lower expression in one group of cells, spots, or ROIs (i.e., spots or cells in a domain or tissue niche) compared to ROIs/spots/cells in another tissue domain or outside of the tissue domain of interest.

For the non-spatial case of our DE analysis proposal, the expression of a given gene (\({y}_{s}\)) at a given sample unit location (\(s\)) can be modeled as:

$${y}_{s}={\mu }_{k}+{\varepsilon }_{s}$$

where \({\mu }_{k}\) is the mean expression of the gene in cluster \(k\), and \({\varepsilon }_{s}\) is the random error at location \(s\), with \({\varepsilon }_{s}\sim N\left(0, {\sigma }^{2}\right).\) In order to extend this model to the spatial case, we add the effect of the spatial dependency as part of the random effects (\({U}_{s}\)) term to account for the correlation among neighboring sampling units as:

$${y}_{s}={\mu }_{k}+{{U}_{s}+\varepsilon }_{s}$$

where \({U}_{s}\) is defined as \({U}_{s}\sim MVN\left(0, V\left(\theta , d\right)\right)\) and \(d\) represents the distance between two ROIs/spots/cells. Several types of covariance structures can define the spatial dependency. In this study, we have tested the use of the commonly used exponential covariance structure, which is a particular case of the Matérn covariance structure, \(V\left(\theta ,d\right)={\tau }^{2}{\text{exp}}\left(-\frac{d}{\rho }\right)\). Other spatial covariance structures could be used; however, the exponential structure is directly supported by the spaMM R package. Other methods for detecting spatially variable genes also use exponential or Gaussian covariance structures (e.g., nnSVG43, SPARK-X40). The use of semivariograms51 can be exploited in future studies that assess the fit of different covariance structures to spatial transcriptomics data.
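As a worked consequence of the spatial model above, and under the additional assumption (not stated explicitly in the text) that \({U}_{s}\) and \({\varepsilon }_{s}\) are independent, the covariance between the expression at two locations \(s\) and \(s'\) separated by a distance \({d}_{ss'}\) would be:

$$\text{Cov}\left({y}_{s},{y}_{s'}\right)={\tau }^{2}{\text{exp}}\left(-\frac{{d}_{ss'}}{\rho }\right)+{\sigma }^{2}\,{\mathbb{1}}\left(s=s'\right)$$

Under this assumption, each observation has variance \({\tau }^{2}+{\sigma }^{2}\), and two distinct sampling units have correlation \(\frac{{\tau }^{2}}{{\tau }^{2}+{\sigma }^{2}}{\text{exp}}\left(-\frac{{d}_{ss'}}{\rho }\right)\), which decays toward zero as the distance grows relative to \(\rho\). When \({\tau }^{2}=0\), the expression reduces to the independent errors of the non-spatial model.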

Application of models on spatial transcriptomic data sets

The application of spatial models to densely sampled tissues can be computationally intensive, particularly as the number of ROIs/spots/cells increases. Spatial transcriptomics technologies such as Visium and SMI contain thousands of spots or cells, respectively, resulting in massive covariance matrices that must be manipulated for each of thousands of genes. To test for the utility of spatial models over non-spatial linear models, we randomly chose 5000 genes in each sample of the GeoMx and Visium data sets. All genes were used in testing for the SMI data sets. Next, annotations for each ROI/spot/cell were used to indicate whether the ROI/spot/cell belonged to a biological cluster or tissue domain. For each combination of gene and ROI/spot/cell annotation, we fit non-spatial and spatial models with exponential covariance structure to test for differential expression between the ROIs/spots/cells assigned to that biological annotation and the rest of the ROIs/spots/cells (Table 1). Additionally, we assessed the utility of spatial models in pairwise comparisons between two given cell types of the SMI data sets. Specifically, we tested for differentially expressed genes among tumor cells, macrophages, and T cells in the non-small cell lung cancer (NSCLC) data set and among hepatocytes, stellate cells, and non-inflammatory macrophages of the liver data set. The models were fit using the spaMM R package68 on a high-performance computing (HPC) environment with one core assigned to each test and 8 GB of memory per core. The Akaike Information Criterion (AIC) was used to compare the spatial and non-spatial models. The AIC is an estimate of model fit based on the log-likelihood penalized by the complexity of the model using the formula \(AIC=2k-2\ln(\widehat{L})\), where \(\widehat{L}\) is the maximized value of the likelihood function of the model given the data and \(k\) is the number of parameters in the model. Given a set of models, the best-fitting model out of the group is the one with the smallest AIC.
All analyses were conducted in R (version 4.1)66, and visualizations were generated with the ggplot2 package69.
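The AIC comparison described above can be sketched with a short worked example. The log-likelihood values below are made up for illustration and do not come from this study. The parameter counts (two domain means and a residual variance for the non-spatial model, plus \({\tau }^{2}\) and \(\rho\) for the spatial model) are one plausible way of counting and may differ from how spaMM reports them.

```python
def aic(n_params: int, log_likelihood: float) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L-hat)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical maximized log-likelihoods for one gene x annotation test.
# Non-spatial model: two domain means and the residual variance (k = 3, assumed).
aic_non_spatial = aic(3, -1250.0)
# Spatial model: the same parameters plus tau^2 and rho (k = 5, assumed).
aic_spatial = aic(5, -1240.0)

# The model with the smaller AIC is considered the better fit.
favored = "spatial" if aic_spatial < aic_non_spatial else "non-spatial"
print(aic_non_spatial, aic_spatial, favored)
```

In this made-up case the spatial model gains 10 log-likelihood units for two extra parameters, so its AIC is lower by 16 and it would be counted as the favored model for that test.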