Spatial transcriptomics deconvolution at single-cell resolution using Redeconve

Zhou, Zixiang; Zhong, Yunshan; Zhang, Zemin; Ren, Xianwen

doi:10.1038/s41467-023-43600-9

Article
Open access
Published: 01 December 2023

Nature Communications volume 14, Article number: 7930 (2023)

Abstract

Computational deconvolution with single-cell RNA sequencing data as reference is pivotal to interpreting spatial transcriptomics data, but current methods are limited to cell-type resolution. Here we present Redeconve, an algorithm that deconvolutes spatial transcriptomics data at single-cell resolution, enabling interpretation of spatial transcriptomics data with thousands of nuanced cell states. We benchmark Redeconve against state-of-the-art algorithms on diverse spatial transcriptomics platforms and datasets and demonstrate its superiority in accuracy, resolution, robustness, and speed. Application to a human pancreatic cancer dataset reveals cancer-clone-specific T cell infiltration, and application to lymph node samples identifies differences in cytotoxic T cells between IgA+ and IgG+ spots, providing novel insights into tumor immunology and the regulatory mechanisms underlying antibody class switching.

Introduction

Spatial transcriptomics (ST) technologies provide new tools to identify the cellular organization and interactions of biological samples, which is pivotal to biomedical studies. Multiple ST technologies have been developed and applied to mouse and human brains, lymph nodes, hearts, and other tissues, providing novel insights into the cellular communication networks underlying different conditions. However, sequencing-based ST technologies, e.g., the 10x Genomics Visium platform and Slide-seq¹, essentially produce a spot-by-gene matrix and need additional data to resolve the cellular composition of each spot. Imaging-based ST technologies, e.g., seqFISH+², MERFISH³, and the commercial 10x Genomics Xenium⁴ and NanoString CosMx⁵ platforms, provide subcellular resolution, but they are limited by low gene throughput, detecting only hundreds of customized genes, so their discovery potential is not comparable to that of whole-transcriptome ST technologies. Therefore, integrative analysis of whole-transcriptome ST data together with matched single-cell RNA sequencing (scRNA-seq) data is of high significance for biological discoveries.

Multiple effective and efficient algorithms have been proposed for integrative analysis of whole-transcriptome ST and scRNA-seq data. These algorithms fall into two groups: (1) mapping-based methods, e.g., NovoSpaRc⁶, Tangram⁷, CellTrek⁸, and CytoSPACE⁹, which map single cells to the positions of ST data according to gene expression similarity or related measures; and (2) deconvolution-based methods, e.g., CARD¹⁰, RCTD¹¹, cell2location¹², DestVI¹³, SpatialDWLS¹⁴, SPOTlight¹⁵, STRIDE¹⁶, CellDART¹⁷, Celloscope¹⁸, DSTG¹⁹, and Stereoscope²⁰, which reconstruct the ST observations by modeling the experimental process as sampling from different combinations of single cells. Mapping-based methods have the advantage of single-cell resolution, whereas current deconvolution methods are limited to tens of cell types. However, mapping-based methods may introduce artificial biases during the mapping process because they impose no strong constraint on the reconstruction accuracy of the ST observations. A deconvolution-based algorithm with single-cell resolution is therefore urgently needed to fully release the biological information hidden in ST data.

In this study, we develop an algorithm named Redeconve²¹ to estimate the cellular composition of ST spots. Unlike previous deconvolution-based algorithms, Redeconve introduces a regularization term to solve the collinearity problem of high-resolution deconvolution, assuming that similar single-cell states have similar abundances in ST spots. This algorithmic innovation not only improves the deconvolution resolution from tens of cell types to thousands of single-cell states, but also greatly improves the reconstruction accuracy of ST data, enabling illustration of the nuanced biological mechanisms hidden in ST data. Stringent comparison with state-of-the-art algorithms including cell2location, CARD, DestVI, CellTrek, NovoSpaRc, and Tangram demonstrates the superiority of Redeconve in terms of reconstruction accuracy, cell abundance estimation per spot, sparseness of the reconstructed cellular composition, cell-state resolution, and computational speed. Application to human pancreatic cancer data reveals novel insights into tumor-infiltrating CD8⁺ T cells, and application to human lymph node data reveals new clues to the regulatory factors of IgA+ and IgG+ B cells.

Results

Redeconve: a quadratic programming model for single-cell deconvolution of ST data

Redeconve uses scRNA-seq or single-nucleus RNA-seq (snRNA-seq) data as reference to estimate the abundance of different cell states in each spot of ST data (Fig. 1a). Unlike previous deconvolution methods, Redeconve does not need to group single cells into clusters before deconvolution. Instead, Redeconve treats each cell of the sc/snRNA-seq data as a specific cell state that serves as reference for estimating the cellular composition of ST data. Using sc/snRNA-seq data directly as reference is conceptually simple and computationally efficient, with the potential to handle the heterogeneity of ST data. However, it introduces a new challenge: collinearity. Because many single cells have similar gene expression profiles, the abundances of individual cell states cannot be estimated accurately. We therefore introduce a biologically reasonable heuristic, assuming that similar cells have similar abundances within ST spots, and encode it mathematically as a regularization term in a deconvolution model based on non-negative least-squares regression. Solving this regularized model by quadratic programming produces robust estimates of the cellular composition at single-cell resolution for each spot of ST data.
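
The collinearity problem can be seen in a toy example. The sketch below is illustrative only: the numbers are hypothetical and the helper names (`reconstruct`, `squared_residual`, `similarity_penalty`) are ours, not part of Redeconve. Two reference cells share an identical profile, so every split of the same total abundance fits the spot equally well; a similarity penalty of the kind described above is what singles out the even split.

```python
# Toy illustration of collinearity (hypothetical numbers, not data from the study).
# X is genes x cells, so X[j][i] is the expression of gene j in reference cell i.

def reconstruct(X, beta):
    """Reconstructed spot profile: sum over cells of x_ij * beta_i, for each gene j."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def squared_residual(X, y, beta):
    """Least-squares part of the loss: sum over genes of (y_j - reconstruction_j)^2."""
    return sum((y_j - y_hat_j) ** 2 for y_j, y_hat_j in zip(y, reconstruct(X, beta)))

def similarity_penalty(beta, R, c):
    """c * sum over ordered pairs i1 != i2 of R[i1][i2] * (beta_i1 - beta_i2)^2."""
    n = len(beta)
    return c * sum(R[a][b] * (beta[a] - beta[b]) ** 2
                   for a in range(n) for b in range(n) if a != b)

# Two reference cells with identical profiles over three genes.
X = [[5.0, 5.0],
     [1.0, 1.0],
     [3.0, 3.0]]
y = [10.0, 2.0, 6.0]          # spot made of two cells of this state
R = [[1.0, 1.0], [1.0, 1.0]]  # the two cells are perfectly correlated

for beta in ([2.0, 0.0], [0.0, 2.0], [1.0, 1.0]):
    print(beta, squared_residual(X, y, beta), similarity_penalty(beta, R, c=1.0))
```

All three candidate vectors give a zero residual, and only `[1.0, 1.0]` also gives a zero penalty, which is the tie-breaking behavior the regularization is meant to provide.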

**Fig. 1: Overview of the Redeconve algorithm and benchmark analysis.**

High accuracy, resolution, robustness, efficacy, and scalability of Redeconve

We applied Redeconve to multiple ST datasets from various platforms (10x Visium, Slide-seq v2, ST, etc.) and compared its performance with that of other methods. We first compared the consistency of results among different methods at cell-type resolution on a human breast cancer dataset. Deconvolution-based methods, including Redeconve, showed higher mutual consistency than mapping-based methods (Fig. 1b), indicating the relative superiority and robustness of deconvolution-based methods. This observation was confirmed on additional ST datasets (Supplementary Fig. 1). Unlike previous deconvolution-based methods, which only report cell-type-level results, Redeconve can further resolve fine-grained cell states at single-cell resolution (Fig. 1c and Supplementary Fig. 2). On an ST dataset from a human breast cancer sample, Redeconve resolved 249 different cell states from 9 major cell types (Fig. 1c). On an ST dataset from mouse cerebellum, Redeconve resolved 1000 different cell states from 14 major cell types (Fig. 1c). In contrast, the resolution of previous deconvolution methods is limited by the clustering of the sc/snRNA-seq data.

Beyond robustness and resolution, Redeconve also improves the reconstruction accuracy of the gene expression of each spot, and the improvement holds regardless of the measure used to compare the observed ST expression profile with the reconstructed expression vector, whether cosine similarity, Pearson’s correlation, or root mean square error (RMSE) (Fig. 1d and Supplementary Figs. 3–4). Redeconve also estimated cell abundance with high accuracy (against a ground truth from nucleus counting; Fig. 1e and Supplementary Fig. 5). When a suitable reference is provided, e.g., matched scRNA-seq data, Redeconve reaches a cosine similarity above 0.8 for most ST spots (Fig. 1d). When no suitable reference is available (for example, when only snRNA-seq data are accessible for brain samples), Redeconve still outperforms other methods (Fig. 1d). Pairwise comparison between Redeconve and other methods further shows the superiority of Redeconve in reconstruction accuracy on almost all spots (Supplementary Figs. 7–12). Because Redeconve performs deconvolution spot by spot, the computation can be parallelized, which gives Redeconve superior speed compared with current deconvolution algorithms (Fig. 1f and Supplementary Fig. 6).
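
The three reconstruction metrics named above compare an observed spot profile with the profile rebuilt from the estimated abundances. The sketch below shows one straightforward way to compute them; the function names and the small example matrix are hypothetical, and any gene filtering or normalization applied in the study is not reproduced.

```python
import math

def reconstruct(X, beta):
    """Reconstructed spot profile: X (genes x cells) times beta (abundance per cell)."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pearson(u, v):
    """Pearson's r equals the cosine similarity of mean-centred vectors."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    return cosine_similarity([a - mu_u for a in u], [b - mu_v for b in v])

def rmse(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))

# Hypothetical spot: 4 genes, 2 reference cells.
X = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0], [1.0, 0.0]]
y = [4.1, 3.0, 2.9, 2.2]   # observed expression
beta = [2.0, 1.0]          # estimated abundances
y_hat = reconstruct(X, beta)
print(cosine_similarity(y, y_hat), pearson(y, y_hat), rmse(y, y_hat))
```

Higher cosine similarity and Pearson's r, and lower RMSE, indicate a closer reconstruction; the first two are insensitive to overall scale, while RMSE is not.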

To evaluate how well Redeconve estimates the absolute abundance of cells within ST spots, we applied it to three datasets, Mouse Brain, PDAC, and Human Breast Cancer Xenium, in which cell counts were obtained by nucleus counting based on image segmentation^12,22,23. Without any prior information, the results of Redeconve showed high conformity with the “ground-truth” cell counts (Fig. 1e), similar to methods that use cell counts (or cell density) as prior knowledge, e.g., cell2location and Tangram (Supplementary Fig. 5). We used Shannon entropy to estimate the effective number of different cell states within each spatial spot (see Methods for details about using perplexity as a metric). Redeconve revealed high spot heterogeneity: some spots had a complex cellular composition, while others had a relatively simple one. In contrast, the entropy of other methods was uniformly high, implying that each spot contained almost all cell types in the reference, which is unrealistic (Supplementary Fig. 13).
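
As a sketch of the entropy-based sparsity measure, the code below normalizes a spot's estimated abundances to proportions, computes Shannon entropy with the natural logarithm, and reports perplexity as its exponential. The use of the natural log and of exp(H) as perplexity is the standard definition and our assumption here; the paper's exact formulation is in its Methods.

```python
import math

def shannon_entropy(abundances):
    """Entropy (natural log) of a spot's composition after normalising to proportions."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("spot has no estimated cells")
    probs = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in probs)

def perplexity(abundances):
    """exp(entropy): roughly, the effective number of distinct cell states in the spot."""
    return math.exp(shannon_entropy(abundances))

print(perplexity([3.0, 0.0, 0.0, 0.0]))  # a single cell state
print(perplexity([1.0, 1.0, 1.0, 1.0]))  # four equally abundant cell states
```

A spot dominated by one state has perplexity near 1, while a method that spreads abundance evenly over every reference state drives perplexity toward the size of the reference, which is the pattern described above for the other methods.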

Single-cell resolution is unique to Redeconve compared with previous deconvolution algorithms

We then examined whether current deconvolution-based algorithms could be upgraded to single-cell resolution by replacing the required cell-type inputs with thousands of single cells, as Redeconve does. Among all the methods we evaluated, only cell2location and DestVI completed the task, and both took much longer than with cell-type inputs (Supplementary Fig. 14); the other algorithms reported errors. Although single-cell inputs improved the reconstruction accuracy of cell2location on the ST data of a human lymph node sample from the 10x Genomics Visium platform, they did not improve cell2location on the human pancreatic tumor and mouse brain datasets, and DestVI failed on all three evaluations (Supplementary Fig. 15). In contrast, Redeconve outperformed cell2location and DestVI on almost all spots of the evaluated datasets (Fig. 2a). With single-cell inputs, DestVI achieved good sparsity of cell states within each spot (measured by perplexity based on Shannon entropy), similar to Redeconve. But cell2location reported extremely high perplexity for most spots, indicating overpredicted presence of almost all cell types and thus a high false-positive rate (Fig. 2b). Therefore, switching inputs from cell types to single cells cannot raise the performance of current algorithms to the level of Redeconve, and the superiority of Redeconve derives mainly from its algorithmic innovation.

**Fig. 2: Performance benchmarking with single-cell inputs and simulated datasets.**

Evaluating the impact of cell-type resolution on deconvolution by simulation

To evaluate how the cell-type resolution of the reference data affects deconvolution, we devised a series of simulation experiments comparing Redeconve with state-of-the-art algorithms. We constructed three pseudo-bulk RNA-seq datasets by averaging the gene expression of individual cells from the PDAC²⁴, human lymph node^12,25, and human testis²⁶ scRNA-seq datasets separately (Fig. 2c and Methods). We then applied Redeconve and cell2location, the only alternative method capable of this task. Compared directly with the ground truth, Redeconve performed substantially better than cell2location, with significantly higher accuracy (Supplementary Figs. 16–18). When we examined the relationship between accuracy and the number of clusters in the single-cell reference, the accuracy of Redeconve increased as the number of clusters grew, while that of cell2location dropped sharply (Fig. 2d). This suggests that Redeconve handles large-scale scRNA-seq data more effectively and can exploit finer-grained clusters to increase accuracy instead of becoming confused. Furthermore, the simulation experiments corroborate the validity of using perplexity as a metric of sparsity (Supplementary Table 1 and Methods).
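
A pseudo-bulk spot with a known composition can be simulated by averaging the profiles of sampled reference cells. The sketch below assumes uniform sampling with replacement and a fixed number of cells per spot; these choices, and the function name `make_pseudo_spot`, are hypothetical, since the paper's exact simulation scheme is described in its Methods rather than here.

```python
import random

def make_pseudo_spot(X, n_cells, rng):
    """Simulate one pseudo-bulk spot from a reference X (genes x cells).

    Samples n_cells reference cells uniformly with replacement (an assumption) and
    averages their expression. Returns (profile, counts), where counts[i] is how
    often reference cell i was drawn, i.e. the ground-truth composition.
    """
    n_ref = len(X[0])
    chosen = [rng.randrange(n_ref) for _ in range(n_cells)]
    counts = [0] * n_ref
    for i in chosen:
        counts[i] += 1
    profile = [sum(row[i] for i in chosen) / n_cells for row in X]
    return profile, counts

rng = random.Random(0)  # fixed seed so the simulation is reproducible
X = [[1.0, 0.0, 5.0],   # hypothetical reference: 2 genes x 3 cells
     [2.0, 4.0, 0.0]]
profile, counts = make_pseudo_spot(X, n_cells=10, rng=rng)
print(profile, counts)
```

Because the counts are recorded, the estimated abundances from any deconvolution method can be scored directly against them.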

Evaluating the accuracy of cell-type proportion estimates with 10x Genomics Xenium data as ground truth

Single-cell ST platforms, such as MERFISH³, Xenium⁴ and CosMx⁵, are emerging commercially as powerful tools for mapping the precise locations of single cells at high resolution, but they are limited in the number of genes profiled because probes specific to the target genes must be designed and synthesized before the experiment. The high resolution of these platforms provides a natural ground truth for evaluating Redeconve. Here, we used a human breast cancer Xenium dataset generated by 10x Genomics⁴ to evaluate Redeconve's reconstruction of ST spot expression profiles, its cell-type proportion predictions, and its estimates of the abundance of individual cell states. This dataset contains not only Xenium data with the coordinates and expression profiles of segmented single cells, but also matched scRNA-seq (5’, 3’ and scFFPE-seq) and Visium data, enabling us to generate ground truths of cell abundances and cell-type proportions for Visium spots (see Methods for details). A total of 3906 Visium spots overlapping the Xenium data were extracted for comparative analysis (Fig. 3a). Compared with state-of-the-art algorithms including cell2location, DestVI, CARD, NovoSpaRc, CellTrek, and Tangram, Redeconve achieved higher cosine similarities between the predicted cell-type proportions and the ground truths for most Visium spots (Fig. 3b). Specifically, Redeconve outperformed alternative deconvolution-based and mapping-based methods on more than 60% and 70% of spots, respectively (Supplementary Fig. 19). Redeconve, cell2location, and Tangram performed comparably in estimating the absolute cell abundance within Visium spots, with high Pearson’s correlation to the ground-truth cell counts derived from the overlapping Xenium cells, but Redeconve was more robust to the choice of scRNA-seq reference (Fig. 3c and Supplementary Fig. 20). Similarly, Redeconve's reconstruction of the expression profiles of Visium spots was more robust to the choice of scRNA-seq reference than that of state-of-the-art algorithms (Fig. 3d and Supplementary Fig. 21).
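
To compare single-cell abundances with cell-type ground truths, the per-cell estimates must first be collapsed to cell-type proportions. The sketch below assumes this is done by summing abundances over reference cells sharing an annotation and normalizing; the function names, labels, and numbers are hypothetical, and the paper's own ground-truth construction is in its Methods.

```python
import math
from collections import defaultdict

def celltype_proportions(beta, labels):
    """Sum single-cell abundances (beta) within each cell-type label, then normalise."""
    totals = defaultdict(float)
    for b, label in zip(beta, labels):
        totals[label] += b
    grand_total = sum(totals.values())
    return {label: v / grand_total for label, v in totals.items()}

def cosine_dict(p, q):
    """Cosine similarity between two {cell_type: proportion} dicts (missing types = 0)."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

# Hypothetical spot: four reference cells annotated with three cell types.
beta = [1.0, 0.5, 0.0, 2.5]
labels = ["T cell", "T cell", "Fibroblast", "Cancer"]
pred = celltype_proportions(beta, labels)
truth = {"T cell": 0.4, "Cancer": 0.6}  # e.g. proportions of overlapping Xenium cells
print(pred, cosine_dict(pred, truth))
```

Treating missing cell types as zero lets predicted and ground-truth dictionaries with different key sets be compared directly.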

**Fig. 3: Benchmarking Redeconve performance on a human breast cancer Xenium dataset.**

Single-cell resolution by Redeconve enables identification of pancreatic cancer-clone-specific T cell infiltration

To demonstrate the power of single-cell deconvolution for practical biological problems, we further investigated the Redeconve results on the human pancreatic ST dataset²⁴. The ST data were generated on the original ST platform, and scRNA-seq data from the same individual were obtained with InDrop. With single cells as reference, Redeconve outperformed other methods in reconstruction accuracy for almost all spots (Fig. 4b, c and Supplementary Fig. 7). With cell types as reference and cell-type resolution varied from 20 to 318 clusters, Redeconve still performed stably better than other methods given the same inputs (Supplementary Fig. 22). This indicates that the advantage of Redeconve persists after excluding the difference between single-cell and cell-type references, although Redeconve is the only algorithm designed to take single cells as reference, as demonstrated in the previous sections. Benchmarks on individual cell types again showed the superiority of Redeconve. We identified marker genes for each cell type (Supplementary Table 2) and calculated the expression consistency between the ST observations and the profiles reconstructed by different algorithms across all spots (see Methods for details). Redeconve outperformed other algorithms on most cell types (top one for 13/20), especially cancer, ductal, and endocrine cells, and performed comparably to the best performers on the remaining cell types (top three for 20/20; Supplementary Fig. 23a, b). In addition, the performance of Redeconve, cell2location, and Tangram was robust to variation in cell-type abundance in the scRNA-seq data, while the performance of DestVI, CARD, and NovoSpaRc was positively correlated with cell-type abundance (p-value < 0.05) (Supplementary Fig. 23c).

**Fig. 4: Single-cell deconvolution of a human PDAC (pancreatic ductal adenocarcinoma) ST dataset.**

Histological analysis based on H&E staining identified four tissue regions: pancreatic, cancer, duct epithelium, and stroma²⁴ (Fig. 4a). Redeconve, CARD, and DestVI successfully distinguished the four types of tissue regions, consistent with the histological analysis (Supplementary Fig. 24), whereas cell2location, NovoSpaRc, and Tangram failed in several cases (Fig. 4d and Supplementary Fig. 24). Closer inspection of a specific spot in the upper cancer region (Fig. 4d, upper zoomed-in pie chart) shows that the deconvolution-based methods (Redeconve, cell2location, DestVI, and CARD) detect fibroblasts, which are known to be abundant in pancreatic cancer^24,27,28, while the mapping methods (Tangram and NovoSpaRc) do not.

We then examined the detailed characteristics of tumor-infiltrating T cells based on these results, which is important for understanding the tumor immune microenvironment of pancreatic cancer. Cell2location, NovoSpaRc, Tangram, and DestVI reported T cells in almost all spots (Fig. 5a), inconsistent with the nature of PDAC as a cold tumor; in contrast, Redeconve and CARD clearly indicated the sparsity of tumor-infiltrating T cells in pancreatic cancer, consistent with the spatial distribution of T cell-related genes (CD3, IL32 and TMSB4X; Fig. 5a and Supplementary Figs. 25–27). Because CARD is limited to cell-type resolution, it can hardly provide more detailed insights, whereas Redeconve enables deeper investigation. We identified three T cells in the reference scRNA-seq data that appeared in multiple ST spots, indexed as “T.cell.8”, “T.cell.11” and “T.cell.35”, respectively (Fig. 5b). Based on their expression profiles in the reference scRNA-seq data, we identified T cell 11 as a regulatory T cell (CD4⁺ FOXP3⁺) and T cells 8 and 35 as CD8⁺ cytotoxic T cells. For a fair comparison, we further divided the T cells in the scRNA-seq reference into three groups, i.e., cytotoxic, helper, and regulatory T cells, and used these three T cell types together with the other cell types as reference to re-run the other deconvolution algorithms (Supplementary Fig. 28). Consistent with the spatial distribution of CD8 and FOXP3, the result of Redeconve is the most reasonable (Supplementary Figs. 25 and 27). According to the Redeconve results, almost all T cells within the cancer region were similar to regulatory T cell 11, and T cell states similar to 8 and 35 appeared only outside or at the edge of the cancer region (Fig. 5b, c), consistent with the immunosuppressive status of the cancer region of pancreatic tumors^24,29.

**Fig. 5: Cancer-clone-specific CD8⁺ T cell infiltration revealed by Redeconve in human pancreatic cancer.**

We further conducted co-localization analysis of these three T cell states with the remaining cell states by calculating the Pearson correlation coefficient of their abundances across all spots based on the Redeconve results (Fig. 5d). The regulatory T cell state similar to T cell 11 mainly co-localized with macrophages similar to macrophages B.6, 8, and 16, together with duct cells of two different states. Interestingly, T cells 8 and 35 mainly co-localized with cancer cells, indicating dispersed cancer cells outside the cancer region. Such co-localization was not observed by other methods, even when they were provided with an scRNA-seq reference of higher T cell resolution (cytotoxic/helper/regulatory T cells) (Supplementary Fig. 29).
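
The co-localization analysis above correlates the abundance of two cell states across spots. A minimal sketch, assuming abundances are stored per state as one value per spot: the state names and numbers are hypothetical, and returning NaN for a state with constant abundance is our choice, since Pearson's r is undefined there.

```python
import math

def pearson(u, v):
    """Pearson correlation; returns nan if either vector has zero variance."""
    n = len(u)
    mu_u, mu_v = sum(u) / n, sum(v) / n
    du = [a - mu_u for a in u]
    dv = [b - mu_v for b in v]
    denom = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / denom if denom > 0 else float("nan")

def colocalization(abundance):
    """abundance: {cell_state: [abundance in spot 1, spot 2, ...]} -> pairwise r."""
    states = sorted(abundance)
    return {(s, t): pearson(abundance[s], abundance[t])
            for k, s in enumerate(states) for t in states[k + 1:]}

# Hypothetical abundances of three cell states across five spots.
abundance = {
    "state_A": [0.0, 1.0, 2.0, 0.0, 1.0],
    "state_B": [0.0, 2.0, 3.0, 0.0, 2.0],
    "state_C": [3.0, 0.0, 0.0, 2.0, 0.0],
}
for pair, r in colocalization(abundance).items():
    print(pair, round(r, 3))
```

Strongly positive r suggests two states tend to occupy the same spots; with single-cell abundances this can be computed between individual reference cells rather than whole cell types.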

Furthermore, these two T cell states co-localized with different cancer clones: T cell state 8 with cancer clone B and state 35 with cancer clone A. Differential gene expression analysis based on the reference scRNA-seq data further revealed differences between these two pairs of T cells and cancer cells (Fig. 5e, f). Previous studies have reported that TM4SF1+ cancer cells represent a late stage, whereas S100A4+ cancer cells (clone B) represent an early stage^30,31,32. Our analysis identified the co-existence of TM4SF1+ cancer cells (clone A) and S100A4+ cancer cells (clone B) with different CD8⁺ T cells, which is important for understanding the interactions between cancer and T cells. We found that interferon-induced genes (IFIT1 and IFI44L, for example) and HLA genes (HLA-A, HLA-B and HLA-C) were all up-regulated in cancer clone B (Fig. 5f), and correspondingly, T cell state 8, which co-localized with cancer clone B, highly expressed HMGB2, HLA-B and HLA-C (Fig. 5f), indicating a well-stimulated T cell response^33,34. In contrast, T cell state 35 was HMGB2-negative, HLA-low and TMSB10-positive and co-localized with more A-type macrophages, indicating a less effective state^33,34. Therefore, with accurate deconvolution at single-cell resolution, Redeconve can reveal detailed cell-cell interactions at the single-cell level and enables discoveries about the mechanisms underlying tumor immunity.

Redeconve provides novel insights into the regulatory mechanisms underlying antibody class switching

Redeconve was further applied to an ST dataset of human secondary lymphoid organs¹². We again compared Redeconve with other methods on this dataset. In terms of cosine similarity-based reconstruction accuracy, Redeconve achieved a mean similarity of 0.868, significantly outperforming other methods (Fig. 1d). Redeconve achieved high reconstruction accuracy for almost all spots, whereas other methods showed obvious regions of low similarity (Supplementary Fig. 30). We further checked the sparsity of the results by calculating the L0-norm. The L0-norm of the Redeconve results has a reasonable distribution between 4 and 32, indicating that at most a few dozen cell states appear in one spot. In contrast, all other methods except CellTrek produced results in which almost all cell types appeared in every spot. CellTrek, a mapping-based algorithm, reached a low L0-norm by generating many “zero-cell” spots, whose cellular composition Redeconve successfully reconstructed (Supplementary Fig. 31).
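
The L0-norm of a spot is simply the number of reference cells with non-zero estimated abundance. Numerical solvers often return tiny positive values instead of exact zeros, so the sketch below counts entries above a tolerance; the tolerance value and function names are hypothetical choices, not taken from the paper.

```python
def l0_norm(beta, tol=1e-8):
    """Number of reference cells with non-negligible estimated abundance in one spot.

    tol is an assumed threshold, because numerical QP solvers can return tiny
    non-zero values where the exact solution is zero.
    """
    return sum(1 for b in beta if abs(b) > tol)

def l0_per_spot(beta_matrix, tol=1e-8):
    """beta_matrix is cells x spots, matching the beta matrix of the model."""
    n_spots = len(beta_matrix[0])
    return [l0_norm([row[s] for row in beta_matrix], tol) for s in range(n_spots)]

# Hypothetical estimates: 3 reference cells x 2 spots.
beta_matrix = [[0.0, 1.2],
               [3.0, 0.0],
               [1e-12, 0.4]]
print(l0_per_spot(beta_matrix))  # → [1, 2]
```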

We further characterized the spatial heterogeneity at single-cell resolution to explore potential regulators of antibody class switching in this human lymph node dataset. During antibody maturation, an activated B cell can switch its antibody production from IgM to IgA, IgG, or IgE depending on functional requirements, a process termed class switching³⁵. However, the detailed regulators of antibody class switching remain unclear. Consistent with the previous examples, Redeconve outperformed other methods in reconstructing the ST gene expression profiles for almost all spots (Fig. 1d). Spatial pie charts showed that Redeconve produced clear regional divisions, while other methods showed blurred or even no boundaries (Fig. 6a). CellTrek failed to analyze some of the spots. Furthermore, compared with cell-type deconvolution, Redeconve identified 159 different cell states from 17 cell types (Supplementary Fig. 2). Twelve different B plasma cell states were identified in the ST data, which can be further divided into three groups (IgA+, IgG+ and negative) based on the expression of IGHA and IGHG genes. Interestingly, IgA+ and IgG+ B plasma cells mapped to spots in different regions with little overlap, which allowed us to define IgA+ and IgG+ spots based on the abundance of these B plasma cells (Fig. 6b). Next, we took one spot from each of the two regions for detailed inspection at single-cell resolution. The cell proportions of the two spots show that CD8⁺ T cells account for a large proportion of the IgA+ spot, suggesting latent interactions between CD8⁺ T cells and IgA+ B plasma cells (Fig. 6c). To test whether this phenomenon is general, we conducted differential gene expression analysis between IgA+ and IgG+ spots to identify up-regulated and down-regulated genes (Fig. 6d). As expected, IGHA and IGHG were the most differentially expressed genes; T cell-associated genes (TRAC, TRBC2, CD3D and CD8A, for example) were up-regulated in IgA+ spots, confirming the existence of such interactions. Because the lymph node is one of the organs that generate IgA+ plasma cells, the IgA+ spots might be potential induction sites for IgA+ plasma cells, and CD8⁺ T cells may play an important role in this process (Fig. 6d). Further co-localization analysis provided more insights (Fig. 6e). We found co-localization of IgA+ plasma cells with CD8⁺ cytotoxic T cells, consistent with previous observations that CD8⁺ cytotoxic T cells can help the formation of IgA+ plasma cells^36,37. Furthermore, co-localization of IgG+ plasma cells and macrophages was identified (Fig. 6e), indicating roles of macrophages in the genesis of IgG+ plasma cells^38,39. Hence, deconvolution at single-cell resolution by Redeconve yields additional insights that may help uncover previously opaque biological questions.

**Fig. 6: Single-cell deconvolution of a human secondary lymphoid organ ST dataset by Redeconve revealed differences between IgA+ and IgG+ spots regarding cellular composition.**

Discussion

Integrative analysis of dissociated single-cell and in situ ST data is pivotal to constructing a comprehensive map of the cellular composition and interactomes of tissues. However, because of technological limitations, current computational methods for integrative analysis of single-cell and ST data are limited to cell-type resolution. To fully mine the biomedical information hidden in single-cell and ST data, we present Redeconve, a single-cell-resolution deconvolution algorithm for integrative analysis of ST data with sc/snRNA-seq data as reference, based on a quadratic programming model regularized by cell-cell similarity, which enables building comprehensive spatial maps at single-cell resolution for diverse tissues.

We performed stringent evaluation on multiple datasets from a diverse set of ST platforms. The results suggested the superiority of Redeconve over state-of-the-art deconvolution-based and mapping-based algorithms in terms of resolution, accuracy, sparsity, robustness, and computational speed. This improvement from cell-type to single-cell resolution unlocks novel biological discoveries, as exemplified by applications to human pancreatic cancer and lymph node samples.

While Redeconve enables deconvolution at single-cell resolution and thus will be a powerful tool for biomedical discoveries, matching between scRNA-seq and ST data appears to be an important factor determining the quality of deconvolution analysis as shown by our evaluation on different tissues (Fig. 1d). Therefore, construction and selection of reference scRNA-seq data according to the specific ST data configuration will be critical in future applications.

Although Redeconve is computationally more efficient than state-of-the-art deconvolution algorithms, single-cell resolution may require extensive computation to resolve thousands of cellular states, especially as the cellular throughput of scRNA-seq technologies increases exponentially. Because of the computational complexity of quadratic programming, Redeconve can currently resolve thousands of cellular states on a standard machine. An enhanced version based on algorithmic innovation or hardware acceleration is needed to handle scRNA-seq datasets with tens of thousands of cellular states.

Deconvolution at single-cell resolution unlocked by Redeconve may also benefit the imputation of ST data with the aid of the rich information in scRNA-seq data. Redeconve implements a function that reconstructs the gene expression profiles of individual spots from the single-cell deconvolution results following a parsimony principle. The imputed ST data may be more informative for dissecting the cellular states of specific tissues.

In summary, we present an algorithm named Redeconve for deconvolution-based analysis of scRNA-seq and ST data at single-cell resolution. Redeconve is expected to help map the cellular architecture at fine granularity across diverse biomedical situations, including tumor, immune, developmental, neurological, and other health and disease conditions. Applications to human pancreatic cancer and lymph nodes showed the potential of Redeconve to bring completely novel insights, owing to the single-cell resolution it unlocks and its superior technical metrics compared with current state-of-the-art algorithms. We expect Redeconve to be a useful tool for advancing the application of scRNA-seq and ST technologies in diverse research disciplines.

Methods

Algorithm

Model overview

In general, we apply an improved linear regression model to deconvolute ST data at single-cell resolution. Given a single-cell (or single-nucleus) expression matrix $X$ with dimensions $n_{\mathrm{genes}}\times n_{\mathrm{cells}}$ and an ST expression matrix $Y$ with dimensions $n_{\mathrm{genes}}\times n_{\mathrm{spots}}$ as input, Redeconve returns a matrix $\beta$ with dimensions $n_{\mathrm{cells}}\times n_{\mathrm{spots}}$ giving the estimated number of each reference cell in each spot. The model optimizes the following loss function for each spot separately:

$$L(\beta) := \sum_{j=1}^{J}\left(y_j - \sum_{i=1}^{I} x_{ij}\beta_i\right)^2 + c\sum_{i_1\ne i_2} R_{i_1,i_2}\left(\beta_{i_1}-\beta_{i_2}\right)^2,\quad \text{s.t. } \beta_i \ge 0 \text{ for } i = 1, 2, \ldots, I \tag{1}$$

Here $i = 1, 2, \ldots, I$ indexes cells and $j = 1, 2, \ldots, J$ indexes genes; $x_{ij}$ is the expression of gene $j$ in cell $i$, $y_j$ is the expression of gene $j$ in the spot, and $\beta_i$ is the abundance of cell $i$ in the spot. The first term is the traditional least-squares (LS) term and the second term is a regularization term; $c$ is a hyperparameter that balances the two terms. The regularization term is explained in detail below.
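
To make Eq. (1) concrete, the sketch below evaluates the loss for one spot exactly as written, with the reference stored genes x cells so that `X[j][i]` is $x_{ij}$. The function name `redeconve_loss` is hypothetical; it evaluates the objective but does not minimize it. Note that the sum over ordered pairs $i_1 \ne i_2$ counts each unordered pair twice.

```python
def redeconve_loss(X, y, beta, R, c):
    """Evaluate the loss of Eq. (1) for a single spot.

    X    : genes x cells reference, so X[j][i] is x_ij (gene j in cell i)
    y    : observed expression of the spot, one value per gene
    beta : candidate abundances, one per reference cell (must be >= 0)
    R    : cells x cells similarity matrix as in Eq. (4)
    c    : weight of the regularization term
    """
    if any(b < 0 for b in beta):
        raise ValueError("Eq. (1) requires beta_i >= 0")
    # Least-squares term: sum over genes of (y_j - sum_i x_ij * beta_i)^2
    ls = sum((y_j - sum(x * b for x, b in zip(row, beta))) ** 2
             for row, y_j in zip(X, y))
    # Regularization term: sum over ordered pairs (i1, i2) with i1 != i2
    n = len(beta)
    reg = sum(R[i1][i2] * (beta[i1] - beta[i2]) ** 2
              for i1 in range(n) for i2 in range(n) if i1 != i2)
    return ls + c * reg

X = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical: 2 genes x 2 cells
R = [[1.0, 0.5], [0.5, 1.0]]
print(redeconve_loss(X, [1.0, 2.0], [1.0, 2.0], R, c=2.0))  # → 2.0
```

Here the fit is exact, so the whole loss of 2.0 comes from the penalty on two partially similar cells being assigned different abundances.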

Note that this is a typical quadratic programming problem, so we can rewrite our goal as:

$$\begin{array}{c}{\min }_{\beta }\, \frac{1}{2}\,{\beta }^{T}G\beta -{d}^{T}\beta \\ s.t.\,{a }^{T}\beta \ge b\end{array}$$

(2)

where $G$ is the Hessian matrix of $L(\beta)$, ${d}^{T}=(2{\sum }_{j}{y}_{j}{x}_{1j},\ldots,\, 2{\sum }_{j}{y}_{j}{x}_{Ij})$, and the constraint matrix ${a}^{T}$ (the $I\times I$ identity) and vector $b$ (the zero vector) are

$${a}^{T}=\left(\begin{array}{cccc}1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1\end{array}\right),\,b=\left(\begin{array}{c}0\\ 0\\ \vdots \\ 0\end{array}\right)$$

(3)
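Assuming $R$ is symmetric (as a correlation matrix is) and writing $X$ for the $n_{\mathrm{genes}}\times n_{\mathrm{cells}}$ reference so that $x_{ij}=X_{ji}$, let $\tilde{R}$ be $R$ with its diagonal set to zero and $D=\mathrm{diag}\left(\sum_{k}\tilde{R}_{ik}\right)$. Expanding (1) then makes $G$ and $d$ explicit (dropping the constant $y^{T}y$):

$$\begin{array}{rl}\mathop{\sum}\limits_{{i}_{1}\ne {i}_{2}}{R}_{{i}_{1},{i}_{2}}{({\beta }_{{i}_{1}}-{\beta }_{{i}_{2}})}^{2}&=2\,{\beta }^{T}(D-\tilde{R})\beta \\ L(\beta )&={y}^{T}y-2\,{y}^{T}X\beta +{\beta }^{T}\left({X}^{T}X+2c\,(D-\tilde{R})\right)\beta \\ G&=2{X}^{T}X+4c\,(D-\tilde{R}),\qquad d=2{X}^{T}y\end{array}$$

The coefficient $d=2X^{T}y$ agrees with the componentwise definition of $d$ above, and $G$ is positive semidefinite because both $X^{T}X$ and the graph Laplacian $D-\tilde{R}$ are.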

This problem can therefore be solved efficiently with the solve.QP function of the R package “quadprog”.
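Because solve.QP is an R function, the Python sketch below does not reproduce it. As an assumption for illustration, it swaps in plain projected gradient descent, which minimizes the same convex loss (1) under $\beta_i \ge 0$ but is far slower than an active-set QP solver. The function name `deconvolve_spot`, the nested-list data layout, and the step-size rule are hypothetical choices, not part of the Redeconve package.

```python
def deconvolve_spot(X, y, R, c, n_iter=5000):
    """Minimize Eq. (1) for one spot by projected gradient descent (illustrative sketch).

    X: reference as n_genes x n_cells nested lists (X[j][i] = x_ij)
    y: observed counts of the spot, length n_genes
    R: n_cells x n_cells clipped Pearson similarities (Eq. 4), assumed symmetric
    c: regularization weight
    """
    n_genes, n_cells = len(X), len(X[0])
    # Step size 1/L_up, where L_up bounds the largest eigenvalue of the Hessian G:
    # ||2 X^T X|| <= 2 ||X||_F^2 and ||4c (D - R)|| <= 8c * (largest off-diagonal row sum of R).
    fro2 = sum(v * v for row in X for v in row)
    max_deg = max(sum(R[i][k] for k in range(n_cells) if k != i) for i in range(n_cells))
    step = 1.0 / max(2.0 * fro2 + 8.0 * c * max_deg, 1e-12)
    beta = [0.0] * n_cells
    for _ in range(n_iter):
        # Residual of the LS term: y_j - sum_i x_ij * beta_i
        resid = [y[j] - sum(X[j][i] * beta[i] for i in range(n_cells)) for j in range(n_genes)]
        grad = []
        for i in range(n_cells):
            g = -2.0 * sum(X[j][i] * resid[j] for j in range(n_genes))  # LS term
            g += 4.0 * c * sum(R[i][k] * (beta[i] - beta[k])            # regularization term
                               for k in range(n_cells) if k != i)
            grad.append(g)
        # Gradient step, then projection onto the feasible set beta_i >= 0
        beta = [max(0.0, b - step * g) for b, g in zip(beta, grad)]
    return beta
```

For example, with the three-gene, two-cell reference `X = [[1, 0], [0, 1], [1, 1]]`, an observed spot `y = [2, 3, 5]` and `c = 0`, the sketch recovers abundances close to 2 and 3; with strongly similar cells and $c>0$, the regularization pulls their estimates together.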

The regularization term

In sc/snRNA-seq data, collinearity among cells is severe: cells of the same cell type have very similar expression profiles. When linear regression is applied directly, this collinearity makes the coefficients unstable and reduces efficiency. To address this, we add a regularization term to the loss function, which aims to stabilize the coefficients while having little effect on the residuals.

In the regularization term $c{\sum }_{{i}_{1}\ne {i}_{2}}{R}_{{i}_{1},{i}_{2}}{({\beta }_{{i}_{1}}-{\beta }_{{i}_{2}})}^{2}$, ${R}_{{i}_{1},{i}_{2}}$ measures the similarity between cells ${i}_{1}$ and ${i}_{2}$ and is defined as

$${R}_{{i}_{1},{i}_{2}}=\left\{\begin{array}{c}{r}_{{i}_{1},\, {i}_{2}},\quad{r}_{{i}_{1},{i}_{2}} \, > \, 0\\ \, 0,\quad{r}_{{i}_{1},{i}_{2}}\le 0\end{array}\right.$$

(4)

where ${r}_{{i}_{1},{i}_{2}}$ is the Pearson correlation coefficient between cells ${i}_{1}$ and ${i}_{2}$. That is, ${R}_{{i}_{1},{i}_{2}}$ equals the Pearson correlation coefficient when it is positive and is zero otherwise. The regularization term therefore pulls the coefficients of cells with similar expression profiles toward each other, which improves the robustness and precision of the result.
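A minimal Python sketch of Eq. (4) follows, assuming each cell's profile is a list of expression values over the same genes. Which expression scale the correlation is computed on is not specified above, so the values are used as given; treating a constant profile as correlation 0 is also an assumption. The function names are hypothetical. The diagonal stays at zero because the regularization sum excludes $i_1 = i_2$.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den > 0 else 0.0

def similarity_matrix(profiles):
    """R from Eq. (4): positive Pearson correlations between cells, zero otherwise.

    profiles: one expression vector per cell, all over the same genes.
    """
    n = len(profiles)
    R = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            r = pearson(profiles[a], profiles[b])
            R[a][b] = R[b][a] = max(r, 0.0)  # clip non-positive correlations to zero
    return R
```

The pairwise loop is quadratic in the number of cells, which is one practical reason to work with about 1000 sampled cells.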

Determination of the hyperparameter

A key point of this model is how to select the hyperparameter: an extremely small hyperparameter will make the regularization term ineffective, while an extremely large one will greatly affect the fitting residuals. An ideal hyperparameter should be as large as possible while affecting the fitting residual as little as possible. Here we offer 3 ways to set the hyperparameter:

1.
“default”: use the default hyperparameter we set according to the number of cells and genes;
2.
“customized”: the hyperparameter is set manually by the user;
3.
“autoselection”: automatically calculate and select the optimal hyperparameter.

In mode “default”, we use the following formula to set the hyperparameter:

$$c={c}_{0}\cdot {n}_{{{{{{\rm{genes}}}}}}}/{n}_{{{{{{\rm{cells}}}}}}}^{2}$$

(5)

where $c_{0}$ is a predetermined constant set to $10^{5}$. The rationale is that (1) the LS term grows approximately in proportion to $n_{\mathrm{genes}}$, so $c$ should increase with $n_{\mathrm{genes}}$; and (2) the regularization term grows approximately with the square of $n_{\mathrm{cells}}$, so $c$ should decrease in proportion to $1/n_{\mathrm{cells}}^{2}$.
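As a worked instance of Eq. (5), with about 8000 filtered genes and 1000 sampled cells (the typical sizes reported elsewhere in these Methods), the default is $c = 10^{5}\cdot 8000/1000^{2} = 800$; doubling the number of cells to 2000 divides it by four, to 200. A one-line Python helper (hypothetical name):

```python
def default_hyperparameter(n_genes, n_cells, c0=1e5):
    """Mode "default", Eq. (5): c = c0 * n_genes / n_cells**2."""
    return c0 * n_genes / n_cells ** 2

# Typical sizes in this study: ~8000 filtered genes, ~1000 sampled cells
c = default_hyperparameter(8000, 1000)
```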

In mode “autoselection”, we apply the following method to determine the optimal hyperparameter:

1.
We first calculate a hyperparameter ${c}_{d}$ using the formula of mode “default” and set up a series of hyperparameters ${c}_{1},\,{c}_{2},\,{c}_{3},\,{c}_{4},\,{c}_{5}$ equal to $0.01{c}_{d},\,0.1{c}_{d},\,{c}_{d},\,10{c}_{d},\, 100{c}_{d}$;
2.
We then run deconvolution with each of these hyperparameters and calculate the residual ${\varepsilon }_{i}$ for each ${c}_{i}$;
3.
We further calculate:
$${d}_{i}=\frac{\varDelta \varepsilon }{\varDelta c}=\frac{{\varepsilon }_{i+1}-{\varepsilon }_{i}}{{c}_{i+1}-{c}_{i}}$$
(6)
4.
We compare these ${d}_{i}$ and choose the ${c}_{i}$ that maximizes ${d}_{i}$ as the optimal hyperparameter (a maximal ${d}_{i}$ indicates that increasing the hyperparameter beyond ${c}_{i}$ would increase the residual significantly). That is, we choose:

$${c}^{*}={c}_{{i}^{*}},\qquad {i}^{*}=\arg \max_{i\in \{1,2,3,4\}}{d}_{i}=\arg \max_{i\in \{1,2,3,4\}}\frac{{\varepsilon }_{i+1}-{\varepsilon }_{i}}{{c}_{i+1}-{c}_{i}}$$

(7)

This procedure yields a hyperparameter that maximizes the strength of the regularization term while having little effect on the LS term.
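The four autoselection steps can be sketched in Python as follows. `residual_fn` is a hypothetical callback that runs the deconvolution with a given $c$ and returns its residual; how the residual is aggregated across spots is not specified above and is left to the caller. Ties in $d_i$ resolve to the smallest $c_i$, which is also an assumption.

```python
def autoselect_hyperparameter(residual_fn, c_default, factors=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Mode "autoselection": return the c_i with the largest forward slope d_i (Eqs. 6-7).

    residual_fn(c) must run the deconvolution with hyperparameter c and return its residual.
    """
    grid = [f * c_default for f in factors]                  # step 1: c_1..c_5
    eps = [residual_fn(c) for c in grid]                     # step 2: residual per c_i
    slopes = [(eps[i + 1] - eps[i]) / (grid[i + 1] - grid[i])
              for i in range(len(grid) - 1)]                 # step 3: d_1..d_4
    best = max(range(len(slopes)), key=lambda i: slopes[i])  # step 4: argmax d_i
    return grid[best], slopes
```

With a toy residual that stays flat up to $c_d$ and jumps between $c_d$ and $10c_d$, the largest slope is $d_3$, so $c_3 = c_d$ is returned.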

We use examples to illustrate the effect of the hyperparameter on the results. We applied Redeconve to the human lymph node dataset with a series of hyperparameters ranging from 0 to $10^{8}$ and calculated the deconvolution residuals (RMSE_normal) to evaluate the effect of the hyperparameter (Supplementary Fig. 32). The results showed that an appropriate hyperparameter can improve deconvolution precision in addition to avoiding collinearity caused by closely similar cell states. The hyperparameter also affects the number of cell states selected in the result: a larger hyperparameter leads to more cell states being selected (Supplementary Fig. 33). We also set the hyperparameter to 0 and $10^{4}$ separately on the PDAC dataset; with a hyperparameter of $10^{4}$, more T cells were detected than with a hyperparameter of zero (Supplementary Fig. 34). Given the distribution of CD3+ cells (shown in Supplementary Figs. 25–27), this example illustrates how the hyperparameter enables biological discovery.

Data preprocessing

The following data preprocessing steps are necessary before running the deconvolution. Note that some steps are optional, depending on the user's needs.

1.
Cell-type expression profiles or sampling of single cells. For cell-type deconvolution, we estimate the expression profile ${\bar{x}}_{ij}$ of cell type $i$ and gene $j$ as the average expression of gene $j$ across all cells of cell type $i$. For single-cell deconvolution with an overwhelming number of single cells, we take a sample of cells stratified by cell type to obtain a manageable number of cells.
2.
Gene filtering. Deconvolution with tens of thousands of genes is time-consuming and can even be misleading, so for computational efficiency we select highly variable genes beforehand. Genes must meet three criteria: (1) they appear in both the sc/snRNA-seq and ST data; (2) their variance in the sc/snRNA-seq data is larger than a threshold (default 0.025); and (3) their average count per spot is larger than a threshold (default 0.003). This results in ~8000 genes for deconvolution. Redeconve also allows deconvolution without gene filtering, at a higher computational cost.
3.
Normalization of the reference. We replace the zeros in the sc/snRNA-seq data with a pseudo-count of 0.5 and normalize the sc/snRNA-seq data to TPM (transcripts per million). No preprocessing is needed for the ST data.
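Steps 2 and 3 can be sketched in plain Python as below, with each dataset stored as a mapping from gene to a list of counts (per cell for the reference, per spot for ST). Several details are assumptions not stated above: the variance is the population variance of the raw reference counts, the thresholds are strict, and TPM scaling is computed over the retained genes only. The function names are hypothetical.

```python
from statistics import mean, pvariance

def filter_genes(ref, st, var_min=0.025, spot_mean_min=0.003):
    """Step 2: ref and st map gene -> list of counts (per cell / per spot)."""
    kept = []
    for gene, cell_counts in ref.items():
        if gene not in st:                      # (1) present in both datasets
            continue
        if pvariance(cell_counts) <= var_min:   # (2) variable in the reference
            continue
        if mean(st[gene]) <= spot_mean_min:     # (3) detected in the ST data
            continue
        kept.append(gene)
    return kept

def normalize_reference(ref, genes, pseudo=0.5):
    """Step 3: replace zeros by a pseudo-count, then scale every cell to sum to 1e6 (TPM)."""
    out = {g: [v if v != 0 else pseudo for v in ref[g]] for g in genes}
    n_cells = len(out[genes[0]])
    for i in range(n_cells):
        total = sum(out[g][i] for g in genes)
        for g in genes:
            out[g][i] = out[g][i] * 1e6 / total
    return out
```

Replacing zeros before scaling keeps every reference entry strictly positive, so no cell ends up with an all-zero column after filtering.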

Real datasets for benchmarking

PDAC

ST data of a human pancreatic ductal adenocarcinoma sample (PDAC-A) with 438 spots and sample-matched scRNA-seq data (InDrop) with 1926 single cells across 20 cell types were integrated by Moncada et al., and the 19,736 genes shared by both datasets were used in our study. The four main structural regions annotated by Moncada et al. on the basis of histological analysis were used in our analysis to depict the spatial characteristics of the ST data.

Human lymph node

Human lymph node Visium data, comprising 4035 spots, were downloaded from the 10x Genomics website (https://www.10xgenomics.com/resources/datasets/human-lymph-node-1-standard-1-1-0). The scRNA-seq reference, comprising 73,260 cells across 34 cell types, was obtained from Kleshchevnikov et al. Because this dataset captures a wide spectrum of immune cell states across lymph nodes, tonsils and spleen, we used it as the reference to reveal the phenotypic diversity of immune cells when deconvoluting at single-cell resolution.

Mouse cerebellum

The DropViz scRNA-seq dataset was generated by Saunders A. et al. and collected, together with the cell annotations, by Cable D. M. et al. The Slide-seq mouse cerebellum data were collected by Cable D. M. et al. using the Slide-seq v2 protocol¹¹. Both datasets were downloaded from https://singlecell.broadinstitute.org/single_cell/study/SCP948/robust-decomposition-of-cell-type-mixtures-in-spatial-transcriptomics#study-download.

Human breast cancer

Human breast cancer Visium data from the Wu et al. study⁴⁰ are available at https://zenodo.org/record/4739739#.Ys0v6jdBy3D. Sample ‘CID4290’, which includes 2426 in-tissue spots, was used for deconvolution. ScRNA-seq data comprising 100,064 annotated single cells (accession number GSE176078, NCBI GEO database) served as the reference for deconvolution.

Human testis

The processed human testis Slide-seq dataset was downloaded from https://www.dropbox.com/s/q5djhy006dq1yhw/Human.7z?dl=0, and sample ‘Puck5’ with 36,591 spots was used for evaluation in this study⁴¹. The reference scRNA-seq data, which include 6490 single cells, were obtained from the NCBI GEO database (accession number GSE112013), and the corresponding annotations are available in Supplementary Table S1 of Guo et al.

Mouse brain

10x Visium and snRNA-seq data (including annotations) are available in the ArrayExpress database under accession numbers E-MTAB-11114 and E-MTAB-11115, respectively¹². Sample ‘ST8059048’, containing 2987 spots, was used for evaluation in this study, and all 40,532 single cells across 59 cell types served as the reference. In addition, the corresponding nuclei counts estimated by deep-learning-based histological image segmentation were downloaded from https://github.com/vitkl/cell2location_paper/blob/master/notebooks/selected_results/mouse_visium_snrna/segmentation/144600.csv.

Human breast cancer xenium

The Human Breast Cancer Xenium dataset is available at https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast. A single FFPE tissue block was analyzed by scFFPE-seq, Visium and Xenium. In addition, 3’ and 5’ gene expression data from dissociated tumor cells are also available⁴.

Comparing Redeconve with alternative methods

We compared Redeconve with recently developed deconvolution-based methods (cell2location, DestVI¹³ and CARD¹⁰) as well as mapping-based methods (NovoSpaRc, CellTrek⁸ and Tangram⁷).

Criteria of selecting alternative methods

When deciding which methods to include in the comparison, we required methods that (1) are specifically designed for end-to-end estimation of the abundance/proportion of cells or cell types using scRNA-seq and ST data as input; (2) demonstrated superior performance in the corresponding publications and in third-party evaluation papers; and (3) were peer reviewed with a publicly available software implementation before December 2022.

Parameter setting

Prediction results for the 6 datasets were obtained by running each of the algorithms above with its default settings, with the following exceptions: (1) 1000 randomly selected cells were used for NovoSpaRc to limit the total number of cells; (2) 1000 stratified samples of cells were used for Redeconve in all datasets except PDAC, for which all 1926 cells were used; (3) minCountGene and minCountSpot of the createCARDObject function were set to 0 to prevent unexpected gene or spot filtering in CARD. The output of each method was a cell-by-spot matrix representing either the absolute abundance (Redeconve, Tangram) or the proportion (NovoSpaRc) of single cells at each spot, or an estimated cell-type abundance (cell2location) or proportion (DestVI, CARD) matrix, except for CellTrek, whose output is the predicted spatial coordinates of individual cells. For CellTrek, we therefore obtained a cell-by-spot abundance matrix by assigning each single cell to the spot whose area, as defined by the ST platform, covered its predicted coordinates. We evaluated CellTrek only on the two 10x Genomics Visium-based datasets (human lymph node and human breast cancer) because it produced running errors on the other ST datasets in our computational environment.

Calculating performance metrics

To evaluate the performance of Redeconve, we first estimated predicted expression profiles for spatial spots. For all datasets, spot-wise cosine similarities, Pearson's correlations and RMSEs between the observed and predicted spot-by-gene expression matrices were calculated. To compute these metrics from the output of each algorithm, we calculated the predicted expression matrix in one of two ways: (1) for Redeconve, NovoSpaRc, CellTrek and Tangram, we multiplied the spot-by-cell abundance or proportion matrix by the cell-by-gene sc/snRNA-seq expression matrix; (2) for cell2location, DestVI and CARD, we multiplied the cell-type abundance or proportion matrix by the reference cell-type expression matrix, which was generated by averaging the sc/snRNA-seq expression data within each cell type. When calculating RMSEs, the total number of UMIs of each spot in both the observed and predicted expression profiles was normalized to $n_{\mathrm{genes}}$. We then estimated the sparsity of the results by computing cell-type proportion matrices for all programs and comparing them in terms of cell-type information entropy and L₀ norm, where the L₀ norm is the number of cell types present (nonzero values) at each spot. We also evaluated cell abundance estimation by the Pearson's correlation between the results of individual methods (Redeconve, cell2location, CellTrek and Tangram) and the cell numbers estimated by deep-learning-based histological image segmentation for the mouse brain dataset. Finally, computational efficiency was assessed by comparing the total time spent by each algorithm on a computer with an Intel(R) Xeon(R) Platinum 8253 CPU, with the maximum number of cores set to 96. In addition, we measured the run time on a single NVIDIA A40 card for the programs that support GPU acceleration (cell2location, DestVI, NovoSpaRc, and Tangram).
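The spot-level metrics described above can be sketched in plain Python. This is an illustrative reimplementation under simple assumptions (one spot at a time, profiles as lists over the same genes, nonconstant and nonzero vectors), not the benchmarking code used in the study; the function names are hypothetical.

```python
import math

def predict_profile(weights, profiles):
    """Predicted spot profile: sum_i w_i * profile_i (one gene vector per cell or cell type)."""
    n_genes = len(profiles[0])
    return [sum(w * p[g] for w, p in zip(weights, profiles)) for g in range(n_genes)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])  # Pearson = cosine of centered vectors

def rmse_normalized(obs, pred):
    """RMSE after scaling each profile so that its total equals n_genes."""
    n = len(obs)
    so, sp = sum(obs), sum(pred)
    o = [a * n / so for a in obs]
    p = [b * n / sp for b in pred]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(o, p)) / n)

def l0_norm(proportions):
    """Number of cell types with nonzero proportion at a spot."""
    return sum(1 for x in proportions if x > 0)
```

Because both profiles are rescaled to the same total before the RMSE is taken, a prediction that differs from the observation only by a global scale factor scores an RMSE of zero.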

Assessment at single cell resolution

A cell-by-spot abundance matrix is required to compare deconvolution-based methods at single-cell resolution. We therefore sampled 1000 single cells from the reference scRNA-seq data for two ST datasets (PDAC and human lymph node) and, because cell2location, DestVI and CARD only support cell-type deconvolution, assigned every single cell a unique cell type. The result matrices of Redeconve, cell2location and DestVI were obtained with the corresponding default settings (no result was available for CARD because of running errors). Cosine similarity, information entropy, perplexity and runtime efficiency were evaluated as described above.

Information entropy and perplexity

We calculate the information entropy $H$ and perplexity $P$ for each spot separately using the following formulas:

$$H=-\mathop{\sum}\limits_{i}{\beta }_{i}\,{\log }_{2}\,({\beta }_{i})$$

(8)

$$P={2}^{H}$$

(9)

where $i=1,\,2,\ldots,\,I$ indexes different cell states, and the ${\beta }_{i}$ are normalized in advance to sum to 1 (i.e., they denote proportions rather than absolute abundances). When $\beta$ is uniformly distributed (i.e., ${\beta }_{i}=\frac{1}{I}$ for all $i$), a simple calculation shows that $H={\log }_{2}I$, so the perplexity equals the number of states $I$. Perplexity therefore reveals the number of states when the distribution is uniform and approximates it for other distributions. In single-cell deconvolution, the “number of states” refers to the number of cell states (or types), so the perplexity of each spot approximates the number of cell states/types present in that spot. By calculating perplexity on simulated and real datasets, we verified that perplexity was consistent with the number of non-zero cell types/states in the result but showed poor consistency with absolute cell abundance (Supplementary Fig. 35 and Supplementary Table 1).
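A minimal Python sketch of Eqs. (8) and (9) for one spot (hypothetical function name). Abundances are normalized to proportions first, and zero entries are skipped, following the usual convention $0\cdot\log_2 0 = 0$.

```python
import math

def entropy_perplexity(abundances):
    """Eqs. (8)-(9) for one spot: normalize to proportions, then H = -sum p log2 p, P = 2**H."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]  # zero entries contribute nothing
    H = -sum(p * math.log2(p) for p in props)
    return H, 2.0 ** H
```

For a uniform spot with four states, $H=2$ and $P=4$. For proportions $0.5, 0.25, 0.25$, $H=1.5$ and $P=2^{1.5}\approx 2.83$, between the two dominant states and the three states actually present. Scaling all abundances by a constant leaves $P$ unchanged, consistent with perplexity tracking the number of states rather than absolute abundance.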

Cell-type level benchmark based on the PDAC dataset

Marker genes were first identified for each cell type (Supplementary Table 2). Then, for each cell type, the similarities of marker gene expression between the ST observations and the profiles reconstructed by different algorithms were calculated across all spots. The ranks of the cosine similarities of individual cell types were used as metrics to summarize overall performance. In addition, linear regression and statistical tests were used to examine the relationship between cell type abundances and performance metrics.

Cell abundance of ST spots on PDAC dataset

To generate the ground truth of cell abundance for each ST spot, we first registered the H&E and fluorescent images using Adobe Photoshop CC, which allowed the spatial coordinates of the ST spots to be determined. Cellpose²² was then applied through squidpy²³ to detect cell nuclei in the H&E image. Finally, we counted the absolute number of nuclei within each spot and used these values as cell abundance.

Generating and analyzing simulation datasets

We used three scRNA-seq datasets (PDAC, human lymph node and human testis) to generate simulation data separately. Before analysis, all scRNA-seq data were down-sampled to around 1000 cells, except PDAC, for which all 1926 cells were used. To generate a pseudo-bulk for subsequent deconvolution, all single cells were aggregated, each assigned an abundance value of 1. For deconvolution, we clustered the scRNA-seq reference at 5 different resolutions using the FindClusters() function in the Seurat package; together with directly using all single cells as input, this yielded 6 groups of references. The differently annotated references were then used for deconvolution by Redeconve and cell2location, and the results were compared with the ground truth by calculating cosine similarity and perplexity (Fig. 2c, d and Supplementary Figs. 16–18).
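The pseudo-bulk construction can be sketched as follows (hypothetical function names). Since every cell enters with abundance 1, the ground truth for a single-cell reference is a vector of ones; for a clustered reference, this sketch assumes the ground truth of each cluster is the number of cells it contains, which follows from the unit abundances but is not spelled out above.

```python
def make_pseudobulk(cells):
    """Sum all single cells into one pseudo-bulk profile; every cell has true abundance 1."""
    n_genes = len(cells[0])
    bulk = [sum(cell[g] for cell in cells) for g in range(n_genes)]
    truth = [1.0] * len(cells)
    return bulk, truth

def cluster_truth(labels):
    """Assumed ground truth for a clustered reference: number of cells in each cluster."""
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return counts
```

The deconvolved abundances can then be compared with these vectors using the cosine similarity and perplexity sketched earlier in these Methods.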

Benchmarking on human breast cancer Xenium dataset

The Human Breast Cancer Xenium dataset contains scRNA-seq, Visium and Xenium data for a single FFPE tissue block. Mapping Xenium cells to Visium spots makes it possible to generate ground truth data for cell abundances and cell type proportions. To do so, we chose Replicate 1 of the Xenium data and aligned the spatial locations of Xenium cell centers to the corresponding H&E image through translation and rotation. A key-point registration approach was then used to align the H&E images of the Xenium and Visium data based on 155 manually identified landmark features on commonly shared microstructures, and the findHomography() function of the cv2 package with the RANSAC method was applied to transform Xenium coordinates to Visium coordinates. The ground truth of cell abundance was then generated by counting the transformed cell centers located within each Visium spot. To further generate ground truth cell type proportions for Visium spots, we labeled each cluster in the scFFPE-seq and Xenium data with a corresponding cell type (Supplementary Tables 3–4). The proportions of the various types of Xenium cells in each Visium spot were taken as the ground truth cell type proportions.
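Once the homography has been estimated (as described above, with OpenCV's findHomography and RANSAC), counting Xenium cells per Visium spot reduces to mapping each cell center and testing whether it falls inside a spot disc. The pure-Python sketch below assumes circular spots of known radius in Visium image coordinates; `spot_radius` and the function names are hypothetical. Each cell is assigned to at most one spot because Visium spots do not overlap.

```python
def apply_homography(H, x, y):
    """Map a point through a 3x3 homography H (nested lists, row-major)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # back from homogeneous coordinates

def count_cells_per_spot(H, cell_centers, spot_centers, spot_radius):
    """Ground-truth abundance: transformed cell centers falling inside each circular spot."""
    counts = [0] * len(spot_centers)
    r2 = spot_radius ** 2
    for x, y in cell_centers:
        u, v = apply_homography(H, x, y)
        for k, (sx, sy) in enumerate(spot_centers):
            if (u - sx) ** 2 + (v - sy) ** 2 <= r2:
                counts[k] += 1
                break  # spots do not overlap, so a cell lies in at most one spot
    return counts
```

Grouping the counted cells by their cell type labels in the same loop would give the ground truth cell type proportions per spot.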

Based on the generated ground truth, we computed spot-wise cosine similarities between the predicted and ground truth cell type proportions for Redeconve and alternative methods, using the scFFPE-seq data as the reference for deconvolution. In addition, Pearson's correlation was used to measure the performance of cell abundance estimation for Redeconve, cell2location and Tangram. Finally, distinct single-cell references (5’, 3’, and scFFPE-seq) were used to assess the robustness of the computational algorithms.

Downstream analyses after Redeconve deconvolution

Human lymph node

We first ran Redeconve with default settings to obtain the deconvolution result at single-cell resolution. We then investigated the spatial distribution of plasma cells after grouping them into IgA+, IgG+ and other plasma cells based on the expression of IGHA1, IGHG1, IGHG3 and IGHG4. IgA+ and IgG+ spots were determined in three steps: (1) identifying the top 50% of spots with the highest abundance of IgA+ plasma cells and of IgG+ plasma cells, named spot sets A and G, respectively; (2) computing the set differences between A and G, named AD (spots in A but not in G) and GD (spots in G but not in A); (3) selecting the spots in AD and GD with the top 1% IgA+ and IgG+ plasma cell abundance, respectively, which were taken as IgA+ and IgG+ spots. edgeR⁴² was applied to perform differential gene expression analysis and identify significantly differential genes between IgA+ and IgG+ spots. We then calculated Pearson's correlation coefficients among the single-cell states of the reference across IgA+ and IgG+ spots and generated a cell-cell co-location network with single cells as nodes and correlated cells (Pearson > 0.2) as edges.
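The three-step spot selection can be sketched in Python as follows, taking per-spot IgA+ and IgG+ plasma cell abundances from the deconvolution result. Two points are assumptions: the final 1% is counted relative to the total number of spots, and ties are broken by spot order. The function names are hypothetical.

```python
import math

def top_fraction(scores, frac):
    """Indices of the top `frac` of spots ranked by score (at least one spot)."""
    k = max(1, math.ceil(len(scores) * frac))
    return set(sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)[:k])

def iga_igg_spots(iga, igg, first_frac=0.5, final_frac=0.01):
    """iga, igg: per-spot abundance of IgA+ and IgG+ plasma cells."""
    A = top_fraction(iga, first_frac)                # step 1: spot set A
    G = top_fraction(igg, first_frac)                #         spot set G
    AD, GD = A - G, G - A                            # step 2: set differences
    k = max(1, math.ceil(len(iga) * final_frac))     # assumption: 1% of all spots
    iga_spots = sorted(AD, key=lambda s: iga[s], reverse=True)[:k]  # step 3
    igg_spots = sorted(GD, key=lambda s: igg[s], reverse=True)[:k]
    return iga_spots, igg_spots
```

Taking set differences in step 2 ensures that no spot can be labeled both IgA+ and IgG+.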

PDAC

We ran Redeconve with all 1926 single cells as the reference, keeping all parameters at their defaults. For downstream analyses, we first compared Redeconve with existing tools as described in the preceding sections. Then, to study the distribution of T cells, we distinguished T cells from NK cells by the expression of CD3D, CD3E or CD3G in the scRNA-seq data, and selected the T cells that frequently appeared in the ST spots (T cells 8, 11, and 35). To study the spatial colocalization of these T cells with other cells, we calculated Pearson's correlations of cell abundance across spatial spots and, using the R package igraph⁴³, generated a colocalization network at single-cell resolution from the cell pairs whose Pearson correlation was greater than 0.4.
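The edge list behind such a colocalization network can be sketched in plain Python (hypothetical function names). The threshold is a parameter: 0.4 for the PDAC network and 0.2 for the lymph node network described above. The study built and drew the graphs with igraph in R; this sketch stops at the edge list, which any graph library can load.

```python
import math

def pearson(u, v):
    """Pearson correlation (0.0 if either vector is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den > 0 else 0.0

def colocalization_edges(abundance, threshold=0.4):
    """abundance: cell id -> deconvolved abundance of that cell across spots.

    Returns (cell_a, cell_b, r) for every pair whose correlation exceeds threshold.
    """
    ids = sorted(abundance)
    edges = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            r = pearson(abundance[a], abundance[b])
            if r > threshold:
                edges.append((a, b, r))
    return edges
```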

Statistics and reproducibility

For all datasets except PDAC, we downsampled the sc/snRNA-seq reference to around 1000 cells. Stratified sampling was performed when cell types were available; otherwise, simple random sampling was performed. The exact numbers of cells chosen for each dataset are as follows: human breast cancer: 1001, human lymph nodes: 1000, human testis: 999, Mouse Brain: 1003, Mouse cerebellum: 1003, human breast cancer Xenium (scFFPE): 1001, human breast cancer Xenium (3’): 998, human breast cancer Xenium (5’): 1002. The seed was set to 2233. No other part of this study involved randomization. The investigators were not blinded to allocation during experiments and outcome assessment.
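A sketch of proportional stratified sampling is shown below. The per-type allocation rule (rounding each type's share of the target, with at least one cell per type) is an assumption: the exact rule used in the study is not stated, although per-type rounding of this kind also yields totals slightly above or below 1000. The seed 2233 is reused only for illustration; Python's generator will not reproduce the original draws. The function name is hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(labels, target=1000, seed=2233):
    """Return sorted indices of a proportionally stratified sample of about `target` cells."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_type[lab].append(idx)
    n = len(labels)
    chosen = []
    for lab in sorted(by_type):                     # fixed order keeps the draw reproducible
        members = by_type[lab]
        k = max(1, round(target * len(members) / n))  # proportional allocation, >= 1 per type
        chosen.extend(rng.sample(members, min(k, len(members))))
    return sorted(chosen)
```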

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All relevant data supporting the key findings of this study are available within the article and its Supplementary Information files. The PDAC data used in this study are available in the Gene Expression Omnibus database under accession code GSE111672. The processed human lymph nodes Visium data are available at 10x Genomics website [https://www.10xgenomics.com/resources/datasets/human-lymph-node-1-standard-1-1-0]. The processed human lymph nodes scRNA-seq data are available from Kleshchevnikov et al. [https://cell2location.cog.sanger.ac.uk/browser.html]. The mouse cerebellum data used in this study are available in the Single Cell Portal database under accession code SCP948 [https://singlecell.broadinstitute.org/single_cell/study/SCP948/robust-decomposition-of-cell-type-mixtures-in-spatial-transcriptomics#study-download]. The processed human breast cancer Visium data are available at zenodo [https://zenodo.org/record/4739739#.Ys0v6jdBy3D]. The processed human breast cancer scRNA-seq data used in this study are available in the Gene Expression Omnibus database under accession code GSE176078. The processed human testis Slide-seq data are available at dropbox [https://www.dropbox.com/s/q5djhy006dq1yhw/Human.7z?dl=0]. The processed human testis scRNA-seq data used in this study are available in the Gene Expression Omnibus database under accession code GSE112013. The processed mouse brain Visium data used in this study are available in the ArrayExpress database under accession code E-MTAB-11114. The processed mouse brain snRNA-seq data used in this study are available in the ArrayExpress database under accession code E-MTAB-11115. The processed Visium, 3’ scRNA-seq, 5’ scRNA-seq and scFFPE-seq for human breast cancer Xenium dataset are available at 10x Genomics website [https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast]. Source data are provided with this paper.

Code availability

The code used to generate the figures in this paper is available at https://codeocean.com/capsule/1351962/tree/v1. The package, with detailed documentation, is available on GitHub at https://github.com/ZxZhou4150/Redeconve, https://doi.org/10.5281/zenodo.8384152²¹.

References

Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. 39, 313–319 (2021).
Eng, C. L. et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH. Nature 568, 235–239 (2019).
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
Janesick, A. et al. High resolution mapping of the breast cancer tumor microenvironment using integrated single cell, spatial and in situ analysis of FFPE tissue. bioRxiv, 2022.10.06.510405, https://doi.org/10.1101/2022.10.06.510405 (2022).
He, S. et al. High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat. Biotechnol. 40, 1794–1806 (2022).
Moriel, N. et al. NovoSpaRc: flexible spatial reconstruction of single-cell gene expression with optimal transport. Nat. Protoc. 16, 4177–4200 (2021).
Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods 18, 1352–1362 (2021).
Wei, R. et al. Spatial charting of single-cell transcriptomes in tissues. Nat. Biotechnol. 40, 1190–1199 (2022).
Vahid, M. R. et al. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nat. Biotechnol. 41, 1543–1548 (2023).
Ma, Y. & Zhou, X. Spatially informed cell-type deconvolution for spatial transcriptomics. Nat. Biotechnol. 40, 1349–1359 (2022).
Cable, D. M. et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat. Biotechnol. 40, 517–526 (2022).
Kleshchevnikov, V. et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat. Biotechnol. 40, 661–671 (2022).
Lopez, R. et al. DestVI identifies continuums of cell types in spatial transcriptomics data. Nat. Biotechnol. 40, 1360–1369 (2022).
Dong, R. & Yuan, G. C. SpatialDWLS: accurate deconvolution of spatial transcriptomic data. Genome Biol. 22, 145 (2021).
Elosua-Bayes, M., Nieto, P., Mereu, E., Gut, I. & Heyn, H. SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res. 49, e50 (2021).
Sun, D., Liu, Z., Li, T., Wu, Q. & Wang, C. STRIDE: accurately decomposing and integrating spatial transcriptomics using single-cell RNA sequencing. Nucleic Acids Res. 50, e42 (2022).
Bae, S. et al. CellDART: cell type inference by domain adaptation of single-cell and spatial transcriptomic data. Nucleic Acids Res. 50, e57 (2022).
Geras, A. et al. Celloscope: a probabilistic model for marker-gene-driven cell type deconvolution in spatial transcriptomics data. Genome Biol. 24, 120 (2023).
Song, Q. & Su, J. DSTG: deconvoluting spatial transcriptomics data through graph-based artificial intelligence. Brief Bioinform. 22, https://doi.org/10.1093/bib/bbaa414 (2021).
Andersson, A. et al. Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography. Commun. Biol. 3, 565 (2020).
Zhou, Z., Zhong, Y., Zhang, Z. & Ren, X. Spatial transcriptomics deconvolution at single-cell resolution using Redeconve. Zenodo, https://doi.org/10.5281/zenodo.8384152 (2023).
Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a generalist algorithm for cellular segmentation. Nat. Methods 18, 100–106 (2021).
Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods 19, 171–178 (2022).
Moncada, R. et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 38, 333–342 (2020).
10x Genomics Support. V1_Human_Lymph_Node - Datasets - Spatial Gene Expression, <https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Human_Lymph_Node> (2020).
Guo, J. et al. The adult human testis transcriptional cell atlas. Cell Res. 28, 1141–1157 (2018).
Hutton, C. et al. Single-cell analysis defines a pancreatic fibroblast lineage that supports anti-tumor immunity. Cancer Cell 39, 1227–1244.e1220 (2021).
Elyada, E. et al. Cross-species single-cell analysis of pancreatic ductal adenocarcinoma reveals antigen-presenting cancer-associated fibroblasts. Cancer Discov. 9, 1102–1123 (2019).
Bear, A. S., Vonderheide, R. H. & O’Hara, M. H. Challenges and opportunities for pancreatic cancer immunotherapy. Cancer Cell 38, 788–802 (2020).
Zheng, B. et al. TM4SF1 as a prognostic marker of pancreatic ductal adenocarcinoma is involved in migration and invasion of cancer cells. Int. J. Oncol. 47, 490–498 (2015).
Fu, F. et al. Role of transmembrane 4 L Six Family 1 in the development and progression of cancer. Front. Mol. Biosci. 7, 202 (2020).
Xu, D. et al. Lost miR-141 and upregulated TM4SF1 expressions associate with poor prognosis of pancreatic cancer: regulation of EMT and angiogenesis by miR-141 and TM4SF1 via AKT. Cancer Biol. Ther. 21, 354–363 (2020).
Guo, X. et al. Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing. Nat. Med. 24, 978–985 (2018).
Zhang, Q. et al. Landscape and dynamics of single immune cells in hepatocellular carcinoma. Cell 179, 829–845.e820 (2019).
Cooper, M. D., Lawton, A. R. & Kincade, P. W. A two-stage model for development of antibody-producing cells. Clin. Exp. Immunol. 11, 143–149 (1972).
Simonelli, C. et al. Both CD8+ and CD16+ human T cell clones can provide B cell help for immunoglobulin production. Int. J. Clin. Lab. Res. 22, 36–39 (1992).
Kawanishi, H., Saltzman, L. & Strober, W. Mechanisms regulating IgA class-specific immunoglobulin production in murine gut-associated lymphoid tissues. II. Terminal differentiation of postswitch sIgA-bearing Peyer’s patch B cells. J. Exp. Med. 158, 649–669 (1983).
Snapper, C. M. & Mond, J. J. Towards a comprehensive view of immunoglobulin class switching. Immunol. Today 14, 15–17 (1993).
De Becker, G. et al. Immunoglobulin isotype regulation by antigen-presenting cells in vivo. Eur. J. Immunol. 24, 1523–1528 (1994).
Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers. Nat. Genet. 53, 1334–1347 (2021).
Chen, H. et al. Dissecting mammalian spermatogenesis using spatial transcriptomics. Cell Rep. 37, 109915 (2021).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJ. Complex Sys. 1695, 1–9 (2006). http://igraph.org

Acknowledgements

This work was supported by Changping Laboratory, the National Natural Science Foundation of China (32022016 X.R., 92159305 X.R., and 31991171 X.R.), and the National Key R&D Program of China (2020YFE0202200 X.R. and 2022YFC3400904 X.R.).

Author information

These authors contributed equally: Zixiang Zhou, Yunshan Zhong.

Authors and Affiliations

Changping Laboratory, Yard 28, Science Park Road, Changping District, Beijing, China
Zixiang Zhou, Yunshan Zhong, Zemin Zhang & Xianwen Ren
Biomedical Pioneering Innovation Center (BIOPIC), Peking University, 100871, Beijing, China
Zixiang Zhou & Zemin Zhang

Contributions

X.R. conceived this study, designed the algorithm, supervised the analysis, and wrote the manuscript. Z.X.Z. developed the software, conducted the data analysis, and wrote the manuscript. Y.Z. conducted the data analysis and wrote the manuscript. Z.M.Z. provided valuable discussion on the data analysis and wrote the manuscript.

Corresponding author

Correspondence to Xianwen Ren.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Ken Chen, Jing Su and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhou, Z., Zhong, Y., Zhang, Z. et al. Spatial transcriptomics deconvolution at single-cell resolution using Redeconve. Nat Commun 14, 7930 (2023). https://doi.org/10.1038/s41467-023-43600-9

Received: 26 October 2022
Accepted: 14 November 2023
Published: 01 December 2023
DOI: https://doi.org/10.1038/s41467-023-43600-9

This article is cited by

From sequence to consequence: Deciphering the complex cis-regulatory landscape
- Greg Jude Dsilva
- Sanjeev Galande
Journal of Biosciences (2024)
