Introduction

High-throughput single-cell RNA sequencing (scRNA-seq) has emerged as a revolutionary approach to dissect cellular compositions and characterize molecular properties of complex tissues,1 and has been applied to a wide range of fields resulting in profound discoveries.2 However, spatial information of individual cells is lost during the process of tissue dissociation. While it is paramount to investigate the molecular composition of individual cells in their spatial context, current methods such as RNA hybridization,3 in situ sequencing,4 immunohistochemistry,5 and purifying predefined subpopulations for subsequent transcriptomic profiling6 are limited in throughput and require complex experimental procedures that are accessible to only a handful of laboratories. The combination of scRNA-seq with in situ RNA patterns or tissue shapes provides computational solutions for high-throughput mapping of the spatial locations of many individual cells,7,8,9,10 but such methods rely on the availability of spatial references, which limits their broad application. There is a pressing need for a method that reconstructs cell spatial organization de novo from scRNA-seq data, so that the full power of this technology can be realized.

The spatial organization of individual cells has recently been shown to be self-assembled via ligand-receptor interactions,11,12 implying that cellular spatial organization is inherently encoded by cell identity. We argue that the spatial relationship of cells may be reconstructed de novo, at least in part, by integrating scRNA-seq data with ligand-receptor interaction information. Here we formulate this hypothesis as a mathematical model, referred to as CSOmap (Cellular Spatial Organization mapper), and evaluate its performance computationally and experimentally on diverse scRNA-seq datasets from various human and mouse tissues. The results support that CSOmap can not only reconstruct cell spatial organizations de novo from scRNA-seq data alone, but also quantify the statistical significance of cell-cell interactions and reveal the potentially critical ligand-receptor pairs mediating such interactions. In particular, CSOmap allows in silico perturbations to evaluate the potential effects of gene overexpression or knockdown and of cell adoptive transfer or depletion on cell spatial organization. We applied CSOmap to tumor-infiltrating immune cells and gained new insights into the role of regulatory T cells in tumor immunity.

Results

Overview of CSOmap

With the hypothesis that cell spatial organization is inherently encoded by cell identity, we formulate the computation process from scRNA-seq data to cell spatial organization based on three assumptions: (1) the potential of cellular interactions can be approximated by a function of the abundance of interacting ligands and receptors, and their affinity; (2) cells with high interacting potentials tend to locate in close proximity; (3) cells compete for their interacting partners due to physiological and spatial constraints. We formulate these hypotheses as a mathematical optimization model (named as CSOmap) that predicts coordinates of each cell in a three-dimensional pseudo-space based on input scRNA-seq data and known ligand-receptor interactions13,14 (Fig. 1). The algorithmic process is composed of two main steps. The first is to estimate the cellular interacting potentials by integrating thousands of ligand-receptor pairs, resulting in a cell-by-cell affinity matrix (Fig. 1a). The second is to embed the inherently high-dimensional affinity matrix into three-dimensional space (Fig. 1b). The limited availability of space determines that it is not feasible to position cells with the same interacting potentials equally close to their partners. Thus, we applied Student’s t-distribution to resolve the cell competition problem, enlightened by the widely used visualization technique t-SNE.15 After embedding cells into three-dimensional space, spatial structures/patterns of cells can be analyzed by density-based clustering16 (Fig. 1c), connections and corresponding statistical significance among predefined cell types can be summarized (Fig. 1d), and the dominant ligand-receptor pairs underlying a specified pair of cell types can be calculated (Fig. 1e). 
Once a critical gene or cell population has been identified, CSOmap can be further applied to examine the effects of in silico perturbations including gene overexpression or knockdown and cellular depletion or adoptive transfer (Fig. 1f).
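To make assumption (3) concrete, the following minimal sketch shows how a Student's t kernel, in the form used by t-SNE, converts pairwise distances in the three-dimensional pseudo-space into normalized proximities. Because the proximities sum to one over all pairs, moving one pair closer necessarily lowers the share left for the others, which is one way to express competition. This is the generic t-SNE formulation and only an illustration; the exact objective optimized by CSOmap is not given in this section, and the function name and toy coordinates are hypothetical.

```python
def student_t_proximities(coords):
    """Return q[i][j] = (1 + d_ij^2)^-1 normalized over all ordered pairs i != j.

    coords: list of (x, y, z) tuples in the pseudo-space.
    Assumption: one degree of freedom, as in standard t-SNE.
    """
    n = len(coords)
    kernel = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a cell is not its own partner
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            kernel[i][j] = 1.0 / (1.0 + d2)
            total += kernel[i][j]
    # Normalization couples all pairs: proximities compete for a fixed budget.
    return [[value / total for value in row] for row in kernel]


# Toy coordinates (made up): cells 0 and 1 are close, cell 2 is far away.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
q = student_t_proximities(coords)
```

The heavy tail of the t kernel means that distant pairs still receive a small, non-zero proximity, so moderately interacting cells can be placed far apart without a large penalty.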

Evaluating the validity of CSOmap based on public scRNA-seq datasets

We first evaluated CSOmap on various publicly available scRNA-seq datasets. In an scRNA-seq dataset of pancreas,17 the spatial separation of the endocrine and exocrine compartments provides a natural reference for assessing the performance of CSOmap. On both the human and mouse pancreatic scRNA-seq data, CSOmap successfully recapitulated such spatial separation (Fig. 2), with endocrine cells forming one structure and exocrine cells composing the other compartment. The visual separation in human was further supported by random permutation-based statistical testing, with endocrine rather than exocrine cells showing significant interactions with endothelial cells (Fig. 2). CSOmap was then applied to the scRNA-seq data of human placenta and decidua.18 CSOmap successfully reconstructed the early maternal-fetal interface, i.e., fetal placenta cells, rather than maternal blood cells, showing significant interactions with maternal decidua cells (Supplementary information, Fig. S1). Quantitative evaluation based on the scRNA-seq data of mouse liver lobules8 showed that CSOmap reached high consistency (R = 0.85, P < 0.01, Spearman correlation, Supplementary information, Fig. S1b) with the reference.8 Due to the inherent difficulty of dissociating endothelial cells from liver cells, paired-cell sequencing has been customized to resolve the spatial positions of endothelial cells within liver lobules.19 In silico predictions by CSOmap were consistent with the results of paired-cell sequencing (R = 0.73, P < 0.04, Spearman correlation, Supplementary information, Fig. S1c). Systematic evaluation based on the Tabula Muris datasets20 demonstrated that CSOmap could reproduce the organ-level separations by revealing significantly higher intra-organ cellular interactions than inter-organ interactions for almost all organs except tongue (15/16, 93.75%, Supplementary information, Fig. S2), of which 176 out of 199 interacting cell type pairs were from different cell types. 
Of 6 organs that have both epithelial and endothelial cells available in the Tabula Muris dataset (Supplementary information, Fig. S3), we observed that in almost all the organs except trachea the epithelial cells occupy the outside space (topologically equivalent to the organ edges) while the endothelial cells occupy the inner space (topologically equivalent to the organ basement), suggesting the spatial resolution below the organ level. Such successful applications clearly show the effectiveness, robustness, and wide applicability of CSOmap for multiple organs from different organisms and different technical platforms.

To further demonstrate the effectiveness of CSOmap to reconstruct the cell spatial organization de novo based on scRNA-seq data, we applied CSOmap to a human scRNA-seq dataset consisting of both normal and fibrotic lungs.21 CSOmap was applied individually for each healthy donor and patient with pulmonary fibrosis, and then the spatial characteristics of alveolar cells were compared among donors and patients. Based on the scRNA-seq data of normal donors, CSOmap revealed that Type II alveolar cells disperse in the outer pseudo-space (topologically equivalent to the alveolar space) and Type I alveolar cells form compact basal structures together with endothelial, alveolar macrophages, and other cells (Fig. 3a). The visual characteristics were further confirmed by quantifying the distance of Type II alveolar cells to the center of the pseudo-space (Fig. 3b). Permutation-based statistical testing suggests that Type II alveolar cells are spatially exclusive to themselves and other cell types (i.e., depleted in the neighborhood of Type II alveolar cells), but Type I alveolar cells show significant interactions with themselves, endothelial cells, and macrophages (i.e., enriched in the neighborhood of Type I alveolar cells). These spatial characteristics agree with the histological observations of human alveoli,22 suggesting the validity of CSOmap.
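The permutation-based testing mentioned above can be illustrated with a small label-shuffling sketch. It assumes that a list of neighboring cell pairs has already been derived from the reconstructed coordinates, counts the pairs linking two cell types, and compares that count with counts obtained after randomly shuffling the cell-type labels. The details of the test used by CSOmap are not given in this section, so the function names, the one-sided enrichment p-value, and the toy data below are all assumptions made for illustration.

```python
import random


def count_connections(edges, labels, type_a, type_b):
    """Count neighboring cell pairs whose labels are (type_a, type_b) in either order."""
    n = 0
    for i, j in edges:
        pair = (labels[i], labels[j])
        if pair == (type_a, type_b) or pair == (type_b, type_a):
            n += 1
    return n


def permutation_pvalue(edges, labels, type_a, type_b, n_perm=1000, seed=0):
    """One-sided empirical p-value for enrichment of type_a/type_b connections.

    Labels are shuffled across cells while the neighbor graph is kept fixed.
    """
    rng = random.Random(seed)
    observed = count_connections(edges, labels, type_a, type_b)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_connections(edges, shuffled, type_a, type_b) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (at_least + 1) / (n_perm + 1)


# Toy data (made up): nine cells of three types and five neighbor pairs.
labels = ["AT1", "AT1", "AT1", "Endo", "Endo", "Endo", "AT2", "AT2", "AT2"]
edges = [(0, 3), (1, 4), (2, 5), (0, 4), (6, 7)]
observed, p_value = permutation_pvalue(edges, labels, "AT1", "Endo")
```

A depletion test, as reported for Type II alveolar cells, would count the permutations with at most the observed number of connections instead.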

However, samples from patients with pulmonary fibrosis show distinct spatial characteristics. For idiopathic pulmonary fibrosis (IPF), Type II alveolar cells do not disperse in the outer space but rather show significant interactions with other cells (Fig. 3c, d), consistent with the pathological observation of diffuse alveolar septal thickening and type II pneumocyte hyperplasia.23 For systemic sclerosis-associated interstitial lung disease (SSc-ILD), although Type II alveolar cells still disperse in the outer space, Type I alveolar cells do not have significant interactions with endothelial/lymphatic cells or macrophages (Fig. 3e, f), agreeing with the pathological characteristics of injured alveolar epithelium.24 By analyzing the dominant ligand-receptor pairs that mediate the spatial organizations of alveolar cells in normal lungs, we found that SFTPA1-TLR2 was ranked top in mediating interactions among Type I and II alveolar cells. However, for pulmonary fibrosis samples, the scores of SFTPA1-TLR2 were significantly reduced (Supplementary information, Fig. S4). Since mutations or aberrant expression of SFTPA1 and TLR2 have been associated with pulmonary fibrosis,25,26 the identification of the critical role of SFTPA1-TLR2 in maintaining the normal spatial organization of human alveoli by CSOmap via an unbiased approach further underscores the validity of CSOmap and the molecular insights that it may bring.

Evaluating the validity of CSOmap based on experimental tissue dissection and imaging

We further validated the performance of CSOmap with experimental tissue dissection and imaging. First, we dissected a liver tumor sample into tumor edges and cores, and applied scRNA-seq separately. With the scRNA-seq data only, CSOmap reconstructed a cell spatial organization with tumor core cells located interiorly and tumor edge cells exteriorly (Fig. 4a). The spatial separation was statistically significant by quantitatively evaluating the distance of tumor core and edge cells to the center of the pseudo-space (Fig. 4b). The spatial patterns of genes encoding heat-shock proteins (Hsps; Hsp40, Hsp70 and Hsp90) revealed by CSOmap, i.e., with high expression level in the center while low expression level near the edge (Fig. 4c), were further experimentally confirmed by immunohistochemical (IHC) staining on independent liver tumor samples (Fig. 4d, e), suggesting the effectiveness and robustness of CSOmap in deriving new biological insights.
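The distance-to-center comparison used here (and for Type II alveolar cells above) can be sketched as follows. The code takes the center of the pseudo-space to be the centroid of all cell coordinates and summarizes each group by its median distance; both choices, the function name, and the toy coordinates are assumptions, and the statistical test applied to the two distance distributions is not specified in this section.

```python
import math
from statistics import median


def distances_to_center(coords):
    """Euclidean distance of every cell to the centroid of all cells."""
    n = len(coords)
    center = tuple(sum(c[axis] for c in coords) / n for axis in range(3))
    return [math.dist(c, center) for c in coords]


# Toy coordinates (made up): four "core" cells near the origin, four "edge" cells outside.
core = [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, -0.1, 0.0)]
edge = [(2.0, 0.0, 0.0), (-2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, -2.0, 0.0)]

dist = distances_to_center(core + edge)
core_median = median(dist[: len(core)])
edge_median = median(dist[len(core):])
```

In practice the two lists of distances would be passed to a two-sample test of the analyst's choice rather than compared by their medians alone.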

We then assessed the CSOmap results in a quantitative manner by simultaneously generating scRNA-seq and IHC staining data for a tumor sample derived from hepatocellular carcinoma (HCC). In brief, six cell types, including regulatory T cells (Tregs, marked by Foxp3+), exhausted CD8+ T cells (Texs, marked by PD-1+), CD8+PD-1− T cells, type 1 dendritic cells (cDC1, marked by CLEC9A+), macrophages (marked by CD68+), and other cells, were labeled by specific antibodies on a 1 cm × 1 cm × 100 μm tumor tissue. This tumor tissue was sliced into consecutive 1 cm × 1 cm × 5 μm sections for IHC staining, and then the cell types and positions of 1,181,790 cells were recorded to serve as the reference for evaluating the performance of CSOmap (Fig. 4f and Supplementary information, Fig. S5). With 1,329 scRNA-seq profiles based on SMART-seq2,27 CSOmap reached high concordance with the results of IHC analysis (Fig. 4g, R = 0.69, P = 2.2 × 10^−6, Spearman correlation) and recapitulated multiple cell-cell interactions exemplified by CD8 T cells-macrophages and Tregs-Texs pairing (Fig. 5). After removing the potential biases introduced by the uneven cell counts of different cell types, the consistency score between the IHC results and the CSOmap prediction was still 0.54 (P = 2.0 × 10^−4, Spearman correlation, Fig. 5), while the correlations based on random coordinate assignment and random gene pairs were −0.12 and 0.34, respectively.

CSOmap reveals the critical role of CD63-TIMP1 interaction in tumor morphology

Besides the spatial reconstruction, CSOmap can also provide important insights into the underlying molecular mechanisms. We applied CSOmap separately to a head and neck cancer (HNC) scRNA-seq dataset28 and a melanoma scRNA-seq dataset.29 Based on the IHC images of the original report,28 the spatial characteristics of HNC tumor microenvironment can be summarized as follows: (1) malignant cells not subject to partial epithelial to mesenchymal transition (p-EMT) were located close to each other and formed a loose structure; (2) malignant cells subject to p-EMT were located at the interface between malignant cells and cancer-associated fibroblasts (CAFs); (3) CAFs were connected to each other and formed a compact structure (Fig. 6a). CSOmap not only qualitatively recapitulated all these IHC characteristics (Fig. 6b), but also highlighted the distinct spatial patterns of malignant cells between HNC and melanoma, i.e., malignant cells in HNC tended to form a loose structure (adjusted P > 0.05, permutation-based test, Fig. 6c, d) while tumor cells in melanoma tended to form a compact structure (Fig. 6e, adjusted P < 0.05, permutation-based test). By analyzing the dominant ligand-receptor pairs contributing to this spatial organization, we identified that the interactions between CD63 and TIMP1 contributed ~66% to the cellular interaction potential of melanoma malignant cells (Fig. 6f) while HNC cells expressed CD63 and TIMP1 at much lower levels. Using CSOmap, we were able to readily perform “in silico perturbation” of CD63 and re-calculate the spatial characteristics. Indeed, in silico knockdown of CD63 expression in melanoma malignant cells resulted in the transition from compact to loose structures while overexpression of CD63 in HNC malignant cells resulted in compact structure (Fig. 6g). 
The association of CD63 with the morphology of melanoma has been experimentally supported by a previous in vivo and in vitro study,30 in which the mechanism underlying such association was attributed to the negative linkage between CD63 signaling and EMT. This notion is recapitulated by CSOmap since the p-EMT program was observed in the HNC dataset but absent in melanoma, suggesting the effectiveness of CSOmap in spatial reconstruction and the potential in revealing the underlying molecular mechanism.
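The contribution of a single ligand-receptor pair and the effect of an in silico knockdown or overexpression can be illustrated with the affinity definition of Formula (4) in Material and methods, with all weights set to 1. The sketch below scores each pair for two cells, expresses each score as a fraction of the total, and repeats the calculation after scaling one gene. The expression values, the second ligand-receptor pair (LIG_X, REC_Y), and the function names are hypothetical; treating TIMP1 as the ligand and CD63 as the receptor is an assumption of this example, and a real analysis would re-embed the cells after the perturbation.

```python
def pair_scores(expr_c1, expr_c2, lr_pairs):
    """Per-pair affinity terms A_c1 * B_c2 + A_c2 * B_c1, all weights set to 1."""
    scores = {}
    for ligand, receptor in lr_pairs:
        scores[(ligand, receptor)] = (
            expr_c1.get(ligand, 0.0) * expr_c2.get(receptor, 0.0)
            + expr_c2.get(ligand, 0.0) * expr_c1.get(receptor, 0.0)
        )
    return scores


def contributions(scores):
    """Fraction of the total affinity explained by each ligand-receptor pair."""
    total = sum(scores.values())
    return {pair: (s / total if total > 0 else 0.0) for pair, s in scores.items()}


def perturb(expr, gene, factor):
    """Copy of an expression profile with one gene scaled (0 = knockout, >1 = overexpression)."""
    out = dict(expr)
    out[gene] = out.get(gene, 0.0) * factor
    return out


# Toy expression values (made up, not taken from any dataset).
lr_pairs = [("TIMP1", "CD63"), ("LIG_X", "REC_Y")]
cell_a = {"TIMP1": 15.0, "CD63": 10.0, "LIG_X": 5.0, "REC_Y": 10.0}
cell_b = dict(cell_a)

before = contributions(pair_scores(cell_a, cell_b, lr_pairs))
after = contributions(
    pair_scores(perturb(cell_a, "CD63", 0.0), perturb(cell_b, "CD63", 0.0), lr_pairs)
)
```

In this toy case the TIMP1-CD63 pair accounts for three quarters of the affinity before the knockdown and none of it afterwards.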

Further biological insights were also provided by the spatial reconstruction of CSOmap based on the melanoma and HNC scRNA-seq data. First, the malignant cells of both melanoma and HNC did not show significant interactions with T cells according to our CSOmap analyses, which may partially explain the immune evasion of these tumors. By contrast, while the CAFs, endothelial cells and macrophages account for much smaller fraction of the datasets, they showed statistically significant interactions with almost all the other cell types. These results recapitulate the critical roles of these cells in the spatial organization of tumors and in the regulation of tumor-infiltrating lymphocytes.31 Second, comparison of the spatial organizations between melanoma samples that are treatment-naïve (TN)29 and on immunotherapy32 highlights the tumor and T cell compartments observed by IHC (Fig. 6h). Upon treatment, increased tumor-T interactions were observed by IHC (also revealed by CSOmap), indicating the potential effects of immunotherapy. Differential gene expression analysis based on the CSOmap prediction indicated that malignant cells not interacting with T cells show lower levels of class I major histocompatibility complex (MHC) molecules and JUN but higher level of CDK6. These results recapitulate the cancer cell program contributing to resistance of immune checkpoint blockade in melanoma identified recently,32 suggesting the effectiveness of CSOmap in generating valid biological insights.

CSOmap provides new insights into the roles of regulatory T cells in tumor immunity

Since CSOmap also allows in silico cellular perturbation, we applied CSOmap to three scRNA-seq datasets based on T cells from the peripheral blood, tumors and tumor-adjacent normal tissues of patients with HCC,33 non-small cell lung cancer (NSCLC),34 or colorectal cancer (CRC).35 Since blood, tumor and normal tissues have distinct morphologies, and tertiary lymphoid structures (TLSs) are frequently found in tumors,36 we hypothesize that T cells infiltrating into different tissues may also demonstrate distinct spatial organization characteristics. CSOmap analyses on all three datasets suggest that T cells from tumors tend to have significantly more interactions with themselves while T cells from the peripheral blood tend to disperse from each other (Fig. 7a), confirming our hypothesis. Cellular density analysis clearly indicated the existence of tightly-linked structures (Fig. 7b), with tumor-derived T cells forming the major part of these structures (Fig. 7c). It has been reported that Tregs tend to trap tumor-infiltrating CD8+ T cells into TLSs or draining lymph nodes.37 Consistently, our analysis indicated that Tregs and tumor-infiltrating Texs are the major parts of these compact structures (Fig. 7d), which we speculate to correspond to TLSs in or near tumors. Interestingly, although blood-derived T cells did not show significant interacting potential to each other compared to tumor-derived T cells, those compact structures composed of blood-derived T cells were also observed in the HCC and NSCLC datasets, supporting the colony-forming capacity of T lymphocytes from peripheral blood as reported previously.38

Among T cells, Tregs and Texs exhibited significant interaction, and the ligand-receptor pair CCL4-CCR8 drove such interaction (Fig. 7e). While CCL4 was highly expressed in most activated CD8+ T cells, its expression level in Texs was twice that in other cells. CCR8 was specifically and highly expressed in tumor Tregs. According to the spatial organization reconstructed by CSOmap, Texs could be further divided into two subgroups: Texs interacting and not interacting with Tregs. Consistent with a previous report,39 MKI67 was depleted in Texs interacting with Tregs (Fig. 7f), suggesting that Tregs reduce the proliferation of these cells. Since Texs are characterized by high expression of T cell exhaustion markers including PDCD1, CTLA4, HAVCR2, TIGIT and LAG3, we examined the expression levels of their ligands in Tregs. CD274+ (or PDL1, gene for the PDCD1 ligand) and CD80+ (gene for a CTLA-4 ligand) Tregs were significantly enriched in Tregs interacting with Texs while CD86+ Tregs were enriched in Tregs not interacting with Texs in CRC (Fig. 7g). These results suggest that Tregs might suppress CD8+ T cells via PD-1 and CTLA-4-mediated co-inhibitory axes. Similar trends were found in HCC and NSCLC despite varying significance. In addition to Texs, Tregs also showed significant interactions with a set of CD8+ effector memory T cells (Tems). In CRC, T cell receptor (TCR)-based tracking suggests frequent state transitions between Tems and Texs in tumor.35 The ratio of Tems to Texs in tumor has also been associated with better survival in lung adenocarcinoma patients34 and, more recently, with better response to immunotherapies in melanoma.40 The notable interaction between Tregs and Tems in tumor may suggest a role of Tregs in the early stage of immune evasion of tumors. 
Similar to Texs, Tems interacting with Tregs showed higher expression levels of IFNG and TNF than those not interacting, and the expressions of IFNG and TNF showed significant correlations with CCL4 in Texs/Tems interacting with Tregs rather than those not interacting, suggesting that functional CD8+ T cells were prone to be targeted by Tregs due to high level of CCL4 secretion. IHC analysis of HCC and CRC samples confirmed the interactions of Tregs with Texs and Tems based on the colocalization of Tregs and CD8+ T cells in tumor (Fig. 5 and Supplementary information, Fig. S6).

In silico Treg depletion by CSOmap revealed that a subset of blood-enriched recently activated effector memory T cells (Temras) demonstrate significantly increased interactions with Texs via the CXCR3-CCL5 axis (Supplementary information, Fig. S7), which is different from the CCR8-CCL4 axis mediating the Treg-Tex interactions. It has been recently reported in murine and human melanoma that, compared with CCR2 and CCR5, CXCR3 is necessary for the successful trafficking of tumoricidal T cells across tumor vascular checkpoints,41 consistent with our finding that CXCR3 might mediate the migration of blood T cells into tumor via the CCL5 gradient. Treg depletion also increased the interactions of CD4+CXCR6+ tissue-resident helper T cells (Ths) with Texs via the CXCR3-CCL5 axis (Supplementary information, Fig. S7), supporting the role of T cell competition in immune regulation as revealed previously.42

In silico cell transfer reveals phenotypic determinants in T cell-based tumor cell killing

CSOmap also enables computational simulation of adoptive cell transfer, which has proven to be effective immunotherapy for cancer treatment.43 It is currently difficult to experimentally evaluate the phenotypic outcome of adoptively transferred T cells. We used the HNC and melanoma datasets as foundations to simulate their tumor microenvironments and used blood- and tumor-derived T cells for in silico adoptive transfer. We simulated a gradient of TCR-pMHC affinity between the adoptively transferred T cells and the malignant cells and quantified the numbers of tumor-T interactions, tumor-infiltrating T cells and targeted malignant cells. Interestingly, while the numbers of interactions between T and malignant cells increased in a linear function of the TCR-pMHC affinity, those infiltrating T cells and targeted malignant cells increased in a logarithmic function (Fig. 8). Visually, an interface formed between T cells and malignant cells (Fig. 8). This phenomenon observed in silico might recapitulate and explain the morphological patterns frequently observed in tumor microenvironment by IHC and multiplexed ion beam imaging.44 While the TCR-pMHC affinity was the dominant determinant of T-malignant cell interactions, the phenotypes of T cells and malignant cells also contributed significantly (Fig. 8 and Supplementary information, Fig. S8). In particular, tumor-derived T cells showed significantly higher efficiency in tumor infiltration than blood-derived T cells (Fig. 8) while malignant cells of melanoma were more prone to be targeted than those of HNC (Supplementary information, Fig. S8) according to ANOVA analysis with repeated measures. Computationally, these results recapitulated the variations of T cell transfer-based therapies across cancers and the predictive values of immunophenotypic characterization of infused T cell product in engraftment and responses observed in various clinical trials.45

Robustness of CSOmap to its technical parameters

We used the HNC and melanoma datasets to show the robustness of CSOmap to its parameter selections. First, we show that it is necessary to embed the cell-by-cell affinity matrix estimated by ligand-receptor interactions into 3D space. As shown by the HNC dataset (Supplementary information, Fig. S9), eight cluster pairs showed different statistical significance before and after 3D embedding (Supplementary information, Fig. S9a), and the changes before and after 3D embedding are nonlinear (Supplementary information, Fig. S9b), with the spatial configuration inferred with 3D embedding showing significantly higher consistency with the IHC images (Supplementary information, Fig. S9c). This result suggests that spatial conflicts may play key roles in shaping cell spatial organization, which is further confirmed by the low correlations between 2D and 3D embeddings (Supplementary information, Fig. S10). Determining the neighborhood of each cell is challenging. However, we show that our results are robust to the selection of the number of cells in the neighborhood of a cell. Using the median distance to either the 3rd or the 5th nearest neighbor as the cutoff to determine the neighborhood of each cell, CSOmap generates highly consistent cell-cell interaction maps (R > 0.99, P < 10^−80, Spearman correlation, Supplementary information, Fig. S11). We also evaluated the impact of the comprehensiveness of ligand-receptor pairs on spatial inference by including the data in CellPhoneDB.18 The results suggest that the cell spatial organizations inferred by CSOmap are highly reproducible with or without ligand-receptor pairs in CellPhoneDB (R > 0.97, P < 10^−50, Spearman correlation, Supplementary information, Fig. S12), suggesting that the current ligand-receptor pairs may be adequate for cell spatial organization inference.
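The neighborhood cutoff described above can be read as follows: for every cell, take the distance to its k-th nearest neighbor in the pseudo-space, use the median of these distances over all cells as a global cutoff, and call two cells neighbors if they lie within that cutoff. The sketch below implements this reading for k = 3; it is one plausible interpretation rather than the confirmed CSOmap procedure, and the function names and toy coordinates are hypothetical.

```python
import math
from statistics import median


def kth_neighbor_cutoff(coords, k=3):
    """Median, over all cells, of the distance to the k-th nearest neighbor (requires k < n)."""
    kth = []
    for i, c in enumerate(coords):
        d = sorted(math.dist(c, other) for j, other in enumerate(coords) if j != i)
        kth.append(d[k - 1])
    return median(kth)


def neighbor_pairs(coords, cutoff):
    """All unordered cell pairs whose distance does not exceed the cutoff."""
    n = len(coords)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(coords[i], coords[j]) <= cutoff
    ]


# Toy coordinates (made up): six cells evenly spaced on a line.
coords = [(float(x), 0.0, 0.0) for x in range(6)]
cutoff_k3 = kth_neighbor_cutoff(coords, k=3)
pairs_k3 = neighbor_pairs(coords, cutoff_k3)
```

Repeating the calculation with k = 5 and correlating the resulting cluster-level connection counts would mirror the robustness check reported here.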

Discussion

CSOmap provides a computational tool to reconstruct cellular spatial organization de novo from scRNA-seq data. The underlying assumption is that cells can compete and self-assemble into specific spatial patterns via ligand-receptor interactions. A wide collection of factors may hinder such computational prediction, including the incomplete nature of known ligand-receptor interactions and their affinity parameters (particularly for TCR-pMHCs), the dropout issue of scRNA-seq, biases or errors of estimating the protein abundance of ligands and receptors by transcriptomic data, the distinction of capacity to infer realistic interactions, and the unavailability of other physical, chemical, and nutrient factors involved in cell organizations. Despite these difficulties, evaluations on multiple scRNA-seq datasets spanning human and mouse physiological and diseased conditions demonstrated that CSOmap can recapitulate critical spatial characteristics qualitatively and quantitatively. Such validity of CSOmap in reconstructing cell spatial organizations de novo from scRNA-seq data supports the hypothesis that cell morphology may be inherently encoded by cell identity, and suggests that ligand-receptor mediated cellular self-assembly may play key roles in tissue morphogenesis.

Compared with the recently proposed method Novosparc,10 which computationally maps single cells into predefined tissue shapes based on the assumption that similar single cells have similar spatial locations, CSOmap makes four conceptual advances. First, CSOmap is a de novo spatial reconstruction method. In contrast, Novosparc is reference-based although the reference is a predefined geometric shape. For scRNA-seq datasets without available tissue shapes, only de novo inferring tools can be used to reconstruct cell spatial organizations in a pseudo-space. Second, CSOmap is built on the assumption that ligand-receptor interactions mediate cell self-assembly. Based on this assumption, it becomes feasible to reconstruct cell spatial organizations de novo, and the applications of CSOmap to almost all the evaluated instances suggest that similar single cells often, but not always (due to spatial conflicts), have similar spatial locations. However, the assumption behind Novosparc that similar single cells have similar spatial locations cannot indicate the roles of ligand-receptor interactions in cell morphogenesis and spatial inference. Third, CSOmap enables the evaluation of statistical significance of cell-cell interactions and the roles of individual ligand-receptor pairs in dictating such interactions while Novosparc does not provide such mechanistic insights. Finally, CSOmap allows in silico simulations of gene/cell perturbations to evaluate the roles of specific ligand/receptors and cell types in shaping/re-shaping a targeted tissue due to its nature of de novo inference. However, with predefined tissue shapes, it is hard for Novosparc to evaluate the roles of such changes in tissue shapes. This feature is important especially when perturbation experiments are hard to conduct.

In summary, CSOmap can generate hypotheses about the molecular mechanisms underlying cell spatial organization through several key features: de novo reconstruction of cell spatial organization, quantification of the statistical significance of cell-cell interactions and of the roles of individual ligand-receptor pairs in shaping such interactions, and convenient in silico manipulation including gene overexpression, knockdown, cell adoptive transfer and depletion. Such computational modeling can provide important insights into various biological questions including development, immune response, and tumor immune escape. CSOmap will be applicable to the interrogation of cellular organizations in pseudo-space from scRNA-seq data for various tissues in diverse systems, and it can be greatly enhanced when more complete knowledge of ligand-receptor interactions and other critical factors becomes available.

Material and methods

Overview of CSOmap

CSOmap reconstructs cellular spatial organization of individual cells from scRNA-seq data based on three principles: (1) cellular spatial organization is determined by ligand-receptor mediated cellular self-assembly, with cells having high affinity spatially close to each other; (2) the affinity of cells can be defined by the abundance of ligands and receptors and their interacting potentials; (3) cells compete with each other to form spatial structures. Hence, the core algorithm of CSOmap to reconstruct spatial organization of single cells from scRNA-seq data includes two steps: (1) estimating the cellular affinity matrix based on the gene expression profiles of individual cells and known ligand-receptor interactions; (2) embedding the inherently high-dimensional cellular affinity matrix into a three-dimensional pseudo-space resembling realistic biological tissues, during which cell competition is taken into account. CSOmap also includes additional algorithms for analyzing the resultant three-dimensional coordinates of single cells, including estimating the density of each cell, identifying spatially-defined cell clusters/structures, evaluating the number of connections and statistical significance between two cell clusters defined by expression profiles or other characteristics, calculating the contributions of each ligand-receptor pair to the interaction potential of two cell clusters, and in silico molecular and cellular interference. The core and additional algorithms of CSOmap are described separately below.

Estimating cell-cell affinity by ligand-receptor interactions

The estimation of cell-cell affinity is critical to the performance of CSOmap. To define a valid function of cell-cell affinity, we assume that the affinity of two cells equals the sum of the affinities of all the protein complexes formed by proteins from the surfaces or extracellular matrices of the two cells. We applied a series of approximations to facilitate computation at the genome scale. Details are stated as follows.

In biological reality, the number of components of a protein complex varies from two to tens. For computational convenience, we converted all the interactions of more than two components to binary interactions regardless of the complicated nonlinear effects. Given a binary interaction, i.e., one ligand A and one receptor B, according to the law of mass action in chemistry, the concentration of the complex AB can be calculated according to the following formula:

$$\left[ {{\mathrm{AB}}} \right] = k\left[ {\mathrm{A}} \right]^a\left[ {\mathrm{B}} \right]^b$$
(1)

where [AB] is the concentration of the complex AB, [A] is the concentration of the ligand A, [B] is the concentration of the receptor B, k is the reaction constant, and a and b are the stoichiometric coefficients of A and B, respectively. The parameters k, a and b vary according to the chemical natures of A and B. For simplicity, we approximate Formula (1) by the following formula to handle thousands of ligand-receptor pairs:

$$[{\mathrm{AB}}] \propto w_{{\mathrm{A,B}}}[{\mathrm{A}}][{\mathrm{B}}]$$
(2)

where $$w_{{\mathrm{A,B}}}$$ is introduced to summarize the total effects of the parameters k, a and b. Upon this approximation, the cell-cell affinity is defined by the following formula:

$${\mathrm{A}}_{{\mathrm{C1,C2}}} \propto \sum_{i = 1}^I {([{\mathrm{A}}_{{\mathrm{C1}}}{\mathrm{B}}_{{\mathrm{C2}}}] + [{\mathrm{A}}_{{\mathrm{C2}}}{\mathrm{B}}_{{\mathrm{C1}}}])} \\ \propto \sum_{i = 1}^I w_{{\mathrm{A,B}}}([{\mathrm{A}}_{{\mathrm{C1}}}][{\mathrm{B}}_{{\mathrm{C2}}}] \, + \, [{\mathrm{A}}_{{\mathrm{C2}}}][{\mathrm{B}}_{{\mathrm{C1}}}])$$
(3)

where $${\mathrm{A}}_{{\mathrm{C1,C2}}}$$ denotes the affinity of Cell C1 and Cell C2, $$[{\mathrm{A}}_c]$$ or $$[{\mathrm{B}}_c]$$ denotes the concentration of the A or B molecule on Cell c, i is the index of ligand-receptor pairs, and there are a total of I pairs. Because the ligand and receptor can be simultaneously expressed by both of the cells, a symmetric term of the concentrations of the complex is added. Furthermore, we use the mRNA abundance of the ligand and receptor to approximate their protein concentrations, and thus Formula (3) can be updated as follows:

$${\mathrm{A}}_{{\mathrm{C1,C2}}} \propto \mathop {\sum}\limits_{i = 1}^I {w_{{\mathrm{A,B}}}({\mathrm{A}}_{{\mathrm{C1}}}^{TPM} \times {\mathrm{B}}_{{\mathrm{C2}}}^{TPM} + {\mathrm{A}}_{{\mathrm{C2}}}^{TPM} \times {\mathrm{B}}_{{\mathrm{C1}}}^{TPM})}$$
(4)

where $${\mathrm{A}}_c^{TPM}$$ or $${\mathrm{B}}_c^{TPM}$$ is the mRNA level of the ligand A or receptor B in Cell c estimated by the Transcripts Per Million (TPM) measure. Since there are thousands of ligand-receptor interactions, for most of which the parameters of the interacting dynamics (summarized by $$w_{{\mathrm{A,B}}}$$) are not available, we set $$w_{{\mathrm{A,B}}} = 1$$ in the current version of CSOmap while providing $$w_{{\mathrm{A,B}}}$$ as a parameter of the software so that users can incorporate their knowledge of the chemical natures of the ligand-receptor interactions. Formula (4) thus establishes the computational method for estimating cell-cell affinity from scRNA-seq data and ligand-receptor interactions. In practice, we used the human ligand-receptor interaction database FANTOM5, supplemented with immune-relevant chemokines, cytokines, co-stimulators, co-inhibitors and their receptors, for estimating the cell-cell affinity matrix14,46 (Supplementary information, Table S1). Some of these ligands, such as chemokines and cytokines, are secreted rather than membrane-located proteins. We included these ligands in the estimation of cell-cell affinity in the same way as membrane proteins because they often form gradients that affect the migration of other cells, particularly in the case of chemokines. Interactions involving B2M were manually filtered out because of its housekeeping nature. Ligand-receptor interactions in the CellphoneDB18 database were also included to show the robustness of CSOmap predictions (Supplementary information, Table S2). Because noise in gene expression levels and the various approximations may introduce noise into the estimated cell-cell affinity, we further discretize the cell-by-cell affinity matrix by retaining the top k highest-affinity neighbors for each cell (k = 50 by default).
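As an illustration of Formula (4) and the top-k filtering step, the following is a minimal NumPy sketch rather than the CSOmap implementation. The function names `affinity_matrix` and `keep_top_k`, the genes-by-cells layout of the TPM matrix, the zeroed diagonal, and the choice to keep the original affinity values of the retained neighbors (instead of binarizing them) are all assumptions made for the example.

```python
import numpy as np

def affinity_matrix(tpm, genes, lr_pairs, weights=None):
    """Cell-by-cell affinity following Formula (4).

    tpm      : (n_genes, n_cells) array of TPM values (assumed layout)
    genes    : list of gene names, one per row of tpm
    lr_pairs : list of (ligand, receptor) gene-name tuples
    weights  : optional per-pair weights w_{A,B}; defaults to 1 for every pair
    """
    index = {g: i for i, g in enumerate(genes)}
    n_cells = tpm.shape[1]
    A = np.zeros((n_cells, n_cells))
    for k, (lig, rec) in enumerate(lr_pairs):
        if lig not in index or rec not in index:
            continue  # pair not measured in this dataset
        w = 1.0 if weights is None else weights[k]
        outer = np.outer(tpm[index[lig]], tpm[index[rec]])  # ligand on C1, receptor on C2
        A += w * (outer + outer.T)                          # plus the symmetric term
    np.fill_diagonal(A, 0.0)  # assumption: self-affinity is not used
    return A

def keep_top_k(A, k=50):
    """Retain, for each cell (row), only its k highest-affinity neighbors."""
    n = A.shape[0]
    k = min(k, n - 1)
    top = np.argsort(-A, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    out = np.zeros_like(A)
    out[rows, top] = A[rows, top]
    return out
```

Note that the filtered matrix is in general no longer symmetric, because cell j being among the top neighbors of cell i does not imply the converse.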

Embedding the high-dimensional cell-cell affinity matrix into three-dimensional space

Once the discretized cell-by-cell affinity matrix is obtained, this inherently high-dimensional matrix is embedded into a three-dimensional space. The principle behind this operation is that real biological space has only three dimensions and that positive values in the affinity matrix indicate only interacting potentials rather than interactions that have actually occurred. Cells that have interacting potentials with common targets need to compete for space to turn these potentials into reality. Considering this factor, we introduce three constraints to build the computational model. First, a minimum distance between cells is pre-defined because all cells have positive sizes and cannot be squeezed indefinitely. Second, the total available space is also pre-defined by a parameter named the space radius to simulate the limited real space. Finally and most importantly, Student’s t-distribution is introduced to resolve the crowding issues of cell-cell interactions, motivated by the visualization algorithm t-SNE, which allows cells to compete with others to form the spatial organization. Taking all these considerations together, we propose the following computational model as the core of CSOmap:

$$\min \mathop {\sum}\limits_{i = 1}^n {\mathop {\sum}\limits_{j \ne i} {p_{ij}\log \frac{{p_{ij}}}{{q_{ij}}}} }$$
(5)

subject to:

$$p_{ij} = \frac{1}{Z}\mathop {\sum}\limits_{k = 1}^K {w_{L_k,R_k}(e_i^{L_k} \times e_j^{R_k} + e_i^{R_k} \times e_j^{L_k}} )\,{\mathrm{for}}\,i \, \ne \, j$$
(6)
$$q_{ij} = \frac{1}{{\mathop {Z}\limits^\sim }}\frac{1}{{1 + d_{ij}^2}}\,{\mathrm{for}}\,i \, \ne \, j$$
(7)
$$d_{ij} = \sqrt {\mathop {\sum}\limits_{k = 1}^3 ( y_i^k - y_j^k)^2} \,{\mathrm{for}}\,i \, \ne \, j$$
(8)
$$d_{ij} \ge r\,{\mathrm{for}}\,i \ne j$$
(9)
$$\left| {y_i^k} \right| \le R\,{\mathrm{for}}\,i = 1 \cdots n\,{\mathrm{and}}\,k = 1,2,3$$
(10)

where $$e_i^{L_k}$$ or $$e_i^{R_k}$$ is the TPM value of the kth ligand or receptor in the ith cell, $$w_{L_k,R_k}$$ is the weight summarizing the chemical nature of the kth ligand-receptor pair, $$p_{ij}$$ is the cell-cell affinity between i and j estimated by the aforementioned method, $$y_i^k$$ is the kth coordinate of the ith cell, $$d_{ij}$$ is the Euclidean distance between the ith and jth cells in the embedded space, and $$q_{ij}$$ is the probability of the jth cell locating in the neighborhood of the ith cell. Constraints (6)–(8) give the definitions of $$p_{ij}$$, $$q_{ij}$$, and $$d_{ij}$$, while constraints (9) and (10) impose the spatial limitations. Kullback-Leibler divergence is used to define the loss function (5). This optimization model is highly similar to the model used to implement non-linear dimensionality reduction in the frequently used visualization algorithm t-SNE, except that constraints (9) and (10) are imposed to account for the spatial limitations. Similarly, a gradient-descent algorithm is used to solve this optimization problem, starting from a random initialization and then updating the solution under the guidance of the gradient. When the maximum number of iterations is reached, the resultant three-dimensional coordinates are reported for subsequent analyses. In principle, large r and small R will reduce the volume of cell space and thus provide repulsive forces while the cell-cell affinity provides attracting forces. The repulsive and attracting forces together guide the self-assembly of cells and finally determine the cellular spatial organizations. Particularly, large r and small R will introduce fluctuations into the cellular spatial organizations. To obtain a stable organization, we set r = 1 and R = 50 in practice. The initial solution is randomly assigned in a 50 × 50 × 50 cube, and the maximum number of iterations is set to 1000.
When the three-dimensional coordinates are obtained, the coordinate system is rotated by principal component analysis so that the X and Y axes capture most of the spatial variation. By default, we set the number of dimensions to 3 because biological tissues/organs exist in 3D space, but users can set this parameter to 2 or 1 to examine specific tissue models.
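To make the objective concrete, the sketch below evaluates the loss (5) with the Student's t kernel of (7), computes its gradient with respect to the coordinates, and checks constraints (9) and (10) for a candidate solution. It is an illustrative sketch, not the CSOmap solver: the function names are hypothetical, the affinity matrix P is assumed to be already normalized so that its off-diagonal entries sum to 1, and how the solver enforces (9) and (10) during gradient descent is not specified here.

```python
import numpy as np

def embedding_terms(Y):
    """Pairwise differences, squared distances, t-kernel values and q_ij for coordinates Y."""
    diff = Y[:, None, :] - Y[None, :, :]   # (n, n, dims)
    d2 = (diff ** 2).sum(axis=-1)          # squared Euclidean distances d_ij^2
    w = 1.0 / (1.0 + d2)                   # unnormalized Student's t kernel
    np.fill_diagonal(w, 0.0)               # only i != j contributes
    Q = w / w.sum()                        # normalization by Z-tilde in (7)
    return diff, d2, w, Q

def kl_loss(P, Y, eps=1e-12):
    """Kullback-Leibler objective (5); entries with p_ij = 0 contribute nothing."""
    _, _, _, Q = embedding_terms(Y)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

def kl_gradient(P, Y):
    """Gradient of (5) with respect to Y, assuming P sums to 1 over i != j.

    For symmetric P this reduces to the familiar t-SNE form
    4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + d_ij^2).
    """
    diff, _, w, Q = embedding_terms(Y)
    coeff = (2.0 * (P + P.T) - 4.0 * Q) * w
    return (coeff[:, :, None] * diff).sum(axis=1)

def is_feasible(Y, r=1.0, R=50.0):
    """Check the minimum-distance constraint (9) and the box constraint (10)."""
    _, d2, _, _ = embedding_terms(Y)
    off_diag = np.sqrt(d2)[~np.eye(len(Y), dtype=bool)]
    return bool(off_diag.min() >= r and np.abs(Y).max() <= R)
```

A plain descent step would be `Y -= learning_rate * kl_gradient(P, Y)`, where `learning_rate` is a hypothetical step size; the step alone does not guarantee feasibility, which is why the constraint check is kept separate.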
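The PCA rotation of the embedded coordinates can be sketched as follows. This is a minimal example under the assumption that the rotation is obtained from the singular value decomposition of the centered coordinates; the function name `rotate_by_pca` is hypothetical. Because the transform is orthogonal, pairwise distances between cells are unchanged.

```python
import numpy as np

def rotate_by_pca(Y):
    """Rotate coordinates so that the first axes carry the largest variance.

    Y : (n_cells, dims) array of embedded coordinates, with n_cells >= dims.
    """
    centered = Y - Y.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T
```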

Density analysis and clustering spatially compact cell clusters

Given the three-dimensional coordinates of all cells, a straightforward analysis is to check what spatial structures are formed, which can be examined visually and quantitatively. CSOmap implements a series of visualization methods to facilitate the recognition of spatially organized structures, including global three-dimensional views with various rotation angles, cross-section views with various slicing methods, and even dynamic views to show how cells self-assemble into organizations via ligand-receptor mediated interactions. Categorical or numerical features can be used to color cells to highlight the patterns. Quantitatively, the compactness of the neighborhood of a cell, named density, can be calculated by counting the number of cells within a predefined radius. By default, the radius is set to the median distance of each cell to its third nearest neighbor because the number of neighbors of a cell cannot be too large due to limited space. Once the density of individual cells is defined, clustering based on fast searching and finding density peaks16 can be applied to identify spatially compact cell clusters and dissociative cells. Sensitivity analysis suggested that the identification of compact structures is robust to the selection of the radius.
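The density definition above can be sketched in a few lines of NumPy. The function names are hypothetical, a cell is not counted as its own neighbor, and cells exactly at the radius are counted as being within it; both conventions are assumptions of this example. The density-peak clustering step is not shown.

```python
import numpy as np

def pairwise_distances(Y):
    """Euclidean distance matrix for coordinates Y of shape (n_cells, dims)."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def default_radius(D, nth=3):
    """Median over cells of the distance to the nth nearest neighbor.

    After sorting each row, column 0 is the cell itself (distance 0),
    so column nth is the nth nearest neighbor.
    """
    return float(np.median(np.sort(D, axis=1)[:, nth]))

def cell_density(D, radius):
    """Number of other cells within `radius` of each cell."""
    return (D <= radius).sum(axis=1) - 1  # subtract the cell itself
```

For example, with `D = pairwise_distances(Y)`, calling `cell_density(D, default_radius(D))` gives one density value per cell.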

Evaluating the statistical significance of cellular interactions between cell clusters

The resultant three-dimensional coordinates of CSOmap also allow us to examine whether two cell clusters tend to interact with each other significantly and thus locate close to each other. Given a threshold defining the neighborhood radius of a cell, e.g., the median distance of cells to their third nearest neighbors, a pair of cells can be assumed to “directly” interact with each other if their distance is less than the cutoff. Therefore, the total number of cell-cell interactions between two clusters can be counted. The statistical significance of the observed number of cell-cell interactions can be further evaluated by random permutation of the cluster labels of individual cells. With 1000 random permutations, the distribution of the randomly expected number of cell-cell interactions between the given two cell clusters can be constructed, and thus right-tailed and left-tailed tests can be conducted to calculate the P-values for the hypotheses that the observed interaction number is larger than that randomly expected (enrichment) and that the observed number is smaller than that randomly expected (depletion), respectively. When the P-values for enrichment between all clusters are obtained, the Benjamini-Hochberg procedure is used to estimate the q-values. If the enrichment (depletion) q-value for a given pair of cell clusters is less than 0.05, these two cell clusters are claimed to significantly interact with (disperse away from) each other. Otherwise, if the enrichment and depletion q-values are both larger than 0.05, the cell-cell interactions are assigned to the “other” type.
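The label-permutation test and the Benjamini-Hochberg adjustment described above can be sketched as follows. This is an illustrative example, not the CSOmap code: the function names are hypothetical, the two clusters are assumed to be distinct, and the add-one form of the empirical P-value is a choice made here rather than one stated in the text.

```python
import numpy as np

def count_interactions(D, labels, a, b, cutoff):
    """Number of cell pairs (one from cluster a, one from b) closer than cutoff."""
    in_a = labels == a
    in_b = labels == b
    return int(np.sum(D[np.ix_(in_a, in_b)] < cutoff))

def permutation_pvalues(D, labels, a, b, cutoff, n_perm=1000, seed=0):
    """Right-tailed (enrichment) and left-tailed (depletion) empirical P-values."""
    rng = np.random.default_rng(seed)
    observed = count_interactions(D, labels, a, b, cutoff)
    null = np.array([
        count_interactions(D, rng.permutation(labels), a, b, cutoff)
        for _ in range(n_perm)
    ])
    # Add-one correction (assumption) keeps the P-values strictly positive.
    p_enrich = (1 + np.sum(null >= observed)) / (1 + n_perm)
    p_deplete = (1 + np.sum(null <= observed)) / (1 + n_perm)
    return observed, p_enrich, p_deplete

def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted P-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```

In use, the enrichment P-values of all cluster pairs would be collected into one array and passed to `bh_qvalues`, and pairs with q < 0.05 would be reported as significantly interacting.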

Determining the dominant ligand-receptor pairs underlying cell-cell interactions

Given a pair of cells, the contribution of a specific ligand-receptor interaction to the cell-cell interacting affinity can be calculated by the following formula:

$$c_k^{ij} = \frac{{w_{L_k,R_k}(e_i^{L_k} \times e_j^{R_k} + e_i^{R_k} \times e_j^{L_k})}}{{\mathop {\sum}\nolimits_{k' = 1}^K {w_{L_{k'},R_{k'}}(e_i^{L_{k'}} \times e_j^{R_{k'}} + e_i^{R_{k'}} \times e_j^{L_{k'}})} }}$$
(11)

where $$c_k^{ij}$$ denotes the contribution of the kth ligand-receptor interaction to the cell-cell affinity of the ith and jth cells. Therefore, given a pair of cell clusters, the contribution of a specific ligand-receptor pair to the interactions of the two clusters can be calculated by the following formula:

$$c_k^{{\mathrm{c}}_a,{\mathrm{c}}_b} = \frac{1}{N}\mathop {\sum}\limits_{\substack{i \in {\mathrm{c}}_a,\, j \in {\mathrm{c}}_b \\ d_{ij} \le T}} {c_k^{ij}}$$
(12)

where $$c_k^{{\mathrm{c}}_a,{\mathrm{c}}_b}$$ denotes the contribution of the kth ligand-receptor interaction to the interactions of two clusters $${\mathrm{c}}_a$$ and $${\mathrm{c}}_b$$, and N is the total number of cell pairs between $${\mathrm{c}}_a$$ and $${\mathrm{c}}_b$$ that conform to the definition of “direct” cell-cell interaction, i.e., the distance between two cells i and j does not exceed the predefined threshold T ($$d_{ij} \le T$$). When the contributions of all the ligand-receptor pairs are calculated, the ligand-receptor pairs with the highest scores are assumed to be the dominant molecular contributors underlying the interactions between the two cell clusters.
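Formulas (11) and (12) can be illustrated with the short sketch below. The function name and the cells-by-pairs layout of the ligand and receptor expression matrices are assumptions of this example, as is the convention that a cell pair with zero total affinity counts toward N but adds nothing to the sum.

```python
import numpy as np

def cluster_contributions(L, R, D, labels, a, b, T, w=None):
    """Average contribution of each ligand-receptor pair, Formulas (11)-(12).

    L, R   : (n_cells, K) TPM values of the kth ligand / receptor in each cell
    D      : (n_cells, n_cells) distances in the embedded space
    labels : (n_cells,) cluster label of each cell
    a, b   : the two cluster labels; T : distance threshold for direct interaction
    w      : optional (K,) weights w_{L_k,R_k}; defaults to 1
    """
    K = L.shape[1]
    w = np.ones(K) if w is None else np.asarray(w, dtype=float)
    total = np.zeros(K)
    n_pairs = 0
    for i in np.flatnonzero(labels == a):
        for j in np.flatnonzero(labels == b):
            if i == j or D[i, j] > T:
                continue                      # not a "direct" interaction
            n_pairs += 1
            terms = w * (L[i] * R[j] + R[i] * L[j])
            s = terms.sum()
            if s > 0:
                total += terms / s            # Formula (11) for this cell pair
    return total / n_pairs if n_pairs else total  # Formula (12)
```

Sorting the returned vector in decreasing order then ranks the ligand-receptor pairs by their contribution to the interaction between the two clusters.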

Evaluating the spatial effects of individual genes or cell clusters by in silico interference

CSOmap also provides functions to analyze the effects of in silico interference with genes or cell clusters on the cellular organization. In reality, cell-cell interactions form a highly nonlinear system, and thus it is hard to predict the spatial effects of gene alterations or cell interference. By simulating the cellular spatial organization via ligand-receptor mediated self-assembly, CSOmap provides an easy way to interrogate the nonlinear effects of ligand/receptor or cellular changes on the cellular organizations, and thus can provide insights into biological mechanisms that are too expensive or even impossible to obtain by experimental methods. Currently, the in silico interference types of CSOmap include in silico gene knockdown, gene overexpression, adoptive cell transfer, and cell depletion. Once the cellular spatial organization with in silico interference is obtained, it can be compared to the original organization to identify significant differences. Although the current in silico interference can only examine the effects of ligands and receptors, it can be further enhanced in the future by incorporating gene-gene interactions to introduce dynamics for the expression levels of ligands and receptors. Since the output coordinates of CSOmap are in virtual spaces, it is currently not possible to directly compare the changes of cellular spatial organizations at the coordinate level. All the comparisons in the current manuscript were conducted after abstracting the coordinate results into cell-cell interacting graphs. For the HNC dataset, in silico CD63 overexpression was conducted by changing all the original CD63 expression values to TPM 5000. For the melanoma dataset, in silico CD63 knockdown was implemented by resetting the original expression values to zero. For the CRC T cell dataset, Treg depletion was implemented by removing all the cells belonging to the CD4-CTLA4 cluster.
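The three perturbations used in this manuscript amount to simple edits of the input before the affinity matrix and embedding are recomputed. The sketch below shows these edits on a genes-by-cells TPM matrix; the function names and the matrix layout are assumptions of this example, no renormalization of TPM values is performed after the edit, and adoptive cell transfer is not shown.

```python
import numpy as np

def set_gene_expression(tpm, genes, gene, value):
    """Return a copy of tpm with one gene set to `value` in every cell."""
    out = np.asarray(tpm, dtype=float).copy()
    out[genes.index(gene), :] = value
    return out

def knockdown(tpm, genes, gene):
    """In silico knockdown: reset the gene's expression to zero."""
    return set_gene_expression(tpm, genes, gene, 0.0)

def overexpress(tpm, genes, gene, level=5000.0):
    """In silico overexpression: set the gene to a fixed TPM level."""
    return set_gene_expression(tpm, genes, gene, level)

def deplete_cluster(tpm, labels, cluster):
    """In silico cell depletion: drop all cells carrying the given cluster label."""
    keep = np.asarray(labels) != cluster
    return np.asarray(tpm)[:, keep], np.asarray(labels)[keep]
```

The perturbed matrix is then passed through the same affinity estimation and embedding steps as the original data, and the two results are compared at the level of cell-cell interaction graphs.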