Identifying multicellular spatiotemporal organization of cells with SpaceFlow

Ren, Honglei; Walker, Benjamin L.; Cang, Zixuan; Nie, Qing

doi:10.1038/s41467-022-31739-w

Article
Open access
Published: 14 July 2022

Nature Communications volume 13, Article number: 4076 (2022)

Abstract

One major challenge in analyzing spatial transcriptomic datasets is to simultaneously incorporate the cell transcriptome similarity and their spatial locations. Here, we introduce SpaceFlow, which generates spatially-consistent low-dimensional embeddings by incorporating both expression similarity and spatial information using spatially regularized deep graph networks. Based on the embedding, we introduce a pseudo-Spatiotemporal Map that integrates the pseudotime concept with spatial locations of the cells to unravel spatiotemporal patterns of cells. By comparing with multiple existing methods on several spatial transcriptomic datasets at both spot and single-cell resolutions, SpaceFlow is shown to produce a robust domain segmentation and identify biologically meaningful spatiotemporal patterns. Applications of SpaceFlow reveal evolving lineage in heart developmental data and tumor-immune interactions in human breast cancer data. Our study provides a flexible deep learning framework to incorporate spatiotemporal information in analyzing spatial transcriptomic data.

Introduction

The spatiotemporal pattern of gene expression is critical to unraveling key biological mechanisms from embryonic development to disease. Recent advances in spatially resolved transcriptomics (ST) technologies provide new ways to characterize gene expression with spatial information that the popular nonspatial single-cell RNA-sequencing (scRNA-seq) method is unable to capture. The majority of current ST technologies may be categorized as in situ hybridization (ISH)-based or spatial barcoding-based, varying in gene throughput and resolution^1,2,3. ISH-based methods, such as Multiplexed Error-Robust Fluorescence ISH (MERFISH) and sequential fluorescence ISH (seqFISH), can detect target transcripts at sub-cellular resolution for about 100–1000 and 10,000 genes respectively^4,5. Spatial barcoding-based methods can capture the whole transcriptome at varying spatial spot resolutions, such as Visium at 55 μm, Slide-seq at 10 μm⁶, and spatiotemporal enhanced resolution omics sequencing (Stereo-seq) at nanometer (subcellular) resolution⁷.

Compared to non-spatial technologies such as scRNA-seq, the presence of spatial information in ST data necessitates development of methods that natively handle the high-dimensional features in space. High-dimensional spatially aware analyses have been previously explored largely in the context of image data^8,9,10,11. By considering each gene as one channel in an image, spatial transcriptomic data may be abstracted as high-dimensional images. However, uncovering the biological interactions between genes in tissue requires new computational methods tailored specifically to transcriptomic data.

Many methods developed for non-spatial transcriptomic data such as scRNA-seq or bulk spatial transcriptomics data^12,13 may provide insights in designing approaches for ST data at single-cell resolution through recasting the relevant tasks in a spatial manner. For example, the identification of spatially variable genes in ST data^14,15 can be viewed as the spatial extension of the highly variable genes in scRNA-seq data. Similarly, methods have been developed to identify spatial domains in ST data¹⁶, the analog of cell clustering in scRNA-seq data analysis, but using spatial information to produce spatially coherent regions. Giotto¹⁷, BayesSpace¹⁸, and SC-MEB¹⁹ use Markov random fields to model the related gene expression in neighboring cells. stLearn utilizes morphological information to perform spatial smoothing before clustering²⁰. MULTILAYER uses graph partitioning to segment tissue domains²¹. MERINGUE performs graph-based clustering using a weighted graph that combines spatial and transcriptional similarity²². SpaGCN²³, SEDR²⁴, SCAN-IT²⁵, stMVC²⁶, and STAGATE²⁷ build deep auto-encoder networks to learn low-dimensional embeddings of both gene expression and spatial information, and segment domains through embedding clustering. RESEPT learns a three-dimensional embedding from ST data by a spatial retained graph autoencoder and treats the embedding as a 3D image, identifying domains through image segmentation using a convolutional neural network²⁸.

The domain segmentation methods reviewed above are the ST counterpart of cell clustering in scRNA-seq data analysis. In contrast to discrete clustering, continuous pseudotime is another powerful concept in scRNA-seq analysis that can represent developmental trajectories. The dynamics of many developing systems such as regeneration and cancer progression are often spatially organized^29,30. The ST data thus provides an opportunity to simultaneously reveal both spatial and temporal structures of development. While pseudotime methods for scRNA-seq can be directly applied to ST data, the resulting trajectory may be discontinuous in space. stLearn combines non-spatial pseudotime with spatial distance by a simple average, and filters connections between clusters inferred by scRNA-seq trajectory inference methods using a spatial distance cutoff, but the resulting connections are limited by the initial pseudotime trajectories inferred without using spatial information²⁰. There is thus a demand for computational tools for integrative reconstruction of fine-resolution spatiotemporal trajectories from ST data that are continuous in both time and space.

As pseudotime trajectories are traditionally computed from a low-dimensional embedding of transcriptomic data³¹, the computation of spatiotemporal trajectories can be viewed as a problem of constructing spatially aware embeddings of ST data. Multiple strategies for computing spatially aware embeddings may be used, such as Hierarchical SNE³², Hierarchical UMAP³³, and dual embedding³⁴. Additionally, deep graph neural network-based approaches, such as DeepWalk³⁵, Variational Graph Auto-Encoder (VGAE)³⁶, Graph2Gauss³⁷, and Deep Graph Infomax (DGI)³⁸, while computationally more expensive, have been utilized for ST data due to their flexibility to model and learn non-linear and complex salient spatial dependencies between genes and cells.

In this work, we develop a framework to reveal continuous temporal relationships with spatial context using ST data. By combining a DGI framework with spatial regularization designed to capture both local and global structural patterns, we extract a spatially consistent low-dimensional embedding and construct a pseudo-Spatiotemporal Map (pSM), representing a spatially coherent pseudotime ordering of cells that encodes biological relationships between cells, along with a region segmentation. We compare SpaceFlow with five existing methods on six ST datasets, demonstrating competitive performance on benchmarks, and use SpaceFlow to reveal evolving cell lineage structures, spatiotemporal patterns, cell-cell communications, tumor-immune interfaces and spatial dynamics of cancer progression.

Results

Overview of SpaceFlow

SpaceFlow takes Spatial Transcriptomic (ST) data as input (Fig. 1a) and outputs a spatially consistent low-dimensional embedding, domain segmentation, and pseudo-Spatiotemporal Map (pSM) of the tissue. The input ST data consists of an expression count matrix and spatial coordinates of cells or spots. The output embedding encodes the expression of ST data so that nearby embeddings in the latent space reflect not only the similarity in expression but also spatial proximity. The domain segmentation characterizes the spatial patterns of tissue without the need for histological or pathological knowledge. The pSM is a map that represents the pseudo-spatiotemporal relationship of cells in ST data.

Before applying the deep graph network, a Spatial Expression Graph (SEG) is constructed (Fig. 1b), in which nodes represent cells with their expression profiles attached and edges model the spatial adjacency relationships between cells. In addition, an Expression Permuted Graph (EPG) is constructed by randomly permuting the nodes in the SEG and is used as negative input for the network. To encode the SEG into low-dimensional embeddings, a graph convolutional encoder is built with Parametric ReLU (PReLU) as activation (Fig. 1c). The graph convolutional encoder applies a weighted aggregation to the expression of a cell with its spatial neighborhood to capture local expression patterns in the embeddings. We utilize a Deep Graph Infomax (DGI) framework to train the encoder³⁸, which optimizes a Discriminator Loss (Fig. 1d bottom) to learn to distinguish the embeddings from SEG and EPG input. Compared to other GCN architectures, this allows the encoder to learn embeddings that emphasize specifically the spatial expression patterns that correspond to meaningful structure as opposed to those due to non-spatial variation or noise.
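The construction of the two input graphs can be illustrated with a minimal sketch. It assumes that spatial adjacency is defined by k nearest neighbors in Euclidean distance, which this section does not specify; the function names, the value of `k`, and the toy data are hypothetical and are not taken from the SpaceFlow implementation.

```python
import math
import random

def build_spatial_graph(coords, k=2):
    """Return an undirected edge set linking each cell to its k nearest
    spatial neighbors (assumed adjacency rule, for illustration only)."""
    edges = set()
    for i, p in enumerate(coords):
        # Rank every other cell by Euclidean distance to cell i.
        others = sorted((j for j in range(len(coords)) if j != i),
                        key=lambda j: math.dist(p, coords[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def permute_expression(expression, seed=0):
    """Shuffle expression profiles across nodes while the edges stay fixed,
    giving the negative (permuted) input."""
    rng = random.Random(seed)
    order = list(range(len(expression)))
    rng.shuffle(order)
    return [expression[i] for i in order]

# Toy data: four cells with 2D coordinates and two genes each.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
expression = [[3, 0], [2, 1], [0, 4], [1, 1]]

seg_edges = build_spatial_graph(coords, k=2)      # graph used with true profiles
epg_expression = permute_expression(expression)   # same graph, shuffled profiles
```

Because only the node attributes are shuffled, the permuted graph keeps the tissue geometry but breaks the link between location and expression, which is what the discriminator is trained to detect.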

Distant cells of the same cell type may exhibit a high degree of transcriptional similarity even when in very different parts of tissue. Consequently, in order to produce embeddings that most meaningfully represent spatial structure, one needs spatial consistency in embeddings, meaning that the latent space embeddings should be distant not only if their expression profile is distinct, but also when the expression is similar but their spatial locations are distant. We use embedding regularization to enforce this structure in the latent space (Fig. 1d), which takes the spatial distance matrix and the embedding distance matrix of cells or spots as input. These two matrices are then input into linear kernels (Spatial kernel and Embedding kernel) to calculate the loss of each cell pair based on the spatial or embedding distance. The spatial losses and embedding losses of cells from these kernels are then combined to produce the final regularization loss which is added to the discriminator loss used to train the encoder (Methods). The learned low-dimensional embeddings (Fig. 1e) for the ST data are then used in downstream analysis, including the pseudo-Spatiotemporal Map (pSM), domain segmentation, and low-dimensional visualization (Fig. 1f) to analyze spatiotemporal patterns of tissues.
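To make the idea of spatial consistency concrete, the sketch below computes one possible pairwise penalty. The exact kernels are defined in Methods, which this section does not reproduce, so the form used here (scaled spatial distance multiplied by one minus scaled embedding distance, averaged over all cell pairs) is an assumption chosen only to show the intended behavior: pairs that are far apart in space but close in the latent space are penalized. All names are hypothetical.

```python
import math

def pairwise_distances(points):
    """Full Euclidean distance matrix as nested lists."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def scale_to_unit(matrix):
    """Divide by the largest entry so that all distances lie in [0, 1]."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[v / peak for v in row] for row in matrix]

def spatial_regularization(coords, embeddings):
    """Assumed penalty: large when spatially distant cells have nearby embeddings."""
    d_space = scale_to_unit(pairwise_distances(coords))
    d_embed = scale_to_unit(pairwise_distances(embeddings))
    n = len(coords)
    total = sum(d_space[i][j] * (1.0 - d_embed[i][j])
                for i in range(n) for j in range(n))
    return total / (n * n)

# Three cells on a line; the third is far from the other two.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
consistent = [(0.0,), (0.1,), (1.0,)]      # embedding mirrors the spatial layout
inconsistent = [(0.0,), (1.0,), (0.05,)]   # distant cell embedded next to cell 0

loss_consistent = spatial_regularization(coords, consistent)
loss_inconsistent = spatial_regularization(coords, inconsistent)
```

In this toy case the spatially inconsistent embedding receives the larger penalty, so adding such a term to the discriminator loss pushes the encoder toward embeddings that respect tissue layout.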

Comparison of SpaceFlow with five existing methods for ST data at spot resolution

To evaluate the quality of the SpaceFlow embeddings, we compared it with five existing methods for unsupervised segmentation on ST data: one non-spatial method Seurat v4³⁹, and four spatial methods Giotto¹⁷, stLearn²⁰, MERINGUE²², and BayesSpace¹⁸ on a 10x Visium human Dorso-Lateral Pre-Frontal Cortex (DLPFC) dataset consisting of twelve samples⁴⁰. Spots are annotated as one of six layers (layer 1 through layer 6) or white matter, and these annotations are used as the ground truth for benchmarking.

To compare the domain segmentation performance quantitatively, we used the adjusted Rand Index (ARI) to measure the similarity between the inferred domains and the expert annotations across all twelve sections (Fig. 2a). SpaceFlow shows a 0.427 median ARI score, the second-highest across the six methods, slightly lower than BayesSpace, which has a 0.438 median ARI. MERINGUE shows the lowest median ARI score (0.232), followed by Seurat (0.300), Giotto (0.332), and stLearn (0.369). Interestingly, the DGI method without the spatial regularization used in SpaceFlow shows a significant decrease in ARI, with a 0.332 median score, indicating that spatial regularization does effectively improve the domain segmentation of DGI.
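For readers unfamiliar with the metric, the following is a minimal standard-library sketch of the adjusted Rand Index computed from the contingency counts of two labelings. It is an illustration of the standard formula, not the benchmarking code used in the study, and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two labelings of the same items (1.0 means identical partitions)."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    # Pairs of items that share a cluster in both, in truth only, in prediction only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)   # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

# Label names do not matter, only the grouping does.
ari_same = adjusted_rand_index(["L1", "L1", "WM", "WM"], [1, 1, 0, 0])
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The score is invariant to how domains are named, which is why it suits comparing unsupervised domain labels against layer annotations; values near zero or below indicate agreement no better than chance.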

**Fig. 2: Comparison with five unsupervised methods shows that SpaceFlow can identify biologically meaningful spatial domains and generate spatially consistent low-dimensional embeddings.**

Next, we performed a more detailed analysis on section 151671 (Fig. 2b–f). We first computed the domain segmentation for each method and visualized the output compared to the expert annotation (Fig. 2b). It is seen that all methods fail to capture the subtle structure of Layer 4 (L4), suggesting that this ST data does not have the necessary spatial resolution to capture the L4 structure. Both SpaceFlow and BayesSpace can capture all the remaining structures (L3, L5, L6, and WM) observed in the annotation. Moreover, BayesSpace identified the outer ring of the WM as an additional structure, whereas SpaceFlow found a different structure at the top right part of Layer 3 (labeled as domain 3 in orange). The structure found in SpaceFlow is consistent with the domains from Giotto, stLearn, and MERINGUE. stLearn also identified L5, L6, and WM that are consistent with the annotation but with noisy boundaries between domains. Giotto and MERINGUE identified the L6 and WM domain but are unable to identify the boundary of L5 along with the same noisy boundary issues. Seurat showed an overall disordered domain structure and can barely capture the white matter (WM) structures. The DGI method (Fig. 2b), like SpaceFlow but without spatial regularization, showed a layered structure with inconsistent boundary shape and non-contiguous domains, reinforcing the importance of spatial regularization. Similar results can also be observed in samples 151507 and 151673 (Supplementary Fig. 1a, c).

We next compared the low-dimensional embeddings from Seurat, stLearn, DGI, and SpaceFlow (the segmentation methods Giotto, MERINGUE, and BayesSpace do not produce embeddings), applying UMAP to the embeddings to produce a two-dimensional visualization of all spots colored by the layer annotation. We observe that SpaceFlow produces embeddings that separate the spots by layer more clearly than stLearn, DGI, and Seurat (Fig. 2c). As the separation between low-dimensional embeddings of the regions provides an upper limit on the ability of segmentation to separate the regions, this shows that the incorporation of spatial regularization produces more distinct embeddings between different layers and thereby a greater ability to distinguish them in downstream analysis.

To study how the low-dimensional embeddings from different methods encode spatial information, we show the same UMAP embeddings colored by the spatial distances between the spot and the origin (Fig. 2d), so that embeddings which preserve the spatial structure will maintain this color gradient. Seurat embeddings exhibit a high level of mixing in this color, indicating significant deformities in both local and global structure, whereas DGI shows local color gradients with a minor color mixture and shows no global gradient structure. In contrast, SpaceFlow and stLearn both exhibit a clear global gradient structure with a clear coloring trend in each annotation layer. This indicates the learned embeddings from SpaceFlow and stLearn encode transcriptional information while also preserving the local and global spatial structure of the data.

To check whether the identified domains from SpaceFlow are biologically meaningful, we performed a domain-specific expression analysis. We found spatially specific expression patterns for the identified domain-specific genes. For instance, the top domain-specific gene for domain 3 (orange) in SpaceFlow is SAA1. Within domain 3, it is expressed in 90% of cells with a mean expression of 0.96 (scaled from 0 to 1), whereas outside this domain it is expressed in less than 30% of cells with a mean expression of approximately 0.13 (Fig. 2e). This spatial expression specificity is also clear in the spatial expression heatmap showing top-1 marker genes for each domain (Fig. 2f). Among these genes, we found that PCP4 was previously reported as the marker for layer 5 in prefrontal cortex⁴¹. The other genes have clear layer correlations although not previously reported, suggesting new experiments are needed for validating the potential new marker genes. We carried out a Gene Ontology (GO) analysis for the domain-specific genes whose p value is less than 0.01 in domain 3 (Fig. 2g). We observed GO terms associated with regulation of cAMP-dependent protein kinase activity, ionotropic glutamate receptor signaling pathway, regulation of long-term synaptic potentiation, regulation of synaptic plasticity, etc. This suggests that the spatially specific expression in the identified domain 3 may be related to long-term synaptic activity, which is consistent with the observation that several top domain-3 specific genes such as MALAT1 (p value < 5.5e−33)⁴², CAMK2A (p value < 7.4e−23)⁴³, and PPP3CA (p value < 4e−16)⁴⁴ are involved in long-term synaptic potentiation. The fact that this gene expression is clearly related to neural activity indicates a meaningful subdivision of Layer 3 despite a lack of annotations for this region. This suggests that the expert annotations, even if accurately describing the layer structure, may not paint a complete picture of the spatial structure within the data.
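The two statistics quoted for a domain-specific gene (fraction of expressing cells and mean scaled expression, inside versus outside a domain) can be sketched as follows. The sketch assumes min-max scaling over all cells and treats any value above zero as expressed; both choices are assumptions, and the function name and toy values are hypothetical.

```python
def domain_expression_summary(expression, domains, target):
    """Return ((fraction expressing, mean scaled expression) inside the target
    domain, and the same pair outside it). Assumes both groups are non-empty."""
    lo, hi = min(expression), max(expression)
    span = (hi - lo) or 1.0
    scaled = [(v - lo) / span for v in expression]   # min-max scaling to [0, 1]

    def summarize(indices):
        frac = sum(expression[i] > 0 for i in indices) / len(indices)
        mean = sum(scaled[i] for i in indices) / len(indices)
        return frac, mean

    inside = [i for i, d in enumerate(domains) if d == target]
    outside = [i for i, d in enumerate(domains) if d != target]
    return summarize(inside), summarize(outside)

# Toy gene measured in six spots; the first three belong to domain 3.
expression = [4.0, 3.0, 5.0, 0.0, 0.0, 1.0]
domains = [3, 3, 3, 0, 1, 1]
inside_stats, outside_stats = domain_expression_summary(expression, domains, target=3)
```

A gene with a high expressing fraction and high mean inside the domain, and low values for both outside it, is a candidate domain-specific marker in the sense used above.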

SpaceFlow uncovers pseudo-spatiotemporal relationships among cells

Next, we study the pseudo-Spatiotemporal Map (pSM) computed by SpaceFlow. Different from traditional pseudotime as used in scRNA-seq analysis, which only considers the similarity in expression between cells, the pSM considers both spatial and transcriptional relationships among cells simultaneously (Methods). In spatial visualizations of the pseudotimes produced by Seurat and Monocle, traditional single-cell pseudotime methods that do not incorporate spatial information, we observed a lack of layered patterns as well as significant noise (Fig. 3a). In contrast, both spatially aware methods tested, stLearn and SpaceFlow, present a layer-patterned pSM with a clear and smooth color gradient (Fig. 3a), suggesting a pseudo-spatiotemporal ordering from White Matter (WM) to Layer 3. This ordering mirrors the correct inside-out developmental sequence of cortical layers and reflects the layered spatial organization of the tissue. However, stLearn shows less consistency with the annotation in the White Matter (WM) region when compared to SpaceFlow. Similar patterns can also be observed in samples 151507 and 151673 (Supplementary Fig. 1b, d). We also ran SpaceFlow on additional 10x Visium ST datasets; the results can be found in the Supplementary Information (Supplementary Fig. 3).

**Fig. 3: SpaceFlow generates pseudo-Spatiotemporal Map for ST data and uncovers pseudo-spatiotemporal relationship between cells in both spots-resolution and single-cell resolution ST data.**

To test the capability of SpaceFlow on single-cell resolution ST data with a large number of cells, we evaluated SpaceFlow on a Stereo-seq dataset from mouse olfactory bulb tissue, capturing 28243 genes across 18197 cells⁷, comparing with traditional pseudotime methods Seurat, Monocle, and Slingshot. We observed that Seurat shows little variation in pseudotime across the tissue except for the outer rings, which show slightly higher pseudotime values than in the inside. Monocle is much noisier than Seurat and shows no clear patterns. Slingshot is similar to Seurat and exhibits outer-ring patterns. By contrast, SpaceFlow presents a clear layered pattern mirroring the annotated layers of the olfactory bulb tissue (Fig. 3b). The pSM value (red) is lowest in the external plexiform layer (EPL) and then increases when moving away in both directions. This ordering in the pSM is consistent with the developmental sequence of these layers, where starting from the central EPL, development proceeds bilaterally outwards, leading to the mitral cell layer (MCL) and glomerular layer (GL), olfactory nerve layer (ONL), and the granule cell layer (GCL) develops last⁴⁵. The highest values are observed on the inner side, with the peak in the granule cell layer (GCL) and the rostral migratory stream (RMS). This shows that the pSM computed by SpaceFlow is not only more clearly spatially organized than non-spatial pseudotime, but that these results also more accurately represent the temporal and developmental relationships between cells.

We next identify marker genes from the pSM. By calculating the top genes by correlation with the pSM values, we found genes that are predominantly expressed in layers of the olfactory bulb tissue (Fig. 3c). One of the top marker genes, NRGN, shows clear expression patterns localized in the granule cell layer (GCL), and previous experiments have shown that NRGN is usually expressed in granule-like structures in pyramidal cells of the hippocampus and cortex⁴⁶. This shows how the pSM can be used to facilitate biomarker identification for tissues.
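Ranking genes by their correlation with the pSM can be sketched as below. The sketch assumes Pearson correlation and ranks by absolute value so that genes decreasing along the pSM are also surfaced; the section does not state which correlation or sign convention was used. The gene names and values are hypothetical toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def rank_genes_by_psm(expression_by_gene, psm):
    """Sort genes by the absolute correlation of their expression with the pSM."""
    scores = {gene: pearson(values, psm) for gene, values in expression_by_gene.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Toy data: pSM values for five cells and three hypothetical genes.
psm = [0.0, 0.25, 0.5, 0.75, 1.0]
expression_by_gene = {
    "geneA": [0, 1, 2, 3, 4],   # rises steadily along the pSM
    "geneB": [5, 5, 5, 5, 5],   # constant, carries no pSM signal
    "geneC": [4, 3, 1, 2, 0],   # mostly falls along the pSM
}
ranked = rank_genes_by_psm(expression_by_gene, psm)
```

Genes at the top of such a ranking vary monotonically along the pSM and are therefore natural candidates for layer-restricted markers.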

Next, we compared the domain segmentation performance of SpaceFlow against Seurat, running without incorporating spatial data, as well as the spatial methods MERINGUE and DGI on the Stereo-seq data. We show a global and a zoomed view of the identified domains for each method (Fig. 3d). The Seurat segmentation is characterized by two large regions: all inner layers except the olfactory nerve layer (ONL) are mainly combined into one region (domain 0 in dark blue) and the ONL is segmented as another (domain 1 in orange). However, even in the inner layers, there are many cells classified as domain 1 (orange), lacking a clear separation between domains. To control for the effect of the resolution parameter, we increased it from 0.3 to 0.8, 1.0, and 2.0, resulting in 13, 15, and 30 clusters respectively (Supplementary Fig. 2a). However, the spatial consistency of clusters does not improve with a higher resolution parameter. MERINGUE identified three major layers, one more than Seurat, which corresponds to the external plexiform layer (EPL) in the annotation in Fig. 3b; however, there is significant spatial noise and it is difficult to see boundaries between tissue layers even in the zoomed view. With the DGI method, we observed a layered structure of domains, but significant mixing of domains is still visible in the zoomed view. In SpaceFlow, the eight-layer structure is much clearer, as nearly no mixture occurs between the corresponding ONL (domain 2 in green) and GL (domain 5 in silver gray) regions, whereas there are clear mixtures of labels by the other methods across all regions.

SpaceFlow reveals evolving cell lineage structures in chicken heart development ST data

To study how the pSM may be used to uncover spatial expression dynamics in embryonic development, we retrieved and utilized an ST dataset on the chicken heart²⁹ at four key Hamburger-Hamilton ventricular developmental stages. The dataset contains 12 tissue sections in total, and is sequenced at day 4 (5 sections), day 7 (4 sections), day 10 (2 sections) and day 14 (1 section). To build a baseline for comparison, we first visualized spot annotation from the original study (Fig. 4a). Then, we computed the domain segmentation from SpaceFlow for each time point (Fig. 4b) and labeled the domains based on their top marker genes as compared with the literature (Details in Supplementary Data 1).

We first found an evolving lineage structure, annotated as Valve in Fig. 4b. This newly identified structure is evident from Day 7 (D7) with a layered structure and consistent shape during heart development. The identified structure is consistent with the anatomical regions of the chicken heart at the sequenced stages^29,47. We also characterized the transition dynamics of the myocardium from the immature to the mature state across the period from D4 to D14 (immature myocardium annotated by orange/yellow change into cardiomyocytes annotated in red/pink). Moreover, we identified that the epicardium structure (annotated in green) on the outer ring of immature myocardium transformed into cardiomyocytes from D7 to D14.

To better understand the spatiotemporal organization of the chicken heart during development, we computed the pSM for each time point separately (Fig. 4c), considering that pseudotime across tissues from different time points may not be comparable. Similar to the domain segmentation, the identified valve structures are clear from D7 to D14 in the pSM. In addition, the myocardium in the ventricles is more homogeneous in the pSM (blue) than in the domain segmentation. This suggests that differences within the ventricular myocardium may be much more subtle than those between regions showing different pSM values. We also found the annotated myocardium in ventricles to consistently show higher pSM values (blue) than other regions, which indicates the pseudo-spatiotemporal ordering of the myocardium in the ventricles is later than other regions in the same stage. By contrast, the identified valve structures show yellow color in the pSM from D7 to D14, suggesting the ordering is relatively late compared with the regions colored in red or orange. By plotting pSM values of spots against the first component of the UMAP embedding (Fig. 4d), similar patterns can be observed, where the cells with valve annotations colored in blue show intermediate pSM values (y-axis) and lie in the middle of the trajectories in Fig. 4d. These spatiotemporal patterns revealed in the pSM are consistent with previous observations in chicken cardiac development⁴⁷.

Through a hierarchical clustering for domains across all four stages based on the expression of top domain-specific marker genes, we found expression programs specific to evolving structures (Fig. 4e). We observed the valves of D7 and D10 to be similar to each other in expression, with genes that regulate cell growth and proliferation such as S100A11, S100A6, and CNMD, as well as genes associated with cell-collagen interaction such as TGFBI, found as the top marker genes for these populations. We also performed GO analysis to study the function of identified valve structures (Fig. 4f) and found enrichment of GO terms associated with negative regulation of BMP signaling pathway and negative regulation of epithelial-mesenchymal transformation (EMT). Previous studies found that EMT mediated by BMP2 is required for signaling from the myocardium to the underlying endothelium to form endocardial cushion (EC), which ultimately gives rise to the mature heart valves and septa⁴⁸. We also observed enrichment of positive regulation of canonical Wnt signaling pathway, previously shown as a regulator of endocardial cushion maturation as well as valve leaflet stratification, homeostasis, and pathogenesis⁴⁹.

To investigate cell-cell communication between the identified valve structures and other tissue regions, we performed space-constrained CellChat analysis⁵⁰ using the domain labels from SpaceFlow as groupings. The top two identified pathways for the valve structures are midkine (MK) and pleiotrophin (PTN), which belong to the subfamily of heparin-binding growth factors. We observed strong signaling in MDK-SDC2, MDK-NCL, PTN-SDC2, and PTN-NCL ligand-receptor pairs from valve tissue to nearby immature cardiomyocytes and atrium cardiomyocytes (Fig. 4g). These interactions have various functions, such as angiogenesis, oncogenesis, and stem cell self-renewal, and play important roles in the regeneration of tissues, such as the myocardium, cartilage, neuron, muscle, and bone⁵¹. Studies have shown that midkine impedes the calcification of aortic valve interstitial cells through cell-cell communications⁵². In addition, SDC2 is required for migration of the bilateral heart fields towards the mid-line in a zebrafish model⁵³. Pleiotrophin (PTN) is usually considered a cytokine and growth factor that promotes angiogenesis⁵⁴. Together, the observed cell-cell communication based on the structures identified by SpaceFlow suggests that anti-calcification and pro-angiogenesis processes are important during the maturation of valve tissue.

SpaceFlow identifies tumor-immune microenvironment in human breast cancer ST data

To study the cancer microenvironment interaction and tumor progression, we applied SpaceFlow to human breast cancer ST data⁵⁵. We show here results for sample G, consistent with the original paper. Results for other samples can be found in the Supplementary Information (Supplementary Fig. 6). First, we performed domain segmentation and compared it to the expert annotation (Fig. 5b). The obtained domains were labeled based on their marker genes (Details in Supplementary Data 2). The regions in-situ cancer-1/2, APC,B,T-1/2, and invasive-1/2 identified in the SpaceFlow segmentation agreed with the annotations Immune rich, Immune: B/plasma, and Cancer 1 respectively from the original study. However, we also identified three tumor-immune interface regions, labeled as Tu.Imm.Itfc-1/2/3, which were labeled as mixtures of other cell types in the original study.

**Fig. 5: SpaceFlow identifies tumor-immune cell-cell communication in human breast cancer ST data.**

To reveal the pseudo-spatiotemporal relationship between spots in tissues, we generated the pSM and compared it with the spatially visualized pseudotime calculated by two alternatives: Monocle, which does not use spatial information, and DPT applied to spatially aware embeddings from stLearn. In the Monocle pseudotime, we observed regional patterns consistent with the Cancer: immune rich and Cancer 1 annotations from the original study (Fig. 5c). However, the spatial noise in the Monocle pseudotime makes visualizing the overall structure of cancer development difficult. In the pseudotime from stLearn, we can only observe two major types of regions, the Cancer: immune rich regions with larger pseudotime values, and other regions that are more homogeneous but noisy in pseudotime (Fig. 5d). In the SpaceFlow pSM, we see a much clearer representation of the spatiotemporal structure of the cancer cells, and the pattern is highly consistent with the expert annotation in Cancer: Immune rich, Cancer 1, and Immune: B/plasma regions (Fig. 5e). The in-situ cancer-1 regions show the lowest pSM values, whereas the Invasive-1 and Invasive-2 regions present the highest pSM values, which indicates that in-situ cancer-1 developmentally precedes the invasive regions. This trajectory can be seen clearly when we plot pSM values against the UMAP component 1 of the embeddings (Fig. 5f). A smooth progression is shown starting from the left bottom corner with the in-situ cancer-1 and branching into APC,B,T-1/2, and in-situ-cancer-2, which then merge into tumor-immune interface populations and end in the invasive-1/2 population. This suggests that in-situ-cancer-2 may have metastasized from in-situ cancer-1.

To study the characteristics of the tumor microenvironment, we identified marker genes for each domain (Fig. 5g). We found that invasive-1, invasive-2, and Invasive-Connective (Inva-Conn) share strong expression of the genes MMP11 and MMP14. Matrix metalloproteinase (MMP) family genes are involved in the breakdown of the extracellular matrix in processes such as metastasis⁵⁶. In the in-situ cancer-1 population, we observed region-specific expression of interferon-induced genes, such as IFI27 and IFI6, which are associated with cancer growth inhibition and apoptosis promotion⁵⁷. The in-situ cancer-2 population shows a strong and specific expression of TMEM59 and SOX4, both of which can promote apoptosis. In tumor-immune interfaces, we found both pro-tumor and anti-tumor gene expression. For instance, in tumor-immune interface-3, pro-tumor expression markers include TIMP1, a member of MMPs involved in the degradation of the extracellular matrix, whereas IGFBP4, PFDN5, and CD63 repress tumor progression^58,59. We visualized these dual activities of pro-tumor and anti-tumor expression and annotated them with pro-tumor or anti-tumor labels to confirm our observations (Fig. 5h). These dual activities are also confirmed in GO analysis (Fig. 5i). The enrichment of the marker genes of tumor-immune interface-1 shows pro-tumor GO terms such as negative regulation of intrinsic apoptotic pathway in response to DNA damage by p53 class mediator and negative regulation of plasmacytoid dendritic cell cytokine production (reduces type I interferon production). Anti-tumor enrichment is also found, such as positive regulation of T cell mediated cytotoxicity (promotes the killing of cancer cells) and antigen processing and presentation via MHC class I B (enhances antigen presentation).

To study the cell-cell communication between the invasive (or in-situ cancer) regions and the tumor-immune interfaces, we inferred cell-cell communication through Space-constrained CellChat analysis⁵⁰. We found strong cell-cell communication between the invasive tissue region and the nearby tumor microenvironment through the collagen pathway, which facilitates EMT and multiple processes associated with cancer progression and metastasis. Similar cell-cell communication is observed in in-situ cancer, where MDK-SDC1 and APP-CD74 signaling are observed to promote progression and metastasis (Fig. 5j, k). The detailed function annotations for the communicating ligands and receptors can be found in Supplementary Table 1.

Discussion

In this work, we presented SpaceFlow, which (1) encodes the ST data into low-dimensional embeddings reflecting both expression similarity and the spatial proximity of cells in ST data, (2) incorporates spatiotemporal relationships of cells or spots in ST data through a pseudo-Spatiotemporal Map (pSM) derived from the embeddings, and (3) identifies spatial domains with consistent expression patterns, clear boundaries, and less noise.

SpaceFlow achieves competitive segmentation performance with alternative methods when benchmarked against expert annotations. Furthermore, the pSM utilizes the spatially consistent embeddings to reveal pseudo-spatiotemporal patterns in tissue. In DLPFC and Stereo-seq data, the pSM shows layered patterns that are consistent with the developmental sequences of the human cortex and mouse olfactory bulb respectively, which is not visible from non-spatial pseudotime. Applied to chicken heart developmental data, the pSM reveals evolving lineage structures and uncovers the dynamics in the spatiotemporal relationships of cells across different developmental stages, helping to understand the changes of functional and structural organization in tissue development. Studying human breast cancer ST data using SpaceFlow, we demonstrate its potential to identify tumor-immune interfaces and dynamics of cancer progression, providing tools to study tumor evolution and interactions between tumor and the tumor microenvironment.

Though similarity in gene expression and spatial proximity are related in many cases⁶⁰, this relationship is not absolute. Pseudotime methods developed for scRNA-seq data, such as Monocle⁶¹ and Slingshot⁶², can produce developmental trajectories that are not spatially organized. The pSM developed here can generate spatially contiguous trajectories by using gene expression and spatial information together. Specifically, the spatial regularization in SpaceFlow constrains the low-dimensional embedding spatially so that the embedding is continuous both in space and in pseudotime. The low-dimensional spatial constraint also reduces noise in the high-dimensional gene expression data, resulting in smoother domain segmentation boundaries and spatiotemporal maps.

In practice, the training time of SpaceFlow on ST data with fewer than 10,000 cells is usually less than 5 min on a GPU. The computational cost of training largely depends on the calculation of the spatial regularization loss for model optimization, which is quadratic in the number of cells or spots. To accelerate model training, we compute this regularization loss over a random subset of cell-cell pairs (details in Methods). With a fixed number of cell pairs in the subset, the training can scale linearly with the number of cells or spots, and this approximation has been shown not to affect the outcome (Supplementary Fig. 2b–e). In the current implementation, training automatically switches to the approximated regularization strategy when it detects a cell population larger than 10,000. With this strategy, training time varies from 30 s to 3 min for numbers of cells/spots ranging from 3000 to 50,000 on a GeForce RTX 2080 Ti GPU. Future work could explore alternatives to selecting random subsets, such as density-based subsampling^32,33,63, which may be more accurate for estimating the regularization loss.

The spatial regularization used in this work reflects the a priori assumption that nearby cells with similar gene expression are more closely related than spatially distant cells with the same level of transcriptional similarity. In the connected tissues with low geometric complexity examined in this work, the current spatial regularization with Euclidean distance performs well. However, it may not cover the complexity of the spatial distribution patterns and dependencies that may vary among different locations of a tissue. Extensions of the regularization for disjoint tissues like lymph nodes, or for tissues with high geometric complexity, could be developed using location-adaptive spatial regularizations. The general framework proposed in SpaceFlow can also be easily extended by combining the latent space regularization with other choices of embedding algorithms, which may offer various tradeoffs in terms of expressive ability and computational efficiency. However, we expect the general principle that explicit regularization for spatial structure improves performance on ST data to hold for a variety of different embedding architectures.

In addition to spatial regularization, SpaceFlow is a flexible framework able to incorporate auxiliary features about connectivity among cells in spatial or single-cell omics data. For example, it can be directly applied to 3D ST data with spatial graph input based on 3D coordinates. Future improvement could be achieved by adapting the framework for spatially resolved epigenetic data with proper preprocessing steps, such as peak calling on spatially resolved chromatin modification data⁶⁴. Other non-genomic data modalities, such as local texture features from histological images or expert domain annotation priors, could be used to improve the robustness of the SpaceFlow embeddings. Under the SpaceFlow framework, different regularization terms reflect different prior knowledge about the tissue organization, and their integration might improve the results. In addition, the directed connectivity matrix inferred by RNA velocity⁶⁵ could be used as a constraint to derive low-dimensional embeddings consistent with RNA velocity, which may improve the representation of developmental trajectories. Overall, SpaceFlow provides a robust framework and an effective tool to incorporate prior knowledge or spatial constraints into ST data analysis for inference of spatiotemporal patterns of cells in tissues.

Methods

Data preprocessing

The raw count expression matrix of ST data is preprocessed as follows. First, genes expressed in fewer than 3 cells and cells expressing fewer than 100 genes are removed. Next, normalization is performed, where the expression of each gene is divided by the total expression in that cell, so that every cell has the same total count after normalization. Then, the normalized expression is multiplied by a scale factor (10,000 by default) and log-transformed with a pseudo-count of one. The log-transformed expression matrix of the top 3000 highly variable genes (HVGs) is then selected as the input for constructing the spatial expression graph. We adopt a dispersion-based method to select highly variable genes⁶⁶. The genes are put into 20 bins based on their mean expression, and the normalized dispersion of each gene is then computed as the absolute difference between its dispersion (variance/mean) and the median dispersion of its bin, normalized by the median absolute deviation of that bin. Genes with high normalized dispersion in each bin are then selected.
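As a rough illustration of the filtering and normalization steps described above, the following NumPy sketch processes a toy count matrix. The function name `preprocess_counts`, its arguments, and the toy data are hypothetical and are not taken from the SpaceFlow code; HVG selection is omitted, and the order of the two filters is an assumption.

```python
import numpy as np

def preprocess_counts(counts, min_cells=3, min_genes=100, scale_factor=1e4):
    """Filter, normalize, and log-transform a (cells x genes) raw count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Drop genes detected in fewer than `min_cells` cells.
    gene_mask = (counts > 0).sum(axis=0) >= min_cells
    counts = counts[:, gene_mask]
    # Drop cells expressing fewer than `min_genes` of the remaining genes.
    cell_mask = (counts > 0).sum(axis=1) >= min_genes
    counts = counts[cell_mask]
    # Divide by per-cell totals, rescale, then apply log(1 + x).
    totals = counts.sum(axis=1, keepdims=True)
    normalized = counts / totals * scale_factor
    return np.log1p(normalized), cell_mask, gene_mask

# Toy data: 50 cells x 200 genes (threshold lowered because the toy matrix is small).
rng = np.random.default_rng(0)
toy_counts = rng.poisson(1.0, size=(50, 200))
log_expr, kept_cells, kept_genes = preprocess_counts(toy_counts, min_genes=20)
```

After this step every retained cell has the same total normalized count (the scale factor) before the log transform, which is the property the normalization is meant to guarantee.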

Construction of Spatial Expression Graph

We next convert the log-transformed expression matrix of highly variable genes into a Spatial Expression Graph (SEG) as the input of our deep graph network. The SEG is built based on the spatial proximity of cells, with nodes representing cells with their expression profiles attached and edges characterizing the spatial neighborhood of cells. Similarly, in spot-resolution ST data, each node in the graph represents a spot. The SEG is characterized by two matrices, the expression matrix $\mathbf{X}=\{x_{1},x_{2},\ldots,x_{n}\}$ and the spatial adjacency matrix $\mathbf{A}\in \mathbb{R}^{N\times N}$. Here, $x_{i}$ represents the expression features of cell or spot $i$, while the element $\mathbf{A}_{i,j}$ of the adjacency matrix is equal to 1 if there is an edge between cells/spots $i$ and $j$; otherwise, $\mathbf{A}_{i,j}=0$.

We provide two methods for constructing the SEG, namely, alpha-complex-based and k-nearest-neighbor-based. The alpha-complex-based method is used by default, where a Voronoi cell is first created for each cell or spot located at r as:

$$V(r)=\{x\in \mathbb{R}^{2} \mid \|x-r\|\le \|x-r'\|,\ \forall r'\in \boldsymbol{C}\}$$

(1)

where $C$ is the set of coordinates for all the cells or spots, and $\left|\left|\bullet \right|\right|$ is the Euclidean distance. Next, the 1-skeleton of the alpha complex⁶⁷ is used to determine the neighborhood edges ${{{{{\boldsymbol{E}}}}}}$ of the spots, which can be formulated as follows:

$$\boldsymbol{E}=\Big\{(i,j) \;\Big|\; \bigcap_{k\in \{i,j\}}\big(V(r_{k})\cap B(r_{k},\delta )\big)\ne \varnothing \Big\}$$

(2)

where $B(x,\delta )$ is a disk in $\mathbb{R}^{2}$ centered at $x$ with radius $\delta$. The radius $\delta$ is estimated by the mean distance to the k nearest neighbors of the spot. In the k-nearest-neighbor-based method, the edges of the SEG are built based on the top k nearest neighbors of each cell.
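The k-nearest-neighbor variant of the SEG can be sketched with a dense distance matrix, as below. This is a minimal illustration rather than the package's implementation: the function name `knn_adjacency` is hypothetical, edges are symmetrized (an assumption, since the text does not say whether edges are directed), and the radius estimate is returned per spot (also an assumption). The alpha-complex construction is not reproduced here.

```python
import numpy as np

def knn_adjacency(coords, k=15):
    """Return a symmetric 0/1 adjacency and the mean k-NN distance per cell/spot."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    k = min(k, n - 1)
    # Dense pairwise Euclidean distances; suitable for small examples only.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbors.ravel()] = 1
    adj = np.maximum(adj, adj.T)  # assumption: treat edges as undirected
    # Mean distance to the k nearest neighbors, usable as a radius estimate (delta).
    delta = np.take_along_axis(dist, neighbors, axis=1).mean(axis=1)
    return adj, delta

toy_coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
toy_adj, toy_delta = knn_adjacency(toy_coords, k=1)
```

For larger datasets a spatial index (for example a k-d tree) would replace the dense distance matrix, since the latter needs memory quadratic in the number of cells.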

Spatially regularized Deep Graph Infomax

To encode the ST data into low-dimensional embeddings of cells or spots, we use the Deep Graph Infomax (DGI)³⁸, an unsupervised graph network, as the framework of our model. DGI has the advantage of capturing not only the cell expression patterns but also the cell neighborhood microenvironment, as well as high-level patterns, such as global or regional patterns. Specifically, a two-layer Graph Convolutional Network (GCN) is used as the encoder of DGI with SEG as input. The GCN generates node embeddings $\varepsilon \left({{{{{\bf{X}}}}}},{{{{{\bf{A}}}}}}\right)={{{{{\bf{H}}}}}}=\left\{{h}_{1},{h}_{2},..,{h}_{n}\right\}$ for each cell or spot.

DGI adopts a contrastive learning strategy⁶⁸ to learn the encoder, where features are learned by teaching the model which data points from an unlabeled dataset are similar and which are distinct. Similar data points are constructed by pairing each cell embedding $h_{i}$ with a global summary vector $\boldsymbol{s}$, whereas the distinct data points are represented by pairs of the summary vector $\boldsymbol{s}$ and embeddings from a constructed Expression Permuted Graph (EPG). The summary vector $\boldsymbol{s}$ reflects global patterns of the SEG, and it is implemented as a sigmoid of the mean of all cell embeddings. The EPG is a graph built by randomly permuting the node features $\mathbf{X}$ in the SEG, with the adjacency $\mathbf{A}$ kept the same. Mathematically, this learning process is achieved by maximizing the following objective function:

$$\mathrm{Loss}_{\mathrm{DGI}}=\frac{1}{2N}\left(\sum_{i=1}^{N}\mathbb{E}_{(\mathbf{X},\mathbf{A})}\left[\log \mathrm{D}(h_{i},\mathbf{s})\right]+\sum_{j=1}^{N}\mathbb{E}_{(\widetilde{\mathbf{X}},\widetilde{\mathbf{A}})}\left[\log \left(1-\mathrm{D}(\widetilde{h}_{j},\mathbf{s})\right)\right]\right)$$

(3)

where $h_{i}$ is the embedding of node $i$ from the SEG, and $\widetilde{h}_{j}$ is the embedding of node $j$ from the EPG. $\widetilde{\mathbf{X}}$ and $\widetilde{\mathbf{A}}$ are the permuted node features and the corresponding adjacency matrix of the EPG. $\mathrm{D}$ is the discriminator, which is defined by $\mathrm{D}(h_{i},\mathbf{s})=\mathrm{Sigmoid}(h_{i}^{T}\varTheta \mathbf{s})$,

where $\varTheta \in \mathbb{R}^{N_{F}\times N_{F}}$ is a trainable weight matrix. Through this contrastive learning strategy, the encoder is forced to learn global patterns and neglect random spatial expression patterns in the embeddings.
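To make the objective in Eq. (3) concrete, the sketch below evaluates it once with NumPy for a single sampled permutation. Everything except the formula itself is an assumption for illustration: the one-layer `toy_encoder`, the ring-shaped toy graph, the random weights, and the small `eps` guard are hypothetical and do not correspond to the SpaceFlow encoder or its training code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dgi_objective(H, H_perm, Theta):
    """Eq. (3) for one permutation: H from the SEG, H_perm from the EPG."""
    n = H.shape[0]
    s = sigmoid(H.mean(axis=0))          # global summary vector
    pos = sigmoid(H @ Theta @ s)         # D(h_i, s) for real nodes
    neg = sigmoid(H_perm @ Theta @ s)    # D(h~_j, s) for permuted nodes
    eps = 1e-12                          # numerical guard, not part of Eq. (3)
    total = np.log(pos + eps).sum() + np.log(1.0 - neg + eps).sum()
    return float(total / (2 * n))

rng = np.random.default_rng(0)
n_nodes, n_genes, n_latent = 8, 5, 4
X = rng.normal(size=(n_nodes, n_genes))
W = rng.normal(size=(n_genes, n_latent))
# Toy ring graph with self-loops, row-normalized (hypothetical stand-in for the SEG).
eye = np.eye(n_nodes)
A = eye + np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1)
A_norm = A / A.sum(axis=1, keepdims=True)

def toy_encoder(features):
    # Hypothetical one-layer graph encoder, used only to produce embeddings.
    return np.tanh(A_norm @ features @ W)

H = toy_encoder(X)
H_perm = toy_encoder(X[rng.permutation(n_nodes)])  # EPG: shuffled features, same graph
Theta = rng.normal(size=(n_latent, n_latent))
objective = dgi_objective(H, H_perm, Theta)
```

In a training loop the negative of this quantity would typically be minimized by gradient descent, which is equivalent to maximizing the objective.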

To enforce the spatial consistency in the embeddings, so that the closeness between embeddings not only reflects the expression similarity but also their spatial proximity, we add a spatial regularization to the objective function in DGI. Mathematically, the revised objective function can be expressed as follows:

$$\mathrm{Loss}_{\mathrm{Total}}=\mathrm{Loss}_{\mathrm{DGI}}+\gamma * \sum_{i=1}^{N}\sum_{j=1}^{N}\frac{\mathbf{D}_{i,j}^{(s)} * \left(1-\mathbf{D}_{i,j}^{(z)}\right)}{N * N}$$

(4)

where $\mathbf{D}_{i,j}^{(s)}$ is the spatial distance between cells/spots $i$ and $j$ in Euclidean space, $\mathbf{D}_{i,j}^{(z)}$ is the distance between cells/spots $i$ and $j$ in embedding space, and $N * N$ is the normalization term, with $N$ the number of cells or spots in the ST data. The spatial regularization penalizes the generation of close embeddings for cells or spots that are spatially far from each other. In other words, embeddings that are close because of expression similarity are pushed further apart if the corresponding cells or spots are spatially distant. Strong spatial regularization may overemphasize spatially smooth embeddings, which do not necessarily coincide with more textured biological heterogeneity. To mitigate this issue, we added a regularization strength parameter $\gamma$ to control the strength of the spatial regularization relative to the DGI loss. The default regularization strength is set to 0.1 (values range from 0 to 1) as a loose prior that preserves more detailed texture.
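The regularization term in Eq. (4) can be written directly with dense distance matrices, as in the NumPy sketch below. The function names are hypothetical, and rescaling both distance matrices by their maxima is an assumption made here so that $1-\mathbf{D}^{(z)}$ stays non-negative; the text does not specify how the distances are scaled.

```python
import numpy as np

def pairwise_dist(M):
    """Dense Euclidean distance matrix between the rows of M."""
    diff = M[:, None, :] - M[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def spatial_regularization(coords, Z):
    """Mean of D_s * (1 - D_z) over all N * N pairs (second term of Eq. 4, without gamma)."""
    D_s = pairwise_dist(np.asarray(coords, dtype=float))
    D_z = pairwise_dist(np.asarray(Z, dtype=float))
    # Assumption: rescale both matrices to [0, 1] so each penalty term is non-negative.
    D_s = D_s / D_s.max()
    D_z = D_z / D_z.max()
    n = D_s.shape[0]
    return float((D_s * (1.0 - D_z)).sum() / (n * n))

coords = np.array([[0.0], [1.0], [2.0], [3.0]])
Z_consistent = coords.copy()                          # embedding mirrors the spatial layout
Z_conflicting = np.array([[0.0], [3.0], [3.0], [0.0]])  # far-apart cells share an embedding
penalty_consistent = spatial_regularization(coords, Z_consistent)
penalty_conflicting = spatial_regularization(coords, Z_conflicting)
gamma = 0.1
weighted_penalty = gamma * penalty_conflicting
```

The conflicting embedding receives the larger penalty because its two spatially most distant cells have identical embeddings, which is exactly the situation the regularization is designed to discourage.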

Domain segmentation and pseudo-Spatiotemporal Map

The domain segmentations are obtained by running Leiden clustering⁶⁹ with the low-dimensional embeddings from SpaceFlow as input. By default, the parameter for the local neighborhood size is set to 50 to produce a smoother segmentation. The pseudo-Spatiotemporal Map (pSM) is calculated by running diffusion pseudotime (DPT)³¹ on the low-dimensional embeddings output by SpaceFlow. DPT is an algorithm that uses diffusion-like random walks to estimate the ordering of and transitions between cells. Using the embeddings from SpaceFlow, which encode both spatial and expression information of cells, as input, DPT can output a spatiotemporal order that is consistent in both space and pseudotime. The root cell for the pSM can be specified with prior knowledge; otherwise, by default, the cell with the largest summed distance to all other cells in embedding space is assigned as the root cell.
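The default root-cell rule stated above (largest summed distance to all other cells in embedding space) is simple enough to sketch in a few lines of NumPy. The function name `default_root_cell` and the toy embedding are hypothetical, and Euclidean distance is an assumption.

```python
import numpy as np

def default_root_cell(Z):
    """Index of the cell whose summed Euclidean distance to all others is largest."""
    Z = np.asarray(Z, dtype=float)
    diff = Z[:, None, :] - Z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return int(np.argmax(dist.sum(axis=1)))

toy_embedding = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
root_index = default_root_cell(toy_embedding)
```

If DPT is run through Scanpy (an assumption; the text does not name the implementation), the selected index would be stored in `adata.uns['iroot']` before calling `sc.tl.dpt`.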

Parameters of the model

The deep graph network is built and trained with PyTorch. To construct the SEG, the default number of nearest neighbors $k$ of a cell or spot for adding edges is set to 15; a larger $k$ leads to a bigger spatial neighborhood. The DeepGraphInfomax model in the PyTorch Geometric library is used for implementing DGI. The default latent dimension size for the low-dimensional embeddings is set to 50. A two-layer Graph Convolutional Network (GCN) is used as the encoder for the SEG, with Parametric ReLUs (PReLU) as the activation functions. The number of neurons for both layers is set equal to the low-dimensional embedding size.
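The forward pass of such a two-layer GCN encoder can be sketched without any deep learning library. The NumPy code below is an illustration under stated assumptions, not the SpaceFlow model: it uses the standard symmetric GCN normalization with self-loops, applies PReLU after both layers (the text does not say whether the second layer is activated), omits bias terms, and uses random weights in place of trained ones.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric GCN normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def prelu(x, slope=0.25):
    """Parametric ReLU; `slope` is learned in practice and fixed here for illustration."""
    return np.where(x > 0, x, slope * x)

def gcn_encoder(X, A, W1, W2):
    """Two graph-convolution layers, each followed by PReLU (assumption)."""
    A_norm = normalize_adjacency(A)
    H1 = prelu(A_norm @ X @ W1)
    return prelu(A_norm @ H1 @ W2)

rng = np.random.default_rng(0)
n_cells, n_genes, latent_dim = 6, 20, 50
X = rng.normal(size=(n_cells, n_genes))
A = np.zeros((n_cells, n_cells))
A[np.arange(n_cells - 1), np.arange(1, n_cells)] = 1  # chain graph as a toy SEG
A = np.maximum(A, A.T)
W1 = rng.normal(scale=0.1, size=(n_genes, latent_dim))
W2 = rng.normal(scale=0.1, size=(latent_dim, latent_dim))
embeddings = gcn_encoder(X, A, W1, W2)
```

Both weight matrices map to the same width, reflecting the statement that the number of neurons in each layer equals the embedding size.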

Training procedure

The optimizer used for training DGI is Adam⁷⁰ with a default learning rate of ${lr}=0.001$. The maximum number of epochs for training is set to 1000, with an early stopping strategy applied to avoid overfitting. Specifically, the minimum epoch for early stopping is set to 100, and the patience (the number of epochs with no decrease in loss) is set to 50. A GeForce RTX 2080 Ti GPU is used for training the DGI model. The training time varies from 30 s to 3 min for numbers of cells/spots ranging from 3000 to 50,000, and the subsampling strategy described below is applied when the number is greater than 10,000.
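The stopping rule described above can be expressed as a small, framework-independent loop. The sketch below is one plausible reading of how the three settings interact (an assumption): training never stops before the minimum epoch, and stops afterwards once the loss has failed to improve for `patience` consecutive epochs. The function name and the `step` callback are hypothetical.

```python
def train_with_early_stopping(step, max_epochs=1000, min_epochs=100, patience=50):
    """Call `step(epoch) -> loss` each epoch until the loss stops decreasing."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        loss = step(epoch)
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        # Stop only after the minimum number of epochs has been reached.
        if epoch >= min_epochs and epochs_without_improvement >= patience:
            break
    return epoch, best_loss

# Toy loss curve that decreases for 120 epochs and then plateaus at zero.
stopped_at, best = train_with_early_stopping(lambda e: max(1.0 - e / 120, 0.0))
```

In a real run, `step` would perform one optimizer update (for example with Adam at a learning rate of 0.001) and return the current training loss.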

Accelerating the computation of spatial regularization loss

Because the computational cost of training largely depends on the calculation of the spatial regularization loss, which is quadratic in the number of cells or spots, we designed the following strategy to accelerate training. The spatial regularization loss is used in model optimization, which involves calculating the weighted average of an inner product of a spatial distance matrix and an embedding distance matrix. It has $O(M * N^{2})$ computational complexity and memory cost, where $N^{2}$ is the number of edges in a fully connected spatial graph with $N$ cells and $M$ is the size of the latent dimension. Instead, during each training step, we compute the spatial regularization loss over a random fixed-size subset of edges, which reduces the computational complexity of the regularization loss from quadratic to constant. When tested on the Slide-seq V2 dataset with 41,876 cells, the computational and memory cost dropped from over 5 h and 18 GB to less than 3 min and 4 GB (Supplementary Fig. 4). We additionally found significant improvements in performance when applying SpaceFlow to a seqFISH mouse embryogenesis dataset⁷² (Supplementary Fig. 5).
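The subsampling idea can be sketched as a Monte Carlo estimate of the regularization term over a fixed number of random cell pairs. The code below is a hedged illustration: the function name, the default number of pairs, sampling with replacement, and rescaling by the maxima within the sample are all assumptions, not details taken from the SpaceFlow implementation.

```python
import numpy as np

def sampled_spatial_regularization(coords, Z, n_pairs=1000, rng=None):
    """Estimate mean(D_s * (1 - D_z)) from `n_pairs` random cell pairs."""
    rng = np.random.default_rng() if rng is None else rng
    n = coords.shape[0]
    # Draw random (i, j) index pairs; cost depends on n_pairs, not on n * n.
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    d_s = np.linalg.norm(coords[i] - coords[j], axis=1)
    d_z = np.linalg.norm(Z[i] - Z[j], axis=1)
    # Assumption: rescale by the sample maxima so both distances lie in [0, 1].
    d_s = d_s / max(d_s.max(), 1e-12)
    d_z = d_z / max(d_z.max(), 1e-12)
    return float((d_s * (1.0 - d_z)).mean())

rng = np.random.default_rng(0)
toy_coords = rng.uniform(size=(500, 2))
toy_embeddings = rng.normal(size=(500, 50))
estimate = sampled_spatial_regularization(toy_coords, toy_embeddings, n_pairs=1000, rng=rng)
```

Because only `n_pairs` distances are computed per step, memory use is independent of the number of cells, which is the property that makes the strategy attractive for datasets with tens of thousands of cells.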

Benchmarking

Segmentation benchmarking

To benchmark domain segmentation performance, we compare SpaceFlow against five methods, Seurat 4³⁹, Giotto¹⁷, stLearn²⁰, MERINGUE²², and BayesSpace¹⁸, using the LIBD human dorsolateral prefrontal cortex (DLPFC) ST data⁴⁰. To make the domains comparable between benchmarking methods, we set the target number of clusters equal to the number of clusters in the annotation for all benchmarking methods. The adjusted Rand index (ARI) is used to quantify the similarity between the clustering result and the annotation.
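For readers unfamiliar with the ARI, the following standard-library sketch computes it from the contingency table of two labelings. It is meant only to show what the score measures; the function name is hypothetical, and an established implementation such as `sklearn.metrics.adjusted_rand_score` would normally be used in practice.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_true)
    # Pair counts within contingency cells, annotation classes, and predicted clusters.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both labelings
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

annotation = [0, 0, 1, 1]
same_partition = adjusted_rand_index(annotation, [1, 1, 0, 0])   # labels renamed only
crossed_partition = adjusted_rand_index(annotation, [0, 1, 0, 1])
```

The score is invariant to how clusters are named, equals 1 for identical partitions, and is close to 0 (or negative) for partitions that agree no better than chance.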

With Seurat, the RNA transcript counts are used for the input, with genes expressed in fewer than 3 cells filtered, and cells expressing fewer than 100 genes removed. Then, the SCTransform function in Seurat R package is applied to normalize the UMI count data using regularized negative binomial regression. Next, the RunUMAP, FindNeighbors, FindClusters methods are performed on the normalized count data sequentially with the latent dimension size of 50 and default cluster resolution of 0.4.

When benchmarking with stLearn, the count matrix and the spot positions were used as input; both were downloaded directly from the data sources (see Data Availability). The count matrix input was read via the Read10X function in the stLearn package. Next, the filter_genes, normalize_total, log1p, and run_pca functions were applied sequentially to preprocess the data, with the minimal number of cells for gene filtering set to 3. Next, the histological image of the tissue was preprocessed using the tiling and extract_features functions. Then, the SME_normalize function was used with the parameter settings use_data="raw" and weights="physical_distance". Finally, scale and run_pca were performed on the normalized data with 50 principal components. The principal components from the normalized data were then used for segmentation or pseudotime analysis via Leiden and DPT, respectively.

With Giotto, we input the count matrix and the spot positions, and then applied the normalizeGiotto, addStatistics, and calculateHVG functions to preprocess the data and identify highly variable genes (HVGs). HVGs expressed in at least 3 cells and with a mean normalized expression greater than 0.4 are then fed into the runPCA function to obtain the principal components. The spatial network was then created through the createSpatialNetwork function with the kNN method, using k = 5 and a maximum distance of 400. Finally, the doHMRF method is used for clustering with the parameter beta set to 40.

For BayesSpace benchmarking, we input the expression matrix and the spot positions through the getRDS(“2020_maynard_prefrontal-cortex”) method. Next, the modelGeneVar and getTopHVGs methods in the scran package are used to model the variance of the log-expression profile of each gene and extract the top 2000 highly variable genes. Then, the runPCA function in the scater package is used to compute principal components. The BayesSpace clustering method spatialCluster is applied with 15 principal components, 50,000 MCMC iterations, and gamma = 3 for smoothing.

For MERINGUE benchmarking, we input the spatial locations and the top 50 principal components from the expression matrix of ST data. Next, the spatial adjacency weight matrix is constructed using the getSpatialNeighbors function in the R package of MERINGUE, with a setting of filterDist = 2. Then, getSpatiallyInformedClusters is performed to get spatially informed clusters by weighting graph-based clustering with spatial information, with a setting of k = 20, alpha = 1, beta = 1.

Pseudo-spatiotemporal map benchmarking

The pSM is compared to the spatial embedding method stLearn²⁰ and three non-spatial pseudotime methods, Seurat 4³⁹, Monocle⁶¹, and Slingshot⁶². In stLearn, because the histological image of the tissue is required for spatially aware embedding, we only made comparisons when the histological image was available. We calculated pseudotime from stLearn by running diffusion pseudotime (DPT)³¹ on the stLearn embedding. In Seurat 4, DPT is run using the principal components of the expression data, whereas in Monocle and Slingshot, the recommended workflows with the default parameters are performed.

Downstream analysis

Marker genes identification

To identify marker genes that can best characterize specific expressions for domains output from SpaceFlow, the rank_genes_groups method in the Scanpy package (v1.8.2) is used. When performing this method, the Wilcoxon rank-sum test with a Benjamini–Hochberg p value correction is applied. The cutoff of the adjusted p value for domain-specific marker genes is set to 0.01.
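The Benjamini–Hochberg correction mentioned above can be illustrated with a short standard-library sketch. The function name and the toy p values are hypothetical; in the analysis itself the correction is applied inside Scanpy's `rank_genes_groups`, and this code only shows how adjusted p values relate to the 0.01 cutoff.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

toy_p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted_p = benjamini_hochberg(toy_p_values)
# Genes passing the adjusted p value cutoff of 0.01 used for domain-specific markers.
marker_indices = [i for i, q in enumerate(adjusted_p) if q < 0.01]
```

Note that the second toy gene has a raw p value below 0.01 but an adjusted value of 0.02, so it would not be reported as a marker under this cutoff.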

Domain annotation

The domains identified by SpaceFlow are annotated based on the literature report of the domain-specific marker genes. The details of the literature support and marker gene list can be found in Supplementary Data 1 and 2.

Gene Ontology enrichment analysis

The Gene Ontology (GO) Enrichment Analysis on the GO Consortium website is carried out to identify the enriched GO terms for domain-specific marker genes with adjusted p value < 0.01.

Space-constrained CellChat analysis

CellChat analysis is performed on ST data using the domain labels from SpaceFlow as groupings. Inferred CellChat communications between domains are then filtered so that communication links are only allowed between spatially adjacent domains. CellChat v1.1.3 is used in an R v4.1.2 environment.
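The spatial constraint can be sketched as a post-hoc filter on inferred links. The code below is only an illustration under several assumptions: two domains are treated as adjacent if any pair of their cells shares an edge in a spatial graph, within-domain links are kept, and links are represented as hypothetical `(source, target, pathway)` tuples. None of these names come from CellChat or SpaceFlow.

```python
import numpy as np

def adjacent_domain_pairs(adj, domains):
    """Unordered domain pairs that share at least one edge in the spatial graph."""
    rows, cols = np.nonzero(adj)
    pairs = set()
    for i, j in zip(rows, cols):
        if domains[i] != domains[j]:
            pairs.add(frozenset((domains[i], domains[j])))
    return pairs

def filter_links(links, allowed_pairs):
    """Keep links within a domain (assumption) or between spatially adjacent domains."""
    return [
        link for link in links
        if link[0] == link[1] or frozenset((link[0], link[1])) in allowed_pairs
    ]

# Toy chain of three cells: cell 0 touches cell 1, and cell 1 touches cell 2.
toy_adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
toy_domains = ["invasive", "interface", "in-situ"]
toy_links = [
    ("invasive", "interface", "COLLAGEN"),
    ("invasive", "in-situ", "MK"),
]
allowed = adjacent_domain_pairs(toy_adj, toy_domains)
kept_links = filter_links(toy_links, allowed)
```

Here the link between the two non-touching domains is discarded, while the link across the shared boundary is retained.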

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All data analyzed in this paper can be downloaded in raw form from the original publications. Specifically, the DLPFC data is available in the “spatialLIBD package [http://spatial.libd.org/spatialLIBD]”. The processed Stereo-seq data from mouse olfactory bulb tissue is accessible at “SEDR analyses [https://github.com/JinmiaoChenLab/SEDR_analyses]”. The chicken heart ST data is retrieved from the GEO database under accession code “GSE149457 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE149457]”. The human breast cancer ST data can be obtained from the Zenodo dataset “4751624”. The sample we used is the same as the one demonstrated in the original paper (patient G-sample 1). Both the chicken heart ST data and the breast cancer ST data were sequenced on the 10x Visium platform. The Slide-seq V2 data can be accessed in the Squidpy package⁷¹ or downloaded from the “Broad Institute database [https://singlecell.broadinstitute.org/single_cell/study/SCP815/highly-sensitive-spatial-transcriptomics-at-near-cellular-resolution-with-slide-seqv2]”. The seqFISH data can be accessed at the “Spatial Mouse Atlas [https://marionilab.cruk.cam.ac.uk/SpatialMouseAtlas/]”. The Gene Ontology Consortium database can be accessed via “Gene Ontology Consortium [http://geneontology.org/]”. All other relevant data supporting the key findings of this study are available within the article and its Supplementary Information files or from the corresponding author upon reasonable request.

Code availability

The SpaceFlow package is implemented in Python with a dependency on PyTorch and is available in the GitHub repository “SpaceFlow [https://github.com/hongleir/SpaceFlow]”. It is also deposited at Zenodo under dataset “6668286”.

References

Asp, M., Bergenstråhle, J. & Lundeberg, J. Spatially resolved transcriptomes-next generation tools for tissue exploration. Bioessays 42, e1900221 (2020).
Rao, A., Barkley, D., França, G. S. & Yanai, I. Exploring tissue architecture using spatial transcriptomics. Nature 596, 211–220 (2021).
Palla, G., Fischer, D. S., Regev, A. & Theis, F. J. Spatial components of molecular tissue biology. Nat. Biotechnol. 40, 308–318 (2022).
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
Shah, S., Lubeck, E., Zhou, W. & Cai, L. In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron 92, 342–357 (2016).
Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).
Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball patterned arrays. Cell 185, 1777–1792.e21 (2022).
Cassese, A. et al. Spatial autocorrelation in mass spectrometry imaging. Anal. Chem. 88, 5871–5878 (2016).
Abdelmoula, W. M. et al. Data-driven identification of prognostic tumor subpopulations using spatially mapped t-SNE of mass spectrometry imaging data. Proc. Natl Acad. Sci. USA 113, 12244–12249 (2016).
Abdelmoula, W. M. et al. massNet: integrated processing and classification of spatially resolved mass spectrometry data using deep learning for rapid tumor delineation. Bioinformatics 38, 2015–2021 (2022).
Zhang, W. et al. Spatially aware clustering of ion images in mass spectrometry imaging data using deep learning. Anal. Bioanal. Chem. 413, 2803–2819 (2021).
Bohland, J. W. et al. Clustering of spatial gene expression patterns in the mouse brain and comparison with classical neuroanatomy. Methods 50, 105–112 (2010).
Huisman, S. M. H. et al. BrainScope: interactive visual exploration of the spatial and temporal human brain transcriptome. Nucleic Acids Res. 45, e83 (2017).
Svensson, V., Teichmann, S. A. & Stegle, O. SpatialDE: identification of spatially variable genes. Nat. Methods 15, 343–346 (2018).
Edsgärd, D., Johnsson, P. & Sandberg, R. Identification of spatial expression trends in single-cell gene expression data. Nat. Methods 15, 339–342 (2018).
Walker, B. L., Cang, Z., Ren, H., Bourgain-Chang, E. & Nie, Q. Deciphering tissue structure and function using spatial transcriptomics. Commun. Biol. 5, 1–10 (2022).
Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 22, 78 (2021).
Zhao, E. et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotechnol. 39, 1375–1384 (2021).
Yang, Y. et al. SC-MEB: spatial clustering with hidden Markov random field using empirical Bayes. Brief. Bioinform. 23, bbab466 (2022).
Pham, D. et al. stLearn: integrating spatial location, tissue morphology and gene expression to find cell types, cell-cell interactions and spatial trajectories within undissociated tissues. bioRxiv https://doi.org/10.1101/2020.05.31.125658 (2020).
Moehlin, J., Mollet, B., Colombo, B. M. & Mendoza-Parra, M. A. Inferring biologically relevant molecular tissue substructures by agglomerative clustering of digitized spatial transcriptomes with multilayer. Cell Syst. 12, 694–705.e3 (2021).
Miller, B. F., Bambah-Mukku, D., Dulac, C., Zhuang, X. & Fan, J. Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomic data with nonuniform cellular densities. Genome Res. 31, 1843–1855 (2021).
Hu, J. et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 (2021).
Fu, H. et al. Unsupervised spatially embedded deep representation of spatial transcriptomics. bioRxiv https://doi.org/10.1101/2021.06.15.448542 (2021).
Cang, Z., Ning, X., Nie, A., Xu, M. & Zhang, J. Scan-IT: Domain segmentation of spatial transcriptomics images by graph neural network. In British Machine Vision Conference (British Machine Vision Conference, 2021).
Zuo, C. et al. Elucidating tumor heterogeneity from spatially resolved transcriptomics data by multi-view graph collaborative learning. Res. Square https://doi.org/10.21203/rs.3.rs-1287670/v1 (2022).
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with adaptive graph attention auto-encoder. bioRxiv https://doi.org/10.1101/2021.08.21.457240 (2021).
Chang, Y. et al. Define and visualize pathological architectures of human tissues from spatially resolved transcriptomics using deep learning. bioRxiv https://doi.org/10.1101/2021.07.08.451210 (2021).
Mantri, M. et al. Spatiotemporal single-cell RNA sequencing of developing chicken hearts identifies interplay between cellular differentiation and morphogenesis. Nat. Commun. 12, 1771 (2021).
Misra, A. et al. Characterizing neonatal heart maturation, regeneration, and scar resolution using spatial transcriptomics. J. Cardiovasc. Dev. Dis. 9, 1 (2021).
Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).
van Unen, V. et al. Visual analysis of mass cytometry data by hierarchical stochastic neighbour embedding reveals rare cell types. Nat. Commun. 8, 1740 (2017).
Marcílio-Jr, W. E., Eler, D. M., Paulovich, F. V. & Martins, R. M. HUMAP: hierarchical uniform manifold approximation and projection. Preprint at https://arxiv.org/abs/2106.07718 (2021).
Pezzotti, N. et al. Multiscale visualization and exploration of large bipartite graphs. Comput. Graph. Forum 37, 549–560 (2018).
Perozzi, B., Al-Rfou, R. & Skiena, S. DeepWalk. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’14 (ACM Press, 2014). https://doi.org/10.1145/2623330.2623732.
Kipf, T. N. & Welling, M. Variational graph auto-encoders. Preprint at https://arxiv.org/abs/1611.07308?context=cs (2016).
Bojchevski, A. & Günnemann, S. Deep Gaussian embedding of graphs: unsupervised inductive learning via ranking. Preprint at https://arxiv.org/abs/1707.03815 (2017).
Veličković, P. et al. Deep Graph Infomax. Preprint at https://arxiv.org/abs/1809.10341 (2018).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).
Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).
Watakabe, A. et al. Area-specific substratification of deep layer neurons in the rat cortex. J. Comp. Neurol. 520, 3553–3573 (2012).
Bernard, D. et al. A long nuclear-retained non-coding RNA regulates synaptogenesis by modulating gene expression. EMBO J. 29, 3082–3093 (2010).
Lisman, J., Yasuda, R. & Raghavachari, S. Mechanisms of CaMKII action in long-term potentiation. Nat. Rev. Neurosci. 13, 169–182 (2012).
Li, J. et al. Long-term potentiation modulates synaptic phosphorylation networks and reshapes the structure of the postsynaptic interactome. Sci. Signal. 9, rs8 (2016).
Martín-López, E., Corona, R. & López-Mascaraque, L. Postnatal characterization of cells in the accessory olfactory bulb of wild type and reeler mice. Front. Neuroanat. 6, 15 (2012).
Xiang, Y., Xin, J., Le, W. & Yang, Y. Neurogranin: a potential biomarker of neurological and mental diseases. Front. Aging Neurosci. 12, 584743 (2020).
Martinsen, B. J. Reference guide to the stages of chick heart embryology. Dev. Dyn. 233, 1217–1237 (2005).
Wang, R. N. et al. Bone Morphogenetic Protein (BMP) signaling in development and human diseases. Genes Dis. 1, 87–105 (2014).
Alfieri, C. M., Cheek, J., Chakraborty, S. & Yutzey, K. E. Wnt signaling in heart valve development and osteogenic gene induction. Dev. Biol. 338, 127–135 (2010).
Jin, S. et al. Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 12, 1088 (2021).
Xu, C., Zhu, S., Wu, M., Han, W. & Yu, Y. Functional receptors and intracellular signal pathways of midkine (MK) and pleiotrophin (PTN). Biol. Pharm. Bull. 37, 511–520 (2014).
Zhou, Q. et al. Midkine prevents calcification of aortic valve interstitial cells via intercellular crosstalk. Front. Cell Dev. Biol. 9, 794058 (2021).
Arrington, C. B. & Yost, H. J. Extra-embryonic syndecan 2 regulates organ primordia migration and fibrillogenesis throughout the zebrafish embryo. Development 136, 3143–3152 (2009).
Li, J. et al. The pro-angiogenic cytokine pleiotrophin potentiates cardiomyocyte apoptosis through inhibition of endogenous AKT/PKB activity. J. Biol. Chem. 282, 34984–34993 (2007).
Andersson, A. et al. Spatial deconvolution of HER2-positive breast cancer delineates tumor-associated cell type interactions. Nat. Commun. 12, 1–14 (2021).
Cabral-Pacheco, G. A. et al. The roles of matrix metalloproteinases and their inhibitors in human diseases. Int. J. Mol. Sci. 21, 9739 (2020).
Wang, H. et al. Knockdown of IFI27 inhibits cell proliferation and invasion in oral squamous cell carcinoma. World J. Surg. Oncol. 16, 64 (2018).
Ma, H.-C. et al. Hepatitis C virus ARFP/F protein interacts with cellular MM-1 protein and enhances the gene trans-activation activity of c-Myc. J. Biomed. Sci. 15, 417–425 (2008).
Koh, H. M., Jang, B. G. & Kim, D. C. Prognostic value of CD63 expression in solid tumors: a meta-analysis of the literature. In Vivo 34, 2209–2215 (2020).
Nitzan, M., Karaiskos, N., Friedman, N. & Rajewsky, N. Gene expression cartography. Nature 576, 132–137 (2019).
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19, 477 (2018).
Pezzotti, N., Höllt, T., Lelieveldt, B., Eisemann, E. & Vilanova, A. Hierarchical stochastic neighbor embedding. Comput. Graph. Forum 35, 21–30 (2016).
Deng, Y. et al. Spatial-CUT&Tag: spatially resolved chromatin modification profiling at the cellular level. Science 375, 681–686 (2022).
La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
Delfinado, C. J. A. & Edelsbrunner, H. An incremental algorithm for Betti numbers of simplicial complexes on the 3-sphere. Comput. Aided Geom. Des. 12, 771–784 (1995).
Le-Khac, P. H., Healy, G. & Smeaton, A. F. Contrastive representation learning: a framework and review. IEEE Access 8, 193907–193934 (2020).
Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Palla, G. et al. Squidpy: a scalable framework for spatial single cell analysis. bioRxiv https://doi.org/10.1101/2021.02.19.431994 (2021).

Acknowledgements

The project was partly supported by the National Science Foundation grant DMS1763272, the National Institutes of Health grants U01AR073159, R01DE030565, and P30AR075047, and a Simons Foundation Grant (594598).

Author information

Authors and Affiliations

The NSF-Simons Center for Multiscale Cell Fate Research, University of California Irvine, Irvine, CA, 92627, USA
Honglei Ren, Benjamin L. Walker & Qing Nie
Department of Mathematics, University of California Irvine, Irvine, CA, 92627, USA
Benjamin L. Walker & Qing Nie
Department of Mathematics, North Carolina State University, Raleigh, NC, 27695, USA
Zixuan Cang
Department of Developmental and Cell Biology, University of California Irvine, Irvine, CA, 92627, USA
Qing Nie

Authors

Honglei Ren
Benjamin L. Walker
Zixuan Cang
Qing Nie

Contributions

Z.C., Q.N., H.R., and B.W. conceived the project; H.R. implemented the algorithm and code, and conducted data analysis. All the authors wrote and approved the manuscript; Q.N. supervised the research.

Corresponding author

Correspondence to Qing Nie.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of additional Supplementary File

Supplementary Dataset 1

Supplementary Dataset 2

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ren, H., Walker, B.L., Cang, Z. et al. Identifying multicellular spatiotemporal organization of cells with SpaceFlow. Nat Commun 13, 4076 (2022). https://doi.org/10.1038/s41467-022-31739-w

Received: 23 March 2022
Accepted: 30 June 2022
Published: 14 July 2022
DOI: https://doi.org/10.1038/s41467-022-31739-w
