Spatial transition tensor of single cells

Spatial transcriptomics and messenger RNA splicing encode extensive spatiotemporal information for cell states and transitions. The current lineage-inference methods either lack spatial dynamics for state transition or cannot capture different dynamics associated with multiple cell states and transition paths. Here we present spatial transition tensor (STT), a method that uses messenger RNA splicing and spatial transcriptomes through a multiscale dynamical model to characterize multistability in space. By learning a four-dimensional transition tensor and spatial-constrained random walk, STT reconstructs cell-state-specific dynamics and spatial state transitions via both short-time local tensor streamlines between cells and long-time transition paths among attractors. Benchmarking and applications of STT on several transcriptome datasets via multiple technologies on epithelial–mesenchymal transitions, blood development, spatially resolved mouse brain and chicken heart development, indicate STT’s capability in recovering cell-state-specific dynamics and their associated genes not seen using existing methods. Overall, STT provides a consistent multiscale description of single-cell transcriptome data across multiple spatiotemporal scales.


Supplementary Note 1: Theoretical foundation of STT
The input to the STT is the single-cell gene expression matrices of both spliced(S) and unspliced(U) counts (Fig. 1c), and the cell annotations that serve as initial guess for cell state membership.Through an iterative scheme between parameter estimation and dynamics decomposition, STT constructs an attractor-wise velocity tensor named transition tensor of shape ℝ   ×2××  , where   denotes number of cells,   number of genes and  number of attractors.Other quantities of tensor-based dynamics, including the memberships of cells in the attractors, transition probabilities, and transition paths, are subsequently obtained in this construction (main text Methods).
Theoretically, STT is motivated by the stochastic model of gene expression and splicing dynamics within individual cells expressed as {   = (  (,  1 , . .,    ) −     ) +    , ,   = (    −     ) +    , , where   and   are the unspliced and spiced counts for gene i.The nonlinear function   (,  1 , . .,    ) models how other genes regulate the production rate of gene i.The system can possess multiple fixed points or attractors representing the different cell types.
The gene expression function permits the bifurcation of system modelling cell-fate commitment.The parameters   represents the mRNA splicing rate and   is the mRNA degradation rate.The independent Wiener process term  , and  , represent the effect of stochastic noise in gene expression.The presence of stochasticity also drives the rare noise-induced cell state transitions among multi-stable attractors, which typically occurs at the time scale larger than splicing dynamics.
To better estimate the parameters, STT assumes that the majority of cells are located within the multiple attractor basins that correspond to the different biological cell types, with a small fraction of cells making transitions across the saddle points 1 .Around these steady-states, the unspliced mRNA production term can be expanded and approximated to its linear expansion, thus introducing the attractor-dependent mRNA transcription rate (main text Methods).Such expansion allows robust estimation of transition tensors parameters (main text Fig. 1d, Methods).
A key step in STT to connect local dynamics, encoded by the attractor-wise transition tensor, to the global dynamics in development, is the construction of a tensor-based cellular random walk, whose transition probability matrix can be expressed as Here   is the cellular transition probability matrix defined using the inner-product kernel between averaged tensor over attractors and cell's unspliced/spliced counts, and   is the transition probability matrix defined using connectivity(similarity) between cells' gene expression (main text Methods and ref 2 ).The underlying rationale and insights to use   and especially   can be drawn from following theorem in ref 3 : Where ℓ() = () + () is the non-equilibrium part of ground truth drift term.
In the STT implementation, the gaussian kernel with bandwidth  was replaced by knearest-neighbor graph adopted by the CellRank package, which yields the similar continuum limits as cell number tends to infinity 3 .Overall, the theorem suggest that transition tensors can provide a cellular random walk description that is asymptotically consistent with continuous stochastic differential equation by constructing inner-product velocity kernel.
In addition to the global dynamics induced by inner-product kernel, to visualize the local transition tensor components, STT plots the streamlines using the function in scVelo, which adopted the cosine similarity between tensor's direction and cell's displacement 5 .
The choice could be supported by following theorem in ref 3 to guarantee the accuracy of streamlines in continuum limit as cell numbers go to infinity.Table 2 and the GitHub notebook link.

Synthetic dataset of toggle-switch circuit
We used the Euler-Maruyama method to simulate the stochastic differential equation model of circuit 6 containing two mutually inhibited genes X and Y to generate both unspliced and spliced counts: was generated.
To analyze the generated dataset, we used the result of Leiden clustering for both spliced and unspliced counts under resolution 0.002 as the initial guess for attractor membership, which resulted in  = 2 attractors as input to STT.
To visualize the results, the 2D embedding of PCA using both spliced and unspliced counts was utilized.The streamlines of various tensor components were plotted using "pl.velocity_embedding_stream"function in scVelo where the velocity graph ("tl.velocity_graph") was calculated using the cosine similarity between the tensor components and corresponding displacements in k-NN smoothed counts.

Synthetic dataset of EMT circuit
We used the SDE model of modified EMT circuit of nine molecular species based on Tian et al. to include unspliced and spliced counts 6 .Noise terms were added following the same approach of the toggle-switch model.To analyze the generated dataset, we used the result of Leiden clustering for both spliced and unspliced counts under resolution 0.01 as the initial guess for attractor membership, which resulted in  = 3 attractors as input to STT.
To visualize the results, the 2D embedding of UMAP using both spliced and unspliced counts was utilized.The streamlines of various tensor components were plotted using "pl.velocity_embedding_stream"function in scVelo where the velocity graph ("tl.velocity_graph") was calculated using the cosine similarity between the tensor components and corresponding displacements in k-NN smoothed counts.

Human lung A549 EMT dataset
The dataset was downloaded and processed as in ref 6 with 3132 cells and 2000 highly variable selected genes.We used the temporal label as the initial guess for attractor membership, and chosed in  = 3 attractors as input to STT.

Adult bone marrow dataset
The dataset was downloaded using the scv.datasets.bonemarrow () from the scVelo package, the counts were normalized and the top 2000 highly variable genes were selected.We used original annotation as the initial guess for attractor membership, which resulted in  = 7 attractors as input to STT.
To visualize the results, the 2D embeddings of transition coordinate of STT (main text Methods) were utilized.The streamlines of various tensor components were plotted using "pl.velocity_embedding_stream"function in scVelo where the velocity graph ("tl.velocity_graph") was calculated using the cosine similarity between the tensor components and corresponding displacements in k-NN smoothed counts.

Mouse brain spatial dataset
The processed data was downloaded from https://zenodo.org/record/6798659and we followed the SIRV pipeline to impute the unspliced and spliced counts.For imputed data, all 117 genes are used for calculation.We used region annotation as the initial guess for attractor membership, which resulted in  = 8 attractors as input to STT.
To visualize the results, the spatial coordinates in original spatial data were utilized.The streamlines of various tensor components were plotted using "pl.velocity_embedding_stream"function in scVelo where the velocity graph ("tl.velocity_graph") was calculated using the cosine similarity between the tensor components and corresponding displacements in k-NN smoothed counts.

Chicken heart spatial dataset
The processed data was downloaded from https://zenodo.org/record/6798659and we followed the SIRV pipeline to impute the unspliced and spliced counts.For imputed data, top 2000 genes selected by scanpy.pp.highly_variable_genes are used for calculation, with 'flavor=seurat v3'.We used the region annotation as the initial guess for attractor membership, which resulted in  = 5 attractors as input to STT.
To visualize the results, the spatial coordinates in original spatial data were utilized.The streamlines of various tensor components were plotted using "pl.velocity_embedding_stream"function in scVelo where the velocity graph ("tl.velocity_graph") was calculated using the cosine similarity between the tensor components and corresponding displacements in k-NN smoothed counts.

Mouse coronal hemibrain spatial dataset
The processed data was downloaded from https://www.dropbox.com/s/c5tu4drxda01m0u/mousebrain_bin60.h5ad?dl=0.We used top 2000 genes selected by scanpy.pp.highly_variable_genes are used for calculation, with 'flavor=seurat v3'.We used the region annotation as the initial guess for attractor membership, which resulted in  = 15 attractors as input to STT.
To visualize the results, the spatial coordinates in original spatial data were utilized.The streamlines of various tensor components were plotted using "pl.velocity_embedding_stream"function in scVelo where the velocity graph ("tl.velocity_graph") was calculated using the cosine similarity between the tensor components and corresponding displacements in k-NN smoothed counts.

Figure S2 Figure S3 Figure S4 .
Figure S2 Additional analysis result for A549 dataset.(a) The UMAP embedding of spliced counts only with cells color-coded by collection time.(b-d) Streamlines of RNA velocity inferred by other methods.(e) Visualization of leiden clustering, averaged tensor speed, cellular entropy and collection time in UMAP of both unspliced and spliced counts (of 2000 highly variable genes).(f) The differentially expressed (DE) genes of each attractor identified by the Wilcox test using Scanpy.

Figure S5 .Figure S6 Figure S7
Figure S5.Analysis result for pancreas dataset.(a) Attractors predicted by STT.(b) The CellRank absorption probability analysis based on STT multi-stability kernel.(c) The streamlines of various tensor components.

Figure S8 Figure S9 Figure S10
Figure S8 Sensitivity analysis of threshold to filter multi-stability genes in mouse brain spatial dataset.(a) The STT attractors detected for increasing values of multi-stability genes score from 0.1 to 0.3.(b) Corresponding streamlines of tensor components.

Table S3 Summary table of STT computation time and memory usage when analyzing single-cell and spatial transcriptome datasets in the manuscript, tested on a personal laptop.
Here peak memory indicates the maximum amount of total memory used during running, within which the incremental memory denotes the memory requested when executing the dynamical analysis API of STT.The memory usage was recorded by the Python memory profiler package.The timing was recorded by taking the average of 7 executions by the Python timeit package.The performance was tested on a MacBook Air 2023 with M2 chip and 16GB memory.