Dissecting transition cells from single-cell transcriptome data through multiscale stochastic dynamics

Advances in single-cell technologies allow scrutinizing of heterogeneous cell states, however, detecting cell-state transitions from snap-shot single-cell transcriptome data remains challenging. To investigate cells with transient properties or mixed identities, we present MuTrans, a method based on multiscale reduction technique to identify the underlying stochastic dynamics that prescribes cell-fate transitions. By iteratively unifying transition dynamics across multiple scales, MuTrans constructs the cell-fate dynamical manifold that depicts progression of cell-state transitions, and distinguishes stable and transition cells. In addition, MuTrans quantifies the likelihood of all possible transition trajectories between cell states using coarse-grained transition path theory. Downstream analysis identifies distinct genes that mark the transient states or drive the transitions. The method is consistent with the well-established Langevin equation and transition rate theory. Applying MuTrans to datasets collected from five different single-cell experimental platforms, we show its capability and scalability to robustly unravel complex cell fate dynamics induced by transition cells in systems such as tumor EMT, iPSC differentiation and blood cell differentiation. Overall, our method bridges data-driven and model-based approaches on cell-fate transitions at single-cell resolution.


Supplementary Note 1: Theoretical Analysis Dynamical Systems Description of Cell-fate Transition
As a popular and basic description, the stochastic transition of cell-fate can be modeled by over-damped Langevin equation (OLE) where X t denotes the gene expression vector at time t, W t denotes the standard Wiener process, accounting for the noise in gene expression dynamics and ε represents the noise amplitude of the system. Typically, the potential field U (x) possesses several local minimals or potential wells, corresponding to the attractors of cell fate ( Supplementary   Fig. 1a).
Under small or moderate noise, the dynamical system will exhibit metastability property [1]. In most of the time, the gene expression of a single cell fluctuates gently around the attractors (Supplementary Fig. 1b). Driven by the noise or differentiation signal, switches between multiple macroscopic cell states may occur eventually [1,2,3,4], creating the transition cells that are often traveling across the saddle point (also known as the transition state [5,6] or intermediate state [7]) between attractor basins (Supplementary Such metastability naturally introduces multi-scale model reduction (or coarse-graining) approaches to simplify the model computation and representation. When ε is small, the famous Kramers' reaction rate theory [8] establishes the quantitative relations between switch rate k ij from attractor S i to S j with potential barrier height ∆U , stating that k ij ∝ e −∆U/ε . Following rigorous analysis [9], the original OLE dynamics can then be upscaled as a Markov jump process, whose discrete states correspond to the attractors of OLE, and jump rate between states specified by k ij .
It is thus interesting and important to investigate how the computational model of cell-fate transition can be combined with single-cell data analysis [10]. As shown in Supplementary Fig. 1-c, following Lafon and Coifman's results [11], the diffusion map random walk (DMRW) constructed on single-cell expression data [12,13] yields the continuous limit which is exactly described by OLE dynamics. As a deeper exploration, below we will show in mathematically rigorous manner, that the coarse-grained Markov Chain of DMRW by MuTrans is consistent with the OLE model reduction by Kramers' reaction rate theory, in the continuum limit and zero noise limit. As the result of such comprehensive understandings, various concepts in single-cell data analysis and computational modeling of cell-fate transition, can be therefore unified through MuTrans implementation (1d).

Mathematical Analysis of MuTrans and OLE Consistency
To demonstrate the consistency, the key assumption on single-cell data is as follows, which is also the basis for the discussion by Lafon and Coifman about the continuum limit of Diffusion Map [11].
Assumption. The single-cell data points x 1 , x 2 , ..., x n ∈ R d are the i.i.d. samples from the distribution ρ(x), with the Boltzmann-Gibbs form ρ(x) = e − U (x) ε .
In MuTrans, the microscopic cell-cell dynamics p(x, y) is constructed in the similar form with the diffusion map. The key procedure to construct the diffusion map is to define the weight function on the edge between data points x and y, in the scaled heatkernel form, , z δ = (4πδ) d/2 (2) and q δ (x) = ∑ y∈S k δ (x, y), here S denotes the whole sets of data points. The transition probability of diffusion-map random walk is therefore defined as where d δ,α (x) = ∑ y∈S w δ,α (x, y). The backward operator of the process can also be defined as T δ,α b ϕ(x) ≜ ∑ y p δ,α (x, y)ϕ(y). The continuum limit of diffusion map process mainly involves two limits, the largesample limit n → ∞ in the first step and the small band-width limit δ → 0 in the second step.
To find the limit of n → ∞, the central technique is the law of large numbers, stat- whereq δ = ∫ k δ (x, y)ρ(y)dy.
To find the limit of δ → 0, the following lemma is important, which can also be obtained from Laplace integral formula.

Lemma 1. For the smooth function ϕ(x), we have
where h(x) is independent of ϕ(x).
Applying the lemma to Equation (3) and collecting the leading orders, we find the O(1) term is ϕ(x), and the O(δ) term ∆(ϕρ 1−α ) When α = 0.5, we arrive at the famous conclusion by Lafon and Coifman [11], connecting the dynamics of diffusion-map random walk and over-damped Langevin equaion, Theorem 2. [11] where the limiting operator correspond to the infinitesimal generator of SDE whose invariant distribution is exactly ρ(x).
The theorem suggests if the data-points are sampled from the invariant distribution of SDE (4), then we can recover the original dynamics directly from data when inspecting the random walk defined by diffusion maps with time step O(δ). This result coincides with the intuitions to construct Brownian motion by invariant principle [14], since the space correlation scale from Gaussian kernel k δ (x, y) is O( √ δ), matching with time scale of O(δ). From this perspective, the microscopic p(x, y) constructed in our single-cell algorithm reflects the over-damped Langevin dynamics underlying the cell-fate decision process.
Next we move further by asking the following question. As discussed, MuTrans can provide with a natural way for the multi-scale reduction of diffusion map random walk. On the other hand, the classical way to directly coarse-grain the OLE dynamics (4) is from Kramers reaction rate formula, by viewing the coarse-grained dynamics as the discrete-state jump process, with transition rate which directly relates to the barrier height of U (x). Are the two coarse-graining ways equivalent, if the data is indeed sampled from the stationary distribution of SDE?
Using MuTrans to study diffusion map dynamics p δ,α , we can calculate the coarsegrained transition probability matrixP δ,α , where the transition probability from S 1 to .
For simplicity of discussion, below we restrict to the one dimensional case, and assume U (x) is the double-well potential with two local minimum at x = a and x = b, plus the saddle point at x = c. We also assume that the stable states S 1 and S 2 are accurate in MuTrans. Then as the sample number tends to infinity, we have To find the limit as δ → 0, we need the following lemma. Therefore, apply the Taylor's expansion to f (x, y) around (c, c), and note that the integral

Lemma 3. For the smooth function
Since we consider the state-transition under small noise, after taking the δ → 0 limit, we are also interested limit regarding small noise ε → 0. Applying the Laplace integral asymptotic for ρ(x) = e −ε −1 U (x) , and pluging in Equation (5), we finally arrive at

Further Discussions
Presently, the interplay between single-cell data analysis and dynamical models in computational biology can be realized in two diverging ways, namely the model-based approach and data-based approach respectively.
The model-based approach assumes the existence of a generative model (commonly a continuous dynamical systems) where the single-cell data is sampled from, and analyze the gene expression dynamics via fitting dozens of parameters in computational model.
For example, by modeling the gene regulation network with ordinary differential equation (ODE), Ocone et al. [15] proposed a framework to reconstruct the gene regulatory dynamics from single-cell snapshot data. Subsequently, SCUBA [16] involved the fitting of a bifurcating ODE system to describe the gene expression dynamics in time-series data. SCOUP [17] inferred the cell lineage and pseudotime via the parameter estimation for the conceived Ornstein-Uhlenbeck process, which was a special class of stochastic differential equation (SDE). The framework had been extended to time-series data with the consideration of more general Gaussian process dynamics [18,19,20]. Recently, partial differential equation (PDE) models were also adopted to simulate the pseudodynamics (population density evolution) in the single-cell RNAseq time series [21].
On the other hand, data-based approach directly defines stochastic dynamics on data points, yielding discrete models in both space and time. For instance, diffusion map and diffusion pseudotime [13] were among the earliest proposals to construct stochastic dynamics directly on single-cell data. The population balance analysis (PBA) method 8 [22] also utilized discrete random walk on the cellular graph to infer cell state dynamics, and discussed in detail the relationship between the constructed random walks and the continuous partial differential equation. In addition, the Topographer [23] and Palantir [24] also applied the discrete Markov chain model to gain a probabilistic understanding of the cell fate decision.
In general, the model-based methods provide more mechanism understandings, while tend to be problem-restrictive and computation-expensive; on the contrary, the databased methods usually provide more robust and scalable results, comprising with the sacrifice of model interpretability.
MuTrans combines the advantages of both approaches by constructing a data-based cell-cell random walk in the first step, and interpreting and decomposing the dynamics from a multi-scale stochastic model perspective, to perform cell clustering, lineage inference and transition cells/genes analysis in a consistent and mathematically rigorous way. 9

Supplementary Note 2: Algorithm Details
As an appendix to main text, in this part we present the MuTrans along with downstream   Transcendental algorithm and the large-scale dataset pre-processing module DECLARE with more technical details, with the framework outlined in Supplementary Fig. 2.
The central task of MuTrans is to extract three key dynamical information from single-cell transcriptome data, namely • The mutual transition probabilities among attractor basinsP K×K , • The relative cell positions within attractors (membership function) ρ k (x).
To this end, we use {S k ,P ij , ρ k (x)} to induce two random walk transition probability matrix (rwTPM) on • cluster-cluster resolution,p(x, y), which relies on S k andP ij , • cell-cluster resolution,p(x, y), which relies onP ij and ρ k (x).
Then we optimize {S k ,P ij , ρ k (x)} to ensure that the two model-based rwTPMp and p best approximate the cell-cell resolution rwTPM p, which is directly constructed from single-cell data via a diffusion-like kernel.
Finally, the downstream analysis (Transcendental) is performed on the obtained optimal {S * k ,P * ij , ρ * k (x)} to visualize the cell-fate dynamical manifold, infer the cell lineage and dissect transition cells and relevant genes.

Cell-Cell Resolution rwTPM
Cell-Fate Landscape Cell Lineage Transition Cell/Gene Analysis

Construction of Microscopic Cellular Random Walk
We begin to construct the random walk on the cell-cell level by defining a similarity function g(x, y) in Gaussian kernel form, expressed as g(x, y) = e − m 2 (x,y) is the metric between cell x and y, and σ(x) is the local standard deviation of the Gaussian kernel concentrated around cell. In our algorithm, the choice of metric m can be either Euclidean, correlation or cosine, and the value of σ can be obtained by the perplexity parameter as in tSNE. Based on the Gaussian kernel similarity, we then introduce the weight function w(x, y) between any pair of cells by making g(x, y) symmetric, i.e, w(x, y) = 1 2 (g(x, y) + g(y, x)). The weighted random walk with w(x, y) can hence be modeled as the Markov chain with transition probability p(x, y) such that Here p(x, y) denotes the probability that the cell currently in state x will switch to the state y in one step of the random walk. Such microscopic random walk yields an , which satisfies the detailed balance condition that µ(x)p(x, y) = µ(y)p(y, x). Under the random walk perspective of singlecell data, the dynamics of gene expression can be viewed as a series of consecutive stochastic switches from one cell to another, which is also consistent with the OLE dynamics (1).

Multi-scale Representation of Stochastic Dynamics
The aim of the multi-scale representation algorithm is to dissect the original, microscopic random walk dynamics {p(x, y), x, y ∈ S} into the multi-scale structure ) .
Here {P ij } i,j=1 defines a coarse-grained Markov chain on the states {S k } K k=1 , which is the partition of the microscopic space S and corresponds to the concept of attractor basins of dynamical system. The {ρ k (x), x ∈ S} K k=1 specifies the probability of cell x belongs to the attractor basin S k , determining the relative position of the cells in the holistic landscape. The underlying multi-scale structure naturally induces the dynamics on microscopic cellular level as defined below.
The Cluster-Cluster Resolution Dynamics. The coarse-grained Markov chain with the structure Here 1 S k (z) denotes the indicator function of cluster S k such that 1 S k (z) = 1 for cell z ∈ S k and 1 S k (z) = 0 otherwise. It is worth noting that, althoughp(x, y) is defined at the cell-cell level, it is indeed induced from the transitions in cluster-cluster resolution: from the cluster-cluster transition perspective, the stochastic transition from cell x ∈ S i to y ∈ S j can be decomposed into a two-stage process; the cell witnesses the switch of cellular state from cluster S i to S j with probabilityP ij in the first stage, and then pick up the cell y in cluster S j according to its relative portion at equilibrium µ(y) µ j . The Cell-Cluster Resolution Dynamics. Certain dynamics information might be lost during the coarse-graining, especially regarding the saddle points that represent the transition cells amid attractor basins, since the cluster-cluster resolution dynamics in Equation (6) assigns these cells to one specific cluster exclusively. In order to discover the transition cells in higher resolution, we introduce the membership function to quantify the relative cell positions in the clusters. The element ρ k (x) represents the probability that the cell x belongs to cluster S k and we require that for any x ∈ S. If the gene expression profile of the cell z is distinctive to the cluster S k , then ρ k (z) will approach to one as the indicator function 1 S k . On the other hand, for the cell possessing mixed lineage identities, its membership function ρ k (z) will have several significant positive components, suggesting its potential origin and destination during the transition process. The idea of membership function is sometimes referred to "soft clustering" previously [25], since the boundary among the discrete clusters are softened by assigning cells the probability belonging to certain clusters, which takes the value continuously from 0 to 1. With the introduction of {ρ k (x), x ∈ S} K k=1 , we are able to induce a refined microscopic dynamicsp[P ij , ρ k (x)] beyond Equation (6) as In the current set-up, since the membership of the cell to cluster is no longer exclusive, the transition from cell x to y can be realized in all the possible channels from any cluster S i to S j with the weight ρ i (x)ρ j (y). Therefore, the dynamics in Equation (7) recovers 13 information from the cell-cluster resolution.
The Optimal Multi-scale Representation. We aim to find the multi-scale repre- , whose induced random walksp andp optimally approximate the original data-driven dynamics p. As pointed out in [26], the natural choice to measure the distance between Markov chain dynamics is the Hilbert-Schmidt norm for the operator, defined as where µ is the invariant distribution of p. Therefore, we first formulate following the optimization problem to detect the optimal coarse-grained Markov-chain The solved optimal with the initial condition ρ 0 k (x) = 1 S * k (x) for any x ∈ S. The Coarse Graining Procedure. We aim to solve the minimization problem by the iteration scheme,P We expand the objective function in optimization problem (8) as Since the objective function yields the quadratic form with respect to {P ij } K i,j=1 , the problem can be tackled explicitlŷ We can verify that the obtainedP (t+1) ij satisfy the constrains in problem (8). The original problem turns out to be the NP-hard combinatorial optimization. To overcome such limitation, a heuristic algorithm has been proposed [26] that shared the spirit with Kmeans clustering. To present the heuristic method, we rewrite the objective function as The update of S k can be achieved by the greedy step where The Refinement Procedure. Given the coarse-grained results {S * k } K k=1 and {P * ij } K i,j=1 , the main obstacle of solving the optimal refinement representation problem (9) arises from the constraints for ρ k (x). To overcome the difficulty, we introduce the change of , then the refinement problem (9) can be tackled via the classical quasi-Newton method with respect to objective functioñ

Construction of Dynamical Manifold
To construct the dynamical manifold in single-cell analysis, challenges arise from the curse of dimensionality. For the high dimensional transcriptome dataset, the appropriate illustration of dynamical manifold depends on the reasonable low dimensional embedding of the cells that represents the ordering of cell-fate transition dynamics. To solve the issue, we define a transition coordinate ξ(x) on the two-dimension plane, based on the multi-scale dynamics property.
Our dimension reduction procedure involves two steps. In the initial step, we determine the center positions of each cell clusters, which correspond to the attractors of corresponding dynamical systems. In the second step, we assign the coordinate to each individual cell by considering the transition cell information. The initial centerdetermination step starts with an appropriate two-dimensional representation x 2D of each cell x, which can be realized by the spectral visualization of stochastic dynamics combined with MDS techniques, or simply by classical method like Diffusion Maps or tSNE. Instead of directly utilizing x 2D as the cell coordinate in landscape function, we calculate the center Y k of cluster S * k , by taking the average coordinate over cells within certain threshold of cluster membership function ρ * k (x), i.e.
The threshold h 2D allow us not only to consider the cells that tightly attracted to the fixed point in attractor basin S * k , but also to include cells that are in the transition process from or toward the attractor basin. Having determined the position of attractors, we define the two-dimensional transition coordinate ξ(x) according to the membership function ρ * k (x) in different attractor basins, such that For the cell possessing mixed identities of state S * i and S * j , its transition coordinate will lie in the middle of y i and y j . In each attractor basin S * k , we can also obtain the local covariance matrix Λ k of the transition coordinate ξ(x), i.e.
which represents the distribution of cells relevant to the basin. Next, we aim to define the landscape function that describes the global stochastic transitions among the multiple attractors in the system. We piece the local information of different attractor basins together by fitting a Gaussian mixture model with mixture weight given byμ * , the stationary distribution of coarse-grained dynamics. The probability distribution function P(z) of the mixture model can be written as is the 2-dimension Gaussian probability density function with mean y k and covariance Λ k . The landscape function ϕ is defined on the 2-dimension space for any point z such that ϕ(z) = − ln P(z) and visualized as the dynamical manifold.
Specifically, the function of individual cell x is calculated as ϕ(ξ(x)).
We argue that the landscape function ϕ and its associated dynamical manifold accommodate rich information about the multi-scale stochastic dynamics during cell fate decisions. Firstly, the re-arranged transition coordinates ξ(x) allow typical cells that are distinctive to certain cell states are positioned in the basin around corresponding attractors {Y k } K k=1 , while the transition cells are laid along the connecting path between attractors across the saddle point, according to membership function ρ * k (x). Secondly, the relative depth of the attractor basin reflect the stationary distributionμ * of coarsegrained dynamics , therefore depicting the relative stability of the cell state in the stochastic transition dynamics among different states. Moreover, the flatness of the attractor basin is determined by the local covariance matrix Λ k , which reveals the abundance and distribution of transition cells, and therefore the sharpness of cell fate switch.
Compared with the existing landscape proposals, distinctive features about our method arise in two aspects. Firstly, we resolved the curse of dimensionality issue in landscape visualization by the introduction of transition coordinates, which incorporated multi-scale dynamics information. In previous attempts, in order to visualize certain energy landscape in three-dimension space, the cells were projected onto the two-dimension plane by applying classical dimension reduction methods like PCA, tSNE or elastic embedding, which might lack dynamical interpretations. As the improvement, the transition coordinates in our method faithfully reflect the progress of cell development, by assigning and sorting the transition cells along the connecting path between their departing and targeting stable states. Secondly, the dynamical manifold in MuTrans purely serves as the descriptive tool to vividly and accurately demonstrate the dynamical information about cell fate conversion process. Some previous proposals further utilized the exact value of energy landscape to infer the underpinning dynamics, which introduced further assumptions about the dynamical models. For instance, scEpath [27] inferred the clustercluster transition probability by assuming that the equilibrium distribution yielded the Boltzmann-Gibbs form with the constructed signaling-entropy energy landscape. Meanwhile, DensityPath [28] probed the development trajectory with the hypothesis that the least action path corresponded to the geodesic curve on the density landscape. In comparison, our multi-scale representation of cell state transition dynamics and the downstream analysis were all directly and optimally derived from of the original highdimension space of gene expression, independent of the visualized dynamical manifold.
Consequently, the inference results toward the transition process by our method might be applicable to more general cases, and the relevant demonstration-aimed dynamical manifold seems to be closer to the original idea in Waddington's metaphor.

The Transcendental Procedure
Following the results of MuTrans, the Transition Cell and Relevant Analysis (Transcendental) procedure performs three tasks relevant to the transition cells • Inferring the cell lineage (transition trajectory) on cell states level; • Identifying the transition cells with the defined transition cell score (TCS); • Distinguishing the transition driver (TD), intermediate-hybrid (IH) or meta-stable (MS) genes during interested transition process.

Inference of the Cell Lineage.
Beyond the visualization of dynamical manifold, MuTrans also provides more quantitative approaches to infer the cell lineage from the transition dynamics. Based on the Mu-Trans coarse-grained reduction naturally emerges, whose nodes {V k } K k=1 denote the coarse-grained cell states {S * k } K k=1 , and nonnegative weights W ij on edge E ij can be computed from the cell state transition probabilityP * ij (described below).Given the initial attractor, a tree-like structure is then subtracted from G(V, E, W ) to represent the cell lineage during development. to every cell state, which is based on the transition path theory [29,30,31,32,33,34].
Given the sets of root and ending clusters, denote them as starting set A and the target set B, respectively. Suppose we already derive the coarse-grained cluster-cluster scale Markov chain X t with transition probabilityP.
Below we first define the core concepts in transition path theory as follows which can be found in previous literatures. [34]). For a given path {X t }, the in-transition times from set A to B are defined as the union of sets

Definition 1 (In-Transition Times
where t A n and t B n are the nth exit and entrance time of set A and B respectively such that Definition 2 (Transition Paths [34,30,31]). For a given path {X t }, the nth transition

The set of all transition paths is defined as
The probability distribution of transition paths is defined as: Definition 3 (Probability of Transition Paths [34]). The probability of observing transition path at state i is defined as Definition 5 (Committor Function [34]). The forward and backward committor functions are defined as Here P i denotes the probability of the forward process X conditioned on X 0 = i and P R i the probability of the reversed process X R conditioned on X R 0 = i.
Here q + i denotes the probability that the cell starting from cluster S i first enters set B rather than set A, and q − i the probability that the cell arriving at cluster S i came last from set A instead of B.
Committor functions can be computed via linear systems.

Proposition 5. [34] The committor functions solve the following linear equations
We can check that q − i = 1 − q + i for the detailed-balance process. Then we can have transition paths probability computed as follows. (16) can be expressed as

Proposition 6. [34] The probability of transition paths defined in
To observe the transition paths at state i, we pick it with the stationary distribution µ i , and require that the path last exit from set A and first enter the set B. This happens with the probability q − i and q + i , respectively. Similarly, we can define the probability flux of transition paths, which is important to the detection of development trajectories discussed below. It quantifies the proportion of cells that are on a transition path from A to B and moving directly from S i to S j . Definition 6 (Probability Flux of Transition Paths [34]).
We can also write f ij in terms of committor functions which is used in computation Proposition 7. [34] The probability flux of transition paths can be expressed as We therefore define the concept of development trajectories that connect different attractor basins to quantify the cell lineages.
To calculate the effective probability development trajectories, we need to eliminate the effect of detours along transition paths, and therefore define the notion of net transition paths probability flux as follows With the transition paths probability flux, we can define the capacity for each development trajectory (transition path).
Definition 9 (Capacity and Bottleneck [34]). Given the development trajectory ω dr = (i 0 , i 1 , . . . , i n ) from set A to B, its capacity is defined as By joining the most probable paths from the predetermined root state S * r to all the other states, a tree structure of graph can naturally arise, also known as the shortest path-tree in graph theory. The connecting path between S * r and any cell state S * k on the tree coincides with the most probable path from S * r toward S * k . For simplicity and intuition, we will call the computed shortest path-tree as the Most Probable Path Tree (MPPT) that reflects the cell lineage.
In the Python package of MuTrans, we use functions in PyEMMA [35] for the computations in transition path theory.

Identification of Transition Cells.
In addition to the MPPT, which describes the stochastic transitions on the cell states level, we also aim to quantify the cell state transition in the single-cell resolution.
To inspect the transition process from state S * i to S * j , we first sort the set of cells R ij ⊂ S * i ∪ S * j that are relevant to specific process through the threshold h mb of membership matrix, such that This step ensures that the cells evolved in the bifurcation to other lineage would be excluded from the analysis of S * i to S * j transition. To discover transition cells, we define a cell-specific measure, Transition Cell Score .
From the OLE dynamical model (1), τ ij remains one or zero when the cell wanders around the attractors of S * i or S * j , while τ ij transits from one to zero as the cellular state switches from S * i to S * j along the transition path. Motivated by this intuition, we arrange all the cells in R ij according to τ ij in descending order, and denote the ordered where t m = m/L. To deal with the noise in data, we also use the logistic function l(t; θ, t c ) = 1 1 + e θ(t−tc) (17) to smooth the reorderedτ ij (t), where the parameter θ reflects the steepness of transition.A larger θ corresponds to a sharper transition.
For the smoothedτ ij , we expect to observe a transition layer in its graph (Supple- well defined for these cells.

Classification of Marker Genes
Based on the calculated TCS and identified transition cells in the switch from S * i to S * j , the Transcendental further classifies the marker genes regarding transition and stable cells ( Supplementary Fig. 3). In the first step, all the differentially expressed (DE) genes between S * i and S * j are probed via standard approaches. Next, all the DE genes are classified into three types based on the following criterion: where the averagesτ ij andḡ are taken over all the transition cells.  Overall, above classification of genes resolves the previous discrepancy in defining transition cells, either from the dynamical [6] or the static [7] perspectives. We argue that the transition cells may display both transient (dynamical) and hybrid (static) features, manifesting in the expression of TD or IH genes respectively.

Directionality of Transition
In Transcendental, the directionality of transition is indicated by the tree obtained in lineage inference. We note that reverting the starting and targeting state does not change the identification results of transition cells or genes, since we have the relationship for TCS that τ ij (x) = 1 − τ ji (x) and the metric of transition cells or genes defined above are invariant under such transformation.

DECLARE Pre-Processing Module for Large-Scale Dataset
The rapid emerging of large-scale scRNA-seq datasets poses computational challenges to the trajectory analysis. Therefore we propose an optional pre-processing module here, named DECLARE (dynamics-preserving cells aggregation) to first coarse-grain the transition dynamics on the scale of microscopic attractors instead of single-cells to reduce the computational complexity.
The construction of rwTPM among microscopic attractors is inspired by the concept of coarse-graining of Markov Chain defined below.
where the limit is taken in the almost sure sense, and 1 {·} is the indicator function.
In DECLARE, we begin with the partition of the state space into M k with the clustering of cells in hundreds or thousands of clusters, using methods such as k-means or kNN network partition. Next, the coarse-grained transition probability matrix can be indeed calculated analytically instead of through numerical simulations, which is guaranteed by the following proposition:

Proposition 8. [34] The coarse-grained random walk transition probability matrix (cg-rwTPM) defined by (18) can be expressed by cell-cell transition probabilities p(x, y) and
its stationary distribution π(x), It is also straightforward to verify that the coarse-grained Markov chain X t with {T ij } has the stationary distribution µ with µ i = ∑ x∈M i π(x).
With the calculated T ij from DECLARE, we can then take it as the input to the MuTrans as the cell-cell scale rwTPM, to construct dynamical manifold and transition 26 trajectories.

The Choice of Attractor Numbers
The determination of cluster numbers remain the challenging problem in single-cell transcriptome data analysis. In MuTrans implementation, we suggest the following strategies to choose the number of attractors: • Given the constructed cell-cell rwTPM, we can calculate the eigenvalues and use as |λ 1 | > |λ 2 | > · · · > |λ N |. The k−th EPI is defined as , When a peak of the EPI is observed at index k, we reason that there is an obvious eigen-gap in the normalized graph Laplacian of the cellular network, and therefore k is a candidate for the number of clusters. Note that we may observe multiple peaks, which means that we may choose different k depends on desired our resolutions.
• For assessment, after the implementation of MuTrans, we can inspect the re-ordered rwTPM across cell-cell, cluster-cluster, and cell-cluster scales, with cells within the same attractor are aligned together. Ideally, the re-ordered rwTPMs should present clear block structures since the cellular transition probabilities within the same attractor are significantly greater than those among cells from different attractors.
• We also combine with prior biological knowledge, marker genes analysis, or compare with labels in original publication to further validate the results.

Timing and Computation Efficiency of Algorithm
In implementation, the major computation cost of MuTrans involves solving the optimization problem Eq.(9) with number of unknowns (i.e. ρ k (x)) N * k, where N is the total number of cells and k is the number of cell clusters. As shown in Supplementary  28 In practice, we can adopt the following strategies to speed up the algorithm: • Coarse-graining the data. For very large dataset, one can apply the DECLARE module to coarse-grain the data. This will decrease N and k simultaneously in the computation. Our benchmark in human HSC dataset (main text Figure 6) shows that the strategy may speed up the computation by one magnitude, while retain accurate dynamical manifold construction and trajectory analysis.

Supplementary Note 3: Data Analysis Details The Simulation Dataset
To test consistency of our method with models of multi-scale dynamics, we simulate the stochastic state-transition process based on 1)a saddle-node bifurcation model in the regime of intermediate noise level 2) the over-damped Langevin systems confined in multiple potential wells. The full scripts to simulate the data is available at https: //github.com/cliffzhou92/MuTrans-release with two Matlab live script notebooks provided for reproducible analysis results.

Saddle-node Bifurcation
The model assumed that characteristic gene expression level of the cell x t was confined by the conceived potential field V (x, λ) from the gene regulatory network, which was influenced by both the extrinsic signal λ and stochastic noise with amplitude σ.
Here we chose the potential function V (x, λ) = 100 To generate simulation data, we used Euler-Maruyama method to solve the SDEs with initial condition x 0 = 2.1038, time step 10 −5 , and took the 2,000 cells near phasetransition points on the bifurcation plot to construct rwTPM as the input to MuTrans.
The script to reproduce simulation and analysis for this dataset is available at https:// github.com/cliffzhou92/MuTrans-release/blob/main/Example/example_saddle_node. mlx

Back-and-forth Transitions in Potential Fields
To have back-and-forth transitions among potential wells, we simulated the over-damped Langevin systems with sufficiently large noise σ Here the potential filed V (x) has multiple local minimums representing different attractors of dynamical system. Specifically we chose The

31
We let noise amplitude σ change to generate different datasets, and re-run Mu-Trans to inspect the coarse-grained transition probabilities. The main text results corresponded to σ = 0.8. With the increase of noise level, the transition probabilities between different attractors (off-diagonal elements) become larger. In addition, the direct transition from attractor 1 to 3 becomes more frequent than the transitions involving attractor 2. These observations are consistent with previous studies about this triple potential-well system [37]. The script to reproduce simulation and analysis for this dataset is available at https://github.com/cliffzhou92/ MuTrans-release/blob/main/Example/example_triple_well.mlx

The EMT Dataset
The EMT data analyzed in main text was from GSE110357 with 354 cells sequenced by Smart-Seq2 platform. Prior to MuTrans analysis, we selected 1500 highly variable genes using vst method in Seurat pipeline. Following the strategy in original publication [38],
We analyzed the 1,081 cells collected in first three days of induction experiment to inspect bifurcation dynamics with data log transformed. The processed input gene expression matrix and script to reproduce all the supplementary results for iPSC data is available at https://github.com/cliffzhou92/MuTrans-release. The UMAP dimensional reduction of this data was computed in Scanpy with 60 neighbors.
In Supplementary Fig. 6, we displayed the MuTrans results with more details. In To show the robustness toward choosing the number of attractors K, we re-ran Mu-Trans with K = 7 (which is another peak shown in EPI (g)), as results displayed in Supplementary Fig. 6 (h). Interestingly, the attractors are highly consistent with Leiden clustering results in (c). The results regarding transition paths and transition entropy trend during the bifurcation is similar to K = 9 as shown in (e-f) and main text Figure   4f.
Next, in Supplementary Table 4

cells) 3) the cells that have larger membership in Pre-En attractor (called Pre-En-bifur cells). With the genes expression statistical test for each group of cells (the test also
including "negative markers" that are uniquely down-regulated), we found certain genes that might be used to predict the cells fate into either Pre-En or Pre-M attractor, and denoted them as bifur-prediction genes in PS attractor.    Figure 4h of main text) The list of MS, IH and TD genes in Pre-M to M transition of iPSC data. The genes marked with red color are top-rank, significant genes used to calculate gene expression trend in Figure 4h of main text. The selected GO terms for different groups of genes are also displayed. T  NANOG  MIXL1  SOX17  LEFTY1  NOTCH1  NOTCH2  PDGFB  PDGFA  EPCAM  GSC  GATA6  RPL35A  PTX2  TBX20  KDR  WNT4  EMILIN2  TGFB2  MYL3  GATA5  WNT11  ISL1 transition cells (

The Myelopoiesis Dataset
The Myelopoiesis data analyzed in main text was from GSE70245, containing 382 cells sequenced by Fluidigm C1 platform. For comparable analysis with labels in original publication, we downloaded the 532 feature genes identified by ICGS from https:// static-content.springer.com/esm/art%3A10.1038%2Fnature19348/MediaObjects/ 41586_2016_BFnature19348_MOESM464_ESM.xlsx as the input to MuTrans. We used tSNE dimensional reduction to remove seven outlier cells. The data matrix is then log transformed. The script for data processing can be downloaded at https://github. com/cliffzhou92/MuTrans-release/blob/main/Data/myelopoiesis_process.m.
The processed input gene expression matrix and script to reproduce all the supplementary results for EMT data is also available at https://github.com/cliffzhou92/

MuTrans-release.
We displayed the dynamical system analysis results in Supplementary Fig. 9 and dynamical manifold in Supplementary Fig. 10. For myelopoiesis dataset we used the tSNE dimension reduction results as the input 2D embedding to construct transition coordinates and the subsequent landscape. In Supplementary Fig. 10 (c-e), we compared the discrepancy between MuTrans results and the original labels generated by ICGS method in [39], and find the difference mainly stemmed from the transition cells recognized by MuTrans. The ICGS clustering tends to assign transition cells into either of the attractors during cell-fate switch.
In Supplementary Table 5 and Supplementary Fig. 11, we provided the Transcendental genes list during Multi-Lin to Gran transition as the supplementary to Figure 5c of main text. The Transcendental results for HSPC to Meg transition was also displayed in Supplementary Fig. 12 and Supplementary     Fig. 11 and Figure 5c of main text.) The list of MS, IH and TD genes in Multi-lineage to Gran transition of myelopoiesis data, with the orders of appearance corresponding to the positions from top to bottom in the figure.The genes marked with red color are top genes used to calculate gene expression trend in Figure 5c of main text.

The Lymphoid Lineage Differentiation Dataset
The lymphoid lineage differentiation dataset analyzed in main text is from GSE100037 with cells sequenced by CEL-Seq 2 platform. For quality control, we kept 2,018 cells with more than 200 genes expressed and 2000 total UMI counts. Prior to MuTrans analysis, we selected 2000 highly variable genes as the input using vst method in Seurat pipeline. The processed input gene expression matrix and script to reproduce all the supplementary results for EMT data is available at https://github.com/cliffzhou92/ MuTrans-release.
In Supplementary Fig. 13 Figure 5d of main text) The list of MS, IH and TD genes in Pro-B to Pre-B transition of lymphoid lineage data.The genes marked with red color are top genes used to calculate gene expression trend in Figure 5d of main text.

The Human HSC Differentiation Dataset
The human HSC differentiation dataset analyzed in main text is from the link https://

Supplementary Note 4: Methods Comparison Details
MuTrans provides the functions to identify and characterize transition cells from singlecell transcriptome data, and also resolves the progressing of cell-fate transitions in complex lineages. Here we compare MuTrans with other existing approaches (such as pseudotime ordering and cell-fate bias probability) for the detection of transition cells, and capacity to unravel complex cell lineages during differentiation.

Scrutinizing Bifurcation Dynamics in iPSC Dataset
As shown in Supplementary Fig. 15a We applied the existing lineage inference algorithms with the parameters and results described below. • For DPT, the neighbor parameter is set as 50. The method does not link the precursor states to mature states in the low dimension manifold ( Supplementary   Fig. 15c), and the computed pseudotime of En is also large than M.
• For PAGA, the graph layout is based on the refinement of diffusion map with max iteration number 50. The method infers the bifurcation lineage, suggesting the potential transitions between precursor and mature states, or even between the mature En and M populations ( Supplementary Fig. 15d).
• For RaceID3 and StemID2, the parameters in clustering are set as default and in lineage tree inference and projection, the number of randomization for cell positions is set as 500 and p-value cut-off for link is set as 0.1. The projected lineage tree generated by RaceID3 and StemID2 implies the transition cells between precursor 57 and mature states, however the spanning tree does not show the bifurcation lineage ( Supplementary Fig. 15e).
• For VarID, the number of principle components used is 50, the number of nearest neighbor is set as 50 and the weight alpha value is set as 0.1. All other parameters are set as default. The cluster-cluster transition probability indicated by VarID resolves the bifurcation dynamics ( Supplementary Fig. 15f).
• For FateID, the clustering is based on VarID results and the targeting fates are set as En and M respectively. All other parameters are set as default. As shown in Supplementary Fig. 15g , FateID seems not produce the expected differentiation routes.

Dissecting Complex Lineage in Myelopoiesis Dataset
In this dataset, MuTrans revealed complex branching dynamics mediated by the hub multi-lineage basin, where multiple streams of transition cells depart ( Supplementary   Fig. 16a).
• For Monocle 3, all the parameters are set as default. The method seems not identify the bifurcations into Gran/Mono or Eryth/Meg lineages. Also the lineage tree suggests the transition from multi-lineage cells to Gran/Mono lineages is very sharp, with hardly any transition cells exists (Supplementary Fig. 16b).
• For DPT, the neighbor parameter is set as 10. The low dimensional manifold or pseudo time does not resolve the bifurcations ( Supplementary Fig. 16c).
• For PAGA, the graph layout is based on the refinement of diffusion map with max iteration number 50. The method highlights multi-lineage state as the hub in cell lineage and resolves the overall bifurcations ( Supplementary Fig. 16d).
• For RaceID3 and StemID2, the parameters in clustering are set as default and in lineage tree inference and projection, the number of randomization for cell positions is set as 100 and p-value cut-off for link is set as 0.1. The projected lineage tree suggests transition cells exist between multi-lineage cells and other cell states ( Supplementary Fig. 16e).
• For VarID, the number of principle components used is 100, the number of nearest neighbor is set as 10 and the weight alpha value is set as 10. VarID can depict the overall cell lineage ( Supplementary Fig. 16f).
• For FateID, the clustering is based on VarID results and the targeting fates are set as myeolocyte and monocyte respectively. All other parameters are set as default.
• For Palantir, the number of diffusion component used is 10 and the number of waypoints is set as 70. It does not detect branches in this dataset by default setting, and its low-dimensional manifold separates multi-lineage cells into two disconnected regions (Supplementary Fig. 16h).  Table 4). Here DPT and Monocle 3 pseudotime during the processes witness the sharp switch-like transitions, resulting in very few transition cells to be detected between attractors. We also observed that the duration of pseudotime spent in stable cells is significantly shorter than that in transition cells (indicated by the Y-label in Supplementary Fig. 17). This is because DPT and Monocle 3 pseudotime is defined based on manifold distance in low dimensional space instead of cell transition dynamics. In low dimensional space, the stable cells are crowded together, while the transition cells are on the paths connecting different clusters of cells. Thus the manifold distances within stable cells are shorter than the distance along the transition path.
Hence, wide-applied pseudotime might not serve as the natural tool to inspect transition cells in this dataset. We also compared the ordering of cells produced by Fate ID cell fate probability and Palantir pseudo time. We found that the two measures change continuously during the transition, lacking resolutions to distinguish between stable and transition cells in the interested cell-fate switch.

MuTrans as a Pseudotime Method
The main functions of MuTrans focus on probing the transition cells between stable states instead of calculating the pseudotime. However, based on the transition cell analysis of MuTrans, we can also naturally define a uniform pseudotime ordering, where L k is the distance between state S k and root state in the inferred lineage. As shown in Supplementary