Reconstructing growth and dynamic trajectories from single-cell transcriptomics data

Time-series single-cell RNA sequencing (scRNA-seq) datasets provide unprecedented opportunities to learn dynamic processes of cellular systems. Due to the destructive nature of sequencing, it remains challenging to link the scRNA-seq snapshots sampled at different time points. Here we present TIGON, a dynamic, unbalanced optimal transport algorithm that reconstructs dynamic trajectories and population growth simultaneously as well as the underlying gene regulatory network from multiple snapshots. To tackle the high-dimensional optimal transport problem, we introduce a deep learning method using a dimensionless formulation based on the Wasserstein–Fisher–Rao (WFR) distance. TIGON is evaluated on simulated data and compared with existing methods for its robustness and accuracy in predicting cell state transition and cell population growth. Using three scRNA-seq datasets, we show the importance of growth in the temporal inference, TIGON’s capability in reconstructing gene expression at unmeasured time points and its applications to temporal gene regulatory networks and cell–cell communication inference.

This file includes the following subsections: 1. Supplementary Figures: • Supplementary Figure 1: Trajectory inference benchmark on simulated data.
• Supplementary Figure 3: Gene regulatory network inference benchmark on simulated data.
• Supplementary Figure 5: Temporal cell-cell communication from TIGON for EMT dataset.
• Supplementary Figure 6: Information captured in dimension reduction methods with various dimensions.
• Supplementary Figure 9: TIGON's performance on EMT dataset using first ten principal components (PCs).
• Supplementary Figure 13: TIGON's performance without long-term reconstruction error for simulated dataset.
• Supplementary Figure 14: TIGON's performance when changing the weight between Wasserstein and Fisher-Rao for simulated dataset.
• Supplementary Figure 15: TIGON's performance using different number of samples and time points as input for simulated data.
• Supplementary Figure 16: Morphology of data captured by Gaussian mixture model.
• Supplementary Figure 17: Comparisons of directionality between velocity and gradient of growth.
2. Pseudo code: TIGON algorithm 3. Supplementary Notes: • Supplementary Note 1: Gene analysis utilizing reversible and differentiable dimension reduction methods • Supplementary Note 2: Quantification and evaluating metrics • Supplementary Note 3: Benchmarking TIGON against other trajectory and GRN inference methods on simulated data • Supplementary Note 4: Impacts of dimension reductions on TIGON for EMT dataset • Supplementary Note 5: TIGON using revsersible UMAP for iPSCs dataset • Supplementary Note 6: Necessity of including long-term reconstruction error in the loss function • Supplementary Note 7: Exploration of weights between Wasserstein and Fisher-Rao in Wasserstein-Fisher-Rao metric • Supplementary Note 8: Impacts of number of samples and time points in training data • Supplementary Note 9: Synergy between gradient of growth and velocity 4. Supplementary Tables: • Supplementary Table 1: Summery of GRN inference methods.
• Supplementary Table 2: List of hyperparameters used in the training process.
• Supplementary Table 3: Autoencoder (AE) architecture and hyperparameters.Supplementary Figure 3: Gene regulatory network inference benchmark on simulated data.It is supplement to figure 2j.Comparisons over 13 GRN inference methods.GRNs were inferred for transitioning cell type at five time points t = 0, 10, 20, 30, 40.Pearson and Spearman correlation quantify correlations between edge weights in the GRN from predictions and groundtruth.The area under precision-recall curve (AUPRC) and the area under the receiver operating characteristic (AUROC) quantify the binary classification accuracy in predicting GRN edge with directions and self-regulation but without signs, and weights.The average metrics over five time points are also shown.

Algorithm TIGON algorithm
Require: A series of snapshots t Preprocessing: Using Gaussian mixture model to generate density ρ t i from snapshot C i g(x,s)ds dt

Compute transport cost
Compute short-term reconstruction error Integrating backward from t i+1 to t Compute long-term reconstruction error end for Update N N 1 and N N 2 using the Adam optimizer by minimizing the Loss end for Detailed architecture of AE and its hyperparameters used in this work are given in Supplementary Table 3.

Inference of GRNs
The GRN is constructed in a directed, signed, and weighted graph with self-regulation from the regulatory matrix using the Jacobian of velocity , where ∂v i ∂x j describes the regulatory strength from source j th gene to target i th gene.This is the case when TIGON is applied to the gene expression space.If TIGON is applied to a latent space from dimension reduction, the Jacobian describes the interactive relationship between different components in the dimension reduction space.We next describe how to approximate the regulatory matrix in the gene expression space using the velocity learned in latent space.
We consider x and v as the state of a cell and its velocity in the original high-dimensional gene expression space.After dimension reduction, the location of the cell in the latent space is z = f (x) with its velocity ṽ learned from the neural network ṽ(z, t) = N N 1 (z, t) at time t, where f denotes the function of dimension reduction such as the linear map using eigenvectors of PCA and encoder in AE.We use x = h(z) and v = h(z + ṽ) − h(z) to denote the reconstructed state of a cell and its velocity in the high-dimensional gene expression space, where h denotes the reverse function of dimension reduction such as decoder in AE and the linear map using transpose of eigenvectors in PCA.
The GRN requires computing ∂v i ∂x j at time t.By using the chain rule, we have: where the two partial derivatives can be computed by the reversible and differentiable dimension reduction methods and automatic differentiation of neural networks.

Inference of contribution of each gene to growth
The contribution of each gene to growth is assessed from the gradient of growth ∇g = ∂g ∂x j d j=1 .
To approximate ∂g ∂x j from the latent space at time t where the growth is learned from the neural network g = N N 2 (z, t) on the latent space, we use the chain rule again: where ∂N N 2 (z,t) ∂z can be approximated by the automatic differentiation of neural networks, and ∂h(z) j can be computed from the reversible and differentiable dimension reduction methods.

Supplementary Note 2 Quantification and evaluating metrics Accuracy for predictions of cellular dynamics
The accuracy for cellular dynamics predictions is quantified by three quantities: accuracy for velocity, accuracy for trajectories and accuracy for ratio of population (figure 2 and Supplementary Figure 1).They are all quantified by mean squared errors (MSEs).For each cell with index i, we calculate its state x i (t), velocity v(x i , t), and density ρ(x i , t) at K discrete time points: Accuracy for velocity is calculated as MSEs for N = 400 cells for K = 21 discrete time points between ground truth and predicted velocity: Similarly, accuracy for trajectory is calculated as MSEs between ground truth and predicted trajectories: We calculate the ratio of population at each time t j as the ratio between summed densities for cells at transition state over summed densities for all cells.For time t j , we find index set I for cells that are in transition state.Then the ratio is defined as: Then the accuracy of ratio of population is calculated as MSEs between ground truth and predicted ratios: The ground truth of velocity, trajectory and cell population ratio is obtained from the three-gene model.

Accuracy in gene regulatory network (GRN) inference
GRN is given as a weighted, directed, and signed graph with self-regulation.TIGON is able to infer all features of a GRN by computing the Jacobian matrix of ODEs' right hand side.
We first followed the BEELINE benchmark [2] protocol to use area under precision-recall curve (AUPRC) and the area under the receiver operating characteristic (AUROC) to quantify the GRN inference accuracy (Supplementary Figure 3).Specifically, both AUPRC and AUROC treat GRN prediction as a binary classification problem, considering directions, signs, and self-regulation within the GRN, yet without incorporating weights.
Furthermore, we also used Pearson and Spearman correlations to quantify the accuracy of GRN predictions.Specifically, these two correlations are calculated the weights of GRN edges between predictions and ground truth.This quantity treats GRN prediction as a regression problem, considering directions, signs and self-regulation within the GRN and also weights.
To calculate ground truth GRN weights, we the Jacobian matrix of the ODEs' right hand side.In the three-gene model, the right hand side for each gene is given as the following: We excluded the noise term and the degradation term on the right hand side to compute the interaction relationships.The edge weight of GRN is obtained via the partial derivative.For example, the regulation of gene A on gene B is given as ∂A .Positive value of J AB indicates gene A activates B, negative value of J AB indicates gene A inhibits B and the absolute |J AB | indicates the strength of regulation.For GRN inference method, the predictive regulatory matrix J is also calculated.For the edge without inferred weights, we set the component to be 0 in J. Then the Pearson and Spearman correlation are used to calculate the correlations between ground truth weights and inferred GRN weights (Supplementary Figure 3).

Additional metrics for accuracy in trajectory inference
In addition to the three dynamic OT-based methods, TIGON, TrajectoryNet [3], and MIOFlow [4], we added further comparisons with several single-cell trajectory inference methods, that are inferred from pseudotime analysis.Since the pseudotime analysis provides neither trajectory along biological time nor velocity, we used three standard metrics used in a benchmark work [5].Within the benchmark work, we chosed three metrics that do not account for bifurcation, which is the scenario presented in the simulated dataset.Specifically, they are cor dist, NMSE rf and NMSE lm.cor dist quantifies the similarity in cellular positions between predicted and ground truth trajectories.NMSE rf and NMSE lm quantify the accuracy of cellular positions prediction in one trajectory using positions of cells within another trajectory.We used the cells undergoing transitions in simulated data as comparisons.The velocity inferred by three dynamic OT-based methods was first translated into likely cell transitions using function scv.tl.velocity graph in scVelo [6].The three dynamic OT-based methods were then considered as cell graph based methods using cell transitions as the input for dynbenchmark [5].

Fate probability quantification
For the lineage tracing dataset [7], each cell at day 2 has a unique trajectory inferred by TIGON.To compute the probability of their descendant cells committing to Neu fate, we first calculated the fate densities for Neu and Mo cells at day 6.We used the cell fate annotation from the original work for lineage tracving data [7] in the 2-dimensional SPRING space.For Neu cells, {d N eu 6,j } N j=1 , and Mo cells, {d M o 6,j } M j=1 , at day 6, we reconstructed fate densities for them: Here we used diagonal covariance matrix Σ = σI ∈ R 2×2 , with σ = 0.025.For an observed cells d 2,j at day 2, we integrated its inferred velocity v forward along the trajectory to predict its state at day 6: Then we utilize the predicted cell state at day 6 to calculate its Neu fate probability using the fate densities: The clonal fate probabilities from Waddington-OT (WOT) [8], population analysis (PBA) [9], and FateID [10] were calculated using the predictions precomputed and made available by the original study on lineage tracing dataset [7].

Growth inference from lineage tracing barcode or databased
For the lineage tracing data, we used the approach from [11] to estimate the value of growth g.The experimental data is measured at a set of time points t 1 , t 2 , • • • , t T .First, we classified cells into several groups with the identical barcode.With the same barcode, cells at earlier time points are the clonal progenitors of cells at later time points.For each group of cells with identical barcode k, we define a relative cell population change at t i as the ratio of cell number between t i+1 and t i , which is given as the following: Then all cells at t i with barcode k have value of growth estimated by Here we consider the clones from Mo and Neu trajectory and detected from day 2. Notice that the observed cells do not include all descendants since day 2 includes half of cells, day 4 includes 30% of the other half and day 6 includes the remaining cells.We adjusted the number of cells based on the sampling percentage to make sure that the number of cells are comparable among days.
For other datasets without lineage tracing barcode, we used the approach from [11] to estimate the growth rate g where the birth and death scores are calculated by the mean of the z-scores of genes annotated to birth (KEGG CELL CYCLE or GO:0006915) and death (KEGG APOSTOSIS or GO:0007049).The scores are then smoothed over cells after 5 iterations to obtain the birth rates b and death rates d.The growth rate is g = b − d.

Comparisons of different dimension reductions
The comparison of TIGON's performance with different dimension reductions is quantified by four quantities: consistency of velocity, growth, GRN and gradient of growth.
To compute the consistency of velocity, we projected the velocity in low-dimensional space back to the gene space, and calculated MSEs and cosine similarity between velocity inferred from different dimension reductions (DRs).Observed cells at all input time points {t 1 , • • • , t T } are used.In EMT dataset, the velocity at reduced space is mapped back to 3000-dimensional highly variable gene space.In iPSCs dataset, the velocity at reduced space is mapped back to the 96-dimensional gene space.These metrics for consistency between two DRs are defined as: MSE of velocity in gene space between DR1 and DR2 = 1 Cosine Similarity of velocity in gene space between DR1 and DR2 = 1 where N j denotes the number of observed cells at time t j and x i denotes the reconstructed gene expression from low-dimensional space.Consistency of growth inference is calculated as Pearson correlation of observed cells between two DRs.Given paired data {(x 1 , y 1 ), ..., (x n , y n )}, the Pearson correlation coefficient r 2 , where x, ȳ denotes the sample mean.The Pearson correlation for growth is then calculated as: Pearson correlation of growth between DR1 and DR2 = r g DR1 ,g DR2 .
Here, we listed values of growth at different cells and time points as a vector.The paired data for Pearson correlation calculation for two DRs ia given as {(g DR1 (x i , t j ), g DR2 (x i , t j ))} N j ,T i=1,j=1 .We further introduce the metrics for quantifying the consistency of GRNs and gradient of growth inferred from different DRs.The GRN is inferred by the regulatory matrix J(x, t) = ∂v i (x,t) on the d-dimensional gene space.To ensure the robustness, we average the regulatory matrix at all observed time points and N = 50 cells: JDR = 1 N T T j=1 N i=1 J DR (x i , t j ).Then the consistency of GRNs is calculated as Pearson correlation for regulatory matrices from two DRs for all components: Pearson correlation of GRNs between DR1 and DR2 = r JDR1 , JDR2 .
Similarly, consistency of gradient of growth is calculated as Pearson correlation for average gradient of growth ∇ḡ DR = 1 Pearson correlation of gradient of growth between DR1 and DR2 = r ∇ḡ DR1 ,∇ḡ DR2 . ( Supplementary Note 3 Benchmarking TIGON against other trajectory and GRN inference methods on simulated data This section expands comparisons between TIGON and other trajectory inference methods and GRN inference methods (figure 2g-h).
First, we compared four dynamic optimal transport (OT) based trajectory inference methods including TIGON, balanced OT, TrajectoryNet [3], and MIOFlow [4].In this work, we refer specifically balanced OT to the framework by excluding the growth term in TIGON.In the simulated dataset, there is a group of cells making transition from A to state B, and a group of quiescent cells staying at state C all the time (Supplementary Figure 1a).At the same time, transition cells divide with increasing rate leading to an increasing ratio of cell population between transition cells and quiescent cells (Supplementary Figure 1d).TIGON captures the behaviors in both transition and the cell population ratio (figure 2b and Supplementary Figure 1d).The other three balanced OT-based methods only capture one of the features.For balanced OT, it successfully captures the transition from state A to state B, however, it falsely predicts a transition from quiescent state C to state B (figure 2g).Since balanced OT cannot introduce growth, it utilizes the false transition to compensate the increasing ratio of population between transition and quiescent cells.On the other hand, TrajectoryNet and MIOFlow circumvent the false transition (Supplementary Figure 1d-e).As a trade-off, they infer unchanged ratio of population (Supplementary Figure 1c).Despite these three methods are able to correctly identify one behavior during the dynamics, TIGON has better accuracy on velocity and ratio of population predictions measured by MSEs (figure 2h-i).For the accuracy of trajectory, TrajectoryNet achieved the best performance and TIGON ranks the second.The different behaviors of velocity and trajectory accuracy between TIGON and TrajectoryNet are revealed by the velocity at quiescent state and initial cells at transition state.Specifically, Trajecto-ryNet has random velocity at quiescent state and inaccurate directions for initial cells at transition state (Supplementary Figure 1d).Those cause the inaccurate velocity predictions from Trajecto-ryNet.The inaccurate directions of velocity in TrajectoryNet may be caused by their assumption that these initial cells are coming from a Gaussian distribution [4].On the other hand, the inaccurate predictions on velocity only have minor effects on trajectory accuracy when one consider the trajectory at a long-term behavior.Indeed, TrajectoryNet shows better performance for trajectory accuracy.We further quantified the accuracy of growth predicted by TIGON.The spearman and Pearson correlations of growth are above or around 0.5 indicating the accurate growth prediction from TIGON in growth estimation (Supplementary Figure 2d).
In addition, we explored more comparisons with other trajectory inference methods with the standard metrics used in the benchmark of single-cell trajectory inference methods, i.e. pseudotime analysis [5].We first selected 17 out of 45 methods that are ranked top three among their own wrapper type reported in the benchmark work.Next, among these 17 methods, eight of them successfully produced outputs and generated valid evaluating metrics using the cells undergoing transitions in simulated data generated from our three-gene model.These eight methods include SLICER [12], Slingshot [13], MST, Component 1, Embeddr, Angle, ElPiGraph Cycle and reCAT.In addition, three dynamic OT-based methods were included for comparisons: TIGON, TrajectoryNet [3], and MIOFlow [4].We selected three standard metrics to assess the quality of trajectory that include cor dist for quantifying the similarity in cellular positions between two trajectories, and NMSE rf and NMSE lm for quantifying the accuracy of cellular positions prediction in one trajectory using positions of cells within another trajectory.By comparing with the top methods ranked in dynbenchmark [5], three dynamic OT-based methods rank middle for cor dist and NMSE lm and rank top five for NMSE rf (Supplementary Figure 1g).Furthermore, it's worth noting that the dynamic OT-based methods offer additional functionalities beyond the pseudotime-based methods.These include the application of biological time, the capability to infer velocity and single-cell trajectory, and the reconstruction of single-cell dynamics within the gene expression space.
Furthermore, we compared GRN inference in TIGON and 12 other GRN inference methods.BEELINE [2] provides a standard benchmark pipeline by implementing 13 algorithms.Applying to the simulated data, 11 of the implemented algorithms successfully generated GRNs.We also included CellOracle for comparisons [14].Among 13 methods studied, there are eight methods allowing inference of causal GRNs (Supplementary Table 1).Six of those methods use either pseudotime or biological time, if data available, for causal inference to construct GRN.Interestingly, CellOracle provides a novel approach utilizing potential causal effects between transcription factors and target genes.TIGON also has a temporal causal inference for GRNs.Although the most of methods in comparisons infer one GRN from all cells in the data, CellOracle and TIGON are only two methods that infer cell-type specific GRNs.While CellOracle cannot infer self-regulations for a giving gene.Compared to all methods, TIGON can utilize temporal information biological time to infer cell-type specific GRNs with the capability of finding a signed, directed, and weighted GRN with self-regulation (Supplementary Table 1).We evaluated all methods on simulated data for transition cells at five time points, t = 0, 10, 20, 30, 40 (Supplementary Figure 3).By classifying GRN's edges with only directions, TIGON achieves the best average AUPRC and AUROC.Further comparisons for predicting GRN's edges with directions and signs, TIGON ranks 2nd and 3rd among all methods in Pearson and Spearman correlations.Although PPCOR achieves the best performance on both correlations, it always assigned the highest weights to edges for self-regulation which is coincidentally the case for the simulated model.Moreover, PPCOR cannot distinguish the directions of regulation between source and target genes.time, it's important to note that this is a pre-training process.The pre-trained TIGON model is able to run on a personal computer for trajectory and gene analysis.To circumvent potential issues of stiff equations, we employed an adaptive step ODE solver.Using a fixed time step ODE solver could significantly trim down the training time, but the numerical stability may be a problem.Furthermore, to avoid local minima during training, we initiated the process with a large standard deviation for the Gaussian mixture model and gradually decreased it during the training phase.Optimization of the training period could also be achieved by providing suitable initial parameters, for instance, regularization constraints on velocity or growth.Such measures could notably reduce the required time for training the model.transition (figure 4).This might be the reason that EMT shows more time points with positive cosine similarity than other datasets.
From these three datasets, we clearly observed the positive cosine similarity at the transition states, suggesting a possible synergy between growth and transition processes at the early and transition stages.
However, the iPSCs dataset presents an interesting exception, as the cosine similarity remains negative throughout all the time points.This suggests possible distinct or complex dynamics in iPSCs than other datasets may involve.

Supplementary Figure 1 :Supplementary Figure 2 :
Trajectory inference benchmark on simulated data.It is supplement to figure 2. (a) Simulation results at five time points.(b) Values of growth.Data is shown for identical cells in (a).(c) Comparisons of predicted cell population ratio between transition cells and quiescent cells as inferred by TIGON, balanced OT, TrajectoryNet, and MIOFlow.The dashed line shows the ground truth for the average cell population ratio across different time points, as obtained from the simulation.Line plots show the average cell population ratio across these time points, based on n = 5 independent repeats for each of the 21 time points.In the line graph, individual data points represent the average cell population ratio for the corresponding time point.Error bars represent one standard deviation from the mean, while the shaded region surrounding the line represents the range within one standard deviation above and below the mean.(d-e) Single-cell dynamics inference from (d) TrajectoryNet and (e) MIOFlow.(f) Accuracy in predicting trajectory, which is measured by mean squared errors (MSEs).The error bars show one standard deviation above and below the mean for each method from n = 5 independent repeats.Scatter plots show the accuracy from each repeat.(g) Comparisons over 11 trajectory inference methods.Trajectories are calculated for transition cells from simulated data, that are COR dist , NMSE rf , and NMSE lm listed from left to right in three barplots.TIGON's performance on simulated data.(a-c) Gene analysis for transition cells at different time points: (a) regulatory matrix, (b) GRN in a form of weighted directed graph and (c) gradient of growth.In (b) for GRNs, pointed arrows (blunt arrows) represent the activation (inhibition) from the source gene at starting point to the target gene at the end point, and width of lines represents the regulatory strength.(d) Predictive accuracy of growth from TIGON.The results were evaluated by Spearman and Pearson correlations based on n = 10 independent repeats.Sample sizes for each time point from t = 0 to t = 40 are 400, 433, 518, 682 and 914, respectively.Boxplots show the distributions of values of growth in a five-number summary where the center line shows the median, the upper and lower limits of the box show the interquartile range (IQR), spanning from the 25th to the 75th percentiles, and upper and lower whiskers show the maximum and the minimum by excluding "outliers" outside the interquartile range.

Supplementary Figure 4 :
TIGON's performance on EMT dataset using ten-dimensional autoencoder latent space.It is supplement to figure 4. (a) Trajectories and velocity for cells at gene expression space.There are 20 cells initially sampled from the density at 0 hour, where solid dots show their snapshots at 5 time points.Circles show the observed cells from the scRNA-seq data.(b) Regulatory matrix and (c) gene regulatory network (GRN) for six EMT marker genes.(d) Regulatory matrix for top 10 (top) upregulated target genes and (bottom) downregulated target genes of an EMT transcription factor (TF) SNAI1.(e) Gradient of growth for top 10 (top) growth-related genes and top 10 (bottom) death-related genes.

Supplementary Figure 5 :Supplementary Figure 6 :Supplementary Figure 7 :Supplementary Figure 8 :Supplementary Figure 9 :eSupplementary Figure 14 :Supplementary Figure 15 :Supplementary Figure 16 :
Temporal cell-cell communication from TIGON for EMT dataset.It is supplement to figure 4h-i.(a) Barplots of information flow for all significant signaling pathways inferred by CellChat.(b) Aggregated cell-cell communication network by counting the number of links.Edge colors are consistent with the sources as sender, and edge weights are proportional to the interaction strength.Thicker edge line indicates a stronger signal.(c-f) Inferred communication networks for four signaling pathways: (c) FN1, (d) SPP1, (e) COLLAGEN, and (f) LAMININ.Information captured in dimension reduction methods with various dimensions.(a-c) Reconstruction errors from autoencoder (AE) measured by mean squared errors (MSEs) versus dimension of latent space for (a) lineage tracing dataset; (b) EMT dataset; and (c) iPSCs dataset.The line plot shows MSEs derived from n = 20 independent repeats for each dimension of latent space.Error bars represent one standard deviation from the mean.(d-f) Explained variances ratio from principal component analysis (PCA) versus number of principal components for (d) lineage tracing dataset; (e) EMT dataset; and (f) iPSCs dataset.Consistency of TIGON's performance on different dimension reductions (DRs).(a) Cosine similarity of inferred velocity for observed cells at the gene expression space for all genes.The cosine similarity was calculated for PCA and AE at different dimensions.(b) Mean squared errors (MSEs) of inferred velocity for observed cells at the gene expression space for all genes.MSEs were calculated for PCA and AE at different dimensions.(c) Pearson correlation of inferred growth of observed cells using PCA and AE at different dimensions.(d) Pearson correlation of the average gene regulatory matrices of 50 random samples at observed time points using PCA and AE at different dimensions.(e) Pearson correlation of the average gradients of growth of 50 random samples at observed time points using PCA and AE at different dimensions.TIGON's performance on EMT dataset using eight-dimensional autoencoder latent space.(a-b) Visualization of TIGON's outputs on UMAP space.(a) Trajectories of 20 cells that are initially sampled from the density at 0 hour, where solid dots show their snapshots at 5 time points.Circles show the observed cells from the scRNA-seq data.(b) Values of growth for all observed cells.(c) Trajectories and velocity for cells at gene expression space.Identical cells in (a) are shown in (c).(d) Regulatory matrix and (e) gene regulatory network (GRN) for six EMT marker genes.(f) Regulatory matrix for top 10 (top) upregulated target genes and (bottom) downregulated target genes of an EMT transcription factor (TF) SNAI1.(g) Gradient of growth for top 10 (top) growth-related genes and top 10 (bottom) death-related genes.TIGON's performance on EMT dataset using first ten principal components (PCs).(a-b) Visualization of TIGON's outputs on UMAP space.(a) Trajectories of 20 cells that are initially sampled from the density at 0 hour, where solid dots show their snapshots at 5 time points.Circles show the observed cells from the scRNA-seq data.(b) Values of growth for all observed cells.(c) Trajectories and velocity for cells at scaled gene expression space.Identical cells in (a) are shown in (c).(d) Regulatory matrix and (e) gene regulatory network (GRN) for six EMT marker genes.(f) Regulatory matrix for top 10 (top) upregulated target genes and (bottom) downregulated target genes of an EMT transcription factor (TF) SNAI1.(g) Gradient of growth for top 10 (top) growth-related genes and top 10 (bottom) death-related genes.Supplementary Figure 10: TIGON's performance on iPSCs dataset using first four principal components (PCs).(a-b) Visualization of TIGON's outputs on first two PCs (a) Trajectories of 20 cells that are initially sampled from the density at day 0, where solid dots show their snapshots at 8 time points.Circles show the observed cells.(b) Values of growth for all observed cells.(c) Trajectories and velocity for cells at scaled gene expression space.Identical cells in (a) are shown in (c).(d) Regulatory matrix and (e) gene regulatory network (GRN) for six EMT marker genes.(f) Gradient of growth for top 10 (top) growth-related genes and top 10 (bottom) death-related genes.Supplementary Figure 11: TIGON's performance on iPSCs dataset using first eight principal components (PCs).(a-b) Visualization of TIGON's outputs on first two PCs (a) Trajectories of 20 cells that are initially sampled from the density at day 0, where solid dots show their snapshots at 8 time points.Circles show the observed cells.(b) Values of growth for all observed cells.(c) Trajectories and velocity for cells at scaled gene expression space.Identical cells in (a) are shown in (c).(d) Regulatory matrix and (e) gene regulatory network (GRN) for six EMT marker genes.(f) Gradient of growth for top 10 (top) growth-related genes and top 10 (bottom) death-related genes.Supplementary Figure 12: TIGON's performance on iPSCs dataset using reversible UMAP.(a-c) Visualization of TIGON's outputs on three-dimensional UMAP space.(a-b) Velocity and trajectories of 20 cells that are initially sampled from the density at day 0, where solid dots show their snapshots at 8 time points.Circles show observed cells from the data.(c) Values of growth for 20 cells along trajectories.Identical cells were picked as in (a-b).(d) Trajectories of cells on gene expression space of three bifurcation marker genes: NANOG, HAND1, and SOX17.(e) Pearson correlation of inferred growth of observed cells from different dimension reductions.long-term reconstruction error Supplementary Figure 13: TIGON's performance without long-term reconstruction error for simulated dataset.(a) Cellular dynamics inferred by TIGON without long-term reconstruction error for cells sampled at time=0.(b) (left) Accuracy in velocity predictions, (middle) accuracy in predicting ratio of cell population between transition cells and quiescent cells and (right) accuracy in predicting trajectory, which are measured by mean squared errors (MSEs).The error bars represent one standard deviation above and below the mean from n = 5 and n = 10 independent repeats from TIOGN and TIGON without long-term reconstruction error.Scatter plots show the accuracy from each repeat.TIGON's performance when changing the weight between Wasserstein and Fisher-Rao for simulated dataset.(a-b) (top) Cellular dynamics inferred by TIGON with weight between Wasserstein and Fisher-Rao (a) α = 0.1 and (b) α = 10 for 100 cells sampled at t = 0, (bottom left) regulatory matrix and (bottom right) gradient of growth.Each dot denotes a cell colored by estimated growth rate.Arrowed line denotes velocity with length showing the speed.(c) Transport cost, sum of mean squared errors (MSEs) between density of data and estimated density at training time points, and MSE between density of data and estimated density at middle time points t = 5, 15, 25, 35 along training process.(d) (left) Accuracy in velocity predictions, (middle) accuracy in predicting ratio of cell population between transition cells and quiescent cells and (right) accuracy in predicting trajectory, which are measured by MSEs.TIGON's performance using different number of samples and time points as input for simulated data.(a) Accuracy of constructing cell densities by down-sampling samples, which is measured by mean squared errors (MSEs).The line plot shows MSEs derived from n = 20 independent repeats for each down-sampling.The shaded region around the line represents the range within one standard deviation above and below the mean.(b) Cellular dynamics inferred by TIGON by excluding data at one middle time point.Training data at t = 0, 10, 20, 30, 40, 50 was selected.(c) (left) Accuracy in velocity predictions, (middle) accuracy in predicting ratio of cell population between transition cells and quiescent cells and (right) accuracy in predicting trajectory, which are measured by MSEs.Morphology of data captured by Gaussian mixture model.Each dot denotes a sampled cell from the Gaussian mixture model with the given standard derivation and colored by time.(a) Simulated dataset.(b) Lineage tracing dataset.(Bottom) Reconstructed densities from the Gaussian mixture model with the given standard derivation at different time points.(c) EMT dataset.(d) iPSCs differentiation dataset.and gradient of growth Supplementary Figure 17: Comparisons of directionality between velocity and gradient of growth.For each dataset, cosine similarity is calculated between velocity and gradient growth for observed cells at different time points.Datasets include (a) simulated data; (b) linage tracing dataset; (c) EMT dataset; (d) iPSCs dataset.For each dataset, calculations used TIGON results at the embedding space in main text: (a) figure 2; (b) figure 3; (c) figure 4 using UMAP embedding; and (d) figure 5.

Table 1 :
[2]mery of GRN inference methods.Except TIGON and CellOracle, the rest of methods were included in BeeLine benchmark[2].

Table 2 :
List of hyperparameters used in the training process.TIGON uses two neural networks approximate velocity v(x, t) ≈ N N1(x, t) and growth rate g(x, t) ≈ N N2(x, t), respectively.