Main

Advances in single-cell RNA sequencing (scRNA-seq) technologies have facilitated the high-resolution dissection of the mechanisms underlying cellular differentiation and other temporal processes1,2,3. Although scRNA-seq is a destructive assay, a widely used set of computational approaches leverage the asynchronous nature of dynamical biological processes to order cells along a so-called pseudotime in the task of trajectory inference4,5,6,7. Traditional methods for trajectory inference typically require the initial state of the underlying biological process to be known and use manifold learning to determine a metric space in which distances capture changes in differentiation state.

Recently, RNA velocity has emerged as a bottom-up, mechanistic approach for the trajectory inference task. RNA velocity, which describes the change of spliced messenger RNA (mRNA) over time, makes use of concomitant detection of unspliced and spliced RNA transcripts with standard scRNA-seq protocols8. Upon estimation, RNA velocity is typically incorporated into analyses in two ways: (1) inferring a cell-specific differentiation pseudotime or (2) constructing a transition matrix inducing a Markov chain over the data to determine initial, transient and terminal subpopulations of cells9.

There are currently two popular methods for estimating RNA velocity. The first, referred to as the steady-state model, assumes (1) constant rates of transcription and degradation of RNA; (2) a single, global splicing rate8,10; (3) that the cellular dynamics reached an equilibrium in the induction phase and do not include basal transcription; and (4) gene-wise independence. The second method, referred to as the EM model, was previously described and implemented in the scVelo package11. The EM model relaxes the assumption of the system having reached a steady-state, infers the full set of transcriptional parameters and estimates a latent time per cell, per gene by formulating the problem in an expectation-maximization (EM) framework.

While these approaches for estimating RNA velocity have been successfully used to interpret single-cell dynamics12,13, they also suffer from limitations derived from their modeling assumptions and downstream usage14,15,16,17. For example, both methods lack a global notion of uncertainty. Thus, assessing the robustness of the RNA velocity estimate, or deciding to what extent velocity analysis is appropriate for a given dataset can be difficult. Although the EM model can be used to rank putative ‘driving’ genes by their likelihood, there is no direct connection between gene likelihood, visualization and correctness. For example, in the case of dentate gyrus neurogenesis, visualization of RNA velocity suggests that granule mature cells develop into their immature counterparts even though a selection of high likelihood genes suggests the reverse (correct) dynamics11.

Estimation of RNA velocity with current approaches is also tightly coupled to the parameterization of the differential equations underlying transcription. Assumptions such as constant transcription, splicing and degradation rates may be too simple to explain dynamics that arise in multi-lineage14 or even single-lineage18 cell differentiation. The methods outlined to estimate RNA velocity lack extensibility and flexibility to adapt to more complicated, real-world scenarios. Emerging technologies such as VASA-seq19, which have greater sensitivity for unspliced RNA detection, may provide sufficient signal to fit more complex models.

To address these issues, we present veloVI (velocity variational inference), a deep generative model for estimating RNA velocity. VeloVI reformulates the inference of RNA velocity via a model that shares information between all cells and genes, while learning the same quantities, namely kinetic parameters and latent time, as in the EM model. This reformulation leverages advances in deep generative modeling20, which have become integral to many single-cell omics analytical tasks such as multimodal data integration21,22, perturbation modeling23,24 and data correction25. As its output, veloVI returns an empirical posterior distribution of RNA velocity (matrix of cells by genes by posterior samples), which can be incorporated into the downstream analysis of the results. Here, we show that veloVI represents a substantial improvement over the EM model in terms of fit to the data. Additionally, it provides a layer of interpretation and model criticism lacking from previous methods while also greatly improving flexibility for model extensions.

We use veloVI to enhance analyses of velocity at the level of cells, genes and whole datasets. At the level of a cell, veloVI illuminates cell states that have directionality estimated with high uncertainty, which adds a notion of confidence to the velocity stream and highlights regions of the phenotypic manifold that warrant further investigation and more careful interpretation. We couple this analysis with a metric called velocity coherence that explains the extent to which a gene agrees/disagrees with the inferred directionality. At the level of genes and datasets, we propose a permutation-based technique using veloVI that can identify partially observed dynamics or systems in steady states. This can be used to determine the extent to which RNA velocity analysis is suitable for a particular dataset.

Finally, veloVI is an extensible framework to fit more sophisticated transcriptional models. We highlight this flexibility by extending the current transcriptional model with a time-dependent transcription rate and show how this extension can improve the model fit.

Results

Variational inference for estimating RNA velocity

VeloVI posits that the unspliced and spliced abundances of RNA for each gene in a cell are generated as a function of kinetic parameters (transcription, splicing and degradation rates), a latent time and a latent transcriptional state (induction state, repression state and their respective steady states). Additionally, veloVI posits that each gene’s latent times (per cell) are tied via a low-dimensional latent variable that we call the cell representation. These representations capture the notion that the observed state of a cell is a composition of multiple concomitant processes that together span the phenotypic manifold1. This modeling choice is justified by the observation that with the EM model, which is fit independently per gene, the inferred latent time matrix (of shape cells by genes) has a low-rank structure (but notably, not rank one; Extended Data Fig. 1).

The complete architecture of veloVI manifests as a variational autoencoder26. The encoder neural networks take the unspliced and spliced abundances of a cell as input and output the posterior parameters for the cell representation and latent transcriptional state variables. The gene-wise, state-specific, latent time is parametrized by a neural network that takes a sample of the cell representation as input. The likelihood of cellular unspliced and spliced abundances is then a function of the latent time, the kinetic rate parameters and the state assignment probabilities (Fig. 1a and Methods). The model’s parameters are optimized simultaneously using standard gradient-based procedures. After optimization, the cell-gene-specific velocity is computed as a function of the degradation rate, the splicing rate and the fitted unspliced and spliced abundances, which directly incorporate the posterior distributions over time and transcriptional state.

Fig. 1: Overview of the veloVI model.
figure 1

a, veloVI encodes the unspliced and spliced abundances into the cell representation through a neural network. This cell representation is used to further encode the transcriptional state assignment of each cell/gene. Using the cell representation as input, the decoder neural network outputs a cell-gene-state specific latent time. The likelihood is a function of the latent time, kinetic rates (rates of transcription (α), splicing (β) and degradation (γ)) and uncertainty over the state assignment. The model is optimized end-to-end using stochastic variational inference techniques. b, The variational inference formulation quantifies the intrinsic uncertainty of a velocity estimate by sampling from the posterior distribution and measuring variability around the mean velocity vector. This notion is contrasted by its extrinsic counterpart, which quantifies variation of velocity extrapolated cell states within a cell’s neighborhood defined by transcriptomic similarity. KNN, k-nearest neighbor.

As a Bayesian deep generative model, veloVI can output a posterior distribution over velocities at the cell-gene level. This distribution can be used to quantify an intrinsic uncertainty over the first-order directions that a cell can take in the gene space. In downstream analyses, velocity is often used to construct a cell–cell transition matrix that reweights the edges of a nearest-neighbors graph according to the similarity of the first-order displacement of a cell and its neighborhood8,11. By piping posterior velocity samples through this computation, we also quantify an extrinsic uncertainty, which reflects both the intrinsic uncertainty and the variability among the cell’s neighbors in gene space (Fig. 1b and Methods). In contrast, the EM model and steady-state model do not carry any explicit notion of uncertainty. Indeed, both previous models only allow evaluating an uncertainty post-hoc based on quantifying velocity variation over a cell’s neighbors9. Finally, a point estimate of the velocity averaged over samples for a cell allows veloVI’s output to be used directly in scVelo’s downstream visualization and graph construction functionalities as well as other packages building upon scVelo9,27.

veloVI improves data fit over the EM model and is stable

We performed a multifaceted analysis to evaluate veloVI’s ability to robustly fit transcriptional dynamics across a range of simulated and real datasets, comparing to both the EM model and the steady-state formulation of RNA velocity as implemented in the scVelo package11.

We first assessed each model’s ability to recover kinetic parameters in simulated data (Methods). With an increasing number of observations, veloVI outperformed the EM model and was better than the steady-state model in recovering the simulated ratio of degradation and splicing rate for each gene (Supplementary Fig. 1a). Similarly, veloVI’s inferred latent time and velocity correlated significantly better (two-sided Welch’s t-test, P < 0.001) with ground truth compared to EM estimates when simulating data with parameters previously estimated on real data (Methods and Fig. 2a). It is notable that these simulations reflect an idealized scenario as cells are simulated via the EM model generative process, which assumes gene-wise independence, induction followed by repression states and a single lineage (Methods). Nonetheless, veloVI outperforms the EM model even in these EM-favorable conditions. We also benchmarked the runtime of veloVI and EM model. For this comparison, we ran both models on subsamples of a mouse retina dataset28 containing approximately 114,000 cells. Across multiple subsamples, inference was substantially faster using veloVI compared to the EM model (Supplementary Fig. 1b). Specifically, considering 20,000 cells, veloVI achieved a fivefold speed-up.

Fig. 2: Benchmarking of velocity and latent time recovery.
figure 2

a, Accuracy of VI and EM models on recovering ground truth latent time and velocity. For each quantity, we provide results in the form of box plots (n = 1,074 genes) and scatter plots (VI model y axis, EM model x axis). Rate parameters were estimated by each model on the pancreas dataset. Following this, a subset of these parameters was used to simulate new data, which each model was fitted to. In these plots, each point corresponds to a gene, where the value is the correlation between the estimated time/velocity versus the ground truth. The color coding in the scatter plots indicates whether the simulated rate parameters were derived from the original EM or VI model fit. Box plots indicate the median (center line), interquartile range (hinges) and whiskers at 1.5× interquartile range. b, UMAP colored with FUCCI-derived cell-cycle score for the RPE1- (top) and U2OS-FUCCI cells (bottom). c, Comparison of the velocity consistency from veloVI (blue) and the EM model (orange). The cell-wise velocity consistency in case of RPE1-FUCCI cells is shown on top, their U2OS counterparts on the bottom (n = 2,793 cells, top; n = 1,146 cells, bottom). Box plots indicate the median (center line) and interquartile range (hinges). d, Comparison of veloVI’s and the EM model’s estimated velocities. The violin plots show the log-transformed ratio of each method’s velocity sign accuracy, which were computed per gene. Box plots indicate the median (center line) and interquartile range (hinges) (n = 140 genes, top; n = 395 genes, bottom). Significance was assessed with a one-sided Welch’s t-test (P < 0.001). e, Spliced abundance of PIF1 versus cell cycle score for each cell in the U2OS-FUCCI dataset (left). Estimated velocity in PIF1 using the VI (middle) and EM model (right) plotted against the cell-cycle score.

To further validate the accuracy of veloVI, we compared veloVI and the EM model on cell-cycle datasets of fluorescent ubiquitination-based cell-cycle indicator (FUCCI) RPE1 and U2OS cells13,29 as it offers orthogonal validation of directionality/time via a protein-derived cell-cycle score (Fig. 2b). To assess model performance, we first compared the local consistency of the velocity vector fields generated by each model. This consistency measure quantifies the extent to which the velocities of cells with similar transcriptomic profiles (nearest neighbors) agree and relies on the assumption that velocities change smoothly over the phenotypic manifold. Compared to the EM model, veloVI achieves a higher velocity consistency (Fig. 2c). We also tested whether the direction of the velocity at the gene level aligns with a ground truth heuristic based on the cell cycle (Methods). As before, veloVI yielded consistent results and outperformed the EM model (RPE1 (resp. U2OS), 66% (resp. 68%) genes have higher velocity sign accuracy under veloVI; Fig. 2d) significantly (one-sided Welch’s t-test, P < 0.001). As a complementary validation of these findings, we confirmed that the velocities of individual genes inferred by veloVI change more smoothly (are less noisy) with respect to the ground truth ‘time’ compared to the EM model (RPE1 (resp. U2OS), 78% (resp. 65%) genes have higher R2 under veloVI) (Fig. 2e, Supplementary Fig. 2 and Methods).

We then evaluated the stability of velocity estimates on real datasets processed with 12 different RNA abundance quantification algorithms8,28,30,31,32,33, based on previous work that highlighted general inconsistencies in velocity estimation34 (Methods). To do so, we measured the correlation of velocity of each cell between pairs of quantification flavors on five benchmarking examples, namely pancreas endocrinogenesis at embryonic day 15.5 (ref. 35) as well as datasets of spermatogenesis36, mouse developing dentate gyrus37, the prefrontal cortex of a mouse38 and 21–22-month-old mouse brains39. When aggregating these correlations for each pair of quantification algorithms, veloVI scored both a higher mean correlation and lower variance compared to the EM model. Compared to the much simpler steady-state model, veloVI tended to have a similar mean correlation, but with lower variance (Fig. 3a, Extended Data Fig. 2 and Supplementary Figs. 37).

Fig. 3: Velocity model comparison in complex biological systems.
figure 3

a, Comparison of velocity estimation when using different algorithms to quantify unspliced and spliced counts. Correlation of velocities derived from pairs of quantification algorithms and from velocities estimating using one of veloVI (VI), the EM and steady-state (SS) model are compared on the pancreas35 and spermatogenesis36 data. Box plots indicate the median (center line), interquartile range (hinges) and whiskers at 1.5× interquartile range (n = 78 pairs of quantification methods each). b, Comparison of veloVI and the EM model based on gene-wise MSE (left), cell-wise velocity consistency (middle) and gene-wise velocity Pearson correlation (right) on datasets of pancreas endocrinogenesis35 (n = 1,074 genes for MSE and velocity correlation; n = 3,696 cells for velocity consistency), hippocampus8 (n = 1,292 genes for MSE and velocity correlation; n = 18,213 for velocity consistency), forebrain8 (n = 822 genes for MSE and velocity correlation; n = 1,720 for velocity consistency) and retina43 (n = 700 genes for MSE and velocity correlation; n = 2,726 for velocity consistency). Box plots indicate the median (center line), interquartile range (hinges) and whiskers at 1.5× interquartile range. c, Velocity comparison on the level of individual genes in the pancreas dataset (Sulf2 and Top2a). For each gene, the velocity of the EM model is plotted against veloVI (left) and the gene phase portrait is given (right). Each observation is colored by its cell type as defined in previous work35.

To assess how well the inferred dynamics reflect the observed data, we computed the mean squared error (MSE) of the fit for the unspliced and spliced abundances and compared the MSE to that of the EM model on a selection of datasets (Supplementary Table 1). For each dataset, we computed the ratio of the MSE for veloVI and the EM model at the level of a gene. VeloVI had better performance for a majority of the genes in each dataset (Fig. 3b). Additionally, across all datasets, veloVI had higher velocity consistency among cells (Fig. 3b). We attribute this increase to the explicit low-dimensional modeling in veloVI that shares statistical strength across all cells and genes.

Despite sharing many model assumptions, the velocities estimated for a gene with veloVI were partially correlated on average with their EM counterpart (Fig. 3b). To highlight the differences in velocity estimation at the level of individual genes, we examined Sulf2, a marker of endocrine progenitor cells and Top2a, a cell-cycle marker, in the pancreas dataset (Fig. 3c). For both of these genes, the EM model predicted a wide range of velocities for cells that had near-zero unspliced and spliced abundances. For example, terminal beta cells had substantially positive velocity under the EM model for Sulf2 despite being located at the bottom-left of the phase portrait (defined as the scatter-plot of unspliced versus spliced abundance of a gene) and with known development occurring later than endocrine progenitors and pre-endocrine cells. In the case of veloVI, beta cells had nearly zero velocity, reflecting their belonging to the putative repression steady state for this gene. We attribute this result to veloVI’s velocity directly marginalizing over the latent cell representations, which explicitly incorporates the probability that a cell belongs to induction, repression, or their respective steady states (Methods). We observed similar results for Top2a, in which cell types without a strong cell-cycle signature and near-zero unspliced/spliced abundance had positive velocity in the EM model, but near-zero velocity using veloVI.

veloVI enables interpretable velocity analysis

We then investigated how the uncertainty in the velocity estimates of veloVI could be used to scrutinize its output, both at the level of cells (which might be incorrectly modeled) and at the level of individual genes (which might be inconsistent with the aggregated, cell-level output). We used this uncertainty to (1) measure the variability in the phenotypic directionality suggested by the velocity vector in each cell (here, intrinsic uncertainty) and (2) quantify the variability of predicted future cell states under the velocity-induced cell–cell transition matrix (here, extrinsic uncertainty; Fig. 1b and Methods).

We applied these uncertainty metrics to the pancreas dataset (Fig. 4a). We observed that the intrinsic uncertainty was elevated in ductal and Ngn3-low endocrine progenitor populations, while the extrinsic uncertainty highlighted these same populations in addition to terminal alpha and beta cells. These results demonstrate that lower intrinsic uncertainty does not necessarily preclude higher extrinsic uncertainty. While the former relies on estimating the velocity vector (which is cell-intrinsic), most velocity pipelines also account for other cells in the dataset, which presumably represent the potential past and future states of the cell, to determine cell transitions (Fig. 1b). In the case of alpha and beta cells, these cells represent terminal populations in the pancreas dataset, which may explain the high extrinsic uncertainty as there are no observed successor states. Conversely, in the case of transient cell populations, such as Ngn3-high endocrine progenitors and pre-endocrine cells, both metrics assign a low uncertainty. We attribute the low intrinsic uncertainty of these cells to the fact that their dynamics agree well with the underlying model assumptions (Extended Data Fig. 3). The addition of low extrinsic uncertainty further suggests that these cell types have clear successor populations in this dataset (Fig. 4a).

Fig. 4: Velocity uncertainty and permutation score analysis.
figure 4

a, From left to right: velocity stream, intrinsic and extrinsic uncertainty of the pancreas dataset estimated by veloVI. The left UMAP embedding is colored by cell type according to original cluster annotations11,35. Uncertainty is defined as the variance of the cosine similarity between samples of the velocity vector (intrinsic) and future cell states (extrinsic) and their respective mean (Methods). b, The corresponding cumulative distribution functions (CDFs) of the gene velocity coherence score is shown for alpha and Ngn3-high EP cells. The velocity coherence, defined for one cell and gene as the product of the velocity and the expected displacement of that cell/gene, is averaged within cell types. w.r.t., with regard to. c, The effect on the error between inferred dynamics and data when permuting unspliced and spliced abundance in the case of Top2a (top) and Sst (bottom). Coloring of cell types is according to a. d, Permutation score densities of datasets of the pancreas35, hippocampus8, forebrain8, spermatogenesis36, retina43, brain39, prefrontal cortex38, PBMCs and simulated data15 (top). Kurtosis (left) and skew (right) for each dataset (bottom).

To further understand what aspects of the data these uncertainty metrics capture, we (in silico) perturbed the pancreas dataset by either (1) downsampling the total counts of each cell to mimic changes in sequencing depth and capture efficiency; (2) subsampling unspliced counts for a subset of genes to mimic the biased capture of unspliced molecules; or (3) adding random multiplicative noise to each abundance value (Methods). We applied each perturbation at various strengths and found that for each perturbation source, the intrinsic uncertainty increased with the perturbation strength. We found a similar response for the extrinsic uncertainty except in the case of total count downsampling, which required a high strength to shift the extrinsic uncertainty (Extended Data Fig. 4). These results suggest that the uncertainty metrics can capture random noise in the data, as well as bias in how the transcripts are measured.

Finally, we asked whether we could use veloVI’s uncertainty to address the common behavior of unexpected ‘backflow’ in two-dimensional velocity visualizations; when projecting the average veloVI velocity onto a Uniform Manifold Approximation and Projection (UMAP)40 plot (using procedures from elsewhere11), we observed an incorrect ‘backflow’ of directionality in alpha and beta cells, which showed transitions toward their known progenitors. While these terminal populations have high extrinsic uncertainty according to veloVI, it remains difficult to explain which genes cause the inconsistency. In the case of scVelo, it has been proposed to use the likelihood of a gene as a proxy; however, the likelihood has no direct connection to cell–cell transition-based analyses.

To this end, we sought to score genes in each cell according to how well their velocity agrees with the predicted future cell state that is derived via the velocity-induced transition matrix (incorporating velocity information from all genes as well as gene expression in neighboring cells; Methods). We reasoned that this score, which we call velocity coherence, could help gain insight into why a particular directionality might manifest. A positive score of a gene indicates the velocity value of that gene (the time derivative of its spliced mRNA) agrees with its expression in the inferred future cell state (same direction) and likewise, a negative score indicates disagreement (Fig. 4b and Extended Data Fig. 5a).

In the alpha cells, for example, there are both positively and negatively scoring genes. Genes with a negative score, such as Gcg and Sphkap, were fit correctly by veloVI (alpha cells after pre-endocrine cells in time along the inferred trajectory on the phase portrait), but disagree with the predicted future cell state, suggesting that other genes are outweighing these genes in the transition matrix computation (Extended Data Fig. 5b). Indeed, genes such as Rnf130, Etv1 and Grb10, which had a positive score that agrees with the backflow, seemed to have been fit incorrectly (alpha cells precede pre-endocrine cells along the inferred trajectory on the phase portrait) (Extended Data Fig. 5c). The incorrect fits can putatively be explained by violated model assumptions such as a transcriptional burst in alpha cells (Rnf130), ambiguous phase portraits (Etv1) and multi-kinetics (Grb10).

Conversely, the dynamics in Ngn3-high cells are correctly visualized in the UMAP representation (Fig. 4a). We attribute this result to the presence of many genes agreeing with both the model assumptions and the predicted future state of a cell (Extended Data Fig. 5d). Compared to the 95% percentile of the coherence score in alpha cells, more than twice as many genes ranked above this threshold in the Ngn3-high cluster (135 versus 54); however, even in this case, we found that many genes were fit with incorrect dynamics for this cell type (Extended Data Fig. 5e).

Taken together, these results suggest that the visualization of dynamics on a two-dimensional embedding with previously described procedures is explained by small subsets of genes. Thus, caution is warranted when analyzing projections of velocity estimates onto a two-dimensional embedding of the data. We urge users to investigate the dynamics at the level of individual genes to identify which genes meet the model assumptions. Putative candidates are given by our proposed velocity coherence score. Additionally, to identify genes viable for RNA velocity analysis due to the presence of transient cell populations, we propose a score outlined next.

veloVI identifies insufficiently observed or steady-state dynamics

In datasets with non-differentiating, hierarchically-related cell types, spurious cell state transitions may manifest when applying RNA velocity14,15. Indeed, the underlying transcriptional likelihood model cannot readily distinguish between the case of a transient population and that of multiple steady-state populations. Therefore, we devised a procedure to use a trained veloVI model to identify genes with phase portraits that are consistent with a developmental process versus ones that are consistent with steady-state dynamics or are confounded by noise.

We reasoned that the model fit of genes showing only steady-state dynamics would be robust to a permutation of the data while the model fit of genes with transient populations would worsen. Specifically for every gene, cell type and species (spliced/unspliced) independently, we permuted the abundances of cells in a manner equivalent to shuffling cell barcodes. Subsequently, we passed this perturbed dataset through the veloVI model’s trained encoder and decoder and recorded the absolute error of the fit grouped by genes and cell types. We then used the t-test statistic to compare the mean absolute error in each cell-type-gene group between the perturbed and original dataset (Extended Data Fig. 6 and Methods). We hypothesized that the t-test statistic, capturing the effect of the permutation, would be elevated in transient populations with strong time dependence and, conversely, near-zero in steady-state populations.

In the pancreas dataset, the permutation strongly affects the ductal and Ngn3 low EP cells for the cell-cycle gene Top2a. Indeed, these cell types trace fully-observed induction and repression states for Top2a. In the case of the delta-specific gene Sst, where no such transient connection is observed, for example from ductal to pre-endocrine to delta cells, no single cell type is strongly affected when permuting (Fig. 4c). Consequently, even though Sst is essential for the identity of delta cells, the gene does not display continuous dynamics from ductal progenitor cells and, thus, does not include the necessary information to be analyzed with RNA velocity.

We then applied this procedure to a variety of datasets. In one set of tests, we used datasets describing cellular development. These datasets serve as partial positive controls as we expect directed dynamical processes, as modeled by RNA velocity, to take place in at least a subset of cells in the dataset. As negative controls, we used simulated data of bursty kinetics15 with no overall differentiation of cell state and datasets containing multiple cell types that are in steady-state. To summarize the permutation for a gene, we used the maximum permutation effect t-test statistic across cell types (permutation score). Two clusters of datasets emerged when characterizing the per-gene permutation score distribution (Fig. 4d). One cluster, with a fatter right tail (quantified by skewness and kurtosis), contained positive control datasets such as the pancreas and spermatogenesis. Despite having relatively many genes sensitive to permutation, the datasets of this cluster also contained many genes that were not sensitive, suggesting that there are likely many non-dynamical genes used for downstream analysis with RNA velocity. The other cluster, with less density in the right tail, contained negative controls such as the peripheral blood mononuclear cells (PBMCs), null-data simulation and the prefrontal cortex.

Between these two clusters of datasets, we also found a few ambiguous datasets, such as the mouse retina (positive control) and brain (negative control), which suggests that there exist some cell subsets within these datasets that are affected by the permutation and hence, possibly reflect a directed dynamical process that is appropriate for modeling with RNA velocity; however, upon closer inspection of the brain dataset, we identified mature neurons as responsible for skewing the permutation score density (Extended Data Fig. 7a). The cluster of mature neurons was singled out as it attributes for about one-third of the highest permutation scores (Extended Data Fig. 7b). For the genes with the highest permutation score, these neuronal cells exhibit a bimodal distribution in which one mode has low unspliced and spliced abundance while the other has respectively higher abundances (Extended Data Fig. 7c). Thus, we attribute this skewing to coarse labeling of this population (Extended Data Fig. 7d). When excluding mature neurons from this analysis, the distribution shifted and its key characteristics moved toward the cluster formed by the negative control cases (Extended Data Fig. 7e).

In the accompanying code to this manuscript, we provide these permutation score densities as a resource for users of RNA velocity, which will enable the datasets we analyzed here to serve as references for the score distribution and thus as a systematic approach to measure the overall transient dynamics of a dataset. For example, datasets exhibiting similar permutation score distributions as the given negative control cases (for example, via kurtosis or skew) are not suitable for RNA velocity analysis with current models.

In Supplementary Notes 1 and 2, we provide case studies outlining how veloVI can be used in practice on PBMCs (negative control) and mouse developing dentate gyrus (partial positive control). These demonstrations synthesize veloVI’s uncertainty quantification and permutation procedure along with the velocity coherence. When applying the permutation procedure, we were able to provide further evidence for the lack of transient populations in the case of PBMCs (Supplementary Note 1), as well as identify transient populations of neuroblasts and granule immature cells for many genes in dentate gyrus (Supplementary Note 2). Taken together, these results demonstrate that the permutation score is also useful for identifying cell populations that lack detectable transient dynamics.

veloVI is an extensible framework for dynamical modeling

The transcriptional model assumptions at the level of one gene (for example, constant rates that impose a specific structure of phase portraits) can be shown to be violated in many cases. For example, in the case of transcriptional bursts in which the transcription rate increases with time18 or multiple kinetics within a single gene14, the assumption of constant kinetic rates is violated. Thus, there remains a need for modeling frameworks that are extensible and support varied and more nuanced dynamical assumptions. While veloVI makes many of the same assumptions as in the EM model, it leverages black-box computational and statistical techniques that allow its generative model to be altered to include new assumptions without needing to extensively rewrite inference recipes or generally sacrifice scalability.

To explore veloVI as a general modeling framework, we adapted it to use gene-specific, time-dependent transcription rates. Under this extension, transcription rates are free to monotonically increase or decrease with respect to time14, thus allowing for modeling the acceleration of RNA abundance, which can impact the curvature of the model fit (Methods and Fig. 5a). To infer these additional parameters, only the likelihood function of veloVI needed to be adapted. Applying this modified version of veloVI to the pancreas, dentate gyrus and forebrain datasets, we observed improved fit for the majority of genes (Fig. 5b). In the case of the pancreas dataset, the added flexibility allowed veloVI to better fit genes that seem more linear in their phase portraits, for example, as it can reduce the curvature of the fitted dynamics (Fig. 5c).

Fig. 5: Extension to modeling time-dependent transcription rates.
figure 5

a, Time-dependent transcription rate for different sets of parameter values. b, Log10 MSE ratio of models with constant and time-dependent transcription rate in the case of the pancreas (n = 1,074 genes), dentate gyrus (n = 1,292 genes), forebrain (n = 822 genes) and retina (n = 700 genes). Box plots indicate the median (center line) and interquartile range (hinges). c, Gene phase portrait with inferred dynamics with a constant (solid line) or time-dependent transcription rate (dashed line) (top). Corresponding time-dependent transcription rates are shown in terms of the inferred latent time (bottom).

In the case of Smarca1, the model using a constant transcription rate inferred a downregulation (repression) of alpha cells differentiating into their progenitor populations of pre-endocrine cell and ductal cells (Fig. 5c). Contrastingly, using a time-dependent transcription rate, the upregulation of ductal to pre-endocrine to alpha cells is inferred by the generalized model. Similar observations apply to Atad2 and Cdkn1a. While the constant transcription rate model inferred the correct regulation type for Ppp1r1a, its generalized counterpart captures the underlying dynamics more accurately (Fig. 5c). Overall, for most genes, we observed a decreasing transcription rate over time (Supplementary Fig. 8).

Altogether, this exemplary model extension demonstrates the flexibility of veloVI’s modeling approach. The flexibility allows us to quickly prototype extensions and infer additional parameters within a single, consistent framework. We, thus, expect future models to benefit from such flexibility.

Discussion

Here, we reformulated the estimation of RNA velocity in a variational inference framework with veloVI. Our method compares favorably to previously proposed methods8,11 and adds actionable metrics into downstream data analyses at the cell level via uncertainty quantification and at the level of a gene and dataset with the permutation score. We believe that veloVI will facilitate more systematic analyses with RNA velocity and help reduce the strong reliance on prior knowledge to guide whether results are sensible. As an example, our permutation score could be used to filter genes that are considered for further analysis. We also note that related work has very recently incorporated deep learning with RNA velocity and we review these methods and compare them to veloVI in Supplementary Note 3.

We view this formulation of modeling transcriptional dynamics with probabilistic models and deep learning as a step toward a more rigorous pipeline that faithfully captures the biophysical phenomenon of RNA metabolism. In this work, we relied on previously described data processing approaches that smooth unspliced/spliced abundances across nearest neighbors before velocity estimation. We also borrowed many assumptions from the EM model, including, for example, the lack of explicit support for multiple diverging lineages that would result in genes reflecting a superposition of dynamical signals.

In contrast to previous models, veloVI is built in an extensible way using the scvi-tools framework21. As a proof of concept, we demonstrated that veloVI could be easily extended to use time-dependent transcription rates, which improved model fit for many genes. We anticipate that the veloVI framework will be further adapted to overcome other computational challenges including estimating velocity while accounting for batch effects, using multimodal technologies with measurements that span biology’s central dogma41,42 and directly modeling the unspliced and spliced RNA counts with count-based likelihoods. Furthermore, while veloVI’s estimated velocities are relative to a given maximum time of the process (similar as for the EM model), they are no longer relative with respect to the splicing rate as in the steady-state model. In future iterations, we anticipate including prior information from metabolic labeling data to estimate absolute velocities. We discuss these challenges, other considerations and future opportunities in Supplementary Note 4.

A philosophical challenge with RNA velocity relates to the notion that models should use bottom-up mechanistic approaches while also being general enough to be applied across a variety of biological systems, each with their own caveats and unique dynamics. In this work, we use a low-dimensional representation of a cell’s phenotypic state to capture multiple biological processes (for example, differentiation and cell cycle). More complex models likely need prior information, such as known experimental time points or cell type lineages to solve issues of statistical identifiability that arise in these more general modeling scenarios; however, incorporating such priors can contradict the usage of RNA velocity as a de novo discovery tool for the trajectory inference task. Despite all these outlined challenges, we envision that veloVI will facilitate applications of RNA velocity via uncertainty-aware analysis as well as easier model prototyping, benefiting both users and method developers.

Methods

veloVI model specification

We begin with the formulation of the ‘dynamical’ model of RNA velocity as presented by ref. 11. We posit transcriptional states k {1, 2, 3, 4}, where k = 1 indicates induction, k = 2 indicates the induction steady state, k = 3 indicates repression and k = 4 indicates the repression steady state.

Let αgk be the gene-state-specific reaction rate of transcription. Let βg be the gene-specific splicing rate constant and let γg be the gene-specific degradation rate constant. Each gene has a switching time \({t}_{g}^{s}\) when the system switches from induction phase to repression phase.

Given the solution to the ordinary differential equations11, the unspliced transcript abundance at time tng for cell n and gene g is defined as

$${\bar{u}}^{(g)}({t}_{ng},k):= {u}_{gk}^{0}{e}^{-{\beta }_{g}({t}_{ng}-{t}_{gk}^{0})}+\frac{{\alpha }_{gk}}{{\beta }_{g}}\left(1-{e}^{-{\beta }_{g}({t}_{ng}-{t}_{gk}^{0})}\right),$$
(1)

where \({t}_{gk}^{0}\) is the initial time of the system in state k. The spliced transcript abundance is defined as

$$\begin{array}{rcl}{\bar{s}}^{(g)}({t}_{ng},k)&:= &{s}_{gk}^{0}{e}^{-{\gamma }_{g}\tau }+\frac{{\alpha }_{gk}}{{\gamma }_{g}}\left(1-{e}^{-{\gamma }_{g}({t}_{ng}-{t}_{gk}^{0})}\right)\\ &&+\frac{{\alpha }_{gk}-{\beta }_{g}{u}_{gk}^{0}}{{\gamma }_{g}-{\beta }_{g}}\left({e}^{-{\gamma }_{g}({t}_{ng}-{t}_{gk}^{0})}-{e}^{-{\beta }_{g}({t}_{ng}-{t}_{gk}^{0})}\right).\end{array}$$
(2)

Induction state

For the induction state, k = 1, we have \({u}_{g1}^{0}=0\), \({s}_{g1}^{0}=0\), αg1 > 0 and \({t}_{g1}^{0}=0\). Thus, the unspliced transcript abundance can then be expressed as

$${\bar{u}}^{(g)}({t}_{ng},k=1):= \frac{{\alpha }_{g1}}{{\beta }_{g}}\left(1-{e}^{-{\beta }_{g}{t}_{ng}}\right).$$
(3)

Likewise, the spliced transcript abundance can be simplified to

$${\bar{s}}^{(g)}({t}_{ng},k=1):= \frac{{\alpha }_{g1}}{{\gamma }_{g}}\left(1-{e}^{-{\gamma }_{g}{t}_{ng}}\right)+\frac{{\alpha }_{g1}}{{\gamma }_{g}-{\beta }_{g}}\left({e}^{-{\gamma }_{g}{t}_{ng}}-{e}^{-{\beta }_{g}{t}_{ng}}\right).$$
(4)

Induction steady state

For the induction steady state, k = 2, the unspliced and spliced transcript abundances are defined as limits of the system:

$${\bar{u}}^{(g)}({t}_{ng},k=1):= \mathop{\lim }\limits_{{t}_{ng}\to \infty }{\bar{u}}^{(g)}({t}_{ng},k=1)=\frac{{\alpha }_{g1}}{{\beta }_{g}}$$
(5)
$${\bar{s}}^{(g)}({t}_{ng},k=2):= \mathop{\lim }\limits_{{t}_{ng}\to \infty }{\bar{s}}^{(g)}({t}_{sg},k=1)=\frac{{\alpha }_{g1}}{{\gamma }_{g}}.$$
(6)

Repression state

For the repression state, k = 3, we have αg3 = 0 and \({t}_{g3}^{0}={t}_{g}^{\,s}\). Thus, the number of unspliced transcripts can then be expressed as

$${\bar{u}}^{(g)}({t}_{ng},k=3):= {u}_{g3}^{0}{e}^{-{\beta }_{g}\left({t}_{ng}-{t}_{g3}^{0}\right)}.$$
(7)

Likewise, the number of spliced transcripts can be simplified to

$${\bar{s}}^{(g)}({t}_{ng},k=3):= {s}_{g3}^{0}{e}^{-{\gamma }_{g}\left({t}_{ng}-{t}_{g3}^{0}\right)}-\frac{{\beta }_{g}{u}_{g3}^{0}}{{\gamma }_{g}-{\beta }_{g}}\left({e}^{-{\gamma }_{g}\tau }-{e}^{-{\beta }_{g}({t}_{ng}-{t}_{g3}^{0})}\right).$$
(8)

The initial conditions, \({u}_{g3}^{0}\) and \({s}_{g3}^{0}\) are defined by the induction model at the switching time \({t}_{g}^{s}\), such that

$${u}_{g3}^{0}={\bar{u}}^{(g)}({t}_{sg},k=2)$$
(9)
$${s}_{g3}^{0}={\bar{s}}^{(g)}({t}_{sg},k=2).$$
(10)

Repression steady state

For the repression steady state, the limit upon which tng → , there is no expression, so we have

$${\bar{u}}^{(g)}({t}_{ng},k=4):= 0$$
(11)
$${\bar{s}}^{(g)}({t}_{ng},k=4):= 0.$$
(12)

Model assumptions

As in ref. 11, this model assumes that for one gene, at the initial time of the system, cells are first in induction phase in which both spliced and unspliced expression increases. Then cells potentially reach a steady state of this induction state. Next at some future time \({t}_{g}^{s}\) the system switches to repression state. Finally, the repression reaches a steady state in which there is no expression. Further assumptions are necessary to identify the dynamical model parameters44; thus, we assume that each gene is on the same time scale (precisely each gene has a maximum time of t = 20 as shown previously11).

veloVI generative process

We posit a generative process that takes into account the underlying dynamics of the system. Compared to Bergen et al.11, the model here does not treat each gene independently; instead, the latent time and states for each (cell and gene) pair are tied together via a local low-dimensional latent variable.

For each cell we draw a low-dimensional (d = 10 dimensions throughout this manuscript) latent variable

$${z}_{n} \sim {{{\rm{Normal}}}}(0,{I}_{d})$$
(13)

that summarizes the latent state of each cell. Next, for each gene g in cell n we draw the distribution over the state assignments as well as the state assignment itself

$${\pi }_{ng} \sim {{{\rm{Dirichlet}}}}(0.25,0.25,0.25,0.25)$$
(14)
$${k}_{ng} \sim {{{\rm{Categorical}}}}({\pi }_{ng})$$
(15)

Here πng is sampled from a Dirichlet distribution, which has the support of the probability simplex. In other words, the Dirichlet provides a distribution over discrete probability distributions. If kng = 1 (induction), then the time is a function of zn,

$${\rho }_{ng}^{(1)}={[{h}_{{{{\rm{ind}}}}}({z}_{n})]}_{g}$$
(16)
$${t}_{ng}^{(1)}={\rho }_{ng}^{(1)}{t}_{g}^{\,s}$$
(17)

where \({h}_{{{{\rm{ind}}}}}:{{\mathbb{R}}}^{d}\to {(0,1)}^{G}\) is parameterized as a fully connected neural network. Notably, this parameterization results in an induction-specific time that is constrained to be less than the switching time.

Else, if kng = 3 (repression),

$${\rho }_{ng}^{(3)}={[{h}_{{{{\rm{rep}}}}}({z}_{n})]}_{g}$$
(18)
$${t}_{ng}^{(3)}=\Big({t}_{\max}-{t}_{g}^{\,s}\Big){\rho }_{ng}^{(3)}+{t}_{g}^{\,s}$$
(19)

where \({t}_{\max }:= 20\) is used to fix the time scale across genes and identify the rate parameters of the model. Similarly to the previously defined function, \({h}_{{{{\rm{rep}}}}}:{{\mathbb{R}}}^{d}\to {(0,1)}^{G}\) and is also a neural network.

We also consider two potential steady states. If kng = 2 (induction steady state) or if kng = 4 (repression steady state), we consider the limit as time approaches , which is described in the previous section.

Finally, the observed data are sampled from normal distributions as

$${u}_{ng} \sim {{{\rm{Normal}}}}\left({\bar{u}}^{(g)}({t}_{ng}^{({k}_{ng})},{k}_{ng}),{({c}_{k}{\sigma }_{g}^{\,u})}^{2}\right)$$
(20)
$${s}_{ng} \sim {{{\rm{Normal}}}}\left({\bar{s}}^{(g)}({t}_{ng}^{({k}_{ng})},{k}_{ng}),{({c}_{k}{\sigma }_{g}^{\,s})}^{2}\right)$$
(21)

For veloVI, we consider the observed data \({\{({s}_{n},{u}_{n})\}}_{n = 1}^{N}\) to be the nearest-neighbor smoothed expression data that is also used as input to scVelo as well as velocyto. In addition, we assume the data have been preprocessed such that for each gene, the smoothed spliced and unspliced abundances are independently min-max scaled into [0, 1]. By using the normal distribution, we assume that the smoothed expression (which represents an average of random variables) has a sampling distribution centered on some mean value and that this sampling distribution is approximately normal; however, the flexibility of this modeling framework will enable extensions that consider the discrete nature of unique molecular identifiers used in standard scRNA-seq assays.

We include a state-dependent scaling factor on the variance. For all experiments in this manuscript, we used ck = 1 except for the repression steady state in which c4 = 0.1. This hyperparameter choice forces the variance of abundance in the repression steady state to be less than that of other transcriptional states, which reflects the notion that the repression steady state corresponds to zero transcriptional activity. Despite the assumption of zero transcriptional activity, the normal distribution here captures noise that arises during the experimental process (ambient transcripts) as well as during preprocessing (for example, KNN smoothing). Finally, in the following, let θ be the set of parameters of the generative process (α, β, γ, ts and neural network parameters).

veloVI inference procedure

We seek the following: (1) point estimates of the transcription rate, degradation and splicing rate constants and the switching time point; (2) point estimates of the parameters of the neural networks; and (3), a posterior distribution over the latent variables, which in this case includes z and π. Noting that the model evidence pθ(u, s) cannot be computed in closed form, we use variational inference45 to approximate the posterior distribution as well as accomplish the other tasks. Following inference, velocity can be calculated as a functional of the variational posterior distribution.

Variational posterior

We posit the following factorization on the approximate posterior distribution

$${q}_{\phi }(z,\pi \,| \,u,s):= \mathop{\prod }\limits_{n}^{N}{q}_{\phi }({z}_{n}\,| \,{u}_{n},{s}_{n})\mathop{\prod }\limits_{g}^{G}{q}_{\phi }({\pi }_{ng}\,| \,{z}_{n}),$$
(22)

in which dependencies are specified using neural networks with parameter set ϕ. Here z factorizes over all n cells and πng over all n cells and g genes.

For the likelihoods, we integrate over the choice of transcriptional state kng, such that the likelihoods for unspliced and spliced transcript abundances,

$${p}_{\theta }({u}_{ng}\,| \,{z}_{n},{\pi }_{n})=\mathop{\sum}\limits_{{k}_{ng}\in \{1,2,3,4\}}{\pi }_{ng{k}_{ng}}{{{\rm{Normal}}}}\left({\bar{u}}^{(g)}({t}_{ng}^{({k}_{ng})},{k}_{ng}),{({c}_{k}{\sigma }_{g}^{\,u})}^{2}\right)$$
(23)
$${p}_{\theta }({s}_{ng}\,| \,{z}_{n},{\pi }_{n})=\mathop{\sum}\limits_{{k}_{ng}\in \{1,2,3,4\}}{\pi }_{ng{k}_{ng}}{{{\rm{Normal}}}}\left({\bar{s}}^{(g)}({t}_{ng}^{({k}_{ng})},{k}_{ng}),{({c}_{k}{\sigma }_{g}^{\,s})}^{2}\right)$$
(24)

are mixtures of normal distributions.

Objective

The objective that is minimized during inference is composed of two terms

$${{{{\mathcal{L}}}}}_{{{{\rm{velo}}}}}(\theta ,\phi ;u,s)={{{{\mathcal{L}}}}}_{{{{\rm{elbo}}}}}(\theta ,\phi ;u,s)+\lambda {{{{\mathcal{L}}}}}_{{{{\rm{switch}}}}}(\theta ;u,s),$$
(25)

where \({{{{\mathcal{L}}}}}_{{{{\rm{elbo}}}}}\) is the negative evidence lower bound45 of \(\log {p}_{\theta }(u,s)\) and \({{{{\mathcal{L}}}}}_{{{{\rm{switch}}}}}\) is an additional penalty that regularizes the location of the transcriptional switch in the phase portrait. In more detail,

$$\begin{array}{l}{{{{\mathcal{L}}}}}_{{{{\rm{elbo}}}}}(\theta ,\phi ;u,s)=\mathop{\sum}\limits_{n}-{{\mathbb{E}}}_{{q}_{\phi }({z}_{n},{\pi }_{n}| {u}_{n},{s}_{n})}\left[\log {p}_{\theta }({u}_{n},{s}_{n}\,| \,{z}_{n},{\pi }_{n})\right] \\ \qquad\qquad\qquad\quad +{{{\rm{KL}}}}\left({q}_{\phi }({z}_{n}\,| \,{u}_{n},{s}_{n})\ \parallel \ p(z)\right)\\ \qquad\qquad\qquad\quad +{{\mathbb{E}}}_{{q}_{\phi }({z}_{n}| {u}_{n},{s}_{n})}\left[\mathop{\sum}\limits_{g}{{{\rm{KL}}}}\left({q}_{\phi }({\pi }_{ng}\,| \,{z}_{n})\ \parallel \ p({\pi }_{ng})\right)\right],\end{array}$$
(26)

which can be estimated using minibatches of data. In particular, we use randomly sampled minibatches of 256 cells for inference. For the penalty term \({{{{\mathcal{L}}}}}_{{{{\rm{switch}}}}}\), we start by only considering cells that are above the 99th percentile of unspliced abundance for each gene. Using these cells we compute the median unspliced and spliced abundance for each gene separately. Let u* and s* be the outcome of this procedure, then

$${{{{\mathcal{L}}}}}_{{{{\rm{switch}}}}}(\theta ;u,s)=\mathop{\sum}\limits_{g}{\left({u}_{g3}^{0}-{u}_{g}^{* }\right)}^{2}+{\left({s}_{g3}^{0}-{s}_{g}^{* }\right)}^{2},$$
(27)

where \({u}_{g3}^{0}\) and \({s}_{g3}^{0}\) were defined as the initial conditions of the repression phase at the switch time \({t}_{g}^{s}\).

Initialization

We initialize αg1 to be equal to the median unspliced abundance for the cells above the 99th percentile for each gene. The other global parameters, including the splicing, degradation and switch time are initialized to a constant value shared by all genes. All neural network initialization uses the default implementation in PyTorch.

Optimization

To optimize \({{{{\mathcal{L}}}}}_{{{{\rm{velo}}}}}\) we use stochastic gradients26 along with the Adam optimizer with weight decay46 as implemented in PyTorch47. For all experiments we use λ = 0.2 for scaling the regularization term in the loss. As a result of minibatching, veloVI’s memory usage is constant throughout training. Unless otherwise specified, all neural networks are fully connected feedforward networks that use standard activation functions such as ReLU for hidden layers and softplus or exponential for parameterizing non-negative distributional parameters.

Architecture

An overview of the veloVI architecture is shown in Supplementary Fig. 9.

Downstream tasks

Fitted abundance values

The fitted values (used, for example, in MSE benchmarks) for unspliced and spliced abundance are the posterior predictive mean:

$${{\mathbb{E}}}_{p({u}_{n}^{* }| {u}_{n},{s}_{n})}\left[{u}_{n}^{* }\right],\,{{\mathbb{E}}}_{p({s}_{n}^{* }| {u}_{n},{s}_{n})}\left[{s}_{n}^{* }\right],$$

where \({u}_{n}^{* }\) and \({s}_{n}^{* }\) are unobserved random variables representing posterior predictive values of unspliced and spliced abundances for cell n. The posterior predictive in the case of unspliced abundance is defined as

$$p({u}_{n}^{* }\,| \,{u}_{n},{s}_{n})=\int\,{p}_{\theta }({u}_{n}^{* }\,| \,{z}_{n},{\pi }_{n}){q}_{\phi }({z}_{n},{\pi }_{n}\,| \,{u}_{n},{s}_{n})d{\pi }_{n}d{z}_{n},$$
(28)

which uses the variational posterior distribution as a plug-in estimator for the true (unknown) posterior distribution.

We compare these fitted abundance values from veloVI to the analog of the EM model, which itself can be interpreted as a posterior predictive mean. Considering just the unspliced values, for example, the EM model posits a normal likelihood p(ungtng, kng) similar to veloVI but without the latent cell state zn and learns posterior distributions q(tngung, sng) and q(kngung, sng). Under the EM model, the posterior distributions are Dirac delta distributions and the corresponding posterior predictive is expressed as

$$p({u}_{ng}^{* }\,| \,{u}_{ng},{s}_{ng})=\int\,{p}_{\theta }({u}_{ng}^{* }\,| \,{t}_{ng},{k}_{ng}){q}_{\phi }({t}_{ng},{k}_{ng}\,| \,{u}_{ng},{s}_{ng})d{k}_{ng}d{t}_{ng}.$$
(29)

State assignment

The state assignment for each gene and cell is the approximate posterior mean

$${{\mathbb{E}}}_{{q}_{\phi }({z}_{n}| {u}_{n},{s}_{n})}\left[{{\mathbb{E}}}_{{q}_{\phi }({\pi }_{ng}| {z}_{n})}[{\pi }_{ng}]\right].$$

Gene-wise latent time

The latent time is computed for each gene and cell as

$${{\mathbb{E}}}_{{q}_{\phi }({z}_{n}| {u}_{n},{s}_{n})}\left[{{\mathbb{E}}}_{{q}_{\phi }({\pi }_{ng}| {z}_{n})}\left[{t}_{ng}^{({k}_{ng})}\right]\right],$$

where the outer expectation with respect to qϕ(znun, sn) is estimated with Monte Carlo samples, while the inner expectation is computed analytically over the transcriptional states kng.

RNA velocity

The velocity of a particular gene in a particular cell is similarly a function of the variational posterior. Recall that the velocity is computed as

$${v}^{(g)}({t}^{(k)},k):= {\left.\frac{d{\bar{s}}^{(g)}(t,k)}{dt}\right\vert }_{{t}^{(k)}}={\beta }_{g}{\bar{u}}^{(g)}({t}^{(k)},k)-{\gamma }_{g}{\bar{s}}^{(g)}({t}^{(k)},k).$$

Thus, we can compute samples of a posterior predictive velocity distribution via the following process

  1. 1.

    Sample zn from qϕ(znun, sn).

  2. 2.

    Compute \({{\mathbb{E}}}_{{q}_{\phi }({\pi }_{ng}| {z}_{n})}\left[{v}^{(g)}\left({t}_{ng}^{({k}_{ng})},{k}_{ng}\right)\right]\) for each gene.

This provides samples from a distribution over the velocity for every gene–cell pair, which we then use in downstream tasks.

Intrinsic uncertainty

Let \({\bar{v}}_{n}\) be the posterior predictive velocity mean from the procedure above. The intrinsic uncertainty is then computed as \({\mathbb{V}}{{{{\rm{ar}}}}}_{{q}_{\phi }({v}_{n}| {u}_{n},{s}_{n})}[c({v}_{n},{\bar{v}}_{n})]\) where c denotes the cosine similarity. In effect, denote by \(\{{v}_{n}^{(l)}\}_{l = 1}^{L}\) the set of L velocity vector samples of cell n from the variational posterior. Then we have:

$${\hat{\sigma}}_{n}^{2}=\frac{1}{L-1}\mathop{\sum}\limits_{l=1}^{L}{\left(\frac{{v}_{n}^{(l)}\cdot {\bar{v}}_{n}}{\parallel {v}_{n}^{(l)}\parallel \parallel {\bar{v}}_{n}\parallel}-\frac{1}{L}\mathop{\sum}\limits_{j = 1}^{L}\frac{{v}_{n}^{(j)}\cdot {\bar{v}}_{n}}{\parallel {v}_{n}^{(j)}\parallel \parallel {\bar{v}}_{n}\parallel }\right)}^{2}.$$
(30)

In this manuscript, we use L = 100 samples.

Extrinsic uncertainty

Let T(v1:N, s1:N) be a function that maps the velocity vectors and spliced abundances of the entire dataset (with n cells) to a cell–cell transition matrix computed as described previously11. Namely, this function compares the similarity of the displacement δij of nearest neighbors si and sj (defined using s1:N) to the velocity of cell i, vi, via the cosine similarity

$$\cos ({\delta }_{ij},{v}_{i})=\frac{{\delta }_{ij}^{T}{v}_{i}}{\parallel {\delta }_{ij}\parallel \parallel {v}_{i}\parallel }$$
(31)

as the basis for computing transition probabilities between pairs of cells.

Following the construction of T(v1:N, s1:N) for one sample of velocity, the predicted future cell state is computed by the matrix multiplication T(v1:N, s1:N)S, where S is the cells by genes matrix of spliced RNA abundances. These predicted future cell state vectors (over samples of velocity) then undergo the same variance computation procedure as described for the intrinsic uncertainty (namely, variance of the cosine similarity).

Time-dependent transcription rate

To highlight veloVI’s extensibility with respect to model choice, we consider the time-dependent transcription rate

$${\alpha }^{(k)}(t)=\left\{\begin{array}{ll}{\alpha }_{1}-({\alpha }_{1}-{\alpha }_{0}){e}^{-{\lambda }_{\alpha }t},\quad &k\in \{1,2\},\\ 0,\quad &k\in \{3,4\},\end{array}\right.$$
(32)

with parameters \({\alpha }_{0},{\alpha }_{1},{\lambda }_{\alpha }\in {{\mathbb{R}}}^{+}\) and k indicating the transcriptional state. The system of differential equations describing the process of splicing stays otherwise unchanged and is, thus, given by

$$\begin{array}{rcl}\dot{u}&=&{\alpha }^{(k)}(t)-\beta u\\ \dot{s}&=&\beta u-\gamma s.\end{array}$$
(33)

Consequently, it is of the general form

$$\dot{x}=Ax+g(t),$$
(34)

with dependent variable x, system matrix A, inhomogeneity g(t) and solution

$$x(t)={x}_{0}{e}^{\,A(t-{t}_{0})}+{e}^{\,At}\int\nolimits_{{t}_{0}}^{t}{e}^{-As}g(s)ds.$$
(35)

As the abundance of unspliced mRNA is modeled independently of its spliced counterpart, its solution of equation (33) can be found directly. Comparing equation (33) with equations (34) and (35), we find that x = u, A = − β, g(t) = α(k)(t). Consequently, the abundance of unspliced mRNA at time t is given by

$$\begin{array}{rcl}u(t)&=&{u}_{0}^{(k)}{e}^{-\beta {\tau }^{(k)}}+{\alpha }_{1}^{(k)}{e}^{-\beta t}\displaystyle\int\nolimits_{{t}_{0}^{(k)}}^{t}{e}^{\beta s}ds-\left({\alpha }_{1}^{(k)}-{\alpha }_{0}^{(k)}\right){e}^{-\beta t}\displaystyle\int\nolimits_{{t}_{0}^{(k)}}^{t}{e}^{\beta s}{e}^{-{\lambda }_{\alpha }^{(k)}s}ds\\ &=&{u}_{0}^{(k)}{e}^{-\beta {\tau }^{(k)}}+\frac{{\alpha }_{1}^{(k)}}{\beta }\left(1-{e}^{-\beta {\tau }^{(k)}}\right)\\ &&-\frac{{\alpha }_{1}^{(k)}-{\alpha }_{0}^{(k)}}{\beta -{\lambda }_{\alpha }^{(k)}}{e}^{-{\lambda }_{\alpha }^{(k)}{t}_{0}^{(k)}}\left({e}^{-{\lambda }_{\alpha }^{(k)}{\tau }^{(k)}}-{e}^{-\beta {\tau }^{(k)}}\right),\end{array}$$
(36)

with state-dependent initial time \({t}_{0}^{(k)}\), \({\tau }^{(k)}=t-{t}_{0}^{(k)}\) and \({u}_{0}^{(k)}=u({t}_{0}^{(k)})\).

Similarly, this allows solving for s(t), with x = s, A = − γ, g(t) = βu(t). Applying solution formula (35), the abundance of spliced mRNA at time t is given by

$$\begin{array}{rcl}s(t)&=&{s}_{0}^{(k)}{e}^{-\gamma {\tau }^{(k)}}+{e}^{-\gamma t}\displaystyle\int\nolimits_{{t}_{0}^{(k)}}^{t}{e}^{\gamma {t}^{{\prime} }}\beta u({t}^{{\prime} })d{t}^{{\prime} }\\ &=&{s}_{0}^{(k)}{e}^{-\gamma {\tau }^{(k)}}+\frac{{\alpha }_{1}^{(k)}}{\gamma }\left(1-{e}^{-\gamma {\tau }^{(k)}}\right)+\frac{{\alpha }_{1}^{(k)}-\beta {u}_{0}^{(k)}}{\gamma -\beta }\left({e}^{-\gamma {\tau }^{(k)}}-{e}^{-\beta {\tau }^{(k)}}\right)\\ &&-\frac{\beta \left({\alpha }_{1}^{(k)}-{\alpha }_{0}^{(k)}\right)}{\left(\beta -{\lambda }_{\alpha }^{(k)}\right)\left(\gamma -{\lambda }_{\alpha }^{(k)}\right)}{e}^{-{\lambda }_{\alpha }^{(k)}{t}_{0}^{(k)}}\left({e}^{-{\lambda }_{\alpha }^{(k)}{\tau }^{(k)}}-{e}^{-\gamma {\tau }^{(k)}}\right)\\ &&+\frac{\beta \left({\alpha }_{1}^{(k)}-{\alpha }_{0}^{(k)}\right)}{\left(\beta -{\lambda }_{\alpha }^{(k)}\right)(\gamma -\beta )}{e}^{-{\lambda }_{\alpha }^{(k)}{t}_{0}^{(k)}}\left({e}^{-\beta {\tau }^{(k)}}-{e}^{-\gamma {\tau }^{(k)}}\right),\end{array}$$
(37)

These new functions can be used as the mean in the veloVI likelihood, thus allowing optimization in a similar manner as described previously, with the addition of the new parameters \({\alpha }_{0},{\alpha }_{1},{\lambda }_{\alpha }\in {{\mathbb{R}}}^{+}\).

Data preprocessing

All datasets were pre-processed following the same steps. Genes with fewer than 20 unspliced or spliced counts were removed. Transcriptomic counts of each cell were normalized by their median, pre-filtered library size and the 2,000 most highly variable genes selected based on dispersion. The aforementioned steps are performed using scVelo’s11filter_and_normalize function.

Following gene filtering and count normalization, the first 30 principal components were calculated and a nearest neighbor graph with k = 30 neighbors was constructed. In a final step, counts were smoothed by the mean expression across their neighbors to compute final RNA abundances. These steps were performed by scVelo’s moments function.

To estimate RNA velocity, the preprocessed unspliced and spliced abundances were (gene-wise) min−max scaled to the unit interval. Following, the steady-state model was applied to the entire dataset. Genes for which the estimated steady-state ratio and R2 statistic are positive were considered for further analysis. If not stated otherwise, this subset of genes was used for parameter inference of veloVI and the EM model.

All datasets used, with the exception of the PBMC dataset, were obtained with spliced and unspliced RNA quantification and details can be obtained from the original publication (Supplementary Table 1). In the case of the PBMC dataset, we quantified RNA abundances using the kallisto bustools RNA velocity workflow28, using an index and defaults as described in the tutorial on the software’s website and automatically annotated via totalVI48 using the Seurat v.3 CITE-seq PBMC dataset49,50 as a reference.

Benchmarking against EM and steady-state models

VeloVI was benchmarked against the EM and steady-state model by first comparing the accuracy of inferred parameters on simulated data. For each number of observations (1,000, 2,000, 3,000, 4,000 and 5,000), we simulated ten datasets of unspliced and spliced counts with 1,000 kinetic parameter tuples (transcription rate αg, splicing rate βg, degradation rate γg) following a multivariate log-normal distribution. Latent time is Poisson distributed with a maximum of 20 h with the switch from induction to transcription, \({t}_{g}^{s}\), taking place after 2–10 h. The simulations were performed using the simulation function as implemented in scVelo11 with noise_level=0.8.

As an additional validation, we inferred kinetic rates for the pancreas data using both veloVI and the EM model. Following, we randomly sampled overall 2,000 estimated parameter tuples (transcription rate αg, splicing rate βg, degradation rate γg, switch time \({t}_{g}^{s}\)) from the union of the parameters estimated by either algorithm and simulated splicing kinetics with noise_level=1. As the data are simulated and rate parameters and time are known, the ground truth velocities are defined as well. For each model, the Spearman correlations between ground truth and inferred latent time were compared. We used the Spearman correlation as it is an order statistic. Contrastingly, in the case of velocity estimates, we relied on Pearson correlation.

To compare the runtimes of veloVI and EM model were run on random subsets a mouse retina dataset12 containing 1,000, 3,000, 5,000, 7,500, 10,000, 15,000 and 20,000 cells. The EM model was run on an Intel(R) Core i9-10900K CPU @ 3.70GHz CPU using eight cores. VeloVI was run on an Nvidia RTX3090 GPU.

In the case of real-world data, for each gene, we compared the MSE between the observed abundance and the model-predicted abundance. We did this for each of the veloVI and EM models and separately for spliced and unspliced abundances. The result is the MSE per gene, per method and per species. In the case of the EM model, the abundance prediction is directly a function of the rates, time and transcriptional state and in the case of veloVI, this is the posterior predictive mean. Additionally, for each gene, velocity estimates from the veloVI and EM models were compared through Pearson correlation.

In addition to the MSE, the model-specific velocity consistency11 was also compared. The velocity consistency c quantifies the mean Pearson correlation of the velocity v(xj) of a reference cell xj with the velocities of its neighbors \({{{{\mathcal{N}}}}}_{k}({x}_{j})\) in a KNN graph.

$$c=\frac{1}{k}\mathop{\sum}\limits_{x\in {{{{\mathcal{N}}}}}_{k}({x}_{j})}\,{{\mbox{corr}}}\,(v({x}_{j}),v(x))$$

To calculate the consistency, we rely on scVelo’s velocity_confidence function. This evaluation metric makes the assumption that better local consistency is inherently good, reflecting smooth changes in velocity over the phenotypic manifold. We note that this is a heuristic evaluation and the validity of this metric can be affected by, for example, low density of similar cell states, misspecification of the KNN graph due to only considering spliced RNA, etc.

If a ‘ground truth’ cellular ordering, for example, a cell-cycle score13,29, is given, we can make use of this source of information to estimate ‘ground truth’ velocities \(\hat{v}\) via finite differences. We estimated this heuristic by first taking the median per gene of the first-order moment smoothed, spliced RNA abundance of all cells at a given cell-cycle position pi, which we denote by \({\bar{s}}^{(i)}\). Then, assuming the pi are ordered (pi < pi+1), \({\hat{v}}^{(i)}\) is defined as

$${\hat{v}}^{(i)}\propto {\bar{s}}^{(i+1)}-{\bar{s}}^{(i)}$$
(38)

Finally, we compared the sign of all ground truth velocities with their inferred counterparts of veloVI and the EM model (which are aggregated per position in the same way) by computing the sign accuracy per gene. The sign accuracy, which is the fraction of times that the signs agree, accounts for positive velocity, negative velocity and zero velocity. As a baseline, we included a random predictor that chose positive, negative or zero velocity with equal probability. The scEU-seq cell-cycle data (RPE1-FUCCI cells)29 included, on average, 9.63 (s.d. 7.01) observations per cell cycle position and the U2OS-FUCCI13 dataset provided 1.15 (s.d. 0.36) observations per cell cycle position. In the case of the U2OS-FUCCI dataset, the ground truth ordering was derived by the original authors using a polar regression on the scatter-plot of the two FUCCI protein markers. In the case of the RPE1-FUCCI cells, the ground-truth ordering was derived by the original authors using a pseudotime method on the FUCCI protein marker values.

As an additional validation, for each gene, we fitted a GAM to the inferred velocities of the two models versus the cell-cycle score. Similarly to ref. 13, we transformed the cell-cycle score in each dataset to I = [0, 2π]. To take the periodic nature of the cell cycle into account, we fitted the GAM per gene using spliced RNA abundance sng as the response and the score as the variable, where the cell-cycle score was transformed to the range [I − 2π, I, 2π]. For each gene, a GAM with a univariate spline term for the triple of (shifted) cell-cycle positions was fitted. For each feature, 20 splines of degree three were used. For each gene, we reported the R2 score.

Stability analysis across quantification algorithms

To assess the robustness of estimation using different means of quantifying unspliced and spliced reads, we relied on previously preprocessed and published data34. The collection contains outputs of variants of the alevin32, kallisto/bustools28 velocyto8, dropEST31 and starsolo51 pipelines. For details of how the data were generated, we refer to the original work34.

To compare estimation across quantification algorithms, we first defined a reference set of genes for which to calculate RNA velocity. The set of reference genes was defined as the set of genes kept by preprocessing the data of one quantification method. In the case of the dentate gyrus data, starsolo was chosen for the quantification method, for all others velocyto. Data were pre-processed according to our described pre-processing pipeline. Counts from all other quantification approaches the same pre-processing steps were followed, except for gene filtering. To prevent the reference genes from being filtered out, they are passed to the filter_and_normalize function via the argument retain_genes.

Velocities were estimated for the steady-state model, EM model and veloVI. The velocities of the first two models were quantified using the function velocity with mode=‘deterministic’ and mode=‘dynamics’, respectively, implemented in scVelo11. For veloVI, model parameters were inferred using default parameters and mean velocities estimated from 25 samples drawn from the posterior.

To compare estimates across quantification algorithms, for each model, cell and pair of quantification algorithms, the Pearson correlation between the paired velocity estimates, was calculated. For each model, the correlation scores were aggregated by taking the mean over cells for one quantification algorithm pair to assess robustness. The distribution of this mean correlation over all quantification algorithm pairs is used for visualization in Fig. 4 and Extended Data Fig. 2.

Analysis with uncertainty quantification and velocity coherence

We used extrapolated future states Tsn of a cell to evaluate if inferred velocities are coherent. The velocity vn of a given cell n is coherent if it points in the same direction as the empirical displacement δn = Tsn − sn. Directionality is compared by calculating the Hadamard product δnvn. In case both vectors point in the same direction for a given cell, the resulting entry will be positive and negative otherwise. To aggregate the score we report its mean per gene and cell type.

To benchmark the uncertainty quantification, we started with the pancreas dataset and added one of three kinds of perturbations at various strengths. After applying each perturbation, we ran the standard veloVI pipeline and recorded the uncertainty metrics. The first perturbation consisted of downsampling the cells to X% of their original library size (thus removing (1 − X)% of their transcripts; and for unspliced and spliced separately). This was achieved with scanpy.pp.downsample_counts. The second perturbation consisted of binomial thinning of the unspliced counts with probability P (unspliced = np.random.binomial(unspliced, 1 − P)). The final perturbation was multiplicative random noise. To each spliced and unspliced abundance value (this time after library size normalization) we multiplied the value with lognormally distributed noise (np.exp(np.random.normal(0, scale))). Across all perturbations we used a common gene set that was derived from the standard veloVI pipeline; this ensures that the uncertainty values are comparable as they incorporate information across all genes.

Permutation scoring

To quantify how robust the inferred dynamics are with respect to random permutations in the input data, we define a gene- and cell-type-specific permutation effect, which is then aggregated to a gene-specific permutation score (Extended Data Fig. 6). For this analysis, we considered all highly variable genes and did not filter our genes based on estimates of the steady-state model.

To calculate the score, the unspliced and spliced abundances belonging to one (cell type, gene) pair are independently permuted (cell barcodes are shuffled independently per unspliced/spliced). Repeating over all pairs, this results in a permuted data matrix. We then estimate the model fit of the unspliced and spliced abundance for permuted data matrix (the posterior predictive mean, Supplementary Methods). Note that because veloVI can handle held-out data, computing the model fit of permuted data does not require any additional training. Finally, for each (cell type and gene) pair we compute μp and μ0, which denote the mean absolute error between the model fit abundances and the observed abundances (spliced and unspliced errors added together) for the permuted and original data matrices, respectively.

To quantify the extent to which the mean absolute errors of the two samples are not equal, we define the permutation effect as the t-test statistic

$$T=\frac{{\mu }_{p}-{\mu }_{0}}{\sqrt{2\frac{{S}^{2}}{n}}},$$

with number of cells n and pooled variance S2 of the absolute errors. To limit the effect of dataset size, we consider the maximum sample size of n = 200 observations. The permutation score is aggregated on a gene level by considering the maximum test statistic across cell types. This aggregation allows comparing the permutation score across different datasets.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.