Abstract
RNA velocity has been rapidly adopted to guide interpretation of transcriptional dynamics in snapshot single-cell data; however, current approaches for estimating RNA velocity lack effective strategies for quantifying uncertainty and determining the overall applicability to the system of interest. Here, we present veloVI (velocity variational inference), a deep generative modeling framework for estimating RNA velocity. veloVI learns a gene-specific dynamical model of RNA metabolism and provides a transcriptome-wide quantification of velocity uncertainty. We show that veloVI compares favorably to previous approaches with respect to goodness of fit, consistency across transcriptionally similar cells and stability across preprocessing pipelines for quantifying RNA abundance. Further, we demonstrate that veloVI’s posterior velocity uncertainty can be used to assess whether velocity analysis is appropriate for a given dataset. Finally, we highlight veloVI as a flexible framework for modeling transcriptional dynamics by adapting the underlying dynamical model to use time-dependent transcription rates.
Main
Advances in single-cell RNA sequencing (scRNA-seq) technologies have facilitated the high-resolution dissection of the mechanisms underlying cellular differentiation and other temporal processes^{1,2,3}. Although scRNA-seq is a destructive assay, a widely used set of computational approaches leverages the asynchronous nature of dynamical biological processes to order cells along a so-called pseudotime in the task of trajectory inference^{4,5,6,7}. Traditional methods for trajectory inference typically require the initial state of the underlying biological process to be known and use manifold learning to determine a metric space in which distances capture changes in differentiation state.
Recently, RNA velocity has emerged as a bottom-up, mechanistic approach for the trajectory inference task. RNA velocity, which describes the change of spliced messenger RNA (mRNA) over time, makes use of concomitant detection of unspliced and spliced RNA transcripts with standard scRNA-seq protocols^{8}. Upon estimation, RNA velocity is typically incorporated into analyses in two ways: (1) inferring a cell-specific differentiation pseudotime or (2) constructing a transition matrix inducing a Markov chain over the data to determine initial, transient and terminal subpopulations of cells^{9}.
There are currently two popular methods for estimating RNA velocity. The first, referred to as the steady-state model, assumes (1) constant rates of transcription and degradation of RNA; (2) a single, global splicing rate^{8,10}; (3) that the cellular dynamics reached an equilibrium in the induction phase and do not include basal transcription; and (4) gene-wise independence. The second method, referred to as the EM model, was previously described and implemented in the scVelo package^{11}. The EM model relaxes the assumption of the system having reached a steady state, infers the full set of transcriptional parameters and estimates a latent time per cell, per gene by formulating the problem in an expectation-maximization (EM) framework.
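The steady-state idea can be illustrated with a minimal sketch. It assumes that the splicing rate is fixed to 1 (so velocities are relative to it), that the degradation-to-splicing ratio is fit per gene by least squares through the origin, and that only cells in the extreme quantiles of total abundance are used for the fit. The function name, the `quantile` parameter and the quantile rule are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def steady_state_velocity(unspliced, spliced, quantile=0.95):
    """Per-gene steady-state ratio and velocity for (cells x genes) arrays.

    Assumption: splicing rate = 1, so velocity = u - gamma * s.
    """
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    total = u + s
    # Cells near the presumed steady states: upper and lower abundance extremes.
    high = total >= np.quantile(total, quantile, axis=0)
    low = total <= np.quantile(total, 1.0 - quantile, axis=0)
    weights = (high | low).astype(float)
    # Weighted least squares through the origin: gamma = sum(u*s) / sum(s*s).
    denom = (weights * s * s).sum(axis=0)
    gamma = (weights * u * s).sum(axis=0) / np.where(denom > 0, denom, 1.0)
    velocity = u - gamma * s
    return gamma, velocity
```

Cells lying exactly on the fitted line receive zero velocity; cells above it (excess unspliced RNA) receive positive velocity and cells below it negative velocity.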
While these approaches for estimating RNA velocity have been successfully used to interpret single-cell dynamics^{12,13}, they also suffer from limitations derived from their modeling assumptions and downstream usage^{14,15,16,17}. For example, both methods lack a global notion of uncertainty. Thus, assessing the robustness of the RNA velocity estimate, or deciding to what extent velocity analysis is appropriate for a given dataset, can be difficult. Although the EM model can be used to rank putative ‘driving’ genes by their likelihood, there is no direct connection between gene likelihood, visualization and correctness. For example, in the case of dentate gyrus neurogenesis, visualization of RNA velocity suggests that granule mature cells develop into their immature counterparts even though a selection of high-likelihood genes suggests the reverse (correct) dynamics^{11}.
Estimation of RNA velocity with current approaches is also tightly coupled to the parameterization of the differential equations underlying transcription. Assumptions such as constant transcription, splicing and degradation rates may be too simple to explain dynamics that arise in multi-lineage^{14} or even single-lineage^{18} cell differentiation. The methods outlined to estimate RNA velocity lack extensibility and flexibility to adapt to more complicated, real-world scenarios. Emerging technologies such as VASA-seq^{19}, which have greater sensitivity for unspliced RNA detection, may provide sufficient signal to fit more complex models.
To address these issues, we present veloVI (velocity variational inference), a deep generative model for estimating RNA velocity. VeloVI reformulates the inference of RNA velocity via a model that shares information between all cells and genes, while learning the same quantities, namely kinetic parameters and latent time, as in the EM model. This reformulation leverages advances in deep generative modeling^{20}, which have become integral to many single-cell omics analytical tasks such as multimodal data integration^{21,22}, perturbation modeling^{23,24} and data correction^{25}. As its output, veloVI returns an empirical posterior distribution of RNA velocity (matrix of cells by genes by posterior samples), which can be incorporated into the downstream analysis of the results. Here, we show that veloVI represents a substantial improvement over the EM model in terms of fit to the data. Additionally, it provides a layer of interpretation and model criticism lacking from previous methods while also greatly improving flexibility for model extensions.
We use veloVI to enhance analyses of velocity at the level of cells, genes and whole datasets. At the level of a cell, veloVI illuminates cell states that have directionality estimated with high uncertainty, which adds a notion of confidence to the velocity stream and highlights regions of the phenotypic manifold that warrant further investigation and more careful interpretation. We couple this analysis with a metric called velocity coherence that explains the extent to which a gene agrees or disagrees with the inferred directionality. At the level of genes and datasets, we propose a permutation-based technique using veloVI that can identify partially observed dynamics or systems in steady states. This can be used to determine the extent to which RNA velocity analysis is suitable for a particular dataset.
Finally, veloVI is an extensible framework to fit more sophisticated transcriptional models. We highlight this flexibility by extending the current transcriptional model with a time-dependent transcription rate and show how this extension can improve the model fit.
Results
Variational inference for estimating RNA velocity
VeloVI posits that the unspliced and spliced abundances of RNA for each gene in a cell are generated as a function of kinetic parameters (transcription, splicing and degradation rates), a latent time and a latent transcriptional state (induction state, repression state and their respective steady states). Additionally, veloVI posits that each gene’s latent times (per cell) are tied via a low-dimensional latent variable that we call the cell representation. These representations capture the notion that the observed state of a cell is a composition of multiple concomitant processes that together span the phenotypic manifold^{1}. This modeling choice is justified by the observation that with the EM model, which is fit independently per gene, the inferred latent time matrix (of shape cells by genes) has a low-rank structure (but notably, not rank one; Extended Data Fig. 1).
The complete architecture of veloVI manifests as a variational autoencoder^{26}. The encoder neural networks take the unspliced and spliced abundances of a cell as input and output the posterior parameters for the cell representation and latent transcriptional state variables. The gene-wise, state-specific latent time is parametrized by a neural network that takes a sample of the cell representation as input. The likelihood of cellular unspliced and spliced abundances is then a function of the latent time, the kinetic rate parameters and the state assignment probabilities (Fig. 1a and Methods). The model’s parameters are optimized simultaneously using standard gradient-based procedures. After optimization, the cell-gene-specific velocity is computed as a function of the degradation rate, the splicing rate and the fitted unspliced and spliced abundances, which directly incorporate the posterior distributions over time and transcriptional state.
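As an illustration of the last step, the sketch below evaluates the closed-form solution of the constant-rate splicing equations du/dt = α − βu and ds/dt = βu − γs, then computes the velocity as βu − γs. It covers only a single gene evolving from a given initial state. The averaging over posterior time and transcriptional state that veloVI performs is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def splicing_dynamics(t, alpha, beta, gamma, u0=0.0, s0=0.0):
    """Closed-form u(t), s(t) for constant rates (requires beta != gamma)."""
    t = np.asarray(t, dtype=float)
    exp_b = np.exp(-beta * t)
    exp_g = np.exp(-gamma * t)
    # Unspliced RNA relaxes toward alpha / beta.
    u = u0 * exp_b + alpha / beta * (1.0 - exp_b)
    # Spliced RNA relaxes toward alpha / gamma.
    s = (s0 * exp_g
         + alpha / gamma * (1.0 - exp_g)
         + (alpha - beta * u0) / (gamma - beta) * (exp_g - exp_b))
    return u, s

def velocity(u, s, beta, gamma):
    """Time derivative of spliced abundance: ds/dt = beta * u - gamma * s."""
    return beta * u - gamma * s
```

Setting `alpha=0` with nonzero `u0` and `s0` gives the corresponding repression trajectory, in which both abundances decay toward zero.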
As a Bayesian deep generative model, veloVI can output a posterior distribution over velocities at the cell-gene level. This distribution can be used to quantify an intrinsic uncertainty over the first-order directions that a cell can take in the gene space. In downstream analyses, velocity is often used to construct a cell–cell transition matrix that reweights the edges of a nearest-neighbors graph according to the similarity of the first-order displacement of a cell and its neighborhood^{8,11}. By piping posterior velocity samples through this computation, we also quantify an extrinsic uncertainty, which reflects both the intrinsic uncertainty and the variability among the cell’s neighbors in gene space (Fig. 1b and Methods). In contrast, the EM model and steady-state model do not carry any explicit notion of uncertainty. Indeed, both previous models only allow evaluating an uncertainty post hoc based on quantifying velocity variation over a cell’s neighbors^{9}. Finally, a point estimate of the velocity averaged over samples for a cell allows veloVI’s output to be used directly in scVelo’s downstream visualization and graph construction functionalities as well as other packages building upon scVelo^{9,27}.
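A minimal sketch of how posterior velocity samples could be summarized is given below. It assumes the samples are stored as an array of shape (samples, cells, genes). The directional spread is measured here as the variance of the cosine similarity between each sample and the mean velocity vector; this is one plausible summary and is stated as an assumption, since the exact definition is given in Methods.

```python
import numpy as np

def summarize_velocity_samples(samples):
    """Point estimate and a directional-spread score per cell.

    samples: array of shape (n_samples, n_cells, n_genes).
    """
    samples = np.asarray(samples, dtype=float)
    # Point estimate: average velocity over posterior samples.
    mean_v = samples.mean(axis=0)
    # Cosine similarity of every sample with the mean vector of its cell.
    num = (samples * mean_v[None, :, :]).sum(axis=-1)
    den = np.linalg.norm(samples, axis=-1) * np.linalg.norm(mean_v, axis=-1)[None, :]
    cosine = num / np.maximum(den, 1e-12)
    # Assumed uncertainty summary: variance of the cosine similarity per cell.
    spread = cosine.var(axis=0)
    return mean_v, spread
```

The returned point estimate has shape (cells, genes) and can be passed to downstream tools that expect a single velocity matrix.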
veloVI improves data fit over the EM model and is stable
We performed a multifaceted analysis to evaluate veloVI’s ability to robustly fit transcriptional dynamics across a range of simulated and real datasets, comparing to both the EM model and the steady-state formulation of RNA velocity as implemented in the scVelo package^{11}.
We first assessed each model’s ability to recover kinetic parameters in simulated data (Methods). With an increasing number of observations, veloVI outperformed the EM model and was better than the steady-state model in recovering the simulated ratio of degradation and splicing rate for each gene (Supplementary Fig. 1a). Similarly, veloVI’s inferred latent time and velocity correlated significantly better (two-sided Welch’s t-test, P < 0.001) with ground truth compared to EM estimates when simulating data with parameters previously estimated on real data (Methods and Fig. 2a). It is notable that these simulations reflect an idealized scenario as cells are simulated via the EM model generative process, which assumes gene-wise independence, induction followed by repression states and a single lineage (Methods). Nonetheless, veloVI outperforms the EM model even in these EM-favorable conditions. We also benchmarked the runtime of veloVI and the EM model. For this comparison, we ran both models on subsamples of a mouse retina dataset^{28} containing approximately 114,000 cells. Across multiple subsamples, inference was substantially faster using veloVI compared to the EM model (Supplementary Fig. 1b). Specifically, considering 20,000 cells, veloVI achieved a fivefold speedup.
To further validate the accuracy of veloVI, we compared veloVI and the EM model on cell-cycle datasets of fluorescent ubiquitination-based cell-cycle indicator (FUCCI) RPE1 and U2OS cells^{13,29}, as they offer orthogonal validation of directionality/time via a protein-derived cell-cycle score (Fig. 2b). To assess model performance, we first compared the local consistency of the velocity vector fields generated by each model. This consistency measure quantifies the extent to which the velocities of cells with similar transcriptomic profiles (nearest neighbors) agree and relies on the assumption that velocities change smoothly over the phenotypic manifold. Compared to the EM model, veloVI achieves a higher velocity consistency (Fig. 2c). We also tested whether the direction of the velocity at the gene level aligns with a ground truth heuristic based on the cell cycle (Methods). As before, veloVI yielded consistent results and significantly outperformed the EM model (RPE1 (resp. U2OS), 66% (resp. 68%) genes have higher velocity sign accuracy under veloVI; Fig. 2d; one-sided Welch’s t-test, P < 0.001). As a complementary validation of these findings, we confirmed that the velocities of individual genes inferred by veloVI change more smoothly (are less noisy) with respect to the ground truth ‘time’ compared to the EM model (RPE1 (resp. U2OS), 78% (resp. 65%) genes have higher R^{2} under veloVI) (Fig. 2e, Supplementary Fig. 2 and Methods).
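The local consistency measure can be sketched as follows, assuming it is the mean Pearson correlation between a cell’s velocity vector and the velocity vectors of its nearest neighbors in expression space. The brute-force neighbor search, the Euclidean metric and the parameter `n_neighbors` are illustrative choices rather than the settings used in the analysis.

```python
import numpy as np

def velocity_consistency(expression, velocity, n_neighbors=5):
    """Mean correlation of each cell's velocity with its neighbors' velocities."""
    x = np.asarray(expression, dtype=float)
    v = np.asarray(velocity, dtype=float)
    # Brute-force squared Euclidean distances between cells.
    dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :n_neighbors]
    # Center and normalize so that dot products are Pearson correlations.
    vc = v - v.mean(axis=1, keepdims=True)
    vc /= np.maximum(np.linalg.norm(vc, axis=1, keepdims=True), 1e-12)
    corr = np.einsum("ig,ikg->ik", vc, vc[neighbors])
    return corr.mean(axis=1)
```

Values close to 1 indicate that neighboring cells point in the same direction in gene space; values near 0 indicate locally unrelated velocity vectors.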
We then evaluated the stability of velocity estimates on real datasets processed with 12 different RNA abundance quantification algorithms^{8,28,30,31,32,33}, based on previous work that highlighted general inconsistencies in velocity estimation^{34} (Methods). To do so, we measured the correlation of velocity of each cell between pairs of quantification flavors on five benchmarking examples, namely pancreas endocrinogenesis at embryonic day 15.5 (ref. ^{35}) as well as datasets of spermatogenesis^{36}, mouse developing dentate gyrus^{37}, the prefrontal cortex of a mouse^{38} and 21–22-month-old mouse brains^{39}. When aggregating these correlations for each pair of quantification algorithms, veloVI scored both a higher mean correlation and lower variance compared to the EM model. Compared to the much simpler steady-state model, veloVI tended to have a similar mean correlation, but with lower variance (Fig. 3a, Extended Data Fig. 2 and Supplementary Figs. 3–7).
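The per-cell comparison between two quantification flavors can be sketched as a row-wise Pearson correlation between two velocity matrices. The sketch assumes both matrices cover the same cells and genes in the same order; the function name is hypothetical.

```python
import numpy as np

def per_cell_velocity_correlation(velocity_a, velocity_b):
    """Pearson correlation across genes, computed separately for every cell."""
    a = np.asarray(velocity_a, dtype=float)
    b = np.asarray(velocity_b, dtype=float)
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)
```

The mean and variance of the returned vector then summarize one pair of quantification algorithms.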
To assess how well the inferred dynamics reflect the observed data, we computed the mean squared error (MSE) of the fit for the unspliced and spliced abundances and compared the MSE to that of the EM model on a selection of datasets (Supplementary Table 1). For each dataset, we computed the ratio of the MSE for veloVI and the EM model at the level of a gene. VeloVI had better performance for a majority of the genes in each dataset (Fig. 3b). Additionally, across all datasets, veloVI had higher velocity consistency among cells (Fig. 3b). We attribute this increase to the explicit low-dimensional modeling in veloVI that shares statistical strength across all cells and genes.
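The gene-level comparison can be written compactly. In the sketch below, `observed`, `fit_model` and `fit_reference` are (cells x genes) arrays for one RNA species; the names are illustrative, and how unspliced and spliced errors are combined is left to the caller.

```python
import numpy as np

def gene_mse_ratio(observed, fit_model, fit_reference):
    """Per-gene MSE of one model divided by the per-gene MSE of a reference."""
    observed = np.asarray(observed, dtype=float)
    mse_model = ((observed - np.asarray(fit_model, dtype=float)) ** 2).mean(axis=0)
    mse_reference = ((observed - np.asarray(fit_reference, dtype=float)) ** 2).mean(axis=0)
    # Ratios below 1 mean the first model fits that gene better.
    return mse_model / mse_reference
```

The fraction of genes with a ratio below 1 is then `(ratio < 1).mean()`.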
Despite sharing many model assumptions, the velocities estimated for a gene with veloVI were partially correlated on average with their EM counterpart (Fig. 3b). To highlight the differences in velocity estimation at the level of individual genes, we examined Sulf2, a marker of endocrine progenitor cells and Top2a, a cell-cycle marker, in the pancreas dataset (Fig. 3c). For both of these genes, the EM model predicted a wide range of velocities for cells that had near-zero unspliced and spliced abundances. For example, terminal beta cells had substantially positive velocity under the EM model for Sulf2 despite being located at the bottom-left of the phase portrait (defined as the scatterplot of unspliced versus spliced abundance of a gene) and with known development occurring later than endocrine progenitors and pre-endocrine cells. In the case of veloVI, beta cells had nearly zero velocity, reflecting their belonging to the putative repression steady state for this gene. We attribute this result to veloVI’s velocity directly marginalizing over the latent cell representations, which explicitly incorporates the probability that a cell belongs to induction, repression, or their respective steady states (Methods). We observed similar results for Top2a, in which cell types without a strong cell-cycle signature and near-zero unspliced/spliced abundance had positive velocity in the EM model, but near-zero velocity using veloVI.
veloVI enables interpretable velocity analysis
We then investigated how the uncertainty in the velocity estimates of veloVI could be used to scrutinize its output, both at the level of cells (which might be incorrectly modeled) and at the level of individual genes (which might be inconsistent with the aggregated, cell-level output). We used this uncertainty to (1) measure the variability in the phenotypic directionality suggested by the velocity vector in each cell (here, intrinsic uncertainty) and (2) quantify the variability of predicted future cell states under the velocity-induced cell–cell transition matrix (here, extrinsic uncertainty; Fig. 1b and Methods).
We applied these uncertainty metrics to the pancreas dataset (Fig. 4a). We observed that the intrinsic uncertainty was elevated in ductal and Ngn3-low endocrine progenitor populations, while the extrinsic uncertainty highlighted these same populations in addition to terminal alpha and beta cells. These results demonstrate that lower intrinsic uncertainty does not necessarily preclude higher extrinsic uncertainty. While the former relies on estimating the velocity vector (which is cell-intrinsic), most velocity pipelines also account for other cells in the dataset, which presumably represent the potential past and future states of the cell, to determine cell transitions (Fig. 1b). In the case of alpha and beta cells, these cells represent terminal populations in the pancreas dataset, which may explain the high extrinsic uncertainty as there are no observed successor states. Conversely, in the case of transient cell populations, such as Ngn3-high endocrine progenitors and pre-endocrine cells, both metrics assign a low uncertainty. We attribute the low intrinsic uncertainty of these cells to the fact that their dynamics agree well with the underlying model assumptions (Extended Data Fig. 3). The addition of low extrinsic uncertainty further suggests that these cell types have clear successor populations in this dataset (Fig. 4a).
To further understand what aspects of the data these uncertainty metrics capture, we (in silico) perturbed the pancreas dataset by either (1) downsampling the total counts of each cell to mimic changes in sequencing depth and capture efficiency; (2) subsampling unspliced counts for a subset of genes to mimic the biased capture of unspliced molecules; or (3) adding random multiplicative noise to each abundance value (Methods). We applied each perturbation at various strengths and found that for each perturbation source, the intrinsic uncertainty increased with the perturbation strength. We found a similar response for the extrinsic uncertainty except in the case of total count downsampling, which required a high strength to shift the extrinsic uncertainty (Extended Data Fig. 4). These results suggest that the uncertainty metrics can capture random noise in the data, as well as bias in how the transcripts are measured.
Finally, we asked whether we could use veloVI’s uncertainty to address the common behavior of unexpected ‘backflow’ in two-dimensional velocity visualizations; when projecting the average veloVI velocity onto a Uniform Manifold Approximation and Projection (UMAP)^{40} plot (using procedures from elsewhere^{11}), we observed an incorrect ‘backflow’ of directionality in alpha and beta cells, which showed transitions toward their known progenitors. While these terminal populations have high extrinsic uncertainty according to veloVI, it remains difficult to explain which genes cause the inconsistency. In the case of scVelo, it has been proposed to use the likelihood of a gene as a proxy; however, the likelihood has no direct connection to cell–cell transition-based analyses.
To this end, we sought to score genes in each cell according to how well their velocity agrees with the predicted future cell state that is derived via the velocity-induced transition matrix (incorporating velocity information from all genes as well as gene expression in neighboring cells; Methods). We reasoned that this score, which we call velocity coherence, could help gain insight into why a particular directionality might manifest. A positive score of a gene indicates the velocity value of that gene (the time derivative of its spliced mRNA) agrees with its expression in the inferred future cell state (same direction) and likewise, a negative score indicates disagreement (Fig. 4b and Extended Data Fig. 5a).
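A score with the sign behavior described above can be sketched as the element-wise product of a gene’s velocity and the expected displacement of its expression under the transition matrix. This product form is an assumption made for illustration (the exact definition is in Methods); `transition` is taken to be a row-stochastic cells-by-cells matrix.

```python
import numpy as np

def velocity_coherence(spliced, velocity, transition):
    """Per-cell, per-gene agreement between velocity and expected displacement."""
    spliced = np.asarray(spliced, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    transition = np.asarray(transition, dtype=float)
    # Expected future expression of every cell under the transition matrix.
    future = transition @ spliced
    displacement = future - spliced
    # Positive where velocity and displacement share a sign, negative otherwise.
    return velocity * displacement
```

Ranking genes by this score within a cell type shows which genes support the visualized direction and which oppose it.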
In the alpha cells, for example, there are both positively and negatively scoring genes. Genes with a negative score, such as Gcg and Sphkap, were fit correctly by veloVI (alpha cells after pre-endocrine cells in time along the inferred trajectory on the phase portrait), but disagree with the predicted future cell state, suggesting that other genes are outweighing these genes in the transition matrix computation (Extended Data Fig. 5b). Indeed, genes such as Rnf130, Etv1 and Grb10, which had a positive score that agrees with the backflow, seemed to have been fit incorrectly (alpha cells precede pre-endocrine cells along the inferred trajectory on the phase portrait) (Extended Data Fig. 5c). The incorrect fits can putatively be explained by violated model assumptions such as a transcriptional burst in alpha cells (Rnf130), ambiguous phase portraits (Etv1) and multiple kinetics (Grb10).
Conversely, the dynamics in Ngn3-high cells are correctly visualized in the UMAP representation (Fig. 4a). We attribute this result to the presence of many genes agreeing with both the model assumptions and the predicted future state of a cell (Extended Data Fig. 5d). Compared to the 95th percentile of the coherence score in alpha cells, more than twice as many genes ranked above this threshold in the Ngn3-high cluster (135 versus 54); however, even in this case, we found that many genes were fit with incorrect dynamics for this cell type (Extended Data Fig. 5e).
Taken together, these results suggest that the visualization of dynamics on a two-dimensional embedding with previously described procedures is explained by small subsets of genes. Thus, caution is warranted when analyzing projections of velocity estimates onto a two-dimensional embedding of the data. We urge users to investigate the dynamics at the level of individual genes to identify which genes meet the model assumptions. Putative candidates are given by our proposed velocity coherence score. Additionally, to identify genes viable for RNA velocity analysis due to the presence of transient cell populations, we propose a score outlined next.
veloVI identifies insufficiently observed or steady-state dynamics
In datasets with non-differentiating, hierarchically related cell types, spurious cell state transitions may manifest when applying RNA velocity^{14,15}. Indeed, the underlying transcriptional likelihood model cannot readily distinguish between the case of a transient population and that of multiple steady-state populations. Therefore, we devised a procedure to use a trained veloVI model to identify genes with phase portraits that are consistent with a developmental process versus ones that are consistent with steady-state dynamics or are confounded by noise.
We reasoned that the model fit of genes showing only steady-state dynamics would be robust to a permutation of the data while the model fit of genes with transient populations would worsen. Specifically, for every gene, cell type and species (spliced/unspliced) independently, we permuted the abundances of cells in a manner equivalent to shuffling cell barcodes. Subsequently, we passed this perturbed dataset through the veloVI model’s trained encoder and decoder and recorded the absolute error of the fit grouped by genes and cell types. We then used the t-test statistic to compare the mean absolute error in each cell-type-gene group between the perturbed and original dataset (Extended Data Fig. 6 and Methods). We hypothesized that the t-test statistic, capturing the effect of the permutation, would be elevated in transient populations with strong time dependence and, conversely, near-zero in steady-state populations.
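The permutation procedure can be sketched as below. The callable `abs_error_fn` is a hypothetical stand-in for passing data through a trained model’s encoder and decoder and returning a (cells x genes) matrix of absolute errors. The use of Welch’s unequal-variance form of the t statistic is also an assumption, since the text only specifies a t-test statistic.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic; returns 0.0 when both groups have zero variance."""
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return 0.0 if se == 0 else float((a.mean() - b.mean()) / se)

def permutation_effect(unspliced, spliced, cell_types, abs_error_fn, seed=0):
    """t statistics (cell types x genes) comparing permuted vs original errors."""
    rng = np.random.default_rng(seed)
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    cell_types = np.asarray(cell_types)
    types = np.unique(cell_types)
    u_perm, s_perm = u.copy(), s.copy()
    for ct in types:
        idx = np.flatnonzero(cell_types == ct)
        for g in range(u.shape[1]):
            # Shuffle each species independently within a cell type and gene.
            u_perm[idx, g] = u[rng.permutation(idx), g]
            s_perm[idx, g] = s[rng.permutation(idx), g]
    err_orig = abs_error_fn(u, s)
    err_perm = abs_error_fn(u_perm, s_perm)
    stats = np.zeros((len(types), u.shape[1]))
    for i, ct in enumerate(types):
        idx = np.flatnonzero(cell_types == ct)
        for g in range(u.shape[1]):
            stats[i, g] = welch_t(err_perm[idx, g], err_orig[idx, g])
    return stats
```

Genes whose unspliced and spliced abundances co-vary across the cells of a population lose that pairing under permutation, so their error increases and the statistic becomes large; genes that are flat within a population are unaffected.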
In the pancreas dataset, the permutation strongly affects the ductal and Ngn3-low EP cells for the cell-cycle gene Top2a. Indeed, these cell types trace fully observed induction and repression states for Top2a. In the case of the delta-specific gene Sst, where no such transient connection is observed, for example from ductal to pre-endocrine to delta cells, no single cell type is strongly affected when permuting (Fig. 4c). Consequently, even though Sst is essential for the identity of delta cells, the gene does not display continuous dynamics from ductal progenitor cells and, thus, does not include the necessary information to be analyzed with RNA velocity.
We then applied this procedure to a variety of datasets. In one set of tests, we used datasets describing cellular development. These datasets serve as partial positive controls as we expect directed dynamical processes, as modeled by RNA velocity, to take place in at least a subset of cells in the dataset. As negative controls, we used simulated data of bursty kinetics^{15} with no overall differentiation of cell state and datasets containing multiple cell types that are at steady state. To summarize the permutation for a gene, we used the maximum permutation effect t-test statistic across cell types (permutation score). Two clusters of datasets emerged when characterizing the per-gene permutation score distribution (Fig. 4d). One cluster, with a fatter right tail (quantified by skewness and kurtosis), contained positive control datasets such as the pancreas and spermatogenesis. Despite having relatively many genes sensitive to permutation, the datasets of this cluster also contained many genes that were not sensitive, suggesting that there are likely many non-dynamical genes used for downstream analysis with RNA velocity. The other cluster, with less density in the right tail, contained negative controls such as the peripheral blood mononuclear cells (PBMCs), null-data simulation and the prefrontal cortex.
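The dataset-level summary can be sketched in two steps: reduce a (cell types x genes) matrix of permutation-effect statistics to one score per gene by taking the maximum across cell types, then describe the right tail of the resulting distribution by its skewness and excess kurtosis. The moment-based estimators below are a simple choice made for illustration; other estimators would serve equally well.

```python
import numpy as np

def permutation_score(t_stats):
    """Per-gene score: maximum permutation effect across cell types."""
    return np.asarray(t_stats, dtype=float).max(axis=0)

def skewness_and_kurtosis(scores):
    """Moment-based skewness and excess kurtosis of a score distribution."""
    x = np.asarray(scores, dtype=float)
    z = (x - x.mean()) / x.std()
    skewness = (z ** 3).mean()
    excess_kurtosis = (z ** 4).mean() - 3.0
    return skewness, excess_kurtosis
```

A distribution with a heavy right tail yields positive skewness, which is the pattern described for the positive control datasets.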
Between these two clusters of datasets, we also found a few ambiguous datasets, such as the mouse retina (positive control) and brain (negative control), which suggests that there exist some cell subsets within these datasets that are affected by the permutation and hence, possibly reflect a directed dynamical process that is appropriate for modeling with RNA velocity; however, upon closer inspection of the brain dataset, we identified mature neurons as responsible for skewing the permutation score density (Extended Data Fig. 7a). The cluster of mature neurons was singled out as it accounts for about one-third of the highest permutation scores (Extended Data Fig. 7b). For the genes with the highest permutation score, these neuronal cells exhibit a bimodal distribution in which one mode has low unspliced and spliced abundance while the other has respectively higher abundances (Extended Data Fig. 7c). Thus, we attribute this skewing to coarse labeling of this population (Extended Data Fig. 7d). When excluding mature neurons from this analysis, the distribution shifted and its key characteristics moved toward the cluster formed by the negative control cases (Extended Data Fig. 7e).
In the accompanying code to this manuscript, we provide these permutation score densities as a resource for users of RNA velocity, which will enable the datasets we analyzed here to serve as references for the score distribution and thus as a systematic approach to measure the overall transient dynamics of a dataset. For example, datasets whose permutation score distributions resemble those of the given negative control cases (for example, in kurtosis or skewness) are not suitable for RNA velocity analysis with current models.
In Supplementary Notes 1 and 2, we provide case studies outlining how veloVI can be used in practice on PBMCs (negative control) and mouse developing dentate gyrus (partial positive control). These demonstrations synthesize veloVI’s uncertainty quantification and permutation procedure along with the velocity coherence. When applying the permutation procedure, we were able to provide further evidence for the lack of transient populations in the case of PBMCs (Supplementary Note 1), as well as identify transient populations of neuroblasts and granule immature cells for many genes in dentate gyrus (Supplementary Note 2). Taken together, these results demonstrate that the permutation score is also useful for identifying cell populations that lack detectable transient dynamics.
veloVI is an extensible framework for dynamical modeling
The transcriptional model assumptions at the level of one gene (for example, constant rates that impose a specific structure of phase portraits) can be shown to be violated in many cases. For example, in the case of transcriptional bursts in which the transcription rate increases with time^{18} or multiple kinetics within a single gene^{14}, the assumption of constant kinetic rates is violated. Thus, there remains a need for modeling frameworks that are extensible and support varied and more nuanced dynamical assumptions. While veloVI makes many of the same assumptions as in the EM model, it leverages black-box computational and statistical techniques that allow its generative model to be altered to include new assumptions without needing to extensively rewrite inference recipes or generally sacrifice scalability.
To explore veloVI as a general modeling framework, we adapted it to use gene-specific, time-dependent transcription rates. Under this extension, transcription rates are free to monotonically increase or decrease with respect to time^{14}, thus allowing for modeling the acceleration of RNA abundance, which can impact the curvature of the model fit (Methods and Fig. 5a). To infer these additional parameters, only the likelihood function of veloVI needed to be adapted. Applying this modified version of veloVI to the pancreas, dentate gyrus and forebrain datasets, we observed improved fit for the majority of genes (Fig. 5b). In the case of the pancreas dataset, the added flexibility allowed veloVI to better fit genes that seem more linear in their phase portraits, for example, as it can reduce the curvature of the fitted dynamics (Fig. 5c).
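To illustrate how a monotone, time-dependent transcription rate changes the dynamics, the sketch below integrates du/dt = α(t) − βu and ds/dt = βu − γs with a forward Euler scheme. The exponential form α(t) = α₁ − (α₁ − α₀)·exp(−λt) is an assumed example of a monotone rate; the parameterization actually used is given in Methods, and all names here are hypothetical.

```python
import numpy as np

def simulate_time_dependent(alpha0, alpha1, lam, beta, gamma, t_max, n_steps=20000):
    """Forward Euler integration of splicing dynamics with a monotone alpha(t)."""
    t = np.linspace(0.0, t_max, n_steps + 1)
    dt = t[1] - t[0]
    # Assumed form: alpha moves monotonically from alpha0 toward alpha1.
    alpha = alpha1 - (alpha1 - alpha0) * np.exp(-lam * t)
    u = np.zeros_like(t)
    s = np.zeros_like(t)
    for k in range(n_steps):
        u[k + 1] = u[k] + dt * (alpha[k] - beta * u[k])
        s[k + 1] = s[k] + dt * (beta * u[k] - gamma * s[k])
    return t, alpha, u, s
```

Setting `alpha0 == alpha1` recovers the constant-rate model, so the two can be compared on the same phase portrait by plotting `u` against `s`.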
In the case of Smarca1, the model using a constant transcription rate inferred a downregulation (repression) of alpha cells differentiating into their progenitor populations of pre-endocrine and ductal cells (Fig. 5c). In contrast, using a time-dependent transcription rate, the generalized model infers the upregulation from ductal to pre-endocrine to alpha cells. Similar observations apply to Atad2 and Cdkn1a. While the constant transcription rate model inferred the correct regulation type for Ppp1r1a, its generalized counterpart captures the underlying dynamics more accurately (Fig. 5c). Overall, for most genes, we observed a decreasing transcription rate over time (Supplementary Fig. 8).
Altogether, this exemplary model extension demonstrates the flexibility of veloVI’s modeling approach. The flexibility allows us to quickly prototype extensions and infer additional parameters within a single, consistent framework. We, thus, expect future models to benefit from such flexibility.
Discussion
Here, we reformulated the estimation of RNA velocity in a variational inference framework with veloVI. Our method compares favorably to previously proposed methods^{8,11} and adds actionable metrics into downstream data analyses at the cell level via uncertainty quantification and at the level of a gene and dataset with the permutation score. We believe that veloVI will facilitate more systematic analyses with RNA velocity and help reduce the strong reliance on prior knowledge to guide whether results are sensible. As an example, our permutation score could be used to filter genes that are considered for further analysis. We also note that related work has very recently incorporated deep learning with RNA velocity and we review these methods and compare them to veloVI in Supplementary Note 3.
We view this formulation of modeling transcriptional dynamics with probabilistic models and deep learning as a step toward a more rigorous pipeline that faithfully captures the biophysical phenomenon of RNA metabolism. In this work, we relied on previously described data processing approaches that smooth unspliced/spliced abundances across nearest neighbors before velocity estimation. We also borrowed many assumptions from the EM model, including, for example, the lack of explicit support for multiple diverging lineages that would result in genes reflecting a superposition of dynamical signals.
In contrast to previous models, veloVI is built in an extensible way using the scvi-tools framework^{21}. As a proof of concept, we demonstrated that veloVI could be easily extended to use time-dependent transcription rates, which improved model fit for many genes. We anticipate that the veloVI framework will be further adapted to overcome other computational challenges including estimating velocity while accounting for batch effects, using multimodal technologies with measurements that span biology’s central dogma^{41,42} and directly modeling the unspliced and spliced RNA counts with count-based likelihoods. Furthermore, while veloVI’s estimated velocities are relative to a given maximum time of the process (as for the EM model), they are no longer relative with respect to the splicing rate as in the steady-state model. In future iterations, we anticipate including prior information from metabolic labeling data to estimate absolute velocities. We discuss these challenges, other considerations and future opportunities in Supplementary Note 4.
A philosophical challenge with RNA velocity relates to the notion that models should use bottom-up mechanistic approaches while also being general enough to be applied across a variety of biological systems, each with their own caveats and unique dynamics. In this work, we use a low-dimensional representation of a cell’s phenotypic state to capture multiple biological processes (for example, differentiation and cell cycle). More complex models likely need prior information, such as known experimental time points or cell type lineages to solve issues of statistical identifiability that arise in these more general modeling scenarios; however, incorporating such priors can contradict the usage of RNA velocity as a de novo discovery tool for the trajectory inference task. Despite all these outlined challenges, we envision that veloVI will facilitate applications of RNA velocity via uncertainty-aware analysis as well as easier model prototyping, benefiting both users and method developers.
Methods
veloVI model specification
We begin with the formulation of the ‘dynamical’ model of RNA velocity as presented by ref. ^{11}. We posit transcriptional states k ∈ {1, 2, 3, 4}, where k = 1 indicates induction, k = 2 indicates the induction steady state, k = 3 indicates repression and k = 4 indicates the repression steady state.
Let α_{gk} be the gene-state-specific reaction rate of transcription. Let β_{g} be the gene-specific splicing rate constant and let γ_{g} be the gene-specific degradation rate constant. Each gene has a switching time \({t}_{g}^{s}\) when the system switches from induction phase to repression phase.
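Given the solution to the ordinary differential equations^{11}, the unspliced transcript abundance at time t_{ng} for cell n and gene g is defined as
\[{\bar{u}}^{(g)}({t}_{ng},k)={u}_{gk}^{0}\,{e}^{-{\beta }_{g}({t}_{ng}-{t}_{gk}^{0})}+\frac{{\alpha }_{gk}}{{\beta }_{g}}\left(1-{e}^{-{\beta }_{g}({t}_{ng}-{t}_{gk}^{0})}\right),\]
where \({t}_{gk}^{0}\) is the initial time of the system in state k and \({u}_{gk}^{0}\) is the unspliced abundance at that time. The spliced transcript abundance is defined as
\[{\bar{s}}^{(g)}({t}_{ng},k)={s}_{gk}^{0}\,{e}^{-{\gamma }_{g}({t}_{ng}-{t}_{gk}^{0})}+\frac{{\alpha }_{gk}}{{\gamma }_{g}}\left(1-{e}^{-{\gamma }_{g}({t}_{ng}-{t}_{gk}^{0})}\right)+\frac{{\alpha }_{gk}-{\beta }_{g}{u}_{gk}^{0}}{{\gamma }_{g}-{\beta }_{g}}\left({e}^{-{\gamma }_{g}({t}_{ng}-{t}_{gk}^{0})}-{e}^{-{\beta }_{g}({t}_{ng}-{t}_{gk}^{0})}\right),\]
where \({s}_{gk}^{0}\) is the spliced abundance at time \({t}_{gk}^{0}\).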
Induction state
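For the induction state, k = 1, we have \({u}_{g1}^{0}=0\), \({s}_{g1}^{0}=0\), α_{g1} > 0 and \({t}_{g1}^{0}=0\). Thus, the unspliced transcript abundance can then be expressed as
\[{\bar{u}}^{(g)}({t}_{ng},k=1)=\frac{{\alpha }_{g1}}{{\beta }_{g}}\left(1-{e}^{-{\beta }_{g}{t}_{ng}}\right).\]
Likewise, the spliced transcript abundance can be simplified to
\[{\bar{s}}^{(g)}({t}_{ng},k=1)=\frac{{\alpha }_{g1}}{{\gamma }_{g}}\left(1-{e}^{-{\gamma }_{g}{t}_{ng}}\right)+\frac{{\alpha }_{g1}}{{\gamma }_{g}-{\beta }_{g}}\left({e}^{-{\gamma }_{g}{t}_{ng}}-{e}^{-{\beta }_{g}{t}_{ng}}\right).\]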
Induction steady state
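For the induction steady state, k = 2, the unspliced and spliced transcript abundances are defined as limits of the system:
\[{\bar{u}}^{(g)}(k=2)=\mathop{\lim }\limits_{t\to \infty }{\bar{u}}^{(g)}(t,k=1)=\frac{{\alpha }_{g1}}{{\beta }_{g}},\qquad {\bar{s}}^{(g)}(k=2)=\mathop{\lim }\limits_{t\to \infty }{\bar{s}}^{(g)}(t,k=1)=\frac{{\alpha }_{g1}}{{\gamma }_{g}}.\]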
Repression state
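For the repression state, k = 3, we have α_{g3} = 0 and \({t}_{g3}^{0}={t}_{g}^{\,s}\). Thus, the number of unspliced transcripts can then be expressed as
\[{\bar{u}}^{(g)}({t}_{ng},k=3)={u}_{g3}^{0}\,{e}^{-{\beta }_{g}({t}_{ng}-{t}_{g}^{s})}.\]
Likewise, the number of spliced transcripts can be simplified to
\[{\bar{s}}^{(g)}({t}_{ng},k=3)={s}_{g3}^{0}\,{e}^{-{\gamma }_{g}({t}_{ng}-{t}_{g}^{s})}-\frac{{\beta }_{g}{u}_{g3}^{0}}{{\gamma }_{g}-{\beta }_{g}}\left({e}^{-{\gamma }_{g}({t}_{ng}-{t}_{g}^{s})}-{e}^{-{\beta }_{g}({t}_{ng}-{t}_{g}^{s})}\right).\]
The initial conditions, \({u}_{g3}^{0}\) and \({s}_{g3}^{0}\), are defined by the induction model at the switching time \({t}_{g}^{s}\), such that
\[{u}_{g3}^{0}={\bar{u}}^{(g)}({t}_{g}^{s},k=1),\qquad {s}_{g3}^{0}={\bar{s}}^{(g)}({t}_{g}^{s},k=1).\]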
Repression steady state
For the repression steady state, k = 4, which is the limit of the repression state as t_{ng} → ∞, there is no expression, so we have
\[{\bar{u}}^{(g)}(k=4)=0,\qquad {\bar{s}}^{(g)}(k=4)=0.\]
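The four states above can be sketched numerically. The following is a minimal NumPy illustration, not the veloVI implementation: the function names are hypothetical, a single gene is considered, and β ≠ γ is assumed so that the spliced solution is well defined.

```python
import numpy as np

def unspliced(tau, u0, alpha, beta):
    """Unspliced abundance after elapsed time tau in a state with rate alpha."""
    decay = np.exp(-beta * tau)
    return u0 * decay + alpha / beta * (1.0 - decay)

def spliced(tau, u0, s0, alpha, beta, gamma):
    """Spliced abundance after elapsed time tau (assumes beta != gamma)."""
    exp_b, exp_g = np.exp(-beta * tau), np.exp(-gamma * tau)
    return (s0 * exp_g + alpha / gamma * (1.0 - exp_g)
            + (alpha - beta * u0) / (gamma - beta) * (exp_g - exp_b))

def abundances(t, alpha, beta, gamma, t_switch):
    """Mean (u, s) at time t: induction before t_switch, repression after."""
    t = np.asarray(t, dtype=float)
    # Induction (k = 1): the system starts from u = s = 0 at time 0.
    u_ind = unspliced(t, 0.0, alpha, beta)
    s_ind = spliced(t, 0.0, 0.0, alpha, beta, gamma)
    # Repression (k = 3): alpha = 0; initial state is the induction value at the switch.
    u0 = unspliced(t_switch, 0.0, alpha, beta)
    s0 = spliced(t_switch, 0.0, 0.0, alpha, beta, gamma)
    tau = np.clip(t - t_switch, 0.0, None)
    u_rep = unspliced(tau, u0, 0.0, beta)
    s_rep = spliced(tau, u0, s0, 0.0, beta, gamma)
    induction = t < t_switch
    return np.where(induction, u_ind, u_rep), np.where(induction, s_ind, s_rep)
```

For long induction times the values approach α/β and α/γ (the induction steady state), and long after the switch both abundances decay to zero (the repression steady state).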
Model assumptions
As in ref. ^{11}, this model assumes that for one gene, at the initial time of the system, cells are first in induction phase in which both spliced and unspliced expression increases. Then cells potentially reach a steady state of this induction state. Next at some future time \({t}_{g}^{s}\) the system switches to repression state. Finally, the repression reaches a steady state in which there is no expression. Further assumptions are necessary to identify the dynamical model parameters^{44}; thus, we assume that each gene is on the same time scale (precisely each gene has a maximum time of t = 20 as shown previously^{11}).
veloVI generative process
We posit a generative process that takes into account the underlying dynamics of the system. Compared to Bergen et al.^{11}, the model here does not treat each gene independently; instead, the latent time and states for each (cell and gene) pair are tied together via a local low-dimensional latent variable.
For each cell we draw a low-dimensional (d = 10 dimensions throughout this manuscript) latent variable
\[{z}_{n} \sim {{{\rm{Normal}}}}(0,{I}_{d})\]
that summarizes the latent state of each cell. Next, for each gene g in cell n we draw the distribution over the state assignments as well as the state assignment itself
Here π_{ng} is sampled from a Dirichlet distribution, which has the support of the probability simplex. In other words, the Dirichlet provides a distribution over discrete probability distributions. If k_{ng} = 1 (induction), then the time is a function of z_{n},
\[{t}_{ng}^{(1)}={\left[{h}_{{{{\rm{ind}}}}}({z}_{n})\right]}_{g}\,{t}_{g}^{s},\]
where \({h}_{{{{\rm{ind}}}}}:{{\mathbb{R}}}^{d}\to {(0,1)}^{G}\) is parameterized as a fully connected neural network. Notably, this parameterization results in an induction-specific time that is constrained to be less than the switching time.
Else, if k_{ng} = 3 (repression),
\[{t}_{ng}^{(3)}={t}_{g}^{s}+({t}_{\max }-{t}_{g}^{s})\,{\left[{h}_{{{{\rm{rep}}}}}({z}_{n})\right]}_{g},\]
where \({t}_{\max }:= 20\) is used to fix the time scale across genes and identify the rate parameters of the model. Similarly to the previously defined function, \({h}_{{{{\rm{rep}}}}}:{{\mathbb{R}}}^{d}\to {(0,1)}^{G}\) is also a neural network.
We also consider two potential steady states. If k_{ng} = 2 (induction steady state) or if k_{ng} = 4 (repression steady state), we consider the limit as time approaches ∞, which is described in the previous section.
Finally, the observed data are sampled from normal distributions as
For veloVI, we consider the observed data \({\{({s}_{n},{u}_{n})\}}_{n = 1}^{N}\) to be the nearest-neighbor smoothed expression data that are also used as input to scVelo as well as velocyto. In addition, we assume the data have been preprocessed such that for each gene, the smoothed spliced and unspliced abundances are independently min-max scaled into [0, 1]. By using the normal distribution, we assume that the smoothed expression (which represents an average of random variables) has a sampling distribution centered on some mean value and that this sampling distribution is approximately normal; however, the flexibility of this modeling framework will enable extensions that consider the discrete nature of unique molecular identifiers used in standard scRNA-seq assays.
We include a statedependent scaling factor on the variance. For all experiments in this manuscript, we used c_{k} = 1 except for the repression steady state in which c_{4} = 0.1. This hyperparameter choice forces the variance of abundance in the repression steady state to be less than that of other transcriptional states, which reflects the notion that the repression steady state corresponds to zero transcriptional activity. Despite the assumption of zero transcriptional activity, the normal distribution here captures noise that arises during the experimental process (ambient transcripts) as well as during preprocessing (for example, KNN smoothing). Finally, in the following, let θ be the set of parameters of the generative process (α, β, γ, t^{s} and neural network parameters).
veloVI inference procedure
We seek the following: (1) point estimates of the transcription rate, degradation and splicing rate constants and the switching time point; (2) point estimates of the parameters of the neural networks; and (3), a posterior distribution over the latent variables, which in this case includes z and π. Noting that the model evidence p_{θ}(u, s) cannot be computed in closed form, we use variational inference^{45} to approximate the posterior distribution as well as accomplish the other tasks. Following inference, velocity can be calculated as a functional of the variational posterior distribution.
Variational posterior
We posit the following factorization on the approximate posterior distribution
in which dependencies are specified using neural networks with parameter set ϕ. Here z factorizes over all n cells and π_{ng} over all n cells and g genes.
For the likelihoods, we integrate over the choice of transcriptional state k_{ng}, such that the likelihoods for unspliced and spliced transcript abundances,
are mixtures of normal distributions.
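Marginalizing the state k_{ng} out of a normal likelihood can be illustrated as follows. This is a minimal NumPy sketch with hypothetical function and argument names; it evaluates the log of a mixture of normals with a log-sum-exp for numerical stability and does not reproduce veloVI's actual code.

```python
import numpy as np

def log_normal_pdf(x, mean, std):
    """Elementwise log density of Normal(mean, std^2)."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

def mixture_log_likelihood(x, state_means, state_stds, state_probs):
    """log sum_k pi_k Normal(x | mean_k, std_k^2).

    x has shape (...,); the three state arrays have shape (..., K),
    with K the number of transcriptional states.
    """
    log_terms = np.log(state_probs) + log_normal_pdf(x[..., None], state_means, state_stds)
    # Log-sum-exp over the state axis avoids underflow for poorly fitting states.
    m = log_terms.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(log_terms - m).sum(axis=-1, keepdims=True)))[..., 0]
```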
Objective
The objective that is minimized during inference is composed of two terms
where \({{{{\mathcal{L}}}}}_{{{{\rm{elbo}}}}}\) is the negative evidence lower bound^{45} of \(\log {p}_{\theta }(u,s)\) and \({{{{\mathcal{L}}}}}_{{{{\rm{switch}}}}}\) is an additional penalty that regularizes the location of the transcriptional switch in the phase portrait. In more detail,
which can be estimated using minibatches of data. In particular, we use randomly sampled minibatches of 256 cells for inference. For the penalty term \({{{{\mathcal{L}}}}}_{{{{\rm{switch}}}}}\), we start by only considering cells that are above the 99th percentile of unspliced abundance for each gene. Using these cells we compute the median unspliced and spliced abundance for each gene separately. Let u^{*} and s^{*} be the outcome of this procedure, then
where \({u}_{g3}^{0}\) and \({s}_{g3}^{0}\) were defined as the initial conditions of the repression phase at the switch time \({t}_{g}^{s}\).
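The targets u^{*} and s^{*} can be computed as in the sketch below. The function names are hypothetical, ties at the percentile threshold are included in the selected cells, and the squared-error form of the penalty is an assumption made for illustration, because the exact expression is not reproduced in this text.

```python
import numpy as np

def switch_targets(u, s, percentile=99.0):
    """Per-gene medians of u and s over the cells in the top unspliced percentile.

    u, s: (n_cells, n_genes) smoothed, scaled abundances.
    """
    threshold = np.percentile(u, percentile, axis=0, keepdims=True)
    top = u >= threshold  # cells at or above the per-gene threshold
    u_star = np.nanmedian(np.where(top, u, np.nan), axis=0)
    s_star = np.nanmedian(np.where(top, s, np.nan), axis=0)
    return u_star, s_star

def switch_penalty(u0_rep, s0_rep, u_star, s_star):
    """Assumed squared-error penalty between the switch point and the targets."""
    return np.sum((u0_rep - u_star) ** 2 + (s0_rep - s_star) ** 2)
```

The same top-percentile selection can be reused to compute an initial value such as the one described for α_{g1}.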
Initialization
We initialize α_{g1} to be equal to the median unspliced abundance for the cells above the 99th percentile for each gene. The other global parameters, including the splicing, degradation and switch time are initialized to a constant value shared by all genes. All neural network initialization uses the default implementation in PyTorch.
Optimization
To optimize \({{{{\mathcal{L}}}}}_{{{{\rm{velo}}}}}\) we use stochastic gradients^{26} along with the Adam optimizer with weight decay^{46} as implemented in PyTorch^{47}. For all experiments we use λ = 0.2 for scaling the regularization term in the loss. As a result of minibatching, veloVI’s memory usage is constant throughout training. Unless otherwise specified, all neural networks are fully connected feedforward networks that use standard activation functions such as ReLU for hidden layers and softplus or exponential for parameterizing nonnegative distributional parameters.
Architecture
An overview of the veloVI architecture is shown in Supplementary Fig. 9.
Downstream tasks
Fitted abundance values
The fitted values (used, for example, in MSE benchmarks) for unspliced and spliced abundance are the posterior predictive mean:
where \({u}_{n}^{* }\) and \({s}_{n}^{* }\) are unobserved random variables representing posterior predictive values of unspliced and spliced abundances for cell n. The posterior predictive in the case of unspliced abundance is defined as
which uses the variational posterior distribution as a plugin estimator for the true (unknown) posterior distribution.
We compare these fitted abundance values from veloVI to the analog of the EM model, which itself can be interpreted as a posterior predictive mean. Considering just the unspliced values, for example, the EM model posits a normal likelihood p(u_{ng}∣t_{ng}, k_{ng}) similar to veloVI but without the latent cell state z_{n} and learns posterior distributions q(t_{ng}∣u_{ng}, s_{ng}) and q(k_{ng}∣u_{ng}, s_{ng}). Under the EM model, the posterior distributions are Dirac delta distributions and the corresponding posterior predictive is expressed as
State assignment
The state assignment for each gene and cell is the approximate posterior mean
Genewise latent time
The latent time is computed for each gene and cell as
where the outer expectation with respect to q_{ϕ}(z_{n}∣u_{n}, s_{n}) is estimated with Monte Carlo samples, while the inner expectation is computed analytically over the transcriptional states k_{ng}.
RNA velocity
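The velocity of a particular gene in a particular cell is similarly a function of the variational posterior. Recall that the velocity is computed as
\[{v}^{(g)}(t,k)={\beta }_{g}\,{\bar{u}}^{(g)}(t,k)-{\gamma }_{g}\,{\bar{s}}^{(g)}(t,k),\]
where \({\bar{u}}^{(g)}\) and \({\bar{s}}^{(g)}\) denote the unspliced and spliced abundances given by the dynamical model.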
Thus, we can compute samples of a posterior predictive velocity distribution via the following process:
1. Sample z_{n} from q_{ϕ}(z_{n}∣u_{n}, s_{n}).
2. Compute \({{\mathbb{E}}}_{{q}_{\phi }({\pi }_{ng}\mid {z}_{n})}\left[{v}^{(g)}\left({t}_{ng}^{({k}_{ng})},{k}_{ng}\right)\right]\) for each gene.
This provides samples from a distribution over the velocity for every gene–cell pair, which we then use in downstream tasks.
Intrinsic uncertainty
Let \({\bar{v}}_{n}\) be the posterior predictive velocity mean from the procedure above. The intrinsic uncertainty is then computed as \({\mathbb{V}}{{{{\rm{ar}}}}}_{{q}_{\phi }({v}_{n}\mid {u}_{n},{s}_{n})}[c({v}_{n},{\bar{v}}_{n})]\) where c denotes the cosine similarity. In effect, denote by \(\{{v}_{n}^{(l)}\}_{l = 1}^{L}\) the set of L velocity vector samples of cell n from the variational posterior. Then we have:
In this manuscript, we use L = 100 samples.
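The intrinsic uncertainty can be sketched as the empirical variance of cosine similarities between each velocity sample and the sample mean. The function names below are hypothetical and the array of samples is assumed to have been drawn already; NumPy's default population variance is used, which may differ from the estimator in the original equation by a constant factor.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity along the last axis (broadcasts over leading axes)."""
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))

def intrinsic_uncertainty(velocity_samples):
    """Variance over samples of cos(v_n^(l), mean velocity of cell n).

    velocity_samples: (L, n_cells, n_genes). Returns an array of shape (n_cells,).
    """
    mean_velocity = velocity_samples.mean(axis=0)
    sims = cosine_similarity(velocity_samples, mean_velocity[None])  # (L, n_cells)
    return sims.var(axis=0)
```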
Extrinsic uncertainty
Let T(v_{1:N}, s_{1:N}) be a function that maps the velocity vectors and spliced abundances of the entire dataset (with N cells) to a cell–cell transition matrix computed as described previously^{11}. Namely, this function compares the similarity of the displacement δ_{ij} of nearest neighbors s_{i} and s_{j} (defined using s_{1:N}) to the velocity of cell i, v_{i}, via the cosine similarity
as the basis for computing transition probabilities between pairs of cells.
Following the construction of T(v_{1:N}, s_{1:N}) for one sample of velocity, the predicted future cell state is computed by the matrix multiplication T(v_{1:N}, s_{1:N})S, where S is the cells by genes matrix of spliced RNA abundances. These predicted future cell state vectors (over samples of velocity) then undergo the same variance computation procedure as described for the intrinsic uncertainty (namely, variance of the cosine similarity).
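The extrinsic uncertainty computation can be outlined as below. The transition matrix here is a simplified stand-in (an exponential kernel on the cosine similarity with a hypothetical `scale` parameter, normalized per row) rather than scVelo's implementation, and all function names are assumptions.

```python
import numpy as np

def transition_matrix(velocity, spliced, neighbors, scale=10.0):
    """Row-stochastic T with T[i, j] proportional to exp(scale * cos(s_j - s_i, v_i)).

    velocity, spliced: (n_cells, n_genes); neighbors: (n_cells, k) integer
    indices of each cell's nearest neighbors, excluding the cell itself.
    """
    n_cells = spliced.shape[0]
    T = np.zeros((n_cells, n_cells))
    for i in range(n_cells):
        nbrs = neighbors[i]
        delta = spliced[nbrs] - spliced[i]
        norms = np.linalg.norm(delta, axis=1) * np.linalg.norm(velocity[i])
        cos = delta @ velocity[i] / (norms + 1e-12)
        weights = np.exp(scale * cos)
        T[i, nbrs] = weights / weights.sum()
    return T

def extrinsic_uncertainty(velocity_samples, spliced, neighbors):
    """Variance over velocity samples of cos(predicted future state, its mean)."""
    future = np.stack([transition_matrix(v, spliced, neighbors) @ spliced
                       for v in velocity_samples])  # (L, n_cells, n_genes)
    mean_future = future.mean(axis=0)
    cos = (future * mean_future).sum(-1) / (
        np.linalg.norm(future, axis=-1) * np.linalg.norm(mean_future, axis=-1))
    return cos.var(axis=0)
```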
Timedependent transcription rate
To highlight veloVI’s extensibility with respect to model choice, we consider the timedependent transcription rate
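with parameters \({\alpha }_{0},{\alpha }_{1},{\lambda }_{\alpha }\in {{\mathbb{R}}}^{+}\) and k indicating the transcriptional state. The system of differential equations describing the process of splicing stays otherwise unchanged and is, thus, given by
\[\frac{{{{\rm{d}}}}u}{{{{\rm{d}}}}t}={\alpha }^{(k)}(t)-\beta u(t),\qquad \frac{{{{\rm{d}}}}s}{{{{\rm{d}}}}t}=\beta u(t)-\gamma s(t).\tag{33}\]
Consequently, it is of the general form
\[\frac{{{{\rm{d}}}}x}{{{{\rm{d}}}}t}=Ax(t)+g(t)\tag{34}\]
with dependent variable x, system matrix A, inhomogeneity g(t) and solution
\[x(t)={e}^{A(t-{t}_{0})}x({t}_{0})+\int\nolimits_{{t}_{0}}^{t}{e}^{A(t-\xi )}g(\xi )\,{{{\rm{d}}}}\xi .\tag{35}\]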
As the abundance of unspliced mRNA is modeled independently of its spliced counterpart, its solution of equation (33) can be found directly. Comparing equation (33) with equations (34) and (35), we find that x = u, A = − β, g(t) = α^{(k)}(t). Consequently, the abundance of unspliced mRNA at time t is given by
with state-dependent initial time \({t}_{0}^{(k)}\), \({\tau }^{(k)}=t-{t}_{0}^{(k)}\) and \({u}_{0}^{(k)}=u({t}_{0}^{(k)})\).
Similarly, this allows solving for s(t), with x = s, A = − γ, g(t) = βu(t). Applying solution formula (35), the abundance of spliced mRNA at time t is given by
These new functions can be used as the mean in the veloVI likelihood, thus allowing optimization in a similar manner as described previously, with the addition of the new parameters \({\alpha }_{0},{\alpha }_{1},{\lambda }_{\alpha }\in {{\mathbb{R}}}^{+}\).
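As a numerical sanity check of the variation-of-constants approach, the sketch below compares a closed-form unspliced solution with a Runge-Kutta integration of du/dt = α(t) − βu. The functional form of α(t) used here (a monotone exponential approach from α_0 to α_1) is an assumption made for illustration, as are all function names; t_0 = 0 and β ≠ λ_α are assumed.

```python
import numpy as np

def alpha_t(t, alpha_0, alpha_1, lam):
    """Assumed monotone rate: alpha_0 at t = 0, approaching alpha_1 as t grows."""
    return alpha_1 - (alpha_1 - alpha_0) * np.exp(-lam * t)

def unspliced_closed_form(t, u0, alpha_0, alpha_1, lam, beta):
    """Solution of du/dt = alpha(t) - beta * u with u(0) = u0 (beta != lam)."""
    return (u0 * np.exp(-beta * t)
            + alpha_1 / beta * (1.0 - np.exp(-beta * t))
            - (alpha_1 - alpha_0) / (beta - lam) * (np.exp(-lam * t) - np.exp(-beta * t)))

def unspliced_rk4(t_end, u0, alpha_0, alpha_1, lam, beta, n_steps=2000):
    """Classical fourth-order Runge-Kutta integration of the same ODE."""
    def rhs(t, u):
        return alpha_t(t, alpha_0, alpha_1, lam) - beta * u
    h, t, u = t_end / n_steps, 0.0, u0
    for _ in range(n_steps):
        k1 = rhs(t, u)
        k2 = rhs(t + h / 2, u + h * k1 / 2)
        k3 = rhs(t + h / 2, u + h * k2 / 2)
        k4 = rhs(t + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return u
```

When λ_α is close to zero, α(t) stays near α_0 and the closed form reduces to the constant-rate solution.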
Data preprocessing
All datasets were preprocessed following the same steps. Genes with fewer than 20 unspliced or spliced counts were removed. Transcriptomic counts of each cell were normalized by their median, pre-filtered library size, and the 2,000 most highly variable genes were selected based on dispersion. The aforementioned steps were performed using scVelo’s^{11} filter_and_normalize function.
Following gene filtering and count normalization, the first 30 principal components were calculated and a nearest neighbor graph with k = 30 neighbors was constructed. In a final step, counts were smoothed by the mean expression across their neighbors to compute final RNA abundances. These steps were performed by scVelo’s moments function.
To estimate RNA velocity, the preprocessed unspliced and spliced abundances were (gene-wise) min-max scaled to the unit interval. Subsequently, the steady-state model was applied to the entire dataset. Genes for which the estimated steady-state ratio and R^{2} statistic are positive were considered for further analysis. If not stated otherwise, this subset of genes was used for parameter inference of veloVI and the EM model.
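The scaling and gene-selection step can be illustrated with the sketch below. It is a simplification under stated assumptions: the function names are hypothetical, and the steady-state fit is replaced by an ordinary least-squares regression through the origin over all cells, which is not necessarily how scVelo fits its steady-state model.

```python
import numpy as np

def minmax_scale_per_gene(abundance, eps=1e-12):
    """Scale each gene (column) of a cells-by-genes matrix to [0, 1]."""
    lo = abundance.min(axis=0, keepdims=True)
    hi = abundance.max(axis=0, keepdims=True)
    return (abundance - lo) / np.maximum(hi - lo, eps)

def steady_state_fit(u, s):
    """Per-gene regression through the origin, u ~ ratio * s, with an R^2 score."""
    ratio = (u * s).sum(axis=0) / (s * s).sum(axis=0)
    residual = u - ratio * s
    total = ((u - u.mean(axis=0)) ** 2).sum(axis=0)
    r2 = 1.0 - (residual ** 2).sum(axis=0) / total
    return ratio, r2

def select_genes(u, s):
    """Boolean mask of genes with positive steady-state ratio and positive R^2."""
    ratio, r2 = steady_state_fit(u, s)
    return (ratio > 0) & (r2 > 0)
```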
All datasets used, with the exception of the PBMC dataset, were obtained with spliced and unspliced RNA quantification and details can be obtained from the original publication (Supplementary Table 1). In the case of the PBMC dataset, we quantified RNA abundances using the kallisto bustools RNA velocity workflow^{28}, using an index and defaults as described in the tutorial on the software’s website and automatically annotated via totalVI^{48} using the Seurat v.3 CITEseq PBMC dataset^{49,50} as a reference.
Benchmarking against EM and steadystate models
veloVI was benchmarked against the EM and steady-state models by first comparing the accuracy of inferred parameters on simulated data. For each number of observations (1,000, 2,000, 3,000, 4,000 and 5,000), we simulated ten datasets of unspliced and spliced counts with 1,000 kinetic parameter tuples (transcription rate α_{g}, splicing rate β_{g}, degradation rate γ_{g}) following a multivariate log-normal distribution. Latent time is Poisson distributed with a maximum of 20 h, with the switch from induction to repression, \({t}_{g}^{s}\), taking place after 2–10 h. The simulations were performed using the simulation function as implemented in scVelo^{11} with noise_level=0.8.
As an additional validation, we inferred kinetic rates for the pancreas data using both veloVI and the EM model. Subsequently, we randomly sampled a total of 2,000 estimated parameter tuples (transcription rate α_{g}, splicing rate β_{g}, degradation rate γ_{g}, switch time \({t}_{g}^{s}\)) from the union of the parameters estimated by either algorithm and simulated splicing kinetics with noise_level=1. As the data are simulated and rate parameters and time are known, the ground truth velocities are defined as well. For each model, the Spearman correlations between ground truth and inferred latent time were compared. We used the Spearman correlation because it is a rank-based statistic. In contrast, in the case of velocity estimates, we relied on Pearson correlation.
To compare runtimes, veloVI and the EM model were run on random subsets of a mouse retina dataset^{12} containing 1,000, 3,000, 5,000, 7,500, 10,000, 15,000 and 20,000 cells. The EM model was run on an Intel Core i9-10900K CPU @ 3.70 GHz using eight cores. veloVI was run on an Nvidia RTX 3090 GPU.
In the case of realworld data, for each gene, we compared the MSE between the observed abundance and the modelpredicted abundance. We did this for each of the veloVI and EM models and separately for spliced and unspliced abundances. The result is the MSE per gene, per method and per species. In the case of the EM model, the abundance prediction is directly a function of the rates, time and transcriptional state and in the case of veloVI, this is the posterior predictive mean. Additionally, for each gene, velocity estimates from the veloVI and EM models were compared through Pearson correlation.
In addition to the MSE, the modelspecific velocity consistency^{11} was also compared. The velocity consistency c quantifies the mean Pearson correlation of the velocity v(x_{j}) of a reference cell x_{j} with the velocities of its neighbors \({{{{\mathcal{N}}}}}_{k}({x}_{j})\) in a KNN graph.
To calculate the consistency, we rely on scVelo’s velocity_confidence function. This evaluation metric makes the assumption that better local consistency is inherently good, reflecting smooth changes in velocity over the phenotypic manifold. We note that this is a heuristic evaluation and the validity of this metric can be affected by, for example, low density of similar cell states, misspecification of the KNN graph due to only considering spliced RNA, etc.
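The consistency score can be sketched as follows. This is an illustrative NumPy version with a hypothetical function name and a precomputed neighbor index array; it is not the code behind scVelo's velocity_confidence.

```python
import numpy as np

def velocity_consistency(velocity, neighbors):
    """Mean Pearson correlation of each cell's velocity with its neighbors' velocities.

    velocity: (n_cells, n_genes); neighbors: (n_cells, k) integer indices.
    Returns an array of shape (n_cells,).
    """
    centered = velocity - velocity.mean(axis=1, keepdims=True)
    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    # Pearson correlation equals the dot product of centered, unit-norm vectors.
    corr = np.einsum("ig,ikg->ik", unit, unit[neighbors])
    return corr.mean(axis=1)
```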
If a ‘ground truth’ cellular ordering, for example, a cellcycle score^{13,29}, is given, we can make use of this source of information to estimate ‘ground truth’ velocities \(\hat{v}\) via finite differences. We estimated this heuristic by first taking the median per gene of the firstorder moment smoothed, spliced RNA abundance of all cells at a given cellcycle position p_{i}, which we denote by \({\bar{s}}^{(i)}\). Then, assuming the p_{i} are ordered (p_{i} < p_{i+1}), \({\hat{v}}^{(i)}\) is defined as
Finally, we compared the sign of all ground truth velocities with their inferred counterparts of veloVI and the EM model (which are aggregated per position in the same way) by computing the sign accuracy per gene. The sign accuracy, which is the fraction of times that the signs agree, accounts for positive velocity, negative velocity and zero velocity. As a baseline, we included a random predictor that chose positive, negative or zero velocity with equal probability. The scEUseq cellcycle data (RPE1FUCCI cells)^{29} included, on average, 9.63 (s.d. 7.01) observations per cell cycle position and the U2OSFUCCI^{13} dataset provided 1.15 (s.d. 0.36) observations per cell cycle position. In the case of the U2OSFUCCI dataset, the ground truth ordering was derived by the original authors using a polar regression on the scatterplot of the two FUCCI protein markers. In the case of the RPE1FUCCI cells, the groundtruth ordering was derived by the original authors using a pseudotime method on the FUCCI protein marker values.
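The sign-accuracy evaluation can be sketched as below. The forward-difference form of the reference velocity is an assumption made for illustration, since the exact expression is not reproduced in this text, and the function names are hypothetical.

```python
import numpy as np

def finite_difference_velocity(positions, median_spliced):
    """Forward differences of per-position median spliced abundance.

    positions: (P,) strictly increasing cell-cycle positions;
    median_spliced: (P, n_genes). Returns an array of shape (P - 1, n_genes).
    """
    return np.diff(median_spliced, axis=0) / np.diff(positions)[:, None]

def sign_accuracy(reference_velocity, inferred_velocity):
    """Per-gene fraction of positions where the signs (+1, -1 or 0) agree."""
    agree = np.sign(reference_velocity) == np.sign(inferred_velocity)
    return agree.mean(axis=0)
```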
As an additional validation, for each gene, we fitted a GAM to the inferred velocities of the two models versus the cell-cycle score. Similarly to ref. ^{13}, we transformed the cell-cycle score in each dataset to I = [0, 2π]. To take the periodic nature of the cell cycle into account, we fitted the GAM per gene using spliced RNA abundance s_{ng} as the response and the score as the variable, where the cell-cycle score was transformed to the range [I − 2π, I, I + 2π]. For each gene, a GAM with a univariate spline term for the triple of (shifted) cell-cycle positions was fitted. For each feature, 20 splines of degree three were used. For each gene, we reported the R^{2} score.
Stability analysis across quantification algorithms
To assess the robustness of estimation using different means of quantifying unspliced and spliced reads, we relied on previously preprocessed and published data^{34}. The collection contains outputs of variants of the alevin^{32}, kallisto/bustools^{28}, velocyto^{8}, dropEST^{31} and starsolo^{51} pipelines. For details of how the data were generated, we refer to the original work^{34}.
To compare estimation across quantification algorithms, we first defined a reference set of genes for which to calculate RNA velocity. The set of reference genes was defined as the set of genes kept by preprocessing the data of one quantification method. In the case of the dentate gyrus data, starsolo was chosen as the quantification method; for all others, velocyto was chosen. Data were preprocessed according to our described preprocessing pipeline. For counts from all other quantification approaches, the same preprocessing steps were followed, except for gene filtering. To prevent the reference genes from being filtered out, they were passed to the filter_and_normalize function via the argument retain_genes.
Velocities were estimated for the steady-state model, EM model and veloVI. The velocities of the first two models were quantified using the function velocity with mode='deterministic' and mode='dynamical', respectively, implemented in scVelo^{11}. For veloVI, model parameters were inferred using default parameters and mean velocities were estimated from 25 samples drawn from the posterior.
To compare estimates across quantification algorithms, for each model, cell and pair of quantification algorithms, the Pearson correlation between the paired velocity estimates was calculated. For each model, the correlation scores were aggregated by taking the mean over cells for one quantification algorithm pair to assess robustness. The distribution of this mean correlation over all quantification algorithm pairs is used for visualization in Fig. 4 and Extended Data Fig. 2.
Analysis with uncertainty quantification and velocity coherence
We used extrapolated future states Ts_{n} of a cell to evaluate if inferred velocities are coherent. The velocity v_{n} of a given cell n is coherent if it points in the same direction as the empirical displacement δ_{n} = Ts_{n} − s_{n}. Directionality is compared by calculating the Hadamard product δ_{n}∘v_{n}. In case both vectors point in the same direction for a given cell, the resulting entry will be positive and negative otherwise. To aggregate the score we report its mean per gene and cell type.
To benchmark the uncertainty quantification, we started with the pancreas dataset and added one of three kinds of perturbations at various strengths. After applying each perturbation, we ran the standard veloVI pipeline and recorded the uncertainty metrics. The first perturbation consisted of downsampling the cells to X% of their original library size (thus removing (100 − X)% of their transcripts; unspliced and spliced counts were downsampled separately). This was achieved with scanpy.pp.downsample_counts. The second perturbation consisted of binomial thinning of the unspliced counts with probability P (unspliced = np.random.binomial(unspliced, 1 − P)). The final perturbation was multiplicative random noise. We multiplied each spliced and unspliced abundance value (this time after library size normalization) by log-normally distributed noise (np.exp(np.random.normal(0, scale))). Across all perturbations we used a common gene set that was derived from the standard veloVI pipeline; this ensures that the uncertainty values are comparable as they incorporate information across all genes.
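The second and third perturbations can be written as small NumPy helpers. The function names are hypothetical, and a seeded `numpy.random.Generator` is used here in place of the legacy `np.random` calls quoted in the text so that the perturbations are reproducible.

```python
import numpy as np

def binomial_thinning(unspliced_counts, p, rng):
    """Keep each unspliced molecule independently with probability 1 - p."""
    return rng.binomial(unspliced_counts, 1.0 - p)

def multiplicative_noise(abundance, scale, rng):
    """Multiply each value by exp(eps) with eps ~ Normal(0, scale^2)."""
    return abundance * np.exp(rng.normal(0.0, scale, size=abundance.shape))

# Example usage on toy data (integer counts for thinning, floats for noise).
rng = np.random.default_rng(0)
toy_counts = rng.poisson(5, size=(50, 4))
toy_thinned = binomial_thinning(toy_counts, 0.3, rng)
toy_noisy = multiplicative_noise(toy_counts.astype(float), 0.5, rng)
```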
Permutation scoring
To quantify how robust the inferred dynamics are with respect to random permutations in the input data, we define a gene- and cell-type-specific permutation effect, which is then aggregated to a gene-specific permutation score (Extended Data Fig. 6). For this analysis, we considered all highly variable genes and did not filter out genes based on estimates of the steady-state model.
To calculate the score, the unspliced and spliced abundances belonging to one (cell type, gene) pair are independently permuted (cell barcodes are shuffled independently per unspliced/spliced). Repeating over all pairs, this results in a permuted data matrix. We then estimate the model fit of the unspliced and spliced abundance for the permuted data matrix (the posterior predictive mean, Supplementary Methods). Note that because veloVI can handle held-out data, computing the model fit of permuted data does not require any additional training. Finally, for each (cell type, gene) pair we compute μ_{p} and μ_{0}, which denote the mean absolute error between the model fit abundances and the observed abundances (spliced and unspliced errors added together) for the permuted and original data matrices, respectively.
To quantify the extent to which the mean absolute errors of the two samples are not equal, we define the permutation effect as the t-test statistic
with number of cells n and pooled variance S^{2} of the absolute errors. To limit the effect of dataset size, we consider the maximum sample size of n = 200 observations. The permutation score is aggregated on a gene level by considering the maximum test statistic across cell types. This aggregation allows comparing the permutation score across different datasets.
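The permutation effect for one (cell type, gene) pair can be sketched as follows. The statistic below is the standard two-sample t-statistic with pooled variance for equally sized samples, which is assumed here to match the definition above; the function names are hypothetical, and the cap of n = 200 observations is left to the caller.

```python
import numpy as np

def permute_within_group(u, s, rng):
    """Shuffle cells independently for the unspliced and spliced values of one pair."""
    return rng.permutation(u), rng.permutation(s)

def permutation_effect(abs_err_permuted, abs_err_original):
    """t-statistic comparing mean absolute errors of permuted and original data.

    Both inputs are 1D arrays of per-cell absolute errors with the same length n.
    """
    n = len(abs_err_original)
    mu_p, mu_0 = abs_err_permuted.mean(), abs_err_original.mean()
    # Pooled variance for two samples of equal size.
    pooled_var = 0.5 * (abs_err_permuted.var(ddof=1) + abs_err_original.var(ddof=1))
    return (mu_p - mu_0) / np.sqrt(pooled_var * 2.0 / n)
```

A gene-level score would then take the maximum of this statistic across cell types, as described above.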
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The processed pancreas data, including spliced and unspliced count abundances, can be downloaded from scVelo’s GitHub (https://github.com/theislab/scvelo_notebooks/raw/master/data/Pancreas/endocrinogenesis_day15.h5ad). The forebrain and dentate gyrus datasets can be downloaded from the Kharchenko laboratory at Harvard (forebrain, http://pklab.med.harvard.edu/velocyto/hgForebrainGlut/hgForebrainGlut.loom and dentate gyrus, http://pklab.med.harvard.edu/velocyto/DG1/10X43_1.loom). The Friedrich Miescher Institute for Biomedical Research (https://www.fmi.ch/groups/gbioinfo/RNAVeloQuant/RNAVeloQuant.html) provides the processed data of the dentate gyrus, mouse brain, pancreas, prefrontal cortex and spermatogenesis. The mouse retina and PBMC data are available for download via figshare (https://figshare.com/projects/veloVI_datasets/145476).
Code availability
veloVI is implemented in a standalone package at https://github.com/YosefLab/velovi, which has also been deposited via Zenodo (https://doi.org/10.5281/zenodo.7897641) (ref. ^{52}). Code to reproduce the results in the manuscript can be found at https://github.com/YosefLab/velovi_reproducibility, as well as deposited via Zenodo (https://doi.org/10.5281/zenodo.7931042) (ref. ^{53}).
References
Wagner, A., Regev, A. & Yosef, N. Revealing the vectors of cellular identity with single-cell genomics. Nat. Biotechnol. https://doi.org/10.1038/nbt.3711 (2016).
Buenrostro, J. D. et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. Cell 173, 1535–1548 (2018).
Tanay, A. & Regev, A. Scaling single-cell genomics from phenomenology to mechanism. Nature https://doi.org/10.1038/nature21350 (2017).
Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).
Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019).
Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genom. 19, 1–16 (2018).
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
Lange, M. et al. CellRank for directed single-cell fate mapping. Nat. Methods 19, 159–170 (2022).
Zeisel, A. et al. Coupled pre-mRNA and mRNA dynamics unveil operational strategies underlying transcriptional responses to stimuli. Mol. Syst. Biol. 7, 529 (2011).
Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).
Clark, B. S. et al. Single-cell RNA-seq analysis of retinal development identifies NFI factors as regulating mitotic exit and late-born cell specification. Neuron 102, 1111–1126 (2019).
Mahdessian, D. et al. Spatiotemporal dissection of the cell cycle with single-cell proteogenomics. Nature 590, 649–654 (2021).
Bergen, V., Soldatov, R. A., Kharchenko, P. V. & Theis, F. J. RNA velocity—current challenges and future perspectives. Mol. Syst. Biol. 17, e10282 (2021).
Gorin, G., Fang, M., Chari, T. & Pachter, L. RNA velocity unraveled. PLoS Comput. Biol. 18, e1010492 (2022).
Marot-Lassauzaie, V. et al. Towards reliable quantification of cell state velocities. PLoS Comput. Biol. 18, e1010031 (2022).
Zheng, S. C., Stein-O’Brien, G., Boukas, L., Goff, L. A. & Hansen, K. D. Pumping the brakes on RNA velocity–understanding and interpreting RNA velocity estimates. Preprint at bioRxiv https://doi.org/10.1101/2022.06.19.494717 (2022).
Barile, M. et al. Coordinated changes in gene expression kinetics underlie both mouse and human erythroid maturation. Genome Biol. 22, 1–22 (2021).
Salmen, F. et al. Highthroughput total RNA sequencing in single cells using vasaseq. Nat. Biotechnol. https://doi.org/10.1038/s41587022013618 (2022).
Lopez, R., Gayoso, A. & Yosef, N. Enhancing scientific discoveries in molecular biology with deep generative models. Mol. Syst. Biol. 16, e9198 (2020).
Gayoso, A. et al. A python library for probabilistic analysis of singlecell omics data. Nat. Biotechnol. 40, 163–166 (2022).
Gong, B., Zhou, Y. & Purdom, E. Cobolt: integrative analysis of multimodal singlecell sequencing data. Genome Biol. 22, 1–21 (2021).
Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts singlecell perturbation responses. Nat. Methods https://doi.org/10.1038/s4159201904948 (2019).
Lotfollahi, M., Naghipourfar, M., Theis, F. J. & Wolf, F. A. Conditional outofdistribution generation for unpaired data using transfer vae. Bioinformatics 36, i610–i617 (2020).
Fleming, S. J. et al. Unsupervised removal of systematic background noise from dropletbased singlecell experiments using CellBender. Nat. Methods 20, 1323–1335 (2023).
Kingma, D. P. & Welling, M. Autoencoding variational Bayes. Preprint at arXiv https://doi.org/10.48550/arXiv.1312.6114 (2022).
Atta, L., Sahoo, A. & Fan, J. Veloviz: RNA velocityinformed embeddings for visualizing cellular trajectories. Bioinformatics 38, 391–396 (2022).
Melsted, P. et al. Modular, efficient and constantmemory singlecell RNAseq preprocessing. Nat. Biotechnol. 39, 813–818 (2021).
Battich, N. et al. Sequencing metabolically labeled transcripts in single cells reveals mRNA turnover strategies. Science 367, 1151–1156 (2020).
He, D. et al. Alevinfry unlocks rapid, accurate and memoryfrugal quantification of singlecell RNAseq data. Nat. Methods 19, 316–322 (2022).
Petukhov, V. et al. dropest: pipeline for accurate estimation of molecular counts in dropletbased singlecell RNAseq experiments. Genome Biol. 19, 1–16 (2018).
Srivastava, A., Malik, L., Smith, T., Sudbery, I. & Patro, R. Alevin efficiently estimates accurate gene abundances from dscRNAseq data. Genome Biol. 20, 1–16 (2019).
Kaminow, B., Yunusov, D. & Dobin, A. STARsolo: accurate, fast and versatile mapping/quantification of singlecell and singlenucleus RNAseq data. Preprint at bioRxiv https://doi.org/10.1101/2021.05.05.442755 (2021).
Soneson, C., Srivastava, A., Patro, R. & Stadler, M. B. Preprocessing choices affect RNA velocity results for droplet scRNAseq data. PLoS Comput. Biol. 17, e1008585 (2021).
Bastidas-Ponce, A. et al. Massive single-cell mRNA profiling reveals a detailed roadmap for pancreatic endocrinogenesis. Development https://doi.org/10.1242/dev.173849 (2019).
Hermann, B. P. et al. The mammalian spermatogenesis singlecell transcriptome, from spermatogonial stem cells to spermatids. Cell Rep. 25, 1650–1667 (2018).
Hochgerner, H., Zeisel, A., Lönnerberg, P. & Linnarsson, S. Conserved properties of dentate gyrus neurogenesis across postnatal development revealed by single-cell RNA sequencing. Nat. Neurosci. 21, 290–299 (2018).
Bhattacherjee, A. et al. Cell typespecific transcriptional programs in mouse prefrontal cortex during adolescence and addiction. Nat. Commun. https://doi.org/10.1038/s41467019120543 (2019).
Ximerakis, M. et al. Singlecell transcriptomic profiling of the aging mouse brain. Nat. Neurosci. 22, 1696–1708 (2019).
McInnes, L., Healy, J., Saul, N. & Grossberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 3, 861 (2018).
Mimitou, E. P. et al. Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat. Biotechnol. 39, 1246–1258 (2021).
Swanson, E. et al. Simultaneous trimodal singlecell measurement of transcripts, epitopes, and chromatin accessibility using teaseq. eLife 10, e63632 (2021).
Giudice, Q. L., Leleu, M., Manno, G. L. & Fabre, P. J. Singlecell transcriptional logic of cellfate specification and axon guidance in early born retinal neurons. Development https://doi.org/10.1242/dev.178103 (2019).
Li, T., Shi, J., Wu, Y. & Zhou, P. On the mathematics of RNA velocity i: theoretical analysis. CSIAM Trans. Appl. Math. 2, 1–55 (2021).
Blei, D. M., Kucukelbir, A. & McAuliffe, J. D. Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112, 859–877 (2017).
Kingma, D. P. & Ba, J. L. Adam: A method for stochastic optimization. Preprint at arXiv https://doi.org/10.48550/arXiv.1412.6980 (2017).
Paszke, A. et al. Automatic differentiation in PyTorch. in NIPS Workshop Autodiff (2017).
Gayoso, A. et al. Joint probabilistic modeling of singlecell multiomic data with totalvi. Nat. Methods https://doi.org/10.1038/s4159202001050x (2021).
10x Genomics. 10k PBMCs from a healthy donor, single cell gene expression dataset by CellRanger 6.1.0 (2021).
Stuart, T. et al. Comprehensive integration of singlecell data. Cell 177, 1888–1902 (2019).
Dobin, A. et al. STAR: Ultrafast universal RNAseq aligner. Bioinformatics https://doi.org/10.1093/bioinformatics/bts635 (2013).
Gayoso, A. & Weiler, P. Yoseflab/velovi: velovi 0.2.1 https://doi.org/10.5281/zenodo.7897641 (2023).
Gayoso, A., Weiler, P. & Hong, J. YosefLab/velovi_reproducibility: velovi reproducibility 1.0 https://doi.org/10.5281/zenodo.7931042 (2023).
Melsted, P., Ntranos, V. & Pachter, L. The barcode, UMI, set format and BUStools. Bioinformatics 35, 4472–4473 (2019).
Acknowledgements
We thank R. Lopez and M. Jones for feedback on the concepts and benchmarking of veloVI. We acknowledge members of the Streets, Theis and Yosef laboratories for general feedback. A.G. and N.Y. were supported by the Chan Zuckerberg Initiative Essential Open Source Software Cycle 4 grant (EOSS40000000121) for scvi-tools. M.L. acknowledges financial support from the Joachim Herz Stiftung via Add-on Fellowships for Interdisciplinary Life Science. A.S. is a Chan Zuckerberg Biohub investigator and is supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number R35GM124916. F.J.T. acknowledges support by the BMBF (grant nos. 01IS18036B and 01IS18053A) and by the Helmholtz Association's Initiative and Networking Fund through Helmholtz AI (grant no. ZTIPF501).
Author information
Contributions
A.G. and P.W. contributed equally. A.G., P.W. and M.L. conceptualized the study. A.G. conceptualized the statistical model with contributions from M.L. and P.W. A.G. designed and implemented veloVI with contributions from P.W., J.H. and M.L. P.W. designed and implemented modeling extensions. D.K. designed and implemented model uncertainty analyses with contributions from A.G., P.W. and M.L. A.G., P.W. and J.H. designed and implemented analysis methods with contributions from M.L. A.S., F.J.T. and N.Y. supervised the work. A.G., P.W., M.L., F.J.T. and N.Y. wrote the manuscript.
Ethics declarations
Competing interests
M.L. consults for Santa Ana Bio, is a part-time employee at Relation Therapeutics and owns interests in Relation Therapeutics. F.J.T. consults for Immunai, Singularity Bio, CytoReason and Omniscope and has ownership interest in Dermagnostix and Cellarity. N.Y. is an advisor and/or has equity in Cellarity, Celsius Therapeutics and Rheos Medicine. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Jianhua Xing and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling editor: Lin Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Low-rank structure of latent time.
PCA variance ratio of gene-cell-specific latent time as inferred by the EM model.
Extended Data Fig. 2 Preprocessing stability of inference methods.
Correlation of velocities derived from pairs of quantification algorithms, with velocities estimated using one of veloVI (VI), the EM model (EM), and the steady-state model (SS), on datasets of prefrontal cortex (PFC) (left, N=78 pairs of quantification methods), 21–22-month-old mouse brains (middle, N=78 pairs of quantification methods), and hippocampus (right, N=55 pairs of quantification methods). Unspliced and spliced counts are quantified with different algorithms^{46,47,48,49,50,51,54}. Velocities are estimated by veloVI (VI, blue), the EM model (EM, orange), and the steady-state model (SS, green). Box plots indicate the median (center line), interquartile range (hinges), and whiskers at 1.5× the interquartile range.
Extended Data Fig. 3 Phase portraits in pancreas endocrinogenesis.
Phase portraits of Rbfox3, Sulf2, Igfbpl1, and Cbfa2t3. Each cell is colored by its cell type.
Extended Data Fig. 4 Effect of data perturbation on uncertainty.
a. The effect of downsampling (0%, 25%, 50%, 75%) counts on phase portraits of Sulf2 (left) colored by cell type, intrinsic uncertainty per cell (middle, N=3696 cells), and extrinsic uncertainty per cell (right, N=3696 cells). b. The effect of unobserved unspliced reads (dropout probability 0.0, 0.5, 0.9, 0.98) in 400 and 800 genes on phase portraits of Fam135a (left), intrinsic uncertainty per cell (middle, N=3696 cells), and extrinsic uncertainty per cell (right, N=3696 cells). c. The effect of multiplicative noise (scale 0.1, 0.5, 1.0, 1.5) on phase portraits of Sulf2 (left), intrinsic uncertainty per cell (middle, N=3696 cells), and extrinsic uncertainty per cell (right, N=3696 cells). Box plots indicate the median (center line), and interquartile range (hinges).
Extended Data Fig. 5 Gene analysis based on extrinsic uncertainty.
a. UMAP embedding of the Pancreas dataset colored by extrinsic uncertainty (left); The velocity coherence score across all genes for Alpha and Ngn3high cells (right). b, c. Genes with the lowest/highest velocity coherence in Alpha cells, respectively. c, d. Genes with the lowest/highest velocity coherence in Ngn3high cells, respectively. e, Genes fit with incorrect dynamics in Ngn3high cells.
Extended Data Fig. 6 Overview of permutation score construction.
a. First, the cells of one cell type are selected. These are shuffled independently for each gene (and independently in each of the unspliced and spliced matrices). This is repeated for each cell type and the data are concatenated. This new permuted dataset is fed into a pretrained veloVI model (trained on the same original dataset). The fit of unspliced and spliced abundance is obtained for each new perturbed cell. Following this, for each gene, the mean absolute error (spliced and unspliced) is computed per cell type. The original and perturbed mean absolute errors are compared with the t-test statistic. This provides a permutation effect statistic for each gene and each cell type. To obtain the permutation score, a scalar score for each gene, we take the maximum permutation effect statistic across cell types.
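The procedure in this caption can be illustrated with a minimal sketch. It is not the veloVI implementation: `fit_fn` is a hypothetical stand-in for the pretrained model's reconstruction of unspliced and spliced abundances, and two details are assumptions made here for illustration, namely that the spliced and unspliced absolute errors are summed per cell and gene, and that the comparison uses Welch's unequal-variance t-test.

```python
import numpy as np
from scipy import stats


def permute_within_cell_types(x, cell_types, rng):
    """Shuffle cells independently for each gene, separately within each cell type."""
    x_perm = x.copy()
    for ct in np.unique(cell_types):
        idx = np.flatnonzero(cell_types == ct)
        for g in range(x.shape[1]):
            # A fresh permutation per gene breaks gene-gene coherence within the cell type.
            x_perm[idx, g] = x[rng.permutation(idx), g]
    return x_perm


def permutation_scores(unspliced, spliced, cell_types, fit_fn, seed=0):
    """Return (score per gene, effect statistic per cell type and gene).

    fit_fn(u, s) -> (u_fit, s_fit) is a hypothetical callable standing in for
    a model that was trained on the original, unpermuted data.
    """
    rng = np.random.default_rng(seed)
    # The unspliced and spliced matrices are permuted independently of each other.
    u_perm = permute_within_cell_types(unspliced, cell_types, rng)
    s_perm = permute_within_cell_types(spliced, cell_types, rng)

    def abs_error(u, s):
        u_fit, s_fit = fit_fn(u, s)
        # Assumption: unspliced and spliced absolute errors are summed.
        return np.abs(u - u_fit) + np.abs(s - s_fit)

    err_orig = abs_error(unspliced, spliced)
    err_perm = abs_error(u_perm, s_perm)

    labels = np.unique(cell_types)
    effect = np.empty((len(labels), unspliced.shape[1]))
    for i, ct in enumerate(labels):
        mask = cell_types == ct
        # Assumption: Welch's t-test on per-cell errors, permuted versus original.
        effect[i] = stats.ttest_ind(
            err_perm[mask], err_orig[mask], axis=0, equal_var=False
        ).statistic
    # The permutation score is the maximum effect statistic across cell types.
    return effect.max(axis=0), effect
```

Under these assumptions, a gene whose fit depends on coherent dynamics within a cell type receives a large positive statistic for that cell type, because permutation degrades its fit; a gene with no such structure receives a statistic near zero.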
Extended Data Fig. 7 Permutation score analysis of old mouse brain.
a. Density of permutation score per cell type: arachnoid barrier cells (ABC), astrocyte-restricted precursors (ARP), astrocytes (ASC), choroid plexus epithelial cells (CPC), dendritic cells (DC), endothelial cells (EC), ependymocytes (EPC), hemoglobin-expressing vascular cells (HbVC), macrophages (MAC), microglia (MG), monocytes (MNC), neural stem cells (NSC), neuroendocrine cells (NendC), olfactory ensheathing glia (OEG), oligodendrocytes (OLG), oligodendrocyte precursor cells (OPC), pericytes (PC), vascular and leptomeningeal cells (VLMC), vascular smooth muscle cells (VSMC), mature neurons (mNEUR) (N=2000 genes each). b. Percentage of cell types assigned the highest permutation score for a given gene. c. Genes assigned the highest permutation score. d, UMAP embedding of dataset colored by whether cells are mature neurons (mNEUR). e, Permutation score densities (left), and their kurtosis and skew when using the full dataset (brown) compared to excluding mature neurons.
Supplementary information
Supplementary Information
Supplementary Table 1, Supplementary Figs. 1–12 and Supplementary Notes 1–4.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gayoso, A., Weiler, P., Lotfollahi, M. et al. Deep generative modeling of transcriptional dynamics for RNA velocity analysis in single cells. Nat Methods (2023). https://doi.org/10.1038/s41592-023-01994-w
DOI: https://doi.org/10.1038/s41592-023-01994-w