Letter

# RNA velocity of single cells

## Abstract

RNA abundance is a powerful indicator of the state of individual cells. Single-cell RNA sequencing can reveal RNA abundance with high quantitative accuracy, sensitivity and throughput (ref. 1). However, this approach captures only a static snapshot at a point in time, posing a challenge for the analysis of time-resolved phenomena such as embryogenesis or tissue regeneration. Here we show that RNA velocity, the time derivative of the gene expression state, can be directly estimated by distinguishing between unspliced and spliced mRNAs in common single-cell RNA sequencing protocols. RNA velocity is a high-dimensional vector that predicts the future state of individual cells on a timescale of hours. We validate its accuracy in the neural crest lineage, demonstrate its use on multiple published datasets and technical platforms, reveal the branching lineage tree of the developing mouse hippocampus, and examine the kinetics of transcription in human embryonic brain. We expect RNA velocity to greatly aid the analysis of developmental lineages and cellular dynamics, particularly in humans.
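
For orientation, RNA velocity rests on a first-order kinetic model, and Supplementary Note 1 gives the full formulation and its assumptions. It can be summarized as below. Here u and s are the unspliced and spliced abundances of a gene, α is its transcription rate, β its splicing rate and γ its degradation rate. This is our condensed summary, not the note's exact notation. The second line assumes time is scaled so that β = 1, in which case γ stands for the ratio γ/β.

$$
\frac{\mathrm{d}u}{\mathrm{d}t} = \alpha(t) - \beta u, \qquad \frac{\mathrm{d}s}{\mathrm{d}t} = \beta u - \gamma s
$$

$$
v = \frac{\mathrm{d}s}{\mathrm{d}t} = u - \gamma s, \qquad u_{\mathrm{ss}} = \gamma s, \qquad s(t + \Delta t) \approx s(t) + v\,\Delta t
$$

A positive v (more unspliced RNA than the steady-state relation predicts) indicates a gene being induced, and a negative v indicates one being repressed. The last expression is the short-horizon extrapolation used to predict a cell's future state.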

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## References

1. Linnarsson, S. & Teichmann, S. A. Single-cell genomics: coming of age. Genome Biol. 17, 97 (2016).

2. Zeisel, A. et al. Coupled pre-mRNA and mRNA dynamics unveil operational strategies underlying transcriptional responses to stimuli. Mol. Syst. Biol. 7, 529 (2011).

3. Gray, J. M. et al. SnapShot-Seq: a method for extracting genome-wide, in vivo mRNA dynamics from a single total RNA sample. PLoS ONE 9, e89673 (2014).

4. Gaidatzis, D., Burger, L., Florescu, M. & Stadler, M. B. Analysis of intronic and exonic reads in RNA-seq data characterizes transcriptional and post-transcriptional regulation. Nat. Biotechnol. 33, 722–729 (2015).

5. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).

6. Islam, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods 11, 163–166 (2014).

7. Klein, A. M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).

8. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).

9. Schwalb, B. et al. TT-seq maps the human transient transcriptome. Science 352, 1225–1228 (2016).

10. Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160–1167 (2011).

11. The Tabula Muris Consortium, Quake, S. R., Wyss-Coray, T. & Darmanis, S. Single-cell transcriptomic characterization of 20 organs and tissues from individual mice creates a Tabula Muris. Preprint at https://biorxiv.org/content/early/2018/03/29/237446 (2018).

12. Vollmers, C. et al. Circadian oscillations of protein-coding and regulatory RNAs in a highly dynamic mammalian liver epigenome. Cell Metab. 16, 833–845 (2012).

13. Furlan, A. et al. Multipotent peripheral glial cells generate neuroendocrine cells of the adrenal medulla. Science 357, eaal3753 (2017).

14. Kriegstein, A. & Alvarez-Buylla, A. The glial nature of embryonic and adult neural stem cells. Annu. Rev. Neurosci. 32, 149–184 (2009).

15. Malatesta, P. et al. Neuronal or glial progeny: regional differences in radial glia fate. Neuron 37, 751–764 (2003).

16. Johnston, R. J. & Desplan, C. Stochastic mechanisms of cell fate specification that yield random or robust outcomes. Annu. Rev. Cell Dev. Biol. 26, 689–719 (2010).

17. Iwano, T., Masuda, A., Kiyonari, H., Enomoto, H. & Matsuzaki, F. Prox1 postmitotically defines dentate gyrus cells by specifying granule cell identity over CA3 pyramidal cell fate in the hippocampus. Development 139, 3051–3062 (2012).

18. Plass, M. et al. Cell type atlas and lineage tree of a whole complex animal by single-cell transcriptomics. Science 360, eaaq1723 (2018).

19. Petukhov, V. et al. dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments. Genome Biol. 19, 78 (2018).

20. Zeisel, A. et al. Molecular architecture of the mouse nervous system. Preprint at https://biorxiv.org/content/early/2018/04/06/294918 (2018).

21. Hrvatin, S. et al. Single-cell analysis of experience-dependent transcriptomic states in the mouse visual cortex. Nat. Neurosci. 21, 120–129 (2018).

22. Haber, A. L. et al. A single-cell survey of the small intestinal epithelium. Nature 551, 333–339 (2017).

23. Hochgerner, H., Zeisel, A., Lönnerberg, P. & Linnarsson, S. Conserved properties of dentate gyrus neurogenesis across postnatal development revealed by single-cell RNA sequencing. Nat. Neurosci. 21, 290–299 (2018).

24. Lein, E. S. et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature 445, 168–176 (2007).

## Acknowledgements

The work reported here was supported by the Swedish Foundation for Strategic Research (RIF14-0057 and SB16-0065), the Knut and Alice Wallenberg Foundation (2015.0041), the Erling Persson Foundation (HumDevCellAtlas) and the Wellcome Trust (108726/Z/15/Z) to S.L.; Center for Innovative Medicine (CIMED) to K.L. and P.C.; Swedish Research Council, Marie Curie Integration Grant EPIOPC, 333713, European Research Council EPIScOPE, 681893, Swedish Brain Foundation, Ming Wai Lau Centre for Reparative Medicine, Cancerfonden and Karolinska Institutet to G.C.-B. P.V.K. was supported by NIH R01HL131768 from NHLBI and by a CAREER award (NSF-14-532) from NSF.

### Reviewer information

Nature thanks A. Klein, R. Satija, M. Stadler and the other anonymous reviewer(s) for their contribution to the peer review of this work.

## Author information

### Affiliations

1. #### Laboratory of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden

• Gioele La Manno
• Amit Zeisel
• Emelie Braun
• Hannah Hochgerner
• Peter Lönnerberg
• Alessandro Furlan
• Lars E. Borm
• David van Bruggen
• Gonçalo Castelo-Branco
2. #### Science for Life Laboratory, Solna, Sweden

• Gioele La Manno
• Amit Zeisel
• Emelie Braun
• Hannah Hochgerner
• Peter Lönnerberg
• Lars E. Borm
3. #### Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA

• Ruslan Soldatov
• Viktor Petukhov
• Jean Fan
• Zehua Liu
• Jimin Guo
• Peter V. Kharchenko
4. #### Department of Applied Mathematics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia

• Viktor Petukhov
5. #### Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden

• Katja Lidschreiber
• Patrick Cramer
6. #### Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden

• Maria E. Kastriti
7. #### John van Geest Centre for Brain Repair, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK

• Xiaoling He
• Roger Barker
8. #### Division of Neurodegeneration, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden

• Erik Sundström
9. #### Max Planck Institute for Biophysical Chemistry, Department of Molecular Biology, Göttingen, Germany

• Patrick Cramer
10. #### Harvard Stem Cell Institute, Cambridge, MA, USA

• Peter V. Kharchenko

### Contributions

S.L. conceived the concept of RNA velocity and P.V.K. showed that RNA velocity could be detected through analysis of unspliced transcripts in single cells. P.V.K. and S.L. designed and supervised the study. P.V.K., S.L., G.L.M. and R.S. developed the analytical framework, analysed data, made figures and drafted the manuscript, with contributions from all co-authors. P.V.K., G.L.M., R.S. and P.L. implemented the software, with assistance from V.P. and J.F. Z.L. examined RNA degradation signals. A.Z. and H.H. performed the mouse hippocampus experiment. P.C. supervised and K.L. and H.H. performed metabolic labelling. M.E.K. and I.A. performed validations of chromaffin differentiation rate. E.B. and L.E.B. performed and analysed the fluorescent in situ hybridization experiment on tissue dissected by X.H. E.S. and R.B. provided human embryonic brain tissue. D.v.B. performed the human forebrain single-cell RNA-seq experiment under supervision of G.C.-B. J.G. assisted with measurement and interpretation of mouse bone marrow. The paper was read and approved by all co-authors.

### Competing interests

The authors declare no competing interests.

### Corresponding authors

Correspondence to Sten Linnarsson or Peter V. Kharchenko.

## Extended data figures and tables

1. ### Extended Data Fig. 1 Most of the intronic reads arise due to internal priming from stable positions.

a–d, Examples of read density around intronic polyA and polyT sequences. The browser screenshots show the density of reads from the 10x Chromium mouse hippocampus dataset (top track of each panel), the mouse bone marrow inDrop dataset (second track from the top), and the chromaffin differentiation dataset generated using SMART-seq2 (third track). The bottom two tracks show gene annotation and the positions of polyA or polyT sequences at least 15 bp long with one allowed mismatch. The polyA/polyT boxes are coloured blue if the stretch is concordant with the direction of transcription of the underlying gene (that is, it would result in a polyA sequence in the nascent RNA molecule), or red if it is discordant (that is, it would result in a polyT sequence in the RNA). The 3′-end-based 10x Chromium and inDrop protocols show discrete peaks downstream of the polyA priming sites, and the 10x dataset also shows peaks upstream of the polyT sites. The SMART-seq2 protocol shows much more diffuse peaks, as expected from the full-length purification procedure used by the protocol. e–h, Average read density profiles around concordant and discordant internal priming sites. The plots show the observed/expected intronic read density around (A)₁₅ or (T)₁₅ sequences (with one allowed mismatch) within intronic regions. The x axis shows position relative to the motif (in base pairs), in genomic reference orientation. The bold lines show the genome-wide average, trimmed of the two extreme chromosome values at each position. The averages of individual chromosomes are shown as semi-transparent lines. e, Profile for the mouse hippocampus 10x Chromium dataset (n = 18,213). f, Profile for the human forebrain 10x dataset (n = 1,720). g, Profile for the chromaffin differentiation data measured using SMART-seq2 (n = 385). h, Profile for the mouse bone marrow data measured using inDrop (n = 3,018).
The top left corner of each plot shows the number of all intronic reads (that is, reads falling within the gene but not touching an exon) that fall within the 250 bp window around internal priming sites (1,500 bp for the SMART-seq2 dataset). In 10x data, concordant internal priming sites produce a stronger signal. However, they occur less frequently in the genome than discordant sites, so overall the discordant sites account for a slightly higher fraction of the intronic signal. By contrast, the inDrop dataset appears to have very limited discordant priming.
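
The site definition used in these panels can be sketched as below: A or T stretches at least 15 bp long with one allowed mismatch, classified by orientation relative to the overlapping gene. This is an illustrative brute-force scan over a plain string, not the pipeline behind the figure. The function names, and the merging of overlapping windows into a single site, are our assumptions.

```python
# Sketch (assumed helper names): find polyA/polyT stretches of >= 15 bp with
# <= 1 mismatch, then label them relative to the strand of the overlapping gene.

def find_priming_windows(seq, min_len=15, max_mismatch=1):
    """Return (start, end, base) for every window of length min_len that is an
    A or T homopolymer with at most max_mismatch other bases (end exclusive)."""
    seq = seq.upper()
    hits = []
    for base in "AT":
        for start in range(len(seq) - min_len + 1):
            window = seq[start:start + min_len]
            if sum(c != base for c in window) <= max_mismatch:
                hits.append((start, start + min_len, base))
    return hits


def merge_windows(hits):
    """Merge overlapping or adjacent windows of the same base into one site."""
    merged = []
    for start, end, base in sorted(hits, key=lambda h: (h[2], h[0])):
        if merged and merged[-1][2] == base and start <= merged[-1][1]:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), base)
        else:
            merged.append((start, end, base))
    return merged


def orientation(base, gene_strand):
    """'concordant' if the genomic (+ strand) stretch reads as polyA in the RNA of a
    gene on gene_strand ('+' or '-'), otherwise 'discordant' (polyT in the RNA)."""
    rna_base = base if gene_strand == "+" else {"A": "T", "T": "A"}[base]
    return "concordant" if rna_base == "A" else "discordant"


seq = "G" * 10 + "A" * 15 + "G" * 10 + "T" * 14 + "C" + "G" * 10
for start, end, base in merge_windows(find_priming_windows(seq)):
    print(start, end, base, orientation(base, "+"))
# 9 26 A concordant
# 34 50 T discordant
```

The merged A site extends one base beyond the pure A run on each side, because the one allowed mismatch lets a flanking G join the window. Read-density profiles like those in e–h would then be accumulated around the returned coordinates.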

2. ### Extended Data Fig. 2 Estimation of the characteristic time of RNA metabolism in human cells.

a, Design of the metabolic labelling experiment in human cells. HEK293 cells were exposed to 4-thiouridine (4sU) for 5, 15 or 30 min, and the labelled fraction was isolated and analysed. A no-pull-down control was also analysed and represents the equilibrium state (indicated by ∞). b, Expected profiles of the abundance and fraction of labelled spliced and unspliced RNA molecules. c, The observed dynamic profiles of genes were clustered, yielding two groups: the majority (83.4%) were concordant with the expectation of increasing labelling, and a smaller group (16.6%) were discordant. Bars indicate s.e.m.; n = 998 genes, with 2 technical and 2 biological replicates. d, Curves showing the maximum-likelihood fit to the data, based on the analytical solution for a step increase in the transcription rate. The fit yields values of β and γ, and of the characteristic time constant τ, defined as the time required to reach 1 − 1/e ≈ 63.2% of the asymptotic value. e, Distribution of τ values. f, Joint distribution of the fitted β and γ parameters (n = 832).
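
The step-response solution mentioned in d has a closed form. The sketch below makes three assumptions:

- The model is du/dt = α − βu, ds/dt = βu − γs.
- Labelled molecules start from zero at the onset of labelling.
- τ is measured on the spliced abundance. The caption does not state which quantity τ refers to.

The function names are hypothetical, and the rates in the usage lines are illustrative rather than fitted values.

```python
import math

def unspliced(t, alpha, beta):
    """u(t) after a step to transcription rate alpha at t = 0, with u(0) = 0."""
    return alpha / beta * (1.0 - math.exp(-beta * t))


def spliced(t, alpha, beta, gamma):
    """s(t) with s(0) = 0; this closed form requires beta != gamma."""
    if beta == gamma:
        raise ValueError("closed form assumes beta != gamma")
    return (alpha / gamma * (1.0 - math.exp(-gamma * t))
            + alpha / (gamma - beta) * (math.exp(-gamma * t) - math.exp(-beta * t)))


def characteristic_time(alpha, beta, gamma, tol=1e-10):
    """Time at which s(t) reaches 1 - 1/e of its asymptote alpha / gamma.
    s(t) increases monotonically here, so bisection is valid."""
    target = (1.0 - math.exp(-1.0)) * alpha / gamma
    lo, hi = 0.0, 1.0
    while spliced(hi, alpha, beta, gamma) < target:  # bracket the crossing
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if spliced(mid, alpha, beta, gamma) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha, beta, gamma = 1.0, 2.0, 0.5   # illustrative rates per hour (assumption)
tau = characteristic_time(alpha, beta, gamma)
```

Spliced molecules must first pass through the unspliced pool. As a result, τ defined this way always exceeds both 1/β and 1/γ. The normalized curve s(t)·γ/α is symmetric in β and γ and lies below both 1 − e^(−βt) and 1 − e^(−γt).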

3. ### Extended Data Fig. 3 Degradation rates are conserved over a wide range of terminally differentiated cell types.

Conservation of the RNA degradation rate over a wide range of cell types in the adult mouse (Tabula Muris dataset). a, Distribution, over genes, of the correlation between spliced and unspliced molecule counts across all cell types (n = 8,385 genes). b, Legend enumerating the tissues and cell classes annotated by the Tabula Muris consortium (n = 48). Functionally, developmentally or phenotypically related classes are given similar colours to aid interpretation of the plots below. c, d, A representative selection of genes with high correlation (ρ > 0.9) (c) and typical correlation (0.4 < ρ < 0.9) (d). γ was estimated by robust linear regression (RANSAC). e, A selection of genes displaying two clearly distinct degradation rates; genes with two values of γ amounted to 10.8% of the total. The two values of γ were estimated by regression mixture modelling. f, g, Two examples of genes in which multiple values of γ are explained by alternative splicing in different cell types.
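
A minimal RANSAC estimate of γ is sketched below, using only NumPy. It fits γ as the slope of u ≈ γs through the origin, which is the steady-state relation with β set to 1. This generic RANSAC loop is written for illustration and is not the implementation behind c and d. The inlier threshold, the iteration count, the one-point sampling scheme and the function name `ransac_gamma` are our assumptions.

```python
import numpy as np

def ransac_gamma(s, u, n_iter=200, threshold=None, seed=0):
    """Robust slope of u ~ gamma * s (line through the origin) via a simple RANSAC loop."""
    s = np.asarray(s, dtype=float)
    u = np.asarray(u, dtype=float)
    if threshold is None:
        # default inlier threshold: median absolute deviation of u (assumption)
        threshold = np.median(np.abs(u - np.median(u)))
    rng = np.random.default_rng(seed)
    candidates = np.flatnonzero(s > 0)
    best_inliers = None
    for _ in range(n_iter):
        i = rng.choice(candidates)          # one point fixes a line through the origin
        slope = u[i] / s[i]
        inliers = np.abs(u - slope * s) <= threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit by least squares through the origin on the consensus set
    s_in, u_in = s[best_inliers], u[best_inliers]
    return float(s_in @ u_in / (s_in @ s_in))


s = np.arange(1, 49, dtype=float)          # e.g. mean spliced counts in 48 cell types
u = 0.3 * s + 0.1 * np.sin(s)              # true gamma = 0.3 with small deviations
u[40:] = 0.9 * s[40:]                      # a few cell types with a different gamma
gamma = ransac_gamma(s, u, threshold=1.0)  # close to 0.3; plain least squares gives about 0.55
```

When a gene genuinely has two γ values, as in e, this loop returns the one supported by more cell types. Recovering both values needs a mixture model like the one described in the caption.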

4. ### Extended Data Fig. 4 Structure-based velocity estimation.

a, b, For genes that are observed only outside of steady state, such as genes upregulated late in chromaffin differentiation (a) or downregulated early in Schwann cell precursors (SCPs) (b), the gene-relative γ fit is likely to deviate from its steady-state value. c, d, To correct for such effects, a structure-based γ fit first predicts γ for every gene from its structural parameters. It then uses the k most correlated genes in the dataset to adjust the M value with a robust mean, and re-estimates γ. Here $M = \log_2(u_{\mathrm{o}}/u_{\mathrm{ss}})$, where $u_{\mathrm{ss}}$ is the unspliced count predicted from the spliced count under steady state and $u_{\mathrm{o}}$ is the observed unspliced count. e, Scatter plot comparing gene-relative and structure-based γ estimates. Coloured circles highlight γ adjustments for genes downregulated early in SCPs (blue) and upregulated late in chromaffin cells (green). Values are shown on a natural log scale. f–i, Cell expression velocity in the chromaffin E12.5 dataset, based on the structure-based γ estimates, shown on the first five PCs.
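
The M value defined in c and d is straightforward to compute. The caption does not specify how the robust mean over the k correlated genes becomes a new γ, so the adjustment below is only one hypothetical reading. If observed unspliced counts sit, on a robust average, 2^M above the steady-state prediction γs, then the γ that removes that offset is γ·2^M. The trimmed mean, the `eps` pseudocount and all function names are also our assumptions.

```python
import numpy as np

def m_values(u_obs, s_obs, gamma, eps=1e-9):
    """M = log2(u_o / u_ss) per cell, with u_ss = gamma * s (steady state, beta = 1).
    eps is a small pseudocount (assumption) to avoid division by zero."""
    u_obs = np.asarray(u_obs, dtype=float)
    u_ss = gamma * np.asarray(s_obs, dtype=float)
    return np.log2((u_obs + eps) / (u_ss + eps))


def trimmed_mean(values, trim=0.1):
    """Robust mean: drop the lowest and highest `trim` fraction, then average."""
    v = np.sort(np.ravel(np.asarray(values, dtype=float)))
    cut = int(trim * v.size)
    return float(v[cut:v.size - cut].mean())


def adjust_gamma(gamma_pred, neighbour_m, trim=0.1):
    """Hypothetical re-estimate: rescale the structure-predicted gamma by 2**M, where M
    is the robust mean of M values pooled over the k most correlated genes."""
    return gamma_pred * 2.0 ** trimmed_mean(neighbour_m, trim)


m = m_values([2.0, 4.0], [4.0, 8.0], gamma=0.25)      # u_o = 2 * u_ss in both cells, so M ~ 1
print(adjust_gamma(0.25, [0.0] * 8 + [10.0, -10.0]))  # extremes trimmed away
# 0.25
```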

5. ### Extended Data Fig. 5 RNA velocity analysis of inDrop datasets: visual stimulus response of cortical pyramidal neurons and neutrophil differentiation.

a, Simplified illustration of a model of activation of pyramidal neurons of the visual cortex after exposure to a light stimulus. b, Velocity estimates projected onto a two-dimensional PCA plot of the dataset (n = 952). c, Average transition probability of unstimulated cells (top), cells stimulated for 1 h (middle) and cells stimulated for 4 h (bottom). The unstimulated cells were mostly stationary, and only a few cells showed a tendency to activate early response genes (probably as a result of the dissociation procedure). Cells stimulated for 1 h expressed immediate early genes and showed high velocity in late response genes. They were therefore transitioning to a state more similar to that observed at the 4-h time point. After 4 h of stimulation, cells appeared to be reverting to a state comparable to the unstimulated sample (bottom). d, e, Top panels, phase portraits of early (d) and late (e) response genes. Bottom panels, violin plots showing the expression distribution over the cell population at each time point (left half of each violin) and its extrapolation to the future using velocity (right half). Transitions of single cells are indicated by lines connecting the two halves of the violins, coloured by the sign of the velocity of each gene. f, Grid visualization of cell expression velocity estimates for the inDrop mouse bone marrow dataset on a t-SNE plot (n = 3,018). g, Major cell populations labelled by manual annotation. The velocity flow in f captures neutrophil maturation, from the dividing cells on the right all the way to Il1b activation on the left. Expression profiles for five marker genes are shown below. h, Gene-relative model fits for several example genes. For each gene, the first column shows spliced molecule counts in different cells and the second column shows unspliced molecule counts.
The third column shows the phase portrait of the gene (unspliced versus spliced) and the resulting γ fit (dashed red line), determined using the extreme quantile method. Each point corresponds to a cell, coloured according to the cluster labels shown in g. The last column shows the unspliced count residual based on the estimated γ fit. Positive residuals indicate expected upregulation, and negative residuals indicate expected downregulation, of the gene.
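
The gene-relative fit in h, and the residual in its last column, can be sketched as follows. We assume that the extreme quantile method fits u ≈ γs through the origin using only the cells in the lowest and highest q-quantiles of spliced expression, treated as the cells closest to steady state, and we take q = 0.05. The quantiles used in the published implementation, and whether it fits an offset, may differ. With β = 1 the residual u − γs serves as the velocity estimate. The `extrapolate` step mirrors the right halves of the violins in d and e. All function names are hypothetical.

```python
import numpy as np

def extreme_quantile_gamma(s, u, q=0.05):
    """Slope of u ~ gamma * s (through the origin), fitted only on cells whose
    spliced counts fall in the lowest or highest q-quantile (q is an assumption)."""
    s = np.asarray(s, dtype=float)
    u = np.asarray(u, dtype=float)
    low, high = np.quantile(s, [q, 1.0 - q])
    mask = (s <= low) | (s >= high)
    s_ext, u_ext = s[mask], u[mask]
    return float(s_ext @ u_ext / (s_ext @ s_ext))


def velocity_residual(s, u, gamma):
    """Unspliced residual u - gamma * s: > 0 suggests upregulation, < 0 downregulation."""
    return np.asarray(u, dtype=float) - gamma * np.asarray(s, dtype=float)


def extrapolate(s, v, dt=1.0):
    """First-order extrapolation s + v * dt, clipped at zero counts."""
    s = np.asarray(s, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.clip(s + v * dt, 0.0, None)


# Toy gene: cells at the extremes lie on u = 0.5 s; cells in the middle are inducing.
s = np.linspace(0.0, 10.0, 101)
u = 0.5 * s
inducing = (s > 2) & (s < 8)
u[inducing] += 1.0
gamma = extreme_quantile_gamma(s, u)   # 0.5: inducing cells are excluded from the fit
v = velocity_residual(s, u, gamma)     # +1 for inducing cells, 0 elsewhere
s_next = extrapolate(s, v, dt=0.5)
```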

6. ### Extended Data Fig. 6 Dynamics of maturation of enterocytes during intestinal homeostasis.

a, Velocity field projected on a 2D t-SNE plot. The clusters are labelled and coloured as in the original publication (ref. 22) to facilitate comparison (n = 2,683). Velocity analysis revealed a transition related to the maturation of distal and proximal enterocytes. No consistent velocity was observed in the part of the manifold occupied by stem cells and transit amplifying (TA) cells. This suggests that stem cell dynamics are more difficult to capture, either because of their slower rate or their more stochastic nature. The small velocities of TA cells were likely to be driven by the cell cycle. b, A selection of the cell cycle genes that were removed from the analysis, plotted on the t-SNE. Despite the removal of genes annotated as cell cycle genes, we still observed substantial segregation by cell cycle, illustrating the difficulty of disentangling cell cycle phase from cell state. c, A selection of phase portraits showing genes underlying the observed velocity field. Markers of endocrine, goblet and tuft cells displayed no detectable velocity. Velocity towards and away from stem cell states was detectable for a limited set of genes (such as the stem cell marker Lgr5). However, at the genome-wide level, the exact dynamics of this process were probably confounded by the high correlation with the cell cycle.

7. ### Extended Data Fig. 7 RNA velocity unveils the dynamics of differentiation and myelination of oligodendrocytes.

a, t-SNE projection showing the landscape of oligodendrocyte lineage differentiation and myelination in the hindbrain (pons) of adolescent (P20) mice (n = 6,307). The velocity field reflects the expression dynamics of both the initial differentiation wave and the subsequent expression changes associated with myelination. The cell clusters are coloured by pseudotime as in c to facilitate interpretation. b, Expression patterns of landmark genes of the differentiation process. Pdgfra is the canonical marker of oligodendrocyte precursor cells (OPCs), and Neu4 marks committed oligodendrocyte precursors (COPs). Tmem2 is enriched in newly formed oligodendrocytes (NFOLs), and Mog expression is upregulated at the beginning of myelination in myelin-forming oligodendrocytes (MFOLs). c, A selection of phase portraits underlying the velocity field shown in a. d, t-SNE projection and velocity vector field of the same dataset, analysed using a more naive feature selection. This feature selection retained other axes of variation (sex and day of dissection) in addition to oligodendrocyte maturation. Note that despite the separation of populations into Xist+ and Xist− tracks, the velocity field correctly captures progression from progenitors to newly formed oligodendrocytes in both parallel tracks. e, Expression level of Xist, showing that most of the extra variation is driven by the sex of the animal. f, Cells coloured by the day on which the experiment was performed.

8. ### Extended Data Fig. 8 Agreement of velocity predictions with the observed expression derivatives.

a, Maturation progression of granule neurons in the mouse hippocampus dataset, approximated by pseudotime (estimated with a principal curve). b, For a pair of example genes (rows), the plots show three things. The left panels show unspliced and spliced expression profiles along pseudotime. The middle panels show the empirically estimated, smoothed pseudotime derivative of the observed expression together with the estimated RNA velocity. The right panels show the relationship between spliced and unspliced expression. The velocity estimates for the two chosen genes are highly correlated with the empirically observed derivative, indicating accurate velocity estimation. c, The majority (75%) of genes that were differentially regulated along the pseudotime trajectory showed a positive correlation between velocity and the empirical expression derivative. The distribution for such genes is split according to the three classes of trajectory-associated genes shown in d. By contrast, velocity estimates for genes that were not differentially expressed along the pseudotime trajectory did not show such correlation (grey). Incorporating information about co-regulated genes into velocity estimation using gene kNN clustering (see Supplementary Note 1) can significantly boost the accuracy of the velocity predictions (lower panel). d, Trajectory-associated genes were classified as early, transient or late according to their peak expression time. x axis, cells ordered by pseudotime; y axis, genes ordered by peak expression time. e, Genes whose spliced expression patterns were well correlated with that of Ptprg also showed a high correlation of their velocity estimates with that of Ptprg. To assess the consistency of the velocities of co-regulated genes, we introduced a measure of velocity coordination for a given gene. It is the difference between the gene's mean velocity correlation with its co-regulated genes and its mean velocity correlation with all genes.
The two quantities being compared are shown for Ptprg as dotted vertical lines: grey, mean velocity correlation with all genes; red, mean velocity correlation with top co-regulated genes. Velocity coordination provides an unbiased measure of quality for velocity estimates. f, Velocities of co-regulated genes were correlated. The distribution of gene velocity coordination values is shown for two groups. One group is genes that had co-regulated genes, that is, well-correlated gene neighbours in terms of their spliced expression pattern (green). The other group is genes that did not have enough co-regulated genes (without neighbours; grey). g, Co-regulated genes with high velocity coordination tended to have high correlation with the empirical derivatives. The Spearman correlation coefficient is shown. h–k, Velocity performance during maturation of pyramidal neurons (h). Genes differentially expressed during maturation had high correlation between velocity and empirical derivative (i), and co-regulated genes tended to have correlated velocity estimates (j). The degree of velocity coordination was associated with the correlation between velocity and empirical derivative (k). l–o, Velocity performance during chromaffin differentiation. p–s, Velocity performance during maturation of oligodendrocytes. Number of top co-regulated genes analysed for velocity correlation: 200 (g), 150 (k, o, s).
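
The velocity coordination measure defined in e can be computed directly from two cells × genes matrices. The sketch below follows the caption's definition but makes two assumptions. Co-regulated genes are taken as the top `n_top` genes by correlation of spliced expression. The gene itself is excluded from the "all genes" average. The function name and the toy data are hypothetical.

```python
import numpy as np

def velocity_coordination(spliced, velocity, gene, n_top=150):
    """Mean velocity correlation of `gene` with its top co-regulated genes minus its
    mean velocity correlation with all other genes (inputs: cells x genes)."""
    expr_corr = np.corrcoef(spliced, rowvar=False)[gene]
    vel_corr = np.corrcoef(velocity, rowvar=False)[gene]
    others = np.delete(np.arange(spliced.shape[1]), gene)
    # co-regulated genes: highest correlation of spliced expression with `gene`
    top = others[np.argsort(expr_corr[others])[::-1][:n_top]]
    return float(vel_corr[top].mean() - vel_corr[others].mean())


# Toy data: genes 0-4 share a program in both expression and velocity; 5-19 are noise.
rng = np.random.default_rng(0)
program = rng.normal(size=200)
spliced = rng.normal(size=(200, 20))
velocity = rng.normal(size=(200, 20))
spliced[:, :5] += 3 * program[:, None]
velocity[:, :5] += 3 * program[:, None]
coordinated = velocity_coordination(spliced, velocity, gene=0, n_top=4)  # clearly > 0
background = velocity_coordination(spliced, velocity, gene=10, n_top=4)  # near 0
```

Genes without real co-regulated partners score near zero, because their "top" partners are chosen by chance. This matches the contrast between the green and grey distributions in f.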

9. ### Extended Data Fig. 9 Branching developmental trajectories of the developing hippocampus

a, t-SNE plot of the developing dentate gyrus dataset. Cells are coloured by cluster identity, with labels shown for the major cell types. b, Expression of the radial glia (and astrocyte) marker Hes1 and the cell cycle genes Top2a and Cdk1, shown on the t-SNE plot. c, Marker genes of different regions of the hippocampus show prominent expression signals at different extremities of the branching plot (in situ hybridization images from the Allen Mouse Brain Atlas, ref. 24). Scale bars, 0.5 mm.

10. ### Extended Data Fig. 10 Single-cell velocity estimates for individual cells in the embryonic hippocampus dataset.

a, Arrows indicate the extrapolated state of each cell, projected onto the t-SNE plot of the manifold. b, Selected phase portraits and fits of the equilibrium slope (γ) for the developing cells in the embryonic hippocampus dataset. For each gene, the first column shows the spliced–unspliced phase portrait, with the dashed line showing the γ fit. The second column shows the magnitude of the residuals for several genes involved in the development of different neural lineages. The residual is the difference between observed and expected unspliced abundance, which closely tracks velocity. The third column shows the observed expression profile of spliced molecules.
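
Panel a turns each cell's high-dimensional velocity into an arrow on the embedding. The average transition probabilities in Extended Data Fig. 5c need the same kind of step. A common construction is sketched below under our own assumptions:

1. Score each neighbour j of cell i by the Pearson correlation between the expression displacement s_j − s_i and the velocity v_i.
2. Convert the scores into transition probabilities with an exponential kernel of width σ.
3. Draw the arrow as the probability-weighted mean of unit displacement vectors on the embedding, minus their unweighted mean.

The kernel width σ = 0.05, the neighbour lists and the function names are assumptions. Supplementary Note 2 describes the procedure actually used.

```python
import numpy as np

def transition_probabilities(S, V, neighbours, sigma=0.05):
    """S, V: cells x genes expression and velocity; neighbours[i]: neighbour indices
    of cell i. Returns one probability vector per cell over its neighbours."""
    probs = []
    for i, nbrs in enumerate(neighbours):
        d = S[nbrs] - S[i]                          # displacement to each neighbour
        d_c = d - d.mean(axis=1, keepdims=True)     # centre across genes (Pearson)
        v_c = V[i] - V[i].mean()
        denom = np.linalg.norm(d_c, axis=1) * np.linalg.norm(v_c) + 1e-12
        rho = d_c @ v_c / denom                     # correlation with cell i's velocity
        w = np.exp((rho - rho.max()) / sigma)       # shift by max for numerical stability
        probs.append(w / w.sum())
    return probs


def project_arrows(E, neighbours, probs):
    """Arrow per cell on embedding E: expected unit displacement under `probs`
    minus the expectation under uniform probabilities."""
    arrows = np.zeros(E.shape, dtype=float)
    for i, (nbrs, p) in enumerate(zip(neighbours, probs)):
        d = E[nbrs] - E[i]
        d = d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-12)
        arrows[i] = p @ d - d.mean(axis=0)
    return arrows


# Toy example: 3 cells, 3 genes; cell 0's velocity points towards cell 1.
S = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]])
V = np.array([[1.0, 0.0, -1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
E = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])   # 2D embedding (e.g. t-SNE)
neighbours = [[1, 2], [0, 2], [0, 1]]
probs = transition_probabilities(S, V, neighbours)
arrows = project_arrows(E, neighbours, probs)          # arrows[0] is close to [1, 0]
```

Subtracting the uniform expectation keeps cells with uninformative velocity (cells 1 and 2 here) from acquiring spurious arrows that only reflect uneven neighbour placement.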

## Supplementary information

1. ### Supplementary Note 1

Theory and Computational Methods: This Supplementary Note describes the theoretical formulation of the employed transcriptional model, underlying assumptions, mathematical and computational details of the estimation procedure, and provides additional details related to processing of individual datasets.

2. ### Supplementary Note 2

Considerations for accurate determination and visualization of RNA velocity: This Supplementary Note illustrates simulations of the underlying model under different scenarios, details various ways in which velocity can be estimated and visualized, and discusses known limitations of both procedures. It is divided into 11 sections and includes Supplementary Figures 1–20.

### DOI

https://doi.org/10.1038/s41586-018-0414-6
