A comparison of single-cell trajectory inference methods

Abstract

Trajectory inference approaches analyze genome-wide omics data from thousands of single cells and computationally infer the order of these cells along developmental trajectories. Although more than 70 trajectory inference tools have already been developed, it is challenging to compare their performance because the input they require and output models they produce vary substantially. Here, we benchmark 45 of these methods on 110 real and 229 synthetic datasets for cellular ordering, topology, scalability and usability. Our results highlight the complementarity of existing tools and show that the choice of method should depend mostly on the dataset dimensions and trajectory topology. Based on these results, we develop a set of guidelines to help users select the best method for their dataset. Our freely available data and evaluation pipeline (https://benchmark.dynverse.org) will aid in the development of improved tools designed to analyze increasingly large and complex single-cell datasets.


Fig. 1: Overview of several key aspects of the evaluation.
Fig. 2: A characterization of the 45 methods evaluated in this study and their overall evaluation results.
Fig. 3: Detailed results of the four main evaluation criteria: accuracy, scalability, stability and usability.
Fig. 4: Complementarity between different trajectory inference methods.
Fig. 5: Practical guidelines for method users.
Fig. 6: Demonstration of how a common framework for TI methods facilitates broad applicability using some example datasets.

Data availability

The processed real and synthetic datasets used in this study are deposited on Zenodo (https://doi.org/10.5281/zenodo.1443566)55.

The main analysis repository is available at https://benchmark.dynverse.org and is divided into several experiments. Every experiment has its own set of scripts and results, each accompanied by an illustrated readme that can be browsed and explored on GitHub.

Code availability

The analysis scripts call several other R packages, of which an overview is available at dynverse.org. These packages include dynwrap, which wraps the output of methods into the common trajectory model; dyneval, which contains the evaluation metrics; dynguidelines, the guidelines app; and dynplot, for plotting trajectories.
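As a rough illustration of how these packages fit together, the sketch below wraps a toy dataset and a hypothetical linear ordering into the common trajectory model and plots it. This is a minimal sketch, not a recipe from the paper; the function names (wrap_expression, add_linear_trajectory, plot_graph) reflect our reading of the dynwrap and dynplot documentation and may differ between package versions.

```r
# Minimal sketch (not from the paper): wrap a toy dataset and a hypothetical
# linear ordering into the common trajectory model. Function names are assumed
# from the dynwrap/dynplot documentation and may differ between versions.
library(dynwrap)
library(dynplot)

# Toy count matrix: 100 cells x 50 genes, with cell and gene identifiers
counts <- matrix(
  rpois(100 * 50, lambda = 5), nrow = 100,
  dimnames = list(paste0("cell", 1:100), paste0("gene", 1:50))
)

dataset <- wrap_expression(
  counts = counts,
  expression = log2(counts + 1)
)

# Suppose a TI method returned a pseudotime per cell; the wrapper function
# converts it into the common trajectory model ...
pseudotime <- setNames(runif(nrow(counts)), rownames(counts))
trajectory <- add_linear_trajectory(dataset, pseudotime = pseudotime)

# ... on which any downstream dynverse function can operate, e.g. plotting
plot_graph(trajectory)
```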

References

  1. Tanay, A. & Regev, A. Scaling single-cell genomics from phenomenology to mechanism. Nature 541, 331–338 (2017).

  2. Etzrodt, M., Endele, M. & Schroeder, T. Quantitative single-cell approaches to stem cell research. Cell Stem Cell 15, 546–558 (2014).

  3. Trapnell, C. Defining cell types and states with single-cell genomics. Genome Res. 25, 1491–1498 (2015).

  4. Cannoodt, R., Saelens, W. & Saeys, Y. Computational methods for trajectory inference from single-cell transcriptomics. Eur. J. Immunol. 46, 2496–2506 (2016).

  5. Moon, K. R. et al. Manifold learning-based methods for analyzing single-cell RNA-sequencing data. Curr. Opin. Syst. Biol. 7, 36–46 (2018).

  6. Liu, Z. et al. Reconstructing cell cycle pseudo time-series via single-cell transcriptome data. Nat. Commun. 8, 22 (2017).

  7. Wolf, F. A. et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 20, 59 (2019).

  8. Schlitzer, A. et al. Identification of cDC1- and cDC2-committed DC progenitors reveals early lineage priming at the common DC progenitor stage in the bone marrow. Nat. Immunol. 16, 718–728 (2015).

  9. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).

  10. See, P. et al. Mapping the human DC lineage through the integration of high-dimensional techniques. Science 356, eaag3009 (2017).

  11. Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).

  12. Regev, A. et al. Science forum: the human cell atlas. eLife 6, e27041 (2017).

  13. Han, X. et al. Mapping the mouse cell atlas by Microwell-seq. Cell 172, 1091–1107.e17 (2018).

  14. Schaum, N. et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).

  15. Angerer, P. et al. Single cells make big data: new challenges and opportunities in transcriptomics. Curr. Opin. Syst. Biol. 4, 85–91 (2017).

  16. Henry, V. J., Bandrowski, A. E., Pepin, A.-S., Gonzalez, B. J. & Desfeux, A. OMICtools: an informative directory for multi-omic data analysis. Database (Oxford) 2014, bau069 (2014).

  17. Davis, S. et al. List of software packages for single-cell data analysis. https://github.com/seandavi/awesome-single-cell (2018); https://doi.org/10.5281/zenodo.1294021

  18. Zappia, L., Phipson, B. & Oshlack, A. Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database. PLoS Comput. Biol. 14, e1006245 (2018).

  19. Bendall, S. C. et al. Single-cell trajectory detection uncovers progression and regulatory coordination in human B cell development. Cell 157, 714–725 (2014).

  20. Shin, J. et al. Single-cell RNA-seq with Waterfall reveals molecular cascades underlying adult neurogenesis. Cell Stem Cell 17, 360–372 (2015).

  21. Campbell, K. & Yau, C. Bayesian Gaussian process latent variable models for pseudotime inference in single-cell RNA-seq data. Preprint at bioRxiv https://doi.org/10.1101/026872 (2015).

  22. Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).

  23. Setty, M. et al. Wishbone identifies bifurcating developmental trajectories from single-cell data. Nat. Biotechnol. 34, 637–645 (2016).

  24. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).

  25. Matsumoto, H. & Kiryu, H. SCOUP: a probabilistic model based on the Ornstein–Uhlenbeck process to analyze single-cell expression data during differentiation. BMC Bioinformatics 17, 232 (2016).

  26. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).

  27. Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19, 477 (2018).

  28. Ji, Z. & Ji, H. TSCAN: pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 44, e117 (2016).

  29. Welch, J. D., Hartemink, A. J. & Prins, J. F. SLICER: inferring branched, nonlinear cellular trajectories from single cell RNA-seq data. Genome Biol. 17, 106 (2016).

  30. duVerle, D. A., Yotsukura, S., Nomura, S., Aburatani, H. & Tsuda, K. CellTree: an R/Bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data. BMC Bioinformatics 17, 363 (2016).

  31. Cannoodt, R. et al. SCORPIUS improves trajectory inference and identifies novel modules in dendritic cell development. Preprint at bioRxiv https://doi.org/10.1101/079509 (2016).

  32. Lönnberg, T. et al. Single-cell RNA-seq and computational analysis using temporal mixture modeling resolves TH1/TFH fate bifurcation in malaria. Sci. Immunol. 2, eaal2192 (2017).

  33. Campbell, K. R. & Yau, C. Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers. Wellcome Open Res. 2, 19 (2017).

  34. Tian, L. et al. scRNA-seq mixology: towards better benchmarking of single cell RNA-seq protocols and analysis methods. Preprint at bioRxiv https://doi.org/10.1101/433102 (2018).

  35. Schaffter, T., Marbach, D. & Floreano, D. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics 27, 2263–2270 (2011).

  36. Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).

  37. Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018).

  38. Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385 (2018).

  39. Pya, N. & Wood, S. N. Shape constrained additive models. Stat. Comput. 25, 543–559 (2015).

  40. Taschuk, M. & Wilson, G. Ten simple rules for making research software more robust. PLoS Comput. Biol. 13, e1005412 (2017).

  41. Mangul, S. et al. A comprehensive analysis of the usability and archival stability of omics computational tools and resources. Preprint at bioRxiv https://doi.org/10.1101/452532 (2018).

  42. Wilson, G. et al. Best practices for scientific computing. PLoS Biol. 12, e1001745 (2014).

  43. Artaza, H. et al. Top 10 metrics for life science software good practices. F1000Res. 5, 2000 (2016).

  44. Saelens, W., Cannoodt, R. & Saeys, Y. A comprehensive evaluation of module detection methods for gene expression data. Nat. Commun. 9, 1090 (2018).

  45. La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).

  46. Norel, R., Rice, J. J. & Stolovitzky, G. The self-assessment trap: can we all be better than average? Mol. Syst. Biol. 7, 537 (2011).

  47. Gitter, A. Single-cell RNA-seq pseudotime estimation algorithms. https://github.com/agitter/single-cell-pseudotime (2018); https://doi.org/10.5281/zenodo.1297423

  48. Kouno, T. et al. Temporal dynamics and transcriptional control using single-cell gene expression analysis. Genome Biol. 14, R118 (2013).

  49. Zeng, C. et al. Pseudotemporal ordering of single cells reveals metabolic control of postnatal β cell proliferation. Cell Metab. 25, 1160–1175.e11 (2017).

  50. Papadopoulos, N., Parra, R. G. & Soeding, J. PROSSTT: probabilistic simulation of single-cell RNA-seq data for complex differentiation processes. Bioinformatics btz078 (2019).

  51. Lun, A. T., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 5, 2122 (2016).

  52. Jurman, G., Visintainer, R., Filosi, M., Riccadonna, S. & Furlanello, C. in Proc. 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA) 1–10 (IEEE, 2015); https://doi.org/10.1109/DSAA.2015.7344816

  53. Wright, M. N. & Ziegler, A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 77, 1–17 (2017).

  54. Beaulieu-Jones, B. K. & Greene, C. S. Reproducibility of computational workflows is automated using continuous analysis. Nat. Biotechnol. 35, 342–346 (2017).

  55. Cannoodt, R., Saelens, W., Todorov, H. & Saeys, Y. Single-cell -omics datasets containing a trajectory (Version 2.0.0). Zenodo https://doi.org/10.5281/zenodo.1443566 (2018).

Acknowledgements

We would like to thank the original authors of the methods for their feedback and improvements on the method wrappers. This study was supported by the Fonds Wetenschappelijk Onderzoek (R.C., 11Y6218N and W.S., 11Z4518N) and BOF (Ghent University, H.T.). Y.S. is an ISAC Marylou Ingram scholar.

Author information

Authors and Affiliations

Authors

Contributions

R.C., W.S., H.T. and Y.S. designed the study. R.C. and W.S. performed the experiments and analyzed the data. W.S., R.C. and H.T. implemented software packages. R.C., W.S., Y.S. and H.T. prepared the manuscript. Y.S. supervised the project.

Corresponding author

Correspondence to Yvan Saeys.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 A common interface for TI methods.

(a) The input and output of each TI method are standardized. As input, each TI method receives either raw or normalized counts, several parameters, and a selection of prior information. After execution, a method uses one of the seven wrapper functions to transform its output into the common trajectory model. This common model then allows the same analysis functions to be applied to trajectory models produced by any TI method. (b) Illustrations of the specific transformations performed by each of the wrapper functions.
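To make the wrapper idea concrete, here is a hypothetical example (not taken from the paper) of one wrapper type: a method that only reports a cluster label per cell plus a network between clusters can still be expressed in the common model. The add_cluster_graph() and wrap_data() calls are assumptions based on the dynwrap documentation and may differ between versions.

```r
# Hypothetical example of one wrapper type: a clustering-based method that
# returns a cluster label per cell and a network connecting the clusters.
# add_cluster_graph() (assumed dynwrap API) converts this into the common model.
library(dynwrap)

cell_ids <- paste0("cell", 1:6)
dataset <- wrap_data(cell_ids = cell_ids)

# Cluster membership reported by the (hypothetical) method
grouping <- setNames(c("A", "A", "B", "B", "C", "C"), cell_ids)

# Milestone network: which clusters are connected, edge lengths, directedness
milestone_network <- data.frame(
  from = c("A", "B"),
  to = c("B", "C"),
  length = 1,
  directed = TRUE
)

trajectory <- add_cluster_graph(
  dataset,
  milestone_network = milestone_network,
  grouping = grouping
)
```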

Supplementary Figure 2 Results from the evaluation, for all methods and across all evaluation criteria.

(a) We characterized the methods according to the wrapper type, their required priors, whether the inferred topology is constrained by the algorithm (fixed) or by a parameter (param), and the types of inferable topologies. The methods are grouped vertically based on the most complex trajectory type they can infer. (b) The overall results of the evaluation on four criteria: benchmarking using a reference trajectory on real and synthetic data, scalability with increasing numbers of cells and features, stability across dataset subsamples, and quality of the implementation. (c) Accuracy of trajectory inference methods across metrics, dataset sources and dataset trajectory types. The performance of a method is generally consistent across dataset sources, but varies considerably depending on the metric and trajectory type. (d) Predicted execution times and memory usage for varying numbers of cells and features (# cells × # features). Predictions were made by training a regression model after running each method on bootstrapped datasets with varying numbers of cells and features. (e) Stability results, calculated as the average pairwise similarity between models inferred across multiple runs of the same method. (f) Usability scores of the tool and corresponding manuscript, grouped per category.
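As a much-simplified stand-in for the stability computation in panel e (the actual pipeline compares trajectories using the dyneval metrics), one can average the pairwise agreement of the pseudotimes that repeated runs assign to the cells they share:

```r
# Simplified illustration of the stability idea: average pairwise agreement
# between repeated runs, here measured as the Spearman correlation of the
# pseudotimes on shared cells. The real evaluation uses the dyneval metrics.
set.seed(1)
cells <- paste0("cell", 1:200)

# Pretend three runs on 80% subsamples each returned a named pseudotime vector
runs <- lapply(1:3, function(i) {
  subsample <- sample(cells, 160)
  setNames(runif(length(subsample)), subsample)
})

pairwise <- combn(length(runs), 2, function(idx) {
  a <- runs[[idx[1]]]
  b <- runs[[idx[2]]]
  shared <- intersect(names(a), names(b))
  abs(cor(a[shared], b[shared], method = "spearman"))
})
mean(pairwise)  # stability score in [0, 1]
```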

Supplementary Figure 3 Accuracy of trajectory inference methods.

(a) Overall score for all methods across 339 datasets, colored by the source of the datasets. Black line indicates the mean. (b) Similarity between the overall scores of all dataset sources, compared to real datasets with a gold standard, across all methods (n = 46, after filtering out methods that errored too frequently). Shown in the top left is the Pearson correlation. (c) Bias in the overall score towards trajectory types for all methods across 339 datasets. Black line indicates the mean. (d) Distributions of the difference in size between predicted and reference topologies. A positive difference means that the topology predicted by the method is more complex than the one in the reference.

Supplementary Figure 4 Scalability of trajectory inference methods.

(a) Three examples of average observed running times across five datasets (left) and the predicted running times (right). (b) Overview of the scalability results of all methods, ordered by their average predicted running time from (a). We predicted execution times and memory usage for each method with increasing numbers of features or cells, and used these values to classify each method as sublinear, linear, quadratic or superquadratic based on the shape of the curve.
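The sketch below shows one simple way to obtain such a classification: fit the slope of running time versus number of cells on a log-log scale and bucket the slope. The thresholds are illustrative assumptions, not the paper's exact procedure, which works from the shape of the fitted scalability curves.

```r
# Illustrative only: classify time complexity from the log-log slope of
# running time versus number of cells. Thresholds are arbitrary choices,
# not those used in the paper.
classify_scaling <- function(n_cells, seconds) {
  slope <- coef(lm(log(seconds) ~ log(n_cells)))[[2]]
  if (slope < 0.9) "sublinear"
  else if (slope < 1.5) "linear"
  else if (slope < 2.5) "quadratic"
  else "superquadratic"
}

# Example: running times that grow roughly quadratically with cell number
n_cells <- c(100, 1000, 10000, 100000)
seconds <- 1e-6 * n_cells^2
classify_scaling(n_cells, seconds)  # "quadratic"
```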

Supplementary Figure 5 Agreement between actual values and predictions for execution times and memory usage.

We created a predictive model of the running time and memory usage based on a set of scaling datasets (left), and validated this model based on the similarity of the predictions and actual values on all benchmark datasets (right). Shown are the values for each method and dataset (n = 65618 for training, n = 11939 for test). Top left indicates the Pearson correlation coefficient.

Supplementary Figure 6 Usability of trajectory inference methods.

Shown is the score given to each method for every item of the usability score sheet (Supplementary Table 3). Each aspect of the quality control was part of a category, and each category was weighted so that it contributed equally to the final quality score. Within each category, each aspect also received a weight depending on how often it was mentioned in a set of papers discussing good practices in tool development and evaluation; this weight is represented in the plot as the height on the y-axis. Top: average usability score for each method. Right: average score of each quality control item. Items with an average score lower than 0.5 are shown in more detail.
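A toy version of this weighting scheme (the items, mention counts and scores below are invented for illustration) might look as follows: items are weighted by mention count within their category, and category weights are rescaled so that every category contributes equally to the final score.

```r
# Toy illustration of the usability weighting scheme described above; the
# items, mention counts and scores are invented for the example.
items <- data.frame(
  category = c("documentation", "documentation", "code_quality"),
  item     = c("tutorial", "function_docs", "unit_tests"),
  mentions = c(4, 2, 3),      # times mentioned in best-practice papers
  score    = c(1.0, 0.5, 0.0) # score given to a particular tool (0-1)
)

# Item weight: mention count normalised within its category, then divided by
# the number of categories so every category contributes equally overall.
n_categories <- length(unique(items$category))
items$weight <- ave(items$mentions, items$category,
                    FUN = function(x) x / sum(x)) / n_categories

usability <- sum(items$weight * items$score)
usability  # final usability score between 0 and 1
```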

Supplementary information

Supplementary Information

Supplementary Figures 1–6 and Supplementary Notes 1 and 2

Reporting Summary

Supplementary Table 1

Overview of available trajectory inference tools, and whether they were included in this study.

Supplementary Table 2

Overview of the real datasets used in this study.

Supplementary Table 3

Scoring sheet for assessing usability of trajectory inference methods. Each quality aspect was given a weight based on how many times it was mentioned in a set of articles discussing best practices for tool development.

About this article

Cite this article

Saelens, W., Cannoodt, R., Todorov, H. et al. A comparison of single-cell trajectory inference methods. Nat Biotechnol 37, 547–554 (2019). https://doi.org/10.1038/s41587-019-0071-9

