A comparison of single-cell trajectory inference methods

Saelens, Wouter; Cannoodt, Robrecht; Todorov, Helena; Saeys, Yvan

doi:10.1038/s41587-019-0071-9

Article
Published: 01 April 2019

Nature Biotechnology volume 37, pages 547–554 (2019)

Abstract

Trajectory inference approaches analyze genome-wide omics data from thousands of single cells and computationally infer the order of these cells along developmental trajectories. Although more than 70 trajectory inference tools have already been developed, it is challenging to compare their performance because the input they require and output models they produce vary substantially. Here, we benchmark 45 of these methods on 110 real and 229 synthetic datasets for cellular ordering, topology, scalability and usability. Our results highlight the complementarity of existing tools, and that the choice of method should depend mostly on the dataset dimensions and trajectory topology. Based on these results, we develop a set of guidelines to help users select the best method for their dataset. Our freely available data and evaluation pipeline (https://benchmark.dynverse.org) will aid in the development of improved tools designed to analyze increasingly large and complex single-cell datasets.

**Fig. 1: Overview of several key aspects of the evaluation.**

**Fig. 2: A characterization of the 45 methods evaluated in this study and their overall evaluation results.**

**Fig. 3: Detailed results of the four main evaluation criteria: accuracy, scalability, stability and usability.**

**Fig. 4: Complementarity between different trajectory inference methods.**

**Fig. 5: Practical guidelines for method users.**

**Fig. 6: Demonstration of how a common framework for TI methods facilitates broad applicability using some example datasets.**

Data availability

The processed real and synthetic datasets used in this study are deposited on Zenodo (https://doi.org/10.5281/zenodo.1443566)⁵⁵.

The main analysis repository is available at https://benchmark.dynverse.org and is divided into several experiments. Every experiment has its own set of scripts and results, each accompanied by an illustrated readme that can be browsed on the GitHub website.

Code availability

The analysis scripts call several other R packages; an overview is available at dynverse.org. These packages include dynwrap, which wraps the output of methods into the common trajectory model; dyneval, which contains the evaluation metrics; dynguidelines, the guidelines app; and dynplot, for plotting trajectories.

References

1. Tanay, A. & Regev, A. Scaling single-cell genomics from phenomenology to mechanism. Nature 541, 331–338 (2017).
2. Etzrodt, M., Endele, M. & Schroeder, T. Quantitative single-cell approaches to stem cell research. Cell Stem Cell 15, 546–558 (2014).
3. Trapnell, C. Defining cell types and states with single-cell genomics. Genome Res. 25, 1491–1498 (2015).
4. Cannoodt, R., Saelens, W. & Saeys, Y. Computational methods for trajectory inference from single-cell transcriptomics. Eur. J. Immunol. 46, 2496–2506 (2016).
5. Moon, K. R. et al. Manifold learning-based methods for analyzing single-cell RNA-sequencing data. Curr. Opin. Syst. Biol. 7, 36–46 (2018).
6. Liu, Z. et al. Reconstructing cell cycle pseudo time-series via single-cell transcriptome data. Nat. Commun. 8, 22 (2017).
7. Wolf, F. A. et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 20, 59 (2019).
8. Schlitzer, A. et al. Identification of cDC1- and cDC2-committed DC progenitors reveals early lineage priming at the common DC progenitor stage in the bone marrow. Nat. Immunol. 16, 718–728 (2015).
9. Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19, 271–281 (2017).
10. See, P. et al. Mapping the human DC lineage through the integration of high-dimensional techniques. Science 356, eaag3009 (2017).
11. Aibar, S. et al. SCENIC: Single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
12. Regev, A. et al. Science forum: the human cell atlas. eLife 6, e27041 (2017).
13. Han, X. et al. Mapping the mouse cell atlas by microwell-seq. Cell 172, 1091–1107.e17 (2018).
14. Schaum, N. et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).
15. Angerer, P. et al. Single cells make big data: new challenges and opportunities in transcriptomics. Curr. Opin. Syst. Biol. 4, 85–91 (2017).
16. Henry, V. J., Bandrowski, A. E., Pepin, A.-S., Gonzalez, B. J. & Desfeux, A. OMICtools: an informative directory for multi-omic data analysis. Database (Oxford) 2014, bau069 (2014).
17. Davis, S. et al. List of software packages for single-cell data analysis. https://github.com/seandavi/awesome-single-cell (2018); https://doi.org/10.5281/zenodo.1294021
18. Zappia, L., Phipson, B. & Oshlack, A. Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database. PLoS Comput. Biol. 14, e1006245 (2018).
19. Bendall, S. C. et al. Single-cell trajectory detection uncovers progression and regulatory coordination in human B cell development. Cell 157, 714–725 (2014).
20. Shin, J. et al. Single-cell RNA-seq with waterfall reveals molecular cascades underlying adult neurogenesis. Cell Stem Cell 17, 360–372 (2015).
21. Campbell, K. & Yau, C. Bayesian Gaussian Process Latent Variable Models for pseudotime inference in single-cell RNA-seq data. Preprint at bioRxiv https://doi.org/10.1101/026872 (2015).
22. Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).
23. Setty, M. et al. Wishbone identifies bifurcating developmental trajectories from single-cell data. Nat. Biotechnol. 34, 637–645 (2016).
24. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
25. Matsumoto, H. & Kiryu, H. SCOUP: a probabilistic model based on the Ornstein–Uhlenbeck process to analyze single-cell expression data during differentiation. BMC Bioinformatics 17, 232 (2016).
26. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).
27. Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19, 477 (2018).
28. Ji, Z. & Ji, H. TSCAN: pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 44, e117 (2016).
29. Welch, J. D., Hartemink, A. J. & Prins, J. F. SLICER: inferring branched, nonlinear cellular trajectories from single cell RNA-seq data. Genome Biol. 17, 106 (2016).
30. duVerle, D. A., Yotsukura, S., Nomura, S., Aburatani, H. & Tsuda, K. CellTree: an R/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data. BMC Bioinformatics 17, 363 (2016).
31. Cannoodt, R. et al. SCORPIUS improves trajectory inference and identifies novel modules in dendritic cell development. Preprint at bioRxiv https://doi.org/10.1101/079509 (2016).
32. Lönnberg, T. et al. Single-cell RNA-seq and computational analysis using temporal mixture modeling resolves TH1/TFH fate bifurcation in malaria. Sci. Immunol. 2, eaal2192 (2017).
33. Campbell, K. R. & Yau, C. Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers. Wellcome Open Res. 2, 19 (2017).
34. Tian, L. et al. scRNA-seq mixology: Towards better benchmarking of single cell RNA-seq protocols and analysis methods. Preprint at bioRxiv https://doi.org/10.1101/433102 (2018).
35. Schaffter, T., Marbach, D. & Floreano, D. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics 27, 2263–2270 (2011).
36. Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).
37. Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018).
38. Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385 (2018).
39. Pya, N. & Wood, S. N. Shape constrained additive models. Stat. Comput. 25, 543–559 (2015).
40. Taschuk, M. & Wilson, G. Ten simple rules for making research software more robust. PLoS Comput. Biol. 13, e1005412 (2017).
41. Mangul, S. et al. A comprehensive analysis of the usability and archival stability of omics computational tools and resources. Preprint at bioRxiv https://doi.org/10.1101/452532 (2018).
42. Wilson, G. et al. Best practices for scientific computing. PLoS Biol. 12, e1001745 (2014).
43. Artaza, H. et al. Top 10 metrics for life science software good practices. F1000Res. 5, 2000 (2016).
44. Saelens, W., Cannoodt, R. & Saeys, Y. A comprehensive evaluation of module detection methods for gene expression data. Nat. Commun. 9, 1090 (2018).
45. La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
46. Norel, R., Rice, J. J. & Stolovitzky, G. The self-assessment trap: Can we all be better than average? Mol. Syst. Biol. 7, 537 (2011).
47. Gitter, A. Single-cell RNA-seq pseudotime estimation algorithms. https://github.com/agitter/single-cell-pseudotime (2018); https://doi.org/10.5281/zenodo.1297423
48. Kouno, T. et al. Temporal dynamics and transcriptional control using single-cell gene expression analysis. Genome Biol. 14, R118 (2013).
49. Zeng, C. et al. Pseudotemporal ordering of single cells reveals metabolic control of postnatal β cell proliferation. Cell Metab. 25, 1160–1175.e11 (2017).
50. Papadopoulos, N., Parra, R. G. & Soeding, J. PROSSTT: probabilistic simulation of single-cell RNA-seq data for complex differentiation processes. Bioinformatics, btz078 (2019).
51. Lun, A. T., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 5, 2122 (2016).
52. Jurman, G., Visintainer, R., Filosi, M., Riccadonna, S. & Furlanello, C. in Proc. 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA) 1–10 (IEEE, 2015); https://doi.org/10.1109/DSAA.2015.7344816
53. Wright, M. N. & Ziegler, A. Ranger: a fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 77, 1–17 (2017).
54. Beaulieu-Jones, B. K. & Greene, C. S. Reproducibility of computational workflows is automated using continuous analysis. Nat. Biotechnol. 35, 342–346 (2017).
55. Cannoodt, R., Saelens, W., Todorov, H. & Saeys, Y. Single-cell -omics datasets containing a trajectory (Version 2.0.0). Zenodo https://doi.org/10.5281/zenodo.1443566 (2018).

Acknowledgements

We would like to thank the original authors of the methods for their feedback and improvements on the method wrappers. This study was supported by the Fonds Wetenschappelijk Onderzoek (R.C., 11Y6218N and W.S., 11Z4518N) and BOF (Ghent University, H.T.). Y.S. is an ISAC Marylou Ingram scholar.

Author information

These authors contributed equally: Wouter Saelens, Robrecht Cannoodt.

Authors and Affiliations

Data mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
Wouter Saelens, Robrecht Cannoodt, Helena Todorov & Yvan Saeys
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
Wouter Saelens, Helena Todorov & Yvan Saeys
Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium
Robrecht Cannoodt
Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
Robrecht Cannoodt
Centre International de Recherche en Infectiologie, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, École Normale Supérieure de Lyon, Université de Lyon, Lyon, France
Helena Todorov

Contributions

R.C., W.S., H.T. and Y.S. designed the study. R.C. and W.S. performed the experiments and analyzed the data. W.S., R.C. and H.T. implemented software packages. R.C., W.S., Y.S. and H.T. prepared the manuscript. Y.S. supervised the project.

Corresponding author

Correspondence to Yvan Saeys.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 A common interface for TI methods.

(a) The input and output of each TI method are standardized. As input, each TI method receives either raw or normalized counts, several parameters, and a selection of prior information. After its execution, a method uses one of the seven wrapper functions to transform its output to the common trajectory model. This common model then allows common analysis functions to be applied to trajectory models produced by any TI method. (b) Illustrations of the specific transformations performed by each of the wrapper functions.
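
The idea of wrapping method-specific output into one shared model can be illustrated with a minimal Python sketch. It is not the dynwrap implementation (which is written in R); the `Trajectory` class, the `wrap_linear_pseudotime` wrapper and the `order_cells` function are hypothetical names, and the model is reduced to a milestone network plus per-cell positions along its edges.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical, simplified common trajectory model."""
    # Edges between milestones as (from, to, length) tuples.
    milestone_network: list
    # Position of each cell: cell_id -> (from, to, fraction along the edge).
    progressions: dict


def wrap_linear_pseudotime(pseudotime):
    """Hypothetical wrapper: convert per-cell pseudotime into a one-edge trajectory."""
    lowest = min(pseudotime.values())
    highest = max(pseudotime.values())
    span = (highest - lowest) or 1.0  # avoid dividing by zero if all values are equal
    progressions = {
        cell: ("start", "end", (value - lowest) / span)
        for cell, value in pseudotime.items()
    }
    return Trajectory([("start", "end", 1.0)], progressions)


def order_cells(trajectory):
    """An analysis function that relies only on the common model (single-edge case)."""
    return sorted(trajectory.progressions, key=lambda c: trajectory.progressions[c][2])


trajectory = wrap_linear_pseudotime({"a": 2.0, "b": 0.0, "c": 1.0})
ordering = order_cells(trajectory)
```

Because `order_cells` reads only the shared model, it would work unchanged on the output of any method whose wrapper produces a `Trajectory`, which is the point of the common interface.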

Supplementary Figure 2 Results from the evaluation, for all methods and across all evaluation criteria.

(a) We characterized the methods according to the wrapper type, their required priors, whether the inferred topology is constrained by the algorithm (fixed) or a parameter (param), and the types of inferable topologies. The methods are grouped vertically based on the most complex trajectory type they can infer. (b) The overall results of the evaluation on four criteria: benchmarking using a reference trajectory on real and synthetic data, scalability with increasing number of cells and features, stability across dataset subsamples, and quality of the implementation. (c) Accuracy of trajectory inference methods across metrics, dataset sources and dataset trajectory types. A method's performance is generally more stable across dataset sources than across metrics and trajectory types. (d) Predicted execution times and memory usage for varying numbers of cells and features (# cells × # features). Predictions were made by training a regression model after running each method on bootstrapped datasets with varying numbers of cells and features. (e) Stability results, calculated as the average pairwise similarity between models inferred across multiple runs of the same method. (f) Usability scores of the tool and corresponding manuscript, grouped per category.
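
The stability measure in (e), an average pairwise similarity between models from repeated runs, can be sketched as follows. The similarity function used here, a Jaccard index on edge sets, is an assumption chosen for brevity and stands in for the metrics used in the study; both function names are hypothetical.

```python
from itertools import combinations


def jaccard(edges_a, edges_b):
    """Stand-in similarity: overlap between two sets of trajectory edges."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def average_pairwise_similarity(models, similarity):
    """Mean similarity over all unordered pairs of inferred models."""
    pairs = list(combinations(models, 2))
    if not pairs:
        raise ValueError("at least two models are needed")
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


# Three runs of the same (hypothetical) method on subsamples of one dataset.
runs = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("B", "C")},
    {("A", "B")},
]
stability = average_pairwise_similarity(runs, jaccard)
```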

Supplementary Figure 3 Accuracy of trajectory inference methods.

(a) Overall score for all methods across 339 datasets, colored by the source of the datasets. Black line indicates the mean. (b) Similarity between the overall scores of all dataset sources, compared to real datasets with a gold standard, across all methods (n = 46, after filtering out methods that errored too frequently). Shown in the top left is the Pearson correlation. (c) Bias in the overall score towards trajectory types for all methods across 339 datasets. Black line indicates the mean. (d) Distributions of the difference in size between predicted and reference topologies. A positive difference means that the topology predicted by the method is more complex than the one in the reference.

Supplementary Figure 4 Scalability of trajectory inference methods.

(a) Three examples of average observed running times across five datasets (left) and the predicted running time (right). (b) Overview of the scalability results of all methods, ordered by their average predicted running time from (a). We predicted execution times and memory usage for each method with increasing number of features or cells, and used these values to classify each method as sublinear, linear, quadratic or superquadratic based on the shape of the curve.
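
One simple way to label the shape of a scaling curve is to estimate its slope on a log-log scale. The sketch below takes that approach as a simplified stand-in; it is not the classification procedure used in the study, and the slope thresholds (0.8, 1.5, 2.5) and function names are assumptions.

```python
import math


def loglog_slope(sizes, values):
    """Least-squares slope of log(value) against log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def classify_scaling(sizes, values):
    """Map the fitted exponent to a label; the thresholds are illustrative only."""
    slope = loglog_slope(sizes, values)
    if slope < 0.8:
        return "sublinear"
    if slope < 1.5:
        return "linear"
    if slope < 2.5:
        return "quadratic"
    return "superquadratic"


cells = [100, 1000, 10000, 100000]
label = classify_scaling(cells, [0.001 * n ** 2 for n in cells])
```

A log-log fit assumes a single power law over the whole range, so it would miss curves that change regime, for example a method that is linear up to a memory limit and much slower beyond it.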

Supplementary Figure 5 Agreement between actual values and predictions for execution times and memory usage.

We created a predictive model of the running time and memory usage based on a set of scaling datasets (left), and validated this model based on the similarity of the predictions and actual values on all benchmark datasets (right). Shown are the values for each method and dataset (n = 65618 for training, n = 11939 for test). Top left indicates the Pearson correlation coefficient.
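
The agreement check described above reduces to a Pearson correlation between predicted and observed values. A minimal sketch, using made-up numbers rather than data from the study:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted and observed running times in seconds.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [2.0, 4.0, 6.0, 8.0]
r = pearson(predicted, observed)
```

Note that a high correlation only shows that predictions track the observed values linearly; in this toy example every prediction is off by a factor of two, yet the coefficient is 1.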

Supplementary Figure 6 Usability of trajectory inference methods.

Shown is the score given for each method on every item from the usability score sheet (Supplementary Table 3). Each aspect of the quality control was part of a category, and each category was weighted so that it contributed equally to the final quality score. Within each category, each aspect also received a weight depending on how often it was mentioned in a set of papers discussing good practices in tool development and evaluation. This is represented in the plot as the height on the y-axis. Top: Average usability score for each method. Right: The average score of each quality control item. Shown in more detail are those items that had an average score lower than 0.5.
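
The two-level weighting described above (aspects weighted within a category, categories contributing equally) can be sketched as follows. The category names, weights and scores are invented for illustration, and `usability_score` is a hypothetical helper, not a function from the dynverse packages.

```python
def usability_score(items):
    """Combine item scores into one value.

    items: iterable of (category, weight, score) with score in [0, 1].
    Within a category, items are averaged using their weights; the
    category averages are then averaged with equal weight.
    """
    by_category = {}
    for category, weight, score in items:
        by_category.setdefault(category, []).append((weight, score))

    category_scores = []
    for entries in by_category.values():
        total_weight = sum(weight for weight, _ in entries)
        weighted_sum = sum(weight * score for weight, score in entries)
        category_scores.append(weighted_sum / total_weight)

    return sum(category_scores) / len(category_scores)


# Illustrative score sheet; weights mimic "number of papers mentioning the item".
sheet = [
    ("availability", 3, 1.0),
    ("availability", 1, 0.0),
    ("documentation", 2, 0.5),
]
score = usability_score(sheet)
```

Here the availability category scores 0.75 and documentation 0.5, giving an overall 0.625. Equal category weights keep a category with many items from dominating the final score.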

About this article

Cite this article

Saelens, W., Cannoodt, R., Todorov, H. et al. A comparison of single-cell trajectory inference methods. Nat Biotechnol 37, 547–554 (2019). https://doi.org/10.1038/s41587-019-0071-9

Received: 05 April 2018
Accepted: 13 February 2019
Issue Date: May 2019
