
Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments


Single cell RNA-sequencing (scRNA-seq) technology has undergone rapid development in recent years, leading to an explosion in the number of tailored data analysis methods. However, the current lack of gold-standard benchmark datasets makes it difficult for researchers to systematically compare the performance of the many methods available. Here, we generated a realistic benchmark experiment that included single cells and admixtures of cells or RNA to create ‘pseudo cells’ from up to five distinct cancer cell lines. In total, 14 datasets were generated using both droplet and plate-based scRNA-seq protocols. We compared 3,913 combinations of data analysis methods for tasks ranging from normalization and imputation to clustering, trajectory analysis and data integration. Evaluation revealed pipelines suited to different types of data for different tasks. Our data and analysis provide a comprehensive framework for benchmarking most common scRNA-seq analysis steps.


Fig. 1: Overview of the scRNA-seq mixology experimental design and benchmark analysis.
Fig. 2: Comparisons of normalization and imputation methods using multiple mixture datasets.
Fig. 3: Comparisons of scRNA-seq clustering methods.
Fig. 4: Comparisons of scRNA-seq trajectory analysis methods.
Fig. 5: Comparisons of data integration methods for batch effect correction for the RNA mixture and the four single cell experiments.
Fig. 6: Summary of results from methods comparisons using scRNA-mixology datasets.

Data availability

Raw data are available under GEO SuperSeries GSE118767. A summary of the individual accession numbers is given in Supplementary Table 1. The processed SingleCellExperiment R objects are available from

Code availability

Code used to perform the comparative analyses and generate the figures is available from The CellBench R package was developed for benchmarking single cell analysis methods and is available from and Bioconductor (




We thank C. Weeden and M.-L. Asselin-Labat for providing the cell lines used in this study, J. Schreuder and D. Lin for assistance in conducting experiments and I. Virshup for assistance in the data integration analysis. This work was supported by funding from the National Health and Medical Research Council (NHMRC) Project Grants (No. GNT1143163 to M.E.R., No. GNT1124812 to S.H.N. and M.E.R., and No. GNT1062820 to S.H.N.), Fellowship Nos. GNT1104924 to M.E.R. and GNT1087415 to K.A.L.C., the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation (grant no. 2018-182819 to M.E.R. and no. 2018-182885 to K.A.L.C.), a Melbourne Research Scholarship to L.T., the Genomics Innovation Hub, the Victorian State Government Operational Infrastructure Support and Australian Government NHMRC IRIISS.

Author information


L.T. designed, planned and performed experiments, conducted data analysis and wrote the manuscript. X.D., S.F., K.A.L.C., S.S. and A.J.A. performed data analysis and wrote the manuscript. D.A.Z., T.S.W., A.S. and J.S.J. performed experiments. S.H.N. and M.E.R. designed the study. M.E.R. supervised the analysis and wrote the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Luyi Tian or Matthew E. Ritchie.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Overview of the RNA mixture and cell mixture experimental designs.

(a) Ternary plot and plate layout of the RNA mixture design, which involved extracting RNA from 3 cell lines (H2228, H1975 and HCC827) in bulk and mixing it in known proportions to give 8 mixtures, which were then diluted to single-cell-equivalent amounts ranging from 3.75 pg to 30 pg. (b) Ternary plot of the cell mixtures. Various combinations of 9 cells were obtained using the same 3 cell lines. The number of replicates for each combination varies, as does the number of low-quality samples included.
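Because each RNA mixture combines bulk RNA from H2228, H1975 and HCC827 in known proportions, the expected composition of every pseudo cell is known in advance. The Python sketch below illustrates this under a loudly stated assumption: expression mixes linearly and scales linearly with input amount, ignoring capture efficiency and amplification bias. The per-gene values in `PROFILES` are made-up numbers, not data from this study, and `expected_mixture` is a hypothetical helper.

```python
from typing import Dict, List

# Hypothetical pure cell line profiles (arbitrary units, one value per gene).
# Illustrative numbers only; these are NOT measurements from the study.
PROFILES: Dict[str, List[float]] = {
    "H2228": [10.0, 0.0, 5.0],
    "H1975": [0.0, 8.0, 5.0],
    "HCC827": [2.0, 2.0, 20.0],
}


def expected_mixture(proportions: Dict[str, float],
                     input_pg: float = 30.0,
                     reference_pg: float = 30.0) -> List[float]:
    """Expected profile of an RNA mixture, assuming expression mixes linearly.

    proportions: fraction of input RNA contributed by each cell line (sums to 1).
    input_pg / reference_pg: rescales the profile for diluted inputs
    (for example 3.75 pg versus 30 pg); linear scaling is an assumption.
    """
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("mixture proportions must sum to 1")
    n_genes = len(next(iter(PROFILES.values())))
    scale = input_pg / reference_pg
    return [
        scale * sum(p * PROFILES[line][g] for line, p in proportions.items())
        for g in range(n_genes)
    ]


# An equal three-way mixture at the full input, then the same mixture diluted 8-fold.
equal = {"H2228": 1 / 3, "H1975": 1 / 3, "HCC827": 1 / 3}
full = expected_mixture(equal)
diluted = expected_mixture(equal, input_pg=3.75)
print(full)
print(diluted)
```

Real mixtures will deviate from this idealized profile; the point of the sketch is only that known proportions give a ground truth against which analysis results can be scored.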

Supplementary Figure 2 Violin plots of quality control metrics for cells from each benchmarking dataset.

(a) The proportion of reads that map to introns. (b) The proportion of reads that map to exons. (c) The number of reads that map to exons. (d) The total number of counts per cell after UMI deduplication. (e) The amplification rate, defined as the ratio of the number of reads mapping to exons to the UMI count after UMI deduplication. This measure reflects library complexity. The sample size for each dataset is shown in Supplementary Table 1.
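The amplification rate in panel (e) is a simple per-cell ratio. A minimal Python sketch with hypothetical QC values follows; the cell names and counts are invented for illustration, and `amplification_rate` is a hypothetical helper rather than a function from scPipe or CellBench.

```python
def amplification_rate(exon_reads: int, umi_counts: int) -> float:
    """Ratio of exon-mapped reads to deduplicated UMI counts for one cell.

    A higher value means each unique molecule was sequenced more times on
    average, which roughly indicates a less complex library.
    """
    if umi_counts <= 0:
        raise ValueError("a cell must have at least one UMI")
    return exon_reads / umi_counts


# Hypothetical per-cell QC values: (reads mapped to exons, UMI counts).
cells = {"cell_A": (120_000, 30_000), "cell_B": (90_000, 45_000)}
rates = {name: amplification_rate(*vals) for name, vals in cells.items()}
print(rates)  # {'cell_A': 4.0, 'cell_B': 2.0}
```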

Supplementary Figure 3 Visualization of representative benchmarking datasets using t-SNE and UMAP and violin plot of the number of doublets.

(a) t-SNE and UMAP visualizations of 4 datasets. From left to right: 10x single cells from 3 cell lines; 10x single cells from 5 cell lines; a CEL-seq2 cell mixture and a CEL-seq2 RNA mixture. Each point represents a cell or ‘pseudo cell’ and the number of cells in each plot is indicated in the title. (b) Violin plot of doublets in each dataset, identified using Demuxlet (DBL: doublet, SNG: single cell). The numbers of single cells and doublets are shown above each violin. Doublets were excluded when calculating the performance metrics.
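Doublets were excluded before the performance metrics were computed. A minimal Python sketch of that filtering step, assuming each barcode has already been assigned a call string whose prefix is `SNG` (single cell) or `DBL` (doublet), as in the panel legend. The input dictionary and the `keep_singlets` helper are hypothetical; the sketch does not parse Demuxlet output files. Any call that is not a singlet, including ambiguous calls, is dropped, which is a conservative choice of this sketch.

```python
from typing import Dict, List


def keep_singlets(calls: Dict[str, str]) -> List[str]:
    """Return barcodes called as single cells, dropping doublets and all other calls."""
    return sorted(barcode for barcode, call in calls.items() if call.startswith("SNG"))


# Hypothetical barcode -> call mapping.
calls = {
    "AAACCTG": "SNG-H2228",
    "AAACGGG": "DBL-H2228-H1975",
    "AAAGATG": "SNG-HCC827",
}
singlets = keep_singlets(calls)
print(singlets)  # ['AAACCTG', 'AAAGATG']
print(f"{len(calls) - len(singlets)} of {len(calls)} barcodes removed")
```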

Supplementary Figure 4 Violin plots of silhouette widths for different normalization methods.

Silhouette widths calculated using the known biological groups after data have been normalized by different methods. The input to the silhouette width calculation is the distance between cells, calculated using either (a) the gene expression matrix of the 1,000 most highly variable genes or (b) the first two PCs obtained from PCA. The sample sizes in these plots are the same as in Fig. 2a.
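For each cell, the silhouette width compares a, its mean distance to the other cells of its own known group, with b, its smallest mean distance to the cells of any other group: s = (b - a) / max(a, b). Values near 1 mean the known groups are well separated. The pure-Python sketch below uses Euclidean distance on arbitrary input coordinates (for example, the first two PCs); the toy coordinates are invented, and in practice an existing implementation such as `silhouette()` in the R `cluster` package would normally be used.

```python
import math
from typing import List, Sequence


def silhouette_widths(points: Sequence[Sequence[float]],
                      labels: Sequence[str]) -> List[float]:
    """Per-cell silhouette widths computed from known group labels.

    a(i): mean distance from cell i to the other cells in its own group.
    b(i): smallest mean distance from cell i to the cells of another group.
    s(i) = (b - a) / max(a, b); cells alone in their group get 0.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    groups = sorted(set(labels))
    if len(groups) < 2:
        raise ValueError("silhouette width needs at least two groups")
    widths = []
    for i, p in enumerate(points):
        mean_dist = {}
        for g in groups:
            dists = [math.dist(p, q) for j, q in enumerate(points)
                     if labels[j] == g and j != i]
            if dists:
                mean_dist[g] = sum(dists) / len(dists)
        own = labels[i]
        if own not in mean_dist:  # singleton group
            widths.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for g, d in mean_dist.items() if g != own)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


# Toy 2-D coordinates (for example, PC1 and PC2) for two well-separated known groups.
pcs = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
known = ["H2228", "H2228", "H1975", "H1975"]
widths = silhouette_widths(pcs, known)
print(sum(widths) / len(widths))  # average silhouette width, about 0.9
```

Shuffling the labels so each group mixes both clusters would drive the widths negative, which is why a higher average silhouette width after normalization is read as better preservation of the biological groups.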

Supplementary Figure 5 Example clustering results and summary of clustering performance using ARI and the number of clusters.

(a) Examples of clustering results visualized by PCA (top) and t-SNE (bottom), with different colours representing the cluster assignments made by the selected method. The sample sizes are 340 for RNAmix_CEL-seq2, 285 for cellmix3 and 274 for sc_CEL-seq2. Coefficients obtained from linear models fitted using the ARI (b) or the number of clusters (c) as dependent variables, and experimental design, normalization methods, imputation methods and clustering methods as covariates. The coefficients measure whether particular features have positive or negative associations with the dependent variables.
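The adjusted Rand index (ARI) of Hubert and Arabie compares a clustering with the known cell line labels, correcting pair-counting agreement for chance: ARI = (index - expected index) / (max index - expected index), so identical partitions score 1 and random ones score about 0. The Python sketch below computes it from the contingency table; it is not the implementation used in the study, and the labels are toy values.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two partitions of the same cells."""
    if len(truth) != len(pred):
        raise ValueError("both labelings must cover the same cells")
    n = len(truth)
    if n < 2:
        raise ValueError("need at least two cells")
    # Pairs of cells placed together in both partitions (contingency table cells).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs placed together in each partition separately (row and column sums).
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Cluster labels are arbitrary, so a relabeled but identical partition scores 1.
print(adjusted_rand_index(["H2228", "H2228", "H1975", "H1975"], [1, 1, 0, 0]))
# Splitting one true group into two clusters lowers the score.
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```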

Supplementary Figure 6 Coefficients from linear models used to quantify the impact different methods have on the trajectory analysis and data integration results.

Linear models were fitted using the evaluation metrics as dependent variables, with experimental design, normalization methods, imputation methods and either trajectory analysis or data integration methods as covariates. Positive coefficients indicate that a method is positively associated with the performance metric. The evaluation metrics used as dependent variables for each plot were: (a) the correlation between calculated pseudotime and ground truth; (b) the overlap between the calculated trajectory and the known trajectory; (c) the average silhouette width of the known groups and (d) the kBET acceptance rate.
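Metric (a) is a correlation between inferred pseudotime and the ground truth ordering. A minimal Python sketch using Pearson correlation, assuming the ground truth is a known numeric position for each cell (for example, the mixing proportion along a path between two pure groups). Taking the absolute value, because the direction of an inferred pseudotime can be arbitrary, is an assumption of this sketch rather than a documented choice of the study; the example values are invented.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def pseudotime_agreement(pseudotime: Sequence[float], truth: Sequence[float]) -> float:
    """Absolute correlation, so a trajectory inferred in reverse still scores highly."""
    return abs(pearson(pseudotime, truth))


# Hypothetical cells along a path from pure H2228 (0) to pure H1975 (1).
truth = [0.0, 0.25, 0.5, 0.75, 1.0]
inferred = [9.0, 7.5, 5.2, 2.4, 0.3]  # reversed direction, slightly noisy
print(round(pseudotime_agreement(inferred, truth), 3))
```

A rank-based (Spearman) correlation would be a reasonable alternative when only the ordering of cells, not the spacing, is meaningful.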

Supplementary Figure 7 Visualization of results from the trajectory analysis methods evaluated in our study.

Results for the RNAmix_Sort-seq, cellmix2 and cellmix1 analyses are shown. The dimension reduction used with each trajectory method was: PCA for Slingshot and TSCAN, DDRTree for Monocle2, diffusion map for DPT and locally linear embedding (LLE) for SLICER. The sample sizes are 266 for cellmix1, 268 for cellmix2 and 296 for RNAmix_Sort-seq.

Supplementary Figure 8 Additional data integration results for the single cell and RNA mixture datasets.

The kBET acceptance rate versus silhouette width coefficient for each method, shown for the two combinations with the highest silhouette width in the RNA mixture (a) and single cell (b) data integration analyses. The silhouette width assesses the ability of a given method to group biologically similar cells together, while kBET assesses whether different batches are well mixed after batch effect correction (scMerge_s: supervised scMerge; scMerge_us: unsupervised scMerge). (c) Additional PCA plots from the RNA mixture analysis and (d) t-SNE (perplexity = 30) plots from the single cell analysis visualize the results of different method combinations. Cells are coloured according to protocol/batch information (t-SNE plots for MNN and Scanorama were based on batch-corrected expression matrices). The sample sizes are n = 636 and n = 5,319 in panels c and d, respectively.
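kBET tests, for a sample of cells, whether the batch composition of each cell's k nearest neighbours matches the overall batch composition, and reports the fraction of tests that are not rejected. The following is a simplified Python re-implementation for illustration only, not the kBET R package: it uses brute-force Euclidean neighbours, a Pearson chi-squared statistic and hard-coded 5% critical values, and its defaults (k = 10, 50 tested cells, a fixed random seed) are arbitrary assumptions.

```python
import math
import random
from collections import Counter
from typing import Sequence

# Upper 5% critical values of the chi-squared distribution, keyed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def kbet_acceptance(points: Sequence[Sequence[float]], batches: Sequence[str],
                    k: int = 10, n_tests: int = 50, seed: int = 0) -> float:
    """Fraction of sampled neighbourhoods whose batch mix is not rejected at 5%."""
    n = len(points)
    if n != len(batches) or k >= n:
        raise ValueError("need one batch label per cell and k < number of cells")
    levels = sorted(set(batches))
    df = len(levels) - 1
    if df not in CHI2_CRIT_05:
        raise ValueError("this sketch supports 2 to 6 batches")
    global_freq = {b: c / n for b, c in Counter(batches).items()}
    rng = random.Random(seed)
    tested = rng.sample(range(n), min(n_tests, n))
    accepted = 0
    for i in tested:
        # Brute-force k nearest neighbours of cell i, excluding the cell itself.
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: math.dist(points[i], points[j]))[:k]
        observed = Counter(batches[j] for j in neighbours)
        # Pearson chi-squared statistic: observed vs expected batch counts.
        stat = sum((observed[b] - k * global_freq[b]) ** 2 / (k * global_freq[b])
                   for b in levels)
        if stat <= CHI2_CRIT_05[df]:
            accepted += 1
    return accepted / len(tested)


# Two batches on separate islands (no mixing) versus interleaved along a line.
separated = [(0.0, float(y)) for y in range(20)] + [(100.0, float(y)) for y in range(20)]
separated_b = ["A"] * 20 + ["B"] * 20
mixed = [(float(x), 0.0) for x in range(40)]
mixed_b = ["A" if x % 2 == 0 else "B" for x in range(40)]
print(kbet_acceptance(separated, separated_b), kbet_acceptance(mixed, mixed_b))
```

This contrast mirrors the trade-off in panels (a) and (b): a method can mix batches well (high acceptance) while still keeping biological groups apart (high silhouette width), and the two metrics are read together.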

Supplementary Figure 9 Performance of clustering on the RNA mixture datasets after data integration is applied.

Scatter plot of the ARI versus the number of clusters detected for the top performing normalization and imputation combinations. The true number of clusters for the RNA mixture experiment is 7.

Supplementary information

Supplementary Information

Supplementary Figs. 1–9

Reporting Summary

Supplementary Table 1

Summary of the benchmarking datasets generated.

Supplementary Table 2

Summary of the data characteristics and data analysis tasks that can be compared by each experimental design.

Supplementary Table 3

Summary of integrative methods used to combine data from different protocols and datasets.

Supplementary Table 4

Individual performance metrics obtained from benchmarking analysis, organized by task.

Supplementary Table 5

Run times for different analysis pipelines.

About this article

Cite this article

Tian, L., Dong, X., Freytag, S. et al. Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments. Nat Methods 16, 479–487 (2019).
