The triumphs and limitations of computational methods for scRNA-seq

A Publisher Correction to this article was published on 30 June 2021

This article has been updated

Abstract

The rapid progress of protocols for sequencing single-cell transcriptomes over the past decade has been accompanied by equally impressive advances in the computational methods for analysis of such data. As capacity and accuracy of the experimental techniques grew, the emerging algorithm developments revealed increasingly complex facets of the underlying biology, from cell type composition to gene regulation to developmental dynamics. At the same time, rapid growth has forced continuous reevaluation of the underlying statistical models, experimental aims, and sheer volumes of data processing that are handled by these computational tools. Here, I review key computational steps of single-cell RNA sequencing (scRNA-seq) analysis, examine assumptions made by different approaches, and highlight successes, remaining ambiguities, and limitations that are important to keep in mind as scRNA-seq becomes a mainstream technique for studying biology.

Fig. 1: Key preprocessing steps in single-cell RNA-seq analysis.
Fig. 2: Key analysis steps in single-cell RNA-seq analysis.
Fig. 3: scRNA-seq basics.
Fig. 4: Approximating and partitioning complex manifolds.
Fig. 5: Approximating dynamical processes.

Data availability

The following scRNA-seq datasets were used in creating example figures:

• 10x Genomics PBMC 10k (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3).

• 10x Genomics PBMC 66k (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.1.0/5k_pbmc_NGSC3_aggr).

• Fetal pancreas: E12.5 timepoint data from Byrnes et al.84 were downloaded from GEO (GSM3140915).

• Mouse developing retina: 10x Chromium replicate from Lo Giudice et al.85 was downloaded from GEO (GSM3466902).

• Cell lines: Benchmarking data measuring different cell lines on different platforms, taken from Tian et al.89, were downloaded from GEO (GSE118767).

• Metadata on the single-cell RNA-seq experiments were taken from http://www.nxn.se/single-cell-studies/.

Code availability

The notebooks and scripts for the figures presented in the paper can be found on the author’s website: http://pklab.med.harvard.edu/peterk/review2020/.

References

  1. Ding, J. et al. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nat. Biotechnol. 38, 737–746 (2020).

  2. Hagemann-Jensen, M. et al. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat. Biotechnol. 38, 708–714 (2020).

  3. Brennecke, P. et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093–1095 (2013).

  4. Kharchenko, P. V., Silberstein, L. & Scadden, D. T. Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, 740–742 (2014).

  5. Vu, T. N. et al. Beta-Poisson model for single-cell RNA-seq data analyses. Bioinformatics 32, 2128–2135 (2016).

  6. Svensson, V. Droplet scRNA-seq is not zero-inflated. Nat. Biotechnol. 38, 147–150 (2020).

  7. Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S. & Vert, J. P. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9, 284 (2018).

  8. Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).

  9. Wang, T., Li, B., Nelson, C. E. & Nabavi, S. Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. BMC Bioinf. 20, 40 (2019).

  10. Soneson, C. & Robinson, M. D. Bias, robustness and scalability in single-cell differential expression analysis. Nat. Methods 15, 255–261 (2018).

  11. Vallejos, C. A., Richardson, S. & Marioni, J. C. Beyond comparisons of means: understanding changes in gene expression at the single-cell level. Genome Biol. 17, 70 (2016).

  12. Nabavi, S., Schmolze, D., Maitituoheti, M., Malladi, S. & Beck, A. H. EMDomics: a robust and powerful method for the identification of genes differentially expressed between heterogeneous classes. Bioinformatics 32, 533–541 (2016).

  13. Korthauer, K. D. et al. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biol. 17, 222 (2016).

  14. Martinez-Jimenez, C. P. et al. Aging increases cell-to-cell transcriptional variability upon immune stimulation. Science 355, 1433–1436 (2017).

  15. Crowell, H. L. et al. muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nat. Commun. 11, 6077 (2020).

  16. Jaitin, D. A. et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776–779 (2014).

  17. Ntranos, V., Kamath, G. M., Zhang, J. M., Pachter, L. & Tse, D. N. Fast and accurate single-cell RNA-seq analysis by clustering of transcript-compatibility counts. Genome Biol. 17, 112 (2016).

  18. Aggarwal, C. C., Hinneburg, A. & Keim, D. A. in Database Theory — ICDT 2001. (eds Van den Bussche, J. & Vianu, V.) 420–434 (Springer Berlin Heidelberg, 2001).

  19. Jiang, L. et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 21, 1543–1551 (2011).

  20. Fan, J. et al. Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis. Nat. Methods 13, 241–244 (2016).

  21. Eling, N., Richard, A. C., Richardson, S., Marioni, J. C. & Vallejos, C. A. Correcting the mean-variance dependency for differential variability testing using single-cell RNA sequencing data. Cell Syst. 7, 284–294 (2018).

  22. Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20, 296 (2019).

  23. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  24. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 e1821 (2019).

  25. Azizi, E. et al. Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell 174, 1293–1308 (2018).

  26. Shao, C. & Hofer, T. Robust classification of single-cell transcriptome data by nonnegative matrix factorization. Bioinformatics 33, 235–242 (2017).

  27. Zhu, X., Ching, T., Pan, X., Weissman, S. M. & Garmire, L. Detecting heterogeneity in single-cell RNA-seq data by non-negative matrix factorization. PeerJ 5, e2888 (2017).

  28. Pierson, E. & Yau, C. ZIFA: dimensionality reduction for zero-inflated single-cell gene expression analysis. Genome Biol. 16, 241 (2015).

  29. Zhou, M. Nonparametric Bayesian negative binomial factor analysis. Bayesian Analysis 13, 1065–1093 (2018).

  30. Zhang, L. & Mallick, B. K. Inferring gene networks from discrete expression data. Biostatistics 14, 708–722 (2013).

  31. Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S. & Vert, J. P. Publisher Correction: A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 10, 646 (2019).

  32. Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S. & Theis, F. J. Single-cell RNA-seq denoising using a deep count autoencoder. Nat. Commun. 10, 390 (2019).

  33. Amodio, M. et al. Exploring single-cell data with deep multitasking neural networks. Nat. Methods 16, 1139–1145 (2019).

  34. Aggarwal, C. C. Neural Networks and Deep Learning: A Textbook. (Springer International Publishing, 2018).

  35. Svensson, V., Gayoso, A., Yosef, N. & Pachter, L. Interpretable factor models of single-cell RNA-seq via variational autoencoders. Bioinformatics 36, 3418–3421 (2020).

  36. Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160–1167 (2011).

  37. Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).

  38. Levine, J. H. et al. Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell 162, 184–197 (2015).

  39. Haghverdi, L., Buettner, F. & Theis, F. J. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics 31, 2989–2998 (2015).

  40. Jarvis, R. A. & Patrick, E. A. Clustering using a similarity measure based on shared near neighbors. IEEE Trans. Comput. C-22, 1025–1034 (1973).

  41. Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Preprint at bioRxiv https://doi.org/10.1101/2020.05.22.111161 (2020).

  42. Haghverdi, L., Lun, A. T. L., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).

  43. Van Mieghem, P. Graph Spectra for Complex Networks. (Cambridge University Press, 2010).

  44. Haghverdi, L., Buttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).

  45. Maaten, L. V. D. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  46. McInnes, L., Healy, J. & Melville, J. Umap: Uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).

  47. Amir el, A. D. et al. viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nat. Biotechnol. 31, 545–552 (2013).

  48. Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2018).

  49. Kiselev, V. Y., Andrews, T. S. & Hemberg, M. Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet. 20, 273–282 (2019).

  50. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).

  51. Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).

  52. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

  53. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  54. Barkas, N. et al. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat. Methods 16, 695–698 (2019).

  55. Pons, P. & Latapy, M. Computing communities in large networks using random walks. J. Graph Algorithms Appl. 10, 191–218 (2006).

  56. Gorban, A. N. & Zinovyev, A. Y. in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques 28–59 (IGI Global, 2010).

  57. Hastie, T. & Stuetzle, W. Principal curves. J. Am. Stat. Assoc. 84, 502–516 (1989).

  58. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).

  59. Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).

  60. Soldatov, R. et al. Spatiotemporal structure of cell fate decisions in murine neural crest. Science 364, eaas9536 (2019).

  61. Ji, Z. & Ji, H. TSCAN: pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 44, e11 (2016).

  62. Shin, J. et al. Single-cell RNA-seq with waterfall reveals molecular cascades underlying adult neurogenesis. Cell Stem Cell 17, 360–372 (2015).

  63. Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19, 477 (2018).

  64. Wolf, F. A. et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 20, 1–9 (2019).

  65. Hrvatin, S. et al. Single-cell analysis of experience-dependent transcriptomic states in the mouse visual cortex. Nat. Neurosci. 21, 120–129 (2018).

  66. Schiebinger, G. et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell 176, 928–943 (2019).

  67. Tran, T. N. & Bader, G. D. Tempora: cell trajectory inference using time-series single-cell RNA sequencing data. PLoS Comput. Biol. 16, e1008205 (2020).

  68. Weinreb, C., Wolock, S., Tusi, B. K., Socolovsky, M. & Klein, A. M. Fundamental limits on dynamic inference from single-cell snapshots. Proc. Natl Acad. Sci. USA 115, E2467 (2018).

  69. Grun, D. et al. De novo prediction of stem cell identity using single-cell transcriptome data. Cell Stem Cell 19, 266–277 (2016).

  70. La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).

  71. Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).

  72. Cao, J., Zhou, W., Steemers, F., Trapnell, C. & Shendure, J. Sci-fate characterizes the dynamics of gene expression in single cells. Nat. Biotechnol. 38, 980–988 (2020).

  73. Erhard, F. et al. scSLAM-seq reveals core features of transcription dynamics in single cells. Nature 571, 419–423 (2019).

  74. Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887 (2019).

  75. Cusanovich, D. A. et al. A single-cell atlas of in vivo mammalian chromatin accessibility. Cell 174, 1309–1324 (2018).

  76. Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).

  77. Lein, E., Borm, L. E. & Linnarsson, S. The promise of spatial transcriptomics for neuroscience in the era of molecular cell typing. Science 358, 64–69 (2017).

  78. Moffitt, J. R. et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362, eaau5324 (2018).

  79. Eng, C.-H. L. et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+. Nature 568, 235–239 (2019).

  80. Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).

  81. Lake, B. B. et al. Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain. Nat. Biotechnol. 36, 70–80 (2018).

  82. Angerer, P. et al. destiny: diffusion maps for large-scale single-cell data in R. Bioinformatics 32, 1241–1243 (2016).

  83. Svensson, V., da Veiga Beltrame, E. & Pachter, L. A curated database reveals trends in single-cell transcriptomics. Database (Oxford) 2020, baaa073 (2020).

  84. Byrnes, L. E. et al. Lineage dynamics of murine pancreatic development at single-cell resolution. Nat. Commun. 9, 3922 (2018).

  85. Lo Giudice, Q., Leleu, M., La Manno, G. & Fabre, P. J. Single-cell transcriptional logic of cell-fate specification and axon guidance in early-born retinal neurons. Development 146, dev178103 (2019).

  86. Luecken, M. D. & Theis, F. J. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol. Syst. Biol. 15, e8746 (2019).

  87. Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).

  88. Mao, Q., Yang, L., Wang, L., Goodison, S. & Sun, Y. in Proceedings of the 2015 SIAM International Conference on Data Mining 792–800 (SIAM, 2015).

  89. Tian, L. et al. Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments. Nat. Methods 16, 479–487 (2019).

Acknowledgements

P.V.K. was supported by the NHLBI R01HL131768 award from the NIH and a CAREER (NSF-14-532) award from the NSF.

Author information

Corresponding author

Correspondence to Peter V. Kharchenko.

Ethics declarations

Competing interests

P.V.K. serves on the scientific advisory boards of Celsius Therapeutics and Biomage Inc.

Additional information

Peer review information Nature Methods thanks Martin Hemberg, Michael Morgan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Properties of scRNA-seq measurements.

a. Dependency between cost per cell (x axis) and the expected depth (UMIs per cell, y axis) is shown for a number of popular methods, largely based on the assessment by Ding et al.1. b–d. Systematic transcript-specific bias of different scRNA-seq protocols. b. The scatter plot shows average log10(CPM+1) values for different genes (each dot represents a gene), as assessed using the 10x Chromium (x axis) or Drop-seq (y axis) platforms. Genes showing higher (red) or lower (green) expression (above a 10-fold threshold) are highlighted. c, d. Similar scatter plots are shown for the other two cell lines: H2228 (c) and HCC827 (d) cells. The set of differential genes determined from analysis of the H1975 cell line (b) is shown. Most of the genes that showed a large discrepancy in the detection rate in the H1975 results show the same discrepancy in the other two cell lines, illustrating a stable detection bias between the two platforms. e. The ability to distinguish nearest neighbors decreases as the dimensionality of the space increases. The difference between the distances from the origin to the furthest (maxd) and closest (mind) points, normalized by mind (y axis), is shown for different distance measures as a function of the increasing number of dimensions (x axis). For each dimensionality n, a set of 100 random points is drawn from the n-dimensional uniform distribution, and the median over 1000 draws is shown. The distinction between the closest and furthest points approaches 0 at high dimensions. In other words, relative to the origin, in high-dimensional space the points appear to be distributed on the surface of a high-dimensional sphere. f. Principal tree fit to the PBMC10k dataset. The tree shows a computationally optimal spanning of the PBMC populations, yet interpreting it as a dynamic process is incorrect.
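The calculation described for panel e can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the code used for the figure: it assumes points are drawn uniformly from the unit hypercube [0, 1)^n, uses a Minkowski distance with exponent `p` as the distance measure, and the function names `relative_contrast` and `median_contrast` are hypothetical.

```python
import math
import random
import statistics


def relative_contrast(n_dims, n_points=100, p=2.0, rng=random):
    """Return (maxd - mind) / mind for distances from the origin.

    Points are drawn uniformly from [0, 1)^n_dims (an assumption);
    p is the Minkowski exponent, with math.inf giving the max-norm.
    """
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        if math.isinf(p):
            d = max(point)
        else:
            d = sum(x ** p for x in point) ** (1.0 / p)
        dists.append(d)
    mind, maxd = min(dists), max(dists)
    return (maxd - mind) / mind


def median_contrast(n_dims, n_draws=1000, n_points=100, p=2.0, seed=0):
    """Median of the relative contrast over repeated random draws."""
    rng = random.Random(seed)
    return statistics.median(
        relative_contrast(n_dims, n_points, p, rng) for _ in range(n_draws)
    )


if __name__ == "__main__":
    # A reduced number of draws keeps this pure-Python demo fast.
    for n in (2, 10, 100, 1000):
        print(n, median_contrast(n, n_draws=20))
```

With these assumptions the printed contrast shrinks as `n` grows, which is the effect the panel illustrates; other distance measures can be explored by changing `p`.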

Extended Data Fig. 2 Dimensionality reduction and neural networks.

a. A t-SNE embedding of the PBMC10k dataset (left); projection of cells onto the first two principal components (middle); projection of cells onto the first two basis vectors of the non-negative matrix factorization (right). b. Projection of cells onto the first two principal components, based on re-analysis of a subset of the PBMC10k dataset that contains only T lymphocytes. Given this restricted cellular context, the first two components are much better at capturing the separation between different subsets of T cells, compared to the PCA on the full dataset shown in the previous panel. c. Visualization of the PBMC10k dataset in the 2D latent space determined by the autoencoder structure shown in (d). d. The architecture of the autoencoder used to reduce the dimensions of the PBMC10k dataset in the previous panel. The autoencoder starts with a vector of the 3000 most variable genes, and then for each cell transforms this expression profile through a series of non-linear transformations, first into increasingly narrow dimensions, culminating in a two-dimensional middle layer, and then back into a full 3000-dimensional vector. The values of the two-dimensional middle layer are shown in (c). The parameters of the transformations connecting each layer are optimized so that they minimize the discrepancy between the original expression vector (leftmost layer) and the reconstructed vector (rightmost layer). e, f. Using neural networks to learn a non-linear mapping from the high-dimensional expression state to the coordinates of a t-SNE embedding. As t-SNE embeddings are based on empirical optimization of the relative positions of neighboring cells, there is no obvious analytical function connecting the expression state with the resulting t-SNE coordinates. Neural networks, however, can be used to approximate highly nonlinear and noisy functions. Here, a neural network with the architecture shown in (f) was used to approximate such a function. The parameters of the transformations connecting the layers were optimized based on a training set of 3000 cells, and then an additional set of 3000 test cells was used to illustrate the resulting fit. The left panel in (e) shows the actual positions of the 3000 test cells, and the right plot shows the positions predicted by the trained network.
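The autoencoder structure described in panel d can be sketched as a forward pass plus a reconstruction error, again in standard-library Python. This is an illustration under stated assumptions, not the author's network: the caption gives only the 3000-dimensional input and output and the two-dimensional middle layer, so the intermediate layer widths, the tanh non-linearity, the random initialization, and the mean squared error used as the "discrepancy" are all assumptions, and the optimization of the parameters is omitted. The function names are hypothetical.

```python
import math
import random


def init_layers(sizes, seed=0):
    """Create one (weights, biases) pair per consecutive pair of layer sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Return the activations of every layer, starting with the input."""
    activations = [x]
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations[-1])) + b
             for row, b in zip(weights, biases)]
        # tanh on every layer except the last, which stays linear (assumed)
        if i < len(layers) - 1:
            z = [math.tanh(v) for v in z]
        activations.append(z)
    return activations


def reconstruction_error(x, x_hat):
    """Mean squared discrepancy between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


if __name__ == "__main__":
    # Only 3000 and 2 come from the caption; the other widths are assumed.
    sizes = [3000, 128, 16, 2, 16, 128, 3000]
    layers = init_layers(sizes)
    cell = [random.random() for _ in range(sizes[0])]  # stand-in expression profile
    acts = forward(layers, cell)
    latent = acts[len(sizes) // 2]  # the two-dimensional middle layer
    print(latent, reconstruction_error(cell, acts[-1]))
```

In a trained network the weights would be adjusted to reduce `reconstruction_error` across all cells, and the `latent` values for each cell would give the 2D coordinates plotted in panel c.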

Cite this article

Kharchenko, P.V. The triumphs and limitations of computational methods for scRNA-seq. Nat Methods 18, 723–732 (2021). https://doi.org/10.1038/s41592-021-01171-x
