The triumphs and limitations of computational methods for scRNA-seq

Kharchenko, Peter V.

doi:10.1038/s41592-021-01171-x

Review Article
Published: 21 June 2021

Peter V. Kharchenko ORCID: orcid.org/0000-0002-6036-5875¹

Nature Methods volume 18, pages 723–732 (2021)

A Publisher Correction to this article was published on 30 June 2021

This article has been updated

Abstract

The rapid progress of protocols for sequencing single-cell transcriptomes over the past decade has been accompanied by equally impressive advances in the computational methods for analysis of such data. As capacity and accuracy of the experimental techniques grew, the emerging algorithm developments revealed increasingly complex facets of the underlying biology, from cell type composition to gene regulation to developmental dynamics. At the same time, rapid growth has forced continuous reevaluation of the underlying statistical models, experimental aims, and sheer volumes of data processing that are handled by these computational tools. Here, I review key computational steps of single-cell RNA sequencing (scRNA-seq) analysis, examine assumptions made by different approaches, and highlight successes, remaining ambiguities, and limitations that are important to keep in mind as scRNA-seq becomes a mainstream technique for studying biology.


**Fig. 1: Key preprocessing steps in single-cell RNA-seq analysis.**

**Fig. 2: Key analysis steps in single-cell RNA-seq analysis.**

**Fig. 4: Approximating and partitioning complex manifolds.**

**Fig. 5: Approximating dynamical processes.**

Data availability

The following scRNA-seq datasets were used in creating example figures:

• 10x Genomics PBMC 10k (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3).

• 10x Genomics PBMC 66k (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.1.0/5k_pbmc_NGSC3_aggr).

• Fetal pancreas: E12.5 timepoint data from Byrnes et al.⁸⁴ were downloaded from GEO (GSM3140915).

• Mouse developing retina: 10x Chromium replicate from Lo Giudice et al.⁸⁵ was downloaded from GEO (GSM3466902).

• Cell lines: Benchmarking data measuring different cell lines on different platforms, taken from Tian et al.⁸⁹, were downloaded from GEO (GSE118767).

• Metadata on the single-cell RNA-seq experiments were taken from http://www.nxn.se/single-cell-studies/.

Code availability

The notebooks and scripts for the figures presented in the paper can be found on the author’s website: http://pklab.med.harvard.edu/peterk/review2020/.

Change history

30 June 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41592-021-01223-2

References

Ding, J. et al. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nat. Biotechnol. 38, 737–746 (2020).
Hagemann-Jensen, M. et al. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat. Biotechnol. 38, 708–714 (2020).
Brennecke, P. et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093–1095 (2013).
Kharchenko, P. V., Silberstein, L. & Scadden, D. T. Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, 740–742 (2014).
Vu, T. N. et al. Beta-Poisson model for single-cell RNA-seq data analyses. Bioinformatics 32, 2128–2135 (2016).
Svensson, V. Droplet scRNA-seq is not zero-inflated. Nat. Biotechnol. 38, 147–150 (2020).
Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S. & Vert, J. P. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9, 284 (2018).
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
Wang, T., Li, B., Nelson, C. E. & Nabavi, S. Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. BMC Bioinf. 20, 40 (2019).
Soneson, C. & Robinson, M. D. Bias, robustness and scalability in single-cell differential expression analysis. Nat. Methods 15, 255–261 (2018).
Vallejos, C. A., Richardson, S. & Marioni, J. C. Beyond comparisons of means: understanding changes in gene expression at the single-cell level. Genome Biol. 17, 70 (2016).
Nabavi, S., Schmolze, D., Maitituoheti, M., Malladi, S. & Beck, A. H. EMDomics: a robust and powerful method for the identification of genes differentially expressed between heterogeneous classes. Bioinformatics 32, 533–541 (2016).
Korthauer, K. D. et al. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biol. 17, 222 (2016).
Martinez-Jimenez, C. P. et al. Aging increases cell-to-cell transcriptional variability upon immune stimulation. Science 355, 1433–1436 (2017).
Crowell, H. L. et al. muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nat. Commun. 11, 6077 (2020).
Jaitin, D. A. et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776–779 (2014).
Ntranos, V., Kamath, G. M., Zhang, J. M., Pachter, L. & Tse, D. N. Fast and accurate single-cell RNA-seq analysis by clustering of transcript-compatibility counts. Genome Biol. 17, 112 (2016).
Aggarwal, C. C., Hinneburg, A. & Keim, D. A. in Database Theory — ICDT 2001. (eds Van den Bussche, J. & Vianu, V.) 420–434 (Springer Berlin Heidelberg, 2001).
Jiang, L. et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 21, 1543–1551 (2011).
Fan, J. et al. Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis. Nat. Methods 13, 241–244 (2016).
Eling, N., Richard, A. C., Richardson, S., Marioni, J. C. & Vallejos, C. A. Correcting the mean-variance dependency for differential variability testing using single-cell RNA sequencing data. Cell Syst. 7, 284–294 (2018).
Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20, 296 (2019).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 e1821 (2019).
Azizi, E. et al. Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell 174, 1293–1308 (2018).
Shao, C. & Hofer, T. Robust classification of single-cell transcriptome data by nonnegative matrix factorization. Bioinformatics 33, 235–242 (2017).
Zhu, X., Ching, T., Pan, X., Weissman, S. M. & Garmire, L. Detecting heterogeneity in single-cell RNA-seq data by non-negative matrix factorization. PeerJ 5, e2888 (2017).
Pierson, E. & Yau, C. ZIFA: dimensionality reduction for zero-inflated single-cell gene expression analysis. Genome Biol. 16, 241 (2015).
Zhou, M. Nonparametric Bayesian negative binomial factor analysis. Bayesian Analysis 13, 1065–1093 (2018).
Zhang, L. & Mallick, B. K. Inferring gene networks from discrete expression data. Biostatistics 14, 708–722 (2013).
Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S. & Vert, J. P. Publisher Correction: A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 10, 646 (2019).
Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S. & Theis, F. J. Single-cell RNA-seq denoising using a deep count autoencoder. Nat. Commun. 10, 390 (2019).
Amodio, M. et al. Exploring single-cell data with deep multitasking neural networks. Nat. Methods 16, 1139–1145 (2019).
Aggarwal, C. C. Neural Networks and Deep Learning: A Textbook. (Springer International Publishing, 2018).
Svensson, V., Gayoso, A., Yosef, N. & Pachter, L. Interpretable factor models of single-cell RNA-seq via variational autoencoders. Bioinformatics 36, 3418–3421 (2020).
Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160–1167 (2011).
Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
Levine, J. H. et al. Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell 162, 184–197 (2015).
Haghverdi, L., Buettner, F. & Theis, F. J. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics 31, 2989–2998 (2015).
Jarvis, R. A. & Patrick, E. A. Clustering using a similarity measure based on shared near neighbors. IEEE Trans. Comput. C-22, 1025–1034 (1973).
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Preprint at bioRxiv https://doi.org/10.1101/2020.05.22.111161 (2020).
Haghverdi, L., Lun, A. T. L., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).
Van Mieghem, P. Graph Spectra for Complex Networks. (Cambridge University Press, 2010).
Haghverdi, L., Buttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).
Maaten, L. V. D. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
McInnes, L., Healy, J. & Melville, J. Umap: Uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
Amir el, A. D. et al. viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nat. Biotechnol. 31, 545–552 (2013).
Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2018).
Kiselev, V. Y., Andrews, T. S. & Hemberg, M. Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet. 20, 273–282 (2019).
Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Barkas, N. et al. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat. Methods 16, 695–698 (2019).
Pons, P. & Latapy, M. Computing communities in large networks using random walks. J. Graph Algorithms Appl. 10, 191–218 (2006).
Gorban, A. N. & Zinovyev, A. Y. in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques 28–59 (IGI Global, 2010).
Hastie, T. & Stuetzle, W. Principal curves. J. Am. Stat. Assoc. 84, 502–516 (1989).
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).
Soldatov, R. et al. Spatiotemporal structure of cell fate decisions in murine neural crest. Science 364, eaas9536 (2019).
Ji, Z. & Ji, H. TSCAN: pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 44, e11 (2016).
Shin, J. et al. Single-cell RNA-seq with waterfall reveals molecular cascades underlying adult neurogenesis. Cell Stem Cell 17, 360–372 (2015).
Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19, 477 (2018).
Wolf, F. A. et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol. 20, 1–9 (2019).
Hrvatin, S. et al. Single-cell analysis of experience-dependent transcriptomic states in the mouse visual cortex. Nat. Neurosci. 21, 120–129 (2018).
Schiebinger, G. et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell 176, 928–943 (2019).
Tran, T. N. & Bader, G. D. Tempora: cell trajectory inference using time-series single-cell RNA sequencing data. PLoS Comput. Biol. 16, e1008205 (2020).
Weinreb, C., Wolock, S., Tusi, B. K., Socolovsky, M. & Klein, A. M. Fundamental limits on dynamic inference from single-cell snapshots. Proc. Natl Acad. Sci. USA 115, E2467 (2018).
Grun, D. et al. De novo prediction of stem cell identity using single-cell transcriptome data. Cell Stem Cell 19, 266–277 (2016).
La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).
Cao, J., Zhou, W., Steemers, F., Trapnell, C. & Shendure, J. Sci-fate characterizes the dynamics of gene expression in single cells. Nat. Biotechnol. 38, 980–988 (2020).
Erhard, F. et al. scSLAM-seq reveals core features of transcription dynamics in single cells. Nature 571, 419–423 (2019).
Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887 (2019).
Cusanovich, D. A. et al. A single-cell atlas of in vivo mammalian chromatin accessibility. Cell 174, 1309–1324 (2018).
Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).
Lein, E., Borm, L. E. & Linnarsson, S. The promise of spatial transcriptomics for neuroscience in the era of molecular cell typing. Science 358, 64–69 (2017).
Moffitt, J. R. et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362, eaau5324 (2018).
Eng, C.-H. L. et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+. Nature 568, 235–239 (2019).
Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).
Lake, B. B. et al. Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain. Nat. Biotechnol. 36, 70–80 (2018).
Angerer, P. et al. destiny: diffusion maps for large-scale single-cell data in R. Bioinformatics 32, 1241–1243 (2016).
Svensson, V., da Veiga Beltrame, E. & Pachter, L. A curated database reveals trends in single-cell transcriptomics. Database (Oxford) 2020, baaa073 (2020).
Byrnes, L. E. et al. Lineage dynamics of murine pancreatic development at single-cell resolution. Nat. Commun. 9, 3922 (2018).
Lo Giudice, Q., Leleu, M., La Manno, G. & Fabre, P. J. Single-cell transcriptional logic of cell-fate specification and axon guidance in early-born retinal neurons. Development 146, dev178103 (2019).
Luecken, M. D. & Theis, F. J. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol. Syst. Biol. 15, e8746 (2019).
Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).
Mao, Q., Yang, L., Wang, L., Goodison, S. & Sun, Y. in Proceedings of the 2015 SIAM International Conference on Data Mining 792–800 (SIAM, 2015).
Tian, L. et al. Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments. Nat. Methods 16, 479–487 (2019).

Acknowledgements

P.V.K. was supported by the NHLBI R01HL131768 award from NIH and CAREER (NSF-14-532) award from NSF.

Author information

Authors and Affiliations

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Peter V. Kharchenko

Corresponding author

Correspondence to Peter V. Kharchenko.

Ethics declarations

Competing interests

P.V.K. serves on the scientific advisory boards of Celsius Therapeutics and Biomage Inc.

Additional information

Peer review information Nature Methods thanks Martin Hemberg, Michael Morgan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Properties of scRNA-seq measurements.

a. Dependency between cost per cell (x axis) and the expected depth (UMIs per cell, y axis) is shown for a number of popular methods, largely based on the assessment by Ding et al.¹. b–d. Systematic transcript-specific bias of different scRNA-seq protocols. b. The scatter plot shows average log10(CPM+1) values for different genes (each dot represents a gene), as assessed using the 10x Chromium (x axis) or Drop-seq (y axis) platforms. Genes showing higher (red) or lower (green) expression (above a 10-fold threshold) are highlighted. c, d. Similar scatter plots are shown for the other two cell lines: H2228 (c) and HCC827 (d) cells. The set of differential genes determined from analysis of the H1975 cell line (b) is shown. Most of the genes that showed a large discrepancy in the detection rate in the H1975 results also show the same discrepancy in the other two cell lines, illustrating stable detection bias between the two platforms. e. The ability to distinguish nearest neighbors decreases as the dimensionality of the space increases. The difference between the closest (min_d) and furthest (max_d) points from the origin, normalized by min_d (y axis), is shown for different distance measures as a function of increasing number of dimensions (x axis). For each dimensionality n, a set of 100 random points is drawn from the n-dimensional uniform distribution, and the median of 1000 draws is shown. The distinction between closest and furthest points approaches 0 at high dimensions. In other words, relative to the origin, in high-dimensional space the points appear to be distributed on the surface of a high-dimensional sphere. f. Principal tree fit to the PBMC10k dataset. The tree shows a computationally optimal spanning of the PBMC populations, yet interpreting it as a dynamic process is incorrect.
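The simulation described for panel e can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the figure: points are assumed to be uniform on the unit hypercube, only the Euclidean distance is considered (the panel compares several distance measures), and the default number of draws is reduced from 1000 to keep the pure-Python loop fast. The function names `relative_contrast` and `median_contrast` are hypothetical.

```python
import math
import random
import statistics


def relative_contrast(n_dims, n_points=100, rng=random):
    """Return (max_d - min_d) / min_d for Euclidean distances from the origin."""
    dists = []
    for _ in range(n_points):
        # Assumption: each coordinate is uniform on [0, 1).
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.sqrt(sum(x * x for x in point)))
    min_d, max_d = min(dists), max(dists)
    return (max_d - min_d) / min_d


def median_contrast(n_dims, n_draws=20, seed=0):
    """Median relative contrast over repeated draws of n_points random points."""
    rng = random.Random(seed)
    return statistics.median(
        relative_contrast(n_dims, rng=rng) for _ in range(n_draws)
    )


if __name__ == "__main__":
    # The contrast should shrink as the number of dimensions grows.
    for n in (2, 10, 100, 1000):
        print(n, round(median_contrast(n), 3))
```

Under these assumptions the printed contrast falls steadily with the number of dimensions, which is the behavior the panel illustrates.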

Extended Data Fig. 2 Dimensionality reduction and neural networks.

a. A t-SNE embedding of the PBMC10k dataset (left); projection of cells onto the first two principal components (middle); projection of cells onto the first two basis vectors of the non-negative matrix factorization (right). b. Projection of cells onto the first two principal components, based on re-analysis of a subset of the PBMC10k dataset that contains only T lymphocytes. Given this restricted cellular context, the first two components are much better at capturing separation between different subsets of T cells, compared to the PCA on the full dataset shown in the previous panel. c. Visualization of the PBMC10k dataset in the 2D latent space determined by the autoencoder structure shown in (d). d. The architecture of the autoencoder used to reduce dimensions of the PBMC10k dataset in the previous panel. The autoencoder starts with a vector of the top 3000 most variable genes, and then for each cell transforms this expression profile through a series of non-linear transformations, first into increasingly narrow dimensions, culminating in a two-dimensional middle layer, and then back into a full 3000-dimensional vector. The values of the two-dimensional middle layer are shown in (c). The parameters of the transformations connecting each layer are optimized so that they minimize the discrepancy between the original expression vector (leftmost layer) and the reconstructed vector (rightmost layer). e, f. Using neural networks to learn a non-linear mapping from the high-dimensional expression state to the coordinates of a t-SNE embedding. As t-SNE embeddings are based on empirical optimization of the relative positions of neighboring cells, there is no obvious analytical function connecting the expression state with the resulting t-SNE coordinates. Neural networks, however, can be used to approximate highly nonlinear and noisy functions. Here, a neural network with the architecture shown in (f) was used to approximate such a function. The parameters of the transformations connecting the layers were optimized based on a training set of 3000 cells, and then an additional set of 3000 test cells was used to illustrate the resulting fit. The left panel in (e) shows the actual positions of the 3000 test cells, and the right plot shows the positions predicted by the trained network.
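The bottleneck idea behind the autoencoder in panels c and d can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration rather than the architecture or training procedure used for the figure: the layer widths (32, 2, 32), the tanh non-linearity, the plain gradient descent with step halving, and the synthetic matrix from the hypothetical `toy_expression` helper, which stands in for a normalized cells-by-genes matrix of the most variable genes.

```python
import numpy as np


def init_layers(sizes, rng):
    """Create one (weights, bias) pair per consecutive pair of layer sizes."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(layers, x):
    """Return activations of all layers: input first, reconstruction last."""
    acts = [x]
    for i, (w, b) in enumerate(layers):
        z = acts[-1] @ w + b
        # tanh on every hidden layer (including the bottleneck), linear output
        acts.append(z if i == len(layers) - 1 else np.tanh(z))
    return acts


def loss_and_grads(layers, x):
    """Mean squared reconstruction error and its gradients by backpropagation."""
    acts = forward(layers, x)
    diff = acts[-1] - x
    loss = float(np.mean(diff ** 2))
    delta = 2.0 * diff / diff.size  # gradient w.r.t. the linear output layer
    grads = [None] * len(layers)
    for i in reversed(range(len(layers))):
        w, _ = layers[i]
        grads[i] = (acts[i].T @ delta, delta.sum(axis=0))
        if i > 0:
            # back through the tanh that produced acts[i]
            delta = (delta @ w.T) * (1.0 - acts[i] ** 2)
    return loss, grads


def fit_autoencoder(x, hidden=(32, 2, 32), n_steps=300, lr=0.5, seed=0):
    """Full-batch gradient descent; the step size is halved when a step fails."""
    rng = np.random.default_rng(seed)
    layers = init_layers([x.shape[1], *hidden, x.shape[1]], rng)
    loss, grads = loss_and_grads(layers, x)
    losses = [loss]
    for _ in range(n_steps):
        trial = [(w - lr * gw, b - lr * gb) for (w, b), (gw, gb) in zip(layers, grads)]
        trial_loss, trial_grads = loss_and_grads(trial, x)
        if trial_loss < loss:
            layers, loss, grads = trial, trial_loss, trial_grads
        else:
            lr *= 0.5  # reject the step and retry with a smaller step size
        losses.append(loss)
    return layers, losses


def encode(layers, x):
    """Coordinates of each cell in the narrowest (middle) layer."""
    return forward(layers, x)[len(layers) // 2]


def toy_expression(n_cells=150, n_genes=30, n_types=3, seed=1):
    """Synthetic stand-in for a normalized cells x genes matrix (not real data)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(0.0, 1.0, size=(n_types, n_genes))
    labels = rng.integers(0, n_types, size=n_cells)
    return centers[labels] + rng.normal(0.0, 0.3, size=(n_cells, n_genes))
```

Calling `fit_autoencoder(toy_expression())` returns the trained layers and the loss history, and `encode(layers, x)` gives the two-dimensional coordinates that a panel such as (c) would plot. A practical analysis would more likely rely on a deep learning framework with mini-batches and an adaptive optimizer; the manual backpropagation here only keeps the sketch self-contained.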

About this article

Cite this article

Kharchenko, P.V. The triumphs and limitations of computational methods for scRNA-seq. Nat Methods 18, 723–732 (2021). https://doi.org/10.1038/s41592-021-01171-x

Received: 08 November 2018
Accepted: 29 April 2021
Published: 21 June 2021
Issue Date: July 2021
DOI: https://doi.org/10.1038/s41592-021-01171-x
