Advances in single-cell technologies have enabled high-resolution dissection of tissue composition. Several tools for dimensionality reduction are available to analyze the large number of parameters generated in single-cell studies. Recently, a nonlinear dimensionality-reduction technique, uniform manifold approximation and projection (UMAP), was developed for the analysis of any type of high-dimensional data. Here we apply it to biological data, using three well-characterized mass cytometry and single-cell RNA sequencing datasets. Comparing the performance of UMAP with five other tools, we find that UMAP provides the fastest run times, highest reproducibility and the most meaningful organization of cell clusters. The work highlights the use of UMAP for improved visualization and interpretation of single-cell data.
Subscribe to Journal
Get full journal access for 1 year
only $20.83 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
Saeys, Y., Van Gassen, S. & Lambrecht, B.N. Computational flow cytometry: helping to make sense of high-dimensional immunology data. Nat. Rev. Immunol. 16, 449–462 (2016).
Tenenbaum, J.B., De Silva, V. & Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000).
Coifman, R.R. et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc. Natl. Acad. Sci. USA 102, 7426–7431 (2005).
Van Der Maaten, L. & Hinton, G. Visualizing high-dimensional data using t-SNE. journal of machine learning research. J. Mach. Learn. Res. 9, 26 (2008).
Amir, A.D. et al. viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nat. Biotechnol. 31, 545–552 (2013).
van Unen, V. et al. Mass cytometry of the human mucosal immune system identifies tissue- and disease-associated immune subsets. Immunity 44, 1227–1239 (2016).
McInnes, L. & Healy, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).
Han, X. et al. Mapping the mouse cell atlas by microwell-seq. Cell 172, 1091–1107.e17 (2018).
Samusik, N., Good, Z., Spitzer, M.H., Davis, K.L. & Nolan, G.P. Automated mapping of phenotype space with single-cell data. Nat. Methods 13, 493–496 (2016).
Wong, M.T. et al. A high-dimensional atlas of human T cell diversity reveals tissue-specific trafficking and cytokine signatures. Immunity 45, 442–456 (2016).
Van Der Maaten, L. Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 15, 3221–3245 (2014).
Linderman, G.C., Rachh, M., Hoskins, J.G., Steinerberger, S. & Kluger, Y. Efficient algorithms for t-distributed stochastic neighborhood embedding. Preprint at https://arxiv.org/abs/1712.09005 (2017).
Ding, J., Condon, A. & Shah, S.P. Interpretable dimensionality reduction of single cell transcriptome data with deep generative models. Nat. Commun. 9, 2002 (2018).
Levine, J.H. et al. Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell 162, 184–197 (2015).
Huang, H., Li, Y. & Liu, B. Transcriptional regulation of mast cell and basophil lineage commitment. Semin. Immunopathol. 38, 539–548 (2016).
Wattenberg, M., Viégas, F. & Johnson, I. How to use t-SNE effectively. Distill 1, e2 (2016).
de Graaf, C.A. et al. Haemopedia: an expression atlas of murine hematopoietic cells. Stem Cell Rep. 7, 571–582 (2016).
Mårtensson, I.-L., Keenan, R.A. & Licence, S. The pre-B-cell receptor. Curr. Opin. Immunol. 19, 137–142 (2007).
Wolf, F.A., Angerer, P. & Theis, F.J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086 (2017).
We thank members of the Singapore Immunology Network and notably members of the E.W.N. laboratory. We thank S. Li, Y. Simoni, M. Chng, Y. Cheng, J.W. Lim and M. Fehlings for their insightful feedback. This study was funded by A-STAR/SIgN core funding and A-STAR/SIgN immunomonitoring platform funding.
E.W.N. is a board director and shareholder of immunoSCAPE Pte. Ltd., which is an immune profiling service provider.
Integrated supplementary information
a) Phenotypic characterization of the phenograph clusters. Each cluster medoid is represented after column-wise Z-score transformation. b) Identification of each phenograph cluster of both UMAP (left), t-SNE (middle) and 2D PCA (right). For clarity, only twelve clusters are shown per plot.
Scatterplot of embeddings of the Wong dataset using UMAP (top), t-SNE (middle) and 2D PCA (bottom) color-coded by tissues of origin.
Expression of Ter119 (a marker for mature erythrocytes) color-coded on the UMAP embedding of the Samusik_01 dataset.
Heatmap of the density of a 300x300 square grid of the UMAP or t-SNE projections for the Samusik_01 dataset. The number of events in each bin is color-coded.
Top: UMAP projection of the full Han dataset annotated by AUC scores for various cell lineages (red: high score, blue: low score). Bottom: full Han dataset colored by sample type, Sample ID and pre-filtering status.
Supplementary Figure 6 Side-by-side comparison of each dimensionality reduction method across all datasets annotated by cell types.
Scatterplots of six dimensionality-reduction methods and 6 datasets. Cell populations are annotated using manual gating (Samusik dataset), manually-labelled Phenograph clusters (Wong dataset) or sample of origin (Han_400k dataset).
Embeddings of full datasets as well as subsamples of varying sizes replicated thrice for five dimensionality reduction methods. The color-code is generated using the embedding of the full dataset and propagated to the subsamples. Datasets shown are the a) Samusik_all, b) Wong and c) Han_400k datasets.
About this article
Cite this article
Becht, E., McInnes, L., Healy, J. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 37, 38–44 (2019). https://doi.org/10.1038/nbt.4314
Nature Communications (2020)
Transcriptomics in Toxicogenomics, Part II: Preprocessing and Differential Expression Analysis for High Quality Data
Joint single cell DNA-seq and RNA-seq of gastric cancer cell lines reveals rules of in vitro evolution
NAR Genomics and Bioinformatics (2020)
Single-cell RNA-seq data analysis on the receptor ACE2 expression reveals the potential risk of different human organs vulnerable to 2019-nCoV infection
Frontiers of Medicine (2020)