Dynamic visualization of high-dimensional data

Sun, Eric D.; Ma, Rong; Zou, James

doi:10.1038/s43588-022-00380-4

Resource
Published: 30 December 2022

Nature Computational Science volume 3, pages 86–100 (2023)

Abstract

Dimensionality reduction (DR) is commonly used to project high-dimensional data into lower dimensions for visualization, which could then generate new insights and hypotheses. However, DR algorithms introduce distortions in the visualization and cannot faithfully represent all relations in the data. Thus, there is a need for methods to assess the reliability of DR visualizations. Here we present DynamicViz, a framework for generating dynamic visualizations that capture the sensitivity of DR visualizations to perturbations in the data resulting from bootstrap sampling. DynamicViz can be applied to all commonly used DR methods. We show the utility of dynamic visualizations in diagnosing common interpretative pitfalls of static visualizations and extending existing single-cell analyses. We introduce the variance score to quantify the dynamic variability of observations in these visualizations. The variance score characterizes natural variability in the data and can be used to optimize DR algorithm implementations.
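The bootstrap perturbation described in the abstract can be sketched as follows. This is a minimal illustration, not the DynamicViz implementation: `bootstrap_indices`, `toy_embed`, and `bootstrap_embeddings` are hypothetical names, and `toy_embed` is a stand-in that simply keeps the first two features where a real workflow would call a DR method such as t-SNE or UMAP.

```python
import random


def bootstrap_indices(n, B, seed=0):
    """Return B lists of n row indices, each drawn with replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(B)]


def toy_embed(rows):
    """Hypothetical stand-in for a DR method: keep the first two features."""
    return [(r[0], r[1]) for r in rows]


def bootstrap_embeddings(data, B, embed=toy_embed, seed=0):
    """Embed each bootstrap sample and map coordinates back to original rows.

    frames[b][i] is the 2-D position of original row i in bootstrap b,
    or None when row i was not drawn in that bootstrap.
    """
    n = len(data)
    frames = []
    for idx in bootstrap_indices(n, B, seed):
        coords = embed([data[i] for i in idx])
        frame = [None] * n
        for i, xy in zip(idx, coords):
            frame[i] = xy  # repeated draws of the same row overwrite each other
        frames.append(frame)
    return frames


data = [(0.0, 0.1, 5.0), (1.0, 0.9, 4.0), (2.0, 2.1, 3.0), (3.0, 2.9, 2.0)]
frames = bootstrap_embeddings(data, B=5)
```

Each entry of `frames` corresponds to one bootstrap visualization; rows absent from a given bootstrap are marked `None`, so downstream summaries need to skip them.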

**Fig. 1: Dynamic visualization and its uses.**

**Fig. 2: Variance score and its properties.**

**Fig. 3: Optimizing DR algorithms using variance score.**

**Fig. 4: Application of dynamic visualization and variance score to RNA velocity analysis of single-cell transcriptomic data of mouse pancreas (n = 3,696).**

Data availability

All processed single-cell RNA-seq data on mouse SVZ were accessed from a public repository⁴⁸. Single-cell RNA-seq data on mouse embryonic stem cells are available at https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-2805/. Single-cell RNA-seq data on human bone marrow are available through the Human Cell Atlas data portal at https://prod.data.humancellatlas.org/explore/projects/29f53b7e-071b-44b5-998a-0ae70d0229a4. Single-cell RNA-seq data on gastrulation of the erythroid lineage can be downloaded according to instructions at https://github.com/MarioniLab/EmbryoTimecourse2018. Single-cell RNA-seq data on the mouse pancreas lineage can be found in the Gene Expression Omnibus under accession number GSE132188. The MERFISH spatial transcriptomics data on mouse primary motor cortex are available on the VizGen Resources website at https://vizgen.com/resources/molecular-spatial-and-projection-diversity-of-neurons-in-primary-motor-cortex-revealed-by-in-situ-single-cell-transcriptomics/. The genomic data from the 1000 Genomes Project are available through http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/hd_genotype_chip/. Processed data from the Sloan Digital Sky Survey Data Release 17 were retrieved from https://www.kaggle.com/datasets/fedesoriano/stellar-classification-dataset-sdss17.

Code availability

DynamicViz code is available at https://github.com/sunericd/dynamicviz and also installable through https://pypi.org/project/dynamicviz/. Jupyter notebooks and Python scripts for the experiments and analyses presented in the paper are available at https://github.com/sunericd/dynamic-visualization-of-high-dimensional-data, which also includes example interactive visualization HTML files and animated visualization GIF files. Frozen versions of the software and associated code for analysis are also available⁴⁹.

References

van der Maaten, L. J. P. & Hinton, G. E. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at http://arxiv.org/abs/1802.03426 (2020).
Kobak, D. & Berens, P. The art of using t-SNE for single-cell transcriptomics. Nat. Commun. 10, 5416 (2019).
Su, Y., Shi, Q. & Wei, W. Single cell proteomics in biomedicine: high-dimensional data acquisition, visualization, and analysis. Proteomics 17, 1600267 (2017).
Diaz-Papkovich, A., Anderson-Trocmé, L. & Gravel, S. A review of UMAP in population genetics. J. Hum. Genet. 66, 85–91 (2021).
Anders, F. et al. Dissecting stellar chemical abundance space with t-SNE. Astron. Astrophys. 619, A125 (2018).
Cooley, S. M., Hamilton, T., Aragones, S. D., Ray, J. C. J. & Deeds, E. J. A novel metric reveals previously unrecognized distortion in dimensionality reduction of scRNA-seq data. Preprint at bioRxiv https://doi.org/10.1101/689851 (2022).
Espadoto, M., Martins, R. M., Kerren, A., Hirata, N. S. T. & Telea, A. C. Toward a quantitative survey of dimension reduction techniques. IEEE Trans. Vis. Comput. Graph. 27, 2153–2173 (2021).
Nonato, L. G. & Aupetit, M. Multidimensional projection for visual analytics: linking techniques with distortions, tasks, and layout enrichment. IEEE Trans. Vis. Comput. Graph. 25, 2650–2673 (2019).
Chari, T., Banerjee, J. & Pachter, L. The specious art of single-cell genomics. Preprint at bioRxiv https://doi.org/10.1101/2021.08.25.457696 (2021).
Johnson, E. M., Kath, W. & Mani, M. EMBEDR: distinguishing signal from noise in single-cell omics data. Patterns 3, 100443 (2022).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).
Stuart, T. et al. Comprehensive Integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).
Wattenberg, M., Viégas, F. & Johnson, I. How to use t-SNE effectively. Distill 1, e2 (2016).
Cooley, S. M. Distortion in Dimensionality Reduction and Implications for the Analysis of Single Cell RNA-Sequencing Data. PhD Thesis, Univ. California, Los Angeles, (2021); https://www.proquest.com/docview/2571111018/abstract/1C4D093B947C4AC5PQ/1
Wu, Y., Tamayo, P. & Zhang, K. Visualizing and interpreting single-cell gene expression datasets with similarity weighted nonnegative embedding. Cell Syst. 7, 656–666.e4 (2018).
Paulovich, F. V., Nonato, L. G., Minghim, R. & Levkowitz, H. Least square projection: a fast high-precision multidimensional projection technique and its application to document mapping. IEEE Trans. Vis. Comput. Graph. 14, 564–575 (2008).
Venna, J. & Kaski, S. Visualizing gene interaction graphs with local multidimensional scaling. In Proc. ESANN’06, 14th European Symposium on Artificial Neural Networks 557–562 (d-side group, 2006).
Schreck, T., von Landesberger, T. & Bremm, S. Techniques for precision-based visual analysis of projected data. Inf. Vis. 9, 181–193 (2010).
Aupetit, M. Visualizing distortions and recovering topology in continuous projection techniques. Neurocomputing 70, 1304–1330 (2007).
Buckley, M. T. et al. Cell type-specific aging clocks to quantify aging and rejuvenation in regenerative regions of the brain. Preprint at bioRxiv https://doi.org/10.1101/2022.01.10.475747 (2022).
Dulken, B. W. et al. Single-cell analysis reveals T cell infiltration in old neurogenic niches. Nature 571, 205–210 (2019).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
McVean, G. A. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019).
York, D. G. et al. The Sloan Digital Sky Survey: technical summary. Astron. J. 120, 1579–1587 (2000).
Buettner, F. et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat. Biotechnol. 33, 155–160 (2015).
Bastidas-Ponce, A. et al. Comprehensive single cell mRNA profiling reveals a detailed roadmap for pancreatic endocrinogenesis. Development 146, dev173849 (2019).
Pijuan-Sala, B. et al. A single-cell molecular map of mouse gastrulation and early organogenesis. Nature 566, 490–495 (2019).
Zhang, M. et al. Spatially resolved cell atlas of the mouse primary motor cortex by MERFISH. Nature 598, 137–143 (2021).
Wang, Y., Huang, H., Rudin, C. & Shaposhnik, Y. Understanding how dimension reduction tools work: an empirical approach to deciphering t-SNE, UMAP, TriMap, and PaCMAP for data visualization. J. Mach. Learn. Res. 22, 1–73 (2021).
Amid, E. & Warmuth, M. K. TriMap: large-scale dimensionality reduction using triplets. Preprint at http://arxiv.org/abs/1910.00204 (2019).
Bergen, V., Soldatov, R. A., Kharchenko, P. V. & Theis, F. J. RNA velocity-current challenges and future perspectives. Mol. Syst. Biol. 17, e10282 (2021).
Lange, M. et al. CellRank for directed single-cell fate mapping. Nat. Methods 19, 159–170 (2022).
Tasic, B. et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature 563, 72–78 (2018).
Hinton, G. E. & Roweis, S. Stochastic neighbor embedding. Adv. Neural Inf. Process. Syst. 15, 857–864 (2002).
Joia, P., Coimbra, D., Cuminato, J. A., Paulovich, F. V. & Nonato, L. G. Local affine multidimensional projection. IEEE Trans. Vis. Comput. Graph. 17, 2563–2571 (2011).
Martins, R. M., Minghim, R. & Telea, A. C. in Computer Graphics and Visual Computing (eds Borgo, R. & Turkay, C.), 121–128 (Eurographics Association, 2015).
Martins, R. M., Coimbra, D. B., Minghim, R. & Telea, A. C. Visual analysis of dimensionality reduction quality for parameterized projections. Comput. Graph. 41, 26–42 (2014).
Shao, J. & Tu, D. The Jackknife and Bootstrap Springer Series in Statistics (Springer, 1995); https://doi.org/10.1007/978-1-4612-0795-5
Shao, J. Bootstrap estimation of the asymptotic variances of statistical functionals. Ann. Inst. Stat. Math. 42, 737–752 (1990).
Kokoska, S. & Zwillinger, D. CRC Standard Probability and Statistics Tables and Formulae Student edn (CRC Press, 2000).
McQuitty, L. L. Elementary linkage analysis for isolating orthogonal and oblique types and typal relevancies. Educ. Psychol. Meas. 17, 207–229 (1957).
Hartigan, J. A. Consistency of single linkage for high-density clusters. J. Am. Stat. Assoc. 76, 388–394 (1981).
Sun, E. D. Processed data for Cell type-specific aging clocks to quantify aging and rejuvenation in regenerative regions of the brain. Zenodo https://doi.org/10.5281/zenodo.7145399 (2022).
Sun, E. D. Software for dynamic visualization of high-dimensional data. Zenodo https://doi.org/10.5281/zenodo.7305446 (2022).

Acknowledgements

We thank K. Swanson and C. Yeh for their feedback on the functionality of DynamicViz. Funding support was provided by the Knight-Hennessy Scholars program (E.D.S.), the Paul and Daisy Soros Fellowship for New Americans (E.D.S.), the National Science Foundation Graduate Research Fellowship Program (E.D.S.), D. Donoho at Stanford University (R.M.), NSF CAREER 1942926 (J.Z.), NIH P30AG059307 (J.Z.), 5RM1HG010023 (J.Z.), and grants from the Silicon Valley Foundation (J.Z.) and the Chan-Zuckerberg Initiative (J.Z.).

Author information

Authors and Affiliations

Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
Eric D. Sun & James Zou
Department of Statistics, Stanford University, Stanford, CA, USA
Rong Ma

Authors

Eric D. Sun
Rong Ma
James Zou

Contributions

E.D.S. and J.Z. conceived of the study. E.D.S. designed and implemented the method with input from J.Z. and R.M. R.M. contributed to the theoretical framework for the study. E.D.S. prepared a draft of the manuscript. J.Z. and R.M. edited the manuscript.

Corresponding author

Correspondence to James Zou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Di Yu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Analysis of bridging connections between clusters.

Analysis of bridging connections between clusters. (A) Interactive t-SNE visualization of single-cell transcriptomic data from mouse subventricular zone (SVZ) (n=1000) for diagnosing stability of bridging connections between cell-type clusters in the neural stem cell lineage. (B) Distribution of contact distances across bootstrap visualizations (B=20) of either the neuroblast or astrocyte-qNSC cell cluster to the aNSC-NPC cell cluster in the SVZ data shown in panel A. Statistical significance was assessed with the two-sided Wilcoxon rank-sum test. Center line represents median, box represents interquartile range (IQR), whiskers represent range up to 1.5 × IQR. (C) Same as in panel A except for interactive PCA visualization. (D) Same as in panel B except for interactive PCA visualization. (E) Same as in panel A except for interactive UMAP visualization (n neighbors=100) of the entire SVZ data (n=21458). (F) Same as in panel B except for interactive UMAP visualization (n neighbors=100) of the entire SVZ data (n=21458) and with removal of the one percent closest cells to aNSC-NPC in both the neuroblast and astrocyte-qNSC cell clusters respectively before computing the contact distance. This filtering produces a robust estimate of the contact distance by removing the effect of outlier cells that clustered separately, which are evident in panel E. (G) UMAP visualization plots of three external SVZ single-cell datasets. (H) Interactive t-SNE visualization (perplexity = 40) of genomic data from the 1000 Genomes Project (n=1000) for diagnosing stability of bridging connections between European (EUR) and admixed American (AMR) population clusters. DR visualization settings are outlined in Methods.

Extended Data Fig. 2 Analysis of cluster stability.

Analysis of cluster stability. (A) Animated t-SNE visualization of single-cell transcriptomic data from human bone marrow (n=1000). (B) Distribution of cell-wise silhouette coefficients across bootstrap visualizations (B=20) of the CLP cluster and of the other candidate cluster identified in Figure 1D on the t-SNE visualizations shown in panel A. Statistical significance was assessed with the two-sided Wilcoxon rank-sum test. Center line represents median, box represents interquartile range (IQR), whiskers represent range up to 1.5 × IQR. (C) Same as in panel A except for animated PCA visualization. (D) Same as in panel B except for animated PCA visualization. (E) Same as in panel A except for animated UMAP visualization of the entire human bone marrow data (n=5780). (F) Same as in panel B except for animated UMAP visualization of the entire human bone marrow data (n=5780). (G) Animated visualization of spectral data of stars, quasars, and galaxies from the Sloan Digital Sky Survey (n=1000) queries stability of clusters identified in one visualization (leftmost panel) across other visualizations. DR visualization settings are outlined in Methods.
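The cell-wise silhouette coefficients pooled across bootstrap visualizations in panel B can be illustrated with a small sketch. It uses the standard silhouette definition and assumes, for simplicity, that every cell appears in every bootstrap frame; the function names are hypothetical and this is not the code used in the paper.

```python
import math
import statistics


def silhouette_values(points, labels):
    """Standard silhouette coefficient for each point in one 2-D embedding."""
    n = len(points)
    out = []
    for i in range(n):
        dist_by_label = {}
        for j in range(n):
            if j != i:
                d = math.dist(points[i], points[j])
                dist_by_label.setdefault(labels[j], []).append(d)
        own = dist_by_label.pop(labels[i], None)
        if not own or not dist_by_label:
            out.append(0.0)  # singleton cluster or only one cluster present
            continue
        a = statistics.fmean(own)  # mean distance within own cluster
        b = min(statistics.fmean(d) for d in dist_by_label.values())
        denom = max(a, b)
        out.append((b - a) / denom if denom > 0 else 0.0)
    return out


def cluster_silhouettes_across_bootstraps(frames, labels, cluster):
    """Pool silhouette values of one cluster's cells over all frames."""
    pooled = []
    for points in frames:
        values = silhouette_values(points, labels)
        pooled.extend(v for v, lab in zip(values, labels) if lab == cluster)
    return pooled


labels = ["a", "a", "b", "b"]
separated = [(0, 0), (0, 1), (10, 0), (10, 1)]  # clusters far apart
mixed = [(0, 0), (10, 1), (10, 0), (0, 1)]      # clusters interleaved
pooled = cluster_silhouettes_across_bootstraps([separated, mixed], labels, "a")
```

A cluster that is stable across bootstraps would show consistently high pooled values, whereas a cluster that appears in only some visualizations would show a wide or low distribution.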

Extended Data Fig. 3 Analysis of label separation and continuous trajectories.

Analysis of label separation and continuous trajectories. (A) Stacked UMAP visualization of transcriptomic data of mouse embryonic stem cells (n=288, B=100) undergoing three different phases of cell cycle (right) reveals stable separation of cell cycle phases that is not apparent from a single static visualization (left). (B) Same as in panel A except for stacked PCA visualization. (C) Maximum improvement in the mean silhouette coefficient when comparing DynamicViz stacked visualization with a single bootstrap visualization for eight independent DR methods. (D) Same as in panel A except for the mouse pancreatic lineage data (n=3696). (E) Stacked t-SNE visualization of single-cell transcriptomic data from gastrulation of erythroid lineage (n=1000, B=100) (right) compared to a single static visualization (left). (F) Same as in panel E except for stacked PCA visualization. (G) Same as in panel E except for stacked UMAP visualization of the entire data (n=9815). (H) Same as in panel E except for stacked UMAP visualization of the mouse pancreatic lineage data (n=3696).

Extended Data Fig. 4 Analysis of DynamicViz runtime.

Analysis of DynamicViz runtime. (A) Empirical runtimes measured for generating dynamic visualizations of data drawn from a mixture of five Gaussian distributions (p=50) and broken down into two components: bootstrap DR visualization and rigid alignment of bootstrap visualizations. Individual plots represent different DR methods (i.e. UMAP, t-SNE, or PCA). (B) Same as in panel A except for subsamples drawn from the MERFISH mouse primary motor cortex spatial transcriptomics data. (C) Empirical runtimes measured for computing the variance score using the default global neighborhood definition (see Methods for details) broken down into four components: constructing the neighborhood, computing required pair-wise distances, computing a normalization factor, and calculating the variance of pair-wise distances. Shown are the mean runtimes across all datasets and DR methods presented in panels A and B with error bars corresponding to the standard deviation in runtimes. (D) Same as in panel C except for the variance score using the alternative random neighborhood definition (k=50, see Methods for details). (E) Scatter plot of the variance score with either the global or random neighborhoods (k=50) for all cells in an n=12800 subsample of the MERFISH dataset. (F) Average variance score across t-SNE and UMAP visualizations as a function of the number of observations in data drawn from a mixture of five Gaussian distributions (p=50). (G) Same as in panel F except for subsamples drawn from the MERFISH mouse primary motor cortex spatial transcriptomics data.
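The rigid alignment step named in panel A can be sketched for two-dimensional embeddings. The version below is an assumption-laden illustration: it fits only a rotation and a translation (no reflection or scaling) by least squares over matched points, and `rigid_align` is a hypothetical name rather than a DynamicViz function.

```python
import math


def rigid_align(source, target):
    """Rotate and translate 2-D `source` points onto matched `target` points.

    Uses the closed-form least-squares rotation angle for the plane and
    returns the transformed copy of `source`.
    """
    n = len(source)
    scx = sum(p[0] for p in source) / n
    scy = sum(p[1] for p in source) / n
    tcx = sum(q[0] for q in target) / n
    tcy = sum(q[1] for q in target) / n
    src = [(p[0] - scx, p[1] - scy) for p in source]
    tgt = [(q[0] - tcx, q[1] - tcy) for q in target]
    # Sum of cross products and dot products between centered point pairs.
    cross = sum(p[0] * q[1] - p[1] * q[0] for p, q in zip(src, tgt))
    dot = sum(p[0] * q[0] + p[1] * q[1] for p, q in zip(src, tgt))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tcx, s * x + c * y + tcy) for x, y in src]


reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
angle = 0.7
rotated = [
    (math.cos(angle) * x - math.sin(angle) * y + 5.0,
     math.sin(angle) * x + math.cos(angle) * y - 3.0)
    for x, y in reference
]
aligned = rigid_align(rotated, reference)
```

In a bootstrap setting, `source` and `target` would hold the coordinates of the observations shared between a bootstrap visualization and the reference visualization.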

Extended Data Fig. 5 Random approximation of the global variance score.

Random approximation of the global variance score for different choices of the number of random neighbors used, k, and for different combinations of DR visualization methods and data. (A) UMAP visualizations with synthetic data drawn from a mixture of Gaussians (n=1000, p=50). (B) t-SNE visualizations for the same synthetic data. (C) UMAP visualizations for the mouse subventricular zone (SVZ) single-cell transcriptomics data. (D) t-SNE visualizations for the same SVZ data. Generally, the random method for computing variance scores (see Methods for details) is a good approximation of the global variance score and the quality of this approximation increases with k. Shaded region corresponds to 95% confidence interval in all panels.
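The relationship between the global and random neighborhoods can be illustrated with a simplified variance score. The exact definition is given in the paper's Methods, which are not reproduced here, so the formula below is an assumption: for each observation, average the across-bootstrap variance of its distances to its neighbors, then divide by the squared mean distance so that the score does not depend on the embedding scale. `variance_scores` is a hypothetical name.

```python
import math
import random
import statistics


def variance_scores(frames, k=None, seed=0):
    """Simplified variance score per observation (assumed definition).

    frames[b][i] is the aligned (x, y) of observation i in bootstrap b,
    or None if it was not sampled. k=None uses all other observations
    (global neighborhood); otherwise k random neighbors are used.
    """
    n = len(frames[0])
    rng = random.Random(seed)
    raw, all_d = [], []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        use_all = k is None or k >= len(others)
        nbrs = others if use_all else rng.sample(others, k)
        per_pair = []
        for j in nbrs:
            d = [math.dist(f[i], f[j]) for f in frames
                 if f[i] is not None and f[j] is not None]
            if len(d) >= 2:  # need two co-occurrences to get a variance
                per_pair.append(statistics.pvariance(d))
                all_d.extend(d)
        raw.append(statistics.fmean(per_pair) if per_pair else float("nan"))
    norm = statistics.fmean(all_d) ** 2  # scale normalization (assumption)
    return [r / norm for r in raw]


# Observations 0 and 1 stay fixed; observation 2 moves between bootstraps.
frames = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)],
]
global_scores = variance_scores(frames)
random_scores = variance_scores(frames, k=1)
```

With the global neighborhood the cost grows with the number of observation pairs, which is why sampling k random neighbors per observation is attractive for large datasets.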

Extended Data Fig. 6 Analysis of variance score properties.

Analysis of variance score properties. (A) Silhouette scores computed for t-SNE visualizations of data drawn repeatedly from a mixture of Gaussian distributions (100 repeated draws, n=500, p=100) compared to silhouette scores for 100 bootstrap samples from one initial sample from the mixture of Gaussian distributions. Shown are bootstrap results for three representative initial samples. No statistically significant differences between any groups at two-sided Wilcoxon rank-sum test p-value cut-off of 0.05. Center line represents median, box represents interquartile range (IQR), whiskers represent range up to 1.5 × IQR. (B) Variance in the Euclidean distance between two predetermined observations in t-SNE visualization across either 100 bootstrap sample data or 100 resampled data from a Gaussian mixture model (50 features, 5 distributions) for different numbers of observations n. 95% confidence intervals are shown for 20 pairs of predetermined observations. (C) Same setting as panel B except with n=1000 and different number of bootstrap samples or resamples B. (D) t-SNE visualization of mixture of five Gaussian distributions (n=1000, p=50). (E) UMAP and t-SNE visualizations of the mouse SVZ single-cell data (n=1000). (F) Relative contributions of the marginal variance scores (B=100) for dynamic t-SNE visualizations of synthetic data drawn from a mixture of five Gaussian distributions (n in [320,640,1280,2560], p=50), of single-cell transcriptomic data from mouse subventricular zone (n in [320,640,1000]), and of single-cell transcriptomic data from mouse pancreatic lineage (n in [320,640,1000]). (G) t-SNE visualization of the first replicate of the MERFISH mouse primary motor cortex dataset. Error bar corresponds to 95% confidence interval. (H) Pearson correlation between mean DR quality metrics for t-SNE (including variance score, B=100) computed at the cell-type level on a single replicate and the mean gene variance computed at the cell-type level across all 12 technical and biological replicates in the MERFISH mouse primary motor cortex spatial transcriptomics dataset. Center line corresponds to median and box corresponds to interquartile range. In panels F and H, variance scores were computed for B=100 bootstrap samples using a random neighborhood approximation with k=200 (see Methods for details).

Extended Data Fig. 7 Analysis of DR optimization using the variance score.

Analysis of DR optimization using the variance score. (A) Stacked t-SNE visualizations of the SVZ single-cell data (n=1000) at different perplexity values. Perplexity values correspond to those shown in Figure 3A. (B) Variance scores of UMAP visualizations of the SVZ single-cell data (n=1000) computed for different choices of number of neighbors (neighbors) with stacked UMAP visualizations for the optimal neighbors value (neighbors = 320) and the least optimal case (neighbors = 5). (C) Stacked UMAP visualizations of the SVZ single-cell data (n=1000) at different neighbors values corresponding to panel B. (D) Variance scores of LLE (locally linear embedding) visualizations of the pancreatic cell lineage single-cell data (n=3696) computed for different choices of number of neighbors (neighbors) with stacked LLE visualizations for the optimal neighbors value (neighbors = 500) and the least optimal case (neighbors = 40). (E) Stacked LLE visualizations of the pancreatic cell lineage single-cell data (n=3696) at different neighbors values corresponding to panel D. (F) Distribution of variance scores, computed using random neighborhood definition with k=200 on the same data as in panels D and E, for the optimal hyperparameter choices (lowest variance score) of different DR algorithms (perplexity: t-SNE; number of neighbors: UMAP, ISOMAP, LLE, PACMAP; number of inliers: TRIMAP) shown with stacked visualizations of all bootstrap visualizations from three representative DR algorithms. Center line corresponds to median and box corresponds to interquartile range. Variance scores and dynamic visualizations were computed for B=100 bootstrap samples. In panels D and F, variance scores were computed for B=100 bootstrap samples using a random neighborhood approximation with k=200 (see Methods for details).

Extended Data Fig. 8 Application of dynamic visualizations and variance scores to RNA velocity analysis of single-cell data for gastrulation of the erythroid lineage.

Application of dynamic visualizations and variance scores to RNA velocity analysis of single-cell data for gastrulation of the erythroid lineage. (A) RNA velocity embedding stream UMAP plots for the original data and two bootstrapped versions (left to right). (B) UMAP visualization of the original data with colors corresponding to variance score. (C) UMAP visualization of the original data with colors corresponding to RNA velocity pseudotime. (D) Median rank-ordered pseudotimes computed for each cell over bootstrap UMAP visualizations with gray shading corresponding to 95% confidence interval. (E) Predicted terminal states of a Blood Progenitor 1 cell using RNA velocity trajectory analysis across bootstrap UMAP visualizations and transitions traced across representative bootstrap visualizations for two terminal states (Erythroid1, Erythroid2). Color corresponds to pseudotime along trajectory. Dynamic visualization and variance score provide a more detailed picture of standard RNA velocity analyses in the gastrulation of erythroid lineage, including the stability of RNA velocity streams, cell fates, and pseudotime. Variance scores, pseudotimes, and trajectory predictions were computed for B=100 bootstrap samples. Variance scores were computed using the random neighborhood approximation with k=50 (see Methods).

Extended Data Fig. 9 Application of dynamic visualizations and variance scores to RNA velocity analysis of single-cell data of human bone marrow.

Application of dynamic visualizations and variance scores to RNA velocity analysis of single-cell data of human bone marrow. (A) RNA velocity embedding stream UMAP plots for the original data and two bootstrapped versions (left to right). (B) UMAP visualization of the original data with colors corresponding to variance score. (C) UMAP visualization of the original data with colors corresponding to a locally defined variance score where each cell-type cluster is a unique neighborhood. (D) UMAP visualization of the original data with colors corresponding to RNA velocity pseudotime. (E) Median rank-ordered pseudotimes computed for each cell over bootstrap UMAP visualizations with gray shading corresponding to 95% confidence interval. (F) Predicted terminal states of a Mono1 cell using RNA velocity trajectory analysis across bootstrap UMAP visualizations and transitions traced across representative bootstrap visualizations for two terminal states (HSC1, CLP). Color corresponds to pseudotime along trajectory. Dynamic visualization and variance score provide a more detailed picture of standard RNA velocity analyses in the human bone marrow lineage, including the stability of RNA velocity streams, cell fates, and pseudotime. Variance scores, pseudotimes, and trajectory predictions were computed for B=100 bootstrap samples. Variance scores were computed using the random neighborhood approximation with k=50 (see Methods).

Supplementary information

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, E.D., Ma, R. & Zou, J. Dynamic visualization of high-dimensional data. Nat Comput Sci 3, 86–100 (2023). https://doi.org/10.1038/s43588-022-00380-4

Received: 02 June 2022
Accepted: 16 November 2022
Published: 30 December 2022
Issue Date: January 2023
DOI: https://doi.org/10.1038/s43588-022-00380-4
