  • Resource

Dynamic visualization of high-dimensional data

Abstract

Dimensionality reduction (DR) is commonly used to project high-dimensional data into lower dimensions for visualization, which can then generate new insights and hypotheses. However, DR algorithms introduce distortions in the visualization and cannot faithfully represent all relations in the data. Thus, there is a need for methods to assess the reliability of DR visualizations. Here we present DynamicViz, a framework for generating dynamic visualizations that capture the sensitivity of DR visualizations to perturbations in the data resulting from bootstrap sampling. DynamicViz can be applied to all commonly used DR methods. We show the utility of dynamic visualizations in diagnosing common interpretative pitfalls of static visualizations and in extending existing single-cell analyses. We introduce the variance score to quantify the dynamic variability of observations in these visualizations. The variance score characterizes natural variability in the data and can be used to optimize DR algorithm implementations.
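The pipeline described in the abstract has four steps: bootstrap the data, re-run DR on each sample, align the resulting embeddings, and score how much each observation moves. It can be sketched in a few lines of Python. This is a minimal illustration, not the DynamicViz implementation, and it makes several assumptions:

- PCA stands in for the DR method.
- Each bootstrap keeps the first copy of a duplicated row.
- Alignment is an orthogonal Procrustes fit (rotation or reflection plus translation) to the full-data embedding.
- The variance-score normalization, which divides each bootstrap by its RMS distance to the centroid, is an assumption. The paper's Methods define the actual score.

The helper names `bootstrap_embeddings`, `rigid_align` and `variance_score` are hypothetical.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes
from sklearn.decomposition import PCA


def bootstrap_embeddings(X, B=20, n_components=2, seed=0):
    """Embed B bootstrap resamples of X; returns (B, n, d), NaN where a row was not drawn."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    out = np.full((B, n, n_components), np.nan)
    for b in range(B):
        idx = rng.integers(0, n, size=n)  # n rows drawn with replacement
        emb = PCA(n_components=n_components).fit_transform(X[idx])
        uniq, first = np.unique(idx, return_index=True)
        out[b, uniq] = emb[first]  # keep the first copy of each drawn row
    return out


def rigid_align(boot, ref):
    """Translate and rotate/reflect each bootstrap embedding onto ref (orthogonal Procrustes)."""
    aligned = np.full_like(boot, np.nan)
    for b in range(boot.shape[0]):
        m = ~np.isnan(boot[b, :, 0])
        A, T = boot[b, m], ref[m]
        mu_a, mu_t = A.mean(axis=0), T.mean(axis=0)
        R, _ = orthogonal_procrustes(A - mu_a, T - mu_t)
        aligned[b, m] = (A - mu_a) @ R + mu_t
    return aligned


def variance_score(boot):
    """Per-observation mean, over all other observations, of the across-bootstrap
    variance of normalized pairwise distances (global neighborhood)."""
    c = np.nanmean(boot, axis=1, keepdims=True)
    scale = np.sqrt(np.nanmean(((boot - c) ** 2).sum(axis=2), axis=1))
    pts = boot / scale[:, None, None]  # assumed normalization: unit RMS radius
    diff = pts[:, :, None, :] - pts[:, None, :, :]
    var = np.nanvar(np.sqrt((diff ** 2).sum(axis=-1)), axis=0)  # (n, n)
    np.fill_diagonal(var, np.nan)  # an observation is not its own neighbor
    return np.nanmean(var, axis=1)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 10)), rng.normal(4, 1, (60, 10))])
ref = PCA(n_components=2).fit_transform(X)
boot = bootstrap_embeddings(X, B=20)
aligned = rigid_align(boot, ref)
scores = variance_score(aligned)  # one score per observation, shape (120,)
```

Rigid alignment preserves pairwise distances and the RMS radius. The score is therefore the same whether it is computed before or after alignment; alignment matters only for the animated and stacked displays.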

Fig. 1: Dynamic visualization and its uses.
Fig. 2: Variance score and its properties.
Fig. 3: Optimizing DR algorithms using variance score.
Fig. 4: Application of dynamic visualization and variance score to RNA velocity analysis of single-cell transcriptomic data of mouse pancreas (n = 3,696).

Data availability

All processed single-cell RNA-seq data on the mouse SVZ were accessed from a public repository48. Single-cell RNA-seq data on mouse embryonic stem cells are available at https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-2805/. Single-cell RNA-seq data on human bone marrow are available through the Human Cell Atlas data portal at https://prod.data.humancellatlas.org/explore/projects/29f53b7e-071b-44b5-998a-0ae70d0229a4. Single-cell RNA-seq data on gastrulation of the erythroid lineage can be downloaded following the instructions at https://github.com/MarioniLab/EmbryoTimecourse2018. Single-cell RNA-seq data on the mouse pancreatic lineage are available from the Gene Expression Omnibus under accession number GSE132188. The MERFISH spatial transcriptomics data on the mouse primary motor cortex are available on the Vizgen Resources website at https://vizgen.com/resources/molecular-spatial-and-projection-diversity-of-neurons-in-primary-motor-cortex-revealed-by-in-situ-single-cell-transcriptomics/. The genomic data from the 1000 Genomes Project are available at http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/hd_genotype_chip/. Processed data from the Sloan Digital Sky Survey Data Release 17 were retrieved from https://www.kaggle.com/datasets/fedesoriano/stellar-classification-dataset-sdss17.

Code availability

DynamicViz code is available at https://github.com/sunericd/dynamicviz and can also be installed from https://pypi.org/project/dynamicviz/. Jupyter notebooks and Python scripts for the experiments and analyses presented in the paper are available at https://github.com/sunericd/dynamic-visualization-of-high-dimensional-data, which also includes example interactive visualization HTML files and animated visualization GIF files. Frozen versions of the software and associated analysis code are also available49.

References

  1. van der Maaten, L. J. P. & Hinton, G. E. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  2. McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at http://arxiv.org/abs/1802.03426 (2020).

  3. Kobak, D. & Berens, P. The art of using t-SNE for single-cell transcriptomics. Nat. Commun. 10, 5416 (2019).

  4. Su, Y., Shi, Q. & Wei, W. Single cell proteomics in biomedicine: high-dimensional data acquisition, visualization, and analysis. Proteomics 17, 1600267 (2017).

  5. Diaz-Papkovich, A., Anderson-Trocmé, L. & Gravel, S. A review of UMAP in population genetics. J. Hum. Genet. 66, 85–91 (2021).

  6. Anders, F. et al. Dissecting stellar chemical abundance space with t-SNE. Astron. Astrophys. 619, A125 (2018).

  7. Cooley, S. M., Hamilton, T., Aragones, S. D., Ray, J. C. J. & Deeds, E. J. A novel metric reveals previously unrecognized distortion in dimensionality reduction of scRNA-seq data. Preprint at bioRxiv https://doi.org/10.1101/689851 (2022).

  8. Espadoto, M., Martins, R. M., Kerren, A., Hirata, N. S. T. & Telea, A. C. Toward a quantitative survey of dimension reduction techniques. IEEE Trans. Vis. Comput. Graph. 27, 2153–2173 (2021).

  9. Nonato, L. G. & Aupetit, M. Multidimensional projection for visual analytics: linking techniques with distortions, tasks, and layout enrichment. IEEE Trans. Vis. Comput. Graph. 25, 2650–2673 (2019).

  10. Chari, T., Banerjee, J. & Pachter, L. The specious art of single-cell genomics. Preprint at bioRxiv https://doi.org/10.1101/2021.08.25.457696 (2021).

  11. Johnson, E. M., Kath, W. & Mani, M. EMBEDR: distinguishing signal from noise in single-cell omics data. Patterns 3, 100443 (2022).

  12. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).

  13. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).

  14. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

  15. La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).

  16. Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).

  17. Wattenberg, M., Viégas, F. & Johnson, I. How to use t-SNE effectively. Distill 1, e2 (2016).

  18. Cooley, S. M. Distortion in Dimensionality Reduction and Implications for the Analysis of Single Cell RNA-Sequencing Data. PhD thesis, Univ. California, Los Angeles (2021); https://www.proquest.com/docview/2571111018/abstract/1C4D093B947C4AC5PQ/1

  19. Wu, Y., Tamayo, P. & Zhang, K. Visualizing and interpreting single-cell gene expression datasets with similarity weighted nonnegative embedding. Cell Syst. 7, 656–666.e4 (2018).

  20. Paulovich, F. V., Nonato, L. G., Minghim, R. & Levkowitz, H. Least square projection: a fast high-precision multidimensional projection technique and its application to document mapping. IEEE Trans. Vis. Comput. Graph. 14, 564–575 (2008).

  21. Venna, J. & Kaski, S. Visualizing gene interaction graphs with local multidimensional scaling. In Proc. ESANN’06, 14th European Symposium on Artificial Neural Networks 557–562 (d-side group, 2006).

  22. Schreck, T., von Landesberger, T. & Bremm, S. Techniques for precision-based visual analysis of projected data. Inf. Vis. 9, 181–193 (2010).

  23. Aupetit, M. Visualizing distortions and recovering topology in continuous projection techniques. Neurocomputing 70, 1304–1330 (2007).

  24. Buckley, M. T. et al. Cell type-specific aging clocks to quantify aging and rejuvenation in regenerative regions of the brain. Preprint at bioRxiv https://doi.org/10.1101/2022.01.10.475747 (2022).

  25. Dulken, B. W. et al. Single-cell analysis reveals T cell infiltration in old neurogenic niches. Nature 571, 205–210 (2019).

  26. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  27. McVean, G. A. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).

  28. Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019).

  29. York, D. G. et al. The Sloan Digital Sky Survey: technical summary. Astron. J. 120, 1579–1587 (2000).

  30. Buettner, F. et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat. Biotechnol. 33, 155–160 (2015).

  31. Bastidas-Ponce, A. et al. Comprehensive single cell mRNA profiling reveals a detailed roadmap for pancreatic endocrinogenesis. Development 146, dev173849 (2019).

  32. Pijuan-Sala, B. et al. A single-cell molecular map of mouse gastrulation and early organogenesis. Nature 566, 490–495 (2019).

  33. Zhang, M. et al. Spatially resolved cell atlas of the mouse primary motor cortex by MERFISH. Nature 598, 137–143 (2021).

  34. Wang, Y., Huang, H., Rudin, C. & Shaposhnik, Y. Understanding how dimension reduction tools work: an empirical approach to deciphering t-SNE, UMAP, TriMap, and PaCMAP for data visualization. J. Mach. Learn. Res. 22, 1–73 (2021).

  35. Amid, E. & Warmuth, M. K. TriMap: large-scale dimensionality reduction using triplets. Preprint at http://arxiv.org/abs/1910.00204 (2019).

  36. Bergen, V., Soldatov, R. A., Kharchenko, P. V. & Theis, F. J. RNA velocity-current challenges and future perspectives. Mol. Syst. Biol. 17, e10282 (2021).

  37. Lange, M. et al. CellRank for directed single-cell fate mapping. Nat. Methods 19, 159–170 (2022).

  38. Tasic, B. et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature 563, 72–78 (2018).

  39. Hinton, G. E. & Roweis, S. Stochastic neighbor embedding. Adv. Neural Inf. Process. Syst. 15, 857–864 (2002).

  40. Joia, P., Coimbra, D., Cuminato, J. A., Paulovich, F. V. & Nonato, L. G. Local affine multidimensional projection. IEEE Trans. Vis. Comput. Graph. 17, 2563–2571 (2011).

  41. Martins, R. M., Minghim, R. & Telea, A. C. in Computer Graphics and Visual Computing (eds Borgo, R. & Turkay, C.), 121–128 (Eurographics Association, 2015).

  42. Martins, R. M., Coimbra, D. B., Minghim, R. & Telea, A. C. Visual analysis of dimensionality reduction quality for parameterized projections. Comput. Graph. 41, 26–42 (2014).

  43. Shao, J. & Tu, D. The Jackknife and Bootstrap. Springer Series in Statistics (Springer, 1995); https://doi.org/10.1007/978-1-4612-0795-5

  44. Shao, J. Bootstrap estimation of the asymptotic variances of statistical functionals. Ann. Inst. Stat. Math. 42, 737–752 (1990).

  45. Kokoska, S. & Zwillinger, D. CRC Standard Probability and Statistics Tables and Formulae Student edn (CRC Press, 2000).

  46. McQuitty, L. L. Elementary linkage analysis for isolating orthogonal and oblique types and typal relevancies. Educ. Psychol. Meas. 17, 207–229 (1957).

  47. Hartigan, J. A. Consistency of single linkage for high-density clusters. J. Am. Stat. Assoc. 76, 388–394 (1981).

  48. Sun, E. D. Processed data for Cell type-specific aging clocks to quantify aging and rejuvenation in regenerative regions of the brain. Zenodo https://doi.org/10.5281/zenodo.7145399 (2022).

  49. Sun, E. D. Software for dynamic visualization of high-dimensional data. Zenodo https://doi.org/10.5281/zenodo.7305446 (2022).

Acknowledgements

We thank K. Swanson and C. Yeh for their feedback on the functionality of DynamicViz. Funding support was provided by the Knight-Hennessy Scholars program (E.D.S.), the Paul and Daisy Soros Fellowship for New Americans (E.D.S.), the National Science Foundation Graduate Research Fellowship Program (E.D.S.), D. Donoho at Stanford University (R.M.), NSF CAREER 1942926 (J.Z.), NIH P30AG059307 (J.Z.), 5RM1HG010023 (J.Z.), and grants from the Silicon Valley Foundation (J.Z.) and the Chan-Zuckerberg Initiative (J.Z.).

Author information

Contributions

E.D.S. and J.Z. conceived the study. E.D.S. designed and implemented the method with input from J.Z. and R.M. R.M. contributed to the theoretical framework for the study. E.D.S. prepared a draft of the manuscript. J.Z. and R.M. edited the manuscript.

Corresponding author

Correspondence to James Zou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Di Yu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Analysis of bridging connections between clusters.

(A) Interactive t-SNE visualization of single-cell transcriptomic data from the mouse subventricular zone (SVZ) (n = 1,000) for diagnosing the stability of bridging connections between cell-type clusters in the neural stem cell lineage. (B) Distribution of contact distances across bootstrap visualizations (B = 20) from either the neuroblast or the astrocyte-qNSC cluster to the aNSC-NPC cluster in the SVZ data shown in panel A. Statistical significance was assessed with the two-sided Wilcoxon rank-sum test. Center line represents the median, box represents the interquartile range (IQR), and whiskers extend to at most 1.5 × IQR. (C) Same as panel A but for an interactive PCA visualization. (D) Same as panel B but for an interactive PCA visualization. (E) Same as panel A but for an interactive UMAP visualization (number of neighbors = 100) of the entire SVZ dataset (n = 21,458). (F) Same as panel B but for an interactive UMAP visualization (number of neighbors = 100) of the entire SVZ dataset (n = 21,458), with the 1% of cells closest to aNSC-NPC removed from each of the neuroblast and astrocyte-qNSC clusters before computing the contact distance. This filtering gives a robust estimate of the contact distance by removing the effect of outlier cells that cluster separately, which are evident in panel E. (G) UMAP visualizations of three external SVZ single-cell datasets. (H) Interactive t-SNE visualization (perplexity = 40) of genomic data from the 1000 Genomes Project (n = 1,000) for diagnosing the stability of bridging connections between European (EUR) and admixed American (AMR) population clusters. DR visualization settings are outlined in Methods.
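The contact-distance comparison in panels B and F can be sketched as follows. The caption does not define the contact distance, so this sketch makes two assumptions:

- The contact distance is the smallest embedding distance between any cell of one cluster and any cell of the target cluster.
- Panel F's filtering drops the given fraction of cluster cells nearest the target before taking that minimum.

The embeddings are toy 2-D point clouds standing in for aligned bootstrap visualizations, and `contact_distance` is a hypothetical helper. The significance test is SciPy's `ranksums`, which performs a two-sided Wilcoxon rank-sum test by default.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import ranksums


def contact_distance(emb, labels, cluster, target, trim=0.0):
    """Smallest distance from `cluster` cells to `target` cells in one embedding.

    With trim > 0, that fraction of `cluster` cells closest to `target` is
    discarded first, so a few outlying cells cannot set the minimum.
    """
    # distance from each cluster cell to its nearest target cell, ascending
    d_min = np.sort(cdist(emb[labels == cluster], emb[labels == target]).min(axis=1))
    n_drop = int(np.floor(trim * len(d_min)))
    return d_min[n_drop]


rng = np.random.default_rng(0)
labels = np.repeat(np.array(["target", "near", "far"]), 200)
near_d, far_d = [], []
for _ in range(20):  # one toy 2-D embedding per bootstrap, B = 20
    emb = np.vstack([rng.normal([0.0, 0.0], 0.5, (200, 2)),
                     rng.normal([2.0, 0.0], 0.5, (200, 2)),
                     rng.normal([6.0, 0.0], 0.5, (200, 2))])
    near_d.append(contact_distance(emb, labels, "near", "target", trim=0.01))
    far_d.append(contact_distance(emb, labels, "far", "target", trim=0.01))
stat, p = ranksums(near_d, far_d)  # two-sided Wilcoxon rank-sum test
```

Working from sorted nearest-target distances makes the 1% trim a single index shift rather than a second search.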

Extended Data Fig. 2 Analysis of cluster stability.

(A) Animated t-SNE visualization of single-cell transcriptomic data from human bone marrow (n = 1,000). (B) Distribution of cell-wise silhouette coefficients across bootstrap visualizations (B = 20) for the CLP cluster and for the other candidate cluster identified in Figure 1D, computed on the t-SNE visualizations shown in panel A. Statistical significance was assessed with the two-sided Wilcoxon rank-sum test. Center line represents the median, box represents the interquartile range (IQR), and whiskers extend to at most 1.5 × IQR. (C) Same as panel A but for an animated PCA visualization. (D) Same as panel B but for an animated PCA visualization. (E) Same as panel A but for an animated UMAP visualization of the entire human bone marrow dataset (n = 5,780). (F) Same as panel B but for an animated UMAP visualization of the entire human bone marrow dataset (n = 5,780). (G) Animated visualization of spectral data of stars, quasars and galaxies from the Sloan Digital Sky Survey (n = 1,000), which queries the stability of clusters identified in one visualization (leftmost panel) across the other visualizations. DR visualization settings are outlined in Methods.
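Panel B pools per-cell silhouette coefficients over bootstrap embeddings, which might look like the sketch below. It relies on scikit-learn's `silhouette_samples` and SciPy's `ranksums`. Several choices are assumptions:

- PCA replaces t-SNE to keep the sketch fast.
- The three-cluster Gaussian data are synthetic.
- `bootstrap_silhouettes` is a hypothetical helper.

Labels follow their rows through the bootstrap, so a duplicated cell contributes once per copy.

```python
import numpy as np
from scipy.stats import ranksums
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_samples


def bootstrap_silhouettes(X, labels, cluster, B=20, seed=0):
    """Per-cell silhouette coefficients of `cluster`, pooled over B bootstrap embeddings."""
    rng = np.random.default_rng(seed)
    pooled = []
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap rows
        emb = PCA(n_components=2).fit_transform(X[idx])
        s = silhouette_samples(emb, labels[idx])  # one coefficient per embedded row
        pooled.append(s[labels[idx] == cluster])
    return np.concatenate(pooled)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1, (60, 10)),   # well-separated cluster "A"
               rng.normal(3.0, 1, (60, 10)),   # "B" overlaps heavily with "C"
               rng.normal(3.5, 1, (60, 10))])
labels = np.repeat(np.array(["A", "B", "C"]), 60)
sil_a = bootstrap_silhouettes(X, labels, "A")
sil_b = bootstrap_silhouettes(X, labels, "B")
stat, p = ranksums(sil_a, sil_b)
```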

Extended Data Fig. 3 Analysis of label separation and continuous trajectories.

(A) Stacked UMAP visualization of transcriptomic data of mouse embryonic stem cells (n = 288, B = 100) in three different cell-cycle phases (right) reveals a stable separation of cell-cycle phases that is not apparent from a single static visualization (left). (B) Same as panel A but for a stacked PCA visualization. (C) Maximum improvement in the mean silhouette coefficient when comparing the DynamicViz stacked visualization with a single bootstrap visualization, for eight independent DR methods. (D) Same as panel A but for the mouse pancreatic lineage data (n = 3,696). (E) Stacked t-SNE visualization of single-cell transcriptomic data from gastrulation of the erythroid lineage (n = 1,000, B = 100) (right) compared with a single static visualization (left). (F) Same as panel E but for a stacked PCA visualization. (G) Same as panel E but for a stacked UMAP visualization of the entire dataset (n = 9,815). (H) Same as panel E but for a stacked UMAP visualization of the mouse pancreatic lineage data (n = 3,696).
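A stacked visualization superimposes all aligned bootstrap embeddings in one plot. The sketch below builds such a stack and compares its mean silhouette coefficient with that of each single bootstrap embedding, using scikit-learn's `silhouette_score`. It rests on three assumptions:

- PCA is the DR method.
- Each bootstrap is aligned by orthogonal Procrustes onto the full-data embedding of the same rows.
- Panel C's "maximum improvement" is read as the largest gap between the stacked score and any single bootstrap score.

The helper `stacked_vs_single` and the toy data are hypothetical.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score


def stacked_vs_single(X, labels, B=20, seed=0):
    """Mean silhouette of the stacked bootstrap cloud and of each single bootstrap."""
    rng = np.random.default_rng(seed)
    ref = PCA(n_components=2).fit_transform(X)
    clouds, cloud_labels, singles = [], [], []
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))
        emb = PCA(n_components=2).fit_transform(X[idx])
        target = ref[idx]  # reference coordinates of the same rows
        mu_e, mu_t = emb.mean(axis=0), target.mean(axis=0)
        R, _ = orthogonal_procrustes(emb - mu_e, target - mu_t)
        emb = (emb - mu_e) @ R + mu_t  # rigid alignment onto the reference
        singles.append(silhouette_score(emb, labels[idx]))
        clouds.append(emb)
        cloud_labels.append(labels[idx])
    # the stack treats every aligned bootstrap point as one point of a single cloud
    stacked = silhouette_score(np.vstack(clouds), np.concatenate(cloud_labels))
    return stacked, np.array(singles)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1, (50, 8)) for m in (0.0, 1.5, 3.0)])
labels = np.repeat(np.array(["c0", "c1", "c2"]), 50)
stacked, singles = stacked_vs_single(X, labels)
max_improvement = np.max(stacked - singles)
```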

Extended Data Fig. 4 Analysis of DynamicViz runtime.

(A) Empirical runtimes for generating dynamic visualizations of data drawn from a mixture of five Gaussian distributions (p = 50), broken down into two components: bootstrap DR visualization and rigid alignment of the bootstrap visualizations. Individual plots correspond to different DR methods (UMAP, t-SNE or PCA). (B) Same as panel A but for subsamples drawn from the MERFISH mouse primary motor cortex spatial transcriptomics data. (C) Empirical runtimes for computing the variance score using the default global neighborhood definition (see Methods for details), broken down into four components: constructing the neighborhood, computing the required pairwise distances, computing a normalization factor and calculating the variance of the pairwise distances. Shown are the mean runtimes across all datasets and DR methods in panels A and B, with error bars corresponding to the standard deviation of the runtimes. (D) Same as panel C but for the variance score using the alternative random neighborhood definition (k = 50; see Methods for details). (E) Scatter plot of the variance score computed with either the global or the random neighborhood (k = 50) for all cells in an n = 12,800 subsample of the MERFISH dataset. (F) Average variance score across t-SNE and UMAP visualizations as a function of the number of observations, for data drawn from a mixture of five Gaussian distributions (p = 50). (G) Same as panel F but for subsamples drawn from the MERFISH mouse primary motor cortex spatial transcriptomics data.
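The four variance-score stages named in panel C can be timed separately, as in this sketch. It uses `time.perf_counter` and an assumed normalization that scales each bootstrap to unit RMS distance from its centroid. `timed_variance_score` and the jittered toy coordinates are hypothetical. The distance stage builds a (B, n, n) array, which is why the global neighborhood grows quadratically with n in both time and memory.

```python
import time

import numpy as np


def timed_variance_score(boot):
    """Global variance score, timing the four stages listed in panel C.

    boot: (B, n, d) bootstrap coordinates, NaN where a row was not drawn.
    Returns (scores, timings) with per-stage wall-clock seconds.
    """
    timings = {}
    n = boot.shape[1]

    t0 = time.perf_counter()
    neighbors = ~np.eye(n, dtype=bool)  # global: every other observation
    timings["neighborhood"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    diff = boot[:, :, None, :] - boot[:, None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # (B, n, n); NaN if either row absent
    timings["distances"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    c = np.nanmean(boot, axis=1, keepdims=True)
    scale = np.sqrt(np.nanmean(((boot - c) ** 2).sum(axis=2), axis=1))  # (B,)
    dist = dist / scale[:, None, None]  # assumed normalization
    timings["normalization"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    var = np.nanvar(dist, axis=0)  # across-bootstrap variance per pair
    scores = np.nanmean(np.where(neighbors, var, np.nan), axis=1)
    timings["variance"] = time.perf_counter() - t0
    return scores, timings


rng = np.random.default_rng(0)
base = rng.uniform(0, 10, size=(150, 2))
boot = base + rng.normal(scale=0.2, size=(20, 150, 2))  # 20 jittered "bootstraps"
boot[rng.random((20, 150)) < 0.37] = np.nan              # rows absent from a resample
scores, timings = timed_variance_score(boot)
```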

Extended Data Fig. 5 Random approximation of the global variance score.

Random approximation of the global variance score for different numbers of random neighbors, k, and for different combinations of DR visualization method and dataset. (A) UMAP visualizations of synthetic data drawn from a mixture of Gaussians (n = 1,000, p = 50). (B) t-SNE visualizations of the same synthetic data. (C) UMAP visualizations of the mouse subventricular zone (SVZ) single-cell transcriptomics data. (D) t-SNE visualizations of the same SVZ data. In general, the random method for computing variance scores (see Methods for details) approximates the global variance score well, and the approximation improves as k increases. Shaded regions correspond to 95% confidence intervals in all panels.
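The random-neighborhood approximation can be compared with the global score directly. Both scores below use the assumed unit-RMS normalization:

- The global score averages the across-bootstrap distance variance over all other observations.
- The random score averages it over k other observations drawn without replacement, computing only those k distances per observation. This avoids building the full n × n matrix.

With k = n − 1 the two coincide. The helpers, the toy coordinates and the per-observation jitter levels are hypothetical.

```python
import numpy as np


def normalize(boot):
    """Scale each bootstrap to unit RMS distance from its centroid (assumed)."""
    c = np.nanmean(boot, axis=1, keepdims=True)
    scale = np.sqrt(np.nanmean(((boot - c) ** 2).sum(axis=2), axis=1))
    return boot / scale[:, None, None]


def global_score(boot):
    """Mean across-bootstrap distance variance over all other observations."""
    pts = normalize(boot)
    diff = pts[:, :, None, :] - pts[:, None, :, :]
    var = np.nanvar(np.sqrt((diff ** 2).sum(axis=-1)), axis=0)
    np.fill_diagonal(var, np.nan)
    return np.nanmean(var, axis=1)


def random_score(boot, k, seed=0):
    """Same average over k random other observations, computing only k distances each."""
    rng = np.random.default_rng(seed)
    pts = normalize(boot)
    n = pts.shape[1]
    scores = np.empty(n)
    for i in range(n):
        nbrs = rng.choice(n - 1, size=k, replace=False)
        nbrs[nbrs >= i] += 1  # shift indices to skip observation i itself
        d = np.linalg.norm(pts[:, nbrs, :] - pts[:, i:i + 1, :], axis=2)  # (B, k)
        scores[i] = np.nanmean(np.nanvar(d, axis=0))
    return scores


rng = np.random.default_rng(0)
n, B = 200, 30
base = rng.uniform(0, 10, size=(n, 2))
sd = rng.uniform(0.01, 0.5, size=n)  # per-observation instability
boot = base + rng.normal(size=(B, n, 2)) * sd[None, :, None]
boot[rng.random((B, n)) < 0.37] = np.nan  # rows missing from a resample
g = global_score(boot)
r50 = random_score(boot, k=50)
corr = np.corrcoef(g, r50)[0, 1]
```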

Extended Data Fig. 6 Analysis of variance score properties.

(A) Silhouette scores computed for t-SNE visualizations of data drawn repeatedly from a mixture of Gaussian distributions (100 repeated draws, n = 500, p = 100), compared with silhouette scores for 100 bootstrap samples from one initial sample from the same mixture. Bootstrap results are shown for three representative initial samples. No differences between any groups were statistically significant (two-sided Wilcoxon rank-sum test, p-value cutoff 0.05). Center line represents the median, box represents the interquartile range (IQR), and whiskers extend to at most 1.5 × IQR. (B) Variance in the Euclidean distance between two predetermined observations in t-SNE visualizations across either 100 bootstrap samples or 100 resampled datasets from a Gaussian mixture model (50 features, 5 components) for different numbers of observations n. 95% confidence intervals are shown for 20 pairs of predetermined observations. (C) Same setting as panel B but with n = 1,000 and different numbers of bootstrap samples or resamples B. (D) t-SNE visualization of a mixture of five Gaussian distributions (n = 1,000, p = 50). (E) UMAP and t-SNE visualizations of the mouse SVZ single-cell data (n = 1,000). (F) Relative contributions of the marginal variance scores (B = 100) for dynamic t-SNE visualizations of synthetic data drawn from a mixture of five Gaussian distributions (n ∈ {320, 640, 1,280, 2,560}, p = 50), of single-cell transcriptomic data from the mouse subventricular zone (n ∈ {320, 640, 1,000}) and of single-cell transcriptomic data from the mouse pancreatic lineage (n ∈ {320, 640, 1,000}). (G) t-SNE visualization of the first replicate of the MERFISH mouse primary motor cortex dataset. Error bar corresponds to the 95% confidence interval. (H) Pearson correlation between mean DR quality metrics for t-SNE (including the variance score, B = 100) computed at the cell-type level on a single replicate and the mean gene variance computed at the cell-type level across all 12 technical and biological replicates in the MERFISH mouse primary motor cortex spatial transcriptomics dataset. Center line corresponds to the median and box to the interquartile range. In panels F and H, variance scores were computed for B = 100 bootstrap samples using a random neighborhood approximation with k = 200 (see Methods for details).
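Panel B compares bootstrap variability with variability under fresh sampling from the generating distribution. The sketch below reproduces that comparison in miniature, with two substitutions:

- PCA replaces t-SNE, for speed.
- The two predetermined observations are kept in every sample by placing them in the first two rows. The remaining n − 2 rows are either drawn with replacement from the initial sample (bootstrap) or drawn fresh from the Gaussian mixture (resample). This is an assumed procedure.

`sample_gmm`, `pair_distance` and `distance_variances` are hypothetical helpers.

```python
import numpy as np
from sklearn.decomposition import PCA


def sample_gmm(rng, n, means, sd=1.0):
    """n draws from an equal-weight Gaussian mixture with the given component means."""
    comp = rng.integers(0, len(means), size=n)
    return means[comp] + rng.normal(scale=sd, size=(n, means.shape[1]))


def pair_distance(X):
    """2-D embedding distance between rows 0 and 1 (the predetermined pair)."""
    emb = PCA(n_components=2).fit_transform(X)
    return np.linalg.norm(emb[0] - emb[1])


def distance_variances(n=500, B=100, p=50, k=5, seed=0):
    """Variance of the pair's distance over bootstraps versus over fresh resamples."""
    rng = np.random.default_rng(seed)
    means = rng.normal(scale=3.0, size=(k, p))
    X0 = sample_gmm(rng, n, means)  # the single observed sample
    boot_d, resample_d = [], []
    for _ in range(B):
        idx = np.concatenate([[0, 1], rng.integers(0, n, size=n - 2)])
        boot_d.append(pair_distance(X0[idx]))  # bootstrap of X0
        fresh = sample_gmm(rng, n - 2, means)  # new data from the mixture
        resample_d.append(pair_distance(np.vstack([X0[:2], fresh])))
    return np.var(boot_d), np.var(resample_d)


boot_var, resample_var = distance_variances()
```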

Extended Data Fig. 7 Analysis of DR optimization using the variance score.

(A) Stacked t-SNE visualizations of the SVZ single-cell data (n = 1,000) at different perplexity values, corresponding to those shown in Figure 3A. (B) Variance scores of UMAP visualizations of the SVZ single-cell data (n = 1,000) computed for different numbers of neighbors, with stacked UMAP visualizations for the optimal value (neighbors = 320) and the least optimal value (neighbors = 5). (C) Stacked UMAP visualizations of the SVZ single-cell data (n = 1,000) at the numbers of neighbors shown in panel B. (D) Variance scores of LLE (locally linear embedding) visualizations of the pancreatic cell lineage single-cell data (n = 3,696) computed for different numbers of neighbors, with stacked LLE visualizations for the optimal value (neighbors = 500) and the least optimal value (neighbors = 40). (E) Stacked LLE visualizations of the pancreatic cell lineage single-cell data (n = 3,696) at the numbers of neighbors shown in panel D. (F) Distribution of variance scores, computed using the random neighborhood definition with k = 200 on the same data as in panels D and E, for the optimal hyperparameter choice (lowest variance score) of each DR algorithm (perplexity: t-SNE; number of neighbors: UMAP, Isomap, LLE, PaCMAP; number of inliers: TriMap), shown with stacked visualizations of all bootstrap visualizations from three representative DR algorithms. Center line corresponds to the median and box to the interquartile range. Variance scores and dynamic visualizations were computed for B = 100 bootstrap samples; in panels D and F, variance scores were computed using a random neighborhood approximation with k = 200 (see Methods for details).
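Choosing a hyperparameter by the lowest variance score, as in panels B, D and F, amounts to a grid search. This sketch makes the following substitutions and assumptions:

- scikit-learn's `KernelPCA` with an RBF kernel replaces the figure's DR methods. Its `gamma` plays the role that perplexity or the number of neighbors plays in the figure; it was chosen because it is fast and has no graph-connectivity requirement.
- Each grid value is summarized by the mean score over all observations.
- The normalization and the helper names are assumptions.

```python
from functools import partial

import numpy as np
from sklearn.decomposition import KernelPCA


def bootstrap_embed(X, make_model, B, rng):
    """(B, n, 2) embeddings of bootstrap samples; NaN where a row was not drawn."""
    n = len(X)
    out = np.full((B, n, 2), np.nan)
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        emb = make_model().fit_transform(X[idx])
        uniq, first = np.unique(idx, return_index=True)
        out[b, uniq] = emb[first]  # first copy of each drawn row
    return out


def mean_variance_score(boot):
    """Mean over observations of the global variance score (assumed normalization)."""
    c = np.nanmean(boot, axis=1, keepdims=True)
    scale = np.sqrt(np.nanmean(((boot - c) ** 2).sum(axis=2), axis=1))
    pts = boot / scale[:, None, None]
    diff = pts[:, :, None, :] - pts[:, None, :, :]
    var = np.nanvar(np.sqrt((diff ** 2).sum(axis=-1)), axis=0)
    np.fill_diagonal(var, np.nan)
    return np.nanmean(var)


def pick_gamma(X, gammas, B=10, seed=0):
    """Return the gamma with the lowest mean variance score, plus all scores."""
    rng = np.random.default_rng(seed)
    results = {}
    for g in gammas:
        make_model = partial(KernelPCA, n_components=2, kernel="rbf", gamma=g)
        results[g] = mean_variance_score(bootstrap_embed(X, make_model, B, rng))
    return min(results, key=results.get), results


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1, (40, 10)) for m in (0.0, 3.0, 6.0)])
best, results = pick_gamma(X, gammas=[0.001, 0.01, 0.1])
```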

Extended Data Fig. 8 Application of dynamic visualizations and variance scores to RNA velocity analysis of single-cell data for gastrulation of the erythroid lineage.

(A) RNA velocity embedding stream UMAP plots for the original data and two bootstrapped versions (left to right). (B) UMAP visualization of the original data colored by variance score. (C) UMAP visualization of the original data colored by RNA velocity pseudotime. (D) Median rank-ordered pseudotimes computed for each cell over bootstrap UMAP visualizations, with gray shading corresponding to the 95% confidence interval. (E) Predicted terminal states of a Blood Progenitor 1 cell from RNA velocity trajectory analysis across bootstrap UMAP visualizations, with transitions traced across representative bootstrap visualizations for two terminal states (Erythroid1, Erythroid2). Color corresponds to pseudotime along the trajectory. Dynamic visualization and the variance score give a more detailed picture of standard RNA velocity analyses of gastrulation of the erythroid lineage, including the stability of RNA velocity streams, cell fates and pseudotime. Variance scores, pseudotimes and trajectory predictions were computed for B = 100 bootstrap samples. Variance scores were computed using the random neighborhood approximation with k = 50 (see Methods).
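Panel D summarizes per-cell pseudotimes across bootstraps. This needs only NaN-aware quantiles, because a cell missing from a bootstrap sample has no pseudotime there. The sketch assumes that pseudotimes have already been computed for each bootstrap and stored as a (B, n) array. It orders cells by their median and reports the 2.5th to 97.5th percentile band as the 95% interval. `pseudotime_summary` and the small example matrix are hypothetical.

```python
import numpy as np


def pseudotime_summary(P):
    """Order cells by median bootstrap pseudotime and give a 95% percentile band.

    P: (B, n) pseudotime of each cell in each bootstrap, NaN where not sampled.
    Returns (order, median, lower, upper), each already sorted by median.
    """
    med = np.nanmedian(P, axis=0)
    lower, upper = np.nanpercentile(P, [2.5, 97.5], axis=0)
    order = np.argsort(med)
    return order, med[order], lower[order], upper[order]


P = np.array([[0.1, 0.9, 0.5],
              [0.2, 0.8, np.nan],  # cell 2 was not drawn in this bootstrap
              [0.0, 1.0, 0.4]])
order, med, lower, upper = pseudotime_summary(P)
# order -> [0, 2, 1]; med -> [0.1, 0.45, 0.9]
```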

Extended Data Fig. 9 Application of dynamic visualizations and variance scores to RNA velocity analysis of single-cell data of human bone marrow.

(A) RNA velocity embedding stream UMAP plots for the original data and two bootstrapped versions (left to right). (B) UMAP visualization of the original data colored by variance score. (C) UMAP visualization of the original data colored by a locally defined variance score in which each cell-type cluster is its own neighborhood. (D) UMAP visualization of the original data colored by RNA velocity pseudotime. (E) Median rank-ordered pseudotimes computed for each cell over bootstrap UMAP visualizations, with gray shading corresponding to the 95% confidence interval. (F) Predicted terminal states of a Mono1 cell from RNA velocity trajectory analysis across bootstrap UMAP visualizations, with transitions traced across representative bootstrap visualizations for two terminal states (HSC1, CLP). Color corresponds to pseudotime along the trajectory. Dynamic visualization and the variance score give a more detailed picture of standard RNA velocity analyses of the human bone marrow lineage, including the stability of RNA velocity streams, cell fates and pseudotime. Variance scores, pseudotimes and trajectory predictions were computed for B = 100 bootstrap samples. Variance scores were computed using the random neighborhood approximation with k = 50 (see Methods).
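Panel C's locally defined score restricts each cell's neighborhood to its own cell-type cluster. Relative to a global score, the only change is a mask over the pairwise-variance matrix, sketched below with the assumed unit-RMS normalization. `cluster_variance_score` and the toy coordinates (one stable cluster, one jittery cluster) are hypothetical.

```python
import numpy as np


def cluster_variance_score(boot, labels):
    """Variance score with each cluster as its own neighborhood.

    boot: (B, n, d) bootstrap coordinates, NaN where a row is absent.
    labels: (n,) cluster label per observation.
    """
    c = np.nanmean(boot, axis=1, keepdims=True)
    scale = np.sqrt(np.nanmean(((boot - c) ** 2).sum(axis=2), axis=1))
    pts = boot / scale[:, None, None]  # assumed normalization
    diff = pts[:, :, None, :] - pts[:, None, :, :]
    var = np.nanvar(np.sqrt((diff ** 2).sum(axis=-1)), axis=0)  # (n, n)
    same = labels[:, None] == labels[None, :]  # same-cluster mask
    np.fill_diagonal(same, False)  # a cell is not its own neighbor
    return np.nanmean(np.where(same, var, np.nan), axis=1)


rng = np.random.default_rng(0)
labels = np.repeat(np.array(["stable", "jittery"]), 80)
base = np.vstack([rng.normal([0, 0], 1, (80, 2)), rng.normal([8, 0], 1, (80, 2))])
sd = np.where(labels == "stable", 0.02, 0.5)  # per-cell jitter across bootstraps
boot = base + rng.normal(size=(20, 160, 2)) * sd[None, :, None]
scores = cluster_variance_score(boot, labels)
```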

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, E.D., Ma, R. & Zou, J. Dynamic visualization of high-dimensional data. Nat Comput Sci 3, 86–100 (2023). https://doi.org/10.1038/s43588-022-00380-4

  • DOI: https://doi.org/10.1038/s43588-022-00380-4
