A spectral method for assessing and combining multiple data visualizations

Dimension reduction is an indispensable part of modern data science, and many algorithms have been developed. However, different algorithms have their own strengths and weaknesses, making it important to evaluate their relative performance, and to leverage and combine their individual strengths. This paper proposes a spectral method for assessing and combining multiple visualizations of a given dataset produced by diverse algorithms. The proposed method provides a quantitative measure – the visualization eigenscore – of the relative performance of the visualizations for preserving the structure around each data point. It also generates a consensus visualization, having improved quality over individual visualizations in capturing the underlying structure. Our approach is flexible and works as a wrapper around any visualizations. We analyze multiple real-world datasets to demonstrate the effectiveness of the method. We also provide theoretical justifications based on a general statistical framework, yielding several fundamental principles along with practical guidance.


INTRODUCTION
Data visualization and dimension reduction is a central topic in statistics and data science, as it facilitates intuitive understanding and global views of high-dimensional datasets and their underlying structural patterns through a low-dimensional embedding of the data (Donoho, 2017;Chen et al., 2020).The past decades have witnessed an explosion in machine learning algorithms for data visualization and dimension reduction.Many of them, such as Laplacian eigenmap (Belkin and Niyogi, 2003), kernel principal component analysis (kPCA) (Schölkopf et al., 1997), t-SNE (van der Maaten andHinton, 2008), and UMAP (McInnes et al., 2018), have been regarded as indispensable tools and state-of-art techniques for generating graphics in academic and professional writings (Chen et al., 2007), and for exploratory data analysis and pattern discovery in many research disciplines, such as astrophysics (Traven et al., 2017), computer vision (Cheng et al., 2015), genetics (Platzer, 2013), molecular biology (Olivon et al., 2018), especially in single-cell transcriptomics (Kobak and Berens, 2019), among others.
However, the wide availability and functional diversity of data visualization methods also brings forth new challenges to data analysts and practitioners (Nonato and Aupetit, 2018;Espadoto et al., 2019).On the one hand, it is critically important to determine among the extensive list which visualization method is most suitable and reliable for embedding a given dataset.In fact, even for a single visualization method, such as t-SNE or UMAP, oftentimes there are multiple tuning parameters to be determined by the users, and different tuning parameters may lead to distinct visualizations (Kobak and Linderman, 2021;Cai and Ma, 2021).Thus, for a given dataset, selecting the most suitable visualization method and along with its tuning parameters calls for a method that provides quantitative and objective assessment of different visualizations of the dataset.On the other hand, as different methods are usually based on distinct ideas and heuristics, they would generate qualitatively diverse visualizations of a dataset, each containing important features about the data that are possibly unique to the visualization method.Meanwhile, due to noisiness and highdimensionality of many real-world datasets, their low-dimensional visualizations necessarily contain distortions from the underlying true structures, which again may vary from one visualization to another.It is therefore of substantial practical interest to combine strengths and reach a consensus among multiple data visualizations, in order to obtain an even better "meta-visualization" of the data that captures the most information and is least susceptible to the distortions.Naturally, a meta-visualization would also save practitioners from painstakingly selecting a single visualization method among many.
In this paper, we propose an efficient spectral approach for simultaneously assessing and combining multiple data visualizations produced by diverse dimension reduction/visualization algorithms, allowing for different settings of tuning parameters for individual algorithms.Specifically, the proposed method takes as input a collection of visualizations, or low-dimensional embeddings of a dataset, hereafter referred as "candidate visualizations," and summarizes each visualization by a normalized pairwise-distance matrix among the samples.With respect to each sample in the dataset, we construct a comparison matrix from these normalized distance matrices, characterizing the local concordance between each pair of candidate visualizations.Based on eigen-decomposition of the comparison matrices, we propose a quantitative measure, referred as "visualization eigenscore," that quantifies the relative performance of the candidate visualizations in a sample-wise manner, reflecting their local concordance with the underlying low-dimensional structure contained in the data.To obtain a meta-visualization, the candidate visualizations are combined together into a meta-distance matrix, defined as a row-wise weighted average of those normalized distance matrices, using the corresponding eigenscores as the weights.The meta-distance matrix is then used to produce a meta-visualization, based on an existing method such as UMAP or kPCA, which is shown to be more reliable and more informative compared to individual candidate visualizations.Our method is schematically summarized in Figure 1 and Algorithm 1, and detailed in Section 2.1.The thus obtained meta-visualization reflects a joint perspective aggregating various aspects of the data that are oftentimes captured separately by individual candidate visualizations.
Numerically, through extensive simulations and analysis of multiple real-world datasets with diverse underlying structures, we show the effectiveness of the proposed eigenscores in assessing and ranking a collection of candidate visualizations, and demonstrate the superiority of the final meta-visualization over all the candidate visualizations in terms of identification and characterization of these structural patterns.To achieve a deeper understanding of the proposed method, we also develop a formal statistical framework, that rigorously justifies the proposed scoring and meta-visualization method, providing theoretical insights on the fundamental principles behind the empirical success of the method, along with its proper interpretations, and guidance on practice.
Figure 1: A graphical illustration of the proposed method.The algorithm takes as input the normalized pairwise distance matrices associated to a collection of candidate visualizations (viz1 to viz4) of a dataset.For each sample of the dataset, we compute the similarity matrix between the rows of the normalized distance matrices associated to the sample (rows highlighted in the same color), and then define the corresponding eigenscores as the first eigenvector of the similarity matrix.The size of the circles in the similarity matrices and the vectors of eigenscores indicate the magnitude of the entries (assumed to be non-negative).The meta-distance matrix is defined such that its rows are the eigenscore-weighted average of the rows in the normalized distance matrices.The meta-distance leads to a meta-visualization, expected to be more concordant with the underlying true structure than individual candidate visualizations.

Related Works
Quantitative assessment of dimension reduction and data visualization algorithms is of substantial practical interests, and have been extensively studied in the past two decades.For example, many evaluation methods are based on distortion measures from metric geometry (Abraham et al., 2006(Abraham et al., , 2009;;Chennuru Vankadara and von Luxburg, 2018;Bartal et al., 2019), whereas some other methods rely on information-theoretic precision-recall measures (Venna et al., 2010;Arora et al., 2018), co-ranking structure (Mokbel et al., 2013), or graph-based criteria (Wang et al., 2021;Cai and Ma, 2021).See also Bertini et al. (2011), Nonato and Aupetit (2018) and Espadoto et al. (2019) for recent reviews.However, most of these existing methods evaluate data visualizations by comparing them directly with the original dataset, without accounting for its noisiness.The thus obtained assessment may suffer from intrinsic bias due to ignorance of the underlying true structures, only approximately represented by the noisy observations; see Section 2.3.5 and Supplementary Figure 21 for more discussions.To address this issue, the proposed eigenscores, in contrast, provide provably consistent assessment and ranking of visualizations reflecting their relative concordances with the underlying noiseless structures in the data.
Compared to the quantitative assessment of data visualizations, there is a scarcity of metavisualization methods that combine strengths of multiple data visualizations.In Pagliosa et al. (2015), an interactive method is developed that assesses and combines different multidimensional projection methods via a convex combination technique.However, for supervised learning tasks such as classification, there is a long history of research on designing and developing meta-classifiers that combine multiple classifiers (Woods et al., 1997;Tax et al., 2000;Parisi et al., 2014;Liu et al., 2017;Mohandes et al., 2018).Compared with meta-classification, the main difficulty of metavisualization lies in the identification of a common space to properly align multiple visualizations, or low-dimensional embeddings, whose scales and coordinate bases may drastically differ from one to another (see, for example, Figures 3-5(a)).Moreover, unlike many meta-classifiers, which combines presumably independent classifiers trained over different datasets, a meta-visualization procedure typically relies on multiple visualizations of the same dataset, and therefore has to deal with more complicated correlation structure among the visualizations.The current study provides the first meta-visualization method that can flexibly combine any number of visualizations, and has interpretable and provable performance guarantee.

Main Contributions
The main contribution of the current study can be summarized as follows: • We propose a computationally efficient spectral method for assessing and combining multiple data visualizations.The method is generic and easy to implement: it does not require knowledge of the original dataset, and can be applied to a large number of data visualizations generated by diverse methods.
• For any collection of visualizations of a dataset, our method provides a quantitative measure -eigenscore -of the relative performance of the visualizations for preserving the structure around each data point.The eigenscores are useful on their own rights for assessing the local and global reliability of a visualization in representing the underlying structures of the data, and in guiding selection of hyper-parameters.
• The proposed method automatically combines strengths and ameliorates weakness (distortions) of the candidate visualizations, leading to a meta-visualization which is provably better than all the candidate visualizations under a wide range of settings.We show that the metavisualization is able to capture diverse intrinsic structures, such as clusters, trajectories, and mixed low-dimensional structures, contained in noisy and high-dimensional datasets.
• We establish rigorous theoretical justifications of the method under a general signal-plus-noise model (Section 2.3) in the large-sample limit.We prove the convergence of the eigenscores to certain underlying true concordance measures, the guaranteed performance of the metavisualization and its advantages over alternative methods, its robustness against possible adversarial candidate visualizations, along with their conditions, interpretations, and practical implications.
The proposed method is described in detail in Section 2.1, and empirically illustrated and evaluated in Section 2.2, through extensive simulation studies and analyses of three real-world datasets with diverse underlying structures.In Section 2.3, we show results from our theoretical analysis, which unveils fundamental principles associated to the method, such as the benefits of including qualitatively and functionally diverse candidate visualizations.

Eigenscore and Meta-Visualization Methodology
Throughout, without loss of generality, we assume that for visualization purpose the target embedding is two-dimensional, although our discussion applies to any finite-dimensional embedding.
We consider visualizing a p-dimensional dataset {Y i } 1≤i≤n containing n samples.From {Y i } 1≤i≤n , suppose we obtain a collection of K (candidate) visualizations of the data, produced by various visualization methods.We denote these visualizations as two-dimensional embeddings {X Our approach only needs access to the low-dimensional embeddings {X (k) i } 1≤i≤n rather than the raw data {Y i } 1≤i≤n ; as a result, the users can use our method even if they don't have access to the original data, which is often the case.

Measuring Normalized Distances From Each Visualization
In order that the proposed method is invariant to the respective scale and coordinate basis (i.e., directionality) of the low-dimensional embeddings generated from different visualization method, we start by considering the normalized pairwise-distance matrix for each visualization.
Specifically, for each k ∈ {1, 2, ..., K}, we define the normalized pairwise-distance matrix where is the un-normalized Euclidean distance matrix, and 2 ) is a diagonal matrix with its diagonal entries being the 2 -norms of the rows {P (k) 1. , ..., P (k) n. } of P (k) .As a result, the normalized distance matrix P(k) has its rows being unit vectors, and is invariant to any scaling and rotation of the visualization {X (k) i } 1≤i≤n .The normalized distance matrices { P(k) } 1≤k≤K summarize the candidate visualizations in a compact and efficient way.Their scale-and rotation-invariance properties are particularly useful for comparing visualizations produced by distinct methods.

Algorithm 1 Spectral assessment and combination of multiple data visualizations
(2.1) where n. 2 ). 2. Obtain eigenscores: for each i ∈ {1, 2, ..., n}, (i) calculate the similarity matrix (2.2) (ii) perform eigen-decomposition of G i and define the eigenscores where u i is the eigenvector of G i associated to its largest eigenvalue.
3. Construct meta-distance matrix: for each i ∈ {1, 2, ..., n}, calculate the eigenscoreweighted average and define Pm ∈ R n×n whose i-th row is Pm i. .4. Obtain meta-visualization: apply an existing visualization method (e.g., UMAP or kPCA) to Pm to obtain a meta-visualization.Output: the eigenscores { s i } 1≤i≤n , and the meta-visualization.

Sample-wise Eigenscores for Assessing Visualizations
Our spectral method for assessing multiple visualizations is based on the normalized distance matrices { P(k) } 1≤k≤K .For each i ∈ {1, 2, ..., n}, we define the similarity matrix which summarizes the pairwise similarity between the candidate visualizations with respect to sample i.By construction, the entries of G i are inner-products between unit vectors, each representing the normalized distances associated with sample i in a candidate visualization.Naturally, a larger entry ( P(k 1 ) i. ) P(k 2 ) i.
indicates higher concordance between the two candidate visualizations.Then, for each i ∈ {1, 2, ..., n}, we define the vector of eigenscores s i = (ŝ i,1 , ..., ŝi,K ) for the candidate visualizations with respect to sample i as the absolute value of the eigenvector u i ∈ R K of G i associated to its largest eigenvalue, that is, where the absolute value function |•| is applied entrywise.As will be explained later (Section 2.3.2), the nonnegative components of s i quantify the relative performance of K candidate visualizations with respect to sample i, with higher eigenscores indicating better performance.Consequently, for each candidate visualization {X (k) i } 1≤i≤n , one obtains a set of eigenscores {ŝ i,k } 1≤i≤n summarizing its performance relative to other candidate visualizations in a sample-wise manner.Ranking and selection among candidate visualizations can be achieved based on various summary statistics of the eigenscores, such as mean, median, or coefficient of variation, depending on the specific applications.In particular, when some candidate visualizations are produced by the same method but under different tuning parameters, the eigenscores can be used to select the most suitable tuning parameters for visualizing the dataset.However, a more substantial application of the eigenscores is to combine multiple data visualizations into a meta-visualization, which has improved signal-tonoise ratio and higher resolution of the structural information contained in the data.
Importantly, as will be shown later (Section 2.3.5), the eigenscores essentially take the underlying true signals rather than the noisy observations {Y i } 1≤i≤n as its referential target for performance assessment, making the method easier to implement and less susceptible to the effect of noise in the original data (Section 2.3.5 and Supplement Figure 21).

Meta-Visualization using Eigenscores
Using the above eigenscores, one can construct a meta-distance matrix properly combining the information contained in each candidate visualization.Specifically, for each i ∈ {1, 2, ..., n}, we define the vector of meta-distances with respect to sample i as the eigenscore-weighted average of all the normalized distances respect to sample i, that is, (2.9) Then, the meta-distance matrix is defined as Pm ∈ R n×n whose i-th row is Pm i. .To obtain a metavisualization, we take the meta-distance matrix Pm and apply an existing visualization method that allows for the meta-distance Pm (or its symmetrized version Pm + ( Pm ) ) as its input.
Intuitively, for each i = 1, 2, ..., n, we essentially apply a principal component (PC) analysis to the normalized distance matrix Pi = P(1) i.
∈ R n×K .Specifically, by definition (Jolliffe and Cadima, 2016) the leading eigenvector u i of G i = P i Pi ∈ R K×K is the first PC loadings of Pi , whereas the first PC is defined as the linear combination Pi u i = K k=1 ûi,k P(k) i. .Under the condition that the first PC loadings are all nonnegative (which is ensured with high probability under condition (C2) below), the first PC Pi u i is exactly the meta-distance Pm i. defined in (2.9) above.When interpreted as PC loadings, the leading eigenvector u i of G i contains weights for different vectors { P(k) i. } 1≤k≤K so that the final linear combination Pi u i has the largest variance, that is, summarizes the most information contained in Pi .It is in this sense that the meta-distance Pm i. is a consensus across { P(k) i. } 1≤k≤K .For our own numerical studies (Section 2.2), we used UMAP for meta-visualizing datasets with cluster structures, and used kPCA for meta-visualizing all the other datasets with smoother manifold structures, such as trajectory, cycle, or mixed structures.The choice of UMAP in the former case was due to its advantage in treating large numbers of clusters without requiring prior knowledge about the number of clusters (McInnes et al., 2018;Cai and Ma, 2021); whereas the choice of kPCA in the latter case was rooted in its advantage in capturing nonlinear smooth manifold structures (Ding and Ma, 2022).In each case, the hyper-parameters used for generating the meta-visualization were determined without further tuning -for example, when using UMAP for meta-visualization, we set the hyper-parameters the same as those associated to the UMAP visualization which achieved higher median eigenscore than other UMAP visualizations.Moreover, while in general UMAP/kPCA works well as a default method for meta-visualization, our proposed algorithm is robust with respect to the choice of this final visualization method.In our numerical analysis (Section 2.2), we observed empirically that other methods such as t-SNE and PHATE could also lead to meta-visualizations with comparably substantial improvement over individual candidate visualizations in terms of the concordance with the underlying true low-dimensional structure of the data (see Supplement Figure 11).In addition, the meta-visualization shows robustness to potential outliers in the data (Figures 4 and 5).
In Section 2.3, under a generic signal-plus-noise model, we obtain explicit theoretical conditions under which the performance of the proposed spectral method is guaranteed.Specifically, we show the convergence of the eigenscores to a desirable concordance measure between the candidate visualizations and the underlying true pattern, characterized by their associated pairwise distance matrices of the samples.In addition, we show improved performance of the meta-visualization over the candidate visualizations in terms of their concordance with the underlying true pattern, and its robustness against possible adversarial candidate visualizations.These conditions provide proper interpretations and guidance on the application of the method, such as how to more effectively prepare the candidate visualizations (Section 2.3.6).For clarity, we summarize these technical conditions informally as follows, and relegate their precise statements to Section 2.3.
(C1') The performance of candidate visualizations are sufficiently diverse in terms of their individual distortions from the underlying true structures.
(C2') The candidate visualizations altogether contains sufficient amount of information about the underlying true structures.
Intuitively, Condition (C1') concerns diversity of methods in producing candidate visualizations, whereas Condition (C2') is related to the quality of the candidate visualizations.In practice, Condition (C2') is satisfied when the signal-to-noise ratio in the data, as described by (2.11), is sufficiently large, so that the adopted visualization methods perform reasonably well on average.On the other hand, from Section 2.3, a sufficient condition for (C1') is that, at most √ K out of K candidate visualizations are very similarly distorted from the true patterns in terms of the normalized distances P(k) .This would allow, for example, groups of up to 3 to 4 candidate visualizations out of 10 to 15 visualizations being produced by very similar procedures such as the same method under different hyper-parameters.

Simulation Studies: Visualizing Noisy Low-Dimensional Structures
To demonstrate the wide range of applicability and the empirical advantage of the proposed method, we consider visualization of three families of noisy datasets, each containing a distinct low-dimensional structure as its underlying true signal.We assess performance of the eigenscores and the quality of the resulting meta-distance matrix based on 16 candidate visualizations produced by multiple visualization methods.
For a given sample size n, we generate p-dimensional noisy observations where {Y * i } 1≤i≤n are the underlying noiseless samples (signals), and {Z i } 1≤i≤n are the random noises.Specifically, we generate true signals {Y * i } 1≤i≤n from various low-dimensional structures isometrically embedded in the p-dimensional Euclidean space.Each of the low-dimensional structures lie in some r-dimensional linear subspace, and is subject to an arbitrary rotation in R p , so that these signals are generally p-dimensional vectors with dense (nonzero) coordinates.Then we generate i.i.d.noise vector Z i from the standard multivariate normal distribution N (0, I p ), and use the p-dimensional noisy vector Y i = Y * i + Z i as the final observed data.In this way, we simulated noisy observations {Y i } 1≤i≤n of an intrinsically Table 1: Empirical mean and standard error (SE) of the averaged cosines 1 n n i=1 cos ∠( s i , s i ), between the eigenscores and the true concordance measures, over each family of datasets associated with a given low-dimensional structure under various values of the SNR parameter θ.
(ii) "Smiley face" with r = 2: {Y * i } 1≤i≤n are generated independently and uniformly from a twodimensional "smiley face" structure (Supplement Figure 6 left) of diameter θ, isometrically embedded in R p and subject to an arbitrary rotation.
(iii) "Mammoth" manifold with r = 3: {Y * i } 1≤i≤n are generated independently uniformly from a three-dimensional "mammoth" manifold (Supplement Figure 6 right) of diameter θ, isometrically embedded in R p and subject to an arbitrary rotation.
The thus generated datasets cover diverse structures including Gaussian mixture clusters (i), mixedtype nonlinear clusters (ii), and a connected smooth manifold (iii).As a result, the first family of datasets was set to have p = 500 and n = 900, and were obtained by fixing various values of the SNR parameter θ, and generating Y * i ∈ R p from the above setting (i) to obtain the noisy dataset {Y i } 1≤i≤n as described above.Similarly, the second and the third families of datasets were obtained by drawing Y * i ∈ R p from the above settings (ii) and (iii), respectively, and generating datasets {Y i } 1≤i≤n with p = 300 and n = 500, for various values of θ.
To evaluate the proposed eigenscores, for each setting and each i ∈ {1, 2, ..., n}, we compute cos ∠( s i , s i ) := ( s i ) s i s i 2 s i 2 , for the angle between the eigenscores s i and the true local concordance s i defined as  1 shows empirical mean and standard error (SE) of the averaged cosines 1 n n i=1 cos ∠( s i , s i ), over the family of datasets under the same low-dimensional structure associated with various θ as shown in Figure 2(a).Our simulations showed that cos ∠( s i , s i ) ≈ 1, indicating that the eigenscores s i essentially characterize the true concordance between the patterns contained in each candidate visualization and that of the underlying noiseless samples, evaluated locally with respect to sample i.This justifies the proposed eigenscore as a precise measure of performance of the candidate visualizations in preserving the underlying true signals.
To assess the quality of two meta-distance matrices, for each dataset, we compare the mean concordance between the normalized distance of each candidate visualization and that of the underlying noiseless samples, and the mean concordance between the obtained meta-distance and that of the underlying noiseless samples.Figure 2(a) and Supplement Figure 7 show boxplots of these mean concordances for the 16 candidate visualizations and the two meta-distances under each setting of underlying structures across various values of θ.We observe that for each of the three structures, our proposed meta-distance is substantially more concordant with the underlying true patterns, than every candidate visualization and the naive meta-distance, indicating the superiority of the proposed meta-distance.To further demonstrate the advantage of the spectral meta-distance and its benefits to the final meta-visualization, we compared our proposed meta-visualization using UMAP, and candidate visualizations of a dataset under setting (i) with θ = 5, and present their sample-wise concordance {( P(k) i. ) P * i. } 1≤i≤n for each k, and {( Pm i. ) P * i. } 1≤i≤n of the proposed meta-distance (Figure 2(b) and Supplement Figure 8).We observe that, while each individual method may capture some clusters in the dataset but misses others, the proposed meta-visualization is able to combine strengths of all the candidate visualizations in order to capture all the underlying clusters.Finally, to demonstrate the flexibility of our method with respect to higher intrinsic dimension r, under the setting (i), we further evaluated the performance of different methods for r ∈ {15, 30, 50}.Supplement Figure 9 shows consistent and superior performance of the proposed method compared to the other approaches.

Visualizing Clusters of Religious and Biblical Texts
Cluster data are ubiquitous in scientific research and industrial applications.Our first real data example concerns n = 590 fragments of text, extracted from English translations of eight religious books or sacred scripts including Book of Proverb (BOP), Book of Ecclesiastes (BOE1), Book of Ecclesiasticus (BOE2), Book of Wisdom (BOW), Four Noble Truth of Buddhism (BUD), Tao Te Ching (TTC), Yogasutras (YOG) and Upanishads (UPA) (Sah and Fokoué, 2019).All the text were pre-processed using natural language processing into a 590×8265 Document Term Matrix that counts frequency of 8265 atomic words, such as truth, diligent, sense, power, in each text fragment.In other words, each text fragment was treated as a bag of words, represented by a vector with 8265 features.The word counts were centred and normalized before downstream analysis.
As in our simulation studies, we still consider K = 16 candidate visualizations generated by 12 different methods with various tuning parameters (see Section A.2 of the Supplement for implementation details).Figure 3  of cluster patterns demonstrated by each candidate visualizations in Figure 3(a) and in Supplement Figure 10. Figure 3(c) is the proposed meta-visualization1 of the samples by applying UMAP to the meta-distance matrix, which shows substantially better clustering of the text fragments in accordance with their sources.In addition, the meta-visualization also reflected deeper relationship between the eight religious books, such as the similarity between the two Hinduism books YOG and UPA, the similarity between Buddhism (BUD) and Taoism (TTC), the similarity between the four Christian books BOE1, BOE2, BOP, and BOW, as well as the general discrepancy between Asian religions (Hinduism, Buddhism, Taoism) and non-Asian religions (Christianity).All of these important phenomena, while salient in our meta-visualization, only appeared vaguely in very few candidate visualizations such as those produced by PHATE (Figure 3(a)) and UMAP (Supplement Figure 10).
To quantitatively evaluate the preservation of the underlying clustering pattern, we computed for each visualization the Silhouette indices (Rousseeuw, 1987) with respect to the underlying true cluster membership, based on the normalized pairwise-distance matrices of the embeddings defined in (2.5).The Silhouette index (see Section A.2 of the Supplement for its definition), defined for each individual sample in a visualization, measures the amount of discrepancy between the within-class distances and the inter-class distances with respect to a given sample.As a result, for a given visualization, its Silhouette indices altogether indicate how well the underlying cluster pattern is preserved in a visualization, and higher Silhouette indices indicate that the underlying clusters are more separate.Empirically, we observed a notable correlation (ρ = 0.679) between the median Silhouette indices and the median eigenscores across the candidate visualizations (Supplement Figure 11).In addition, for each candidate visualization, we found that samples with higher Silhouette index tend to have higher eigenscores (Supplement Figure 12), demonstrating the effectiveness of eigenscores, and its benefits on the final meta-visualization.In Figure 3(d), we show that, even taking into account the stochasticity of the visualization method (UMAP) applied to the metadistance matrix, our meta-visualization had the median Silhouette index much higher than those of the candidate visualizations, as well as that of the meta-visualization "meta-aver" based on the naive meta-distance.It is of interest to note that "meta-spec" was the only visualization with a positive median Silhouette index, showing its better separation of clusters compared with other visualizations.Importantly, the proposed meta-visualization was not sensitive to the specific visualization method applied to the meta-distance matrices -similar results were obtained when we replaced UMAP by PHATE, the method having the highest median eigenscore in Figure 3(c), or t-SNE, for meta-visualization (Supplement Figure 11).

Visualizing Cell Cycles
Our second real data example concerns visualization of a different low-dimensional structure, namely, a mixture of cycle and clusters, contained in the gene expression profile of a collection of mouse embryonic stem cells, as a result of the cell cycle mechanism.The cell cycle, or celldivision cycle, is the series of events that take place in a cell that cause it to divide into two daughter cells2 .Identifying the cell cycle stages of individual cells analyzed during development is important for understanding its wide-ranging effects on cellular physiology and gene expression profiles.Specifically, our dataset contains n = 288 mouse embryonic stem cells, whose underlying cell cycle stages were determined using flow cytometry sorting (Buettner et al., 2015).Among them, one-third (96) of the cells are in the G1 stage, one-third in the S stage, and the rest in the G2M stage.The raw count data were preprocessed and normalized, leading to a dataset consisting of standardized expression levels of 1147 cell-cycle related genes for the 288 cells (see Section A.2 of the Supplement for implementation details of our data preprocessing).
We obtained 16 candidate visualizations as before, and applied our proposed method.Figure 4(a) contains examples of candidate visualizations obtained by t-SNE, LEIM, and kPCA, whose median eigenscores were ranked top, middle and bottom among all the visualizations, respectively, and the cells were colored according to their true cell cycle stages.Figure 4(b) contains the boxplots of eigenscores for the candidate visualizations, indicating the overall quality of each visualization.The variation of eigenscores within each candidate visualization suggests that different visualizations have their own unique features and strengths to be contributed to the meta-visualization (Supplement Figure 16).Figure 4(c) is the proposed meta-visualization by applying kPCA to the meta-distance matrix.Comparing with Figure 4(a), the proposed meta-visualization showed better clustering of the cells according to their cell cycle stages, as well as a more salient cyclic structure underlying the three cell cycle stages (Supplement Figure 15).To quantify the performance of each visualization in terms of these two underlying structures (cluster and cycle), we considered two distinct metrics, namely, the median Silhouette index with respect to the underlying true cell cycle stages, and the Kendall's tau statistic (Kendall, 1938) between the inferred relative order of the cells and their true orders on the cycle.Specifically, to infer the relative order of cells, we projected the coordinates of each visualization to the two-dimensional unit circle centred at the origin (Supplement Figure 15), and then determined the relative orders based on the cells' respective projected positions on the unit circle.Figure 4(d) shows that the proposed meta-visualization was significantly better than all the candidate visualizations and the naive meta-visualization in representing both aspects of the data.

Visualizing Trajectories of Cell Differentiation
Our third real data example concerns visualization of a mixed pattern of a trajectory and clusters underlying the gene expression profiles of a collection of cells undergoing differentiation (Hayashi et al., 2018).Specifically, 421 mouse embryonic stem cells were induced to differentiate into primitive endoderm cells.After the induction of differentiation, the cells were dissociated and individually captured at 12-or 24-hour intervals (0, 12, 24, 48 and 72 h), and each cell was sequenced to obtain the final total RNA sequencing reads using the random displacement amplification sequencing technology.As a result, at each of the five time points, there were about 70 to 90 cells captured and sequenced.The raw count data were preprocessed and normalized (see Section A.2 of the Supplement for implementation details), leading to a dataset consisting of standardized expression levels of 500 most variable genes for the 421 cells.
Again, we obtained 16 candidate visualizations as before, and applied our proposed method.In Figure 5(a)-(c) we show examples of candidate visualizations, boxplots of the eigenscores, and the meta-visualization using kPCA.The global (Figure 5(b)) and local (Supplement Figure 18) variation of eigenscores demonstrated contribution of different visualizations to the final meta-visualization according to their respective performance.We observed that some candidate visualizations such as kPCA, UMAP (Figure 5(a)) and PHATE (Supplement Figure 17) to some extent captured the underlying trajectory structure consistent with the time course of the cells.However, the metavisualization in Figure 5(c) showed much more salient patterns in terms of both the underlying trajectory and the cluster pattern among the cells, by locally combining strengths of the individual visualizations (Supplement Figure 18).We quantified the performance of visualizations from these two aspects using the median Silhouette index with respect to the underlying true cluster membership (i.e., batches of time course) and Kendall's tau statistic between the inferred cell order and the true order along the progression path.To infer the relative order of the cells from a visualization, we ordered all the cells based on the two-dimensional embedding along the direction that explained the most variability of the cells.In Figure 5(d), we observed that, the proposed meta-visualization had the largest median Silhouette index as well as the largest Kendall's tau statistic, compared with all the candidate visualizations and the naive meta-visualization, showing the superiority of the proposed meta-visualization in both aspects.

Computational Cost
For datasets of moderate size as the ones analyzed in the previous sections, the proposed method had a computational cost comparable to that of t-SNE or UMAP for generating a single candidate visualization (Supplement Figure 19).As for very large and high-dimensional datasets, there are a few features of the proposed algorithm that make it readily scalable.First, although our method relies on computing the leading eigenvector of generally non-sparse matrices, these matrices (i.e., G i in Algorithm 1) are of dimension K × K, where K -the number of candidate visualizations -is usually much smaller compared to the sample size n or dimensionality p of the original data.Thus, for each sample i, the computational cost due to the eigendecomposition is mild.Second, given the candidate visualizations, our proposed algorithm is independent of the dimensionality (p) of the original dataset, as it only requires as input a set of low-dimensional embeddings produced by different visualization methods.Third, since our algorithm computes the eigenscores and the meta-distance with respect to each sample individually, the algorithm can be easily parallelized and carried out in multiple cores to further reduce time cost.
To demonstrate the computational efficiency of the proposed method for large and high-dimensional datasets, we evaluated the proposed method on real single-cell transcriptomic datasets (Buckley et al., 2022) of various sample sizes (n ∈ {1000, 2000, 4000, 8000, 14000} cells of nine different cell types from the neurogenic regions of mice) and dimensions (p ∈ {500, 1000, 2000} genes).For each dataset, we obtained 11 candidate visualizations and applied Algorithm 1 to generate the final meta-visualization (see Section A.2 of the Supplement for implementation details).Supplement Figure 20(b) contains boxplots of median Silhouette indices for each candidate visualizations and the meta-visualization (highlighted in red) with respect to the underlying true cell types, showing the stable and superior performance of the proposed method under various sample sizes and dimensions.In Supplement Figure 20(a), we compared the running time for generating the 11 candidate visualizations, and that for generating the meta-visualizations based on Algorithm 1, on a MacBook Pro with 2.2 GHz 6-Core Intel Core i7.In general, as n became large, the running time of the proposed algorithm also increased, but remained much less than that for generating the candidate visualizations.The difference in time cost became more significant as n increased, demonstrating that for very large and high-dimensional datasets the computational cost essentially comes from generating candidate visualizations, rather than from the meta-visualization step.In particular, for dataset of sample size as large as 8000 and of dimension 2000, it took about 60 mins to generate all the 11 candidate visualizations, and took about additional 12 mins to generate the meta-visualization.Moreover, Supplement Figure 20(a) also demonstrated that, for each n, when p increased, the running time for generating the candidate visualizations was longer, but the time cost for meta-visualization remained about the same (difference less than one minute).We also note that users often create multiple visualizations for data exploration, and our approach can simply reuse these visualizations with little additional computational cost.

General Model for Multiple Visualizations
We develop a general and flexible theoretical framework, to investigate the statistical properties of the proposed methods, as well as the fundamental principles behind its empirical success3 .As can be seen from Section 2.1, there are two key ingredients of our proposed method, namely, the eigenscores { s i } 1≤i≤n for evaluating the candidate visualizations, and the meta-distance matrix Pm that combines multiple candidate visualizations to obtain a meta-visualization.
To formally study their properties, we introduce a generic model for the collection of K candidate visualizations produced by multiple visualization methods, with possibly different settings of tuning parameters for a single method as considered in previous sections.Specifically, we assume {Y i } 1≤i≤n are generated as i } 1≤k≤K are identically distributed sub-Gaussian vectors with parameter σ 2 , that is, for any deterministic unit vector g ∈ R n , we have E exp{(h 1)) with high probability4 for some constant c > 0.
This assumption makes (2.12) a generative model for {P (k) i. } 1≤k≤K with ground truth P * i. and random distortions, where the variance parameter σ describes the average level of the distortions of candidate visualizations from the truth after proper scaling.
In relation to (2.11), such a condition can be satisfied when the signal structure {Y * i } 1≤i≤n is finite, the noise {Z i } 1≤i≤n is sub-Gaussian, and the dimension reduction map underlying the candidate visualization is bounded and sufficiently smooth.See Section B.2 of the Supplement for details.In addition, we also need to characterize the correlations among these random distortions, not only because the candidate visualizations are typically obtained from the same dataset {Y i } 1≤i≤n , but also because of the possible similarity between the adopted visualization methods, such as MDS and iMDS, or t-SNE under different tuning parameters.Specifically, for any j, k ∈ {1, 2, ..., K}, we define the crossvisualization covariance , and quantify the level of dependence between a pair of candidate visualizations by ρ jk = Σ jk /σ 2 .By Condition (C1a), we have ρ jj ≤ 1 for all j.For all correlation parameters {ρ jk } 1≤j,k≤K , we assume Condition (C1b) covers a wide range of correlation structures among the candidate visualizations, allowing in particular for a subset of highly correlated visualizations possibly produced by very similar methods.The parameter ρ characterizes the overall correlation strength among the candidate visualizations, which is assumed to be not too large.As a comparison, note that a set of pairwise independent candidate visualizations implies that ρ ≈ 1, whereas a set of identical candidate visualizations have ρ ≈ K.In particular, the requirement ρ = o(K) can be satisfied if, for example, among K candidate visualizations, there are subsets of at most √ K visualizations that are produced by very similar procedures, such as by the same method under different tuning parameters, so that ρ ≤ √ K = o(K).When Condition (C1b) fails, as all the candidate visualizations are essentially similarly distorted from truth, combination of them will not be substantially more informative than each individual visualization.

Convergence of Eigenscores
Under Condition (C1a), it holds that E h Hence, we can use the quantity n to characterize the overall SNR in the candidate visualizations as modelled by (2.12), which reflects the average quality of the candidate visualizations in preserving the underlying true patterns around sample i.Before stating our main theorems, we first introduce our main assumption on the minimal SNR requirement, that is, (C2) For (σ, ρ) defined in ( C1a) and (C2b), it holds that Our algorithm is expected to perform well if ρ/K is small relative to the overall SNR. ).The following theorem concerns the convergence of eigenscores to the true concordance s i , and is proved in Section B.3 of the Supplement.
Theorem 1.Under Conditions (C1a) (C1b) and (C2), for each i ∈ {1, 2, ..., n}, it holds that cos ∠( Theorem 1 implies that, as long as the candidate visualizations contain sufficient amount of information about the underlying true structure, and are not terribly correlated, the proposed eigenscores { s i } 1≤i≤n are quantitatively reliable, as they converge to the actual quality measures {s i } 1≤i≤n asymptotically.In other words, the eigenscores provide a point-wise consistent estimation of the concordance between the candidate visualizations as summarized by {P (k) } 1≤k≤K and the underlying true patterns P * , justifying the empirical observations in Table 1.Importantly, Condition (C2) suggests that our proposed eigenscores may benefit from a larger number K of candidate visualizations, or a smaller overall correlation ρ, that is, a collection of functionally more diverse candidate visualizations; see further discussions in Section 2.3.6.

Theoretical Guarantee for Meta-Visualization
Our second theorem concerns the guaranteed performance of our proposed meta-distance matrix and its improvement upon the individual candidate visualizations in the large-sample limit.
Theorem 2. Under Conditions (C1a) (C1b) and (C2), for each i ∈ {1, 2, ..., n}, it holds that cos ∠( Pm i. , P * i. ) → 1 in probability as n → ∞.Moreover, for any constant δ ∈ (0, 1), there exist a constant C > 0 such that, whenever Theorem 2 is proved in Section B.4 of the Supplement.In addition to the point-wise consistency of Pm as described by cos ∠( Pm i. , P * i. ) → 1 in probability, Theorem 2 also ensures that the proposed meta-distance is in general no worse than the individual candidate visualizations, suggesting a competitive performance of the meta-visualization.In particular, if in addition to Conditions (C1a) (C1b) and (C2) we also have that is, the magnitude of the random distortions from the true structure P * i. is relatively large, then each candidate visualization necessarily has at most mediocre performance, i.e., max 1≤k≤K cos ∠(P (k) i. , P * i. ) < 1 − δ in probability.In such cases, the proposed meta-distances is still consistent and thus strictly better than all candidate visualizations.Theorem 2 justifies the superior performance of the spectral meta-visualization demonstrated in Section 2.2, compared with 16 candidate visualizations.
Among the three conditions required for the consistency of the proposed meta-distance matrix, Condition (C2) is most critical as it describes the minimal SNR requirement, that is, how much information the candidate visualizations altogether should contain about the underlying true structure of the data.In this connection, our theoretical analysis indicates that, in fact, such a signal strength condition is also necessary, not only for the proposed method, but for any possible methods.More specifically, in Section B.6 of the Supplement, we proved (Theorem 4) that, it's impossible to construct a meta-distance matrix that is consistent when Condition (C2) is violated.This result shows that the settings where our meta-visualization algorithm works well is essentially the most general setting possible.

Robustness of Spectral Weighting against Adversarial Visualizations
In Section 2.2, in addition to the proposed meta-visualization, we also considered the metavisualization based on the naive meta-distance matrix Pa , whose rows are which is a simple average across all the candidate visualizations.We observed in all our realworld data analyses that, such a naive meta-visualization only had mediocre performance compared to the candidate visualizations (Figures 3,4 and 5), much worse than the proposed spectral meta-visualization.The empirical observations suggest the advantage of informative weighting for combining candidate visualizations.
The empirically observed suboptimality of the non-informative weighting procedure can justified rigorously by theory.Our next theorem concerns the behavior of the proposed meta-distance matrix Pm and the naive meta-distance matrix Pa when combining a mixture of well-conditioned candidate visualizations, as characterized by our assumptions (C1a) (C1b) and (C2), and some adversarial candidate visualizations whose pairwise-distance matrices does not contain any information about the true structure.Specifically, we suppose among all the K candidate visualizations, there is a collection C 0 of (1 − η)K well-conditioned candidate visualizations for some small η ∈ (0, 1), and a collection C 1 of ηK adversarial candidate visualizations.Theorem 3.For any i ∈ {1, 2, ..., n}, suppose among all the K candidate visualizations, there is a collection C 0 of (1 − η)K candidate visualizations for some small η ∈ (0, 1) satisfying Conditions (C1a) (C1b) and (C2), and a collection C 1 of ηK adversarial candidate visualizations such that (P (k) i. ) P * i. = 0 for all k ∈ C 1 .Then, for the proposed meta-distance Pm , we still have cos ∠( Pm i. , P * i. ) → 1 in probability as n → ∞.However, for the naive meta-distance Pa , even if Theorem 3 is proved in Section B.5 of the Supplement.By Theorem 3, on the one hand, even when there are a small portion of really poor (adversarial) candidate visualizations to be combined with other relatively good visualizations, the proposed method still perform well thanks to the consistent eigenscore weighting in light of Theorem 1.On the other hand, no matter how strong the SNR is for those well-conditioned candidate visualizations, the method based on noninformative weighting is strictly sub-optimal.Indeed, when ) → 1 in probability for all k ∈ C 0 , the non-informative weighting would suffer from the non-negligible negative effects from the adversarial visualizations in C 1 , causing a strict deviation from P * ; see, for example, Figures 3-5(b)(d) for empirical evidences from real-world data.

Limitations of Original Noisy High-Dimensional Data
The proposed eigenscores provide an efficient and consistent way of evaluating the performance of the candidate visualizations.As mentioned in Section 1, a number of metrics have been proposed to quantify the distortion of a visualization by comparing the low-dimensional embedding directly with the original high-dimensional data.Such metrics essentially treat the original high-dimensional data as the ground truth, and do not take into account the noisiness of the high-dimensional data.However, for many datasets arising from real-world applications, the observed datasets, as modelled by (2.11), are themselves very noisy, which may not make an ideal reference point for evaluating a visualization that probably has already significantly denoised the data through dimension reduction.For example, the three real-world datasets considered in Section 2.2 are all high-dimensional and contain much more features than number of samples.In each case, there are some underlying clusters among the samples, but the original datasets showed significantly weaker cluster structure compared to most of the 16 candidate visualizations (Supplement Figure 21), suggesting that directly comparing a visualization with the noisy high-dimensional data may be misleading.In this respect, our theorems indicate that the proposed spectral method is able to precisely assess and effectively combine multiple visualizations to better grasp the underlying noiseless structure P * , without referring to the original noisy datasets, making it more robust, flexible, and computationally more efficient.

Benefits of Including More Functionally Diverse Visualizations
Our theoretical analysis implies that the proposed meta-visualization may benefit from a large number (larger K) of functionally diverse (small ρ) candidate visualizations.To empirically verify this theoretical observation, we focused on the religious and biblical text data and the mouse embryonic stem cells data considered in Section 2.2, and obtained spectral meta-visualizations based on a smaller but relatively diverse collection of 5 candidate visualizations, produced by arguably the most popular methods, namely, t-SNE, PHATE, UMAP, PCA and MDS, respectively.Compared with the 16 candidate visualizations considered in Section 2.2, here we have presumably similar ρ but much smaller K.As a result, for the religious and biblical texts data, the meta-visualization had a median Silhouette index 0.187 (Supplement Figure 13), which was smaller than the median Silhouette index 0.275 based on the 16 candidate visualizations as in Figure 3(d); for the cell cycle data, the meta-visualization had a median Silhouette index -0.062and a Kendall's tau statistic 0.313, both smaller than the respective values based on the 16 candidate visualizations as in Figure 4(d).On the other hand, we also evaluated the effect when ρ is increased but K remains fixed.Specifically, we obtained 16 candidate visualizations, all produced by PHATE with varying nearest neighbor parameters, the final spectral meta-visualization had a median Silhouette index 0.094, which was even lower than the above meta-visualization based on five distinct methods, although being still slightly better than the 16 PHATE-based candidate visualizations (Supplement Figure 13).These empirical evidences were in line with our theoretical predictions, suggesting benefits of including more diverse visualizations.

DISCUSSION
We developed a spectral method in the current study to assess and combine multiple data visualizations.The proposed meta-visualization combines candidate visualizations through an arithmetic weighted average of their normalized distance matrices, by their corresponding eigenscores.Although the proposed method was shown both in theory and numerically to outperform the individual candidate visualizations and their naive combination, it is still unclear whether there exists any other forms of combinations that lead to even better meta-visualizations.For example, one could consider constructing a meta-distance matrix using the geometric or harmonic (weighted) average, or an average based on barycentric coordinates (Floater, 2015).We plan to investigate such problems concerning how to optimally combining multiple visualizations in a subsequent work.
Although originally developed for data visualization, the proposed method can be useful for other supervised and unsupervised machine learning tasks, such as combining multiple algorithms for clustering, classification, or prediction.For example, for a given dataset, if one has a collection of predicted cluster memberships produced by multiple clustering algorithms, one could construct cluster membership matrices with (i, j)-th entry being 0 if sample i and j are not assigned to the same cluster and being 1 otherwise.Then we may define the similarity matrix as in (2.7), obtain the eigenscores for the candidate clusterings, and a meta-clustering using (2.9).It is of interest to know its empirical performance and if the fundamental principles unveiled in the current work continue to hold for such broader range of learning tasks.Cell Cycle Data.The raw count data were preprocessed, normalized, and scaled by following the standard procedure (R functions CreateSeuratObject, NormalizeData and ScaleData under default settings) as incorporated in the R package Seurat1 .We also applied the R function FindVariableFeatures in Seurat to identify 2000 most variable genes for subsequent analysis.The final p = 1147 cell-cycle related genes were selected based on two-sample t-tests.The 16 candidate visualizations were generated the same way as in the previous example, with the same set of tuning parameters.
Cell Differentiation Data.The raw count data were preprocessed, normalized, and scaled using Seurat package by following the same procedure as described previously.The 16 candidate visualizations were generated the same way as in the previous examples, with the same set of tuning parameters.
Computational Cost.We considered the single-cell transcriptomic dataset of Buckley et al. (2022) that contains more than 20,000 cells of different cell types from the neurogenic regions of 28 mice.For each n ∈ {1000, 2000, 4000, 8000, 14000}, we randomly select n cells of nine different cell types, and selected subsets of p ∈ {500, 1000, 2000} genes to obtain an n × p count matrix.After normalizing the count matrix, we applied various visualization methods (PCA, HLLE, kPCA, LEIM, UMAP, t-SNE and PHATE) that are in general scalable to large datasets (i.e., cost less than one minute for visualizing 1000 samples of dimension 300), to generate 11 candidate visualizations   (with two different parameter settings for kPCA, t-SNE, UMAP and PHATE).Then we ran our proposed algorithm to obtain the final meta-visualization.

B.1 Notations
For a vector a = (a 1 , ..., a n ) ∈ R n , we denote diag(a 1 , ..., a n ) ∈ R n×n as the diagonal matrix whose i-th diagonal entry is a i , and define the p norm a p = n i=1 a p i 1/p and the ∞ norm a ∞ = max 1≤i≤n |a i |.For a matrix A = (a ij ) ∈ R n×n , we define its Frobenius norm as A F = n i=1 n j=1 a 2 ij , and its spectral norm as A = sup x 2 ≤1 Ax 2 ; we also denote A .i ∈ R n as its i-th column and A i. ∈ R n as its i-th row.For sequences {a n } and {b n }, we write

B.2 Sufficient Condition for (C1a)
We provide a sufficient condition in light of the signal-plus-noise model (2.11) of the main text with some intuitions that implies the sub-Gaussian condition (C1a) for h (k) i in (2.12).In general, the distortion vector h (k) i is jointly determined by the noise {Z i } 1≤i≤n , the noiseless samples {Y * i } 1≤i≤n and the dimension reduction map f k : R p → R 2 associated to the k-th visualization method.Accordingly, our sufficient condition for (C1a) essentially involves regularity of the signal structure {Y * i } 1≤i≤n and the dimension reduction (DR) maps f k : R p → R 2 for 1 ≤ k ≤ K, and sub-Gaussianity of the noise vector {Z i } 1≤i≤n .We first state precisely our sufficient condition.
(C01) (Regularity of the signal and DR map) The noiseless samples {Y * i } 1≤i≤n lie on a bounded manifold M embedded in R p , and the DR map f k and its first-order derivative are bounded almost surely for some constants L > 1 and C f > 0.
Intuitively, (C01) requires that the underlying signal structure is finite and that the DR map is also finite and sufficiently smooth.In particular, we allow that f k is random in itself, as in the cases of randomized algorithms such as t-SNE and UMAP.The sub-Gaussian condition (C02) on the noise vector Z i is mild and allows for wide range of noise structures.Below we show that Conditions (C01) and (C02) jointly imply the sub-Gaussianity of h i .Firstly, note that by definition P . Then for each i, we can define the pairwise distance for the noiseless samples associated with the k-th visualization method as . Then, by (2.12), it follows that where in (B.3) we used Taylor expansion of P being some point between P For each i, since c i,k is a parameter that accounts for the possible scaling difference caused by the dimension reduction map f k , without loss of generality, we can take c ) is bounded and therefore a sub-Gaussian random variable.Similarly, for the random variable g k,ij 2 , and the sub-Gaussianity of Z i and Z j from (C02), imply that (B.5) is also a sub-Gaussian random variable.Thus, we have verified that the sub-Gaussianity of h (k) i under Conditions (C01) and (C02).In particular, the sub-Gaussian parameter σ 2 in (C1a) is jointly determined by the underlying manifold M, the scale (L) and the smoothness (C f ) of the DR map.

B.3 Eigenscore Consistency: Proof of Theorem 1
For simplicity, we omit the dependence on i ∈ {1, 2, ..., n} and hereafter denote x k = (x k1 , ..., x kn ) = P . Therefore, model (2.12) in the main paper can be rewritten as and define the matrix of normalized vectors where Xj ∈ R K is the j-th column of X, so that For any 1 ≤ j, k ≤ n, it follows that This implies that X X = g 2 2 • yy + yg H + Hgy + HH , (B.8) where Let the singular value decomposition of X be where r = min{n, K} and σ 1 ≥ σ 2 ≥ ... ≥ σ r .Then we have , where V −1 = v2 ... vr ∈ R n×(r−1) .Now we derive estimates of the random quantities σ 1 , σ 2 , |v 1 y| and V −1 y to obtain an upper bound of the last term in the above inequality.Consider the decomposition (B.9) and the first equation of (B.10).Firstly, by Weyl's inequality (e.g., Corollary III.2.6 of Bhatia 2013), we have Secondly, by Davis-Kahan's perturbation theorem (e.g., Theorem 1 of (Yu et al., 2015)), we have Thus, if we denote R = E /( g 2 2 y 2 2 ), by (B.12) and (B.13), if R < 1, we have and by (B.14), we have In other words, we have in probability, for some non-negative sequence δ n → 0. This implies B.22) in probability.
Lower bound of g 2 .By similar argument that leads to (B.21), we have (B.23) in probability for some non-negative sequence δ n → 0.
In other words, X X = g 2 2 • yy + LL + E, E = yg H + Hgy + HH , (B.45) Note that by definition LL y = 0, then we can treat the above decomposition as a low-rank matrix g 2 2 • yy + LL plus a noise matrix E. In this respect, the Davis-Kahan's perturbation theorem applies similarly as in the proof of Theorem 1, only with for some sufficiently large C > 0. This completes the first part of the proof.
As for the second statement concerning Pa , note that by (B.25), whenever y 2 /σ ≥ C √ n for some sufficiently large C, all the (1 − η)K candidate visualizations are concordant with the true structure in probability, that is cos ∠(x k , y) → P 1 for all k ∈ C 0 .In particular, we have In this part, we show that the signal strength condition (C2) required by the proposed method is in fact asymptotically sharp and optimal, in the sense that it is essentially required for any possible method in order to achieve consistent estimation of the true structure P * .Specifically, we consider the settings where the following two conditions hold: (C1c) {h Note that Conditions (C1c) and (C1d) are sufficient conditions for Conditions (C1a) and (C1b), respectively, so that Conditions (C1a) and (C1b) are satisfied throughout our discussion below.
Using the previous notations, we can rewrite our model for given i as where by (C1c) {z k } 1≤k≤K are identically distributed Gaussian vectors with mean zero and covariance matrix σ 2 I n , and R ∈ R K×K is defined similarly as in the main paper describing the Then, it follows that ȳ ȳ − y y F ≥ √ 0.6 implies cos ∠(y, y) < 0.7.Therefore, whenever we choose a constant C 1 > 0 sufficiently large so that This completes the proof of the theorem.

B.7 Proof of Auxiliary Lemma 1
We need the following lemma, proved in Cai and Zhang (2018), to obtain the upper bound.
Lemma 3.For any p ≥ 1, denote B p = {x ∈ R p : x 2 ≤ 1} as the p-dimensional unit ball in the Euclidean space.Suppose K ∈ R p 1 ×p 2 is a random matrix.Then we have for t > 0, where Σ ij = σ 2 ρ ij for 1 ≤ i, j ≤ d.In particular, we have To see this, if we denote v = (v 1 , ..., v d ) ∈ R nd , then l Rl = R .
Figure 2: (a) Boxplots of the mean concordance with the underlying true pattern for 15 candidate visualizations (HLLE omittted due to very low concordance) and the two meta-distance matrices under each simulation setting (Left: Gaussian mixture; Middle: smiley face; Right: mammoth) across various values of the SNR value θ.See Supplement Figure 7 for complete plots.(b) Top: examples of candidate visualizations along with their sample-wise concordance {( P(k) i.) P * i. } 1≤i≤n with the structure of noiseless samples, and the proposed meta-visualization using UMAP and the concordance {( Pm i. ) P * i. } 1≤i≤n for the proposed meta-distance.Bottom: boxplots (center line, median; box limits, upper and lower quartiles; points, outliers) of concordance measures as grouped by the true clusters under the Gaussian mixture setting.See Supplement Figure8for more examples.

Figure 3 :
Figure 3: Visualization of 590 fragments of texts from eight religious and biblical books.(a) Three examples of candidate visualizations.The samples are marked by eight different symbols and colors according to their associated books.More examples are included in Supplement Figure 10.(b) Boxplots of eigenscores for all 16 candidate visualizations.(c) The proposed spectral meta-visualization using UMAP.(d) Median silhouette indices for the 16 candidate and 2 metavisualizations.The error bars of the meta-visualizations indicate the variability (95% confidence interval) due to the visualization method (UMAP) applied to the meta-distance matrix (2.9).

Figure 4 :
Figure 4: Visualization of the cell cycle of 288 mouse emryonic stem cells.(a) Three examples of candidate visualizations.The cells are marked by three different symbols and colors according to their associated cell cycle stages.More examples are included in Supplement Figure 14.(b) Boxplots of eigenscores for all 16 candidate visualizations.(c) The proposed meta-visualization using kPCA.(d) Median Silhouette indices versus Kendall's tau statistics for the 16 candidate and the 2 meta-visualizations (Red: proposed spectral meta-visualization; Orange: naive simple average meta-visualization).For both metrics, a higher value indicates a better visualization of the respective structure (cluster/cycle).

Figure 5 :
Figure 5: Visualization of 421 cells undergoing differentiation.(a) Three examples of candidate visualizations.The individual cells are marked by five different symbols and colors according to the time points they were captured and sequenced.More examples are included in Supplement Figure 17.(b) Boxplots of eigenscores for all 16 candidate visualizations.(c) The proposed metavisualization using kPCA.(d) Median Silhouette indices versus Kendall's tau statistics for the 16 candidate and the 2 meta-visualizations (Red: proposed spectral meta-visualization; Orange: naive simple average meta-visualization).

Figure 7 :
Figure 7: Boxplots for the mean concordance of the 16 candidate visualizations and the 2 metavisualizations under each simulation setting (left: finite mixture of points; middle: smiley face; right: mammoth) across 20 equispaced values of θ from some intervals.The proposed spectral meta-distance matrix had superior performance than the others.

Figure 8 :
Figure 8: Examples of candidate visualizations of the simulated data generated from a 6-class Gaussian mixture model (i.e, setting (i)), along with their pointwise concordance, and boxplots grouped by clusters.The plots indicates strengths and weaknesses of different methods in capturing the underlying clusters.

Figure 9 :
Figure 9: Boxplots for the mean concordance of the 16 candidate visualizations and the 2 metavisualizations under the Gaussian mixture model (i.e., setting (i)) with various intrinsic dimensions (r) across different values of θ ∈ [5, 10].The plots demonstrate the flexibility of the proposed method with respect to the intrinsic dimension r.

Figure 10 :
Figure 10: More examples of visualization of 590 fragments of religious text.The cluster patterns are far from clear based on individual methods.

Figure 11 :
Figure 11: Visualization of 590 fragments of religious text.Left: median Silhouette indices for 16 candidate and 2 meta-visualizations and the original data (ori), where the two meta-visualizations are based on t-SNE.Middle: median Silhouette indices for 16 candidate and 2 meta-visualization and the original data, where the two meta-visualizations are based on PHATE.Right: scatter plot of averaged eigenscores and averaged Silhouette indices for 16 candidate visualizations.The left and middle panels indicate the improvement on performance was not sensitive to specific visualization methods.The right panel shows the potential interpretation and the effectiveness of the eigenscores.

Figure 12 :
Figure 12: Comparison of eigenscores (top) and Silhouette indices (bottom) of four candidate visualizations, grouped by clusters.Notable correlation between the two quantities indicates the effectiveness of the spectral weighting approach.

Figure 13 :
Figure 13: Visualization of 590 fragments of religious text.Left: boxplots of Silhouette indices for 5 candidate and 2 meta-visualizations and the original data (ori), where the 5 candidate visualizations are based on t-SNE, PHATE, UMAP, PCA and MDS, respectively.Right: boxplots of Silhouette indices for 16 candidate and 2 meta-visualizations and the original data (ori), where the 16 candidate visualizations are based on PHATE with varying nearest neighbor parameters.Compared with Figure 3(d), they showed that the proposed method may benefit from additional, and more diverse candidate visualizations.

Figure 14 :
Figure 14: More examples of visualization of 288 mouse emryonic stem cells in cell cycles.

Figure 15 :
Figure 15: Examples of projection of candidate visualization onto the unit circle centred at the origin, from which the Kendall's tau statistics were computed.The colors correspond to the true cell cycle stages, suggesting that the spectral meta-visualization recovered the cell cycle better than other methods.

Figure 16 :
Figure 16: Illustrations of eigenscores for four candidate visualizations of the cell cycle data, showing that different visualizations may contribute to the meta-visualization from distinct aspects.Lighter color indicates better concordance to the underlying structures as assessed by the eigenscores.
and write a n = O(b n ), a n b n or b n a n if there exists a constant C such that a n ≤ Cb n for all n.We write a n b n if a n b n and a n b n .

Figure 17 :
Figure 17: More examples of visualization of the 421 cells under differentiation.

Figure 18 :
Figure 18: Illustrations of eigenscores for four candidate visualizations of the cell differentiation data, showing that different visualizations may contribute to the meta-visualization from distinct aspects.Lighter color indicates better concordance to the underlying structures as assessed by the eigenscores.

Figure 19 :
Figure 19: Computational time (log of running time in seconds) for each candidate visualization and the proposed method, including all four steps of Algorithm 1, for the three real-world datasets considered in Section 2.2.Left: religious and biblical text data; Middle: cell cycle data; Right: cell differentiation data.The proposed method had a computational time comparable to that of tSNE or UMAP for generating a single candidate visualization.

Figure 20 :
Figure 20: (a) Computational time (in mins) for generating 11 candidate visualizations ("generate candidate visualizations") for single-cell transcriptomic datasets of various sample sizes and dimensions, and that for generating the meta-visualizations ("generate meta-visualization") based on Algorithm 1.(b) Boxplots of the median Silhouette indices for 11 candidate visualizations and the meta-visualization (highlighted in red) with respect to the underlying true cell types.The running time of the proposed algorithm increased with n, but remained much less than that for generating the candidate visualizations.For each n, when p increased, the running time for generating the candidate visualizations was longer, but the time cost for meta-visualization remained the same.

Figure 21 :
Figure21: Boxplots of Silhouette index for 16 candidate and 2 meta-visualizations, and the original data for the three datasets analyzed in the main paper.The original datasets showed significantly weaker cluster structure compared to most of the 16 candidate visualizations, suggesting that directly comparing a visualization with the noisy high-dimensional data may be misleading.
ij and P * (k) ij , and in (B.4) we used Taylor expansion of f k (Y i ) at Y * i with s i being some point between Y i and Y * i .

|
B.25)    in probability.By Condition (C2), whenever K n the right-hand side of the above inequality is strictly positive.In other words, in probability, s is a strictly positive vector.Thus, for s = |û 1 |, under the same event it holds that cos ∠( s, s) ≤ cos ∠(|û 1 |, s). of Meta-Visualization: Proof of Theorem 2We still use the notation defined at the beginning of the proof of Theorem 1.In addition, we denote x m = Pm i. , and definex * = K i=1 w i xi ,(B.28)where w i = s i / s 2 .To begin with, note that cos ∠(x * , y) cos ∠(x i , y)| = Xy ∞ / y 2 .
Now by (B.60), we haveE H 0 , uv ≤ Cσ √ ρ. (B.61)By properties of sub-Gaussian vectors, it holds thatP (| H 0 , uv | ≤ Cσ √ ρ(1 + t)) ≥ 1 − e −ct 2 .(B.62) . is the i-th row of the normalized distance matrix P * for the underlying noiseless samples {Y * i } 1≤i≤n , defined as in (2.5) with X 1) i. ) P * i. , ( P(2) i. ) P * i. , ..., ( P(K) i. ) P * i. ) ∈ R K , (2.10) where P * ii 's replaced by Y * i 's.Table 1≤k≤K are the distance matrices associated to the candidate visualizations.Then, for the candidate visualizations, we consider a scaled signal-plus-noise expression } 1≤k≤K are jointly determined by the random noises {Z i } 1≤i≤n in (2.11), and the features and relations between of the specific visualization methods.Importantly, in line with what is often encountered in practice, equation (2.12) allows for flexible and possibly distinct scaling and directionality for different candidate visualizations, by introducing the visualization-specific parameter c k , and by focusing on the pairwise distance matrices, rather than the low-dimensional embeddings {X i } 1≤i≤n are the underlying noiseless samples and {Z i } 1≤i≤n are the random noises.Recall that {P (k) } i The condition K = o(n) is easily satisfied for a sufficiently large dataset.