Fast visual exploration of mass spectrometry images with interactive dynamic spectral similarity pseudocoloring

Mass Spectrometry Imaging (MSI) is an established and still evolving technique for the spatial analysis of molecular co-location in biological samples. Nowadays, MSI is expanding into new domains such as clinical pathology. In order to increase the value of MSI data, software for visual analysis is required that is intuitive and technique independent. Here, we present QUIMBI (QUIck exploration tool for Multivariate BioImages) a new tool for the visual analysis of MSI data. QUIMBI is an interactive visual exploration tool that provides the user with a convenient and straightforward visual exploration of morphological and spectral features of MSI data. To improve the overall quality of MSI data by reducing non-tissue specific signals and to ensure optimal compatibility with QUIMBI, the tool is combined with the new pre-processing tool ProViM (Processing for Visualization and multivariate analysis of MSI Data), presented in this work. The features of the proposed visual analysis approach for MSI data analysis are demonstrated with two use cases. The results show that the use of ProViM and QUIMBI not only provides a new fast and intuitive visual analysis, but also allows the detection of new co-location patterns in MSI data that are difficult to find with other methods.

. Interactive artifact and matrix subtraction. UMAP embedding (scatter plots on the left) with sample pixels marked in blue, artifact pixels are marked in yellow and matrix pixels in green. The corresponding pixels are shown on the right, respectively. (a) I O k artifact subtraction (b) I O k matrix subtraction (c) I O v matrix subtraction (d) I t k matrix subtraction. The shown visualization is part of ProViMs interactive artifact and matrix subtraction module. The dimension reduction was computed by a successive combination of tSVD [1][2][3] and UMAP 4, 5 as described in Material and Methods.

B Introduction to the Interactive Matrix Subtraction Tool
When the interactive matrix subtraction tool is started it shows the selected embedding in blue on the left and an empty image on the right (see Figure 2 for an example). By using the respective buttons and a lasso selector in the embedding, the user can select regions of interest (RoI) represented in orange, matrix pixels represented in green, and artifact pixels represented in yellow. After the lasso has been used, the selected pixels are shown as binary image at the right. Currently, only the last selection is visualized. Therefore, if selections of multiple categories (RoI, matrix, artifacts) are done, only the pixels of the last selection will be shown. The Reset button will reset every selection. By using the Run button, the remaining workflow will be completed as described in the section Matrix and artifact detection and subtraction in the main manuscript. For further downstream analysis of potential interesting regions independent of ProViM, the user can select a RoI, type a label in the Type Name field and use the Export button. This will export a .npy numpy array 6 which contains the indices of the selected pixels. Those indices can be used to filter either the embedding or the pandas DataFrame of the used data set.

Figure 2.
Interactive matrix and artifact detection and subtraction tool. The embedding of all pixels is shown on the left side, with a default color of blue. The binary image of the current pixel selection is shown on the right. In this example, the image corresponds to the selection of matrix pixels. The region of interest selection is represented in orange, the artifact selection in yellow and the matrix selection in green. The shown visualization is part of ProViMs interactive artifact and matrix subtraction module. The dimension reduction was computed by a successive combination of tSVD 1-3 and UMAP 4, 5 as described in Material and Methods.

C Introduction to the Interactive Peak Picking Tool
When the interactive peak picking tool is started it shows only the mean spectrum of all pixels (blue) (see Figure 3 for an example). By moving the mouse across the screen, a helper line (green) with a number next to it indicates the currently selected threshold value. A right click activates the peak detection algorithm. After completion, the computed peaks are shown as circles (red) and the number of computed peaks are displayed on the lower left. The default value for deisotoping is 1.2, but can be changed by the user. The Deiso button starts the deisotoping algorithm. Peaks that remain after deisotoping are shown as smaller circles (orange) and their quantity is also shown at the lower left. The Run button continues the remaining peak picking workflow with the selected threshold.

D Deisotoping
The deisotoping procedure uses a minimum and maximum peak distance to compute potential isotope sets. These sets are checked for two kinds of patterns. The first one is characterized by a series of peaks that continuously decrease from a maximum to a minimum (descent pattern). The second is characterized by a series of peaks that increase up to a maximum, followed by a decrease (hill pattern). If such patterns are found, either the peak with maximal intensity (default) or the first one of the series is picked and the others are discarded. The algorithm to find these patterns is a two step procedure: 1. Consecutive peaks within a specified distance are computed and collected in a isotope set. 2. Each isotope set is checked whether more than one local maximum is present. If the check is positive, these sets are broken recursively at the local minimum that lies between the first two local maxima. This happens until each set contains only a single local maximum. Although this method is a rough approximation to find isotopes, it has the advantage to be efficient and no prior knowledge is required.   Table 2. List of ten masses with the highest signal in the QUIMBI visualizations shown in Figure 5 of the main manuscript.

I ProViM Performance and Statistics
ProViM was executed on a machine with 500 GB memory and 42 CPUs, using the parameters described in the section ProViM Configuration. Table 4 of Supplementary I shows the total run time without the interactive parts of artifact and matrix subtraction, peak picking and deisotoping, the maximum CPU usage, and the number of peaks picked before and after matrix subtraction. The average run time was 0.14 s · pixel −1 . The number of peaks after matrix subtraction increased by a factor of 1.2 to 2.2 compared to the number of peaks before matrix subtraction. Depending on the data set, executing ProViM has a high memory consumption (e.g. 22 GB for data set I t s and 181 GB for data set I O k ). Multiple CPUs are also advantageous to speed up the process with parallelization. Accordingly, it is not advisable to run the pipeline on a standard desktop computer. Table 4. Pipeline statistics for each data set. The number of spectral pixels |p + |, the total run time of ProViM, the maximal CPU usage (100% per core), the maximal memory usage and the number of peaks before |ρ m | and after the matrix |ρ c | subtraction with a peak picking threshold z t = 0.005 are shown.

J Automatic Matrix and Artifact Detection and Subtraction Procedure
In the automatic detection, the embedding is squared and multiplied by its original signs to increase the distance between the embedded pixels. With higher distances between embedded pixels, the parameters can be chosen more generously. Then, a k-Nearest-Neighbors (kNN) graph and a radius-Neighbors (rNN) graph are constructed and connected components are calculated (CC kNN , CC rNN ). This approach is similar to the one proposed in 7 for sequencing contamination detection. For the purpose of matrix classification, the number of connected components |CC| as well as their compositions are compared. To detect the final matrix pixels the following procedure is applied: 1. If |CC kNN | < 2 ∧ |CC rNN | < 2: no data is classified as matrix.
2. If |CC kNN | = 2 ∧ |CC rNN | = 2: if the compositions of the smaller components are different, return their joint set of pixels as matrix pixels.
3. If |CC kNN | = 2 ⊕ |CC rNN | = 2: select the respective method and return the pixels of the the smaller component as matrix.
4. If |CC kNN | > 2 ∧ |CC rNN | > 2: select the method with fewer components and return the pixels of every component except the biggest one as matrix pixels.
5. If 2 < |CC kNN | = CC rNN : calculate the joint set of pixels for all components from both methods except their biggest one and return it as matrix.
This automatic classification is not capable to distinguish between artifact pixels and matrix pixels. If matrix and artifacts are present, the algorithm will return the combination of both as matrix. Similar to the interactive matrix and artifact detection and subtraction procedure, new data sets will be created. One containing only the detected matrix pixels (combination of matrix and artifacts) and the other one containing the remaining pixels. This allows access to the detection result for a manual in-depth analysis if required.

K Efficient Implementation of QUIMBI
QUIMBI's user interface is implemented in the JavaScript programming language in combination with the Vue.js framework. It uses the graphics processing unit (GPU) for hardware accelerated computation of the interactive visualizations in real time. In a web-based JavaScript application, the GPU is exposed through the WebGL application programming interface (API). WebGL is intended to be used to display two-or three-dimensional graphics in a web browser. In order to process much higher dimensional data, QUIMBI uses the WebGL API in a special way, which is outlined in the following. A typical graphics rendering pipeline with WebGL consists of four steps: Step 1: GPU memory allocation and initialization of the WebGL shader programs which contain the logic that should be executed in the GPU. The GPU memory is typically used to store textures, images that are projected to two-or three-dimensional surfaces when the rendering pipeline is executed. The total size of the available texture memory varies depending on the model of the GPU. However, the texture memory layout is always a two-dimensional array of 4 byte values, one each for the red, green, blue and alpha color channels of an image. To fit a multivariate mass spectrometry data set I = {I 1 , . . . , I d }, which consists of d mass channel images, into the texture memory, the data points p i, j,k are first transformed to bytes p i, j,k ∈ {x ∈ N|0 ≤ x ≤ 255} (see Equation 1). The result is a normalized data set I = {I 1 , . . . , I d }. The normalized data set I is split into tiles I l−r = {I l , . . . , I r }, where each tile consists of four consecutive mass channels. Each tile can be treated like a regular image, only that the four color channels of the regular image now correspond to four mass channels of the data set. All tiles of a data set can be loaded into the texture memory of the GPU and are available to be accessed by the WebGL shader programs.
, where [·] denotes the rounding operation (1) Step 2: A vertex shader program computes the scene of two-or three-dimensional shapes. In the case of QUIMBI, this scene consists of only a single two-dimensional rectangle on which the image of the data visualization should be projected for display.
Step 3: A fragment shader program computes the color of each pixel of the scene that should be rendered. The fragment shader program is executed for each pixel in parallel on the GPU, which makes the computation very fast. The color of a pixel is typically based on a single interpolated color from a texture, as well as computed shadows or reflections in a three-dimensional scene. However, in order to compute the color of a pixel for the data visualization in QUIMBI, the fragment shader program needs to access the colors from the appropriate positions of all the tiles I l−r of the data set, that are stored in texture memory.
To compute the color c i, j of a pixel in the data visualization, different fragment shader programs are executed, depending on the current visualization mode. Each fragment shader program first computes an similarity/intensity value c i, j which is then transformed to the final color c i, j of the pixel using C (see Equation 1 of the main manuscript).
Step 4: In this final step of the WebGL rendering pipeline, the image produced by the fragment shader program is returned from GPU memory and drawn in the user interface. For the interactive real time visualization, the complete WebGL rendering 9/13 pipeline is executed in a loop. The data visualization is updated whenever the mouse position changes (Similarity Mode) or the selected mass channels change (Browsing Mode and Grouping Mode).

L Comparison of various dimension reduction methods
Figures 9 to 11 illustrate the comparison of four different dimension reduction methods for the purpose of matrix detection. It can be seen that the combination of PCA, NMF and tSVD with k-means providing a solid detection of the matrix. However, a look at the embeddings of these three methods shows that no grouping of data points can be identified manually. This has two major disadvantages: 1. One has to blindly trust in these methods. If the detect236580ion is not working as well as in this example, it can be very tedious to inspect the embedding manually as it provides no visual cues for potential clusters. 2. The automatic detection is not able to detect potential artifacts nor does the embedding provide visual clues for such spectra.
It can be seen that the use of simple clustering techniques like k-Means is not suited to detect clusters in embeddings with complex topology such as those generated by UMAP, which is the reason why we proposed a different method in Supplementary J. However, the embedding provides a lot of visual cues for potential clusters, which makes manual selection considerably easier.

10/13
In summary, all methods applied (PCA, NMF, tSVD and UMAP) deliver data embeddings that can be used to algorithmically determine point groups relating to matrix or artifact signals (for instance using machine learning). However, only the UMAP embedding provided an embedding plot with such a strong topological separation of matrix/artifact and sample clusters that this plot can be used as part of an interactive manual selection process, giving users full control over this sensitive data preparation step. . Segmentation maps resulting from the clusters found in Figure 10.