Ultrafast data mining of molecular assemblies in multiplexed high-density super-resolution images

Multicolor single-molecule localization super-resolution microscopy has enabled visualization of ultrafine spatial organizations of molecular assemblies within cells. Despite many efforts, current approaches for distinguishing and quantifying such organizations remain limited, especially when these are contained within densely distributed super-resolution data. In theory, higher-order correlations such as the Triple-Correlation function can recover the spatial configuration of individual molecular assemblies masked within seemingly disordered dense distributions. However, because of their enormous computational cost, such analyses are impractical even on high-end computers. Here, we developed a fast algorithm for Triple-Correlation analyses of high-content multiplexed super-resolution data. This algorithm computes the probability density of all geometric configurations formed by every triple-wise combination of single-molecule localizations from three different channels, circumventing impractical 4D Fourier Transforms of the entire megapixel image. The algorithm achieves a 10^2-fold enhancement in computational speed, allowing high-throughput Triple-Correlation analyses and robust quantification of molecular complexes in multiplexed super-resolution microscopy.


Supplementary Discussion
Given the sparse nature of SMLM data, one major advantage of dTC over the ftTC algorithm is its improved computational speed. Specifically, for an $N \times N$-pixel image, since ftTC is calculated via the bispectrum convolution theorem, its computational expenditure reaches $\sim(N \log_2 N)^4$ for the 4D Fourier Transform, and in practice it would require an outflow of $\sim(N^2 \cdot \log_2 N)^2$ because of insufficient RAM capacity (see ref 20). In contrast, the dTC algorithm visits each of the SMLM coordinates, which are far fewer than $N^2$ because of their sparsity, and partitions three measures (the distances $r_{12}$ and $r_{13}$ from the visited coordinate to the coordinates in the other two channels, and the angle $\Delta\theta$ between each pair of vectors) into 3-D bins (Supplementary Note 1). Because the user-defined bins do not have to be evenly spaced and can concentrate high sampling frequency on the configurations of interest, such tri-variate histogram bin counting further reduces the computational cost. We note that this dTC algorithm can also be adapted to the calculation of the Pair-Correlation (dPC) function, although the 2D Fourier Transform [...].
We also note that a GPU implementation (NVIDIA GTX 1060) does not improve the computational performance of calculating the Triple-Correlation function via dTC (Fig. 2C). As mentioned above, the essence of the dTC algorithm is tri-variate (3-D) histogram bin counting while visiting each coordinate. Because the number of 3-D sampling bins usually exceeds the shared-memory capacity of most current GPUs, the histogram bin counting cannot be decomposed into smaller data blocks for parallel computation. Figure 2C shows an evaluation of naive 3-D histogramming on the global memory of the GPU, indicating no improvement in computational performance compared with the CPU.
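The coordinate-visiting histogram described above can be sketched in Python/NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes localizations are supplied as (n, 2) coordinate arrays, and the function name `dtc_histogram` and its arguments are hypothetical. Normalization of the counts into a correlation value and border handling are left out.

```python
import numpy as np

def dtc_histogram(ch1, ch2, ch3, r_edges, dtheta_edges):
    """Count (r12, r13, dtheta) for every CH1-CH2-CH3 triplet into 3-D bins.

    ch1, ch2, ch3 : (n, 2) arrays of localization coordinates (assumed input format).
    r_edges       : bin edges shared by r12 and r13; uneven spacing is allowed.
    dtheta_edges  : bin edges for the angle from r12 to r13, covering [0, 2*pi].
    Returns raw counts only (hypothetical helper; no normalization or edge correction).
    """
    ch2 = np.asarray(ch2, dtype=float)
    ch3 = np.asarray(ch3, dtype=float)
    r_edges = np.asarray(r_edges, dtype=float)
    dtheta_edges = np.asarray(dtheta_edges, dtype=float)
    counts = np.zeros((r_edges.size - 1, r_edges.size - 1, dtheta_edges.size - 1))
    r_max = r_edges[-1]
    for origin in np.asarray(ch1, dtype=float):
        # Vectors from the visited CH1 localization to all CH2 / CH3 localizations
        v12 = ch2 - origin
        v13 = ch3 - origin
        d12 = np.hypot(v12[:, 0], v12[:, 1])
        d13 = np.hypot(v13[:, 0], v13[:, 1])
        # Discard partners beyond the largest distance bin
        keep12, keep13 = d12 <= r_max, d13 <= r_max
        v12, d12 = v12[keep12], d12[keep12]
        v13, d13 = v13[keep13], d13[keep13]
        if d12.size == 0 or d13.size == 0:
            continue
        # Angle from each r12 vector to each r13 vector, wrapped into [0, 2*pi)
        th12 = np.arctan2(v12[:, 1], v12[:, 0])
        th13 = np.arctan2(v13[:, 1], v13[:, 0])
        dtheta = np.mod(th13[None, :] - th12[:, None], 2 * np.pi)
        r12_grid, r13_grid = np.meshgrid(d12, d13, indexing="ij")
        # Tri-variate histogram bin counting for this origin
        sample = np.column_stack([r12_grid.ravel(), r13_grid.ravel(), dtheta.ravel()])
        hist, _ = np.histogramdd(sample, bins=(r_edges, r_edges, dtheta_edges))
        counts += hist
    return counts
```

Memory here scales with the number of bins rather than with image pixels, and passing unevenly spaced `r_edges` concentrates sampling on the distances of interest. If a channel is reused (e.g. CH2 equal to CH1), starting `r_edges` above 0 keeps each origin from pairing with itself.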
Moreover, since in most situations the orientation of each triplet $\{\mathbf r_{CH1}, \mathbf r_{CH2}, \mathbf r_{CH3}\}$ is independent of that of the whole image, the Triple-Correlation function $g^{(3)}(\mathbf r_{12}, \mathbf r_{13}) = g^{(3)}(r_{12}, \theta_{12}, r_{13}, \theta_{13} = \theta_{12} + \Delta\theta)$ is further averaged along $\theta_{12}$ from 0 to $2\pi$ (Fig. 1a), yielding Supplementary Equation (2). We note that integration over $[0, 2\pi)$ at distance $r$ from $\mathbf r_i$ could extend beyond the image area; in such cases, integration has to be limited to $[\theta_1, \theta_2]$, where the excluded angles $\theta$ meet the condition that at least one of $\mathbf r_{12} = (r_{12}, \theta)$ and $\mathbf r_{13} = (r_{13}, \theta + \Delta\theta)$ is outside the image area (Supplementary Fig. 1).
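As a sketch only, assuming uniform weighting over the orientation angle, the angular averaging can be written as below. Here $g^{(3)}(r_{12}, \theta_{12}, r_{13}, \theta_{13})$ denotes the Triple-Correlation function in polar coordinates, $\Delta\theta = \theta_{13} - \theta_{12}$, and $[\theta_1, \theta_2]$ is the restricted integration range used near the image border. This is not necessarily the exact form of Supplementary Equation (2).

```latex
g^{(3)}(r_{12}, r_{13}, \Delta\theta)
  = \frac{1}{2\pi}\int_{0}^{2\pi}
    g^{(3)}(r_{12}, \theta_{12}, r_{13}, \theta_{12} + \Delta\theta)\, d\theta_{12}

% near the image border, with the integration restricted to [\theta_1, \theta_2]:
g^{(3)}(r_{12}, r_{13}, \Delta\theta)
  \approx \frac{1}{\theta_2 - \theta_1}\int_{\theta_1}^{\theta_2}
    g^{(3)}(r_{12}, \theta_{12}, r_{13}, \theta_{12} + \Delta\theta)\, d\theta_{12}
```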
We note that the Pair-Correlation can be defined as [...], and the Triple-Correlation function can also be defined in two forms. Besides the definition given in this work (the first kind), it can be defined as [...] (the second kind); the difference between these two definitions is not a constant scalar.
The derivation demonstrates that the first-kind Triple-Correlation function $g^{(3)}(\mathbf r_{12}, \mathbf r_{13})$ results in a mixture of trimer and dimer correlations, whereas the second-kind function contains only the trimer correlation (Supplementary Fig. 2), which is in fact of most interest. Since the dTC algorithm presented in this work follows the first-kind definition, we extract the Cross-Pair-Correlation contributions from the dTC results to generate the final second-kind profile (Supplementary Fig. 2).
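To illustrate why a first-kind Triple-Correlation mixes dimer and trimer terms, one standard decomposition (assumed here for illustration and not necessarily identical to the second-kind definition referred to above) writes it in terms of the three Cross-Pair-Correlations $g_{12}$, $g_{13}$, $g_{23}$ plus a purely three-body term $\zeta$, with $\mathbf r_{23} = \mathbf r_{13} - \mathbf r_{12}$:

```latex
g^{(3)}(\mathbf r_{12}, \mathbf r_{13})
  = 1 + [g_{12}(\mathbf r_{12}) - 1] + [g_{13}(\mathbf r_{13}) - 1]
      + [g_{23}(\mathbf r_{23}) - 1] + \zeta(\mathbf r_{12}, \mathbf r_{13})

\zeta(\mathbf r_{12}, \mathbf r_{13})
  = g^{(3)}(\mathbf r_{12}, \mathbf r_{13})
    - g_{12}(\mathbf r_{12}) - g_{13}(\mathbf r_{13}) - g_{23}(\mathbf r_{23}) + 2
```

Under this decomposition, removing the Cross-Pair-Correlation terms from a first-kind result leaves a trimer-only term, and the difference between the two forms depends on $\mathbf r_{12}$ and $\mathbf r_{13}$ rather than being a constant scalar.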
To further interpret the Triple-Correlation results, we analyzed the local maxima of $g^{(3)}(r_{12}, r_{13}, r_{23})$, which represent the most significant geometric configurations, defined by the side lengths $\{r_{12}, r_{13}, r_{23}\}$, among the three colors. These configurations are then displayed as triangles in which the size of the circle at each vertex denotes the correlation amplitude (Fig. 1d).
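A minimal sketch of this post-processing, assuming the Triple-Correlation result is stored as a 3-D array over $(r_{12}, r_{13}, \Delta\theta)$ bins. The helpers `side_r23` and `local_maxima_3d` are hypothetical: the first recovers the third triangle side $r_{23}$ with the law of cosines, and the second flags voxels strictly larger than all 26 neighbours. For simplicity the $\Delta\theta$ axis is treated as non-periodic, which is an assumption of this sketch.

```python
import numpy as np

def side_r23(r12, r13, dtheta):
    """Third side of the triangle spanned by r12 and r13 (law of cosines)."""
    return np.sqrt(r12 ** 2 + r13 ** 2 - 2.0 * r12 * r13 * np.cos(dtheta))

def local_maxima_3d(h):
    """Indices of voxels strictly greater than all 26 neighbours.

    Borders are padded with -inf; the dtheta axis is not wrapped (simplification).
    """
    h = np.asarray(h, dtype=float)
    padded = np.pad(h, 1, mode="constant", constant_values=-np.inf)
    nx, ny, nz = h.shape
    is_max = np.ones(h.shape, dtype=bool)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                if dx == dy == dz == 0:
                    continue
                # Shifted view: neighbour at offset (dx, dy, dz) for every voxel
                neighbour = padded[1 + dx:1 + dx + nx,
                                   1 + dy:1 + dy + ny,
                                   1 + dz:1 + dz + nz]
                is_max &= h > neighbour
    return np.argwhere(is_max)
```

Each returned index maps to bin centres $(r_{12}, r_{13}, \Delta\theta)$; `side_r23` then gives the triangle to draw, and the histogram value at that index gives the vertex-circle amplitude.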
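The coordinate-visiting algorithm presented in this work for the Triple-Correlation can be adapted to computing the Pair-Correlation function. Consider a two-color SMLM image $I_{CH1}(\mathbf r)$ and $I_{CH2}(\mathbf r)$ (the two channels can be the same for Auto-Pair-Correlation), with single-molecule localization coordinate sets $\mathbb{C}_{CH1} = \{\mathbf r_{CH1}\}$ and $\mathbb{C}_{CH2} = \{\mathbf r_{CH2}\}$ from color channels CH1 and CH2. Its Pair-Correlation function is defined as $g(\mathbf r) = \langle I_{CH1}(\mathbf R)\, I_{CH2}(\mathbf R + \mathbf r)\rangle / (\langle I_{CH1}(\mathbf R)\rangle \langle I_{CH2}(\mathbf R)\rangle)$, where $\langle\cdot\rangle$ denotes averaging over all positions $\mathbf R$ in the image, and $I(\mathbf r)$ is the local density at $\mathbf r$, i.e., the number of localizations within a differential area $\Delta A$ divided by $\Delta A$. Given that $I_{CH1}(\mathbf R)\, I_{CH2}(\mathbf R + \mathbf r) \equiv 0$ for $\mathbf R \notin \mathbb{C}_{CH1}$, the Pair-Correlation function can be derived as Supplementary Equation (3), in which $A$ and $\rho_k = \langle I_k(\mathbf R)\rangle$ are the image area and the average density of the entire SMLM image in the $k$-th channel, respectively, and $\rho^{(i)}_{CH2}(\mathbf r)$ represents the local density at $\mathbf r$ in CH2 measured from the $i$-th coordinate in CH1.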
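A sketch of how a coordinate-based expression of this kind can follow from the definition of the Pair-Correlation function $g(\mathbf r)$. It assumes that the CH1 image $I_{CH1}$ is a sum of Dirac deltas at its $N_{CH1}$ localizations $\mathbf r_i$, that $\langle\cdot\rangle$ is a uniform average over the image area $A$, that $\rho_k$ is the average density of channel $k$, and that $\rho^{(i)}_{CH2}(\mathbf r) = I_{CH2}(\mathbf r_i + \mathbf r)$ is the CH2 density at displacement $\mathbf r$ from the $i$-th CH1 localization. The normalization in the original Supplementary Equation (3) may differ.

```latex
I_{CH1}(\mathbf R) = \sum_{i=1}^{N_{CH1}} \delta(\mathbf R - \mathbf r_i),
\qquad \rho_{CH1} = \langle I_{CH1}(\mathbf R)\rangle = \frac{N_{CH1}}{A}

\langle I_{CH1}(\mathbf R)\, I_{CH2}(\mathbf R + \mathbf r)\rangle
  = \frac{1}{A}\int_A I_{CH1}(\mathbf R)\, I_{CH2}(\mathbf R + \mathbf r)\, d\mathbf R
  = \frac{1}{A}\sum_{i=1}^{N_{CH1}} \rho^{(i)}_{CH2}(\mathbf r)

g(\mathbf r)
  = \frac{1}{A\,\rho_{CH1}\,\rho_{CH2}}\sum_{i=1}^{N_{CH1}} \rho^{(i)}_{CH2}(\mathbf r)
  = \frac{1}{N_{CH1}}\sum_{i=1}^{N_{CH1}} \frac{\rho^{(i)}_{CH2}(\mathbf r)}{\rho_{CH2}}
```

The last form reads as "average local CH2 density around each CH1 localization, divided by the global CH2 density", which is exactly what a coordinate-visiting loop accumulates.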
Moreover, since in most situations the orientation of each pair $\{\mathbf r_{CH1}, \mathbf r_{CH2}\}$ is independent of that of the whole image, $g(\mathbf r) = g(r, \theta)$ is further averaged along $\theta$ from $\theta_1$ to $\theta_2$, yielding Eq. (4), where $\theta_1$ and $\theta_2$ ($0 \le |\theta_2 - \theta_1| \le 2\pi$) must satisfy the condition that $\mathbf r = (r, \theta)$ with $\theta \notin (\theta_1, \theta_2)$ lies outside the image (similar to Supplementary Fig. 1). This direct Pair-Correlation (dPC) algorithm proceeds as follows: 1) visit a given detection at $\mathbf r_i$ in CH1 and assign it as the origin; 2) calculate $\rho^{(i)}_{CH2}$ within the part of a ring from $\theta_1$ to $\theta_2$ at [...]
[...] configurations as well as the Triple-Correlation-resolved output configurations. As shown, the trimer configurations were not well resolved when the localization uncertainty was larger than the input spacing (scenarios 1 and 2), but could be approximated when the spacing and the localization precision were comparable (scenario 3). The configurations were well resolved when the spacing was larger than the localization uncertainty (scenario 4). We note that besides the localization uncertainty in SMLM data, other experimental variables (e.g., insufficient labeling specificity and varying clustering of one or more of the three examined species) can introduce additional complexity into Triple-Correlation-based analyses, as well as into other image analyses. As such, the resolved triangular configurations serve as a range of relative mutual positioning, or triple-wise colocalization, of the three examined species rather than as absolute measurements of the spacing of the examined trimers.
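The first dPC steps (visit each CH1 detection as the origin, then measure the CH2 density in rings around it) can be sketched as below. This is a minimal illustration under stated assumptions: coordinates are (n, 2) arrays in a `width` x `height` image, and the function name `dpc` is hypothetical. For edge handling it swaps in plain border (minus-sampling) correction, visiting only origins at least the largest ring radius from every border, instead of the angular-range restriction $[\theta_1, \theta_2]$ described in the text.

```python
import numpy as np

def dpc(ch1, ch2, r_edges, width, height):
    """Ring-histogram Pair-Correlation g(r) between CH1 origins and CH2 partners.

    r_edges : increasing ring radii; start above 0 for Auto-Pair-Correlation so
              that an origin never counts itself.
    Border (minus-sampling) correction: only origins at least r_edges[-1] from
    every image border are used, so each full ring lies inside the image.
    """
    ch1 = np.asarray(ch1, dtype=float)
    ch2 = np.asarray(ch2, dtype=float)
    r_edges = np.asarray(r_edges, dtype=float)
    r_max = r_edges[-1]
    inside = ((ch1[:, 0] >= r_max) & (ch1[:, 0] <= width - r_max)
              & (ch1[:, 1] >= r_max) & (ch1[:, 1] <= height - r_max))
    origins = ch1[inside]
    if origins.shape[0] == 0:
        raise ValueError("no CH1 localization lies far enough from the image border")
    rho2 = ch2.shape[0] / (width * height)  # average CH2 density over the image
    counts = np.zeros(r_edges.size - 1)
    for origin in origins:  # step 1: each CH1 detection becomes the origin
        d = np.hypot(ch2[:, 0] - origin[0], ch2[:, 1] - origin[1])
        counts += np.histogram(d, bins=r_edges)[0]  # step 2: CH2 counts per ring
    ring_area = np.pi * (r_edges[1:] ** 2 - r_edges[:-1] ** 2)
    # Mean local CH2 density around the origins, divided by the global CH2 density
    return counts / (origins.shape[0] * ring_area * rho2)
```

For two independent, uniformly random channels the returned profile should fluctuate around 1; values above 1 at a given radius indicate CH2 enrichment at that distance from CH1.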