Robust, fiducial-free drift correction for super-resolution imaging

We describe a robust, fiducial-free method of drift correction for use in single molecule localization-based super-resolution methods. The method combines periodic 3D registration of the sample using brightfield images with a fast post-processing algorithm that corrects residual registration errors and drift between registration events. The method is robust to low numbers of collected localizations, requires no specialized hardware, and provides stability and drift correction for an indefinite time period.

Drift correction is a prevalent problem in microscopy, in which the sample being examined alters its position over time, leading to distortions for data collected over a series of movie frames. Typically, translational motion is the main significant positional change, which is what we will consider here, however, others have also examined the issue of rotational movements 1,2 . Drift can be due to a variety of factors, such as temperature variation, vibration and mechanical relaxation of the measuring instruments [3][4][5] , becoming significant for long recording times.
Various techniques to eliminate drift errors have been developed, such as producing specialized hardware and introducing fixed fiduciary markers (quantum dots, luminescent gold beads, fluorescent beads, semiconductor nanocrystals, polymers fixed on the coverslip, etc.) or patterns as reference points 11,12,[14][15][16][17][18] . Some researchers have suggested tracking image features, for example, the center of mass of a cell-like object 19 . Several techniques involve identifying and extracting image features, then matching them for global correspondence 6 , using clusters of those features 2 , or their nearest (feature) neighbors 8 in order to deduce the drift. These techniques work best with non-pointlike features 6 .
Active drift correction via continuous brightfield / non-fluorescent imaging with feedback to the stage position has been used by some 20,21 , the latter using circular intracellular vesicles to substitute for fiducial markers. In another study, brightfield images were collected every 10 frames and correlated for later analysis 22 . Tang et al. 4 minimized the normalized root-mean-square error between brightfield images over all pixels.
A common technique for estimating drift post experiment is by using multiple localizations from the same source that are interspersed throughout the data collection. The most common drift correction technique for super-resolution images using point localization data has been image cross-correlation, in particular, a fast implementation using fast Fourier transforms (FFTs) known as phase correlation 15,[23][24][25][26] , however, Pearson's r correlation coefficient has also been used (in time-lapse confocal microscopy) 27 . A variation of these techniques involves auto-correlation at time zero and cross-correlation subsequently 3 . One study considered cross-correlation on all possible combinations of images, proposing a cubic spline model for the drift curve 28 . McGorty et al. 29 used cross-correlation for 3D drift estimation.
Two additional extensions involve sum images, in which the sum of a series of frames is used to more reliably indicate localizations 15 . In Wang et al. 30 , each frame is assumed to contain an incomplete description of an identical structure. Groups of frames are combined into sum images. Redundant cross-correlation (RCC) is then used, considering all possible combinations of the sum images. The second procedure 31 identifies the FFT cross-correlation maximum between a sequential pair of frames. The estimated drift vector is applied to the later frame, which is then added to the sum image of the earlier shifted frames (or simply earlier frame on the first iteration). The process then repeats with the earlier frame now being a sum image.
Two techniques directly employing point positions are using a Bayesian statistical framework to calculate the drift corrections for every image frame 32 or per emitter when grouping localizations 33 . A third technique involves maximizing the molecular constraint field (MCF) cost function, where the MCF is a function of the distances between the points in two images 5 . The MCF cost function, defined as the sum of the negative exponentials of the distances between all points in a fixed reference image with respect to all points in a shifted movable image, takes into account the sparseness of points around each position, and produces near zero contribution when the distances become large. Typical usage involves combining collections of sequential frames into datasets (i.e., sum images) to which the MCF is applied incrementally while combining previously drift corrected results with the reference image.
Several researchers have used a combination of techniques. Fluorescent bead fiducial markers paired with cross-correlation of STORM images 23 or sum images 15 was an early technique. Schnitzbauer et al. 34 applied RCC to the whole field of view, then used DNA origami and the binding sites of a 20 nm grid in separate fiducial steps.
Here, we propose a combination of brightfield registration performed during an experiment before each collection of super-resolution frames forming a dataset, followed by a post procedure that uses the nearest neighbor distances of the localizations found to perform drift correction. All software described was written in MATLAB.

Algorithm
The components and dataflow of our algorithm are summarized in Supplementary Fig. S1. Before the superresolution data collection begins, a 3D brightfield z-stack is taken as an xyz reference. The data collection is broken up into a series of discrete datasets. Before each dataset, a brightfield registration procedure is used to align the sample to the reference z-stack. In a post-processing step, we correct for the remaining drift within each dataset and nanometer-scale shifts between datasets that arise from imperfect brightfield registration. The intra-dataset correction estimates a continuous motion model using a cost function calculated from pairs of localizations that are likely to be from the same fluorophore. The inter-dataset correction finds a global shift between datasets using the same cost-function.
Brightfield registration. Brightfield registration both ensures that we maintain the z-focus of our desired sample plane and that the sample does not drift too far from the desired x, y field of view. Remaining uncorrected drift/shifts introduced by brightfield registration on the nanometer scale are corrected by post-processing drift correction.
A z-stack of brightfield reference images is obtained before data collection begins. Before each dataset is collected, a new brightfield z-stack is obtained and compared with the previous reference z-stack, finding the best z and x, y fits via a scaled cross-correlation. The sample is moved to correct for any found offset. The procedure iterates until the corrected movement is less than 5 nm (50 nm) for x, y (z). A fluorescence super-resolution dataset is collected, then the procedure is repeated before the next dataset collection. See Fig. 1a. A complete data collection for one sample typically includes 10-50 datasets.
For a single dataset, registration between the reference z-stack, ref , and the new brightfield z-stack, new , proceeds as follows. Images in each of the two stacks are individually scaled by the sum of their pixel values: where stack k denotes a single image taken from either ref or new and stack i,j,k denotes a single pixel in stack k . This process is done for every image in both ref and new . Scaling each image in this manner reduces biases in the z location of the cross-correlation maximum that may occur due to differing intensities of images taken at different z positions. The two stacks are then individually whitened: where ⋆ denotes cross-correlation, N is the total number of pixels in the z-stack, and mean/stdev(stack) is the mean or standard deviation of pixel values across the entire z-stack. The whitening procedure will both reduce biases in the location of the cross-correlation maximum (i.e., for image stacks with a non-zero mean pixel value, the cross-correlation maximum is biased to correspond to small offsets between the stacks relative to the size of the stacks) and will scale the stacks such that pixel values in the final cross-correlation stack will range from − 1 to 1. A scaled 3D cross-correlation between ref and new is then computed as where F (· · · ) indicates the complex conjugate of a fast Fourier transform (denoted by F ) and ones denotes a stack of 1's the same size as the stacks ref and new . Each pixel in ones ⋆ ones will contain a count of the total number of overlapping pixels between ref and new that were used to compute each pixel in ref ⋆ new . The element-wise scaling of ref ⋆ new by ones ⋆ ones will reduce the bias in the location of the cross-correlation maximum introduced by the differing number of overlapping pixels used to compute each pixel in ref ⋆ new . The drift is then independently computed for each of x, y and z by fitting second-order polynomials to the k-nearest neighbor search. The core of these processes involves k-nearest neighbor searches in which the image is partitioned by a data structure called a k-d or k-dimensional tree. Each step in the k-d algorithm splits a region of the image space by a hyperplane, dividing the points into one of two half-spaces. The regions are formed by www.nature.com/scientificreports/ earlier splits. k-d trees are typically constructed by cycling through the dimensions (x then y then x then ...in 2D) as regions are split further and further, and choosing a median splitting plane that evenly divides the points between the two half-spaces. In this situation, the k-d tree is considered balanced. Nearest neighbor distances are quickly computed once a k-d data structure is produced. MATLAB implements balanced k-d trees for nearest neighbor searches 35 .

Inter-dataset drift correction.
To understand what is happening in more detail, consider Fig. 1d, depicting the situation for inter-dataset drift correction (in 2D), which is easiest to explain first. The emitters in dataset 1 are assumed to undergo a lateral shift, that may occur between datasets because of small residual errors in the brightfield registration process, in the transition to dataset 2. Brightfield registration is performed in the interval, and it is assumed that the intra-dataset drifts have been removed during application of the first part of the post-processing drift correction algorithm. What is left then is simply a constant offset between dataset 1 and dataset 2 due to residual errors from the brightfield registration, actual sample drift during the typically short time interval, and other errors due to the dynamic behavior of the system. For each localization in dataset 1, the nearest neighbor in dataset 2 of the predicted location of the localization under assumptions of a lateral shift can be determined. For the sake of efficiency (see "Methods; Post-processing drift correction" section), the model we actually use is the predicted location of dataset 2 localizations when unshifted to dataset 1. A sum of all the thresholded nearest neighbor distances (indicated by the connecting dotted lines) between the model (derived from dataset 2 and an assumed lateral shift) and the actual dataset 1 localizations is then constructed, which will become the cost function for an optimization routine. The minimum value of this sum, corresponding to the best correspondence between the model predictions and dataset 1 is searched for by the optimizer, yielding a model of lateral x-and y-shifts (and z-shifts in 3D) between these two datasets. The process then repeats using dataset 1 and dataset 3, etc.
Intra-dataset drift correction. The intra-dataset algorithm, depicted in Fig. 1b, c, is similar in many ways to the inter-dataset procedure. Here, the localizations, instances of true emitters, in all the frames in a single dataset are related. The drift model now assumes a polynomial dependence on time, so given the model parameters, each localization in the entire dataset is projected to t = 0 using the motion model. The constant in the polynomial fit is set to zero in order to match the model with the initial locations in the dataset. The first figure shows the situation in the case of no drift model, while the second provides an example using a quadratic time model. Once again, the nearest neighbor of each shifted localization in the entire dataset is determined, but this time with respect to the same model dataset (here, nearest neighbor means a different localization than the original one). Thus, minimizing the sum of the thresholded nearest neighbor distances minimizes the dispersion of the localizations throughout the dataset. In other words, we are trying to get the localizations to move as a group between frames and not spread out as would typically happen for poor drift models. This then carries over to the true emitters.
Thresholding and initial guesses for the optimizer. The threshold for the nearest neighbor sums is chosen to deemphasize nearest neighbors not likely to have been generated by the same emitter > l intra (= 1 pixel) away in the intra-dataset analysis, and > l inter (= 2 pixels) away in the inter-dataset portion of the algorithm. Mathematically, if d i is the nearest neighbor distance to the estimated localization i, the thresholded nearest neighbor sum is then where l = l intra or l inter is the threshold. For intra-dataset movement, fluorophores are constantly blinking on and off and are also sparsely represented, so this threshold tries to prevent bad neighbor pairings with false large contributions to the nearest neighbor sum. For example, when the nearest neighbor of a fluorophore blinks off in one frame of a dataset, this can result in a totally different and potentially distant nearest neighbor for the same fluorophore in the next frame. If distant, this can produce a huge contribution to the nearest neighbor sum if no threshold was present. In the inter-dataset drift procedure, this is less of a problem because the data is much denser, however, a threshold can be important for sparse datasets (see Supplementary Fig. S5). See also Fig. 1d for an example of a threshold coming into play when an emitter only present in dataset 2 (red ringed square) has a predicted dataset 1 position that is far away from any actual dataset 1 localizations. For intra-dataset drift correction, an initial guess of zero for all coordinates is used for the optimizer; for inter-dataset drift correction, an initial guess based on the accumulation of the drift of those datasets that have been corrected in previous iterations is used (or zero if brightfield registration has been employed between datasets). The polynomial fits for drift correction are a function of frame (or dataset) number; in other words, time. By default, we use a linear fit intra-dataset and a constant fit inter-dataset, but these can be easily changed if desired. This procedure runs in both 2D and 3D. See the "Methods" section for additional details.

Results
The drift correction procedure consists of two phases: brightfield registration before the acquisition of each dataset, and a post-processing drift correction algorithm (driftCorrectKNN).
Application to 2D imaging. Figure 2 shows an example of the effects of not performing versus performing brightfield registration for alpha tubulin in HeLa cells using 2000 frame datasets. Only brightfield registration took place; the post-processing drift correction algorithm was not applied here. In general, brightfield registra- www.nature.com/scientificreports/ tion is not this dramatic, but for our day-to-day usage, frequently our post-processing drift correction algorithm has little work to do because the brightfield registration works so well. Figure 3 shows the results of applying our post-processing drift correction algorithm to two different 2D examples, one involving nanorods labeled via DNA-PAINT in which brightfield registration was not used, and the other consisting of actin microfilaments in HeLa cells labeled with fluorescent particles which did use registration. The first image in each sequence is the raw result from localization identification to which frame connection has been applied to consolidate emitters (see Supplementary Methods). The second image is the result of applying the drift correction algorithm. In each example, a small selected area of the entire region of interest (ROI) is also displayed. In the two examples, the drift correction algorithm considerably sharpens the observed details.
The estimated drift plots for the above examples are displayed in Supplementary Fig. S2. The computed drift for each dataset taken for a super-resolution image is plotted with a tapering line segment indicating increasing frame number (so increasing time), while all the frames in the entire image are color coded from blue to red, again indicating the direction of increasing time. The DNA nanorods example did not employ brightfield registration, while the actin example did, therefore, the inter-dataset fitting optimization for each dataset was initialized with either the computed drift correction determined for the last frame in the previous dataset or zero, respectively. These initializations mimicked the drift correction results. The plot for the DNA nanorods example  www.nature.com/scientificreports/ shows a connected series of drift corrections, while the actin example shows drift corrections initially scattered about (0, 0), then slowly increasing in x over time.
The improvements in average resolution can be quantified using a measure based on Fourier ring correlation (FRC) 36 . Supplementary Fig. S3 shows the results of applying Q-corrected FRC (which removes spurious correlations) to the two example datasets where the resolution is defined as the inverse of the spatial frequency when the FRC drops below the black line representing the 1 7 threshold. See "Methods" section for more details. Drift correction improved the resolution of the the DNA nanorods example (over the entire ROI) with respect to the uncorrected result from 97.2 to 28.7 nm, while for the actin example, the resolution improved from 72.5 to 60.9 nm.
In Supplementary Fig. S4, we plot cost function landscapes explored by the optimizer for the two 2D examples presented in Fig. 3. The cost functions for the default fits, linear with zero constant term intra-dataset or constant separation inter-dataset, depend on only one x and one y parameter, and so can be plotted as 3D surfaces as a function of these parameters. The two 2D examples are displayed in separate rows, exhibiting the initial intraand inter-dataset landscapes. In all cases, the cost function landscape is funnel shaped with a broad plateau at the top. For the intra-dataset figures, the tip of the funnel is sharp and very near (0, 0), while for the inter-dataset figures, the funneling behavior is more gradual and broad. These landscapes imply that the initial guess should not be too far from the coordinates of the funnel's bottom, but if it is anywhere close, the funnel will lead the optimizer to the global minimum very quickly.
Dependence on labeling density and blinking statistics. To better understand the behavior of our drift correction algorithm, we evaluated the performance of estimating artificial drift under varying numbers of particles per dataset. Supplementary Fig. S5 shows the results for two studies, one involving randomly generated emitters confined to a 2D star-shaped domain, and the second involving a uniform distribution of random emitters over the entire 2D ROI. The mean drift computed by the algorithm matches within 0.1 (0.05) nm/frame the actual lateral shift imposed upon the emitters in the two scenarios down to about 10 2 ( 10 3 ) emitters per dataset for star (uniformly) distributed data. The variability of the results, however, broadens dramatically for sparser and sparser datasets around these values. Supplementary Fig. S6 provides a set of typical examples for the 2D star shaped domain undergoing artificial drift and then being corrected for a subset of the particle numbers per dataset displayed in Supplementary Fig. S5. The last row in this figure shows the corresponding estimated drift plots for this data as described earlier.
The intra-dataset drift correction procedure depends on nearest neighbor pairings of "on" fluorophores that lie within a distance threshold. We performed a series of simulations for various emitter properties on single 1,000-frame datasets of uniform randomly distributed sets of 2D emitters over a fixed size region in which only intra-dataset drift correction occurs, applying a constant drift per frame. The first two graphs in Supplementary  Fig. S7 plot the root-mean-square error (RMSE) between the true and the estimated drift curves against the number of blinking event pairs in the dataset, N p N e , where < N p >= 2 /2 (see "Methods" section) is the expected number of pairs of blinking events per emitter, is the expected number of blinking events per emitter, and N e is the number of emitters in the dataset, all calculated from the simulated data. The first simulation series holds N e fixed with increasing, while the second holds fixed with N e increasing. As the number of pairs of blinking events increases in either situation to 10 2 -10 3 , the RMSE drops precipitously to about 1 nm with a standard deviation of about 0.8 nm. See also Supplemental Methods; RMSE analysis for additional details.
Plotting the theoretical product (see "Methods; Fluorophore pairings for intra-dataset drift correction" section) of < N p > N e over a range of and N e produces a three-dimensional surface ( Supplementary Fig. S7), which can also be displayed as a contour plot with contours of constant < N p > N e . This coupling of the plots in Supplementary Fig. S7 constrain the values of and N e for producing a desired RMSE under this particular intra-dataset model.
In Supplementary Fig. S7, the simulations considered a linear model for the drift and applied a linear polynomial fit for the correction estimation procedure. In Supplementary Fig. S8, the results for other studies are shown. A linear drift was approximated, using about 10 3 -10 4 pairs for 1 nm accuracy, by a quadratic fit (but was about 5× slower than the case for a linear fit), a quadratic drift was approximated by a quadratic fit also using approximately 10 3 -10 4 pairs for nanometer accuracy, and a quadratic drift was less well approximated by a linear fit, achieving asymptotic 3-5 nanometer accuracy starting at 10 2 -10 3 pairs.
A study of inter-dataset drift in which 2000 frame runs of simulated data were performed, broken up into two 1000 frame datasets, is shown in Supplementary Fig. S9. Roughly, 10× more emitter pairs were required to achieve ∼ 1 ± 0.5 nm accuracy compared to the above results. Note that for quadratic drift, the inter-dataset RMSE continued to steadily decrease unlike the pure intra-dataset situation in the previous figure. 3D simulations. A more complex example is provided in Fig. 4. Here, two 3D rings of diameter 40 nm containing localizations are separated by 80 nm. A simulated astigmatism point spread function (PSF) was used to generate the localizations. Smooth, slowly oscillating drifts are applied in the x, y and z directions for 50,000 frames, maximally varying approximately 200 nm over about 12,500 frames. The simulated data is broken up into 100 datasets of 500 frames apiece. Applying the post-processing drift correction algorithm to the drifted data reproduces the rings with some slight scatter in the z-direction (see Fig. 4a-c). The computed drift curves almost perfectly overlap the simulated drift. Figure 4j, k show, respectively, the differences between the computed and simulated drift curves, and the overlapped y-drift curve. The maximum difference in x, y or z is 7 nm, but the mean differences are all close to 1 ± 0.5 nm. The greatest differences tend to occur around steep changes in the derivative of the drift curve. A second study in which 10 ring pairs were centered at different x, y, z-locations was performed using the same drift curves. The results are nearly identical to those for the single ring pair www.nature.com/scientificreports/ (see Fig. 4d-f, l), even for a zoomed-in set (Fig. 4g, i). The estimated drift plot, Fig. 4h, is comparable in shape to the drifted dataset from the first study, Fig. 4b.
The results with the above 3D localizations were quite good. However, we decided to perform a pair of studies with noisier simulated data, in which the localizations were fit and thresholded (see "Methods" section) after applying drift to them in order to mimic more realistic results. The first set of localizations were generated as a pair of 80 nm separated rings using a simulated astigmatism PSF and the same drift curves as before. The results are shown in Supplementary Fig. S10a, b, c, g, h. The data is much more scattered, but ring separation is still clear. However, if we used an experimental PSF for simulation and fitting ("Methods" section), 80 nm separation was not sufficient to distinguish the rings, so the results shown in Supplementary Fig. S10d, e, f, i are for a pair of rings separated by 120 nm. The differences in computed versus simulated drift curves are also larger for the www.nature.com/scientificreports/ imperfect data, especially when using the experimental PSF ( Supplementary Fig. S10g, i). The maximum drift curve differences for simulated astigmatism PSF (experimental PSF) are x, y: 27 (50) nm and z: 32 (85) nm, while the mean differences are x, y: 2 ± 2 ( 9 ± 6 ) nm and z: 9 ± 5 ( 27 ± 11 ) nm.

Breaking up datasets.
Finally, sometimes it is beneficial to reorganize the acquired datasets when the initial dataset size chosen was too large. Supplementary Fig. S11 shows an example of ATTO655 20 nm nanorulers in which the post-processing drift correction procedure was applied with varying dataset organizations. Here, 15,000 frames in a single dataset provided only one opportunity for the intra-dataset and no opportunities for the inter-dataset portions of the drift correction algorithm to be applied, so the results were very smeared out like the raw super-resolution image, while 100 frames per dataset provided too few localizations per dataset for clean sum images to be used by the algorithm. 1000 frames per dataset produced the sharpest results. We find that ∼ 2000 frames/dataset for dSTORM data and ∼ 1000 frames/dataset for DNA-PAINT data are good choices, but the drift correction methodology is robust enough to allow some leeway.

Comparison with Redundant Cross Correlation.
We performed a one-on-one comparison (after data acquisition) of the MATLAB implementation of Redundant Cross Correlation (RCC) 30 Supplementary Table S1), choosing the best outcome (see "Methods" section) individually for driftCorrectKNN and RCC in each simulation. (We also ran studies on rmax , but did not find noticeable improvements over the 0.2 pixel value which we typically used.) The two algorithms were nearly comparable on the star domain, while driftCorrectKNN was noticeably better on the uniformly random square regions, preserving the character of the true images more accurately-see Supplementary Figs. S14-S17. In particular, the histograms of the computed minus true x, y-positions show for the square regions that RCC, in general, made poorer predictions than driftCorrectKNN for these examples.

Discussion
To correct super-resolution drift, we propose a combination of brightfield registration performed after each dataset acquired, maintaining the sample plane z-focus and the sample in the x, y field of view, and a post-processing algorithm (driftCorrectKNN) based on nearest neighbor distances, combining intra-dataset and inter-dataset processing. No extra hardware is required. This methodology can produce a significant improvement in sharpness as indicated by FRC results for real data, while theoretical results and simulations show under what fluorophore conditions nanometer accuracy can be achieved in 2D. In 3D simulations, driftCorrectKNN produced lateral (axial) accuracies of 2-9 nm (order 25 nm) in a noisy environment. On comparing data examples from this article, driftCorrectKNN produced results as good as or better than RCC, a commonly used algorithm, especially for non-structured data where the error spread was much narrower. The two algorithms did perform comparably on the DNA-PAINT nanorods (but with better sharpness from driftCorrectKNN) as well as with the high density star domain and the actin microfilaments which possess overall long range structures, making it easier for cross-correlation techniques to latch onto. RCC did not do as well on the uniformly random square domains except for the sparsest example where was high, so an easier test than the denser squares where the number of blinking events per emitter was low. A disadvantage of RCC (and cross-correlation algorithms in general) is the necessity of properly choosing the spatial bin size, although as noted in Wang et al. 30 , guidance can be found in the literature. We note that sometimes RCC was unable to easily converge on a solution and ran very slowly, notably on the high density actin microfilament data where it was slower than driftCorrectKNN by a factor of 4. In summary, driftCorrectKNN performs robustly over a range of data densities and dataset segmentations, while requiring sometimes substantially less time and certainly less effort by the user in determining needed parameters.
The procedure we describe works in both 2D and 3D, the brightfield registration severely curtailing the interdataset drift and the post-processing algorithm then correcting for whatever drift remains. Drift is assumed to follow a continuous motion model within a dataset, whereas the inter-dataset analysis handles the discontinuities generated by brightfield registration. Supplementary Fig. S1 provides a flow chart and some comments on our recommended procedure.
Our post-processing drift-correction algorithm works for images produced by common super-resolution methods such as dSTORM and DNA-PAINT in which fluorescent labels blink or dye-labeled DNA strands bind/unbind repeatedly. In this situation, the nearest neighbors provide a cheap and fast way to produce image feature pairings that remain roughly stable over a number of frames. Note that a continuous underlying structure is not required, as the algorithm also works for isolated and randomly distributed fluorophores. However, this www.nature.com/scientificreports/ idea will not work with single activated probes such as in photo-activated localization microscopy (PALM) 11 . Minimizing the thresholded sum of the nearest neighbor distances keeps the localizations, and thus the emitters, cohesively moving together over time. A bad model would let these distances spread out and the emitters drift apart contrary to the assumptions of the model. The intra-dataset algorithm estimates drift as a function of frame number, while the inter-dataset algorithm computes lateral shifts between datasets. Increasing the number of blinking events in the intra-dataset algorithm provides more true pairs to constrain the drift model.
The intra-dataset threshold on nearest neighbor distances minimizes the effect of false fluorophore pairing. A threshold is less important for inter-dataset fitting as the sum of all the emitters in entire datasets rather than the sparse representation present in individual frames are considered when computing nearest neighbor distances, however, it can be important for sparse datasets. Employing thresholds is in contrast to cross-correlation techniques where contributions from different emitters are not restricted, so that image overlap at a pixel may be due to a combination of near and far away emitters. The nearest neighbor thresholds restrict contributions to local entities. Note that Eq. 1 is reminiscent of the MCF function 5 , but with a simpler structure and thresholding present, so significantly faster computationally. The threshold paradigm allows up to nanometer accuracy (for the intra-dataset portion of the algorithm) under practical conditions.

Methods
Sample preparation. HeLa cells (ATCC) were cultured as described in detail previously 37 . All labeling and washing steps were carried out at room temperature. Cells were seeded onto 25 mm glass (#1.5) coverslips in 6 well chambers (LabTek) to adhere for 24 h. For alpha tubulin, we used PBS as a buffer, while a PEM buffer (80 mM PIPES + 5 mM EGTA + 2 mM MgCl 2 , at pH 7.2) was used for actin filaments. The first fixation step was in 0.6% paraformaldehyde + 0.1% glutaraldehyde + 0.25% Triton diluted in buffer for 60 s, followed by a hard fixative buffer including 4% paraformaldehyde + 0.2% glutaraldehyde diluted in buffer for an hour. Cells were washed 2x in PBS and kept in NaBH 4 for 10 min to quench the autofluorescence within the cells resulting from glutaraldehyde in the fixation buffer, followed by a 2x wash with PBS. To quench reactive crosslinkers, the samples were kept in 10 mM Tris for 10 min, followed by 2 washes with PBS. Finally, samples were blocked in 5% BSA + 0.05% Triton X-100 for 15 min. At the end, samples were washed 1x with PBS.
For alpha tubulin, fixed cells were immunolabeled using Alexa Fluor 647 conjugated primary antibody (Novus Biologicals, CO) at 10 µg/mL for 1hr, followed by washes using PBS. For actin filaments in HeLa cells, the sample was imaged while it was labeled by Lifeact + Atto 655 with 0.1% BSA to minimize non-specific intracellular binding of the Lifeact + Atto 655. The labeling and imaging buffer was 3 nM of Lifeact + Atto 655 in 50 mM Tris, 10 mM NaCl, 10% (w/v) Glucose (TNG) at pH 8. GATTA-PAINT nanoruler slide sample (40R, GATTAquant DNA Technologies) was used as purchased.
Imaging. For alpha tubulin in HeLa cells and the DNA-PAINT nanorods sample, imaging was done on a custom-built inverted wide field fluorescence microscope setup as described previously 37 . Fluorescence excitation of the sample was done using 642 nm laser diode (Thorlabs, HL6366DG). The laser beam was collimated and passed through a single mode fiber (Thorlabs, P1-488PM-FC-2) before being focused on the back focal plane of a 1.45 NA oil objective (UAPON 150XOTIRF, Olympus America Inc.). TIRF excitation of the sample was achieved by translating the laser close to the edge of the objective back aperture. Fluorescence emission collected from the sample was passed through a quad band dichroic/emission filter set (Semrock, LF405/488/561/635-A) and a bandpass filter (Semrock, FF01-446/523/600/677-25) before being detected using an electron-multiplying charge-coupled device (EM CCD) camera (Andor Technologies, iXon 897). Data collection and instrument control on the microscopes here and below was controlled by custom-written MATLAB software 38 .
For dSTORM imaging of alpha tubulin in HeLa cells, an xyz piezo stage (Mad City Labs, Nano-LPS100) mounted on an x-y manual stage was installed on the microscope for cell locating and brightfield registration. To mount the prepared samples on 25 mm coverslips, an Attofluor cell chamber (Life Technologies, A-7816) was used and a clean 25 mm coverslip was used to seal the samples. A trans-illumination halogen lamp equipped with the microscope was used for collecting the brightfield images. The samples were stored at 4 • C and then transferred to the microscope just before starting imaging at room temperature. This natural heating up of the sample to room temperature was used to create the sample drift, which was observed in the images. A total of 50,000 256 × 256 pixel frames were collected using 100 ms exposure time for the GATTA-PAINT nanoruler super-resolution imaging. For alpha tubulin in HeLa cells, 25,000 frames ( 256 × 256 pixels) were recorded in each imaging experiment using 10 ms camera exposure time.
For actin filaments in HeLa cells, the imaging system was built on an inverted microscope (Olympus). An xyz piezo stage (Mad City Labs, Nano-LPS100) mounted on a x-y manual stage was installed on the microscope for cell locating and brightfield registration. A mounted LED with wavelength 850 nm (M850L3, Thorlabs) was used for brightfield illumination. Brightfield images were collected on a complementary metal-oxide semiconductor (CMOS) camera (Thorlabs, DCC1545M) after reflection by a short-pass dichroic beam splitter (Semrock, FF750-SDi02) and passing through a single-band bandpass filter (Semrock, FF01-835/70-25). A 638 nm laser was used (collimated from a laser diode, Thorlabs, L638P200) coupled into a single mode fiber and focused onto the back focal plane of the 1.49 NA objective lens (Olympus UAPON 100XOTIRF). Emission for super-resolution data was collected through a short-pass dichroic beam splitter (Semrock, FF750-SDi02) and a single-band bandpass filter (Semrock, FF624-Di01) on an iXon 860 EM CCD camera (Andor Technologies, iXon DU-860E-CS0#BV).
Imaging was performed in TNG buffer mixed with Lifeact conjugated with dye. The blinking events for single molecule localization were due to the binding/unbinding of Lifeact to the actin filaments 39,40 . We used the same 25 mm coverslip chamber as above to mount the samples on the microscope. Imaging was performed www.nature.com/scientificreports/ with TIRF illumination to reduce the background noise due to diffusing dyes in the imaging buffer, and images were acquired at 20 ms exposure time. Brightfield registration was performed to correct for drift after every 3000 frames as mentioned in "Methods; Brightfield registration" section.
Super-resolution fitting. We used a difference of Gaussians filter to reduce noise and enhance the signal from single emitters to enable identification of local maxima. Pixel coordinates with local intensity maxima were selected and used as centers of fitting regions of size 16 × 16 pixels. A given numerical PSF was used in the GPU 3D fitting algorithm to localize emitters. The algorithm fit the single emitters using maximum likelihood estimation (MLE) and assuming a Poisson noise model 41,42 . The algorithm finds the MLE employing the Newton-Raphson approach to iteratively update parameters, including x, y, z-locations, intensity and background. The resulting localizations were filtered by thresholding the intensity, background, p-value and Cramer-Rao lower bound of the estimations.
Brightfield registration. A camera, a stage, and a lamp are the hardware required to perform brightfield registration. At the beginning of the experiment, a brightfield reference z-stack is produced after turning the lamp on (the lamp is normally turned off during the collection of movie frames to not interfere with the fluorescent particle emissions). A dataset is then collected, after which a brightfield image z-stack is produced to be compared with the reference z-stack, again turning the lamp on during its production. The z-stack in our setup consists of 2D images taken at an odd number (typically 21) of different z positions reachable by the stage. Aligning the current z-stack with the reference z-stack, an (x, y, z) drift is computed for it, which will then be corrected before the next super-resolution dataset is imaged. The process then repeats until the found shifts in x, y and z are all less than a selected tolerance (typically chosen to be between 5 and 50 nm, depending on the minimum step size of the stage). The brightfield registration procedure typically takes < 30 s per super-resolution dataset collected.
The software used to perform brightfield registration is included in the Supplementary Software.
Post-processing drift correction (driftCorrectKNN). driftCorrectKNN operates by taking (x, y) or (x, y, z) coordinates from a sequential series of image frames of super-resolution observations collected together in datasets and computes the drift correction over the frames. The intra-dataset drift correction is computed by optimizing a polynomial fit as a function of frame number (so really time) using Eq. 1 with l replaced by l intra = 1 pixel. In practice, a linear polynomial fit was sufficient to get an accuracy of 0.0001 pixel/frame ( ∼ 0.01 nm/frame) for the drift rate. Finding more than one nearest neighbor did not noticeably improve the results.
The inter-dataset drift correction is computed by optimizing a constant fit to the positions of all localizations in each dataset relative to corresponding drift-corrected positions computed for the initial dataset in the intra-dataset analysis. Eq. 1 with l replaced by l inter = 2 pixels applies once again in computing the sum. This is done rather than considering the nearest neighbor distances between the predicted positions in the current dataset (derived from the initial dataset and the modeled drifts) and the actual positions. The two points of view are very similar, however, the former requires only one k-d decomposition (of the initial dataset), while the latter viewpoint requires a k-d decomposition of every dataset but the first, so is much more computationally demanding. If no other phenomena but drift is happening and the drift model is accurate, this nearest neighbor sum should be zero.
The values chosen for l intra (1 pixel) and l inter (2 pixels) were the result of a parameter study involving drift correction on the images and simulations presented in this paper. We found it necessary to make l inter > l intra , and the values chosen seemed to give reasonable overall performance.
The choice of using zero or the ending drift corrected value of the previous dataset, as derived from the intradataset and previous inter-dataset drift corrections, to initialize the inter-dataset optimization depends on how the data was collected. If brightfield registration was performed, then zero (correct for the first dataset, and the sum of accumulated drifts, so theoretically zero, in subsequent datasets) is appropriate. If the datasets were not collected using registration, then the ending frame of the previous dataset adjusted for accumulated drift is the better choice for initializing the inter-dataset optimization. The intra/inter-dataset optimizations, which use the thresholded/simple nearest neighbor sums above as cost functions, search for the global minimum of these costs using the MATLAB function fminsearch.
If each dataset in an experiment contains a large number of frames in which drift changes are significant and the datasets have not been registered, breaking up the datasets into smaller chunks can sometimes help to produce better results. The user can specify the total number of datasets or number of frames per dataset desired while calling driftCorrectKNN and the code will internally reorganize the datasets as specified, perform the drift correction calculation, and then reassemble the results back into the original dataset scheme at the end (see, for example, Supplementary Fig. S11).
driftCorrectKNN is implemented in MATLAB (see Supplemental Methods; Post-processing drift correction algorithm for a MATLAB pseudocode description) and is included in the Supplementary Software.
Fluorophore pairings for intra-dataset drift correction. The intra-dataset drift correction algorithm computes the sum of nearest neighbor distances (within a threshold) produced by the pairings of nearby "on" fluorophores over an entire dataset. The sum of these distances is the cost function, which when minimized, produces an estimate of the drift as a function of the frame number. To understand this pairing phenomenon in more detail, consider for a dataset how one might compute the expected number of pairs of blinking events per emitter, < N p > , given the number of expected blinking events per emitter, , which, for example, can be www.nature.com/scientificreports/ directly computed in simulations. We examined simulations exhibiting a uniform distribution (see "Methods; Simulations"), where single datasets of 1000 frames were used, so that only intra-dataset drift correction was performed. The number of blinking events post-frame connection was computed for the various emitter densities averaged over N = 100 runs. Let P(k; ) be the probability of exactly k blinking events ( k ≥ 0 ) per emitter occurring given an expected value of blinking events per emitter. This quantity can be represented by the Poisson distribution P(k; ) = k e − /k! . The number of blinking event pairs per emitter, N p (k) , given k blinking events, is simply the number of ways of choosing two blinking events from a total of k available, N p (k) = k 2 . Therefore, the total number of pairs of blinking events per emitter expected in the dataset is Fourier ring correlation (FRC). FRC 36 is a measure of the average resolution over a super-resolution image. It works by dividing the set of single-emitter localizations in the super-resolution image into two statistically independent subsets. The Fourier transforms of subimages generated from each of these subsets are then statistically correlated over pixels on the perimeters of circles of constant spatial frequency. The image resolution is defined as the inverse of the spatial frequency when the FRC curve drops below a threshold, taken to be 1 7 in Nieuwenhuizen et al. 36 . See Supplementary Fig. S3. Spurious correlations (for example, due to repeated photoactivation of the same emitter) are removed by estimating the number of times an emitter is localized on average (Q) assuming Poisson statistics. All analyses were accomplished using the software developed by Nieuwenhuizen et al. found at http:// www. diplib. org/ add-ons.
Simulations. 2D simulations of simple artificial drift were performed for (noisy) emitters randomly distributed in a star-shaped domain and over the entire 6400 × 6400 nm 2 ROI. Localizations around the true emitter locations were scattered by up to ±10 nm using a normal distribution. Seven different initial fluorophore densities were specified ([10, 5, 2, 1, 0.5, 0.25, 0.125] · 10 −4 fluorophore/nm 2 ) from which the number of emitters per dataset were computed. A constant drift of x = 0.3 and y = 0.4 nm/frame was imposed upon each emitter, after which driftCorrectKNN was applied to the drifted results. N = 100 simulations per condition were performed and the mean resulting drift calculated was compared to the true drift. The standard deviations of the results were also computed.
To produce the RMSE versus fluorophore pairings plot ( Supplementary Fig. S7a), we performed simulations in which emitters were randomly distributed uniformly over a 25, 600 × 25, 600 nm 2 ROI using single 1000 frame datasets, allowing for only intra-dataset drift correction. The following two series were run for linear and quadratic drifts. The first varied , the number (via a Poisson distribution) of expected blinking events per emitter per dataset over the range 0.01-10 while fixing the number of emitters, N e , to 10 5 . The second series varied N e over 10 3 -10 6 while setting = 0.2 . True cumulative drifts were defined by αf + βf 2 , where f is the frame number. Linear drifts were α x = 0.05 nm/frame and α y = 0.02 nm/frame, while quadratic drifts had the above linear terms as well as quadratic terms β x = 10 nm/frame 2 , β y = −20 nm/frame 2 . driftCorrectKNN was run with the appropriate intra-dataset polynomial order. The results were averaged over N = 100 runs.
A second series of simulations involved constructing more realistic artificial drift curves, and in general mimicking real, 3D experimental conditions. Simulated drift curves for x, y and z, designed to resemble realistic drift curves, were produced by generating a set of approximately 35 nodes over 50,000 frames (the exact number of nodes selected from a Poisson distribution), each node representing a point on a graph of drift component value (in units of nm) versus frame number. The intervals between the nodes randomly varied in width with respect to an average of N frames /(N nodes + 1) . The drift curves were fit by passing a cubic spline through the nodes in which the component values ranged approximately between − 100 and + 100 nm (− 1 to + 1 pixel). See Fig. 4k for an example of one such drift curve, the y-component.
In the first simulations, two 40 nm diameter rings separated in the z-direction by 80 nm were produced to mimic a DNA-PAINT labeled nanorod in which the center and the two ends were visible under super-resolution. The 2D sample structure, a ring in this case, was given as a binary image template, where the nonzero portion was filled with emitters of a uniform density ρ = 1000 fluorophores/pixel unit of length. A trace of blinking events for each emitter was then generated using the input duty cycle parameters, K on = 0.0005/frame and K off = 1/frame, which are the rate of emitters turning from off to on and from on to off. The random durations for the emitters to turn on and the on-durations were taken from exponential distributions with mean values of, respectively, K on and K off . The density of the on-emitters was then given by The simulated structures were produced at specific z-positions, which allowed separated pairs of rings to be made. A second set of simulations involved 10 pairs of 80 nm separated rings randomly distributed in (x, y) around the simulation region. A theoretical astigmatism point spread function (PSF) was used for the 80 nm separated sets, while a spiral experimental PSF was given as input for a third set of simulations in which a single pair of rings separated by 120 nm was studied. Localizations in the third set of simulations were further fit and thresholded (see "Methods; Super-resolution fitting" section for additional details). For all simulations, an interpolated PSF ρ on ≈ ρ K on K on + K off . www.nature.com/scientificreports/ was generated according to the z-position of the emitter and placed in (x, y, t) in the absence of drift. Drift was added as discussed above. If an emitter was on for the entire exposure time of a frame, it was taken to be at maximum intensity (12,000 photons), otherwise dimmer based on the fraction of the frame in which it was on. The data was produced with a fixed offset background (1000 photons) corrupted with Poisson noise.

Redundant Cross Correlation comparison. We downloaded the Redundant Cross Correlation (RCC)
MATLAB implementation as specified in Wang et al. 30 and applied it to the data presented in this article, in particular, DNA-PAINT nanorods, actin microfilaments (both shown in Fig. 3, as well as the 2D data supplied with the Supplementary Software for a star-shaped domain undergoing constant drift (see the leftmost column in Supplementary Fig. S6). Drift was applied to true emitters, then localizations were produced with a 10 nm normal distribution around each emitter. We also examined three different densities (383, 36, 4.5 emitters/µm 2 ) of square simulated uniformly random domains. The star and the two denser squares had low blinking events per emitter ( ∼ 1.28 ), while the sparsest square was given = 2.78 . Since the actin microfilament example originally was produced with brightfield registration, we undid the loss of smoothness in the drift curve by adding in the computed driftCorrectKNN inter-dataset corrections along with the intra-dataset correction from the last frame of the previous dataset to the raw SR data. For RCC, we used the number of frames in a dataset that was given to driftCorrectKNN as the segmentation parameter, rmax to 0.2 times the camera pixel size as was done in 30 , and set the spatial bin size to 2/1 nm for the DNA nanorods/actin SR images after making a parameter study on this quantity via visual inspection of the drift corrected results. For the simulated domains, we split the total number of frames into 20, 10, 8, 5, 4 and 2 datasets for both driftCorrectKNN and RCC. For RCC, we also varied the spatial bin size as 2, 4, 8, 16, 32, 48 and 64 nm. For either algorithm, we chose the best results correcting the imposed drift in the four simulations by computing found localization minus true localization x, y-positions (where they would be at time zero), and selecting the histograms of these quantities that had the smallest spatial spread. We also experimented with rmax , but at least on our data, there were only small differences until rmax exceeded 10 pixels where the results noticeably worsened when examining the corrected images in the starshaped domain. See also Supplementary Table S1.

Data availability statement
The DNA-PAINT 80 nm nanorods example is included in the Supplementary Software. All datasets analyzed during the current study are available from the corresponding author on reasonable request.