Pointwise error estimates in localization microscopy

Pointwise localization of individual fluorophores is a critical step in super-resolution localization microscopy and single particle tracking. Although the methods are limited by the localization errors of individual fluorophores, the pointwise localization precision has so far been estimated using theoretical best case approximations that disregard, for example, motion blur, defocus effects and variations in fluorescence intensity. Here, we show that pointwise localization precision can be accurately estimated directly from imaging data using the Bayesian posterior density constrained by simple microscope properties. We further demonstrate that the estimated localization precision can be used to improve downstream quantitative analysis, such as estimation of diffusion constants and detection of changes in molecular motion patterns. Finally, the quality of actual point localizations in live cell super-resolution microscopy can be improved beyond the information theoretic lower bound for localization errors in individual images, by modelling the movement of fluorophores and accounting for their pointwise localization uncertainty.

Super-resolution fluorescence microscopy and live cell single particle tracking rely on computationally intensive data analysis to find and localize single fluorescent emitters in noisy images. Much effort has been spent on developing and testing efficient spot localization algorithms [1] and understanding the theoretical limits for localization accuracy [2][3][4][5]. However, the problem of estimating and using the actual accuracy is still unsolved.
PALM/STORM type super-resolution imaging [6,7] relies on the serial activation and localization of sparse photo-switchable fluorophores. Knowledge of the localization accuracy is important for building up a high-resolution image, since uncertain points only contribute blur. Often, only the number of photons, the pixel size, and the background noise for each emitter are used to estimate the localization accuracy, assuming that it achieves its theoretical limit. However, theoretical estimates neglect many important factors, and are prone to systematic errors, in particular when the background is variable and the emitter is moving, which is the common situation for live cell super-resolution imaging.
[Fig. 1: Representative frames from a simulated movie with b = 1, N = 150. Images were simulated using SMeagol [8].]
* johan.elf@icm.uu.se
Knowledge of the localization uncertainty is also important in single particle tracking (SPT) [9], where it can be used to improve estimators of diffusion constants [10,11]. It is common in live cell imaging that the localization uncertainty varies throughout an experiment, for example due to out-of-focus motion, drift, motion blur, fluorophore intensity fluctuations, heterogeneous background, or gradual photobleaching of the background or labeled molecule. Examples of heterogeneous and time-varying spot quality are shown in Fig. 1.
Here, we investigate methods to extract and use localization uncertainty of single dots in super-resolved single particle tracking, using a combination of experimental data and highly realistic simulated microscopy experiments [8]. We propose and characterize an uncertainty estimator based on the Laplace approximation, combined with information about physical limitations in the detection system. This method outperforms the common practice of combining maximum-likelihood localization with a Gaussian point-spread function (PSF) model, and the Cramér-Rao lower bound (CRLB), which systematically underestimates the uncertainty in low light conditions relevant for live cell applications.
Second, we demonstrate how estimated localization uncertainties can be used to improve the estimation of diffusion constants, particle positions, and state changes in single- and multi-state diffusion SPT data. For the multi-state case, we derive a variational Expectation Maximization (EM) algorithm for a diffusive hidden Markov model (HMM) which extends previously described algorithms [10][11][12][13][14][15][16] by accounting for multi-state diffusion, localization uncertainty, and motion blur.

Estimating point-wise uncertainty
Estimating uncertainty is closely related to estimating positions, where the maximum likelihood estimate (MLE) is generally considered the optimal method. A maximum likelihood method starts with a likelihood function, i.e., the probability density function of a probabilistic model for generating images of spots (pixel counts in a small region around a spot) with the emitter position among the adjustable parameters. The MLE are the parameters that maximize the likelihood function for a particular spot image. Following common practice, we model EMCCD camera noise with the high-gain approximation and Gaussian readout noise [4,17], and the spot shape by a symmetric Gaussian intensity profile plus a constant background intensity. The fit parameters are thus spot position (µ x , µ y ), background intensity b, PSF width σ, and spot amplitude N (see Methods, Eq. (1)), while the camera noise is assumed known from calibration.
What is localization uncertainty? The localization error is the difference µ est. − µ true between estimated and true positions. By localization uncertainty, we seek the distribution of the error, either in a Bayesian posterior sense, or in the sense of repeated localizations of spots with the same uncertainty. The uncertainty of the localization is related to the shape of the likelihood maximum: a sharply peaked maximum means that only a narrow set of parameters are likely, while a shallow maximum means greater uncertainty.
The Cramér-Rao lower bound (CRLB) is the smallest possible variance of an unbiased estimator for a given set of model parameters, and is related to the expected sharpness of the likelihood maximum (see Methods, Eq. (4)). While this is strictly speaking not a statement about a single image, but rather about the average information content of data generated by a model, it is often used to estimate localization uncertainty. A Bayesian alternative is the Laplace approximation [18], which derives an approximate Gaussian posterior distribution for the fit parameters in terms of the sharpness of the likelihood maximum for each particular image (see Methods, Eq. (5)). Both estimators supply an estimated variance ε², but neither is well characterized as an estimator of localization uncertainty.
To test these estimators, we analyzed a large set of simulated movies of a fluorescent particle diffusing at D = 1 µm² s⁻¹ in an E. coli-like geometry. The movies cover a broad range of experimentally relevant imaging conditions (see Methods) and include realistic EMCCD noise, background fluorescence, a non-Gaussian and space-dependent PSF [19], and motion blur [20].
A basic consistency check is that the average estimated error variance, ⟨ε²⟩, agrees with the variance of the actual errors, ⟨(µ_est. − µ_true)²⟩. In Fig. 2a, we compare the square roots of these quantities for different imaging conditions, based on MLE localizations combined with either the CRLB or Laplace uncertainty estimators. We obtain consistency under good imaging conditions, where the spots are bright and the average errors low. However, as conditions worsen and the errors increase, the uncertainty is underestimated, especially by the CRLB.
There are several possible reasons for this behavior. The Laplace approximation is based on a truncated Taylor expansion of the log likelihood (see Methods). The CRLB is strictly not a point-wise error estimator at all, and is further defined in terms of the true parameter values, although by necessity evaluated with the fitted ones. The simplified Gaussian PSF model performs well for localization [4], but this does not guarantee good uncertainty estimates. Any of these approximations might fail in noisy or low light conditions. Another possibility is that localizations under poor imaging conditions are corrupted by a sub-population of fits that converge to a local minimum that does not reflect the underlying spot shape, e.g., fitting a single bright pixel as a very narrow spot, or misinterpreting the PSF shoulders as background or vice versa.
To address the latter complication, we constrained the localizations using prior distributions on selected parameters, thus replacing MLE with maximum a posteriori (MAP) estimation. Fixing parameter values is not suitable here, since both the size and shape of spots fluctuate due to fluorophore motion and varying imaging conditions. Furthermore, it is experimentally easier to obtain independent information about the background and PSF width than about the spot intensity. We therefore limit our attention to the background and PSF width, and found the following priors to perform well: a log-normal prior centered on the true value for the background intensity, and a skewed log-normal [21] prior for the PSF width to penalize fits with unphysical widths below that of an in-focus spot.
The PSF width prior together with the distribution of fitted PSF widths are shown in Fig. 2b. PSF widths below 1 pixel (80 nm) are virtually eliminated in the MAP fits. Background and spot amplitudes (see Fig. S1) are shifted somewhat downwards and upwards, respectively. As seen in Fig. 2c, this substantially improves the agreement between true and estimated errors under all imaging conditions. The Laplace estimator still outperforms the CRLB by a small margin, and numerical experiments on a wider range of priors (see SI Fig. S2) further confirm that the Laplace estimator is more robust to non-optimal priors than the CRLB.
How does the prior help? A direct comparison of the true errors for the MLE and MAP fits (SI Fig. S3) reveals very small differences, indicating that the improvement is mainly due to improved uncertainty estimates. This is consistent with the theoretical observation that the Fisher information matrix for the localization problem is nearly block-diagonal [4], with very weak coupling between two groups of fitting parameters: positions (µ_x, µ_y) and shape parameters (N, b, σ). Thus, additional information about one of these groups does not significantly reduce the errors of the other. For uncertainty estimation, this is not the case: information about the background and PSF shape does help in estimating the position uncertainty.
While mean-square errors are useful, we are ultimately interested in the full distribution of errors. In particular, most [10][11][12][13][14][15][16][22] (but not all [23]) statistical models of SPT data assume Gaussian errors, but this assumption has not been tested. The Laplace approximation (Eq. (5)) is a Gaussian approximation. If it were exact, the errors normalized by the estimated standard deviation would be Gaussian with unit variance, and produce a straight line in the Gaussian probability plot of Fig. 2d. The actual normalized errors based on MAP localizations are more Gaussian than the MLE ones, and follow the straight line for almost four standard deviations. This shows that the improved consistency of the MAP estimates also translates into more Gaussian error distributions.
Overall, these results show that point-wise uncertainty estimation using the Laplace approximation works well in a wide range of experimentally relevant conditions, but that experimentally accessible additional information about background and PSF shape is necessary to get consistent results in low light conditions. Moreover, we see good support for the common assumption of Gaussian errors.

Validation on real data
To test the above conclusions on real data, we imaged immobilized fluorescent beads, alternating strong and weak excitation as shown in Fig. 3a. We used images under strong excitation conditions to extract a high accuracy ground truth for testing the uncertainty estimates in the dim images. We estimated the position and uncertainty of spots using the MAP estimation described above, with a background prior centered around the mean background (0.8 photons/pixel) seen in dim frames. A drift-corrected ground truth was estimated by cubic spline interpolation between the mean positions obtained from each block of 10 consecutive bright images. Since the intensity differs by about a factor of 10 between bright and dim frames, the RMS errors of the ground truth should be approximately 10-fold lower than that of a single dim spot. Fig. 3c shows the resulting error-uncertainty comparison, with every point corresponding to a single bead. It reproduces the behavior on simulated images in Fig. 2b, thus confirming our conclusion that the Laplace approximation is preferable to the CRLB as an uncertainty estimator, and that the good performance of our proposed PSF width prior is not limited to that particular simulated data set.

Diffusion constants
Next, we consider how estimated localization uncertainties lead to better estimates of diffusion constants, arguably the most common analysis of single particle tracking data. We divided the synthetic data shown in Fig. 1 into trajectories of length 10, estimated positions and uncertainties with the MAP and Laplace estimators described above, and estimated diffusion constants using the covariance-based estimators of Ref. [10] with and without the use of uncertainty estimates. Fig. 4 shows the mean value and 1% quantiles of the two estimators applied to every imaging condition, plotted against the signal-to-noise ratio. As expected, the use of estimated uncertainties substantially reduces the variability of the diffusion estimates. The covariance-based estimators only use the average uncertainty in each trajectory (see Methods). We also implemented a maximum likelihood estimator for the diffusion constant [11] which makes explicit use of the point-wise uncertainties (see SI text S3), but found no further improvement.
[Fig. 4: estimated diffusion constants vs. signal-to-noise ratio (D∆t/⟨ε²⟩)^(1/2) [10] in simulated 10-step trajectories; mean value and 1% quantiles.]

Analysis of multi-state data
We now turn to a more challenging problem where point-wise errors do matter: data where both the diffusion constant and the localization error change significantly on similar time scales. In single particle tracking, changes in the diffusion constant can be used as a non-invasive reporter on intracellular binding and unbinding events [24]. However, diffusive motion and localization errors contribute additively to the observed step length statistics (see Eq. (7)), and thus changes in diffusion constants and localization errors cannot be reliably distinguished.
As an example, we consider a protein that alternates between free diffusion (D = 1 µm² s⁻¹) and a bound state simulated by slow diffusion (D = 0.1 µm² s⁻¹). We focus on a single trajectory with 4 binding/unbinding events, two of which occur about 400 nm out of focus, and thus are accompanied by substantial broadening of the PSF and corresponding increases in localization errors. This defocus roughly matches the radius of an E. coli cell, and the scenario could model tracking experiments with cytoplasmic proteins that can bind to the inner cell membrane.
Using SMeagol [8], we simulated 12 000 replicas of the above set of events, at a camera frame rate of 200 Hz, continuous illumination, and 300 photons/spot on average. Fig. 5a shows the z coordinates in the input trajectory, and the frame-wise RMS errors produced by the MAP localization algorithm described above. The input points are sparse (tens of ms apart), which allows SMeagol to produce simulated movies that contain the same binding events and defocus trends, but vary in the detailed diffusion trajectory as well as in noise realizations. Examples of simulated spots along a trajectory are shown in Fig. 5b.
To analyze this challenging data set, we extended the maximum likelihood diffusion constant estimator with explicit point-wise errors to include multiple diffusion states governed by a hidden Markov model (HMM), for which we derived a variational EM algorithm (see SI text S4). We then analyzed each simulated trajectory with three different 2-state HMMs: (i) vbSPT, which models the observed positions as pure diffusion and neglects blur and localization errors [24], (ii) the above-mentioned HMM that explicitly models these effects, and (iii) the Kalman filter limit of the HMM in (ii), which models localization errors but not blur effects. Since multi-state Kalman-type algorithms have been studied previously [13,15,16], it is interesting to compare models with and without blur. Fig. 5c shows the inferred average state from the three different models. As expected, the HMMs that include localization errors outperform vbSPT at detecting the strongly defocused first and third binding events. The full HMM does not give the best classification of the two short binding events, but it does give the lowest overall misclassification rate, 9% versus 9.6% and 17% for the Kalman and vbSPT models, respectively, so the apparently worse time resolution might reflect an overall tendency of the Kalman and vbSPT models to invent spurious transitions.
Next, we look at estimated diffusion constants. Here, the Kalman and vbSPT models make systematic errors, as seen in the bare parameters in Fig. 5d. However, by comparing the step length statistics between the full and simplified models, one can derive heuristic correction factors for these models (see Methods, Eq. (9)). As shown in Fig. 5e, this reduces the bias substantially, especially for the high diffusion constant.
To finally compare the different HMMs on more well-behaved data, we reran the same experiment but with all z coordinates rescaled by a factor 1/5 in the PSF model, which essentially removes the z-dependent defocus effects. On this less challenging data set, event detection is much improved and the differences between the three HMMs are much less pronounced. However, with misclassification rates of 4.8%, 6.6%, and 7.6% for the HMM, Kalman model, and vbSPT, respectively, the full HMM still does better overall.

Position refinement
Since the new HMM includes the true trajectory as a hidden variable and performs a global analysis, it can be used to refine individual localized positions, and in principle beat the Cramér-Rao lower bound for single image localizations. Fig. 6a shows the true, measured, and refined positions for part of a two-state trajectory, and Fig. 6b the relative change of the RMS error for each frame in Fig. 5a after refinement. The refinement improves the RMS errors by up to 50%. Large localization errors and a small diffusion constant lead to larger relative improvements, which is intuitive since both factors make a single point less informative relative to its neighbors. A few points show a small error increase. On closer inspection, these turn out to be situated near hard-to-detect state changes, and therefore tend to be refined using an incorrect diffusion constant.
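The single-state version of this refinement can be sketched with a standard Kalman filter plus Rauch-Tung-Striebel smoother, using per-frame error variances but neglecting motion blur. The sketch below is a minimal illustration with assumed parameters, not the variational HMM of the paper:

```python
import numpy as np

def rts_smooth(x, eps2, lam):
    """Refine measured positions x with per-point error variances eps2,
    assuming pure diffusion with step variance lam = 2*D*dt (no blur).
    Standard Kalman filter followed by a Rauch-Tung-Striebel smoother."""
    n = len(x)
    m, P = np.empty(n), np.empty(n)          # filtered means / variances
    m[0], P[0] = x[0], eps2[0]
    for k in range(1, n):
        mp, Pp = m[k-1], P[k-1] + lam        # predict one frame ahead
        K = Pp / (Pp + eps2[k])              # Kalman gain
        m[k], P[k] = mp + K * (x[k] - mp), (1 - K) * Pp
    ms = m.copy()
    for k in range(n - 2, -1, -1):           # backward (RTS) pass
        G = P[k] / (P[k] + lam)
        ms[k] = m[k] + G * (ms[k+1] - m[k])
    return ms

rng = np.random.default_rng(7)
D, dt, n = 0.1, 0.005, 500
y = np.cumsum(rng.normal(0, np.sqrt(2 * D * dt), n))  # true trajectory
eps = rng.uniform(0.02, 0.08, n)                      # time-varying errors
x = y + rng.normal(0, eps)                            # noisy localizations
raw = np.mean((x - y)**2)
ref = np.mean((rts_smooth(x, eps**2, 2 * D * dt) - y)**2)
print(raw, ref)  # smoothing reduces the mean-square error
```

Consistent with the intuition above, the improvement is largest when the localization errors are large compared to the diffusive step length.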

DISCUSSION
Fluorophore positions are not the only useful kind of information in super-resolution microscopy images. Here, we have shown that point-wise position uncertainty can also be extracted and used to improve quantitative data analysis. This is particularly important for live cell data, where dynamic phenomena can be studied and where more heterogeneous imaging conditions are to be expected, so that theoretical estimates of localization errors that may apply for cells with immobilized internal structure ("fixed" cells) have little relevance.
In general, our results indicate that estimation of position uncertainty is more sensitive to the fit model than estimation of position. For practical use, we find that an estimate based on the Laplace approximation combined with external information about the fluorescent background and PSF shape performs well in a wide range of experimentally relevant conditions. An intuitive reason why the CRLB performs worse may be that it makes explicit use of the fit model twice, for both fitting and predicting the model-dependent uncertainty, while the Laplace approximation only requires the first step. Theoretically, the posterior density formalism of the Laplace approximation also fits more naturally with model-based time-series analysis where the particle trajectory is treated as a latent variable to be integrated out.
While we have focused on 2D localization using conventional optics, the extension to, e.g., three dimensions using dual plane imaging [25] or engineered PSFs [26] presents no difficulties in principle, as long as appropriate PSF models for localization can be formulated. Note however that even a perfect characterization of an experimental PSF [25] does not constitute a perfect localization model, since it does not describe random PSF shape fluctuations due to motion blur [20]. Current techniques for 3D localization are inherently asymmetric and yield different in-plane and axial accuracy [5], which further underscores the need for downstream analysis methods to incorporate heterogeneous localization uncertainty.
Most super-resolution microscopy applications are however not aimed at particle tracking, but imaging. For PALM/STORM type imaging of fixed samples, the ability to estimate the uncertainty of individual spots does not improve the localizations themselves, but it may still improve the final resolution since uncertain points can be omitted on a quantitative basis. However, the consequences for live cell imaging are more interesting, since the same fluorophore may be detected in different positions over different frames if the target is moving. For this case, we have shown that the combination of estimating uncertainty and modeling the fluorophore motion can produce refined position estimates, in principle pushing the localization errors below the single-spot Cramer Rao lower bound, by merging information from consecutive frames in an optimal way.

Synthetic data
We generated synthetic microscopy data using SMeagol, a software for accurate simulations of dynamic fluorescence microscopy at the single molecule level [8]. We modeled the optics using a Gibson-Lanni PSF model with 584 nm wavelength and NA=1.4 [19], and an EM-CCD camera with the high gain approximation [4,17] plus Gaussian readout noise.
For localization and diffusion estimation tests, we simulated simple diffusion (D = 1 µm² s⁻¹) in a cylinder of length 14.4 µm and diameter 2 µm, similar to long E. coli cells, in order to avoid confinement artifacts in the longitudinal direction. We generated 48 data sets spanning a wide range of imaging conditions by varying the frame duration (8 ms or 10 ms, with exposure times of 2, 4, 6 ms and 3, 5, 8 ms, respectively), background intensity (1 or 2 photons per pixel), average spot brightness (75-300 photons/spot), EM gain (20 or 30), and readout noise level (4 or 8). Each data set contained 10⁴ individual spots.
For the simulated multi-state data, we hand-modified a single SMeagol input trajectory from a simulated 2-state model to contain 4 binding events with different durations and z-coordinates, as seen in Fig. 5a, and also thinned out the input trajectory to create more variability in the particle paths between different realizations. We then simulated many realizations from this input trajectory, using the same PSF model as above, continuous illumination with a sample time of 5 ms, EMCCD gain 90, an average spot intensity of 300 photons/spot, and a time-dependent background that decays exponentially from 0.95 to 0.75 background photons per pixel with a time-constant of 0.75 s.

Real data
For estimating localization errors in real imaging conditions, we used immobilized fluorescent beads with a diameter of 0.1 µm (TetraSpeck Fluorescent Microspheres, ThermoFisher T7284). The beads were diluted in ethanol and then placed on a coverslip, where we let them dry before adding water as a mounting medium.
Imaging was done with a Nikon Ti-E microscope configured for EPI-illumination with a 514 nm excitation laser (Coherent Genesis MX STM) together with matching filters (Semrock dichroic mirror Di02-R514 with emission filter Chroma HQ545/50M-2P 70351). Intensity modulation was made possible by an acousto-optic tunable filter (AOTF) (AA Opto Electronics, AOTFnC), which was triggered by a waveform generator (Tektronix, AFG3021B). The waveform used was a sequence of square pulses, high for 200 ms and low for 1800 ms. The two illumination intensities, high and low, correspond to 10.7 kW cm⁻² and 0.63 kW cm⁻², respectively. Fluorescent beads were viewed through a 100x objective (CFI Apo TIRF 100x oil, NA=1.49) with a 2X extension (Diagnostic Instruments DD20NLT) in front of an Andor Ultra 897 EMCCD camera (Andor Technology Ltd.). This configuration gives a pixel size of 80 nm, the same as in the simulated data. The data set consisted of 1000 frames (Fig. 3a) with an exposure time of 30 ms. EMCCD noise characteristics (gain, offset, readout noise) were determined by analyzing a "dark" movie obtained with the shutter closed.

Localization
We perform maximum likelihood (MLE) localization using a high-gain EMCCD noise model [4,17], which gives the probability q(c_i|E_i) of the offset-subtracted pixel count c_i for a given pixel intensity E_i (expected number of photons/frame) in pixel i. For the intensity E(x, y), we model the PSF with a symmetric Gaussian plus a constant background,

E(x, y) = b/a² + N/(2πσ²) exp( −[(x − µ_x)² + (y − µ_y)²]/(2σ²) ),   (1)

with pixel size a, background b (expected number of photons/pixel), spot width σ, amplitude N (expected number of photons/spot), and spot position (µ_x, µ_y), and approximate the pixel intensities E_i by numerical quadrature [34]. The log likelihood of an image containing a single spot is then given by

ln L(θ) = Σ_i ln q(c_i|E_i(θ)) + ln q_0(θ),

where θ = (µ_x, µ_y, b, N, σ) are fit parameters, and q_0 is a prior distribution (we set ln q_0 = 0 for MLE fitting).
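As a concrete illustration of this model, the following Python sketch builds the pixel intensities of Eq. (1) by midpoint quadrature and fits a simulated spot by maximum likelihood. It is a simplified stand-in for the localization described here: pixel noise is modeled as Poisson rather than the full high-gain EMCCD distribution q(c_i|E_i), and the quadrature depth, optimizer, and all parameter values are illustrative choices of ours.

```python
import numpy as np
from scipy.optimize import minimize

def pixel_intensities(theta, npx=9, a=1.0, nq=3):
    """Expected photon counts E_i for a symmetric Gaussian spot plus flat
    background (Eq. (1)), integrated over pixels by midpoint quadrature."""
    mux, muy, b, N, sig = theta
    edges = np.arange(npx) * a
    offs = (np.arange(nq) + 0.5) * a / nq
    x = (edges[:, None] + offs[None, :]).ravel()   # sub-pixel sample points
    gx = np.exp(-(x - mux)**2 / (2 * sig**2))
    gy = np.exp(-(x - muy)**2 / (2 * sig**2))
    spot = N / (2 * np.pi * sig**2) * np.outer(gy, gx)
    # sum quadrature cells back into npx-by-npx pixels
    E = spot.reshape(npx, nq, npx, nq).sum(axis=(1, 3)) * (a / nq)**2
    return E + b   # b = background photons per pixel

def neg_log_likelihood(theta, counts):
    E = np.maximum(pixel_intensities(theta), 1e-12)  # guard bad excursions
    return np.sum(E - counts * np.log(E))  # Poisson NLL up to a constant

rng = np.random.default_rng(1)
true = np.array([4.3, 4.7, 2.0, 500.0, 1.5])  # mux, muy, b, N, sigma (px)
counts = rng.poisson(pixel_intensities(true))

fit = minimize(neg_log_likelihood, x0=[4.5, 4.5, 1.0, 300.0, 1.2],
               args=(counts,), method="Nelder-Mead")
print(fit.x[:2])  # estimated (mux, muy)
```

With these (bright-spot) settings, the fitted position lands close to the true one; the EMCCD excess noise omitted here roughly doubles the error variance, as discussed in the CRLB section.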
To avoid the complications of spot identification, we use known positions to determine the ROI and the initial guess for (µ_x, µ_y). All localizations are performed using a 9 × 9 ROI. Fits that failed to converge or that estimated uncertainties larger than 2 pixels were discarded.

Cramer-Rao lower bound
The CRLB is a lower bound on the variance of an unbiased estimator [35,36]. We use an accurate approximation to the CRLB for a symmetric Gaussian PSF from Ref. [5],

ε²_CRLB = (2σ_a²/N) (1 + 4τ + √( 2τ/(1 + 4τ) )),   (4)

with τ = 2πσ_a² b/(N a²) and σ_a² = σ² + a²/12, where the prefactor 2 accounts for EMCCD excess noise [4].
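A direct transcription of this bound, using the symbols defined here (the function name is ours), might read:

```python
import math

def crlb_var(N, b, sigma, a):
    """Approximate CRLB variance (Eq. (4)) for the position of a symmetric
    Gaussian spot, including the factor 2 for EMCCD excess noise."""
    sa2 = sigma**2 + a**2 / 12            # effective PSF variance sigma_a^2
    tau = 2 * math.pi * sa2 * b / (N * a**2)
    return 2 * sa2 / N * (1 + 4 * tau + math.sqrt(2 * tau / (1 + 4 * tau)))

# brighter spots and lower backgrounds give smaller bounds (pixel^2 units)
print(crlb_var(300, 1.0, 1.5, 1.0))
```

For N = 300 photons, b = 1 photon/pixel, σ = 1.5 px, and a = 1 px (80 nm), the bound is roughly 0.023 px², i.e. an RMS error of about 0.15 px or 12 nm.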

Laplace approximation
An alternative way to approximate the uncertainty of the fit parameters is to Taylor expand the log likelihood around the maximum-likelihood parameters θ* to second order,

ln L(θ) ≈ ln L(θ*) + ½ (θ − θ*)ᵀ H (θ − θ*),  H = ∂² ln L/∂θ² |_{θ=θ*}.   (5)

The first-order term is zero, since θ* is a local maximum. This approximates the likelihood by a Gaussian with covariance matrix given by the negative inverse Hessian, i.e., Σ = −[∂² ln L/∂θ²]⁻¹. In a Bayesian setting, this expresses the (approximate) posterior uncertainty about the fit parameters. The estimated uncertainties (posterior variances) are given by the diagonal entries of the covariance matrix, e.g., ε²_Lap.(µ_x) = Σ_{µ_x,µ_x}. We compute the Hessian numerically using Matlab's built-in optimization routines, and use the log of the scale parameters b, N, σ for fitting, since they are likely to have a more Gaussian-like posterior [37].
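The same recipe can be sketched numerically: compute a finite-difference Hessian of the log likelihood at its maximum and invert it. The toy example below (our own, not the paper's Matlab code) uses the mean of a Gaussian sample, for which the Laplace variance σ²/n is exact, so the numerics can be checked:

```python
import numpy as np

def hessian_fd(f, x, h=1e-4):
    """Finite-difference Hessian of a scalar function f at point x."""
    n = len(x)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            x1 = x.copy(); x1[i] += h; x1[j] += h
            x2 = x.copy(); x2[i] += h; x2[j] -= h
            x3 = x.copy(); x3[i] -= h; x3[j] += h
            x4 = x.copy(); x4[i] -= h; x4[j] -= h
            H[i, j] = (f(x1) - f(x2) - f(x3) + f(x4)) / (4 * h**2)
    return H

# Toy check: for the log likelihood of the mean of n iid N(mu, s2) samples,
# the Laplace posterior variance is exactly s2/n.
rng = np.random.default_rng(0)
data, s2 = rng.normal(2.0, 1.0, size=400), 1.0
logL = lambda th: -0.5 * np.sum((data - th[0])**2) / s2
mle = np.array([data.mean()])
Sigma = -np.linalg.inv(hessian_fd(logL, mle))   # Sigma = -H^{-1}
print(Sigma[0, 0], s2 / len(data))              # both ~0.0025
```

For a localization fit, `logL` would instead be the spot log likelihood and `mle` the fitted parameter vector; the position variances are then read off the diagonal of `Sigma`.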

Prior distributions
The prior distributions for localization used in the main text are as follows. For the log background intensity, we use a normal prior with mean ln b_true and std. 0.2, resulting in the log-normal prior b ∼ ln N(ln b_true, 0.2²). For the PSF width σ, we use a skewed normal prior on ln σ to penalize fits with σ below the minimum in-focus width of the PSF. The skew-normal density is given by [21]

f(x; x_0, w, α) = (2/w) φ((x − x_0)/w) Φ(α(x − x_0)/w),

where φ is the probability density function and Φ is the cumulative distribution function of the unit normal distribution N(0, 1). We used x_0 = ln(1.5), w = 1, and α = 5, as sketched in Fig. 2b, with the pixel width a = 80 nm as the length unit. These priors are shown in SI Fig. S1.
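The skew-normal prior is easy to evaluate with standard-library functions only; the defaults below mirror the parameter values quoted above (this is an illustration of the density, not the paper's code):

```python
import math

def skew_normal_pdf(x, x0=math.log(1.5), w=1.0, alpha=5.0):
    """Skew-normal density f(x; x0, w, alpha) = (2/w) phi(z) Phi(alpha*z),
    z = (x - x0)/w, used here as a prior on ln(sigma)."""
    z = (x - x0) / w
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(alpha * z / math.sqrt(2)))
    return 2 / w * phi * Phi

# the density integrates to 1 and strongly penalizes ln(sigma) << ln(1.5)
xs = [i * 0.01 - 5 for i in range(1500)]
total = sum(skew_normal_pdf(x) for x in xs) * 0.01
print(total)                                   # ~1.0
print(skew_normal_pdf(math.log(0.5)))          # tiny: narrow spots penalized
```

With α = 5, essentially all prior mass lies above x_0 = ln(1.5), so fitted widths much below the in-focus PSF width are suppressed without forbidding wide, defocused spots.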

Covariance-based diffusion estimator
If x_k (k = 0, 1, . . .) is the measured trajectory of a freely diffusing particle with diffusion constant D, the widely used model for camera-based tracking by Berglund [22] predicts that the measured step lengths ∆_k = x_{k+1} − x_k are zero-mean Gaussian variables with covariances given by

⟨∆_k²⟩ = 2D∆t(1 − 2R) + 2ε²,  ⟨∆_k ∆_{k+1}⟩ = 2DR∆t − ε²,   (7)

and uncorrelated otherwise. Here, 0 ≤ R ≤ 1/4 is a blur coefficient that depends on how the images are acquired (e.g., R = 1/6 for continuous illumination), ∆t is the measurement time-step, and ε² is the variance of the localization errors.
Substituting sample averages for ⟨∆_k²⟩ and ⟨∆_k ∆_{k+1}⟩ and solving for D yields a covariance-based estimator (CVE) with good performance [10]. If ε² is known or can be estimated independently, the first relation in Eq. (7) alone yields a further improved estimate of D. As we argue in SI Sec. S2.3, these estimators apply also for variable localization errors if ε² is replaced by the trajectory average ⟨ε²⟩.
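A minimal sketch of both estimator variants follows (the function name and simulation parameters are ours; the simulation uses static errors and no motion blur, i.e. R = 0):

```python
import numpy as np

def cve(x, dt, eps2=None, R=0.0):
    """Covariance-based diffusion estimator. Without eps2, combines both
    step covariances of Eq. (7), which cancels blur and error terms for
    any R; with a known mean error variance eps2, uses only the step
    variance, corrected for blur."""
    d = np.diff(x)
    if eps2 is None:
        return np.mean(d**2) / (2 * dt) + np.mean(d[1:] * d[:-1]) / dt
    return (np.mean(d**2) - 2 * eps2) / (2 * dt * (1 - 2 * R))

# simulate pure diffusion with static localization errors, no blur (R = 0)
rng = np.random.default_rng(3)
D, dt, eps = 1.0, 0.01, 0.05
y = np.cumsum(rng.normal(0, np.sqrt(2 * D * dt), 100_000))  # true path
x = y + rng.normal(0, eps, y.size)                          # measured path
print(cve(x, dt), cve(x, dt, eps2=eps**2))  # both close to D = 1.0
```

The second variant is the one that benefits from the estimated uncertainties: supplying ⟨ε²⟩ removes the need to estimate it from the step correlations, which is what reduces the estimator variance in Fig. 4.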

Maximum likelihood and multi-state diffusion
The Berglund model [22] can also be used directly for maximum likelihood inference, which has the potential advantage that point-wise errors can be modeled [11]. The basic assumption is to model the observed positions x_k as averages of the true diffusive particle path y(t) during the camera exposure, plus a Gaussian localization error, i.e.,

x_k = ∫_0^∆t f(t) y(t_k + t) dt + ε_k ξ_k,   (8)

where f(t) is the normalized shutter function [22], ε_k is the localization uncertainty (standard deviation) at time k, and ξ_k are independent Gaussian random numbers with unit variance. Continuous illumination is described by a constant shutter function, f(t) = 1/∆t. The limit where blur effects are neglected can be described by setting f(t) to a delta function, which corresponds to instantaneous position measurement. This reduces Eq. (8) to a standard Kalman filter [12], and leads to R = 0 in Eq. (7).
In SI text S3, we derive a maximum likelihood estimator that learns both D and y(t). In SI Sec. S4, we extend the model to multi-state diffusion, by letting the diffusion constant switch randomly between different values corresponding to different hidden states in an HMM, and derive a variational EM algorithm for maximum likelihood inference of model parameters, hidden states, and refined estimates of the measured positions.
To interpret estimated diffusion constants from simplified models, one may "derive" corrected diffusion estimates D* by equating expressions for the step length variance ⟨∆_k²⟩ from Eq. (7) with and without those effects present. For the Kalman (R = 0) and vbSPT (R = ε = 0) models, we get

D*_Kalman = D_Kalman/(1 − 2R),  D*_vbSPT = (D_vbSPT − ⟨ε²⟩/∆t)/(1 − 2R),   (9)

respectively, which is what we use in Fig. 5e.
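These corrections follow from matching the full-model step-length variance 2D∆t(1 − 2R) + 2⟨ε²⟩ against the simplified models. A small self-consistency sketch (all numbers illustrative):

```python
def correct_kalman(D_kalman, R):
    """Blur correction for a Kalman-type model that includes localization
    errors but sets R = 0."""
    return D_kalman / (1 - 2 * R)

def correct_vbspt(D_vbspt, mean_eps2, dt, R):
    """Correction for a vbSPT-type model that neglects both blur and
    localization errors."""
    return (D_vbspt - mean_eps2 / dt) / (1 - 2 * R)

# sanity check: build "apparent" diffusion constants from a known D and
# verify that the corrections recover it (continuous illumination, R = 1/6)
D, R, dt, eps2 = 1.0, 1 / 6, 0.005, 35e-4   # eps2 in um^2, ~60 nm errors
D_kalman = D * (1 - 2 * R)                  # blur lowers the apparent D
D_vbspt = D * (1 - 2 * R) + eps2 / dt       # errors inflate the apparent D
print(correct_kalman(D_kalman, R),
      correct_vbspt(D_vbspt, mean_eps2=eps2, dt=dt, R=R))  # both ~1.0
```

The two biases act in opposite directions, which matches the systematic errors of the Kalman and vbSPT fits seen in Fig. 5d.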

Software
Matlab open source code for the EM and localization algorithms are available at github.com/bmelinden/uncertainSPT.

SUPPLEMENTARY MATERIAL
Appendix S1: Localization error estimates with different priors

In this section, we present how different prior functions assigned to the PSF width σ and the background b influence the accuracy of the computed RMS errors and RMS uncertainties for the synthetic data set considered in this paper.
We considered log-normal priors for the background intensity, either centered around the true (simulated) background in each data set, or around the average background intensity in all data sets. For the PSF width we tested two types of priors: (i) a log-normal prior around the expected value for the PSF width; (ii) a skew-normal prior on ln(σ), which assigns very low probability to PSF widths below ≈ 0.8 pixels, as well as low probabilities for very wide PSFs. We do not impose any prior on the spot amplitude N .
The probability density function for the log-normal prior, denoted ln N(µ, σ²), is defined by

f(x; µ, σ) = 1/(xσ√(2π)) exp( −(ln x − µ)²/(2σ²) ),  x > 0,

where µ and σ are the mean and standard deviation parameters of ln x. The skew-normal distribution [21] has a density function given by

f(x; x_0, w, α) = (2/w) φ((x − x_0)/w) Φ(α(x − x_0)/w),

where φ is the probability density function and Φ is the cumulative distribution function of the unit normal distribution N(0, 1), and the parameter α determines the degree of asymmetry, with positive (negative) α increasing the weight above (below) the location parameter x_0. For our investigations of the effects of different prior distributions, we used two background priors:
• Log-normal "correct" prior on b: ln N(ln b_true, 0.2²), where b_true is either 1 or 2 photons/pixel, the background level used for simulating that particular data set.
• Log-normal "average" prior on b: ln N (ln 1.5, 0.2 2 ), centered around the average background intensity in all data sets, that is, 1.5 photons/pixel.
For the PSF width, we considered the two following priors:

• Log-normal prior on σ: ln N(ln 1.5, 0.2²), since the width of the focused PSF is around 1.5 pixels.

• Skew-normal prior on ln σ: the skew-normal prior described above, which assigns very low probability to PSF widths below ≈ 0.8 pixels, as well as to very wide PSFs.

The CRLB-based uncertainty estimates substantially underestimate the true errors, especially for points with low photon count. In comparison, all other prior combinations present an average improvement. Overall, the Laplace estimator seems more consistent (RMS uncertainty mostly closer to RMS error), and also less sensitive to the choice of priors. The latter trend is especially clear in the examples using the "average" background prior, which is always either a systematic over- or underestimate, and results in substantially larger variability of the CRLB results.
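For reference, the two prior families used above for b and σ can be evaluated with a few lines of standard-library Python. This is an illustration only (function names and the normalization check are ours, not the paper's code):

```python
import math

def phi(u):
    """Standard normal probability density function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Phi(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def lognormal_pdf(x, mu, sigma):
    """ln N(mu, sigma^2): density of x > 0 when ln x ~ N(mu, sigma^2)."""
    if x <= 0.0:
        return 0.0
    return phi((math.log(x) - mu) / sigma) / (x * sigma)

def skewnormal_pdf(x, x0, omega, alpha):
    """Skew-normal density with location x0, scale omega, asymmetry alpha."""
    u = (x - x0) / omega
    return 2.0 / omega * phi(u) * Phi(alpha * u)

# alpha = 0 recovers the symmetric normal density
assert abs(skewnormal_pdf(0.3, 0.3, 1.0, 0.0) - phi(0.0)) < 1e-12
# the log-normal prior on b integrates to ~1 (crude Riemann sum)
dx = 1e-3
s = sum(lognormal_pdf(k * dx, math.log(1.5), 0.2) for k in range(1, 20000)) * dx
assert abs(s - 1.0) < 1e-3
```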

S2.1. Diffusive camera-based tracking with blur and localization errors
To streamline the presentation, we will use units where ∆t = 1, use the subscript t = 0, 1, 2, . . . to denote discrete time dependence, and use the step length and localization variances, λ_t = 2D_t∆t and v_t = ε_t², respectively. We allow both of them to be time-dependent, but assume them to be statistically independent and restrict the diffusion constant to be constant throughout each frame. Then, Eq. (8) reads

   x_t = ∫₀¹ f(t′) y(t + t′) dt′ + √(v_t) ξ_t,   (S1)

where ξ_t are independent identically distributed (iid) N(0,1) variables, and y(t) is the true trajectory of the particle being localized. The shutter distribution f(t) is a probability density on [0, 1], which describes the image acquisition process (e.g., f(t) = 1 for continuous acquisition), but neglects stochastic elements such as fluorophore blinking. It has the distribution function

   F(t) = ∫₀ᵗ f(t′) dt′.   (S2)

We divide y(t) into two parts: the true positions y_t at the beginning of each frame, which evolve according to

   y_{t+1} = y_t + √(λ_t) η_t,   (S3)

where η_t are again iid N(0,1), and a conditional interpolating process between them, described by Brownian bridges [27]. Thus, for 0 ≤ t′ ≤ 1, we write

   y(t + t′) = (1 − t′) y_t + t′ y_{t+1} + √(λ_t) B_t(t′),   (S4)

where B_t are a set of iid standard Brownian bridges. These are Gaussian processes on the interval [0,1], defined by

   ⟨B_t(t′)⟩ = 0,   ⟨B_t(t′) B_t(t′′)⟩ = min(t′, t′′) − t′t′′,   (S5)

and also independent on different intervals, so that ⟨B_r(t′) B_v(t′′)⟩ = 0 if r ≠ v. Substituting the interpolation formula Eq. (S4) into the localization model Eq. (S1), we get

   x_t = (1 − τ) y_t + τ y_{t+1} + √(λ_t) ∫₀¹ f(t′) B_t(t′) dt′ + √(v_t) ξ_t,   (S6)

where we have introduced the shutter average, given by

   τ = ∫₀¹ t f(t) dt.   (S7)

Using the properties of Brownian bridges, Eq. (S5), one can show that the last integral in Eq. (S6) is a Gaussian random variable with mean zero and variance

   β λ_t,   β = ∫₀¹∫₀¹ f(t′) f(t′′) [min(t′, t′′) − t′t′′] dt′ dt′′ = τ(1 − τ) − R,   (S8)

where R is the blur coefficient of Ref. [22], given by

   R = ∫₀¹ F(t)(1 − F(t)) dt.   (S9)

By assumption, v_t and B_t are statistically independent, and thus one can add up the noise terms in the measurement model and arrive at

   x_t = (1 − τ) y_t + τ y_{t+1} + √(β λ_t + v_t) ζ_t,   (S10)

where ζ_t are again iid N(0,1).
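As a numerical sanity check of these shutter quantities (our illustration, not part of the paper), one can evaluate the shutter average τ = ∫ t f(t) dt, the blur coefficient R = ∫ F(1 − F) dt, and the bridge variance coefficient β = ∫∫ f f [min(t′, t′′) − t′t′′] dt′ dt′′ for continuous acquisition f(t) = 1, where the exact values are τ = 1/2, R = 1/6, and β = τ(1 − τ) − R = 1/12:

```python
# Midpoint-rule quadrature for tau, R, beta with f(t) = 1 on [0, 1], F(t) = t.
n = 800
h = 1.0 / n
ts = [(k + 0.5) * h for k in range(n)]  # midpoints of the n subintervals

tau = sum(t * h for t in ts)                      # -> 1/2
R = sum(t * (1.0 - t) * h for t in ts)            # -> 1/6
beta = sum(sum((min(a, b) - a * b) * h * h for b in ts) for a in ts)  # -> 1/12

assert abs(tau - 0.5) < 1e-6
assert abs(R - 1.0 / 6.0) < 1e-6
assert abs(beta - 1.0 / 12.0) < 1e-4
# the identity beta = tau*(1 - tau) - R holds for this shutter
assert abs(beta - (tau * (1.0 - tau) - R)) < 1e-4
```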

S2.2. Constant exposure
An important class of shutter distributions are those that are constant during some fraction t_E of each frame, and then zero, i.e.,

   f(t) = 1/t_E for 0 ≤ t ≤ t_E,   f(t) = 0 otherwise,   (S11)

which leads to

   τ = t_E/2,   R = t_E/6,   β = τ(1 − τ) − R = t_E/3 − t_E²/4.   (S12)

We see that R ≤ 1/6 and τ ≤ 1/2, with maxima at continuous exposure (t_E = 1). On the other hand, β has a maximum of 1/9 at t_E = 2/3, and the value of 1/12 at continuous exposure can only be lowered further when t_E < 1/3. It is unclear whether this is significant, since with non-constant exposure there is some freedom in how the shutter distribution is defined, and one could also place it symmetrically in the interval and get τ = 0.5 for all exposure times. We defer further investigation of this issue to future work.
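The constant-exposure expressions above can be checked directly (a small sketch with our naming):

```python
def shutter_constants(t_E):
    """tau, R, beta for a flat shutter open during a fraction t_E of the frame."""
    tau = t_E / 2.0
    R = t_E / 6.0
    beta = tau * (1.0 - tau) - R  # = t_E/3 - t_E**2/4
    return tau, R, beta

# continuous exposure: maxima of tau and R, and beta = 1/12
tau, R, beta = shutter_constants(1.0)
assert (tau, R) == (0.5, 1.0 / 6.0)
assert abs(beta - 1.0 / 12.0) < 1e-12

# beta is maximal (= 1/9) at t_E = 2/3 ...
assert abs(shutter_constants(2.0 / 3.0)[2] - 1.0 / 9.0) < 1e-12
# ... and drops back to the continuous-exposure value 1/12 at t_E = 1/3
assert abs(shutter_constants(1.0 / 3.0)[2] - 1.0 / 12.0) < 1e-12
```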

S2.3. Covariance relations
The covariance matrix for the steps ∆_t = x_{t+1} − x_t can be found from Eqs. (S3,S10). With some manipulations, we get

   ⟨∆_t²⟩ = (1 − τ)² λ_t + τ² λ_{t+1} + β(λ_t + λ_{t+1}) + v_t + v_{t+1},   (S13)

   ⟨∆_t ∆_{t+1}⟩ = τ(1 − τ) λ_{t+1} − β λ_{t+1} − v_{t+1},   (S14)

where the expectations ⟨·⟩ are understood to be over the noise distributions only. If we further assume simple diffusion, λ_t = 2D∆t = const., and average over time as well, we recover covariance relations of the same form as Eq. (7),

   ⟨∆²⟩ = 2D∆t(1 − 2R) + 2ε²,   ⟨∆_t ∆_{t+1}⟩ = 2RD∆t − ε²,

where the averages are now over time as well, and ε² is identified with the time average of v_t. Thus, the covariance-based estimators of Ref. [10] should apply also to non-constant localization errors.
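These stationary relations can be checked by direct simulation. A minimal Monte Carlo sketch (our code, assumed parameters; continuous exposure, so τ = 1/2, R = 1/6, β = 1/12) of the measurement model x_t = (1 − τ)y_t + τy_{t+1} + √(βλ + v) ζ_t with y_{t+1} = y_t + √λ η_t:

```python
import random

random.seed(1)
tau, R, beta = 0.5, 1.0 / 6.0, 1.0 / 12.0
lam, v = 1.0, 0.25  # constant step-length and localization variances
T = 200000

# true frame-start positions (free diffusion) and blurred, noisy localizations
y = [0.0]
for _ in range(T):
    y.append(y[-1] + random.gauss(0.0, lam**0.5))
x = [(1 - tau) * y[t] + tau * y[t + 1] + random.gauss(0.0, (beta * lam + v) ** 0.5)
     for t in range(T)]

dx = [x[t + 1] - x[t] for t in range(T - 1)]
var = sum(d * d for d in dx) / len(dx)
cov1 = sum(dx[t] * dx[t + 1] for t in range(len(dx) - 1)) / (len(dx) - 1)

# compare against <dx^2> = lam*(1 - 2R) + 2v and <dx_t dx_{t+1}> = lam*R - v
assert abs(var - (lam * (1 - 2 * R) + 2 * v)) < 0.05
assert abs(cov1 - (lam * R - v)) < 0.05
```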

S4.1. Model
In addition to the measured (x) and true (y) positions, we include a hidden state trajectory s = [s_1, s_2, . . . , s_T], such that s_t determines the diffusion constant on the interval [t, t + 1]. The hidden states are numbered from 1 to N, and evolve according to a Markov process with transition matrix A and initial state probability π. For a single 1-dimensional trajectory, this leads to a complete data likelihood of the form p(x, y, s|λ, A, π) = p(x|y, s, λ)p(y|s, λ)p(s|A, π), (S1) with factors

   p(s|A, π) = π_{s_1} ∏_t A_{s_t s_{t+1}},   (S2)

   p(y|s, λ) = ∏_t N(y_{t+1}; y_t, λ_{s_t}),   (S3)

   p(x|y, s, λ) = ∏_t N(x_t; (1 − τ)y_t + τ y_{t+1}, βλ_{s_t} + v_t),   (S4)

where N(y; m, σ²) denotes a normal density over y with mean m and variance σ².
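This factorized likelihood is simple to evaluate. A minimal sketch (our naming, not the paper's code; 0-indexed states, y holding the frame-start positions and s, x, v holding one entry per frame interval), assuming the blurred measurement model with shutter constants τ and β:

```python
import math

def norm_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def complete_data_loglike(x, y, s, lam, A, pi, v, tau, beta):
    """ln p(x, y, s | lam, A, pi) for one 1-D trajectory; len(y) == len(s) + 1."""
    K = len(s)  # number of frame intervals
    ll = math.log(pi[s[0]])                                           # p(s | A, pi)
    ll += sum(math.log(A[s[t]][s[t + 1]]) for t in range(K - 1))
    ll += sum(norm_logpdf(y[t + 1], y[t], lam[s[t]]) for t in range(K))  # p(y | s)
    ll += sum(norm_logpdf(x[t], (1 - tau) * y[t] + tau * y[t + 1],       # p(x | y, s)
                          beta * lam[s[t]] + v[t]) for t in range(K))
    return ll

# one-state example whose value can be checked by hand
ll = complete_data_loglike(x=[0.5], y=[0.0, 1.0], s=[0], lam=[2.0],
                           A=[[1.0]], pi=[1.0], v=[0.5], tau=0.5, beta=0.0)
expected = norm_logpdf(1.0, 0.0, 2.0) + norm_logpdf(0.5, 0.5, 0.5)
assert abs(ll - expected) < 1e-12
```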

S4.2. Variational EM approach
We would like to perform maximum-likelihood inference of the model parameters, which means maximizing the likelihood with the latent variables s, y integrated out,

   L(A, π, λ) = ∫ dy Σ_s p(x|y, s, λ) p(y|s, λ) p(s|A, π).   (S5)

Since this problem is intractable, we make a variational approximation, meaning we approximate ln L with a lower bound,

   ln L = ln ∫ dy Σ_s p(x|y, s, λ) p(y|s, λ) p(s|A, π)
        ≥ ∫ dy Σ_s q(s) q(y) ln[ p(x|y, s, λ) p(y|s, λ) p(s|A, π) / (q(s) q(y)) ] ≡ F,   (S6)

where the inequality follows from Jensen's inequality. Here, q(s), q(y) are arbitrary variational distributions that need to be optimized together with the model parameters to achieve the tightest lower bound, and can be used for approximate inference about the latent variables. In particular, it turns out that the optimal variational distributions approximate the posterior distribution of y, s in the sense of minimizing a Kullback-Leibler divergence [29]. We optimize by coordinate ascent, i.e., we iteratively optimize each variational distribution and the parameters with the others held fixed, which leads to an EM-type algorithm.
To optimize F w.r.t. the variational distributions, we set the functional derivatives of F to zero and use Lagrange multipliers to enforce normalization. After some work, one arrives at

   ln q(s) = −ln Z_s + ⟨ln p(x|y, s, λ)⟩_{q(y)} + ⟨ln p(y|s, λ)⟩_{q(y)} + ln p(s|A, π),   (S7)

   ln q(y) = −ln Z_y + ⟨ln p(x|y, s, λ)⟩_{q(s)} + ⟨ln p(y|s, λ)⟩_{q(s)},   (S8)

where Z_{s,y} are normalization constants originating from the Lagrange multipliers, and ⟨·⟩_f denotes an expectation value computed with respect to the distribution f. As it turns out, these equations are individually tractable, and result in q(y) being a multivariate Gaussian, and q(s) adopting the standard HMM form amenable to efficient forward-backward iterations. For the initial state and transition probability parameters, the only dependence in F comes from p(s|A, π), and one arrives at the classical Baum-Welch reestimation formulas [30]. However, optimizing F w.r.t. the step length variances does not lead to a tractable update equation, and we are instead forced to optimize the λ_j-dependent parts of F numerically, i.e.,

   λ_j = argmax_{λ_j} [ ⟨ln p(x|y, s, λ)⟩_{q(y)q(s)} + ⟨ln p(y|s, λ)⟩_{q(y)q(s)} ].   (S9)
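The numerical λ_j update can be done with any scalar optimizer. As a toy illustration (our code, not the released implementation), a golden-section search applied to an objective of the same shape as the λ-dependent part of F, f(λ) = −(T/2) ln λ − S/(2λ), recovers the analytical maximizer λ = S/T:

```python
import math

def golden_max(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximum of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)

T, S = 100.0, 250.0  # toy "counts" and "sum of squared steps"
f = lambda lam: -0.5 * T * math.log(lam) - S / (2.0 * lam)
lam_hat = golden_max(f, 1e-3, 100.0)
assert abs(lam_hat - S / T) < 1e-6  # analytical maximizer is S/T = 2.5
```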

S4.3. The lower bound
The lower bound can be computed as well, and takes a particularly simple form just after the update of q(s). Substituting the update equation Eq. (S7) into the expression for F, Eq. (S6), we get

   F = ln Z_s − ⟨ln q(y)⟩_{q(y)},   (S10)

i.e., the log normalization constant of q(s) plus the entropy of the Gaussian q(y).

S4.4. Refined localization estimates

To refine the localization estimates, we consider the shutter-averaged true position during frame t, z_t = ∫₀¹ f(t′) y(t + t′) dt′. Conditional on y and s, the Brownian-bridge construction of Appendix S2 gives

   p(z_t|y, s, θ) = N(ȳ_t, βλ_{s_t}),   ȳ_t = (1 − τ)y_t + τ y_{t+1},   (S13)

where N(a, b) denotes a Gaussian density with mean a and variance b. Furthermore, the localization uncertainty is also assumed Gaussian and independent of the underlying kinetics:

   p(x_t|z, y, s, θ) = p(x_t|z_t) = N(z_t, v_t).   (S14)

Applying Bayes' theorem to these relations, we get

   p(z_t|x_t, y, s, θ) = p(x_t|z_t) p(z_t|y, s, θ) / p(x_t|y, s, θ) = N(z_t, v_t) N(ȳ_t, βλ_{s_t}) / N(ȳ_t, βλ_{s_t} + v_t),   (S15)

which is a Gaussian density in z_t with mean (βλ_{s_t} x_t + v_t ȳ_t)/(βλ_{s_t} + v_t) and variance βλ_{s_t} v_t/(βλ_{s_t} + v_t). The predictive distribution is finally given by marginalizing p(z_t|x_t, y, s, θ) over the posterior for y, s. Using the variational distributions, this means

   p(z_t|x) ≈ ⟨p(z_t|x_t, y, s, θ)⟩_{q(y)q(s)}.   (S16)

In particular, the posterior mean of z_t is then given by

   ẑ_t = ⟨z_t⟩ = ⟨ (βλ_{s_t} x_t + v_t[(1 − τ)µ_t + τ µ_{t+1}]) / (βλ_{s_t} + v_t) ⟩_{q(s)},   (S17)

which we will use as our estimator for refining the localizations. Here, µ_t = ⟨y_t⟩_{q(y)} is the variational mean value. The variational average over hidden states is done numerically.
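The refined position is thus a shrinkage estimate between the raw localization x_t and the variational path mean, averaged over the hidden-state posterior weights. A minimal sketch (our naming; the state weights q_j and all inputs are assumed given):

```python
def refined_position(x, mu_t, mu_t1, q, lam, v, tau, beta):
    """Posterior-mean refinement of one localization x.

    Averages the per-state shrinkage means
        (beta*lam_j*x + v*ybar) / (beta*lam_j + v)
    over hidden-state weights q_j, with ybar = (1 - tau)*mu_t + tau*mu_{t+1}.
    """
    ybar = (1.0 - tau) * mu_t + tau * mu_t1
    return sum(qj * (beta * lj * x + v * ybar) / (beta * lj + v)
               for qj, lj in zip(q, lam))

# limiting behaviour: a near-perfect localization (v -> 0) is left untouched,
# while a very uncertain one (large v) falls back on the path mean ybar
assert abs(refined_position(2.0, 0.0, 1.0, [1.0], [1.0], 1e-12, 0.5, 1.0 / 12.0) - 2.0) < 1e-6
assert abs(refined_position(2.0, 0.0, 1.0, [1.0], [1.0], 1e12, 0.5, 1.0 / 12.0) - 0.5) < 1e-6
```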