## Abstract

Time-dependent ensemble averages, i.e., trajectory-based averages of some observable, are of importance in many fields of science. A crucial objective when interpreting such data is to fit these averages (for instance, squared displacements) with a function and extract parameters (such as diffusion constants). A commonly overlooked challenge in such function fitting procedures is that fluctuations around mean values, by construction, exhibit temporal correlations. We show that the only available general purpose function fitting methods, correlated chi-square method and the weighted least squares method (which neglects correlation), fail at either robust parameter estimation or accurate error estimation. We remedy this by deriving a new closed-form error estimation formula for weighted least square fitting. The new formula uses the full covariance matrix, i.e., rigorously includes temporal correlations, but is free of the robustness issues, inherent to the correlated chi-square method. We demonstrate its accuracy in four examples of importance in many fields: Brownian motion, damped harmonic oscillation, fractional Brownian motion and continuous time random walks. We also successfully apply our method, weighted least squares including correlation in error estimation (WLS-ICE), to particle tracking data. The WLS-ICE method is applicable to arbitrary fit functions, and we provide a publically available WLS-ICE software.

## Introduction

Time-dependent ensemble averages appear in several scientific fields. Examples include: particle tracking experiments where mean square displacements (MSD) are measured at different sampling times^{1}, human travel dynamics where dispersal distance as a function of time are measured^{2}, single-molecule pulling experiments^{3}, applications of fluctuation theorems^{4} such as the Jarzynski equality^{5}, measurements of the time-dependence of donor-acceptor distance dynamics^{6}, tracer particle dynamics in complex systems^{7} and correlation functions in spin systems and lattice gauge theories^{8}. The final step when interpreting ensemble averages is often to fit a function to these averages in order to extract parameters.

Fitting a function to data is done so readily in science that one seldom considers the correctness of the standard go-to solution of the (linear and non-linear) weighted least squares (WLS) method^{9,10,11}. One of the crucial implicit assumptions of the “standard” version of this method is that the fluctuations around mean values are independent. However, since for time-dependent ensemble averages the data is sampled along trajectories, this independence assumption is in general *not* satisfied when analyzing ensemble averages; heuristically, if in one trajectory an observable, such as the square displacement, was smaller than its ensemble averaged value at some time, it is typically still so at the next time step. For an illustrative example, see Figure S1 in Supplementary Information, which shows the time-evolution in simulations of fractional Brownian motion (FBM). Thus, the fluctuations around an ensemble averaged (time-dependent) observable will in general exhibit temporal correlations. Herein, the term trajectory is used in its widest sense: an observable (such as squared displacement) is chosen, and a trajectory is then measurements of this observable at different consecutive sampling times.

The question now arises of how severe the consequences of neglecting the temporal correlations in least squares fitting are. We demonstrate that such neglect leads to unreliable error estimation for parameters and can in some cases lead to underestimated errors for fitted parameters (such as diffusion constants) by more than one order of magnitude for our prototype systems (see below). The unreliability of the estimated errors can have detrimental effects when statistically interpreting the data: The 1*σ* (2*σ*) rule for Gaussian statistics states that 68% (95%) of the observed data should (on average) fall within ±1 (±2) *σ* from the estimated mean. For this rule to be meaningful one must have a correct estimator for the variance in estimated parameters, *σ*^{2}.

To our knowledge, the only previous method for dealing fully with correlation in data for function fitting to ensemble-averages is the correlated chi-square method (CCM)^{12,13}. This method is known to the lattice quantum chromodynamics community, but does not seem to have found wide spread use. This could partly be due to that, while mathematically sound, numerical robustness issues have been identified^{14,15}. We here carefully examine the CCM method and demonstrate that it in general only provides correct parameter estimation in a small region of the “phase space” (*N*, *M*), where *N* is the number of sampling times and *M* is the number of trajectories. Thus, it appears that the CCM is of limited general purpose use for fitting of time-dependent ensemble averages to a model function.

Although the least squares and WLS methods are common techniques for parameter estimation from ensemble averages, alternative methods exist, e.g., for inferring parameters from trajectories for biological systems^{16,17,18}. In particular, for Brownian motion (BM) an optimal estimator for the diffusion constant has recently been derived^{19,20,21}. Bayesian methods^{11,22,23,24,25,26} have also been used for parameter estimation for certain classes of systems. In general, when they apply, these methods give more precise parameter estimates than the WLS method. However, these newer approaches require as input a full stochastic model of the process, and we refer to this type of approach as *model matching* methods. By a full stochastic model we here refer to a model from which (in principle) any probability or average of a measured observable can be calculated. A simple example is BM, where the time-evolution is described by a Langevin equation with a noise term for which the statistics is fully specified. In contrast, the WLS and CCM methods are parametric *function fitting*^{27} type methods, which can be used even if a full stochastic model is not available to describe the data at hand. An example from single-particle tracking, where function fitting is useful, is if one wants to determine a power-law exponent for the scaling of the mean-square displacement with time. In this situation, a function fitting procedure such as WLS can be used, without making any assumption about the underlying dynamics. Also, even if a full stochastic model is indeed available, it might be impractical to carry out a full model matching procedure.

In this article, we derive a mathematically rigorous expression for the variance and covariance of estimated parameters in WLS fitting. Our new error estimation formula for fitted WLS parameters takes into account the temporal correlations, which are intrinsic to ensemble averages based on trajectories. To avoid confusion we term the “standard” WLS method^{9,10,11} (i.e., weighted least squares neglecting correlation) as WLS-ECE (Weighted Least Squares Excluding Correlation in Error estimation), whereas our new approach is referred to as WLS-ICE (Weighted Least Squares Including Correlation in Error estimation). In figures and discussion where we only consider parameter values and not the associated errors, we only use the term WLS. In contrast to the previous two methods (WLS-ECE and CCM), our new method has the desirable unique features of providing both (1) robust parameter estimates in the full phase space (*N*, *M*) with mean parameter values in agreement with theory for our prototype systems; (2) error estimates that reproduce the observed spreads in our fitted parameters.

As prototype models we use BM, damped harmonic oscillation (DHO) in a heat bath, FBM and continuous time random walks (CTRW). These have been identified as important model systems in a wide range of systems. BM is of interest to many fields of science^{28,29,30}. Variants of DHO appear in physics, engineering and chemistry^{31}. FBM has been applied, for instance, to protein dynamics^{6}, in financial modeling^{32}, for analyzing climate time series^{33}, to describe tracer particle diffusion^{7,34} and for modeling earth quake phenomena^{35}. Recent applications of CTRW^{28,36} include modeling of human travel patterns^{2} and of molecular motions in cells and cell membrane^{34,37}. However, we point out that our model systems are merely convenient examples for illustrating our WLS-ICE function fitting procedure, which can be applied to arbitrary fit functions. Our four model systems provide ideal test beds for our method, because the functions to be fitted, the mean position and MSD, are known analytically for these systems. Moreover, trajectories are fast to generate for these systems, which, which facilitates stringent testing of the fitting methods based on a relatively large number of trajectories.

We finally point out two restrictions on the scope of our study: First, we do not concern ourselves with the model selection problem^{11,38}, i.e., how to choose the “best” model or “best” form for the fit function. Second, in single particle tracking (one of the application fields of our results), it is common to separate between time-averaged observables (such as the time-averaged MSD) and ensemble averaged observables^{39,40}. In certain cases, these averages are described by the same functional form, but this is not always so^{40}. In this study our sole focus is on ensemble averaged observables.

## Methods

In what follows, we provide a ready-to-use method, which is further motivated and detailed in Section A in Supplementary Information.

### The WLS-ICE procedure

In experiments or simulations one records a set of trajectories, here denoted by *m*. The task at hand is to fit some functional form *f*(*t*_{
i
};** θ**) =

*f*

_{ i }(

**), with**

*θ**K*free fitting parameters

**=**

*θ**θ*

_{1}, …,

*θ*

_{ K }to some ensemble averaged observable $\overline{y}({t}_{i})={\overline{y}}_{i}$ over the trajectories, i.e., to a sample mean of the form

where the index *i* is over the *N* sampling times ** T** =

*T*

_{1}, …,

*T*

_{ N }(with $N\ge K$). Herein, we use bold symbols to denote vectors or matrices. For BM, FBM and CTRW (see Results), which are all zero mean processes, the observable used is the squared displacements, i.e., ${y}_{i}^{(m)}=|{\mathit{x}}^{(m)}({T}_{i})-{\mathit{x}}^{(m)}(0){|}^{2}$, where

*x*^{(m)}(

*t*) is the position (a vector with

*d*components, where

*d*is the number of spatial dimensions) at process time

*t*for trajectory

*m*, and the start time for the simulation/experiment is

*t*= 0. For DHO, our non-zero-mean prototype process, we instead use the position directly as relevant observable, ${y}_{i}^{(m)}={x}^{(m)}({T}_{i})$. It is important to point out, however, that in the fitting procedure the quantity ${y}_{i}^{(m)}$ can be any observable for trajectory

*m*at sampling time

*T*

_{ i }. We shall consistently use a ‘bar’ to denote a sample estimator (we only make use of sample means and sample covariances). The challenge in function fitting procedures

^{10}is to fit some function

*f*

_{ i }(

**) to the data ${\overline{y}}_{i}$ and thereby extract the model parameters,**

*θ***. This problem has previously been tackled using the WLS-ECE or CCM methods (reviewed in Section B in Supplementary Information).**

*θ*Our approach, the WLS-ICE method, extends the WLS-ECE procedure with a correct error estimation formula which takes correlations in fluctuations around ensemble averages into account (see Introduction). For completeness and ease of application, we here provide the full details of the proposed WLS-ICE fitting procedure. We start by introducing a cost function, χ^{2}, based on the the difference between the sample average and the fit function ${\mathrm{\Lambda}}_{i}={f}_{i}(\mathit{\theta})-{\overline{y}}_{i}$ for all time points, according to

where ** R** is a symmetric positive definite matrix. This cost function is to be minimized with respect to

**in order to determine the best parameter values, ${\stackrel{\u02c6}{\theta}}_{a}$ (**

*θ**a*= 1, …,

*K*)

^{41}. We use a ‘hat’ to denote parameters which have been estimated through minimization of the χ

^{2}cost function above and for the estimated (co)variance of such parameters. In the WLS method one uses weights ${R}_{ij}={\overline{R}}_{ij}={\delta}_{i,j}/{\overline{C}}_{ij}$, where

*δ*

_{i,j}is the Kronecker delta, and the (unbiased) sample “covariance matrix of the mean” is defined as ${\overline{C}}_{ij}={\overline{Q}}_{ij}/M$, with $\overline{\mathit{Q}}$ being the sample covariance matrix

While this specific choice of ** R** is used in our applications, we note that the results in this section, including the new error formula below, is valid for arbitrary choices of

**. In Section A in Supplementary Information we elaborate on one “non-conventional” choice of**

*R***particularly adapted for BM.**

*R*The parameters, ${\stackrel{\u02c6}{\theta}}_{a}$, obtained by minimizing *χ*^{2} in Equation (2), have a (co)variance ${\mathrm{\Delta}}_{ab}=\u3008({\stackrel{\u02c6}{\theta}}_{a}-{\theta}_{a}^{\u204e})({\stackrel{\u02c6}{\theta}}_{b}-{\theta}_{b}^{\u204e})\u3009$, where $\u3008\dots \u3009$ denotes ensemble average. Throughout this study we use a ‘star’ to denote exact parameter values, i.e., estimated values as *M* → ∞. The variances of the fitted parameter are ${\sigma}_{a}^{2}={\mathrm{\Delta}}_{aa}$. As noted in the Introduction, this covariance depends on the temporal correlations. For a stationary process, it is well-known how to estimate the variance of a mean in the presence of temporal correlations, typically by expressing the variance in terms of the sum or integral of the auto-correlation function^{42,43}. In the present context, such an estimation corresponds to fitting to a constant, *f*_{
i
}(*t*) = *θ*_{1}, and assuming all correlation functions only depend on time differences.

We here extend the above-mentioned results to non-stationary processes and arbitrary fit functions by deriving the analogous expression for ${\stackrel{\u02c6}{\mathrm{\Delta}}}_{ab}$ by using the full multivariate probability density for the fluctuations around mean values. Briefly, the covariance for the estimated parameters is defined ${\stackrel{\u02c6}{\mathrm{\Delta}}}_{ab}=\u3008({\stackrel{\u02c6}{\theta}}_{a}-{\theta}_{a}^{\u204e})({\stackrel{\u02c6}{\theta}}_{b}-{\theta}_{b}^{\u204e})\u3009$ where $\u3008F(\overline{\mathit{y}})\u3009=\int F(\overline{\mathit{y}})\rho (\overline{\mathit{y}};{\mathit{\theta}}^{\u204e})d{\overline{y}}_{1}d{\overline{y}}_{2}\phantom{\rule{.25em}{0ex}}\cdots \phantom{\rule{.25em}{0ex}}d{\overline{y}}_{N}$ denotes an average over the multivariate probability density, $\rho \phantom{\rule{.25em}{0ex}}(\overline{\mathit{y}};{\mathit{\theta}}^{\u204e})$. We note that the dependence of the estimated parameters $\stackrel{\mathit{\u02c6}}{\mathit{\theta}}$ on $\overline{\mathit{y}}$ is implicitly determined by the minimization condition ∂*χ*^{2}/∂*θ*_{
a
} = 0. Now, because all ${\overline{y}}_{i}$ are averages over *M* identically distributed random numbers, for large *M*, it immediately follows from the multivariate central limit theorem that the function *ρ* takes the Gaussian form: $\rho (\overline{\mathit{y}};{\mathit{\theta}}^{\u204e})={Z}^{-1}\mathrm{exp}(\phantom{\rule{-.25em}{0ex}}-\phantom{\rule{-.25em}{0ex}}{(\overline{\mathit{y}}-{\mathit{y}}^{\u204e})}^{T}{\mathit{C}}^{\u204e-1}(\overline{\mathit{y}}-{\mathit{y}}^{\u204e})/2)$ with normalization constant $Z={(2\pi )}^{N\mathrm{/2}}\sqrt{\mathrm{det}({\mathit{C}}^{\u204e})}$^{44}. Two complications that occur in evaluating ${\stackrel{\u02c6}{\mathrm{\Delta}}}_{ab}$ in closed-form are that the $\overline{\mathit{y}}$-dependence of $\stackrel{\mathit{\u02c6}}{\mathit{\theta}}$ is implicit, and, in general, non-linear. Both of these challenges are solved by making a Taylor series expansion of ${\stackrel{\u02c6}{\theta}}_{a}-{\theta}_{a}^{\u204e}$ in terms of $\overline{\mathit{y}}-{\mathit{y}}^{\u204e}$ and implicitly using the minimization condition. The full derivation is given in Section A in Supplementary Information. The final result is the following estimator:

and

where the indices *a*,*b* = 1, …, *K*. Equation (4) gives a mathematically rigorous expression (to lowest order in 1/*M*) for the covariance of the estimated parameters, and is our key result. It allows us to accurately estimate the covariance of any parameter fitted by minimizing the cost function in Equation (2). Notice that the correlations in fluctuations around mean values enter through the quantity $\overline{\mathit{Q}}$, which is estimated using the usual sample estimate above. In practice, our general formula, Equation (4) is simple to implement and computationally fast.

The new error estimation formula, Equation (4), reduces to previously known results in specific limits. (i) Neglecting the off-diagonal elements of $\overline{\mathit{Q}}$ above we recover the WLS-ECE error estimation formula^{9}. (ii) By setting $\overline{\mathit{R}}={\overline{\mathit{C}}}^{-1}$ above we recover the covariance estimation formula for CCM^{10,12}. (iii) For a stationary process one seeks to fit a constant, *f*_{
i
}(*θ*_{1}) = *θ*_{1}, to data. For such a case, the minimization procedure (solving ∂*χ*^{2}/∂*θ*_{1} = 0 with *R*_{
ij
} = (1/*σ*^{2})*δ*_{i,j}, where *σ* is the time-independent variance) yields ${\stackrel{\u02c6}{\theta}}_{1}=(1/N){\sum}_{i}{\overline{y}}_{i}$, i.e., the parameter estimate is the mean of the data. The error estimation Equation (4), then reduces to the usual result^{42,43} $\stackrel{\u02c6}{\mathrm{\Delta}}=(1/M){\sum}_{i,j}{\overline{Q}}_{ij}/{N}^{2}$ used, for instance, in analyzing Monte Carlo and molecular dynamics simulations. (iv) For linear fit functions, *f*_{
i
}(** θ**) =

*θ*

_{1}

*t*

_{ i }, Equation (4) reduces to previously known expressions (equation 5.253 in van den Bos

^{10}).

### Validation procedure

We tested the different fitting procedures on simulation data for our four prototype systems (generated as described in Section D in Supplementary Information). Estimated parameters, ${\stackrel{\u02c6}{\theta}}_{a}$, were compared to their known exact values ${\theta}_{a}^{\ast}$ (see Section C in Supplementary Information). For BM, the MSD behaves as 〈[** x**(

*t*) −

**(0)]**

*x*^{2}〉 =

*f*(

*θ*,

*t*) =

*θ*

_{1}

*t*. The corresponding expression for FBM and CTRW is $\u3008{[\mathit{x}(t)-\mathit{x}(0)]}^{2}\u3009=f(\mathit{\theta},t)={\theta}_{1}{t}^{{\theta}_{2}}$. For DHO (at critical damping and with the initial conditions

*x*(0) =

*x*

_{0}and

*v*(0) = 0), the mean position has the form 〈

*x*(

*t*)〉 =

*f*(

*θ*,

*t*) =

*x*

_{0}(1 +

*θ*

_{1}

*t*)exp(−

*θ*

_{1}

*t*).

For validating the WLS-ICE estimator for Δ_{
ab
}, we generated *S* simulation sets (with *S* = 500) each consisting of *M* trajectories. Using these *S* × *M* trajectories, we obtained *S* number of parameter estimates ${\stackrel{\u02c6}{\theta}}_{a}$. From these *S* estimates we calculate the covariance Δ_{
ab
} (using sample estimators), which then serves as true Δ_{
ab
} (“ground truth”). This true Δ_{
ab
} is then compared to estimates based on the WLS-ICE error formula, Equation (4) (which requires only one set of simulations), and the corresponding error estimates for WLS and CCM.

### Code availability

Computer codes (Python, Octave/MATLAB, and Lisp) which performs the associated fitting (determining ${\stackrel{\u02c6}{\theta}}_{a}$) and error estimation (calculating ${\stackrel{\u02c6}{\mathrm{\Delta}}}_{ab}$), using a set of measured observables for different trajectories and at different times as input, is freely available under the gnu General Public License (gpl)^{45} at http://cbbp.thep.lu.se/activities/wlsice/.

## Results

Our first test of the fitting methods involve comparing histograms of fitted parameters for our four prototype systems (the number of trajectories, *M*, and number of sampling times, *N*, were kept fixed). For both CCM and WLS the *S* fitted values of a given parameter were binned to a histogram, see Fig. 1, and compared to a Gaussian centered on the mean of the estimated parameters with a variance from the average of the error estimates, using either the WLS-ECE or WLS-ICE method. For WLS, the histogram of fitted parameters is centered close to the true value (see also Figure S3 in Supplementary Information). However, only the WLS-ICE method gives a correct error estimation, Equation (4), as the predicted width of the WLS-ECE method, see Section B in Supplementary Information, is much too narrow. Clearly, the new error estimation of the WLS-ICE method performs extremely well. By contrast, the WLS-ECE method does not provide correct errors of the estimated parameters; this result extends beyond the chosen parameters for (*N*, *M*) in Fig. 1, and holds true under rather general conditions, see Fig. 2 (the exception is the prefactor for CTRW for very small *M*). Notice that while the parameters from the WLS-ICE and WLS-ECE methods are centered on the analytical prediction, this is not true for parameters from the CCM method, which show a strong bias (Fig. 1) for BM, FBM and CTRW (not for DHO). Thus, the WLS-ICE is the only method which yields an acceptable bias and correct error estimation for all model systems. Note that for the ensemble size used in Figure S2 in Supplementary Information, the distribution of fitted parameters is well described by a Gaussian, see Section F in Supplementary Information for a discussion on this topic. For a smaller ensemble size there are deviation from a Gaussian distribution, see Figure S2 in Supplementary Information, in particular for the prefactor for CTRW. From Fig. 2 we notice that the variance in the estimated parameter does not approach zero as *N* → ∞. Hence, the only way to decrease the variance in estimated parameters further is to increase *M* (the WLS estimator is consistent with respect to *M*).

As we have seen (Fig. 1), the CCM method gives a pronounced bias in the parameter estimate for a specific choice of the number of sampling times *N* and trajectories *M* for BM, FBM and CTRW systems, but not for DHO. In order to understand the generality of these findings, we numerically quantified the bias for an extended range of (*N*, *M*) values, and find that the pronounced bias for BM, FBM and CTRW (and lack of bias for DHO) is rather general, see Figure S3 in Supplementary Information. In Section E in Supplementary Information we investigate the expected bias for the CCM method further by analytical means. Indeed, we find that the parameter estimate from CCM fitting is unbiased for DHO. Mathematically, this result follows from the fact that the observable (mean position) used for the fitting is a linear function of the noise (this is in contrast to BM, FBM and CTRW, where the squared displacements are used as relevant observables). For BM, our analytical calculation in Section E in Supplementary Information shows that for large *N* the bias for CCM fitting becomes $\u3008\stackrel{\u02c6}{\theta}\u3009={\theta}^{\u204e}+DG\left(N\right)/M$, where *G*(*N*) ≈ −8*N*/(ln*N* + *γ* + 2ln2) and *γ* ≈ 0.5772 is the Euler-Mascheroni constant. Thus, with increasing number of sampling points *N*, the bias increases as *N*/ln*N* (see Figure S3 in Supplementary Information). The bias for CCM appears also in the FBM and CTRW systems, as seen in Figs 1 and S3 in Supplementary Information. A similar calculation for the WLS parameter estimate, see Section E in Supplementary Information, yields only a minor, essentially *N*-independent, bias with *G*(*N*) = −4(1 − 1/*N*) for BM.

In order to further investigate practical implications of the pronounced bias for CCM fitting, as well as other known issues with the CCM method^{14,15}, we quantified in what parts of phase space (*N*, *M*) the CCM fitting and WLS-ICE provides “acceptable” (see below) parameter estimation, see Fig. 3. First, we find that for large *N* and moderate to small *M*, the sample estimate for the covariance matrix ** C** is ill-conditioned (the condition number is larger than the machine precision). In practice this means that it cannot be numerically inverted, as required in the CCM parameter estimation procedure, without uncontrollable numerical errors. Second, for parts of phase space where ill-conditioness is not an issue, we, rather generously, defined an acceptable fit as one where the bias is smaller than 10% (compared to the analytic value, ${\theta}_{a}^{\u204e}$). We find that for BM, FBM and CTRW there is indeed a thin region of the (

*N*,

*M*)-phase space (large

*M*and small

*N*) where CCM works. For DHO, the bias effect is negligible, as previously noted. However, the ill-conditioness issue is as pronounced for DHO as for BM, FBM and CTRW. In contrast, for WLS ill-conditioness is not a problem (no matrix inversion is required in this procedure), and the bias in the parameter estimation is acceptable for most parts of the phase space. The bias inherent in the CCM method (for observables which are not linear functions of the noise (MSD for BM, FBM and CTRW)) can be reduced by applying the common jackknife procedure

^{46}, which removes bias terms proportional to 1/

*M*, see Section G in Supplementary Information. By applying the (first-order) jackknife procedures to BM, FBM and CTRW (Fig. 3), we find that the bias is reduced which expands somewhat the region of the phase space where the CCM method may be used reliably. Note that the computational time is a factor

*g*(i.e., the number of groups into which the trajectories are pooled) larger for the first-order jackknife procedure compared to the non-jackknife case. Finally, the jackknifing procedure can be extended to remove higher order bias terms (proportional to $\mathrm{1/}{M}^{n}$, with

*n*= 2,3,…)

^{46}. However, for the present case there is no guarantee that these higher order terms have this functional form with respect to

*M*, see Section E in Supplementary Information. Also, our results show that the second-order jackknife increased, rather than decreased, the bias in the parameter estimations for most parts of the phase spaces (Fig. 3). For BM, Figure S4 in Supplementary Information indicates that the reason for this is that the third order term (term proportional to 1/

*M*

^{3}) is generally larger in amplitude (but of opposite sign) than the second order one. Higher order bias reduction comes at a computational price, since the number of numerical evaluations required for second order jackknife is

*g*(

*g*+ 1)/2 times that of non-jackknifed parameter estimation. Due to these findings and the lack of a formal functional form for the bias, beyond the 1/

*M*term (see above), we do not recommend applying the jackknife procedure beyond first order. Finally, we point out that the new error estimation formula, Equation (4), remains valid also for jackknifed parameters, see Section G in Supplementary Information.

In Figure S5 in Supplementary Information we investigated the “goodness of fit” for the WLS and CCM procedures using a standard *R*^{2} metric (see Section I in Supplementary Information). Examples of fitted curves are found in Figure S6 in Supplementary Information. A good fit is characterized by *R*^{2} ≈ 1. We find that, in this sense, the new method provides “good” fits. In contrast, the CCM method provides “bad” fits for BM, FBM and CTRW with ${R}^{2}\ll 1$ for large *N*. We point out that for the present type of data, *R*^{2} is only a heuristic goodness-of-fit metric — its distributional properties are not known for general fit functions and correlated data.

When computational times are not a concern, error estimation using bootstrap resampling (or the related jackknife error estimation procedure) are common method (see Section H in Supplementary Information)^{47}. We here find that bootstrap resampling performs as well as WLS-ICE in general for our four models (jackknife error estimation is slightly worse), see Figure S7 in Supplementary Information. Thus, our numerical results indicate that for the type of observables and fit functions used in our model systems, the bootstrap can be used for calculating the variance for parameters estimated through *χ*^{2} minimization. However, we point out that such resampling techniques require us to repeat the *χ*^{2} minimization several (herein, 100) times (the WLS-ICE method requires only one *χ*^{2} minimization). Such minimization can be computationally costly, especially for the case when the number of unknown parameters is large. Moreover, one must bear in mind that the bootstrap method is in general a heuristic method (there are cases when it does not apply^{47}).

As a final alternative to the WLS-ICE method, we now briefly turn to error estimation using subsampling^{43}. Subsampling refers to the method of choosing sampling times sufficiently sparsely in order to make the data points essentially uncorrelated (the “brute force” method in Figure S1 in Supplementary Information is an extreme case of subsampling where only one data point per trajectory is kept). After subsampling, error analysis is performed using standard error analysis for independent data. In order to properly choose *N* within this method, *N* is systematically decreased until the variance saturates to a constant, which is assumed to be the true variance^{43}. Notice for stationary time series, rather than reducing the number of sampling times, one can make full use of the data through the blocking method^{42}. For non-stationary processes the blocking method cannot be used, however. Figure 2 shows how estimated errors from our WLS-ECE and WLS-ICE analyses depend on the number of data points used, *N*. We find that temporal correlations are so strong that the WLS-ECE method underestimates the errors down to very small *N*. Moreover, finding a sufficiently small *N* is difficult, since the error does not in general saturate to a constant level as *N* is reduced. These problems are circumvented by instead using the error estimation from the WLS-ICE method (i.e., using Equation (4) instead of the WLS-ECE equations in Section B in Supplementary Information).

As a final test of our method, we now turn to “real world” data. To that end, we use particle tracking data used in a competition for testing particle tracking software where 14 teams world-wide participated^{48}. We choose to analyze this data set for two reasons. First, it served as standard benchmark data within the particle tracking community. Second, since these movies are based on noisified and pixelated simulations (aiming to mimic actual experimental data), we know the values of the underlying model parameters. We used their Supplementary Videos 1 (medium particle density), 5 (low particle density) and 6 (high particle density). All these movies correspond to BM of vesicles for which the expected MSD for the data sets are 〈[** x**(

*t*) −

**(0)]**

*x*^{2}〉 =

*f*

_{BM}(

*θ*,

*t*) =

*θ*

_{1}

*t*, with

*θ*

_{1}= 2

*dD*= 8. For particle detection in the movies and linking of particle positions into trajectories we used Method 1

^{48}, i.e., the tracking method described by Sbalzarini

*et al*.

^{49}, and implemented as the ImageJ plugin “Particle Tracker” by the MOSAIC group

^{50}. Parameter settings for the plug-in are listed in Section J in Supplementary Information. For each video we extracted trajectories which were subsequently cut into trajectories consisting of 7 discrete process times (there is no memory in BM, so the start time is inessential). Notice that for the higher particle density, fewer sufficiently long trajectories were produced as compared to the low density scenario (values for

*M*are listed in Table 1). We subsequently divided the trajectories for each movie into two data sets each with

*M*trajectories. For the fitting procedures the first process time point,

*t*

_{0}= 0, was discarded (since at

*t*

_{0}the position is precisely known, the variance = 0 and can not be used as a weight in Equation (2)), thus leaving us with

*N*= 6 sampling times. Results for the estimated parameters, ${\stackrel{\u02c6}{\theta}}_{1}$ and associated standard deviation, $\stackrel{\u02c6}{\sigma}$ are found in Table 1. We notice that the CCM method fails at predicting the correct parameter for high and medium particle densities. This finding is simply due to the smaller ensemble size for these cases which, in turn, is a result of the tracking software’s inability to track and link particles in high and medium density settings. Comparing the WLS-ECE and WLS-ICE method, we see that the WLS-ECE underestimates the error by factors ≈ 2 for all movies. While, this underestimation may seem minor it will affect conclusions drawn from particle tracking data (see discussion in Introduction), in particular it is noteworthy that for the WLS-ECE method only 2 out of 6 estimates fall within 2

*σ*(confidence level 95%) of the expected result (=8). In contrast, for the WLS-ICE all six observed parameter estimations for

*θ*

_{1}fall within 2

*σ*of the expected value.

Let us finally briefly discuss how well one is expected to be able to estimate a parameter based on experimental/simulation data. For model matching procedures (see Introduction), the Cramer-Rao bound is useful by providing an expression for the smallest possible variance in the estimated parameter^{10}. For the case of BM, optimal estimators (i.e., estimators which reach the Cramer-Rao bound) based on the measured displacements have been derived for model matching type fitting^{19,20,21}. For function fitting, the question is rather whether an optimal cost function, i.e., an optimal weight matrix ** R**, can be found (see Equation (2)). If the covariance matrix for the process is independent of the inferred parameters (up to a proportionality constant), and for linear fit functions, then the generalized least squares method can be shown to be optimal among unbiased WLS methods

^{51}. Since the generalized least squares method requires as input the inverse of the true covariance matrix, it can be viewed as a hybrid method in between model matching and function fitting. In Figure S8 in Supplementary Information we show results for the generalized least squares for BM (we use the term BMALS – Brownian motion adapted least squares) where we see that, indeed, the variance in estimated parameter value is smaller for BMALS as compared to WLS-ICE, although the difference is not dramatic. Also notice that for

*M*and

*N*values where the CCM “works” (acceptable bias, see Fig. 3) the variance in estimated parameters for CCM and BMALS agree, as it should.

## Discussion, Conclusion and Outlook

A common task in many fields of science is that of fitting a model to the time-evolving mean of some observable. Since fluctuations around observed mean values, calculated based on trajectories, are in general correlated in time, the error estimates provided by a “standard” weighted least squares (WLS-ECE) fit can be more than one order of magnitude too small, see Fig. 2. Further, the correlated chi-square method (CCM), involving numerical inversion of a noisy covariance matrix, often show numerical instabilities (ill-conditioning) or a strong bias in the fitted parameters, see Fig. 3. To overcome these problems, we derived a new error estimation formula, see Equation (4), for weighted least squares fitting, which does not require inversion of a noisy covariance matrix. With this formula at hand, a simple, yet accurate, function fitting procedure, WLS-ICE, can be followed: (A) perform a weighted least squares fit to the data, (B) use the new formula to estimate the errors. We demonstrated on four simulated prototype systems that the WLS-ICE method provides robust results, with a negligible bias in the fitted parameters and accurate error estimates. Our method’s estimated errors are comparable to errors estimated using bootstrap and jack-knife resampling for the four model systems. A strength of our method is that the fitting procedure does not have to be repeated multiple times.

We separated between two types of parameter estimation procedures: model matching where a full stochastic model is matched to the data, and function fitting in which a full stochastic model is not known and one rather seeks to fit a function to the chosen ensemble-averaged observables. The weighted least-squares method is a procedure of function fitting type.

We have in this study not discussed methods for dealing with experimental errors, such as missing data etc. Such errors depend on the experimental setup and typically have to be dealt with in different ways depending on setup. For the single-particle tracking field (one of the application fields of our results), two major sources of experimental errors are: effects due to the finite size of pixels in cameras used to record the trajectory and motional blur effects (in a single time frame, a fluorescent molecule moves while being imaged). Methods for correcting these types of errors are discussed by Savin *et al*.^{52}, Martin *et al*.^{53}, Berglund^{19} and Calderon^{54}.

Parameter estimation through *χ*^{2} minimization is ubiquitous throughout many fields of science, and we hope that our method and publically available software will be found useful in these fields.

## Additional information

**Publisher's note:** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## References

- 1.
Saxton, M. J. Single-particle tracking: connecting the dots.

*Nature Methods***5**, 671–672 (2008). - 2.
Brockmann, D., Hufnagel, L. & Geisel, T. The scaling laws of human travel.

*Nature***439**, 462–465 (2006). - 3.
de Souza, N. Pulling on single molecules.

*Nature Methods***9**, 873–877 (2012). - 4.
Seifert, U. Stochastic thermodynamics, fluctuation theorems and molecular machines.

*Reports on Progress in Physics***75**, 126001 (2012). - 5.
Jarzynski, C. Nonequilibrium equality for free energy differences.

*Physical Review Letters***78**, 2690 (1997). - 6.
Kou, S. & Xie, X. S. Generalized Langevin equation with fractional gaussian noise: subdiffusion within a single protein molecule.

*Physical Review Letters***93**, 180603 (2004). - 7.
Szymanski, J. & Weiss, M. Elucidating the origin of anomalous diffusion in crowded fluids.

*Physical Review Letters***103**, 038102 (2009). - 8.
Rothe, H. J.

*Lattice gauge theories: an introduction, 4th ed*., vol. 74 (World Scientific, 2012). - 9.
Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P.

*Numerical Recipes 3rd Edition: The Art of Scientific Computing*(Cambridge University Press, New York, NY, USA, 2007), 3rd edn. - 10.
Van den Bos, A.

*Parameter estimation for scientists and engineers*(John Wiley & Sons, 2007). - 11.
Sivia, D. & Skilling, J.

*Data analysis: a Bayesian tutorial*(OUP Oxford, 2006). - 12.
Gottlieb, S., Liu, W., Renken, R. L., Sugar, R. L. & Toussaint, D. Hadron masses with two quark flavors.

*Physical Review D***38**, 2245–2265 (1988). - 13.
Michael, C. Fitting correlated data.

*Physical Review D***49**, 2616–2619 (1994). - 14.
Seibert, D. Undesirable effects of covariance matrix techniques for error analysis.

*Physical Review D***49**, 6240–6243 (1994). - 15.
Yoon, B., Jang, Y.-C., Jung, C. & Lee, W. Covariance fitting of highly-correlated data in lattice QCD.

*Journal of the Korean Physical Society***63**, 145–162 (2013). - 16.
Meroz, Y. & Sokolov, I. M. A toolbox for determining subdiffusive mechanisms.

*Physics Reports***573**, 1–29 (2015). - 17.
Höfling, F. & Franosch, T. Anomalous transport in the crowded world of biological cells.

*Reports on Progress in Physics***76**, 046602 (2013). - 18.
Norregaard, K., Metzler, R., Ritter, C. M., Berg-Sørensen, K. & Oddershede, L. B. Manipulation and motion of organelles and single molecules in living cells.

*Chemical Reviews***117**, 4342–4375 (2017). - 19.
Berglund, A. J. Statistics of camera-based single-particle tracking.

*Physical Review E***82**, 011917 (2010). - 20.
Michalet, X. & Berglund, A. J. Optimal diffusion coefficient estimation in single-particle tracking.

*Physical Review E***85**, 061916 (2012). - 21.
Vestergaard, C. L., Blainey, P. C. & Flyvbjerg, H. Optimal estimation of diffusion coefficients from single-particle trajectories.

*Physical Review E***89**, 022726 (2014). - 22.
Persson, F., Lindén, M., Unoson, C. & Elf, J. Extracting intracellular diffusive states and transition rates from single-molecule tracking data.

*Nature Methods***10**, 265–269 (2013). - 23.
Monnier, N.

*et al*. Inferring transient particle transport dynamics in live cells.*Nature Methods***12**, 838–840 (2015). - 24.
El Beheiry, M., Dahan, M. & Masson, J.-B. Inferencemap: mapping of single-molecule dynamics with bayesian inference.

*Nature Methods***12**, 594–595 (2015). - 25.
Robson, A., Burrage, K. & Leake, M. C. Inferring diffusion in single live cells at the single-molecule level.

*Phil. Trans. R. Soc. B***368**, 20120029 (2013). - 26.
Krog, J. & Lomholt, M. A. Bayesian inference with information content model check for Langevin equations.

*Physical Review E***96**, 062106 (2017). - 27.
Gershenfeld, N. A.

*The nature of mathematical modeling*(Cambridge university press, 1999). - 28.
Metzler, R. & Klafter, J. The random walk’s guide to anomalous diffusion: a fractional dynamics approach.

*Physics Reports***339**, 1–77 (2000). - 29.
Pigeon, S., Fogelmark, K., Söderberg, B., Mukhopadhyay, G. & Ambjörnsson, T. Tracer particle diffusion in a system with hardcore interacting particles.

*Journal of Statistical Mechanics: Theory and Experiment***2017**, 123209 (2017). - 30.
Mehrer, H. & Stolwijk, N. A. Heroes and highlights in the history of diffusion.

*Diffusion Fundamentals***11**, 1–32 (2009). - 31.
Bloch, S. C.

*Introduction to Classical and Quantum Harmonic Oscillators*(John Wiley & Sons, 2013). - 32.
Bouchaud, J.-P. & Sornette, D. The black-scholes option pricing problem in mathematical finance: generalization and extensions for a large class of stochastic processes.

*Journal de Physique I***4**, 863–881 (1994). - 33.
Yuan, N., Fu, Z. & Liu, S. Extracting climate memory using fractional integrated statistical model: A new perspective on climate prediction.

*Scientific Reports***4**(2014). - 34.
Barkai, E., Garini, Y. & Metzler, R. Strange kinetics of single molecules in living cells.

*Physics Today***65**, 29 (2012). - 35.
Tsai, C.-C. Slip, stress drop and ground motion of earthquakes: A view from the perspective of fractional Brownian motion.

*Pure and Applied Geophysics***149**, 689–706 (1997). - 36.
Metzler, R. & Klafter, J. The restaurant at the end of the random walk: recent developments in the description of anomalous transport by fractional dynamics.

*Journal of Physics A: Mathematical and General***37**, R161 (2004). - 37.
Weigel, A. V., Simon, B., Tamkun, M. M. & Krapf, D. Ergodic and nonergodic processes coexist in the plasma membrane as observed by single-molecule tracking.

*Proceedings of the National Academy of Sciences***108**, 6438–6443 (2011). - 38.
Machta, B. B., Chachra, R., Transtrum, M. K. & Sethna, J. P. Parameter space compression underlies emergent theories and predictive models.

*Science***342**, 604–607 (2013). - 39.
Kepten, E., Bronshtein, I. & Garini, Y. Improved estimation of anomalous diffusion exponents in single-particle tracking experiments.

*Physical Review E***87**, 052713 (2013). - 40.
Metzler, R., Jeon, J.-H., Cherstvy, A. G. & Barkai, E. Anomalous diffusion models and their properties: non-stationarity, non-ergodicity, and ageing at the centenary of single particle tracking.

*Physical Chemistry Chemical Physics***16**, 24128–24164 (2014). - 41.
Transtrum, M. K., Machta, B. B. & Sethna, J. P. Why are nonlinear fits to data so challenging?

*Physical Review Letters***104**, 060201 (2010). - 42.
Flyvbjerg, H. & Petersen, H. G. Error estimates on averages of correlated data.

*The Journal of Chemical Physics***91**, 461–466 (1989). - 43.
Berg, B. A. & Billoire, A.

*Markov chain Monte Carlo simulations*(Wiley Online Library, 2008). - 44.
Van Kampen, N. G.

*Stochastic processes in physics and chemistry*, vol. 1 (Elsevier, 1992). - 45.
gnu General Public License. http://www.gnu.org/licenses/gpl.html.

- 46.
Miller, R. G. The jackknife — a review.

*Biometrika***61**, 1–15 (1974). - 47.
Efron, B. & Tibshirani, R. J.

*An introduction to the bootstrap*(CRC press, 1994). - 48.
Chenouard, N.

*et al*. Objective comparison of particle tracking methods.*Nature Methods***11**, 281 (2014). - 49.
Sbalzarini, I. F. & Koumoutsakos, P. Feature point tracking and trajectory analysis for video imaging in cell biology.

*Journal of Structural Biology***151**, 182–195 (2005). - 50.
Sbalzarini, I. F. & Koumoutsakos, P. Particletracker http://imagej.net/Particle_Tracker. Version November 2016 (2016).

- 51.
Kariya, T. & Kurata, H.

*Generalized least squares*(John Wiley & Sons, 2004). - 52.
Savin, T. & Doyle, P. S. Static and dynamic errors in particle tracking microrheology.

*Biophysical Journal***88**, 623–638 (2005). - 53.
Martin, D. S., Forstner, M. B. & Käs, J. A. Apparent subdiffusion inherent to single particle tracking.

*Biophysical Journal***83**, 2109–2117 (2002). - 54.
Calderon, C. P. Motion blur filtering: A statistical approach for extracting confinement forces and diffusivity from a single blurred trajectory.

*Physical Review E***93**, 053303 (2016).

## Acknowledgements

We are grateful to Bo Söderberg and Björn Linse for fruitful discussions. T.A. was supported by the Swedish Research Council (grant nos 2009–2924 and 2014–4305). K.F. was supported by the Swedish Research Council (grant no. 2010–5219). M.A.L. acknowledges funding from the Danish council for Independent Research-Natural Sciences (FNU), grant number 4002-00428B.

## Author information

### Affiliations

#### Computational Biology and Biological Physics, Department of Astronomy and Theoretical Physics, Lund University, 223 62, Lund, Sweden

- Karl Fogelmark
- , Anders Irbäck
- & Tobias Ambjörnsson

#### 2Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark

- Michael A. Lomholt

### Authors

### Search for Karl Fogelmark in:

### Search for Michael A. Lomholt in:

### Search for Anders Irbäck in:

### Search for Tobias Ambjörnsson in:

### Contributions

M.A.L. and T.A. conceived the idea of the project. All authors contributed to the conceptual design of the WLS-ICE method. K.F. performed the simulations and wrote the analysis software supervised by T.A. K.F. prepared all figures. T.A. and K.F. wrote the manuscript with help from A.I. and M.A.L. T.A. derived the new error estimation formula (with and without jackknife). M.A.L. derived the bias correction prediction for BM with input from K.F. and T.A. A.I. suggested the use of jackknife for CCM fitting. T.A. coordinated the project.

### Competing Interests

The authors declare no competing interests.

### Corresponding author

Correspondence to Tobias Ambjörnsson.

## Electronic supplementary material

## Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.