Introduction

Resampling methods refer to a variety of procedures in which a Monte Carlo method is used to create a large number of resampled versions of the original data. Well-known examples include bootstrapping1, permutation tests and various sorts of cross-validation. Commonly, resampling methods are used to assess the uncertainty of an estimator, to estimate tail-area probabilities for hypothesis testing, to assess prediction errors or to perform model selection2.

Broadly speaking, all resampling methods follow the same general scheme. First, the original data set is resampled multiple times. Resampling can be done directly on the original data set or through a statistical model that is first fitted to the data. Next, a model is estimated for each resampled data set. Finally, the results of the estimated models are combined to calculate, for example, the uncertainty of an estimator of interest (e.g., confidence intervals and predictive validity obtained using bootstrapping and cross-validation, respectively).

A major reason for the popularity of these resampling methods stems from their flexibility3,4,5,6,7. They allow a researcher to compute uncertainty assessments or hypothesis tests in situations where no analytical solution is available or where some assumptions of the analytical method are not justified (e.g., deviations from normality).

Unfortunately, even in these times of abundant computer power, the main disadvantage of these resampling methods remains the computational burden. This burden is especially pronounced when, for every regenerated data set, an iterative method is needed to maximize a likelihood or to minimize a squared error. If such an optimization is costly, the resampling method may become practically infeasible.

As an illustration of the problem, one may think of the calculation of a bootstrap-based confidence interval (CI) for a variance component in a linear mixed model8. Linear mixed models are commonly used in the life sciences9,10,11,12,13. More specifically, we consider a data set and model described in Bringmann et al.14. A sample of 129 depressed individuals is measured 71 times on average (in total there are about 9115 time points) regarding their momentary sadness. The design is unbalanced because not all individuals have the same number of measurements. A longitudinal linear mixed model (see Verbeke and Molenberghs8) with the previous emotional state as a predictor is fitted to the data. Both intercept and slope are assumed to vary randomly across patients.

Our prime interest for these data is in the variance components (i.e., the estimates of the between-patient variances regarding intercept and slope). Current analytical procedures for computing a CI for such a variance component are not accurate enough and therefore a bootstrap-based CI is pursued as an alternative15,16,17. If we aim at 2000 bootstrap replications, then repeatedly fitting the linear mixed model would take more than four hours using one core of an Intel Core i7-3770 (3.4 GHz) computer. Such computation times may prohibit the routine use of the bootstrap by applied researchers.

Together with the rise in popularity of resampling methods, there have been continued efforts to alleviate their computational burden. For example, Efron proposed to decrease the number of bootstrap samples in favorable cases18, improving the post-processing of the bootstrap estimates and thereby increasing the accuracy obtained for the same number of bootstrap estimates. However, most efforts concentrate on decreasing the computational burden of the model fit on a single resampled data set by, for example, reducing the number of iterations in the optimization method19,20,21. Another way to speed up the resampling process is to split the estimating function into smaller, simpler parts; bootstrapping these parts can lead to faster and more accurate bootstrapping22. More recent improvements focus on tailor-made solutions for specific problems23,24,25. Unfortunately, none of these methods is generic and easily transferable to other situations.

In this paper, we will propose a generic procedure to decrease the computational burden. As a result, resampling methods can be used for problems that are very time-consuming because an iterative optimization is needed. The key idea of our method is that for all resampled data, a similar task has to be performed because the same model has to be fitted to very similar data. Consequently, when this task is performed repeatedly, we may learn from the past and use this knowledge for future problems. This core idea is very simple and can be applied to any resampling technique.

More specifically, the optimization algorithm needs an initial (or starting) value for each resampled data set. By improving the quality of this initial value, the optimization can be sped up considerably: the closer the initial value is to the real optimum, the fewer iterations the algorithm needs to converge. The method we propose uses information from previous fits to considerably improve the starting points for subsequent problems. In addition, if the initial value is accurate enough, the optimization process may even be omitted.

Results

An optimal choice of the starting values

Two simple examples (involving the exponential distribution and a simple linear regression) will be used to motivate and introduce our method for optimally choosing the starting values of numerical optimization algorithms in a resampling context. Both examples are artificial in two respects. First, we will make use of the bootstrap in both situations, although the simple and well-understood properties of these models make the use of the bootstrap unnecessary. Second, our proposed method only provides an advantage if the estimation process makes use of a numerical optimization algorithm, whereas for both examples closed-form estimators are available. We will nevertheless use an iterative optimization algorithm and act as if the analytical solution were not known. An advantage of working with such simple models with available closed-form expressions is that some results can be checked analytically.

In a first example, we will bootstrap an exponential distribution. The loglikelihood of such an exponential distribution, with data set $X = (x_1, \ldots, x_n)$ and mean parameter $\theta$, is given by

$$\ell(\theta; X) = -n \log \theta - \frac{1}{\theta} \sum_{i=1}^{n} x_i. \qquad (1)$$

For a positive $\theta$, the maximum of this equation is found by setting its first derivative to zero:

$$\frac{\partial \ell(\theta; X)}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{\theta^2} \sum_{i=1}^{n} x_i = 0, \qquad (2)$$

and solving for $\theta$:

$$\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}. \qquad (3)$$

The resulting optimum is the sample average, $\bar{x}$. The analytical solution will only be used to verify our findings and to provide better insight into the workings of our method. Let us now simulate a data set, denoted as the original data $X_0$, with $\theta = 2$ and $n = 100$. For the bootstrapping process, the data set is non-parametrically resampled $B = 200$ times, yielding resampled data sets $X_1, \ldots, X_B$. The loglikelihoods of the original data set and the 200 resampled data sets are shown in Fig. 1a.
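To make this setup concrete, the following Python sketch (our illustration; the paper's own implementation is in Matlab) simulates the original data, draws the B = 200 non-parametric resamples and finds every optimum with the iterative Nelder-Mead algorithm, even though the analytical solution is simply the sample average:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, B, theta_true = 100, 200, 2.0
X0 = rng.exponential(scale=theta_true, size=n)          # original data set X_0

def negloglik(theta, x):
    """Negative loglikelihood of an exponential distribution with mean theta (cf. Eq. 1)."""
    if theta[0] <= 0:
        return np.inf
    return x.size * np.log(theta[0]) + x.sum() / theta[0]

# optimum of the original data set (analytically: the sample average)
theta0_hat = minimize(negloglik, x0=[1.0], args=(X0,), method="Nelder-Mead").x[0]

# non-parametric bootstrap: resample X_0 with replacement B times and refit
optima = [minimize(negloglik, x0=[1.0], args=(rng.choice(X0, size=n, replace=True),),
                   method="Nelder-Mead").x[0] for _ in range(B)]

print(theta0_hat, X0.mean())                             # both are (close to) the sample average
```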

Figure 1

Choosing optimal starting values for an exponential model.

In (a) the loglikelihood functions of the exponential distribution for the original data set and 200 resampled data sets are shown. The optima ($\hat{\theta}_b$) of all these loglikelihood functions need to be found using an iterative optimization process. Each optimization needs a starting value $\tilde{\theta}_b$. In (b–e), the horizontal axis shows the resampled data set number $b$ and the vertical axis the initial errors on the starting values, $\tilde{\epsilon}_b = \tilde{\theta}_b - \hat{\theta}_b$. In (b), option 1 is used to choose this starting value (a common naive value). All initial errors are negative. (c) shows that this bias of the starting values can be avoided by using option 2 ($\tilde{\theta}_b = \hat{\theta}_0$). The original optimum, $\hat{\theta}_0$, is more central compared to the other optima. In (d) it is shown that this error decreases further when the nearest neighbor variant of option 3 is used. In (e), where the interpolation variant of option 3 is used, the error almost vanishes. Using first order derivatives as fingerprint, the optima are easy to predict. In (f) it is shown that there is a linear relation between fingerprints $F_1(X_1)$ to $F_1(X_B)$ and optima $\hat{\theta}_1$ to $\hat{\theta}_B$.

For every resampled data set $X_b$, a loglikelihood needs to be optimized to find its maximum, $\hat{\theta}_b$. In an iterative estimation procedure, this means that for every resampled data set $X_b$, we also need a starting value, $\tilde{\theta}_b$. (In the remainder of the paper, we will denote starting values by a tilde.) The closer the starting value is to the optimum, the faster the optimization algorithm will converge to the optimum. The difference between the starting value and the optimum for a bootstrap sample $b$ will be called the initial error and denoted as $\tilde{\epsilon}_b = \tilde{\theta}_b - \hat{\theta}_b$.

Let us now consider in turn three options we have when choosing these starting values for the bootstrap samples.

Option 1: A naive starting value

In an extreme situation, the researcher barely knows anything about the problem at hand and chooses a common naive starting value (the same value for every resampled data set $b$). The initial error corresponding with this naive starting value, $\tilde{\epsilon}_b$, is shown in Fig. 1b for all resampled data sets. In this case, the starting value is clearly an underestimation of the optimum and the error is always negative.

Option 2: The original optimum as a starting point

An easy improvement can be made at this point by taking the original optimum as a starting value:

$$\tilde{\theta}_b = \hat{\theta}_0. \qquad (4)$$

The original optimum, $\hat{\theta}_0$, is the optimum that belongs to the fit of the original data set $X_0$ and is typically computed before the resampling process anyway. All the subsequent regenerated data sets, $X_b$, are related to this original data set. The original data set can be considered central. As a consequence, it is likely that $\hat{\theta}_0$ is also central to the optima of the resampled data sets, as is the case in Fig. 1a. This leads to a much less biased starting value (Fig. 1c) and a faster optimization process (Fig. 2a).
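The effect of the starting value can be isolated by counting the function evaluations the optimizer reports for a naive start versus the original optimum; a minimal sketch, reusing the exponential setup above (the naive value of 1 is an illustrative choice of ours):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, B = 100, 200
X0 = rng.exponential(scale=2.0, size=n)

def negloglik(theta, x):
    return np.inf if theta[0] <= 0 else x.size * np.log(theta[0]) + x.sum() / theta[0]

theta0_hat = X0.mean()                      # original optimum (Eq. 3)
naive_start = 1.0                           # option 1: a common naive value (illustrative)

nfev_naive, nfev_orig = [], []
for _ in range(B):
    Xb = rng.choice(X0, size=n, replace=True)
    nfev_naive.append(minimize(negloglik, [naive_start], args=(Xb,), method="Nelder-Mead").nfev)
    nfev_orig.append(minimize(negloglik, [theta0_hat], args=(Xb,), method="Nelder-Mead").nfev)

print(np.mean(nfev_naive), np.mean(nfev_orig))   # option 2 typically needs fewer evaluations
```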

Figure 2

Number of function evaluations per bootstrap for the two simple problems.

For both simple problems, the exponential distribution and the linear regression, 99 synthetic data sets are simulated. For each simulated data set, $B$ bootstrap data sets are generated. For each bootstrap sample, the optimal parameter value is computed with the iterative Nelder-Mead algorithm. The horizontal axis shows the resampled data set number $b$ and the vertical axis the number of function evaluations needed by the Nelder-Mead algorithm, averaged over the 99 synthetic data sets. Option 1 (naive starting value) and option 2 (original optimum as starting value) use the same starting value for every resampled data set; the optimization speed does not change. Both panels also show two interpolation variants of option 3: once with first order derivatives (the $v_1$ variant) and once with first and second order derivatives (the $v_2$ variant). In (a) the results for the exponential distribution are shown. The linear relation between the first fingerprint and the optimum (Eq. 9) is easily learned, instantly leading to fast optimizations in variants $v_1$ and $v_2$. In (b) the results for the linear regression are shown. Only when the second derivative is also included in the fingerprint (variant $v_2$) does the mean number of function evaluations decrease considerably, relative to options 1 and 2. The decrease in the number of function evaluations is not as fast as for the exponential distribution. The reason is that the relation is nonlinear and more difficult to learn. As more and more resampled data sets are processed, the predictions improve and the number of function evaluations needed per optimization decreases.

Option 3: Learn from the previously resampled data sets

The starting value can be improved upon further. The same function (Eq. 1), based on very similar data sets (all resampled from $X_0$), needs to be optimized over and over again. As a result, all the loglikelihoods look very similar, as shown in Fig. 1a.

The core of our method lies in the fact that we try to learn from the past. When one needs to find the estimate for resampled data set $X_b$, the optima belonging to all previous data sets, $X_1$ to $X_{b-1}$, have already been found. If the number of previously resampled data sets goes to infinity, the optimum of an identical data set will already have been computed. A better starting value cannot be wished for.

In the more realistic setting of a finite $b$, $X_b$ will not be in the set of $X_1, \ldots, X_{b-1}$. There will however be some data sets that are more similar to $X_b$ than others. One can now search for the most similar data set. Assume $X_{j^*}$ is the most similar data set to $X_b$. The optimum of this similar data set will probably be a good prediction of the optimum belonging to data set $X_b$. We can take this prediction as a starting value, $\tilde{\theta}_b = \hat{\theta}_{j^*}$. One can call this a nearest neighbor prediction:

$$\tilde{\theta}_b = \hat{\theta}_{j^*} \quad \text{with} \quad j^* = \operatorname*{arg\,min}_{j \in \{1, \ldots, b-1\}} d(X_j, X_b), \qquad (5)$$

with $d(X_j, X_b)$ a distance between data sets $X_j$ and $X_b$. As $b$ gets larger, more and more resampled data sets have been fitted, leading to a larger pool of previous data sets to choose from. The chances of finding a very similar data set become larger, leading to better starting values.

Although intuitive to grasp, measuring the distance between raw data sets is often not easy. A better strategy is to first summarize the resampled data sets into properties (or statistics) that carry information about the optima. Such summary statistics will be called fingerprints. A fingerprint is a statistic or function that takes a resampled data set as input. The idea is that these statistics provide direct information on the optimum that we look for. In general, we propose to use derivative-based fingerprints. In this exponential distribution example, the first derivative of the loglikelihood for the resampled data set, evaluated in the original optimum, will suffice. Formally, the fingerprint for the resampled data set $X_b$ is defined as:

$$F_1(X_b) = \frac{\partial \ell(\theta; X_b)}{\partial \theta}\bigg|_{\theta = \hat{\theta}_0} = -\frac{n}{\hat{\theta}_0} + \frac{n \bar{x}_b}{\hat{\theta}_0^2}, \qquad (6)$$

with $\bar{x}_b$ being the sample average of resampled data set $X_b$. Here the index 1 of $F_1$ refers to the fact that the first order derivative is used.

Once the derivative-based fingerprints are computed, we can simply take the squared Euclidean distance between them to create a distance measure between the resampled data sets: $d(X_j, X_b) = \left(F_1(X_j) - F_1(X_b)\right)^2$. Fig. 1d shows the initial errors of the starting values computed using Equations 5 and 6. The errors become, on average, considerably smaller as the number of fitted bootstrap samples increases.
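A minimal sketch of this nearest-neighbor scheme for the exponential example (again in Python, as a stand-in for the authors' Matlab code): the fingerprint of Eq. 6 is computed for every new resample and the optimum of the most similar previously solved resample is reused as the starting value.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, B = 100, 200
X0 = rng.exponential(scale=2.0, size=n)
theta0_hat = X0.mean()                                   # original optimum

def negloglik(theta, x):
    return np.inf if theta[0] <= 0 else x.size * np.log(theta[0]) + x.sum() / theta[0]

def fingerprint(x):
    """First derivative of the loglikelihood at the original optimum (Eq. 6)."""
    return -x.size / theta0_hat + x.size * x.mean() / theta0_hat**2

fingerprints, optima = [], []
for b in range(B):
    Xb = rng.choice(X0, size=n, replace=True)
    Fb = fingerprint(Xb)
    if fingerprints:                                     # nearest neighbor among previous resamples (Eq. 5)
        j = int(np.argmin((np.array(fingerprints) - Fb) ** 2))
        start = optima[j]
    else:
        start = theta0_hat                               # nothing learned yet: fall back to option 2
    res = minimize(negloglik, [start], args=(Xb,), method="Nelder-Mead")
    fingerprints.append(Fb)
    optima.append(res.x[0])
```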

A disadvantage of the nearest neighbor approach is that when a new fingerprint $F_1(X_b)$ becomes available, only the nearest neighbors to this new fingerprint determine the newly assigned starting point $\tilde{\theta}_b$. However, we can improve on this practice by using a clever interpolation method. We want to model the relation between the already computed fingerprints $F_1(X_1), \ldots, F_1(X_{b-1})$ and their corresponding optima $\hat{\theta}_1, \ldots, \hat{\theta}_{b-1}$. This relation can then be used to make a prediction and to associate a good starting point with the new fingerprint $F_1(X_b)$.

Because, in general, we do not know much about the relation we want to model, we choose an interpolation method that is capable of capturing nonlinear relations. Examples of such methods are least squares support vector machines (LS-SVM)26,27 or multivariate adaptive regression splines (MARS)28. We will show only the results obtained with LS-SVM; similar results were obtained with MARS.

The goal of the nonlinear interpolator is to learn the relation between fingerprints and optima based on the couples $\left(F_1(X_1), \hat{\theta}_1\right)$ to $\left(F_1(X_{b-1}), \hat{\theta}_{b-1}\right)$. The estimated relation can be denoted as $\hat{g}_{b-1}$ (where the index indicates that the first $b-1$ data sets are used). In a next step, the optimum of resampled data set $X_b$ can be predicted:

$$\hat{\theta}_b^{\,\text{pred}} = \hat{g}_{b-1}\left(F_1(X_b)\right). \qquad (7)$$

The predicted optimum will then serve as the improved starting point for the resampled data set $X_b$:

$$\tilde{\theta}_b = \hat{\theta}_b^{\,\text{pred}}. \qquad (8)$$
Once again, the predictions will become better as the pool of couples of fingerprints and optima to learn the relation from grows larger. The errors of the starting values computed with this variant, using LS-SVM, are shown in Fig. 1e. This variant, using the interpolation method in combination with first order derivatives for the fingerprint, will be called the $v_1$ variant. In the remainder of this paper, we will not discuss the nearest neighbor variant anymore and will directly consider this interpolation method, as it gives much better results.
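The interpolation variant can be sketched along the same lines. Here an RBF kernel ridge regression from scikit-learn is used as a convenient stand-in for the LS-SVM of the paper (a substitution of ours), and the interpolator is refitted every ten resamples:

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(1)
n, B = 100, 200
X0 = rng.exponential(scale=2.0, size=n)
theta0_hat = X0.mean()

def negloglik(theta, x):
    return np.inf if theta[0] <= 0 else x.size * np.log(theta[0]) + x.sum() / theta[0]

def fingerprint(x):                                      # Eq. 6
    return -x.size / theta0_hat + x.size * x.mean() / theta0_hat**2

F, optima, model = [], [], None
for b in range(B):
    Xb = rng.choice(X0, size=n, replace=True)
    Fb = fingerprint(Xb)
    if model is not None:
        start = float(model.predict([[Fb]])[0])          # predicted optimum as starting value (Eqs 7-8)
    else:
        start = theta0_hat                               # option 2 during the initial phase
    optima.append(minimize(negloglik, [start], args=(Xb,), method="Nelder-Mead").x[0])
    F.append(Fb)
    if (b + 1) % 10 == 0:                                # refit the interpolator every 10 resamples
        model = KernelRidge(kernel="rbf", alpha=1e-6, gamma=1e-2).fit(
            np.array(F).reshape(-1, 1), np.array(optima))
```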

In fact, after two bootstrap samples, the predicted optimum actually coincides with the true optimum for the resampled data set. Hence, the interpolation method can, based on the fingerprint, almost exactly predict the optimum. This is not a surprise because in this exceptionally easy toy example, there exists an exact functional relation between the fingerprint and the optimum. This can be seen when one combines Equation 6 and the analytical solution of the maximum likelihood estimation (Eq. 3), using $\hat{\theta}_b = \bar{x}_b$:

$$\hat{\theta}_b = \hat{\theta}_0 + \frac{\hat{\theta}_0^2}{n} F_1(X_b). \qquad (9)$$

Because of the linearity of this relation, a set of only two training points (Fig. 1e) is enough to create a model that delivers almost perfect starting values, instantly leading to fast optimizations (Fig. 2a). The relation between the fingerprints and the optima for all the resampled data sets is shown in Fig. 1f.

The options for choosing a starting value, described in the previous paragraphs, are generic and can easily be used for other estimation problems. In a second example, they are applied to a linear regression without intercept and with known variance. In Figs 3 and 2b similar results are shown for options 1 and 2. However, for option 3, a one dimensional fingerprint, using only the first order derivative of the loglikelihood, is no longer sufficient. The accuracy of the starting values improves if the second order derivative is also used, which is called the $v_2$ variant. The starting values are now predicted using a two dimensional fingerprint:

$$F_2(X_b) = \left( \frac{\partial \ell(\beta; X_b)}{\partial \beta}\bigg|_{\beta = \hat{\beta}_0}, \; \frac{\partial^2 \ell(\beta; X_b)}{\partial \beta^2}\bigg|_{\beta = \hat{\beta}_0} \right). \qquad (10)$$
Figure 3

Choosing optimal starting values for a linear regression.

A linear regression data set with estimated slope (the original optimum) $\hat{\beta}_0$ is non-parametrically resampled $B$ times. In this figure, the possible choices of the starting values for the optimization of the loglikelihood of these resampled data sets are compared in (a–c) and (e). In all those panels the horizontal axis shows the resampled data set number $b$ and the vertical axis the initial errors on the starting values, $\tilde{\epsilon}_b$. In (a), a naive starting value is chosen (option 1). All errors are negative. In (b), the original optimum is chosen as starting value (option 2), which removes the bias. In (c), the results of the interpolation variant of option 3, using a one dimensional fingerprint (first order derivative), are shown (variant $v_1$). The error decreases further, but not to machine precision. The reason is shown in (d): there is no unique relation between fingerprints $F_1(X_1)$ to $F_1(X_B)$ and optima $\hat{\beta}_1$ to $\hat{\beta}_B$. Data sets with the same fingerprint can result in different optima. In (e), the initial error again decreases to zero. Here a two dimensional fingerprint, $F_2(X_b)$, using the first and second order derivatives of the loglikelihood, is used (variant $v_2$). This results in a unique, nonlinear relation between the fingerprints and the optima, as is seen in (f), showing fingerprints $F_2(X_1)$ to $F_2(X_B)$ versus optima $\hat{\beta}_1$ to $\hat{\beta}_B$.

As this new relation is nonlinear (Fig. 3f), it takes more bootstrap samples to be learned, which leads to a more gradual decrease in function evaluations, as shown in Fig. 2b.

Bootstrap of a mixed model

In the previous paragraphs, we studied our method by evaluating the number of function evaluations needed to find the optimum. However, the purpose of a resampling method such as the bootstrap is usually to compute an uncertainty measure (e.g., the standard error of an estimate or a confidence interval). In this result section, we will estimate the lower bound of a 95% confidence interval using the percentile bootstrap confidence interval method29.

We will apply this bootstrap on a linear mixed model, a model commonly used in the life sciences9,10,11,12,13. The bootstrap of a linear mixed model is of interest in a variety of areas14,30,31,32,33,34, as inference for its random effects is not straightforward15,16,17.

In this example we will use real-life data from Bringmann et al.14. As stated in the introduction, 129 depressed patients are measured repeatedly, 10 times a day for 10 days, using the experience sampling method regarding their momentary sadness35. As not all patients respond to every measurement, the data are unbalanced and consist of only 9115 time points. For each patient, sadness at time point $t$ will be predicted by sadness at time point $t-1$. Both the intercept and the slope are assumed to vary across patients. We will use the bootstrap with $B = 2000$ bootstrap iterations to estimate the 2.5% percentile of the elements of the variance matrix of these random effects:

$$D = \begin{pmatrix} \sigma_{D,11} & \sigma_{D,12} \\ \sigma_{D,12} & \sigma_{D,22} \end{pmatrix}.$$
The more bootstrap samples are processed, the higher the accuracy of these percentiles will become. However, more bootstrap iterations generally means more function evaluations to estimate the mixed model parameters. Hence, there is an intrinsic trade-off between speed (number of function evaluations) and accuracy. Next, we can study how the different options and variants of our method deal with this speed-accuracy trade-off. Option 1, taking naive starting values, will not be considered. All speed ups will be measured against option 2, taking the original optimum as starting value. Moreover, new variants will be introduced.

Option 4: Bypass the optimization altogether

In the previous section we showed how better starting values can increase the speed of the optimization process. The end result, the accuracy of the bootstrap, will be exactly the same for all these variations: the different starting values will lead to the same optima (assuming an optimization algorithm able to find the global optimum). However, when the starting values become better and better, they can become practically indistinguishable from the real optima. Such accurate predictions of new optima make the optimization process superfluous for the associated resampled data sets. Skipping the optimization process will lead to an even larger speed increase. Several variants will be considered; they are denoted generically as $v_k^r$. The index $k$ means that partial derivatives of the loglikelihood function up to order $k$ are used for the fingerprint. Finite differences (using an extra number of function evaluations) are used to compute these derivatives. The number $r$ refers to the ratio between the number of resampled data sets for which the optimization is bypassed and the number of resampled data sets for which the optimization algorithm is fully used. For example, if $r$ is 3 and 200 bootstrap estimates are found with an optimization algorithm, then 600 predicted estimates are added to create a total set of 800 estimates. However, the total number of estimates never exceeds 2000 (as we have $B = 2000$ bootstrap iterations). So, whatever the value of $r$, a maximum of 200 predicted estimates will be added once 1800 estimates have already been found with an optimization process. Note that the $v_k^0$ variants are the same as the $v_k$ variants of option 3, where the optimization is never bypassed. All different options and variants are summarized in Table 1.

Table 1 Overview of all the variants.
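To make the bookkeeping behind the ratio $r$ concrete, the following minimal Python sketch (our own illustration, not code from the accompanying package) reproduces the counting rule described above:

```python
def n_predicted(m_optimized: int, r: int, B: int = 2000) -> int:
    """Number of predicted estimates added when m_optimized estimates were
    found with the optimization algorithm, for bypass ratio r.
    The total number of estimates never exceeds B."""
    return min(r * m_optimized, B - m_optimized)

print(n_predicted(200, r=3))    # 600 -> a total set of 800 estimates
print(n_predicted(1800, r=3))   # 200 -> total capped at B = 2000
```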

The resulting speed up is shown in Fig. 4 and Table 2. The accuracy is shown against the number of function evaluations, i.e. the computational burden. Our method can achieve the same accuracy up to 17 times as fast as the original optimum variant. More generally, for all the model parameters over all accuracies, approximately a tenfold speed up can be achieved. Using finite differences, more function evaluations are needed to generate a fingerprint that includes higher order derivatives. To obtain predictions of modest quality, a fingerprint of lower order ($k = 1$) suffices. Therefore, it is faster to use only first order derivatives when only a low accuracy of the percentile is required. When a higher accuracy of this percentile is required, it can be better to use more informative fingerprints, with higher order derivatives, to create better predictions. For the covariance parameter $\sigma_{D,12}$ and one of the variance parameters, a fingerprint that uses only first order derivatives is enough: for a medium as well as a high accuracy, the $v_1^r$ variants outperform the $v_2^r$ variants. However, for the third parameter, $v_2^r$ is a better option when a high accuracy is required. The better predictions then outweigh the extra cost of the larger fingerprint. It is also interesting to note that the $v_k^0$ variant, which never bypasses the optimization, always gives the worst speed up. This means that, for this example, it is always advantageous to bypass the optimization process for at least a part of the resampled data sets.

Table 2 Speed up results for the bootstrap of a mixed model.
Figure 4

Trade off between accuracy and speed for the bootstrap of a mixed model.

Ninety-nine times, a mixed model is bootstrapped to compute the 2.5% percentile of the random effect variance parameters $\sigma_{D,11}$, $\sigma_{D,12}$ and $\sigma_{D,22}$. Each bootstrap has $B = 2000$ bootstrap iterations. On each panel, the error of this percentile averaged over the 99 bootstrap processes, measured as described in the method section, is shown on the vertical axis. The average speed, measured in number of function evaluations, is shown on the horizontal axis. On the left panels ((a), (c) and (e)), the results for the $v_1^r$ variants are shown, using only first order derivatives for the fingerprints. On the right panels ((b), (d) and (f)), derivatives up to order two are used ($v_2^r$ variants). In (a) and (b), the bootstrap is applied to the variance parameter of the random intercept, $\sigma_{D,11}$. In (e) and (f), the bootstrap is applied to the variance parameter of the random slope, $\sigma_{D,22}$. In (c) and (d) the bootstrap is applied to the covariance parameter, $\sigma_{D,12}$, of these random effects. The dotted lines show a high accuracy (approximately the lowest error achieved by the original optimum variant) and a medium accuracy (approximately twice this error), as used in Table 2. More function evaluations always lead to a better accuracy (lower error) of the percentiles. It is shown that the various methods reach the same accuracy much faster compared to the original optimum variant.

Bootstrap of a generalized extreme value model

As a second real-life application we will apply fingerprint bootstrapping to a generalized extreme value (GEV) distribution model36. For the inference of GEV distributions, too, bootstrapping is a well-used tool37,38,39,40. Again we will estimate the lower bound of the 95% confidence interval of the parameters of the GEV distribution.

In De Kooy, The Netherlands, the maximum hourly mean windspeed was measured almost daily between 1906 and 201541. In total 39760 maxima were included. The GEV with one covariate (i.e., date of measurement) will be used to model these maxima. As explained in the method section, the estimation of this model leads to a four dimensional optimization problem, with shape parameter $\xi$, scale parameter $\sigma$, intercept $\mu_0$ and regression weight $\mu_1$.

By counting the number of function evaluations needed to reach a medium or high accuracy of the lower bound of the 95% confidence interval, we calculate the speed up of the fingerprint resampling method as shown in Table 3. The speed up of the $v_1^r$ variants over the original optimum variant is even bigger than with the mixed model. The reason is that for the GEV distribution, using only the first derivatives of the likelihood as fingerprint already leads to sufficiently good prediction accuracies. Therefore, the $v_1^r$ variants can be used for the medium as well as the high accuracy. As the number of function evaluations needed to create the fingerprints of these variants is significantly lower, the speed up is bigger. To achieve a high accuracy in the lower bound of the 95% confidence interval, a speed up of about 25 is achieved for all the variables with the $v_1^r$ variants. With the $v_2^r$ variants, a speed up of about 10 is again reached. For the medium accuracies the $v_1^r$ variants lead to speed ups of around 10. Again, the $v_2^r$ variants do not perform very well when only a medium accuracy is needed: too much time is spent computing the detailed fingerprints, while only a medium accuracy is required.

Table 3 Speed up results for the bootstrap of a GEV distribution.

Effect of parallelization on the speed up

Resampling methods earn part of their popularity from the fact that they can easily be computed in parallel. Note that such parallel computing poses no problem for our speed up. Only if literally every resampled data set is handled simultaneously on a different computational entity will our speed increase vanish. As long as parts of the resampling method stay serial, one can learn from previous information and consequently enjoy a speed increase. This is clearly shown in Fig. 5 for the GEV example. Here we show the speed up factor of the $v_1^r$ variant while reaching a high accuracy. When everything is run on 1 core, we have a speed up of about 25 for every parameter. When more and more cores are used, the speed up decreases. When 2000 cores are used, every core computes one bootstrap. In this case, nothing can be learned from previous bootstrap computations. The speed up factor is then even below one: our method is slower than the normal method, as the fingerprint resampling needs time to compute the fingerprints. If 5 cores are used (a normal desktop computer), the method still enjoys 97% of the maximum speed up. When 20 cores are used, as is possible on a very high-end scientific computer, the method can reach 90% of the maximum speed up. On a supercomputer, using 100 cores, 47% of the speed up is preserved.

Figure 5

Speed up and parallelization.

The speed up of the $v_1^r$ variant for the GEV distribution bootstrap, while reaching a high accuracy of the lower bound of the 95% confidence interval. The speed up, shown on the vertical axis, is recalculated for the case in which the method is run on multiple cores (horizontal axis), as explained in the method section. The speed up decreases for all the variables of the GEV model if more cores are used. If fewer than 20 cores are used, more than 90% of the speed up is preserved.

Discussion

The computational burden of resampling methods is oftentimes very large. Clearly, our fingerprint resampling method can decrease the computational cost enormously. The starting point of the fingerprint resampling is that in a resampling scheme, similar computations are done over and over again for very similar data. Ignoring the previous calculations would be a waste of time. By interpolating between structured previous information, predicting future optima and omitting the optimization process, we succeed in increasing the speed approximately by a factor of 10. We have illustrated our method with two simple examples but also with two real-life applications. The bootstrap of the mixed model can now be finished during a coffee break (20 minutes), while it took half a work day before (more than 4 hours).

In this paper, our method is proposed in such a way that it can be used for a broad range of resampling techniques. It can however easily be adapted to create a custom-made solution, increasing its effectiveness for more specific problems. For example, if a percentile has to be computed with a bootstrap, one can use a more advantageous, but more specific scheme: one can use the fingerprints to predict which resampled data sets fall in the lowest 2α% and then use an optimization algorithm to find the real optima of only these data sets. Using such a scheme, almost no error is introduced while the computational advantage is retained. In another example, a Newton-Raphson optimization may be used. In such a case our method can easily be combined with the k-step method of Davidson et al.21: one can first make a raw prediction of the optimum and then refine this optimum with k optimization steps.

As noted in the result section, resampling methods owe part of their popularity to the ease with which they can be computed in parallel. Parallel computing poses no problem for our speed up: only if literally every resampled data set is handled simultaneously on a different computational entity will the speed increase vanish, and as long as parts of the resampling method stay serial, one can learn from previous information and consequently enjoy a speed increase.

Not all aspects of our fingerprint resampling method are known at this point. We discuss two of them. First, in the simulations of this paper, we treated all problems as local optimization problems. Problems with multiple local minima, however, pose an even greater computational challenge. We will therefore address this setting in future research. Fingerprints may still be able to point in the right direction of the optima, but the relation between the fingerprints and the optima may become very unsmooth if different resampled data sets converge to different kinds of minima.

Second, the final goal of most resampling techniques is to compute an uncertainty measure. It remains unclear, however, how much error is introduced in this uncertainty measure by predicting a certain proportion of the optima (the ratio $r$). Most black-box techniques can provide reliable prediction errors for the optima themselves, but it can be hard to translate those prediction errors into errors in the uncertainty measure. On the other hand, this is not a problem specific to our method. Even when our speed up is not used at all, one still has to assess the accuracy of the optimization for every resampled data set. Moreover, most resampling techniques introduce errors themselves. For example, one can never exactly know the error that is introduced by computing only a limited number of bootstraps29. Future research has to assess how all these errors combine in the final uncertainty measure.

In conclusion, the fingerprint resampling method presented in this paper may fuel a more widespread use of resampling methods for complex models, for which such methods are currently difficult or impossible to apply because of the large computational burden.

Methods

A general description of the fingerprint resampling method

The basic idea of our method was explained in the result section. In this section we develop our method in a more systematic fashion and indicate how it can be used more generally. For simplicity and completeness, we will consider the same four options as in the result section. However, only the third and fourth options are variants of our fingerprint resampling method.

Option 1: A naive starting value

Starting values are very important for optimization algorithms. They largely determine the speed with which the optimization process converges. In practice, when little information is known about the estimation process, naive starting values can be used (such as zero, one, the identity matrix, etc.). Such a naive starting value may be biased in the sense that it is systematically too high or too low compared to the real optima of the resampled data sets.

Option 2: The original optimum as a starting point

Taking the original optimum, $\hat{\theta}_0$, as a starting value may lead to a serious improvement over a naive starting value. As the original data set is central to the resampled data sets, the original optimum is likely to be central compared to the optima of the resampled data sets.

In the context of bootstrapping, the estimated bias of an estimator is calculated as

$$\widehat{\operatorname{bias}} = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}_b - \hat{\theta}_0,$$

with $\hat{\theta}_b$ the optima of the resampled data sets29. Assuming this estimated bias is zero, one has

$$\hat{\theta}_0 = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}_b.$$

This means that the original optimum is exactly the mean of the bootstrap optima. As a consequence, if the estimated bias of the estimator is zero, the bias of $\hat{\theta}_0$ as a predictor for the optima of the resampled data sets is also zero. Therefore, when no further information is known about a problem, this original optimum is usually a reasonably good initial value for the subsequent optimizations and it is easy to implement. Using the original optimum as initial value is the most unsophisticated way of using previous information and it is not unique to our procedure19,20,21. It may already lead to a significant speed increase, as shown in the result section.

Option 3: Learn from the previously resampled data sets

In a next phase, we will try to create better predictions for the optimum of a new resampled data set, making use of the previously resampled data sets and their corresponding optima. The better the prediction, the better the starting value and the fewer iterations the optimization procedure needs. For this prediction, we need two ingredients. The first ingredient is a concise summary of the data sets into statistics, called fingerprints, which can be linked to the optima of the resampled data. As a second ingredient, we need a method that can learn the relation between the fingerprints and the optima of past data sets.

It is clear that, in order for our method to be efficient, both the fingerprints and the method to learn the relation should place a minimal computational burden on the whole resampling calculation. In the following parts, we will discuss both ingredients. To conclude, we discuss the link with the tuning parameters of the optimization algorithm.

Fingerprints

A fingerprint is a scalar-valued or vector-valued function that takes the resampled data set as input and produces one or more statistics as output. (In what follows, we will denote both the function and the function values with the term fingerprint.) To use such a fingerprint for the prediction of the optimum, it must contain information on this optimum in a parsimonious manner. Obviously, there is a trade-off between compression rate and retained information. For instance, a trivial fingerprint would be the identity function, which maps a data set onto itself. Such a fingerprint obviously contains all the necessary information to locate the optimum. However, the relation between the fingerprints and the optima is harder to derive when the fingerprint has a high dimensionality. Therefore, the identity function does not qualify as a good fingerprint because there is no information reduction.

The question then becomes what constitutes a good fingerprint. A derivative-based fingerprint is intuitively attractive. The rationale behind a derivative-based fingerprint is that the information contained in derivatives (rate of change, curvature, etc.) in a point near a local optimum will tell us something about the location of the local optimum itself. In fact, the same kind of information is used in several optimization algorithms (e.g., steepest descent and Newton-Raphson). Therefore, we propose to use the derivatives of the objective function of the resampled data set, evaluated in the original optimum, as a fingerprint. Formally, this can be written as follows in the case of maximum likelihood estimation (for a particular resampled data set $X_b$):

$$F_k(X_b) = \left( \frac{\partial^{j} \ell(\theta; X_b)}{\partial \theta_{i_1} \cdots \partial \theta_{i_j}} \bigg|_{\theta = \hat{\theta}_0} \; : \; j = 1, \ldots, k \right),$$

where $k$ is the order up to which the partial derivatives are used. (In the case of least squares estimation, the loglikelihood $\ell$ can be replaced by a sum of squares loss function.) Of course, only the unique elements of the partial derivatives are used for the fingerprint: for example, if $k$ equals 2, these are the first order derivatives combined with the upper-triangular elements of the Hessian matrix.

The derivatives can be approximated using finite differences if no analytical expressions exist. Obviously, a fingerprint using finite differences gives another incentive to reduce the dimensionality of the fingerprint: the higher the order of the derivative that needs to be estimated, the more function evaluations need to be computed. If $d$ is the dimension of the optimum, one needs on the order of $d$ extra function evaluations if $k = 1$ and on the order of $d^2$ extra function evaluations if $k = 2$ when finite differences are used.
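As an illustration, a forward-difference fingerprint of this kind could be computed as follows (our own sketch; the step size and the difference scheme are assumptions, not those of the package):

```python
import numpy as np

def fd_fingerprint(loglik, theta0, k=2, h=1e-5):
    """Forward-difference fingerprint: all first (and, if k == 2, all unique second)
    partial derivatives of loglik, evaluated at the original optimum theta0."""
    theta0 = np.asarray(theta0, dtype=float)
    d = theta0.size
    e = np.eye(d) * h
    f0 = loglik(theta0)
    f_i = np.array([loglik(theta0 + e[i]) for i in range(d)])
    grad = (f_i - f0) / h                               # first order part of the fingerprint
    if k == 1:
        return grad
    hess_upper = []                                     # unique (upper-triangular) Hessian elements
    for i in range(d):
        for j in range(i, d):
            f_ij = loglik(theta0 + e[i] + e[j])
            hess_upper.append((f_ij - f_i[i] - f_i[j] + f0) / h ** 2)
    return np.concatenate([grad, np.array(hess_upper)])

# toy check on a quadratic loglikelihood with optimum at (1, 1)
ll = lambda t: -np.sum((t - 1.0) ** 2)
print(fd_fingerprint(ll, theta0=[0.5, 0.5], k=2))       # approx. [1, 1, -2, 0, -2]
```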

A more fundamental reason to use derivative-based fingerprints is provided by Taylor's theorem. If a function can be completely approximated by a Taylor series, all its information can be summarized in its derivatives evaluated in a certain point (e.g., the original optimum). Taking an infinite number of (partial) derivatives ($k \to \infty$) will then lead to a fingerprint with all information needed to uniquely define the optimum of that function. However, it is not necessary to describe the function one wants to optimize, nor its associated data set, completely in statistics. A lot of information about the optimum can already be found in a very compact fingerprint. As already mentioned, we propose to take a fingerprint with a dimension as low as possible, for modeling reasons. A limited number of derivatives, for example all first-order derivatives in combination with all the unique elements of the Hessian matrix, will suffice in many cases.

It should be noted that a good fingerprint for one model does not necessarily generalize to a good one for another model. Sometimes, the likelihood cannot be evaluated in the original optimum for some data sets (see, e.g., the example of the generalized extreme value distribution later in the method section), making it impossible to compute the derivatives. Moreover, other ways of summarizing data may work at least as well as the derivative-based fingerprints. For example, computing the moments of a data set can also lead to good results; the sample average in the first example on the exponential distribution would obviously be a good fingerprint. Another possibility is to use summary statistics as they are used in approximate Bayesian computation. It is even possible to automatically generate such summary statistics42. In any case, what constitutes a good fingerprint is very problem specific. A covariance is decisive for the optimum in a regression, while it cannot even be computed for the data set of an exponential distribution. In situations where one has specific prior knowledge about the model, one can use this information to create a better model-specific fingerprint.

Predicting initial values based on fingerprints

Once the fingerprints are computed, the relation between the fingerprints and corresponding optima needs to be modeled. This relation will then be used to predict the optima of subsequent resampled data sets. These predicted optima can then be used as starting points.

The modeling of the relation between fingerprints and optima should satisfy three requirements. First, the modeling should be fast to calculate. The main reason for predicting the optima and using these as starting values is to create a significant speed up of the optimization procedure; this gain would be wasted by a slow modeling procedure. Second, the modeling should be able to describe a nonlinear relation. One cannot expect to find only strictly linear relations between the fingerprints and the optima; already in the second simple example in the result section (on regression), the relation between optima and fingerprints is nonlinear. Third, we prefer to choose a method which excels in prediction. Both support vector machines (SVM43) and multivariate adaptive regression splines (MARS28) are well known for their generalization quality, which leads to good predictions. For the SVM, we propose to use a specific adaptation, the least squares support vector machine (LS-SVM), which leads to much faster modeling compared to the traditional SVM26. Note that these methods create a multi-input single-output model. This means that one needs one model for every dimension of the parameter that needs to be optimized. Because every dimension of the fingerprint initially has the same importance regarding the predictions, they are all standardized before training the black-box model.

The black-box methods are versatile, flexible and fast to compute. However, there are two specific problems that need to be addressed before the methods can be applied efficiently: constraints and extrapolation.

A first problem that needs to be solved is that some estimation problems call for a constrained optimization. If a parameter is constrained, this can lead to a non-smooth fingerprint-optimum relation. As an illustration, consider Fig. 6, which can be compared to Fig. 3d in the result section. In both figures the fingerprints of a linear regression are plotted against the optima. However, in Fig. 6, the optima are found with a constrained optimization (here, $\beta \geq 2$). A situation is shown where the fingerprints below a value of −30 all result in a value of 2 for the parameter $\beta$. It can be seen that if the constraint is active, then for multiple resampled data sets, different fingerprints all lead to the same optimum, $\hat{\beta}_b = 2$. When, at the same time, other resampled data sets lead to an inactive constraint, those resampled data sets follow a different fingerprint-optimum relation. There are now two different relations between the fingerprints and the optima. As a result, the relation between fingerprints and optima is non-smooth. A black-box technique may have a hard time learning such a non-smooth relation. For such cases, we propose the following solution. First, the black-box technique is only estimated with the optima which are not on the boundary. Second, if the black-box technique makes a prediction that violates the constraint, the prediction is adapted to the constraint value.

Figure 6

Fingerprints and optima of a constrained linear regression.

A linear regression data set with estimated slope (the original optimum) $\hat{\beta}_0$ is non-parametrically resampled $B$ times. All the optima are found with a constrained optimization; it is assumed that $\beta \geq 2$. The horizontal axis shows the fingerprints $F_1(X_1)$ to $F_1(X_B)$ and the vertical axis the optima $\hat{\beta}_1$ to $\hat{\beta}_B$.

A second problem is that the black-box methods can only be used to interpolate, so we need to avoid extrapolation. However, because we learn sequentially from the resampled data, it is very likely that we encounter, for some resampled data set, a fingerprint value that lies outside the range of previously modeled data. To circumvent the extrapolation, we propose the following course of action during an initial phase. First, the fingerprints of all resampled data sets are computed. Then, for each dimension of the fingerprint, the minimum and maximum are searched. For the data sets corresponding to one or more of these extrema, the optima are computed first. In total, at most $2m$ data sets are prioritized in this way, with $m$ the dimension of the fingerprint. Second, a fair number of purely random data sets are added and for these data sets the optima are also computed. Postponing the use of the predictions for these initial data sets ensures a stable model over a broad range of fingerprints. To find the optimum, the original optimum is used as a starting value in this initial phase.
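A sketch of how such an initial phase could be selected, assuming the fingerprints of all resampled data sets are already available (the number of random data sets, n_random, is an arbitrary choice of ours):

```python
import numpy as np

def initial_phase_indices(fingerprints, n_random=10, rng=None):
    """Indices of the resampled data sets to optimize first: for every fingerprint
    dimension the data sets with the minimal and maximal value, plus a number of
    randomly chosen data sets."""
    rng = rng or np.random.default_rng(0)
    F = np.asarray(fingerprints)                         # shape (B, m)
    prioritized = {int(i) for i in np.argmin(F, axis=0)} | {int(i) for i in np.argmax(F, axis=0)}
    remaining = [i for i in range(F.shape[0]) if i not in prioritized]
    if remaining:
        extra = rng.choice(remaining, size=min(n_random, len(remaining)), replace=False)
        prioritized |= {int(i) for i in extra}
    return sorted(prioritized)

F = np.random.default_rng(1).normal(size=(200, 2))       # 200 two-dimensional fingerprints
print(initial_phase_indices(F))                          # at most 2*2 extreme + 10 random indices
```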

Other tuning parameters

The speed of most optimization algorithms can be improved and controlled by tuning parameters (e.g., the initial step length in line search algorithms for gradient-based methods; the size of the initial simplex in the Nelder-Mead method). It can also be useful to control these tuning parameters based on previous optimizations. Important in this control problem is that the uncertainty associated with the starting value should not be neglected. If the starting value is of high quality (because the predicted optimum and the true optimum are close together), the optimization algorithm should initially only search for an optimum in a small area around it. Otherwise, the advantage of a good starting value may disappear. The quality of the starting value can be extracted from the prediction error of the interpolator.

Option 4: Bypassing the optimization altogether

If more and more optima are computed, more and more information becomes available to train and improve the model that estimates the relation between the fingerprints and the corresponding optima. If both the fingerprints and the method to model this relation are chosen well, the prediction errors for subsequent optima can become very small. When these predicted optima are used as starting values, the optimization process will only add a little accuracy to the estimated optima. The starting values can become practically indistinguishable from the real optima. In such a case, it is not a problem to bypass the optimization process, because the accuracy that can be gained is negligible compared to the intrinsic (i.e., Monte Carlo) error in the resampling itself1,21. Therefore, it is unnecessary to know the optima with the highest possible precision. The predicted optima, including a small error, can just as well be used as the real optima. This brings the speed increase of our method to a whole new level: the optimization process can now be skipped. As we note in the discussion, it can be hard to estimate how exactly this prediction error influences the accuracy of the uncertainty measure the resampling method computes.

Assessing the performance of the methods

The performance of the methods is assessed from two perspectives. First, the time needed to estimate the parameters for the series of resampled data sets (averaged over a number of replications in order to reduce the error) is evaluated. Second, we assess how the different methods deal with an intrinsic speed-accuracy trade-off: A longer computation time leads to more accurate results, but how does the accuracy of the methods compare given a fixed computational load?

We measure computation time in terms of number of function evaluations (where the function is the objective function that needs to be optimized for every resampled data set, such as the likelihood or least squares loss function). The two computationally most expensive parts of our method (i.e., the time the optimization algorithm needs to converge and the computation time of the derivative-based fingerprints) can be measured in this number of function evaluations.

The method proposed in this paper will be most useful if the optimization process is unbearably slow. The remaining aspect of our method, the modeling of the relation between the fingerprints and the optima, does not place a large burden on the computation. It can be neglected because it does not scale with the difficulty of the optimization process. The relative amount of time needed for the interpolator (such as the LS-SVM or MARS) will be close to zero in cases of high computational burden. For example, using one core of an Intel Core i7-3770 (3.4 GHz), the execution time of the estimation of one LS-SVM is 1.5 seconds with a training set of 200 previous fingerprints and optima.

Another reason not to include the interpolator in the total computational cost is that it is unnecessary to update the modeled relation between fingerprints and optima after each optimization process. The modeled relation and the quality of the predictions will only change significantly if the training data change significantly, which will not be the case if only one couple (i.e., fingerprint and associated optimum) is added. We propose to update and remodel the relation only when more information is available. In practice, the modeling of the relation can be stopped if the reduction in prediction error is only marginal; we stop this modeling when more than 1000 optima have been found. Other machine learning improvements are possible to further reduce the time needed by this modeling, such as online updating and big data solutions27. These options are, however, not included here, in order to present a cleaner picture of our method.

In sum, when considering computational time, we opt to neglect the part spent on modeling the relation between fingerprints and optima and express the computation time only in the number of function evaluations. This also gives us the opportunity to study the effects of our method in simpler problems, where the optimization process is very fast (such as the two simple examples).

Optimization algorithm

The optimization procedure used by fingerprint resampling is not central to the method. In the result section we use the Nelder-Mead method44. This algorithm is used for its generality and its popularity45. It is derivative free and can be used on almost any convex function. Moreover, it is the default algorithm used by the basic optimizers in Matlab and R (i.e., fminsearch in Matlab46,47 and optim in R48) and it is used in the popular lmer function in R to estimate mixed models. The Matlab function fminsearch is used as the basic implementation of this algorithm.

The Nelder-Mead method is an optimization algorithm that uses simplexes with $d+1$ vertices $s_1, \ldots, s_{d+1}$, with $d$ the dimension of the parameter vector to optimize. When initializing the algorithm, vertex $s_1$ is set to the starting value, $\tilde{\theta}_b$. For example, in the fminsearch function, the complete initial simplex is then assigned as in Algorithm 1.

Algorithm 1

Initialization of the initial simplex as done by fminsearch, for entries $\tilde{\theta}_b(i)$ not equal to zero. $\tilde{\theta}_b(i)$ is the $i$-th entry of vector $\tilde{\theta}_b$.

The initial simplex can be improved upon considerably when more information becomes available. When modeling the relation between the fingerprints and the optima, future optima are predicted, but we can also estimate an error of these predictions. For instance, this error can be estimated with a fast leave-one-out cross-validation. The interpolator will assign, for each dimension $i$ of the parameter vector, a prediction error, denoted $\hat{\sigma}_i$. The assignment of the initial simplex is then changed as indicated in Algorithm 2.

Algorithm 2

Initialization of the initial simplex using the prediction error $\hat{\sigma}_i$ for each dimension $i$. $\tilde{\theta}_b(i)$ is the $i$-th entry of vector $\tilde{\theta}_b$. TolX is the user-defined value for assessing convergence based on the parameter values.
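Both initialization schemes can be sketched as follows in Python. The 5% and 0.00025 perturbations mirror fminsearch's documented defaults, while the use of max(prediction error, TolX) in the second function is only our reading of Algorithm 2; both are assumptions rather than a transcription of the package code.

```python
import numpy as np

def simplex_fminsearch(start):
    """Algorithm 1 (fminsearch-style default): perturb each coordinate of the
    starting value by 5%, or set it to 0.00025 if that coordinate is zero."""
    start = np.asarray(start, dtype=float)
    simplex = [start.copy()]
    for i in range(start.size):
        v = start.copy()
        v[i] = v[i] * 1.05 if v[i] != 0 else 0.00025
        simplex.append(v)
    return np.array(simplex)

def simplex_prediction_error(start, pred_error, tol_x=1e-6):
    """Algorithm 2 (our reading): perturb coordinate i by its prediction error,
    but never by less than the convergence tolerance TolX."""
    start = np.asarray(start, dtype=float)
    simplex = [start.copy()]
    for i in range(start.size):
        v = start.copy()
        v[i] += max(pred_error[i], tol_x)
        simplex.append(v)
    return np.array(simplex)

# SciPy's Nelder-Mead accepts such a simplex via options={"initial_simplex": ...}
print(simplex_prediction_error([0.5, 2.0], pred_error=[0.01, 0.2]))
```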

The convergence criterion used for the Nelder-Mead optimization method is that the maximum distance (infinity norm) between the vertices is smaller than a tolerance TolX and the maximum difference between their corresponding function values is smaller than a tolerance TolFun. In our applications, both tolerances are set to the same value.

Mathematical models used

In the result section we make use of an exponential distribution, a linear regression, a mixed effects model and a generalized extreme value distribution. The mathematical model of the exponential distribution was already discussed in the result section. Here we give the necessary details about the other three models.

Linear regression

In the second simple example in the result section, a linear regression model is used. With data set $X = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, regression weight $\beta$ and known variance $\sigma^2$, its likelihood is given by

$$L(\beta, \sigma^2; X) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - \beta x_i)^2}{2\sigma^2} \right).$$

Its loglikelihood is given by

$$\ell(\beta, \sigma^2; X) = -\frac{n}{2} \log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta x_i)^2,$$

with $x_i$ the predictor value and $y_i$ the outcome value of observation $i$. For simplicity, only the case where $\sigma^2$ is known and equal to one is considered. The loglikelihood simplifies to

$$\ell(\beta; X) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \sum_{i=1}^{n} (y_i - \beta x_i)^2.$$
In the result section, 99 synthetic data sets are simulated from this model. For each simulated data set, $B$ bootstrap data sets are generated. For each bootstrap sample this loglikelihood is maximized over the single parameter $\beta$. Its analytical solution is given by

$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.$$
As explained in the result section, an iterative optimization algorithm is used to optimize the loglikelihood.

For variant $v_1$, a one-dimensional fingerprint is created using the first derivative of the loglikelihood in the original optimum:

$$F_1(X_b) = \frac{\partial \ell(\beta; X_b)}{\partial \beta}\bigg|_{\beta = \hat{\beta}_0} = \sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_0 x_i \right).$$

For variant $v_2$, a two-dimensional fingerprint is created by combining this fingerprint with the second order derivative of the loglikelihood in the original optimum:

$$F_2(X_b) = \left( \frac{\partial \ell(\beta; X_b)}{\partial \beta}\bigg|_{\beta = \hat{\beta}_0}, \; \frac{\partial^2 \ell(\beta; X_b)}{\partial \beta^2}\bigg|_{\beta = \hat{\beta}_0} \right) = \left( \sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_0 x_i \right), \; -\sum_{i=1}^{n} x_i^2 \right).$$
Both derivatives are computed with finite differences in the result section. As shown in the result section, one cannot create an exact function between the one-dimensional fingerprint and the optimum (variant $v_1$). However, using the two dimensional fingerprint (variant $v_2$), one can combine the analytical solution with both derivatives to derive the following functional relation:

$$\hat{\beta}_b = \hat{\beta}_0 - \frac{F_{2,1}(X_b)}{F_{2,2}(X_b)},$$

where $F_{2,1}(X_b)$ and $F_{2,2}(X_b)$ denote the first and second element of $F_2(X_b)$. This relation leads to much better predictions of new optima, as is illustrated in the result section.

Mixed model

In line with Verbeke & Molenberghs8, we start with a random sample of $N$ units. For each unit $i$, $n_i$ measurements are taken with respect to an outcome variable and these measurements are collected in a vector $Y_i$. Next, the following model is assumed:

$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i,$$

with $b_i \sim N(0, D)$, $\varepsilon_i \sim N(0, \sigma^2 I_{n_i})$, and $b_i$ and $\varepsilon_i$ independent. The parameter vector $\beta$ contains the fixed effects regression coefficients and $D$ is the population variance-covariance matrix of the random effects. Such a general linear mixed model implies the following marginal model:

$$Y_i \sim N\!\left( X_i \beta, \; V_i \right).$$

The loglikelihood is then given by

$$\ell(\beta, D, \sigma^2) = \sum_{i=1}^{N} \left( -\frac{n_i}{2} \log(2\pi) - \frac{1}{2} \log |V_i| - \frac{1}{2} (Y_i - X_i \beta)^{\top} V_i^{-1} (Y_i - X_i \beta) \right),$$

with

$$V_i = Z_i D Z_i^{\top} + \sigma^2 I_{n_i}$$

and $q$ the number of random effects. In practice, however, it is better to use a differently parametrized loglikelihood49,50. First, to assure the positive definiteness of $D$, we use a Cholesky decomposition of $D$49. Second, to reduce the number of parameters of the optimization problem, $\beta$ and $\sigma^2$ can be profiled out50. This leaves us with $q(q+1)/2$ unique parameters, the elements of the Cholesky factor, to optimize. To find the variances of the random effects of the mixed model, the resulting optima (both the ones found after an iterative optimization and the ones predicted) have to be translated back to their original parametrization, as used in the loglikelihood above. Note that, due to the Cholesky decomposition, some of these parameters can only make a positive contribution to that loglikelihood. For example, a diagonal element of the Cholesky factor enters the variance matrix $D$ only through its square, so the corresponding variance is always positive. This means that we introduce an indirect constraint.

In the data of Bringmann et al., used in the result section, both the intercept and the slope are assumed to vary across patients14. No restrictions are imposed on the variance matrix of these random effects (except for positive definiteness):

$$D = \begin{pmatrix} \sigma_{D,11} & \sigma_{D,12} \\ \sigma_{D,12} & \sigma_{D,22} \end{pmatrix},$$
which leads to a three dimensional optimization problem.

To bootstrap these data, we use the so-called case bootstrap30,31. The case bootstrap is a non-parametric bootstrap which uses the clusters (in our case, the patients) as the entities to sample from. The whole bootstrap process is repeated 99 times, each time with $B = 2000$ new random bootstrap iterations.
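A minimal Python sketch of this case bootstrap (the per-patient two-stage fit below, random_effect_cov, is only a crude stand-in we use to keep the example self-contained; it is not the mixed-model estimation used in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# toy long-format data per patient: (lagged sadness, sadness)
n_patients, n_obs = 30, 20
data = {pid: (rng.normal(size=n_obs), rng.normal(size=n_obs)) for pid in range(n_patients)}

def random_effect_cov(clusters):
    """Crude stand-in for the mixed-model fit: per-patient OLS intercepts and slopes,
    followed by the covariance matrix of these estimates (NOT the REML estimate of D)."""
    coefs = []
    for x, y in clusters:
        X = np.column_stack([np.ones_like(x), x])
        coefs.append(np.linalg.lstsq(X, y, rcond=None)[0])
    return np.cov(np.array(coefs), rowvar=False)

B = 200
boot_estimates = []
for _ in range(B):
    # case bootstrap: resample whole patients (clusters) with replacement
    ids = rng.choice(n_patients, size=n_patients, replace=True)
    boot_estimates.append(random_effect_cov([data[i] for i in ids]))

print(np.percentile([D[0, 0] for D in boot_estimates], 2.5))   # e.g. intercept-variance bound
```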

In the software package (see the Code availability paragraph), in the exampleFingerprintBootstrap.m file, the fingerprint bootstrap is also applied on a mixed model using artificial data.

Generalized extreme value distribution

For the generalized extreme value distribution with data set $X = \{(x_1, t_1), \ldots, (x_n, t_n)\}$, shape parameter $\xi$, scale parameter $\sigma$ and location parameter $\mu_i = \mu_0 + \mu_1 t_i$, the likelihood is given by

$$L(\xi, \sigma, \mu_0, \mu_1; X) = \prod_{i=1}^{n} \frac{1}{\sigma} \left[ 1 + \xi\, \frac{x_i - \mu_i}{\sigma} \right]^{-1/\xi - 1} \exp\left( -\left[ 1 + \xi\, \frac{x_i - \mu_i}{\sigma} \right]^{-1/\xi} \right).$$

In the data set used in the result section, $x_i$ is given by the maximum windspeed of a certain day and $t_i$ by the date (number of days after the first measurement), with $\mu_0$ the intercept and $\mu_1$ the regression weight. The logarithm of this likelihood is computed using an adapted version of gevlike.m of Matlab46. To find the maximum of this loglikelihood, a four dimensional optimization problem is solved using the Nelder-Mead method. As naive starting values we use moment estimators as proposed by Hosking et al.51 and a least squares estimation for the $\mu_0$ and $\mu_1$ parameters52.

It is interesting to note that the likelihood can only be computed if $1 + \xi (x_i - \mu_i)/\sigma > 0$ for every observation. It is therefore possible that for some data sets, the fingerprint resampling method might need to be adapted. If the original optimum leads to an invalid likelihood for some bootstrap data sets, the original optimum cannot be used as a starting value, nor can the derivatives be calculated to create a fingerprint. In such a case, it is advised to use another sort of fingerprint. For the data set in the result section, there were no such problems.
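As an illustration, a Python stand-in for such a GEV loglikelihood with a covariate in the location parameter, including the validity check, might look as follows (toy data and starting values are ours; the paper's analysis uses the adapted gevlike.m in Matlab):

```python
import numpy as np
from scipy.optimize import minimize

def gev_negloglik(params, x, t):
    """Negative loglikelihood of a GEV with shape xi, scale sigma and
    location mu0 + mu1 * t; returns inf where the likelihood is undefined.
    (The Gumbel limit xi -> 0 is not handled in this sketch.)"""
    xi, sigma, mu0, mu1 = params
    if sigma <= 0 or xi == 0:
        return np.inf
    z = 1.0 + xi * (x - (mu0 + mu1 * t)) / sigma
    if np.any(z <= 0):                        # support condition of the GEV
        return np.inf
    return np.sum(np.log(sigma) + (1.0 / xi + 1.0) * np.log(z) + z ** (-1.0 / xi))

# toy data standing in for the daily windspeed maxima and their dates
rng = np.random.default_rng(1)
t = np.arange(1000.0)
x = 10.0 + 0.001 * t + rng.gumbel(loc=0.0, scale=2.0, size=t.size)

start = [0.1, 2.0, 10.0, 0.0]                 # naive starting values (illustrative)
fit = minimize(gev_negloglik, start, args=(x, t), method="Nelder-Mead",
               options={"maxiter": 5000, "maxfev": 5000})
print(fit.x)                                  # xi, sigma, mu0, mu1
```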

Speed-accuracy trade-off for the bootstrap

In the result section, we estimate the lower bound of a 95% confidence interval using the bootstrap. From a series of $b$ bootstrap samples, this lower bound is the 2.5% percentile, denoted as $\hat{\theta}_{2.5\%}^{\,b}$. Using the percentile bootstrap confidence interval method, this percentile can be estimated as29:

$$\hat{\theta}_{2.5\%}^{\,b} = \hat{\theta}_{(\lfloor 0.025\, b \rfloor)},$$

where $\lfloor \cdot \rfloor$ is the flooring function and $\hat{\theta}_{(1)} \leq \ldots \leq \hat{\theta}_{(b)}$ are the ordered bootstrap optima. Obviously, it is always better to use more rather than fewer bootstrap iterations. For the mixed model problem we use $B = 2000$ bootstrap iterations as the end goal. However, we also monitor $\hat{\theta}_{2.5\%}^{\,b}$ as a function of $b$. For any finite $b$ or $B$, $\hat{\theta}_{2.5\%}^{\,b}$ and $\hat{\theta}_{2.5\%}^{\,B}$ will always be approximations of $\theta_{2.5\%}^{\,\infty}$, which uses an infinite number of bootstrap iterations. Adding bootstrap optima one after the other, our estimate will come closer and closer to $\theta_{2.5\%}^{\,\infty}$. The error will decrease further and further. However, more bootstrap iterations generally means more function evaluations. Hence, there is an intrinsic trade-off between speed (number of function evaluations) and accuracy ($\epsilon_b$, the absolute difference between $\hat{\theta}_{2.5\%}^{\,b}$ and $\theta_{2.5\%}^{\,\infty}$). $\theta_{2.5\%}^{\,\infty}$ itself can obviously not be computed. For the bootstrap of the mixed model, it is approximated by combining all bootstrap iterations of the 99 bootstrap processes. This combination of all bootstrap iterations over all the bootstrap processes is allowed, as each bootstrap process is created using the same original data set and all bootstrap data sets are generated independently and at random. The accuracy can now be computed as:

$$\epsilon_b = \left| \hat{\theta}_{2.5\%}^{\,b} - \hat{\theta}_{2.5\%}^{\,99 \times B} \right|.$$

The exact speed up our method achieves is measured at two different accuracies. First, it is measured at what is called a high accuracy. This is approximately the accuracy of the original optimum variant when the bootstrap process is completed, after $B = 2000$ bootstrap iterations: $\epsilon_{\text{high}} \approx \epsilon_B$. Second, it is measured at what is called a medium accuracy, which has an error twice as big as the high accuracy: $\epsilon_{\text{medium}} = 2\,\epsilon_{\text{high}}$.
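A small sketch of this percentile estimate as a function of the number of bootstrap iterations (the exact rank convention, a floored 1-based rank, is our assumption):

```python
import numpy as np

def percentile_lower_bound(boot_estimates, alpha=0.025):
    """Percentile-bootstrap lower bound: the floor(alpha * b)-th ordered estimate
    of the first b bootstrap optima (1-based rank, floored)."""
    est = np.sort(np.asarray(boot_estimates))
    idx = max(int(np.floor(alpha * est.size)) - 1, 0)
    return est[idx]

rng = np.random.default_rng(1)
boot = rng.normal(loc=1.0, scale=0.2, size=2000)        # stand-in bootstrap estimates
for b in (100, 500, 2000):
    print(b, percentile_lower_bound(boot[:b]))          # the estimate stabilizes as b grows
```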

As explained before, resampled data sets with extreme fingerprints are prioritized (to avoid extrapolation). This means that potentially extreme optima are found first, which may disturb the accuracy of $\hat{\theta}_{2.5\%}^{\,b}$ for small $b$. We controlled in an ad hoc way for this artefact by assigning weights to these prioritized optima related to the inverse of the original index of their associated data set. Resampled data sets that are prioritized will thus get a lower weight for small $b$. Note that this correction has no effect at all if all optima (predicted or found by an optimization process) are used. So in a normal implementation, where all optima will always be used, this correction is of no importance. Only for the speed-accuracy trade-off figure (to show the evolution of the accuracy) must such a correction be made.

To estimate the effects of parallelization for a set-up with $C$ cores, the function evaluations and accuracies are grouped together in batches of $C$ bootstrap estimations. The whole batch would be computed simultaneously on the $C$ cores. In a parallelization context, bootstrap estimations in the same batch cannot learn from each other; only information from bootstrap estimates of previous batches is available. The speed up is recalculated, undoing the effects of the information that is now unavailable. For example, if 2000 cores are used to compute 2000 bootstrap estimates, all estimations happen in the same batch. Therefore, since no bootstrap estimation can be sped up using information from any previous calculations, the fingerprint resampling method would be useless in such a case.

Code availability

Our method is implemented in Matlab; the software package and all data necessary to replicate this paper can be found at http://ppw.kuleuven.be/okp/software/fingerprintresampling/. The original data set used for the mixed model example can also be found at http://journals.plos.org/plosone/article?id=10.1371/journal.pone.006018814. The original data set used for the GEV example can be found at http://www.knmi.nl/climatology/daily_data/selection.cgi41. Most results were obtained using Matlab 2014a on a supercomputer using 5 nodes (100 processors)46. One processor was used to coordinate the 99 other processors. Each processor was used to compute one bootstrap process of 2000 bootstraps, using all possible variants. This is the reason why exactly 99 bootstrap processes are used to illustrate the speed up of our method.

Additional Information

How to cite this article: Mestdagh, M. et al. Fingerprint resampling: A generic method for efficient resampling. Sci. Rep. 5, 16970; doi: 10.1038/srep16970 (2015).