Fast and accurate Monte Carlo sampling of first-passage times from Wiener diffusion models

We present a new, fast approach for drawing boundary crossing samples from Wiener diffusion models. Diffusion models are widely applied to model choices and reaction times in two-choice decisions. Samples from these models can be used to simulate the choices and reaction times they predict. These samples, in turn, can be utilized to adjust the models’ parameters to match observed behavior from humans and other animals. Usually, such samples are drawn by simulating a stochastic differential equation in discrete time steps, which is slow and leads to biases in the reaction time estimates. Our method, instead, facilitates known expressions for first-passage time densities, which results in unbiased, exact samples and a hundred to thousand-fold speed increase in typical situations. In its most basic form it is restricted to diffusion models with symmetric boundaries and non-leaky accumulation, but our approach can be extended to also handle asymmetric boundaries or to approximate leaky accumulation.

probability of reaching either boundary, and as a consequence, it over-estimates the first-passage times [e.g. 13 ]. This leads to conflicting requirements on the simulation step-size. On one hand, it should be small to reduce the first-passage time biases. On the other hand, it should be large to speed up the stimulations. This short-coming also exists in higher-order particle trajectory simulation methods 12 , albeit to a lesser degree.
Here, we introduce a new method to sample from the first-passage time density of simple diffusion models that does not share the short-comings of the Euler-Maruyama and related methods. It does not suffer from any biases, as it directly draws exact samples by rejection sampling from the series expansion of the first-passage time density. Furthermore, it is in the order of a hundred to a thousand times faster than simulating the whole particle trajectory with typical step-sizes. Admittedly, the method is more complex than trajectory-based methods, but the author provides implementations in various programming languages on his website. It is already long know that stochastic simulations that use first-passage time solutions can be more efficient than regular-time-step solution methods. They have, for example, already been used for several decades to simulate non-linear chemical reactions [14][15][16] . In the context of Wiener diffusion models, a similar approach has been used previously to draw samples from a diffusion model without drift 17,18 . The addition of a drift required various changes to the method, that we will discuss in more detail further below. A previously developed sampling method for non-zero drifts 19 uses a comparable approach to the one described here, but features various weaknesses (see Discussion) that makes it slower than the method presented here.
As described here, the sampling method only applies to diffusion models with symmetric boundaries around a central particle starting point, and a drift and diffusion variance that remains constant over time. However, it is easily embedded in simulations in which these parameters vary across trials. For example, we can draw samples from a model with a drift that varies across trials [e.g. 1 ] by first drawing this drift and then using our method to sample the first-passage time and boundary for this drift. The same approach makes it applicable to variable boundaries and non-decision times [e.g. 1 ] or the simulation of occasional random lapses 20 . The only restriction is that the boundaries need to remain symmetric around the particle starting point, which makes it unsuitable for situations in which the particle starting point varies [e.g. 1 ]. However, even in such cases, our method can be embedded in an approach that provides samples for diffusion models with asymmetric boundaries 19 .
In what follows, we first derive the method based on rejection sampling from the series expansion of the first-passage time density. This includes describing two variants of this series and their relevant properties, the rejection sampling approach, and the derivation of suitable proposal densities for rejection sampling. Then, we perform simulations to evaluate the best settings of the two free parameters of this method, and demonstrate its superiority when compared to the Euler-Maruyama method. Finally, we relate it to previous work and discuss potential extensions.

Fast and accurate sampling of first-passage time densities
We consider a diffusion model in which the trajectory of a drifting and diffusing particle x(t) is described by the diffusion equation where μ denotes the drift and dW is a Wiener process ( Fig. 1(a)). Initially, the particle is located at x(0) = 0. It terminates its trajectory at some time t > 0 as soon as it reaches one of two boundaries, located at − 1 and 1.
The time at which either boundary is reached is known as the first-passage time. In the following, we describe a fast method to draw samples from both first-passage time and boundary. After that, we describe how the same procedure can be used to draw samples from diffusion models with non-unit boundary locations and non-unit diffusion variance.  First-passage time densities. For the considered diffusion process, the first-passage time densities g s,+ (t) for the upper boundary, and g s,− (t) for the lower boundary can be expressed as an infinite series. The series that will be described are both exact in the infinite limit, but converge to the correct density at different rates, depending on t. The first series is found by the method of images 21 , resulting in is the set of all particle trajectories that reach either of the boundaries for the first time at time t. In the above, g s,0 (t) denotes the infinite series This series is known to converge rapidly for small times 10,17,18 , which is indicated by its subscript ⋅ s for short-time series.
An alternative approach to compute the first-passage time densities is to seek a Fourier series solution 21 , which, after some simplification due to symmetric boundaries, results in with the corresponding infinite series This series converges rapidly for large times 10,17,18 , which is indicated by its subscript ⋅ l for long-time series. A useful property of diffusion models with symmetric boundaries is that their first-passage times are independent of which boundary is reached. To see this, consider Eqs. (2) and (3), which reveal that these joint probabilities are re-scaled versions of each other, whose scaling depends only on the drift μ, but not on time t. As a result, they can be factored into . A consequence of this factorization is that the first-passage time can be sampled independently of which boundary was reached. Furthermore, sampling the boundary corresponds to a draw from a Bernoulli distribution with probability g + . Thus, the rest of this section focuses on how to draw samples from the first-passage time density, using either g s (t) or g l (t).
Rejection sampling from a series expansion. To sample from the first-passage time density, we have to overcome several obstacles. First, it is not possible to draw samples from this density directly. We tackle this problem by rejection sampling, for which it is sufficient to find an easy-to-sample proposal density that upper-bounds the first-passage time target density up to a proportionality constant. Second, such rejection sampling requires the accurate evaluation of the value of this target density for particular times. However, the density is only available as a series expansion, which prohibits such accurate computation, and makes even approximate evaluations computationally expensive. Fortunately, both described series expansions feature a property that makes them suitable for a variant of rejection sampling that only requires the computation of few elements in these series, such that exact samples can be drawn without the use of approximations. Third, each of the two series only features this property for a limited range of first-passage times. In combination, they cover all non-negative times, such that, depending on the time, one or the other can be chosen. However, the series feature qualitatively different properties that Scientific RepoRts | 6:20490 | DOI: 10.1038/srep20490 need to be taken into account when designing suitable proposal densities for rejection sampling. In the below, we handle each of these points in turn.
Rejection sampling is a sampling method in which one draws a sample t * from an easy-to-sample proposal density ∼ ( ) ⁎ t f t that tightly upper-bounds the desired sampling density g(t), up to a proportionality constant C f ( Fig. 2(a)). That is Assuming for now that ( ) ⁎ g t can be computed rapidly and accurately, this sample is accepted if ≤ ( ) Otherwise, sampling is repeated until the first sample is accepted [ 22 , Ch. 2]. The efficiency of rejection sampling hinges on how likely the samples drawn from the proposal density are accepted. To ensure a high acceptance likelihood, the scaled proposal density ( ) C f t f needs to tightly upper-bound the target density g(t), as illustrated in Fig. 2(a). This is particularly important for the tails of the target density. A proposal density that only loosely bounds these tails causes more samples to be drawn from these tail regions which are in turn very likely rejected (e.g. sample ⁎ t 2 in Fig. 2(a)). Thus, great care will be put on tightly bounding the tail regions when designing suitable proposal densities.
We cannot directly apply rejection sampling to draw samples from the first-passage times, as g(t * ) is only known in the form of infinite series. However, these series have a property that makes them suitable for the series method, which is a particular variant of rejection sampling [ 22 ,Ch. 4]. Focusing for now on the long-time series used in ( ) ⁎ g t l , we will show in the next section that its elements form a sequence that alternatingly upper and lower-bound the true ( ) ⁎ g t l ( Fig. 2(b)). That is, for sufficiently large t * we can form a easy-to-compute sequence ( ), ( ), … for all odd n's. As a result, we can define a region ( ), ( ) and whose size shrinks towards zero for increasing n. With such a sequence, rejection sampling can be performed as illustrated in Fig. 2 , and for which , the sample of t * is immediately rejected, and if < ( ) it is immediately accepted. Otherwise, we proceed iteratively to narrow the region around ( ) ⁎ g t l until rejection or acceptance occurs. For n = 2 and any subsequent even n, we compute ( ) . If neither occurred, we increment n by one and return to the previous step for even n's. This procedure is repeated until t * is either accepted or rejected. As the sequence of ( ) the procedure is guaranteed to terminate. In the following we show that such a convergent series can be found for both g s (t) and g l (t). This convergence is only guaranteed for small t for g s (t) and large t for g l (t), such that the choice of series depends on the initially sampled t * . After having shown this, we design suitable proposal densities for both g s (t) and g l (t).

Sequences of upper and lower bounds on g s (t) and g l (t).
Considering first the short-time series, we show how to construct a sequence ( ), ( ), … g t g t s s 0 1 that converges to g s (t) in Eq. (8) from above and below for even and odd n's, respectively. For any fixed time t, g s (t) is a scaled version of ( ) , g t s 0 , Eq. (4), such that it is sufficient to construct such a sequence on ( ) , g t s 0 . Before doing so, note that the kth term of ( ) is equivalent to the (1 − k)th term, such that they can be re-grouped into a more convenient form, given by , as illustrated by two samples, ⁎ t 1 and ⁎ t 2 . Either sample is accepted if the corresponding z falls into the green region, and rejected if it falls into the red region. Thus, the first sample, ⁎ t 1 is very likely accepted, while ⁎ t 2 is more likely to be rejected, due to ( ) converges to ( ) ⁎ g t from above and below, thus forming a reject/accept region around ( ) ⁎ g t that increases in size with every additional element of this sequence. Thus, the t * can be accepted (for z 1 , green dot) or rejected (for z 2 , red dot) while computing only a small number of these elements.  Based on this, we define the truncated series . This series can be computed recursively by holds. Overall, to satisfy the above for all n, we need to have for odd n, as required by our sampling method.
For the long-time series, we define , as the truncated series approaching ( ) , g t l 0 , Eq. (6). As for the short-time series, this truncated series can be computed recursively by for odd n, as required by our sampling method.
Overall, the two series differ only in their applicable range of t and in the constants that they rely on. Therefore, as shown in Table 1, the accept/reject decision for exact sampling with both series can be implemented by the same function that takes the series-dependent C 2 as an argument.

Suitable proposal distributions.
What remains is to define a proposal density to draw the t * 's from. This proposal f(t) needs to be easy to sample from, and, scaled by C f , needs to tightly upper-bound the density we wish to sample from. As previously discussed, the tightness of this bound is important, as it determines the rejection rate, and thus the efficiency of the sampling procedure. Here, we construct two such proposal densities. The first, f 1 (t), is tight for small μ, and the second, f 2 (t), for large μ. The final sampling scheme chooses between these proposal densities on the basis of comparing the desired μ to some threshold µ . In addition to this, we assume some t below and above which the short-time and long-time series are used, respectively, to decide if the drawn t * is accepted. For now, we only require ∈ . , .
To use the series-based accept/reject procedure, either proposal density should upper-bound the largest bound in the sequence that approaches ( ) = ( ) g t g t s l . For both the short-time and long-time series, this bound is given by the n = 0 element in the respective sequence, such that, by combining Eqs. (8), (12) and (16), these bounds are given by as illustrated in Fig. 3(a). Proposal distribution for small μ. In the above, ( ) g t l 0 is proportional to the density of an exponential distribution with rate µ π ( + ) / 4 8 2 2 . This distribution is easy to sample from, such that we choose To sample from this proposal, we cannot directly use Gaussian samples, as these would only be useful for the short-time part of the support of the proposal density. Instead, we use the inversion method [ 22 , Ch. 2], which first draws a value from a uniform distribution and then transforms this value by the inverse cumulative function to achieve samples from the desired density. Thus, the method requires the full cumulative functions of the proposal densities. For ( ) , f t s 1 , this cumulative function and its inverse are given by . The densities that the cumulative functions are based on are unnormalized, such that (∞) F 1 is not guaranteed to be one, as would be the case for normalized densities. Thus, to sample according to F 1 (t), we draw a P uniformly from , (∞) F [ 0 ] 1 (rather than from , [0 1], as we would for normalized densities), and then choose t * according to otherwise 26 Overall, this results in the algorithm in Table 2, where we have used for the adequately re-scaled short-time proposal at t * , and the corresponding Proposal distribution for large μ. As Fig. 3(b) illustrates, the described proposal forms a tight upper bound on ( ) g t s 0 for small μ, but fails to do so for large μ. In case of the latter, we use the property that ( ) g t s 0 is proportional to an inverse-Gaussian distribution with mean µ −1 and shape 1, which we can sample from using the proposal density is guaranteed (see Fig. 3(a)). Thus, as long as ≥t t, the above C f 2 causes the proposal to upper-bound ( ) g t l 0 . As soon as <t t, we need to additionally re-scale the proposal by  Figure 3(b) shows that this proposal provides a tighter bound for larger μ. Overall, this results in the algorithm in Table 3, whose constants C s and C l are based on an adequate re-scaling of the proposal density, and where ( , ) a b IG denotes an inverse-Gaussian distribution with mean a and scale b. , and (∞) F 1 (see Proposal distribution for small μ)   This leads to the algorithm in Table 4, which returns a sample of the first-passage time, together with which boundary was reached first. In this algorithm, ( ) a  denotes a draw from a Bernoulli distribution with probability a, where we have used the boundary probability g + as given by Eq. (9).

Simulations
In this section we show by simulations how to set the thresholds t and µ , and demonstrate the speed-up the the proposed method achieves when compared to simulating a diffusion model by the Euler-Maruyama method. All reported sampling speeds result from a single-core Julia implementation of the algorithms, running on a Mid-2014 15" MacBook Pro with a 2.8 GHz Intel Core i7 processor and 16 GB of RAM. Implementations of the proposed algorithm are available on the author's webpage in Julia, as well as in C+ + 11, with MATLAB and Python interfaces.
Tuning the thresholds  t and μ . We start by finding the best threshold t between the short-time and , and, on the other hand, by how many elements in the corresponding sequence need to be evaluated before the proposed t * sample can be accepted or rejected. Also, a change in drift μ causes a change in both the target density and the proposals. To take all of these factors into account, we measured the time it took to draw 10 6 samples for a set of different t and μ. For each t and μ we repeated this procedure 100 times, discarded the top and bottom 20% sampling times as outliers, and averaged over the rest. The resulting average sampling times are shown in Fig. 4(a).
For the Gaussian/exponential proposal,  Fig. 4(a) shows that, for small μ, sampling speed is mostly independent of the choice of t , as long as > .
t 0 2. As soon as μ rises above around 1, increasingly smaller t lead to more rapid sampling. The relationship between μ and the t that maximizes sampling speed turned out to be well captured by µ µ ( ) = . + .
(− / ) t 0 12 0 5 exp 3 (dashed line in Fig. 4), which ranges from 0.62 for small μ to 0.12 for large μ. We acquired this function to set t for all future uses of this proposal.
For the inverse-Gaussian proposal, ( ) f t 2 , Fig. 4(a) demonstrates that μ and t influence the sampling speed largely independently. An increasing drift μ generally causes faster sampling, which can be traced back to ( ) f t 2 being a tighter upper bound in such cases. The threshold t did not influence the proposal ( ) f t 2 directly, but modulated sampling speed by determining which series was used to accept or reject the drawn time samples. Sampling then return t * Table 3. Algorithm to sample first-passage times for large μ.  Table 4. The complete sampling algorithm. The function takes the drift µ † , boundary θ, and diffusion variance σ 2 and returns the tuple (T, X), where T is a sample of the first-passage time, and X = 1 (X = 0) if the upper (lower) boundary was reached first. t and µ  are tuning parameters whose values are optimized in the Simulations section.
was slow for small t , but for ≥ t 1, this threshold had little influence on the sampling speed. Thus, independent of the drift, we chose = .
t 2 5 for this proposal (dashed line in Fig. 4), which gave the overall best performance. Having determined a tuned t for each proposal, we now turn to the question of how to set µ  to choose between the two proposal densities. To do so, we evaluated the sampling speed associated with either proposal as before, for a set of different μ, but this time using the tuned t . In particular for the inverse-Gaussian proposal, using this t led to simplified algorithms (see Table 3 for = . t 2 5), and an associated increase in sampling speed. The resulting speeds for both proposals are shown in Fig. 4(b). As expected, the Gaussian/exponential proposal performs better for small μ, and the inverse-Gaussian proposal for large μ. Their speeds intersect at around µ =  1 (dashed line in Fig. 4(b)), which we acquired as the threshold to decide between the two proposal densities.

Speed-up when compared to the Euler-Maruyama method.
To get an idea of the speed-up achieved by the proposed method, we compare it to the standard Euler-Maruyama method for simulating diffusion models. This method starts at x 0 = 0 and then iterates over In the above, Δ is a small step-size, and η n is a zero-mean unit-variance Gaussian random variable. While easy to implement, the algorithm does not take into account excursions of the x(t) trajectory beyond − 1 or 1 between two consecutive trajectory samples, x n and x n+1 (see Fig. 1(b)), which makes it prone to over-estimating the first-passage time 13 . The resulting bias is shown in Fig. 5(a) for different step-sizes Δ and drifts μ, which illustrates that larger step-sizes cause an increase in the bias. Taking larger steps also lowers simulation time, such that the choice of Δ is a trade-off between minimizing bias and maximizing sampling speed. This trade-off is not present in our method, which always generates unbiased first-passage time samples.
As there is no single best step-size for the Euler-Maruyama method, we compared the speed of our method to that of the Euler-Maruyama method for different step-sizes. For either method, we found the sampling speed as before, by computing an average over 100 runs of 10 6 samples each, while discarding the slowest and fastest 20% of these runs as outliers. As shown in Fig. 5(b), this procedure revealed a speed-up by a factor of 100 to 1000 for sensible step-sizes, Δ ≤ 1 ms, as recommended by (ref. 12). Even for extreme step-sizes of Δ = 50 ms, in which the Euler-Maruyama method might over-estimate the first-passage times by a factor of two, our method featured faster sampling times for μ < 5. Thus, there does not appear any sensible parameter range in which the Euler-Maruyama method yielded lower sampling times than our method. For this reason, our method should always be the preferred approach.

Discussion
We have developed a fast and unbiased method to sample first-passage times from diffusion models. This method is based on rejection sampling from the known infinite series expansion to the first-passage time densities. Making use of properties of this series, we showed that samples can be rejected or accepted while computing only few terms of this series. The method features two parameters that we have tuned by simulations to maximize sampling speed. Overall, our method draws unbiased samples roughly a hundred to a thousand times faster than the Euler-Maruyama method that only provides biased samples.
Previously, a similar approach has been used to draw samples from a diffusion model with zero drift 17,18 . This allowed the authors to directly use the upper bounds, , as proposal densities, and draw samples from the resulting density by the inverse method. Once we introduced a non-zero drift, these densities became inadequate, such that we had to replace that bounding ( ) for large μ, which might lead to a large rejection rate and thus inefficient sampling. For this reason, we used another inverse-Gaussian proposal density ( ) f t 2 that is tighter for μ > 1 and, as a consequence, provides faster sampling for such μ. Interestingly, this proposal corresponds to the first-passage time density for diffusion models with a single bound 21 . Hence, as soon as the drift towards this bound is sufficiently large, the contribution of the opposing bound to this density becomes negligible.
A previously proposed approach 19 for non-zero drifts is also based on rejection sampling and so comparable to the method proposed here. It differs from our method in the following points. First, it features a single, less-tight proposal density whose acceptance rate decreases with increasing drift rates. Our method avoids this by using different proposal densities for small and large drift rates. Second, the previous approach does not use the alternating lower/upper-bound property of the series expansion of the first-passage time densities to guide rejection, but instead truncates this expansion after a fixed number of terms. If the expansion is truncated after too few terms, sampling will be inaccurate. If too many terms are evaluated, the method will be slower than ours. Third, the series expansion used in 19 corresponds to the Fourier series solution, Eq. (5) which is known to converge quickly for large drift rates, but slowly for small drift rates. Therefore, the number of terms after which to best truncate A speed-up of 10 means that, on average, our method yields samples ten times faster than the Euler-Maruyama method.
Scientific RepoRts | 6:20490 | DOI: 10.1038/srep20490 Eq. (5) depends on the drift rate, which is not considered in (ref. 19). We, instead, use a different series for small drift rates, which makes the method overall faster. In terms of sampling speed, we compared our method to the Euler-Maruyama method, which is known to be both biased and slow. Higher-order alternatives to the stimulation of stochastic differential equations [e.g. 23 ] might reduce this bias, and thus might seem a more adequate performance baseline. However, their lower bias comes at a higher computational cost, which makes them slower than the Euler-Maruyama method. Also, while they might be able to lower the bias, they will not be able to completely eliminate it. Therefore, even these higher-order methods will be no match to the method we have proposed. Furthermore, for the sake of fitting diffusion model parameters, they only provide a marginal improvement over the most basic Euler-Maruyama method 12 .
While our method was developed for a simple diffusion model with symmetric unit boundaries and a unit diffusion variance, time and space rescaling makes it applicable to arbitrary boundary levels and diffusion variances. Furthermore, it can be embedded within a sampler that also models drifts, bounds, and other variables as random, thus providing additional levels of flexibility [e.g. 1,11 ]. One restriction for our method to work is that the boundaries need to be symmetric around the particle starting point. This restriction ensures that both the short-time and long-time series alternatingly form upper and lower bounds on the true first-passage time density. For asymmetric boundaries, this property is not guaranteed for the long-time series, such that we are unable to use the same rejection sampling variant.
One possible extension to sample efficiently from diffusion models with asymmetric boundaries is illustrated in Fig. 6(a). The idea is to use the symmetric sampler as a building block to sample from more complex diffusion models, analogous to the method introduced by (refs. [17][18][19]. In the case of asymmetric boundaries, sampling would commence by assuming a symmetric diffusion model that is tightly bounded by the asymmetric model. When reaching the boundary that is shared by both models, sampling would stop and return the sampled time and boundary. Otherwise, sampling continues from another symmetric diffusion model that is again tightly bounded by the asymmetric model, but is this time centered on the previously terminal particle location. This procedure is continued until the reached boundary is that shared by both the symmetric and the asymmetric model. At that point, the total time, as well as the reached boundary are returned. A similar approach allows us to approximate samples from an Ornstein-Uhlenbeck process, or leaky accumulator, which acts as another popular psychological model 24 . In this case, we could approximate the leak, which theoretically varies continuously over the particle space, by a sequence of regionally constrained leak-free diffusions, each of which are represented by symmetric diffusion models with a different drift ( Fig. 6(b)). Due to the approximation, this approach would unfortunately introduce a bias. An unbiased alternative is to use a method that samples from such leaky processes without bias, by creating a sequence of skeleton point connected by Brownian bridges 25 . While this has the potential for faithfully sampling from leaky processes, its efficiency when compared to the the Euler-Maruyama method remains to be evaluated.

Conclusions
We have presented a new method to sample the first-passage time and reached boundary for Wiener diffusion models. Our method is superior to previously used approaches in that it is both unbiased and significantly faster. While restricted to diffusion models with boundaries symmetric around the starting point, it can act as a building block to sample from models that violate this constraint. Thus, it promises to extend its reach, improving upon both fitting such models to behavioral data and simulating them with high efficiency. (a) Application to diffusion models with asymmetric boundaries. Such boundaries can be handled by using a symmetric diffusion model centered on the current particle location that is tightly bounded by the asymmetric diffusion model (shaded areas). A sample is returned if a boundary shared by both diffusion models is reached (sample 2). Otherwise (sample 1), a new symmetric diffusion model is inscribed, and the procedure is repeated. (b) Approximating a leaky accumulator. A leaky accumulator is governed by the stochastic process τ µ = (− / + ) + . Splitting x into equally-sized regions, each endowed with a different drift µ † , results in the piece-wise sampling scheme illustrated above.