Rare events are an important and exciting theoretical research field in mathematics and in natural sciences, with a long history in topics ranging from physics, geophysics and biology, to ecology and social systems1,2,3,4. A deeper understanding of the mechanism that leads to rare events is a major problem in risk predictions and management, across different disciplines5,6.

In this field, an interesting role is played by the so-called Big Jump principle. The principle explains extreme events in a wide class of natural and man-made systems with heavy-tailed distributions, not in terms of an accumulation of many small subevents but solely as an effect of the biggest event, the big jump. The accumulated rainfall in one month in a region7, the energy released in an earthquake, and also the position of particles whose motion is determined by a sum of very heterogeneous steps are examples of processes where the principle is very likely to be valid. If only one event controls the statistics of extremes, we can understand the inherent difficulties in the prediction. At the same time, if we know that the process we are studying follows the principle, we can learn how to better quantify the extremes.

The big jump principle was originally shown to hold for sums of independent random variables following a heavy-tailed (i.e. subexponential) distribution8,9,10,11,12. Recently it has been applied to models of anomalous transport in quenched disorder13,14,15, where it has been used to predict with surprising accuracy large fluctuations driven by a single rare event. Interestingly, a key feature of the big jump approach is that it is able to reproduce the whole general shape of the probability density for rare events, and in particular its non-analytical behaviors15, i.e. cusps and fine structures related to the specific form of the single process that contributes to the tail. Indeed, while the central part of the probability distribution typically features universal and smooth shapes driven by central limit theorem arguments16, the big jump can give rise to non-universal effects since it involves a single process. The non-universal effects can be used in one direction, from the microscopic modelling towards an accurate prediction of the risk for rare events, but also in reverse order, that is to infer details of the microscopic underlying processes from the structure of the far tail.
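The statement for sums of independent heavy-tailed variables can be illustrated numerically. The following is a minimal sketch, not taken from the cited references: it assumes Pareto-distributed summands with survival function \(x^{-\alpha }\) for \(x\ge 1\), and the function names and parameter values are illustrative choices. Among the samples whose sum exceeds a large threshold, it measures the average fraction of the sum carried by the single largest term.

```python
import random


def sample_pareto(alpha, rng):
    """Inverse-transform sample with P(X > x) = x**(-alpha) for x >= 1 (assumed form)."""
    # 1 - rng.random() lies in (0, 1], which avoids a division by zero.
    return (1.0 - rng.random()) ** (-1.0 / alpha)


def big_jump_share(n_terms, alpha, threshold, n_samples, seed=0):
    """Return (number of sums above threshold, mean share of the largest term).

    The share is max(x_i) / sum(x_i), averaged only over the rare samples
    whose sum exceeds `threshold`.
    """
    rng = random.Random(seed)
    n_large = 0
    share_total = 0.0
    for _ in range(n_samples):
        xs = [sample_pareto(alpha, rng) for _ in range(n_terms)]
        total = sum(xs)
        if total > threshold:
            n_large += 1
            share_total += max(xs) / total
    mean_share = share_total / n_large if n_large else float("nan")
    return n_large, mean_share
```

For a threshold well above the typical value of the sum, the mean share is expected to be close to one, which is the content of the principle: the rare large sum is produced by one summand rather than by many moderate ones.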

In particular, the big jump principle was recently extended15,17,18 to case studies which involve Lévy walks. These are introduced as continuous time stochastic processes for particles performing steps with duration drawn from a power law, hence heavy-tailed, distribution19,20,21. Because of their generality, Lévy walks are applied to describe motion of cold atoms in laser cooling22, transport in turbulent flow23 and in neural transmission24, animal motion25,26, and natural and optimized search processes27. These systems all have in common a power law distribution for step durations and they can differ in how the walker moves along the steps, i.e. at constant velocity or with a more complex type of motion. In this framework, the typical quantity of interest is the particle position at fixed time, independently of the number of steps (draws). This introduces a non-trivial coupling mechanism between position and observation time, as the far tails of the position distribution are naturally cut off by the finite speed of propagation. Therefore Lévy walks depart from the simple case of summation of random variables and in particular the distribution of rare events presents cutoffs and other non-analytic features.

A generalized Lévy walk28,29, originally introduced for motion in turbulent fluids, has recently been considered to model complex motion in each single stretch. More precisely, the duration of a step t is drawn from a power law distribution \(\lambda (t) \sim {t}^{-1-\alpha }\), while the motion within a step is described by two further exponents: \(\nu \), relating the step length with the duration time t, and \(\eta \), which provides the temporal dynamics within a step, modelling acceleration and deceleration effects. Such a general description of the microscopic motion is suitable to deal with a wide class of Lévy walks and hence it can be applied to many model systems in the presence of complex trajectories25,26. Previous results28,29 focus on the calculation of the mean square displacement of the generalized Lévy walk as a function of time. Here, we describe the asymptotic time evolution of the entire walker probability distribution, which allows us to extract the behavior of correlations and higher moments.

First, we apply standard techniques in random walk theory to obtain the central part of the distribution and its scaling length. We show that the bulk of the distribution displays standard universal behaviors, i.e. a Gaussian distribution, a Lévy stable distribution or the distribution of continuous time random walks (CTRW), depending on the divergence or finiteness of the mean duration and the mean square length of the single step.

Then, by using the big jump principle, we characterize the tail of the probability distribution at distances much larger than the scaling length. We show that rare events are described by non-trivial functions, determined both by the duration distribution of the steps \(\lambda (t)\) and by the microscopic acceleration and deceleration along the step, so that the result depends on all the exponents \(\alpha \), \(\nu \) and \(\eta \). Remarkably, these non-universal distributions, which display non-analytic behaviors, are obtained from the general principle of single big jump, which provides a unique physical explanation of the process driving the rare events. We also highlight that for some values of \(\alpha \), \(\nu \) and \(\eta \) the motion within a step can be slower than the growth of the scaling length, so in this case the principle does not apply. As a final result, we also derive the scaling of all the moments of the distribution that, interestingly, feature strong anomalous diffusion30,31,32. All our analytical results are in very good agreement with extensive numerical simulations.

The paper is organized as follows: the section Results is divided into 4 parts. In the first one we discuss the single big jump principle. In the second, we discuss the generalized Lévy walk model28,29 and we describe the central part of the probability distribution. In the third part we apply the big jump principle to the generalized Lévy walk and we obtain the distribution of rare events, and in the last part we discuss the moments of the distribution. Comparisons with numerical simulations are shown along the sections, showing a very good agreement in the long time asymptotic limit. The section Methods is devoted to a discussion of the big jump principle in terms of a very general formulation which can be applied to a wide class of models. In the Supplementary Information (SI) we describe some of the details of our calculation. We end with our conclusions and final remarks.


The big jump principle

The big jump principle applies to systems where a rare fluctuation of a stochastic variable is driven by a single extreme event, which we call the big jump. We introduce the principle with the rate approach15, a heuristic formulation which allows for an easy extension beyond the standard case of a sum of independent and identically distributed random variables. The estimate is based on splitting the problem in two parts: the first one leads to the calculation of the jump rate, that is the rate at which the walker makes attempts to perform the big jump. The second part takes into account the dynamical evolution during the big jump.

We consider a dynamical stochastic process with random variables \({t}_{i}\) drawn from a broad distribution \(\lambda (t)\) at times \({T}_{i}\) (\({T}_{i} < {T}_{j}\) if \(i < j\)). The extraction time \({T}_{i}\) is also a random variable that can depend on \({t}_{j}\) (\(j < i\)), while in the simple case of the sum of IID random variables we simply have \({T}_{i}=i\). We are interested in the “global” stochastic variable \(R\) which in general depends on \({t}_{i}\) in a non-trivial way (\(R > 0\) for the sake of simplicity). We call \(P(R,T)\) the Probability Density Function (PDF) of measuring R at time T (see Methods for details). We focus on generalized Lévy walks where the event i is a jump, \({t}_{i}\) is the jump duration and R the particle position. In different stochastic processes, \({t}_{i}\) and R can have different interpretations (energies, masses...)33,34,35,36.

We consider a process where, at large T, P(R, T) can be split in two terms, one related to the central part of the distribution, describing typical values of the final position R, and the other related to the far tail at very large R, driven by rare events:

$$P(R,T)\sim \{\begin{array}{cc}{\ell }^{-1}(T)f(R/\ell (T)) & {\rm{i}}{\rm{f}}\,R < \ell (T)\kappa (T)\\ B(R,T) & {\rm{i}}{\rm{f}}\,R > \ell (T)\kappa (T)\end{array}$$

where \(\ell (T)\) is the characteristic length of the process and \(\kappa (T)\) is a slowly growing function of  \(T\) (e.g. a logarithmic function). Notice that at large \(T\), \(P(R,T)\) converges in probability to \({\ell }^{-1}(T)f(R/\ell (T))\), a function which is significantly different from zero only for values of the final position \(R < \ell (T)\kappa (T)\). However, \(B(R,T)\) describes \(P(R,T)\) for \(R\gg \ell (T)\), i.e. at distances much larger than the scaling length of the process. Therefore \(B(R,T)\) can be relevant in the calculation of higher moments of the distribution \(\langle {R}^{q}(T)\rangle ={\int }_{0}^{\infty }P(R,T){R}^{q}dR\) (\(q > 0\)), such as the mean square displacement \(q=2\), since:

$$\langle {R}^{q}(T)\rangle \sim {\int }_{0}^{\ell (T)\kappa (T)}{\ell }^{-1}(T)f(R/\ell (T)){R}^{q}dR+{\int }_{\ell (T)\kappa (T)}^{\infty }B(R,T){R}^{q}dR.$$

Here the first term can be subleading with respect to the second integral for \(q > {q}_{c}\), where \({q}_{c}\) is a critical order of the moments. This means that some moments of the process are influenced by the rare events30,32. \(B(R,T)\) is precisely the part of the distribution that we want to calculate with the big jump principle. In practice, \(B(R,T)\) describes the finite time deviations of \(P(R,T)\) from the bulk scaling function at large \(R\gg \ell (T)\), and this is what determines the anomalous moments of the distribution.

Since \(\lambda (t)\) does not depend on the jumping time, the probability to perform a jump of duration \(t\) at time \({T}_{w}\) is \({p}_{{\rm{tot}}}(t,{T}_{w})={n}_{R}({T}_{w})\cdot \lambda (t)\), where \({n}_{R}({T}_{w})\) is the jumping rate, that is \({n}_{R}(T)=d\langle N(T)\rangle /dT\), and \(\langle N(T)\rangle \) is the average number of jumps up to time \(T\). As we are considering \(R\gg \ell (T)\), according to the principle we suppose that the only important process that contributes to \(B(R,T)\) is the biggest jump and therefore we neglect all the jumps occurring before and after that. We call \({\mathcal{P}}(R|T,t,{T}_{w})\) the probability that a process, driven by the single jump of duration \(t\) starting at \({T}_{w}\), takes the walker to \(R\) at time \(T\ge {T}_{w}\). The big jump principle states that, as for \(R\gg \ell (T)\) the relevant part of the distribution is \(B(R,T)\), this can be determined as:

$$B(R,T)=\int dt{\int }_{0}^{T}d{T}_{w}{p}_{{\rm{t}}{\rm{o}}{\rm{t}}}(t,{T}_{w}){\mathcal{P}}(R|T,t,{T}_{w})$$

Hence, \(B(R,T)\) is evaluated by summing over all the paths (\(t\) and \({T}_{w}\)) that in a single jump bring the process to \(R\) at time \(T\). These paths, described by \({\mathcal{P}}(R|T,t,{T}_{w})\), can be very complex, as they include all the correlations and non-linearities of the model. However, since only one stochastic draw is involved, an analytic approach is often feasible (for further details see Methods). Notice that Eq. (3) provides an estimate of \(B(R,T)\) only for large \(R\), so in general \(B(R,T)\) can behave as an infinite density37, i.e. \(B(R,T)\) diverges at \(R=0\) so that \(\int dRB(R,T)=\infty \). Nevertheless, \(B(R,T)\) provides the correct expression for the asymptotic behavior of the moments \(\langle {R}^{q}(T)\rangle \) with large \(q\), since, according to Eq. (2), the factor \({R}^{q}\) cures the divergence at \(R=0\). Notice also that the hypothesis that a single big jump contributes to \(B(R,T)\) is crucial. If in a process it is not possible to reach \(R\gg \ell (T)\) with a single stochastic event, Eq. (3) does not apply and different approaches must be introduced38.
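The jumping rate \({n}_{R}(T)=d\langle N(T)\rangle /dT\) entering Eq. (3) can be estimated from a simple renewal simulation. The sketch below is illustrative only: it assumes step durations drawn from a normalized Pareto density \(\alpha {\tau }_{0}^{\alpha }/{t}^{1+\alpha }\) for \(t > {\tau }_{0}\), and the helper names `count_jumps` and `mean_duration` are hypothetical. For a finite mean duration, the number of completed jumps per unit time over a long window should approach \(1/\langle t\rangle \).

```python
import random


def count_jumps(T, alpha, tau0=1.0, seed=0):
    """Count the jumps completed before time T for Pareto-distributed durations."""
    rng = random.Random(seed)
    elapsed = 0.0
    n_jumps = 0
    while True:
        # Inverse-transform sample of a duration t >= tau0 (assumed normalized Pareto).
        elapsed += tau0 * (1.0 - rng.random()) ** (-1.0 / alpha)
        if elapsed > T:
            return n_jumps
        n_jumps += 1


def mean_duration(alpha, tau0=1.0):
    """Mean of the assumed normalized Pareto density; finite only for alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("the mean duration is infinite for alpha <= 1")
    return alpha * tau0 / (alpha - 1.0)
```

Comparing `count_jumps(T, alpha) / T` with `1 / mean_duration(alpha)` for large `T` gives a quick consistency check of the constant-rate regime; for \(\alpha < 1\) the same counter can be used to observe a rate that decreases with time.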

Generalized Lévy walks: microscopic dynamics and the bulk of the distribution

The generalized Lévy walk28,29 is a model of anomalous transport with acceleration and deceleration along the microscopic trajectories, an effect that is often encountered in experiments25,26. In this model, the stochastic variable \({t}_{i}\) drawn from the broad PDF \(\lambda ({t}_{i})\) defines the duration of the i-th step so that the draw i occurs at time \({T}_{i}=\mathop{\sum }\limits_{j=1}^{i-1}{t}_{j}\) (\({T}_{1}=0\)). As a typical example of broad distribution we take a power law \(\lambda (t)\) where for \(t > {\tau }_{0}\)

$$\lambda (t)=\frac{{\tau }_{0}^{\alpha }}{{t}^{1+\alpha }}$$

and \(\lambda (t)=0\) for \(t < {\tau }_{0}\). We define \(r(T)\) as the position of the walker and \(R(T)=|r(T)-r(0)|\) as its distance from the origin. The microscopic dynamics of the walker in the time interval \({T}_{i} < T < {T}_{i+1}\) is defined as:

$$r(T)=r({T}_{i})+{c}_{i}{t}_{i}^{\nu -\eta }{(T-{T}_{i})}^{\eta },$$

where \(\nu > 0\) and \(\eta > 0\) are the parameters describing the microscopic motion and the random “velocity” \({c}_{i}=\pm \,c\) is drawn with probability \(1/2\) in each step. According to Eq. (5) the step i starts in \(r({T}_{i})\) and stops in \(r({T}_{i})+{c}_{i}{t}_{i}^{\nu }=r({T}_{i+1})\) which defines the starting point of the \(i+1\) step. In this framework we call \({L}_{i}=c{t}_{i}^{\nu }\) the length of the step i.
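The rule in Eq. (5) translates directly into a simulation of the walker position at a fixed measurement time. The following is a minimal sketch under stated assumptions, not the code used for the figures: durations are sampled from a normalized Pareto density \(\alpha {\tau }_{0}^{\alpha }/{t}^{1+\alpha }\) (which differs from Eq. (4) as written by the constant factor \(\alpha \)), and the function name `walker_position` is a hypothetical choice.

```python
import random


def walker_position(T, alpha, nu, eta, c=1.0, tau0=1.0, rng=None):
    """Position r(T) of a generalized Levy walk started at r(0) = 0.

    Completed steps add +/- c * t**nu; the step in progress at time T
    contributes +/- c * t**(nu - eta) * (T - T_i)**eta, as in Eq. (5).
    """
    if rng is None:
        rng = random.Random()
    position = 0.0
    elapsed = 0.0  # starting time T_i of the current step
    while True:
        # Duration of the step (assumed normalized Pareto, t >= tau0).
        t = tau0 * (1.0 - rng.random()) ** (-1.0 / alpha)
        sign = 1.0 if rng.random() < 0.5 else -1.0
        if elapsed + t >= T:
            # The measurement time falls inside this step.
            return position + sign * c * t ** (nu - eta) * (T - elapsed) ** eta
        position += sign * c * t ** nu
        elapsed += t
```

Calling the function repeatedly with a shared `random.Random` instance yields independent realizations, from which a histogram of \(R=|r(T)|\) can be built. For \(\eta =\nu =1\) every realization satisfies \(|r(T)|\le cT\), which is a convenient sanity check.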

The generalized Lévy walks correspond to many different types of motions along the steps. If \(\eta < \nu \) the walker moves faster at the beginning of the step, then it slows down. Conversely, for \(\eta > \nu \) the motion starts at slow speed, then it speeds up (see Fig. 1). In particular, for \(\eta =0\) we recover the so called step-first dynamics21, where the particle reaches instantaneously \(r({T}_{i})+{c}_{i}{t}_{i}^{\nu }\) at time \({T}_{i}\) then it waits a time \({t}_{i}\) before the following step. On the other hand, for \(\eta =\infty \) this is the wait-first dynamics21, with the walker waiting a time \({t}_{i}\) in \(r({T}_{i})\) then suddenly moving to \(r({T}_{i})+{c}_{i}{t}_{i}^{\nu }\) just before the next step. The case \(\eta =\nu =1\) corresponds to standard Lévy walks19, which presents ballistic motion along the steps, while the case \(\eta =\nu \) has been studied recently in detail in39, where the distribution of the rare events has been evaluated using a moment resummation technique.

Figure 1

The big jump contributions. The jump starts at time \({T}_{w}\) and it can either lead you to the time horizon of the Lévy walk \(t > (T-{T}_{w})\), as in panel (a) or it may start and end before time T if \(t < (T-{T}_{w})\) as in panel (b). The orange line in the big jump represents the global motion after a jump \(c{T}^{\nu }\) (\(\nu =1\) in this case). In green we plot the motion of the walker. Within the jump we plot \(c{t}^{\nu -\eta }{(T-{T}_{w})}^{\eta }\) and continuous and dashed lines refer to \(\eta < \nu \) and to \(\eta > \nu \) respectively. The final position R is plotted in magenta. In panel (a) R depends on \(\eta \) and \(\nu \) while in panel (b) it is driven by the exponent \(\nu \) only.

If \(\lambda ({t}_{i})\) is given by Eq. (4), the step length \({L}_{i}\) is distributed as \(\tilde{\lambda }({L}_{i}) \sim {L}_{i}^{-1-\alpha /\nu }\). Hereafter, we define \(\langle t\rangle =\int dtt\lambda (t)\) as the average duration of a step and \(\langle {L}^{2}\rangle ={c}^{2}\langle {t}^{2\nu }\rangle ={c}^{2}\int dt{t}^{2\nu }\lambda (t)\) as the average square length of a jump; \(\langle t\rangle \) is finite for \(\alpha > 1\), \(\langle {L}^{2}\rangle \) is finite for \(\alpha > 2\nu \). Since at the end of the jump the length \({L}_{i}=c{t}_{i}^{\nu }\) is independent of \(\eta \), one can expect naively, as in standard transport theories, that the statistical properties of \(R\) will be \(\eta \) independent. However, for heavy-tailed processes, the dynamics in the time interval between the last jump and the measurement time are important and hence the final result will be sensitive to \(\eta \).
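The exponent of the step-length distribution quoted above follows from a change of variables. As a short sketch, using Eq. (4) as written and \(L=c{t}^{\nu }\), so that \(t(L)={(L/c)}^{1/\nu }\) and \(dt/dL=t/(\nu L)\), one finds for \(L > c{\tau }_{0}^{\nu }\):

$$\tilde{\lambda }(L)=\lambda (t(L))\left|\frac{dt}{dL}\right|=\frac{{\tau }_{0}^{\alpha }}{{t}^{1+\alpha }}\cdot \frac{t}{\nu L}=\frac{{\tau }_{0}^{\alpha }{c}^{\alpha /\nu }}{\nu }{L}^{-1-\alpha /\nu }.$$

The second moment of this density converges at large \(L\) only if \(\alpha /\nu > 2\), which is the condition \(\alpha > 2\nu \) stated above.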

Since \({t}_{i}\) can be arbitrarily large, also the generalized velocity \(c{t}_{i}^{\nu -\eta }\) for \(\nu > \eta \) is unbounded and the walker can reach arbitrarily large distances in an arbitrarily small time \(\delta t=T-{T}_{i}\). Conversely, for \(\eta \ge \nu \), in a time \(T\) the walker can reach a maximum distance \({l}_{{\rm{cone}}}(T)\), that we call the light cone of the walker. For \(\nu \ge 1\) the light cone can be reached in a single step and \({l}_{{\rm{cone}}}(T)=c{T}^{\nu }\). For \(\nu < 1\) the light cone can only be reached in many steps, all in the same direction and \({l}_{{\rm{cone}}}(T)=Tc{\tau }_{0}^{\nu -1}\).
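The three cases for the maximum reachable distance can be collected in a small helper. This is only an illustrative sketch of the rules stated in the paragraph above; the function name `light_cone` is hypothetical, and returning infinity for \(\eta < \nu \) is a convention chosen here to encode the absence of a light cone.

```python
import math


def light_cone(T, nu, eta, c=1.0, tau0=1.0):
    """Maximum distance reachable at time T, following the cases in the text."""
    if eta < nu:
        # Unbounded generalized velocity: no light cone.
        return math.inf
    if nu >= 1.0:
        # Reached with a single step of duration T.
        return c * T ** nu
    # Reached with T / tau0 shortest steps, each of length c * tau0**nu.
    return T * c * tau0 ** (nu - 1.0)


print(light_cone(100.0, 1.0, 1.0, c=2.0))  # single-step case
print(light_cone(100.0, 0.5, 1.0, c=2.0, tau0=4.0))  # many-step case
```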

The bulk behavior of the PDF \(P(R,T)\) can be evaluated showing that the following scaling form holds (see SI for details):

$$P(R,T) \sim \frac{f(R/\ell (T))}{\ell (T)}.$$

The scaling length \(\ell (T)\) grows as:

$$\ell (T)\sim \{\begin{array}{cc}{T}^{1/2} & {\rm{i}}{\rm{f}}\,\alpha > 2\nu \,{\rm{a}}{\rm{n}}{\rm{d}}\,\alpha > 1\\ {T}^{\nu /\alpha } & {\rm{i}}{\rm{f}}\,\alpha < 2\nu \,{\rm{a}}{\rm{n}}{\rm{d}}\,\alpha > 1\\ {T}^{\alpha /2} & {\rm{i}}{\rm{f}}\,\alpha > 2\nu \,{\rm{a}}{\rm{n}}{\rm{d}}\,\alpha < 1\\ {T}^{\nu } & {\rm{i}}{\rm{f}}\,\alpha < 2\nu \,{\rm{a}}{\rm{n}}{\rm{d}}\,\alpha < 1\end{array}$$

For \(\alpha > 2\nu \) and \(\alpha > 1\), the mean duration and the mean square length of the single step are finite so that the scaling function is Gaussian, independently of the value of the exponents \(\alpha \), \(\nu \) and \(\eta \), as shown in Fig. 2 panel (a). For \(\alpha < 2\nu \) and \(\alpha > 1\) the mean duration of a step is finite but the mean square length is infinite, we are in a super-diffusive regime and \(f(\,\cdot \,)\) is a Lévy stable function8,40 which only depends on the ratio \(\nu /\alpha \), as shown in Fig. 2 panel (b). Notice that in this case the exponent \(\alpha /\nu \) driving both the scaling length \(\ell (T)\) and the distribution \(f(\,\cdot \,)\) is exactly the exponent that describes the distribution of the jump \(L\) whose variance is infinite. For \(\alpha > 2\nu \) and \(\alpha < 1\) the mean square length is finite but the mean duration of a step is infinite, and in this case the motion is sub-diffusive and \(f(\,\cdot \,)\) only depends on \(\alpha \) and corresponds to the scaling function of CTRW with infinite waiting time40 (see Fig. 2 panel (c)). Finally, Fig. 2 panel (d) shows that for \(\alpha < 2\nu \) and \(\alpha < 1\), when the mean square length and the mean duration are both infinite, the scaling function is not universal and depends on the exponents \(\alpha \), \(\nu \) and \(\eta \). In particular, the tail of the scaling function for \(R/{T}^{\nu }\gg 1\) is a pure power law when \(\eta < \nu \) and in this case it can be evaluated using the big jump approach (dashed-line).
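The four regimes of Eq. (7) can be summarized as a lookup on the exponents. The sketch below only restates the cases discussed above; the function name `scaling_exponent` and the regime labels are illustrative, and the marginal cases \(\alpha =1\) and \(\alpha =2\nu \), which are not covered by Eq. (7), are rejected.

```python
def scaling_exponent(alpha, nu):
    """Return (exponent of l(T) ~ T**exponent, label of the bulk scaling function)."""
    if alpha == 1.0 or alpha == 2.0 * nu:
        raise ValueError("marginal case not covered by Eq. (7)")
    finite_mean_duration = alpha > 1.0
    finite_mean_square_length = alpha > 2.0 * nu
    if finite_mean_duration and finite_mean_square_length:
        return 0.5, "Gaussian"
    if finite_mean_duration:
        return nu / alpha, "Levy stable"
    if finite_mean_square_length:
        return alpha / 2.0, "CTRW with infinite mean waiting time"
    return nu, "non-universal"


for alpha, nu in [(1.6, 0.7), (1.6, 1.2), (0.8, 0.3), (0.7, 0.8)]:
    print(alpha, nu, scaling_exponent(alpha, nu))
```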

Figure 2

Scaling at short distances for the PDF in the generalized Lévy walk model. In panel (a) \(\alpha =1.6 > 1\) and \(\nu =0.7 < \alpha /2\): we obtain a diffusive behavior with a Gaussian scaling function. In panel (b) \(\alpha =1.6 > 1\) and \(\nu =1.2 > \alpha /2\). Here the scaling length grows super-diffusively as \({T}^{\nu /\alpha }\) and the scaling function is the Lévy function (see SI). In panel (c) \(\alpha =0.8 < 1\) and \(\nu =0.3 < \alpha /2\): there is sub-diffusion, the scaling length grows as \({T}^{\alpha /2}\) and the scaling function is the scaling function of CTRW with infinite waiting time, which is independent of \(\nu \) (see SI). In panel (d) \(\alpha =0.7 < 1\) and \(\nu =0.8 > \alpha /2\): the scaling described in SI is determined by the single step motion \(R \sim {T}^{\nu }\). The scaling function depends in a non-trivial way also on the exponents \(\nu \) and \(\eta \). The dashed line represents the result of the big jump approach in Formula (10).

Generalized Lévy walks and the big jump: tails and rare events

Let us now derive the tail \(B(R,T)\) by applying the big jump principle. According to Eq. (3), we have to find the rate of attempts for the big jump, and the form of all the processes that, in a single jump, bring the walker to \(R\gg \ell (T)\) at time T. We ignore the motion before and after the big jump, as the big jump is the only relevant contribution to the displacement. As shown in Fig. 1, two possible different contributions to \({\mathcal{P}}(R|T,t,{T}_{w})\) are present. In panel (a) \(t > (T-{T}_{w})\), the walker is still moving in the big jump at \(T\) and \(R=c{t}^{\nu -\eta }{(T-{T}_{w})}^{\eta }\). In panel (b) \(t < (T-{T}_{w})\), the walker completes the jump of duration \(t\) before \(T\) so that \(R=c{t}^{\nu }\). Since the big jump principle applies if \(R\gg \ell (T)\), in this second process we get \(c{T}^{\nu }\gtrsim c{t}^{\nu }=R\gg \ell (T)\). By comparing \(\nu \) with the characteristic exponent of \(\ell (T)\) in Eq. (7), we obtain that the path in panel (b) is relevant only for \(\alpha > 1\) and \(\nu > 1/2\). On the other hand, in the process of panel (a) for \(\nu > \eta \) the walker can reach arbitrarily large distances in any fixed time interval \(T-{T}_{w}\) and the process is always relevant. Finally, for \(\nu \le \eta \), for both processes in Fig. 1, we have that \(R \sim c{T}^{\nu }\), and they both provide a contribution to \({\mathcal{P}}(R|T,t,{T}_{w})\) only for \(\alpha > 1\) and \(\nu > 1/2\). This means that, for \(\nu \le \eta \), \(\alpha < 1\) and for \(\nu \le \eta \), \(\alpha > 1\), \(\nu < 1/2\) the walker cannot reach a distance larger than \(\ell (T)\) in a single step and Eq. (3) cannot be used to evaluate \(B(R,T)\).
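The case analysis above can be condensed into a small decision helper that lists which of the two processes of Fig. 1 contribute to \(B(R,T)\). This is an illustrative restatement of the conditions in the text, with a hypothetical function name; the marginal values \(\alpha =1\) and \(\nu =1/2\) are not discussed in the text and are treated here as belonging to the non-contributing side.

```python
def big_jump_processes(alpha, nu, eta):
    """Return the Fig. 1 panels ("a" ongoing jump, "b" completed jump) that contribute.

    An empty list means that a single jump cannot reach R >> l(T),
    so Eq. (3) cannot be used.
    """
    # A jump with R ~ c * T**nu exceeds the scaling length only in this regime.
    reaches_beyond_bulk = alpha > 1.0 and nu > 0.5
    # For eta < nu the ongoing jump can reach arbitrarily large distances.
    ongoing_jump = eta < nu or reaches_beyond_bulk
    completed_jump = reaches_beyond_bulk
    panels = []
    if ongoing_jump:
        panels.append("a")
    if completed_jump:
        panels.append("b")
    return panels


print(big_jump_processes(1.6, 0.7, 0.3))
print(big_jump_processes(1.2, 0.4, 0.9))
```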

Let us first consider the case \(\alpha > 1\) and \(\nu > 1/2\) when both processes in Fig. 1 are relevant. In the SI we show that these processes can be simply encoded into the function \({\mathcal{P}}(R|T,L,{T}_{w})\). Moreover since \(\alpha > 1\), the jump rate is constant (\({n}_{R}({T}_{w})={\langle t\rangle }^{-1}\)) and \({p}_{{\rm{tot}}}(t,{T}_{w})=\lambda (t)/\langle t\rangle \). Then we plug \({p}_{{\rm{tot}}}(t,{T}_{w})\) and the explicit expression for \({\mathcal{P}}(R|T,L,{T}_{w})\) into the formula (3) so we obtain the explicit scaling form of \(B(R,T)\):

$$B(R,T)=\frac{1}{{T}^{\alpha -1+\nu }}F(\frac{R}{c{T}^{\nu }})$$

The scaling length at large distance grows as \(c{T}^{\nu }\). The non-universal scaling function \(F(x)\) can be explicitly evaluated (see SI); it depends on the exponents \(\alpha \), \(\nu \) and \(\eta \) and it is non-analytic at \(x=1\). The case \(\eta =\nu \) with \(2\nu > \alpha \) has recently been studied in39 and the far tails of the distribution have been obtained using a moment resummation technique. The tail of standard Lévy walks \(\eta =\nu =1\) has been discussed within various approaches37,41.

In Fig. 3, panels (a,b), for \(\alpha > 1\) and \(\nu > 1/2\), we plot the far tail of \(P(R,T)\) as a function of \(R/(c{T}^{\nu })\) and compare the analytic predictions with finite time simulations. In the long time limit, the densities fully agree with the big jump formalism. We remark that we used the same data as in panels (a,b) of Fig. 2, introducing only a different scaling procedure. In particular, the figure shows the singularities in the distribution when \(R/(c{T}^{\nu })=1\) and the different behaviors when \(\nu > \eta \), \(\nu =\eta \) and \(\nu < \eta \) respectively.

Figure 3

The far tails of the distributions \(P(R,T)\) for \(\alpha =1.6 > 1\) and \(\nu > 1/2\): \(\nu =0.7\) and \(\nu =1.2\) in panels (a,b) respectively. The thick lines represent the theoretical value of the scaling function \(F(x)\) explicitly calculated in the SI. The plot shows the singular behavior of the scaling function when \(x=R/(c{T}^{\nu })=1\). Different behaviors are present for \(\eta < \nu \), \(\eta > \nu \) and \(\eta =\nu \) respectively. For very small values of \(\eta \) the cusp singularity in the distribution becomes barely visible.

In the case \(\alpha > 1\), \(\nu < 1/2\) and \(\eta < \nu \) only the first process in Fig. 1 allows the walker to reach distances larger than \(\ell (T)\). Moreover since \(\alpha > 1\) and \(\langle t\rangle \) is finite we have \({p}_{{\rm{tot}}}(t,{T}_{w})=\lambda (t)/\langle t\rangle \). So we obtain (see SI):

$$B(R,T)=\frac{{T}^{\frac{\alpha \eta }{\nu -\eta }+1}{c}^{\frac{\alpha }{\nu -\eta }}{\tau }_{0}^{\alpha }}{\langle t\rangle (\nu +(\alpha -1)\eta ){R}^{1+\frac{\alpha }{\nu -\eta }}}$$
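The power-law tail in Eq. (9) can be cross-checked by evaluating Eq. (3) numerically. The sketch below rests on an assumption stated here rather than in the main text: for the ongoing-jump process, \({\mathcal{P}}(R|T,t,{T}_{w})\) is taken to be a delta function enforcing \(R=c{t}^{\nu -\eta }{(T-{T}_{w})}^{\eta }\), so that the integral over \(t\) reduces to a Jacobian factor. The function names are hypothetical, \(\lambda (t)\) is used as written in Eq. (4), and \(\langle t\rangle \) is passed as a plain parameter.

```python
def tail_eq9(R, T, alpha, nu, eta, c, tau0, mean_t):
    """Closed form of Eq. (9)."""
    k = alpha / (nu - eta)
    numerator = T ** (k * eta + 1.0) * c ** k * tau0 ** alpha
    return numerator / (mean_t * (nu + (alpha - 1.0) * eta) * R ** (1.0 + k))


def tail_numeric(R, T, alpha, nu, eta, c, tau0, mean_t, n_grid=5000):
    """Midpoint-rule evaluation of Eq. (3) for the ongoing-jump process (eta < nu)."""
    h = T / n_grid
    total = 0.0
    for i in range(n_grid):
        s = (i + 0.5) * h  # time spent inside the big jump, s = T - T_w
        # Duration t solving R = c * t**(nu - eta) * s**eta.
        t_star = (R / (c * s ** eta)) ** (1.0 / (nu - eta))
        # The jump must still be in progress at T and be an allowed duration.
        if t_star > max(s, tau0):
            lam = tau0 ** alpha / t_star ** (1.0 + alpha)  # Eq. (4)
            jacobian = t_star / ((nu - eta) * R)  # |dt/dR| from the delta function
            total += lam * jacobian * h
    return total / mean_t
```

With these assumptions the two functions should agree whenever \(R\) is large enough that every required duration exceeds \(T\), which is the regime where Eq. (9) is meant to hold.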

Also for \(\alpha < 1\) and \(\eta < \nu \) only the process in panel (a) provides a contribution. For \(\alpha < 1\), the average duration of a step is infinite and the jump rate is not constant. In particular, the jump rate decays with time as \({n}_{R}({T}_{w})={C}_{\alpha }{T}_{w}^{\alpha -1}/{\tau }_{0}^{\alpha }\) (the numerical constant \({C}_{\alpha }\) depends on \(\alpha \) only). So we obtain (see SI):

$$B(R,T)=\frac{{T}^{\frac{\nu \alpha }{\nu -\eta }}{c}^{\frac{\alpha }{\nu -\eta }}{D}_{\alpha }}{(\nu -\eta ){R}^{1+\frac{\alpha }{\nu -\eta }}}$$

where \({D}_{\alpha }\) depends on \(\alpha \) only. Since for \(R\gg \ell (T)\gg c{T}^{\nu }\), no characteristic length is present in the system, Eqs. (9) and (10) are pure power-law (scale-free) functions decaying as \({R}^{-(1+\frac{\alpha }{\nu -\eta })}\).

Figure 4, panels (a,b), shows that, for \(\alpha > 1\), \(\nu < 1/2\) and for \(\alpha < 1\), \(\nu < \alpha /2\), Eqs. (9) and (10) describe well the distributions at \(R\gg \ell (T)\), if \(\eta < \nu \) (dashed lines). In the regime \(\alpha < 1\), \(\nu > \alpha /2\), \(\eta < \nu \), Fig. 2, panel (d), shows that the tail in Eq. (10) perfectly matches the short distance scaling function. Notice that in this last case Eq. (10) can be rewritten as in Eqs. (6) and (7), i.e. introducing the scaling length \(\ell (T) \sim {T}^{\nu }\) and obtaining the same \(T\) dependent pre-factor, i.e. \(B(R,T) \sim {T}^{-\nu }{(R/{T}^{\nu })}^{-1-\frac{\alpha }{\nu -\eta }}\). This perfect matching means that for \(\alpha < 1\), \(\nu > \alpha /2\) and \(\eta < \nu \), Eq. (6) holds also for \(R\gg \ell (T)\); its behavior for \(R\gg \ell (T)\) can nevertheless be evaluated with the single big jump approach.

Figure 4

Far tails of the distributions \(P(R,T)\) for \(\alpha =1.2 > 1\) and \(\nu =0.4 < 0.5\) (panel (a)) and \(\alpha =0.8 < 1\), \(\nu =0.3 < \alpha /2\) (panel (b)). The thick lines represent the big jump predictions when \(\eta < \nu \) in formula (9) and (10) for the left and right panel respectively. The plot shows the singular behavior of the scaling function when \(x=R/(c{T}^{\nu })=1\) and the different results when \(\eta < \nu \), \(\eta > \nu \) and \(\eta =\nu \) respectively. For \(\eta \ge \nu \) the figure shows that the bulk scaling function seems to describe the distribution even for \(R > \ell (T)\). In this case, indeed, Eq. (3) does not apply but the light cone grows much faster than \(\ell (T)\). Therefore deviations at large distances are not given by a single process but by the contribution of many steps in the same direction, which is an exponentially suppressed process that is very difficult to observe.

For \(\alpha < 1\), \(\eta \ge \nu \) and \(\eta \ge \nu \), \(\alpha > 1\) with \(\nu < 1/2\) a single process cannot reach a distance larger than \(\ell (T)\) and Eq. (3) does not apply. In particular the power law tails in Eqs. (9) and (10) cannot be observed, as shown in Fig. 2, panel (d), and in Fig. 4, panels (a,b). A summary of the scaling for the bulk and the tails in the whole range of exponents is shown in Table 1.

Table 1 A summary of the scaling behavior of bulk and tails for the PDF \(P(R,T)\) when \(\alpha > 1\) and when \(\alpha < 1\).

We can compare the results of the tail in Table 1 with the conditions for the light cone. For \(\eta < \nu \) there is no light cone, so \(B(R,T)\) describes the behavior of the tail at arbitrarily large distances. When \(\eta \ge \nu \) and \(\alpha > 1\) and \(\nu > 1\), the tail \(B(R,T)\) exactly vanishes at the light cone \({l}_{\mathrm{cone}}(T)=c{T}^{\nu }\). For \(\alpha > 1\) and \(1/2 < \nu < 1\), \(B(R,T)\) vanishes at \(R=c{T}^{\nu }\). However in this case the particle can reach larger distances (\({l}_{\mathrm{cone}}(T) \sim T\)) with multiple steps. Clearly these processes are exponentially suppressed, and this means that in the simulations of Fig. 3 panel (a), for \(\eta =3\) we observe events reaching a distance larger than \(c{T}^{\nu }\) at time \(T\), but these events become extremely rare when increasing \(T\). When the big jump does not apply, two cases are possible: for \(\alpha < 1\) and \(\nu \ge 1\), the light cone of the walker is determined by a single step, \({l}_{\mathrm{cone}}(T)=c{T}^{\nu }\), and trivially \(B(R,T)=0\) since it is impossible to go farther than \(c{T}^{\nu } \sim \ell (T)\) (as in the case of the standard Lévy walks for \(\alpha < 1\) where \({l}_{\mathrm{cone}}(T)=cT\)). In the other cases, the light cone is reached in a large number of coherent steps all in the same direction and \({l}_{\mathrm{cone}}(T) \sim T\gg \ell (T)\), whereas the single jump cannot go farther than \(\ell (T)\). In this case we expect \(B(R,T)\) to be exponentially suppressed and not described by Eq. (3).

The moments of the distribution

We now study the moments of the distribution of \(R\), which are related to quantities typically measured in experiments. We introduce the exponents \(\gamma (q)\) defined as \(\langle {R}^{q}(T)\rangle \sim {T}^{\gamma (q)}\). If \(\gamma (q)\) is not simply proportional to \(q\), this is what is called strongly anomalous diffusion30,32. Here \(\gamma (q)\) is evaluated taking into account the dominant term in Eq. (2) in the different regimes of Table 1. Notice that, for \(\eta < \nu \), \(B(R,T)\) decays at large \(R\) as \({R}^{-1-\frac{\alpha }{\nu -\eta }}\), therefore, for \(q > \alpha /(\nu -\eta )\), the second integral in Eq. (2) and the relevant moments are infinite. In this case, as we show in Figs. 5 and 6, the numerical value of \(\langle {R}^{q}(T)\rangle \) depends on the number of realizations \({N}_{R}\) that we average in the simulation. In particular, \(\langle {R}^{q}(T)\rangle \) diverges for \({N}_{R}\to \infty \), displaying at the same time very large fluctuations.

Figure 5

Moments of the distribution for \(\alpha =1.6\), \(\nu =1.2\) and \(\eta =0.5\). Panel (a): \(\langle {R}^{q}(T)\rangle \) as a function of \(T\) in the three regimes \(q < \alpha /\nu \) (\(q=1\)), \(\alpha /\nu < q < \alpha /(\nu -\eta )\) (\(q=1.5\)) and \(q > \alpha /(\nu -\eta )\) (\(q=2.5\)). Different symbols correspond to a different number of averages \({N}_{R}\). Continuous lines are the theoretical prediction \(\langle {R}^{q}(T)\rangle \sim {T}^{\gamma (q)}\) according to Eq. (11). In the first two regimes the symbols are perfectly superimposed and the results are independent of \({N}_{R}\). For \(q > \alpha /(\nu -\eta )\) instead the results depend on the number of realizations that we are averaging. In general, \(\langle {R}^{q}(T)\rangle \) increases with \({N}_{R}\) but large fluctuations are present. In panel (b) we plot the fitted exponent \(\gamma \) as a function of the moment \(q\). The three regimes \(q < \alpha /\nu \), \(\alpha /\nu < q < \alpha /(\nu -\eta )\) and \(q > \alpha /(\nu -\eta )\) are shown. The theoretical result is well fitted but strong pre-asymptotic effects are present close to the transition points between the different regimes.

Figure 6

Panel (a): \(\langle {R}^{q}(T)\rangle \) as a function of \(T\) for \(\alpha =1.2\), \(\nu =0.4\), \(\eta =0.1\) or \(\eta =0.9\) and \(q=4.7\) or \(q=3.3\). For \(\eta < \nu \) and \(q > \alpha /(\nu -\eta )\) (\(\eta =0.1\) and \(q=4.7\)) the moments are infinite and the simulations show a strong dependence on the number of dynamical realizations \({N}_{R}\). In the other cases the results are independent of \({N}_{R}\) and the theoretical prediction \(\langle {R}^{q}(T)\rangle \sim {T}^{q/2}\) (continuous lines) asymptotically fits the simulations. Panel (b): the fitted exponent \(\gamma \) as a function of the moment \(q\).

In Fig. 5 we consider the super-diffusive regime \(\alpha > 1\), \(\nu > \alpha /2\) and \(\nu > \eta \) where:

$$\gamma (q)=\left\{\begin{array}{ll}q\nu /\alpha & {\rm{if}}\,\,q < \alpha /\nu \\ q\nu -\alpha +1 & {\rm{if}}\,\,\alpha /\nu < q < \alpha /(\nu -\eta )\\ \infty & {\rm{if}}\,\,q > \alpha /(\nu -\eta )\end{array}\right.$$

Therefore the system displays strong anomalous diffusion30,31. In panel (a) of Fig. 5 we plot \(\langle {R}^{q}(T)\rangle \) and we show that, when \(\langle {R}^{q}(T)\rangle \) diverges, the results indeed depend on the number of realizations \({N}_{R}\) used to obtain the average. In panel (b) we plot the function \(\gamma (q)\) and we show that, far from the critical values of \(q\), close to which preasymptotic effects are expected to be stronger, simulations display good agreement with the theoretical values in Eq. (11).
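The piecewise exponent of Eq. (11) can be transcribed directly. The following is a minimal sketch; the function name `gamma_superdiffusive` is hypothetical, and the values returned exactly at the two thresholds are a convention chosen here, since Eq. (11) is stated only for strict inequalities.

```python
import math

def gamma_superdiffusive(q, alpha, nu, eta):
    """Exponent gamma(q) of Eq. (11), for alpha > 1, nu > alpha/2, nu > eta."""
    if not (alpha > 1 and nu > alpha / 2 and nu > eta):
        raise ValueError("parameters outside the regime of Eq. (11)")
    q_low = alpha / nu            # lower threshold, alpha / nu
    q_high = alpha / (nu - eta)   # upper threshold, alpha / (nu - eta)
    if q < q_low:
        return q * nu / alpha
    if q < q_high:
        # The two finite branches coincide at q = alpha / nu (both give 1).
        return q * nu - alpha + 1
    # Convention of this sketch: q == q_high is assigned to the infinite branch.
    return math.inf

if __name__ == "__main__":
    # Parameters of Fig. 5: thresholds alpha/nu = 4/3 and alpha/(nu-eta) = 16/7.
    for q in (1.0, 1.5, 2.5):
        print(q, gamma_superdiffusive(q, alpha=1.6, nu=1.2, eta=0.5))
```

For the parameters of Fig. 5 the thresholds are \(\alpha /\nu \approx 1.33\) and \(\alpha /(\nu -\eta )\approx 2.29\), so that \(q=1\), \(q=1.5\) and \(q=2.5\) fall in the three different regimes, with \(\gamma =0.75\), \(\gamma =1.2\) and a divergent moment, respectively.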

In Fig. 6, panels (a,b), we consider \(\alpha > 1\) and \(\nu < \alpha /2\). For \(\eta < \nu \) the analytical calculation based on Eq. (2) gives \(\gamma (q)=q/2\) if \(q < \alpha /(\nu -\eta )\), while \(\gamma (q)\) diverges if \(q > \alpha /(\nu -\eta )\). On the other hand, for \(\eta \ge \nu \) we get \(\gamma (q)=q/2\) for any value of \(q\) and strong anomalous diffusion is not present. We remark that this is a general feature of the regimes where the big jump cannot be applied and the far tail is exponentially suppressed. Figure 6 confirms that the simulations agree with the analytical predictions and that, in the divergent regime, the averaged moments depend on the number of dynamical realizations used in the average.

In general, therefore, the big jump approach via Eq. (2) is an effective tool for the calculation of anomalous exponents. Moreover, strong anomalous diffusion seems to be a general feature of systems where the big jump provides a significant contribution to the tail of \(P(R,T)\).


The single big jump principle provides an interesting and effective insight into the origin of rare events in heavy-tailed processes. The principle provides both a physical interpretation of the mechanism that drives large fluctuations and a direct tool for calculation. In practice, it works whenever we deal with a process where only one event contributes to the far tail, that is, when a single jump takes our physical quantity \(R\) to a value that is well beyond the scaling length of the process. Although derived within a heuristic scheme, the principle in the rate approach appears to be extremely effective in predicting the form of the tails; a rigorous derivation remains an open question.

We have here applied the principle to derive the exact form of the tail of the distribution in a class of generalized Lévy walks, a stochastic process that models anomalous transport in the presence of complex dynamics in the single step taken by the walker, which is subject to acceleration and deceleration effects. The dynamics in the steps gives rise to a variety of shapes and behaviors for the PDF, summarized in Table 1. Interestingly, the single step dynamics is shown to strongly influence the form of the tail. We are therefore in a situation where, while the bulk of the distribution features the usual universality properties of central limit theorems, the tail is sensitive to the details of the single step dynamics, because the single step is what drives the rare events.

The big jump approach and the rate calculation can be applied well beyond the Lévy walk models considered in this paper and well beyond quantities that represent random walkers, sums of steps and particle positions. Our result opens new possibilities to use rare events to obtain information on the microscopic dynamics and to take a fresh look at real datasets of single trajectories in systems exhibiting heavy-tailed statistics. In particular, we expect the generalized Lévy walk to be applicable to a wide range of settings where deceleration and acceleration effects are relevant along the microscopic trajectories, as in contamination spreading and in complex active transport in the cell26,42.

An open point is how to deal with processes where single rare events provide a non-trivial contribution to the distribution also at shorter distances38, as happens in the case of the standard Lévy walk for \(\alpha < 1\). The extension of the results to higher dimensions43 is also an open question.


Consider a stochastic process where the variables \({t}_{i}\) (\(i=1,2,\ldots \)) are drawn from the distribution \(\lambda (t)\) at times \({T}_{i}\) with \({T}_{i} < {T}_{j}\) if \(i < j\). The time \({T}_{i}\) is, in general, a stochastic variable which can depend, according to the model, also on the draws occurring before \({T}_{i}\), i.e. on \({t}_{1},\ldots ,{t}_{i-1}\). A general expression for the PDF to measure the quantity \(R\) at time \(T\) is:

$$P(R,T)=\int \prod _{i}\,d{t}_{i}\lambda ({t}_{i}){\mathscr{F}}(R|T,\{{t}_{i}\})$$

where \({\mathscr{F}}(R|T,\{{t}_{i}\})\) is the probability of measuring \(R\) at time \(T\) given the sequence of random variables \(\{{t}_{i}\}\).

Equation (12) is very general and it is suitable to describe processes with complex dynamical correlations, with \({\mathscr{F}}(R|T,\{{t}_{i}\})\) being a highly non trivial function13,15,17.

We first discuss the explicit form of \({\mathscr{F}}(R|T,\{{t}_{i}\})\) for the generalized Lévy walk28,29. We notice that only the first \(n\) steps, with \({T}_{n} < T < {T}_{n+1}\), provide a contribution to the process, so we can rewrite \({\mathscr{F}}(R|T,\{{t}_{i}\})\) as

$$\begin{array}{ccc}{\mathscr{F}}(R|T,\{{t}_{i}\}) & = & \mathop{\sum }\limits_{n=1}^{{\rm{\infty }}}\theta (T-{T}_{n})\theta ({T}_{n+1}-T)\int \mathop{\prod }\limits_{i=1}^{n}d{c}_{i}\frac{1}{2}(\delta ({c}_{i}-c)\\ & & \,+\,\delta ({c}_{i}+c))\delta (R-\mathop{\sum }\limits_{i=1}^{n-1}{c}_{i}{t}_{i}^{\nu }-{t}_{n}^{\nu -\eta }{(T-{T}_{n})}^{\eta })\end{array}$$

where \(\theta (\,\cdot \,)\) is the Heaviside function. So we obtain for Eq. (12):

$$\begin{array}{rcl}P(R,T) & = & \mathop{\sum }\limits_{n=1}^{\infty }\,\int \mathop{\prod }\limits_{i=1}^{i < n}\,d{t}_{i}\lambda ({t}_{i})\theta (T-\mathop{\sum }\limits_{i=1}^{n-1}\,{t}_{i})\theta (\mathop{\sum }\limits_{i=1}^{n}\,{t}_{i}-T)\int \mathop{\prod }\limits_{i=1}^{n}\,d{c}_{i}\frac{1}{2}(\delta ({c}_{i}-c)\\ & & \,+\,\delta ({c}_{i}+c))\delta (R-\mathop{\sum }\limits_{i=1}^{n-1}{c}_{i}{t}_{i}^{\nu }-{t}_{n}^{\nu -\eta }(T-\mathop{\sum }\limits_{i=1}^{n-1}{t}_{i}{)}^{\eta })\end{array}$$

Notice that in Eq. (14) \(P(R,T)\) is written as the sum of a series and each term of the series is given by an integral over a finite number \(n\) of random variables. This is a general property, since only processes occurring at times \({T}_{n} < T\) can affect the measurement of the quantity \(R\) at time \(T\). Let us consider again the general process in Eq. (12), where the \({t}_{i}\) are generic random variables drawn at times \({T}_{i}\). We call \({w}_{n}({t}_{1},\ldots ,{t}_{n},T)\) the probability that \({T}_{n} < T < {T}_{n+1}\) given the sequence of random variables \({t}_{1},\ldots ,{t}_{n}\). Moreover, we define \({{\mathscr{F}}}_{n}(R|T,{t}_{1},\ldots ,{t}_{n})\) as the PDF to measure \(R\) at time \(T\) given the random variables \({t}_{1},\ldots ,{t}_{n}\) and knowing that the variable \({t}_{n}\) has been drawn before \(T\) and the variable \({t}_{n+1}\) has been drawn after \(T\). We have

$$P(R,T)=\mathop{\sum }\limits_{n=1}^{\infty }\int \mathop{\prod }\limits_{i=1}^{i < n}\,d{t}_{i}\lambda ({t}_{i}){w}_{n}({t}_{1},\ldots ,{t}_{n},T){{\mathscr{F}}}_{n}(R|T,{t}_{1},\ldots ,{t}_{n})$$

Comparing Eqs. (14) and (15) we have that \({w}_{n}({t}_{1},\ldots ,{t}_{n},T)=\theta (T-\mathop{\sum }\limits_{i=1}^{n-1}\,{t}_{i})\theta (\mathop{\sum }\limits_{i=1}^{n}\,{t}_{i}-T)\), i.e. the probability is one if \(\mathop{\sum }\limits_{i=1}^{n-1}\,{t}_{i} < T < \mathop{\sum }\limits_{i=1}^{n}\,{t}_{i}\) and zero otherwise, and

$${{\mathscr{F}}}_{n}(R|T,{t}_{1},\ldots ,{t}_{n})=\int \mathop{\prod }\limits_{i=1}^{n}d{c}_{i}\,\frac{1}{2}\,(\delta ({c}_{i}-c)+\delta ({c}_{i}+c))\delta (R-\mathop{\sum }\limits_{i=1}^{n-1}{c}_{i}{t}_{i}^{\nu }-{t}_{n}^{\nu -\eta }{(T-\mathop{\sum }\limits_{i=1}^{n-1}{t}_{i})}^{\eta })$$
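The structure of \({w}_{n}\) and \({{\mathscr{F}}}_{n}\) above translates into a simple sampling procedure for \(R\) at time \(T\). The following is a minimal sketch under stated assumptions, not the code used for the simulations in the figures: \(\lambda (t)\) is taken to be a Pareto density \(\alpha {t}_{0}^{\alpha }/{t}^{1+\alpha }\) for \(t > {t}_{0}\), and the sign \({c}_{n}\) is applied also to the last, incomplete step. The names `sample_position`, `t0` and `rng` are illustrative.

```python
import numpy as np

def sample_position(T, alpha, nu, eta, c=1.0, t0=1.0, rng=None):
    """Draw one value of R at time T for the generalized Levy walk.

    Assumptions (not fixed by the text): lambda(t) is a Pareto density
    alpha * t0**alpha / t**(1 + alpha) for t > t0, and the sign c_n also
    multiplies the last, incomplete step.
    """
    rng = np.random.default_rng() if rng is None else rng
    R, T_n = 0.0, 0.0  # current position and time T_n of the current draw
    while True:
        # Duration t_n from the Pareto density by inverse-transform sampling.
        t = t0 * (1.0 - rng.random()) ** (-1.0 / alpha)
        sign = c if rng.random() < 0.5 else -c  # c_n = +c or -c
        if T_n + t > T:
            # Step n is still in progress at time T: partial displacement
            # t_n**(nu - eta) * (T - T_n)**eta.
            return R + sign * t ** (nu - eta) * (T - T_n) ** eta
        R += sign * t ** nu  # a completed step contributes c_i * t_i**nu
        T_n += t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = [sample_position(100.0, 1.6, 1.2, 0.5, rng=rng) for _ in range(5)]
    print(samples)
```

Note that the partial displacement reduces to \({t}_{n}^{\nu }\) when \(T-{T}_{n}={t}_{n}\), so a step that has just been completed contributes the same length as in the sum over the previous steps.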

Moreover, we can now write a simple general definition of \(\langle N(T)\rangle \) that is the average number of draws up to time \(T\), i.e.

$$\langle N(T)\rangle =\mathop{\sum }\limits_{n=1}^{\infty }n\int \mathop{\prod }\limits_{i=1}^{i < n}d{t}_{i}\lambda ({t}_{i}){w}_{n}({t}_{1},\,\ldots ,\,{t}_{n},\,T)$$
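For the generalized Lévy walk, where \({w}_{n}\) is the indicator that \({T}_{n} < T < {T}_{n+1}\), the average number of draws can be estimated by direct sampling. The following is a minimal sketch, assuming again a Pareto density with lower cutoff \({t}_{0}\) for \(\lambda (t)\); the names `count_draws` and `mean_draws` are hypothetical. The comparison with \(T/\langle t\rangle \) quoted afterwards is the standard renewal-theory estimate for \(\alpha > 1\) and is not a result taken from the text.

```python
import numpy as np

def count_draws(T, alpha, t0=1.0, rng=None):
    """Number n of draws made up to time T, i.e. the n with T_n < T < T_{n+1}.

    Assumption: durations follow a Pareto density with tail exponent alpha
    and lower cutoff t0; draw i occurs at T_i = t_1 + ... + t_{i-1}.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, elapsed = 0, 0.0  # elapsed is T_{n+1}, the time of the next draw
    while elapsed < T:
        elapsed += t0 * (1.0 - rng.random()) ** (-1.0 / alpha)
        n += 1
    return n

def mean_draws(T, alpha, n_realizations, t0=1.0, rng=None):
    """Monte Carlo estimate of <N(T)> over n_realizations trajectories."""
    rng = np.random.default_rng() if rng is None else rng
    counts = [count_draws(T, alpha, t0, rng) for _ in range(n_realizations)]
    return float(np.mean(counts))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha, T = 3.0, 300.0
    mean_t = alpha / (alpha - 1.0)  # <t> for the Pareto density with t0 = 1
    print(mean_draws(T, alpha, 500, rng=rng), T / mean_t)
```

For \(\alpha > 1\) the mean duration \(\langle t\rangle \) is finite and the estimate is expected to approach \(T/\langle t\rangle \) at large \(T\), up to corrections that do not grow with \(T\) when the variance of the durations is finite.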

Here, we have considered the generalized Lévy walk and we have provided a heuristic expression for \({\mathscr{P}}(R|T,t,{T}_{w})\); analogous results have been obtained in ref. 15 for different models, such as the Lévy Lorentz gas. A fundamental open question is to find a general procedure to obtain \({\mathscr{P}}(R|T,t,{T}_{w})\) given the stochastic process described by the observable \(R\) and the function \({\mathscr{F}}(R|T,\{{t}_{i}\})\) in Eq. (12).