Introduction

Earthquakes occur in brittle regions of the crust characterized by velocity-weakening friction, which is at the origin of stick-slip behavior. The distribution of friction along the fault plane is highly heterogeneous, with strong spots usually called asperities1. Asperities are expected to be surrounded by weak zones whose rheology is better described by velocity-strengthening friction. When the stress accumulated around the hypocenter overcomes the local friction, an abrupt slip takes place and stress is redistributed to the surrounding regions. The stress redistribution along the brittle, velocity-weakening part of the crust triggers other earthquakes, the aftershocks. Aftershocks follow well-established empirical laws that can be written as power laws with fairly universal exponents2. In particular, the aftershock rate exhibits a roughly hyperbolic decay with the time elapsed since the mainshock, an empirical law known as the Omori–Utsu law3.

At the same time, the stress redistributed by the mainshock in velocity-strengthening regions induces slow deformation, commonly called afterslip. Under the hypothesis4,5 that the seismicity rate λ(t) is proportional to the afterslip rate, several features of aftershock occurrence are reproduced4,5,6,7,8,9,10. In ref. 11, we demonstrated this proportionality in a model with only two elastically coupled degrees of freedom. The first described the fault displacement, with a heterogeneous velocity-weakening friction, while the second corresponded to the displacement of the ductile region, with a velocity-strengthening friction. This very simple description can model different tectonic contexts and suggests that the coupling with a velocity-strengthening layer and the heterogeneity of fault friction are the two key ingredients controlling aftershock triggering. The same two ingredients are central to the pre-slip hypothesis12,13,14, according to which small earthquakes, usually called foreshocks15,16, are expected to anticipate the mainshock. According to this hypothesis, because of friction heterogeneity, small regions of the fault with lower strength than the fault as a whole can break before it, in the presence of an underlying slow-deformation process. This mechanism can produce an increase of seismic activity as the mainshock occurrence time approaches but, because foreshocks are few, this increase is very difficult to identify17,18,19,20. Nevertheless, detailed investigations before some recent large earthquakes have revealed foreshocks accompanied by a phase of slow slip on the plate interface21,22,23. Other precursory patterns are observed when one considers the spatial distribution of foreshocks24,25,26,27 and/or their magnitude distribution28,29,30,31.
In particular, Gulia and Wiemer31 have recently shown that the magnitude distribution during aftershock activity is steeper than during foreshock activity. This result, however, was obtained for only two mainshocks and by means of different selection criteria for identifying foreshocks32.

In this article, we show that friction heterogeneities and the slow deformation of a velocity-strengthening layer are sufficient ingredients to explain the whole ensemble of instrumental findings regarding the organization in time, space, and magnitude of both aftershocks and foreshocks. To this end, we combine the two-block model of Lippiello et al.11 with the description of the fault plane originally proposed by Burridge and Knopoff (BK)33: a two-dimensional elastic interface with many degrees of freedom, each subject to a velocity-weakening friction law. Our model of the fault therefore consists of a collection of sliding blocks connected to a more ductile region, itself treated as an extended interface with velocity-strengthening rheology. This system has a clear geophysical justification and allows us to study the organization of simulated earthquakes not only in time but also in space and magnitude. We find that the model reproduces the most relevant empirical laws observed for instrumental aftershocks and foreshocks, largely independently of the precise values of the model parameters.

Results and discussion

The model

The model we propose is composed of a first layer H that represents the brittle part of the fault. H is elastically coupled to a second layer U that mimics the ductile region below the fault and is driven by the tectonic dynamics at the (very small) velocity V0. Each layer is an extended object made of many interacting degrees of freedom labeled i = 1, 2, …, N, organized on a square lattice. For simplicity, we restrict the motion to the V0 direction, with scalar displacements hi(t) in the layer H and ui(t) in the layer U. Fig. 1 shows a schematic one-dimensional cut of the mechanical model along the V0 direction; the model also extends in the direction orthogonal to V0. From continuum mechanics, the elastic force associated with the displacement field is \({k}_{h}{\sum }_{j\ne i}({h}_{j}-{h}_{i})/{r}_{ij}^{2}\), where rij is the distance between points i and j. The constitutive equations for the displacements hi in the layer H are obtained from the balance between the elastic forces and the velocity-weakening friction force τh:

$${\tau }_{h}={k}_{h}\sum_{j\ne i}\frac{{h}_{j}-{h}_{i}}{{r}_{ij}^{2}}+k({u}_{i}-{h}_{i}).$$
(1)

To improve the efficiency of our numerical scheme, we restrict the sum in Eq. (1) to nearest neighbors (rij = 1), which amounts to replacing the elastic force with the discrete Laplacian \(k_h\nabla^2 h_i\). The total stress on site i then simplifies to \(k_h\nabla^2 h_i + k(u_i - h_i)\), which is balanced by the friction τh. We apply the same short-range approximation to the layer U, which is, however, more ductile. For this reason, we implement the viscoelastic interactions34 in U by connecting neighboring degrees of freedom through a dashpot and a spring placed in series (Fig. 1)35,36. The constitutive equations for the layer U read:

$${\tau }_{{u}_{i}}={k}_{u}({\nabla }^{2}{u}_{i}-{z}_{i})+k({h}_{i}-{u}_{i})+{k}_{0}({V}_{0}t-{u}_{i})$$
(2)
$$\eta \ {\dot{z}}_{i}={k}_{u}({\nabla }^{2}{u}_{i}-{z}_{i}),$$
(3)

where zi is the viscoelastic degree of freedom and the dot indicates a time derivative. The viscoelastic force \(k_u(\nabla^2 u_i - z_i)\) has an intrinsic timescale tη = η/ku. When ui moves, for times shorter than tη the dashpot variable zi remains frozen, so the term \(k_u(\nabla^2 u_i - z_i)\) acts as a genuine elastic stress and the layer U is solid-like. At longer times, zi(t) evolves to suppress the viscoelastic force (\(z_i = \nabla^2 u_i\)), and the layer U displays liquid-like behavior.
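The two short-range ingredients above can be illustrated with a small numerical sketch. It assumes periodic boundaries for the discrete Laplacian (the text does not specify boundary conditions) and a single Maxwell element of Eq. (3) whose driving term \(\nabla^2 u_i\) is held fixed after a step; all function names are hypothetical. It shows that the viscoelastic stress \(k_u(\nabla^2 u_i - z_i)\) is essentially unchanged for \(t \ll t_\eta\) and decays as \(e^{-t/t_\eta}\) afterwards.

```python
import math
import numpy as np

def discrete_laplacian(h):
    """Nearest-neighbour Laplacian on a square lattice (periodic boundaries assumed)."""
    return (np.roll(h, 1, axis=0) + np.roll(h, -1, axis=0)
            + np.roll(h, 1, axis=1) + np.roll(h, -1, axis=1) - 4.0 * h)

def total_stress_H(h, u, k_h, k):
    """Short-range total stress on layer H: k_h * lap(h) + k * (u - h)."""
    return k_h * discrete_laplacian(h) + k * (u - h)

def maxwell_stress(lap_u, z0, k_u, eta, t_end, n_steps=100_000):
    """Explicit-Euler integration of Eq. (3), eta * dz/dt = k_u * (lap_u - z),
    with lap_u held fixed; returns the viscoelastic stress k_u * (lap_u - z)."""
    dt = t_end / n_steps
    z = z0
    for _ in range(n_steps):
        z += dt * k_u * (lap_u - z) / eta
    return k_u * (lap_u - z)

# Example with k_u = 1 and eta = 2, so that t_eta = eta / k_u = 2
t_eta = 2.0
print(maxwell_stress(1.0, 0.0, 1.0, 2.0, 0.01 * t_eta))  # about exp(-0.01): solid-like
print(maxwell_stress(1.0, 0.0, 1.0, 2.0, t_eta))         # about exp(-1): relaxing
```

An explicit Euler step is used only for transparency; the single-element equation also has the closed-form solution \(k_u(\nabla^2 u_i - z_0)e^{-t/t_\eta}\), which the numbers above approach.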

Fig. 1: The mechanical model.

Mechanical sketch of the model (one-dimensional cut; the other direction is orthogonal to the plane of the figure). This is the direct extension of Fig. 1 from ref. 11: each of the two blocks of that model is now a two-dimensional layer. The fault plane H is subject to velocity-weakening friction τh, in the form of randomly placed pinning points (red disks) with varying pinning strength \({\tau }_{i}^{\mathrm{{th}}}\) (disk radius). The ductile region U is subject to velocity-strengthening friction τu and is pulled at constant velocity V0 by distant regions. Within this ductile region, interactions are viscoelastic (Maxwell model), with dashpots of viscosity ηu and springs of stiffness ku. The relative elongation of the dashpots around site i is denoted zi = φi − ui − (φi−1 − ui−1). The two layers are connected elastically with a stiffness k.

Finally, we have to define the form of the friction forces. For the force τu of the ductile layer U, we assume a velocity-strengthening friction, taking the stationary form of the rate-and-state friction (RSF) law37,38,39:

$${\tau }_{u}(t)={\sigma }_{N}\left({\mu }_{c}+A\, {\mathrm{log}}\,\frac{{\dot{u}}_{i}(t)}{{V}_{c}}\right),$$
(4)

where σN is the effective normal stress, μc is the friction coefficient when the block U slides at the steady velocity Vc and A > 0 for a velocity-strengthening material.

For the friction in the brittle fault H, we adopt a random Coulomb failure criterion. As soon as the force overcomes a local random frictional stress threshold \({\tau }_{i}^{\mathrm{{th}}}\), the position hi becomes unstable and moves forward by a random amount (Δh)i. Such slips make up earthquakes and occur on the very fast timescale ts, typically of the order of seconds. It is reasonable to assume that ts is by far the shortest timescale in the problem, \(t_s \ll t_\eta\); i.e., we treat slips as instantaneous. Thus, during an earthquake the layer U behaves elastically and Eq. (2) can be approximated by

$${\tau }_{{u}_{i}}={k}_{u}{\nabla }^{2}{u}_{i}+k({h}_{i}-{u}_{i})+{k}_{0}({V}_{0}t-{u}_{i})\qquad t \sim {t}_{s}\ll {t}_{\eta },$$
(5)

Since the term \(k_u z_i\) is constant on these timescales, it plays no role in the dynamics of \(\tau_{u_i}\). As a consequence, for each slip (Δh)i at position i in H, there are slips \({(\Delta u)}_{j}={q}_{{r}_{ij}}{(\Delta h)}_{i}\) at all positions j in the layer U, where \({q}_{{r}_{ij}}\) is a decreasing function of the distance rij. In general, the precise form of \({q}_{{r}_{ij}}\) depends on the details of the dynamics of hi(t) and zi(t) for 0 < t < ts and can be quite complicated. However, when we apply the RSF laws combined with all the other equations (Eq. (3) in particular) to compute the true form of \({q}_{{r}_{ij}}\), we find a very fast decay as a function of rij, and we therefore neglect all terms beyond the nearest neighbors of the slipping site. In practice, we use the short-range form \({q}_{{r}_{ii}}={q}_{0}\), \({q}_{{r}_{ij}}={q}_{1}\) if rij = 1, and 0 otherwise. After the earthquake, at times t > tη, the dashpots of the layer U have relaxed and dissipated part of the elastic stress (the \(k_u\nabla^2 u_i\) term is exactly compensated by \(-k_u z_i\)). In this phase the ui's are decoupled \((\eta {\dot{z}}_{i}=0)\) and Eq. (2) becomes

$${\tau }_{u}(t)=k({h}_{i}-{u}_{i})+{k}_{0}({V}_{0}t-{u}_{i}),\quad t\,> \, {t}_{\eta }.$$
(6)

Implementing the velocity-strengthening friction (Eq. (4)), Eq. (6) admits an explicit solution4,11. More precisely, the time \({t}_{R}=\frac{A{\sigma }_{N}}{{k}_{0}{V}_{0}}\) represents the long timescale associated with the afterslip of the layer U, and for tη < t < tR one obtains

$${u}_{i}(t)={u}_{i}({t}_{0})+{\rho }_{0} \,{\mathrm{log}}\,\left(1+D\frac{t-{t}_{0}}{{t}_{R}}\right),$$
(7)

where \({\rho }_{0}=\frac{A{\sigma }_{N}}{k+{k}_{0}}\) is a characteristic length and D is a constant. Conversely, at later times t > tR, the logarithmic motion becomes linear ui(t) ~ Vct with \({V}_{c}=\frac{{k}_{0}}{k+{k}_{0}}{V}_{0}\).
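Eq. (7) and its time derivative can be evaluated directly; the sketch below uses hypothetical function names and the expressions for ρ0, tR, and Vc given above. Under the proportionality hypothesis between seismicity rate and afterslip rate recalled in the Introduction, the derivative \(\rho_0 D/(t_R + D(t-t_0))\) behaves as \(\rho_0/(t-t_0)\) once \(D(t-t_0) \gg t_R\), i.e., an Omori–Utsu decay with p = 1.

```python
import numpy as np

def afterslip_params(A, sigma_N, k, k0, V0):
    """rho0 = A sigma_N / (k + k0), t_R = A sigma_N / (k0 V0), V_c = k0 V0 / (k + k0)."""
    return A * sigma_N / (k + k0), A * sigma_N / (k0 * V0), k0 * V0 / (k + k0)

def afterslip(t, t0, u0, rho0, D, t_R):
    """Eq. (7): u(t) = u(t0) + rho0 * log(1 + D (t - t0) / t_R), for t_eta < t < t_R."""
    return u0 + rho0 * np.log1p(D * (t - t0) / t_R)

def afterslip_rate(t, t0, rho0, D, t_R):
    """du/dt from Eq. (7); behaves as rho0 / (t - t0) when D (t - t0) >> t_R."""
    return rho0 * D / (t_R + D * (t - t0))
```

Here `A` is the RSF parameter of Eq. (4), not the fractured area used later in the text.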

To summarize, there are four timescales: (1) the slip timescale ts, which characterizes the duration of a single earthquake; (2) tη, related to the viscoelastic response of the layer U; (3) tR, which corresponds to the post-seismic phase; and (4) the inter-sequence timescale \(t_d \sim \Delta h/V_c\), which corresponds to the typical waiting time between consecutive seismic sequences.

We assume an infinite separation of timescales, \(t_s \ll t_R \ll t_d\), together with \(t_s \ll t_\eta < t_R\), which is a realistic approximation for geophysical parameters. Under this hypothesis, three distinct phases are identified: the coseismic phase (\(t \sim t_s \ll t_\eta\)), the post-seismic phase (\(t \sim t_R\), with \(t_s \ll t \ll t_d\)), and the interseismic phase (\(t \sim t_d \gg t_R\)). Furthermore, assuming that the local displacement Δh is a constant independent of the position i, the temporal evolution of the model can be implemented numerically as a cellular automaton in which each slip is infinitely fast. In this approximation, the dynamics of the layer H at location i is completely characterized by the two contributions to the stress acting on that site, namely the intra-layer stress \(f_i(t) = k_h\nabla^2 h_i\) and the inter-layer stress \(g_i = k(u_i - h_i)\). The sum fi + gi is thus the total stress acting on block i. The details of the evolution of the variables fi and gi are given in the “Methods” section. In general, when \({f}_{i}+{g}_{i}\ge {\tau }_{i}^{\mathrm{{th}}}\), site i slips and the stress evolves at i and at its nearest-neighboring sites j:

$${f}_{i}(t)\, \to \,{f}_{i}(t)-4\Delta f\\ {f}_{j}(t)\, \to \,{f}_{j}(t)+\Delta f\\ {g}_{i}(t)\, \to \,{g}_{i}(t)-4{k}_{h}\Theta \Delta h\\ {g}_{j}(t)\, \to \,{g}_{j}(t)+\left(\Theta -\epsilon \right)\Delta f$$
(8)

with \(\Delta f={k}_{h}\Delta h,\Theta =(1-{q}_{0})\frac{k}{4{k}_{h}}\) and \(\epsilon =(1-{q}_{0}-4{q}_{1})\frac{k}{4{k}_{h}}\). The stress drop Δf is drawn from a Gaussian distribution with average value 〈Δf〉 and standard deviation σ.
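The coseismic update of Eq. (8) can be sketched as a toppling loop over unstable sites. This is a minimal illustration, not the authors' implementation, and it rests on assumptions not stated in the text: open boundaries (stress transferred outside the lattice is lost), a stress drop Δf drawn from a Gaussian of mean 〈Δf〉 = 1 and standard deviation σ and redrawn until positive, and hypothetical function names. The loading between earthquakes (described in the “Methods” section) is not included.

```python
import numpy as np

def theta_epsilon(q0, q1, k, k_h):
    """Theta and epsilon as defined below Eq. (8)."""
    theta = (1.0 - q0) * k / (4.0 * k_h)
    eps = (1.0 - q0 - 4.0 * q1) * k / (4.0 * k_h)
    return theta, eps

def relax(f, g, tau_th, theta, eps, sigma, rng=None):
    """Coseismic phase: apply Eq. (8) until f + g < tau_th everywhere.

    f, g, tau_th are float arrays of the same 2D shape; f and g are updated in place.
    Returns the number of slips n_i of each site during this earthquake.
    Assumptions: open boundaries (stress sent outside the lattice is lost) and a
    stress drop drawn from N(1, sigma**2), redrawn until positive.
    """
    rng = np.random.default_rng() if rng is None else rng
    nx, ny = f.shape
    n = np.zeros(f.shape, dtype=int)
    unstable = list(zip(*np.nonzero(f + g >= tau_th)))
    while unstable:
        i, j = unstable.pop()
        if f[i, j] + g[i, j] < tau_th[i, j]:
            continue  # already stabilised by earlier slips
        df = rng.normal(1.0, sigma)
        while df <= 0.0:
            df = rng.normal(1.0, sigma)
        n[i, j] += 1
        f[i, j] -= 4.0 * df
        g[i, j] -= 4.0 * theta * df  # 4 k_h Theta Delta h, with Delta f = k_h Delta h
        for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= a < nx and 0 <= b < ny:
                f[a, b] += df
                g[a, b] += (theta - eps) * df
                if f[a, b] + g[a, b] >= tau_th[a, b]:
                    unstable.append((a, b))
        if f[i, j] + g[i, j] >= tau_th[i, j]:
            unstable.append((i, j))
    return n
```

Each slip removes \(4(1+\Theta)\Delta f\) from the slipping site and adds \((1+\Theta-\epsilon)\Delta f\) to each of its four neighbors, so an interior slip dissipates \(4\epsilon\Delta f\) of total stress; together with the open boundaries, this is what makes every avalanche finite in this sketch.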

During the coseismic phase, the stress evolution is driven by all the slips in layer H. Conversely, during the post-seismic phase, it is driven by the ductile behavior of the layer U. More precisely, since ui evolves according to Eq. (7), one has gi(t) = gi(t0)Φ(t − t0), where Φ(t) is a logarithmically decreasing function of time. During the interseismic phase, the stress gi(t) grows linearly in time at the very slow tectonic rate k0Vc.

Since the specific value of 〈Δf〉 is not relevant, we set 〈Δf〉 = 1, so that the model has only three parameters: σ, Θ, and ϵ. The standard deviation σ quantifies the level of friction heterogeneity, whereas Θ quantifies the elastic interaction between the two layers; in the limiting case Θ = 0 the layer H is decoupled from the layer U. Finally, the parameter \(\epsilon \propto 1-{\sum }_{j}{q}_{{r}_{ij}}=1-{q}_{0}-4{q}_{1}\) controls the amount of dissipation. In the absence of friction in the layer U (τu = 0) and neglecting k0 in Eq. (5), mechanical equilibrium imposes \({\sum }_{j}{q}_{{r}_{ij}}=1\). In general, however, for finite k0 and taking into account the inelastic deformations in the layer U (the zi dynamics), \(\mathop{\sum }\nolimits_{j = 1}^{N}{q}_{{r}_{ij}}<1\). Accordingly, ϵ controls the value of an upper magnitude cutoff \(m_U \sim -1.5\log_{10}\epsilon\) (see Supplementary Fig. 3). In the main text, we present results for the fixed value ϵ = 0.008, which allows us to explore a sufficiently large magnitude range without finite-size effects. The roles of ϵ and of the system size L are investigated explicitly in the Supplementary Figures.

Fundamental quantities and their statistical features in instrumental catalogs

A key quantity is the seismic moment \({M}_{0}=A\overline{D}\), where A is the fractured area and \(\overline{D}\) is the average displacement. In spring-block models, A corresponds to the number of blocks that have slipped at least once during the earthquake, and M0 = ∑i niΔh, where ni is the number of slips performed by the ith block during the earthquake. We next introduce the moment magnitude \(m=(2/3)\,{\mathrm{log}}_{10} \, {M}_{0}\). In instrumental catalogs, m is distributed according to the Gutenberg–Richter (GR) law, \(P(m) \sim 10^{-bm}\), with a fairly universal value40 b ≃ 1. Note that the GR law corresponds to a power-law decay of the seismic moment distribution, \(P({M}_{0}) \sim {M}_{0}^{-1-2b/3}\). Furthermore, M0 is related to the fractured area A by the scaling relation \(M_0 \sim A^{3/2}\), which is equivalent to a linear relation between m and the logarithm of A, \(m={\gamma }_{0}\, {\mathrm{log}}_{10}\,A+{\rm{cnst}}\), with a fairly universal coefficient γ0 = 1 (refs. 41,42).
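For one simulated earthquake, M0, A, and m follow directly from the per-site slip counts ni. A minimal sketch, assuming a constant Δh as in the cellular-automaton approximation (function names are hypothetical):

```python
import numpy as np

def moment_area_magnitude(n, dh=1.0):
    """M0 = sum_i n_i * dh, A = number of sites with n_i >= 1, m = (2/3) log10(M0)."""
    M0 = dh * float(np.sum(n))
    A = int(np.count_nonzero(n))
    m = (2.0 / 3.0) * np.log10(M0)
    return M0, A, m

def moment_pdf_exponent(b):
    """Exponent tau of P(M0) ~ M0**(-tau) implied by the GR law: tau = 1 + 2b/3."""
    return 1.0 + 2.0 * b / 3.0
```

If every block slips exactly once, M0 is proportional to A and m = (2/3) log10 A, i.e., γ0 = 2/3; the instrumental value γ0 = 1 instead requires \(M_0 \sim A^{3/2}\), so blocks must on average slip repeatedly in larger events.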

Comparison with previous spring-block models

The description of a seismic fault in terms of springs and blocks was originally proposed by Burridge and Knopoff (BK)33. Bak and Tang43 highlighted the similarity between the BK model and the evolution of a simple cellular automaton, the BTW model44. In the BTW model, the stress of each block increases in time at a constant rate \(\dot{f}\), which models tectonic loading, and when it reaches a uniform threshold fth, an earthquake starts by distributing stress to the surrounding blocks. In the limit \(\dot{f}\to 0\), once the bond network is assigned, the BTW model has no tunable parameters and is usually considered the paradigmatic example of a self-organized critical system, since it spontaneously evolves toward a state where the size of avalanches is power-law distributed. Identifying an avalanche with an earthquake, whose size is proportional to M0, self-organized criticality provides a theoretical explanation for the GR law, even if it gives an unrealistically small value of b. Olami, Feder, and Christensen (OFC model)45 subsequently showed that, in the same limit \(\dot{f}\to 0\), the BK model can be mapped exactly onto a cellular automaton. The model we present coincides with the OFC model in the limiting case Θ = σ = 0 and, in turn, the OFC model coincides with the BTW model when ϵ = 0. Interestingly, the OFC model presents an intermediate range of ϵ values in which M0 is power-law distributed with a b value close to one. On the other hand, for any finite value of ϵ, the OFC model gives \(M_0 \propto A\), leading to γ0 = 2/3 for the coefficient of the m − log A scaling, different from the value γ0 ≃ 1 of instrumental catalogs.

Many modifications of the original OFC model have been proposed in the literature2,46, and we group them into three classes: (i) those introducing a second timescale besides \(\dot{f}\); (ii) those introducing heterogeneity in the friction thresholds fth; (iii) those introducing both a second timescale and friction heterogeneity. A second timescale is usually implemented to reproduce the temporal decay of the number of aftershocks, which can indeed be attributed to a variety of time-dependent stress-transfer mechanisms47. Major examples of class I models are those implementing viscous relaxation48,49,50,51 or a reduction of fault friction by means of RSF laws50. Concerning class II, the role of frictional heterogeneities in earthquake triggering has been investigated in depth52; in particular, the OFC model with a random fth corresponds to the quenched Edwards–Wilkinson (qEW) model35,36,53. This is a standard model for driven elastic interfaces in random media, and in this case it is well established that the seismic moment is power-law distributed with a b value independent of ϵ (ref. 2). Nevertheless, statistical patterns of seismic occurrence are better reproduced by class III models, as shown in refs. 36,50,54,55,56,57,58,59,60,61,62.

Depending on the values of the parameters Θ and σ, our model can belong to any of these classes. We conjecture that class III models, and in particular our model with Θ > 0 and σ > 0, belong to the same universality class as instrumental seismicity. This conjecture is supported by the results of the following section. We also observe that, for finite values of Θ and σ, our model is very similar to the viscoelastic quenched Edwards–Wilkinson (VqEW) model introduced by Jagla et al.35. The key difference lies in the functional form of Φ(t): in our model, the realistic velocity-strengthening rheology induces a logarithmic variation of Φ(t) with time, which is the crucial ingredient leading to the Omori–Utsu hyperbolic decay of the aftershock rate, whereas the VqEW model yields an exponential relaxation of Φ(t).

The magnitude distribution and the m − log A scaling

For each earthquake we record the occurrence time t, the hypocentral coordinates i (i.e., the coordinates of the block that nucleates the instability), the magnitude m, and the fractured area A. The simulated catalog contains about \(10^7\) earthquakes; we exclude the first 10% of events so that results are independent of the initial conditions. In the main text, we present results for different values of Θ and σ, keeping ϵ = 0.008 fixed. The results for different ϵ are discussed in the Supplementary Notes.

The full separation of timescales allows us to clearly distinguish separate seismic sequences. We define a seismic sequence as the set of earthquakes triggered by the relaxation of the layer U according to Eq. (7), i.e., the set of earthquakes triggered during the post-seismic phase. A new sequence starts much later, when an earthquake is triggered during the interseismic phase by the slow stress increase at rate k0Vc. Interestingly, as is often observed in instrumental catalogs63, this first earthquake of the sequence is not always the largest one. We adopt the convention used to classify events of real seismic sequences: the mainshock is the largest event in the sequence, the foreshocks are all events occurring before it, and the aftershocks are all subsequent ones. In Fig. 2a, we plot an excerpt of the whole catalog. The lag time between two consecutive sequences depends on the specific value of td.
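The sequence definition and the foreshock/mainshock/aftershock convention can be sketched as follows. The sketch assumes that each catalog entry carries a hypothetical flag stating whether the event was triggered during the interseismic phase (and therefore opens a new sequence); ties in magnitude are assigned to the earliest event, an arbitrary choice.

```python
def group_sequences(catalog):
    """catalog: time-ordered (t, m, opens_sequence) tuples; opens_sequence is True for
    events triggered during the interseismic phase (hypothetical flag)."""
    sequences = []
    for t, m, opens_sequence in catalog:
        if opens_sequence or not sequences:
            sequences.append([])
        sequences[-1].append((t, m))
    return sequences

def split_sequence(events):
    """Mainshock = largest event (earliest one if tied); foreshocks before, aftershocks after."""
    i_main = max(range(len(events)), key=lambda k: events[k][1])
    return events[:i_main], events[i_main], events[i_main + 1:]
```

Because the first event of a sequence need not be the largest, the foreshock list is non-empty whenever the mainshock is not the sequence opener.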

Fig. 2: The numerical catalog.

a A typical example of part of the simulated catalog containing five sequences. We plot the magnitude m of each event versus its occurrence time in units of td. b Zoom on the second sequence plotted in panel (a). c Contour of the area fractured by the mainshock (black line) and of the areas fractured by the m > 1.5 aftershocks (blue lines) and foreshocks (green lines) for the sequence plotted in panel (b). The red rhombus and the cyan and green triangles indicate the hypocentral locations of the mainshock and of the m < 1.5 aftershocks and foreshocks, respectively. We include only aftershocks up to time t = 0.2tR after the mainshock. d As in panel (c), for the whole fault plane during the time window of the sequence considered in panel (b).

Before studying the features of aftershocks and foreshocks, we investigate the behavior of the global catalog.

In Fig. 3, we plot the magnitude distribution P(m) for different values of Θ and σ. The OFC model45 (Θ = σ = 0) gives an exponential decay with b = 0.12 ± 0.02 up to a system-size-dependent upper cutoff mU, whereas for the qEW model (Θ = 0, σ > 0) we find b = 0.40 ± 0.02 independently of ϵ, with mU controlled by ϵ. Surprisingly, even for small values of Θ, the presence of the velocity-strengthening layer U induces a dramatic and robust change in the b value. For Θ ≥ 0.1, in very good agreement with instrumental catalogs, we always observe b = 1.06 ± 0.05 for intermediate magnitudes, ranging from a lower cutoff mL related to lattice-specific details up to an upper cutoff mU. Keeping Θ > 0.1 fixed, we also find that b = 1.06 does not depend on σ (inset of Fig. 3), except for the singular choice σ = 0, for which the magnitude distribution is non-monotonic (not shown). The parameters Θ and σ only affect the value of mU, which increases with both (Fig. 3). In particular, mU tends to mL when Θ and σ are very small, shrinking to zero the range where b ≈ 1. We also note an initial exponential decay \(P(m) \sim 1{0}^{-{b^{\prime}} m}\) for small magnitudes (smaller than mL), with \(b^{\prime}\) increasing monotonically with Θ from \(b^{\prime} =0.65\) for Θ = 0.1 to \(b^{\prime} =1.42\) for Θ = 1. In particular, we find an intermediate range of Θ values (Θ ∈ [0.4, 0.6]) for which \(b^{\prime} \simeq b\), i.e., for which the b ≃ 1 regime extends down to small magnitudes.
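The text does not state how b was fitted. One standard option is the Aki (1965) maximum-likelihood estimator for continuous magnitudes above a threshold \(m_{\min}\), \(b = \log_{10} e/(\langle m\rangle - m_{\min})\); the sketch below uses it purely as an illustration (hypothetical function name). Since it ignores the upper cutoff mU, it should be applied within the GR scaling range, with a threshold at or above mL. The synthetic check assumes exactly GR-distributed magnitudes.

```python
import numpy as np

def aki_b_value(mags, m_min):
    """Maximum-likelihood b value (Aki 1965) for continuous magnitudes m >= m_min."""
    m = np.asarray(mags, dtype=float)
    m = m[m >= m_min]
    return np.log10(np.e) / (m.mean() - m_min)

# Synthetic check: P(m) ~ 10**(-b m) means m - m_min is exponential with rate b * ln(10)
rng = np.random.default_rng(1)
b_true = 1.06
mags = 2.0 + rng.exponential(scale=1.0 / (b_true * np.log(10)), size=200_000)
print(aki_b_value(mags, 2.0))  # close to 1.06
```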

Fig. 3: The magnitude distribution.

Magnitude distribution P(m) for different values of Θ and σ. In the main panel, we fix σ = 5.0 and change Θ, except for the OFC model with σ = 0. In the inset, we fix Θ = 0.5 and change σ. Lines correspond to the GR law \(P(m) \sim 10^{-bm}\) with b = 1.06 (green dashed), consistent with the instrumental value, b = 0.12 (turquoise dot-dashed), or b = 0.40 (black dotted).

Realistic b values of the GR law have already been found by Jagla et al.58,59 in single-layer models that include either viscoelastic couplings or aging effects in the static friction coefficients36,54,55,56,57,58,59,60,61,62; in both cases this results in an effective additional degree of freedom per lattice site (all these models are spatially extended).

The coupling (Θ > 0) with the layer U also allows us to recover the linear relation between m and log(A). In the OFC model with finite ϵ, a degree of freedom slips at most once by construction, and therefore \(M_0 \propto A\) and γ0 = 2/3 (Fig. 4). For the qEW model, depinning theory predicts γ0 = 2(1 + ζ/d)/3, where d is the dimension of the interface (here it coincides with the layer, d = 2) and ζ is its roughness exponent, ζ ≃ 0.75 for short-range elasticity (for long-range elasticity, ζ = 0). This is consistent with the measured value γ0 = 0.87 ± 0.02 (Fig. 4). When Θ > 0 and σ > 0, we instead find γ0 = 0.96 ± 0.03, independently of Θ and σ (Fig. 4).
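The depinning prediction for γ0 follows from M0 = A·D̄, assuming the average slip scales with the rupture linear size as \(\overline{D} \sim A^{\zeta/d}\), so that \(M_0 \sim A^{1+\zeta/d}\) and \(m = (2/3)\log_{10}M_0\). The sketch below evaluates the formula and measures γ0 from a catalog by a least-squares fit of m versus log10 A; function names are hypothetical and the fit method is an assumption, since the text does not specify one.

```python
import numpy as np

def gamma0_depinning(zeta, d=2):
    """gamma0 = 2(1 + zeta/d)/3, from M0 ~ A**(1 + zeta/d) and m = (2/3) log10 M0."""
    return 2.0 * (1.0 + zeta / d) / 3.0

def fit_gamma0(A, m):
    """Least-squares slope of m versus log10(A) for a catalog of (A, m) pairs."""
    slope, _intercept = np.polyfit(np.log10(np.asarray(A, dtype=float)),
                                   np.asarray(m, dtype=float), 1)
    return slope

print(gamma0_depinning(0.0))   # 2/3: no roughness, M0 proportional to A
print(gamma0_depinning(0.75))  # about 0.92 for zeta = 0.75, d = 2
```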

Fig. 4: The m – logA scaling.

We plot the magnitude m versus log(A), for Θ = 0.5, σ = 5, L = 1000 and ϵ = 0.008. Lines correspond to the relation \(m={\gamma }_{0}{\mathrm{log}\,}_{10}(A)\) with γ0 = 1 (green dashed) and γ0 = 2/3 (turquoise dot-dashed).

Statistical features of aftershocks and foreshocks

Let us now consider the properties of aftershocks and foreshocks. Since the results do not depend on the specific values of Θ and σ, we only present them for the intermediate value Θ = 0.5 and for σ = 5. With these parameters, the GR law is obeyed over a sufficiently large magnitude range. The spatial organization of a typical foreshock–mainshock–aftershock sequence is shown in Fig. 2c, d, which present the contour of the area fractured by a mainshock (here mM = 5.1) and the contours of the areas fractured by the largest aftershocks and foreshocks (m > 1.5). Figure 2d shows that the whole sequence is concentrated in a narrow region of the fault plane close to the mainshock epicenter, while a zoom inside this region (Fig. 2c) reveals details of the spatial organization of events. First, most aftershocks occur close to the border of the mainshock's fractured area. This is consistent with the gap hypothesis, according to which the stress increase on the border of the fractured area triggers the aftershocks, whereas the stress reduction inside the fractured region strongly reduces their occurrence probability. This scenario is strongly supported by recent observations of aftershock organization after large mainshocks64. The same analysis as in Wetzler et al.64, for the distribution of aftershock hypocentral distances from the contour of the mainshock fractured area, is presented in Supplementary Fig. 5. In our model, foreshocks also occur close to the border of the area that will be fractured by the mainshock. To be more quantitative, we plot (inset of Fig. 5) the number of aftershocks naft(mM) and foreshocks \({n}_{{\rm{fore}}}({m}_{M})\) as a function of the mainshock magnitude. We find an exponential behavior \({n}_{{\rm{aft}}}({m}_{M}) \sim 1{0}^{\alpha {m}_{M}}\), which is also observed in instrumental catalogs and is known as the productivity law65,66.
Here too we find quantitative agreement with the value α ≃ 1 observed in instrumental catalogs. The inset of Fig. 5 also shows an exponential behavior \({n}_{{\rm{fore}}}({m}_{M}) \sim 1{0}^{\alpha {m}_{M}}\) for the number of foreshocks, with α ≃ 1, a result also observed in instrumental catalogs25,26. We also find that the number of foreshocks is typically ~100 times smaller than the number of aftershocks, and we remark that only for the largest mainshock magnitudes mM do we have a sufficient number of aftershocks (\(n_{\rm aft}(m_M) \gtrsim 1000\)) to study their statistical features within a single mainshock–aftershock sequence. For this reason, to improve the statistics, we group sequences according to their mainshock magnitude, as is usually done for instrumental catalogs. More precisely, we consider the magnitude distribution of aftershocks (foreshocks) occurring after (before) a mainshock with magnitude \(m \in (m_M, m_M+1]\). Results (Fig. 5) confirm that aftershock magnitudes follow the GR law with b ≃ 1. Interestingly, foreshocks also follow the GR law, but with a significantly smaller value, b ≃ 0.8. This result is consistent with an inverse relation between the b value and the local stress level, as indicated by many laboratory measurements and field observations67,68,69. Accordingly, a smaller b value (a larger proportion of large earthquakes) is expected before the mainshock and close to its hypocenter, as a signature of high-stress conditions. Indeed, several studies report a decrease of the b value as the mainshock approaches and identify it as a precursory pattern that can improve mainshock forecasting28,29,30,31. Our study represents the first identification of this pattern in a mechanical model that simultaneously presents realistic features of aftershock occurrence.
We further note that our measure of the b value is neither biased by the foreshock selection criterion (since we have a perfect separation of sequences) nor is it affected by problems of magnitude completeness (we have access to the smallest earthquakes), which are typical of instrumental catalogs and can be responsible for spurious behaviors of the b value.
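The productivity exponent α is the slope of log10 n versus mM. A least-squares sketch (hypothetical helper; magnitude classes with zero counts must be removed beforehand, since log10(0) diverges) applies equally to aftershock and foreshock counts: a constant factor such as the ~100-fold difference between them shifts the intercept but not the slope.

```python
import numpy as np

def fit_productivity(mM, counts):
    """Slope alpha of log10(counts) versus mM, for n(mM) ~ 10**(alpha * mM)."""
    mM = np.asarray(mM, dtype=float)
    counts = np.asarray(counts, dtype=float)
    alpha, _intercept = np.polyfit(mM, np.log10(counts), 1)
    return alpha
```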

Fig. 5: Aftershocks and foreshocks magnitude distributions.

We report the total number of aftershocks \(n_{\rm aft}(m, m_M)\) (open symbols) and foreshocks \(n_{\rm fore}(m, m_M)\) (filled symbols) with magnitude m, grouped by their mainshock's magnitude mM (see legend). We always consider Θ = 0.5, σ = 5, L = 1000, and ϵ = 0.008. Lines correspond to the GR law with b = 1.05 (green dot-dashed) and b = 0.83 (magenta dashed). (Inset) The aftershock number naft(mM) (empty symbols) and the foreshock number nfore(mM) (filled symbols) versus mM. The green line is the productivity law with α = 1.

In Fig. 6, we plot the number of aftershocks (foreshocks) \(n_{\rm aft}(t, m_M)\) (\(n_{\rm fore}(t, m_M)\)) as a function of the time t since (before) the mainshock, for mainshock magnitudes \(m \in [m_M, m_M+0.8)\), divided by the total number of mainshocks in that magnitude class. We find that the temporal organization of aftershocks follows the Omori–Utsu law \(n_{\rm aft}(t) \sim t^{-p}\) with p = 1 over several decades. The inverse Omori law70,71, \({n}_{{\rm{fore}}}(t) \sim {t}^{-p}\) with p = 1, also characterizes the temporal organization of foreshocks. Note that, unlike for aftershocks, a clear temporal behavior cannot be extracted from a single foreshock sequence because of the very small number of foreshocks (we find at most 32 foreshocks in one sequence). The inverse Omori law is thus only observed after stacking many sequences. The vertical shift of the curves for different mainshock magnitudes is consistent with the productivity law, in agreement with the inset of Fig. 5. At short times, there is an abrupt transition from a roughly flat behavior to the 1/t decay. We expect that a finite ratio tη/tR would smooth this transition and better reproduce instrumental observations.
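Stacking many sequences before measuring the Omori exponent can be sketched as follows: the delays |t − t_mainshock| of all events in one mainshock magnitude class are binned in logarithmically spaced bins, divided by the bin width and the number of mainshocks, and p is fitted in log–log scale. Function names, the number of bins, and the synthetic test density proportional to 1/(c + t) are assumptions made for illustration.

```python
import numpy as np

def stacked_rate(delays, n_mainshocks, t_min, t_max, n_bins=20):
    """Events per unit time per mainshock, in log-spaced bins of |t - t_mainshock|."""
    edges = np.logspace(np.log10(t_min), np.log10(t_max), n_bins + 1)
    counts, _ = np.histogram(delays, bins=edges)
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centres
    return centers, counts / np.diff(edges) / n_mainshocks

def fit_omori_p(centers, rate):
    """p from a least-squares fit of log10(rate) = const - p * log10(t), non-empty bins only."""
    ok = rate > 0
    slope, _ = np.polyfit(np.log10(centers[ok]), np.log10(rate[ok]), 1)
    return -slope

# Synthetic check: delays drawn from a density proportional to 1/(c + t) on [0, T]
rng = np.random.default_rng(2)
c, T = 1e-3, 1e3
delays = c * ((1.0 + T / c) ** rng.random(200_000) - 1.0)  # inverse-CDF sampling
centers, rate = stacked_rate(delays, n_mainshocks=100, t_min=0.1, t_max=100.0)
print(fit_omori_p(centers, rate))  # close to 1
```

The fit window is kept well above c, where the synthetic density is effectively 1/t; including times comparable to c would bias p downward, mirroring the flat short-time regime seen in Fig. 6.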

Fig. 6: The direct and inverse Omori law.

The number of aftershocks \(n_{\rm aft}(t, m_M)\) (b) and the number of foreshocks \({n}_{{\rm{fore}}}(t,{m}_{M})\) (a) as a function of the time t since (and before) the mainshock occurrence. Different colors correspond to different mainshock magnitude classes mM. The dashed line is the hyperbolic Omori–Utsu decay 1/t. The wide range of the vertical scale makes it difficult to appreciate the difference between the number of foreshocks and the corresponding number of aftershocks; this difference is better highlighted in the inset of Fig. 5. We always consider Θ = 0.5, σ = 5, L = 1000, and ϵ = 0.008.

In Fig. 7, we plot the density of aftershocks or foreshocks ρ(δr, mM) as a function of the distance δr between their hypocenter and their mainshock's hypocenter, grouping events by mainshock magnitude class \(m \in [m_M, m_M+0.8)\). There is a clear dependence on mM and, at the same time, for any mM the aftershocks and foreshocks share very similar spatial distributions, in agreement with instrumental findings24,25,27. Foreshocks occur mostly over the area fractured by the mainshock, supporting the idea that their spatial organization contains information on the size of the incoming mainshock (in that given region)24,25. Concretely, we find that ρ(δr, mM) obeys the scaling law \(\rho (\delta r,{m}_{M})=L({m}_{M}) Q\left(\frac{\delta r}{L({m}_{M})}\right)\) with \(L({m}_{M}) \sim 1{0}^{\gamma {m}_{M}}\) and γ = 0.57 ± 0.05. Similar collapses are observed in instrumental catalogs24,25,72,73,74, although a smaller value γ ≃ 0.5 is usually found. A second difference lies in the decay of the scaling function Q(δr): in our model it is exponential, whereas power-law tails are reported in instrumental catalogs74. This overly fast decay can be attributed to our approximate modeling of elastic interactions within each layer, where the correct long-range interaction expected from elastostatic theory2 is replaced (see Eq. (1)) with the short-range term \(k_h\nabla^2 h_i\). Indeed, under this short-range approximation, aftershocks can be triggered only within or at the boundary of the rupture zone, as shown in refs. 35,36. Finally, this spatial clustering law can be related to the m–log A scaling of Fig. 4, with γ = 1/(2γ0). Indeed, since aftershocks are mostly located on the border of the area fractured by the mainshock, one has \(L({m}_{M}) \sim \sqrt{A} \sim \sqrt{1{0}^{{m}_{M}/{\gamma }_{0}}} \sim 1{0}^{{m}_{M}\gamma }\).
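The relation γ = 1/(2γ0) and the rescaling used for the data collapse can be sketched in a few lines (hypothetical function names; the variable `length` stands for L(mM) to avoid confusion with the system size L). With γ0 = 0.96 the relation gives γ ≈ 0.52.

```python
import numpy as np

def gamma_from_gamma0(gamma0):
    """gamma = 1 / (2 gamma0), from L(mM) ~ sqrt(A) and m = gamma0 log10(A) + const."""
    return 1.0 / (2.0 * gamma0)

def rescale_density(delta_r, rho, mM, gamma):
    """Collapse rho(dr, mM) = L(mM) Q(dr / L(mM)), with L(mM) = 10**(gamma mM).
    Returns (dr / L, rho / L), i.e. the argument and the value of Q."""
    length = 10.0 ** (gamma * mM)
    return np.asarray(delta_r, dtype=float) / length, np.asarray(rho, dtype=float) / length

print(gamma_from_gamma0(0.96))  # about 0.52
```

If the scaling form holds, curves for different mM plotted as rho / L versus dr / L fall on the single function Q, which is how the inset of Fig. 7 is constructed.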

Fig. 7: The spatial clustering of aftershocks and foreshocks.

The spatial density of aftershocks \({\rho }_{{\rm{aft}}}(\delta r,{m}_{M})\) (open symbols) and foreshocks \({\rho }_{{\rm{fore}}}(\delta r,{m}_{M})\) (filled symbols) as a function of the hypocentral distance δr from the mainshock hypocenter. Different colors correspond to different mainshock magnitude classes mM. In the inset we show the same data after rescaling by the size of the aftershock area \(L({m}_{M})=1{0}^{\gamma {m}_{M}}\), with γ = 0.54. We always consider Θ = 0.5, σ = 5, L = 1000, and ϵ = 0.008.

Applications and improvements

We have implemented a minimal model for earthquake triggering, modeling the interaction between the brittle part of the crust (an elastic, velocity-weakening region) and the ductile zone (a viscoelastic region with velocity-strengthening rheology). We assume short-range elasticity and an infinite separation of timescales, which allows us to develop a cellular automaton model controlled by only two parameters, Θ and σ. Remarkably, as soon as Θ and σ are sufficiently different from zero, we recover the established statistical features of aftershock occurrence, with realistic values of the parameters b, α, γ, and p. This robustness strongly suggests that our model captures the universality class of instrumental earthquakes. Our model thus provides useful insights into the mechanisms of aftershock triggering. For example, we find that the b value deviates from its stationary behavior during foreshock sequences, supporting its interpretation as a precursory pattern of the mainshock occurrence.

Although our model misses some features of instrumental earthquakes, such as the power-law decay of the spatial density ρ, it remains a useful tool. Thanks to its simplicity, we can easily produce very complete synthetic catalogs to test forecasting hypotheses, or mechanisms of stress evolution and how they relate to foreshocks, mainshocks, and aftershocks. It is also possible to extend the single-fault model presented in this study to a more realistic description as a network of faults, and then to study the interaction between different branches of the network.

Methods

Derivation of the cellular automaton rules

We consider two square layers of side L = 1000. In our lattice geometry, each site i has exactly four neighbors j: the stress diffusion terms of the type \({({\nabla }^{2}h)}_{i}\) at site i with position (x, y) thus stand for \({({\nabla }^{2}h)}_{x,y}=({h}_{x+1,y}+{h}_{x-1,y}+{h}_{x,y+1}+{h}_{x,y-1}-4{h}_{x,y})\). We use absorbing boundary conditions (h = 0 is fixed at the boundary), which means that some stress is absorbed at the boundaries and the slip cannot propagate further.
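As an illustration of the stencil above, the following Python sketch evaluates \({({\nabla }^{2}h)}_{x,y}\) on a small lattice stored as nested lists. Treating every site outside the lattice as having h = 0 is our reading of the absorbing boundary condition, and the function name `laplacian` is hypothetical.

```python
def laplacian(h, x, y):
    """Five-point discrete Laplacian (nabla^2 h)_{x,y} on an L x L lattice."""
    L = len(h)

    def value(i, j):
        # Absorbing boundary (assumed): h = 0 on every site outside the lattice.
        return h[i][j] if 0 <= i < L and 0 <= j < L else 0.0

    return (value(x + 1, y) + value(x - 1, y)
            + value(x, y + 1) + value(x, y - 1)
            - 4.0 * h[x][y])
```

For a uniform field h = 1, the stencil gives 0 in the bulk, −1 on an edge site, and −2 on a corner site, which is how stress leaks out through the absorbing boundary.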

We now recall the main assumptions of our continuous model, before explicitly deriving the corresponding cellular automaton. These assumptions are summarized in the mechanical sketch of Fig. 1, from which the equations can be derived. The stress at site i in the layer H is the sum of intra-layer and inter-layer stresses, respectively:

$${f}_{i}={k}_{h}{\nabla }^{2}{h}_{i}$$
(9)
$${g}_{i}=k({u}_{i}-{h}_{i}).$$
(10)

The total stress fi + gi at site i is balanced by a velocity-weakening (Coulomb failure style) friction force τh, which takes a new random value, denoted \({\tau }_{i}^{\mathrm{{th}}}\), after each slip:

$${\tau }_{h}={\tau }_{i}^{\mathrm{{th}}} \sim G(\tau ) \sim {\mathcal{N}}(1,\sigma )$$
(11)

where G(τ) is a Gaussian distribution with mean 1 and standard deviation σ. As long as τh ≥ fi + gi, the site i is pinned, that is \({\dot{h}}_{i}=0\). The constitutive equations operate on various timescales:

$${\tau }_{h}\,\ge\, {f}_{i}+{g}_{i}\qquad \,{{\text{slip}}\, {\text{timescale}}}\,\,{t}_{s}$$
(12)
$${\tau }_{{u}_{i}}={k}_{u}({\nabla }^{2}{u}_{i}-{z}_{i})+k({h}_{i}-{u}_{i})+{k}_{0}({V}_{0}t-{u}_{i})\qquad \,{{\text{slip}}\, {\text{timescale}}}\,\,{t}_{s}$$
(13)
$$\eta \ {\dot{z}}_{i}={k}_{u}({\nabla }^{2}{u}_{i}-{z}_{i}),\qquad \,{\text{visco}}{\hbox{-}}{\text{elastic}} \, {\text{timescale}}\,\,{t}_{\eta }=\frac{\eta }{{k}_{u}}$$
(14)
$${\tau }_{{u}_{i}}(t)={\sigma }_{N}\left({\mu }_{c}+A\, {\mathrm{log}}\,\frac{{\dot{u}}_{i}(t)}{{V}_{c}}\right),\qquad \,{\text{relaxation}} \, {\text{timescale}}\,\,{t}_{R}=\frac{{\rho }_{0}}{{V}_{c}}=\frac{A{\sigma }_{N}}{{k}_{0}{V}_{0}}$$
(15)

The first two equations are the force balance between applied stresses and local friction force, and are thus instantaneous. The third is the internal stress dynamics of the viscoelastic layer U, evolving over an intermediate timescale tη. The fourth is the time evolution of the velocity-strengthening friction, slowly evolving over a timescale tR. We recall the constants: \({V}_{c}=\frac{{k}_{0}}{k+{k}_{0}}{V}_{0}\), \({\rho }_{0}=\frac{A{\sigma }_{N}}{k+{k}_{0}}\).
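The constants recalled above can be cross-checked numerically. The sketch below, built around a hypothetical `Rheology` container with arbitrary illustrative parameter values, computes Vc, ρ0 and tR, and inverts Eq. (15) to obtain the slip rate of a U block under a given stress. It shows that tR = ρ0/Vc = AσN/(k0V0), and that steady sliding at Vc corresponds to τu = σNμc.

```python
import math
from dataclasses import dataclass


@dataclass
class Rheology:
    k: float        # inter-layer stiffness coupling H and U, Eq. (10)
    k0: float       # stiffness of the spring driving U at velocity V0, Eq. (13)
    V0: float       # driving velocity
    A: float        # rate-dependence parameter of Eq. (15)
    sigma_N: float  # normal stress
    mu_c: float     # reference friction coefficient

    @property
    def V_c(self):
        return self.k0 * self.V0 / (self.k + self.k0)

    @property
    def rho0(self):
        return self.A * self.sigma_N / (self.k + self.k0)

    @property
    def t_R(self):
        return self.rho0 / self.V_c

    def slip_rate(self, tau_u):
        """Invert Eq. (15): du/dt = V_c * exp((tau_u / sigma_N - mu_c) / A)."""
        return self.V_c * math.exp((tau_u / self.sigma_N - self.mu_c) / self.A)
```

For instance, `Rheology(k=2.0, k0=0.5, V0=1.0, A=0.01, sigma_N=10.0, mu_c=0.6)` gives Vc = 0.2, ρ0 = 0.04 and tR = 0.2 = AσN/(k0V0).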

- Initialization: At time t = 0, we assign to each site a local frictional threshold \({\tau }_{i}^{\mathrm{{th}}}\) drawn from G(τ). We also choose the initial value fi(0) of the local stress at random in the interval \((0,{\tau }_{i}^{\mathrm{{th}}})\) and suppose that ui(0) = hi(0) at all sites.

- Interseismic phase: At timescales larger than ts, tη, and tR, we have \({\dot{u}}_{i}={V}_{c}\) and the equations above simplify. Using Eq. (15), we get \({\tau }_{u}={\sigma }_{N}{\mu }_{c}\). At these long timescales (t ≫ tη), we have \(\eta {\dot{z}}_{i}=0\), so that Eq. (14) gives \({z}_{i}={\nabla }^{2}{u}_{i}\). Using Eq. (13), this combines to yield k0V0t = (k + k0)Vct (up to time-independent terms), which explains the necessary definition \({V}_{c}=\frac{{k}_{0}{V}_{0}}{k+{k}_{0}}\). We finally have fi + gi = const. + kVct, and using Eq. (12), we can compute the time to failure of each site:

$${t}_{i}^{({\mathrm{drive}})}=\frac{{\tau }_{i}^{\mathrm{{th}}}-{f}_{i}({t}_{0})-{g}_{i}({t}_{0})}{k{V}_{c}}$$
(16)

with t0 the time at the beginning of this phase. The site i* corresponding to the smallest value of \({t}_{i}^{{\mathrm{(drive)}}}\) is thus identified as the hypocenter of the next earthquake. An amount of stress \({\tau }_{{i}^{* }}^{\mathrm{{th}}}-{f}_{{i}^{* }}({t}_{0})-{g}_{{i}^{* }}({t}_{0})\) is then added to all sites and the coseismic phase is entered, with exactly one site being unstable (the one where \({f}_{{i}^{* }}(t)+{g}_{{i}^{* }}(t)={\tau }_{{i}^{* }}^{\mathrm{{th}}}\)).
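The initialization and interseismic steps can be sketched as follows. This is a minimal illustration, not the authors' code: the lattice is a list of lists and the function names are hypothetical. Since the hi are pinned during loading, fi is fixed by Eq. (9), so the sketch adds the uniform load kVct to the inter-layer stress gi.

```python
import random


def initialize(L, sigma, rng):
    """Thresholds from G = N(1, sigma), f_i(0) uniform in (0, tau_i^th), g_i(0) = 0."""
    tau_th = [[rng.gauss(1.0, sigma) for _ in range(L)] for _ in range(L)]
    # random.uniform(a, b) returns a value between a and b even when b < a.
    f = [[rng.uniform(0.0, tau_th[x][y]) for y in range(L)] for x in range(L)]
    # u_i(0) = h_i(0) makes the inter-layer stress of Eq. (10) vanish.
    g = [[0.0] * L for _ in range(L)]
    return tau_th, f, g


def load_to_failure(tau_th, f, g, k, V_c):
    """Find the first site to fail (Eq. (16)) and load all sites up to that time."""
    t_star, (xs, ys) = min(
        ((tau - f[x][y] - g[x][y]) / (k * V_c), (x, y))
        for x, row in enumerate(tau_th)
        for y, tau in enumerate(row)
    )
    # Stress increment k V_c t* = tau* - f* - g*, added to g because the h_i are pinned.
    added = tau_th[xs][ys] - f[xs][ys] - g[xs][ys]
    for row in g:
        for y in range(len(row)):
            row[y] += added
    return t_star, (xs, ys)
```

The returned site is the hypocenter i* from which the coseismic phase starts.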

- Coseismic phase: Each site with \({f}_{i}(t)+{g}_{i}(t)\ge {\tau }_{i}^{\mathrm{{th}}}\) is unstable and slips by a constant amount Δh, hi → hi + Δh. A slip in the layer H at site i induces a coseismic slip \({u}_{j}\to {u}_{j}+{q}_{{r}_{ij}}\Delta h\) inside the U layer. As explained in the main text, we set qr = 0 for r > 1, i.e., we only keep the local coseismic slip \({q}_{{r}_{ii}}={q}_{0} \, > \,0\) and the nearest-neighbor coseismic slip \({q}_{{r}_{ij}}={q}_{1} \,> \,0\) (when rij = 1). Because of the ductile nature of the layer U, some dissipation occurs during the coseismic slip, in the sense that the total coseismic slip induced in U is less than the slip Δh in H: \(\overline{\epsilon }=1-{q}_{0}-4{q}_{1}\,> \,0\) (\(\overline{\epsilon }=0\) would be the dissipationless case). This coseismic slip is considered instantaneous and corresponds to the following stress evolution, for the site i itself and for its first neighbors j:

$${f}_{i}(t)\to {f}_{i}(t)-4{k}_{h}\Delta h$$
(17)
$${f}_{j}(t)\to {f}_{j}(t)+{k}_{h}\Delta h$$
(18)
$${g}_{i}(t)\to {g}_{i}(t)+k({q}_{0}-1)\Delta h$$
(19)
$${g}_{j}(t)\to {g}_{j}(t)+k{q}_{1}\Delta h$$
(20)

At this timescale, the internal degrees of freedom zi are fixed and do not evolve. By introducing the parameters

$$\Theta =(1-{q}_{0})\frac{k}{4{k}_{h}},$$
(21)
$$\epsilon =(1-{q}_{0}-4{q}_{1})\frac{k}{4{k}_{h}}=\overline{\epsilon }\frac{k}{4{k}_{h}},$$
(22)

we can factorize:

$${f}_{i}(t)\to {f}_{i}(t)-4{k}_{h}\Delta h$$
(23)
$${f}_{j}(t)\to {f}_{j}(t)+{k}_{h}\Delta h$$
(24)
$${g}_{i}(t)\to {g}_{i}(t)-4{k}_{h}\Theta \Delta h$$
(25)
$${g}_{j}(t)\to {g}_{j}(t)+\left(\Theta -\epsilon \right){k}_{h}\Delta h$$
(26)

This shows that the coseismic slip stabilizes site i, by decreasing gi, but increases the stresses gj at its neighbors. After a slip, the block hi is in a different frictional condition, i.e., a new value of \({\tau }_{i}^{\mathrm{{th}}}\) is drawn from the distribution G(τ). If \({\tau }_{i}^{\mathrm{{th}}}\) is such that \({f}_{i}(t)+{g}_{i}(t)\,\ge\, {\tau }_{i}^{\mathrm{{th}}}\), then the process of Eqs. (23)–(26) is iterated immediately, until \({f}_{i}(t)+{g}_{i}(t)\,<\, {\tau }_{i}^{\mathrm{{th}}}\). Because of the stress redistribution, nearest-neighbor sites j can become unstable and slip within the same earthquake. In practice, we perform the updates of Eqs. (23)–(26) on all sites for which \({f}_{j}(t)+{g}_{j}(t)\ge {\tau }_{j}^{\mathrm{{th}}}\), until all sites satisfy \({f}_{j}(t)+{g}_{j}(t)\,<\, {\tau }_{j}^{\mathrm{{th}}}\).
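The passage from Eqs. (19)–(20) to Eqs. (25)–(26) rests on two identities that follow from Eqs. (21)–(22): −4khΘ = k(q0 − 1) and (Θ − ε)kh = kq1. The hypothetical helpers below evaluate both forms for arbitrary parameter values, as a quick numerical check of the rewriting.

```python
import math


def g_increments_direct(k, q0, q1, dh):
    """Change of g at the slipping site and at each nearest neighbor, Eqs. (19)-(20)."""
    return k * (q0 - 1.0) * dh, k * q1 * dh


def g_increments_factorized(k, k_h, q0, q1, dh):
    """Same changes through Theta and epsilon of Eqs. (21)-(22), as in Eqs. (25)-(26)."""
    theta = (1.0 - q0) * k / (4.0 * k_h)
    eps = (1.0 - q0 - 4.0 * q1) * k / (4.0 * k_h)
    return -4.0 * k_h * theta * dh, (theta - eps) * k_h * dh


def increments_agree(k, k_h, q0, q1, dh):
    """True when both forms give the same increments, up to floating-point round-off."""
    direct = g_increments_direct(k, q0, q1, dh)
    factorized = g_increments_factorized(k, k_h, q0, q1, dh)
    return all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-15)
               for a, b in zip(direct, factorized))
```

For example, with k = 2, kh = 0.7, q0 = 0.3, q1 = 0.1 and Δh = 0.05, both forms give −0.07 at the slipping site and 0.01 at each neighbor.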

We follow a sequential updating scheme, in which only one unstable block slips at a time. Preliminary results with an updating scheme in which all unstable blocks slip simultaneously indicate no significant difference.
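A minimal Python sketch of the coseismic phase with sequential updates follows. It applies Eqs. (23)–(26) one slip at a time, redraws the threshold of the slipping site from G(τ) = N(1, σ), and drops any stress sent beyond the lattice edge (absorbing boundaries). The first-in, first-out processing order, the function name and the default values kh = Δh = 1 are our assumptions. The hypocenter passed in is assumed to satisfy fi + gi ≥ τith, as it does at the end of the interseismic phase.

```python
import random
from collections import deque


def avalanche(tau_th, f, g, hypocenter, Theta, eps, k_h=1.0, dh=1.0, sigma=5.0, rng=None):
    """Relax all unstable sites one slip at a time; return the number of slips."""
    rng = rng or random.Random()
    L = len(f)
    queue = deque([hypocenter])  # candidate unstable sites, processed in FIFO order
    slips = 0
    while queue:
        x, y = queue.popleft()
        # Stale queue entries are skipped; an unstable site slips until it is stable.
        while f[x][y] + g[x][y] >= tau_th[x][y]:
            f[x][y] -= 4 * k_h * dh              # Eq. (23)
            g[x][y] -= 4 * k_h * Theta * dh      # Eq. (25)
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < L and 0 <= ny < L:  # stress sent off the lattice is absorbed
                    f[nx][ny] += k_h * dh                  # Eq. (24)
                    g[nx][ny] += (Theta - eps) * k_h * dh  # Eq. (26)
                    if f[nx][ny] + g[nx][ny] >= tau_th[nx][ny]:
                        queue.append((nx, ny))
            tau_th[x][y] = rng.gauss(1.0, sigma)  # new frictional condition after the slip
            slips += 1
    return slips
```

The number of slips returned is one possible proxy for the earthquake size; when the function returns, every site satisfies fi + gi < τith.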

Shortly after the end of the earthquake, the viscoelastic couplings (internal degrees of freedom zi of the layer U) relax their stress (over a timescale tη = η/ku), which in practice means that \(\eta \dot{z}=0={k}_{u}({\nabla }^{2}{u}_{i}-{z}_{i})\). As explained in the main text, the way in which this relaxation affects the stress in the layer U has already been included implicitly in the coseismic dynamics, via the coefficients q0 and q1, so that on longer timescales we can simply consider that \({k}_{u}({\nabla }^{2}{u}_{i}-{z}_{i})=0\). In particular, Eq. (5) simplifies into Eq. (6), i.e., the blocks ui become independent.

- Post-seismic phase: At the end of an avalanche, the hi are stuck, so that the intra-layer stress fi remains constant. However, the gi may evolve. Indeed, the ui are subject to a velocity-strengthening rheology and evolve according to Eq. (6), which can be integrated to give Eq. (7), which we recall here: \({u}_{i}(t)={u}_{i}({t}_{0})+{\rho }_{0}\, {\mathrm{log}}\,\left(1+D\frac{t-{t}_{0}}{{t}_{R}}\right)\), where \(D=\exp \left(-{g}_{i}({t}_{0})/A{\sigma }_{N}\right)\) is a constant (over time, but different for each site). This solution can be checked using Eq. (15) on one side and Eq. (6) on the other. One may also consult our solution for the two-block model11, which is the same, since all sites ui evolve independently during the afterslip (in the two-block model there was only one block u). Writing gi(t) = gi(t0) + k(ui(t) − ui(t0)), we can identify

$${g}_{i}(t)={g}_{i}({t}_{0})\Phi (t-{t}_{0})$$
(27)

with

$$\Phi (t-{t}_{0})=1-\frac{k}{k+{k}_{0}}\frac{{\mathrm{log}}\,\left(1+D\frac{t-{t}_{0}}{{t}_{R}}\right)}{{\mathrm{log}}\,(D)},$$
(28)

and the local stress value evolves, at times t ≥ t0, according to the equation:

$${f}_{i}(t)+{g}_{i}(t)={f}_{i}({t}_{0})+{g}_{i}({t}_{0})\Phi (t-{t}_{0}).$$
(29)

We note that this happens independently at all sites (there is no stress transfer between sites, which makes the evolution very simple to compute). It is important to remark that Φ(t) is a monotonically decreasing function of t when D > 1, i.e., for gi(t0) < 0 (which does happen). In that case D is large and, at times t − t0 ≫ tR, Φ(t − t0) ~ 0. Thus Φ decreases from 1 to ≈0. This relaxation of the inter-layer stress gi(t) happens at all sites simultaneously. If, for a site i, \({f}_{i}({t}_{0})\,> \, {\tau }_{i}^{\mathrm{{th}}}\), there will be a time taft > t0 such that \({f}_{i}({t}_{{\mathrm{aft}}})+{g}_{i}({t}_{{\mathrm{aft}}})={\tau }_{i}^{\mathrm{{th}}}\). For other sites (depending on the slip dynamics), this condition is not fulfilled and no such time exists.

Computationally, we compute the time taft for all sites where it is defined by inverting Eq. (29). Then we pick the smallest taft, which we call \({t}_{{\mathrm{aft}}}^{* }\), and relax the stress at all sites according to Eq. (29) using \(t={t}_{{\mathrm{aft}}}^{* }\). At this point, exactly one site has become unstable (the one with the smallest taft), i.e., a new earthquake is triggered. We then proceed with the coseismic phase until the earthquake is complete. This is the physical mechanism that triggers aftershocks.

After a number of aftershocks, no site fulfills the condition \({f}_{i}({t}_{0})\,> \, {\tau }_{i}^{\mathrm{{th}}}\) any longer, i.e., no time taft can be defined. In that case, we perform the relaxation "for infinite time" (several O(tR)) or, more concretely, we set gi = 0 at all sites. At this point, the fore-main-aftershock sequence is finished and the interseismic phase resumes, triggering a new sequence.
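The post-seismic phase can be sketched in the same spirit. Because the sites evolve independently, this sketch stores f, g and τth as flat lists. It uses gi(t) = gi(t0)Φ(t − t0) from Eq. (27), with Φ from Eq. (28), and inverts fi(t0) + gi(t0)Φ = τith in closed form, Δt = tR(D^((1 − Φ*)/c) − 1)/D with c = k/(k + k0) and Φ* = (τith − fi(t0))/gi(t0), for sites with fi(t0) > τith and gi(t0) < 0. If no site qualifies, it ends the sequence by setting gi = 0. The function names and the argument `A_sigmaN` (the product AσN) are hypothetical.

```python
import math


def phi(dt, g0, A_sigmaN, k, k0, t_R):
    """Relaxation factor of Eq. (28), with D = exp(-g0 / (A sigma_N)); requires g0 != 0."""
    D = math.exp(-g0 / A_sigmaN)
    return 1.0 - (k / (k + k0)) * math.log(1.0 + D * dt / t_R) / math.log(D)


def aftershock_delay(f0, g0, tau, A_sigmaN, k, k0, t_R):
    """Time after t0 at which f0 + g0 * phi reaches tau, or None if it is never reached."""
    if not (f0 > tau and g0 < 0.0 and f0 + g0 < tau):
        return None
    log_D = -g0 / A_sigmaN          # log D > 0 because g0 < 0
    phi_target = (tau - f0) / g0    # lies in (0, 1) under the conditions above
    c = k / (k + k0)
    # Closed-form inverse of Eq. (28): dt = t_R * (D**((1 - phi_target) / c) - 1) / D.
    return t_R * (math.exp(log_D * ((1.0 - phi_target) / c - 1.0)) - math.exp(-log_D))


def relax_to_next_aftershock(f, g, tau_th, A_sigmaN, k, k0, t_R):
    """Advance the post-seismic phase in place; return (dt, site) or None if the sequence ends."""
    candidates = []
    for i, (f0, g0, tau) in enumerate(zip(f, g, tau_th)):
        delay = aftershock_delay(f0, g0, tau, A_sigmaN, k, k0, t_R)
        if delay is not None:
            candidates.append((delay, i))
    if not candidates:
        # No site can reach its threshold: relax "for infinite time", i.e. g_i = 0 everywhere.
        g[:] = [0.0] * len(g)
        return None
    dt, site = min(candidates)
    for i, g0 in enumerate(g):
        if g0 != 0.0:               # g_i = 0 stays 0 under Eq. (27)
            g[i] = g0 * phi(dt, g0, A_sigmaN, k, k0, t_R)
    return dt, site
```

The returned site is the hypocenter of the triggered aftershock, from which the coseismic phase restarts; when `None` is returned, the interseismic loading takes over again.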