Introduction

The proliferation of infectious diseases is inherent to the social condition of human beings, and it has strongly marked cultural evolution in the last millennia1,2,3. Certainly, our current bio-chemical knowledge is mature enough to offer pharmaceutical solutions for many diseases. However, the structure of our societies and our way of living (e.g. rapid communications, highly connected world, dense urban areas, pollution, climate change, etc.) contribute to the appearance and quick diffusion of new health threats4. Indeed, the experience of the COVID-19 pandemic highlighted the importance of understanding other aspects of infectious diseases spreading such as: the effect of non-pharmaceutical interventions5,6,7, long-range travel restrictions4, the predictability of epidemic models8, the impact of city structure9,10 and their effect on public opinion11,12,13, etc.

In this context, we focus here on the anomalous behavior of COVID-19 fade-outs. It was observed that epidemic curves after the first epidemic wave are characterized by oscillations, plateaus, linear growth of the total number of cases and high susceptibility to secondary waves14,15,16,17,18,19. These phenomena result in overall long-lived, yet marginal, endemic states that differ from the classical exponential decay that one would naively expect. This makes it difficult to assess when control measures can be relaxed, since lifting them may lead to new major outbreaks. Furthermore, long survival times can easily result in “epidemic broths” where new variants can emerge. Therefore, a proper evaluation of the plausible causes of these long-lived states is fundamental to designing interventions aimed at controlling disease spreading.

Since classical models fail to predict persistent small fluctuations close to absorbing states, this situation challenges our understanding of disease spreading and poses a fundamental problem for disease modeling: which is the minimal epidemic model able to generate the observed phenomenology? This topic is widely discussed in the literature. While some works assert that this nontrivial temporal behavior is the result of the intrinsic heterogeneity in the infection parameters or in the structure of the contact networks15,16,17,20, another line of research suggests that individual response could be at its basis, and, in turn, it may depend on the state of the disease14,18,19. Moreover, the physics community has also noted that this phenomenology is reminiscent of the features observed at a fine-tuned critical point. For example, ref. 21 shows that the linear growth of the total number of cases and incidence plateaus can be induced by the initial conditions of the Susceptible-Infected-Recovered (SIR) model at the critical point, while other authors are looking for self-organization mechanisms in order to avoid the dependence on fine-tuned parameters22,23.

In this work, we show that the presence of a small number of undetected cases, either coming from importation or due to local missed detection, explains the emergence of these anomalous fade-outs and long-lived states. To show the effect of this mechanism, we study a meta-population epidemic model with a small external seeding. Our approach does not depend, thus, on ad-hoc model modifications, complex behavioral modeling, fine-tuning, or self-organization mechanisms. By means of a coarse-graining of the epidemic dynamics, we are able to extract analytical information about the duration of outbreaks after the first wave. This is a novel analytical procedure to extract global information out of local properties in the context of meta-population models. Our results confirm that endemic states can be sustained by a minimal import of infected individuals both below and above the epidemic threshold. This shows that driving epidemics just below the epidemic threshold may not result in a total epidemic fade-out. Moreover, it also makes manifest that the whole concept of herd immunity should be revisited. Finally, we also compare our theoretical derivations with empirical epidemic data of COVID-19 in England, finding that our mechanism can explain the anomalous persistence of the disease and the observed signatures of criticality after the end of the first wave.

Results and Discussion

External forcing

By external seeding (forcing), we mean a process that introduces newly infected individuals to a population. We mainly consider the regime in which these arrivals follow a slow rate, in such a way that the external forcing can be seen as a small perturbation on the system dynamics. Of course, in the very early moments and depending on the epidemic parameters, seeding has the potential to trigger an outbreak. However, rather than in the first wave we are interested in the effects of forcing at later stages of the epidemic spreading.

Such external seeding can be a description of a myriad of processes. An example could be the effect in a certain region or country of trips of undetected infected individuals from the rest of the world, which would act as a reservoir. This can be combined with other mechanisms such as failures in the isolation of infected people, the effect of asymptomatic individuals traveling, false negative tests, non-perfect mobility restrictions, etc. In general, there is no epidemic control strategy that is infallible. Our intention is to model all these undetected cases as an effective (very low) inflow of epidemic seeds. In this sense, the external seeding serves as an approximation intended to represent a myriad of different real epidemic processes that would be impossible to model explicitly.

The effect of external seeding on a single-population SIR model has been studied in detail (see e.g.24,25). Also, the relevance of meta-population models to study scenarios with realistic topologies, such as cities, countries, or global airport connections has been extensively explored (e.g.9,26,27,28,29,30,31,32,33,34). For instance, a recent work has focused on the effects of multiseeding on a meta-population framework35. However, the seeding, in that case, was not external, since it originated in other subpopulations, and it was not particularly small or constant over time. The question that remains open and that we address in this paper is the impact of this small external seeding from a reservoir in a meta-population, especially, in the period between epidemic waves.

Single population SIR model with external seeding

In a single well-mixed population, the SIR model with external seeding is defined by the following rules: Infected individuals become Recovered at rate μ, and Susceptible individuals are infected after contact with an infected agent at rate β. Lastly, a random individual can be substituted by an external infected agent at rate h. We use substitution, instead of direct introduction, to keep the total population N constant (N = S + I + R, where S, I and R are the numbers of Susceptible, Infected and Recovered individuals, respectively). In our approach, we focus on continuous-time stochastic models that generate different epidemic curves [I(t), S(t), R(t)] in every realization. These rules are encoded as the transition rates of a stochastic Markov jump process36:

$$\begin{array}{l}\mathop{\lim }\limits_{dt\to 0}\frac{P(I+1,S-1,R,t+dt| I,S,R,t)}{dt}=\beta \frac{I}{N}S+\frac{h}{N}S,\\ \mathop{\lim }\limits_{dt\to 0}\frac{P(I-1,S,R+1,t+dt| I,S,R,t)}{dt}=\mu I,\\ \mathop{\lim }\limits_{dt\to 0}\frac{P(I+1,S,R-1,t+dt| I,S,R,t)}{dt}=\frac{h}{N}R.\end{array}$$
(1)

Here the seeding rate appears in the form h/N to account for the substitution of a small number of individuals per unit of time. If instead, h multiplied S and R directly, it would represent the substitution of a fraction of the total population. This further reinforces our message that we are considering small external seeding.
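As an illustration of how the transition rates of Eq. (1) translate into a simulation, the following is a minimal sketch of a Gillespie-type generator of trajectories for the single-population model. The function name `gillespie_sir_seeding` and its arguments (`I0`, `R_init`, `t_max`, `seed`) are hypothetical choices made for this example and are not taken from the original work.

```python
import random

def gillespie_sir_seeding(N, beta, mu, h, I0=1, R_init=0, t_max=100.0, seed=None):
    """Sketch of a Gillespie simulation of Eq. (1).

    Returns a list of (t, S, I, R) tuples, one per event, starting at t = 0.
    """
    rng = random.Random(seed)
    S, I, R = N - I0 - R_init, I0, R_init
    t = 0.0
    traj = [(t, S, I, R)]
    while True:
        rates = (
            beta * I * S / N + h * S / N,  # S -> I: contagion, or a seed replaces a susceptible
            mu * I,                        # I -> R: recovery
            h * R / N,                     # R -> I: a seed replaces a recovered individual
        )
        total = sum(rates)
        if total == 0.0:          # absorbing state (only possible when h = 0)
            break
        t += rng.expovariate(total)   # exponential waiting time to the next event
        if t > t_max:
            break
        u = rng.random() * total      # pick the event proportionally to its rate
        if u < rates[0]:
            S, I = S - 1, I + 1
        elif u < rates[0] + rates[1]:
            I, R = I - 1, R + 1
        else:
            R, I = R - 1, I + 1
        traj.append((t, S, I, R))
    return traj
```

A seed that replaces an already infected individual leaves the state unchanged, which is why no such event appears in the rate list. For instance, `gillespie_sir_seeding(N=100000, beta=0.8/3.7, mu=1/3.7, h=1.2)` would use the parameter values quoted in the caption of Fig. 1a.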

The dynamics of the system in the limit of large population (N → ∞) can be approximated by the set of deterministic equations:

$$\begin{array}{rcl}\frac{dS}{dt}=-\beta \,I\,\frac{S}{N}-\frac{h}{N}S,\hfill\\ \frac{dI}{dt}=\beta \,I\,\frac{S}{N}-\mu \,I+\frac{h}{N}(S+R),\\ \frac{dR}{dt}=\mu \,I-\frac{h}{N}R,\hfill\end{array}$$
(2)

We proceed to summarize some important information derived from Eqs. (2) that will help us to better understand the stochastic model that will be considered afterward.

Absence of external forcing (h = 0)

When h = 0, the dynamics reduces to the classical SIR model and any state with I = 0 is an absorbing fixed point. The behavior towards the absorbing state is controlled by the basic reproductive number \({\mathcal{R}}_{0}=\beta /\mu\). If \({\mathcal{R}}_{0} > 1\), the system is in the super-critical phase, characterized first by an exponential growth of the number of infected individuals and a subsequent exponential decrease, once the number of susceptible individuals is so low that it cannot fuel the epidemic spreading. This passive phenomenon, based on starving out the epidemic spread thanks to the development of an immune community, is usually called herd immunity. Conversely, if \({\mathcal{R}}_{0} < 1\), the mean-field equations predict a monotonic exponential decay of the number of infected individuals. The case \({\mathcal{R}}_{0}=1\) is then the critical point, separating the super-critical and sub-critical phases.

These two phases also differ in their stationary states. Whereas the disease reaches a macroscopic fraction of the population (\({\lim}_{t\to \infty }R(t) \sim {\mathcal{O}}(N)\)) when \({\mathcal{R}}_{0} > 1\), in the sub-critical phase (\({\mathcal{R}}_{0} < 1\)) the number of affected individuals remains small (\({\lim}_{t\to \infty }R(t) \sim {\mathcal{O}}(1)\)). This fact allows us to define the attack rate37,

$$\alpha := \mathop{\lim }\limits_{t\to \infty }\frac{R(t)}{N},$$
(3)

as a control parameter. The attack rate in the SIR model can be computed exactly, and its analytical expression will be useful for the derivations of the next section:

$$\alpha =1+{{{{{{{{\mathcal{R}}}}}}}}}_{0}^{-1}\,{{{{{{{\mathcal{W}}}}}}}}\left(-{s}_{0}\,{{{{{{{{\mathcal{R}}}}}}}}}_{0}\,{e}^{-{{{{{{{{\mathcal{R}}}}}}}}}_{0}}\right).$$
(4)

Here, \({s}_{0}=\frac{S(0)}{N}\) is the initial fraction of susceptible individuals and \({\mathcal{W}}(\cdot )\) is the Lambert function. See Supplementary Note S1 for proof of the above equation. The critical point is characterized by both a null attack rate (α = 0) and a linear growth of the number of recovered individuals (R(t) ∝ t)21.
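To make Eq. (4) concrete, the sketch below evaluates the attack rate numerically. Rather than calling a Lambert-function routine, it uses the fact that, by the defining property of \({\mathcal{W}}\), Eq. (4) is equivalent to the implicit final-size relation α = 1 − s0 exp(−\({\mathcal{R}}_{0}\) α), which is solved here by fixed-point iteration. The function name `attack_rate` and the tolerance parameters are assumptions made for this example.

```python
import math

def attack_rate(R0, s0=1.0, tol=1e-12, max_iter=100000):
    """Solve alpha = 1 - s0 * exp(-R0 * alpha), equivalent to Eq. (4).

    Starting from alpha = 1, the iteration decreases monotonically to the
    largest root, which corresponds to the principal Lambert branch.
    """
    alpha = 1.0
    for _ in range(max_iter):
        new = 1.0 - s0 * math.exp(-R0 * alpha)
        if abs(new - alpha) < tol:
            return new
        alpha = new
    return alpha  # not fully converged (this happens only very close to R0 = 1)

if __name__ == "__main__":
    for R0 in (0.8, 1.2, 2.0):
        print(R0, attack_rate(R0, s0=1.0 - 1e-5))
```

The iteration converges quickly away from the critical point and slows down as \({\mathcal{R}}_{0}\) approaches 1, in line with the critical behavior discussed in the text.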

Not only the magnitude of the outbreak, but also its duration is greatly determined by the basic reproductive number \({{{{{{{{\mathcal{R}}}}}}}}}_{0}\). In the case of the stochastic finite-population SIR model, every realization of the process will have a different value of the attack rate and a different duration. However, Eq. (4) will still be representative of its average behavior. Its stochastic nature is of special relevance close to the critical point, where there is a dominance of fluctuations, with a strongly varying number of new cases per unit of time38,39,40.

With external seeding (h > 0)

If h is non-zero, Eq. (2) has only one fixed point irrespective of the values of h, β and μ:

$$\begin{array}{rcl}{S}_{\infty }\equiv \mathop{\lim }\limits_{t\to \infty }S(t)=\,0,\hfill\\ {I}_{\infty }\equiv \mathop{\lim }\limits_{t\to \infty }I(t)=\,\frac{h}{\mu +\frac{h}{N}},\hfill\\ {R}_{\infty }\equiv \mathop{\lim }\limits_{t\to \infty }R(t)=\,N\frac{\mu }{\mu +\frac{h}{N}}.\hfill\end{array}$$
(5)

This fixed point is stable. Therefore, there is no phase separation regarding the stationary state of the system. The external seeding removes the absorbing nature of the states with I = 0 and the phase transition38. However, the dynamical evolution towards the fixed point will show differences depending on the values of the epidemic parameters. In order to see this, we investigate the behavior of Eq. (2) with initial conditions \(I(0)={I}_{0} \sim {\mathcal{O}}(1)\), S(0) = N − I0 and R(0) = 0 for a short time window and in the limit of large population, N ≫ 1. In these limits, we find the linear approximation to Eq. (2) for early times

$$\frac{dI}{dt}\approx \left(\beta -\mu -\frac{h}{N}\right)\,I+h,$$
(6)

with solution

$$I(t)\approx {I}_{0}\,{e}^{({{{{{{{{\mathcal{R}}}}}}}}}_{0}^{h}-1)\mu \,t}+\frac{h/\mu }{{{{{{{{{\mathcal{R}}}}}}}}}_{0}^{h}-1}\left({e}^{({{{{{{{{\mathcal{R}}}}}}}}}_{0}^{h}-1)\mu t}-1\right),$$
(7)

where, for the sake of functional similarity, we have named the term \({\mathcal{R}}_{0}^{h}={\mathcal{R}}_{0}-h/(N\,\mu )\) as the basic reproductive number in the presence of external seeding. If initial conditions without infected individuals are considered, I0 = 0, then new outbreaks are still started by the external seeding. Although the equilibrium values given by Eq. (5) are independent of the value of \({\mathcal{R}}_{0}^{h}\), this parameter controls the characteristic time to reach the fixed point. For \({\mathcal{R}}_{0}^{h} > 1\), the number of infected individuals I(t) will first increase exponentially and quickly become of macroscopic order, and then, due to the nonlinear terms in Eq. (2), it will decrease towards the fixed point \({I}_{\infty }\). If \({\mathcal{R}}_{0}^{h} < 1\), the evolution can be either monotonic or nonmonotonic depending on the intensity of the seeding rate h, but in both cases the number of infected individuals will remain small throughout its entire evolution towards \({I}_{\infty }\). Therefore, for \({\mathcal{R}}_{0}^{h} < 1\), the disease will still affect a macroscopic portion of the population, but in a slow fashion. Interestingly, in the limit of small external seeding and large population size, in which we are interested (\(h \sim {\mathcal{O}}(1)\), N ≫ 1), the basic reproductive numbers of the SIR model with and without external seeding are indistinguishable (\({\mathcal{R}}_{0}^{h}\approx {\mathcal{R}}_{0}\)). Therefore, empirical methods to measure the basic reproductive number would not detect the presence of small external seeding.
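The statements above can be checked numerically. The following sketch integrates Eq. (2) with a classical fourth-order Runge-Kutta scheme, compares the early-time prevalence with the linear solution of Eq. (7), and evaluates the right-hand side of Eq. (2) at the fixed point of Eq. (5). The integrator, the step size and all function names are choices made for this example only.

```python
import math

def rhs(S, I, R, N, beta, mu, h):
    """Right-hand side of the deterministic equations, Eq. (2)."""
    dS = -beta * I * S / N - h * S / N
    dI = beta * I * S / N - mu * I + h * (S + R) / N
    dR = mu * I - h * R / N
    return dS, dI, dR

def integrate(N, beta, mu, h, I0, t_end, dt=0.01):
    """Classical RK4 integration of Eq. (2) from S = N - I0, I = I0, R = 0."""
    y = (N - I0, I0, 0.0)
    f = lambda state: rhs(*state, N, beta, mu, h)
    for _ in range(int(round(t_end / dt))):
        k1 = f(y)
        k2 = f(tuple(a + 0.5 * dt * b for a, b in zip(y, k1)))
        k3 = f(tuple(a + 0.5 * dt * b for a, b in zip(y, k2)))
        k4 = f(tuple(a + dt * b for a, b in zip(y, k3)))
        y = tuple(a + dt * (p + 2 * q + 2 * r + s) / 6
                  for a, p, q, r, s in zip(y, k1, k2, k3, k4))
    return y

def linear_I(t, N, beta, mu, h, I0):
    """Early-time solution, Eq. (7); assumes beta - mu - h/N != 0."""
    a = beta - mu - h / N          # equals (R0^h - 1) * mu
    return I0 * math.exp(a * t) + (h / a) * math.expm1(a * t)

def fixed_point(N, mu, h):
    """Stationary state of Eq. (5)."""
    return 0.0, h / (mu + h / N), N * mu / (mu + h / N)

if __name__ == "__main__":
    N, mu, h = 1e5, 1 / 3.7, 1.2
    beta = 0.8 * mu
    print(integrate(N, beta, mu, h, 1.0, 5.0)[1], linear_I(5.0, N, beta, mu, h, 1.0))
    print(rhs(*fixed_point(N, mu, h), N, beta, mu, h))
```

In the sub-critical example above the prevalence stays small, so the linear approximation remains accurate well beyond the very first instants.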

Finite systems

When stochastic effects are taken into account, the arrival of an infected individual triggers an epidemic outbreak during which the number of infected is different from zero. We will say that a population is active when there is at least one infected individual, I > 0. Conversely, an inactive population has I = 0. The random duration τ of an outbreak is the time during which the population is active (see Fig. 1a for a sketch).

Fig. 1: Local outbreaks.

Results of numerical generation of trajectories of the stochastic SIR model with external seeding in a single population using the Gillespie algorithm58,59. a Example of a local outbreak of duration τ = 4.3 with parameters N = \(10^{5}\), \({\mathcal{R}}_{0}=0.8\), μ = 1/3.7 day\(^{-1}\) and h = 1.2. The shaded area marks the times at which the disease is inactive (I = 0). b Average duration 〈τ〉 of the first outbreak started from a single seed, I(0) = 1, S(0) = N, R(0) = 0, for different values of the external import (h) and of the basic reproductive number (\({\mathcal{R}}_{0}\)). c Similar to (b), but the outbreak starts from an initial condition beyond herd immunity, using the initial conditions of Eq. (8). In both (b) and (c), the results have been averaged over 100 realizations and the error bars indicate the magnitude of the standard error.

The external seeding will create sequences of consecutive outbreaks, as new infected individuals arrive at all times. If the average arrival time \(h^{-1}\) is smaller than the average outbreak duration, 〈τ〉, outbreaks are likely to overlap, while for \(h^{-1}\) ≫ 〈τ〉, the outbreak due to the arrival of an infected individual will typically not occur until the previous outbreak has disappeared. In the sub-critical regime \({\mathcal{R}}_{0}^{h} < 1\), the outbreaks will be short, while in the case of \({\mathcal{R}}_{0}^{h} > 1\), the first outbreak will likely generate a large number of infected individuals and, hence, its duration will increase significantly. Fig. 1b shows that the average duration of the first outbreak 〈τ〉 can be used to characterize the phase diagram of the single-population SIR model with external seeding. A comparison with the h = 0 line provides evidence that a small external seeding does not produce a drastic change in the characteristic times of the dynamics.

Secondary outbreaks in the super-critical phase will, in general, be much smaller than the first one, both in intensity (number of infected individuals during the outbreak) and in duration; see Supplementary Note S7. Fig. 1c shows the average duration of the second outbreak after the first macroscopic wave. Instead of waiting until the first wave is over, we can force “herd immunity” by starting the simulations from an initial condition:

$$I(0)=1,\,R(0)=\alpha N,\,S(0)=N-I(0)-R(0),$$
(8)

in which the fraction α of recovered equals the attack rate in the absence of external seeding, Eq. (4) with s0 = 1 − 1/N. Also in Fig. 1c, we show that the small external seeding does not alter the characteristic time of these outbreaks.

Independent sub-populations

Throughout the rest of the work, we use a meta-population framework. That is, we deal with V subpopulations, each having its own number of individuals (\({\{{N}_{i}\}}_{i = 1,...,V}\)) and separate compartment variables (\({\{{S}_{i},{I}_{i},{R}_{i}\}}_{i = 1,...,V}\)). Every subpopulation follows a well-mixed stochastic SIR model. In the following, we fix the recovery rate to be compatible with the range of values for several important infectious diseases (such as COVID-19 or influenza): μ = 1/3.7 days\(^{-1}\) (ref. 41). In this way, we approach the time scales of real diseases, and some results of this work, such as the order of magnitude of the outbreak survival times, can be grasped more intuitively. The external seeding replaces an individual chosen at random among the whole system of subpopulations by a new infected individual. That is, a new infected individual enters the system at rate h, replacing an older individual chosen randomly among the \(\mathop{\sum }\nolimits_{i = 1}^{V}{N}_{i}\) total members of the whole population. Every sub-population i is thus selected to receive the seed with a probability proportional to its population Ni.
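The seeding rule just described can be sketched in a few lines. In the example below, the list-of-lists layout `pops[i] = [S_i, I_i, R_i]` and the function name `apply_external_seed` are assumptions made for illustration; the sketch only implements a single seeding event, which in a full simulation would be one of the possible Gillespie events, occurring at total rate h.

```python
import random

def apply_external_seed(pops, rng):
    """Replace one uniformly chosen individual of the whole system by an infected one.

    pops: list of [S_i, I_i, R_i] lists, modified in place.
    Returns the index of the sub-population that received the seed.
    """
    sizes = [sum(p) for p in pops]
    # Choosing an individual uniformly at random is equivalent to choosing
    # sub-population i with probability proportional to N_i ...
    i = rng.choices(range(len(pops)), weights=sizes)[0]
    # ... and then its compartment with probability proportional to its occupation.
    comp = rng.choices(range(3), weights=pops[i])[0]
    pops[i][comp] -= 1   # the replaced individual leaves
    pops[i][1] += 1      # the infected seed takes its place (no net change if comp is I)
    return i

if __name__ == "__main__":
    rng = random.Random(0)
    pops = [[90, 0, 10], [50, 0, 50]]
    print(apply_external_seed(pops, rng), pops)
```

Because the seed substitutes an existing individual, every Ni stays constant, as required by the model definition.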

We start with a simplistic setting in which all the sub-populations have the same number of individuals (namely, Ni = \(10^{5}\) for all i ∈ [1, V]) and are independent (there is no circulation of agents between them). In this way, the external field is solely responsible for the onset of local epidemic outbreaks. This situation could model a strict lockdown in which mobility restrictions keep the sub-populations fully isolated. The external seeding is considered a small perturbation of such severe confinement. This simple approximation allows us to make analytical calculations and to build an understanding of the more realistic scenarios, with communication between the sub-populations, considered in the next section.

We are primarily interested in the anomalous epidemic fade-out after the first macroscopic wave of infections. For this reason and as in Eq. (8), we fix the initial condition such that the total number of infected individuals is equal to zero and the fraction of recovered individuals is such that the possibility of macroscopic outbreaks is avoided. This means that for each sub-population i:

$${R}_{i}(0)=\alpha \,{N}_{i},\,{I}_{i}(0)=0,\,{S}_{i}(0)={N}_{i}-{I}_{i}(0)-{R}_{i}(0).$$
(9)

where α is the attack rate in the absence of external seeding (Eq. (4) with s0 = 1 − 1/N). In this way, we mimic a situation in which the whole system suffered a major super-critical outbreak. This initial condition would be an absorbing state in the absence of external seeding (h = 0), but its presence (h > 0) opens the possibility to generate further outbreaks. We provide insights about the behavior of the system with different initial conditions in Supplementary Note S8.

In the following, we will differentiate local from global properties. Local properties are those referring to individual sub-populations: for example, the prevalence in a sub-population i, Ii(t)/Ni, is local, while the total number of infected agents \(I(t)=\mathop{\sum }\nolimits_{i = 1}^{V}{I}_{i}(t)\) is global. One of our objectives is to understand some global characteristics from the knowledge of the local ones. The main quantity on which we base the conclusions of this work is the duration of global outbreaks, τG, defined as the time during which the number of infected individuals in the whole system remains strictly greater than zero. If at time t1 an external seed enters a system with no other infected individuals, I(t1) = 1, and the global prevalence remains different from zero until time t2 > t1, that is, I(t2) = 0 but I(t) > 0 for t1 < t < t2, then the duration of the global outbreak is τG = t2 − t1. More intuitively, the uninterrupted concatenation of local outbreaks results in a global outbreak (see Fig. 2a for a sketch illustrating the difference between local and global outbreaks). Also, see Fig. 2b-c for an example of the computation of τG in simulations (see the methods section for further details regarding simulations).
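The construction of global outbreaks as uninterrupted concatenations of local ones amounts to merging overlapping time intervals. The sketch below illustrates this idea; the function name `global_outbreaks` and the example intervals are hypothetical (only the end points 24 and 195 echo the example of Fig. 2), and the approach is an illustration rather than the procedure used to produce the figures.

```python
def global_outbreaks(local_outbreaks):
    """Merge local outbreak intervals (start, end) into global outbreaks (t1, t2).

    Two local outbreaks belong to the same global outbreak when their
    activity periods overlap or touch, so that the total prevalence never
    drops to zero in between.
    """
    merged = []
    for start, end in sorted(local_outbreaks):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)   # extend the current global outbreak
        else:
            merged.append([start, end])               # a new global outbreak begins
    return [(t1, t2) for t1, t2 in merged]

if __name__ == "__main__":
    # Made-up local outbreaks in four sub-populations (times in days).
    local = [(24, 60), (50, 120), (110, 195), (300, 310)]
    for t1, t2 in global_outbreaks(local):
        print(f"global outbreak from {t1} to {t2}, tau_G = {t2 - t1}")
```

In a simulation one can equivalently track the total prevalence directly and record the times at which it leaves and returns to zero.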

Fig. 2: Global outbreaks.

In (a), we show a sketch with four different sub-populations, labeled i, j, k, and l, experiencing local outbreaks. Each of these outbreaks starts at a different time and has a different duration (τ). They all contribute to a global outbreak of duration τG. Such a global outbreak started with the first local outbreak in sub-population i at time t1 and finished at time t2, when the last local outbreak died out (in sub-population l). In panels (b) and (c), we show the meaning of local and global outbreaks with actual simulations. In (b), we show the total number of infected individuals (I = ∑iIi) in a particular instance of a global outbreak. This global outbreak was initiated at time t1 = 24, when the external seeding acted on the system with no other infected agent, and lasted until time t2 = 195, when the total number of infected individuals became zero. The total duration of the global outbreak is τG = t2 − t1 = 171 days. In (c), using the same realization displayed in (b), we examine the duration of local outbreaks. The length of each horizontal line is the duration of a local outbreak, whereas the vertical axis gives the label of the sub-population. One can see how local outbreaks pile up, generating the global outbreak of duration τG. The parameters used to generate this example were V = 1600, β = 0.8μ, and h = 0.1 days\(^{-1}\) (h/V = 6.25 × \(10^{-5}\)).

In Fig. 3, we plot 〈τG〉 as a function of the basic reproductive number \({{{{{{{{\mathcal{R}}}}}}}}}_{0}\) and the external seeding rate h. This figure informs thus about the time-scales for which the disease is active in the system. Interestingly, we can distinguish a cross-over between two regimes from the duration of global outbreaks: one in which the typical time-scale of global outbreaks is much bigger than the one of local outbreaks, and another in which the duration of global and local outbreaks share the order of magnitude. The presence of long global outbreaks after the first epidemic wave for small values of the seeding rate h and for values of \({{{{{{{{\mathcal{R}}}}}}}}}_{0}\) both larger and smaller than one constitutes one of the main results of this paper. This provides a very simple mechanism to explain the observed endemic-yet-marginal epidemic states.

Fig. 3: Duration of global outbreaks.

In this figure, we investigate the average global outbreak duration, 〈τG〉, for V = 1600 independent (isolated) sub-populations obtained from numerical simulations using the Gillespie algorithm (58,59). Global outbreaks that do not end by the time \({t}_{\max }=4\cdot 1{0}^{4}\) are stopped. The global outbreak duration was averaged over 100 realizations for different values of the basic reproductive number, \({{{{{{{{\mathcal{R}}}}}}}}}_{0}\), and the external seeding rate, h. A transition can be observed between a region where the duration of global outbreaks is of the order of local ones (in red, bottom left and right) and another one in which global outbreaks are much longer than local ones (in blue, center and top).

There is an interplay between the external seeding and the epidemic dynamics. Given that the rate of activation (h) is small, if the duration of local outbreaks (tuned with \({\mathcal{R}}_{0}\)) is too short, then the global outbreak is not sustained and will be quickly interrupted. Only if there is a proper balance between the two dynamics can we observe the increase in the global outbreak duration (Fig. 3). Note that the single-population setting cannot explain this effect (Fig. 1b). The overlap of local outbreaks in different sub-populations is necessary to generate the endemic state of the disease.

Theory

We define n(t) as the number of active sub-populations (those for which the number of infected people is greater than zero) at time t. Our assumption is that we can write the evolution of the probability P(n, t) that n takes a certain value at a time t in terms of a master equation with time-independent transition rates42,43. For this, we define the rates \(W_{+}\) and \(W_{-}\) of the master equation for the variable n(t) as follows

$$\begin{array}{rcl}{W}_{+}=\mathop{\lim }\limits_{dt\to 0}\frac{P(n(t+dt)={n}_{0}+1| n(t)={n}_{0})}{(V-{n}_{0})\,dt},\\ {W}_{-}=\mathop{\lim }\limits_{dt\to 0}\frac{P(n(t+dt)={n}_{0}-1| n(t)={n}_{0})}{{n}_{0}\,dt}.\end{array}$$
(10)

Here, \(W_{-}\) (resp. \(W_{+}\)) is defined as the rate at which one particular active (inactive) population gets deactivated (activated). The objective is to find an expression for the duration of global outbreaks as a function of \(W_{-}\) and \(W_{+}\) in the limit of a large number of sub-populations, V ≫ 1. To do so, we make use of the framework of the backward Kolmogorov equation to compute the average time to go from n = 1 to n = 0 (see42,44,45 and Supplementary Notes S2, S3, S5), which yields

$$\langle {\tau }_{G}\rangle \sim \frac{1}{{W}_{+}V}\,\left({e}^{\frac{{W}_{+}}{{W}_{-}}V}-1\right).$$
(11)
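As a sanity check of Eq. (11), one can compare it with the exact mean time to extinction of the underlying birth-death chain, in which n increases at rate (V − n)W+ and decreases at rate nW−. The sketch below uses the standard textbook expression for the mean absorption time starting from n = 1 in a birth-death chain; it is not necessarily the route followed in the Supplementary Notes, and the function names are hypothetical.

```python
import math

def exact_mean_time(V, W_plus, W_minus):
    """Mean time from n = 1 to n = 0 for birth rate (V - n) W+ and death rate n W-.

    Standard result: T_1 = sum_k (1 / d_k) * prod_{j<k} (b_j / d_j),
    with b_n = (V - n) W+ and d_n = n W-.
    """
    total, prod = 0.0, 1.0
    for k in range(1, V + 1):
        total += prod / (k * W_minus)
        prod *= (V - k) * W_plus / (k * W_minus)
    return total

def eq11(V, W_plus, W_minus):
    """Large-V approximation, Eq. (11)."""
    return math.expm1(W_plus * V / W_minus) / (W_plus * V)

if __name__ == "__main__":
    V, h, W_minus = 1600, 0.1, 0.1      # illustrative values, W+ = h / V
    print(exact_mean_time(V, h / V, W_minus), eq11(V, h / V, W_minus))
```

The sum can be carried out in closed form and gives [(1 + W+/W−)^V − 1]/(V W+), which tends to the exponential of Eq. (11) when V ≫ 1 at fixed V W+.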

In order to further exploit Eq. (11), we need to identify the rates of the activation-deactivation process (\(W_{-}\) and \(W_{+}\)) as functions of \({\mathcal{R}}_{0}\), h, V and N. \(W_{+}\) is the rate at which one inactive sub-population becomes active. Since the external seeding acts uniformly on every sub-population, we obtain that \(W_{+}\) = h/V. We associate \(W_{-}\) with the inverse of the average time that a sub-population remains active: \(W_{-}\) = 1/〈τ〉. In order to work with analytically tractable expressions, we approximate 〈τ〉 for small h by the average time with h = 0. When h ≈ 0, it is unlikely that many external seeds enter the same active sub-population; and even if they did, they would not introduce big changes in the time scales (see Fig. 1b and c). Furthermore, we will treat the sub-critical (\({\mathcal{R}}_{0} < 1\)) and super-critical (\({\mathcal{R}}_{0} > 1\)) regimes differently. In the sub-critical region, we approximate the duration of local outbreaks by the average duration of a one-population outbreak in the SIS model (this statement is discussed in Supplementary Note S6):

$${W}_{-}\approx -\frac{\mu \,{{{{{{{{\mathcal{R}}}}}}}}}_{0}}{\log (1-{{{{{{{{\mathcal{R}}}}}}}}}_{0})},$$
(12)

see Supplementary Note S4 for details on the derivation of Eq. (12). Therefore, we can rewrite Eq. (11) as

$$\langle {\tau }_{G}\rangle \sim \frac{1}{h}\,\left[\exp \left(\frac{-h\,\log (1-{{{{{{{{\mathcal{R}}}}}}}}}_{0})}{\mu \,{{{{{{{{\mathcal{R}}}}}}}}}_{0}}\right)-1\right].$$
(13)

This equation sheds light on the numerical results of Fig. 3 for \({\mathcal{R}}_{0} < 1\). In the first place, we can now reproduce the sub-critical region of this figure without an upper cut-off. Secondly, it allows us to collapse all the h, \({\mathcal{R}}_{0}\) and V dependence onto a single curve, see Fig. 4a. Moreover, it shows that the transition to large persistence times is not abrupt: the curve is continuous, non-divergent, and has well-behaved derivatives. Besides, it is of special interest that the scaling of \(W_{+}\) with V cancels the dependence of the average time on the global system size (see Fig. 4). Therefore, the general behavior of the long-lived epidemic states should not depend on the meta-population size and could be present at different scales (village, city or country level).
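For concreteness, Eqs. (12) and (13) can be evaluated directly. The following sketch does so; the function names are hypothetical and the parameter values in the usage lines are only illustrative.

```python
import math

def deactivation_rate(R0, mu):
    """W- of Eq. (12); valid for 0 < R0 < 1."""
    return -mu * R0 / math.log(1.0 - R0)

def mean_global_duration(R0, mu, h):
    """Average duration of global outbreaks in the sub-critical regime, Eq. (13).

    Note that the number of sub-populations V does not appear: it cancels
    because W+ = h / V.
    """
    return math.expm1(h / deactivation_rate(R0, mu)) / h

if __name__ == "__main__":
    mu = 1 / 3.7                       # days^-1, as fixed in the text
    for h in (0.01, 0.1, 1.0):
        print(h, mean_global_duration(0.8, mu, h))
```

For h → 0 the expression reduces to 1/W−, the mean duration of a single local outbreak, while for larger h it grows exponentially with h/W−, which is the crossover between the two regimes described above.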

Fig. 4: Analytical prediction for the duration of global outbreaks.

Comparison of analytical expressions for the average duration of global outbreaks with simulations. In (a), we show \(\langle {\tau }_{G}\rangle\) averaged over 100 realizations for different values of V, \({\mathcal{R}}_{0}\) and h. Here, we only use sub-critical values for the basic reproductive number (\({\mathcal{R}}_{0} < 1\)). The functional relation used to collapse the data onto a single curve is Eq. (13), which is shown as a dashed line. In (b), we do the same, but concentrating on the super-critical region (\({\mathcal{R}}_{0} > 1\)), and using Eq. (15) to collapse the data. The dashed line is a plot of Eq. (15). Both plots show the good agreement of the simulations with the theoretical predictions. The plateau observed in simulations is an artifact that corresponds to the maximum time of simulations (\({t}_{\max }\)).

Regarding the super-critical phase (\({\mathcal{R}}_{0} > 1\)), it is important to stress that sub-populations that have reached local herd immunity can still generate further outbreaks induced by the external seeding h or by infected visitors from other subpopulations. However, these outbreaks will not be macroscopic, since there is not a susceptible population large enough to fuel them. Our way to make quantitative predictions in this regime starts by noticing that the statistics of these outbreaks “beyond herd-immunity” resemble those of the sub-critical regime in a susceptible population. Indeed, we can map the epidemic dynamics beyond herd-immunity onto a sub-critical SIS model with a new effective transmission rate

$${\beta }^{{\prime} }=\beta \,(1-\alpha ),$$
(14)

where α is the attack rate defined in Eq. (4). See Supplementary Note S7 for details on the derivation of Eq. (14). Therefore, we are able to exploit the same explanation built for the sub-critical phase: even if local herd-immunity is reached, simultaneous sub-critical local outbreaks can percolate in time, resulting in an endemic state at the global level. This observation enables us to estimate the average time of global outbreaks in the super-critical regime using an analogous version of Eq. (13) with \({\mathcal{R}}_{0}^{\prime}={\beta }^{\prime}/\mu =-{\mathcal{W}}(-{s}_{0}{\mathcal{R}}_{0}{e}^{-{\mathcal{R}}_{0}})\), verifying \({\mathcal{R}}_{0}^{\prime}\in (0,1)\) for \({\mathcal{R}}_{0} > 1\) (see Supplementary Note S7):

$$\langle {\tau }_{G}\rangle \sim \frac{1}{h}\,\left[\exp \left(\frac{-h\,\log (1-{{{{{{{{\mathcal{R}}}}}}}}}_{0}^{{\prime} })}{\mu \,{{{{{{{{\mathcal{R}}}}}}}}}_{0}^{{\prime} }}\right)-1\right].$$
(15)

Eq. (15) predicts a collapse of the data similar to the one observed in the sub-critical regime, but now using the effective transmission rate, see Fig. 4b.
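The mapping of Eqs. (14) and (15) can be sketched numerically as follows. The attack rate is obtained here by iterating the final-size relation α = 1 − s0 exp(−\({\mathcal{R}}_{0}\) α), which is equivalent to the Lambert-function expression of Eq. (4); this iteration, as well as all function names, are choices made for this example.

```python
import math

def attack_rate(R0, s0=1.0, tol=1e-12, max_iter=100000):
    """Fixed-point solution of alpha = 1 - s0 * exp(-R0 * alpha), cf. Eq. (4)."""
    alpha = 1.0
    for _ in range(max_iter):
        new = 1.0 - s0 * math.exp(-R0 * alpha)
        if abs(new - alpha) < tol:
            return new
        alpha = new
    return alpha

def effective_R0(R0, s0=1.0):
    """Effective reproductive number beyond herd immunity, R0' = R0 (1 - alpha), Eq. (14)."""
    return R0 * (1.0 - attack_rate(R0, s0))

def mean_global_duration_supercritical(R0, mu, h, s0=1.0):
    """Eq. (15): Eq. (13) evaluated at the effective value R0'. Intended for R0 > 1."""
    Rp = effective_R0(R0, s0)
    return math.expm1(-h * math.log(1.0 - Rp) / (mu * Rp)) / h

if __name__ == "__main__":
    mu = 1 / 3.7
    for R0 in (1.2, 1.5, 2.0, 3.0):
        print(R0, effective_R0(R0), mean_global_duration_supercritical(R0, mu, 0.1))
```

The effective value \({\mathcal{R}}_{0}^{\prime}\) decreases as \({\mathcal{R}}_{0}\) grows, so in this approximation the longest global outbreaks in the super-critical phase are expected close to the critical point.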

Adding mobility between sub-populations

Although our results so far explain the emergence of endemic epidemic states, the assumption of independence between the sub-populations limits their applicability to real-world scenarios. A more realistic setting has to take into account that individuals can move across different subpopulations. This possibility enables a different mechanism to start local outbreaks, since infected agents can visit susceptible populations and susceptible individuals can also get infected outside their subpopulation of residence. Throughout the rest of this section, the epidemic dynamics and the external seeding are implemented as explained in the case of independent sub-populations. At a constant rate h, one individual from the entire population is replaced by a new infected seed coming from outside the system.

Random diffusion

Our first approximation to include mobility explicitly is pure random diffusion between sub-populations: every agent jumps to a connected neighboring sub-population at a constant rate M. For the moment, the number of connections per sub-population is a constant (k), and the initial condition is a uniform population \(N_{i} = N = 10^{5}\) across all sub-populations i. Under these circumstances, the distribution of the number of inhabitants remains constant on average. Although it has been shown that pure diffusion is not a proper description of human mobility at all scales, it has been used to model the large-scale spreading of infectious diseases (see, for example, the implementation of air travel in refs. 30,37,46,47,48). We shall see in the next section that the main results discussed here hold as well for the case of recurrent mobility.
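The diffusion rule can be sketched in a few lines. The example below is a discrete-time approximation (each agent jumps with probability M dt per step), which is an assumption made here for brevity and not necessarily the continuous-time scheme used in the simulations; the helper names `lattice_neighbors` and `diffusion_step` are hypothetical, and the populations are kept small so that the example runs quickly.

```python
import random

def lattice_neighbors(L):
    """Neighbor lists for an L x L square lattice with periodic boundaries (k = 4)."""
    nbrs = []
    for x in range(L):
        for y in range(L):
            nbrs.append([
                ((x + 1) % L) * L + y,
                ((x - 1) % L) * L + y,
                x * L + (y + 1) % L,
                x * L + (y - 1) % L,
            ])
    return nbrs

def diffusion_step(pop, nbrs, M, dt, rng):
    """Move each agent to a random neighbor with probability M * dt (assumed discrete-time rule)."""
    p = M * dt
    if not 0.0 <= p <= 1.0:
        raise ValueError("M * dt must be a probability")
    new = list(pop)
    for i, n in enumerate(pop):
        for _ in range(n):
            if rng.random() < p:
                j = rng.choice(nbrs[i])
                new[i] -= 1
                new[j] += 1
    return new

if __name__ == "__main__":
    rng = random.Random(0)
    L, N = 5, 200                      # toy sizes, far below N = 10^5
    pop = [N] * (L * L)
    nbrs = lattice_neighbors(L)
    for _ in range(20):
        pop = diffusion_step(pop, nbrs, M=0.1, dt=1.0, rng=rng)
    print(min(pop), max(pop), sum(pop))  # occupancies fluctuate around N; total is conserved
```

Because every node has the same degree and the same jump rate, inflow and outflow balance on average, which is why the uniform initial condition is stationary.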

As in the section on independent sub-populations, we first investigate the duration of global outbreaks with direct simulations of the stochastic process in which, once more, we implement a maximum time \(t_{\max}\) at which simulations stop. As we are interested in the emergence of anomalous outbreaks after the first wave, we set in each sub-population the initial conditions given by Eq. (9).

In Fig. 5, we show the average duration of the first global outbreak for different values of the mobility rate M and the basic reproductive number \({\mathcal{R}}_{0}\). Essentially, the effect of mobility in the range of M explored is negligible, and the values of 〈τG〉 from all simulations coincide with the theoretical prediction for the case of independent sub-populations (Eqs. (13) and (15), both shown as dashed lines in Fig. 5).

Fig. 5: Effect of mobility: diffusion.

Effect of random diffusion on the duration of global outbreaks. Dots show the duration of a global outbreak averaged over 100 simulations for different values of \({\mathcal{R}}_{0}\) and M. We show a transect of fixed external seeding (h = 1 and \(h/V = 2.5\times 10^{-3}\)). The initial condition mimics the situation after the first epidemic outbreak [Eq. (9)]. The topology is a square lattice with periodic boundary conditions (k = 4) with V = 400 (20 × 20). All simulations are stopped either at time \(t_{\max}=4\times 10^{4}\) (days) (horizontal dotted line) or when the total prevalence reaches zero. Dashed curved lines show our analytical estimations (Eq. (13) for \({\mathcal{R}}_{0} < 1\) and Eq. (15) for \({\mathcal{R}}_{0} > 1\)). As discussed in the text, mobility does not have a strong effect since no macroscopic outbreaks are expected.

The probability that the first epidemic outbreak in a given sub-population i affects a neighboring sub-population j mainly depends on two factors: the number of infected individuals in i, and the rate M at which individuals from i travel to neighboring sub-populations37. This is the reason why, in a situation with no macroscopic outbreaks, we do not expect mobility to play a major role. In Supplementary Note S8, we check that mobility does play a role in the duration of global outbreaks when the first wave is included in the analysis. Also in Supplementary Note S8, we test that our results are robust to variations in the initial condition.

We note the predictive power of Eqs. (13) and (15) even in the case of mobility. Their applicability is remarkable, given the strong approximations introduced for their derivation (independent sub-populations and SIS dynamics).

Recurrent mobility

We test next the robustness of our findings when the mobility is recurrent. This type of mobility is used to model back-and-forth trips such as those related to commuting, which represent the majority of daily mobility in urban environments. It has been shown that recurrent mobility produces different propagation patterns compared to diffusion, due to the repetition of contacts in the residence and working areas9,10,35,47,49,50. In practice, we assign to every agent a sub-population of residence and one of work (which can be the same). Agents are assumed to spend 1/3 of the day in the working sub-population and the remaining 2/3 in the residence one. Note that this implies that the initial number of residents in each sub-population (\({\{N_{i}\}}_{i = 1,\ldots ,V}\)) is preserved in time. In this case, mobility fluxes are parameterized by the fraction of resident agents traveling every day (m). Once the daily fluxes between sub-populations are fixed, they remain the same throughout the simulation. The main variables are thus the numbers of individuals living in sub-population i and working in j in each of the disease states (\(\{X_{ij}\}\), where i, j = 1, ..., V, and X can be S, I or R). The addition of recurrent mobility makes it difficult to simulate the stochastic process in feasible times using, for instance, the Gillespie algorithm. In order to reduce the computing time, we make use of an approximation that exploits the difference between the time scales of the epidemic and mobility rates31,51. The basic idea behind this approximation is that recurrent mobility is encoded in an effective transmission rate that depends on the mobility and demographic characteristics of each sub-population.

We show next that the regimes obtained in the previous sections still hold and they are not an artifact derived from the uniform distribution of populations and connections, nor of the specific type of mobility. In Fig. 6, we obtain similar patterns in the phase diagram as we vary the mobility intensity parameter (m) for:

  • Figure 6a,b: A configuration in which sub-populations form a 2-D regular lattice with a Gaussian distribution of the number of residents (average \(10^{5}\) individuals and \(\sigma = (3/20)\times 10^{5}\)). The number of agents traveling along each link between sub-populations i and j is \(mN_{i}/4\).

  • Figure 6c,d: Sub-populations are connected by a scale-free network generated with the configuration model and with degree distribution \(P(k) \sim k^{-2.5}\). The average degree is 〈k〉 = 6.2. The outflow of a sub-population i is equally distributed across the links departing from i, so that each link carries \(mN_{i}/k_{i}\) travelers.

  • Figure 6e,f: A realistic application in the city of Paris. The basic divisions of the city are the census areas “ensemble des communes”; the resident populations and commuting networks are obtained from official statistics52,53. As before, we use a control parameter (m) to determine the fraction of the resident population that commutes. The destinations are selected according to the empirical commuting flows. For example, if \(\omega_{ij}\) is the empirical number of individuals living in i and working in j, we consider in our simulations \(mN_{i}\,\omega_{ij}/\sum_{j}\omega_{ij}\) travelers in the link i − j.
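The three prescriptions above share one rule: the commuting outflow \(mN_{i}\) of each sub-population is split among its links in proportion to a link weight. A minimal sketch, assuming the weights are stored as a dense V × V list of lists (the name `commuting_flows` and this data layout are hypothetical):

```python
def commuting_flows(residents, omega, m):
    """Travelers on each link i -> j: m * N_i * omega_ij / sum_j omega_ij.

    residents: list of resident populations N_i.
    omega:     V x V weights; empirical commuter counts, or 1 on each existing
               link to recover the equal split m * N_i / k_i.
    m:         fraction of residents that commute.
    """
    flows = []
    for n_i, row in zip(residents, omega):
        total = sum(row)
        # A sub-population with no outgoing weight sends no commuters.
        flows.append([m * n_i * w / total if total > 0 else 0.0 for w in row])
    return flows

if __name__ == "__main__":
    residents = [1000, 2000, 1500]
    omega = [[0, 30, 10],
             [5, 0, 15],
             [1, 1, 0]]          # toy weights, not the Paris data
    for row in commuting_flows(residents, omega, m=0.1):
        print(row)
```

With unit weights on the k_i links of node i, the same function returns \(mN_{i}/k_{i}\) per link, which covers the lattice (k = 4) and the scale-free network.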

Fig. 6: Effect of mobility: recurrent mobility.

Average global outbreak duration for sub-populations connected through recurrent mobility, for two values of the fraction of travelers (m). Different topologies and demographic statistics are also inspected. Averages were performed over 100 realizations. Global outbreaks that have not ended by the time \(t_{\max}=4\times 10^{4}\) are stopped. In (a, b), V = 400 sub-populations are connected forming a regular lattice with periodic boundary conditions. Demographics are Gaussian distributed. In (c, d), the topology is a scale-free network with a degree distribution \(P(k) \sim k^{-2.5}\) and with V = 400 sub-populations proportional to the degree. In (e, f), connectivity and populations are read from commuting data of the city of Paris (V = 469).

In all these panels, we note that there is an extended parametric area for which the endemic states emerge. These results are robust to changes in the initial condition (see Supplementary Note S8) and to the epidemic model: i.e., a SEIR model also generates the same variety of behaviors (see Supplementary Note S9).

Empirical evidence

Lastly, we compare our predictions with publicly available governmental data on the spread of COVID-19 in England54. Specifically, we focus on the anomalous epidemic fade-outs observed in COVID-19 incidence (number of new infections per day) between the first two waves of the pandemic, a situation that exactly represents the assumptions of our model. Broadly speaking, this period corresponds to the months between April and September 2020, and the geographical resolution of our data is at the level of the “lower-tier local authorities” (LTLA), one of the administrative units into which the country is divided. England is composed of 315 of these divisions (see a sketch in Fig. S10 of the Supplementary Information), whose average population is 179,000 inhabitants.

In Fig. 7, we show one instance of the evolution of the prevalence in a particular LTLA. In the data, we can differentiate two regimes with exponential changes in the prevalence, associated with the decay and growth of the first and second wave respectively. We can also see a third dynamical phase between the two waves, in which prevalence flattens and is low but almost always nonzero. This phase is what we call an anomalous persistent fade-out since it cannot be characterized by the standard models (see Fig. 7a). However, as shown in Fig. 7b, our model equipped with an external field is able to reproduce both the exponential decay and the subsequent fluctuating plateau.

Fig. 7: Anomalous fade-outs in real data.

In the two panels, dots show the evolution of the prevalence in one particular LTLA, Haringey, in London. In the data, we differentiate three dynamical regions in the behavior of the prevalence, corresponding to the first wave (exponential decay), the anomalous fade-out (fluctuating plateau), and the second wave (exponential growth). In (a) and (b), we also show results from simulations carried out with the mobility switched off, as described in the case of independent sub-populations, using the demographic details of the LTLA and fixing \({\mathcal{R}}_{0}=0.8\). The solid line represents the median and the shaded area the range between the first and third quartiles obtained from \(10^{3}\) simulations. In (a), we show the evolution obtained with simulations without external seeding (h = 0). With this setting, the model properly reproduces the exponential decay but fails to describe the subsequent plateau. In (b), simulations are run with h = 0.2. In this case, the model captures both the decay and the plateau regimes.

Fig. 8 adds more quantitative information to our discussion. It shows that the distribution of times for which LTLAs have zero prevalence is well-fitted by an exponential functional form. This is precisely the distribution expected by our model when neglecting mobility of infected individuals. In this case, the activation of an LTLA can only be caused by the field and the distribution of times with zero prevalence would read

$$P(t)=h\,{e}^{-ht}.$$
(16)

where h is the rate at which infected individuals enter the LTLAs from outside. Hence, one can estimate the external field from the exponential fit of the distribution in Fig. 8 together with Eq. (16), obtaining in this case \(h \approx 0.04\) days\(^{-1}\) as a proxy for the external seeding rate at the LTLA level.
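To make the estimation step concrete, the sketch below recovers the rate of an exponential distribution from a sample of zero-prevalence durations. It uses the maximum-likelihood estimator (the inverse of the sample mean) rather than a fit to the binned distribution, so it is a stand-in for the fitting procedure and not a reproduction of it; the synthetic data and the function name `estimate_seeding_rate` are hypothetical.

```python
import math
import random

def estimate_seeding_rate(durations):
    """Maximum-likelihood estimate of h for P(t) = h exp(-h t), with its standard error.

    Returns (h_hat, standard_error), where h_hat = n / sum(t) and the
    asymptotic standard error is h_hat / sqrt(n).
    """
    n = len(durations)
    if n == 0 or sum(durations) <= 0:
        raise ValueError("need at least one positive duration")
    h_hat = n / sum(durations)
    return h_hat, h_hat / math.sqrt(n)

if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic zero-prevalence durations (days) drawn with a known rate, for illustration only.
    sample = [rng.expovariate(0.04) for _ in range(20000)]
    h_hat, err = estimate_seeding_rate(sample)
    print(f"h = {h_hat:.4f} +/- {err:.4f} per day")
```

With real data, daily resolution and censoring of the durations at the edges of the observation window would need separate treatment.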

Fig. 8: Distribution of times between consecutive outbreaks.

Distribution of times for which the LTLAs have zero prevalence, as predicted by the theory, is well fitted by an exponential distribution (Eq. (16)). The value of the exponent of the best fit is 0.041(8), and can be used as a proxy for the rate at which infected individuals enter the LTLAs from outside per unit of time.

With the previous results, we have shown that our model is capable of reproducing the statistics of epidemics in the period between the first two waves. Furthermore, external seeding can also explain features of anomalous fade-outs that resemble those of fine-tuned critical points. Indeed, in Fig. 9, we show that the growth of recovered individuals during the period between the first two waves is well-fitted by a linear function. This linear growth is a general characteristic of our model that we would expect for any external field with I ≈ 0 and \({\mathcal{R}}_{0} < 1\); in the standard SIR model, however, linear growth appears only at the critical point21. A different signature of criticality is given by the measures of the effective reproductive number obtained from real data, which fluctuate around the critical value (see Fig. 10a). Remarkably, we measure similar fluctuating and near-critical values of the effective reproductive number in data generated with simulations of a metapopulation SIR model with external seeding (see Fig. 10b). In Fig. 10c, we show that this behavior disappears when the external seeding is switched off. In this case, we observe more monotonous and clearly sub-critical values of the effective reproductive number. In Figs. S11, S12, and S13 we show that this phenomenon is robust for different values of the external seeding.

Fig. 9: Linear growth of recoveries.

For each LTLA, we fit the evolution of the number of recovered individuals to a linear function during the period between the first two epidemic waves. The figure shows the probability distribution of the coefficients of determination (\(R^{2}\)) resulting from the linear fit. The number of recoveries is measured as the cumulative incidence minus the prevalence (\(R(t)=\sum_{t^{\prime} = 0}^{t}{\rm{inc}}(t^{\prime})-I(t)\)). The evolution of the number of recoveries is well-fitted by a linear function in the majority of LTLAs (\(R^{2} \approx 1\)). This linear growth is the one expected in our model for any external seeding when the prevalence is close to zero and \({\mathcal{R}}_{0} < 1\). However, this kind of linear growth would only be present at the critical point of the classic SIR model without external field. Those LTLAs in which the number of recoveries does not follow a linear function (\(R^{2} \approx 0\)) are those in which the activity between waves was null or very low (see Fig. S15). The inset shows one example of the evolution of the number of recoveries, together with a shaded area signaling the period in which the curve is well-fitted by a linear function. More examples are provided in Fig. S15.
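The quantity described in this caption can be computed directly from daily incidence and prevalence series. The following sketch builds the recovered series as cumulative incidence minus prevalence and scores an ordinary least-squares line by its coefficient of determination. The function names are hypothetical, and returning 0 for a constant series is a convention assumed here (the coefficient is undefined in that case).

```python
def recovered_series(incidence, prevalence):
    """R(t) = cumulative incidence up to t minus the prevalence at t."""
    cumulative, out = 0, []
    for inc, prev in zip(incidence, prevalence):
        cumulative += inc
        out.append(cumulative - prev)
    return out

def linear_r2(y):
    """Coefficient of determination of a least-squares line fitted to y against t = 0, 1, 2, ..."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = (n - 1) / 2.0
    mean_y = sum(y) / n
    stt = sum((t - mean_t) ** 2 for t in range(n))
    syy = sum((v - mean_y) ** 2 for v in y)
    sty = sum((t - mean_t) * (v - mean_y) for t, v in enumerate(y))
    if syy == 0:
        return 0.0  # assumed convention for a flat series (no activity)
    return sty * sty / (stt * syy)

if __name__ == "__main__":
    incidence = [3, 1, 2, 2, 2, 2]       # toy daily counts
    prevalence = [3, 2, 1, 1, 1, 1]
    recovered = recovered_series(incidence, prevalence)
    print(recovered, linear_r2(recovered))
```

For a simple linear regression, the squared correlation between t and R(t) equals the coefficient of determination, which is what `linear_r2` returns.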

Fig. 10: Critical values for the effective reproductive number.

Measures of the effective reproductive number. All measures of the time-varying reproduction number shown are computed with the method explained in ref. 60 and its associated package. In (a), we show the effective reproductive number estimated from the incidence in England during the period between the first and second COVID-19 waves. In (b) and (c), we show measures of the effective reproductive number over one realization of our model, as described in the results section for the case of independent sub-populations with V = 32 and N = 8000, thus representing the average size and population of an LTLA. In both (b) and (c), the initial condition is I(t = 0) = 185, S(0) = V × N − 185 and \({\mathcal{R}}_{0}=0.8\), thus resembling the epidemic state of the LTLA shown in Fig. 7 in March 2020 as measured from the real data. In (b), there is an external field (h = 0.5), and we recover the near-critical and fluctuating values of the effective reproductive number observed in real data [that is, in (a)]. In (c), the external seeding is switched off (h = 0), and the values of the effective reproductive number are more monotonous and clearly sub-critical.

Conclusions

We have proposed and studied the addition of a small external field to SIR dynamics on a meta-population system. This field accounts for the important rate of infectious or latent individuals who go undetected by surveillance systems and who can arrive from other populations or even reside in the local one. We show that small external fields are not noticeable in the usual estimates of the basic reproductive number, yet they can have noticeable effects at the global scale. Our findings are general and not restricted to a specific disease. However, they are especially well-suited to the COVID-19 situation, in which non-vaccinated regions could act as reservoirs of undetected infections at low-yet-constant rates.

Our main result is that a small external seeding can cause epidemic endemic states over an extended parametric region. This phenomenon has relevant consequences: 1) Even if the pharmaceutical and non-pharmaceutical response to an epidemic crisis can ensure that the transmissibility becomes sub-critical, it cannot be guaranteed that the disease fades out. The spreading survives in a low-prevalence, yet uninterrupted, epidemic state. The danger of these persistent states is that the system is highly susceptible to generating new exponential outbreaks as soon as control measures are lifted or new variants emerge. 2) For super-critical scenarios, we also show that herd immunity in all sub-populations does not imply an extinction of the epidemic at the global level. This fact echoes the results of ref. 46, which showed that rats acting as a reservoir of bubonic plague undermine the concept of herd immunity even if the full population is vaccinated. In our case, a reservoir species is not necessary, since humans from other populations themselves act as the reservoirs. Thus, we join a recent current of works claiming that the whole notion of herd immunity must be revisited15,55.

The framework of the backward Kolmogorov equations, used to compute fixation times, allowed us to check our numerical findings analytically and obtain scaling relations. Moreover, it shows that this phenomenon is not linked to a sharp transition around a tipping point. The mapping of the SIR model to a two-state system conceptually amounts to a coarse-graining of the local dynamics. This strategy could be further exploited in the future in order to deal with the complex local-global relation inherent to any meta-population structure.

This work is especially pertinent as the current literature is struggling to find explanations for the criticality signatures found in the COVID-19 spread (uninterrupted-yet-small prevalence, linear growth of the recoveries, high susceptibility to changes in mobility restrictions and social distancing, etc.). Our model is capable of reproducing the persistence of the COVID-19 disease between waves in the census areas of England. We can also explain empirical features such as the exponential distribution of the time between outbreaks, the linear growth of the recoveries and the near-critical values of the effective reproductive number. These results are remarkable given the simplicity of our assumptions and the lack of fine-tuning. Our model is not equipped with the explicit time dependence needed to capture the emergence of new macroscopic prevalence peaks (which we link to the relaxation of restriction measures and the emergence of new variants). Although it is possible to develop a multi-strain version, we kept the model simple in this work for the sake of analytical tractability.

Methods

Simulations

The duration of global outbreaks is a random variable, and we study its average value 〈τG〉. Our first approach to examine the behavior of 〈τG〉 was to compute it from direct simulations. Since we are interested in the anomalous epidemic fade-out after the first macroscopic wave, our simulations start from the initial condition defined in Eq. (9). Then, at some stochastic time t1, the external seeding generates one infected seed in the sub-population labeled i. This event starts both a local outbreak in sub-population i and a global outbreak. At a later time t2 > t1, the total prevalence will be zero for the first time after t1. We stop the simulation at t2 and sample one value of the total duration of global outbreaks as τG = t2 − t1. Let us remark that the local and global events differ, since the external seeding could activate multiple sub-populations before t2. By repeating this experiment many times, one can access the ensemble average 〈τG〉. A technical difficulty arises since, as we will see, the values of τG can be too large to be accessed with simulations. Hence, in order to keep the computational cost affordable, we set an upper time limit \(t_{\max}\) after which we stop the simulation regardless of whether the global outbreak has vanished or not. Therefore, the maximum value that one can sample for τG with this approach is \(t_{\max}\).
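The sampling procedure can be sketched with a Gillespie-type simulation of independent SIR sub-populations. Several details below are assumptions made for this example only: the initial condition is taken as fully susceptible sub-populations (Eq. (9) is not reproduced here), the sub-population receiving a seed is chosen uniformly (valid because all have the same size N), the cap is applied to the sampled duration, and the function names and toy parameters are hypothetical.

```python
import random

def global_outbreak_duration(V, N, beta, mu, h, t_max, rng):
    """Sample one global-outbreak duration tau_G = t2 - t1, capped at t_max."""
    if h <= 0:
        raise ValueError("this sketch needs a positive seeding rate h")
    # ASSUMED initial condition: all individuals susceptible (stand-in for Eq. (9)).
    S, I, R = [N] * V, [0] * V, [0] * V
    t, t1 = 0.0, None
    while True:
        infection = [beta * S[i] * I[i] / N for i in range(V)]
        rate_inf, rate_rec = sum(infection), mu * sum(I)
        total = h + rate_inf + rate_rec
        t += rng.expovariate(total)
        if t1 is not None and t - t1 > t_max:
            return t_max
        u = rng.random() * total
        if u < h:
            # External seeding: a uniformly chosen individual is replaced by an infected one.
            i = rng.randrange(V)
            state = rng.choices(["S", "I", "R"], weights=[S[i], I[i], R[i]])[0]
            if state == "S":
                S[i] -= 1
                I[i] += 1
            elif state == "R":
                R[i] -= 1
                I[i] += 1
            if t1 is None:
                t1 = t  # start of the global outbreak
        elif u < h + rate_inf:
            i = rng.choices(range(V), weights=infection)[0]
            S[i] -= 1
            I[i] += 1
        elif sum(I) > 0:
            i = rng.choices(range(V), weights=I)[0]
            I[i] -= 1
            R[i] += 1
            if sum(I) == 0:
                return t - t1  # total prevalence is zero for the first time after t1

def mean_duration(n_runs, seed=0, **params):
    """Ensemble average of tau_G over n_runs independent realizations."""
    rng = random.Random(seed)
    return sum(global_outbreak_duration(rng=rng, **params) for _ in range(n_runs)) / n_runs

if __name__ == "__main__":
    # Toy parameters (R0 = beta / mu = 0.8), far smaller than the systems in the text.
    print(mean_duration(200, V=4, N=50, beta=0.32, mu=0.4, h=0.05, t_max=1000.0))
```

Each step recomputes all rates, which costs O(V) per event; this is adequate for a demonstration but would need a more efficient event selection for V = 400 and N = 10^5.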

Data analysis

Disease prevalence (number of infected individuals per day) is computed from the empirical incidence by assigning to each new case an incubation period and an infectious period. The first is sampled from a log-normal distribution with a mean incubation period of 5.2 days, parameterized as in ref. 56. The infectious period follows an exponential distribution with a mean of 2.3 days, chosen as in ref. 41. Since incidence data are available only weekly, we uniformly distribute cases over the days of the week to facilitate the analysis and the comparison with the simulation results. A sensitivity analysis demonstrating the robustness of our empirical results is presented in the figures of Supplementary Note S10.
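A minimal sketch of this reconstruction is given below. Several choices are assumptions and not taken from the analysis: the log-scale standard deviation of the incubation period (`sigma_log = 0.4`) is a placeholder, since only the mean of 5.2 days is stated here; each case is placed on a uniformly random day of its week; and a case is counted as infected from that day until the end of its incubation plus infectious period. The function names are hypothetical.

```python
import math
import random

def daily_incidence(weekly_cases, rng):
    """Spread each weekly count over the seven days of its week, uniformly at random."""
    daily = [0] * (7 * len(weekly_cases))
    for week, cases in enumerate(weekly_cases):
        for _ in range(cases):
            daily[7 * week + rng.randrange(7)] += 1
    return daily

def prevalence_from_incidence(daily, rng, mean_incubation=5.2,
                              sigma_log=0.4, mean_infectious=2.3):
    """Daily prevalence obtained by giving every case a sampled infected interval.

    sigma_log is an ASSUMED placeholder; mu_log is chosen so that the
    log-normal mean equals mean_incubation.
    """
    mu_log = math.log(mean_incubation) - 0.5 * sigma_log ** 2
    horizon = len(daily)
    prevalence = [0] * horizon
    for day, cases in enumerate(daily):
        for _ in range(cases):
            incubation = rng.lognormvariate(mu_log, sigma_log)
            infectious = rng.expovariate(1.0 / mean_infectious)
            # ASSUMPTION: the case counts as infected over both periods, starting on its case day.
            last = min(horizon - 1, day + int(incubation + infectious))
            for d in range(day, last + 1):
                prevalence[d] += 1
    return prevalence

if __name__ == "__main__":
    rng = random.Random(0)
    daily = daily_incidence([7, 0, 14], rng)   # toy weekly counts
    print(daily)
    print(prevalence_from_incidence(daily, rng))
```

Counting a case only during its infectious period, after the incubation delay, is an equally simple variant: it changes the start of the inner loop and nothing else.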