Introduction

The COVID-19 pandemic1,2 has led to more than 500 million infected3 and more than 6 million death3 worldwide until the beginning of Mai 2022. During the course of the pandemic the SARS-CoV-2 virus has mutated into various different strains4,5, some of which have led to an increased infection rate6,7,8 as compared to the original strain2 (Wuhan 2019). Examples are the variants B.1.1.7 and B.1.351, which have driven a strong rise of infection numbers in the United Kingdom and South Africa9,10 in late 202011,12 and the P.1 mutation which has induced an infection wave in Brazil13 in early 2021.

The availability and ongoing vaccine production gives hope to slowly gain control of the disease14,15,16. However, before herd immunity (if at all achievable) is finally reached worldwide it will take many month or even years, which the virus will exploit to mutate into a range of new strains. Thus, at the timescale of months or years a race is looming ahead between the occurrence of new mutations and the adaption and mass-production of existing vaccines to get these mutations under control. In particular, this makes it questionable if the present (and future) vaccination programs are sufficiently effective to ultimately get diseases like COVID-19 under control. It is therefore important to understand possible mutation-induced long-time disease-evolution scenarios e.g. in view of the ongoing discussions regarding the release of patents to accelerate worldwide vaccination but also regarding requirement of measures like social distancing once the infection numbers show a downwards trend.

Notably, historical examples to assess possible long-time consequences of mutation cascades are scarce, since particularly severe mutations have traditionally led to a rapid death of infected individuals eliminating these mutations. Thanks to modern medical treatment based e.g. on extracorporeal membrane oxygenation support or artificial aspiration, however, such a self-elimination of severe mutations is largely absent. Notably, besides the positive effect of immediately saving many lives, these treatments also have the side effect of inducing a potentially disastrous self-amplification of mutations and infection-rates. Here we are interested in particular in the effects of mutations on the spreading of an infectious disease in phases where (i) mutations can serve as seeds for further mutations some of which are even more infectious than the strain from which they have emerged and (ii) mutation rates are either constant or higher when infection numbers are high. Both factors together can generally lead to a positive feedback loop between infection numbers and mutations suggesting severe long-time mutation-induced effects for the disease evolution. Actual data for COVID-19 mutations show, in fact, early signatures supporting such a possible self-amplification scenario during some phases of the disease: They reveal an initial constant and a subsequent nonlinear growth of the relevant infection rate (Fig. 1a). For future pandemics, it would be highly important to understand the possible long-time consequences of such a self-amplification mechanism and how fast vaccination has to progress worldwide in order to suppress the most dramatic ones. However, following the scarcity of useful historical examples illustrating the possible consequences, we have to rely on models to explore the possible impact of mutations on the long-time evolution of the disease dynamics, in particular also in the presence of vaccination and other actions counteracting the self-amplification mechanism. To provide a concrete starting point for such an exploration, in the present work, we develop a statistical minimal model to predict possible mutation-induced effects for the long-time evolution of infectious diseases like COVID-19. We first develop a stochastic multi-strain generalization of the popular susceptible-infected-recovered (SIR) model to account for the random occurrence of mutations and then use the coarse-graining concept of statistical physics to derive an effective mean-field model enabling general predictions of the most likely scenarios for a given scenario (characterized by parameters such as the mutation and the vaccination rate). See Fig. 1b for an schematic illustration of our approach.

Figure 1
figure 1

The statistics of mutation formation and its impact on the course of an epidemic. (a) Reproduction number for different COVID-19 mutations as a function of the their emergence time (for details see “Methods”). Green dashed line shows a linear fit to initial constant growth. (b) For a constant mutation rate (middle panel, green background) an ensemble average of the multi component description with many infection strains \(I_n(t)\) leads to multiple infection waves of the global infection number I(t). Beyond the constant mutation rate, if the mutation rate is coupled to the infection number (right panel, red background), the ensemble average produces a hidden singularity, as manifested by the giant infection wave. Only one representative realization of the multi component description is shown in the blue frames. The left panel (yellow background) indicates the different levels of description starting from multi components leading to an effective mean field by coarse graining.

One generic prediction of our model is that mutations induce an explosive super-exponential growth of the infection numbers rather than the ordinary and much discussed normal exponential growth, in phases where the population is far away from herd immunity. At later phases, when a population comes close enough to herd-immunity that the reproduction number drops below one (\(R<1\)) and infection numbers subsequently decrease to a very low level, mutations can raise the reproduction number to \(R>1\) inducing a new infection wave, which is followed by a whole train of further waves. This scenario occurs even for a constant mutation rate (Fig. 1b). If the mutation rate increases with the number of infections, as generally expected and discussed above, their effect is even more dramatic: then, mutations occur at a self-accelerating pace and continuously prevent the population from reaching herd-immunity by persistently enhancing the effective reproduction number of the disease. As a result the infection dynamics approaches a hidden singularity and displays signatures of a critical dynamics. That is, infection numbers grow extremely fast, giving a giant infection wave, such that the majority of the population is infected at the same time (see the values on the vertical axis in Fig. 1b), which would massively overstrain any existing medical system. Finally, in phases where vaccination of the population takes place and is sufficiently effective to suppress the hidden singularity and hence the explosive self-acceleration of infection numbers, our model predicts the possibility of mutation-induced infection wave trains, as in the case of constant vaccination, illustrating once more the possible dramatic consequences following from the fact that herd-immunity is not necessarily a permanent state in the presence of mutations. To see how these predictions come about, let us now discuss our general modelling approach in detail. Based on this approach we will then discuss our results for a constant mutation rate model and a model that goes beyond a constant mutation rate.

Model

To describe the impact of mutations on the infection dynamics within a simple statistical framework, we first generalize the popular susceptible-infected-recovered (SIR) model17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37, which has been intensively explored in the context of the COVID-19 pandemic38,39,40,41,42,43,44,45,46,47,48,49. While some recent works have generalized this model to account for two different infectious strains50,51, here we allow for the continuous emergence of new strains with a rate \(\nu \), which in general depends on the present infection number. Denoting the fraction of susceptible and recovered individuals with S and R respectively and the fraction of individuals which is infected with strain n as \(I_n\), this leads use to the following dynamical equations:

$$\begin{aligned} \dot{S}&=-\sum _n \beta _n S I_n, \end{aligned}$$
(1)
$$\begin{aligned} \dot{I}_n&=\beta _n S I_n-\gamma I_n, \end{aligned}$$
(2)
$$\begin{aligned} \dot{R}&=\sum _n\gamma I_n. \end{aligned}$$
(3)

Here, \(\gamma \) is the inverse of the average disease duration, i.e. the recovery/death rate and \(\beta _n\) is the infection rate of strain n, which we randomly choose from a certain characteristic distribution. As an initial state, we assume that initially (time \(t=0\)) we have only a single infectious strain with a low positive infection number such that only \(I_0\gtrsim 0\) whereas \(I_{n\ne 0}=0\) for \(n=1,2..\).

To allow predicting the average (or most likely) result of the infection dynamics we now coarse grain this model, essentially by averaging over many strains and disease-realizations (see “Methods” section for technical details), which leads to the following effective model:

$$\begin{aligned} \dot{S}&=-\beta (t,I) S I, \end{aligned}$$
(4)
$$\begin{aligned} \dot{I}&=\beta (t,I) S I-\gamma I, \end{aligned}$$
(5)
$$\begin{aligned} \dot{R}&=\gamma I, \end{aligned}$$
(6)

Here, I is the overall infection number (all strains together) and \(\beta (t,I)\) is the average infection rate, which can depend on the overall infection number, depending on the underlying mutation statistics (see “Methods”).

Results

Let us now explore the impact of mutations on the disease evolution by comparing numerical simulations of the multi-component model with analytical predictions based on the mean-field model (Fig. 1b). To allow distinguishing between direct effects of mutations from the prevailing (and most infectious) strain and indirect effects due to the self-accelerating mutation cascade which we have described in the introduction and which may or may not become effective in reality, depending on the actual mutation rate and other parameters, we will sequentially follow on the cases of (i) a constant mutation rate \(\mu \) leading to the emergence of new strains in our simulations with a constant rate and (ii) a mutation rate which depends on the present infection number \(\mu (I)\).

Constant mutation rate

Mutation-driven infection dynamics

Let us assume that the infection rate \(\beta _n\) of a newly occurring virus-strain is randomly selected from a normal distribution p with standard deviation \(\sigma \) centered around the infection rate of the presently prevailing strain:

$$\begin{aligned} p(\beta _n)= \frac{1}{\sqrt{2\pi }\sigma } \mathrm {Exp} \left (-\frac{(\beta _n-\beta _{\mathrm {max},n-1})^2}{2\sigma ^2} \right), \end{aligned}$$
(7)

Here \(\beta _{\mathrm {max},n-1}\) denotes the largest infection rate of all currently existing strains. This distribution, the average of which moves towards higher values in the course of a disease, is motivated by the fact that newly occurring mutations typically become visible only if they have a higher (or at least not much lower) infection rate than the currently prevailing ones. Coarse graining this mutation statistic (see “Methods”: “Details on model beyond constant mutation rate”) yields an the following average infection rate for our mean-field model

$$\begin{aligned} \beta = \beta _0 + \mu t \end{aligned}$$
(8)

where \(\mu \) is the constant mutation rate and \(\beta _0\) is the initial infection rate. That is, coarse graining the distribution (7) leads to a constant increase of the infection rate with time.

Let us first explore the disease evolution at early times when the majority of the population is susceptible, such that \(S(t) \approx 1\). Then Eq. (5) reduces to

$$\begin{aligned} \dot{I}&=\beta (I,t) I-\gamma I. \end{aligned}$$
(9)

Now using Eq. (8) we find

$$\begin{aligned} I(t)= I_0 \mathrm {exp}\left[ \frac{1}{2} t (2\beta _0 -2 \gamma +t \mu ) \right] . \end{aligned}$$
(10)

Thus, the fraction of infected individuals does not grow exponentially as in the standard SIR model but even faster. Following Eq. (10) if the mutation rate is high enough that \(t\mu \gg 2(\beta _0-\gamma )\) long before herd-immunity is reached, the infection dynamics generically converges towards \(I(t)\propto e^{\mu t^2}\), which is completely mutation-driven. To test this prediction, we now numerically solve the full multi-component model and show the overall \(I(t)=\sum _n I_n(t)\) in Fig. 2a. Notably, the result is close to the analytical prediction of the mean-field model and shows an even slightly larger growth.

Figure 2
figure 2

Power law dependence of infection dynamics, phase diagram and state classification for constant mutation rate. (a) Fraction of infected people I at short times t for the coarse grained MSIR, multi component MSIR and the early time approximation Eq. (10). (\(I_0=10^{-5}\), \(R_0=1\), \(\mu /\beta _0^2=0.05\)). (b) Long time wave pattern of fraction of infected people for coarse grained MSIR and multi component MSIR approach. (\(I_0=10^{-5}\), \(R_0=1\), \(\mu /\beta _0^2=0.2\)). (c) Scaling of maxima of infections for coarse grained MSIR and multi component MSIR approach. (\(I_0=10^{-5}\), \(R_0=1\), \(\mu /\beta _0^2=0.2\)). (d) Phase diagram for our coarse grained infection dynamics showing the occurrence of four different courses of the pandemic for varying mutation rate and reproduction number (\(I_0=2\times 10^{-4}\)). (e) Different courses of infections during an epidemic: lethargic, multiple waves, super exponential, and rebound.

Clearly, the predicted (super)exponential growth of the infection numbers can not continue forever but has to saturate once the population reaches herd-immunity either by collectively going through the infection or through vaccination. Once herd-immunity is reached, the infection numbers are normally expected to monotonously decrease, as predicted by the standard SIR model. However, numerical solutions of our mean-field model show that after a phase where the population recovers and infection numbers decay to a very low level, they can rapidly grow again (Fig. 2b). This sequence of decreasing and increasing infection numbers can even repeat for many times, leading to an infection wave-train. The maxima of the wave-trains follow a scaling law of \(I_{\mathrm {max}}\sim 1/(\beta _0+\mu t_{\mathrm {max}})^2\) (see SI for derivation), which is shown in Fig. 2c. The prediction of wave trains is also confirmed by numerical solutions of the multi-component model, but somewhat weakened, because the individual strains can show waves occurring at individual “frequencies”. Let us now ask about the mechanism leading to these infection waves. They are induced by the nonlinear coupling of the infected and susceptible. First, the number of infected people grows, when the term \(\beta S\) in Eq. (5) is large enough. At a certain point, the number of susceptible is too small and the saturation effect from the recovery rate \(\gamma \) in Eq. (5) takes over such that the number of infected decreases. However, the infection rate \(\beta \) continues growing with time, such that \(\beta S\) can become large enough to induce a second wave. This feedback continues on multiple times giving rise to the oscillatory behavior shown in Fig. 2b. One a more intuitive level, these considerations show that in the presence of mutations herd-immunity is not necessarily a persistent state of a population and that strongly decreasing infection numbers are not an overall reliable sign that the population has overcome the disease. From a sociopolitical viewpoint each growth phase within such wave trains might evoke (nonpharmaceutical) interventions, creating an immense mental burden on the population.

Phase diagram

To see how strong measures have to be taken to prevent such an infection wave train (or a super-exponential growth) in the first place, we now systematically vary the parameters in the model to create a state diagram providing a systematic overview on the possible scenarios. It turns out that there are three dimensionless control parameters in our system (see SI), one of which is the initial infection number and the other two ones are the effective reproduction number \(S_0\beta _0/\gamma \) and a dimensionless mutation rate \(\mu /\beta _0^2\). Varying both parameters systematically and solving the mean-field mutation susceptible-infected-recovered (MSIR) model for each parameter combination we obtain the phase diagram shown in Fig.  2e, which shows four qualitatively different epidemic courses: a lethargic phase, which is characterized by an exponential decay, multiple waves, super exponential wave, and a rebound, with an initial local minimum and a proceeding super exponential increase. The occurrence of these states in parameter space is summarized in Fig. 2d. At large reproduction number the dynamics is “super exponential” (green domain) for any positive mutation rate. When decreasing the reproduction number, depending on the mutation rate, one reaches the regime of “multiple waves” which we have previously discussed (pink) or a “rebound” phase (blue) where infection numbers initially decrease, pass a minimum and then increase to reach a single maximum before finally decreasing (Fig. 2e). For even lower mutation rates, the population is in the “lethargic” regime, where the infection numbers monotonously decrease.

Nonpharmaceutical interventions, vaccination, and immune escape

In practice the goal is of course be to apply appropriate measures to safely reach the lethargic regime in Fig. 2d and not to end up in the multiple wave or rebound regime where the evolution of infection numbers show a promising initial trend but a severe evolution at later times. To understand the impact of nonpharmaceutical interventions, we reduce the reproduction number38,39,40,42,52,53, from \(R_0\) to a reduced reproduction number including measures \(R_{0,\mathrm {int}}\) at a time \(t_{\mathrm {int}}\) in our simulations and numerical solutions of the MSIR model. Starting in the super exponential wave regime (\(R_0=1.5\)) we apply interventions during the rise of a first wave (see Fig. 3a). By including weak measures (\(R_{0,\mathrm {int}}=1.3\)), the maximum of infections decreases as intended, however, the wave needs a longer time to decay, implying a longer period of restrictions for the public. Lowering the reproduction number to \(R_{0,\mathrm {int}}=0.9\), results in the appearance of a second and third intervention-mutation induced wave. These waves are enabled by the increased infection rate and the fact that due to the interventions there are more susceptible at a later point in time, where they can facilitate the growth of infections. Here, the situation would be particularly confounding to the public, since it was subjected to measures to decrease the number of infections in the first place; however, this results in more waves and a likely extension of the period of interventions. On the other hand, a strong reduction of the reproduction number (\(R_{0,\mathrm {int}}=0.7\)) gives a fast decay of infections, as intended. Of course the situation changes when we start from an infection dynamics with multiple waves (\(R_0=1\)) and apply measures during the rise of the second wave (see Fig. 3b). As expected, strong measures (\(R_{0,\mathrm {int}}=0.8\)) have the intended effect of eliminating the epidemic. On the other hand, if the measures are slightly weaker (\(R_{0,\mathrm {int}}=0.85\)), the infections first decrease, but then lead to a second intervention-mutation induced delayed wave, which is stronger in magnitude than the first wave. Again, this wave is induced by the growing infection rate and the enhanced number of susceptible individuals due to interventions. To decision makers and the public this type of wave could likely appear as unexpected. However, note that at least the overall number of recovered people at the end of our the epidemic is decreasing with stronger interventions, meaning that if only the cumulative number of infections is considered every reduction of \(R_0\) is useful.

Figure 3
figure 3

Nonpharmaceutical interventions, vaccinating, and immune escape. (a) Infections as function of time for a super exponential wave. Measures are taken by reducing the reproduction number to \(R_{0,\mathrm {int}}\) at time \(t_{\mathrm {int}}\). Inset: zoom in to the early time regime (black line has no interventions, \(R_0=1.5\), \(\mu /\beta _0^2=0.004\), \(t_{\mathrm {int}}=10.5\) weeks). (b) Infections as function of time for a wave like pandemic course. Measures are taken by reducing the reproduction number to \(R_{0,\mathrm {int}}\) at time \(t_{\mathrm {int}}\) (black line has no interventions, \(R_0=1\), \(\mu /\beta _0^2=10^{-3}\), \(t_{\mathrm {int}}=210\) weeks). (c) Infections as function of time of different vaccination rates \(\alpha \) (black line has no vaccination, \(R_0=1\), \(\mu /\beta _0^2=10^{-3}\), \(I_0=2 \times 10^{-4}\)). (d) Infections as function of time of different immune escape rates \(\omega \) (black line has no immune escape, \(R_0=1.2\), \(\mu /\beta _0^2=2\), \(I_0=2 \times 10^{-4}\)).

Specifically for COVID-19 vaccines have become available and their continuous production gives hope to get the disease under control. However, one and half a years after vaccines have first become available only about 67% of the worldwide population has been fully vaccinated (early Mai 2022), leaving much time for the emergence of highly infectious mutations. Vaccinations effectively reduce the number of susceptible in the SIR model14, such that we modify Eq. (4) as

$$\begin{aligned} \dot{S}&=-\beta (I,t) S I - \alpha , \end{aligned}$$
(11)

with a vaccination rate \(\alpha \). Note that a more realistic model for vaccination would also account for a time dependent roll out. We investigate the effect of vaccination on the infection dynamics for a situation leading to multiple waves (see Fig. 3c). Following Fig. 3c vaccinations have a clear effect; they have to be applied fast enough to significantly reduce the number of infections and temper the train of waves. This further shows the importance of manufacturing and distributing vaccinations as fast as possible.

When newly mutated strains arise the possibility of an immune escape increases, where a recovered individual does not stay immune against the new strain. Let us therefore now include a term which accounts for an effective immune escape in our model in the form of a transition from recovered to susceptible. Explicitly we modify Eqs. (4)–(6) to read

$$\begin{aligned} \dot{S}&=-\beta (t,I) S I + \omega R, \end{aligned}$$
(12)
$$\begin{aligned} \dot{I}&=\beta (t,I) S I-\gamma I, \end{aligned}$$
(13)
$$\begin{aligned} \dot{R}&=\gamma I - \omega R, \end{aligned}$$
(14)

where \(\omega \) is the immune escape rate. The transition rate between susceptible and recovered introduces two new effects. First, this term leads to a positive feedback loop between the number of susceptible and recovered individuals which induces infection waves (see Fig. 3d), which can also be seen from a linear stability analysis (see SI) of Eqs. (12)–(14). Second, the infected population fraction saturates to a nonzero steady state, due to a replenishment of susceptibles (see also SI).

Beyond constant mutation rate

As a second possible scenario, let us now assume that the mutation rate is coupled to the infection number, such that mutations are more likely in phases where the infection numbers are large. To account for this effect in our model, we assume that new (relevant) mutations occur with a probability of \(p_0 I_{n-1}\) from the most infectious strain where \(p_0\) is constant and also that the infection rates of new strains follow from a random walk with a mutation-induced bias as \(\beta _n= \beta _{n-1} + \Delta \beta \) (see “Methods” for details). Coarse graining the biased random walk yields a mean field infection rate which evolves as

$$\begin{aligned} \beta (t)= \int _0^t \lambda I(t') \mathrm {d} t', \end{aligned}$$
(15)

with a mutation rate \(\lambda \). Intuitively, this means that new mutations occur in our mean-field model with a rate which is proportional to the present infection number.

Mutation-induced dynamics

At early times, where \(S\approx 1\) we obtain again Eq. (9), which yields together with Eq. (15)

$$\begin{aligned} I(t)=-\frac{\delta _1}{\lambda \left[ 1+\cosh \left( 2 \ln \delta _2 +t \sqrt{\delta _1}\right) \right] }, \end{aligned}$$
(16)

where \(\delta _1\) and \(\delta _2\) are constants that are given in the SI. Importantly, the infections in Eq. (16) do not grow exponentially, but there is an explosive super exponential growth, which asymptotically has a scaling behavior following \(I_{\mathrm {sc}}(t) \sim 1/|t-t_c|^2\) with a critical time \(t_c\) (see SI for explicit expression). Crucially, this giant infection wave qualitatively differs from the comparatively mild super-exponential behaviour which we have encountered for the case of a constant mutation rate in that it leads to a much more extreme self-acceleration of the infection numbers. As a result, the infection numbers peak only at extremely high values where a large fraction of the population is infected at the same time (see Figs. 1b and 4a). Clearly, such an explosive growth would be interrupted at some point as the population approaches herd immunity. To quantify to which extend the predicted explosive growth would occur before herd-immunity causes significant deviations, we consider the expression \(\mathrm {min}(\mathrm {ln}(I_{\mathrm {sc}}) - \mathrm {ln}(I))\), which quantifies how closely the fraction of infections approaches the underlying (idealized) power law dependence in the presence of saturation effects. We find that for large mutation rates the power-law dependence is strong for any reproduction number (Fig. 4b) and weakens for lower mutation rates. Remarkably, the explosive growth depends only weakly on the reproduction which is the parameter that is controllable due to interventions. After the initial super-exponential increase of infections the saturation effects from people recovering induce a maximum and a succeeding decrease in infection numbers.

Figure 4
figure 4

Scaling law of short time infection dynamics, Phase diagram and state classification of approach beyond constant mutation rate. (a) Infections as function of reduced time \(|t-t_c|\) where \(t_c\) is the critical time at which the infections diverge (see SI for details). We show the scaling law, coarse grained MSIR and our multi component MSIR approach. (\(R_0=2\), \(\lambda /\beta _0^2 = 2\times 10^4\), \(I_0=10^{-4}\)). (b) Deviation of the fraction of infections at short times from our corse grained MSIR approach to the a \(1/|t-t_c|^2\) scaling by using \(\mathrm {min}(\mathrm {ln}(I_{\mathrm {sc}}) - \mathrm {ln}(I))\). Mutation rate and reproduction number are varied (\(I_0= 2\times 10^{-4}\)). (c) Phase diagram of our coarse grained MSIR approach showing the occurrence of four different courses of the pandemic for varying mutation rate and reproduction number (\(I_0= 2\times 10^{-4}\)). (d) Example plots of the infections as function of time for four different courses of the pandemic: lethargic, super exponential wave; rebound, and weak rebound.

Phase diagram

Depending on the basic reproduction number and the mutation rate the MSIR model predicts four distinct courses summarized in the phase diagram Fig. 4c, which has been obtained analytically (see SI). We find a lethargic regime characterized by an exponential decay (purple regime, Fig. 4d); an explosive regime (cyan regime); a rebound regime (dark yellow) where we have a minimum followed by a mutant induced super exponential increase; and weak rebound (red) leading to an infection maximum which is smaller than the initial fraction of infected individuals. Generally, the explosive (or super exponential) regime occurs for reproduction numbers \(R_0>1\) and any positive mutation rate, whereas the other three regimes occur for \(R_0<1\). For low mutation rate and \(R_0<1\) the epidemic is in the desired lethargic regime, increasing the mutation rate leads to a small region of weak rebound which then transitions to a rebound dynamics.

Nonpharmaceutical interventions, vaccination, and immune escape

To explore the efficiency of measures which effectively reduce the reproduction number, we again change the reproduction number \(R_0\) to a value of \(R_{0,\mathrm {int}}\) at time \(t_{\mathrm {int}}\). Now starting from the explosive (super exponential) regime (\(R_0=1.2\)) we decrease the reproduction number during the rise of the wave (Fig. 5). Strong measures yielding \(R_{0,\mathrm {int}}=0.7\) or \(R_{0,\mathrm {int}}=0.8\)) induce an immediate decay of the infection numbers as desired. However, weak measures, leading to \(R_{0,\mathrm {int}}=1\) only delay the occurrence of the infection maximum but hardly change the value of I at the peak. Hence, it is clear that measures need to be strong enough to have a significant effect, while weak measures only delay the infection number explosion.

Figure 5
figure 5

Nonpharmaceutical interventions, vaccinating, and immune escape. (a) Infections as function of time for a super exponential wave. Measures are taken by reducing the reproduction number to \(R_{0,\mathrm {int}}\) at time \(t_{\mathrm {int}}\) (black line has no intervention. \(R_0=1.2\), \(\lambda /\beta _0^2=50\), \(I_0=2\times 10^{-4}\), \(t_{\mathrm {int}}=3.85\) weeks). (b) Infections as function of time of different vaccination rates \(\alpha \) (black line has no vaccination. \(R_0=1.2\), \(\lambda /\beta _0^2=50\), \(I_0=2\times 10^{-4}\)). (c) Infections as function of time of different immune escape rates \(\omega \) (black line has no immune escape. \(R_0=1.2\), \(\lambda /\beta _0^2=2\), \(I_0=2\times 10^{-4}\)).

We finally ask how fast vaccination would have to progress in order to suppress a mutation-induced explosion of infection numbers or a mutation-induced rebound at initial reproduction numbers smaller than 1. Vaccinations effectively reduce the number of susceptibles by a vaccination rate \(\alpha \) (see also Eq. 11). If we start in the explosive (super exponential) regime, a high vaccination rate is needed to interrupt the rapid growth of the infection numbers (Fig. 5b): while small vaccination rates (\(\alpha /\beta _0=0.01\)) merely shift the infection maximum to a later time, while having little change in the number of infections, larger vaccination rates (\(\alpha /\beta _0=0.025\)) effectively suppress the explosion of infection numbers. Therefore, to prevent possible mutation-induced long-time consequences it is imperative to maximize vaccine production worldwide. Enhancing vaccine production is particularly important in countries with a weak healthcare system, which face the threat of being overloaded by COVID-19 cases54,55,56,57,58,59.

Newly mutated strains allow for an immune escape of the virus, effectively representing a transition of recovered individuals to susceptible ones with an immune escape rate \(\omega \) (see also Eqs. (12)–(14)). This leads to a continued recovery of the number of susceptibles and drives the population away from herd immunity, which in turn can cause new infection waves (see Fig. 5c), that can also be predicted using a linear stability analysis (see SI). Further, we find that the number of infected saturates to a nonzero steady state (see also SI).

Discussion

Inspired by the ongoing COVID-19 disease and the continued emergence of new mutations supplanting preceding ones, in the present work we have developed a stochastic multistrain generalization of the popular SIR model. Combining this model with coarse graining concepts from statistical physics has allowed us to predict a panorama of possible scenarios for the mutation-controlled evolution of infectious diseases.

In particular, our approach suggests that mutations can induce a super-exponential growth of infection numbers in populations which are highly susceptible to the disease (e.g. because they are far from reaching herd immunity). As compared to the standard exponential growth, interrupting such an super-exponential growth is much more difficult and requires stronger and stronger measures as the disease evolves. In practice, such a super-exponential growth may occur e.g. if measures are applied too late, or if vaccines suddenly become ineffective against mutations.

One particularly severe form of such an super-exponential growth can occur if the mutation rate of a virus is proportional to the current number of infections. For this case our model predicts a giant infection wave, which is based on a positive feedback loop between the mutation-rate and the infection number causing a massive-self acceleration of the latter resulting in a state where the majority of the population gets infected at the same time. Clearly, such a situation would not only massively overstress any existing health system but once in action it would hardly be interruptable through vaccination.

At later stages of an infectious disease, where the population approaches herd immunity and the infection numbers decrease, an obvious political reaction would be to release measures. However, our simulations suggest that mutations can drive new infection waves even after a longer downwards trend. Such waves can even self-repeat and lead to a pattern of repeated phases of strongly decaying and increasing infection numbers provoking an endless sequence of renewed non-pharmaceutical interventions.

Since our work is on a conceptual basis we did not include explicit data to model COVID-19. However, the panorama of mutation induced phenomena which we have identified might inspire detailed modeling works to test them for specific infectious diseases such as COVID-19. Further, our results could be applied to diseases in the animal world such as the avian influenza60. These results might also be useful for discussions regarding the importance of a release of vaccine-patents to reduce the risk of mutation-induced infection revivals and to coordinate the release of measures following a downwards trend of infection numbers.

Methods

Basic reproduction number of COVID-19 mutants

The basic reproduction numbers where extracted from: original variant61, B.1.1.7 (\(\alpha \))11, B.1.351 (\(\beta \))62, P.1 (\(\gamma \))13, B.1.617.2 (\(\delta \))63, and B.1.1.529 (o)64. The time point at which a variant has reached \(5\%\) in the sequenced genomes reported in4 (https://nextstrain.org/ncov/global) is used as the emergence time.

Details on constant mutation rate

We assume that the new infection rates \(\beta _n\) are drawn from a Gaussian distribution, whose mean is the largest current infection rate. Explicitly we have

$$\begin{aligned} p(\beta _n)= \frac{1}{\sqrt{2\pi }\sigma } \mathrm {Exp}(-\frac{(\beta _n-\beta _{\mathrm {max},n-1})^2}{2\sigma ^2}), \end{aligned}$$
(17)

where \(\beta _{\mathrm {max},n-1}\) denotes the maximal infection rate of the current strains, \(\beta _n\) is the infection rate of the newly mutated strain, \(\sigma \) is the standard deviation of the distribution, and new strains are produced at a rate m (in our multi component simulations we use \(\beta _0/\gamma =1\), \(\sigma =2\times 10^{-4}\), \(m/\gamma = 2\), and the new strain obtains an initial \(I_n(0)= 10^{-7}\)).

To coarse grain this mutation model, we assume that the infections immediately assume the maximal infection rate of the newly mutated strain \(\beta _{\mathrm {max},n}\). It follows that the mean infection rate is dominated by \(\beta _{\mathrm {max},n}\), since all other infections grow exponentially slower. This reduces our multicomponent model to an effective one component model with the infection rate \(\beta _{\mathrm {max},n}\). To determine \(\beta _{\mathrm {max},n}\) we compute

$$\begin{aligned} \int _{\mathrm {max}_n\beta _{n}}^{\infty } p(\beta _n) \mathrm {d}\beta _n = \frac{\gamma }{m}, \end{aligned}$$
(18)

where m is the number of times drawn from the distribution Eq. (17). Explicitly, Eq. (18) yields

$$\begin{aligned} \beta _{\mathrm {max},n}= \beta _{\mathrm {max},n-1} -\sqrt{2} \sigma \mathrm {erf}^{-1}(-1+\frac{2\gamma }{m}), \end{aligned}$$
(19)

where \(\mathrm {erf}^{-1}(*)\) is the inverse error function and can here be approximated by a negative constant \(-C_1\). We now write the standard deviation as \(\sigma = \mu ^* \tau \), with a mutation rate \(\mu ^*\) and mutation timescale \(\tau \), giving

$$\begin{aligned} \frac{\beta _{\mathrm {max},n}- \beta _{\mathrm {max},n-1}}{\tau }= C_1 \sqrt{2} \mu ^*, \end{aligned}$$
(20)

which is a discretized version of \(\dot{\beta }= \mu \), and equivalent to our coarse grained constant mutation rate model.

Details on model beyond constant mutation rate

For the model beyond constant mutation rate the infection rates perform a biased random walk. Given an infection rate \(\beta _{n}\) it will mutate with a probability \(p_0\) and not mutate with probability \(1-p_0\). Furthermore, this strain has \(I_n\) infections, which are all able to mutate, giving a total mutation probability of \(1-(1-p_0)^{I_n}= p_0 I_n +\mathcal {O}(p_0^2)\). A mutation gives a new strain \(I_n\) with an increased \(\beta _n= \beta _{n-1}+ \Delta \beta \) (in our multi component simulations we use \(\beta _0=0.1\), \(\gamma =0.1\), \(\Delta \beta =0.03\), \(p_0=2\times 10^{-4}\), and the new strain obtains an initial \(I_n(0)= 10^{-6}\)).

To coarse grain, we assume that the expectation value of the infection rate \(\langle \beta _n \rangle \) is proportional to the mutation probability. Then a new mutation has the expectation value \(\langle \beta _{n+1} \rangle = p_0 I_n\) and the old strain has \(\langle \beta _{n} \rangle = 1- p_0 I_n\). Computing the difference gives

$$\begin{aligned} \langle \beta _{n+1} \rangle - \langle \beta _{n} \rangle = 2 p_0 I_n -1, \end{aligned}$$
(21)

which is a discretization of \(\dot{\beta }= \lambda I\), which is our coarse grained model beyond a constant mutation rate.