“Every day sadder and sadder news of its increase. In the City died this week 7496; and of them, 6102 of the plague. But it is feared that the true number of the dead this week is near 10,000 ....” —Samuel Pepys, 1665
In the next few articles, we will discuss mathematical and statistical models that are commonly used to study the spread of infectious diseases. Such models are used to inform decisions on disease prevention, surveillance, control and treatment and can be applied to new epidemics, such as the ongoing COVID-19 outbreak.
During a so-called ‘virgin epidemic’, we initially have a population of N individuals, all at risk for a disease, who form the susceptible group. If an individual with a transmissible pathogen is introduced into this population, over time some of the susceptible individuals will become infected and become part of the infectious group, whose members will contribute to onward transmission. Depending on the pathogen, individuals may recover and acquire immunity, which can be for life (for example, in the case of measles) or transient (for example, in the case of influenza).
The simplest model for the spread of an infection is the SIR model^{1,2}, which tracks the fraction of a population in each of three groups: susceptible, infectious and recovered (Fig. 1a). The sizes of these groups are functions of time t; we will write them as S, I and R, with dependence on t implied. During a virgin epidemic, we often assume that spread is so rapid that we can ignore any change in the population due to births and deaths. This is a so-called ‘closed epidemic’ with a fixed population size S + I + R = N. The SIR model has no probabilistic component, except for the assumption that the population can mix at random and is large enough that predictions based on average rates can be used.
The model is initialized with the entire population in the susceptible group except for a single infectious individual: I(0) = 1, S(0) = N – 1, R(0) = 0. In each unit of time (for example, a day), an infected individual comes into contact with k other individuals in the population. On average, per unit time they will come into contact with kS/N susceptible individuals and infect kπS/N of them, where π is the probability of infection on contact. Conventionally, k and π are combined into a transmission rate, β = πk, which is the average rate at which an infected individual can infect a susceptible. Infected individuals recover at a constant rate γ, and 1/γ is the infectious period (or average recovery time). The rates of change of each group can be written as a set of three coupled differential equations:

\[\frac{dS}{dt} = -\frac{\beta IS}{N}, \qquad \frac{dI}{dt} = \frac{\beta IS}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I\]
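The rates of change described above can be integrated numerically in a few lines. The following is a minimal sketch, not the authors' implementation: it uses a hand-written fixed-step fourth-order Runge-Kutta scheme, and the function names, the step size and the population size N = 10,000 are assumptions chosen for illustration. The values γ = 1/14 per day and β = 2γ are those used for Fig. 1b.

```python
def sir_rates(state, beta, gamma, n):
    """Rates of change (dS/dt, dI/dt, dR/dt) for the closed-epidemic SIR model."""
    s, i, r = state
    new_infections = beta * i * s / n  # each infected individual infects beta*S/N per unit time
    recoveries = gamma * i             # infected individuals recover at rate gamma
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, dt, beta, gamma, n):
    """Advance (S, I, R) by one time step dt with the classical RK4 scheme."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = sir_rates(state, beta, gamma, n)
    k2 = sir_rates(shifted(state, k1, dt / 2), beta, gamma, n)
    k3 = sir_rates(shifted(state, k2, dt / 2), beta, gamma, n)
    k4 = sir_rates(shifted(state, k3, dt), beta, gamma, n)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(beta, gamma, n, days, dt=0.1):
    """Return a list of (t, S, I, R) rows, starting from one infectious individual."""
    state = (n - 1.0, 1.0, 0.0)  # S(0) = N - 1, I(0) = 1, R(0) = 0
    trajectory = [(0.0, *state)]
    for step in range(1, int(round(days / dt)) + 1):
        state = rk4_step(state, dt, beta, gamma, n)
        trajectory.append((step * dt, *state))
    return trajectory


gamma = 1 / 14     # recovery rate for a 14-day infectious period
beta = 2 * gamma   # transmission rate giving beta/gamma = 2
n = 10_000         # assumed population size (illustrative only)

trajectory = simulate_sir(beta, gamma, n, days=500)
t_peak, _, i_peak, _ = max(trajectory, key=lambda row: row[2])
print(f"peak infectious fraction: {i_peak / n:.1%} on day {t_peak:.0f}")
print(f"cumulative epidemic size: {trajectory[-1][3] / n:.1%}")
```

The timing of the peak depends on the initial infectious fraction 1/N, so the day printed here need not match the figures; the peak height and cumulative size are far less sensitive to that choice.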
The equations are typically solved numerically^{2}. For now, there are several key observations based on their forms. An outbreak will take off if initially dI/dt > 0, which implies N/S(0) < β/γ or, if S(0) ≈ N, that β/γ > 1. In this case, the initial increase in I will be exponential with a rate r = log(β/γ)/G and a doubling time of log(2)/r. Here, G is called the serial interval and is the average time between successive cases in a chain of transmission. For example, for influenza with a β/γ ratio of around 2.5 and a serial interval of 3.5 days, cases will initially double every 2.6 days. Eventually, however, I will drop to zero (if γ > 0) because the S group is depleted during the course of the epidemic as a result of the immunity of the R group. Many important aspects of the dynamics of an epidemic are influenced by the ratio β/γ, called the basic reproduction number (R_{0}), which represents the expected number of secondary cases caused by a single primary infected individual introduced into a population with no prior immunity.
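The influenza example can be checked directly from the two formulas in the text, r = log(β/γ)/G and doubling time log(2)/r. This is a small sketch; the function names are illustrative and log denotes the natural logarithm.

```python
import math


def initial_growth_rate(r0, serial_interval):
    """Exponential growth rate r = log(R0) / G, as given in the text."""
    return math.log(r0) / serial_interval


def doubling_time(r0, serial_interval):
    """Time for the number of cases to double during the initial exponential phase."""
    return math.log(2) / initial_growth_rate(r0, serial_interval)


# Influenza example from the text: beta/gamma of about 2.5, serial interval 3.5 days.
print(f"{doubling_time(2.5, 3.5):.1f} days")  # prints "2.6 days"
```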
Figure 1b tracks an outbreak with R_{0} = 2 and 1/γ = 14 days (β = R_{0}γ = 0.14). Per convention, we express S, I and R as percentages of the total population size, N. A key feature of I is the location and size of the peak infection, which occurs when S = 1/R_{0} and is given by I_{max} = 1 – (1 + log R_{0})/R_{0} (ref. ^{3}). We find I_{max} = 15% at t = 95 days. A second important feature is the total number of people infected, the cumulative epidemic size R(∞), which is 80%. Regardless of the value of R_{0}, the epidemic will self-extinguish (I(∞) → 0) if no new susceptible individuals are added into the population (either through births or through loss of immunity). This happens because, with time, recovery begins to outpace infection before all the remaining susceptible individuals are infected. Thus, the model predicts there will be a fraction of people, S(∞), who escape infection given by the implicit equation \(S\left( \infty \right) = {\mathrm{e}}^{-R_0\left( {1 - S\left( \infty \right)} \right)}\), which can be approximated by \(S\left( \infty \right) \approx {\mathrm{e}}^{-R_0}\) as long as R_{0} ≳ 2.5.
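Both summary quantities can be computed without integrating the differential equations. The sketch below evaluates the closed-form peak and solves the implicit equation for S(∞) by bisection; the choice of bisection and the function names are assumptions made for illustration. The search interval (0, 1/R_{0}) is used because the equation also has the trivial root S(∞) = 1, which corresponds to no epidemic.

```python
import math


def peak_infectious_fraction(r0):
    """I_max = 1 - (1 + log R0) / R0, valid when nearly everyone starts susceptible."""
    return 1.0 - (1.0 + math.log(r0)) / r0


def fraction_escaping_infection(r0, tol=1e-12):
    """Solve S = exp(-R0 * (1 - S)) for the root below 1/R0 (requires R0 > 1)."""
    if r0 <= 1:
        raise ValueError("no epidemic takes off when R0 <= 1")
    lo, hi = 0.0, 1.0 / r0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # g(s) = s - exp(-R0 (1 - s)) is negative below the root, positive above it
        if mid - math.exp(-r0 * (1.0 - mid)) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for r0 in (2.0, 3.0):
    s_inf = fraction_escaping_infection(r0)
    print(
        f"R0 = {r0}: I_max = {peak_infectious_fraction(r0):.0%}, "
        f"R(inf) = {1 - s_inf:.0%}, approximation exp(-R0) = {math.exp(-r0):.1%}"
    )
```

For R_{0} = 2 this gives a peak of about 15% and a cumulative size of about 80%, in line with the values quoted for Fig. 1b.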
If the number of secondary cases is increased by 50% (R_{0} = 3), the trajectories change: now, I_{max} = 30% at t = 54 days, R(∞) = 94% and only S(∞) = 6% is predicted to escape infection (Fig. 1c).
The spread of the infection can be mitigated by reducing R_{0}. This can be accomplished by reducing the infectious period 1/γ (for example, by therapeutics) or by reducing β = πk. Hygiene measures (sewage systems, handwashing, air filters, and so on) reduce π by mitigating the number of contagious particles that are exchanged among individuals. Other measures, like quarantines, social distancing and travel restrictions, reduce the contact rate k.
When R_{0} is reduced, the infected fraction peak is delayed and lowered (the ‘flattening of the curve’ effect, recently popularized in the context of the COVID-19 outbreak), which is particularly valuable because it reduces the pressure on the health care system (Fig. 2a). Even small decreases in R_{0} can have substantial public health benefits. For example, decreasing R_{0} by 10% from 2.0 to 1.8 decreases I_{max} by a fifth (15% to 12%), delays the peak time by a fifth (95 to 113 days) and lowers the total epidemic size from 80% to 73% (Fig. 2b). It is worth noticing the change in the relationship between I_{max} and the time at which it occurs when R_{0} is decreased (Fig. 2b). For example, reducing I_{max} from 30% to 20% delays peak infection time by 24 days (54 to 78) but reducing it to 10% delays it by 71 days (from 54 to 125).
A celebrated insight from the SIR model is the profound value of vaccination, which moves individuals from the S group directly to the R group. Since spread will self-limit when S < 1/R_{0}, vaccinating at least a fraction p_{c} = 1 – 1/R_{0} of the population will prevent an outbreak^{4}. This is the ‘herd immunity’ threshold. Practically, there will still be susceptible individuals in the population, but a pathogen will result in only a short and stuttering chain of transmission because infectious individuals are unlikely to encounter enough susceptible ones.
For smallpox, the only human disease eradicated by vaccination, herd immunity is achieved at p_{c} ≈ 80% (R_{0} ≈ 5 (ref. ^{1})). For the seasonal flu it is achieved at p_{c} = 50–75% (R_{0} ≈ 2–4). However, current influenza vaccines typically confer immunity to only a portion of individuals, so higher coverage is needed to establish herd immunity, and vaccinations may need to be updated as new strains emerge due to viral evolution. This is in contrast to the 90–99% efficacy of smallpox and other childhood vaccines. Many zoonotic viruses, such as MERS, can be deadly, but they do not pose major risks for sustained human-to-human chains of transmission resulting in epidemics because R_{0} < 1.
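The thresholds quoted above follow directly from p_{c} = 1 – 1/R_{0}. The sketch below reproduces them and also illustrates why an imperfect vaccine raises the coverage that is needed. The efficacy adjustment p_{c}/E is an assumption that is not stated in the text: it treats the vaccine as fully protecting a fraction E of recipients and leaving the rest fully susceptible. The function names and the efficacy values are hypothetical.

```python
def herd_immunity_threshold(r0):
    """Critical immune fraction p_c = 1 - 1/R0 (zero when R0 <= 1)."""
    if r0 <= 1:
        return 0.0
    return 1.0 - 1.0 / r0


def required_coverage(r0, efficacy):
    """Coverage needed when only a fraction `efficacy` of vaccinees become immune.

    Assumption (not from the article): all-or-nothing protection, so coverage
    must be p_c / efficacy. A value above 1 means the threshold cannot be
    reached by vaccination alone.
    """
    return herd_immunity_threshold(r0) / efficacy


# R0 values taken from the text; efficacy values are illustrative only.
print(f"smallpox, R0 = 5:  p_c = {herd_immunity_threshold(5):.0%}")
print(f"influenza, R0 = 2: p_c = {herd_immunity_threshold(2):.0%}")
print(f"influenza, R0 = 4: p_c = {herd_immunity_threshold(4):.0%}")
print(f"R0 = 5 with 95% efficacy: coverage = {required_coverage(5, 0.95):.0%}")
print(f"R0 = 4 with 60% efficacy: coverage = {required_coverage(4, 0.60):.0%}")
```

Under this assumption, the last case returns a coverage above 100%, which is one way to see why partial vaccine efficacy can put herd immunity out of reach.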
An important caveat for these calculations is our original assumption of random mixing among individuals. Assortative mixing can break these predictions, as evidenced by recent outbreaks of measles (R_{0} = 12–20; ref. ^{1}) in the United States, where overall vaccination coverage is generally sufficient but vaccine-refusal communities are clustered, so that local coverage drops below the 93% estimated to be needed for herd immunity.
In Fig. 3 we show the trajectories of I and R for a disease with R_{0} = 3 at various vaccination fractions p = 0 to 0.5. As p is increased, the rate of infections decreases, the point in time at which infections peak is delayed, and the cumulative epidemic size decreases. For example, if we vaccinate p = 0.5 of the population, we decrease peak infection I_{max} = 30% to 3.2% and the size of the epidemic from R(∞) = 94% to 29%. An interactive tool to explore infection spread trajectories (Figs. 1–3) is at https://github.com/martinkrz/posepi1.
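The effect of vaccination on peak and cumulative size can also be approximated without simulation. The sketch below assumes that vaccination simply lowers the initial susceptible fraction to S(0) = 1 – p and that the initial infectious fraction is negligible. It then uses the standard SIR invariant I + S – log(S)/R_{0} = constant, a textbook generalization of the formulas given earlier rather than something stated in this article. The function names are illustrative.

```python
import math


def peak_with_vaccination(r0, p):
    """Approximate I_max when a fraction p is vaccinated before the outbreak."""
    s0 = 1.0 - p
    if r0 * s0 <= 1:
        return 0.0  # at or above the herd immunity threshold: no outbreak
    return s0 - (1.0 + math.log(r0 * s0)) / r0


def epidemic_size_with_vaccination(r0, p, tol=1e-12):
    """Fraction of the whole population eventually infected (excludes the vaccinated)."""
    s0 = 1.0 - p
    if r0 * s0 <= 1:
        return 0.0
    # Solve s = s0 * exp(-R0 * (s0 - s)) for the root below 1/R0 by bisection.
    lo, hi = 0.0, 1.0 / r0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid - s0 * math.exp(-r0 * (s0 - mid)) < 0:
            lo = mid
        else:
            hi = mid
    return s0 - (lo + hi) / 2


for p in (0.0, 0.25, 0.5):
    print(
        f"p = {p:.2f}: I_max = {peak_with_vaccination(3.0, p):.1%}, "
        f"epidemic size = {epidemic_size_with_vaccination(3.0, p):.0%}"
    )
```

With R_{0} = 3 and p = 0.5, this approximation gives a peak of roughly 3% and a cumulative size of about 29%, close to the values read from Fig. 3; small differences come from neglecting the initial infectious fraction.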
Shortly after a new infectious disease appears, it is often possible to estimate the parameters of the basic SIR model. This gives valuable insight to predict the disease trajectory and the needed reductions in R_{0} for control. The basic SIR framework introduced here is also readily extended to realistically model more complex population and disease dynamics: births and deaths in the general population, age structure, nonrandom mixing and spatial heterogeneities, asymptomatic carriers, latent periods (when individuals are infected but not yet infectious), loss of immunity, diseases that are transmitted by vectors (such as ticks and mosquitos) and diseases that require specific types of contacts (such as sexually transmitted diseases). We will discuss many of these in the next column.
References
1. Anderson, R. M., Anderson, B. & May, R. M. Infectious Diseases of Humans: Dynamics and Control (Oxford Univ. Press, 1992).
2. Bjørnstad, O. N. Epidemics: Models and Data Using R (Springer, 2018).
3. Weiss, H. The SIR model and the foundations of public health. Materials Matemàtics (2013); http://mat.uab.cat/matmat/PDFv2013/v2013n03.pdf
4. Anderson, R. M. & May, R. M. Nature 318, 323–329 (1985).
Competing interests
The authors declare no competing interests.
Additional information
Editor’s note: Points of Significance are commissioned and are not peer-reviewed.
Cite this article
Bjørnstad, O.N., Shea, K., Krzywinski, M. et al. Modeling infectious epidemics. Nat Methods 17, 455–456 (2020). https://doi.org/10.1038/s41592-020-0822-z