In the next few articles, we will discuss mathematical and statistical models that are commonly used to study the spread of infectious diseases. Such models are used to inform decisions on disease prevention, surveillance, control and treatment and can be applied to new epidemics, such as the ongoing COVID-19 outbreak.

During a so-called ‘virgin epidemic’, we initially have a population of N individuals — all at risk for a disease — who form the susceptible group. If an individual with a transmissible pathogen is introduced into this population, over time some of the susceptible individuals will become infected and become part of the infectious group whose members will contribute to onwards transmission. Depending on the pathogen, individuals may recover and acquire immunity, which can be for life (for example, in the case of measles) or transient (for example, in the case of influenza).

The simplest model for the spread of an infection is the SIR model (refs. 1, 2), which tracks the fraction of a population in each of three groups: susceptible, infectious and recovered (Fig. 1a). The sizes of these groups are functions of time t; we will write them as S, I and R, with dependence on t implied. During a virgin epidemic, we often assume that spread is so rapid that we can ignore any change in the population due to births and deaths. This is a so-called ‘closed epidemic’ with a fixed population size S + I + R = N. The SIR model has no probabilistic component, except for the assumption that the population can mix at random and is large enough that predictions based on average rates can be used.

Fig. 1: A simple SIR model of infection spread in a population.

a, Three population groups in the model. b, The trajectory of S (black), I (orange) and R (blue) for an outbreak with R0 = 2 and 1/γ = 14 days (β = R0γ = 0.14/day). Values at end of traces show cumulative epidemic size (R(∞) = 80%), fraction escaping infection (S(∞) = 20%) and peak infected fraction (Imax = 15%, dashed orange line), which occurs at S = 1/R0 = 0.5 and t = 95 days (black dashed line). Initial SIR conditions were S(0) = 0.999, I(0) = 0.001 and R(0) = 0, implying a population size of 1,000. Trajectories are computed numerically for 100 time steps. c, Same as b but for a more serious outbreak with R0 = 3 (β = 0.21/day).

The model is initialized with the entire population being in the susceptible group except for a single infectious individual: I(0) = 1, S(0) = N – 1, R(0) = 0. At each time unit (for example, day), any infected individual can come into contact with k other individuals in the population. On average, per unit time they will come into contact with kS/N susceptible individuals and infect kπS/N of them, if π is the probability of infection on contact. Conventionally, k and π are combined into a transmission rate, β = πk, which is the average rate at which an infected individual can infect a susceptible. Infected individuals recover at a constant rate γ and 1/γ is the infectious period (or average recovery time). These rates of change of each group can be written as a set of three coupled differential equations:

$$\frac{{\mathrm{d}}S}{{\mathrm{d}}t} = - \frac{\beta SI}{N}$$
$$\frac{{\mathrm{d}}I}{{\mathrm{d}}t} = \frac{\beta SI}{N} - \gamma I$$
$$\frac{{\mathrm{d}}R}{{\mathrm{d}}t} = \gamma I$$
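The three equations can be integrated numerically with very little code. The sketch below is one possible approach, not the method used for the figures: it applies a fixed-step fourth-order Runge-Kutta scheme and works in population fractions (so N = 1). The function name `simulate_sir`, the step size `dt` and the 400-day horizon are illustrative assumptions; the initial conditions and rates are those quoted in the caption of Fig. 1b.

```python
def simulate_sir(beta, gamma, s0=0.999, i0=0.001, rec0=0.0, days=400, dt=0.1):
    """Integrate dS/dt, dI/dt, dR/dt with classical RK4 (fractions, so N = 1).

    Returns a list of (t, S, I, R) tuples, one per time step.
    """
    def deriv(s, i):
        new_infections = beta * s * i   # beta * S * I / N with N = 1
        recoveries = gamma * i
        return -new_infections, new_infections - recoveries, recoveries

    s, i, r, t = s0, i0, rec0, 0.0
    history = [(t, s, i, r)]
    for _ in range(int(round(days / dt))):
        k1 = deriv(s, i)
        k2 = deriv(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1])
        k3 = deriv(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1])
        k4 = deriv(s + dt * k3[0], i + dt * k3[1])
        s += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        i += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        r += dt * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2]) / 6
        t += dt
        history.append((t, s, i, r))
    return history


# Outbreak with R0 = 2 and an infectious period of 14 days (beta = R0 * gamma).
gamma = 1 / 14
history = simulate_sir(beta=2 * gamma, gamma=gamma)
peak_t, _, peak_i, _ = max(history, key=lambda row: row[2])
print(f"peak I = {peak_i:.1%} at t = {peak_t:.0f} days")
print(f"final S = {history[-1][1]:.1%}, final R = {history[-1][3]:.1%}")
```

Because the three derivatives sum to zero, S + I + R stays constant throughout the integration, which is a convenient sanity check on any implementation.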

The equations are typically solved numerically (ref. 2). For now, there are several key observations based on their forms. An outbreak will take off if initially dI/dt > 0, which implies N/S(0) < β/γ or, if S(0) ≈ N, that β/γ > 1. In this case, the initial increase in I will be exponential with a rate r = log(β/γ)/G and a doubling time of log(2)/r. Here, G is called the serial interval and is the average time between successive cases in a chain of transmission. For example, for influenza with a β/γ ratio of around 2.5 and a serial interval of 3.5 days, cases will initially double every 2.6 days. Eventually, however, I will drop to zero (if γ > 0) because the S group is depleted during the course of the epidemic as a result of the immunity of the R group. Many important aspects of the dynamics of an epidemic are influenced by the ratio β/γ, called the basic reproduction number (R0), which represents the expected number of secondary cases caused by a single primary infected individual introduced into a population with no prior immunity.
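The influenza doubling time quoted above follows directly from the two formulas for r and the doubling time. A short sketch of the arithmetic, in which the function name `initial_growth` is only illustrative:

```python
import math


def initial_growth(R0, serial_interval):
    """Return (r, doubling_time) for the early exponential phase.

    r = log(R0) / G and doubling time = log(2) / r, with G the serial interval.
    """
    if R0 <= 1:
        raise ValueError("the outbreak only takes off when R0 > 1")
    r = math.log(R0) / serial_interval
    return r, math.log(2) / r


# Influenza example from the text: beta/gamma of about 2.5, serial interval 3.5 days.
r, doubling_time = initial_growth(2.5, 3.5)
print(f"r = {r:.3f} per day, doubling time = {doubling_time:.1f} days")
# prints: r = 0.262 per day, doubling time = 2.6 days
```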

Figure 1b tracks an outbreak with R0 = 2 and 1/γ = 14 days (β = R0γ = 0.14/day). Per convention, we express S, I and R as percentages of the total population size, N. A key feature of I is the location and size of the peak infection, which occurs when S = 1/R0 and is given by Imax = 1 – (1 + log R0)/R0 (ref. 3). We find Imax = 15% at t = 95 days. A second important feature is the total number of people infected, that is, the cumulative epidemic size, R(∞), which is 80%. Regardless of the value of R0, the epidemic will self-extinguish (I(∞) → 0) if no new susceptible individuals are added into the population (either through births or through loss of immunity). This happens because, with time, recovery begins to outpace infection before all the remaining susceptible individuals are infected. Thus, the model predicts there will be a fraction of people, S(∞), who escape infection given by the implicit equation \(S\left( \infty \right) = {\mathrm{e}}^{-R_0\left( {1 - S\left( \infty \right)} \right)}\), which can be approximated by \(S\left( \infty \right) \approx {\mathrm{e}}^{-R_0}\) as long as R0 ≥ 2.5.
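Both summary quantities can be evaluated without simulating the full trajectory. The sketch below computes Imax from the closed-form expression and solves the implicit equation for S(∞) by fixed-point iteration. The choice of fixed-point iteration, the starting guess and the function names are assumptions made for illustration, and both formulas presume that nearly everyone is susceptible at the start (S(0) ≈ 1).

```python
import math


def peak_infected_fraction(R0):
    """Imax = 1 - (1 + log R0) / R0, valid for R0 > 1 and S(0) close to 1."""
    return 1.0 - (1.0 + math.log(R0)) / R0


def fraction_escaping(R0, tol=1e-12, max_iter=10_000):
    """Solve s = exp(-R0 * (1 - s)) for the fraction that escapes infection.

    Iterating from the approximation exp(-R0) climbs to the smaller root,
    which is the epidemiologically relevant one when R0 > 1.
    """
    s = math.exp(-R0)
    for _ in range(max_iter):
        s_next = math.exp(-R0 * (1.0 - s))
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


for R0 in (1.8, 2.0, 3.0):
    s_inf = fraction_escaping(R0)
    print(f"R0 = {R0}: Imax = {peak_infected_fraction(R0):.1%}, "
          f"S(inf) = {s_inf:.1%}, R(inf) = {1 - s_inf:.1%}, "
          f"exp(-R0) = {math.exp(-R0):.1%}")
```

Printing exp(-R0) next to the exact root shows how the approximation tightens as R0 grows.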

If the number of secondary cases is increased by 50% (R0 = 3), the trajectories change: now, Imax = 30% at t = 54 days, R(∞) = 94% and only S(∞) = 6% is predicted to escape infection (Fig. 1c).

The spread of the infection can be mitigated by reducing R0. This can be accomplished by reducing the infectious period 1/γ (for example, by therapeutics) or by reducing β = πk. Hygiene measures (sewage systems, hand-washing, air filters, and so on) reduce π by mitigating the number of contagious particles that are exchanged among individuals. Other measures, like quarantines, social distancing and travel restrictions, reduce the contact rate k.

When R0 is reduced, the infected fraction peak is delayed and lowered. This is the ‘flattening of the curve’ effect, recently popularized in the context of the COVID-19 outbreak, and it is particularly valuable because it reduces the pressure on the health care system (Fig. 2a). Even small decreases in R0 can have substantial public health benefits. For example, decreasing R0 by 10% from 2.0 to 1.8 decreases Imax by a fifth (15% to 12%), delays the peak time by a fifth (95 to 113 days) and lowers the total epidemic size from 80% to 73% (Fig. 2b). It is worth noticing the change in the relationship between Imax and the time at which it occurs when R0 is decreased (Fig. 2b). For example, reducing Imax from 30% to 20% delays peak infection time by 24 days (54 to 78) but reducing it to 10% delays it by 71 days (from 54 to 125).

Fig. 2: The effect of R0 mitigation on infection spread.

a, When R0 is reduced, the infected group trajectory is flattened and delayed. b, Trends in peak infected fraction Imax (orange), relationship between Imax and the time at which it occurs (black) and cumulative epidemic size R(∞) (blue) as R0 decreases from 3.0 to 1.5. Dashed horizontal line indicates a putative hospital capacity target for mitigation efforts. Infectious period is 1/γ = 14 days for all plots, but β varies from 0.21 to 0.11.

A celebrated insight from the SIR model is the profound value of vaccination, which moves individuals from the S group directly to the R group. Since spread will self-limit when S < 1/R0, vaccinating at least a fraction pc = 1 – 1/R0 of the population will prevent an outbreak (ref. 4). This is the ‘herd immunity’ threshold. Practically, there will still be susceptible individuals in the population, but a pathogen will result in only a short and stuttering chain of transmission because infectious individuals are unlikely to encounter enough susceptible ones.

For smallpox — the only human disease eradicated by vaccination — herd immunity is achieved at pc ≈ 80% (R0 ≈ 5 (ref. 1)). For the seasonal flu it is achieved at pc = 50–75% (R0 ≈ 2–4). However, current influenza vaccines typically confer immunity to only a portion of individuals, so higher coverage is needed to establish herd immunity, and vaccinations may need to be updated as new strains emerge due to viral evolution. This is in contrast to the 90–99% efficacy for the smallpox or other childhood vaccines. Many zoonotic viruses, such as MERS, can be deadly, but they do not pose major risks for sustained human-to-human chains of transmission resulting in epidemics because R0 < 1.
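The thresholds quoted above are a direct application of pc = 1 – 1/R0. The sketch below reproduces them and also illustrates why imperfect vaccines raise the required coverage. The second function is an assumption that does not come from the text: it treats the vaccine as "all-or-nothing", fully protecting a fraction `efficacy` of recipients, so the required coverage becomes pc divided by the efficacy. Both function names and the 95% efficacy value are illustrative.

```python
def herd_immunity_threshold(R0):
    """Critical immune fraction pc = 1 - 1/R0 (zero when R0 <= 1)."""
    return 0.0 if R0 <= 1 else 1.0 - 1.0 / R0


def required_coverage(R0, efficacy):
    """Coverage needed if only a fraction `efficacy` of vaccinees become immune.

    ASSUMPTION (not from the text): an all-or-nothing vaccine model.
    A result above 1 means herd immunity cannot be reached by vaccination alone.
    """
    if not 0 < efficacy <= 1:
        raise ValueError("efficacy must be in (0, 1]")
    return herd_immunity_threshold(R0) / efficacy


print(f"smallpox, R0 = 5: pc = {herd_immunity_threshold(5):.0%}")
print(f"seasonal flu, R0 = 2 to 4: pc = {herd_immunity_threshold(2):.0%} "
      f"to {herd_immunity_threshold(4):.0%}")
# Hypothetical efficacy value, chosen from the 90-99% range quoted in the text.
print(f"smallpox at 95% efficacy: coverage = {required_coverage(5, 0.95):.0%}")
```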

An important caveat for these calculations is our original assumption of random mixing among individuals. Assortative mixing can break these predictions, as evidenced by recent outbreaks of measles (R0 = 12–20; ref. 1) in the United States, where vaccination cover is generally sufficient but vaccine-refusal communities are aggregated so that local values of pc drop below the value of 93% estimated to achieve herd immunity.

In Fig. 3 we show the trajectories of I and R for a disease with R0 = 3 at various vaccination fractions p = 0 to 0.5. As p is increased, the rate of infections decreases, the point in time at which infections peak is delayed, and the cumulative epidemic size decreases. For example, if we vaccinate p = 0.5 of the population, we decrease peak infection Imax = 30% to 3.2% and the size of the epidemic from R(∞) = 94% to 29%. An interactive tool to explore infection spread trajectories (Figs. 1–3) is at https://github.com/martinkrz/posepi1.
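The effect of vaccination on the peak and on the epidemic size can be checked without integrating the equations. Dividing dI/dt by dS/dt and integrating shows that I + S – log(S)/R0 stays constant along a trajectory (with S and I as fractions); this is a standard property of the SIR equations that is not stated in the text. The sketch below uses it with the assumed initial conditions I(0) = 0.001 and S(0) = 1 – p – I(0), and counts only infected individuals, not vaccinated ones, in the epidemic size. The function name and the bisection solver are illustrative choices.

```python
import math


def vaccinated_outbreak(R0, p, i0=0.001):
    """Return (peak infected fraction, cumulative infected fraction).

    A fraction p is vaccinated before the outbreak, so S(0) = 1 - p - i0.
    Relies on the invariant I + S - log(S)/R0 = constant.
    """
    s0 = 1.0 - p - i0
    if s0 <= 0:
        raise ValueError("p + i0 must be below 1")
    c = s0 + i0 - math.log(s0) / R0          # value of the invariant at t = 0

    if R0 * s0 <= 1.0:
        peak = i0                             # infections decline from the start
    else:
        s_peak = 1.0 / R0                     # I peaks when S = 1/R0
        peak = c - s_peak + math.log(s_peak) / R0

    # Final S solves s - log(s)/R0 = c on (0, min(s0, 1/R0)); the left side is
    # decreasing there, so bisection finds the unique root.
    lo, hi = 1e-300, min(s0, 1.0 / R0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid - math.log(mid) / R0 - c > 0:
            lo = mid
        else:
            hi = mid
    s_inf = 0.5 * (lo + hi)
    return peak, s0 + i0 - s_inf


for p in (0.0, 0.25, 0.5):
    peak, total = vaccinated_outbreak(R0=3, p=p)
    print(f"p = {p:.2f}: Imax = {peak:.1%}, epidemic size = {total:.1%}")
```

Under these assumptions the p = 0 and p = 0.5 rows agree with the values quoted for Fig. 3.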

Fig. 3: The effect of vaccination on trajectory of a disease with R0 = 3 at vaccination fractions p = 0 to 0.5.

a, Trajectories of the infected fraction for each p. Dashed lines indicate peak infection, Imax. b, Trajectories of the recovered fraction. Values at end of profiles indicate total epidemic size, R(∞).

Shortly after a new infectious disease appears, it is often possible to estimate the parameters of the basic SIR model. This gives valuable insight to predict the disease trajectory and the needed reductions in R0 for control. The basic SIR framework introduced here is also readily extended to realistically model more complex population and disease dynamics: births and deaths in the general population, age structure, non-random mixing and spatial heterogeneities, asymptomatic carriers, latent periods (when individuals are infected but not yet infectious), loss of immunity, diseases that are transmitted by vectors (such as ticks and mosquitos) and diseases that require specific types of contacts (such as sexually transmitted diseases). We will discuss many of these in the next column.