Introduction

Despite substantial public health improvements in the last century, infectious diseases remain one of the leading causes of both morbidity and mortality1,2. When confronted with an infectious disease outbreak, public health officials typically work to control the outbreak by performing assessments, analyzing surveillance data, identifying resources and interacting with subject matter experts2,3,4. Control measures are then implemented based on the cumulative information collected. These approaches rely heavily on good surveillance systems, access to experts, and good intuition about which control measures to use. As such, they are largely subjective, time consuming, and the infrastructure required is often not present in high disease burden areas.

Modeling is an attractive supplemental method because of the ability to estimate an outbreak’s trajectory and the effects of possible control measures in a timely manner. Compartmental models are historically common; they divide individuals into categories based on their disease status. The most common variant is the SIR model, named after the categories used—“susceptible”, “infectious” and “recovered”. Models of this nature have small computational requirements, and are thus commonly used as first pass attempts to characterize outbreaks or infections quickly5. For example, after the sudden emergence of Severe Acute Respiratory Syndrome (SARS) in the early 2000s, researchers used modeling to characterize the virus’ epidemiology. Several used compartmental models with control measures like quarantine or isolation in various settings (e.g., hospitals or cities) to describe effects of possible interventions6,7,8,9. Similar work exists for essentially all well known infectious diseases. For example, Mandal et al. provide a review of models used for malaria10 and Bauch et al. explore model use with respect to SARS and other emerging infectious diseases6. Methods among these groups are often similar, but tend to focus on specific diseases and locations of interest.

In contrast, agent-based models use a bottom-up approach where the agents (these are often people) interact with particular rules to simulate outbreaks11. This allows simulations at high resolutions, but requires large amounts of data to parameterize the models, as well as substantial computational power. It is thought that they may reflect real world scenarios more accurately, but the lack of available epidemiological data necessitates assumptions that are difficult or impossible to test11. These models further require computational resources inaccessible to an average health department.

Both agent-based and compartmental models exhibit additional features that hinder widespread model adoption. The first is the focus on particular disease-location pairs. This emphasis precludes application to a new location or disease because of the amount of work associated with finding location-specific data, tweaking parameters, and, often, reproducing code that is not freely available.

A related, but perhaps more pressing issue is the lack of collaboration between the researchers developing models and those making policy decisions during an outbreak. Wagner et al. describe the disconnect between these two fields12. The ultimate result is a lack of clarity in the modeling community about the requirements for real-world application of models and production of models that do not meet decision makers’ needs12,13.

There have been previous efforts to produce widely available models, in particular web-based simulations of agent-based disease models. Several of these efforts have been part of MIDAS (the Models of Infectious Disease Agent Study), including FRED14 and GLEAMviz15, which are both freely accessible and maintained. There is one package for the statistical language R16 that implements compartmental models. However, available web-based compartmental models are limited and are directed towards educating public health trainees rather than providing an operational modeling platform.

With this context in mind, our aim is to use existing models with low computational requirements to (1) explore control measures and (2) develop an accessible platform for public health collaborators to use and provide feedback on models. For this initial work, we use a SIR model, modified to include a control measure, to explore many possible disease progression paths. The SIR model was chosen because it is the simplest compartmental model and requires minimal computational resources. This study presents the application of existing SIR models to investigate control options using a “counterfactual”, a web application using the model, and a description of a path forward for validating SIR models.

A counterfactual is a theoretical construct describing a perfect experiment isolating one variable. For example, say patient Pi is in a clinical trial that is assessing the effects of drug D compared to a placebo. Patient Pi is randomly assigned to drug D. A perfect experiment would be for an identical patient, Pj, to exist and be assigned to the placebo, so that the researchers could compare the differing effects of the drug versus the placebo in identical people. The concept of a counterfactual has existed in theory for decades17, and is commonly used in causal inference in medicine and epidemiology18. The counterfactual concept is not generally explicitly applied to epidemiological modeling, but some (e.g., Smith et al.19) mention the concept of exploring what “could” happen under different control scenarios. A policy maker could, for example, mentally compare an outbreak of strep throat where hand washing is the predominant intervention to an identical one where school closures are the main intervention, and use the difference in outbreak outcomes to decide which intervention to use. This method aims to make those mental comparisons more explicit, thus affording more transparent decisions.

This work contributes to the field by providing a mechanism to weigh control measures (our κ value), an extensive description of ways to use the counterfactual in decision support, and a mechanism to enable broad adoption. By investigating plausible disease parameter ranges, rather than point estimates, we can analyze a number of possible outbreaks and begin to quantitatively understand how disease parameters affect outbreak outcomes of interest. Further, we contribute to the relative lack of compartmental model interfaces and provide a mechanism for iteration and feedback with public health end users.

To explore SIR models as decision support tools, we apply the model to three diseases—measles, norovirus and influenza. Within disease parameter ranges, we describe the worst case scenario where the model indicates a reduction in cumulative infected can still be achieved. These observations are hypothesis generating, rather than validated endpoints, because of the general lack of validation in previous work on SIR models. As such, we describe a path for future research in the discussion.

Methods

SIR model

As described above, the SIR model is a compartmental model commonly used for infectious disease outbreaks. Because of the large literature base describing both the history and use of SIR models (e.g., see Keeling and Rohani20), we present a somewhat abbreviated description here.

People are assigned to three compartments based on their disease status at time t (see Fig. 1). The number of people in each compartment varies with time as the outbreak progresses, but the overall population in the model stays constant. Susceptibles (S) are those that are at risk of infection. Infected (I) are individuals experiencing the illness, and recovered persons (R) have completed infection and are now immune to the disease, or died as a result of the infection. Movement between compartments is described by the following system of equations:
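The system referred to here is not reproduced in this text. As a point of reference, the standard textbook form of the SIR system, written with the symbols used in this section, is sketched below. The division by the total population N (frequency-dependent transmission) and the assignment of equation numbers (1) to (5) are assumptions inferred from the cross-references in the text, not a copy of the original typesetting.

```latex
\begin{align}
\frac{dS}{dt} &= -\frac{\beta S I}{N} \tag{1}\\
\frac{dI}{dt} &= \frac{\beta S I}{N} - \gamma I \tag{2}\\
\frac{dR}{dt} &= \gamma I \tag{3}\\
R_0 &= \frac{\beta}{\gamma} \tag{4}\\
\gamma &= \frac{1}{\psi} \tag{5}
\end{align}
```

Here N = S + I + R is constant, ψ is the infectious period in days, and summing equations (1) to (3) gives zero, which is the constant-population property stated above.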

Figure 1: In a SIR model, individuals move between three compartments—S (susceptible), I (infectious) and R (recovered).

Movement between the categories is dependent on β and γ which describe the “force” of the infection and the infectious period, respectively. The number of infectious persons at any given time results in the epidemic curve familiar to many epidemiologists.

Transitions between compartments are described by two parameters, γ and β. γ is the reciprocal of ψ, the infectious period (see equation (5)). The infectious period is the interval of time during which an infected individual can transmit the disease. Time in our model is represented in days. Measles, for example, has an infectious period of approximately 8 days; individuals can transmit the disease from approximately 4 days before rash onset until 4 days after21. γ controls the transition from infectious to recovered (see equations (2) and (3)). β is commonly referred to as the “force of infection” because it describes how quickly a disease can move through a population. It is the product of γ and the reproductive number (R0), and controls the transition from susceptible to infectious (see equations (1) and (2)). The reproductive number, R0, is the number of secondary infections per primary infection (see equation (4)). Of note, R0 can be estimated in a number of ways. Obadia et al.22 provide an overview of several common methods, including exponential growth, maximum likelihood, sequential Bayesian and time-dependent methods. Each method makes slightly different assumptions, and can result in different reproductive values. Values should be calculated and interpreted with these caveats in mind23.
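The relationships above (γ as the reciprocal of the infectious period, β as the product of γ and R0) can be illustrated with a minimal simulation sketch. This is not the authors' code: the function name `simulate_sir`, the one-day forward Euler step, the division by total population, and the cap on new infections are all assumptions made for illustration. The 8-day infectious period is the measles figure quoted in the text; the R0 value is hypothetical.

```python
def simulate_sir(beta, gamma, s0, i0, days):
    """Advance a daily-step (forward Euler) SIR model and return S, I, R lists."""
    n = s0 + i0  # total population, held constant
    s, i, r = [float(s0)], [float(i0)], [0.0]
    for _ in range(days):
        # New infections, capped so S never goes negative with a 1-day step.
        new_infections = min(s[-1], beta * s[-1] * i[-1] / n)
        new_recoveries = gamma * i[-1]
        s.append(s[-1] - new_infections)
        i.append(i[-1] + new_infections - new_recoveries)
        r.append(r[-1] + new_recoveries)
    return s, i, r


infectious_period = 8.0  # days, the measles figure quoted in the text
r0 = 15.0                # hypothetical reproductive number, for illustration only
gamma = 1.0 / infectious_period
beta = r0 * gamma

s, i, r = simulate_sir(beta, gamma, s0=9999, i0=1, days=200)
peak_day = max(range(len(i)), key=i.__getitem__)
print(f"epidemic curve peaks on day {peak_day}")
print(f"cumulative infected: {r[-1] + i[-1]:.0f}")
```

The list `i` is the epidemic curve: the number of infectious persons on each day. A smaller step size or an ODE solver would track the continuous equations more closely; the one-day step is used here only because the text measures time in days.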

SIR augmentations

To expand the above model and introduce a scenario where a control measure is applied to an outbreak, we introduce two additional parameters:

  1. λ, or control measure effectiveness, describes what fraction of individuals are removed from the susceptible population at each time point. For example, λ = 0 is a control that is completely ineffective and removes no individuals from the susceptible population during each time interval. Conversely, λ = 1 is a control measure that is 100% effective, or removes all susceptible individuals in one time interval. A more realistic value might be λ = 0.01, which would describe an intervention that removes 1% of the susceptible population during each time interval. A more detailed description of the appropriate interpretation of this parameter is included below.

  2. τ, or control start, is the time unit (interpreted as days throughout this analysis) on which the control measure begins. For the purposes of the simulations presented here, it is assumed that the control is implemented on day τ and continued for the remainder of the outbreak.

To apply a control measure, equations (1) through (3) are modified such that if time (t) is ≥τ, a control measure with λ effectiveness is applied at each time step (see equations (6), (7), (8)). To denote the “controlled” environment the subscript T is added.

Within this implementation, λ is the fraction of the susceptible population removed at each time point. Operationally, this describes a control measure that eliminates the possibility of infection. For example, λ = 0.1 and τ = 5 indicates a scenario where 10% of the susceptible population is removed every day on and after the 5th day. This would describe an intervention like vaccination or quarantine. However, it does not describe the type of control measure that reduces infectivity, changes human behavior in a way that affects population density (e.g., staying home from work/school), or does not confer immunity to the infection (e.g., hand washing). Further, the control measure is continuously applied at each time point after initiation (e.g., 10% of the susceptible population are vaccinated each day for the remainder of the outbreak). This assumes that the control is implemented with the same effectiveness throughout each time interval. These assumptions simplify the addition of control measures to the SIR model, but it is relatively straightforward to modify this implementation in the future to a more realistic scenario. These equations are also limited to describing one control measure. It would be comparatively straightforward to substitute λ with a vector of parameters as opposed to a single term, in order to describe multiple control types and effectivenesses.
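A minimal sketch of how such a control could be added to a daily-step SIR simulation follows. It is an illustration under stated assumptions, not the authors' implementation: the function name `simulate_controlled_sir` is hypothetical, the control is applied before transmission within each day, and removed susceptibles are tracked in a separate `protected` tally so that they are never counted as infected. The disease parameters are hypothetical; λ = 0.1 and τ = 5 are the example values from the text.

```python
def simulate_controlled_sir(beta, gamma, lam, tau, s0, i0, days):
    """Daily-step SIR with a control removing a fraction lam of S from day tau on.

    Returns a list of (S, I, R, protected) tuples, one per day.
    """
    n = s0 + i0
    s, i, r, protected = float(s0), float(i0), 0.0, 0.0
    history = [(s, i, r, protected)]
    for day in range(1, days + 1):
        # Control step: active on and after day tau, for the rest of the outbreak.
        removed = lam * s if day >= tau else 0.0
        s -= removed
        protected += removed
        # Transmission and recovery among those still in the model.
        new_infections = min(s, beta * s * i / n)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r, protected))
    return history


beta, gamma = 0.5, 0.25  # hypothetical disease parameters (R0 = 2)
controlled = simulate_controlled_sir(
    beta, gamma, lam=0.1, tau=5, s0=9999, i0=1, days=365)
uncontrolled = simulate_controlled_sir(
    beta, gamma, lam=0.0, tau=5, s0=9999, i0=1, days=365)

for label, run in (("controlled", controlled), ("uncontrolled", uncontrolled)):
    s_end, i_end, r_end, _ = run[-1]
    print(f"{label}: cumulative infected = {r_end + i_end:.1f}")
```

Setting `lam=0.0` reproduces the uncontrolled outbreak, so the same function yields both halves of a counterfactual pair. Replacing the scalar `lam` with a per-day sequence would be one way to relax the constant-effectiveness assumption discussed above.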

Assumptions

There are a number of assumptions inherent in SIR models24. As a result, SIR models are scoped to diseases that meet the following criteria:

  1. The disease is transmitted person-to-person. This means the disease is not transmitted via a vector or an environmental component like water or food.

  2. Disease transmission can be described via homogeneous mixing. This means that if a group of susceptible people interact with an ill person, all susceptible persons are equally likely to acquire infection. Of importance, a majority of infections do not meet this assumption. For example, sexually transmitted infections are not transmitted with equal probability among the entire susceptible population. Even in the case of airborne infections like measles this assumption ignores individuals’ specific immune responses (e.g., immunocompromised individuals are treated the same as healthy individuals).

  3. The disease confers immunity. This means that once an individual has recovered they cannot contract the illness again during the same outbreak. Diseases with very short-term (or no) immunity are commonly modeled with SI models.

  4. The disease’s incubation period is relatively short. Diseases with long incubation periods should include the “exposed” category and can be modeled with a SEIR model25.

  5. The disease is an acute illness (i.e., infected individuals recover or die). This excludes chronic diseases like hepatitis.

κ - Comparing controlled and uncontrolled outbreaks

In order to compare a controlled outbreak to its counterfactual outbreak, we introduce the outcome measurement, κ. It describes the ratio of the cumulative number infected in a controlled outbreak to the cumulative infected in an uncontrolled outbreak. The cumulative number infected throughout the outbreak at time t is equal to the number recovered at time t plus the number infectious at that time (see equation (9)). At the end of an outbreak (t = end) there are no individuals left in the infected category and equation (9) reduces to equation (10).
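The two equations referenced here are not reproduced in this text. A sketch consistent with the verbal definition is given below, using the subscript T for the controlled outbreak as introduced earlier. The exact form and the numbering are assumptions inferred from the description, and R is taken to count only individuals who were actually infected.

```latex
\begin{align}
\kappa(t) &= \frac{R_T(t) + I_T(t)}{R(t) + I(t)} \tag{9}\\
\kappa &= \frac{R_T(t_{\mathrm{end}})}{R(t_{\mathrm{end}})}
  \quad \text{since } I_T(t_{\mathrm{end}}) = I(t_{\mathrm{end}}) = 0 \tag{10}
\end{align}
```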

κ = 1 means that the controlled and uncontrolled outbreaks are identical, or that the control measure had no effect. κ > 1 means that the controlled outbreak had more infected than the uncontrolled outbreak (or that the control measure had a detrimental effect). κ < 1 indicates that the control measure reduced the number of infected persons. For example, κ = 0.01 is interpreted to mean that the controlled outbreak was 1% as large as the uncontrolled outbreak. κ = 0 would describe a scenario where the control measure stopped the outbreak from occurring at all.

Sensitivity analysis

To assess the effect of each parameter on the model, we performed a sensitivity analysis. We varied γ, β, λ and τ within specific ranges (see Table 1) at random for 10,000 trials. During each trial, we randomly picked the value of each input parameter from the specified range, ran the model using those values, and recorded the outcomes. Here, outcomes of interest are the number infected in a controlled scenario, number infected in an uncontrolled scenario and the related κ. We then analyzed each parameter’s relative impact on these outcomes. Because γ and β vary together and can be described simultaneously using R0 (see equation (4)), we also analyzed how the change in R0 affects the outcomes. Because λ and τ only exist as parameters in controlled outbreaks, the controlled scenario is the only outcome considered.
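The sampling procedure described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: the function names, the uniform sampling distribution, the daily-step simulator, the population size and the parameter ranges are all assumptions (the Table 1 values are not reproduced in this text). The example runs 1,000 trials to keep it quick; the analysis in the text used 10,000.

```python
import random


def cumulative_infected(beta, gamma, lam, tau, n=10_000, i0=1, days=365):
    """Cumulative infections from a daily-step SIR run with an optional control."""
    s, i, total = float(n - i0), float(i0), float(i0)
    for day in range(1, days + 1):
        if day >= tau:
            s -= lam * s  # control removes a share of the susceptibles
        new_infections = min(s, beta * s * i / n)
        s -= new_infections
        i += new_infections - gamma * i
        total += new_infections
    return total


def run_trials(ranges, tau_range, n_trials, seed=0):
    """Draw each parameter at random from its range and record the outcomes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_trials):
        p = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        p["tau"] = rng.randint(*tau_range)  # control start in whole days
        uncontrolled = cumulative_infected(p["beta"], p["gamma"], 0.0, p["tau"])
        controlled = cumulative_infected(p["beta"], p["gamma"], p["lam"], p["tau"])
        rows.append({**p,
                     "r0": p["beta"] / p["gamma"],
                     "uncontrolled": uncontrolled,
                     "controlled": controlled,
                     "kappa": controlled / uncontrolled})
    return rows


# Hypothetical ranges for illustration only.
ranges = {"beta": (0.1, 2.0), "gamma": (0.05, 1.0), "lam": (0.005, 0.3)}
rows = run_trials(ranges, tau_range=(3, 30), n_trials=1_000)
print(rows[0])
```

Each row holds one trial's inputs and outcomes, so a parameter's influence can be examined by plotting or correlating its column against `controlled`, `uncontrolled` or `kappa`.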

Table 1 Sensitivity analysis ranges tested.

Application to three diseases

We applied this model to measles, norovirus, and influenza. These diseases were selected because they are of public health interest, and because they meet the requirements described in the assumptions section above. Of note, norovirus can be transmitted both through food and from person to person. Outbreaks described here are solely person-to-person transmitted outbreaks.

Outbreaks were simulated using standard parameter ranges for the three diseases. These ranges were identified based on literature values reported for parameters, identified via searches using Google Scholar and PubMed. Search terms included “[disease name] + infectious period”, “[disease name] + contact rate”, “[disease name] + force of infection”, and “[disease name] + reproductive number”. We were consistently unable to find reported literature values for β and instead used equation (5) to find the maximum and minimum β values for each disease, given their infectious period and R0 values.
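Deriving the β bounds amounts to a short calculation. Because β is the product of γ and R0, and γ is the reciprocal of the infectious period, β equals R0 divided by the infectious period. The sketch below shows that arithmetic; the function name `beta_bounds` and the input ranges are hypothetical and are not the Table 2 values.

```python
def beta_bounds(r0_range, infectious_period_range):
    """Return (min beta, max beta) implied by beta = R0 * gamma, gamma = 1 / period."""
    r0_lo, r0_hi = r0_range
    period_lo, period_hi = infectious_period_range
    # The smallest beta pairs the smallest R0 with the longest infectious period;
    # the largest beta pairs the largest R0 with the shortest infectious period.
    return r0_lo / period_hi, r0_hi / period_lo


# Hypothetical inputs for illustration only.
beta_min, beta_max = beta_bounds(r0_range=(12.0, 18.0),
                                 infectious_period_range=(6.0, 9.0))
print(f"beta ranges from {beta_min:.3f} to {beta_max:.3f} per day")
```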

Rather than attempting to identify control parameters (λ and τ) based on literature values, we intentionally selected a broad range of possible parameters to observe the effects of a wide variety of controls. To account for the logistical work that precedes control initiation (identifying the outbreak, laboratory confirmation, mobilizing resources etc.) we selected a minimum control start of 3 days. We then selected upper bounds based on typical outbreak progression for each disease. Measles and influenza can both result in outbreaks that are several weeks to months long. Thus, we selected one month (30 days) as the upper bound of τ. Conversely, norovirus outbreaks are typically much shorter due to their short infectious period. We thus limited the latest possible control start to 7 days. All λ values were varied between 0.005 and 0.3 (0.5% to 30%). Table 2 describes the ranges used for each parameter and disease.

Table 2 Parameter Selection.

Development of a web-based tool

To make this model available for decision making, we developed a web-based application that allows a user to enter parameter ranges for their disease, initial population variables and control information. It is a Django application26 that uses HighCharts27 for visualization. All code for the SIR model was written in Python 3.528.

To make the application more user friendly, two small modifications were made to γ and β such that they could be expressed as the infectious period (see equation (11)) and R0 (see equation (12)). These terms are more familiar to public health individuals than γ and β, which are commonly used by modelers.

Results

Sensitivity analysis

Figure 2 shows the results of the sensitivity analysis. Plots show each parameter with respect to the cumulative number infected (in either controlled or uncontrolled outbreaks), and are colored by the range of the associated κ score. γ shows a strong negative correlation with the cumulative number infected (i.e., larger γ values (shorter infectious periods) result in smaller outbreaks) and R0 shows a positive association with the number of persons affected (i.e., more quickly moving outbreaks infect more people). Both correspond to our intuition about outbreak progression—diseases with short infectious periods infect fewer individuals because the disease is infectious for less time and can thus spread to fewer persons. Conversely, large R0 values correspond to situations where the host can infect many other people, thus resulting in much larger outbreaks.

Figure 2: Each scatterplot shows a parameter (β, γ, λ, R0 or τ) with respect to outbreak size, represented here by the cumulative number of infected individuals in a controlled or uncontrolled outbreak.

Each point in the scatterplot corresponds to one trial, as described in the text. Points are colored by the range in which the corresponding κ score falls. Here, strong relationships between parameters and the outcome (e.g., γ and R0) indicate stronger influence on outbreak size compared with parameters that have weak or no relationships (e.g., β, τ).

Results further indicate that β, λ and τ affect overall outbreak size substantially less. There is a weak association between β and outbreak size in controlled outbreaks, as well as a possible association between β and κ, but essentially no association between λ and outbreak size or λ and κ.

Disease application

Figure 3 describes a number of outcomes for measles, norovirus and influenza outbreaks based on literature parameter values (see Table 2) and the resulting κ. Patterns are recognizable both within and across diseases. Within norovirus, for example, it is evident that there are several combinations of parameters that produce no outbreak (here defined as fewer than 2 cases total; see gray dots). In particular, as γ approaches larger values (>0.3) individuals progress from infected to recovered too quickly to pass the illness to others. This is consistent with many point source norovirus outbreaks where the number of secondary cases is generally quite small.

Figure 3: Three sets of scatterplots describing simulated outbreaks of influenza, norovirus and measles are shown.

Individual scatterplots provide a cross section of possible outbreaks where rows hold β ranges constant while columns hold γ ranges constant. Plots show λ values on the y-axis, control start (τ) values on the x-axis and are colored by κ ranges. Each point denotes a counterfactual (i.e., a controlled outbreak within the given β and γ ranges compared to an identical uncontrolled outbreak). The color indicates the κ score associated with the counterfactual trial. Gray points are combinations that yield no outbreak (defined here as fewer than 2 cases overall), and progressively darker shades of blue indicate less difference between the controlled and uncontrolled outbreak.

Conversely, the vast majority of measles outbreaks simulated are essentially unaffected by any control measure tested (see dark blue dots that indicate controlled and uncontrolled outbreaks are ≥95% similar). Within a given cross-section of outbreak parameters, the τ value (control measure start) affects the resulting κ more than λ indicating that, under this model, implementing a control measure early is more important than implementing the most effective control measure. This is a potentially important finding for decision support and is an intriguing path for further investigation. It is also consistent with our sensitivity analysis findings (see Fig. 2).

For each disease, we identify the latest possible control start and the least effective intervention that could still result in κ values of 0.1 and 0.01 (see Table 3). Interestingly, if control measures have λs that are large enough (minimum 5%), or control starts that are early enough (6 to 30 days), they can consistently produce dramatic reductions in outbreak load. By examining these values in various parameter ranges, we can begin to see the effects of parameter ranges on κ results. In Table 3, we consider (1) the entire range, (2) the lower 50th percentile of both γ and β values or (3) the upper 50th percentile of both γ and β values. There is a strong distinction between outbreaks with large values (upper 50th percentile) compared to outbreaks with small values. For example, in measles outbreaks, although it is possible to reduce the outbreak to 10% of the uncontrolled outbreak by beginning a control measure 29 days after outbreak onset, dividing the outbreaks into upper and lower 50th percentiles indicates this is actually only possible if both the β and γ parameters fall into the lower 50th percentile and the control is at least 19% effective. Similar, but less dramatic, trends are evident in norovirus and influenza.
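The search for the latest control start and the least effective intervention can be sketched as a grid search over λ and τ. This is an illustration under assumptions, not the procedure used to produce Table 3: the function names, the daily-step simulator, the population size and the disease parameters (β = 0.5, γ = 0.25) are hypothetical. The λ and τ ranges follow those stated in the text for measles and influenza.

```python
def kappa(beta, gamma, lam, tau, n=10_000, i0=1, days=365):
    """Ratio of cumulative infected with control to cumulative infected without."""
    def total_infected(lam_value):
        s, i, total = float(n - i0), float(i0), float(i0)
        for day in range(1, days + 1):
            if day >= tau:
                s -= lam_value * s  # control removes a share of the susceptibles
            new_infections = min(s, beta * s * i / n)
            s -= new_infections
            i += new_infections - gamma * i
            total += new_infections
        return total
    return total_infected(lam) / total_infected(0.0)


def control_limits(beta, gamma, lams, taus, target):
    """Latest start and weakest control that still reach kappa <= target."""
    hits = [(lam, tau) for lam in lams for tau in taus
            if kappa(beta, gamma, lam, tau) <= target]
    if not hits:
        return None  # no tested combination reaches the target
    return {"latest_tau": max(tau for _, tau in hits),
            "least_lam": min(lam for lam, _ in hits)}


lams = [0.005 * k for k in range(1, 61)]  # 0.5% to 30% in 0.5% steps
taus = range(3, 31)                       # control start on days 3 to 30
limits = control_limits(beta=0.5, gamma=0.25, lams=lams, taus=taus, target=0.1)
print(limits)
```

The two reported values are marginal limits: the latest start is taken over all tested λ values and the weakest control over all tested τ values, so they need not occur in the same scenario. Repeating the search over subsets of β and γ would give the percentile breakdown described above.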

Table 3 Control measure summary results.

These examples illustrate the possible use of models like this for decision support. By aggregating several models, it is possible to identify general trends that are relevant for intervention decisions.

User interface

To facilitate widespread use of the model, a user interface was developed. Figure 4 shows an example of user data and application output. Output includes the smallest and largest SIR curves possible based on user input, as well as the effect of user control measures on those curves. An additional three graphs describe how outputs (κ) change with changing R0, β and γ values, and describe the minimum control effectiveness required to reduce the outbreak tenfold. Visualizing the data in multiple ways allows the user to see different aspects of the same outbreak, and supports better-informed decisions. In the example presented, the second graph (titled ‘Intervention analysis’) indicates that changing the control start date by a few days in either direction has minimal impact on the resulting κ score, regardless of the R0 value. However, changing control effectiveness from 0.01 to 0.1 dramatically increases κ.

Figure 4: A web-based application using the model presented is shown.

User input fields are on the left, and application output is on the right. Inputs are simple, and allow the user to describe ranges of parameters rather than point estimates. Application outputs include traditional epidemiology curves (top left), simple line charts describing the effect of changing control starts or effectiveness (top right) and two heat maps to describe the impact of various parameters on the resulting outbreak (bottom row).

Discussion

We conducted this study to evaluate the feasibility of a simplified approach to decision support for control measure intervention. The larger goal is the development of methodologies that improve collaboration between public health and modeling communities which in turn can facilitate optimum disease response during outbreaks.

Our results suggest that it is reasonable to simultaneously explore the impact of a variety of control measures on outbreak progression in a number of scenarios using simple SIR models. We do so while using a range of outbreak parameters, to understand the effects of both outbreak parameters and control efforts on outbreak progression. We show that, in this model, γ affects the outbreak outcomes most substantially. We further provide a way to measure the relative success of outbreak control using the κ value. We lastly present one possible method to promote adoption of models in the public health community by presenting a simple, web-based interface for the model.

Compartmental models are in many ways preferable to agent-based models because of their simplicity and small computational requirements. However, the use of SIR models necessitates adoption of several assumptions that rarely hold in real world outbreaks. Of particular concern is the assumption of homogeneous mixing. Even so, there are numerous ways to improve upon the simple model described here. Other compartmental models (e.g., SEIR, SIS, SI etc.) and methods exist to reduce or modify these assumptions and expand the breadth of applicable diseases. For example, Hethcote et al. describe a method to allow non-homogeneous mixing within compartmental models for sexually transmitted infections29.

Another possibility is the addition of an underlying network to improve model behavior. For example, Meyers et al.30 found that coupling a compartmental model with an underlying social network allowed them to explain aspects of real SARS epidemics (used for illustration purposes in the introduction) better than the compartmental models alone. Related possibilities include the use of spatial networks in addition to or instead of social networks31. It is possible that various networks are suited to particular diseases or disease scenarios. These subtleties offer opportunities for extensive further research. Importantly, many variations of these models continue to maintain comparatively low computational requirements, while allowing for a better representation of reality.

Another, related focus should be continued research on the impact of parameter selection on model outputs. Here, we describe an approach where parameters are assumed to be known (or estimable), and the range of possible outbreaks is treated as an outcome. In contrast, Wearing et al. estimate parameters by finding the best simulated outbreak fit to real data and identifying the parameters that give rise to that simulation32. Their results caution that model selection (e.g., the type of compartmental model used) can dramatically affect the resulting reproductive ratio estimated. Our results indicate that, in addition, variations in reproductive ratio produce exceedingly different outbreaks. Meyers et al.30 also note the large impact parameter selection and network structure can have on resulting simulated outbreaks.

One obvious possible improvement is in the continued production and extension/refinement of tools to utilize compartmental models and afford control measure simulation quickly and easily. The tool presented here, for example, might be enhanced by adding new compartmental models, refining control definitions, improving visualization, and investigating addition of network structures. Deployment of these systems as open-source code, or freely available web applications should be encouraged.

Overall, there is a clear need in the field to better understand outbreak parameters, model selection, underlying model assumptions, and the ways that these apply to real world scenarios. While SIR models have been used extensively for many years, there has been little work done on validating their output. We thus propose thoughtful validation of SIR models as an important next step. One method to accomplish this is to compare the outputs of validated agent-based models to outbreaks produced using compartmental models. Previously validated agent-based models simulating disease outbreak progression on a fine tuned scale already exist (e.g., EpiSimS33,34) and would provide good candidates for this research.

Such a validation would accomplish several things. It would (1) validate the counterfactual approach, (2) provide additional data to describe when compartmental models are appropriate approximations of real world outbreaks and (3) provide data to describe situations where the compartmental models do not match real world outbreaks and should not be used for decision support.

Additional Information

How to cite this article: Daughton, A. R. et al. An approach to and web-based tool for infectious disease outbreak intervention analysis. Sci. Rep. 7, 46076; doi: 10.1038/srep46076 (2017).
