An approach to and web-based tool for infectious disease outbreak intervention analysis

Infectious diseases are a leading cause of death globally. Decisions surrounding how to control an infectious disease outbreak currently rely on a subjective process involving surveillance and expert opinion. However, there are many situations where neither may be available. Modeling can fill gaps in the decision making process by using available data to provide quantitative estimates of outbreak trajectories. Effective reduction of the spread of infectious diseases can be achieved through collaboration between the modeling community and public health policy community. However, such collaboration is rare, resulting in a lack of models that meet the needs of the public health community. Here we show a Susceptible-Infectious-Recovered (SIR) model modified to include control measures that allows parameter ranges, rather than parameter point estimates, and includes a web user interface for broad adoption. We apply the model to three diseases, measles, norovirus and influenza, to show the feasibility of its use and describe a research agenda to further promote interactions between decision makers and the modeling community.

a new location or disease because of the amount of work associated with finding location specific data, tweaking parameters, and, often, reproducing code that is not freely available.
A related, but perhaps more pressing issue is the lack of collaboration between the researchers developing models and those making policy decisions during an outbreak. Wagner et al. describe the disconnect between these two fields 12 . The ultimate result is a lack of clarity in the modeling community about the requirements for real-world application of models and production of models that do not meet decision makers' needs 12,13 .
There have been previous efforts to produce widely available models, in particular web-based simulations of agent-based disease models. Several of these efforts have been part of MIDAS (the Models of Infectious Disease Agent Study), including FRED 14 and GLEAMviz 15 , which are both freely accessible and maintained. There is one package for the statistical language R 16 that implements compartmental models. However, available web-based compartmental models are limited and are directed towards educating public health trainees rather than providing an operational modeling platform.
With this context in mind, our aim is to use existing models with low computational requirements to (1) explore control measures and (2) develop an accessible platform for public health collaborators to use and provide feedback on models. For this initial work, we use a SIR model, modified to include a control measure, to explore many possible disease progression paths. The SIR model was chosen because it is the simplest and requires minimal computational resources. This study presents the application of existing SIR models to investigate control options using a "counterfactual", a web application using the model, and a description of a path forward for validating SIR models.
A counterfactual is a theoretical construct describing a perfect experiment isolating one variable. For example, say patient P_i is in a clinical trial that is assessing the effects of drug D compared to a placebo. Patient P_i is randomly assigned to drug D. A perfect experiment would be for an identical patient, P_j, to exist and be assigned to the placebo, so that the researchers could compare the differing effects of the drug versus the placebo in identical people. The concept of a counterfactual has existed in theory for decades 17 , and is commonly used in causal inference in medicine and epidemiology 18 . The counterfactual concept is not generally explicitly applied to epidemiological modeling, but some (e.g., Smith et al. 19 ) mention the concept of exploring what "could" happen under different control scenarios. A policy maker could, for example, mentally compare an outbreak of strep throat where hand washing is the predominant intervention to an identical one where school closures are the main intervention, and use the difference in outbreak outcomes to help decide which intervention to use. This method aims to make those mental comparisons more explicit, thus affording more transparent decisions.
This work contributes to the field by providing a mechanism to weigh control measures (our κ value), an extensive description of ways to use the counterfactual in decision support, and a mechanism to enable broad adoption. By investigating plausible disease parameter ranges, rather than point estimates, we can analyze a number of possible outbreaks and begin to quantitatively understand how disease parameters affect outbreak outcomes of interest. Further, we contribute to the relative lack of compartmental model interfaces and provide a mechanism for iteration and feedback with public health end users.
To explore SIR models as decision support tools, we apply the model to three diseases: measles, norovirus and influenza. Within disease parameter ranges, we describe the worst case scenario where the model indicates a reduction in cumulative infected can still be achieved. These observations are hypothesis generating, rather than validated endpoints, because of the general lack of validation in previous work on SIR models. As such, we describe a path for future research in the discussion.

Methods
SIR model. As described above, the SIR model is a commonly used compartmental model for infectious disease outbreaks. Because of the large literature base describing both the history and use of SIR models (e.g., see Keeling and Rohani 20 ), we present a somewhat abbreviated description here.
People are assigned to three compartments based on their disease status at time t (see Fig. 1). The number of people in each compartment varies with time as the outbreak progresses, but the overall population in the model stays constant. Susceptibles (S) are those who are at risk of infection. Infected (I) are individuals experiencing the illness, and recovered persons (R) have completed infection and are now immune to the disease, or died as a result of the infection. Movement between compartments is described by a system of equations (equations (1) through (3)).
Figure 1. In a SIR model, individuals move between three compartments: S (susceptible), I (infectious) and R (recovered). Movement between the categories is dependent on β and γ, which describe the "force" of the infection and the infectious period, respectively. The number of infectious persons at any given time results in the epidemic curve familiar to many epidemiologists.
Scientific Reports | 7:46076 | DOI: 10.1038/srep46076
Transitions between compartments are described by two parameters, γ and β. γ is the reciprocal of ψ, the infectious period (see equation (5)). The infectious period is the interval of time during which an infected individual can transmit the disease. Time in our model is represented in days. Measles, for example, has an infectious period of approximately 8 days; individuals can transmit the disease from approximately 4 days before rash onset until 4 days after 21 . γ controls the transition from infectious to recovered (see equations (2) and (3)). β is commonly referred to as the "force of infection" because it describes how quickly a disease can move through a population. It is the product of γ and the reproductive number (R0), and controls the transition from susceptible to infectious (see equations (1) and (2)). The reproductive number, R0, is the number of secondary infections per primary infection (see equation (4)). Of note, R0 can be estimated in a number of ways. Obadia et al. 22 give an overview of several common methods, including exponential growth, maximum likelihood, sequential Bayesian and a time-dependent method. Each method makes slightly different assumptions, and can result in different reproductive values. Values should be calculated and interpreted with these caveats in mind 23 .
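The relationships among ψ, γ, β and R0 described above can be sketched in code. The following is a minimal illustration and not the authors' implementation: it assumes the standard textbook form of the SIR equations, with new infections proportional to β·S·I/N, and integrates them with small forward-Euler steps. The function name `simulate_sir`, the population size and the R0 value of 15 are hypothetical choices made for illustration; only the 8-day infectious period comes from the measles example in the text.

```python
def simulate_sir(beta, gamma, population, initial_infected, days, steps_per_day=10):
    """Integrate a basic SIR model and return one (S, I, R) tuple per day.

    Assumed (textbook) form of the equations:
        dS/dt = -beta * S * I / N
        dI/dt =  beta * S * I / N - gamma * I
        dR/dt =  gamma * I
    """
    dt = 1.0 / steps_per_day
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]  # day 0
    for _ in range(days):
        for _ in range(steps_per_day):
            # Cap each flow so no compartment can become negative.
            new_infections = min(s, beta * s * i / population * dt)
            new_recoveries = min(i, gamma * i * dt)
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# gamma is the reciprocal of the infectious period; beta is gamma times R0.
infectious_period = 8.0   # days, from the measles example in the text
r0 = 15.0                 # hypothetical value, for illustration only
gamma = 1.0 / infectious_period
beta = gamma * r0

history = simulate_sir(beta, gamma, population=10000, initial_infected=1, days=120)
final_s, final_i, final_r = history[-1]
print(final_s + final_i + final_r)  # the total population stays constant
```

The number of steps per day is a numerical convenience: with large β values a single one-day Euler step could move more people out of S than it contains, so smaller steps (plus the caps in the loop) keep the compartments non-negative.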

SIR augmentations.
To expand the above model and introduce a scenario where a control measure is applied to an outbreak, we introduce two additional parameters:
1. λ, or control measure effectiveness, describes what fraction of individuals are removed from the susceptible population at each time point. For example, λ = 0 is a control that is completely ineffective and removes no individuals from the susceptible population during each time interval. Conversely, λ = 1 is a control measure that is 100% effective, or removes all susceptible individuals in one time interval. A more realistic value might be λ = 0.01, which would describe an intervention that removes 1% of the susceptible population during each time interval. A more detailed description of the appropriate interpretation of this parameter is included below.
2. τ, or control start, is the time unit (interpreted as days throughout this analysis) on which the control measure begins. For the purposes of the simulations presented here, it is assumed that the control is implemented on day τ and continued for the remainder of the outbreak.
To apply a control measure, equations (1) through (3) are modified such that if time (t) is ≥ τ, a control measure with λ effectiveness is applied at each time step (see equations (6), (7), (8)). To denote the "controlled" environment the subscript T is added.
Within this implementation, λ is the fraction of the susceptible population removed at each time point. Operationally, this describes a control measure that eliminates the possibility of infection. For example, λ = 0.1 and τ = 5 indicates a scenario where 10% of the susceptible population is removed every day on and after the 5th day. This would describe an intervention like vaccination or quarantine. However, it does not describe the type of control measure that reduces infectivity, changes human behavior in a way that affects population density (e.g., staying home from work/school), or does not confer immunity to the infection (e.g., hand washing). Further, the control measure is continuously applied at each time point after initiation (e.g., 10% of the susceptible population is vaccinated each day for the remainder of the outbreak). This assumes that the control is implemented with the same effectiveness throughout each time interval. These assumptions simplify the addition of control measures to the SIR model, but it is relatively straightforward to modify this implementation in the future to a more realistic scenario. These equations are also limited to describing one control measure. It would be comparatively straightforward to substitute λ with a vector of parameters as opposed to a single term, in order to describe multiple control types and effectivenesses.
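The interpretation of λ and τ given above can be illustrated with a short sketch. This is not the authors' code, and it rests on several labeled assumptions: the underlying SIR equations take the standard textbook form; susceptibles removed by the control are tracked in a separate tally rather than added to R (the text does not say which compartment receives them); and the per-step removal fraction is chosen so that, absent new infections, exactly a fraction λ of susceptibles is removed over one day. The function name `simulate_controlled_sir` and the disease parameters (β = 0.5, γ = 0.25) are hypothetical; λ = 0.1 and τ = 5 are the example values from the text.

```python
def simulate_controlled_sir(beta, gamma, lam, tau, population,
                            initial_infected, days, steps_per_day=10):
    """SIR model with a control that removes susceptibles from day `tau` onward.

    Returns one (S, I, R, removed_by_control) tuple per day.
    """
    dt = 1.0 / steps_per_day
    # Assumption: per-step fraction that compounds to `lam` over one full day.
    step_fraction = 1.0 - (1.0 - lam) ** dt
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    removed = 0.0
    history = [(s, i, r, removed)]  # day 0
    for day in range(days):
        control_active = day >= tau  # control applies when t >= tau
        for _ in range(steps_per_day):
            new_infections = min(s, beta * s * i / population * dt)
            new_recoveries = min(i, gamma * i * dt)
            s -= new_infections
            newly_removed = step_fraction * s if control_active else 0.0
            s -= newly_removed
            i += new_infections - new_recoveries
            r += new_recoveries
            removed += newly_removed
        history.append((s, i, r, removed))
    return history


# Hypothetical disease parameters; lam and tau follow the example in the text.
common = dict(beta=0.5, gamma=0.25, population=10000, initial_infected=1, days=200)
uncontrolled = simulate_controlled_sir(lam=0.0, tau=0, **common)
controlled = simulate_controlled_sir(lam=0.1, tau=5, **common)

# Compare the number who were ever infected and have recovered by the end.
print(controlled[-1][2] < uncontrolled[-1][2])
```

Setting λ = 0 reproduces the uncontrolled outbreak, so the same function can generate both members of a counterfactual pair from identical disease parameters.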
Assumptions. There are a number of assumptions inherent in SIR models 24 . As a result, SIR models are scoped to diseases that meet the following criteria:
1. The disease is transmitted person-to-person. This means the disease is not transmitted via a vector or an environmental component like water or food.
2. Disease transmission can be described via homogeneous mixing. This means that if a group of susceptible people interact with an ill person, all susceptible persons are equally likely to acquire infection. Of importance, a majority of infections do not meet this assumption. For example, sexually transmitted infections are not transmitted with equal probability among the entire susceptible population. Even in the case of airborne infections like measles, this assumption ignores individuals' specific immune responses (e.g., immunocompromised individuals are treated the same as healthy individuals).
3. The disease confers immunity. This means that once an individual has recovered they cannot contract the illness again during the same outbreak. Diseases with very short-term (or no) immunity are commonly modeled with SI models.
4. The disease's incubation period is relatively short. Diseases with long incubation periods should include the "exposed" category and can be modeled with a SEIR model 25 .
5. The disease is an acute illness (i.e., infected individuals recover or die). This excludes chronic diseases like hepatitis.
κ: Comparing controlled and uncontrolled outbreaks. In order to compare a controlled outbreak to its counterfactual outbreak, we introduce the outcome measurement, κ. It describes the ratio of the cumulative number infected in a controlled outbreak to the cumulative number infected in an uncontrolled outbreak. The cumulative number infected throughout the outbreak at time t is equal to the number recovered at time t plus the number infectious at that time (see equation (9)). At the end of an outbreak (t = end) there are no individuals left in the infected category and equation (9) reduces to equation (10).
κ = 1 means that the controlled and uncontrolled outbreaks are identical, or that the control measure had no effect. κ > 1 means that the controlled outbreak had more infected than the uncontrolled outbreak (or that the control measure had a detrimental effect). κ < 1 indicates that the control measure reduced the number of infected persons. For example, κ = 0.01 is interpreted to mean that the controlled outbreak was 1% as large as the uncontrolled outbreak. κ = 0 would describe a scenario where the control measure stopped the outbreak from occurring at all.
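As a worked illustration of the κ definition, the sketch below computes the ratio from cumulative counts. The function names and the outbreak sizes are hypothetical and chosen only so that the result matches the κ = 0.01 example in the text; the cumulative count follows the stated rule of recovered plus currently infectious.

```python
def cumulative_infected(infectious, recovered):
    """Cumulative number infected at a given time: infectious plus recovered."""
    return infectious + recovered


def kappa(cumulative_controlled, cumulative_uncontrolled):
    """Ratio of cumulative infected in the controlled outbreak to the uncontrolled one."""
    if cumulative_uncontrolled <= 0:
        raise ValueError("the uncontrolled outbreak must include at least one infection")
    return cumulative_controlled / cumulative_uncontrolled


# Hypothetical end-of-outbreak counts; no one is still infectious at the end,
# so the cumulative count equals the number recovered.
controlled_total = cumulative_infected(infectious=0, recovered=80)
uncontrolled_total = cumulative_infected(infectious=0, recovered=8000)

print(kappa(controlled_total, uncontrolled_total))  # 0.01
```

Here the controlled outbreak is 1% as large as its counterfactual. Identical totals would give κ = 1, and a controlled total larger than the uncontrolled one would give κ > 1.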

Sensitivity analysis.
To assess the effect of each parameter on the model, we performed a sensitivity analysis. We varied γ, β, λ and τ within specific ranges (see Table 1) at random for 10,000 trials. During each trial, we randomly picked the value of each input parameter from the specified range, ran the model using those values, and recorded the outcomes. Here, outcomes of interest are the number infected in a controlled scenario, the number infected in an uncontrolled scenario and the related κ. We then analyzed each parameter's relative impact on these outcomes. Because γ and β vary together and can be described simultaneously using R0 (see equation (4)), we also analyzed how the change in R0 affects the outcomes. Because λ and τ only exist as parameters in controlled outbreaks, the controlled scenario is the only outcome considered for these two parameters.
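The random-sampling procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis: parameters are drawn uniformly from each range (the text says only "at random"), the ranges are hypothetical and not those of Table 1, and the model is a stand-in that returns the final outbreak fraction of an uncontrolled SIR outbreak from the standard final-size relation z = 1 − exp(−R0·z). The control parameters λ and τ are left out of this stand-in, and 1,000 trials are used instead of 10,000 to keep the example quick.

```python
import math
import random


def final_size_fraction(beta, gamma, iterations=200):
    """Stand-in model: fraction ultimately infected in an uncontrolled SIR outbreak.

    Solves z = 1 - exp(-R0 * z) by fixed-point iteration, with R0 = beta / gamma.
    """
    r0 = beta / gamma
    z = 1.0
    for _ in range(iterations):
        z = 1.0 - math.exp(-r0 * z)
    return z


def run_trials(model, ranges, n_trials, seed=0):
    """Draw each parameter uniformly from its range, run the model, record the outcome."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        trials.append((params, model(**params)))
    return trials


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


ranges = {"beta": (0.2, 2.0), "gamma": (0.1, 0.5)}  # hypothetical ranges
trials = run_trials(final_size_fraction, ranges, n_trials=1000)

r0_values = [params["beta"] / params["gamma"] for params, _ in trials]
sizes = [size for _, size in trials]
print(pearson(r0_values, sizes) > 0)  # outbreak size rises with R0
```

Because `run_trials` accepts any callable, the stand-in could be swapped for a full controlled simulation that also takes λ and τ, with the recorded outcome extended to include κ.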
Application to three diseases. We applied this model to measles, norovirus, and influenza. These diseases were selected because they are of public health interest, and because they meet the requirements described in the assumptions section above. Of note, norovirus can be transmitted both through food and person-to-person. Outbreaks described here are solely person-to-person transmitted outbreaks. Outbreaks were simulated using standard parameter ranges for the three diseases. These ranges were based on literature values reported for the parameters, identified via searches using Google Scholar and PubMed. Search terms included "[disease name] + infectious period", "[disease name] + contact rate", "[disease name] + force of infection", and "[disease name] + reproductive number". We were consistently unable to find reported literature values for β and instead used equation (5) to find the maximum and minimum β values for each disease, given their infectious period and R0 values.
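The derivation of β bounds from the other two quantities can be illustrated with a small calculation. It relies on the relations stated in the SIR model description (γ is the reciprocal of the infectious period ψ, and β is the product of γ and R0), so β = R0/ψ. The smallest β then pairs the smallest R0 with the longest infectious period, and the largest β pairs the largest R0 with the shortest period. The function name and the numeric ranges below are hypothetical and are not the values in Table 2.

```python
def beta_bounds(r0_range, infectious_period_range):
    """Return (beta_min, beta_max) implied by beta = R0 / infectious_period."""
    r0_min, r0_max = r0_range
    period_min, period_max = infectious_period_range
    beta_min = r0_min / period_max  # slowest spread: low R0, long infectious period
    beta_max = r0_max / period_min  # fastest spread: high R0, short infectious period
    return beta_min, beta_max


# Hypothetical ranges, for illustration only.
beta_min, beta_max = beta_bounds(r0_range=(12.0, 18.0),
                                 infectious_period_range=(6.0, 8.0))
print(beta_min, beta_max)  # 1.5 3.0
```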
Rather than attempting to identify control parameters (λ and τ) based on literature values, we intentionally selected a broad range of possible parameters to observe the effects of a wide variety of controls. To account for the logistical work that precedes control initiation (identifying the outbreak, laboratory confirmation, mobilizing resources, etc.) we selected a minimum control start of 3 days. We then selected upper bounds based on typical outbreak progression for each disease. Measles and influenza can both result in outbreaks that are several weeks to months long. Thus, we selected one month (30 days) as the upper bound of τ. Conversely, norovirus outbreaks are typically much shorter due to their short infectious period. We thus limited the latest possible control start to 7 days. All λ values were varied between 0.005 and 0.3 (0.5% to 30%). Table 2 describes the ranges used for each parameter and disease.

Development of a web-based tool.
To make this model available for decision making, we developed a web-based application that allows a user to enter parameter ranges for their disease, initial population variables and control information. It is a Django application 26 that uses HighCharts 27 for visualization. All code for the SIR model was written in Python 3.5 28 .
To make the application more user friendly, two small modifications were made to γ and β such that they could be expressed as the infectious period (see equation (11)) and R0 (see equation (12)). These terms are more familiar to public health practitioners than γ and β, which are commonly used by modelers.

Results
Sensitivity analysis. Figure 2 shows the results of the sensitivity analysis. Plots show each parameter with respect to the cumulative number infected (in either controlled or uncontrolled outbreaks), and are colored by the range of the associated κ score. γ shows a strong negative correlation with the cumulative number infected (i.e., larger γ values, which correspond to shorter infectious periods, result in smaller outbreaks) and R0 shows a positive association with the number of persons affected (i.e., more quickly moving outbreaks infect more people). Both correspond to our intuition about outbreak progression: diseases with short infectious periods infect fewer individuals because the disease is infectious for less time and can thus spread to fewer persons. Conversely, large R0 values correspond to situations where the host can infect many other people, thus resulting in much larger outbreaks.
Results further indicate that β, λ and τ affect overall outbreak size substantially less. There is a weak association between β and outbreak size in controlled outbreaks, as well as a possible association between β and κ, but essentially no association between λ and outbreak size or λ and κ.
Disease application. Figure 3 describes a number of outcomes for measles, norovirus and influenza outbreaks based on literature parameter values (see Table 2) and the resulting κ. Patterns are recognizable both within and across diseases. Within norovirus, for example, it is evident that there are several parameter combinations that produce no outbreak (here defined as fewer than 2 cases total; see gray dots). In particular, as γ approaches larger values (> 0.3) individuals progress from infected to recovered too quickly to pass the illness to others. This is consistent with many point source norovirus outbreaks where the number of secondary cases is generally quite small.
Conversely, the vast majority of measles outbreaks simulated are essentially unaffected by any control measure tested (see dark blue dots that indicate controlled and uncontrolled outbreaks are ≥ 95% similar). Within a given cross-section of outbreak parameters, the τ value (control measure start) affects the resulting κ more than λ, indicating that, under this model, implementing a control measure early is more important than implementing the most effective control measure. This is a potentially important finding for decision support and is an intriguing path for further investigation. It is also consistent with our sensitivity analysis findings (see Fig. 2).
For each disease, we identify the latest possible control start and the least effective intervention that could still result in κ values of 0.1 and 0.01 (see Table 3). Interestingly, if control measures have λs that are large enough (minimum 5%), or control starts that are early enough (6-30 days) they can consistently produce dramatic reductions in outbreak load. By examining these values in various parameter ranges, we can begin to see the effects of parameter ranges on κ results. In Table 3, we consider (1) the entire range, (2) the lower 50th percentile of both γ and β values or (3) the upper 50th percentile of both γ and β values. There is a strong distinction between outbreaks with large values (upper 50th percentile) compared to outbreaks with small values. For example, in measles outbreaks, although it is possible to reduce the outbreak to 10% of the uncontrolled outbreak by beginning a control measure 29 days after outbreak onset, dividing the outbreaks into upper and lower 50th percentiles indicates this is actually only possible if both the β and γ parameters fall into the lower 50th percentile and the control is at least 19% effective. Similar, but less dramatic trends are evident in norovirus and influenza.
Figure 2. Each plot shows a parameter (β, γ, λ, R0 or τ) with respect to outbreak size, represented here by the cumulative number of infected individuals in a controlled or uncontrolled outbreak. Each point in the scatterplot corresponds to one trial, as described in the text. Points are colored by the range in which the corresponding κ score falls. Here, strong relationships between parameters and the outcome (e.g., γ and R0) indicate stronger influence on outbreak size compared with parameters that have weak or no relationships (e.g., β, τ).
These examples illustrate the possible use of models like this for decision support. By aggregating several models, it is possible to identify general trends that are relevant for intervention decisions.

User interface.
To facilitate widespread use of the model, a user interface was developed. Figure 4 shows an example of user data and application output. Output includes the smallest and largest SIR curves possible based on user input, as well as the effect of user control measures on those curves. An additional three graphs describe how outputs (κ) change with changing R0, β and γ values, and describe the minimum control effectiveness required to reduce the outbreak tenfold. Visualizing the data in multiple ways allows the user to see different aspects of the same outbreak, and supports better-informed decisions. In the example presented, the second graph (titled 'Intervention analysis') indicates that changing the control start date by a few days in either direction minimally impacts the resulting κ score, regardless of the R0 value. However, changing control effectiveness from 0.01 to 0.1 dramatically increases κ.
Individual scatterplots provide a cross section of possible outbreaks where rows hold β ranges constant while columns hold γ ranges constant. Plots show λ values on the y-axis, control start (τ) values on the x-axis and are colored by κ ranges. Each point denotes a counterfactual (i.e., a controlled outbreak within the given β and γ ranges compared to an identical uncontrolled outbreak). The color indicates the κ score associated with the counterfactual trial. Gray points are combinations that yield no outbreak (defined here as fewer than 2 cases overall), and progressively darker shades of blue indicate less difference between the controlled and uncontrolled outbreak.

Discussion
We conducted this study to evaluate the feasibility of a simplified approach to decision support for control measure intervention. The larger goal is the development of methodologies that improve collaboration between public health and modeling communities which in turn can facilitate optimum disease response during outbreaks.
Another possibility is the addition of an underlying network to improve model behavior. For example, Meyers et al. 30 found that coupling a compartmental model with an underlying social network allowed them to explain aspects of real SARS epidemics (used for illustration purposes in the introduction) better than compartmental models alone. Related possibilities include the addition of spatial networks alongside or instead of social networks 31 . It is possible that various networks are suited to particular diseases or disease scenarios. These subtleties offer opportunities for extensive further research. Importantly, many variations of these models continue to maintain comparatively low computational requirements, while allowing for a better representation of reality.
Another, related focus should be continued research on the impact of parameter selection on model outputs. Here, we describe an approach where parameters are assumed to be known (or estimable), and the range of possible outbreaks is treated as an outcome. In contrast, Wearing et al. estimate parameters by finding the best simulated outbreak fit to real data and identifying the parameters that give rise to that simulation 32 . Their results caution that model selection (e.g., the type of compartmental model used) can dramatically affect the resulting estimate of the reproductive ratio. Our results indicate that, in addition, variations in reproductive ratio produce exceedingly different outbreaks. Meyers et al. 30 also note the large impact parameter selection and network structure can have on resulting simulated outbreaks.
One obvious possible improvement is in the continued production and extension/refinement of tools to utilize compartmental models and afford control measure simulation quickly and easily. The tool presented here, for example, might be enhanced by adding new compartmental models, refining control definitions, improving visualization, and investigating addition of network structures. Deployment of these systems as open-source code, or freely available web applications should be encouraged.
Overall, there is a clear need in the field to better understand outbreak parameters, model selection, underlying model assumptions, and the ways that these apply to real world scenarios. While SIR models have been used extensively for many years, there has been little work done on validating their output. We thus propose thoughtful validation of SIR models as an important next step. One method to accomplish this is to compare the outputs of validated agent-based models to outbreaks produced using compartmental models. Previously validated agent-based models simulating disease outbreak progression on a fine tuned scale already exist (e.g., EpiSimS 33,34 ) and would provide good candidates for this research.
Such a validation would accomplish several things. It would (1) validate the counterfactual approach, (2) provide additional data to describe when compartmental models are appropriate approximations of real world outbreaks and (3) provide data to describe situations where the compartmental models do not match real world outbreaks and should not be used for decision support.