Introduction

The potential harm posed by the introduction and spread of emerging infectious diseases has been recently illustrated by the 2009 H1N11, SARS2, and Zika3 epidemics. There is considerable evidence that such pandemics are likely to become more frequent unless action is taken to mitigate their spread at a global scale4,5,6,7. For this reason, resilience management for outbreaks has attracted a growing body of literature, both from epidemic modelers and the optimization and control community. One of the most critical aspects of resilience management is the need to combine accurate epidemic growth models with detailed outbreak control strategies8, representing a gap in the literature which this study aims to fill.

The availability of large scale data and growing computational capabilities has significantly advanced infectious disease spread models in recent years9,10,11,12. The community has notably explored the impact of network topology, epidemic thresholds and diffusion models on outbreak spread patterns13,14,15,16,17,18. A range of epidemic models have been developed, which increase in complexity from single-population, deterministic models to metapopulation, stochastic models6,19,20,21. Deterministic models provide efficient mathematical representation but lack the realism of stochastic simulation models20,22,23,24. On the other hand, detailed computational and visualization tools such as GLEaM25 and STEM26, among others27,28,29,30, have emerged as powerful solutions to model the spread of infectious diseases with a high level of accuracy and even measure the impact of control strategies31, albeit at a high computational cost. As an alternative to simulation-based models, analytical global epidemic models have also been developed to characterize the stochastic spread of infectious diseases in metapopulation networks5,32,33.

As highlighted by4, the intensive restriction of human activities in metapopulation networks may undermine the system’s functionality and lead to significant societal costs. Hence, in efforts to mitigate large-scale pandemics, it is critical to cautiously deploy control strategies to minimize disruptions and maximize the reactiveness of the system34,35,36,37. At a global scale, passenger air travel is known to play a critical role in the spread of infectious disease16,27,38,39,40,41. Additionally, border control has been shown to play a pivotal role in mitigating epidemics, especially during the emerging stage of outbreaks5,42,43,44,45. Border control is typically deployed at airports in an attempt to prevent the spread of an infectious disease between cities, states and countries through passenger air travel46. However, identifying the optimal set of airports for deploying border control is challenging, especially when only limited control resources are available, e.g., budget constraints.

In this work we present a novel mathematical modeling framework that integrates both outbreak dynamics and outbreak control into a decision-support tool for mitigating infectious disease pandemics at the onset of an outbreak through border control. The border control mechanism considered in this work is passenger screening upon arrival at airports (entry screening), which is used to identify infected or at-risk individuals and provide immediate treatment and isolation to reduce the risk of further transmission47. The uncertainty of exit screening effectiveness in other countries combined with the possible development of symptoms during a flight48 has prompted several governments to deem entry screening as crucial to the protection of their countries49, and further motivates its use in this study. The proposed model seeks to determine the optimal set of airports that should be allocated screening resources (technology and personnel), and the corresponding amount (thus dictating the proportion of arriving passengers that can be screened) such that the outbreak risk is minimized. We propose an ensemble of control strategies that exploit the heterogeneity of the air traffic network structure and outputs from an outbreak simulation model in the allocation of control resources. We evaluate each control strategy using a stochastic metapopulation epidemic model, and compare the strategies based on their effectiveness in reducing the spread of outbreaks. Note that we are not proposing or evaluating air travel restrictions, which have substantial economic costs as well as recognized limitations in their ability to prevent or reduce the scale of pandemics24,33,50,51,52,53,54. The goal of the proposed decision-support framework is to optimize the use of airport screening for border control to minimize the harm, i.e., number of infected persons and cities, posed by the introduction of a new disease into susceptible cities, thus providing local public health authorities more time to plan, prepare and distribute local control strategies, e.g., anti-virals, vaccines, source exit screening etc., which must be rapidly administered if/when infection is introduced.

This study builds upon previous work55,56 that presented a mathematical modeling framework to integrate control and outbreak dynamics. We extend this line of work through the following substantial contributions: (1) we model outbreak dynamics using a stochastic metapopulation framework as opposed to a deterministic model, (2) we propose and evaluate a novel set of control strategies, (3) the control strategies are embedded within a resource constrained decision-support framework, (4) the model is calibrated using historical outbreak data, and (5) a case study is presented for the 2009 H1N1 outbreak. Additionally, a range of sensitivity analysis is conducted to illustrate the robustness of the model with regards to variability across outbreak scenarios, disease parameters, policy considerations, and modelling assumptions. The analysis elucidates key trade-offs in terms of budget availability and outbreak mitigation which can be leveraged to inform on public health policy for global epidemic preparedness and control.

Materials and Methods

Mathematical Model

In this section, we present both the stochastic metapopulation epidemic model used to simulate outbreak dynamics, and its integration within the proposed decision-support framework. Our metapopulation model is based on a global air travel network which connects local, city-level, populations. Formally, the proposed metapopulation network can be represented by a graph G = (V, E) where V is the set of nodes and E is the set of directed edges in the network. Nodes represent cities and edges represent passenger travel routes, possibly including stopovers, among cities. At each node of the network, we locally model outbreak dynamics using a discrete-time Susceptible-Exposed-Infected-Recovered (SEIR) compartmental model57. The time steps are set to be tT = {1, 2, …, tobs} where tobs is the time step where the state of the outbreak is being evaluated. Local and global outbreak dynamics models are coupled by indexing compartmental states by network nodes \(i\in V\) and time steps t. Specifically, we denote Si,t, Ei,t, Ii,t and Ri,t the susceptible, exposed, infectious and recovered compartments at node i at time t. Because our focus is on the early stages of an outbreak (e.g., weeks or months), we assume that nodes have time-independent populations and we denote Ni the population at node iV. We use this metapopulation model to capture day-to-day global travel dynamics, wherein the time steps are assumed to be of the order of magnitude of a day in length, which is consistent with other studies that simulate infectious diseases dynamics at a global scale31,35,36,43.

We use a multi-commodity network flow model with time-dependent edge flows to model passenger movements from their origin node to their destination node. Let Πij be the set of paths from iV to jV. We denote \({f}_{ij,t}^{k}\) the average passenger flow on the route from iV to jV using path k Πij at time step t and we assume symmetric passenger flows for all pairs of origins and destinations, i.e. \({f}_{ij,t}^{k}={f}_{ji,t}^{k}\). We denote \({{\rm{\Gamma }}}_{i}^{-}=\{j\in V:\exists \,k\in {{\rm{\Pi }}}_{ij},\,t\in T,{f}_{ji,t}^{k} > 0\}\) and \({{\rm{\Gamma }}}_{i}^{+}=\{j\in V:\exists \,k\in {{\rm{\Pi }}}_{ij},\,t\in T,{f}_{ij,t}^{k} > 0\}\) the sets of nodes connected to and from node iV, respectively. Each path k Πij is an ordered sequence of nodes starting at i and ending at j, i.e. k = {i, n1, n2, …, j}. The path-based formulation, while more complex, enables more effective control decisions to be identified by the model. Specifically, the model is able to accurately capture the effect of controlling at stopover airports along a route, as well as identify the most cost-effective control decisions which utilize information about the entire path. For example, consider a group of travelers departing a high-risk infected region, e.g., Sierra Leon during an Ebola outbreak, destined for ten different destination cities; and all ten routes include an initial stopover at the same airport, e.g., JFK. By accounting for the complete path information, the model is able it identify the critical role played by JFK in this scenario, which further enables it to be optimally selected for control. Without accounting for the full travel paths of the routes, the critical role of JFK would be unidentifiable by the model.

The governing infection dynamics of the SEIR model57 are used to model local outbreak dynamics in each city. For the purposes of this work the contact rate is assumed to be constant across populations. We denote βi the (local) contact rate at node i, γ the transition or recovery rate and α the exposed parameter. In addition, we define λ [0, 1] the likelihood to travel when infectious, with λ = 1 representing the case where infected and healthy individuals are equally likely to travel. This parameter aims to represent the impact of reduced travel demand when infectious individuals are unable to travel due to severe symptoms. Finally, we assume that compartmental edge flows are proportional to tail node states, i.e. the number of travelers in a state is proportional to the number of individuals in this state at the origin node. Discrete-time stochastic metapopulation outbreak dynamics are summarized in Eq. (1) below.

$${S}_{i,t+1}={S}_{i,t}-\,\frac{{\beta }_{i}{I}_{i,t}{S}_{i,t}}{{N}_{i}}+\sum _{j\in {{\rm{\Gamma }}}_{i}^{-}}\sum _{k\in {{\rm{\Pi }}}_{ji}}{S}_{ji,t}^{k}-\sum _{j\in {{\rm{\Gamma }}}_{i}^{+}}\sum _{k\in {{\rm{\Pi }}}_{ij}}{S}_{ij,t}^{k}$$
(1a)
$${E}_{i,t+1}={E}_{i,t}+\frac{{\beta }_{i}{I}_{i,t}{S}_{i,t}}{{N}_{i}}-{\rm{\alpha }}{E}_{i,t}+\sum _{j\in {{\rm{\Gamma }}}_{i}^{-}}\sum _{k\in {{\rm{\Pi }}}_{ji}}{E}_{ji,t}^{k}-\sum _{j\in {{\rm{\Gamma }}}_{i}^{+}}\sum _{k\in {{\rm{\Pi }}}_{ij}}{E}_{ij,t}^{k}$$
(1b)
$${I}_{i,t+1}={I}_{i,t}-\gamma {I}_{i,t}+{\rm{\alpha }}{E}_{i,t}+\sum _{j\in {{\rm{\Gamma }}}_{i}^{-}}\sum _{k\in {{\rm{\Pi }}}_{ji}}{I}_{ji,t}^{k}-\lambda \sum _{j\in {{\rm{\Gamma }}}_{i}^{+}}\sum _{k\in {{\rm{\Pi }}}_{ij}}{I}_{ij,t}^{k}$$
(1c)
$${R}_{i,t+1}={R}_{i,t}+\gamma {I}_{i,t}+\sum _{j\in {{\rm{\Gamma }}}_{i}^{-}}\sum _{k\in {{\rm{\Pi }}}_{ji}}{R}_{ji,t}^{k}-\sum _{j\in {{\rm{\Gamma }}}_{i}^{+}}\sum _{k\in {{\rm{\Pi }}}_{ij}}{R}_{ij,t}^{k}$$
(1d)

The symbols \({S}_{ij,t}^{k}\), \({E}_{ij,t}^{k}\), \({I}_{ij,t}^{k}\) and \({R}_{ij,t}^{k}\) represent compartmental edge flows on (i, j) with destination k at time step t. For compartments S and R, compartmental edge flows are assumed deterministic and equal to their expected values, i.e.: \({S}_{ij,t}^{k}={f}_{ij,t}^{k}\frac{{S}_{i,t}}{{N}_{i}}\) and \({R}_{ij,t}^{k}={f}_{ij,t}^{k}\frac{{R}_{i,t}}{{N}_{i}}\). However, since the compartmental edge flows of exposed and infectious passengers may be considerably smaller than that of other compartments, we model \({E}_{ij,t}^{k}\) and \({I}_{ij,t}^{k}\) as discrete random variables, as the stochastic allocation of infected individuals to destinations is critical when modeling the early stages of an outbreak. Specifically, we define these compartmental edge flows as follows:

$${E}_{ij,t}^{k}=\lfloor {f}_{ij,t}^{k}\frac{{E}_{i,t}}{{N}_{i}}\rfloor +{\tilde{E}}_{ij,t}^{k}$$
(2a)
$${I}_{ij,t}^{k}=\lfloor {f}_{ij,t}^{k}\frac{{I}_{i,t}}{{N}_{i}}\rfloor +{\tilde{I}}_{ij,t}^{k}$$
(2b)

where \({\tilde{E}}_{ij,t}^{k}\) and \({\tilde{I}}_{ij,t}^{k}\) are discrete random variables representing the number of exposed and infectious passengers, respectively, beyond the integer-part of their respective compartmental edge flows. Further, let \(m=|{{\rm{\Gamma }}}_{i}^{+}|\) be the number of destination nodes from node iV, the vector \({\tilde{{\boldsymbol{E}}}}_{i,t}^{k}=({\tilde{E}}_{i1,t}^{k},\ldots ,{\tilde{E}}_{im,t}^{k})\) (resp. \({\tilde{{\boldsymbol{I}}}}_{i,t}^{k}=({\tilde{I}}_{i1,t}^{k},\ldots ,{\tilde{I}}_{im,t}^{k})\)) follows a multinomial distribution with a number of trials \({n}_{i,t}^{k,E}=[{f}_{ij,t}^{k}\frac{{E}_{i,t}}{{N}_{i}}-\lfloor {f}_{ij,t}^{k}\frac{{E}_{i,t}}{{N}_{i}}\rfloor ]\) (resp. \({n}_{i,t}^{k,I}=[{f}_{ij,t}^{k}\frac{{I}_{i,t}}{{N}_{i}}-\lfloor {f}_{ij,t}^{k}\frac{{I}_{i,t}}{{N}_{i}}\rfloor ]\)) and probability vector \({{\boldsymbol{p}}}_{i,t}^{k}=(\frac{{f}_{i1,t}^{k}}{{\sum }_{j\in {{\rm{\Gamma }}}_{i}^{+}}{f}_{ij,t}^{k}},\ldots ,\frac{{f}_{im,t}^{k}}{{\sum }_{j\in {{\rm{\Gamma }}}_{i}^{+}}{f}_{ij,t}^{k}})\). This implementation enables a more computationally efficient global epidemic simulation tool compared with one fully simulating all epidemic compartments as random variables following multinomial distributions, while still capturing the critical and inherent uncertainty of the destination of the first infected travelers.

This stochastic formulation aims to model integer, compartmental edge flows and prevent the movement of fractional exposed or infectious individuals which may result in unrealistic epidemic behavior at a global scale58. Consequently, the compartmental states Si,t, Ei,t, Ii,t and Ri,t are also random variables representative of the evolution of the outbreak over time and space.

To integrate control decisions within the above stochastic metapopulation network we model passenger screening upon arrival at airports as a control variable. Passenger screening can be done through visual inspections of passengers, health declaration cards and/or infrared thermal image scanners59. In this work the specific type of screening is not the focus; as the framework is applicable to multiple control mechanisms. We propose to use airport screening rates as the main control variables, which are representative of the proportion of arriving passengers successfully screened at a given airport. We denote xi,t [0, 1] the control rate at node i at time step t. Control variables can be incorporated in the proposed metapopulation epidemic model by re-defining Eqs (1c) and (1d) as follows:

$${I}_{i,t+1}={I}_{i,t}-\gamma {I}_{i,t}+{\rm{\alpha }}{E}_{i,t}+\sum _{j\in {{\rm{\Gamma }}}_{i}^{-}}\sum _{k\in {{\rm{\Pi }}}_{ji}}{I}_{ji,t}^{k}(\prod _{p\in k\backslash \{j\}\,}(1-{x}_{p,t}))-\lambda \sum _{j\in {{\rm{\Gamma }}}_{i}^{+}}\sum _{k\in {{\rm{\Pi }}}_{ij}}{I}_{ij,t}^{k}$$
(3c)
$${R}_{i,t+1}={R}_{i,t}+\gamma {I}_{i,t}+\sum _{j\in {{\rm{\Gamma }}}_{i}^{-}}\sum _{k\in {{\rm{\Pi }}}_{ji}}({R}_{ji,t}^{k}+{I}_{ji,t}^{k}(1-\prod _{p\in k\backslash \{j\}\,}(1-{x}_{p,t})))-\sum _{j\in {{\rm{\Gamma }}}_{i}^{+}}\sum _{k\in {{\rm{\Pi }}}_{ij}}{R}_{ij,t}^{k}$$
(3d)

This formulation is able to capture the combined effects of screening passengers at multiple nodes along their travel route. The combination of Eqs (1a), (1b), (3c) and (3d), hereby to as (3), can be viewed as a control-driven stochastic metapopulation epidemic model wherein variables xi,t represent the level of control over time space in the network. An illustration of the combined effect of node controls along a path with stop-overs is provided in the Supplementary Material (Section A). A control rate of less than one can be interpreted as shortcomings of the methods or technology involved with screening. We assume that passengers coming into a controlled airport who are successfully identified as infected individuals are isolated for treatment, and hence are no longer able to spread infection. In terms of the model, infected individuals screened at a controlled node are assumed to transition to the recovered state R. The main objective function is to minimize the expected cumulative number of infected individuals at the observation time31. This can be stated as follows:

$${\rm{\min }}\,{\mathbb{E}}[\sum _{i\in V}{E}_{i,{t}_{obs}}+{I}_{i,{t}_{obs}}+{R}_{i,{t}_{obs}}]$$
(4)

The challenge in policy decision making results because of a constraint on available resources. To address this challenge we introduce a budget and cost for control. The cost of using outbreak control resources is modeled using a generic cost function which consists of a fixed cost and a variable cost. The fixed cost represents the setup costs associated with the deployment of control resources, e.g., installation of new screening technologies and training of personnel, whereas the variable cost depends on the level of control deployed.

Formally, to model the activation of control at a node, we introduce binary variables yi {0, 1} for each node iV which take value 1 if node i is allocated a non-zero amount of control resources and 0 otherwise. Let si be the setup cost associated to node iV, the fixed cost of the cost function is modeled as yisi, i.e. if control resources are deployed at i then a cost of si monetary units is incurred. To ensure that variable yi is correctly adjusted, we introduce linking inequalities \({x}_{i,t}\le {y}_{i}\) for each node iV and for each time period tT imposing that variable yi must take value 1 if control resources are deployed at node i at any time t during the control period (recall that xi,t represents the level of control at node i at time t and belongs to the interval [0,1]).

To model the variable cost of the cost function, we only require a generic real function \(g({{\boldsymbol{x}}}_{i}):{{\mathbb{R}}}^{|V|}\to {\mathbb{R}}\) that represents the monetary costs associated with the local control vector \({{\boldsymbol{x}}}_{i}\in {[0,1]}^{|T|}\)at node iV over the control period. Although no assumption is made on the shape of function g(xi) is it reasonable to assume that it is non-decreasing with regards to vector components xi,t. Further, in our numerical experiments we will assume that both si and g(xi) are functions of the total average incoming edge flow to i at each time step t, i.e. \(\sum _{j\in {{\rm{\Gamma }}}_{i}^{-}}\sum _{k\in {{\rm{\Pi }}}_{ji}}{f}_{ji}^{k}\), where \({f}_{ji}^{k}\) is the average flow on path k over the control period. Let ci(yi, xi) be the cost function at node iV, we assume the following generic form for this cost function:

$${c}_{i}({y}_{i},{{\boldsymbol{x}}}_{i})={y}_{i}{s}_{i}+g({{\boldsymbol{x}}}_{i})$$
(5)

Finally, we assume that a budget B is available for deploying control resources which translates into the budget constraint:

$$\sum _{i\in V}{c}_{i}({y}_{i},{{\boldsymbol{x}}}_{i})\le B$$
(6)

Our objective is to optimize the control vector \({\boldsymbol{x}}\in {[0,1]}^{|V||T|}\) such that the impact of the outbreak at tobs as represented by (4) is minimized subject to control resource constraints and outbreak dynamics as governed by the stochastic metapopulation epidemic model (3) wherein compartmental edge flows Eij,t and Iij,t, as well as compartmental states Si,t, Ei,t, Ii,t and Ri,t are discrete random variables. This optimization formulation is summarized in Eq. (7) below.

$${\rm{\min }}\,{\mathbb{E}}[\sum _{i\in V}{E}_{i,{t}_{obs}}+{I}_{i,{t}_{obs}}+{R}_{i,{t}_{obs}}]$$
(7a)

Subject to:

$$\sum _{i\in V}{y}_{i}{s}_{i}+g({{\boldsymbol{x}}}_{i})\le B$$
(7b)
$$\begin{array}{ll}{x}_{i,t}\le {y}_{i} & \forall i\in V,t\in T\end{array}$$
(7c)
$${S}_{i,t+1}={S}_{i,t}-\frac{{\beta }_{i}{I}_{i,t}{S}_{i,t}}{{N}_{i}}+\sum _{j\in {{\rm{\Gamma }}}_{i}^{-}}\sum _{k\in {{\rm{\Pi }}}_{ji}}{S}_{ji,t}^{k}-\,\sum _{j\in {{\rm{\Gamma }}}_{i}^{+}}\sum _{k\in {{\rm{\Pi }}}_{ij}}{S}_{ij,t}^{k}\,\forall \,i\in V,\,t\in T$$
(7d)
$${E}_{i,t+1}={E}_{i,t}+\frac{{\beta }_{i}{I}_{i,t}{S}_{i,t}}{{N}_{i}}-{\rm{\alpha }}{E}_{i,t}+\sum _{j\in {{\rm{\Gamma }}}_{i}^{-}}\sum _{k\in {{\rm{\Pi }}}_{ji}}{E}_{ji,t}^{k}-\,\sum _{j\in {{\rm{\Gamma }}}_{i}^{+}}\sum _{k\in {{\rm{\Pi }}}_{ij}}{E}_{ij,t}^{k}\,\forall \,i\in V,\,t\in T$$
(7e)
$${I}_{i,t+1}={I}_{i,t}-\gamma {I}_{i,t}+{\rm{\alpha }}{E}_{i,t}+\sum _{j\in {{\rm{\Gamma }}}_{i}^{-}}\sum _{k\in {{\rm{\Pi }}}_{ji}}{I}_{ji,t}^{k}(\prod _{p\in k\backslash \{j\}\,}(1-{x}_{p,t}))-\lambda \sum _{j\in {{\rm{\Gamma }}}_{i}^{+}}\sum _{k\in {{\rm{\Pi }}}_{ij}}{I}_{ij,t}^{k}\,\forall \,i\in V,\,t\in T$$
(7f)
$${R}_{i,t+1}={R}_{i,t}+\gamma {I}_{i,t}+\sum _{j\in {{\rm{\Gamma }}}_{i}^{-}}\sum _{k\in {{\rm{\Pi }}}_{ji}}({R}_{ji,t}^{k}+{I}_{ji,t}^{k}(1-\prod _{p\in k\backslash \{j\}\,}(1-{x}_{p,t})))-\sum _{j\in {{\rm{\Gamma }}}_{i}^{+}}\sum _{k\in {{\rm{\Pi }}}_{ij}}{R}_{ij,t}^{k}\,\forall \,i\in V,t\in T$$
(7g)
$$\begin{array}{ll}{x}_{i,t}\in [0,1] & \forall i\in V,t\in T\end{array}$$
(7h)
$$\begin{array}{ll}{y}_{i}\in \{0,1\} & \forall i\in V\end{array}$$
(7i)

Proposed Control Strategies

The final outbreak dynamics model with control decisions incorporated can be used as a tool to solve the resource allocation problem, and evaluate various control strategies. We propose a set of control strategies to optimize the control vector x subject to resource constraints. In this work, each proposed control strategy relies on a different metric to rank airports, and this ranking is then used to allocate control resources to a select set of airports. The objective of all strategies is to minimize the impact of the outbreak at a pre-selected future date we call the observation time, tobs, at which impact is measured both in terms of total cumulative cases and number of infected cities.

Although the proposed model can accommodate dynamic control strategies, for the purposes of this work we focus on static control strategies, wherein nodes are controlled at the same level throughout the period of observation. Hence, we define and use the following static-equivalent cost function \({g}_{s}({x}_{i}):{\mathbb{R}}\to {\mathbb{R}}\) which we assume to be invertible over its domain. Our approach is based on a greedy resource allocation algorithm which operates as follows: the algorithm works from a set of sorted nodes \(\bar{V}\) and iterates over the nodes in \(\bar{V}\). The sorted node set is assumed to be pre-processed by one of the sorting criteria discussed in Table 1. At each iteration, the algorithm picks the next node i of the sorted set and determines whether this node can be fully controlled (xi = 1) without exceeding the available budget B. Variable U represents the consumed budget. If the node can be fully controlled, the consumed budget is incremented by si + gs(1) which corresponds to the cost of control for xi = 1. Otherwise, the algorithm checks whether node i can be partially controlled: this is possible only if there is enough budget remaining to cover the node setup cost si. If this is possible, the remaining budget is depleted and used to deploy control resources at node i. The control level of node i is determined by taking the inverse of the variable cost function gs(xi) corresponding to the remaining budget available, i.e. B − U − si. After this last pass, the remaining budget is null and the algorithm stops.

Table 1 List of control strategies and their description.

The pseudo-code of this resource allocation procedure is summarized in Algorithm 1.

Algorithm 1
figure a

Greedy outbreak control resource allocation.

We consider multiple ranking strategies to determine \(\bar{V}\). Each control strategy tested is presented in Table 1, as well as the Baseline (B), in which no control is implemented. The first control strategy, Largest Population (LP) simply targets control at airports in cities with the largest population. The next three strategies exploit known properties of the air traffic network, specifically its hub and spoke structure. The Most Travelled (MT) strategy seeks to target control at airports that are highly trafficked based on total incoming and outgoing volumes, while the Most Connected (MC) and Effective Path (EP) strategies seek to target control at airports that are highly connected to the outbreak source based on travel volumes. The last two strategies utilize learned outcomes from a stochastic epidemic simulation model to inform outbreak control. The First Case (1C) strategy aims to control the set of airports most likely to see the first infected passenger (for a given outbreak scenario), while the First Order Uniform (1OU) targets control at the airports where it is likely to have the largest relative marginal impact. The four strategies, MC, EP, 1C and 1OU utilize knowledge of initial outbreak conditions, while MT and LP airport sets are selected independent of the outbreak state.

Data

The metapopulation network is constructed using global passenger air travel data from 2015 provided by the International Air Transport Association (IATA)60. IATA data consists of global ticket sales which account for true origins and final destinations, and represents 90% of all commercial flights. The remaining 10% of trips are modeled using airline market intelligence. The data provided from IATA includes monthly passenger travel volumes for all travel routes connecting airport pairs (including stopovers), representing nearly 83% of global traffic volumes. The final network used in this study contains the top 99% of the travelled routes provided, resulting in a network with approximately 500,000 routes, 2,908 cities, and 3,267 airports. The city populations served by each airport are based on the population densities provided by Oak Ridge National Laboratory’s LandScan61. The population size for each city was based on a 50 km radius centered on each airport as was done previously62,63 and computed using open source Geographic Information Systems software QGIS (https://qgis.org/). In some cases, multiple cities are serviced by more than one airport, for which all the assigned airport flows are mapped to the same population.

Results

We demonstrate the performance of the proposed control strategies to mitigate global outbreak spread for a set of hypothetical outbreak scenarios and present results from a cost-benefit analysis, which characterizes the marginal gains in outbreak reduction with respect to increases in available resources, i.e., budget. Further, all strategies are applied to a case study representative of the 2009 H1N1 Pandemic Influenza to illustrate the hypothetical impact of each in a similar outbreak setting.

In this study, only U.S. cities are considered for control, intended to provide an example of how the model may be used by a country’s public health authority, which is constrained by an available federal budget for control. The two metrics used to compare the performance of each strategy are i) the total cumulative number of infected individuals in the U.S. at observation time, tobs, and ii) the number of infected cities in the U.S. at observation time, tobs, where an infected city has at least one infected individual.

Base Case Analysis

For the base case analysis, all of the proposed control strategies are implemented and compared for three independent hypothetical outbreak scenarios that vary by the outbreak source location. Three source cities are selected (1) Orlando, Florida, (2) Portland, Oregon and (3) Honolulu, Hawaii, and each is initialized with 100 infected individuals at t = 0. These cities were chosen because they represent a range of geographic and travel profiles. In the remainder of this work these three cities are denoted by their assigned airport IATA codes, MCO, PDX and HNL, respectively.

To model the cost of control at airports, we consider linear cost functions. Setup costs si represent screening equipment costs based on the total incoming flow at each airport iV. Let M be the cost of a passenger screening machine and C its capacity, we set \({s}_{i}=\frac{M}{C}\sum _{j\in {{\rm{\Gamma }}}_{i}^{-}}\sum _{k\in {{\rm{\Pi }}}_{ji}}\,{f}_{ji}^{k}\,\), where \({f}_{ji}^{k}\) is the average flow on path k over the control period. For the variable cost, we assume that the cost of screening a passenger is represented by P and set \({g}_{s}({x}_{{\boldsymbol{i}}})={x}_{i}{t}_{obs}P\sum _{j\in {{\rm{\Gamma }}}_{i}^{-}}\sum _{k\in {{\rm{\Pi }}}_{ji}}\,{f}_{ji}^{k}\)to model the impact of deploying control resources over a varying time period and at a varying level of control. For the base case analysis, we set the available budget B to $500 million. The parameter values are M = $500,000, C = 10,000 and P =  $10, and based on the existing literature64. Under this configuration, the available budget is enough to fully control the 13 most travelled airports in the U.S. for tobs = 50.

For the scenarios presented in this section, the simulation is set to begin on June 1 and tobs is set to 50 days. The time period of 50 days is chosen to align with the focus of this study, which is to identify the most strategic allocation of outbreak control resources that provides robust border control in the early stages of an outbreak. Thus, the planning time period considered (50–100 days) is intended to represent the high risk period at the early stages of a potential epidemic, and potentially before local control is fully in effect. We assume after this period of time, alternative (and more effective) local control methods will be put in place, which are not accounted for in this analysis. For the base case analysis, the hypothetical virus has values of α = 0, β = 0.25, γ = 0.143 and λ = 1. The chosen baseline parameters correspond to a disease with a reproductive ratio R0 = 1.75. The U.S. level outbreak dynamics (SIR curves) for each of the three source cities is presented in the Supplementary Material (Section B). For the base case, under a do-nothing scenario, the final proportion of the U.S. population infected is around 70%. At tobs = 50, the chosen planning time period in this study, only 0.01% of the U.S. population is infected, or around 30 to 35 k individuals on average.

Extensive sensitivity analysis was conducted to assess how the different control strategies respond to differing disease characteristics, model assumptions and policy decisions. Specifically, we explored how changes in the contact rate, planning horizon, control start time, control effectiveness, and source screening impact the performance of each strategy. Results for all sensitivity analysis are provided in the Supplementary Material (Section C). All results are based on 1,000 simulation runs of the stochastic metapopulation epidemic model. Each run of the simulation took an average runtime of three seconds on a desktop computer.

Figure 1 provides the cumulative number of cases in the U.S. at tobs for scenarios MCO, PDX and HNL. The boxplots capture the results for all 1,000 simulations conducted, illustrating the robustness of the results and rankings. In each plot the six proposed strategies to allocate screening resources are compared against the baseline (corresponding to no control) for the respective scenario. Similar trends are evident for all scenarios, with the more simplistic strategies of controlling at the airports in the largest cities or most travelled airports performing poorest, and EP and MC performing best.

Figure 1
figure 1

Control strategy performance for the base case scenarios MCO, PDX and HNL in terms of number of infected individuals. The figure reports the cumulative number of infected individuals in the U.S. at the observation time tobs = 50 days for each control strategy. Each boxplot represents the distribution of the criterion measured over 1,000 simulations of the stochastic metapopulation epidemic model under the corresponding control strategy.

For the MCO scenario, the cumulative number of infected individuals is on average 33,200 for the baseline configuration (no control). Using control strategy LP leads to a reduction of 25.2% in the number of cases, compared with a reduction of 31.2% using EP or MC. For the PDX scenario, the amount of infected individuals is on average 36,000 in the baseline case and using EP or MC results in a reduction of 20.6%. For the HNL scenario, we report a reduction of 47.7% from the baseline to the best control strategy as well as substantially less variability across outbreak simulations.

Figure 2 provides the number of infected cities in the U.S. at tobs for the three scenarios. This metric is critical to assess the success of the control strategy with regards to preventing spread into new cities. A similar trend is observed across all scenarios, with strategies EP, MC, 1C and 1OU proving superior to MT and LP. For the MCO scenario, the number of cities affected is on average 137 for the baseline case. Using control strategy MT leads to a reduction of 24% in the number of cities affected, compared with a reduction of 31% for EP/MC. For the PDX scenario, the amount of infected is on average 107 in the baseline case and 40 for the EP/MC cases resulting in a reduction of 63%. For the HNL scenario, we observe a more substantial reduction of 90% in the number of cities affected from the baseline to the best control strategy.

Figure 2
figure 2

Control strategy performance for the base case scenarios MCO, PDX and HNL in terms of number of infected cities. The figure reports the number of infected cities in the U.S. at the observation time tobs = 50 days for each control strategy, where a city is categorized as infected if it contains at least one infected individual. Each boxplot represents the distribution of the criterion measured over 1,000 simulations of the stochastic metapopulation epidemic model under the corresponding control strategy.

Figure 3 illustrates the decisions and impact of the proposed border control strategies against the baseline no-control case for a particular hypothetical outbreak scenario. Specifically, the maps display the set of airports selected for control (blue crosses), and the expected size of the outbreak in each affected city at the observation date (red circles). The red circles are sized proportional to the cumulative number of cases at tobs at each location. The small grey circles illustrate the full set of airports that can be selected for control, further highlighting the scale and the complexity of the problem. The results shown in Fig. 3 are for the PDX scenario, and the panels correspond to the following strategies: A) No-control, B) MC/EP, C) 1C, D) 1OU, E) MT and F) LP, and. Note: EP and MC have the same airport control set in the PDX scenario. The list of airports selected for control in each strategy (and shown in the figure) is provided in the Supplementary Material (Section E). It is worth noting that both LP and MT airport control sets are fixed for a given budget, and independent of the outbreak scenario, unlike the proposed network- and simulation-driven control strategies.

Figure 3
figure 3

Visualization of outbreak control strategies and impact in the U.S. The results are illustrated for the PDX scenario, with an asterisk marking the source of infection. The maps therein display the set of airports selected for control (blue crosses) for each strategy, and the expected size of outbreak in each city at the observation date tobs = 50 days (red circles) measured over 1,000 simulations of the stochastic metapopulation epidemic model. The panels correspond to the following strategies: A) No-control, B) MC/EP, C) 1C, D) 1OU, E) MT and F) LP. The red circles are sized proportional to the outbreak size. The grey circles represent the complete set of airports that can be feasibly selected for control. The maps were generated using the standard base map from OpenStreetMap. ©OpenStreetMap contributors, CC-BY-SA https://www.openstreetmap.org/copyright.

The results in Fig. 3 indicate both the large spatial variability in the airport sets selected for control across strategies, and their respective impact on outbreak location and outbreak size. Specifically, only five (of the 300 possible) airports are selected across all control strategies, i.e., LAX, ORD, DFW, JFK and SFO. As expected, when no control is deployed (Panel A), there are substantially more cities affected than there are under any of the control strategies considered (Panels B–F). While both MC/EP and MT (Panels B and E, respectively) control relatively few and more spatially sparse airports, EP (Panel B) provides substantially greater reduction in both the outbreak size and number of affected cities (as depicted in Figs. 1 and 2) due to the strategic nature of the path-based metric. In contrast, MT (Panel E) is the second worst performing strategy, and spends large portions of its budget on controlling fewer, more heavily travelled airports, which may not play a critical role in the early stages of the outbreak. The poorest performing strategy is LP (Panel F), which, perhaps surprisingly, controls more airports than all other strategies. However, many of these airports are far-removed from the outbreak source in terms of traffic volumes, and are therefore less likely to contribute to spread early in the outbreak. Strategy 1C (Panel C) suffers from similar faults, indicating that budget is spent on airports that are less expensive to control, but also provide less overall impact. These results further highlight the necessity for optimized resource allocation in the context of outbreak control.

Cost-Benefit Analysis

To explore the impact of resource availability on epidemic spread, we conduct a cost-benefit analysis by varying the budget available for control. Specifically, we explore a range of budgets from $0.25 bil to $1.25 bil, in $0.25 bil increments. Note that given the cost functions used in this study, a budget of $1.35 bil is enough to fully control all airports within the U.S. for a 50-day period. The results are summarized in Fig. 4, which illustrates the impact of budget on the effectiveness of each strategy for all three source scenarios. At $0 bil and $1.25 bil the strategies perform nearly the same, which represent the cases of no control and close to full control at all airports, respectively. For budgets in between these values, the impact of all strategies decreases with budget as expected, with control strategy EP consistently outperforming other strategies. The weaker strategies, MT and LP, decrease in a more linear fashion compared to the other four, MC, EP, 1C and 1OU, which show evidence of decreasing marginal returns, e.g., there is negligible improvements after the budget reaches $0.75 bil. These results have valuable implications for policy makers, and indicate the potentially cost-effective nature of resource allocation if assigned strategically.

Figure 4
figure 4

Cost-benefit analysis for the base case scenarios MCO, PDX and HNL in terms of number of infected individuals. The figure depicts the average cumulative number of infected individuals in the U.S. at the observation time tobs = 50 days based on a varying border control budget B expressed in billions of dollars. The data points converge at B = 0 which corresponds to the baseline (no control) case.

Case Study: 2009 H1N1 Influenza Pandemic

To illustrate the ability of the simulation model to replicate a realistic pandemic, we consider the 2009 H1N1 influenza pandemic as a case study, and quantify the performance of the proposed strategies under similar outbreak conditions. The simulation model was calibrated to the 2009 H1N1 outbreak data at both the U.S. and global scales. The calibration methodology and results are included in the Supplementary Material (Section D). The final calibrated disease parameters are α = 1, β = 0.475, γ = 0.25 and λ = 1. For all simulations the starting date of the outbreak is set as the 5th of February 2009 with 1 infected individual placed into the city of Veracruz, Mexico1. Each control strategy is deployed four weeks (28 days) after the first case appeared in Mexico, at which time there were about 100 local cases in Mexico (based on the simulation). The cost function, budget and screening costs are the same as those used in the base case analysis. The results are based on 1,000 simulations of the stochastic model, and the observation time is set to tobs = 100 days after the first case.

The case study illustrates the hypothetical impact of implementing the proposed strategies for an outbreak similar in characteristics to the 2009 H1N1 influenza pandemic. The performance of each strategy based on the two metrics used for evaluation, i.e., the cumulative number of cases and number of infected cites in the U.S. at the time of observation, under each of the proposed strategies are illustrated in Fig. 5. The results highlight the EP strategy to once again dominate, and LP and MT to perform the poorest. Specifically, the EP strategy can reduce the final outbreak size by 37.1% relative to the baseline, with the number of infected cities dropping from 115 to 86, representing a 25.2% decrease.

Figure 5
figure 5

Control Strategy performance for the 2009 H1N1 influenza pandemic case study in terms of number of infected individuals (left) and number of infected cities (right) in the U.S. The figure depicts the performance of each control strategies for an observation time tobs = 100 days, and assuming border control is deployed at 28 days after the first infected individual was reported in Mexico. Each boxplot represents the distribution of the criterion measured over 1,000 simulations of the stochastic metapopulation epidemic model under the corresponding control strategy.

Discussion

This work addresses the challenge of pandemic mitigation planning through border control, specifically using entry screening at airports to minimize the potential harm posed by an outbreak. We present an ensemble of control strategies that are evaluated based on their ability to reduce the cumulative number of cases and the number of cities infected at a target observation time. The decision-support framework provided can be implemented in real-time at the early stages of a confirmed or suspected outbreak. An in-depth analysis is presented for multiple hypothetical outbreak scenarios, and a case study is conducted to illustrate the performance of the control strategies for an outbreak similar to the 2009 H1N1 influenza pandemic. Further, extensive sensitivity analysis illustrates the robustness of the control strategies to various modelling parameters and assumptions.

The best performing control strategies are the network-driven strategies, e.g., EP and MC, which are shown to be robust across a range of outbreak scenarios and model assumptions. In contrast, the more simplistic strategies, e.g., controlling the most travelled airports (MT) or the airports in the largest cities (LP), perform the poorest. The superiority of the network-driven strategies highlights the significance of the heterogeneity of the world air traffic network in outbreak spreading5,16, which can be exploited by policy makers for the purposes of pandemic planning and mitigation. The two control strategies informed by simulation, 1OU and 1C, rank in the middle in terms of performance in most scenarios considered in this work, however the relative performance of the strategies is sensitive to the outbreak initial conditions, as highlighted by the case study. While the results from the analysis presented indicate the EP strategy to be reliably superior, the larger contribution of this work is the modeling framework, which can be implemented in real-time for any outbreak scenario given reported case counts and locations. Additionally, the integrated framework is flexible, and can be easily extended to incorporate and compare the performance of additional strategies if desired.

The cost-benefit analysis highlights two critical issues. Firstly, a given budget can be used more effectively if the control decisions are made strategically. e.g., EP/MC can achieve the same effectiveness as MT/LP for nearly half the budget. Second, there is evidence of decreasing marginal returns for the superior strategies, indicating minimal benefits may be gained by spending more on border control beyond a certain threshold. This cost-effectiveness threshold is critical for policy makers, who can choose to redirect available control resources towards alternative control strategies for outbreak mitigation which may be more effective.

The case study illustrates the expected performance of each control strategy for an outbreak similar in behavior to the 2009 H1N1 influenza pandemic. This analysis critically introduces an asymptomatic exposed state, which poses an additional challenge for control, because even with perfect screening not all infected travelers can be identified. Even so, results from the case study reveal the reliably superior performance of EP as an outbreak control strategy, and once again, the value gained by strategically allocating control resources.

The sensitivity analysis on the infection contact rate (see Section C.1) illustrates that the ranking and effectiveness of the control strategies are robust with regards to the rate of spread. Furthermore, the control strategies are observed to be more effective (relative to the no control scenario) and more robust for faster spreading viruses. The more reliable performance for higher contact rates can be attributed to the highly stochastic nature of outbreaks in their early stages, which is heavily dependent on where the first few infected cases spread to, whereas a faster spreading virus will result in a larger number of infected people travelling, reducing the variability across outbreak scenarios.

Sensitivity to observation time and control start date was conducted to illustrate the impact of these two implementation options available to policy makers. The model appears to be highly sensitive to observation time (see Section C.2), which is due to the exponential nature of outbreak growth; there is a small number of cases and affected cities in the early stages of the outbreaks (e.g. 25 days) compared to considerably higher infection rates at later time epochs (e.g. 100 days). This sensitivity analysis highlights the critical (short) timeline during which border control has the potential to play a substantial role, after which local control will be most impactful. The impact of delaying border control was also evaluated (Section C.3), and the results again highlight the robustness of the strategy rankings. The results also demonstrate the importance of implementing control in a timely fashion, with the best performing strategies revealed to be the most sensitive to delayed start times.

To address the assumption of perfect control, we evaluated the effectiveness of the proposed strategies under imperfect control conditions (Section C.4), limiting the maximum control rate to 80% and 90%, respectively. While the impact of control decreases with the control level, the relative performance and ranking across strategies remains constant, suggesting the best performing strategies remain effective and reliable under imperfect control. Similar results are observed in the case study, which utilizes an SEIR model, i.e., asymptotic infected travelers are able to evade control and introduce infection into new cities.

The final sensitivity analysis evaluated the impact of outgoing passenger screening at the source of infection. This work assumes that outgoing screening at the source is an obvious decision; therefore the model addresses the more challenging problem of selecting which locations other than the source(s) should be prioritized for incoming passenger screening. The sensitivity analysis results reveal that when outgoing screening is implemented at the source (Section C.5) the proposed control strategies behave predictably, i.e., the outbreak spreads faster but the strategy rankings remain consistent. Critically, the best performing strategies perform well even at low levels of outgoing screening, i.e., ineffective screening, while the poorest performing strategies are more sensitive to the effectiveness of outgoing source screening.

Lastly, there are modeling assumptions and limitations of this study. First, the model only accounts for passenger air travel, and excludes mobility within and between cities via other modes of transport. Second, local disease spread (within a city) is modeled deterministically, and a uniform contact rate is used across all populations. Third, the model is currently limited to global control decisions through passenger screening, and does not evaluate local control mechanisms (prophylaxtics, vaccines, school closures, etc), with the exception of source exit screening. Fourth, airport (rather than route) screening rates are the control variable, and therefore assume passenger screening to be uniformly applied across all incoming routes at a given airport. Planned extensions of this work will address these limitations through i) integrating alternative modes of transport into the model, ii) adding additional decision variables to optimize local control decisions, iii) the development of a link-based modelling formulation to allow specific travel routes to be identified for screening (as opposed to airports), and iv) introducing a dynamic resource allocation formulation which relaxes the assumption of constant control across the entire planning period. These extensions would provide more degrees of freedom to improve the impact of control resources and further help in minimizing the risk posed by global outbreaks. The set of limitations and extensions listed currently lie outside the scope of this study, and provide the basis for future research.