Abstract
Computational models for large, resurgent epidemics are recognized as a crucial tool for predicting the spread of infectious diseases. It is widely agreed, that such models can be augmented with realistic multiscale population models and by incorporating human mobility patterns. Nevertheless, a large proportion of recent studies, aimed at better understanding global epidemics, like influenza, measles, H1N1, SARS, and COVID-19, underestimate the role of heterogeneous mixing in populations, characterized by strong social structures and geography. Motivated by the reduced tractability of studies employing homogeneous mixing, which make conclusions hard to deduce, we propose a new, very fine-grained model incorporating the spatial distribution of population into geographical settlements, with a hierarchical organization down to the level of households (inside which we assume homogeneous mixing). In addition, population is organized heterogeneously outside households, and we model the movement of individuals using travel distance and frequency parameters for inter- and intra-settlement movement. Discrete event simulation, employing an adapted SIR model with relapse, reproduces important qualitative characteristics of real epidemics, like high variation in size and temporal heterogeneity (e.g., waves), that are challenging to reproduce and to quantify with existing measures. Our results pinpoint an important aspect, that epidemic size is more sensitive to the increase in distance of travel, rather that the frequency of travel. Finally, we discuss implications for the control of epidemics by integrating human mobility restrictions, as well as progressive vaccination of individuals.
Similar content being viewed by others
Introduction
Understanding the dynamics of large, resurgent epidemics is an ongoing scientific effort aimed at controlling and preventing the spread of infectious diseases. Disease epidemiology, computational epidemics, and network science are some of the major scientific fields involved in this high impact social challenge. Notable research has been conducted over the past 30 years, answering important questions on the processes driving epidemics, and proposing strategies for prediction and control1,2,3,4. The heavy socio-economical burden of epidemics has been demonstrated repeatedly during crises like SARS5, Ebola6 or recent COVID-197. To this end, we need to be able to predict long-term epidemic evolution, and the impact of governmental interventions, like isolation, travel restrictions, and vaccination/immunization of the population8,9,10,11,12.
In light of these challenges, we find recent studies that are predominantly augmenting mass-action models into tools suitable for analyzing large scale epidemics8,12,13,14,15,16. However, in most cases, we notice that their underlying epidemic models (e.g., SI, SIS, SIR, SEIR, SIRS) adopt homogeneous mixing of the population (i.e., all individuals are fully connected inside single scale compartments or stochastic blocks)15,16,17,18,19. Also underpinned by homogeneous mixing models, we find many flattening the curve-type solutions that try to reduce the reproduction number \(R_0\); on the other hand, \(R_0\) is found to have little influence on the final size of large-scale epidemics20, as well as being hard to estimate in a real-world context21. The over-simplification of social organization lacks the complexity of global scale population organization22, which is dictated by geographical, historical, demographic and economic factors. Consequently, numerical simulation of such simplified models can lead to over- or under-estimations in terms of epidemic size10,16,23 or duration8,12,15,17.
Conversely, we find some important studies which developed more robust and realistic models for epidemic dynamics and contagion, for heterogeneous population organization and human mobility. Without a doubt, the structure of networks is found to be paramount in explaining infectious spreading patterns4,22 seen in empirical data for transmissible diseases such as SARS24, influenza16,21, measles25, or HIV1. Community structure is a known key factor influencing the speed of epidemics. Chen et al.26 show that overlapping in communities leads to increased infection prevalence and a higher spread velocity in the early stages of emerging infections; Salathé et al.4 show that the dynamics of epidemics is influenced by the structure of communities, which, in turn, has implications on immunization strategies for large epidemics; Shang et al.27 show that overlapping communities and a higher network average degree accelerate spreading; Stegehuis et al.28 show that the structure of communities has a significant influence on the behavior of percolation on networks, as community structure can stimulate or suppress spreading, based on the mesoscopic set of communities.
Further incorporating human mobility and contact patterns increases the realism of an epidemic model. We note the work of Liu et al.21 which shows that the reproduction number \(R_0\) has a much higher variability than expected, due to the heterogeneity of contact networks. Fueled by a data driven approach, the authors propose a multiplex representation of the population. Mistry et al.22 provide accurate age-stratified contact matrices for a large number of countries further motivating the need for a heterogeneous approach in disease modeling. Sattenspiel et al.29 extend a SIR model with five fixed patterns of mobility, but use otherwise large compartments for modeling the population structure. Watts et al.20 propose a fully synthetic hierarchical block model, aimed at reproducing multiple epidemic waves, but without integrating realistic distances between communities, or any correlation to real-world human settlement organization. Additionally, Salathé et al.30 analyze contact networks in the USA and confirm that heterogeneity is essential on the larger scale, but it is reliable enough to assume homogeneous contact inside small communities (like high schools). This conclusion reinforces our choice for partitioning the population into very fine-grained communities (to the level of households), and further simplification of the small communities’ topology, from a social network, to a stochastic block model31.
We also note the recent work of Calvetti et al.32 which adapt a network SEIR model using a single scale lattice of geographical blocks for modeling spatial mobility of the population. By contrast, our approach models a multiscale population with much higher granularity. Finally we note our previous work33, which is a first attempt at exploring geo-spatially organized populations. However33, is limited to studying the impact of country density on epidemic spreading on a single scale population model.
The motivation of this study is to describe a robust epidemic modeling framework which simultaneously incorporates accurate population modeling and human mobility, both of which represent ongoing challenges due to their theoretical complexity as well as limitation in available data for validation. Thus, we propose the novel geo-hierarchical population mobility model (GHPM) which lies at the crossroads of population organization and mobility, both of which are key aspects to consider when targeting realistic large-scale resurgent epidemic outbreaks. We propose the novel idea of distributing a population into spatially organized communities (i.e., human settlements), which are then organized into a hierarchy of administrative divisions (i.e., district, neighborhood, street, block, household). Thus, the population is partitioned with very high granularity all the way down to household-/family-level, containing just a few individuals, but where the transmission risks are highest34. Embedded into our population model, we further propose a novel mobility algorithm based on the geographical distance between settlements and their size, which determines the complexity of the underlying hierarchical structure. Finally, targeting the reliable reproduction of resurgent epidemics, we analyze the complex interplay between the population mobility model of GHPM and outbreak dynamics by adopting a modified SIR model with patient relapse. Altogether, the resulting framework is analyzed using detailed computer simulations.
In contrast to other computational models like GLEaMviz35, RAPIDD Ebola forecasting36, or37, our GHPM model is, to the best of our knowledge, the first framework to combine a geo-spatial and a hierarchical model to structure population, alongside an epidemic model with relapse. Using available empirical data on influenza and COVID-19, we show how GHPM reproduces similar epidemic dynamics (e.g., size, waves). The main focus of this paper is to determine how the population organization, travel distance and travel frequency affect the spread of diseases on large scales (country-level), and how restriction and immunization strategies can be applied efficiently to control epidemics.
Results
Characteristics of large epidemics
Real-world large, resurgent epidemics are known to be shaped by repeated waves of non-deterministic amplitude, and an overall limited epidemic size \(\phi\) (it is safe to assume \(\phi <0.1\) considering epidemics over the last decades38,39). To this end, we investigate the epidemic sizes of various outbreaks over heterogeneous geographical areas. Since many infectious diseases are repetitive, seasonal (e.g., influenza, pertussis), and others appear and are then permanently eradicated (e.g., H1N1, smallpox), our approach, throughout the rest of the study, is to use the yearly epidemic size \(\phi _y\) as a measure of the repeating outbreak waves. Therefore, we further provide a statistical overview of seasonal, eradicated and ongoing epidemics worldwide, as such: seasonal influenza in Germany (2014), influenza in California (winter season 2016/2017), global H1N1 cases (2009), measles in Indonesia (2016), Pertussis in California (2010), and data on COVID-19, with estimates on global epidemic sizes for 2020.
The distributions of the yearly epidemic sizes \(\phi _y\) are given in Fig. 1 [COVID-19 in panel (f)], showing predominantly small-sized epidemics [\(N(\phi _y)\) is highest for smaller outbreak sizes, like \(\phi _y < 0.02\) in panel (f), for COVID-19]. The power-law fits for the flatter area in each distribution [e.g., \(0.02< \phi <0.06\) in panel (f)] range within \(\gamma =0.55-1.05\), well outside the representative power-law exponent \(2< \gamma <3\); this means that larger sized epidemics are not exceptional situations. Nevertheless, the upper yearly real-world threshold for epidemic sizes seems to be small39, e.g., \(\phi < 4 \times 10^{-4}\) for influenza in California, or \(\phi < 0.1\) for COVID-19. Even though COVID-19 is an ongoing pandemic, with final size and dynamics not yet known, the statistical analysis shows that the regional epidemic size distribution is consistent with historical data on other outbreaks.
We observe that, in case of large and/or long-lasting epidemics, we find heterogeneity and multi-modality in outbreak size over a heterogeneous geographical area (i.e, world’s countries, regions in Indonesia, counties in California, Germany). Since regional diversity (e.g., demographics, climate, mobility, NPIs) is highly relevant in determining seasonality and resurgence of epidemics40 we are encouraged to further develop a multiscale population structure, based on real-world geographic data.
The geo-hierarchical population mobility model (GHPM)
Population structuring
Despite of the political, historical and geographical factors influencing the boundaries of countries and states, human populations have always flocked together into settlements41. As such, our proposed GHPM model starts from a targeted, real-world geographical area A (e.g., a continent, a country, a state), quantified by a set S of (real) settlements, which represent the basis of human organization and cooperation. In contrast to the abstract representation of network communities, each settlement \(s_i \in S\) is modeled according to real-world data, using its geographical position \(s_i(x_i, y_i)\) (i.e., longitude, latitude) and estimated size \(\Omega ^*(s_i)\) (i.e., number of individuals).
Next, we create a multi-level hierarchical structure under each settlement based on its estimated population \(\Omega ^*(s_i)\). This hierarchy of administrative divisions is summarized in Table 1, with the lowest level corresponding to households, each containing a family of average size \(h_{size}\)42 (more details in “Methods”). The number of households \(n_h\) for a settlement is estimated as \(n_h=\Omega ^*(s_i)/h_{size}\), where \(h_{size}\) is given by a distribution (see SI.1). From households upward, in the administrative hierarchy of each settlement, we define blocks, whose number \(n_b\) is given by the number of households as \(n_b=(n_h)^\beta\), followed by streets (\(n_s=(n_b)^\beta\)), neighborhoods (\(n_n=(n_s)^\beta\)), districts (\(n_d=(n_n)^\beta\)), and finally the root of the hierarchy represented by the settlement. The impact of choosing different branching factors \(\beta\) is detailed in SI.2. Also, note that the number of hierarchical levels added to each settlement depends on its estimated size \(\Omega ^*(s_i)\) as defined in Table 1. Thus, we differentiate between the complexity of population organization in villages, towns, cities and metropolises (with 2–5 levels accordingly).
Once the hierarchy of administrative divisions is created, each household \(h_j\) in \(s_i\) is randomly populated with individuals based on \(h_{size}\). The final size of a household is represented as \(\Omega (h_j)\) and the final population of each parent division is calculated as the sum of the population of the households in the parent’s subtree. As such, the final population \(\Omega (s_i)\) (without the \(^*\) symbol) of the settlement becomes \(\Omega (s_i) = \sum \Omega (h_j)\); likewise, the total population of A becomes \(\Omega =\sum \Omega (s_i)\). We also introduce the notations \(D_i\), for the set of all divisions inside any settlement \(s_i \in S\) (note that \(|D_i|=n_h+n_b+n_s+n_n+n_d\)), and \(H_i\), for the set of all households in any \(s_i \in S\) (note that \(|H_i|=n_h\), \(H_i \subset D_i\)). This concludes the definition of the geo-hierarchical population structure, which is depicted in Fig. 2, using three settlements from the United Kingdom as an example. For more details on the experimental setup of GHPM refer to the “Methods” section.
Individual mobility
The mobility algorithm is described by a stochastic mobility function, based on the previously obtained geo-hierarchical population structure. The algorithm is applied to every individual \(n_k\), from all settlements \(s_i \in S\), for every iteration of the simulation time t. An individual can adopt one of three different movement scenarios:
-
S1
Travel to another settlement \(s_j \in S\) with probability \(P_1\), in a randomly chosen household \(h_j\) of that settlement. Furthermore, for a random timeout of 1–10 iterations (e.g., days), the individual is part of \(h_j\), after which, the individual is returned to his original household. We use this simple timeout mechanism to implement the idea of business/leisure travel for a random, limited duration.
-
S2
Otherwise, remain in the same settlement \(s_i\) with probability \(P_2\), and move to another division \(div_j \in D_i\) (a household, or any upper level division inside the settlement). Furthermore, the individual is part of \(div_j\) for 1 iteration (e.g., day), after which, the individual is returned to his original household. We use this simple timeout mechanism to implement the idea of short local trips to work, shopping etc.
-
S3
Otherwise, remain home with probability \(P_3\). The individual does not move out of his household.
The mechanisms of inter-settlement travel (scenario 1) and intra-settlement travel (scenario 2) are based on the same principle—a probability proportional to the population of the target settlement/division, and indirect proportional to the distance traveled. As such, the absolute inter-settlement travel probability \(p_{inter}^*(s_i,s_j)\) is:
where \(\Delta\) is the geographical Euclidean distance between the two settlements (in km, based on latitude and longitude), lg is the log base 10 of the population of the settlement to travel to, \(0 < \Psi \le 1\) is a tunable travel distance parameter, and \(0< \Theta \le 1\) is a tunable travel frequency parameter. The effects of these two parameters are detailed in the forthcoming experimental results sections and the Discussion. The actual probability \(p_{inter}(s_i,s_j)\) of travel between two settlements is obtained through normalization of \(p_{inter}^*(s_i,s_j)\) as:
Based on Eq. (2), if an individual leaves its home settlement \(s_i\), he will be associated to a randomly chosen household \(h_j\) in the target settlement \(s_j\). If the individual does not leave the settlement (based on Eq. 2), we compute a similar intra-settlement probability \(p_{intra}^*(h_i,div_j)\) of mobility between a household \(h_i \in H_i\) and any other administrative division \(div_j \in D_i\) (e.g., household, block, street etc.) from within \(s_i\) as:
where, similar to Eq. (1), we use the distance \(\Delta\), population \(\Omega\), the travel frequency parameter \(\Theta\), and the travel distance parameter \(\Psi\). Inside a settlement, \(\Delta\) is the maximum distance to the lowest common ancestor of the two divisions. See Fig. 2 with examples \(\Delta =1\) (London) and \(\Delta =3\) (Edinburgh). When \(div_j\) is another household, then \(\Omega (div_j)\) is equivalent with the size of the household \(\Omega (h_j)\); for \(div_j\) being an upper level division, \(\Omega (div_j)\) is the sum of sizes of all households under that division. The actual intra-settlement probability \(p_{intra}(h_i,div_j)\) of travel between a household and another division is obtained through normalization of \(p_{intra}^*(h_i,div_j)\) as:
Based on Eqs. (1)–(4), any individual will choose one of the three mobility scenarios with the following probabilities: \(P_1 = p_{inter}(s_i,s_j)\) for scenario 1, \(P_2 = (1-p_{inter}(s_i,s_j))\cdot p_{intra}(h_i,div_j)\) for scenario 2, and \(P_3 = (1-p_{inter}(s_i,s_j))\cdot (1-p_{intra}(h_i,div_j))\) for scenario 3. Adding up the three probabilities \(P_1+P_2+P_3 = 1\).
As a final observation, we consider that while an individual is “away” from home (scenarios 1, 2), he is exempt from any further travel until he returns to his original household. Also, the starting point of travel for any individual is its original household.
Epidemic transmission
We intend to use the GHPM model to replicate resurgent epidemics, like influenza or COVID-19 (see Fig. 1), so that we adopt a SIR epidemic model with relapse43, also know as a SIRI model44. Hence, we make the following assumptions:
-
Each individual can be in one of three mutually exclusive states (susceptible S, infected I, or recovered R), where the fraction of the population in each state, at any discrete moment in time t, is denoted as S(t), I(t), and R(t), respectively. At every time step, an infected coming in contact with a susceptible individual, can transfer the disease with a probability \(\lambda\). Subsequently, an infected individual can recover with a probability \(\mu\), after which he remains recovered, but only for a specific period (see “GHPM experimental setup” for details on the reproduction number estimation). After this timeout period, the individual becomes susceptible again.
-
Homogeneous population mixing is sufficiently accurate at the small granularity of households30,34, as well as for short duration (1 day) in higher administrative divisions (e.g., block, neighborhood). Moreover, we consider that epidemic contagion only occurs between infected and susceptible individuals found in the same administrative division at the same time t. Given \(R_0>1\) as a necessary condition to trigger an epidemic, we choose \(R_0=3\); nevertheless, this does not guarantee that an outbreak will occur every single simulation.
-
Only one random individual (seed) is infected at \(t=0\). Once triggered, an epidemic can develop inside the household of the seed using homogeneous mixing with the described SIR model. Further spatial spreading depends entirely on the described mobility algorithm.
Estimating epidemic dynamics
We run the GHPM model in various scenarios, defined by different parameter settings, to better understand the potential of recent epidemic outbreaks to exhibit heterogeneous and resurgent behavior.
In Fig. 3 we investigate the potential of our GHPM model’s heterogeneity in estimating epidemic dynamics. As such, we vary the travel distance parameter \(\Psi\) (while keeping all other parameters fixed). Figure 3d depicts a bimodal distribution of the yearly epidemic size \(\phi _y\), given simple homogeneous population mixing, modeled through one single-scale settlement (\(\Omega =533,160\) individuals). Alternatively, Fig. 3e,f depict two epidemic size distributions achieved by GHPM with its specific hierarchical structure and mobility (same population \(\Omega\), \(R_0=3\), simulation time \(t=2000\) iterations (days), \(\Psi =0.1\), respectively \(\Psi =0.35\)). Regardless of the epidemic sizes, a visible mode is detected near \(\phi _y=0\), followed by a relatively flat distribution. Figure 3b,c shows two representative simulated epidemics generated with GHPM for the corresponding travel distance parameters \(\Psi =0.1\), and \(\Psi =0.35\). Figure 3a depicts the same time series information corresponding to the homogeneously mixing model in Fig. 3d.
Two main observations can be summarized from Fig. 3. First, the homogeneous model in panel (a) reproduces an over-simplified spike in the number of daily cases without any resurgence, while panels (b-c) display different shapes and sizes \(\phi _y\), as well as resurgence. Even if the epidemic looks like it repeatedly dies out in panel (b), it manages to flare up again in new waves. Second, the homogeneous model in panel (a) is only capable of reproducing a bimodal epidemic size distribution. Nevertheless, real data (see Fig. 1) confirms the variability and resurgence reproduced by our GHPM heterogeneous model.
These results also have implications in the context of epidemic control. Even though the two scenarios depicted in Fig. 3b,c (with corresponding distributions in Fig. 3e,f) have a similar parameter setting, the difference in travel distance \(\Psi\) results in a large variation of epidemic size. As \(\Psi\) increases, the expected epidemic size \(\phi\) increases and the distribution tends to become more bimodal. Conversely, for very small \(\Psi\), the epidemic size distribution tends to have one single mode near 0. Thus, extremes of the travel distance parameter \(\Psi\) lead to entirely local or entirely global outbreaks, and GHPM can be compared with a homogeneous one. However, when \(\Psi\) is neither too small or too large (like in Fig. 3b,c), we obtain the most faithful reproduction of real epidemic dynamics (Fig. 1c,d), and the impact of a fixed \(R_0\) is negligible.
The impact of travel distance versus travel frequency
Figure 4a summarizes the impact of increasing the travel distance parameter \(\Psi\) on the total epidemic size \(\phi\). In accordance to Eqs. (1) and (3), \(\Psi\) influences the range over which individuals are likely to travel, both outside and inside settlements. Namely, a higher \(\Psi\) increases the probability of any individual to travel to a more distant settlement (or division) from its original household, because it directly reduces the weight of \(\Delta\) in Eqs. (1) and (3).
Figure 4b depicts \(\phi\) as a function of the travel frequency parameter \(\Theta\). In addition, we choose to introduce \(\Theta _h\) as the average number of individuals transiting (leaving) a household, which is a more intuitive measure than \(\Theta\) alone. Unlike \(\Theta\), which is an input parameter in the model, the value of \(\Theta _h\) is measured during simulation, and normalized by the number of households. The inset of Fig. 4b shows the linear relationship (\(\Theta\), \(\Theta _n\)). In accordance to Eqs. (1) and (3), \(\Theta\) affects the weight of each probability of any individual to travel outside its home settlement (or household).
In case of Figure 4a, we observe a clear phase transition from local to global epidemic as \(\Psi\) increases. Intuitively, when the two travel parameters are close to 0, infected individuals are more likely to remain in their original household (or settlement); as such, the resulting epidemic size \(\phi\) is delimited to the local scale of \(h_{size}\). As long as the size of a modeled local scale is much smaller than the size of the entire population, truly large epidemics cannot occur with \(\Psi ,\Theta ,\Theta _h \rightarrow 0\). On the other hand, When \(\Psi > 0.01\) and \(\Theta _h>0.01\), large scale outbreaks can occur, regardless of the modeled population size.
When either \(\Psi\) or \(\Theta _h\) converge towards extremes (0 and 1) GHPM can be assimilated to a homogeneously population model, where the set of susceptible individuals coincides with the size of a household \(h_{size}\) and \(\Omega\), correspondingly (stochastic dynamics appear at the household level, or at the global scale). Nevertheless, neither of these limits (0 and 1) are realistic representations of real-world mobility, which is a mixture of local and non-local patterns. As such, we further focus on the intermediate values of travel parameters where shifts in \(\phi\) are visible. The phase transition in Fig. 4a is best fitted by a 3rd degree polynomial, whereas the less abrupt increase in \(\phi\) triggered by \(\Theta _h\) is best modeled by a linear fit. This suggests that small changes in \(\Psi\) have a much greater impact on the size of the epidemic \(\phi\) compared to similar changes in \(\Theta _h\). Similarly, by considering the linear relationship between (\(\Theta\), \(\Theta _h\)), depicted in the inset of Fig. 4b, we conclude that the epidemic size \(\phi\) is more sensitive to increases in \(\Psi\) than increases in \(\Theta\).
The analysis of travel parameters in the GHPM model leads to an insightful conclusion with possible effective policy implications on the ongoing COVID-19 pandemic and future outbreaks. We found that restricting the distance, rather than the frequency of travel – during an outbreak—is the more adequate approach to minimize the eventual impact of an epidemic.
Embedding mobility restrictions and immunization
Finally, we compare the effectiveness of mobility restrictions versus mass immunization. The first, is implemented by long distance travel restrictions, a measure already adopted world-wide, throughout 2020, during the COVID-19 pandemic, and with notable results45,46. In GHPM, this restriction is obtained by reducing the travel distance parameter \(\Psi\), as discussed in the previous section. The second, is implemented in GHPM by adding a progressive linear vaccination policy. More precisely, we pick random, not currently infected individuals, and transfer them to a permanent recovered state \(R^*\). The vaccination policy is not started at \(t=0\), but only after \(t=365\) days in our 5-year long simulation framework (this approach mirrors the current COVID-19 situation better). Afterwards, a random set of individuals are selected every single iteration as the number of immune individuals grows progressively.
Figure 5a depicts the impact of two mobility restriction policies of reducing travel to 30% (orange line), respectively to just 10% (red line) of the original, unrestricted travel. We note that, for increasing \(\Psi >0.2\), even very strict measures do not guarantee a proportional reduction in epidemic size \(\phi\).
Figure 5b summarizes the impact of two immunization policies of vaccinating individuals at a rate of 60% per year (5%/month; green line), respectively 90% per year (7.5%/month; red line), compared to no vaccination at all (gray line). Here we observe that, as the travel distance parameter \(\Psi\) increases, even an aggressive vaccination policy is unable to guarantee a proportional reduction in epidemic size \(\phi\). Of course, this is, in part, a consequence of our choice to implement the vaccination policy with a 1 year delay in our simulations. Further experimenting with different vaccination delays, or vaccination rates is outside the scope of this paper but can make an interesting follow-up research topic.
Discussion
The introduction of our multiscale GHPM model enables us to reveal several meaningful characteristics of real-world epidemic outbreaks that are, otherwise, challenging to describe with homogeneous mixing models which adopt a single scale15,16,17,18,19. For instance, most epidemic spreading models describe any outbreak though only two outcomes: (1) an epidemic trigger condition is not fulfilled and the disease subsides in a local sub-population, or (2) the condition is fulfilled, and the disease manages to spread globally to a large scale comparable to the entire population47. The epidemic sizes distribution always becomes bimodal, as the first mode correlates to unsuccessful, local outbreaks, and the second mode correlates to successful, global scale outbreaks (see Fig. 3a,d). Different, network science approaches have also been used in computational epidemics, like a forest-fire topology48, and social networks with two49 and multiple dimensions50. However, even relatively complicated network models lead to the same bimodal distributions.
The motivation of this study is to bridge realistic, hierarchical population structuring, with individual mobility patterns and infectious dynamics with patient relapse into a reliable simulation framework, targeting the better prediction and control of epidemic dynamics. As such, we propose a very fine-grained population structuring and mobility influenced by spatial and hierarchical constraints. To the best of our knowledge, our approach is novel, and we provide qualitative comparisons to homogeneous population mixing through means of epidemic size distributions. Furthermore, our empirical data restates the important temporal heterogeneity of many large epidemics that has yet to receive full attention in the modeling state of the art. This heterogeneity is exemplified in our real data on influenza and COVID-19 evolution (Fig. 1), and reproduced by our GHPM model (Fig. 3).
A predominant body of the disease epidemiology state of the art focuses on \(R_0\) as the central topic of research8,12,23,51. While this number has an inherent value for compartmental models with homogeneous mixing, \(R_0\) could also be estimated for a more complex deterministic multiscale model. Nevertheless, estimating or redefining \(R_0\), would not improve any relationship to the final epidemic size in stochastic multiscale models20, like our GHPM. The large, non-deterministic variations in epidemic size (Fig. 3e,f), and the resurgent characteristic (Fig. 3b,c) do not result from initial conditions like \(R_0=\lambda /\mu\), but rather from “black swan”52 events, during which an infected individual travels from a compromised household (or settlement) to new uninfected settlements. In this way, the fluctuating yearly epidemic size \(\phi _y\) is mostly decided by the increasing travel distance and travel frequency to susceptible populations.
In addition, we show that, apart from rare, long distance travel events, the population structure is a decisive factor in influencing the speed and impact of epidemic outbreaks. For example, the recent COVID-19 epidemic has a very varied evolution across countries (suggested by the distribution in Fig. 1b), ranging from less than 1–10% prevalence in 2020. These variations can also be attributed to the intensity, delay, and strength of implementing restrictions23. Even though non-pharmaceutical interventions (NPIs) are a relevant form of active control on epidemic duration and size9, our observations pinpoint that the GHPM population organization can achieve stochastic dynamics in epidemic size as large and heterogeneous as empirically observed fluctuations.
The focus of our simulation experiments was to determine how population organization, travel distance and travel frequency affect the spread of disease. In this sense, we show that travel restrictions, like reducing \(\Psi\), and \(\Theta\) (or equivalently \(\Theta _h\)) in our model (distance and frequency), can determine a significant change in the resulting epidemic size \(\phi _y\), comparable to stronger social intervention strategies like vaccination or total quarantine. Specifically, we find a phase transition from local to global epidemic around \(\Psi >0.1\) and \(\Theta _h>1\). When the two parameters are close to 0, infected individuals are more likely to remain in their original household, and the resulting epidemic is delimited to the local scale (\(h_{size}\ll \Omega\)). Overall, we conclude that minor modifications in \(\Psi\) have a higher impact on the size of the epidemic \(\phi _y\) compared to similar modifications in \(\Theta _h\).
One of the main take-away messages of this study is that we found travel distance to be more significant in increasing epidemic size than travel frequency. Additionally, our results pinpoint that, even an aggressive immunization policy (e.g., vaccination of 60-90% non-infected individuals per year), is unable to guarantee an immediate proportional reduction in epidemic size, given a delay of one year to start the vaccination campaign. To this end, vaccination (at a moderately realistic pace) does not yield lower epidemic sizes than reduced travel distances.
To conclude, major recent outbreaks like Ebola, SARS or COVID-19 repeatedly confront public health authorities with the uncertainty of—how big will it be this time? Unfortunately, current state of the art computational epidemiology can hardly offer accurate answers. Even the most complicated models of infectious spreading require an estimation of the relevant susceptible population. It is much easier to do retrospective studies in which, after an epidemic is observed, the specific parameters are approximated. However, such an approach is often limited because it has reduced relevance for future outbreaks. For instance, the global 1918 Spanish flu pandemic did not stop the 2003 SARS or 2020 COVID-19 pandemics. Even if the planet was less populated and less connected a century ago, the 1918 flu made considerably more victims compared to the potentially more infectious diseases of the XXI century. With the statistics of SARS pointing towards a quick mitigation with relatively low global impact, COVID-19 has had a very different evolution53. In the current pandemic context, the population susceptible to the SARS-CoV-2 virus is roughly the entire population of the planet. Should we then estimate scenarios with outbreaks concentrated around large populated hubs, or include billions of susceptible individuals spread across the entire planet? Perhaps the ultimate question is simply formulated as—what is the epidemic size distribution for a given infectious pathogen?
Our GHPM model tries to address these questions, to the best possible extent, by structuring population as a geographically spaced hierarchical set of sub-populations, modeled down to the level of households. The major advantage over other single scale, multiscale, or network models is that it can be extended to very large scales (e.g., continental or global) without implying homogeneous populations. This way, a modeled epidemic can be seen as multiple smaller epidemics occurring at different times, in different sub-populations. In this context, most of the infectious spreading happens locally, and global spreading is determined by rare distant travel. Because the epidemic size distribution is remarkably susceptible to the population structure, we suggest that epidemic control can be improved through adequate strategies applied to the boundaries inherently delimiting large, multiscale populations. We believe that future studies can adopt and extend our concept of geo-hierarchical population mobility to study progressively more realistic epidemic models of infectious spreading.
Methods
Geo-spatial population data
GHPM supports a pseudo-realistic organization of the population on which to run an epidemic outbreak simulation. In this sense, the settlements number |S|—and implicitly the total population of the experiment \(\Omega\)—are defined by a chosen geographical area A, most commonly limited to a country. In this sense, the number, position and size of settlements are defined according to data extracted from the Global Rural-Urban Mapping Project (GRUMP v1), revision 01 (March 2017) curated by the Center for International Earth Science Information Network (CIESIN), Columbia University54. GRUMP is an undergoing large-scale project, and is still missing some data, to a variable extend, for some countries. For the purpose of this paper we run all experiments on a model of Germany, with with 53.31M inhabitants (as defined by GRUMP) spread over 1132 settlements, which we consider a mid-sized representative example.
Inside each settlement, GHPM uses a simplified synthetic, but intuitive hierarchical organization of inhabitants using 2–5 levels of administrative units. We chose the synthetic alternative (inside settlements) because of the limitation in available data defining the organization of each settlement. Nevertheless, with more available data in the future, GHPM can be modified to offer a more precise mapping of each settlement’s population. Conversely, GHPM can be used without any real data, if one so chooses, by creating a synthetic set of settlements, defined by positions and populations according to any distribution of choice.
GHPM experimental setup
In order to significantly accelerate the large number of GHPM experiments, we chose to scale down the modeled population by 100 fold. As such, the final simulation population used throughout the experiments is 533,160 individuals (i.e., 1% of uniformly scaled down population of Germany from the GRUMP dataset).
In general, the size of each unique household \(h_i\) may be chosen as a uniformly distributed integer number between 1–4 individuals (i.e., average \(h_{size}=2.5\)) based on UN data for developed countries42. In particular, we use a custom distribution of household sizes according to data available in 2019 for Germany (Federal Statistical Office—Statistisches Bundesamt), as detailed in SI.1.
A branching factor \(\beta\) can be used to determine the number of divisions in a settlement, based on the number of households. In this sense, a static parameter of \(\beta =0.5-0.7\) makes a good approximation of the hierarchical density of administrative units. In this paper we use \(\beta =0.6\). Overall, it makes practical sense to use \(1>\beta >0.5\) (i.e., larger than square root) in order to obtain a more dense hierarchy of upper level divisions. A sensitivity analysis for \(\beta\) is provided in SI.2.
All simulations run for \(t=2000\) iterations (days). If we consider the correspondence 1 iteration = 1 day, then the simulation duration translates to \(2000/365 \approx 5.48\) years. Since the target of our study are resurgent epidemics (with relapse), the final epidemic duration and final size cannot be expressed in absolute values (they run indefinitely in most cases). Our approach, throughout the paper, is to use the yearly epidemic size \(\phi _y\) as a measure of the repeating outbreak waves. Intuitively, we average the total epidemic size \(\phi\) over the simulation period, i.e., \(\phi _y=\phi /5.48\). Furthermore, the impact of each parameter in the GHPM model is discussed in SI.3.
The timeout period for any recovered individual is uniformly chosen between of 3–6 months (current estimates of COVID-19 immunity of 3 months55, 4–5 months56, 6 months57). The adopted infection rate is \(\lambda =0.06\), and the recovery rate is \(\mu =0.02\). While a theoretical reproduction number \(R_0\) is based on the \(\mu\) and \(\lambda\), in practice22, we estimate \(R_0\) from the household size distribution, and obtain \(R_0=0.983\). Recent studies suggest an \(R_0=3.30 \pm 1.4\) for COVID-1951.
Data availability
COVID-19 data used in this study are supplied by the European Centre for Disease Prevention and Control (https://www.ecdc.europa.eu/en/covid-19/data). Weekly influenza data are supplied the US Outpatient Influenza-like Illness Surveillance Network (ILINet) (https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html), by Google Trends (https://www.google.com/publicdata/explore?ds=z3bsqef7ki44ac_) . Measles cases in Indonesia are supplied by the International Federation of Red Cross and Red Crescent Societies (IFRC) (https://data.humdata.org/m/dataset/indonesia-measles-outbreaks-2015-2017?force_layout=light). H1N1 data are provided by the WHO via Kaggle (https://www.kaggle.com/de5d5fe61fcaa6ad7a66/pandemic-2009-h1n1-swine-flu-influenza-a-dataset). Pertussis data are provided by the California Department of Public Health (CDPH) via Health.gov (https://healthdata.gov/State/Vaccine-Preventable-Disease-Cases-by-County-and-Ye/58x3-zrxa).
References
Anderson, R. M., May, R. M. & Anderson, B. Infectious Diseases of Humans: Dynamics and Control Vol. 28 (Wiley Online Library, 1992).
Keeling, M. The implications of network structure for epidemic dynamics. Theor. Popul. Biol. 67, 1–8 (2005).
Keeling, M. J. & Rohani, P. Modeling Infectious Diseases in Humans and Animals (Princeton University Press, 2008).
Salathé, M. & Jones, J. H. Dynamics and control of diseases in networks with community structure. PLoS Comput. Biol. 6, e1000736 (2010).
Siu, A. & Wong, Y. R. Economic impact of SARS: The case of Hong Kong. Asian Econ. Pap. 3, 62–83 (2004).
Elston, J. W., Cartwright, C., Ndumbi, P. & Wright, J. The health impact of the 2014–15 Ebola outbreak. Public Health 143, 60–70 (2017).
Nicola, M. et al. The socio-economic implications of the coronavirus and Covid-19 pandemic: A review. Int. J. Surg. 78, 185–193 (2020).
Hellewell, J. et al. Feasibility of controlling Covid-19 outbreaks by isolation of cases and contacts. Lancet Glob. Health 8, e488–e496 (2020).
Topirceanu, A., Udrescu, M. & Marculescu, R. Centralized and decentralized isolation strategies and their impact on the Covid-19 pandemic dynamics. arXiv preprint arXiv:2004.04222 (2020).
Cohen, J. & Kupferschmidt, K. Countries test tactics in ‘war’ against Covid-19. Science 367, 1287–1288. https://doi.org/10.1126/science.367.6484.1287 (2020).
Flaxman, S. et al. Estimating the number of infections and the impact of non-pharmaceutical interventions on Covid-19 in European countries: Technical description update. arXiv preprint arXiv:2004.11342 (2020).
Kucharski, A. J. et al. Early dynamics of transmission and control of Covid-19: A mathematical modelling study. Lancet Infect. Dis. 20, 553–558 (2020).
Diaz, P., Constantine, P., Kalmbach, K., Jones, E. & Pankavich, S. A modified SEIR model for the spread of Ebola in Western Africa and metrics for resource allocation. Appl. Math. Comput. 324, 141–155 (2018).
Dye, C. & Gay, N. Modeling the SARS epidemic. Science 300, 1884–1885 (2003).
Arenas, A. et al. A mathematical model for the spatiotemporal epidemic spreading of Covid19. MedRxiv (2020).
Ferguson, N. M. et al. Strategies for mitigating an influenza pandemic. Nature 442, 448–452 (2006).
Block, P. et al. Social network-based distancing strategies to flatten the Covid-19 curve in a post-lockdown world. Nat. Hum. Behav. 4, 588–596 (2020).
Thunström, L., Newbold, S. C., Finnoff, D., Ashworth, M. & Shogren, J. F. The benefits and costs of using social distancing to flatten the curve for Covid-19. J. Benefit-Cost Anal. 11, 179–195 (2020).
Atkeson, A. What will be the economic impact of Covid-19 in the us? Rough estimates of disease scenarios. Technical Report, National Bureau of Economic Research (2020).
Watts, D. J., Muhamad, R., Medina, D. C. & Dodds, P. S. Multiscale, resurgent epidemics in a hierarchical metapopulation model. Proc. Nat. Acad. Sci. 102, 11157–11162 (2005).
Liu, Q.-H. et al. Measurability of the epidemic reproduction number in data-driven contact networks. Proc. Nat. Acad. Sci. 115, 12680–12685 (2018).
Mistry, D. et al. Inferring high-resolution human mixing patterns for disease modeling. Nat. Commun. 12, 1–12 (2021).
Koo, J. et al. Interventions to mitigate early spread of Covid-19 in Singapore: A modelling study. Lancet Infect Dis.https://doi.org/10.1016/S1473-3099(20)30162-6 (2020).
McLean, A. R. et al. SARS: A Case Study in Emerging Infections (Oxford University Press, 2005).
Ferrari, M. J. et al. The dynamics of measles in sub-Saharan Africa. Nature 451, 679–684 (2008).
Chen, J., Zhang, H., Guan, Z.-H. & Li, T. Epidemic spreading on networks with overlapping community structure. Physica A 391, 1848–1854 (2012).
Shang, J., Liu, L., Li, X., Xie, F. & Wu, C. Epidemic spreading on complex networks with overlapping and non-overlapping community structure. Physica A 419, 171–182 (2015).
Stegehuis, C., Van Der Hofstad, R. & Van Leeuwaarden, J. S. Epidemic spreading on complex networks with community structures. Sci. Rep. 6, 1–7 (2016).
Sattenspiel, L. et al. A structured epidemic model incorporating geographic mobility among regions. Math. Biosci. 128, 71–92 (1995).
Salathé, M. et al. A high-resolution human contact network for infectious disease transmission. Proc. Nat. Acad. Sci. 107, 22020–22025 (2010).
Holland, P. W., Laskey, K. B. & Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 5, 109–137 (1983).
Calvetti, D., Hoover, A. P., Rose, J. & Somersalo, E. Metapopulation network models for understanding, predicting, and managing the coronavirus disease Covid-19. Front. Phys. 8, 261 (2020).
Topîrceanu, A. Analyzing the impact of geo-spatial organization of real-world communities on epidemic spreading dynamics. In International Conference on Complex Networks and Their Applications, 345–356 (Springer, 2020).
Sun, K. et al. Transmission heterogeneities, kinetics, and controllability of SARS-CoV-2. Science 371, eabe2424 (2021).
Van den Broeck, W. et al. The gleamviz computational tool, a publicly available software to explore realistic epidemic spreading scenarios at the global scale. BMC Infect. Dis. 11, 37 (2011).
Viboud, C. et al. The RAPIDD Ebola forecasting challenge: Synthesis and lessons learnt. Epidemics 22, 13–21 (2018).
Pastor-Satorras, R., Castellano, C., Van Mieghem, P. & Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys. 87, 925 (2015).
Madhav, N. et al. Pandemics: Risks, impacts, and mitigation. Disease Control Priorities: Improving Health and Reducing Poverty. 3rd ed. Washington (DC): The International Bank for Reconstruction and Development / The World Bank, Chapter 17, PMID: 30212163 (2017).
Iuliano, A. D. et al. Estimates of global seasonal influenza-associated respiratory mortality: A modelling study. The Lancet 391, 1285–1300 (2018).
Fisman, D. Seasonality of viral infections: Mechanisms and unknowns. Clin. Microbiol. Infect. 18, 946–954 (2012).
Doxiadis, C. A. Ekistics, the science of human settlements. Science 170, 393–404 (1970).
United Nations, Department of Economic and Social Affairs, Population Division. "Household size and composition around the world 2017–Data booklet." ST/ESA/SER. A/405 (2017).
El Fatini, M., Lahrouz, A., Pettersson, R., Settati, A. & Taki, R. Stochastic stability and instability of an epidemic model with relapse. Appl. Math. Comput. 316, 326–341 (2018).
Caraballo, T., El Fatini, M., Pettersson, R. & Taki, R. A stochastic SIRI epidemic model with relapse and media coverage. Discrete Contin. Dyn. Syst. B 23, 3483 (2018).
Chinazzi, M. et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (Covid-19) outbreak. Science 368, 395–400 (2020).
Zhang, J. et al. Changes in contact patterns shape the dynamics of the Covid-19 outbreak in china. Science 368, 1481–1486 (2020).
Ferguson, N. M. et al. Planning for smallpox outbreaks. Nature 425, 681–685 (2003).
Rhodes, C. J. & Anderson, R. M. Power laws governing epidemics in isolated populations. Nature 381, 600–602 (1996).
Bonaccorsi, S., Ottaviano, S., De Pellegrini, F., Socievole, A. & Van Mieghem, P. Epidemic outbreaks in two-scale community networks. Phys. Rev. E 90, 012810 (2014).
Zheng, D.-F., Hui, P., Trimper, S. & Zheng, B. Epidemics and dimensionality in hierarchical networks. Physica A 352, 659–668 (2005).
Alimohamadi, Y., Taghdir, M. & Sepandi, M. Estimate of the basic reproduction number for Covid-19: A systematic review and meta-analysis. J. Prev. Med. Public Health 53, 151 (2020).
Taleb, N. N. The Black Swan: The Impact of the Highly Improbable Vol. 2 (Random House, 2007).
Petersen, E. et al. Comparing SARS-CoV-2 with SARS-CoV and influenza pandemicsc. Lancet Infect. Dis. 20, e238–e244 (2020).
Warszawski, L., Frieler, K. et al. Center for international earth science information network—CIESIN—Columbia University. Gridded population of the world, version 4 (gpwv4). NASA socioeconomic data and applications center (SEDAC). Atlas of Environmental Risks Facing China Under Climate Change. https://doi.org/10.7927/h4np22dq (2017).
Ward, H. et al. Declining prevalence of antibody positivity to SARS-CoV-2: A community study of 365,000 adults. MedRxiv (2020).
Gudbjartsson, D. F. et al. Humoral immune response to SARS-CoV-2 in iceland. N. Engl. J. Med. 383, 1724–1734 (2020).
Zuo, J. et al. Robust SARS-CoV-2-specific t-cell immunity is maintained at 6 months following primary infection. BioRxiv (2020).
Acknowledgements
Both authors, AT and REP, are supported by the Romanian National Authority for Scientific Research and Innovation (UEFISCDI), project numbers PN-III-P1-1.1-PD-2019-0379 (AT), respectively PN-III-P4-ID-PCE-2020-0269 (REP).
Author information
Authors and Affiliations
Contributions
A.T. and R.E.P. conceived the experiments, A.T. conducted the experiments, A.T. and R.E.P. analyzed the results, A.T. wrote the manuscript. Both authors reviewed the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Topîrceanu, A., Precup, RE. A novel geo-hierarchical population mobility model for spatial spreading of resurgent epidemics. Sci Rep 11, 14341 (2021). https://doi.org/10.1038/s41598-021-93810-8
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-021-93810-8
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.