Detecting a Surprisingly Low Transmission Distance in the Early Phase of the 2009 Influenza Pandemic

The spread of the 2009 H1N1 influenza pandemic in England was characterized by two major waves of infections: the first one was highly spatially localized (mainly in the London area), while the second one spread homogeneously through the entire country. The reasons behind this complex spatiotemporal dynamics have yet to be clarified. In this study, we perform a Bayesian analysis of five models entailing different hypotheses on the possible determinants of the observed pattern. We find a consensus among all models in showing a surprisingly low transmission distance (defined as the geographic distance between the place of residence of the infectors and her/his infectees) during the first wave: about 1.5 km (2.2 km if infections linked to household and school transmission are excluded). The best-fitting model entails a change in human activity regarding contacts not related to household and school. By using this model we estimate that the transmission distance sharply increased to 5.3 km (10 km when excluding infections linked to household and school transmission) during the second wave. Our study reveals a possible explanation for the observed pattern and highlights the need of better understanding human mobility and activity patterns under the pressure posed by a pandemic threat.


Materials and methods
The models used in this work are stochastic spatially-explicit individual-based models of influenza transmission for England adapted from previous models developed for Europe [1,2,3]. Each model is the combination of two different layers: i) a model of the socio-demographic structure, and ii) an infection transmission model. All models share the same socio-demographic model, while differ in some details of the infection transmission process as detailed in the next sections. (Note that all the five models are nested in a more general model, as detailed in Sec. 1.2).

Socio-demographic model
The socio-demographic structure of the model is the same as in [4] and we refer to this work for all technical details.
Briefly, socio-demographic data on the age structure of the population of England are used to generate about 47 millions individuals, which are distributed on a grid of 3894 cells proportionally to the population density derived from the Gridded Population of the World, version 3 (GPW v3). Each individual is characterized by an age and belongs to a household. Students are also assigned to a specific school (see below). Each household and school has specific geographic coordinates within the cell.
Household members are determined using a heuristic model previously proposed in [4], which reproduces data on household size, composition and age of household members by size specific for the United Kingdom, as provided by the Statistical Office of the European Commission (http://ec.europa.eu/eurostat), preserving realistic age difference between members of the same household. As shown in [4], age distribution of the population, household size, household composition by size and age distribution of household members by size simulated by using this model match the observed ones.
Schools are allocated proportionally to population density and their size is determined using data on school size by level specific for England [5]. School age children (5-18 years) are assigned to a school of the corresponding education level using the resource competition algorithm introduced in [6], which accounts for the population density of the considered geographic area. The mean home-to-school distance obtained by using this algorithm results 4 km, in excellent agreement with the value reported to the Department of Transport, namely 4.3 km [7].

Infection transmission model
The transmission of influenza infection follows a discrete-time individual-based SLIR (susceptible -latent -infectious -removed) model with time step ∆t = 1 day. The model explicitly considers transmission in households, schools, and 'other settings' (which accounts for contacts occurring in all settings other than households and schools -e.g., contacts in the general community, workplaces, means of transport, free-time activities).
Simulations are initialized by including age-specific pre-pandemic immunity, as measured in [8]. In particular, for each individual i aged a i , we determine whether he/she is initially immune by sampling from a Bernoulli distribution of probability given by the age-specific pre-pandemic immunity reported in [8].
At any time step of the simulation each susceptible individual i has a probability p i (t) = 1−e −∆tλ i (t) of becoming infected. The probability of infection, which is re-computed at each time step, depends on the individual risk of infection λ i (t) that takes into account the contribution of infection sources in each of the considered settings: • contacts with infectious members of his/her household; • contacts with infectious schoolmates, if individual i is a student; • contacts with infectious people in 'other settings'.
Similarly to [2,9], transmission in 'other settings' explicitly depends on the geographic distance, which is modeled by a kernel distance function K (the definition of K is given below).
The generalized formulation of the risk of infection (and thus also of the transmission model) is defined by the following equation: • H i is the index of the household of individual i, n i is the number of members of the household and ω = 1.63, as estimated in [10], in order to take into account of contact heterogeneity in the household network.
• S i is the index of the school where individual i studies and m i is the school size, if individual i is a student.
• a i is the age of individual i.
• ρ(a i ) is the age-dependent susceptibility to infection. For the sake of simplicity, we divide the population into two groups: children (a i < 15 years), for which ρ(a i ) = 1, and adults (a i ≥ 15 years), for which ρ(a i ) = σ ∈ [0, 1]. In the manuscript we refer to σ as the relative susceptibility to infection of adults with respect to children.
• β h is the transmission rate within households (day −1 ).
• β s is the transmission rate within schools (day −1 ).
• β r is the transmission rate in 'other settings' (day −1 ).
• K(d ij ) is a decreasing function of the geographic distance d ij between the households of individuals i and j, modeling human mobility patterns. In particular, we assume that function K has the following form: i.e. function K is shaped by the parameters a, α 1 and α 2 . Intuitively, parameter a mainly regulates the shape of the function at short distances, whereas parameters α 1 and α 2 , mainly shape the exponential decay at medium/long distances in the two considered time windows.
• µ(t) accounts for the school calendar, not allowing transmission at school when schools are closed. In particular, •τ (t) shapes influenza transmissibility in 'other settings' during school closures, accounting for the fact that students might be more active in 'other settings' during those periods [11]. In particular, At any time step ∆t of the simulation latent individuals become infectious at a rate ∆t/T L , where T L is the latent period -assumed equal to the incubation period -that lasts on average 1.5 days [12]. Infectious individuals recover (and are assumed to be fully protected) at a rate ∆t/T I , where T I is the infectious period, which lasts on average 1.6 days in such a way to obtain a generation time of 3.1 days, in agreement with the scientific literature on the 2009 H1N1 pandemic [2,13]. The distributions of the latent and infectious period obtained by assuming a time-step of 1 day are shown in Figure S1. Note that the actual distributions of latent and infectious periods are presumably far from exponential: for instance, Cori et al. [14] using viral excretion data from volunteers challenged with different influenza strains, estimated for the latent period a mean of 1.63 days with a standard deviation of 0.26, and for the infectious period a mean of 0.99 days with a standard deviation of 0.96. In this sense, the standard deviation of the generation time arising from simulating the model with a time-step of 1 day may be closer to the actual one than the standard deviation obtained under the assumption of exponential distributions (see Table S1). Moreover, using a shorter time-step would pose serious computational challenges.

Seeding of infection
The daily number of imported cases in the first wave is based on the actual time-series of travelrelated cases reported in [16]. Previous studies on the 2009 H1N1 pandemic influenza have highlighted that H1N1 importations in the early phase of the 2009 pandemic were strongly correlated with air passenger flows [17,18]. We thus determine the geographic location of imported cases by randomly distributing cases among all English regions (i.e. NUTS1) proportionally to air passenger arrivals. In particular, we use data on incoming air passenger flows during the period April-June 2009 provided by the Civil Aviation Authority [15] to determine the fraction of incoming passengers specific for each region of England. As shown in Figure S2, the fraction of incoming passengers is not proportional to population density, with Greater London accounting for about 70% of the total volume. For each time step, the number of imported cases is distributed among regions proportionally to the corresponding fraction of incoming passengers and, within each region, cases are assigned to a cell proportionally to population density.
The daily number of cases imported during the second wave is sampled at each time-step from a Poisson distribution with parameter λ = 10 δ -1, where δ ∈ [0, +∞]. The geographic location of cases imported in the second wave is determined by using the same procedure described for the first wave.
We tested an alternative method for determining the destination of imported cases. Specifically, we assume to randomly distribute imported cases proportionally to the population density of each cell ( Figure S2b). Specifically, we calibrated a new model, which is exactly as model M2 (see main text and Sec. 1.4 of this document), but differing for two details: i) the model is calibrated on serological seroprevalence data on the first pandemic wave only [8], and ii) we assume importation of cases proportional to population density. Our modeling results show that the hypothesis of import of cases proportional to population density data leads to a regional distribution of cases during the first wave not compliant with the observed data [8] (see Fig. S2c). As the model was not able to properly reproduce data on the first pandemic wave, we decided not to consider this model for further analyses.

Description of the five models and their assumptions
The five models considered in this work test different hypotheses on the factors that might have shaped the complex pattern of spread observed during the 2009 influenza pandemic in England. The models are obtained from the general formulation as follows:

M1
The parameter values regulating the kernel function (see Equation 1) are fixed at a = 4 km and α 1 = α 2 = 3, as estimated by Ferguson and colleagues [9] from the analysis of commuting data for the UK. χ = 1 for both pandemic waves and δ = 0, i.e. no changes in transmissibility of the virus are considered, nor the importation of additional cases during the second wave is considered. Note that in this model the transmission distance is determined by human mobility patterns as estimated from commuting data for the UK.
M2 Parameter a regulating the kernel function (K) is fixed at a = 1 km, whereas α 1 = α 2 =: α is estimated, χ = 1 for both pandemic waves and δ = 0. This model tests the hypothesis that the observed pattern of spread might be explained by an alteration in the transmission distance with respect to that derived from commuting data.
M3 Parameter a is fixed at a = 1 km, whereas α 1 = α 2 =: α is estimated, χ = 1 for the first wave, whereas it is estimated for the second wave, and δ = 0. This model tests the hypothesis that the observed pattern of spread might be explained by an alteration in the transmission distance during the pandemic and by an increase in the transmissibility of the virus during the second wave.
M4 Parameter a is fixed at a = 1 km, whereas α 1 = α 2 =: α is estimated, χ = 1 for both waves and δ is estimated. This model tests the hypothesis that the observed pattern of spread might be explained by an alteration in the transmission distance during the pandemic and by an increasing number of imported cases during the second wave.
M5 Parameter a is fixed at a = 1 km, whereas α 1 and α 2 are estimated separately, where α 1 regulates the kernel function in the first wave and α 2 in the second wave; χ = 1 for both pandemic waves and δ = 0. This model tests the hypothesis that the observed pattern of spread might be explained by an alteration in the transmission distance during the first wave and by a second change occurring during the second wave (possibly coming back to a value close to that obtained by the analysis of commuting data).
Note that model M1 assumes a = 4 km, while models M2-M5 assume a = 1 km. Model M1 assumes a = 4 km in order to be consistent with the work of Ferguson and colleagues [9] that we decided to use as reference model. However, fixing a = 4 km in models M2-M5 would impose a constraint on the fraction of infections within a certain distance, which is an undesired feature for our analysis aimed at estimating the transmission distance directly from seroepidemiological data. Fig.S3a shows that, by setting a = 4 km, even by using α = 6, a value much larger than those estimated for human mobility [9,19,20], a considerable fraction of secondary infections occur at a distance larger than 4 km from the place of residence of the infector, while this is not the case when a = 1 km. On the other hand, assuming a = 1 km does not impose any constraint on the maximum transmission distance, provided that appropriate values of α are used (Fig.S3b).

Model calibration
Model calibration is performed using Markov chain Monte Carlo (MCMC) sampling applied to the product of the binomial likelihoods (L 1 and L 2 ) of the age-specific prevalence of H1N1 antibodies observed in England respectively in August 2009 [8] (after the first wave) and in the period January-April 2010 [21] (after the second wave). Data are disaggregated into the groupings of regions specified in Table S2. The likelihood function L 1 is defined as where • Θ is the vector of free parameters; • M is the set of age groups considered in [8]; • n A m is the number of individuals in the m-th age group in Group A in the post-first wave dataset [8]; • r A m is the number of seropositive individuals (haemagglutination inhibition titre 1:32 or more) in the m-th age group observed in Group A in the post-first wave dataset [8]; • p A m (Θ) is the seroprevalence in the m-th age group simulated by the model with parameter set Θ in Group A after the first wave; • n B m is the number of individuals in the m-th age group in Group B in the post-first wave dataset [8]; • r B m is the number of seropositive individuals (haemagglutination inhibition titre 1:32 or more) in the m-th age group observed in Group B in the post-first wave dataset [8]; • p B m (Θ) is the seroprevalence in the m-th age group simulated by the model with parameter set Θ in Group B after the first wave.
L 2 is defined analogously to L 1 , but using the post-second wave dataset [21] and the respective grouping of regions (see Table S2).
All models share a set of five common parameters, i.e. three transmission rates (in households, β h , schools, β s , and 'other settings', β r ), the relative susceptibility to infection of adults (individuals aged ≥ 15 years) with respect to children (σ) and the multiplying factor for the transmission in 'other settings' while schools are closed (τ ). Model M3 has one further free parameter determining the (possible) increase of virus transmissibility during the second wave (χ). Model M4 has one parameter regulating the daily number of imported cases during the second wave (δ). Models M2-M4 have one free parameter regulating the kernel distance function K(d), i.e. α. Finally, model M5 has one free parameter (α 1 ) regulating the kernel function K(d) for  Table S3. Table S3: Set of free parameters for the different models considered.
The posterior distribution of Θ is determined using random-walk Metropolis-Hastings sampling [22]. For each of the five models we run four chains, each one having a different starting point. Each chain is based on 22,000 realizations. To perform the simulations of the calibrated models, for each model we considered only one chain (i.e., one starting point), we discarded the first 2,000 iterations (considered as burn-in period), and we run 20,000 realizations each one having the parameter set estimated by MCMC. We assume non-informative prior distributions on all model parameters (i.e., the prior distribution for β s is a flat distribution in [0, 1000]). On the contrary, for the transmission rate in households we assume a prior uniform distribution in [0, 3] according to the findings reported in [10]. Indeed, when assuming β h = 3, the household secondary attack rate is more than twice the one estimated for the 2009 influenza pandemic [10], thus providing an upper bound for the household transmission rate.

Computation of R e
The effective reproduction number R e represents the average number of infections generated by an infectious individual in a partly immune population. We estimated its value for the first and second wave from the corresponding epidemic growth rate [2,9,23,24].
Briefly, the exponential growth rates r 1 and r 2 of the two epidemic waves are estimated by fitting a linear model to the logarithm of influenza incidence over the the week of maximum growth. r 1 is computed on the basis of simulated data up to week 33, 2009, and r 2 on the basis of simulated data after week 34, 2009. The effective reproduction number is thus computed as: where T L = 1.5 days and T I = 1.6 days are the average duration of the latent and infectious period, respectively. The details on the derivation of such an equation can be found for instance in [24].
2 Additional results

Estimated models parameters and epidemiological indicators
Estimated values of all parameters for the five analyzed models are reported in Tab

Estimated transmission distance
For models M2-M5 (i.e., all models except model M1 for which the kernel parameters are kept fixed), we estimated very narrow kernels for the first wave (see Tab. S4). This resulted in a low average transmission distance (see Table 1 in the Main Text) and in a sharply decreasing distribution (see Fig. S4). However, although the narrow kernel, the models estimated a homogeneous attack rate within the entire England at the end of the pandemic. For instance, if we consider a variant of model M5, where α 1 = α 2 = 4.06 and not considering school closures during holidays (all the other model parameters are those estimated for model M5 and reported in Tab. S4), we found that the pandemic spreads in an heterogeneous way (see the different peak times for England regions reported in Fig. S5). However, at the end of the epidemic the final attack rate by region is very homogenous (see Fig. S6). Indeed, the pandemic is capable to reach all regions because of the combination of contacts in the general community and because of commuting from the place of residence to school. This is further supported by the analysis of models M2-M4. In fact, slight variations in the transmission distance are observed when all sources of infections are considered (Fig. S4b), suggesting a larger importance of school transmission later on in the epidemic (see also the fraction of cases by setting reported in Tab. S5), when the infection started to reach a larger geographical area. On the other hand, the estimated transmission distance in 'other settings' does not remarkably change over time according to models M2-M4 (Fig. S4a). Another factor contributing to the homogenous final attack rate by region is the increased transmission rate in 'other settings' during the summer school closure, which both contributed to keep influenza circulating and increased the chance of generating secondary infections outside the network of close contacts of household members and schoolmates.

Geographic spread of the pandemic estimated by models M1-M4
Model M1 estimates a rather homogenous spread of the pandemic in both waves (Fig. S7). In particular only a slight difference in the prevalence between London and West Midlands, and the other regions is estimated, especially if compared with the estimates of model M5 (compare Fig. S7a with Fig. 4a in the main text). Influenza activity during the summer is estimated to be rather high, which appears to be less compliant with ILI reported by the HPA than what estimated by model M5 (compare Fig. S7c with Fig. 2c in the main text). Nonetheless, the difference in the pandemic dynamics between London and the other regions of England is still clearly visible (Fig.S7d,e). The geographic spread of the pandemic estimated by model M2 is very close to that estimated by model M5 (compare Fig. S8 with Fig. 2 and 4 in the main text). Indeed for model M2 we estimated a very narrow kernel, similar to the one estimated for model M5 in the first wave (see Tab. S4), supporting the finding that the transmission distance was very low at least in the early phase of the pandemic. The epidemics simulated by model M2 then lead to a rather homogenous attack rate across England regions, partially thanks to a larger fraction of cases linked to contacts in 'other settings' (Tab. S4 and S5).
The geographic spread of the pandemic estimated by model M3 is reported in Fig. S9. Model M3 estimates a slightly more homogenous spread during the first wave ( Fig. S9a with Fig. 4a in the main text)) than model M5 and a markedly large influenza activity during the summer, especially in London area (Fig. S9). The final attack rate estimated by model M3 is quite compliant with that estimated by all the other models.
The geographic spread of the pandemic estimated by model M4 is reported in Fig. S10. Model M4 estimates a quite heterogeneous pattern across England regions during the first wave, very similar to that estimated by model M5 (compare Fig. S10a with Fig. 4a in the main text) and an homogenous attack rate by region at the end of the pandemic (compare Fig. S10b with Fig. 4b in the main text). However, model M4 also estimate a remarkable influenza activity during the summer (Fig. S10c,d,e), and an unreasonably large number of imported influenza cases over time (see inset in Fig. S10d).  Figure S7: Spatial spread of 2009 H1N1 influenza pandemic in England as estimated by model M1. a Age-specific seroprevalence (mean, 95% CI) by age group and region as reported in [8,21] (proportion of serum samples with haemagglutination inhibition titre 1:32 or more) and as estimated by model M1, as of August 2009 (i.e., at the end of the first epidemic wave). Regions are grouped as in the original works [8,21]; WMid corresponds to West Midlands. b as a, but as of January 2010 (i.e., at the end of the second wave   Figure S8: Spatial spread of 2009 H1N1 influenza pandemic in England as estimated by model M2. a Age-specific seroprevalence (mean, 95% CI) by age group and region as reported in [8,21] (proportion of serum samples with haemagglutination inhibition titre 1:32 or more) and as estimated by model M2, as of August 2009 (i.e., at the end of the first epidemic wave). Regions are grouped as in the original works [8,21]; WMid corresponds to West Midlands. b as a, but as of January 2010 (i.e., at the end of the second wave   Figure S9: Spatial spread of 2009 H1N1 influenza pandemic in England as estimated by model M3. a Age-specific seroprevalence (mean, 95% CI) by age group and region as reported in [8,21] (proportion of serum samples with haemagglutination inhibition titre 1:32 or more) and as estimated by model M3, as of August 2009 (i.e., at the end of the first epidemic wave). Regions are grouped as in the original works [8,21]; WMid corresponds to West Midlands. b as a, but as of January 2010 (i.e., at the end of the second wave). c Weekly incidence of new reported ILI cases in the UK [2] and weekly incidence of new infections estimated by simulating the calibrated model M3 (mean and 95%CI) for England. d Weekly incidence of new infections in London as estimated by model M3 (mean and 95%CI). e The same as d but for all other regions of England (grouped together)  Figure S10: Spatial spread of 2009 H1N1 influenza pandemic in England as estimated by model M4. a Age-specific seroprevalence (mean, 95% CI) by age group and region as reported in [8,21] (proportion of serum samples with haemagglutination inhibition titre 1:32 or more) and as estimated by model M4, as of August 2009 (i.e., at the end of the first epidemic wave). Regions are grouped as in the original works [8,21]; WMid corresponds to West Midlands. b as a, but as of January 2010 (i.e., at the end of the second wave). c Weekly incidence of new reported ILI cases in the UK [2] and weekly incidence of new infections estimated by simulating the calibrated model M4 (mean and 95%CI) for England. d Weekly incidence of new infections in London as estimated by model M4 (mean and 95%CI). The inset shows the posterior distribution (median, 50% CI, 95% CI) of the daily number of imported cases as obtained by Model M4 and as reported in the time-series of travel-related cases [16]. e The same as d but for all other regions of England (grouped together)

Supplementary Video S1
Supplementary Video S1 caption. Estimated spatiotemporal dynamics of the pandemic. Spatiotemporal pattern of influenza spread. Left panel shows the simulated median incidence of new weekly influenza cases by cell; right panel shows the simulated cumulative median incidence of weekly cases by cell. Averages computed on 2,000 simulations performed by using model M5. The maps were generated using Grass GIS 6.4.2 (https://grass.osgeo.org/announces/ announce_grass642.html).