Hand, Foot, and Mouth Disease in China: Critical Community Size and Spatial Vaccination Strategies

Hand Foot and Mouth Disease (HFMD) constitutes a considerable burden for health care systems across China. Yet this burden displays important geographic heterogeneity that directly affects the local persistence and the dynamics of the disease, and thus the ability to control it through vaccination campaigns. Here, we use detailed geographic surveillance data and epidemic models to estimate the critical community size (CCS) of HFMD associated enterovirus serotypes CV-A16 and EV-A71 and we explore what spatial vaccination strategies may best reduce the burden of HFMD. We found CCS ranging from 336,979 (±225,866) to 722,372 (±150,562) with the lowest estimates associated with EV-A71 in the southern region of China where multiple transmission seasons have previously been identified. Our results suggest the existence of a regional immigration-recolonization dynamic driven by urban centers. If EV-A71 vaccines doses are limited, these would be optimally deployed in highly populated urban centers and in high-prevalence areas. If HFMD vaccines are included in China’s National Immunization Program in order to achieve high coverage rates (>85%), routine vaccination of newborns largely outperforms strategies in which the equivalent number of doses is equally divided between routine vaccination of newborns and pulse vaccination of the community at large.

understanding of the spatio-temporal dynamics of HFMD in China could help to optimally deploy this new vaccine in light of the demographics and the economics of the country as well as the epidemiology of the disease (Wu et al., 21 in review).
Mathematical models have provided useful insights into the dynamics of similar self-immunizing childhood infections such as measles [22][23][24][25] , rubella [26][27][28] , or pertussis [29][30][31] . With regards to vaccination, models can be used to evaluate the impact of different vaccination strategies, and optimally allocate a limited number of vaccine doses across geographic areas 28 and age groups 32,33 . Previous model-based analyses have explored the seasonal patterns of HFMD and attempted to estimate key epidemic indicators such as the basic and effective reproductive numbers [34][35][36][37] . Additionally, Takahashi et al. 38 , recently used a time series Susceptible-Infectious-Recovered (TSIR) model to evaluate the potential for competitive release of CV-A16 for a hypothetical nation-wide vaccination campaign against EV-A71. Another important epidemic parameter to characterize in the context of vaccination campaigns is the critical community size (CCS)-the size of an isolated population below which a pathogen cannot persist without re-introduction 39 .
The CCS is a complex parameter that is specific to each disease and depends on the seasonality and the intensity of transmission 40,41 , the pattern of arrival of infected individuals, as well as the recruitment of susceptible individuals, either through variations in birth rate or migration 42 . In a context of high vaccination uptake, the CCS can strongly influence the spatio-temporal dynamics and hence the persistence of childhood disease in the long term. As China displays important geographic gradients in population density and in transmission intensity of HFMD 38 , characterizing the CCS across regions could affect the efficiency of competing spatially heterogeneous control strategies, allowing, for example, for reduced focus in coverage on zones below the CCS 43 . To our knowledge, none of the previous modelling studies on HFMD have attempted to quantify the CCS of the disease. The unprecedented spatial resolution of the Chinese surveillance dataset opens new possibilities to investigate the dynamics of HFMD across a range of community sizes and quantify its potential for local persistence.

Objectives
In this study we use detailed geographic surveillance data to quantify the CCS of HFMD in China's northern and southern regions using both observed and simulated incidence from TSIR epidemic models. Subsequently, we explore what spatial patterns of vaccination may best reduce the burden of HFMD in a context of limited vaccine availability.

Methods
Data. In this study, we used a four-year time series (2010-2013) of weekly incidence of clinically and laboratory confirmed cases of HFMD from the nationwide Notifiable Infectious Diseases Reporting Information System (NIDRIS) maintained by the Chinese Center for Disease Control and Prevention. These sparse time series data were available at the township level (4 th administrative division of China) and were aggregated to the county level (3 rd administrative division) and to the province or municipality level (1 st administrative division) for consistency with Takahashi et al. 38 . Weekly incidence of HFMD was also available at the prefecture level but was not used as too few time series experienced interruptions that could be exploited to evaluate the CCS. To reconstruct the respective incidence of EV-A71 and CV-A16 associated HFMD in each province, we used the proportions of each of those serotypes amongst lab-confirmed cases (other serotypes were not considered in this study). This procedure is identical to Takahashi et al. 38 using the data from NIDRIS. At the province level, there were moderate discrepancies between the aggregated incidence of CV-A16 and EV-A71 and the incidence from NIDRIS used by Xing et al. 12 and Takahashi et al. 38 (median provincial correlation = 0.65). At the county level, the incidence time series was reconstructed for each EV-A71 and CV-A16 using binomial sampling of all clinically confirmed HFMD cases, with a sampling probability equal to the proportion of cases associated with either EV-A71 or CA-A16 in the surrounding province. In this study all weekly incidence reports were pooled in years of 52 weeks for simplicity.
Estimation of critical community size (CCS). The CCS for EV-A71 and CV-A16 in China was estimated from the county-level incidence in the southern and northern epidemic regions of China (as defined in Xing et al. 12 ). For each strain we use two approaches to estimate the CCS: first using the observed incidence (corrected for underreporting), and second using simulated incidence from TSIR simulations 23,24,44 with immigration dynamics. CCS estimates from township-level incidence were not considered because in most locations, the time series were too sparse to apply the TSIR models. Underreporting rates were obtained for each province by estimating the weekly value of the slope of the relationship between the cumulative number of cases and the cumulative number of births 39 .
Where B t is the weekly number of births, ρ t is the reporting rate estimated using a smoothing spline with four degrees of freedom, ′ I t is the observed incidence, and D t represents the residuals of the fitted function at each time step (Appendix I). The numbers of births by province per week were derived from the yearly birth rates obtained from the National Bureau of Statistics of China. Because the reporting rates obtained were very low, we used a resampling procedure described in Appendix II that prevents underestimating of the tails of each epidemic wave when correcting the incidence for underreporting in small populations. Weekly births at the county level were calculated as the number of weekly births at the province level multiplied by the ratio between each county's population and its corresponding province population.
CCS from reported incidence. The CCS values obtained from the observed incidence (corrected for underreporting) were estimated using the relationship between the log10 of the population in each district and the proportion of zeros in each incidence time series. For each epidemic region and strain, the intersection of the fitted smoothing splines (df = 4, see Appendix IV for sensitivity analysis) with the population axis gives the estimate of the CCS 29 . The population estimate can be interpreted as the threshold population size below which the disease cannot sustain itself without repeated re-introduction CCS from simulated incidence. Next we simulated the weekly incidence of EV-A71 and CV-A16 in each county using a time series SIR model. The dynamics of the susceptible and infectious individuals within each spatial unit are given by Here, S t is the number of individuals that are susceptible, B t is number of births, I t is the number of infected individuals corrected for underreporting at time t, λ t+1 is the expected value of the number of individuals infected at time t + 1, θ t is the number of infected individuals immigrating in the spatial unit, β t , is a seasonally-varying transmission rate, α is a correction parameter accounting for non-seasonal heterogeneities in mixing as well as for time discretization 23,44 , and N t is the total population size in the spatial unit. The transmission rates β t , and the number of migrants θ t were fitted independently in each spatial unit (Eqs 6 and 7) whilst α was spatially invariant. For consistency with previous modelling works on childhood infections, we assumed frequency-dependent transmission, a choice motivated by the fact that for such childhood infections the social clique size is relatively constant 23,45,46 .
In this study we used the TSIR framework in a two-step procedure: For the first step, we fitted the model at the province level, omitting the immigration term, θ , as its influence on the dynamics is negligible in the very large populations that make up entire Chinese provinces. This allows us to estimate the seasonally varying transmission rates β t by taking the log transformation of Eq. 4: where α is fixed to 0.95 38 , and S t can be expressed as with D t estimated as the residuals of the fitted smoothing spline of the cumulative number of births on the cumulative number of cases (Eq. 1). The proportion of susceptible individuals s Mean for each province was taken from Takahashi et al. 38 .
In the second step, we use Eqs 2 and 3 to simulate the dynamics in each of the 2,897 counties that reported HFMD cases and we apply the following scaling factors ϕ β, and ϕ θ , with values specific to each individual county, on the provincial transmission rate β t and the immigration term θ t, such as, where N County and I t,Province are respectively the number of individuals in the county and the incidence in the province containing the county (Fig. 1). For each county, we ran a constrained minimization procedure with 500 simulations varying the scaling factors within a range of plausible values, φ θ = [0, 1] and φ β = [0, 10] (varying evenly across the corresponding range), the latter to enable the model to capture potential variation in the intensity of the frequency-dependent transmission (Fig. 1). The best set of scaling factors was identified by comparing four years of simulated incidence to the observed incidence (corrected for underreporting) using the square mean root error as the fitting metric. Simulations were run for 40 years to bring the system to its dynamic equilibrium, before comparison with the observed incidence time series.
Additionally, a conditional threshold of one weekly migrant was imposed during the minimization procedure in order to avoid biased values (overestimates) for φ β as a compensation mechanism for premature stochastic extinction in small counties. Using this condition for the minimization procedure did not substantially affect the within-county dynamics (Appendix V), as the number of infected migrants remained on average one order of magnitude lower than the observed incidence in the counties with the lowest HFMD presence (25% of total counties). When running simulations to estimate the CCS and test vaccination strategies this condition was relaxed in order to enable the extinction-recolonization dynamics. Finally, the simulated incidence obtained through this two-step procedure was used to calculate the CCS following the same methodology as described for the observed incidence time series (Fig. 2 for EV-A71 and Fig. 3 for CV-A16). Finally, the differences (or lack thereof) CCS obtained for each region, and incidence time series (observed vs predicted) was assessed using a three step procedure: fist we used the jackknife residuals of the fitted smooth spline to calculate the 95% CI around the intersection of the spline with the population axis. Second, based on this interval, we constructed normal distributions of the probable CCS value with a number of samples equal to the number counties used to estimate to CCS in each region. Third we used a two-sided t-tests on these distributions to assess the difference in CCS between the two regions. Two-sided t-tests weighted by the number of counties used for the estimation (Figs 2 and 3).
Scientific RepoRts | 6:25248 | DOI: 10.1038/srep25248 Vaccination Strategies. In each county, simulations of vaccination were run using the scaling factors φ β , and φ θ , identified from the optimization procedure, as well as the initial number of infectious and susceptible individuals on the first week of the 39 th year of simulation. Epidemics were run for 20 additional years before vaccination started.
First, we studied the impact of routine vaccination of newborns with increasing coverage rate p on the CCS of HFMD in each epidemic region of China (Fig. 4).
Second, we considered the eventual targeting of a limited number of vaccines to newborns in 'priority areas' , thus implementing spatially heterogeneous vaccination strategies. The outcome of each strategy is measured by the decrease in the cumulated number of cases 5 years and 20 years following vaccination. In this instance, for each week, a number of doses equal to half the number of births per week in 2013, (n = 101,596 individuals) was distributed across the country according to five spatial strategies: A) Vaccination targeted in counties with the lowest population (n = 1,601 locations), B) vaccination in counties with the highest population (n = 678 locations), C) vaccination targeted in the counties which population is the closest to the regional CCS (n = 1,155 locations), D) vaccination in randomly selected counties (n = 1,217 locations), and E) vaccination in counties with the highest average incidence (n = 1,669).
Third, we considered a case in which newborns across the entire country are vaccinated routinely with a coverage rate of 85%, in every county (strategy F). This value was chosen to reflect the typical coverage for vaccines included in China's National Immunization Program. We compared this strategy with the five spatially explicit vaccination strategies: the total number of doses used for strategy F was targeted to specific counties according to a spatial strategy, A, B, C, D, or E. In each county selected for vaccination, 85% of newborn were vaccinated and, in addition, an equal number of doses was administered through a single pulse vaccination of susceptible individuals on the 6 th week of the year (which corresponds to the minimum of the yearly incidence). Thus for strategies A, B, C, D and E, 42.5% of newborn in the whole country received routine vaccination. But vaccination at birth only occurs in selected counties (with coverage 85%) and is complemented by pulse vaccination. In the counties selected for vaccination, Eq. 2 can be extended for each year as follows, where the weekly number of birth B t is shrunk by a factor (1-p) with p corresponding to the vaccination coverage at birth adjusted for vaccine efficacy (94.80%) 19 and (52· p·B t|t=6 ) corresponds to the number of vaccine doses administered through pulse vaccination on the 6 th week of a given year, once a year. Finally, for routine vaccination of newborns, we also considered coverage rates reflecting differences in socio-economic status across provinces (strategy F*), see Appendix III). The provincial coverage rates were derived for three childhood vaccines (Haemophilus influenza type B vaccine, rotavirus vaccine and Varicella vaccine). These were obtained through a literature review and stratified across three socio-economic groups (high-, middle-, and low-income provinces). Each province was assigned a minimum, mean and maximum coverage rate based on the literature available in its corresponding socio-economic group. Because these vaccines and the current candidate vaccine against EV-A71 are not currently part of the National Immunization Program of China's Ministry of Health, the uptake in each province may be influenced by local economic conditions 47 . All simulations of vaccination were replicated 250 times to compute confidence intervals through Monte Carlo simulations for the reduction of incidence after 5 and 20 years. The differences (or lack thereof) in the reduction of the number of infected individuals associated with each vaccination strategy was assed using an analysis of variance followed by multiple comparison test (Tukey).

Results
Model Fitting. For EV-A71, the median correlation coefficient and median R 2 between observed and simulated incidence at the provincial level were respectively 0.88 and 0.78 (Table 1). For CV-A16, the median correlation coefficient and median R 2 between observed and simulated incidence at the provincial level were respectively 0.90 and 0.81 (Table 2). A scaled TSIR model including the extinction-recolonization dynamics could be fitted in 2,897 counties. The incidence weighed correlation coefficient was 0.47, and the R 2 was 0.27 for EV-A71 and 0.49 and 0.30 for CV-A16. The median reporting rates were very low, 4.20% for EV-A71 (Table 1) and 2.90% for  (Fig. 3), in the northern region the CCS was 679,249 (± 190,549 C.I. 95%) for the estimate obtained from the observed time series and 332,983 (± 213,456 C.I. 95%) for the estimate obtained from the simulated time series. In the southern region, the estimates for the CCS were respectively 554,268 (± 266,154 C.I. 95%) and 315,768 (± 249,074 C.I. 95%) for the estimates obtained from the observed and simulated time series. We thus find comparable measures of CCS for CV-A16 and EV-A71. Significant differences in CCS were identified between regions (North versus South) when considering the observed incidence time series (p-value = < 0.001 for both CV-A16 and EVA-71) but not when considering the simulated incidence for CV-A16 (p-value = 0.35). When considering the simulated incidence for EVA-71, a significant difference was identified between the northern and the southern region (p-value < 0.001) but its magnitude was limited compared with the difference in CCS obtained from comparing the observed incidence time series. Additional evidence for stronger persistence of EV-A71 in the southern region comes from the higher proportion of counties without a single epidemic fadeout in the southern region compared to the northern region. The southern region had an observed 21% and a predicted 62% of counties with no epidemic fadeout,   while the northern region had less than 1% observed and 14% of predicted counties with no epidemic fadeout between 2010 and 2013.

Vaccination Strategies. Simulations of routine vaccination of newborns across China suggest that the CCS
would significantly increase at high coverage rates (Fig. 4A). At a vaccination rate of 85%-as expected if the vaccine is included in the NIP-the benefits of routine vaccination would be comparable in the northern and southern regions with increases in CCS of 130% (from 390,850 to 900,750) and 125% (from 336,979 to 758,332) respectively. Further increasing vaccination rates (> 90%) to raise the CCS would marginally benefit the Northern region relative to the Southern region (Fig. 4B), this is that any increment of 1% in vaccination coverage yields to a higher increase in CCS in the northern region compared to the Southern region. However these estimates should be treated with caution; at high vaccinations rates (95-99%) there are only a few dozen counties experiencing discontinued epidemics that could be used in further simulations to estimate the CCS. Five spatial vaccination strategies were simulated at the county level (Fig. 5, Tables 3 and 4). For Strategy A, large rural counties in the western region of China, Sichuan and Heilongjiang provinces as well as the peri-urban counties of the South East were selected for vaccination. All major agglomerations were excluded. For Strategy B areas subject to vaccination are located in major urban center such as the south of Guangdong province, the municipalities of Shanghai and Beijing, as well as the surroundings of Chongqing, Chengdu and Hunan provinces. For strategy C most peri-urban areas in the country were selected for vaccination, but all the very rural or highly populated counties located on the South-East coast were excluded. For strategy E most urban centers were selected for vaccination with additional counties in the northern and western region.
In a context where the number of doses available is equal to half the number of yearly births (Table 3)  In a context where the number of doses available is sufficient to achieve 85% coverage of newborns, this vaccination strategy (F) outperforms all the spatially explicit strategies based on a combination of newborn vaccination and pulse vaccination using the same number of doses across the country (A-E). The routine vaccination of newborns at coverage rates reflecting the level of coverage achievable under the national immunization program could result in a reduction in the number of infectious individuals of 71.7% after 5 years and 77.8% after 20 years ( Table 4). The relative reduction in the number of infected individuals after 5 years for routine vaccination was 92% to 47% more important compared with all the spatially targeted strategies combining routine birth vaccination with pulse vaccination. Finally, the routine vaccination of newborns with coverage rates reflecting vaccine uptake for other similar childhood diseases that are not currently included in the NIP (strategy F*) showed a low reduction in the number of infectious individuals (40% after five years) compared to the reduction that could be achieved under the NIP (> 85% coverage).

Discussion
Critical Community Size. In this study we used incidence time series and mathematical models to estimate the CCS of HFMD in China, and investigate the effect of spatially heterogeneous vaccination strategies against EV-A71. To our knowledge this study is the first to attempt to quantify the CCS of HFMD in China. Our results suggest a comparable CCS for CV-A16 and EV-A71 but significant differences between regions (North versus South). The magnitude of these regional differences, and in particular the smaller critical community sizes identified in the southern region are consistent with the longer seasonal persistence reported by Xing et al. 12 as well as and the bi-annual transmission dynamics identified by Takahashi et al. 38 . Longer transmission seasons in the South potentially allows the disease to persist in smaller communities by increasing chances of re-introduction, and this may be a critical feature to consider when implementing nation-wide vaccination campaigns. In this context, we identified that routine vaccination of newborns could substantially increase the population threshold for persistence of HFMD. The benefits of routine vaccination would be more pronounced in the northern region   at moderate vaccination rates. Overall, we found considerable evidence for local persistence of both serotypes of HFMD, in particular in the southern region. For each strain, EV-A71 and CV-A16, we found important discrepancies between CCS estimates obtained through simulation versus reported incidence, with ratios ranging from 2.03:1 to 1.56:1 depending on the serotypes and region considered. The smaller regional differences for the CCS obtained from the simulated incidence may further be attributed to this procedure because by indistinctively filling interruptions in times series the process may generate similar epidemic patterns from small outbreaks. CCS estimates obtained from the reported cases might at first be considered more reliable, as they are directly derived from only epidemiological evidence. However, we show that the reporting rates of HFMD are likely very low in agreement with previous evidence for underreporting 38 . The proportion of zeros in the incidence time series may thus be largely overestimated and consequently lead to overestimates of CCS 28 . Whilst our approach for correcting underreported time series partly attempts to address this issue (Appendix II), the adjusted incidence should nonetheless be treated with caution. As a consequence of the very low reporting rates, prolonged interruptions in reported incidence may correspond to either an actual interruption in transmission or no cases being reported. In the absence of detailed epidemiological studies on treatment-seeking behavior as well as a better quantification of potentially time varying reporting rates these two situations cannot be distinguished. In this context CCS estimates obtained from simulated incidence provide a methodological alternative to the correction of observed incidence with very low and therefore uncertain reporting rates. Additionally, the smaller regional differences for the CCS obtained from the simulated incidence may further be attributed the underreporting procedure and the migration terms as implemented in the TSIR model, because by indistinctively filling interruptions in times series the process may generate similar epidemic patterns for different areas characterized by regular small outbreaks. Whilst the accuracy of the estimates obtained through simulation is contingent on the parameterization of the immigration process as well as the mathematical structure of our epidemic model, we found that our estimates of CCS are not dissimilar from other (and more extensively studied) self-immunizing childhood illnesses such as measles , rubella and pertussis 29 . Spatial Vaccination Strategies. We simulated seven vaccination strategies, including five strategies to target a limited number of doses to specific geographic areas characterized by either high, low, medium and randomly selected population sizes or high prevalence of HFMD. The underlying and competing hypothesis as to why each vaccination strategy implemented could yield to the best results in term of reducing the number of HFMD cases goes as follows. On one hand, targeting vaccination primarily in highly populated (or highly infected) counties could decrease the intensity of the recolonization mechanism even if the level of vaccination is insufficient to locally eradicate the disease. This is possible because highly populated (or highly infected) counties contribute an important share to the provincial incidence which directly affects the intensity of the recolonization process as formulated in our model. This strategy could be seen as 'turning off the tap' of infections from large areas that might subsequently spillover into the smaller areas. On the other hand, the alternative hypothesis which consists of targeting vaccination primarily in many small populations is related to the idea that if the small populations are brought under a critical threshold, the probability of stochastic extinction at the scale of small populations increases substantially, and thus epidemics will progressively fade out as the result of susceptible depletion. An analogy for this strategy could be seen as 'putting out a large fire with a blanket' , working from the edge to limit the oxygen uptake and progressing towards the center of the fire. In this study, we observe that the 'turning off the tap' strategies-targeting vaccination in high prevalence areas or large populations-outperforms the other strategies implemented. Vaccination targeted in populations close to the CCS does not have a strong advantage over vaccination in small areas 43 . Thus if vaccination campaigns against HFMD should be implemented in the foreseeable future, and if the number of doses is limited, our results suggest these resources would be optimally deployed in highly populated urban areas and in high prevalence areas which is potentially also easier to implement from a logistical standpoint. If the number of doses is sufficient to cover all new births, our results suggest that routine vaccination at birth would be the most effective allocation of vaccines provided that the vaccine is included in the NIP, and reaches high coverage level (> 85%). All spatially heterogeneous vaccination strategies based on a combination of pulse vaccination and birth vaccination were largely outperformed by routine vaccination with an equal number of doses by a large margin.
Limitations. As with any model-based study, our analysis is subject to assumptions and limitations. The TSIR framework used in this study does not presently account for age structure, cross-protection between HFMD serotypes or waning of vaccine-induced immunity (Hsiao et al., in prep 45 ,). Our estimates for the reduction in the number of cases after 5 and 20 years may thus represent a conservative overestimate of the time necessary to reach the low levels of incidence presented in this study. Additionally the models used here were parameterized using the last four years of the time series of HFMD incidence. This choice was motivated by the unreliable reporting of HFMD in 2008-2009 that resulted in poor statistical fits at the province level if reports prior to 2010 were included. Using a 4-year time series here is necessarily short compared to other studies using the TSIR framework, and this may thus affect the robustness of our simulations. Additionally, in order to fit epidemics parameters in each county we used a scaling methodology that involves a baseline immigration of one migrant per week. Whilst this has limited effect on the overall dynamics of HFMD within each spatial unit ( Supplementary Fig. S4) it may have marginally affected our final estimates of CCS. Finally, as surveillance has improved over the last decade, more recent incidence reports may be associated with higher reporting rates 48 , although yearly variation is partially captured through the use of a time-varying function such as the smooth spline used in the study.
Conclusion and next steps. Over the last couple of decades seasonal HFMD epidemics occurred regularly in East and South East Asian countries. Over the course of this period, urban migrations and urban population in these regions have increased in an unprecedented way (World Bank 2015). Symptomatic of this trend is the fact that 4 of the world's 10 largest cities in 2015 49 were located in China. In this study, we highlighted the impact of such densely populated areas on disease dynamics: first these densely populated areas allow a disease such as HFMD to persist over large geographic areas despite its relatively high CCS, and second these areas play a key role in sustaining the immigration-recolonization dynamics in less densely populated peri-urban areas. Further work on the timing, seasonality and synchrony of epidemics along an urbanization gradient could help better understand the spatio-temporal dynamics of HFMD as well as other emerging childhood diseases in Asia. In this context, data on urbanization and within-China population movements could help to shed light on the mechanistic process underlying the apparent dynamics of numerous childhood diseases.