Comparative epidemiology of poliovirus transmission

Understanding the determinants of polio transmission and its large-scale epidemiology remains a public health priority. Despite a 99% reduction in annual wild poliovirus (WPV) cases since 1988, tackling the last 1% has proven difficult. We identified key covariates of geographical variation in polio transmission patterns by relating country-specific annual disease incidence to demographic, socio-economic and environmental factors. We assessed the relative contributions of these variables to the performance of computer-generated models for predicting polio transmission. We also examined the effect of spatial coupling on the polio extinction frequency in islands relative to larger land masses. Access to sanitation, population density, forest cover and routine vaccination coverage were the strongest predictors of polio incidence; however, their relative effect sizes were inconsistent geographically. The effect of climate variables on polio incidence was negligible, indicating that a climate effect is not identifiable at the annual scale and suggesting a role for climate in shaping the seasonality of transmission rather than its intensity. We found polio fadeout frequency to depend on both population size and demography, which should therefore be considered in policies aimed at extinction. Our comparative epidemiological approach highlights the heterogeneity among polio transmission determinants. Recognition of this variation is important for the maintenance of population immunity in a post-polio era.


Supplementary
69 countries considered in random forests methods (R Core Team (2016) 1).
To assess the association between polio incidence and unvaccinated births, we fitted a generalized additive model (GAM) to the incidence data and per capita unvaccinated births. The residuals of the fitted GAMs were then used as the response variable in country-specific random forests models. In this way, the variance in the incidence data that could not be explained by per capita unvaccinated births could potentially be described by other polio covariates.
We ranked the covariates by their relative importance in explaining the residuals of the fitted GAMs. For each country, we computed the difference \(R^2_{\mathrm{All}} - R^2_i\), where \(R^2_{\mathrm{All}}\) is the \(R^2\) of the country-specific random forests model fitted to all predictors and \(R^2_i\) is the \(R^2\) of the model fitted to predictor i alone. A small value of \(R^2_{\mathrm{All}} - R^2_i\) indicates higher predictive power of factor i (Supplementary Fig. S3). Most of the variance in polio incidence was explained by population density, percent of people with access to improved sanitation facilities and per capita GDP (Supplementary Fig. S3).
Supplementary Figure S3. Difference in \(R^2\) of random forest models developed using 1) all predictors and 2) only one predictor. The response variable here is the residuals of (incidence rate ~ per capita unvaccinated births).
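The ranking rule described above can be sketched in a few lines. This is only an illustration: the function name `rank_by_r2_gap`, the covariate labels and every R² value below are hypothetical and are not taken from the study.

```python
def rank_by_r2_gap(r2_all, r2_single):
    """Rank covariates by the gap R2_All - R2_i, smallest gap first.

    r2_all: R^2 of the model fitted to all predictors.
    r2_single: dict mapping covariate name -> R^2 of the one-predictor model.
    A small gap means the single predictor recovers most of the fit.
    """
    gaps = {name: r2_all - r2_i for name, r2_i in r2_single.items()}
    return sorted(gaps.items(), key=lambda item: item[1])


# Hypothetical R^2 values for one country (illustration only).
r2_all = 0.60
r2_single = {
    "population_density": 0.55,
    "sanitation_access": 0.50,
    "temperature": 0.10,
}

ranking = rank_by_r2_gap(r2_all, r2_single)
for name, gap in ranking:
    print(f"{name}: R2_All - R2_i = {gap:.2f}")
```

A negative gap would mean that the one-predictor model outperforms the model fitted to all predictors.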
To explore the association between incidence and per capita unvaccinated births in each country, we fitted a generalized additive model (GAM) with integrated smoothness estimation to the data.
We used a GAM because of its flexibility and because it makes no assumption about the form of the response function (e.g., linear or quadratic). To check the sensitivity of our conclusions to this choice, we also fitted a linear regression model to the incidence and the susceptible birth rate and compared model performance using AIC and an F-test. As shown in Fig. S4 for six randomly selected countries, the GAM provides a better or similar fit to the data and is associated with lower AIC values and a P-value of less than 0.05 (based on the F-test). We therefore consider the use of GAMs in our analyses to be warranted.
Supplementary Figure S4. Plots of logarithmic incidence rate against per capita unvaccinated births for six different countries. A Linear Model (LM) and a Generalized Additive Model (GAM) were fitted to the data points for each country. Akaike information criterion (AIC) values of the fitted models are given.
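The logic of comparing a straight-line fit with a more flexible fit by AIC can be illustrated with a minimal sketch. It rests on several assumptions: a quadratic polynomial stands in for the GAM smooth (the analysis itself used GAMs in R), the data are synthetic, and the AIC is the Gaussian least-squares form up to an additive constant. The F-test is not reproduced here.

```python
import math


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns coefficients, lowest order first."""
    m = degree + 1
    # Normal equations a @ coef = b for the basis 1, x, x^2, ...
    a = [[sum(xi ** (r + c) for xi in x) for c in range(m)] for r in range(m)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(a[r][c] * coef[c] for c in range(r + 1, m))
        coef[r] = (b[r] - tail) / a[r][r]
    return coef


def gaussian_aic(x, y, coef):
    """AIC (up to an additive constant) of a Gaussian least-squares fit."""
    n = len(x)
    rss = sum((yi - sum(c * xi ** p for p, c in enumerate(coef))) ** 2
              for xi, yi in zip(x, y))
    k = len(coef) + 1  # regression coefficients plus the error variance
    return n * math.log(rss / n) + 2 * k


# Synthetic, deterministic data with a curved relationship (illustration only).
x = [i / 10 for i in range(20)]
y = [xi ** 2 + 0.1 * math.sin(7 * i) for i, xi in enumerate(x)]

aic_linear = gaussian_aic(x, y, fit_polynomial(x, y, 1))
aic_flexible = gaussian_aic(x, y, fit_polynomial(x, y, 2))
print(f"AIC linear: {aic_linear:.1f}, AIC flexible: {aic_flexible:.1f}")
```

The model with the lower AIC is preferred. When the true relationship is close to linear, the two values are expected to be similar, which corresponds to the "better or similar fit" case described above.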
To identify potential thresholds in the association between incidence and per capita unvaccinated births in each country, piecewise (segmented) linear regression models were fitted to the data using the 'segmented' package in R (ref. 2). Segmented regression estimates a model with a broken-line relationship between the response and the predictor. This relationship is defined by the slope parameters and the breakpoints. The fitted segmented models show a threshold in per capita unvaccinated births below which there is negligible change in polio incidence; however, this threshold was not detected in all countries (Supplementary Fig. S5).
The L-shaped plot of country-specific per capita GDP against polio incidence (Fig. 3) suggests that there are two types of countries: (i) low-income countries with high incidence and (ii) high-income countries with low incidence. We therefore divided the data into two groups: countries whose per capita GDP exceeds $1,000, and countries with per capita GDP < $1,000. We explored the impact of economic growth on polio incidence for each group separately by fitting a generalized additive model (GAM) (see Supplementary Table S1). Among high-income countries, there was a significant reduction in model deviance using the smoothed response to vaccination (P-value = 0.002); however, the smoothed response to per capita GDP was not statistically significant. For high-income countries, the smoothed response to the interaction between GDP and vaccine uptake was also significant (P-value = 0.022) (see Supplementary Table S1). Among countries whose per capita GDP is less than $1,000, only the smoothed response to vaccination was statistically significant (P-value = 0.048) (Supplementary Fig. S6(b)).
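The idea of a threshold in per capita unvaccinated births, below which incidence barely changes, can be illustrated with a simplified sketch. It is not the estimation procedure of the 'segmented' package: the slope below the breakpoint is fixed at zero, the breakpoint is chosen by a plain grid search over candidate values, and the data are synthetic. All function names are hypothetical.

```python
def fit_flat_then_linear(x, y, psi):
    """Least-squares fit of y = a + b * max(0, x - psi) for a fixed breakpoint psi.

    Returns (intercept, slope, rss), or None if no point lies beyond psi.
    """
    h = [max(0.0, xi - psi) for xi in x]  # hinge feature: zero below psi
    n = len(h)
    mean_h, mean_y = sum(h) / n, sum(y) / n
    sxx = sum((hi - mean_h) ** 2 for hi in h)
    if sxx == 0:
        return None  # slope is not identifiable for this breakpoint
    sxy = sum((hi - mean_h) * (yi - mean_y) for hi, yi in zip(h, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_h
    rss = sum((yi - intercept - slope * hi) ** 2 for hi, yi in zip(h, y))
    return intercept, slope, rss


def find_breakpoint(x, y, candidates):
    """Grid search: return (psi, intercept, slope, rss) with the smallest rss."""
    best = None
    for psi in candidates:
        fit = fit_flat_then_linear(x, y, psi)
        if fit is not None and (best is None or fit[2] < best[3]):
            best = (psi, *fit)
    return best


# Synthetic data: flat up to x = 8, then increasing (illustration only).
x = [float(i) for i in range(20)]
y = [1.0 + 0.5 * max(0.0, xi - 8.0) for xi in x]

psi, intercept, slope, rss = find_breakpoint(x, y, x[1:-1])
print(f"breakpoint={psi}, intercept={intercept:.2f}, slope={slope:.2f}")
```

With real data, a breakpoint found this way should be compared against a single straight line before it is interpreted as a threshold, which is consistent with the observation that a threshold was not detected in every country.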

Supplementary Table S1. GAMs fitted to two groups of data: countries whose per capita GDP exceeds $1,000, and countries with per capita GDP < $1,000.
A single random forests model was fitted to all countries together as a check against possible idiosyncrasies of this approach. Population density, percent of people with access to improved sanitation facilities and per capita GDP were the most predictive covariates of polio incidence globally. Negative values of \(R^2_{\mathrm{All}} - R^2_i\) mean that the model fitted to one predictor performs better than the model fitted to all predictors (see Supplementary Fig. S7 and Table S2).
However, as shown in Table S2, the model fitted to all predictors explains 21 percent of the variance in the data. This result indicates geographic heterogeneity in polio incidence. To account for this variance and to identify the role of other covariates, a separate model needs to be fitted to each country (Table 2 and Fig. 4).
Supplementary Figure S7. Difference in \(R^2\) of a single random forests model developed for all countries together using 1) all predictors and 2) only one predictor at a time.

Random Forest
We ranked the covariates of polio incidence using the country-specific random forests models.
The average increase in the mean squared error (MSE) of predictions, ΔMSE, was used as the variable importance index (see Supplementary Table S3). To identify any possible grouping among predictors, the Pearson correlation matrix of the covariate rankings was calculated (Supplementary Fig. S8, in which circles mark significant correlations, P<0.05).
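As an illustration of the correlation step, the sketch below computes the Pearson correlation between the importance ranks of two covariates across countries. The rank vectors are hypothetical (rank 1 means most important) and cover only five imaginary countries. No significance test is included.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical importance ranks of two covariates in five countries.
urban_growth_rank = [1, 2, 3, 4, 5]
temperature_rank = [4, 5, 3, 1, 2]

r = pearson(urban_growth_rank, temperature_rank)
print(f"r = {r:.2f}")
```

A negative coefficient means that where one covariate ranks as highly important, the other tends to rank low.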
We found that the ranking of percent urban population growth is negatively correlated with the ranking of temperature among the 69 countries (r = -0.37). This suggests that in countries where urban population growth is an important predictive covariate of polio incidence, temperature tends to be less important, and vice versa. The same interpretation applies to percent rural population growth versus per capita GDP and versus percent forest cover, with r = -0.24 and r = -0.3, respectively. No other significant correlation was identified among the predictive power of covariates. This indicates that there are many statistically significant covariates of change in polio incidence rather than a single universal pattern.
Supplementary Figure S9. Comparison of polio incidence predicted by linear regression models with observed values for Afghanistan, Pakistan, Nigeria and India using three training and testing sets: a) fitted data, b) out-of-fit predictions, c) one-step-ahead predictions.

Polio persistence. Regional persistence of polio was evaluated in island and non-island countries as a function of number of unvaccinated births as well as per capita unvaccinated births.
A segmented regression model was fitted to the data. The model showed a threshold in the extinction frequency of polio for both island and non-island countries (Supplementary Fig. S10). Below this breakpoint, polio is unable to remain endemic in the community.
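This section does not spell out how the extinction (fadeout) frequency is computed. One common operationalization, assumed here purely for illustration and not necessarily the one used in the analysis, is the fraction of reporting years with zero cases. The function name and the case counts below are hypothetical.

```python
def fadeout_frequency(annual_cases):
    """Fraction of years with zero reported cases (assumed definition)."""
    if not annual_cases:
        raise ValueError("annual_cases must contain at least one year")
    zero_years = sum(1 for cases in annual_cases if cases == 0)
    return zero_years / len(annual_cases)


# Hypothetical annual case counts for one country (illustration only).
cases = [0, 3, 0, 0, 12, 1, 0, 0]
print(fadeout_frequency(cases))
```

A value computed this way for each country could then be related to the number of unvaccinated births, for example with a breakpoint model.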
Selected island countries are Antigua and Barbuda, Bahrain, Bahamas, Barbados, Brunei Darussalam, Comoros, Cabo Verde, Cuba, Cyprus, Dominica, Dominican Republic, Fiji,
One of the top-ranked predictors of polio incidence in our study was percent forest cover. For further interpretation, we plotted the incidence data against percent forest cover for randomly selected countries (Supplementary Fig. S11). Surprisingly, percent forest cover had a positive relationship with polio incidence. In these countries, population density and the percent of people with access to improved sanitation facilities decreased as percent forest cover increased.
Further explanation is given in the Discussion section of the main text.
Supplementary Figure S11. Plots of logarithmic incidence rate against percent arable and forest lands, population density, and percent of people with access to improved sanitation facilities for 11 different countries. The R-squared and p-value of the fitted linear regression model are given for each country.