Did Large-Scale Vaccination Drive Changes in the Circulating Rotavirus Population in Belgium?

Vaccination can place selective pressures on viral populations, leading to changes in the distribution of strains as viruses evolve to escape immunity from the vaccine. Vaccine-driven strain replacement is a major concern after nationwide rotavirus vaccine introductions. However, the distribution of the predominant rotavirus genotypes varies from year to year in the absence of vaccination, making it difficult to determine what changes can be attributed to the vaccines. To gain insight in the underlying dynamics driving changes in the rotavirus population, we fitted a hierarchy of mathematical models to national and local genotype-specific hospitalization data from Belgium, where large-scale vaccination was introduced in 2006. We estimated that natural- and vaccine-derived immunity was strongest against completely homotypic strains and weakest against fully heterotypic strains, with an intermediate immunity amongst partially heterotypic strains. The predominance of G2P[4] infections in Belgium after vaccine introduction can be explained by a combination of natural genotype fluctuations and weaker natural and vaccine-induced immunity against infection with strains heterotypic to the vaccine, in the absence of significant variation in strain-specific vaccine effectiveness against disease. However, the incidence of rotavirus gastroenteritis is predicted to remain low despite vaccine-driven changes in the distribution of genotypes.


Initial conditions
We initialized the models assuming there was one primary infection in each age class (except for the <2 month old age group), or 20 primary infections with each of the G1-G3 strains for the strainspecific model, beginning in the first week of January 1970. All other individuals in the population were assumed to be fully susceptible, with the exception of the <2 month olds, who we assumed began in the M state. In the strain-specific model, we introduced a single primary infection with G4 in January 1971 and a single primary infection with G9 in January 1989, to allow for some differentiation among the non-G1 P [8] strains and in line with evidence of the more recent emergence of G9P [8]. We then ran the model until quasi-equilibrium, which appeared to have been reached by the start of our datasets (which varied from January 1993 for the GUH data to July 2004 for the Carenet-NCSF data). Hence, the model "burn-in period" was 23 years when comparing model output to the GUH non-strain-specific data, 29.7 years for the GUH strain-specific data, and 34.5 years for the Carenet-NCSF data.

Model fitting procedure
We calculated the model-predicted incidence of severe RVGE (D a,w ) for the non-strain-specific model as: The log-likelihood of the model was calculated by assuming that the number of reported hospitalizations was Poisson distributed with a mean equal to the model-predicted incidence of severe RVGE (D a,w ) times a hospitalization/reporting factor (h a , estimated) and a correction factor that accounts for changes in the coverage of the Carenet database over time and the proportion of the Belgian population covered by NCSF (c w ): We allowed the reporting factor h a to differ between children <2 years of age and older children and adults, with the reporting factor in 2-year olds equal to the mean of these two reporting factors, in order to account for differences in testing and diagnosis rates.
For the strain-specific model, we evaluated the model fit based on the ability of the model to reproduce three components of the data: (1) For the pre-and post-vaccination genotype distributions, we assumed the observed total number of infections with each genotype followed a multinomial distribution with the probability p g (where g=1 corresponds to G1, g=2 corresponds to G2,…, and g=5 corresponds to G9) given by the modelpredicted genotype distribution, such that the log-likelihood of the pre-vaccination (LL pv ) and postvaccination (LL v ) genotypes distributions are given by: where is the sum of across all weeks w prior to the introduction of vaccination (September 2007-December 2005, is the total number of infections with each genotype g from the postvaccination RSNB data, and (Note that week w=328 corresponds to the last week of December 2005, w=419 corresponds to September 2007 when the RSNB data begins, and w=720 corresponds to the last week of June 2013.) We scaled each of these components as follows such that they contributed approximately equally to the overall log-likelihood (LL S ): where T=720 is the length (in weeks) of the RVGE time series from GUH.
We started by generating 100,000 parameter sets by sampling from reasonable parameter ranges for the nine parameters to be estimated using LHS (Table S1). For each parameter set, we randomly sampled from the 10 best-fit parameter sets for the non-strain-specific model and fixed the values of θ N = {R 0 , ρ A , b 0 , φ, ω} at their corresponding estimated values (Table 1). We then simulated the strain-specific model under each of the sampled set of parameters and evaluated the log posterior probability.
Using a simplex search algorithm starting from the 10 parameter sets that yielded the highest posterior probability, we again found there were multiple parameter sets with approximately equal support (Table S2).
To determine why we could not identify a global maximum in the posterior probability surface of the strain-specific model, we calculated the conditional posterior probability profiles around the estimated parameter set θ S  (where θ S = {r 1 , r 2 , σ HO , σ PH , σ HE , ξ, s RV1 , s RV5 , h G }) that yielded the lowest negative log posterior probability (up to a normalizing constant) (Fig. S4). We varied each of the parameters over a range consistent with that observed among the various estimates while holding the other parameters fixed and calculated the negative log posterior probability. We repeated this analysis using both the (default) non-stiff explicit Runge-Kutta (4,5) solver ("ode45" in MATLAB) and a stiff ODE solver ("ode23tb" in MATLAB), since there appeared to be small amounts of integration error (e.g.
Fig . S4h). The problem was accentuated using the stiff ODE solver, but the general shape of the profiles remained the same (Fig. S4). Examining some of the parameter sets that yielded a poor fit to the data, we discovered that the model had a tendency to exhibit multiannual or chaotic dynamics under certain conditions, particularly following vaccine introduction. This led to multiple peaks and valleys in the posterior probability surface (e.g. Fig. S4a), such that small changes in one of the parameters could lead to large changes in the log posterior probability (e.g. Fig. S4a, Fig. S6). Some of the conditional posterior probability profiles did not appear to exhibit unique maxima ( Fig. S4a- suggesting that the (unconditional) likelihood profiles could not be used to derive confidence intervals for the parameters.
To address the problem with the integration error, we first calculated the residuals of the conditional posterior probability profile for one of the parameters (ξ) versus the best-fit polynomial function, which appeared to provide a good approximation to the posterior probability surface (Fig. S5). We then scaled the log posterior probability such that there would be at most a 5% difference in the posterior probability at or near an optimum.
In order to fully describe the multi-modal posterior probability surface, we used importance sampling to estimate the posterior distribution of the model parameters, as described in the main text. For this analysis, we fixed θ N at the parameter set corresponding to the to the parameter set θ S  that yielded the highest posterior probability identified from the simplex search (θ N corresponds to Model 1 in Table   1; θ S  corresponds to Model 2 in Table S2). However, as a sensitivity analysis, we repeated the procedure using the parameter set θ N that yielded the highest posterior probability for the non-strainspecific model fit to the Carenet-NCSF data (Model 10 in Table 1). While the posterior distribution of some of the parameters was shifted slightly higher, the qualitative conclusions remained the same (Fig.   S7).  Figure S3. Top 100 parameter sets for the strain-specific model during the initial stage of model fitting. The negative log posterior probability of the model given the data is plotted against the sampled parameter value for (a) the relative infectiousness of the non-G1 P[8] strains compared to G1P[8] (r 1 ); (b) the relative infectiousness of G2P[4] compared to G1P[8] (r 2 ); (c) the relative risk of second infection with a partially heterotypic strain (different G-type, same P-type) (σ PH ); (d) the relative risk of second infection with a fully heterotypic strain (different G-and P-type) (σ HE ); (e) the relative risk of second infection with a homotypic strain (same G-and P-type as the strain causing first infection) (σ HO ); (f) the proportion of fully vaccinated individuals who develop a broadly heterotypic immune response (ξ); (g) the proportion of those vaccinated with at least one dose of RV1 who seroconvert and therefore receive any protection (s RV1 ); (h) the proportion of those vaccinated with at least one dose of RV5 who seroconvert (s RV5 ); and (i) the reporting fraction for severe RVGE cases presenting to GUH (h G ). The marker colour and shape corresponds to the sampled fixed parameter set from the non-strain-specific model fits.  Figure S4. Conditional posterior probability profiles for the best-fit strain-specific model parameters. The negative log posterior probability of the model given the data is plotted against the sampled parameter value for (a) the relative infectiousness of the non-G1 P[8] strains compared to G1P[8] (r 1 ); (b) the relative infectiousness of G2P[4] compared to G1P[8] (r 2 ); (c) the relative risk of second infection with a partially heterotypic strain (different G-type, same P-type) (σ PH ); (d) the relative risk of second infection with a fully heterotypic strain (different G-and P-type) (σ HE ); (e) the relative risk of second infection with a homotypic strain (same G-and P-type as the strain causing first infection) (σ HO ); (f) the proportion of fully vaccinated individuals who develop a broadly heterotypic immune response (ξ); (g) the proportion of those vaccinated with at least one dose of RV1 who seroconvert and therefore receive any protection (s RV1 ); (h) the proportion of those vaccinated with at least one dose of RV5 who seroconvert (s RV5 ); and (i) the reporting fraction for severe RVGE cases presenting to GUH (h G ). The red lines represent the results when the model was simulated using a stiff ODE solver, while the blue lines represent the results for a non-stiff ODE solver. Only the results for the non-stiff solver are shown for (f-i) for clarity.  Figure S7. Sensitivity of parameter estimates for the strain-specific model to the choice of parameters estimated from the non-strain-specific model. Box plots of the posterior distributions for (a) the relative risk of second infection with a homotypic strain (σ HO, blue), partially heterotypic strain (σ PH , red), or fully heterotypic strain (σ HE , green) compared to the risk of first infection; (b) the relative infectiousness of non-G1 P[8] strains (r 1 , red) and G2 strains (r 2 , green) compared to G1 strains; (c) proportion of vaccinees who seroconvert and thus receive any benefit of vaccination with RV1 (s RV1 , purple) or RV5 (s RV5 , pink), and the proportion of vaccinees who receive broadly heterotypic protection equivalent to two natural infections (ξ, grey); and (d) the reporting fraction (h G , black) for moderate-to-severe RVGE cases in Belgium to be hospitalized and G-typed at GUH. The coloured boxes represent the posterior distribution for the θ N corresponding to the best-fit parameter set for the strain-specific model, while the white boxes represent the posterior distribution for the θ N corresponding to the parameter set the yielded the second highest log-likelihood during the simplex search.