Estimating the Attack Ratio of Dengue Epidemics under Time-varying Force of Infection using Aggregated Notification Data

Quantifying the attack ratio of disease is key to epidemiological inference and public health planning. For multi-serotype pathogens, however, different levels of serotype-specific immunity make it difficult to assess the population at risk. In this paper we propose a Bayesian method for estimation of the attack ratio of an epidemic and the initial fraction of susceptibles using aggregated incidence data. We derive the probability distribution of the effective reproductive number, Rt, and use MCMC to obtain posterior distributions of the parameters of a single-strain SIR transmission model with time-varying force of infection. Our method is showcased in a data set consisting of 18 years of dengue incidence in the city of Rio de Janeiro, Brazil. We demonstrate that it is possible to learn about the initial fraction of susceptibles and the attack ratio even in the absence of serotype specific data. On the other hand, the information provided by this approach is limited, stressing the need for detailed serological surveys to characterise the distribution of serotype-specific immunity in the population.

A remark on prior distributions and tail behaviour of the distribution of R t There are a number of approaches to deriving the distribution of R t . Alternatively to the approach described in the main text [1], one could use the conditional distribution of R t on Y t+1 and Y t as defined in equation A7 of Nishiura et al. [2]: Noticing the kernel of (1) is that of a gamma distribution with a 2 = Y t+1 + 1 and b 2 = Y t , we obtain a proper density from which to construct c α (R t ), simply by computing the appropriate quantiles of said distribution. This density is In order to decide which approach to take, it may be of use analysing the tail behaviour of the derived distributions for R t . Consider the case of using a flat Uniform(0, 1) prior for θ t . With a 0 = b 0 = 1, a 1 = a 2 and b 1 = b 2 + 1. The beta prime (inverse beta distribution) will have heavier tails compared to the conditional distribution proposed by [2], thus providing more conservative confidence/credibility intervals. To see that one needs simply take the ratio of the Beta prime and Gamma (unnormalized) densities and evaluate the limit as R t goes to infinity: Note also that we deliberately construct c α (R t ) as a equal-tailed 100α% credible set, rather than a less conservative highest posterior density (HPD) interval. Finally, we performed a simple simulation study to assess the result in equation 3. The simulations were carried out as follows: 1. Create a two-dimensional grid of values for λ 1 and λ 2 , with rate values from 2 to 1000; 2. For each point (λ 2 ) in the grid, • Generate 1000 realisations of the random variables Y • Compute the α% credibility/confidence intervals using the method proposed here (β) and using the distribution proposed by Nishiura The results show that the credibility intervals obtained using the distribution proposed here were larger than (i.e. contained) those computed using the distribution in 2 in 90.69% of the simulations (integrating over the grid), while the converse was observed in 1.14% of the cases. Moreover, our method yielded lower P r(R t > 1) in 79.48% of the simulations, indicating it is indeed more robust to false alarm. An R script to perform the simulation study described above can be found at https://github.com/fccoelho/paperLM1/blob/master/R/ credibility_comparison.R.
As a side note, the Bayesian approach presented in this paper will give similar results to orthodox confidence intervals [3] and [4] for Y t+1 and Y t >> 1. Under the flat uniform prior for θ t , the Bayesian posterior credibility interval is nearly indistinguishable from the confidence interval proposed by Clopper & Pearson (1931) [4] for Y t+1 , Y t > 20. Note also that the uniform prior (Beta(1, 1)) for θ t constitutes a poor prior choice mainly because the induced distribution for R t is only well-defined for b 0 > 2.
An advantage of the Bayesian approach is that one can devise prior distributions for θ t taking advantage of the intuitive parametrization and flexibility of the beta family of distributions. Prior elicitation can also be done for R t and the hyper-parameters directly plugged into the prior for θ t . One can, for example, choose prior mean and variance for R t and find a 0 and b 0 that satisfy those conditions. Let m 0 and v 0 be the prior expectation and variance for R t . After some tedious algebra one finds If one wants only to specify m 0 and the coefficient of variation c = √ v 0 /m 0 for R t a priori, some less boring algebra gives: Figure 1: S and I series fitted to simulated incidence data (blue dots) with S 0 = 0.0621, and R t estimated from real incidence data. This approach thus makes it possible to incorporate epidemiological knowledge about disease Biology (e.g. the magnitude of R 0 ) into the computation of R t . This may prove particularly important when disease counts are low and/or close to the detection threshold. We provide an R script to perform the elicitation described above at https://github.com/fccoelho/paperLM1/ blob/master/R/elicit_Rt_prior.R.
Estimating S 0 from simulated data In order to determine whether the inference methodology proposed can recover the true parameter values of the underlying simulation model, we have devised a simple simulation experiment. Using the SIR model presented in the main paper we have simulated the incidence curve of the 2012-13 epidemic, using S 0 = 0.0621 and R t , as estimated from actual incidence data, and τ = 1. Figure 1 shows the posterior S and I alongside with simulated incidence data. It is clear that the method can recover the correct value for S 0 used to simulate incidence. The script to generate the simulated data and run the inference is available at the paper's Github repository (https://github.com/fccoelho/ paperLM1/blob/master/python/fit_simulated_data.py).

DiffeRential Evolution Adaptive Metropolis (DREAM)
Since the posterior distributions desired in this paper are not available in closedform, we need to resort to numerical methods to obtain approximations. We employ an adaptive Markov chain Monte Carlo (MCMC) algorithm, proposed by [5], called DiffeRential Evolution Adaptive Metropolis (DREAM).
DREAM draws on the basic structure of Differential Evolution Markov Chain (DE-MC) which runs N independent chains in parallel and accepts moves proportional to the difference of two randomly sampled members (chains), thus differentially evolving conditional on the proposal variances. An additional aspect of DREAM is the so-called delayed rejection. This procedure adapts the original Metropolis-Hastings algorithm by attempting new moves whenever a proposal is rejected (instead of setting the current state to the previously accepted one), thus delaying the rejection of proposals. This potentially decreases autocorrelation and improves mixing (see [6] for details).
A desirable property of DREAM is that one can scale the algorithm with model complexity, meaning one can initialise as many parallel chains as there are parameters in the model, thus exploiting the multi-core architecture of modern computers to speed up convergence of the MCMC.
A Python implementation of DREAM for dynamic models is available in the package Bayesian inference with Python (BIP) [7] available at http://code. google.com/p/bayesian-inference/. The code to produce the results presented in this paper is available at https://github.com/fccoelho/paperLM1/ tree/master/python.