Risk factors associated with Rift Valley fever epidemics in South Africa in 2008–11

Rift Valley fever (RVF) is a zoonotic and vector-borne disease, mainly present in Africa, which represents a threat to human health, animal health and production. South Africa has experienced three major RVF epidemics (1950–51, 1973–75 and 2008–11). Due to data scarcity, no previous study has quantified risk factors associated with RVF epidemics in animals in South Africa. Using the 2008–11 epidemic datasets, a retrospective longitudinal study was conducted to identify and quantify spatial and temporal environmental factors associated with RVF incidence. Cox regressions with a Besag model to account for the spatial effects were fitted to the data. Coefficients were estimated by Bayesian inference using integrated nested Laplace approximation. An increase in vegetation density was the most important risk factor until 2010. In 2010, increased temperature was the major risk factor. In 2011, after the large 2010 epidemic wave, these associations were reversed, potentially confounded by immunity in animals, probably resulting from earlier infection and vaccination. Both vegetation density and temperature should be considered together in the development of risk management strategies. However, the crucial need for improved access to data on population at risk, animal movements and vaccine use is highlighted to improve model predictions.

Rift Valley fever (RVF) is a zoonotic and vector-borne disease, mainly present in Africa, which represents a threat to human health, animal health and production. South Africa has experienced three major RVF epidemics (1950-51, 1973-75 and 2008-11). Due to data scarcity, no previous study has quantified risk factors associated with RVF epidemics in animals in South Africa. Using the 2008-11 epidemic datasets, a retrospective longitudinal study was conducted to identify and quantify spatial and temporal environmental factors associated with RVF incidence. Cox regressions with a Besag model to account for the spatial effects were fitted to the data. Coefficients were estimated by Bayesian inference using integrated nested Laplace approximation. An increase in vegetation density was the most important risk factor until 2010. In 2010, increased temperature was the major risk factor. In 2011, after the large 2010 epidemic wave, these associations were reversed, potentially confounded by immunity in animals, probably resulting from earlier infection and vaccination. Both vegetation density and temperature should be considered together in the development of risk management strategies. However, the crucial need for improved access to data on population at risk, animal movements and vaccine use is highlighted to improve model predictions.

R ift Valley fever (RVF) is an emerging zoonotic and vector-borne disease caused by infection with a
Phlebovirus (Family Bunyaviridae). The principal arthropod vectors are mosquitoes from the genera Aedes and Culex. The major hosts are sheep, cattle and goats, although the disease can also affect camels, buffaloes and many other mammalian species, including humans. The disease is mainly present in Africa, and represents a threat to human health, animal health and production 1 (24, 20 and 19 affected farms were reported, respectively) and spatially contained, whereas the 2010 and 2011 waves resulted in 471 and 124 affected farms, respectively, together affecting almost the whole country ( Figure 1B-F). In a previous study 3 , three of these five outbreak waves (first 2009 wave, 2010 and 2011) exhibited space-time interaction, which was interpreted as an indication of underlying short and long distance transmission mechanisms. Further research is required to explain the variation in the relative importance of different spatio-temporal transmission processes during those epidemics. Only few studies or field observations have been published on these epidemics and are summarized hereinafter. In 2008, cases were preceded by a period of heavy rainfall 4 , but like other outbreaks in the eastern part of the country, did not develop into a large epidemic. Little published information is available on the 2009 waves, and no unusual environmental conditions were found to be related to the second 2009 wave, although it occurred in an area where flood irrigation was practiced 5 . In 2010 and 2011, RVF epidemics followed periods of heavy rains and localized flooding, and the virus spread almost throughout the whole country.
The aim of the present study was to investigate environmental factors involved in the observed spatial pattern of RVF cases in South Africa across these five outbreak waves. A retrospective longitudinal study was conducted to identify and quantify spatial and temporal environmental factors associated with RVF incidence in South Africa.

Methods
Study design and input data. The study area was the whole country of South Africa, and the study period was January 2008 to July 2011. Since RVF is an internationally notifiable disease, RVF data used were outbreak reports collated from the World Animal Health Information Database 6-10 , totalling 658 RVF outbreaks in livestock farms (cattle, small ruminants or both). Information on the spatial distribution of livestock farms across the country was not available. Therefore the country was divided into a grid of 45,258 cells, each sized 0.05*0.05 decimal degrees, with these cells being the unit of analysis. Over the study period, time was represented as discrete units of one month each. Since no RVF outbreak was recorded in 2000-07 in South Africa 2 , all grid cells were assumed RVF-negative at the beginning of the study period (January 2008). A grid cell was defined RVF-positive if it had at least one livestock farm (cattle, small ruminants or both) that reported a RVF outbreak during a particular month; resulting in 18 positive cells for the 2008 outbreak, 18 and 17 for the first and second 2009 outbreaks, 436 for 2010 and 121 for 2011. Once positive, a cell was removed (censored) from follow-up.
Potential risk factors were searched by conducting a thorough literature review. The results of this literature review are detailed in Table S1. Factors that were accessible in a geo-referenced format were included in the analysis as covariates (Table 1), and each grid cell was attributed values of those fixed-time and timevarying covariates, the latter reflecting monthly intervals. In total, five time-varying environmental variables, namely Enhanced Vegetation Index (EVI, a measure of vegetation density) of the current and month prior to case (EVI t and EVI t-1 ), monthly average Land Surface Temperature (LST, day temperature) of the current and month prior to case (LST t and LST t-1 ) and vegetation density disturbance (EVI d ); and two fixed-time topographic variables, i.e. distance from rivers and waterbodies as well as land use, were considered in the analysis.
The dataset was then split into the five outbreak waves in order to conduct a separate analysis for each (January Fitting models to data. For each outbreak wave, a spatial semi-parametric Cox model with time-varying covariates using a Besag model for the spatial residual variability was fitted to the data to quantify the effects of potential risk factors on RVF hazard. A spatially correlated model was used to account for the data dependence between neighbouring grid cells. The corresponding hazard function h i (t) for the i th grid cell, defined up to cell i's infection time I i , was expressed as follows 11 : With In Equation 1, x ij (t) is the value of the j th time-dependent variable (or fixed effects) for the i th grid cell, at time t; and b j is the coefficient of the j th variable, and p the number of variables. The baseline hazard h 0 (t) is a non-specified positive function of time, which was assumed to be constant over the study period, and represents the hazard function for a grid cell for which all risk factor variables are equal to zero. The Besag model is used to account for the spatial dependence between the grid cells, by adding a spatially structured random effect s 5 {s 1 , ..., s n }. These random effects are defined in Equation 2, with n i being the number of neighbours of grid cell i, i,j indicating that the two cells i and j are neighbours 11 , and t being the precision of the random vector. The fixed effect and the spatial random effect coefficients were estimated from the data by Bayesian inference. The priors assigned to the model coefficients were normally distributed, as were the coefficients of the posterior distributions. In the model presented in Equations 1 and 2, the priors assigned to the fixed parameters were uninformative (normal distribution of precision 0.1 or variance equal to 10), as was the prior distribution assigned to the spatial random effects. For the latter, the precision t was defined as h 5 log(t), where the initial values of the hyperparameter h followed a log-gamma distribution (parameter values: scale 5 1, shape 5 5e205). Uninformative priors were used so all values could be tested and compared in order to estimate the parameters that fitted the best the data. Using informative priors would have expedited the analyses, but since no previous data were available, choosing arbitrary parameters for priors would have been difficult to justify.
The Bayesian model was fitted using integrated Laplace approximation (INLA) methodology 12 . For large datasets, routine MCMC implementations may be challenging, exhibiting convergence problems and requiring a long computational time.
As an alternative, INLA offers fast and accurate results compared to MCMC, as discussed in Rue 2009 13 . We note that the use of INLA requires that the latent variable follows a Gaussian distribution, as is the case for the spatial component in our model.
Univariable and multivariable analyses were conducted (see Supplementary  Information); models were compared and selected using the deviance information criterion (DIC) 14 ; the best and most parsimonious model was the one with the smallest DIC. The model fit was assessed visually by plotting the observed location of cases on a map of the models' fitted hazard ratio; and by an analysis of the martingale residuals' spatial pattern using semivariograms (Supplementary Information). Statistical analyses were implemented in R version 2.13.1 15 . The R-INLA package was used to fit the Besag spatial models 12 and the semivariograms were produced from the ''geoR'' package 16,17 . For the second wave of 2009, and in 2011, results exhibited a different pattern. In the second 2009 wave with cases located close to the Namibian border ( Figure 1D), RVF hazard was increased with proximity to rivers and waterbodies, but decreased with vegetation density and temperatures. In 2011, RVF hazard was also decreased with vegetation density, but increased with high temperatures.

Results
With regard to the visual analyses of the model fit, the models' fitted values for the 2010 outbreak showed a good fit to the observed data ( Figure 1E), but the spatial autocorrelation did not seem to have been removed from the model, especially beyond the distances of about 40 km (0.4 decimal degrees) ( Figure S1D). In 2008, the spatial analysis of the martingale residuals exhibited no spatial structure up to 0.4 decimal degrees ( Figure S1A); and, although positive cases were located in areas at higher risk, the model showed other highrisk areas in the northern and coastal regions of the KwaZulu-Natal, and the coastal region of the Eastern Cape, where no outbreaks were reported ( Figure 1B). In addition, both 2009 models did not show any residual spatial autocorrelation ( Figure S1B and S1C), the fitted values of both 2009 models captured the location of reported cases well, but large areas with no reported cases were predicted to be at higher risk ( Figure 1C and 1D). In 2011, the residuals did not show evidence of positive spatial autocorrelation ( Figure S1E), and the model did not describe the data well, as cases were located in lower risk areas ( Figure 1F).

Discussion
To our knowledge, this is the first study in South Africa which has investigated factors associated with RVF incidence in animals, using epidemic data and in combination with spatial and time-varying environmental conditions. In 2008, the first 2009 wave, and in 2010, RVF hazard was increased with increased vegetation density and if located near water or wetland areas. Whilst the 2008 and the first 2009 waves reported a small number of cases, during the largest epidemic in 2010, temperature above 25uC appeared to be a major risk factor. In the second wave of 2009, increased temperature and vegetation density were, contrarily, protective; but being further away from rivers or waterbodies decreased RVF hazard. In 2011, RVF hazard also decreased with increasing vegetation density and with higher temperatures. Whilst the factors identified for 2008 and for the first 2009 and 2010 waves were consistent with current knowledge on RVF epidemiology, those identified for the second 2009 wave and the ones in 2011 were not, and potential explanation for this are discussed in the following paragraphs. Finally, with regards to the model fit, although few cases were reported every year except in 2010, the locations of cases were fairly well predicted by the models, except for the 2011 wave; and the 2008 and both 2009 models highlighted some areas at higher risk in which no cases had been reported.
Vegetation density as a proxy for rainfall, together with temperature, were used as indicators of habitat suitability for vectors and viral amplification, and therefore were assumed to indicate a potential for underlying vector-borne transmission of the disease. Beyond the common water-related factors, the difference found between the 2008/first 2009 waves and the one in 2010, was temperature. In 2010, average temperature above 25uC was the most important risk factor. This finding is consistent with experimental RVF virus transmission studies conducted for two RVF vector species (Culex pipiens and Aedes taeniorhynchus). These studies showed that temperature above 26uC not only shortened the extrinsic incubation period (time interval between ingestion of virus and subsequent transmission by the mosquito) but also increased RVF virus transmission rate 18 ; and that these effects were reversed with temperature below 26uC in Culex pipiens 19 . As such, temperatures above 26uC may constitute a trigger for RVF epidemics, by amplifying RVF virus transmission in these mosquito species.
In 2010, a set of environmental conditions was observed that may have resulted in improved suitability for vectors' biological cycle and potentially for viral amplification. In 2008 and the first 2009 wave, the number of cases was limited and geographically contained. The absence of sufficient susceptible hosts, the early vaccination implemented on some of the affected and neighbouring farms 6,7 , or only short-term presence of suitable environmental conditions may have prevented a widespread epidemic. The 2008 wave, which began close to the Kruger National Park, could have resulted from re-emergence of the virus from reservoir hosts, since the incriminated RVF virus, of lineage C, was similar to the one that caused an outbreak in African buffaloes (Syncerus caffer) in the Kruger National Park in 1999 20 . The deposition of vertically infected eggs by mosquitoes that fed on those animals has been suggested as a possible mechanism for virus persistence in this part of the country 21 . The first 2009 wave, in KwaZulu-Natal, was caused also by the same lineage C 20 . It is therefore likely that the virus was imported from the 2008 cluster, and emerged following extensive periods of rain. Indeed, those cases were found to be associated with an increased vegetation density, suggesting the presence of conditions favorable for virus re-emergence, consistent with results from other studies 3,4 .
In contrast, for the second 2009 wave, which occurred in the Northern Cape province close to the border with Namibia, vegetation density and increased temperature were found to decrease RVF hazard, and proximity to rivers increased RVF hazard. These findings are consistent with an outbreak investigation that did not observe any specific rainfall events in that area at that time, suggesting that climatic factors were not incriminated, but suggested local flooding related to irrigation techniques as a potential cause of virus emergence 5 . These outbreaks were caused by a RVF virus from lineage H, the same lineage as the one identified in 2004 in Namibia 20 .
In 2011, the results are challenging to interpret, as RVF hazard was increased with increased temperature, but decreased with increased vegetation density. These observed results could have been confounded by the spatial distribution of immune animals, due to prior natural infection (natural RVF infection cause life-long immunity) or vaccination. Indeed, the 2011 cases mainly occurred in the previously least affected areas ( Figures 1B-F 22 . Therefore, the fact that the 2011 cases occurred in the lower risk areas ( Figure 1E) and that RVF hazard decreased with increasing vegetation density supports the hypothesis that animal immunity (resulting either from natural infection or previous vaccination), and not climatic factors played a major role in the spatial distribution of 2011 cases. To investigate in detail the effect of vaccination, data on temporal and spatial vaccine usage are necessary; unfortunately, this information remains unavailable.
The maps in Figure 1 showed that few cases were reported in areas predicted to be at higher risk (specifically in 2008 and 2009). This observation could have resulted from under-reporting, particularly during the later stages of an epidemic within a known affected region rather than for outbreaks in previously unaffected areas; but the models highlighted areas with environmental conditions potentially suitable for RVF occurrence, if virus was to be introduced. Indeed, these models did not explicitly account for between-farm transmission potential given virus exposure. However, the spatial proximity between grid cells was treated as a form of clustering, therefore artificially accounting for the fact that neighbouring cells were sharing similar attributes resulting from their spatial closeness. These common spatial attributes could be, for example, local transmission of the virus due to vector transmission, and this was well captured by all models which did not exhibit residual positive spatial autocorrelation, except beyond 40 km during the 2010 outbreak ( Figure S1D). In that year, the presence of positive spatial autocorrelation beyond 40 km suggests the presence of other spatial processes over further distances, which were not captured by the model, such as wind-carriage of infectious mosquitoes or the movement of infectious animals. Neither movements of infected animals nor of vectors could be accounted for as no data were available. Another potential explanation for the absence of cases reported from areas that were predicted to be at high risk by the models is the variation in the distribution of susceptible host populations. In the model all grid cells were considered equally susceptible at the beginning of follow-up and hence it assumes that the spatial distribution of farms was homogeneous throughout the country. This presents a necessary over-simplification of reality as no spatial information was available on the location of farms, livestock numbers or type of animal production systems (intensive/extensive, indoors/outdoors). The production system may be as important as the livestock numbers or farm location, specifically for the study of vector-borne diseases where it may have a substantial impact on animal exposure to mosquitoes. In 2007, the South African commercial agriculture census estimated about 40,000 farming units, but this estimate combined farming units with field crops, horticulture and animal production activities 23 , without any further detail. In the absence of such data, the assumption of a homogeneously distributed livestock population is left with few alternatives. However, this highlights the need for targeted data collection to help the development of betterinformed models.
Finally, it is acknowledged that the number of cases reported each year, and therefore used in the analyses was rather small, except for the 2010 and 2011 waves. This could have resulted in overfitting, which would necessitate, again a cautious interpretation of the results.
In addition, little is known regarding the spatio-temporal dynamics of vaccine use to control RVF in South Africa; therefore, the impact of vaccination could not be accounted for. Although cells that showed a report of a RVF case were considered ''immune'' and therefore removed from subsequent analysis, this may have only partly adjusted for the impact of immunization following infection, but not vaccination. Indeed, the implementation of vaccination, in local clusters in 2008 and 2009, and then countrywide in 2010 and 2011 6,7,24,25 , may have resulted in a large number of ''immune'' cells, which would not have been at risk anymore, thereby confounding the effect estimates in the Cox regression models. To ensure that the results of such a modelling framework may be reproducible over a wider geographic area, data recording the temporal and spatial aspects of vaccination should be published together with disease information.
In conclusion, this study used a novel Bayesian model fitting approach to provide the first quantitative estimates for risk factors of RVF incidence in livestock during epidemics in South Africa. Except for the second wave of 2009 (for which anthropogenic modification of the environment may have been the main contributor of the epidemics), an increase in vegetation density was the most consistent risk factor across outbreak waves, until the largest epidemic in 2010. In addition, increased temperature was an important environmental risk factor, and therefore should be considered as a key parameter, along with vegetation density, in further quantitative analyses on RVF. It is, however, critical that information on population at risk, animal movements and spatial and temporal patterns of vaccine use are considered when attempting forecast RVF epidemics.