Spatio-temporal analysis and prediction of malaria cases using remote sensing meteorological data in Diébougou health district, Burkina Faso, 2016–2017

Malaria control and prevention programs are more efficient and cost-effective when they target hotspots or select the best periods of year to implement interventions. This study aimed to identify the spatial distribution of malaria hotspots at the village level in Diébougou health district, Burkina Faso, and to model the temporal dynamics of malaria cases as a function of meteorological conditions and of the distance between villages and health centres (HCs). Case data for 27 villages were collected in 13 HCs. Meteorological data were obtained through remote sensing. Two synthetic meteorological indicators (SMIs) were created to summarize meteorological variables. Spatial hotspots were detected using the Kulldorf scanning method. A General Additive Model was used to determine the time lag between cases and SMIs and to evaluate the effect of SMIs and distance to HC on the temporal evolution of malaria cases. The multivariate model was fitted with data from the epidemic year to predict the number of cases in the following outbreak. Overall, the incidence rate in the area was 429.13 cases per 1000 person-year with important spatial and temporal heterogeneities. Four spatial hotspots, involving 7 of the 27 villages, were detected, for an incidence rate of 854.02 cases per 1000 person-year. The hotspot with the highest risk (relative risk = 4.06) consisted of a single village, with an incidence rate of 1750.75 cases per 1000 person-years. The multivariate analysis found greater variability in incidence between HCs than between villages linked to the same HC. The time lag that generated the better predictions of cases was 9 weeks for SMI1 (positively correlated with precipitation variables) and 16 weeks for SMI2 (positively correlated with temperature variables. The prediction followed the overall pattern of the time series of reported cases and predicted the onset of the following outbreak with a precision of less than 3 weeks. This analysis of malaria cases in Diébougou health district, Burkina Faso, provides a powerful prospective method for identifying and predicting high-risk areas and high-transmission periods that could be targeted in future malaria control and prevention campaigns.

www.nature.com/scientificreports/ research team found that the 27 villages were home to 7408 inhabitants. The villages were linked to 13 HCs. Villages and HCs were geo-referenced using Global Positioning System (GPS) (Fig. 1). Diébougou health district is located in South-Western Burkina Faso, a region characterized by a tropical climate with a dry season from October to April and a rainy season from May to September. The dry season is divided into a cold dry season lasting from December to February and a hot dry season lasting from March to April. Average daily minimum and maximum temperatures in the cold dry, hot dry, and rainy seasons are 18 and 36 °C, 25 and 39 °C, and 23 and 33 °C, respectively. Average annual rainfall is 1200 mm. The natural vegetation is dominated by wooded savannah dotted with clear forest gallery 17,18 . Let's remind that Burkina Faso is spread over 3 climatic zones: in the north, (Sahelian zone), rainfall is less than 600 mm/year. While the centre (or northern Sudanese) zone receives 600-900 mm/year, rainfall in the southern (or southern Sudanese, where Diebougou is located) zone exceeds 900 mm/year. Passive case detection. Case data for the 27 villages included in the REACT project were collected using continuous HC-based passive case detection during 2016 and the first 36 weeks of 2017, which corresponded to the period preceding the implementation of the interventions studied (i.e. pirimiphos methyl-based IRS, enhanced communication, and administration of ivermectin to domestic animals). Specifically, consultation data for village residents were retrieved from each HC registries and recorded using tablets equipped with Open Data Kit collect forms. A malaria case was defined as a person who presented with fever and had a positive RDT result.

Study period.
Of the 88 weeks of data collection, 52 weeks corresponding to an epidemic year (a complete malaria epidemic) were considered for spatio-temporal analysis. The epidemic year ran from week 20 (in May) of 2016 to week 19 (in May) of 2017 (Fig. 2).
Meteorological data. The meteorological data used in this study were drawn from the Era-5 dataset from 2016 to 2017 19 published by the European Centre for Medium-Range Weather Forecasts, which provides hourly estimates of several atmospheric and land parameters at a spatial resolution of 0.25°2 0 . These data were aggregated into weekly counts. The meteorological variables included in the analysis were: Weekly rainfall (mm), number of rainy days per week, weekly mean of daily average temperature (°C), weekly mean of daily minimum temperature (°C), weekly mean of daily maximum temperature (°C), weekly mean of daily average wind speed (km/h), weekly mean of daily average relative humidity (%), weekly mean of daily average atmospheric pressure (hPa), weekly mean of daily average cloud cover (%), and weekly mean of daily thermal amplitude (°C) ( Table 1).  Spatial analyses. Hotspots, i.e. high-risk clusters, were detected using the Kulldorf scanning method 22 with a Monte Carlo algorithm in a purely spatial analysis. The Kulldorf scanning method helps to identify spatial clusters based on geographical coordinates and to avoid the problem of multiple non-independent tests 22 . We defined clusters as aggregates of cases with observed values higher than expected (i.e. unlikely to have been obtained by chance). The p-value (i.e. the probability, under the null hypothesis, that the expected number of cases is the same or higher than the observed number of cases) was calculated for each cluster. Scan parameters were: elliptical window, non-overlapping clusters, maximum cluster size set at 50% of the population at risk, Monte Carlo replication number set at 999.
Temporal analyses. Lagged SMI selection. Several studies have observed a lag between malaria time series and meteorological data time series [14][15][16] . In view of this, we decided to investigate the time lag (in weeks) between the time series of weekly malaria cases and the time series of SMIs. Using a generalized additive model (GAM) with a negative binomial distribution and a smoothing spline function, we modelled the time series of total malaria cases (for all villages) as a function of each SMI for time lags ranging from 1 to 30 weeks (thus generating 30 models per SMI). The GAM is an extension of the generalized linear model (GLM): while it includes random effects in the predictor like the GLM does, it can be used with nonparametric smoothing terms instead of  www.nature.com/scientificreports/ constant parameters [23][24][25] . The usefulness of the GAM lies in the fact that it provides a flexible method to identify the effects of non-linear covariates in exponential family distributions and in likelihood-based methods [26][27][28] . However, instead of estimating a single parameter, the GAM provides an unspecified (non-parametric) general function that compares predicted response values to predictor values. We compared the 30 models generated for each SMI using the unbiased risk estimator (UBRE), i.e. an unbiased estimate of the mean square error of a non-linear biased estimator. For each SMI, the time lag associated with the best model (i.e. with the lowest UBRE) was selected for the multivariate analysis.
Multivariate time analysis. To account for the non-linearity of the relationship between the response and predictor variables, we analysed the time series of weekly cases reported in all villages during the epidemic year using a generalized additive mixed model (GAMM). To account for the non-independence of data from the same village or HC, we fitted this model with nested random intercepts for villages and HCs. The GAMM accounted for space and time processes, by using a Gaussian field and an auto-regressive model. The Gaussian field with a negative exponential variogram accounted for the spatial auto-correlation. The first-order autoregressive temporal auto-correlation structure was introduced, within the random part of the mixed model, to account for the temporal auto-correlation of malaria incidence. We analysed the time series of cases using selected lagged SMIs (with a smoothing spline function) and of the Euclidean distance between villages and their corresponding HCs as predictors. For each predictor, the standardized incidence ratio (SIR) was estimated by modelling the log-transformed population as the offset.
To account for the non-linearity of the relationship between the response and predictor variables, we calculated SIRs according to the deciles of the distribution of values for each predictor. Indeed, SIRs cannot be calculated with GAMs as they are with GLMs, because when the relationship between the response and the predictor is non-linear, SIRs are not constant across the range of values of the predictor 24,28 .
Lastly, we tested the multivariate model fitted with data from the epidemic year to predict the number of cases in both 2016 and 2017.
Software and packages. Statistical analyses were performed using R software (version 3.6.1) 29 . The PCA was performed using the PCA function in the FactoMineR package 30 . The GAMs and the GAMM were generated using the "gam" and "gamm" functions in the mgcv package, respectively [26][27][28] . Data overdispersion was tested using the "dispersiontest" function in the AER package 31 . The spatial analysis was performed using SatScan™ software (version 9.6). Maps were produced using QGIS software (version 3.10) 32 .
Ethics approval and consent to participate. The protocol of this study was reviewed and approved by the Institutional Ethics Committee of the Institut de Recherche en Sciences de la Santé (IEC-IRSS) and registered as No A06/2016/CEIRES and all the methods were performed in accordance with the guidelines and regulations stated in the protocol. Informed consent was obtained from all subjects and/or their legal guardian(s).

Results
Descriptive analysis. A total of 3179 malaria cases were reported in HCs during the epidemic year, corresponding to an incidence of 429.13 cases per 1000 person-years. On average, 61.13 cases per week were reported, with a peak of 132 cases in week 31 of 2016 (week 1 of August; Fig. 2). The curve of cases over the epidemic year shows two peaks (Fig. 2): a very pronounced peak between weeks 27 and 45 of 2016 (August to November), which accounted for 60% of cases, and a less pronounced peak between weeks 7 and 11 of 2017 (mid-February to the end of March), which accounted for 12% of cases.
Synthetic meteorological indicators. The PCA conducted using Kaiser's criterion led us to construct and retain two SMIs that explained 85.4% of the total inertia (Fig. 3A).
The values of SMI1 were positive between late June and early October, which corresponds to the rainy season (Fig. 4). The values of SMI2 were positive between mid-February and mid-June, which corresponds to the hot dry season (March-June), and between October and November, which corresponds to the transition period between the rainy season and the dry season. Both SMIs were negative throughout the cold dry season (December-mid-February) (Fig. 4). Spatial analysis. The spatial analysis allowed us to identify and map malaria hotspots for the epidemic year from week 20 (in May) of 2016 to week 19 (in May) of 2017. Four hotspots were detected that accounted for 1685 cases in 1973 inhabitants, i.e. an average incidence rate of 854.02 cases per 1000 person-years (Fig. 5). These hotspots were mainly located in the southern and central parts of the study area. The hotspot with the highest www.nature.com/scientificreports/   Temporal analysis. The time lag that generated the model with the lowest UBRE was 9 weeks for SMI1 and 16 weeks for SMI2. The multivariate analysis found greater variability in incidence between HCs (standard deviation (SD) = 5.74) than between villages linked to the same HC (SD = 0.69). The coefficient of the temporal autocorrelation structure indicated the presence of temporal autocorrelation between cases (Phi = 0.32, 95% CI [0.20, 0.38]).
In the multivariate model, lagged SMI1 and lagged SMI2 were significantly associated with the number of malaria cases at the village level (p < 0.001 and p < 0.001, respectively). The relationship between the number of cases and SMI1 (consisting mainly of precipitation variables) was positive and almost linear (Fig. 6A) across the range of values. A positive non-linear relationship was observed for SMI2 (consisting mainly of temperature variables), with the number of cases increasing for SMI2 values above zero (Fig. 6C). Below zero, changes in SMI2 values did not influence the number of cases (Fig. 6C). The Euclidean distance between villages and their corresponding HCs was not correlated to the recorded malaria incidence (p = 0.78).
Prediction. The multivariate model generated for the epidemiological year was used to predict the number of cases in the 27 villages for all of 2016 and for the first 36 weeks of 2017. The resulting prediction was superimposed on the time series of reported cases for graphical analysis (Fig. 7). The prediction followed the overall pattern of the time series of reported cases but with a tendency for underestimation, especially during the second peak in early 2017. In addition, the model predicted the onset of the malaria outbreak for the 2017-2018 epidemic year with a delay of three weeks. www.nature.com/scientificreports/

Discussion
In this study, we analysed the spatio-temporal distribution of malaria cases in 27 villages of South-Western Burkina Faso. The spatial analysis conducted using the Kulldorf scanning method helped to identify four malaria hotspots. The first three hotspots were located in the southern part of the study area and the last one was located in the central part, reflecting spatial heterogeneity in the distribution of cases. A comparison of the spatial distribution of these hotspots with that of mosquito vector density 33 showed no correlation between the two, leading us to conclude that the spatial heterogeneity of vector density does not explain the distribution of hotspots in our study area.
A number of studies have found an association between spatial inequalities in access to care and spatial heterogeneity of malaria incidence 34,35 . Yet, contrary to what has been reported elsewhere 36 , we failed to found a correlation between the number of malaria cases and the Euclidean distance between villages and their corresponding HC. We used Euclidean distance because it is considered the simplest proxy for travel time, which is  www.nature.com/scientificreports/ considered a good measure of access to care. However, Euclidean distance may not have been the best option, as roads in our study area are in highly variable condition and some become impassable during the rainy season, with some villages left completely isolated. Future studies in the region should use better proxies for travel time in trying to explain the detected hotspots 36 .
Since entomological factors and spatial inequalities in access to care failed to explain the distribution of hotspots in our study area, other potential explanatory factors should be investigated in the future, including socioeconomic factors (level of education, income, professional activity, individual and societal behaviour, etc.) [37][38][39][40] and factors linked to LLIN usage [41][42][43][44] . Such investigations could help to explain in particular why the two hotspots composed of a single village (Niombripo and Niombouna) had a much higher incidence than neighbouring villages. Nevertheless, hotspot analyses like ours make it possible to identify, in a simple and cost-efficient manner, villages that can constitute priority areas for intervention. Indeed, studies conducted elsewhere have shown that targeting hotspots helps to reduce malaria transmission 45,46 . This strategy is appropriate in resource-limited countries like Burkina Faso as it allows for efficient allocation of prevention resources [14][15][16] .
Our analysis of the temporal dynamics of malaria cases found a strong correlation between malaria incidence and two SMIs with specific time lags. These SMIs were constructed through a PCA of meteorological data derived from readily and rapidly available satellite imagery. The first SMI (SMI1: positively correlated with cumulative rainfall, humidity, cloud cover, and number of rainy days, and negatively correlated with thermal amplitude) corresponded to the rainy season, while the second (SMI2: positively correlated with temperature and negatively correlated with atmospheric pressure) corresponded to the warm periods preceding and following the rainy season. We found that SMI1 and SMI2 predicted the number of cases with a time lag of 9 and 16 weeks, respectively, which is consistent with studies carried out in Burkina Faso, Mali, and Ethiopia 14,47,48 .
In our study, the relationship between rainfall (SMI1) and the number of cases was quasi-linear, as was the case in a study performed in the Ouagadougou area of Burkina Faso 14 . By contrast, two studies conducted in the Sahel region-one in Mali (Niger River Valley, Timbuktu region) and the other in Senegal (Bambey and Fatick Health Districts)-found a monotonic non-linear relationship between rainfall and malaria incidence 15,16 . The drop in the number of cases above a certain level of cumulative rainfall observed in Mali and Senegal may be explained by the flushing out of larval breeding sites, which can lead to high mortality in Anopheles larval populations 49,50 and can reduce the human biting rate 50 . Vector populations are almost monospecific in these two countries: They are largely dominated by An. arabiensis in Senegal 51,52 and by An. coluzzii in Mali 53 . These two species are also present in our study area and in the Ouagadougou area of Burkina Faso 54 . However, in both these areas, they live in sympatry with both An. gambiae s.s. and An. funestus 33,[55][56][57] . The quasi-linear relationship observed in our study between rainfall and the number of malaria cases may be explained by the fact that these species are not very susceptible to flushing out, due to rapid larval development in the case of An. gambiae s.s. 58,59 and to a preference for deeper environments in the case of An. funestus 60 . These species may therefore relay An. coluzzii when abundances of this later fall due to excessive rainfall.
In our study, the relationship between the number of malaria cases and temperature (SMI2) was non-linear. This is consistent with findings from two other studies conducted in the Sahel region (in Mali and in the Ouagadougou area of Burkina Faso) 14,15 . However, unlike these studies, we found no negative relationship between the number of malaria cases and temperature at higher temperature values. This discrepancy may be explained by the fact that temperatures can reach higher values in Mali and in the Ouagadougou area (> 34 °C) than in the Diébougou region, which is sufficient to inhibit the development of Anopheles larvae 61 and to reduce the survival of adult Anopheles 62,63 . In addition, we found that below a certain temperature, an increase in temperature had no effect on the number of cases (a finding also observed by Cissoko et al. 15 ). Our hypothesis is that the increase in temperature, which should favour the development of Anopheles, is compensated by another phenomenon at low SMI2 values. While this phenomenon has yet to be clearly identified, high levels of LLIN usage during cooler periods may be a contributing factor 41 .
Our spatio-temporal model fitted with two lagged SMIs and case data for a single epidemiological year helped to predict the start of the next outbreak nine weeks in advance, but with an error of three weeks (i.e. the actual outbreak began three weeks before the prediction). The prediction was good enough to make it possible to issue early warnings and to organize local prevention campaigns ahead of time. Our model could probably be improved with routine inclusion of new data and regular updated predictions. This study can be generalized to determine optimal periods and zones for prevention campaigns or interventions campaigns related to weather-related diseases such as dengue fever, malaria, etc.… Indeed, prioritizing a few numbers of areas and periods is helpful in strengthening malaria control programme in the context of lack of resources. This is even more important when countries will reach the pre-elimination phase: resources should then be concentrated on the most effective areas and periods. This is the principle of the "bottle neck" approach 11 . On the other hand, recent papers have shown that there is no conclusive evidence that targeted hotspot interventions accelerate malaria elimination. Therefore, a targeted approach to high-risk individuals that allows for a precise delineation of parasite transmission networks within and between households may be investigated 45,64 . For this purpose, data from HC consultations should be made available quickly, ideally at the same pace as ERA5 meteorological data (i.e. within 5 days). This can easily be achieved by using connected tablets for data entry.

Conclusion
In this study, a spatial analysis was conducted that highlighted the spatial heterogeneity of malaria cases and helped to identify four malaria hotspots in South-Western Burkina Faso. In the temporal analysis, an effective predictive model was built with data obtained through passive case detection and with simple and accessible meteorological data. Future studies should further investigate the detected hotspots to identify the local determinants of transmission. Our spatio-temporal analysis provides a powerful prospective method to identify high-risk