Spatio-temporal analysis of association between incidence of malaria and environmental predictors of malaria transmission in Nigeria

Malaria still poses a significant threat in Nigeria despite the various efforts to abate its transmission. Certain environmental factors have been implicated to increase the risk of malaria in Nigeria and other affected countries. The study aimed to evaluate the spatial and temporal association between the incidence of malaria and some environmental risk factors in Nigeria. The study used malaria incidence and environmental risk factors data emanating from 2015 Nigeria Malaria Indicator Survey accessed from the Demographic and Health Survey database. A total of 333 and 326 clusters throughout the country were used for malaria incidence study and environmental variables respectively. The spatial autocorrelation of malaria incidence and hotspot analysis was determined by the Moran’s diagram and local Moran’s I index, respectively. The relationships between the malaria incidence and the ecological predictors of transmission were analysed in all the six geopolitical zones of Nigeria from 2000–2015 using ordinary least square (OLS), spatial lag model (SLM), and spatial error model (SEM). Annual rainfall, precipitation and proximity to water showed significant positive relationship with the incidence rate of malaria in the OLS model (P < 0.01), whereas aridity was negatively related to malaria incidence (P < 0.001) in the same model. The rate of incidence of malaria increased significantly with increase in temperature, aridity, rainfall and proximity to water in the SEM whereas only temperature and proximity to water have significant positive effect on malaria incidence in the SLM. The modelling of the ecological predictors of malaria transmission and spatial maps provided in this study could aid in developing framework to mitigate malaria and identify its hotspots for urgent intervention in the endemic regions.

Modification of environment caused by impoundment for dam construction and irrigation schemes can influence the type and distribution of mosquito breeding sites 8 .
The application of spatial analysis of risk factors, including environmental factors that aid transmission, is very important in the fight against several vector-borne diseases including malaria. Three environmental variables, namely stream density, road density, and land surface, have been observed to be significantly associated with West Nile Virus using least squares regression (LSR) spatial analysis 9 . Visceral leishmaniasis transmission hotspots were also identified using global and local autocorrelation analyses 10 . In Bangladesh, spatial models showed normalized difference vegetation index (NDVI) as the best leading indicator of incidence of malaria transmission. Vegetation greenness was negatively correlated with incidence of malaria 11 . Despite the burden of malaria in Nigeria, the use of spatial statistics to examine the interrelationship between incidence of malaria and prevailing environmental factors is still relatively understudied. This method is very useful in identifying disease hotspots within a specific region for possible intervention.
The identification of malaria transmission hotspots through the use of spatial statistics for targeted intervention is important because, if interventions are not targeted, residual malaria transmission is likely to persist in hotspots 12 . Studies in East and West African countries have supported the observations on the persistence of malaria hotspots following overall reduction in malaria transmission 13,14 . This can have serious implications for malaria control, as transmission hotspots may stall intervention programmes. A logical and viable control intervention will therefore focus more on malaria hotspots. Given the low availability of resources in many malaria-endemic regions, spatial analysis for identification of hotspots for targeted control is more cost-effective. The study therefore investigates the spatial and temporal variation in malaria incidence rates using the nationally representative Malaria Indicator Survey of 2015, which covered 326 clusters in the six geopolitical zones of Nigeria for the period 2000-2015. It is proposed that certain environmental factors significantly influence the incidence rate of malaria in Nigeria in space and time. To establish this, a number of exploratory and spatial statistical models were used. It is expected that the models will identify malaria transmission clusters in Nigeria for necessary interventions.

Results
Malaria incidence in Nigeria. The incidence rate of malaria is presented in Table 1. Generally, the incidence of malaria was higher in the Northern than in the Southern region of Nigeria. The incidence of malaria was significantly higher in the North Central region of Nigeria than in the rest of the country (P < 0.05). No significant variation in incidence of malaria was observed among the three southern geopolitical zones of Nigeria, namely the South East, South South and South West (P > 0.05). The malaria incidence rate in rural residential areas (0.430 ± 0.103) was significantly higher than in urban settings (0.368 ± 0.123) (P < 0.05) (Table 2).
Between 2000 and 2005, there was no significant difference in the incidence rate of malaria in Nigeria. However, a significant drop in mean incidence from 0.436 ± 0.112 in 2005 to 0.377 ± 0.120 in 2010 was observed (Table 3). In 2015, a further significant decrease in mean incidence of malaria was observed (P < 0.05). The variations in mean incidence rate of malaria and environmental factors (from 2000-2015) that influence malaria incidence in the six geopolitical zones in Nigeria is presented in
Non-spatial relationships between incidence of malaria and environmental variables. A significant positive correlation occurred between the incidence rate of malaria and maximum temperature (r = 0.094, P < 0.05), and proximity to water (r = 0.216, P < 0.01). A significant negative correlation, however, was recorded between malaria incidence rate and aridity (r = −0.133, P < 0.01), rainfall (r = −0.094, P < 0.05) and precipitation (r = −0.100, P < 0.05). Rainfall correlated negatively with maximum temperature (r = −0.791, P < 0.001).
Non-spatial versus spatial regression on impact of environmental variables on incidence rate of malaria. Predictors such as annual rainfall, precipitation and proximity to water had positive and significant effects on the incidence rate of malaria (P < 0.01) in the OLS model, whereas aridity was negatively related to malaria incidence rate (P < 0.001) in the same model. The rate of incidence of malaria increased significantly with increase in temperature in the SLM and SEM spatial models (Table 4). In addition to the negative and significant impact of precipitation on incidence of malaria in the SEM, the coefficients of temperature, aridity and proximity to water were also positive and significant. The SEM, with the smallest information criteria values (AIC = −2686.945, BIC = −2645.809), provided the best explanation of the impact of the selected environmental factors on malaria incidence. The non-spatial OLS performed poorly compared to the spatial models.
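The reported information criteria can be checked for internal consistency. The sketch below is an illustration, not part of the original analysis: it assumes the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, and it assumes that the SEM was fitted to the 1264 retained samples described in the Methods (the text does not state which n entered the fit). Under those assumptions the gap BIC − AIC = k(ln n − 2) gives the implied number of estimated parameters k.

```python
import math

# Values reported in the text for the spatial error model (SEM).
aic = -2686.945
bic = -2645.809

# Assumption: the model was fitted to the 1264 retained samples.
n = 1264

# With AIC = 2k - 2lnL and BIC = k*ln(n) - 2lnL, subtracting gives
# BIC - AIC = k * (ln(n) - 2), so k can be backed out directly.
k_implied = (bic - aic) / (math.log(n) - 2)

# Log-likelihood implied by the AIC definition for the rounded k.
k = round(k_implied)
log_lik = (2 * k - aic) / 2

print(f"implied k = {k_implied:.3f}, implied log-likelihood = {log_lik:.3f}")
```

If the implied k is close to a whole number, the two reported criteria are mutually consistent with that sample size. Lower AIC and BIC indicate the preferred model, which is the basis for selecting the SEM over the OLS and SLM fits.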
Moran's I statistics for determination of spatial autocorrelation. A significant Moran's I statistic of 0.440 was observed (P < 0.05) (Fig. 1). Figure 2 shows the Moran's I scatter plot of the incidence rate of malaria. Points in quadrant I represent clusters with a high malaria incidence rate (relative to the average of the 344 clusters) surrounded by clusters with a high malaria incidence rate (HH); quadrant II shows regions with a low malaria incidence rate surrounded by clusters with a high malaria incidence rate (LH); quadrant III shows regions with a low malaria incidence rate surrounded by clusters with a low incidence rate of malaria (LL); and quadrant IV shows regions with a high incidence rate of malaria surrounded by clusters with a low incidence rate of malaria (HL). The plot showed more cluster points in quadrants I and III.
Table 4. Spatial models showing correlation between incidence rate of malaria and environmental variables.
The LISA significance map of the local Moran's I test for local spatial autocorrelation patterns of the incidence rate of malaria is presented in Fig. 3. The bright green and green shaded clusters represent regions whose malaria incidence rate showed significant local spatial autocorrelation (P < 0.05). Fig. 4. Further analysis of LISA showed that there were 344 hotspot locations distributed across the six geopolitical zones. The distribution of these hotspot locations revealed that the Northern geopolitical zones had the larger proportion, with North Central, North West and North East having 33.7%, 29.1% and 18.0%, respectively, whereas the proportions of hotspots in the South South, South East, and South West were 8.4%, 7.0% and 3.0%, respectively (Table 5). The distribution of hotspots of malaria incidence depicted in Fig. 5 showed a similar pattern across the geopolitical zones over time, with the North Central taking the lead, immediately followed by the North West. However, there was a systematic decrease in the number of hotspot clusters in each of the geopolitical zones from year to year. In general, the hotspot clusters reduced by 51.5% between 2000 and 2010.

Discussion
Malaria continues to be a serious threat in all regions of Nigeria. Studies across Nigeria have attributed higher prevalence of malaria, as high as 70-99%, to the South [15][16][17] . The reasons given in support of this are the higher rainfall, more water bodies and heavy forest, which are the predominant environmental factors that characterise the South and which aid malaria transmission in the region 18 . However, this study showed that the incidence of malaria, which is the number of new malaria cases during the 2000-2015 period, was significantly higher in Northern than in Southern Nigeria. Although North East Nigeria recorded a higher incidence than all the Southern regions, its incidence was lower than in the two other geopolitical zones in the North. This lower incidence could be attributed to larger coverage of insecticide treated nets (ITNs) in the region compared to the rest. In a report by the Nigeria Malaria Indicator Survey in 2010, 67.4% of individuals from the North East claimed ownership of at least one ITN, while 32.7% and 59.7% were reported for North Central and North West respectively 19 .
Although the prevailing climatic conditions appeared to be negatively correlated with malaria transmission in Northern Nigeria, areas surrounding the confluence of the Rivers Niger and Benue in the North Central and many isolated areas of the North East and North West parts of Nigeria have been reported to have as high as 70% prevalence of malaria 18 . Poor access to health care and public health services in isolated areas of the North and the impact of the Rivers Niger and Benue could be responsible for such high endemicity of malaria in these regions. Higher malaria incidence in the rural areas of Nigeria could be attributed to prevailing cultural practices that could predispose the people to infection by malaria parasites. Many of the rural areas in both the North and the South are isolated and very difficult to access. Besides, poor socio-cultural development of the areas and lack of basic social amenities often discourage health workers posted to these places. Because of these fundamental problems, health service delivery in those areas is often poor, which undermines the people's access to good health care services. Although malaria incidence in urban centers was lower in this study, there is nevertheless stable transmission of malaria in Nigerian urban regions. One important reason is that some urban centers in Southern Nigeria are located in the coastal regions, thus providing suitable breeding sites for mosquito vectors of malaria parasites. Urban agricultural development involving irrigation is common in the North and this could facilitate malaria transmission in the region. Poor drainage systems and the creation of artificial vector breeding sites such as ditches and tyre tracks during heavy downpours are common in Nigerian urban centers.
Malaria is a disease whose transmission is greatly influenced by environmental factors. These factors are good predictors of transmission but could share a non-linear relationship with mosquito abundance and malaria transmission 18 . The impact of environmental variables on malaria transmission can be adequately established by spatial statistical models, which can predict the transmission of malaria in both space and time. Rise in temperature shortens the blood meal-seeking behaviour of the female Anopheles mosquito, therefore causing a corresponding decrease in ovulation and production of juvenile mosquitoes. A temperature as high as 34 °C, which is the average upper limit recorded in Northern Nigeria, has been reported to cause a reduction in the gonotrophic cycle length of mosquitoes 20 . [...] areas of Nigeria falls within the optimum value for the development of sporozoites within the mosquitoes 21 . The daily survival of mosquitoes is also influenced by temperature. A daily mosquito survival rate of about 90% has been attributed to temperatures between 16 °C and 36 °C 21 . Either way, it is clear that temperature is a very important factor that aids transmission of malaria in both the Northern and the Southern parts of Nigeria. This claim was supported by a positive correlation between incidence of malaria and temperature. More importantly, the highly significant relationships with temperature in the SLM and SEM spatial models make temperature a significant predictor of malaria transmission in Nigeria. In fact, the positive and significant spatial lag coefficient in the SLM indicates that the malaria incidence rate in one cluster depends directly on the incidence rate in its neighbouring clusters affected by temperature.
Rainfall and precipitation are two further factors that affect incidence of malaria in Nigeria. However, because of the variation in the rainfall patterns of the Northern and Southern parts of Nigeria, these may affect malaria transmission dynamics differently. The Southern regions enjoy a longer duration of rainfall than the North, so transmission is usually higher at the onset of the rainy season and the beginning of the dry season 22 . The characteristic rainfall patterns in Southern Nigeria create shallow water pockets suitable for breeding of Anopheles gambiae, which is the main mosquito vector of the malaria parasite in Nigeria 23 . The negative correlation in the non-spatial statistical analysis suggests that continuous heavy rainfall, especially during the peak rainy season, may have a negative impact on malaria vectors and the eventual transmission of the disease. This could explain why the incidence of malaria is relatively lower in the region compared to the Northern part of Nigeria. Precipitation has been considered to be the most important climatic factor that influences incidence of malaria in the lowlands 24 . Our study showed that precipitation is strongly correlated with rainfall. The impact of precipitation on malaria transmission is both direct and indirect, especially where dams are situated. It raises the reservoir's water level and creates potential mosquito breeding sites along the shorelines 24 . Previous findings from Nepal using generalized additive mixed models (GAMM), however, showed that maximum temperature and rainfall were not significantly associated with malaria incidence 25 . The same was observed with rainfall in Bangladesh 11 , but our study was similar to reports from India and Sri Lanka, which reported a negative correlation between rainfall and incidence of malaria 26,27 .
The difference in the sign and significance of the parameters between the OLS and the spatial models confirms the assertion that OLS estimates remain unbiased in the presence of spatial autocorrelation but are inefficient and inconsistent under the SEM and SLM specifications, respectively. Misleading conclusions are inevitable when the OLS technique is used in analysing sample data collected for regions or points in space. The significance of the spatial autoregressive parameter ρ (Rho) in the SLM and λ (Lambda) in the SEM indicated that spatial autocorrelation exists in the data and that a spatial model is more appropriate than the standard aspatial model, which is prone to misleading results and under- or over-estimation of the parameters. This result agreed with Anselin 28 and LeSage and Kelly 29 that the OLS result is inconsistent and inefficient in the SLM and SEM models, respectively.
Aridity is higher in the North than in the South, and its increase influences malaria transmission by reducing the mosquito biting rate and by shortening the adult lifespan relative to the extrinsic incubation period of the malaria parasite 24,30 . Using the work of De Martonne 31 , the North East and North West zones of Nigeria are semi-arid, while the North Central is semi-humid. The South West, South East and South South regions of Nigeria are humid, very humid, and extremely humid respectively.
A significant Moran's I statistic also indicates spatial autocorrelation and confirms that malaria incidences in nearby clusters are more closely related than those far away. The univariate Moran's scatter plots showed more points in quadrants I and III, denoting a positive spatial autocorrelation pattern in incidence of malaria among clusters in different regions of Nigeria. The extent of this autocorrelation was tested by LISA, a class of spatial statistics that provides information specific to clusters and estimates the extent of spatial autocorrelation of malaria incidence in a particular cluster in relation to its neighbours.
The over 700 clusters with significant local spatial autocorrelation patterns in incidence of malaria, as revealed by the LISA significance plots, show that there is indeed spatial association in incidence of malaria in Nigeria. The reduction in malaria hotspot clusters from 2000-2015 indicated that the various interventions by government and international agencies to combat malaria in the country have been productive.
One limitation of spatial modelling is that while infectious disease data has a lot of intra- and inter-annual variability depending on epidemic and non-epidemic periods, the regression analysis assumes the association between exposure and outcome to be stationary over time 32 .
Conclusion. Our study has shown that malaria is still a serious problem in all the regions of Nigeria, with environmental factors like rainfall, temperature and aridity playing important roles in transmission of the disease. There is more malaria incidence in the North than in the South and in rural than in urban areas. The spatial statistical models adopted are important for designing a prompt and early malaria transmission mitigation support system in suspected regions. The models can help to generate malaria risk maps and spatially channel available resources to the disease hotspots.

Materials and Methods
Study area. The study was carried out in Nigeria, a country in the sub-Saharan African region, located between latitudes 4°16′ and 13°53′ North and longitudes 2°40′ and 14°41′ East. The country has a total surface area of approximately 923,768 square kilometres and a population density of 212.04 individuals per square kilometre. One of the country's most severe public health problems is malaria, and the climatic conditions of the country make it suitable for recurrent malaria transmission. There have been various interventions from government and international agencies to mitigate the burden of this tropical disease.
The Demographic and Health Surveys (DHS) Programme supports the collection and use of data to monitor and evaluate population, health, and nutrition programmes. Data emanating from the survey are processed and made available upon request for download through the DHS Programme website. The data often come with geospatial covariates, and it is often difficult to link these covariates with the DHS Programme's data to determine the impact of location on health outcomes. To alleviate the difficulty, the DHS Programme Geospatial Team developed a set of standardised files of the most commonly used geospatial covariates already linked with the dataset.
The covariates came from two types of data: raster and vector. Raster data, such as images and modeled surfaces, rely on pixels or cells to convey their data values. Vector data, such as points, lines, and polygons, show the discrete location or boundary of a feature. Because of the differences in the data types, the methods needed to extract meaningful values varied. Firstly, geospatial covariate layers (i.e. modeled surfaces) that are relevant to the DHS Programme indicators were acquired from Digital Globe (~35 cm resolution) remotely sensed imagery. GPS coordinates representing the location of each survey cluster were obtained from the DHS Programme. In addition to modeled surfaces, vector (polygon and line) data obtained from various publicly available sources were also included. Secondly, raster and vector datasets were imported and linked to the GPS coordinates using a standalone Python script and ArcGIS, respectively.
The study used data emanating from the 2015 Nigeria Malaria Indicator Survey (NMIS) accessed at the DHS website. The 2015 NMIS was implemented by the National Malaria Elimination Programme (NMEP), the National Population Commission (NPC), the National Bureau of Statistics (NBS) and other international agencies from October 2015 through November 2015. ICF International provided technical assistance as well as funding to the project through the DHS Programme, a project funded by the United States Agency for International Development (USAID) 19 .
Rainfall data was obtained from a satellite-based rainfall product called the Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS) which has high temporal and spatial resolution 33 . Maximum temperature and precipitation data were obtained from the Climate Research Unit (CRU) of the University of East Anglia, UK, which produces a range of global climate time series gridded data, derived from meteorological stations across the world's land areas. The datasets were provided on high resolution (0.5 × 0.5 degrees) grids over the period 1901-2016 34, 35 . Aridity was modeled using data available from the WorldClim Global Climate Data and was updated for the period of 2000, 2005, 2010 and 2015 using high resolution grids obtained from the CRU datasets 35 . Proximity to water data was extracted from lakes dataset (L2) at full resolution and the shoreline dataset (L1), also at full resolution, in the Global Self-consistent, Hierarchical, High-resolution Shoreline (GSHHG) database. The datasets used were based on the World Vector Shorelines, CIA World Data Bank II, and Atlas of the Cryosphere 36,37 .
Sampling procedures. A two-stage sampling strategy was adopted for the 2015 NMIS. In the first stage, nine clusters (EAs) were selected from each state, including the Federal Capital Territory (FCT). The sample selection was done in such a way that it was representative of each state. The result was a total of 333 clusters throughout the country, 138 in urban areas and 195 in rural areas. The geospatial covariates of the 2015 NMIS were used. They contain data on malaria incidence (defined as the average number of people per year who show clinical symptoms of Plasmodium falciparum malaria within the 2 km (urban) or 10 km (rural) buffer surrounding the DHS survey cluster location) as well as the environmental variables measured using remote sensing within the same 2 km (urban) or 10 km (rural) buffer, for 326 clusters within the country at five-year intervals (2000, 2005, 2010 and 2015). To ensure completeness of the dataset, all empty cells and inconsistent cases were removed, leaving 1264 retained samples out of the original 1304 cases, which amounts to 96.9% of the total. The distribution of the retained samples by geopolitical zone and residence type is shown in Tables 1 and 2.
Statistical and spatial analyses. Descriptive statistics, mean differences and associations between malaria incidence and the environmental variables were analysed using frequency counts, percentages, the independent t-test, Pearson's Product Moment Correlation (PPMC) and one-factor analysis of variance (ANOVA).
The main motivation for applying a spatial statistical model is the existence of spatial autocorrelation. This is analogous to serial autocorrelation in time series, except that it is multidirectional while serial autocorrelation is unidirectional. Global spatial autocorrelation is commonly detected in georeferenced data by the Moran's I test statistic 28 , given as:

$$I = \frac{n}{S_0}\,\frac{x^{\top} W x}{x^{\top} x} \qquad (1)$$

where x is the n × 1 vector of a random variable which has been standardised such that the mean and variance are 0 and 1, respectively, W is an n × n row-standardised (row sum equal to 1) spatial weight matrix and $S_0$ is the sum of the elements of W. W captures the nature of connectedness among the spatial units in the data, and this can be conceived in the topological notion of neighbourhood. In this study, the queen contiguity criterion is adopted, which stipulates that two areas are neighbours when they share a common side or vertex. A first-order queen contiguity matrix is defined by $W_{ij} = 1$ if clusters i and j share a common side or vertex and zero otherwise. The diagonal elements of W are constrained to be zero so as to prevent a cluster from being a neighbour to itself. Torres-Preciado et al. 38 reported that such a matrix facilitates the interpretation of the neighbourhood phenomenon underlying the administrative breakdown and improves the efficiency of algorithms during the estimation process. Moran's I index takes values between −1 and 1 and can be interpreted as a product moment correlation coefficient. A value close to zero signifies a random spatial distribution. A local indicator of spatial autocorrelation (LISA) 34 , or the so-called local Moran's I, tests for local spatial autocorrelation. The LISA indicates significant spatial clustering, and the sum of the local statistics is proportional to the global Moran's I 39 . It is possible for the dataset to have significant local spatial clustering but no global spatial autocorrelation.
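The Moran's I statistic defined above can be illustrated with a minimal NumPy sketch. This is not the software used in the study: the regular 6 × 6 grid, the helper names `queen_weights` and `morans_i`, and the synthetic gradient variable are all illustrative assumptions. Survey clusters are points rather than grid cells, so a real analysis would build W from the cluster geometry instead.

```python
import numpy as np

def queen_weights(nrows, ncols):
    """Row-standardised first-order queen contiguity matrix for a
    regular grid (hypothetical helper, for illustration only)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # Skip the cell itself so the diagonal stays zero.
                    if (dr or dc) and 0 <= rr < nrows and 0 <= cc < ncols:
                        W[i, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def morans_i(x, W):
    """Global Moran's I: (n / S0) * (z'Wz) / (z'z) on standardised z."""
    z = (x - x.mean()) / x.std()      # mean 0, variance 1
    n = z.size
    s0 = W.sum()                      # equals n when W is row-standardised
    return (n / s0) * (z @ W @ z) / (z @ z)

W = queen_weights(6, 6)
rows, cols = np.indices((6, 6))
x = (rows + cols).astype(float).ravel()   # synthetic smooth gradient

I_global = morans_i(x, W)

# Local Moran's I per cell; with this standardisation and a
# row-standardised W, the local values average to the global index.
z = (x - x.mean()) / x.std()
I_local = z * (W @ z)

print(round(I_global, 3), round(I_local.mean(), 3))
```

The last two lines show the link between the local and global statistics mentioned in the text: each cell contributes the product of its own standardised value and its neighbours' average, and those contributions aggregate to the global index.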
Based on the likelihood that malaria incidence in a given cluster might be influenced by similar incidence in a nearby cluster, Moran's diagram was employed to obtain a rapid overview of the global spatial autocorrelation in malaria incidence, while LISA was used to detect the locations of hot and cold spot clustering in the sample. As described earlier, a positive value of Moran's I is interpreted to mean that similar values of malaria incidence (high with high, low with low) are grouped together in space, whereas a negative value signifies that dissimilar values of malaria incidence come together geographically. If it is zero, then spatial dependence is absent in the variable and the assumption of independence holds. The cluster and significance maps showed the hot and cold spot locations.
Prior to the production of the diagram and map, a connectivity matrix among the clusters was created using the coordinates of the clusters, displaced by up to 2 kilometres (for urban points) and 10 kilometres (for rural points) 40 , based on the first-order queen contiguity criterion illustrated earlier. In the diagram, the values of malaria incidence on the x-axis were plotted against the average values of malaria incidence for the neighbouring observations Wy (lagged malaria incidence) on the y-axis. The diagram has four quadrants, as shown in Fig. 1. The value above the diagram is the global Moran's I index. If the value is close to zero, it means malaria distribution is spatially random, while a positive value indicates spatial clustering 41 .
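The construction of the Moran diagram described above can be sketched as follows. The grid, the helper `queen_weights` and the simulated values are illustrative assumptions and do not reproduce the study data. The sketch computes the spatial lag, assigns each observation to a quadrant (HH, LH, LL, HL), and checks that the slope of the lag regressed on the standardised values coincides with the global Moran's I when W is row-standardised.

```python
import numpy as np

def queen_weights(nrows, ncols):
    """Row-standardised queen contiguity matrix for a regular grid
    (hypothetical helper, for illustration only)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < nrows and 0 <= cc < ncols:
                        W[i, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
W = queen_weights(8, 8)
noise = rng.normal(size=W.shape[0])
x = noise + 2.0 * (W @ noise)     # synthetic, spatially smoothed values

z = (x - x.mean()) / x.std()      # x-axis: standardised incidence
lag = W @ z                       # y-axis: neighbours' average (spatial lag)

# Quadrant I = HH, II = LH, III = LL, IV = HL.
labels = np.where(z >= 0,
                  np.where(lag >= 0, "HH", "HL"),
                  np.where(lag >= 0, "LH", "LL"))
counts = {q: int((labels == q).sum()) for q in ("HH", "LH", "LL", "HL")}

# Slope of the fitted line through the scatter versus the global index.
slope = np.polyfit(z, lag, 1)[0]
global_i = (z @ lag) / (z @ z)

print(counts, round(slope, 3), round(global_i, 3))
```

A predominance of points in the HH and LL quadrants corresponds to positive spatial autocorrelation, which is how the quadrant counts in the Results are read.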
Due to the spatial nature of the data and the possibility that malaria incidence in one location may be influenced by similar values in another location, three regression specifications were used to model the relationship between incidence of malaria and environmental factors: the non-spatial regression, the Spatial Lag Model (SLM), and the Spatial Error Model (SEM), as shown in Eqs 2-4. The Ordinary Least Squares (OLS) estimation method was used for Eq. 2, while Eqs 3 and 4 were estimated by the maximum likelihood method because OLS estimation of Eq. 3 has been reported to be inconsistent 42,43 , while in the case of Eq. 4 it remains unbiased but inefficient 29 .
$$\text{OLS:}\quad y = X\beta + \varepsilon \qquad (2)$$

$$\text{SLM:}\quad y = \rho W y + X\beta + \varepsilon \qquad (3)$$

$$\text{SEM:}\quad y = X\beta + \varepsilon, \quad \varepsilon = \lambda W \varepsilon + u \qquad (4)$$

The OLS model is aspatial and it behaves well under the assumptions of independence of observations and homoskedastic error terms. Sample data collected for regions or points in space are not independent, but rather spatially dependent 44 . Firstly, data records at proximal locations appear to be either positively or negatively correlated, which is called spatial dependence. Secondly, in a spatial data setting the homoskedasticity assumption cannot hold, due to lack of structural stability across space such as varying parameters or functional forms. Because these classical assumptions of independent observations and homoskedastic error terms are violated, models that account for spatial structure in their specification are needed. The most common way of adjusting model 2 to accommodate spatial structure is to add the spatial lag of the dependent variable or of the disturbance term to the model. Models 3 and 4 are spatial regression models in that the spatial lag of the dependent variable (Wy) and that of the disturbance term (Wε), respectively, have been added to their specification. The two models revert to the aspatial model (model 2) when the spatial effect parameters (ρ and λ) are equal to zero.
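The relationship between the three specifications can be sketched by simulating from each. The code assumes the standard forms y = Xβ + ε (OLS), y = ρWy + Xβ + ε (SLM) and y = Xβ + ε with ε = λWε + u (SEM); the grid, the helper `queen_weights`, the coefficients and the noise scale are illustrative assumptions rather than values from the study.

```python
import numpy as np

def queen_weights(nrows, ncols):
    """Row-standardised queen contiguity matrix for a regular grid
    (hypothetical helper, for illustration only)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < nrows and 0 <= cc < ncols:
                        W[i, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
W = queen_weights(10, 10)
n = W.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + covariate
beta = np.array([0.4, 0.1])          # illustrative coefficients
u = rng.normal(scale=0.05, size=n)   # i.i.d. disturbance
I_n = np.eye(n)

def simulate_ols():
    # Aspatial model: y = X beta + eps
    return X @ beta + u

def simulate_slm(rho):
    # Spatial lag model solved for y: y = (I - rho W)^{-1} (X beta + eps)
    return np.linalg.solve(I_n - rho * W, X @ beta + u)

def simulate_sem(lam):
    # Spatial error model: eps = (I - lam W)^{-1} u, y = X beta + eps
    return X @ beta + np.linalg.solve(I_n - lam * W, u)

y_ols = simulate_ols()
y_slm = simulate_slm(0.6)
y_sem = simulate_sem(0.6)

# OLS coefficients when the spatial structure is ignored.
b_on_slm = np.linalg.lstsq(X, y_slm, rcond=None)[0]
b_on_sem = np.linalg.lstsq(X, y_sem, rcond=None)[0]
print("OLS on SLM data:", b_on_slm, " OLS on SEM data:", b_on_sem)
```

Setting the spatial parameter to zero in either `simulate_slm` or `simulate_sem` returns exactly the aspatial outcome, which mirrors the statement that both spatial models nest the OLS specification.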
Maximum likelihood estimation technique was derived and suggested for SLM and SEM models 28,45,46 . In this approach, the probability of the joint distribution (likelihood) of all observations is maximized with respect to a number of relevant parameters. If the regularity conditions for the log-likelihood functions are satisfied, the obtained ML estimation will achieve the desirable properties of consistency, asymptotic efficiency, and asymptotic normality. Moreover, in most situations, the resulting estimates for the regular parameters of the models are also unbiased 28 .
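A minimal sketch of maximum likelihood estimation for the spatial lag model is given below. It is one common way to carry out the maximisation (a grid search over ρ on the concentrated log-likelihood), not necessarily the procedure used in the study. The simulated data, the helper names `queen_weights` and `slm_concentrated_loglik`, and the true parameter values are assumptions made for illustration.

```python
import numpy as np

def queen_weights(nrows, ncols):
    """Row-standardised queen contiguity matrix for a regular grid
    (hypothetical helper, for illustration only)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < nrows and 0 <= cc < ncols:
                        W[i, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def slm_concentrated_loglik(rho, y, X, W):
    """Log-likelihood of y = rho*W*y + X*beta + eps with beta and
    sigma^2 replaced by their ML values for the given rho."""
    n = y.size
    Ay = y - rho * (W @ y)                         # spatially filtered outcome
    beta = np.linalg.lstsq(X, Ay, rcond=None)[0]   # ML beta given rho
    resid = Ay - X @ beta
    sigma2 = resid @ resid / n                     # ML variance given rho
    _, logdet = np.linalg.slogdet(np.eye(n) - rho * W)  # Jacobian term
    loglik = logdet - 0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return loglik, beta, sigma2

# Simulate data from a spatial lag model with known parameters.
rng = np.random.default_rng(42)
W = queen_weights(20, 20)
n = W.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, rho_true = np.array([1.0, 2.0]), 0.5
y = np.linalg.solve(np.eye(n) - rho_true * W,
                    X @ beta_true + rng.normal(size=n))

# Grid search over rho, then recover beta and sigma^2 at the maximum.
grid = np.linspace(-0.9, 0.9, 91)
logliks = np.array([slm_concentrated_loglik(r, y, X, W)[0] for r in grid])
rho_hat = grid[np.argmax(logliks)]
_, beta_hat, sigma2_hat = slm_concentrated_loglik(rho_hat, y, X, W)

print("rho_hat:", rho_hat, "beta_hat:", beta_hat, "sigma2_hat:", sigma2_hat)
```

The log-determinant term is what distinguishes this likelihood from an ordinary regression of y on Wy and X; omitting it is the source of the inconsistency of OLS for the lag model noted in the text.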

Data availability
Data will be made available on request.