Spatio-temporal analysis of small-area intestinal parasite infections in Ghana

Intestinal parasite infection is a major public health burden in low- and middle-income countries. In Ghana, it is among the top five morbidities. To make the best use of scarce resources, reliable information on its geographical distribution is needed to guide periodic mass drug administration to high-risk populations. We analyzed district-level morbidities of intestinal parasites between 2010 and 2014 using exploratory spatial analysis and geostatistics. We found a significantly positive Moran's Index of spatial autocorrelation for each year, suggesting that adjoining districts have similar risk levels. Using local Moran's Index, we found high-high clusters extending within the semi-deciduous forest and transitional ecological zones, whereas low-low clusters extended towards the Guinea and Sudan Savannah ecological zones. Variograms indicated that local- and regional-scale risk factors modulate the variation of intestinal parasites. Poisson kriging maps showed a smoothed, spatially varied distribution of intestinal parasite risk. These findings emphasize the need for a follow-up investigation into the exact factors determining the observed patterns. They also underscore the potential of exploratory spatial analysis and geostatistics as tools for visualizing the spatial distribution of small-area intestinal worm infections.

or among specific population categories, and hence are unable to evaluate the spatial patterns of infection. Intestinal parasites thrive under climatic and environmental conditions such as warm temperatures, high precipitation and adequate soil moisture [15][16][17]. Infections have often been associated with sociodemographic conditions such as poverty, poor sanitation, and poor drinking water [18][19][20]. In Ghana, prevalence has been associated with sociodemographic conditions 14. Since these underlying risk factors are spatially dependent, morbidity rates would be expected to exhibit spatially dependent patterns.
Spatial analysis and geostatistics provide opportunities to study the type and nature of spatial patterns, and where these patterns occur. They have been widely used to study the spatial patterns and estimate the spatial risk of intestinal parasite infections 17,[21][22][23][24][25]. Spatial analysis methods such as the global Moran's Index 26 (Moran's I hereafter) and its local counterpart, Anselin's Local Indicators of Spatial Association (LISA) 27, could illuminate potential causal factors of diseases 28,29. Geostatistical analysis of health outcomes has also recently received increasing attention as a filtering tool [30][31][32]. For instance, Poisson kriging filters the noise caused by heterogeneous population sizes by integrating population heterogeneities to account for non-constant variance 30.
In this paper, we use spatial analysis tools and geostatistics to study the spatial patterns and provide spatially explicit maps of risk estimates useful for guiding control programs. Our specific objectives are to (1) quantify the type of spatial association and detect and map clusters, and (2) quantify the nature of the spatial structure and map risk estimates of intestinal parasites using district-level morbidities in Ghana. As local health planning in Ghana is largely based on small areas (administrative districts), studying spatial patterns of infection at the district level will provide valuable, easy-to-implement information.

Methods
Study area and data. Ghana is a tropical country centrally located on the West Coast of Africa, with a total land area of 239,000 km² (Fig. 1: created with ArcGIS software). The average annual temperature is approximately 26 °C (79 °F). The south has two distinct rainy seasons, April-June and September-November, whereas the northern belt has a single rainy season from March to September. Annual rainfall ranges from 1,015 mm in the north to 2,030 mm in the southwest (PHC, 2010). The country consists of ten administrative regions, which are subdivided into 216 districts. Ghana is also divided into six agro-ecological zones: Sudan Savannah, Guinea Savannah, Coastal Savannah, Forest/Savannah transitional zone, Deciduous Forest zone and Rain Forest zone (Fig. 2).
In this study, we used aggregated clinically or laboratory-diagnosed cases of intestinal parasite infections. Because of patient privacy protection, and possibly deficiencies in address geocoding systems, publicly available data on the precise locations of disease cases are uncommon. Consequently, the spatial scale of our study was limited to the 170 administrative districts for which data were available. We obtained district-level yearly aggregated cases of intestinal parasite infections from 2010 to 2014 from the Centre for Health Information and Management (CHIM) of the Ghana Health Service (GHS). CHIM is responsible for compiling, ensuring uniformity in reporting, and managing all morbidities reported to health facilities (clinics, polyclinics, and hospitals). In summary, health facilities capture and aggregate data and submit them to sub-districts. Sub-districts aggregate facility summary reports and submit them to districts. Districts then receive both facility and sub-district summary reports for validation. At each stage of the data recording hierarchy, data can be entered directly into the District Health Information Management System (DHIMS). We also obtained population estimates for 2010 to 2014 from the Ghana Statistical Service (GSS).

Spatial autocorrelation. We used both global and local Moran's I to estimate the strength of spatial autocorrelation. The global Moran's I 26 estimates the general strength of spatial autocorrelation among districts, while its local equivalent, LISA 27, estimates the spatial autocorrelation between each district and its neighboring districts. Thus, the local Moran's I identifies districts with high and low risks as well as spatial outliers. For the observed counts $y_i$ and populations $n_i$ for the set of districts $i = 1, \ldots, m$, the observed risk (rate) of district $i$ is $r_i = y_i / n_i$.
Since the variance of $r_i$ is inversely proportional to the population size, the required assumption of constant variance is violated, which could produce misleadingly large variances for districts with small populations. We used empirical Bayesian standardization 33, yielding standardized rates $z_i$, to account for the unequal variances arising from unequal populations. Neighbors were defined by contiguity: $w_{ij} = 1$ if $\partial(i) \cap \partial(j) \neq \emptyset$ for $i \neq j$, and 0 otherwise, where $\partial(i)$ and $\partial(j)$ are the sets of boundary points of districts $i$ and $j$, respectively. The binary $m \times m$ weight matrix is therefore symmetric, i.e., $w_{ij} = w_{ji}$, with $w_{ii} = 0$, and it was row-standardized before analysis. The global and local indices are

$$I = \frac{m}{\sum_{i}\sum_{j} w_{ij}} \, \frac{\sum_{i}\sum_{j} w_{ij}(z_i - \bar{z})(z_j - \bar{z})}{\sum_{i}(z_i - \bar{z})^2}, \qquad I_i = \frac{(z_i - \bar{z})\sum_{j} w_{ij}(z_j - \bar{z})}{\frac{1}{m}\sum_{k}(z_k - \bar{z})^2}.$$

For the local index, $I_i$, the summation over $j$ includes only the set of neighbors $J_i$ of $i$, $j \in J_i$. To test the null hypothesis of no spatial autocorrelation, we generated 999 random permutations of the vector $(z_1, \ldots, z_m)$ and computed $I$ and $I_i$ for each permuted vector to generate the empirical distribution. The p-value was estimated as the proportion of permutations for which the index computed from the permuted data exceeded the index computed from the actual data. We used the spdep 36 package of the R statistical software 37 to estimate both global and local Moran's I.

Spatial structure and smoothing. We used geostatistical smoothing to filter out the noise caused by heterogeneous population distribution. Unlike deterministic smoothers 38, geostatistical smoothing can account for the range of spatial correlation and estimate the associated uncertainties. We assumed that the risks $r_i$ are realizations of a second-order stationary random field. Poisson kriging was used to estimate the risk over a given district $i_0$ as a linear combination of the risks observed for that district and its neighboring districts, $\mathbf{r} = (r_1, \ldots, r_m)'$:

$$\hat{r}_{i_0} = \mu_{i_0} + \mathbf{C}_{i0}' \, \mathbf{C}_{ij}^{-1} (\mathbf{r} - \mu \mathbf{1}_m).$$

The assumption of stationarity implies that the spatial mean at the prediction location, $\mu_{i_0}$, is the same as the spatial mean $\mu$ of the random variable $r_i$. Under minimum variance, the best linear unbiased estimate of the spatial mean is

$$\hat{\mu} = (\mathbf{1}_m' \, \mathbf{C}_{ij}^{-1} \mathbf{1}_m)^{-1} \, \mathbf{1}_m' \, \mathbf{C}_{ij}^{-1} \mathbf{r}.$$
We refer to the vector $\mathbf{r} - \mu\mathbf{1}_m$ of deviations of the district risks from the mean as the predictor variables, $\mathbf{C}_{i0}$ as the covariance vector between the prediction location and the predictor variables, and $\mathbf{C}_{ij}$ as the covariance matrix of the predictor variables. Essentially, $\boldsymbol{\lambda} = \mathbf{C}_{ij}^{-1}\mathbf{C}_{i0}$ yields the so-called kriging weights. We used the variogram model $\hat{\gamma}(h)$ as a structural tool to estimate the covariance function, based on the relation $C(h) = C_{ii} - \hat{\gamma}(h)$, where $C_{ii}$ is the variance of the risk, or covariance at lag 0. We used the empirical variogram estimator

$$\hat{\gamma}(h) = \frac{1}{2\sum_{(i,j)} \frac{n_i n_j}{n_i + n_j}} \sum_{(i,j)} \left[ \frac{n_i n_j}{n_i + n_j} (r_i - r_j)^2 - m^* \right],$$

where both sums run over the $N(h)$ pairs of districts $(i, j)$ whose centroids are separated by the distance $h$, and $m^*$ is the population-weighted mean rate. Here, the variogram $\hat{\gamma}(h)$ depends only on the distance between district centroids, which implies the assumption of uniform population density within each district. This is an adjusted experimental variogram estimator proposed by 39,40 and generalized by Goovaerts 30 for disease mapping to account for heterogeneous populations. The rate differences $(r_i - r_j)$ are weighted by $n_i n_j/(n_i + n_j)$, so that pairs of districts with small populations contribute less, and $m^*$ corrects for the Poisson noise. We modeled the experimental variograms with nested structures [41][42][43]. This implies that the risk with mean $\mu$ was decomposed as the sum of local and regional orthogonal random functions, $r_i = \mu + r_{i,\mathrm{loc}} + r_{i,\mathrm{reg}}$, each with its own contributory variogram, $\hat{\gamma}_{\mathrm{loc}}(h)$ and $\hat{\gamma}_{\mathrm{reg}}(h)$, respectively. This is useful for unraveling scale-dependent spatial autocorrelation patterns. Commonly used variogram models, such as the exponential and spherical models, have been described elsewhere 43,44. We fitted local- and regional-scale nested spherical models, $\hat{\gamma}_{\mathrm{loc}}(h)$ and $\hat{\gamma}_{\mathrm{reg}}(h)$, respectively, using weighted least squares. In this paper, the variogram modeling was conducted using the public-domain software poisson_kriging.exe 30.
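The spatial autocorrelation workflow described in the Methods (empirical Bayes standardization, row-standardized contiguity weights, global and local Moran's I with permutation tests, and the four cluster categories) can be sketched with NumPy as below. This is a minimal illustration, not the spdep implementation used in the study. The EB step assumes the Assunção-Reis moment estimator (our assumption about the exact form used), the local test is a simple two-sided conditional permutation, and all function names are hypothetical.

```python
import numpy as np


def eb_standardize(y, n):
    """Empirical Bayes standardized rates z_i (sketch of the Assuncao-Reis form)."""
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    r = y / n                                  # crude rates r_i = y_i / n_i
    b = y.sum() / n.sum()                      # pooled rate over all districts
    s2 = np.sum(n * (r - b) ** 2) / n.sum()    # population-weighted variance of the rates
    a = s2 - b / n.mean()                      # moment estimate of between-district variance
    v = a + b / n                              # variance of r_i under the EB model
    v = np.where(v > 0, v, b / n)              # fall back to the Poisson term if v_i <= 0
    return (r - b) / np.sqrt(v)


def row_standardize(adj):
    """Scale each row of a binary, symmetric contiguity matrix (w_ii = 0) to sum to 1."""
    adj = np.asarray(adj, dtype=float)
    rs = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, rs, out=np.zeros_like(adj), where=rs > 0)


def morans_i(z, w):
    """Global Moran's I of the values z under weights w."""
    d = np.asarray(z, dtype=float) - np.mean(z)
    return (len(d) / w.sum()) * (d @ w @ d) / (d @ d)


def local_morans_i(z, w):
    """Anselin's local Moran's I_i for every district."""
    d = np.asarray(z, dtype=float) - np.mean(z)
    return d * (w @ d) / ((d @ d) / len(d))


def global_pvalue(z, w, n_perm=999, seed=1):
    """Observed I and a one-sided pseudo p-value from random permutations of z."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    observed = morans_i(z, w)
    sims = np.array([morans_i(rng.permutation(z), w) for _ in range(n_perm)])
    # (count of permuted I >= observed I, plus 1 for the observed value) / (n_perm + 1)
    return observed, (np.sum(sims >= observed) + 1) / (n_perm + 1)


def local_pvalues(z, w, n_perm=999, seed=1):
    """Conditional permutation p-values: z_i is held fixed, the other values are shuffled."""
    rng = np.random.default_rng(seed)
    d = np.asarray(z, dtype=float) - np.mean(z)
    m2 = (d @ d) / len(d)
    observed = d * (w @ d) / m2
    p = np.empty(len(d))
    for i in range(len(d)):
        others = np.delete(np.arange(len(d)), i)
        w_i = w[i, others]                     # weights of district i on all other districts
        sims = np.array([d[i] * (w_i @ rng.permutation(d[others])) / m2
                         for _ in range(n_perm)])
        # Simple two-sided test on the magnitude of I_i (a simplifying assumption).
        p[i] = (np.sum(np.abs(sims) >= abs(observed[i])) + 1) / (n_perm + 1)
    return p


def lisa_categories(z, w, p_local, alpha=0.05):
    """Label each district high-high, low-low, high-low, low-high or not significant."""
    d = np.asarray(z, dtype=float) - np.mean(z)
    lag = w @ d                                # spatially lagged deviations
    sig = np.asarray(p_local) <= alpha
    labels = np.full(len(d), "not significant", dtype=object)
    labels[sig & (d > 0) & (lag > 0)] = "high-high"
    labels[sig & (d < 0) & (lag < 0)] = "low-low"
    labels[sig & (d > 0) & (lag < 0)] = "high-low"
    labels[sig & (d < 0) & (lag > 0)] = "low-high"
    return labels
```

With row-standardized weights, the mean of the local indices $I_i$ equals the global $I$, which is a convenient sanity check. For the study data, `adj` would be the 170 × 170 contiguity matrix, and `y` and `n` the yearly counts and populations.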

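The population-weighted experimental variogram described in the Methods can be sketched as follows, using district centroids as the only spatial support. This is an illustrative reading of the estimator, not the poisson_kriging.exe code; the function name and the binning rule (intervals of width `lag` that include their upper bound) are assumptions.

```python
import numpy as np


def population_weighted_variogram(r, n, xy, lag=10.0, n_lags=20):
    """Experimental variogram of rates with population weights and a Poisson-noise correction.

    r  : observed rates r_i (cases / population) per district
    n  : populations n_i
    xy : district centroid coordinates in km, shape (m, 2)
    Returns mean pair distance, semivariance and pair count for each non-empty lag bin.
    """
    r = np.asarray(r, dtype=float)
    n = np.asarray(n, dtype=float)
    xy = np.asarray(xy, dtype=float)
    m_star = np.sum(n * r) / n.sum()                  # population-weighted mean rate m*
    i, j = np.triu_indices(len(r), k=1)               # every unordered pair of districts
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)      # centroid-to-centroid distance
    wts = n[i] * n[j] / (n[i] + n[j])                 # pairs of large populations weigh more
    terms = wts * (r[i] - r[j]) ** 2 - m_star         # subtract the expected Poisson noise
    centers, gamma, pairs = [], [], []
    for k in range(n_lags):
        in_bin = (dist > k * lag) & (dist <= (k + 1) * lag)
        if in_bin.any():
            centers.append(dist[in_bin].mean())
            gamma.append(terms[in_bin].sum() / (2.0 * wts[in_bin].sum()))
            pairs.append(int(in_bin.sum()))
    return np.array(centers), np.array(gamma), np.array(pairs)
```

The defaults of 10 km and 20 lags mirror the settings reported in the Results; bins with fewer than 30 pairs could be dropped before model fitting. Slightly negative semivariances can appear when the Poisson correction $m^*$ outweighs the weighted squared rate differences, as happens when rates are nearly identical.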
Results
Spatial autocorrelation. Between 2010 and 2014, a total of 3,310,653 intestinal parasite infections were reported. Annual incidence rates ranged from 1.55% to 3.3%, with an average annual incidence rate of 2.53% (Table 1). The incidence rate increased from 1.55% in 2010 to a peak of 3.3%, before declining slightly to 3.25% in 2014. We found significant positive spatial autocorrelation in every year from 2010 to 2014 (Table 1), indicating that districts of similar risk were spatially clustered. Global autocorrelation was highest in 2010 (I = 0.388, p = 0.01) and lowest in 2012 (I = 0.095, p = 0.01). For ease of interpretation, we present the results of the local spatial autocorrelation analysis as cluster maps based on four categories: high-high, low-low, high-low, and low-high (Fig. 3). The high-high and low-low associations indicate clustering of high risk (hot-spots) and low risk, respectively; both indicate significant (p ≤ 0.05) clustering of similar risks, that is, positive spatial autocorrelation. The low-high category indicates a low-risk district surrounded by high-risk districts, whereas the high-low category indicates a high-risk district surrounded by low-risk districts; these are spatial outliers. High-high clustering dominated within the middle belt, while low-low clustering dominated within the northern parts. Although we did not formally test causal relationships, since this is an exploratory study, a visual comparison of the local Moran's I maps with the ecological zones map of Ghana (Fig. 2) was informative. The high-high clusters extended within the semi-deciduous forest and the transitional ecological zones, whereas the low-low clusters were concentrated in the northern parts and mostly intersected the Guinea and Sudan Savannah ecological zones. Few outliers were detected throughout the study period.

Spatial structure and smoothing.
We computed experimental variograms for each year using a lag distance of 10 km for 20 lags and fitted nested spherical models (Fig. 4). At 10 km lags, there were enough (≥30) pairs of districts to obtain stable variogram estimates. Table 2 shows the parameters of the models fitted to the experimental variograms. All variograms exhibited two basic structures. The parameter $c_0$ is the nugget variance, which represents spatially random variation. The variance parameter $c_1$ estimates the amount of spatially structured local (short-range) variation within an average range of $\varphi_1$, and $c_2$ estimates the amount of spatially structured regional (long-range) variation within an average range of $\varphi_2$. The sill, $c_0 + c_1 + c_2$, is the total variation, and %$c_0$, %$c_1$, and %$c_2$ are the proportions of the total variation explained by $c_0$, $c_1$, and $c_2$, respectively. The ranges of the local spatial variation were ≈34-41 km. The regional spatial variation, however, showed widely varying ranges, ≈271-900 km, beyond the maximum lag distance used for estimation (200 km). For 2010 to 2012, $\hat{\gamma}_{\mathrm{reg}}(h)$ accounted for nearly 70% or more of the variation and $\hat{\gamma}_{\mathrm{loc}}(h)$ for nearly 30% or less. Conversely, for 2013 and 2014, $\hat{\gamma}_{\mathrm{loc}}(h)$ accounted for more than 70%, whereas $\hat{\gamma}_{\mathrm{reg}}(h)$ accounted for nearly 25% or less. We also found an increasing trend in the share of variation accounted for by $\hat{\gamma}_{\mathrm{loc}}(h)$, from ≈13% in 2010 to ≈75% in 2014. Maps of the smoothed rates and kriging variances after geostatistical filtering are shown in Figs 5 and 6, respectively. Both the smoothed rates and the variances show clustering of districts with similar estimates. In the smoothed maps, high rates dominate within the middle belt, whereas low rates dominate within the northern parts.
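The nested model described above (a nugget $c_0$ plus local and regional spherical structures) and the variance shares reported in Table 2 can be computed as in the sketch below. The parameter values in the usage example are hypothetical, not those of Table 2; they only show how a regional range larger than the 200 km maximum lag leaves the regional structure below its sill over all fitted lags. The weighted least-squares fitting step is omitted, and the function names are illustrative.

```python
import numpy as np


def spherical(h, c, a):
    """Spherical structure with partial sill c and range a."""
    h = np.asarray(h, dtype=float)
    return np.where(h < a, c * (1.5 * h / a - 0.5 * (h / a) ** 3), c)


def nested_spherical(h, c0, c1, a1, c2, a2):
    """Nugget c0 plus a local (c1, a1) and a regional (c2, a2) spherical structure."""
    h = np.asarray(h, dtype=float)
    nugget = np.where(h > 0, c0, 0.0)          # by convention gamma(0) = 0
    return nugget + spherical(h, c1, a1) + spherical(h, c2, a2)


def variance_shares(c0, c1, c2):
    """Sill c0 + c1 + c2 and the percentage of it attributed to each component."""
    sill = c0 + c1 + c2
    return sill, 100 * c0 / sill, 100 * c1 / sill, 100 * c2 / sill


if __name__ == "__main__":
    # Hypothetical parameters (not the fitted values in Table 2).
    c0, c1, a1, c2, a2 = 0.05, 0.15, 38.0, 0.80, 400.0
    lags = np.arange(10.0, 210.0, 10.0)         # 20 lags of 10 km
    print(nested_spherical(lags, c0, c1, a1, c2, a2))
    print(variance_shares(c0, c1, c2))
```

Because the local structure reaches its sill after about 38 km while the regional one keeps rising up to 200 km, the relative size of $c_1$ and $c_2$ determines how much of the variogram's shape is set at short versus long distances, which is what the %$c_1$ and %$c_2$ columns summarize.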
As expected, the smoothed rates showed reduced variation, and rates for districts with smaller populations were adjusted considerably more than those for districts with larger populations (Fig. 7).
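The stronger adjustment for small populations follows from the Poisson kriging system. The sketch below uses the centroid-based formulation of Goovaerts (ref. 30) as we understand it: the error variance $m^*/n_i$ is added only to the diagonal of the data covariance matrix, so rates from small populations receive less weight. It is written in the ordinary-kriging form with a Lagrange multiplier, which is equivalent to plugging the generalized least squares mean into the simple-kriging form given in the Methods. The covariance model, district layout, rates, and populations in the usage example are hypothetical, and the function names are illustrative.

```python
import numpy as np


def spherical_cov(h, sill, a):
    """Covariance C(h) = sill - gamma(h) for one spherical structure (no nugget)."""
    h = np.asarray(h, dtype=float)
    gamma = np.where(h < a, sill * (1.5 * h / a - 0.5 * (h / a) ** 3), sill)
    return sill - gamma


def poisson_kriging(r, n, xy, target, cov):
    """Centroid-based Poisson kriging estimate and variance of the risk in `target`.

    r, n : observed rates and populations of all districts (target included)
    xy   : centroid coordinates, shape (m, 2)
    cov  : covariance function of the risk, C(h)
    """
    r = np.asarray(r, dtype=float)
    n = np.asarray(n, dtype=float)
    xy = np.asarray(xy, dtype=float)
    m = len(r)
    m_star = np.sum(n * r) / n.sum()                    # population-weighted mean rate
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    lhs = np.ones((m + 1, m + 1))
    lhs[:m, :m] = cov(dist) + np.diag(m_star / n)       # Poisson noise on the diagonal only
    lhs[m, m] = 0.0
    rhs = np.ones(m + 1)                                # last entry: weights sum to 1
    rhs[:m] = cov(dist[:, target])
    sol = np.linalg.solve(lhs, rhs)
    lam, mu = sol[:m], sol[m]                           # kriging weights, Lagrange multiplier
    estimate = lam @ r
    variance = float(cov(0.0)) - lam @ rhs[:m] - mu
    return estimate, variance


if __name__ == "__main__":
    # Hypothetical layout: district 0 surrounded by four neighbours 10 km away.
    xy = [(0, 0), (10, 0), (0, 10), (-10, 0), (0, -10)]
    rates = [0.10, 0.02, 0.02, 0.02, 0.02]

    def cov(h):
        return spherical_cov(h, sill=1e-4, a=50.0)

    for pop0 in (50, 50_000):
        est, var = poisson_kriging(rates, [pop0, 5000, 5000, 5000, 5000], xy, 0, cov)
        print(pop0, est, var)
```

Running the example with the same outlying rate of 0.10 in district 0 shows the effect described above: with a population of 50 the estimate is pulled strongly towards the neighbors' rate of 0.02 and carries a larger kriging variance, whereas with a population of 50,000 it stays close to the observed value.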

Discussion
Table 2. Summary of the variogram models and parameters fitted to the experimental variograms.

We observed several noteworthy patterns. We found evidence of global clustering of districts with comparable risks, suggesting that spatially dependent phenomena modulate the spatial heterogeneity in the risk of intestinal worm infection. The observed global patterns imply that neighboring districts share similar underlying ecological and environmental risk factors that trigger intestinal worm infections 28. This similarity, however, contrasted widely across the country: local Moran's I showed that high-high values (hot-spots) dominated within the middle belt, whereas low-low values (low risk) dominated within the northern part. One would instead expect the hot-spots to occur mainly within the northern parts of Ghana, where the socioeconomically less privileged mostly live. This finding is an unexpected departure from previous studies 20, [45][46][47], and suggests that interactions between environmental and socioeconomic risk factors could play a role. Although the study did not build a formal causal model for the clustering patterns, a visual comparison of the patterns with the ecological zones of Ghana generates a working hypothesis worth testing in future research. The patterns of the hot-spots are plausible, since they largely intersected the semi-deciduous forest and the transitional ecological zones. High precipitation, which is mostly associated with the semi-deciduous forest and transitional zones of Ghana, has been found to increase the risk of intestinal parasites 19. The effect of precipitation, however, has been attributed to specific quarters of the year 19,21, which emphasizes the need for further studies to substantiate this argument in our study area. The low risks, on the other hand, largely intersect the Guinea and Sudan Savannah ecological zones.
These ecological zones are mostly flat, with low precipitation and high temperatures, and consist predominantly of grassland. They also have much drier soils and the highest land surface temperatures because of their proximity to the Sahel and the Sahara, and likely provide unfavorable environmental and ecological conditions for transmission. Some studies have associated a low risk of intestinal worm infection with such unfavorable conditions. For instance, a study in Kenya 24 associated low risk of helminth infection with high land surface temperature, and a study of the spatial distribution of helminth infection across sub-Saharan Africa associated extremely dry soils with the absence of hookworm infections 19. The patterns of hot-spots and low risk are similar to findings from a study in Cote d'Ivoire, where low risk of schistosomiasis was found in Savannah ecological zones and high risk extended into Forest ecological zones 48. In a developing country like Ghana, an alternative interpretation of the clustered patterns could be based on variations in reporting systems. However, the consistency of these patterns throughout the study period suggests that they are unlikely to be explained solely by variations in reporting. The Poisson variogram estimator accounted for the effects of population heterogeneity and revealed spatial structures that could otherwise have been obscured by the traditional variogram estimator 39,40. In conjunction with the global and local statistics, nested variograms allowed us to characterize the spatial distribution of intestinal parasite infection, showing that spatial variation occurs at two different scales. The average range of the local-scale variation was 38 km, suggesting strong correlation between neighboring districts. This average local range was less than the minimum distance of 68 km within which each district would have at least one adjoining neighbor.
This suggests that most of the patterns of the short-range structure cover only a district and its adjoining neighbors. The range of the large-scale variation, ≈271-900 km, was larger than the maximum lag distance, probably because of its dependence on ecological processes that vary markedly at regional scales, larger than the average size of the districts and their higher-order neighbors. The relatively short range of the second (regional) structure of the variogram in 2013 and 2014 could reflect temporal changes in the regional-scale ecological processes that affect infection.
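The 68 km threshold quoted above is, as we read it, the largest distance from any district centroid to the centroid of its nearest adjoining district. A minimal sketch of that computation follows, assuming centroid coordinates in km and the binary contiguity matrix described in the Methods; the function name and example layout are hypothetical.

```python
import numpy as np


def max_nearest_adjoining_distance(xy, adj):
    """Smallest distance d such that every district has at least one adjoining
    neighbour whose centroid lies within d (a district without neighbours gives inf)."""
    xy = np.asarray(xy, dtype=float)
    adj = np.asarray(adj, dtype=bool)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    nearest = np.where(adj, dist, np.inf).min(axis=1)   # closest adjoining centroid
    return float(nearest.max())


if __name__ == "__main__":
    # Hypothetical chain of three districts: 0 borders 1, and 1 borders 2.
    xy = [(0, 0), (30, 0), (100, 0)]
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    print(max_nearest_adjoining_distance(xy, adj))
```

Comparing a fitted local range with this value indicates whether the short-range structure can extend beyond first-order neighbors for every district.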
Under the assumption that the unknown risk is a spatial stochastic process, Poisson kriging of the spatial risk had the advantage of correcting extreme rates due to small populations. The Poisson kriged risk maps indicated that intestinal parasite infections are spatially varied and widely distributed. High risk persisted markedly within the middle belt, and low risk within the northern sector. This partly corroborates data presented in a previous study 23, which found low hookworm egg counts (<1 egg g⁻¹) mostly in the northern parts and high hookworm egg counts (>30 eggs g⁻¹) within central parts. Temporal changes in the spatial patterns over the years were marginal, possibly because the underlying risk factors remained largely static over time. Further studies to substantiate this argument would be valuable.
Our study has some limitations. The first relates to the data. Data from the CHIM likely have recording gaps owing to the voluntary nature of reporting. Moreover, most reporting facilities (hospitals, clinics) lack diagnostic apparatus for proper biological confirmation of infection and hence rely on symptomatic diagnosis. However, we share the opinion of Julian 49 that imperfect information is likely more useful for intervention design than no information. Secondly, the morbidity data covered large, heterogeneous districts. Although making inferences at the district level was our interest, centroid-based Poisson kriging implicitly assumes homogeneous population and morbidity distributions within districts. This is overly simplistic and could have affected our final smoothed maps. Further studies using more rigorous statistical estimation are required to reduce possible misspecification.

Conclusions
This study demonstrated the use of spatial statistical methods, such as cluster analysis and geostatistical smoothing, to explore and elucidate the spatial patterns of district-level intestinal parasite infections. Global and local Moran's I were used to estimate and map spatial clustering of intestinal parasites. The global Moran's I results indicated a non-random spatial distribution of intestinal parasite infection and prompt further studies to investigate and quantify the environmental and socioeconomic factors that could account for such patterns. Local Moran's I cluster maps are essential for guiding public health officials in developing cost-effective control measures and could help ensure that control programs are focused appropriately. In light of our findings, health professionals should direct more intervention attention to the hot-spot locations. Moreover, the patterns of global and local autocorrelation are important steps towards a proper model of intestinal worm infection in the future. Finally, the study demonstrated the usefulness of geostatistics for filtering out noise caused by heterogeneous populations, which is important for rates based on low morbidity counts in sparsely populated areas. The geostatistical risk maps provided knowledge of the spatial distribution of intestinal parasite infections in Ghana. In the future, we intend to investigate further the issues of spatial support and of varying population and morbidity distributions within districts.