Temporal trend and spatial clustering of cholera epidemic in Kumasi-Ghana

Knowledge of the temporal trends and spatial patterns will have significant implications for effective preparedness in future epidemics. Our objective was to investigate the temporal trends and the nature of the spatial interaction of cholera incidences, dwelling on an outbreak in the Kumasi Metropolis, Ghana. We developed generalized nonparametric and segmented regression models to describe the epidemic curve. We used the pair correlation function to describe the nature of spatial clustering parameters such as the maximum scale of interaction and the scale of maximal interaction. The epidemic rose suddenly to a peak with 40% daily increments of incidences. The decay, however, was slower with 5% daily reductions. Spatial interaction occurred within 1 km radius. Maximal interaction occurred within 0.3 km, suggesting a household level of interactions. Significant clustering during the first week suggests secondary transmissions sparked the outbreak. The nonparametric and segmented regression models, together with the pair correlation function, contribute to understanding the transmission dynamics. The issue of underreporting remains a challenge we seek to address in future. These findings, however, will have innovative implications for developing preventive measures during future epidemics.

The empirical epidemic curve. To describe the nature of the epidemic curve, we developed a generalized nonparametric model for the daily incidences chol t , t = 1,…,.T 20,21 . We modeled chol t as realizations from the Poisson distribution, chol t ~ Poisson(ϑ t µ t ), where µ t is the mean and ϑ t is a positive-valued random variable to account for over-dispersion, a situation where E(chol|t) < V(chol|t). Here, ϑ t has mean equal 1 and variance a (the over-dispersion parameter) such that the marginal mean and variance are E(chol|t) = µ and V(chol|t) = µ + aµ 2 . Since the Gamma distribution is a conjugate of the Poisson distribution we chose ϑ ~ Gamma(1/a,1/a, one-parameter Gamma distribution with mean E(ϑ) = 1 and variance V(ϑ) = a. In so doing, the marginal distribution of chol is equivalent to the negative-binomial distribution chol t ~ NB(µ t ,a). We used the canonical log-link function to linearize the expected counts log(E[chol t ]) = log(µ t ) through the predictor log(µ t ) β 0+f (t)., where f(t) is a nonlinear function of time t and β 0 is the intercept. For the function f(t), we used penalized splines with truncated power basis functions   { } , , , , p by maximizing the Penalized Quasi-likelihood 23 . Since the degree of the basis functions can influence the predictive performance, we fitted seven different models of varying degrees = … p 1, , 6. The same number of knots M were used for all the models. We used the chi-square goodness-of-fit (GOF) test on the null and model (residual) deviances to assess the adequacy of the models. We estimated the predictive performances using the generalized cross-validation (GCV) method where S p . is a smoother matrix and dependent upon the degree p of the basis functions. The model that minimizes GCV(p) is the model with the best predictive performance. We used the R statistical software 24 for this modeling.
Epidemic growth and decay. We fitted a segmented Poisson regression model with an unknown single time break-point κ peak to assess the difference in gradients between the epidemic growth and decay and to separate cases which occurred during the growth and decay periods for further analysis. The results of the models for the epidemic curve in the previous section give an indication that a sgle time break-point separates the epidemic growth and decay. Since this break-point is not known a priori, we expressed where the parameter β 1 is the growth gradient and β 2 is the difference in gradients between the growth and decay. This implies β 1 + β 2 is the decay gradient. We reparametrized the model as and estimated the parameters iteratively until the indicator function − I() converged to zero. Thus a generalized linear model is fitted at each iteration and the break-point value is updated via κ κ δ β = + κˆˆ/ peak peak 2 , where δ κ measures the estimated gap between the two fitted lines 25,26 . Convergence is achieved when the break-point gap δ κ becomes close to zero; here, κ peak is the optimal time at which the epidemic reaches a peak. In order to check the adequacy of the model fit, we employed the Davies test of hypothesis 25,27 on the break-point to determine if the difference in slopes βˆ2 is significantly different from zero. We further used the estimated break-point time κ peak to dichotomize cases into growth and decay. We used the Segmented 25 package of the R statistical software 24 for this modeling. We used the chi-square GOF test on the null and model deviances to assess the model adequacy.

Spatial interactions.
To estimate the maximum scale of interaction and the scale of maximal interaction, we used the pair correlation function g r ( ). If the occurrences of cholera cases interact, the number of cases around any chosen case within a radius r will be more than expected, an exhibition of spatial interaction. In order to test this, we considered the occurrences of cholera cases , where λ v , λ u and ρ v u ( , ) are the first and second order intensities, respectively. Here, |∆ | v and |∆ | u are the areas of the regions ∆v and ∆u, and ∆ N v ( ) and ∆ N v ( ) are the numbers of cholera cases within the regions, respectively. The first order intensity describes the spatial inhomogeneity whereas the second order intensity describes the spatial interaction between cases. We estimated the empirical pair correlation function σ g r ( , ) using where e uv is an edge correction weight, γ σ is a one-dimensional kernel function with smoothing bandwidth σ > 0, λˆv and λˆu are estimated intensities at locations v and u and depend on the Euclidean distance = − r u v between occurrences chol u and chol v . We chose Ripley's isotropic edge correction which expresses e uv as the fraction of the length of the circle of radius − u v lying within W. Thus if the circle centered on i with radius − u v is completely within the study plot, = e 1 uv ; otherwise, it is the proportion of that circle's circumference within the plot 28 . The Epanechnikov kernel provides asymptotically optimal convergence rate for both the mean integrated square error and the mean square error, hence it is used in this study. Although other alternatives like truncated quadratic functions exist, the optimality of the Epanechnikov kernel has established it as the usual choice in point pattern analysis 29 . We used the denominator − u v instead of the usual radii r because for small radii r, the variance of σ g r ( , ) becomes infinity. It is conceivable that the first-order intensities λˆv and λˆu are affected by location-specific environmental conditions, hence they were estimated as a function of location. Thus, we estimated the intensity at any location u of neighboring cases Under the null hypothesis of no interaction between cholera cases, σ = g r ( , ) 1; whereas σ > g r ( , ) 1 indicates interaction and σ < g r ( , ) 1 indicates inhibition or repulsion between cases. We used the spatstat package 29 of the R statistical software 24 for estimating σ g r ( , ).

Results and Analysis
The empirical epidemic curve. The plot of the empirical epidemic curves and the standard error bands are shown in Fig. 2. We fitted seven different models for the epidemic curve for = … p 1, , 6. We obtained adequate fit at 5% significance level for all models under the chi-square GOF test as the residual deviances were less than the upper chi-square critical values (Table 1). We also compared the predictive performances of the models using GCV(p). Although higher order polynomials are expected to produce smoother curves, we observed systematic decreases in the prediction performance with increasing p for Model 2 (p = 2) to Model 6 (p = 6). The linear polynomial model, Model 1 (p = 1), has larger GCV(p) than Models 2 and 3 and thus has a greater bias because it is continuous, but not differentiable at the knots. Although this model is under-smoothed and has a less aesthetic appearance, its estimates are unexpectedly superior to those with p > 3. Model 2 (p = 2), the quadratic polynomial model, has the highest predictive performance as its smoothing matrix comparatively minimizes GCV(p) (Fig. 3). All the modeled curves are comparable in shape, except Model 3 which appears to remain constant with time towards the end. Dwelling on Model 2, the resulting prediction model equation for the epidemic curve is  or 40% increments in the daily number of new cases (Fig. 2). The decay of the epidemic was characterized by a significant decreasing gradient of β β + = − . 0 047 1 2 (p value < 0.001) on the log scale, indicating a less rapid decline with a multiplicative effect of = .
β β + e 0 95 1 2 or 5% reductions in the daily number of new cases.

Spatial interactions.
The estimated average intensity was 4.4 cases km −2 . The intensity of cases varied heterogeneously across the study window with an outward decreasing trend from the central part, ranging from ≈ 0.5 cases per km 2 within the peripheries to ≈13 cases per km 2 towards the central parts (Fig. 4). Varied intensities were observed between the weekly incidences (Figs 5 and 6). The weekly occurrences also showed high intensities within the central parts with systematic reductions towards the peripheries (Fig. 6). The epidemic center (center of mass of the locations of cases) remained virtually the same and deviated only marginally from the index case location during the growth period, indicating stationarity. We observed similar outward decreasing trend patterns of spatial intensities for the growth and the decay periods (Figs 5 and 6). The σ g r ( , ) curves for the whole epidemic period showed the existence of an interaction between cholera cases within 1 km range, as the curve is well above the line of complete spatial randomness at this distance (Fig. 4). For distances greater than 1 km, the interaction decreases to repulsion or inhibition as the curve falls below the   line of complete spatial randomness. We observed a short-range distance of maximal interaction as low as ≈0.3 km (Fig. 4). Significant levels of interaction were within ≈1 km range for the first week of the epidemic (week 1) through to week 10 as the σ g r ( , ) curves were all well above the line of complete randomness (Fig. 7). Similar maximal distances of interaction and distances of maximal interaction were observed to be ≈1 km and ≈0.3 km, respectively, regardless of the specific week of the epidemic, except the last week. The levels of interaction, however, differed with the highest observed in the third week (Fig. 7). The observed levels of interaction for the growth period were particularly higher than those for the decay period as well as for the entire epidemic period (Fig. 4).

Discussion
The nonparametric model for the daily counts illuminated the nature of the epidemic curve, whereas the segmented regression model showed the dissimilarities between the growth and decay gradients. The epidemic was characterized by a rapid rise to a peak and then a slower decline in the number of occurrences. The growth and decay gradients were markedly different with a relatively prolonged and reduced rate of the decay. The shortened length of the growth at just the 18 th day of the epidemic can be attributed to early public health interventions after notice and declaration of the cholera outbreak. As early as the second week of the epidemic, there were various health promotion campaigns and education on both radio, television and print media by the DCU. Such early response is characteristic of the DCU of Ghana. Though, if it were complemented with improved hygiene by households, and improved water and sanitation by the government, the cases could have been reduced to the barest minimum.
The epidemiological implication of the rapid growth to the peak may include lack of awareness and knowledge of the disease when V. cholerae is suddenly present in the population. Thus, during the growth period, fewer people in the population are aware of an outbreak; hence, precautionary measures would be less, leading to high susceptibility. The rapid growth could have also been induced by the dominant role of secondary cholera transmissions due to the hyper-infective nature of the bacteria after excretion from infected people. Studies have suggested differences in the genes of V. cholerae responsible for primary transmissions (i.e., those from the aquatic environment) and those responsible for secondary transmissions (i.e. those excreted from infected individuals). When inoculated into the intestines of mice via gavage feeding, freshly shed V. cholerae greatly out-competes bacteria grown in vitro, by as much as 700-fold [30][31][32] . The cholera bacteria become hyper-infective when excreted from infected persons and are thought to be responsible for faster bacterial growth in the gastrointestinal tract and increased shedding. In a mathematical model to understand the transmission dynamics, Mukandavire et al. 33 also found that secondary transmissions contributed to the sustenance of cholera in Zimbabwe, a landlocked country. Estimates of the maximum scale of interaction, the scale of maximal interaction, and the week the interaction began convey information that reflects the transmission mechanisms during the outbreak. Such information indicates the prospects for effective control of future cholera outbreaks and for designing targeted surveillance programs. Although the interaction of incidences was significant within ≈1 km range, its dominance within ≈0.3 km suggests high occurrences of short distance transmission mechanism. This implicates household-level characteristics and contacts as enhancing one's vulnerability to infection, suggesting the dominant role of secondary transmissions similar to that seen in earlier studies elsewhere 34,35 . A probable explanation regarding short distance transmission mechanisms is the prevalence of domestic/household water storage due to intermittent water supply in the Kumasi Metropolis. Unhygienic handling practices of stored water by households can exacerbate cholera infection, as suggested to be the case in other African countries like Malawi 36 , Guinea-Bissau 37,38 , and Kenya [39][40][41] .
Significant spatial interaction occurred at the beginning (within the first week) of the epidemic, signifying the responsible role of secondary transmission in sparking the outbreak. This explanation is deduced from Miller at al 42 who suggested that no spatial interaction should exist between cases when primary transmission sparks the epidemic. This aspect of secondary transmission sparking the epidemic arguably suggests the absence of primary transmission during the outbreak. This diverges from the roles of primary and secondary transmissions observed in coastal endemic regions 8,32,42,43 . Additional support for this argument derives from the stationarity of the epidemic center during the growth period, reflecting that cholera widely spread from its initial source of contamination until its peak 8 . Based on our argument that secondary transmission sparked the outbreak, one could only suggest that the initial cholera case must have been imported from elsewhere, probably near the coastal regions of Ghana. A rebuttal, however, should be supported by successful isolation of V. cholerae in our study area during inter-epidemic periods. However, in inland regions where cholera outbreaks are sporadic, the environmental reservoir where the cholera vibrios retreat to after epidemics remains elusive. In fact, isolation of the cholera vibrios in several inland areas have scarcely been successful during inter-epidemic periods 44,45 . During an epidemic period in Tanzania, V. cholerae was isolated from water samples from the Lumemo River, but not from any other water source 46 , indicating difficulties of the bacterial establishing natural environmental reservoirs in inland regions. That said, this finding diverges from the initial interpretation of cholera dynamics in coastal endemic regions by other authors where initial cholera cases are expected to be random without any apparent connection 8,42 . For instance, in coastal endemic regions like the Matlab area of Bangladesh, a hypothesized dispersal pattern of primary transmissions during epidemics has been established 8 . There, primary transmission sparks the outbreak at several distance locations whereby secondary transmission follows and dominates with the clustering of cases at relatively small scales. Besides, the isolation of both toxigenic and viable but nonculturable vibrios during inter-epidemic periods has widely been successful in such environments [47][48][49][50] .
Notwithstanding the significance of this study, some limitations should be mentioned. First, underreporting of cholera cases is a potential limitation. The failure of asymptomatic carriers or infections with mild symptoms to seek medical attention could lead to lower than actual cases of cholera. Due to a large number of cases reporting to the health facilities, not all cases could be biologically confirmed before treatment, and this could also lead to over-reporting of cases. Possible inherent data quality issues with respect to the accuracy of the GPS coordinates have not been considered in the study.

Conclusions
This study presented a generalized nonlinear model for the epidemic curve of cholera, characterized the epidemic growth and decay gradients, and estimated the levels of spatial interaction at varying distance scales. The study provides intuitive inferences as we appropriately captured the underlying empirical structure of cholera. It can be extended to include both fixed and time-varying covariates. The pair correlation function was favorable to address the heterogeneous intensities and the clustering characteristics such as the maximal distance of interaction and the distance of maximal interaction. We found evidence of secondary transmissions in initiating the epidemic. Interactions between cases were within 1 km scale and were dominated within households. Thus, strengthening household level sanitation practices is critical in reducing infections from infected people. An additional conclusion regarding the consequences of the role of secondary transmission relates to strengthening surveillance to restrict cholera importation by infected individuals. Correcting for underreporting of cholera cases in studies like this remains an area of ongoing research which we seek to investigate in future 51 . Improvements in public health surveillance are key to reducing underreporting. To summarize, this study has shown the usefulness of generalized nonparametric and segmented regression models, and of the pair correlation function in extracting key epidemiological information. The results of this study may have important implications for public health decision making for developing effective cholera outbreak preparedness strategies.