Introduction

Cholera remains a public health threat in many developing countries despite remarkable research progress. Out of the 42 countries reporting cholera cases in 2014, 19 countries in Africa reported 55.25% of the cases and 84.36% of the deaths1. Ghana together with Nigeria, the Democratic Republic of Congo, Haiti, and Afghanistan recorded 84% of the 2014 cases worldwide1. In both endemic and non-endemic regions, effective cholera preparedness plans during outbreaks are based on lessons learned from previous outbreaks. Key questions include the nature of the epidemic curve, the speed of the spread, the transmission route, and clustering characteristics, i.e. scale of maximal clustering and maximal scale clustering. Studying the temporal trends and spatial heterogeneities can provide answers to these questions. This can also infer relevant environmental and climatic risk factors, examine etiological hypothesis and routes of transmission2,3,4,5.

Africa has largely reported the majority of cholera epidemics and deaths since the seventh pandemic established its new home on the continent6. A systematic review on the environmental determinants of cholera in Africa showed that inland (non-coastal) epidemics constituted a major part of the continental burden7. More than three-quarters of all cholera cases reported in sub-Saharan Africa in 2009–2011 affected inland regions. Thus, inland cholera is emerging as a relevant and major epidemiological concern in Africa. This stimulates efforts to understand the spatiotemporal heterogeneities of inland cholera epidemics. Yet the endemic and epidemic cholera dynamics have mostly been studied in coastal areas such as Bangladesh8,9,10, India11, Mexico12, Peru13, and some coastal African countries14,15,16,17,18 where there is close contact between infected populations and the estuarine (or riverine) environment.

In this study, we investigate the temporal trend and spatial clustering of cholera, drawing on the data collected during an epidemic in Kumasi, Ghana. Our objectives are to answer the following epidemiological questions: (1) what is the nature of the epidemic curve, and how different are the epidemic growth and decay gradients? (2) what is the maximum scale of spatial interaction and the scale of maximal interaction? We reason that answers to these questions will have important implications for effective cholera outbreak preparedness. We anticipate the spatial interaction to indicate higher than expected neighboring cases even at the beginning of the epidemic week8. The remainder of the manuscript is structured as follows; the next section describes the study area and the data. This is followed by a description of the statistical approaches, and then the results and analysis. The manuscript ends with some discussion and conclusions.

Methods

Study area and data

Kumasi is an inland city located in the south-central part of Ghana, 250 km northwest of Accra (Fig. 1). Ghana has been a known country of occasional cholera outbreaks since the 1970’s19. The recent outbreak in 2014 which recorded over 28,000 cases was mainly concentrated in coastal districts and their neighboring districts with only marginal cases in inland districts. Kumasi has occasionally been hit by a series of cholera outbreaks of which the 2005 outbreak has been the most severe. Hence our study focused on the 2005 outbreak. This outbreak started from the last week of September, lasting for a period of 72 days, which was the rainy season. The first confirmed case was recorded on 29th September 2005. In Ghana, it is mandatory for all reporting facilities (i.e. hospitals, clinics, and community volunteers) to report cholera cases to the Disease Control Unit (DCU). The DCU is purposely established to ensure effective surveillance of all communicable diseases (personal communication with the head of DCU, Ashanti region). A case definition of cholera was based on the WHO’s definition of clinical diagnosis which depends on whether or not the presence of cholera has been demonstrated in the area. The first case of cholera, however, was confirmed by bacteriological tests (personal communication with DCU director). In this study, only cholera cases made known to the Kumasi Metropolitan DCU through reporting facilities such as community volunteers, community clinics, and hospitals were used. The data obtained consisted of individual surveillance and laboratory records of cholera. The DCU registered cases with the following information: name, date of onset, date of reporting, gender, locality (community or suburb), sub-locality (description of residence), and age. For the preservation of confidentiality, we ensured the DCU officer in charge deleted the names of affected individuals before receipt of the data. The exact residential addresses of the cases were not recorded, however, the sub-locality field provided information regarding descriptions of their residencies. We determined the locations (mostly sub-localities) of the cases using a Global Positioning System (GPS). The geographic coordinates (latitudes and longitudes) in the WGS 84 datum were then transformed into the Ghana Transverse Mercator (GTM) coordinate system using a transformation program written by one of the authors (FBO). For cases that we could not trace their sub-localities in the database, we assigned the centroids of their localities of residence. Complete spatiotemporal information existed for the 1166 cholera cases we obtained from the DCU. Since the data were secondary, properly anonymized and informed consent was obtained by the DCU at the time of original data collection, ethical approval was not required.

Figure 1
figure 1

Map of Ghana and its neighbors (left), and Kumasi (right). This map was created using ArcGIS software (version 10.1, ESRI Inc. Redlands, CA, USA. https://www.esri.com/).

The empirical epidemic curve

To describe the nature of the epidemic curve, we developed a generalized nonparametric model for the daily incidences cholt, t = 1,…,.T20,21. We modeled cholt as realizations from the Poisson distribution, cholt ~ Poisson(ϑtµt), where µt is the mean and ϑt is a positive-valued random variable to account for over-dispersion, a situation where E(chol|t) < V(chol|t). Here, ϑt has mean equal 1 and variance a (the over-dispersion parameter) such that the marginal mean and variance are E(chol|t) = µ and V(chol|t) = µ + aµ2. Since the Gamma distribution is a conjugate of the Poisson distribution we chose ϑ ~ Gamma(1/a,1/a, one-parameter Gamma distribution with mean E(ϑ) = 1 and variance V(ϑ) = a. In so doing, the marginal distribution of chol is equivalent to the negative-binomial distribution cholt ~ NB(µt,a). We used the canonical log-link function to linearize the expected counts log(E[cholt]) = log(µt) through the predictor log(µt) β0+f (t)., where f(t) is a nonlinear function of time t and β0 is the intercept. For the function f(t), we used penalized splines with truncated power basis functions \(1,\,t,{t}^{2},\ldots ,{t}^{p}\), \({{(t-{\kappa }_{1})}^{p}}_{+}\,,\ldots ,\,{(t-{\kappa }_{M})}_{+}^{p}\,\) of degree p. Such basis functions with p ≥ 2 have continuous first derivatives without sharp corners resulting in an aesthetically appealing fit. This results in the pth degree spline model of the m = 1, …, M ascending clustered knots, \({\kappa }_{1},\ldots ,{\kappa }_{M}\), with M = max(T/4,20)22, that cover the ace of t. Thus,

$$\mathrm{log}({\mu }_{t})={\beta }_{0}+{\beta }_{1}{t}_{t}+{\beta }_{1}{t}_{t}^{2}+\cdots +{\beta }_{p}{t}_{t}^{p}+{\sum }_{m=1}^{M}{\beta }_{pm}{(t-{\kappa }_{m})}_{+}^{p}$$
(1)

where \({(t-{\kappa }_{k})}_{+}=(t-{\kappa }_{m})\times I(t > {\kappa }_{m})\) with \(I(\cdot )\) the indicator function equals to one if t > κ and zero otherwise. With \({\bf{t}}=[1\,{t}_{t}^{1}{t}_{t}^{2},\ldots ,{t}_{t}^{p}]\) and \({\bf{Z}}=[{(t-{\kappa }_{1})}_{+}^{p},\ldots ,{(t-{\kappa }_{M})}_{+}^{p}]\). as vector-matrices, we reparametrized the model as a generalized mixed model representation \(\mathrm{log}({\boldsymbol{\mu }})={\bf{t}}\hat{{\boldsymbol{\beta }}}+{\bf{Z}}\hat{{\bf{b}}}\), and estimated the parameters \(\hat{{\boldsymbol{\beta }}}=\{{\beta }_{0},{\beta }_{1},{\beta }_{2},\ldots ,{\beta }_{p}\}\), \(\hat{{\bf{b}}}=\{{\beta }_{p1},\ldots ,{\beta }_{pM}\}\) by maximizing the Penalized Quasi-likelihood23. Since the degree of the basis functions can influence the predictive performance, we fitted seven different models of varying degrees \(p=1,\ldots ,6\). The same number of knots M were used for all the models. We used the chi-square goodness-of-fit (GOF) test on the null and model (residual) deviances to assess the adequacy of the models. We estimated the predictive performances using the generalized cross-validation (GCV) method

$$GCV(p)=\sum _{i=1}^{n}{(\frac{\{(I-{S}_{p})\mathrm{log}cho{l}_{t}\}i}{1-{n}^{-1}tr({S}_{p})})}^{2}$$
(2)

where Sp. is a smoother matrix and dependent upon the degree p of the basis functions. The model that minimizes GCV(p) is the model with the best predictive performance. We used the R statistical software24 for this modeling.

Epidemic growth and decay

We fitted a segmented Poisson regression model with an unknown single time break-point \({\kappa }_{peak}\,\)to assess the difference in gradients between the epidemic growth and decay and to separate cases which occurred during the growth and decay periods for further analysis. The results of the models for the epidemic curve in the previous section give an indication that a sgle time break-point separates the epidemic growth and decay. Since this break-point is not known a priori, we expressed

$$\mathrm{log}({\mu }_{t})={\beta }_{0}+{\beta }_{1}{t}_{t}+{\beta }_{2}{({t}_{t}-{\kappa }_{peak})}_{+}$$
(3)

where the parameter β1 is the growth gradient and β2 is the difference in gradients between the growth and decay. This implies β1 + β2 is the decay gradient. We reparametrized the model as

$$\mathrm{log}({\mu }_{t})={\beta }_{0}+{\beta }_{1}{t}_{t}+{\beta }_{2}{({t}_{t}-{\kappa }_{peak})}_{+}+{\delta }_{\kappa }I{({t}_{t}-{\kappa }_{peak})}^{-}$$
(4)

and estimated the parameters iteratively until the indicator function \(\,I{()}^{-}\) converged to zero. Thus a generalized linear model is fitted at each iteration and the break-point value is updated via \({\hat{\kappa }}_{peak}={\hat{\kappa }}_{peak}+{\hat{\delta }}_{\kappa }/{\hat{\beta }}_{2}\), where \({\hat{\delta }}_{\kappa }\) measures the estimated gap between the two fitted lines25,26. Convergence is achieved when the break-point gap \({\hat{\delta }}_{\kappa }\,\)becomes close to zero; here, \({\hat{\kappa }}_{peak}\,\)is the optimal time at which the epidemic reaches a peak. In order to check the adequacy of the model fit, we employed the Davies test of hypothesis25,27 on the break-point to determine if the difference in slopes \({\hat{\beta }}_{2}\) is significantly different from zero. We further used the estimated break-point time \({\hat{\kappa }}_{peak}\) to dichotomize cases into growth and decay. We used the Segmented25 package of the R statistical software24 for this modeling. We used the chi-square GOF test on the null and model deviances to assess the model adequacy.

Spatial interactions

To estimate the maximum scale of interaction and the scale of maximal interaction, we used the pair correlation function \(g(r)\). If the occurrences of cholera cases interact, the number of cases around any chosen case within a radius \(r\) will be more than expected, an exhibition of spatial interaction. In order to test this, we considered the occurrences of cholera cases \(cho{l}_{i}=({x}_{1i},{x}_{2i})\) as realizations of a spatial point process within the window \({\boldsymbol{W}}\), where \(({x}_{1i},{x}_{2i})\) are the geographic coordinates. Then for the infinitesimal regions \({\rm{\Delta }}v\) and \({\rm{\Delta }}u\) centered on locations \(v\) and \(u\) within \({\boldsymbol{W}}\), the theoretical pair correlation function is \(\,g(r)={\rho }_{(v,u)}/{\lambda }_{v}{\lambda }_{u}\), where \({\lambda }_{v}\), \({\lambda }_{u}\) and \({\rho }_{(v,u)}\,\)are the first and second order intensities, respectively. Here, \(|{\rm{\Delta }}v|\,\)and \(|{\rm{\Delta }}u|\) are the areas of the regions \({\rm{\Delta }}v\) and \({\rm{\Delta }}u\), and \(N({\rm{\Delta }}v)\) and \(N({\rm{\Delta }}v)\) are the numbers of cholera cases within the regions, respectively. The first order intensity describes the spatial inhomogeneity whereas the second order intensity describes the spatial interaction between cases. We estimated the empirical pair correlation function \(\hat{g}(r,\,\sigma )\,\)using

$$\hat{g}(r,\,\sigma )=\frac{1}{2\pi }\sum _{u=1}\sum _{v\ne u=1}\frac{{\gamma }_{\sigma }(r-\parallel u-v\parallel )}{{e}_{uv}u\parallel u-v\parallel {\hat{\lambda }}_{v}{\hat{\lambda }}_{u}}$$
(5)

where \({e}_{uv}\) is an edge correction weight, \({\gamma }_{\sigma }\,\)is a one-dimensional kernel function with smoothing bandwidth \(\sigma > 0\), \({\hat{\lambda }}_{v}\) and \({\hat{\lambda }}_{u}\,\)are estimated intensities at locations v and u and depend on the Euclidean distance \(\,r=u-v\) between occurrences cholu and cholv. We chose Ripley’s isotropic edge correction which expresses euv as the fraction of the length of the circle of radius \(u-v\) lying within \({\boldsymbol{W}}\). Thus if the circle centered on i with radius \(u-v\) is completely within the study plot, \({e}_{uv}=1\); otherwise, it is the proportion of that circle’s circumference within the plot28. The Epanechnikov kernel provides asymptotically optimal convergence rate for both the mean integrated square error and the mean square error, hence it is used in this study. Although other alternatives like truncated quadratic functions exist, the optimality of the Epanechnikov kernel has established it as the usual choice in point pattern analysis29. We used the denominator \(u-v\) instead of the usual radii r because for small radii r, the variance of \(\hat{g}(r,\,\sigma )\,\)becomes infinity. It is conceivable that the first-order intensities \({\hat{\lambda }}_{v}\) and \({\hat{\lambda }}_{u}\) are affected by location-specific environmental conditions, hence they were estimated as a function of location. Thus, we estimated the intensity at any location u of neighboring cases choli, i = 1, …, m within the bandwidth σ as \({\hat{\lambda }}_{u}={\sum }_{i=1}^{m}{e}_{iu}^{-1}{\gamma }_{\sigma }(u-{x}_{i})\)29. Under the null hypothesis of no interaction between cholera cases, \(\hat{g}(r,\,\sigma )=1\); whereas \(\hat{g}(r,\,\sigma ) > 1\) indicates interaction and \(\hat{g}(r,\,\sigma ) < 1\) indicates inhibition or repulsion between cases. We used the spatstat package29 of the R statistical software24 for estimating \(\hat{g}(r,\,\sigma )\).

Results and Analysis

The empirical epidemic curve

The plot of the empirical epidemic curves and the standard error bands are shown in Fig. 2. We fitted seven different models for the epidemic curve for \(\,p=1,\ldots ,6\). We obtained adequate fit at 5% significance level for all models under the chi-square GOF test as the residual deviances were less than the upper chi-square critical values (Table 1).

Figure 2
figure 2

Epidemic curves of the generalized nonlinear models for p = 1,…,7 (solid lines) and the segmented regression (dashed lines). This graph was created using the R statistical software (version 3.4.2, https://cran.r-project.org).

Table 1 Parameters of the chi-square GOF test for model adequacy assessment.

We also compared the predictive performances of the models using GCV(p). Although higher order polynomials are expected to produce smoother curves, we observed systematic decreases in the prediction performance with increasing p for Model 2 (p = 2) to Model 6 (p = 6). The linear polynomial model, Model 1 (p = 1), has larger GCV(p) than Models 2 and 3 and thus has a greater bias because it is continuous, but not differentiable at the knots. Although this model is under-smoothed and has a less aesthetic appearance, its estimates are unexpectedly superior to those with p > 3. Model 2 (p = 2), the quadratic polynomial model, has the highest predictive performance as its smoothing matrix comparatively minimizes GCV(p) (Fig. 3). All the modeled curves are comparable in shape, except Model 3 which appears to remain constant with time towards the end. Dwelling on Model 2, the resulting prediction model equation for the epidemic curve is \(\mathrm{log}({\mu }_{t})={\beta }_{0}+{\beta }_{1}{t}_{t}+{\beta }_{2}{t}_{t}^{2}+{\sum }_{m=1}^{2}{\beta }_{3m}{(t-{\kappa }_{m})}_{+}^{2}\).

Figure 3
figure 3

Generalized cross-validation curve for model predictive performances. Lower cross-validation value implies better predictive performance. This graph was created using the R statistical software (version 3.4.2, https://cran.r-project.org).

Epidemic growth and decay

The null and model deviances for the segmented regression model were 356.5 and 69.7 under 68 and 65 degrees of freedom (DF), respectively. Under the null hypothesis that our model is correctly specified, the model deviance of 69.7 falls below the critical region at 5% significance level on 65 DF of the chi-square distribution. Hence, we conclude that our model adequately fits the data based on the chi-square GOF test. This model indicates that the difference in gradients between the growth and decay (\({\beta }_{2}=-\,0.387\), p value < 0.001) is significantly different from zero (Table 2). The estimated break-point time was \(\,{\hat{\kappa }}_{peak}=17.7\) (standard error = 0.374) and approximated to the 18th day. The growth of the epidemic was characterized by a significant increasing gradient of \({\beta }_{1}=0.340\) on the log scale (p value < 0.001), indicating a multiplicative effect of \({e}^{{\beta }_{1}}=1.40\) or 40% increments in the daily number of new cases (Fig. 2). The decay of the epidemic was characterized by a significant decreasing gradient of \({\beta }_{1}+{\beta }_{2}=-\,0.047\) (p value < 0.001) on the log scale, indicating a less rapid decline with a multiplicative effect of \({e}^{{\beta }_{1}+{\beta }_{2}}=0.95\) or 5% reductions in the daily number of new cases.

Table 2 Parameter estimates of the segmented Poisson regression model.

Spatial interactions

The estimated average intensity was 4.4 cases km−2. The intensity of cases varied heterogeneously across the study window with an outward decreasing trend from the central part, ranging from ≈ 0.5 cases per km2 within the peripheries to ≈13 cases per km2 towards the central parts (Fig. 4). Varied intensities were observed between the weekly incidences (Figs 5 and 6). The weekly occurrences also showed high intensities within the central parts with systematic reductions towards the peripheries (Fig. 6). The epidemic center (center of mass of the locations of cases) remained virtually the same and deviated only marginally from the index case location during the growth period, indicating stationarity. We observed similar outward decreasing trend patterns of spatial intensities for the growth and the decay periods (Figs 5 and 6).

Figure 4
figure 4

Spatial intensities and pair correlation curve for the whole epidemic period. Spatial distribution of the heterogeneous intensities for the whole epidemic period (Top) and the corresponding pair correlation curve (Bottom). The dashed line represents the line of complete spatial randomness. These graphs were created using the R statistical software (version 3.4.2, https://cran.r-project.org).

Figure 5
figure 5

Spatial intensities and pair correlation curves for epidemic growth, decay. Spatial distribution of the heterogeneous intensities for the epidemic growth, decay (Top) and their corresponding pair correlation curves (Bottom). The dashed lines represent the line of complete spatial randomness. These graphs were created using the R statistical software (version 3.4.2, https://cran.r-project.org).

Figure 6
figure 6

Spatial distribution of the heterogeneous intensities for weeks 1 to 11. The symbol “*” indicates the location of the index case and “+” indicates the epicenter. These were created using the R software statistical software (version 3.4.2, https://cran.r-project.org).

The \(\hat{g}(r,\,\sigma )\) curves for the whole epidemic period showed the existence of an interaction between cholera cases within 1 km range, as the curve is well above the line of complete spatial randomness at this distance (Fig. 4). For distances greater than 1 km, the interaction decreases to repulsion or inhibition as the curve falls below the line of complete spatial randomness. We observed a short-range distance of maximal interaction as low as ≈0.3 km (Fig. 4). Significant levels of interaction were within ≈1 km range for the first week of the epidemic (week 1) through to week 10 as the \(\hat{g}(r,\,\sigma )\) curves were all well above the line of complete randomness (Fig. 7). Similar maximal distances of interaction and distances of maximal interaction were observed to be ≈1 km and ≈0.3 km, respectively, regardless of the specific week of the epidemic, except the last week. The levels of interaction, however, differed with the highest observed in the third week (Fig. 7). The observed levels of interaction for the growth period were particularly higher than those for the decay period as well as for the entire epidemic period (Fig. 4).

Figure 7
figure 7

Pair correlation curves for weeks 1 to 11. The dashed lines represent the line of complete spatial randomness. These were created using the R statistical software (version 3.4.2, https://cran.r-project.org).

Discussion

The nonparametric model for the daily counts illuminated the nature of the epidemic curve, whereas the segmented regression model showed the dissimilarities between the growth and decay gradients. The epidemic was characterized by a rapid rise to a peak and then a slower decline in the number of occurrences. The growth and decay gradients were markedly different with a relatively prolonged and reduced rate of the decay. The shortened length of the growth at just the 18th day of the epidemic can be attributed to early public health interventions after notice and declaration of the cholera outbreak. As early as the second week of the epidemic, there were various health promotion campaigns and education on both radio, television and print media by the DCU. Such early response is characteristic of the DCU of Ghana. Though, if it were complemented with improved hygiene by households, and improved water and sanitation by the government, the cases could have been reduced to the barest minimum.

The epidemiological implication of the rapid growth to the peak may include lack of awareness and knowledge of the disease when V. cholerae is suddenly present in the population. Thus, during the growth period, fewer people in the population are aware of an outbreak; hence, precautionary measures would be less, leading to high susceptibility. The rapid growth could have also been induced by the dominant role of secondary cholera transmissions due to the hyper-infective nature of the bacteria after excretion from infected people. Studies have suggested differences in the genes of V. cholerae responsible for primary transmissions (i.e., those from the aquatic environment) and those responsible for secondary transmissions (i.e. those excreted from infected individuals). When inoculated into the intestines of mice via gavage feeding, freshly shed V. cholerae greatly out-competes bacteria grown in vitro, by as much as 700-fold30,31,32. The cholera bacteria become hyper-infective when excreted from infected persons and are thought to be responsible for faster bacterial growth in the gastrointestinal tract and increased shedding. In a mathematical model to understand the transmission dynamics, Mukandavire et al.33 also found that secondary transmissions contributed to the sustenance of cholera in Zimbabwe, a landlocked country.

Estimates of the maximum scale of interaction, the scale of maximal interaction, and the week the interaction began convey information that reflects the transmission mechanisms during the outbreak. Such information indicates the prospects for effective control of future cholera outbreaks and for designing targeted surveillance programs. Although the interaction of incidences was significant within ≈1 km range, its dominance within ≈0.3 km suggests high occurrences of short distance transmission mechanism. This implicates household-level characteristics and contacts as enhancing one’s vulnerability to infection, suggesting the dominant role of secondary transmissions similar to that seen in earlier studies elsewhere34,35. A probable explanation regarding short distance transmission mechanisms is the prevalence of domestic/household water storage due to intermittent water supply in the Kumasi Metropolis. Unhygienic handling practices of stored water by households can exacerbate cholera infection, as suggested to be the case in other African countries like Malawi36, Guinea-Bissau37,38, and Kenya39,40,41.

Significant spatial interaction occurred at the beginning (within the first week) of the epidemic, signifying the responsible role of secondary transmission in sparking the outbreak. This explanation is deduced from Miller at al42 who suggested that no spatial interaction should exist between cases when primary transmission sparks the epidemic. This aspect of secondary transmission sparking the epidemic arguably suggests the absence of primary transmission during the outbreak. This diverges from the roles of primary and secondary transmissions observed in coastal endemic regions8,32,42,43. Additional support for this argument derives from the stationarity of the epidemic center during the growth period, reflecting that cholera widely spread from its initial source of contamination until its peak8. Based on our argument that secondary transmission sparked the outbreak, one could only suggest that the initial cholera case must have been imported from elsewhere, probably near the coastal regions of Ghana. A rebuttal, however, should be supported by successful isolation of V. cholerae in our study area during inter-epidemic periods. However, in inland regions where cholera outbreaks are sporadic, the environmental reservoir where the cholera vibrios retreat to after epidemics remains elusive. In fact, isolation of the cholera vibrios in several inland areas have scarcely been successful during inter-epidemic periods44,45. During an epidemic period in Tanzania, V. cholerae was isolated from water samples from the Lumemo River, but not from any other water source46, indicating difficulties of the bacterial establishing natural environmental reservoirs in inland regions. That said, this finding diverges from the initial interpretation of cholera dynamics in coastal endemic regions by other authors where initial cholera cases are expected to be random without any apparent connection8,42. For instance, in coastal endemic regions like the Matlab area of Bangladesh, a hypothesized dispersal pattern of primary transmissions during epidemics has been established8. There, primary transmission sparks the outbreak at several distance locations whereby secondary transmission follows and dominates with the clustering of cases at relatively small scales. Besides, the isolation of both toxigenic and viable but nonculturable vibrios during inter-epidemic periods has widely been successful in such environments47,48,49,50.

Notwithstanding the significance of this study, some limitations should be mentioned. First, underreporting of cholera cases is a potential limitation. The failure of asymptomatic carriers or infections with mild symptoms to seek medical attention could lead to lower than actual cases of cholera. Due to a large number of cases reporting to the health facilities, not all cases could be biologically confirmed before treatment, and this could also lead to over-reporting of cases. Possible inherent data quality issues with respect to the accuracy of the GPS coordinates have not been considered in the study.

Conclusions

This study presented a generalized nonlinear model for the epidemic curve of cholera, characterized the epidemic growth and decay gradients, and estimated the levels of spatial interaction at varying distance scales. The study provides intuitive inferences as we appropriately captured the underlying empirical structure of cholera. It can be extended to include both fixed and time-varying covariates. The pair correlation function was favorable to address the heterogeneous intensities and the clustering characteristics such as the maximal distance of interaction and the distance of maximal interaction. We found evidence of secondary transmissions in initiating the epidemic. Interactions between cases were within 1 km scale and were dominated within households. Thus, strengthening household level sanitation practices is critical in reducing infections from infected people. An additional conclusion regarding the consequences of the role of secondary transmission relates to strengthening surveillance to restrict cholera importation by infected individuals. Correcting for underreporting of cholera cases in studies like this remains an area of ongoing research which we seek to investigate in future51. Improvements in public health surveillance are key to reducing underreporting. To summarize, this study has shown the usefulness of generalized nonparametric and segmented regression models, and of the pair correlation function in extracting key epidemiological information. The results of this study may have important implications for public health decision making for developing effective cholera outbreak preparedness strategies.