Data-driven modelling and spatial complexity supports heterogeneity-based integrative management for eliminating Simulium neavei-transmitted river blindness

Concern is emerging regarding the challenges posed by spatial complexity for modelling and managing the area-wide elimination of parasitic infections. While this has led to calls for applying heterogeneity-based approaches for addressing this complexity, questions related to spatial scale, the discovery of locally-relevant models, and its interaction with options for interrupting parasite transmission remain to be resolved. We used a data-driven modelling framework applied to infection data gathered from different monitoring sites to investigate these questions in the context of understanding the transmission dynamics and efforts to eliminate Simulium neavei- transmitted onchocerciasis, a macroparasitic disease that causes river blindness in Western Uganda and other regions of Africa. We demonstrate that our Bayesian-based data-model assimilation technique is able to discover onchocerciasis models that reflect local transmission conditions reliably. Key management variables such as infection breakpoints and required durations of drug interventions for achieving elimination varied spatially due to site-specific parameter constraining; however, this spatial effect was found to operate at the larger focus level, although intriguingly including vector control overcame this variability. These results show that data-driven modelling based on spatial datasets and model-data fusing methodologies will be critical to identifying both the scale-dependent models and heterogeneity-based options required for supporting the successful elimination of S. neavei-borne onchocerciasis.

prevalence) from each of the 11 hyperendemic villages from Itwara and Kashoya Kitomi based on the focus-level ABRs observed in each focus (Table 1) are shown in Fig. 1. A binomial generalized additive model testing for differences in the observed age prevalences found that village as well as foci were significant predictors of the variation in these prevalences (p-values < 0.05), highlighting the fact that despite the selection of these sentinel villages from an apparently spatially synchronous transmission setting (at least within each focus), host infection prevalence varied both locally between villages and across transmission zones. Pass/fail scores indicate that for all sites except Byeya there was a greater than 92% agreement between the model estimated prevalence and the observed age data (Supplementary Table S1), indicating that the best-fitting focus ABR models generated in this study are in general behaviorally able to produce baseline age-infection curves reflective of the spatially-variable patterns measured in these villages.
Estimation of site-specific annual biting rates. The sites studied here from the Itwara and Kashoya Kitomi foci had only focus-level ABR data available to guide the selection of onchocerciasis models applicable to each village setting (Table 1); this raised the question as to whether using the focus ABR as a driving variable allows the appropriate identification of local (village level) models. We investigated this question by reverse engineering the expected site-specific ABR values given the observed age-infection prevalence data (see Methods), and comparing these model predictions with the measured focus-level ABR values. The model-generated distributions of site-specific ABRs are shown in Fig. 2, and Wilcoxon signed rank tests suggest that these varied from the corresponding focus ABR values in each site (p-values < 0.05). Additionally, the estimated site-specific ABR distributions were also statistically significantly different between all sites, and within each of the two foci (Kruskal-Wallis tests comparing the ABRs of sites (p-values < 0.05) in each case, respectively). However, the consequence of this difference was found not to be appreciable with respect to the SIR models identified as best-fitting the present data, given that 1) the site-specific ABR models captured the observed baseline infection data similarly to the focus ABR models (Fig. 3), and 2) comparison using the relative root-mean-square error (ReRMSE) metric confirming that the performance of the site-specific ABR models did not differ quantitatively from that of focus ABR models (Supplementary Table S2). These results demonstrate, firstly, that using focus-level ABR values, at least from the present relatively homogeneous onchocerciasis endemic zones, may be sufficient to model local transmission differences, and, secondly, that if baseline ABR information is missing, as is often the case with national field surveys, model-estimated site-specific ABRs can reliably facilitate the learning of local behaviourally suitable onchocerciasis models.
Assessment of posterior parameter distributions. The BM-based data assimilation approach basically facilitates the learning of local onchocerciasis models by updating parameter estimates from their original uniform priors using the available locality-specific mf prevalence and ABR data. The prior and posterior parameter distributions from the best-fit SIR focus ABR models for each site were compared by Kolmogorov-Smirnov tests, and 19 of 22 parameter distributions tested were found to have been significantly changed by the Bayesian updating procedure (p-values < 0.05, Table 2). The corresponding results for the site-specific ABR models for sites in Itwara and Kashoya Kitomi are given in Supplementary Table S3 and do not show a notable difference compared to the focus models, suggesting: 1) that the local data were informative for updating posteriors, and 2) that the use of focus-level ABR was sufficient for calibrating the present models to reflect the local transmission conditions obtaining in the investigated sentinel monitoring sites.  Model-estimated age profiles of microfilariae prevalence at baseline compared to observed data. The focus ABR model predictions (gray curves) generated by the SIR-selected parameter sets according to their likelihood to the site-specific age-stratified mf prevalence data at baseline (red points) are displayed for the 11 sentinel sites in the Itwara and Kashoya Kitomi foci. Error bars denote the 95% binomial confidence intervals around the observed prevalence data. Pass/fail scores (see text) indicate that there was a greater than 92% agreement between the model estimated prevalence and the observed age data for all sites except Byeya.
Model fits to baseline mf infection in the absence of ABR data. As noted in Methods, we used data from five sentinel villages in the hypo-and mesoendemic Bwindi focus in order to introduce greater variability in the types of endemic settings modelled in this study (Table 1). However, as there were no data on ABR available from any of these villages, we needed to apply the site-specific ABR estimation technique described above in order Model-estimated site-specific annual biting rates. The histograms show the relative frequencies of the model-estimated annual biting rates for each village studied in the Itwara and Kashoya Kitomi foci. Wilcoxon signed rank test results indicate that the distributions of estimated site-specific ABRs do not come from distributions whose median is equal to the focus-level ABR (blue solid line), suggesting a significant difference in the two values (p-values < 0.05 for all sites). The estimated site-specific ABR distributions between sites are statistically significantly different as given by Kruskal-Wallis tests comparing ABRs between all sites as well as between sites within each focus (p-values < 0.05).

Figure 3.
Comparison of baseline model fits from the focus ABR versus site-specific ABR models. The baseline age-prevalence curves generated via model simulations using estimated site-specific ABR (gray curves) represent the village-level infection age-profiles similarly to those of model simulations carried out using focus-level ABRs (blue curves). Model estimates given either ABR inputs captured the observed data well (Supplementary Table S2). Red data points represent the age-specific prevalence data with 95% binomial confidence intervals.
www.nature.com/scientificreports www.nature.com/scientificreports/ to identify the local models for these villages. We also needed to estimate the age-prevalence curves from the available overall community baseline mf prevalences in these sites (see Methods). The resulting SIR model fits to these estimated age-infection data are depicted in Fig. 5, and indicate that using these approximations can allow discovery of reasonably good models for these sites. The corresponding site-specific ABR distributions estimated by these respective village-specific models are displayed in Supplementary Fig. S2. Note results from these sites were only included in subsequent analyses related to simulations using site-specific ABR models.

Mf prevalence breakpoints.
Mf breakpoints corresponding to both the ABRs and the threshold biting rates (TBRs) were calculated for the desired elimination probability (EP) of 95% (Table 4). These calculations were performed using the best-fitting parameter vectors for the baseline prevalence data for both the focus and site-specific ABR models. In the majority of cases, the breakpoint distributions generated by the site-specific ABR models did not statistically differ from those generated by the focus ABR models (Kruskal-Wallis tests, Supplementary Table S4), further suggesting that focus ABR models are sufficiently able to capture local transmission characteristics in relatively homogeneous transmission zones. Kruskal-Wallis tests show that the mf breakpoints and TBRs differed significantly (p-values < 0.05) between sentinel villages and between foci. However, the numerical values of the mf breakpoint prevalences given in Table 4 show that these difference may not be large enough to be practically meaningful. Additionally, these critical infection thresholds, irrespective of location, are low-valued (much lower than the WHO suggested threshold value of 1% mf prevalence 45 ) and are found to be significantly much higher at TBR than at ABR. impact of between-site heterogeneity on onchocerciasis elimination. An array of intervention strategies were simulated to evaluate the impact of spatial variability and complexity in transmission on timelines to crossing the estimated elimination threshold values in each site. The control measures simulated consist of different combinations of annual or biannual MDA treatments with ivermectin, the presence or absence of vector control, and population drug coverage of 40-100% for up to 50 years. For modelling the effects of larvicidical (Abate) application, we first used data on observed changes in the number of flies over time as a result of applying this control measure to riverine habitats -data obtained from sites in the Kashoya Kitomi focus -in order to estimate the functional form describing the expected rate of temporal decrease in black fly biting density due to this vector control strategy. An exponential decay function best described the observed decrease in fly numbers giving a value of A 0 = 0.44 ( Supplementary Fig. S3), and we applied this function to model the impacts of larvicide assuming that the application and temporal effects were similar in all three study regions.
Simulation results on the effectiveness of the four MDA-based strategies in terms of the number of years to reach the estimated village-specific 95% EP threshold for one sentinel site in the Itwara focus, Byeya, are shown in Fig. 6. Intuitively, and as shown, the most effective intervention scenario is high coverage biannual MDA with VC while the least effective intervention scenario is low coverage annual MDA without VC. The number of years of MDA required to reach the estimated thresholds for each sentinel site are further tabulated in Table 5 for the optimal scenario of achieving 80% MDA coverage, implementing VC, and targeting a stringent threshold (95% EP). Under this strategy, biannual MDA treatments and the use of VC can save many years of interventions over the less intensive annual MDA without VC strategy (Table 5). For example, 26 years of annual MDA without VC are required to reach the 95% EP threshold in Byeya, while only 6 years of biannual MDA with supplemental VC are required, thus potentially saving 20 years of control activities in this site. Kruskal-Wallis tests indicated that the number of years of interventions required varied significantly between foci and between villages (Supplementary  Table S5), highlighting the importance of considering local transmission dynamics for planning control activities. The distributions of timelines in each site were also found to statistically vary between the focus and site-specific ABR models (Supplementary Table S6); however, as observed for the breakpoint results, the values reported in Table 5 suggest that these differences may not be large enough to be practically meaningful.  www.nature.com/scientificreports www.nature.com/scientificreports/ An important finding highlighted by these results is that including VC using Abate applications can not only reduce the mean durations required to reach elimination, but also, strikingly, both the within and between-site variance in the predicted elimination timelines compared to the corresponding MDA only strategies. This finding suggests that including VC into MDA programs has the additional advantage that it can significantly overcome the inherent between-site heterogeneity in parasitic transmission dynamics besides improving predictive reliability. Post-hoc multiple comparison testing using the Nemenyi-test for multiple comparisons of (mean) rank of sums indicated that while the predicted timelines to elimination differed significantly between each pair of foci   . Classification tree highlighting model parameters underlying differences between sentinel sites (A) and foci (B). The best-fitting parameter sets from each of the sites from the Itwara, Kashoya Kitomi, and Bwindi foci for the site-specific ABR models were analyzed here. Village and focus-level models were classified according to their fitted parameter values using the rpart package in R. The trees pruned with the complexity parameter set to 0.005 and a tree depth of 6 are shown here. The top five variables by importance were c, H Lin , σ e , K Lin and I c in the case of the village-specific models, whereas these were c, I c , H Lin , K Lin , I cmin , and σ e in the case of models aggregated at the focus level.
Scientific RepoRtS | (2020) 10:4235 | https://doi.org/10.1038/s41598-020-61194-w www.nature.com/scientificreports www.nature.com/scientificreports/ studied here for the MDA-only based strategies (p-values < 0.0001), the mean durations were reduced significantly in the Bwindi sites in comparison to both the Itwara and Kashoya Kitomi sites (p-values < 0.0001) -but not between the latter sites -when vector control was included in the intervention simulations (p-values > 0.50). (Table 5). Again, however, the practical significance of this reduction in intervention durations appeared small. These findings underscore that while transmission factors at the focus level can have significantly differential, albeit practically small, outcomes for MDA only interventions, the inclusion of vector control can overcome and reduce this variability appreciably between foci or strata even when they vary in their baseline endemicity.

Discussion
Our central goal in this paper was to assess the impacts of spatial complexity and heterogeneity for modelling S. neavei-transmitted onchocerciasis and for managing the elimination of this important macroparasitic infection in typical endemic regions. We show that while spatial complexity can play a significant role in the transmission and elimination dynamics of the onchocerciasis transmitted by S. neavei black flies, these effects appear to operate at the scale of the larger transmission focus rather than, as for the mosquito-borne sister filarial disease, LF, at finer village-level point spatial scales 25,30,31 . Such partitioning or constraining of transmission dynamics by higher spatial levels means, as we have demonstrated here, that models can be discovered efficiently for these transmission zones by calibration to data from a few sentinel sites that capture the dominant features of transmission dynamics in such regions (although see caveats below). If applicable, this clearly enhances the efficiency and power of modelling such spatially stratified parasitic systems since it obviates the need to consider the effects of spatial (and temporal) complexity across the full range of spatial/temporal scales down to the lowest levels (ie. cell-by-cell or village point scales) 21,22,46 . We also show that such zonal or stratified partitioning of S. neavei -onchocerciasis transmission dynamics means that although between-site variability will still occur, this will have little practical effects on the dominant dynamics of extinction (elimination breakpoints, timelines to extinction due to interventions) within a focus or between transmission foci exhibiting relatively similar pre-control endemicity or infection patterns (Tables 4 and 5). This indicates that, depending on the intervention option (see below), outcomes of interventions are likely to vary appreciably only between transmission foci or regions that differ substantially in endemicity levels for S. neavei-onchocerciasis (eg. between the mesoendemic Bwindi focus compared to the hyperendemic Itwara and Kashoya Kitomi foci). Such a spatially stratified structure in the dominant transmission Figure 5. Model-estimated age profiles of microfilariae prevalence at baseline compared to observed data in Bwindi sites. Because the baseline prevalence data for these sites were aggregated for the entire population, age-stratified prevalence was estimated for two major onchocerciasis age-infection patterns -plateau (red points) and convex (blue crosses). The model curves generated by the SIR-selected parameter sets according to their likelihood to the estimated mf prevalence profiles are shown in gray. Error bars denote the 95% binomial confidence intervals around the predicted infection prevalence points for a conservative sample size of 100. Because no ABR information was available for this focus, the model-estimated ABR for each site was used to simulate the prevalence data ( Supplementary Fig. S2).

Scientific RepoRtS |
(2020) 10:4235 | https://doi.org/10.1038/s41598-020-61194-w www.nature.com/scientificreports www.nature.com/scientificreports/ dynamics within an endemic zone suggests that management of S. neavei -transmitted onchocerciasis elimination will need to be sensitive to variable parasite transmission and extinction dynamics at the focus level rather than be based on a paradigm of uniformity that ignores this spatial heterogeneity.
The above conclusions are clearly conditioned on the ability of our modelling framework to reliably capture the dominant transmission dynamics of S. neavei -transmitted onchocerciasis in an endemic focus. While our BM-based modelling algorithm combines the advantageous features of mechanistic and statistical approaches to improve the estimation of local models for facilitating forecasts of interventions applied under a variety of field conditions, it is dependent, as for any data-driven predictive system, on the model structure employed, estimation procedure, and on the data used for facilitating model discovery 16,17 . Although our BM framework primarily focused attention on addressing parameter uncertainty with data, we note here firstly that the present model is based on previously established population models of onchocerciasis transmission 7,47 , with appropriate structural extensions made with regard to population-averaged mf uptake and larval development in the S. neavei vector host as well as the operation of different forms of host immunity in populations 48 . Furthermore, we have also secondly included all previously suggested density-dependent functions that are thought to govern onchocerciasis transmission but do not make any a priori assumptions concerning their occurrence using data, instead, to determine the operation of these functions in a site, which allows for a degree of updating for model structures applicable to a particular setting.
By contrast to conventional model calibration practices (which principally focus on adjustment of parameters until discrepancy between model outputs and observed data is minimized), the Bayesian basis of our Monte-Carlo BM-based model estimation technique, as we have demonstrated previously 25,30,31,49 , is also more informative given that it not only updates our knowledge of model parameters (and structure) for a site but also facilitates predictions along with associated uncertainties for modelled output variables 38,50 . In addition, it is also vitally premised on the expectation that obtaining a single, optimal, set of acceptable parameters for describing a complex multi-parameter ecological system is problematic, as parameter equifinality means that often several behavioural sets can fit observations 20,29,38 . Such model equifinality implies that there may well be many combinations of initial conditions, model behaviours and parameter sets that are consistent with the limited observations available about a particular ecological system, and that it is important to capture this complexity through selection of sets of behaviourally acceptable models, such as the present SIR-selected models, for ensuring the making of reliable predictions.
Another key component of our modelling approach is the use of site-specific data from the three transmission foci investigated in order to determine within-and between-focus transmission heterogeneity. A special feature of the site-specific mf prevalence (and ABR) data used in this analysis, however, is that these were collected in sentinel communities in locations close to riverine systems suitable for supporting S. neavei populations to ensure that conditions of maximal transmission are monitored in each focus. Thus, while this selection does not allow study of the full range of transmission heterogeneity and complexity likely to occur within the present onchocerciasis foci, it does allow comparison of parasite transmission intensity and variability occurring within the maximal or the dominant transmission tracts within and between each of the present endemic foci. This implies that conclusions regarding focus-level heterogeneity made in this paper should strictly be considered as alluding to the spatial variability observed for the models and their predictions applicable in these comparable maximal riverine transmission tracts within each of the investigated foci. Such simulations of worst-case settings are, however, in alignment with another implicit practical aim of our modelling goal here, viz. that we are able to provide  The values of mf breakpoints that give rise to a 95% probability of interrupting sustainable parasite transmission once breached (Table 4) represent the first such estimates produced for onchocerciasis. Previous work has largely focused on estimation of threshold vector biting rates 51-53 , although Duerr et al. 51 also investigated mf breakpoint shifts in terms of the application of CDTi. Our work here has systematically extended these previous studies by addressing the values of both the mf breakpoint in the human population as well as the vector threshold biting rate that occur together in a site using data-fitted models. The results demonstrate, for the first time, not only that these critical thresholds governing breakage of S. neavei-borne onchocerciasis transmission are typically lower-valued (between approximately 0.06% to as relatively high as 0.10%) than the WHO suggested global threshold value of 1% mf prevalence in the presently studied sites 45 , but also that they can vary significantly both between sites and between foci. However, an important finding is that the values of these infection breakpoints within sites were found to be strikingly higher at the modelled vector threshold biting rate (TBR) with estimated values universally higher ( > at least 1.13%) than the 1% mf prevalence elimination threshold set by WHO, and, in some settings, even reaching values close to 2% mf prevalence (Table 4). This finding corroborates the results obtained previously both for onchocerciasis 8,51 and for LF 25,[30][31][32] , suggesting that this inverse relationship with the vector biting rate may represent a general feature of filarial infections. It also supports our proposal for the critical importance of including vector control in MDA programs against these diseases 30 , because as vector populations are reduced, infection breakpoints will increase commensurately in value until they are maximal at the TBR threshold applicable in a site 54 . The numerical values of the estimated site-specific mf breakpoints did not also differ statistically between the site-specific ABR versus focus ABR models, further supporting the central finding of this study, viz. that the latter models are able to satisfactorily capture the dynamics of infection in the investigated transmission zones of each of the present foci.
Although we found that values of a majority of the model parameters (up to 19 out of 22) were successfully updated by the site-specific data, highlighting the important role that data-model assimilation methods can play in improving scientific knowledge of complex parasite transmission systems, a significant finding is that only five parameters -all related to exposure -were sensitive to the type (site-specific or focus-level) of ABR used in simulating the present models. Despite this, both the site-specific and their corresponding focus-based models were able to adequately reproduce the age infection profiles in each study site, again emphasizing that many different models may be behaviourally able to fit the data observed for a complex system to an acceptable level, with the behavioural parameter values dispersed widely through the parameter space. While this equifinality reveals that many combinations of initial conditions, parameter sets, and model behaviours are consistent with observations, especially when measurements are few 14,20 the consequences of such equifinality is clear, viz. it will induce uncertainty in inference as well as prediction, which in many cases will be difficult to further reduce imposing a limit to the predictability and thus use of any complex model 16,18,19 . Note here also that a related outcome of such  (Table 4). Vector control applied in these scenarios assumes larviciding was implemented from the start of interventions. The results shown were generated by the site-specific ABR models for Byeya.

Scientific RepoRtS |
(2020) 10:4235 | https://doi.org/10.1038/s41598-020-61194-w www.nature.com/scientificreports www.nature.com/scientificreports/ complexity is that the more complex a model (multi-components and parameters, many functional responses and feedback loops) the harder it will be to refute it 23 , indicating that confirming observations may not demonstrate the veracity of a model but merely supports its probability conditional on errors in model structure, calibration of parameters or other auxiliary conditions, and the period of data used in the evaluation.
By contrast, an intriguing finding is that local S. neavei -borne oncercherciasis transmission adaptation appears to be dependent only on a few biological parameters (Fig. 4). Our results show that primarily these were parameters capturing variations in host immunity, exposure, infection aggregation and excess vector mortality both between sites (Fig. 4A) and foci (Fig. 4B). As we pointed out previously 25 , such dependence on only a few "stiff " parameters for adapting to local transmission conditions may allow a complex, multiparameter, parasitic system to use the majority of its defining parameters to counter changes in the local environment, including to perturbations induced by interventions, potentially making onchocerciasis transmission highly resilient to such shocks. One the one hand, this result underlines the likely difficulty of disrupting S. neavei -borne onchocerciasis transmission easily; however, the dependence of such "stiff " parameters, particularly those related to immunity and exposure, on variations in site or focus-level worm prevalence/density or ABR (Table 3, Fig. 4B) on the other hand means that inducing changes in these locally adapted parameters, eg. by reducing ABR using vector control or increasing immunity either through nutritional means or vaccination, may not only facilitate the disruption of the observed between-site or focus heterogeneity in transmission but also make the local system commensurately susceptible or fragile to perturbations or shocks that take the system far outside its (these) normal constraining conditions (see below) 25 .
Our results from modelling the impact of currently used or proposed MDA-based interventions on the comparative timelines to extinction given spatial variation in mf breakpoints and parasite transmission dynamics between sites and foci has provided further insights into the policy implications of the outcomes of these transmission complexities. First, we show that model uncertainty and equifinality will result in distributions of extinction timelines in a site (Table 5), which although varying significantly between the focus and site-specific ABR models were not large enough to be practically meaningful. This suggests that either model may contain sufficiently behavioural models, and thus both are useful for providing adequate predictions of the impact of applied interventions. This result also implies that the strategy of monitoring S. neavei ABR at the focus rather than at the sentinel village-level as carried out by the UOEP is sufficient for developing models for carrying out simulations of the outcomes of interventions, obviating the need (and expense) for conducting vector monitoring at finer spatial locations. The second insight gained from our intervention modelling study relates to the finding that the predicted durations of control required to eliminate S. neavei-transmitted onchocerciasis did not vary appreciably between sites within each focus (Table 5). They also did not vary markedly between foci that exhibited the occurrence of similar pre-control prevalences in sentinel monitoring sites, such as in the case of the Itwara and Kashaoya Kitomi foci investigated here, with predicted intervention durations showing differential reductions for MDA-only strategies in the case of Bwindi. This outcome signifies that if transmission-comparable strata occurs at the higher focus level, then it may be possible to use simulations from a selection of sentinel sites from such strata to fully characterize endemically-comparable regions within and between foci. Our results specifically show that in such cases even a selection and monitoring/modelling of 5-6 sites from strata along river systems where the highest transmission intensity may be expected is sufficient to capture the worst-case scenario expected with regard to timelines to S. neavei-onchocerciasis extinction both within a given focus as well as between foci. This   www.nature.com/scientificreports www.nature.com/scientificreports/ finding again supports the policy of using information from a few comparable sentinel sites with the highest baseline prevalences along riverine systems that support black fly populations for subsequent intervention monitoring and for making decisions regarding when area-wide elimination of S. neavei -transmitted onchocerciasis may have occurred in endemic foci in Uganda.
Despite the above large-scale spatial effects, it is clear that the durations of control required to achive S. neavei-onchocerciasis elimination were greatly influenced by the type of MDA-based intervention scenario carried out in a site (Table 5). We show that overall the least effective scenario is low coverage annual MDA without VC resulting in requiring variously between 15-31 years to achieve transmission interruption in these sites even at 80% MDA coverage (Table 5), while the most effective strategy is high coverage biannual MDA with VC needing only between 3-7 years of control ( Table 5). The Uganda Onchocerciasis program is beginning to consider switching to the use of biannual MDA currently 55 , and the results here show that this change may bring about major savings in the durations of interventions required for achieving onchocerciasis elimination in the country, with the model predictions showing that up to some 7-17 years could be saved in the present sites with this approach compared to application of annual MDA alone (Table 5). Although the intervention forecasts made here require to be assessed with follow-up data in the present sites for evaluating their plausibility, it is critical to note that the projections depicted in Table 5 lie within the 15-19 years found empirically to be needed for reducing mf prevalence to near elimination levels in the field using annual MDA 7,56-59 . Although not definitive, this correspondence of our predictions with observations provide confidence that the present locality-specific models are capable of yielding empirically realistic results not just for quantifying timelines to S. neavei-onchocerciasis elimination in a site but also for comparing the dynamics of extinction by site and by intervention option.
There are two outcomes that require discussion regarding the present predictions pertinent to the inclusion of VC into anti-Onchocerca MDA programs. The first is that dosing rivers with Abate in conjunction with MDA can significantly bring down the years required to accomplish the elimination of S. neavei -transmitted onchocerciasis. This can reduce the duration of interventions by over half those required by using the corresponding annual or biannual MDA alone, eg. by reducing the number of years of interventions required to 7-12 years by inclusion of this form of vector control compared to 15-31 years for annual MDA alone and just 3-6 years compared to 8-14 years in the case using biannual MDA alone (Table 5). These represent dramatic effects, and we indicate that present onchocerciasis programs, particularly in S. neavei areas, should now begin to urgently consider including Abate-based (or another appropriate vector intervention option, such as vegetation clearance 60 or use of lethal traps 61 ) control in ongoing MDA programs if the prospects of successfully meeting the goal of onchocerciasis elimination over the immediate future are to be achieved in these settings (see also 62 ). The second major effect of including vector control into MDA programs is highlighted by the ability of this joint strategy to reduce the variabity in the timelines to extinction predicted by the present models ( Fig. 6 and Table 5). The results indicate that both within-and, interestingly, between-site variability in such predictions can be reduced compared to when using MDA alone strategies, suggesting that including VC into MDA programs may not only offer a more effective option for eliminating onchocerciasis but also that it may overcome the effects of spatial heterogeneity to the extent that even the impact of between-foci heterogeneity may be reduced with predictions of intervention durations converging similarly across all three foci ( Table 5). As we previously noted, this remarkable outcome of including vector control into MDA programs is brought about not just by a reduction in the robustness of parasite transmission but also by the tipping of transmission dynamics into a more narrowed and more fragile region 25,30,49 , leading to convergence in responses to perturbations irrespective of initial site-differences in transmission regimes.
The traditional approach to dealing with spatial complexity in modelling and managing ecological phenomena has been to consider the need for deriving models for the scale at which the major spatial effects occur and in addressing the relevant spatially adaptive options for achieving desired outcomes over a geographic domain of interest 13,21,22 . Here, our contribution is to demonstrate that in parasitic systems effectively addressing the impact of spatial complexity on modelling infection dynamics and control over a spatial domain will also crucially depend on the scale at which the heterogeneous ecology of transmission occurs. Thus, for S. neavei -transmitted onchocerciasis, given that maximal transmission occurs along tracts and watersheds surrounding the major riverine habitats of black flies, discovery of models even from a few survey sites for this discrete homogeneous transmission tracts will allow characterization of the dominant representative transmission dynamics for such strata. That is, for this disease there is no need to go below such strata to finer cell-by-cell or village levels to characterize the transmission dynamics in a spatial setting. This can enhance modelling efficiency and predictive power given that models need to be built only at the higher aggregate level either by lumping or grouping together parameters from a few sampling sites within each zone or via regularization 63 , in which global functional relationships describing transfer functions that link model parameters with environmental or lanscape characteristics are estimated and used. The implications for onchocerciasis elimination management of the occurrence of spatial effects at higher hierarchical scales, on the other hand, are more straightforward as it simply implies that given the characteristics of the intervention in question (see above) management may need to be adaptive only at the foci level. While this conclusion may argue for enacting an heterogeneity-based adaptive management strategy, eg. tailoring durations of annual MDA in particular by baseline prevalence (or ABR intensity) at the focus level (Table 5), as noted above, an alternative non-adaptive but still heterogeneity-underpinned approach is to apply strategies, such as S. neavei control using Abate applications, that could counter such spatial variability so that a stategy that works equally well everywhere can be deployed. Such a heterogeneity-based strategy may be simpler to implement and manage than adapting control by foci, and we have shown previously how including vector control in MDA programs for vector-borne macroparasitic diseases can also offer a resilient strategy that can sustain the eliminated state by raising the thresholds for re-emergence of infection 54,64 .
This study shows that charaterizing and modelling the impact of spatial heterogeneity and complexity will be critical to gaining a better understanding of the transmission dynamics and management of the area-wide (2020) 10:4235 | https://doi.org/10.1038/s41598-020-61194-w www.nature.com/scientificreports www.nature.com/scientificreports/ elimination of complex vector-borne macroprasitic diseases, such as S. neavei-transmitted onchocerciasis. We also show that modelling frameworks that couple spatial datasets with data-model assimilation methods will be required if the appropriate scale-dependent local models are to be discovered and used to faciliate the making of predictions of the impact of implemented or proposed interventions that take a fuller account of spatial complexity. Our results also indicate that deriving and testing the effectiveness of heterogeneity-based approaches, as opposed to the normal homogeneity-focused command-and-control management model 40,65 , as an alternative paradigm for eliminating S. neavei-transmitted onchocerciasis, and indeed the other parasitic diseases chosen for global intervention, are now in order if we are to effectively achieve the goal of accomplishing area-wide control or elimination of these invariably spatially complex diseases successfully. We suggest that the development of data-driven modelling frameworks that pay careful attention to addressing the impacts of spatial patterns and factors that underlie the observed transmission dynamics of parasitic infections across a spatial domain will be critical to supporting the successful execution of the above task.

Methods
Data. The data used in this study were collected from sentinel field sites in three onchocerciasis endemic foci in western Uganda, viz. Itwara, Kashoya Kitomi, and Bwindi, as part of the monitoring and evaluation activities associated with the Ugandan Onchocerciasis Elimination Program (UOEP). While onchocercal transmission foci can extend to a range of distances from the fly source rivers, the sentinel villages for monitoring the impacts of interventions in Uganda are normally selected among those closest to these rivers to ensure that communities with the highest prevalence are longitudinally monitored to reflect impact in likely worse-case settings within a focus 45 . Note that while this design is efficient for monitoring the effects of an intervention in a focus, it will also reduce the potential extent of between-site infection/transmission heterogeneity among the chosen sentinel sites investigated in this work as compared to villages chosen randomly throughout the spatial extent of each focus.
Altogether, data assembled from sixteen such sites were selected for use in the present analysis based on the variable availability of baseline and follow-up microfilariae (mf) prevalence data, focus-level black fly annual biting rate (ABR) data at baseline, and details of interventions applied in each focus (Table 1 and Supplementary  Table S7). Of these sites, eleven were from the hyperendemic Itwara and Kashoya Kitomi foci. The baseline mf prevalence in these villages ranged from 76.3% to 92.1%, and the black fly species responsible for transmission of Onchocerca volvulus in both foci is Simulium neavei. Note, however, that follow-up monitoring data from 2004 were available for only six of the sites (Table 1).
Initially, control efforts in these foci and across Uganda relied solely on annual rounds of community directed treatment with ivermectin (CDTi), but efforts were then scaled up to include vector control 66 . Successful elimination of the vector from several foci in Uganda using ground applications of temephos (Abate) eventually also shifted the focus of the program to transmission elimination rather than just control 66 (Table 1). Vector control by ground larviciding was implemented here in July 2007 -October 2010 68 . Site-specific MDA coverage observed for the six villages from these foci that have follow-up data are given in Supplementary Table S7. Impacts of vector control were investigated by UOEP by measuring changes in the aggregate values of ABR in each focus. Typically, biting data from human landing catches (HLCs) carried out in 14 catching sites along the relevant river system in a focus during the 4-5 months pertaining to the major transmission season (at frequencies of 2 days per week) were used to make these focus-level ABR calculations using the calculation methods described in 69 .
In addition, five villages from Bwindi, most of which are hypo-and mesoendemic, were also included in this study to consider a more heterogeneous sample of sites in the analyses (Table 1). For these sites, baseline mf infection data are aggregated at the community level and there was no biting rate information available (Table 1), requiring us to use model fits to the mf prevalence data to estimate the corresponding ABR information. Transmission here is also mediated by S. neavei, and both MDA and vector control interventions were carried out in this focus (Supplementary Table S7). the onchocerciasis transmission model. The onchocerciasis transmission model used here is an immigration-death ecological model describing population level parasite transmission dynamics in both the human and black fly hosts. The general structure of the model follows that incorporated in previous deterministic population models of onchocerciasis transmission and control dynamics 47,[70][71][72][73][74][75] , although here we develop and include functions describing the population-averaged mf uptake and larval development in the vector as well as the operation of different forms of host immunity in populations, while model implementation is carried out within a Bayesian Monte Carlo data-model assimilation framework that facilitates the simultaneous incorporation of information regarding local transmission parameters from data and assessment of the impact of uncertainty in the values of these parameters on model outputs 31,38,49,50,[76][77][78] .
Technically, the present model is based on a hybrid coupled partial and ordinary differential equation system, where the population-level age-structured pre-patent (P) and patent (W) worm burdens, microfilarial counts (M) (mf/mg of skin), and acquired immunity level (I) in the human host are dynamically modelled by a set of partial differential equations over time (t) and age (a), whereas the dynamics of infective stage L3 larvae (L) in vector hosts (simuliid fly populations) are modelled by an ordinary differential equation over time. To reflect the significantly faster time-scale of the infection dynamics in the vector hosts, we, however, make the simplifying (2020) 10:4235 | https://doi.org/10.1038/s41598-020-61194-w www.nature.com/scientificreports www.nature.com/scientificreports/ assumption that the density of infective stage larvae in the vector population reaches a dynamic equilibrium rapidly (L*) 70,75 . The governing equations of this model are:  Apart from multiple interacting components, complexity in the model is also governed by the operation of several density-dependent functional forms (denoted by the terms F x ) that moderate the transitions or development rates of different stages in the parasite's life cycle. These regulatory processes underpin the operations of two forms of host immunity, namely acquired immunity to incoming larvae (F 1 [I(a,t)]) and host immunosuppression by existing infection (F 2 [W T (a,t)]), mf production by patent worms (F 3 [W(a,t)]), and the development of infective stage larvae in the vector (F 4 [M(a,t)]). The larval death rate (σ L ) and excess vector mortality (σ e ) due to infection 47 are also both considered in F 4 . Note that some functions are dependent on the total worm load where W T = W(a,t) + P(a,t), while others depend on larval states (L*) and host immunity (I). To capture the effects of worm patency, we consider that at any given time t, human individuals of age less than or equal to pre-patency period, , τ will have no adult worms or microfilariae, ie. W(a,t) = M(a,t) = 0 for a ≤ τ, and the rate at which pre-patent worms survive to become adult worms in these individuals at a > τ is given by ζ = µ τ − e W . The establishment rate of larvae in the human host, represented as Φ, is moderated by effects of acquired immunity (as modelled by function, F 1 [I(a,t)]) and/or immune tolerance (modelled by function, F 2 [W T (a,t)]) 73,74 . Given that previous modelling studies have underscored the potential for the involvement of either or both acquired immunity and immunosuppression in filarial infections 47,73,75 , we include both these types of immunity in our model; however, we do not make any a priori assumptions concerning their occurrence in a particular study site but instead use the mf prevalence data to determine the operation of each or both in a site (see discussion of data-assimilation using the Bayesian Melding framework below). The mathematical representation of adult Onchocerca volvulus worm mortality has recently been a topic of discussion 7 ; here, we use a constant mortality rate following the example of Basañez and Boussinesq 47 and Filipe et al. 75 . The vector biting rate (λ, a parameter in the function Φ) in this model considers the number of bites per black fly to be equal to the human blood index (H b ) divided by the gonotrophic cycle (g). All model parameters and their descriptions are given in Supplementary  Table S8. The respective density-dependent functions expected to modify or regulate transmission are given in Supplementary Table S9. Species-specific microfilariae uptake and L3 larval development. There exists a non-linear relationship between the uptake of mf by the black fly and the resulting L3-stage larval load 79 . This relationship will be dependent upon whether or not the relevant black fly species has developed cibarial armature, which we capture in the establishment rate term f[M(a,t)]. Following facilitation and limitation model formulations we developed previously 32 , the population-level mf uptake and larval establishment rate function for black flies with cibarial armature can be written as: whereas for black flies without developed cibarial armature we used the form: In the above, k k k M Lin 0 = + is the shape parameter of the negative binomial distribution on the mf uptake whereas r and κ are respectively the rate of initial increase and the maximum level of L3 larvae. Parameter priors for r and κ were derived by fitting these functions to human and vector infection data from 80,81 . In this work, the relevant vector species, S. neavei, does not have well-developed cibarial armature, so we use the latter of the two formulations (Supplementary Table S9). complex model discovery using data assimilation. Model-data fusion (also referred to as 'data assimilation' or 'inverse modelling') is a method whereby observations are used to optimize a model and quantify its uncertainty for a given setting [34][35][36] . Recent work has shown how such approaches via their ability to combine models with local data in a statistically rigorous manner can incorporate the effects of local initial conditions (2020) 10:4235 | https://doi.org/10.1038/s41598-020-61194-w www.nature.com/scientificreports www.nature.com/scientificreports/ and input drivers to constrain model parameters and quantify model error 30,82 , thereby improving the discovery of behaviourally acceptable complex dynamical models applicable to a given setting 20 . Here, we build on one such strategy that facilitates the integration of field observations on onchocerciasis baseline mf prevalence and ABR in sentinel communities with simulation model outputs to estimate the values (and uncertainty) in the model parameters applicable for reproducing local parasite transmission dynamics. We then apply the locally calibrated model to undertake: (1) quantification of the infection endpoints or elimination thresholds applicable in each setting; and (2) based on model predicted mf prevalence trajectories, examine the effectiveness of various MDA-based intervention programs for breaking locality-specific parasite transmission. We used the model calibration and simulation framework founded on the Bayesian Melding (BM) algorithm to solve this local model discovery and prediction problem 38,50,76,77,83,84 , as follows.
The BM approach may be succinctly described as a procedure whereby two priors on a model output are compared and "melded" together in order to obtain the posterior parameter space in which the model may reliably explain the underlying natural system dynamics 84,85 . One of the priors on model output is the set of observed data; for example, in our case the survey data on onchocerciasis baseline mf prevalence collected from each endemic community before the start of a mass drug treatment program. The other output prior is the model-predicted values of the state variables, such as W or M, generated by values of initially-set parameters. Thus, the BM procedure is a method for reconciling several sources of prior information (on both input parameters and on model outcomes relative to data) to constrain the acceptable solution space of the model parameters. In the present implementation, we initially assigned uniform prior distributions for each of the model parameters to reflect our initial incomplete knowledge regarding their local values. The black fly biting rate servers as the primary driving variable, and is normally fixed to the values of the monthly biting rate (MBR) measured at baseline, although we can use the best-fitting models to age-infection data to quantify this variable in cases, such as for the sites in Bwindi, where these data are missing (see below and 4 ). For assessing the reasonableness of model outputs to data, a binomial likelihood function was constructed to capture the distribution of the observed age-mf prevalence data, as follows: where θ represents a set of the model parameters, termed as parameter vector; M g denotes the total mf positive samples out of the total S g blood samples collected from people in the g th age-group, and the term P g is the modelled mf prevalence in the same age-group. The BM method is essentially an iterative approach to inference, whereby our beliefs regarding a suitable model are updated based on the likelihood of new data, generating a posterior distribution and predictive interval for the observed data. The procedure we used is summarized as follows. There are 27 parameters in the model (Supplementary Table S8 , the model was simulated, and the corresponding distribution of model parameters ( ) π θ obtained. We then used the sampling-importance-resampling (SIR) algorithm to resample, with replacement, from the above set of N parameter vectors with the probability of acceptance of each resample j l 1,2, , θ = … probable to its relative weight Ω j , which is proportional to the corresponding likelihood φ L( ) j for the data, i.e. φ φ Ω = ∑ L L ( )/ ( ) j j i . A typical value of l for the results presented in this paper was around 500, and these posterior parameter sets are then used to generate distributions of variables of interest from the model (e.g. locality-specific age-prevalence curves, worm or mf breakpoints, and post-treatment infection trajectories), including measures of their uncertainties 31,49 . calculation of worm breakpoints. Briefly, we applied a numerical stability analysis approach based on varying initial values of L * (see details of the procedure provided in 32,49 ), to each of the SIR-selected model vectors in order to calculate the distribution of mf prevalence breakpoints and threshold biting rates (TBR) expected in each study community. There are two essential steps in this approach. In the first step, we progressively decrease V H (the average number of black flies per human) from its original value to a threshold value below which the model always converges to zero mf prevalence, regardless of the values of the endemic infective larval density L * . The product of λ and this newly found V H value is termed as the threshold biting rate (TBR). Once the threshold biting rate is discovered, the model at TBR can settle to either a zero (trivial attractor) or non-zero mf prevalence depending on the starting value of L * . Therefore, in the next step, while keeping all the model parameters unchanged, including the new V H , and by starting with a very low value of L* and progressively increasing it in very small step-sizes we estimate the minimum L* below which the model predicts zero mf prevalence and above which the system progresses to a positive endemic infection state. The corresponding mf prevalence at this new L * value is termed as the worm/mf breakpoint 32 . Note that a similar procedure can also be used to estimate the mf prevalence breakpoints at the undisturbed ABR values in a site. In this case, the estimated mf breakpoint prevalences will always be lower than the corresponding maximal values at TBR 25,30 . We thus use the mf breakpoint prevalences identified at ABR for simulating parasite elimination when MDA alone is used, while we employ the breakpoint values obtained at TBR when predicting transmission interruption as a result of including vector control with MDA 25,30 . The distribution of mf breakpoints thus quantified at either ABR or TBR values can finally be described by an empirical inverse cumulative density function 25,30 . We used this density function along with exceedance calculations to quantify the values of mf breakpoint prevalence thresholds reflecting a 95% elimination probability for use as the infection elimination target in this work. We used both the observed focus-level ABRs well as the estimated site-specific ABRs as required (see below) to estimate the site-specific worm/mf breakpoint and TBR values used in this analysis.
Estimating site-specific ABRs. Previous studies have shown that the key environmental driver that governs pattern-process relationships in the case of vector-borne macroparasitic infections, and hence that primarily constrains a model's parameter space so that the effects of local transmission conditions are captured, is the vector biting rate obtaining in a community 4,7,25,30 . However, in this study, the baseline data for each study village included information on this driving variable only at the higher aggregate focus level as described above. Although the use of such aggregated ABR to obtain the best-fit models within each sentinel village in a focus may offer a second-best option in the case of lack of information on vector abundance at the local level, it is apparent that this approach could introduce significant bias if the sentinel villages differed markedly in their local ABR values. We examined this possibility by using the models fitted to the mf prevalence data in each site in order to 1) estimate the corresponding site-specific ABRs, and 2) following this, undertaking a comparison of the values obtained against each measured focus-level ABR. To perform this exercise, each of the randomly sampled parameter vectors was used to obtain plausible ABR values by running the model under a standard root finding algorithm for which the model-generated overall mf prevalence matched the observed baseline community mf prevalence within a tolerance limit of 0.001 or lower. This root-finding algorithm was implemented in Matlab using its built-in fzero function 4 . The same procedure was also used to estimate the missing ABR data for the Bwindi sites.
Modeling the effects of mass drug administration and vector control. The impact of MDA was modelled by assuming that anti-filarial treatment with the currently used ivermectin drug regimens acts by killing certain fractions of the populations of adult worms and mf instantly after drug administration, as well as reducing the fertility of surviving female adult worms [86][87][88] . We denote the fractions killed as ω for adult worms, and ε for mf. The population sizes of worms and microfilariae are calculated after drug treatment by modifying the populations of each parasite stage obtained immediately prior to the treatment by: In the above, dt represents a short time-period since the time-point T MDA i when the i th MDA was administered. During this short time-interval, a given proportion of pre-patent/adult worms and mf (as specified by values of the drug efficacy rates for these life stages, ω and , ε respectively) are instantaneously killed. The parameter C is the population-level drug coverage in this formulation (note that the population coverage is applied to individuals ≥ 5 years old as children < 5 years old are ineligible for ivermectin treatment). Apart from instantaneous killing of adult worms and mf, ivermectin is also thought to sterilize or suppress the production of mf by worms surviving each MDA 89  where α α δ ′ = − C (1 ) reduc reflects the suppressed fecundity (over a period of T P months since the i th MDA) of adult female worms that survive the administration of ivermectin at each MDA.
We consider that integrating vector control (VC) using Abate with MDA will primarily add to the effects of MDA on worms by causing a reduction in the baseline black fly biting density over time. The decline in the black fly biting density ( V H ) is modeled as an exponential decay process, viz.
where A 0 is the expected decline in the biting density in the study areas undergoing Abate-based VC (Supplementary Fig. S3). The VC coverage parameter (C VC ) takes the value of unity for the case when all black fly breeding habitats are treated with larvicides. The values of instant worm/mf killing and suppression of mf production by surviving worms due to ivermectin, and the vector biting decay rate estimated from field data portraying observed declines in fly biting after larvicide application (A 0 ) are given in Supplementary Table S10.
The first MDA round is implemented as reducing the predicted baseline parasitic infection intensities (i.e. densities of worms and microfilariae) at the time of treatment. Infection dynamics are then tracked over time using a monthly time step, with the remaining MDAs implemented according to an annual and/or biannual treatment schedule. The impact of VC is modelled as reducing the biting rate starting from the first month of larvicide treatment. We simulated the effects of interventions under four hypothetical scenarios (annual and biannual MDAs without/with VC) with drug coverages ranging from 40% to 100%. Predictions arising from MDA provided at 80% coverage served to reflect the optimal scenario in these analyses. In addition, we also simulated the outcomes of the observed interventions carried out in the six sites where follow-up mf prevalence data were available. In this analysis, for years in which the site-specific MDA coverage was not available, we applied the  Table S7). In the absence of any information, as was the case for 2011-2013 in Kashoya Kitomi, we assumed the coverage achieved in the previous year was maintained.

Scientific RepoRtS |
(2020) 10:4235 | https://doi.org/10.1038/s41598-020-61194-w www.nature.com/scientificreports www.nature.com/scientificreports/ Statistical tests for analysing data and model-estimated quantities. A pass/fail scoring approach was used to test for the goodness-of-fit of the SIR-selected models to the baseline mf age-prevalence data 90 . A model was considered behavioral if, for at least all but two age groups, the predicted prevalence for a given age group was within the 95% binomial confidence interval bounds for the observed data. The proportion of SIR model fits that fell with these bounds were scored to give an index of good-fit. Differences in the prior and posterior distributions of model parameters were tested by the 2-sample Kolmogorov-Smirnov (KS) test using the kstest2 function in Matlab. Classification trees for identifying important variables underlying between site heterogeneities were built and analyzed using the R package rpart. The binomial generalized additive model for evaluating differences in baseline data between either sentinel villages or foci were carried out using the mgcv package in R. Kruskal-Wallis tests for difference in model-estimated quantities (e.g. mf breakpoints, timelines to elimination) between locations and/or models were carried out using the Matlab function kruskalwallis.
Availability of materials and data. All data analysed during this study are included in this published article and its supplementary information files. Code for running the onchocerciasis model is available at https:// github.com/EdwinMichaelLab/OnchoModel_Uganda.git.