Socioeconomic and environmental determinants of foot and mouth disease incidence: an ecological, cross-sectional study across Iran using spatial modeling

Foot-and-mouth disease (FMD) is a highly contagious animal disease caused by a ribonucleic acid (RNA) virus, with significant economic costs and uneven distribution across Asia, Africa, and South America. While spatial analysis and modeling of FMD are still in their early stages, this research aimed to identify socio-environmental determinants of FMD incidence in Iran at the provincial level by studying 135 outbreaks reported between March 21, 2017, and March 21, 2018. We obtained 46 potential socio-environmental determinants and selected four variables, including percentage of population, precipitation in January, percentage of sheep, and percentage of goats, to be used in spatial regression models to estimate variation in spatial heterogeneity. In our analysis, we employed global models, namely ordinary least squares (OLS), spatial error model (SEM), and spatial lag model (SLM), as well as local models, including geographically weighted regression (GWR) and multiscale geographically weighted regression (MGWR). The MGWR model yielded the highest adjusted R² of 90%, outperforming the other local and global models. Using local models to map the effects of environmental determinants (such as the percentage of sheep and precipitation) on the spatial variability of FMD incidence provides decision-makers with helpful information for targeted interventions. Our findings advocate for multiscale and multidisciplinary policies to reduce FMD incidence.

maximum temperatures for the provinces from March 2017 to March 2018, provided by the Iranian Meteorological Organization 24. These data were initially in tabular format for a list of weather stations across Iran. To obtain temperature values at the provincial level, we employed the inverse distance weighted (IDW) method 25. We first generated interpolated temperature rasters covering the entire country using IDW. Subsequently, we calculated the average of the minimum and maximum temperatures for each month per province from these interpolated rasters. Fifth, we acquired monthly precipitation data 26 from the Climatic Research Unit Time-Series, downscaled to a spatial resolution of 2.5 arc-minutes (~21 km²) using WorldClim 2.1 27. This dataset consists of 12 raster files spanning from March 2017 to March 2018, with precipitation values provided in millimeters. To derive the monthly average precipitation at the provincial level, we aggregated the raster data from March 2017 to March 2018 for each province.
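The IDW step described above can be illustrated with a minimal sketch. It assumes planar coordinates, a distance-decay power of 2, and a province represented by a list of grid-cell centres; the function names, the power value, and the station data are illustrative assumptions, not the settings used in the study.

```python
import math

def idw(x, y, stations, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations: list of (sx, sy, value) tuples in planar coordinates.
    power: distance-decay exponent (2 is assumed here for illustration).
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly on a station: use its value directly
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def province_mean(cells, stations, power=2.0):
    """Average the interpolated surface over the cell centres of one province."""
    values = [idw(cx, cy, stations, power) for cx, cy in cells]
    return sum(values) / len(values)

# Hypothetical stations: (x, y, monthly maximum temperature in degrees C)
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint = idw(1.0, 0.0, stations)  # equidistant from both stations
```

At a point equidistant from two stations the estimate is their plain average, and the estimate moves toward a station's value as the point approaches it.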
Sixth, the National Geospatial-Intelligence Agency 28 supplied the Digital Elevation Model (DEM) with a resolution of 30 m. The DEM was used to calculate the average elevation per province. Lastly, NASA Earth Observations 29 provided the monthly Normalized Difference Vegetation Index (NDVI). The average NDVI value per province was derived for each of the 12 months from March 2017 to March 2018. NDVI values range from −1 to 1, with values closer to 1 indicating denser vegetation. They were rescaled to a range of 0-255 and stored as unsigned 8-bit data.
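The rescaling of NDVI to 8-bit values can be sketched as follows. This assumes a simple linear mapping of [−1, 1] onto 0-255; the exact scaling used by the data provider is not stated in the text, so both function names and the mapping itself are assumptions.

```python
def ndvi_to_byte(ndvi):
    """Map NDVI in [-1, 1] linearly onto the integer range 0..255 (assumed mapping)."""
    if not -1.0 <= ndvi <= 1.0:
        raise ValueError("NDVI must lie in [-1, 1]")
    return round((ndvi + 1.0) / 2.0 * 255.0)

def byte_to_ndvi(value):
    """Invert the mapping; the recovered NDVI is accurate to about 1/255."""
    return value / 255.0 * 2.0 - 1.0
```

Storing the index this way trades a small quantization error for a fourfold or greater reduction in storage compared with floating-point rasters.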
Descriptive and exploratory analysis. First, a Pearson correlation matrix was used to explore the correlation between all 46 explanatory variables and FMD incidence (the dependent variable). Explanatory variables with absolute correlation coefficients below 0.3 (poor correlation 30) were removed from the modeling process (see Supplemental Fig. 1). Then, the variance inflation factor (VIF) was used to detect and remove explanatory variables with high multicollinearity (VIF greater than 5) 31. For comparability, all global and local methods were implemented with the same selected variables. We computed the global Moran's I statistic to assess the spatial pattern of FMD incidence using ArcMap (version 10.8).
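The correlation screening step can be sketched in a few lines. The variable names and toy values below are hypothetical and only illustrate the 0.3 cut-off on the absolute Pearson coefficient; they are not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def screen_by_correlation(candidates, incidence, threshold=0.3):
    """Keep variables whose |r| with incidence is at least `threshold`."""
    kept = {}
    for name, values in candidates.items():
        r = pearson_r(values, incidence)
        if abs(r) >= threshold:
            kept[name] = r
    return kept

# Hypothetical toy data for five provinces
incidence = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "pct_sheep": [2.0, 4.1, 5.9, 8.2, 9.8],  # nearly linear in incidence
    "elevation": [3.0, 1.0, 4.0, 1.0, 3.0],  # uncorrelated with incidence
}
kept = screen_by_correlation(candidates, incidence)
```

Only the variables that pass this screen would then be checked for multicollinearity with the VIF.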
Next, global models were fitted with the Python Spatial Analysis Library (PySAL) in a Python environment (version 3.7.11), using first-order Queen contiguity to calculate the weight matrix 32. Then, local models were fitted with MGWR (version 2.2) using a fixed Gaussian kernel, with the bandwidth selected by minimizing the corrected Akaike's Information Criterion (AIC) 33. Finally, adjusted R² and AIC were used to examine the performance of the models.
Global models. Three global regression models were applied in this study: ordinary least squares (OLS), the spatial lag model (SLM), and the spatial error model (SEM). OLS is a linear aspatial regression method that estimates the dependent variable (incidence of FMD) from a group of independent variables 34. SLM extends OLS 35 by considering the effect of a spatial unit on adjacent units in the region, as shown in Eq. (1) 36,37. In the SLM, ρ is the spatial autoregressive coefficient, and w is the spatial weight matrix representing distance associations between the centroids of provinces. The spatial lag term, which captures the combined effect of adjacent units, enters the model as an additional independent variable 36. In the SEM, e_i is the overall error term that accounts for spatial autocorrelation. It splits into two components: first, the spatial component of the error term (w_i ξ_i), and second, the random error (e_i) 38.
Local models. Geographically weighted regression (GWR) and multiscale geographically weighted regression (MGWR) are the two local spatial regression models used in this study. GWR models spatially varying processes and estimates the dependent variable (FMD incidence) from a group of independent variables measured at each location 39. In the GWR specification (Eq. 3), y_i(u) is the dependent variable, in this case FMD incidence. The terms x_ji are the independent variables, for example, the number of sheep or precipitation at location i, and m is the number of independent variables. β_0i is the parameter that describes the relationship around location u and is specific to that location. In the estimator β̂(u), W(u) is the weight matrix that accounts for the effect of neighboring points relative to point u, X^T W(u) X is the geographically weighted variance-covariance matrix, and y is the vector of values of the dependent variable (FMD incidence in this research). The leading diagonal of W(u) consists of the geographical weights, and its off-diagonal elements are 0. The weights are computed using the Gaussian kernel 39, in which w_i(u) is the geographical weight of the ith observation (centroid of a province) relative to location u, d_i(u) is the Euclidean distance between the ith observation and location u, and h is a parameter called the bandwidth 39.
Although GWR is a significant enhancement over the SLM and SEM regression models, it assumes that the scales of the relationships between the dependent and explanatory variables are constant. MGWR relaxes this assumption 33,40: in β_bwj, bwj is the bandwidth used for calibration of the jth relationship. In practice, MGWR can be calibrated using backfitting algorithms, and the model can be reformulated as an additive model 40, in which β_bwj X_ij is replaced by f_ij, the jth additive term, a function applied to the jth independent variable of the ith province. Consequently, the MGWR model uses a different bandwidth for each independent variable. These varying bandwidths capture differences in spatial scale, allowing the model to represent spatial heterogeneity while accounting for the influence of scale on spatial processes 33,40.
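For reference, the models described in this subsection are commonly written in the forms below. This is a sketch based on standard textbook formulations matched to the symbols defined in the text; the coefficient λ on the spatial error term, the symbol ε_i for the random error, the factor 1/2 in the Gaussian kernel, and the coordinates (u_i, v_i) in MGWR are assumptions rather than the authors' own notation.

```latex
\begin{align*}
\text{SLM:}\quad  & y_i = \beta_0 + \sum_{j=1}^{m} \beta_j x_{ji} + \rho\, w_i y + \varepsilon_i \\
\text{SEM:}\quad  & y_i = \beta_0 + \sum_{j=1}^{m} \beta_j x_{ji} + \lambda\, w_i \xi_i + \varepsilon_i \\
\text{GWR:}\quad  & y_i(u) = \beta_{0i}(u) + \sum_{j=1}^{m} \beta_{ji}(u)\, x_{ji} + \varepsilon_i \\
                  & \hat{\beta}(u) = \left( X^{T} W(u) X \right)^{-1} X^{T} W(u)\, y \\
                  & w_i(u) = \exp\!\left( -\tfrac{1}{2} \left( \frac{d_i(u)}{h} \right)^{2} \right) \\
\text{MGWR:}\quad & y_i = \sum_{j=0}^{m} \beta_{bwj}(u_i, v_i)\, X_{ij} + \varepsilon_i \\
                  & y_i = \sum_{j=0}^{m} f_{ij} + \varepsilon_i
\end{align*}
```

Setting ρ = 0 or λ = 0 recovers OLS, and forcing every bwj to the same value h recovers GWR, which is why the models can be compared on the same set of variables.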
In this study, we conducted several statistical tests to assess the validity and robustness of our regression models. First, we used the Jarque-Bera test to examine the normality of the residuals from the OLS regression 41. The null hypothesis is that the residuals follow a normal distribution; a p value < 0.05 rejects it in favor of the alternative hypothesis of departure from normality, whereas a p value > 0.05 provides insufficient evidence of non-normality. Second, we used Moran's I test to assess the presence of spatial patterns in the distribution of FMD incidence 42. Moran's I indicates whether there is spatial autocorrelation in FMD incidence, that is, whether neighboring areas exhibit similar levels of incidence. Lastly, we used the condition number (CN) to measure multicollinearity in the GWR and MGWR models 43. The condition number quantifies the degree of multicollinearity, indicating the potential presence of highly correlated independent variables, which can affect the stability and interpretability of regression models.
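The Jarque-Bera decision rule can be made concrete with a short sketch. It assumes the usual asymptotic chi-square reference distribution with 2 degrees of freedom, whose upper-tail probability is exp(−JB/2); the function names are illustrative and this is not the routine used in the study.

```python
import math

def jb_p_value(jb):
    """Asymptotic p value: chi-square with 2 df has survival function exp(-x/2)."""
    return math.exp(-jb / 2.0)

def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p value) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, jb_p_value(jb)
```

Under this approximation, the statistic of 5.804 reported in the Results corresponds to a p value of about 0.055, in line with the reported 0.0548 and just above the 0.05 threshold, so normality is not rejected.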

Results
Initially, 46 candidate independent variables were subjected to Pearson correlation analysis, which identified four variables with absolute correlations greater than 0.3. Specifically, these variables were the percentage of the population, precipitation in January, and the percentages of sheep and goats (see Supplemental Fig. 1). Subsequently, the ordinary least squares (OLS) model was applied to these four selected variables to assess their variance inflation factors (VIFs). As shown in Table 2, all four variables exhibited VIF values below 5, indicating no significant multicollinearity issues within the OLS model. Based on these findings, the four selected variables were deemed suitable for inclusion in both the global and local spatial models examined in the subsequent analysis.
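The VIF check can be sketched with the identity that the VIF of the jth predictor equals the jth diagonal element of the inverse of the predictors' correlation matrix. The implementation below is a dependency-free illustration with hypothetical function names, not the routine used in the study, and it fails with a division by zero if two predictors are perfectly collinear.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    devs = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(d * d for d in dev)) for dev in devs]
    k = len(columns)
    return [[sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for row in range(k):
            if row != col:
                factor = aug[row][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]
```

For two predictors with correlation r this reduces to 1 / (1 − r²), so a VIF of 5 corresponds to a pairwise correlation of roughly 0.89.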
The OLS model yielded an AIC value of 208.396. Furthermore, the Jarque-Bera statistic (value = 5.804, p = 0.0548) indicated insufficient evidence to conclude that the OLS regression residuals deviate from a normal distribution. Based on Supplemental Table 2, Moran's I test (I = 0.00078, z = 0.51011, p = 0.60997) suggests that the pattern of FMD incidence is not significantly different from random. This indicates that there is no compelling evidence of spatial clustering or dispersion in the distribution of FMD incidence.
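To show what the Moran's I statistic measures, the following minimal sketch computes the global index from a list of values and a spatial weights matrix. The study computed the statistic in ArcMap; the function name, the binary neighbor weights, and the four-unit example here are illustrative assumptions.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` and a weights matrix given as a list of lists."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Hypothetical chain of four areas, each adjacent only to its direct neighbors
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1.0, 2.0, 3.0, 4.0], chain)          # similar neighbors
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)   # dissimilar neighbors
```

Positive values indicate that neighboring areas have similar values, negative values indicate dissimilar neighbors, and values near the null expectation of −1/(n − 1), as reported above, indicate no detectable spatial pattern.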
Table 2. Variance inflation factor (VIF) for the final independent variables.

Variable                        VIF
The percentage of population    1.14318
The percentage of sheep         1.17634
The percentage of goats         1.25663
Precipitation in January        1.17050

The p values associated with the selected independent variables in Supplemental Tables 3 and 4 for the global models (OLS and SLM) indicate that the percentage of sheep and the percentage of goats have p values below 0.05. This suggests that these two variables are statistically significant among the independent variables and positively associated with FMD incidence per province.
Based on the results presented in Table 3, the OLS model had the lowest adjusted R² (0.43) and thus performed the poorest among the global and local models. When spatial dependence was accounted for, the SEM and SLM models improved on OLS, with adjusted R² values of 0.52 and 0.51, respectively. To analyze local spatial differences, GWR and MGWR were used. The local models exhibited substantially higher adjusted R² than the global models. Specifically, the adjusted R² for GWR was 0.70, a considerable increase in explanatory power. MGWR outperformed all other models, achieving the highest adjusted R² (0.90) and the lowest AIC (27.499). These results indicate that MGWR is the most effective model employed in this study, explaining 90% of the total variation in FMD incidence. Additionally, Moran's I tests on the residuals of both GWR (I = −0.019827, z = 0.188240, p = 0.850688) and MGWR (I = −0.024190, z = 0.129640, p = 0.896851) were not statistically significant, suggesting a lack of residual spatial autocorrelation, in line with the model assumption.
The results of the MGWR model are presented in Supplemental Table 5. The optimal bandwidths for the independent variables range from 169.740 to 3618.130, indicating that these variables vary on different spatial scales. Specifically, the percentage of goats and the percentage of sheep operate at a smaller spatial scale than the percentage of the population and precipitation in January. In contrast, the GWR model uses a single fixed bandwidth of 485.560 for all variables, which does not account for the varying spatial scales of the predictors. The spatial variability of the local coefficient values shows that these predictors primarily have a local rather than a global impact on FMD incidence.
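The role of the bandwidth can be illustrated with a minimal sketch of a Gaussian-kernel local regression with a single predictor. It assumes the kernel form exp(−0.5 (d/h)²) and uses hypothetical coordinates and values; it is not the mgwr implementation and does not perform the backfitting that MGWR uses to choose one bandwidth per variable.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight for an observation at `distance` from the focal point."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_fit(u, coords, x, y, bandwidth):
    """Weighted least squares (intercept, slope) at location u for one predictor."""
    w = [gaussian_weight(math.hypot(u[0] - cx, u[1] - cy), bandwidth)
         for cx, cy in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: a western cluster where y = x and an eastern cluster where y = 3x
coords = [(0, 0), (1, 0), (2, 0), (100, 0), (101, 0), (102, 0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]

west = local_fit((1, 0), coords, x, y, bandwidth=5.0)       # slope close to 1
east = local_fit((101, 0), coords, x, y, bandwidth=5.0)     # slope close to 3
pooled = local_fit((1, 0), coords, x, y, bandwidth=1e6)     # slope close to 2
```

With a small bandwidth the fit recovers the distinct local relationships, while a very large bandwidth weights all observations almost equally and approaches the single global slope, which is the behavior the range of MGWR bandwidths above reflects.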
Figures 4 and 5 depict the spatial variation of the variables based on the GWR and MGWR models, respectively. In these figures, p values less than 0.05 indicate statistical significance in a province, and positive coefficients indicate a positive association between a variable and FMD incidence. Specifically, Fig. 4a represents the intercept for the GWR model, while Fig. 5a represents the intercept for the MGWR model. Figure 4b highlights a statistically significant and positive association between the percentage of the population and FMD incidence in southeastern provinces, including Sistan and Baluchestan and Hormozgan. This suggests that a higher population percentage in these regions is associated with an increased risk of FMD.
Figure 4c illustrates a statistically significant and positive association between the percentage of sheep and FMD incidence throughout the country, excluding the western provinces. This suggests that a higher percentage of sheep in most regions of the country is associated with an increased risk of FMD. Figure 4d displays a statistically significant and positive association between the percentage of goats and FMD incidence throughout the country, except in the eastern and northwestern provinces. This indicates that a higher percentage of goats in most regions of the country is associated with an increased risk of FMD. In contrast, Fig. 4e reveals a statistically insignificant association between precipitation in January and FMD incidence, suggesting that precipitation in January does not significantly contribute to the variation in FMD incidence across the country. For the MGWR model, Fig. 5b demonstrates a statistically significant and positive association between the percentage of the population and FMD incidence in specific provinces, including Bushehr, Hormozgan, Fars, Yazd, North Khorasan, South Khorasan, and Razavi Khorasan. However, the percentage of the population is not a statistically significant variable in other provinces. Lastly, Fig. 5c-e indicate that the percentage of sheep, the percentage of goats, and precipitation in January are not statistically significant variables in Iran, according to the MGWR model.
Figure 6 shows the local R² of the GWR and MGWR models used in this study. In both models, several provinces in the northeastern parts had high local R², which means the models perform better in these areas. In the northwest, for example in West Azerbaijan and East Azerbaijan provinces, the local R² values indicate inadequate model performance. Supplemental Fig. 3 compares the condition numbers (CN) in the GWR and MGWR models. The condition number reflects the degree of collinearity among the explanatory variables in the model. In the GWR model, some provinces in eastern Iran exhibited high condition numbers (CN > 3.47), indicating a high degree of collinearity among the explanatory variables in those provinces. However, the MGWR model showed lower condition numbers than the GWR model, suggesting reduced collinearity among the variables. This indicates that the MGWR model provides a more reliable and stable estimation of the relationships between the variables in those provinces.

Discussion
In this study, we employed spatial modeling methods to identify the key factors contributing to FMD incidence at the province level in Iran. Out of the 46 potential variables considered, we selected four variables representing different thematic categories, including environmental, socioeconomic, demographic, and topographic factors. These variables are the percentage of the population, precipitation in January, percentage of sheep, and percentage of goats. We aimed to capture the essential determinants of FMD incidence in Iran by focusing on these variables. Our approach aligns with recent studies that have utilized spatial analysis techniques to investigate FMD patterns and drivers 14,44,45. Dion and Lambin 46 examined the transmission risk scenarios of FMD in southern Africa, considering climatic, social, and landscape changes, which aligns with our research. Furthermore, another study investigated the impacts of climate change, specifically abrupt temperature changes, on the risk of FMD in elephants across Asia and Africa, showcasing the relevance of climate-related factors in understanding FMD dynamics 7.
In this study, we employed a comprehensive modeling approach that included global aspatial modeling (OLS), global spatial modeling (SEM and SLM), and local spatial modeling (GWR and MGWR) to analyze the spatial distribution of FMD in Iran. Our findings highlight the superiority of the MGWR model over the traditional GWR model in achieving a more precise model fit. By using a separate bandwidth for each covariate, MGWR can capture intricate relationships that may be overlooked by the GWR model. Although this approach increases computational complexity, it provides a more nuanced understanding of spatial patterns and influences of each [...] also investigated the spatial distribution of tuberculosis and its association with meteorological factors in mainland China. Their study revealed that GWR was a more suitable modeling approach than OLS, as indicated by higher adjusted R² values and lower AICc scores. The results from Ye et al. and Zhang et al. are consistent with our findings, supporting the notion that local spatial regression methods such as GWR and MGWR yield higher adjusted R² values than global regression methods (OLS, SLM, and SEM) 47,48.
Following the spatial modeling analysis, our study identified two key variables, the percentage of sheep and the percentage of goats, which demonstrated a significant impact on disease incidence across most provinces in Iran. Continuous monitoring of these variables is crucial for understanding the dynamics of FMD spread at the provincial level in Iran. These findings align with the results of the study by Begovovea et al. 49, which identified the populations of sheep and goats as significant factors influencing FMD prevalence in northern Nigeria (Bauchi, Kaduna, and Plateau states). In contrast, while previous studies have highlighted the significance of climate factors such as precipitation in FMD occurrence, our study did not find precipitation to be a significant variable. This contrasts with the findings of Rahman et al. 50, who investigated FMD space-time clusters and risk factors in Bangladeshi cattle and buffalo and highlighted the substantial role of climate, particularly precipitation, in FMD incidence. Jiang et al. 7 also emphasized the influence of climate change on FMD risk in elephants, noting the importance of precipitation and temperature in this context. Additionally, Lee et al. 51 studied the temporal patterns and space-time clusters of FMD cases in Vietnam from 2007 to 2017 and identified a higher occurrence of FMD cases during the dry season, from November to March. These variations in findings regarding the significance of precipitation in FMD incidence highlight the importance of considering regional and local factors in disease dynamics. Further research is needed to explore the specific contextual factors that influence FMD transmission patterns in different regions, considering the interplay between climatic variables, host characteristics, and local epidemiological conditions.
Limitations. There were several limitations to this study. First, the finest spatial granularity for demographic data in Iran was the province level due to limited data access. Data at the county, district, or farm level could have provided more accurate and detailed results. Additionally, the unavailability of FMD data disaggregated by species restricted the analysis to the provincial level without considering the specific impact of FMD on different livestock species. Access to species-specific data would have facilitated a more comprehensive understanding of the disease dynamics and allowed for targeted prevention and control measures. Second, the study was limited in the variables used in the modeling process. While efforts were made to include relevant socio-environmental determinants, additional variables such as vaccination-related factors, as observed in previous studies, could have enhanced the analysis. For instance, considering vaccination coverage, particularly the vaccination of calves under 12 months, as an explanatory variable would have provided valuable insights into the effectiveness of vaccination strategies in controlling FMD. Third, the temporal scope of the study was limited to the period from March 21, 2017, to March 21, 2018. A more extended or more recent time frame would have allowed for a more comprehensive assessment of FMD incidence trends and patterns in Iran, including temporal variations and the potential impact of evolving factors on FMD incidence. Lastly, it is essential to acknowledge that the results of this study are generalizable only at the province level. Attempting to draw conclusions at sub-province or individual levels may lead to inaccurate inferences due to the potential for ecological fallacy. Therefore, caution should be exercised when extrapolating findings beyond the analyzed spatial scale. Addressing these limitations in future studies, including obtaining data at finer spatial and temporal resolutions, incorporating species-specific FMD information, and expanding the range of variables considered, would further enhance our understanding of FMD dynamics and support more targeted and effective control strategies.

Conclusion
In this study, we employed global and local spatial models to investigate the key factors influencing the occurrence of FMD in Iran. Our findings revealed that global models performed relatively poorly compared to local models, highlighting the importance of capturing spatial variation in FMD incidence. The MGWR model performed best among the local models, with an adjusted R² of 0.90. This emphasizes the significance of considering localized spatial effects when studying FMD incidence. The results highlighted the percentage of sheep and the percentage of goats as the most significant factors among the four selected socio-environmental determinants in explaining FMD incidence across most of the provinces in Iran. This underscores the importance of considering the livestock population when making vaccination-related decisions.

Figure 1. The animal population per province in Iran.

(Number of educated in province / Total population in province) × 100
3. Percentage of migrated persons: (Number of migrants in province / Total population in province) × 100
4. Percentage of gross domestic product (GDP) without oil from March 21, 2017, to March 21, 2018
Sources: 1. The summary of the results of the 2016 workforce statistics plan of Iran 21; 2, 3. Population and housing census of Iran, 2016 20; 4. An overview of the gross domestic product by province from 2010
Percentage of population: (Number of population in province / Total population in the country) × 100
1. Percentage of sheep: (Number of sheep in province / Total sheep in the country) × 100
2. Percentage of goats: (Number of goats in province / Total goats in the country) × 100
3. Percentage of cows: (Number of cows in province / Total cows in the country) × 100
Sources: 1. Population and Housing Census of Iran, 2016 20; 2, 3, 4. National Agricultural Census of Iran,

Figure 3. Workflow of the study.

Figure 4. Associations of the percentage of sheep, goats, population, and precipitation in January with FMD incidence using the GWR model at the provincial level. The "c" and "p" labels indicate the coefficient and the p value, respectively.

Figure 5. Associations of the percentage of sheep, goats, population, and precipitation in January with FMD incidence using the MGWR model at the provincial level. The "c" and "p" labels indicate the coefficient and the p value, respectively.

Figure 6. The geographic distribution of the local R² of the GWR and MGWR models: the association of FMD with the percentage of sheep and goats, population, and precipitation in January.

Table 1. Definitions and sources of explanatory variables used in this study.

Table 3. The value of adjusted R² and AIC for global and local approaches in the modeling of FMD in Iran from March 21, 2017, to March 21, 2018.