Spatiotemporal dynamics and risk factors for human Leptospirosis in Brazil

Leptospirosis is an emerging neglected tropical disease with a worldwide significant global health burden. Between 2000 and 2016, there were 63,302 cases of human leptospirosis and 6,064 deaths reported in Brazil. We modeled the spatiotemporal risk dynamics of human leptospirosis morbidity and lethality, and attributed an easily interpretable risk-based priority index (PI) for all Brazilian federative units to suggest improvements to the national surveillance system. We also developed a conceptual framework of causality and estimated the effects of environmental and socioeconomic determinants of morbidity and lethality. Spatiotemporal risk patterns of morbidity and lethality differed. For morbidity, the pattern was mainly spatial, whereas lethality was mainly explained by the spatiotemporal interaction. The hypothesized causal model explained a relevant fraction of the heterogeneity in the spatial and spatiotemporal interaction patterns. The increase in soil moisture, precipitation, poverty, and the decrease in the proportion of urban households, acted as risk factors. The increase in the proportion of households in which waste is directly collected and in temperature were preventive factors. The structured temporal trend was increasing for morbidity and decreasing for lethality. In terms of morbidity, it was clear that the prioritization should be focused in a couple of states, mainly Acre. In terms of lethality, the allocation of resources need not be as asymmetric, but there was nonetheless a prioritization order. The proposed approach can be used to characterize spatiotemporal dynamics of other diseases and to inform decision makers.

Data source and collection. The count number of cases and deaths of human leptospirosis, and the number of dengue cases, were obtained from the National Information System of Health of the Ministry of Health (Sistema de Informação de Agravos de Notificação [SINAN] do Ministério da Saúde [MS]) 14 . To test the conceptual framework of causality described below, we collected secondary environmental and socioeconomic covariates. The environmental covariates, obtained from TerraClimate 15 , consisted of average values of descriptive statistics (minimum, mean, maximum, range, standard, deviation, sum), calculated over a grid of approximately 4 km 2 . The socioeconomic covariates were obtained from the Brazilian Institute of Geography and Statistics (IBGE) 16 and the Institute of Applied Economic Research (IPEA) 17 . All variables were centered by subtracting their respective mean and scaled by dividing the centered values to the standard deviation. Table 1 illustrates the variables and data sources.
Conceptual framework of causality. Our conceptual framework rests on the following assumptions.
First, the morbidity and the lethality of human leptospirosis do not occur at the same rate across the Brazilian states. Furthermore, the temporal trends in morbidity and lethality vary across states. Regarding the causal determinants, our simplified assumption is as follows: Rainfall, with precipitation as proxy, might promote the formation and maintenance of stagnated water, increasing the soil suitability for Leptospira survival (Fig. 1a). This suitability is also affected by temperature (Fig. 1b) and other environmental variables (Fig. 1c). The soil moisture is a proxy of soil suitability and increases the risk of Leptospirosis morbidity (Fig. 1d). Environmental variables also affect the risk of Leptospirosis morbidity by means other than their effect on soil suitability (Fig. 1e,e'). Moreover, the higher the proportion of poor households in Brazilian states, the more common it is to find unfavorable socioeconomic contexts where waste management is inadequate or absent (Fig. 1f) and illiteracy is higher (Fig. 1g). In these poverty contexts, which differentially affect urban and rural areas, there is a lower ability to cope with floods, prevent the stagnation of water, and perform proper rodent (and other host species in rural areas) control. Thus, poverty affects Leptospirosis morbidity through waste management (Fig. 1i), illiteracy (Fig. 1i') and through other socioeconomic factors (Fig. 1i"). An unknown fraction of Leptospirosis cases are misdiagnosed as dengue cases, an error that can lead to death of some patients owing to improper or delayed treatment (Fig. 1j,j'). To complete the causal network, we include biological agent and host determinants (Fig. 1k). Other febrile disease that may confuse the diagnostic of Leptospirosis were not considered because they have a localized occurrence (malaria is only present in the north region of Brazil), are almost absent (typhoid fever), or emerged or re-emerged recently and therefore there are no data for the entire studied period (yellow fever, zika virus and chikungunya).

Statistical models.
We performed exploratory analysis to verify data consistency and to obtain a preliminary idea of the association between covariates. Environmental covariates such as precipitation, temperature, and soil moisture were represented by more than one value (minimum, mean, maximum, standard deviation, range), and for each covariate, we selected the value that showed a stronger correlation with the leptospirosis relative risk (RR). This selection resulted in: minimum precipitation, maximum temperature, and minimum soil moisture.
All statistical models had a spatiotemporal architecture. The first set of models was used to model the RR of human leptospirosis cases, the second for human leptospirosis lethality, and the third to represent the conceptual framework of causality. Within each of these sets, models differed in their spatiotemporal architecture; in the last set, models also differed in the combination of covariates. Given the i (i = 1, …, 27) states and t (t = 2000, …, 2016) years, let y it be the number of human leptospirosis cases in state i and year t, P it be the human population at risk in state i and year t, and E it be the yearly expected number of human leptospirosis cases in state i and year t, calculated by indirect standardization: We assumed that and θ it is the state-year-specific RR. For lethality, y it represented the number of deaths and E it the number of cases. For the models of the causal network, t ranged from 2002 to 2014. We evaluated two spatial, two temporal terms, and four spatiotemporal interactions. Let υ i and v i be the structured and unstructured spatial terms, respectively: where τ ς is the marginal precision, η δi is the number of neighbors of i, and φ measures the proportion of marginal spatial variance explained by υ. The overall spatial effect was given by 18 : This model is known as BYM2 18 . Let also γ t and ω t be the structured (random walk of first order: RW1) and unstructured temporal terms, respectively:γ The spatiotemporal interactions δ it were given by the Kronecker . v i and ω t were independent and identically distributed random variables (IID).
Considering the previous spatiotemporal terms, the intercept β 0 and coefficients β p (where p is the number of covariates) all distributed as β τ β N(0, ) 1 , Table 2 presents the model nomenclature we will use hereafter. Note that for causal models of morbidity, models C1a and C1b estimated the effects of distal determinants (upper boxes in Fig. 1). Models C2a and C2b estimated the effect of soil, controlling for the effect of the distal determinants. Models C3a and C3b replaced poverty by the percentage of urban households, models C4a and C4b replaced poverty by the percentage of households with proper collection of waste, and models C5a and C5b replaced poverty by the number of illiterate persons. Poverty, waste management and illiteracy did not enter in the same model due to collinearity. The effect of soil moisture was not controlled when estimating the effect of precipitation and temperature because it was not a potential confounder; part of the effect of precipitation and temperature was mediated by the soil moisture. The difference between causality models with suffixes a and b is that models with suffix a had only unstructured terms ω and v, whereas those with suffix b had all spatial and temporal terms and the interaction between unstructured terms δ it = v i ⊗ ω t ( Table 2). The percentage of variability in random effects explained by covariates was measured by the relative change in the standard deviation (SD) of the random effects. For example, the effect of model C3b covariates in the random effect ς, taking RR1 as a model of reference, was given by (SD(ς RR1 ) − SD(ς C3b ))/SD(ς RR1 ) * 100. The relative change was 100 if covariates explained all of the variability, 0 if they did not change the variability, and negative if they increased the variability.
The proportion of marginal variance explained by each component of the RR and lethality models was given by: where n was the size (10,000) of samples drawn from the marginal posteriors τ ς , τ γ and τ ρ . f ς was further multiplied by φ and 1 − φ to obtain the proportion of variance explained by υ and v, respectively. We characterized the RR and lethality of the correspondent best model in terms of fitted values, excess risk (RR it > 1; lethality it > average lethality), and a priority index (PI): where x i is the RR or lethality of the state i, ER is the excess risk, and t is the year; the PI was separately calculated for each year. The PI weights the RRs (or lethalities) by the corresponding excess risk, and the maximum value is scaled to 100. Other values are relative to that maximum. In addition, we calculated Pearson's correlation coefficient between excess risk morbidity and lethality. All models were implemented using the Integrated Nested Laplace Approximation (INLA) 20 .
Priors. The default priors used for this study were penalized complexity (PC) priors. Under the PC priors approach, model complexity is specified as the divergence between a flexible model and a base model 21 . In our case, the base model was characterized by τ i = ∞, i = {ς, γ, ω, β} and φ = 0. In flexible models, τ i < ∞ and φ > 0. The divergence was given by the unidirectional Kullback-Leibler divergence: The complexity was penalized by constant decay-rate, specified in terms of a type-2 Gumbel distribution, using the probability state- These statements correspond to a decay rate equal to −log(α)/U 22 . As default priors we used τ > . . = . Role of the funding source. The funder had no role in the collation of the data, development of the conceptual framework, analysis of data, interpretation of data, writing of the manuscript, or the decision to submit the paper for publication.

Results
Descriptive results. From January 2000 to December 2016, a total of 63,302 cases and 6,064 lethal lepto-  (Fig. 2). The spatial distribution of the RR average is shown in Fig. 2. The minimum average lethality per one hundred thousand cases was 3.34 (2009); the maximum was 12.94 in 2001 (Fig. 3). The spatial distribution of the average lethality rates from 2000 to 2016 is mapped in Fig. 3 (see Supplementary- Table 1 and Supplementary Figs 1 and 2 for values of each state annual RR and lethality, respectively).
In addition, all covariates used in the conceptual framework of causality were described and mapped to show each spatiotemporal distribution (Supplementary-Figs 8 to 14).
Spatiotemporal RR and lethality models. In RR models, the spatial and temporal structured effects improved the model fitting, but the better spatiotemporal interaction term δ it was among unstructured effects (Table 3). In L models, the improvements caused by structured effects were lower and the best spatiotemporal interaction term δ it was among the unstructured spatial effect and the structural temporal effect. In RR1, the best of the RR models, the unstructured spatial effect and the spatiotemporal interaction were the major contributors to the explained variance. In L2, the best of the L models, the spatiotemporal interaction was the major contributor to the explained variance, followed by the unstructured spatial effect ( Table 4). The temporal effects were more relevant in L2 than in RR1 (Table 4). The spatiotemporal patterns of the calculated posterior means were heterogeneous, especially for morbidity (Figs 4 and 5, Supplementary Figs 1 to 4, and Supplementary-Anim 1 and Supplementary-Anim 2). The morbidity and lethality patterns tended to be opposed, as demonstrated by the negative correlation between their posterior means for the excess risk (Pearson's correlation coefficient = −0.27, credible interval (CI) 95% = −0.36, −0.19). The structured temporal trend increased for morbidity and decreased for lethality (Fig. 6). The morbidity PI of the last five years was particularly high in the Acre State (Fig. 7); the lethality PI was less variable (Fig. 8).
State-year PIs are also described in Supplementary- Table 2.
Causality models. Based on models with suffix a, soil moisture was the strongest risk factor and an increase in this property, equivalent to one SD, increased the risk of leptospirosis morbidity by 55.9%. Poverty and precipitation were the other risk factors, with effect sizes of 7.7% and 4.5%, respectively (Table 5). An increase in one SD in the percentage of urban households decreased the risk by 14.6%, while an equivalent increase in the percentage of households with proper collection of waste decreased the risk by 6% ( Table 5). The temperature was a preventive factor, while the number of illiterate individuals had a credible interval compatible with the absence of effect ( Table 5). The reported effects of soil moisture were calculated controlling for poverty. These effects were similar and qualitatively equivalent when poverty was replaced by the percentage of urban households, the percentage of households with proper collection of waste or with the number of illiterates individuals. The number of dengue cases had no effect on lethality: RR = 1⋅006 (0.944-1.038). The plots of covariates against model residuals did not show nonlinear trends (Supplementary-Fig. 15). In models with suffix b, only soil moisture and temperature had credible intervals excluding the absence of effect. Models C3a and C3b had the lowest DIC in within each group of models ( Table 6). As those models had the same spatiotemporal architecture as RR1, we were interested in calculating the percentage of variability in the random effects explained by covariates. When compared to RR1, all these models reduced the variability of the global spatial effect and of the spatiotemporal interaction; the variability of some temporal effects increased (Table 7).

Discussion
The morbidity and lethality of human leptospirosis presented different spatiotemporal patterns and did not occur systematically in the same states for the entire period. Five out of the seven hypothesized causal network covariates had the expected direction of association and explained a fraction of the spatial and spatiotemporal interaction variability. The outputs of the analyses presented here highlighted which regions would benefit from reinforcement of disease control, surveillance, and funding strategies. The approach developed in this study also offered a complementary mapping approach, including a risk-based PI that ranked the states in order for prioritization, generating a valuable evidence base to guide and improve leptospirosis surveillance, control, and elimination planning.
The spatiotemporal dynamics of human leptospirosis were explored in the entire national territories of both China 31 and Colombia 32 , in China from 2005 to 2015 and in Colombia from 2007 to 2016. In China, the crude rates of morbidity and mortality decreased over this period, and the temporal trends were modeled assuming linearity. Crude rates were presented at the province levels and were higher in the south of the country. In Colombia, six spatiotemporal high-risk clusters of morbidity were identified using scan statistics. There was an epidemic peak over a period of a few months, but no clear trend during the entire period examined. Our study used spatiotemporal models appropriate for crude rates that can be misleading owing to sparsity or small values, as is the   case with the human leptospirosis rates. The models also allowed us to explore nonlinear trends over a period of 17 years. The spatiotemporal variation we observed was expected, assuming that many of its causes vary across space and time. This variation in disease occurrence can be detected at different scales, as demonstrated by our study and other studies restricted to smaller areas and shorter periods 13,33 . However, detection of spatiotemporal interactions might be more dependent on scale and methodology. A four-year prospective study in a Brazilian slum 13 detected spatial and temporal variations but was unable to identify spatiotemporal interactions: although incidence varied across years, the spatial pattern was the same, and specific hot-spots consistently had higher risk of transmission during the study years. In contrast, we explicitly modeled spatiotemporal interactions over 17 years and these had considerable explanatory power, more so for lethality (Table 4); in other words, the hot spots were not always the same and the temporal trend varied across states (see Supplementary Figs 3 and 4). However, the interpretation of the lethality models requires caution due its limited fit to the data suggested by the fraction of posterior predictive p-values in the tail deciles (Table 3) 22 .
The conceptual framework assumed that environmental and socioeconomic determinants affect morbidity but not lethality (Fig. 1). These determinants had their own spatiotemporal patterns, and they explained a fraction of the spatiotemporal variability in morbidity. The spatiotemporal pattern of lethality was less variable than that observed for morbidity, probably because it in fact was not affected, or at least was less affected, by environmental  Table 4. Percent contribution of each model's random effect to the variance explained by the model.  and socioeconomic determinants. Moreover, sub-notification, especially for cases in the early stage of disease, could have contributed to the spatiotemporal variability of morbidity and lethality, but to the former to a greater extent. This could have been related to the negative correlation between the spatiotemporal patterns of morbidity and lethaliy. If one assumes that there are more sources of variation for morbidity and that sub-notification is more influential on morbidity, one would expect to find, as we did, greater heterogeneity in spatiotemporal patterns of morbidity. Multiple causal factors establish complex interactions that determine the risk of transmission of human leptospirosis. Although the causal network is not fully understood, ecological 10,34 , multilevel 13,33 , and individuallevel 35,36 studies have identified environmental and socioeconomic variables that consistently act as risk factors. We decided to synthesize some of these previous findings in an explicit conceptual model of the causal network of human leptospirosis, with a focus on variables readily available, and indexed by spatial and temporal units. We chose the state and the year as spatial and temporal units, respectively, because these were the units in which socioeconomic variables of interest were available, and because they allowed us to test the causal hypothesis at the same resolution that supported our spatiotemporal dynamics findings of morbidity and lethality. These units implied the aggregation of data over large areas and periods, which could have diluted or reversed the direction of the associations under study. However, five of the seven covariates had effects consistent with the literature reports, as described below. Furthermore, covariates explained a substantial fraction of the spatial and spatiotemporal interaction variability.
Most reports of leptospirosis include exposure to contaminated soil and water but not direct contact with animals, so it is thought that the most common method of transmission is mediated by contaminated soil and water 33,37,38 . Leptospira is recurrent in soil with a moisture content > 20% 39 and survives in temperatures ranging from 4 to 40 °C 40 . These parameters entail a wide range of environments and offer little guidance for surveillance. However, individual-level studies -as concluded by a systematic review -nearly always identify as risk factors water-related activities and exposure to floods and rainfall 11 . An ecological study also found that risk increases with increased rainfall 41 , and we observed that the involvement of rainfall as a risk factor is also verifiable on large spatiotemporal scales. The soil type might also influence the occurrence of human leptospirosis, according to ecological studies concluding that heavy sabulous clay soils 34 and Neossolo Litolítico soils 10 have a positive association with the occurrence of human leptospirosis. Sabulous clay soils allow water to inundate the soil when inclination is favorable (as in the study of Rodd et al. 34 ) and Neossolo Litolítico soils have a low drainage capacity.    Table 6. Comparison of causality models.
Thus, these types of soils might have a higher moisture content suitable for the survival of Leptospira. Our results support this rationale and provide ecological evidence based on a variable (soil available water capacity) 15 that is a more direct measure of the available water that can be stored in the soil. Higher temperatures seem to increase the risk of human leptospirosis 42 , but we found the opposite. A recent German study in muskrats also found an inverse association between leptospirosis and maximum temperatures 43 . One possible explanation for these findings may be the fact that warm temperatures deplete the moisture in the environment, which would reduce the chances for the survival of Leptospira outside its host 5 . This contradicting evidence regarding the relationship between temperature and leptospirosis morbidity should clarified by modeling individual outcomes under multilevel approaches, using covariates at finer spatial and temporal scale (e.g., at the municipality and month levels). Poverty has been considered a risk factor for human leptospirosis 13 , and we hypothesized that under conditions of poverty, illiteracy is higher, waste management is inadequate, and there is a lower ability to cope with floods, prevent the stagnation of water, and ensure proper rodent control. The hypothesized relationship between poverty, illiteracy, and waste management was verified among covariates. As expected, poverty was a risk factor and better waste management a protective factor. Urbanization was also a protective factor, possibly owing to its negative association with poverty and its positive association with waste management. Illiteracy was not a direct risk factor, possibly because its effect was confounded by poverty. Illiteracy, waste management and poverty did not enter in the same model due to collinearity. It should be noted that disorganized urbanization can promote leptospirosis transmission, and living in slums is a risk factor 13 , but these findings do not necessarily mean that urban areas are at greater risk than rural ones. In fact, our results are not the first to point to rural areas as associated with greater risk 44 .
The covariates had no effect in models with structured random effects (causality models with suffix b), perhaps because they had a spatiotemporal pattern masked by the structured effects. It is possible that the percentage of variability in spatial and spatiotemporal random effects explained by covariates resulted from this masking effect, which has been documented in simulation and empirical studies based on spatial models with structured effects 45,46 . Covariates explained a fraction of the variability of most random effects. Some models with covariates increased the variability in temporal random effects, but it should be remembered that ω and γ explained only 0.1% and 0.7% of the model variability. Therefore, these increases were negligible.
Infectious diseases in general are often sensitive to variability on a fine spatiotemporal scale that was not well represented in our study. This is problematic because the dynamics of infectious diseases may be particularly sensitive to extremes or variability removed in the aggregation of the data 13 . Therefore, our findings should be complemented by more detailed characterizations in order to guide interventions within the states.
The interpretability of numeric information can be conditioned by the format in which the information is presented 47 . The PI used is a risk-based percentage scale that ranks units of analysis and measures the distance between them. It might facilitate the communication of spatiotemporal risk predictions, especially to stakeholders lacking background to interpret concepts such as spatial RR and excess risk. The PI suggests the order in which the units of analysis should be prioritized and how different the prioritization assigned to them should be. In terms of morbidity, it was clear that the prioritization should be focused in a couple of states, mainly in Acre (see Supplementary Table 2). In terms of lethality, the allocation of resources need not be as asymmetric, but nonetheless there was a prioritization order (see Supplementary- Table 2 for each state-year PI).
Spatiotemporal patterns of morbidity were more heterogeneous than those of lethality, probably because there are more causal determinants acting on morbidity. Sub-notification might have contributed to these patterns, and we urge improvements in leptospirosis reporting in all states. The causal model synthesized previous findings regarding the contextual determinants of leptospirosis, most of its components agreed with observed data, and it serves as a guide for future research on causal determinants. The PI might facilitate communication with decision makers and the general public, showing which states are the most problematic, and to what extent they should be prioritized.

Data Availability
All results can be reproduced using the data and code to available as Supplementary data.