Spatial-temporal characteristics and the epidemiology of haemorrhagic fever with renal syndrome from 2007 to 2016 in Zhejiang Province, China

Zhejiang Province is one of the six provinces in China that has the highest incidence of haemorrhagic fever with renal syndrome (HFRS). Data on HFRS cases in Zhejiang Province from January 2007 to July 2017 were obtained from the China Information Network System of Disease Prevention and Control. Joinpoint regression analysis was used to observe the trend of the incidence rate of HFRS. The monthly incidence rate was predicted by autoregressive integrated moving average(ARIMA) models. Spatial autocorrelation analysis was performed to detect geographic clusters. A multivariate time series model was employed to analyze heterogeneous transmission of HFRS. There were a total of 4,836 HFRS cases, with 15 fatal cases reported in Zhejiang Province, China in the last decade. Results show that the mean absolute percentage error (MAPE) of the modelling performance and the forecasting performance of the ARIMA model were 27.53% and 16.29%, respectively. Male farmers and middle-aged patients account for the majority of the patient population. There were 54 high-high clusters and 1 high-low cluster identified at the county level. The random effect variance of the autoregressive component is 0.33; the spatio-temporal component is 1.30; and the endemic component is 2.45. According to the results, there was obvious spatial heterogeneity in the endemic component and spatio-temporal component but little spatial heterogeneity in the autoregressive component. A significant decreasing trend in the incidence rate was identified, and obvious clusters were discovered. Spatial heterogeneity in the factors driving HFRS transmission was discovered, which suggested that a targeted preventive effort should be considered in different districts based on their own main factors that contribute to the epidemics.

The data collection. Any human HFRS case diagnosed in the hospital must be reported through the China Information Network System of Disease Prevention and Control by the medical staff. The data of the HFRS cases in Zhejiang Province from January 2007 to July 2017 and the data of the populations at the county level were obtained from this network system. Individual information on cases and deaths were imported, and surveillance information, including demographic characteristics and geographic and temporal distributions, were computed by the system. The definition of reporting cases referred to the 'Diagnostic criteria for epidemic haemorrhagic fever' (WS 278-2008) of China. Joinpoint regression. Joinpoint regression is a model used to describe continuous changes in the incidence trend., The grid-search method was used to fit the regression function, with unknown joinpoints assuming constant variance and uncorrelated errors 11 . The joinpoint regression model for the observations x y x y ( , ), , ( , ) where y is the outcome of interest, x is the time variable, the s ' k τ are the unknown joinpoints, and α α = + for α > 0 and 0 otherwise.
The approximate permutation test was used to find the number of significant joinpoints, each p-value was obtained using Monte Carlo methods, and the overall asymptotic significance level was maintained through a Bonferroni correction. The objective indicator was the annual percent changes (APCs) of each period segment, which was estimated according to the following formula: i where β i represents the slope of the period segment 11,12 . Time-series estimation. The ARIMA model and the decomposed models have been widely applied to predict trends in many infectious disease incidences [13][14][15][16] . The formula of the ARIMA model is written as: where W t is the response series Y t , or a difference of the response series. µ is the intercept; B is the backshift operator( = − BX X t t 1 ). φ B ( ) is the autoregressive operator, represented as a polynomial in the backshift operator: where p is the order of the autoregressive process; φ is the autoregressive coefficient; θ B ( ) is the moving-average operator, represented as a polynomial in the backshift operator: where q is the order of the moving-average process; θ is the moving-average coefficient; and a t is the random error.
The formula for simple (nonseasonal) differencing is written The formula for seasonal differencing is where d is the degree of nonseasonal differencing; D is the degree of seasonal differencing, and the length of the seasonal period is represented by s, which is 12 for monthly series. The formula " (3)" suggests that the model predicts a value as a linear combination of its own past value and past errors. The nonseasonal model is usually represented by ARIM A (p, d, q), and the seasonal model is normally represented by ARIM A (p, d, q) × (P, D, Q) s . The term (p, d, q) gives the order of the nonseasonal part, and (P, D, Q) s gives the order of the seasonal part. The ARIMA analysis is usually divided into three stages including the identification stage, the estimation and diagnostic stage and the forecasting stage.
In the first stage, the white noise test was done to assess if there is any information in the series to model. This is an approximate statistical test of the null hypothesis, in which none of the autocorrelations of the series is up to a given lag. If this is true for all lags, there will be no information in the series to model. If the hypothesis is rejected, the next step of stationarity tests is performed to determine if differencing is needed, sincethe stationarity of the series is necessary for constructing the model. An easy way to assess nonstationarity is to visually examine a graph of the series over time to see if it has a trend, seasonality, or other nonstationary patterns. Another way is to do the stationarity test, withthe null hypothesis being the time series is nonstationary.If the series is not stationary, an appropriate difference should be calculated to make the series stationary before moving onto the next step. In the second stage, the candidate models were identified according to the autocorrelations and the partial autocorrelations. The nonseasonal autoregressive order (p) and the seasonal autoregressive order (P) were determined using partial autocorrelation functions (PACF); the nonseasonal moving average order (q) and the seasonal moving average order (Q) were determined by the autocorrelation functions (ACF). The estimation of the coefficient including φ and θ was calculated by the conditional least squares method. In the last stage, the adequacy of the established model for the series was verified by employing white noise tests. The tests examined whether the residuals were independent and normally distributed, indicating that the residual series contains any additional information that might be used by a more complex model 17,18 . The optimum model was selected according to the Akaike's information criterion (AIC) and the Schwartz Bayesian criterion (SBC). Smaller AIC and SBC values represent a superior model 17 . In addition, the mean absolute percentage error (MAPE) was calculated to check the accuracy of the model and to compare the results with that of other studies. Once the model was adopted, we forecasted the short-term trend of the monthly incidence. Spatial autocorrelation analysis. Spatial analysis was applied to describe geographic variation and to detect the clustering regions 19 . A global and local autocorrelation analysis using the Moran's I was carried out to identify spatial autocorrelations. Global autocorrelation demonstrates the spatial distribution clusters over areas 20 . The Local Moran's I(LMi) Index was used to classify the autocorrelations as positive and negative values. When there were similar high values or low values of the incidence rates, the locations were defined as having a positive autocorrelation (represented as high-high or low-low autocorrelation). If opposite high and low values occurred, they were considered to have a negative autocorrelation (represented as high-low or low-high autocorrelation) 21 .
Multivariate time series model. The multivariate time series model established by Held and Paul was designed for spatially and temporally aggregated surveillance data. The Y i t , is the disease count in region = ... i 1, , .. , with an additively decomposed mean where ψ is an overdispersion parameter such that the conditional variance of Y it is µ µ where "." is one of , , ν λ φ; α . ( ) are intercepts; ⋅ b i ( ) denotes the random effects, which account for heterogeneity between districts. ⋅ z it ( ) are exogenous covariates, including time effects; and the β ⋅ . ω ji is the spatial contiguity weights matrix, which describes the strength of transmission from region j to region i. There are usually three models of neighbourhood weights, including the first-order neighbourhood model, the power law model and the second-order neighbourhood model. The Akaike's information criterion (AIC) were used to select the model without random effects.The scores rules, such as the logarithmic score ("logs"), the ranked probability score ("rps"), and the Dawid-Sebastiani score ("dss") were applied to compare the random effect models. These scores measure the predictive quality between the predictive distribution from a fitted model and the observed value, and they give a more robust result for taking into account the random effects. Lower scores correspond to better predictions 22-25 . Statistical analysis. Autoregressive integrated moving average (ARIMA) models were applied to predict incidence. The methods above were computed by SAS9.2 (SAS Institute Inc., Cary, NC). The joinpoint regression model was used to examine the trend of the incidence rate of HFRS from 2007 to 2016 by Joinpoint software (version 4.5.0.1). ArcGIS software (version 10.1, ESRI Inc.; Redlands, CA, USA) was used for the spatial autocorrelation analysis. County was adopted as the geographic unit to calculate the spatial autocorrelation. The multivariate time series model was run by R software (version 3.3.1). A P value less than 0.05 represented statistical significance for all the tests. indicating a monotone decreasing trend in the incidence rate (Fig. 2).

Result
An obvious seasonality was present for the occurrence of HFRS across the ten year period (Fig. 3). There were apparent bimodal curves every year, with two incidence peaks in spring-summer and autumn-winter. The cases reported during May and July accounted for 29.24% of the total cases, whereas they represented 35.26% of the cases reported in November to the next January. The incidence peak in the autumn-winter was slightly higher than that in the spring-summer. Relatively fewer cases were reported in February, August and September compared to the other months. The monthly incidence data from 2007 to 2016 were used to construct model, and the data from January to July 2017 were used as the test set. According to results of the white noise test ( 2 χ = 38.49, df = 6, P < 0.0001) in the first step, the null hypothesis of white noise was rejected strongly, which means the time series contains information to model. After checking for the graph of the series, we found that this time series was not stationary for the obvious monthly seasonality. The stationary test shows the null hypothesis could not be rejected for the P value was not significant enough (Tau = −2.24, P = 0.02), which suggested that the differencing is necessary. We then performed a seasonal differencing ( ) to ensure that the transformed time series of the monthly HFRS incidence was stationary (Tau = −6.54, P < 0.0001) and suitable for constructing the ARIMA model (Fig. 4A). The second step, based on the results of optimal model selection by the Bayesian information criterion (BIC), and the figures of the autocorrelation functions (ACF) and the partial autocorrelation functions (PACF) (Fig. 4B,C), the best model was the ARIMA model (2, 0, 0) × (0, 1, 0) 12 , with a minimum BIC value (5.17).
According to the results of the partial autocorrelation functions (PACF)and inverse autocorrelation function(IACF) (Fig. 4C,D), the partial and inverse autocorrelation coefficients in the lag 12th phase were beyond the 95% confidence interval. Therefore, we further constructed the product model of the ARIMA (2, 0, 0) × (1, 1, 0) 12 (AIC = 857.82, SBC = 865.87), which was more precise than the model of the ARIMA (2,0,0) × (0,1,0) 12 (AIC = 871.89, SBC = 879.93). The goodness-of-fit analysis showed that there was no significant autocorrelation  between residuals at the 6 th lags ( 2 χ = 1.06, df = 3,P = 0.79), suggesting that the residual series was a white noise series, and the information was fully analyzed. In the third step, the autoregressive parameters were estimated, and the future incidence was forecasted. The autoregressive parameters of the product model were 0.45 and 0.20, and the seasonal autoregressive parameter was −0.41. The predicted data for the actual value of the product model are shown in Table 1. The modelling and the actual time series are presented in Fig. 3. The mean absolute percentage error (MAPE) of the modelling performance and the forecasting performance were 27.53% and 16.29%, respectively. The raw predicted data, the rounding predictive value and the actual data were well matched, and the actual data fell within the predicted 95% confidence interval (Fig. 3).
Demographic characteristics of human HFRS cases. Of the total cases, male HFRS human cases predominated, accounting for an average of 73.37% during the ten-year period.We found that the incidence in males was 2.75 times higher than that in females. More than 90% of the cases were in individuals aged 20-75 years. The three age groups with the most reported cases were 40-44 years (13.17%), 45-49 years (13.07%) and 50-54 years (12.66%). Farmers represented the highest proportion (70.45%) of the HFRS cases, followed by workers (13.13%).
Geographic characteristics and the spatial clustering distribution. Human HFRS cases were reported in most areas of Zhejiang Province, with the exceptions for the following five counties that reported no cases from 2007 to 2016: Jiashan, Haiyan, Deqing, Daishan and Shengsi counties (Fig. 5). The top fourteen counties with average annual incidence rates greater than 2 per 100,000 were Longquan (14.75/10,000), Kaihua (10.61/10,000), Tiantai (9.69/10,000), Xiangshan (4.58/10,000), Xinchang (4.42/10,000), Jinyun (4.36/10,000),  Based on the results of the local Moran's I autocorrelation (Table 3), there were 54 high-high clusters and 1 high-low cluster in total at the county level during 2007 and 2016, with 7, 5, 8, 4, 3, 3, 4, 7, 6 and 8 clusters each year. It should be noted that Tiantai, Xinchang, Sanmen, Ninghai and Kaihua counties had long-term high-high clusters throughout the ten years, with 10, 10, 8, 7, and 5 high-high clusters, respectively. The hotspots of HFRS transmission were mainly concentrated in eastern areas, including Tiantai, Xinchang, Sanmen and Ninghai   counties, which accounted for 63.64% of all the high-high or high-low clusters of Zhejiang Province during the ten years (Fig. 6). Relatively sporadic clusters were observed in the southwest Zhejiang Province. It is interesting that the high-high clusters disappeared in the southwest from 2010 to 2012 and showed a stronger aggregation after 2014. It should also be noted that the counties varied in size, with the largest being 4,452 square kilometres and the smallest being 18.17 square kilometres.
Multivariate time series analysis. The monthly data from 2007 to 2016 were used to construct the multivariate time series model. First, we built two models assuming that the incidence would follow a negative binomial distribution and the Poisson distribution, with the first-order neighbourhood regarded as the neighbourhood weight by default. The AIC of these two models were 15,636.12 and 16,648.44, which suggested that the assumption of the negative binomial distribution of the incidence would be better for modelling. Second, based on the assumption of the negative binomial distribution of the incidence, we built three models including the first-order neighbourhood model, the power law model and the second-order neighbourhood model, and assumed that the neighbourhood weights were one of the three models above. The AIC of the three models were 15,636.12, 15,483.68 and 15,479.48. This shows that the second-order neighbourhood model was slightly better than the other two models. Third, we considered the random effects in the model, and we discovered that the second-order neighbourhood model with random effects introduced by the proper scoring rule was much better than the models without random effects. The final model selected was the second-order neighbourhood model with random effects based on the assessment of the different models ( Table 4). The estimated random effect variance parameters 2 σ λ , σ φ 2 , and σ ν 2 were 0.33, 1.30 and 2.45, which were used to capture the heterogeneity of HFRS incidence among the different counties. The autoregressive component (σ λ 2 ) among districts showed little variation. In contrast, considerable spatial variations in the spatio-temporal component (σ φ 2 ) and the endemic component (σ ν 2 ) were found. This suggested that there was obvious spatial heterogeneity in the endemic component and spatio-temporal component, with little spatial heterogeneity in the autoregressive component. Redlands, CA, USA). The homepage of the ArcGIS software was https://www.esri.com/. The HH was the highhigh spatial autocorrelation, the HL was the high-low spatial autocorrelation, the LH was the low-high spatial autocorrelation and the LL was the low-low spatial autocorrelation.  Table 5).
To judge the relative importance of the three model components in the high incidence areas (>100 cases over ten years), we plotted the mean components along with the observed counts (Fig. 8). The relative contributions of the three components in driving the prevalence of HFRS over time were exhibited by these figures (Fig. 8), and the seasonal characteristics were displayed at the same time.
There were considerable differences in the multivariate time series models between the aforementioned high incidence areas. To a large degree, most of the high incidence areas, except for Yongkang county, were affected by the endemic component for the entire ten year period. Clear seasonality with two peaks was observed every year. Yongkang county, unlike other counties, was mainly affected by the spatio-temporal component. Yinzhou county was influenced equally by the spatio-temporal and the endemic components. Unlike Yongkang and Yinzhou counties, the other 8 counties were all closely located in the middle eastern portion of Zhejiang Province and were mainly affected by the endemic component, while they were slightly influenced by the spatio-temporal and the autoregressive factors. These counties were Ninghai, Fenghua, Xinchang, Zhuji, Dongyang, Sanmen, Tiantai and Linhai. It is interesting that five counties (Xiangshan, Jiaojiang, Kaihua, Jinyun and Longquan) dispersedly located in the border areas of Zhejiang Province were the most influenced by the autoregressive factor instead of the endemic component. The autoregressive influence was clearly observed in the incidence peak and exhibited a decreasing trend, especially in the recent years.

Discussion
According to our research, there were more than 300 HFRS cases identified every year during the last ten years, some of which led to deaths. The epidemic of HFRS remainsa public health threat in Zhejiang Province, China. This province is one of the top six provinces with the highest incidence rate of HFRS 4 , though its deaths and fatality rates are much lower than the overall averages for China. The reason for the high incidence rate but low fatality rate may be that the overall diagnostic ability and medical treatment level in Zhejiang Province is preferable compared to the other provinces with high incidence. There fore, those infected cases can be identified early and treated over time. Based on the results of the joinpoint regression analysis, we found an obvious decreasing trend in the incidence rate from 2007 to 2016. One of the reasons for the decline is the high urbanization rate and GDP (Gross Domestic Product) per capita Zhejiang Province (ranked fifth in China with 12,635 dollars in 2016). Many studies found that the HFRS incidence decreased with an increase in per capita GDP and urbanization rate, since economic development may reduce HFRS transmission by decreasing rodent density 26 and thus the risk of excreta exposure 27 . Another reason may be because Zhejiang Province has been one of the trial areas for the implementation of the National Expanded Programme on Immunization with the hantavirus vaccine since 2008 4 .
Consistent with previous research 4, 8 , there were two peaks of incidence in Zhejiang Province,with close numbers of cases in these two peaks. As we know, previous studies 4 suggest that HFRS in China is mainly caused by two types of hantaviruses including HTNV transmitted by Apodemus agrarius and SEOV transmitted by Rattus norvegicus. In addition, the rodent population density determines the incidence in humans 1 . We observed two peaks (June to July and November to January in the next year) for the incidence curve of Apodemus-type HFRS and only one peak (March to May) for the epidemic of Rattus-type HFRS. Thus, the epidemics in Zhejiang Province may have been caused by both Apodemus agrarius and Rattus norvegicus, contributing to the seasonality of the two peaks. Based on the seasonality and the declining trend in the incidence, the monthly incidence is meaningful in the temporal dimension against the assumption of white noise and is stationary after one-step seasonal differences. It is suggested that this time series is suitable for constructing the ARIMA model, which is  widely used to assess and forecast the incidence of infections [13][14][15][16] . One important conclusion from our model is that the incidence of HFRS in Zhejiang Province among the next six months will be similar to that in the same period from 2016 and maintain the seasonal characteristics. It should be noted that the performance of our model is similar to other research models 17,28 . It is worth mentioning that the prediction of the time series by the ARIMA model is only based on its past value. Therefore, the epidemiological factors driven the transmission of the disease should be kept the similar trend and seasonality to ensure an accurate forecast. Consistent with previous research 29,30 , middle-aged male HFRS cases predominated, and farmers accounted for the highest proportion of all occupations. To our knowledge, the hantavirus is most likely transmitted to humans through the inhalation of aerosolized excreta from infected rodents 1 . The variations in the distribution of gender, age and occupation are most likely caused by varying chances of exposure to the infected rodents and their excreta 29 . Middle-aged male farmers are expected to participate more frequently in outdoor activities, such as harvesting, which would increase exposure to infected rodents.
According to the incidence map and the Moran's I autocorrelation during the last decade, significant differences in the incidence and regional distribution were identified, and there were obvious clusters in some counties of Zhejiang Province. Most of the long-term hotspots were found in eastern areas, and some short-term high clusters were observed in the southwest. The finding suggests that the incidences of HFRS were similarly high among these areas Hence, more attention is needed to prevent epidemics in these areas. An interesting observation was that the high clusters disappeared in the southwest from 2010 to 2012 but reoccurred and expanded from 2013. The possible explanation is the selection of the hotspots. When we conducted the local autocorrelation analysis to detect the spatial similarity, the areas with the similar high values of the incidence rates were identified as the hotspots. We found that the incidence rates in Longquan county and Kaihua county were consistently high during the 10-year period compared to their neighboring areas with relatively low and varying incident rate. Nevertheless, it is worth noting that the epidemic in Kaihua county and Longquan county may expand to neighboring areas and act as reservoirs for the virus.
To our knowledge, the spatio-temporal heterogeneity of the incidence was affected by various factors such as temperature, precipitation, humidity, NDVI (normalized difference vegetation index), land use, rodent population density and human activity 1,26,31 . Different levels of seriousness of the risk factors can lead to different patterns of disease transmission, which may have caused the variations in the distribution of the epidemics among   the counties. Therefore, a multivariate time series model with random effects accounting for the effect produced by unobserved factors 25 was constructed to analyze the spatial and temporal occurrence of HFRS and to capture the heterogeneity in the component driving HFRS transmission. Based on the model, obvious spatial heterogeneity in the endemic and spatiotemporal components were identified. The high value of the endemic component was found in many counties, especially those located in the middle eastern areas of Zhejiang Province. Except for Yongkang county, all the other counties with a high incidence were mainly affected by the endemic component, which suggests that most infections in these areas can be explained by climatological and ecological changes, socioeconomic activities, living conditions and rodent density 1,26 . Moreover, considerable seasonality was observed in the time series of components, which shows that seasonal changes in climatological and ecological factors, such as increased rainfall, warmer winters, and abundant rodent food sources, will lead to an increase in the rodent population and contribute to the peaks in HFRS cases several months later 26 . In addition, the spatio-temporal and the autoregressive components also affected the disease transmission. Yongkang and Yinzhou counties, which are located near the eastern high-high cluster areas, were obviously affected by the spatio-temporal component, which indicates that the cases in these areas may have been imported from neighbouring areas with high incidence. For example, individuals who work in neighbouring areas with high incidence but live and seek medical advice in their hometowns were affected. Furthermore, we found that the counties Xiangshan, Jiaojiang, Kaihua, Jinyun and Longquan, which were geographically separated from each otherwere influenced by the autoregressive factors instead of the endemic components. In these counties, the epidemic effect in the previous season continued and partly contributed to the later peaks. Eight counties (Ninghai, Fenghua, Xinchang, Zhuji, Dongyang, Sanmen, Tiantai and Linhai) closely located in the middle eastern area of Zhejiang Province had the most high-high clusters, and the spatio-temporal and the autoregressive component contributed to rest of the epidemics. This result suggests that the cross-regional importation of HFRS, previous epidemics, and climatological and ecological factors altogether caused the disease and led to long-term clusters. Based on the heterogeneity of the components driving HFRS transmission, targeted preventive efforts are needed in the different areas. For most counties with a high incidence influenced mainly by the endemic component, coordinated rodent-control efforts could be the most appropriate way to prevent human HFRS 1 . Another important measure is educating the public, especially farmers and forest workers, to improve general hygiene. Means such asusing face masks when working in agriculture should be adopted to decrease the risk of inhaling virus-carrying particulates. For endemic areas, the HFRS vaccine may have a considerable effect on HFRS prevention and can be used as aneffective resolution for HFRS control in the future 32 . For Yongkang and Yinzhou county, where were obviously affected by the spatio-temporal component instead of endemic factors and epidemic control, it is recommended to use rodent control in neighbouring areas such as Ninghai, Fenghua, Dongyang and Tiantai counties in order to decrease incidence. For Xiangshan, Jiaojiang, Kaihua, Jinyun and Longquancounty, where the epidemic was mainly effected by the autoregressive factor instead of the endemic component, early Figure 8. Fitted components in the multivariate time series model for the 15 counties with more than 100 cases during the past ten years. The black dots represent the monthly counts of incidence, the light grey area shows the endemic component, the blue area shows the autoregressive component, and the yellow area corresponds to the spatio-temporal component.
implementation several months ahead of the peak may effectively decrease the number of cases of HFRS in the peak season.
Several limitations should be noted within our study. First, the data regarding the risk factors including climatic factors, pathogen and host dynamics, socioeconomic status and human activities were not collected. Consequently, these factors could not be compared and incorporated into the multivariate time series model. The correlation between these risk factors and the incidence of HFRS was also not studied. In order to achieve an accurate forecast of the HFRS incidences, future studies should incorporate these cofactors in the multivariate time series model and further compare the model with others including artificial neural network, Markov model and other models.
Second, our sample included the cases of HFRS reported from a passive surveillance system. Future studies should consider the reporting level, e.g., including the mild and subclinical cases that did not seek medical care. Third, the relatively low diagnostic levels in some counties also led to the underestimation of incidence. It is necessary to take into account the diagnostic to correct the incidence for future research.
In conclusion, Zhejiang Province is a province with high incidence of HFRS in China. However, a significant decreasing trend in the incidence rate was seen from 2007 to 2016, according to the joinpoint regression analysis. The distribution of the seasonal peak, gender, age and occupation was similar to previous studies. Obvious clusters were identified in the eastern and southwestern areas of Zhejiang Province. A spatial heterogeneity in the component driving the transmission of HFRS was identified from the multivariate time series model. This suggests that targeted preventive efforts should be made in different areas based on the main component contributing to the epidemics. Case surveillance, especially for mild cases, should strengthen ongoing research with the incorporation of risk factors.