The threshold effects of meteorological factors on Hand, foot, and mouth disease (HFMD) in China, 2011

We explored the threshold effects of meteorological factors on hand, foot and mouth disease (HFMD) in mainland China to improve the prevention and early warning. Using HFMD surveillance and meteorological data in 2011, we identified the threshold effects of predictors on the monthly incidence of HFMD and predicted the high risk months, with classification and regression tree models (CART). The results of the classification tree showed that there was an 82.35% chance for a high risk of HFMD when the temperature was greater than 24.03 °C and the relative humidity was less than 60.9% during non-autumn seasons. According to the heatmap of high risk prediction, the HFMD incidence in most provinces was beyond the normal level during May to August. The results of regression tree showed that when the temperature was greater than 24.85 °C and the relative humidity was between 80.59% and 82.55%, the relative risk (RR) of HFMD was 3.49 relative to monthly average incidence. This study provided quantitative evidence for the threshold effects of meteorological factors on HFMD in China. The conditions of a temperature greater than 24.85 °C and a relative humidity between 80.59% and 82.55% would lead to a higher risk of HFMD.

Association analysis. Table 2 shows the bivariate linear relationships between the HFMD incidence and atmospheric pressure, relative humidity and temperature. Atmospheric pressure (r = 0.259, P < 0.001), relative humidity (r = 0.308, P < 0.001) and temperature (r = 0.488, P < 0.001) were significantly associated with HFMD incidence. The Scatter plot matrices with spline regression line depict the relationship between all the variables by season in Fig. 2.
There was a significant positive spatial autocorrelation of HFMD incidence with a Moran's I statistic of 0.324 (P < 0.001).

Mean ± SD* Median (IQR*) Range
Incidence (1/100,000) 9.99 ± 12.88 6.16 (1.   Spatial empirical Bayes rates smoothing. Figure 3(a) depicts the geographic distribution of the notified HFMD incidences in China during 2011. The figure confirms that the risk for HFMD varied with geographical location. The spatial empirical Bayes smoothing analysis ( Fig. 3(b)) showed that the HFMD incidence changed a little (− 0.3/100,000 to 1.2/100,000). And the infection activity was primarily concentrated in the southeast. The Bayesian posterior estimates of the monthly HFMD incidence for each province would be used in the CART models. Figure 4 representing the first CART model indicated the high risk period of HFMD, defined as exceeding the third quartile of the incidence. The analysis indicates that there was an 82.35% chance for a high-risk of HFMD when the temperature was greater than 24.03 °C and the relative humidity was less than 60.9% during non-autumn seasons. Figure 5 shows a high risk map of predicted monthly HFMD incidence rates in China based on the CART 1 model, which took into account local variation (i.e. atmospheric pressure, relative humidity, temperature and season) from the model prediction. Thirty one provinces were sorted in descending latitude from up to down. The heatmap shows that the detected high risk timing was concentrate from May to August. The time of south provinces was earlier than the north. The validation analysis indicates that the classification accuracy rate was 86.02%. Figure 6 represented the second CART model for the smoothing incidence of HFMD in a month, by province. The results indicated that when the temperature was greater than 24.85 °C and the relative humidity was between 80.59% and 82.55%, the relative risk (RR) of HFMD was 3.49 relative to monthly average incidence (mean incidence: 9.989/100,000) during the epidemic period. The result of the leave-one-out cross-validation (LOOCV) indicated that the prediction error of the model was 10.80%.

CART 2 model.
These results of validation analysis including misclassification analysis and LOOCV reveal that both the two CART models had reasonable accuracy, and its utility in research needs to be further explored.

Discussion
The current study has quantified the threshold effects of meteorological factors on HFMD, using China as the study area in 2011. Because of the absence of HFMD-targeted vaccination or specific treatments, quantification of the adverse effects of environmental agents is essential for the early warning and response system on HFMD 7,12 .
Most parts of the world have reported the epidemic of HFMD. As the high prevalence area, China should be worthy of attention to the control work. Our results showed that HFMD in China had a significant seasonal

Figure 2. Scatter−plot matrices (with LOESS smooths) between HFMD incidences and explanatory variables.
periodicity, including a major peak in summer followed by a smaller peak in autumn. As suggested by this figure, the correlation between incidences and weather factors varied across the seasons. High temperature and high humidity were indicated as the risk factors, being in agreement with reports from other research 13 . And the season factor was used to handle the temporal correlation in our CART model.
The related researches on HFMD found that meteorological predictors were the potential risk factors 1,2,13 . What's more, we have also analyzed the threshold effects. We found that conditions of a temperature greater than 24.03 °C and a relative humidity under 60.9% (CART 1) or of a temperature greater than 24.85 °C and a relative humidity between 80.59% and 82.55% (CART 2) would lead to a higher risk. These two factors accounted for most importance for the increasing incidence of HFMD. Seasons and the atmospheric pressure could also affect the epidemic of HFMD. The threshold effect can be a continuation and development for the researches on HFMD. The meteorological is one of the major factors in the development of HFMD. They not only affect the individual immunity, but also affect the reproduction and transmission of the virus and media 14,15 .
This study stood out from previous studies by giving the exact thresholds of predictors to make the statistical results more straightforward to understand and reuse. With the two-stage CART models using the same predictors, two sets of thresholds were obtained for the different purposes. However, they were basically agreed with another. It meant that the thresholds we found were reliable, although the CART model allowed the high level interactions among predictors.
The acceptable results of validation analysis for the thresholds indicated another strength of our study. With the CART 1 model, high-risk months of each province in 2011 were predicted. The validation analysis showed that the classification accuracy rate was high (86.02%). The LOOCV result of the regression tree also indicated a high precision with an error of 10.80%. Different from other relevant studies, we provide the threshold of the predictors. We will conduct vigorous information and take emergency response plan before the meteorological factors reach the thresholds.
Limitations of this study should also be acknowledged. Firstly, we handled the temporal and spatial correlations in a simple way which was independent from the model. Much more difficult job should be done to improve algorithm of the model 16 . Secondly, the socio-economic factors should be taken into account to improve the prediction accuracy. Thirdly, underreporting of HFMD cases is inevitable because of the characteristics of being self-limiting, even though the data quality of the surveillance system in China was assured 13 . Finally, without information of changes of confounders including demographics and adaptation, results of this study should be extrapolated with caution. However, the methods used in our studies could serve as a good reference.

Conclusion
This study provided quantitative evidence for the threshold effects of meteorological factors on HFMD in China. The incidence of HFMD in China was associated with temperature and relative humidity. The overlap threshold between these two-stage models was that conditions of a temperature greater than 24.85 °C and a relative humidity between 80.59% and 82.55% would lead to a higher risk of HFMD. These information could be helpful in predicting the scale of outbreaks, guiding health resource allocation and building public health preparedness and intervention strategies. Similar studies with threshold effects are needed to better understand the impacts of meteorological variables on HFMD.

Materials and Methods
Ethics statement. This study was based on official hand, foot and mouth disease (HFMD) surveillance data in China. Analyses were conducted at aggregate level and no confidential information was involved. The research study protocol was approved by the Institutional Review Board of the School of Public Health, Sun Yatsen University. All methods were performed in accordance with the relevant ethical guidelines and regulations.

Study area.
China mainland consists of thirty one provinces which are available in many datasets, with populations per province ranging in size from 3.03 to 105.05 million. The meteorological characteristics vary regionally and seasonally among the country, with six zones including the tropical, subtropical, warm-temperate, mid-temperate, cold-temperate and the Tibet zone. In general, it performs complex landforms and climates in the study area. Data collection. We obtained the computerized data set on the reported HFMD cases by provinces in China for the period of 1 st January-31 st December 2011 from the National Center for Public Health Surveillance and Information Services, China Center for Disease Control and Prevention (China CDC) (http://cdc.cma.gov.cn/). Meteorological data were obtained for the same period from the China Meteorological Data Sharing Service System (http://cdc.nmic.cn/home.do) which were comprised of monthly mean atmospheric pressure (kPa), monthly mean relative humidity (%), monthly mean daily maximum temperature (°C).

Spatial autocorrelation analysis.
Moran's I spatial autocorrelation statistic was calculated to determine whether spatial clustering was a feature of HFMD. Moran's I is defined by: Spatial empirical Bayes rates smoothing. Bayesian conditional autoregressive (CAR) was used to obtain the more stable rates, improving the credibility of disease mapping 20 . Spatial empirical Bayes method defines a neighborhood for each study area (namely spatial weight matrix), and builds the risk estimation model by taking the scope of neighborhood, neighborhood population and mean and variance of neighborhood incidence into account. The Bayesian CAR could be implemented as a generalized linear mixed model: The study region S was partitioned into k non-overlapping areal units S = {S 1 , … , S k }, which were linked to a corresponding set of responses Y = {Y 1 , … , Y k } T , and a vector of known offsets O = {O 1 , …, O k } T . A set of random effects Φ = {Φ 1 , … , Φ k } were included to model any spatial autocorrelation in the data. The responses Y k come from an Poisson family of distributions f(y k |μ k , v 2 ). The expected value of Y k is denoted by E(Y k ) = μ k . The expected values of the responses are related to the linear predictor via an invertible link function g(·), and in our study natural log function was used and no predictor X k T in the smoothing procedure. For the global smoothing CAR priors, the model proposed by Leroux et al. (1999) was given by 21 : where W is a non-negative symmetric k × k neighborhood matrix. ρ is a spatial autocorrelation parameter, with ρ = 0 corresponding to independence, while ρ = 1 corresponds to strong spatial autocorrelation. A uniform prior on the unit intervalis specied for ρ while the usual inverse gamma prior is adopted for τ 2 . The default prior speciation for τ 2 has a = b = 0.001.
The spatial empirical Bayes rates smoothing analysis was conducted using the R package CARB ayes version 4.4 22 .
Spatiotemporal CART models. We considered a suite of two spatiotemporal CART models: (1) fitting a classification tree to incidences categorized as binary: high risk or non-high risk; (2) fitting a regression tree to the incidences. CART1: fitting a tree to high risk of HFMD. We defined a high risk if the incidence in any month exceeded the third quartile of the incidence per province in China during the 12 months of 2011. The spatiotemporal CART model was thus described as: monthly HFMD high risk/non-high risk ~ atmospheric pressure + relative humidity + temperature + season. The results are reported as the probability of occurrence of a high risk HMFD per month. A map of predicted probabilities was developed using the outputs of the CART model. CART 2: fitting a tree to the incidences. Like a classification tree, a regression tree is also built by binary recursive partitioning, while the response variable is continuous 23 . The splitting rules used in the algorithm are based on minimizing the sum of the squared deviations from the mean in the two separate subgroups. The squared residuals minimization algorithm is identical to a Gini splitting rule used in the classification 23 . Monthly HFMD incidence (1/100,000) by province were used as a continuous response variable in this model. The second CART model was thus described as: monthly HFMD incidence ~ atmospheric pressure + relative humidity + temperature + season. The results are reported as the RR of incidence, compared to average incidence rate of HFMD. RR = (expected incidence − mean of incidence)/mean of incidence.
Both the two CART analysis consisted of three basic steps. Firstly, a preliminary tree was grown by recursive data partitioning. Secondly, nested trees were formed by reducing the number of nodes in the tree (pruning) according to the complexity parameter. Thirdly, 10-fold cross-validation was used to address over-fitting and to identify the optimal tree with respect to its predictive ability. To control for the impact of seasonality, we decomposed the HFMD incidence into four seasonal categories 11 (coded as Spring: March -April -May; Summer: June -July -August; Autumn: September -October -November; Winter: December -January -February). Finally, we validated the model with the leave-one-out cross-validation and misclassification rate for the classification tree. For each CART model, the predicted probabilities or relative risks can be presented as a map. A map of predicting high risk was developed using the outputs of the CART 1 model.
The statistical analysis and visualization were conducted using the R package rpart version 4.1-10 and rpart. plot version 1.5.3.