Machine Learning Algorithms for Risk Prediction of Severe Hand-Foot-Mouth Disease in Children

The identification of indicators for severe HFMD is critical for early prevention and control of the disease. With this goal in mind, 185 severe and 345 mild HFMD cases were assessed. Patient demographics, clinical features, MRI findings, and laboratory test results were collected. Gradient boosting tree (GBT) was then used to determine the relative importance (RI) and interaction effects of the variables. Results indicated that elevated white blood cell (WBC) count > 15 × 10⁹/L (RI: 49.47, p < 0.001) was the top predictor of severe HFMD, followed by spinal cord involvement (RI: 26.62, p < 0.001), spinal nerve roots involvement (RI: 10.34, p < 0.001), hyperglycemia (RI: 3.40, p < 0.001), and brain or spinal meninges involvement (RI: 2.45, p = 0.003). Interactions between elevated WBC count and hyperglycemia (H statistic: 0.231, 95% CI: 0–0.262, p = 0.031), between spinal cord involvement and duration of fever ≥3 days (H statistic: 0.291, 95% CI: 0.035–0.326, p = 0.035), and between brainstem involvement and body temperature (H statistic: 0.313, 95% CI: 0–0.273, p = 0.017) were observed. GBT is therefore capable of identifying the predictors of severe HFMD and their interaction effects, outperforming conventional regression methods.

Patient recruitment and treatment. Guangdong General Hospital is a 2852-bed tertiary teaching hospital in Guangdong province, providing services to about 5.6 million patients per year. Most patients reside in rural and regional centres or townships. The hospital is one of the designated referral centres for HFMD in the region, and all identified HFMD patients are required to be hospitalized and monitored in the paediatric unit. A retrospective review of HFMD patients who were admitted to the paediatric unit from January 2009 to December 2014 was performed. The Chinese guideline for HFMD diagnosis and treatment (Chinese Ministry of Public Health, revised in 2010) was used as the reference for clinical diagnosis of HFMD, and only patients who were newly diagnosed with HFMD and had not yet been treated were included in this study. Patients were categorized into two groups based on the severity of the infection: (1) mild HFMD without severe complications; (2) severe HFMD with severe complications.
Before treatment, brain and spinal MRI was routinely performed for all HFMD patients at the early stage of infection (1-4 days after the onset of symptoms), except for relatively milder cases, who were exempted from MRI. Assays for the detection of EV71 infection and determination of WBC count were also performed. Patients with mild HFMD were treated with paracetamol and sufficient water, whereas patients with severe HFMD received glucocorticoids, oxygen, antiviral drugs, and/or intravenous immunoglobulin.
Data collection. Demographic characteristics, clinical symptoms and signs, EV-A71 test results, WBC count, chest radiograph, and MRI reports of the patients were collected.
Statistical analysis. In this study, gradient boosting tree (GBT) was chosen to assess the interaction effects for the following reasons: (1) gradient boosting is a powerful machine learning technique that combines decision trees with boosting and can handle complex interaction effects that conventional approaches cannot; (2) GBT can handle different types of predictor variables and can accommodate missing data by boosting, using only the complete predictors; (3) elimination of outliers and prior data transformation are not required; (4) the non-linear GBT formulation provides robust estimates of non-linear interactions. Therefore, GBT is suitable for handling the selected interaction effects in this study [19].
For a single variable, the relative importance (RI) was calculated from the number of times the variable was selected for splitting, weighted by the squared improvement to the model resulting from each split, and averaged over all trees. The relative importance of all variables was normalized and scaled to have a maximum value of 100. For interactions, Friedman's H-statistic was used to assess the relative strength of interaction effects in non-linear models on a scale of 0 to 1, with higher values indicating stronger interaction effects.
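The relative importance calculation described above can be illustrated with a minimal sketch. The split records, the variable names, and the function `relative_importance` are hypothetical and are not taken from the study's software; the scaling follows the description in the text (largest value set to 100).

```python
from collections import defaultdict


def relative_importance(trees):
    """Compute relative importance from per-tree split records.

    `trees` is a list of trees; each tree is a list of
    (variable, squared_improvement) pairs, one pair per split.
    This data layout is an assumption made for illustration.
    """
    totals = defaultdict(float)
    for splits in trees:
        for variable, improvement in splits:
            # Each split counts once, weighted by its squared improvement.
            totals[variable] += improvement
    # Average over all trees in the ensemble.
    averaged = {v: s / len(trees) for v, s in totals.items()}
    # Scale so that the most important variable has the value 100.
    top = max(averaged.values())
    return {v: 100.0 * s / top for v, s in averaged.items()}


# Hypothetical toy ensemble with two trees (not study data).
toy_trees = [
    [("wbc_elevated", 4.0), ("spinal_cord", 2.0)],
    [("wbc_elevated", 4.0), ("hyperglycemia", 1.0)],
]
ri = relative_importance(toy_trees)
```

In this toy example the variable used in both trees with the largest improvements receives the value 100, and the others are expressed relative to it.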
The prediction performance of GBT was evaluated as the average result over all 10 testing data sets. However, to identify interactions among factors and to conduct statistical inference on those interactions, a single configuration (the number of trees) had to be chosen for the permutation test. In our study, we chose this setting based on 10-fold cross-validation. For the prediction performance of GBT, we compared the results obtained using a down-sampling strategy to balance the training-test data with those obtained without any balancing approach.
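As a rough illustration of the down-sampling and 10-fold procedure, the following sketch balances the two classes and builds the folds. The helper names `downsample` and `k_fold` are hypothetical, and the sketch assumes that down-sampling is applied to the whole data set before splitting, which the text does not state explicitly.

```python
import random


def downsample(labels, rng):
    """Return shuffled indices with the majority class randomly
    reduced to the size of the minority class."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return kept


def k_fold(indices, k=10):
    """Yield (train, test) index lists; every index appears in
    exactly one test fold."""
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Class sizes taken from the study: 185 severe (1) and 345 mild (0) cases.
rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
labels = [1] * 185 + [0] * 345
balanced = downsample(labels, rng)
splits = list(k_fold(balanced, k=10))
```

A model would then be fitted on each `train` list and evaluated on the matching `test` list, with the reported performance taken as the average over the 10 folds.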
A permutation test is a type of statistical significance test, in which the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic under rearrangements of the values on the observed data points. The ranking of the real test statistic among the shuffled test statistics gives the p-value. The standard p-value threshold of 0.05 was then used to select attributes and interactions.
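A minimal sketch of such a test is given below. It assumes a Monte Carlo approximation (random shuffles rather than all possible rearrangements) and uses a simple risk difference as the test statistic; the study applied the test to relative importance and H statistics, which are not reproduced here. The function names and the toy data are hypothetical.

```python
import random


def permutation_p_value(x, y, statistic, n_permutations=2000, seed=0):
    """Approximate permutation p-value: the share of shuffled
    statistics at least as large as the observed one."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between x and y
        if statistic(x, shuffled) >= observed:
            hits += 1
    # Adding 1 to both counts includes the observed arrangement itself.
    return (hits + 1) / (n_permutations + 1)


def risk_difference(x, y):
    """Absolute difference in outcome rate between x == 1 and x == 0."""
    exposed = [yi for xi, yi in zip(x, y) if xi == 1]
    unexposed = [yi for xi, yi in zip(x, y) if xi == 0]
    return abs(sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed))


# Hypothetical toy data: 16/20 severe among exposed, 4/20 among unexposed.
x = [1] * 20 + [0] * 20
y = [1] * 16 + [0] * 4 + [1] * 4 + [0] * 16
p = permutation_p_value(x, y, risk_difference)
```

With an association this strong in the toy data, the resulting p-value falls well below the 0.05 threshold used for selection.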

Results
Patient characteristics. A total of 1172 patients with HFMD were assessed. Of these, laboratory results were not available for 83 patients, pre-treatment MRI was not performed in 347 relatively milder patients, and 212 patients had received treatment in other hospitals. A total of 530 patients were therefore included in this study. The comparison of the demographic and clinical data between the 185 severe and 345 mild HFMD patients is shown in Table 1. Complications detected in the severe HFMD patients were as follows: aseptic meningitis (n = 13), encephalitis (n = 99), acute flaccid paralysis (n = 59), and cardiorespiratory complications (n = 103); no deaths occurred (n = 0).

MRI findings.
The comparison of MRI findings between mild and severe HFMD patients is shown in Table 2. Positive imaging findings were more likely to be detected in severe HFMD patients than in mild HFMD patients (p < 0.01) (Fig. 1). All p-values were calculated using the permutation test described in the statistical analysis section (Supplementary Figure 1; Figure 2). After balancing the training-test data using a down-sampling procedure, the prediction accuracy of GBT decreased slightly from 92.3% to 89.2%. The AUC of the model was 0.948, with a sensitivity of 0.80 and a specificity of 0.93 (Supplementary Figure 3).
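For reference, the performance measures reported above (accuracy, sensitivity, specificity, and AUC) can be computed as in the following sketch. The labels and scores are hypothetical toy values, not study data, and the 0.5 classification threshold is an assumption.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = severe)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of severe cases detected
        "specificity": tn / (tn + fp),  # share of mild cases correctly excluded
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for five severe and five mild cases.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.5, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
metrics = confusion_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

Unlike accuracy, sensitivity, and specificity, the AUC does not depend on the chosen threshold, which is one reason it is reported alongside them.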

Interaction effects between predictors.
Interaction between two exposures was defined as the effect of one exposure on the outcome depending on the presence or absence of the other exposure. The stronger this dependence, the stronger the interaction between the two exposures. To interpret an interaction, one exposure was held constant while the other was varied, and the risk of severe HFMD was assessed. Three pairs of interactions were statistically significant in the present study. Figure 2 illustrates the interactions between elevated WBC count and hyperglycemia (H statistic: 0.231, 95% CI: 0-0.262, p = 0.031), between spinal cord involvement and duration of fever (H statistic: 0.291, 95% CI: 0.035-0.326, p = 0.035), and between brainstem involvement and body temperature (H statistic: 0.313, 95% CI: 0-0.273, p = 0.017). The H statistic for the interaction between age and body temperature was 0.244 (95% CI: 0.05-0.424, p = 0.251), and that for the interaction between age and gender was 0.168 (95% CI: 0-0.275, p = 0.092). A significant increase in the risk of severe HFMD was observed when the probability of elevated WBC count and hyperglycemia increased from 0.5 to 1 (Fig. 2A). Conditional on the change in spinal cord involvement, the effect of duration of fever ≥3 days was minor (Fig. 2B). A significant increase in the risk of severe HFMD was observed in male patients aged 0 to 50 months (Fig. 2C). When the body temperature was held constant at 38 °C, there was a slight increase in the risk of severe HFMD as the probability of brainstem involvement increased from 0 (no brainstem involvement) to 1 (brainstem involvement); when the body temperature was between 40 °C and 41 °C, however, a significant increase in risk was observed if the status of brainstem involvement changed (Fig. 2D). A significant increase in risk was also observed when the body temperature was between 40 °C and 41 °C and the age was between 0 and 50 months (Fig. 2E). All p-values were calculated using the permutation test (Supplementary Figure 2).
Figure 2. Interaction effects between predictors of severe HFMD. (A) Elevated WBC count and hyperglycemia. (B) Spinal cord involvement and duration of fever ≥3 days. (C) Age and gender. (D) Brainstem involvement and body temperature. (E) Age and body temperature.
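The H statistic used in this section can be illustrated with a small sketch based on partial dependence functions, following Friedman's general definition. The function `friedman_h`, the toy models, and the toy data are hypothetical; this is not the implementation used in the study.

```python
def friedman_h(predict, rows, j, k):
    """Friedman's H statistic for the interaction between features j and k.

    `predict` maps one feature list to a prediction, and `rows` is the data
    over which partial dependence is averaged. Partial dependence functions
    are evaluated at the observed data points and centred to mean zero.
    """
    def partial_dependence(features, values):
        total = 0.0
        for row in rows:
            modified = list(row)
            for f, v in zip(features, values):
                modified[f] = v  # fix the chosen features, keep the rest
            total += predict(modified)
        return total / len(rows)

    def centred(values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]

    pd_j = centred([partial_dependence([j], [r[j]]) for r in rows])
    pd_k = centred([partial_dependence([k], [r[k]]) for r in rows])
    pd_jk = centred([partial_dependence([j, k], [r[j], r[k]]) for r in rows])

    # Share of the joint effect not explained by the two single effects.
    numerator = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    denominator = sum(a ** 2 for a in pd_jk)
    return (numerator / denominator) ** 0.5 if denominator > 0 else 0.0


# Toy data: all combinations of two binary predictors.
rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
h_additive = friedman_h(lambda r: r[0] + 2 * r[1], rows, 0, 1)
h_product = friedman_h(lambda r: r[0] * r[1], rows, 0, 1)
```

For a purely additive model the statistic is 0, whereas a model in which the effect of one predictor depends on the other yields a value between 0 and 1, consistent with the scale described in the statistical analysis section.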

Discussion
In this study, GBT, a powerful machine-learning algorithm, was used to analyse the predictors of severe HFMD in children. The results showed that elevated WBC count (≥15 × 10⁹/L) was the top indicator associated with the risk of severe HFMD, followed by spinal cord involvement, spinal nerve roots involvement, hyperglycemia, and brain or spinal meninges involvement. The GBT model achieved an AUC of 0.985, indicating that GBT is a good predictive model. In addition, this is the first study to describe interactions between elevated WBC count and hyperglycemia, between spinal cord involvement and duration of fever, and between brainstem involvement and body temperature. In China, HFMD has been classified as a category C notifiable infectious disease since May 2, 2008. Although the majority of HFMD episodes are mild and self-limiting, the infection may rapidly develop into severe HFMD with serious, possibly life-threatening complications [20,21]. Therefore, it is necessary to identify predictors of severe HFMD [18].
Previous studies have shown that EV71 is more likely than other enteroviruses to cause serious complications, usually meningoencephalitis, pulmonary haemorrhage, and circulatory failure [22,23]. EV71 has been found to have strong neurotropism and to affect axonal transport in neurons, resulting in brain infection and flaccid paralysis [24]. Etiological examination of EV71 infection requires special equipment and is time-consuming, which complicates the diagnosis [10]. In this study, a higher incidence of EV-A71 infection was found in mild HFMD than in severe HFMD, and the relative importance of EV-A71 infection was low (RI = 2.24). This may be due to the following reasons: (1) several outbreaks of HFMD associated with EV71 in Guangzhou may have increased the detection of EV-A71 infection in mild HFMD; (2) the exclusion of patients with mild HFMD who were exempted from MRI could have contributed to sampling bias. Therefore, having other early predictors of severe HFMD may provide more accurate risk prediction.
Although previous studies have identified possible predictors associated with an increased probability of severity, such as attending child care centres [25], leukocytosis [17], limb weakness [26], and persistent fever [27], there is still much uncertainty about the relative importance of these predictors. Few studies have systematically evaluated the criteria for early screening of severe cases, and some have categorized severe and mild HFMD based only on the length of hospitalization rather than on a clinical measure of severity [25].
Compared with previous studies, this study provides a more comprehensive analysis of mild and severe HFMD. It is also novel in demonstrating clinical manifestations and MRI findings as potential indicators of severe HFMD. Furthermore, the interaction effects of the indicators were verified using the gradient boosting tree approach.
In this study, mild and severe HFMD were defined based on clinical diagnosis, and a gradient boosting tree model was established to determine the relative importance of predictors associated with an increased probability of severity. Elevated WBC count was identified as the top predictor of severe HFMD. HFMD patients with elevated WBC count had a much higher risk of central nervous system infection [28]. Elevated WBC count was also found to interact with hyperglycemia, another clinical risk factor, which had a relatively small importance of 3.40. Hyperglycemia may result from the loss of blood glucose homeostasis, in which the autonomic nervous system plays an essential part. Upon stimulation of the sympathetic nervous system, adrenaline and glucagon concentrations increase, while the insulin concentration decreases [20]. Previous studies have shown that young age and male gender were associated with severe HFMD, but there was no evidence of this in the current study [10,29].
Hitherto, reports have described only the MRI characteristics of central nervous system complications resulting from EV71 infection [30-32]. We identified three MRI-related predictors: spinal cord involvement, spinal nerve roots involvement, and brain or spinal meninges involvement. Of these, spinal cord involvement was the most important indicator of severe HFMD, and the risk was higher when it was accompanied by a longer duration of fever. An immunohistochemistry study showed that EV71 antigen could first be detected in the small intestine at 6 hours, in the spinal cord at 24 hours, and in the brainstem at 78 hours [33]. It has also been suggested that the major pathway for enterovirus entry into the central nervous system is via the peripheral nervous system, after which the enterovirus spreads rostrally along established neural pathways [34]. Based on the relative importance of these predictors, our study may suggest that the EV71 infection pathway runs from the spinal cord to the spinal nerve roots and then to the brainstem. Although the brainstem is the most commonly infected area, the relative importance of brainstem involvement was very small (0.27), which is inconsistent with the findings of other studies [30,35]. In fact, in our study, most HFMD patients with only brainstem involvement recovered completely, and the lesions on MRI disappeared within two years of follow-up.
Limitations of this study should also be acknowledged. Firstly, the study was retrospective in nature. Secondly, predictors such as plasma cytokine level, the effect of attending home care, living in a rural area, and education status were not evaluated because these data were not available. Furthermore, the hospital serves as a referral centre, which may contribute to sampling bias. In addition, selection bias arising from the strict inclusion criteria, under which only children who underwent MRI were included in the study, could also affect the prediction. Lastly, conventional MRI may not be sensitive enough to detect lesions at the early stage of HFMD; diffusion-weighted imaging might be more sensitive in detecting EV71 encephalitis at that stage.
In conclusion, the GBT approach was used successfully to identify the predictors of severe HFMD and the interaction effects of multiple predictors. Based on the evidence provided in this study, clinicians should take precautions when children diagnosed with HFMD present with elevated WBC count, spinal cord involvement, spinal nerve roots involvement, hyperglycemia, and/or brain or spinal meninges involvement, especially when the first three predictors are detected. It should be noted that the risk of severe disease increases significantly as the number of interacting predictors increases.
Although MRI was not required for children with mild HFMD, it might be useful to rule out false-negative cases of severe HFMD.
Severe HFMD can be controlled by immediate, effective treatment, as the early pathological changes are reversible. Early detection and meticulous management of HFMD patients are therefore required. In addition, MRI should be used as a routine tool to evaluate the severity and predict the prognosis of severe HFMD, as most of the predictors are MRI-related. To reduce the incidence and mortality of severe HFMD, doctors and health care providers need to be aware of the risk predictors for severe HFMD. Enhanced tools for identifying mild HFMD at an early stage will help to prevent progression from mild to severe HFMD.