Prediction of fall events during admission using eXtreme gradient boosting: a comparative validation study

As the performance of current fall risk assessment tools is limited, clinicians face significant challenges in identifying patients at risk of falling. This study proposes an automatic fall risk prediction model based on eXtreme gradient boosting (XGB), using a data-driven approach to the standardized medical records. This study analyzed a cohort of 639 participants (297 fall patients and 342 controls) from Chang Gung Memorial Hospital, Chiayi Branch, Taiwan. A derivation cohort of 507 participants (257 fall patients and 250 controls) was collected for constructing the prediction model using the XGB algorithm. A comparative validation of XGB and the Morse Fall Scale (MFS) was conducted with a prospective cohort of 132 participants (40 fall patients and 92 controls). The areas under the curves (AUCs) of the receiver operating characteristic (ROC) curves were used to compare the prediction models. This machine learning method provided a higher sensitivity than the standard method for fall risk stratification. In addition, the most important predictors found (Department of Neuro-Rehabilitation, Department of Surgery, cardiovascular medication use, admission from the Emergency Department, and bed rest) provided new information on in-hospital fall event prediction and the identification of patients with a high fall risk.

Patient falls are a major issue in health care institutions throughout the world. Despite considerable research in this field, inpatient falls continue to be a common adverse event in clinical practice 1 . In Taiwan, the Department of Health introduced the Taiwan Patient-Safety Reporting (TPR) system in 2003. According to the TPR annual report of 2017, there were 17,104 inpatient falls, accounting for 25.2% of all inpatient safety events, of which 51.9% resulted in injuries 2 . A review of the literature further showed that among the patients who fell during hospitalization, 28% had minor injuries, 11.4% had severe soft tissue injuries, 5% had bone fractures, and approximately 2% had head trauma, which could lead to long-term disability or premature mortality 3 .
To reduce the incidence of inpatient falls, the Joint Commission for Healthcare Organization Accreditation suggested integrating a standardized, validated tool into the medical record system to identify those with a high fall risk 4 . Several studies have been conducted to investigate fall risk factors and develop fall risk assessment tools to aid in the recognition of patients at greater risk of falling. However, the review studies concluded that there is no explicit superiority for any single assessment tool, and no tool correctly identifies fallers with high validity and reliability 5,6 . eXtreme gradient boosting (XGB), a machine learning system for tree boosting, is widely used by data scientists to achieve state-of-the-art results on many machine learning challenges. Using an XGB algorithm, this study proposed an automatic assessment tool to accurately detect high-risk groups. Other machine learning approaches have been attempted for identifying patients at risk of falling; however, a comparative validation with current fall risk assessment scales has never been involved 7 .
The Morse Fall Scale (MFS), which was developed in 1989, is the most widely used tool for the assessment of fall risk in the United States 8 . This study aims to propose a machine learning model for fall risk assessment in competition with the MFS. We further explore determinants for predicting inpatient fall events.

Results
Demographics and clinical characteristics. The study cohort included 639 adult patients (297 fall patients and 342 controls) who were admitted to our institution between February 2015 and December 2018. A derivation cohort of 507 participants (257 fall patients and 250 controls) was collected through June 2018 to develop the prediction model. Between July and December 2018, a validation cohort of 132 participants (40 fall patients and 92 controls) was prospectively collected. Despite the adoption of identical inclusion criteria, there were three items that showed significant differences in demographic features between the derivation and validation cohorts. The full demographic and clinical descriptions of both groups are presented in Table 1. Table 1. Clinical descriptors of the derivation and validation cohorts. The values are presented as the mean ± SD, or n (%). CNS, central nervous system; FRID, fall risk-increasing drugs. a The generalizable factors include intrinsic factors and predictors directly linked to the cause of a fall. b The P-values indicate a significant difference between the two cohorts. The P-value of means and proportions are obtained by t-test and chisquare test, respectively. www.nature.com/scientificreports/ According to the two fall risk assessment tools, the average scores for the fall group and the nonfall group are shown in Table 2. There were significant differences between the fall and nonfall groups in the fall risk assessment tools (P < 0.01, P < 0.001, and P < 0.001, respectively). performance of the prediction models. Table 2 presents the contingency table for the prediction models. More than two-thirds of nonfallers were correctly identified by both the MFS and XGB (specificity: 69.6% and 75.0%, respectively), and more fallers were correctly identified by the XGB algorithm (sensitivity: 50.0% and 65.0%, respectively). Figure 1 presents the ROC curves and areas under the curves (AUCs) to assess the overall validity of the tools. There was no significant difference between the AUCs of the MFS and XGB (AUCs: 0.598 and 0.700, respectively; P = 0.09). Table 3 presents the sensitivity, specificity, PPV, NPV, LR + , and LR − of the two fall risk assessment tools. For the MFS, a PPV of 4.8% (95% CI: 3.2 to 7.3%) and NPV of 97.8% (95% CI: 97.0 to 98.4%) were calculated at the optimal cut-off point of 45 points, which was the same as the definition used in the original study. For XGB, a PPV of 7.4% (95% CI: 5.0 to 10.9%) and NPV of 98.6% (95% CI: 97.8 to 99.1%) were calculated at the optimal cut-off value. The likelihood ratios confirmed that the results according to the fall risk classification tools differed by chance, and the results only improved the diagnostic accuracy restrictively.
Compromised results were found by XGB using only generalizable factors. The AUC of 0.660, sensitivity of 62.5%, specificity of 69.6%, PPV of 6.0% (95% CI: 4.1 to 8.6%) and NPV of 98.4 (95% CI: 97.5 to 98.9%) were calculated at the optimal cut-off value. The + LR and -LR values were similar to that of the XGB using the whole feature set.
feature importance according to XGB. Figure 2 shows the ranking of feature importance according to the XGB model. In this plot, the first column includes the item names of all the features that were actually used in the XGB algorithm, while each row lists the resulting importance scores calculated using the importance metric. The items were sorted bottom-up by the corresponding information gain values. The top five leading features, i.e., Department of Neuro-Rehabilitation, Department of Surgery, cardiovascular medication use, admission from the Emergency Department, and bed rest, are considered as the most important factors with a high impact on predicting inpatient fall events. www.nature.com/scientificreports/

Discussion
In the complex and complicated clinical scenario, both environmental factors and patient factors vary to a great degree. Currently, the most commonly used fall risk assessment tools, e.g., the Hendrich II Fall Risk Model, MFS, and the St. Thomas Risk Assessment Tool in Falling Elderly Inpatients (STRATIFY), incorporate only limited known risk factors that predispose a patient to fall risk [9][10][11] . The MFS consists of six evaluation items, and although patients are likely to fall due to a multitude of risk factors, such as postanesthetic weakness and unfamiliar environment, some of these factors are not assessed. In our study, the classification results showed that the MFS can identify approximately half (50.0%) of the patients who will suffer from a fall during their hospitalization. This result is similar to those of studies that have addressed the overoptimized results of the MFS 12 . Although a single assessment using a rule-based assessment tool is simple and inexpensive, assessing patients continuously and following interventions outweigh the benefits. In our study, the XGB model took many risk factors for falls into account, and it identified approximately two thirds (65.0%) of the patients who suffered from a fall during their hospitalization. Among such patients, falls might be avoided if the patients are identified and effective preventive measures are taken in time. However, this potential benefit is countered by a low PPV (7.4%), which leaves much to be desired for using this approach. Despite involving a multifactorial computerization, the machine models to some extent suffer from time-dependent variables and confounding variables which are unstable and non-measurable during an inpatient stay. Furthermore, fall events show a stochastic trend. Falsepositive classification may result from the fact that predicted high fall risk is not necessarily associated with an actual fall event.
For this study was conducted in a single local hospital, efforts were made to validate this modality when generalizing the sample population and findings. The analysis of the feature subsets revealed that the effect of factors such as the department to which the patient has been admitted is probably not equally transferable to other hospitals. In general, the risks for falls are described as both intrinsic and extrinsic 13 . To better assess the Table 3. Performance analysis for the prediction model (n = 132). AUC, area under the curve; CI, confidence interval; LR + , positive likelihood ratio; LR − , negative likelihood ratio; NPV, negative predictive value; PPV, positive predictive value; XGB-GF, the XGB model using generalizable factors including intrinsic factors and predictors directly linked to the cause of a fall. www.nature.com/scientificreports/ generalizability of the approach, the XGB algorithm was run independently using only intrinsic factors and predictors that are directly linked to the cause of a fall. Comparing the results obtained from models over different feature sets, the XGB model using the full range of risk factors provided the best evidence for fall prediction. Although the performance of the XGB model was decreased when feature dimensionality was restricted to generalizable factors, it still substantially improved the identification of the fall-prone group. Along with risk group identification, we have to recognize the most important predictors from the data sets and take precautionary measures to reduce the possibility of patient falls during hospitalization. An analysis of the feature importance of the items revealed that a number of factors associated with inpatient fall events are also found in the literature, as well as being a part of experiential clinical knowledge. First, extrinsic factors, including the Department of Neuro-Rehabilitation and the Department of Surgery, were associated with fall prediction in the XGB model. However, this result may be influenced by selection bias, as people are often admitted to certain units because they are physically handicapped and unable to live independently or are currently receiving postoperative care. There is a strong implication that fall events happen with greater frequency in certain areas where precautions should be focused. Second, the initial data mining from the medical records included a review of fall risk-increasing drugs (FRIDs), e.g., cardiovascular drugs, antidiabetic drugs, and CNS drugs, and the results showed that cardiovascular drugs were associated with a relatively higher risk for fall events. Our findings on cardiovascular drugs and fall risk are in conformity with the recent meta-analysis by de Vries et al. 14 . Third, several intrinsic factors, including old age, unstable vital signs, musculoskeletal deficits and cognitive impairment, are conventionally regarded as risk factors and were confirmed by this study 15 . Despite our inherent assumptions, utilizing machine learning techniques for fall risk group identification can help prevent at least a certain percentage of falls as we correctly predict these events and take precautions based on the evidence.
There were several limitations to this study. First, classification models based on machine learning tend to be unstable in small datasets. Therefore, both models in this study were externally validated using a prospective cohort. Second, missing data for some subitems in the medical records confined the integrity of data mining; thus, only the fundamental items were selected for developing the model for fall risk classification. Third, PPV and NPV were influenced by the incidence of an event in the study population. The prevalence of inpatient falls being estimated as 3% is a rough estimate according to the data collected from the registration system of the patient safety committee, and is therefore arbitrary to some extent. Finally, this study was conducted in a single local hospital. A large-scale, prospective study is required to further explore the validity and reliability of these scales.

conclusions
This study proposes that the XGB classification model, which is more sensitive than the MFS, is more appropriate for assessing the fall risk in hospitalized patients. Furthermore, we identified several intrinsic and extrinsic risk factors that enhanced the ability to determine the underlying information on fall risk among the population. When relevant information is documented in regular medical records, XGB may better provide important insights for fall risk assessment compared to conventional rule-based criteria. The validity and reliability of prediction models based on machine learning must be carefully studied in further prospective, large cohort studies before they are used in clinical practice.

Study design and participants. This study analyzed a cohort of patients hospitalized in Chiayi Chang
Gung Memorial Hospital between February 2015 and December 2018 and was approved by the Institutional Review Board of Chang Gung Medical Foundation. Patients who experienced fall events were collected from the registration system of the patient safety committee. Patients who were under 20 years old and those who had incomplete data were excluded. For each fall case, up to three controls were randomly selected from the pool of patients who were admitted on the same day and had matched age and sex as the fall patient. Data were collected on the number of patients who fell rather than on the number of falls. All patients were retrospectively assessed for fall risk upon their admission according to the MFS. Basic patient information, medical records and demographic data were obtained. Figure 3 shows the flowchart of the study.

Morse fall scale (MfS).
The MFS consists of six evaluation items, including a history of falling (0 or 15 points), secondary disease (0 or 15 points), ambulatory aid (0, 15, or 30 points), intravenous therapy/heparin lock (0 or 20 points), gait (0, 10, or 20 points), and mental status (0 or 15 points). The total score ranges from 0 to 125 points. A score below 25 is defined as the low-risk group, a score between 25 and 45 is defined as the intermediate-risk group, and a score above 45 is marked as the high-risk group 9 .

Development of XGB.
Chen et al. demonstrated the robust power of XGB system to control over-fitting in a variety of data mining challenges 16 . Furthermore, the classification tree structure of XGB is comprehensible and allows for the extraction of explicit classification determinants, which can be useful in risk stratification and subgrouping of the population. As a preliminary step, we tried to fit the training dataset to several types of models, including XGB, decision trees, random forest and linear discriminant analysis. The results also showed that XGB obtained the best performance in terms of sensitivity, specificity and AUC. Therefore, we have adopted XGB in this work.
Using data scraping techniques, 35 items were automatically extracted from standardized medical records, including the admission note and admission nursing record, and used for the data mining algorithms. The intrinsic factors and predictors which directly linked to the cause of a fall were defined as the generalizable factors. For both the whole feature set and a subset of generalizable factors, prediction models were trained www.nature.com/scientificreports/ independently using the training set and then benchmarked using the validation set. The items included in the extracted dataset are shown in Table 1.
The XGB, a scalable, supervised machine learning algorithm, is used to induce a classification model, as implemented in Python 3.4.3 with the XGBoost library.
To investigate the determinants for predicting inpatient fall events, feature importance was evaluated by using the information gain-based feature ranking algorithm. Information gain is a metric that quantifies the improvement in performance measure of a tree-based algorithm from each attribute that is split based on a given feature 17 . The information gain implies the relative contribution of the corresponding features to the model. A feature with a higher value in information gain among the whole feature set implies its significance for generating the prediction. Due to the inherent attribute selection process of the XGB algorithm, only a subset of the items actually appeared in the prediction model. A low-dimensional XGB model using a subset of features with high information gain was used to attempt to identify patients at a higher fall risk.
Statistical analysis. Statistical analyses were performed using MedCalc 18.9.1 (MedCalc Software, Ostend, Belgium). Observed distributions were tested against the hypothesized normal distribution (Kolmogorov-Smirnov test). Data were reported as the mean ± standard deviation or number (%) unless otherwise indicated. The ROC curve is a plot of true positive rate against false positive rate evaluated at consecutive cutoff points of predicted probability. The area under the curve (AUC) measures the discriminatory ability of a model, where a value of 1.0 indicates perfect discriminatory power and a value of 0.5 indicates no discriminatory ability. To determine and compare the discrimination power of the MFS and XGB, the sensitivity and specificity were analyzed based on the area under the receiver operating characteristic (AUC-ROC) curve analyses. The optimal cut-off values for the ROC curves were determined using a maximized Youden's index 18 . ROC curves were compared using the method described by DeLong et al. 19 . The classification results for each model was summarized by a 2 × 2 contingency table, and the performance was assessed by calculating sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR +), and negative likelihood ratio (LR-) 20 . The PPV and NPV were calculated with an estimated incidence of 3% according to the data collected from the registration system of the patient safety committee. In all analyses, P < 0.05 indicates statistical significance. ethics approval and consent to participate. The study was approved by the Institutional Review Board (IRB) of Chang Gung Medical Foundation, in accordance with the ethical standards of the responsible committee on human experimentation (IRB Nos. 201900460B0). Informed consent was obtained from all study participants in the manuscript. consent for publication. Not required.

Data availability
All data generated or analyzed during this study are included in this published article.  www.nature.com/scientificreports/