Prediction of hospitalization using artificial intelligence for urgent patients in the emergency department

Timely assessment to accurately prioritize patients is crucial for emergency department (ED) management. Urgent (i.e., level-3, on a 5-level emergency severity index system) patients have become a challenge since under-triage and over-triage often occur. This study aimed to develop a computational model with artificial intelligence (AI) methodologies to accurately predict urgent patient outcomes using data that are readily available in most ED triage systems. We retrospectively collected data from the ED of a tertiary teaching hospital between January 1, 2015 and December 31, 2019. Eleven variables were used for data analysis and prediction model building, including 1 response, 2 demographic, and 8 clinical variables. A model to predict hospital admission was developed using neural networks and machine learning methodologies. A total of 282,971 samples of urgent (level-3) visits were included in the analysis. Our model achieved a validation area under the curve (AUC) of 0.8004 (95% CI 0.7963–0.8045). The optimal cutoff value identified by Youden's index for determining hospital admission was 0.5517. Using this cutoff value, the sensitivity was 0.6721 (95% CI 0.6624–0.6818), and the specificity was 0.7814 (95% CI 0.7777–0.7851), with a positive predictive value of 0.3660 (95% CI 0.3586–0.3733) and a negative predictive value of 0.9270 (95% CI 0.9244–0.9295). Subgroup analysis revealed that this model performed better in the nontraumatic adult subgroup and achieved a validation AUC of 0.8166 (95% CI 0.8199–0.8212). Our AI model accurately assessed the need for hospitalization for urgent patients, who constituted nearly 70% of ED visits. This model demonstrates the potential for streamlining ED operations using a very limited number of variables that are readily available in most ED triage systems. Subgroup analysis is an important topic for future investigation.


Response variable. The primary response variable was the patient's disposition as determined by ED physicians, encoded as a binary variable with 'admission' coded as 1 and 'discharge' coded as 0.
Demographics. Two demographic variables, age and sex, were used in this study. They were collected by the triage nurse either from the patients or from the Taiwan Health Care Database System. The age variable is numeric with one decimal place, and the sex variable is binary, with 'male' coded as 1 and 'female' coded as 0.
Vital signs for triage evaluation. Six vital signs, including temperature, heart rate, respiratory rate, systolic blood pressure, diastolic blood pressure, and mean arterial pressure (MAP), were measured and recorded by the triage nurse. Oxygen saturation was not included because a significant number of patients did not have their oxygen saturation level measured at triage; the amount of missing data made this vital sign unusable for model building. All six selected variables are numeric. Temperature is recorded to one decimal place; the other variables take integer values.
Medical history. Electronic medical records of the patients were retrieved by the triage nurse, using the International Classification of Diseases (ICD) code, when patients arrived at the triage station. Diseases were classified following the ICD-10 codes, and an integer score was assigned to each classification. The value of this variable is the sum of the scores of all classifications the patient belongs to, or zero if the patient had no medical history available. The value of this variable ranges from 0 to 12.
Chief complaints. The Taiwan triage acuity scale is defined by the Ministry of Health and Welfare (MOHW) of Taiwan. It includes a code system corresponding to the patient's chief complaint and its severity, which is similar to the ICD code. There are four major categories in this code system: trauma, nontraumatic adult, pediatrics, and environmental emergency. The triage nurse chooses the code that most appropriately matches the patient's chief complaint and its severity and then decides the patient's emergency severity index.
where $v(cf)$ is the decimal value for the given chief complaint code $cf$; $n_t(cf)$ is the number of patients in the training data set whose chief complaint code is $cf$; $n_{h,r}(cf)$ is the number of patients in the training data set whose chief complaint code is $cf$ and who were hospitalized and eventually recovered; and $n_{h,d}(cf)$ is the number of patients in the training data set whose chief complaint code is $cf$ and who were hospitalized and eventually deceased.
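The definitions above describe a count-based encoding of the chief complaint code. The sketch below is one possible reading of it, not the authors' implementation. It assumes that the first term is the fraction of training patients with a given code who were hospitalized, (n_h,r + n_h,d) / n_t, and that the second term is the fraction of those hospitalized patients who died, n_h,d / (n_h,r + n_h,d). The function names, the record layout, the example codes, and the fallback value of 0.0 for codes with no training patients or no hospitalizations are all hypothetical.

```python
from collections import Counter


def chief_complaint_values(records, fallback=0.0):
    """Map each chief complaint code to a risk value v(cf).

    records: iterable of (code, hospitalized, died) tuples from the training set.
    Assumed reading: v = hospitalized / total + died_in_hospital / hospitalized.
    """
    n_t = Counter()   # all training patients per code
    n_hr = Counter()  # hospitalized and eventually recovered
    n_hd = Counter()  # hospitalized and eventually deceased
    for code, hospitalized, died in records:
        n_t[code] += 1
        if hospitalized:
            if died:
                n_hd[code] += 1
            else:
                n_hr[code] += 1

    values = {}
    for code, total in n_t.items():
        n_h = n_hr[code] + n_hd[code]
        admission_rate = n_h / total
        # Assumption: with no hospitalizations the second term falls back to 0.0.
        death_rate = n_hd[code] / n_h if n_h else fallback
        values[code] = admission_rate + death_rate
    return values


def lookup(values, code, fallback=0.0):
    # Codes never seen in training (n_t = 0) get the assumed fallback value.
    return values.get(code, fallback)


# Hypothetical toy records: (code, hospitalized, died)
records = [
    ("A01", True, False),
    ("A01", True, True),
    ("A01", False, False),
    ("A01", False, False),
    ("B02", False, False),
]
values = chief_complaint_values(records)
```

Because the counts come only from the training set, the same mapping can be applied to validation data without leaking validation outcomes into the encoding.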
Mathematically speaking, the value $v(cf)$, being the sum of two terms, represents the risk of hospitalization for patients with chief complaint code $cf$. The first term is the percentage of patient hospitalizations with chief complaint code $cf$, while the second term is the percentage of deaths among patients with chief complaint code $cf$ who were hospitalized. The second term is introduced to differentiate chief complaints that result in the same number of admitted patients. Intuitively, a chief complaint that results in more deceased patients should be assigned a higher risk value. Note that when $n_t(cf)$ is equal to zero, which means that there is no patient in the training data set whose chief complaint code is $cf$ 11 .
A three-layer structure is assumed with output dimensions of 100, 12, and 1. Between layers, the batch normalization technique was adopted to facilitate the training process. The (final) output layer utilizes the sigmoid function as the activation function. As such, a mathematical representation of the model can be expressed as

$$y = \frac{1}{1 + e^{-f(x)}}$$

where $x$ represents the input vector of 10 variables, and $y$ is the model output, which has a value between 0 and 1. The model output represents the likelihood of the patient being admitted. The nonlinear function $f(x)$, primarily determined by the first two layers, contains 2549 trainable model parameters.
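The reported parameter count can be checked with simple arithmetic. The sketch below assumes fully connected layers with bias terms and a batch normalization step, with one trainable scale and one trainable shift per unit, after each of the first two layers. That placement is an assumption, since the text only states that batch normalization is used between layers. Under it, the count matches the 2549 trainable parameters reported.

```python
def dense_params(n_in, n_out):
    # One weight per input-output pair plus one bias per output unit.
    return n_in * n_out + n_out


def batch_norm_params(n_units):
    # Trainable scale (gamma) and shift (beta) for each unit.
    return 2 * n_units


def count_trainable(n_inputs, layer_sizes):
    total = 0
    prev = n_inputs
    for i, size in enumerate(layer_sizes):
        total += dense_params(prev, size)
        if i < len(layer_sizes) - 1:
            # Assumed: batch normalization sits between layers, not after the last one.
            total += batch_norm_params(size)
        prev = size
    return total


total = count_trainable(10, [100, 12, 1])
print(total)  # 2549 = 1100 + 200 + 1212 + 24 + 13
```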
The data were randomly divided into training (80%) and validation (20%) sets. Statistical similarity of the training and validation sets regarding demographics, medical history, and chief complaints was confirmed by statistical analysis. Specifically, the percentages of admitted and discharged patients in each category of the training and validation data sets were calculated and compared. The model parameters were trained on the training data set, and the predictive power of the model was evaluated on the validation data set using the area under the curve (AUC) in the receiver operating characteristic (ROC) analysis. The optimal cutoff point on the ROC curve was calculated based on Youden's index 12, which in turn was used to calculate sensitivity, specificity, positive predictive value, and negative predictive value for the prediction model applied to the validation data.
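As an illustration of the cutoff selection described above, the following minimal sketch picks the threshold that maximizes Youden's index J = sensitivity + specificity - 1 and then reports the four metrics at that threshold. The function names and the toy scores and labels are hypothetical, and a score at or above the cutoff is assumed to mean predicted admission. It is not the authors' code.

```python
def confusion(scores, labels, cutoff):
    # A score at or above the cutoff counts as a predicted admission (label 1).
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)
    fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)
    fn = sum(1 for s, y in pairs if s < cutoff and y == 1)
    tn = sum(1 for s, y in pairs if s < cutoff and y == 0)
    return tp, fp, fn, tn


def youden_cutoff(scores, labels):
    # Requires at least one positive and one negative label.
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp, fp, fn, tn = confusion(scores, labels, cutoff)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def metrics(scores, labels, cutoff):
    tp, fp, fn, tn = confusion(scores, labels, cutoff)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


# Hypothetical model outputs and true dispositions (1 = admission).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j = youden_cutoff(scores, labels)
m = metrics(scores, labels, cutoff)
```

Sensitivity and specificity do not depend on how common admission is, whereas the positive and negative predictive values do, which is why a model can show a modest positive predictive value alongside a high negative predictive value when admissions are a minority of visits.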
Testing the benefit of additional training samples. To answer a key question of whether one can improve the performance of the model if additional training samples are used for tuning the model parameters, we trained the model on randomly selected fractions of the training set. Specifically, we trained the model using 100%, 75%, 50%, 25%, and 12.5% of the training samples, calculated the corresponding AUCs on the held-out validation set and quantified the incremental gain in performance.
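A minimal sketch of how such training fractions might be drawn is given below. The source does not say whether the smaller subsets were nested inside the larger ones; nesting (taking prefixes of a single shuffled order) is an assumption here, as are the function name, the seed, and the toy sample count.

```python
import random


def training_subsets(n_samples, fractions, seed=0):
    """Return {fraction: list of training indices} drawn from one shuffled order."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    # Taking prefixes of the same shuffled order makes smaller subsets nested
    # inside larger ones, so each step only adds samples.
    return {f: order[: int(round(f * n_samples))] for f in fractions}


fractions = [1.0, 0.75, 0.5, 0.25, 0.125]
subsets = training_subsets(1000, fractions)  # 1000 is a toy size
sizes = {f: len(idx) for f, idx in subsets.items()}
```

Each subset would then be used to train a fresh model, with every model scored on the same held-out validation set so that the AUCs are comparable.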

Variables of importance.
To determine which variables are crucial and more "useful" for predicting hospital admission, lower-dimensional models were trained using various subsets of variables. The performances of these models, indicated by AUCs on the validation set, were compared to determine whether these models could predict hospital admission as robustly as the full model. Variables to drop were selected based on the statistical analysis shown in Tables 1 and 2.

Results
Characteristics of study samples. There were 441,782 ED visits between January 2015 and December 2019. Approximately 70% of visits were classified as level-3 by the triage nurse. After the exclusion of visits with any missing information at triage stations and visits with other dispositions, such as transfer and against-advice discharge, a total of 282,971 level-3 visits were included in the analysis.
The model was also applied to patients in the following four subgroups to examine its predictive power in each group: nontraumatic adult, pediatrics, trauma, and environmental emergency. For nontraumatic adult patients, the model achieved a validation AUC of 0.8166 (95% CI 0.8199–0.8212), which was higher than its performance on all patients. For pediatric and trauma patients, however, the model performed worse: the validation AUCs were 0.6637 (95% CI 0.6492–0.6782) and 0.7762 (95% CI 0.7623–0.7901), respectively. For patients with environmental emergencies, the model performed markedly better, with a validation AUC of 0.9274 (95% CI 0.8801–0.9747). All validation ROC curves are shown in Fig. 1.
Testing the benefit of additional training samples. The model trained on 75% of the training set achieved a validation AUC of 0.7999 (95% CI 0.7958-0.8040). The 95% confidence interval contains the AUC of the model trained on the entire training set. The algorithm with the proposed model structure appears to reach maximum performance at 75% of the training set or less. All AUC values with corresponding 95% confidence intervals are provided in Table 3.

Variables of importance.
Excluding four vital signs (temperature, respiratory rate, systolic blood pressure, and diastolic blood pressure), a lower-dimensional model was built using the remaining six variables (age, sex, heart rate, MAP, medical history, and chief complaint). The model achieved a validation AUC of 0.7963 (95% CI 0.7921-0.8005); the 95% CI contains the AUC of the full-dimensional model. Notably, if one further excludes the chief complaint variable, the predictive power of the resulting model diminishes significantly. The validation AUC of the model built on only five variables (age, sex, heart rate, MAP, and medical history) dropped to 0.7501 (95% CI 0.7454-0.7548). To further confirm this observation, a model was built based on nine variables (all but the chief complaint variable). The validation AUC of this model also dropped to 0.7517 (95% CI 0.7470-0.7564).
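The comparison used above, whether the full model's AUC falls inside a reduced model's 95% confidence interval, can be written out as a short check. The numbers are those reported in the text; the dictionary keys and the helper name are illustrative. This is an informal containment check as described in the text, not a formal statistical test for comparing AUCs.

```python
FULL_MODEL_AUC = 0.8004  # validation AUC of the full ten-variable model

# (AUC, CI lower bound, CI upper bound) as reported for each reduced model
reduced_models = {
    "six variables": (0.7963, 0.7921, 0.8005),
    "five variables, no chief complaint": (0.7501, 0.7454, 0.7548),
    "nine variables, no chief complaint": (0.7517, 0.7470, 0.7564),
}


def ci_contains(lower, upper, value):
    return lower <= value <= upper


comparable = {
    name: ci_contains(lower, upper, FULL_MODEL_AUC)
    for name, (auc, lower, upper) in reduced_models.items()
}

for name, ok in comparable.items():
    status = "contains" if ok else "does not contain"
    print(f"{name}: 95% CI {status} the full-model AUC")
```

Only the six-variable model passes the check, and both models that omit the chief complaint variable fail it, which matches the conclusion drawn in the text.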

Discussion
After emergency physicians examine patients for the first time, they often make predictions regarding the patients' outcomes and diagnoses using so-called diagnostic intuition 13. Physicians often work using two types of mindsets: the intuition mindset and the analytical mindset 14. The intuition mindset depends largely on the physicians' own experiences. It responds quickly and works similarly to pattern recognition. The analytical mindset is usually based on existing knowledge and available data. These processes are complicated and cognitively resource-demanding 15. The intuition mindset plays an important role in medical decision making 16. It is often associated with the patient's prognosis rather than diagnosis 17. The work based on this mindset benefits from the physician's clinical reasoning and the accuracy of the diagnoses 18. The setting of our study simulates the scenario in which the physician sees the patient for the first time. The computational model predicts the patient's outcome with very limited information; the work of this model is comparable to the physician's diagnostic intuition. Existing clinical decision-support systems are often criticized for their non-user-friendly methods of information collection 19. Our model avoids these drawbacks by adopting variables that can all be extracted from the electronic medical record system automatically. This is a significant advantage for the utilization of our model.
The primary purpose of this model was to predict whether an urgent (level-3) patient requires hospitalization right after initial triage. The model was designed to serve as a secondary triage tool that assesses the probability of hospitalization, which essentially gives an indication of the severity level of the patient. The assessment that our model provides would help stratify patient risk and streamline ED operations 10. Patients predicted to require hospitalization would be sent to the therapeutic area for timely examination and further treatment, while the others could be fast-tracked for rapid evaluation and discharge. This in turn improves both the quality of medical care and patient safety 20. By reducing unnecessary examinations and the length of ED stay, the model could also improve patient satisfaction [21][22][23]. With the assessment provided by the model, ED physicians would be more confident in their decisions regarding patient disposition. For patients who need hospital admission, the process can be initiated earlier according to the prediction given by the model to reduce ED boarding 24,25. On the other hand, unnecessary examination and observation in the ED could be avoided for those who do not need hospital admission. Furthermore, based on the prediction of the need for hospitalization, the ED can direct patients to primary care services, which subsequently reduces ED crowding 26,27. The efficient allocation of medical resources in the ED can improve cost management and quality control 28.
www.nature.com/scientificreports/
Because only a few variables are adopted in our model and they are readily available, the model can be utilized in the prehospital setting to improve the efficiency of the emergency medical services (EMS) system. With the aging of the population, the load on EMS has increased in many countries 29. Misuse of ambulances by low-acuity patients unnecessarily occupies emergency medical resources and thus endangers patients who truly need emergent medical aid 30. Our model can be used by the EMS system to divert ambulance requests from low-acuity patients to other alternatives 31. The potential benefits include better preservation of EMS resources and possible improvement in the outcomes of patients who receive medical aid sooner [32][33][34]. For patients predicted not to require hospitalization, emergency medical technicians could apply a "treat and release" protocol by giving primary medical aid in the field without transporting the patients to hospitals 35,36.
Another potential application of our model is "self-triage". Certain computer algorithms have been proposed for patients to perform self-triage before ED visits, in the hope of better patient streamlining [37][38][39][40]. However, most of these algorithms either require many variables to perform prediction or fail to demonstrate sufficient predictive power [37][38][39][40]. Our model offers a reliable prediction of hospital admission using a very limited number of variables that could be obtained by the patients themselves. Those predicted to have a low probability of hospitalization could first be directed to primary care services.
People generally believe that having many decision variables is necessary for successfully building a predictive model based on machine learning algorithms. On the other hand, having many decision variables often makes it difficult to explain the possible causality and correlation between the decision variables and the designated model output 41. In this study, we did not blindly use many variables for the model building process. Instead, we chose specific variables that may have explainable causality with the model output according to the physicians' experiences. Moreover, important information, such as chief complaints and medical history, was distilled into a single variable representing the risk of hospitalization. This is in contrast to the approach taken in other studies 42. The results of our study demonstrate that this approach works very well. Models with good predictive power can be built using as few as six decision variables. Compared to previous methods 42,43, very few computational resources are required for building and using our model, and the predicted outcomes are easily explained in medical terms and conform to medical intuition.
In our study, we found that the predictive power of our model differed among the four subgroups (nontraumatic adult, pediatrics, trauma, and environmental emergency) of ED patients. The model performed worse in the trauma subgroup and substantially worse in the pediatric nontraumatic subgroup. Technically speaking, the predictive power of the model originates from recognizing certain crucial characteristics of the patients.
Our results indicate that patients in one subgroup appear to have crucial characteristics that differ significantly from those of patients in another subgroup. For example, among nontraumatic adult patients, the likelihood of hospitalization increases with age, while among pediatric patients, this correlation is usually the opposite. In contrast, among trauma patients, 'age' may not be an important factor for predicting hospitalization at all. As such, it seems difficult to use a single model to predict the outcome for all types of patients in the ED because of the high heterogeneity of patient characteristics across subgroups. Further studies analyzing these patient subgroups are required to build separate models with high predictive power for different patient subgroups.

Limitations
There are some limitations in our study. First, the model was built on data from a single medical center. For broader utilization, data from multiple medical centers and rural hospitals might be needed to improve the accuracy of the model. Furthermore, the oxygen saturation level was not included as a variable in our model. During the study period, the triage nurses in the study hospital were not required to routinely obtain oxygen saturation levels. As a result, the triage staff tended to skip measuring the oxygen saturation level when they were under a heavy workload. The workflow protocol was later corrected to require this item, but the problem of missing data in our study period remains. Moreover, our model, which aims to promptly identify patients with a high probability of hospitalization, was designed to use only the data that could be collected at initial triage. Further studies are planned to evaluate the trade-off between timely prediction and the degree of accuracy when more variables, such as results of initial laboratory tests and medical images, are introduced to the model. Finally, we did not perform a comparison analysis with other computer or human (physicians or triage nurses) prediction models. To our knowledge, despite certain studies on triage accuracy [37][38][39], no model for prediction of hospitalization has been reported. This is an interesting subject for future studies, which would facilitate the integration of the AI model into the work of the triage crew.