Artificial intelligence to predict in-hospital mortality using novel anatomical injury score

The aim of this study was to develop an artificial intelligence (AI) algorithm based on a deep learning model to predict mortality using the abbreviated injury scale (AIS). The performance of the conventional anatomic injury severity score (ISS) system in predicting in-hospital mortality is still limited. AIS data of 42,933 patients registered in the Korean trauma data bank from four Korean regional trauma centers were enrolled. After excluding patients who were younger than 19 years old and those who died within six hours of arrival, we included 37,762 patients, of whom 36,493 (96.6%) survived and 1269 (3.4%) died. To enhance the AI model performance, we reduced the AIS codes to 46 input values by organizing them according to the organ location (Region-46). The total AIS and six categories of the anatomic region in the ISS system (Region-6) were used to compare the input features. The AI models were compared with the conventional ISS and new ISS (NISS) systems. We evaluated the performance pertaining to the 12 combinations of the features and models. The highest balanced accuracy (85.05%) corresponded to Region-46 with DNN, followed by that of Region-6 with DNN (83.62%), AIS with DNN (81.27%), ISS-16 (80.50%), NISS-16 (79.18%), NISS-25 (77.09%), and ISS-25 (70.82%). The highest AUROC (0.9084) corresponded to Region-46 with DNN, followed by that of Region-6 with DNN (0.9013), AIS with DNN (0.8819), ISS (0.8709), and NISS (0.8681). The proposed deep learning scheme with feature combination exhibited higher accuracy metrics, such as the balanced accuracy and AUROC, than the conventional ISS and NISS systems. We expect that our trial will serve as a cornerstone for more complex combination models.

Traumatic injuries have long been scaled to assess the severity of patients' conditions and especially to predict mortality. To quantify the severity of such injuries, various metrics using anatomic and physiologic elements have been established based on conventional statistical tools such as logistic regression (LR) [1][2][3][4][5][6][7]. Since the 1970s, injury severities have been classified through an anatomically based coding system named the abbreviated injury scale (AIS). The AIS classifies individual injuries by body region and assigns each a severity from one to six. The system has been constantly revised and improved, and the latest revision (AIS 2005-Update 2008 and AIS 2015), including 1999 injury descriptors, was recently announced 1.
The most popular model for assessing trauma severity is the injury severity score (ISS), which is the sum of the squares of the highest AIS scores in the three most severely injured body regions 2. Similarly, the new ISS (NISS), which was designed to replace the ISS, is the sum of the squares of the three highest AIS scores regardless of the body region [3][4][5]. Using the ISS and NISS, mortality has been predicted based on a certain score; however, the associated mortality prediction accuracy is low for patients with multiple injuries in different areas 6,7.
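The difference between the two scores can be illustrated with a minimal sketch. It assumes that the ISS uses the highest AIS severity in each of the three most severely injured body regions, whereas the NISS uses the three highest severities regardless of region. The function names, the input format, and the example patient are hypothetical, and the conventional rule that assigns a score of 75 to any AIS 6 injury is left out for brevity.

```python
from collections import defaultdict


def iss(injuries):
    """Injury severity score from a list of (body_region, ais_severity) pairs.

    Only the worst injury per body region counts, and only the three
    most severely injured regions contribute.
    """
    worst_per_region = defaultdict(int)
    for region, severity in injuries:
        worst_per_region[region] = max(worst_per_region[region], severity)
    top_three = sorted(worst_per_region.values(), reverse=True)[:3]
    return sum(s ** 2 for s in top_three)


def niss(injuries):
    """New ISS: the three highest severities, regardless of body region."""
    top_three = sorted((s for _, s in injuries), reverse=True)[:3]
    return sum(s ** 2 for s in top_three)


# Hypothetical patient with two thoracic injuries.
patient = [("thorax", 4), ("thorax", 3), ("head", 3), ("extremity", 2)]
print(iss(patient))   # 4^2 + 3^2 + 2^2 (second thoracic injury ignored)
print(niss(patient))  # 4^2 + 3^2 + 3^2 (both thoracic injuries counted)
```

For this patient the second thoracic injury is ignored by the ISS but counted by the NISS, which is why the two scores differ.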
As a more complex combination model including the physiological status, the trauma and ISS (TRISS) has been widely used 8. In addition, other combination models, such as A Severity Characterization of Trauma

New trauma severity features. The original AIS codes have 1999 injury codes, which are excessively complex to be used as input features in AI models and may lead to model overfitting. Consequently, we established new anatomical categories named Region-46 and Region-6 by reducing the AIS codes into 46 categories and six categories, respectively, based on the organ locations.
To generate Region-46, we reduced the 1999 AIS codes to 46 features through organization according to the organ location, clinical significance, and frequency. The complete list of Region-46 is presented in Supplementary Table 1. In the head region, we categorized the codes by the cerebellum, cerebellum epidural, intracerebellar, cerebellum subdural, cerebrum, cerebrum epidural, intracerebral, cerebrum subdural, skull, and head. The relatively less important and less frequent injuries were categorized as head or skull. In the face region, no frequently occurring code represented a severe injury, and thus, all the corresponding codes were categorized as the face. In the neck region, we categorized the codes by the neck vessel, neck esophagus, neck trachea, and cervical spine. The other corresponding codes were categorized as the neck. In the thorax region, we categorized the codes by the thorax vessel, bronchus, diaphragm, thorax esophagus, heart, lung, pericardium, thorax trachea, rib, and thoracic spine. The other codes were categorized as the thorax. In the abdomen region, we categorized the codes by the abdomen vessel, bladder, colon, duodenum, small bowel, kidney, liver, mesentery, omentum, pancreas, rectum, spleen, stomach, and lumbar spine. The other codes were categorized as the abdomen. In the upper extremity region, all the codes were categorized as upper extremity because the scores of all the codes were low overall. In the lower extremity region, the pelvis was the most clinically significant organ, and the other injuries were less significant. Thus, we categorized the codes as the pelvis and lower extremity. In the external region, the AIS scoring codes were low, and thus, all the codes were categorized as external. In Supplementary Table 2, we also presented the example of how AIS codes were converted to Region-46 codes for a novice coder with trauma knowledge to understand the process.
When using the Region-46 categories as input features, each input feature was the sum of the squares of the AIS scores for the same feature. For instance, if two AIS scores (valued 2 and 3) corresponded to the neck among the Region-46, the AI model input for the neck feature was $2^2 + 3^2 = 13$. In this manner, each patient data point pertained to 46 feature values used for the AI model input layer to predict mortality.
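The feature construction can be sketched as follows. The category names are taken from the description of Region-46 in the text, but their order, the function name, and the input format are assumptions. The mapping from raw AIS codes to categories (Supplementary Table 2) is not reproduced here, so the sketch assumes that each injury has already been assigned to its category.

```python
# Region-46 category names as listed in the text; their order here is an assumption.
REGION_46 = [
    # head region (10)
    "cerebellum", "cerebellum epidural", "intracerebellar", "cerebellum subdural",
    "cerebrum", "cerebrum epidural", "intracerebral", "cerebrum subdural",
    "skull", "head",
    # face (1)
    "face",
    # neck region (5)
    "neck vessel", "neck esophagus", "neck trachea", "cervical spine", "neck",
    # thorax region (11)
    "thorax vessel", "bronchus", "diaphragm", "thorax esophagus", "heart", "lung",
    "pericardium", "thorax trachea", "rib", "thoracic spine", "thorax",
    # abdomen region (15)
    "abdomen vessel", "bladder", "colon", "duodenum", "small bowel", "kidney",
    "liver", "mesentery", "omentum", "pancreas", "rectum", "spleen", "stomach",
    "lumbar spine", "abdomen",
    # extremities and external (4)
    "upper extremity", "pelvis", "lower extremity", "external",
]


def region46_features(injuries):
    """Build the 46-value input vector for one patient.

    `injuries` is a list of (category, ais_severity) pairs that have already
    been mapped from AIS codes to Region-46 categories (hypothetical format).
    Each feature is the sum of the squared severities in that category.
    """
    position = {name: i for i, name in enumerate(REGION_46)}
    features = [0] * len(REGION_46)
    for category, severity in injuries:
        features[position[category]] += severity ** 2
    return features


# The neck example from the text: severities 2 and 3 give 2^2 + 3^2 = 13.
vector = region46_features([("neck", 2), ("neck", 3)])
print(vector[REGION_46.index("neck")])
```

Categories without any injury keep a value of zero, so every patient yields a vector of the same length.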
Region-6 was considerably simpler than Region-46, as it scored the body regions identical to the six regions of the ISS (head/neck, face, thorax, abdomen, extremity, and external). The ISS takes the highest AIS severity code in each of the three most severely injured body regions. Each of the three AIS scores is squared, and the squares are summed (e.g., $\mathrm{ISS} = 5^2 + 4^2 + 2^2 = 45$).
Furthermore, up to five hidden layers were examined, with the number of nodes in each layer ranging up to that of the previous layer. For the fully connected layers (FCs), the dropout rate was varied from 0 to 0.5 in 0.1 increments. The last FC layer was fed to a sigmoid layer, providing the probabilities for the patient mortality. We implemented and trained the models using TensorFlow and Keras. Moreover, model training was realized using the Adam optimizer and a binary cross-entropy cost function with a learning rate of 0.0001 and batch size of 64 on an NVIDIA GeForce GTX 1080 Ti GPU.
For the hyperparameter search and cross-validation, a grid search with a tenfold cross-validation repeated 10 times was performed. Among the 10 folds, a single fold was retained as the validation data, and the remaining nine folds were used as the training data. Specifically, the training data (n = 33,986) were randomly shuffled for each trauma center and partitioned into 10 equal folds in a stratified manner: each fold included 3284 survivors and 114 deceased patients. The process was repeated 10 times, with each of the 10 folds used once as the validation data. Because the number of survivors far exceeded the number of deceased patients, we randomly downsampled the survivor data. By matching the amount of data in the two groups, the model bias toward survivor data could be minimized.
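The partitioning and balancing steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the study's implementation: the function names are hypothetical, the per-center shuffling is not modeled, and downsampling is assumed to be applied only to the training folds.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def downsample_majority(indices, labels, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for cls in sorted(by_class):
        balanced.extend(rng.sample(by_class[cls], n_min))
    return balanced


# Toy data with the 29:1 imbalance reported in the text (0 = survived, 1 = deceased).
labels = [0] * 290 + [1] * 10
folds = stratified_folds(labels, k=10, seed=1)
for v, validation in enumerate(folds):
    training = [i for j, fold in enumerate(folds) if j != v for i in fold]
    balanced_training = downsample_majority(training, labels, seed=v)
    # A model would be trained on balanced_training and validated on validation.
```

Repeating the loop with 10 different seeds for `stratified_folds` would give the 10 repetitions of the tenfold procedure.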
Considering the cross-validation analysis results, the five-layer DNN was selected. The architecture of DNN (Region-46) is illustrated in Fig. 1b. The proposed five-layer DNN included an input layer, three FC layers, and an output layer. The input layer containing 46 features was fed to three FC layers in series, consisting of 64, 32, and 1 node(s) with a leaky rectified linear unit. To alleviate the overfitting issue, dropout and L2 regularization were applied in the last two FC layers. For the second and third FC layers, the dropout rates were 0.5 and 0.3, respectively, and regularization parameters were 0.001. The last FC layer was fed to a sigmoid layer.
By repeating this stratified tenfold cross-validation process 10 times, 100 models from the five-layer DNN were derived. Based on these models, we ensembled the results by averaging the final outputs associated with the mortality probabilities.
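The ensembling step amounts to averaging, for each patient, the mortality probabilities produced by the individual models. A minimal sketch, with a hypothetical function name and made-up probabilities from three models instead of 100:

```python
def ensemble_probabilities(model_outputs):
    """Average per-patient mortality probabilities over all models.

    `model_outputs` holds one list per model; each list contains one
    probability per patient, in the same patient order.
    """
    n_models = len(model_outputs)
    return [sum(per_patient) / n_models for per_patient in zip(*model_outputs)]


# Made-up outputs of three models for two patients.
outputs = [
    [0.10, 0.80],
    [0.20, 0.70],
    [0.30, 0.90],
]
print(ensemble_probabilities(outputs))
```

Averaging the probabilities rather than the binary decisions keeps the ensemble output usable for threshold-free measures such as the AUROC.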
Performance evaluation of AI models. The prediction performance of the DNN models was evaluated using the isolated testing dataset (n = 3776) and by applying the sensitivity, specificity, accuracy, and balanced accuracy metrics, defined as

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Balanced accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2},$$

where TP, TN, FP, and FN represent the true positive, true negative, false positive, and false negative, respectively. The balanced accuracy was used to account for the considerable imbalance between the survived and deceased groups (29:1).
Next, we compared the performance of the proposed DNN model and other external AI models such as LR, random forest (RF), and support vector machine (SVM). The models were separately trained to perform

Results
For all 37,762 patients, we counted the number of patients associated with each of the Region-46 features and summarized the counted numbers in Supplementary Table 3. The number of patients associated with lower extremity (n = 12,280) was the highest, followed by upper extremity (n = 11,548), face (n = 8310), head (n = 7200), rib (n = 6147) and thorax (n = 4763). Note that each patient may belong to multiple features due to multiple injuries.
Training and cross-validation of the DNN models. The results of the cross-validation accuracy with the metrics of sensitivity, specificity, accuracy and balanced accuracy are presented in Supplementary

Comparison with other prediction models. The prediction performances of the external AI models (LR, RF, and SVM models) and traditional prediction models (ISS-16, ISS-25, NISS-16, and NISS-25) are presented in Table 3. Details of the performance comparison according to the four trauma centers are summarized in Supplementary Table 5.
The highest balanced accuracy pertained to DNN (Region-46) (85.05%), and that of the other AI models ranged from 79.40 to 83.62%. Among the input features based on the DNN, the highest AUROC values pertained to Region-46 (0.9084), followed by those of Region-6 (0.9013), and AIS (0.8819); the corresponding values of ISS and NISS were 0.8709 and 0.8681, respectively (Fig. 2a). Among the different AI models based on Region-46, the highest AUROC corresponded to DNN (Region-46), followed by that of the LR (0.9013), RF (0.8853) and SVM (0.8829) (Fig. 2b). Notably, the traditional prediction models based on the ISS and NISS exhibited a lower prediction performance (balanced accuracy ranging from 70.82 to 80.50%) than AI models based on LR, RF, SVM, and DNN (balanced accuracy ranging from 79.40 to 85.05%).
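The sensitivity, specificity, accuracy, and balanced accuracy reported above can be computed from confusion-matrix counts as in the following sketch. The function name and the counts are made up for illustration and are not results of this study; deceased patients are treated as the positive class.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the four evaluation metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)           # deceased patients correctly identified
    specificity = tn / (tn + fp)           # survivors correctly identified
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Made-up counts: a classifier that predicts "survived" for every patient.
always_survive = classification_metrics(tp=0, tn=3000, fp=0, fn=100)
# Made-up counts: a classifier that detects most of the deceased patients.
informative = classification_metrics(tp=80, tn=2700, fp=300, fn=20)
print(always_survive["accuracy"], always_survive["balanced_accuracy"])
print(informative["accuracy"], informative["balanced_accuracy"])
```

With these counts the trivial classifier has the higher plain accuracy but a balanced accuracy of only 0.5, which illustrates why the balanced accuracy is the more informative measure for imbalanced groups.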
The AIS-based ISS has been widely used to evaluate the severity of trauma patients since its introduction in the 1970s 1,2,18. Most subsequent systems, such as the NISS 19, exponential ISS (EISS) 20, and anatomic profile score (APS) 21, could not supplant the ISS 4. Nevertheless, the ISS system based on the AIS involves several limitations 22,23. First, a major severe injury may be underestimated because only the single highest score is selected in the same region. Second, the severity may differ between different combinations of injuries that have the same ISS. Third, the ISS and mortality do not have a linear association, and a negative relationship may also occur in certain ranges. Finally, the AIS applies the same weighting to different organs even though different organs have different prognoses.
These limitations of the ISS can be alleviated by using AI models 24. The DNN model can assign a different weight to each input feature. Thus, we fragmented the conventional ISS system and used the AI methodology to overcome the problem of nonlinearity and weights for each AIS score. In this study, a key challenge was to decide the predictors and features, as the input variables significantly affect the results of machine learning 20.
A key advantage of DNN (Region-46) is that it was trained with a large training cohort (n = 37,762) with highly comprehensive data involving a reasonable number of AIS score data points. The Ministry of Health and Welfare established up to 17 regional trauma centers in Korea (by 2021). All the regional trauma centers were mandated to register severe trauma patients in the KTDB. Numerous variables, including the demographics, pre-hospital and in-hospital information, time factors, clinical characteristics, vital signs, trauma scores, and final outcome information, were recorded. Therefore, the quantity and quality of the KTDB data were satisfactory 25.
In general, the DNN involves high computational complexity, although it operates more effectively and flexibly than the RF, LR, and SVM, owing to the learning of the nonlinear problem. Nevertheless, owing to the simplified input features, DNN (Region-46) required only 59.52 s for training and 2.19 ms for instance testing, indicating its suitability for practical mortality prediction. Moreover, DNN (Region-6) provided accurate prediction results, requiring only 50.39 s for training and 2.11 ms for instance testing. Thus, if Region-46 is not promptly available, DNN (Region-6) can be used as an alternative method.
The original AIS codes have 1999 injury codes, which are overly complex to be used as input features in AI models and may lead to model overfitting in the current training dataset involving 37,762 patients. The number of features considerably influences the performance of deep learning. The presence of excessively few or many features does not ensure high performance. In general, many data points are required for numerous features. Consequently, we established a new anatomical category, Region-46, by fragmenting the individual AIS codes and aggregating them into 46 anatomic regions. Our proposed Region-46 (46 input variables) could provide more accurate prediction results using more subdivided features than ISS (six input variables). In addition, it minimizes the overfitting issues by using reasonably fewer features than AIS (1999 input variables). The better performance of the Region-46-based AI model was possible because the 1999 AIS codes form a sparse matrix, which contains very few non-zero values. Another advantage of Region-46 is that we derived it from the conventional AIS coding system. This indicates that the Region-46-based AI model can be trained accurately within the existing coding system, without significant changes to the system such as the coding method. Nevertheless, the advantages of Region-46 hold only under the assumption that a massive dataset is not available. With massive data, AI models could be trained more accurately using more features from the 1999 AIS codes. We postulate that a considerably larger dataset is required to utilize the 1999 AIS codes as input features without overfitting. Thus, future research will proceed in the direction of increasing the number of features by collecting more data, and the ultimate goal is to use all 1999 AIS codes.
In addition to the data sample size, this study involves several limitations to be addressed. First, this study is retrospective, although the database was collected prospectively. We performed a retrospective study to investigate the potential of AI models based on our proposed new anatomical injury score. We believe that new prospective studies will be able to minimize the biases and establish the clinical impact of the prediction model. In addition, a prospective study should be able to update AI models in real time in more regional trauma centers. For future work, we plan to develop a framework that can use real-time data in connection with all the regional trauma centers in Korea, which will improve the generalization of the model. Second, we excluded the patients who died within six hours of arrival because we assumed that these patients were likely not comprehensively evaluated owing to insufficient time. This aspect may contribute to selection bias. Thus, our future work will include patients irrespective of the time of death, which may provide a more robust AI model to predict the severity of trauma patients. Third, the input data did not include physiologic and demographic parameters such as blood pressure, heart rate, mental status, respiration, age, sex, and comorbidities, which likely leads to an inherently limited accuracy. However, we noted the inherent potential of an analysis based solely on AIS codes, which could serve as a basis for future complex models. In particular, this study highlights the substantial scope for improvement in terms of the injury scale. Compared to ISS, more complex combination models such as TRISS 8, ASCOT 9, HARM 10, TRAM 11, and KTS 12 use physiologic and demographic parameters to improve the performance. Although these complex combination models have not provided sufficient accuracy to replace the ISS, they have the potential to provide personalized prediction information of trauma severity.
Thus, in our future work, we will extend our model to include the physiologic and demographic parameters towards personalized severity prediction. Finally, in this study, we considered mortality as the indicator of the severity of trauma patients because conventional scoring systems focus on mortality. However, regarding survivors, we should also be able to predict other outcomes such as morbidity, prevention effectiveness, or burden of cost. By developing new indicators that predict these outcomes, we could provide guidelines for the entire trauma system, including health care providers, agencies, and institutions.
In conclusion, we developed a five-layer DNN model with an input layer involving 46 new trauma severity features (Region-46), which exhibited a higher prediction performance than the conventional ISS and NISS systems. To the best of our knowledge, this study represents the first attempt to employ AI prediction models using anatomic injury severity information. Furthermore, this study emphasizes the need for reinvention of a new injury severity scoring system. We expect that this research can provide reference to incorporate AI techniques in trauma care and improve our understanding of the determinants of injury severity. We believe our endeavor will contribute to reducing the societal burden of injury.

Data availability
Data are available from the corresponding author upon reasonable request. The data are not publicly available due to restrictions from KTDB policy.

Code availability
Code is available at https://github.com/HeewonChung92/AIS. The code was written in Python 3.6 and TensorFlow 2.0. The files named Code_Train.py and Code_Predict.py are for training and testing the model, respectively. The file named Dataset_mini.mat is for the automatic conversion program from AIS codes to Region-46 codes.