Machine learning enhances the performance of short- and long-term mortality prediction models in non-ST-segment elevation myocardial infarction

Machine learning (ML) has been suggested to improve the performance of prediction models. Nevertheless, research on risk prediction in patients with acute myocardial infarction (AMI) has been limited, and studies comparing ML models with traditional models (TMs) have reported inconsistent results. This study developed ML-based models (logistic regression with regularization, random forest, support vector machine, and extreme gradient boosting) and compared their performance in predicting the short- and long-term mortality of patients with AMI with that of TMs using comparable predictors. The endpoints were in-hospital mortality among 14,183 participants and three- and 12-month mortality among patients who survived to discharge. For patients with ST-segment elevation myocardial infarction (STEMI), the ML models performed comparably to the TMs. In contrast, for non-STEMI (NSTEMI), the areas under the receiver operating characteristic curve (AUCs) of the ML models for in-hospital, three-month, and 12-month mortality were 0.889, 0.849, and 0.860, respectively, exceeding the corresponding AUCs of the TMs (0.873, 0.795, and 0.808). Overall, the performance of the predictive models, particularly for long-term mortality in NSTEMI, could be improved by applying ML algorithms rather than by adding more clinical predictors.


Results
Patient enrollment and characteristics. Patients diagnosed with AMI were classified as having STEMI or NSTEMI. Of the 5557 patients with STEMI, 273 (4.9%) died during the hospital stay (Supplementary Table 1). After excluding patients with missing information on the variables collected during hospital admission, the final dataset for three- and 12-month mortality contained 4911 patients who survived to discharge. Among these survivors, 68 and 120 patients died within 3 and 12 months after discharge, giving mortality rates of 1.4% and 2.4%, respectively. Of the 8626 patients with NSTEMI, 281 (3.3%) died after arrival at the emergency department (ED). Of the 7716 survivors, 142 and 306 patients died within three and 12 months after discharge, giving mortality rates of 1.8% and 4.0%, respectively. Table 1 lists the demographic characteristics of the patients according to mortality, before excluding those with missing information during hospital admission. The cumulative 12-month mortality of the study participants was 7.2% in STEMI and 7.1% in NSTEMI. The differences in patient characteristics according to survival were similar in the STEMI and NSTEMI groups. Patients who survived to the 12-month follow-up were younger than those who did not (62.4 vs. 73.9 years for STEMI; 66.3 vs. 76.5 years for NSTEMI). The proportion of women was lower in the survival group than in the death group in both STEMI and NSTEMI. Moreover, patients who survived to the 12-month follow-up were less likely than those who died within 12 months after AMI to have hypertension, diabetes, atrial fibrillation, or a history of MI, PCI, or stroke. On the other hand, they were more likely to have dyslipidemia and to be current smokers.
Furthermore, patients who survived to the 12-month follow-up were more likely to present with chest pain with sweating, had higher blood pressure at presentation, and had lower troponin levels than those who died within 12 months. The survival group had a lower proportion of heart failure, cardiogenic shock, left main disease, and three-vessel disease. The survivors were more likely to take aspirin, beta-blockers, angiotensin-converting enzyme inhibitors, and statins than those who died within 12 months. In contrast, they were less likely to take oral hypoglycemic agents, warfarin, and non-vitamin K antagonist oral anticoagulants.
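The in-hospital rates above use all eligible patients as the denominator, whereas the three- and 12-month rates use only patients who survived to discharge with complete in-hospital data. A short Python sketch (not the authors' code) reproduces the reported percentages from the stated counts and makes the two denominators explicit:

```python
# Counts reported in the Results text: name -> (denominator, deaths).
cohorts = {
    "STEMI in-hospital": (5557, 273),
    "STEMI 3-month (discharge survivors)": (4911, 68),
    "STEMI 12-month (discharge survivors)": (4911, 120),
    "NSTEMI in-hospital": (8626, 281),
    "NSTEMI 3-month (discharge survivors)": (7716, 142),
    "NSTEMI 12-month (discharge survivors)": (7716, 306),
}


def mortality_rate(n_total: int, n_deaths: int) -> float:
    """Return mortality as a percentage rounded to one decimal place."""
    return round(100 * n_deaths / n_total, 1)


rates = {name: mortality_rate(n, d) for name, (n, d) in cohorts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")

# The two in-hospital analyses together cover the 14,183 participants.
total_in_hospital = cohorts["STEMI in-hospital"][0] + cohorts["NSTEMI in-hospital"][0]
print(total_in_hospital)  # 14183
```

Because the later denominators exclude in-hospital deaths, the three- and 12-month rates cannot simply be added to the in-hospital rate to obtain a cumulative figure over the same population.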
Performance of the predictive models in STEMI. When the prediction models for STEMI were built with ML algorithms using the traditional variables, performance improved marginally over the best traditional model (Fig. 1). By the area under the receiver operating characteristic curve (AUC), extreme gradient boosting (XGBoost) was the best-performing ML model for in-hospital mortality (AUC: 0.912), and the modified GRACE was the best among the original and modified traditional models (AUC: 0.901) (Table 2). All the other ML models except the support vector machine (SVM) also performed well, with AUCs above or near 0.9, and the remaining traditional models had lower AUCs that were nevertheless close to 0.9. For three-month mortality after discharge, the best-performing ML and traditional models were XGBoost (AUC: 0.784) and GRACE (AUC: 0.766), respectively. These were followed, in descending order of AUC, by logistic regression regularized with an L2 penalty (Ridge regression), logistic regression regularized with an L1 penalty (Lasso regression), logistic regression regularized with an elastic net penalty (elastic net), and random forest (RF). For 12-month mortality, the best-performing ML and traditional models were Ridge regression (AUC: 0.840) and GRACE (AUC: 0.826), respectively, followed in descending order of AUC by Lasso regression, elastic net, RF, modified TIMI, and XGBoost. By F1-score, the best-performing ML models scored 0.388, 0.107, and 0.179 for in-hospital, three-month, and 12-month mortality, respectively, which were similar to or slightly higher than the F1-scores of the corresponding traditional models. The highest F1-scores of the modified traditional models were 0.345, 0.075, and 0.170 for in-hospital, three-month, and 12-month mortality, respectively.
www.nature.com/scientificreports/
Performance of the predictive models in NSTEMI. In NSTEMI, ML models built with the traditional variables outperformed the traditional models in predicting three- and 12-month mortality (Table 3). The highest AUCs for in-hospital mortality were 0.889 for RF and 0.888 for XGBoost, superior to TIMI (AUC: 0.669) but similar to the modified ACTION-GWTG (AUC: 0.884). For three-month mortality, the best-performing models were Lasso regression (AUC: 0.849) and elastic net (AUC: 0.849), both superior to GRACE (AUC: 0.777) and ACTION-GWTG (AUC: 0.795). All ML models except the SVM maintained an AUC > 0.8 for 12-month mortality, whereas the AUCs of TIMI and ACTION-GWTG were 0.675 and 0.790, respectively. GRACE, the modified GRACE, and the modified ACTION-GWTG maintained good performance in predicting 12-month mortality. By F1-score, the best-performing ML models were Lasso regression, elastic net, and XGBoost, with scores of 0.236, 0.130, and 0.225 for in-hospital, three-month, and 12-month mortality, respectively, whereas the highest scores among the traditional models were 0.224, 0.114, and 0.196, respectively. Among the modified traditional models, the highest F1-scores were 0.243, 0.110, and 0.206 for in-hospital, three-month, and 12-month mortality, respectively.
Comparison of the performance between the ML and traditional models. Testing the differences between all the ML models and the three conventional models for statistical significance showed that the ML models were superior to the traditional models in predicting long-term mortality in NSTEMI (Supplementary Table 2). Among patients with NSTEMI, the ML models outperformed TIMI in predicting in-hospital mortality but were similar to GRACE and ACTION-GWTG. Lasso and elastic net regression were superior to all three traditional models in predicting three-month mortality among patients who survived to discharge. Moreover, Lasso, Ridge, and elastic net regression and XGBoost had significantly higher AUCs than TIMI, GRACE, and ACTION-GWTG in predicting 12-month mortality. In contrast, in STEMI, RF and XGBoost were the only ML models that significantly outperformed TIMI in predicting in-hospital mortality; otherwise, the differences between the traditional models and the ML models were not statistically significant. Comparing the ML models with the modified traditional models yielded consistent findings (Supplementary Table 3): the differences between the ML models and the modified traditional models were statistically significant among patients with AMI, particularly in predicting long-term mortality.
Effect of optional clinical features and medication at discharge. Including the optional predictors in the models did not enhance performance (Fig. 2) […] whereas the corresponding values were 0.889, 0.849, and 0.860 for the ML models including only the traditional predictors. Except for the SVM, none of the ML models showed a significant difference in AUC between the model with the traditional predictors only and the model with all the predictors. Moreover, comparing the ML models including the traditional and optional predictors with the corresponding models that also included medication at discharge showed no significant difference in either STEMI or NSTEMI (Supplementary Table 6). Supplementary Tables 7 and 8 list the variable importance, which differed across prediction models. Some traditional predictors were excluded from the ML models, whereas some optional variables were included. Furthermore, addressing class imbalance did not significantly change the performance of the predictive models in either STEMI or NSTEMI (Supplementary Tables 9 and 10). The highest AUC of the ML models was similar to that of the models trained with up-sampling, down-sampling, or SMOTE, and only the SVM benefited from rebalancing the classes.
Performance in external validation. The ML-based models were validated externally using the Korean Acute Myocardial Infarction Registry-National Institutes of Health (KAMIR-NIH) database, an independent prospective multicenter registry (Table 4). For in-hospital mortality, the AUCs of all ML models except the SVM exceeded 0.9 among patients with STEMI and NSTEMI, whereas for 12-month mortality they were close to 0.8. As in the test data, the ML models were superior to the traditional models in predicting 12-month mortality in NSTEMI. However, the F1-scores in the KAMIR-NIH registry were lower than those in the internal validation.

Discussion
Mortality prediction models were developed using several ML algorithms (Lasso regression, Ridge regression, elastic net, RF, SVM, and XGBoost). In STEMI, their performance in predicting short- and long-term mortality was comparable to that of traditional risk stratification with comparable predictors. In NSTEMI, by contrast, they improved discrimination over the existing prognostic tools, particularly for long-term mortality. Furthermore, adding more clinical variables did not enhance the performance of the mortality prediction models in AMI. The ML algorithms outperformed the traditional risk scores when the predictors were the same, although performance was similar in STEMI, and the best-performing algorithm varied with the predictors and outcomes. Some studies have suggested applying ML algorithms to enhance prognostic prediction for patients with AMI 11,17. A recent study reported that deep learning (AUC: 0.905) could outperform the GRACE score (AUC: 0.851) in predicting the in-hospital mortality of patients with AMI 11. Another study reported that, for cardiac and sudden death during a one-year follow-up, the AUC of the ML models was 0.08 higher than that of GRACE 17. A third study reported AUCs of 0.828, 0.895, 0.810, and 0.882 for an artificial neural network (ANN), decision tree (DT), naïve Bayes (NB), and SVM, respectively, for 30-day mortality, which were similar to or slightly higher than the AUC (0.83) of the GRACE risk score reported in its validation study 3,18. However, that study did not compare the conventional and ML models on its own data, so the comparison can only be inferred indirectly 18.
Although these three studies showed that ML algorithms could enhance discrimination, other researchers have argued that ML models are not always preferable to traditional models. Some studies on the prognosis of patients with AMI found that ML models were not superior to, but performed comparably with, regression-based approaches 19-21. One study using the National Inpatient Sample administrative database showed that RF (AUC: 0.85) was comparable to traditional logistic regression (LR; AUC: 0.84) in predicting in-hospital mortality among women with STEMI 19. Another study showed that the best ML model performed similarly to the GRACE score (AUC: 0.91 vs. 0.87) 20. Austen et al. reported that LR with cubic splines outperformed the ML models of RF, regression trees (RT), bagged RT, and boosted RT 21.
This study showed that ML models were better than the traditional models in NSTEMI, whereas the differences did not reach statistical significance in STEMI. This difference in the relative advantage of ML models between STEMI and NSTEMI may partially explain the inconsistency in the literature 11,17-20. Two of the three studies showing comparable performance between the traditional and ML models included only patients with STEMI 19,20, whereas all three studies showing superior performance of the ML models included patients with both STEMI and NSTEMI 11,17,18. This cannot explain all the inconsistency, because a subgroup analysis in a previous study showed that ML also outperformed GRACE in STEMI 11; nevertheless, the different performance of the ML models in the STEMI and NSTEMI groups may have contributed to the inconsistent findings. The ML models may discriminate better than the traditional models in NSTEMI because NSTEMI has more heterogeneous clinical and pathological features than STEMI 22,23. STEMI results from complete thrombotic occlusion of the infarct-related artery, whereas NSTEMI arises in more heterogeneous conditions, such as incomplete coronary occlusion, coronary artery spasm, coronary embolism, and myocarditis 24. Moreover, ML-based models can outperform traditional models on complex data because they do not rely on parametric assumptions and can capture non-linearity and higher-order interactions. The inconsistency may also reflect the relatively small difference in AUC between the ML models and GRACE, because the GRACE risk score was updated in 2014 to divide continuous variables into many categories that reflect their non-linear relationship with mortality 8. In addition, ML-based models require hyperparameter tuning, which influences performance and may cause the same model to fit and perform differently in different datasets 14.
Traditional risk stratification has focused on predicting short-term mortality, and only a few scores address one-year mortality. The CADILLAC risk score, developed in 2005, showed good performance for one-year mortality (c-statistic of 0.79), and GRACE 2.0, updated in 2014 to account for the non-linear relationship between mortality and continuous variables, showed an AUC of 0.82 3,8,25. Since the introduction of ML algorithms, some studies have suggested that discrimination for long-term mortality could be improved 15,17. One recent study on one-year mortality among patients admitted to the ICU with AMI reported an AUC of up to 0.901, achieved using logistic model trees 15. Another study also showed good discrimination for one-year mortality, with an AUC of 0.898 achieved using either a deep neural network or a gradient boosting machine 17. In the present study, the ML models maintained good discrimination for 12-month mortality, but the AUCs were lower than those of the two previous studies 15,17. This might be because one-year mortality was defined here not as cumulative mortality including in-hospital deaths, as in other studies, but as mortality during the one-year follow-up among patients who survived to discharge. The current study aimed to help cardiologists plan treatment and management according to a patient's mortality risk at discharge.
This study showed that adding the optional variables did not significantly improve the performance of the prediction models. This might be because the optional variables used in this study added little information to the ML models for predicting the mortality of patients with AMI. Only a few studies have examined the influence of features on the performance of prediction models. One study on 30-day mortality after STEMI showed that the performance of most ML algorithms plateaued once the 15 highest-ranked of 54 variables were included 20. Another study on one-year mortality in patients with anterior STEMI showed that performance changed when the top 20 ranked variables were selected instead of all 59 variables 26. For RF, the AUC changed little, from 0.932 in the full model to 0.944 with 20 features, whereas other models changed more: the AUC decreased from 0.931 to 0.864 in LR but increased from 0.772 to 0.852 in the decision tree. The top 20 variables in that study included New York Heart Association class at discharge, heart failure at admission, heart rate, age, left ventricular ejection fraction, serum cystatin, initial BNP, platelet count, fibrinogen, serum creatinine, blood glucose, systolic blood pressure, diastolic blood pressure, total bilirubin, blood urea nitrogen, and revascularization type. Only five of these variables overlapped with the traditional variables in the present study. ML-based predictive models appear to be less dependent on specific predictors because many clinical predictors influence and reflect one another. ML algorithms, which allow non-linearity, higher-order effects, and interactions, may therefore depend less on specific predictors than traditional risk stratification methods do.
This study suggests that ML algorithms can enhance the performance of predictive models in AMI and identifies the particular area where predictive models benefit from ML algorithms. Clinicians could therefore use ML prediction models to better identify patients with NSTEMI at high risk of mortality and focus on this high-risk group at admission and discharge. An ML-based prediction model could be integrated into electronic medical records as part of clinical decision support and used in clinical practice, flagging patients who require close monitoring and intensive care during the hospital stay, and frequent follow-up and attention to medication adherence after discharge. This study had some limitations. First, ML-based prediction models are less intuitive than risk scoring systems developed with traditional statistical analysis, and the importance of predictors in these models is harder to interpret because the models may involve non-linear functions and ensemble methods. Second, the proposed prediction models may be specific to the study population of Korean patients with AMI. A previous study reported different risk factors and responses to medical and interventional treatments between Korean and Western patients with AMI; hence, the models could perform differently in other populations, and ML algorithms should be compared there to confirm which performs best 27,28. Third, despite the improvement in AUC, the F1-scores were low in both the ML and traditional models, the difference between them was small, and its statistical significance could not be evaluated. Ranganathan and Aggarwal showed with an example that a test with good sensitivity and specificity can have low precision when applied to a disease with a low pretest probability 29.
They suggested that it would be prudent to apply a diagnostic test only to patients with a high pretest probability of disease 29. The low F1-scores in the current study may reflect low precision in the setting of a low mortality rate, and it could be interpreted that the F1-score would increase if the models were applied to patients with moderate
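The effect described by Ranganathan and Aggarwal follows from Bayes' rule: precision (positive predictive value) depends on prevalence as well as on sensitivity and specificity. The Python sketch below uses hypothetical sensitivity and specificity values of 0.80, not results from this study; the prevalences 0.018 and 0.040 match the NSTEMI three- and 12-month mortality rates reported in the Results, and 0.20 stands in for a hypothetical moderate-risk population.

```python
def precision_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value (precision) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (recall = sensitivity)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical classifier with 80% sensitivity and 80% specificity.
sens, spec = 0.80, 0.80
for prevalence in (0.018, 0.040, 0.20):
    ppv = precision_from_rates(sens, spec, prevalence)
    print(f"prevalence={prevalence:.3f}  precision={ppv:.3f}  F1={f1_score(ppv, sens):.3f}")
```

With these assumed values, precision is below 0.1 at a prevalence of 1.8% but reaches 0.5 at 20%, so the F1-score rises sharply with pretest probability even though sensitivity and specificity are unchanged.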

Conclusion
A prediction model for short- and long-term mortality was developed in patients admitted with AMI using multicenter registry data and validated using an independent cohort. Compared with the traditional risk scoring methods, the ML-based approach improved discrimination in predicting mortality in patients with NSTEMI. In contrast, performance did not depend on including more predictors.

Methods
Data source. A retrospective cohort study was conducted using data from the Korean Registry of Acute Myocardial Infarction for Regional Cardiocerebrovascular Centers (KRAMI-RCC). KRAMI-RCC is a prospective multicenter AMI registry in Korea, with data collected from all 14 Regional Cardiocerebrovascular Centers (RCCVCs) established by the Ministry of Health and Welfare since 2008 for the prevention and treatment of cardiovascular disease in Korea. The purpose and impact of the RCCVCs on AMI care are published elsewhere 30,31. KRAMI-RCC is a web-based registry of consecutive AMI cases that reflects real-world clinical practice in the RCCVCs and consists of pre-hospital, hospital, and post-hospital data. The institutional review board of Inha University Hospital approved the study protocol (IRB number: 2020-05-035) and waived the need for informed consent because of the retrospective nature of the study, which used anonymized data with minimal potential for harm. All methods were carried out in accordance with the relevant guidelines and regulations, and the data were obtained with the approval of the committee of the RCCVCs after anonymization.

[…] (Fig. 3). After excluding patients with missing data on the predictors at the emergency department (ED) or before ED arrival and those who arrived at the hospital more than 24 h after symptom onset, 5557 patients with STEMI were eligible for the final analysis of in-hospital mortality. Patients who survived to discharge were included in the final analysis of three- and 12-month mortality, after excluding those with missing data on the clinical predictors during hospital admission or with rare categorical responses. Therefore, 4911 patients with STEMI were included in the final analysis of three- and 12-month mortality.
For NSTEMI, 8626 patients were included in the final analysis of in-hospital mortality after excluding those with missing data at the pre-ED or ED stage. For three- and 12-month mortality, 7716 patients with NSTEMI remained after excluding in-hospital deaths and patients with missing data during the hospital stay.
Predictors. Candidate predictors of mortality were extracted from the database based on previous studies and included demographic information, past medical history, initial symptoms, laboratory findings, events before ED arrival and during the hospital stay, and coronary angiographic findings 3,4,6,8-10. The predictors were classified by time frame (pre-ED, ED, and hospital admission). The predictors used in the traditional risk stratification models were designated traditional variables 3, and the other predictors were categorized as optional variables, as described in Supplementary Table 11. The predictors for in-hospital mortality were limited to variables available at the pre-ED and ED stages, whereas those for three- and 12-month mortality included all the variables from the pre-ED, ED, and hospital admission stages, as well as medication at discharge.
Outcomes. The outcomes of interest in this study were in-hospital, three-month, and 12-month mortality.
Patients who survived to discharge were followed up by telephone at three and 12 months, through contact with the patients or their families. If this information was unavailable, follow-up visits or death certificates in the electronic medical records were checked to determine death.
Predictive models. ML algorithms, namely RF, SVM, XGBoost, Lasso regression, Ridge regression, and elastic net, were applied to develop the mortality prediction models. RF builds multiple decision trees and aggregates them to make a more accurate and stable prediction, whereas XGBoost is a scalable implementation of gradient tree boosting, in which trees are added sequentially to correct the errors of the current ensemble. SVM constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space for classification. For each prediction model, the hyperparameters were tuned by tenfold cross-validation with the AUC as the evaluation criterion. For RF, all combinations of the number of trees (500, 1000, and 2000) and the number of variables sampled at each split (2, 4, 6, and 8) were searched. For XGBoost, all combinations of the number of boosting iterations (25, 50, 75, 100, 125, and 150), learning rate (0.05, 0.1, and 0.3), minimum loss reduction (0 and 5), and maximum tree depth (4, 6, and 8) were searched. For SVM, the hyperparameters were optimized over combinations of the cost of constraint violation (0.0039, 0.0625, 1.0000, and 2.0000) and the bandwidth of the radial kernel (0.0039, 0.0625, 1.0000, and 2.0000). For Lasso, Ridge regression, and elastic net, the default settings of the 'glmnet' package in R were used to select the hyperparameters 32.

Three sampling methods were also considered to address the highly imbalanced classes: up-sampling, down-sampling, and the synthetic minority oversampling technique (SMOTE). Applying up-sampling, down-sampling, and SMOTE to the in-hospital mortality data for STEMI changed the number of participants in the training set from 4443 to 8464, 422, and 1477, respectively. For the in-hospital mortality data for NSTEMI, the corresponding datasets contained 13,422, 430, and 1505 participants.
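The reported resampled sizes are consistent with a STEMI training set of 211 deaths and 4232 survivors (and, for NSTEMI, 215 deaths and 6711 survivors): up-sampling doubles the majority count (2 × 4232 = 8464), down-sampling doubles the minority count (2 × 211 = 422), and SMOTE with perc.over = 200 and perc.under = 200, the defaults of the R function DMwR::SMOTE, keeps the 211 original deaths, adds 422 synthetic deaths, and keeps 844 survivors, for a total of 1477. These class counts and SMOTE parameters are our inference from the reported numbers, not values stated by the authors. The stdlib-only Python sketch below implements the three schemes on hypothetical numeric features; unlike the R implementations, it handles only numeric predictors.

```python
import random


def up_sample(majority, minority, rng):
    """Draw minority cases with replacement until both classes have equal size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra


def down_sample(majority, minority, rng):
    """Keep a random subset of the majority class equal in size to the minority class."""
    return rng.sample(majority, len(minority)), minority


def smote(majority, minority, rng, perc_over=200, perc_under=200, k=5):
    """SMOTE-style resampling for numeric feature vectors (lists of floats).

    Each minority case yields perc_over // 100 synthetic cases placed at a random
    point on the segment to one of its k nearest minority neighbours; then
    perc_under / 100 majority cases are kept per synthetic case.
    """
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority neighbours by squared Euclidean distance, excluding x itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))[:k]
        for _ in range(perc_over // 100):
            nb = rng.choice(neighbours)
            gap = rng.random()
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    n_keep = min(len(majority), len(synthetic) * perc_under // 100)
    return rng.sample(majority, n_keep), minority + synthetic


def total(parts):
    """Total number of cases across the (majority, minority) pair."""
    return sum(len(p) for p in parts)


rng = random.Random(42)
# Hypothetical two-feature data with the class counts inferred above for STEMI.
survivors = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(4232)]
deaths = [[rng.gauss(1, 1), rng.gauss(1, 1)] for _ in range(211)]
print(total(up_sample(survivors, deaths, rng)))    # 8464
print(total(down_sample(survivors, deaths, rng)))  # 422
print(total(smote(survivors, deaths, rng)))        # 1477
```

Resampling is applied to the training set only; the test set keeps its original class balance so that the reported AUCs and F1-scores reflect the real mortality rate.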
Traditional and modified traditional models. TIMI and the updated versions of GRACE and ACTION-GWTG were used as the reference traditional models for comparison with the ML models 3,4,8,13,33. The TIMI risk scores for STEMI and NSTEMI were used in this study 4,33. TIMI for STEMI and TIMI for NSTEMI were developed to predict 30-day and 14-day mortality, respectively, although the prognostic capacity of TIMI for STEMI was stable over multiple time points from 24 h to one year after hospital admission 4. GRACE v2.0, in which Anderson et al. updated the initial GRACE risk score in 2014, uses non-linear functions to enhance discrimination 8. Although it was developed to predict six-month mortality, it has been validated externally over the longer term, with an AUC of 0.82 for one- and three-year mortality. In another validation study, GRACE v2.0 also showed excellent discrimination for in-hospital mortality, with an AUC of 0.91 34. The updated ACTION-GWTG model, developed in 2016, showed high discrimination for in-hospital mortality, with an AUC of 0.88 13.
These traditional models were fitted to the training data and modified by recalculating the model parameters. Both the original and the modified traditional models were compared with the ML models.
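The paper does not state how the model parameters were recalculated. A minimal sketch, assuming the modification keeps a score's predictors and re-estimates one weight per predictor by logistic regression on the training set, is shown below in stdlib-only Python for the TIMI UA/NSTEMI score, whose published form gives one point to each of seven binary criteria. The data, prevalences, and true weights are simulated and hypothetical, and this is not the authors' R implementation.

```python
import math
import random

# Seven binary criteria of the published TIMI risk score for UA/NSTEMI, one point each.
TIMI_CRITERIA = (
    "age >= 65 years",
    ">= 3 coronary risk factors",
    "known coronary stenosis >= 50%",
    "aspirin use in past 7 days",
    ">= 2 anginal episodes in past 24 h",
    "ST deviation >= 0.5 mm",
    "elevated cardiac markers",
)


def timi_score(components):
    """Original (unmodified) score: number of criteria met, 0 to 7."""
    if len(components) != len(TIMI_CRITERIA):
        raise ValueError("expected one value per TIMI criterion")
    return sum(1 for c in components if c)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(X, y, b, w):
    """Average negative log-likelihood of a logistic model."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def fit_logistic(X, y, lr=0.5, epochs=400):
    """Re-estimate the intercept and one weight per criterion by batch gradient descent."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


rng = random.Random(2021)
# Simulated training set; prevalences and true weights are hypothetical.
true_b, true_w = -3.0, [1.2, 0.3, 0.5, 0.1, 0.4, 0.8, 1.0]
X = [[float(rng.random() < 0.3) for _ in TIMI_CRITERIA] for _ in range(600)]
y = [1 if rng.random() < sigmoid(true_b + sum(w * x for w, x in zip(true_w, xi))) else 0 for xi in X]

b_hat, w_hat = fit_logistic(X, y)
print("original points:", [1] * len(TIMI_CRITERIA))
print("refitted weights:", [round(w, 2) for w in w_hat])
print("training log-loss:", round(mean_log_loss(X, y, b_hat, w_hat), 3))
```

Unlike the original score, in which every criterion counts equally, the refitted model lets criteria that matter more in the training population carry larger weights; scores with continuous inputs, such as GRACE and ACTION-GWTG, could in principle be refitted the same way.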
Analysis and Performance measures. Continuous variables, such as age and weight, are presented as the mean and standard deviation, and categorical variables as frequency and proportion. After standardization, the data were randomly split into a training set (80%) for developing the ML-based models and a test set (20%) for internal validation. The performance of the mortality prediction models was evaluated on the test data using sensitivity, specificity, accuracy, F1-score, and AUC (in the tables) and receiver operating characteristic (ROC) curves (in the plots). The AUCs of the ML algorithms were reported with 95% confidence intervals and compared with those of the traditional risk stratification methods (TIMI, GRACE, and ACTION-GWTG) using the DeLong test 35. All analyses were performed using R software version 4.0.0 (R Development Core Team, Vienna, Austria) 36.
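For reference, the DeLong test compares two correlated AUCs estimated on the same test cases. The stdlib-only Python sketch below computes each AUC from placement values (the Mann-Whitney form), a DeLong-based 95% confidence interval for a single AUC, and the two-sided test for the difference between two AUCs. It is an illustration, not the authors' code; in R, the pROC package provides `roc.test(..., method = "delong")` and `ci.auc()` for the same purpose. The scores below are toy values.

```python
import math


def _placements(pos, neg):
    """DeLong structural components: V10 for each positive case, V01 for each negative case."""
    def psi(x, y):  # 1 if the positive outranks the negative, 0.5 for a tie
        return 1.0 if x > y else (0.5 if x == y else 0.0)
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


def _split(scores, labels):
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    return pos, neg


def auc_with_ci(scores, labels, z=1.96):
    """AUC with a DeLong-based 95% (Wald) confidence interval."""
    v10, v01 = _placements(*_split(scores, labels))
    auc = sum(v10) / len(v10)
    se = math.sqrt(_cov(v10, v10) / len(v10) + _cov(v01, v01) / len(v01))
    return auc, auc - z * se, auc + z * se


def delong_test(scores1, scores2, labels):
    """Two-sided DeLong test for two models scored on the same cases."""
    a10, a01 = _placements(*_split(scores1, labels))
    b10, b01 = _placements(*_split(scores2, labels))
    auc1, auc2 = sum(a10) / len(a10), sum(b10) / len(b10)
    var = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / len(a10)
           + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / len(a01))
    if var <= 0:
        return auc1, auc2, float("nan"), float("nan")
    z = (auc1 - auc2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc1, auc2, z, p


labels = [0, 0, 0, 0, 1, 1, 1, 1]
model_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
model_b = [0.1, 0.6, 0.3, 0.4, 0.5, 0.2, 0.7, 0.8]
print(auc_with_ci(model_b, labels))
print(delong_test(model_a, model_b, labels))
```

With real data, the scores would be each model's predicted probabilities on the 20% test set. This simple Wald interval is not truncated to [0, 1], which matters only for very small samples like the toy example.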
Validation. In addition to internal validation using the test set, external validation was performed using the KAMIR-NIH registry, a prospective multicenter registry in Korea that enrolled patients diagnosed with AMI at 20 tertiary university hospitals who were eligible for primary PCI from November 2011 to December 2015. The detailed study protocol is published elsewhere 37. The performance of ACTION-GWTG could not be estimated because prior peripheral arterial disease was not recorded in the KAMIR-NIH registry, and three-month mortality was unavailable because of the registry's different follow-up schedule. The ML models were validated for in-hospital and 12-month mortality after harmonizing the operational definitions of pre-ED cardiac arrest and abnormal cardiac biomarkers.

Data availability
The data that support the findings of this study are available from KRAMI-RCC, but restrictions apply to the availability of these data. Data are available from the authors upon reasonable request and with permission of KRAMI-RCC.