Ensemble machine learning for predicting in-hospital mortality in Asian women with ST-elevation myocardial infarction (STEMI)

The accurate prediction of in-hospital mortality in Asian women after ST-Elevation Myocardial Infarction (STEMI) remains a crucial issue in medical research. Existing models frequently neglect this demographic's particular attributes, resulting in poor treatment outcomes. This study aims to improve the prediction of in-hospital mortality in multi-ethnic Asian women with STEMI by employing both base and ensemble machine learning (ML) models. We focused on developing demographic-specific models using data from the Malaysian National Cardiovascular Disease Database spanning 2006 to 2016. Through a careful iterative feature selection approach that combined feature importance and sequential backward elimination, we identified significant variables such as systolic blood pressure, Killip class, fasting blood glucose, beta-blockers, angiotensin-converting enzyme (ACE) inhibitors, and oral hypoglycemic medications. Most ML models with selected features outperformed the conventional Thrombolysis in Myocardial Infarction (TIMI) risk score, with areas under the curve (AUC) ranging from 0.60 to 0.93 versus 0.81 for TIMI. Notably, our best-performing ensemble ML model was surpassed by a base ML model, a linear support vector machine (SVM) with SVM-selected features (AUC: 0.93, CI: 0.89–0.98 versus AUC: 0.91, CI: 0.87–0.96). Furthermore, the women-specific model outperformed a non-gender-specific STEMI model (AUC: 0.92, CI: 0.87–0.97). Our findings demonstrate the value of women-specific ML models over standard approaches and emphasize the importance of continued testing and validation to improve clinical care for women with STEMI.


Participants
The cohort for this study was drawn from the NCVD-ACS registry and spanned the years 2006 to 2016. Our primary analysis used the complete data records of female STEMI patients for clinical outcome analysis. For our secondary analysis, we broadened the scope to three distinct datasets to enhance the robustness and generalizability of our findings:
• Women complete dataset: female patients with complete data, allowing a focused analysis of the intended demographic with no missing values in predictor variables.
• Women imputed dataset: a larger dataset in which missing values were addressed through multivariable imputation, increasing the number of female patient records to represent a broader range of clinical circumstances.
• General complete dataset: complete data for both male and female STEMI patients, which provides a comparative perspective across genders and allows us to examine the model's performance in a broader context.

Data Source
Our study used anonymized patient data from the NCVD-ACS registry spanning 2006 to 2016. The registry contained 15,407 consecutive in-hospital STEMI cases, of which 6299 were complete cases (no missing values on predictors). Of these 6299 complete cases, 871 female patients were used for the primary analysis.
In 2007, the Medical Review & Ethics Committee (MREC) of the Ministry of Health (MOH) of Malaysia approved the NCVD registry study (Approval Code: NMRR-07-20-250). The MREC waived patient informed consent for NCVD 37,38. This study was also authorized by the UiTM ethics committee (Reference number: 600-TNCPI (5/1/6)) and by NHAM. The data were anonymized before use, because this research required only the values and features of the records, not access to patients' personal information.
The dataset includes each patient's information at the time of STEMI hospitalization. Predictions of in-hospital mortality were based on the data available at that time, and the model was applied once per patient. No further predictions were made during the hospital stay, aligning the prediction frequency with the crucial decision-making period at admission.

Variables and data preprocessing
Variables
STEMI was defined as persistent ST-segment elevation ≥ 1 mm in two contiguous electrocardiographic leads, or the presence of a new left bundle branch block in the setting of positive cardiac markers 39. Input variables are the features used to develop a model that predicts the outcome (in-hospital mortality). A total of 48 variables (9 continuous, 39 categorical) from the complete dataset were used in this study (Supplementary Table 1). The variable categories were sociodemographic characteristics, CVD diagnosis and severity, CVD risk factors, CVD comorbidities, non-CVD comorbidities, clinical presentation, baseline investigation, electrocardiography (ECG), treatments, and pharmacological therapy. The variables used for model development include those recorded at first contact in the emergency department as well as those recorded in hospital. To address the dynamic nature of patient data during hospitalization, our study adopts the following approach:
• Clinical history, examination, and investigation findings: based on information obtained at the time of admission, these provide a baseline understanding of each patient's initial status.
• Treatment: we include the initial medical responses and interventions, as well as the primary treatment administered during hospitalization.
• Medication: recognizing that medication regimens can change, our models consider the final pharmacological regimen recommended before discharge, capturing any substantial changes in treatment.
• Outcome variable (in-hospital mortality): determined from the patient's status at discharge, providing a specific endpoint for each case.
The mortality period begins on the day of hospital admission; for in-hospital mortality, the calculation period began with the first hospital admission. Deaths were confirmed through record linkage with the Malaysian National Registration Department. The registry does not collect information on short-term complications such as heart failure. Planned follow-up data points were intended to collect this information, but we omitted them from this study because of their high rate of missing values. To increase the significance of this study, we focused our algorithm on policy-altering endpoints such as death, as in similar publications 9,40,41,42. To avoid data leakage 43, in cases of multiple admissions a unique patient identifier ensured that all of a patient's data were consistently assigned to either the training or the testing set, while preserving anonymity 44. Data pre-processing methods such as imputation (of missing values) and balancing (for both the complete and the imputed datasets) were performed on the training data only, whereas normalization was applied separately to the training and testing data. We assessed the performance of the developed models and of TIMI on a validation set comprising 30% of the data, which was not used for model development.
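The patient-level split described above can be sketched in Python with scikit-learn's GroupShuffleSplit, which keeps all admissions that share a patient identifier on the same side of the split. This is a minimal sketch under assumptions: the study's own tooling is not stated, and the toy data and the patient_level_split helper are hypothetical. Note that test_size is a fraction of patients, not of rows, so the row-level share is only approximately 30%.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit


def patient_level_split(X, y, patient_ids, test_size=0.3, seed=42):
    """Return train/test indices with every admission of a patient on one side."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))
    return train_idx, test_idx


# Toy data (hypothetical): 10 admissions from 6 patients, some readmitted
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = np.array([0, 0, 1, 0, 0, 1, 0, 0, 0, 1])
patient_ids = np.array([1, 1, 2, 3, 3, 4, 5, 5, 6, 6])

train_idx, test_idx = patient_level_split(X, y, patient_ids)
# Imputation and balancing would then be fitted on X[train_idx] only
print(sorted(set(patient_ids[test_idx])))
```

Because whole patients are held out, no readmission of a validation patient can inform the trained model.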

Data balancing
Our dataset had a significant class imbalance: non-survivors (n = 73) accounted for approximately 8.38% of the total dataset (n = 871) and survivors (n = 798) for 91.62%. To mitigate this imbalance and improve the robustness of our models, we used the ROSE package to combine up-sampling and down-sampling of the training data 45. Adjusting the class distribution towards balance improves the reliability of subsequent analyses and the predictive performance of the developed models. To preserve the integrity and representativeness of real-world clinical scenarios, this treatment was not applied to the validation dataset.
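ROSE is an R package; the sketch below is a hypothetical Python illustration of the combined over/under-sampling idea (similar in spirit to ROSE's ovun.sample with method = "both"), not a port of ROSE and not the authors' code. It assumes a binary outcome and a target of roughly equal class sizes; the 610-row training split with 51 deaths is illustrative only.

```python
import numpy as np


def over_under_sample(X, y, n_total=None, p_minority=0.5, seed=1):
    """Up-sample the minority class and down-sample the majority class (binary y)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_total = len(y) if n_total is None else n_total
    n_min = int(round(n_total * p_minority))
    n_maj = n_total - n_min
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    # Minority rows are drawn WITH replacement, so they get duplicated (up-sampling)
    min_draw = rng.choice(min_idx, size=n_min, replace=True)
    # Majority rows are drawn WITHOUT replacement (down-sampling);
    # this requires n_maj <= number of majority rows
    maj_draw = rng.choice(maj_idx, size=n_maj, replace=False)
    idx = rng.permutation(np.concatenate([min_draw, maj_draw]))
    return X[idx], y[idx]


# Hypothetical training split: 610 rows, 51 deaths (about 8.4%)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(610, 5))
y_train = np.array([1] * 51 + [0] * 559)
X_bal, y_bal = over_under_sample(X_train, y_train)
print(np.bincount(y_bal))
```

Only the training split is resampled; the validation set keeps its natural class ratio, as the text describes.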

Data imputation
Because the registry data were collected prospectively, the proportion of missing values across variables was arbitrary and beyond our control. An incomplete dataset was defined as one with up to 30% of variables missing. The probability of missingness in our dataset is independent of both the observed and the unobserved values; our dataset is therefore classified as missing completely at random (MCAR), meaning that the distribution of missing values is random and independent of any variable, whether or not it is included in the analysis. For the secondary analysis, we performed multivariable imputation by chained equations with predictive mean matching, using the MICE R package 46. This method imputes each missing value with an actual observed value from another case whose predicted value is closest.
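Predictive mean matching, the imputation rule named above, can be illustrated for a single variable. This is a simplified, hypothetical sketch: the MICE R package additionally draws regression parameters to reflect uncertainty and cycles through all incomplete variables over several iterations (the "chained equations"), both of which are omitted here. The choice of five donors and the glucose example are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def pmm_impute(x, X_pred, k=5, seed=0):
    """One pass of predictive mean matching for a single incomplete variable."""
    rng = np.random.default_rng(seed)
    x = np.array(x, dtype=float)           # copy, so the input is not modified
    miss = np.isnan(x)
    # Regress the observed values on the (complete) predictors
    model = LinearRegression().fit(X_pred[~miss], x[~miss])
    pred_obs = model.predict(X_pred[~miss])
    observed = x[~miss]
    for i in np.flatnonzero(miss):
        pred_i = model.predict(X_pred[i:i + 1])[0]
        # k observed cases whose predicted means are closest to this case's
        donors = np.argsort(np.abs(pred_obs - pred_i))[:k]
        # Impute a real observed value taken from one randomly chosen donor
        x[i] = observed[rng.choice(donors)]
    return x


rng = np.random.default_rng(1)
X_pred = rng.normal(size=(50, 2))                   # e.g. standardized age and SBP
glucose = 7 + 1.5 * X_pred[:, 0] + rng.normal(scale=0.5, size=50)
glucose[[3, 10, 25]] = np.nan                        # simulate missing values
glucose_imputed = pmm_impute(glucose, X_pred)
```

Because donors supply real observed values, imputations always stay within the plausible range of the variable.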

Data normalization
Data normalization was used to prevent features with larger numerical ranges from dominating pattern-class discrimination 42. We applied standardization (z-score normalization) to the continuous variables in this study: age, heart rate, systolic and diastolic blood pressure, total cholesterol, high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglycerides, and fasting blood glucose.
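A minimal sketch of z-score standardization, z = (x − mean) / SD, follows. The text states that normalization was applied separately to the training and testing data but does not say which statistics were used for the testing data; here the mean and SD are estimated on the training data and reused, a common way to avoid leakage, and that choice is an assumption.

```python
import numpy as np


def zscore_fit(X_train):
    """Column means and sample SDs estimated on the training data."""
    return X_train.mean(axis=0), X_train.std(axis=0, ddof=1)


def zscore_apply(X, mean, sd):
    """z = (x - mean) / sd, column by column."""
    return (X - mean) / sd


# Hypothetical columns: age (years) and systolic blood pressure (mmHg)
X_train = np.array([[50.0, 120.0], [60.0, 140.0], [70.0, 160.0]])
X_test = np.array([[80.0, 100.0]])
mean, sd = zscore_fit(X_train)
Z_train = zscore_apply(X_train, mean, sd)
Z_test = zscore_apply(X_test, mean, sd)   # Z_test is [[2.0, -2.0]]
```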

Primary analysis
A total of 6299 complete in-hospital STEMI cases were identified (with no missing values on predictors). Of these, 871 female patients were extracted and used as the final dataset for the primary analysis, giving a full predictor set of 48 variables (9 continuous, 39 categorical), as shown in Table 1.

Secondary analysis
Secondary analyses were carried out on the best-performing algorithm. (i) For the 15,407 STEMI cases, including those with missing data, we used multivariable imputation by chained equations to estimate missing values, creating a comprehensive dataset for modelling. This allowed us to include 2197 additional female patients in our analysis, broadening the scope and applicability of our results.
(ii) A total of 4369 of the 6299 in-hospital STEMI patients with complete cases, including both male and female patients, were used to train the best-performing algorithm. The women-specific model and the population (non-gender-specific) model were then tested and compared on the identical testing dataset (262 cases) used in the primary analysis.

Additional statistics
This study reports the mean and standard deviation (SD) of continuous variables and the frequencies of categorical variables. Correlation analysis was used to examine associations between variables. Univariate analysis used a chi-square test to identify significant variables and a two-sided independent Student's t-test (p < 0.05) to compare them. Pair-wise corrected resampled t-tests were used to compare the performance of the base and ensemble ML models 49,67. A p-value less than 0.001 indicated statistical significance.
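The corrected resampled t-test mentioned above adjusts the ordinary paired t-test for the overlap between training sets across resamples (the Nadeau and Bengio correction). A minimal sketch follows; the fold AUCs are hypothetical, and the function name is illustrative rather than taken from any library.

```python
import numpy as np
from scipy import stats


def corrected_resampled_ttest(scores_a, scores_b, n_train, n_test):
    """Nadeau-Bengio corrected resampled t-test for paired scores over k resamples."""
    d = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    k = d.size
    # The n_test / n_train term inflates the variance because the training
    # sets of different resamples overlap, so the scores are not independent
    se = np.sqrt((1.0 / k + n_test / n_train) * d.var(ddof=1))
    t = d.mean() / se
    p = 2 * stats.t.sf(abs(t), df=k - 1)
    return t, p


# Hypothetical AUCs of two models over 5 cross-validation folds (n_test/n_train = 1/4)
auc_svm = [0.90, 0.92, 0.91, 0.93, 0.89]
auc_stack = [0.88, 0.90, 0.90, 0.90, 0.88]
t, p = corrected_resampled_ttest(auc_svm, auc_stack, n_train=4, n_test=1)
```

Without the correction term the same differences would give a larger t statistic and an overly optimistic p-value.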

Feature selection
The RF and SVM algorithms produced better results than the other base learners in this study; hence, the features ranked by RF and SVM were used for feature selection. The sequential backward elimination (SBE) algorithm removes irrelevant features one at a time, in ascending order of their model significance value 47. SBE was applied iteratively to the RF- and SVM-ranked variables in ascending order 48. At each iteration, the prediction models were trained and then evaluated on the 30% validation dataset that was not used for model development. The models' predictive performance was calculated, and the models with the highest performance and the fewest variables were chosen. The base and ensemble ML models were then constructed using the features selected by RF and SVM. AdaBoost is an adaptive learning algorithm because it turns weak learners into a strong learner over multiple iterations. These algorithms were chosen based on previous CVD mortality-related research 22,24,27,28,57,58.
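The SBE loop can be sketched as follows: rank the features, drop the least important one at each step, score each subset on the held-out validation data, and keep the best AUC with the fewest features. This is a hypothetical illustration on synthetic data; ranking by the absolute weights of a linear SVM stands in for the study's RF and SVM importance rankings, and the tie-breaking rule is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def backward_eliminate(X_tr, y_tr, X_val, y_val, ranking):
    """Drop the lowest-ranked feature each step; keep the best AUC with fewest features."""
    features = list(ranking)              # most important first, least important last
    best_auc, best_features = -np.inf, None
    while features:
        model = SVC(kernel="linear").fit(X_tr[:, features], y_tr)
        auc = roc_auc_score(y_val, model.decision_function(X_val[:, features]))
        if auc >= best_auc:               # ties favour the smaller (later) subset
            best_auc, best_features = auc, list(features)
        features.pop()                    # eliminate the least important remaining feature
    return best_auc, best_features


X, y = make_classification(n_samples=400, n_features=10, n_informative=4, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
# Assumed ranking: absolute weights of a linear SVM fitted on the training data
weights = SVC(kernel="linear").fit(X_tr, y_tr).coef_.ravel()
ranking = np.argsort(-np.abs(weights))
best_auc, best_features = backward_eliminate(X_tr, y_tr, X_val, y_val, ranking)
```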
All the hyper-parameters utilised in the development of base and ensemble ML models were tuned using a combination of random search and manual tuning (refer to Supplementary Table 2).
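Random search over hyper-parameters can be sketched with scikit-learn's RandomizedSearchCV. The parameter range, number of iterations, and synthetic data below are illustrative assumptions; the values actually tuned in the study are listed in Supplementary Table 2, and the manual-tuning step is not shown.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic, imbalanced data standing in for the training set
X, y = make_classification(n_samples=300, n_features=12, weights=[0.8, 0.2], random_state=0)
search = RandomizedSearchCV(
    estimator=SVC(kernel="linear"),
    param_distributions={"C": loguniform(1e-3, 1e1)},   # sample C on a log scale
    n_iter=15,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)
best_C = search.best_params_["C"]
```

Sampling C on a log scale spreads the trials across orders of magnitude, which is usually more informative than a uniform range for regularization strength.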

Ensemble ML algorithms
Stacking, a type of ensemble ML algorithm, is a meta-learning strategy that uses the predictions of multiple base learners as input for training a new meta-learner, which makes the final prediction. It can outperform individual algorithms in classification and regression problems. In this study, six commonly used ML algorithms (SVM, KNN, DT, RF, XGBoost, and AdaBoost) were used as base learners, together with three commonly used meta-learners: RF, generalized linear model (GLM), and generalized boosted models (GBM) [59][60][61]. Ten-fold cross-validation was used on the training set during model development to avoid overfitting 49.
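A stacking ensemble of the kind described above can be sketched with scikit-learn's StackingClassifier. This is an illustrative assumption, not the study's implementation: XGBoost is omitted to avoid an extra dependency, logistic regression stands in for the GLM meta-learner, and the data are synthetic. With cv=10, the meta-learner is trained on out-of-fold predictions of the base learners, which is how stacking limits overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

base_learners = [
    ("svm", SVC(kernel="linear", probability=True, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),   # stands in for the GLM meta-learner
    cv=10,                                  # out-of-fold base predictions feed the meta-learner
    stack_method="predict_proba",
)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
stack.fit(X, y)
proba = stack.predict_proba(X[:5])[:, 1]    # predicted probability of the positive class
```

Swapping final_estimator for a RandomForestClassifier or GradientBoostingClassifier would mirror the other two meta-learners named in the text.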

Model evaluation
Model calibration was evaluated using standardized measures on the untouched raw validation dataset 62. The primary evaluation metric, AUC, was chosen based on research establishing its effectiveness across a wide range of class distributions, including imbalanced datasets 63,64. Although the area under the precision-recall curve (AUC-PR) provides more granularity for minority-class predictive performance, AUC remains a widely accepted measure of overall diagnostic accuracy. Additional metrics included accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), which give a comprehensive view of model performance across both classes. A paired resampled t-test was used to compare the predictive performance of the ML models 65. In addition, the net reclassification index (NRI) was calculated to quantify the percentage improvement in correctly classifying both positive and negative cases with the best model compared with the TIMI risk score 66.
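The threshold-based metrics and a two-category NRI can be computed as below, where NRI = [P(up | event) − P(down | event)] + [P(down | non-event) − P(up | non-event)]. This is a minimal sketch: the text does not state which NRI variant was used, so the categorical form with the low/high-risk cut-offs defined in the comparative analysis (TIMI > 5; ML probability ≥ 0.5) is an assumption, and the eight-patient example is hypothetical.

```python
import numpy as np


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV and NPV from binary predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return {
        "accuracy": (tp + tn) / y_true.size,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def categorical_nri(y_true, risk_old, risk_new):
    """Two-category NRI: net upward moves in events plus net downward moves in non-events."""
    y_true = np.asarray(y_true)
    up = np.asarray(risk_new) > np.asarray(risk_old)
    down = np.asarray(risk_new) < np.asarray(risk_old)
    events, nonevents = y_true == 1, y_true == 0
    nri_events = (up[events].sum() - down[events].sum()) / events.sum()
    nri_nonevents = (down[nonevents].sum() - up[nonevents].sum()) / nonevents.sum()
    return nri_events + nri_nonevents


died = np.array([1, 1, 1, 0, 0, 0, 0, 0])
timi_high = np.array([0, 1, 0, 0, 1, 1, 0, 0])   # TIMI score > 5
ml_high = np.array([1, 1, 0, 0, 0, 1, 0, 0])     # ML probability >= 0.5
metrics = confusion_metrics(died, ml_high)
nri = categorical_nri(died, timi_high, ml_high)  # 1/3 + 1/5 here
```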

Results interpretation
Because of their black-box nature, ML models are difficult to implement in clinical medicine. Model-agnostic explanation methods, however, treat the model as a black box: perturbing the input and observing how the predictions change can reveal the behaviour of the underlying model 50.
By perturbing components of the input that humans can understand, we can interpret the model's predictions. We therefore interpret the best ML model in this study using local interpretable model-agnostic explanations (LIME) 51. LIME approximates a black-box model locally, rather than globally, with a simple linear model.
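The core LIME idea, perturbing one case, weighting the perturbations by proximity, and fitting a simple linear surrogate, can be sketched in a few lines. This is a hypothetical, simplified illustration in the spirit of LIME and not the lime package itself: it uses continuous Gaussian perturbations and an exponential kernel whose width (0.75 × √(number of features)) is an assumption, and the black-box model is a made-up logistic function.

```python
import numpy as np
from sklearn.linear_model import Ridge


def local_surrogate(predict_proba, x, scale, n_samples=2000, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model to black-box outputs around one case x."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the case with Gaussian noise scaled per feature
    Z = x + rng.normal(size=(n_samples, x.size)) * scale
    preds = predict_proba(Z)
    # 2. Weight each perturbed sample by its closeness to x (exponential kernel)
    dist = np.linalg.norm((Z - x) / scale, axis=1)
    width = kernel_width * np.sqrt(x.size)
    weights = np.exp(-(dist ** 2) / width ** 2)
    # 3. The surrogate's coefficients act as local feature contributions
    surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
    return surrogate.coef_


def black_box(Z):
    # Hypothetical model: risk rises with feature 0 and falls with feature 1
    return 1.0 / (1.0 + np.exp(-(2.0 * Z[:, 0] - 1.5 * Z[:, 1])))


contributions = local_surrogate(black_box, x=np.array([0.2, -0.1, 0.5]), scale=np.ones(3))
```

Positive coefficients play the role of the "supports" bars and negative ones the "contradicts" bars in the LIME plots.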

Comparative analysis
The TIMI scores computed in the NCVD registry were used as the conventional comparator. Using the 30% validation set that was not used for model development, the AUC of the TIMI score was compared with those of the developed base and ensemble ML models. A performance breakdown graph was also created to evaluate the TIMI score against cut-off points from clinical practice and the literature. In this study, the ML high-risk population is defined by a predicted mortality probability of 50% or greater, which corresponds to a TIMI score greater than 5.
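The risk-stratum comparison can be sketched as follows: each tool's cut-off splits patients into low- and high-risk groups, and observed mortality is compared within each group. The cut-offs come from the text; the ten-patient data and the helper name are hypothetical.

```python
import numpy as np


def mortality_by_stratum(died, high_risk):
    """Observed mortality (%) in the predicted low- and high-risk groups."""
    died, high_risk = np.asarray(died, bool), np.asarray(high_risk, bool)
    return {
        "low": 100 * died[~high_risk].mean(),
        "high": 100 * died[high_risk].mean(),
    }


died = np.array([0, 0, 1, 0, 1, 1, 0, 0, 0, 1])
ml_prob = np.array([0.1, 0.2, 0.7, 0.4, 0.9, 0.6, 0.3, 0.05, 0.55, 0.45])
timi = np.array([2, 3, 6, 4, 7, 5, 1, 2, 6, 3])

ml_groups = mortality_by_stratum(died, ml_prob >= 0.5)   # ML cut-off from the text
timi_groups = mortality_by_stratum(died, timi > 5)       # TIMI cut-off from the text
```

A tool that stratifies well shows higher observed mortality in its high-risk group and lower mortality in its low-risk group.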

Ethical declaration
This study was authorized by the UiTM Research Ethics Committee (Reference: 600-TNCPI (5/1/6)), with approval code REC/673/19. The UiTM Ethics Committee operates in accordance with the ICH Good Clinical Practice Guidelines, the Malaysia Good Clinical Practice Guidelines, and the Declaration of Helsinki.

Patient characteristics
The characteristics of the patients used in this study are detailed in Table 1. In the complete-cases dataset, the mean age of female in-hospital STEMI survivors was 61.8 (SD 11.5) years, whereas that of non-survivors was 67 (SD 9.8) years. Nearly 90% of the patients were non-smokers. Of all patients, 73% had a history of hypertension, 57% had diabetes, and 32% received percutaneous coronary intervention (PCI). The overall in-hospital mortality rate for women was 8.4%.
Table 1 also displays the summary statistics for the imputed dataset, in which the overall mortality rate for women was 12.8%. Systolic blood pressure, Killip class, fasting blood glucose, beta-blocker, ACE inhibitor, and oral hypoglycemic agent use differed significantly between survivors and non-survivors in both the complete-cases and imputed datasets (p < 0.001 for all).

Feature selection
SBE feature selection was combined with the SVM and RF algorithms to construct predictive models with optimal performance (see Methods). The features selected by ML feature selection are compared with those of the TIMI risk score in Table 2.

Algorithm performance on complete cases
On the 30% validation dataset, models constructed with the complete set of variables (48 variables) and with reduced sets of variables demonstrated higher predictive performance than the TIMI risk score (Table 3). Except for base DT and ensemble GBM, most ML models outperformed the TIMI risk score for predicting in-hospital mortality in women with STEMI. The best-performing model was base SVM (SVM selected var; p < 0.001). Table 4 provides a detailed performance evaluation of the ML models relative to the TIMI risk score. Models constructed with SVM-selected features (AUC ranging from 0.70 to 0.93) performed better than models constructed with RF-selected features (AUC ranging from 0.60 to 0.90). There was a significant difference between the base SVM-Linear (SVM selected var) algorithm and the … 2). However, the base SVM with the linear kernel (SVM selected var) demonstrated the highest predictive performance with a reduced number of predictors (12 predictors) for in-hospital mortality prediction in women with STEMI (AUC = 0.93, 95% CI = 0.89 to 0.97) compared with the other base and ensemble ML models.
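The text does not state how the 95% confidence intervals for AUC were obtained. One common option, a percentile bootstrap over validation cases, is sketched below as an assumption (DeLong's method is another); the validation data and scores are simulated, not study results.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) confidence interval."""
    rng = np.random.default_rng(seed)
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, y_true.size, size=y_true.size)   # resample cases with replacement
        if np.unique(y_true[idx]).size < 2:                     # AUC needs both classes
            continue
        aucs.append(roc_auc_score(y_true[idx], scores[idx]))
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, scores), lower, upper


# Simulated validation set: 262 cases, 22 deaths, informative but noisy scores
rng = np.random.default_rng(0)
y_val = np.array([1] * 22 + [0] * 240)
scores = y_val * 1.5 + rng.normal(size=y_val.size)
auc, lower, upper = bootstrap_auc_ci(y_val, scores)
```

With only around 20 events in a validation set of this size, intervals are wide, which is worth bearing in mind when comparing AUCs that differ by 0.01 to 0.02.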

Secondary analysis on best performing model
The best-performing ML model, base SVM (SVM selected var), was also trained on the imputed dataset and on a general dataset (complete cases that are not gender-specific). Both models were then evaluated on the complete-cases validation dataset, enabling a valid comparison between models built from imputed, general, and complete cases (Table 5). Evaluated on the same complete-cases validation dataset, SVM (SVM selected var) trained on the imputed dataset performed comparably to the model trained on the complete dataset (AUC = 0.89, CI: 0.81-0.96 vs AUC = 0.93, CI: 0.89-0.98; p = 0.540); the difference between the two was not statistically significant.
On the complete-cases validation dataset, the model trained on women's complete cases performed better than the model trained on complete cases that are not gender-specific: SVM (SVM selected var) (AUC = 0.93, CI: 0.89-0.98 vs AUC = 0.92, CI: 0.87-0.97; p < 0.001).

Model interpretation
LIME provides explanations for any individual patient, and the contribution of a given variable may change depending on the patient's other features. The contributions of the variables used for prediction, as identified by LIME analysis with the best-performing model (base SVM Linear, SVM selected var), are illustrated for dead (Fig. 2) and alive (Fig. 3) cases, respectively. Each graph shows the ten variables that best characterize the prediction in the local region. Blue bars represent variables that increase the predicted probability (supports), while red bars represent variables that decrease it (contradicts). For instance, among the dead cases, a high Killip class > 3 with no PCI and high systolic blood pressure (patient #1), or an older age > 74 years (patient #68), are variables that strongly indicate non-survival. Likewise, not receiving PCI combined with high fasting blood glucose > 14.3 (patient #2), and older age > 74 years with higher blood pressure (patient #3), were also strong indicators of non-survival. Pharmacological interventions appear as variables that contradict, lowering the predicted probability of non-survival (patients #2 and #3). For the alive cases (Fig. 3), a younger age of 58 years, the absence of chronic renal disease, a lower Killip class < 2, and a lower fasting blood glucose < 6.7 all support the survival outcome.

Comparison with TIMI conventional risk score
On the same validation set, TIMI achieved a lower AUC of 0.81 (0.72-0.89) than most of the ML models, except for the base DT and ensemble GBM models. Figures 4 and 5 show the mortality risk of women with STEMI as predicted by the TIMI risk score and by the best-performing model, base SVM Linear (SVM selected var), respectively. For women, the ML score categorized patients as low risk for predicted probabilities < 50% and high risk for ≥ 50%, equivalent to a TIMI low-risk score of ≤ 5 and a high-risk score of > 5 68.
Table 6 tabulates the percentage mortality among patients predicted to be low risk (TIMI score ≤ 5; ML probability < 0.5) and high risk (TIMI score > 5; ML probability ≥ 0.5). In the high-risk group, the ML models predicted in-hospital death in women with STEMI better than TIMI.

Discussion
This study developed and evaluated ML models to predict in-hospital mortality in Asian women with STEMI and compared them with traditional risk scores such as TIMI. Notably, it is the first study to apply ensemble ML models in this context, achieving higher accuracy than conventional risk scores. Key findings include: the crucial role of feature selection in enhancing model performance; the identification of consistent predictors such as systolic blood pressure, Killip class, fasting blood glucose, beta-blockers, ACE inhibitors, and oral hypoglycaemic medications; improved performance of ML models using selected features, with the linear SVM model using SVM-selected features showing the highest accuracy and outperforming ensemble ML; the finding that most ML models, except DT and GBM, outperform TIMI; and the use of LIME for model interpretability. These results underscore the value of advanced ML in specific clinical settings, enhancing predictive accuracy and decision-making in treating STEMI in Asian women.
Feature selection enhanced ML model performance in our study, in line with the findings of Perez et al. 69. Feature selection algorithms have been shown to increase ML model performance [70][71][72][73][74][75], as seen in this study with the RF (11 predictors) and SVM (12 predictors) models. However, this contrasts with other post-STEMI mortality studies in which models using larger sets of predictors showed optimal performance 35,76. ML with significant predictors improves risk stratification in Asian women with STEMI, providing clinicians with a prognostic tool for better emergency care management. Our findings also show that ensemble ML methods are promising for predicting in-hospital mortality in Asian female patients, although their performance did not consistently exceed that of base ML algorithms. In particular, base learners such as SVM (AUC: 0.93) and RF (AUC: 0.90) performed on par with the ensemble ML models. In medical contexts, even small increases in predictive performance are crucial 77. However, ensemble ML methods do not always outperform base models 78. In this study, the improvement of the ensemble ML model was not significantly greater than that of the best-performing base learner, SVM, consistent with the literature 27,50.
The best-performing model, base SVM Linear, identified high-risk patients who had higher mortality than those classified as high risk by TIMI. Despite its widespread use in Asia, the TIMI risk score was developed from a predominantly Western Caucasian cohort with limited Asian representation, and only 25% of its participants were women, indicating an underrepresentation of women. In our study, the TIMI score used as the comparator for the ML models achieved an AUC of 0.81 in a non-restricted PCI-eligible population, higher than the AUC of 0.78 reported for the fibrinolytic-eligible STEMI population in the original TIMI study 79. The SVM algorithm's robustness in handling high-dimensional and constrained datasets makes it well suited to predicting in-hospital mortality, and its ability to model non-linear decision boundaries is beneficial for assessing severe AMI prognosis 80,81.
NRI was further used for a detailed assessment of model improvement over the TIMI score. Although less commonly reported in medical research, the NRI effectively measures how accurately a new model reclassifies individuals into appropriate risk categories 82. In our study, the best model achieved a significant 18.8% improvement in classification over the TIMI score, indicating that our ML models not only predict more precisely but also better reflect actual patient outcomes. The NRI was assessed on a dataset separate from that used for model development, providing an unbiased comparison with TIMI and reinforcing the validity of our results.
Through feature selection, our ML models identified age, Killip class, and systolic blood pressure as key predictors, in agreement with the univariate analysis and LIME. LIME analysis indicated that factors such as older age, increased fasting blood glucose, and the absence of percutaneous coronary intervention (PCI) were associated with higher mortality risk, consistent with existing research. However, LIME's identification of influential features should be regarded as preliminary and not indicative of causality; further validation through prospective or randomized controlled trials is needed 83,84.
Older female STEMI patients have a higher incidence of coronary artery disease than males 2, and Killip class is a key predictor of mortality in STEMI patients 6,85,86. This is consistent with our study and with previous ML-based mortality studies 40. Women with STEMI face higher mortality owing to factors such as atypical symptoms, delayed treatment, and less frequent use of cardiac catheterization. Our study found that only 34% of Asian STEMI patients received PCI, highlighting a need for improved care. Heart rate is a crucial factor in in-hospital mortality 87, and the use of beta-blockers after STEMI is linked to better outcomes 5,7,86,88.
This study has several limitations. First, we could validate the ML models only against the TIMI score. Unlike the TIMI score, the parameters needed to calculate the GRACE score were not acquired during patient admission. The TIMI score is used at admission because of its simplicity and its development for short-term risk stratification, and because its performance has been found similar to that of the GRACE score for predicting in-hospital mortality; collecting information for both risk scores would therefore be redundant 89.
Future research will aim to use high-performance computing and larger datasets to improve the predictive performance of ensemble techniques. Because ML models depend on the representativeness of their data rather than on medical expertise, they may exhibit biases and require ongoing validation with real-world data, which electronic health record systems in hospitals can facilitate. Integrating these models into hospital systems for physician use, and validating them in clinical registries rather than administrative databases, will be key areas of future investigation.

Conclusion
This work demonstrates the effectiveness of both base and ensemble ML models, combined with feature selection, in predicting in-hospital mortality in Asian women with STEMI. Our findings highlight the potential of combining these ML models with conventional risk-scoring approaches such as TIMI to improve mortality risk assessment in this group, opening the possibility of more nuanced and effective therapeutic decision-making. The improved predictive accuracy of these models supports better patient communication and awareness and allows healthcare practitioners to optimize their management and resource allocation. Incorporating these ML technologies into clinical practice could greatly enhance care for female STEMI patients, and our findings pave the way for future research to test and potentially integrate these models into clinical processes, ultimately leading to more tailored and improved healthcare outcomes for women with STEMI.

NRI analysis
For the in-hospital model, net reclassification of women with STEMI using base SVM (SVM selected var) produced a net reclassification improvement of 18.8% (p < 0.00001) over the original TIMI risk score.

Figure 3. LIME model plots explaining individual predictions for alive cases.

Figure 4. Mortality rate distribution of TIMI risk scores on the validation set.

Figure 5. Mortality rate distribution of the base SVM (SVM-selected variables) model on the validation set.

Table 1. Summary statistics of the complete and imputed datasets. An asterisk (*) marks a p-value < 0.001, indicating that the difference between the alive and dead groups is statistically significant. Significant values are given in bold.
…-parametric supervised learning technique used for classification and regression. To generate multiple small decision trees, RF uses bagging with DT as the base classifier, and the model outputs the class receiving the most votes across the RF trees. XGB is an implementation of gradient boosting; it is more regularized than standard gradient boosting, which improves model generalization and helps prevent overfitting, resulting in more precise predictions.
Scientific Reports | (2024) 14:12378 | https://doi.org/10.1038/s41598-024-61151-x
… non beta blocker and percutaneous coronary intervention were observed as common predictors in both ML feature selection models in this study. The best SVM Linear model was built using twelve features selected by the SVM feature selection method. Age, Killip class, and systolic blood pressure are shared by the TIMI risk score for STEMI and the best model. The ranking of the selected features by variable importance is presented in Supplementary Table 3.

Table 2. Comparison between the features selected by ML feature selection and the TIMI risk score.

Table 3. AUC of the TIMI risk score and ML models with and without feature selection on the 30% validation dataset. Significant values are in bold.

Table 4. Detailed performance metrics of ML models with and without feature selection for women with STEMI.

Table 5. Detailed performance metrics of the best SVM Linear model (SVM selected var) trained on the imputed dataset and the general dataset, for women with STEMI. Significant values are in bold.
Figure 2. LIME model plots explaining individual predictions for dead cases.

Table 6. Percentage mortality in patients predicted to be low risk (TIMI score ≤ 5; ML probability < 0.5) and high risk (TIMI score > 5; ML probability ≥ 0.5).