Development of postoperative delirium prediction models in patients undergoing cardiovascular surgery using machine learning algorithms

Associations between delirium and postoperative adverse events in cardiovascular surgery have been reported, and preoperative identification of patients at high risk of delirium is needed to implement focused interventions. We aimed to develop and validate machine learning models to predict post-cardiovascular surgery delirium. Patients aged ≥ 40 years who underwent cardiovascular surgery at a single hospital were prospectively enrolled. Preoperative and intraoperative factors were assessed. Each patient was evaluated for postoperative delirium for 7 days after surgery. We developed machine learning models using the Bernoulli naive Bayes, Support vector machine, Random forest, Extra-trees, and XGBoost algorithms. Stratified fivefold cross-validation was performed for each developed model. Of the 87 patients, 24 (27.6%) developed postoperative delirium. Age, use of psychotropic drugs, cognitive function (Mini-Cog < 4), index of activities of daily living (Barthel Index < 100), history of stroke or cerebral hemorrhage, and eGFR (estimated glomerular filtration rate) < 60 were selected to develop delirium prediction models. The Extra-trees model had the best area under the receiver operating characteristic curve (AUROC, 0.76 [standard deviation 0.11]; sensitivity: 0.63; specificity: 0.78). XGBoost showed the highest sensitivity (AUROC, 0.75 [0.07]; sensitivity: 0.67; specificity: 0.79). Machine learning algorithms could predict post-cardiovascular surgery delirium using preoperative data. Trial registration: UMIN-CTR (ID; UMIN000049390).

and budgetary limitation 17. As pharmacological therapy, ramelteon and suvorexant also reportedly reduce delirium 18,19, but premedication to prevent delirium is not recommended for use in all patients 20. Therefore, early prediction and identification of individuals at high risk for delirium are needed to provide targeted and efficient interventions 14,21.
Several studies have aimed to predict postoperative delirium. However, the accuracy of prediction models for patients in the ICU is insufficient when applied to those undergoing cardiovascular surgery 22, and delirium prediction models should be constructed only for a specific group of patients 23.
Conventional prediction models based on statistical methods for patients undergoing cardiovascular surgery include those by Koster 24 and Rudolph 4; both reported an area under the receiver operating characteristic curve (AUROC) of 0.75. Although statistical models such as logistic regression are favorable in terms of model interpretability, machine learning is preferred for prediction models 25. Clinical prediction models using machine learning algorithms have recently attracted attention. In the area of cardiovascular surgery-associated postoperative delirium, both Mufti 26 and Xue 27 showed that the prediction performance of machine learning algorithms was superior to that of conventional statistical models, although their models were limited to hyperactive delirium and acute kidney injury-related delirium, respectively. Delirium is classified into three subtypes based on the type of psychomotor activity: hyperactive, hypoactive, and mixed 1. Hypoactive delirium reportedly accounts for 92% of all cases of delirium in the cardiovascular ICU 28. Therefore, we must assess and deal with delirium, including the hypoactive subtype, which is often overlooked in clinical practice.
We aimed to develop and validate new prediction models for cardiovascular surgery-associated postoperative delirium, including the hypoactive subtype, using machine learning algorithms. Early identification of patients at high risk of delirium is critical to effectively implement prevention strategies, such as multicomponent interventions and prophylactic medications.

Setting and study population
This single-center prospective study included patients aged ≥ 40 years who underwent cardiovascular surgery at Osaka University Hospital between November 2021 and October 2022. The inclusion criteria were patients who underwent coronary artery bypass graft (CABG), valve surgery, ascending aortic replacement (AAR) via a median sternotomy, or minimally invasive cardiac surgery (MICS) using cardiopulmonary bypass (CPB). The exclusion criteria were as follows: (1) patients undergoing transcatheter aortic valve implantation (TAVI) or thoracic endovascular aortic repair (TEVAR); (2) patients managed with deep hypothermic circulatory arrest or selective cerebral perfusion; (3) patients with preoperative delirium; (4) patients diagnosed and treated for dementia preoperatively; (5) patients requiring mechanical ventilation for > 3 days postoperatively; (6) patients with cerebral hemorrhage or stroke within 7 days after surgery; (7) patients requiring reoperation within 7 days after surgery; and (8) patients with suspected alcohol withdrawal delirium. The criteria for alcohol withdrawal delirium were consumption of an average of 60 g or more of alcohol per day immediately prior to admission and the development of delirium within 2 weeks of admission.

Ethics declarations and consent to participate
This study was approved by the Ethical Review Committee of Osaka University Hospital (No: 21158-3; 2021/09/11) and followed the Declaration of Helsinki. All patients provided written informed consent.

Variable selection
The electronic medical records of all patients were prospectively reviewed. We collected 38 preoperative factors, including demographics, medical history, laboratory data, and life history, as well as 10 intraoperative factors, including operation time, pump time, and blood fluid balance. The entire list of 48 variables can be found in Supplementary Tables S1 and S2 online. A trained nurse visited the patients on any one day from admission to surgery and assessed their function using the Mini-Cog 29 Japanese version, Geriatric Depression Scale-Short Version-Japanese (GDS-S-J) 30, and Barthel Index 31.

Delirium assessment
All patients were followed for 7 days, with day 0 being the day of surgery. Electronic medical records were reviewed daily. The evaluation of delirium began when the patient was extubated in the ICU. A psychiatrist (MH) or a critical care nurse (CN) trained by psychiatrists visited patients after discharge from the ICU and when a change in mental status was suspected, and then assessed delirium according to the Diagnostic and Statistical Manual of Mental Disorders-5 (DSM-5) 1. For cases that were difficult to assess, two or more investigators, including psychiatrists, discussed the case and came to a final decision. Considering the diurnal variability of delirium, visiting hours were standardized as 3:30 p.m. to 6:30 p.m.

Data preprocessing and statistical analysis
Some values of the intraoperative propofol dose (25.3%) and HbA1c (1.1%) were missing; thus, these factors were excluded from the analysis. Outliers were retained after confirming that they had not been entered erroneously.

Variables are expressed as median (interquartile range [IQR]) or the number of persons (percentage).
Univariate analysis was used to compare patients who did not develop delirium (non-delirium group) and those who did (delirium group). Student's t-test was used for continuous variables normally distributed with equal variance, Welch's t-test for variables normally distributed with unequal variance, Mann-Whitney U test for continuous variables not normally distributed, and Fisher's exact test for categorical variables. We used the Shapiro-Wilk test and Bartlett's test to assess data normality and equality of variance, respectively. All were
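The test-selection rule described above can be sketched in Python with SciPy. This is only an illustration under stated assumptions, not the study's analysis code: the function names `compare_continuous` and `compare_categorical` are hypothetical, and the threshold `alpha=0.05` for the Shapiro-Wilk and Bartlett checks is an assumption because the text does not state it. The 2 × 2 table uses the stroke or cerebral hemorrhage counts reported in the Results (4 of 63 in the non-delirium group, 4 of 24 in the delirium group).

```python
from scipy import stats


def compare_continuous(group_a, group_b, alpha=0.05):
    """Choose a two-sample test following the rule in the text.

    alpha is an assumed threshold for the normality and variance checks.
    """
    normal = (stats.shapiro(group_a).pvalue > alpha
              and stats.shapiro(group_b).pvalue > alpha)
    if not normal:
        # Not normally distributed: Mann-Whitney U test
        res = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
        return "Mann-Whitney U", res.pvalue
    # Normally distributed: Bartlett's test decides Student vs. Welch
    equal_var = stats.bartlett(group_a, group_b).pvalue > alpha
    res = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
    return ("Student's t" if equal_var else "Welch's t"), res.pvalue


def compare_categorical(table_2x2):
    """Fisher's exact test (two-sided) for a 2 x 2 contingency table."""
    _, p_value = stats.fisher_exact(table_2x2)
    return "Fisher's exact", p_value


# Rows: non-delirium, delirium; columns: history present, history absent
stroke_table = [[4, 59], [4, 20]]
test_name, p_value = compare_categorical(stroke_table)
print(test_name, round(p_value, 3))
```

For this table the two-sided p-value comes out at approximately 0.208, in line with the value reported for history of stroke or cerebral hemorrhage.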

Derivation of prediction models
The features used for the classification models were selected based on the results of the univariate analysis and consultations with two skilled cardiovascular surgeons, three psychiatrists, and a nurse. The process of model development is illustrated in Supplementary Fig. S1 online. We compared the performance of classification models using Bernoulli naive Bayes, Support vector machines 32, Random forest 33, Extra-trees 34, and XGBoost 35. Continuous variables were used as they were, except in Bernoulli naive Bayes, where they were binarized to 0 or 1 with the binarization threshold treated as a hyperparameter. We searched the hyperparameter spaces of the models using a grid search with fivefold stratified cross-validation. Our dataset is imbalanced because it contains a large proportion of the non-delirium class and a small proportion of the delirium class. Therefore, we used class weights to help the models learn from the imbalanced data. We evaluated the classification performance of these models using fivefold stratified cross-validation. Seven evaluation metrics were used to compare the classification performance of each model: balanced accuracy, AUROC, area under the precision-recall curve (AUPRC) 36, sensitivity, specificity, positive predictive value, and F-value. In addition, a predictive model based on conventional logistic regression (LR) was built for comparison with the machine learning models. The LR model was validated using the stratified hold-out method. Our experiments were conducted using Python version 3.8.3.
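The tuning and evaluation procedure described above can be sketched with scikit-learn for one of the five algorithms. This is a minimal illustration under assumptions, not the authors' code: the data are synthetic (only the counts of 87 patients, 24 positives, and 6 features come from the text), the hyperparameter grid is hypothetical, `class_weight="balanced"` is one possible weighting scheme, the F-value is taken to be the F1 score, and tuning on AUROC is an assumption.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate

rng = np.random.default_rng(0)

# Synthetic stand-in data: 87 rows, 6 features, 24 positive labels
X = rng.normal(size=(87, 6))
y = np.array([1] * 24 + [0] * 63)
X[y == 1] += 0.8  # weak artificial signal so the example is not pure noise

# Grid search with fivefold stratified cross-validation (hypothetical grid)
tuning_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    ExtraTreesClassifier(class_weight="balanced", random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 4]},
    scoring="roc_auc",
    cv=tuning_cv,
)
search.fit(X, y)

# Seven metrics; specificity is recall computed on the negative class
scoring = {
    "balanced_accuracy": "balanced_accuracy",
    "auroc": "roc_auc",
    "auprc": "average_precision",
    "sensitivity": "recall",
    "specificity": make_scorer(recall_score, pos_label=0),
    "ppv": "precision",
    "f_value": "f1",
}

# Fivefold stratified cross-validation of the tuned configuration
eval_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_validate(search.best_estimator_, X, y, cv=eval_cv, scoring=scoring)

# Mean and standard deviation across folds for each metric
summary = {
    key[len("test_"):]: (values.mean(), values.std())
    for key, values in scores.items()
    if key.startswith("test_")
}
for name, (mean, sd) in summary.items():
    print(f"{name}: {mean:.2f} [{sd:.2f}]")
```

Evaluating the tuned configuration on the same data used for tuning can be optimistic; a nested cross-validation, with the grid search placed inside each outer fold, would be a stricter alternative.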

Results

Patients and occurrence of delirium
Of the 123 patients who met the inclusion criteria and consented to participate in the study, 87 (median age 71 [IQR; 61.5, 75.0] years; 53 [60.9%] males) were included in the analysis. Twenty-four patients (27.6%) developed delirium within seven days after surgery (Fig. 1). Fourteen of the delirious patients were assessed as having delirium during their ICU stay based on electronic medical records, 10 of whom also had delirium symptoms when we visited after discharge from the ICU. No significant difference was found in the length of ICU stay between the non-delirium and delirium groups (non-delirium vs. delirium group: median 3 [IQR; 2, 3] days vs. 3 [2, 3], p = 0.976). The subtypes included 14 (58.3%) hypoactive, eight (33.3%) mixed, including those who did not present with apparent hypoactivity or hyperactivity, and two (8.3%) hyperactive.

Risk factors of delirium
The results of the univariate analysis comparing the non-delirium and delirium groups are shown in Supplementary Table S1

Development of prediction models and validation
In the univariate analysis, CABG showed p < 0.05 but was not interpretable as a risk factor. Therefore, we first removed CABG from the set of features for training the models. Second, based on clinical observation, we added a history of stroke or cerebral hemorrhage (non-delirium vs. delirium group, 4 [6.3%] vs. 4 [16.7%], p = 0.208) and eGFR < 60 (30 [47.6%] vs. 16 [66.7%], p = 0.150) to the set of features. Consequently, we trained all models using the following features: age, use of psychotropic drugs, Mini-Cog < 4, Barthel Index < 100, history of stroke or cerebral hemorrhage, and eGFR < 60.
In cross-validation after hyperparameter tuning, the Extra-trees model had the best AUROC (0.76 ± 0.11 [standard deviation]) and AUPRC (0.62 ± 0.18), with a sensitivity of 0.63 and specificity of 0.78. XGBoost showed the best sensitivity (AUROC: 0.75 ± 0.07, AUPRC: 0.59 ± 0.17, sensitivity: 0.67, and specificity: 0.79). The conventional LR model showed lower values for the evaluation metrics than the machine learning models. A comparison of the developed models is shown in Table 1, and the ROC and PR curves are shown in Fig. 2.
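To make the threshold-based metrics concrete, the following sketch derives sensitivity, specificity, positive predictive value, balanced accuracy, and F-value from the four cells of a confusion matrix. The counts are hypothetical: they split the 24 delirium and 63 non-delirium patients in a way chosen only to keep the arithmetic easy to follow, and they are not taken from Table 1. The F-value is assumed to be the F1 score, and the function name `classification_metrics` is illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Derive threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true-positive rate (recall)
    specificity = tn / (tn + fp)   # true-negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        # F1 score: harmonic mean of PPV and sensitivity
        "f_value": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Hypothetical counts: 24 delirium patients (15 detected, 9 missed),
# 63 non-delirium patients (49 correctly cleared, 14 false alarms)
metrics = classification_metrics(tp=15, fn=9, tn=49, fp=14)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity is 15/24 = 0.625 and specificity is 49/63 ≈ 0.778, while the positive predictive value is only 15/29 ≈ 0.517, which shows how a moderately sensitive model can still flag many patients who do not go on to develop delirium when the outcome occurs in roughly a quarter of the cohort.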

Discussion
Herein, we developed models to preoperatively predict up to 67% of patients who develop postoperative delirium based on the following six preoperative factors: age, use of psychotropic drugs, Mini-Cog < 4, Barthel Index < 100, history of stroke or cerebral hemorrhage, and eGFR < 60.
Many studies have tried to predict post-cardiovascular surgery delirium based on statistical methods such as logistic regression 4,24,37,38. In contrast, we applied machine learning to predict postoperative delirium, including the hypoactive subtype. Additionally, we placed importance on the prediction of true-positive patients when developing and validating the models by using class weighting and PR curves 36. This resulted in predictive models with AUROCs similar to those of previous studies and with more stable sensitivity. Considering the safety of delirium preventive strategies, such as multicomponent nonpharmacological interventions, a predictive model that emphasizes sensitivity to identify true positives preoperatively may be preferred 39.
In this study, no intraoperative factors showed statistically significant differences. Hence, we used only preoperative factors to develop the predictive models. Early prediction of postoperative delirium using only preoperative factors enables us to provide efficient interventions for postoperative delirium, such as premedication, preoperative cognitive training, and adequate staffing. For example, the Extra-trees model we built correctly identified 63% of the patients who developed delirium, while it indicated that 33% of all participants were not targets of predictive strategies preoperatively. Postoperative delirium is influenced by surgical and anesthesia-related factors, such as the use of anticholinergic drugs including opioids 40 and CPB, which is associated with inflammatory stimuli-induced nerve injury 41. However, including intraoperative factors in the model reportedly does not improve the prediction accuracy for postoperative delirium 42. This implies that preoperative factors alone may predict the onset of postoperative delirium, which would allow for the initiation of preoperative prophylactic interventions.
We selected features based on the results of the univariate analysis and clinical experience. A prospective observational study reported that the Mini-Cog alone showed discriminative power with an AUROC of 0.77 (95% CI; 0.61-0.93) in predicting postoperative delirium in surgeries other than cardiovascular surgery 43. The Mini-Cog is one of the simplest cognitive assessment tools and can be performed in a few minutes 29. Psychotropic drugs reportedly have side effects on the central nervous system and are risk factors for delirium 44. Mini-Cog < 4, use of psychotropic drugs, and a history of stroke or cerebral hemorrhage directly reflect the patient's cerebral function. eGFR < 60 is a standard used to evaluate renal dysfunction, which is known to cause brain dysfunction due to overproduction of inflammatory cytokines and hemodynamic changes 45-48. In older patients, microglial immune responses frequently occur in the central nervous system, and excessive neuroinflammation in response to surgical invasion may lead to postoperative delirium and cognitive dysfunction 49,50. Some studies cite ADL and instrumental ADL (iADL) impairment as risk factors for delirium 51,52, and physical function interventions are recommended for preventing delirium 53. Thus, Barthel Index < 100 was included as a risk factor. Immobility due to impairment of physical function can be related to decreased cholinergic activity, which is important in the pathophysiology of delirium 54.

Table 1. Comparison of the performance of the developed models. AUROC, area under the receiver operating characteristic curve; SD, standard deviation; AUPRC, area under the precision-recall curve; PPV, positive predictive value.

Algorithms AUROC [SD] AUPRC [SD] Balanced-accuracy Sensitivity Specificity PPV F-value
www.nature.com/scientificreports/

A systematic review in 2021 reported that the most commonly used and best-performing model for predicting delirium in adult inpatients was Random forest, followed by Support vector machine 55. We used Bernoulli naive Bayes, Support vector machine, Random forest, Extra-trees, and XGBoost and compared the results. Extra-trees and XGBoost, which were superior in terms of AUROC and AUPRC, are ensemble learning models that combine multiple decision trees, similar to Random forest. In tree-structured ensemble learning, combining multiple classifiers stabilizes the estimation and suppresses overfitting 56. Although Random forest did not show particularly good accuracy in this study, such tree-structured ensemble learning methods may be suitable for predicting delirium. The machine learning models we built outperformed conventional LR except for the sensitivity of Random forest and Bernoulli naive Bayes, indicating that the machine learning models were better balanced in predicting true positives and true negatives. The advantage of machine learning methods is that they are not bound by the assumptions of LR, such as linearity of the logit-transformed probability, equal variance of error terms, and limits on the number of predictors 57. Because the pathophysiology of delirium is complex and influenced by many factors, machine learning methods, which are highly flexible algorithms, are considered suitable.
Our study has four main limitations. First, the number of patients with delirium might have been underestimated. Clinical features of delirium are vague, and an estimated 55-70% of hospital delirium cases are missed 58,59. The incidence of delirium in this study was 27.6%, which is consistent with previous reports indicating an incidence of 26-56% 4-7,60 with prospective diagnosis based on DSM-5 criteria. However, delirium occurring in the ICU might have been missed because assessments during ICU stays were based primarily on electronic medical records, as ICU access was restricted due to the spread of the novel coronavirus infection.
Second, the subjects and sample size were limited by adopting a prospective design at a single center. Although larger sample sizes are generally preferred for building machine learning models, our study included only 87 participants. In addition, our models were validated using stratified fivefold cross-validation; no external validation was performed. Our results suggest that machine learning may be able to predict post-cardiovascular surgery delirium with greater sensitivity than conventional statistical methods. The developed models could be integrated into a meta-model while expanding the sample size in the future to improve their versatility 23. In addition, the sample size affects feature selection. Typical feature selection methods in machine learning include the wrapper and embedded methods, which select variables through model training 61. These methods are used to control model overfitting caused by too many variable dimensions. However, if the data used for feature selection are scarce, running these methods can itself cause overfitting, and features unsuitable for prediction might be selected. Therefore, we applied a statistical feature selection method and discussed the selection with cardiovascular surgeons, psychiatrists, and a nurse. Machine learning is superior to statistics in terms of discovering data patterns and non-linear relationships 62; therefore, feature selection methods such as the wrapper and embedded methods can be applied once enough study participants are available.
Third, the appropriateness of the questionnaire is ambiguous. The mental status of patients aged < 65 years was assessed using the GDS-S-J, which we use in older patients, to maintain data consistency. However, the GDS-S-J was developed specifically for the elderly; thus, its validity in younger adults remains unknown 30. A questionnaire developed for all age groups should be applied to assess mental status more accurately.
Finally, we did not include postoperative factors for prediction. Postoperative factors, such as the duration of ICU stay and ventilator support requirement, reportedly affect postoperative delirium 63,64. However, it has also been stated that postoperative factors with unclear temporal relationships to delirium onset should not be used for prediction 23. If a significant influence of postoperative factors on the development of delirium is suspected in the target population, measures such as revision of the exclusion criteria may be required.

Conclusion
This study's results suggest that applying machine learning algorithms to predict post-cardiovascular surgery delirium shows superior performance, especially in predicting true-positive patients. The developed models were constructed using only factors that can be collected preoperatively, which enables early identification of patients at high risk of delirium. Clinical applications of such predictive models will contribute to the prevention of postoperative delirium in patients undergoing cardiovascular surgery by guiding the implementation of preoperative prevention strategies, such as non-pharmacological multifactorial interventions and prophylactic medication, that focus on high-risk patients.